Hi all,

I wanted to flag an update from the Wikimedia Data Engineering team in case
it’s relevant to your work, especially for those of you who may rely on
Wikimedia XML dumps for research.

In short:

   - A new set of MediaWiki Content File Exports is now available, providing
   unparsed content from Wikimedia’s public wikis in XML format.
   - There are two monthly datasets:
      - mediawiki_content_history:
      https://dumps.wikimedia.org/other/mediawiki_content_history/ -
      Full revision history for all pages
      - mediawiki_content_current:
      https://dumps.wikimedia.org/other/mediawiki_content_current/ -
      Latest revision only for each page
   - This change was made because the legacy dump infrastructure at
   https://dumps.wikimedia.org/backup-index.html has struggled to reliably
   generate XML exports for larger wikis.
   - The older XML dump pipeline is now considered deprecated, though SQL
   dumps will continue and some legacy generation may persist temporarily.

You can read the full announcement here:
https://lists.wikimedia.org/hyperkitty/list/[email protected]/thread/E6D5EU4PMSTSOI2J7A46HJ3YW2W554CS/,
and view the full documentation at:
https://wikitech.wikimedia.org/wiki/MediaWiki_Content_File_Exports.

Best,
Kinneret

-- 

Kinneret Gordon

Lead Research Community Officer

Wikimedia Foundation <https://wikimediafoundation.org/>

*Learn more about Wikimedia Research <https://research.wikimedia.org/>*
_______________________________________________
Wiki-research-l mailing list -- [email protected]
To unsubscribe send an email to [email protected]
