Those of you following along will notice that dewiki and wikidatawiki have
more files than usual for the page content dumps (pages-meta-history).
We'll have more of this going forward; if I get the work done in time,
starting in April we'll split up these jobs ahead of time into small files
that can be rerun right away when they fail, rather than waiting for
MediaWiki to split up the output based on run time and then waiting for a
set of MW jobs to complete before retrying failures.  This will mean more
resiliency
when dbs are pulled out of the pool for various reasons (schema changes,
upgrades, etc).
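
To give a rough idea of what splitting ahead of time could look like (this
is just a sketch; the chunk size and function name are made up, not the
actual dumps code):

    def split_page_ranges(max_page_id, chunk_size=100000):
        # Carve the page id space into small fixed-size ranges up front,
        # so each range can be dumped, and rerun on failure, independently.
        ranges = []
        start = 1
        while start <= max_page_id:
            end = min(start + chunk_size - 1, max_page_id)
            ranges.append((start, end))
            start = end + 1
        return ranges

    # e.g. split_page_ranges(250000)
    #   -> [(1, 100000), (100001, 200000), (200001, 250000)]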

Later on during the current run, I hope we will see dumps of magic words
and namespaces, provided as json files.  Let me put it this way: the code
is tested and deployed; now we shall see.
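
No promises on the exact layout, but since they're plain JSON they should
be trivial to consume; a minimal sketch (the file name is just a guess):

    import json

    # File name is illustrative only; check the dump directory for the real one.
    with open('namespaces.json') as infile:
        data = json.load(infile)

    # Peek at the top level before relying on any particular keys.
    print(json.dumps(data, indent=2)[:500])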

At this very moment, the status of a given dump can be retrieved via a file
in the current run directory: <wikiname>/20170320/dumpstatus.json.  These
files are updated frequently during the run.  You can also get the status
of all current runs at https://dumps.wikimedia.org/index.json.  Thanks to
Hydriz for the idea on how a status API could be implemented cheaply.  This
will
probably need some refinement, but feel free to play.  More information at
https://phabricator.wikimedia.org/T147177
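
For example, with just the standard library you can do something like the
following; the per-wiki URL assumes the run directory is served at the
usual spot under dumps.wikimedia.org, and the wiki name and date are only
examples, so adjust as needed:

    import json
    from urllib.request import urlopen

    # Status of all current runs.
    with urlopen('https://dumps.wikimedia.org/index.json') as resp:
        index = json.loads(resp.read().decode('utf-8'))

    # Status of one run; the wiki name and date here are just examples.
    wiki, date = 'elwiki', '20170320'
    url = 'https://dumps.wikimedia.org/{}/{}/dumpstatus.json'.format(wiki, date)
    with urlopen(url) as resp:
        status = json.loads(resp.read().decode('utf-8'))

    # Have a look at what each one contains before relying on specific fields.
    print(json.dumps(index, indent=2)[:500])
    print(json.dumps(status, indent=2)[:500])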

The last of the UI updates went live; thanks to Ladsgroup for all of those
fixups.  It's nice to enter the new century at last :-)

And finally, we moved all the default config info out into a YAML file
(thanks to Adam Wight for the first version of that changeset).  There were
a couple of hiccups with that, which resulted in my starting the en
wikipedia run manually for this run, though via the standard script.
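
For the curious, consuming that file from Python is just the usual PyYAML
dance; the path and key names below are made up for illustration, the real
defaults live in the repo:

    import yaml  # PyYAML

    # Path and key names are illustrative only.
    with open('default_config.yaml') as infile:
        defaults = yaml.safe_load(infile)

    # Per-wiki or per-run settings would then override these defaults.
    chunk_size = defaults.get('chunk_size', 100000)
    print(chunk_size)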

Happy trails,

Ariel