https://bugzilla.wikimedia.org/show_bug.cgi?id=45974

--- Comment #1 from Ariel T. Glenn <[email protected]> ---
Instead of parsing the XML, it would be better if you download the file of md5
sums (which you will want anyway to verify the files just downloaded).  In the
above example this would be at
http://dumps.wikimedia.org/enwiki/20130204/enwiki-20130204-md5sums.txt
The format is pretty boring and therefore good for machines: md5sum, space,
filename.  That format is not expected to change anytime soon, and if it were
to change I am sure there would be a giant discussion about it on the various
lists.
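To illustrate, here is a minimal sketch of parsing that md5sums file and verifying a downloaded file against it. The function and variable names are my own; the only assumption taken from the dump layout is the "md5sum, space, filename" line format described above.

```python
import hashlib

def parse_md5sums(text):
    """Parse the md5sums file: one 'md5sum filename' pair per line."""
    sums = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            checksum, name = parts
            sums[name] = checksum
    return sums

def verify_file(path, expected_md5):
    """Compute the md5 of a local file in chunks and compare it
    against the checksum taken from the md5sums file."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest() == expected_md5
```

Reading the file in chunks keeps memory use flat even for the multi-gigabyte history dumps.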

Assuming that you know which type of file you want (pages-meta-history,
stub-articles, etc) you can check for the existence in the md5 file of
enwiki-date-filestring.xml.{gz,bz2,7z} and grab the compressed file of your
choice if it's there.  Otherwise look for
enwiki-date-filestring[0-9]+.xml.{gz,bz2,7z} and get those; if you don't see
those, look for enwiki-date-filestring[0-9]+.xml*{gz,bz2,7z} and get those
instead.
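That fallback order can be sketched as a few regular expressions tried in sequence against the filenames listed in the md5sums file. This is only an illustration of the priority described above; the function name and the example filenames are hypothetical, not part of the dump tooling.

```python
import re

def pick_dump_files(names, wiki, date, filestring):
    """Return the filenames to download: try the single-file dump first,
    then numbered parts, then numbered parts with extra suffixes
    (e.g. page-range markers) before the compression extension."""
    exts = r"(gz|bz2|7z)"
    patterns = [
        rf"^{wiki}-{date}-{filestring}\.xml\.{exts}$",        # single file
        rf"^{wiki}-{date}-{filestring}[0-9]+\.xml\.{exts}$",  # numbered parts
        rf"^{wiki}-{date}-{filestring}[0-9]+\.xml.*{exts}$",  # parts w/ suffixes
    ]
    for pat in patterns:
        matches = [n for n in names if re.match(pat, n)]
        if matches:
            return sorted(matches)
    return []
```

Because the patterns are tried in order, a complete single-file dump always wins over part files, matching the preference described above.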

I think there are tools out there already for scripted download, you might poke
folks on the xmldatadumps-l list about that.

As an aside, it's quite likely that we will go to multipart soon for a few of
the other large projects since they take so long to complete running them as
one single job.

_______________________________________________
Wikibugs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l