https://bugzilla.wikimedia.org/show_bug.cgi?id=45974
--- Comment #1 from Ariel T. Glenn <[email protected]> ---

Instead of parsing the XML, it would be better to download the file of md5 sums (which you will want anyway to verify the files once downloaded). In the above example this would be at http://dumps.wikimedia.org/enwiki/20130204/enwiki-20130204-md5sums.txt

The format is pretty boring and therefore good for machines: md5sum, space, filename. That format is not expected to change anytime soon, and if it were to change, I am sure there would be a giant discussion about it on the various lists.

Assuming you know which type of file you want (pages-meta-history, stub-articles, etc.), you can check the md5 file for the existence of enwiki-date-filestring.xml.{gz,bz2,7z} and grab the compressed file of your choice if it's there. Otherwise look for enwiki-date-filestring[0-9]+.xml.{gz,bz2,7z} and get those; if you don't see those, look for enwiki-date-filestring[0-9]+.xml*.{gz,bz2,7z} and get those instead.

I think there are already tools out there for scripted download; you might poke folks on the xmldatadumps-l list about that.

As an aside, it's quite likely that we will soon go to multipart dumps for a few of the other large projects, since they take so long to complete when run as one single job.
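As a rough sketch of the approach described above (parse the md5sums file, then try the three filename patterns in order, and verify downloads against the listed sums) — the sample md5sums content and hash values below are made-up placeholders, and the exact multipart filename layout is an assumption based on the patterns given:

```python
import hashlib
import re

# Hypothetical excerpt of an enwiki-20130204-md5sums.txt file.
# The hashes here are placeholders, not real checksums.
SAMPLE_MD5SUMS = """\
0123456789abcdef0123456789abcdef  enwiki-20130204-stub-articles.xml.gz
fedcba9876543210fedcba9876543210  enwiki-20130204-pages-meta-history1.xml-p000000010p000002290.7z
00112233445566778899aabbccddeeff  enwiki-20130204-pages-meta-history2.xml-p000002301p000004290.7z
"""

def parse_md5sums(text):
    """Map filename -> md5. Format: md5sum, whitespace, filename."""
    sums = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            md5, name = parts
            sums[name] = md5
    return sums

def pick_dump_files(filenames, wiki, date, filestring):
    """Apply the fallback order from the comment:
    1. single file:   wiki-date-filestring.xml.{gz,bz2,7z}
    2. numbered file: wiki-date-filestring[0-9]+.xml.{gz,bz2,7z}
    3. numbered file with a page-range suffix after .xml:
                      wiki-date-filestring[0-9]+.xml*.{gz,bz2,7z}
    """
    prefix = f"{re.escape(wiki)}-{re.escape(date)}-{re.escape(filestring)}"
    single = {f"{wiki}-{date}-{filestring}.xml.{e}" for e in ("gz", "bz2", "7z")}
    hits = sorted(n for n in filenames if n in single)
    if hits:
        return hits
    numbered = re.compile(prefix + r"[0-9]+\.xml\.(gz|bz2|7z)$")
    hits = sorted(n for n in filenames if numbered.match(n))
    if hits:
        return hits
    multipart = re.compile(prefix + r"[0-9]+\.xml.*\.(gz|bz2|7z)$")
    return sorted(n for n in filenames if multipart.match(n))

def md5_matches(path, expected_md5):
    """Verify a downloaded file against its listed md5, reading in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest() == expected_md5
```

For example, `pick_dump_files(sums, "enwiki", "20130204", "pages-meta-history")` on the sample above falls through to the third pattern and returns both 7z parts, while "stub-articles" matches at the first step.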
