The individually numbered files change sizes radically because I'm moving around start and end points. You can ignore that.
I am looking at piece 10, however, to see why it's smaller: ah. I have a typo in the size for that one; I asked for only 200000 pages to go in it instead of the 240000 I intended :-D And so that's all that went in (minus deleted pages). Nothing's missing though; anything "extra" winds up in the last piece (15). You can look at the stub files to verify that.

FWIW we'll be juggling the number of pages per chunk on a regular basis.

Ariel

On 29-03-2011, Tue, at 17:08 -0700, Jamie Morken wrote:
> Hi all,
>
> Congrats Ariel! :) The sum of the pages-meta-history files for the last
> two enwiki dumps is 342.7GB for the 20110115 dump and 353.5GB for the
> 20110317 dump, which shows that the overall dump size grew over 2
> months. Seven of the individually numbered pages-meta-history files
> reduced in size while eight increased in size from 20110115 to
> 20110317. By far the biggest decrease was the
> pages-meta-history10.xml.bz2 file, which dropped from 18.7GB down to
> 1.9GB. I think there are probably missing revisions in that page ID
> range.
>
> Here are some historical dump sizes for comparison to show the growth
> of these files:
>
> enwiki-20060816-pages-meta-history.xml.7z        5.08GB
> enwiki-20070402-pages-meta-history.xml.7z        11.3GB (229 days since previous dump)
> enwiki-20080103-pages-meta-history.xml.7z        17.2GB (276 days since previous dump)
> enwiki-20100130-pages-meta-history.xml.7z        31.8GB (758 days since previous dump)
> enwiki-20110115-pages-meta-history[1-15].xml.7z  38.0GB (350 days since previous dump)
> enwiki-20110115-pages-meta-history[1-15].xml.7z  (7z compression in progress)
>
> Here's a graph of this data showing the dump file size growth seems to
> be pretty linear:
> (chart x-axis starts from 20060816 dump and ends at 20110115 dump)
> "http://nekrom.com/wikipedia/enwiki%20history%20dump%20file%20size%20over%20time.png"
>
> cheers,
> Jamie
>
>
> ----- Original Message -----
> From: "Ariel T. Glenn" <[email protected]>
> Date: Tuesday, March 29, 2011 3:24 pm
> Subject: [Xmldatadumps-l] March 17 en wikipedia history bz2 files ready
> To: [email protected]
> Cc: [email protected]
>
> > Well, that used up all my good luck for the year, but the bz2s are ready
> > for download. The md5sums are still calculating, give them a couple
> > hours to show up. If all continues to go well we'll have the 7z files
> > in 4-5 days.
> >
> > As before I do not plan to provide a single 350GB file of the bz2, nor a
> > single 7z file for download.
> >
> > Happy trails,
> >
> > Ariel
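
As a rough check on the "pretty linear" observation in the quoted message, here is a minimal Python sketch that computes the growth rate between consecutive dumps. The dates and sizes are taken from the list of single-archive 7z sizes above; the names `DUMPS` and `growth_rates` are made up for illustration and are not part of any dump tooling.

```python
from datetime import date

# (dump date, 7z size in GB), copied from the list in the quoted message.
DUMPS = [
    (date(2006, 8, 16), 5.08),
    (date(2007, 4, 2), 11.3),
    (date(2008, 1, 3), 17.2),
    (date(2010, 1, 30), 31.8),
    (date(2011, 1, 15), 38.0),
]


def growth_rates(dumps):
    """Return (days_between, gb_per_day) for each consecutive pair of dumps."""
    rates = []
    for (d0, s0), (d1, s1) in zip(dumps, dumps[1:]):
        days = (d1 - d0).days
        rates.append((days, (s1 - s0) / days))
    return rates


if __name__ == "__main__":
    for days, rate in growth_rates(DUMPS):
        # Show the rate in MB/day (1 GB taken as 1000 MB) for readability.
        print(f"{days:4d} days  {rate * 1000:6.1f} MB/day")
```

The day counts it derives (229, 276, 758, 350) match the intervals given in the message, and the per-interval rates can then be compared directly rather than read off the chart.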
