The individually numbered files change size radically because I'm
moving the start and end points around.  You can ignore that.

I did look at piece 10, however, to see why it's smaller.  Ah: I have a
typo in the size for that one.  I asked for only 200000 pages to go in it
instead of the 240000 I intended :-D  And so that's all that went in
(minus deleted pages).   Nothing's missing though; anything "extra"
winds up in the last piece (15).  You can look at the stub files to
verify that.
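For anyone following along, here is a rough Python sketch of the behavior
described above.  It assumes each numbered piece covers a contiguous range
of page IDs and that the last piece takes whatever is left over; the
function name and its parameters are hypothetical illustrations, not the
actual dump scripts.

```python
def split_pages(max_page_id, sizes):
    """Hypothetical sketch: assign contiguous page-ID ranges to pieces.

    sizes gives the requested page count for every piece except the last;
    the last piece takes all remaining page IDs up to max_page_id.
    """
    ranges = []
    start = 1
    for n in sizes:
        end = min(start + n - 1, max_page_id)
        ranges.append((start, end))
        start = end + 1
    # The last piece absorbs any "extra" pages, so nothing is dropped.
    ranges.append((start, max_page_id))
    return ranges


# Piece 3 asked for 80 pages instead of the intended 100, so the
# 20-page shortfall ends up in the last piece.
print(split_pages(400, [100, 100, 80]))
```

Because the final range always runs to max_page_id, a too-small size for
one piece only shifts pages later; it never loses them.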

FWIW we'll be juggling the number of pages per chunk on a regular basis.

Ariel

On Tue, 29-03-2011 at 17:08 -0700, Jamie Morken
wrote:
> Hi all,
> 
> Congrats Ariel! :)  The totals of the pages-meta-history files for the
> last two enwiki dumps are 342.7GB for the 20110115 dump and 353.5GB for
> the 20110317 dump, which shows that the overall dump size grew over those
> 2 months.  Seven of the individually numbered pages-meta-history files
> decreased in size while eight increased in size from 20110115 to
> 20110317.  By far the biggest decrease was the
> pages-meta-history10.xml.bz2 file, which dropped from 18.7GB to
> 1.9GB.  I think there are probably missing revisions in that page ID
> range.
> 
> Here are some historical dump sizes for comparison, to show the growth
> of these files:
> 
> enwiki-20060816-pages-meta-history.xml.7z 5.08GB
> enwiki-20070402-pages-meta-history.xml.7z 11.3GB (229 days since previous dump)
> enwiki-20080103-pages-meta-history.xml.7z 17.2GB (276 days since previous dump)
> enwiki-20100130-pages-meta-history.xml.7z 31.8GB (758 days since previous dump)
> enwiki-20110115-pages-meta-history[1-15].xml.7z 38.0GB (350 days since previous dump)
> enwiki-20110317-pages-meta-history[1-15].xml.7z (7z compression in progress)
> 
> Here's a graph of this data; the dump file size growth looks
> pretty linear
> (the chart's x-axis starts at the 20060816 dump and ends at the 20110115 dump):
> http://nekrom.com/wikipedia/enwiki%20history%20dump%20file%20size%20over%20time.png
> 
> cheers,
> Jamie
> 
> 
> ----- Original Message -----
> From: "Ariel T. Glenn" <[email protected]>
> Date: Tuesday, March 29, 2011 3:24 pm
> Subject: [Xmldatadumps-l] March 17 en wikipedia history bz2 files ready
> To: [email protected]
> Cc: [email protected]
> 
> > Well, that used up all my good luck for the year, but the bz2s are
> > ready for download.  The md5sums are still calculating; give them a
> > couple hours to show up.  If all continues to go well we'll have the
> > 7z files in 4-5 days.
> > 
> > As before, I do not plan to provide a single 350GB file of the bz2,
> > nor a single 7z file for download.
> > 
> > Happy trails,
> > 
> > Ariel
> > 
> > 
> > _______________________________________________
> > Xmldatadumps-l mailing list
> > [email protected]
> > https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l
> > 



_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
