----- Original Message -----
From: Brian J Mingus <[email protected]>
Date: Tuesday, March 29, 2011 7:15 pm
Subject: Re: [Wikitech-l] [Xmldatadumps-l] March 17 en wikipedia history bz2 files ready
To: Wikimedia developers <[email protected]>
Cc: Jamie Morken <[email protected]>, "Ariel T. Glenn" <[email protected]>, [email protected]

> According to this data the 7z dump for enwp will reach 1 terabyte on Jan 2,
> 2145.
>
> =)

Hi,

I made a graph of the uncompressed XML file size of the enwiki
pages-meta-history files over time. I thought these files would be growing
exponentially, but they appear to grow linearly. For comparison, in 2145 the
raw XML should be about 178 TB, I think, so the raw XML is growing linearly
about 180x faster than the 7z files.

"http://nekrom.com/wikipedia/enwiki%20history%20uncompressed%20XML%20dump%20file%20size%20over%20time.png"

(data below)

cheers,
Jamie

Full history dumps (GB here means 2^30 bytes):

File                                          Size (bytes)    Size (GB)  Days since previous dump
enwiki-20060816-pages-meta-history.xml         782741875000      728.99  -
enwiki-20070402-pages-meta-history.xml        1763048493749     1641.97  229
enwiki-20080103-pages-meta-history.xml        2807444044080     2614.64  276
enwiki-20100130-pages-meta-history.xml        5873134833455     5469.78  758
enwiki-20110115-pages-meta-history[1-15].xml  7218617857754     6722.86  350

Individual parts of the 20110115 dump (bytes):

enwiki-20110115-pages-meta-history1.xml   1 080 719 385 129
enwiki-20110115-pages-meta-history2.xml     677 956 948 289
enwiki-20110115-pages-meta-history3.xml     550 889 319 423
enwiki-20110115-pages-meta-history4.xml     447 001 611 247
enwiki-20110115-pages-meta-history5.xml     453 700 983 270
enwiki-20110115-pages-meta-history6.xml     540 208 590 115
enwiki-20110115-pages-meta-history7.xml     458 817 000 243
enwiki-20110115-pages-meta-history8.xml     649 710 293 818
enwiki-20110115-pages-meta-history9.xml     471 183 250 318
enwiki-20110115-pages-meta-history10.xml    406 115 459 739
enwiki-20110115-pages-meta-history11.xml    342 840 308 580
enwiki-20110115-pages-meta-history12.xml    310 507 626 798
enwiki-20110115-pages-meta-history13.xml    362 264 384 002
enwiki-20110115-pages-meta-history14.xml    269 988 897 698
enwiki-20110115-pages-meta-history15.xml    196 713 799 085

> --
> Brian Mingus
> Graduate student
> Computational Cognitive Neuroscience Lab
> University of Colorado at Boulder

_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
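
For anyone who wants to check the linear-growth claim and the 2145 extrapolation from the message above, here is a minimal sketch. It assumes an ordinary least-squares line through the five full-dump sizes listed in the message; the names DUMPS, fit_line and project are made up for this illustration and are not part of any dump tooling.

```python
from datetime import date

# Full-history dump sizes in bytes, copied from the listing in the message.
DUMPS = [
    (date(2006, 8, 16), 782741875000),
    (date(2007, 4, 2), 1763048493749),
    (date(2008, 1, 3), 2807444044080),
    (date(2010, 1, 30), 5873134833455),
    (date(2011, 1, 15), 7218617857754),
]


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept; also returns R^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


def project(target, dumps=DUMPS):
    """Extrapolate the fitted line to `target` (a datetime.date); returns bytes."""
    start = dumps[0][0]
    xs = [(d - start).days for d, _ in dumps]
    ys = [size for _, size in dumps]
    slope, intercept, _ = fit_line(xs, ys)
    return slope * (target - start).days + intercept


if __name__ == "__main__":
    start = DUMPS[0][0]
    xs = [(d - start).days for d, _ in DUMPS]
    ys = [size for _, size in DUMPS]
    slope, intercept, r2 = fit_line(xs, ys)
    # GB and TB are printed as 2^30 and 2^40 bytes, matching the listing.
    print(f"growth: {slope / 2**30:.2f} GB/day, R^2 = {r2:.4f}")
    print(f"projected size on 2145-01-02: {project(date(2145, 1, 2)) / 2**40:.0f} TB")
```

The projection depends on which data points are included and on whether a TB is taken as 10^12 or 2^40 bytes, so it will not necessarily reproduce the 178 TB figure exactly.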
