Jamie Morken wrote:
>
> Hi,
>
> Thanks for the info. While I was at it, I did some more checking of the
> history dump file sizes and compression ratios (as reported by 7-Zip 9.20):
>
> enwiki-20110115-pages-meta-history1.xml.7z   434.99x compression
> enwiki-20110115-pages-meta-history2.xml.7z   289.46x compression
> enwiki-20110115-pages-meta-history3.xml.7z   248.72x compression
> enwiki-20110115-pages-meta-history4.xml.7z   216.29x compression
> enwiki-20110115-pages-meta-history5.xml.7z   198.67x compression
> enwiki-20110115-pages-meta-history6.xml.7z   176.94x compression
> enwiki-20110115-pages-meta-history7.xml.7z   161.42x compression
> enwiki-20110115-pages-meta-history8.xml.7z   208.59x compression
> enwiki-20110115-pages-meta-history9.xml.7z   126.86x compression
> enwiki-20110115-pages-meta-history10.xml.7z  112.10x compression
> enwiki-20110115-pages-meta-history11.xml.7z  117.27x compression
> enwiki-20110115-pages-meta-history12.xml.7z  118.88x compression
> enwiki-20110115-pages-meta-history13.xml.7z  133.07x compression
> enwiki-20110115-pages-meta-history14.xml.7z  107.10x compression
> enwiki-20110115-pages-meta-history15.xml.7z   83.24x compression
>
> pages-meta-history1 has the oldest articles and also the most revisions,
> so it has the highest compression ratio (most revisions of established
> articles contain only minor changes).
> The pages-meta-history15 file contains the most recently created articles,
> which have the fewest revisions but tend to have greater relative changes
> compared to the overall article size, and thus it has the lowest 7z
> compression ratio.
>
> enwiki-20110115-pages-meta-history8.xml doesn't follow the pattern of
> decreasing compression ratios. Maybe it contains many bot-created articles?
> That's all I can report without actually looking inside these files! :)
>
> cheers,
> Jamie

_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
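
The "pattern of decreasing compression ratios" mentioned above can be checked mechanically. What follows is only a minimal Python sketch, not something from the original report: the ratios are copied from the list in the message, and the function name find_pattern_breaks is hypothetical. It flags every file whose ratio is higher than that of the file numbered just before it.

```python
# Compression ratios as reported in the quoted message,
# keyed by the pages-meta-history file number.
RATIOS = {
    1: 434.99, 2: 289.46, 3: 248.72, 4: 216.29, 5: 198.67,
    6: 176.94, 7: 161.42, 8: 208.59, 9: 126.86, 10: 112.10,
    11: 117.27, 12: 118.88, 13: 133.07, 14: 107.10, 15: 83.24,
}


def find_pattern_breaks(ratios):
    """Return (file_number, previous_ratio, ratio) for each file whose
    ratio is higher than the ratio of the preceding file.

    Hypothetical helper for illustration; "preceding" means the next
    lower file number present in the mapping.
    """
    breaks = []
    numbers = sorted(ratios)
    for prev, cur in zip(numbers, numbers[1:]):
        if ratios[cur] > ratios[prev]:
            breaks.append((cur, ratios[prev], ratios[cur]))
    return breaks


if __name__ == "__main__":
    for number, before, after in find_pattern_breaks(RATIOS):
        print(f"history{number}: {before:.2f}x -> {after:.2f}x "
              f"(+{after - before:.2f})")
```

With the figures from the message, this strict check flags history8 and also history11 through history13, although the increases at 11 to 13 are much smaller than the jump at history8. The check says nothing about why a file breaks the pattern; that would still require looking inside the dumps.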
