Jamie Morken wrote:
> 
> Hi,
> 
> Thanks for the info. While I was at it, I did some more checking of the 
> history dump file sizes and compression ratios (as reported by 7-Zip 9.20):
> 
> enwiki-20110115-pages-meta-history1.xml.7z 434.99x compression
> enwiki-20110115-pages-meta-history2.xml.7z 289.46x compression
> enwiki-20110115-pages-meta-history3.xml.7z 248.72x compression
> enwiki-20110115-pages-meta-history4.xml.7z 216.29x compression
> enwiki-20110115-pages-meta-history5.xml.7z 198.67x compression
> enwiki-20110115-pages-meta-history6.xml.7z 176.94x compression
> enwiki-20110115-pages-meta-history7.xml.7z 161.42x compression
> enwiki-20110115-pages-meta-history8.xml.7z 208.59x compression
> enwiki-20110115-pages-meta-history9.xml.7z 126.86x compression
> enwiki-20110115-pages-meta-history10.xml.7z 112.10x compression
> enwiki-20110115-pages-meta-history11.xml.7z 117.27x compression
> enwiki-20110115-pages-meta-history12.xml.7z 118.88x compression
> enwiki-20110115-pages-meta-history13.xml.7z 133.07x compression
> enwiki-20110115-pages-meta-history14.xml.7z 107.10x compression
> enwiki-20110115-pages-meta-history15.xml.7z 83.24x compression
> 
> pages-meta-history1 has the oldest articles and also the most revisions, 
> so it has the highest compression ratio (most revisions to established 
> articles make only minor changes).
> The pages-meta-history15 file contains the most recently created articles, 
> which have the fewest revisions but tend to have larger changes relative 
> to the overall article size, and thus it has the lowest 7z compression ratio.
>
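
The reasoning above (many near-identical revisions compress far better than a few heavily rewritten ones) can be checked with a small synthetic experiment. This is only a sketch under assumptions: the text is randomly generated, not taken from a real dump, the helper names are made up for illustration, and Python's lzma module stands in for 7-Zip (it uses the same family of algorithm, but the numbers will not match 7-Zip's).

```python
import lzma
import random

def compression_ratio(data: bytes) -> float:
    """Uncompressed size divided by LZMA-compressed size."""
    return len(data) / len(lzma.compress(data))

rng = random.Random(0)                      # fixed seed so runs are repeatable
vocab = [f"word{i}" for i in range(2000)]   # hypothetical vocabulary

def random_words(n: int) -> list:
    return [rng.choice(vocab) for _ in range(n)]

# "Old article": 100 revisions, each changing a single word of the previous one.
current = random_words(1000)
revisions = []
for _ in range(100):
    current[rng.randrange(len(current))] = rng.choice(vocab)
    revisions.append(" ".join(current))
many_minor = "\n".join(revisions).encode()

# "New article": 5 revisions, each rewritten from scratch.
few_major = "\n".join(" ".join(random_words(1000)) for _ in range(5)).encode()

print("many minor revisions:", round(compression_ratio(many_minor), 2))
print("few major revisions: ", round(compression_ratio(few_major), 2))
```

The first ratio should come out much higher than the second, which is the same effect described for history1 versus history15.
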
> enwiki-20110115-pages-meta-history8.xml doesn't follow the pattern of 
> decreasing compression ratios.

Maybe it contains many bot-created articles?
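
To see which files break the decreasing pattern, here is a quick sketch using only the ratios quoted above (the function name is made up for illustration):

```python
# Compression ratios as quoted above, keyed by pages-meta-history file number.
RATIOS = {
    1: 434.99, 2: 289.46, 3: 248.72, 4: 216.29, 5: 198.67,
    6: 176.94, 7: 161.42, 8: 208.59, 9: 126.86, 10: 112.10,
    11: 117.27, 12: 118.88, 13: 133.07, 14: 107.10, 15: 83.24,
}

def pattern_breakers(ratios):
    """Return the file numbers whose ratio is higher than the previous file's."""
    numbers = sorted(ratios)
    return [n for prev, n in zip(numbers, numbers[1:])
            if ratios[n] > ratios[prev]]

print(pattern_breakers(RATIOS))  # -> [8, 11, 12, 13]
```

history8 is the largest jump by far (161.42 to 208.59), but files 11 to 13 also rise slightly over their predecessors, so the decrease is not strictly monotonic elsewhere either.
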

> That's all I can report without actually looking inside these files! :)
> 
> cheers,
> Jamie


_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
