On Sat, Jan 10, 2009 at 9:14 AM, Keisial <[email protected]> wrote:
> bzipping the pages by blocks as I did for my offline reader produces a
> file size similar to the original*
> There may be ways to get similar results without having to rebuild the
> revisions.
> Also note that in both cases you still need an intermediate app to
> provide input dumps for those tools.
>
> *112% measuring enwiki-20081008-pages-meta-current. Looking at
> ruwiki-20081228-history, both the original bz2 and my faster-access one
> are 8.2G.

-history dumps and one-off page dumps are pretty distinct cases: the
history dumps have far more exploitable redundancy, since successive
revisions of a page are mostly identical.

For fast-access articles you might want to consider compressing
articles one-off with a dictionary-based pre-pass such as
http://xwrt.sourceforge.net/
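XWRT's actual transform (word replacement tuned for XML, plus several other tricks) is considerably more sophisticated, but the core idea of a dictionary pre-pass can be illustrated with a toy version: replace the most frequent words with short escape-coded tokens before handing the text to the compressor. Everything below (token format, dictionary size) is my own illustrative choice, not XWRT's format:

```python
import bz2
from collections import Counter

def build_dictionary(text, max_words=200):
    """Map the most frequent words to short tokens framed by control
    bytes (0x01/0x02) that cannot appear in normal text."""
    common = [w for w, _ in Counter(text.split()).most_common(max_words)]
    return {w: "\x01%d\x02" % i for i, w in enumerate(common)}

def pre_pass(text, dictionary):
    """Forward transform: frequent words become short tokens, so the
    entropy coder behind it sees a denser, more regular stream."""
    return " ".join(dictionary.get(w, w) for w in text.split())

def un_pass(text, dictionary):
    """Inverse transform: tokens back to the original words."""
    reverse = {t: w for w, t in dictionary.items()}
    return " ".join(reverse.get(w, w) for w in text.split())
```

A one-off reader would compress each article as `bz2.compress(pre_pass(article, d).encode())` with a shared dictionary `d` built once over the corpus, so single articles decompress independently while still benefiting from corpus-wide statistics.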

_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
