On Wed, Dec 15, 2010 at 12:01 PM, Andrew Dunbar <[email protected]> wrote:
> By the way I'm keen to find something similar for .7z

I've written something similar for .xz, which uses LZMA2 same as .7z.
It creates a virtual read-only filesystem using FUSE (the FUSE part is
in perl, which uses pipes to dd and xzcat).  The only real problem is that
it doesn't use a stock .xz file, it uses a specially created one which
concatenates lots of smaller .xz files (currently I concatenate
between 5 and 20 or so 900K bz2 blocks into one .xz stream - between 5
and 20 because there's a preference to split on </page><page>
boundaries).
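A rough sketch of the concatenated-stream idea in Python follows. It uses the standard lzma module in place of the perl/dd/xzcat pipeline and assumes the whole dump fits in memory as one bytes object. The uncompressed target size and the `</page>` boundary marker are illustrative assumptions (the setup described above groups 5 to 20 bz2 blocks instead), and both function names are hypothetical.

```python
import lzma

def build_multistream_xz(xml, target=900_000, boundary=b"</page>"):
    """Compress `xml` into back-to-back, independent .xz streams.

    Every stream except possibly the last covers more than `target`
    uncompressed bytes and ends just after a `boundary` marker, so no page
    is split across streams. Returns (blob, index), where index holds one
    (compressed_offset, compressed_length, uncompressed_offset) per stream.
    """
    blob = bytearray()
    index = []
    start = 0
    while start < len(xml):
        # First page boundary at or after the target size; else take the rest.
        cut = xml.find(boundary, start + target)
        end = len(xml) if cut == -1 else cut + len(boundary)
        stream = lzma.compress(xml[start:end])  # one complete .xz stream
        index.append((len(blob), len(stream), start))
        blob += stream
        start = end
    return bytes(blob), index

def read_stream(f, offset, length):
    """Decompress one stream by byte range, much like `dd ... | xzcat`."""
    f.seek(offset)
    return lzma.decompress(f.read(length))
```

Because each stream is complete on its own, a reader only needs the (offset, length) pair to pull out one chunk, while a plain xzcat (or `lzma.decompress`) still decompresses the whole file as a single document.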

Apparently the folks at openzim have done something similar, using LZMA2.

If anyone is interested in working with me to make a package capable
of being released to the public, I'd be willing to share my code.  But
it sounds like I'm just reinventing a wheel already invented by
openzim.

> It would be incredibly useful if these indices could be created as
> part of the dump creation process. Should I file a feature request?

With concatenated .xz files, creating the index is *much* faster,
because the .xz format records the sizes at the end of each stream
(in the stream's index and footer).
Plus, with .xz all streams end on 4-byte boundaries, whereas
with .bz2 blocks can end at any *bit* (which means you have to do
painful bit shifting to create the index).
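To illustrate why the .xz index is cheap to build, here is a Python sketch that walks a concatenated file backwards, based on my reading of the .xz file format spec: each stream has a 12-byte header and a 12-byte footer, the footer's Backward Size is stored as (index size / 4) - 1, the index lists each block's unpadded size, and blocks are padded to a multiple of 4 bytes. The function names are hypothetical, error handling is minimal, and for simplicity the whole file is held in memory.

```python
import struct

HEADER_MAGIC = b"\xfd7zXZ\x00"
FOOTER_MAGIC = b"YZ"

def _read_varint(buf, pos):
    """Decode an xz multibyte integer: 7 bits per byte, least significant first."""
    value, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7

def index_xz_streams(data):
    """Return (offset, length) for each stream in a concatenated .xz file.

    Walks from the end: footer -> index -> block sizes -> stream header,
    so the compressed block data itself is never decoded.
    """
    streams = []
    end = len(data)
    while end > 0:
        # Stream padding between streams is null bytes in multiples of 4.
        while end >= 4 and data[end - 4:end] == b"\x00" * 4:
            end -= 4
        if end == 0:
            break
        if end < 12 or data[end - 2:end] != FOOTER_MAGIC:
            raise ValueError("no .xz stream footer ending at offset %d" % end)
        # Footer layout: CRC32 (4), Backward Size (4), Stream Flags (2), "YZ" (2).
        stored = struct.unpack("<I", data[end - 8:end - 4])[0]
        index_start = end - 12 - (stored + 1) * 4
        if index_start < 0 or data[index_start] != 0x00:
            raise ValueError("bad .xz index")
        count, pos = _read_varint(data, index_start + 1)
        blocks_size = 0
        for _ in range(count):
            unpadded, pos = _read_varint(data, pos)
            _uncompressed, pos = _read_varint(data, pos)
            blocks_size += (unpadded + 3) & ~3  # round up to 4-byte boundary
        start = index_start - blocks_size - 12  # skip the 12-byte stream header
        if start < 0 or data[start:start + 6] != HEADER_MAGIC:
            raise ValueError("no .xz stream header at offset %d" % start)
        streams.append((start, end - start))
        end = start
    streams.reverse()
    return streams
```

A real tool would read only the tail of each stream from disk instead of loading the file, but the point stands: every step is byte-aligned arithmetic, with no bit shifting.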

The file is also *much* smaller, on the order of 5-10% of bzip2 for a
full history dump.

_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
