On Thu, Dec 16, 2010 at 12:47 AM, Andrew Dunbar <[email protected]> wrote:
> At the moment I'm interested in .bz2 and .7z because those are the
> formats WikiMedia currently publishes data in.

I'm fairly certain the specific 7z format which Wikimedia uses doesn't
allow for random access, because the dictionary is never reset.

> Have we made the case for this format to the WikiMedia people?

No, there's no off-the-shelf tool to create these files - the standard
.xz file created by xz utils puts everything in one stream, which is
basically equivalent to the .7z files already being made.  I'm sure
"patches are welcome", but I don't have the time to create the patch.

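To make the idea concrete, here is a minimal sketch of a seekable layout, assuming the format under discussion is simply a series of independently compressed .xz streams concatenated into one file. It uses Python's standard lzma module rather than xz utils, and the names compress_chunks and read_chunk, as well as the in-memory offset index, are hypothetical and only for illustration.

```python
import lzma


def compress_chunks(chunks):
    """Compress each chunk as its own .xz stream and concatenate them.

    Returns the combined bytes plus an index of (offset, length) pairs,
    one per chunk, so a reader can jump straight to any stream.
    """
    parts = []
    index = []
    offset = 0
    for chunk in chunks:
        stream = lzma.compress(chunk, format=lzma.FORMAT_XZ)
        index.append((offset, len(stream)))
        parts.append(stream)
        offset += len(stream)
    return b"".join(parts), index


def read_chunk(blob, index, i):
    """Decompress only the i-th stream, without touching the others."""
    offset, length = index[i]
    return lzma.decompress(blob[offset:offset + length], format=lzma.FORMAT_XZ)


# Three independent chunks, e.g. three groups of pages from a dump.
chunks = [b"alpha " * 100, b"beta " * 100, b"gamma " * 100]
blob, index = compress_chunks(chunks)
middle = read_chunk(blob, index, 1)
```

Because each stream starts with a fresh dictionary, the result should compress somewhat worse than a single stream, but the concatenation is still a valid .xz file that a normal decompressor reads from start to finish.
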
> How is .xz for compression times?

At the default settings, it's quite slow.  I believe it's pretty much
the same as 7zip with its default settings.  The main reason I was
using xz instead of 7zip is that xz handles pipes better -
specifically, 7zip doesn't allow you to pipe from stdin to stdout.
(See https://bugs.launchpad.net/ubuntu/+source/p7zip/+bug/383667 and
the response - "You should use lzma." - well, lzma utils has been
replaced by xz utils.)
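
For what it's worth, the stdin-to-stdout behaviour I mean can be sketched in a few lines. This is only an illustration using Python's standard lzma module, not the xz command-line tool itself; the function name xz_filter and the 64 KiB buffer size are my own choices.

```python
import io
import lzma


def xz_filter(src, dst, bufsize=64 * 1024):
    """Read raw bytes from src and write a single .xz stream to dst.

    Works on any binary file-like objects, so it never needs to seek
    and can sit in the middle of a pipeline.
    """
    comp = lzma.LZMACompressor(format=lzma.FORMAT_XZ)
    while True:
        block = src.read(bufsize)
        if not block:
            break
        dst.write(comp.compress(block))
    # flush() emits whatever is still buffered plus the stream footer.
    dst.write(comp.flush())


# In-memory streams standing in for stdin and stdout.
source = io.BytesIO(b"example text\n" * 1000)
sink = io.BytesIO()
xz_filter(source, sink)
```

In a real pipeline the same function could be called as xz_filter(sys.stdin.buffer, sys.stdout.buffer).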

For decompression, .xz is generally faster than .bz2 but slower than .gz.
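
If you want to check this on your own data, a rough timing harness along these lines should do. It is a sketch using Python's standard gzip, bz2 and lzma modules at their default settings, so the absolute numbers will differ from the command-line tools; time_decompression is a hypothetical helper name.

```python
import bz2
import gzip
import lzma
import time


def time_decompression(data, repeats=3):
    """Return the best-of-N decompression time in seconds per format."""
    codecs = {
        ".gz": (gzip.compress, gzip.decompress),
        ".bz2": (bz2.compress, bz2.decompress),
        ".xz": (lzma.compress, lzma.decompress),
    }
    timings = {}
    for name, (compress, decompress) in codecs.items():
        packed = compress(data)
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            unpacked = decompress(packed)
            best = min(best, time.perf_counter() - start)
        # Sanity check: the round trip must reproduce the input.
        assert unpacked == data
        timings[name] = best
    return timings


timings = time_decompression(b"some wiki text to compress " * 5000)
```

The input here is highly repetitive filler, so for a meaningful comparison pass in a slice of an actual dump instead.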

> Would we have to worry about patent issues for LZMA?

No, it uses LZMA2.

_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
