On Thu, Dec 16, 2010 at 12:47 AM, Andrew Dunbar <[email protected]> wrote:
> At the moment I'm interested in .bz2 and .7z because those are the
> formats WikiMedia currently publishes data in.

I'm fairly certain the specific 7z format which Wikimedia uses doesn't
allow for random access, because the dictionary is never reset.

> Have we made the case for this format to the WikiMedia people?

No. There's no off-the-shelf tool to create these files: the standard
.xz file created by xz utils puts everything in one stream, which is
basically equivalent to the .7z files already being made. I'm sure
"patches are welcome", but I don't have the time to create the patch.

> How is .xz for compression times?

At the default settings, it's quite slow. I believe it's pretty much
the same as 7zip with its default settings. The main reason I was using
xz instead of 7zip is that xz handles pipes better. Specifically, 7zip
doesn't allow you to pipe from stdin to stdout. (See
https://bugs.launchpad.net/ubuntu/+source/p7zip/+bug/383667 and the
response, "You should use lzma." Well, lzma utils has been replaced by
xz utils.)

For decompression, .xz is generally faster than .bz2 and slower than
.gz.

> Would we have to worry about patent issues for LZMA?

No, it uses LZMA2.
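
To illustrate the random-access point: below is a minimal sketch,
assuming each chunk of a dump is compressed as its own .xz stream, the
streams are concatenated into one file, and a small offset index is kept
alongside. It uses Python's standard lzma module rather than any existing
Wikimedia tooling, and compress_chunks, read_chunk, and the index layout
are hypothetical names chosen only for this example.

```python
import lzma


def compress_chunks(chunks):
    """Compress each chunk as an independent .xz stream.

    Returns the concatenated streams plus an index of
    (offset, compressed_length) pairs, one per chunk.
    """
    blob = bytearray()
    index = []
    for chunk in chunks:
        stream = lzma.compress(chunk, format=lzma.FORMAT_XZ)
        index.append((len(blob), len(stream)))
        blob += stream
    return bytes(blob), index


def read_chunk(blob, index, i):
    """Decompress only chunk i, without touching the other streams."""
    offset, length = index[i]
    return lzma.decompress(blob[offset:offset + length],
                           format=lzma.FORMAT_XZ)


# Hypothetical sample data standing in for pieces of a dump.
chunks = [
    b"<page>one</page>\n" * 100,
    b"<page>two</page>\n" * 100,
    b"<page>three</page>\n" * 100,
]
blob, index = compress_chunks(chunks)

# Random access: fetch the middle chunk on its own.
assert read_chunk(blob, index, 1) == chunks[1]

# The concatenation is still decodable end to end as ordinary .xz data.
assert lzma.decompress(blob) == b"".join(chunks)
```

Because every stream starts with a fresh dictionary, any one chunk can be
decompressed in isolation. The price is a somewhat worse compression
ratio than a single stream, since matches can't reach back across chunk
boundaries.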
