Hello Chris,

2009/12/10 Christopher Smith <[email protected]>:
> One compression algo that I thought would be particularly useful with PB's
> would be LZO. It lines up nicely with PB's goals of being fast and compact.
> Have you thought about allowing an integrated LZO stream?
>
> --Chris
My goal is to compress huge amounts (>5 GB) of small serialized chunks (~150...500 bytes) into a single stream, and still be able to randomly access each part of it without having to decompress the whole stream. GzipOutputStream (with level 5) reduces the size to about 40% compared to the uncompressed binary stream, whereas my LzipOutputStream (with level 5) reduces it to about 20%. The difficulty with gzip is finding synchronizing boundaries in the stream during decompression (the first sketch in the P.S. below shows how I write these chunks today).

If your aim is to exchange small messages, say via RPC, then a fast but less efficient algorithm is the right choice. If, however, you want to store huge amounts of data permanently, your requirements may be different. In my opinion, generic streaming classes such as ZeroCopyIn/OutputStream should offer different compression algorithms for different purposes. LZO has advantages when used for communication of small to medium-sized chunks of data. LZMA, on the other hand, has advantages if you have to store lots of data long term. GZIP is somewhere in the middle. Unfortunately, Kenton has a different opinion about adding too many compression streaming classes.

Today I studied the API of LZO. From what I have seen, I think one could implement two LzoIn/OutputStream classes. LZO compression, however, has a small drawback; let me explain why. The LZO API is not intended to be used for streams; instead, it always compresses and decompresses a whole block at once. This is different behaviour from gzip and lzip, which are designed to compress streams. A compression class has a fixed-size buffer of typically 8 or 64 kB. When this buffer is filled with data, lzip and gzip digest the input and you can start filling the buffer from the beginning again. The LZO compressor, on the other hand, has to compress the whole buffer in one step. The next block then has to be concatenated with the already compressed data, which means that during decompression you have to fiddle these chunks apart again (see the second and third sketches in the P.S. below).

If your intention is to compress chunks of data of, say, less than 64 kB each and then put them on the wire, then LZO is the right solution for you. For my requirements, as you will now understand, LZO does not fit well.

If there is a strong interest in an alternative Protocol Buffer compression stream, don't hesitate to contact me.

Jacob
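
P.S.: To make the above more concrete, here are three small sketches. First, this is roughly how I write the small chunks into one gzip stream today: each message gets a varint length prefix, because serialized messages are not self-delimiting. MyMessage is just a placeholder for a generated message type, and I am assuming the GzipOutputStream::Options struct with its compression_level field; take this as an illustration, not a finished interface. Reading chunk N back still means decompressing everything before it, which is exactly the random-access problem described above.

  #include <vector>
  #include <google/protobuf/io/coded_stream.h>
  #include <google/protobuf/io/gzip_stream.h>
  #include <google/protobuf/io/zero_copy_stream_impl.h>

  using namespace google::protobuf::io;

  // Writes each message with a varint length prefix into a single
  // gzip stream.  MyMessage is a placeholder for any generated type.
  void WriteChunks(const std::vector<MyMessage>& chunks, int fd) {
    FileOutputStream file(fd);
    GzipOutputStream::Options options;
    options.compression_level = 5;          // the level quoted above
    GzipOutputStream gzip(&file, options);
    CodedOutputStream coded(&gzip);
    for (size_t i = 0; i < chunks.size(); ++i) {
      coded.WriteVarint32(chunks[i].ByteSize());
      chunks[i].SerializeToCodedStream(&coded);
    }
  }  // destructors flush coded, gzip and file, in that order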
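
Second, here is roughly what the block-oriented LZO API would force an LzoOutputStream to do: compress one complete buffer at a time and frame it yourself, because LZO defines no stream or container format of its own. The 4-byte little-endian length prefix is my own invention for this sketch, and lzo_init() must have been called once beforehand.

  #include <lzo/lzo1x.h>

  // Compresses one complete block and writes [4-byte length][payload].
  bool CompressBlock(lzo_bytep in, lzo_uint in_len,
                     lzo_bytep out, lzo_uintp out_len) {
    // Aligned work memory, declared the way the LZO examples do it.
    // One shared buffer -- not thread-safe, but fine for a sketch.
    static lzo_align_t wrkmem[(LZO1X_1_MEM_COMPRESS +
                               sizeof(lzo_align_t) - 1) /
                              sizeof(lzo_align_t)];
    // "out" must hold at least 4 + in_len + in_len/16 + 64 + 3 bytes
    // (LZO's documented worst case).
    lzo_uint compressed_len;
    if (lzo1x_1_compress(in, in_len, out + 4, &compressed_len, wrkmem)
        != LZO_E_OK)
      return false;
    out[0] = (unsigned char)( compressed_len        & 0xff);
    out[1] = (unsigned char)((compressed_len >>  8) & 0xff);
    out[2] = (unsigned char)((compressed_len >> 16) & 0xff);
    out[3] = (unsigned char)((compressed_len >> 24) & 0xff);
    *out_len = compressed_len + 4;
    return true;
  }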
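
Third, the reading side, which is the "fiddling apart" I mentioned: each frame has to be located via its length prefix and decompressed as a whole block.

  #include <lzo/lzo1x.h>

  // Reads one length-prefixed block back and decompresses it in one
  // step.  On entry *out_len must hold the capacity of "out"; the
  // safe decompressor uses it as the output bound.
  bool DecompressBlock(const lzo_bytep in, lzo_uint in_avail,
                       lzo_bytep out, lzo_uintp out_len) {
    if (in_avail < 4)
      return false;
    lzo_uint compressed_len =  (lzo_uint)in[0]
                            | ((lzo_uint)in[1] <<  8)
                            | ((lzo_uint)in[2] << 16)
                            | ((lzo_uint)in[3] << 24);
    if (in_avail < compressed_len + 4)
      return false;
    return lzo1x_decompress_safe(in + 4, compressed_len,
                                 out, out_len, NULL) == LZO_E_OK;
  }

With framing like this you can at least skip whole blocks without decompressing them, but every block is compressed independently, so the ratio on small chunks suffers compared to one long gzip or lzip stream.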
