Hello Chris,

2009/12/10 Christopher Smith <[email protected]>:
> One compression algo that I thought would be particularly useful with PB's
> would be LZO. It lines up nicely with PB's goals of being fast and compact.
> Have you thought about allowing an integrated LZO stream?
>
> --Chris
My goal is to compress huge amounts (>5 GB) of small serialized chunks (~150...500 bytes) into a single stream, and still be able to randomly access each part of it without having to decompress the whole stream. GzipOutputStream (with level 5) reduces the size to about 40% compared to the uncompressed binary stream, whereas my LzipOutputStream (with level 5) reduces it to about 20%. The difficulty with gzip is finding synchronizing boundaries in the stream during decompression (the first sketch in the P.S. below shows how I write these chunks today).

If your aim is to exchange small messages, say via RPC, then a fast but less efficient algorithm is the right choice. If, however, you want to store huge amounts of data permanently, your requirements may be different. In my opinion, generic streaming classes such as ZeroCopyIn/OutputStream should offer different compression algorithms for different purposes. LZO has advantages when used for communication of small to medium-sized chunks of data. LZMA, on the other hand, has advantages if you have to store lots of data long term. GZIP is somewhere in the middle. Unfortunately, Kenton has a different opinion about adding too many compression streaming classes.

Today I studied the API of LZO. From what I have seen, I think one could implement two LzoIn/OutputStream classes. LZO compression, however, has a small drawback; let me explain why. The LZO API is not intended to be used for streams; instead, it always compresses and decompresses a whole block at once. This is different behaviour from gzip and lzip, which are designed to compress streams. A compression class has a fixed-size buffer of typically 8 or 64 kB. When this buffer is filled with data, lzip and gzip digest the input and you can start filling the buffer from the beginning again. The LZO compressor, on the other hand, has to compress the whole buffer in one step. The next block then has to be concatenated with the already compressed data, which means that during decompression you have to fiddle these chunks apart again (see the second and third sketches in the P.S. below).

If your intention is to compress chunks of data of, say, less than 64 kB each and then put them on the wire, then LZO is the right solution for you. For my requirements, as you will now understand, LZO does not fit well.

If there is a strong interest in an alternative Protocol Buffer compression stream, don't hesitate to contact me.

Jacob
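
P.S.: To make the above more concrete, here are three small sketches. First, this is roughly how I write the small chunks into one gzip stream today: each message gets a varint length prefix, because serialized messages are not self-delimiting. MyMessage is just a placeholder for a generated message type, and I am assuming the GzipOutputStream::Options struct with its compression_level field; take this as an illustration, not a finished interface. Reading chunk N back still means decompressing everything before it, which is exactly the random-access problem described above.

  #include <vector>
  #include <google/protobuf/io/coded_stream.h>
  #include <google/protobuf/io/gzip_stream.h>
  #include <google/protobuf/io/zero_copy_stream_impl.h>

  using namespace google::protobuf::io;

  // Writes each message with a varint length prefix into a single
  // gzip stream.  MyMessage is a placeholder for any generated type.
  void WriteChunks(const std::vector<MyMessage>& chunks, int fd) {
    FileOutputStream file(fd);
    GzipOutputStream::Options options;
    options.compression_level = 5;          // the level quoted above
    GzipOutputStream gzip(&file, options);
    CodedOutputStream coded(&gzip);
    for (size_t i = 0; i < chunks.size(); ++i) {
      coded.WriteVarint32(chunks[i].ByteSize());
      chunks[i].SerializeToCodedStream(&coded);
    }
  }  // destructors flush coded, gzip and file, in that order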
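
Second, here is roughly what the block-oriented LZO API would force an LzoOutputStream to do: compress one complete buffer at a time and frame it yourself, because LZO defines no stream or container format of its own. The 4-byte little-endian length prefix is my own invention for this sketch, and lzo_init() must have been called once beforehand.

  #include <lzo/lzo1x.h>

  // Compresses one complete block and writes [4-byte length][payload].
  bool CompressBlock(lzo_bytep in, lzo_uint in_len,
                     lzo_bytep out, lzo_uintp out_len) {
    // Aligned work memory, declared the way the LZO examples do it.
    // One shared buffer -- not thread-safe, but fine for a sketch.
    static lzo_align_t wrkmem[(LZO1X_1_MEM_COMPRESS +
                               sizeof(lzo_align_t) - 1) /
                              sizeof(lzo_align_t)];
    // "out" must hold at least 4 + in_len + in_len/16 + 64 + 3 bytes
    // (LZO's documented worst case).
    lzo_uint compressed_len;
    if (lzo1x_1_compress(in, in_len, out + 4, &compressed_len, wrkmem)
        != LZO_E_OK)
      return false;
    out[0] = (unsigned char)( compressed_len        & 0xff);
    out[1] = (unsigned char)((compressed_len >>  8) & 0xff);
    out[2] = (unsigned char)((compressed_len >> 16) & 0xff);
    out[3] = (unsigned char)((compressed_len >> 24) & 0xff);
    *out_len = compressed_len + 4;
    return true;
  }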
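
Third, the reading side, which is the "fiddling apart" I mentioned: each frame has to be located via its length prefix and decompressed as a whole block.

  #include <lzo/lzo1x.h>

  // Reads one length-prefixed block back and decompresses it in one
  // step.  On entry *out_len must hold the capacity of "out"; the
  // safe decompressor uses it as the output bound.
  bool DecompressBlock(const lzo_bytep in, lzo_uint in_avail,
                       lzo_bytep out, lzo_uintp out_len) {
    if (in_avail < 4)
      return false;
    lzo_uint compressed_len =  (lzo_uint)in[0]
                            | ((lzo_uint)in[1] <<  8)
                            | ((lzo_uint)in[2] << 16)
                            | ((lzo_uint)in[3] << 24);
    if (in_avail < compressed_len + 4)
      return false;
    return lzo1x_decompress_safe(in + 4, compressed_len,
                                 out, out_len, NULL) == LZO_E_OK;
  }

With framing like this you can at least skip whole blocks without decompressing them, but every block is compressed independently, so the ratio on small chunks suffers compared to one long gzip or lzip stream.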
