In article <[EMAIL PROTECTED]>, Bennett Todd  <[EMAIL PROTECTED]> wrote:

Yeah, yeah, yeah, I'm replying to month-old emails again... ;-)

>2000-08-08-11:33:04 François Pinard:
>> [...] I'm using a modem link [...]
>> I'm facing compression:
>> * at the PPP level,
>> * at the ssh level,
>> * at the rsync level.
>>[...]
>> Adding rsync compression in such a context does not seem at all
>> useful to me.

>I'd expect that rsync would be able to do a better job of
>compression, even if it's applying the same basic compression
>algorithm, just because it knows more about the structure of the
>data and so can apply it more wisely. And the rsync man page
>concurs:
>
>       Note that this option typically achieves better compression
>       ratios than can be achieved by using a compressing remote
>       shell, or a compressing transport, as it takes advantage of
>       the implicit information sent for matching data blocks.

rsync should have a *huge* compression performance advantage compared
to compression at lower-level protocols.  The reason: rsync has access
to known data on the receiving side that it can use to pre-load a
decompression dictionary.

For example, say you have some source code in a 100K text file that can
be compressed by 90% with gzip, and you change 1000 bytes near the end of
the file.  rsync sends (approximately) a 1000-byte delta block.  If you
compressed that delta block with ssh or PPP, the compression routine would
only be able to use the data within the block for compression--typically
you get much less than 90% compression on such a small amount of data,
more like 50-75%.

On the other hand, rsync can say "OK, pretend I just sent you a compressed
version of the last 32K of the file before this block.  I know you can do
this because you just sent me the signatures for this data, so I know you
have access to it, and I don't have to send it to you.  Now here's what
zlib would have output for the following 1000 bytes."  That's _much_
smaller; in fact, it may be only tens of bytes if the delta block is
mostly a repeated copy of non-block-aligned data somewhere else in the
file--something rsync would miss but gzip would catch.
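
Here is a rough sketch of that idea in Python, for anyone who wants to see
the effect in numbers.  It uses zlib's preset-dictionary feature as a
stand-in for what rsync does; I'm not claiming this is how rsync primes
its compressor internally.  The helpers make_text(), deflate() and
inflate() are made-up names for this example, and the "source file" is
synthetic pseudo-text, so real files will give different numbers.

```python
import random
import zlib


def make_text(n_bytes, seed=0):
    """Build deterministic, compressible pseudo-text (a stand-in for source code)."""
    rng = random.Random(seed)
    letters = "abcdefghijklmnopqrstuvwxyz"
    vocab = ["".join(rng.choice(letters) for _ in range(rng.randint(3, 9)))
             for _ in range(300)]
    lines = []
    total = 0
    while total < n_bytes:
        line = " ".join(rng.choice(vocab) for _ in range(rng.randint(4, 10))) + "\n"
        lines.append(line)
        total += len(line)
    return "".join(lines).encode("ascii")[:n_bytes]


def deflate(data, zdict=None):
    """Compress with zlib, optionally priming it with a preset dictionary."""
    if zdict is None:
        comp = zlib.compressobj(level=9)
    else:
        comp = zlib.compressobj(level=9, zdict=zdict)
    return comp.compress(data) + comp.flush()


def inflate(blob, zdict):
    """Decompress a stream that was produced with the same preset dictionary."""
    decomp = zlib.decompressobj(zdict=zdict)
    return decomp.decompress(blob) + decomp.flush()


text = make_text(120_000)
delta = text[-1000:]                      # the 1000 bytes that "changed"
shared = text[-1000 - 32 * 1024:-1000]    # 32K before the delta; both ends have it

alone = deflate(delta)                    # what ssh/PPP compression gets to work with
primed = deflate(delta, zdict=shared)     # compressor pre-loaded with shared data

print("delta alone: ", len(alone), "bytes")
print("delta primed:", len(primed), "bytes")
```

The receiver has to pass the same 32K to inflate() to get the delta back,
which is exactly the "I know you already have this data" assumption above.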

To verify this, try extracting the last 1000 bytes from a text file larger
than 100K, and compress each of the following with gzip (which uses the
same compression engine as zlib, which rsync uses):

        1.  the last 1000 bytes alone
        2.  the original text file with the last 1000 bytes removed
        3.  the entire original text file.

The difference between the sizes of 2 and 3 should be ~50% smaller than
the size of 1, even if you take constant-size gzip headers into account.
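
If you'd rather script the experiment, something like the following Python
sketch should do.  It assumes Python's gzip module is an acceptable
substitute for the gzip command (both produce gzip-format deflate output),
and it runs on synthetic pseudo-text from a made-up make_text() helper
rather than on a real source file, so treat the numbers as illustrative
only.

```python
import gzip
import random


def make_text(n_bytes, seed=0):
    """Build deterministic, compressible pseudo-text (a stand-in for a text file)."""
    rng = random.Random(seed)
    letters = "abcdefghijklmnopqrstuvwxyz"
    vocab = ["".join(rng.choice(letters) for _ in range(rng.randint(3, 9)))
             for _ in range(300)]
    lines = []
    total = 0
    while total < n_bytes:
        line = " ".join(rng.choice(vocab) for _ in range(rng.randint(4, 10))) + "\n"
        lines.append(line)
        total += len(line)
    return "".join(lines).encode("ascii")[:n_bytes]


def gzip_size(data):
    """Size in bytes of the gzip output, header and trailer included."""
    return len(gzip.compress(data, compresslevel=9))


text = make_text(120_000)
tail, head = text[-1000:], text[:-1000]

size_tail = gzip_size(tail)     # 1. the last 1000 bytes alone
size_head = gzip_size(head)     # 2. the file with the last 1000 bytes removed
size_whole = gzip_size(text)    # 3. the entire file

# What the last 1000 bytes cost when the compressor has already seen the rest.
marginal = size_whole - size_head

print("1. tail alone:       ", size_tail)
print("2. head:             ", size_head)
print("3. whole file:       ", size_whole)
print("marginal cost (3-2): ", marginal)
```

To try it on a real file, replace the make_text() call with
open(path, "rb").read() for some text file of your own.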

Of course all this assumes compressible data and small, sparsely
distributed changes within large files, blah blah blah...if you're
rsync'ing a bunch of mp3 files, attempting compression at any level
between PPP and rsync is futile.

