On Wed Nov  7 15:02:57 2007, Michal 'vorner' Vaner wrote:
Can't compression solve this? Does anyone know how base64-encoded data grows or shrinks when it is put through zlib? It would be nice to know
how far it is worth going with the blob transfers & modifications to
the protocol.

I've been accused - on this list - of treating compression as a panacea. But it's not a substitute for efficiency. The overhead of base64 encoding is recovered to a degree by a good minimal redundancy algorithm, but base64 tends to shield patterns from a dictionary algorithm. DEFLATE applies a Lempel-Ziv dictionary algorithm first, then Huffman coding, a minimal redundancy algorithm.
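To illustrate how base64 can hide repeats from a dictionary matcher, here is a minimal Python sketch. Base64 maps each group of 3 input bytes to 4 output characters, so the same byte string encodes differently depending on its offset modulo 3. The sample token and the helper name `encode_with_prefix` are made up for the illustration.

```python
import base64

WORD = b"connection"  # arbitrary sample token (an assumption, not from the test file)


def encode_with_prefix(prefix_len: int) -> str:
    """Base64-encode WORD preceded by prefix_len filler bytes."""
    return base64.b64encode(b"x" * prefix_len + WORD).decode("ascii")


# Offsets 0, 1 and 2 give three unrelated-looking character sequences for
# the same underlying bytes; offset 3 realigns with offset 0.
for offset in range(4):
    print(offset, encode_with_prefix(offset))
```

A Lempel-Ziv matcher working on the encoded text sees a repeat only when both occurrences happen to share the same alignment, which is one plausible reason the dictionary stage does worse on base64 input.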

Luckily, practice is easier than theory. Grab some suitable data, compress it, base64+compress it, and compare all the sizes. Gzip is a useful tool to do this - the results aren't 100% accurate due to gzip overhead, but are close to the zlib compression we use in the application layer of XMPP, and are pretty close to DEFLATE (as we should be using, and as TLS uses).

I took a C source file, and found this:

-rwxr-xr-x 1 dwd dwd  36K 2007-11-07 15:43 connection.c
The original file. (100%)
-rw-r--r-- 1 dwd dwd  49K 2007-11-07 15:44 connection.c.b64
Base64 encoded, traditionally, with newlines. (135%)
-rw-r--r-- 1 dwd dwd  15K 2007-11-07 15:44 connection.c.b64.gz
Base64, then gzipped. (40%)
-rw-r--r-- 1 dwd dwd 8.1K 2007-11-07 15:44 connection.c.gz
Just gzipped. Note it's nearly half the size of the base64-then-gzipped file. We'll use this as an incompressible object. (22% of the original / 100% as the new baseline)
-rw-r--r-- 1 dwd dwd  11K 2007-11-07 15:45 connection.c.gz.b64
Gzipped, then base64. (30% / 135%)
-rw-r--r-- 1 dwd dwd 8.4K 2007-11-07 15:45 connection.c.gz.b64.gz
Now gzip it again. In principle, this should have recovered the base64 overhead, but note that it hasn't quite. (23% / 103%)
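The comparison above can be repeated on any file with a short Python sketch along these lines. It assumes Python's `gzip` and `base64` modules are an acceptable stand-in for the command-line tools; `measure` and `report` are hypothetical helper names, and exact byte counts will differ slightly from the listing because of header and compression-level differences.

```python
import base64
import gzip


def measure(data: bytes) -> dict[str, int]:
    """Return the size in bytes of each encoding/compression stage."""
    b64 = base64.encodebytes(data)      # traditional base64, newline every 76 chars
    gz = gzip.compress(data)
    gz_b64 = base64.encodebytes(gz)
    return {
        "raw": len(data),
        "b64": len(b64),
        "b64.gz": len(gzip.compress(b64)),
        "gz": len(gz),
        "gz.b64": len(gz_b64),
        "gz.b64.gz": len(gzip.compress(gz_b64)),
    }


def report(data: bytes) -> None:
    """Print each stage's size and its percentage of the raw size."""
    sizes = measure(data)
    raw = max(sizes["raw"], 1)          # avoid dividing by zero on empty input
    for name, size in sizes.items():
        print(f"{name:10} {size:8d} bytes  {100 * size / raw:6.1f}% of raw")
```

For example, `report(open("connection.c", "rb").read())` prints one line per stage for that file.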

This suggests to me that not only does gzip not recover the base64 encoding fully - although close - but base64 encoding prior to compression really hurts the compressor.

Note that compressing first, then base64 encoding, then compressing *again* actually gave better results than base64 *then* compressing, meaning that almost every file transfer we do under base64 should be compressed first.
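As a rough sketch of "compress first, then base64", assuming zlib for the compression step: the helper names `pack` and `unpack` are hypothetical and are not part of any XMPP library or specification.

```python
import base64
import zlib


def pack(data: bytes) -> str:
    """Compress the payload first, then base64-encode it for a text transport."""
    return base64.b64encode(zlib.compress(data)).decode("ascii")


def unpack(text: str) -> bytes:
    """Reverse pack(): base64-decode, then decompress."""
    return zlib.decompress(base64.b64decode(text))
```

The receiver has to know the payload was compressed before encoding, so a real protocol would need to signal that somehow; that signalling is outside this sketch.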

Dave.
--
Dave Cridland - mailto:[EMAIL PROTECTED] - xmpp:[EMAIL PROTECTED]
 - acap://acap.dave.cridland.net/byowner/user/dwd/bookmarks/
 - http://dave.cridland.net/
Infotrope Polymer - ACAP, IMAP, ESMTP, and Lemonade
