On Wed Nov 7 15:02:57 2007, Michal 'vorner' Vaner wrote:
> Can't compression solve this? Does anyone know how the base64-encoded
> data grow/shrink if they are put through zlib? Would be nice to know
> how far it is worth going with the blob transfers & modifications to
> the protocol.
I've been accused - on this list - of treating compression as a
panacea, but it's not a substitute for efficiency. The overhead of
base64 encoding is recovered to a degree by a good minimal redundancy
algorithm, but base64 tends to shield patterns from a dictionary
algorithm. DEFLATE uses a Lempel-Ziv dictionary algorithm (LZ77) first,
then Huffman coding, a minimal redundancy algorithm.
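
As a rough illustration of the shielding effect: base64 maps each group of 3 input bytes to 4 output characters, so the same byte string encodes differently depending on where it falls relative to a 3-byte boundary. The sketch below is Python, and the helper name `encodings_at_offsets` is made up for this example. It encodes one word at several offsets:

```python
import base64


def encodings_at_offsets(word: bytes, count: int) -> list:
    """Base64-encode `word` preceded by 0..count-1 filler bytes.

    Hypothetical helper for illustration: the filler shifts the word's
    alignment relative to base64's 3-byte input groups.
    """
    return [
        base64.b64encode(b"x" * pad + word).decode("ascii")
        for pad in range(count)
    ]


if __name__ == "__main__":
    for pad, text in enumerate(encodings_at_offsets(b"hello", 4)):
        # Offsets 0 and 3 share an alignment; offsets 1 and 2 do not.
        print(pad, text)
```

A repeated string in the input therefore shows up in up to three different forms in the base64 text, which gives a dictionary coder fewer long matches to find.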
Luckily, practice is easier than theory. Grab some suitable data,
compress it, base64+compress it, and compare all the sizes. Gzip is a
useful tool for this - the results aren't 100% accurate due to gzip's
header and trailer overhead, but they are close to the zlib compression
we use in the application layer of XMPP, and pretty close to raw
DEFLATE (as we should be using, and as TLS uses).
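
For anyone who wants to repeat the comparison without the command-line tools, here is a minimal Python sketch using the standard `zlib` and `base64` modules. It uses zlib framing rather than gzip, so absolute sizes will differ slightly from gzip's. The names `size_report` and `synthetic_source` are hypothetical, and the synthetic input is only a stand-in for a real source file:

```python
import base64
import zlib


def size_report(raw: bytes) -> dict:
    """Return the byte size of each encode/compress combination."""
    b64 = base64.encodebytes(raw)      # traditional base64, with newlines
    gz = zlib.compress(raw, 9)         # zlib-framed DEFLATE, not gzip
    gz_b64 = base64.encodebytes(gz)
    return {
        "raw": len(raw),
        "b64": len(b64),
        "b64.gz": len(zlib.compress(b64, 9)),
        "gz": len(gz),
        "gz.b64": len(gz_b64),
        "gz.b64.gz": len(zlib.compress(gz_b64, 9)),
    }


def synthetic_source(functions: int = 500) -> bytes:
    """Generate repetitive C-like text as a stand-in for a real file."""
    return b"".join(
        b"int f%d(int x) { return x + %d; }\n" % (i, i)
        for i in range(functions)
    )


if __name__ == "__main__":
    data = synthetic_source()
    for name, size in size_report(data).items():
        # Percentages are relative to the uncompressed input.
        print(f"{name:10s} {size:8d}  {100 * size / len(data):6.1f}%")
```

To measure a real file, pass its contents instead, for example `size_report(open(path, "rb").read())`.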
I took a C source file, and found this:
-rwxr-xr-x 1 dwd dwd 36K 2007-11-07 15:43 connection.c
The original file. (100%)
-rw-r--r-- 1 dwd dwd 49K 2007-11-07 15:44 connection.c.b64
Base64 encoded, traditionally, with newlines. (135%)
-rw-r--r-- 1 dwd dwd 15K 2007-11-07 15:44 connection.c.b64.gz
Base64, then gzipped. (40%)
-rw-r--r-- 1 dwd dwd 8.1K 2007-11-07 15:44 connection.c.gz
Just gzipped. Note it's nearly half the size of the base64-then-gzip
result. We'll use this as an incompressible object. (22% of the
original / 100% as the new baseline)
-rw-r--r-- 1 dwd dwd 11K 2007-11-07 15:45 connection.c.gz.b64
Gzipped, then base64. (30% / 135%)
-rw-r--r-- 1 dwd dwd 8.4K 2007-11-07 15:45 connection.c.gz.b64.gz
Now gzip it again. In principle, this should have recovered the
base64 overhead, but note that it hasn't quite. (23% / 103%)
This suggests to me that not only does gzip not recover the base64
overhead fully - although it gets close - but base64 encoding prior to
compression really hurts the compressor.
Note that compressing first, then base64 encoding, then compressing
*again* actually gave better results than base64 *then* compressing,
meaning that almost every file transfer we do under base64 should be
compressed first.
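
A minimal sketch of the compress-first ordering, in Python. The function names `pack` and `unpack` are hypothetical and this is not the wire format of any XMPP extension, only the order of operations:

```python
import base64
import zlib


def pack(payload: bytes) -> str:
    """Compress first, then base64-encode the result for a text channel."""
    return base64.b64encode(zlib.compress(payload, 9)).decode("ascii")


def unpack(text: str) -> bytes:
    """Reverse of pack(): base64-decode, then decompress."""
    return zlib.decompress(base64.b64decode(text))


if __name__ == "__main__":
    original = b"static int connect_to(const char *host);\n" * 200
    packed = pack(original)
    assert unpack(packed) == original
    # Compare against base64 alone on the same payload.
    print(len(base64.b64encode(original)), len(packed))
```

Any stream-level compression applied afterwards then only has to win back the base64 expansion of data that is already small.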
Dave.
--
Dave Cridland - mailto:[EMAIL PROTECTED] - xmpp:[EMAIL PROTECTED]
- acap://acap.dave.cridland.net/byowner/user/dwd/bookmarks/
- http://dave.cridland.net/
Infotrope Polymer - ACAP, IMAP, ESMTP, and Lemonade