On Tue Nov  6 13:00:44 2007, Tomasz Sterna wrote:
> On 05-11-2007, Mon at 16:23 +0100, Tomasz Sterna wrote:
> > Alternatively we could invent a binary-2-utf mapping which has less
> > overhead than BASE64.
>
> Simplest that comes to mind:
> Let's take the first 256 allowable UTF-8 characters and assign them to the
> 256 values of a single byte.
> That would be less than BASE64's 33% overhead.


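Can't do that, because many of those characters are illegal in XML 1.0, even in CDATA sections: every C0 control character below U+0020 except tab, LF and CR is forbidden, which rules out 29 of the 256.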

You could, though, take those illegal ones and add 256 to their code point values before encoding; I think that would be sufficient.
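A minimal Python sketch of the mapping as amended above. It assumes the XML 1.0 `Char` production as the legality test, and the helper names (`is_legal_xml10`, `encode_octets`, `decode_octets`) are hypothetical, not from any library. It also counts how many of the first 256 code points XML 1.0 rejects.

```python
# Sketch of the byte <-> character mapping discussed in this thread.
# Hypothetical helper names; legality follows the XML 1.0 Char production.

_ALLOWED_C0 = {0x09, 0x0A, 0x0D}  # tab, LF, CR: the only legal C0 controls


def is_legal_xml10(cp: int) -> bool:
    """True if code point cp may appear anywhere in an XML 1.0 document."""
    return (
        cp in _ALLOWED_C0
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


# Octet values whose naive code point would be illegal, even inside CDATA.
ILLEGAL_OCTETS = [b for b in range(256) if not is_legal_xml10(b)]


def encode_octets(data: bytes) -> str:
    """One character per octet; illegal values are moved up by 256."""
    return "".join(chr(b if is_legal_xml10(b) else b + 256) for b in data)


def decode_octets(text: str) -> bytes:
    """Invert encode_octets by undoing the +256 shift."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        b = cp - 256 if cp >= 256 else cp
        if not 0 <= b <= 0xFF:
            raise ValueError(f"code point U+{cp:04X} is outside the mapping")
        out.append(b)
    return bytes(out)


if __name__ == "__main__":
    print(len(ILLEGAL_OCTETS))  # 29
    sample = bytes(range(256))
    assert decode_octets(encode_octets(sample)) == sample
```

The shifted values land on U+0100 to U+011F, which are all legal XML characters and never produced by an unshifted octet, so decoding stays unambiguous.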

But bear in mind that, even then, each octet becomes one character, which UTF-8 encodes as either 1 or 2 octets. Encoding essentially random data (which includes the output of any decent encryption algorithm) will need 2-octet characters for a little over half the input octets: all 128 values from 0x80 upward, plus the 29 shifted control characters. That averages out to roughly 61% inflation, which is, of course, higher than base64's 33%.
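The overhead arithmetic can be checked with a short Python script. It assumes the shifted mapping described in this thread (C0 controls other than tab, LF and CR moved up by 256) and uniformly random input; `mapped_code_point` and `utf8_len` are hypothetical helper names.

```python
import base64
import os


def mapped_code_point(b: int) -> int:
    """Code point for octet b: C0 controls other than tab/LF/CR get +256."""
    if b < 0x20 and b not in (0x09, 0x0A, 0x0D):
        return b + 256
    return b


def utf8_len(b: int) -> int:
    """Number of UTF-8 octets needed for the character representing b."""
    return len(chr(mapped_code_point(b)).encode("utf-8"))


# With uniformly random input every octet value is equally likely, so the
# expected output size per input octet is the mean over all 256 values.
two_byte = sum(1 for b in range(256) if utf8_len(b) == 2)
mean_len = sum(utf8_len(b) for b in range(256)) / 256

print(two_byte)                        # 157
print(f"{(mean_len - 1) * 100:.1f}%")  # 61.3%

# Base64 turns every 3 input octets into 4 output characters.
print(f"{(4 / 3 - 1) * 100:.1f}%")     # 33.3%

# Empirical check on random data (a multiple of 3 avoids base64 padding).
sample = os.urandom(3 * 100_000)
mapped = "".join(chr(mapped_code_point(b)) for b in sample).encode("utf-8")
mapped_ratio = len(mapped) / len(sample)
b64_ratio = len(base64.b64encode(sample)) / len(sample)
print(mapped_ratio, b64_ratio)
```

The exact expectation is 413/256, about 1.613 output octets per input octet, against 4/3 for base64.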

It's possible that a modified UTF-7 would do better. (And UTF-7, modified or not, is acceptable UTF-8, since its output is plain 7-bit ASCII.)
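To illustrate the parenthetical point, here is a small Python check. It uses Python's built-in `utf-7` codec, which implements standard UTF-7 (RFC 2152) rather than the modified variant used by IMAP, so it only shows that UTF-7 output is ASCII and therefore already valid UTF-8; the sample string is arbitrary.

```python
# Standard UTF-7 via Python's built-in codec (not IMAP's modified UTF-7).
text = "caf\u00e9 \u2603 snowman"  # arbitrary sample with non-ASCII characters
encoded = text.encode("utf-7")

# Every output octet is below 0x80, i.e. plain 7-bit ASCII.
assert all(b < 0x80 for b in encoded)

# ASCII is a subset of UTF-8, so the same octets decode cleanly as UTF-8.
as_utf8 = encoded.decode("utf-8")
print(as_utf8)

# Decoding with the UTF-7 codec recovers the original string.
assert encoded.decode("utf-7") == text
```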

Dave.
--
Dave Cridland - mailto:[EMAIL PROTECTED] - xmpp:[EMAIL PROTECTED]
 - acap://acap.dave.cridland.net/byowner/user/dwd/bookmarks/
 - http://dave.cridland.net/
Infotrope Polymer - ACAP, IMAP, ESMTP, and Lemonade
