RE: Compression through normalization

D. Starner Wed, 26 Nov 2003 09:00:20 -0800

> I see no reason why you accept some limitations for this
> encapsulation, but not ALL the limitations.


Because I can convert the data from binary to Unicode text in UTF-16
in a few lines of code if I don't worry about normalization. Suddenly
the rules become much more complex if I have to worry about normalization.
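
To illustrate how little code that is, here is a minimal Python sketch. It assumes the simplest possible mapping, byte 0xNN to U+00NN, which is an assumption made for illustration and not the encapsulation scheme under discussion; the function names are hypothetical.

```python
# Sketch only: assumes byte 0xNN maps to code point U+00NN.
# The function names are hypothetical, chosen for this example.

def bytes_to_utf16(data: bytes) -> bytes:
    """Turn arbitrary bytes into UTF-16LE text, one code point per byte."""
    text = data.decode("latin-1")      # latin-1 maps 0xNN -> U+00NN
    return text.encode("utf-16-le")    # no normalization is applied


def utf16_to_bytes(encoded: bytes) -> bytes:
    """Reverse bytes_to_utf16; fails if any code point is above U+00FF."""
    return encoded.decode("utf-16-le").encode("latin-1")
```

The round trip is exact only as long as nothing in between rewrites the code points, which is precisely what a normalizing step is allowed to do.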

The simple fact is that I can convert between UTF-8, UTF-16, and UTF-32 with
several utilities on my system, but none of them will change the normalization
form. I don't know of any basic text tools that handle normalization, so if I
edit a source file and email it to someone (which compresses and decompresses
it automatically), and that round trip is allowed to change the normalization
form, they're going to have trouble running diff on the code.
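
The diff problem can be shown with a short Python sketch. The sample string is made up for the example; `unicodedata` and `difflib` are standard-library modules.

```python
import difflib
import unicodedata

# Sketch only: the sample text is an assumption for illustration.
composed = "caf\u00e9"                                  # precomposed e-acute
decomposed = unicodedata.normalize("NFD", composed)     # "e" + U+0301

# Canonically equivalent, yet different code point sequences.
equivalent = unicodedata.normalize("NFC", decomposed) == composed
identical = composed == decomposed

# A line-based diff compares code points, so it reports a change.
diff_lines = list(
    difflib.unified_diff([composed], [decomposed], lineterm="")
)
```

Here `equivalent` is true and `identical` is false, so a tool that compares code points sees a modified line even though the text looks the same on screen.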
 
> If you don't want that such "denormalisation" occurs during the compression,
> don't claim that your 9-bit encapsulator produces Unicode text (so don't
> label it with a UTF-* encoding scheme or even a BOCU-* or SCSU character
> encoding scheme, but use your own charset label)!

The whole point of such a tool would be to send binary data on a transport that
only allows Unicode text. In practice, you'd also have to remap the C0 and C1
control characters; but even then, mapping 0x00-0x1F to U+0250-U+026F and
0x80-0x9F to U+0270-U+028F wouldn't be too complex. Unless you've already pulled
in a Unicode library, normalization would add a lot of complexity to something
that could otherwise be coded in 4k.
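
A rough Python sketch of that remapping follows. It assumes every other byte maps to the code point of the same value, which the text above does not specify, and the function names are hypothetical. It does nothing about normalization.

```python
# Sketch only: the two remapped ranges come from the text above; the
# identity mapping for every other byte is an assumption for this example.
C0_BASE = 0x0250  # 0x00-0x1F -> U+0250-U+026F
C1_BASE = 0x0270  # 0x80-0x9F -> U+0270-U+028F


def encode_bytes(data: bytes) -> str:
    """Map bytes to text, moving C0 and C1 values out of the control ranges."""
    out = []
    for b in data:
        if b <= 0x1F:
            out.append(chr(C0_BASE + b))
        elif 0x80 <= b <= 0x9F:
            out.append(chr(C1_BASE + (b - 0x80)))
        else:
            out.append(chr(b))  # 0x7F is left alone; only two ranges are named
    return "".join(out)


def decode_text(text: str) -> bytes:
    """Reverse encode_bytes; raises ValueError on unexpected code points."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if C0_BASE <= cp <= C0_BASE + 0x1F:
            out.append(cp - C0_BASE)
        elif C1_BASE <= cp <= C1_BASE + 0x1F:
            out.append(cp - C1_BASE + 0x80)
        elif cp <= 0xFF:
            out.append(cp)
        else:
            raise ValueError(f"unexpected code point U+{cp:04X}")
    return bytes(out)
```

Both directions are table-free range checks, which is why the remapping stays small compared with carrying normalization data.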
