> I see no reason why you accept some limitations for this
> encapsulation, but not ALL the limitations.

Because I can convert the data from binary to Unicode text in UTF-16 in a few lines of code if I don't have to worry about normalization. Once I do have to worry about normalization, the rules suddenly become much more complex. The simple fact is that several utilities on my system can convert between UTF-8, UTF-16, and UTF-32, but none of them handle normalization. I don't know of any basic text tools that do, so if I edit some source code and email it to someone (which compresses and decompresses it automatically), they're going to have trouble running diff on the code.

> If you don't want that such "denormalisation" occurs during the compression,
> don't claim that your 9-bit encapsulator produces Unicode text (so don't
> label it with a UTF-* encoding scheme or even a BOCU-* or SCSU character
> encoding scheme, but use your own charset label)!

The whole point of such a tool would be to send binary data over a transport that only allows Unicode text. In practice, you'd also have to remap the C0 and C1 control characters, but even then mapping 0x00-0x1F to U+0250-U+026F and 0x80-0x9F to U+0270-U+028F wouldn't be too complex. Unless you've already added a Unicode library to what could otherwise be coded in 4k, normalization would add a lot of complexity.
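To make the remapping concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not the actual encapsulator: it handles plain 8-bit bytes rather than the 9-bit scheme under discussion, it assumes every byte outside the two control ranges maps to the code point with the same value, and the names `encapsulate` and `decapsulate` are made up for this example. It also leaves 0x7F (DEL) alone, since only 0x00-0x1F and 0x80-0x9F were named above.

```python
import unicodedata

C0_BASE = 0x0250  # bytes 0x00-0x1F -> U+0250-U+026F
C1_BASE = 0x0270  # bytes 0x80-0x9F -> U+0270-U+028F


def encapsulate(data: bytes) -> str:
    """Map each byte to one code point, moving C0/C1 controls out of the way."""
    out = []
    for b in data:
        if b <= 0x1F:
            out.append(chr(C0_BASE + b))
        elif 0x80 <= b <= 0x9F:
            out.append(chr(C1_BASE + (b - 0x80)))
        else:
            # Assumption: all other bytes keep their own value as a code point.
            out.append(chr(b))
    return "".join(out)


def decapsulate(text: str) -> bytes:
    """Inverse of encapsulate(); rejects any code point it could not have produced."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if C0_BASE <= cp < C0_BASE + 0x20:
            out.append(cp - C0_BASE)
        elif C1_BASE <= cp < C1_BASE + 0x20:
            out.append(cp - C1_BASE + 0x80)
        elif 0x20 <= cp <= 0x7F or 0xA0 <= cp <= 0xFF:
            out.append(cp)
        else:
            raise ValueError(f"unexpected code point U+{cp:04X}")
    return bytes(out)


if __name__ == "__main__":
    payload = bytes(range(256))
    text = encapsulate(payload)
    assert decapsulate(text) == payload

    # Changing the encoding form is a one-liner and fully reversible.
    assert text.encode("utf-16-le").decode("utf-16-le") == text

    # Normalization is not: NFD splits U+00C0 into "A" + U+0300.
    damaged = unicodedata.normalize("NFD", encapsulate(b"\xc0"))
    try:
        decapsulate(damaged)
    except ValueError as exc:
        print("normalized text no longer decodes:", exc)
```

The round trip through UTF-16 is harmless, but a single NFD pass on the way turns the one character for byte 0xC0 into two, and the decoder can no longer recover the original byte. Undoing that would mean carrying composition tables, which is the Unicode library mentioned above.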

