From: "John Cowan" <[EMAIL PROTECTED]>
> http://panda.com/tops-20/utf9.txt
>
> Res ipsa loquitur.

Are there still platforms today whose storage bytes are nonets rather than octets, i.e. 9-bit based platforms? If so, this proposal makes sense, but only as a local optimization for those platforms.

Problems will come if you want to interchange this data with the large majority of hosts, which cannot handle a 9th bit in their bytes. Interchange would then require either sending 2 octets for each 9-bit byte to avoid losing data, or using a more complex bit pattern that packs each sequence of eight 9-bit bytes into nine 8-bit bytes, along with a way to interpret the last, possibly incomplete, sequence. (Converters of this kind, needed for interoperability, do exist: look for example at the MIME Base64 algorithm, which packs sequences of 8-bit bytes into serialized octets, each carrying 6 significant bits.)

UTF-9 seems interesting in this case, but is it worth it, given that it is not directly interchangeable over the most common networking technologies? Can't you accept losing 1 bit per storage byte? What will happen to a plain-text file encoded with UTF-9 and sent through FTP? Do you mean that FTP should use a "Base256" converter for 9-bit platforms, similar to Base64 for 8-bit platforms, to avoid losing the most significant bit of each transferred byte? How is the recipient of the file supposed to interpret the converted data? Is it still plain text?

So if the format is not interchangeable, this UTF-9 transform looks like a local-only transformation, and locally each host can use its own representation. And why not, then, a UTF-18 encoding scheme that would avoid using UTF-16 surrogates for all characters that fit in the first 4 planes?
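
To make the packing of eight 9-bit bytes into nine octets mentioned above concrete, here is a minimal Python sketch. The function names `pack_nonets` and `unpack_nonets` are hypothetical, and the bit order (most significant bit first, zero padding in the last octet) is an assumption of this sketch, not something defined by the UTF-9 proposal or by any transfer protocol.

```python
def pack_nonets(nonets):
    """Pack 9-bit values into octets, most significant bit first.

    Bit order and zero padding are assumptions of this sketch.
    """
    acc = 0      # bit accumulator
    nbits = 0    # number of pending bits held in acc
    out = bytearray()
    for n in nonets:
        if not 0 <= n < 512:
            raise ValueError("not a 9-bit value: %r" % (n,))
        acc = (acc << 9) | n
        nbits += 9
        while nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
        acc &= (1 << nbits) - 1   # keep only the bits not yet emitted
    if nbits:
        # Last, incomplete octet: left-align the remaining bits, pad with zeros.
        out.append((acc << (8 - nbits)) & 0xFF)
    return bytes(out)


def unpack_nonets(data):
    """Inverse of pack_nonets.

    The padding is always shorter than 9 bits, so the number of
    nonets can be recovered from the octet count alone.
    """
    count = len(data) * 8 // 9
    acc = 0
    nbits = 0
    out = []
    for b in data:
        acc = (acc << 8) | b
        nbits += 8
        if nbits >= 9 and len(out) < count:
            nbits -= 9
            out.append((acc >> nbits) & 0x1FF)
            acc &= (1 << nbits) - 1
    return out
```

With this layout, eight nonets fill exactly nine octets, and a single nonet takes two octets, seven bits of which are padding. The recipient still has to know that the octet stream carries packed nonets at all, which is exactly the interchange problem raised above.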
For me, a UTF-18 encoding would make more sense if local in-memory optimization is the issue, as it would represent almost all existing Unicode characters, those in planes 0 (BMP), 1 (SMP), 2 (SIP) and 3 (still unused, but you could instead map the SSP, plane 14, there for tags and variation selectors, or keep it for later use as SIP2), in one 18-bit code unit... But you'll still need a converter to transform it to UTF-8 or to a UTF-16 encoding scheme to perform any I/O.
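
As a rough illustration of such an 18-bit code unit, here is a minimal Python sketch of the variant in which the fourth slot is used for plane 14 instead of plane 3. The names `utf18_encode` and `utf18_decode` are hypothetical, the choice of that variant is an assumption of this sketch, and surrogate code points are not given any special treatment.

```python
def utf18_encode(cp):
    """Map a code point to one 18-bit code unit (hypothetical scheme).

    Planes 0-2 are stored unchanged; plane 14 is remapped into the
    slot that plane 3 would otherwise occupy. Everything else is
    unrepresentable in this sketch.
    """
    if 0 <= cp <= 0x2FFFF:
        return cp
    if 0xE0000 <= cp <= 0xEFFFF:
        return 0x30000 | (cp & 0xFFFF)
    raise ValueError("code point U+%04X does not fit in 18 bits" % cp)


def utf18_decode(unit):
    """Inverse of utf18_encode for one 18-bit code unit."""
    if not 0 <= unit <= 0x3FFFF:
        raise ValueError("not an 18-bit code unit: %r" % (unit,))
    if unit < 0x30000:
        return unit
    return 0xE0000 | (unit & 0xFFFF)
```

For example, U+1F600 in plane 1 is stored as the single unit 0x1F600, where UTF-16 would need a surrogate pair, and the tag character U+E0001 becomes 0x30001. Code points in planes 3 to 13, 15 and 16 cannot be represented in this variant.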

