Pub date = 1 April 2003. I think that's the salient part.
Addison P. Phillips
Director, Globalization Architecture
webMethods | Delivering Global Business Visibility
http://www.webMethods.com
Chair, W3C Internationalization (I18N) Working Group
Chair, W3C-I18N-WG, Web Services Task Force
http://www.w3.org/International

Internationalization is an architecture.
It is not a feature.

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Philippe Verdy
> Sent: Thursday, 30 October 2003 15:42
> To: John Cowan
> Cc: [EMAIL PROTECTED]
> Subject: Re: UTF-9
>
>
> From: "John Cowan" <[EMAIL PROTECTED]>
>
> > http://panda.com/tops-20/utf9.txt
> >
> > Res ipsa loquitur.
>
> Are there still platforms where storage bytes are not octets but
> nonets, i.e. 9-bit based platforms? If so, this proposal makes sense,
> but only as a local optimization for those platforms. Problems will
> come if you want to interchange this data with the large majority of
> hosts, which cannot handle a 9th bit in their bytes.
>
> This means that interchange would require either sending 2 octets to
> represent each 9-bit byte without losing data, or using a complex bit
> pattern to pack sequences of eight 9-bit bytes into sequences of nine
> 8-bit bytes, with a way to interpret the last sequence. (Converters of
> this kind, needed for interoperability, do exist: look for example at
> the MIME Base64 algorithm, which is used to pack sequences of 8-bit
> bytes into serialized octets each with 6 significant bits.)
>
> UTF-9 seems interesting in this case, but is it worth it, given that
> it is not directly interchangeable with the most common networking
> technologies? Can't you accept losing 1 bit per storage byte?
>
> What will happen, then, to plain text encoded in UTF-9 and sent
> through FTP? Do you mean that FTP should use a Base256 converter for
> 9-bit platforms, similar to Base64 for 8-bit platforms, to avoid
> losing the most significant bit of each transferred byte? How is the
> recipient of the file supposed to interpret the converted data? Is it
> still plain text?
>
> So if the format is not interchangeable, this UTF-9 transform looks
> like a local-only transformation, and locally, each host can use its
> own representation. And why not, then, a UTF-18 encoding scheme that
> would avoid using UTF-16 surrogates for all characters that fit in the
> first 4 planes?
>
> For me, a UTF-18 encoding would make better sense if local
> optimization in memory is the issue, as it would represent almost all
> existing Unicode characters in planes 0 (BMP), 1 (SMP), 2 (SIP) and 3
> (still not used, but you may instead map the SSP, plane 14, there for
> tags and variation selectors, or keep it for later use as SIP2) in one
> 18-bit code unit... But you'll still need a converter to transform it
> to UTF-8 or a UTF-16 encoding scheme to perform any I/O.
>
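
For reference, the packing idea raised in the quoted message (eight 9-bit bytes carried in nine 8-bit octets) can be sketched roughly as below. This is only an illustration, not anything taken from the UTF-9 document: the function names pack_nonets and unpack_nonets are hypothetical, and the choices of most-significant-bit-first ordering, zero padding, and passing the nonet count out of band are all assumptions made for the sketch.

```python
def pack_nonets(nonets):
    """Pack 9-bit values into octets, most significant bit first.

    Assumption: a trailing partial octet is padded with zero bits.
    """
    acc = 0      # bit accumulator
    nbits = 0    # number of pending bits held in acc
    out = bytearray()
    for n in nonets:
        if not 0 <= n < 512:
            raise ValueError("not a 9-bit value: %r" % (n,))
        acc = (acc << 9) | n
        nbits += 9
        while nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
        acc &= (1 << nbits) - 1   # keep only the bits not yet emitted
    if nbits:
        out.append((acc << (8 - nbits)) & 0xFF)
    return bytes(out)


def unpack_nonets(octets, count):
    """Recover `count` 9-bit values from octets produced by pack_nonets.

    Assumption: `count` is known to the receiver, so padding bits in the
    last octet can be ignored.
    """
    acc = 0
    nbits = 0
    out = []
    for b in octets:
        acc = (acc << 8) | b
        nbits += 8
        # At most one full nonet becomes available per incoming octet.
        if nbits >= 9 and len(out) < count:
            nbits -= 9
            out.append((acc >> nbits) & 0x1FF)
            acc &= (1 << nbits) - 1
    if len(out) != count:
        raise ValueError("not enough data for %d nonets" % count)
    return out


if __name__ == "__main__":
    data = [0x1FF, 0x000, 0x155, 0x0AA, 0x100, 0x001, 0x0FF, 0x1A5]
    packed = pack_nonets(data)
    print(len(packed))                         # eight nonets occupy nine octets
    print(unpack_nonets(packed, len(data)) == data)
```

The need to pass the count separately is exactly the "way to interpret the last sequence" the quoted message mentions: without it, the receiver cannot tell padding bits from data.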

