I think Markus was referring to UTF-8 in the context of the message as a "compression" format. And you would have to add that it is really good at ASCII-only...

Mark
__________________________________
http://www.macchiato.com

----- Original Message -----
From: <[EMAIL PROTECTED]>
To: "Markus Scherer" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Thu, 2004 Jan 22 10:50
Subject: Re: Unicode forms for internal storage - BOCU-1 speed

> Markus Scherer scripsit:
>
> > UTF-8 is useful because it's simple, and supported just about everywhere -
> > but it's otherwise hardly optimal for anything.
>
> You entirely omit its principal advantage, sine qua non: it's maximally
> ASCII-compatible, using bytes 0x00 to 0x7F to represent ASCII characters and
> nothing else.
>
> Mark Crispin's UTF-9 (not to be confused with Jerome Abela's) is also
> excellent, although most of us don't have 36-bit systems, for which it
> makes sense. A precis:
>
> Code points (base 2)      UTF-9 code units (base 2)
> 0000000000000abcdefgh     0abcdefgh
> 00000abcdefghijklmnop     1abcdefgh 0ijklmnop
> abcdefghijklmnopqrstu     1000abcde 1fghijklm 0nopqrstu
>
> This is almost as good as Latin-1 for its repertoire, only minutely worse
> than UTF-16 for the rest of the BMP, and beats all other encodings for the
> other planes.
>
> --
> John Cowan <[EMAIL PROTECTED]>
> http://www.ccil.org/~cowan http://www.reutershealth.com
> Charles li reis, nostre emperesdre magnes,
> Set anz totz pleinz ad ested in Espagnes.
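
For anyone who wants to play with the UTF-9 precis quoted in this message, here is a minimal Python sketch of that three-row table and nothing more. The function names utf9_encode and utf9_decode are made up for illustration, each 9-bit code unit is modelled as a plain integer in the range 0..0x1FF, and the cap at U+10FFFF is my assumption rather than something the table states. The sketch does not reject surrogates or non-shortest forms.

```python
from typing import Iterable, List

# Assumption: cap at the Unicode maximum; the quoted table only shows 21 bits.
MAX_CODE_POINT = 0x10FFFF


def utf9_encode(cp: int) -> List[int]:
    """Encode one code point into 9-bit units, following the quoted table.

    Bit 8 of a unit (0x100) means "another unit follows"; the low 8 bits
    carry the payload, most significant octet first.
    """
    if not 0 <= cp <= MAX_CODE_POINT:
        raise ValueError(f"code point out of range: {cp:#x}")
    if cp < 0x100:
        # 0000000000000abcdefgh -> 0abcdefgh
        return [cp]
    if cp < 0x10000:
        # 00000abcdefghijklmnop -> 1abcdefgh 0ijklmnop
        return [0x100 | (cp >> 8), cp & 0xFF]
    # abcdefghijklmnopqrstu -> 1000abcde 1fghijklm 0nopqrstu
    return [0x100 | (cp >> 16), 0x100 | ((cp >> 8) & 0xFF), cp & 0xFF]


def utf9_decode(units: Iterable[int]) -> List[int]:
    """Decode a sequence of 9-bit units back into code points."""
    out: List[int] = []
    value = 0
    pending = False  # True while in the middle of a multi-unit sequence
    for unit in units:
        if not 0 <= unit <= 0x1FF:
            raise ValueError(f"not a 9-bit unit: {unit:#x}")
        value = (value << 8) | (unit & 0xFF)
        if unit & 0x100:
            pending = True
        else:
            out.append(value)
            value = 0
            pending = False
    if pending:
        raise ValueError("truncated sequence: last unit has the 'more' bit set")
    return out


if __name__ == "__main__":
    for ch in "A\u00e9\u20ac\U0001F600":
        units = utf9_encode(ord(ch))
        print(f"U+{ord(ch):04X} ->", " ".join(f"{u:09b}" for u in units))
```

Counting bits per character under this scheme gives 9 for the Latin-1 range (against 8 in Latin-1 itself), 18 for the rest of the BMP (against 16 in UTF-16), and 27 for the other planes (against 32 in UTF-8 or UTF-16), which is the comparison the quoted paragraph is making.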

