Re: Unicode forms for internal storage - BOCU-1 speed

Mark Davis Thu, 22 Jan 2004 14:40:41 -0800

I think Markus was referring to UTF-8 as a "compression" format, which was
the context of his message. And you would have to add that it is really good
at ASCII-only text...
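
A minimal sketch of the "really good at ASCII-only" point, assuming Python's
built-in UTF-8 codec. The helper names utf8_size and
ascii_bytes_only_for_ascii are hypothetical, introduced only for this
illustration:

```python
def utf8_size(text: str) -> int:
    """Number of bytes the text occupies once encoded as UTF-8."""
    return len(text.encode("utf-8"))


def ascii_bytes_only_for_ascii(text: str) -> bool:
    """Check that bytes below 0x80 occur only for ASCII characters.

    Counts the ASCII characters in the text and the bytes below 0x80 in
    its UTF-8 form; the two counts match when no non-ASCII character
    contributes a byte in the ASCII range.
    """
    ascii_chars = sum(1 for ch in text if ord(ch) < 0x80)
    low_bytes = sum(1 for b in text.encode("utf-8") if b < 0x80)
    return ascii_chars == low_bytes


# ASCII-only text costs one byte per character; other text costs more.
print(utf8_size("hello"), utf8_size("h\u00e9llo"), utf8_size("\u20ac"))
```

ASCII-only text comes out at one byte per character, which is why UTF-8 looks
so good on it; anything outside ASCII takes two to four bytes per character.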


Mark
__________________________________
http://www.macchiato.com

----- Original Message ----- 
From: <[EMAIL PROTECTED]>
To: "Markus Scherer" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Thu, 2004 Jan 22 10:50
Subject: Re: Unicode forms for internal storage - BOCU-1 speed


> Markus Scherer scripsit:
>
> > UTF-8 is useful because it's simple, and supported just about everywhere -
> > but it's otherwise hardly optimal for anything.
>
> You entirely omit its principal advantage, sine qua non:  it's maximally
> ASCII-compatible, using bytes 0x00 to 0x7F to represent ASCII characters and
> nothing else.
>
> Mark Crispin's UTF-9 (not to be confused with Jerome Abela's) is also
> excellent, although most of us don't have 36-bit systems, for which it
> makes sense.  A precis:
>
> Code points (base 2)     UTF-9 code units (base 2)
> 0000000000000abcdefgh    0abcdefgh
> 00000abcdefghijklmnop    1abcdefgh 0ijklmnop
> abcdefghijklmnopqrstu    1000abcde 1fghijklm 0nopqrstu
>
> This is almost as good as Latin-1 for its repertoire, only minutely worse
> than UTF-16 for the rest of the BMP, and beats all other encodings for the
> other planes.
>
> -- 
> John Cowan                              <[EMAIL PROTECTED]>
> http://www.ccil.org/~cowan              http://www.reutershealth.com
>                 Charles li reis, nostre emperesdre magnes,
>                 Set anz totz pleinz ad ested in Espagnes.
>
>
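
The UTF-9 table quoted above can be sketched in code. This is only an
illustration of that table, not a reference implementation: the function
names utf9_encode and utf9_decode are hypothetical, each 9-bit code unit is
held in an ordinary Python int, and surrogate code points are not treated
specially.

```python
def utf9_encode(code_point: int) -> list[int]:
    """Encode one code point as 9-bit units, following the quoted table.

    The top bit of each unit is a continuation flag (1 = more units
    follow); the low 8 bits carry the payload.
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point out of range")
    if code_point <= 0xFF:
        # Row 1: 0abcdefgh
        return [code_point]
    if code_point <= 0xFFFF:
        # Row 2: 1abcdefgh 0ijklmnop
        return [0x100 | (code_point >> 8), code_point & 0xFF]
    # Row 3: 1000abcde 1fghijklm 0nopqrstu
    return [
        0x100 | (code_point >> 16),
        0x100 | ((code_point >> 8) & 0xFF),
        code_point & 0xFF,
    ]


def utf9_decode(units: list[int]) -> list[int]:
    """Decode a sequence of 9-bit units back into code points."""
    result = []
    value = 0
    pending = False
    for unit in units:
        # Shift in the 8 payload bits of this unit.
        value = (value << 8) | (unit & 0xFF)
        if unit & 0x100:
            pending = True          # continuation flag set: keep reading
        else:
            result.append(value)    # flag clear: code point is complete
            value = 0
            pending = False
    if pending:
        raise ValueError("truncated UTF-9 sequence")
    return result
```

Under this reading of the table, U+0000 to U+00FF take 9 bits (against 8 in
Latin-1), the rest of the BMP takes 18 bits (against 16 in UTF-16), and the
other planes take 27 bits (against 32 in UTF-8, UTF-16 and UTF-32).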
