Bastardizations of UTF-8 (was: Re: [OT] Unicode-compatible SQL?)

DougEwell2 Mon, 05 Feb 2001 08:47:29 -0800
In a message dated 2001-02-05 5:19:59 Pacific Standard Time, 
[EMAIL PROTECTED] writes:

>  > I have heard a rumour (i.e. my source is not involved in the reported
>  > activity) that:
>  >
>  > <quote>
>  > SAP, PeopleSoft, Siebel, Oracle and others are actually
>  > in the process of proposing a new format of UTF that will cause a UTF-16
>  > surrogate pair to become two 3-byte UTF-8 codepoints so that UTF-8 will
>  > have the same behaviour as UTF-16, that is, a surrogate will be two UTF-8
>  > code points.
>  > </quote>
>  >
>  > Can anyone corroborate this, and, if it's true, offer an opinion on it?

>  Using UTF-8 to handle characters in the supplementary planes by way of 
>  using two separate code points in the surrogate range is NOT considered
>  acceptable.
>  
>  Currently it is legal to interpret them but *not* to generate them (multiple
>  refs on the Unicode site). Therefore, I hope you are mistaken about the
>  rumor since this would be a Bad Thing (tm).

This is laziness, intended to get around the "problem" of supplementary code 
points instead of handling them like any other code points.  This reminds me 
of the Java bastardization of UTF-8, in which U+0000 is encoded as the two 
bytes 0xC0 0x80 so that no character string will ever contain the byte 0x00.  
(Nobody has ever 
explained to me why a character string would contain U+0000 in the first 
place.)
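
To make the U+0000 trick concrete, here is a minimal Python sketch. The
function name java_style_encode is made up for illustration, and the sketch
covers only the U+0000 handling described above, not the rest of what Java
does. It relies on the fact that in real UTF-8 the byte 0x00 can only ever
stand for U+0000.

```python
def java_style_encode(text: str) -> bytes:
    """Hypothetical helper: UTF-8, except U+0000 becomes the overlong 0xC0 0x80.

    In standard UTF-8 the byte 0x00 appears only as the encoding of U+0000,
    so a plain byte replacement is enough for this illustration.
    """
    return text.encode("utf-8").replace(b"\x00", b"\xc0\x80")


def strict_utf8_accepts(data: bytes) -> bool:
    """Return True if Python's strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


encoded = java_style_encode("a\x00b")
print(encoded.hex(" "))              # bytes of the modified form
print(strict_utf8_accepts(encoded))  # a strict decoder refuses 0xC0 0x80
```

The second call is the point: a decoder that follows the standard treats
0xC0 0x80 as an error, so data written this way does not interchange cleanly
with conformant UTF-8 software.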

I have argued in the past that in some cases, semi-conformant Unicode 
implementations might be better than non-Unicode solutions.  But creating a 
new UTF to get around your product's lack of real Unicode support *and then 
expecting others to use your hack* is a different matter entirely.  Just bite 
the bullet and support UTF-8.  It's not that hard.
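
As a rough illustration of how little is involved, the Python sketch below
encodes a supplementary code point both ways: as the single four-byte sequence
that UTF-8 calls for, and as the two three-byte sequences described in the
rumor. The function names are invented for this example, and the second
function is only my reading of the quoted proposal, not a published
specification.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode scalar value as standard UTF-8."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # Supplementary planes: one four-byte sequence, nothing special.
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def surrogate_pair_form(cp: int) -> bytes:
    """Hypothetical: the rumored format, two 3-byte sequences per surrogate pair."""
    if cp < 0x10000:
        return utf8_encode(cp)
    v = cp - 0x10000
    high = 0xD800 | (v >> 10)    # UTF-16 high surrogate
    low = 0xDC00 | (v & 0x3FF)   # UTF-16 low surrogate
    out = b""
    for unit in (high, low):
        # Each surrogate is pushed through the three-byte pattern as if it
        # were an ordinary BMP code point, which real UTF-8 does not allow.
        out += bytes([0xE0 | (unit >> 12),
                      0x80 | ((unit >> 6) & 0x3F),
                      0x80 | (unit & 0x3F)])
    return out


print(utf8_encode(0x10400).hex(" "))          # four bytes
print(surrogate_pair_form(0x10400).hex(" "))  # six bytes
```

The four-byte branch is one more case in the same pattern as the others,
which is why treating supplementary code points "like any other code points"
costs so little.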

-Doug Ewell
 Fullerton, California