Re: CESU-8 vs UTF-8

Marcin 'Qrczak' Kowalczyk Sun, 16 Sep 2001 04:30:33 -0700
Sun, 16 Sep 2001 01:14:06 -0700, Carl W. Brown <[EMAIL PROTECTED]> wrote:

> If it can be demonstrated that there is a real need for an encoding
> like CESU-8 then it should be very different from UTF-8.  How does
> SCSU for example sort?

SCSU encoding is non-deterministic and its representations can't
be compared lexicographically at all (logically equal strings might
compare unequal).
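
A minimal sketch of why that is, assuming only three SCSU features
(pass-through ASCII, the default dynamic window 0 at U+0080, and the
SQU "quote Unicode" tag 0x0E). The function name decode_tiny_scsu is
hypothetical and this is nowhere near a full SCSU decoder; it only
shows two different byte strings decoding to the same text:

```python
SQU = 0x0E          # SCSU single-byte-mode tag: quote one UTF-16 code unit
WINDOW0 = 0x0080    # default offset of dynamic window 0 (active at start)

def decode_tiny_scsu(data: bytes) -> str:
    """Decode a tiny subset of SCSU single-byte mode (illustration only)."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b == SQU:
            # The next two bytes are a big-endian 16-bit code unit.
            out.append(chr((data[i + 1] << 8) | data[i + 2]))
            i += 3
        elif b >= 0x80:
            # Bytes 0x80..0xFF index into the active dynamic window.
            out.append(chr(WINDOW0 + (b - 0x80)))
            i += 1
        elif b >= 0x20 or b in (0x00, 0x09, 0x0A, 0x0D):
            # ASCII and a few control characters pass through unchanged.
            out.append(chr(b))
            i += 1
        else:
            raise ValueError(f"tag 0x{b:02X} not handled by this sketch")
    return "".join(out)

via_window = bytes([0xE9])              # U+00E9 through window 0
via_quote = bytes([SQU, 0x00, 0xE9])    # U+00E9 through SQU

same_text = decode_tiny_scsu(via_window) == decode_tiny_scsu(via_quote)
same_bytes = via_window == via_quote
print(same_text, same_bytes)
```

Both inputs decode to the single character U+00E9, yet the encoded
bytes differ, so comparing SCSU bytes says nothing about the order or
even the equality of the underlying strings.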

Ehh, we wouldn't have the problem with CESU-8 now if Unicode hadn't
been described as a 16-bit encoding in the past. I still think that
UTF-16 was a big mistake. Too bad that it still affects people who
avoid it.

We can't change the past, but I hope that at least UTF-8 processing can
be done without treating surrogates in any special way. Surrogates are
relevant only for UTF-16; by not using UTF-16 you should be free of
surrogate issues, except for having a silly unused area in character
numbers and a silly highest character number. Please don't spread
UTF-16 madness where it doesn't belong.
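
For illustration, a minimal sketch of the ordering difference behind
this thread. The helper cesu8 is hypothetical (Python has no built-in
CESU-8 codec); it assumes CESU-8 means "split supplementary characters
into a surrogate pair and encode each surrogate as a 3-byte UTF-8-style
sequence". The two sample characters are arbitrary choices:

```python
def cesu8(s: str) -> bytes:
    """Encode s as CESU-8 (hypothetical helper, illustration only)."""
    out = bytearray()
    for ch in s:
        cp = ord(ch)
        if cp >= 0x10000:
            # Split into a UTF-16 surrogate pair, then encode each half
            # as if it were an ordinary BMP code point.
            cp -= 0x10000
            high = 0xD800 + (cp >> 10)
            low = 0xDC00 + (cp & 0x3FF)
            out += chr(high).encode("utf-8", "surrogatepass")
            out += chr(low).encode("utf-8", "surrogatepass")
        else:
            out += ch.encode("utf-8", "surrogatepass")
    return bytes(out)

bmp = "\uFF5E"        # a BMP character above the surrogate range
supp = "\U00010000"   # the first supplementary character

# Code point order: bmp < supp.
utf8_keeps_order = bmp.encode("utf-8") < supp.encode("utf-8")
cesu8_keeps_order = cesu8(bmp) < cesu8(supp)
utf16_keeps_order = bmp.encode("utf-16-be") < supp.encode("utf-16-be")

print(utf8_keeps_order, cesu8_keeps_order, utf16_keeps_order)
```

Plain byte comparison of UTF-8 follows code point order without
looking at surrogates at all, while CESU-8 and UTF-16 both sort the
supplementary character before U+FF5E because its encoding starts
with a surrogate.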

-- 
 __("<  Marcin Kowalczyk * [EMAIL PROTECTED] http://qrczak.ids.net.pl/
 \__/
  ^^                      SYGNATURA ZASTĘPCZA
QRCZAK