Re: UTF-16 inside UTF-8

Doug Ewell Tue, 04 Nov 2003 23:48:00 -0800

Peter Kirk <peterkirk at qaya dot org> wrote:

>> ... (a very old, legacy application, unaware of the existence of
>> codepoints above U+FFFF) ...
>
> Such applications are not "very old", they are still being written.
> For example (see http://www.mysql.com/doc/en/Charset-Unicode.html),
> MySQL 4.1 adds UCS-2 and UTF-8 support to previous versions but for
> single two-byte codes in UCS-2 and up to three bytes per UTF-8
> character only :-( - and this is still in alpha!


At the risk of upsetting the open-source faithful, that is just plain
lazy.  Anyone who can master the wizardly details of building a powerful
(and commercially successful) database program can figure out how to
slap two surrogates together without destroying performance.
Constraining UTF-8 to the BMP is even less defensible, since there is no
performance penalty in allowing four-byte UTF-8 sequences.
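
To make the point concrete, here is a rough sketch (not MySQL's code; the function names combine_surrogates and encode_utf8 are made up for illustration) of what "slapping two surrogates together" and emitting a four-byte UTF-8 sequence amount to. It assumes well-formed input and only covers the encoding direction.

```python
def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 surrogate pair into a single code point.

    Hypothetical helper for illustration only.
    """
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError("not a high surrogate")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError("not a low surrogate")
    # Each surrogate carries 10 bits; the pair starts at U+10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def encode_utf8(cp: int) -> bytes:
    """Encode one code point as UTF-8, including the four-byte form."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # Above the BMP: one more shift-and-mask step than the three-byte case.
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])
```

For example, the surrogate pair D834 DD1E combines to U+1D11E, which encodes as the four bytes F0 9D 84 9E. The four-byte branch is the same kind of work as the three-byte one, just with one extra byte.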

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/
