Peter Kirk <peterkirk at qaya dot org> wrote:

>> ... (a very old, legacy application, unaware of the existence of
>> codepoints above U+FFFF) ...
>
> Such applications are not "very old", they are still being written.
> For example (see http://www.mysql.com/doc/en/Charset-Unicode.html),
> MySQL 4.1 adds UCS-2 and UTF-8 support to previous versions but for
> single two-byte codes in UCS-2 and up to three bytes per UTF-8
> character only :-( - and this is still in alpha!
At the risk of upsetting the open-source faithful, that is just plain
lazy. Anyone who can master the wizardly details of building a powerful
(and commercially successful) database program can figure out how to
slap two surrogates together without destroying performance.
Constraining UTF-8 to the BMP is even less defensible, since there is
no performance penalty in allowing four-byte UTF-8 sequences.

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/
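To make the cost concrete, here is a minimal Python sketch (an illustration only, not MySQL's code; the function names are hypothetical). Joining a UTF-16 surrogate pair takes a few integer operations, and the four-byte UTF-8 case is one more branch of the same shape as the one-, two- and three-byte cases.

```python
def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 high/low surrogate pair into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    # Each surrogate carries 10 bits; the 20-bit result is offset by 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def utf8_encode(cp: int) -> bytes:
    """Encode a single Unicode scalar value as UTF-8."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # Supplementary planes: the same pattern, just one more byte.
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])
```

For example, U+1D11E MUSICAL SYMBOL G CLEF is stored in UTF-16 as D834 DD1E: combine_surrogates(0xD834, 0xDD1E) returns 0x1D11E, and utf8_encode(0x1D11E) gives the bytes F0 9D 84 9E.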

