Re: UTF-16 inside UTF-8

Doug Ewell Tue, 04 Nov 2003 10:35:54 -0800

Jill Ramonsky wrote:

> What is a conforming application supposed to do if, when decoding a
> UTF-8 stream (or indeed a UTF-32 stream, etc.), it encounters a
> sequence of bytes which decodes to U+D800, U+DF00 ?


It should recognize that the text is not UTF-8 at all, but rather CESU-8
(see UTR #26), whereupon it should burst into uncontrollable peals of
laughter.

Serious answer: It should recognize that the text is *ill-formed* UTF-8
(definition D30) and should probably decline to process the two code
points.  If it wants to be more charitable than conformant, it MAY
choose to reassemble them to create U+10300, but it is under no
obligation to do so.
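
For anyone who wants to see both behaviors side by side, here is a
minimal sketch in Python 3. The helper name combine_surrogates is my own
invention for illustration, and the "charitable" path leans on Python's
"surrogatepass" error handler. Treat it as one possible approach rather
than anything the standard prescribes.

```python
def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 surrogate pair into one supplementary code point.

    Hypothetical helper for illustration only.
    """
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# The six bytes that "decode" to U+D800, U+DF00 (CESU-8 style).
data = b"\xed\xa0\x80\xed\xbc\x80"

# Conformant behavior: a strict UTF-8 decoder rejects the sequence.
try:
    data.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# Charitable behavior: let the surrogates through, then reassemble them.
lenient = data.decode("utf-8", errors="surrogatepass")
repaired = chr(combine_surrogates(ord(lenient[0]), ord(lenient[1])))
```

The arithmetic is the usual UTF-16 one: 0x10000 + ((0xD800 - 0xD800) << 10)
+ (0xDF00 - 0xDC00) = 0x10300.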

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/
