Jill Ramonsky wrote:

> What is a conforming application supposed to do if, when decoding a
> UTF-8 stream (or indeed a UTF-32 stream, etc.), it encounters a
> sequence of bytes which decodes to U+D800, U+DF00?
It should recognize that the text is not UTF-8 at all, but rather CESU-8
(see UTR #26), whereupon it should burst into uncontrollable peals of
laughter.

Serious answer: It should recognize that the text is *ill-formed* UTF-8
(definition D30) and should probably decline to process the two code
points. If it wants to be more charitable than conformant, it MAY choose
to reassemble them into U+10300 (the code point this surrogate pair
represents in UTF-16), but it is under no obligation to do so.

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/
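For illustration, here is a minimal Python sketch of the two options described above: the conformant path that rejects the encoded surrogates, and the "charitable" path that pairs them up. The function names are hypothetical. The sketch relies on CPython's built-in codecs and its "surrogatepass" error handler, and it assumes the input is exactly the six bytes ED A0 80 ED BC 80, which are U+D800 and U+DF00 each encoded as a 3-byte sequence.

```python
# U+D800 and U+DF00, each encoded as a 3-byte sequence (CESU-8 style).
RAW = b"\xed\xa0\x80\xed\xbc\x80"


def decode_strict(data: bytes) -> str:
    """Conformant path: surrogates are not scalar values, so this is ill-formed UTF-8."""
    # Python's UTF-8 codec raises UnicodeDecodeError on encoded surrogates.
    return data.decode("utf-8")


def combine_surrogates(hi: int, lo: int) -> int:
    """Standard UTF-16 arithmetic: high + low surrogate -> supplementary code point."""
    if not (0xD800 <= hi <= 0xDBFF and 0xDC00 <= lo <= 0xDFFF):
        raise ValueError("not a high/low surrogate pair")
    return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)


def decode_lenient(data: bytes) -> str:
    """Charitable path: accept the encoded surrogates, then reassemble the pair."""
    # 'surrogatepass' lets the encoded surrogates through as lone code points.
    text = data.decode("utf-8", errors="surrogatepass")
    out = []
    i = 0
    while i < len(text):
        cp = ord(text[i])
        nxt = ord(text[i + 1]) if i + 1 < len(text) else None
        if 0xD800 <= cp <= 0xDBFF and nxt is not None and 0xDC00 <= nxt <= 0xDFFF:
            out.append(chr(combine_surrogates(cp, nxt)))
            i += 2
        elif 0xD800 <= cp <= 0xDFFF:
            # An unpaired surrogate cannot be reassembled; refuse it.
            raise ValueError(f"unpaired surrogate U+{cp:04X}")
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    try:
        decode_strict(RAW)
    except UnicodeDecodeError as exc:
        print("strict:", exc.reason)
    print("lenient:", f"U+{ord(decode_lenient(RAW)):04X}")
```

The strict function is the behavior the reply recommends. The lenient one shows what "more charitable than conformant" would look like, and it still refuses unpaired surrogates, since those have no code point to reassemble into.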

