In a message dated 11/4/2003 6:44:05 AM Pacific Standard Time, [EMAIL PROTECTED] writes:
Hi,

What is a conforming application supposed to do if, when decoding a UTF-8 stream (or indeed a UTF-32 stream, etc.), it encounters a sequence of bytes which decodes to U+D800, U+DF00 ?

Of course, if such a sequence were encountered during UTF-16 processing it would be pretty obvious, but I'm not talking UTF-16 any more. At least, not directly. Nonetheless, such a sequence could arise if Application A encodes text to a file using UTF-16, which is then read by Application B (a very old, legacy application, unaware of the existence of codepoints above U+FFFF) and re-saved in UTF-8.
It is clear that Application B does not conform to Unicode 3.2 or Unicode 4.0, right?
It is clear that Application A does conform to Unicode 3.2 or Unicode 4.0, right?
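
To make the scenario concrete, here is a rough sketch of what a legacy application like B might be doing. The function name legacy_ucs2_to_utf8 is made up for illustration, and the assumption is that B treats every 16-bit code unit as a complete character and encodes each one to UTF-8 on its own:

```python
def legacy_ucs2_to_utf8(utf16_le: bytes) -> bytes:
    """Hypothetical legacy converter: encodes each 16-bit code unit
    separately, with no knowledge of surrogate pairs."""
    if len(utf16_le) % 2:
        raise ValueError("odd number of bytes in UTF-16 input")
    out = bytearray()
    for i in range(0, len(utf16_le), 2):
        unit = utf16_le[i] | (utf16_le[i + 1] << 8)  # little-endian code unit
        if unit < 0x80:
            out.append(unit)
        elif unit < 0x800:
            out.append(0xC0 | (unit >> 6))
            out.append(0x80 | (unit & 0x3F))
        else:
            # Surrogates D800..DFFF land here and get 3 bytes each,
            # which is exactly the ill-formed output in question.
            out.append(0xE0 | (unit >> 12))
            out.append(0x80 | ((unit >> 6) & 0x3F))
            out.append(0x80 | (unit & 0x3F))
    return bytes(out)


# U+10300 is the code point represented by the pair <D800 DF00> in UTF-16.
from_a = "\U00010300".encode("utf-16-le")
print(legacy_ucs2_to_utf8(from_a).hex(" "))   # 6 bytes: two 3-byte sequences
print("\U00010300".encode("utf-8").hex(" "))  # 4 bytes: the well-formed form
```

The first result is six bytes (two 3-byte sequences), while the well-formed UTF-8 form of the same character is a single 4-byte sequence.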
 
If you have an application C that reads whatever application B writes, then it should not accept the ill-formed UTF-8 sequence that uses 3 bytes to encode U+D800 and another 3 bytes to encode U+DF00. This is clear in both Unicode 3.2 and Unicode 4.0.
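
As a minimal sketch of what application C could do (the helper names find_encoded_surrogate and decode_strict are hypothetical, not from any standard API): in UTF-8 the byte ED can only be a lead byte, and a surrogate code point would show up as ED followed by a byte in the range A0..BF, so a simple scan is enough to detect it before decoding.

```python
def find_encoded_surrogate(data: bytes) -> int:
    """Return the offset of the first byte sequence that would decode to a
    surrogate code point (U+D800..U+DFFF), or -1 if there is none."""
    for i in range(len(data) - 1):
        # Lead byte ED followed by A0..BF would decode to D800..DFFF.
        if data[i] == 0xED and 0xA0 <= data[i + 1] <= 0xBF:
            return i
    return -1


def decode_strict(data: bytes) -> str:
    """Decode UTF-8, rejecting encoded surrogates with an explicit error."""
    offset = find_encoded_surrogate(data)
    if offset != -1:
        raise ValueError(
            f"ill-formed UTF-8: encoded surrogate at byte offset {offset}"
        )
    # Python's own strict UTF-8 decoder also refuses such sequences;
    # the scan above just makes the reason for the rejection explicit.
    return data.decode("utf-8")


print(decode_strict(b"\xf0\x90\x8c\x80") == "\U00010300")  # well-formed, accepted
try:
    decode_strict(b"\xed\xa0\x80\xed\xbc\x80")  # U+D800, U+DF00 as 3 bytes each
except ValueError as err:
    print(err)
```

Whether C then stops with an error or substitutes U+FFFF-style replacement characters is a separate design choice; the sketch only shows the rejection itself.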


This question generalises to ... should all encoding schemes treat surrogate pairs as surrogate pairs, or just UTF-16 ?

This question generalises further still, to ... do the phrases "surrogate character" and "surrogate pair" have any meaning whatsoever outside UTF-16?
 
==================================
Frank Yung-Fong Tang
System Architect, International Development, AOL Interactive Services
AIM:yungfongta mailto:[EMAIL PROTECTED] Tel:650-937-2913
Yahoo! Msg: frankyungfongtan

John 3:16 "For God so loved the world that he gave his one and only Son, that whoever believes in him shall not perish but have eternal life."

Does your software display Thai language text correctly for Thailand users?
-> Basic Concept of Thai Language, linked from Frank Tang's Internationalization Secrets
Want to translate your English text to something Thailand users can understand ?
-> Try English-to-Thai machine translation at http://c3po.links.nectec.or.th/parsit/
