Re: Unicode 4.0 BETA available for review

Stefan Persson Thu, 27 Feb 2003 11:33:18 -0800

Kenneth Whistler wrote:

Unicode 3.0 defined non-shortest UTF-8 as *irregular* code value
sequences. There were two types:
a. 0xC0 0x80 for U+0000 (instead of 0x00)
b. 0xED 0xA0 0x80 0xED 0xB0 0x80 for U+10000 (instead of 0xF0 0x90 0x80 0x80)
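
As a rough illustration of the two byte patterns quoted above, here is a minimal Python sketch. The helper names (encode_fixed_width, surrogate_pair, strict_utf8_accepts) are hypothetical and made up for this example; they are not part of any library or of the Unicode Standard. The sketch builds each sequence by hand and then checks that Python's built-in strict UTF-8 decoder refuses both.

```python
def encode_fixed_width(cp, nbytes):
    """Hypothetical helper: pack code point `cp` into exactly `nbytes`
    UTF-8-style bytes (2 to 4), even when a shorter form exists."""
    lead_marks = {2: 0xC0, 3: 0xE0, 4: 0xF0}
    payload_bits = {2: 11, 3: 16, 4: 21}
    if nbytes not in lead_marks:
        raise ValueError("nbytes must be 2, 3 or 4")
    if cp < 0 or cp >= (1 << payload_bits[nbytes]):
        raise ValueError("code point does not fit in that many bytes")
    out = []
    for _ in range(nbytes - 1):
        out.append(0x80 | (cp & 0x3F))  # trailing byte: 10xxxxxx
        cp >>= 6
    out.append(lead_marks[nbytes] | cp)  # lead byte carries the top bits
    return bytes(reversed(out))


def surrogate_pair(cp):
    """Hypothetical helper: split a supplementary code point
    (U+10000..U+10FFFF) into its UTF-16 high and low surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only supplementary code points have a pair")
    v = cp - 0x10000
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)


def strict_utf8_accepts(data):
    """True if Python's strict UTF-8 decoder accepts `data`."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Type (a): U+0000 written in two bytes instead of one.
overlong_nul = encode_fixed_width(0x0000, 2)

# Type (b): U+10000 written as two 3-byte encoded surrogates.
high, low = surrogate_pair(0x10000)
six_byte_form = encode_fixed_width(high, 3) + encode_fixed_width(low, 3)

# The shortest form of U+10000, for comparison.
shortest_form = "\U00010000".encode("utf-8")

print(overlong_nul.hex(" "), strict_utf8_accepts(overlong_nul))
print(six_byte_form.hex(" "), strict_utf8_accepts(six_byte_form))
print(shortest_form.hex(" "), strict_utf8_accepts(shortest_form))
```

Only the last of the three sequences is accepted by the strict decoder; the first two are exactly the forms the quoted text lists as (a) and (b).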

Ah, but encoding NULL as a surrogate pair and then encoding each of those two surrogates as three bytes, for a total of six bytes per character, would also be technically possible (though not legal), right?

Stefan

