Re: Invalid UTF-8 sequences

Marcin 'Qrczak' Kowalczyk Wed, 08 Dec 2004 14:23:23 -0800

Lars Kristan <[EMAIL PROTECTED]> writes:

> Quite close. Except for the fact that:
> * U+EE93 is represented in UTF-32 as 0x0000EE93
> * U+EE93 is represented in UTF-16 as 0xEE93
> * U+EE93 is represented in UTF-8 as 0x93 (_NOT_ 0xEE 0xBA 0x93)


Then it would be impossible to represent sequences like
U+EEEE U+EEBA U+EE93 in UTF-8: they would be encoded as 0xEE 0xBA 0x93,
which is the ordinary UTF-8 encoding of U+EE93 itself, so conversion
UTF-32 -> UTF-8 -> UTF-32 would not round-trip.
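A minimal Python sketch of the collision, under stated assumptions. The quoted proposal only shows U+EE93 mapping to the raw byte 0x93; the sketch hypothetically extends that to every code point in U+EE80..U+EEFF mapping to one raw byte 0x80..0xFF. It also uses Python's `surrogateescape` error handler as a stand-in for however the proposal would detect invalid bytes on decoding. `proposed_encode` and `proposed_decode` are illustrative names, not an existing API.

```python
ESCAPE_BASE = 0xEE00  # hypothetical: U+EE80..U+EEFF stand for raw bytes 0x80..0xFF


def proposed_encode(s: str) -> bytes:
    """Sketch of the proposed encoder: escape code points become one raw byte."""
    out = bytearray()
    for ch in s:
        cp = ord(ch)
        if ESCAPE_BASE + 0x80 <= cp <= ESCAPE_BASE + 0xFF:
            out.append(cp - ESCAPE_BASE)  # e.g. U+EE93 -> 0x93
        else:
            out += ch.encode("utf-8")  # everything else: standard UTF-8
    return bytes(out)


def proposed_decode(data: bytes) -> str:
    """Sketch of the matching decoder: invalid bytes become escape code points."""
    # surrogateescape turns each undecodable byte b into U+DC00 + b (PEP 383);
    # those are then moved into the hypothetical U+EE00 + b range.
    text = data.decode("utf-8", errors="surrogateescape")
    return "".join(
        chr(ord(c) - 0xDC00 + ESCAPE_BASE) if 0xDC80 <= ord(c) <= 0xDCFF else c
        for c in text
    )


# A lone invalid byte does survive the trip through the scheme...
assert proposed_encode(proposed_decode(b"\x93")) == b"\x93"

# ...but three escape code points collide with a valid 3-byte sequence.
original = "\uEEEE\uEEBA\uEE93"
encoded = proposed_encode(original)
print(encoded)                             # b'\xee\xba\x93'
roundtrip = proposed_decode(encoded)
print([hex(ord(c)) for c in roundtrip])    # ['0xee93']
assert roundtrip != original               # 3 code points in, 1 code point out
```

The escape bytes are exactly the bytes that can also appear inside well-formed multi-byte sequences, so no choice of decoder can tell the two readings apart.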

Concatenation of UTF-8-encoded strings would not be equivalent to
UTF-8-encoding of the concatenation of code points.
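The concatenation problem can be seen from the decoding side with a short Python sketch. It assumes, hypothetically, that the proposal decodes each invalid byte b as the code point U+EE00 + b (the post only shows this for 0x93). Python's `surrogateescape` handler is used as a stand-in for invalid-byte detection, and `proposed_decode` is an illustrative name. Two byte strings that are each invalid on their own decode to escape code points, but their concatenation is valid UTF-8 and decodes to something else entirely.

```python
def proposed_decode(data: bytes) -> str:
    # Hypothetical decoder for the proposal: valid UTF-8 decodes normally, and
    # each invalid byte b (0x80..0xFF) becomes the escape code point U+EE00 + b.
    text = data.decode("utf-8", errors="surrogateescape")  # invalid b -> U+DC00 + b
    return "".join(
        chr(ord(c) - 0xDC00 + 0xEE00) if 0xDC80 <= ord(c) <= 0xDCFF else c
        for c in text
    )


a, b = b"\xee", b"\xba\x93"  # each piece on its own is invalid UTF-8
pieces = proposed_decode(a) + proposed_decode(b)   # U+EEEE U+EEBA U+EE93
whole = proposed_decode(a + b)                     # U+EE93 only
assert pieces != whole  # decoding does not commute with concatenation
```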

This is broken.

-- 
   __("<         Marcin Kowalczyk
   \__/       [EMAIL PROTECTED]
    ^^     http://qrnik.knm.org.pl/~qrczak/
