RE: UTF-8 ill-formed question

Doug Ewell Tue, 11 Dec 2012 13:43:24 -0800

Ian Clifton <ian dot clifton at chem dot ox dot ac dot uk> wrote:

>> Does anyone know why ill-form occurred on the UTF-8? besides it
>> doesn't follow the pattern of UTF-8 byte-sequences, i just
>> wondering how or why?
>
> There’s a lot about the conditions for the well-formedness of UTF-8
> sequences in Chapter 3 of the Standard:
>
> [...]
>
> Even if these conditions hold, however, a UTF-8 sequence might still
> be ill-formed; Table 3-7 exhaustively lists all the cases.


But the bottom line is, there's nothing ill-formed about James' original
example. It's perfectly good UTF-8. The visual similarity between the
digits in U+4E8C and the first and last bytes in <E4 BA 8C> is mostly
coincidental.
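
For anyone who wants to check this by hand, here is a minimal Python
sketch. The helper name encode_3byte is my own, purely for illustration;
it packs the 16 bits of a code point into the three-byte pattern
1110xxxx 10xxxxxx 10xxxxxx and then compares the result against
Python's built-in strict UTF-8 codec.

```python
def encode_3byte(cp: int) -> bytes:
    """Encode a scalar value in U+0800..U+FFFF (surrogates excluded) as UTF-8.

    Hypothetical helper for illustration only; real code should just
    call str.encode("utf-8").
    """
    if not 0x0800 <= cp <= 0xFFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a three-byte scalar value")
    return bytes([
        0xE0 | (cp >> 12),          # 1110xxxx: top 4 bits of the code point
        0x80 | ((cp >> 6) & 0x3F),  # 10xxxxxx: middle 6 bits
        0x80 | (cp & 0x3F),         # 10xxxxxx: low 6 bits
    ])


encoded = encode_3byte(0x4E8C)
hex_bytes = " ".join(f"{b:02X}" for b in encoded)
print(hex_bytes)  # E4 BA 8C

# The built-in codec agrees, and a strict decode would raise
# UnicodeDecodeError if the sequence were ill-formed.
assert encoded == "\u4e8c".encode("utf-8")
assert encoded.decode("utf-8") == "\u4e8c"
```

The E in E4 is the fixed 1110 lead-byte marker and the 4 is the top
nibble of 4E8C. The trailing 8C matches only because the low byte of
the code point happens to have 10 as its top two bits, which is the
same as the continuation-byte marker.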

--
Doug Ewell | Thornton, Colorado, USA
http://www.ewellic.org | @DougEwell
