Ian Clifton <ian dot clifton at chem dot ox dot ac dot uk> wrote:

>> Does anyone know why ill-form occurred on the UTF-8? besides it
>> doesn't follow the pattern of UTF-8 byte-sequences, i just
>> wondering how or why?
>
> There’s a lot about the conditions for the well-formedness of UTF-8
> sequences in Chapter 3 of the Standard:
>
> [...]
>
> Even if these conditions hold, however, a UTF-8 sequence might still
> be ill-formed, Table 3-7 exhaustively lists all the cases.
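For readers who want to check a byte string against that table mechanically, here is a minimal sketch in Python. The ranges are transcribed from the "Well-Formed UTF-8 Byte Sequences" table in Chapter 3. The names `WELL_FORMED` and `is_well_formed_utf8` are hypothetical and do not come from the Standard or from any library. Because the lead-byte ranges in the table do not overlap, at most one row can apply at each position.

```python
# Rows of the well-formed UTF-8 table: allowed (low, high) per byte position.
WELL_FORMED = [
    [(0x00, 0x7F)],                                            # U+0000..U+007F
    [(0xC2, 0xDF), (0x80, 0xBF)],                              # U+0080..U+07FF
    [(0xE0, 0xE0), (0xA0, 0xBF), (0x80, 0xBF)],                # U+0800..U+0FFF
    [(0xE1, 0xEC), (0x80, 0xBF), (0x80, 0xBF)],                # U+1000..U+CFFF
    [(0xED, 0xED), (0x80, 0x9F), (0x80, 0xBF)],                # U+D000..U+D7FF
    [(0xEE, 0xEF), (0x80, 0xBF), (0x80, 0xBF)],                # U+E000..U+FFFF
    [(0xF0, 0xF0), (0x90, 0xBF), (0x80, 0xBF), (0x80, 0xBF)],  # U+10000..U+3FFFF
    [(0xF1, 0xF3), (0x80, 0xBF), (0x80, 0xBF), (0x80, 0xBF)],  # U+40000..U+FFFFF
    [(0xF4, 0xF4), (0x80, 0x8F), (0x80, 0xBF), (0x80, 0xBF)],  # U+100000..U+10FFFF
]


def is_well_formed_utf8(data: bytes) -> bool:
    """Return True if data splits entirely into rows of the table above."""
    i = 0
    while i < len(data):
        # Find the single row whose lead-byte range contains data[i].
        row = next((r for r in WELL_FORMED if r[0][0] <= data[i] <= r[0][1]), None)
        if row is None:
            return False  # 80..C1 or F5..FF can never start a sequence
        if i + len(row) > len(data):
            return False  # sequence truncated by end of input
        for k, (lo, hi) in enumerate(row):
            if not lo <= data[i + k] <= hi:
                return False  # e.g. overlong form, surrogate, or > U+10FFFF
        i += len(row)
    return True
```

With this, `<E4 BA 8C>` passes, while `<C0 80>` (overlong) and `<ED A0 80>` (a surrogate) fail even though both look superficially like multi-byte patterns.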
But the bottom line is, there's nothing ill-formed about James' original
example. It's perfectly good UTF-8. The visual similarity between the
digits in U+4E8C and the first and last bytes in <E4 BA 8C> is mostly
coincidental.

--
Doug Ewell | Thornton, Colorado, USA
http://www.ewellic.org | @DougEwell
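To see where <E4 BA 8C> comes from, a minimal sketch of the three-byte UTF-8 bit layout (1110xxxx 10xxxxxx 10xxxxxx) applied to U+4E8C follows. The helper name `encode_utf8_3byte` is hypothetical. The layout shows which part of the resemblance is structural and which is chance. For any code point in U+0800..U+FFFF, the first byte is E0 plus the first hex digit, so "E4" follows from the "4" in "4E8C" by construction. The last byte is 80 plus the low six bits, so it repeats the code point's low byte (8C) only when bits 7 and 6 of the code point happen to be 1 and 0. U+4E00, for example, encodes as <E4 B8 80>, not <E4 .. 00>.

```python
def encode_utf8_3byte(cp: int) -> bytes:
    """Encode a code point in U+0800..U+FFFF (excluding surrogates) as UTF-8."""
    if not 0x0800 <= cp <= 0xFFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a three-byte, non-surrogate code point")
    return bytes([
        0xE0 | (cp >> 12),          # 1110xxxx: top 4 bits (the first hex digit)
        0x80 | ((cp >> 6) & 0x3F),  # 10xxxxxx: middle 6 bits
        0x80 | (cp & 0x3F),         # 10xxxxxx: low 6 bits
    ])


encoded = encode_utf8_3byte(0x4E8C)
print(encoded.hex(" ").upper())  # E4 BA 8C
# Cross-check against Python's built-in encoder.
assert encoded == "\u4e8c".encode("utf-8")
```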

