From: James Lin <James_Lin_at_symantec.com>
> Hi
> Does anyone know why ill-form occurred on the UTF-8? besides it doesn't follow
> the pattern of UTF-8 byte-sequences, i just wondering how or why?

There’s a lot about the conditions for the well‐formedness of UTF-8 sequences in Chapter 3 of the Standard:

http://www.unicode.org/versions/Unicode6.2.0/ch03.pdf

Basically, a header byte starting with 𝑛 1-bits (2 ≤ 𝑛 ≤ 4) and a 0-bit must be followed by 𝑛−1 trailer bytes starting 10…, and that’s the only place such trailer bytes should occur.

Even if these conditions hold, however, a UTF-8 sequence might still be ill‐formed: for example, an overlong encoding such as C0 80, an encoded surrogate, or a value above U+10FFFF. Table 3-7 exhaustively lists the well‐formed byte sequences, and anything that does not match it is ill‐formed.

--
Ian ◎
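
To make the Table 3-7 rules concrete, here is a minimal Python sketch of a checker. The names TABLE_3_7 and is_well_formed_utf8 are made up for this illustration, and the byte ranges are transcribed from the table as I read it, so check them against the Standard before relying on them.

```python
# Each row: allowed range for the lead byte, then for each following byte.
# Transcribed from Table 3-7 (Well-Formed UTF-8 Byte Sequences).
TABLE_3_7 = [
    ((0x00, 0x7F),),                                           # U+0000..U+007F
    ((0xC2, 0xDF), (0x80, 0xBF)),                              # U+0080..U+07FF
    ((0xE0, 0xE0), (0xA0, 0xBF), (0x80, 0xBF)),                # U+0800..U+0FFF
    ((0xE1, 0xEC), (0x80, 0xBF), (0x80, 0xBF)),                # U+1000..U+CFFF
    ((0xED, 0xED), (0x80, 0x9F), (0x80, 0xBF)),                # U+D000..U+D7FF
    ((0xEE, 0xEF), (0x80, 0xBF), (0x80, 0xBF)),                # U+E000..U+FFFF
    ((0xF0, 0xF0), (0x90, 0xBF), (0x80, 0xBF), (0x80, 0xBF)),  # U+10000..U+3FFFF
    ((0xF1, 0xF3), (0x80, 0xBF), (0x80, 0xBF), (0x80, 0xBF)),  # U+40000..U+FFFFF
    ((0xF4, 0xF4), (0x80, 0x8F), (0x80, 0xBF), (0x80, 0xBF)),  # U+100000..U+10FFFF
]


def is_well_formed_utf8(data: bytes) -> bool:
    """Return True if data consists only of sequences matching Table 3-7."""
    i = 0
    while i < len(data):
        # Find the row whose lead-byte range contains the current byte.
        for row in TABLE_3_7:
            lo, hi = row[0]
            if lo <= data[i] <= hi:
                break
        else:
            # C0, C1, F5..FF, or a trailer byte 80..BF with no header.
            return False
        chunk = data[i:i + len(row)]
        if len(chunk) < len(row):
            return False  # sequence is cut off before its last trailer byte
        if not all(lo <= b <= hi for b, (lo, hi) in zip(chunk, row)):
            return False  # a following byte is outside its allowed range
        i += len(row)
    return True
```

With this sketch, is_well_formed_utf8(b"\xe2\x82\xac") accepts the encoding of U+20AC, while b"\xc0\x80" (overlong), b"\xed\xa0\x80" (surrogate) and b"\xf4\x90\x80\x80" (above U+10FFFF) are rejected even though each has the right number of trailer bytes.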

