Re: UTF-8 ill-formed question

Ian Clifton Tue, 11 Dec 2012 13:06:00 -0800

From: James Lin <James_Lin_at_symantec.com>
> Hi
> Does anyone know why ill-form occurred on the UTF-8? besides it
> doesn't follow the pattern of UTF-8 byte-sequences, i just
> wondering how or why?


There’s a lot about the conditions for the well‐formedness of UTF-8
sequences in Chapter 3 of the Standard:

http://www.unicode.org/versions/Unicode6.2.0/ch03.pdf

Basically, a header byte starting with 𝑛 1-bits (2 ≤ 𝑛 ≤ 4) and a 0-bit
must be followed by 𝑛−1 trailer bytes starting 10…, and that’s the only
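place such trailer bytes should occur. Even if these conditions hold,
however, a UTF-8 sequence might still be ill‐formed: overlong encodings,
surrogate code points and values above U+10FFFF are all excluded.
Table 3-7 exhaustively lists the well‐formed byte sequences.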
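
To make the rule concrete, here is a rough Python sketch of a checker
that follows the byte ranges of Table 3-7. The function name
is_well_formed_utf8 is my own invention for illustration, not anything
defined by the Standard, and the ranges are transcribed by hand from
the table, so please check them against Chapter 3 before relying on it.

```python
# Allowed trailer-byte ranges for each header byte, transcribed by hand
# from Table 3-7 (an assumption to verify against the Standard).
TAIL = (0x80, 0xBF)  # the ordinary 10xxxxxx trailer range


def trailer_ranges(header: int):
    """Return the allowed trailer ranges for a header byte, or None."""
    if 0xC2 <= header <= 0xDF:
        return [TAIL]
    if header == 0xE0:
        return [(0xA0, 0xBF), TAIL]        # excludes overlong 3-byte forms
    if 0xE1 <= header <= 0xEC or 0xEE <= header <= 0xEF:
        return [TAIL, TAIL]
    if header == 0xED:
        return [(0x80, 0x9F), TAIL]        # excludes surrogates D800..DFFF
    if header == 0xF0:
        return [(0x90, 0xBF), TAIL, TAIL]  # excludes overlong 4-byte forms
    if 0xF1 <= header <= 0xF3:
        return [TAIL, TAIL, TAIL]
    if header == 0xF4:
        return [(0x80, 0x8F), TAIL, TAIL]  # excludes values above U+10FFFF
    # C0, C1, F5..FF never appear; 80..BF here is a stray trailer byte.
    return None


def is_well_formed_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if data is a well-formed UTF-8 sequence."""
    i = 0
    while i < len(data):
        header = data[i]
        if header <= 0x7F:                 # single-byte (ASCII) case
            i += 1
            continue
        ranges = trailer_ranges(header)
        if ranges is None:
            return False
        tail = data[i + 1 : i + 1 + len(ranges)]
        if len(tail) != len(ranges):       # sequence truncated at the end
            return False
        for byte, (low, high) in zip(tail, ranges):
            if not low <= byte <= high:
                return False
        i += 1 + len(ranges)
    return True
```

For example, is_well_formed_utf8(b"\xc3\xa9") should be true, while
b"\xc0\xaf" (overlong), b"\xed\xa0\x80" (a surrogate) and
b"\xf4\x90\x80\x80" (beyond U+10FFFF) should all be rejected even though
each has the right number of 10… trailer bytes.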

-- 
Ian ◎
