On 8/28/2013 3:29 PM, Xue Fuqiao wrote:
I see.  Thanks for all your replies!

BTW I have a further question:

On Wed, Aug 28, 2013 at 1:44 PM, Philippe Verdy <[email protected]> wrote:
- in UTF-8, you'll need to look backward between 1 to 3 positions before
your start position to find the leading 8-bit code unit (>= 0xC0).
Why should this be >=0xC0?

Because all trailing bytes have the pattern 10xxxxxx, which is < 11000000 (0xC0) for any value of x. (The bits marked x can take any bit combination, while the first two bits are constant.)

So, if you see a byte >= 0xC0, you know that you are on a leading byte.

(Single bytes, those < 0x80, don't need any backup: if your pointer points to one of them,
you are at a character boundary anyway.)
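
To make the backup rule concrete, here is a minimal sketch in Python. The function name char_start and the sample string are made up for illustration, and the code assumes the buffer holds well-formed UTF-8; it does no validation.

```python
def char_start(buf: bytes, pos: int) -> int:
    """Return the index of the first byte of the UTF-8 sequence containing buf[pos].

    Hypothetical helper; assumes buf is well-formed UTF-8.
    """
    start = pos
    # Trailing bytes match 10xxxxxx, i.e. (byte & 0xC0) == 0x80.
    # Back up at most 3 positions, since a sequence is at most 4 bytes long.
    while start > 0 and pos - start < 3 and (buf[start] & 0xC0) == 0x80:
        start -= 1
    # buf[start] is now either a single byte (< 0x80) or a leading byte (>= 0xC0).
    return start


# 'a' is 1 byte, the euro sign is 3 bytes (E2 82 AC), 'b' is 1 byte.
sample = "a\u20acb".encode("utf-8")
starts = [char_start(sample, i) for i in range(len(sample))]
```

For the sample above, positions 1, 2 and 3 all map back to index 1, the leading byte 0xE2, while positions 0 and 4 are already on a character boundary.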

A./
