Re: [PHP-DEV] UTF-8 encoding

Stig Venaas Sun, 25 Aug 2002 11:55:05 -0700

On Sun, Aug 25, 2002 at 06:28:44PM +0100, Wez Furlong wrote:
> Hi Stefan,
> 
> I borrowed that code from the mbstring extension.  Either I misinterpreted
> the code, or mbstring also has its utf-8 decoder incorrect.
> 
> --Wez.
> 
> On 08/25/02, "Stefan Esser" <[EMAIL PROTECTED]> wrote:
> > Hello,
> > 
> > html.c / get_next_char() has a utf-8 decoder. The implementation
> > is a little bit fishy. AFAIK utf-8 sequences are 1 to 4 bytes long,
> > but this one supports 5 and 6 byte utf-8 sequences. I wonder
> > where this addition to the standard is defined.
> > The problem is the following: the German u-umlaut (ue) is 0xFC in
> > ISO-8859-1, which is an invalid utf-8 sequence. But the utf-8 decoder
> > would recognise it as the lead byte of a 6 byte utf-8 sequence.


I wonder too, but it would still be recognized as invalid (or should
be, I haven't checked the code), unless the next 5 bytes are all
continuation bytes, i.e. have values from 128 to 191 (0x80 to 0xBF).
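
To illustrate the point about the following bytes, here is a minimal
sketch in Python. It is not the code from html.c or mbstring; the
function names are made up for illustration, and it assumes the older
UTF-8 definition (RFC 2279), which allowed sequences of up to 6 bytes.
It only checks the structure of a sequence (lead byte plus continuation
bytes), not overlong encodings or code point ranges.

```python
def legacy_sequence_length(lead: int) -> int:
    """Length implied by a lead byte under the older 6-byte UTF-8 scheme.

    Returns 0 if the byte cannot start a sequence.
    (Hypothetical helper, not taken from html.c.)
    """
    if lead < 0x80:
        return 1
    if 0xC0 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF7:
        return 4
    if 0xF8 <= lead <= 0xFB:
        return 5
    if 0xFC <= lead <= 0xFD:
        return 6
    return 0  # 0x80-0xBF are continuation bytes; 0xFE and 0xFF are never valid


def is_continuation(byte: int) -> bool:
    """Continuation bytes have the bit pattern 10xxxxxx, i.e. 128..191."""
    return 0x80 <= byte <= 0xBF


def accepts_legacy(data: bytes) -> bool:
    """True if data starts with a structurally complete sequence."""
    if not data:
        return False
    length = legacy_sequence_length(data[0])
    if length == 0 or len(data) < length:
        return False
    return all(is_continuation(b) for b in data[1:length])


# Latin-1 text: 0xFC (u-umlaut) followed by plain ASCII is rejected,
# because 'r' (0x72) is not a continuation byte.
latin1_rejected = accepts_legacy(b"\xfcr die Welt")

# 0xFC followed by five bytes in 0x80..0xBF passes the structural check.
six_byte_accepted = accepts_legacy(b"\xfc\x80\x80\x80\x80\x80")
```

With these inputs, latin1_rejected is False and six_byte_accepted is
True, so a lone 0xFC in ordinary Latin-1 text should only slip through
when five continuation-range bytes happen to follow it.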

BTW It seems that for some reason I can't post to php-dev anymore,
at least some of you get this...

Stig


-- 
PHP Development Mailing List <http://www.php.net/>
To unsubscribe, visit: http://www.php.net/unsub.php
