On Sun, Aug 25, 2002 at 06:28:44PM +0100, Wez Furlong wrote:
> Hi Stefan,
>
> I borrowed that code from the mbstring extension. Either I misinterpreted
> the code, or mbstring also has its utf-8 decoder incorrect.
>
> --Wez.
>
> On 08/25/02, "Stefan Esser" <[EMAIL PROTECTED]> wrote:
> > Hello,
> >
> > html.c / get_next_char() has a utf-8 decoder. The implementation
> > is a little bit fishy. AFAIK utf-8 sequences are 1 up to 4 chars
> > but this one supports 5, 6 byte utf-8 sequences. I wonder
> > where this addition to the standard is defined..
> > The problem is the following: the German ue is 0xFC which is an
> > invalid utf-8 sequence. But the utf-8 decoder would recognise it
> > as the lead byte of a 6 byte utf-8 sequence.

I wonder too, but it would still be recognized as invalid (or should be; I
haven't checked the code), unless the next 5 bytes all have values from
128 to 191 (0x80 to 0xBF), i.e. they all look like continuation bytes.

BTW, it seems that for some reason I can't post to php-dev anymore; I hope
at least some of you get this...

Stig

--
PHP Development Mailing List <http://www.php.net/>
To unsubscribe, visit: http://www.php.net/unsub.php
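
To make the point about 0xFC concrete, here is a minimal sketch in C of the check being discussed. It is not the code from html.c or mbstring, and the names utf8_lead_len() and utf8_seq_check() are made up for illustration. It assumes the older 1-to-6-byte UTF-8 layout (as described in RFC 2279, as far as I know), where 0xFC is the lead byte of a 6-byte sequence, and it only tests structure: it does not reject overlong forms or out-of-range code points.

```c
#include <stddef.h>

/* Hypothetical helper: sequence length implied by a lead byte under the
 * older 1-to-6-byte UTF-8 layout. Returns 0 for a byte that cannot
 * start a sequence (a continuation byte, or 0xFE/0xFF). */
static size_t utf8_lead_len(unsigned char lead)
{
    if (lead < 0x80) return 1;   /* 0xxxxxxx */
    if (lead < 0xC0) return 0;   /* 10xxxxxx: continuation byte */
    if (lead < 0xE0) return 2;   /* 110xxxxx */
    if (lead < 0xF0) return 3;   /* 1110xxxx */
    if (lead < 0xF8) return 4;   /* 11110xxx */
    if (lead < 0xFC) return 5;   /* 111110xx */
    if (lead < 0xFE) return 6;   /* 1111110x */
    return 0;                    /* 0xFE and 0xFF are never valid */
}

/* Hypothetical helper: returns the length of the structurally
 * well-formed sequence starting at s, or 0 if the input is truncated
 * or any trailing byte is outside 0x80..0xBF. */
static size_t utf8_seq_check(const unsigned char *s, size_t avail)
{
    size_t i, len;

    if (avail == 0) return 0;
    len = utf8_lead_len(s[0]);
    if (len == 0 || len > avail) return 0;
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;   /* not 10xxxxxx */
    }
    return len;
}
```

With this, a Latin-1 0xFC followed by ordinary ASCII text is rejected, because the trailing bytes are not continuation bytes. It is only accepted as a 6-byte sequence when the next five bytes all fall in 0x80..0xBF. A decoder that wants to accept at most 4-byte sequences could simply return 0 from utf8_lead_len() for lead bytes 0xF8 and above.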