Re: [PHP-DEV] UTF-8 encoding
On Sun, Aug 25, 2002 at 09:21:01PM +0200, Stig Venaas wrote:
> Great, I've been wondering why UTF-8 wasn't defined like that in the
> first place. Could you please give me a pointer to the addition?

It is defined in RFC 2279.

Regards,
Stefan

-- 
PHP Development Mailing List http://www.php.net/
To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] UTF-8 encoding
Hi Stefan,

I borrowed that code from the mbstring extension. Either I misinterpreted the code, or mbstring also has its UTF-8 decoder wrong.

--Wez.

On 08/25/02, Stefan Esser [EMAIL PROTECTED] wrote:
> Hello,
>
> html.c / get_next_char() has a UTF-8 decoder. The implementation is a
> little bit fishy. AFAIK UTF-8 sequences are 1 to 4 bytes long, but this
> one supports 5 and 6 byte sequences. I wonder where this addition to
> the standard is defined.
>
> The problem is the following: the German ue (0xFC in ISO-8859-1) is an
> invalid UTF-8 sequence, but the decoder would recognise it as the lead
> byte of a 6 byte UTF-8 sequence.
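As an illustration of the lead-byte behaviour Stefan describes, here is a minimal Python sketch of how a decoder following RFC 2279 would map a lead byte to a sequence length. It is not the code from html.c or mbstring; the function name utf8_sequence_length is made up for this example.

```python
def utf8_sequence_length(lead: int) -> int:
    """Return the sequence length a lead byte implies under RFC 2279.

    Hypothetical helper for illustration only. Returns 0 for bytes that
    cannot start a sequence: continuation bytes (0x80-0xBF) and the
    unused values 0xFE and 0xFF.
    """
    if lead < 0x80:
        return 1  # 0xxxxxxx: plain ASCII
    if lead < 0xC0:
        return 0  # 10xxxxxx: continuation byte, not a lead byte
    if lead < 0xE0:
        return 2  # 110xxxxx
    if lead < 0xF0:
        return 3  # 1110xxxx
    if lead < 0xF8:
        return 4  # 11110xxx
    if lead < 0xFC:
        return 5  # 111110xx
    if lead < 0xFE:
        return 6  # 1111110x
    return 0      # 0xFE and 0xFF never appear in UTF-8


# 0xFC is "ü" in ISO-8859-1, but here it looks like a 6 byte lead byte.
print(utf8_sequence_length(0xFC))  # prints 6
```

The 4 byte limit Stefan recalls is what the later RFC 3629 specifies; under that rule (among other restrictions) the branches returning 5 and 6 would report an invalid lead byte instead.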
Re: [PHP-DEV] UTF-8 encoding
On Sun, Aug 25, 2002 at 06:28:44PM +0100, Wez Furlong wrote:
> Hi Stefan,
>
> I borrowed that code from the mbstring extension. Either I
> misinterpreted the code, or mbstring also has its UTF-8 decoder wrong.
>
> --Wez.
>
> On 08/25/02, Stefan Esser [EMAIL PROTECTED] wrote:
> > Hello,
> >
> > html.c / get_next_char() has a UTF-8 decoder. The implementation is
> > a little bit fishy. AFAIK UTF-8 sequences are 1 to 4 bytes long, but
> > this one supports 5 and 6 byte sequences. I wonder where this
> > addition to the standard is defined.
> >
> > The problem is the following: the German ue (0xFC in ISO-8859-1) is
> > an invalid UTF-8 sequence, but the decoder would recognise it as the
> > lead byte of a 6 byte UTF-8 sequence.

I wonder too, but it would still be recognized as invalid (or should be, I haven't checked the code), unless the next 5 bytes all have values from 128 to 191, i.e. are continuation bytes.

BTW, it seems that for some reason I can't post to php-dev anymore; hopefully at least some of you get this...

Stig
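Stig's point about the following bytes can be sketched in Python as below. This is only an assumption about how a checking decoder might behave, not the actual get_next_char() logic; the name decode_one is hypothetical, and the sketch does not reject overlong forms.

```python
def decode_one(data: bytes, pos: int = 0):
    """Decode one RFC 2279 style sequence starting at data[pos].

    Hypothetical helper for illustration only. Returns a tuple
    (code_point, length), or None if the bytes are not a well-formed
    sequence. Assumes pos is a valid index into data.
    """
    lead = data[pos]
    if lead < 0x80:
        return lead, 1  # plain ASCII
    # (upper bound of the lead byte range, sequence length, payload mask)
    for limit, length, mask in ((0xE0, 2, 0x1F), (0xF0, 3, 0x0F),
                                (0xF8, 4, 0x07), (0xFC, 5, 0x03),
                                (0xFE, 6, 0x01)):
        if 0xC0 <= lead < limit:
            break
    else:
        return None  # continuation byte, 0xFE or 0xFF in lead position
    tail = data[pos + 1:pos + length]
    if len(tail) != length - 1:
        return None  # input ends before the sequence is complete
    code_point = lead & mask
    for byte in tail:
        if not 0x80 <= byte <= 0xBF:
            return None  # not a continuation byte (10xxxxxx)
        code_point = (code_point << 6) | (byte & 0x3F)
    return code_point, length


# ISO-8859-1 text "überall": 0xFC is followed by ASCII, so it is rejected.
print(decode_one(b"\xfcberall"))  # prints None
# The UTF-8 encoding of "ü" decodes to code point 0xFC (252).
print(decode_one(b"\xc3\xbc"))    # prints (252, 2)
```

A lone 0xFC is only accepted when five bytes in the range 0x80 to 0xBF happen to follow it, which is unlikely in ordinary ISO-8859-1 text.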