Re: [PHP-DEV] UTF-8 encoding

2002-08-26 Thread Stefan Esser

On Sun, Aug 25, 2002 at 09:21:01PM +0200, Stig Venaas wrote:
 Great, I've been wondering why UTF-8 wasn't defined like that
 in the first place. Could you please give me a pointer to the
 addition?

It is defined in RFC 2279.
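
For reference, the RFC 2279 encoding table covers code points up to 0x7FFFFFFF, and that range is where the 5- and 6-byte forms come from. Below is a minimal C sketch of an encoder that follows that table. The function name rfc2279_encode is hypothetical and is not taken from PHP or mbstring. The sketch deliberately does not apply the 4-byte limit Stefan mentions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch: encode a code point (up to 0x7FFFFFFF) into 1-6 bytes following
 * the RFC 2279 table. No 4-byte limit is applied here.
 * Returns the number of bytes written, or 0 if cp is out of range.
 * `out` must have room for at least 6 bytes. */
static size_t rfc2279_encode(uint32_t cp, unsigned char *out)
{
    assert(out != NULL);
    if (cp < 0x80) {               /* 0xxxxxxx: plain ASCII */
        out[0] = (unsigned char)cp;
        return 1;
    }

    size_t len;
    unsigned char lead;            /* lead-byte prefix bits */
    if (cp < 0x800)              { len = 2; lead = 0xC0; } /* 110xxxxx */
    else if (cp < 0x10000)       { len = 3; lead = 0xE0; } /* 1110xxxx */
    else if (cp < 0x200000)      { len = 4; lead = 0xF0; } /* 11110xxx */
    else if (cp < 0x4000000)     { len = 5; lead = 0xF8; } /* 111110xx */
    else if (cp < 0x80000000u)   { len = 6; lead = 0xFC; } /* 1111110x */
    else return 0;

    /* Fill the continuation bytes (10xxxxxx) from the end, 6 bits each. */
    for (size_t i = len - 1; i > 0; i--) {
        out[i] = (unsigned char)(0x80 | (cp & 0x3F));
        cp >>= 6;
    }
    /* Whatever bits remain go into the lead byte. */
    out[0] = (unsigned char)(lead | cp);
    return len;
}
```

For example, U+00FC (ü) encodes to the two bytes 0xC3 0xBC, while the largest value, 0x7FFFFFFF, needs six bytes. Its lead byte is 0xFD.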

Regards,
Stefan


-- 
PHP Development Mailing List http://www.php.net/
To unsubscribe, visit: http://www.php.net/unsub.php




Re: [PHP-DEV] UTF-8 encoding

2002-08-25 Thread Wez Furlong

Hi Stefan,

I borrowed that code from the mbstring extension. Either I misinterpreted
the code, or mbstring's UTF-8 decoder is also incorrect.

--Wez.

On 08/25/02, Stefan Esser [EMAIL PROTECTED] wrote:
 Hello,
 
 html.c / get_next_char() has a UTF-8 decoder. The implementation
 is a little bit fishy. AFAIK UTF-8 sequences are 1 to 4 bytes long,
 but this one supports 5- and 6-byte UTF-8 sequences. I wonder
 where this addition to the standard is defined.
 The problem is the following: the German ü (ue) is 0xFC in ISO-8859-1,
 which is an invalid UTF-8 sequence. But the UTF-8 decoder would recognise
 it as the lead byte of a 6-byte UTF-8 sequence.
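
To make the problem concrete, here is a hedged C sketch of how a decoder might classify a lead byte. It is not the actual html.c or mbstring code, and utf8_lead_len and its allow_5_6 flag are hypothetical. When the RFC 2279 5- and 6-byte forms are enabled, the Latin-1 byte 0xFC is treated as the start of a 6-byte sequence. With a 4-byte maximum, the same byte is rejected outright.

```c
#include <assert.h>

/* Sketch: expected sequence length from a UTF-8 lead byte.
 * Returns 0 for bytes that cannot start a sequence.
 * If allow_5_6 is nonzero, the RFC 2279 5- and 6-byte forms are accepted.
 * Otherwise 0xF8..0xFF are rejected (4-byte maximum). */
static int utf8_lead_len(unsigned char c, int allow_5_6)
{
    if (c < 0x80) return 1;   /* 0xxxxxxx: ASCII */
    if (c < 0xC0) return 0;   /* 10xxxxxx: continuation byte, not a lead */
    if (c < 0xE0) return 2;   /* 110xxxxx */
    if (c < 0xF0) return 3;   /* 1110xxxx */
    if (c < 0xF8) return 4;   /* 11110xxx */
    if (!allow_5_6) return 0; /* 4-byte limit: 0xF8..0xFF are invalid */
    if (c < 0xFC) return 5;   /* 111110xx */
    if (c < 0xFE) return 6;   /* 1111110x: 0xFC lands here */
    return 0;                 /* 0xFE and 0xFF never start a sequence */
}
```

The sketch only inspects the lead byte. A strict decoder would also reject overlong forms and would check the bytes that follow.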





Re: [PHP-DEV] UTF-8 encoding

2002-08-25 Thread Stig Venaas

On Sun, Aug 25, 2002 at 06:28:44PM +0100, Wez Furlong wrote:
 Hi Stefan,
 
 I borrowed that code from the mbstring extension. Either I misinterpreted
 the code, or mbstring's UTF-8 decoder is also incorrect.
 
 --Wez.
 
 On 08/25/02, Stefan Esser [EMAIL PROTECTED] wrote:
  Hello,
  
  html.c / get_next_char() has a UTF-8 decoder. The implementation
  is a little bit fishy. AFAIK UTF-8 sequences are 1 to 4 bytes long,
  but this one supports 5- and 6-byte UTF-8 sequences. I wonder
  where this addition to the standard is defined.
  The problem is the following: the German ü (ue) is 0xFC in ISO-8859-1,
  which is an invalid UTF-8 sequence. But the UTF-8 decoder would recognise
  it as the lead byte of a 6-byte UTF-8 sequence.

I wonder too, but it would still be recognized as invalid (or should be;
I haven't checked the code) unless the next 5 bytes all have values
between 128 and 191 (0x80-0xBF).
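
The point above can be sketched in C under assumed RFC 2279 rules. The function name rfc2279_seq_len is hypothetical. A 6-byte lead such as 0xFC only yields a sequence if the five bytes after it are all continuation bytes of the form 10xxxxxx (0x80-0xBF). In ordinary Latin-1 text, 0xFC is usually followed by ASCII letters, so the check fails and the byte is flagged as invalid.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch, assuming RFC 2279 rules (sequences of up to 6 bytes).
 * Returns the length of the sequence starting at buf[0] if its lead byte
 * and all of its continuation bytes are well formed, or 0 otherwise.
 * Overlong encodings are NOT rejected here, to keep the example short. */
static size_t rfc2279_seq_len(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    if (len == 0)
        return 0;

    /* The number of leading 1 bits in the lead byte gives the length. */
    unsigned char c = buf[0];
    size_t need = 0;
    while (need < 8 && (c & (0x80u >> need)))
        need++;

    if (need == 0)
        return 1;                 /* 0xxxxxxx: plain ASCII */
    if (need == 1 || need > 6)
        return 0;                 /* stray continuation byte, or 0xFE/0xFF */
    if (len < need)
        return 0;                 /* truncated sequence */

    /* Every trailing byte must be 10xxxxxx, i.e. 0x80..0xBF (128..191). */
    for (size_t i = 1; i < need; i++) {
        if ((buf[i] & 0xC0) != 0x80)
            return 0;
    }
    return need;
}
```

For example, the Latin-1 bytes for "überall" (0xFC followed by the ASCII letters) return 0. By contrast, 0xFC followed by five 0x80 bytes is accepted as a 6-byte sequence.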

BTW, it seems that for some reason I can't post to php-dev anymore;
at least some of you get this...

Stig
