Hello,

html.c / get_next_char() contains a UTF-8 decoder, and the
implementation looks a little bit fishy. AFAIK UTF-8 sequences are
1 to 4 bytes long, but this one also accepts 5- and 6-byte
sequences. I wonder where this addition to the standard is defined.
The problem is the following: the German ü (ue) is 0xFC in
ISO-8859-1, and a lone 0xFC is not a valid UTF-8 sequence. The
decoder, however, recognises it as the lead byte of a 6-byte UTF-8
sequence.
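
To illustrate the point, here is a minimal sketch of a strict lead-byte check that follows the 1 to 4 byte limit. It is not the code from html.c; the function name utf8_seq_len() is hypothetical and only classifies the first byte of a sequence. A full decoder would also have to validate the continuation bytes.

```c
/*
 * Hypothetical helper, not taken from html.c.
 * Returns the expected length (1..4) of a UTF-8 sequence given its
 * lead byte, or 0 if the byte cannot start a valid sequence.
 */
static int utf8_seq_len(unsigned char lead)
{
    if (lead <= 0x7F) {
        return 1;               /* plain ASCII */
    }
    if (lead >= 0xC2 && lead <= 0xDF) {
        return 2;               /* 0xC0 and 0xC1 would be overlong */
    }
    if (lead >= 0xE0 && lead <= 0xEF) {
        return 3;
    }
    if (lead >= 0xF0 && lead <= 0xF4) {
        return 4;               /* above 0xF4 exceeds U+10FFFF */
    }
    /* 0x80..0xBF are continuation bytes; 0xF5..0xFF (including 0xFC)
     * never start a valid sequence under the 4-byte limit. */
    return 0;
}
```

With such a check, utf8_seq_len(0xFC) returns 0, so a Latin-1 encoded ü would be reported as invalid input instead of being taken as the start of a 6-byte sequence.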

Stefan 

-- 
PHP Development Mailing List <http://www.php.net/>
To unsubscribe, visit: http://www.php.net/unsub.php
