Hello,

html.c / get_next_char() contains a UTF-8 decoder, and the implementation looks a little bit fishy. AFAIK UTF-8 sequences are 1 to 4 bytes long, but this decoder also accepts 5- and 6-byte sequences. I wonder where this addition to the standard is defined.

The problem is the following: the German ü (ue) is the single byte 0xFC in ISO-8859-1, which is not a valid UTF-8 sequence. The decoder, however, would recognise it as the lead byte of a 6-byte UTF-8 sequence.
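
To illustrate the point, here is a minimal sketch of the two ways a lead byte can be classified. This is not the code from html.c; both function names are made up for this mail, and the strict variant assumes the 4-byte limit as specified in RFC 3629.

```c
/* Hypothetical helper, NOT the html.c code: sequence length implied by a
 * lead byte when 5- and 6-byte forms are accepted. 0 = not a lead byte. */
static int lead_len_legacy(unsigned char c)
{
    if (c < 0x80)           return 1; /* 0xxxxxxx */
    if ((c & 0xE0) == 0xC0) return 2; /* 110xxxxx */
    if ((c & 0xF0) == 0xE0) return 3; /* 1110xxxx */
    if ((c & 0xF8) == 0xF0) return 4; /* 11110xxx */
    if ((c & 0xFC) == 0xF8) return 5; /* 111110xx */
    if ((c & 0xFE) == 0xFC) return 6; /* 1111110x, matches 0xFC */
    return 0;                         /* continuation byte, 0xFE or 0xFF */
}

/* Hypothetical strict variant, assuming the 4-byte limit of RFC 3629:
 * 0xC0, 0xC1 and 0xF5..0xFF can never start a sequence. */
static int lead_len_strict(unsigned char c)
{
    if (c < 0x80)               return 1;
    if (c >= 0xC2 && c <= 0xDF) return 2;
    if ((c & 0xF0) == 0xE0)     return 3;
    if (c >= 0xF0 && c <= 0xF4) return 4;
    return 0;                         /* invalid as a lead byte */
}
```

For the byte 0xFC, lead_len_legacy() returns 6, so a decoder built this way would try to consume the next five bytes as continuation bytes, while lead_len_strict() returns 0 and the byte can be rejected immediately.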

Stefan