 ID:               30692
 Updated by:       [EMAIL PROTECTED]
 Reported By:      chrivers at iversen-net dot dk
-Status:           Open
+Status:           Bogus
 Bug Type:         XML related
 Operating System: Linux 2.6.5, Debian Sarge
 PHP Version:      5.0.2
 New Comment:

This is a change in behavior, but it is not a bug: a SAX parser just
fires events. It may split character data across several handler calls,
and this is normal behavior.


Previous Comments:
------------------------------------------------------------------------

[2004-11-05 15:24:57] chrivers at iversen-net dot dk

Description:
------------
When converting my pages to the PHP5 SAX XML parser, they broke because
of an apparent incompatibility.

The chardata handler is called in a different pattern than in PHP4.
Before, it seemed to be called once per character block. Now, it seems
the buffer is flushed before each block of high-bit characters. This is
unexpected and (seemingly?) impossible to change.

Reproduce code:
---------------
<?

function es() {}
function ee() {}
# Character data handler: prints each chunk it receives in brackets.
function cd($P, $D) {print "[$D]\n";}

# $str = "UTF:æøå:UTF"; $strenc = "utf-8";
$str = "ISO:æøå:ISO"; $strenc = "iso-8859-1";

$buffer = "<?xml version=\"1.0\" encoding=\"$strenc\"?><global>$str</global>";

$xml_parser = xml_parser_create();
# xml_set_element_handler($xml_parser, "es", "ee");
xml_set_character_data_handler($xml_parser, "cd");
xml_parser_set_option($xml_parser, XML_OPTION_CASE_FOLDING, true);
xml_parser_set_option($xml_parser, XML_OPTION_TARGET_ENCODING, "iso-8859-1");

if (xml_parse($xml_parser, $buffer) == false)
    die(sprintf("TV import error: %s at line %d col %d\n%s",
        xml_error_string(xml_get_error_code($xml_parser)),
        xml_get_current_line_number($xml_parser),
        xml_get_current_column_number($xml_parser),
        $buffer));
xml_parser_free($xml_parser);

?>

Expected result:
----------------
expected: [ISO:æøå:ISO]
php4:     [ISO:æøå:ISO]

Actual result:
--------------
[ISO:]
[æøå:ISO]


------------------------------------------------------------------------
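
Since the character data handler may fire several times for one run of text, a common way to cope is to append each chunk to a buffer and only emit the buffer at an element boundary. The following is a minimal sketch of that idea, not something proposed in the report itself. The class name CharDataCollector and its members are hypothetical, and the bytes \xC3\xA6\xC3\xB8\xC3\xA5 are the UTF-8 encoding of the report's "æøå" test characters, written as escapes so the example does not depend on the file's own encoding.

```php
<?php
// Hypothetical helper (not part of the original report): joins the chunks
// delivered to the character data handler into one block per text run.
class CharDataCollector
{
    private $buffer = '';
    public $blocks = array();

    public function startElement($parser, $name, $attrs)
    {
        // Text seen before a child element opens is emitted as its own block.
        $this->flush();
    }

    public function endElement($parser, $name)
    {
        // An element is closing, so the buffered text run is complete.
        $this->flush();
    }

    public function characterData($parser, $data)
    {
        // May be called more than once per text run; only append here.
        $this->buffer .= $data;
    }

    private function flush()
    {
        if ($this->buffer !== '') {
            $this->blocks[] = $this->buffer;
            $this->buffer = '';
        }
    }
}

// UTF-8 input and UTF-8 output, so the bytes pass through unchanged.
$xml = "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
     . "<global>UTF:\xC3\xA6\xC3\xB8\xC3\xA5:UTF</global>";

$collector = new CharDataCollector();
$parser = xml_parser_create('UTF-8');
xml_parser_set_option($parser, XML_OPTION_TARGET_ENCODING, 'UTF-8');
xml_set_element_handler(
    $parser,
    array($collector, 'startElement'),
    array($collector, 'endElement')
);
xml_set_character_data_handler($parser, array($collector, 'characterData'));

if (!xml_parse($parser, $xml, true)) {
    die(sprintf("XML error: %s at line %d\n",
        xml_error_string(xml_get_error_code($parser)),
        xml_get_current_line_number($parser)));
}

foreach ($collector->blocks as $block) {
    print "[$block]\n";
}
```

With this arrangement the number of handler calls no longer matters: however the parser splits the text, the script sees one joined block per run of character data between element boundaries.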