From:             gros at mpdl dot mpg dot de
Operating system: Mac OS-X 10.6.2
PHP version:      5.3.0
PHP Bug Type:     XML Reader
Bug description:  Text containing German umlauts in UTF-8 encoded XML is cut off by the XML parser
Description:
------------
When parsing an XML file with UTF-8 encoding (like this one: http://bit.ly/3PSi44), text containing German umlauts is cut off:

original: <e:organization-name>Kaiser Wilhelm Institut für Züchtungsforschung</e:organization-name>
result after parsing: "Kaiser Wilhelm Institut f"

or parsing this:
<dc:publisher>Societäts-Verlag</dc:publisher>
results in "äts-Verlag"

Reproduce code:
---------------
$snippet = file_get_contents("http://bit.ly/3PSi44");
if (!($xml_parser = xml_parser_create(""))) die("Couldn't create parser.");
xml_parser_set_option($xml_parser, XML_OPTION_TARGET_ENCODING, 'UTF-8');
xml_set_element_handler($xml_parser, "startElementHandler", "endElementHandler");
xml_set_character_data_handler($xml_parser, "characterDataHandler");
$retstr = "";
if (!xml_parse($xml_parser, $snippet)) {
    $retstr = sprintf("XML error: %s at line %d",
        xml_error_string(xml_get_error_code($xml_parser)),
        xml_get_current_line_number($xml_parser));
}
xml_parser_free($xml_parser);

Expected result:
----------------
I expect the text to be imported intact, as outlined in the description. Parsing this:
<e:organization-name>Kaiser Wilhelm Institut für Züchtungsforschung</e:organization-name>
should result in "Kaiser Wilhelm Institut für Züchtungsforschung", and parsing this:
<dc:publisher>Societäts-Verlag</dc:publisher>
should result in "Societäts-Verlag".

Actual result:
--------------
I get cut-off pieces of text when the text contains German umlauts (see the two examples in the description).
Parsing this:
<e:organization-name>Kaiser Wilhelm Institut für Züchtungsforschung</e:organization-name>
results in "Kaiser Wilhelm Institut f", and parsing this:
<dc:publisher>Societäts-Verlag</dc:publisher>
results in "äts-Verlag".
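The report does not include the bodies of startElementHandler, endElementHandler and characterDataHandler, so the cause cannot be confirmed from it alone. One thing worth checking: the PHP manual notes that the character data handler may be called several times for a single piece of text (for example around non-ASCII characters), so a handler that assigns each chunk instead of appending it keeps only one piece. The truncated results above ("Kaiser Wilhelm Institut f", "äts-Verlag") look like fragments of a text node split at an umlaut. Below is a minimal sketch under that assumption: it buffers every chunk and reads the buffer only in the end-element handler. The helper parseTexts is hypothetical (not part of the report or of ext/xml), and it only collects text for leaf elements.

```php
<?php
// Hypothetical helper, not the reporter's code: collects the text of leaf
// elements, assuming character data may arrive in several chunks per node.
function parseTexts($xml)
{
    $texts  = array(); // element name => complete text
    $buffer = '';      // chunks collected for the element currently open

    $parser = xml_parser_create('UTF-8');
    xml_parser_set_option($parser, XML_OPTION_TARGET_ENCODING, 'UTF-8');
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0); // keep tag names as written

    xml_set_element_handler(
        $parser,
        function ($p, $name, $attrs) use (&$buffer) {
            $buffer = ''; // start a fresh buffer for each element
        },
        function ($p, $name) use (&$buffer, &$texts) {
            $texts[$name] = $buffer; // text is only complete at the end tag
            $buffer = '';
        }
    );
    xml_set_character_data_handler(
        $parser,
        function ($p, $data) use (&$buffer) {
            $buffer .= $data; // append: "=" would keep only the last chunk
        }
    );

    if (!xml_parse($parser, $xml, true)) {
        $error = sprintf("XML error: %s at line %d",
            xml_error_string(xml_get_error_code($parser)),
            xml_get_current_line_number($parser));
        xml_parser_free($parser);
        throw new RuntimeException($error);
    }
    xml_parser_free($parser);
    return $texts;
}

$texts = parseTexts('<record><dc:publisher>Societäts-Verlag</dc:publisher></record>');
echo $texts['dc:publisher'], "\n"; // prints "Societäts-Verlag"
```

If the reporter's characterDataHandler writes each call straight into a field with "=", switching to an append plus a reset in the start handler, as above, would be the first thing to try.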