ID:               27808
Updated by:       [EMAIL PROTECTED]
-Summary:         xml_parse() chokes on the UTF-8 magic string
Reported By:      jcalvert at gmx dot net
Status:           Open
-Bug Type:        XML related
+Bug Type:        Documentation problem
Operating System: Debian Sid
PHP Version:      5.0.0RC1
New Comment:

Corrected summary.

1. For the sake of backwards compatibility, xml_parser_create() called
   with no arguments creates a parser that only recognises ISO-8859-1.

2. If "UTF-8" is passed as the "encoding" argument, the parser backed
   by libxml assumes that any given XML document is encoded in plain
   UTF-8, in which no BOM (byte order mark) is allowed.

3. If "" (an empty string) is passed, the parser attempts to identify
   the document's encoding by looking at its first 3 or 4 bytes. In
   this case a BOM must be present.

Passing an empty string might fix your problem. It seems the third
feature is not documented yet, so I'm marking this as a documentation
problem.


Previous Comments:
------------------------------------------------------------------------

[2004-03-31 13:00:42] jcalvert at gmx dot net

Description:
------------
In PHP4 parsing a UTF-8 file with the magic string (\xEF\xBB\xBF) works
just fine. In PHP5.0.0RC1 the function returns with an error message
saying the string didn't contain any XML data. Stripping the magic
string before calling the function yields the expected result.

libxml2* version 2.6.7-1

------------------------------------------------------------------------
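
For illustration only, points 2 and 3 of the new comment could be tried out roughly as follows. This is a minimal sketch, not a confirmed fix: strip_utf8_bom() and try_parse() are hypothetical helpers written for this example, the sample document is made up, and the outcome of each call depends on the PHP and libxml2 versions in use.

```php
<?php
// Hypothetical helper (not part of PHP): remove a leading UTF-8 BOM, if any.
function strip_utf8_bom($data)
{
    if (strncmp($data, "\xEF\xBB\xBF", 3) === 0) {
        return substr($data, 3);
    }
    return $data;
}

// Hypothetical helper: parse $data with the given "encoding" argument and
// return true on success, or the parser's error string on failure.
function try_parse($data, $encoding)
{
    $parser = xml_parser_create($encoding);
    if (xml_parse($parser, $data, true)) {
        return true;
    }
    return xml_error_string(xml_get_error_code($parser));
}

// Made-up sample document that starts with the UTF-8 magic string.
$doc = "\xEF\xBB\xBF" . '<?xml version="1.0" encoding="UTF-8"?><root>text</root>';

// Point 2: explicit "UTF-8". Per the comment no BOM is allowed here, so the
// reporter's workaround (stripping it first) is applied.
var_dump(try_parse(strip_utf8_bom($doc), 'UTF-8'));

// Point 3: empty string. Per the comment the parser is expected to detect
// the encoding from the leading bytes, so the BOM is left in place.
var_dump(try_parse($doc, ''));
```

Comparing the two results on a given installation shows whether the empty-string form behaves as described, or whether stripping the magic string remains necessary.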