ID:               27808
 Updated by:       [EMAIL PROTECTED]
-Summary:          xml_parse() chokes on the UTF-8 magic string
 Reported By:      jcalvert at gmx dot net
 Status:           Open
-Bug Type:         XML related
+Bug Type:         Documentation problem
 Operating System: Debian Sid
 PHP Version:      5.0.0RC1
 New Comment:

Corrected summary.



1. For the sake of backwards compatibility, xml_parser_create() with no
arguments generates a parser that only recognises ISO-8859-1.



2. If you pass "UTF-8" as the "encoding" argument, the parser backed by
libxml assumes that any given XML document is encoded in plain UTF-8,
where no BOM (byte order mark) is allowed.



3. If you pass "" (an empty string) instead, the parser attempts to
identify the document's encoding by looking at its leading 3 or 4
bytes. In this case a BOM must be present. This might fix your problem.



It seems the third feature is not documented yet, so I'm marking this
as a documentation problem.
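
To make item 3 more concrete, here is a minimal sketch of what
identifying an encoding from the leading 3 or 4 bytes can look like.
It is not libxml's actual detection code: the function name
detect_bom_encoding() is hypothetical, and only the standard Unicode
BOM signatures are checked.

```php
<?php
// Hypothetical helper, not part of PHP or libxml: guess the encoding
// of $data from a byte order mark in its first 2 to 4 bytes.
function detect_bom_encoding($data)
{
    // 4-byte signatures come first, because the UTF-32LE BOM
    // begins with the same two bytes as the UTF-16LE BOM.
    $signatures = array(
        "\x00\x00\xFE\xFF" => 'UTF-32BE',
        "\xFF\xFE\x00\x00" => 'UTF-32LE',
        "\xEF\xBB\xBF"     => 'UTF-8',
        "\xFE\xFF"         => 'UTF-16BE',
        "\xFF\xFE"         => 'UTF-16LE',
    );
    foreach ($signatures as $bom => $encoding) {
        if (strncmp($data, $bom, strlen($bom)) === 0) {
            return $encoding;
        }
    }
    return null; // no BOM found
}
```

A document without any of these signatures yields null, which is the
case where a parser relying on the BOM alone has nothing to go on.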




Previous Comments:
------------------------------------------------------------------------

[2004-03-31 13:00:42] jcalvert at gmx dot net

Description:
------------
In PHP 4, parsing a UTF-8 file that starts with the magic string
(\xEF\xBB\xBF) works just fine. In PHP 5.0.0RC1, xml_parse() returns
with an error message saying the string didn't contain any XML data.
Stripping the magic string before calling the function yields the
expected result.
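
The stripping workaround mentioned above could look roughly like the
following sketch. The helper name strip_utf8_bom() is hypothetical and
not part of the report; it only removes the three-byte UTF-8 signature
and leaves everything else untouched.

```php
<?php
// Hypothetical helper: drop a leading UTF-8 BOM, if there is one.
function strip_utf8_bom($data)
{
    $bom = "\xEF\xBB\xBF";
    if (strncmp($data, $bom, 3) === 0) {
        // The cast keeps the result a string even on PHP versions
        // where substr() returns false for an input that is only a BOM.
        return (string) substr($data, 3);
    }
    return $data;
}
```

The return value can then be handed to xml_parse() in place of the raw
file contents.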



libxml2* version 2.6.7-1





------------------------------------------------------------------------

