ID: 37154
User updated by: troelskn at gmail dot com
Reported By: troelskn at gmail dot com
Status: Bogus
Bug Type: DOM XML related
Operating System: *
PHP Version: 5.1.2
New Comment:
Not true.
<?php
// This script is presumably saved as ISO-8859-1, so utf8_encode() turns
// the literal into UTF-8 before it is handed to the DOM extension.
$mb_detect_charsets = "ASCII,UTF-8,ISO-8859-1";

$dom = new DOMDocument("1.0", "UTF-8");
$doc = $dom->appendChild($dom->createElement("document"));
$doc->appendChild($dom->createTextNode(utf8_encode("Iñtërnâtiônàlizætiøn")));
echo mb_detect_encoding($dom->saveXML(), $mb_detect_charsets) . "<br>";

// Same text, but this document declares ISO-8859-1 as its encoding.
$dom = new DOMDocument("1.0", "ISO-8859-1");
$doc = $dom->appendChild($dom->createElement("document"));
$doc->appendChild($dom->createTextNode(utf8_encode("Iñtërnâtiônàlizætiøn")));
echo mb_detect_encoding($dom->saveXML(), $mb_detect_charsets) . "<br>";
-------------------------------------------------------
Output:
UTF-8
ISO-8859-1
-------------------------------------------------------
Removing the utf8_encode() call crashes the second example.
Previous Comments:
------------------------------------------------------------------------
[2006-04-21 15:02:53] [EMAIL PROTECTED]
Wrong.
The *default* input encoding is UTF-8. But you can always declare another
one with <?xml version="1.0" encoding="<your encoding>"?>.
All resulting data is in UTF-8 anyway; this is a libxml2 feature.
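To illustrate the point above, here is a minimal sketch (not part of the original thread) that loads a tiny made-up ISO-8859-1 document, assuming the ISO-8859-1 decoder built into libxml2. The XML declaration tells libxml2 how to decode the input bytes, the text comes back out of the DOM as UTF-8, and saveXML() should convert it back to the declared encoding on output.

```php
<?php
// Build an ISO-8859-1 encoded document as raw bytes (hypothetical content).
// "\xF1" is "ñ" in ISO-8859-1; in UTF-8 the same character is "\xC3\xB1".
$xml = '<?xml version="1.0" encoding="ISO-8859-1"?>' . "\n"
     . "<document>I\xF1t</document>";

$dom = new DOMDocument();
$dom->loadXML($xml);

// libxml2 decoded the input according to the declaration, so the
// text node now holds UTF-8 bytes.
$text = $dom->documentElement->textContent;
var_dump($text === "I\xC3\xB1t"); // bool(true)

// saveXML() serializes using the document's declared encoding,
// so $out should contain the single Latin-1 byte for "ñ" again.
$out = $dom->saveXML();
echo $out;
```

The encoding argument of the DOMDocument constructor appears to play the same output-side role, while strings passed to createTextNode() are still expected to be UTF-8.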
------------------------------------------------------------------------
[2006-04-21 14:37:34] troelskn at gmail dot com
Description:
------------
After some digging around and experimentation, I have found that the DOM
extension needs all input strings to be UTF-8 encoded. This means that any
code using the extension must be sprinkled with utf8_encode() calls.
The problem probably cannot be fixed without breaking backward
compatibility, so the sanest choice may be to leave it as is, but at least
the documentation should be updated to state this.
------------------------------------------------------------------------