ID: 35447
Updated by: [EMAIL PROTECTED]
Reported By: saramaca at libertysurf dot fr
-Status: Open
+Status: Assigned
Bug Type: XML related
Operating System: Windows XP
PHP Version: 5.1.0
-Assigned To:
+Assigned To: rrichards
New Comment:
expat vs libxml2 incompatibility?
Previous Comments:
------------------------------------------------------------------------
[2005-11-28 14:55:33] saramaca at libertysurf dot fr
Description:
------------
In PHP 4, xml_parse_into_struct() can parse a UTF-8-encoded XML file
with or without a UTF-8 BOM (\xEF\xBB\xBF). In PHP 5 this is no longer
the case: it raises an error saying that the string does not contain
any XML data ("Empty document").
Additionally, PHP 5's xml_parse_into_struct() does *NOT* place default
attribute values into the struct: despite the DTD provided,
$content[1]['attributes']['type'] is not set to "literal" (compare the
actual result with the expected result below). This used to work under
PHP 4.1.x and above, where the parser is based on expat, AFAIK.
PS: I guess "manually" stripping this magic number (if present) before
calling the function would yield the expected result. However, I found
an acceptable workaround that seems to work equally well across
versions 4 and 5 of PHP:
<?php
...
$parser = xml_parser_create('');
xml_parser_set_option($parser, XML_OPTION_TARGET_ENCODING, $encoding);
...
?>
Rather than:
<?php
...
$parser = xml_parser_create($encoding);
...
?>
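For reference, the "manual" stripping mentioned above could look roughly like the sketch below. It has not been run against the reproduce script. strip_utf8_bom() and parse_struct_without_bom() are hypothetical helper names, not existing PHP functions, and the 'UTF-8' default is an assumption:

```php
<?php
// Hypothetical helper (not a built-in): remove a leading UTF-8 BOM, if any.
function strip_utf8_bom($data)
{
    if (strncmp($data, "\xEF\xBB\xBF", 3) === 0) {
        // Cast to string: older PHP versions return false rather than ''
        // here when nothing follows the BOM.
        return (string) substr($data, 3);
    }
    return $data;
}

// Hypothetical wrapper: parse into a struct after dropping the BOM.
// Returns the values array, or false if the parser reports a failure.
function parse_struct_without_bom($data, $encoding = 'UTF-8')
{
    $parser = xml_parser_create($encoding);
    // xml_parse_into_struct() returns 1 on success and 0 on failure.
    $ok = xml_parse_into_struct($parser, strip_utf8_bom($data), $values);
    return ($ok === 1) ? $values : false;
}
```

Input without a BOM is returned unchanged, so the same call can be used for both kinds of files. This only addresses the "Empty document" error, not the missing default attribute values.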
Reproduce code:
---------------
http://www.diptyque.net/bugs/utf8_bom.php (running PHP 4; outputs the expected result)
http://www.diptyque.net/bugs/utf8_bom.phps (source code)
Expected result:
----------------
w/ autodetect -->
Array
(
    [0] => Array
        (
            [tag] => bundle
            [type] => open
            [level] => 1
            [value] =>
        )
    [1] => Array
        (
            [tag] => resource
            [type] => complete
            [level] => 2
            [attributes] => Array
                (
                    [key] => rSeeYou
                    [type] => literal
                )
            [value] => A bientôt
        )
    [2] => Array
        (
            [tag] => bundle
            [value] =>
            [type] => cdata
            [level] => 1
        )
    [3] => Array
        (
            [tag] => bundle
            [type] => close
            [level] => 1
        )
)
w/o autodetect -->
Array
(
    [0] => Array
        (
            [tag] => bundle
            [type] => open
            [level] => 1
            [value] =>
        )
    [1] => Array
        (
            [tag] => resource
            [type] => complete
            [level] => 2
            [attributes] => Array
                (
                    [key] => rSeeYou
                    [type] => literal
                )
            [value] => A bientôt
        )
    [2] => Array
        (
            [tag] => bundle
            [value] =>
            [type] => cdata
            [level] => 1
        )
    [3] => Array
        (
            [tag] => bundle
            [type] => close
            [level] => 1
        )
)
Actual result:
--------------
w/ autodetect -->
Array
(
    [0] => Array
        (
            [tag] => bundle
            [type] => open
            [level] => 1
            [value] =>
        )
    [1] => Array
        (
            [tag] => resource
            [type] => complete
            [level] => 2
            [attributes] => Array
                (
                    [key] => rSeeYou
                )
            [value] => A bientôt
        )
    [2] => Array
        (
            [tag] => bundle
            [value] =>
            [type] => cdata
            [level] => 1
        )
    [3] => Array
        (
            [tag] => bundle
            [type] => close
            [level] => 1
        )
)
w/o autodetect -->
Empty document
------------------------------------------------------------------------