From: saramaca at libertysurf dot fr
Operating system: Windows XP
PHP version: 5.1.0
PHP Bug Type: XML related
Bug description: xml_parse_into_struct() chokes on the UTF-8 BOM
Description:
------------
In PHP 4, xml_parse_into_struct() can parse a UTF-8-encoded XML file with
or without a UTF-8 BOM (\xEF\xBB\xBF). In PHP 5 this is no longer the
case: when the BOM is present, it raises an error saying the string
doesn't contain any XML data (Empty document).
Additionally, PHP 5's xml_parse_into_struct() does *NOT* place default
attribute values into the struct. For example, despite the DTD provided,
$content[1]['attributes']['type'] isn't set to "literal" in the actual
result section below; please compare it to the expected result. This used
to work under PHP 4.1.x and above (though that parser is based on expat,
AFAIK).
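To make the default-attribute problem easier to check in isolation, here is a minimal sketch. The inline document and DTD are hypothetical stand-ins for the ones used by the reproduce script (which is only linked, not quoted, in this report), and case folding is switched off on the assumption that the original script did the same, since the dumps below show lower-case tag names.

```php
<?php
// Hypothetical sample document: the internal DTD subset declares a default
// value of "literal" for the "type" attribute, which <resource> omits.
$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE bundle [
  <!ELEMENT bundle (resource*)>
  <!ELEMENT resource (#PCDATA)>
  <!ATTLIST resource
    key  CDATA #REQUIRED
    type CDATA "literal">
]>
<bundle><resource key="rSeeYou">A bient&#244;t</resource></bundle>
XML;

$parser = xml_parser_create('UTF-8');
// Keep tag and attribute names in their original case.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

$values = array();
$index  = array();
xml_parse_into_struct($parser, $xml, $values, $index);

// Locate the <resource> entry in the flat struct.
$resource = null;
foreach ($values as $node) {
    if ($node['tag'] === 'resource') {
        $resource = $node;
        break;
    }
}

// Report whether the DTD default made it into the attributes array.
if (isset($resource['attributes']['type'])) {
    echo "type = " . $resource['attributes']['type'] . "\n";
} else {
    echo "type attribute missing (DTD default not applied)\n";
}
```

Which branch is taken depends on the PHP version and the underlying XML library, so no output is asserted here.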
PS: I guess "manually" stripping this magic number (if present) before
calling the function would yield the expected result. However, I found an
acceptable workaround that seems to work equally well across versions 4
and 5 of PHP:
<?php
...
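// Empty string: let the parser autodetect the input encoding.
$parser = xml_parser_create('');
// Choose the output (target) encoding separately.
xml_parser_set_option($parser, XML_OPTION_TARGET_ENCODING, $encoding);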
...
?>
Rather than:
<?php
...
$parser = xml_parser_create($encoding);
...
?>
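For comparison, the "manual" alternative mentioned above (stripping the BOM before parsing) could look roughly like the sketch below. The helper name strip_utf8_bom() and the sample document are made up for illustration; they are not part of ext/xml or of the reproduce script.

```php
<?php
// Hypothetical helper: remove a leading UTF-8 BOM (EF BB BF) if present.
function strip_utf8_bom($data)
{
    if (strncmp($data, "\xEF\xBB\xBF", 3) === 0) {
        // Cast to string because substr() can return false on older PHP
        // versions when nothing is left after the BOM.
        return (string) substr($data, 3);
    }
    return $data;
}

// Hypothetical sample input: a small document prefixed with the BOM.
$xml = "\xEF\xBB\xBF"
     . '<?xml version="1.0" encoding="UTF-8"?>'
     . '<bundle><resource key="rSeeYou">text</resource></bundle>';

$clean = strip_utf8_bom($xml);

$parser = xml_parser_create('UTF-8');
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

$values = array();
$index  = array();
$ok = xml_parse_into_struct($parser, $clean, $values, $index);

if ($ok) {
    print_r($values);
} else {
    // Report the parser's own error message.
    echo xml_error_string(xml_get_error_code($parser)) . "\n";
}
```

This keeps the explicit encoding argument to xml_parser_create(), at the cost of touching the input string before every parse.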
Reproduce code:
---------------
http://www.diptyque.net/bugs/utf8_bom.php
; running PHP 4 --> outputs expected result
http://www.diptyque.net/bugs/utf8_bom.phps
; source code
Expected result:
----------------
w/ autodetect -->
Array
(
[0] => Array
(
[tag] => bundle
[type] => open
[level] => 1
[value] =>
)
[1] => Array
(
[tag] => resource
[type] => complete
[level] => 2
[attributes] => Array
(
[key] => rSeeYou
[type] => literal
)
[value] => A bientôt
)
[2] => Array
(
[tag] => bundle
[value] =>
[type] => cdata
[level] => 1
)
[3] => Array
(
[tag] => bundle
[type] => close
[level] => 1
)
)
w/o autodetect -->
Array
(
[0] => Array
(
[tag] => bundle
[type] => open
[level] => 1
[value] =>
)
[1] => Array
(
[tag] => resource
[type] => complete
[level] => 2
[attributes] => Array
(
[key] => rSeeYou
[type] => literal
)
[value] => A bientôt
)
[2] => Array
(
[tag] => bundle
[value] =>
[type] => cdata
[level] => 1
)
[3] => Array
(
[tag] => bundle
[type] => close
[level] => 1
)
)
Actual result:
--------------
w/ autodetect -->
Array
(
[0] => Array
(
[tag] => bundle
[type] => open
[level] => 1
[value] =>
)
[1] => Array
(
[tag] => resource
[type] => complete
[level] => 2
[attributes] => Array
(
[key] => rSeeYou
)
[value] => A bientôt
)
[2] => Array
(
[tag] => bundle
[value] =>
[type] => cdata
[level] => 1
)
[3] => Array
(
[tag] => bundle
[type] => close
[level] => 1
)
)
w/o autodetect -->
Empty document
--
Edit bug report at http://bugs.php.net/?id=35447&edit=1