From:
Operating system: Windows WAMP + LAMP(?)
PHP version: 5.3.2
Package: DOM XML related
Bug Type: Bug
Bug description: DOMDocument::load() UTF-8 limitation
Description:
------------
The DOMDocument::load() function only loads files whose content is UTF-8
encoded.
Example: 'article.php' contains:
$xmlDoc = new DOMDocument();
$page = 'article.xsl';
$xmlDoc->load($page);
$xmlDoc->load('cours.xml');
Suppose 'article.xsl' contains '... Précédent ...' (non-ASCII characters).
If the content of 'article.xsl' is ISO-8859-1 encoded, the following error
appears (the same happens if 'cours.xml' is ISO-8859-1 encoded):
"DOMDocument::load() [domdocument.load]: Input is not proper UTF-8,
indicate encoding ! Bytes: 0xE9 0x62 0x75 0x74 in
file:///C:/wamp/www/xsl2/article.xsl, line: 71 in
C:\wamp\www\xsl2\article.php on line 13"
So both 'cours.xml' and 'article.xsl' have to be saved as UTF-8.
Of course, $page = utf8_encode($page); is of no use, because utf8_encode()
only operates on the string 'article.xsl', not on the content of the file.
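To illustrate the difference between transcoding the file name and transcoding the file content, here is a minimal sketch. It assumes the mbstring extension is enabled and uses a scratch file as a hypothetical stand-in for 'article.xsl'; none of this is taken from the reporter's setup.

```php
<?php
// Hypothetical stand-in for 'article.xsl': a scratch file saved as ISO-8859-1.
$page = tempnam(sys_get_temp_dir(), 'xsl');
file_put_contents($page, "Pr\xE9c\xE9dent"); // "Précédent" in ISO-8859-1

// Transcoding the file NAME changes nothing when the name is plain ASCII.
$name = 'article.xsl';
$convertedName = mb_convert_encoding($name, 'UTF-8', 'ISO-8859-1');
$nameUnchanged = ($convertedName === $name);

// Transcoding the file CONTENT changes the bytes the parser would see.
$raw  = file_get_contents($page);
$utf8 = mb_convert_encoding($raw, 'UTF-8', 'ISO-8859-1');

$rawIsUtf8  = mb_check_encoding($raw, 'UTF-8');
$utf8IsUtf8 = mb_check_encoding($utf8, 'UTF-8');

unlink($page);
```

Only the converted content is valid UTF-8; the file name was never the problem.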
CONCLUSION: This is not really a bug in the ->load() function.
But it would be very useful to have an additional optional parameter
indicating the encoding of the incoming file:
-----Desired improvement ----------->
Add an optional parameter giving the actual encoding of $file:
$xmlDoc->load($page, 'iso-8859-1');
DOMDocument::load( string $file [, string $encoding])
The optional $encoding parameter would describe the actual encoding of
$file when it is not UTF-8.
----------- END ----------------------
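Until such a parameter exists, the requested behaviour could be approximated in userland. The sketch below is only an illustration: dom_load_with_encoding() is a hypothetical helper, not part of the DOM extension. It assumes mbstring is enabled and that the XML declaration in the file either omits the encoding or says UTF-8, as in the reporter's cours.xml.

```php
<?php
/**
 * Hypothetical helper emulating the proposed
 * DOMDocument::load(string $file [, string $encoding]) signature.
 */
function dom_load_with_encoding(DOMDocument $doc, $file, $encoding = 'UTF-8')
{
    $bytes = file_get_contents($file);
    if ($bytes === false) {
        return false;
    }
    if (strcasecmp($encoding, 'UTF-8') !== 0) {
        // Transcode the file content (not the file name) to UTF-8.
        $bytes = mb_convert_encoding($bytes, 'UTF-8', $encoding);
    }
    // Assumption: the declaration does not name a conflicting encoding.
    return $doc->loadXML($bytes);
}

// Usage with a scratch file that declares UTF-8 but is saved as ISO-8859-1.
$file = tempnam(sys_get_temp_dir(), 'xml');
file_put_contents(
    $file,
    "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<root>Texte \xE9clair</root>"
);

$xmlDoc = new DOMDocument();
$ok     = dom_load_with_encoding($xmlDoc, $file, 'iso-8859-1');
$text   = $ok ? $xmlDoc->documentElement->textContent : null;

unlink($file);
```

The helper takes the document as its first argument only because a free function cannot extend the method itself; a subclass of DOMDocument would be another option.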
Test script:
---------------
[test.php]
<?php
$xmlDoc = new DOMDocument();
$xmlDoc->load("cours.xml");
?>
[cours.xml] (whatever the line encoding; the problem is caused by the 'é'
in 'éclair')
<?xml version="1.0" encoding="UTF-8"?>
<root>
<chapitre titre="Titre du chapitre 1">
<partie titre="Titre de la partie 1">
Texte éclair
</partie>
</chapitre>
</root>
(displays):
Warning: DOMDocument::load() [domdocument.load]: Input is not proper UTF-8,
indicate encoding ! Bytes: 0xE9 0x63 0x6C 0x61 in
file:///C:/wamp/www/xsl2/cours.xml, line: 5 in C:\wamp\www\xsl2\test.php on
line 3
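The byte 0xE9 in the warning suggests that cours.xml is saved as ISO-8859-1 while its declaration says UTF-8. As a hedged sketch, and assuming that is indeed the case, declaring the encoding the file actually uses lets load() decode it without any extra parameter. The scratch file below is a hypothetical stand-in for cours.xml.

```php
<?php
// Scratch file saved as ISO-8859-1 AND declared as ISO-8859-1.
$file = tempnam(sys_get_temp_dir(), 'xml');
file_put_contents(
    $file,
    "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>\n"
    . "<root><partie titre=\"Titre de la partie 1\">Texte \xE9clair</partie></root>"
);

$xmlDoc = new DOMDocument();
$loaded = $xmlDoc->load($file);

// The DOM API hands strings back as UTF-8, whatever the source encoding.
$text = $xmlDoc->getElementsByTagName('partie')->item(0)->textContent;

unlink($file);
```

This only helps when the XML declaration can be edited; for files that cannot be changed, transcoding the content before parsing remains the alternative.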
--
Edit bug report at http://bugs.php.net/bug.php?id=51325&edit=1
--