From:
Operating system:
PHP version:      5.3.2
Package:          SimpleXML related
Bug Type:         Bug
Bug description:  simplexml_load_file() doesn't use HTTP headers

Description:
------------
Seen at http://stackoverflow.com/questions/2899274/

If you use simplexml_load_file() to load a remote document via HTTP, SimpleXML assumes that the content is UTF-8 regardless of the HTTP headers. In the test script below, at the time of writing, Google's web server returns something like:

-------------
HTTP/1.1 200 OK
Content-Type: text/xml; charset=GB2312
Date: Tue, 25 May 2010 05:05:17 GMT
Pragma: no-cache
Expires: Fri, 01 Jan 1990 00:00:00 GMT
Cache-Control: no-cache, no-store, must-revalidate expires=Thu, 24-May-2012 05:05:17 GMT; path=/; domain=.google.com
X-Content-Type-Options: nosniff
Server: igfe
X-XSS-Protection: 1; mode=block
Transfer-Encoding: chunked

<?xml version="1.0"?><xml_api_reply version="1">
<!-- GB2312-encoded content -->
</xml_api_reply>
-------------

The server advertises the content as "text/xml; charset=GB2312", but since the XML declaration doesn't mention the encoding, SimpleXML assumes it is UTF-8 and fails to load the document.

If it is at all possible, SimpleXML (and DOM, I assume) should look at the HTTP headers to find the document's encoding.

Test script:
---------------
simplexml_load_file('http://www.google.com/ig/api?weather=11791&hl=zh-CN');

Actual result:
--------------
PHP Warning: simplexml_load_file(): http://www.google.com/ig/api?weather=11791&hl=zh-CN:1: parser error : Input is not proper UTF-8, indicate encoding !
Bytes: 0xC7 0xE7 0x22 0x2F in Command line code on line 1

Warning: simplexml_load_file(): http://www.google.com/ig/api?weather=11791&hl=zh-CN:1: parser error : Input is not proper UTF-8, indicate encoding !
Bytes: 0xC7 0xE7 0x22 0x2F in Command line code on line 1
PHP Warning: simplexml_load_file(): t_system data="SI"/></forecast_information><current_conditions><condition data=" in Command line code on line 1

Warning: simplexml_load_file(): t_system data="SI"/></forecast_information><current_conditions><condition data=" in Command line code on line 1
PHP Warning: simplexml_load_file(): ^ in Command line code on line 1

Warning: simplexml_load_file():

--
Edit bug report at http://bugs.php.net/bug.php?id=51903&edit=1
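
Workaround sketch:
------------------
Until the loader honours the Content-Type header itself, the behaviour requested above can be approximated in userland: read the charset parameter from the response headers, convert the body to UTF-8, then hand it to simplexml_load_string(). The following is only a minimal sketch under stated assumptions. The helper names charset_from_headers() and load_xml_with_charset() are hypothetical (they are not part of PHP or SimpleXML), the iconv extension is assumed to be available, and the XML declaration is assumed to carry no encoding attribute that would conflict with the converted bytes.

```php
<?php
// Hypothetical helper: extract the charset parameter from a list of raw
// HTTP header lines (the same shape as PHP's $http_response_header).
function charset_from_headers(array $headers)
{
    $charset = null;
    foreach ($headers as $header) {
        if (preg_match('/^Content-Type:.*;\s*charset="?([^\s;"]+)/i', $header, $m)) {
            // Keep the last match so the final response wins after redirects.
            $charset = $m[1];
        }
    }
    return $charset;
}

// Hypothetical helper: convert the body to UTF-8 before parsing.
// Assumption: the XML declaration has no conflicting encoding attribute.
function load_xml_with_charset($body, $charset)
{
    if ($charset !== null && strcasecmp($charset, 'UTF-8') !== 0) {
        $converted = iconv($charset, 'UTF-8', $body);
        if ($converted === false) {
            return false; // unknown charset or invalid byte sequence
        }
        $body = $converted;
    }
    return simplexml_load_string($body);
}
```

With the test script's URL, one possible usage would be to fetch the body with file_get_contents($url), pass $http_response_header to charset_from_headers(), and give both results to load_xml_with_charset(). This only works around the problem in the calling script; it does not change what simplexml_load_file() does.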