From:
Operating system: CentOS 5
PHP version: 5.3.8
Package: SimpleXML related
Bug Type: Bug
Bug description: SimpleXML parse fails because of an URL, no error messages

Description:
------------
- SimpleXML - Revision: 314376
- libxml2 - version: 2.6.26

The XML document was generated with Word 2007 by saving as a regular Word file (docx), then extracting the "word/document.xml" file from the compressed docx file (open with 7-zip, WinZIP, WinRAR, etc.).

I've been developing a docx parser in PHP and have encountered a strange bug. SimpleXML fails to parse the XML because of a URL in the XML document. I get no errors, even with libxml_use_internal_errors(true) and libxml_get_errors(). The same document parses perfectly fine with DOMDocument and also passes the W3C validator.

I played around with the XML, adding and removing elements to find what causes the parse error, and it turned out to be a URL in the <w:document> node. It is the URL in this attribute of the w:document tag:

xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"

Removing the URL, keeping xmlns:w="", makes the parse successful.

Test script:
---------------
Script for reproduction: http://nerdvar.com/stigma/php_src/simplexml_bug_example.phps
The entire document.xml file: http://nerdvar.com/stigma/php_src/document.xml

Expected result:
----------------
SimpleXMLElement Object
(
    [body] => SimpleXMLElement Object
        (
            [0] =>
        )
)

Actual result:
--------------
Error

--
Edit bug report at https://bugs.php.net/bug.php?id=60416&edit=1
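
A minimal sketch of how the symptom can be narrowed down, under stated assumptions. The inline XML is a hypothetical stand-in for word/document.xml, not the reporter's file, and the reproduction script's own check is not shown in this report. The sketch tests the parse result strictly against false and reads the namespaced children through children(), since a loose truthiness check may treat a successfully parsed element as empty when none of its children are in the default namespace. Whether that is what happens in the linked script is an assumption.

```php
<?php
// Hypothetical minimal document using the same namespace URI as the report.
$ns = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main';
$source = '<?xml version="1.0" encoding="UTF-8"?>'
    . '<w:document xmlns:w="' . $ns . '">'
    . '<w:body><w:p><w:r><w:t>Hello</w:t></w:r></w:p></w:body>'
    . '</w:document>';

// Collect libxml errors instead of emitting warnings.
libxml_use_internal_errors(true);
$xml = simplexml_load_string($source);

// A real parse failure returns exactly false; compare strictly.
if ($xml === false) {
    foreach (libxml_get_errors() as $error) {
        echo trim($error->message), "\n";
    }
    exit(1);
}

// Elements in the "w" namespace are not visible through plain property
// access on $xml; request them by namespace URI.
$body = $xml->children($ns)->body;
$text = (string) $body->p->r->t;

echo $xml->getName(), "\n"; // root element name without prefix
echo $text, "\n";           // text of the first w:t element
```

If the strict comparison passes and children($ns) returns the body element, the document was parsed and the apparent failure would come from how the result is inspected rather than from the namespace URL itself.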