ID:               38538
Updated by:       [EMAIL PROTECTED]
Reported By:      spam02 at pornel dot net
-Status:          Open
+Status:          Wont fix
Bug Type:         DOM XML related
Operating System: *
PHP Version:      6CVS-2006-08-21 (snap)
New Comment:
loadHTML() only deals properly with HTML 4 documents. These are by definition not namespace-aware, so loadHTML() discards namespaces. If you want to keep the namespaces, use loadXML(), or, for what you propose, use the tidy extension to convert your HTML documents to XHTML.

Previous Comments:
------------------------------------------------------------------------

[2006-08-21 23:25:35] spam02 at pornel dot net

Description:
------------
From W3C: XHTML 1.0 is a reformulation of HTML 4 in XML. The semantics of HTML and XHTML elements are identical.

loadHTML() should put loaded elements in the XHTML namespace to preserve their semantics. These aren't just arbitrary elements; they are HTML elements, and HTML elements in XML (and therefore in DOM) belong to the "http://www.w3.org/1999/xhtml" namespace.

This isn't a purely academic problem. It's difficult to handle HTML and XHTML uniformly with DOM in PHP, because the difference in namespaces makes XPath/XSLT behave differently. AFAIK there's no trivial way to change the namespace of all elements in a document, so the namespace that loadHTML() assigns is quite important.

SUGGESTED CHANGE

Simply putting the elements in a namespace would break backwards compatibility a little (XPath queries, for example). Therefore I suggest adding an optional boolean argument to loadHTML() and loadHTMLFile() that enables the new behavior.

Reproduce code:
---------------
<?php
$html = new DOMDocument();
$html->loadHTML('<html><body>hello');

$xhtml = new DOMDocument();
$xhtml->loadXML('<html xmlns="http://www.w3.org/1999/xhtml"><body>hello</body></html>');

function test($doc) {
    $x = new DOMXPath($doc);
    $x->registerNamespace("x", "http://www.w3.org/1999/xhtml");
    echo $x->evaluate("string(//x:body)");
}

test($html);
test($xhtml);

// local-name() could be used as a workaround in this particular test case,
// but this isn't possible/feasible in every case.
Expected result:
----------------
hellohello

Actual result:
--------------
hello

------------------------------------------------------------------------
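Until loadHTML() offers such an option, one possible workaround is to copy the tree it produces into a fresh document, recreating every element in the XHTML namespace with DOMDocument::createElementNS(). The following is a minimal sketch assuming the standard dom extension. The helper names copyIntoXhtmlNs() and htmlToXhtmlDom() are hypothetical and not part of PHP's DOM API. Attributes are copied without a namespace (ordinary HTML attributes carry none in XHTML either), and text and comment nodes are imported unchanged.

```php
<?php
// Namespace that XHTML (and HTML elements in XML/DOM) live in.
const XHTML_NS = 'http://www.w3.org/1999/xhtml';

// Hypothetical helper: recursively copy $node into $target, recreating
// every element in the XHTML namespace.
function copyIntoXhtmlNs(DOMNode $node, DOMDocument $target): DOMNode
{
    if ($node instanceof DOMElement) {
        $el = $target->createElementNS(XHTML_NS, $node->nodeName);
        foreach ($node->attributes as $attr) {
            // Plain HTML attributes stay in no namespace.
            $el->setAttribute($attr->nodeName, $attr->nodeValue);
        }
        foreach ($node->childNodes as $child) {
            $el->appendChild(copyIntoXhtmlNs($child, $target));
        }
        return $el;
    }
    // Text, comments, CDATA, etc. have no namespace: import them as-is.
    return $target->importNode($node, true);
}

// Hypothetical helper: parse HTML with loadHTML(), then return a new
// document whose elements are all in the XHTML namespace.
function htmlToXhtmlDom(string $html): DOMDocument
{
    $src = new DOMDocument();
    $src->loadHTML($html);
    $dst = new DOMDocument();
    $dst->appendChild(copyIntoXhtmlNs($src->documentElement, $dst));
    return $dst;
}

// Same namespace-aware query as in the reproduce code.
$doc = htmlToXhtmlDom('<html><body>hello');
$x = new DOMXPath($doc);
$x->registerNamespace('x', XHTML_NS);
echo $x->evaluate('string(//x:body)');
```

The copy costs one extra pass over the document, but it lets the same XPath/XSLT code run against documents loaded with either loadHTML() or loadXML().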