ID:               38538
User updated by:  spam02 at pornel dot net
Reported By:      spam02 at pornel dot net
Status:           Wont fix
Bug Type:         DOM XML related
Operating System: *
PHP Version:      6CVS-2006-08-21 (snap)
New Comment:

I'm not saying that loadHTML() should read namespaces from the input; of course HTML/SGML syntax doesn't support them. But by loading HTML into an XML DOM you're basically converting it to a namespace-aware representation. That the document is HTML is implied by the source format and not explicitly stated in the document, so the namespace information needs to be added.

Namespaces are used to distinguish incompatible nodes in the DOM; however, the DOM representations of XHTML and HTML are 100% compatible (same semantics, same structure).

Tidy nodes aren't compatible with the PHP DOM extension, so this is not a solution.


Previous Comments:
------------------------------------------------------------------------

[2006-08-22 05:41:44] [EMAIL PROTECTED]

loadHTML() only properly deals with HTML(4) documents, which are by definition not namespace aware, and it therefore discards namespaces. If you want to keep the namespaces, use loadXML() or, for your proposal, use the tidy extension to make XHTML out of your HTML documents.

------------------------------------------------------------------------

[2006-08-21 23:25:35] spam02 at pornel dot net

Description:
------------
From W3C: XHTML/1.0 is a reformulation of HTML 4 in XML. The semantics of HTML and XHTML elements are identical.

loadHTML() should put loaded elements in the XHTML namespace to preserve their semantics. These aren't just any random elements: they are HTML elements, and HTML elements in XML (and therefore in the DOM) are in the "http://www.w3.org/1999/xhtml" namespace.

This isn't a purely academic problem. It's difficult to handle both HTML and XHTML uniformly using DOM in PHP, because the difference in namespaces causes XPath/XSLT to behave differently. AFAIK there's no trivial method of changing the namespace of all document elements, so the namespace returned by loadHTML() is quite important.

SUGGESTED CHANGE

Simply putting elements in a namespace will break backwards compatibility a little (XPath queries, for example).
Therefore I suggest adding an optional boolean argument to loadHTML() and loadHTMLFile() that enables the new behavior.

Reproduce code:
---------------
<?php

$html = new DOMDocument();
$html->loadHTML('<html><body>hello');

$xhtml = new DOMDocument();
$xhtml->loadXML('<html xmlns="http://www.w3.org/1999/xhtml"><body>hello</body></html>');

function test($doc)
{
    $x = new DOMXPath($doc);
    $x->registerNamespace("x", "http://www.w3.org/1999/xhtml");
    echo $x->evaluate("string(//x:body)");
}

test($html);
test($xhtml);

// local-name() could be used as a workaround in this particular test case,
// however this isn't possible/feasible in every case.

Expected result:
----------------
hellohello

Actual result:
--------------
hello

------------------------------------------------------------------------
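
For illustration, the local-name() workaround mentioned in the reproduce code can be sketched as follows. This is only a minimal sketch for this one test case, not a general fix. The helper name bodyText() is hypothetical, and the HTML input is closed explicitly here for clarity. The query matches elements by local name only, so it ignores whether the parser put them in the XHTML namespace.

```php
<?php

// Load the same content once as HTML (no namespace) and once as XHTML.
$html = new DOMDocument();
$html->loadHTML('<html><body>hello</body></html>');

$xhtml = new DOMDocument();
$xhtml->loadXML('<html xmlns="http://www.w3.org/1999/xhtml"><body>hello</body></html>');

// Hypothetical helper: returns the text of the first <body> element,
// matching on the local name so that the namespace is ignored.
function bodyText(DOMDocument $doc): string
{
    $x = new DOMXPath($doc);
    return $x->evaluate("string(//*[local-name()='body'])");
}

echo bodyText($html);   // hello
echo bodyText($xhtml);  // hello
```

As the report notes, this does not scale well: every step of every XPath expression has to be rewritten in this form, and it does not help with XSLT stylesheets that match on namespaced element names.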
