ID:               38538
 Updated by:       [EMAIL PROTECTED]
 Reported By:      spam02 at pornel dot net
-Status:           Open
+Status:           Wont fix
 Bug Type:         DOM XML related
 Operating System: *
 PHP Version:      6CVS-2006-08-21 (snap)
 New Comment:

loadHTML() only handles HTML 4 documents properly. These are by
definition not namespace-aware, so any namespaces are discarded.

If you want to keep namespaces, use loadXML(). For what you are
proposing, use the tidy extension to convert your HTML documents
to XHTML first.




Previous Comments:
------------------------------------------------------------------------

[2006-08-21 23:25:35] spam02 at pornel dot net

Description:
------------
From the W3C: XHTML 1.0 is a reformulation of HTML 4 in XML. The
semantics of HTML and XHTML elements are identical.

loadHTML() should put loaded elements in the XHTML namespace to preserve
their semantics. These aren't just any random elements; they are HTML
elements, and HTML elements in XML (and therefore in DOM) belong to the
"http://www.w3.org/1999/xhtml" namespace.

This isn't a purely academic problem.

It's difficult to handle HTML and XHTML uniformly with PHP's DOM: the
difference in namespaces makes XPath and XSLT behave differently.

AFAIK there's no trivial way to change the namespace of every element
in a document, so the namespace loadHTML() assigns is quite important.
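For illustration, the non-trivial conversion looks roughly like this: a
minimal sketch, assuming PHP 7.1+ and that dropping the doctype and
processing instructions is acceptable. The helpers toXhtmlNamespace()
and copyIntoXhtml() are hypothetical, not part of PHP's DOM extension.
They rebuild a loadHTML() document node by node, using createElementNS()
so that every element ends up in the XHTML namespace.

```php
<?php
// Hypothetical helpers (not part of ext/dom): copy a document produced
// by loadHTML() into a new DOMDocument whose elements are all in the
// XHTML namespace.
const XHTML_NS = 'http://www.w3.org/1999/xhtml';

function toXhtmlNamespace(DOMDocument $src): DOMDocument
{
    $dst = new DOMDocument('1.0', 'UTF-8');
    // Only the root element is copied; the HTML doctype is dropped.
    foreach ($src->childNodes as $child) {
        if ($child instanceof DOMElement) {
            $dst->appendChild(copyIntoXhtml($dst, $child));
        }
    }
    return $dst;
}

function copyIntoXhtml(DOMDocument $dst, DOMNode $node): ?DOMNode
{
    if ($node instanceof DOMElement) {
        // Recreate the element under the same (lower-cased) name,
        // this time in the XHTML namespace.
        $el = $dst->createElementNS(XHTML_NS, strtolower($node->localName));
        // Attributes are copied as plain, namespace-less attributes.
        foreach ($node->attributes as $attr) {
            $el->setAttribute($attr->nodeName, $attr->nodeValue);
        }
        foreach ($node->childNodes as $child) {
            $copy = copyIntoXhtml($dst, $child);
            if ($copy !== null) {
                $el->appendChild($copy);
            }
        }
        return $el;
    }
    // Text (including CDATA) and comments are imported unchanged.
    if ($node instanceof DOMText || $node instanceof DOMComment) {
        return $dst->importNode($node, false);
    }
    // Anything else (e.g. processing instructions) is skipped.
    return null;
}
```

Calling `$doc = toXhtmlNamespace($html);` after `$html->loadHTML(...)`
lets XPath queries that use a registered XHTML prefix match, at the cost
of copying the whole tree; an option on loadHTML() itself would avoid
that extra pass.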


SUGGESTED CHANGE
Simply putting elements in a namespace would break backwards
compatibility somewhat (existing XPath queries, for example). I
therefore suggest adding an optional boolean argument to loadHTML() and
loadHTMLFile() that enables the new behavior.

Reproduce code:
---------------
<?php
$html = new DOMDocument();
$html->loadHTML('<html><body>hello');

$xhtml = new DOMDocument();
$xhtml->loadXML('<html xmlns="http://www.w3.org/1999/xhtml"><body>hello</body></html>');

// Print the text content of <body>, looked up in the XHTML namespace.
function test($doc)
{
    $x = new DOMXPath($doc);
    $x->registerNamespace("x", "http://www.w3.org/1999/xhtml");
    echo $x->evaluate("string(//x:body)");
}

test($html);
test($xhtml);


// local-name() could be used as a workaround in this particular
// test case, but that isn't feasible in every case.


Expected result:
----------------
hellohello

Actual result:
--------------
hello



------------------------------------------------------------------------

