ID:               38538
 User updated by:  spam02 at pornel dot net
 Reported By:      spam02 at pornel dot net
 Status:           Wont fix
 Bug Type:         DOM XML related
 Operating System: *
 PHP Version:      6CVS-2006-08-21 (snap)
 New Comment:

I'm not saying that loadHTML() should read the namespace from the input;
of course HTML/SGML syntax doesn't support that.

But by loading HTML into an XML DOM you're effectively converting it to a
namespace-aware representation. The fact that the document is HTML is
implied by the source format rather than stated explicitly in the
document, so the namespace information has to be added by the parser.

Namespaces are used to distinguish incompatible nodes in the DOM; however,
the DOM representations of XHTML and HTML are 100% compatible (same
semantics, same structure).

Tidy nodes aren't compatible with the PHP DOM extension, so this is not a
solution.


Previous Comments:
------------------------------------------------------------------------

[2006-08-22 05:41:44] [EMAIL PROTECTED]

loadHTML() only properly deals with HTML 4 documents, which by
definition are not namespace-aware, so it discards any
namespaces.

If you want to keep the namespaces, use loadXML(), or, for your
proposal, use the tidy extension to convert your HTML documents
to XHTML.



------------------------------------------------------------------------

[2006-08-21 23:25:35] spam02 at pornel dot net

Description:
------------
From the W3C: XHTML 1.0 is a reformulation of HTML 4 in XML. The
semantics of HTML and XHTML elements are identical.

loadHTML() should put the loaded elements in the XHTML namespace to
preserve their semantics. These aren't just arbitrary elements; they are
HTML elements, and HTML elements in XML (and therefore in the DOM) belong
to the "http://www.w3.org/1999/xhtml" namespace.

This isn't a purely academic problem.

It's difficult to handle HTML and XHTML uniformly with PHP's DOM: the
difference in namespaces makes XPath queries and XSLT behave differently.

AFAIK there's no trivial way to change the namespace of all elements in a
document, so the namespace assigned by loadHTML() is quite important.
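The non-trivial route is to rebuild the whole tree. A minimal sketch,
assuming the only goal is to get every element into the XHTML namespace:
walk the tree produced by loadHTML() and recreate each element with
DOMDocument::createElementNS(). The helpers xhtmlify() and copyIntoXhtml()
are hypothetical names, not part of the DOM extension; attributes are
copied without a namespace, the doctype is dropped, and non-element nodes
(text, comments) are imported unchanged.

```php
<?php
// Hypothetical workaround: copy a loadHTML() tree into a new document,
// recreating every element inside the XHTML namespace.
const XHTML_NS = 'http://www.w3.org/1999/xhtml';

function xhtmlify(DOMDocument $src): DOMDocument
{
    $dst = new DOMDocument('1.0', 'UTF-8');
    // Only the root element is copied; the HTML 4 doctype is dropped.
    if ($src->documentElement !== null) {
        $dst->appendChild(copyIntoXhtml($src->documentElement, $dst));
    }
    return $dst;
}

function copyIntoXhtml(DOMNode $node, DOMDocument $dst): DOMNode
{
    if ($node instanceof DOMElement) {
        // Same local name, but now in the XHTML namespace.
        $el = $dst->createElementNS(XHTML_NS, $node->localName);
        foreach ($node->attributes as $attr) {
            // HTML attributes carry no namespace, so plain setAttribute().
            $el->setAttribute($attr->name, $attr->value);
        }
        foreach ($node->childNodes as $child) {
            $el->appendChild(copyIntoXhtml($child, $dst));
        }
        return $el;
    }
    // Text, comments, etc. are imported into the new document as-is.
    return $dst->importNode($node, true);
}
```

With this, calling test(xhtmlify($html)) in the reproduce code prints
"hello", at the cost of a full copy of the document, which is why a
built-in option would still be preferable.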


SUGGESTED CHANGE
Simply putting the elements in a namespace would break backwards
compatibility somewhat (existing XPath queries, for example). Therefore I
suggest adding an optional boolean argument to loadHTML() and
loadHTMLFile() that enables the new behavior.

Reproduce code:
---------------
<?php
// The same content, loaded once as HTML and once as namespaced XHTML.
$html = new DOMDocument();
$html->loadHTML('<html><body>hello');

$xhtml = new DOMDocument();
$xhtml->loadXML('<html xmlns="http://www.w3.org/1999/xhtml"><body>hello</body></html>');

// Print the text of <body>, looked up in the XHTML namespace.
function test($doc)
{
    $x = new DOMXPath($doc);
    $x->registerNamespace("x", "http://www.w3.org/1999/xhtml");
    echo $x->evaluate("string(//x:body)");
}

test($html);   // prints nothing: loadHTML() leaves <body> in no namespace
test($xhtml);  // prints "hello"

// local-name() could be used as a workaround in this particular
// test case; however, this isn't possible/feasible in every case.


Expected result:
----------------
hellohello

Actual result:
--------------
hello



------------------------------------------------------------------------
