From:             vesselin at awcreator dot com
Operating system: Linux
PHP version:      5.2.2
PHP Bug Type:     DOM XML related
Bug description:  The XML DOM loadHTML method incorrectly duplicates text nodes

Description:
------------
DOMDocument->loadHTML() incorrectly loads some text nodes twice when
parsing an HTML document. Formatting and whitespace in the loaded HTML
matter: for example, the bug does not appear if the <h1> tag in the sample
code is not followed by spaces/tabs.
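
The whitespace sensitivity described above could be checked with something
like the following sketch. It counts matching text nodes via XPath for two
variants of the markup. The helper name count_text_nodes_containing() and
the exact whitespace in each variant are assumptions made for illustration;
they are not taken from the original sample.

```php
<?php
// Hypothetical helper: count text nodes whose content includes $needle.
// XPath is used instead of wholeText, so the count reflects actual nodes
// in the tree. $needle must not contain double quotes.
function count_text_nodes_containing($html, $needle)
{
    $document = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // collect parser warnings quietly
    $document->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($document);
    $query = sprintf('//text()[contains(., "%s")]', $needle);
    return $xpath->query($query)->length;
}

// Two assumed variants: one with indentation around the <h1>, one without.
$variants = array(
    'indented <h1>' => "<html><body><table><tr><td>\n          <h1>Left col</h1>Some generic text\n</td></tr></table></body></html>",
    'no whitespace' => "<html><body><table><tr><td><h1>Left col</h1>Some generic text</td></tr></table></body></html>",
);

$counts = array();
foreach ($variants as $label => $html) {
    $counts[$label] = count_text_nodes_containing($html, 'Some generic text');
    printf("%s: %d matching text node(s)\n", $label, $counts[$label]);
}
```

If both variants report the same count, the duplication is more likely to
come from how the text is read back than from the parsed tree itself.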

Reproduce code:
---------------
<?php
function dump_node ($node)
{
        for (
                $child = $node->firstChild;
                $child !== null;
                $child = $child->nextSibling
        ) {
                printf ("NODE TYPE: %s\n", $child->nodeType);
                switch ($child->nodeType) {
                case XML_ELEMENT_NODE:
                        printf ("TYPE: ELEMENT, TAG: \"%s\"\n", $child->tagName);
                        dump_node ($child);
                        break;
                case XML_TEXT_NODE:
                        printf ("TYPE TEXT, TEXT: \"%s\"\n", htmlspecialchars ($child->wholeText));
                        break;
                }
        }
}

$html = <<<EOF
<html>
<body>
<table>
<tr>
<td>
          <h1>Left col</h1>Some generic text
</td>
</tr>
</table>
</body>
</html>
EOF;

$document = new DOMDocument ();
$document->resolveExternals = true;
$document->loadHTML ($html);
dump_node ($document);
?>


Expected result:
----------------
A dump of all document nodes and only one text node that has "Some generic
text" as data.

Actual result:
--------------
A dump of all document nodes and two text nodes that have "Some generic
text" as data.
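
To narrow down whether the tree really holds two such text nodes, or whether
the repeated text only shows up through the wholeText property, a sketch
along these lines may help. It records each text node's own data next to
its wholeText. The helper collect_text_nodes() is hypothetical and not part
of the reproduce code above, and the markup is a condensed copy of the
sample.

```php
<?php
// Hypothetical helper: walk the tree and record, for every text node,
// both its own character data and the value of its wholeText property.
function collect_text_nodes(DOMNode $node, array &$found = array())
{
    for ($child = $node->firstChild; $child !== null; $child = $child->nextSibling) {
        if ($child->nodeType === XML_TEXT_NODE) {
            $found[] = array(
                'data'      => $child->data,
                'wholeText' => $child->wholeText,
            );
        } elseif ($child->nodeType === XML_ELEMENT_NODE) {
            collect_text_nodes($child, $found);
        }
    }
    return $found;
}

// Condensed copy of the sample markup (assumed equivalent for this check).
$html = "<html><body><table><tr><td>\n          <h1>Left col</h1>Some generic text\n</td></tr></table></body></html>";

$document = new DOMDocument();
$document->loadHTML($html);

$nodes   = collect_text_nodes($document);
$matches = 0;
foreach ($nodes as $info) {
    if (strpos($info['data'], 'Some generic text') !== false) {
        $matches++;
    }
    printf("data=%s | wholeText=%s\n",
        var_export($info['data'], true),
        var_export($info['wholeText'], true));
}
printf("Text nodes whose own data contains the phrase: %d\n", $matches);
```

A count of one here, combined with the phrase appearing in the wholeText of
more than one node, would point at wholeText rather than at the parser.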

-- 
Edit bug report at http://bugs.php.net/?id=41374&edit=1
