 ID:               41374
 Updated by:       [EMAIL PROTECTED]
 Reported By:      vesselin at awcreator dot com
-Status:           Open
+Status:           Closed
 Bug Type:         DOM XML related
 Operating System: Linux
 PHP Version:      5.2.2
 Assigned To:      rrichards
 New Comment:

This bug has been fixed in CVS.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.

Thank you for the report, and for helping us make PHP better.


Previous Comments:
------------------------------------------------------------------------

[2007-05-12 13:43:52] [EMAIL PROTECTED]

change summary and assign to self

------------------------------------------------------------------------

[2007-05-12 09:37:22] vesselin at awcreator dot com

Actually this sentence:
"For example the bug does not show if the <h1> tag in the sample code is
not followed by spaces/tabs."

should be read as:
"For example the bug does not show if the <h1> tag in the sample code is
not PRECEDED by spaces/tabs."

------------------------------------------------------------------------

[2007-05-12 09:34:57] vesselin at awcreator dot com

Description:
------------
DOMDocument->loadHTML() incorrectly loads some text nodes twice when
parsing HTML documents.
Please note that formatting and whitespace in the loaded HTML are
important. For example the bug does not show if the <h1> tag in the
sample code is not followed by spaces/tabs.

Reproduce code:
---------------
<?php

function dump_node ($node)
{
    for ( $child = $node->firstChild; $child !== null; $child = $child->nextSibling )
    {
        printf ("NODE TYPE: %s\n", $child->nodeType);
        switch ($child->nodeType)
        {
            case XML_ELEMENT_NODE:
                printf ("TYPE: ELEMENT, TAG: \"%s\"\n", $child->tagName);
                dump_node ($child);
                break;
            case XML_TEXT_NODE:
                printf ("TYPE TEXT, TEXT: \"%s\"\n", htmlspecialchars ($child->wholeText));
                break;
        }
    }
}

$html = <<<EOF
<html>
<body>
    <table>
        <tr>
            <td>
                <h1>Left col</h1>Some generic text
            </td>
        </tr>
    </table>
</body>
</html>
EOF;

$document = new DOMDocument ();
$document->resolveExternals = true;
$document->loadHTML ($html);
dump_node ($document);

?>

Expected result:
----------------
A dump of all document nodes and only one text node that has "Some
generic text" as data.

Actual result:
--------------
A dump of all document nodes and two text nodes that have "Some generic
text" as data.

------------------------------------------------------------------------
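
The reproduce code prints each text node through its wholeText property,
which by definition also includes the content of logically adjacent text
nodes. A minimal sketch of an alternative dump that reads each node's own
data property instead is given below. It assumes the duplicated output
comes from how wholeText is computed and not from the parser creating an
extra node; the report itself does not state the root cause. The helper
name collect_text_data is hypothetical and not part of the original report.

```php
<?php
// Hypothetical helper (not from the original report): walks the tree and
// collects the character data of every text node it finds.
function collect_text_data($node, array &$out)
{
    for ($child = $node->firstChild; $child !== null; $child = $child->nextSibling) {
        if ($child->nodeType === XML_TEXT_NODE) {
            // data holds only this node's own content; wholeText would
            // also pull in logically adjacent text nodes.
            $out[] = $child->data;
        } elseif ($child->nodeType === XML_ELEMENT_NODE) {
            collect_text_data($child, $out);
        }
    }
}

// Same markup as in the report; the <h1> tag is preceded by whitespace.
$html = <<<EOF
<html>
<body>
    <table>
        <tr>
            <td>
                <h1>Left col</h1>Some generic text
            </td>
        </tr>
    </table>
</body>
</html>
EOF;

$document = new DOMDocument();
$document->loadHTML($html);

$texts = array();
collect_text_data($document, $texts);

// Whitespace-only nodes print as empty strings after trim().
foreach ($texts as $text) {
    printf("TEXT: \"%s\"\n", htmlspecialchars(trim($text)));
}
```

Comparing this output with the wholeText-based dump on an affected build
should show whether the second "Some generic text" entry belongs to a real
node or only appears in the aggregated property value.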
