Hi Nathan,

Since you're already on the subject of iterating over children, I'd like to ask another question:

I was trying to do basically the same thing as Tim when I ran into difficulties iterating over DOMElement->childNodes with foreach while manipulating strings inside the nodes, or even replacing a DOMElement/DOMNode/DOMText with another node. Instead, I am currently iterating like this:

$child = $element->firstChild;
while ($child !== null) {
        // Remember the next sibling before the current child is changed
        $next_sibling = $child->nextSibling;

        // Do something with child (manipulate, replace, ...)

        // Continue iteration
        $child = $next_sibling;
}
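
To make the pattern above concrete, here is a minimal runnable sketch. The HTML snippet and the choice of wrapping each text node in an <em> element are assumptions made purely for illustration; the point is only that nextSibling is read before the current child is replaced.

```php
<?php
// Illustrative input; any markup with mixed text and element children works.
$doc = new DOMDocument();
$doc->loadHTML('<html><body><p>one<b>two</b>three</p></body></html>');
$p = $doc->getElementsByTagName('p')->item(0);

$child = $p->firstChild;
while ($child !== null) {
    // Read the sibling first: after replaceChild() the old node is detached
    // and its nextSibling no longer points into the tree.
    $next = $child->nextSibling;

    if ($child instanceof DOMText) {
        // Hypothetical manipulation: wrap each direct text child in <em>.
        $em = $doc->createElement('em');
        $em->appendChild($doc->createTextNode($child->nodeValue));
        $p->replaceChild($em, $child);
    }

    $child = $next;
}

$emCount    = $p->getElementsByTagName('em')->length;
$childCount = $p->childNodes->length;
```

Only the direct children of <p> are visited here, so the text inside <b> is left alone.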

Is this correct, or is there any better way?

Thank you in advance!
Mario


Nathan Nobbe wrote:
bouncing back to the list so that others may benefit from our work...

On Fri, Sep 5, 2008 at 3:09 PM, Tim Gustafson <[EMAIL PROTECTED]> wrote:

Nathan,

Thanks for the suggestion, but it's still not working for me.  Here's my
code:

===========
$HTML = new DOMDocument();
@$HTML->loadHTML($text);
$Elements = $HTML->getElementsByTagName("*");

for ($X = 0; $X < $Elements->length; $X++) {
  $Element =  $Elements->item($X);

 if ($Element->tagName == "a") {
   # SNIP - Do something with A tags here
 } else if ($Element instanceof DOMText) {
   echo $Element->nodeValue; exit;
 }
}
===========

This loop never executes the instanceof part of the code.  If I add:

 } else if ($Element instanceof DOMNode) {
   echo "foo!"; exit;
 }

Then it echoes "foo!" as expected.  It just seems that none of the nodes in
the tree are DOMText nodes.  In fact, get_class($Element) returns
"DOMElement" for every node in the tree.

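A small check along these lines reproduces the observation. The HTML string is a made-up example: getElementsByTagName("*") yields only DOMElement objects, while the text shows up as DOMText among the childNodes of an element.

```php
<?php
// Illustrative input, not the original $text from the post.
$doc = new DOMDocument();
$doc->loadHTML('<html><body><p>Hello <a href="#x">link</a></p></body></html>');

// Collect the distinct classes returned by getElementsByTagName("*").
$seen = [];
foreach ($doc->getElementsByTagName('*') as $node) {
    $seen[get_class($node)] = true;
}
$elementClasses = array_keys($seen);

// Collect the classes of the direct children of the <p> element.
$p = $doc->getElementsByTagName('p')->item(0);
$childClasses = [];
foreach ($p->childNodes as $child) {
    $childClasses[] = get_class($child);
}
```

The first list contains only "DOMElement"; the second contains a DOMText for "Hello " followed by the DOMElement for the link.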

Tim,

i got your code working with minimal effort by pulling in two of the methods
i posted and making some revisions.  scope it out,
(this will produce the same output as my last post (the part after OUT:))

<?php
$text = '<html><body>Test<br><h2>[EMAIL PROTECTED]<a name="bar">stuff
inside the link</a>Foo</h2><p>care</p><p>yoyser</p></body></html>';
$HTML = new DOMDocument();
$HTML->loadHTML($text);
$Elements = $HTML->getElementsByTagName("*");

for ($X = 0; $X < $Elements->length; $X++) {
  $Element = $Elements->item($X);
  if ($Element->hasChildNodes()) {
    foreach ($Element->childNodes as $curChild) {
      if ($curChild->nodeName == "a") {
        # SNIP - Do something with A tags here
      } else if ($curChild instanceof DOMText) {
        convertToLinkIfNecc($Element, $curChild);
      }
    }
  }
}
echo $HTML->saveXML() . PHP_EOL;


function convertToLinkIfNecc(DOMElement $textContainer, DOMText $textNode) {
    if ((strtolower($textContainer->nodeName) != 'a') &&
        (filter_var($textNode->nodeValue, FILTER_VALIDATE_EMAIL) !== false)) {
        convertMailtoToAnchor($textContainer, $textNode);
    }
}

function convertMailtoToAnchor(DOMElement $textContainer, DOMText $textNode) {
    $newNode = new DOMElement('a', $textNode->nodeValue);
    $textContainer->replaceChild($newNode, $textNode);
    $newNode->setAttribute('href', "mailto:{$textNode->nodeValue}");
}
?>

so, the problem is that getElementsByTagName("*") only returns element
nodes; text nodes never show up in that list, no matter how deep the tree
is.  this is why you need to call hasChildNodes() on each element and, if
that is true, iterate across its childNodes (note that childNodes is a
property, not a method).
the code i have shown works for the html i posted, and because
getElementsByTagName("*") returns the elements at every depth, checking the
direct children of each one also reaches text nodes deeper in the tree.  if
you start from a single node instead (say, just the body element), you will
need something that recurses down to the leaves :)
anyway, i'm going to try and hook up a RecursiveDOMDocumentIterator that
implements RecursiveIterator so that it has the convenient foreach support.
also, i'll probably try to hook up a Filter variant of this class so that
situations like this are trivial.
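
For reference, here is a rough sketch of what a RecursiveIterator over DOM children could look like. It is not the class promised above: the name DomChildIterator, the snapshot of childNodes taken in the constructor, and the sample HTML are all assumptions made for illustration, and the return types assume PHP 8.0 or later.

```php
<?php
// Hypothetical helper: walks the children of a DOMNode and lets
// RecursiveIteratorIterator descend into each child's own children.
final class DomChildIterator implements RecursiveIterator
{
    /** @var DOMNode[] snapshot of the children, safe against later edits */
    private array $nodes;
    private int $pos = 0;

    public function __construct(DOMNode $parent)
    {
        // Copy the live DOMNodeList into a plain array.
        $this->nodes = iterator_to_array($parent->childNodes, false);
    }

    public function current(): mixed { return $this->nodes[$this->pos]; }
    public function key(): mixed     { return $this->pos; }
    public function next(): void     { $this->pos++; }
    public function rewind(): void   { $this->pos = 0; }
    public function valid(): bool    { return isset($this->nodes[$this->pos]); }

    public function hasChildren(): bool
    {
        return $this->nodes[$this->pos]->hasChildNodes();
    }

    public function getChildren(): RecursiveIterator
    {
        return new self($this->nodes[$this->pos]);
    }
}

// Illustrative input with text nodes nested several levels deep.
$doc = new DOMDocument();
$doc->loadHTML('<html><body><div><p>deep<b>text</b></p></div></body></html>');

// SELF_FIRST visits every node, parents before their children.
$walker = new RecursiveIteratorIterator(
    new DomChildIterator($doc->documentElement),
    RecursiveIteratorIterator::SELF_FIRST
);

$texts = [];
foreach ($walker as $node) {
    if ($node instanceof DOMText) {
        $texts[] = $node->nodeValue;
    }
}
```

Because each level iterates over an array copy rather than the live list, replacing a node inside the foreach does not disturb the traversal of its siblings. A filtering variant could wrap $walker in a CallbackFilterIterator that keeps only DOMText nodes.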

stay tuned :D

-nathan


--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
