> I've asked similar questions before, but I think I'm finally asking the
> *right* question to get the right answer.
That's often the tricky part :-)

> I'm looking for some suggestions on the best method of parsing an HTML document
> (or part thereof), with the view of CAPTURING and MODIFYING a specific
> element of a specific tag.
>
> something like:
>
> 1. look for a given tag eg DIV
> 2. capture the tag (everything from '<DIV' up to the '>')
> 3. look for a given attribute (eg ID="foo", ID=foo, ID='foo' -- all valid
> ways)
> 4. capture it
> 5. be given the opportunity to manipulate the attribute's value, delete it,
> etc
> 6. place captured tag (complete with modified elements) back into the string
> in its original position
> 7. return to step 1, looking for the next occurrence of a DIV tag

If you are only looking for a SPECIFIC tag, you just simplified life immensely!

<?php
# Get some beautiful sample HTML (needs allow_url_fopen enabled):
$html = file_get_contents('http://php.net/') or die("Could not open php.net");

# Find where the first DIV tag starts (case-insensitive):
$divpos = stripos($html, '<div');
if ($divpos === false) die("No DIV tag found");

# Break the HTML up into "before" and "after" the start of the DIV tag:
$before_div = substr($html, 0, $divpos);
$after_div = substr($html, $divpos);

# Find the *END* of the DIV tag:
# KNOWN BUG:
# They *could* bury a > in their attributes if they work at it...
$endpos = strpos($after_div, '>');
if ($endpos === false) die("Unterminated DIV tag");

# +1 so the tag keeps its closing '>':
$div = substr($after_div, 0, $endpos + 1);

# Now get the "after" part to *really* be after the *WHOLE* DIV tag:
$after_div = substr($after_div, $endpos + 1);

echo "Before DIV tag: <BR>", htmlentities($before_div), "<HR>\n";
echo "DIV tag itself: <BR>", htmlentities($div), "<HR>\n";
echo "After DIV tag: <BR>", htmlentities($after_div), "<HR>\n";
?>

Watch the +1 in the last two substr() calls: without it, the closing '>' lands in the "after" part instead of in the tag itself. I never get those offsets right on my first pass of coding, so test it against your own HTML before trusting it.

You can then use the same technique to search inside $div for the ID attribute, more or less.
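The full seven-step loop from the question can be sketched in PHP. This version swaps the manual strpos()/substr() bookkeeping for preg_replace_callback(), which finds each DIV tag and splices the edited tag back into its original position for you. The rewrite_div_ids() name and its callback convention (return a new value, or null to delete the attribute) are hypothetical, made up for this example. Like the string-searching code above, it assumes no '>' hides inside an attribute value, and an " id=" buried inside another attribute's quoted value could fool it.

```php
<?php
// Hypothetical helper (not from the original post): rewrite the ID attribute
// of every opening <div> tag in $html. $edit gets the current ID value with
// its quotes stripped and returns a new value, or null to delete the attribute.
// Assumes no '>' inside attribute values; an " id=" buried inside another
// attribute's quoted value could be matched by mistake.
function rewrite_div_ids(string $html, callable $edit): string
{
    // Steps 1, 2, 6 and 7: match each opening DIV tag (case-insensitive);
    // preg_replace_callback() puts the returned tag back in the same place.
    return preg_replace_callback('/<div\b[^>]*>/i', function (array $tag) use ($edit) {
        // Steps 3 and 4: ID="foo", ID='foo' or ID=foo, captured with quotes.
        $pattern = '/\s+id\s*=\s*("[^"]*"|\'[^\']*\'|[^\s"\'>]+)/i';

        // Only rewrite the first ID attribute in the tag (limit = 1).
        return preg_replace_callback($pattern, function (array $attr) use ($edit) {
            $raw = $attr[1];
            $quoted = $raw[0] === '"' || $raw[0] === "'";
            $value = $quoted ? substr($raw, 1, -1) : $raw;

            // Step 5: let the caller change or delete the value.
            $new = $edit($value);
            if ($new === null) {
                return ''; // drop the attribute and its leading whitespace
            }
            // Write it back double-quoted, escaping any embedded double quotes.
            return ' id="' . str_replace('"', '&quot;', $new) . '"';
        }, $tag[0], 1);
    }, $html);
}

// Usage: upper-case every DIV id except "bar", which is removed.
$html = '<p>Hi</p><DIV ID=foo class="x">a</DIV><div id=\'bar\'>b</div><div class="y">c</div>';
echo rewrite_div_ids($html, function (string $id) {
    return $id === 'bar' ? null : strtoupper($id);
}), "\n";
// prints: <p>Hi</p><DIV id="FOO" class="x">a</DIV><div>b</div><div class="y">c</div>
```

Note that the rewritten attribute always comes back as lower-case id="..." with a single leading space, whatever quoting style the original used; tags other than DIV are left untouched.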
> The solution might be a helluva lot more complex, or may be OOP based.
>
> Any inspiration/links/words of wisdom?

If you need to do this for any arbitrary tag all at once, there *HAVE* to be PHP-based HTML parsers "out there" in the various PHP script archives...

If all else fails, the PHP source code behind strip_tags() (http://php.net/strip_tags) must have some kind of HTML parsing routine in it.

--
Like Music?
http://l-i-e.com/artists.htm

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php