>I've asked similar questions before, but I think I'm finally asking the
>*right* question to get the right answer.
That's often the tricky part :-)
>I'm looking for some suggestions on the best method of parsing an HTML
>document (or part thereof), with a view to CAPTURING and MODIFYING a
>specific element of a specific tag.
>
>something like:
>
>1. look for a given tag eg DIV
>2. capture the tag (everything from '<DIV' up to the '>')
>3. look for a given attribute (eg ID="foo", ID=foo, ID='foo' -- all valid
>ways)
>4. capture it
>5. be given the opportunity to manipulate the attribute's value, delete it,
>etc
>6. place captured tag (complete with modified elements) back into the string
>in its original position
>7. return to step 1, looking for the next occurrence of a DIV tag
If you are only looking for a SPECIFIC tag, you just simplified life
immensely!
<?php
# Get some beautiful sample HTML:
$html = file('http://php.net/') or die("Could not open php.net");
$html = implode('', $html);
# Find the DIV tag (stristr() is case-insensitive):
$div = stristr($html, '<div');
if ($div === false) die("No DIV tag found");
$divpos = strlen($html) - strlen($div);
# Break the HTML up into "before" and "after" DIV tag:
$before_div = substr($html, 0, $divpos);
$after_div = substr($html, $divpos);
# Find the *END* of the DIV tag:
# KNOWN BUG:
# They *could* bury a > in their attributes if they work at it...
$end_tag = strstr($after_div, '>');
if ($end_tag === false) die("DIV tag is never closed");
# +1 so the closing '>' stays with the tag instead of the "after" part:
$endpos = strlen($after_div) - strlen($end_tag) + 1;
$div = substr($after_div, 0, $endpos);
# Now get the "after" part to *really* be after the *WHOLE* DIV tag:
$after_div = substr($after_div, $endpos);
echo "Before DIV tag: <BR>", htmlentities($before_div), "<HR>\n";
echo "DIV tag itself: <BR>", htmlentities($div), "<HR>\n";
echo "After DIV tag: <BR>", htmlentities($after_div), "<HR>\n";
?>
Off-by-one errors in the substr() calls are the usual risk here: check that
the closing '>' ends up in $div and not at the start of the "after" part. I
never get that right in my first pass of coding, so test it and fine-tune
that part yourself.
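One way to check the offsets is to run the same slicing on a tiny string
where you can count the characters by eye. This is only a sketch: the sample
string is made up, and it uses stripos()/strpos() to get the positions
directly instead of the strlen() arithmetic above.

```php
<?php
# Tiny known input, so every offset can be counted by hand.
$html = 'ab<DIV ID=foo>cd';

# Offset of the start of the tag (case-insensitive search).
$divpos = stripos($html, '<div');
# Offset of the first '>' at or after the start of the tag.
$endpos = strpos($html, '>', $divpos);

# The tag runs from '<' through '>' inclusive, hence the +1.
$before = substr($html, 0, $divpos);
$div    = substr($html, $divpos, $endpos - $divpos + 1);
$after  = substr($html, $endpos + 1);

# The three pieces must glue back together into the original string.
echo ($before . $div . $after === $html) ? "OK\n" : "Off by one somewhere\n";
```

If the three pieces do not concatenate back to the original, one of the
offsets is off by one.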
But you can now use pretty much the same technique to search inside of $div
for the ID attribute.
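Here is a rough sketch of steps 1 through 7 in one go. Note that it swaps the
strstr()/substr() slicing for regular expressions with
preg_replace_callback(), which takes care of putting each modified tag back
in its original position. The function name rewrite_div_ids() and the sample
HTML are made up for illustration. It shares the KNOWN BUG above (a '>'
inside an attribute value), and it can also be fooled by text that looks
like id=... inside another attribute's quoted value.

```php
<?php
# Hypothetical helper, not from any library. $callback receives the current
# ID value and returns the new one, or null to delete the attribute.
function rewrite_div_ids($html, $callback)
{
    # Opening DIV tags only; a '>' buried in an attribute ends the match early.
    $tag_re = '/<div\b[^>]*>/i';
    # Group 1: whitespace, attribute name and '='. Group 2 (branch reset):
    # the value, whether written as ID="foo", ID='foo' or ID=foo.
    $id_re = '/(\s+id\s*=\s*)(?|"([^"]*)"|\'([^\']*)\'|([^\s>]+))/i';

    return preg_replace_callback($tag_re, function ($tag) use ($id_re, $callback) {
        return preg_replace_callback($id_re, function ($m) use ($callback) {
            $new = call_user_func($callback, $m[2]);
            if ($new === null) {
                return '';              # drop the attribute entirely
            }
            # Always write the value back double-quoted and escaped.
            return $m[1] . '"' . htmlspecialchars($new, ENT_QUOTES) . '"';
        }, $tag[0], 1);                 # only the first ID in each tag
    }, $html);
}

$html = "<p id=x><DIV ID=\"foo\" class=a>one</DIV><div id='bar'>two</div><div id=baz>";
$out = rewrite_div_ids($html, function ($id) {
    # Delete the ID "bar", upper-case every other one.
    return $id === 'bar' ? null : strtoupper($id);
});
echo $out, "\n";
```

Everything outside the DIV tags, including the ID on the P tag, is left
untouched.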
>The solution might be a helluva lot more complex, or may be OOP based.
>
>
>Any inspiration/links/words of wisdom?
If you need to do this for any arbitrary tag all at once, there *HAVE* to be
PHP-based HTML parsers "out there" in the various PHP script archives...
If all else fails, the PHP source for http://php.net/strip_tags must have
some kind of HTML parsing routine in it.
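If your PHP build includes the DOM extension (the DOMDocument class), that
may be the easiest ready-made parser to try. A minimal sketch, assuming the
extension is available; the sample HTML and the "new-" prefix are made up:

```php
<?php
# Assumes PHP's DOM extension (class DOMDocument) is installed.
$html = '<html><body><div id="foo">one</div><p>x</p><div id="bar">two</div></body></html>';

$doc = new DOMDocument();
# Real-world HTML is rarely valid; collect parse warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

# Visit every DIV, whatever quoting style its attributes used.
foreach ($doc->getElementsByTagName('div') as $div) {
    if (!$div->hasAttribute('id')) {
        continue;
    }
    if ($div->getAttribute('id') === 'bar') {
        $div->removeAttribute('id');                        # delete it
    } else {
        $div->setAttribute('id', 'new-' . $div->getAttribute('id'));
    }
}
$out = $doc->saveHTML();
echo $out;
```

One caveat: saveHTML() writes the whole document out again from the parsed
tree, so whitespace, quoting and case will not necessarily match the original
byte for byte. If you need the rest of the string left exactly as it was, the
string-slicing approach is the safer bet.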
--
PHP General Mailing List (http://www.php.net/)