>I've asked similar questions before, but I think I'm finally asking the
>*right* question to get the right answer.

That's often the tricky part :-)

>I'm looking for some suggestions on the best method of parsing an HTML document
>(or part thereof), with a view to CAPTURING and MODIFYING a specific
>attribute of a specific tag.
>
>something like:
>
>1. look for a given tag eg DIV
>2. capture the tag (everything from '<DIV' up to the '>')
>3. look for a given attribute (eg ID="foo", ID=foo, ID='foo' -- all valid
>ways)
>4. capture it
>5. be given the opportunity to manipulate the attribute's value, delete it,
>etc
>6. place the captured tag (complete with modified elements) back into the string
>in its original position
>7. return to step 1, looking for the next occurrence of a DIV tag
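
The seven steps above boil down to a scan-and-splice loop. Here's a minimal PHP sketch of steps 1, 2, 6 and 7, under some assumptions: rewrite_tags() is a hypothetical helper name (not a built-in), it only looks at opening tags, and it assumes no '>' is hiding inside a quoted attribute value. It hands each whole tag to a callback and puts whatever comes back into the tag's original position.

```php
<?php
// Hypothetical helper: walk $html, pass every opening <$tag ...> to
// $callback, and splice the callback's return value back in place.
// Assumption: no '>' appears inside a quoted attribute value.
function rewrite_tags(string $html, string $tag, callable $callback): string
{
    $needle = '<' . $tag;
    $offset = 0;
    while (($start = stripos($html, $needle, $offset)) !== false) {
        // Make sure we matched "<div" and not something like "<divider".
        $next = $html[$start + strlen($needle)] ?? '';
        if ($next !== '' && !ctype_space($next) && $next !== '>' && $next !== '/') {
            $offset = $start + 1;
            continue;
        }
        $end = strpos($html, '>', $start);
        if ($end === false) {
            break; // unterminated tag: leave the rest untouched
        }
        $length = $end - $start + 1;          // include the closing '>'
        $old = substr($html, $start, $length);
        $new = $callback($old);
        $html = substr_replace($html, $new, $start, $length);
        $offset = $start + strlen($new);      // resume just after the new tag
    }
    return $html;
}

// Usage: upper-case every DIV opening tag, leaving everything else alone.
$html = '<p>x</p><DIV id="a">one</DIV><div class=b>two</div>';
echo rewrite_tags($html, 'div', 'strtoupper'), "\n";
```

Resuming the search right after the *new* tag, rather than the old one, keeps the loop from re-matching text the callback just wrote.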

If you are only looking for a SPECIFIC tag, you just simplified life
immensely!

<?php
  # Get some beautiful sample HTML:
  $html = file_get_contents('http://php.net/') or die("Could not open php.net");

  # Find the start of the first DIV tag (case-insensitive):
  $divpos = stripos($html, '<div');
  if ($divpos === false) {
    die("No DIV tag found");
  }

  # Break the HTML up into "before" and "after" the start of the DIV tag:
  $before_div = substr($html, 0, $divpos);
  $after_div = substr($html, $divpos);

  # Find the *END* of the DIV tag:
  # KNOWN BUG:
  # They *could* bury a > in a quoted attribute value if they work at it...
  $endpos = strpos($after_div, '>');
  if ($endpos === false) {
    die("Unterminated DIV tag");
  }

  # The tag runs from '<div' up to and including the '>':
  $div = substr($after_div, 0, $endpos + 1);

  # Now get the "after" part to *really* be after the *WHOLE* DIV tag:
  $after_div = substr($after_div, $endpos + 1);

  echo "Before DIV tag: <BR>", htmlentities($before_div), "<HR>\n";
  echo "DIV tag itself: <BR>", htmlentities($div), "<HR>\n";
  echo "After DIV tag:  <BR>", htmlentities($after_div), "<HR>\n";
?>

Watch the offsets in the substr() calls.  The '>' sits at position $endpos,
so the tag itself is $endpos + 1 characters long, and the "after" part has to
start at $endpos + 1.  I never get those +1/-1 adjustments right on my first
pass of coding, so check them against real input.
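
The off-by-one question is easiest to settle on a tiny string. The $after_div value below is made up for illustration; it stands in for "everything from '<div' onward".

```php
<?php
// Made-up input: a DIV tag followed by some body text.
$after_div = '<div id="x">body';

$endpos = strpos($after_div, '>');            // index of the '>' itself
$tag    = substr($after_div, 0, $endpos + 1); // length $endpos + 1 keeps the '>'
$rest   = substr($after_div, $endpos + 1);    // start just past the '>'

// Without the +1, substr($after_div, 0, $endpos) would drop the '>'
// and substr($after_div, $endpos) would start the "after" part with it.
var_dump($endpos); // int(11)
var_dump($tag);    // string(12) "<div id="x">"
var_dump($rest);   // string(4) "body"
```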

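You can use pretty much the same technique to search inside $div for the ID
attribute.  Just remember that the value may be double-quoted, single-quoted,
or bare, so it ends at the matching quote or at the next whitespace or '>'.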
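
Here's a minimal sketch of steps 3 to 5 for one captured tag. rewrite_attribute() is a hypothetical helper, and it swaps in a different technique: preg_match() with PREG_OFFSET_CAPTURE instead of the strstr()/substr() arithmetic. It accepts ID="foo", ID='foo' and ID=foo, always writes the new value back double-quoted, and assumes the attribute name doesn't also appear inside some other attribute's quoted value.

```php
<?php
// Hypothetical helper: find attribute $name in ONE tag string such as
// '<div class=x id="foo">'. $callback receives the current value and
// returns the new value, or null to delete the attribute entirely.
function rewrite_attribute(string $tag, string $name, callable $callback): string
{
    // Groups: 1 = leading whitespace, 2 = name as written,
    // 3 = double-quoted value, 4 = single-quoted value, 5 = bare value.
    $pattern = '/(\s+)(' . preg_quote($name, '/') . ')\s*=\s*'
             . '(?:"([^"]*)"|\'([^\']*)\'|([^\s"\'>]+))/i';
    if (!preg_match($pattern, $tag, $m, PREG_OFFSET_CAPTURE)) {
        return $tag; // attribute not present: leave the tag alone
    }
    // Exactly one of groups 3-5 matched; the others are missing or at offset -1.
    $value = '';
    foreach ([3, 4, 5] as $g) {
        if (isset($m[$g]) && $m[$g][1] !== -1) {
            $value = $m[$g][0];
            break;
        }
    }
    $new = $callback($value);
    $replacement = ($new === null)
        ? ''  // delete the attribute, including its leading whitespace
        : $m[1][0] . $m[2][0] . '="' . htmlspecialchars($new, ENT_QUOTES) . '"';
    // Splice the replacement back in at the original offset.
    return substr_replace($tag, $replacement, $m[0][1], strlen($m[0][0]));
}

echo rewrite_attribute('<DIV class=x ID=foo>', 'id', 'strtoupper'), "\n";
echo rewrite_attribute("<div id='foo' class=x>", 'id', fn($v) => null), "\n";
```

Requiring whitespace right before the name is what keeps `id` from matching inside something like `data-id`.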

>The solution might be a helluva lot more complex, or may be OOP based.
>
>
>Any inspiration/links/words of wisdom?

If you need to do this for any arbitrary tag all at once, there *HAVE* to be
PHP-based HTML parsers "out there" in the various PHP script archives...
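
For what it's worth, PHP 5 and later ship one such parser: the DOM extension (DOMDocument, backed by libxml). A minimal sketch, assuming that extension is enabled; rewrite_div_ids() and the "new-" prefix are made up for illustration.

```php
<?php
// Sketch using PHP's DOM extension (assumed enabled). libxml parses the
// whole document, so quoting styles and '>' inside values are handled for us.
function rewrite_div_ids(string $html): string
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; keep libxml's warnings quiet.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('div') as $div) {
        if ($div->hasAttribute('id')) {
            // Example manipulation; use removeAttribute('id') to delete it instead.
            $div->setAttribute('id', 'new-' . $div->getAttribute('id'));
        }
    }
    return $doc->saveHTML();
}

echo rewrite_div_ids('<div id=foo><p>a > b</p></div><div class="x">y</div>');
```

The trade-off is that saveHTML() re-serializes the markup (adding html/body wrappers and normalizing quotes and whitespace), so the output won't be byte-for-byte identical to the input the way the string-splicing approach is.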

If all else fails, the PHP source for http://php.net/strip_tags must have
some kind of HTML parsing routine in it.

-- 
Like Music?  http://l-i-e.com/artists.htm


-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php