At 03:35 22.02.2003, Andy Crain said:
--------------------[snip]--------------------
>My apologies in advance if this is too basic or there's a solution easily
>found out there, but after lots of searching, I'm still lost.
>
>I'm trying to build a regexp that would parse user-supplied text and
>identify cases where HTML tags are left open or are not properly
>matched, e.g., <b> tags without closing </b> tags. This is for a sort of
>message board type of application, and I'd like to allow users to use
>some HTML, but just would like to check to ensure that no stray tags are
>input that would screw up the rest of the page's display. I'm new to
>regular expressions, and the one below is as far as I've gotten. If
>anyone has any suggestions, they'd be very much appreciated.
>
>$suspect_tags = "b|i|u|strong|em|font|a|ol|ul|blockquote ";
>$pattern = '/<(' . $suspect_tags . '[^>]*>)(.*)(?!<\/\1)/Ui';
>if (preg_match($pattern,$_POST['entry'],$matches)) {
>   //do something to report the unclosed tags
>} else {
>   echo 'Input looks fine. No unmatched tags.';
>}
--------------------[snip]-------------------- 

Hi,

I don't believe you can create a single regular expression that reliably
checks for something that's NOT there, such as a missing closing tag.
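
To illustrate why, here is a small sketch using a simplified stand-in for the quoted pattern (one tag name instead of the whole list; the function name naive_pattern_matches is made up for this example). The negative lookahead only has to succeed at some position after the opener, and it nearly always finds one, so the pattern matches closed and unclosed input alike:

```php
<?php
// Simplified stand-in for the quoted pattern: a single tag name, same
// lazy (.*) and negative lookahead, same /Ui modifiers.
function naive_pattern_matches($text) {
    $pattern = '/<(b)[^>]*>(.*)(?!<\/\1)/Ui';
    return preg_match($pattern, $text) === 1;
}

// Two properly closed samples and one that really is unclosed.
$samples = array('<b>bold</b>', '<b></b>', '<b>bold');
foreach ($samples as $sample) {
    echo $sample, ' => ', naive_pattern_matches($sample) ? 'match' : 'no match', "\n";
}
```

All three samples report a match, so the result says nothing about whether the tag was closed.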

I'd take this approach (tested, but it has drawbacks, see below):

function check_tags($text) {
        $suspect_tags = "b|i|u|strong|em|font|a|ol|ul|blockquote";
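        // \b keeps "b" from matching the start of "blockquote" or "br";
        // [^>]* skips any attributes up to the closing bracket
        $re_find = '/<\s*(' . $suspect_tags . ')\b[^>]*>(.*)/is';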

        while (preg_match($re_find,$text,$matches)) {
                // a suspect tag was found, check if closed
                $suspect = $matches[1];
                $text = $matches[2];
                $re_close = '/<\s*\/\s*' . $suspect . '\s*?>(.*)/is';
                if (preg_match($re_close, $text, $matches)) {
                        // fine, found matching closer, continue loop
                        $text = $matches[1];
                }
                else {
                        // not closed - return to report it
                        return $suspect;
                }
        }
        return null;
}

$text = <<<EOT
This text contains < font
        size=+4 > an
        unclosed suspect </fint>tag.

EOT;

$tag = check_tags($text);
if ($tag) echo "Unmatched: \"$tag\"\n";
else echo "Perfect!\n";

The drawbacks: This approach is only aimed at unintended typos, such
as in the example text. It won't catch deliberate attacks, such as
   Blindtext <font color="red><font size=+22>Hehe I've got you</font>
because the function never looks at the second font opener: it jumps from
the first opener straight to the first closing tag. To catch these attacks
you'd need to build a parse tree of the text in question.
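
A full tree isn't strictly needed to catch the example above; a stack of open tags already goes a long way. The following is only a rough sketch under that assumption (find_unbalanced_tag is a hypothetical helper, not something I have run against real board input). It still finds tags with a regexp, so a ">" inside an attribute value will confuse it:

```php
<?php
// Hypothetical helper: walks every suspect tag in document order and keeps
// a stack of the ones that are still open.
function find_unbalanced_tag($text) {
    $suspect_tags = 'b|i|u|strong|em|font|a|ol|ul|blockquote';
    // Group 1 is "/" for a closing tag and empty for an opener.
    // Group 2 is the tag name; \b stops "b" from matching "blockquote".
    $re = '/<\s*(\/?)\s*(' . $suspect_tags . ')\b[^>]*>/i';

    preg_match_all($re, $text, $matches, PREG_SET_ORDER);

    $open = array();
    foreach ($matches as $m) {
        $name = strtolower($m[2]);
        if ($m[1] === '') {
            // opener: remember it
            $open[] = $name;
        } elseif (array_pop($open) !== $name) {
            // closer that does not match the innermost open tag
            return $name;
        }
    }
    // anything left on the stack was never closed; report the innermost
    return $open ? end($open) : null;
}

$attack = 'Blindtext <font color="red><font size=+22>Hehe I\'ve got you</font>';
$tag = find_unbalanced_tag($attack);
if ($tag !== null) echo "Unmatched: \"$tag\"\n";
else echo "Perfect!\n";
```

Because both font openers are pushed and only one is popped, the attack text is reported as unmatched, and mis-nested input such as <b><i>x</b></i> is flagged as well.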

HTH,

-- 
   >O     Ernest E. Vogelsinger
   (\)    ICQ #13394035
    ^     http://www.vogelsinger.at/



-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
