At 03:35 22.02.2003, Andy Crain said:
--------------------[snip]--------------------
>My apologies in advance if this too basic or there's a solution easily
>found out there, but after lots of searching, I'm still lost.
>
>I'm trying to build a regexp that would parse user-supplied text and
>identify cases where HTML tags are left open or are not properly
>matched-e.g., <b> tags without closing </b> tags. This is for a sort of
>message board type of application, and I'd like to allow users to use
>some HTML, but just would like to check to ensure that no stray tags are
>input that would screw up the rest of the page's display. I'm new to
>regular expressions, and the one below is as far as I've gotten. If
>anyone has any suggestions, they'd be very much appreciated.
>
>$suspect_tags = "b|i|u|strong|em|font|a|ol|ul|blockquote ";
>$pattern = '/<(' . $suspect_tags . '[^>]*>)(.*)(?!<\/\1)/Ui';
>if (preg_match($pattern,$_POST['entry'],$matches)) {
>   //do something to report the unclosed tags
>} else {
>   echo 'Input looks fine. No unmatched tags.';
>}
--------------------[snip]--------------------

Hi,

I don't believe you can create a regular expression to look for something
that's NOT there.

I'd take this approach (tested, with drawbacks, see below):

function check_tags($text) {
    $suspect_tags = "b|i|u|strong|em|font|a|ol|ul|blockquote";
    // \b stops the short names from matching longer tags, e.g. "b" in "<br>"
    $re_find = '/<\s*(' . $suspect_tags . ')\b.*?>(.*)/is';

    while (preg_match($re_find, $text, $matches)) {
        // a suspect tag was found, check if closed
        $suspect = $matches[1];
        $text = $matches[2];
        $re_close = '/<\s*\/\s*' . $suspect . '\s*?>(.*)/is';
        if (preg_match($re_close, $text, $matches)) {
            // fine, found matching closer, continue loop
            $text = $matches[1];
        }
        else {
            // not closed - return to report it
            return $suspect;
        }
    }
    return null;
}

$text = <<<EOT
This text contains < font size=+4 > an unclosed suspect </fint>tag.
EOT;

$tag = check_tags($text);
if ($tag) echo "Unmatched: \"$tag\"\n";
else echo "Perfect!\n";

The drawbacks: this approach is aimed at unintended typos, such as the one
in the example text. It won't catch deliberate attacks, such as

    Blindtext <font color="red><font size=+22>Hehe I've got you</font>

because the check never sees the second font opener: once it has found the
first opener it only looks for a closer, and the single </font> satisfies
it. To catch these attacks you'd need to build a source tree of the text in
question.

HTH,

--
   >O     Ernest E. Vogelsinger
   (\)    ICQ #13394035
    ^     http://www.vogelsinger.at/

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
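
As a rough illustration of the drawback discussed above (an opener nested
inside another opener of the same tag), here is a minimal sketch of a
stack-based check. It is not part of the original reply: the function name
find_unbalanced_tag() and its parameters are made up for this example, it
assumes PHP 7.1 or later, and it ignores attribute quoting entirely, so it is
still far from a real HTML parser. Every opener is pushed on a stack and
every closer must match the most recent opener.

```php
<?php
// Hypothetical helper, not from the original post.
// Returns the name of the first unbalanced tag, or null if all suspect
// tags are properly opened, nested and closed.
function find_unbalanced_tag(string $text, array $tags): ?string
{
    $alternation = implode('|', array_map('preg_quote', $tags));
    // Group 1 is "/" for a closing tag (empty for an opener),
    // group 2 is the tag name. \b keeps "b" from matching "<br>".
    $re = '~<\s*(/?)\s*(' . $alternation . ')\b[^>]*>~i';
    preg_match_all($re, $text, $matches, PREG_SET_ORDER);

    $open = []; // stack of tag names that are currently open
    foreach ($matches as $m) {
        $name = strtolower($m[2]);
        if ($m[1] === '') {
            $open[] = $name;              // opener: push onto the stack
        } elseif (array_pop($open) !== $name) {
            return $name;                 // closer does not match last opener
        }
    }
    // Anything still on the stack was never closed.
    return $open ? end($open) : null;
}

$tags = ['b', 'i', 'u', 'strong', 'em', 'font', 'a', 'ol', 'ul', 'blockquote'];
$attack = 'Blindtext <font color="red><font size=+22>Hehe I\'ve got you</font>';
var_dump(find_unbalanced_tag($attack, $tags)); // string(4) "font"
```

Because both font openers are counted, the single </font> leaves one opener
on the stack and the text is reported. Misnested input such as
<b><i>x</b></i> is reported as well, since the first closer does not match
the tag on top of the stack.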