Hi,

Saturday, February 22, 2003, 12:35:15 PM, you wrote:

AC> My apologies in advance if this too basic or there's a solution easily
AC> found out there, but after lots of searching, I'm still lost.
AC> I'm trying to build a regexp that would parse user-supplied text and
AC> identify cases where HTML tags are left open or are not properly
AC> matched, e.g., <b> tags without closing </b> tags. This is for a sort of
AC> message board type of application, and I'd like to allow users to use
AC> some HTML, but just would like to check to ensure that no stray tags are
AC> input that would screw up the rest of the page's display. I'm new to
AC> regular expressions, and the one below is as far as I've gotten. If
AC> anyone has any suggestions, they'd be very much appreciated.

AC> Thanks,
AC> Andy

AC> $suspect_tags = "b|i|u|strong|em|font|a|ol|ul|blockquote ";
AC> $pattern = '/<(' . $suspect_tags . '[^>]*>)(.*)(?!<\/\1)/Ui';
AC> if (preg_match($pattern,$_POST['entry'],$matches)) {
AC>    //do something to report the unclosed tags
AC> } else {
AC>    echo 'Input looks fine. No unmatched tags.';
AC> }

Here is a function that will fix up simple tags like <b> and <i>. If no
matching end tag (e.g. </b>) appears later in the text, it adds one just
before the next start or end tag, or at the end of the document.

function fix_mismatch($str){
    $match = array();
    $split = preg_split('!\<(.*?)\>!s', $str); //text between the tags
    $c = count($split);
    $r = ($c == 1)? $str : ''; //no tags at all: return the input unchanged
    if($c > 1){
        $fix = '';
        preg_match_all('!\<(.*?)\>!s', $str,$match); //the tags themselves
        for($x=0;$x < $c;$x++){
            $out = $split[$x].$fix; //add in text + any fixup end tag
            $fix = '';
            if(isset($match[0][$x])){
                $list = explode(' ',$match[1][$x]); //split up compound tag like <img src="">
                $t = trim(strtolower($list[0])); //get the tag name
                switch ($t){
                    //add tags to check/fix here
                    case 'b':
                    case 'div':
                    case 'i':
                    case 'textarea':
                        $st = '/'.$t; //make an end tag to search for
                        $rest = array_slice($match[1],$x+1); //get the remaining tags
                        $found = false;
                        foreach($rest as $v){ //scan the later tags for the end tag
                            $et = explode(' ',$v);
                            if($st == trim(strtolower($et[0]))){ //have we found it ?
                                $found = true;
                                break;
                            }
                        }
                        if(!$found){
                            $fix = '<'.$st.'>'; //create an html end tag
                        }
                        break;
                }
                $out .= $match[0][$x]; //add in tag
            }
            $r .= $out; //build return string
        }
    }
    return $r;
}

//usage
$test1 = '<div>This is a <B >bold word <img src="hello.jpg"></b> and another <b>bold word </div>end of <b>test';
$test2 = '<b><b><i>frog';
echo fix_mismatch($test1);
echo '<br>';
echo fix_mismatch($test2);

-- 
regards,
Tom

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
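
P.S. fix_mismatch() repairs the markup. If, as in the original question, the
goal is only to report stray tags, a small stack-based check might be enough.
What follows is a rough sketch under stated assumptions, not a tested
drop-in: find_unmatched_tags() and its $check parameter are names made up for
illustration, and it assumes the checked tags must be closed in strict
nesting order.

```php
<?php
// Hypothetical helper (not part of fix_mismatch): list tags that are opened
// but never closed, and closing tags that do not match the innermost open tag.
function find_unmatched_tags($str, $check = array('b', 'i', 'div', 'textarea')) {
    $open = array();     // stack of tag names that are currently open
    $problems = array(); // findings, in the order they are discovered

    // For every tag, capture an optional leading slash and the tag name.
    preg_match_all('!<\s*(/?)\s*([a-z][a-z0-9]*)[^>]*>!i', $str, $tags, PREG_SET_ORDER);

    foreach ($tags as $tag) {
        $name = strtolower($tag[2]);
        if (!in_array($name, $check, true)) {
            continue; // not a tag we check, e.g. <img>
        }
        if ($tag[1] === '') {
            $open[] = $name;  // start tag: push onto the stack
        } elseif (end($open) === $name) {
            array_pop($open); // end tag closes the innermost open tag
        } else {
            $problems[] = "unexpected </$name>";
        }
    }

    // Whatever is still on the stack was never closed.
    foreach ($open as $name) {
        $problems[] = "unclosed <$name>";
    }
    return $problems;
}
```

An empty array means nothing suspicious was found. For '<b><b><i>frog' it
returns array('unclosed <b>', 'unclosed <b>', 'unclosed <i>'). Because the
check is strict about nesting, badly interleaved tags such as
<b><i>text</b></i> are reported even though each tag has a partner.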