Hi,

Saturday, February 22, 2003, 12:35:15 PM, you wrote:
AC> My apologies in advance if this is too basic or there's a solution easily
AC> found out there, but after lots of searching, I'm still lost.

AC> I'm trying to build a regexp that would parse user-supplied text and
AC> identify cases where HTML tags are left open or are not properly
AC> matched, e.g., <b> tags without closing </b> tags. This is for a sort of
AC> message board type of application, and I'd like to allow users to use
AC> some HTML, but just would like to check to ensure that no stray tags are
AC> input that would screw up the rest of the page's display. I'm new to
AC> regular expressions, and the one below is as far as I've gotten. If
AC> anyone has any suggestions, they'd be very much appreciated.

AC> Thanks,
AC> Andy

AC> $suspect_tags = "b|i|u|strong|em|font|a|ol|ul|blockquote ";
AC> $pattern = '/<(' . $suspect_tags . '[^>]*>)(.*)(?!<\/\1)/Ui';
AC> if (preg_match($pattern,$_POST['entry'],$matches)) {
AC>    //do something to report the unclosed tags
AC> } else {
AC>    echo 'Input looks fine. No unmatched tags.';
AC> }

Here is a function that will fix up simple tags like <b> and <i>. If a start
tag has no matching end tag anywhere after it, the missing end tag (e.g. </b>)
is added just before the next start/end tag, or at the end of the document.

function fix_mismatch($str){
    $match = array();
    //the text pieces between the tags
    $split = preg_split('!\<(.*?)\>!s', $str);
    $c = count($split);
    $r = ($c == 1)? $str : '';
    if($c > 1){
        $fix = '';
        //$match[0] holds the whole tags, $match[1] what is inside < >
        preg_match_all('!\<(.*?)\>!s', $str,$match);
        for($x=0;$x < $c;$x++){
            //add in text + any fixup end tag
            $out = $split[$x].$fix;
            $fix = '';
            if(isset($match[0][$x])){
                //split up compound tag like <img src="">
                $list = explode(' ',$match[1][$x]);
                //get the tag name
                $t = trim(strtolower($list[0]));
                switch ($t){
                    //add tags to check/fix here
                    case 'b':
                    case 'div':
                    case 'i':
                    case 'textarea':
                        //make an end tag to search for
                        $st = '/'.$t;
                        //get the remaining tags
                        $rest = array_slice($match[1],$x+1);
                        $found = false;
                        //foreach replaces each(), which was removed in PHP 8
                        foreach($rest as $v){
                            $et = explode(' ',$v);
                            //have we found it ?
                            if($st == trim(strtolower($et[0]))){
                                $found = true;
                                break;
                            }
                        }
                        if(!$found){
                            //create an html end tag
                            $fix = '<'.$st.'>';
                        }
                    break;
                }
                $out .= $match[0][$x]; //add in tag
            }
            $r .= $out; //build return string
        }
    }
    return $r;
}

//usage
$test1 = '<div>This is a <B >bold word <img src="hello.jpg"></b> and another <b>bold word </div>end of <b>test';
$test2 = '<b><b><i>frog';

echo fix_mismatch($test1);
echo '<br>';
echo fix_mismatch($test2);
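
If you only need to detect stray tags (as in the original question) rather
than repair them, keeping a stack of the currently open tags is one way to do
it. The following is only a minimal sketch and not part of the function above:
the name find_unmatched_tags and its default tag list are assumptions made for
illustration. Unlike fix_mismatch, which only looks for a matching end tag
somewhere later in the text, it also flags end tags that arrive out of order.

```php
<?php
// Hypothetical helper (name and default tag list are assumptions).
// Returns the checked tags that are left open, plus any end tags
// (prefixed with '/') that do not close the innermost open tag.
function find_unmatched_tags($str, $check = array('b', 'i', 'div', 'textarea')) {
    $stack = array();      // currently open tags, innermost last
    $unmatched = array();  // end tags that had nothing valid to close
    // capture the optional slash and the tag name of every tag
    preg_match_all('!<\s*(/?)\s*([a-z][a-z0-9]*)[^>]*>!i', $str, $tags, PREG_SET_ORDER);
    foreach ($tags as $tag) {
        $name = strtolower($tag[2]);
        if (!in_array($name, $check)) {
            continue;                    // not a tag we check, e.g. <img>
        }
        if ($tag[1] === '') {
            $stack[] = $name;            // start tag: remember it
        } elseif (end($stack) === $name) {
            array_pop($stack);           // end tag closes the innermost open tag
        } else {
            $unmatched[] = '/' . $name;  // end tag out of order or never opened
        }
    }
    // anything still on the stack was never closed
    return array_merge($unmatched, $stack);
}

//usage
$problems = find_unmatched_tags('<b><b><i>frog'); // array('b', 'b', 'i')
if (count($problems) > 0) {
    echo 'Unmatched tags: ' . implode(', ', $problems);
} else {
    echo 'Input looks fine. No unmatched tags.';
}
```

An empty result means every checked tag was closed in the right order, so the
return value can drive the same if/else as in the original post.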

-- 
regards,
Tom


-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
