At 15:38 21.11.2002, David Russell spoke out and said:
--------------------[snip]--------------------
>One of the steps I am looking at doing is to replace something
>"<a href="blah" onmouseover="blah">" with "<a href="blah">"
--------------------[snip]--------------------

I found it way easier not to look for encoded values but for the characters
themselves, as it is a lot easier with regexes to scan for characters (or,
better, to scan for everything EXCEPT a certain character).

So I once took this approach:
    Step 1 - extract all "allowed" tags
    Step 2 - htmlentitize the string
    Step 3 - put the pieces together again

You need to consider that there may be multiple possibilities to write a
link tag (other tags too):
    <a href="foo" title="bar">
    < a title = "bar" href = "foo" any="other">
etc etc.

So you must be looking for the "href" portion, enclosed by (not yet
encoded) angle brackets:

    $re = '/(.*?)(<\s*a\s+[^>]*?href.*?>)(.*)/is';

This reads as
    (           build a group
    .*?         with anything until the very next '<' (below)
    )           end group
    (           build a group
    <           beginning with '<'
    \s*a\s+     followed by optional blanks and an 'a'
                followed by at least one blank
    [^>]*?      followed by anything EXCEPT '>' until the very next href
    href        "href"
    .*?         followed by anything until the very next
    >           '>'
    )           end group
    (.*)        a last group with whatever remains (the post-match)

The 'i' modifier makes the expression case insensitive. The 's' modifier
lets '.' match newlines as well, so that the pre-match and post-match groups
cover the whole buffer and nothing is dropped from multi-line input.

Next we parse the whole buffer for the href:

    $result = '';
    while ($buffer && preg_match($re, $buffer, $aresult)) {
        // $aresult is:
        //   [0] - whole buffer
        //   [1] - pre-match
        //   [2] - matched group
        //   [3] - post match
        $result .= htmlentities($aresult[1]) . $aresult[2];
        $buffer = $aresult[3];
    }
    // whatever follows the last link tag must be encoded as well
    $result .= htmlentities($buffer);

This loops through the data buffer, applying htmlentities() to all parts
except any link tag.

Of course this example only works for the <a href> tag. If you have multiple
tags (and you _do_ have them since you also need to check for the </a> tag),
find ANY tag and check whether it is valid:

    $re = '/(.*?)(<\s*)(\/?)([^>]*?)(\s*>)(.*)/s';

preg_match will create the following result array:
    [0] - whole buffer
    [1] - prematch
    [2] - tag opener incl. opt. blanks
    [3] - optional '/' for the closing tag
    [4] - tag contents
    [5] - tag closer incl. opt. blanks
    [6] - postmatch

You can then, within your loop, analyze the tag contents (entry [4]) and
decide how to proceed.

--
   >O Ernest E. Vogelsinger
   (\) ICQ #13394035
    ^ http://www.vogelsinger.at/
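
For illustration, here is a minimal sketch of how the loop might analyze entry [4] when using the ANY-tag expression. It is not part of the original suggestion: the function name filter_tags(), the whitelist, and the decision to rebuild an <a> tag with only its double-quoted href are all assumptions made for the example. It does not validate the URL itself, and an href that already contains entities would be encoded a second time.

```php
<?php
// Hypothetical helper (name and whitelist are assumptions for this sketch).
// Keeps whitelisted tags in a rebuilt, attribute-free form and
// entity-encodes everything else.
function filter_tags($buffer, array $allowed = array('a', 'b', 'i'))
{
    // 's' lets '.' match newlines so pre-/post-match cover the whole buffer.
    $re = '/(.*?)(<\s*)(\/?)([^>]*?)(\s*>)(.*)/s';
    $result = '';

    while ($buffer !== '' && preg_match($re, $buffer, $m)) {
        // $m[1] pre-match, $m[2] opener, $m[3] optional '/',
        // $m[4] tag contents, $m[5] closer, $m[6] post-match
        $result .= htmlentities($m[1]);

        // The leading word of the tag contents is taken as the tag name.
        $name = preg_match('/^([a-z][a-z0-9]*)/i', $m[4], $n)
            ? strtolower($n[1])
            : '';

        if (in_array($name, $allowed, true)) {
            if ($m[3] === '/') {
                $result .= "</$name>";
            } elseif ($name === 'a'
                && preg_match('/\bhref\s*=\s*"([^"]*)"/i', $m[4], $h)) {
                // Rebuild the link with the href only; onmouseover etc. are dropped.
                $result .= '<a href="' . htmlentities($h[1]) . '">';
            } else {
                $result .= "<$name>";
            }
        } else {
            // Not allowed: show the tag as literal text.
            $result .= htmlentities($m[2] . $m[3] . $m[4] . $m[5]);
        }

        $buffer = $m[6];
    }

    // Text after the last tag (or a buffer without any tag) is encoded too.
    return $result . htmlentities($buffer);
}

echo filter_tags('x <a href="blah" onmouseover="evil()">link</a> & <script>y</script>'), "\n";
// prints: x <a href="blah">link</a> &amp; &lt;script&gt;y&lt;/script&gt;
```

Rebuilding each allowed tag from scratch, instead of copying the matched text through, means no unexpected attribute can survive the filter.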