At 15:38 21.11.2002, David Russell spoke out and said:
--------------------[snip]--------------------
>One of the steps I am looking at doing is to replace something "<a
>href="blah" onmouseover="blah">" with "<a href="blah">"
--------------------[snip]--------------------
I found it much easier not to look for encoded values but for the characters
themselves, since regexes make it simple to scan for characters (or,
better, to scan for everything EXCEPT a certain character).
So I once took this approach:
Step 1 - extract all "allowed" tags
Step 2 - htmlentitize the string
Step 3 - put the pieces together again
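The three steps can also be sketched with preg_split() instead of a hand-written loop. Note that this is a different technique from the loop further down. The sketch assumes the only "allowed" tags are an opening <a ...href...> and a closing </a>. The name entitize_except_links() is hypothetical, and the explicit ENT_QUOTES/'UTF-8' arguments are my own choice, so the output does not depend on PHP version defaults.

```php
<?php
// Sketch of the three steps, assuming the only allowed tags are
// <a ...href...> and </a>. entitize_except_links() is a hypothetical name.
function entitize_except_links(string $buffer): string
{
    // One capturing group around the allowed tags, so preg_split()
    // returns the tags themselves as separate array entries.
    $allowed = '/(<\s*a\s+[^>]*?href[^>]*>|<\s*\/\s*a\s*>)/is';

    // Step 1: split into text pieces (even indices) and tags (odd indices).
    $pieces = preg_split($allowed, $buffer, -1, PREG_SPLIT_DELIM_CAPTURE);

    $result = '';
    foreach ($pieces as $i => $piece) {
        if ($i % 2 === 1) {
            // an allowed tag: keep it untouched
            $result .= $piece;
        } else {
            // Step 2: htmlentitize everything else
            $result .= htmlentities($piece, ENT_QUOTES, 'UTF-8');
        }
    }
    // Step 3: the pieces were concatenated back in their original order
    return $result;
}

echo entitize_except_links('x < y, see <a href="foo">here</a> & <b>bold</b>');
// prints: x &lt; y, see <a href="foo">here</a> &amp; &lt;b&gt;bold&lt;/b&gt;
```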
Keep in mind that there are several ways to write a link tag (and other
tags too):
<a href="foo" title="bar">
< a title = "bar" href = "foo" any="other">
etc etc.
So you need to look for the "href" portion, enclosed by angle brackets:
$re = '/(.*?)(<\s*a\s+[^>]*?href.*?>)(.*)/is';
This reads as
( build a group
.*? with anything until the very next '<' (below)
) end group
( build a group
< beginning with '<'
\s*a\s+ followed by optional blanks and an 'a' followed by at least one blank
[^>]*? followed by anything EXCEPT '>' until the very next
href "href"
.*? followed by anything until the very next
> '>'
) end group
( build a group
.* with everything that is left
) end group
The 'i' modifier makes the expression case insensitive; the 's' modifier
lets '.' match newlines too, so a buffer spanning several lines is handled
correctly.
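Here is a quick check of the link regex against the two spellings shown earlier, plus a near miss. The helper name link_tag() is hypothetical. The pattern uses '\s+' after the 'a', as in the breakdown, and the 's' modifier, which is my addition so that '.' also crosses newlines.

```php
<?php
// Returns the first link tag found in $buffer, or null if there is none.
// link_tag() is a hypothetical helper used only for this check.
function link_tag(string $buffer): ?string
{
    // 's' lets '.' match newlines in multi-line buffers
    $re = '/(.*?)(<\s*a\s+[^>]*?href.*?>)(.*)/is';
    return preg_match($re, $buffer, $m) ? $m[2] : null;
}

echo link_tag('before <a href="foo" title="bar">link</a>'), "\n";
echo link_tag("before\n< a title = \"bar\" href = \"foo\" any=\"other\">link"), "\n";
var_dump(link_tag('<abbr href="foo">'));  // NULL: \s+ requires a blank after the 'a'
```

The '<abbr ...>' case is why at least one blank is required after the 'a'. With '\s*' there, any tag starting with 'a' and containing "href" would slip through.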
Next we parse the whole buffer for the href:
$result = '';
while ($buffer !== '' && preg_match($re, $buffer, $aresult)) {
    // $aresult is:
    // [0] - whole buffer
    // [1] - pre-match
    // [2] - matched group (the link tag)
    // [3] - post match
    $result .= htmlentities($aresult[1]) . $aresult[2];
    $buffer = $aresult[3];
}
// escape whatever follows the last link tag as well
$result .= htmlentities($buffer);
This loops through the data buffer, applying htmlentities() to all parts
except any link tag.
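To try the loop out, it can be wrapped in a function. This is a sketch: escape_all_but_links() is a hypothetical name, and the explicit ENT_QUOTES/'UTF-8' arguments are an assumption to keep the output independent of PHP version defaults.

```php
<?php
// The link-tag loop wrapped in a function (hypothetical name).
function escape_all_but_links(string $buffer): string
{
    $re = '/(.*?)(<\s*a\s+[^>]*?href.*?>)(.*)/is';
    $result = '';
    while ($buffer !== '' && preg_match($re, $buffer, $aresult)) {
        // escape the text before the link tag, keep the tag itself
        $result .= htmlentities($aresult[1], ENT_QUOTES, 'UTF-8') . $aresult[2];
        $buffer = $aresult[3];
    }
    // escape whatever follows the last link tag
    return $result . htmlentities($buffer, ENT_QUOTES, 'UTF-8');
}

echo escape_all_but_links('1 < 2: <a href="x">go</a>');
// prints: 1 &lt; 2: <a href="x">go&lt;/a&gt;
```

Note that the closing </a> ends up escaped, because this pattern only knows about the opening link tag.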
Of course this example only works for the <a href> tag. If you allow
multiple tags (and you _do_ have them, since you also need to check for the
</a> tag), find ANY tag and check whether it is valid:
$re = '/(.*?)(<\s*)(\/?)([^>]*?)(\s*>)(.*)/s';
preg_match will create the following result array:
[0] - whole buffer
[1] - prematch
[2] - tag opener incl. opt. blanks
[3] - optional '/' for the closing tag
[4] - tag contents
[5] - tag closer incl. opt. blanks
[6] - postmatch
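To see the groups in action, here is the generic tag pattern applied to a sloppily written closing tag. The helper name next_tag() is hypothetical, and the 's' modifier is an assumption so that '.' also matches newlines.

```php
<?php
// Returns the preg_match() result array for the next tag, or null.
// next_tag() is a hypothetical helper name.
function next_tag(string $buffer): ?array
{
    $re = '/(.*?)(<\s*)(\/?)([^>]*?)(\s*>)(.*)/s';
    return preg_match($re, $buffer, $m) ? $m : null;
}

print_r(next_tag("Hello < /a >\nworld"));
```

For this input, [1] is 'Hello ', [2] is '< ', [3] is '/', [4] is 'a', [5] is ' >' and [6] is the newline followed by 'world'.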
You can then, within your loop, analyze the tag contents (entry [4]) and
decide how to proceed.
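As an illustration of that analysis step, here is a sketch that whitelists only <a href="..."> and </a>. It rebuilds allowed link tags from the href value alone, so attributes like onmouseover are dropped, and it escapes every other tag. The name clean_html() is hypothetical, and it assumes href values are written in double quotes.

```php
<?php
// Whitelist sketch: keep <a href="..."> (href only) and </a>, escape the rest.
// clean_html() is a hypothetical name; only double-quoted href values are handled.
function clean_html(string $buffer): string
{
    $re = '/(.*?)(<\s*)(\/?)([^>]*?)(\s*>)(.*)/s';
    $result = '';
    while ($buffer !== '' && preg_match($re, $buffer, $m)) {
        // escape the text before the tag
        $result .= htmlentities($m[1], ENT_QUOTES, 'UTF-8');
        $isClosing = ($m[3] === '/');
        $contents = $m[4];

        if ($isClosing && strcasecmp(trim($contents), 'a') === 0) {
            // allowed closing tag
            $result .= '</a>';
        } elseif (!$isClosing
            && preg_match('/^a\s/i', $contents)
            && preg_match('/\bhref\s*=\s*"([^"]*)"/i', $contents, $href)) {
            // rebuild the link from its href only, dropping other attributes
            $result .= '<a href="' . htmlentities($href[1], ENT_QUOTES, 'UTF-8') . '">';
        } else {
            // not on the whitelist: escape the complete tag text
            $result .= htmlentities($m[2] . $m[3] . $m[4] . $m[5], ENT_QUOTES, 'UTF-8');
        }
        $buffer = $m[6];
    }
    // escape whatever follows the last tag
    return $result . htmlentities($buffer, ENT_QUOTES, 'UTF-8');
}

echo clean_html('<a href="blah" onmouseover="blah">x</a> <b>y</b>');
// prints: <a href="blah">x</a> &lt;b&gt;y&lt;/b&gt;
```

Rebuilding the tag rather than deleting unwanted attributes means anything not explicitly recognised is simply never emitted.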
--
>O Ernest E. Vogelsinger
(\) ICQ #13394035
^ http://www.vogelsinger.at/