Larry Gilson <[EMAIL PROTECTED]> wrote:
> I had the following HTML tag OBFU rule (variant of yours):
> /(\>|\s)\w{1,5}?\<\/?\s?[\w\s]{6,150}\/?\s?\>\w{1,7}?(\s|\W|\<)/
There's a lot of clutter in that regex that makes it harder to
follow. Let's try paring it down. First, '<' and '>' are not
special on their own in regexes, so there's no need to
backslash them:
/(>|\s)\w{1,5}?<\/?\s?[\w\s]{6,150}\/?\s?>\w{1,7}?(\s|\W|<)/
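If you want to convince yourself that dropping the backslashes changes nothing, here is a rough check. It is sketched in Python rather than Perl (purely for convenience; Python's `re` module also treats '\<' and '\>' as plain literals), and the sample strings are made up for illustration, not taken from real spam.

```python
import re

# Hypothetical sample inputs, invented for this check.
samples = [
    " via<x y z w>gra ",
    ">ab<font color>cd.",
    "no tags here",
]

# The rule as originally posted, and the same rule without the
# unnecessary backslashes in front of '<' and '>'.
original = re.compile(r"(\>|\s)\w{1,5}?\<\/?\s?[\w\s]{6,150}\/?\s?\>\w{1,7}?(\s|\W|\<)")
unescaped = re.compile(r"(>|\s)\w{1,5}?<\/?\s?[\w\s]{6,150}\/?\s?>\w{1,7}?(\s|\W|<)")

def spans(pattern, text):
    """Return the (start, end) span of every match of pattern in text."""
    return [m.span() for m in pattern.finditer(text)]

# True when both patterns find exactly the same matches in every sample.
same = all(spans(original, s) == spans(unescaped, s) for s in samples)
print(same)
```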
When you have an alternation -- something like '(a|b|c)' --
where all the alternatives are single characters, it's better
to write it as a character class -- something like '[abc]'.
Also, '\s' and '<' are both included in '\W', so that last
alternation is equivalent to just '\W':
/[>\s]\w{1,5}?<\/?\s?[\w\s]{6,150}\/?\s?>\w{1,7}?\W/
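Both of those rewrites can be checked by brute force over single characters. The following is only a sketch, in Python rather than Perl, and it restricts itself to the 128 ASCII characters (an assumption; the `re.ASCII` flag keeps '\w', '\W', and '\s' to their ASCII meanings).

```python
import re

# Each pair is (alternation form, character-class form).
pairs = [
    (r"(>|\s)", r"[>\s]"),
    (r"(\s|\W|<)", r"\W"),
]

# Every ASCII character, tested one at a time.
ascii_chars = [chr(i) for i in range(128)]

def matched_chars(pattern):
    """Return the set of single ASCII characters the pattern matches."""
    return {c for c in ascii_chars if re.fullmatch(pattern, c, re.ASCII)}

# One boolean per pair: do both forms accept the same characters?
results = [matched_chars(a) == matched_chars(b) for a, b in pairs]
print(results)
```

The second pair comes out the same because every whitespace character, and '<' itself, is already a nonword character.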
Now, nongreedy matching serves no purpose when the thing
following it can't be matched by the thing being repeated. In
this case you have '\w{1,5}?' followed by '<', but '<' can't
match '\w', so there's no difference between greedy and
nongreedy matching there. The matching for the series of '\w'
characters has to go all the way to the '<' -- it can't stop
short. Similarly, the '\W' at the end can never match the '\w'
preceding it, so that '?' is also pointless:
/[>\s]\w{1,5}<\/?\s?[\w\s]{6,150}\/?\s?>\w{1,7}\W/
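To see that the two '?' quantifier modifiers really make no difference here, you can compare the nongreedy and greedy versions on a few inputs. This is a sketch in Python rather than Perl, with made-up sample strings; it shows agreement on these samples only, while the argument above is what covers the general case.

```python
import re

nongreedy = re.compile(r"[>\s]\w{1,5}?</?\s?[\w\s]{6,150}/?\s?>\w{1,7}?\W", re.ASCII)
greedy = re.compile(r"[>\s]\w{1,5}</?\s?[\w\s]{6,150}/?\s?>\w{1,7}\W", re.ASCII)

# Hypothetical sample inputs, invented for this check.
samples = [
    " via<x y z w>gra ",       # matches
    ">ab<font color>cd.",      # matches
    " toolong<abc def>x ",     # 7 word characters before '<': no match
    " ab<abc def>toolongxx ",  # 9 word characters after '>': no match
]

def spans(pattern, text):
    """Return the (start, end) span of every match of pattern in text."""
    return [m.span() for m in pattern.finditer(text)]

# True when both versions find the same matches at the same positions.
same = all(spans(nongreedy, s) == spans(greedy, s) for s in samples)
print(same)
```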
That regex is equivalent to your original one, and may help you
see better why it's not matching as you expect. It's looking
for
  a '>' or whitespace character (space, tab, carriage return,
    line feed, form feed),
  followed by 1 to 5 word characters (letters, digits, and
    underscores),
  followed by '<',
  followed by an optional '/',
  followed by an optional single whitespace character,
  followed by 6 to 150 word or whitespace characters,
  followed by an optional '/',
  followed by an optional single whitespace character,
  followed by '>',
  followed by 1 to 7 word characters,
  followed by a nonword character (anything other than
    letters, digits, and underscores).
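The same breakdown can be written as a commented regex and tried against a few strings. This is a sketch in Python rather than Perl; the name TAG_OBFU and the test strings are made up for illustration, and the strings are only guesses at the kind of text the rule might be aimed at.

```python
import re

# The simplified rule, one piece per line (re.VERBOSE ignores the
# whitespace and the '#' comments inside the pattern).
TAG_OBFU = re.compile(r"""
    [>\s]            # a '>' or a whitespace character
    \w{1,5}          # 1 to 5 word characters
    <                # a literal '<'
    /?               # an optional '/'
    \s?              # an optional single whitespace character
    [\w\s]{6,150}    # 6 to 150 word or whitespace characters
    /?               # an optional '/'
    \s?              # an optional single whitespace character
    >                # a literal '>'
    \w{1,7}          # 1 to 7 word characters
    \W               # a nonword character
""", re.VERBOSE | re.ASCII)

def hits(text):
    """Return True if the rule matches anywhere in text."""
    return TAG_OBFU.search(text) is not None

print(hits(" via<x y z w>gra "))        # tag body is 7 word/space characters
print(hits(" vi<b>agra "))              # tag body "b" is shorter than 6
print(hits(' vi<font size="1">agra '))  # '=' and '"' are not in [\w\s]
```

Only the first string matches. A short tag fails the 6-character minimum, and a tag with attributes fails because '=' and '"' are neither word nor whitespace characters.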
I'm not clear on what you want to match, but that's probably
not it.
--
Keith C. Ivey <[EMAIL PROTECTED]>
Washington, DC
_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk