Hi,
On Wed, 17 Mar 2004 11:19:44 +0000 Mat Harris <[EMAIL PROTECTED]> wrote:
> On Wed, Mar 17, 2004 at 04:04:58 -0600, David B Funk wrote:
> > Would somebody please mass-check the following rule set
> > and let me know if there's any collateral damage?
> > I whipped them up to deal with a new flavor of spam that I'm
> > seeing more of these days.
> >
> >
> > rawbody L_FAKE_HREF /\w\whref=http:/i
> > describe L_FAKE_HREF Faked href to hide spammer URLs
> > score L_FAKE_HREF 1.0
> >
>
> i am probably just seeing things and being stupid, but what is
> invalid about the above href?
\w matches [a-zA-Z0-9_] so /\w\whref=http:/i matches 'href=http:'
preceded by two characters that are neither punctuation nor whitespace.
Meaning 'zzhref=http:' matches, but '<a href=http:' doesn't.
See `perldoc perlre` for details.
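
A quick way to check that claim, sketched in Python rather than Perl (an
assumption made for illustration; with re.ASCII set, Python's \w should
cover the same [a-zA-Z0-9_] set for plain ASCII input). The helper name
is_fake_href and the example.com URLs are made up for the demo:

```python
import re

# The rule's pattern. re.ASCII keeps \w limited to [a-zA-Z0-9_];
# re.IGNORECASE mirrors the /i modifier on the original rule.
FAKE_HREF = re.compile(r"\w\whref=http:", re.IGNORECASE | re.ASCII)

def is_fake_href(text):
    """Return True if 'href=http:' is directly preceded by two word characters."""
    return FAKE_HREF.search(text) is not None

# Hypothetical sample strings, not taken from real mail.
samples = ["zzhref=http://example.com", "<a href=http://example.com"]
for s in samples:
    print(s, "->", is_fake_href(s))
```

The second sample does not match because the character right before
'href' is a space, which is not a word character.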
Hrm. Does it hurt to change
/\w\whref=http:/i
to
/\w\whref="?https?:/i
or even
/\w\whref="?[a-z]{4,8}:/i
?
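
One way to get a feel for the difference between the three variants is to
run them side by side on a few strings. This is only a sketch in Python
(not Perl, and not a substitute for a real mass-check); the names VARIANTS
and matching_variants and the sample strings are invented for the demo:

```python
import re

FLAGS = re.IGNORECASE | re.ASCII  # /i, with \w limited to [a-zA-Z0-9_]

# The three patterns from this thread, keyed by made-up labels.
VARIANTS = {
    "original": re.compile(r'\w\whref=http:', FLAGS),
    "quote_https": re.compile(r'\w\whref="?https?:', FLAGS),
    "any_scheme": re.compile(r'\w\whref="?[a-z]{4,8}:', FLAGS),
}

def matching_variants(text):
    """Return the labels of all variants that match somewhere in text."""
    return [name for name, rx in VARIANTS.items() if rx.search(text)]

# Hypothetical sample strings, not taken from real mail.
for s in ['zzhref=http://x', 'zzhref="https://x', 'zzhref=mailto:a@b',
          'zzhref=ftp://x', '<a href="https://x']:
    print(repr(s), "->", matching_variants(s))
```

Note that under this sketch the {4,8} variant still skips 'ftp:', since
that scheme has only three letters.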
-- Bob