> -----Original Message-----
> From: David B Funk [mailto:[EMAIL PROTECTED]
> Sent: Friday, February 27, 2004 10:20 PM
> To: Carl R. Friend
> Cc: [EMAIL PROTECTED]
> Subject: Re: Rule to catch odd href tag
> 
> 
> On Fri, 27 Feb 2004, Carl R. Friend wrote:
> 
> >    On Fri, 27 Feb 2004, Alton Danks wrote:
> >
> > > I'm seeing some SPAM that has odd href tags like the following:
> > >
> > > align="center"><a hrefShanghaishref=http://cowerers.com href=
> > >
> > > "http://www.nungim.com/?ai=7030 ">
> >
> >    [...]
> >
> > > I've tried:
> > >
> > > rawbody CTS_HREF /\bhref[a-z]\b/i
> >
> >    Give /<a .{0,32}href[^=]/i a go.  That's what I use here.
> > The general idea is to scan for the <A anchor and look for an
> > href that's followed by anything *other* than an equals sign
> > within zero to 32 characters.  That should bag your spammer.
> 
> I took the opposite tack with this particular specimen,
> looking for a "href=http:" that is preceded by '\w' instead of
> '\W'. (I.e., "href" should be a standalone keyword.)
> 
> rawbody L_FAKE_HREF     /\w\whref=http:/i
> 
> If you are worried about FPs, you could further qualify it
> with:      /<a .{0,32}\w\whref=http:/i
> However, because it needs more backtracking, this will take
> more CPU.
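To check the candidate patterns quoted above against the spam sample, here is a minimal sketch. It uses Python's `re` module as a stand-in for the Perl regexes that SpamAssassin runs, and applies each pattern case-insensitively, as the `/i` flag does. The rule labels `HREF_NO_EQUALS` and `L_FAKE_HREF_QUAL` and the `HAM` anchor are hypothetical, added only for illustration. The sample is the first line of the quoted spam.

```python
import re

# First line of the spam sample quoted in the thread.
SPAM = 'align="center"><a hrefShanghaishref=http://cowerers.com href='
# Hypothetical well-formed anchor, used as a rough false-positive check.
HAM = '<a href=http://www.example.com/>'

# Candidate rawbody patterns from the thread (labels partly hypothetical).
PATTERNS = {
    "CTS_HREF": r"\bhref[a-z]\b",                     # the original attempt
    "HREF_NO_EQUALS": r"<a .{0,32}href[^=]",          # href not followed by '='
    "L_FAKE_HREF": r"\w\whref=http:",                 # word chars glued onto href
    "L_FAKE_HREF_QUAL": r"<a .{0,32}\w\whref=http:",  # same, but only inside <a
}

def hits(text):
    """Return the labels of the patterns that match anywhere in text."""
    return [name for name, pat in PATTERNS.items()
            if re.search(pat, text, re.IGNORECASE)]

if __name__ == "__main__":
    print("spam:", hits(SPAM))
    print("ham: ", hits(HAM))
```

The original `CTS_HREF` attempt misses the sample. In `hrefShanghai`, `[a-z]` consumes the `S`, and the trailing `\b` then needs a word boundary, but `S` is followed by another letter. The other three patterns all fire on the spam line and none fire on the plain anchor.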

I'm curious about this. Can a dev tell us whether that much extra regex
really makes a noticeable CPU difference? And what about adding ?: after
the first / ?
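One way to gauge the CPU question locally is to time both versions of the rule on a large body. The sketch below does this with Python's `re` and `timeit`. The `BODY` text is a hypothetical filler message with the spam line at the end, so both searches must scan all of it. Python's engine is not Perl's. Perl may apply its own optimizations, such as checking for fixed substrings before running the full match, so these numbers are only a rough indication. On the `?:` question: `?:` only has meaning right after an opening parenthesis, where `(?:...)` makes a group non-capturing. Neither pattern contains a group, so it has nothing to change. Placed directly after the opening slash with no parenthesis, the `?` becomes a quantifier with nothing to repeat, and both Perl and Python reject that as a syntax error.

```python
import re
import timeit

# Hypothetical ~100 KB body: ordinary anchors, with the spam line at the end.
BODY = ('<p><a href="http://www.example.com/page">link</a> some text</p>\n' * 1500
        + 'align="center"><a hrefShanghaishref=http://cowerers.com href=\n')

PLAIN = re.compile(r"\w\whref=http:", re.IGNORECASE)
QUALIFIED = re.compile(r"<a .{0,32}\w\whref=http:", re.IGNORECASE)

def time_pattern(pattern, text=BODY, runs=200):
    """Average seconds per search of text, using Python's re (not Perl)."""
    return timeit.timeit(lambda: pattern.search(text), number=runs) / runs

if __name__ == "__main__":
    for name, pat in (("plain", PLAIN), ("qualified", QUALIFIED)):
        print(f"{name:9s} {time_pattern(pat) * 1e6:10.1f} us per search")
```

The qualified pattern has to try up to 33 lengths for `.{0,32}` at every `<a `, so it tends to do more work on anchor-heavy HTML. Whether that difference matters next to the rest of a SpamAssassin run is best measured on real mail.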

--Chris
