> -----Original Message-----
> From: David B Funk [mailto:[EMAIL PROTECTED]
> Sent: Friday, February 27, 2004 10:20 PM
> To: Carl R. Friend
> Cc: [EMAIL PROTECTED]
> Subject: Re: Rule to catch odd href tag
>
>
> On Fri, 27 Feb 2004, Carl R. Friend wrote:
>
> > On Fri, 27 Feb 2004, Alton Danks wrote:
> >
> > > I'm seeing some SPAM that has odd href tags like the following:
> > >
> > > align="center"><a hrefShanghaishref=http://cowerers.com href=
> > >
> > > "http://www.nungim.com/?ai=7030 ">
> >
> > [...]
> >
> > > I've tried:
> > >
> > > rawbody CTS_HREF /\bhref[a-z]\b/i
> >
> > Give /<a .{0,32}href[^=]/i a go. That's what I use here.
> > The general idea is to scan for the <A anchor and look for an
> > href that's followed by anything *other* than an equals sign
> > within zero to 32 characters. That should bag your spammer.
>
> I took the opposite tack with this particular specimen,
> looking for a "href=http:" that is preceded by '\w' instead of
> '\W'. (IE "href" should be a keyword.)
>
> rawbody L_FAKE_HREF /\w\whref=http:/i
>
> If you are worried about FPs, you could further qualify it
> with: /<a .{0,32}\w\whref=http:/i
> however because of the need for more backtracking, this will take
> more CPU.

I'm curious about this. Can a dev tell us whether the longer regex
really makes much of a CPU difference? How about adding ?: after the
first / ?

--Chris
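
A rough way to get a feel for the cost question locally is to run the patterns from this thread over the same text and time them. The sketch below does that in Python rather than Perl, so the timings are only indicative: Python's `re` engine is not the one SpamAssassin uses, and the absolute numbers will differ. The labels NOT_EQUALS and L_FAKE_HREF_ANCHORED, the CLEAN link, and the filler body are made up for illustration; only the regexes and the spam tag come from the messages above.

```python
import re
import timeit

# One-line version of the spam tag quoted in this thread.
SAMPLE = ('<a hrefShanghaishref=http://cowerers.com '
          'href="http://www.nungim.com/?ai=7030 ">')

# Hypothetical well-formed link, used to check for false positives.
CLEAN = '<a href="http://www.example.com/">'

# Patterns from the thread; re.I stands in for the /i flag.
# NOT_EQUALS and L_FAKE_HREF_ANCHORED are labels invented for this sketch.
PATTERNS = {
    "CTS_HREF": re.compile(r"\bhref[a-z]\b", re.I),
    "NOT_EQUALS": re.compile(r"<a .{0,32}href[^=]", re.I),
    "L_FAKE_HREF": re.compile(r"\w\whref=http:", re.I),
    "L_FAKE_HREF_ANCHORED": re.compile(r"<a .{0,32}\w\whref=http:", re.I),
}


def hits(text):
    """Return {label: True/False} saying which patterns match the text."""
    return {name: bool(rx.search(text)) for name, rx in PATTERNS.items()}


def time_rules(body, repeat=2000):
    """Return {label: seconds taken by `repeat` searches of body}."""
    return {
        name: timeit.timeit(lambda rx=rx: rx.search(body), number=repeat)
        for name, rx in PATTERNS.items()
    }


if __name__ == "__main__":
    print("spam tag :", hits(SAMPLE))
    print("clean tag:", hits(CLEAN))

    # Made-up message body: many ordinary links, spam tag at the very end.
    body = ('filler text <a href="http://www.example.com/">link</a> ' * 200
            + SAMPLE)
    for name, secs in time_rules(body).items():
        print(f"{name:22s} {secs:.4f}s")
```

On the ?: question: (?:...) only turns a capturing group into a non-capturing one, and none of the patterns above contain parentheses, so there is nothing here for it to apply to.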
