On Fri, 27 Feb 2004, Carl R. Friend wrote:

>    On Fri, 27 Feb 2004, Alton Danks wrote:
>
> > I'm seeing some SPAM that has odd href tags like the following:
> >
> > align="center"><a hrefShanghaishref=http://cowerers.com href=
> >
> > "http://www.nungim.com/?ai=7030 ">
>
>    [...]
>
> > I've tried:
> >
> > rawbody CTS_HREF /\bhref[a-z]\b/i
>
>    Give /<a .{0,32}href[^=]/i a go.  That's what I use here.
> The general idea is to scan for the <A anchor and look for an
> href that's followed by anything *other* than an equals sign
> within zero to 32 characters.  That should bag your spammer.
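To see why the first attempt misses and Carl's pattern catches this specimen, you can run both through Python's `re` module. Its syntax agrees with SpamAssassin's Perl regexes for these constructs (\b, character classes, bounded repeats, case-insensitive flag). This is only a sketch. The spam fragment is the one quoted above, joined onto one line, and the "legit" anchor is a made-up example.

```python
import re

# Spam fragment from the original report, joined onto one line.
SPAM = 'align="center"><a hrefShanghaishref=http://cowerers.com href="http://www.nungim.com/?ai=7030 ">'
# Hypothetical well-formed anchor, for comparison.
LEGIT = '<a href="http://www.example.com/">'

# CTS_HREF: "href" plus exactly one letter, then a word boundary.
# In "hrefShanghai..." the letter after "S" is another letter, so \b fails.
cts_href = re.compile(r'\bhref[a-z]\b', re.I)
# Carl's rule: an "<a " tag, up to 32 chars, then "href" NOT followed by "=".
carl = re.compile(r'<a .{0,32}href[^=]', re.I)

for name, rx in (("CTS_HREF", cts_href), ("Carl", carl)):
    print(name, "spam:", bool(rx.search(SPAM)), "legit:", bool(rx.search(LEGIT)))
```

CTS_HREF matches neither string. Carl's rule fires on the spam, because "hrefS" satisfies `href[^=]`, but not on the ordinary `href="..."` anchor.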

I took the opposite tack with this particular specimen,
looking for a "href=http:" that is preceded by word characters
('\w') instead of a non-word character ('\W'). (I.e., "href"
should stand alone as a keyword.)

rawbody L_FAKE_HREF     /\w\whref=http:/i
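Here is a quick sketch of what L_FAKE_HREF keys on, again using Python's `re` as a stand-in for SpamAssassin's Perl regex engine. The spam fragment is the one from the thread, joined onto one line. The two legitimate anchors are hypothetical examples, one quoted and one unquoted.

```python
import re

# Spam fragment from the original report, joined onto one line.
SPAM = 'align="center"><a hrefShanghaishref=http://cowerers.com href="http://www.nungim.com/?ai=7030 ">'
# Hypothetical legitimate anchors, quoted and unquoted.
LEGIT = ['<a href="http://www.example.com/">', '<a href=http://www.example.com/>']

# L_FAKE_HREF: two word characters glued directly onto "href=http:",
# i.e. "href" is not standing alone as an attribute name.
fake_href = re.compile(r'\w\whref=http:', re.I)

m = fake_href.search(SPAM)
print(m.group(0))                                   # text that triggered the rule
print([bool(fake_href.search(s)) for s in LEGIT])   # legit anchors are not hit
```

In the spam, the trigger is "ishref=http:" (the tail of "Shanghaishref"). In both legitimate anchors, "href" is preceded by a space, so neither one matches.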

If you are worried about FPs, you could further qualify it with:

      /<a .{0,32}\w\whref=http:/i

However, because of the extra backtracking this requires, it will
take more CPU.
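The difference between the two forms can be sketched the same way. The plain-text line below is a hypothetical example of text that trips the loose form without any anchor tag nearby. The qualified form rejects it because it also requires "<a " within 32 characters before the glued-on "href".

```python
import re

# Spam fragment from the original report, joined onto one line.
SPAM = 'align="center"><a hrefShanghaishref=http://cowerers.com href="http://www.nungim.com/?ai=7030 ">'
# Hypothetical plain text containing "href=http:" after a word, with no <a tag.
PLAIN = 'Paste this into your page: myhref=http://www.example.com/'

loose = re.compile(r'\w\whref=http:', re.I)             # first form
strict = re.compile(r'<a .{0,32}\w\whref=http:', re.I)  # qualified form

for label, text in (("spam", SPAM), ("plain", PLAIN)):
    print(label, "loose:", bool(loose.search(text)), "strict:", bool(strict.search(text)))
```

Both forms catch the spam, but only the loose one fires on the plain-text line. The bounded `.{0,32}` is where the extra work comes from: at each "<a " the engine may try every length from 32 down to 0 before it finds a match or gives up.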

So far I haven't seen any FPs here with the first form.

-- 
Dave Funk                                  University of Iowa
<dbfunk (at) engineering.uiowa.edu>        College of Engineering
319/335-5751   FAX: 319/384-0549           1256 Seamans Center
Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{
