On Fri, 27 Feb 2004, Carl R. Friend wrote:
> On Fri, 27 Feb 2004, Alton Danks wrote:
>
> > I'm seeing some SPAM that has odd href tags like the following:
> >
> > align="center"><a hrefShanghaishref=http://cowerers.com href=
> >
> > "http://www.nungim.com/?ai=7030 ">
>
> [...]
>
> > I've tried:
> >
> > rawbody CTS_HREF /\bhref[a-z]\b/i
>
> Give /<a .{0,32}href[^=]/i a go. That's what I use here.
> The general idea is to scan for the <A anchor and look for an
> href that's followed by anything *other* than an equals sign
> within zero to 32 characters. That should bag your spammer.
I took the opposite tack with this particular specimen,
looking for an "href=http:" that is preceded by word characters ('\w')
instead of a non-word character ('\W'). (I.e., "href" should stand alone
as a keyword.)
rawbody L_FAKE_HREF /\w\whref=http:/i
If you are worried about FPs, you could further qualify it
with:
/<a .{0,32}\w\whref=http:/i
However, because this form needs more backtracking, it will take
more CPU.
So far I haven't seen any FPs here with the first form.
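
For anyone who wants to check the two forms before putting them in a
rule file, here is a rough sketch in Python. It is only a stand-in:
SpamAssassin uses Perl's regex engine, not Python's, though these
simple patterns should behave the same way in both. The "spam" string
is the line quoted earlier in the thread; the "legit" and "stray"
strings are made-up test cases, not taken from real mail.

```python
import re

# The two patterns from this message; re.IGNORECASE plays the role of /i.
FAKE_HREF = re.compile(r"\w\whref=http:", re.IGNORECASE)
FAKE_HREF_IN_ANCHOR = re.compile(r"<a .{0,32}\w\whref=http:", re.IGNORECASE)

# Spam line quoted earlier in the thread.
spam = 'align="center"><a hrefShanghaishref=http://cowerers.com href='

# Hypothetical test strings (assumptions for illustration only).
legit = '<a href=http://www.example.com/>'   # ordinary unquoted href
stray = 'seehref=http://www.example.com/'    # "href=" glued to a word, no <a tag

for label, text in [("spam", spam), ("legit", legit), ("stray", stray)]:
    first = bool(FAKE_HREF.search(text))
    second = bool(FAKE_HREF_IN_ANCHOR.search(text))
    print(label, first, second)
```

Both forms hit the spam line and both leave the ordinary anchor alone.
The "stray" string trips only the first form, which is the kind of
potential FP the "<a .{0,32}" qualifier is meant to rule out.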
--
Dave Funk University of Iowa
<dbfunk (at) engineering.uiowa.edu> College of Engineering
319/335-5751 FAX: 319/384-0549 1256 Seamans Center
Sys_admin/Postmaster/cell_admin Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{