Re: Rule Efficiency

guenther 23 Mar 2004 16:24:33 -0000

> Was wondering if some of the regex gurus on the list might be willing
> to tell me if a rule I have created could be made more efficient
> somehow.


Not a guru, but some comments...


> This rule checks the from address for some common spam-originating
> country codes and tacks on a half-point to the score.  
> 
> From =~ /[EMAIL PROTECTED](nl|ie|de|fr|pl|co\.za|co\.nz|dk|ch|ru|fi|mx|il|tw|ca|cz|lu|lt|ar).?$/i

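The first marked part, the leading ".*", is redundant. The RE is not
anchored at the start, so any number of chars before the "@" is already
allowed. Simply dropping it and starting the RE with "@" will give the
same results and spare the engine some needless backtracking.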
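
A quick way to check this is a small Python sketch (Python's re module
treats these constructs the same way as Perl). It assumes the redacted
start of the rule was ".*@.*\." and uses only a few of the TLDs; both
are guesses for illustration, not the original rule.

```python
import re

# Hypothetical reconstruction: the list archive redacted the start of the
# original rule, so this assumes it was ".*@.*\." as the reply describes.
# The TLD list is shortened to a few entries for brevity.
TLDS = r"(nl|ie|de|fr|ca)"

with_prefix = re.compile(r".*@.*\." + TLDS + r".?$", re.IGNORECASE)
without_prefix = re.compile(r"@.*\." + TLDS + r".?$", re.IGNORECASE)

samples = [
    "user@example.de",
    "Some Name <user@example.nl>",
    "user@example.com",
]

# re.search is unanchored, like Perl's =~, so a leading ".*" cannot change
# whether a match exists; it can only add work for the engine.
results = [
    (s, bool(with_prefix.search(s)), bool(without_prefix.search(s)))
    for s in samples
]
for s, a, b in results:
    print(f"{s!r}: with .* -> {a}, without -> {b}")
```

Both columns come out identical for every sample.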

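The second marked part is bogus. It matches any string between the
leading "@" and the trailing TLD, including spaces and further "@"
signs. This might not have a huge impact on the From: header, but it
will result in false positives (FPs) on body tests, where the "@" and
the TLD can come from unrelated text on the same line.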
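
A sketch of such a false positive, again in Python, assuming the middle
part was ".*" and using a shortened TLD list. The body line is made up
for illustration.

```python
import re

# Assumed reconstruction of the original rule's middle part: "@.*\." followed
# by a (shortened) TLD list and the trailing ".?$".
loose = re.compile(r"@.*\.(nl|ie|de|fr|ca).?$", re.IGNORECASE)

# Hypothetical line of body text: the address is .com, but a .de host
# mentioned later on the same line still completes the match.
line = "Write to sales@example.com or visit www.example.de"

m = loose.search(line)
print(m.group(0) if m else None)
# prints "@example.com or visit www.example.de"
```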

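The third marked part, the ".?", feels incorrect as well. Do you really
want any char there? It allows one arbitrary char after the TLD, so the
rule can also fire on domains whose last label is merely one char longer
than a listed code, e.g. a hypothetical "foo.cab" matching "ca" plus "b".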
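
Comparing ".?" with ">?" on made-up addresses (Python sketch; the
redacted prefix is assumed to be "@.*\." and the TLD list is shortened):

```python
import re

# Hypothetical patterns; only what may follow the TLD differs.
any_char = re.compile(r"@.*\.(nl|ie|de|fr|ca).?$", re.IGNORECASE)
only_gt = re.compile(r"@.*\.(nl|ie|de|fr|ca)>?$", re.IGNORECASE)

# Hypothetical address whose final label is "ca" plus one extra letter.
addr = "user@example.cab"

print(bool(any_char.search(addr)))  # True: "ca" then any one character
print(bool(only_gt.search(addr)))   # False: only ">" may follow the TLD

# The closing ">" of a "Name <address>" header is still accepted.
print(bool(only_gt.search("Some Name <user@example.ca>")))  # True
```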

From =~ /[EMAIL PROTECTED](nl|ie|de|...|ar)>?$/i

The RE above will only trigger if the TLD is at the very end of the
From: header (no real name) or is followed only by the optional closing
">" (with a real name, as in "Name <user@example.de>").
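
This behaviour can be checked with a few sample headers (Python sketch;
the "@.*\." prefix is an assumption standing in for the redacted part,
the TLD list is shortened, and re.IGNORECASE plays the role of /i):

```python
import re

# Sketch of the suggested rule; the redacted start is assumed to be "@.*\.".
suggested = re.compile(r"@.*\.(nl|ie|de|fr|ca)>?$", re.IGNORECASE)

headers = [
    "user@example.de",               # no real name: the TLD ends the header
    "Some Name <user@example.de>",   # real name: the TLD is followed by ">"
    "user@example.de (Some Name)",   # TLD not at the end: no match
    "Some Name <user@example.com>",  # TLD not in the list: no match
]

for value in headers:
    print(f"{value!r}: {bool(suggested.search(value))}")
```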

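In the middle part, the "." wildcard should probably be replaced by a
class of chars valid in host names, such as [-a-zA-Z0-9.]. Note "A-Z",
not "A-z": the latter range also includes "[", "\", "]", "^", "_" and "`".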
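
A sketch of a tightened middle part next to the loose one, plus the
"A-z" pitfall (Python; the patterns and sample line are hypothetical,
and the TLD list is shortened):

```python
import re

# Hypothetical tightened rule: the "." wildcard in the middle is replaced by
# a host-name class (letters, digits, hyphen, literal dot).
strict = re.compile(r"@[-a-z0-9.]*\.(nl|ie|de|fr|ca)>?$", re.IGNORECASE)
loose = re.compile(r"@.*\.(nl|ie|de|fr|ca)>?$", re.IGNORECASE)

line = "Write to sales@example.com or visit www.example.de"

print(bool(loose.search(line)))   # True: ".*" runs across the spaces
print(bool(strict.search(line)))  # False: the class stops at the first space
print(bool(strict.search("Some Name <user@mail.example.de>")))  # True

# "A-z" spans ASCII 65..122, so it also accepts "[", "\", "]", "^", "_", "`".
print(bool(re.fullmatch(r"[A-z]", "_")))  # True
print(bool(re.fullmatch(r"[A-Z]", "_")))  # False
```

Keeping "." in the class lets subdomains such as "mail.example.de" through.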


> I apologize if anyone on the list is from any of these countries;
> please don't take it personally. ;)

Although this mail is not spam, I get half a point added. Spammy me... ;-)

...guenther


-- 
char *t="[EMAIL PROTECTED]";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
