Re: over-representing non-English spam?

Karsten Bräckelmann Tue, 19 May 2009 17:38:05 -0700

On Wed, 2009-05-20 at 12:01 +1200, Jason Haar wrote:
> Hi there
> 
> I just got a very large Chinese spam (>4M) - I seem to get several of
> these a month. Anyway, while I was fiddling with it I saw the score SA
> gave it when it could actually swallow the whole thing (see below).
> 
> As you can see, MIME_CHARSET_FARAWAY, CHARSET_FARAWAY_HEADER, and
> SARE_SUB_ENC_GB2312 (from openprotect rules) all triggered - total of
> 8.0 points. Sounds good - but of course that's very bad! Doesn't that
> mean an actual legitimate Chinese email would *default to a score of
> 8.0*!?!?!?!
> 
> There's a lot of overlap there - comments?


Sure.

The ok_locales setting defaults to "all", which effectively disables all
CHARSET_FARAWAY rules. It has to be set deliberately, listing only the
locales you accept, so that the rules fire on charsets you cannot even
decipher, let alone read.

If you DO get Chinese mail, want it and can read it, DO NOT leave zh out
of the list if you set ok_locales at all. This is a custom, user-made
problem.
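
For illustration, a minimal local.cf fragment, assuming you want to
accept English and Chinese mail and nothing else. The choice of locales
here is only an assumption; adjust it to what you can actually read:

```
# local.cf (sketch): treat English and Chinese charsets as OK, so the
# CHARSET_FARAWAY rules only fire on mail in other locales
ok_locales en zh
```

Leaving ok_locales unset keeps the default of "all", in which case none
of the CHARSET_FARAWAY rules should fire.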

Then there is that SARE rule-set. Too lazy to check details tonight.
However, a lot of those went stale long ago. Moreover, they are
third-party rule-sets YOU installed. If they don't work for you, don't
use 'em.

That said, a lot of the SARE rule-sets still seem to work pretty well
for some folks, as has been reported here. YMMV, and you absolutely
shouldn't run third-party rule-sets without investigating their
performance.


Oh, and of course -- did you say 4 Meg? Dude, are you nuts? :)
Seriously, don't scan mail that large. Messages that size can easily hog
SA to the point where you have to kill the processes to get mail flowing
again.

Virtually no spam at all is larger than 500 kB. Cut off there, and don't
scan anything larger. Needless to say, that's the spamc default
anyway. ;)
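
If you call the scanner from your own glue code rather than through
spamc, the cut-off could be sketched roughly like this. The names
MAX_SCAN_BYTES and should_scan() are made up for this example and are
not part of SpamAssassin, and reading "500 k" as 500 * 1024 bytes is an
assumption:

```python
# Sketch only: decide whether a message is small enough to hand to the scanner.
# MAX_SCAN_BYTES and should_scan() are hypothetical names, not SpamAssassin API.

MAX_SCAN_BYTES = 500 * 1024  # assumption: "500 k" read as 500 * 1024 bytes


def should_scan(raw_message: bytes, limit: int = MAX_SCAN_BYTES) -> bool:
    """Return True if the message is at or below the size cut-off."""
    return len(raw_message) <= limit


if __name__ == "__main__":
    small = b"x" * 2_000               # a typical small message
    huge = b"x" * (4 * 1024 * 1024)    # roughly the size of the 4 MB spam
    print(should_scan(small))  # True
    print(should_scan(huge))   # False
```

Anything over the limit is simply delivered unscanned. With spamc
itself, the equivalent knob is the -s max_size option.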

  guenther


-- 
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
