On Wed, 2009-05-20 at 12:01 +1200, Jason Haar wrote:
> Hi there
>
> I just got a very large Chinese spam (>4M) - I seem to get several of
> these a month. Anyway, while I was fiddling with it I saw the score SA
> gave it when it could actually swallow the whole thing (see below).
>
> As you can see, MIME_CHARSET_FARAWAY, CHARSET_FARAWAY_HEADER, and
> SARE_SUB_ENC_GB2312 (from openprotect rules) all triggered - total of
> 8.0 points. Sounds good - but of course that's very bad! Doesn't that
> mean an actual legitimate Chinese email would *default to a score of
> 8.0*!?!?!?!
>
> There's a lot of overlap there - comments?
Sure. The ok_locales setting defaults to "all", which effectively
disables all CHARSET_FARAWAY rules. It is intended to be set
voluntarily, so that charsets you cannot even decipher, let alone read,
get flagged. If you DO get Chinese mail, want it and can read it, DO NOT
exclude zh from the list if you set it at all. This is a custom,
user-made problem.

Then there is that SARE rule-set. Too lazy to check details tonight.
However, a lot of those went stale long ago. Moreover, they are
third-party rule-sets YOU installed. If they don't work for you, don't
use 'em.

That said, a lot of the SARE rule-sets still seem to work pretty well
for some folks, as has been reported here. YMMV, and you absolutely
can't run third-party rule-sets without investigating their performance.

Oh, and of course -- did you say 4 Meg? Dude, are you nuts? :)

Seriously, don't scan mail that large. Messages that size can easily hog
SA to the point where you'd better kill the processes to get some mail
flowing again. Virtually no spam at all is larger than 500 KB. Cut off
there, and don't scan anything larger. Needless to say, that's the spamc
default anyway. ;)

  guenther

-- 
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}