On Thu, 2009-04-02 at 10:15 +1300, Aran wrote:
> Hi, I've been using spamassassin with exim4 for a few months now and after
> the first month of building up the Bayesian rules it began working very well
> only letting about 1 spam in 50 through.

And your SA version is...?


> We have it set up such that we file our false positives and negatives in
> "Spam" and "Not spam" IMAP folders and each day sa-learn runs across them
> and processes/deletes them and we run sa-update on a monthly cronjob.

Do *ALL* your users do that? Are there perhaps some users who just don't
care, delete missed spam, and you may end up with falsely learned spam?

Also, I'd recommend running sa-update at least once a week; daily is
just fine. But that's just a side note and not your problem.


> This has worked well for a few months but recently it has gone downhill and
> now lets about 80% of spam through :-( the Bayesian learning accumulates
> tokens every day as usual but to no effect. It's learned 6000 spams and 2500
> hams with 130K tokens.

Please keep in mind that Bayes is just one sub-system of SA.

Anyway, 80% of spam slipping through indicates some *REAL*, gross issues
somewhere. Even a 3+ year old SA 3.1.x without updates and without
Bayes should perform much, much better than that.

Alas, your post doesn't include any information about what could
possibly have gone wrong.


> I'm just wondering what your advice is on the best practices and procedures
> to have in place to ensure that we can build up good filtering results, and
> ensure that it remains good over time.

That list would be too long...

Anyway, it isn't your problem. An accuracy problem like THAT is not
related to missing some best practices. Sounds more like a heavily
borked install or config.

For example, do you whitelist your own domain?
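
Why that matters: spammers routinely forge a From address in the
recipient's own domain, and a matching whitelist_from entry then pulls the
score far below the threshold. A rough sketch of how one might check a
config for such entries follows. It is Python, the function name is made up
for illustration, and it only looks at plain whitelist_from lines handed to
it as text; where your config actually lives depends on your install.

```python
from fnmatch import fnmatch


def risky_whitelist_entries(config_text, own_domain):
    """Return whitelist_from patterns that would match a sender at own_domain.

    Hypothetical helper: it checks plain whitelist_from lines only and
    ignores related directives such as whitelist_from_rcvd.
    """
    # Any local part will do; we only care whether the domain is covered.
    probe = "someone@" + own_domain.lower()
    hits = []
    for line in config_text.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "whitelist_from":
            # whitelist_from takes one or more glob-style address patterns.
            hits.extend(p for p in parts[1:] if fnmatch(probe, p.lower()))
    return hits
```

Feed it the contents of your site config (for instance the text of
local.cf) and your own domain. Anything it returns is worth a second look.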


Checking the SA headers and rule hits of spam that slipped through, what
do you see? Any pattern, anything that sticks out?
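
If there are too many messages to eyeball, a small script can tally the
rule hits for you. This is only a sketch in Python with made-up function
names. It assumes your setup adds an X-Spam-Status header containing a
"tests=RULE_A,RULE_B" list, which may not hold for every exim
integration.

```python
import re
from collections import Counter
from email import message_from_string


def rule_hits(raw_message):
    """Return the rule names listed in a message's X-Spam-Status header.

    Assumes the header carries a 'tests=RULE_A,RULE_B,...' field.
    """
    msg = message_from_string(raw_message)
    status = msg.get("X-Spam-Status", "")
    # The header is often folded across lines after a comma; rejoin the list.
    status = re.sub(r",\s+", ",", status)
    match = re.search(r"tests=(\S*)", status)
    if not match:
        return []
    return [name for name in match.group(1).split(",")
            if name and name != "none"]


def tally(raw_messages):
    """Count how often each rule fired across a batch of raw messages."""
    counts = Counter()
    for raw in raw_messages:
        counts.update(rule_hits(raw))
    return counts
```

Run tally() over the raw text of the missed spam and look at
most_common(). Rules with large negative scores showing up on nearly every
message (whitelist or BAYES_00 hits, say) would be the first thing to
chase.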


-- 
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
