On Tue, 2008-11-11 at 21:55 +0100, Thomas Zastrow wrote:
> I'm still not happy with my Spamassassin ... it don't recognizes a lot
> of Spam mails, even my Thunderbird with default properties recognizes
> more than SA.
>
> Every day, I train the Bayes filter with all the spam which were not
> already recognized as spam. My question is now: makes it sense to use
> also the already as spam marked mails as input for sa-learn?
Yes, but... (you know, there just has to be a but. ;)

I'm only guessing here, but your description sounds like you may not have trained Bayes on *ham* properly. It is important to train both spam and ham; otherwise, everything would start to look spammy.

Moreover, to avoid misfires, Bayes does not return a score at all until it has been trained sufficiently. You'll need to train it on at least 200 spam and 200 ham messages for Bayes to kick in. Preferably train it on many more, taking ham from your recent and possibly archived mail. The same goes for spam, though the older spam gets, the less useful it is for training: spam changes much more rapidly than the average user's ham.

If that is the case, you will not have seen any BAYES rules in the SA headers of your messages. To know for sure how much training Bayes has had so far, check the nham and nspam values in the output of this command:

  sa-learn --dump magic

Another common pitfall is training as the wrong user. Did you run sa-learn as the same user on whose behalf SA is called in your mail processing chain?

HTH
  guenther
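To make the checks above concrete, here is a minimal Python sketch that reads the `sa-learn --dump magic` output and tests nspam and nham against the 200-message minimum. It assumes the usual dump layout, where the count sits in the third column of each `non-token data:` line, and that your thresholds are SpamAssassin's defaults for bayes_min_spam_num and bayes_min_ham_num. The function names and the optional `sudo -u` call (to inspect the database of the user SA actually runs as) are hypothetical illustrations, not part of SpamAssassin.

```python
import re
import subprocess
from typing import Dict, Optional

# Assumed line layout of `sa-learn --dump magic` (verify against your version):
#   0.000          0        250          0  non-token data: nspam
MAGIC_LINE = re.compile(r"^\s*\S+\s+\S+\s+(\d+)\s+\S+\s+non-token data: (.+?)\s*$")


def parse_magic(text: str) -> Dict[str, int]:
    """Map each 'non-token data' field name (e.g. nspam, nham) to its count."""
    counts: Dict[str, int] = {}
    for line in text.splitlines():
        m = MAGIC_LINE.match(line)
        if m:
            counts[m.group(2)] = int(m.group(1))
    return counts


def bayes_ready(counts: Dict[str, int], min_spam: int = 200, min_ham: int = 200) -> bool:
    """True if both corpora meet the minimums (assumed SA defaults: 200 each)."""
    return counts.get("nspam", 0) >= min_spam and counts.get("nham", 0) >= min_ham


def read_magic(user: Optional[str] = None) -> str:
    """Hypothetical helper: run sa-learn, optionally as another user via sudo,
    so you inspect the same Bayes database your mail chain uses."""
    cmd = ["sa-learn", "--dump", "magic"]
    if user:
        cmd = ["sudo", "-u", user] + cmd
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


if __name__ == "__main__":
    # Made-up sample output, used here instead of calling sa-learn.
    sample = (
        "0.000          0          3          0  non-token data: bayes db version\n"
        "0.000          0        250          0  non-token data: nspam\n"
        "0.000          0         40          0  non-token data: nham\n"
    )
    counts = parse_magic(sample)
    print(counts["nspam"], counts["nham"], bayes_ready(counts))  # 250 40 False
```

In this made-up sample, spam alone passes the threshold, but Bayes would still stay silent until ham catches up, which is exactly the "trained on spam only" situation described above. Calling `read_magic(user="...")` with the account your MTA hands mail to lets you confirm you trained the right database.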