Re: Question training the Bayse filter

Karsten Bräckelmann Tue, 11 Nov 2008 15:57:37 -0800

On Tue, 2008-11-11 at 21:55 +0100, Thomas Zastrow wrote:
> I'm still not happy with my SpamAssassin ... it doesn't recognize a lot
> of spam mails; even my Thunderbird with default settings recognizes
> more than SA.
> 
> Every day, I train the Bayes filter with all the spam that was not
> already recognized as spam. My question now is: does it make sense to
> also use the mails already marked as spam as input for sa-learn?


Yes, but... (you know, there just has to be a but. ;)

I'm only guessing here, but your description sounds like you may not
have trained Bayes on *ham* properly. It is important to train on both
spam and ham -- otherwise, everything would start to look spammy.

Moreover, to avoid misfires, Bayes doesn't even return a score until it
has been trained sufficiently. You'll need to train it on at least 200
spam and 200 ham messages for Bayes to kick in. Preferably many more,
taken from your recent and possibly archived ham. The same goes for
spam, though the older spam gets, the less useful it is for training.
Spam changes much more rapidly than the average user's ham.
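
As a rough sketch of what training on both classes could look like:
the directory paths below are purely hypothetical placeholders, and
the SA_LEARN variable is only an illustration device so the command
can be swapped out. Only the --ham and --spam options are sa-learn's
own.

```shell
# Hypothetical mail locations; adjust them to your own layout.
HAM_DIR="${HAM_DIR:-$HOME/Maildir/cur}"
SPAM_DIR="${SPAM_DIR:-$HOME/Maildir/.Junk/cur}"
# Command used for learning (assumed to be on PATH).
SA_LEARN="${SA_LEARN:-sa-learn}"

# Learn ham first, then spam, so that both classes are covered.
train_bayes() {
    "$SA_LEARN" --ham "$HAM_DIR" && "$SA_LEARN" --spam "$SPAM_DIR"
}
```

Run train_bayes as the user whose Bayes database should be updated.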

If Bayes has not been trained enough, you will not have seen any BAYES
rules in your messages' SA headers. To know for sure how much you have
trained so far, look at nham and nspam in the output of this command:

  sa-learn --dump magic
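
To compare those two counters against the 200-message minimum, a
small helper along these lines might do. It is a sketch only: the
function name bayes_ready is made up for illustration, and it assumes
the usual dump layout in which the count sits in the third column of
the "non-token data: nspam" and "non-token data: nham" lines.

```shell
# Reads `sa-learn --dump magic` output on stdin, prints both counters,
# and succeeds only if each reaches the minimum (default 200).
bayes_ready() {
    awk -v min="${1:-200}" '
        /non-token data: nspam$/ { nspam = $3 }
        /non-token data: nham$/  { nham  = $3 }
        END {
            printf "nspam=%d nham=%d\n", nspam, nham
            exit !(nspam + 0 >= min + 0 && nham + 0 >= min + 0)
        }'
}
```

Usage would be: sa-learn --dump magic | bayes_ready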

Another common pitfall is training as the wrong user. You did train
Bayes as the same user that SA runs as in your mail processing chain?
HTH

  guenther


-- 
char *t="[EMAIL PROTECTED]";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
