Re: bayesian filter training

Matias Lopez Bergero 11 Feb 2005 20:37:53 -0000

Matt Kettler wrote:

> At 05:06 PM 2/10/2005, Matias Lopez Bergero wrote:
> > Is it worth training the Bayes filter with messages already detected and flagged as spam by SpamAssassin? Would that do any good?
> Yes. And even if they are already flagged as BAYES_99 it is still worthwhile.


Many thanks for the explanation, Matt :)

I think this is good news. A couple of weeks ago I started storing the messages that SA flags as spam. I currently have about 20400 messages stored, and I'm planning to sa-learn them, but now I have another question ;)

The sa-learn man page says that to train the Bayesian filter well, you need to train it with equal amounts of spam and ham, or more ham if possible. So if I sa-learn the spam folder, the spam tokens are going to grow a lot compared to the ham tokens. Here is my training so far:

[EMAIL PROTECTED] root]# sa-learn --dump magic | head -4
0.000          0          3          0  non-token data: bayes db version
0.000          0       1932          0  non-token data: nspam
0.000          0       1973          0  non-token data: nham
0.000          0     170590          0  non-token data: ntokens
[EMAIL PROTECTED] root]#
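
To put a rough number on the imbalance, the nspam and nham counts above can be combined with the 20400 stored messages. This is only a sketch: bayes_balance is a hypothetical helper name, and it assumes every stored message would actually be learned (sa-learn skips messages it has already learned, so the real figure could be lower).

```shell
# Sketch: summarize the spam/ham balance from `sa-learn --dump magic` output.
# bayes_balance is a hypothetical helper, not part of SpamAssassin.
# $1 = number of extra spam messages about to be learned (defaults to 0).
bayes_balance() {
    awk -v extra="${1:-0}" '
        $NF == "nspam" { spam = $3 }   # third column holds the message count
        $NF == "nham"  { ham  = $3 }
        END {
            if (ham == 0) { print "no ham learned yet"; exit 1 }
            printf "spam=%d ham=%d ratio=%.1f\n", spam + extra, ham, (spam + extra) / ham
        }'
}

# Usage: sa-learn --dump magic | bayes_balance 20400
```

With the counts shown above, learning all 20400 messages would leave roughly 11 spam messages for every ham message in the database.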

Would this increase in spam data adversely affect how the Bayes filter classifies spam and ham messages?

BR,
Matías.
