Jim C. Nasby wrote:
> On Tue, Feb 07, 2006 at 03:16:57PM -0500, Matt Kettler wrote:
>> My current training ratio is about 7:1 spam:nonspam, but in the past
>> it's been as bad as 20:1. Both of those are very far off from equal
>> amounts, but the imbalance has never caused me any problems.
>>
>> From my sa-learn --dump magic output as of today:
>> 0.000          0     995764          0  non-token data: nspam
>> 0.000          0     145377          0  non-token data: nham
>
> Interesting... it appears I actually need to do a better job of
> training spam!
> sa-learn --dump magic|grep am
> 0.000          0      98757          0  non-token data: nspam
> 0.000          0     255134          0  non-token data: nham
>
> I just changed bayes_auto_learn_threshold_spam to 5.0, we'll see what
> that does...
Actually, setting the threshold below 6.0 won't do anything. SA has a
hard-coded requirement of at least 3.0 header points and 3.0 body points
before it will autolearn a message as spam, so any setting below 6.0 is
moot: the two 3.0-point requirements can't both be met without a score
of at least 6.0.

I would also check to make sure you don't have a lot of spam coming in
that's getting autolearned as ham. (Note: the learner's idea of the
score is very different from the final message score, so a message CAN
be tagged as spam and still get autolearned as ham.)
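To make the arithmetic concrete, here's a minimal sketch of that gate in
Python. It's my own illustration, not SA's actual implementation (the
real logic is Perl inside the autolearn code); the function name is
hypothetical, and the learner score is simplified to header + body
points for the example:

# Sketch of the autolearn-as-spam gate described above.
# Not SpamAssassin's real code: the name is hypothetical, and the
# learner score is simplified to header + body points for illustration.

def would_autolearn_as_spam(header_points, body_points, threshold):
    # Both hard-coded 3.0-point floors must be met, AND the learner
    # score must reach bayes_auto_learn_threshold_spam.
    learner_score = header_points + body_points  # simplified
    return (header_points >= 3.0
            and body_points >= 3.0
            and learner_score >= threshold)

# A message clearing both floors already scores at least 6.0, so a
# threshold of 5.0 behaves exactly like the default 6.0:
print(would_autolearn_as_spam(3.0, 2.9, 5.0))  # False: body floor fails
print(would_autolearn_as_spam(3.0, 3.0, 5.0))  # True, score is 6.0 anyway
print(would_autolearn_as_spam(3.0, 3.0, 6.0))  # True: same outcome as 5.0

In other words, bayes_auto_learn_threshold_spam only changes behavior
when you raise it above 6.0, which makes autolearning more conservative.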