Jim C. Nasby wrote:
> On Tue, Feb 07, 2006 at 03:16:57PM -0500, Matt Kettler wrote:
>> My current training ratio is about 7:1 spam:nonspam, but in the past it's 
>> been
>> as bad as 20:1. Both of those are very far off from equal amounts, but the
>> imbalance has never caused me any problems.
>>
>> From my sa-learn --dump magic output as of today:
>> 0.000          0     995764          0  non-token data: nspam
>> 0.000          0     145377          0  non-token data: nham
> 
> Interesting... it appears I actually need to do a better job of training
> spam!
> sa-learn --dump magic|grep am
> 0.000          0      98757          0  non-token data: nspam
> 0.000          0     255134          0  non-token data: nham
> 
> I just changed bayes_auto_learn_threshold_spam to 5.0, we'll see what
> that does...

Actually, you can't ever set the threshold below 6.0. SA has a hard-coded
requirement of at least 3.0 header points and 3.0 body points before it will
autolearn a message as spam. Any setting below 6.0 is therefore moot: the two
3.0 requirements can't both be met without a total score of at least 6.0.
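For reference, the relevant knobs live in local.cf. This is only an illustrative fragment (the option names are the stock SpamAssassin ones; the values are examples, with the nonspam threshold at its usual default):

```
# local.cf -- Bayes auto-learn settings (values illustrative)
bayes_auto_learn 1

# Anything below 6.0 here is effectively 6.0, because autolearn-as-spam
# also requires >= 3.0 header points and >= 3.0 body points:
bayes_auto_learn_threshold_spam 6.0

# Messages scoring at or below this get autolearned as ham:
bayes_auto_learn_threshold_nonspam 0.1
```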

I would also check to make sure you don't have a lot of spam coming in that's
getting autolearned as ham. (Note: the learner's idea of the score is very
different from the final message score, so a message CAN be tagged as spam and
still get autolearned as ham.)
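One way to spot that is to grep your stored mail for the autolearn status SA records in the X-Spam-Status header (e.g. "X-Spam-Status: Yes, score=7.2 required=5.0 ... autolearn=ham"). This is just a sketch: it builds a throwaway demo directory with made-up messages, and assumes your delivery setup keeps that header intact:

```shell
# Sketch: flag messages that scored at/above the spam threshold (5.0
# assumed here) yet were autolearned as ham. The demo messages below
# are fabricated stand-ins for a real maildir.
dir=$(mktemp -d)
printf 'X-Spam-Status: Yes, score=7.2 required=5.0 autolearn=ham\n\nbody\n' > "$dir/suspect"
printf 'X-Spam-Status: No, score=1.1 required=5.0 autolearn=ham\n\nbody\n' > "$dir/ok"

flagged=""
for msg in "$dir"/*; do
  # only look at messages the Bayes auto-learner took as ham
  grep -q 'autolearn=ham' "$msg" || continue
  # pull the final score out of the X-Spam-Status header
  score=$(grep -m1 -oE 'score=-?[0-9.]+' "$msg" | cut -d= -f2)
  # report anything that still scored like spam
  if awk -v s="${score:-0}" 'BEGIN { exit !(s >= 5.0) }'; then
    echo "flagged: $msg (score=$score)"
    flagged="$flagged $msg"
  fi
done
rm -rf "$dir"
```

On a real mailbox you'd point the loop at your maildir instead of the demo directory; a steady stream of hits there is a sign the nonspam autolearn threshold is letting spam poison the ham corpus.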

