Re: [spambayes-dev] Training Question

T. Alexander Popiel Mon, 16 May 2005 11:35:57 -0700

In message:  <[EMAIL PROTECTED]>
             "From Concept To Reality, L.L.C." <[EMAIL PROTECTED]> writes:
>Greetings one and all:
>
>At what point is SPAMBayes sufficiently trained?


Spambayes is sufficiently trained when you are satisfied with its
performance. ;-)  Really, there is no absolute rule, particularly
since everyone's email is different.

[ settings snipped ]
>
>Using these settings, HAM have NEVER gone to UNSURE or SPAM,
>however, if I get 10 e-mails, with 1 as HAM, and 9 as SPAM, 3 SPAM
>end up in SPAM, 3 SPAM end up in UNSURE, and 3 SPAM end up in HAM.

First, spambayes tends to work better when trained with similar
amounts of spam and ham; you've currently got about a 4:1 ratio.
I'd suggest retraining with closer to a 1:1 ratio, and turning off
training while filtering, since training on everything you filter
tends to drive you towards severely unbalanced training.
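
As a rough illustration of rebalancing before a retrain, here is a
minimal sketch that trims the larger pile down to the size of the
smaller one. The function name and the plain-list inputs are
assumptions made for the example; this is not part of spambayes
itself, and you would still feed the resulting messages to whatever
training front end you normally use.

```python
import random


def balance_training_sets(ham, spam, seed=0):
    """Return (ham, spam) subsets of equal size, i.e. a 1:1 ratio.

    `ham` and `spam` are plain lists of messages (hypothetical inputs
    for this sketch). The larger list is randomly subsampled down to
    the size of the smaller one; the smaller list is kept whole,
    though its order may change.
    """
    n = min(len(ham), len(spam))
    rng = random.Random(seed)  # fixed seed so the choice is repeatable
    return rng.sample(ham, n), rng.sample(spam, n)


if __name__ == "__main__":
    ham = ["ham-%d" % i for i in range(10)]
    spam = ["spam-%d" % i for i in range(40)]  # a 4:1 imbalance
    ham_subset, spam_subset = balance_training_sets(ham, spam)
    print(len(ham_subset), len(spam_subset))
```

Random subsampling is only one way to get there; picking the most
recent messages from each pile would be an equally reasonable choice.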

Second, you may want to lower both your ham and spam thresholds;
if all your ham is being solidly classified as such, you may be
able to get by with a ham threshold of .1, or even .05.  Similarly,
you may be able to drop the spam threshold to .51 or lower, though
going any lower runs into the problem that a mail with only novel
tokens (scoring at .5, since spambayes doesn't know anything about
it) will end up in the spam bucket.
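
To make the effect of the two thresholds concrete, here is a small
sketch of the three-way split. The function and parameter names are
made up for the example, and the exact comparisons (strictly below
the ham threshold is ham, at or above the spam threshold is spam) are
an assumption rather than a statement of what spambayes does
internally.

```python
def bucket(score, ham_threshold, spam_threshold):
    """Map a score in [0, 1] to 'ham', 'unsure', or 'spam'.

    Assumed rule for this sketch: scores below ham_threshold are ham,
    scores at or above spam_threshold are spam, everything in between
    is unsure.
    """
    if score < ham_threshold:
        return "ham"
    if score >= spam_threshold:
        return "spam"
    return "unsure"


if __name__ == "__main__":
    novel = 0.5  # a mail made up entirely of tokens never seen before
    print(bucket(novel, 0.05, 0.51))  # spam threshold just above .5
    print(bucket(novel, 0.05, 0.50))  # spam threshold at .5
```

Under that assumed rule, the all-novel mail stays in unsure with a
spam threshold of .51 but lands in spam once the threshold reaches .5.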

>What's going on, here? Do I need to adjust my settings more, or do
>I need to train more?

Oddly enough, you may need to train _less_, and preserve a better
training balance between spam and ham.

- Alex
_______________________________________________
spambayes-dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/spambayes-dev
