In message: <[EMAIL PROTECTED]>
"From Concept To Reality, L.L.C." <[EMAIL PROTECTED]> writes:
>Greetings one and all:
>
>At what point is SPAMBayes sufficiently trained?
Spambayes is sufficiently trained when you are satisfied with its
performance. ;-) Really, there is no absolute rule, particularly
since everyone's email is different.
[ settings snipped ]
>
>Using these settings, HAM have NEVER gone to UNSURE or SPAM,
>however, if I get 10 e-mails, with 1 as HAM, and 9 as SPAM, 3 SPAM
>end up in SPAM, 3 SPAM end up in UNSURE, and 3 SPAM end up in HAM.
First, spambayes tends to work better when trained on similar
amounts of spam and ham; you currently have about a 4:1 ratio.
I'd suggest retraining with closer to a 1:1 ratio, and turning off
training while filtering. When most of your incoming mail is spam,
training on everything you filter will tend to drive you towards
severely unbalanced training.
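
To make the balance point concrete, here is a rough sketch of the
check I have in mind. It is not part of spambayes: the function
names, the counts, and the 2:1 warning limit are all made up for
illustration, so plug in the counts from your own database.

```python
def training_ratio(n_spam, n_ham):
    """Return the spam:ham ratio of the trained messages."""
    if n_ham == 0:
        raise ValueError("no ham has been trained yet")
    return n_spam / n_ham


def is_unbalanced(n_spam, n_ham, max_ratio=2.0):
    """True if either class outnumbers the other by more than max_ratio.

    max_ratio is an arbitrary illustrative limit, not a spambayes rule.
    """
    ratio = training_ratio(n_spam, n_ham)
    return ratio > max_ratio or ratio < 1.0 / max_ratio


# Hypothetical counts, chosen only to reproduce a 4:1 imbalance.
print(training_ratio(400, 100))   # 4.0
print(is_unbalanced(400, 100))    # True
print(is_unbalanced(120, 100))    # False
```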
Second, you may want to lower both your ham and spam thresholds.
If all your ham is being solidly classified as such, you may be
able to get by with a ham threshold of .1, or even .05. Similarly,
you may be able to drop the spam threshold to .51 or lower, though
going any lower runs into the problem that a mail containing only
novel tokens (which scores .5, since spambayes doesn't know
anything about it) will end up in the spam bucket.
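
As a sketch of how the two thresholds carve up the score range
(this is a simplified stand-in, not the actual spambayes code; the
function name and the exact comparison operators are assumptions):

```python
def classify(score, ham_cutoff, spam_cutoff):
    """Bucket a score in [0, 1] into ham, unsure, or spam.

    Assumed convention for this sketch: below ham_cutoff is ham,
    at or above spam_cutoff is spam, anything in between is unsure.
    """
    if score < ham_cutoff:
        return "ham"
    if score >= spam_cutoff:
        return "spam"
    return "unsure"


# A mail made up entirely of unknown tokens scores 0.5.
novel = 0.5

# With the spam threshold at .51 it stays in unsure...
print(classify(novel, ham_cutoff=0.1, spam_cutoff=0.51))   # unsure

# ...but with a threshold below .5 it lands in spam.
print(classify(novel, ham_cutoff=0.1, spam_cutoff=0.45))   # spam

# Solidly classified ham still clears a tighter ham threshold.
print(classify(0.03, ham_cutoff=0.05, spam_cutoff=0.51))   # ham
```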
>What's going on, here? Do I need to adjust my settings more, or do
>I need to train more?
Oddly enough, you may need to train _less_, and preserve a better
training balance between spam and ham.
- Alex
_______________________________________________
spambayes-dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/spambayes-dev