So do you think it is better for Bayes if you try to keep this ratio closer
to 50/50? I find it is much harder to train HAM than it is SPAM. But if
a bad ratio is going to hurt things, one could shut down the SPAM trainer.

Basically, is too much SPAM a bad thing?

-----Original Message-----
From: Matt Kettler [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, February 15, 2005 5:50 PM
To: Chris Santerre; Thomas Arend; users@spamassassin.apache.org
Subject: RE: Good idea or bad idea?

At 04:56 PM 2/15/2005, Chris Santerre wrote:
>Yes, it does tolerate a deviation well. But I remember DQ saying something
>like this.

Here's one reference to the post I was talking about. In that thread I'd 
been suggesting that training would be "optimal" if the training ratio matched your 
"real world" spam:ham ratio (which historically was somewhere around 75/25 
here, but recently it's closer to 60/40).

Dan corrected me and said 50/50 was the goal to shoot for:

http://readlist.com/lists/incubator.apache.org/spamassassin-users/0/2046.html

Of course, my all-of-history ratio is about 96:4, and my recent training 
ratio is 90:10 (past day).
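A minimal sketch for checking your own all-of-history ratio: SpamAssassin's `sa-learn --dump magic` reports the `nspam` and `nham` counts stored in the Bayes database. The parser below assumes the usual layout, where the count is the third whitespace-separated column on the lines ending in `non-token data: nspam` / `nham`; the function name and the sample counts are hypothetical, for illustration only.

```python
import re


def spam_ham_ratio(dump_magic: str) -> tuple[int, int, float]:
    """Return (nspam, nham, spam_percent) from `sa-learn --dump magic` text.

    Assumption: the count is the third column on the 'nspam'/'nham' lines.
    """
    counts = {}
    for line in dump_magic.splitlines():
        m = re.search(r"non-token data: (nspam|nham)\s*$", line)
        if m:
            # Third whitespace-separated field holds the message count.
            counts[m.group(1)] = int(line.split()[2])
    nspam, nham = counts["nspam"], counts["nham"]
    return nspam, nham, 100.0 * nspam / (nspam + nham)


# Hypothetical sample output, not real data from this thread.
sample = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0      48000          0  non-token data: nspam
0.000          0       2000          0  non-token data: nham
"""
print(spam_ham_ratio(sample))  # (48000, 2000, 96.0)
```

A result of 96.0 here corresponds to a 96:4 spam:ham split.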



>I agree on a personal scale it works wonders if you *continue* to feed it a
>proper diet.

Really, I think ratios are helpful, but a fresh feed of both seems more 
important. I totally agree with the above. Between autolearn and forced 
training scripts, SA learns quite a bit of mail.

>  But when you get to a more general server side solution, I
>don't think the results are worth the effort, when one can write a simple
>rule faster than training.

I don't think that's true. The autolearner is a big help here. Although I 
force-feed, SA autolearns more mail than my scripts feed it.

(64% of spam and 12% of ham get autolearned the way I'm set up, and I've 
not seen any learning errors so far. However, I do use a setup tweaked to 
avoid false ham learning, something I consider a major issue with the 
default autolearn threshold.)
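A rough back-of-the-envelope sketch of why autolearning alone skews the training ratio: if the 64%/12% autolearn rates above applied to a mail stream that is about 60% spam (the recent real-world ratio mentioned earlier), the autolearner would train roughly 89 spam for every 11 ham. This assumes both figures describe the same stream and ignores forced training; the function name is hypothetical.

```python
def autolearn_mix(spam_share: float, spam_rate: float, ham_rate: float) -> float:
    """Percent of autolearned messages that are spam (simplified model).

    spam_share: fraction of incoming mail that is spam (e.g. 0.60)
    spam_rate / ham_rate: fraction of spam / ham the autolearner picks up
    """
    learned_spam = spam_share * spam_rate
    learned_ham = (1.0 - spam_share) * ham_rate
    return 100.0 * learned_spam / (learned_spam + learned_ham)


print(round(autolearn_mix(0.60, 0.64, 0.12), 1))  # 88.9
```

Because ham is autolearned far less often than spam, the learned mix drifts well away from both the real-world ratio and 50/50 unless ham is force-fed separately.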

