So do you think it is better for Bayes if you try to keep this ratio closer to 50/50? I find it is much harder to train HAM than it is SPAM. But if a lopsided ratio is going to hurt things, one could shut down the SPAM trainer.
Basically, is too much SPAM a bad thing?

-----Original Message-----
From: Matt Kettler [mailto:[EMAIL PROTECTED]
Sent: Tuesday, February 15, 2005 5:50 PM
To: Chris Santerre; Thomas Arend; users@spamassassin.apache.org
Subject: RE: Good idea or bad idea?

At 04:56 PM 2/15/2005, Chris Santerre wrote:
>Yes, it does tolerate a deviation well. But I remember DQ saying something like this.

Here's one reference to the post I was talking about. In the thread I'd been suggesting "optimal" would be best if the training ratio matched your "real world" spam:ham ratio (which historically was somewhere around 75/25 here, but recently it's closer to 60/40). Dan corrected me and said 50/50 was the goal to shoot for:

http://readlist.com/lists/incubator.apache.org/spamassassin-users/0/2046.html

Of course, my all-of-history ratio is about 96:4, and my recent training ratio is 90:10 (past day).

>I agree on a personal scale it works wonders if you *continue* to feed it a proper diet. Really, I think ratios are helpful, but a fresh feed of both seems more important.

I totally agree with the above. Between autolearn and forced training scripts, SA learns quite a bit of mail.

> But when you get to a more general server side solution, I don't think the results are worth the effort, when one can write a simple rule faster than training.

I don't think that's true.. the autolearner is a big help here.. Although I force feed, SA autolearns more mail than my scripts feed it. (64% of spam and 12% of ham get autolearned the way I'm set up, and I've not seen any learning errors so far. However, I do use a setup tweaked to avoid false ham learning, something I consider a major issue with the default autolearn threshold.)
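
For anyone who wants to see where their own training ratio sits relative to the 50/50 goal discussed above, here is a minimal sketch in Python. It assumes you have saved the text printed by `sa-learn --dump magic` and that the `nspam` and `nham` counts sit in the third column of that output; check that against your own dump. The sample counts are invented to mirror the 96:4 figure, and the names `training_counts` and `spam_share` are hypothetical helpers, not part of SpamAssassin.

```python
# Invented sample in the shape of `sa-learn --dump magic` output.
# The column layout is an assumption; verify it against your own dump.
SAMPLE_DUMP = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0       9600          0  non-token data: nspam
0.000          0        400          0  non-token data: nham
"""


def training_counts(dump_text):
    """Return (nspam, nham) parsed from dump text; missing entries count as 0."""
    counts = {"nspam": 0, "nham": 0}
    for line in dump_text.splitlines():
        fields = line.split()
        # Assumed layout: four numeric columns, then "non-token data: <name>".
        if (
            len(fields) >= 7
            and fields[4:6] == ["non-token", "data:"]
            and fields[-1] in counts
        ):
            counts[fields[-1]] = int(fields[2])
    return counts["nspam"], counts["nham"]


def spam_share(nspam, nham):
    """Fraction of trained messages that were spam (0.0 if nothing is trained)."""
    total = nspam + nham
    return nspam / total if total else 0.0


if __name__ == "__main__":
    nspam, nham = training_counts(SAMPLE_DUMP)
    print(f"spam:ham = {nspam}:{nham}, spam share = {spam_share(nspam, nham):.0%}")
```

This only reports the ratio. Whether to throttle the spam trainer when the share drifts far from 50% is the judgment call the thread is debating.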