Hello Matias,
Friday, February 11, 2005, 5:32:10 AM, you wrote:
MLB> The sa-learn man page says that for a good training of the
MLB> Bayesian filter, you need to train it with equal amounts of spam
MLB> and ham, or more ham if is possible. So if I sa-learn the spam
MLB> folder, the spam tokens are going to grow a lot compared to ham
MLB> tokens.
IMO, if you manually train ONLY spam into the system, then yes, you may end up with Bayes problems. Emphasis: may. It might work just fine.
You don't need to worry about training Bayes with equal amounts of spam and ham -- my ratio has varied from 10:1 to 15:1 spam:ham, with no problem.
But it's important to feed ham into the system as well. I would hesitate to exceed a 100:1 ratio, unless your actual spam load exceeds 100:1.
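If you want to see where your own database stands, here is a rough Python sketch that computes the spam:ham ratio from the message counts. It assumes the counts are read from `sa-learn --dump magic` output, on lines ending in `non-token data: nspam` and `non-token data: nham` with the count in the third column; check that against what your version prints. The function names, the sample numbers, and the use of 100:1 as a hard limit are mine, purely for illustration.

```python
# Sketch only: the column layout of `sa-learn --dump magic` is an assumption;
# verify it against the output of your own SpamAssassin version.

# Made-up sample text, not taken from a real database.
SAMPLE_DUMP = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0       1500          0  non-token data: nspam
0.000          0        100          0  non-token data: nham
0.000          0      98765          0  non-token data: ntokens
"""


def bayes_counts(dump_text):
    """Return (nspam, nham) parsed from dump-style text."""
    counts = {}
    for line in dump_text.splitlines():
        fields = line.split()
        # Expected fields: prob, spam, ham, atime, "non-token", "data:", name
        if len(fields) >= 7 and fields[4:6] == ["non-token", "data:"]:
            name = fields[6]
            if name in ("nspam", "nham"):
                counts[name] = int(fields[2])
    return counts["nspam"], counts["nham"]


def spam_ham_ratio(nspam, nham):
    """Spam messages learned per ham message learned."""
    if nham == 0:
        return float("inf")  # no ham at all: the case to avoid
    return nspam / nham


def ratio_ok(nspam, nham, limit=100.0):
    """True if some ham has been learned and spam:ham is within the limit."""
    return nham > 0 and spam_ham_ratio(nspam, nham) <= limit
```

To use it on a live system, feed the text printed by `sa-learn --dump magic` to `bayes_counts()` and pass the result to `ratio_ok()`. The limit is only a rule of thumb, so raise it if your real mail mix is more lopsided than that.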
I'm running a site-wide Bayes db now, and I'm seeing a lot of ham added to the db by auto-learn.
I think that this is a very good thing, and it's helping me to keep a good ham:spam ratio :)
Since Feb 13 04:03:57: learned ham: 1671, learned spam: 1560
:-D
BR, Matías.