Robert Menschel wrote:
Hello Matias,

Friday, February 11, 2005, 5:32:10 AM, you wrote:

MLB> The sa-learn man page says that for a good training of the
MLB> Bayesian filter, you need to train it with equal amounts of spam
MLB> and ham, or more ham if possible. So if I sa-learn the spam
MLB> folder, the spam tokens are going to grow a lot compared to ham
MLB> tokens.

IMO, if you manually train ONLY spam into the system, then yes, you
may end up with Bayes problems. Emphasis: may. It might work just
fine.

You don't need to worry about training Bayes with equal amounts of
spam and ham -- my ratio has varied from 10:1 to 15:1 spam:ham, with
no problem.

But it's important to feed ham into the system as well. I would
hesitate to exceed a 100:1 ratio, unless your actual spam load exceeds
100:1.


I'm running a site-wide Bayes db now, and I'm seeing a lot of ham added to
the db by auto-learn. I think this is a very good thing, and it's helping
me keep a good ham:spam ratio :)


Since: Feb 13 04:03:57
Learned ham: 1671
Learned spam: 1560

:-D
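
For anyone who wants to check their own numbers against the rule of thumb quoted above, here is a rough sketch in Python. The function name and the variables are made up for illustration, and the 100:1 ceiling is only the personal guideline from the quoted message, not a documented SpamAssassin limit.

```python
def spam_to_ham_ratio(spam: int, ham: int) -> float:
    """Return how many spam messages were learned per ham message."""
    if ham <= 0:
        raise ValueError("need at least one learned ham message")
    return spam / ham


# Counts taken from the auto-learn figures reported above.
learned_ham = 1671
learned_spam = 1560

# Assumed ceiling: the 100:1 spam:ham guideline from the quoted reply.
MAX_SPAM_PER_HAM = 100

ratio = spam_to_ham_ratio(learned_spam, learned_ham)
within_limit = ratio <= MAX_SPAM_PER_HAM

print(f"spam:ham = {ratio:.2f}:1, within 100:1 guideline: {within_limit}")
# prints: spam:ham = 0.93:1, within 100:1 guideline: True
```

With these counts the database is close to 1:1, well inside the range described as trouble-free in the quoted reply.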

BR,
Matías.
