200 is OK. 2000 is enough. Over the years, from 2.43 forward, my entire
spam and ham corpora have amounted to under 2000 messages each, and Bayes
is running remarkably smoothly for me. I am "tempted" to enable automatic
learning to see what will happen. I'll take a snapshot of my Bayes
first, though. (The "get a round tuit" aspect involved is that I have
a strong aversion to fixing what isn't broken. {^_-})
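
For what it's worth, here is a rough Python sketch of the snapshot-then-train
routine. It is an illustration under assumptions, not a tested procedure: it
assumes one message per file in each folder, that sa-learn is on the PATH,
and that the 200-per-class floor matches the stock bayes_min_ham_num and
bayes_min_spam_num settings. The function names and folder layout are made up
for the example.

```python
import shutil
import subprocess
from pathlib import Path

# Assumed floor per class before Bayes starts scoring mail (the stock
# bayes_min_ham_num / bayes_min_spam_num value); change it if yours differs.
MIN_PER_CLASS = 200


def count_messages(folder):
    """Count regular files in a folder, assuming one message per file."""
    return sum(1 for p in Path(folder).iterdir() if p.is_file())


def corpus_ready(n_ham, n_spam, minimum=MIN_PER_CLASS):
    """True when both classes hold at least `minimum` messages."""
    return n_ham >= minimum and n_spam >= minimum


def snapshot_bayes(dest):
    """Save `sa-learn --backup` output to dest.

    Returns False without doing anything if sa-learn is not installed.
    """
    if shutil.which("sa-learn") is None:
        return False
    with open(dest, "wb") as out:
        subprocess.run(["sa-learn", "--backup"], stdout=out, check=True)
    return True


def learn_commands(ham_dir, spam_dir):
    """Build (but do not run) the sa-learn calls for a reviewed corpus."""
    return [
        ["sa-learn", "--ham", str(ham_dir)],
        ["sa-learn", "--spam", str(spam_dir)],
    ]
```

The snapshot written by snapshot_bayes() can be loaded back with
"sa-learn --restore <file>" if the experiment goes badly. The commands from
learn_commands() are only built, not run, so they can be eyeballed first.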
{^_^}
----- Original Message -----
From: "Leigh Sharpe" <[EMAIL PROTECTED]>
So it looks like I have to reset my Bayes and re-train it. I want to do
it properly this time. I will be making sure I personally review every
message that our users put into the spam folder first, to make sure they
haven't put spam into the wrong folder. However, I have a couple of
questions:
1) Am I better off to feed it a few emails a day, or wait until I get a
few hundred, then feed them all to sa-learn at once? Is there really a
difference?
2) How many spams should I feed it? I've heard in some places that 200
is OK; elsewhere I've heard that 10000 or more are needed.
3) Just how 'balanced' should its diet be? Should I use the same
quantity of ham as spam, or can I get away with less ham than spam?
Regards,
Leigh
Leigh Sharpe
Network Systems Engineer
Pacific Wireless
Ph +61 3 9584 8966
Mob 0408 009 502
email [EMAIL PROTECTED]
web www.pacificwireless.com.au