Re: Trashed Bayes DBase

Kris Deugau 2 Dec 2004 15:54:17 -0000

"Gray, Richard" wrote:
> Basically, it was tagging about 75% of mail as BAYES_00 (we receive
> about 70% SPAM here), so the BAYES was way off


[snip]

> This has finally reached a head and we have had to disable Bayes
> altogether until we can iron this out.
> 
> So, my question is, how on earth do I go about repairing this mess?

1) Wipe your existing bayes_* files.  Given what you're saying here,
they have totally incorrect data and if anything wiping them completely
should *improve* your spam detection rate.  <g>

2) Enable Bayes and autolearn.  Leave the autolearn=ham threshold low,
although you might want to bring it up to -0.1 or so to learn
"high"-scoring hams.

3) Collect some hand-classified ham and spam.  Feed both to Bayes. 
Watch for messages that get misclassified - ignore the scores.  Feed the
misclassified messages back into Bayes as appropriate.  Note that the
feedback process is ongoing to keep up with the changing flow of spam! 
I'm still feeding misclassified mail into the Bayes dbs on several
systems in various configurations, although not nearly as often, nor
with as many messages, as when I started.

4) Make sure you're using SURBL - this will significantly help spam
scores get further separated from ham scores, and allow more spam to be
autolearned correctly.
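
For steps 1 and 3, something like the following might do as a starting
point.  It is only a sketch: the corpus paths are hypothetical
placeholders, the function name is made up, and it assumes a
SpamAssassin 3.0-or-later sa-learn (which has --clear; on older
versions, remove the bayes_* files by hand instead).  Steps 2 and 4 are
configuration settings and are not covered here.

```shell
#!/bin/sh
# Sketch only.  HAM_DIR and SPAM_DIR are hypothetical locations for
# hand-classified mail (one message per file, or a Maildir).
HAM_DIR="${HAM_DIR:-$HOME/corpus/ham}"
SPAM_DIR="${SPAM_DIR:-$HOME/corpus/spam}"

# Set SA_LEARN="echo sa-learn" for a dry run that only prints the
# commands instead of touching the database.
SA_LEARN="${SA_LEARN:-sa-learn}"

retrain_bayes() {
    # $SA_LEARN is deliberately unquoted so the dry-run value splits
    # into "echo" plus its first argument.

    # Step 1: wipe the existing (mis-trained) Bayes database.
    $SA_LEARN --clear || return 1

    # Step 3: feed hand-classified ham, then spam.
    $SA_LEARN --ham "$HAM_DIR" || return 1
    $SA_LEARN --spam "$SPAM_DIR" || return 1

    # Show the database counters (number of ham/spam learned, etc.).
    $SA_LEARN --dump magic
}
```

Call retrain_bayes as whichever user owns the Bayes database that your
scanner actually reads.  For the ongoing feedback, rerun just the --ham
or --spam line against a directory of misclassified messages.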

Bayes (like any other learning system) needs fairly close attention for
the first little while; after a few weeks it should be working much
more smoothly.

-kgd
-- 
Get your mouse off of there!  You don't know where that email has been!
