"Gray, Richard" wrote: > Basically, it was tagging about 75% of mail as BAYES_00 (we receive > about 70% SPAM here), so the BAYES was way off
[snip]
> This has finally reached a head and we have had to disable Bayes
> altogether until we can iron this out.
>
> So, my question is, how on earth do I go about repairing this mess?

1) Wipe your existing bayes_* files. Given what you're saying here,
they contain totally incorrect data, and if anything wiping them
completely should *improve* your spam detection rate. <g>

2) Enable Bayes and autolearn. Leave the autolearn=ham threshold low,
although you might want to bring it up to -0.1 or so to learn
"high"-scoring hams.

3) Collect some hand-classified ham and spam. Feed both to Bayes.
Watch for messages that get misclassified - ignore the scores. Feed the
misclassified messages back into Bayes as appropriate. Note that the
feedback process is ongoing, to keep up with the changing flow of spam!
I'm still feeding misclassified mail into the Bayes dbs on several
systems in various configurations - although not nearly as often, nor
with as many messages, as when I started.

4) Make sure you're using SURBL - this will significantly help separate
spam scores from ham scores, and allow more spam to be autolearned
correctly.

Bayes (and any other learning system) needs fairly close attention for
the first little while; after a few weeks it should be working much
more smoothly.

-kgd
-- 
Get your mouse off of there! You don't know where that email has been!
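
For reference, steps 1 to 3 above might look roughly like this in local.cf. This is only a sketch: the option and command names are the ones documented for SpamAssassin 3.x (older releases may spell them differently), the corpus paths are hypothetical, and the -0.1 value is simply the one suggested in step 2. Check the documentation for your installed version before copying any of it.

```
# local.cf sketch - assumes SpamAssassin 3.x option names

# Step 2: enable Bayes and autolearning
use_bayes 1
bayes_auto_learn 1

# Step 2: ham autolearn threshold, raised to about -0.1 as suggested above
bayes_auto_learn_threshold_nonspam -0.1

# Step 1 is done from the shell, not here. On 3.x something like:
#   sa-learn --clear
# (or remove the bayes_* files by hand)

# Step 3, also from the shell; the directory names are placeholders:
#   sa-learn --ham  /path/to/hand-sorted/ham
#   sa-learn --spam /path/to/hand-sorted/spam
# and to check how many messages have been learned so far:
#   sa-learn --dump magic
```

Keep in mind that a freshly wiped database will not produce any BAYES_* hits until it has learned enough messages of each kind (200 ham and 200 spam with the default settings), so the hand-fed corpus in step 3 is what gets it going again.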