Daniel Staal wrote:
Depends on the setup. For instance, given the explanations above, I'll start a system to automatically learn from my 'checkspam' folder, but not my 'highspam' folder. I have procmail automatically sort my spam by score, so I can pay extra attention to low-scoring spam. (Which is more likely to be ham which was misplaced than the high-scoring spam.)

So, since I *already* have them separated out, I can avoid the double-check. ;)

But the final score alone doesn't determine whether something gets autolearned.

As Matt pointed out, several factors are involved, including the mix of header and body tests and the current Bayes score. The autolearn decision is also based on what the score would have been if Bayes had been disabled, not on the final score you see in the header.

So unless you've filtered on the "autolearn=(ham|spam|no)" tag in the X-Spam-Status header, you could be missing some high-scoring spam that hasn't already been learned.

You could probably filter your training folder to remove any messages where X-Spam-Status contains "autolearn=spam" (assuming, of course, that your server takes full control of that header). That should be relatively fast and cut down on the resources used to identify duplicates.
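As a rough illustration of that filter, here is a minimal Python sketch using only the standard library. The function names (autolearn_tag, needs_manual_learning) are made up for this example and are not part of SpamAssassin, and it assumes the X-Spam-Status header on each message was written by your own server.

```python
import re
from email import message_from_string
from email.message import Message
from typing import Iterable, List, Optional

# Matches the autolearn tag in X-Spam-Status, e.g. "autolearn=spam".
# A tag such as "autolearn_force=no" is not matched, because "=" must
# follow "autolearn" directly.
_AUTOLEARN_RE = re.compile(r"\bautolearn=([a-z_]+)", re.IGNORECASE)


def autolearn_tag(msg: Message) -> Optional[str]:
    """Return the autolearn value from X-Spam-Status, or None if absent."""
    header = msg.get("X-Spam-Status")
    if header is None:
        return None
    # The header is often folded across lines; collapse whitespace first.
    match = _AUTOLEARN_RE.search(" ".join(str(header).split()))
    return match.group(1).lower() if match else None


def needs_manual_learning(messages: Iterable[Message]) -> List[Message]:
    """Keep only messages that were NOT already autolearned as spam."""
    return [m for m in messages if autolearn_tag(m) != "spam"]


if __name__ == "__main__":
    # Hypothetical sample message, for demonstration only.
    sample = message_from_string(
        "X-Spam-Status: Yes, score=7.2 required=5.0 autolearn=no\n"
        "Subject: test\n\nbody\n"
    )
    print(autolearn_tag(sample))
```

The same functions should work on messages read with the standard mailbox module (for example mailbox.Maildir or mailbox.mbox), since those message classes derive from email.message.Message. Messages with no X-Spam-Status header at all are kept, on the assumption that they were never scanned.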

--
Kelson Vibber
SpeedGate Communications <www.speed.net>
