Daniel Staal wrote:
> Depends on the setup. For instance, given the explanations above, I'll
> start a system to automatically learn from my 'checkspam' folder, but
> not my 'highspam' folder. I have procmail automatically sort my spam by
> score, so I can pay extra attention to low-scoring spam. (Which is more
> likely to be ham which was misplaced than the high-scoring spam.)
> So, since I *already* have them separated out, I can avoid the
> double-check. ;)

But the final score alone doesn't determine whether something gets
autolearned.
As Matt pointed out, there are a number of different factors, including
the mix of head/body tests and the current Bayes score -- and the
autolearner acts on the score the message would have received if Bayes
had been disabled.
So unless you've filtered on the "autolearn=(ham|spam|no)" tag in the
X-Spam-Status header, you could be missing some high-scoring spam that
hasn't already been learned.
You could probably filter your training folder to remove any messages
where X-Spam-Status contains "autolearn=spam" (assuming, of course, that
your server takes full control of that header). That should be
relatively fast and cut down on the resources used to identify duplicates.
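
As a rough illustration of that filter, here is a minimal Python sketch that
reads the autolearn tag out of X-Spam-Status and decides whether a message
still needs manual training. The function names (autolearn_status,
needs_training) are made up for this example, and it assumes the header was
written by your own SpamAssassin rather than forged upstream.

```python
import re
from email import message_from_string

# Matches "autolearn=spam", "autolearn=ham", "autolearn=no", etc.
# The "=" right after "autolearn" keeps it from matching "autolearn_force=".
AUTOLEARN_RE = re.compile(r"\bautolearn=(\w+)")


def autolearn_status(raw_message):
    """Return the autolearn value from X-Spam-Status, or None if absent."""
    msg = message_from_string(raw_message)
    status = msg.get("X-Spam-Status", "")
    # The header is usually folded across lines; collapse the whitespace.
    status = " ".join(status.split())
    match = AUTOLEARN_RE.search(status)
    return match.group(1) if match else None


def needs_training(raw_message):
    """True unless the message was already autolearned as spam."""
    return autolearn_status(raw_message) != "spam"
```

You would run needs_training() over each message in the training folder and
hand only the ones that return True to whatever does the learning. Messages
tagged autolearn=no, or with no tag at all, are kept.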
--
Kelson Vibber
SpeedGate Communications <www.speed.net>