Paul Boven wrote:
Paul Boven wrote:
Hi everyone,

The message IDs of mails that have been (auto-)learned by Bayes are stored indefinitely in bayes_seen, which, over the years we've been using SpamAssassin, has grown to a 320MB file. We're using site-wide Bayes databases. What would be the best way to trim this database down safely? Given that it only stores the message ID and spam status, I assume there is no way to keep the more recent entries, and I'd have to wipe it altogether?

No replies yet, so I'll clarify my question a bit:

1.) How much of a performance impact does a bayes_seen file this large have?

Depending on how busy your disk is, it could hurt a bit when learning.


2.) What is the safest way of trimming it down? Can I simply stop SpamAssassin (called by Mimedefang in our case) and remove it, or do I need to recreate it in some way?

IIRC you can do just that, and SA will recreate the bayes_seen file. Make sure all SA processes are stopped before you do it.

Of course, making a copy of all the bayes datafiles before doing so wouldn't hurt.
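For reference, the back-up-then-remove step might look roughly like this minimal Python sketch. It assumes all SpamAssassin and MIMEDefang processes have already been stopped (the script does not check this), and that the Bayes files sit together in one directory with the default `bayes` prefix, so the seen database is a file literally named `bayes_seen`; adjust for your bayes_path setting and DB backend. The function name `backup_and_reset_seen` is hypothetical.

```python
import shutil
import time
from pathlib import Path


def backup_and_reset_seen(bayes_dir: Path, prefix: str = "bayes") -> Path:
    """Copy every <prefix>_* file into a timestamped backup directory,
    then delete <prefix>_seen so SpamAssassin can recreate it.

    Assumes all SpamAssassin/MIMEDefang processes are already stopped;
    nothing here verifies that.
    """
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup_dir = bayes_dir / f"backup-{stamp}"
    backup_dir.mkdir()  # fails if it already exists, so backups are never overwritten

    # Back up all Bayes data files (toks, seen, journal, ...) with their metadata.
    for f in bayes_dir.glob(f"{prefix}_*"):
        if f.is_file():
            shutil.copy2(f, backup_dir / f.name)

    # Remove only the seen database; the token database is left untouched.
    seen = bayes_dir / f"{prefix}_seen"
    if seen.exists():
        seen.unlink()
    return backup_dir
```

Only bayes_seen is deleted, so the learned tokens survive; the cost is that messages learned before the reset could be learned a second time if they are fed to sa-learn again.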


It would perhaps be useful if the Bayes seen database also stored timestamps, so this kind of purging could be done automatically and properly.

Code welcome. :)
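As a rough illustration of the timestamp idea, here is a minimal Python sketch of a seen store that records when each message was learned, so old entries can be expired by age. This is not how SpamAssassin actually stores bayes_seen: the SQLite table, the `SeenStore` class and its methods, and the 's'/'h' flag encoding are all assumptions made for illustration.

```python
import sqlite3
import time
from typing import Optional


class SeenStore:
    """Hypothetical timestamped replacement for bayes_seen.

    Maps a message ID to the class it was learned as (assumed 's' = spam,
    'h' = ham) plus the Unix time it was learned, so entries can expire.
    """

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS seen ("
            " msgid TEXT PRIMARY KEY,"
            " flag TEXT NOT NULL,"
            " learned_at INTEGER NOT NULL)"
        )

    def mark(self, msgid: str, flag: str, now: Optional[int] = None) -> None:
        # Re-learning a message overwrites its flag and refreshes its timestamp.
        ts = int(time.time()) if now is None else now
        with self.db:  # commits on success
            self.db.execute(
                "INSERT OR REPLACE INTO seen VALUES (?, ?, ?)", (msgid, flag, ts)
            )

    def lookup(self, msgid: str) -> Optional[str]:
        row = self.db.execute(
            "SELECT flag FROM seen WHERE msgid = ?", (msgid,)
        ).fetchone()
        return row[0] if row else None

    def purge_older_than(self, max_age_secs: int, now: Optional[int] = None) -> int:
        """Delete entries learned more than max_age_secs ago; return how many."""
        cutoff = (int(time.time()) if now is None else now) - max_age_secs
        with self.db:
            cur = self.db.execute("DELETE FROM seen WHERE learned_at < ?", (cutoff,))
        return cur.rowcount
```

A cron job could then call something like `purge_older_than(180 * 86400)` to drop entries older than roughly six months. The trade-off is the same as wiping the file, just gradual: a message whose entry has expired would be learned again if it is re-fed.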


Daryl
