Paul Boven wrote:
Hi everyone,
The Message-IDs of mails that have been (auto-)learned by Bayes are
stored indefinitely in bayes_seen, which, over the years that we've
used SpamAssassin, has grown to a 320MB file. We're using
site-wide Bayes databases. What would be the best way to trim down
this database, safely?
Given that it only stores message-ID and spam status, I assume there
is no way to rescue more recent entries, and I'd have to wipe it
altogether?
No replies yet, so I'll clarify my question a bit:
1.) How much of a performance impact does a bayes_seen that is this
large have?
Depending on how busy your disk is, it could hurt a bit when learning.
2.) What is the safest way of trimming it down? Can I simply stop
SpamAssassin (called by Mimedefang in our case) and remove it, or do I
need to recreate it in some way?
IIRC you can do just that and SA will recreate a bayes_seen file. Make
sure all SA processes are killed off before doing it.
Of course, making a copy of all the bayes datafiles before doing so
wouldn't hurt.
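A minimal sketch of that backup-then-remove step in Python, in case it helps. This is not an official SpamAssassin tool. The function name reset_bayes_seen and both directory arguments are made up for illustration, and it assumes the seen database is a single file named exactly bayes_seen sitting next to the other bayes_* files (some DB backends use different file names or extensions). Stop MIMEDefang and every SA process yourself before running it; the script does not check for that.

```python
import shutil
import time
from pathlib import Path


def reset_bayes_seen(bayes_dir, backup_root):
    """Back up every bayes_* file, then remove bayes_seen.

    Hypothetical helper: run it only after all SpamAssassin
    processes have been stopped.
    """
    bayes_dir = Path(bayes_dir)
    backup_dir = Path(backup_root) / time.strftime("bayes-backup-%Y%m%d-%H%M%S")
    backup_dir.mkdir(parents=True)

    # Copy all Bayes data files first, so the removal can be undone.
    for path in sorted(bayes_dir.glob("bayes_*")):
        if path.is_file():
            shutil.copy2(path, backup_dir / path.name)

    # Remove only bayes_seen; the other Bayes files are left untouched.
    seen = bayes_dir / "bayes_seen"
    if seen.exists():
        seen.unlink()
    return backup_dir
```

Calling reset_bayes_seen("/path/to/bayes", "/path/to/backups") returns the directory holding the copies, so the old files can be put back if anything looks wrong afterwards.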
It would perhaps be useful if the Bayes seen database also had
timestamps, so this kind of purging could be done automatically and
properly.
Code welcome. :)
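To make the idea concrete, here is a rough Python sketch of a seen store that keeps a timestamp next to each Message-ID so old entries can be expired. It only illustrates the data structure: SpamAssassin itself is written in Perl, and the class name TimestampedSeen, its methods, and the "s"/"h" status labels are assumptions for the example, not the existing bayes_seen format.

```python
import time


class TimestampedSeen:
    """Hypothetical seen store: Message-ID -> (status, learned_at)."""

    def __init__(self):
        self._entries = {}

    def mark(self, msgid, status, now=None):
        # status is an illustrative label, e.g. "s" for spam, "h" for ham.
        learned_at = time.time() if now is None else now
        self._entries[msgid] = (status, learned_at)

    def status(self, msgid):
        entry = self._entries.get(msgid)
        return entry[0] if entry else None

    def expire(self, max_age_seconds, now=None):
        """Drop entries older than max_age_seconds; return how many were dropped."""
        now = time.time() if now is None else now
        cutoff = now - max_age_seconds
        stale = [m for m, (_, t) in self._entries.items() if t < cutoff]
        for m in stale:
            del self._entries[m]
        return len(stale)
```

With timestamps stored like this, a periodic expire() call could purge old entries while keeping the recent ones, which is exactly what the current format cannot do.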
Daryl