What sort of experiences have people had managing a sitewide bayes db that is used by spamassassin (spamd|amavisd) instances on multiple machines? I've got an environment with spamassassin/amavisd-new running in parallel on a pool of two (but possibly more in the future) equally weighted machines. How have you avoided the dreaded Single Point of Failure?
I've been experimenting (on a small scale) with an SQL-backed bayes db. I can readily have multiple machines talk to a single MySQL instance, but then I'm stuck trying to make that MySQL instance "highly available" (and I *could* do that on an existing "clustered" server).

I could also run an instance of MySQL on every machine, with one master instance replicating to one or more slave instances. I've never set up MySQL replication (but it can't be much harder than OpenLDAP replication!). In that setup I'd enable autolearning only on the machine with the master MySQL db.

I could also ditch the idea of a MySQL-backed bayes db and simply rsync the bayes db file from the master to the slaves on a regular basis (stopping and starting spamd|amavisd in the process). In that environment I'd do training only on one "master" machine and enable autolearning only on that machine.

How are other people addressing this issue?

Ben
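
For reference, the SQL-backed option described above might look roughly like this in SpamAssassin's local.cf. This is only a sketch: the hostname, database name, and credentials are placeholders, not values from this setup, and the bayes tables are assumed to have been created already from the schema shipped with SpamAssassin.

```
# Shared, sitewide bayes db in MySQL (all values below are placeholders)
use_bayes               1
bayes_store_module      Mail::SpamAssassin::BayesStore::MySQL
bayes_sql_dsn           DBI:mysql:sa_bayes:db.example.com
bayes_sql_username      sa_user
bayes_sql_password      changeme

# Make every scanning user share one set of bayes data
bayes_sql_override_username  sitewide

# On the designated "master" machine:
bayes_auto_learn        1
# On every other machine, set this to 0 instead so that only
# the master autolearns.
```

The same bayes_auto_learn split would apply to the rsync approach, with the default file-based store in place of the SQL settings.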