highly available sitewide bayes, local db vs. sql

Ben Poliakoff 24 Feb 2005 00:46:11 -0000

What sort of experiences have people had managing a sitewide bayes db
that is used by spamassassin (spamd|amavisd) instances on multiple
machines?  I've got an environment with spamassassin/amavisd-new running
in parallel on a pool of two (but possibly more in the future) equally
weighted machines.  How have you avoided the dreaded Single Point of
Failure?


I've been experimenting (on a small scale) with an SQL-backed bayes db.
I can readily have multiple machines talk to a single mysql instance, but
then I'm stuck trying to make that mysql instance "highly available"
(and I *could* do that on an existing "clustered" server).

I could also have an instance of mysql running on all of the machines,
with one master mysql instance replicating to one or more mysql slave
instances.  I've never set up mysql replication (but it can't be much
harder than OpenLDAP replication!).  In that setup I'd enable
autolearning only on the machine with the master mysql db.

I could also ditch the idea of using a mysql-backed bayes db and simply
rsync the bayes db file from the master to the slaves on a regular basis
(stopping and starting spamd|amavisd in the process).  In such an
environment I'd do training only on one "master" machine and enable
autolearning only on that machine.
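
To make the rsync idea concrete, here is a rough, untested sketch of what
one sync cycle might look like, written in Python.  The bayes directory,
the slave host name, and the stop/start commands are all placeholders
made up for illustration, not values from a real setup:

```python
import subprocess

# All of these values are hypothetical placeholders.
BAYES_DIR = "/var/lib/spamassassin/bayes/"
SLAVES = ["mx2.example.com"]
STOP_CMD = ["/etc/init.d/amavisd", "stop"]
START_CMD = ["/etc/init.d/amavisd", "start"]


def sync_plan(slave, bayes_dir=BAYES_DIR):
    """Return the commands (as argv lists) to refresh one slave, in order."""
    return [
        ["ssh", slave] + STOP_CMD,                          # stop the scanner on the slave
        ["rsync", "-a", bayes_dir, f"{slave}:{bayes_dir}"],  # push the master's bayes files
        ["ssh", slave] + START_CMD,                         # start the scanner again
    ]


def run_plan(plan, runner=subprocess.run):
    """Run the plan; the final restart is attempted even if an earlier step fails."""
    *steps, restart = plan
    try:
        for cmd in steps:
            runner(cmd, check=True)
    finally:
        runner(restart, check=True)


if __name__ == "__main__":
    # Meant to be run on the master, e.g. from cron.
    for slave in SLAVES:
        run_plan(sync_plan(slave))
```

The restart sits in a "finally" block so that a failed rsync doesn't
leave a slave with its scanner stopped.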

How are other people addressing this issue?

Ben
