2014-02-18, Jason Haar wrote:
We have a geographically distributed edge mail relay network (some in the US and some in Europe) and I'm wondering if the new REDIS support could be used to centralize our Bayes?
If you have a fast and reliable connection between the two sites, then in principle it could work. However, a single round trip across the globe takes several times longer than a local transaction, so this is probably not a desirable setup. One server on each continent might be acceptable, but that hasn't been tried. Bear in mind that a redis server offers no access controls of its own, so if redis binds to a publicly reachable interface, IP restrictions need to be handled by a firewall.
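To put rough numbers on the round-trip concern, here is a back-of-the-envelope sketch. The round-trip times and the count of three round trips per message are illustrative assumptions, not measurements of SpamAssassin or redis, and the function name is hypothetical.

```python
def bayes_network_overhead_ms(rtt_ms: float, round_trips: int) -> float:
    """Network time added to one mail check, in milliseconds.

    Assumes the round trips happen one after another (no pipelining),
    so the cost is simply the round-trip time multiplied by their count.
    """
    return rtt_ms * round_trips


# Hypothetical figures: 0.5 ms on a local network, 150 ms between continents.
local_ms = bayes_network_overhead_ms(rtt_ms=0.5, round_trips=3)
remote_ms = bayes_network_overhead_ms(rtt_ms=150.0, round_trips=3)

print(f"local redis:  {local_ms} ms per message")
print(f"remote redis: {remote_ms} ms per message")
```

Under these assumed figures the remote setup spends a few hundred times longer waiting on the network per message, which is why one server per continent looks like the more plausible arrangement.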
Is anything special required to be done to get 4-6 spamd servers to use the same REDIS backend?
No, this is normal. It is no different from having multiple spamd or amavisd child processes under a single master process: each process accesses the database completely independently.
Will network outages (which will happen) cause corruption that could impact the others? (eg what if spamd is trying to upload 3 records to redis and only the first two go through)
No corruption can happen due to network problems. Cases where some but not all tokens are learned, or where tokens are learned but the 'seen' entry is not added, are not a problem as long as they don't happen too often. Token updates usually fit within a single IP packet, so in most cases either all of the transaction gets committed or none of it, even when there are network problems. A full network breakdown (or a server being down) would cause SpamAssassin to log a warning for each mail message, but it would move on anyway, just without Bayes checks. Depending on the mail traffic rate and the duration of the outage, the volume of such warnings may be undesirable. Intermittent network problems or slowness would be more problematic, because the timeouts for failing rules and checks are rather large, so mail checking could slow down substantially.

Mark
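Since slow or intermittent links are the worrying case, one option is to probe the redis server from a monitoring script with a short timeout of your own. The following is a minimal sketch of such a probe, not something SpamAssassin does itself. The function name, the default timeout, and the use of an unauthenticated inline PING are assumptions; a server that requires a password will answer with an error, and this probe will report it as unreachable.

```python
import socket


def redis_reachable(host: str, port: int = 6379, timeout_s: float = 0.5) -> bool:
    """Return True if a redis server answers PING within timeout_s seconds.

    The timeout applies to the connect and to each send/receive call.
    Any socket error (refused, unreachable, timed out) counts as "not reachable".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_s) as conn:
            conn.sendall(b"PING\r\n")  # redis accepts this inline command form
            reply = conn.recv(64)
    except OSError:
        return False
    return reply.startswith(b"+PONG")
```

A cron job or monitoring check could call `redis_reachable("redis.example.net")` (a placeholder host name) and raise an alert when it returns False, so that an outage is noticed from one place instead of from a flood of per-message warnings.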