On Tue, 21 Jun 2011 07:30:51 -0700 Marc Perkel <supp...@junkemailfilter.com> wrote:
> Thanks David but I need real time updating and it's spread across
> multiple servers. So need PostgreSQL or MySQL.

That's what we used to think. It turns out that real-time updating is a
waste of resources; journalling Bayes updates and then running the
journal every 5-10 minutes or so works fine in practice.

We synchronize across multiple servers using rsync. This is much more
scalable than a central database because as you add servers, your disk
bandwidth for extracting Bayes data scales up naturally. You also don't
have the database round-trip adding latency.

Regards,

David.
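
For anyone who wants to picture the journalling idea described above, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not the actual on-disk format or code of any particular filter: the names `journal_update` and `replay_journal`, the JSON-lines journal, and the in-memory `Counter` token stores are all hypothetical. Each learning event is appended to a local file, and a periodic job (say, from cron every 5-10 minutes) folds the journal into the token counts.

```python
import json
import os
from collections import Counter


def journal_update(journal_path, tokens, is_spam):
    """Append one learning event to the journal (hypothetical JSON-lines format).

    This is a cheap local append; the token store itself is not touched.
    """
    record = {"spam": is_spam, "tokens": sorted(set(tokens))}
    with open(journal_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


def replay_journal(journal_path, spam_counts, ham_counts):
    """Fold every journalled event into the token counts, then discard the journal.

    Returns the number of events applied.
    """
    if not os.path.exists(journal_path):
        return 0
    # Rename first so new events go to a fresh journal while this one is replayed.
    work_path = journal_path + ".replay"
    os.replace(journal_path, work_path)
    applied = 0
    with open(work_path, encoding="utf-8") as fh:
        for line in fh:
            record = json.loads(line)
            target = spam_counts if record["spam"] else ham_counts
            target.update(record["tokens"])  # each distinct token counts once per message
            applied += 1
    os.remove(work_path)
    return applied
```

The scanning path only ever does a local append, so no database round-trip is involved. Copying the resulting data between servers (rsync in the setup described above) is a separate step and is not shown here.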