On Tue, 21 Jun 2011 07:30:51 -0700
Marc Perkel <supp...@junkemailfilter.com> wrote:

> Thanks David but I need real time updating and it's spread across 
> multiple servers. So need PostgreSQL or MySQL.

That's what we used to think.  It turns out that real-time updating
is a waste of resources; journalling Bayes updates and then applying
the journal every 5-10 minutes or so works fine in practice.  We
synchronize across multiple servers using rsync.  This is much more
scalable than a central database because as you add servers, your disk
bandwidth for extracting Bayes data scales up naturally.  You also avoid
the latency of a database round-trip on every lookup.
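To make the journalling idea concrete, here is a toy Python sketch. It assumes a simple in-memory token-count store. It is not SpamAssassin's actual Bayes code, and the class and method names (JournaledBayesStore, learn, sync, token_counts) are hypothetical. Learning only appends to a journal. A periodic job (for example, something run from cron every 5-10 minutes) calls sync() to fold the pending updates into the store in one batch.

```python
from collections import Counter


class JournaledBayesStore:
    """Hypothetical illustration: token counts with a write-behind journal.

    Not SpamAssassin's implementation; just the general pattern.
    """

    def __init__(self):
        self.spam_counts = Counter()
        self.ham_counts = Counter()
        self.journal = []  # pending (token, is_spam) updates, append-only

    def learn(self, tokens, is_spam):
        # Learning only appends to the journal. The main store is untouched,
        # so scoring lookups never wait on per-message writes.
        self.journal.extend((tok, is_spam) for tok in tokens)

    def sync(self):
        # Apply every pending update in one batch, then empty the journal.
        # Meant to be called periodically (e.g. by a hypothetical cron job).
        for tok, is_spam in self.journal:
            (self.spam_counts if is_spam else self.ham_counts)[tok] += 1
        applied = len(self.journal)
        self.journal.clear()
        return applied

    def token_counts(self, tok):
        # Lookups see only synced data; unsynced journal entries are invisible.
        return self.spam_counts[tok], self.ham_counts[tok]


if __name__ == "__main__":
    store = JournaledBayesStore()
    store.learn(["viagra", "free"], is_spam=True)
    print(store.token_counts("viagra"))  # (0, 0): not synced yet
    print(store.sync())                  # 2 updates applied
    print(store.token_counts("viagra"))  # (1, 0)
```

The trade-off is that newly learned tokens affect scoring only after the next sync. That staleness is bounded by the sync interval. In exchange, writes are batched, and the synced files can be copied to other servers (e.g. with rsync) without any per-message coordination.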

Regards,

David.
