Quoting "David F. Skoll" <d...@roaringpenguin.com>:

On Tue, 21 Jun 2011 07:06:11 -0700
Marc Perkel <supp...@junkemailfilter.com> wrote:

Trying to get MySQL Bayes working in a high-volume environment.
Dedicated MySQL server with SSD drives. Can someone send me a sample
my.cnf file and make other suggestions to keep it running without
database corruption and other MySQL "features"? Or should I be
using some other DB?

We've tried various ways of storing Bayes data (we have our own Bayes
implementation, so this discussion may not correspond exactly with the
SA implementation.)  After trying Berkeley DB files and PostgreSQL---we
would never use MySQL for any data we care about---we finally settled
on Dan Bernstein's CDB format.  It has by far the best performance.
See: http://www.dmo.ca/blog/benchmarking-hash-databases-on-large-data/
Take a look at the "Random Reads" timings.  CDB is 6 times faster than
Berkeley DB!

CDB is read-only, which means when you want to do Bayes training, you
have to rewrite the entire database.  This is not an issue for our
system because of how we do Bayes training, but it may be an issue
with the standard sa-learn.
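
For anyone unfamiliar with the rewrite-the-whole-database pattern, here is a
minimal sketch of the general idea: build a complete new file next to the old
one, then rename it into place so readers always see either the old or the new
database. It is only an illustration under stated assumptions. A JSON file
stands in for the real CDB file, and the names rebuild_database and train are
hypothetical; this is not the implementation described above, nor what sa-learn
does.

```python
import json
import os
import tempfile


def rebuild_database(path, tokens):
    """Write a complete new database file, then swap it into place.

    JSON is a stand-in here for a constant-database format such as CDB.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the same directory so the rename stays on one
    # filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".bayes-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(tokens, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())
        # os.replace swaps the new file in; readers never see a partial file.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def train(path, new_counts):
    """Merge new (spam, ham) counts into the existing data and rebuild."""
    try:
        with open(path) as f:
            tokens = json.load(f)
    except FileNotFoundError:
        tokens = {}
    for token, (spam, ham) in new_counts.items():
        old_spam, old_ham = tokens.get(token, (0, 0))
        tokens[token] = [old_spam + spam, old_ham + ham]
    rebuild_database(path, tokens)
    return tokens
```

The cost of each training run grows with the size of the whole database rather
than with the number of changed tokens, which is why this approach suits
batched training better than per-message updates.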



That's a bit harsh on MySQL, isn't it? Anyway, how big are your DBs? My Bayes DB is under 100MB, so it fits happily in memory and there is no need for SSDs. (Correct me if I'm wrong, but write performance isn't normally what causes you issues, is it?)

Andy.


