On Tue, 21 Jun 2011 07:06:11 -0700 Marc Perkel <supp...@junkemailfilter.com> wrote:
> Trying to get MySQL Bayes working in a high volume environment.
> Dedicated MySQL server with SSD drives. Can someone send me a sample
> my.cnf file and make other suggestions to keep it running without
> database corruption and other MySQL "features"? Or - should I be
> using some other DB?

We've tried various ways of storing Bayes data. (We have our own Bayes
implementation, so this discussion may not correspond exactly to the
SA implementation.) After trying Berkeley DB files and PostgreSQL---we
would never use MySQL for any data we care about---we finally settled on
Dan Bernstein's CDB format. It has by far the best performance. See:

http://www.dmo.ca/blog/benchmarking-hash-databases-on-large-data/

Take a look at the "Random Reads" timings. CDB is 6 times faster
than Berkeley DB!

CDB is read-only, which means that when you want to do Bayes training,
you have to rewrite the entire database. This is not an issue for our
system because of how we do Bayes training, but it may be an issue with
the standard sa-learn.

Regards,

David.
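
To make the "rewrite the entire database" point concrete, here is a minimal sketch of what a full rebuild looks like. It is not our production code and not what sa-learn does. The function names (cdb_hash, write_cdb, cdb_get) are made up for illustration, and the file layout follows my reading of the published cdb format (a 2048-byte header of 256 table pointers, then the records, then the hash tables). For real use you would want the cdb tools or an existing binding rather than this.

```python
import os
import struct
import tempfile


def cdb_hash(key: bytes) -> int:
    """The cdb hash: start at 5381, then h = (h * 33) XOR byte, kept to 32 bits."""
    h = 5381
    for c in key:
        h = (((h << 5) + h) ^ c) & 0xFFFFFFFF
    return h


def write_cdb(path, items):
    """Write every (key, value) bytes pair to a new file, then swap it into place.

    There is no in-place update: changing one token count means calling this
    again with the complete data set.
    """
    buckets = [[] for _ in range(256)]
    dir_name = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dir_name)
    with os.fdopen(fd, "wb") as f:
        f.write(b"\0" * 2048)  # placeholder for the header, filled in at the end
        pos = 2048
        for key, value in items:
            f.write(struct.pack("<II", len(key), len(value)) + key + value)
            h = cdb_hash(key)
            buckets[h & 255].append((h, pos))
            pos += 8 + len(key) + len(value)
        header = b""
        for entries in buckets:
            nslots = 2 * len(entries)  # tables are kept half empty
            header += struct.pack("<II", pos, nslots)
            slots = [(0, 0)] * nslots
            for h, rec_pos in entries:
                i = (h >> 8) % nslots
                while slots[i][1] != 0:  # linear probing on collision
                    i = (i + 1) % nslots
                slots[i] = (h, rec_pos)
            for slot in slots:
                f.write(struct.pack("<II", *slot))
            pos += 8 * nslots
        f.seek(0)
        f.write(header)
    # Readers see either the old file or the new one, never a partial write.
    os.replace(tmp, path)


def cdb_get(path, key):
    """Return the value stored for key, or None if the key is absent."""
    h = cdb_hash(key)
    with open(path, "rb") as f:
        f.seek((h & 255) * 8)
        table_pos, nslots = struct.unpack("<II", f.read(8))
        for i in range(nslots):
            f.seek(table_pos + (((h >> 8) + i) % nslots) * 8)
            slot_hash, rec_pos = struct.unpack("<II", f.read(8))
            if rec_pos == 0:  # empty slot: the key is not in the database
                return None
            if slot_hash == h:
                f.seek(rec_pos)
                klen, dlen = struct.unpack("<II", f.read(8))
                if f.read(klen) == key:
                    return f.read(dlen)
    return None
```

With this shape, a training pass would update the token counts somewhere else (in memory or in a side file), then call write_cdb once with everything. Lookups stay cheap: one header read, a probe or two in a hash table, and one record read.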