On Mar 24, 2007, at 4:36 AM, Peter Normann wrote:

Quey wrote:
I've read that use of SQL for vpopmail is really only advantageous
when you get many domains.

Not exactly. SQL is advantageous if you somewhere down the road want to
implement a web based management system for administering accounts.

SQL is advantageous if you want to scale the cluster beyond one box. If the data is stored in CDB, then you must replicate or share the CDB file(s) to each machine in the cluster. This is not difficult, as you can share the CDB over NFS, or you can rsync/rdist/scp the files to each system. But if the CDB file(s) are frequently updated, and can be updated by more than one system, you'll run into problems with the CDB getting munged. This is most pronounced with etc/tcp.smtp when using POP before SMTP on a large cluster, but I've also seen the vpasswd files get munged.

But you shouldn't be using POP before SMTP any longer. It's been a long time since I've used CDB for vpasswd, but I can recall writing a script to rebuild the vpasswd file after it got munged. You have to be very careful to limit the writers of a CDB to one at a time, or else you run into problems.
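
To illustrate the "one writer at a time" rule, here is a minimal sketch in Python. It is not what vpopmail does internally; the function name, the lock file naming, and the build callback are all made up for the example. It takes an exclusive lock, builds the new file under a temporary name, and renames it into place so readers never see a half-written file. How flock() behaves over NFS varies by OS, so check that before relying on it across machines.

```python
import fcntl
import os
import tempfile


def rebuild_exclusively(target_path, build, lock_path=None):
    """Run build(tmp_path) under an exclusive lock, then swap the result in.

    `build` is a hypothetical callback that writes the complete new file
    to the path it is given (for example, by running a cdb compiler).
    """
    lock_path = lock_path or target_path + ".lock"
    with open(lock_path, "w") as lock:
        # Blocks until no other process holds the lock: one writer at a time.
        fcntl.flock(lock, fcntl.LOCK_EX)
        directory = os.path.dirname(target_path) or "."
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        os.close(fd)
        try:
            build(tmp_path)
            # rename() within one filesystem is atomic on POSIX, so readers
            # see either the old file or the new one, never a partial file.
            os.replace(tmp_path, target_path)
        except BaseException:
            if os.path.exists(tmp_path):
                os.unlink(tmp_path)
            raise
    # The lock is released when the lock file is closed.
```

Every process that rebuilds the file has to go through the same lock for this to help. A single writer that skips it is enough to get the file munged again.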

With SQL, it's easy for multiple systems to all access the same SQL tables and concurrency issues are taken care of for you.

However, what about 1 domain, at how many users would it be faster to
use SQL over the default cdb file?

Never. SQL (any flavor) is at least an order of magnitude slower than CDB on a single box. In 2000, the best I could get out of one top-of-the-line dual PIII system was about 400 queries per second. Of course, qps will vary based on your ratio of reads to writes. Writes are very slow with SQL because they must be committed to disk in order to complete. With today's hardware and the latest MySQL, I'd guess you'd be looking at somewhere in the neighborhood of 1,000-1,500 qps under normal usage. That assumes you've got a large enough data set to justify investing 4-8 hours tuning MySQL and your queries to get the best performance.

The last time I benchmarked CDB performance (in 2000), I was able to get well over 6,000 qps on servers with half the CPU of my SQL boxes. That kind of performance is expected from a CDB because it's a file. Any good Unix-like OS will mmap it and access it from RAM.
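
As a rough illustration of the mmap point, here is a small Python sketch that maps a file read-only and looks up a key in it. This is not the CDB format: a real CDB uses hash tables to find a record in a couple of probes, while this made-up "key:value" line layout is scanned linearly. It only shows that after the file is mapped, reads are memory accesses served from the page cache. The function name and file layout are assumptions for the example.

```python
import mmap


def lookup(path, key):
    """Return the value for `key` (bytes) from a 'key:value' line file.

    The file must not be empty, since an empty file cannot be mapped.
    """
    prefix = key + b":"
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        if m[:len(prefix)] == prefix:
            # The key is on the first line of the file.
            start = len(prefix)
        else:
            pos = m.find(b"\n" + prefix)
            if pos == -1:
                return None
            start = pos + 1 + len(prefix)
        end = m.find(b"\n", start)
        if end == -1:
            end = len(m)
        return m[start:end]
```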

Is it beneficial at 10K users or 50K users in the same domain, or no
real gain at all until 100K users, or never?

I am uncertain whether SQL provides performance gains under any
circumstances. Maybe someone could expand on this...

Where SQL beats the pants off CDB is scalability. CDB has file size limits and you can't have multiple writers. Before any write is completed, you must rebuild the CDB from the plain text file. With tiny CDB files, this is never an issue. But when your CDB gets large and takes seconds or minutes to compile, soaking up gobs of CPU and RAM in the process, this becomes a big problem. Under those conditions, SQL kicks CDB's tail all over town. One SQL write/update and you're done.
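
The difference in write cost can be sketched in a few lines of Python. Everything here is a stand-in: a plain text file plays the part of the compiled CDB, Python's built-in sqlite3 plays the part of MySQL, and the table, column, and function names are invented for the example. The point is only the amount of work per change: the CDB-style update rewrites every record, while the SQL update touches one row.

```python
import os
import sqlite3
import tempfile


def cdb_style_update(records, key, value, out_path):
    """Change one record, then rewrite the whole file. Returns records written."""
    records[key] = value
    written = 0
    with open(out_path, "w") as f:
        for k, v in records.items():
            f.write(f"{k}:{v}\n")
            written += 1
    return written


def sql_style_update(conn, key, value):
    """Change one record with a single UPDATE. Returns rows touched."""
    cur = conn.execute("UPDATE users SET pw = ? WHERE name = ?", (value, key))
    conn.commit()
    return cur.rowcount


def demo(n):
    """Apply the same one-record change both ways and report the work done."""
    records = {f"user{i}": "old" for i in range(n)}
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT PRIMARY KEY, pw TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", records.items())
    conn.commit()
    out_path = os.path.join(tempfile.mkdtemp(), "vpasswd.txt")
    written = cdb_style_update(records, "user0", "new", out_path)
    touched = sql_style_update(conn, "user0", "new")
    conn.close()
    return written, touched
```

Calling demo(n) returns a pair: the number of records the file rewrite had to write (all n of them) and the number of rows the UPDATE touched (one). The first number grows with the data set; the second does not.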

You can throw a bunch of hardware at MySQL and achieve many thousands of queries per second, as sites like Wikipedia and Friendster do, pushing upwards of 15,000 queries per second. And unlike CDB, they have a lot of redundancy built in because the entire data set exists in multiple databases.

If your data access is almost entirely reads, CDB is fantastic. If your data set is tiny or small, CDB is excellent. If you need frequent writes of a large or huge data set, CDB is probably inappropriate.

