On Mar 24, 2007, at 4:36 AM, Peter Normann wrote:
> I've read that use of SQL for vpopmail is really only advantageous
> when you get many domains.
Not exactly. SQL is advantageous if you somewhere down the road
implement a web based management system for administering accounts.
SQL is also advantageous if you want to scale the cluster beyond
one box. If the data is stored in CDB, then you must replicate or
share the CDB file(s) to each machine in the cluster. This is not
difficult: you can share the CDB over NFS, or you can rsync/rdist/scp
the files to each system. But if the CDB file(s) are frequently
updated, and can be updated by more than one system, you'll run into
problems with the CDB getting munged. This is most pronounced with
the etc/tcp.smtp when using POP before SMTP on a large cluster, but
I've also seen the vpasswd files get munged.
But you shouldn't be using POP before SMTP any longer. And it's been
a long time since I've used CDB for vpasswd but I can recall writing
a script to rebuild the vpasswd file after it got munged. You have to
be very careful to limit the writers of a CDB to one at a time, or
else you run into problems.
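
A minimal sketch of that one-writer-at-a-time rule, assuming a Unix-like
system: take an exclusive lock, build the new file under a temporary
name, then rename it into place. The function name rebuild_locked, the
".lock" suffix and the build callback are made up for illustration; none
of them come from vpopmail or the cdb tools.

```python
import fcntl
import os
import tempfile


def rebuild_locked(target_path, build):
    """Rebuild target_path while holding an exclusive lock.

    `build` is a hypothetical callback that writes the complete new
    contents to the open binary file object it is given.
    """
    lock_path = target_path + ".lock"  # illustrative naming, an assumption
    directory = os.path.dirname(os.path.abspath(target_path))
    with open(lock_path, "w") as lock_file:
        # Blocks until no other process holds the lock.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            # mkstemp creates the file with mode 0600; adjust if readers
            # run under a different user.
            fd, tmp_path = tempfile.mkstemp(dir=directory)
            try:
                with os.fdopen(fd, "wb") as tmp:
                    build(tmp)
                    tmp.flush()
                    os.fsync(tmp.fileno())
                # Rename is atomic on POSIX: readers see the old file or
                # the new one, never a half-written file.
                os.replace(tmp_path, target_path)
            except BaseException:
                os.unlink(tmp_path)
                raise
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

This only serializes writers that can see the same lock file. Locking
over NFS has historically been unreliable, so on a cluster it is
probably safer to designate a single machine as the writer.
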
With SQL, it's easy for multiple systems to all access the same SQL
tables and concurrency issues are taken care of for you.
> However, what about 1 domain, at how many users would it be faster to
> use SQL over the default cdb file?
Never. SQL (any flavor) is at least an order of magnitude slower than
CDB, on a single box. In 2000, the best I could get out of one top of
the line dual PIII system was about 400 queries per second. Of
course, qps will vary based on your ratio of read and writes. Writes
are very slow with SQL because they must be committed to disk in
order to complete. With today's hardware and the latest MySQL, I'd
guess you'd be looking at somewhere in the neighborhood of
1,000-1,500 qps under normal usage. That assumes you've got a large
enough data set to invest 4-8 hours tuning MySQL and your queries to
get the best performance.
The last time I benchmarked CDB performance (in 2000), I was able to
get well over 6,000 qps on servers with half the CPU of my SQL boxes. That
kind of performance is expected from a CDB because it's a file. Any
good Unix-like OS will mmap it and access it from RAM.
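
To illustrate why a lookup in a mapped file is cheap, here is a rough
sketch that maps a file and binary-searches it in place. It does not
parse the real CDB format (which uses hash tables); the fixed-width
record layout and the names RECORD, write_table and lookup are
assumptions for the example only.

```python
import mmap
import struct

# Hypothetical fixed-width record: 16-byte key, 16-byte value, NUL padded.
RECORD = struct.Struct("16s16s")


def write_table(path, items):
    """Write records sorted by key (not the real CDB layout)."""
    with open(path, "wb") as f:
        for key, value in sorted(items.items()):
            f.write(RECORD.pack(key, value))


def lookup(path, key):
    """Binary-search the mapped file; the table must not be empty."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            lo, hi = 0, len(m) // RECORD.size
            while lo < hi:
                mid = (lo + hi) // 2
                # Reads straight from the mapping; once the pages are in
                # the OS cache no disk I/O is needed.
                k, v = RECORD.unpack_from(m, mid * RECORD.size)
                k = k.rstrip(b"\0")
                if k == key:
                    return v.rstrip(b"\0")
                if k < key:
                    lo = mid + 1
                else:
                    hi = mid
    return None
```
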
> Is it beneficial at 10K users or 50K users in the same domain, or no
> real gain at all until 100K users, or never?
> I am uncertain whether SQL provides performance gains under any
> circumstances. Maybe someone could expand on this...
Where SQL beats the pants off CDB is scalability. CDB has a file size
limit (4 GB, since the format uses 32-bit offsets) and you can't have
multiple writers. Before any write is
completed you must rebuild the CDB from the plain text file. With
tiny CDB files, this is never an issue. But when your CDB gets large
and takes seconds or minutes to compile, soaking up gobs of CPU and
RAM in the process, this becomes a big problem. Under those
conditions, SQL kicks CDB's tail all over town. One SQL write/update
and you're done.
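
The difference in write cost can be sketched roughly as follows. This
uses Python's built-in sqlite3 as a stand-in for MySQL and a plain text
file as a stand-in for the CDB rebuild; the vpasswd table with name and
pw columns, and both function names, are hypothetical and not vpopmail's
actual schema.

```python
import sqlite3


def rebuild_flat(path, users):
    """CDB-style change: rewrite every record in order to alter one."""
    with open(path, "w") as f:
        for name, pw in sorted(users.items()):
            f.write(f"{name}:{pw}\n")
    return len(users)  # number of records written


def update_sql(conn, name, pw):
    """SQL-style change: touch only the affected row."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE vpasswd SET pw = ? WHERE name = ?", (pw, name)
        )
    return cur.rowcount  # number of rows changed
```

With 100,000 users, changing one password this way means writing
100,000 records in the first case and one row in the second.
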
You can throw a bunch of hardware at MySQL and achieve many thousands
of queries per second, as sites like Wikipedia and Friendster do,
pushing upwards of 15,000 queries per second. And unlike CDB, they
have a lot of redundancy built in because the entire data set exists
in multiple databases.
If your data access is almost entirely reads, CDB is fantastic. If
your data set is tiny or small, CDB is excellent. If you need
frequent writes of a large or huge data set, CDB is probably