> What we're seeing is that our network and RAID 5 IDE-based disk array on
> our central mail store server is not able to keep up with the 'client'
> servers doing the POP3, IMAP, Webmail, and SMTP legwork.
I've found an interesting bottleneck with webmail. When people use
POP or IMAP clients (Outlook, Mozilla, Opera, Thunderbird, etc.),
the client application caches a lot of the information locally and
synchronizes occasionally with the server to see if there are new
messages. Things like browsing and searching run really fast because
the user is utilizing the resources of their local PC to do most of
the work. With webmail, the session state is neither saved nor cached,
so with each new request, the mailbox can be rescanned. A relatively
modest webmail application might only rescan all headers and show
subject lines. A complex application might scan all content in a
folder to present content more fully. Without anything to throttle
back the webmail server, it's possible that the webmail server software
can pound the mail spool server to death.
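To make the rescanning point concrete, here's a rough Python sketch of
the kind of cache a webmail front end could keep so that an unchanged
folder isn't rescanned on every request. This is not how any
particular webmail package works: HeaderCache and the scan callback
are made-up names, and it assumes the folder's mtime changes whenever
mail is delivered or removed (with Maildir you'd have to watch both
new/ and cur/).

```python
import os


class HeaderCache:
    """Remember the last scan of a folder until the folder changes.

    Hypothetical helper: 'scan' is whatever expensive function the
    webmail software uses to turn a folder path into subject lines.
    """

    def __init__(self, scan):
        self._scan = scan
        self._cache = {}  # path -> (mtime_ns at scan time, scan result)

    def subjects(self, path):
        # One cheap stat() instead of reading every message.
        mtime = os.stat(path).st_mtime_ns
        hit = self._cache.get(path)
        if hit is not None and hit[0] == mtime:
            return hit[1]  # folder unchanged, reuse the old scan
        result = self._scan(path)  # folder changed (or first visit)
        self._cache[path] = (mtime, result)
        return result
```

The spool still sees one stat per request, but the full scan only
happens when something in the folder has actually changed.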
I used to run a Qmail-based infrastructure for 4000 clients on a
single slow machine without much memory. They used POP as their
only pickup mechanism. We recently reimplemented on a Dell 1750
with two Xeon procs, a lot of RAM and a GigE backend to a NetApp
filer with 14 fast disks, and I STILL notice that the machine
sometimes slows down while people try to read their 140MB
mailboxes via webmail. <sigh> I put some bottlenecks on the
"search" and retrieval algorithms of the webmail software to help
protect the filer from a flood of queries, and we've been better
since then. The power users with super-large mailboxes complain
that it's "slow", but now it's a localized problem rather than a
problem that affects everyone.
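For illustration only, here's one way such a bottleneck could look in
Python. It is not the change I made to my webmail software;
ScanThrottle is a made-up name, and the idea is simply to cap how many
mailbox scans may hit the filer at the same time and make everyone
else wait or give up.

```python
import threading


class ScanThrottle:
    """Cap how many mailbox scans may hit the spool at once."""

    def __init__(self, max_concurrent):
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def run(self, scan, *args, timeout=None):
        # Wait for a free slot; timeout=None means wait forever.
        if not self._slots.acquire(timeout=timeout):
            raise TimeoutError("spool busy, try again later")
        try:
            return scan(*args)
        finally:
            # Always give the slot back, even if the scan blew up.
            self._slots.release()
```

With a small max_concurrent, a power user searching a huge mailbox
waits in line instead of starving everybody else, which is the
"localized problem" trade-off described above.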
Jeremy's comments are great for scaling the database, but it sounds
to me like you're just maxed out on what you can serve over NFS.
An SQL select might take at most a few kilobytes of data on the
network whereas a webmail scan of a 30MB mailbox will take, well,
So.... what to do?
Instead of the centralized NFS mail spool (where the central spool
becomes the bottleneck), you might consider splitting the user base
across several machines. Each machine would have its own RAID1
mail spool. Each machine would be responsible for its own
inbound SMTP and POP/IMAP/Webmail and use the local disk for the
spool. Use lots of RAM for "buffer cache" to make sure your disk
is hit less frequently. You might be able to centralize outbound
SMTP. Once a machine "fills up", you add another machine. This is
one way to scale.
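A minimal Python sketch of the bookkeeping this implies: each user
lives on exactly one machine, new users go to the first machine with
room, and nobody moves when a machine is added. SpoolDirectory, the
host names and the users_per_host limit are all made up for the
example; in practice this mapping would probably live in whatever
user database you already have.

```python
class SpoolDirectory:
    """Maps each user to the one machine that holds their mailbox."""

    def __init__(self, users_per_host):
        self.users_per_host = users_per_host  # assumed per-machine limit
        self.hosts = []                       # machines, in the order added
        self.home = {}                        # user -> host name
        self.load = {}                        # host name -> user count

    def add_host(self, name):
        self.hosts.append(name)
        self.load[name] = 0

    def host_for(self, user):
        # Existing users never move; only new users get placed.
        if user in self.home:
            return self.home[user]
        for host in self.hosts:
            if self.load[host] < self.users_per_host:
                self.home[user] = host
                self.load[host] += 1
                return host
        raise RuntimeError("every machine is full; add another one")
```

Inbound SMTP and the POP/IMAP/Webmail front door would both ask
host_for() where a user lives, so all of that user's disk traffic
stays on one machine's local spool.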
The big boys in the mailbox size wars (Google, Yahoo, Hotmail) can't
afford centralized storage for their mailboxes. Look for each to
roll out racks of distributed storage where each "storage server" is
a 1/2 U box with a couple of large ATA disks in it. We might learn from
this method of scaling.
> Before we take this costly step, what have you noticed for user / system
> loads before you start hitting the limits of your hardware?
Yes. I serve 6000 users right now. They used to all be POP, and life
was good. Now a significant percentage of my new customers use webmail,
and I'm not happy with how my current web-based mail reading software
scales. I may have to hack it a lot to get it to perform well.
Something that would help is rolling out spam/virus filtering for
everyone, which would keep 50% of inbound mail (spam) and 10% (viruses)
from being processed/stored/read and reread/reread/reread.
BTW: I separate SMTP processing (/var/qmail local RAID1 fast SCSI with
battery cache) from user mail spool storage (/home/vpopmail NFS
mount to filer). Putting /var/qmail on the NFS server might be
another source of overload.