Re: [vchkpw] Big server

Eric Ziegast Mon, 21 Jul 2003 14:12:12 -0700

> The client mail server would serve whatever combination
> I would like make a big server with qmail +vpopmail +mysql +procmail.
> I think in this structure:
> Server 1: Mx domain + smtp delivery +filters (Antispam, user filter(procmail)
> and antivirus)
> This server basically is the mail gateway of all domains, where is passed in 
> the filters rules per domain and redirect all mails to server 2
> 
> Server 2: Pop3 accounts + mysql server
> Here is created all accounts.
> 
> This schema is good for multiple domains?


Based on my experience, I agree, but I might split Server2 into
    Server2 (delivery/storage/database)
and
    Server3 (pop/imap/webmail servers for clients).

I include more details below for one economical infrastructure I
worked with.  It's not a HOWTO, but knowing what someone else has
done might help guide you instead of figuring it out from scratch.

> Another question is how do i do the message delivery the messages from
> server 1 to server2?

Qmail!  :^)

In /var/qmail/control/smtproutes, set it so that all mail goes
to Server2 (eg: ":server2.mydomain.com").  If you're fancy, you can
try QMTP instead of SMTP.
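
To make the routing rule concrete, here is a rough Python sketch of
how a relay gets picked from smtproutes.  It assumes the lookup order
used by qmail-remote (the exact host, then each dot-suffix, then the
empty wildcard entry).  The names parse_smtproutes and pick_route are
hypothetical and are not part of qmail.

```python
def parse_smtproutes(text):
    """Parse lines of the form 'domain:relay' or 'domain:relay:port'."""
    routes = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are skipped in this sketch
        domain, _, rest = line.partition(":")
        relay, _, port = rest.partition(":")
        routes[domain.lower()] = (relay, int(port) if port else 25)
    return routes


def pick_route(routes, host):
    """Return (relay, port) for host, or None if no entry matches.

    An empty relay string means "no artificial route, use normal MX lookup".
    """
    host = host.lower()
    # Try the full host, then ".example.org"-style suffixes, then "".
    candidates = [host]
    candidates += [host[i:] for i, ch in enumerate(host) if ch == "." and i > 0]
    candidates.append("")
    for key in candidates:
        if key in routes:
            return routes[key]
    return None
```

With the single line ":server2.mydomain.com", every recipient domain
falls through to the wildcard entry, so all mail is handed to Server2.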

--
Eric Ziegast

A sample large server environment (hundreds of domains, thousands of
users) I once helped with:

The MX record points to multiple cheap parallel inbound mail
servers:
 - Single CPU PC at the best Price/Performance cost.
   I've found that one can build these for $300 each.  You will
   find that when doing Virus/Spam scanning that the first
   bottleneck that you hit (out of CPU/memory/disk/network) is CPU.
   All of the regular expression searching on an e-mail message
   takes processing power.  Assuming you have enough RAM, disk I/O
   would be the next bottleneck.  I found a good balance in an AMD 1800+
   motherboard /w 512MB PC133 RAM and 7200RPM IDE.  Another option
   is investing in a very fast multi-processor Intel screamer with
   lots of RAM, but the cheap and disposable servers are linearly
   scalable.
 - RAM depends on how many simultaneous connections you want to
   be allowed for Spam/Virus filtering.  I used 512MB on a cheap
   system because RAM is cheap these days.  I usually ran out of
   CPU before memory.  If the OS uses any significant amount of
   virtual memory, you need more RAM.  Run vmstat.  If "pi" or "po"
   ("si"/"so" on Linux) is above 0, you need more RAM or need to
   lower the number of
   simultaneous connections allowed by qmail (eg: "concurrencyincoming"
   in /var/qmail/control).  The inbound server is your "mail firewall"
   and doesn't have time for paging to disk when the message load
   is high.
 - Hardware or software RAID1 on 7200+ RPM IDE drives is sufficient.
   I have been told by a Linux integrator that Linux software RAID1
   can be faster than the RAID1 provided by hardware controllers.
   If you have a budget for SCSI, use it.  You need merely a
   9GB drive in an inbound relay server anyway because the mail
   doesn't sit on the server.  In fact, you may see a disk I/O
   improvement if you limit /var/qmail/queue to a 2GB partition
   of the hard drive.  If you don't need the space, you don't
   need to have the disk head potentially cross the entire disk
   to find data.  If you select a hardware RAID controller,
   prefer a controller that has non-volatile RAM or RAM /w a battery.
   This will allow the controller to use write-back mode on write
   and significantly reduce response time between the computer and
   the hard drives.
 - While I love OpenBSD and FreeBSD, I've used Linux for Qmail
   services because I've had other Linux-capable staff that
   could help administer the servers.  Another advantage to
   Linux is ReiserFS.  I have used ReiserFS on /var/qmail/queue
   partitions with success after applying the fsync patches.
   (http://www.jedi.claranet.fr/qmail-reiserfs-howto.html)
   ReiserFS performs well with thousands of files in a directory and
   allows you to keep the default hash value (23) for the spool
   directory.  If using ufs (Solaris/BSD), consider compiling a queue
   hash value of some large prime number (like 101).  If using Linux
   without ReiserFS, at least use ext3 instead of ext2 so that you can
   recover after a crash.  If using Solaris, consider VxFS if you
   have the ability to use it.  A standard fsck of a non-journaled
   filesystem used for qmail REALLY sucks.  Aside: I don't export
   ReiserFS over NFS - just use it for the mail relays themselves.
   For vpopmail directories, I use filesystems that are known to
   be tried and tested in heavy read/write environments under NFS.
   I hope ReiserFS gets to this state, but at the time of my
   implementation, it was easier for me to use ext3 for vpopmail
   dirs.
 - I followed instructions for using QmailScanner /w SpamAssassin
   (spamc -f -c) and a Virus checker.  I found QmailScanner to
   be quite inefficient and significantly rewrote it to not
   break up the message into a zillion pieces for its internal
   scanning.  SpamAssassin (spamd) does that for you anyway, and
   so does virus scanning software.  I'd have the qmail-scanner
   programs mark the messages with a "X-Spam-Status" and
   "X-Virus-Status" header and forward the message to my central
   mail server.  Filtering software (your choice is procmail) on
   the central mail server wouldn't have to scan messages, just
   headers to figure out what to do with the message.
 - One well-balanced server was able to handle 40 simultaneous
   connections while filtering each message.  I started with two
   servers.  If the load increased, I could add more.  At $400 each,
   using generic mid-tower boxes, it was easy to justify new boxes.
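
The header-only filtering mentioned above (the central server reads
the X-Spam-Status and X-Virus-Status headers instead of rescanning
the body) might look roughly like this in Python.  This is a sketch
under assumptions: the spam header is taken to carry a "hits=" or
"score=" value as SpamAssassin writes it, and an X-Virus-Status value
starting with "infected" is a made-up convention, since that header's
format is whatever your scanner wrapper emits.  The function name
classify is hypothetical.

```python
import re
from email.parser import BytesHeaderParser

# SpamAssassin has used both "hits=" and "score=" in X-Spam-Status.
_SCORE = re.compile(r"(?:hits|score)=(-?\d+(?:\.\d+)?)")


def classify(raw_message, spam_threshold=5.0):
    """Return 'virus', 'spam' or 'clean' by looking at headers only."""
    headers = BytesHeaderParser().parsebytes(raw_message)
    # Assumed convention: the relay writes "infected ..." on a positive hit.
    virus = str(headers.get("X-Virus-Status") or "").strip().lower()
    if virus.startswith("infected"):
        return "virus"
    match = _SCORE.search(str(headers.get("X-Spam-Status") or ""))
    if match and float(match.group(1)) >= spam_threshold:
        return "spam"
    return "clean"
```

The body is never inspected, so the cost per message on the central
server stays small no matter how large the message is.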

The central mail storage server:
 - The vpopmail directories and mysql server should be served
   from a machine with a reliable motherboard and reliable storage.
   A fast CPU and lots of memory aren't necessary.  512MB RAM
   is still sufficient, and a 1GHz processor was still plenty.  If
   you have access to a Network Appliance back-end, I recommend it
   highly, but if you're like me, you don't have much of a budget.
   I would use fast SCSI 73GB drives (10000+ RPM) in a RAID1
   configuration.  I would have a hot-spare drive.  I would have
   NVRAM on the RAID controller if possible.  If using Linux, I'd
   use ext3 if NFS were required (ReiserFS if this is also my POP/IMAP
   server).  If I need more storage, I'd add more RAID1 pairs to
   the SCSI chain.  Your main bottleneck is likely to be disk I/O.
   Use iostat to see if read/write requests need to wait.  If
 - In one setup, I put mysql and final qmail delivery into a
   large spool on one server.  I then had POP3 clients and Web
   clients access the spool via NFS.  If I needed more I/O
   between clients and their mail, I could add more cheap client
   servers.  I used hardware RAID0+1 SCSI storage on the main
   Vpopmail home directory with ext3 (not ReiserFS because NFS
   reliability for ReiserFS had not yet been proven).  I used
   73GB partitions.  If you have a good budget, consider using
   Network Appliance filers for your back end storage.  They're
   fast and reliable and are amazingly good at random disk I/O
   when you use many spindles in a RAID group.
 - If something catastrophic ever happens to your qmail/vpopmail
   storage server, a user might live with an outage for a couple
   hours, but not for a full day.  I would have a backup IDE disk
   that would be ready to serve in the event that the primary
   storage is unavailable.  I would do rsyncs between the primary
   storage and the backup drive as often as possible (at least
   once a day).  The directory structure with a lot of e-mail
   accounts would take a lot of time to recreate.  Having it
   available on a disk nearby helps.  I haven't had to use a
   backup disk, but I sleep better knowing it's there.  I know
   the backup disk would be slower than primary storage and
   add a bottleneck, but it's better than being down.
 - If you're going to use procmail or other filtering software,
   consider doing what you can to minimize the impact of the
   filtering on the storage server.
   - One method is having all filtering done on the inbound
     servers and perform final mailbox delivery from the filtering
     servers into an NFS vpopmail spool.  Synchronization and
     idempotency of NFS writes shouldn't be an issue for qmail,
     but it could seriously lock up inbound delivery if NFS ever
     fails.  ("mount -o rw,soft,intr,noatime ...")
   - Another method is having filtering and final delivery done
     on the mail storage server itself.  There's less network
     traffic with this method compared to NFS, and qmail's queueing
     allows the inbound servers to handle an outage on the main mail
      server gracefully (sessions tempfail and queue up), but now
     there's significant processing being done on the central mail
     server.  The central server could become a processing bottleneck,
     and it's possible that a high enough load for procmail filtering
     could make it less responsive to clients (POP, etc).  You never
     want to overwhelm your file server doing non-file-server tasks.
     If you run procmail on the central storage server, at least use
     "nice -n 10 COMMAND".  I'm also wary of the security implications
     of running procmail on the same server central to mail processing
     for everyone.  What if someone introduces a program as part of
     the procmail filters that has a security bug or sucks up all
     available CPU on the server?  I generally shy away from this
     scenario.
   - At one site, I implemented a batch processing system.  Mail
     would be "delivered" by the vdelivermail program into a user's
     mail spool on the central storage server.  The user would not
     be able to run any filters on inbound mail.  By "delivered",
     I mean that the message would exist in ~user/Maildir/tmp/MESSAGE#.
     I would have vdelivermail stop before moving the message into
     the "new" subdirectory and instead append the message delivery
     information to a file that got rolled every minute.  After a
     periodic delay of at least one minute, another program on the
     storage server would look at the list of new messages and serially
     process them through user-controlled filters (a jailed perl script).
     The customers are given some knobs for tuning (spam score threshold,
     sender address, recipient address, content search) and the filtering
     program would apply them to each batched message for final delivery.
     In final delivery, the message is moved into "~user/Maildir/new"
     or "~user/Maildir/junk" or some user-specified CourierIMAP folder.
     The MySQL database contains a list of user preferences for their
     mail filtering with this program.  The message is optionally
     tagged by the filter with "spam" in case a POP user wants to use
     that subject tag for filtering in their Outlook mail client.
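
As a rough illustration of the batch step in the last item, here is a
minimal Python sketch (the real thing was a jailed perl script, which
this does not reproduce).  Assumptions, all hypothetical: the rolled
batch file holds one "maildir<TAB>filename" record per line, user
preferences arrive as a dict rather than from MySQL, and score_of is a
caller-supplied function that returns a spam score for a file.

```python
import os


def deliver_batch(batch_lines, prefs, score_of):
    """Move queued messages from Maildir/tmp into 'new' or 'junk'.

    batch_lines: iterable of 'maildir<TAB>filename' records (assumed format)
    prefs:       dict of maildir -> {"spam_threshold": float}
    score_of:    callable taking a file path and returning a float score
    """
    delivered = []
    for line in batch_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        maildir, name = line.split("\t", 1)
        src = os.path.join(maildir, "tmp", name)
        if not os.path.exists(src):
            continue  # already processed or removed; skip quietly
        threshold = prefs.get(maildir, {}).get("spam_threshold", 5.0)
        folder = "junk" if score_of(src) >= threshold else "new"
        dest_dir = os.path.join(maildir, folder)
        os.makedirs(dest_dir, exist_ok=True)
        dest = os.path.join(dest_dir, name)
        # rename() within one filesystem is atomic, so a reader never
        # sees a half-delivered message.
        os.rename(src, dest)
        delivered.append(dest)
    return delivered
```

Because messages are handled serially from a list, the storage server
sees a steady, bounded filtering load instead of one filter process
per inbound delivery.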

The web mail and POP and IMAP servers are simply scalable via NFS:
  - Platform: up to 1GHz, 1GB RAM, single-disk PCs ($200) or more
    PCs similar to the inbound relays.
  - User vpopmail directories are seen via an NFS mount of the central
    mail storage server.
  - Install user-based web services (eg: pop3d, imapd, apache, sqwebmail),
    and SSL wrappers.  Note: The number of simultaneous sessions depends
    most upon available RAM.  More sessions -> more RAM.
  - This server could also be the outbound SMTP server for users.
    One could take the POP logs and use them for POP-before-SMTP
    authentication, or the qmail-smtpd service could be configured
    for SMTP-auth with queries against the central MySQL server on
    the mail storage server.  With multiple servers, SMTP-auth
    works better, but it is possible to create a summary of POP
    traffic on all servers to build a central POP-before-SMTP
    database.

Scaling:
- The database and storage server must be reliable.  This
  server is not linearly scalable, but if properly designed,
  one can process a significant amount of storage before
  needing to upgrade.  To scale this out to multiple servers,
  one needs a way of hashing/routing inbound mail to the correct
  storage server - think of how "vdominfo -d" or "vuserinfo -d"
  or /var/qmail/users/assign are used by vdelivermail/qmail and
  how one might be able to use symlinks to point users into
  different NFS mounts.  Adding a user or domain becomes more
  complicated if one can't use a single directory tree for
  their mail storage.
- The mail filtering servers must have good CPU, but don't
  need to be reliable.  They are linearly scalable.
- The client access processors don't need to be reliable if you
  have some method for failover (IP address takeover, LVS) or
  load balancing (Alteon, Foundry, BigIP, etc).  They are
  linearly scalable.  Outbound smtp works better with a "dnscache"
  DNS resolver (http://cr.yp.to/djbdns) running on each server than
  with a shared nameserver elsewhere on the network.
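
One way to picture the hashing/routing idea for multiple storage
servers is a stable hash from domain name to NFS mount.  The Python
below is only a sketch: storage_mount_for and the mount paths in the
usage note are hypothetical, and nothing here is how vpopmail itself
locates a domain.

```python
import hashlib


def storage_mount_for(domain, mounts):
    """Pick one mount point for a domain using a stable hash.

    A cryptographic digest is used instead of Python's built-in hash()
    because the built-in is randomized per process and would send the
    same domain to different mounts on different runs.
    """
    if not mounts:
        raise ValueError("at least one mount is required")
    digest = hashlib.sha1(domain.lower().encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(mounts)
    return mounts[index]
```

For example, storage_mount_for("example.com", ["/mail/store1",
"/mail/store2"]) always returns the same one of the two paths.  Note
that adding a mount changes the modulus and remaps most domains, so
recording the chosen mount per domain (for instance in the existing
MySQL database) is likely safer than recomputing it on every delivery.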

This is an example that can service 20000+ users well.
One could decide to scale the single database/storage server
with some sort of server redundancy, or create clusters of
cheaper storage/database servers paired with relay/access
servers.  One would need a method, though, of managing accounts
across multiple clusters.

Scaling to 1000000 users or more would involve commercial
storage systems (eg: Celerra or NetApp), load balancers
(eg: Alteon), hashing users or domains into multiple
clusters of access servers and storage/database managers,
and custom scripting to manage the creation and removal
of domains and users.  One might also start looking at 
the cost/benefit of commercial software packages when
dealing with large user bases.


"Life With Qmail" is your friend: http://LifewithQmail.org .

"Qmail-Scanner" is a good beginning Virus/Spam integration tool
for Qmail - http://qmail-scanner.sourceforge.net/ .
