hi,
Due to a massive network failure on our campus backbone, our dept's mail
system got quite _messy_. One of the problems was that the mail server was
flooded with incoming messages that kept failing because of the
intermittent network.
We have 2 qmail servers (on 2 different subnets), each being a backup for
the other (one is a DU4.0d system and the other a FreeBSD system). The DU
system started spawning a lot of qmail-smtpd and qmail-queue processes, to
the point where the process table was maxed out, which almost brought the
system down. Fortunately, I managed to kill off the smtpd/queue processes
and turn off mail reception until the network was up again. I suppose, in
retrospect, I should have ulimited the qmail daemons? Or was there
something else I could have done?
The FreeBSD system turned out to be o.k. (though I couldn't telnet in to
turn off the mail, or check on it at all), since it queued up the messages
meant for the other server and sent them there once that server came back
online. Any ideas as to why the FreeBSD server survived while the DU box
nearly went down?
Poking around the queue dirs on the DU box, I saw that 99% of the queued
messages were zero length, while a handful had complete messages and
others only partial content. After saving the complete messages, I deleted
the entire queue (it was useless anyway), or rather, deleted the entire
/var/qmail and re-installed. Question is, did I actually lose any mail?
What does qmail do when it is unable to complete sending a message to
another qmail server (from the FreeBSD box to the DU box in this case)? I
noticed that some of my personal mail had duplicate copies...are these the
ones from the deleted queue?
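In case anyone wants to repeat the zero-length check on their own queue,
here is roughly how it could be done. This is only a minimal sketch: the
function name split_by_size is my own placeholder, not anything qmail
provides, and it only separates empty files from non-empty ones (it cannot
tell a complete message from a partial one).

```python
import os

def split_by_size(queue_dir):
    """Walk queue_dir and return (empty, nonempty) lists of file paths."""
    empty, nonempty = [], []
    for root, _dirs, files in os.walk(queue_dir):
        for name in files:
            path = os.path.join(root, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # file disappeared while we were scanning
            # Zero-length files go in one list, everything else in the other.
            (empty if size == 0 else nonempty).append(path)
    return sorted(empty), sorted(nonempty)
```

One would call it on whatever directory holds the queued message files
(somewhere under /var/qmail on my boxes) and compare the lengths of the two
lists before deciding what is worth saving.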
thanks,
--shing