Short summary: I screwed something up and bounced a lot of mail. It
seems to me that the mistake I made could be handled differently, and I'd
like to see if that makes sense to anyone else. I'm not blaming qmail or
saying "qmail should definitely do this." I'm just exploring an idea.
I have 2 qmail mail relays. Currently, they forward all mail to a Xerox
mail relay, which then relays it through the Xerox firewall to the
(ex-)Xerox company where I work. We are migrating to our own network, and
yesterday we installed the firewall and planned to go live. (We didn't go
live because of other problems that cropped up.)
As part of the attempted switch, I took down the internal mail server to
transfer its files to the new mail server (the old one will remain up as a
Xerox host for a while). Since I expected the new mail server would be
accepting mail by the end of the day, I stopped the mail relays from passing
mail on to the Xerox relay. I did this by configuring smtproutes to route to
an (unreachable) internal network address.
We spent the entire day setting up the firewall and running tests on a
small test network. Several tests failed, so at 4:30 we gave up on the plan
to switch users over and re-enabled the old (Xerox) internal mail server.
Then I reconfigured the external mail relays to relay through Xerox again.
Unfortunately, after a long day of intensive work with 5 subnets and 2
domain names, I messed up and reset the smtproutes file on the main relay to
"mailer-east.scansoft.com" instead of "mailer-east.xerox.com," the Xerox
relay. Of course, "mailer-east.scansoft.com" doesn't exist. Qmail looked
it up in DNS, found it didn't exist, and bounced the 300 or so messages in
its queue back to their senders. I didn't think that was a lot of mail, but
the VP of Sales sure did ;>.
Now, it seems to me that a case where smtproutes - an internal control
file set by the mail administrator - is wrong like that might be treated
differently. Perhaps rather than bouncing the (innocent ;>) mail messages,
qmail could leave them queued and send mail to the postmaster. Of course,
if the postmaster's mail is relayed through that smtproute, he wouldn't get
it, but presumably he'd notice sooner or later that a) he wasn't getting
mail from the system he just modified and b) the disks on it were filling
up. Again presumably, he'd check the logs and see the error messages, which
would clue him in to his internal mistake and let him fix it without losing
mail.
Obviously, qmail requires almost everything to be kosher DNS-wise for
security and spam reasons. But it seems to me an invalid smtproute is
pretty clearly an administrator error as opposed to an attempt to spoof,
overload, enter, or otherwise attack the server.
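To make the proposal concrete, here is a rough sketch in Python of the
decision I have in mind. This is not qmail's code; the names Lookup and
delivery_action are made up for illustration. The only point is that a hard
DNS failure on a name that came from a control file would be deferred
instead of bounced.

```python
from enum import Enum


class Lookup(Enum):
    """Hypothetical outcomes of resolving the host we are about to contact."""
    OK = "ok"
    SOFT_FAIL = "soft"  # temporary DNS trouble, e.g. a timeout
    HARD_FAIL = "hard"  # the name does not exist


def delivery_action(lookup, host_from_smtproutes):
    """Return 'deliver', 'defer' (keep the message queued) or 'bounce'."""
    if lookup is Lookup.OK:
        return "deliver"
    if lookup is Lookup.SOFT_FAIL:
        return "defer"
    # Hard failure. If the name came from smtproutes, the administrator
    # typed it, so keep the mail queued rather than blaming the sender.
    return "defer" if host_from_smtproutes else "bounce"
```

With this rule, my typo would have left the 300 messages sitting in the
queue until I fixed the file.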
So, what do you think of the following ideas?
1) qmail could treat unresolvable hosts in its control files as operator
errors and leave affected mail in the queue rather than bouncing, and also
try to notify the operator.
2) Perhaps changes to control files could somehow require something like
qmail-lint that checks stuff like this? (I note qmail-lint doesn't check
smtproutes.) The key would be requiring a check before changes take
effect. The difficulty, it seems to me, is that some changes (like
smtproutes) take effect immediately, and that limits checkability. Or
maybe I'm misunderstanding...
3) get a smarter and more careful sysadmin. For the obvious reasons, I
heartily disapprove of this option.
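For idea 2, the kind of check I mean could be as small as the Python
sketch below. It is not part of qmail; lint_smtproutes and resolves are
names I made up, and it assumes each smtproutes line has the form
domain:relay with an optional :port, where an empty relay means "use the
normal MX lookup." It only handles relays given as plain hostnames or IP
addresses, and it treats any lookup error, even a temporary one, as a
problem.

```python
import socket


def resolves(host):
    """Default check: True if the system resolver finds an address for host."""
    try:
        socket.getaddrinfo(host, None)
        return True
    except socket.gaierror:
        return False


def lint_smtproutes(text, resolver=resolves):
    """Return (line_number, relay) for every relay the resolver cannot find."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        fields = line.split(":")
        relay = fields[1] if len(fields) > 1 else ""
        if not relay:
            continue  # empty relay: nothing to resolve
        if not resolver(relay):
            problems.append((number, relay))
    return problems
```

The idea would be to edit a copy of the file, run the check on the copy,
and only move it into place if the list of problems comes back empty.
Passing in a different resolver makes it possible to try the check without
touching DNS.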
Any thoughts on all this?
--
gowen -- Greg Owen -- [EMAIL PROTECTED] -- [EMAIL PROTECTED]
Please note my new [EMAIL PROTECTED] address which will
become my default address in March, and which works now.