We've got a pool of servers running Postfix.  Each server runs BIND to
cache DNS queries.  We are running into an issue where DNS queries
intermittently fail (the cause is beyond the scope of this discussion).  When
several lookups for a destination fail consecutively, Postfix starts queueing
ALL mail for that destination for exactly 5 minutes.

For example, BIND, with query logging turned on, shows several entries like this:

Oct 19 11:53:12 hkglppfpool4 named[206415]: client @0x7f32b806b440 
127.0.0.1#53827 (cluster9out.us.messagelabs.com): query failed (SERVFAIL) for 
cluster9out.us.messagelabs.com/IN/A at ../../../bin/named/query.c:8580

At the same time Postfix logs:
Oct 19 11:53:12 hkglppfpool4 postfix/smtp[131030]: 4MspyQ3Fm6z511Sx: 
to=<tengyilian1428...@126.com>, relay=none, delay=10, delays=0.14/0/10/0, 
dsn=4.4.3, status=deferred (Host or domain name not found. Name service error 
for name=cluster9out.us.messagelabs.com type=A: Host not found, try again)

When this happens Postfix starts deferring ALL mail that should be delivered to
cluster9out.us.messagelabs.com for exactly 300 seconds.  The named query logs
show no queries for this hostname during those 5 minutes, so Postfix is not even
attempting the lookup.  After the 5 minutes are up, new messages routed
to cluster9out.us.messagelabs.com are delivered without being deferred and the
queued messages begin to go out.
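
For anyone who wants to check their own logs for the same quiet period, here
is a rough Python sketch of how the gap in named's log can be measured.  It
assumes classic syslog timestamps ("Oct 19 11:53:12") with one entry per line
(the excerpts above are wrapped by the mail client).  The function name
find_query_gaps, the min_gap_seconds threshold and the year argument are
illustrative choices, not anything Postfix or BIND provides.

```python
from datetime import datetime


def find_query_gaps(lines, hostname, min_gap_seconds=60, year=2022):
    """Return (start, end, seconds) for each quiet period in named's log.

    A quiet period is a stretch of at least min_gap_seconds between two
    consecutive named entries that mention hostname.
    """
    stamps = []
    for line in lines:
        # Only consider named entries that mention the hostname.
        if " named[" not in line or hostname not in line:
            continue
        # Classic syslog timestamps fill the first 15 characters and carry
        # no year, so the caller supplies one (an assumption of this sketch).
        stamp = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        stamps.append(stamp)

    stamps.sort()
    gaps = []
    for earlier, later in zip(stamps, stamps[1:]):
        seconds = (later - earlier).total_seconds()
        if seconds >= min_gap_seconds:
            gaps.append((earlier, later, seconds))
    return gaps


# Example use; the log path is hypothetical and depends on the syslog setup:
#
#   with open("/var/log/messages") as log:
#       for start, end, seconds in find_query_gaps(
#               log, "cluster9out.us.messagelabs.com"):
#           print(start, end, seconds)
```

A gap of about 300 seconds that begins at a SERVFAIL entry would match the
behaviour described above.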

Testing shows that the DNS issue is very short term, lasting for 1 second or
so.  However, the pool of servers can handle a large number of messages in a
short time period, so this combination of events amplifies a DNS failure of
about a second into 5 minutes of queueing.  We've seen the queues get
up over 1000 messages before the 5 minutes are up.  The above is just one
example.  We're seeing these delivery delays for several different hosts.

The correct solution is to fix the underlying DNS issue.  However, until then
we'd like to mitigate the consequences.  Are there configuration options that
will:
a) adjust the number of DNS failures before Postfix starts deferring
messages?
b) adjust how long Postfix keeps deferring messages before it tries the
destination again?

Thanks,

Eric Wilkison


