Howdy! I'm trying to work out some tuning for a cluster of Postfix servers 
behind a load balancer.  The load balancer does a simple round-robin across 
four nodes with direct TCP passthrough and does not mangle the traffic in any 
way.  We are currently running the RHEL/CentOS 7 packaged Postfix 2.10.

This cluster receives mail from an external Proofpoint cloud service, processes 
it and passes it on to final delivery.

We use an external cloud service for emergency notifications and are testing a 
new process to send out notification email - on the order of 81,000+ addresses 
in just a few minutes.  What we ran into is that Proofpoint seemed to connect 
to just one of the four nodes for about 3 minutes, sending about 60 batches of 
addresses through a single smtpd process.  Somewhere in that window, the node 
started telling Proofpoint to back off, and around 70,000 messages from the 
overall batch were deferred; only around 7,000 messages were submitted and 
delivered initially.  Throughout the day, the deferred messages continued to 
deliver, about 10-20 at a time, until at some point overnight all of them were 
finally delivered.

Needless to say, our emergency management people are unhappy with these 
results.  I've been looking at tuning options in Postfix to try to accomplish 
two things:
1) force Proofpoint to disconnect and reconnect, in the hope of spreading the 
load over all four nodes, and
2) allow Postfix to accept more messages in a short time when these burst 
periods hit (once a month for testing, and as needed throughout the year).

The setting I'm mostly looking at is smtpd_client_connection_rate_limit, which 
is currently at its default of 0.  For reasons unclear, 
smtpd_client_connection_count_limit was raised from its default of 50 to 1000 
several years ago, but default_process_limit was not increased; we probably 
need to tune those a bit.
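
For reference, the relevant settings as described above amount to roughly the 
following.  This is a sketch reconstructed from the description, not a paste of 
the actual main.cf; the value of 100 for default_process_limit is the stock 
Postfix default and is assumed here because that parameter was never changed.

    # main.cf (sketch of the current state, not the actual file)
    # 0 = no per-client connection rate limit (the default)
    smtpd_client_connection_rate_limit = 0
    # raised from the default of 50 several years ago
    smtpd_client_connection_count_limit = 1000
    # not set explicitly, so the stock default applies (assumed: 100)
    #default_process_limit = 100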

Are there any recommendations that could improve our throughput during these 
message bursts? Normal day-to-day traffic flows without issue with our 
current configuration.  I'd appreciate any advice. Note that I will likely not 
be able to test changes with a large mailing until the end of March, at the 
next scheduled emergency notification test window.

Thanks,
RobertC
