On Wed, May 15, 2013 at 12:22:41PM +0200, Patrik Rak wrote:

> > There is no "them". This is not an analysis.
>
> The "bad guys" was a metaphor for slow mail which consumes available
> resources.
The metaphor is flawed; you need a model that considers the message rates
entering the active queue from both incoming and deferred mail. A queue
whose output rate exceeds its input rate is nearly empty.

If the blocked-process limit is 500, the minimal backoff time is the
default 300s, and blocked deliveries complete in <= 300s, you'd need to
bring in 600 or more messages from the deferred queue in a single scan.
If the maximal backoff time is raised from the default 4,000s to just
19,200s (5.33 hours), then the odds of any single deferred message being
eligible during a given 5-minute window are 1:64. Thus you'd need
O(38,400) of the bad messages in the deferred queue to see starvation of
new mail. If the queue lifetime is 5 days, that's ~8,000 arriving each
day; if it is 2 days, that's ~20,000 each day. I'd say a site in that
situation should focus on not admitting that much junk.

Otherwise, given a sensibly low load of bad addresses, improved tuning,
plus dynamic capacity to soak up concurrency demand spikes, the scenario
you're imagining just does not happen.

I faced this very problem in 2001, and:

- Switched to nqmgr, since without "relay" it does a better job of
  fairness between inbound and outbound mail, being FIFO rather than
  round-robin by destination.

- In fact increased the maximal backoff time, and reduced the minimum.

- Increased the smtp delivery agent process limit.

- Enabled recipient validation, which was the real solution.

Some of my observations back then were part of the motivation for
Postfix to evolve:

- The default process limit has been raised from 50 to 100.

- The minimum backoff time is smaller by default.

- The default qmgr became "nqmgr"; FIFO is good, and making it work
  with mega-recipient messages is a win.

  [ We still don't have a way to deal with senders who flood the queue
  with lots of single-recipient messages. There was a thread about that
  recently.
  If you want to make the queue manager even more flexible, you could
  support mechanisms to group related messages into a logical scheduling
  unit. ]

- The relay transport is standard.

- Recipient validation is on by default for local recipients.

> Without any offense, maybe you should reread all what was already
> written and put it all more thought. Then you might realize why your
> after-the-fact-testing solution is flawed, and why your
> boost-the-concurrency solution works but is a needless waste. Wietse
> explained the former pretty clearly, IMHO, and I tried my best about
> the latter.

And yet it moves. It is a fallacy to claim that increasing the output
capacity of a queue does not reduce congestion. Any queue is congested
under high enough load; there might simply be too much mail, slow or
fast.

Your magical 60:1 model ignores the "supply" of deferred mail; it is
generally equal to the supply of new mail, and should be tuned to
generate retries at a rate below the output capacity. I'm providing that
tuning.

I explained a safe way to dynamically raise concurrency when the load
creates pressure via latency and we can afford more "threads", because
they are mostly all sleeping, provided the machine is not tuned with
exceedingly tight RAM limits. An extra smtp(8) process is quite cheap:
just a bit of space for the stack and a small heap, plus the process
structure. Look at the process map and see how much of the address space
is private.

This is a quantitative issue; do the numbers.

-- 
	Viktor.
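[ For concreteness, the retry-pacing and concurrency tuning described
above might look as follows. The parameter names are real Postfix
settings, but the values are the example figures from this message, not
recommended defaults: ]

```
# main.cf -- retry pacing (example values from this message)
minimal_backoff_time = 300s
maximal_backoff_time = 19200s

# master.cf -- raise the smtp delivery agent process limit (maxproc column)
# service type  private unpriv chroot wakeup maxproc command
smtp      unix  -       -      n      -      500     smtp
```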
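[ The back-of-the-envelope starvation numbers above can be checked with a
short script. This is only a sketch of the arithmetic in this message,
using its figures (500-process limit, 300s minimal and 19,200s maximal
backoff, 600 eligible messages per scan); it is not a Postfix API, and it
prints the exact quotients where the text rounds to ~8,000 and ~20,000: ]

```python
# Back-of-the-envelope model of deferred-queue starvation, using the
# figures from this message (not Postfix defaults except where noted).

MIN_BACKOFF = 300       # minimal_backoff_time, seconds (Postfix default)
MAX_BACKOFF = 19_200    # maximal_backoff_time, raised from the 4,000s default
PROC_LIMIT = 500        # smtp delivery agent process limit

# A message parked at the maximal backoff retries once per MAX_BACKOFF
# seconds, so its odds of being eligible in any given 5-minute scan
# window are MIN_BACKOFF / MAX_BACKOFF = 1/64.
odds = MIN_BACKOFF / MAX_BACKOFF

# With blocked deliveries finishing in <= 300s, the 500 processes turn
# over at least once per window, so starvation needs more eligible
# deferred messages than that per scan -- say 600.
needed_per_scan = 600
deferred_size = needed_per_scan / odds   # 38,400 messages

# Steady state: an N-day queue lifetime means the deferred queue holds
# roughly N days' worth of junk, giving the daily junk arrival rate.
for lifetime_days in (5, 2):
    per_day = deferred_size / lifetime_days
    print(f"{lifetime_days}-day lifetime -> {per_day:,.0f} bad messages/day")
```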