On Mon, May 13, 2013 at 05:25:44PM +0200, Patrik Rak wrote:

> On 13.5.2013 12:55, Wietse Venema wrote:
> >Viktor Dukhovni:
> >>The reasonable response to latency spikes is creating concurrency
> >>spikes.
> >
> >By design, Postfix MUST be able to run in a fixed resource budget.
> >Your on-demand concurrency spikes break this principle and will
> >result in unexpected resource exhaustion.
>
> I'd second Wietse on this one.
And yet you're missing the point.

> If you throw in more resources for everyone, the bad guys are gonna
> claim it sooner or later. You have to make sure you give it only to
> the good guys, which is the same as giving less to the bad guys in
> the first place. No need to throw in yet more additional resources
> on demand.

We don't know who the "good guys" are and who the "bad guys" are.

- A deferred message may simply be greylisted and may deserve timely
  delivery on its 2nd or 3rd (if the second was a bit too early)
  delivery attempt.

- A small burst of fresh messages may be a pile of poop destined to
  dead domains, and may immediately clog the queue for 30-300 seconds.

> And that's also why it is important to classify ahead of time, as
> once you give something away, it's hard to take it back.

There is no "giving away". To maintain throughput, high-latency tasks
warrant higher concurrency, and such concurrency is cheap, since the
delivery agents spend most of their time just sitting there waiting.
By *moving* the process count from the fast column to the slow column
in real time (based on actual delivery latency, not some heuristic
prediction), we free up precious slots for fast deliveries, which are
fewer in number. Nothing I'm proposing creates less opportunity for
delivery of new mail; rather, I'm proposing dynamic (up to a limit)
higher concurrency that soaks up a bounded amount of high-latency
traffic (ideally all of it, most of the time).

To better understand the factors that impact the design, we need to
distinguish between burst pressure and steady-state pressure.

When a burst of bad new mail arrives, your proposal takes it through
the fast path, which gets congested "once" (by each message anyway,
but if the burst is large enough, the effect can last quite some
time). If the mail is simply slow to deliver, but actually leaves the
queue, that's all. Otherwise the burst gets deferred, and now takes
the slow path, which does not further congest delivery of new mail,
but presumably makes multiple trips through the deferred queue,
causing congestion there each time, amplified if you allocate fewer
processes to the slow path than to the fast path (I would strongly
discourage that idea). In any case the fast/slow path fails to
completely deal with bursts.

So let's consider steady state. Suppose bad mail trickles in as a
fraction "0 < b < 1" of the total new mail stream, at a rate that
does not by itself congest the fast-path processes just from new
mail. What happens after that? Well, in steady state, each initially
deferred message (which we, as the worst case, assume continues to
tempfail until it expires) gets retried N times, where N grows with
the maximum queue lifetime and shrinks with the maximal backoff time
(details later). Therefore, the rate at which bad messages enter the
active queue from the deferred queue is approximately
N * b * new_mail_input_rate.

When is that a problem? When N * b >> 1. Because now a small trickle
of bad new mail becomes a steady stream of retried bad mail whose
volume is "N * b" times higher.

So what can we do to reduce the impact? I am proposing raising
concurrency for just the bad mail, without subtracting concurrency
for the good mail, thereby avoiding collateral damage to innocent
bystanders (greylisted mail, for example). This also deals with the
initial burst (provided the higher concurrency for slow mail is high
enough to absorb the most common bursts and low enough to not run out
of RAM or kernel resources). This does no harm! It can only help.
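To put a rough number on the "N * b >> 1" condition: with the default
timers (which, as computed below, work out to N ~ 108) and a purely
illustrative bad-mail fraction of 2% (b = 1/50):

    $ echo "1k 86400 5 * 4000 / 50 / p" | dc   # N * b with assumed b = 1/50
    2.1

With N * b > 2, the retried bad mail alone re-enters the active queue
at roughly twice the rate of the entire fresh mail stream, which is
precisely the concentration effect at issue.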
You're proposing a separate transport for previously deferred mail.
This can help, but can also hurt if the concurrency for the slow path
is lower than for the fast path; otherwise it is just a variant of my
proposal, in which we guess who's good and who's bad in advance, and
avoid spillover from the bad processes into the good when the bad
limit is reached. In both cases total output concurrency should rise.
Each performs better in some cases and worse in others.

The two are composable: we could have a dedicated transport for
previously deferred mail with a separate process limit for slow
vs. fast mail if we really wanted to get fancy. We could even throw
in Wietse's prescreen for DNS prefetching, making a further dent in
latency. All three would be a lot of work, of course.

So what have we not looked at yet? We've not considered trying to
reduce "N * b", which amounts to reducing "N", since "b" is outside
our control to some degree (though if you can accept less junk,
that's by far the best place to solve the problem, e.g. validate the
destination domain interactively while the user is submitting the
form).

So what controls "N"? With exponential backoff we rapidly reach the
maximum backoff time in a small number of retries, especially because
this backoff time is actually a lower bound on the spacing between
delivery attempts; actual deliveries are typically spaced wider, so
the spacing grows faster than a simple power of two. Therefore, to a
good approximation, we can assume that the retry count for
steady-state bad mail is queue_lifetime / maximal_backoff.

Let's plug in the defaults:

    $ echo "1k 86400 5 * 4000 / p" | dc
    108.0

That's ~100 retries in 5 days. This concentrates bad mail when the
bad mail is > 1% of the total.

Suppose a site with unavoidable garbage entering the queue, whose
users are happier to find out sooner that their mail did not reach
its recipient rather than perhaps waiting a full 5 days, adjusts the
queue lifetime down to 2 days (I did that at Morgan Stanley, where
this worked well for the user community, RFCs to the contrary
notwithstanding). Then we get:

    $ echo "1k 86400 2 * 4000 / p" | dc
    43.2

Now N drops to ~40, which could make the difference between deferred
mail concentrating latency spikes and diluting them (at the ~2.5% bad
mail mark).

What else can we do? Clearly raise the maximal backoff time. How does
that help? Consider raising the maximal backoff time from 4000s to
14400s (4 hours). Now we get:

    $ echo "1k 86400 2 * 14400 / p" | dc
    12.0

Now N is ~12, and we've won almost a factor of 10 from the default
settings. Unless the bad mail is ~8% of the total input there is no
concentration, and we don't need to discriminate against deferred
mail.

Is it reasonable to push the max backoff time this high? I think so:
by the time we've tried 5m (default first retry), 10m, 20m, 40m, 80m
(20% higher than the current ceiling of 4000s), the message has been
in the queue for 155 minutes (roughly 2.5 hours) and has been tried 6
times. The next retry would normally be about 66 minutes later, but
I'd delay it to 160 minutes, so such a message would leave (if that
is its fate, however unlikely) after 6 hours instead of 5. Is that
sufficiently better? Otherwise, with the message already 6 hours
late, do we have to try every hour or so? Or is every 4 hours enough?
I think it is.

So the simplest improvement we can make is to just tune the backoff
and queue lifetime timers.
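For what it's worth, both knobs are ordinary main.cf parameters, so
this tuning needs no new code. A minimal sketch with the illustrative
values from the arithmetic above (2 days, 4 hours), not a
recommendation for every site; I've thrown in bounce_queue_lifetime
on the assumption that bounces should follow the same schedule:

    $ postconf -e "maximal_queue_lifetime = 2d"    # illustrative; default is 5d
    $ postconf -e "bounce_queue_lifetime = 2d"     # illustrative; default is 5d
    $ postconf -e "maximal_backoff_time = 14400s"  # illustrative; default is 4000s
    $ postfix reload

The minimal_backoff_time stays at its 300s default, which is the 5m
first retry mentioned above.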
If we then add process slots for blocked messages (another factor of
5 in many cases), we are looking at raw sewage (~40% bad) entering
the queue before the deferred queue is any different from fresh mail.

Since we've managed 12 years with few complaints about this issue, I
think that the timer adjustment is the easiest first step. Users can
tune their timers today with no new code.

-- 
Viktor.