On Tue, May 14, 2013 at 08:24:16AM -0400, Wietse Venema wrote:

> Viktor Dukhovni:
> > Nothing I'm proposing creates less opportunity for delivery of new
> > mail, rather I'm proposing dynamic (up to a limit) higher concurrency
> > that soaks up a bounded amount of high latency traffic (ideally
> > all of it most of the time).
>
> This is no better than having a static process limit at that larger
> maximum. Your on-demand additional process slots cannot prevent
> slow mail from using up all delivery agents.
The difference is that the static larger maximum does not prevent a
thundering herd of fast deliveries from using the high limit to thrash
the network link and process scheduler.

> To prevent slow mail from using up all delivery agents, one needs
> to limit the amount of slow mail in the active queue. Once a message
> is in the active queue the queue manager has no choice. It has to
> be delivered ASAP.

My goal was not to prevent congestion under all conditions; that is
simply not possible. Once some heuristically identified mail is
substantially delayed, we've already lost, since the proposed
heuristics are rather crude.

I am proposing a means of having sustainably higher process limits
without thrashing. The higher process limits substantially reduce
steady-state congestion frequency. As you said, we don't need
perfection. Simply raising the limits is a bit problematic when the
slow path is in fact full of fast mail.

> How do we limit the amount of slow mail in the active queue?

I would prefer to process it at higher concurrency, to the extent
possible, maintaining reasonable throughput even for the plausibly
slow mail, unless our predictors become much more precise.

> That requires prediction. We seem to agree that once mail has been
> deferred a few times, it is likely to be deferred again. We have one
> other predictor: the built-in dead-site list. That's it as far as
> I know.

Provided the reason is an unreachable destination, and not a deferred
transport or a certificate expiration (any fast repeated deferral via
local policy, ...).

> As for after-the-fact detection, it does not help if a process
> informs the master dynamically that it is blocked. That is too
> late to prevent slow mail from using up all delivery agents,
> regardless of whether the process limit is dynamically increased
> up to some maximum, or whether it is frozen at that same inflated
> maximum.

The above is a misreading of intent. It does help: it enables safe
support for higher concurrency levels, which modern hardware and O/S
combinations can easily handle.

> [detailed analysis]
>
> Thanks. This underscores that longer maximal_backoff_time can be
> beneficial, by reducing the number of times that a delayed message
> visits the active queue. This reflects a simple heuristic: once
> mail has been deferred a few times, it is likely to be deferred
> again.

That, plus, for many sites, a not-too-aggressively reduced queue
lifetime. Often an email delayed for more than 1 or 2 days is
effectively too late; with a bounce, the sender can resend to a better
address or try another means to reach the recipient. I found 2 days
rather than 5 to be largely beneficial, with no complaints of lost
mail because some site was down for ~3-4 days.
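For concreteness, that tuning maps to main.cf parameters roughly as
follows (a sketch only: the 2-day lifetime is the value discussed
above, the longer backoff is an illustrative figure, and the process
limit is the stock default shown for context):

    # A longer backoff keeps deferred mail out of the active queue
    # between retries, so it competes less often with fresh mail
    # (illustrative value; the stock default is 4000s).
    maximal_backoff_time = 8000s

    # Return undeliverable mail after 2 days instead of the stock 5,
    # so the sender can still act on the bounce.
    maximal_queue_lifetime = 2d

    # Per-transport delivery agent concurrency is bounded by this
    # (or by the maxproc column in master.cf); the question above is
    # how high it can safely be pushed.
    default_process_limit = 100

-- Viktor.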