On Sat, May 11, 2013 at 04:20:51PM +0200, Patrik Rak wrote:

> >- What common use case has different per-recipient (not: per-sender,
> >etc.) soft reject rates for a mail stream between two sites? Does
> >it matter whether some portion of a mail stream between two sites
> >is deferred because of the recipient, sender or other cause?
>
> The use case which I am interested in is basically some service
> sending registration confirmation messages to its users, where some
> users decide to fill in bogus addresses which result in temporary
> errors until the message expires and bounces. Such messages tend to
> stock pile in the deferred queue and can quite dominate the active
> queue and adversely affect the deliveries to proper recipients.
> Especially when these bogus recipients are not deferred immediately,
> but only after considerably long timeout.

The only way to deal with high latency is with high concurrency,
thus maintaining a reasonable throughput (concurrency/latency).
Most cases of high latency due to bogus domains, non-responding MX
hosts, ... are cases in which the concurrency at the receiving system
is zero, since no SMTP connection is ever made. So in this case
you want at least a high process limit for the transport. If the
bogus destinations are many, then this is enough.

One would need to size the active queue limits for some multiple
of the expected 5-days of bad addresses so that such mail rarely
fills the active queue. Since Postfix 1.0 was released in 2001,
the price of RAM has fallen considerably. It is now quite
cost-effective to build servers with 1-4 GB of RAM or more. So an
MTA with this problem should have a large active queue size to
avoid running out of queue slots.

I think such tuning is a pain in a single instance of Postfix, and
monitoring such a queue is needlessly complex with a single instance.
I find all the fear and loathing of multiple instances perplexing.
Multiple instances are *simpler* than intricately tuned single
instances.

> - the concurrency window limit of that alternate transport can be
> explicitly configured to be small, which should minimize the
> difference of the load caused on the target site.

That would be a mistake. You want a high concurrency, which is
problematic for retries to some legitimate destinations (say Yahoo
after greylisting). Therefore, what one really wants to know is:

    - Did the message fail via a 4XX reply or connection failure?

    - Is this the first failure, or has delivery failed multiple
      times? (though with greylisting, one's own retry time may be
      sooner than the receiver's minimum delay).

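To make the two questions concrete, here is a minimal sketch of the
decision they lead to. It is not Postfix code: the names Failure,
Route, Attempt and route_after_failure are hypothetical, and the
rule itself (divert only mail that came from the deferred queue and
then failed at the connection level) is an assumption about how the
two answers would be combined.

```python
from dataclasses import dataclass
from enum import Enum


class Failure(Enum):
    SOFT_REPLY = "4XX reply"        # the remote server answered with a 4XX
    SITE_FAILURE = "site failure"   # connection timeout or connection failure


class Route(Enum):
    REGULAR = "regular queue"
    FALLBACK = "fallback (slow) queue"


@dataclass
class Attempt:
    failure: Failure            # question 1: how did the delivery fail?
    from_deferred_queue: bool   # question 2: False on the first attempt


def route_after_failure(attempt: Attempt) -> Route:
    """Hypothetical rule: only repeated site failures count as slow mail."""
    if attempt.failure is Failure.SITE_FAILURE and attempt.from_deferred_queue:
        return Route.FALLBACK
    # First failures and 4XX replies (e.g. greylisting) stay with regular mail.
    return Route.REGULAR
```

A greylisted message, for example, fails with a 4XX reply and so
stays in the regular queue no matter how often it has been retried.
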
Thus one may want to keep messages that fail for the first time or
with a 4XX reply rather than a timeout or connection failure in the
same queue as regular mail, while sending messages that time out
after being deferred into a fallback queue (remote or second
instance).

For this one would need to change the SMTP delivery agent to use a
conditional fallback relay. This would be added to the delivery
request by the queue manager when processing messages from the
deferred queue, and used by the SMTP delivery agent only when the
"last" regular MX host "site-fails" (not 4XX reply).

The effect is to separate slow mail that times out multiple times,
whose delivery could clog the queue, from other mail that is in the
queue briefly, or whose delivery failures are in any case fast
enough to not be a big problem.

> I am all eager to hear what Victor has to say about this one,
> though... He has a lot of experience with problematic sites using
> small concurrency windows, from what I remember...

I don't think that additional transports in the same instance are
a good idea here. Too much complexity, and still a high risk of a
full active queue. With a second downstream instance that holds
only the slow mail, one can tune concurrency up, and tune any queue
monitoring more appropriately to the content in hand. One can also
adjust queue lifetimes more sensibly, ...

So I propose:

    - No changes in trivial-rewrite.

    - No additional transport personalities.

    - One additional parameter to define a queue-manager signalled
      fallback relay, included with delivery requests for messages
      that come from the deferred queue.

    - This fallback relay is ignored by default by all delivery
      agents, and is optionally available in the smtp(8) delivery
      agent, which needs a second non-default parameter to enable
      its use.

    - The second parameter would be set by administrators of
      affected sites in the "smtp" transport, and likely not set
      in the "relay" transport.

    - Sludge (connection timeout or failure possibly combined with
      a minimum message age) goes to a remote or second instance
      queue.

The main difficulty is that this meshes somewhat poorly with
"defer_transports", since some deferred mail may be "innocent" and
could be sent to the slow queue when the transport is no longer
deferred if the first delivery fails, but this edge case is likely
not significant. Similar considerations for mail released from the
"hold" to the "deferred" queues.

We could extend the queue-file format, to define a new record type
which is a variant of 'R' (recipient), this would be a recipient
that failed slowly on the last delivery, and should become sludge
on the next failed delivery. It behaves just like 'R', except in
the smtp(8) delivery agent which sends it to the sludge fallback.
That way, the queue-manager is even simpler, just treat 'R' and the
new record identically, and let smtp(8) do all the work, but now
defer_append() would need to be able to update the recipient record
type just like sent().

-- 
    Viktor.