On Sat, May 11, 2013 at 04:20:51PM +0200, Patrik Rak wrote:

> >- What common use case has different per-recipient (not: per-sender,
> >etc.) soft reject rates for a mail stream between two sites? Does
> >it matter whether some portion of a mail stream between two sites
> >is deferred because of the recipient, sender or other cause?
> 
> The use case which I am interested in is basically some service
> sending registration confirmation messages to its users, where some
> users decide to fill in bogus addresses which result in temporary
> errors until the message expires and bounces. Such messages tend to
> stock pile in the deferred queue and can quite dominate the active
> queue and adversely affect the deliveries to proper recipients.
> Especially when these bogus recipients are not deferred immediately,
> but only after considerably long timeout.

The only way to deal with high latency is with high concurrency,
thus maintaining a reasonable throughput (concurrency/latency).
Most cases of high latency due to bogus domains, non-responding MX
hosts, ... are cases in which the concurrency at the receiving
system is zero, since no SMTP connection is ever made.  So in this
case you want at least a high process limit for the transport.  If
the bogus destinations are many, then this is enough.
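
To put rough numbers on the concurrency/latency relation, here is a
minimal sketch.  The 30-second timeout and the process counts are
made-up figures for illustration, not measurements from any site.

```python
def throughput(concurrency: int, latency_s: float) -> float:
    """Steady-state delivery attempts per second: concurrency / latency."""
    if latency_s <= 0:
        raise ValueError("latency must be positive")
    return concurrency / latency_s

# Hypothetical figures: each bogus destination ties up one delivery
# agent for 30 seconds before the connection attempt gives up.
low = throughput(concurrency=20, latency_s=30.0)
high = throughput(concurrency=200, latency_s=30.0)
print(f"{low * 3600:.0f} vs {high * 3600:.0f} attempts per hour")
```

With latency fixed by the remote side, the only term left to raise is
the concurrency, which is why the process limit matters here.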

One would need to size the active queue limits for some multiple
of the expected five days' worth of bad addresses, so that such mail
rarely fills the active queue.  Since Postfix 1.0 was released in 2001,
the price of RAM has fallen considerably.  It is now quite
cost-effective to build servers with 1-4 GB of RAM or more.  So an
MTA with this problem should have a large active queue size to avoid
running out of queue slots.
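
As a back-of-the-envelope sketch of that sizing: the arrival rate and
the headroom multiple below are assumptions picked for illustration;
the 5 days matches the default maximal_queue_lifetime, and the result
would be compared against qmgr_message_active_limit (default 20000).

```python
def active_queue_slots(bad_per_day: int,
                       lifetime_days: int = 5,
                       headroom: int = 3) -> int:
    """Slots needed so the bad-address backlog rarely fills the active queue.

    bad_per_day   -- messages per day to addresses that never deliver
    lifetime_days -- how long such mail stays queued before it bounces
    headroom      -- assumed safety multiple over the steady-state backlog
    """
    backlog = bad_per_day * lifetime_days
    return backlog * headroom

# Hypothetical site: 2000 bogus registrations per day.
print(active_queue_slots(2000))
```

For these made-up inputs the backlog is 10000 messages, so a limit
around 30000 leaves room for the legitimate mail.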

I think such tuning is a pain in a single instance of Postfix,
and monitoring such a queue is needlessly complex with a single
instance.  I find all the fear and loathing of multiple instances
perplexing.  Multiple instances are *simpler* than intricately
tuned single instances.

> - the concurrency window limit of that alternate transport can be
> explicitly configured to be small, which should minimize the
> difference of the load caused on the target site.

That would be a mistake.  You want a high concurrency, which is
problematic for retries to some legitimate destinations (say Yahoo
after greylisting).  Therefore, what one really wants to know is:

    - Did the message fail via a 4XX reply or connection failure?

    - Is this the first failure, or has delivery failed multiple times?
      (though with greylisting, one's own retry time may be sooner than
       the receiver's minimum delay).

Thus one may want to keep messages that fail for the first time or
with a 4XX reply rather than a timeout or connection failure in
the same queue as regular mail, while sending messages that time
out after being deferred into a fallback queue (remote or second
instance).
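
Spelled out as a decision table, the split might look like the sketch
below.  The function and queue names are hypothetical, chosen only to
illustrate the two questions above; nothing here is existing Postfix
code.

```python
def target_queue(previously_deferred: bool, got_4xx_reply: bool) -> str:
    """Pick the queue for a message whose delivery attempt just failed.

    previously_deferred -- the message had already failed at least once
    got_4xx_reply       -- the remote server answered with a 4XX reply,
                           as opposed to a timeout or connection failure
    """
    if previously_deferred and not got_4xx_reply:
        # Slow failure on a retry: move it out of the way.
        return "fallback"
    # First failures and fast 4XX rejects stay with regular mail.
    return "regular"

for deferred in (False, True):
    for reply in (False, True):
        print(deferred, reply, target_queue(deferred, reply))
```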

For this one would need to change the SMTP delivery agent to use a
conditional fallback relay. This would be added to the delivery
request by the queue manager when processing messages from the
deferred queue, and used by the SMTP delivery agent only when the
"last" regular MX host "site-fails" (not 4XX reply).

The effect is to separate slow mail that times out multiple times,
whose delivery could clog the queue, from other mail that is in
the queue briefly, or whose delivery failures are in any case fast
enough to not be a big problem.

> I am all eager to hear what Victor has to say about this one,
> though... He has a lot of experience with problematic sites using
> small concurrency windows, from what I remember...

I don't think that additional transports in the same instance are
a good idea here.  Too much complexity, and still a high risk of
a full active queue.  With a second downstream instance that holds
only the slow mail, one can tune concurrency up, and tune any queue
monitoring more appropriately to the content in hand.  One can also
adjust queue lifetimes more sensibly,  ...

So I propose:

    - No changes in trivial-rewrite.

    - No additional transport personalities.

    - One additional parameter to define a queue-manager signalled
      fallback relay, included with delivery requests for messages
      that come from the deferred queue.

    - This fallback relay is ignored by default by all delivery agents, and
      is optionally available in the smtp(8) delivery agent, which needs
      a second non-default parameter to enable its use.

    - The second parameter would be set by administrators of affected
      sites in the "smtp" transport, and likely not set in the "relay"
      transport.

    - Sludge (connection timeout or failure possibly combined with a minimum
      message age) goes to a remote or second instance queue.
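
A rough model of how the two parameters would interact, as a sketch
only.  Every name below (DeliveryRequest, build_request,
pick_sludge_relay, the relay address) is hypothetical and stands in
for whatever the real implementation would use.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DeliveryRequest:
    nexthop: str
    from_deferred: bool
    age_s: float
    fallback_relay: Optional[str] = None  # filled in by the queue manager

def build_request(nexthop: str, from_deferred: bool, age_s: float,
                  qmgr_fallback: Optional[str]) -> DeliveryRequest:
    """Queue-manager side: signal the relay only for deferred-queue mail."""
    relay = qmgr_fallback if from_deferred else None
    return DeliveryRequest(nexthop, from_deferred, age_s, relay)

def pick_sludge_relay(req: DeliveryRequest, agent_enabled: bool,
                      site_failed: bool,
                      min_age_s: float = 0.0) -> Optional[str]:
    """Delivery-agent side: relay to use, or None to defer as usual."""
    if not agent_enabled or req.fallback_relay is None:
        return None  # ignored by default
    if not site_failed or req.age_s < min_age_s:
        return None  # 4XX reply, or message still too young
    return req.fallback_relay

req = build_request("example.com", True, 7200.0, "[sludge.example.net]:25")
print(pick_sludge_relay(req, agent_enabled=True, site_failed=True,
                        min_age_s=3600.0))
```

The point of splitting it this way is that a delivery agent which
never looks at the extra field behaves exactly as it does today.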

The main difficulty is that this meshes somewhat poorly with
"defer_transports", since some deferred mail may be "innocent" and
could be sent to the slow queue when the transport is no longer
deferred if the first delivery fails, but this edge case is likely
not significant.  Similar considerations apply to mail released from
the "hold" queue to the "deferred" queue.

We could extend the queue-file format to define a new record type
that is a variant of 'R' (recipient).  This would mark a recipient
that failed slowly on the last delivery, and should become sludge
on the next failed delivery.  It behaves just like 'R', except
in the smtp(8) delivery agent, which sends it to the sludge fallback.

That way, the queue-manager is even simpler, just treat 'R' and
the new record identically, and let smtp(8) do all the work, but
now defer_append() would need to be able to update the recipient
record type just like sent().
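
The record-type variant amounts to a two-state marker per recipient.
A sketch, with "R-slow" as a placeholder label (not a real or proposed
record-type byte) and hypothetical function names; only slow
(timeout or connection) failures are modelled:

```python
from typing import Tuple

RCPT = "R"            # ordinary recipient record
SLOW_RCPT = "R-slow"  # placeholder for the proposed variant

def qmgr_is_recipient(record: str) -> bool:
    """Queue manager: both record types are scheduled identically."""
    return record in (RCPT, SLOW_RCPT)

def smtp_after_slow_failure(record: str) -> Tuple[str, str]:
    """Delivery agent: (action, record type to write back) after a
    timeout or connection failure on the regular MX hosts."""
    if record == SLOW_RCPT:
        # Second slow failure in a row: hand off to the sludge fallback.
        return ("fallback", SLOW_RCPT)
    # First slow failure: defer, and mark the recipient for next time.
    return ("defer", SLOW_RCPT)

record = RCPT
for attempt in (1, 2):
    action, record = smtp_after_slow_failure(record)
    print(attempt, action, record)
```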

-- 
        Viktor.
