On Wed, May 15, 2013 at 12:22:41PM +0200, Patrik Rak wrote:

> >There is no "them".  This is not an analysis.
> 
> The "bad guys" was a metaphor for slow mail which consumes available
> resources.

The metaphor is flawed; you need a model that considers the message
rates entering the active queue from both incoming and deferred.  A
queue whose output rate exceeds its input rate is nearly empty.

If the blocked process limit is 500, and minimal backoff time is
the default 300s and blocked messages are done in <= 300s, you'd
need to bring in 600 or more messages from the deferred queue in
a single scan.  If the maximum backoff time is raised from the
default 4,000s to just 19,200s (5.33 hours), then the odds of any
single deferred message being eligible during any given 5-minute
window are 1:64.  Thus you'd need O(38,400) of the bad messages in the
deferred queue to see starvation of new mail.  If the queue lifetime
is 5 days, that's ~8,000 junk messages arriving each day; if it is
2 days, that's ~20,000 each day.
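
A minimal sketch of that arithmetic, in Python; the 600-per-scan
figure and the backoff values are the numbers quoted above, not
measurements:

    # Starvation arithmetic from the figures above; assumptions, not data.
    process_limit = 500      # smtp(8) processes tied up by slow mail
    min_backoff   = 300      # minimal_backoff_time, seconds (default)
    max_backoff   = 19200    # maximal_backoff_time raised from 4000s

    # New mail starves only if each ~300s deferred-queue scan feeds in
    # at least as many slow messages as there are delivery processes.
    needed_per_scan = 600

    # Chance that a deferred message comes due in a given 300s window:
    eligibility = min_backoff / max_backoff          # 1/64

    # Deferred-queue size needed to keep every process blocked:
    deferred_needed = needed_per_scan / eligibility  # 600 * 64 = 38400

    for lifetime_days in (5, 2):
        print("queue lifetime %d days: ~%d junk arrivals/day"
              % (lifetime_days, deferred_needed / lifetime_days))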

I'd say a site in that situation should focus on not admitting that
much junk.  Otherwise, given a sensibly low load of bad addresses,
improved tuning, plus dynamic capacity to soak up concurrency demand
spikes, the scenario you're imagining just does not happen.

I faced this very problem in 2001, and:

    - Switched to nqmgr: without a separate "relay" transport it
      does a better job of fairness between inbound and outbound
      mail, because it is FIFO rather than round-robin by destination.

    - In fact increased the maximal backoff time, and reduced the
      minimum (sketched in main.cf terms after this list).

    - Increased the smtp delivery agent process limit.

    - Enabled recipient validation, which was the real solution.
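
The direction, in main.cf terms; the values below are illustrative,
not a record of the actual 2001 settings, and the smtp process limit
can instead be raised per-service via the maxproc column in master.cf:

    # main.cf -- illustrative values, not the actual 2001 settings
    minimal_backoff_time  = 60s      # reduced below the then-default
    maximal_backoff_time  = 19200s   # raised well above the default
    default_process_limit = 200      # more smtp(8) output capacity
    # The real fix: reject mail for unknown recipients up front
    relay_recipient_maps  = hash:/etc/postfix/relay_recipients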

Some of my observations back then were part of the motivation for
Postfix to evolve:

    - The default process limit has been raised from 50 to 100.

    - The minimum backoff time is smaller by default.  (Both can be
      verified with postconf, as shown after this list.)

    - The default qmgr became "nqmgr"; FIFO is good, and making
      it work with mega-recipient messages is a win.  [ We
      still don't have a way to deal with senders who flood the
      queue with lots of single-recipient messages.  There was a
      thread about that recently.  If you want to make the queue
      manager even more flexible, you could support mechanisms to
      group related messages into a logical scheduling unit.  ]

    - The relay transport is standard.

    - Recipient validation is on by default for local recipients.
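
On any installed Postfix, "postconf -d" prints the compiled-in
defaults, so the first two items above can be checked directly:

    postconf -d default_process_limit minimal_backoff_time \
        maximal_backoff_time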

> Without any offense, maybe you should reread all that was already
> written and put more thought into it.  Then you might realize why
> your after-the-fact-testing solution is flawed, and why your
> boost-the-concurrency solution works but is a needless waste.  Wietse
> explained the former pretty clearly, IMHO, and I tried my best to
> explain the latter.

And yet it moves.

It is a fallacy to claim that increasing the output capacity of a
queue does not reduce congestion.  Any queue is congested under
high enough load; there may simply be too much mail, slow or fast.

Your magical 60:1 model ignores the "supply" of deferred mail; it
is generally equal to the supply of new mail, and it should be tuned
to generate retries at a rate below the output capacity.  I'm
providing that tuning.
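
To make that concrete, a back-of-the-envelope sketch; every number
here is an assumption for illustration, not a measurement:

    # Retry "supply" from the deferred queue vs. delivery capacity.
    deferred_count = 38400     # messages parked in the deferred queue
    avg_backoff    = 9600.0    # assumed mean retry interval, seconds
    retry_rate     = deferred_count / avg_backoff    # ~4 retries/sec

    processes      = 500       # smtp(8) delivery agents available
    avg_delivery   = 30.0      # seconds per slow delivery attempt
    capacity       = processes / avg_delivery        # ~16.7 deliveries/sec

    # No congestion while retry supply plus new-mail input stays below
    # capacity; tune backoff times and process limits until it does.
    print("supply %.1f/s vs capacity %.1f/s" % (retry_rate, capacity))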

I explained a safe way to dynamically raise concurrency when the
load creates pressure via latency and we can afford more "threads",
because they are mostly all sleeping, provided the machine is not
tuned with exceedingly tight RAM limits.  An extra smtp(8) process
is quite cheap: just a bit of space for the stack and a small
heap, plus the kernel's process structures.  Look at the process
map to see how much of the address space is private.
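
For example, on Linux (assuming the procps pmap and a running smtp(8)
process; the summary line reports the writeable/private total):

    pmap -d $(pgrep -x smtp | head -n 1) | tail -n 1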

This is a quantitative issue; do the numbers.

-- 
        Viktor.
