On Wed, 24 Jun 2020, Wietse Venema wrote:
> > > Problem: nqmgr's fairness breaks down for single-recipient bulk mail
> >
> > You describe the solutions but maybe you could describe the problem first?
>
> My text was a bit disorganized, but the gist is that nqmgr uses a
> clever approach to interleave (bulk mail) multi-recipient deliveries
> with (non-bulk) single-recipient deliveries. This allows the non-bulk
> mail to trickle out *during* bulk delivery, instead of being stuck
> *after* the bulk delivery. I am making a simplification as nqmgr
> will also interleave workloads with other sizes, not just
> single-recipient and multi-recipient.
>
> When the bulk mail is in the form of single-recipient messages,
> then nqmgr cannot distinguish between bulk mail and non-bulk mail,
> and these deliveries are no longer interleaved. Deliveries to a
> specific destination become sequential. Again, I am making
> a simplification, this time that all email is single-recipient.
Yep, this is a nice summary of how it works. And below is some
solution, but I am still not sure which problem you are trying to solve.
(I really apologize, maybe it's been part of some prior discussion and
thus I am missing the bit which led to that post. Honestly, I am not
trying to play dumb here.)
It's obviously some congestion bothering you. So let me ask: where
is the bottleneck?
1) particular destination is congested
2) all delivery agents are congested
3) in-memory queue is congested
4) something else?
and what is the "blocker" attribute?
A) particular mail recipient is the hog
B) destination itself is the hog (not happening AFAIK)
C) particular mail sender is the hog
D) particular mail is the hog
E) single recipient mail blocking multi recipient mail (not happening AFAICT)
F) multi recipient mail blocking single recipient mail (not happening AFAICT)
G) something else?
From the rest of the following mails I have gathered that it may be the
same sender blocking things (which may be either the delivery agents or
the in-memory queue, so 2C or 3C, I am not really sure) that is bugging
you. But that's what confuses me even further, because separate
per-mail-recipient-count classes don't help with that, do they? (If all
you get in your system is single-recipient mail, this kind of classes
doesn't change a thing.)
But if that's really the case (the single sender usurping everything), why
complicate things? Rather than dealing with that during the outbound
selection, I'd say cut it during the inbound phase. Don't even let it into
the system. Simply don't pick up mail from a sender who already has too many
recipients in the in-memory queue during the active queue scan. It prevents it
from clogging the queue or even the agents (depending on the limit), is
trivial to do, and requires no change to the scheduling algorithm itself.
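
To illustrate (a toy Python sketch of the idea, with made-up names and a
made-up limit, nothing resembling the actual queue manager code):

  # Toy sketch of the inbound-phase idea: during the active queue scan, skip
  # mail from senders that already have too many recipients in memory.
  # MAX_PENDING_PER_SENDER and all names here are illustrative only.

  MAX_PENDING_PER_SENDER = 1000

  pending = {}    # sender address -> recipients currently in the in-memory queue

  def may_admit(sender, recipient_count):
      """Decide during the queue scan whether to pull this message in."""
      return pending.get(sender, 0) + recipient_count <= MAX_PENDING_PER_SENDER

  def admitted(sender, recipient_count):
      pending[sender] = pending.get(sender, 0) + recipient_count

  def delivered(sender, recipient_count):
      left = pending.get(sender, 0) - recipient_count
      if left > 0:
          pending[sender] = left
      else:
          pending.pop(sender, None)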
> The idea then is to have three FIFO classes with delivery requests
> that are round-robin interleaved by design: one class for senders
> with, say, 1-20 pending delivery requests, one class for senders
> with 20-400, and one class for senders with more than 400 pending
> delivery requests. The scheduler has up to 20,000 delivery requests
> in memory.
>
> For a given transport and destination, if all three classes have
> work then round-robin interleaving gives each class 1/3 of the
> delivery slots, 1/2 if there are only two classes with work, all
> slots if there is only one class with work, and zero if there is
> no work.
>
> If implemented in oqmgr, it's brain-dead round-robin with FIFO.
> What could possibly go wrong? :-) Well, it does not interleave
> deliveries within the same class. But, with single-recipient messages,
> that wasn't happening anyway, so we are not making things worse.
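
(Restating the idea above as a toy Python sketch, just to make sure we are
talking about the same thing; the thresholds and names are mine:)

  from collections import deque

  CLASS_LIMITS = (20, 400)                 # upper bounds of classes 0 and 1
  classes = [deque(), deque(), deque()]    # one FIFO of delivery requests per class
  _next = 0                                # round-robin pointer

  def class_of(sender_pending):
      if sender_pending <= CLASS_LIMITS[0]:
          return 0
      if sender_pending <= CLASS_LIMITS[1]:
          return 1
      return 2

  def enqueue(request, sender_pending):
      classes[class_of(sender_pending)].append(request)

  def next_request():
      """Give each non-empty class one slot in turn (1/3, 1/2 or all slots)."""
      global _next
      for i in range(len(classes)):
          c = (_next + i) % len(classes)
          if classes[c]:
              _next = (c + 1) % len(classes)
              return classes[c].popleft()
      return None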
Let's put aside that I don't know what you are solving. Regardless of that,
what you suggest has its own share of problems. You either have too few
classes, or too many (David pointed at this as well, if I understood it
correctly within the context I have so far). Each sucks in a different way.
1) Too few classes - in which class does a 100-recipient mail fit? Will a
1-recipient mail have to wait until several 100-recipient mails ahead of it
are delivered? Or will a 100-recipient mail wait until a few 1000-recipient
mails ahead of it are delivered? No matter how you set the thresholds, with
too few classes it will always be too coarse and will cause substantial
delays.
(For comparison, nqmgr has a kind of implicit log2(n) classes, even if their
boundaries may be a bit more fuzzy.)
2) Too many classes - let's assume you will have log2(n) classes (similar
to what nqmgr implicitly has). It seems to work fine, but actually that's
where the round robin hits us hard - you inflate the delivery of some
mails beyond reason. Let's say you have 16 classes, all delivering
mail. With round robin, you are inflating the delivery of mail in each class
by the deliveries of all other classes. This means the delivery of each mail
can be stretched up to 16 times. As you put it some 20+ years ago when I was
proposing round robin for the first time - the first recipients of some mails
all arrive today while the last recipients of the same mails arrive tomorrow.
It sucked then and I believe it still sucks today.
(For comparison, nqmgr never inflates any mail delivery more than twice,
even with the most aggressive preemption settings enabled.)
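
(A back-of-the-envelope sketch of that inflation, with made-up unit delivery
times:)

  # One mail with R recipients, delivered alone vs. round-robined against
  # other busy classes; every delivery takes one unit of time.

  def last_recipient_time(recipients, busy_classes):
      alone = recipients                       # sequential: done after R slots
      interleaved = recipients * busy_classes  # round robin: every own slot waits
                                               # for one slot of each other class
      return alone, interleaved

  print(last_recipient_time(1000, 16))         # (1000, 16000): 16x later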
So, this is why I think what you propose is not the way to go.
Unfortunately, I don't think there is any sweet spot of "just the right
number of classes" which would work wonderfully. I am afraid it would
suffer from the worst of both cases, being coarse enough to delay shorter
mails too much while inflating some deliveries way too much at the same time.
> > I don't remember any of it now but maybe I wrote the most important
> > ideas somewhere, so I can have a look.
>
> I remember we discussed classes, but I don't think that I was able
> to reduce the concept to a combination of really brain-dead simple
> concepts.
>
> I think it would work (in oqmgr), I just need to find time to do this.
Well, I don't want to discourage you, but the above is why I am rather
skeptical about the result.
FWIW, in the meantime, I had a look and found my notes from July 2013,
when we were last discussing the classes. It's quite an interesting read (for
me, at least). It contains various notes for the arguments I prepared for
our future discussions, as well as the solution to the problem we were
solving back then.
It turns out we were indeed discussing the classes as well, but those were
"slow" and "fast" classes instead. My biggest gripe with nqmgr (or oqmgr,
for that matter) is that on a saturated system, deliveries of "slow" mail
to "slow" destinations eventually hog most of the delivery agents,
considerably slowing down any other mail, both "slow" and "fast", sitting in
the queue.
We were trying to come up with a solution that would solve this by using
the mail size as an attribute somehow, and dedicating some portion of the
agents to certain classes. After some serious thinking, my revelation was
that to solve this properly, we should in fact measure the time the mail
spends "on the wire" (effectively measuring per-destination throughput),
and use this information to drive the delivery agent assignment decisions.
It's quite obvious in hindsight, but that's something which we had
never discussed before. The solution I had back then would work marvelously
even with a single agent, but in the end I never pursued this further
to turn it into an actual implementation.
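
In today's terms, the measuring half of it could be as trivial as this sketch
(my naming, nothing from the actual queue manager):

  import time
  from collections import defaultdict

  # Account how long deliveries to each destination keep an agent busy
  # ("time on the wire"), so the scheduler can see which destinations or
  # classes are eating the agents.

  wire_time = defaultdict(float)    # destination -> seconds spent delivering

  def timed_delivery(destination, deliver):
      started = time.monotonic()
      try:
          return deliver()          # hand the job to a delivery agent
      finally:
          wire_time[destination] += time.monotonic() - started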
I attach the notes for your perusal, maybe you'll find some of it
interesting, too. It's a bit too brief to be generally useful as is, but if
something piques your interest, I can try to remember and turn it into
something more understandable.
BTW, one section even discusses the explicit vs implicit preemption, which
is basically the class based round robin being discussed above. :)
Best,
Patrik
Queues are sequential. Order matters.
The metric for messed-up ordering is the minimum number of swaps needed to get
from order A to order B.
Every swap breaks the order of one pair, helping one message and harming the
other one.
So this metric gives the number of hurt (as well as helped) messages.
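
(In code, that metric is just the inversion count between the two orders; a
naive O(n^2) Python sketch:)

  # Minimum number of adjacent swaps to turn order A into order B
  # equals the number of pairs that end up in the opposite order.

  def reorder_swaps(order_a, order_b):
      position = {msg: i for i, msg in enumerate(order_b)}
      mapped = [position[msg] for msg in order_a]
      swaps = 0
      for i in range(len(mapped)):
          for j in range(i + 1, len(mapped)):
              if mapped[i] > mapped[j]:
                  swaps += 1        # one message helped, the other harmed
      return swaps

  print(reorder_swaps(["m1", "m2", "m3"], ["m3", "m1", "m2"]))   # 2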
Fairness is an elusive term. Average message delay and the number of reordering
swaps are metrics. Fairness is not.
Solving congestion alone is simple. The point is doing it in such a way that it
only improves things.
---------------
What the user wants from the mail system (fast, order preserving within reason).
Tracking both per user, as well as globally (although the global reordering is
less relevant).
Both are interesting from the recipient's view as well, not only the sender's.
Sending a big message, then a small one. Or list mail, then a single one.
------------------
So, the point is to minimize order swaps and minimize delays.
Increasing swaps slightly is ok if it leads to considerably decreased delays,
but whatever leads to increasing both swaps and delays at the same time is
stupid.
The relations between classes are not symmetrical.
If putting class A in front of B improves things, doing the opposite can't
improve things as well.
Example with old lady with a full shopping cart, both right and wrong.
Example with separate queues for black and white. (or trucks and cars on
horizon).
Example with big message sent much later than small, but arrives first.
Example with class storming the shop.
-------------------------------
Some problems are quantitative, some qualitative. They manifest themselves
differently.
(The former are concerned with multitude (count), while the latter with
magnitude (order), which are by the way both quantitative properties
(measurable), not qualitative properties (taste, liking). OTOH, in general,
quality refers to an attribute or property which is not measurable on its
own (red), unless one assigns it some meaning or order.)
It's possible to convert back and forth, though - counting items of some
quality or grouping objects of some quantity.
Quantitative problems can be solved by counting. Qualitative can't, at least
not in an optimal way.
Example with a car with an empty gas tank, (frozen diesel), and diesel with
petrol added. All make the car stop.
Recognizing what the problem is differs, though, as does fixing it.
XXX - not the best analogy... Example with two canisters, one with diesel, one
with diesel with petrol added. Which one to use? Either you compare, or you get
average results (mixing stuff).
-------------------------------
Two people sending list mail vs two people sending bulk mail - the first should
be sequential, while the latter could be either sequential or round robin,
depending on when and how they sent it (interleaved vs sequential).
Fundamental difference. The list sender has this guarantee, bulk doesn't.
Also, explicit list mail preemption is smarter in candidate selection. Simple
implicit preemption would just choose whatever's available next - not good
enough wrt message/recipient delays. Could lead to stretching list delivery too
much by simple interleaving, because it lacks slot grouping and thus tighter
message start/stop delay.
Also, explicit preemption works with a single agent. The implicit one can be
done for quantitative classes by counting, but not for qualitative ones - you
need to measure somehow. That's why the latter works better with implicit
preemption using multiple delivery agents, as it does the measuring for us. But
for more than two classes, one needs the ordering, to prevent starving the
remaining classes. Or suffer the average results, as usual.
---------------------
Replace the number of agents used with a leaky-bucket based predicate.
Basically the time on wire used by each class. Has to use time estimates
(corrected afterwards) instead of the self-adjusting measurement of multiple
agents. The rest works the same.
Hmm, the catch is that we don't know how many delivery agents there are (of
course) - so we have no problem limiting some class to X% of the total when
there is enough stuff being delivered, but it is difficult to leave Y% of
delivery agents idle when there is nothing but mail from that class going out.
Could it be somehow derived from how much is still on the wire compared to how
much has already finished?
In other words, given a single agent, having just sent a slow message, how do
we know if we can send the next slow one or if we should look whether there is
a fast one? Or do we always look, and only use the slow one anyway if there is
nothing better to be found at all? Maybe that's how we do it - instead of using
the first one which can go or nothing, we use the best candidate we find - with
the unrestricted one being the winner, of course, if there is one.
Ahaaa, actually we know how many there are ready in each class, so it is merely
a matter of checking if there is something better at all - in which case we
will find it (unless it's concurrency window restricted, of course), or we
don't have to bother.
Hmm, but it can be one of the blockers BEFORE the current job, so the best
choice fallback has to be used, anyway.
The big advantage is not only that it is independent of the total number of
agents, but also that it automatically adapts to any number of agents on the
fly - if the total limit is 1000 but only 20 are being used because of
concurrency window(s), it adjusts to this limit automatically.
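
(A rough Python sketch of that selection, with illustrative shares and names;
the point is the fallback to the best over-budget candidate, so no agent is
ever left idle just to preserve the ratio:)

  from collections import defaultdict

  budget_share = {"fast": 0.7, "slow": 0.3}   # illustrative shares, not tuned
  used_time = defaultdict(float)              # class -> time on the wire so far

  def pick_class(deliverable):
      """deliverable: classes with a job not blocked by a concurrency window."""
      if not deliverable:
          return None
      total = sum(used_time.values()) or 1.0
      # Classes still under their share of the total wire time are unrestricted.
      under = [c for c in deliverable if used_time[c] / total <= budget_share[c]]
      candidates = under or list(deliverable)  # else take the best over-budget one
      return min(candidates, key=lambda c: used_time[c] / total - budget_share[c])

  def account(cls, wire_seconds):
      used_time[cls] += wire_seconds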
-----
Another type of congestion would be per-destination congestion. It affects the
per-destination average message delays. It's similar to the bulk sender
problem - if the messages/entries for/from a few Xs consume too much time, the
other Xs are congested. Not really a problem in practice, but a nice example.
Metrics:
Order swaps (increasing)
Average message delays
Per sender delays
Per destination delays
Total time it takes to deliver everything (same)
...
----