On 14.5.2013 6:55, Viktor Dukhovni wrote:
>> If you throw in more resources for everyone, the bad guys are gonna
>> claim it sooner or later. You have to make sure you give it only to
>> the good guys, which is the same as giving less to the bad guys in
>> the first place. No need to throw in yet more additional resources
>> on demand.
> We don't know who the "good guys" are and who the "bad guys" are.
Exactly. That's the problem. And all we can do is either try to detect
them, or merely guess.
You try to detect them by initiating the request and measuring how long
it takes. No matter what exactly the test is, that means allocating some
resources for each such test (giving away). However, unless you are
willing to tear down the connection if it doesn't complete on time
(taking away), the resource is wasted until the test completes. And the
bad guys will eventually take it all. Therefore, you have merely moved
the bottleneck elsewhere - instead of competing for the delivery agents,
they compete for the bad/good guy test resources.
The fact that they are the same resource in what you describe makes no
difference.
Therefore, I say a guess is the best we have got. All we have to
focus on is making sure it doesn't backfire if we mis-classify someone.
And what I proposed (solutions 1 and 3) and now repeat below does not,
AFAICT.
>> And that's also why it is important to classify ahead of time, as
>> once you give something away, it's hard to take it back.
> There is no "giving away" to maintain throughput, high latency
> tasks warrant higher concurrency, such concurrency is cheap since
> the delivery agents spend most of their time just sitting there
> waiting.
You say it's cheap. I believe it is in your environment, as it is in
mine and in many others. However, I believe that no one is willing
to pay for more RAM every month for their cloud servers just so they
can dedicate it to dealing with mail that never gets delivered. That
seems like a total waste of money, especially if there is another
solution that works with the fixed resources at hand.
The problem with your approach is that whatever the bad guys want, you
give it to them. That's meek. They want more, you give it to them. They
take it and ask for even more. You give it to them again. Until the
point when you can't give them more, or until the point when they are
finally happy with what they have. That's no better than setting the
transport limit that high in the first place: if there is no demand, it
remains unused; if there is, it will be consumed exactly the same way.
In my approach, I instead tell the bad guys: "You want more? No way!
This is all you'll get, now shut up and move along as well as you can."
> You're proposing a separate transport for previously deferred mail,
Not anymore. What I suggest is solutions 1 and 3 from my previous
mail, which both merely restrain how much of the available resources we
are willing to give to the bad guys in the worst case. Note that there
is no separate slow/fast path either that would somehow affect either
group. We just make sure that those who we think are bad never get
everything.
The key fact is this, which I have mentioned before:
> Each group should automatically adjust the ratio of used slots over
> time according to the ratio of the corresponding delivery speeds
That's the key reason why it doesn't matter if we classify someone
incorrectly. How does it work?
We classify every mail into one of the two groups. We can call them
fast and slow for simplicity, but in fact they are "hopefully fast" and
"presumably slow". To start with, this can simply be new mail versus
deferred mail, but it doesn't have to be, as Wietse pointed out before.
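For what it's worth, the classification step itself can be trivial.
Here is a minimal sketch in Python (my own illustration, not Postfix
code; the Message fields are made up for the example), using the
new-versus-deferred rule as the starting point:

from dataclasses import dataclass

@dataclass
class Message:
    queue_id: str
    was_deferred: bool   # has this mail been deferred at least once already?

def classify(msg: Message) -> str:
    """Deferred mail is 'presumably slow', everything else 'hopefully fast'."""
    return "slow" if msg.was_deferred else "fast"

Anything smarter can later replace the was_deferred test without
changing the rest of the scheme.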
Now let's explore what share of the available resources each group
gets. When both groups contain some mail and the mail is delivered
equally fast, they get a 1:1 split. That seems fair. If the slow group
becomes, say, 4 times slower on average, it will get a 4:1 split over
time; the same holds if the fast group becomes 4 times slower, giving a
1:4 split. The reason is that as long as both groups have mail waiting,
they start deliveries at roughly the same rate, so the group whose
deliveries take 4 times longer ends up holding 4 times as many delivery
agents at any given moment. So far, so good.
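A quick way to see this dynamic is a small simulation. This is a
minimal sketch of my own, not Postfix code, and it assumes a fixed pool
of 100 delivery agents, an endless backlog in both groups, freed agents
handed out in strict alternation, and exponentially distributed
delivery times with the slow group 4 times slower on average:

import heapq
import random

AGENTS = 100
MEAN_TIME = {"fast": 1.0, "slow": 4.0}   # mean delivery times, arbitrary units

def simulate(duration=100_000.0, seed=1):
    rng = random.Random(seed)
    busy = []                                # heap of (finish_time, group)
    held = {"fast": 0, "slow": 0}            # agents each group holds right now
    agent_time = {"fast": 0.0, "slow": 0.0}  # time-integrated agent usage
    now = 0.0
    turn = ("fast", "slow")
    handed_out = 0

    def start(group):
        held[group] += 1
        finish = now + rng.expovariate(1.0 / MEAN_TIME[group])
        heapq.heappush(busy, (finish, group))

    # Hand out every agent once, alternating between the two groups.
    for _ in range(AGENTS):
        start(turn[handed_out % 2])
        handed_out += 1

    while now < duration:
        finish, group = heapq.heappop(busy)
        for g in held:                       # account for agent-time held
            agent_time[g] += held[g] * (finish - now)
        now = finish
        held[group] -= 1
        start(turn[handed_out % 2])          # both backlogs never run dry
        handed_out += 1

    total = sum(agent_time.values())
    return {g: round(agent_time[g] / total, 2) for g in agent_time}

print(simulate())   # roughly {'fast': 0.2, 'slow': 0.8}: a 4:1 split

The busy time settles at roughly a 4:1 split in favor of the slow
group, matching the ratio of the delivery times. Raising
MEAN_TIME["slow"] to 60.0 reproduces the 60:1 case discussed next,
where the fast group is left holding only one or two of the 100 agents.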
Now, if one group becomes really slow, say 30 or 60 times slower than
the other one, that is effectively the point where it starts starving
the other one. If it is the slow group which becomes this slow, it gets
a 60:1 split, and with ~100 delivery agents available the remaining
share is obviously not enough to get new mail delivered fast enough. If
we were willing to increase the transport limit considerably, the 1/61
share would eventually amount to enough delivery agents for fast mail
delivery. However, what I am saying is that it is enough if we simply do
not allow the ratio to go this high. We can fairly easily limit the
amount of resources we give to the bad guys to 80% or 90%, allowing them
no more than a 4:1 or 9:1 split. That leaves quite enough for the fast
group while not wasting too much on the bad group. Seems like a good
trade, especially when we presume that most of the bad mail won't get
delivered anyway (if it were deliverable, it likely wouldn't be this
slow and demand so many resources in the first place).
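To illustrate the cap, here is the same kind of sketch (again my own
illustration, not existing Postfix code; the 80% figure and the
pick_next_group() helper are made up for the example). The only change
to the hand-out rule is that a freed delivery agent goes to the slow
group only while that group holds less than its allowed share:

AGENTS = 100
SLOW_CAP = 0.80   # the "presumably slow" group may hold at most 80% (4:1)

def pick_next_group(held, has_backlog, prefer):
    """Decide which group gets the delivery agent that just became free.

    held        -- {"fast": n, "slow": m}: agents each group holds right now
    has_backlog -- {"fast": bool, "slow": bool}: does the group have mail queued?
    prefer      -- whose turn it is when both groups are eligible (alternates)
    """
    slow_eligible = has_backlog["slow"] and held["slow"] < SLOW_CAP * AGENTS
    fast_eligible = has_backlog["fast"]

    if fast_eligible and slow_eligible:
        return prefer        # normal case: keep alternating between the groups
    if fast_eligible:
        return "fast"        # slow group is at its cap (or has nothing queued)
    if slow_eligible:
        return "slow"        # only the slow group has mail waiting
    return None              # nothing to deliver; let the agent sit idle

Dropped into the simulation above in place of the strict alternation,
this keeps the slow group at roughly 80 of the 100 agents no matter how
slow it gets, leaving about 20 for the fast group; with SLOW_CAP = 0.90
the worst case becomes the 9:1 split mentioned above.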
Finally, what happens if the fast group becomes terribly slow instead,
and the slow group does not? I'd conclude that this doesn't have to
bother us. It makes no sense to take resources away from new mail which
suddenly became slow just so we can try more new mail which can be just
as slow. And taking resources away from slow new mail just so we can
retry some deferred mail seems equally pointless. So I would say it's
perfectly fine to wait until the situation gets back to normal, should
that ever happen.
Now, does this make it any clearer what I have in mind?
Patrik