This will be my last note on the topic on tsvwg. I'd like to believe that by 
tomorrow we will have all joined aqm@.

On Mar 14, 2013, at 6:54 AM, Fred Baker (fred) <[email protected]> wrote:

> 
> On Mar 14, 2013, at 12:43 AM, Michael Welzl <[email protected]> wrote:
> 
>> Great, I turn on ECN, and that gives me more delay, just what I want!
>> And, even better: the more people use ECN, the more delay everybody gets.
>> 
>> Seriously, I see the incentive idea behind this two-level marking idea, but 
>> one has to carefully consider the increased delay vs. gained throughput 
>> trade-off in such a scheme.
> 
> I don't understand your comment. Fill me in please?
> 
> If I have a tail-drop queue, TCP using CUBIC or New Reno seeks to keep the 
> queue full. If the queue has N positions (there are many ways to measure a 
> queue, bear with this one for the purpose of the example), worst case delay 
> approximates waiting for N messages, and general case delay is predicted by 
> queuing theory to shoot for N when the link has 95% utilization or whatever.
> 
> With any AQM algorithm, the same queue will accept N messages in the event 
> that it gets a burst, but will start marking/dropping at some point M 
> significantly less than N, so that the queue depth tends to approximate M, 
> not N. That's the whole point of any AQM algorithm. How M is chosen or 
> predicted is of course different for various algorithms.
> 
> The pushback I generally hear about ECN is that people will mark traffic as 
> ECN-capable in order to work around AQM algorithms signaling early; to make 
> them signal later. I am told that people do abuse the EF code point in order 
> to make their traffic go into a priority queue, so I can imagine people doing 
> this as well.
> 
> What I am suggesting is that the AQM algorithm use the appropriate signal to 
> make queue depth approximate M, whether that is ECN or loss depending on the 
> traffic marking, and in the event of abuse do no worse than present 
> it's-done-everywhere tail drop, in which the queue depth tends towards N.
> 
> M < N.
> 
> Remind me how saying that one will use ECN for ECN-capable traffic makes the 
> average queue depth deeper? That makes no sense to me.


Following up on the comment about M and N. Let me show you some (old and low 
speed) slides that I use as examples in this class of discussion. It's not ECN 
vs loss, it's AQM vs tail-drop. However, since AQM-using-ECN and AQM-using-loss 
are examples of AQM, and the argument in the draft is to use both depending on 
the capabilities of the traffic, I think it's applicable.

    ftp://ftpeng.cisco.com/fred/dpreed/Bufferbloat_Masterclass.pdf

I'm looking at slides 11 and 12. Comments on the rest of the deck are 
interesting, but I'd suggest taking that private in order to not muddy this 
discussion.

The slides were pulled together as a simple and understandable example of what 
I said above about M and N. "AQM", in this case, is "RED"; other algorithms 
will display slightly different characteristics, but I assert that they will 
have the same basic behavior. The value and perhaps appearance of M will vary, 
but not the fact of it.
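The M-vs-N behavior described above can be sketched as a toy enqueue decision. This is an illustration, not any particular implementation: the names, the thresholds, and the deterministic (rather than probabilistic) signaling are all simplifications.

```python
# Toy sketch of M vs N: a tail-drop queue only rejects traffic when it
# is full (depth N), while an AQM starts signaling at some threshold
# M < N, marking ECN-capable (ECT) packets and dropping the rest, so
# the queue depth tends toward M instead of N. Values are illustrative.

N = 100   # physical queue capacity: the tail-drop point
M = 30    # AQM signaling threshold, M < N

def enqueue_decision(queue_depth, ect):
    """Return what happens to an arriving packet."""
    if queue_depth >= N:
        return "drop"              # queue full: tail drop, no choice
    if queue_depth >= M:
        # AQM signals early; ECT traffic is marked, the rest is dropped
        return "mark" if ect else "drop"
    return "enqueue"
```

A real AQM signals probabilistically rather than at a hard threshold, but the essential point survives the simplification: signaling begins at M, and only abuse or bursts push the depth toward N.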

The test setup is two hosts talking across two low end routers connected with a 
2 Mbps link. I could go faster; the charts would display timings appropriate to 
the line speeds and I would have to generate some multiple of the traffic to 
present the issue, but the structure of the behavior would be about the same. 

     +----+   +------+         +------+   +-----+
     |Left|   |Left  +---------+Right |   |Right|
     |Host|   |Router|         |Router|   |Host |
     +--+-+   +---+--+         +---+--+   +---+-+
        |         |                |          |
    ----+---------+---         ----+----------+---

On the "Left Host", I have a large file, large enough to take perhaps ten 
seconds to move to the "Right Host" in isolation. In one window on "Left", I 
run a ping -s to measure the RTT of traffic between the two hosts, and by 
extension the changing depth of the queue in "Left Router". In another window, 
I run a script. The script counts from one to fifteen. On each value, it 
simultaneously opens up that number of FTPs to copy said file to "Right Host". 
When the last FTP in the set completes, it waits a few seconds, rsh's to delete 
the now-unneeded copies from "Right", and goes on to the next value.
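The script's structure was roughly the following. The copy and cleanup functions here are stand-ins for the original FTP and rsh invocations, which I no longer have:

```python
# Sketch of the test driver: for n = 1..15, start n parallel copies of
# the large file, wait for the last one to finish, clean up, and move
# on. copy_file / delete_copies are placeholders for the real ftp/rsh.
from concurrent.futures import ThreadPoolExecutor

def copy_file(i):
    return i          # placeholder: "ftp the large file to Right Host"

def delete_copies(n):
    pass              # placeholder: "rsh to Right, delete the n copies"

def run_test(max_parallel=15):
    completed = 0
    for n in range(1, max_parallel + 1):
        with ThreadPoolExecutor(max_workers=n) as pool:
            # open n simultaneous transfers; map() waits for them all
            completed += len(list(pool.map(copy_file, range(n))))
        delete_copies(n)
    return completed  # total transfers across all rounds
```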

Afterwards, I did some simple calculations in Excel. I took successive and 
overlapping sets of ping measurements (ten, IIRC) and calculated the min, max, 
average (probably should have been median), and standard deviation of the set. 
I then plotted the values.
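The post-processing amounts to a sliding window over the RTT samples, something like this (the window of ten matches my recollection above; the input data here is made up):

```python
# Slide an overlapping window across the ping RTT samples and compute
# min, max, mean, and standard deviation for each window position.
from statistics import mean, stdev

def window_stats(rtts, window=10):
    out = []
    for i in range(len(rtts) - window + 1):
        w = rtts[i:i + window]
        out.append((min(w), max(w), mean(w), stdev(w)))
    return out
```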

I ran the test twice, once with tail drop and once with RED.

Slide 11 is the tail drop test. You see 15 bumps, measuring traffic from 1..15 
FTPs moving said file in parallel. What is pretty obvious is that the variation 
in the queue is at the top of the queue; if N was five, we would be discussing 
variation around five; if it was five hundred, we would be discussing variation 
around five hundred. Loss-based TCPs work Really Hard to fill the queue and 
keep it full, so the potential queue depth predicts both the end-to-end delay 
and the variation in delay.

Slide 12 is the RED test. It's pretty obvious where I set min-thresh. I'm not, 
in this, suggesting a value for min-thresh, BTW; I set it somewhere to 
demonstrate that where I set it is the average queue depth in the scenario. You 
want it somewhere else, be my guest. What I observe is wider variation in queue 
depth, because I am using the configured depth of the queue to absorb bursts 
when they happen. But the average queue depth is at min-thresh: M, which is 
less than N.
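For reference, classic RED's marking probability is a ramp computed on the averaged queue depth: zero below min-thresh, rising linearly to max_p at max-thresh, and certain mark/drop above that. A minimal sketch, with illustrative threshold values:

```python
# Classic RED marking probability as a function of the averaged queue
# depth: zero below min_th, a linear ramp to max_p at max_th, and
# certain mark/drop beyond max_th. Threshold values are illustrative.
def red_mark_probability(avg_depth, min_th=5.0, max_th=15.0, max_p=0.1):
    if avg_depth < min_th:
        return 0.0
    if avg_depth >= max_th:
        return 1.0
    return max_p * (avg_depth - min_th) / (max_th - min_th)
```

This is why the average depth settles near min-thresh in slide 12: as soon as the averaged depth climbs past it, sources start seeing signals and back off.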

I'll argue, and I think we probably generally agree, that the optimum target 
average queue depth is close to zero, and we'd like to use the buffer available 
to absorb and play out bursts when they happen. It would also be nice if the 
impact of AQM on the various data flows were more consistent, so that there was 
less variation in delay, and that might be true of some algorithms. But that's 
the target.

Coming back to ECN vs loss: ECN doesn't increase delay. It enables the same 
signal to be sent end to end, but sent more quickly and reliably (it is 
explicit and is known to be about congestion, as opposed to being implicit and 
possibly triggered by other factors), and it targets the same mean queue depth. 
Signaling without losing traffic is better than losing traffic and calling that 
signaling: the session completes more quickly and reliably and is then out of 
the way. I don't see the down side.
