Re: [Codel] [aqm] CoDel's control law that determines drop frequency

Dave Taht Tue, 03 Nov 2015 08:28:08 -0800

It helps to have the codel mailing list cc'd on codel discussions.
Adding this message to the cc.


One of these days we do have to write up - after finishing - cake's
code-likel implementation.


Dave Täht
I just invested five years of my life to making wifi better. And,
now... the FCC wants to make my work, illegal for people to install.
https://www.gofundme.com/savewifi


On Tue, Nov 3, 2015 at 11:22 AM, Jeff Weeks <jwe...@sandvine.com> wrote:
> The drop rate is affected by sojourn time, yes, but a 2x sojourn time goes 
> through the same incremental reduction of interval size, as does a sojourn 
> time of x.
>
> In investigating codel, I've setup various worst case scenarios, and I feel 
> like the algorithm could be made better by having its response time more 
> dependent upon how far away from the target latency it is.
>
> For example, consider a disabled interface, with a large (or even 
> conceptually infinite queue), that's attached to a fairly small shaper.
>
> The interface is then enabled, and immediately starts seeing 100Mbps, and 
> tries to shape it to 1Mbps.
>
> The queue will obviously build up quickly, and codel will notice this, and 
> enter the drop state.  But it will start at count = 1.
>
> If the interface is receiving 64b udp packets, then it'll be receiving 
> 100,000,000/512 == 195,312 packets per second, and only transmitting 1953 
> packets per second.
>
> The default target is 5ms, which is about 10 packets.  So of those 195,312 
> packets/second, we should ideally be dropping 195,312 - (1953 + 10) == 
> 192,349 packets/second.
>
> But in order to drop that many packets, 'count' needs to ramp up to the point 
> where the drop interval is consistently 5,198 ns.
>
> I believe that means 'count' has to reach some nearly impossibly high value 
> of (100ms/5198ns)^2 == 370,107,128
>
> I say nearly impossible, because it will take minutes (hours?) to get that 
> high (if my math is correct, it'll take over 17 seconds just to reach 7500).
>
> In the meantime, the queue *isn't* being effectively managed, as packets with 
> extremely high latencies will be transmitted for far too long.
>
> Of course, as I stated earlier, simply increasing count more quickly, based 
> on how far away we are from the target latency effectively invalidates the 
> optimization which most (all?) codel implementations use (namely the newton 
> step integer-only sqrt approximation) as, at some point, the approximation 
> starts *diverging* from the appropriate value.
>
> One alternative which I've been investigating is the possibility of skewing 
> the precalculated 1/sqrt(count) value.
>
> If this is kept as a 32-bit all decimal fixed point number, then performing 
> the multiplication by intentionally miss-shifting will result in doubling 
> sqrt(count):
>
> eg, take the following accurate calculation of next interval:
>
>     codel->next_interval_start_ticks = base_time + ((interval * 
> codel->one_over_sqrt_count) >> 32)
>
> And intentionally miss-shift by 1 bit:
>
>     codel->next_interval_start_ticks = base_time + ((interval * 
> codel->one_over_sqrt_count) >> 33)
>
> Will effectively have the interval reduce twice as fast.
>
> Alternatively, (and similarly to how CAKE halves the count while re-entering 
> the drop interval), count can periodically be doubled, if the current value 
> is seen to not be adequately affecting traffic, and the pre-calculated 
> 1/sqrt(count) can then be divided by sqrt(2) (i.e., do not rely on the newton 
> step approximation for this modification of count).
>
> Cheers,
> --Jeff
>
>
>
> ________________________________
> /dev/jeff_weeks.x2936
> Sandvine Incorporated
> ________________________________
> From: aqm [aqm-boun...@ietf.org] on behalf of Andrew Mcgregor 
> [andrewm...@google.com]
> Sent: Sunday, October 25, 2015 6:44 PM
> To: Dave Dolson
> Cc: Kathleen Nichols; Bob Briscoe; Dave Taht; Van Jacobson; AQM IETF list
> Subject: Re: [aqm] CoDel's control law that determines drop frequency
>
> CoDel does have the form of a controller; drop rate (not probability) is a 
> function of sojourn time (not queue size) and history, encoded in the state 
> variables.
>
> Now, I don't take it as proven that the particular form of the controller is 
> the best we could do, but making it a rate and based on sojourn time are 
> clear wins. Yes, you can use size as a proxy for sojourn time if your link 
> really has a constant bit rate, but not even ethernet is exactly CBR in 
> practice (and in some hardware situations, knowing the size is much more 
> expensive than measuring sojourn; the opposite can also apply).  Yes, you can 
> use probability as a proxy for rate if you are doing a hardware 
> implementation or otherwise doing the rate the way CoDel does is hard.
>
> PIE is closely related; the controller is the biggest difference, and it uses 
> both those proxies because it was aimed at a situation where there was 
> hardware support for one set of choices.
>
> On 23 October 2015 at 07:07, Dave Dolson 
> <ddol...@sandvine.com<mailto:ddol...@sandvine.com>> wrote:
> Catching up...
>
> Bob said:
>> I meant to say in my mail to Andrew that the rate at which the control
>> law increases the dropping frequency ought to depend on how far the
>> queue is from target, or even how quickly it is diverging from target.
>> I've always said that this control law should not be just a hard-coded
>> thing that has no dependency on how the queue is moving.
>
> In other words, make it a PID controller, not just a PI controller?
> (Or is it currently just "I" ?)
> With consideration of the derivative term (how quickly the queue
> size is moving away from the target), a faster deviation from target
> queue size would cause a faster response.
>
>
> I find it unfortunate that neither CODEL nor PIE algorithms are formulated as 
> a
> controller. It's difficult to reason about.
> (They *are* controllers, just not explained that way.)
> IMHO, an explanation should begin with a linear controller,
> with drop probability as a function of queue size,
> and then explanations of why non-linear modifications were necessary.
>
>
> -----Original Message-----
> From: aqm [mailto:aqm-boun...@ietf.org<mailto:aqm-boun...@ietf.org>] On 
> Behalf Of Bob Briscoe
> Sent: Wednesday, September 30, 2015 9:09 AM
> To: Dave Taht; Andrew Mcgregor
> Cc: Kathleen Nichols; AQM IETF list; Van Jacobson
> Subject: Re: [aqm] CoDel's control law that determines drop frequency
>
> Dave,
>
> It says the point at which cake enters drop state depends on how quickly
> the queue is growing.
> But I don't see anything on the page you referenced about a better
> curve. I looked under AQM and under other possibly relevant headings.
>
> I meant to say in my mail to Andrew that the rate at which the control
> law increases the dropping frequency ought to depend on how far the
> queue is from target, or even how quickly it is diverging from target.
> I've always said that this control law should not be just a hard-coded
> thing that has no dependency on how the queue is moving.
>
> Also below I've restated a couple of points I made back in 2013 when I
> first posted this.
>
> On 12/11/13 22:30, Bob Briscoe wrote:
>> 5/ My analysis also shows that the rate of increase in drop probablity
>> is inversely proportional to the link rate. I don't believe this is
>> desirable, as it will require tuning for the expected number of flows
>> on the link. However it is probably OK for variable rate links.
>>
>> 6/ The rate of increase of drop probability is also inversely
>> proportional to the square of 'interval'. I suggest a new constant is
>> defined for the control law, rather than relating the control law to
>> interval in this way. Otherwise, when anyone tunes interval, they will
>> get undesirable changes to the control law (e.g. reducing interval 10x
>> [to 10ms] would make the control law 100x more sluggish).
>
> This last point is particularly important. If the control law is still
> to be hard-coded, it should relate to another variable RTT_ave, not
> interval. TO decide when to enter dropping state, interval should be set
> to the likely worst-case RTT (in the low hundreds of ms). But, If
> hard-coded, the dynamics of the control law will be correct most often
> if it depends on the /most likely/ RTT, which these days with CDNs is
> perhaps 10-25ms (depending on population density in a country).
>
> There's another reason for having two separate variables, not just
> 'interval' for both. I've said the following before somewhere as well:
> For ECN marking, 'interval' can be set much lower (or even zero) because
> being on a hair-trigger with ECN marks doesn't matter, whereas it does
> for drops.{Note 1} Then CoDel will signal ECN without the sluggish delay
> it has to have before it starts dropping. However, once it has entered
> dropping state, the ECN control law still needs to be based on the same
> RTT_ave as for drop, because the control law is trying to mimic the
> dynamics of the congestion control, which RFC3168 says should be the
> same for ECN as for drop.{Note 2}
>
>
> Bob
>
> {Note 1}: The faster dynamics won't affect the long-term stable rate
> between an ECN and a non-ECN flow, because by definition long-term
> stable means the dynamics have settled.
> {Note 2}: As suggested in draft-briscoe-aqm-dualq-coupled-00, if the
> congestion control were radically different, the packets would probably
> need another identifier to distinguish them from 'classic' ECN.
>
> On 30/09/15 10:00, Dave Taht wrote:
>> On Wed, Sep 30, 2015 at 1:50 AM, Bob Briscoe 
>> <i...@bobbriscoe.net<mailto:i...@bobbriscoe.net>> wrote:
>>> Andrew,
>>>
>>> I am also not so interested in an AQM dealing directly with unresponsive
>>> traffic - I prefer to keep policing and AQM as separately deployable
>>> functions, because AQM should be policy-neutral, whereas policing inherently
>>> involves policy.
>>>
>>> My concern was merely that CoDel's linear increase in drop probability can
>>> take a long time to reach where it intends to get to. I would have thought
>>> some form of exponential increase, or at least super-linear, would have been
>>> more responsive to changing traffic conditions. I.e., rather than have to
>>> answer the question "how quickly should drop probability increase?", make it
>>> increase increasingly quickly.
>>>
>>> Early on, Rong Pan showed that it takes CoDel ages to bring high load under
>>> control. I think this linear increase is the reason.
>> cake uses a better curve for codel, but we still need to do more
>> testing in the lab.
>>
>> http://www.bufferbloat.net/projects/codel/wiki/CakeTechnical
>>
>>> Bob
>>>
>>>
>>>
>>> On 30/09/15 01:42, Andrew Mcgregor wrote:
>>>
>>> Hmm, that's really interesting.
>>>
>>> Most interesting is that my understanding is that the control law was
>>> intended to deal with aggregates of mostly TCP-like traffic, and that an
>>> overload of unresponsive traffic wasn't much of a goal; this seems like
>>> vaguely reasonable behaviour, I suppose, given that pathological situation.
>>>
>>> But I don't have a way to derive the control law from first principles at
>>> this time (I haven't been working on that for a long time now).
>>>
>>> On 25 September 2015 at 06:27, Bob Briscoe 
>>> <i...@bobbriscoe.net<mailto:i...@bobbriscoe.net>> wrote:
>>>> Toke,
>>>>
>>>> Having originally whinged that no-one ever responded to my original 2013
>>>> posting, now it's my turn to be embarrassed for having missed your
>>>> interesting response for over 3 months.
>>>>
>>>> Cool that the analysis proves correct in practice - always nice.
>>>>
>>>> The question is still open whether this was the intention, and if so why
>>>> this particular control law was intended.
>>>> I would rather we started from a statement of what the control law ought
>>>> to do, then derive it.
>>>>
>>>> Andrew McGregor said he would have a go at this question some time ago...
>>>> Andrew?
>>>>
>>>>
>>>> Bob
>>>>
>>>>
>>>>
>>>> On 07/06/15 20:27, Toke Høiland-Jørgensen wrote:
>>>>
>>>> Hi Bob
>>>>
>>>> Apologies for reviving this ancient thread; been meaning to get around
>>>> to it sooner, but well... better late than never I suppose.
>>>>
>>>> (Web link to your original mail, in case Message-ID referencing breaks:
>>>> https://www.ietf.org/mail-archive/web/aqm/current/msg00376.html ).
>>>>
>>>> Having recently had a need to understand CoDel's behaviour in more
>>>> detail, your analysis popped out of wherever it's been hiding in the
>>>> back of my mind and presented itself as maybe a good place to start. :)
>>>>
>>>> So anyhow, I'm going to skip the initial assertions in your email and
>>>> focus on the analysis:
>>>>
>>>> Here's my working (pls check it - I may have made mistakes)
>>>> _________________
>>>> For brevity, I'll define some briefer variable names:
>>>>          interval =      I [s]
>>>>          next_drop =     D [s]
>>>>          packet-rate =   R [pkt/s]
>>>>          count =         n [pkt]
>>>>
>>>> >From the CoDel control law code:
>>>>          D(n) = I / sqrt(n)
>>>> And the instantaneous drop probability is:
>>>>          p(n) = 1/( R * D(n) )
>>>>
>>>> Then the slope of the rise in drop probability with time is:
>>>>          Delta p / Delta t       = [p(n+1) - p(n)] / D(n)
>>>>                                  = [1/D(n+1) - 1/D(n)] / [ R * D(n) ]
>>>>                                  = sqrt(n) * [sqrt(n+1) - sqrt(n)] /
>>>> [R*I*I]
>>>>                                  = [ sqrt(n(n+1)) - n ] / R*I^2
>>>>
>>>> I couldn't find anything wrong with the derivation. I'm not entirely
>>>> sure that I think it makes sense to speak about an "instantaneous drop
>>>> probability" for an algorithm that is not probabilistic in nature.
>>>> However, interpreting p(n) as "the fraction of packets dropped over the
>>>> interval from D(n) to D(n+1)" makes sense, I guess, and for this
>>>> analysis that works.
>>>>
>>>> At count = 1, the numerator starts at sqrt(2)-1 = 0.414.
>>>> Amd as n increases, it rapidly tends to 1/2.
>>>>
>>>> So CoDel's rate of increase of drop probability with time is nearly
>>>> constant (it
>>>> is always between 0.414 and 0.5) and it rapidly approaches 0.5 after a few
>>>> drops, tending towards:
>>>>          dp/dt = 1/(2*R*I^2)
>>>>
>>>> This constant increase clearly has very little to do with the square-root
>>>> law of
>>>> TCP Reno.
>>>>
>>>> In the above formula, drop probability increases inversely proportional to
>>>> the
>>>> packet rate. For instance, with I = 100ms and 1500B packets
>>>> at 10Mb/s =>    R = 833 pkt/s =>        dp/dt = 6.0% /s
>>>> at 100Mb/s =>   R = 8333 pkt/s =>       dp/dt = 0.6% /s
>>>>
>>>> I also tried to test this. I configured CoDel (on a Linux 4.0 box) on
>>>> 1Mbps, 2Mbps and 10Mbps links with interval settings of 1 second and
>>>> 500ms, and a total packet limit of 100k packets. This was to make it
>>>> deliberately slower to react (so the change in drop probability is more
>>>> visible), and to make sure no packets are dropped from queue overflow.
>>>>
>>>> I then sent an unresponsive UDP stream over the link at 110% of the link
>>>> capacity (as passed to Iperf, so approximately), and collected the
>>>> output of `tc -s qdisc` every 0.2 seconds.
>>>>
>>>> The attached plot is of 'pkts dropped / (pkts sent + pkts dropped)' in a
>>>> 2-second sliding window over the duration of the test (the plot is also
>>>> available here:
>>>> https://kau.toke.dk/ietf/codel-drop-rate/codel-drop-rate.svg ).
>>>>
>>>> I've included linear trend lines from the initial time to the point of
>>>> maximum drop probability, and as is apparent from the plot, got quite a
>>>> good fit (r>0.99 for all six data sets). The legend includes the slopes
>>>> of the linear fits for each of the data sets, which are not too far from
>>>> what your analysis predicts (and I'm guessing the difference can be
>>>> attributed to the difference in exact packet rates, but I haven't
>>>> checked).
>>>>
>>>> The Flent data files with the qdisc stats over time (readable by the
>>>> newest git version of Flent), as well as the Python script I used to
>>>> create the graph are available here:
>>>> https://kau.toke.dk/ietf/codel-drop-rate/
>>>>
>>>> So, in short: It seems that CoDel's "drop rate" does increase linearly
>>>> in the presence of a persistent queue, and that the rate of increase
>>>> depends on both the interval and the link rate.
>>>>
>>>> Now, I'll refrain from commenting on whether or not this is "bad", or
>>>> indeed if it is contrary to design. It was surprising to me at least, so
>>>> I thought I'd share my findings, in the hope that someone would either
>>>> find them useful or tell me how they're wrong (or both!). :)
>>>>
>>>> -Toke
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> aqm mailing list
>>>> a...@ietf.org<mailto:a...@ietf.org>
>>>> https://www.ietf.org/mailman/listinfo/aqm
>>>>
>>>>
>>>> --
>>>> ________________________________________________________________
>>>> Bob Briscoe                               http://bobbriscoe.net/
>>>>
>>>>
>>>> _______________________________________________
>>>> aqm mailing list
>>>> a...@ietf.org<mailto:a...@ietf.org>
>>>> https://www.ietf.org/mailman/listinfo/aqm
>>>>
>>>
>>>
>>> --
>>> Andrew McGregor | SRE | andrewm...@google.com<mailto:andrewm...@google.com> 
>>> | +61 4 1071 2221<tel:%2B61%204%201071%202221>
>>>
>>>
>>> --
>>> ________________________________________________________________
>>> Bob Briscoe                               http://bobbriscoe.net/
>>>
>>>
>>> _______________________________________________
>>> aqm mailing list
>>> a...@ietf.org<mailto:a...@ietf.org>
>>> https://www.ietf.org/mailman/listinfo/aqm
>>>
>>
>>
>
> --
> ________________________________________________________________
> Bob Briscoe                               http://bobbriscoe.net/
>
> _______________________________________________
> aqm mailing list
> a...@ietf.org<mailto:a...@ietf.org>
> https://www.ietf.org/mailman/listinfo/aqm
>
>
>
> --
> Andrew McGregor | SRE | andrewm...@google.com<mailto:andrewm...@google.com> | 
> +61 4 1071 2221
_______________________________________________
Codel mailing list
Codel@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/codel

Re: [Codel] [aqm] CoDel's control law that determines drop frequency

Reply via email to