Re: [Bloat] incremental deployment, transport and L4S (Re: when does the CoDel part of fq_codel help in the real world?)
> On 29 Nov 2018, at 13:52, Jonathan Morton wrote:
>
>> On 29 Nov, 2018, at 2:06 pm, Michael Welzl wrote:
>>
>>> That's my proposal.
>>
>> - and it's an interesting one. Indeed, I wasn't aware that you're thinking
>> of a DCTCP-style signal from a string of packets.
>>
>> Of course, this is hard to get right - there are many possible flavours to
>> ideas like this ... but yes, interesting!
>
> I'm glad you think so. Working title is ELR - Explicit Load Regulation.
>
> As noted, this needs standardisation effort, which is a bit outside my realm
> of experience - Cake was a great success, but relied entirely on exploiting
> existing standards to their logical conclusions. I think I started writing
> some material to put in an I-D, but got distracted by something more urgent.

Well - "interesting" is one thing, "better than current proposals" is another... I guess this needs lots of evaluations before going anywhere.

> If there's an opportunity to coordinate with relevant people from similar
> efforts, so much the better. I wonder, for example, whether the DCTCP folks
> would be open to supporting a more deployable version of their idea, or
> whether that would be a political non-starter for them.

I'm not convinced (and I strongly doubt that they would be) that this would indeed be more deployable; your idea also includes TCP option changes, which have their own deployment trouble... the L4S effort, to me, sounds "easier" to deploy (which is not to say that it's easy to deploy at all; though I did like a recent conversation on possibly deploying it with a PEP... that sounded quite doable to me).

Cheers,
Michael

___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat
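For readers unfamiliar with the "DCTCP-style signal from a string of packets" being discussed: DCTCP derives a congestion *extent* from the fraction of CE-marked packets per RTT, rather than treating any single mark as a binary back-off signal. A minimal sketch of that mechanism (per RFC 8257; this is illustrative only and is not ELR, whose wire format was never specified in this thread):

```python
# Sketch of the DCTCP-style marked-fraction signal. The receiver echoes
# which packets arrived CE-marked; the sender keeps an EWMA (alpha) of the
# marked fraction and scales its window by it. Class and method names are
# made up for illustration.

G = 1 / 16  # EWMA gain; RFC 8257 uses 1/16 as the example value

class DctcpStyleSender:
    def __init__(self, cwnd=10.0):
        self.cwnd = cwnd
        self.alpha = 0.0  # smoothed fraction of CE-marked packets

    def on_ack_window(self, acked, marked):
        """Process one RTT of feedback: `acked` packets total,
        `marked` of them carried a CE echo."""
        frac = marked / acked if acked else 0.0
        self.alpha = (1 - G) * self.alpha + G * frac
        if marked:
            # Back off in proportion to the observed marking fraction,
            # instead of halving on any single mark as classic ECN does.
            self.cwnd = max(2.0, self.cwnd * (1 - self.alpha / 2))
        else:
            self.cwnd += 1  # additive increase
```

The fine granularity comes from `alpha` being a real number in [0, 1] rather than a one-bit loss/mark event, which is what makes shallow marking thresholds workable.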
Re: [Bloat] incremental deployment, transport and L4S (Re: when does the CoDel part of fq_codel help in the real world?)
On Fri, 30 Nov 2018, Jonathan Morton wrote:

> Ah, so you're thinking in terms of link-layers which perform local
> retransmission, like wifi. So the optimisation is to not delay packets
> "behind" a corrupted packet while the latter is retransmitted.

Yes.

> It's possible for a TCP to interpret a reordered packet as missing,
> triggering an end-to-end retransmission which is then discovered to be
> unnecessary. At the application level, TCP also performs the same HoL
> blocking in response to missing data. So it's easy to see why links try to
> preserve ordering, even to this extent, but I suspect they typically do so
> on a per-station basis rather than per-flow.

It's a "truth everybody knows" in networking: "NEVER RE-ORDER PACKETS WITHIN A 5-TUPLE FLOW! THERE BE DRAGONS THERE!" I'd also say I see enough transport people who say that this should hold generally, if nothing else because of legacy.

> Personally I think the problem of reordering packets is overblown, and that
> TCPs can cope with occasional missing or reordered packets without serious
> consequences to performance. So if you add "reordering tolerant" to the
> list of stuff that Diffserv can indicate, you might just end up with all
> traffic being marked that way. Is that really worthwhile?

The question isn't so much about TCP, it's the other things I am worried about. TCP handles re-ordering fairly gracefully; other protocols might not.

> Oddly enough, wifi is now one of the places where FQ is potentially easiest
> to find, with Toke's work reaching the Linux kernel and so many wifi
> routers being Linux based.

Again, even if they're using Linux they will/might have packet accelerators that just grab the flow, and the kernel never sees it again. No FQ_CODEL for that.

> An acknowledged problem is overly persistent retries by the ARQ mechanism,
> such that the time horizon for the link-layer retransmission often exceeds
> that of the end-to-end RTO, both for TCP and request-response protocols
> like DNS. I say, retransmit at the link layer once or twice, then give up
> and let the end-hosts sort it out.

I agree, but I also think that it would help some link-layers if the re-ordering requirement could be relaxed. However, before that can be communicated, a lot of study needs to be done to check whether this is actually true. I've had incidents in my 20-year networking career where it wasn't, and applications misbehaved when packets were re-ordered.

-- 
Mikael Abrahamsson    email: swm...@swm.pp.se
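The "retransmit once or twice, then give up" policy can be sketched as a toy link-layer ARQ with a hard retry cap, so a lossy frame is dropped and left to end-to-end recovery instead of head-of-line blocking everything behind it (names and structure here are purely illustrative, not any real wifi ARQ):

```python
# Toy link transmitter with bounded local retries.
from collections import deque

MAX_RETRIES = 2  # the "once or twice" suggested above

def drain(frames, transmit):
    """`transmit(frame)` returns True on link-layer ACK. Frames that fail
    1 + MAX_RETRIES attempts are dropped: the end-hosts sort it out."""
    delivered, dropped = [], []
    q = deque(frames)
    while q:
        frame = q.popleft()
        for _ in range(1 + MAX_RETRIES):
            if transmit(frame):
                delivered.append(frame)
                break
        else:
            dropped.append(frame)  # give up; no indefinite HoL blocking
    return delivered, dropped
```

The point of the cap is that the link's retry horizon stays well below the end-to-end RTO, so the local recovery attempt can never be slower than the recovery it is trying to avoid.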
Re: [Bloat] when does the CoDel part of fq_codel help in the real world?
On Thu, 29 Nov 2018, Stephen Hemminger wrote:

> The problem is that any protocol is mostly blind to the underlying network
> (and that can change). To use dave's analogy it is like being put in the
> driver seat of a vehicle blind folded. When you step on the gas you don't
> know if it is a dragster, jet fighter, or a soviet tractor. The only way a
> protocol can tell is based on the perceived inertia and when it runs into
> things...

Actually, I've made the argument to IETF TCPM that this is not true. You can carry over data learned by previous flows on the same path so that new flows can re-use it. If no flow in the past hour has been able to run faster than 1 megabit/s, and PMTUD has always arrived at a 1460-byte MTU outbound, then there is a good chance that the next flow will encounter the same thing. Why not use this information when guessing how things will behave going forward?

-- 
Mikael Abrahamsson    email: swm...@swm.pp.se
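The idea of seeding new flows with what earlier flows learned (much like Linux's per-destination TCP metrics cache) can be sketched as a small cache keyed by destination; the field names here are made up for illustration:

```python
# Sketch of a per-destination path-properties cache: record what a
# finished flow observed (achievable rate, path MTU), and hand those
# observations to new flows as starting hints while they are fresh.
import time

class PathCache:
    def __init__(self, ttl=3600.0):
        self.ttl = ttl  # "the past hour" from the discussion above
        self._cache = {}  # dest -> (timestamp, properties dict)

    def record(self, dest, rate_bps, pmtu):
        """Called when a flow ends, with what it learned about the path."""
        props = {"rate_bps": rate_bps, "pmtu": pmtu}
        self._cache[dest] = (time.monotonic(), props)

    def hints(self, dest):
        """Hints for a new flow to `dest`, or None if stale or absent."""
        entry = self._cache.get(dest)
        if entry and time.monotonic() - entry[0] < self.ttl:
            return entry[1]
        return None
```

A new flow would use `hints()` only as an initial guess (e.g. to cap its initial rate probe or skip a PMTUD round trip), falling back to normal discovery when the hint proves wrong.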
Re: [Bloat] incremental deployment, transport and L4S (Re: when does the CoDel part of fq_codel help in the real world?)
On Wed, Nov 28, 2018 at 11:36 PM Jonathan Morton wrote:
>
>> On 29 Nov, 2018, at 9:28 am, Mikael Abrahamsson wrote:
>>
>> This is one thing about L4S, ECT(1) is the last "codepoint" in the header
>> not used, that can statelessly identify something. If anyone sees a better
>> way to use it compared to "let's put it in a separate queue and CE-mark it
>> aggressively at very low queue depths and also do not care about
>> re-ordering so an ARQ L2 can re-order all it wants", then they need to
>> speak up, soon.
>
> You are essentially proposing using ECT(1) to take over an intended function
> of Diffserv. In my view, that is the wrong approach. Better to improve
> Diffserv to the point where it becomes useful in practice. Cake has taken
> steps in that direction, by implementing some reasonable interpretation of
> some Diffserv codepoints.
>
> My alternative use of ECT(1) is more in keeping with the other codepoints
> represented by those two bits, to allow ECN to provide more fine-grained
> information about congestion than it presently does. The main challenge is
> communicating the relevant information back to the sender upon receipt,
> ideally without increasing overhead in the TCP/IP headers.

I felt that using this bit up as a separate indicator of an alternate algorithm in play for indicating congestion was a pretty good idea... but no-one was listening at the time.

-- 
Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-205-9740
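For context on "the last codepoint not used": the ECN field is the two low-order bits of the IP TOS/Traffic Class byte (RFC 3168), and ECT(1) is the one combination with no distinct role today, which is why both L4S and the ideas above want it. A small sketch of the bit layout:

```python
# The four ECN codepoints (RFC 3168), in the low two bits of TOS.
NOT_ECT = 0b00  # not ECN-capable transport
ECT_1   = 0b01  # ECN-capable; the "spare" codepoint being fought over
ECT_0   = 0b10  # ECN-capable (the commonly used marking)
CE      = 0b11  # congestion experienced

def ecn_bits(tos_byte):
    """Extract the ECN field from a TOS/Traffic Class byte."""
    return tos_byte & 0b11

def is_ect(tos_byte):
    """True if the packet is ECN-capable, so an AQM may mark instead of drop."""
    return ecn_bits(tos_byte) in (ECT_0, ECT_1)
```

Because a middlebox can read these two bits statelessly on every packet, ECT(1) really is the only remaining way to classify a flow into an alternate-behaviour queue without Diffserv.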
Re: [Bloat] go's improvements to gc
If you can think in terms of pipes, Go makes you /productive/. I recognized that about 13 hours into learning Go, about two years ago.

My advice as an aphorism: "redo something you've done at least once before, in Go, and see how different it is. Then decide if it's better."

--dave

On 2018-11-29 8:33 p.m., Dave Taht wrote:
> as remarkable as our efforts have been to reduce network bloat, I have to
> take my hat off to the golang garbage collection folk, also. Reductions in
> latencies from 300ms to 500us in 4 years. Good story here about how latency
> is cumulative: https://blog.golang.org/ismmkeynote
>
> The very first thing I learned about go, was how to turn off the garbage
> collector. I guess I have to go learn some go, for real, now.

-- 
David Collier-Brown,         | Always do right. This will gratify
System Programmer and Author | some people and astonish the rest
dav...@spamcop.net           | -- Mark Twain
[Bloat] go's improvements to gc
as remarkable as our efforts have been to reduce network bloat, I have to take my hat off to the golang garbage collection folk, also. Reductions in latencies from 300ms to 500us in 4 years. Good story here about how latency is cumulative:

https://blog.golang.org/ismmkeynote

The very first thing I learned about go, was how to turn off the garbage collector. I guess I have to go learn some go, for real, now.

-- 
Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-205-9740
Re: [Bloat] incremental deployment, transport and L4S (Re: when does the CoDel part of fq_codel help in the real world?)
>> I have to ask, why would the network care? What optimisations can be
>> obtained by reordering packets *within* a flow, when it's usually just as
>> easy to deliver them in order?
>
> Because most implementations aren't flow aware at all and might have 4
> queues, saying "oh, this single queue is for transports that don't care
> about ordering" means everything in that queue can just be sent as soon as
> it can, ignoring HOL caused by ARQ.

Ah, so you're thinking in terms of link-layers which perform local retransmission, like wifi. So the optimisation is to not delay packets "behind" a corrupted packet while the latter is retransmitted.

It's possible for a TCP to interpret a reordered packet as missing, triggering an end-to-end retransmission which is then discovered to be unnecessary. At the application level, TCP also performs the same HoL blocking in response to missing data. So it's easy to see why links try to preserve ordering, even to this extent, but I suspect they typically do so on a per-station basis rather than per-flow.

Personally I think the problem of reordering packets is overblown, and that TCPs can cope with occasional missing or reordered packets without serious consequences to performance. So if you add "reordering tolerant" to the list of stuff that Diffserv can indicate, you might just end up with all traffic being marked that way. Is that really worthwhile?

>> Of course, we already have FQ which reorders packets in *different* flows.
>> The benefits are obvious in that case.
>
> FQ is a fringe in real life (speaking as a packet moving monkey). It's just
> on this mailing list that it's the norm.

Oddly enough, wifi is now one of the places where FQ is potentially easiest to find, with Toke's work reaching the Linux kernel and so many wifi routers being Linux based.

An acknowledged problem is overly persistent retries by the ARQ mechanism, such that the time horizon for the link-layer retransmission often exceeds that of the end-to-end RTO, both for TCP and request-response protocols like DNS. I say, retransmit at the link layer once or twice, then give up and let the end-hosts sort it out.

 - Jonathan Morton
Re: [Bloat] when does the CoDel part of fq_codel help in the real world?
Mario,

putting aside LoLa for a second, I'm not quite sure that the theorem you cite is valid.

According to the model, R_i is the sending rate. The sum across all flows bottlenecked at the link does not need to equal the link capacity: the input rate can be instantaneously below or above the link capacity, even if the link is never idle and the output rate is always C for all t.

If sum_i R_i = C is false because of what I said, then p_i, which is nothing more than the shadow price of the link capacity constraint, can be a function of a constant delay d, i.e. p_i = cost * d for all i.

This theorem can be valid only if the input rate of a queue is instantaneously equal to its output rate. We all know that a queue exists precisely because there is an instantaneous difference between input and output rates at the link. So to conclude: the theorem is valid iff input rate == output rate, in which case the queue is always zero, i.e. d = 0.

The theorem is either an artefact of the model or just wrong. Or I'm missing something...

On Thu, Nov 29, 2018 at 6:07 PM Mario Hock wrote:

> Hi Luca,
>
> I'm answering on behalf of Roland, since I am a co-author of the paper.
>
> This is an excellent question, since it goes right at the heart of how
> LoLa works.
>
> Indeed, the paper is a first of a series. A second one, looking deeper
> into the fair flow balancing mechanism, is currently under submission.
>
> Similar as other delay based congestion controls, LoLa tries to achieve
> fairness by allowing each flow to buffer the same amount of data at the
> bottleneck. We have this, e.g., in TCP Vegas, and (in a way) also in Copa
> (a recently proposed congestion control) and many others. If this is
> achieved, we get flow rate fairness independent of a flow's RTT.
>
> Usually (in other congestion controls) this "allowed amount of data" is
> fixed per flow. We presume that this approach does not scale well to high
> speed networks, since the queuing delay resulting from this amount of data
> is reduced with increasing bottleneck rate. Thus, it becomes harder to
> measure it right. This can easily be seen (and proven) for TCP Vegas.
>
> Note: Just using higher fixed values is not an option, since it would not
> work at lower speeds anymore and also not with a large number of flows.
>
> Therefore, LoLa tries to find a suitable value for the "allowed amount of
> data" dynamically. This is X(t).
>
> Our approach is to grow X(t) over time during the Fair Flow Balancing
> phase. This phase ends when the queuing delay reaches 5ms. Thus, (in the
> ideal case) at the end of Fair Flow Balancing, X(t) is just as large that
> all flows at the bottleneck create a queuing delay of 5ms, and all flows
> contribute equally to this queue. Hence, flow rate fairness is achieved.
> (Note that LoLa is designed in a way that t is (almost) synchronized among
> the competing flows.)
>
> Generally, other ways of determining a suitable X(t) are conceivable. In
> our approach X(t) is a monotonically increasing function, but it is
> regularly reset as LoLa cycles between its states; i.e., after a queuing
> delay of 5ms is reached, the queue is drained and everything starts again.
> (Thus, the timespan where X(t) is monotonically increased is called a
> "round of fair flow balancing".)
>
> This way we can overcome the constraint given in [1]:
>
> """
> THEOREM 6 (FAIRNESS/DELAY TRADEOFF). For congestion control mechanisms
> that have steady state throughput of the kind R = f(d, p), for some
> function f, delay d and feedback p, if the feedback is based on purely end
> to end delay measurements, you can either have fairness or a fixed delay,
> but not both simultaneously.
> """
>
> [1] "ECN or Delay: Lessons Learnt from Analysis of DCQCN and TIMELY",
> Yibo Zhu et al., https://dl.acm.org/citation.cfm?id=2999593
>
> Best, Mario
>
> Am 29.11.18 um 17:09 schrieb Luca Muscariello:
> > Hi Roland,
> >
> > It took me quite a lot of time to find this message in the thread... I
> > read the paper you sent and I guess this is the first of a series as
> > many things stay uncovered.
> >
> > Just a quick question: why is X(t) always increasing with t?
> >
> > On Tue, Nov 27, 2018 at 11:26 AM Bless, Roland (TM)
> > <roland.bl...@kit.edu> wrote:
> >
> > Hi Luca,
> >
> > Am 27.11.18 um 10:24 schrieb Luca Muscariello:
> > > A congestion controlled protocol such as TCP or others, including
> > > QUIC, LEDBAT and so on need at least the BDP in the transmission
> > > queue to get full link efficiency, i.e. the queue never empties out.
> >
> > This is not true. There are congestion control algorithms (e.g., TCP
> > LoLa [1] or BBRv2) that can fully utilize the bottleneck link capacity
> > without filling the buffer to its maximum capacity. The BDP rule of
> > thumb basically stems from the older loss-based congestion control
> > variants that profit from the standing queue that they built over time
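For what it's worth, a back-of-envelope Little's-law reading of the fixed-X scheme the paper argues against, under the steady-state assumption (input rate equal to output rate) that Luca questions above:

```latex
% N flows share a bottleneck of capacity C; each flow keeps X bytes queued.
% By Little's law, flow i's queued data equals R_i d_q, hence
R_i\, d_q = X \;\Rightarrow\; R_i = \frac{X}{d_q}
  \quad\text{(equal for all } i\text{, independent of RTT)},
\qquad
\sum_i R_i = C \;\Rightarrow\; \frac{N X}{d_q} = C
  \;\Rightarrow\; d_q = \frac{N X}{C}.
```

Under these assumptions, fairness holds for any fixed X, but the queuing delay d_q grows linearly with the number of flows N, which is exactly the tradeoff Theorem 6 describes and the reason LoLa adapts X(t) instead, stopping its growth once d_q reaches 5 ms.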
Re: [Bloat] when does the CoDel part of fq_codel help in the real world?
On Thu, Nov 29, 2018 at 10:43 AM Stephen Hemminger wrote:
>
> On Wed, 28 Nov 2018 23:35:53 -0800 Dave Taht wrote:
>
> > > As someone who works with moving packets, it's perplexing to me to
> > > interact with transport peeps who seem enormously focused on "goodput".
> > > My personal opinion is that most people would be better off with 80% of
> > > their available bandwidth being in use without any noticable buffer
> > > induced delay, as opposed to the transport protocol doing its damndest
> > > to fill up the link to 100% and sometimes failing and inducing delay
> > > instead.
>
> The problem is that any protocol is mostly blind to the underlying network
> (and that can change). To use dave's analogy it is like being put in the
> driver seat of a vehicle blind folded. When you step on the gas you don't
> know if it is a dragster, jet fighter, or a soviet tractor. The only way a
> protocol can tell is based on the perceived inertia and when it runs into
> things...

coffee. blown through nose. thx! I'm still chuckling, because I also referenced a church. You sit in a pew, and *pray* for something good to happen.

-- 
Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-205-9740
Re: [Bloat] when does the CoDel part of fq_codel help in the real world?
On Wed, 28 Nov 2018 23:35:53 -0800 Dave Taht wrote:

> > As someone who works with moving packets, it's perplexing to me to
> > interact with transport peeps who seem enormously focused on "goodput".
> > My personal opinion is that most people would be better off with 80% of
> > their available bandwidth being in use without any noticable buffer
> > induced delay, as opposed to the transport protocol doing its damndest to
> > fill up the link to 100% and sometimes failing and inducing delay
> > instead.

The problem is that any protocol is mostly blind to the underlying network (and that can change). To use dave's analogy it is like being put in the driver seat of a vehicle blind folded. When you step on the gas you don't know if it is a dragster, jet fighter, or a soviet tractor. The only way a protocol can tell is based on the perceived inertia and when it runs into things...
Re: [Bloat] found another good use for a queue today, possibly
His thesis is more clear:

https://sites.google.com/site/yuriyarbitman/Home/de-amortizedcuckoohashing

He did exclude the cost of a resize, but, still... I find the core idea very attractive. We swapped an email and he said:

> In general, I would say that a cryptographic hash function will do. If you
> want to use a non-cryptographic hash function, then the question is what
> provable random properties it has. This is also discussed in the thesis and
> in the paper.

On Mon, Nov 26, 2018 at 6:17 PM Dave Taht wrote:
>
> I had been investigating various hashing schemes for speeding up the babeld
> routing protocol daemon, and dealing with annoying bursty cpu behavior
> (resizing memory, bursts of packets, thundering herds of retractions), and,
> although it's a tough slog of a read, this adds a queue to cuckoo hashing
> to good effect in flattening out insertion time.
>
> https://arxiv.org/pdf/0903.0391.pdf
>
> But for all I know it's dependent on angels dancing on saddles mounted on
> unicorns. I skip to the graphs for insertion time and go back to the text
> for another round...
>
> "polylog(n)-wise Independent Hash Function". OK, my google-foo fails me:
> the authors use sha1, would something lighter weight suit?

-- 
Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-205-9740
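The core idea (adding a queue to cuckoo hashing so each operation does only a bounded amount of displacement work) can be sketched very loosely as below. This is a toy reading of the de-amortization trick only: it uses Python's built-in hash rather than the polylog(n)-wise independent functions the paper requires, and omits resizing and the paper's stash analysis.

```python
# Toy de-amortized cuckoo hash: inserts go through a queue, and each
# operation performs at most `steps_per_op` displacement steps, pushing
# leftover evictions back onto the queue instead of looping until
# placement succeeds. This bounds the worst-case work per insert.
from collections import deque

class DeamortizedCuckoo:
    def __init__(self, size=64, steps_per_op=4):
        self.size = size
        self.steps = steps_per_op
        self.tables = [[None] * size, [None] * size]
        self.queue = deque()  # (table index, key) pairs awaiting placement

    def _slot(self, which, key):
        return hash((which, key)) % self.size

    def insert(self, key):
        self.queue.append((0, key))
        self._work()

    def _work(self):
        # Bounded displacement work per operation: this is what flattens
        # out the bursty worst-case insertion time.
        for _ in range(self.steps):
            if not self.queue:
                return
            which, key = self.queue.popleft()
            i = self._slot(which, key)
            victim = self.tables[which][i]
            self.tables[which][i] = key
            if victim is not None and victim != key:
                self.queue.append((1 - which, victim))

    def contains(self, key):
        return (self.tables[0][self._slot(0, key)] == key
                or self.tables[1][self._slot(1, key)] == key
                or any(k == key for _, k in self.queue))
```

Note that lookups must also consult the pending queue, since an item may legitimately live there for several operations; that is the price paid for the O(1) worst-case insert step.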
Re: [Bloat] when does the CoDel part of fq_codel help in the real world?
Hi Luca,

I'm answering on behalf of Roland, since I am a co-author of the paper.

This is an excellent question, since it goes right at the heart of how LoLa works.

Indeed, the paper is the first of a series. A second one, looking deeper into the fair flow balancing mechanism, is currently under submission.

Similar to other delay-based congestion controls, LoLa tries to achieve fairness by allowing each flow to buffer the same amount of data at the bottleneck. We have this, e.g., in TCP Vegas, and (in a way) also in Copa (a recently proposed congestion control) and many others. If this is achieved, we get flow rate fairness independent of a flow's RTT.

Usually (in other congestion controls) this "allowed amount of data" is fixed per flow. We presume that this approach does not scale well to high speed networks, since the queuing delay resulting from this amount of data shrinks as the bottleneck rate increases, and it thus becomes harder to measure it correctly. This can easily be seen (and proven) for TCP Vegas.

Note: just using higher fixed values is not an option, since it would not work at lower speeds anymore, nor with a large number of flows.

Therefore, LoLa tries to find a suitable value for the "allowed amount of data" dynamically. This is X(t).

Our approach is to grow X(t) over time during the Fair Flow Balancing phase. This phase ends when the queuing delay reaches 5ms. Thus, (in the ideal case) at the end of Fair Flow Balancing, X(t) is just large enough that all flows at the bottleneck create a queuing delay of 5ms, and all flows contribute equally to this queue. Hence, flow rate fairness is achieved. (Note that LoLa is designed in a way that t is (almost) synchronized among the competing flows.)

Generally, other ways of determining a suitable X(t) are conceivable. In our approach X(t) is a monotonically increasing function, but it is regularly reset as LoLa cycles between its states; i.e., after a queuing delay of 5ms is reached, the queue is drained and everything starts again. (Thus, the timespan where X(t) is monotonically increased is called a "round of fair flow balancing".)

This way we can overcome the constraint given in [1]:

"""
THEOREM 6 (FAIRNESS/DELAY TRADEOFF). For congestion control mechanisms that have steady state throughput of the kind R = f(d, p), for some function f, delay d and feedback p, if the feedback is based on purely end to end delay measurements, you can either have fairness or a fixed delay, but not both simultaneously.
"""

[1] "ECN or Delay: Lessons Learnt from Analysis of DCQCN and TIMELY", Yibo Zhu et al., https://dl.acm.org/citation.cfm?id=2999593

Best, Mario

Am 29.11.18 um 17:09 schrieb Luca Muscariello:
> Hi Roland,
>
> It took me quite a lot of time to find this message in the thread... I read
> the paper you sent and I guess this is the first of a series as many things
> stay uncovered.
>
> Just a quick question: why is X(t) always increasing with t?
>
> On Tue, Nov 27, 2018 at 11:26 AM Bless, Roland (TM) <roland.bl...@kit.edu>
> wrote:
>
>> Hi Luca,
>>
>> Am 27.11.18 um 10:24 schrieb Luca Muscariello:
>>> A congestion controlled protocol such as TCP or others, including QUIC,
>>> LEDBAT and so on need at least the BDP in the transmission queue to get
>>> full link efficiency, i.e. the queue never empties out.
>>
>> This is not true. There are congestion control algorithms (e.g., TCP LoLa
>> [1] or BBRv2) that can fully utilize the bottleneck link capacity without
>> filling the buffer to its maximum capacity. The BDP rule of thumb
>> basically stems from the older loss-based congestion control variants
>> that profit from the standing queue that they built over time when they
>> detect a loss: while they back-off and stop sending, the queue keeps the
>> bottleneck output busy and you'll not see underutilization of the link.
>> Moreover, once you get good loss de-synchronization, the buffer size
>> requirement for multiple long-lived flows decreases.
>>
>>> This gives rule of thumbs to size buffers which is also very practical
>>> and thanks to flow isolation becomes very accurate.
>>
>> The positive effect of buffers is merely their role to absorb short-term
>> bursts (i.e., mismatch in arrival and departure rates) instead of
>> dropping packets. One does not need a big buffer to fully utilize a link
>> (with perfect knowledge you can keep the link saturated even without a
>> single packet waiting in the buffer). Furthermore, large buffers (e.g.,
>> using the BDP rule of thumb) are not useful/practical anymore at very
>> high speed such as 100 Gbit/s: memory is also quite costly at such high
>> speeds...
>>
>> Regards,
>> Roland
>>
>> [1] M. Hock, F. Neumeister, M. Zitterbart, R. Bless. TCP LoLa: Congestion
>> Control for Low Latencies and High Throughput. Local Computer Networks
>> (LCN), 2017 IEEE 42nd Conference on, pp. 215-218, Singapore, Singapore,
>> October 2017
Re: [Bloat] when does the CoDel part of fq_codel help in the real world?
Hi Roland,

It took me quite a lot of time to find this message in the thread... I read the paper you sent and I guess this is the first of a series, as many things stay uncovered.

Just a quick question: why is X(t) always increasing with t?

On Tue, Nov 27, 2018 at 11:26 AM Bless, Roland (TM) wrote:

> Hi Luca,
>
> Am 27.11.18 um 10:24 schrieb Luca Muscariello:
> > A congestion controlled protocol such as TCP or others, including QUIC,
> > LEDBAT and so on need at least the BDP in the transmission queue to get
> > full link efficiency, i.e. the queue never empties out.
>
> This is not true. There are congestion control algorithms (e.g., TCP LoLa
> [1] or BBRv2) that can fully utilize the bottleneck link capacity without
> filling the buffer to its maximum capacity. The BDP rule of thumb basically
> stems from the older loss-based congestion control variants that profit
> from the standing queue that they built over time when they detect a loss:
> while they back-off and stop sending, the queue keeps the bottleneck output
> busy and you'll not see underutilization of the link. Moreover, once you
> get good loss de-synchronization, the buffer size requirement for multiple
> long-lived flows decreases.
>
> > This gives rule of thumbs to size buffers which is also very practical
> > and thanks to flow isolation becomes very accurate.
>
> The positive effect of buffers is merely their role to absorb short-term
> bursts (i.e., mismatch in arrival and departure rates) instead of dropping
> packets. One does not need a big buffer to fully utilize a link (with
> perfect knowledge you can keep the link saturated even without a single
> packet waiting in the buffer). Furthermore, large buffers (e.g., using the
> BDP rule of thumb) are not useful/practical anymore at very high speed such
> as 100 Gbit/s: memory is also quite costly at such high speeds...
>
> Regards,
> Roland
>
> [1] M. Hock, F. Neumeister, M. Zitterbart, R. Bless.
> TCP LoLa: Congestion Control for Low Latencies and High Throughput.
> Local Computer Networks (LCN), 2017 IEEE 42nd Conference on, pp. 215-218,
> Singapore, Singapore, October 2017
> http://doc.tm.kit.edu/2017-LCN-lola-paper-authors-copy.pdf
>
> > Which is:
> >
> > 1) find a way to keep the number of backlogged flows at a reasonable
> > value. This largely depends on the minimum fair rate an application may
> > need in the long term. We discussed a little bit of available mechanisms
> > to achieve that in the literature.
> >
> > 2) fix the largest RTT you want to serve at full utilization and size
> > the buffer using BDP * N_backlogged. Or the other way round: check how
> > much memory you can use in the router/line card/device and, for a fixed
> > N, compute the largest RTT you can serve at full utilization.
> >
> > 3) there is still some memory to dimension for sparse flows in addition
> > to that, but this is not based on BDP. It is just enough to compute the
> > total utilization of sparse flows and use the same simple model Toke has
> > used to compute the (de)prioritization probability.
> >
> > This procedure would allow to size FQ_codel but also SFQ. It would be
> > interesting to compare the two under this buffer sizing. It would also
> > be interesting to compare another mechanism that we have mentioned
> > during the defense, which is AFD + a sparse flow queue. Which is, BTW,
> > already available in Cisco nexus switches for data centres.
> >
> > I think that the codel part would still provide the ECN feature, that
> > all the others cannot have. However the others, the last one especially,
> > can be implemented in silicon with reasonable cost.
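The sizing procedure in step 2 above is a one-line calculation; a numeric sketch, using the thread's own formulation (buffer = BDP of the largest RTT served, times the number of backlogged flows):

```python
# Buffer sizing per the procedure quoted above. Illustrative only; the
# thread itself notes the sparse-flow allowance (step 3) is extra.
def bdp_bytes(capacity_bps, rtt_s):
    """Bandwidth-delay product in bytes for one flow at full utilization."""
    return capacity_bps * rtt_s / 8

def buffer_bytes(capacity_bps, rtt_max_s, n_backlogged):
    """Step 2: BDP(largest RTT to serve) * number of backlogged flows."""
    return bdp_bytes(capacity_bps, rtt_max_s) * n_backlogged

# Example: a 100 Mbit/s link serving up to 100 ms RTT with 16 backlogged
# flows -> bdp_bytes(100e6, 0.1) is 1.25 MB, so 20 MB of buffer.
```

Run "the other way round", the same relation gives the largest RTT a fixed memory budget can serve at full utilization: rtt_max = buffer * 8 / (capacity * N).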
Re: [Bloat] incremental deployment, transport and L4S (Re: when does the CoDel part of fq_codel help in the real world?)
Hi Michael,

Am 29.11.18 um 13:12 schrieb Michael Welzl:
> I'm answering myself with an add-on thought:
>
>> On 29 Nov 2018, at 09:08, Michael Welzl wrote:
>>
>>> On 29 Nov 2018, at 08:46, Mikael Abrahamsson wrote:
>>>
>>> On Thu, 29 Nov 2018, Jonathan Morton wrote:
>>>
>>>> In my view, that is the wrong approach. Better to improve Diffserv to
>>>> the point where it becomes useful in practice.
>>>
>>> I agree, but unfortunately nobody has made me king of the Internet yet
>>> so I can't just decree it into existance.
>>
>> Well, for what you want (re-ordering tolerance), I would think that the
>> LE codepoint is suitable. From:
>> https://tools.ietf.org/html/draft-ietf-tsvwg-le-phb-06
>> "there ought to be an expectation that packets of the LE PHB could be
>> excessively delayed or dropped when any other traffic is present"
>>
>> ... I think it would be strange for an application to expect this, yet
>> not expect it to happen for only a few individual packets from a stream.
>
> Actually, maybe this is a problem: the semantics of LE are way broader than
> "tolerant to re-ordering". What about applications that are
> reordering-tolerant, yet still latency critical?

Yep, the LE semantics are basically that you're expecting to just utilize any spare capacity (which may not be available for some longer periods). Re-ordering of LE packets shouldn't normally occur, as packets of a particular flow should all be in the same LE queue.

> E.g., if I use a protocol that can hand over messages out of order (e.g.
> SCTP, and imagine it running over UDP if that helps), then the benefit of
> this is typically to get messages delivered faster (without receiver-side
> HOL blocking).
> But then, wouldn't it be good to have a way to tell the network "I don't
> care about ordering"?
>
> It seems to me that we'd need a new codepoint for that.

Too few DiffServ codepoints for too many purposes available. :-)

Most of the DiffServ PHBs observe the recommendation of RFC 2474: "It is RECOMMENDED that PHB implementations do not introduce any packet re-ordering within a microflow."

> But, it also seems to me that this couldn't get standardised because that
> standard would embrace a layer violation (caring about a transport
> connection), even though that has been implemented for ages.

Just from a logical perspective, a re-ordering property could be _one_ attribute of a per-hop behavior (PHB), but a PHB very likely has further properties that specify the packet forwarding treatment. So re-ordering is probably orthogonal to other PHB features. But having a new (best-effort + re-ordering tolerant) PHB could be useful for some cases...

Regards,
Roland
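For concreteness, the LE PHB being discussed was later assigned DSCP 000001 (RFC 8622), which sits in the upper six bits of the IP TOS/Traffic Class byte. A minimal sketch of opting a socket's traffic into LE:

```python
# Mark a socket's outbound IPv4 traffic with the Lower Effort DSCP.
# DSCP occupies TOS bits 7..2; the low two bits are the ECN field,
# left here as Not-ECT.
import socket

DSCP_LE = 0b000001        # Lower Effort PHB (RFC 8622)
TOS_LE = DSCP_LE << 2     # == 0x04 on the wire

def mark_le(sock):
    """Set the LE codepoint on `sock`; returns the TOS value read back."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, TOS_LE)
    return sock.getsockopt(socket.IPPROTO_IP, socket.IP_TOS)
```

Whether any given network actually deprioritizes (or remarks) LE traffic is, of course, exactly the deployment question the thread is wrestling with.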
Re: [Bloat] when does the CoDel part of fq_codel help in the real world?
Hi Jonathan, On 29.11.18 at 08:45, Jonathan Morton wrote: >> On 29 Nov, 2018, at 9:39 am, Dave Taht wrote: >> >> …when it is nearly certain that more than one flow exists, means aiming >> for the BDP in a single flow is generally foolish. > > It might be more accurate to say that the BDP of the fair-share of the path > is the cwnd to aim for. Plus epsilon for probing. +1 Right, my statement wasn't about buffer sizing, but about the amount of inflight data (see other mail). Interestingly enough, it seems hard to find out the current fair share without any queue, through which the flows indirectly interact with each other... Regards, Roland
Re: [Bloat] incremental deployment, transport and L4S (Re: when does the CoDel part of fq_codel help in the real world?)
On Thu, 29 Nov 2018, Jonathan Morton wrote: I have to ask, why would the network care? What optimisations can be obtained by reordering packets *within* a flow, when it's usually just as easy to deliver them in order? Because most implementations aren't flow aware at all and might have 4 queues, saying "oh, this single queue is for transports that don't care about ordering" means everything in that queue can just be sent as soon as it can, ignoring HOL caused by ARQ. Of course, we already have FQ which reorders packets in *different* flows. The benefits are obvious in that case. FQ is a fringe in real life (speaking as a packet moving monkey). It's just on this mailing list that it's the norm. -- Mikael Abrahamsson email: swm...@swm.pp.se
Re: [Bloat] incremental deployment, transport and L4S (Re: when does the CoDel part of fq_codel help in the real world?)
> On 29 Nov, 2018, at 2:12 pm, Michael Welzl wrote: > > But then, wouldn't it be good to have a way to tell the network "I don't care > about ordering" ? I have to ask, why would the network care? What optimisations can be obtained by reordering packets *within* a flow, when it's usually just as easy to deliver them in order? Of course, we already have FQ which reorders packets in *different* flows. The benefits are obvious in that case. - Jonathan Morton
Re: [Bloat] incremental deployment, transport and L4S (Re: when does the CoDel part of fq_codel help in the real world?)
> On 29 Nov, 2018, at 2:06 pm, Michael Welzl wrote: > >> That's my proposal. > > - and it's an interesting one. Indeed, I wasn't aware that you're thinking of > a DCTCP-style signal from a string of packets. > > Of course, this is hard to get right - there are many possible flavours to > ideas like this ... but yes, interesting! I'm glad you think so. Working title is ELR - Explicit Load Regulation. As noted, this needs standardisation effort, which is a bit outside my realm of experience - Cake was a great success, but relied entirely on exploiting existing standards to their logical conclusions. I think I started writing some material to put in an I-D, but got distracted by something more urgent. If there's an opportunity to coordinate with relevant people from similar efforts, so much the better. I wonder, for example, whether the DCTCP folks would be open to supporting a more deployable version of their idea, or whether that would be a political non-starter for them. - Jonathan Morton
Re: [Bloat] incremental deployment, transport and L4S (Re: when does the CoDel part of fq_codel help in the real world?)
I'm answering myself with an add-on thought: > On 29 Nov 2018, at 09:08, Michael Welzl wrote: > > > >> On 29 Nov 2018, at 08:46, Mikael Abrahamsson wrote: >> >> On Thu, 29 Nov 2018, Jonathan Morton wrote: >> >>> In my view, that is the wrong approach. Better to improve Diffserv to the >>> point where it becomes useful in practice. >> >> I agree, but unfortunately nobody has made me king of the Internet yet so I >> can't just decree it into existence. > > Well, for what you want (re-ordering tolerance), I would think that the LE > codepoint is suitable. From: > https://tools.ietf.org/html/draft-ietf-tsvwg-le-phb-06 > "there ought to be an expectation that packets of the LE PHB could be > excessively delayed or dropped when any other traffic is present" > > ... I think it would be strange for an application to expect this, yet not > expect it to happen for only a few individual packets from a stream. Actually, maybe this is a problem: the semantics of LE are way broader than "tolerant to re-ordering". What about applications that are reordering-tolerant, yet still latency critical? E.g., if I use a protocol that can hand over messages out of order (e.g. SCTP, and imagine it running over UDP if that helps), then the benefit of this is typically to get messages delivered faster (without receiver-side HOL blocking). But then, wouldn't it be good to have a way to tell the network "I don't care about ordering" ? It seems to me that we'd need a new codepoint for that. But, it also seems to me that this couldn't get standardised because that standard would embrace a layer violation (caring about a transport connection), even though that has been implemented for ages. :-( Cheers, Michael
Re: [Bloat] incremental deployment, transport and L4S (Re: when does the CoDel part of fq_codel help in the real world?)
> On 29 Nov 2018, at 11:30, Jonathan Morton wrote: > My alternative use of ECT(1) is more in keeping with the other codepoints represented by those two bits, to allow ECN to provide more fine-grained information about congestion than it presently does. The main challenge is communicating the relevant information back to the sender upon receipt, ideally without increasing overhead in the TCP/IP headers. >>> >>> You need to go into the IETF process and voice this opinion then, because >>> if nobody opposes in the near time then ECT(1) might go to L4S >>> interpretation of what is going on. They do have ECN feedback mechanisms in >>> their proposal, have you read it? It's a whole suite of documents, >>> architecture, AQM proposal, transport proposal, the entire thing. >>> >>> On the other hand, what you want to do and what L4S tries to do might be >>> closely related. It doesn't sound too far off. >> >> Indeed I think that the proposal of finer-grain feedback using 2 bits >> instead of one is not adding anything to, but in fact strictly weaker than >> L4S, where the granularity is in the order of the number of packets that you >> sent per RTT, i.e. much higher. > > An important facet you may be missing here is that we don't *only* have 2 > bits to work with, but a whole sequence of packets carrying these 2-bit > codepoints. We can convey fine-grained information by setting codepoints > stochastically or in a pattern, rather than by merely choosing one of the > three available (ignoring Not-ECT). The receiver can then observe the > density of codepoints and report that to the sender. > > Which is more-or-less the premise of DCTCP. However, DCTCP changes the > meaning of CE, instead of making use of ECT(1), which I think is the big > mistake that makes it undeployable. > > So, from the middlebox perspective, very little changes. ECN-capable packets > still carry ECT(0) or ECT(1). 
You still set CE on ECT packets, or drop > Non-ECT packets, to signal when a serious level of persistent queue has > developed, so that the sender needs to back off a lot. But if a less serious > congestion condition exists, you can now signal *that* by changing some > proportion of ECT(0) codepoints to ECT(1), with the intention that senders > either reduce their cwnd growth rate, halt growth entirely, or enter a > gradual decline. Those are three things that ECN cannot currently signal. > > This change is invisible to existing, RFC-compliant, deployed middleboxes and > endpoints, so should be completely backwards-compatible and incrementally > deployable in the network. (The only thing it breaks is the optional ECN > integrity RFC that, according to fairly recent measurements, literally nobody > bothered implementing.) > > Through TCP Timestamps, both sender and receiver can know fairly precisely > when a round-trip has occurred. The receiver can use this information to > calculate the ratio of ECT(0) and ECT(1) codepoints received in the most > recent RTT. A new TCP Option could replace TCP Timestamps and the two bytes > of padding that usually go with it, allowing reporting of this ratio without > actually increasing the size of the TCP header. Large cwnds can be > accommodated at the receiver by shifting both counters right until they both > fit in a byte each; it is the ratio between them that is significant. > > It is then incumbent on the sender to do something useful with that > information. A reasonable idea would be to aim for a 1:1 ratio via an > integrating control loop. Receipt of even one ECT(1) signal might be > considered grounds for exiting slow-start, while exceeding 1:2 ratio should > limit growth rate to "Reno linear" semantics (significant for CUBIC), and > exceeding 2:1 ratio should trigger a "Reno linear" *decrease* of cwnd. 
> Through all this, a single CE mark (reported in the usual way via ECE and > CWR) still has the usual effect of a multiplicative decrease. > > That's my proposal. - and it's an interesting one. Indeed, I wasn't aware that you're thinking of a DCTCP-style signal from a string of packets. Of course, this is hard to get right - there are many possible flavours to ideas like this ... but yes, interesting! Cheers, Michael
Re: [Bloat] incremental deployment, transport and L4S (Re: when does the CoDel part of fq_codel help in the real world?)
On Thu, 29 Nov 2018, Sebastian Moeller wrote: As far as I can tell intel is pushing atom/x86 cores into its docsis SoCs (puma5/6/7) as well as into the high-end dsl SoCs (formerly lantiq, https://www.intel.com/content/www/us/en/smart-home/anywan-grx750-home-gateway-brief.html?wapkw=grx750), I am quite confident that those also pack enough punch for CPU based routing at Gbps-rates. In docsis modems these are already rolled-out, I do not know of any DSL modem/router that uses the GRX750 "10 Gbit/s packet processor". Game over, again. Call me naive, but the solution to the impasse at getting a common definition of diffserv agreed upon is replacing all TCP CC algorithms? This is replacing changing all endpoints (and network nodes) to honor diffserve with changing all endpoints to use a different TCP CC. At least I would call that ambitious (unless L4S offers noticeable advantages for all participating without being terribly unfair to the non-participating legacy TCP users*). L4S proposes a separate queue for the L4S compatible traffic, and some kind of fair split between L4S and non-L4S traffic. I guess it's kind of along the lines of my earlier proposals about having some kind of fair split with 3 queues for PHB LE, BE and the rest. It makes it deployable in current HW without the worst kind of DDoS downsides imaginable. The Internet is all about making things incrementally deployable. It's very frustrating, but that's the way it is. Whatever we want to propose needs to work so-so with what's already out there and it's ok if it takes a while before it makes everything better. I'd like diffserv to work better, but it would take a lot of work in the operator community to bring it out to where it needs to be. It's not hopeless though, and I think https://tools.ietf.org/html/draft-ietf-tsvwg-le-phb-06 is one step in the right direction. Just the fact that we might have two queues instead of one in the simplest implementations might help. 
The first step is to get ISPs to not bleach diffserv but at least allow 000xxx. -- Mikael Abrahamsson email: swm...@swm.pp.se
Re: [Bloat] incremental deployment, transport and L4S (Re: when does the CoDel part of fq_codel help in the real world?)
>>> My alternative use of ECT(1) is more in keeping with the other codepoints >>> represented by those two bits, to allow ECN to provide more fine-grained >>> information about congestion than it presently does. The main challenge is >>> communicating the relevant information back to the sender upon receipt, >>> ideally without increasing overhead in the TCP/IP headers. >> >> You need to go into the IETF process and voice this opinion then, because if >> nobody opposes in the near time then ECT(1) might go to L4S interpretation >> of what is going on. They do have ECN feedback mechanisms in their proposal, >> have you read it? It's a whole suite of documents, architecture, AQM >> proposal, transport proposal, the entire thing. >> >> On the other hand, what you want to do and what L4S tries to do might be >> closely related. It doesn't sound too far off. > > Indeed I think that the proposal of finer-grain feedback using 2 bits instead > of one is not adding anything to, but in fact strictly weaker than L4S, where > the granularity is in the order of the number of packets that you sent per > RTT, i.e. much higher. An important facet you may be missing here is that we don't *only* have 2 bits to work with, but a whole sequence of packets carrying these 2-bit codepoints. We can convey fine-grained information by setting codepoints stochastically or in a pattern, rather than by merely choosing one of the three available (ignoring Not-ECT). The receiver can then observe the density of codepoints and report that to the sender. Which is more-or-less the premise of DCTCP. However, DCTCP changes the meaning of CE, instead of making use of ECT(1), which I think is the big mistake that makes it undeployable. So, from the middlebox perspective, very little changes. ECN-capable packets still carry ECT(0) or ECT(1). 
You still set CE on ECT packets, or drop Non-ECT packets, to signal when a serious level of persistent queue has developed, so that the sender needs to back off a lot. But if a less serious congestion condition exists, you can now signal *that* by changing some proportion of ECT(0) codepoints to ECT(1), with the intention that senders either reduce their cwnd growth rate, halt growth entirely, or enter a gradual decline. Those are three things that ECN cannot currently signal. This change is invisible to existing, RFC-compliant, deployed middleboxes and endpoints, so should be completely backwards-compatible and incrementally deployable in the network. (The only thing it breaks is the optional ECN integrity RFC that, according to fairly recent measurements, literally nobody bothered implementing.) Through TCP Timestamps, both sender and receiver can know fairly precisely when a round-trip has occurred. The receiver can use this information to calculate the ratio of ECT(0) and ECT(1) codepoints received in the most recent RTT. A new TCP Option could replace TCP Timestamps and the two bytes of padding that usually go with it, allowing reporting of this ratio without actually increasing the size of the TCP header. Large cwnds can be accommodated at the receiver by shifting both counters right until they both fit in a byte each; it is the ratio between them that is significant. It is then incumbent on the sender to do something useful with that information. A reasonable idea would be to aim for a 1:1 ratio via an integrating control loop. Receipt of even one ECT(1) signal might be considered grounds for exiting slow-start, while exceeding 1:2 ratio should limit growth rate to "Reno linear" semantics (significant for CUBIC), and exceeding 2:1 ratio should trigger a "Reno linear" *decrease* of cwnd. Through all this, a single CE mark (reported in the usual way via ECE and CWR) still has the usual effect of a multiplicative decrease. That's my proposal. 
- Jonathan Morton
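Jonathan's proposal lends itself to a small illustration. The following is a rough, hypothetical sketch (function names and exact comparisons are mine, not from any I-D) of the receiver-side counter compression and the sender-side reaction to the reported ECT(0):ECT(1) ratio, using the 1:2 and 2:1 thresholds from the message above:

```python
def compress_counts(ect0: int, ect1: int) -> tuple[int, int]:
    """Receiver side: shift both per-RTT counters right until each fits
    in one byte, preserving their approximate ratio (as proposed for a
    compact TCP option replacing Timestamps)."""
    while ect0 > 0xFF or ect1 > 0xFF:
        ect0 >>= 1
        ect1 >>= 1
    return ect0, ect1

def cwnd_policy(ect0: int, ect1: int) -> str:
    """Sender side: map the ECT(1):ECT(0) marking ratio seen in the last
    RTT to a cwnd policy, per the proposed thresholds."""
    if ect1 == 0:
        return "slow-start-ok"         # no fine-grained signal at all
    if ect1 > 2 * ect0:
        return "reno-linear-decrease"  # ratio exceeds 2:1
    if 2 * ect1 > ect0:
        return "reno-linear-growth"    # ratio exceeds 1:2: cap growth
    return "exit-slow-start"           # even one ECT(1) ends slow-start
```

An integrating control loop aiming for the 1:1 ratio would sit on top of this; the string labels stand in for the actual cwnd arithmetic.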
Re: [Bloat] incremental deployment, transport and L4S (Re: when does the CoDel part of fq_codel help in the real world?)
Hi Mikael, > On Nov 29, 2018, at 08:46, Mikael Abrahamsson wrote: > > On Thu, 29 Nov 2018, Jonathan Morton wrote: > >> You are essentially proposing using ECT(1) to take over an intended function >> of Diffserv. > > Well, I am not proposing anything. I am giving people a heads-up that the L4S > authors are proposing this. > > But yes, you're right. Diffserv has shown itself to be really hard to > incrementally deploy across the Internet, so it's generally bleached mid-path. > >> In my view, that is the wrong approach. Better to improve Diffserv to the >> point where it becomes useful in practice. > > I agree, but unfortunately nobody has made me king of the Internet yet so I > can't just decree it into existance. With your kind of clue, I would happily vote you as (temporary) king of the internet. ;) > >> Cake has taken steps in that direction, by implementing some reasonable >> interpretation of some Diffserv codepoints. > > Great. I don't know if I've asked this but is CAKE easily implementable in > hardware? From what I can tell it's still only Marvell that is trying to put > high performance enough CPUs into HGWs to do forwarding in CPU (which can do > CAKE), all others still rely on packet accelerators to achieve the desired > speeds. As far as I can tell intel is pushing atom/x86 cores into its docsis SoCs (puma5/6/7) as well as into the high-end dsl SoCs (formerly lantiq, https://www.intel.com/content/www/us/en/smart-home/anywan-grx750-home-gateway-brief.html?wapkw=grx750), I am quite confident that those also pack enough punch for CPU based routing at Gbps-rates. In docsis modems these are already rolled-out, I do not know of any DSL modem/router that uses the GRX750 > >> My alternative use of ECT(1) is more in keeping with the other codepoints >> represented by those two bits, to allow ECN to provide more fine-grained >> information about congestion than it presently does. 
The main challenge is >> communicating the relevant information back to the sender upon receipt, >> ideally without increasing overhead in the TCP/IP headers. > > You need to go into the IETF process and voice this opinion then, because if > nobody opposes in the near time then ECT(1) might go to L4S interpretation of > what is going on. They do have ECN feedback mechanisms in their proposal, > have you read it? It's a whole suite of documents, architecture, AQM > proposal, transport proposal, the entire thing. > > On the other hand, what you want to do and what L4S tries to do might be > closely related. It doesn't sound too far off. > > Also, Bob Briscoe works for Cable Labs now, so he will now have silicon > behind him. This silicon might go into other things, not just DOCSIS > equipment, so if you have use-cases that L4S doesn't do but might do with > minor modification, it might be better to join him than to fight him. Call me naive, but the solution to the impasse at getting a common definition of diffserv agreed upon is replacing all TCP CC algorithms? That trades changing all endpoints (and network nodes) to honor Diffserv for changing all endpoints to use a different TCP CC. At least I would call that ambitious (unless L4S offers noticeable advantages for all participating without being terribly unfair to the non-participating legacy TCP users*). Best Regards Sebastian *) Well, being unfair and out-competing the legacy users would be the best way to incentivize everybody to upgrade, but that would also be true for a better Diffserv scheme... > > -- > Mikael Abrahamsson email: swm...@swm.pp.se
Re: [Bloat] when does the CoDel part of fq_codel help in the real world?
> On Nov 29, 2018, at 8:33 AM, Dave Taht wrote: > > This whole thread, although diversive... well, I'd really like everybody > to get together and try to write a joint paper on the best stuff to do, > worldwide, to make bufferbloat go away. +1 I don’t think it’s an accident that a discussion around CoDel evolved into a discussion around TCP. If newer TCP CC algorithms can eliminate self-induced bloat, it should still be possible for queue management to handle older TCP implementations and extreme cases while not damaging newer TCPs. Beyond that, there may be areas where queue management can actually enhance the performance of newer TCPs. For starters, there’s what happens within an RTT, which I suppose can’t be dealt with in the TCP stack, and referring back to one of Jon’s messages from 11/27, the possibility for improved signaling from AQM back to TCP on the state of the queue. Global coordination could make this work better. p.s.- Apologies for it taking me longer than an RTT to re-read the original CoDel papers and think through some implications. My original question might have been smarter.
Re: [Bloat] when does the CoDel part of fq_codel help in the real world?
Hi Dave, On 29.11.18 at 08:39, Dave Taht wrote: > "Bless, Roland (TM)" writes: > >> Hi Luca, >> >> On 27.11.18 at 11:40, Luca Muscariello wrote: >>> OK. We agree. >>> That's correct, you need *at least* the BDP in flight so that the >>> bottleneck queue never empties out. >> >> No, that's not what I meant, but it's quite simple. >> You need: data min_inflight = 2 * RTTmin * bottleneck_rate to fully >> utilize the bottleneck link. >> If this is true, the bottleneck queue will be empty. If your amount >> of inflight data is larger, the bottleneck queue buffer will store >> the excess packets. With just min_inflight there will be no >> bottleneck queue, the packets are "on the wire". >> >>> This can be easily proven using fluid models for any congestion >>> controlled source no matter if it is >>> loss-based, delay-based, rate-based, formula-based etc. >>> >>> A highly paced source gives you the ability to get as close as >>> theoretically possible to the BDP+epsilon >>> as possible. >> >> Yep, but that BDP is "on the wire" and epsilon will be in the bottleneck >> buffer. > > I'm hoping I made my point effectively earlier, that > > " data min_inflight = 2 * RTTmin * bottleneck_rate " That factor of 2 was a mistake in my first mail (sorry for that...). I corrected that three minutes after. I should have written: min_inflight = RTTmin * bottleneck_rate > when it is nearly certain that more than one flow exists, means aiming > for the BDP in a single flow is generally foolish. Liked the Stanford I think one should not confuse the buffer sizing rule with the calculation for inflight data... > result, I think it's pretty general. I see hundreds of flows active > every minute. There was another paper that looked into some magic > 200-ish number as simultaneous flows active, normally So for buffer sizing, the BDP-dependent rule is foolish in general, because it is optimized for older loss-based TCP congestion controls so that they can keep the utilization high. 
It's correct that in the presence of multiple flows and good loss desynchronization, you still get high utilization with a smaller buffer (Appenzeller et al., SIGCOMM 2004). However, when it comes to CWnd sizing, that inflight rule would convert to: min_inflight = RTTmin * bottleneck_rate_share because other flows are present at the bottleneck. Interestingly enough: flows with a different RTT_min should use different CWnds, but their amount of queued data at the bottleneck should be nearly equal if you want to have flow rate fairness. Regards Roland
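Roland's inflight formula is easy to check numerically. A minimal sketch (the helper name and unit choices are mine; Mbit/s times ms conveniently works out to an integer factor of 125 bytes):

```python
def min_inflight_bytes(rtt_min_ms: int, rate_mbps: int, flows: int = 1) -> int:
    """min_inflight = RTT_min * bottleneck_rate_share, in bytes.
    1 Mbit/s for 1 ms is 125 bytes, hence the conversion factor."""
    return rtt_min_ms * rate_mbps * 125 // flows

# One flow on a 100 Mbit/s bottleneck with RTT_min = 20 ms needs
# 250,000 bytes in flight to keep the link busy with no standing queue.
single = min_inflight_bytes(20, 100)
# With four flows sharing the bottleneck, each flow's share (and hence
# its minimum inflight) is a quarter of that.
shared = min_inflight_bytes(20, 100, flows=4)
```

Anything held in flight beyond this minimum ends up as standing queue in the bottleneck buffer, which is exactly the distinction the message draws between buffer sizing and cwnd sizing.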
Re: [Bloat] incremental deployment, transport and L4S (Re: when does the CoDel part of fq_codel help in the real world?)
> On 29 Nov, 2018, at 10:19 am, Mikael Abrahamsson wrote: > >> I'd say the important bits are only slightly harder than doing the same with >> fq_codel. > > Ok, FQ_CODEL is way off to get implemented in HW. I haven't heard anyone even > discussing it. Have you (or anyone else) heard differently? I haven't heard of anyone with a specific project to do so, no. But there are basically three components to implement:
1: Codel AQM. This shouldn't be too difficult.
2: Hashing flows into separate queues. I think this is doable if you accept simplified memory management (eg. assuming every packet is a full MTU for allocation purposes) and accept limited/no support for encapsulated protocols (which simplifies locating the elements of the 5-tuple for hashing).
3: Dequeuing packets from queues following DRR++ rules. I think this is also doable, since it basically means managing some linked lists.
It should be entirely feasible to prototype this at GigE speeds using existing FPGA hardware. Development can then continue from there. Overall, it's well within the capabilities of any competent HW vendor, so long as they're genuinely interested. - Jonathan Morton
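Components 2 and 3 above can be sketched in software. This is a deliberately simplified model (my own simplification, not fq_codel source): a 5-tuple hash into a fixed set of queues, and a plain deficit-round-robin dequeue loop in the style fq_codel's scheduler uses; the Codel AQM of component 1 and DRR++'s sparse-flow priority list are omitted:

```python
import zlib
from collections import deque

NUM_QUEUES = 1024
QUANTUM = 1514  # credit granted per round: one full-MTU Ethernet frame

def flow_queue(src: str, dst: str, proto: int, sport: int, dport: int) -> int:
    """Component 2: hash the 5-tuple into one of NUM_QUEUES flow queues.
    CRC32 stands in for whatever cheap hash the hardware would use."""
    key = f"{src}|{dst}|{proto}|{sport}|{dport}".encode()
    return zlib.crc32(key) % NUM_QUEUES

class DRRScheduler:
    """Component 3: deficit round robin over the active queues, which
    really is just list management plus a byte counter per queue."""
    def __init__(self):
        self.pkts = {}         # queue index -> deque of packet sizes
        self.deficit = {}      # queue index -> remaining credit (bytes)
        self.active = deque()  # round-robin order of non-empty queues

    def enqueue(self, qidx: int, size: int) -> None:
        q = self.pkts.setdefault(qidx, deque())
        if not q:
            self.deficit[qidx] = 0     # queue becomes active again
            self.active.append(qidx)
        q.append(size)

    def dequeue(self):
        while self.active:
            qidx = self.active[0]
            if self.deficit[qidx] <= 0:
                # Out of credit: grant a quantum and move to the tail.
                self.deficit[qidx] += QUANTUM
                self.active.rotate(-1)
                continue
            size = self.pkts[qidx].popleft()
            self.deficit[qidx] -= size
            if not self.pkts[qidx]:
                self.active.popleft()  # queue drained: deactivate it
            return qidx, size
        return None  # nothing queued at all
```

As in fq_codel, a queue with positive deficit may send a packet that drives its deficit negative; it then skips turns until the per-round quantum brings it positive again, which is what bounds each flow to its fair share over time.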
Re: [Bloat] incremental deployment, transport and L4S (Re: when does the CoDel part of fq_codel help in the real world?)
On Thu, 29 Nov 2018, Jonathan Morton wrote: I'd say the important bits are only slightly harder than doing the same with fq_codel. Ok, FQ_CODEL is way off to get implemented in HW. I haven't heard anyone even discussing it. Have you (or anyone else) heard differently? I believe much of Cake's perceived CPU overhead is actually down to inefficiencies in the Linux network stack. With a CPU and some modest auxiliary hardware dedicated to moving packets, not tied up in handling general-purpose duties, achieving greater efficiency with reasonable hardware costs could be quite easy, without losing the flexibility to change algorithms later. I need to watch the MT7621 packet accelerator talk at the most recent OpenWrt summit. I installed OpenWrt 18.06.1 on a Mikrotik RB750vGR3 and just clicked my way around in LUCI and enabled flow offload and b00m, it now did full gig NAT44 forwarding. It's implemented as a -j FLOWOFFLOAD iptables rule. The good thing here might be that we could throw unimportant high speed flows off to the accelerator and then just handle the time sensitive flows in CPU, and just make sure the CPU has preferential access to the media for its time-sensitive flow. That kind of approach might make FQ_CODEL deployable even on slow CPU platforms with accelerators because you would only run some flows through FQ_CODEL, where the bulk high-speed flows would be handed off to acceleration (and we guess they don't care about PDV and bufferbloat). -- Mikael Abrahamsson email: swm...@swm.pp.se
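For reference, the -j FLOWOFFLOAD rule mentioned above looks roughly like this as OpenWrt's firewall generates it (a sketch from memory of OpenWrt 18.06, not copied from a live system; the optional --hw flag additionally asks the driver to program flows into a hardware accelerator where supported):

```shell
# Software flow offloading: established connections bypass most of the
# per-packet netfilter forwarding path (and hence any FQ/AQM processing
# done there).
iptables -I FORWARD -m conntrack --ctstate RELATED,ESTABLISHED -j FLOWOFFLOAD

# Hardware flow offloading, on drivers that support it:
iptables -I FORWARD -m conntrack --ctstate RELATED,ESTABLISHED -j FLOWOFFLOAD --hw
```

This is exactly the trade-off discussed in the message: offloaded bulk flows get speed but skip the qdisc, so any fq_codel treatment would have to be reserved for the flows kept on the CPU path.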
Re: [Bloat] when does the CoDel part of fq_codel help in the real world?
Hi Dave, On 29.11.18 at 08:33, Dave Taht wrote: > "Bless, Roland (TM)" writes: > >> Hi Luca, >> >> On 27.11.18 at 10:24, Luca Muscariello wrote: >>> A congestion controlled protocol such as TCP or others, including QUIC, >>> LEDBAT and so on >>> need at least the BDP in the transmission queue to get full link >>> efficiency, i.e. the queue never empties out. >> >> This is not true. There are congestion control algorithms >> (e.g., TCP LoLa [1] or BBRv2) that can fully utilize the bottleneck link >> capacity without filling the buffer to its maximum capacity. The BDP > > Just to stay cynical, I would rather like the BBR and Lola folk to look > closely at asymmetric networks, ack path delay, and lower rates than > 1Gbit. And what the heck... wifi. :) Yes, absolutely right from a practical point of view. The thing is that we have to prioritize our research work at the moment. LoLa is meant to be a conceptual study rather than a real-world full blown, rock solid congestion control. It came out of a research project that focuses on high speed networks, thus we were experimenting with that. Scaling a CC across several orders of magnitude w.r.t. speed is a challenge. I think, Mario also used 100Mbit/s for experiments (but they aren't in that paper) and it still works fine. However, experimenting with LoLa in real world environments will always be a problem if flows with loss-based CC are actually present at the same bottleneck, because LoLa will back off (it will not sacrifice its low latency goal for getting more bandwidth). However, LoLa shows that you can actually get very close to the goal of limiting queuing delay while achieving high utilization _and_ fairness at the same time. BTW, there is an ns-3 implementation of LoLa available... > BBRv1, for example, is hard coded to reduce cwnd to 4, not lower - because > that works in the data center. Lola, so far as I know, achieves its > tested results at 1-10Gbits. 
My world and much of the rest of the world, > barely gets to a gbit, on a good day, with a tail-wind. > > If either of these TCPs could be tuned to work well and not saturate > 5Mbit links I would be a happier person. RRUL benchmarks anyone? I think we need some students to do this... > I did, honestly, want to run lola, (codebase was broken), and I am > patiently waiting for BBRv2 to escape (while hoping that the googlers > actually run some flent tests at edge bandwidths before I tear into it) LoLa code is currently revised by Felix and I think it will converge to a more stable state within the next few weeks. > Personally, I'd settle for SFQ on the CMTSes, fq_codel on the home > routers, and then let the tcp-ers decide how much delay and loss they > can tolerate. > > Another thought... I mean... can't we all just agree to make cubic > more gentle and go fix that, and not a have a flag day? "From linux 5.0 > forward cubic shall: > > Stop increasing its window at 250ms of delay greater than > the initial RTT? > > Have it occasionally rtt probe a bit, more like BBR? RTT probing is fine, but in order to measure RTTmin you have to make sure that the bottleneck queue is empty. This isn't that trivial, because all flows need to synchronize a bit in order to achieve that. But both, BBR and LoLa, have such mechanisms. >> rule of thumb basically stems from the older loss-based congestion >> control variants that profit from the standing queue that they built >> over time when they detect a loss: >> while they back-off and stop sending, the queue keeps the bottleneck >> output busy and you'll not see underutilization of the link. Moreover, >> once you get good loss de-synchronization, the buffer size requirement >> for multiple long-lived flows decreases. >> >>> This gives rule of thumbs to size buffers which is also very practical >>> and thanks to flow isolation becomes very accurate. 
>> >> The positive effect of buffers is merely their role to absorb >> short-term bursts (i.e., mismatch in arrival and departure rates) >> instead of dropping packets. One does not need a big buffer to >> fully utilize a link (with perfect knowledge you can keep the link >> saturated even without a single packet waiting in the buffer). >> Furthermore, large buffers (e.g., using the BDP rule of thumb) >> are not useful/practical anymore at very high speed such as 100 Gbit/s: >> memory is also quite costly at such high speeds... >> >> Regards, >> Roland >> >> [1] M. Hock, F. Neumeister, M. Zitterbart, R. Bless. >> TCP LoLa: Congestion Control for Low Latencies and High Throughput. >> Local Computer Networks (LCN), 2017 IEEE 42nd Conference on, pp. >> 215-218, Singapore, Singapore, October 2017 >> http://doc.tm.kit.edu/2017-LCN-lola-paper-authors-copy.pdf > > > This whole thread, although diversive... well, I'd really like everybody > to get together and try to write a joint paper on the best stuff to do, > worldwide, to make bufferbloat go away. Yea, at least if everyone used LoLa you could eliminate
Re: [Bloat] when does the CoDel part of fq_codel help in the real world?
If you have multiple flows, the BDP will change as measured at the end
points. Also, the queue occupancy has to accommodate the overshoot. If
you have a BDP in flight plus epsilon, you should size based not on the
long-term value but on the overshoot. If you don't have space for it,
the long-term value may be even larger.

On Thu, Nov 29, 2018 at 8:55 AM Dave Taht wrote:

> On Wed, Nov 28, 2018 at 11:45 PM Jonathan Morton wrote:
>>
>>> On 29 Nov, 2018, at 9:39 am, Dave Taht wrote:
>>>
>>> …when it is nearly certain that more than one flow exists, means
>>> aiming for the BDP in a single flow is generally foolish.
>>
>> It might be more accurate to say that the BDP of the fair share of
>> the path is the cwnd to aim for. Plus epsilon for probing.
>
> OK, much better, thanks.
>
>> - Jonathan Morton
>
> --
> Dave Täht
> CTO, TekLibre, LLC
> http://www.teklibre.com
> Tel: 1-831-205-9740

___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat
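Jonathan's correction — aim for the BDP of the flow's fair share of the path, plus an epsilon for probing — is easy to express numerically. A minimal sketch (the function name, MSS, and example figures are mine, not from the thread):

```python
def fair_share_cwnd_pkts(rate_bps: float, rtt_s: float, n_flows: int,
                         mss: int = 1448, epsilon_pkts: int = 2) -> float:
    """Target cwnd in packets: the whole-path BDP divided among the
    competing flows, plus a small epsilon for bandwidth probing."""
    bdp_pkts = rate_bps * rtt_s / 8 / mss
    return bdp_pkts / n_flows + epsilon_pkts

# Example: a 50 Mbit/s bottleneck with a 40 ms RTT, shared by 4 flows.
# The whole-path BDP is ~173 packets; each flow should aim for ~45 of
# them, not all 173.
target = fair_share_cwnd_pkts(50e6, 0.040, 4)
```

The epsilon matters: without a couple of extra in-flight packets a flow never discovers that capacity has been freed up by a departing competitor.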
Re: [Bloat] incremental deployment, transport and L4S (Re: when does the CoDel part of fq_codel help in the real world?)
> On 29 Nov, 2018, at 9:46 am, Mikael Abrahamsson wrote:
>
> I don't know if I've asked this, but is CAKE easily implementable in
> hardware?

I'd say the important bits are only slightly harder than doing the same
with fq_codel. Some of the less important details might be
significantly harder, and could reasonably be left out. The Diffserv
bit should be nearly trivial to put in.

I believe much of Cake's perceived CPU overhead is actually down to
inefficiencies in the Linux network stack. With a CPU and some modest
auxiliary hardware dedicated to moving packets, rather than tied up
handling general-purpose duties, greater efficiency at reasonable
hardware cost could be achieved quite easily, without losing the
flexibility to change algorithms later.

- Jonathan Morton
Re: [Bloat] incremental deployment, transport and L4S (Re: when does the CoDel part of fq_codel help in the real world?)
> On 29 Nov 2018, at 08:46, Mikael Abrahamsson wrote:
>
> On Thu, 29 Nov 2018, Jonathan Morton wrote:
>
>> You are essentially proposing using ECT(1) to take over an intended
>> function of Diffserv.
>
> Well, I am not proposing anything. I am giving people a heads-up that
> the L4S authors are proposing this.
>
> But yes, you're right. Diffserv has shown itself to be really hard to
> incrementally deploy across the Internet, so it's generally bleached
> mid-path.

Rumours, rumours. Just like "SCTP can never work", "all the Internet
must run over HTTP", etc. etc. For the "Diffserv is generally bleached"
claim, there is pretty clear counter-evidence.

One:
https://itc-conference.org/_Resources/Persistent/780df4482d0fe80f6180f523ebb9482c6869e98b/Barik18ITC30.pdf
And another:
http://tma.ifip.org/wp-content/uploads/sites/7/2017/06/mnm2017_paper13.pdf

>> In my view, that is the wrong approach. Better to improve Diffserv to
>> the point where it becomes useful in practice.
>
> I agree, but unfortunately nobody has made me king of the Internet
> yet, so I can't just decree it into existence.

Well, for what you want (re-ordering tolerance), I would think that the
LE codepoint is suitable. From
https://tools.ietf.org/html/draft-ietf-tsvwg-le-phb-06:

"there ought to be an expectation that packets of the LE PHB could be
excessively delayed or dropped when any other traffic is present"

... I think it would be strange for an application to expect this, yet
not expect it to happen for only a few individual packets from a stream.

>> Cake has taken steps in that direction, by implementing some
>> reasonable interpretation of some Diffserv codepoints.
>
> Great. +1
>
> I don't know if I've asked this but is CAKE easily implementable in
> hardware? From what I can tell it's still only Marvell that is trying
> to put high-performance-enough CPUs into HGWs to do forwarding in CPU
> (which can do CAKE); all others still rely on packet accelerators to
> achieve the desired speeds.
>> My alternative use of ECT(1) is more in keeping with the other
>> codepoints represented by those two bits, to allow ECN to provide
>> more fine-grained information about congestion than it presently
>> does. The main challenge is communicating the relevant information
>> back to the sender upon receipt, ideally without increasing overhead
>> in the TCP/IP headers.
>
> You need to go into the IETF process and voice this opinion then,
> because if nobody opposes it soon, then ECT(1) might go to the L4S
> interpretation of what is going on. They do have ECN feedback
> mechanisms in their proposal; have you read it? It's a whole suite of
> documents: architecture, AQM proposal, transport proposal, the entire
> thing.
>
> On the other hand, what you want to do and what L4S tries to do might
> be closely related. It doesn't sound too far off.

Indeed, I think the proposal of finer-grained feedback using 2 bits
instead of one adds nothing to, but is in fact strictly weaker than,
L4S, where the granularity is on the order of the number of packets you
send per RTT, i.e. much higher.

> Also, Bob Briscoe works for Cable Labs now, so he will have silicon
> behind him. This silicon might go into other things, not just DOCSIS
> equipment, so if you have use-cases that L4S doesn't do but might do
> with minor modification, it might be better to join him than to fight
> him.

Yes...

Cheers,
Michael
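For reference, the "granularity on the order of the number of packets per RTT" that Michael alludes to is what DCTCP (RFC 8257) exploits: the sender tracks the fraction of CE-marked packets each RTT and scales its backoff by it, rather than halving on any single mark. A minimal sketch of that estimator — simplified from the RFC, and not anyone's proposal in this thread:

```python
def update_alpha(alpha: float, marked: int, acked: int, g: float = 1 / 16) -> float:
    """DCTCP-style EWMA of the per-RTT CE-mark fraction F:
    alpha <- (1 - g) * alpha + g * F  (RFC 8257, simplified)."""
    f = marked / acked if acked else 0.0
    return (1 - g) * alpha + g * f

def cwnd_after_marks(cwnd: float, alpha: float) -> float:
    """Backoff scaled by the extent of congestion: cwnd * (1 - alpha/2),
    instead of a fixed multiplicative decrease."""
    return cwnd * (1 - alpha / 2)

# Light congestion (1 of 100 packets marked in an RTT) barely dents the
# window, whereas classic RFC 3168 ECN would halve it on that one mark.
alpha = update_alpha(0.0, marked=1, acked=100)
new_cwnd = cwnd_after_marks(100.0, alpha)
```

This is the sense in which a 2-bit codepoint scheme is coarser: the feedback resolution here grows with the number of packets per RTT, not with the number of spare header bits.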