Re: [Bloat] incremental deployment, transport and L4S (Re: when does the CoDel part of fq_codel help in the real world?)
> On 29 Nov 2018, at 13:52, Jonathan Morton wrote:
>
>> On 29 Nov, 2018, at 2:06 pm, Michael Welzl wrote:
>>
>>> That's my proposal.
>>
>> - and it's an interesting one. Indeed, I wasn't aware that you're thinking
>> of a DCTCP-style signal from a string of packets.
>>
>> Of course, this is hard to get right - there are many possible flavours to
>> ideas like this ... but yes, interesting!
>
> I'm glad you think so. Working title is ELR - Explicit Load Regulation.
>
> As noted, this needs standardisation effort, which is a bit outside my realm
> of experience - Cake was a great success, but relied entirely on exploiting
> existing standards to their logical conclusions. I think I started writing
> some material to put in an I-D, but got distracted by something more urgent.

Well - "interesting" is one thing, "better than current proposals" is another... I guess this needs lots of evaluations before going anywhere.

> If there's an opportunity to coordinate with relevant people from similar
> efforts, so much the better. I wonder, for example, whether the DCTCP folks
> would be open to supporting a more deployable version of their idea, or
> whether that would be a political non-starter for them.

I'm not convinced (and I strongly doubt that they would be) that this would indeed be more deployable; your idea also includes TCP option changes, which have their own deployment trouble... the L4S effort, to me, sounds "easier" to deploy (which is not to say that it's easy to deploy at all; though I did like a recent conversation on possibly deploying it with a PEP... that sounded quite doable to me).

Cheers,
Michael

___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat
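For readers unfamiliar with the "DCTCP-style signal from a string of packets" being discussed: DCTCP derives a congestion *extent* from the fraction of CE-marked packets per RTT, rather than treating any single mark as a binary back-off signal. A minimal sketch of that mechanism (per RFC 8257; this is illustrative only and is not ELR, whose wire format was never specified in this thread):

```python
# Sketch of the DCTCP-style marked-fraction signal. The receiver echoes
# which packets arrived CE-marked; the sender keeps an EWMA (alpha) of the
# marked fraction and scales its window by it. Class and method names are
# made up for illustration.

G = 1 / 16  # EWMA gain; RFC 8257 uses 1/16 as the example value

class DctcpStyleSender:
    def __init__(self, cwnd=10.0):
        self.cwnd = cwnd
        self.alpha = 0.0  # smoothed fraction of CE-marked packets

    def on_ack_window(self, acked, marked):
        """Process one RTT of feedback: `acked` packets total,
        `marked` of them carried a CE echo."""
        frac = marked / acked if acked else 0.0
        self.alpha = (1 - G) * self.alpha + G * frac
        if marked:
            # Back off in proportion to the observed marking fraction,
            # instead of halving on any single mark as classic ECN does.
            self.cwnd = max(2.0, self.cwnd * (1 - self.alpha / 2))
        else:
            self.cwnd += 1  # additive increase
```

The fine granularity comes from `alpha` being a real number in [0, 1] rather than a one-bit loss/mark event, which is what makes shallow marking thresholds workable.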
Re: [Bloat] incremental deployment, transport and L4S (Re: when does the CoDel part of fq_codel help in the real world?)
On Fri, 30 Nov 2018, Jonathan Morton wrote:

> Ah, so you're thinking in terms of link-layers which perform local
> retransmission, like wifi. So the optimisation is to not delay packets
> "behind" a corrupted packet while the latter is retransmitted.

Yes.

> It's possible for a TCP to interpret a reordered packet as missing,
> triggering an end-to-end retransmission which is then discovered to be
> unnecessary. At the application level, TCP also performs the same HoL
> blocking in response to missing data. So it's easy to see why links try to
> preserve ordering, even to this extent, but I suspect they typically do so
> on a per-station basis rather than per-flow.

It's a "truth everybody knows" in networking: "NEVER RE-ORDER PACKETS WITHIN A 5-TUPLE FLOW! THERE BE DRAGONS THERE!" I'd also say I see enough transport people who say that this should hold generally, if nothing else because of legacy.

> Personally I think the problem of reordering packets is overblown, and that
> TCPs can cope with occasional missing or reordered packets without serious
> consequences to performance. So if you add "reordering tolerant" to the
> list of stuff that Diffserv can indicate, you might just end up with all
> traffic being marked that way. Is that really worthwhile?

The question isn't so much about TCP, it's the other things I am worried about. TCP handles re-ordering fairly gracefully; other protocols might not.

> Oddly enough, wifi is now one of the places where FQ is potentially easiest
> to find, with Toke's work reaching the Linux kernel and so many wifi
> routers being Linux based.

Again, even if they're using Linux they will/might have packet accelerators that just grab the flow, and the kernel never sees it again. No FQ_CODEL for that.

> An acknowledged problem is overly persistent retries by the ARQ mechanism,
> such that the time horizon for the link-layer retransmission often exceeds
> that of the end-to-end RTO, both for TCP and request-response protocols
> like DNS. I say, retransmit at the link layer once or twice, then give up
> and let the end-hosts sort it out.

I agree, but I also think that it would help some link-layers if the re-ordering requirement could be relaxed. However, before that can be communicated, a lot of study needs to be done to check whether this is actually true. I've had incidents in my 20-year networking career where it wasn't, and applications misbehaved when packets were re-ordered.

-- 
Mikael Abrahamsson    email: swm...@swm.pp.se
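The "retransmit once or twice, then give up" policy can be sketched as a toy link-layer ARQ with a hard retry cap, so a lossy frame is dropped and left to end-to-end recovery instead of head-of-line blocking everything behind it (names and structure here are purely illustrative, not any real wifi ARQ):

```python
# Toy link transmitter with bounded local retries.
from collections import deque

MAX_RETRIES = 2  # the "once or twice" suggested above

def drain(frames, transmit):
    """`transmit(frame)` returns True on link-layer ACK. Frames that fail
    1 + MAX_RETRIES attempts are dropped: the end-hosts sort it out."""
    delivered, dropped = [], []
    q = deque(frames)
    while q:
        frame = q.popleft()
        for _ in range(1 + MAX_RETRIES):
            if transmit(frame):
                delivered.append(frame)
                break
        else:
            dropped.append(frame)  # give up; no indefinite HoL blocking
    return delivered, dropped
```

The point of the cap is that the link's retry horizon stays well below the end-to-end RTO, so the local recovery attempt can never be slower than the recovery it is trying to avoid.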
Re: [Bloat] when does the CoDel part of fq_codel help in the real world?
On Thu, 29 Nov 2018, Stephen Hemminger wrote:

> The problem is that any protocol is mostly blind to the underlying network
> (and that can change). To use dave's analogy it is like being put in the
> driver seat of a vehicle blind folded. When you step on the gas you don't
> know if it is a dragster, jet fighter, or a soviet tractor. The only way a
> protocol can tell is based on the perceived inertia and when it runs into
> things...

Actually, I've made the argument to IETF TCPM that this is not true. You can carry over data learned by previous flows on the same path so that new flows can re-use it. If no flow in the past hour has been able to run faster than 1 megabit/s, and PMTUD has always arrived at a 1460-byte MTU outbound, then there is a good chance that the next flow will encounter the same thing. Why not use this information when guessing how things will behave going forward?

-- 
Mikael Abrahamsson    email: swm...@swm.pp.se
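The idea of seeding new flows with what earlier flows learned (much like Linux's per-destination TCP metrics cache) can be sketched as a small cache keyed by destination; the field names here are made up for illustration:

```python
# Sketch of a per-destination path-properties cache: record what a
# finished flow observed (achievable rate, path MTU), and hand those
# observations to new flows as starting hints while they are fresh.
import time

class PathCache:
    def __init__(self, ttl=3600.0):
        self.ttl = ttl  # "the past hour" from the discussion above
        self._cache = {}  # dest -> (timestamp, properties dict)

    def record(self, dest, rate_bps, pmtu):
        """Called when a flow ends, with what it learned about the path."""
        props = {"rate_bps": rate_bps, "pmtu": pmtu}
        self._cache[dest] = (time.monotonic(), props)

    def hints(self, dest):
        """Hints for a new flow to `dest`, or None if stale or absent."""
        entry = self._cache.get(dest)
        if entry and time.monotonic() - entry[0] < self.ttl:
            return entry[1]
        return None
```

A new flow would use `hints()` only as an initial guess (e.g. to cap its initial rate probe or skip a PMTUD round trip), falling back to normal discovery when the hint proves wrong.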
Re: [Bloat] incremental deployment, transport and L4S (Re: when does the CoDel part of fq_codel help in the real world?)
On Wed, Nov 28, 2018 at 11:36 PM Jonathan Morton wrote:
>
>> On 29 Nov, 2018, at 9:28 am, Mikael Abrahamsson wrote:
>>
>> This is one thing about L4S, ECT(1) is the last "codepoint" in the header
>> not used, that can statelessly identify something. If anyone sees a better
>> way to use it compared to "let's put it in a separate queue and CE-mark it
>> aggressively at very low queue depths and also do not care about
>> re-ordering so an ARQ L2 can re-order all it wants", then they need to
>> speak up, soon.
>
> You are essentially proposing using ECT(1) to take over an intended function
> of Diffserv. In my view, that is the wrong approach. Better to improve
> Diffserv to the point where it becomes useful in practice. Cake has taken
> steps in that direction, by implementing some reasonable interpretation of
> some Diffserv codepoints.
>
> My alternative use of ECT(1) is more in keeping with the other codepoints
> represented by those two bits, to allow ECN to provide more fine-grained
> information about congestion than it presently does. The main challenge is
> communicating the relevant information back to the sender upon receipt,
> ideally without increasing overhead in the TCP/IP headers.

I felt that using this bit up as a separate indicator of an alternate algorithm in play for indicating congestion was a pretty good idea... but no-one was listening at the time.

-- 
Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-205-9740
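For context on "the last codepoint not used": the ECN field is the two low-order bits of the IP TOS/Traffic Class byte (RFC 3168), and ECT(1) is the one combination with no distinct role today, which is why both L4S and the ideas above want it. A small sketch of the bit layout:

```python
# The four ECN codepoints (RFC 3168), in the low two bits of TOS.
NOT_ECT = 0b00  # not ECN-capable transport
ECT_1   = 0b01  # ECN-capable; the "spare" codepoint being fought over
ECT_0   = 0b10  # ECN-capable (the commonly used marking)
CE      = 0b11  # congestion experienced

def ecn_bits(tos_byte):
    """Extract the ECN field from a TOS/Traffic Class byte."""
    return tos_byte & 0b11

def is_ect(tos_byte):
    """True if the packet is ECN-capable, so an AQM may mark instead of drop."""
    return ecn_bits(tos_byte) in (ECT_0, ECT_1)
```

Because a middlebox can read these two bits statelessly on every packet, ECT(1) really is the only remaining way to classify a flow into an alternate-behaviour queue without Diffserv.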
Re: [Bloat] go's improvements to gc
If you can think in terms of pipes, Go makes you /productive/. I recognized that about 13 hours into learning Go, about two years ago.

My advice as an aphorism: "redo something you've done at least once before, in Go, and see how different it is. Then decide if it's better."

--dave

On 2018-11-29 8:33 p.m., Dave Taht wrote:
> as remarkable as our efforts have been to reduce network bloat, I have to
> take my hat off to the golang garbage collection folk, also. Reductions in
> latencies from 300ms to 500us in 4 years. Good story here about how latency
> is cumulative: https://blog.golang.org/ismmkeynote
>
> The very first thing I learned about go, was how to turn off the garbage
> collector. I guess I have to go learn some go, for real, now.

-- 
David Collier-Brown,         | Always do right. This will gratify
System Programmer and Author | some people and astonish the rest
dav...@spamcop.net           | -- Mark Twain
[Bloat] go's improvements to gc
as remarkable as our efforts have been to reduce network bloat, I have to take my hat off to the golang garbage collection folk, also. Reductions in latencies from 300ms to 500us in 4 years. Good story here about how latency is cumulative:

https://blog.golang.org/ismmkeynote

The very first thing I learned about go, was how to turn off the garbage collector. I guess I have to go learn some go, for real, now.

-- 
Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-205-9740
Re: [Bloat] incremental deployment, transport and L4S (Re: when does the CoDel part of fq_codel help in the real world?)
>> I have to ask, why would the network care? What optimisations can be
>> obtained by reordering packets *within* a flow, when it's usually just as
>> easy to deliver them in order?
>
> Because most implementations aren't flow aware at all and might have 4
> queues, saying "oh, this single queue is for transports that don't care
> about ordering" means everything in that queue can just be sent as soon as
> it can, ignoring HOL caused by ARQ.

Ah, so you're thinking in terms of link-layers which perform local retransmission, like wifi. So the optimisation is to not delay packets "behind" a corrupted packet while the latter is retransmitted.

It's possible for a TCP to interpret a reordered packet as missing, triggering an end-to-end retransmission which is then discovered to be unnecessary. At the application level, TCP also performs the same HoL blocking in response to missing data. So it's easy to see why links try to preserve ordering, even to this extent, but I suspect they typically do so on a per-station basis rather than per-flow.

Personally I think the problem of reordering packets is overblown, and that TCPs can cope with occasional missing or reordered packets without serious consequences to performance. So if you add "reordering tolerant" to the list of stuff that Diffserv can indicate, you might just end up with all traffic being marked that way. Is that really worthwhile?

>> Of course, we already have FQ which reorders packets in *different* flows.
>> The benefits are obvious in that case.
>
> FQ is a fringe in real life (speaking as a packet moving monkey). It's just
> on this mailing list that it's the norm.

Oddly enough, wifi is now one of the places where FQ is potentially easiest to find, with Toke's work reaching the Linux kernel and so many wifi routers being Linux based.

An acknowledged problem is overly persistent retries by the ARQ mechanism, such that the time horizon for the link-layer retransmission often exceeds that of the end-to-end RTO, both for TCP and request-response protocols like DNS. I say, retransmit at the link layer once or twice, then give up and let the end-hosts sort it out.

 - Jonathan Morton
Re: [Bloat] when does the CoDel part of fq_codel help in the real world?
Mario,

putting aside LoLa for a second, I'm not quite sure that the theorem you cite is valid.

According to the model, R_i is the sending rate. The sum across all flows bottlenecked at the link does not need to equal the link capacity: the input rate can be instantaneously below or above the link capacity, even if the link is never idle and the output rate is always C for all t.

If sum_i R_i = C is false because of what I said, then p_i, which is nothing more than the shadow price of the link capacity constraint, can be a function of a constant delay d, i.e. p_i = cost * d for all i.

This theorem can be valid only if the input rate of a queue is instantaneously equal to its output rate. We all know that a queue exists precisely because there is an instantaneous difference between input and output rates at the link. So to conclude: the theorem is valid iff input rate == output rate, in which case the queue is always zero, i.e. d = 0.

The theorem is either an artefact of the model or just wrong. Or I'm missing something...

On Thu, Nov 29, 2018 at 6:07 PM Mario Hock wrote:

> Hi Luca,
>
> I'm answering on behalf of Roland, since I am a co-author of the paper.
>
> This is an excellent question, since it goes right at the heart of how
> LoLa works.
>
> Indeed, the paper is a first of a series. A second one, looking deeper
> into the fair flow balancing mechanism, is currently under submission.
>
> Similar as other delay based congestion controls, LoLa tries to achieve
> fairness by allowing each flow to buffer the same amount of data at the
> bottleneck. We have this, e.g., in TCP Vegas, and (in a way) also in Copa
> (a recently proposed congestion control) and many others. If this is
> achieved, we get flow rate fairness independent of a flow's RTT.
>
> Usually (in other congestion controls) this "allowed amount of data" is
> fixed per flow. We presume that this approach does not scale well to high
> speed networks, since the queuing delay resulting from this amount of data
> is reduced with increasing bottleneck rate. Thus, it becomes harder to
> measure it right. This can easily be seen (and proven) for TCP Vegas.
>
> Note: Just using higher fixed values is not an option, since it would not
> work at lower speeds anymore and also not with a large number of flows.
>
> Therefore, LoLa tries to find a suitable value for the "allowed amount of
> data" dynamically. This is X(t).
>
> Our approach is to grow X(t) over time during the Fair Flow Balancing
> phase. This phase ends when the queuing delay reaches 5ms. Thus, (in the
> ideal case) at the end of Fair Flow Balancing, X(t) is just as large that
> all flows at the bottleneck create a queuing delay of 5ms, and all flows
> contribute equally to this queue. Hence, flow rate fairness is achieved.
> (Note that LoLa is designed in a way that t is (almost) synchronized among
> the competing flows.)
>
> Generally, other ways of determining a suitable X(t) are conceivable. In
> our approach X(t) is a monotonically increasing function, but it is
> regularly reset as LoLa cycles between its states; i.e., after a queuing
> delay of 5ms is reached, the queue is drained and everything starts again.
> (Thus, the timespan where X(t) is monotonically increased is called a
> "round of fair flow balancing".)
>
> This way we can overcome the constraint given in [1]:
>
> """
> THEOREM 6 (FAIRNESS/DELAY TRADEOFF). For congestion control mechanisms
> that have steady state throughput of the kind R = f(d, p), for some
> function f, delay d and feedback p, if the feedback is based on purely end
> to end delay measurements, you can either have fairness or a fixed delay,
> but not both simultaneously.
> """
>
> [1] "ECN or Delay: Lessons Learnt from Analysis of DCQCN and TIMELY",
> Yibo Zhu et al., https://dl.acm.org/citation.cfm?id=2999593
>
> Best, Mario
>
> Am 29.11.18 um 17:09 schrieb Luca Muscariello:
> > Hi Roland,
> >
> > It took me quite a lot of time to find this message in the thread... I
> > read the paper you sent and I guess this is the first of a series as
> > many things stay uncovered.
> >
> > Just a quick question: why is X(t) always increasing with t?
> >
> > On Tue, Nov 27, 2018 at 11:26 AM Bless, Roland (TM)
> > <roland.bl...@kit.edu> wrote:
> >
> > Hi Luca,
> >
> > Am 27.11.18 um 10:24 schrieb Luca Muscariello:
> > > A congestion controlled protocol such as TCP or others, including
> > > QUIC, LEDBAT and so on need at least the BDP in the transmission
> > > queue to get full link efficiency, i.e. the queue never empties out.
> >
> > This is not true. There are congestion control algorithms (e.g., TCP
> > LoLa [1] or BBRv2) that can fully utilize the bottleneck link capacity
> > without filling the buffer to its maximum capacity. The BDP rule of
> > thumb basically stems from the older loss-based congestion control
> > variants that profit from the standing queue that they built over time
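For what it's worth, a back-of-envelope Little's-law reading of the fixed-X scheme the paper argues against, under the steady-state assumption (input rate equal to output rate) that Luca questions above:

```latex
% N flows share a bottleneck of capacity C; each flow keeps X bytes queued.
% By Little's law, flow i's queued data equals R_i d_q, hence
R_i\, d_q = X \;\Rightarrow\; R_i = \frac{X}{d_q}
  \quad\text{(equal for all } i\text{, independent of RTT)},
\qquad
\sum_i R_i = C \;\Rightarrow\; \frac{N X}{d_q} = C
  \;\Rightarrow\; d_q = \frac{N X}{C}.
```

Under these assumptions, fairness holds for any fixed X, but the queuing delay d_q grows linearly with the number of flows N, which is exactly the tradeoff Theorem 6 describes and the reason LoLa adapts X(t) instead, stopping its growth once d_q reaches 5 ms.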
Re: [Bloat] when does the CoDel part of fq_codel help in the real world?
On Thu, Nov 29, 2018 at 10:43 AM Stephen Hemminger wrote:
>
> On Wed, 28 Nov 2018 23:35:53 -0800 Dave Taht wrote:
>
> > > As someone who works with moving packets, it's perplexing to me to
> > > interact with transport peeps who seem enormously focused on "goodput".
> > > My personal opinion is that most people would be better off with 80% of
> > > their available bandwidth being in use without any noticable buffer
> > > induced delay, as opposed to the transport protocol doing its damndest
> > > to fill up the link to 100% and sometimes failing and inducing delay
> > > instead.
>
> The problem is that any protocol is mostly blind to the underlying network
> (and that can change). To use dave's analogy it is like being put in the
> driver seat of a vehicle blind folded. When you step on the gas you don't
> know if it is a dragster, jet fighter, or a soviet tractor. The only way a
> protocol can tell is based on the perceived inertia and when it runs into
> things...

coffee. blown through nose. thx! I'm still chuckling, because I also referenced a church. You sit in a pew, and *pray* for something good to happen.

-- 
Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-205-9740
Re: [Bloat] when does the CoDel part of fq_codel help in the real world?
On Wed, 28 Nov 2018 23:35:53 -0800 Dave Taht wrote:

> > As someone who works with moving packets, it's perplexing to me to
> > interact with transport peeps who seem enormously focused on "goodput".
> > My personal opinion is that most people would be better off with 80% of
> > their available bandwidth being in use without any noticable buffer
> > induced delay, as opposed to the transport protocol doing its damndest to
> > fill up the link to 100% and sometimes failing and inducing delay
> > instead.

The problem is that any protocol is mostly blind to the underlying network (and that can change). To use dave's analogy it is like being put in the driver seat of a vehicle blind folded. When you step on the gas you don't know if it is a dragster, jet fighter, or a soviet tractor. The only way a protocol can tell is based on the perceived inertia and when it runs into things...
Re: [Bloat] found another good use for a queue today, possibly
His thesis is more clear:

https://sites.google.com/site/yuriyarbitman/Home/de-amortizedcuckoohashing

He did exclude the cost of a resize, but, still... I find the core idea very attractive. We swapped an email and he said:

> In general, I would say that a cryptographic hash function will do. If you
> want to use a non-cryptographic hash function, then the question is what
> provable random properties it has. This is also discussed in the thesis and
> in the paper.

On Mon, Nov 26, 2018 at 6:17 PM Dave Taht wrote:
>
> I had been investigating various hashing schemes for speeding up the babeld
> routing protocol daemon, and dealing with annoying bursty cpu behavior
> (resizing memory, bursts of packets, thundering herds of retractions), and,
> although it's a tough slog of a read, this adds a queue to cuckoo hashing
> to good effect in flattening out insertion time.
>
> https://arxiv.org/pdf/0903.0391.pdf
>
> But for all I know it's dependent on angels dancing on saddles mounted on
> unicorns. I skip to the graphs for insertion time and go back to the text
> for another round...
>
> "polylog(n)-wise Independent Hash Function". OK, my google-foo fails me:
> the authors use sha1, would something lighter weight suit?

-- 
Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-205-9740
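The core idea (adding a queue to cuckoo hashing so each operation does only a bounded amount of displacement work) can be sketched very loosely as below. This is a toy reading of the de-amortization trick only: it uses Python's built-in hash rather than the polylog(n)-wise independent functions the paper requires, and omits resizing and the paper's stash analysis.

```python
# Toy de-amortized cuckoo hash: inserts go through a queue, and each
# operation performs at most `steps_per_op` displacement steps, pushing
# leftover evictions back onto the queue instead of looping until
# placement succeeds. This bounds the worst-case work per insert.
from collections import deque

class DeamortizedCuckoo:
    def __init__(self, size=64, steps_per_op=4):
        self.size = size
        self.steps = steps_per_op
        self.tables = [[None] * size, [None] * size]
        self.queue = deque()  # (table index, key) pairs awaiting placement

    def _slot(self, which, key):
        return hash((which, key)) % self.size

    def insert(self, key):
        self.queue.append((0, key))
        self._work()

    def _work(self):
        # Bounded displacement work per operation: this is what flattens
        # out the bursty worst-case insertion time.
        for _ in range(self.steps):
            if not self.queue:
                return
            which, key = self.queue.popleft()
            i = self._slot(which, key)
            victim = self.tables[which][i]
            self.tables[which][i] = key
            if victim is not None and victim != key:
                self.queue.append((1 - which, victim))

    def contains(self, key):
        return (self.tables[0][self._slot(0, key)] == key
                or self.tables[1][self._slot(1, key)] == key
                or any(k == key for _, k in self.queue))
```

Note that lookups must also consult the pending queue, since an item may legitimately live there for several operations; that is the price paid for the O(1) worst-case insert step.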
Re: [Bloat] when does the CoDel part of fq_codel help in the real world?
Hi Luca,

I'm answering on behalf of Roland, since I am a co-author of the paper.

This is an excellent question, since it goes right at the heart of how LoLa works.

Indeed, the paper is the first of a series. A second one, looking deeper into the fair flow balancing mechanism, is currently under submission.

Similar to other delay-based congestion controls, LoLa tries to achieve fairness by allowing each flow to buffer the same amount of data at the bottleneck. We have this, e.g., in TCP Vegas, and (in a way) also in Copa (a recently proposed congestion control) and many others. If this is achieved, we get flow rate fairness independent of a flow's RTT.

Usually (in other congestion controls) this "allowed amount of data" is fixed per flow. We presume that this approach does not scale well to high speed networks, since the queuing delay resulting from this amount of data shrinks as the bottleneck rate increases, and it thus becomes harder to measure it correctly. This can easily be seen (and proven) for TCP Vegas.

Note: just using higher fixed values is not an option, since it would not work at lower speeds anymore, nor with a large number of flows.

Therefore, LoLa tries to find a suitable value for the "allowed amount of data" dynamically. This is X(t).

Our approach is to grow X(t) over time during the Fair Flow Balancing phase. This phase ends when the queuing delay reaches 5ms. Thus, (in the ideal case) at the end of Fair Flow Balancing, X(t) is just large enough that all flows at the bottleneck create a queuing delay of 5ms, and all flows contribute equally to this queue. Hence, flow rate fairness is achieved. (Note that LoLa is designed in a way that t is (almost) synchronized among the competing flows.)

Generally, other ways of determining a suitable X(t) are conceivable. In our approach X(t) is a monotonically increasing function, but it is regularly reset as LoLa cycles between its states; i.e., after a queuing delay of 5ms is reached, the queue is drained and everything starts again. (Thus, the timespan where X(t) is monotonically increased is called a "round of fair flow balancing".)

This way we can overcome the constraint given in [1]:

"""
THEOREM 6 (FAIRNESS/DELAY TRADEOFF). For congestion control mechanisms that have steady state throughput of the kind R = f(d, p), for some function f, delay d and feedback p, if the feedback is based on purely end to end delay measurements, you can either have fairness or a fixed delay, but not both simultaneously.
"""

[1] "ECN or Delay: Lessons Learnt from Analysis of DCQCN and TIMELY", Yibo Zhu et al., https://dl.acm.org/citation.cfm?id=2999593

Best, Mario

Am 29.11.18 um 17:09 schrieb Luca Muscariello:
> Hi Roland,
>
> It took me quite a lot of time to find this message in the thread... I read
> the paper you sent and I guess this is the first of a series as many things
> stay uncovered.
>
> Just a quick question: why is X(t) always increasing with t?
>
> On Tue, Nov 27, 2018 at 11:26 AM Bless, Roland (TM) <roland.bl...@kit.edu>
> wrote:
>
>> Hi Luca,
>>
>> Am 27.11.18 um 10:24 schrieb Luca Muscariello:
>>> A congestion controlled protocol such as TCP or others, including QUIC,
>>> LEDBAT and so on need at least the BDP in the transmission queue to get
>>> full link efficiency, i.e. the queue never empties out.
>>
>> This is not true. There are congestion control algorithms (e.g., TCP LoLa
>> [1] or BBRv2) that can fully utilize the bottleneck link capacity without
>> filling the buffer to its maximum capacity. The BDP rule of thumb
>> basically stems from the older loss-based congestion control variants
>> that profit from the standing queue that they built over time when they
>> detect a loss: while they back-off and stop sending, the queue keeps the
>> bottleneck output busy and you'll not see underutilization of the link.
>> Moreover, once you get good loss de-synchronization, the buffer size
>> requirement for multiple long-lived flows decreases.
>>
>>> This gives rule of thumbs to size buffers which is also very practical
>>> and thanks to flow isolation becomes very accurate.
>>
>> The positive effect of buffers is merely their role to absorb short-term
>> bursts (i.e., mismatch in arrival and departure rates) instead of
>> dropping packets. One does not need a big buffer to fully utilize a link
>> (with perfect knowledge you can keep the link saturated even without a
>> single packet waiting in the buffer). Furthermore, large buffers (e.g.,
>> using the BDP rule of thumb) are not useful/practical anymore at very
>> high speed such as 100 Gbit/s: memory is also quite costly at such high
>> speeds...
>>
>> Regards,
>> Roland
>>
>> [1] M. Hock, F. Neumeister, M. Zitterbart, R. Bless. TCP LoLa: Congestion
>> Control for Low Latencies and High Throughput. Local Computer Networks
>> (LCN), 2017 IEEE 42nd Conference on, pp. 215-218, Singapore, Singapore,
>> October 2017
Re: [Bloat] when does the CoDel part of fq_codel help in the real world?
Hi Roland,

It took me quite a lot of time to find this message in the thread... I read the paper you sent and I guess this is the first of a series, as many things stay uncovered.

Just a quick question: why is X(t) always increasing with t?

On Tue, Nov 27, 2018 at 11:26 AM Bless, Roland (TM) wrote:

> Hi Luca,
>
> Am 27.11.18 um 10:24 schrieb Luca Muscariello:
> > A congestion controlled protocol such as TCP or others, including QUIC,
> > LEDBAT and so on need at least the BDP in the transmission queue to get
> > full link efficiency, i.e. the queue never empties out.
>
> This is not true. There are congestion control algorithms (e.g., TCP LoLa
> [1] or BBRv2) that can fully utilize the bottleneck link capacity without
> filling the buffer to its maximum capacity. The BDP rule of thumb basically
> stems from the older loss-based congestion control variants that profit
> from the standing queue that they built over time when they detect a loss:
> while they back-off and stop sending, the queue keeps the bottleneck output
> busy and you'll not see underutilization of the link. Moreover, once you
> get good loss de-synchronization, the buffer size requirement for multiple
> long-lived flows decreases.
>
> > This gives rule of thumbs to size buffers which is also very practical
> > and thanks to flow isolation becomes very accurate.
>
> The positive effect of buffers is merely their role to absorb short-term
> bursts (i.e., mismatch in arrival and departure rates) instead of dropping
> packets. One does not need a big buffer to fully utilize a link (with
> perfect knowledge you can keep the link saturated even without a single
> packet waiting in the buffer). Furthermore, large buffers (e.g., using the
> BDP rule of thumb) are not useful/practical anymore at very high speed such
> as 100 Gbit/s: memory is also quite costly at such high speeds...
>
> Regards,
> Roland
>
> [1] M. Hock, F. Neumeister, M. Zitterbart, R. Bless.
> TCP LoLa: Congestion Control for Low Latencies and High Throughput.
> Local Computer Networks (LCN), 2017 IEEE 42nd Conference on, pp. 215-218,
> Singapore, Singapore, October 2017
> http://doc.tm.kit.edu/2017-LCN-lola-paper-authors-copy.pdf
>
> > Which is:
> >
> > 1) find a way to keep the number of backlogged flows at a reasonable
> > value. This largely depends on the minimum fair rate an application may
> > need in the long term. We discussed a little bit of available mechanisms
> > to achieve that in the literature.
> >
> > 2) fix the largest RTT you want to serve at full utilization and size
> > the buffer using BDP * N_backlogged. Or the other way round: check how
> > much memory you can use in the router/line card/device and, for a fixed
> > N, compute the largest RTT you can serve at full utilization.
> >
> > 3) there is still some memory to dimension for sparse flows in addition
> > to that, but this is not based on BDP. It is just enough to compute the
> > total utilization of sparse flows and use the same simple model Toke has
> > used to compute the (de)prioritization probability.
> >
> > This procedure would allow to size FQ_codel but also SFQ. It would be
> > interesting to compare the two under this buffer sizing. It would also
> > be interesting to compare another mechanism that we have mentioned
> > during the defense, which is AFD + a sparse flow queue. Which is, BTW,
> > already available in Cisco nexus switches for data centres.
> >
> > I think that the codel part would still provide the ECN feature, that
> > all the others cannot have. However the others, the last one especially,
> > can be implemented in silicon with reasonable cost.
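The sizing procedure in step 2 above is a one-line calculation; a numeric sketch, using the thread's own formulation (buffer = BDP of the largest RTT served, times the number of backlogged flows):

```python
# Buffer sizing per the procedure quoted above. Illustrative only; the
# thread itself notes the sparse-flow allowance (step 3) is extra.
def bdp_bytes(capacity_bps, rtt_s):
    """Bandwidth-delay product in bytes for one flow at full utilization."""
    return capacity_bps * rtt_s / 8

def buffer_bytes(capacity_bps, rtt_max_s, n_backlogged):
    """Step 2: BDP(largest RTT to serve) * number of backlogged flows."""
    return bdp_bytes(capacity_bps, rtt_max_s) * n_backlogged

# Example: a 100 Mbit/s link serving up to 100 ms RTT with 16 backlogged
# flows -> bdp_bytes(100e6, 0.1) is 1.25 MB, so 20 MB of buffer.
```

Run "the other way round", the same relation gives the largest RTT a fixed memory budget can serve at full utilization: rtt_max = buffer * 8 / (capacity * N).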
Re: [Bloat] incremental deployment, transport and L4S (Re: when does the CoDel part of fq_codel help in the real world?)
Hi Michael,

Am 29.11.18 um 13:12 schrieb Michael Welzl:
> I'm answering myself with an add-on thought:
>
>> On 29 Nov 2018, at 09:08, Michael Welzl wrote:
>>
>>> On 29 Nov 2018, at 08:46, Mikael Abrahamsson wrote:
>>>
>>> On Thu, 29 Nov 2018, Jonathan Morton wrote:
>>>
>>>> In my view, that is the wrong approach. Better to improve Diffserv to
>>>> the point where it becomes useful in practice.
>>>
>>> I agree, but unfortunately nobody has made me king of the Internet yet
>>> so I can't just decree it into existance.
>>
>> Well, for what you want (re-ordering tolerance), I would think that the
>> LE codepoint is suitable. From:
>> https://tools.ietf.org/html/draft-ietf-tsvwg-le-phb-06
>> "there ought to be an expectation that packets of the LE PHB could be
>> excessively delayed or dropped when any other traffic is present"
>>
>> ... I think it would be strange for an application to expect this, yet
>> not expect it to happen for only a few individual packets from a stream.
>
> Actually, maybe this is a problem: the semantics of LE are way broader than
> "tolerant to re-ordering". What about applications that are
> reordering-tolerant, yet still latency critical?

Yep, the LE semantics are basically that you're expecting to just utilize any spare capacity (which may not be available for some longer periods). Re-ordering of LE packets shouldn't normally occur, as packets of a particular flow should all be in the same LE queue.

> E.g., if I use a protocol that can hand over messages out of order (e.g.
> SCTP, and imagine it running over UDP if that helps), then the benefit of
> this is typically to get messages delivered faster (without receiver-side
> HOL blocking).
> But then, wouldn't it be good to have a way to tell the network "I don't
> care about ordering"?
>
> It seems to me that we'd need a new codepoint for that.

Too few DiffServ codepoints for too many purposes available. :-)

Most of the DiffServ PHBs observe the recommendation of RFC 2474: "It is RECOMMENDED that PHB implementations do not introduce any packet re-ordering within a microflow."

> But, it also seems to me that this couldn't get standardised because that
> standard would embrace a layer violation (caring about a transport
> connection), even though that has been implemented for ages.

Just from a logical perspective, a re-ordering property could be _one_ attribute of a per-hop behavior (PHB), but a PHB very likely has further properties that specify the packet forwarding treatment. So re-ordering is probably orthogonal to other PHB features. But having a new (best-effort + re-ordering tolerant) PHB could be useful for some cases...

Regards,
Roland
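For concreteness, the LE PHB being discussed was later assigned DSCP 000001 (RFC 8622), which sits in the upper six bits of the IP TOS/Traffic Class byte. A minimal sketch of opting a socket's traffic into LE:

```python
# Mark a socket's outbound IPv4 traffic with the Lower Effort DSCP.
# DSCP occupies TOS bits 7..2; the low two bits are the ECN field,
# left here as Not-ECT.
import socket

DSCP_LE = 0b000001        # Lower Effort PHB (RFC 8622)
TOS_LE = DSCP_LE << 2     # == 0x04 on the wire

def mark_le(sock):
    """Set the LE codepoint on `sock`; returns the TOS value read back."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, TOS_LE)
    return sock.getsockopt(socket.IPPROTO_IP, socket.IP_TOS)
```

Whether any given network actually deprioritizes (or remarks) LE traffic is, of course, exactly the deployment question the thread is wrestling with.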
Re: [Bloat] when does the CoDel part of fq_codel help in the real world?
Hi Jonathan, On 29.11.18 at 08:45, Jonathan Morton wrote: >> On 29 Nov, 2018, at 9:39 am, Dave Taht wrote: >> >> …when it is nearly certain that more than one flow exists, means aiming >> for the BDP in a single flow is generally foolish. > > It might be more accurate to say that the BDP of the fair-share of the path > is the cwnd to aim for. Plus epsilon for probing. +1 Right, my statement wasn't about buffer sizing, but about the amount of inflight data (see other mail). Interestingly enough, it seems hard to find out the current fair share without any queue, through which the flows indirectly interact with each other... Regards, Roland
Re: [Bloat] incremental deployment, transport and L4S (Re: when does the CoDel part of fq_codel help in the real world?)
On Thu, 29 Nov 2018, Jonathan Morton wrote: I have to ask, why would the network care? What optimisations can be obtained by reordering packets *within* a flow, when it's usually just as easy to deliver them in order? Because most implementations aren't flow aware at all and might have 4 queues, saying "oh, this single queue is for transports that don't care about ordering" means everything in that queue can just be sent as soon as it can, ignoring HOL caused by ARQ. Of course, we already have FQ which reorders packets in *different* flows. The benefits are obvious in that case. FQ is a fringe in real life (speaking as a packet moving monkey). It's just on this mailing list that it's the norm. -- Mikael Abrahamsson email: swm...@swm.pp.se
Re: [Bloat] incremental deployment, transport and L4S (Re: when does the CoDel part of fq_codel help in the real world?)
> On 29 Nov, 2018, at 2:12 pm, Michael Welzl wrote: > > But then, wouldn't it be good to have a way to tell the network "I don't care > about ordering" ? I have to ask, why would the network care? What optimisations can be obtained by reordering packets *within* a flow, when it's usually just as easy to deliver them in order? Of course, we already have FQ which reorders packets in *different* flows. The benefits are obvious in that case. - Jonathan Morton
Re: [Bloat] incremental deployment, transport and L4S (Re: when does the CoDel part of fq_codel help in the real world?)
> On 29 Nov, 2018, at 2:06 pm, Michael Welzl wrote: > >> That's my proposal. > > - and it's an interesting one. Indeed, I wasn't aware that you're thinking of > a DCTCP-style signal from a string of packets. > > Of course, this is hard to get right - there are many possible flavours to > ideas like this ... but yes, interesting! I'm glad you think so. Working title is ELR - Explicit Load Regulation. As noted, this needs standardisation effort, which is a bit outside my realm of experience - Cake was a great success, but relied entirely on exploiting existing standards to their logical conclusions. I think I started writing some material to put in an I-D, but got distracted by something more urgent. If there's an opportunity to coordinate with relevant people from similar efforts, so much the better. I wonder, for example, whether the DCTCP folks would be open to supporting a more deployable version of their idea, or whether that would be a political non-starter for them. - Jonathan Morton
Re: [Bloat] incremental deployment, transport and L4S (Re: when does the CoDel part of fq_codel help in the real world?)
I'm answering myself with an add-on thought: > On 29 Nov 2018, at 09:08, Michael Welzl wrote: > > > >> On 29 Nov 2018, at 08:46, Mikael Abrahamsson wrote: >> >> On Thu, 29 Nov 2018, Jonathan Morton wrote: >> >>> In my view, that is the wrong approach. Better to improve Diffserv to the >>> point where it becomes useful in practice. >> >> I agree, but unfortunately nobody has made me king of the Internet yet so I >> can't just decree it into existence. > > Well, for what you want (re-ordering tolerance), I would think that the LE > codepoint is suitable. From: > https://tools.ietf.org/html/draft-ietf-tsvwg-le-phb-06 > "there ought to be an expectation that packets of the LE PHB could be > excessively delayed or dropped when any other traffic is present" > > ... I think it would be strange for an application to expect this, yet not > expect it to happen for only a few individual packets from a stream. Actually, maybe this is a problem: the semantics of LE are way broader than "tolerant to re-ordering". What about applications that are reordering-tolerant, yet still latency critical? E.g., if I use a protocol that can hand over messages out of order (e.g. SCTP, and imagine it running over UDP if that helps), then the benefit of this is typically to get messages delivered faster (without receiver-side HOL blocking). But then, wouldn't it be good to have a way to tell the network "I don't care about ordering" ? It seems to me that we'd need a new codepoint for that. But, it also seems to me that this couldn't get standardised because that standard would embrace a layer violation (caring about a transport connection), even though that has been implemented for ages. :-( Cheers, Michael
Re: [Bloat] incremental deployment, transport and L4S (Re: when does the CoDel part of fq_codel help in the real world?)
> On 29 Nov 2018, at 11:30, Jonathan Morton wrote: > My alternative use of ECT(1) is more in keeping with the other codepoints represented by those two bits, to allow ECN to provide more fine-grained information about congestion than it presently does. The main challenge is communicating the relevant information back to the sender upon receipt, ideally without increasing overhead in the TCP/IP headers. >>> >>> You need to go into the IETF process and voice this opinion then, because >>> if nobody opposes in the near time then ECT(1) might go to L4S >>> interpretation of what is going on. They do have ECN feedback mechanisms in >>> their proposal, have you read it? It's a whole suite of documents, >>> architecture, AQM proposal, transport proposal, the entire thing. >>> >>> On the other hand, what you want to do and what L4S tries to do might be >>> closely related. It doesn't sound too far off. >> >> Indeed I think that the proposal of finer-grain feedback using 2 bits >> instead of one is not adding anything to, but in fact strictly weaker than >> L4S, where the granularity is in the order of the number of packets that you >> sent per RTT, i.e. much higher. > > An important facet you may be missing here is that we don't *only* have 2 > bits to work with, but a whole sequence of packets carrying these 2-bit > codepoints. We can convey fine-grained information by setting codepoints > stochastically or in a pattern, rather than by merely choosing one of the > three available (ignoring Not-ECT). The receiver can then observe the > density of codepoints and report that to the sender. > > Which is more-or-less the premise of DCTCP. However, DCTCP changes the > meaning of CE, instead of making use of ECT(1), which I think is the big > mistake that makes it undeployable. > > So, from the middlebox perspective, very little changes. ECN-capable packets > still carry ECT(0) or ECT(1). 
You still set CE on ECT packets, or drop > Non-ECT packets, to signal when a serious level of persistent queue has > developed, so that the sender needs to back off a lot. But if a less serious > congestion condition exists, you can now signal *that* by changing some > proportion of ECT(0) codepoints to ECT(1), with the intention that senders > either reduce their cwnd growth rate, halt growth entirely, or enter a > gradual decline. Those are three things that ECN cannot currently signal. > > This change is invisible to existing, RFC-compliant, deployed middleboxes and > endpoints, so should be completely backwards-compatible and incrementally > deployable in the network. (The only thing it breaks is the optional ECN > integrity RFC that, according to fairly recent measurements, literally nobody > bothered implementing.) > > Through TCP Timestamps, both sender and receiver can know fairly precisely > when a round-trip has occurred. The receiver can use this information to > calculate the ratio of ECT(0) and ECT(1) codepoints received in the most > recent RTT. A new TCP Option could replace TCP Timestamps and the two bytes > of padding that usually go with it, allowing reporting of this ratio without > actually increasing the size of the TCP header. Large cwnds can be > accommodated at the receiver by shifting both counters right until they both > fit in a byte each; it is the ratio between them that is significant. > > It is then incumbent on the sender to do something useful with that > information. A reasonable idea would be to aim for a 1:1 ratio via an > integrating control loop. Receipt of even one ECT(1) signal might be > considered grounds for exiting slow-start, while exceeding 1:2 ratio should > limit growth rate to "Reno linear" semantics (significant for CUBIC), and > exceeding 2:1 ratio should trigger a "Reno linear" *decrease* of cwnd. 
> Through all this, a single CE mark (reported in the usual way via ECE and > CWR) still has the usual effect of a multiplicative decrease. > > That's my proposal. - and it's an interesting one. Indeed, I wasn't aware that you're thinking of a DCTCP-style signal from a string of packets. Of course, this is hard to get right - there are many possible flavours to ideas like this ... but yes, interesting! Cheers, Michael
Re: [Bloat] incremental deployment, transport and L4S (Re: when does the CoDel part of fq_codel help in the real world?)
On Thu, 29 Nov 2018, Sebastian Moeller wrote: As far as I can tell intel is pushing atom/x86 cores into its docsis SoCs (puma5/6/7) as well as into the high-end dsl SoCs (formerly lantiq, https://www.intel.com/content/www/us/en/smart-home/anywan-grx750-home-gateway-brief.html?wapkw=grx750), I am quite confident that those also pack enough punch for CPU based routing at Gbps-rates. In docsis modems these are already rolled-out, I do not know of any DSL modem/router that uses the GRX750 "10 Gbit/s packet processor". Game over, again. Call me naive, but the solution to the impasse at getting a common definition of diffserv agreed upon is replacing all TCP CC algorithms? This is replacing changing all endpoints (and network nodes) to honor diffserve with changing all endpoints to use a different TCP CC. At least I would call that ambitious (unless L4S offers noticeable advantages for all participating without being terribly unfair to the non-participating legacy TCP users*). L4S proposes a separate queue for the L4S compatible traffic, and some kind of fair split between L4S and non-L4S traffic. I guess it's kind of along the lines of my earlier proposals about having some kind of fair split with 3 queues for PHB LE, BE and the rest. It makes it deployable in current HW without the worst kind of DDoS downsides imaginable. The Internet is all about making things incrementally deployable. It's very frustrating, but that's the way it is. Whatever we want to propose needs to work so-so with what's already out there and it's ok if it takes a while before it makes everything better. I'd like diffserv to work better, but it would take a lot of work in the operator community to bring it out to where it needs to be. It's not hopeless though, and I think https://tools.ietf.org/html/draft-ietf-tsvwg-le-phb-06 is one step in the right direction. Just the fact that we might have two queues instead of one in the simplest implementations might help. 
The first step is to get ISPs to not bleach diffserv but at least allow 000xxx. -- Mikael Abrahamsson email: swm...@swm.pp.se
Re: [Bloat] incremental deployment, transport and L4S (Re: when does the CoDel part of fq_codel help in the real world?)
>>> My alternative use of ECT(1) is more in keeping with the other codepoints >>> represented by those two bits, to allow ECN to provide more fine-grained >>> information about congestion than it presently does. The main challenge is >>> communicating the relevant information back to the sender upon receipt, >>> ideally without increasing overhead in the TCP/IP headers. >> >> You need to go into the IETF process and voice this opinion then, because if >> nobody opposes in the near time then ECT(1) might go to L4S interpretation >> of what is going on. They do have ECN feedback mechanisms in their proposal, >> have you read it? It's a whole suite of documents, architecture, AQM >> proposal, transport proposal, the entire thing. >> >> On the other hand, what you want to do and what L4S tries to do might be >> closely related. It doesn't sound too far off. > > Indeed I think that the proposal of finer-grain feedback using 2 bits instead > of one is not adding anything to, but in fact strictly weaker than L4S, where > the granularity is in the order of the number of packets that you sent per > RTT, i.e. much higher. An important facet you may be missing here is that we don't *only* have 2 bits to work with, but a whole sequence of packets carrying these 2-bit codepoints. We can convey fine-grained information by setting codepoints stochastically or in a pattern, rather than by merely choosing one of the three available (ignoring Not-ECT). The receiver can then observe the density of codepoints and report that to the sender. Which is more-or-less the premise of DCTCP. However, DCTCP changes the meaning of CE, instead of making use of ECT(1), which I think is the big mistake that makes it undeployable. So, from the middlebox perspective, very little changes. ECN-capable packets still carry ECT(0) or ECT(1). 
You still set CE on ECT packets, or drop Non-ECT packets, to signal when a serious level of persistent queue has developed, so that the sender needs to back off a lot. But if a less serious congestion condition exists, you can now signal *that* by changing some proportion of ECT(0) codepoints to ECT(1), with the intention that senders either reduce their cwnd growth rate, halt growth entirely, or enter a gradual decline. Those are three things that ECN cannot currently signal. This change is invisible to existing, RFC-compliant, deployed middleboxes and endpoints, so should be completely backwards-compatible and incrementally deployable in the network. (The only thing it breaks is the optional ECN integrity RFC that, according to fairly recent measurements, literally nobody bothered implementing.) Through TCP Timestamps, both sender and receiver can know fairly precisely when a round-trip has occurred. The receiver can use this information to calculate the ratio of ECT(0) and ECT(1) codepoints received in the most recent RTT. A new TCP Option could replace TCP Timestamps and the two bytes of padding that usually go with it, allowing reporting of this ratio without actually increasing the size of the TCP header. Large cwnds can be accommodated at the receiver by shifting both counters right until they both fit in a byte each; it is the ratio between them that is significant. It is then incumbent on the sender to do something useful with that information. A reasonable idea would be to aim for a 1:1 ratio via an integrating control loop. Receipt of even one ECT(1) signal might be considered grounds for exiting slow-start, while exceeding 1:2 ratio should limit growth rate to "Reno linear" semantics (significant for CUBIC), and exceeding 2:1 ratio should trigger a "Reno linear" *decrease* of cwnd. Through all this, a single CE mark (reported in the usual way via ECE and CWR) still has the usual effect of a multiplicative decrease. That's my proposal. 
- Jonathan Morton
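Jonathan's proposal lends itself to a small illustration. The following is a rough, hypothetical sketch (function names and exact comparisons are mine, not from any I-D) of the receiver-side counter compression and the sender-side reaction to the reported ECT(0):ECT(1) ratio, using the 1:2 and 2:1 thresholds from the message above:

```python
def compress_counts(ect0: int, ect1: int) -> tuple[int, int]:
    """Receiver side: shift both per-RTT counters right until each fits
    in one byte, preserving their approximate ratio (as proposed for a
    compact TCP option replacing Timestamps)."""
    while ect0 > 0xFF or ect1 > 0xFF:
        ect0 >>= 1
        ect1 >>= 1
    return ect0, ect1

def cwnd_policy(ect0: int, ect1: int) -> str:
    """Sender side: map the ECT(1):ECT(0) marking ratio seen in the last
    RTT to a cwnd policy, per the proposed thresholds."""
    if ect1 == 0:
        return "slow-start-ok"         # no fine-grained signal at all
    if ect1 > 2 * ect0:
        return "reno-linear-decrease"  # ratio exceeds 2:1
    if 2 * ect1 > ect0:
        return "reno-linear-growth"    # ratio exceeds 1:2: cap growth
    return "exit-slow-start"           # even one ECT(1) ends slow-start
```

An integrating control loop aiming for the 1:1 ratio would sit on top of this; the string labels stand in for the actual cwnd arithmetic.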
Re: [Bloat] incremental deployment, transport and L4S (Re: when does the CoDel part of fq_codel help in the real world?)
Hi Mikael, > On Nov 29, 2018, at 08:46, Mikael Abrahamsson wrote: > > On Thu, 29 Nov 2018, Jonathan Morton wrote: > >> You are essentially proposing using ECT(1) to take over an intended function >> of Diffserv. > > Well, I am not proposing anything. I am giving people a heads-up that the L4S > authors are proposing this. > > But yes, you're right. Diffserv has shown itself to be really hard to > incrementally deploy across the Internet, so it's generally bleached mid-path. > >> In my view, that is the wrong approach. Better to improve Diffserv to the >> point where it becomes useful in practice. > > I agree, but unfortunately nobody has made me king of the Internet yet so I > can't just decree it into existance. With your kind of clue, I would happily vote you as (temporary) king of the internet. ;) > >> Cake has taken steps in that direction, by implementing some reasonable >> interpretation of some Diffserv codepoints. > > Great. I don't know if I've asked this but is CAKE easily implementable in > hardware? From what I can tell it's still only Marvell that is trying to put > high performance enough CPUs into HGWs to do forwarding in CPU (which can do > CAKE), all others still rely on packet accelerators to achieve the desired > speeds. As far as I can tell intel is pushing atom/x86 cores into its docsis SoCs (puma5/6/7) as well as into the high-end dsl SoCs (formerly lantiq, https://www.intel.com/content/www/us/en/smart-home/anywan-grx750-home-gateway-brief.html?wapkw=grx750), I am quite confident that those also pack enough punch for CPU based routing at Gbps-rates. In docsis modems these are already rolled-out, I do not know of any DSL modem/router that uses the GRX750 > >> My alternative use of ECT(1) is more in keeping with the other codepoints >> represented by those two bits, to allow ECN to provide more fine-grained >> information about congestion than it presently does. 
The main challenge is >> communicating the relevant information back to the sender upon receipt, >> ideally without increasing overhead in the TCP/IP headers. > > You need to go into the IETF process and voice this opinion then, because if > nobody opposes in the near time then ECT(1) might go to L4S interpretation of > what is going on. They do have ECN feedback mechanisms in their proposal, > have you read it? It's a whole suite of documents, architecture, AQM > proposal, transport proposal, the entire thing. > > On the other hand, what you want to do and what L4S tries to do might be > closely related. It doesn't sound too far off. > > Also, Bob Briscoe works for Cable Labs now, so he will now have silicon > behind him. This silicon might go into other things, not just DOCSIS > equipment, so if you have use-cases that L4S doesn't do but might do with > minor modification, it might be better to join him than to fight him. Call me naive, but the solution to the impasse at getting a common definition of diffserv agreed upon is replacing all TCP CC algorithms? That trades changing all endpoints (and network nodes) to honor Diffserv for changing all endpoints to use a different TCP CC. At least I would call that ambitious (unless L4S offers noticeable advantages for all participating without being terribly unfair to the non-participating legacy TCP users*). Best Regards Sebastian *) Well, being unfair and out-competing the legacy users would be the best way to incentivize everybody to upgrade, but that would also be true for a better Diffserv scheme... > > -- > Mikael Abrahamsson email: swm...@swm.pp.se
Re: [Bloat] when does the CoDel part of fq_codel help in the real world?
> On Nov 29, 2018, at 8:33 AM, Dave Taht wrote: > > This whole thread, although diversive... well, I'd really like everybody > to get together and try to write a joint paper on the best stuff to do, > worldwide, to make bufferbloat go away. +1 I don’t think it’s an accident that a discussion around CoDel evolved into a discussion around TCP. If newer TCP CC algorithms can eliminate self-induced bloat, it should still be possible for queue management to handle older TCP implementations and extreme cases while not damaging newer TCPs. Beyond that, there may be areas where queue management can actually enhance the performance of newer TCPs. For starters, there’s what happens within an RTT, which I suppose can’t be dealt with in the TCP stack, and referring back to one of Jon’s messages from 11/27, the possibility for improved signaling from AQM back to TCP on the state of the queue. Global coordination could make this work better. p.s.- Apologies for it taking me longer than an RTT to re-read the original CoDel papers and think through some implications. My original question might have been smarter.
Re: [Bloat] when does the CoDel part of fq_codel help in the real world?
Hi Dave, On 29.11.18 at 08:39, Dave Taht wrote: > "Bless, Roland (TM)" writes: > >> Hi Luca, >> >> On 27.11.18 at 11:40, Luca Muscariello wrote: >>> OK. We agree. >>> That's correct, you need *at least* the BDP in flight so that the >>> bottleneck queue never empties out. >> >> No, that's not what I meant, but it's quite simple. >> You need: data min_inflight = 2 * RTTmin * bottleneck_rate to fully >> utilize the bottleneck link. >> If this is true, the bottleneck queue will be empty. If your amount >> of inflight data is larger, the bottleneck queue buffer will store >> the excess packets. With just min_inflight there will be no >> bottleneck queue, the packets are "on the wire". >> >>> This can be easily proven using fluid models for any congestion >>> controlled source no matter if it is >>> loss-based, delay-based, rate-based, formula-based etc. >>> >>> A highly paced source gives you the ability to get as close as >>> theoretically possible to the BDP+epsilon >>> as possible. >> >> Yep, but that BDP is "on the wire" and epsilon will be in the bottleneck >> buffer. > > I'm hoping I made my point effectively earlier, that > > " data min_inflight = 2 * RTTmin * bottleneck_rate " That factor of 2 was a mistake in my first mail (sorry for that...). I corrected that three minutes after. I should have written: min_inflight = RTTmin * bottleneck_rate > when it is nearly certain that more than one flow exists, means aiming > for the BDP in a single flow is generally foolish. Liked the Stanford I think one should not confuse the buffer sizing rule with the calculation for inflight data... > result, I think it's pretty general. I see hundreds of flows active > every minute. There was another paper that looked into some magic > 200-ish number as simultaneous flows active, normally So for buffer sizing, the BDP-dependent rule is foolish in general, because it is optimized for older loss-based TCP congestion controls so that they can keep the utilization high. 
It's correct that in the presence of multiple flows and good loss desynchronization, you still get high utilization with a smaller buffer (Appenzeller et al., SIGCOMM 2004). However, when it comes to CWnd sizing, that inflight rule would convert to: min_inflight = RTTmin * bottleneck_rate_share because other flows are present at the bottleneck. Interestingly enough: flows with a different RTT_min should use different CWnds, but their amount of queued data at the bottleneck should be nearly equal if you want to have flow rate fairness. Regards Roland
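Roland's inflight formula is easy to check numerically. A minimal sketch (the helper name and unit choices are mine; Mbit/s times ms conveniently works out to an integer factor of 125 bytes):

```python
def min_inflight_bytes(rtt_min_ms: int, rate_mbps: int, flows: int = 1) -> int:
    """min_inflight = RTT_min * bottleneck_rate_share, in bytes.
    1 Mbit/s for 1 ms is 125 bytes, hence the conversion factor."""
    return rtt_min_ms * rate_mbps * 125 // flows

# One flow on a 100 Mbit/s bottleneck with RTT_min = 20 ms needs
# 250,000 bytes in flight to keep the link busy with no standing queue.
single = min_inflight_bytes(20, 100)
# With four flows sharing the bottleneck, each flow's share (and hence
# its minimum inflight) is a quarter of that.
shared = min_inflight_bytes(20, 100, flows=4)
```

Anything held in flight beyond this minimum ends up as standing queue in the bottleneck buffer, which is exactly the distinction the message draws between buffer sizing and cwnd sizing.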
Re: [Bloat] incremental deployment, transport and L4S (Re: when does the CoDel part of fq_codel help in the real world?)
> On 29 Nov, 2018, at 10:19 am, Mikael Abrahamsson wrote: > >> I'd say the important bits are only slightly harder than doing the same with >> fq_codel. > > Ok, FQ_CODEL is way off to get implemented in HW. I haven't heard anyone even > discussing it. Have you (or anyone else) heard differently? I haven't heard of anyone with a specific project to do so, no. But there are basically three components to implement:
1: Codel AQM. This shouldn't be too difficult.
2: Hashing flows into separate queues. I think this is doable if you accept simplified memory management (eg. assuming every packet is a full MTU for allocation purposes) and accept limited/no support for encapsulated protocols (which simplifies locating the elements of the 5-tuple for hashing).
3: Dequeuing packets from queues following DRR++ rules. I think this is also doable, since it basically means managing some linked lists.
It should be entirely feasible to prototype this at GigE speeds using existing FPGA hardware. Development can then continue from there. Overall, it's well within the capabilities of any competent HW vendor, so long as they're genuinely interested. - Jonathan Morton
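Components 2 and 3 above can be sketched in software. This is a deliberately simplified model (my own simplification, not fq_codel source): a 5-tuple hash into a fixed set of queues, and a plain deficit-round-robin dequeue loop in the style fq_codel's scheduler uses; the Codel AQM of component 1 and DRR++'s sparse-flow priority list are omitted:

```python
import zlib
from collections import deque

NUM_QUEUES = 1024
QUANTUM = 1514  # credit granted per round: one full-MTU Ethernet frame

def flow_queue(src: str, dst: str, proto: int, sport: int, dport: int) -> int:
    """Component 2: hash the 5-tuple into one of NUM_QUEUES flow queues.
    CRC32 stands in for whatever cheap hash the hardware would use."""
    key = f"{src}|{dst}|{proto}|{sport}|{dport}".encode()
    return zlib.crc32(key) % NUM_QUEUES

class DRRScheduler:
    """Component 3: deficit round robin over the active queues, which
    really is just list management plus a byte counter per queue."""
    def __init__(self):
        self.pkts = {}         # queue index -> deque of packet sizes
        self.deficit = {}      # queue index -> remaining credit (bytes)
        self.active = deque()  # round-robin order of non-empty queues

    def enqueue(self, qidx: int, size: int) -> None:
        q = self.pkts.setdefault(qidx, deque())
        if not q:
            self.deficit[qidx] = 0     # queue becomes active again
            self.active.append(qidx)
        q.append(size)

    def dequeue(self):
        while self.active:
            qidx = self.active[0]
            if self.deficit[qidx] <= 0:
                # Out of credit: grant a quantum and move to the tail.
                self.deficit[qidx] += QUANTUM
                self.active.rotate(-1)
                continue
            size = self.pkts[qidx].popleft()
            self.deficit[qidx] -= size
            if not self.pkts[qidx]:
                self.active.popleft()  # queue drained: deactivate it
            return qidx, size
        return None  # nothing queued at all
```

As in fq_codel, a queue with positive deficit may send a packet that drives its deficit negative; it then skips turns until the per-round quantum brings it positive again, which is what bounds each flow to its fair share over time.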
Re: [Bloat] incremental deployment, transport and L4S (Re: when does the CoDel part of fq_codel help in the real world?)
On Thu, 29 Nov 2018, Jonathan Morton wrote: I'd say the important bits are only slightly harder than doing the same with fq_codel. Ok, FQ_CODEL is way off to get implemented in HW. I haven't heard anyone even discussing it. Have you (or anyone else) heard differently? I believe much of Cake's perceived CPU overhead is actually down to inefficiencies in the Linux network stack. With a CPU and some modest auxiliary hardware dedicated to moving packets, not tied up in handling general-purpose duties, achieving greater efficiency with reasonable hardware costs could be quite easy, without losing the flexibility to change algorithms later. I need to watch the MT7621 packet accelerator talk at the most recent OpenWrt summit. I installed OpenWrt 18.06.1 on a Mikrotik RB750vGR3 and just clicked my way around in LUCI and enabled flow offload and b00m, it now did full gig NAT44 forwarding. It's implemented as a -j FLOWOFFLOAD iptables rule. The good thing here might be that we could throw unimportant high speed flows off to the accelerator and then just handle the time sensitive flows in CPU, and just make sure the CPU has preferential access to the media for its time-sensitive flow. That kind of approach might make FQ_CODEL deployable even on slow CPU platforms with accelerators because you would only run some flows through FQ_CODEL, where the bulk high-speed flows would be handed off to acceleration (and we guess they don't care about PDV and bufferbloat). -- Mikael Abrahamsson email: swm...@swm.pp.se
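For reference, the -j FLOWOFFLOAD rule mentioned above looks roughly like this as OpenWrt's firewall generates it (a sketch from memory of OpenWrt 18.06, not copied from a live system; the optional --hw flag additionally asks the driver to program flows into a hardware accelerator where supported):

```shell
# Software flow offloading: established connections bypass most of the
# per-packet netfilter forwarding path (and hence any FQ/AQM processing
# done there).
iptables -I FORWARD -m conntrack --ctstate RELATED,ESTABLISHED -j FLOWOFFLOAD

# Hardware flow offloading, on drivers that support it:
iptables -I FORWARD -m conntrack --ctstate RELATED,ESTABLISHED -j FLOWOFFLOAD --hw
```

This is exactly the trade-off discussed in the message: offloaded bulk flows get speed but skip the qdisc, so any fq_codel treatment would have to be reserved for the flows kept on the CPU path.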
Re: [Bloat] when does the CoDel part of fq_codel help in the real world?
Hi Dave, On 29.11.18 at 08:33, Dave Taht wrote: > "Bless, Roland (TM)" writes: > >> Hi Luca, >> >> On 27.11.18 at 10:24, Luca Muscariello wrote: >>> A congestion controlled protocol such as TCP or others, including QUIC, >>> LEDBAT and so on >>> need at least the BDP in the transmission queue to get full link >>> efficiency, i.e. the queue never empties out. >> >> This is not true. There are congestion control algorithms >> (e.g., TCP LoLa [1] or BBRv2) that can fully utilize the bottleneck link >> capacity without filling the buffer to its maximum capacity. The BDP > > Just to stay cynical, I would rather like the BBR and Lola folk to look > closely at asymmetric networks, ack path delay, and lower rates than > 1Gbit. And what the heck... wifi. :) Yes, absolutely right from a practical point of view. The thing is that we have to prioritize our research work at the moment. LoLa is meant to be a conceptual study rather than a real-world full blown, rock solid congestion control. It came out of a research project that focuses on high speed networks, thus we were experimenting with that. Scaling a CC across several orders of magnitude w.r.t. speed is a challenge. I think, Mario also used 100Mbit/s for experiments (but they aren't in that paper) and it still works fine. However, experimenting with LoLa in real world environments will always be a problem if flows with loss-based CC are actually present at the same bottleneck, because LoLa will back off (it will not sacrifice its low latency goal for getting more bandwidth). However, LoLa shows that you can actually get very close to the goal of limiting queuing delay while achieving high utilization _and_ fairness at the same time. BTW, there is an ns-3 implementation of LoLa available... > BBRv1, for example, is hard coded to reduce cwnd to 4, not lower - because > that works in the data center. Lola, so far as I know, achieves its > tested results at 1-10Gbits. 
My world and much of the rest of the world, > barely gets to a gbit, on a good day, with a tail-wind. > > If either of these TCPs could be tuned to work well and not saturate > 5Mbit links I would be a happier person. RRUL benchmarks anyone? I think we need some students to do this... > I did, honestly, want to run lola, (codebase was broken), and I am > patiently waiting for BBRv2 to escape (while hoping that the googlers > actually run some flent tests at edge bandwidths before I tear into it) LoLa code is currently revised by Felix and I think it will converge to a more stable state within the next few weeks. > Personally, I'd settle for SFQ on the CMTSes, fq_codel on the home > routers, and then let the tcp-ers decide how much delay and loss they > can tolerate. > > Another thought... I mean... can't we all just agree to make cubic > more gentle and go fix that, and not a have a flag day? "From linux 5.0 > forward cubic shall: > > Stop increasing its window at 250ms of delay greater than > the initial RTT? > > Have it occasionally rtt probe a bit, more like BBR? RTT probing is fine, but in order to measure RTTmin you have to make sure that the bottleneck queue is empty. This isn't that trivial, because all flows need to synchronize a bit in order to achieve that. But both, BBR and LoLa, have such mechanisms. >> rule of thumb basically stems from the older loss-based congestion >> control variants that profit from the standing queue that they built >> over time when they detect a loss: >> while they back-off and stop sending, the queue keeps the bottleneck >> output busy and you'll not see underutilization of the link. Moreover, >> once you get good loss de-synchronization, the buffer size requirement >> for multiple long-lived flows decreases. >> >>> This gives rule of thumbs to size buffers which is also very practical >>> and thanks to flow isolation becomes very accurate. 
>> >> The positive effect of buffers is merely their role to absorb >> short-term bursts (i.e., mismatch in arrival and departure rates) >> instead of dropping packets. One does not need a big buffer to >> fully utilize a link (with perfect knowledge you can keep the link >> saturated even without a single packet waiting in the buffer). >> Furthermore, large buffers (e.g., using the BDP rule of thumb) >> are not useful/practical anymore at very high speed such as 100 Gbit/s: >> memory is also quite costly at such high speeds... >> >> Regards, >> Roland >> >> [1] M. Hock, F. Neumeister, M. Zitterbart, R. Bless. >> TCP LoLa: Congestion Control for Low Latencies and High Throughput. >> Local Computer Networks (LCN), 2017 IEEE 42nd Conference on, pp. >> 215-218, Singapore, Singapore, October 2017 >> http://doc.tm.kit.edu/2017-LCN-lola-paper-authors-copy.pdf > > > This whole thread, although diversive... well, I'd really like everybody > to get together and try to write a joint paper on the best stuff to do, > worldwide, to make bufferbloat go away. Yea, at least if everyone used LoLa you could eliminate
Re: [Bloat] when does the CoDel part of fq_codel help in the real world?
If you have multiple flows, the BDP will change as measured at the end
points. Also, the queue occupancy has to accommodate the overshoot. If
you have a BDP in flight plus epsilon, you should size based not on the
long-term value but on the overshoot. If you don't have space for it,
the long-term value may be even larger.

On Thu, Nov 29, 2018 at 8:55 AM Dave Taht wrote:

> On Wed, Nov 28, 2018 at 11:45 PM Jonathan Morton wrote:
>>
>>> On 29 Nov, 2018, at 9:39 am, Dave Taht wrote:
>>>
>>> …when it is nearly certain that more than one flow exists, means
>>> aiming for the BDP in a single flow is generally foolish.
>>
>> It might be more accurate to say that the BDP of the fair share of
>> the path is the cwnd to aim for. Plus epsilon for probing.
>
> OK, much better, thanks.
>
>> - Jonathan Morton
>
> --
> Dave Täht
> CTO, TekLibre, LLC
> http://www.teklibre.com
> Tel: 1-831-205-9740

___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat
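Jonathan's correction — aim for the BDP of the flow's fair share of the path, plus an epsilon for probing — is easy to express numerically. A minimal sketch (the function name, MSS, and example figures are mine, not from the thread):

```python
def fair_share_cwnd_pkts(rate_bps: float, rtt_s: float, n_flows: int,
                         mss: int = 1448, epsilon_pkts: int = 2) -> float:
    """Target cwnd in packets: the whole-path BDP divided among the
    competing flows, plus a small epsilon for bandwidth probing."""
    bdp_pkts = rate_bps * rtt_s / 8 / mss
    return bdp_pkts / n_flows + epsilon_pkts

# Example: a 50 Mbit/s bottleneck with a 40 ms RTT, shared by 4 flows.
# The whole-path BDP is ~173 packets; each flow should aim for ~45 of
# them, not all 173.
target = fair_share_cwnd_pkts(50e6, 0.040, 4)
```

The epsilon matters: without a couple of extra in-flight packets a flow never discovers that capacity has been freed up by a departing competitor.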
Re: [Bloat] incremental deployment, transport and L4S (Re: when does the CoDel part of fq_codel help in the real world?)
> On 29 Nov, 2018, at 9:46 am, Mikael Abrahamsson wrote:
>
> I don't know if I've asked this, but is CAKE easily implementable in
> hardware?

I'd say the important bits are only slightly harder than doing the same
with fq_codel. Some of the less important details might be
significantly harder, and could reasonably be left out. The Diffserv
bit should be nearly trivial to put in.

I believe much of Cake's perceived CPU overhead is actually down to
inefficiencies in the Linux network stack. With a CPU and some modest
auxiliary hardware dedicated to moving packets, rather than tied up
handling general-purpose duties, greater efficiency at reasonable
hardware cost could be achieved quite easily, without losing the
flexibility to change algorithms later.

- Jonathan Morton
Re: [Bloat] incremental deployment, transport and L4S (Re: when does the CoDel part of fq_codel help in the real world?)
> On 29 Nov 2018, at 08:46, Mikael Abrahamsson wrote:
>
> On Thu, 29 Nov 2018, Jonathan Morton wrote:
>
>> You are essentially proposing using ECT(1) to take over an intended
>> function of Diffserv.
>
> Well, I am not proposing anything. I am giving people a heads-up that
> the L4S authors are proposing this.
>
> But yes, you're right. Diffserv has shown itself to be really hard to
> incrementally deploy across the Internet, so it's generally bleached
> mid-path.

Rumours, rumours. Just like "SCTP can never work", "all the Internet
must run over HTTP", etc. etc. For the "Diffserv is generally bleached"
claim, there is pretty clear counter-evidence.

One:
https://itc-conference.org/_Resources/Persistent/780df4482d0fe80f6180f523ebb9482c6869e98b/Barik18ITC30.pdf
And another:
http://tma.ifip.org/wp-content/uploads/sites/7/2017/06/mnm2017_paper13.pdf

>> In my view, that is the wrong approach. Better to improve Diffserv to
>> the point where it becomes useful in practice.
>
> I agree, but unfortunately nobody has made me king of the Internet
> yet, so I can't just decree it into existence.

Well, for what you want (re-ordering tolerance), I would think that the
LE codepoint is suitable. From
https://tools.ietf.org/html/draft-ietf-tsvwg-le-phb-06:

"there ought to be an expectation that packets of the LE PHB could be
excessively delayed or dropped when any other traffic is present"

... I think it would be strange for an application to expect this, yet
not expect it to happen for only a few individual packets from a stream.

>> Cake has taken steps in that direction, by implementing some
>> reasonable interpretation of some Diffserv codepoints.
>
> Great. +1
>
> I don't know if I've asked this but is CAKE easily implementable in
> hardware? From what I can tell it's still only Marvell that is trying
> to put high-performance-enough CPUs into HGWs to do forwarding in CPU
> (which can do CAKE); all others still rely on packet accelerators to
> achieve the desired speeds.
>> My alternative use of ECT(1) is more in keeping with the other
>> codepoints represented by those two bits, to allow ECN to provide
>> more fine-grained information about congestion than it presently
>> does. The main challenge is communicating the relevant information
>> back to the sender upon receipt, ideally without increasing overhead
>> in the TCP/IP headers.
>
> You need to go into the IETF process and voice this opinion then,
> because if nobody opposes it soon, then ECT(1) might go to the L4S
> interpretation of what is going on. They do have ECN feedback
> mechanisms in their proposal; have you read it? It's a whole suite of
> documents: architecture, AQM proposal, transport proposal, the entire
> thing.
>
> On the other hand, what you want to do and what L4S tries to do might
> be closely related. It doesn't sound too far off.

Indeed, I think the proposal of finer-grained feedback using 2 bits
instead of one adds nothing to, but is in fact strictly weaker than,
L4S, where the granularity is on the order of the number of packets you
send per RTT, i.e. much higher.

> Also, Bob Briscoe works for Cable Labs now, so he will have silicon
> behind him. This silicon might go into other things, not just DOCSIS
> equipment, so if you have use-cases that L4S doesn't do but might do
> with minor modification, it might be better to join him than to fight
> him.

Yes...

Cheers,
Michael
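For reference, the "granularity on the order of the number of packets per RTT" that Michael alludes to is what DCTCP (RFC 8257) exploits: the sender tracks the fraction of CE-marked packets each RTT and scales its backoff by it, rather than halving on any single mark. A minimal sketch of that estimator — simplified from the RFC, and not anyone's proposal in this thread:

```python
def update_alpha(alpha: float, marked: int, acked: int, g: float = 1 / 16) -> float:
    """DCTCP-style EWMA of the per-RTT CE-mark fraction F:
    alpha <- (1 - g) * alpha + g * F  (RFC 8257, simplified)."""
    f = marked / acked if acked else 0.0
    return (1 - g) * alpha + g * f

def cwnd_after_marks(cwnd: float, alpha: float) -> float:
    """Backoff scaled by the extent of congestion: cwnd * (1 - alpha/2),
    instead of a fixed multiplicative decrease."""
    return cwnd * (1 - alpha / 2)

# Light congestion (1 of 100 packets marked in an RTT) barely dents the
# window, whereas classic RFC 3168 ECN would halve it on that one mark.
alpha = update_alpha(0.0, marked=1, acked=100)
new_cwnd = cwnd_after_marks(100.0, alpha)
```

This is the sense in which a 2-bit codepoint scheme is coarser: the feedback resolution here grows with the number of packets per RTT, not with the number of spare header bits.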