Re: [Cake] Fwd: net-next is OPEN...
Dave Tahtwrites: > do we consider cake ready this time? I'm not aware of anything outstanding, at least... -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] A few puzzling Cake results
Jonathan Mortonwrites: your solution significantly hurts performance in the common case >>> >>> I'm sorry - did someone actually describe such a case? I must have >>> missed it. >> >> I started this whole thread by pointing out that this behaviour results >> in the delay of the TCP flows scaling with the number of active flows; >> and that for 32 active flows (on a 10Mbps link), this results in the >> latency being three times higher than for FQ-CoDel on the same link. > > Okay, so intra-flow latency is impaired for bulk flows sharing a > relatively low-bandwidth link. That's a metric which few people even > know how to measure for bulk flows, though it is of course important > for sparse flows. I was hoping you had a common use-case where > *sparse* flow latency was impacted, in which case we could actually > discuss it properly. > > But *inter-flow* latency is not impaired, is it? Nor intra-sparse-flow > latency? Nor packet loss, which people often do measure (or at least > talk about measuring) - quite the opposite? Nor goodput, which people > *definitely* measure and notice, and is influenced more strongly by > packet loss when in ingress mode? As I said, I'll run more tests and post more data once I have time. > The measurement you took had a baseline latency in the region of 60ms. The baseline link latency is 50 ms; which is sorta what you'd expect from a median non-CDN'en internet connection. > That's high enough for a couple of packets per flow to be in flight > independently of the bottleneck queue. Yes. As is the case for most flows going over the public internet... > I would take this argument more seriously if a use-case that mattered > was identified. Use cases where intra-flow latency matters, off the top of my head: - Real-time video with congestion response - Multiple connections multiplexed over a single flow (HTTP/2 or QUIC-style) - Anything that behaves more sanely than TCP at really low bandwidths. But yeah, you're right, no one uses any of those... /s > So far, I can't even see a coherent argument for making this tweak > optional (which is of course possible), let alone removing it > entirely; we only have a single synthetic benchmark which shows one > obscure metric move in the "wrong" direction, versus a real use-case > identified by an actual user in which this configuration genuinely > helps. And I've been trying to explain why you are the one optimising for pathological cases at the expense of the common case. But I don't think we are going to agree based on a theoretical discussion. So let's just leave this and I'll return with some data once I've had a chance to run some actual tests of the different use cases. -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
[Cake] Diffserv LLT mode
Is anyone actually using the LLT diffserv setting? The draft describing it seems to have expired ages ago: https://datatracker.ietf.org/doc/draft-you-tsvwg-latency-loss-tradeoff/ -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] net-next is OPEN...
Pete Heistwrites: >> On Apr 18, 2018, at 7:43 AM, Pete Heist wrote: >> >> I also think I saw this happen at lower bandwidths as well, when the CPU >> wasn’t loaded. What I’ll do is re-test on the current version I have at, >> say, 50Mbit (or to where load drops substantially), then update to the head >> and test again, and let you know... >> >>> On Apr 17, 2018, at 3:52 PM, Jonathan Morton wrote: >>> On 16 Apr, 2018, at 11:55 pm, Pete Heist wrote: I remember that fairness behavior at low RTTs (< 20ms) needed to be either improved or documented >>> >>> The reason for the behaviour, IIRC, was that throughput dropped below 100% >>> when the latency target was reduced too much. Since then there has been a >>> small change which might improve it a little, so a retest would be >>> reasonable. > > At 50mbit I don’t see nearly as much fairness degradation at low RTTs, > although there’s some. Even at 100us, “fairness” is around 1.1 (1.0 being > perfectly fair) instead of the 2.x I saw at 500mbit. I do not see much of a > difference between the latest code (16d7fed, 2018-04-17) and the previous > code I tested (7061401, 2017-12-01), if that info is of use. > > RTT: tcp_1up upload Mbps / tcp_12up upload Mbps > > 7061401 (2017-12-01): > >100us: 23.80 / 25.85 >1ms: 23.89 / 29.46 >10ms: 23.93 / 24.66 >40ms: 23.96 / 24.10 >100ms: 23.97 / 24.12 > > 16d7fed (2018-04-17): > >100us: 23.97 / 26.49 >1ms: 23.89 / 26.27 >10ms: 23.98 / 26.37 >40ms: 23.94 / 24.08 >100ms: 23.97 / 24.12 > > I can post reports / flent files on request. > > So it appears this is CPU related, and not worth exploring further(?) > and not worth documenting(?) other than that once things have > stabilized, documenting how Cake degrades under various extreme > conditions would be informative. Awesome, thanks for re-testing! :) -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] [PATCH net-next v2] Add Common Applications Kept Enhanced (cake) qdisc
Georgios Amanakiswrites: > On Tue, Apr 24, 2018 at 11:47 AM, Georgios Amanakis > wrote: >>> >>> Does anyone know if there is a way to do this so the module/builtin >>> split doesn't bite us? >>> >> #ifdef CONFIG_NF_CONNTRACK ?? That is basically what we're doing. But it looks like there's an IS_REACHABLE macro which does what we need. Will fix, thanks for pointing it out :) -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] Testing wanted: Statistics API rework
Pete Heist <p...@eventide.io> writes: >> On Apr 24, 2018, at 11:30 PM, Toke Høiland-Jørgensen <t...@toke.dk> wrote: >> >> If someone has time to test it (especially on mips!), that would be >> great; I think we still have a few days left of net-next being open, so >> if it works for you, I'll resubmit tomorrow... > > I requested a re-build over in the Ubiquiti EdgeOS forums. Between > ER-X and ER-L I think that would cover 32-bit mips (little endian) and > 64-bit mips (big endian). Anything specific to test besides “it > works"? :) We need to verify that it will still print statistics and that the values do not look weird (especially the per-tin 'bytes' counter since that is a 64bit value). -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
[Cake] [PATCH ipruote2-next v4] Add support for cake qdisc
sch_cake is intended to squeeze the most bandwidth and latency out of even the slowest ISP links and routers, while presenting an API simple enough that even an ISP can configure it. Example of use on a cable ISP uplink: tc qdisc add dev eth0 cake bandwidth 20Mbit nat docsis ack-filter To shape a cable download link (ifb and tc-mirred setup elided) tc qdisc add dev ifb0 cake bandwidth 200mbit nat docsis ingress wash besteffort Cake is filled with: * A hybrid Codel/Blue AQM algorithm, "Cobalt", tied to an FQ_Codel derived Flow Queuing system, which autoconfigures based on the bandwidth. * A novel "triple-isolate" mode (the default) which balances per-host and per-flow FQ even through NAT. * An deficit based shaper, that can also be used in an unlimited mode. * 8 way set associative hashing to reduce flow collisions to a minimum. * A reasonable interpretation of various diffserv latency/loss tradeoffs. * Support for zeroing diffserv markings for entering and exiting traffic. * Support for interacting well with Docsis 3.0 shaper framing. * Support for DSL framing types and shapers. * Support for ack filtering. * Extensive statistics for measuring, loss, ecn markings, latency variation. Various versions baking have been available as an out of tree build for kernel versions going back to 3.10, as the embedded router world has been running a few years behind mainline Linux. A stable version has been generally available on lede-17.01 and later. sch_cake replaces a combination of iptables, tc filter, htb and fq_codel in the sqm-scripts, with sane defaults and vastly simpler configuration. Cake's principal author is Jonathan Morton, with contributions from Kevin Darbyshire-Bryant, Toke Høiland-Jørgensen, Sebastian Moeller, Ryan Mounce, Guido Sarducci, Dean Scarff, Nils Andreas Svee, Dave Täht, and Loganaden Velvindron. Testing from Pete Heist, Georgios Amanakis, and the many other members of the cake@lists.bufferbloat.net mailing list. Signed-off-by: Dave Taht <dave.t...@gmail.com> Signed-off-by: Toke Høiland-Jørgensen <t...@toke.dk> --- Changelog: v4: - Switch stats parsing to use nested netlink attributes - Tweaks to JSON stats output keys v3: - Remove accidentally included test flag v2: - Updated netlink config ABI - Remove diffserv-llt mode - Various tweaks and clean-ups of stats output man/man8/tc-cake.8 | 632 + man/man8/tc.8 | 1 + tc/Makefile| 1 + tc/q_cake.c| 761 + 4 files changed, 1395 insertions(+) create mode 100644 man/man8/tc-cake.8 create mode 100644 tc/q_cake.c diff --git a/man/man8/tc-cake.8 b/man/man8/tc-cake.8 new file mode 100644 index ..30e41bc9 --- /dev/null +++ b/man/man8/tc-cake.8 @@ -0,0 +1,632 @@ +.TH CAKE 8 "23 November 2017" "iproute2" "Linux" +.SH NAME +CAKE \- Common Applications Kept Enhanced (CAKE) +.SH SYNOPSIS +.B tc qdisc ... cake +.br +[ +.BR bandwidth +RATE | +.BR unlimited* +| +.BR autorate_ingress +] +.br +[ +.BR rtt +TIME | +.BR datacentre +| +.BR lan +| +.BR metro +| +.BR regional +| +.BR internet* +| +.BR oceanic +| +.BR satellite +| +.BR interplanetary +] +.br +[ +.BR besteffort +| +.BR diffserv8 +| +.BR diffserv4 +| +.BR diffserv3* +] +.br +[ +.BR flowblind +| +.BR srchost +| +.BR dsthost +| +.BR hosts +| +.BR flows +| +.BR dual-srchost +| +.BR dual-dsthost +| +.BR triple-isolate* +] +.br +[ +.BR nat +| +.BR nonat* +] +.br +[ +.BR wash +| +.BR nowash* +] +.br +[ +.BR ack-filter +| +.BR ack-filter-aggressive +| +.BR no-ack-filter* +] +.br +[ +.BR memlimit +LIMIT ] +.br +[ +.BR ptm +| +.BR atm +| +.BR noatm* +] +.br +[ +.BR overhead +N | +.BR conservative +| +.BR raw* +] +.br +[ +.BR mpu +N ] +.br +[ +.BR ingress +| +.BR egress* +] +.br +(* marks defaults) + + +.SH DESCRIPTION +CAKE (Common Applications Kept Enhanced) is a shaping-capable queue discipline +which uses both AQM and FQ. It combines COBALT, which is an AQM algorithm +combining Codel and BLUE, a shaper which operates in deficit mode, and a variant +of DRR++ for flow isolation. 8-way set-associative hashing is used to virtually +eliminate hash collisions. Priority queuing is available through a simplified +diffserv implementation. Overhead compensation for various encapsulation +schemes is tightly integrated. + +All settings are optional; the default settings are chosen to be sensible in +most common deployments. Most people will only need to set the +.B bandwidth +parameter to get useful results, but reading the +.B Overhead Compensation +and +.B Round Trip Time +sections is strongly encouraged. + +.SH SHAPER PARAMETERS +CAKE uses a deficit-mode shaper, which does not exhibit the initial burst +typical of token-bucket shapers. It will automatically burst precisely as much +as required to maintain the configured throughput. As such, it is very +straightforward to configure. +.PP +.B unlimit
[Cake] [PATCH net-next v3] Add Common Applications Kept Enhanced (cake) qdisc
sch_cake targets the home router use case and is intended to squeeze the most bandwidth and latency out of even the slowest ISP links and routers, while presenting an API simple enough that even an ISP can configure it. Example of use on a cable ISP uplink: tc qdisc add dev eth0 cake bandwidth 20Mbit nat docsis ack-filter To shape a cable download link (ifb and tc-mirred setup elided) tc qdisc add dev ifb0 cake bandwidth 200mbit nat docsis ingress wash CAKE is filled with: * A hybrid Codel/Blue AQM algorithm, "Cobalt", tied to an FQ_Codel derived Flow Queuing system, which autoconfigures based on the bandwidth. * A novel "triple-isolate" mode (the default) which balances per-host and per-flow FQ even through NAT. * An deficit based shaper, that can also be used in an unlimited mode. * 8 way set associative hashing to reduce flow collisions to a minimum. * A reasonable interpretation of various diffserv latency/loss tradeoffs. * Support for zeroing diffserv markings for entering and exiting traffic. * Support for interacting well with Docsis 3.0 shaper framing. * Extensive support for DSL framing types. * Support for ack filtering. * Extensive statistics for measuring, loss, ecn markings, latency variation. A paper describing the design of CAKE is available at https://arxiv.org/abs/1804.07617 Various versions baking have been available as an out of tree build for kernel versions going back to 3.10, as the embedded router world has been running a few years behind mainline Linux. A stable version has been generally available on lede-17.01 and later. sch_cake replaces a combination of iptables, tc filter, htb and fq_codel in the sqm-scripts, with sane defaults and vastly simpler configuration. CAKE's principal author is Jonathan Morton, with contributions from Kevin Darbyshire-Bryant, Toke Høiland-Jørgensen, Sebastian Moeller, Ryan Mounce, Guido Sarducci, Dean Scarff, Nils Andreas Svee, Dave Täht, and Loganaden Velvindron. Testing from Pete Heist, Georgios Amanakis, and the many other members of the cake@lists.bufferbloat.net mailing list. tc -s qdisc show dev eth2 qdisc cake 1: root refcnt 2 bandwidth 100Mbit diffserv3 triple-isolate rtt 100.0ms raw overhead 0 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 memory used: 0b of 500b capacity estimate: 100Mbit min/max network layer size:65535 / 0 min/max overhead-adjusted size:65535 / 0 average network hdr offset:0 Bulk Best EffortVoice thresh 6250Kbit 100Mbit 25Mbit target 5.0ms5.0ms5.0ms interval 100.0ms 100.0ms 100.0ms pk_delay 0us 0us 0us av_delay 0us 0us 0us sp_delay 0us 0us 0us pkts000 bytes 000 way_inds000 way_miss000 way_cols000 drops 000 marks 000 ack_drop000 sp_flows000 bk_flows000 un_flows000 max_len 000 quantum 300 1514 762 Tested-by: Pete Heist <petehe...@gmail.com> Tested-by: Georgios Amanakis <gamana...@gmail.com> Signed-off-by: Dave Taht <dave.t...@gmail.com> Signed-off-by: Toke Høiland-Jørgensen <t...@toke.dk> --- Changelog: v3: - Use IS_REACHABLE() macro to fix compilation when sch_cake is built-in and conntrack is a module. - Switch the stats output to use nested netlink attributes instead of a versioned struct. - Remove GPL boilerplate. - Fix array initialisation style. v2: - Fix kbuild test bot complaint - Clean up the netlink ABI - Fix checkpatch complaints - A few tweaks to the behaviour of cake based on testing carried out while writing the paper. include/uapi/linux/pkt_sched.h | 104 ++ net/sched/Kconfig | 11 + net/sched/Makefile |1 + net/sched/sch_cake.c | 2616 4 files changed, 2732 insertions(+) create mode 100644 net/sched/sch_cake.c diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h index 37b5096ae97b..a54474ac8259 100644 --- a/include/uapi/linux/pkt_sched.h +++ b/include/uapi/linux/pkt_sched.h @@ -934,4 +934,108 @@ enum { #define TCA_CBS_MAX (__TCA_CBS_MAX - 1) +/* CAKE */ +enum { + TCA_CAKE_UNSPEC, + TCA_CAKE_BASE_RATE, + TCA_CAKE_DIFFSERV_MODE, + TCA_CAKE_ATM, + TCA_CAKE_FLOW_MODE, + TCA_CAKE_OVERHEAD, + TCA_CAKE_RTT, +
[Cake] Pre-print of Cake paper available
Last week we submitted an academic paper describing Cake. A pre-print is now available on arXiv: https://arxiv.org/abs/1804.07617 Comments welcome, of course :) -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
[Cake] SKB queue API usage
I've started looking at what changes are needed before Cake can be submitted upstream (see the upstream-4.18 branch of the repo). While looking over the data structures I noticed that struct cake_flow contains pointers to struct sk_buff instread of using the in-kernel struct sk_buff_head queueing structure. Why is that? -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] Testing variants of the MTU latency scaling
Jonathan Mortonwrites: >> Based on all this, I propose we change the scaling mechanism so that it >> is only active in egress mode, and change it from 4 MTUs to 2. I'll >> merge Kevin's patch to do this unless someone complains loudly :) > > I suppose that makes enough sense. But let me look over the patch > again - there's probably a better way to do it. Right. Pushed the patch to the cobalt branch, feel free to fix it up :) -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] Pre-print of Cake paper available
Sebastian Moeller <moell...@gmx.de> writes: > Hi Toke, > > > >> On Apr 24, 2018, at 10:47, Toke Høiland-Jørgensen <t...@toke.dk> wrote: >> >> Sebastian Moeller <moell...@gmx.de> writes: >> >>>> On Apr 24, 2018, at 01:01, Pete Heist <p...@eventide.io> wrote: >>>> >>>> >>>>> On Apr 23, 2018, at 10:39 AM, Toke Høiland-Jørgensen <t...@toke.dk> wrote: >>>>> >>>>> Last week we submitted an academic paper describing Cake. A pre-print is >>>>> now available on arXiv: https://arxiv.org/abs/1804.07617 >>>>> >>>>> Comments welcome, of course :) >>>> >>>> >>>> Nice work overall… :) Below is some feedback on content, and attached is a >>>> marked up PDF with some feedback on grammar and wording. Click the vanilla >>>> squares to show the notes. >>>> >>>> Content: >>>> >>>> - I wish there were some reference on how widespread of a problem >>>> bufferbloat actually is on the current Internet. That would bolster the >>>> initial assertion in the introduction. >>>> >>>> - Thank you, I finally “get" triple-isolate. :) But I find it easier >>>> to understand the behavior of dual-srchost and dual-dsthost, and I >>>> think most would prefer its behavior, despite the fact that it needs >>>> to be configured manually. Just a thought, knowing that cake >>>> currently targets home gateways, and that there are now the egress >>>> and ingress keywords, could host isolation default to dual-srchost >>>> for egress mode and dual-dsthost for ingress mode? Or since using the >>>> keywords would be fragile, is there a better way to know the proper >>>> sense for dual-srchost and dual-dsthost? >>> >>> The challenge is to find a heuristic that covers all reasonable >>> use cases and does no harm in unexpected cases. I could envision >>> setting up a cake instance on the upstream end of say a >>> microwave link, there "ingress" seems like the appropriate >>> keyword (as the goal would be to keep the link non-congested), >>> but for customer fairness "dual-srchost" would be the >>> appropriate keyword (or just srchost if all the ISP cares for is >>> inter-customer fairness). Sure this will not work with IPv6 (for >>> that we would either need to llok at the MACs or IMHO preferably >>> the IPv6 prefix (or the partially masked IPv6 IP-address, I >>> believe this to be better than MAC adresses as the ISP can >>> easily control the prefix, but I digress)). >> >> I don't think we can make assumptions on ISP deployments. > > Sure we do not really need to: > https://forum.lede-project.org/t/transparent-cake-box/2161/4?u=moeller0 > and > https://forum.lede-project.org/t/lede-as-a-dedicated-qos-bufferbloat-appliance/1861/14?u=moeller0 > so it looks like one person already use cake in an small ISP context. > Now 1 is not a very convincing number, but certainly larger than > zero... Well I just meant that there are many ways to deploy a shaper in an ISP context (centrally in the network, at the next hop from the customer, etc). Looking at those threads, they seem to be increasing the number of queues. Not sure they need to, but, well, there's nothing in principle that says this couldn't be configurable (it is in FQ-CoDel). It would need a bit of a reorg of the current code, though, so that would be a thing for later I guess... -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] Pre-print of Cake paper available
Sebastian Moellerwrites: >> Looking at those threads, they seem to be increasing the number of >> queues. Not sure they need to, but, well, there's nothing in principle >> that says this couldn't be configurable (it is in FQ-CoDel). It would >> need a bit of a reorg of the current code, though, so that would be a >> thing for later I guess... > > I had hoped that orangetek could be convinced to do measurements > with different number of queues to answer that question, but I > guess the current default of 1000 queues is decent for a typical > homenet, but will simply become a bit tight with 600 concurrent > households (I assume that its the peak usage that would want > more queues on the "back-haul", on average 1000 might still be > okay, especially with Jonathan's clever set associativity > design) Well, there's https://www.researchgate.net/profile/S_Oueslati/publication/247045479_On_the_Scalability_of_Fair_Queueing/links/5451fab10cf285a067c6af0a/On-the-Scalability-of-Fair-Queueing.pdf -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] [PATCH] Remove plain pkt_sched.h dependency
Georgios Amanakiswrites: > I just tried an in-tree built using net-next. Aside from that line it > compiles on x86_64 successfully. I will give it a try on my router > later on. Yeah, I was getting to that; but removing that include means the changes have to be imported into the kernel-tree pkt_sched.h (which I assume you did to compile it?), so I wanted to finish up any other changes first... Thanks for testing! -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] Pre-print of Cake paper available
Pete Heistwrites: >> On Apr 24, 2018, at 10:50 AM, Jonathan Morton wrote: >> >>> I think, if we wanted to support the ISP case, that a per-customer *shaper* >>> is more useful. >> >> Yes, I think the technology can be recoded to better suit a >> multi-subscriber environment; it would no longer be Cake, but would >> use some of the same key algorithms. > > Right, I was going to comment on the paper- where’s the ISP backhaul > use case? (Because I might still try to use it that way.) But I was > sure that this was already considered, and am not surprised that it > might be a different but related project. Well, you could use it on an ISP backhaul by having a separate CAKE instance per customer, and having another mechanism to assign customer traffic to each. An HTB tree would be one way to do this; separate interfaces per customer would be another. The latter works well if customers are on separate VLANs, for instance... But yeah, having a single instance of CAKE solve all of your customer shaping problems is probably not in scope currently... ;) -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] Pre-print of Cake paper available
Sebastian Moeller <moell...@gmx.de> writes: >> On Apr 24, 2018, at 01:01, Pete Heist <p...@eventide.io> wrote: >> >> >>> On Apr 23, 2018, at 10:39 AM, Toke Høiland-Jørgensen <t...@toke.dk> wrote: >>> >>> Last week we submitted an academic paper describing Cake. A pre-print is >>> now available on arXiv: https://arxiv.org/abs/1804.07617 >>> >>> Comments welcome, of course :) >> >> >> Nice work overall… :) Below is some feedback on content, and attached is a >> marked up PDF with some feedback on grammar and wording. Click the vanilla >> squares to show the notes. >> >> Content: >> >> - I wish there were some reference on how widespread of a problem >> bufferbloat actually is on the current Internet. That would bolster the >> initial assertion in the introduction. >> >> - Thank you, I finally “get" triple-isolate. :) But I find it easier >> to understand the behavior of dual-srchost and dual-dsthost, and I >> think most would prefer its behavior, despite the fact that it needs >> to be configured manually. Just a thought, knowing that cake >> currently targets home gateways, and that there are now the egress >> and ingress keywords, could host isolation default to dual-srchost >> for egress mode and dual-dsthost for ingress mode? Or since using the >> keywords would be fragile, is there a better way to know the proper >> sense for dual-srchost and dual-dsthost? > > The challenge is to find a heuristic that covers all reasonable > use cases and does no harm in unexpected cases. I could envision > setting up a cake instance on the upstream end of say a > microwave link, there "ingress" seems like the appropriate > keyword (as the goal would be to keep the link non-congested), > but for customer fairness "dual-srchost" would be the > appropriate keyword (or just srchost if all the ISP cares for is > inter-customer fairness). Sure this will not work with IPv6 (for > that we would either need to llok at the MACs or IMHO preferably > the IPv6 prefix (or the partially masked IPv6 IP-address, I > believe this to be better than MAC adresses as the ISP can > easily control the prefix, but I digress)). I don't think we can make assumptions on ISP deployments. The shaper may or may not be at the point of NAT, and per-customer prefix size can vary. To properly support the ISP per-customer fairness use case, we'd probably need to support arbitrary filtering (like what FQ-CoDel supports with 'tc filter'). And I think, if we wanted to support the ISP case, that a per-customer *shaper* is more useful... -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] Pre-print of Cake paper available
Pete Heist <p...@eventide.io> writes: >> On Apr 23, 2018, at 10:39 AM, Toke Høiland-Jørgensen <t...@toke.dk> wrote: >> >> Last week we submitted an academic paper describing Cake. A pre-print is >> now available on arXiv: https://arxiv.org/abs/1804.07617 >> >> Comments welcome, of course :) > > > Nice work overall… :) Below is some feedback on content, and attached > is a marked up PDF with some feedback on grammar and wording. Click > the vanilla squares to show the notes. Thanks! A lot of those should have been fixed before submission; boy, did I make a mess of verb tenses... Ah well, I'll incorporate your fixes, so they will be fixed for the camera ready (assuming it gets accepted) :) > Content: > > - I wish there were some reference on how widespread of a problem > bufferbloat actually is on the current Internet. That would bolster > the initial assertion in the introduction. Hmm, I do actually have a paper of my own that I could cite for this ;) The trouble is that we have a pretty tight page limit, and adding another reference takes us over that, meaning we would have to cut something else. And I think we can do without in an academic paper context at least... > - Thank you, I finally “get" triple-isolate. :) But I find it easier > to understand the behavior of dual-srchost and dual-dsthost, and I > think most would prefer its behavior, despite the fact that it needs > to be configured manually. Just a thought, knowing that cake currently > targets home gateways, and that there are now the egress and ingress > keywords, could host isolation default to dual-srchost for egress mode > and dual-dsthost for ingress mode? Or since using the keywords would > be fragile, is there a better way to know the proper sense for > dual-srchost and dual-dsthost? > > - If ‘nat’ is not the default, won’t host isolation not work by > default for most home gateways, almost all of which do nat? (Untested > assumption.) I think these questions are actually better handled as userspace policy issues. For instance, in sqm-scripts we could reasonably default to dual-srchost on egress and dual-dsthost on ingress, as we know which is which. > - Not in the paper, but is the ‘wash’ keyword really needed? Is anything really *needed*? ;) It's useful in settings where you want to clear diffserv markings, and not in other places... > - Is it worth mentioning that when the home gateway’s uplink is WiFi, > shaping is hard to do reliably, overhead and framing compensation > can’t even be implemented, and that this is all more properly handled > in the WiFi specific work? > > - One of the biggest deployment challenges (not unique to cake) is > that most people have to use shaping, since deploying cake on their > gateway’s external interface isn’t practical. But setting the rate > properly for shaping isn’t always straightforward. This is sheer > speculation, but could observed latency (obtained by passively > measuring TCP RTT, for instance) be used as a signal to control the > rate? I can only imagine this might be difficult to get right (though > I would’ve thought what BBR does is also), so just take this as food > for thought... I'd categorise these as relevant issues that we don't have space to discuss in the paper ;) -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] Testing variants of the MTU latency scaling
Jonas Mårtenssonwrites: > One thing that is still not clear to me from these results: if I run > cake on an IFB without ingress mode (i.e. the default?), does the MTU > scaling have any impact on TCP download throughput? Odds are that not using ingress mode will make Cake lose control of the bottleneck (that is what happened when I tried running a quick test), and so will mess up both latency and throughput as you hit the bloated upstream link buffer... -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
[Cake] [PATCH net-next v2] Add Common Applications Kept Enhanced (cake) qdisc
sch_cake targets the home router use case and is intended to squeeze the most bandwidth and latency out of even the slowest ISP links and routers, while presenting an API simple enough that even an ISP can configure it. Example of use on a cable ISP uplink: tc qdisc add dev eth0 cake bandwidth 20Mbit nat docsis ack-filter To shape a cable download link (ifb and tc-mirred setup elided) tc qdisc add dev ifb0 cake bandwidth 200mbit nat docsis ingress wash CAKE is filled with: * A hybrid Codel/Blue AQM algorithm, "Cobalt", tied to an FQ_Codel derived Flow Queuing system, which autoconfigures based on the bandwidth. * A novel "triple-isolate" mode (the default) which balances per-host and per-flow FQ even through NAT. * An deficit based shaper, that can also be used in an unlimited mode. * 8 way set associative hashing to reduce flow collisions to a minimum. * A reasonable interpretation of various diffserv latency/loss tradeoffs. * Support for zeroing diffserv markings for entering and exiting traffic. * Support for interacting well with Docsis 3.0 shaper framing. * Extensive support for DSL framing types. * Support for ack filtering. * Extensive statistics for measuring, loss, ecn markings, latency variation. A paper describing the design of CAKE is available at https://arxiv.org/abs/1804.07617 Various versions baking have been available as an out of tree build for kernel versions going back to 3.10, as the embedded router world has been running a few years behind mainline Linux. A stable version has been generally available on lede-17.01 and later. sch_cake replaces a combination of iptables, tc filter, htb and fq_codel in the sqm-scripts, with sane defaults and vastly simpler configuration. CAKE's principal author is Jonathan Morton, with contributions from Kevin Darbyshire-Bryant, Toke Høiland-Jørgensen, Sebastian Moeller, Ryan Mounce, Guido Sarducci, Dean Scarff, Nils Andreas Svee, Dave Täht, and Loganaden Velvindron. Testing from Pete Heist, Georgios Amanakis, and the many other members of the cake@lists.bufferbloat.net mailing list. tc -s qdisc show dev eth2 qdisc cake 1: root refcnt 2 bandwidth 100Mbit diffserv3 triple-isolate rtt 100.0ms raw overhead 0 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 memory used: 0b of 500b capacity estimate: 100Mbit min/max network layer size:65535 / 0 min/max overhead-adjusted size:65535 / 0 average network hdr offset:0 Bulk Best EffortVoice thresh 6250Kbit 100Mbit 25Mbit target 5.0ms5.0ms5.0ms interval 100.0ms 100.0ms 100.0ms pk_delay 0us 0us 0us av_delay 0us 0us 0us sp_delay 0us 0us 0us pkts000 bytes 000 way_inds000 way_miss000 way_cols000 drops 000 marks 000 ack_drop000 sp_flows000 bk_flows000 un_flows000 max_len 000 quantum 300 1514 762 Tested-by: Pete Heist <petehe...@gmail.com> Tested-by: Georgios Amanakis <gamana...@gmail.com> Signed-off-by: Dave Taht <dave.t...@gmail.com> Signed-off-by: Toke Høiland-Jørgensen <t...@toke.dk> --- Changes since v1: - Fix kbuild test bot complaint - Clean up the netlink ABI - Fix checkpatch complaints - A few tweaks to the behaviour of cake based on testing carried out while writing the paper. include/uapi/linux/pkt_sched.h | 107 ++ net/sched/Kconfig | 11 + net/sched/Makefile |1 + net/sched/sch_cake.c | 2620 4 files changed, 2739 insertions(+) create mode 100644 net/sched/sch_cake.c diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h index 37b5096ae97b..919f7c698fed 100644 --- a/include/uapi/linux/pkt_sched.h +++ b/include/uapi/linux/pkt_sched.h @@ -934,4 +934,111 @@ enum { #define TCA_CBS_MAX (__TCA_CBS_MAX - 1) +/* CAKE */ +enum { + TCA_CAKE_UNSPEC, + TCA_CAKE_BASE_RATE, + TCA_CAKE_DIFFSERV_MODE, + TCA_CAKE_ATM, + TCA_CAKE_FLOW_MODE, + TCA_CAKE_OVERHEAD, + TCA_CAKE_RTT, + TCA_CAKE_TARGET, + TCA_CAKE_AUTORATE, + TCA_CAKE_MEMORY, + TCA_CAKE_NAT, + TCA_CAKE_RAW, // was _ETHERNET + TCA_CAKE_WASH, + TCA_CAKE_MPU, + TCA_CAKE_INGRESS, + TCA_CAKE_ACK_FILTER, + __TCA_CAKE_MAX +}; +#define
[Cake] [PATCH iproute2-next v2] Add support for cake qdisc
sch_cake targets the home router use case and is intended to squeeze the most bandwidth and latency out of even the slowest ISP links and routers, while presenting an API simple enough that even an ISP can configure it. Example of use on a cable ISP uplink: tc qdisc add dev eth0 cake bandwidth 20Mbit nat docsis ack-filter To shape a cable download link (ifb and tc-mirred setup elided) tc qdisc add dev ifb0 cake bandwidth 200mbit nat docsis ingress wash CAKE is filled with: * A hybrid Codel/Blue AQM algorithm, "Cobalt", tied to an FQ_Codel derived Flow Queuing system, which autoconfigures based on the bandwidth. * A novel "triple-isolate" mode (the default) which balances per-host and per-flow FQ even through NAT. * An deficit based shaper, that can also be used in an unlimited mode. * 8 way set associative hashing to reduce flow collisions to a minimum. * A reasonable interpretation of various diffserv latency/loss tradeoffs. * Support for zeroing diffserv markings for entering and exiting traffic. * Support for interacting well with Docsis 3.0 shaper framing. * Support for DSL framing types and shapers. * Support for ack filtering. * Extensive statistics for measuring, loss, ecn markings, latency variation. A paper describing the design of CAKE is available at https://arxiv.org/abs/1804.07617 Various versions baking have been available as an out of tree build for kernel versions going back to 3.10, as the embedded router world has been running a few years behind mainline Linux. A stable version has been generally available on lede-17.01 and later. sch_cake replaces a combination of iptables, tc filter, htb and fq_codel in the sqm-scripts, with sane defaults and vastly simpler configuration. Cake's principal author is Jonathan Morton, with contributions from Kevin Darbyshire-Bryant, Toke Høiland-Jørgensen, Sebastian Moeller, Ryan Mounce, Guido Sarducci, Dean Scarff, Nils Andreas Svee, Dave Täht, and Loganaden Velvindron. Testing from Pete Heist, Georgios Amanakis, and the many other members of the cake@lists.bufferbloat.net mailing list. Signed-off-by: Dave Taht <dave.t...@gmail.com> Signed-off-by: Toke Høiland-Jørgensen <t...@toke.dk> --- Changes since v1: - Updated netlink ABI - Remove diffserv-llt mode - Various tweaks and clean-ups of stats output man/man8/tc-cake.8 | 632 +++ man/man8/tc.8 | 1 + tc/Makefile| 1 + tc/q_cake.c| 797 + 4 files changed, 1431 insertions(+) create mode 100644 man/man8/tc-cake.8 create mode 100644 tc/q_cake.c diff --git a/man/man8/tc-cake.8 b/man/man8/tc-cake.8 new file mode 100644 index ..30e41bc9 --- /dev/null +++ b/man/man8/tc-cake.8 @@ -0,0 +1,632 @@ +.TH CAKE 8 "24 April 2018" "iproute2" "Linux" +.SH NAME +CAKE \- Common Applications Kept Enhanced (CAKE) +.SH SYNOPSIS +.B tc qdisc ... cake +.br +[ +.BR bandwidth +RATE | +.BR unlimited* +| +.BR autorate_ingress +] +.br +[ +.BR rtt +TIME | +.BR datacentre +| +.BR lan +| +.BR metro +| +.BR regional +| +.BR internet* +| +.BR oceanic +| +.BR satellite +| +.BR interplanetary +] +.br +[ +.BR besteffort +| +.BR diffserv8 +| +.BR diffserv4 +| +.BR diffserv3* +] +.br +[ +.BR flowblind +| +.BR srchost +| +.BR dsthost +| +.BR hosts +| +.BR flows +| +.BR dual-srchost +| +.BR dual-dsthost +| +.BR triple-isolate* +] +.br +[ +.BR nat +| +.BR nonat* +] +.br +[ +.BR wash +| +.BR nowash* +] +.br +[ +.BR ack-filter +| +.BR ack-filter-aggressive +| +.BR no-ack-filter* +] +.br +[ +.BR memlimit +LIMIT ] +.br +[ +.BR ptm +| +.BR atm +| +.BR noatm* +] +.br +[ +.BR overhead +N | +.BR conservative +| +.BR raw* +] +.br +[ +.BR mpu +N ] +.br +[ +.BR ingress +| +.BR egress* +] +.br +(* marks defaults) + + +.SH DESCRIPTION +CAKE (Common Applications Kept Enhanced) is a shaping-capable queue discipline +which uses both AQM and FQ. It combines COBALT, which is an AQM algorithm +combining Codel and BLUE, a shaper which operates in deficit mode, and a variant +of DRR++ for flow isolation. 8-way set-associative hashing is used to virtually +eliminate hash collisions. Priority queuing is available through a simplified +diffserv implementation. Overhead compensation for various encapsulation +schemes is tightly integrated. + +All settings are optional; the default settings are chosen to be sensible in +most common deployments. Most people will only need to set the +.B bandwidth +parameter to get useful results, but reading the +.B Overhead Compensation +and +.B Round Trip Time +sections is strongly encouraged. + +.SH SHAPER PARAMETERS +CAKE uses a deficit-mode shaper, which does not exhibit the initial burst +typical of token-bucket shapers. It will automatically burst precisely as much +as required to maintain the configured throughput. As such, it is very +straightforward to configure. +.PP +.B unlimited +(default) +.br + No limit on the b
[Cake] [PATCH iproute2-next v3] Add support for cake qdisc
sch_cake is intended to squeeze the most bandwidth and latency out of even the slowest ISP links and routers, while presenting an API simple enough that even an ISP can configure it. Example of use on a cable ISP uplink: tc qdisc add dev eth0 cake bandwidth 20Mbit nat docsis ack-filter To shape a cable download link (ifb and tc-mirred setup elided) tc qdisc add dev ifb0 cake bandwidth 200mbit nat docsis ingress wash besteffort Cake is filled with: * A hybrid Codel/Blue AQM algorithm, "Cobalt", tied to an FQ_Codel derived Flow Queuing system, which autoconfigures based on the bandwidth. * A novel "triple-isolate" mode (the default) which balances per-host and per-flow FQ even through NAT. * An deficit based shaper, that can also be used in an unlimited mode. * 8 way set associative hashing to reduce flow collisions to a minimum. * A reasonable interpretation of various diffserv latency/loss tradeoffs. * Support for zeroing diffserv markings for entering and exiting traffic. * Support for interacting well with Docsis 3.0 shaper framing. * Support for DSL framing types and shapers. * Support for ack filtering. * Extensive statistics for measuring, loss, ecn markings, latency variation. Various versions baking have been available as an out of tree build for kernel versions going back to 3.10, as the embedded router world has been running a few years behind mainline Linux. A stable version has been generally available on lede-17.01 and later. sch_cake replaces a combination of iptables, tc filter, htb and fq_codel in the sqm-scripts, with sane defaults and vastly simpler configuration. Cake's principal author is Jonathan Morton, with contributions from Kevin Darbyshire-Bryant, Toke Høiland-Jørgensen, Sebastian Moeller, Ryan Mounce, Guido Sarducci, Dean Scarff, Nils Andreas Svee, Dave Täht, and Loganaden Velvindron. Testing from Pete Heist, Georgios Amanakis, and the many other members of the cake@lists.bufferbloat.net mailing list. Signed-off-by: Dave Taht <dave.t...@gmail.com> Signed-off-by: Toke Høiland-Jørgensen <t...@toke.dk> --- Changes since v2: - Accidentally sent a version of the tc patch from my testing branch which includes a test flag that shouldn't be in there. Sorry for the mistake. man/man8/tc-cake.8 | 632 +++ man/man8/tc.8 | 1 + tc/Makefile| 1 + tc/q_cake.c| 778 + 4 files changed, 1412 insertions(+) create mode 100644 man/man8/tc-cake.8 create mode 100644 tc/q_cake.c diff --git a/man/man8/tc-cake.8 b/man/man8/tc-cake.8 new file mode 100644 index ..30e41bc9 --- /dev/null +++ b/man/man8/tc-cake.8 @@ -0,0 +1,632 @@ +.TH CAKE 8 "23 November 2017" "iproute2" "Linux" +.SH NAME +CAKE \- Common Applications Kept Enhanced (CAKE) +.SH SYNOPSIS +.B tc qdisc ... cake +.br +[ +.BR bandwidth +RATE | +.BR unlimited* +| +.BR autorate_ingress +] +.br +[ +.BR rtt +TIME | +.BR datacentre +| +.BR lan +| +.BR metro +| +.BR regional +| +.BR internet* +| +.BR oceanic +| +.BR satellite +| +.BR interplanetary +] +.br +[ +.BR besteffort +| +.BR diffserv8 +| +.BR diffserv4 +| +.BR diffserv3* +] +.br +[ +.BR flowblind +| +.BR srchost +| +.BR dsthost +| +.BR hosts +| +.BR flows +| +.BR dual-srchost +| +.BR dual-dsthost +| +.BR triple-isolate* +] +.br +[ +.BR nat +| +.BR nonat* +] +.br +[ +.BR wash +| +.BR nowash* +] +.br +[ +.BR ack-filter +| +.BR ack-filter-aggressive +| +.BR no-ack-filter* +] +.br +[ +.BR memlimit +LIMIT ] +.br +[ +.BR ptm +| +.BR atm +| +.BR noatm* +] +.br +[ +.BR overhead +N | +.BR conservative +| +.BR raw* +] +.br +[ +.BR mpu +N ] +.br +[ +.BR ingress +| +.BR egress* +] +.br +(* marks defaults) + + +.SH DESCRIPTION +CAKE (Common Applications Kept Enhanced) is a shaping-capable queue discipline +which uses both AQM and FQ. It combines COBALT, which is an AQM algorithm +combining Codel and BLUE, a shaper which operates in deficit mode, and a variant +of DRR++ for flow isolation. 8-way set-associative hashing is used to virtually +eliminate hash collisions. Priority queuing is available through a simplified +diffserv implementation. Overhead compensation for various encapsulation +schemes is tightly integrated. + +All settings are optional; the default settings are chosen to be sensible in +most common deployments. Most people will only need to set the +.B bandwidth +parameter to get useful results, but reading the +.B Overhead Compensation +and +.B Round Trip Time +sections is strongly encouraged. + +.SH SHAPER PARAMETERS +CAKE uses a deficit-mode shaper, which does not exhibit the initial burst +typical of token-bucket shapers. It will automatically burst precisely as much +as required to maintain the configured throughput. As such, it is very +straightforward to configure. +.PP +.B unlimited +(default) +.br + No limit on the bandwidth. +.PP +.B bandwidth +RATE +.br
Re: [Cake] [PATCH iproute2-next v3] Add support for cake qdisc
Stephen Hemminger <step...@networkplumber.org> writes: > On Tue, 24 Apr 2018 14:30:46 +0200 > Toke Høiland-Jørgensen <t...@toke.dk> wrote: > >> +static void cake_print_json_tin(struct tc_cake_tin_stats *tst, uint version) >> +{ >> +open_json_object(NULL); >> +print_uint(PRINT_JSON, "threshold_rate", NULL, tst->threshold_rate); >> +print_uint(PRINT_JSON, "target", NULL, tst->target_us); >> +print_uint(PRINT_JSON, "interval", NULL, tst->interval_us); >> +print_uint(PRINT_JSON, "peak_delay", NULL, tst->peak_delay_us); >> +print_uint(PRINT_JSON, "average_delay", NULL, tst->avge_delay_us); >> +print_uint(PRINT_JSON, "base_delay", NULL, tst->base_delay_us); >> +print_uint(PRINT_JSON, "sent_packets", NULL, tst->sent.packets); >> +print_uint(PRINT_JSON, "sent_bytes", NULL, tst->sent.bytes); >> +print_uint(PRINT_JSON, "way_indirect_hits", NULL, >> tst->way_indirect_hits); >> +print_uint(PRINT_JSON, "way_misses", NULL, tst->way_misses); >> +print_uint(PRINT_JSON, "way_collisions", NULL, tst->way_collisions); >> +print_uint(PRINT_JSON, "drops", NULL, tst->dropped.packets); >> +print_uint(PRINT_JSON, "ecn_mark", NULL, tst->ecn_marked.packets); >> +print_uint(PRINT_JSON, "ack_drops", NULL, tst->ack_drops.packets); >> +print_uint(PRINT_JSON, "sparse_flows", NULL, tst->sparse_flows); >> +print_uint(PRINT_JSON, "bulk_flows", NULL, tst->bulk_flows); >> +print_uint(PRINT_JSON, "unresponsive_flows", NULL, >> tst->unresponse_flows); >> +print_uint(PRINT_JSON, "max_pkt_len", NULL, tst->max_skblen); >> +if (version >= 0x102) >> +print_uint(PRINT_JSON, "flow_quantum", NULL, tst->flow_quantum); > > Please don't version objects in netlink. That is not how netlink is > supposed to be used. Well, this is leftover from keeping track of different versions of the out-of-tree patch, and we already broke compatibility pretty thoroughly as a preparation for upstreaming. So I'm fine with dropping the version check; will resend. That being said, the versioning comes from the XSTATS API, which does not use netlink attributes for its members, but rather passes through as struct. So what is one supposed to do in this case? -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] [PATCH iproute2-next v3] Add support for cake qdisc
Stephen Hemminger <step...@networkplumber.org> writes: > On Tue, 24 Apr 2018 16:52:57 +0200 > Toke Høiland-Jørgensen <t...@toke.dk> wrote: > >> Well, this is leftover from keeping track of different versions of the >> out-of-tree patch, and we already broke compatibility pretty thoroughly >> as a preparation for upstreaming. So I'm fine with dropping the version >> check; will resend. >> >> That being said, the versioning comes from the XSTATS API, which does >> not use netlink attributes for its members, but rather passes through as >> struct. So what is one supposed to do in this case? > > If a structure is likely to change, then it should be decomposed into nested > netlink attributes. Once you send a structure over userspace API in netlink > it is fixed forever. Right. Is decomposing stats into netlink attributes actually possible within the qdisc dump_stats callback? Could we do something like: nla_start_nest(d->skb, TCA_CAKE_STATS); from within cake_dump_stats() and read it in cake_print_xstats in tc? -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] [PATCH net-next v2] Add Common Applications Kept Enhanced (cake) qdisc
Georgios Amanakis <gamana...@gmail.com> writes: > On Tue, 24 Apr 2018 13:44:06 +0200 > Toke Høiland-Jørgensen <t...@toke.dk> wrote: > >> +config NET_SCH_CAKE >> + tristate "Common Applications Kept Enhanced (CAKE)" >> + help >> + Say Y here if you want to use the Common Applications Kept Enhanced >> + (CAKE) queue management algorithm. >> + >> + To compile this driver as a module, choose M here: the module >> + will be called sch_cake. > > In net/sched/Kconfig we should probably add NF_CONNTRACK as a dependency: > "depends on NF_CONNTRACK" > > Otherwise if NET_SCH_CAKE=y and NF_CONNTRACK=m compilation fails with: > > net/sched/sch_cake.o: In function `cake_enqueue': > sch_cake.c:(.text+0x3e10): undefined reference to `nf_ct_get_tuplepr' > sch_cake.c:(.text+0x3e3a): undefined reference to `nf_conntrack_find_get' > make: *** [Makefile:1041: vmlinux] Error 1 Hmm we really don't want to have a hard depend on conntrack. We currently ifdef the conntrack-specific bits thus: #if IS_ENABLED(CONFIG_NF_CONNTRACK). Does anyone know if there is a way to do this so the module/builtin split doesn't bite us? -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] [PATCH net-next v3] Add Common Applications Kept Enhanced (cake) qdisc
Eric Dumazet <eric.duma...@gmail.com> writes: > On 04/25/2018 06:42 AM, Toke Høiland-Jørgensen wrote: >> sch_cake targets the home router use case and is intended to squeeze the >> most bandwidth and latency out of even the slowest ISP links and routers, >> while presenting an API simple enough that even an ISP can configure it. >> > > * Support for ack filtering. > > Oh my god. Cake became a monster. Haha, you either die a hero or live long enough to see yourself become the villain, I guess? ;) We do realise there are lots of good reasons not to do ACK filtering; but it does make a difference on highly asymmetrical links (see http://blog.cerowrt.org/post/ack_filtering/), which sadly some people are still behind. > syzkaller will be very happy to trigger all kind of bugs in it. Heh, right. > Lack of any pskb_may_pull() is really concerning. By this you mean "check that the packet is long enough to contain the header we are looking for before trying to do ACK filtering", right? > How ack filter deals with reorders ? I think this will do the right thing? /* new ack sequence must be greater */ if (thisconn && (ntohl(tcph_check->ack_seq) > ntohl(tcph->ack_seq))) continue; > Also the forced GSO segmentation looks wrong to me. Wrong as in it's done wrong, or wrong as in you object to the whole concept? > This kills xmit_more gain we have when GSO is performed after > qdisc dequeue before hitting device. > > This should be driven by a parameter really, some threshold on the skb size. > > What performance number do you get on a 10Gbit NIC for example ? Single-flow throughput through 2 hops on a 40Gbit connection (with CAKE in unlimited mode vs pfifo_fast on the router): MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to testbed-40g-2 () port 0 AF_INET : demo Recv SendSend Socket Socket Message Elapsed Size SizeSize Time Throughput bytes bytes bytessecs.10^6bits/sec 87380 16384 1638410.0018840.40 MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to testbed-40g-2 () port 0 AF_INET : demo Recv SendSend Socket Socket Message Elapsed Size SizeSize Time Throughput bytes bytes bytessecs.10^6bits/sec 87380 16384 1638410.0024804.77 Note, however, that CAKE is explicitly targeted at home gateways, which generally run at speed a few orders of magnitude slower than this, and we have heavily optimised for lowest possible latency. So this is a conscious design choice, I guess you could say. > Also, how ack filter can suppress packets after skb_gso_segment() ? Hmm, because pure ACKs are not generally aggregated (sorry, I'm not quite clear on when exactly GSO will kick in)? -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] Testing wanted: Statistics API rework
Pete Heist <p...@eventide.io> writes: >> On Apr 25, 2018, at 11:15 AM, Toke Høiland-Jørgensen <t...@toke.dk> wrote: >> >> Pete Heist <p...@eventide.io> writes: >> >>> I requested a re-build over in the Ubiquiti EdgeOS forums. Between >>> ER-X and ER-L I think that would cover 32-bit mips (little endian) and >>> 64-bit mips (big endian). Anything specific to test besides “it >>> works"? :) >> >> We need to verify that it will still print statistics and that the >> values do not look weird (especially the per-tin 'bytes' counter since >> that is a 64bit value). > > > In case you didn’t see it before, Lochnair has automatic builds set up > (cool) for EdgeOS that are failing as of yesterday: > > https://build.lochnair.net/job/ubiquiti/job/bufferbloat/job/cake_cobalt/ > > It may very well be due to the 3.10(!) kernel it ships with, but in > case it helps somehow, go to the latest build and “Console Output”. In > case you commit a change, the git repo is polled once an hour, > apparently. Otherwise, I don’t have an easy way to test mips… Yup, definitely an issue with kernel versions. Guess we'll have to go back and add in compatibility stuff later. Kevin was nice enough to test on an Archer C7, so I think we are good for now :) -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] gcc 7.3.1 issue on arch
Jonathan Mortonwrites: >> On 29 Mar, 2018, at 1:30 am, Bret Towe wrote: >> >> ./include/linux/kernel.h:6:10: fatal error: stdarg.h: No such file or >> directory > > I've started seeing the same error on my PowerPC box, running Gentoo. > It seemed specific to that machine, but the GCC version may well be > relevant since Gentoo tends to stabilise new GCC versions earlier on > minority platforms. Yeah, definitely only shows up in very new GCCs. > Shouldn't stdarg.h be a built-in header? Yup, which is why this is really odd... -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] bufferbloat still misunderstood & ignored
Jonathan Mortonwrites: >> On 30 Mar, 2018, at 11:05 am, Pete Heist wrote: >> >> So this mapping from member (subscriber) to their MACs or IPs would need to >> be configurable somewhere > > Yes, I assumed something like that would be required to assign the > correct tier of service (or BRAS rate) to each subscriber, and that's > what makes it ISP-focused rather than end-user focused. > > In Linux it appears to be possible to assign a flow mark (ie. a > number) to packets using firewall rules or qdisc filters, and the > subscriber ID and tier ID could probably be encoded into that. It > would then only be necessary to inform the qdisc of this encoding (how > many bits for each portion) and to define the tiers. Even better, you can do arbitrary programmable matching these days, using tc-bpf(8). Theoretically FQ-CoDel supports this in place of the built-in flow hashing; but I have never managed to get it configured correctly (haven't tried that hard, though). -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] gcc 7.3.1 issue on arch
Bret Towewrites: > for anyone seeing a compile error like the below on Arch Linux > > CC [M] /root/sch_cake/sch_cake.o > In file included from ./include/linux/list.h:9:0, > from ./include/linux/module.h:9, > from /root/sch_cake/sch_cake.c:42: > ./include/linux/kernel.h:6:10: fatal error: stdarg.h: No such file or > directory > #include > ^~ > compilation terminated. I was seeing the same issue on a full kernel compile, which was fixed by make clean && make -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] bufferbloat still misunderstood & ignored
Kevin Darbyshire-Bryant via Cakewrites: > From: Kevin Darbyshire-Bryant > Subject: bufferbloat still misunderstood & ignored > To: Cake List > Date: Wed, 28 Mar 2018 15:46:47 + > > http://forums.thinkbroadband.com/talktalk/t/4587923-sensible-discussion-with-talktalk-about-bufferbloat.html > > The thing that bothers me more than anything….. the first reply comes > from a staff member of ‘thinkbroadband’. "Solution is two get two lines, so if a big download is using up one they can do latency sensitive stuff on the other." Lovely; an ISP that found a way to monetise bufferbloat... :/ -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] Compiling under net-next
Toke Høiland-Jørgensen <t...@toke.dk> writes: > George Amanakis <gamana...@gmail.com> writes: > >> It seems the change was introduced here: >> https://patchwork.kernel.org/patch/9671147/ >> >> I drafted the following very simplistic patch, could somebody take a >> look at it? > > LGTM. We may want to actually use the extack feature at some point, but > we're not really returning error messages from the configuration except > when parsing fails, so I guess it doesn't matter too much in this > instance... Ah, wait, nla_parse_nested() takes an extack parameter now, so we should pass that through in cake_change() after this change. -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] [PATCH 1/2] cake: print_uint format fixes
Kevin Darbyshire-Bryant <ke...@darbyshire-bryant.me.uk> writes: >> On 12 Mar 2018, at 15:11, Stephen Hemminger <step...@networkplumber.org> >> wrote: >> >> On Mon, 12 Mar 2018 10:56:09 +0100 >> Toke Høiland-Jørgensen <t...@toke.dk> wrote: >>> >>> Stephen, would you accept patches to fix the API (to add >>> print_{u,}int64() variants and turn print_uint() into native-int size)? >>> Or should we stick with the API currently there and live with the >>> inconsistency? :) >>> >>> -Toke >> >> I agree print_int should take int, print_uint should take unsigned int, and >> there should >> be print_u64 (and print_u32, print_u8) > > Submitted a patch to netdev but almost certainly did it wrong/based on > wrong tree etc ‘cos I don’t normally inhabit that space. The quite > simple attached patch restores corrrect functioning on my BE MIPS box. > Validation that the MANY calls to print_uint are using the correct > type is left as an exercise for the reader :-) But at least there’s no > longer a hidden promotion from uint to uint_64t going on. Awesome! Good work :) -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] Compiling under net-next
George Amanakiswrites: > It seems the change was introduced here: > https://patchwork.kernel.org/patch/9671147/ > > I drafted the following very simplistic patch, could somebody take a > look at it? LGTM. We may want to actually use the extack feature at some point, but we're not really returning error messages from the configuration except when parsing fails, so I guess it doesn't matter too much in this instance... -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] Fwd: Compiling under net-next
Georgios Amanakiswrites: > #if LINUX_VERSION_CODE < KERNEL_VERSION(4, 12, 0) > err = nla_parse_nested(tb, TCA_CAKE_MAX, opt, cake_policy); > -#else > +#elif LINUX_VERSION_CODE < KERNEL_VERSION(4, 16, 0) > err = nla_parse_nested(tb, TCA_CAKE_MAX, opt, cake_policy, NULL); > +#else > +err = nla_parse_nested(tb, TCA_CAKE_MAX, opt, cake_policy, extack); > #endif Yes, exactly! Could you squash the two patches into one, add a Signed-off-by tag and send it using git send-email? It's somewhat mangled in transit, it seems; but otherwise looks good to apply :) -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] eliminating link_ms and increasing packets to 64 bits
Dave Tahtwrites: > fprintf(f, " pkts"); > FOR_EACH_TIN(stnc, tst, i) > - fprintf(f, " %12u", tst->sent.packets); > + fprintf(f, " %12llu", tst->sent.packets); > fprintf(f, "\n"); Presumably this can potentially break column alignment? Not sure that is too much of a problem, though; mostly cosmetic? -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] eliminating link_ms and increasing packets to 64 bits
Dave Tahtwrites: > the preceding space handles column alignment. the %12 is probably > going to be misleading. The space separates columns, the %12 aligns them. So it will break alignment. But as I said, mostly a cosmetic issue, so probably not something we should spend a lot of time on fixing. -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] [PATCH] q_cake: Update xstats format to use per-tin structure
Toke Høiland-Jørgensen <t...@toke.dk> writes: > Jonathan Morton <chromati...@gmail.com> writes: > >>> On 11 Feb, 2018, at 7:26 pm, Toke Høiland-Jørgensen <t...@toke.dk> wrote: >>> >>> This updates tc to understand the updated cake xstats structure (which >>> splits out the tin stats in a separate structure the length of which is >>> included in the containing struct). >>> >>> Old versions of the cake stats will no longer be understood by the >>> resulting version of tc. >> >> I think you could make a pull request for these patches now. I can >> then add new stats to the new code instead of duplicating work. > > I think I actually have write access to the repos, so I'll just push the > commits. Sometime tonight; the code is on my laptop :) And totally forgot about it, of course. Sorry about that, pushed now :) -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] [PATCH net-next v3] Add Common Applications Kept Enhanced (cake) qdisc
David Miller <da...@davemloft.net> writes: > From: Toke Høiland-Jørgensen <t...@toke.dk> > Date: Wed, 25 Apr 2018 15:42:48 +0200 > >> +static void *cake_zalloc(size_t sz) >> +{ >> +void *ptr = kzalloc(sz, GFP_KERNEL | __GFP_NOWARN); >> + >> +if (!ptr) >> +ptr = vzalloc(sz); >> +return ptr; >> +} > > This is just kvzalloc(sz, GFP_KERNEL | __GFP_NOWARN)? Ah yes; we actually fixed that before, but guess it got lost this time around. Will fix :) -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] [PATCH net-next v3] Add Common Applications Kept Enhanced (cake) qdisc
Eric Dumazet <eric.duma...@gmail.com> writes: > On 04/25/2018 11:34 AM, Toke Høiland-Jørgensen wrote: >> Eric Dumazet <eric.duma...@gmail.com> writes: >> >>> On 04/25/2018 09:52 AM, Jonathan Morton wrote: >>>>> We can see here the high cost of forcing software GSO :/ >>>>> >>>>> Really, this should be done only : >>>>> 1) If requested by the admin ( tc gso ) >>>>> >>>>> 2) If packet size is above a threshold. >>>>> The threshold could be set by the admin, and/or based on a fraction of >>>>> the bandwidth parameter. >>>>> >>>>> I totally understand why you prefer to segment yourself for < 100 Mbit >>>>> links. >>>>> >>>>> But this makes no sense on 10Gbit+ >>>> >>>> It is absolutely necessary, so far as I can see, to segment GSO >>>> superpackets when overhead compensation is selected - as it very >>>> often should be, even on pure Ethernet links. Without that, the >>>> calculation of link occupancy time will be wrong. (The actual >>>> transmission time of an Ethernet frame is rather more than just 14 >>>> bytes longer than the underlying IP packet.) >>> >>> Just fix the overhead compensation computation in the code. >>> >>> skb in a qdisc have everything you need. >>> >>> qdisc_pkt_len_init() has initialized qdisc_skb_cb(skb)->pkt_len with >>> the exact bytes on the wire, and you have gso_segs to perform any >>> adjustement you need to do. >> >> The problem is that may not be the right values. For example, in many >> CPEs there's a built-in switch that strips VLAN tags before the packet >> actually hits the wire. So we do need to be able to get the actual >> packet size. Is it possible to get the sizes of the individual segments >> of a GSO packet? That way we could do the calculation for the whole >> super-packet... > > All segments of GSO packets have the same size, by definition. > > Only the last segment might be smaller, and again this can be inferred > from gso_size and gso_segs Gotcha. Until we are confident that we've implemented this in a way that works, will this do? @@ -88,6 +88,7 @@ #define CAKE_SET_WAYS (8) #define CAKE_MAX_TINS (8) #define CAKE_QUEUES (1024) +#define CAKE_SPLIT_GSO_THRESHOLD (12500) /* 1Gbps */ @@ -1437,7 +1439,7 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc *sch, * or if we need to know individual packet sizes for framing overhead. */ - if (skb_is_gso(skb)) { + if (skb_is_gso(skb) && q->rate_flags & CAKE_FLAG_SPLIT_GSO) { struct sk_buff *segs, *nskb; netdev_features_t features = netif_skb_features(skb); /* signed slen to handle corner case @@ -2337,6 +2339,12 @@ static int cake_change(struct Qdisc *sch, struct nlattr *opt, if (tb[TCA_CAKE_MEMORY]) q->buffer_config_limit = nla_get_u32(tb[TCA_CAKE_MEMORY]); + if (q->rate_bps && (q->rate_bps <= CAKE_SPLIT_GSO_THRESHOLD || + q->rate_flags & CAKE_FLAG_OVERHEAD)) + q->rate_flags |= CAKE_FLAG_SPLIT_GSO; + else + q->rate_flags &= ~CAKE_FLAG_SPLIT_GSO; + if (q->tins) { sch_tree_lock(sch); cake_reconfigure(sch); ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] Pre-print of Cake paper available
David Lang <da...@lang.hm> writes: > On Tue, 24 Apr 2018, Toke Høiland-Jørgensen wrote: > >> Pete Heist <p...@eventide.io> writes: >> >>>> On Apr 24, 2018, at 7:58 AM, Jonathan Morton <chromati...@gmail.com> wrote: >>>> >>>> Turning NAT support on by default might actually be reasonable, since >>>> it doesn't really break anything if it's not needed - it just eats a >>>> bit of CPU with unnecessary conntrack lookups. >>> >>> I would be for it, if it eats say < 1% additional CPU, and preferably >>> less. I expect the impact to increase with packet rates. >> >> I'm a bit worried that the way it is implemented now, if we turn it on >> by default we risk activating conntrack even when it was otherwise >> disabled... > > I will say that just about every system ships with conntrack enabled, and > disabling it can be pretty difficult (especially in LEDE/OpenWRT), there are > so > many things that require it that tracking them all down and disabling them is > very difficult. > > There are not that many places where Cake is going to be used that NAT or > some > other thing that requires connection tracking is not also going to be used, > in > the remaining cases, can it be disabled manually in configs after it's been > sucked in automatically? Hmm, actually it looks like just compiling against the conntrack code adds a module dependency on conntrack. And as far as I can tell, the code doesn't initiate any new conntrack state if it doesn't already exist. So I think it's safe to turn on NAT mode by default. Will add that :) -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] [PATCH net-next v3] Add Common Applications Kept Enhanced (cake) qdisc
Eric Dumazetwrites: > On 04/25/2018 09:52 AM, Jonathan Morton wrote: >>> We can see here the high cost of forcing software GSO :/ >>> >>> Really, this should be done only : >>> 1) If requested by the admin ( tc gso ) >>> >>> 2) If packet size is above a threshold. >>> The threshold could be set by the admin, and/or based on a fraction of the >>> bandwidth parameter. >>> >>> I totally understand why you prefer to segment yourself for < 100 Mbit >>> links. >>> >>> But this makes no sense on 10Gbit+ >> >> It is absolutely necessary, so far as I can see, to segment GSO >> superpackets when overhead compensation is selected - as it very >> often should be, even on pure Ethernet links. Without that, the >> calculation of link occupancy time will be wrong. (The actual >> transmission time of an Ethernet frame is rather more than just 14 >> bytes longer than the underlying IP packet.) > > Just fix the overhead compensation computation in the code. > > skb in a qdisc have everything you need. > > qdisc_pkt_len_init() has initialized qdisc_skb_cb(skb)->pkt_len with > the exact bytes on the wire, and you have gso_segs to perform any > adjustement you need to do. The problem is that may not be the right values. For example, in many CPEs there's a built-in switch that strips VLAN tags before the packet actually hits the wire. So we do need to be able to get the actual packet size. Is it possible to get the sizes of the individual segments of a GSO packet? That way we could do the calculation for the whole super-packet... > Do not kill GSO only because you do not want to deal with it. It's not (just) that we don't want to deal with it, it is also a very aggressive optimisation for latency, which makes a lot of sense at residential internet bandwidths. >> Another reason to apply GSO segmentation is to achieve maximal >> smoothness of flow isolation. This should still be achievable within >> some tolerance at high link rates, but calculating this tolerance is >> complicated by the triple-isolate algorithm. >> >> If there's a way to obtain the individual packet sizes without >> incurring the full segmentation overhead, it may be worth considering >> (at high link rates only). I would want to leave it on by default, >> because some of Cake's demonstrably superior latency performance >> depends on seeing the real packets, not the aggregates, and the >> overhead only becomes significant above 100Mbps on weak MIPS CPUs and >> 1Gbps on vaguely modern x86 stuff. > > Again, these arguments are moot on 10Gbit+. > > Lets build the future, do not pretend GSO/TSO is not part of it. I'm all for being future proof, but we also do want well-tested code. Reworking this feature to function correctly with GSO is going to take some work. Would you accept gating GSO splitting on bandwidth for now, and adding correct overhead compensation for GSO packets later? something like: if (shaper_active() && (bandwidth <= 1Gbps || overhead_configured)) split_gso() -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
[Cake] CAKE upstreaming - testers wanted, ACK filtering rescuers needed
For those who have not been following the discussion on the upstreaming patches, here's an update: - I've just pushed patches to only split GSO packets when shaping below one gigabit; and hopefully made the overhead compensation code deal gracefully with GSO packets if someone for some reason wants to use the shaper at speeds higher than that and still use the overhead compensation code. - It turns out that the ACK filtering code does not properly sanity check the packet sizes, and so can potentially crash the box running CAKE if it receives malformed packets. So if no one steps up to fix that within the next few days, or I'll submit the next version without it (I'm not going to open that particular can of worms)... This doesn't mean it can't be added back later, of course, it just means it won't go upstream this time around. - NAT mode is now enabled by default; doesn't seem to be a good reason not to as the compile time dependency already makes the module depend on conntrack. So please do test the current git version (cobalt branch, still). I'm planning to resubmit on Friday. -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
[Cake] Linux 4.19 released...
...which means that CAKE is now officially in upstream Linux. Woohoo! It's even listed at the top of the "Coolest features" list on the kernelnewbies overview: https://kernelnewbies.org/Linux_4.19#Better_networking_experience_with_the_CAKE_queue_management_algorithm Congratulations, and thanks, to all involved :) -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] [Cerowrt-devel] apu2 sqm/htb issue + a minor win for speeding up fq_codel itself
Dave Taht writes: > Well, fq_codel does use a lot less cpu but everything seems slower I don't suppose 18.06 enables any of the SPECTRE mitigations (was that an issue on ARM)? -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] [Cerowrt-devel] apu2 sqm/htb issue + a minor win for speeding up fq_codel itself
Jonathan Morton writes: > I'm not familiar with precisely what mitigations are now in use on > ARM. I am however certain that, on a device running only trustworthy > code (ie. not running a Web browser), mitigating Spectre is > unnecessary. If an attacker gets into a position to exploit it, he's > already compromised the device enough to run a botnet anyway. Yup, especially on openwrt, where most daemons run as root anyway :) I would assume that something like the retpoline indirect function call protection is not actually enabled on openwrt; but since we were talking about performance regressions, that is certainly a major one... -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] arp flow dissector
Jonathan Morton writes: >> On 2 Sep, 2018, at 10:37 pm, Dave Taht wrote: >> >> Didn't know we had this now. >> >> https://www.spinics.net/lists/netdev/msg413986.html > > Does that automagically give Cake an idea of the IP addresses > associated with the packet, for host-fairness purposes? If not, we > should fix that. Don't think so. There's a follow-up patch to enable the matching for cls_flower: https://www.spinics.net/lists/netdev/msg413985.html However, in normal operation ARPs should be fairly rare, so adding this support to CAKE would mostly be to protect against flooding, wouldn't it? -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
[Cake] [PATCH] MAINTAINERS: Add entry for CAKE qdisc
We would like the existing community to be kept in the loop for any new developments on CAKE; and I certainly plan to keep maintaining it. Reflect this in MAINTAINERS. Signed-off-by: Toke Høiland-Jørgensen --- MAINTAINERS | 6 ++ 1 file changed, 6 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index 3bd775ba51ce..96be48879a7b 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -3276,6 +3276,12 @@ F: include/uapi/linux/caif/ F: include/net/caif/ F: net/caif/ +CAKE QDISC +M: Toke Høiland-Jørgensen +L: cake@lists.bufferbloat.net (moderated for non-subscribers) +S: Maintained +F: net/sched/sch_cake.c + CALGARY x86-64 IOMMU M: Muli Ben-Yehuda M: Jon Mason -- 2.19.1 ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] cake infinite loop(?) with hfsc on one-armed router
Pete Heist writes: >> On Jan 4, 2019, at 11:34 PM, Toke Høiland-Jørgensen wrote: >> >> Pete Heist writes: >> >> This basically means that we can't use CAKE as a leaf qdisc with GSO >> splitting as it stands currently. I *think* the solution is for CAKE to >> notify its parents; could you try the patch below and see if it helps? > > Aha, good news. :) > > I’m probably not currently in a position to try it on my old kernels with the > out of tree build: > > On 3.16.7: Hmm, try this version for 3.16 - probably doesn't work on later kernels. I'll look into a proper backport once you've confirmed that it works :) -Toke diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c index b910cd5c56f7..ef3acdbb8429 100644 --- a/net/sched/sch_cake.c +++ b/net/sched/sch_cake.c @@ -1617,6 +1617,44 @@ static u32 cake_classify(struct Qdisc *sch, struct cake_tin_data **t, static void cake_reconfigure(struct Qdisc *sch); + +static struct Qdisc *qdisc_match_from_root(struct Qdisc *root, u32 handle) +{ + struct Qdisc *q; + + if (!(root->flags & TCQ_F_BUILTIN) && + root->handle == handle) + return root; + + list_for_each_entry(q, >list, list) { + if (q->handle == handle) + return q; + } + return NULL; +} + +void adjust_parent_qlen(struct Qdisc *sch, unsigned int n, + unsigned int len) +{ + u32 parentid; + if (n == 0 && len == 0) + return; + rcu_read_lock(); + while ((parentid = sch->parent)) { + if (TC_H_MAJ(parentid) == TC_H_MAJ(TC_H_INGRESS)) + break; + + sch = qdisc_match_from_root(qdisc_dev(sch), TC_H_MAJ(parentid)); + if (sch == NULL) { + WARN_ON_ONCE(parentid != TC_H_ROOT); + break; + } + sch->q.qlen += n; + sch->qstats.backlog += len; + } + rcu_read_unlock(); +} + static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc *sch, struct sk_buff **to_free) { @@ -1667,7 +1705,7 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc *sch, if (skb_is_gso(skb) && q->rate_flags & CAKE_FLAG_SPLIT_GSO) { struct sk_buff *segs, *nskb; netdev_features_t features = netif_skb_features(skb); - unsigned int slen = 0; + unsigned int slen = 0, numsegs = 0; segs = skb_gso_segment(skb, features & ~NETIF_F_GSO_MASK); if (IS_ERR_OR_NULL(segs)) @@ -1684,6 +1722,7 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc *sch, sch->q.qlen++; slen += segs->len; + numsegs++; q->buffer_used += segs->truesize; b->packets++; segs = nskb; @@ -1696,7 +1735,7 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc *sch, sch->qstats.backlog += slen; q->avg_window_bytes += slen; - qdisc_tree_reduce_backlog(sch, 1, len); + adjust_parent_qlen(sch, numsegs - 1, slen - len); consume_skb(skb); } else { /* not splitting */ ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] dual-src/dsthost unfairness, only with bi-directional traffic
> Jon, is there anything I can check by instrumenting the code somewhere > specific? Is there any way you could test with a bulk UDP flow? I'm wondering whether this is a second-order effect where TCP ACKs are limited in a way that cause the imbalance? Are you using ACK compression? -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] cake infinite loop(?) with hfsc on one-armed router
Pete Heist writes: > Ok, that fixes the compiler warnings, but I get this now. Same as > before it’s repeated until reboot, stack sometimes changes but I > always see sch_hfsc.c:1427 at the beginning: Hmm, that's odd. Could you try adding this debugging line in adjust_parent_qlen(), right before the sch->q.qlen += n line: net_info_ratelimited("Adjusting parent qdisc %d with pkt += %d, len += %d", parentid, n, len); And see if you actually get any of those lines in your dmesg? -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] cake infinite loop(?) with hfsc on one-armed router
On 5 January 2019 11:44:44 CET, Jonathan Morton wrote: >> On 5 Jan, 2019, at 12:34 am, Toke Høiland-Jørgensen >wrote: >> >> This basically means that we can't use CAKE as a leaf qdisc with GSO >> splitting as it stands currently. I *think* the solution is for CAKE >to >> notify its parents; could you try the patch below and see if it >helps? > >Is this also a problem on current kernels, or only older ones? Newer ones as well (I assume - haven't tested). -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] cake infinite loop(?) with hfsc on one-armed router
Pete Heist writes: >> On Jan 5, 2019, at 2:35 PM, Toke Høiland-Jørgensen wrote: >> >> Pete Heist writes: >> >>>> On Jan 5, 2019, at 2:10 PM, Toke Høiland-Jørgensen wrote: >>>> >>>> Hmm, that's odd. Could you try adding this debugging line in >>>> adjust_parent_qlen(), right before the sch->q.qlen += n line: >>>> >>>>net_info_ratelimited("Adjusting parent qdisc %d with pkt += %d, >>>> len += %d", >>>> parentid, n, len); >>>> >>>> And see if you actually get any of those lines in your dmesg? >>> >>> I do see the messages twice, then not after that in the rest of the >>> output... >> >> Right. Looking at the HFSC code some more, I think the bug is actually >> caused by another, but related, interaction between HFSC and CAKE. >> >> Specifically, this line: >> >> https://elixir.bootlin.com/linux/v3.16.7/source/net/sched/sch_hfsc.c#L1605 >> >> where HFSC checks whether the child queue len is 1, which it interprets >> as the event that activates that queue. However, because CAKE splits the >> packet, this check will fail, and the HFSC class will not be activated. >> This also explains why you only see the bug with HFSC, and not with HTB >> (although I do think that we still need to update the hierarchy). >> >> The good news it that it is a fairly simple to fix in HFSC. The bad news >> is that it's something that's hard to work around from the out-of-tree >> CAKE... > > Aha, well, I wonder if we’ll see this problem with other qdiscs- maybe > cbq, if I ever get a chance to try it (not hurrying yet). Ideally this > interaction between qdiscs would be clarified somewhere, at some > point. :) > > Thanks a lot for doing the discovery though! You're welcome, and thanks for you help :) > We may not have hfsc+cake with GSO splitting on older kernels very > soon, but what should we do with this? There’s nobody in MAINTAINERS > for hfsc, so we may not get much of a response to any bug > submissions... $ ./scripts/get_maintainer.pl net/sched/sch_hfsc.c Jamal Hadi Salim (maintainer:TC subsystem) Cong Wang (maintainer:TC subsystem) Jiri Pirko (maintainer:TC subsystem) "David S. Miller" (maintainer:NETWORKING [GENERAL]) net...@vger.kernel.org (open list:TC subsystem) I'll submit a patch sometime next week, and also look into the qlen adjustment for CAKE GSO splitting... -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] cake infinite loop(?) with hfsc on one-armed router
Pete Heist writes: > Quick update to the trace because I had to apply the patch manually > and missed one line to remove (qdisc_tree_reduce_backlog...), just so > it doesn’t through off the addresses for you, but it still does the > same thing: Ah, my bad; you need to replace qdisc_match_from_root(qdisc_dev(sch), TC_H_MAJ(parentid)); with qdisc_match_from_root(qdisc_dev(sch)->qdisc, TC_H_MAJ(parentid)); -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] cake infinite loop(?) with hfsc on one-armed router
Pete Heist writes: >> On Jan 5, 2019, at 2:10 PM, Toke Høiland-Jørgensen wrote: >> >> Hmm, that's odd. Could you try adding this debugging line in >> adjust_parent_qlen(), right before the sch->q.qlen += n line: >> >> net_info_ratelimited("Adjusting parent qdisc %d with pkt += %d, >> len += %d", >> parentid, n, len); >> >> And see if you actually get any of those lines in your dmesg? > > > I do see the messages twice, then not after that in the rest of the > output... Right. Looking at the HFSC code some more, I think the bug is actually caused by another, but related, interaction between HFSC and CAKE. Specifically, this line: https://elixir.bootlin.com/linux/v3.16.7/source/net/sched/sch_hfsc.c#L1605 where HFSC checks whether the child queue len is 1, which it interprets as the event that activates that queue. However, because CAKE splits the packet, this check will fail, and the HFSC class will not be activated. This also explains why you only see the bug with HFSC, and not with HTB (although I do think that we still need to update the hierarchy). The good news it that it is a fairly simple to fix in HFSC. The bad news is that it's something that's hard to work around from the out-of-tree CAKE... -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] cake infinite loop(?) with hfsc on one-armed router
Pete Heist writes: > That first bug report looks decidedly similar to mine, but Toke would > have to comment on the specifics. So far I see the patch to > sch_codel.c you mentioned and another two-liner to remove the warning > in hfsc.c (https://patchwork.ozlabs.org/patch/933611/). It would be > really good to know that that warning is truly bogus, that it wasn’t > put there by the author for good reason, as Toke may have been > thinking of a different way to fix hfsc. Well, it's the same WARN_ON(), and if that patch had been applied, debugging our issue would have been a lot harder, I think. But the underlying issue is different, and we still need to fix HFSC (and probably CAKE as well). > Thanks for bringing this up! I see that I ought to search > OpenWRT/kernel.org next time… :) Yeah, nice find. I'll make sure that CAKE doesn't suffer from the same issue that https://www.spinics.net/lists/netdev/msg450655.html fixes for CoDel while I'm writing patches... :) -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] cake infinite loop(?) with hfsc on one-armed router
Pete Heist writes: >> On Jan 5, 2019, at 9:10 PM, Toke Høiland-Jørgensen wrote: >> >> Well, it's the same WARN_ON(), and if that patch had been applied, >> debugging our issue would have been a lot harder, I think. > > Yikes, this is what I mean. I’d rather suffer the warning than be > troubleshooting flaky behavior. That patch is applied in the latest > kernel, so hopefully it’s the right thing. Well, if it causes false positives, getting rid of it is probably worth it just to avoid spurious bug reports :) -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] cake infinite loop(?) with hfsc on one-armed router
Pete Heist writes: >> On Jan 5, 2019, at 11:27 PM, Toke Høiland-Jørgensen wrote: >> >> Pete Heist writes: >> >>>> On Jan 5, 2019, at 9:10 PM, Toke Høiland-Jørgensen wrote: >>>> >>>> Well, it's the same WARN_ON(), and if that patch had been applied, >>>> debugging our issue would have been a lot harder, I think. >>> >>> Yikes, this is what I mean. I’d rather suffer the warning than be >>> troubleshooting flaky behavior. That patch is applied in the latest >>> kernel, so hopefully it’s the right thing. >> >> Well, if it causes false positives, getting rid of it is probably worth >> it just to avoid spurious bug reports :) > > If it helps finds bugs, I’d rather know about it. > > But, a warning once in a while might have been better than a repeated > one that sometimes makes a hard reboot necessary, causing need for a > manual, offline fsck in order to boot again. Just sayin’… ;) Yes, that is why WARN_ON tends to be frowned upon ;) -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] cake infinite loop(?) with hfsc on one-armed router
Pete Heist writes: > Lastly, is using cake as a leaf to htb risky until a fix is made? I’ve > been doing that for a while without any apparent issues, though I’m > hesitating now to try that in a production environment. Hmm, that's a good question. I would expect so; but I would also expect the issue to show up pretty much straight away, so if you haven't hit it yet, I may be wrong. I'll do some more digging... Should probably also try to replicate all this stuff on my own machine :) -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] dual-src/dsthost unfairness, only with bi-directional traffic
Jonathan Morton writes: >>> Yes, exactly. Would be interesting to hear what Jonathan, Toke and >>> others think. I want to see if fairness is preserved in this case with >>> sparse flows only. Could flent do this? >> >> Well, sparse flows are (by definition) not building a queue, so it >> doesn't really make sense to talk about fairness for them. How would you >> measure that? >> >> This is also the reason I agree that they shouldn't be counted for host >> fairness calculation purposes, BTW... > > The trick is that we need to keep fairness of the deficit > replenishments, which occur for sparse flows as well as bulk ones, but > in smaller amounts. The number of active flows is presently the > stand-in for this. It's possible to have a host backlogged with > hundreds of new flows which are, by definition, sparse. Right, there's some care needed to ensure we don't get weird behaviour during transients such as flow startup. > I'm still trying to get my head around how the modified code works in > detail. It's possible that a different implementation would either be > more concise and readable, or better model what is actually needed. > But I can't tell until I grok it. Cool, good to know you are on it; I'm happy to wait until you've had some time to form an opinion on this :) -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] dual-src/dsthost unfairness, only with bi-directional traffic
Sebastian Moeller writes: > Hi Toke, > >> On Jan 18, 2019, at 14:33, Toke Høiland-Jørgensen wrote: >> >> Georgios Amanakis writes: >> >>> Yes, exactly. Would be interesting to hear what Jonathan, Toke and >>> others think. I want to see if fairness is preserved in this case with >>> sparse flows only. Could flent do this? >> >> Well, sparse flows are (by definition) not building a queue, so it >> doesn't really make sense to talk about fairness for them. How would you >> measure that? >> >> This is also the reason I agree that they shouldn't be counted for host >> fairness calculation purposes, BTW... > > That leads to a question (revealing my lack of detailed knowledge) if > there is a sufficient number of new flows (that should qualify as > new/sparse) that servicing all of them takes longer than each queue > accumulating new packets, at what point in time are these flows > considered "unworthy" of sparse flow boosting? Or differetly how i > cake going to deal with a UDP flood where the 5 tuple hash is > different for all packets (say by spoofing ports or randomly picking > dst addresses)? Well, what is considered a sparse flow is a function of both the flow rate itself, *as well as* the link rate, number of competing flows, etc. So a flow can be sparse on one link, but turn into a bulk flow on another because that link has less capacity. I explore this in some detail here: https://doi.org/10.1109/LCOMM.2018.2871457 -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] dual-src/dsthost unfairness, only with bi-directional traffic
George Amanakis writes: > A better version of the patch for testing. So basically, you're changing the host fairness algorithm to only consider bulk flows instead of all active flows to that host, right? Seems reasonable to me. Jonathan, any opinion? -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] dual-src/dsthost unfairness, only with bi-directional traffic
Georgios Amanakis writes: > Yes, exactly. Would be interesting to hear what Jonathan, Toke and > others think. I want to see if fairness is preserved in this case with > sparse flows only. Could flent do this? Well, sparse flows are (by definition) not building a queue, so it doesn't really make sense to talk about fairness for them. How would you measure that? This is also the reason I agree that they shouldn't be counted for host fairness calculation purposes, BTW... -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
[Cake] Defending my thesis on Friday - live stream available
Hey everyone I'm defending my PhD thesis this Friday (Nov 23rd), which contains content relevant to all three lists (hence the three-way cross post). In case anyone is interested in watching, I'm setting up a live stream on Youtube, see link below. The defence starts at 09:15 CET (UTC+1), but the video should be available for later viewing as well. Blog post with embedded video, and links to slides and the full thesis: https://blog.tohojo.dk/2018/11/nearing-the-end-thesis-defence.html Direct Youtube link: https://youtu.be/upvx6rpSLSw -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] Defending my thesis on Friday - live stream available
Georgios Amanakis writes: > Thank you very much for doing this, Toke! All the best of luck on > Friday!!! Thanks! :) -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] Emulating Bufferbloat
Fabian Ruffy writes: > Hello, > > this is a somewhat esoteric question. I am trying to actually force > bufferbloat > in an emulation setup I am using. I set up a dumbbell topology and push > traffic > through it, causing congestion at the central link. I use this setup to > compare > congestion avoidance algorithms such as DCTCP to other solutions. > This has worked nicely with the 4.18 kernel. However, after upgrading to 4.19 > I > cannot reproduce bufferbloat anymore. The traffic (even UDP packets) is > perfectly rate limited and I never see any congestion happening. This is > great, > but in practice it prevents me from prototyping algorithms. Ha, that's awesome! :D > My interface configuration for bottlenecked links is: > > qdisc tbf 5: dev OBcbnsw1-eth2 root refcnt 2 rate 10Mbit burst 15000b lat > 12.0ms > Sent 6042 bytes 51 pkt (dropped 0, overlimits 0 requeues 0) > backlog 0b 0p requeues 0 > qdisc netem 10: dev OBcbnsw1-eth2 parent 5:1 limit 500 > Sent 6042 bytes 51 pkt (dropped 0, overlimits 0 requeues 0) > backlog 0b 0p requeues 0 > > > I have the suspicion that it is related to the CAKE changes in the 4.19 > kernel, > but I am not exactly sure. I am not using tc cake at all. Do you maybe know > what could cause this behavior? Apologies if this is the wrong mailing > list. My guess would be changes in the TCP stack; can't point you to anything specific off the top of my head, though... -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] cake infinite loop(?) with hfsc on one-armed router
Pete Heist writes: > Your patch in the latest kernels looks simpler. Bringing the patch > back to prior kernel versions would be appreciated, but I can > understand how 3.16 becomes less and less relevant as time goes on, > although, it’s not at end of life yet. :) Could you try the latest git version of the out-of-tree version of cake; that has the same patch as upstream, and I *think* it will work on old kernels as well (for the HTB issue; can't fix HFSC without patching it). -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] cake infinite loop(?) with hfsc on one-armed router
Pete Heist writes: >> On Jan 6, 2019, at 9:56 PM, Toke Høiland-Jørgensen wrote: >> >> Pete Heist writes: >> >>> Lastly, is using cake as a leaf to htb risky until a fix is made? I’ve >>> been doing that for a while without any apparent issues, though I’m >>> hesitating now to try that in a production environment. >> >> Hmm, that's a good question. I would expect so; but I would also expect >> the issue to show up pretty much straight away, so if you haven't hit it >> yet, I may be wrong. I'll do some more digging... Should probably also >> try to replicate all this stuff on my own machine :) > > > Ok, after what I’m seeing on my APU1 tests on 3.16.7, I’m definitely > not putting split GSO into production. I just turned it on and off > three times and here’s what I got: > > Split GSO on: > > https://www.heistp.net/downloads/htb_split_gso/htb_cake_split_gso.svg > https://www.heistp.net/downloads/htb_split_gso/htb_cake_split_gso2.svg > https://www.heistp.net/downloads/htb_split_gso/htb_cake_split_gso3.svg > > Split GSO off: > > https://www.heistp.net/downloads/htb_split_gso/htb_cake_no_split_gso.svg > https://www.heistp.net/downloads/htb_split_gso/htb_cake_no_split_gso2.svg > https://www.heistp.net/downloads/htb_split_gso/htb_cake_no_split_gso3.svg > > I’ve seen these square waves before with htb and wondered where they > came from, and I think we may finally have an answer! What manner of > thing causes this I don’t know, but there’s a chance you may end up > finding out… :) Is this without the patch to CAKE that adjusts the qlen? And have you tried running with that patch (with HTB)? -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
[Cake] [PATCH 1/4] sched: Avoid dereferencing skb pointer after child enqueue
From: Toke Høiland-Jørgensen Parent qdiscs may dereference the pointer to the enqueued skb after enqueue. However, both CAKE and TBF call consume_skb() on the original skb when splitting GSO packets, leading to a potential use-after-free in the parent. Fix this by avoiding dereferencing the skb pointer after enqueueing to the child. Signed-off-by: Toke Høiland-Jørgensen --- net/sched/sch_cbs.c| 3 ++- net/sched/sch_drr.c| 3 ++- net/sched/sch_dsmark.c | 3 ++- net/sched/sch_hfsc.c | 5 ++--- net/sched/sch_htb.c| 3 ++- net/sched/sch_prio.c | 3 ++- net/sched/sch_qfq.c| 16 +--- net/sched/sch_tbf.c| 3 ++- 8 files changed, 23 insertions(+), 16 deletions(-) diff --git a/net/sched/sch_cbs.c b/net/sched/sch_cbs.c index e689e11b6d0f..c6a502933fe7 100644 --- a/net/sched/sch_cbs.c +++ b/net/sched/sch_cbs.c @@ -88,13 +88,14 @@ static int cbs_child_enqueue(struct sk_buff *skb, struct Qdisc *sch, struct Qdisc *child, struct sk_buff **to_free) { + unsigned int len = qdisc_pkt_len(skb); int err; err = child->ops->enqueue(skb, child, to_free); if (err != NET_XMIT_SUCCESS) return err; - qdisc_qstats_backlog_inc(sch, skb); + sch->qstats.backlog += len; sch->q.qlen++; return NET_XMIT_SUCCESS; diff --git a/net/sched/sch_drr.c b/net/sched/sch_drr.c index cdebaed0f8cf..feaf47178653 100644 --- a/net/sched/sch_drr.c +++ b/net/sched/sch_drr.c @@ -350,6 +350,7 @@ static struct drr_class *drr_classify(struct sk_buff *skb, struct Qdisc *sch, static int drr_enqueue(struct sk_buff *skb, struct Qdisc *sch, struct sk_buff **to_free) { + unsigned int len = qdisc_pkt_len(skb); struct drr_sched *q = qdisc_priv(sch); struct drr_class *cl; int err = 0; @@ -376,7 +377,7 @@ static int drr_enqueue(struct sk_buff *skb, struct Qdisc *sch, cl->deficit = cl->quantum; } - qdisc_qstats_backlog_inc(sch, skb); + sch->qstats.backlog += len; sch->q.qlen++; return err; } diff --git a/net/sched/sch_dsmark.c b/net/sched/sch_dsmark.c index f6f480784bc6..42471464ded3 100644 --- a/net/sched/sch_dsmark.c +++ b/net/sched/sch_dsmark.c @@ -199,6 +199,7 @@ static struct tcf_block *dsmark_tcf_block(struct Qdisc *sch, unsigned long cl, static int dsmark_enqueue(struct sk_buff *skb, struct Qdisc *sch, struct sk_buff **to_free) { + unsigned int len = qdisc_pkt_len(skb); struct dsmark_qdisc_data *p = qdisc_priv(sch); int err; @@ -271,7 +272,7 @@ static int dsmark_enqueue(struct sk_buff *skb, struct Qdisc *sch, return err; } - qdisc_qstats_backlog_inc(sch, skb); + sch->qstats.backlog += len; sch->q.qlen++; return NET_XMIT_SUCCESS; diff --git a/net/sched/sch_hfsc.c b/net/sched/sch_hfsc.c index b18ec1f6de60..6bb8f73a8473 100644 --- a/net/sched/sch_hfsc.c +++ b/net/sched/sch_hfsc.c @@ -1539,6 +1539,7 @@ hfsc_dump_qdisc(struct Qdisc *sch, struct sk_buff *skb) static int hfsc_enqueue(struct sk_buff *skb, struct Qdisc *sch, struct sk_buff **to_free) { + unsigned int len = qdisc_pkt_len(skb); struct hfsc_class *cl; int uninitialized_var(err); @@ -1560,8 +1561,6 @@ hfsc_enqueue(struct sk_buff *skb, struct Qdisc *sch, struct sk_buff **to_free) } if (cl->qdisc->q.qlen == 1) { - unsigned int len = qdisc_pkt_len(skb); - if (cl->cl_flags & HFSC_RSC) init_ed(cl, len); if (cl->cl_flags & HFSC_FSC) @@ -1576,7 +1575,7 @@ hfsc_enqueue(struct sk_buff *skb, struct Qdisc *sch, struct sk_buff **to_free) } - qdisc_qstats_backlog_inc(sch, skb); + sch->qstats.backlog += len; sch->q.qlen++; return NET_XMIT_SUCCESS; diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c index 58b449490757..30f9da7e1076 100644 --- a/net/sched/sch_htb.c +++ b/net/sched/sch_htb.c @@ -581,6 +581,7 @@ static int htb_enqueue(struct sk_buff *skb, struct Qdisc *sch, struct sk_buff **to_free) { int uninitialized_var(ret); + unsigned int len = qdisc_pkt_len(skb); struct htb_sched *q = qdisc_priv(sch); struct htb_class *cl = htb_classify(skb, sch, ); @@ -610,7 +611,7 @@ static int htb_enqueue(struct sk_buff *skb, struct Qdisc *sch, htb_activate(q, cl); } - qdisc_qstats_backlog_inc(sch, skb); + sch->qstats.backlog += len; sch->q.qlen++; return NET_XMIT_SUCCESS; } diff --git a/net/sched/sch_prio.c b/net/sched/sch_prio.c index cdf68706e40f..847141cd900f 100644 --- a/net/sched/sch_prio.c +++ b/net/sched/sch_prio.c @@ -72,6 +72,7 @@ prio_classify(struct sk_buff *skb, struct Qdisc *sch, int *qer
[Cake] [PATCH 4/4] sch_cake: Correctly update parent qlen when splitting GSO packets
From: Toke Høiland-Jørgensen To ensure parent qdiscs have the same notion of the number of enqueued packets even after splitting a GSO packet, update the qdisc tree with the number of packets that was added due to the split. Signed-off-by: Toke Høiland-Jørgensen --- net/sched/sch_cake.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c index b910cd5c56f7..73940293700d 100644 --- a/net/sched/sch_cake.c +++ b/net/sched/sch_cake.c @@ -1667,7 +1667,7 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc *sch, if (skb_is_gso(skb) && q->rate_flags & CAKE_FLAG_SPLIT_GSO) { struct sk_buff *segs, *nskb; netdev_features_t features = netif_skb_features(skb); - unsigned int slen = 0; + unsigned int slen = 0, numsegs = 0; segs = skb_gso_segment(skb, features & ~NETIF_F_GSO_MASK); if (IS_ERR_OR_NULL(segs)) @@ -1683,6 +1683,7 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc *sch, flow_queue_add(flow, segs); sch->q.qlen++; + numsegs++; slen += segs->len; q->buffer_used += segs->truesize; b->packets++; @@ -1696,7 +1697,7 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc *sch, sch->qstats.backlog += slen; q->avg_window_bytes += slen; - qdisc_tree_reduce_backlog(sch, 1, len); + qdisc_tree_reduce_backlog(sch, 1-numsegs, len-slen); consume_skb(skb); } else { /* not splitting */ -- 2.20.1 ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
[Cake] [PATCH 3/4] sch_api: Allow reducing queue backlog by a negative value
From: Toke Høiland-Jørgensen With GSO splitting in sch_cake, we can decrease as well as increase the qlen. To make it possible to signal this up the qdisc tree, change qdisc_tree_reduce_backlog() to accept signed integer values as the number of packets and bytes to reduce the backlog by. Signed-off-by: Toke Høiland-Jørgensen --- include/net/sch_generic.h | 3 +-- net/sched/sch_api.c | 3 +-- 2 files changed, 2 insertions(+), 4 deletions(-) diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h index 9481f2c142e2..7a4957599874 100644 --- a/include/net/sch_generic.h +++ b/include/net/sch_generic.h @@ -580,8 +580,7 @@ struct Qdisc *dev_graft_qdisc(struct netdev_queue *dev_queue, void qdisc_reset(struct Qdisc *qdisc); void qdisc_put(struct Qdisc *qdisc); void qdisc_put_unlocked(struct Qdisc *qdisc); -void qdisc_tree_reduce_backlog(struct Qdisc *qdisc, unsigned int n, - unsigned int len); +void qdisc_tree_reduce_backlog(struct Qdisc *qdisc, int n, int len); #ifdef CONFIG_NET_SCHED int qdisc_offload_dump_helper(struct Qdisc *q, enum tc_setup_type type, void *type_data); diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c index 7e4d1ccf4c87..03e26e8d0ec9 100644 --- a/net/sched/sch_api.c +++ b/net/sched/sch_api.c @@ -758,8 +758,7 @@ static u32 qdisc_alloc_handle(struct net_device *dev) return 0; } -void qdisc_tree_reduce_backlog(struct Qdisc *sch, unsigned int n, - unsigned int len) +void qdisc_tree_reduce_backlog(struct Qdisc *sch, int n, int len) { bool qdisc_is_offloaded = sch->flags & TCQ_F_OFFLOADED; const struct Qdisc_class_ops *cops; -- 2.20.1 ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
[Cake] [PATCH 0/4] sched: Fix qdisc interactions exposed by using sch_cake as a leaf qdisc
This series fixes a couple of issues exposed by running sch_cake as a leaf qdisc in an HFSC tree, which were discovered and reported by Pete Heist. The interaction between CAKE's GSO splitting and the parent qdisc's notion of its own queue length could cause queue stalls. While investigating the report, I also noticed that several qdiscs would dereference the skb pointer after dequeue, which is potentially problematic since the GSO splitting code also frees the original skb. See the individual patches in the series for details. Toke Høiland-Jørgensen (4): sched: Avoid dereferencing skb pointer after child enqueue sched: Fix detection of empty queues in child qdiscs sch_api: Allow reducing queue backlog by a negative value sch_cake: Correctly update parent qlen when splitting GSO packets include/net/sch_generic.h | 3 +-- net/sched/sch_api.c | 3 +-- net/sched/sch_cake.c | 5 +++-- net/sched/sch_cbs.c | 3 ++- net/sched/sch_drr.c | 7 +-- net/sched/sch_dsmark.c| 3 ++- net/sched/sch_hfsc.c | 9 + net/sched/sch_htb.c | 3 ++- net/sched/sch_prio.c | 3 ++- net/sched/sch_qfq.c | 20 net/sched/sch_tbf.c | 3 ++- 11 files changed, 37 insertions(+), 25 deletions(-) -- 2.20.1 ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] cake infinite loop(?) with hfsc on one-armed router
Pete Heist writes: >> On Jan 4, 2019, at 10:34 PM, Toke Høiland-Jørgensen wrote: >> >> Pete Heist writes: >> >>> Ok, the lockup goes away if you use no-split-gso on the cake qdiscs for the >>> default traffic (noted below in the drr and hfsc cases with "!!! must use >>> no-split-gso here !!!"). Only I’d like my 600 μs back. :) >>> >>> This smells of a bug Toke fixed on Sep 12, 2018 in >>> 42e87f12ea5c390bf5eeb658c942bc810046160a, but then reverted in the next >>> commit because it was fixed upstream. However, if I re-apply that commit, >>> it still doesn’t fix it. >>> >>> Perhaps there are more cases where skb_reset_mac_len(skb) needs to be >>> called somewhere for VLAN support? >>> >>> I managed to capture some output from what happens to hfsc: >>> >>> [ 683.864456] [ cut here ] >>> [ 683.869116] WARNING: CPU: 1 PID: 11 at net/sched/sch_hfsc.c:1427 >>> 0xf9ced4ef() >> >> So this seems to be this line: >> >> WARN_ON(next_time == 0); >> >> See >> https://elixir.bootlin.com/linux/v3.16.7/source/net/sched/sch_hfsc.c#L1427 >> >> Which seems to indicate that HFSC can't find the next class to schedule. >> Not entirely sure why, nor why this only happens with CAKE as a qdisc. >> But I don't think it's actually an infinite loop that's causing it... > > > Ok, fwiw one doesn’t actually need a one-armed router or VLANs to reproduce > this. Just do this: > > tc qdisc add dev $IFACE root handle 1: hfsc default 1 > tc class add dev $IFACE parent 1: classid 1:1 hfsc ls rate $RATE ul rate $RATE > tc qdisc add dev $IFACE parent 1:1 cake # add split-gso here, or else… > > I’ve tried it as far as 4.9.0-8, but no farther. It’s not much of a > priority for me now that I have a workaround for it... Ah, I think I know what's going on: On enqueue, HFSC will increase its own internal notion of qlen (q.qlen++ in hfsc_enqueue()), and in dequeue, it will return immediately if this qlen is 0. Now, with GSO packet splitting, a single packet on enqueue can turn in to several packets on dequeue, which means that HFSC will think the queue is empty after dequeueing the first on, and refuse to dequeue any more packets. This basically means that we can't use CAKE as a leaf qdisc with GSO splitting as it stands currently. I *think* the solution is for CAKE to notify its parents; could you try the patch below and see if it helps? -Toke diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c index b910cd5c56f7..77b0ebd673ac 100644 --- a/net/sched/sch_cake.c +++ b/net/sched/sch_cake.c @@ -1617,6 +1617,30 @@ static u32 cake_classify(struct Qdisc *sch, struct cake_tin_data **t, static void cake_reconfigure(struct Qdisc *sch); +void adjust_parent_qlen(struct Qdisc *sch, unsigned int n, + unsigned int len) +{ + u32 parentid; + if (n == 0 && len == 0) + return; + rcu_read_lock(); + while ((parentid = sch->parent)) { + if (TC_H_MAJ(parentid) == TC_H_MAJ(TC_H_INGRESS)) + break; + + if (sch->flags & TCQ_F_NOPARENT) + break; + sch = qdisc_lookup(qdisc_dev(sch), TC_H_MAJ(parentid)); + if (sch == NULL) { + WARN_ON_ONCE(parentid != TC_H_ROOT); + break; + } + sch->q.qlen += n; + sch->qstats.backlog += len; + } + rcu_read_unlock(); +} + static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc *sch, struct sk_buff **to_free) { @@ -1667,7 +1691,7 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc *sch, if (skb_is_gso(skb) && q->rate_flags & CAKE_FLAG_SPLIT_GSO) { struct sk_buff *segs, *nskb; netdev_features_t features = netif_skb_features(skb); - unsigned int slen = 0; + unsigned int slen = 0, numsegs = 0; segs = skb_gso_segment(skb, features & ~NETIF_F_GSO_MASK); if (IS_ERR_OR_NULL(segs)) @@ -1684,6 +1708,7 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc *sch, sch->q.qlen++; slen += segs->len; + numsegs++; q->buffer_used += segs->truesize; b->packets++; segs = nskb; @@ -1696,7 +1721,7 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc *sch, sch->qstats.backlog += slen; q->avg_window_bytes += slen; - qdisc_tree_reduce_backlog(sch, 1, len); + adjust_parent_qlen(sch, numsegs - 1, slen - len); consume_skb(skb); } else { /* not splitting */ ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] dual-src/dsthost unfairness, only with bi-directional traffic
Pete Heist writes: >> On Jan 3, 2019, at 12:03 PM, Toke Høiland-Jørgensen wrote: >> >>> Jon, is there anything I can check by instrumenting the code somewhere >>> specific? >> >> Is there any way you could test with a bulk UDP flow? I'm wondering >> whether this is a second-order effect where TCP ACKs are limited in a >> way that cause the imbalance? Are you using ACK compression? > > > Not using ack-filter, if that’s what’s meant by ACK compression. I > thought about the TCP ACK traffic, but would be very surprised if that > amount of ACK traffic could cause that large of an imbalance, although > it’s worth trying to find out. > > I tried iperf3 in UDP mode, but cake is treating these flows > aggressively. I get the impression that cake penalizes flows heavily > that do not respond to congestion control signals. If I pit one 8 TCP > flows against a single UDP flow at 40mbit, the UDP flow goes into a > death spiral with increasing drops over time (iperf3 output attached). > > I’m not sure there’d be any way I can test fairness with iperf3 in UDP > mode. We’d need something that has some congestion control feedback, > right? Otherwise, I don’t think there are any rates I can choose to > both reach saturation and not be severely punished. And if it has > congestion control feedback, it has the ACK-like traffic we’re trying > to avoid for the test. :) Try setting cake to 'interplanetary' - that should basically turn off the AQM dropping... -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] [PATCH 0/4] sched: Fix qdisc interactions exposed by using sch_cake as a leaf qdisc
David Miller writes: > From: Toke Høiland-Jørgensen > Date: Mon, 7 Jan 2019 20:47:29 +0100 > >> This series fixes a couple of issues exposed by running sch_cake as a >> leaf qdisc in an HFSC tree, which were discovered and reported by Pete >> Heist. The interaction between CAKE's GSO splitting and the parent >> qdisc's notion of its own queue length could cause queue stalls. While >> investigating the report, I also noticed that several qdiscs would >> dereference the skb pointer after dequeue, which is potentially >> problematic since the GSO splitting code also frees the original skb. >> >> See the individual patches in the series for details. > > Toke, can you please resubmit this without patch #3. > > If you want to push for patch #3 still, it is much easier to submit > it separately. Can do :) -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] cake infinite loop(?) with hfsc on one-armed router
Pete Heist writes: >> On Jan 8, 2019, at 11:27 PM, Toke Høiland-Jørgensen wrote: >> >> Pete Heist writes: >> >>> Your patch in the latest kernels looks simpler. Bringing the patch >>> back to prior kernel versions would be appreciated, but I can >>> understand how 3.16 becomes less and less relevant as time goes on, >>> although, it’s not at end of life yet. :) >> >> Could you try the latest git version of the out-of-tree version of cake; >> that has the same patch as upstream, and I *think* it will work on old >> kernels as well (for the HTB issue; can't fix HFSC without patching it). > > Yes, it compiles fine on 3.16, and gives statistically identical > results for HTB. :) Excellent! That's at least one bug squashed :) -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] [PATCH 1/4] sched: Avoid dereferencing skb pointer after child enqueue
Cong Wang writes: > On Mon, Jan 7, 2019 at 11:50 AM Toke Høiland-Jørgensen wrote: >> @@ -1254,7 +1256,7 @@ static int qfq_enqueue(struct sk_buff *skb, struct >> Qdisc *sch, >> if (cl->qdisc->q.qlen != 1) { >> if (unlikely(skb == cl->qdisc->ops->peek(cl->qdisc)) && > > > Isn't this comparison problematic too? While you are on it... Well, I was only looking at safety issues, and since it's not dereferencing the pointer, that's not really an issue here. The check is just going to always fail if GSO splitting is enabled. Which I'm not actually sure is an error in this case? -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] cake infinite loop(?) with hfsc on one-armed router
Pete Heist writes: > Ok, the lockup goes away if you use no-split-gso on the cake qdiscs for the > default traffic (noted below in the drr and hfsc cases with "!!! must use > no-split-gso here !!!"). Only I’d like my 600 μs back. :) > > This smells of a bug Toke fixed on Sep 12, 2018 in > 42e87f12ea5c390bf5eeb658c942bc810046160a, but then reverted in the next > commit because it was fixed upstream. However, if I re-apply that commit, it > still doesn’t fix it. > > Perhaps there are more cases where skb_reset_mac_len(skb) needs to be called > somewhere for VLAN support? > > I managed to capture some output from what happens to hfsc: > > [ 683.864456] [ cut here ] > [ 683.869116] WARNING: CPU: 1 PID: 11 at net/sched/sch_hfsc.c:1427 > 0xf9ced4ef() So this seems to be this line: WARN_ON(next_time == 0); See https://elixir.bootlin.com/linux/v3.16.7/source/net/sched/sch_hfsc.c#L1427 Which seems to indicate that HFSC can't find the next class to schedule. Not entirely sure why, nor why this only happens with CAKE as a qdisc. But I don't think it's actually an infinite loop that's causing it... -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] Munin-Plugins
Ruben writes: > Hey guys, > > I've already mentioned this in a response to dtaht on GitHub, but here > again for everyone: > > I was wondering if it's possible to extend the tin statistics by > packets for backlog. Why do you need packets when there's already bytes? > Also I haven't found details on the man page, but haven't looked in > the code how exactly flows are defined for the new 2 level fair > queueings. A flow is a flow (hash on the regular 5-tuple). Unless you install custom tc filters, in which case a flow could be anything you want it to be... > I would like to show hosts as well as flows in a graph, in the various > states as summary as well as per tin. But hosts aren't shown in the > statistics. But they could be helpful to see unexpected rises if > someone is abusing the system. Hmm, guess we could do "number of hosts per tin" accounting as well as flows. But it would be bloating up the already bloated statistics. But on the other hand, they are already quite bloated, so another pair of stats maybe wouldn't make much of a difference? Does anyone else have an opinion on this? -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] [PATCH net] gso_segment: Reset skb->mac_len after modifying network header
On 13 September 2018 16:11:18 CEST, Eric Dumazet wrote: > > >On 09/11/2018 03:19 PM, Toke Høiland-Jørgensen wrote: >> When splitting a GSO segment that consists of encapsulated packets, >the >> skb->mac_len of the segments can end up being set wrong, causing >packet >> drops in particular when using act_mirred and ifb interfaces in >> combination with a qdisc that splits GSO packets. >> >> This happens because at the time skb_segment() is called, >network_header >> will point to the inner header, throwing off the calculation in >> skb_reset_mac_len(). The network_header is subsequently adjust by the >> outer IP gso_segment handlers, but they don't set the mac_len. >> >> Fix this by adding skb_reset_mac_len() calls to both the IPv4 and >IPv6 >> gso_segment handlers, after they modify the network_header. >> >> Signed-off-by: Toke Høiland-Jørgensen > >Looks good but I would have appreciated a thanks or something >after the help I gave on this problem. Yes, of course. Should have mentioned that in the commit message. My apologies, won't happen again. And thanks! :) -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
[Cake] [PATCH net v2] gso_segment: Reset skb->mac_len after modifying network header
When splitting a GSO segment that consists of encapsulated packets, the skb->mac_len of the segments can end up being set wrong, causing packet drops in particular when using act_mirred and ifb interfaces in combination with a qdisc that splits GSO packets. This happens because at the time skb_segment() is called, network_header will point to the inner header, throwing off the calculation in skb_reset_mac_len(). The network_header is subsequently adjust by the outer IP gso_segment handlers, but they don't set the mac_len. Fix this by adding skb_reset_mac_len() calls to both the IPv4 and IPv6 gso_segment handlers, after they modify the network_header. Many thanks to Eric Dumazet for his help in identifying the cause of the bug. Acked-by: Dave Taht Reviewed-by: Eric Dumazet Signed-off-by: Toke Høiland-Jørgensen --- v2: - Properly credit Eric for his help - Add review and ack tags net/ipv4/af_inet.c | 1 + net/ipv6/ip6_offload.c | 1 + 2 files changed, 2 insertions(+) diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c index 20fda8fb8ffd..1fbe2f815474 100644 --- a/net/ipv4/af_inet.c +++ b/net/ipv4/af_inet.c @@ -1377,6 +1377,7 @@ struct sk_buff *inet_gso_segment(struct sk_buff *skb, if (encap) skb_reset_inner_headers(skb); skb->network_header = (u8 *)iph - skb->head; + skb_reset_mac_len(skb); } while ((skb = skb->next)); out: diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c index 37ff4805b20c..c7e495f12011 100644 --- a/net/ipv6/ip6_offload.c +++ b/net/ipv6/ip6_offload.c @@ -115,6 +115,7 @@ static struct sk_buff *ipv6_gso_segment(struct sk_buff *skb, payload_len = skb->len - nhoff - sizeof(*ipv6h); ipv6h->payload_len = htons(payload_len); skb->network_header = (u8 *)ipv6h - skb->head; + skb_reset_mac_len(skb); if (udpfrag) { int err = ip6_find_1stfragopt(skb, ); -- 2.18.0 ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
[Cake] [PATCH iproute2] q_cake: Also print nonat, nowash and no-ack-filter keywords
Similar to the previous patch for no-split-gso, the negative keywords for 'nat', 'wash' and 'ack-filter' were not printed either. Add those well. Signed-off-by: Toke Høiland-Jørgensen --- tc/q_cake.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/tc/q_cake.c b/tc/q_cake.c index 077bf84f..e827e3f1 100644 --- a/tc/q_cake.c +++ b/tc/q_cake.c @@ -468,6 +468,8 @@ static int cake_print_opt(struct qdisc_util *qu, FILE *f, struct rtattr *opt) if (nat) print_string(PRINT_FP, NULL, "nat ", NULL); + else + print_string(PRINT_FP, NULL, "nonat ", NULL); print_bool(PRINT_JSON, "nat", NULL, nat); if (tb[TCA_CAKE_WASH] && @@ -508,6 +510,8 @@ static int cake_print_opt(struct qdisc_util *qu, FILE *f, struct rtattr *opt) if (wash) print_string(PRINT_FP, NULL, "wash ", NULL); + else + print_string(PRINT_FP, NULL, "nowash ", NULL); print_bool(PRINT_JSON, "wash", NULL, wash); if (ingress) @@ -520,7 +524,7 @@ static int cake_print_opt(struct qdisc_util *qu, FILE *f, struct rtattr *opt) else if (ack_filter == CAKE_ACK_FILTER) print_string(PRINT_ANY, "ack-filter", "ack-filter ", "enabled"); else - print_string(PRINT_JSON, "ack-filter", NULL, "disabled"); + print_string(PRINT_ANY, "ack-filter", "no-ack-filter ", "disabled"); if (split_gso) print_string(PRINT_FP, NULL, "split-gso ", NULL); -- 2.18.0 ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] Munin-Plugins
Ruben writes: > Hey Toke, > > Am 13.09.2018 21:12 schrieb Toke Høiland-Jørgensen : > > > Ruben writes: > > > Hey Toke, > > > > Thanks for your fast response! > > > > Am 13.09.2018 12:27 schrieb Toke Høiland-Jørgensen : > > > > > > Ruben writes: > > > > > Hey guys, > > > > > > I've already mentioned this in a response to dtaht on GitHub, but > here > > > again for everyone: > > > > > > I was wondering if it's possible to extend the tin statistics by > > > packets for backlog. > > > > Why do you need packets when there's already bytes? > > > > Easy: dtaht requested a packets graph with ecn marks, which is also > > packets, so backlog as bytes do not fit, backlog as packets do. > > > > The idea was to do a multi-graph which is one graph with combined > > stats for all tins and sub graphs for all tins. > > > > On the main graph a backlog in packets is available, but I would need > > to leave out the backlog for the tins, which is somewhat confusing. > > Why not just do both backlogs in bytes? > > There's no counter for ecn marked packets in bytes, so it's impossible to > implement it that way, too. > > ECN-Marks is in packets, so everything else need to be in packets as > well. Hmm, so the obvious follow-up question would be "why do you need to have backlog and number of drops on the same graph?" :) -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] Munin-Plugins
Ruben writes: > Hey Toke, > > Thanks for your fast response! > > Am 13.09.2018 12:27 schrieb Toke Høiland-Jørgensen : > > > Ruben writes: > > > Hey guys, > > > > I've already mentioned this in a response to dtaht on GitHub, but here > > again for everyone: > > > > I was wondering if it's possible to extend the tin statistics by > > packets for backlog. > > Why do you need packets when there's already bytes? > > Easy: dtaht requested a packets graph with ecn marks, which is also > packets, so backlog as bytes do not fit, backlog as packets do. > > The idea was to do a multi-graph which is one graph with combined > stats for all tins and sub graphs for all tins. > > On the main graph a backlog in packets is available, but I would need > to leave out the backlog for the tins, which is somewhat confusing. Why not just do both backlogs in bytes? > > Also I haven't found details on the man page, but haven't looked in > > the code how exactly flows are defined for the new 2 level fair > > queueings. > > A flow is a flow (hash on the regular 5-tuple). Unless you install > custom tc filters, in which case a flow could be anything you want it to > be... > > > I would like to show hosts as well as flows in a graph, in the various > > states as summary as well as per tin. But hosts aren't shown in the > > statistics. But they could be helpful to see unexpected rises if > > someone is abusing the system. > > Hmm, guess we could do "number of hosts per tin" accounting as well as > flows. But it would be bloating up the already bloated statistics. But > on the other hand, they are already quite bloated, so another pair of > stats maybe wouldn't make much of a difference? > > > How important is a row more or less? It's quite verbose already (what > I like) - as long as the number is available without much additionally > work needed. It's the additional complexity of recording the stats internally in cake I'm worried about, not the extra lines of output :) -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] SCE/CAKE coding curiosities
Kevin Darbyshire-Bryant writes: > Ahoy there team Bufferboat :-) > > I’ve been looking at the new SCE related code and a few questions have fallen > out of the brain cell. > > 1) > https://github.com/dtaht/sch_cake/blob/47d821f89f39c1b2216d6f65b014f609e46fc57c/sce.h#L50 > Curious as to why htons isn’t used here as per other instances in > CAKE? No idea about this. > 2) Immediately below that line we have a skb length guard before > trying to set ECT ultimately via ipv?_change_dsfield - we have other > places in CAKE where we play with dsfield bits that don’t have an > (obvious) guard - should they? > > 3) And saving the most curious for last, in cake_update_flowkeys we check the > protocol again (we want IPv4 packets only) but we use tc_skb_protocol(skb) > https://github.com/dtaht/sch_cake/blob/47d821f89f39c1b2216d6f65b014f609e46fc57c/sch_cake.c#L628 > > tc_skb_protocol says "/* We need to take extra care in case the skb came via > * vlan accelerated path. In that case, use skb->vlan_proto > * as the original vlan header was already stripped. > */ > if (skb_vlan_tag_present(skb)) > return skb->vlan_proto; > return skb->protocol; > " > > What??? Should cake_handle_diffserv use tc_skb_protocol too? and > INET_ECN_set_sce ?? Yup, I believe you are right about both of these; nice catch! I pushed fixes for cake_handle_diffserv(); keeping my hands off the sce bits for now ;) -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
[Cake] [PATCH iproute2] q_cake: Add support for setting the fwmark option
This adds support for the newly added fwmark option to CAKE, which allows overriding the tin selection from the per-packet firewall marks. The fwmark field is a bitmask that is applied to the fwmark to select the tin. Signed-off-by: Toke Høiland-Jørgensen --- man/man8/tc-cake.8 | 16 tc/q_cake.c| 24 2 files changed, 40 insertions(+) diff --git a/man/man8/tc-cake.8 b/man/man8/tc-cake.8 index eda436e1..8c57eadd 100644 --- a/man/man8/tc-cake.8 +++ b/man/man8/tc-cake.8 @@ -91,6 +91,10 @@ TIME | LIMIT ] .br [ +.BR fwmark +MASK ] +.br +[ .BR ptm | .BR atm @@ -524,6 +528,18 @@ preset on the modern Internet is firmly discouraged. .br Voice (CS7, CS6, EF, VA, TOS4), 25% threshold, reduced Codel interval. +.PP +.B fwmark +MASK +.br + This options turns on fwmark-based overriding of CAKE's tin selection. +If set, the option specifies a bitmask that will be applied to the fwmark +associated with each packet. If the result of this masking is non-zero, the +result will be right-shifted by the number of least-significant unset bits in +the mask value, and the result will be used as a the tin number for that packet. +This can be used to set policies in a firewall script that will override CAKE's +built-in tin selection. + .SH OTHER PARAMETERS .B memlimit LIMIT diff --git a/tc/q_cake.c b/tc/q_cake.c index e827e3f1..307a12c0 100644 --- a/tc/q_cake.c +++ b/tc/q_cake.c @@ -82,6 +82,7 @@ static void explain(void) "[ split-gso* | no-split-gso ]\n" "[ ack-filter | ack-filter-aggressive | no-ack-filter* ]\n" "[ memlimit LIMIT ]\n" +"[ fwmark MASK ]\n" "[ ptm | atm | noatm* ] [ overhead N | conservative | raw* ]\n" "[ mpu N ] [ ingress | egress* ]\n" "(* marks defaults)\n"); @@ -106,6 +107,7 @@ static int cake_parse_opt(struct qdisc_util *qu, int argc, char **argv, int autorate = -1; int ingress = -1; int overhead = 0; + int fwmark = -1; int wash = -1; int nat = -1; int atm = -1; @@ -332,6 +334,16 @@ static int cake_parse_opt(struct qdisc_util *qu, int argc, char **argv, "Illegal value for \"memlimit\": \"%s\"\n", *argv); return -1; } + } else if (strcmp(*argv, "fwmark") == 0) { + unsigned int fwm; + + NEXT_ARG(); + if (get_u32(, *argv, 0)) { + fprintf(stderr, + "Illegal value for \"fwmark\": \"%s\"\n", *argv); + return -1; + } + fwmark = fwm; } else if (strcmp(*argv, "help") == 0) { explain(); return -1; @@ -376,6 +388,9 @@ static int cake_parse_opt(struct qdisc_util *qu, int argc, char **argv, if (memlimit) addattr_l(n, 1024, TCA_CAKE_MEMORY, , sizeof(memlimit)); + if (fwmark != -1) + addattr_l(n, 1024, TCA_CAKE_FWMARK, , + sizeof(fwmark)); if (nat != -1) addattr_l(n, 1024, TCA_CAKE_NAT, , sizeof(nat)); if (wash != -1) @@ -409,6 +424,7 @@ static int cake_print_opt(struct qdisc_util *qu, FILE *f, struct rtattr *opt) struct rtattr *tb[TCA_CAKE_MAX + 1]; unsigned int interval = 0; unsigned int memlimit = 0; + unsigned int fwmark = 0; __u64 bandwidth = 0; int ack_filter = 0; int split_gso = 0; @@ -507,6 +523,10 @@ static int cake_print_opt(struct qdisc_util *qu, FILE *f, struct rtattr *opt) RTA_PAYLOAD(tb[TCA_CAKE_RTT]) >= sizeof(__u32)) { interval = rta_getattr_u32(tb[TCA_CAKE_RTT]); } + if (tb[TCA_CAKE_FWMARK] && + RTA_PAYLOAD(tb[TCA_CAKE_FWMARK]) >= sizeof(__u32)) { + fwmark = rta_getattr_u32(tb[TCA_CAKE_FWMARK]); + } if (wash) print_string(PRINT_FP, NULL, "wash ", NULL); @@ -559,6 +579,10 @@ static int cake_print_opt(struct qdisc_util *qu, FILE *f, struct rtattr *opt) sprint_size(memlimit, b1)); } + if (fwmark) + print_uint(PRINT_FP, NULL, "fwmark 0x%x ", fwmark); + print_0xhex(PRINT_JSON, "fwmark", NULL, fwmark); + return 0; } -- 2.21.0 ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
[Cake] [PATCH net 1/2] sch_cake: Use tc_skb_protocol() helper for getting packet protocol
We shouldn't be using skb->protocol directly as that will miss cases with hardware-accelerated VLAN tags. Use the helper instead to get the right protocol number. Reported-by: Kevin Darbyshire-Bryant Signed-off-by: Toke Høiland-Jørgensen --- net/sched/sch_cake.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c index acc9b9da985f..a3b55e18df04 100644 --- a/net/sched/sch_cake.c +++ b/net/sched/sch_cake.c @@ -1519,7 +1519,7 @@ static u8 cake_handle_diffserv(struct sk_buff *skb, u16 wash) { u8 dscp; - switch (skb->protocol) { + switch (tc_skb_protocol(skb)) { case htons(ETH_P_IP): dscp = ipv4_get_dsfield(ip_hdr(skb)) >> 2; if (wash && dscp) ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
[Cake] [PATCH net 2/2] sch_cake: Make sure we can write the IP header before changing DSCP bits
There is not actually any guarantee that the IP headers are valid before we access the DSCP bits of the packets. Fix this using the same approach taken in sch_dsmark. Reported-by: Kevin Darbyshire-Bryant Signed-off-by: Toke Høiland-Jørgensen --- net/sched/sch_cake.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c index a3b55e18df04..259d97bc2abd 100644 --- a/net/sched/sch_cake.c +++ b/net/sched/sch_cake.c @@ -1517,16 +1517,27 @@ static unsigned int cake_drop(struct Qdisc *sch, struct sk_buff **to_free) static u8 cake_handle_diffserv(struct sk_buff *skb, u16 wash) { + int wlen = skb_network_offset(skb); u8 dscp; switch (tc_skb_protocol(skb)) { case htons(ETH_P_IP): + wlen += sizeof(struct iphdr); + if (!pskb_may_pull(skb, wlen) || + skb_try_make_writable(skb, wlen)) + return 0; + dscp = ipv4_get_dsfield(ip_hdr(skb)) >> 2; if (wash && dscp) ipv4_change_dsfield(ip_hdr(skb), INET_ECN_MASK, 0); return dscp; case htons(ETH_P_IPV6): + wlen += sizeof(struct ipv6hdr); + if (!pskb_may_pull(skb, wlen) || + skb_try_make_writable(skb, wlen)) + return 0; + dscp = ipv6_get_dsfield(ipv6_hdr(skb)) >> 2; if (wash && dscp) ipv6_change_dsfield(ipv6_hdr(skb), INET_ECN_MASK, 0); ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
[Cake] [PATCH net 0/2] sched: A few small fixes for sch_cake
Hi Dave Kevin noticed a few issues with the way CAKE reads the skb protocol and the IP diffserv fields. This series fixes those two issues, and should probably go to in 4.19 as well. However, the previous refactoring patch means they don't apply as-is; I can send a follow-up directly to stable if that's OK with you? -Toke --- Toke Høiland-Jørgensen (2): sch_cake: Use tc_skb_protocol() helper for getting packet protocol sch_cake: Make sure we can write the IP header before changing DSCP bits net/sched/sch_cake.c | 13 - 1 file changed, 12 insertions(+), 1 deletion(-) ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] [PATCH net 2/2] sch_cake: Make sure we can write the IP header before changing DSCP bits
Stephen Hemminger writes: > On Thu, 04 Apr 2019 22:44:33 +0200 > Toke Høiland-Jørgensen wrote: > >> Stephen Hemminger writes: >> >> > On Thu, 04 Apr 2019 15:01:33 +0200 >> > Toke Høiland-Jørgensen wrote: >> > >> >> static u8 cake_handle_diffserv(struct sk_buff *skb, u16 wash) >> >> { >> >> + int wlen = skb_network_offset(skb); >> > >> > In theory this could be negative, you should handle that? >> > Rather than calling may_pull() with a huge unsigned value. >> >> Huh, that would imply that skb->network_header points to before >> skb->head; when does that happen? >> >> Also, pskb_may_pull() does check for len > skb->len, so I guess a >> follow-up question would be, "does it happen often enough to warrant >> handling at this level"? >> >> Also, I copied that bit from sch_dsmark, so if you really thing it needs >> to be fixed, I guess we should fix both... >> >> -Toke > > It should never happen just paranoid Ah, right. Well in that case I think any overflow is handled fine inside pskb_may_pull(): if (unlikely(len > skb->len)) return 0; -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
[Cake] [PATCH for-4.19 2/3] sch_cake: Use tc_skb_protocol() helper for getting packet protocol
Commit bbd669a868bba591ffd38b7bc75a7b361bb54b04 upstream. We shouldn't be using skb->protocol directly as that will miss cases with hardware-accelerated VLAN tags. Use the helper instead to get the right protocol number. Reported-by: Kevin Darbyshire-Bryant Signed-off-by: Toke Høiland-Jørgensen --- net/sched/sch_cake.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c index 640f00e9f665..de92b5d81ca6 100644 --- a/net/sched/sch_cake.c +++ b/net/sched/sch_cake.c @@ -1512,7 +1512,7 @@ static u8 cake_handle_diffserv(struct sk_buff *skb, u16 wash) { u8 dscp; - switch (skb->protocol) { + switch (tc_skb_protocol(skb)) { case htons(ETH_P_IP): dscp = ipv4_get_dsfield(ip_hdr(skb)) >> 2; if (wash && dscp) ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
[Cake] [PATCH for-4.19 0/3] Backport of series: 'sched: A few small fixes for sch_cake' from 5.1
Hi Greg This series contains backports for a couple of fixes to sch_cake that was just merged for 5.1. This series backports an earlier refactoring commit, which makes the fixes themselves apply cleanly from upstream. -Toke --- Toke Høiland-Jørgensen (3): sch_cake: Simplify logic in cake_select_tin() sch_cake: Use tc_skb_protocol() helper for getting packet protocol sch_cake: Make sure we can write the IP header before changing DSCP bits net/sched/sch_cake.c | 57 +- 1 file changed, 28 insertions(+), 29 deletions(-) ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] [PATCH for-4.19 0/3] Backport of series: 'sched: A few small fixes for sch_cake' from 5.1
Greg Kroah-Hartman writes: > On Fri, Apr 05, 2019 at 02:13:18PM +0200, Toke Høiland-Jørgensen wrote: >> Greg Kroah-Hartman writes: >> >> > On Fri, Apr 05, 2019 at 12:28:21PM +0200, Toke Høiland-Jørgensen wrote: >> >> Hi Greg >> >> >> >> This series contains backports for a couple of fixes to sch_cake that was >> >> just >> >> merged for 5.1. This series backports an earlier refactoring commit, >> >> which makes >> >> the fixes themselves apply cleanly from upstream. >> > >> > You have read: >> > https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html >> > for how to submit networking patches to the stable trees, right? >> > >> > I suggest trying it that way first... >> >> Yeah, Dave already queued the original fixes up for stable, but they are >> not going to apply cleanly on 4.19; hence the first patch in this >> series. >> >> I thought it was better to just include the full series with that, for >> context, but maybe that was wrong? Should I just have sent the first >> one? If so, feel free to just take the first patch in this series and >> let the others go through the usual stable submission process... > > Dave queues up and sends me the stable backports for networking code, > as the document states. If there is a special series needed for 4.19, > I'm sure he would be glad to take them. Or, I can take them directly, > after I have the 5.0 series queued up, if I get an ack from him. > > But to seemingly circumvent the normal process, isn't ok, I need to > know that at least you tried :) Well I did ask Dave if he wanted me to supply a backport, see https://marc.info/?l=linux-netdev=155440061002032=2 - I just thought sending that directly to you was the natural thing to do. Certainly wasn't trying to circumvent process, just haven't done any backports for the net tree before :) Actually, looking at the log again now, I see that I got my versions mixed up, and the backport patch is also needed for 5.0... so I guess I'll just (re-)submit that to Dave to put into his stable queue... -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] [PATCH net 2/2] sch_cake: Make sure we can write the IP header before changing DSCP bits
Stephen Hemminger writes: > On Thu, 04 Apr 2019 15:01:33 +0200 > Toke Høiland-Jørgensen wrote: > >> static u8 cake_handle_diffserv(struct sk_buff *skb, u16 wash) >> { >> +int wlen = skb_network_offset(skb); > > In theory this could be negative, you should handle that? > Rather than calling may_pull() with a huge unsigned value. Huh, that would imply that skb->network_header points to before skb->head; when does that happen? Also, pskb_may_pull() does check for len > skb->len, so I guess a follow-up question would be, "does it happen often enough to warrant handling at this level"? Also, I copied that bit from sch_dsmark, so if you really thing it needs to be fixed, I guess we should fix both... -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] Using firewall connmarks as tin selectors
> I also equally aware that this is ‘creeping featuritis’ and doing > nothing to speed cake up… Yeah, this is the crux of the issue, really: it's a tradeoff between ease of use and featuritis. Now in this case the actual impact is a single check it might actually be acceptable > actually I may have improved BESTEFFORT a little - we no longer look > for matching TC Major numbers if there’s no actual choice of tin to be > made :-) Well, you made the besteffort case slightly faster, but every other mode slightly slower... :) If you are going to send a patch (or pull request), please leave out the refactoring, and only include the feature. This makes it easier to see the impact of the feature addition on its own. Also, I assume you have a companion patch for iproute2 somewhere? -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] Using firewall connmarks as tin selectors
Kevin Darbyshire-Bryant writes: > How unpopular would the idea of having cake look at skb->mark directly be? > > https://github.com/ldir-EDB0/sch_cake/commit/64d0e6ac9368a271221db888ab91a367fcd37ae1 > > https://github.com/ldir-EDB0/tc-adv/commit/4f16ae5d588d44f8a5c83fe2f2b7dcad97843cbc Hmm, not impossible, but seeing as we already have a way to achieve that with BPF, is it really needed? > I did the equivalent in eBPF here > https://github.com/ldir-EDB0/cls_bpf_connmark_to_caketin but I can’t > work out how to make the major number a tc command line argument into > the BPF code. You can't, but you can set the major number explicitly when you create the qdisc: $ sudo tc qdisc replace dev eth0 root handle 1: cake $ tc qdisc show dev eth0 qdisc cake 1: root refcnt 2 bandwidth unlimited diffserv3 triple-isolate nonat nowash no-ack-filter split-gso rtt 100.0ms raw overhead 0 -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] Using firewall connmarks as tin selectors
Kevin Darbyshire-Bryant writes: >> On 28 Feb 2019, at 09:54, Toke Høiland-Jørgensen wrote: >> >>> I also equally aware that this is ‘creeping featuritis’ and doing >>> nothing to speed cake up… >> >> Yeah, this is the crux of the issue, really: it's a tradeoff between >> ease of use and featuritis. Now in this case the actual impact is a >> single check it might actually be acceptable >> >>> actually I may have improved BESTEFFORT a little - we no longer look >>> for matching TC Major numbers if there’s no actual choice of tin to be >>> made :-) >> >> Well, you made the besteffort case slightly faster, but every other mode >> slightly slower... :) > > Ah, ok, tc filter really is king ;-) Well, I don't mind doing the simplification, but it should be done as a separate commit; I'll do that after mering your pull request. >> If you are going to send a patch (or pull request), please leave out the >> refactoring, and only include the feature. This makes it easier to see >> the impact of the feature addition on its own. > > Ha ha, sending a patch to the kernel lists??? Never again! Haha, no, I meant to this list. I'll submit upstream, along with the other stuff. > The non-refactored version is > https://github.com/ldir-EDB0/sch_cake/commit/f30bb18adc97c827c81c9a3297ab14bfaf2adcb0 > and I’ve created a PR Great, thanks! >> Also, I assume you have a companion patch for iproute2 somewhere? > > In the first email: https://github.com/ldir-EDB0/tc-adv/commits/fwmark > - based on 4.20.0 ‘cos that’s where openwrt is at the moment. A pull request against tc-adv would be useful :) > When I’ve found a suitable editor I was even going to update the man > page! I usually just use a regular editor and copy-paste the syntax bits from another entry... -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] Using firewall connmarks as tin selectors
Kevin Darbyshire-Bryant writes: >> On 3 Mar 2019, at 12:22, John Sager wrote: >> >> If you are going to do that, I would suggest using a few of the upper bits >> of the 32-bit fwmark/connmark space available, rather than the lowest bits. >> Then that would allow to use fwmarks other purposes, and to use the lowest >> bits, as well in the future. As iptables allows a mask before comparison, >> then choose a specific mask for the bits you use both for setting and >> testing. > > That’s a good point and I’m sort of hoping upstream reject the current > submission on that basis. I think the ‘use of fwmarks’ needs more > thought as to how it’s done for the future - too keen to get something > out. Maybe it’s sufficient as is, I don’t know. https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=0b5c7efdfc6e389ec6840579fe90bdb6f42b08dc This means it'll be in 5.1; so we have until that is released (~8 weeks or so) to set the behaviour in stone. I do think we at least need to add masking of the mark before using it for tin selection; the question is just which bits to use from it. As for setting the fwmark back in conntrack, I'm not sure I agree that this is something CAKE should be doing. Mostly because it means even tighter coupling between CAKE and the conntrack subsystem. However, I may be convinced by a sufficiently neat implementation, and anyway this is definitely something that will need to wait for 5.2 for upstream. So I think the priority is to agree on semantics for masking the fwmark when reading, and getting that implemented in a way that is compatible with both other uses of marks, and with anything we else we might want to do down the road. -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] Using firewall connmarks as tin selectors
Kevin Darbyshire-Bryant writes: >> On 4 Mar 2019, at 08:39, Pete Heist wrote: >> >> >>> On Mar 3, 2019, at 12:52 PM, Kevin Darbyshire-Bryant >>> wrote: >>> >>> The very bad idea: >>> >>> And it’s bad ‘cos it’s sort of incompatible with the existing fwmark >>> implementation as described above. So an awful lot of our >>> shenanigans above is due to DSCP not traversing the internet >>> particularly well. The solution above abstracts DSCP into ’tins’ >>> which we put into fwmarks. Another approach would be to put the DSCP >>> *into* the fwmark. CAKE could (optionally) copy the FWMARK contained >>> DSCP into the diffserv field onto the actual packets. Voila DSCP >>> traversal across ’tinternet with tin/bandwidth allocation in our >>> local domain preserved. >> >> If I understand it right, another use case for this “very bad idea” >> is preserving DSCP locally while traversing upstream WiFi links as >> besteffort, which avoids airtime efficiency problems that can occur >> with 802.11e (WMM). In cases where the router config can’t be changed >> (802.11e is mandatory after all) I’ve used IPIP tunnels for this, as >> it hides DSCP from the WiFi stack while preserving the values through >> the tunnel, but this would be easier. Neat… :) > > Everyone has understood the intent & maybe the implementation > correctly. 2 patches attached, one for cake, one for tc. > > They are naively coded and some of it undoes Toke’s recent tidying up > (sorry!) Heh. First comment: Don't do that ;) A few more below. > 012C ACB2 28C6 C53E 9775 9123 B3A2 389B 9DE2 334A > diff --git a/pkt_sched.h b/pkt_sched.h > index a2f570c..d1f288d 100644 > --- a/pkt_sched.h > +++ b/pkt_sched.h > @@ -879,6 +879,7 @@ enum { > TCA_CAKE_ACK_FILTER, > TCA_CAKE_SPLIT_GSO, > TCA_CAKE_FWMARK, > + TCA_CAKE_ICING, > __TCA_CAKE_MAX > }; > #define TCA_CAKE_MAX (__TCA_CAKE_MAX - 1) > diff --git a/sch_cake.c b/sch_cake.c > index 733b897..5aca0f3 100644 > --- a/sch_cake.c > +++ b/sch_cake.c > @@ -270,7 +270,8 @@ enum { > CAKE_FLAG_INGRESS = BIT(2), > CAKE_FLAG_WASH = BIT(3), > CAKE_FLAG_SPLIT_GSO= BIT(4), > - CAKE_FLAG_FWMARK = BIT(5) > + CAKE_FLAG_FWMARK = BIT(5), > + CAKE_FLAG_ICING= BIT(6) This implies that icing and fwmark can be enabled completely independently of each other. Are you sure about the semantics for that? > }; > > /* COBALT operates the Codel and BLUE algorithms in parallel, in order to > @@ -333,7 +334,7 @@ static const u8 diffserv8[] = { > }; > > static const u8 diffserv4[] = { > - 0, 2, 0, 0, 2, 0, 0, 0, > + 0, 1, 0, 0, 2, 0, 0, 0, > 1, 0, 0, 0, 0, 0, 0, 0, > 2, 0, 2, 0, 2, 0, 2, 0, > 2, 0, 2, 0, 2, 0, 2, 0, > @@ -344,7 +345,7 @@ static const u8 diffserv4[] = { > }; > > static const u8 diffserv3[] = { > - 0, 0, 0, 0, 2, 0, 0, 0, > + 0, 1, 0, 0, 2, 0, 0, 0, Why are you messing with the diffserv mappings in this patch? > 1, 0, 0, 0, 0, 0, 0, 0, > 0, 0, 0, 0, 0, 0, 0, 0, > 0, 0, 0, 0, 0, 0, 0, 0, > @@ -1618,7 +1619,24 @@ static unsigned int cake_drop(struct Qdisc *sch, > struct sk_buff **to_free) > return idx + (tin << 16); > } > > -static u8 cake_handle_diffserv(struct sk_buff *skb, u16 wash) > +void cake_update_diffserv(struct sk_buff *skb, u8 dscp) > +{ > + switch (skb->protocol) { > + case htons(ETH_P_IP): > + if ((ipv4_get_dsfield(ip_hdr(skb)) & ~INET_ECN_MASK) != dscp) > + ipv4_change_dsfield(ip_hdr(skb), INET_ECN_MASK, dscp); > + break; > + case htons(ETH_P_IPV6): > + if ((ipv6_get_dsfield(ipv6_hdr(skb)) & ~INET_ECN_MASK) != dscp) > + ipv6_change_dsfield(ipv6_hdr(skb), INET_ECN_MASK, dscp); > + break; > + default: > + break; > + } > + > +} So washing is just a special case of this (wash is cake_update_diffserv(skb,0)). So you shouldn't need to add another function, just augment the existing handling code. > +static u8 cake_handle_diffserv(struct sk_buff *skb, bool wash) > { > u8 dscp; > > @@ -1644,37 +1662,70 @@ static u8 cake_handle_diffserv(struct sk_buff *skb, > u16 wash) > } > } > > +#if IS_REACHABLE(CONFIG_NF_CONNTRACK) Save an ifdef below by moving the ifdef inside the function definition. > +void cake_update_ct_mark(struct sk_buff *skb, u8 dscp) > +{ > + enum ip_conntrack_info ctinfo; > + struct nf_conn *ct; > + > + ct = nf_ct_get(skb, ); > + if (!ct) > + return; > + > + ct->mark &= 0x80ff; > + ct->mark |= (0x40 | dscp) << 24; Right, so we *might* have an argument that putting the *tin* into the fwmark is CAKE's business, but copying over the dscp mark is not something a qdisc should be doing... > + nf_conntrack_event_cache(IPCT_MARK, ct); > +} > +#endif Also, are you sure this will work in all permutations of conntrack being a module vs
Re: [Cake] Using firewall connmarks as tin selectors
Kevin Darbyshire-Bryant writes: >> On 4 Mar 2019, at 11:17, Toke Høiland-Jørgensen wrote: >> >> Kevin Darbyshire-Bryant writes: >> >>>> On 4 Mar 2019, at 08:39, Pete Heist wrote: >>>> >>>> >>>>> On Mar 3, 2019, at 12:52 PM, Kevin Darbyshire-Bryant >>>>> wrote: >>>>> >>>>> The very bad idea: >>>>> >>>>> And it’s bad ‘cos it’s sort of incompatible with the existing fwmark >>>>> implementation as described above. So an awful lot of our >>>>> shenanigans above is due to DSCP not traversing the internet >>>>> particularly well. The solution above abstracts DSCP into ’tins’ >>>>> which we put into fwmarks. Another approach would be to put the DSCP >>>>> *into* the fwmark. CAKE could (optionally) copy the FWMARK contained >>>>> DSCP into the diffserv field onto the actual packets. Voila DSCP >>>>> traversal across ’tinternet with tin/bandwidth allocation in our >>>>> local domain preserved. >>>> >>>> If I understand it right, another use case for this “very bad idea” >>>> is preserving DSCP locally while traversing upstream WiFi links as >>>> besteffort, which avoids airtime efficiency problems that can occur >>>> with 802.11e (WMM). In cases where the router config can’t be changed >>>> (802.11e is mandatory after all) I’ve used IPIP tunnels for this, as >>>> it hides DSCP from the WiFi stack while preserving the values through >>>> the tunnel, but this would be easier. Neat… :) >>> >>> Everyone has understood the intent & maybe the implementation >>> correctly. 2 patches attached, one for cake, one for tc. >>> >>> They are naively coded and some of it undoes Toke’s recent tidying up >>> (sorry!) >> >> Heh. First comment: Don't do that ;) > > I did say naively coded. > >> >> A few more below. >> >>> 012C ACB2 28C6 C53E 9775 9123 B3A2 389B 9DE2 334A >>> diff --git a/pkt_sched.h b/pkt_sched.h >>> index a2f570c..d1f288d 100644 >>> --- a/pkt_sched.h >>> +++ b/pkt_sched.h >>> @@ -879,6 +879,7 @@ enum { >>> TCA_CAKE_ACK_FILTER, >>> TCA_CAKE_SPLIT_GSO, >>> TCA_CAKE_FWMARK, >>> + TCA_CAKE_ICING, >>> __TCA_CAKE_MAX >>> }; >>> #define TCA_CAKE_MAX(__TCA_CAKE_MAX - 1) >>> diff --git a/sch_cake.c b/sch_cake.c >>> index 733b897..5aca0f3 100644 >>> --- a/sch_cake.c >>> +++ b/sch_cake.c >>> @@ -270,7 +270,8 @@ enum { >>> CAKE_FLAG_INGRESS = BIT(2), >>> CAKE_FLAG_WASH = BIT(3), >>> CAKE_FLAG_SPLIT_GSO= BIT(4), >>> - CAKE_FLAG_FWMARK = BIT(5) >>> + CAKE_FLAG_FWMARK = BIT(5), >>> + CAKE_FLAG_ICING= BIT(6) >> >> This implies that icing and fwmark can be enabled completely >> independently of each other. Are you sure about the semantics for that? > > No, I’m not. I sent the patches so others could see my lunacy in action and > hopefully improve it. > > The actual operation links FWMARK, INGRESS & ICING in a variety of > combinations. Right, so obviously this needs to be thought through... >>> }; >>> >>> /* COBALT operates the Codel and BLUE algorithms in parallel, in order to >>> @@ -333,7 +334,7 @@ static const u8 diffserv8[] = { >>> }; >>> >>> static const u8 diffserv4[] = { >>> - 0, 2, 0, 0, 2, 0, 0, 0, >>> + 0, 1, 0, 0, 2, 0, 0, 0, >>> 1, 0, 0, 0, 0, 0, 0, 0, >>> 2, 0, 2, 0, 2, 0, 2, 0, >>> 2, 0, 2, 0, 2, 0, 2, 0, >>> @@ -344,7 +345,7 @@ static const u8 diffserv4[] = { >>> }; >>> >>> static const u8 diffserv3[] = { >>> - 0, 0, 0, 0, 2, 0, 0, 0, >>> + 0, 1, 0, 0, 2, 0, 0, 0, >> >> Why are you messing with the diffserv mappings in this patch? > > This is a combination patch of Dave’s new LE coding and the > fwmark/dscp mangling. Ah. Well let's keep that separate from this patch/discussion... >> >>> 1, 0, 0, 0, 0, 0, 0, 0, >>> 0, 0, 0, 0, 0, 0, 0, 0, >>> 0, 0, 0, 0, 0, 0, 0, 0, >>> @@ -1618,7 +1619,24 @@ static unsigned int cake_drop(struct Qdisc *sch, >>> struct sk_buff **to_free) >>> return idx + (tin << 16); >>> } >>> >>> -static u8 cake_handle_diffserv(struct sk_buff *skb, u16
Re: [Cake] Using firewall connmarks as tin selectors
Kevin Darbyshire-Bryant writes: >> On 4 Mar 2019, at 16:39, Toke Høiland-Jørgensen wrote: >> >> Kevin Darbyshire-Bryant writes: >> >> [ ... snipping a bit of context here ... ] >> >>>>>>> +void cake_update_ct_mark(struct sk_buff *skb, u8 dscp) >>>>>>> +{ >>>>>>> + enum ip_conntrack_info ctinfo; >>>>>>> + struct nf_conn *ct; >>>>>>> + >>>>>>> + ct = nf_ct_get(skb, ); >>>>>>> + if (!ct) >>>>>>> + return; >>>>>>> + >>>>>>> + ct->mark &= 0x80ff; >>>>>>> + ct->mark |= (0x40 | dscp) << 24; >>>>>> >>>>>> Right, so we *might* have an argument that putting the *tin* into the >>>>>> fwmark is CAKE's business, but copying over the dscp mark is not >>>>>> something a qdisc should be doing… >>>>> >>>>> Why ever not? It’s not the DSCP, it’s a lookup value into the cake >>>>> priority table, it just happens to look like the DSCP ;-) >>>> >>>> If it quacks like a duck… >>> >>> I honestly don’t know where to go from here. I’m clearly trying to do >>> something that the kernel doesn’t want to do. >> >> I'm not disputing that what you're trying to do (moving DSCP field into >> connmark) is useful. I'm just questioning whether CAKE is the right >> place to do this. I think it would fit better in a TC action; either >> modify act_connmark, or create a new action to sync fwmarks and DSCP >> marks… > > Interesting. Thinks out loud - Two actions - ‘act_storedscp’, > ‘act_restoredscp' Or act_dscp with 'get' and 'set' options :) > As I said earlier I couldn't work out how m_conntrack did… anything at > all to be honest! act_connmark is pretty straight forward: https://elixir.bootlin.com/linux/latest/source/net/sched/act_connmark.c#L34 >>> @@ -1661,13 +1695,14 @@ static struct cake_tin_data *cake_select_tin(struct >>> Qdisc *sch, >>> tin = 0; >>> >>> else if (q->rate_flags & CAKE_FLAG_FWMARK && /* use fw mark */ >>> -skb->mark && >>> -skb->mark <= q->tin_cnt) >>> - tin = q->tin_order[skb->mark - 1]; >>> - >>> - else if (TC_H_MAJ(skb->priority) == sch->handle && >>> -TC_H_MIN(skb->priority) > 0 && >>> -TC_H_MIN(skb->priority) <= q->tin_cnt) >>> + skb->mark & 0x4000) { >> >> I think there's something odd with this mask? There's only one bit set >> in it… > > I use the single bit as a flag to indicate cake has stored the DSCP > in the lower 6 bits of the byte. Otherwise with a DSCP of 0 (the > default) it’s rather difficult to know if a connection has been > through the cake ’save dscp to fwmark’ process or not. That also > indicates to user space whether it should consider mangling packets or > not e.g. Ah, right. But that breaks the previous use case where someone just wants to set fwmarks that get turned into CAKE tins? I think this definitely is leaning towards decoupling the fw-mark-to-DSCP settings to an action. And probably making the shift and mask configurable rather than hard-coding something... -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake