[Cake] Does the latest cake support "tc filter"?
Hello developers,

I've seen the mail on the netdev mailing list saying "other tc filters supported". Can I use "tc filter" to steer specific traffic into a specific tin without DSCP marks? That would be helpful for ingress traffic, where iptables DSCP marking won't work.

Thanks in advance.
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] [PATCH net-next v12 2/7] sch_cake: Add ingress mode
Cong Wang writes:

> On Wed, May 16, 2018 at 1:29 PM, Toke Høiland-Jørgensen wrote:
>> +	if (tb[TCA_CAKE_AUTORATE]) {
>> +		if (!!nla_get_u32(tb[TCA_CAKE_AUTORATE]))
>> +			q->rate_flags |= CAKE_FLAG_AUTORATE_INGRESS;
>> +		else
>> +			q->rate_flags &= ~CAKE_FLAG_AUTORATE_INGRESS;
>> +	}
>> +
>> +	if (tb[TCA_CAKE_INGRESS]) {
>> +		if (!!nla_get_u32(tb[TCA_CAKE_INGRESS]))
>> +			q->rate_flags |= CAKE_FLAG_INGRESS;
>> +		else
>> +			q->rate_flags &= ~CAKE_FLAG_INGRESS;
>> +	}
>> +
>>  	if (tb[TCA_CAKE_MEMORY])
>>  		q->buffer_config_limit = nla_get_u32(tb[TCA_CAKE_MEMORY]);
>>
>> @@ -1559,6 +1628,14 @@ static int cake_dump(struct Qdisc *sch, struct sk_buff *skb)
>>  	if (nla_put_u32(skb, TCA_CAKE_MEMORY, q->buffer_config_limit))
>>  		goto nla_put_failure;
>>
>> +	if (nla_put_u32(skb, TCA_CAKE_AUTORATE,
>> +			!!(q->rate_flags & CAKE_FLAG_AUTORATE_INGRESS)))
>> +		goto nla_put_failure;
>> +
>> +	if (nla_put_u32(skb, TCA_CAKE_INGRESS,
>> +			!!(q->rate_flags & CAKE_FLAG_INGRESS)))
>> +		goto nla_put_failure;
>> +
>
> Why do you want to dump each bit of the rate_flags separately rather than
> dumping the whole rate_flags as an integer?

Well, these were added one at a time, each as a new option. Isn't that
more or less congruent with how netlink attributes are supposed to be
used?

-Toke
Re: [Cake] [PATCH net-next v12 4/7] sch_cake: Add NAT awareness to packet classifier
On Wed, May 16, 2018 at 1:29 PM, Toke Høiland-Jørgensen wrote:
> When CAKE is deployed on a gateway that also performs NAT (which is a
> common deployment mode), the host fairness mechanism cannot distinguish
> internal hosts from each other, and so fails to work correctly.
>
> To fix this, we add an optional NAT awareness mode, which will query the
> kernel conntrack mechanism to obtain the pre-NAT addresses for each packet
> and use that in the flow and host hashing.
>
> When the shaper is enabled and the host is already performing NAT, the cost
> of this lookup is negligible. However, in unlimited mode with no NAT being
> performed, there is a significant CPU cost at higher bandwidths. For this
> reason, the feature is turned off by default.
>
> Signed-off-by: Toke Høiland-Jørgensen
> ---
>  net/sched/sch_cake.c | 73 ++
>  1 file changed, 73 insertions(+)
>
> diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c
> index 65439b643c92..e1038a7b6686 100644
> --- a/net/sched/sch_cake.c
> +++ b/net/sched/sch_cake.c
> @@ -71,6 +71,12 @@
>  #include
>  #include
>
> +#if IS_REACHABLE(CONFIG_NF_CONNTRACK)
> +#include
> +#include
> +#include
> +#endif
> +
>  #define CAKE_SET_WAYS (8)
>  #define CAKE_MAX_TINS (8)
>  #define CAKE_QUEUES (1024)
> @@ -514,6 +520,60 @@ static bool cobalt_should_drop(struct cobalt_vars *vars,
>  	return drop;
>  }
>
> +#if IS_REACHABLE(CONFIG_NF_CONNTRACK)
> +
> +static void cake_update_flowkeys(struct flow_keys *keys,
> +				 const struct sk_buff *skb)
> +{
> +	const struct nf_conntrack_tuple *tuple;
> +	enum ip_conntrack_info ctinfo;
> +	struct nf_conn *ct;
> +	bool rev = false;
> +
> +	if (tc_skb_protocol(skb) != htons(ETH_P_IP))
> +		return;
> +
> +	ct = nf_ct_get(skb, &ctinfo);
> +	if (ct) {
> +		tuple = nf_ct_tuple(ct, CTINFO2DIR(ctinfo));
> +	} else {
> +		const struct nf_conntrack_tuple_hash *hash;
> +		struct nf_conntrack_tuple srctuple;
> +
> +		if (!nf_ct_get_tuplepr(skb, skb_network_offset(skb),
> +				       NFPROTO_IPV4, dev_net(skb->dev),
> +				       &srctuple))
> +			return;
> +
> +		hash = nf_conntrack_find_get(dev_net(skb->dev),
> +					     &nf_ct_zone_dflt,
> +					     &srctuple);
> +		if (!hash)
> +			return;
> +
> +		rev = true;
> +		ct = nf_ct_tuplehash_to_ctrack(hash);
> +		tuple = nf_ct_tuple(ct, !hash->tuple.dst.dir);
> +	}
> +
> +	keys->addrs.v4addrs.src = rev ? tuple->dst.u3.ip : tuple->src.u3.ip;
> +	keys->addrs.v4addrs.dst = rev ? tuple->src.u3.ip : tuple->dst.u3.ip;
> +
> +	if (keys->ports.ports) {
> +		keys->ports.src = rev ? tuple->dst.u.all : tuple->src.u.all;
> +		keys->ports.dst = rev ? tuple->src.u.all : tuple->dst.u.all;
> +	}
> +	if (rev)
> +		nf_ct_put(ct);
> +}
> +#else
> +static void cake_update_flowkeys(struct flow_keys *keys,
> +				 const struct sk_buff *skb)
> +{
> +	/* There is nothing we can do here without CONNTRACK */
> +}
> +#endif
> +
>  /* Cake has several subtle multiple bit settings. In these cases you
>   * would be matching triple isolate mode as well.
>   */
> @@ -541,6 +601,9 @@ static u32 cake_hash(struct cake_tin_data *q, const struct sk_buff *skb,
>  	skb_flow_dissect_flow_keys(skb, &keys,
>  				   FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL);
>
> +	if (flow_mode & CAKE_FLOW_NAT_FLAG)
> +		cake_update_flowkeys(&keys, skb);
> +
>  	/* flow_hash_from_keys() sorts the addresses by value, so we have
>  	 * to preserve their order in a separate data structure to treat
>  	 * src and dst host addresses as independently selectable.
> @@ -1727,6 +1790,12 @@ static int cake_change(struct Qdisc *sch, struct nlattr *opt,
>  		q->flow_mode = (nla_get_u32(tb[TCA_CAKE_FLOW_MODE]) &
>  				CAKE_FLOW_MASK);
>
> +	if (tb[TCA_CAKE_NAT]) {
> +		q->flow_mode &= ~CAKE_FLOW_NAT_FLAG;
> +		q->flow_mode |= CAKE_FLOW_NAT_FLAG *
> +			!!nla_get_u32(tb[TCA_CAKE_NAT]);
> +	}

I think it's better to return -EOPNOTSUPP when CONFIG_NF_CONNTRACK
is not enabled.

> +
>  	if (tb[TCA_CAKE_RTT]) {
>  		q->interval = nla_get_u32(tb[TCA_CAKE_RTT]);
>
> @@ -1892,6 +1961,10 @@ static int cake_dump(struct Qdisc *sch, struct sk_buff *skb)
>  	if (nla_put_u32(skb, TCA_CAKE_ACK_FILTER, q->ack_filter))
>  		goto nla_put_failure;
>
> +	if (nla_put_u32(skb, TCA_CAKE_NAT,
> +			!!(q->flow_mode &
Re: [Cake] [PATCH net-next v12 4/7] sch_cake: Add NAT awareness to packet classifier
Cong Wang writes:

> On Wed, May 16, 2018 at 1:29 PM, Toke Høiland-Jørgensen wrote:
>> When CAKE is deployed on a gateway that also performs NAT (which is a
>> common deployment mode), the host fairness mechanism cannot distinguish
>> internal hosts from each other, and so fails to work correctly.
>>
>> To fix this, we add an optional NAT awareness mode, which will query the
>> kernel conntrack mechanism to obtain the pre-NAT addresses for each packet
>> and use that in the flow and host hashing.
>>
>> When the shaper is enabled and the host is already performing NAT, the cost
>> of this lookup is negligible. However, in unlimited mode with no NAT being
>> performed, there is a significant CPU cost at higher bandwidths. For this
>> reason, the feature is turned off by default.
>>
>> Signed-off-by: Toke Høiland-Jørgensen
>> ---
>>  net/sched/sch_cake.c | 73 ++
>>  1 file changed, 73 insertions(+)
>>
>> diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c
>> index 65439b643c92..e1038a7b6686 100644
>> --- a/net/sched/sch_cake.c
>> +++ b/net/sched/sch_cake.c
>> @@ -71,6 +71,12 @@
>>  #include
>>  #include
>>
>> +#if IS_REACHABLE(CONFIG_NF_CONNTRACK)
>> +#include
>> +#include
>> +#include
>> +#endif
>> +
>>  #define CAKE_SET_WAYS (8)
>>  #define CAKE_MAX_TINS (8)
>>  #define CAKE_QUEUES (1024)
>> @@ -514,6 +520,60 @@ static bool cobalt_should_drop(struct cobalt_vars *vars,
>>  	return drop;
>>  }
>>
>> +#if IS_REACHABLE(CONFIG_NF_CONNTRACK)
>> +
>> +static void cake_update_flowkeys(struct flow_keys *keys,
>> +				 const struct sk_buff *skb)
>> +{
>> +	const struct nf_conntrack_tuple *tuple;
>> +	enum ip_conntrack_info ctinfo;
>> +	struct nf_conn *ct;
>> +	bool rev = false;
>> +
>> +	if (tc_skb_protocol(skb) != htons(ETH_P_IP))
>> +		return;
>> +
>> +	ct = nf_ct_get(skb, &ctinfo);
>> +	if (ct) {
>> +		tuple = nf_ct_tuple(ct, CTINFO2DIR(ctinfo));
>> +	} else {
>> +		const struct nf_conntrack_tuple_hash *hash;
>> +		struct nf_conntrack_tuple srctuple;
>> +
>> +		if (!nf_ct_get_tuplepr(skb, skb_network_offset(skb),
>> +				       NFPROTO_IPV4, dev_net(skb->dev),
>> +				       &srctuple))
>> +			return;
>> +
>> +		hash = nf_conntrack_find_get(dev_net(skb->dev),
>> +					     &nf_ct_zone_dflt,
>> +					     &srctuple);
>> +		if (!hash)
>> +			return;
>> +
>> +		rev = true;
>> +		ct = nf_ct_tuplehash_to_ctrack(hash);
>> +		tuple = nf_ct_tuple(ct, !hash->tuple.dst.dir);
>> +	}
>> +
>> +	keys->addrs.v4addrs.src = rev ? tuple->dst.u3.ip : tuple->src.u3.ip;
>> +	keys->addrs.v4addrs.dst = rev ? tuple->src.u3.ip : tuple->dst.u3.ip;
>> +
>> +	if (keys->ports.ports) {
>> +		keys->ports.src = rev ? tuple->dst.u.all : tuple->src.u.all;
>> +		keys->ports.dst = rev ? tuple->src.u.all : tuple->dst.u.all;
>> +	}
>> +	if (rev)
>> +		nf_ct_put(ct);
>> +}
>> +#else
>> +static void cake_update_flowkeys(struct flow_keys *keys,
>> +				 const struct sk_buff *skb)
>> +{
>> +	/* There is nothing we can do here without CONNTRACK */
>> +}
>> +#endif
>> +
>>  /* Cake has several subtle multiple bit settings. In these cases you
>>   * would be matching triple isolate mode as well.
>>   */
>> @@ -541,6 +601,9 @@ static u32 cake_hash(struct cake_tin_data *q, const struct sk_buff *skb,
>>  	skb_flow_dissect_flow_keys(skb, &keys,
>>  				   FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL);
>>
>> +	if (flow_mode & CAKE_FLOW_NAT_FLAG)
>> +		cake_update_flowkeys(&keys, skb);
>> +
>>  	/* flow_hash_from_keys() sorts the addresses by value, so we have
>>  	 * to preserve their order in a separate data structure to treat
>>  	 * src and dst host addresses as independently selectable.
>> @@ -1727,6 +1790,12 @@ static int cake_change(struct Qdisc *sch, struct nlattr *opt,
>>  		q->flow_mode = (nla_get_u32(tb[TCA_CAKE_FLOW_MODE]) &
>>  				CAKE_FLOW_MASK);
>>
>> +	if (tb[TCA_CAKE_NAT]) {
>> +		q->flow_mode &= ~CAKE_FLOW_NAT_FLAG;
>> +		q->flow_mode |= CAKE_FLOW_NAT_FLAG *
>> +			!!nla_get_u32(tb[TCA_CAKE_NAT]);
>> +	}
>
>
> I think it's better to return -EOPNOTSUPP when CONFIG_NF_CONNTRACK
> is not enabled.

Good point, will fix :)

-Toke
Re: [Cake] [PATCH net-next v12 1/7] sched: Add Common Applications Kept Enhanced (cake) qdisc
Cong Wang writes:

> On Wed, May 16, 2018 at 1:29 PM, Toke Høiland-Jørgensen wrote:
>> +
>> +static struct Qdisc *cake_leaf(struct Qdisc *sch, unsigned long arg)
>> +{
>> +	return NULL;
>> +}
>> +
>> +static unsigned long cake_find(struct Qdisc *sch, u32 classid)
>> +{
>> +	return 0;
>> +}
>> +
>> +static void cake_walk(struct Qdisc *sch, struct qdisc_walker *arg)
>> +{
>> +}
>
>
> Thanks for adding the support to other TC filters, it is much better
> now!

You're welcome. Turned out not to be that hard :)

> A quick question: why class_ops->dump_stats is still NULL?
>
> It is supposed to dump the stats of each flow. Is there still any
> difficulty to map it to tc class? I thought you figured it out when
> you added the tcf_classify().

On the classify side, I solved the "multiple sets of queues" problem by
using skb->priority to select the tin (diffserv tier) and the classifier
output to select the queue within that tin. This would not work for
dumping stats; some other way of mapping queues to the linear class
space would be needed. And since we are not actually collecting any
per-flow stats that I could print, I thought it wasn't worth coming up
with a half-baked proposal for this just to add an API hook that no one
in the existing CAKE user base has ever asked for...

-Toke
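
The scheme described in this reply (skb->priority picks the tin, the filter result picks the queue within that tin) can be sketched in userspace C. This is only an illustration under stated assumptions: the `sketch_*` names, the one-based minor numbers and the -1 "fall back to the built-in behaviour" return values are hypothetical and are not taken from the patch. Only the major/minor split of a tc handle and the 1024 queues per tin mirror what the thread and the patches show.

```c
#include <stdint.h>

/* Userspace stand-ins for the kernel's tc handle helpers:
 * a 32-bit handle is "major:minor", 16 bits each.
 */
#define TC_H_MAJ(h) ((h) & 0xFFFF0000U)
#define TC_H_MIN(h) ((h) & 0x0000FFFFU)

#define SKETCH_QUEUES 1024	/* flow queues per tin, as CAKE_QUEUES */

/* Hypothetical: choose the tin from skb->priority.
 * Returns a zero-based tin index, or -1 when the priority does not
 * address this qdisc and the DSCP-based lookup should be used instead.
 */
static int sketch_select_tin(uint32_t qdisc_handle, uint32_t priority,
			     uint32_t tin_cnt)
{
	if (TC_H_MAJ(priority) == qdisc_handle &&
	    TC_H_MIN(priority) > 0 &&
	    TC_H_MIN(priority) <= tin_cnt)
		return (int)TC_H_MIN(priority) - 1;

	return -1;
}

/* Hypothetical: choose the flow queue from the classid a tc filter
 * returned. Returns a zero-based queue index, or -1 when the filter
 * gave no usable queue and the flow hash should be used instead.
 */
static int sketch_select_flow(uint32_t classid)
{
	uint32_t minor = TC_H_MIN(classid);

	if (minor > 0 && minor <= SKETCH_QUEUES)
		return (int)minor - 1;

	return -1;
}
```

With a qdisc handle of 1: (0x00010000), a priority of 1:2 would select the second tin in this sketch, while a priority carrying a different major number leaves tin selection to the diffserv lookup.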
Re: [Cake] [PATCH net-next v12 1/7] sched: Add Common Applications Kept Enhanced (cake) qdisc
On Wed, May 16, 2018 at 1:29 PM, Toke Høiland-Jørgensen wrote:
> +
> +static struct Qdisc *cake_leaf(struct Qdisc *sch, unsigned long arg)
> +{
> +	return NULL;
> +}
> +
> +static unsigned long cake_find(struct Qdisc *sch, u32 classid)
> +{
> +	return 0;
> +}
> +
> +static void cake_walk(struct Qdisc *sch, struct qdisc_walker *arg)
> +{
> +}

Thanks for adding the support to other TC filters, it is much better
now!

A quick question: why class_ops->dump_stats is still NULL?

It is supposed to dump the stats of each flow. Is there still any
difficulty to map it to tc class? I thought you figured it out when
you added the tcf_classify().
[Cake] [PATCH net-next v12 2/7] sch_cake: Add ingress mode
The ingress mode is meant to be enabled when CAKE runs downlink of the
actual bottleneck (such as on an IFB device). The mode changes the shaper
to also account dropped packets to the shaped rate, as these have already
traversed the bottleneck.

Enabling ingress mode will also tune the AQM to always keep at least two
packets queued *for each flow*. This is done by scaling the minimum queue
occupancy level that will disable the AQM by the number of active bulk
flows. The rationale for this is that retransmits are more expensive in
ingress mode, since dropped packets have to traverse the bottleneck again
when they are retransmitted; thus, being more lenient and keeping a
minimum number of packets queued will improve throughput in cases where
the number of active flows is so large that they saturate the bottleneck
even at their minimum window size.

This commit also adds a separate switch to enable ingress mode rate
autoscaling. If enabled, the autoscaling code will observe the actual
traffic rate and adjust the shaper rate to match it. This can help avoid
latency increases in the case where the actual bottleneck rate decreases
below the shaped rate. The scaling filters out spikes by an EWMA filter.

Signed-off-by: Toke Høiland-Jørgensen
---
 net/sched/sch_cake.c | 85 --
 1 file changed, 81 insertions(+), 4 deletions(-)

diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c
index 422cfccbf37f..d515f18f8460 100644
--- a/net/sched/sch_cake.c
+++ b/net/sched/sch_cake.c
@@ -433,7 +433,8 @@ static bool cobalt_queue_empty(struct cobalt_vars *vars,
 static bool cobalt_should_drop(struct cobalt_vars *vars,
 			       struct cobalt_params *p,
 			       ktime_t now,
-			       struct sk_buff *skb)
+			       struct sk_buff *skb,
+			       u32 bulk_flows)
 {
 	bool next_due, over_target, drop = false;
 	ktime_t schedule;
@@ -457,6 +458,7 @@ static bool cobalt_should_drop(struct cobalt_vars *vars,
 	sojourn = ktime_to_ns(ktime_sub(now, cobalt_get_enqueue_time(skb)));
 	schedule = ktime_sub(now, vars->drop_next);
 	over_target = sojourn > p->target &&
+		      sojourn > p->mtu_time * bulk_flows * 2 &&
 		      sojourn > p->mtu_time * 4;
 	next_due = vars->count && schedule >= 0;

@@ -910,6 +912,9 @@ static unsigned int cake_drop(struct Qdisc *sch, struct sk_buff **to_free)
 	b->tin_dropped++;
 	sch->qstats.drops++;

+	if (q->rate_flags & CAKE_FLAG_INGRESS)
+		cake_advance_shaper(q, b, skb, now, true);
+
 	__qdisc_drop(skb, to_free);
 	sch->q.qlen--;

@@ -986,8 +991,46 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 		cake_heapify_up(q, b->overflow_idx[idx]);

 	/* incoming bandwidth capacity estimate */
-	q->avg_window_bytes = 0;
-	q->last_packet_time = now;
+	if (q->rate_flags & CAKE_FLAG_AUTORATE_INGRESS) {
+		u64 packet_interval = \
+			ktime_to_ns(ktime_sub(now, q->last_packet_time));
+
+		if (packet_interval > NSEC_PER_SEC)
+			packet_interval = NSEC_PER_SEC;
+
+		/* filter out short-term bursts, eg. wifi aggregation */
+		q->avg_packet_interval = \
+			cake_ewma(q->avg_packet_interval,
+				  packet_interval,
+				  (packet_interval > q->avg_packet_interval ?
+					  2 : 8));
+
+		q->last_packet_time = now;
+
+		if (packet_interval > q->avg_packet_interval) {
+			u64 window_interval = \
+				ktime_to_ns(ktime_sub(now,
+						      q->avg_window_begin));
+			u64 b = q->avg_window_bytes * (u64)NSEC_PER_SEC;
+
+			do_div(b, window_interval);
+			q->avg_peak_bandwidth =
+				cake_ewma(q->avg_peak_bandwidth, b,
+					  b > q->avg_peak_bandwidth ? 2 : 8);
+			q->avg_window_bytes = 0;
+			q->avg_window_begin = now;
+
+			if (ktime_after(now,
+					ktime_add_ms(q->last_reconfig_time,
+						     250))) {
+				q->rate_bps = (q->avg_peak_bandwidth * 15) >> 4;
+				cake_reconfigure(sch);
+			}
+		}
+	} else {
+		q->avg_window_bytes = 0;
+		q->last_packet_time = now;
+	}

 	/* flowchain */
 	if (!flow->set || flow->set == CAKE_SET_DECAYING) {
@@ -1246,14 +1289,26 @@ static struct sk_buff
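
The two mechanisms in this patch, the per-flow scaling of the AQM threshold and the EWMA-based rate autoscaling, can be tried out in isolation with the userspace sketch below. The `sketch_*` names are hypothetical. The body of `sketch_ewma()` is an assumption (the patch excerpt only shows calls to `cake_ewma()`, not its implementation); the shift values 2 and 8, the 15/16 scaling and the `over_target` expression are copied from the diff above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed shape of a shift-based EWMA: the new sample gets a weight
 * of 1/2^shift. Not taken from the patch excerpt.
 */
static uint64_t sketch_ewma(uint64_t avg, uint64_t sample, uint32_t shift)
{
	avg -= avg >> shift;
	avg += sample >> shift;
	return avg;
}

/* Asymmetric filter as used for avg_peak_bandwidth in the diff:
 * follow increases quickly (shift 2), decay slowly (shift 8).
 */
static uint64_t sketch_update_peak(uint64_t avg_peak, uint64_t sample)
{
	return sketch_ewma(avg_peak, sample, sample > avg_peak ? 2 : 8);
}

/* The shaper is set to 15/16 of the estimated peak bandwidth. */
static uint64_t sketch_shaper_rate(uint64_t avg_peak)
{
	return (avg_peak * 15) >> 4;
}

/* The AQM only considers a packet over target if its sojourn time
 * also exceeds two MTU times per active bulk flow.
 */
static bool sketch_over_target(uint64_t sojourn, uint64_t target,
			       uint64_t mtu_time, uint32_t bulk_flows)
{
	return sojourn > target &&
	       sojourn > mtu_time * bulk_flows * 2 &&
	       sojourn > mtu_time * 4;
}
```

With a sojourn time of 5000, a target of 1000 and an MTU time of 500 (all in the same unit), the packet counts as over target with 2 bulk flows but not with 8, which is the leniency the commit message describes.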
[Cake] [PATCH net-next v12 1/7] sched: Add Common Applications Kept Enhanced (cake) qdisc
sch_cake targets the home router use case and is intended to squeeze the
most bandwidth and latency out of even the slowest ISP links and routers,
while presenting an API simple enough that even an ISP can configure it.

Example of use on a cable ISP uplink:

tc qdisc add dev eth0 cake bandwidth 20Mbit nat docsis ack-filter

To shape a cable download link (ifb and tc-mirred setup elided)

tc qdisc add dev ifb0 cake bandwidth 200mbit nat docsis ingress wash

CAKE is filled with:

* A hybrid Codel/Blue AQM algorithm, "Cobalt", tied to an FQ_Codel
  derived Flow Queuing system, which autoconfigures based on the bandwidth.
* A novel "triple-isolate" mode (the default) which balances per-host
  and per-flow FQ even through NAT.
* A deficit based shaper, that can also be used in an unlimited mode.
* 8 way set associative hashing to reduce flow collisions to a minimum.
* A reasonable interpretation of various diffserv latency/loss tradeoffs.
* Support for zeroing diffserv markings for entering and exiting traffic.
* Support for interacting well with Docsis 3.0 shaper framing.
* Extensive support for DSL framing types.
* Support for ack filtering.
* Extensive statistics for measuring loss, ECN markings and latency
  variation.

A paper describing the design of CAKE is available at
https://arxiv.org/abs/1804.07617, and will be published at the 2018 IEEE
International Symposium on Local and Metropolitan Area Networks (LANMAN).

This patch adds the base shaper and packet scheduler, while subsequent
commits add the optional (configurable) features. The full userspace API
and most data structures are included in this commit, but options not
understood in the base version will be ignored.

Various versions baking have been available as an out of tree build for
kernel versions going back to 3.10, as the embedded router world has been
running a few years behind mainline Linux. A stable version has been
generally available on lede-17.01 and later.

sch_cake replaces a combination of iptables, tc filter, htb and fq_codel
in the sqm-scripts, with sane defaults and vastly simpler configuration.

CAKE's principal author is Jonathan Morton, with contributions from
Kevin Darbyshire-Bryant, Toke Høiland-Jørgensen, Sebastian Moeller,
Ryan Mounce, Guido Sarducci, Dean Scarff, Nils Andreas Svee, Dave Täht,
and Loganaden Velvindron.

Testing from Pete Heist, Georgios Amanakis, and the many other members of
the cake@lists.bufferbloat.net mailing list.

tc -s qdisc show dev eth2
qdisc cake 1: root refcnt 2 bandwidth 100Mbit diffserv3 triple-isolate rtt 100.0ms raw overhead 0
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
 memory used: 0b of 500b
 capacity estimate: 100Mbit
 min/max network layer size:        65535 /       0
 min/max overhead-adjusted size:    65535 /       0
 average network hdr offset:            0

                   Bulk  Best Effort        Voice
  thresh       6250Kbit      100Mbit       25Mbit
  target          5.0ms        5.0ms        5.0ms
  interval      100.0ms      100.0ms      100.0ms
  pk_delay          0us          0us          0us
  av_delay          0us          0us          0us
  sp_delay          0us          0us          0us
  pkts                0            0            0
  bytes               0            0            0
  way_inds            0            0            0
  way_miss            0            0            0
  way_cols            0            0            0
  drops               0            0            0
  marks               0            0            0
  ack_drop            0            0            0
  sp_flows            0            0            0
  bk_flows            0            0            0
  un_flows            0            0            0
  max_len             0            0            0
  quantum           300         1514          762

Tested-by: Pete Heist
Tested-by: Georgios Amanakis
Signed-off-by: Dave Taht
Signed-off-by: Toke Høiland-Jørgensen
---
 include/uapi/linux/pkt_sched.h |  105 ++
 net/sched/Kconfig              |   11
 net/sched/Makefile             |    1
 net/sched/sch_cake.c           | 1739
 4 files changed, 1856 insertions(+)
 create mode 100644 net/sched/sch_cake.c

diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
index 37b5096ae97b..883e84f008d7 100644
--- a/include/uapi/linux/pkt_sched.h
+++ b/include/uapi/linux/pkt_sched.h
@@ -934,4 +934,109 @@ enum {

 #define TCA_CBS_MAX (__TCA_CBS_MAX - 1)

+/* CAKE */
+enum {
+	TCA_CAKE_UNSPEC,
+	TCA_CAKE_BASE_RATE64,
+	TCA_CAKE_DIFFSERV_MODE,
+	TCA_CAKE_ATM,
+	TCA_CAKE_FLOW_MODE,
+	TCA_CAKE_OVERHEAD,
+	TCA_CAKE_RTT,
+	TCA_CAKE_TARGET,
+	TCA_CAKE_AUTORATE,
+	TCA_CAKE_MEMORY,
+	TCA_CAKE_NAT,
+	TCA_CAKE_RAW,
+	TCA_CAKE_WASH,
+	TCA_CAKE_MPU,
+
[Cake] [PATCH net-next v12 4/7] sch_cake: Add NAT awareness to packet classifier
When CAKE is deployed on a gateway that also performs NAT (which is a
common deployment mode), the host fairness mechanism cannot distinguish
internal hosts from each other, and so fails to work correctly.

To fix this, we add an optional NAT awareness mode, which will query the
kernel conntrack mechanism to obtain the pre-NAT addresses for each packet
and use that in the flow and host hashing.

When the shaper is enabled and the host is already performing NAT, the cost
of this lookup is negligible. However, in unlimited mode with no NAT being
performed, there is a significant CPU cost at higher bandwidths. For this
reason, the feature is turned off by default.

Signed-off-by: Toke Høiland-Jørgensen
---
 net/sched/sch_cake.c | 73 ++
 1 file changed, 73 insertions(+)

diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c
index 65439b643c92..e1038a7b6686 100644
--- a/net/sched/sch_cake.c
+++ b/net/sched/sch_cake.c
@@ -71,6 +71,12 @@
 #include
 #include

+#if IS_REACHABLE(CONFIG_NF_CONNTRACK)
+#include
+#include
+#include
+#endif
+
 #define CAKE_SET_WAYS (8)
 #define CAKE_MAX_TINS (8)
 #define CAKE_QUEUES (1024)
@@ -514,6 +520,60 @@ static bool cobalt_should_drop(struct cobalt_vars *vars,
 	return drop;
 }

+#if IS_REACHABLE(CONFIG_NF_CONNTRACK)
+
+static void cake_update_flowkeys(struct flow_keys *keys,
+				 const struct sk_buff *skb)
+{
+	const struct nf_conntrack_tuple *tuple;
+	enum ip_conntrack_info ctinfo;
+	struct nf_conn *ct;
+	bool rev = false;
+
+	if (tc_skb_protocol(skb) != htons(ETH_P_IP))
+		return;
+
+	ct = nf_ct_get(skb, &ctinfo);
+	if (ct) {
+		tuple = nf_ct_tuple(ct, CTINFO2DIR(ctinfo));
+	} else {
+		const struct nf_conntrack_tuple_hash *hash;
+		struct nf_conntrack_tuple srctuple;
+
+		if (!nf_ct_get_tuplepr(skb, skb_network_offset(skb),
+				       NFPROTO_IPV4, dev_net(skb->dev),
+				       &srctuple))
+			return;
+
+		hash = nf_conntrack_find_get(dev_net(skb->dev),
+					     &nf_ct_zone_dflt,
+					     &srctuple);
+		if (!hash)
+			return;
+
+		rev = true;
+		ct = nf_ct_tuplehash_to_ctrack(hash);
+		tuple = nf_ct_tuple(ct, !hash->tuple.dst.dir);
+	}
+
+	keys->addrs.v4addrs.src = rev ? tuple->dst.u3.ip : tuple->src.u3.ip;
+	keys->addrs.v4addrs.dst = rev ? tuple->src.u3.ip : tuple->dst.u3.ip;
+
+	if (keys->ports.ports) {
+		keys->ports.src = rev ? tuple->dst.u.all : tuple->src.u.all;
+		keys->ports.dst = rev ? tuple->src.u.all : tuple->dst.u.all;
+	}
+	if (rev)
+		nf_ct_put(ct);
+}
+#else
+static void cake_update_flowkeys(struct flow_keys *keys,
+				 const struct sk_buff *skb)
+{
+	/* There is nothing we can do here without CONNTRACK */
+}
+#endif
+
 /* Cake has several subtle multiple bit settings. In these cases you
  * would be matching triple isolate mode as well.
  */
@@ -541,6 +601,9 @@ static u32 cake_hash(struct cake_tin_data *q, const struct sk_buff *skb,
 	skb_flow_dissect_flow_keys(skb, &keys,
 				   FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL);

+	if (flow_mode & CAKE_FLOW_NAT_FLAG)
+		cake_update_flowkeys(&keys, skb);
+
 	/* flow_hash_from_keys() sorts the addresses by value, so we have
 	 * to preserve their order in a separate data structure to treat
 	 * src and dst host addresses as independently selectable.
@@ -1727,6 +1790,12 @@ static int cake_change(struct Qdisc *sch, struct nlattr *opt,
 		q->flow_mode = (nla_get_u32(tb[TCA_CAKE_FLOW_MODE]) &
 				CAKE_FLOW_MASK);

+	if (tb[TCA_CAKE_NAT]) {
+		q->flow_mode &= ~CAKE_FLOW_NAT_FLAG;
+		q->flow_mode |= CAKE_FLOW_NAT_FLAG *
+			!!nla_get_u32(tb[TCA_CAKE_NAT]);
+	}
+
 	if (tb[TCA_CAKE_RTT]) {
 		q->interval = nla_get_u32(tb[TCA_CAKE_RTT]);

@@ -1892,6 +1961,10 @@ static int cake_dump(struct Qdisc *sch, struct sk_buff *skb)
 	if (nla_put_u32(skb, TCA_CAKE_ACK_FILTER, q->ack_filter))
 		goto nla_put_failure;

+	if (nla_put_u32(skb, TCA_CAKE_NAT,
+			!!(q->flow_mode & CAKE_FLOW_NAT_FLAG)))
+		goto nla_put_failure;
+
 	return nla_nest_end(skb, opts);

 nla_put_failure:
[Cake] [PATCH net-next v12 3/7] sch_cake: Add optional ACK filter
The ACK filter is an optional feature of CAKE which is designed to improve
performance on links with very asymmetrical rate limits. On such links
(which are unfortunately quite prevalent, especially for DSL and cable
subscribers), the downstream throughput can be limited by the number of
ACKs capable of being transmitted in the *upstream* direction.

Filtering ACKs can, in general, have adverse effects on TCP performance
because it interferes with ACK clocking (especially in slow start), and it
reduces the flow's resiliency to ACKs being dropped further along the path.
To alleviate these drawbacks, the ACK filter in CAKE tries its best to
always keep enough ACKs queued to ensure forward progress in the TCP flow
being filtered. It does this by only filtering redundant ACKs. In its
default 'conservative' mode, the filter will always keep at least two
redundant ACKs in the queue, while in 'aggressive' mode, it will filter
down to a single ACK.

The ACK filter works by inspecting the per-flow queue on every packet
enqueue. Starting at the head of the queue, the filter looks for another
eligible packet to drop (so the ACK being dropped is always closer to the
head of the queue than the packet being enqueued). An ACK is eligible only
if it ACKs *fewer* cumulative bytes than the new packet being enqueued.
This prevents duplicate ACKs from being filtered (unless there are also
SACK options present), to avoid interfering with retransmission logic. In
aggressive mode, an eligible packet is always dropped, while in
conservative mode, at least two ACKs are kept in the queue. Only pure ACKs
(with no data segments) are considered eligible for dropping, but when an
ACK with data segments is enqueued, this can cause another pure ACK to
become eligible for dropping.

The approach described above ensures that this ACK filter avoids most of
the drawbacks of a naive filtering mechanism that only keeps flow state but
does not inspect the queue. This is the rationale for including the ACK
filter in CAKE itself rather than as a separate module (as the TC filter,
for instance).

Our performance evaluation has shown that on a 30/1 Mbps link with a
bidirectional traffic test (RRUL), turning on the ACK filter on the
upstream link improves downstream throughput by ~20% (both modes) and
upstream throughput by ~12% in conservative mode and ~40% in aggressive
mode, at the cost of ~5ms of inter-flow latency due to the increased
congestion.

In *really* pathological cases, the effect can be a lot more; for instance,
the ACK filter increases the achievable downstream throughput on a link
with 100 Kbps in the upstream direction by an order of magnitude (from ~2.5
Mbps to ~25 Mbps).

Finally, even though we consider the ACK filter to be safer than most, we
do not recommend turning it on everywhere: on more symmetrical link
bandwidths the effect is negligible at best.

Signed-off-by: Toke Høiland-Jørgensen
---
 net/sched/sch_cake.c | 260 ++
 1 file changed, 258 insertions(+), 2 deletions(-)

diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c
index d515f18f8460..65439b643c92 100644
--- a/net/sched/sch_cake.c
+++ b/net/sched/sch_cake.c
@@ -755,6 +755,239 @@ static void flow_queue_add(struct cake_flow *flow, struct sk_buff *skb)
 	skb->next = NULL;
 }

+static struct iphdr *cake_get_iphdr(const struct sk_buff *skb,
+				    struct ipv6hdr *buf)
+{
+	unsigned int offset = skb_network_offset(skb);
+	struct iphdr *iph;
+
+	iph = skb_header_pointer(skb, offset, sizeof(struct iphdr), buf);
+
+	if (!iph)
+		return NULL;
+
+	if (iph->version == 4 && iph->protocol == IPPROTO_IPV6)
+		return skb_header_pointer(skb, offset + iph->ihl * 4,
+					  sizeof(struct ipv6hdr), buf);
+
+	else if (iph->version == 4)
+		return iph;
+
+	else if (iph->version == 6)
+		return skb_header_pointer(skb, offset, sizeof(struct ipv6hdr),
+					  buf);
+
+	return NULL;
+}
+
+static struct tcphdr *cake_get_tcphdr(const struct sk_buff *skb,
+				      void *buf, unsigned int bufsize)
+{
+	unsigned int offset = skb_network_offset(skb);
+	const struct ipv6hdr *ipv6h;
+	const struct tcphdr *tcph;
+	const struct iphdr *iph;
+	struct ipv6hdr _ipv6h;
+	struct tcphdr _tcph;
+
+	ipv6h = skb_header_pointer(skb, offset, sizeof(_ipv6h), &_ipv6h);
+
+	if (!ipv6h)
+		return NULL;
+
+	if (ipv6h->version == 4) {
+		iph = (struct iphdr *)ipv6h;
+		offset += iph->ihl * 4;
+
+		/* special-case 6in4 tunnelling, as that is a common way to get
+		 * v6 connectivity in the home
+		 */
+		if (iph->protocol == IPPROTO_IPV6) {
+			ipv6h =
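
The eligibility rule from the commit message can be illustrated with a deliberately simplified userspace model. This is not the filter from the patch: it assumes the per-flow queue has already been reduced to the cumulative ACK numbers of its pure ACKs (head first), ignores SACK and other TCP options, and reads "keep at least two redundant ACKs" as "only drop when more than two eligible ACKs are queued". All `sketch_*` names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Wraparound-safe "a is before b" for 32-bit TCP sequence space. */
static bool sketch_seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/* Scan from the head of the queue and return the index of the pure
 * ACK to drop when an ACK for new_ack is enqueued, or -1 for none.
 */
static int sketch_pick_ack_to_drop(const uint32_t *queued_acks, size_t n,
				   uint32_t new_ack, bool aggressive)
{
	size_t eligible = 0;
	int first = -1;
	size_t i;

	for (i = 0; i < n; i++) {
		/* Only ACKs covering fewer bytes than the new one are
		 * redundant; duplicates and newer ACKs are kept.
		 */
		if (!sketch_seq_before(queued_acks[i], new_ack))
			continue;
		if (first < 0)
			first = (int)i;
		eligible++;
	}

	if (first < 0)
		return -1;
	if (aggressive)
		return first;

	/* Conservative (simplified): leave two redundant ACKs queued. */
	return eligible > 2 ? first : -1;
}
```

In this model a duplicate ACK (same cumulative ACK number as the new packet) is never selected, which matches the stated goal of not interfering with retransmission logic.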
[Cake] [PATCH net-next v12 6/7] sch_cake: Add overhead compensation support to the rate shaper
This commit adds configurable overhead compensation support to the rate
shaper. With this feature, userspace can configure the actual bottleneck
link overhead and encapsulation mode used, which will be used by the shaper
to calculate the precise duration of each packet on the wire.

This feature is needed because CAKE is often deployed one or two hops
upstream of the actual bottleneck (which can be, e.g., inside a DSL or
cable modem). In this case, the link layer characteristics and overhead
reported by the kernel do not match the actual bottleneck. Being able to
set the actual values in use makes it possible to configure the shaper rate
much closer to the actual bottleneck rate (our experience shows it is
possible to get within 0.1% of the actual physical bottleneck rate), thus
keeping latency low without sacrificing bandwidth.

The overhead compensation has three tunables: A fixed per-packet overhead
size (which, if set, will be accounted from the IP packet header), a
minimum packet size (MPU) and a framing mode supporting either ATM or PTM
framing. We include a set of common keywords in TC to help users configure
the right parameters. If no overhead value is set, the value reported by
the kernel is used.

Signed-off-by: Toke Høiland-Jørgensen
---
 net/sched/sch_cake.c | 124 ++
 1 file changed, 123 insertions(+), 1 deletion(-)

diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c
index f0f94d536e51..1ce81d919f73 100644
--- a/net/sched/sch_cake.c
+++ b/net/sched/sch_cake.c
@@ -271,6 +271,7 @@ enum {

 struct cobalt_skb_cb {
 	ktime_t enqueue_time;
+	u32 adjusted_len;
 };

 static u64 us_to_ns(u64 us)
@@ -1120,6 +1121,88 @@ static u64 cake_ewma(u64 avg, u64 sample, u32 shift)
 	return avg;
 }

+static u32 cake_calc_overhead(struct cake_sched_data *q, u32 len, u32 off)
+{
+	if (q->rate_flags & CAKE_FLAG_OVERHEAD)
+		len -= off;
+
+	if (q->max_netlen < len)
+		q->max_netlen = len;
+	if (q->min_netlen > len)
+		q->min_netlen = len;
+
+	len += q->rate_overhead;
+
+	if (len < q->rate_mpu)
+		len = q->rate_mpu;
+
+	if (q->atm_mode == CAKE_ATM_ATM) {
+		len += 47;
+		len /= 48;
+		len *= 53;
+	} else if (q->atm_mode == CAKE_ATM_PTM) {
+		/* Add one byte per 64 bytes or part thereof.
+		 * This is conservative and easier to calculate than the
+		 * precise value.
+		 */
+		len += (len + 63) / 64;
+	}
+
+	if (q->max_adjlen < len)
+		q->max_adjlen = len;
+	if (q->min_adjlen > len)
+		q->min_adjlen = len;
+
+	return len;
+}
+
+static u32 cake_overhead(struct cake_sched_data *q, const struct sk_buff *skb)
+{
+	const struct skb_shared_info *shinfo = skb_shinfo(skb);
+	unsigned int hdr_len, last_len = 0;
+	u32 off = skb_network_offset(skb);
+	u32 len = qdisc_pkt_len(skb);
+	u16 segs = 1;
+
+	q->avg_netoff = cake_ewma(q->avg_netoff, off << 16, 8);
+
+	if (!shinfo->gso_size)
+		return cake_calc_overhead(q, len, off);
+
+	/* borrowed from qdisc_pkt_len_init() */
+	hdr_len = skb_transport_header(skb) - skb_mac_header(skb);
+
+	/* + transport layer */
+	if (likely(shinfo->gso_type & (SKB_GSO_TCPV4 |
+				       SKB_GSO_TCPV6))) {
+		const struct tcphdr *th;
+		struct tcphdr _tcphdr;
+
+		th = skb_header_pointer(skb, skb_transport_offset(skb),
+					sizeof(_tcphdr), &_tcphdr);
+		if (likely(th))
+			hdr_len += __tcp_hdrlen(th);
+	} else {
+		struct udphdr _udphdr;
+
+		if (skb_header_pointer(skb, skb_transport_offset(skb),
+				       sizeof(_udphdr), &_udphdr))
+			hdr_len += sizeof(struct udphdr);
+	}
+
+	if (unlikely(shinfo->gso_type & SKB_GSO_DODGY))
+		segs = DIV_ROUND_UP(skb->len - hdr_len,
+				    shinfo->gso_size);
+	else
+		segs = shinfo->gso_segs;
+
+	len = shinfo->gso_size + hdr_len;
+	last_len = skb->len - shinfo->gso_size * (segs - 1);
+
+	return (cake_calc_overhead(q, len, off) * (segs - 1) +
+		cake_calc_overhead(q, last_len, off));
+}
+
 static void cake_heap_swap(struct cake_sched_data *q, u16 i, u16 j)
 {
 	struct cake_heap_entry ii = q->overflow_heap[i];
@@ -1197,7 +1280,7 @@ static int cake_advance_shaper(struct cake_sched_data *q,
 			       struct sk_buff *skb,
 			       ktime_t now, bool drop)
 {
-	u32 len = qdisc_pkt_len(skb);
+	u32 len = get_cobalt_cb(skb)->adjusted_len;

 	/* charge packet bandwidth to
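
The framing arithmetic in `cake_calc_overhead()` is easy to check by hand. The sketch below lifts only the overhead, MPU and ATM/PTM steps out of the diff into a standalone function; the `sketch_*` names and the enum are hypothetical, and the overhead is taken as unsigned here for simplicity, which is an assumption of this example rather than something the excerpt states.

```c
#include <stdint.h>

enum sketch_framing { SKETCH_NONE, SKETCH_ATM, SKETCH_PTM };

/* Size of one packet as seen by the bottleneck link, following the
 * steps in the diff: add per-packet overhead, enforce the minimum
 * packet unit, then apply the framing expansion.
 */
static uint32_t sketch_wire_len(uint32_t len, uint32_t overhead,
				uint32_t mpu, enum sketch_framing mode)
{
	len += overhead;

	if (len < mpu)
		len = mpu;

	if (mode == SKETCH_ATM) {
		/* Round up to whole 48-byte cell payloads, 53 bytes each. */
		len += 47;
		len /= 48;
		len *= 53;
	} else if (mode == SKETCH_PTM) {
		/* One extra byte per 64 bytes or part thereof. */
		len += (len + 63) / 64;
	}

	return len;
}
```

For example, a 1500-byte packet with 44 bytes of overhead needs 33 ATM cells and is charged as 1749 bytes, while a 1500-byte packet with 30 bytes of overhead under PTM framing is charged as 1554 bytes.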
[Cake] [PATCH net-next v12 7/7] sch_cake: Conditionally split GSO segments
At lower bandwidths, the transmission time of a single GSO segment can add
an unacceptable amount of latency due to HOL blocking. Furthermore, with a
software shaper, any tuning mechanism employed by the kernel to control the
maximum size of GSO segments is thrown off by the artificial limit on
bandwidth. For this reason, we split GSO segments into their individual
packets iff the shaper is active and configured to a bandwidth <= 1 Gbps.

Signed-off-by: Toke Høiland-Jørgensen
---
 net/sched/sch_cake.c | 99 +-
 1 file changed, 73 insertions(+), 26 deletions(-)

diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c
index 1ce81d919f73..dca276806e9f 100644
--- a/net/sched/sch_cake.c
+++ b/net/sched/sch_cake.c
@@ -82,6 +82,7 @@
 #define CAKE_QUEUES (1024)
 #define CAKE_FLOW_MASK 63
 #define CAKE_FLOW_NAT_FLAG 64
+#define CAKE_SPLIT_GSO_THRESHOLD (12500) /* 1Gbps */
 
 /* struct cobalt_params - contains codel and blue parameters
  * @interval: codel initial drop rate
@@ -1474,36 +1475,73 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 	if (unlikely(len > b->max_skblen))
 		b->max_skblen = len;
 
-	cobalt_set_enqueue_time(skb, now);
-	get_cobalt_cb(skb)->adjusted_len = cake_overhead(q, skb);
-	flow_queue_add(flow, skb);
-
-	if (q->ack_filter)
-		ack = cake_ack_filter(q, flow);
+	if (skb_is_gso(skb) && q->rate_flags & CAKE_FLAG_SPLIT_GSO) {
+		struct sk_buff *segs, *nskb;
+		netdev_features_t features = netif_skb_features(skb);
+		unsigned int slen = 0;
+
+		segs = skb_gso_segment(skb, features & ~NETIF_F_GSO_MASK);
+		if (IS_ERR_OR_NULL(segs))
+			return qdisc_drop(skb, sch, to_free);
+
+		while (segs) {
+			nskb = segs->next;
+			segs->next = NULL;
+			qdisc_skb_cb(segs)->pkt_len = segs->len;
+			cobalt_set_enqueue_time(segs, now);
+			get_cobalt_cb(segs)->adjusted_len = cake_overhead(q,
+									  segs);
+			flow_queue_add(flow, segs);
+
+			sch->q.qlen++;
+			slen += segs->len;
+			q->buffer_used += segs->truesize;
+			b->packets++;
+			segs = nskb;
+		}
 
-	if (ack) {
-		b->ack_drops++;
-		sch->qstats.drops++;
-		b->bytes += qdisc_pkt_len(ack);
-		len -= qdisc_pkt_len(ack);
-		q->buffer_used += skb->truesize - ack->truesize;
-		if (q->rate_flags & CAKE_FLAG_INGRESS)
-			cake_advance_shaper(q, b, ack, now, true);
+		/* stats */
+		b->bytes += slen;
+		b->backlogs[idx] += slen;
+		b->tin_backlog += slen;
+		sch->qstats.backlog += slen;
+		q->avg_window_bytes += slen;
 
-		qdisc_tree_reduce_backlog(sch, 1, qdisc_pkt_len(ack));
-		consume_skb(ack);
+		qdisc_tree_reduce_backlog(sch, 1, len);
+		consume_skb(skb);
 	} else {
-		sch->q.qlen++;
-		q->buffer_used += skb->truesize;
-	}
+		/* not splitting */
+		cobalt_set_enqueue_time(skb, now);
+		get_cobalt_cb(skb)->adjusted_len = cake_overhead(q, skb);
+		flow_queue_add(flow, skb);
+
+		if (q->ack_filter)
+			ack = cake_ack_filter(q, flow);
+
+		if (ack) {
+			b->ack_drops++;
+			sch->qstats.drops++;
+			b->bytes += qdisc_pkt_len(ack);
+			len -= qdisc_pkt_len(ack);
+			q->buffer_used += skb->truesize - ack->truesize;
+			if (q->rate_flags & CAKE_FLAG_INGRESS)
+				cake_advance_shaper(q, b, ack, now, true);
+
+			qdisc_tree_reduce_backlog(sch, 1, qdisc_pkt_len(ack));
+			consume_skb(ack);
+		} else {
+			sch->q.qlen++;
+			q->buffer_used += skb->truesize;
+		}
 
-	/* stats */
-	b->packets++;
-	b->bytes += len;
-	b->backlogs[idx] += len;
-	b->tin_backlog += len;
-	sch->qstats.backlog += len;
-	q->avg_window_bytes += len;
+		/* stats */
+		b->packets++;
+		b->bytes += len;
+		b->backlogs[idx] += len;
+		b->tin_backlog += len;
+		sch->qstats.backlog += len;
+		q->avg_window_bytes += len;
+	}
 
 	if
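To put a rough number on the head-of-line blocking argument in the commit message, the serialisation delay of one unsplit aggregate can be computed directly. The sketch below is a userspace illustration, not code from the patch: `tx_time_us` is a hypothetical helper, and the 64 KiB aggregate size is an assumed example rather than a value taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Time in microseconds (rounded down) needed to serialise `bytes`
 * on a link shaped to `rate_bps` bits per second.
 */
static uint64_t tx_time_us(uint64_t bytes, uint64_t rate_bps)
{
	assert(rate_bps > 0);

	return bytes * 8 * 1000000 / rate_bps;
}
```

With these assumptions, tx_time_us(65536, 10000000) is 52428, i.e. about 52 ms during which no other flow can be served at 10 Mbit/s, whereas a single 1514-byte frame takes tx_time_us(1514, 10000000) = 1211 us. At 1 Gbit/s the same 64 KiB aggregate takes only 524 us, which is consistent with the patch disabling the split above that rate.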
Re: [Cake] [PATCH net-next v11 1/7] sched: Add Common Applications Kept Enhanced (cake) qdisc
David Miller writes:

> From: Toke Høiland-Jørgensen
> Date: Tue, 15 May 2018 17:12:44 +0200
>
>> +typedef u64 cobalt_time_t;
>> +typedef s64 cobalt_tdiff_t;
> ...
>> +static cobalt_time_t cobalt_get_time(void)
>> +{
>> +	return ktime_get_ns();
>> +}
>> +
>> +static u32 cobalt_time_to_us(cobalt_time_t val)
>> +{
>> +	do_div(val, NSEC_PER_USEC);
>> +	return (u32)val;
>> +}
>
> If fundamentally you are working with ktime_t values, please use that type
> and the associated helpers.
>
> There is a valid argument that using custom typedefs provides documentation
> and an aid to understanding, but I think it doesn't serve that purpose
> very well here.
>
> So please just use ktime_t throughout instead of this cobalt_time_t
> and cobalt_tdiff_t. And then use helpers like ktime_to_us() which
> properly optimize for 64-bit vs. 32-bit hosts.

Can do :)

-Toke
Re: [Cake] [PATCH net-next v11 1/7] sched: Add Common Applications Kept Enhanced (cake) qdisc
From: Toke Høiland-Jørgensen
Date: Tue, 15 May 2018 17:12:44 +0200

> +typedef u64 cobalt_time_t;
> +typedef s64 cobalt_tdiff_t;
...
> +static cobalt_time_t cobalt_get_time(void)
> +{
> +	return ktime_get_ns();
> +}
> +
> +static u32 cobalt_time_to_us(cobalt_time_t val)
> +{
> +	do_div(val, NSEC_PER_USEC);
> +	return (u32)val;
> +}

If fundamentally you are working with ktime_t values, please use that type
and the associated helpers.

There is a valid argument that using custom typedefs provides documentation
and an aid to understanding, but I think it doesn't serve that purpose
very well here.

So please just use ktime_t throughout instead of this cobalt_time_t
and cobalt_tdiff_t. And then use helpers like ktime_to_us() which
properly optimize for 64-bit vs. 32-bit hosts.

Thank you.