Re: [Cake] Using firewall connmarks as tin selectors
> On 4 Mar 2019, at 21:33, Toke Høiland-Jørgensen wrote: >> I think so too though I think the mechanism of copying the DSCP bits >> and adding a ‘I did this’ flag bit should be retained so that other >> user space tools (iptables etc) can detect when a connmark based DSCP >> has been set/applied. > > I guess this could be an option as well? If we don’t do that then potentially we have to look up the DSCP and update the conntrack mark for every packet. >> I think cake ‘fwmark’ should have the smarts to look for the act_dscp >> DSCP value if nothing else so we don’t have to have the overhead of >> act_dscp set restoring DSCP to all the packets if we don’t want to. > > Not sure what you mean here? What I meant was that we can make the diffserv restore part optional. Our qdisc (or whatever) could pick up the fw stored DSCP for tin/bandwidth selection and not require the real DSCP to be set and quite possibly washed/bleached again anyway. >> I’m right at the limit of my coding ability with what I’ve sent in so >> far - the kernel space bits of act_connmark leave me mostly confused - >> really not sure where to start with act_dscp! > > I think I would start with `cp act_connmark.c act_dscp.c`, adding the > new file to the Makefile and Kconfig, and working from there. Then rip > out everything not needed, and copy over what you already added to cake. > > Happy to help you work out the details; but I think we'll make more > progress on this if you are driving it :) OK - we’ll see how long it takes before someone screams or laughs themselves to death :-) Kevin ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] Using firewall connmarks as tin selectors
Toke Høiland-Jørgensen writes: > Kevin Darbyshire-Bryant writes: > >>> On 4 Mar 2019, at 17:36, Toke Høiland-Jørgensen wrote: >>> < snips > >>> >>> Or act_dscp with 'get' and 'set' options :) >>> As I said earlier I couldn't work out how m_conntrack did… anything at all to be honest! >>> >>> act_connmark is pretty straight forward: >>> https://elixir.bootlin.com/linux/latest/source/net/sched/act_connmark.c#L34 >> >> Oh bloody hell I’m an idiot - I was looking in user space tc code for >> something that obviously lives in kernel land. Yes *now* I can see >> what it does. > > No comment ;) > >> @@ -1661,13 +1695,14 @@ static struct cake_tin_data >> *cake_select_tin(struct Qdisc *sch, >> tin = 0; >> >> else if (q->rate_flags & CAKE_FLAG_FWMARK && /* use fw mark */ >> - skb->mark && >> - skb->mark <= q->tin_cnt) >> -tin = q->tin_order[skb->mark - 1]; >> - >> -else if (TC_H_MAJ(skb->priority) == sch->handle && >> - TC_H_MIN(skb->priority) > 0 && >> - TC_H_MIN(skb->priority) <= q->tin_cnt) >> + skb->mark & 0x4000) { > > I think there's something odd with this mask? There's only one bit set > in it… I use the single bit as a flag to indicate cake has stored the DSCP in the lower 6 bits of the byte. Otherwise with a DSCP of 0 (the default) it’s rather difficult to know if a connection has been through the cake ’save dscp to fwmark’ process or not. That also indicates to user space whether it should consider mangling packets or not e.g. >>> >>> Ah, right. But that breaks the previous use case where someone just >>> wants to set fwmarks that get turned into CAKE tins? >> >> Yes, which is why I was hoping for upstream to bounce it, not least >> because of the unmasked use of fwmark field. Personally I’d like to >> see that change reverted before we get trapped into something we’ll >> regret. > > Well, we have plenty of time to try out things; the whole point of the > rc cycle is testing. But sure, if we don't settle on something, we can > just revert it :) And, well, the simple thing to do is just keep the current behaviour, or add a mask of 0xff, and use the tc action for everything that's more advanced than this... >>> I think this definitely is leaning towards decoupling the >>> fw-mark-to-DSCP settings to an action. And probably making the shift and >>> mask configurable rather than hard-coding something… >> >> I think so too though I think the mechanism of copying the DSCP bits >> and adding a ‘I did this’ flag bit should be retained so that other >> user space tools (iptables etc) can detect when a connmark based DSCP >> has been set/applied. > > I guess this could be an option as well? > >> I think cake ‘fwmark’ should have the smarts to look for the act_dscp >> DSCP value if nothing else so we don’t have to have the overhead of >> act_dscp set restoring DSCP to all the packets if we don’t want to. > > Not sure what you mean here? > >> I’m right at the limit of my coding ability with what I’ve sent in so >> far - the kernel space bits of act_connmark leave me mostly confused - >> really not sure where to start with act_dscp! > > I think I would start with `cp act_connmark.c act_dscp.c`, adding the > new file to the Makefile and Kconfig, and working from there. Then rip > out everything not needed, and copy over what you already added to > cake. Or even simpler, just add new options to act_connmark... -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] Using firewall connmarks as tin selectors
Kevin Darbyshire-Bryant writes: >> On 4 Mar 2019, at 17:36, Toke Høiland-Jørgensen wrote: >> < snips > >> >> Or act_dscp with 'get' and 'set' options :) >> >>> As I said earlier I couldn't work out how m_conntrack did… anything at >>> all to be honest! >> >> act_connmark is pretty straight forward: >> https://elixir.bootlin.com/linux/latest/source/net/sched/act_connmark.c#L34 > > Oh bloody hell I’m an idiot - I was looking in user space tc code for > something that obviously lives in kernel land. Yes *now* I can see > what it does. No comment ;) > @@ -1661,13 +1695,14 @@ static struct cake_tin_data > *cake_select_tin(struct Qdisc *sch, > tin = 0; > > else if (q->rate_flags & CAKE_FLAG_FWMARK && /* use fw mark */ > - skb->mark && > - skb->mark <= q->tin_cnt) > - tin = q->tin_order[skb->mark - 1]; > - > - else if (TC_H_MAJ(skb->priority) == sch->handle && > - TC_H_MIN(skb->priority) > 0 && > - TC_H_MIN(skb->priority) <= q->tin_cnt) > +skb->mark & 0x4000) { I think there's something odd with this mask? There's only one bit set in it… >>> >>> I use the single bit as a flag to indicate cake has stored the DSCP >>> in the lower 6 bits of the byte. Otherwise with a DSCP of 0 (the >>> default) it’s rather difficult to know if a connection has been >>> through the cake ’save dscp to fwmark’ process or not. That also >>> indicates to user space whether it should consider mangling packets or >>> not e.g. >> >> Ah, right. But that breaks the previous use case where someone just >> wants to set fwmarks that get turned into CAKE tins? > > Yes, which is why I was hoping for upstream to bounce it, not least > because of the unmasked use of fwmark field. Personally I’d like to > see that change reverted before we get trapped into something we’ll > regret. Well, we have plenty of time to try out things; the whole point of the rc cycle is testing. But sure, if we don't settle on something, we can just revert it :) >> I think this definitely is leaning towards decoupling the >> fw-mark-to-DSCP settings to an action. And probably making the shift and >> mask configurable rather than hard-coding something… > > I think so too though I think the mechanism of copying the DSCP bits > and adding a ‘I did this’ flag bit should be retained so that other > user space tools (iptables etc) can detect when a connmark based DSCP > has been set/applied. I guess this could be an option as well? > I think cake ‘fwmark’ should have the smarts to look for the act_dscp > DSCP value if nothing else so we don’t have to have the overhead of > act_dscp set restoring DSCP to all the packets if we don’t want to. Not sure what you mean here? > I’m right at the limit of my coding ability with what I’ve sent in so > far - the kernel space bits of act_connmark leave me mostly confused - > really not sure where to start with act_dscp! I think I would start with `cp act_connmark.c act_dscp.c`, adding the new file to the Makefile and Kconfig, and working from there. Then rip out everything not needed, and copy over what you already added to cake. Happy to help you work out the details; but I think we'll make more progress on this if you are driving it :) -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] Using firewall connmarks as tin selectors
> On 4 Mar 2019, at 17:36, Toke Høiland-Jørgensen wrote: > < snips > > > Or act_dscp with 'get' and 'set' options :) > >> As I said earlier I couldn't work out how m_conntrack did… anything at >> all to be honest! > > act_connmark is pretty straight forward: > https://elixir.bootlin.com/linux/latest/source/net/sched/act_connmark.c#L34 Oh bloody hell I’m an idiot - I was looking in user space tc code for something that obviously lives in kernel land. Yes *now* I can see what it does. > @@ -1661,13 +1695,14 @@ static struct cake_tin_data *cake_select_tin(struct Qdisc *sch, tin = 0; else if (q->rate_flags & CAKE_FLAG_FWMARK && /* use fw mark */ - skb->mark && - skb->mark <= q->tin_cnt) - tin = q->tin_order[skb->mark - 1]; - - else if (TC_H_MAJ(skb->priority) == sch->handle && - TC_H_MIN(skb->priority) > 0 && - TC_H_MIN(skb->priority) <= q->tin_cnt) + skb->mark & 0x4000) { >>> >>> I think there's something odd with this mask? There's only one bit set >>> in it… >> >> I use the single bit as a flag to indicate cake has stored the DSCP >> in the lower 6 bits of the byte. Otherwise with a DSCP of 0 (the >> default) it’s rather difficult to know if a connection has been >> through the cake ’save dscp to fwmark’ process or not. That also >> indicates to user space whether it should consider mangling packets or >> not e.g. > > Ah, right. But that breaks the previous use case where someone just > wants to set fwmarks that get turned into CAKE tins? Yes, which is why I was hoping for upstream to bounce it, not least because of the unmasked use of fwmark field. Personally I’d like to see that change reverted before we get trapped into something we’ll regret. > I think this definitely is leaning towards decoupling the > fw-mark-to-DSCP settings to an action. And probably making the shift and > mask configurable rather than hard-coding something… I think so too though I think the mechanism of copying the DSCP bits and adding a ‘I did this’ flag bit should be retained so that other user space tools (iptables etc) can detect when a connmark based DSCP has been set/applied. I think cake ‘fwmark’ should have the smarts to look for the act_dscp DSCP value if nothing else so we don’t have to have the overhead of act_dscp set restoring DSCP to all the packets if we don’t want to. I’m right at the limit of my coding ability with what I’ve sent in so far - the kernel space bits of act_connmark leave me mostly confused - really not sure where to start with act_dscp! Cheers, Kevin D-B 012C ACB2 28C6 C53E 9775 9123 B3A2 389B 9DE2 334A ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] Using firewall connmarks as tin selectors
Kevin Darbyshire-Bryant writes: >> On 4 Mar 2019, at 16:39, Toke Høiland-Jørgensen wrote: >> >> Kevin Darbyshire-Bryant writes: >> >> [ ... snipping a bit of context here ... ] >> >>> +void cake_update_ct_mark(struct sk_buff *skb, u8 dscp) >>> +{ >>> + enum ip_conntrack_info ctinfo; >>> + struct nf_conn *ct; >>> + >>> + ct = nf_ct_get(skb, ); >>> + if (!ct) >>> + return; >>> + >>> + ct->mark &= 0x80ff; >>> + ct->mark |= (0x40 | dscp) << 24; >> >> Right, so we *might* have an argument that putting the *tin* into the >> fwmark is CAKE's business, but copying over the dscp mark is not >> something a qdisc should be doing… > > Why ever not? It’s not the DSCP, it’s a lookup value into the cake > priority table, it just happens to look like the DSCP ;-) If it quacks like a duck… >>> >>> I honestly don’t know where to go from here. I’m clearly trying to do >>> something that the kernel doesn’t want to do. >> >> I'm not disputing that what you're trying to do (moving DSCP field into >> connmark) is useful. I'm just questioning whether CAKE is the right >> place to do this. I think it would fit better in a TC action; either >> modify act_connmark, or create a new action to sync fwmarks and DSCP >> marks… > > Interesting. Thinks out loud - Two actions - ‘act_storedscp’, > ‘act_restoredscp' Or act_dscp with 'get' and 'set' options :) > As I said earlier I couldn't work out how m_conntrack did… anything at > all to be honest! act_connmark is pretty straight forward: https://elixir.bootlin.com/linux/latest/source/net/sched/act_connmark.c#L34 >>> @@ -1661,13 +1695,14 @@ static struct cake_tin_data *cake_select_tin(struct >>> Qdisc *sch, >>> tin = 0; >>> >>> else if (q->rate_flags & CAKE_FLAG_FWMARK && /* use fw mark */ >>> -skb->mark && >>> -skb->mark <= q->tin_cnt) >>> - tin = q->tin_order[skb->mark - 1]; >>> - >>> - else if (TC_H_MAJ(skb->priority) == sch->handle && >>> -TC_H_MIN(skb->priority) > 0 && >>> -TC_H_MIN(skb->priority) <= q->tin_cnt) >>> + skb->mark & 0x4000) { >> >> I think there's something odd with this mask? There's only one bit set >> in it… > > I use the single bit as a flag to indicate cake has stored the DSCP > in the lower 6 bits of the byte. Otherwise with a DSCP of 0 (the > default) it’s rather difficult to know if a connection has been > through the cake ’save dscp to fwmark’ process or not. That also > indicates to user space whether it should consider mangling packets or > not e.g. Ah, right. But that breaks the previous use case where someone just wants to set fwmarks that get turned into CAKE tins? I think this definitely is leaning towards decoupling the fw-mark-to-DSCP settings to an action. And probably making the shift and mask configurable rather than hard-coding something... -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] Using firewall connmarks as tin selectors
> On 4 Mar 2019, at 16:39, Toke Høiland-Jørgensen wrote: > > Kevin Darbyshire-Bryant writes: > > [ ... snipping a bit of context here ... ] > >> +void cake_update_ct_mark(struct sk_buff *skb, u8 dscp) >> +{ >> +enum ip_conntrack_info ctinfo; >> +struct nf_conn *ct; >> + >> +ct = nf_ct_get(skb, ); >> +if (!ct) >> +return; >> + >> +ct->mark &= 0x80ff; >> +ct->mark |= (0x40 | dscp) << 24; > > Right, so we *might* have an argument that putting the *tin* into the > fwmark is CAKE's business, but copying over the dscp mark is not > something a qdisc should be doing… Why ever not? It’s not the DSCP, it’s a lookup value into the cake priority table, it just happens to look like the DSCP ;-) >>> >>> If it quacks like a duck… >> >> I honestly don’t know where to go from here. I’m clearly trying to do >> something that the kernel doesn’t want to do. > > I'm not disputing that what you're trying to do (moving DSCP field into > connmark) is useful. I'm just questioning whether CAKE is the right > place to do this. I think it would fit better in a TC action; either > modify act_connmark, or create a new action to sync fwmarks and DSCP > marks… Interesting. Thinks out loud - Two actions - ‘act_storedscp’, ‘act_restoredscp' As I said earlier I couldn't work out how m_conntrack did… anything at all to be honest! > > This would both sidestep the whole conntrack dependency issue, and make > the same functionality available outside of CAKE (for an HTB-based > setup, for instance). > >> v2 addressing some of the comments attached. Is it best to keep the >> in progress patches here or should they be github PRs ? > > Patches on the mailing list is fine by me, and it seems there are people > reading the list, but not github, so let's keep it here for now at > least. However, I'll hold off on more detailed comments on the patch > until we've resolved the point above. With one exception: There’s certainly been some response here that’s for sure :-) > >> @@ -1661,13 +1695,14 @@ static struct cake_tin_data *cake_select_tin(struct >> Qdisc *sch, >> tin = 0; >> >> else if (q->rate_flags & CAKE_FLAG_FWMARK && /* use fw mark */ >> - skb->mark && >> - skb->mark <= q->tin_cnt) >> -tin = q->tin_order[skb->mark - 1]; >> - >> -else if (TC_H_MAJ(skb->priority) == sch->handle && >> - TC_H_MIN(skb->priority) > 0 && >> - TC_H_MIN(skb->priority) <= q->tin_cnt) >> + skb->mark & 0x4000) { > > I think there's something odd with this mask? There's only one bit set > in it… I use the single bit as a flag to indicate cake has stored the DSCP in the lower 6 bits of the byte. Otherwise with a DSCP of 0 (the default) it’s rather difficult to know if a connection has been through the cake ’save dscp to fwmark’ process or not. That also indicates to user space whether it should consider mangling packets or not e.g. #if bit 6 0 then not been marked by cake - go & mangle the DSCP ready for cake to find & set #the mark. ipt -t mangle -A PREROUTING -i $IFACE -m mark --mark 0x00/0x4000 -g QOS_MARK_${IFACE} ipt -t mangle -A POSTROUTING -o $IFACE -m mark --mark 0x00/0x4000 -g QOS_MARK_${IFACE} Cheers, Kevin D-B 012C ACB2 28C6 C53E 9775 9123 B3A2 389B 9DE2 334A ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] Using firewall connmarks as tin selectors
Kevin Darbyshire-Bryant writes: [ ... snipping a bit of context here ... ] > +void cake_update_ct_mark(struct sk_buff *skb, u8 dscp) > +{ > + enum ip_conntrack_info ctinfo; > + struct nf_conn *ct; > + > + ct = nf_ct_get(skb, ); > + if (!ct) > + return; > + > + ct->mark &= 0x80ff; > + ct->mark |= (0x40 | dscp) << 24; Right, so we *might* have an argument that putting the *tin* into the fwmark is CAKE's business, but copying over the dscp mark is not something a qdisc should be doing… >>> >>> Why ever not? It’s not the DSCP, it’s a lookup value into the cake >>> priority table, it just happens to look like the DSCP ;-) >> >> If it quacks like a duck… > > I honestly don’t know where to go from here. I’m clearly trying to do > something that the kernel doesn’t want to do. I'm not disputing that what you're trying to do (moving DSCP field into connmark) is useful. I'm just questioning whether CAKE is the right place to do this. I think it would fit better in a TC action; either modify act_connmark, or create a new action to sync fwmarks and DSCP marks... This would both sidestep the whole conntrack dependency issue, and make the same functionality available outside of CAKE (for an HTB-based setup, for instance). > v2 addressing some of the comments attached. Is it best to keep the > in progress patches here or should they be github PRs ? Patches on the mailing list is fine by me, and it seems there are people reading the list, but not github, so let's keep it here for now at least. However, I'll hold off on more detailed comments on the patch until we've resolved the point above. With one exception: > @@ -1661,13 +1695,14 @@ static struct cake_tin_data *cake_select_tin(struct > Qdisc *sch, > tin = 0; > > else if (q->rate_flags & CAKE_FLAG_FWMARK && /* use fw mark */ > - skb->mark && > - skb->mark <= q->tin_cnt) > - tin = q->tin_order[skb->mark - 1]; > - > - else if (TC_H_MAJ(skb->priority) == sch->handle && > - TC_H_MIN(skb->priority) > 0 && > - TC_H_MIN(skb->priority) <= q->tin_cnt) > +skb->mark & 0x4000) { I think there's something odd with this mask? There's only one bit set in it... -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] Using firewall connmarks as tin selectors
> On 4 Mar 2019, at 12:44, Toke Høiland-Jørgensen wrote: > > Kevin Darbyshire-Bryant writes: > >>> On 4 Mar 2019, at 11:17, Toke Høiland-Jørgensen wrote: >>> >>> Kevin Darbyshire-Bryant writes: >>> > On 4 Mar 2019, at 08:39, Pete Heist wrote: > > >> On Mar 3, 2019, at 12:52 PM, Kevin Darbyshire-Bryant >> wrote: >> >> The very bad idea: >> >> And it’s bad ‘cos it’s sort of incompatible with the existing fwmark >> implementation as described above. So an awful lot of our >> shenanigans above is due to DSCP not traversing the internet >> particularly well. The solution above abstracts DSCP into ’tins’ >> which we put into fwmarks. Another approach would be to put the DSCP >> *into* the fwmark. CAKE could (optionally) copy the FWMARK contained >> DSCP into the diffserv field onto the actual packets. Voila DSCP >> traversal across ’tinternet with tin/bandwidth allocation in our >> local domain preserved. > > If I understand it right, another use case for this “very bad idea” > is preserving DSCP locally while traversing upstream WiFi links as > besteffort, which avoids airtime efficiency problems that can occur > with 802.11e (WMM). In cases where the router config can’t be changed > (802.11e is mandatory after all) I’ve used IPIP tunnels for this, as > it hides DSCP from the WiFi stack while preserving the values through > the tunnel, but this would be easier. Neat… :) Everyone has understood the intent & maybe the implementation correctly. 2 patches attached, one for cake, one for tc. They are naively coded and some of it undoes Toke’s recent tidying up (sorry!) >>> >>> Heh. First comment: Don't do that ;) >> >> I did say naively coded. >> >>> >>> A few more below. >>> 012C ACB2 28C6 C53E 9775 9123 B3A2 389B 9DE2 334A diff --git a/pkt_sched.h b/pkt_sched.h index a2f570c..d1f288d 100644 --- a/pkt_sched.h +++ b/pkt_sched.h @@ -879,6 +879,7 @@ enum { TCA_CAKE_ACK_FILTER, TCA_CAKE_SPLIT_GSO, TCA_CAKE_FWMARK, + TCA_CAKE_ICING, __TCA_CAKE_MAX }; #define TCA_CAKE_MAX (__TCA_CAKE_MAX - 1) diff --git a/sch_cake.c b/sch_cake.c index 733b897..5aca0f3 100644 --- a/sch_cake.c +++ b/sch_cake.c @@ -270,7 +270,8 @@ enum { CAKE_FLAG_INGRESS = BIT(2), CAKE_FLAG_WASH = BIT(3), CAKE_FLAG_SPLIT_GSO= BIT(4), - CAKE_FLAG_FWMARK = BIT(5) + CAKE_FLAG_FWMARK = BIT(5), + CAKE_FLAG_ICING= BIT(6) >>> >>> This implies that icing and fwmark can be enabled completely >>> independently of each other. Are you sure about the semantics for that? >> >> No, I’m not. I sent the patches so others could see my lunacy in action and >> hopefully improve it. >> >> The actual operation links FWMARK, INGRESS & ICING in a variety of >> combinations. > > Right, so obviously this needs to be thought through… Yes. Volunteers with thoughts & ideally code sought. > }; /* COBALT operates the Codel and BLUE algorithms in parallel, in order to @@ -333,7 +334,7 @@ static const u8 diffserv8[] = { }; static const u8 diffserv4[] = { - 0, 2, 0, 0, 2, 0, 0, 0, + 0, 1, 0, 0, 2, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 2, 0, 2, 0, 2, 0, 2, 0, 2, 0, 2, 0, 2, 0, 2, 0, @@ -344,7 +345,7 @@ static const u8 diffserv4[] = { }; static const u8 diffserv3[] = { - 0, 0, 0, 0, 2, 0, 0, 0, + 0, 1, 0, 0, 2, 0, 0, 0, >>> >>> Why are you messing with the diffserv mappings in this patch? >> >> This is a combination patch of Dave’s new LE coding and the >> fwmark/dscp mangling. > > Ah. Well let's keep that separate from this patch/discussion… Done > >>> 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, @@ -1618,7 +1619,24 @@ static unsigned int cake_drop(struct Qdisc *sch, struct sk_buff **to_free) return idx + (tin << 16); } -static u8 cake_handle_diffserv(struct sk_buff *skb, u16 wash) +void cake_update_diffserv(struct sk_buff *skb, u8 dscp) +{ + switch (skb->protocol) { + case htons(ETH_P_IP): + if ((ipv4_get_dsfield(ip_hdr(skb)) & ~INET_ECN_MASK) != dscp) + ipv4_change_dsfield(ip_hdr(skb), INET_ECN_MASK, dscp); + break; + case htons(ETH_P_IPV6): + if ((ipv6_get_dsfield(ipv6_hdr(skb)) & ~INET_ECN_MASK) != dscp) + ipv6_change_dsfield(ipv6_hdr(skb), INET_ECN_MASK, dscp); + break; + default: + break; + } + +} >>> >>> So washing is just a special case of this (wash is >>> cake_update_diffserv(skb,0)). So you shouldn't need to add another >>> function, just augment the
Re: [Cake] Using firewall connmarks as tin selectors
Kevin Darbyshire-Bryant writes: >> On 4 Mar 2019, at 11:17, Toke Høiland-Jørgensen wrote: >> >> Kevin Darbyshire-Bryant writes: >> On 4 Mar 2019, at 08:39, Pete Heist wrote: > On Mar 3, 2019, at 12:52 PM, Kevin Darbyshire-Bryant > wrote: > > The very bad idea: > > And it’s bad ‘cos it’s sort of incompatible with the existing fwmark > implementation as described above. So an awful lot of our > shenanigans above is due to DSCP not traversing the internet > particularly well. The solution above abstracts DSCP into ’tins’ > which we put into fwmarks. Another approach would be to put the DSCP > *into* the fwmark. CAKE could (optionally) copy the FWMARK contained > DSCP into the diffserv field onto the actual packets. Voila DSCP > traversal across ’tinternet with tin/bandwidth allocation in our > local domain preserved. If I understand it right, another use case for this “very bad idea” is preserving DSCP locally while traversing upstream WiFi links as besteffort, which avoids airtime efficiency problems that can occur with 802.11e (WMM). In cases where the router config can’t be changed (802.11e is mandatory after all) I’ve used IPIP tunnels for this, as it hides DSCP from the WiFi stack while preserving the values through the tunnel, but this would be easier. Neat… :) >>> >>> Everyone has understood the intent & maybe the implementation >>> correctly. 2 patches attached, one for cake, one for tc. >>> >>> They are naively coded and some of it undoes Toke’s recent tidying up >>> (sorry!) >> >> Heh. First comment: Don't do that ;) > > I did say naively coded. > >> >> A few more below. >> >>> 012C ACB2 28C6 C53E 9775 9123 B3A2 389B 9DE2 334A >>> diff --git a/pkt_sched.h b/pkt_sched.h >>> index a2f570c..d1f288d 100644 >>> --- a/pkt_sched.h >>> +++ b/pkt_sched.h >>> @@ -879,6 +879,7 @@ enum { >>> TCA_CAKE_ACK_FILTER, >>> TCA_CAKE_SPLIT_GSO, >>> TCA_CAKE_FWMARK, >>> + TCA_CAKE_ICING, >>> __TCA_CAKE_MAX >>> }; >>> #define TCA_CAKE_MAX(__TCA_CAKE_MAX - 1) >>> diff --git a/sch_cake.c b/sch_cake.c >>> index 733b897..5aca0f3 100644 >>> --- a/sch_cake.c >>> +++ b/sch_cake.c >>> @@ -270,7 +270,8 @@ enum { >>> CAKE_FLAG_INGRESS = BIT(2), >>> CAKE_FLAG_WASH = BIT(3), >>> CAKE_FLAG_SPLIT_GSO= BIT(4), >>> - CAKE_FLAG_FWMARK = BIT(5) >>> + CAKE_FLAG_FWMARK = BIT(5), >>> + CAKE_FLAG_ICING= BIT(6) >> >> This implies that icing and fwmark can be enabled completely >> independently of each other. Are you sure about the semantics for that? > > No, I’m not. I sent the patches so others could see my lunacy in action and > hopefully improve it. > > The actual operation links FWMARK, INGRESS & ICING in a variety of > combinations. Right, so obviously this needs to be thought through... >>> }; >>> >>> /* COBALT operates the Codel and BLUE algorithms in parallel, in order to >>> @@ -333,7 +334,7 @@ static const u8 diffserv8[] = { >>> }; >>> >>> static const u8 diffserv4[] = { >>> - 0, 2, 0, 0, 2, 0, 0, 0, >>> + 0, 1, 0, 0, 2, 0, 0, 0, >>> 1, 0, 0, 0, 0, 0, 0, 0, >>> 2, 0, 2, 0, 2, 0, 2, 0, >>> 2, 0, 2, 0, 2, 0, 2, 0, >>> @@ -344,7 +345,7 @@ static const u8 diffserv4[] = { >>> }; >>> >>> static const u8 diffserv3[] = { >>> - 0, 0, 0, 0, 2, 0, 0, 0, >>> + 0, 1, 0, 0, 2, 0, 0, 0, >> >> Why are you messing with the diffserv mappings in this patch? > > This is a combination patch of Dave’s new LE coding and the > fwmark/dscp mangling. Ah. Well let's keep that separate from this patch/discussion... >> >>> 1, 0, 0, 0, 0, 0, 0, 0, >>> 0, 0, 0, 0, 0, 0, 0, 0, >>> 0, 0, 0, 0, 0, 0, 0, 0, >>> @@ -1618,7 +1619,24 @@ static unsigned int cake_drop(struct Qdisc *sch, >>> struct sk_buff **to_free) >>> return idx + (tin << 16); >>> } >>> >>> -static u8 cake_handle_diffserv(struct sk_buff *skb, u16 wash) >>> +void cake_update_diffserv(struct sk_buff *skb, u8 dscp) >>> +{ >>> + switch (skb->protocol) { >>> + case htons(ETH_P_IP): >>> + if ((ipv4_get_dsfield(ip_hdr(skb)) & ~INET_ECN_MASK) != dscp) >>> + ipv4_change_dsfield(ip_hdr(skb), INET_ECN_MASK, dscp); >>> + break; >>> + case htons(ETH_P_IPV6): >>> + if ((ipv6_get_dsfield(ipv6_hdr(skb)) & ~INET_ECN_MASK) != dscp) >>> + ipv6_change_dsfield(ipv6_hdr(skb), INET_ECN_MASK, dscp); >>> + break; >>> + default: >>> + break; >>> + } >>> + >>> +} >> >> So washing is just a special case of this (wash is >> cake_update_diffserv(skb,0)). So you shouldn't need to add another >> function, just augment the existing handling code. > > Erm, that’s exactly what I’ve done. Ah, right; I guess it's just the reverting of the cleanup patch that is the issue, then :) >>> +static u8 cake_handle_diffserv(struct sk_buff *skb, bool wash) >>> { >>> u8
Re: [Cake] Using firewall connmarks as tin selectors
> On 4 Mar 2019, at 11:17, Toke Høiland-Jørgensen wrote: > > Kevin Darbyshire-Bryant writes: > >>> On 4 Mar 2019, at 08:39, Pete Heist wrote: >>> >>> On Mar 3, 2019, at 12:52 PM, Kevin Darbyshire-Bryant wrote: The very bad idea: And it’s bad ‘cos it’s sort of incompatible with the existing fwmark implementation as described above. So an awful lot of our shenanigans above is due to DSCP not traversing the internet particularly well. The solution above abstracts DSCP into ’tins’ which we put into fwmarks. Another approach would be to put the DSCP *into* the fwmark. CAKE could (optionally) copy the FWMARK contained DSCP into the diffserv field onto the actual packets. Voila DSCP traversal across ’tinternet with tin/bandwidth allocation in our local domain preserved. >>> >>> If I understand it right, another use case for this “very bad idea” >>> is preserving DSCP locally while traversing upstream WiFi links as >>> besteffort, which avoids airtime efficiency problems that can occur >>> with 802.11e (WMM). In cases where the router config can’t be changed >>> (802.11e is mandatory after all) I’ve used IPIP tunnels for this, as >>> it hides DSCP from the WiFi stack while preserving the values through >>> the tunnel, but this would be easier. Neat… :) >> >> Everyone has understood the intent & maybe the implementation >> correctly. 2 patches attached, one for cake, one for tc. >> >> They are naively coded and some of it undoes Toke’s recent tidying up >> (sorry!) > > Heh. First comment: Don't do that ;) I did say naively coded. > > A few more below. > >> 012C ACB2 28C6 C53E 9775 9123 B3A2 389B 9DE2 334A >> diff --git a/pkt_sched.h b/pkt_sched.h >> index a2f570c..d1f288d 100644 >> --- a/pkt_sched.h >> +++ b/pkt_sched.h >> @@ -879,6 +879,7 @@ enum { >> TCA_CAKE_ACK_FILTER, >> TCA_CAKE_SPLIT_GSO, >> TCA_CAKE_FWMARK, >> +TCA_CAKE_ICING, >> __TCA_CAKE_MAX >> }; >> #define TCA_CAKE_MAX (__TCA_CAKE_MAX - 1) >> diff --git a/sch_cake.c b/sch_cake.c >> index 733b897..5aca0f3 100644 >> --- a/sch_cake.c >> +++ b/sch_cake.c >> @@ -270,7 +270,8 @@ enum { >> CAKE_FLAG_INGRESS = BIT(2), >> CAKE_FLAG_WASH = BIT(3), >> CAKE_FLAG_SPLIT_GSO= BIT(4), >> -CAKE_FLAG_FWMARK = BIT(5) >> +CAKE_FLAG_FWMARK = BIT(5), >> +CAKE_FLAG_ICING= BIT(6) > > This implies that icing and fwmark can be enabled completely > independently of each other. Are you sure about the semantics for that? No, I’m not. I sent the patches so others could see my lunacy in action and hopefully improve it. The actual operation links FWMARK, INGRESS & ICING in a variety of combinations. >> }; >> >> /* COBALT operates the Codel and BLUE algorithms in parallel, in order to >> @@ -333,7 +334,7 @@ static const u8 diffserv8[] = { >> }; >> >> static const u8 diffserv4[] = { >> -0, 2, 0, 0, 2, 0, 0, 0, >> +0, 1, 0, 0, 2, 0, 0, 0, >> 1, 0, 0, 0, 0, 0, 0, 0, >> 2, 0, 2, 0, 2, 0, 2, 0, >> 2, 0, 2, 0, 2, 0, 2, 0, >> @@ -344,7 +345,7 @@ static const u8 diffserv4[] = { >> }; >> >> static const u8 diffserv3[] = { >> -0, 0, 0, 0, 2, 0, 0, 0, >> +0, 1, 0, 0, 2, 0, 0, 0, > > Why are you messing with the diffserv mappings in this patch? This is a combination patch of Dave’s new LE coding and the fwmark/dscp mangling. > >> 1, 0, 0, 0, 0, 0, 0, 0, >> 0, 0, 0, 0, 0, 0, 0, 0, >> 0, 0, 0, 0, 0, 0, 0, 0, >> @@ -1618,7 +1619,24 @@ static unsigned int cake_drop(struct Qdisc *sch, >> struct sk_buff **to_free) >> return idx + (tin << 16); >> } >> >> -static u8 cake_handle_diffserv(struct sk_buff *skb, u16 wash) >> +void cake_update_diffserv(struct sk_buff *skb, u8 dscp) >> +{ >> +switch (skb->protocol) { >> +case htons(ETH_P_IP): >> +if ((ipv4_get_dsfield(ip_hdr(skb)) & ~INET_ECN_MASK) != dscp) >> +ipv4_change_dsfield(ip_hdr(skb), INET_ECN_MASK, dscp); >> +break; >> +case htons(ETH_P_IPV6): >> +if ((ipv6_get_dsfield(ipv6_hdr(skb)) & ~INET_ECN_MASK) != dscp) >> +ipv6_change_dsfield(ipv6_hdr(skb), INET_ECN_MASK, dscp); >> +break; >> +default: >> +break; >> +} >> + >> +} > > So washing is just a special case of this (wash is > cake_update_diffserv(skb,0)). So you shouldn't need to add another > function, just augment the existing handling code. Erm, that’s exactly what I’ve done. > >> +static u8 cake_handle_diffserv(struct sk_buff *skb, bool wash) >> { >> u8 dscp; >> >> @@ -1644,37 +1662,70 @@ static u8 cake_handle_diffserv(struct sk_buff *skb, >> u16 wash) >> } >> } >> >> +#if IS_REACHABLE(CONFIG_NF_CONNTRACK) > > Save an ifdef below by moving the ifdef inside the function definition. > >> +void cake_update_ct_mark(struct sk_buff *skb, u8 dscp) >> +{ >> +enum ip_conntrack_info ctinfo; >> +
Re: [Cake] Using firewall connmarks as tin selectors
Let action connmark continue to do that. You still need mirred anyway. John On 4 March 2019 11:04:39 GMT, "Toke Høiland-Jørgensen" wrote: >Kevin Darbyshire-Bryant writes: > >>> On 3 Mar 2019, at 12:22, John Sager wrote: >>> >>> If you are going to do that, I would suggest using a few of the >upper bits >>> of the 32-bit fwmark/connmark space available, rather than the >lowest bits. >>> Then that would allow to use fwmarks other purposes, and to use the >lowest >>> bits, as well in the future. As iptables allows a mask before >comparison, >>> then choose a specific mask for the bits you use both for setting >and testing. >> >> That’s a good point and I’m sort of hoping upstream reject the >current >> submission on that basis. I think the ‘use of fwmarks’ needs more >> thought as to how it’s done for the future - too keen to get >something >> out. Maybe it’s sufficient as is, I don’t know. > >https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=0b5c7efdfc6e389ec6840579fe90bdb6f42b08dc > >This means it'll be in 5.1; so we have until that is released (~8 weeks >or so) to set the behaviour in stone. > >I do think we at least need to add masking of the mark before using it >for tin selection; the question is just which bits to use from it. > >As for setting the fwmark back in conntrack, I'm not sure I agree that >this is something CAKE should be doing. Mostly because it means even >tighter coupling between CAKE and the conntrack subsystem. However, I >may be convinced by a sufficiently neat implementation, and anyway this >is definitely something that will need to wait for 5.2 for upstream. > >So I think the priority is to agree on semantics for masking the fwmark >when reading, and getting that implemented in a way that is compatible >with both other uses of marks, and with anything we else we might want >to do down the road. > >-Toke -- Sent from the Aether.___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] Using firewall connmarks as tin selectors
Kevin Darbyshire-Bryant writes: >> On 4 Mar 2019, at 08:39, Pete Heist wrote: >> >> >>> On Mar 3, 2019, at 12:52 PM, Kevin Darbyshire-Bryant >>> wrote: >>> >>> The very bad idea: >>> >>> And it’s bad ‘cos it’s sort of incompatible with the existing fwmark >>> implementation as described above. So an awful lot of our >>> shenanigans above is due to DSCP not traversing the internet >>> particularly well. The solution above abstracts DSCP into ’tins’ >>> which we put into fwmarks. Another approach would be to put the DSCP >>> *into* the fwmark. CAKE could (optionally) copy the FWMARK contained >>> DSCP into the diffserv field onto the actual packets. Voila DSCP >>> traversal across ’tinternet with tin/bandwidth allocation in our >>> local domain preserved. >> >> If I understand it right, another use case for this “very bad idea” >> is preserving DSCP locally while traversing upstream WiFi links as >> besteffort, which avoids airtime efficiency problems that can occur >> with 802.11e (WMM). In cases where the router config can’t be changed >> (802.11e is mandatory after all) I’ve used IPIP tunnels for this, as >> it hides DSCP from the WiFi stack while preserving the values through >> the tunnel, but this would be easier. Neat… :) > > Everyone has understood the intent & maybe the implementation > correctly. 2 patches attached, one for cake, one for tc. > > They are naively coded and some of it undoes Toke’s recent tidying up > (sorry!) Heh. First comment: Don't do that ;) A few more below. > 012C ACB2 28C6 C53E 9775 9123 B3A2 389B 9DE2 334A > diff --git a/pkt_sched.h b/pkt_sched.h > index a2f570c..d1f288d 100644 > --- a/pkt_sched.h > +++ b/pkt_sched.h > @@ -879,6 +879,7 @@ enum { > TCA_CAKE_ACK_FILTER, > TCA_CAKE_SPLIT_GSO, > TCA_CAKE_FWMARK, > + TCA_CAKE_ICING, > __TCA_CAKE_MAX > }; > #define TCA_CAKE_MAX (__TCA_CAKE_MAX - 1) > diff --git a/sch_cake.c b/sch_cake.c > index 733b897..5aca0f3 100644 > --- a/sch_cake.c > +++ b/sch_cake.c > @@ -270,7 +270,8 @@ enum { > CAKE_FLAG_INGRESS = BIT(2), > CAKE_FLAG_WASH = BIT(3), > CAKE_FLAG_SPLIT_GSO= BIT(4), > - CAKE_FLAG_FWMARK = BIT(5) > + CAKE_FLAG_FWMARK = BIT(5), > + CAKE_FLAG_ICING= BIT(6) This implies that icing and fwmark can be enabled completely independently of each other. Are you sure about the semantics for that? > }; > > /* COBALT operates the Codel and BLUE algorithms in parallel, in order to > @@ -333,7 +334,7 @@ static const u8 diffserv8[] = { > }; > > static const u8 diffserv4[] = { > - 0, 2, 0, 0, 2, 0, 0, 0, > + 0, 1, 0, 0, 2, 0, 0, 0, > 1, 0, 0, 0, 0, 0, 0, 0, > 2, 0, 2, 0, 2, 0, 2, 0, > 2, 0, 2, 0, 2, 0, 2, 0, > @@ -344,7 +345,7 @@ static const u8 diffserv4[] = { > }; > > static const u8 diffserv3[] = { > - 0, 0, 0, 0, 2, 0, 0, 0, > + 0, 1, 0, 0, 2, 0, 0, 0, Why are you messing with the diffserv mappings in this patch? > 1, 0, 0, 0, 0, 0, 0, 0, > 0, 0, 0, 0, 0, 0, 0, 0, > 0, 0, 0, 0, 0, 0, 0, 0, > @@ -1618,7 +1619,24 @@ static unsigned int cake_drop(struct Qdisc *sch, > struct sk_buff **to_free) > return idx + (tin << 16); > } > > -static u8 cake_handle_diffserv(struct sk_buff *skb, u16 wash) > +void cake_update_diffserv(struct sk_buff *skb, u8 dscp) > +{ > + switch (skb->protocol) { > + case htons(ETH_P_IP): > + if ((ipv4_get_dsfield(ip_hdr(skb)) & ~INET_ECN_MASK) != dscp) > + ipv4_change_dsfield(ip_hdr(skb), INET_ECN_MASK, dscp); > + break; > + case htons(ETH_P_IPV6): > + if ((ipv6_get_dsfield(ipv6_hdr(skb)) & ~INET_ECN_MASK) != dscp) > + ipv6_change_dsfield(ipv6_hdr(skb), INET_ECN_MASK, dscp); > + break; > + default: > + break; > + } > + > +} So washing is just a special case of this (wash is cake_update_diffserv(skb,0)). So you shouldn't need to add another function, just augment the existing handling code. > +static u8 cake_handle_diffserv(struct sk_buff *skb, bool wash) > { > u8 dscp; > > @@ -1644,37 +1662,70 @@ static u8 cake_handle_diffserv(struct sk_buff *skb, > u16 wash) > } > } > > +#if IS_REACHABLE(CONFIG_NF_CONNTRACK) Save an ifdef below by moving the ifdef inside the function definition. > +void cake_update_ct_mark(struct sk_buff *skb, u8 dscp) > +{ > + enum ip_conntrack_info ctinfo; > + struct nf_conn *ct; > + > + ct = nf_ct_get(skb, ); > + if (!ct) > + return; > + > + ct->mark &= 0x80ff; > + ct->mark |= (0x40 | dscp) << 24; Right, so we *might* have an argument that putting the *tin* into the fwmark is CAKE's business, but copying over the dscp mark is not something a qdisc should be doing... > + nf_conntrack_event_cache(IPCT_MARK, ct); > +} > +#endif Also, are you sure this will work in all permutations of conntrack being a module vs
Re: [Cake] Using firewall connmarks as tin selectors
Kevin Darbyshire-Bryant writes: >> On 3 Mar 2019, at 12:22, John Sager wrote: >> >> If you are going to do that, I would suggest using a few of the upper bits >> of the 32-bit fwmark/connmark space available, rather than the lowest bits. >> Then that would allow to use fwmarks other purposes, and to use the lowest >> bits, as well in the future. As iptables allows a mask before comparison, >> then choose a specific mask for the bits you use both for setting and >> testing. > > That’s a good point and I’m sort of hoping upstream reject the current > submission on that basis. I think the ‘use of fwmarks’ needs more > thought as to how it’s done for the future - too keen to get something > out. Maybe it’s sufficient as is, I don’t know. https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=0b5c7efdfc6e389ec6840579fe90bdb6f42b08dc This means it'll be in 5.1; so we have until that is released (~8 weeks or so) to set the behaviour in stone. I do think we at least need to add masking of the mark before using it for tin selection; the question is just which bits to use from it. As for setting the fwmark back in conntrack, I'm not sure I agree that this is something CAKE should be doing. Mostly because it means even tighter coupling between CAKE and the conntrack subsystem. However, I may be convinced by a sufficiently neat implementation, and anyway this is definitely something that will need to wait for 5.2 for upstream. So I think the priority is to agree on semantics for masking the fwmark when reading, and getting that implemented in a way that is compatible with both other uses of marks, and with anything we else we might want to do down the road. -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] Using firewall connmarks as tin selectors
> On 4 Mar 2019, at 08:39, Pete Heist wrote: > > >> On Mar 3, 2019, at 12:52 PM, Kevin Darbyshire-Bryant >> wrote: >> >> The very bad idea: >> >> And it’s bad ‘cos it’s sort of incompatible with the existing fwmark >> implementation as described above. So an awful lot of our shenanigans above >> is due to DSCP not traversing the internet particularly well. The solution >> above abstracts DSCP into ’tins’ which we put into fwmarks. Another >> approach would be to put the DSCP *into* the fwmark. CAKE could >> (optionally) copy the FWMARK contained DSCP into the diffserv field onto the >> actual packets. Voila DSCP traversal across ’tinternet with tin/bandwidth >> allocation in our local domain preserved. > > If I understand it right, another use case for this “very bad idea” is > preserving DSCP locally while traversing upstream WiFi links as besteffort, > which avoids airtime efficiency problems that can occur with 802.11e (WMM). > In cases where the router config can’t be changed (802.11e is mandatory after > all) I’ve used IPIP tunnels for this, as it hides DSCP from the WiFi stack > while preserving the values through the tunnel, but this would be easier. > Neat… :) Everyone has understood the intent & maybe the implementation correctly. 2 patches attached, one for cake, one for tc. They are naively coded and some of it undoes Toke’s recent tidying up (sorry!) The ingress path is dependent upon tc actions having restored the fwmark. I cannot for the life of me work out how that actually happens (ie. the code that does it) to see if that’s all it does, or if cake could do it itself and so on. Anyway, the code warts and all…. Cheers, Kevin D-B 012C ACB2 28C6 C53E 9775 9123 B3A2 389B 9DE2 334A cake-add-dscp-copy-and-icing.patch Description: cake-add-dscp-copy-and-icing.patch my_layer_cake.qos Description: my_layer_cake.qos tc-cake-add-fwmark-icing-options.patch Description: tc-cake-add-fwmark-icing-options.patch ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] Using firewall connmarks as tin selectors
> On Mar 3, 2019, at 12:52 PM, Kevin Darbyshire-Bryant > wrote: > > The very bad idea: > > And it’s bad ‘cos it’s sort of incompatible with the existing fwmark > implementation as described above. So an awful lot of our shenanigans above > is due to DSCP not traversing the internet particularly well. The solution > above abstracts DSCP into ’tins’ which we put into fwmarks. Another approach > would be to put the DSCP *into* the fwmark. CAKE could (optionally) copy the > FWMARK contained DSCP into the diffserv field onto the actual packets. Voila > DSCP traversal across ’tinternet with tin/bandwidth allocation in our local > domain preserved. If I understand it right, another use case for this “very bad idea” is preserving DSCP locally while traversing upstream WiFi links as besteffort, which avoids airtime efficiency problems that can occur with 802.11e (WMM). In cases where the router config can’t be changed (802.11e is mandatory after all) I’ve used IPIP tunnels for this, as it hides DSCP from the WiFi stack while preserving the values through the tunnel, but this would be easier. Neat… :) ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] Using firewall connmarks as tin selectors
On Sun, Mar 3, 2019 at 10:37 PM Ryan Mounce wrote: > > On Mon, 4 Mar 2019 at 17:01, Jonathan Morton wrote: > > …icing? > > Perfect! And to me, this functionality truly is the icing on (the) > cake that makes it the complete bufferbloat/QoS system I've been > dreaming of for ingress. > ___ > Cake mailing list > Cake@lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/cake Goforit. And while you are at it please patch in support for the new LE bit and see if that works better than CS1? It's in IESG now, it should have a number in 2-3 months. https://datatracker.ietf.org/doc/draft-ietf-tsvwg-le-phb/ -- Dave Täht CTO, TekLibre, LLC http://www.teklibre.com Tel: 1-831-205-9740 ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] Using firewall connmarks as tin selectors
On Mon, 4 Mar 2019 at 17:01, Jonathan Morton wrote: > …icing? Perfect! And to me, this functionality truly is the icing on (the) cake that makes it the complete bufferbloat/QoS system I've been dreaming of for ingress. ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] Using firewall connmarks as tin selectors
> On 4 Mar, 2019, at 7:37 am, Ryan Mounce wrote: > > The overwriting of the DSCP bits from fwmark could be called > 'staining', as opposed to DSCP 'bleaching'. But this doesn't sound > very appealing when we're baking delicious CAKE - we're in the kitchen > not the laundry! So how about dyeing, or colouring? …icing? https://www.youtube.com/watch?v=ouRcDRpsRsA - Jonathan Morton ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] Using firewall connmarks as tin selectors
On Sun, 3 Mar 2019 at 22:22, Kevin Darbyshire-Bryant wrote: > > Be afraid, be very afraid. > > I’ve woken up with two ideas in my head, one is bad, the other is very bad. > The bad one is already implemented and lurking in the mine branch of my cake > git tree: > > The bad idea: > > An extension of the ‘fwmark’ tin allocation idea is to get cake to > automagically update the conntrack mark based on the DSCP tin allocation > chosen on egress. That way, well behaved applications using DSCP (e.g. > dropbear) get their return path packets similarly classified on ingress. > Badly behaved applications can have iptables rules put in place to ‘manually’ > add fwmarks as is already done. > > > The very bad idea: > > And it’s bad ‘cos it’s sort of incompatible with the existing fwmark > implementation as described above. So an awful lot of our shenanigans above > is due to DSCP not traversing the internet particularly well. The solution > above abstracts DSCP into ’tins’ which we put into fwmarks. Another approach > would be to put the DSCP *into* the fwmark. CAKE could (optionally) copy the > FWMARK contained DSCP into the diffserv field onto the actual packets. Voila > DSCP traversal across ’tinternet with tin/bandwidth allocation in our local > domain preserved. Ooh now you've got my attention - equal parts horrifying and brilliant. Just making sure I have my head around this, I think it would be perfect for my use case. My ISP requires traffic to be washed on egress, and they wash it on ingress. I set the DSCP in some applications and mangle some others with iptables on the router. Either way, DSCP is present by the time packets reach cake on egress, which dutifully uses DSCP to sort into tins before washing. This works great, albeit with each packet for certain flows being inefficiently mangled. So the first change is that cake will copy the DSCP bits of a packet into the fwmark on egress - if the fwmark hasn't already been set by cake. This allows the impact of my iptables mangle rules to be minimised as only packets without a cake-populated fwmark need their DSCP to be mangled. Cake then uses the fwmark to sort packets into tins, and in my case the DSCP is washed on egress. Then on ingress, cake uses the same fwmark to sort packets into tins. And cake can copy the DSCP bits back from the fwmark to packets on ingress. Magic! The overwriting of the DSCP bits from fwmark could be called 'staining', as opposed to DSCP 'bleaching'. But this doesn't sound very appealing when we're baking delicious CAKE - we're in the kitchen not the laundry! So how about dyeing, or colouring? -Ryan ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] Using firewall connmarks as tin selectors
> On 3 Mar 2019, at 12:22, John Sager wrote: > > If you are going to do that, I would suggest using a few of the upper bits > of the 32-bit fwmark/connmark space available, rather than the lowest bits. > Then that would allow to use fwmarks other purposes, and to use the lowest > bits, as well in the future. As iptables allows a mask before comparison, > then choose a specific mask for the bits you use both for setting and testing. That’s a good point and I’m sort of hoping upstream reject the current submission on that basis. I think the ‘use of fwmarks’ needs more thought as to how it’s done for the future - too keen to get something out. Maybe it’s sufficient as is, I don’t know. What I do know is that after implementing the ‘bad idea’ I subsequently implemented the very bad idea! Using the top byte of mark, bits 5-0 are the DSCP, bit 6 is the ‘Cake set this’ flag, and bit 7 is left alone as it tends to get mis-interpreted as a sign bit. I implemented the iptables marking chains as ‘check if cake fwmark bit set, go to a DSCP mangle chain if not’, mangle the DSCP, picked up by CAKE which (in egress mode) copies the DSCP to the fwmark’ and set the ‘cake did this’ bit. In theory connections only go through the dscp mangle once. Cake in ingress mode will copy the fwmark if it’s been set by cake back into the diffserv field on each packet. I think that last step should be made optional but I’ve had enough ‘fun’ for the day. https://github.com/ldir-EDB0/sch_cake/commits/mine Cheers, Kevin D-B 012C ACB2 28C6 C53E 9775 9123 B3A2 389B 9DE2 334A ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] Using firewall connmarks as tin selectors
Kevin Darbyshire-Bryant writes: >> On 28 Feb 2019, at 09:54, Toke Høiland-Jørgensen wrote: >> >>> I also equally aware that this is ‘creeping featuritis’ and doing >>> nothing to speed cake up… >> >> Yeah, this is the crux of the issue, really: it's a tradeoff between >> ease of use and featuritis. Now in this case the actual impact is a >> single check it might actually be acceptable >> >>> actually I may have improved BESTEFFORT a little - we no longer look >>> for matching TC Major numbers if there’s no actual choice of tin to be >>> made :-) >> >> Well, you made the besteffort case slightly faster, but every other mode >> slightly slower... :) > > Ah, ok, tc filter really is king ;-) Well, I don't mind doing the simplification, but it should be done as a separate commit; I'll do that after mering your pull request. >> If you are going to send a patch (or pull request), please leave out the >> refactoring, and only include the feature. This makes it easier to see >> the impact of the feature addition on its own. > > Ha ha, sending a patch to the kernel lists??? Never again! Haha, no, I meant to this list. I'll submit upstream, along with the other stuff. > The non-refactored version is > https://github.com/ldir-EDB0/sch_cake/commit/f30bb18adc97c827c81c9a3297ab14bfaf2adcb0 > and I’ve created a PR Great, thanks! >> Also, I assume you have a companion patch for iproute2 somewhere? > > In the first email: https://github.com/ldir-EDB0/tc-adv/commits/fwmark > - based on 4.20.0 ‘cos that’s where openwrt is at the moment. A pull request against tc-adv would be useful :) > When I’ve found a suitable editor I was even going to update the man > page! I usually just use a regular editor and copy-paste the syntax bits from another entry... -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] Using firewall connmarks as tin selectors
> On 28 Feb 2019, at 09:54, Toke Høiland-Jørgensen wrote: > >> I also equally aware that this is ‘creeping featuritis’ and doing >> nothing to speed cake up… > > Yeah, this is the crux of the issue, really: it's a tradeoff between > ease of use and featuritis. Now in this case the actual impact is a > single check it might actually be acceptable > >> actually I may have improved BESTEFFORT a little - we no longer look >> for matching TC Major numbers if there’s no actual choice of tin to be >> made :-) > > Well, you made the besteffort case slightly faster, but every other mode > slightly slower... :) Ah, ok, tc filter really is king ;-) > > If you are going to send a patch (or pull request), please leave out the > refactoring, and only include the feature. This makes it easier to see > the impact of the feature addition on its own. Ha ha, sending a patch to the kernel lists??? Never again! The non-refactored version is https://github.com/ldir-EDB0/sch_cake/commit/f30bb18adc97c827c81c9a3297ab14bfaf2adcb0 and I’ve created a PR > > Also, I assume you have a companion patch for iproute2 somewhere? In the first email: https://github.com/ldir-EDB0/tc-adv/commits/fwmark - based on 4.20.0 ‘cos that’s where openwrt is at the moment. When I’ve found a suitable editor I was even going to update the man page! > > -Toke Cheers, Kevin D-B 012C ACB2 28C6 C53E 9775 9123 B3A2 389B 9DE2 334A ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] Using firewall connmarks as tin selectors
> I also equally aware that this is ‘creeping featuritis’ and doing > nothing to speed cake up… Yeah, this is the crux of the issue, really: it's a tradeoff between ease of use and featuritis. Now in this case the actual impact is a single check it might actually be acceptable > actually I may have improved BESTEFFORT a little - we no longer look > for matching TC Major numbers if there’s no actual choice of tin to be > made :-) Well, you made the besteffort case slightly faster, but every other mode slightly slower... :) If you are going to send a patch (or pull request), please leave out the refactoring, and only include the feature. This makes it easier to see the impact of the feature addition on its own. Also, I assume you have a companion patch for iproute2 somewhere? -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] Using firewall connmarks as tin selectors
> On 27 Feb 2019, at 15:14, Toke Høiland-Jørgensen wrote: > > Kevin Darbyshire-Bryant writes: > >> How unpopular would the idea of having cake look at skb->mark directly be? >> >> https://github.com/ldir-EDB0/sch_cake/commit/64d0e6ac9368a271221db888ab91a367fcd37ae1 >> >> https://github.com/ldir-EDB0/tc-adv/commit/4f16ae5d588d44f8a5c83fe2f2b7dcad97843cbc > Hiya Toke, OK, so it’s not an instant no :-) > Hmm, not impossible, but seeing as we already have a way to achieve that > with BPF, is it really needed? Well, from a command line usability I’d say a ’no/fwmark’ option is a lot easier to grasp/configure than invoking anything BPF related. Similarly, a suitable BPF program requires building and based on your below comments, with hard coded constants e.g. major number. The ‘entry barrier’ is high, I have to write the BPF program, compile it (eBPF ‘toolchain'), maintain it, install it and there are certain (lack of) features that make it clunky to use even after you’ve done all that. > >> I did the equivalent in eBPF here >> https://github.com/ldir-EDB0/cls_bpf_connmark_to_caketin but I can’t >> work out how to make the major number a tc command line argument into >> the BPF code. > > You can't, but you can set the major number explicitly when you create > the qdisc: Indeed the script I’m using does exactly that, but it means I need a ‘hard coded’ BPF program (or section at least) per cake instance. Tail wagging the dog springs to mind. I re-worked the code (again) anyway, (last commit https://github.com/ldir-EDB0/sch_cake/commits/mine ) The driver for doing any of this is primarily related to wan ingress classification. DSCP can’t be trusted and can’t be manipulated (save for eBPF) Neither iptables rules nor conntrack NAT lookup will have occurred so an eBPF program using internal addresses is challenging. i *can* persuade tc to restore the connmark on ingress, so I can write old style, complicated, slow iptables rules to apply a connmark once on egress and have that connmark used for both egress & ingress aspects of that connection. I also equally aware that this is ‘creeping featuritis’ and doing nothing to speed cake up…actually I may have improved BESTEFFORT a little - we no longer look for matching TC Major numbers if there’s no actual choice of tin to be made :-) Cheers, Kevin D-B 012C ACB2 28C6 C53E 9775 9123 B3A2 389B 9DE2 334A ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] Using firewall connmarks as tin selectors
I think it's much simpler to use than tc-filter, BPF or even DSCP bits. Manipulating DSCP bits seems the simplest of the currently available mechanisms to classify traffic. Even in this case, fwmarks are essentially simpler. E.g. if you want to classify outgoing traffic on the LAN interface: with DSCP you need to manipulate DSCP bits on incoming packets on the WAN interface. with fwmark you can directly mark outgoing packets on the LAN interface and cake will classify them appropriately. ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] Using firewall connmarks as tin selectors
> How unpopular would the idea of having cake look at skb->mark directly be? loving it ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] Using firewall connmarks as tin selectors
Kevin Darbyshire-Bryant writes: > How unpopular would the idea of having cake look at skb->mark directly be? > > https://github.com/ldir-EDB0/sch_cake/commit/64d0e6ac9368a271221db888ab91a367fcd37ae1 > > https://github.com/ldir-EDB0/tc-adv/commit/4f16ae5d588d44f8a5c83fe2f2b7dcad97843cbc Hmm, not impossible, but seeing as we already have a way to achieve that with BPF, is it really needed? > I did the equivalent in eBPF here > https://github.com/ldir-EDB0/cls_bpf_connmark_to_caketin but I can’t > work out how to make the major number a tc command line argument into > the BPF code. You can't, but you can set the major number explicitly when you create the qdisc: $ sudo tc qdisc replace dev eth0 root handle 1: cake $ tc qdisc show dev eth0 qdisc cake 1: root refcnt 2 bandwidth unlimited diffserv3 triple-isolate nonat nowash no-ack-filter split-gso rtt 100.0ms raw overhead 0 -Toke ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake