Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
On 09/27/2016 04:18 PM, Shmulik Ladkani wrote: On Tue, 27 Sep 2016 09:44:41 -0400 (EDT), da...@davemloft.net wrote: From: Daniel Borkmann Date: Tue, 27 Sep 2016 12:39:34 +0200 Any reason why dev_forward_skb() is not preferred over direct netif_receive_skb() you're using? It would, for example, implicitly assure that pkt_type is always PACKET_HOST, etc. dev_forward_skb() will pull the ethernet header. And since a direct call to netif_receive_skb() will not, one of these two choices won't work properly. In the patch, I'm issuing a skb_pull_rcsum() prior the netif_receive_skb, snip: [...] Existing *egress* mir/red already supported pairing two non-eth devices. Therefore I allow it for the new *ingress* mir/red as well. [...] Yeah, makes sense then. Should skb->pkt_type become an issue, you might probably just use act_skbedit for such cases.
Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
On Tue, 27 Sep 2016 09:44:41 -0400 (EDT), da...@davemloft.net wrote: > From: Daniel Borkmann > Date: Tue, 27 Sep 2016 12:39:34 +0200 > > > Any reason why dev_forward_skb() is not preferred over direct > > netif_receive_skb() you're using? It would, for example, implicitly > > assure that pkt_type is always PACKET_HOST, etc. > > dev_forward_skb() will pull the ethernet header. > > And since a direct call to netif_receive_skb() will not, one of these > two choices won't work properly. In the patch, I'm issuing a skb_pull_rcsum() prior the netif_receive_skb, snip: + /* If action's target direction differs than filter's direction, +* and devices expect a mac header on xmit, then mac push/pull is +* needed. +*/ + if (at != tcf_mirred_act_direction(m->tcfm_eaction) && + m->tcfm_mac_header_xmit) { + if (at & AT_EGRESS) { + /* caught at egress, act ingress: pull mac */ + mac_len = skb_network_header(skb) - skb_mac_header(skb); + skb_pull_rcsum(skb2, mac_len); Existing *egress* mir/red already supported pairing two non-eth devices. Therefore I allow it for the new *ingress* mir/red as well. Now, following this premise, the skb_pull_rcsum() shown above is executed for devices whose xmit is mac_header based. I've tested both ARPHRD_ETHER devices and some non ARPHRD_ETHER devices. (an *existing* symmetric skb_push_rcsum() is invoked if packet is caught at ingress and redirected for egress on a mac_header xmit device) This was the reason for picking netif_receive_skb() over dev_forward_skb(). Regards, Shmulik
Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
On 16-09-27 04:07 AM, Shmulik Ladkani wrote: Hi David, On Tue, 27 Sep 2016 01:56:06 -0400 (EDT), da...@davemloft.net wrote: The discussion on this patch has ventured off into what to do about recursion. But it unclear to me where this specific patch, and this series, stands right now. Someone please clear this up for me. Status: - Series adds "ingress redirect/mirror" support - Positive feedback for the feature - So far no comments regarding code itself - Questions raised regarding "recursion handling" > Expressed that existing mirred code (i.e egress redirect) is *already* loop-unsafe (and also, some non-tc netdev constructs, as exampled by others). Discussion then wandered to "recursion handling". not totaly bike-shed discussion; legit issues are being raised (and the egress issue you point out is fixable now that we are paying attention to it). We need to take care of loops. I pointed to how the original thought process was. I _dont_ see this as resolvable via recursion handling since this is per-skb and not per entry point. You can add my Acked-by if you promise to take care of this issue next. cheers, jamal PS:- the code looks straight forward
Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
From: Daniel Borkmann Date: Tue, 27 Sep 2016 12:39:34 +0200 > Any reason why dev_forward_skb() is not preferred over direct > netif_receive_skb() you're using? It would, for example, implicitly > assure that pkt_type is always PACKET_HOST, etc. dev_forward_skb() will pull the ethernet header. And since a direct call to netif_receive_skb() will not, one of these two choices won't work properly.
Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
On 09/27/2016 10:07 AM, Shmulik Ladkani wrote: Hi David, On Tue, 27 Sep 2016 01:56:06 -0400 (EDT), da...@davemloft.net wrote: The discussion on this patch has ventured off into what to do about recursion. But it unclear to me where this specific patch, and this series, stands right now. Someone please clear this up for me. Status: - Series adds "ingress redirect/mirror" support - Positive feedback for the feature - So far no comments regarding code itself - Questions raised regarding "recursion handling" Expressed that existing mirred code (i.e egress redirect) is *already* loop-unsafe (and also, some non-tc netdev constructs, as exampled by others). Discussion then wandered to "recursion handling". Any reason why dev_forward_skb() is not preferred over direct netif_receive_skb() you're using? It would, for example, implicitly assure that pkt_type is always PACKET_HOST, etc. Thanks, Daniel
Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
Hi David, On Tue, 27 Sep 2016 01:56:06 -0400 (EDT), da...@davemloft.net wrote: > The discussion on this patch has ventured off into what to do about > recursion. > > But it unclear to me where this specific patch, and this series, > stands right now. Someone please clear this up for me. Status: - Series adds "ingress redirect/mirror" support - Positive feedback for the feature - So far no comments regarding code itself - Questions raised regarding "recursion handling" Expressed that existing mirred code (i.e egress redirect) is *already* loop-unsafe (and also, some non-tc netdev constructs, as exampled by others). Discussion then wandered to "recursion handling". Regards, Shmulik
Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
From: Shmulik Ladkani Date: Thu, 22 Sep 2016 16:21:52 +0300 > From: Shmulik Ladkani > > Up until now, 'action mirred' supported only egress actions (either > TCA_EGRESS_REDIR or TCA_EGRESS_MIRROR). > > This patch implements the corresponding ingress actions > TCA_INGRESS_REDIR and TCA_INGRESS_MIRROR. > > This allows attaching filters whose target is to hand matching skbs into > the rx processing of a specified device. > > Signed-off-by: Shmulik Ladkani > Cc: Jamal Hadi Salim > --- > Was wondering, whether netif_receive_skb or dev_forward_skb should be > used for the rx bouncing. Used netif_receive_skb as in ifb device. The discussion on this patch has ventured off into what to do about recursion. But it unclear to me where this specific patch, and this series, stands right now. Someone please clear this up for me. Thanks.
Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
On 09/26/2016 05:12 PM, Hannes Frederic Sowa wrote: [...] I would just care that we sometimes reschedule and don't do everything in one stack so we don't corrupt the machine and an admin has still a chance to solve the problem. Sounds reasonable to me, and which is what dev_forward_skb() is doing implicitly as well.
Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
Hi, On Mon, 26 Sep 2016 16:43:16 +0200 Hannes Frederic Sowa wrote: > On 26.09.2016 03:35, Florian Westphal wrote: > > > > Yes, but I think we get same issue when we deal with stacked > > interfaces, and redirect is to e.g. vlan on top of physical device. > > We do have the adjacent upper lists in all netdevices, calculating if a > mirred actions would insert the skb on a stacked device above us should > be as easy as querying netdev_has_upper_dev and should be possible to > check that during config time. This does not seem the right direction: - We should NOT ban egress redirect to an "upper device", this limits flexibility available today: One may validly redirect to an "upper device" according to a very specific criteria such that the skb will NOT be caught again by the filter when re-iterated down the stack. - And even if we did, at "config time" as you say, one can always stack the target device ontop the filtered device at a LATER time. And obviously, testing "is upper" while executing action is a bad idea. - Moreover, as Daniel noted, this is just a band-aid which limitely addresses only few of the already existing troubles with stacked virtual netdevices (with or without act_mirred). Frankly, I think the loop-detection suggestions overload act_mirred with AI that might imply limitations not well assessed (even on existing usecases). If needed, I'd go with the "recursion detection" suggested by Daniel, which is simple and aids with the issues of "egress" redirect that already exist today. Moreover, as the patch series is all about "ingress redirect" support, the "recursion detection" deserves a series of its own, we shouldn't bundle these two distinct features. Regards, Shmulik
Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
On 26.09.2016 16:53, Daniel Borkmann wrote: > On 09/26/2016 04:43 PM, Hannes Frederic Sowa wrote: >> On 26.09.2016 03:35, Florian Westphal wrote: >>> Jamal Hadi Salim wrote: On 16-09-25 02:31 PM, Florian Westphal wrote: > Shmulik Ladkani wrote: >> We can later address any loop-detection improvements in mirred. >> WDYT? > > You can address this after fixing infamous spinlock recursion hard > lockup (which has existed forever): > > tc qdisc add dev eth0 root handle 1: prio > tc filter add dev eth0 parent 1: protocol ip u32 match u32 0 0 flowid > 1:2 action mirred egress redirect dev eth0 > > (only do this on toy vm) Realize didnt respond to this. Seems very simple to fix: if skb->dev->ifindex and m->tcfm_dev->ifindex are the same, then you can drop the packet. >>> >>> Yes, but I think we get same issue when we deal with stacked >>> interfaces, and redirect is to e.g. vlan on top of physical device. >> >> We do have the adjacent upper lists in all netdevices, calculating if a >> mirred actions would insert the skb on a stacked device above us should >> be as easy as querying netdev_has_upper_dev and should be possible to >> check that during config time. > > But that would still not be enough, no? In the sense that with above > scenario, you could redirect to some arbitrary device that redirects > this back to the original device if on purpose configured as such, > thus they don't necessarily need to have a stacked relationship. Yes, it would only help with the scenario Florian described above. Personally, I would only try to fix and warn against the easy to detect cases. It is easy enough to just create a loop with your local attached L2 which brings your box into a endless loop processing the same packet again and again. Because it is out of control of the kernel you cannot do anything at all. I would just care that we sometimes reschedule and don't do everything in one stack so we don't corrupt the machine and an admin has still a chance to solve the problem. Bye, Hannes
Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
On 09/26/2016 04:43 PM, Hannes Frederic Sowa wrote: On 26.09.2016 03:35, Florian Westphal wrote: Jamal Hadi Salim wrote: On 16-09-25 02:31 PM, Florian Westphal wrote: Shmulik Ladkani wrote: We can later address any loop-detection improvements in mirred. WDYT? You can address this after fixing infamous spinlock recursion hard lockup (which has existed forever): tc qdisc add dev eth0 root handle 1: prio tc filter add dev eth0 parent 1: protocol ip u32 match u32 0 0 flowid 1:2 action mirred egress redirect dev eth0 (only do this on toy vm) Realize didnt respond to this. Seems very simple to fix: if skb->dev->ifindex and m->tcfm_dev->ifindex are the same, then you can drop the packet. Yes, but I think we get same issue when we deal with stacked interfaces, and redirect is to e.g. vlan on top of physical device. We do have the adjacent upper lists in all netdevices, calculating if a mirred actions would insert the skb on a stacked device above us should be as easy as querying netdev_has_upper_dev and should be possible to check that during config time. But that would still not be enough, no? In the sense that with above scenario, you could redirect to some arbitrary device that redirects this back to the original device if on purpose configured as such, thus they don't necessarily need to have a stacked relationship. And we have such loops even without tc, for instance when placing both veth ends in same bridge :-( We can't fix that without a ttl in the sk_buff struct. Bye, Hannes
Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
On 26.09.2016 03:35, Florian Westphal wrote: > Jamal Hadi Salim wrote: >> On 16-09-25 02:31 PM, Florian Westphal wrote: >>> Shmulik Ladkani wrote: We can later address any loop-detection improvements in mirred. WDYT? >>> >>> You can address this after fixing infamous spinlock recursion hard >>> lockup (which has existed forever): >>> >>> tc qdisc add dev eth0 root handle 1: prio >>> tc filter add dev eth0 parent 1: protocol ip u32 match u32 0 0 flowid >>> 1:2 action mirred egress redirect dev eth0 >>> >>> (only do this on toy vm) >>> >> >> Realize didnt respond to this. Seems very simple to fix: >> if skb->dev->ifindex and m->tcfm_dev->ifindex are the >> same, then you can drop the packet. > > Yes, but I think we get same issue when we deal with stacked > interfaces, and redirect is to e.g. vlan on top of physical device. We do have the adjacent upper lists in all netdevices, calculating if a mirred actions would insert the skb on a stacked device above us should be as easy as querying netdev_has_upper_dev and should be possible to check that during config time. > And we have such loops even without tc, for instance when placing > both veth ends in same bridge :-( We can't fix that without a ttl in the sk_buff struct. Bye, Hannes
Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
On Sun, Sep 25, 2016 at 10:59 AM, Shmulik Ladkani wrote: > Hi, > > On Sat, 24 Sep 2016 17:07:12 -0700 Cong Wang wrote: >> One problem to use your code for us is that, the RX side of veth >> is inside containers, not visible to outside, perhaps we need some >> more parameter to tell the netns before the device name/index? >> Thoughts? > > Well, this is way trickier... > > tc_mirred doesn't cope with netns movement of the target device. > See 'mirred_device_event': upon NETDEV_UNREGISTER the 'tcfm_dev' gets > nullified. > > (dev_change_net_namespace sequence includes NETDEV_UNREGISTER, > dev_net_set, NETDEV_REGISTER). > > As upposed to veth, which keeps the peer netdev pointer (since veth peers > lifetime is coupled), here in act_mirred we can't easily distinguish a > "real" NETDEV_UNREGISTER vs a namespace change... Yeah, isolation is a barrier for this, so we probably can't use this feature. But I still think it could be useful. Thanks.
Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
On Sun, Sep 25, 2016 at 6:39 AM, Jamal Hadi Salim wrote: > On 16-09-24 08:07 PM, Cong Wang wrote: >> >> On Thu, Sep 22, 2016 at 10:11 PM, Shmulik Ladkani > > >> >> One problem to use your code for us is that, the RX side of veth >> is inside containers, not visible to outside, perhaps we need some >> more parameter to tell the netns before the device name/index? >> Thoughts? >> > > Intriguing - but this would apply for only veth? Yes, my case is only about veth.
Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
On 16-09-25 09:35 PM, Florian Westphal wrote: Jamal Hadi Salim wrote: Realize didnt respond to this. Seems very simple to fix: if skb->dev->ifindex and m->tcfm_dev->ifindex are the same, then you can drop the packet. Yes, but I think we get same issue when we deal with stacked interfaces, and redirect is to e.g. vlan on top of physical device. And we have such loops even without tc, for instance when placing both veth ends in same bridge :-( For egress->egress the xmit recursion should help (maybe an audit needs to be done). For the case Shmulik is trying to achieve I am not sure that would help. In general, catching such a loop (or broadcast), if cheap should be attempted. cheers, jamal
Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
Jamal Hadi Salim wrote: > On 16-09-25 02:31 PM, Florian Westphal wrote: > >Shmulik Ladkani wrote: > >>We can later address any loop-detection improvements in mirred. > >>WDYT? > > > >You can address this after fixing infamous spinlock recursion hard > >lockup (which has existed forever): > > > >tc qdisc add dev eth0 root handle 1: prio > >tc filter add dev eth0 parent 1: protocol ip u32 match u32 0 0 flowid > >1:2 action mirred egress redirect dev eth0 > > > >(only do this on toy vm) > > > > Realize didnt respond to this. Seems very simple to fix: > if skb->dev->ifindex and m->tcfm_dev->ifindex are the > same, then you can drop the packet. Yes, but I think we get same issue when we deal with stacked interfaces, and redirect is to e.g. vlan on top of physical device. And we have such loops even without tc, for instance when placing both veth ends in same bridge :-(
Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
On 16-09-25 02:31 PM, Florian Westphal wrote: Shmulik Ladkani wrote: We can later address any loop-detection improvements in mirred. WDYT? You can address this after fixing infamous spinlock recursion hard lockup (which has existed forever): tc qdisc add dev eth0 root handle 1: prio tc filter add dev eth0 parent 1: protocol ip u32 match u32 0 0 flowid 1:2 action mirred egress redirect dev eth0 (only do this on toy vm) Realize didnt respond to this. Seems very simple to fix: if skb->dev->ifindex and m->tcfm_dev->ifindex are the same, then you can drop the packet. But it may be possible to reject earlier the policy entirely earlier. And true given this is egress->egress redirection one could also check with xmit_recursion check. cheers, jamal
Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
On 16-09-25 02:33 PM, Florian Westphal wrote: Daniel Borkmann wrote: [..] Why not just reuse xmit_recursion, which is what we did in tc cls_bpf programs f.e. see __bpf_tx_skb()? Would be a pity to waste 3 bits on this in the skb. +1, don't (yet) understand why a per-skb counter is required for this. If you could solve redirecting from ingress->egress with xmit_recursion then we are set ;-> cheers, jamal
Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
On 16-09-25 01:33 PM, Shmulik Ladkani wrote: On Sun, 25 Sep 2016 09:05:08 -0400 Jamal Hadi Salim wrote: On 16-09-23 11:40 AM, Shmulik Ladkani wrote: [off topic] I think this is still on topic! Sorry, wasn't too clear on that. What I meant is that _existing_ "egress redirect" already gets us into crazy loops - the veth misconfig being just one example of, but many more exist (many device stacking constructs, with lower dev issuing an egress-redirect back to the topmost dev). So this is stopped by the xmit_recursion Daniel mentioned, correct? Point is, IMO loop detection (whether/how addressed), is orthogonal to this series implementing "ingress redirect", and doesn't seem as a strict prerequisite to adding the "ingress redirect" functionality to act_mirred. We can later address any loop-detection improvements in mirred. WDYT? If indeed the xmit_recursion solves the egress->egress problem then I would suggest we need to address the egress->ingress issue. BTW: plans to also address ingress->ingress? cheers, jamal
Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
On 16-09-25 12:26 PM, Daniel Borkmann wrote: On 09/25/2016 03:05 PM, Jamal Hadi Salim wrote: [..] MAX_RED_LOOP (stands for "Maximum Redirect loop") still exists in current code. The idea above was that we would increment the rttl counter once and if we saw it again upto MAX_RED_LOOP we would assume a loop and drop the packet (at the time i didnt think it was wise to let the actions be in charge of setting the RTTL; it had to be central core code - but it may not be neccessary) Florian, when we discussed I said it was fine to reclaim those 3 bits on tc verdict for RTTL at the time because i had taken out the feature and never added it back. Your comment at the time was we can add it back when someone shows up with the feature. Shmulik is looking to add it. Why not just reuse xmit_recursion, which is what we did in tc cls_bpf programs f.e. see __bpf_tx_skb()? Would be a pity to waste 3 bits on this in the skb. If it is going to work, I'd be happy to save those bits. xmit_recursion is going to prevent recursing into dev_xmit(), no? In our case we want to preventing looping of a singular skb. cheers, jamal
Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
Daniel Borkmann wrote: > On 09/25/2016 03:05 PM, Jamal Hadi Salim wrote: > >MAX_RED_LOOP (stands for "Maximum Redirect loop") still exists in > >current code. The idea above was that we would increment the rttl > >counter once and if we saw it again upto MAX_RED_LOOP we would assume > >a loop and drop the packet (at the time i didnt think it was wise to > >let the actions be in charge of setting the RTTL; it had to be central > >core code - but it may not be neccessary) > > > >Florian, when we discussed I said it was fine to reclaim those 3 bits > >on tc verdict for RTTL at the time because i had taken out the > >feature and never added it back. Your comment at the time was we can > >add it back when someone shows up with the feature. > >Shmulik is looking to add it. > > Why not just reuse xmit_recursion, which is what we did in tc cls_bpf > programs f.e. see __bpf_tx_skb()? Would be a pity to waste 3 bits on > this in the skb. +1, don't (yet) understand why a per-skb counter is required for this.
Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
Shmulik Ladkani wrote: > We can later address any loop-detection improvements in mirred. > WDYT? You can address this after fixing infamous spinlock recursion hard lockup (which has existed forever): tc qdisc add dev eth0 root handle 1: prio tc filter add dev eth0 parent 1: protocol ip u32 match u32 0 0 flowid 1:2 action mirred egress redirect dev eth0 (only do this on toy vm)
Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
Hi, On Sat, 24 Sep 2016 17:07:12 -0700 Cong Wang wrote: > One problem to use your code for us is that, the RX side of veth > is inside containers, not visible to outside, perhaps we need some > more parameter to tell the netns before the device name/index? > Thoughts? Well, this is way trickier... tc_mirred doesn't cope with netns movement of the target device. See 'mirred_device_event': upon NETDEV_UNREGISTER the 'tcfm_dev' gets nullified. (dev_change_net_namespace sequence includes NETDEV_UNREGISTER, dev_net_set, NETDEV_REGISTER). As upposed to veth, which keeps the peer netdev pointer (since veth peers lifetime is coupled), here in act_mirred we can't easily distinguish a "real" NETDEV_UNREGISTER vs a namespace change...
Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
On Sun, 25 Sep 2016 09:05:08 -0400 Jamal Hadi Salim wrote: > On 16-09-23 11:40 AM, Shmulik Ladkani wrote: > > > > [off topic] > > I think this is still on topic! Sorry, wasn't too clear on that. What I meant is that _existing_ "egress redirect" already gets us into crazy loops - the veth misconfig being just one example of, but many more exist (many device stacking constructs, with lower dev issuing an egress-redirect back to the topmost dev). Point is, IMO loop detection (whether/how addressed), is orthogonal to this series implementing "ingress redirect", and doesn't seem as a strict prerequisite to adding the "ingress redirect" functionality to act_mirred. We can later address any loop-detection improvements in mirred. WDYT? Thanks, Shmulik
Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
On 09/25/2016 03:05 PM, Jamal Hadi Salim wrote: On 16-09-23 11:40 AM, Shmulik Ladkani wrote: On Fri, 23 Sep 2016 08:48:33 -0400 Jamal Hadi Salim wrote: Even today, one may create loops using existing 'egress redirect', e.g. this rediculously errorneous construct: # ip l add v0 type veth peer name v0p # tc filter add dev v0p parent : basic \ action mirred egress redirect dev v0 I think we actually recover from this one by eventually dropping (theres a ttl field). [off topic] Don't know about that :) cpu fan got very noisy, 3 of 4 cores at 100%, and after one second I got: # ip -s l show type veth 16: v0p@v0: mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000 link/ether a2:64:ff:10:dd:85 brd ff:ff:ff:ff:ff:ff RX: bytes packets errors dropped overrun mcast 71660305923 469890864 0 0 0 0 TX: bytes packets errors dropped carrier collsns 3509 24 0 0 0 0 17: v0@v0p: mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000 link/ether 52:a2:34:f6:7c:ec brd ff:ff:ff:ff:ff:ff RX: bytes packets errors dropped overrun mcast 3509 24 0 0 0 0 TX: bytes packets errors dropped carrier collsns 71660713017 469893555 0 0 0 0 I think this is still on topic! Now I realize that code we took out around 4.2.x is still useful for such a use case (I wasnt thinking about veth when Florian was slimming the skb); +Cc Florian W. This snippet from 4.2: - 3525 static int ing_filter(struct sk_buff *skb, struct netdev_queue *rxq) 3526 { 3527 struct net_device *dev = skb->dev; 3528 u32 ttl = G_TC_RTTL(skb->tc_verd); 3529 int result = TC_ACT_OK; 3530 struct Qdisc *q; 3531 3532 if (unlikely(MAX_RED_LOOP < ttl++)) { 3533 net_warn_ratelimited("Redir loop detected Dropping packet (%d->%d)\n", 3534 skb->skb_iif, dev->ifindex); 3535 return TC_ACT_SHOT; 3536 } 3537 3538 skb->tc_verd = SET_TC_RTTL(skb->tc_verd, ttl); 3539 skb->tc_verd = SET_TC_AT(skb->tc_verd, AT_INGRESS); 3540 3541 q = rcu_dereference(rxq->qdisc); 3542 if (q != &noop_qdisc) { 3543 spin_lock(qdisc_lock(q)); 3544 if (likely(!test_bit(__QDISC_STATE_DEACTIVATED, &q->state))) 3545 result = qdisc_enqueue_root(skb, q); 3546 spin_unlock(qdisc_lock(q)); 3547 } 3548 3549 return result; 3550 } MAX_RED_LOOP (stands for "Maximum Redirect loop") still exists in current code. The idea above was that we would increment the rttl counter once and if we saw it again upto MAX_RED_LOOP we would assume a loop and drop the packet (at the time i didnt think it was wise to let the actions be in charge of setting the RTTL; it had to be central core code - but it may not be neccessary) Florian, when we discussed I said it was fine to reclaim those 3 bits on tc verdict for RTTL at the time because i had taken out the feature and never added it back. Your comment at the time was we can add it back when someone shows up with the feature. Shmulik is looking to add it. Why not just reuse xmit_recursion, which is what we did in tc cls_bpf programs f.e. see __bpf_tx_skb()? Would be a pity to waste 3 bits on this in the skb. Similarly to all constructs injecting skbs to device rx (bond/team, vlan, macvlan, tunnels, ifb, __dev_forward_skb callers, etc..), we are obligated to assign 'skb2->dev' as the new rx device. Regarding 'skb2->skb_iif', original act_mirred code already has: skb2->skb_iif = skb->dev->ifindex; <--- THIS IS ORIG DEV IIF skb2->dev = dev; <--- THIS IS TARGET DEV err = dev_queue_xmit(skb2); I'm preserving this; OTOH the suggested modification in the patch is -err = dev_queue_xmit(skb2); +if (tcf_mirred_act_direction(m->tcfm_eaction) & AT_EGRESS) +err = dev_queue_xmit(skb2); +else +netif_receive_skb(skb2); now, the call to 'netif_receive_skb' will eventually override skb_iif to the target RX dev's index, upon entry to __netif_receive_skb_core. I think this IS the expected behavior - as done by other "rx injection" constructs. Sounds fine. I am wondering if we can have a tracing feature to show the lifetime of the packet as it is being cycled around the kernel? It would help debugging if some policy misbehaves. My doubts were around whether we should call 'dev_forward_skb' instead of 'netif_receive_skb'. The former does some things I assumed we're not interested of, like testing 'is_skb_forwardable' and re-running 'eth_type_trans'. OTOH, it DOES scrub the skb. Maybe we should scrub it as well prior the netif_receive_skb call? Scrubbing the skb could be a bad idea if it gets rid of global state like the RTTL if you add it back. cheers, jamal
Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
On 16-09-24 08:07 PM, Cong Wang wrote: On Thu, Sep 22, 2016 at 10:11 PM, Shmulik Ladkani One problem to use your code for us is that, the RX side of veth is inside containers, not visible to outside, perhaps we need some more parameter to tell the netns before the device name/index? Thoughts? Intriguing - but this would apply for only veth? It may be around preventing loops maybe. Could be, but personally, I treat these constructs as (powerful) building blocks, and "with great power comes great responsibility". Even today, one may create loops using existing 'egress redirect', e.g. this rediculously errorneous construct: # ip l add v0 type veth peer name v0p # tc filter add dev v0p parent : basic \ action mirred egress redirect dev v0 Detecting such loops should not be hard technically, like we do for reclassification. We might need some bits in skb to detect this specific case. Note my other email. We had the feature but we took it out to save bits on the skb. cheers, jamal
Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
On 16-09-23 11:40 AM, Shmulik Ladkani wrote: On Fri, 23 Sep 2016 08:48:33 -0400 Jamal Hadi Salim wrote: Even today, one may create loops using existing 'egress redirect', e.g. this rediculously errorneous construct: # ip l add v0 type veth peer name v0p # tc filter add dev v0p parent : basic \ action mirred egress redirect dev v0 I think we actually recover from this one by eventually dropping (theres a ttl field). [off topic] Don't know about that :) cpu fan got very noisy, 3 of 4 cores at 100%, and after one second I got: # ip -s l show type veth 16: v0p@v0: mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000 link/ether a2:64:ff:10:dd:85 brd ff:ff:ff:ff:ff:ff RX: bytes packets errors dropped overrun mcast 71660305923 469890864 0 0 0 0 TX: bytes packets errors dropped carrier collsns 3509 24 0 0 0 0 17: v0@v0p: mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000 link/ether 52:a2:34:f6:7c:ec brd ff:ff:ff:ff:ff:ff RX: bytes packets errors dropped overrun mcast 3509 24 0 0 0 0 TX: bytes packets errors dropped carrier collsns 71660713017 469893555 0 0 0 0 I think this is still on topic! Now I realize that code we took out around 4.2.x is still useful for such a use case (I wasnt thinking about veth when Florian was slimming the skb); +Cc Florian W. This snippet from 4.2: - 3525 static int ing_filter(struct sk_buff *skb, struct netdev_queue *rxq) 3526 { 3527 struct net_device *dev = skb->dev; 3528 u32 ttl = G_TC_RTTL(skb->tc_verd); 3529 int result = TC_ACT_OK; 3530 struct Qdisc *q; 3531 3532 if (unlikely(MAX_RED_LOOP < ttl++)) { 3533 net_warn_ratelimited("Redir loop detected Dropping packet (%d->%d)\n", 3534 skb->skb_iif, dev->ifindex); 3535 return TC_ACT_SHOT; 3536 } 3537 3538 skb->tc_verd = SET_TC_RTTL(skb->tc_verd, ttl); 3539 skb->tc_verd = SET_TC_AT(skb->tc_verd, AT_INGRESS); 3540 3541 q = rcu_dereference(rxq->qdisc); 3542 if (q != &noop_qdisc) { 3543 spin_lock(qdisc_lock(q)); 3544 if (likely(!test_bit(__QDISC_STATE_DEACTIVATED, &q->state))) 3545 result = qdisc_enqueue_root(skb, q); 3546 spin_unlock(qdisc_lock(q)); 3547 } 3548 3549 return result; 3550 } MAX_RED_LOOP (stands for "Maximum Redirect loop") still exists in current code. The idea above was that we would increment the rttl counter once and if we saw it again upto MAX_RED_LOOP we would assume a loop and drop the packet (at the time i didnt think it was wise to let the actions be in charge of setting the RTTL; it had to be central core code - but it may not be neccessary) Florian, when we discussed I said it was fine to reclaim those 3 bits on tc verdict for RTTL at the time because i had taken out the feature and never added it back. Your comment at the time was we can add it back when someone shows up with the feature. Shmulik is looking to add it. Similarly to all constructs injecting skbs to device rx (bond/team, vlan, macvlan, tunnels, ifb, __dev_forward_skb callers, etc..), we are obligated to assign 'skb2->dev' as the new rx device. Regarding 'skb2->skb_iif', original act_mirred code already has: skb2->skb_iif = skb->dev->ifindex; <--- THIS IS ORIG DEV IIF skb2->dev = dev; <--- THIS IS TARGET DEV err = dev_queue_xmit(skb2); I'm preserving this; OTOH the suggested modification in the patch is - err = dev_queue_xmit(skb2); + if (tcf_mirred_act_direction(m->tcfm_eaction) & AT_EGRESS) + err = dev_queue_xmit(skb2); + else + netif_receive_skb(skb2); now, the call to 'netif_receive_skb' will eventually override skb_iif to the target RX dev's index, upon entry to __netif_receive_skb_core. I think this IS the expected behavior - as done by other "rx injection" constructs. Sounds fine. I am wondering if we can have a tracing feature to show the lifetime of the packet as it is being cycled around the kernel? It would help debugging if some policy misbehaves. My doubts were around whether we should call 'dev_forward_skb' instead of 'netif_receive_skb'. The former does some things I assumed we're not interested of, like testing 'is_skb_forwardable' and re-running 'eth_type_trans'. OTOH, it DOES scrub the skb. Maybe we should scrub it as well prior the netif_receive_skb call? Scrubbing the skb could be a bad idea if it gets rid of global state like the RTTL if you add it back. cheers, jamal
Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
On Fri, Sep 23, 2016 at 8:40 AM, Shmulik Ladkani wrote: > On Fri, 23 Sep 2016 08:48:33 -0400 Jamal Hadi Salim wrote: >> > Even today, one may create loops using existing 'egress redirect', >> > e.g. this rediculously errorneous construct: >> > >> > # ip l add v0 type veth peer name v0p >> > # tc filter add dev v0p parent : basic \ >> > action mirred egress redirect dev v0 >> >> I think we actually recover from this one by eventually >> dropping (theres a ttl field). > > [off topic] > > Don't know about that :) cpu fan got very noisy, 3 of 4 cores at 100%, > and after one second I got: > > # ip -s l show type veth > 16: v0p@v0: mtu 1500 qdisc noqueue state UP > mode DEFAULT group default qlen 1000 > link/ether a2:64:ff:10:dd:85 brd ff:ff:ff:ff:ff:ff > RX: bytes packets errors dropped overrun mcast > 71660305923 469890864 0 0 0 0 > TX: bytes packets errors dropped carrier collsns > 3509 24 0 0 0 0 > 17: v0@v0p: mtu 1500 qdisc noqueue state UP > mode DEFAULT group default qlen 1000 > link/ether 52:a2:34:f6:7c:ec brd ff:ff:ff:ff:ff:ff > RX: bytes packets errors dropped overrun mcast > 3509 24 0 0 0 0 > TX: bytes packets errors dropped carrier collsns > 71660713017 469893555 0 0 0 0 These ghost packets never enter IP stack, I don't think TTL helps.
Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
On Thu, Sep 22, 2016 at 10:11 PM, Shmulik Ladkani wrote: > Was wondering why it's missing, googled a bit with no meaningful > results, so speculated the following: > > Some time long ago, initial 'mirred' purpose was to facilitate ifb. > Therefore 'egress redirect' was implemented. Jamal probably left the > 'ingress' support for a later time :) > > One interesting usecase for 'ingress redirect' is creating "rx bouncing" > construct (like macvlan/macvtap/ipvlan) but applied according to custom > logic. We have done this for our containers for a long time. We simply redirect packets to veth TX then flow to veth RX of course. One problem to use your code for us is that, the RX side of veth is inside containers, not visible to outside, perhaps we need some more parameter to tell the netns before the device name/index? Thoughts? > >> It may be around preventing loops maybe. > > Could be, but personally, I treat these constructs as (powerful) > building blocks, and "with great power comes great responsibility". > > Even today, one may create loops using existing 'egress redirect', > e.g. this rediculously errorneous construct: > > # ip l add v0 type veth peer name v0p > # tc filter add dev v0p parent : basic \ > action mirred egress redirect dev v0 Detecting such loops should not be hard technically, like we do for reclassification. We might need some bits in skb to detect this specific case. Anyway, I don't think it is a blocker, just need more tests to catch some corner cases. Thanks.
Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
On Thu, Sep 22, 2016 at 6:21 AM, Shmulik Ladkani wrote: > From: Shmulik Ladkani > > Up until now, 'action mirred' supported only egress actions (either > TCA_EGRESS_REDIR or TCA_EGRESS_MIRROR). > > This patch implements the corresponding ingress actions > TCA_INGRESS_REDIR and TCA_INGRESS_MIRROR. > > This allows attaching filters whose target is to hand matching skbs into > the rx processing of a specified device. I like this idea, this idea actually came to my mind when we tried to redirect packets from eth0 to veth device in containers. I remember I already brought this up to Jamal before (either personally or publicly), but I forgot why I stopped. This would reduce some latency for our Mesos containers, we would just skip one veth TX in our scenario.
Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
On Fri, 23 Sep 2016 08:48:33 -0400 Jamal Hadi Salim wrote: > > Even today, one may create loops using existing 'egress redirect', > > e.g. this rediculously errorneous construct: > > > > # ip l add v0 type veth peer name v0p > > # tc filter add dev v0p parent : basic \ > > action mirred egress redirect dev v0 > > I think we actually recover from this one by eventually > dropping (theres a ttl field). [off topic] Don't know about that :) cpu fan got very noisy, 3 of 4 cores at 100%, and after one second I got: # ip -s l show type veth 16: v0p@v0: mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000 link/ether a2:64:ff:10:dd:85 brd ff:ff:ff:ff:ff:ff RX: bytes packets errors dropped overrun mcast 71660305923 469890864 0 0 0 0 TX: bytes packets errors dropped carrier collsns 3509 24 0 0 0 0 17: v0@v0p: mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000 link/ether 52:a2:34:f6:7c:ec brd ff:ff:ff:ff:ff:ff RX: bytes packets errors dropped overrun mcast 3509 24 0 0 0 0 TX: bytes packets errors dropped carrier collsns 71660713017 469893555 0 0 0 0 > The other question is what to set skb->dev and skb->iif? > Some information will be lost if you move around netdevs a > bit. [back to topic] Good point. Similarly to all constructs injecting skbs to device rx (bond/team, vlan, macvlan, tunnels, ifb, __dev_forward_skb callers, etc..), we are obligated to assign 'skb2->dev' as the new rx device. Regarding 'skb2->skb_iif', original act_mirred code already has: skb2->skb_iif = skb->dev->ifindex; <--- THIS IS ORIG DEV IIF skb2->dev = dev; <--- THIS IS TARGET DEV err = dev_queue_xmit(skb2); I'm preserving this; OTOH the suggested modification in the patch is - err = dev_queue_xmit(skb2); + if (tcf_mirred_act_direction(m->tcfm_eaction) & AT_EGRESS) + err = dev_queue_xmit(skb2); + else + netif_receive_skb(skb2); now, the call to 'netif_receive_skb' will eventually override skb_iif to the target RX dev's index, upon entry to __netif_receive_skb_core. I think this IS the expected behavior - as done by other "rx injection" constructs. My doubts were around whether we should call 'dev_forward_skb' instead of 'netif_receive_skb'. The former does some things I assumed we're not interested of, like testing 'is_skb_forwardable' and re-running 'eth_type_trans'. OTOH, it DOES scrub the skb. Maybe we should scrub it as well prior the netif_receive_skb call? Thanks, Shmulik
Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
On 16-09-23 01:11 AM, Shmulik Ladkani wrote: Hi, On Thu, 22 Sep 2016 19:40:15 -0400 Jamal Hadi Salim wrote: On 16-09-22 09:21 AM, Shmulik Ladkani wrote: From: Shmulik Ladkani Up until now, 'action mirred' supported only egress actions (either TCA_EGRESS_REDIR or TCA_EGRESS_MIRROR). This patch implements the corresponding ingress actions TCA_INGRESS_REDIR and TCA_INGRESS_MIRROR. This allows attaching filters whose target is to hand matching skbs into the rx processing of a specified device. Thank you for doing this. There was something that made me remove initial support for this feature - I am blanking out right now but will find my notes and give more details. Thanks Jamal, appreciate any details. Was wondering why it's missing, googled a bit with no meaningful results, so speculated the following: Some time long ago, initial 'mirred' purpose was to facilitate ifb. Therefore 'egress redirect' was implemented. Jamal probably left the 'ingress' support for a later time :) History is mirror/redirect were first introduced to do just those plain vanilla-free features. IFB came later. Up until recently there were still some bits to support the ingress features that were removed by Florian W. to save some skb bits. One interesting usecase for 'ingress redirect' is creating "rx bouncing" construct (like macvlan/macvtap/ipvlan) but applied according to custom logic. I thought that was the motivation as well. It may be around preventing loops maybe. Could be, but personally, I treat these constructs as (powerful) building blocks, and "with great power comes great responsibility". Amen. I am a believer in let-the-user-shoot-their-big-toe-if-they-want. Even today, one may create loops using existing 'egress redirect', e.g. this rediculously errorneous construct: # ip l add v0 type veth peer name v0p # tc filter add dev v0p parent : basic \ action mirred egress redirect dev v0 I think we actually recover from this one by eventually dropping (theres a ttl field). We should at least not lock the kernel forever. The other question is what to set skb->dev and skb->iif? Some information will be lost if you move around netdevs a bit. BTW: You have motivated me to start looking again at redirect to socket that was also left out. I am getting tired of redirecting to tuntap with all its bells and whistles. cheers, jamal
Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
Hi, On Thu, 22 Sep 2016 19:40:15 -0400 Jamal Hadi Salim wrote: > On 16-09-22 09:21 AM, Shmulik Ladkani wrote: > > From: Shmulik Ladkani > > > > Up until now, 'action mirred' supported only egress actions (either > > TCA_EGRESS_REDIR or TCA_EGRESS_MIRROR). > > > > This patch implements the corresponding ingress actions > > TCA_INGRESS_REDIR and TCA_INGRESS_MIRROR. > > > > This allows attaching filters whose target is to hand matching skbs into > > the rx processing of a specified device. > > Thank you for doing this. There was something that made me remove > initial support for this feature - I am blanking out right now but > will find my notes and give more details. Thanks Jamal, appreciate any details. Was wondering why it's missing, googled a bit with no meaningful results, so speculated the following: Some time long ago, initial 'mirred' purpose was to facilitate ifb. Therefore 'egress redirect' was implemented. Jamal probably left the 'ingress' support for a later time :) One interesting usecase for 'ingress redirect' is creating "rx bouncing" construct (like macvlan/macvtap/ipvlan) but applied according to custom logic. > It may be around preventing loops maybe. Could be, but personally, I treat these constructs as (powerful) building blocks, and "with great power comes great responsibility". Even today, one may create loops using existing 'egress redirect', e.g. this rediculously errorneous construct: # ip l add v0 type veth peer name v0p # tc filter add dev v0p parent : basic \ action mirred egress redirect dev v0 Regards, Shmulik
Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
On 16-09-22 09:21 AM, Shmulik Ladkani wrote: From: Shmulik Ladkani Up until now, 'action mirred' supported only egress actions (either TCA_EGRESS_REDIR or TCA_EGRESS_MIRROR). This patch implements the corresponding ingress actions TCA_INGRESS_REDIR and TCA_INGRESS_MIRROR. This allows attaching filters whose target is to hand matching skbs into the rx processing of a specified device. Thank you for doing this. There was something that made me remove initial support for this feature - I am blanking out right now but will find my notes and give more details. It may be around preventing loops maybe. If that was the thought then: I am just wondering is there a use case for a packet that is redirected from egress ethx to ingress of ethy that then requires ingress of ethy classify? Otherwise you could just set the "dont classify" flag. i.e SET_TC_NCLS() cheers, jamal
Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
On Thu, 2016-09-22 at 21:27 +0300, Shmulik Ladkani wrote: > On Thu, 22 Sep 2016 07:54:13 -0700 Eric Dumazet > wrote: > > Hmm... we probably need to apply the full rcu protection before this > > patch. > > > > https://patchwork.ozlabs.org/patch/667680/ > > Are you referring to order of application into net-next? > > This patch seems to present no new tcf_mirred_params members nor > need-to-be-protected code regions (please, correct me if wrong). > So it does not _depend_ on the 'full rcu fixes', does it? No, simply a reminder that we run lockless there, so you might need to read some control variables once, and in a consistent way. (Or a concurrent writer could change params in the middle of the function)
Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
On Thu, 22 Sep 2016 07:54:13 -0700 Eric Dumazet wrote: > Hmm... we probably need to apply the full rcu protection before this > patch. > > https://patchwork.ozlabs.org/patch/667680/ Are you referring to order of application into net-next? This patch seems to present no new tcf_mirred_params members nor need-to-be-protected code regions (please, correct me if wrong). So it does not _depend_ on the 'full rcu fixes', does it? Thanks, Shmulik
Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
On Thu, 2016-09-22 at 16:21 +0300, Shmulik Ladkani wrote: > From: Shmulik Ladkani > > Up until now, 'action mirred' supported only egress actions (either > TCA_EGRESS_REDIR or TCA_EGRESS_MIRROR). > > This patch implements the corresponding ingress actions > TCA_INGRESS_REDIR and TCA_INGRESS_MIRROR. > > This allows attaching filters whose target is to hand matching skbs into > the rx processing of a specified device. > > Signed-off-by: Shmulik Ladkani > Cc: Jamal Hadi Salim > --- > Was wondering, whether netif_receive_skb or dev_forward_skb should be > used for the rx bouncing. Used netif_receive_skb as in ifb device. Hmm... we probably need to apply the full rcu protection before this patch. https://patchwork.ozlabs.org/patch/667680/
[PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
From: Shmulik Ladkani Up until now, 'action mirred' supported only egress actions (either TCA_EGRESS_REDIR or TCA_EGRESS_MIRROR). This patch implements the corresponding ingress actions TCA_INGRESS_REDIR and TCA_INGRESS_MIRROR. This allows attaching filters whose target is to hand matching skbs into the rx processing of a specified device. Signed-off-by: Shmulik Ladkani Cc: Jamal Hadi Salim --- Was wondering, whether netif_receive_skb or dev_forward_skb should be used for the rx bouncing. Used netif_receive_skb as in ifb device. net/sched/act_mirred.c | 48 ++-- 1 file changed, 42 insertions(+), 6 deletions(-) diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c index 28629d3..942120e 100644 --- a/net/sched/act_mirred.c +++ b/net/sched/act_mirred.c @@ -33,6 +33,25 @@ static LIST_HEAD(mirred_list); static DEFINE_SPINLOCK(mirred_list_lock); +static bool tcf_mirred_is_act_redirect(int action) +{ + return action == TCA_EGRESS_REDIR || action == TCA_INGRESS_REDIR; +} + +static u32 tcf_mirred_act_direction(int action) +{ + switch (action) { + case TCA_EGRESS_REDIR: + case TCA_EGRESS_MIRROR: + return AT_EGRESS; + case TCA_INGRESS_REDIR: + case TCA_INGRESS_MIRROR: + return AT_INGRESS; + default: + BUG(); + } +} + static void tcf_mirred_release(struct tc_action *a, int bind) { struct tcf_mirred *m = to_mirred(a); @@ -96,6 +115,8 @@ static int tcf_mirred_init(struct net *net, struct nlattr *nla, switch (parm->eaction) { case TCA_EGRESS_MIRROR: case TCA_EGRESS_REDIR: + case TCA_INGRESS_REDIR: + case TCA_INGRESS_MIRROR: break; default: if (exists) @@ -157,7 +178,8 @@ static int tcf_mirred(struct sk_buff *skb, const struct tc_action *a, struct tcf_mirred *m = to_mirred(a); struct net_device *dev; struct sk_buff *skb2; - int retval, err; + int retval, err = 0; + int mac_len; u32 at; tcf_lastuse_update(&m->tcf_tm); @@ -182,23 +204,37 @@ static int tcf_mirred(struct sk_buff *skb, const struct tc_action *a, if (!skb2) goto out; - if (!(at & AT_EGRESS)) { - if (m->tcfm_mac_header_xmit) + /* If action's target direction differs than filter's direction, +* and devices expect a mac header on xmit, then mac push/pull is +* needed. +*/ + if (at != tcf_mirred_act_direction(m->tcfm_eaction) && + m->tcfm_mac_header_xmit) { + if (at & AT_EGRESS) { + /* caught at egress, act ingress: pull mac */ + mac_len = skb_network_header(skb) - skb_mac_header(skb); + skb_pull_rcsum(skb2, mac_len); + } else { + /* caught at ingress, act egress: push mac */ skb_push_rcsum(skb2, skb->mac_len); + } } /* mirror is always swallowed */ - if (m->tcfm_eaction != TCA_EGRESS_MIRROR) + if (tcf_mirred_is_act_redirect(m->tcfm_eaction)) skb2->tc_verd = SET_TC_FROM(skb2->tc_verd, at); skb2->skb_iif = skb->dev->ifindex; skb2->dev = dev; - err = dev_queue_xmit(skb2); + if (tcf_mirred_act_direction(m->tcfm_eaction) & AT_EGRESS) + err = dev_queue_xmit(skb2); + else + netif_receive_skb(skb2); if (err) { out: qstats_overlimit_inc(this_cpu_ptr(m->common.cpu_qstats)); - if (m->tcfm_eaction != TCA_EGRESS_MIRROR) + if (tcf_mirred_is_act_redirect(m->tcfm_eaction)) retval = TC_ACT_SHOT; } rcu_read_unlock(); -- 1.9.1