subject:"\[PATCH net\-next 4\/4\] net\/sched\: act_mirred\: Implement ingress actions"

Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions

2016-09-27 Thread Daniel Borkmann

On 09/27/2016 04:18 PM, Shmulik Ladkani wrote:

On Tue, 27 Sep 2016 09:44:41 -0400 (EDT), da...@davemloft.net wrote:

From: Daniel Borkmann 
Date: Tue, 27 Sep 2016 12:39:34 +0200

Any reason why dev_forward_skb() is not preferred over direct
netif_receive_skb() you're using? It would, for example, implicitly
assure that pkt_type is always PACKET_HOST, etc.

dev_forward_skb() will pull the ethernet header.

And since a direct call to netif_receive_skb() will not, one of these
two choices won't work properly.

In the patch, I'm issuing a skb_pull_rcsum() prior the netif_receive_skb,
snip:

[...]

Existing *egress* mir/red already supported pairing two non-eth devices.
Therefore I allow it for the new *ingress* mir/red as well.

[...]

Yeah, makes sense then. Should skb->pkt_type become an issue, you might
probably just use act_skbedit for such cases.

Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions

2016-09-27 Thread Shmulik Ladkani

On Tue, 27 Sep 2016 09:44:41 -0400 (EDT), da...@davemloft.net wrote:
> From: Daniel Borkmann 
> Date: Tue, 27 Sep 2016 12:39:34 +0200
> 
> > Any reason why dev_forward_skb() is not preferred over direct
> > netif_receive_skb() you're using? It would, for example, implicitly
> > assure that pkt_type is always PACKET_HOST, etc.
> 
> dev_forward_skb() will pull the ethernet header.
> 
> And since a direct call to netif_receive_skb() will not, one of these
> two choices won't work properly.

In the patch, I'm issuing a skb_pull_rcsum() prior the netif_receive_skb,
snip:

+   /* If action's target direction differs than filter's direction,
+* and devices expect a mac header on xmit, then mac push/pull is
+* needed.
+*/
+   if (at != tcf_mirred_act_direction(m->tcfm_eaction) &&
+   m->tcfm_mac_header_xmit) {
+   if (at & AT_EGRESS) {
+   /* caught at egress, act ingress: pull mac */
+   mac_len = skb_network_header(skb) - skb_mac_header(skb);
+   skb_pull_rcsum(skb2, mac_len);

Existing *egress* mir/red already supported pairing two non-eth devices.
Therefore I allow it for the new *ingress* mir/red as well.

Now, following this premise, the skb_pull_rcsum() shown above is executed
for devices whose xmit is mac_header based.

I've tested both ARPHRD_ETHER devices and some non ARPHRD_ETHER devices.

(an *existing* symmetric skb_push_rcsum() is invoked if packet is caught at
 ingress and redirected for egress on a mac_header xmit device)

This was the reason for picking netif_receive_skb() over dev_forward_skb().

Regards,
Shmulik

Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions

2016-09-27 Thread Jamal Hadi Salim


On 16-09-27 04:07 AM, Shmulik Ladkani wrote:

Hi David,

On Tue, 27 Sep 2016 01:56:06 -0400 (EDT), da...@davemloft.net wrote:

The discussion on this patch has ventured off into what to do about
recursion.

But it unclear to me where this specific patch, and this series,
stands right now.  Someone please clear this up for me.


Status:
 - Series adds "ingress redirect/mirror" support
 - Positive feedback for the feature
 - So far no comments regarding code itself
 - Questions raised regarding "recursion handling"

>

   Expressed that existing mirred code (i.e egress redirect) is *already*
   loop-unsafe (and also, some non-tc netdev constructs, as exampled by
   others).
   Discussion then wandered to "recursion handling".


not totaly bike-shed discussion; legit issues are being raised
(and the egress issue you point out is fixable now that we are paying
attention to it).
We need to take care of loops. I pointed to how the original thought
process was. I _dont_ see this as resolvable via recursion handling
since this is per-skb and not per entry point.
You can add my Acked-by if you promise to take care of this issue next.

cheers,
jamal

PS:- the code looks straight forward

Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions

2016-09-27 Thread David Miller

From: Daniel Borkmann 
Date: Tue, 27 Sep 2016 12:39:34 +0200

> Any reason why dev_forward_skb() is not preferred over direct
> netif_receive_skb() you're using? It would, for example, implicitly
> assure that pkt_type is always PACKET_HOST, etc.

dev_forward_skb() will pull the ethernet header.

And since a direct call to netif_receive_skb() will not, one of these
two choices won't work properly.

Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions

2016-09-27 Thread Daniel Borkmann


On 09/27/2016 10:07 AM, Shmulik Ladkani wrote:

Hi David,

On Tue, 27 Sep 2016 01:56:06 -0400 (EDT), da...@davemloft.net wrote:

The discussion on this patch has ventured off into what to do about
recursion.

But it unclear to me where this specific patch, and this series,
stands right now.  Someone please clear this up for me.


Status:
  - Series adds "ingress redirect/mirror" support
  - Positive feedback for the feature
  - So far no comments regarding code itself
  - Questions raised regarding "recursion handling"
Expressed that existing mirred code (i.e egress redirect) is *already*
loop-unsafe (and also, some non-tc netdev constructs, as exampled by
others).
Discussion then wandered to "recursion handling".


Any reason why dev_forward_skb() is not preferred over direct
netif_receive_skb() you're using? It would, for example, implicitly
assure that pkt_type is always PACKET_HOST, etc.

Thanks,
Daniel

Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions

2016-09-27 Thread Shmulik Ladkani

Hi David,

On Tue, 27 Sep 2016 01:56:06 -0400 (EDT), da...@davemloft.net wrote:
> The discussion on this patch has ventured off into what to do about
> recursion.
> 
> But it unclear to me where this specific patch, and this series,
> stands right now.  Someone please clear this up for me.

Status:
 - Series adds "ingress redirect/mirror" support
 - Positive feedback for the feature
 - So far no comments regarding code itself
 - Questions raised regarding "recursion handling"
   Expressed that existing mirred code (i.e egress redirect) is *already*
   loop-unsafe (and also, some non-tc netdev constructs, as exampled by
   others).
   Discussion then wandered to "recursion handling".

Regards,
Shmulik

Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions

2016-09-26 Thread David Miller

From: Shmulik Ladkani 
Date: Thu, 22 Sep 2016 16:21:52 +0300

> From: Shmulik Ladkani 
> 
> Up until now, 'action mirred' supported only egress actions (either
> TCA_EGRESS_REDIR or TCA_EGRESS_MIRROR).
> 
> This patch implements the corresponding ingress actions
> TCA_INGRESS_REDIR and TCA_INGRESS_MIRROR.
> 
> This allows attaching filters whose target is to hand matching skbs into
> the rx processing of a specified device.
> 
> Signed-off-by: Shmulik Ladkani 
> Cc: Jamal Hadi Salim 
> ---
>  Was wondering, whether netif_receive_skb or dev_forward_skb should be
>  used for the rx bouncing. Used netif_receive_skb as in ifb device.

The discussion on this patch has ventured off into what to do about
recursion.

But it unclear to me where this specific patch, and this series,
stands right now.  Someone please clear this up for me.

Thanks.

Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions

2016-09-26 Thread Daniel Borkmann


On 09/26/2016 05:12 PM, Hannes Frederic Sowa wrote:
[...]

I would just care that we sometimes reschedule and don't do everything
in one stack so we don't corrupt the machine and an admin has still a
chance to solve the problem.


Sounds reasonable to me, and which is what dev_forward_skb() is doing
implicitly as well.

Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions

2016-09-26 Thread Shmulik Ladkani

Hi,

On Mon, 26 Sep 2016 16:43:16 +0200 Hannes Frederic Sowa 
 wrote:
> On 26.09.2016 03:35, Florian Westphal wrote:
> > 
> > Yes, but I think we get same issue when we deal with stacked
> > interfaces, and redirect is to e.g. vlan on top of physical device.  
> 
> We do have the adjacent upper lists in all netdevices, calculating if a
> mirred actions would insert the skb on a stacked device above us should
> be as easy as querying netdev_has_upper_dev and should be possible to
> check that during config time.

This does not seem the right direction:

- We should NOT ban egress redirect to an "upper device", this limits
  flexibility available today:
  One may validly redirect to an "upper device" according to a very
  specific criteria such that the skb will NOT be caught again by the
  filter when re-iterated down the stack.

- And even if we did, at "config time" as you say, one can always stack
  the target device ontop the filtered device at a LATER time.
  And obviously, testing "is upper" while executing action is a bad idea.

- Moreover, as Daniel noted, this is just a band-aid which limitely
  addresses only few of the already existing troubles with stacked
  virtual netdevices (with or without act_mirred).

Frankly, I think the loop-detection suggestions overload act_mirred
with AI that might imply limitations not well assessed (even on existing
usecases).

If needed, I'd go with the "recursion detection" suggested by Daniel,
which is simple and aids with the issues of "egress" redirect that
already exist today.

Moreover, as the patch series is all about "ingress redirect" support,
the "recursion detection" deserves a series of its own, we shouldn't
bundle these two distinct features.

Regards,
Shmulik

Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions

2016-09-26 Thread Hannes Frederic Sowa

On 26.09.2016 16:53, Daniel Borkmann wrote:
> On 09/26/2016 04:43 PM, Hannes Frederic Sowa wrote:
>> On 26.09.2016 03:35, Florian Westphal wrote:
>>> Jamal Hadi Salim  wrote:
 On 16-09-25 02:31 PM, Florian Westphal wrote:
> Shmulik Ladkani  wrote:
>> We can later address any loop-detection improvements in mirred.
>> WDYT?
>
> You can address this after fixing infamous spinlock recursion hard
> lockup (which has existed forever):
>
> tc qdisc add dev eth0 root handle 1: prio
> tc filter add dev eth0 parent 1: protocol ip u32 match u32 0 0 flowid
> 1:2 action mirred egress redirect dev eth0
>
> (only do this on toy vm)

 Realize didnt respond to this. Seems very simple to fix:
 if skb->dev->ifindex and m->tcfm_dev->ifindex are the
 same, then you can drop the packet.
>>>
>>> Yes, but I think we get same issue when we deal with stacked
>>> interfaces, and redirect is to e.g. vlan on top of physical device.
>>
>> We do have the adjacent upper lists in all netdevices, calculating if a
>> mirred actions would insert the skb on a stacked device above us should
>> be as easy as querying netdev_has_upper_dev and should be possible to
>> check that during config time.
> 
> But that would still not be enough, no? In the sense that with above
> scenario, you could redirect to some arbitrary device that redirects
> this back to the original device if on purpose configured as such,
> thus they don't necessarily need to have a stacked relationship.

Yes, it would only help with the scenario Florian described above.

Personally, I would only try to fix and warn against the easy to detect
cases. It is easy enough to just create a loop with your local attached
L2 which brings your box into a endless loop processing the same packet
again and again. Because it is out of control of the kernel you cannot
do anything at all.

I would just care that we sometimes reschedule and don't do everything
in one stack so we don't corrupt the machine and an admin has still a
chance to solve the problem.

Bye,
Hannes

Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions

2016-09-26 Thread Daniel Borkmann


On 09/26/2016 04:43 PM, Hannes Frederic Sowa wrote:

On 26.09.2016 03:35, Florian Westphal wrote:

Jamal Hadi Salim  wrote:

On 16-09-25 02:31 PM, Florian Westphal wrote:

Shmulik Ladkani  wrote:

We can later address any loop-detection improvements in mirred.
WDYT?


You can address this after fixing infamous spinlock recursion hard
lockup (which has existed forever):

tc qdisc add dev eth0 root handle 1: prio
tc filter add dev eth0 parent 1: protocol ip u32 match u32 0 0 flowid
1:2 action mirred egress redirect dev eth0

(only do this on toy vm)


Realize didnt respond to this. Seems very simple to fix:
if skb->dev->ifindex and m->tcfm_dev->ifindex are the
same, then you can drop the packet.


Yes, but I think we get same issue when we deal with stacked
interfaces, and redirect is to e.g. vlan on top of physical device.


We do have the adjacent upper lists in all netdevices, calculating if a
mirred actions would insert the skb on a stacked device above us should
be as easy as querying netdev_has_upper_dev and should be possible to
check that during config time.


But that would still not be enough, no? In the sense that with above
scenario, you could redirect to some arbitrary device that redirects
this back to the original device if on purpose configured as such,
thus they don't necessarily need to have a stacked relationship.


And we have such loops even without tc, for instance when placing
both veth ends in same bridge :-(


We can't fix that without a ttl in the sk_buff struct.

Bye,
Hannes

Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions

2016-09-26 Thread Hannes Frederic Sowa

On 26.09.2016 03:35, Florian Westphal wrote:
> Jamal Hadi Salim  wrote:
>> On 16-09-25 02:31 PM, Florian Westphal wrote:
>>> Shmulik Ladkani  wrote:
 We can later address any loop-detection improvements in mirred.
 WDYT?
>>>
>>> You can address this after fixing infamous spinlock recursion hard
>>> lockup (which has existed forever):
>>>
>>> tc qdisc add dev eth0 root handle 1: prio
>>> tc filter add dev eth0 parent 1: protocol ip u32 match u32 0 0 flowid
>>> 1:2 action mirred egress redirect dev eth0
>>>
>>> (only do this on toy vm)
>>>
>>
>> Realize didnt respond to this. Seems very simple to fix:
>> if skb->dev->ifindex and m->tcfm_dev->ifindex are the
>> same, then you can drop the packet.
> 
> Yes, but I think we get same issue when we deal with stacked
> interfaces, and redirect is to e.g. vlan on top of physical device.

We do have the adjacent upper lists in all netdevices, calculating if a
mirred actions would insert the skb on a stacked device above us should
be as easy as querying netdev_has_upper_dev and should be possible to
check that during config time.

> And we have such loops even without tc, for instance when placing
> both veth ends in same bridge :-(

We can't fix that without a ttl in the sk_buff struct.

Bye,
Hannes

Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions

2016-09-25 Thread Cong Wang

On Sun, Sep 25, 2016 at 10:59 AM, Shmulik Ladkani
 wrote:
> Hi,
>
> On Sat, 24 Sep 2016 17:07:12 -0700 Cong Wang  wrote:
>> One problem to use your code for us is that, the RX side of veth
>> is inside containers, not visible to outside, perhaps we need some
>> more parameter to tell the netns before the device name/index?
>> Thoughts?
>
> Well, this is way trickier...
>
> tc_mirred doesn't cope with netns movement of the target device.
> See 'mirred_device_event': upon NETDEV_UNREGISTER the 'tcfm_dev' gets
> nullified.
>
> (dev_change_net_namespace sequence includes NETDEV_UNREGISTER,
>  dev_net_set, NETDEV_REGISTER).
>
> As upposed to veth, which keeps the peer netdev pointer (since veth peers
> lifetime is coupled), here in act_mirred we can't easily distinguish a
> "real" NETDEV_UNREGISTER vs a namespace change...

Yeah, isolation is a barrier for this, so we probably can't use
this feature. But I still think it could be useful.

Thanks.

Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions

2016-09-25 Thread Cong Wang

On Sun, Sep 25, 2016 at 6:39 AM, Jamal Hadi Salim  wrote:
> On 16-09-24 08:07 PM, Cong Wang wrote:
>>
>> On Thu, Sep 22, 2016 at 10:11 PM, Shmulik Ladkani
>
>
>>
>> One problem to use your code for us is that, the RX side of veth
>> is inside containers, not visible to outside, perhaps we need some
>> more parameter to tell the netns before the device name/index?
>> Thoughts?
>>
>
> Intriguing - but this would apply for only veth?

Yes, my case is only about veth.

Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions

2016-09-25 Thread Jamal Hadi Salim


On 16-09-25 09:35 PM, Florian Westphal wrote:

Jamal Hadi Salim  wrote:




Realize didnt respond to this. Seems very simple to fix:
if skb->dev->ifindex and m->tcfm_dev->ifindex are the
same, then you can drop the packet.


Yes, but I think we get same issue when we deal with stacked
interfaces, and redirect is to e.g. vlan on top of physical device.

And we have such loops even without tc, for instance when placing
both veth ends in same bridge :-(



For egress->egress the xmit recursion should help (maybe an
audit needs to be done).
For the case Shmulik is trying to achieve I am not sure that
would help.
In general, catching such a loop (or broadcast), if cheap should
be attempted.

cheers,
jamal

Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions

2016-09-25 Thread Florian Westphal

Jamal Hadi Salim  wrote:
> On 16-09-25 02:31 PM, Florian Westphal wrote:
> >Shmulik Ladkani  wrote:
> >>We can later address any loop-detection improvements in mirred.
> >>WDYT?
> >
> >You can address this after fixing infamous spinlock recursion hard
> >lockup (which has existed forever):
> >
> >tc qdisc add dev eth0 root handle 1: prio
> >tc filter add dev eth0 parent 1: protocol ip u32 match u32 0 0 flowid
> >1:2 action mirred egress redirect dev eth0
> >
> >(only do this on toy vm)
> >
> 
> Realize didnt respond to this. Seems very simple to fix:
> if skb->dev->ifindex and m->tcfm_dev->ifindex are the
> same, then you can drop the packet.

Yes, but I think we get same issue when we deal with stacked
interfaces, and redirect is to e.g. vlan on top of physical device.

And we have such loops even without tc, for instance when placing
both veth ends in same bridge :-(

Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions

2016-09-25 Thread Jamal Hadi Salim


On 16-09-25 02:31 PM, Florian Westphal wrote:

Shmulik Ladkani  wrote:

We can later address any loop-detection improvements in mirred.
WDYT?


You can address this after fixing infamous spinlock recursion hard
lockup (which has existed forever):

tc qdisc add dev eth0 root handle 1: prio
tc filter add dev eth0 parent 1: protocol ip u32 match u32 0 0 flowid
1:2 action mirred egress redirect dev eth0

(only do this on toy vm)



Realize didnt respond to this. Seems very simple to fix:
if skb->dev->ifindex and m->tcfm_dev->ifindex are the
same, then you can drop the packet.
But it may be possible to reject earlier the policy
entirely earlier.
And true given this is egress->egress redirection
one could also check with xmit_recursion check.


cheers,
jamal

Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions

2016-09-25 Thread Jamal Hadi Salim


On 16-09-25 02:33 PM, Florian Westphal wrote:

Daniel Borkmann  wrote:

[..]


Why not just reuse xmit_recursion, which is what we did in tc cls_bpf
programs f.e. see __bpf_tx_skb()? Would be a pity to waste 3 bits on
this in the skb.


+1, don't (yet) understand why a per-skb counter is required for this.



If you could solve redirecting from ingress->egress with xmit_recursion
then we are set ;->

cheers,
jamal

Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions

2016-09-25 Thread Jamal Hadi Salim


On 16-09-25 01:33 PM, Shmulik Ladkani wrote:

On Sun, 25 Sep 2016 09:05:08 -0400 Jamal Hadi Salim  wrote:

On 16-09-23 11:40 AM, Shmulik Ladkani wrote:


[off topic]


I think this is still on topic!


Sorry, wasn't too clear on that.

What I meant is that _existing_ "egress redirect" already gets us into
crazy loops - the veth misconfig being just one example of, but
many more exist (many device stacking constructs, with lower dev issuing
an egress-redirect back to the topmost dev).



So this is stopped by the xmit_recursion Daniel mentioned, correct?


Point is, IMO loop detection (whether/how addressed), is orthogonal to
this series implementing "ingress redirect", and doesn't seem as a
strict prerequisite to adding the "ingress redirect" functionality to
act_mirred.

We can later address any loop-detection improvements in mirred.
WDYT?



If indeed the xmit_recursion solves the egress->egress problem then I
would suggest we need to address the egress->ingress issue.
BTW: plans to also address ingress->ingress?

cheers,
jamal

Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions

2016-09-25 Thread Jamal Hadi Salim


On 16-09-25 12:26 PM, Daniel Borkmann wrote:

On 09/25/2016 03:05 PM, Jamal Hadi Salim wrote:


[..]


MAX_RED_LOOP (stands for "Maximum Redirect loop") still exists in
current code. The idea above was that we would increment the rttl
counter  once and if we saw it again upto MAX_RED_LOOP we would assume
a loop and drop the packet (at the time i didnt think it was wise to
let the actions be in charge of setting the RTTL; it had to be central
core code - but it may not be neccessary)

Florian, when we discussed I said it was fine to reclaim those 3 bits
on tc verdict for RTTL at the time because i had taken out the
feature and never added it back. Your comment at the time was we can
add it back when someone shows up with the feature.
Shmulik is looking to add it.


Why not just reuse xmit_recursion, which is what we did in tc cls_bpf
programs f.e. see __bpf_tx_skb()? Would be a pity to waste 3 bits on
this in the skb.


If it is going to work, I'd be happy to save those bits.
xmit_recursion is going to prevent recursing into
dev_xmit(), no?

In our case we want to preventing looping of a singular
skb.

cheers,
jamal

Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions

2016-09-25 Thread Florian Westphal

Daniel Borkmann  wrote:
> On 09/25/2016 03:05 PM, Jamal Hadi Salim wrote:
> >MAX_RED_LOOP (stands for "Maximum Redirect loop") still exists in
> >current code. The idea above was that we would increment the rttl
> >counter  once and if we saw it again upto MAX_RED_LOOP we would assume
> >a loop and drop the packet (at the time i didnt think it was wise to
> >let the actions be in charge of setting the RTTL; it had to be central
> >core code - but it may not be neccessary)
> >
> >Florian, when we discussed I said it was fine to reclaim those 3 bits
> >on tc verdict for RTTL at the time because i had taken out the
> >feature and never added it back. Your comment at the time was we can
> >add it back when someone shows up with the feature.
> >Shmulik is looking to add it.
> 
> Why not just reuse xmit_recursion, which is what we did in tc cls_bpf
> programs f.e. see __bpf_tx_skb()? Would be a pity to waste 3 bits on
> this in the skb.

+1, don't (yet) understand why a per-skb counter is required for this.

Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions

2016-09-25 Thread Florian Westphal

Shmulik Ladkani  wrote:
> We can later address any loop-detection improvements in mirred.
> WDYT?

You can address this after fixing infamous spinlock recursion hard
lockup (which has existed forever):

tc qdisc add dev eth0 root handle 1: prio
tc filter add dev eth0 parent 1: protocol ip u32 match u32 0 0 flowid
1:2 action mirred egress redirect dev eth0

(only do this on toy vm)

Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions

2016-09-25 Thread Shmulik Ladkani

Hi,

On Sat, 24 Sep 2016 17:07:12 -0700 Cong Wang  wrote:
> One problem to use your code for us is that, the RX side of veth
> is inside containers, not visible to outside, perhaps we need some
> more parameter to tell the netns before the device name/index?
> Thoughts?

Well, this is way trickier...

tc_mirred doesn't cope with netns movement of the target device.
See 'mirred_device_event': upon NETDEV_UNREGISTER the 'tcfm_dev' gets
nullified.

(dev_change_net_namespace sequence includes NETDEV_UNREGISTER,
 dev_net_set, NETDEV_REGISTER).

As upposed to veth, which keeps the peer netdev pointer (since veth peers
lifetime is coupled), here in act_mirred we can't easily distinguish a
"real" NETDEV_UNREGISTER vs a namespace change...

Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions

2016-09-25 Thread Shmulik Ladkani

On Sun, 25 Sep 2016 09:05:08 -0400 Jamal Hadi Salim  wrote:
> On 16-09-23 11:40 AM, Shmulik Ladkani wrote:
> >
> > [off topic]
> 
> I think this is still on topic!

Sorry, wasn't too clear on that.

What I meant is that _existing_ "egress redirect" already gets us into
crazy loops - the veth misconfig being just one example of, but
many more exist (many device stacking constructs, with lower dev issuing
an egress-redirect back to the topmost dev).

Point is, IMO loop detection (whether/how addressed), is orthogonal to
this series implementing "ingress redirect", and doesn't seem as a
strict prerequisite to adding the "ingress redirect" functionality to
act_mirred.

We can later address any loop-detection improvements in mirred.
WDYT?

Thanks,
Shmulik

Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions

2016-09-25 Thread Daniel Borkmann


On 09/25/2016 03:05 PM, Jamal Hadi Salim wrote:

On 16-09-23 11:40 AM, Shmulik Ladkani wrote:

On Fri, 23 Sep 2016 08:48:33 -0400 Jamal Hadi Salim  wrote:

Even today, one may create loops using existing 'egress redirect',
e.g. this rediculously errorneous construct:

 # ip l add v0 type veth peer name v0p
 # tc filter add dev v0p parent : basic \
action mirred egress redirect dev v0


I think we actually recover from this one by eventually
dropping (theres a ttl field).


[off topic]

Don't know about that :) cpu fan got very noisy, 3 of 4 cores at 100%,
and after one second I got:

# ip -s l show type veth
16: v0p@v0:  mtu 1500 qdisc noqueue state UP 
mode DEFAULT group default qlen 1000
link/ether a2:64:ff:10:dd:85 brd ff:ff:ff:ff:ff:ff
RX: bytes  packets  errors  dropped overrun mcast
71660305923 469890864 0   0   0   0
TX: bytes  packets  errors  dropped carrier collsns
3509   24   0   0   0   0
17: v0@v0p:  mtu 1500 qdisc noqueue state UP 
mode DEFAULT group default qlen 1000
link/ether 52:a2:34:f6:7c:ec brd ff:ff:ff:ff:ff:ff
RX: bytes  packets  errors  dropped overrun mcast
3509   24   0   0   0   0
TX: bytes  packets  errors  dropped carrier collsns
71660713017 469893555 0   0   0   0



I think this is still on topic!

Now I realize that code we took out around 4.2.x is still useful
for such a use case (I wasnt thinking about veth when Florian was
slimming the skb);
+Cc Florian W.

This snippet from 4.2:
-
3525 static int ing_filter(struct sk_buff *skb, struct netdev_queue *rxq)
3526 {
3527 struct net_device *dev = skb->dev;
3528 u32 ttl = G_TC_RTTL(skb->tc_verd);
3529 int result = TC_ACT_OK;
3530 struct Qdisc *q;
3531
3532 if (unlikely(MAX_RED_LOOP < ttl++)) {
3533 net_warn_ratelimited("Redir loop detected Dropping packet 
(%d->%d)\n",
3534  skb->skb_iif, dev->ifindex);
3535 return TC_ACT_SHOT;
3536 }
3537
3538 skb->tc_verd = SET_TC_RTTL(skb->tc_verd, ttl);
3539 skb->tc_verd = SET_TC_AT(skb->tc_verd, AT_INGRESS);
3540
3541 q = rcu_dereference(rxq->qdisc);
3542 if (q != &noop_qdisc) {
3543 spin_lock(qdisc_lock(q));
3544 if (likely(!test_bit(__QDISC_STATE_DEACTIVATED, 
&q->state)))
3545 result = qdisc_enqueue_root(skb, q);
3546 spin_unlock(qdisc_lock(q));
3547 }
3548
3549 return result;
3550 }


MAX_RED_LOOP (stands for "Maximum Redirect loop") still exists in
current code. The idea above was that we would increment the rttl
counter  once and if we saw it again upto MAX_RED_LOOP we would assume
a loop and drop the packet (at the time i didnt think it was wise to
let the actions be in charge of setting the RTTL; it had to be central
core code - but it may not be neccessary)

Florian, when we discussed I said it was fine to reclaim those 3 bits
on tc verdict for RTTL at the time because i had taken out the
feature and never added it back. Your comment at the time was we can
add it back when someone shows up with the feature.
Shmulik is looking to add it.


Why not just reuse xmit_recursion, which is what we did in tc cls_bpf
programs f.e. see __bpf_tx_skb()? Would be a pity to waste 3 bits on
this in the skb.


Similarly to all constructs injecting skbs to device rx (bond/team,
vlan, macvlan, tunnels, ifb, __dev_forward_skb callers, etc..), we are
obligated to assign 'skb2->dev' as the new rx device.

Regarding 'skb2->skb_iif', original act_mirred code already has:

 skb2->skb_iif = skb->dev->ifindex;   <--- THIS IS ORIG DEV IIF
 skb2->dev = dev; <--- THIS IS TARGET DEV
err = dev_queue_xmit(skb2);

I'm preserving this; OTOH the suggested modification in the patch is

-err = dev_queue_xmit(skb2);
+if (tcf_mirred_act_direction(m->tcfm_eaction) & AT_EGRESS)
+err = dev_queue_xmit(skb2);
+else
+netif_receive_skb(skb2);

now, the call to 'netif_receive_skb' will eventually override skb_iif to
the target RX dev's index, upon entry to __netif_receive_skb_core.

I think this IS the expected behavior - as done by other "rx injection"
constructs.


Sounds fine.
I am wondering if we can have a tracing feature to show the lifetime of
the packet as it is being cycled around the kernel? It would help
debugging if some policy misbehaves.


My doubts were around whether we should call 'dev_forward_skb' instead
of 'netif_receive_skb'.
The former does some things I assumed we're not interested of, like
testing 'is_skb_forwardable' and re-running 'eth_type_trans'.
OTOH, it DOES scrub the skb.
Maybe we should scrub it as well prior the netif_receive_skb call?


Scrubbing the skb could be a bad idea if it gets rid of global state
like the RTTL if you add it back.

cheers,
jamal

Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions

2016-09-25 Thread Jamal Hadi Salim


On 16-09-24 08:07 PM, Cong Wang wrote:

On Thu, Sep 22, 2016 at 10:11 PM, Shmulik Ladkani




One problem to use your code for us is that, the RX side of veth
is inside containers, not visible to outside, perhaps we need some
more parameter to tell the netns before the device name/index?
Thoughts?



Intriguing - but this would apply for only veth?




It may be around preventing loops maybe.


Could be, but personally, I treat these constructs as (powerful)
building blocks, and "with great power comes great responsibility".

Even today, one may create loops using existing 'egress redirect',
e.g. this rediculously errorneous construct:

 # ip l add v0 type veth peer name v0p
 # tc filter add dev v0p parent : basic \
action mirred egress redirect dev v0


Detecting such loops should not be hard technically, like we do
for reclassification. We might need some bits in skb to detect
this specific case.


Note my other email. We had the feature but we took it out to save bits
on the skb.


cheers,
jamal

Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions

2016-09-25 Thread Jamal Hadi Salim


On 16-09-23 11:40 AM, Shmulik Ladkani wrote:

On Fri, 23 Sep 2016 08:48:33 -0400 Jamal Hadi Salim  wrote:

Even today, one may create loops using existing 'egress redirect',
e.g. this rediculously errorneous construct:

 # ip l add v0 type veth peer name v0p
 # tc filter add dev v0p parent : basic \
action mirred egress redirect dev v0


I think we actually recover from this one by eventually
dropping (theres a ttl field).


[off topic]

Don't know about that :) cpu fan got very noisy, 3 of 4 cores at 100%,
and after one second I got:

# ip -s l show type veth
16: v0p@v0:  mtu 1500 qdisc noqueue state UP 
mode DEFAULT group default qlen 1000
link/ether a2:64:ff:10:dd:85 brd ff:ff:ff:ff:ff:ff
RX: bytes  packets  errors  dropped overrun mcast
71660305923 469890864 0   0   0   0
TX: bytes  packets  errors  dropped carrier collsns
3509   24   0   0   0   0
17: v0@v0p:  mtu 1500 qdisc noqueue state UP 
mode DEFAULT group default qlen 1000
link/ether 52:a2:34:f6:7c:ec brd ff:ff:ff:ff:ff:ff
RX: bytes  packets  errors  dropped overrun mcast
3509   24   0   0   0   0
TX: bytes  packets  errors  dropped carrier collsns
71660713017 469893555 0   0   0   0



I think this is still on topic!

Now I realize that code we took out around 4.2.x is still useful
for such a use case (I wasnt thinking about veth when Florian was
slimming the skb);
+Cc Florian W.

This snippet from 4.2:
-
3525 static int ing_filter(struct sk_buff *skb, struct netdev_queue *rxq)
3526 {
3527 struct net_device *dev = skb->dev;
3528 u32 ttl = G_TC_RTTL(skb->tc_verd);
3529 int result = TC_ACT_OK;
3530 struct Qdisc *q;
3531
3532 if (unlikely(MAX_RED_LOOP < ttl++)) {
3533 net_warn_ratelimited("Redir loop detected Dropping 
packet (%d->%d)\n",

3534  skb->skb_iif, dev->ifindex);
3535 return TC_ACT_SHOT;
3536 }
3537
3538 skb->tc_verd = SET_TC_RTTL(skb->tc_verd, ttl);
3539 skb->tc_verd = SET_TC_AT(skb->tc_verd, AT_INGRESS);
3540
3541 q = rcu_dereference(rxq->qdisc);
3542 if (q != &noop_qdisc) {
3543 spin_lock(qdisc_lock(q));
3544 if (likely(!test_bit(__QDISC_STATE_DEACTIVATED, 
&q->state)))

3545 result = qdisc_enqueue_root(skb, q);
3546 spin_unlock(qdisc_lock(q));
3547 }
3548
3549 return result;
3550 }


MAX_RED_LOOP (stands for "Maximum Redirect loop") still exists in
current code. The idea above was that we would increment the rttl
counter  once and if we saw it again upto MAX_RED_LOOP we would assume
a loop and drop the packet (at the time i didnt think it was wise to
let the actions be in charge of setting the RTTL; it had to be central
core code - but it may not be neccessary)

Florian, when we discussed I said it was fine to reclaim those 3 bits
on tc verdict for RTTL at the time because i had taken out the
feature and never added it back. Your comment at the time was we can
add it back when someone shows up with the feature.
Shmulik is looking to add it.


Similarly to all constructs injecting skbs to device rx (bond/team,
vlan, macvlan, tunnels, ifb, __dev_forward_skb callers, etc..), we are
obligated to assign 'skb2->dev' as the new rx device.

Regarding 'skb2->skb_iif', original act_mirred code already has:

skb2->skb_iif = skb->dev->ifindex;   <--- THIS IS ORIG DEV IIF
skb2->dev = dev; <--- THIS IS TARGET DEV
err = dev_queue_xmit(skb2);

I'm preserving this; OTOH the suggested modification in the patch is

-   err = dev_queue_xmit(skb2);
+   if (tcf_mirred_act_direction(m->tcfm_eaction) & AT_EGRESS)
+   err = dev_queue_xmit(skb2);
+   else
+   netif_receive_skb(skb2);

now, the call to 'netif_receive_skb' will eventually override skb_iif to
the target RX dev's index, upon entry to __netif_receive_skb_core.

I think this IS the expected behavior - as done by other "rx injection"
constructs.



Sounds fine.
I am wondering if we can have a tracing feature to show the lifetime of
the packet as it is being cycled around the kernel? It would help
debugging if some policy misbehaves.


My doubts were around whether we should call 'dev_forward_skb' instead
of 'netif_receive_skb'.
The former does some things I assumed we're not interested of, like
testing 'is_skb_forwardable' and re-running 'eth_type_trans'.
OTOH, it DOES scrub the skb.
Maybe we should scrub it as well prior the netif_receive_skb call?



Scrubbing the skb could be a bad idea if it gets rid of global state
like the RTTL if you add it back.

cheers,
jamal

Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions

2016-09-24 Thread Cong Wang

On Fri, Sep 23, 2016 at 8:40 AM, Shmulik Ladkani
 wrote:
> On Fri, 23 Sep 2016 08:48:33 -0400 Jamal Hadi Salim  wrote:
>> > Even today, one may create loops using existing 'egress redirect',
>> > e.g. this rediculously errorneous construct:
>> >
>> >  # ip l add v0 type veth peer name v0p
>> >  # tc filter add dev v0p parent : basic \
>> > action mirred egress redirect dev v0
>>
>> I think we actually recover from this one by eventually
>> dropping (theres a ttl field).
>
> [off topic]
>
> Don't know about that :) cpu fan got very noisy, 3 of 4 cores at 100%,
> and after one second I got:
>
> # ip -s l show type veth
> 16: v0p@v0:  mtu 1500 qdisc noqueue state UP 
> mode DEFAULT group default qlen 1000
> link/ether a2:64:ff:10:dd:85 brd ff:ff:ff:ff:ff:ff
> RX: bytes  packets  errors  dropped overrun mcast
> 71660305923 469890864 0   0   0   0
> TX: bytes  packets  errors  dropped carrier collsns
> 3509   24   0   0   0   0
> 17: v0@v0p:  mtu 1500 qdisc noqueue state UP 
> mode DEFAULT group default qlen 1000
> link/ether 52:a2:34:f6:7c:ec brd ff:ff:ff:ff:ff:ff
> RX: bytes  packets  errors  dropped overrun mcast
> 3509   24   0   0   0   0
> TX: bytes  packets  errors  dropped carrier collsns
> 71660713017 469893555 0   0   0   0

These ghost packets never enter IP stack, I don't think TTL
helps.

Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions

2016-09-24 Thread Cong Wang

On Thu, Sep 22, 2016 at 10:11 PM, Shmulik Ladkani
 wrote:
> Was wondering why it's missing, googled a bit with no meaningful
> results, so speculated the following:
>
> Some time long ago, initial 'mirred' purpose was to facilitate ifb.
> Therefore 'egress redirect' was implemented. Jamal probably left the
> 'ingress' support for a later time :)
>
> One interesting usecase for 'ingress redirect' is creating "rx bouncing"
> construct (like macvlan/macvtap/ipvlan) but applied according to custom
> logic.

We have done this for our containers for a long time. We simply
redirect packets to veth TX then flow to veth RX of course.

One problem to use your code for us is that, the RX side of veth
is inside containers, not visible to outside, perhaps we need some
more parameter to tell the netns before the device name/index?
Thoughts?

>
>> It may be around preventing loops maybe.
>
> Could be, but personally, I treat these constructs as (powerful)
> building blocks, and "with great power comes great responsibility".
>
> Even today, one may create loops using existing 'egress redirect',
> e.g. this rediculously errorneous construct:
>
>  # ip l add v0 type veth peer name v0p
>  # tc filter add dev v0p parent : basic \
> action mirred egress redirect dev v0

Detecting such loops should not be hard technically, like we do
for reclassification. We might need some bits in skb to detect
this specific case. Anyway, I don't think it is a blocker, just need
more tests to catch some corner cases.

Thanks.

Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions

2016-09-24 Thread Cong Wang

On Thu, Sep 22, 2016 at 6:21 AM, Shmulik Ladkani
 wrote:
> From: Shmulik Ladkani 
>
> Up until now, 'action mirred' supported only egress actions (either
> TCA_EGRESS_REDIR or TCA_EGRESS_MIRROR).
>
> This patch implements the corresponding ingress actions
> TCA_INGRESS_REDIR and TCA_INGRESS_MIRROR.
>
> This allows attaching filters whose target is to hand matching skbs into
> the rx processing of a specified device.

I like this idea, this idea actually came to my mind when we
tried to redirect packets from eth0 to veth device in containers.
I remember I already brought this up to Jamal before (either personally
or publicly), but I forgot why I stopped.

This would reduce some latency for our Mesos containers, we
would just skip one veth TX in our scenario.

Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions

2016-09-23 Thread Shmulik Ladkani

On Fri, 23 Sep 2016 08:48:33 -0400 Jamal Hadi Salim  wrote:
> > Even today, one may create loops using existing 'egress redirect',
> > e.g. this rediculously errorneous construct:
> >
> >  # ip l add v0 type veth peer name v0p
> >  # tc filter add dev v0p parent : basic \
> > action mirred egress redirect dev v0
> 
> I think we actually recover from this one by eventually
> dropping (theres a ttl field).

[off topic]

Don't know about that :) cpu fan got very noisy, 3 of 4 cores at 100%,
and after one second I got:

# ip -s l show type veth
16: v0p@v0:  mtu 1500 qdisc noqueue state UP 
mode DEFAULT group default qlen 1000
link/ether a2:64:ff:10:dd:85 brd ff:ff:ff:ff:ff:ff
RX: bytes  packets  errors  dropped overrun mcast   
71660305923 469890864 0   0   0   0   
TX: bytes  packets  errors  dropped carrier collsns 
3509   24   0   0   0   0   
17: v0@v0p:  mtu 1500 qdisc noqueue state UP 
mode DEFAULT group default qlen 1000
link/ether 52:a2:34:f6:7c:ec brd ff:ff:ff:ff:ff:ff
RX: bytes  packets  errors  dropped overrun mcast   
3509   24   0   0   0   0   
TX: bytes  packets  errors  dropped carrier collsns 
71660713017 469893555 0   0   0   0

> The other question is what to set skb->dev and skb->iif?
> Some information will be lost if you move around netdevs a
> bit.

[back to topic]

Good point.

Similarly to all constructs injecting skbs to device rx (bond/team,
vlan, macvlan, tunnels, ifb, __dev_forward_skb callers, etc..), we are
obligated to assign 'skb2->dev' as the new rx device.

Regarding 'skb2->skb_iif', original act_mirred code already has:

skb2->skb_iif = skb->dev->ifindex;   <--- THIS IS ORIG DEV IIF
skb2->dev = dev; <--- THIS IS TARGET DEV
err = dev_queue_xmit(skb2);

I'm preserving this; OTOH the suggested modification in the patch is

-   err = dev_queue_xmit(skb2);
+   if (tcf_mirred_act_direction(m->tcfm_eaction) & AT_EGRESS)
+   err = dev_queue_xmit(skb2);
+   else
+   netif_receive_skb(skb2);

now, the call to 'netif_receive_skb' will eventually override skb_iif to
the target RX dev's index, upon entry to __netif_receive_skb_core.

I think this IS the expected behavior - as done by other "rx injection"
constructs.

My doubts were around whether we should call 'dev_forward_skb' instead
of 'netif_receive_skb'.
The former does some things I assumed we're not interested of, like
testing 'is_skb_forwardable' and re-running 'eth_type_trans'.
OTOH, it DOES scrub the skb.
Maybe we should scrub it as well prior the netif_receive_skb call?

Thanks,
Shmulik

Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions

2016-09-23 Thread Jamal Hadi Salim


On 16-09-23 01:11 AM, Shmulik Ladkani wrote:

Hi,

On Thu, 22 Sep 2016 19:40:15 -0400 Jamal Hadi Salim  wrote:

On 16-09-22 09:21 AM, Shmulik Ladkani wrote:

From: Shmulik Ladkani 

Up until now, 'action mirred' supported only egress actions (either
TCA_EGRESS_REDIR or TCA_EGRESS_MIRROR).

This patch implements the corresponding ingress actions
TCA_INGRESS_REDIR and TCA_INGRESS_MIRROR.

This allows attaching filters whose target is to hand matching skbs into
the rx processing of a specified device.


Thank you for doing this. There was something that made me remove
initial support for this feature - I am blanking out right now but
will find my notes and give more details.


Thanks Jamal, appreciate any details.

Was wondering why it's missing, googled a bit with no meaningful
results, so speculated the following:

Some time long ago, initial 'mirred' purpose was to facilitate ifb.
Therefore 'egress redirect' was implemented. Jamal probably left the
'ingress' support for a later time :)



History is mirror/redirect were first introduced to do just those
plain vanilla-free features. IFB came later. Up until recently there
were still some bits to support the ingress features that were removed
by Florian W. to save some skb bits.



One interesting usecase for 'ingress redirect' is creating "rx bouncing"
construct (like macvlan/macvtap/ipvlan) but applied according to custom
logic.



I thought that was the motivation as well.


It may be around preventing loops maybe.


Could be, but personally, I treat these constructs as (powerful)
building blocks, and "with great power comes great responsibility".



Amen.
I am a believer in let-the-user-shoot-their-big-toe-if-they-want.



Even today, one may create loops using existing 'egress redirect',
e.g. this rediculously errorneous construct:

 # ip l add v0 type veth peer name v0p
 # tc filter add dev v0p parent : basic \
action mirred egress redirect dev v0



I think we actually recover from this one by eventually
dropping (theres a ttl field). We should at least not lock
the kernel forever.
The other question is what to set skb->dev and skb->iif?
Some information will be lost if you move around netdevs a
bit.

BTW: You have motivated me to start looking again at redirect
to socket that was also left out. I am getting tired of redirecting
to tuntap with all its bells and whistles.

cheers,
jamal

Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions

2016-09-22 Thread Shmulik Ladkani

Hi,

On Thu, 22 Sep 2016 19:40:15 -0400 Jamal Hadi Salim  wrote:
> On 16-09-22 09:21 AM, Shmulik Ladkani wrote:
> > From: Shmulik Ladkani 
> >
> > Up until now, 'action mirred' supported only egress actions (either
> > TCA_EGRESS_REDIR or TCA_EGRESS_MIRROR).
> >
> > This patch implements the corresponding ingress actions
> > TCA_INGRESS_REDIR and TCA_INGRESS_MIRROR.
> >
> > This allows attaching filters whose target is to hand matching skbs into
> > the rx processing of a specified device.
> 
> Thank you for doing this. There was something that made me remove
> initial support for this feature - I am blanking out right now but
> will find my notes and give more details.

Thanks Jamal, appreciate any details.

Was wondering why it's missing, googled a bit with no meaningful
results, so speculated the following:

Some time long ago, initial 'mirred' purpose was to facilitate ifb.
Therefore 'egress redirect' was implemented. Jamal probably left the
'ingress' support for a later time :)

One interesting usecase for 'ingress redirect' is creating "rx bouncing"
construct (like macvlan/macvtap/ipvlan) but applied according to custom
logic.

> It may be around preventing loops maybe.

Could be, but personally, I treat these constructs as (powerful)
building blocks, and "with great power comes great responsibility".

Even today, one may create loops using existing 'egress redirect',
e.g. this rediculously errorneous construct:

 # ip l add v0 type veth peer name v0p 
 # tc filter add dev v0p parent : basic \
action mirred egress redirect dev v0

Regards,
Shmulik

Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions

2016-09-22 Thread Jamal Hadi Salim


On 16-09-22 09:21 AM, Shmulik Ladkani wrote:

From: Shmulik Ladkani 

Up until now, 'action mirred' supported only egress actions (either
TCA_EGRESS_REDIR or TCA_EGRESS_MIRROR).

This patch implements the corresponding ingress actions
TCA_INGRESS_REDIR and TCA_INGRESS_MIRROR.

This allows attaching filters whose target is to hand matching skbs into
the rx processing of a specified device.



Thank you for doing this. There was something that made me remove
initial support for this feature - I am blanking out right now but
will find my notes and give more details. It may be around preventing
loops maybe. If that was the thought then:
I am just wondering is there a use case for a packet that is redirected
from egress ethx to ingress of ethy that then requires ingress of ethy
classify? Otherwise you could just set the "dont classify" flag.
i.e SET_TC_NCLS()

cheers,
jamal

Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions

2016-09-22 Thread Eric Dumazet

On Thu, 2016-09-22 at 21:27 +0300, Shmulik Ladkani wrote:
> On Thu, 22 Sep 2016 07:54:13 -0700 Eric Dumazet  
> wrote:
> > Hmm... we probably need to apply the full rcu protection before this
> > patch.
> > 
> > https://patchwork.ozlabs.org/patch/667680/
> 
> Are you referring to order of application into net-next?
> 
> This patch seems to present no new tcf_mirred_params members nor
> need-to-be-protected code regions (please, correct me if wrong).
> So it does not _depend_ on the 'full rcu fixes', does it?

No, simply a reminder that we run lockless there, so you might need to
read some control variables once, and in a consistent way.

(Or a concurrent writer could change params in the middle of the
function)

Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions

2016-09-22 Thread Shmulik Ladkani

On Thu, 22 Sep 2016 07:54:13 -0700 Eric Dumazet  wrote:
> Hmm... we probably need to apply the full rcu protection before this
> patch.
> 
> https://patchwork.ozlabs.org/patch/667680/

Are you referring to order of application into net-next?

This patch seems to present no new tcf_mirred_params members nor
need-to-be-protected code regions (please, correct me if wrong).
So it does not _depend_ on the 'full rcu fixes', does it?

Thanks,
Shmulik

Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions

2016-09-22 Thread Eric Dumazet

On Thu, 2016-09-22 at 16:21 +0300, Shmulik Ladkani wrote:
> From: Shmulik Ladkani 
> 
> Up until now, 'action mirred' supported only egress actions (either
> TCA_EGRESS_REDIR or TCA_EGRESS_MIRROR).
> 
> This patch implements the corresponding ingress actions
> TCA_INGRESS_REDIR and TCA_INGRESS_MIRROR.
> 
> This allows attaching filters whose target is to hand matching skbs into
> the rx processing of a specified device.
> 
> Signed-off-by: Shmulik Ladkani 
> Cc: Jamal Hadi Salim 
> ---
>  Was wondering, whether netif_receive_skb or dev_forward_skb should be
>  used for the rx bouncing. Used netif_receive_skb as in ifb device.

Hmm... we probably need to apply the full rcu protection before this
patch.

https://patchwork.ozlabs.org/patch/667680/

[PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions

2016-09-22 Thread Shmulik Ladkani

From: Shmulik Ladkani 

Up until now, 'action mirred' supported only egress actions (either
TCA_EGRESS_REDIR or TCA_EGRESS_MIRROR).

This patch implements the corresponding ingress actions
TCA_INGRESS_REDIR and TCA_INGRESS_MIRROR.

This allows attaching filters whose target is to hand matching skbs into
the rx processing of a specified device.

Signed-off-by: Shmulik Ladkani 
Cc: Jamal Hadi Salim 
---
 Was wondering, whether netif_receive_skb or dev_forward_skb should be
 used for the rx bouncing. Used netif_receive_skb as in ifb device.

 net/sched/act_mirred.c | 48 ++--
 1 file changed, 42 insertions(+), 6 deletions(-)

diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
index 28629d3..942120e 100644
--- a/net/sched/act_mirred.c
+++ b/net/sched/act_mirred.c
@@ -33,6 +33,25 @@
 static LIST_HEAD(mirred_list);
 static DEFINE_SPINLOCK(mirred_list_lock);
 
+static bool tcf_mirred_is_act_redirect(int action)
+{
+   return action == TCA_EGRESS_REDIR || action == TCA_INGRESS_REDIR;
+}
+
+static u32 tcf_mirred_act_direction(int action)
+{
+   switch (action) {
+   case TCA_EGRESS_REDIR:
+   case TCA_EGRESS_MIRROR:
+   return AT_EGRESS;
+   case TCA_INGRESS_REDIR:
+   case TCA_INGRESS_MIRROR:
+   return AT_INGRESS;
+   default:
+   BUG();
+   }
+}
+
 static void tcf_mirred_release(struct tc_action *a, int bind)
 {
struct tcf_mirred *m = to_mirred(a);
@@ -96,6 +115,8 @@ static int tcf_mirred_init(struct net *net, struct nlattr 
*nla,
switch (parm->eaction) {
case TCA_EGRESS_MIRROR:
case TCA_EGRESS_REDIR:
+   case TCA_INGRESS_REDIR:
+   case TCA_INGRESS_MIRROR:
break;
default:
if (exists)
@@ -157,7 +178,8 @@ static int tcf_mirred(struct sk_buff *skb, const struct 
tc_action *a,
struct tcf_mirred *m = to_mirred(a);
struct net_device *dev;
struct sk_buff *skb2;
-   int retval, err;
+   int retval, err = 0;
+   int mac_len;
u32 at;
 
tcf_lastuse_update(&m->tcf_tm);
@@ -182,23 +204,37 @@ static int tcf_mirred(struct sk_buff *skb, const struct 
tc_action *a,
if (!skb2)
goto out;
 
-   if (!(at & AT_EGRESS)) {
-   if (m->tcfm_mac_header_xmit)
+   /* If action's target direction differs than filter's direction,
+* and devices expect a mac header on xmit, then mac push/pull is
+* needed.
+*/
+   if (at != tcf_mirred_act_direction(m->tcfm_eaction) &&
+   m->tcfm_mac_header_xmit) {
+   if (at & AT_EGRESS) {
+   /* caught at egress, act ingress: pull mac */
+   mac_len = skb_network_header(skb) - skb_mac_header(skb);
+   skb_pull_rcsum(skb2, mac_len);
+   } else {
+   /* caught at ingress, act egress: push mac */
skb_push_rcsum(skb2, skb->mac_len);
+   }
}
 
/* mirror is always swallowed */
-   if (m->tcfm_eaction != TCA_EGRESS_MIRROR)
+   if (tcf_mirred_is_act_redirect(m->tcfm_eaction))
skb2->tc_verd = SET_TC_FROM(skb2->tc_verd, at);
 
skb2->skb_iif = skb->dev->ifindex;
skb2->dev = dev;
-   err = dev_queue_xmit(skb2);
+   if (tcf_mirred_act_direction(m->tcfm_eaction) & AT_EGRESS)
+   err = dev_queue_xmit(skb2);
+   else
+   netif_receive_skb(skb2);
 
if (err) {
 out:
qstats_overlimit_inc(this_cpu_ptr(m->common.cpu_qstats));
-   if (m->tcfm_eaction != TCA_EGRESS_MIRROR)
+   if (tcf_mirred_is_act_redirect(m->tcfm_eaction))
retval = TC_ACT_SHOT;
}
rcu_read_unlock();
-- 
1.9.1

38 matches

Mail list logo