Re: [PATCH net-next v3 4/5] net/tc: introduce TC_ACT_REINJECT.

2018-07-26 Thread Cong Wang
On Thu, Jul 26, 2018 at 5:52 AM Jamal Hadi Salim  wrote:
>
> On 25/07/18 01:09 PM, Marcelo Ricardo Leitner wrote:
> > On Wed, Jul 25, 2018 at 09:48:16AM -0700, Cong Wang wrote:
> >> On Wed, Jul 25, 2018 at 5:27 AM Jamal Hadi Salim  wrote:
> >>>
> >>> Those changes were there from the beginning (above patch did
> >>> not introduce them).
> >>> IIRC, the reason was to distinguish between policy intended
> >>> drops and drops because of errors.
> >>
> >> There must be a limit for "overlimit" to make sense. There is
> >> no limit in mirred action's context, probably there is only
> >> such a limit in act_police. So, all rest should not touch overlimit.
> >
> > +1
> >
>
> I agree we should at least record drop count(unrelated patch though).
> we should keep overlimit (for no other reason other than this
> has been around for at least 15 years).
>
> On why "overlimit"? It is just a name for a counter that is useless
> for most actions (but was still being transfered to user space).
> It is the closest counter to counting "this failed because of
> runtime errors" as opposed to "user asked us to drop this".
>
> Probably a good alternative is to make a very small stats v3 structure
> (we have migrated stats structures before) and extend for
> each action/classifier/qdisc to add its extra counters using XSTATS.

Agreed.

>
> Note:
> If you are _mirroring_ packets - incrementing the drop counter is
> misleading because the packet is not dropped by the system.
> i.e the qdisc will not record it as dropped; it should for
> redirect policy.  It is useful to be able to tell
> them apart when you are collecting  analytics just for actions.

Sounds like we just need another counter rather than re-using overlimit
or drops.


> (if youve worked on a massive amount of machines you'll appreciate
> being able to debug by looking at counters that reduce ambiguity).
>

Yes, this is how I found out the overlimit of htb qdisc is inaccurate
or misleading, instead the one in htb class is accurate, see:

commit 3c75f6ee139d464351f8ab77a042dd3635769d98
Author: Eric Dumazet 
Date:   Mon Sep 18 12:36:22 2017 -0700

net_sched: sch_htb: add per class overlimits counter


Re: [PATCH net-next v3 4/5] net/tc: introduce TC_ACT_REINJECT.

2018-07-26 Thread Jamal Hadi Salim

On 25/07/18 01:09 PM, Marcelo Ricardo Leitner wrote:

On Wed, Jul 25, 2018 at 09:48:16AM -0700, Cong Wang wrote:

On Wed, Jul 25, 2018 at 5:27 AM Jamal Hadi Salim  wrote:


Those changes were there from the beginning (above patch did
not introduce them).
IIRC, the reason was to distinguish between policy intended
drops and drops because of errors.


There must be a limit for "overlimit" to make sense. There is
no limit in mirred action's context, probably there is only
such a limit in act_police. So, all rest should not touch overlimit.


+1



I agree we should at least record drop count(unrelated patch though).
we should keep overlimit (for no other reason other than this
has been around for at least 15 years).

On why "overlimit"? It is just a name for a counter that is useless
for most actions (but was still being transfered to user space).
It is the closest counter to counting "this failed because of
runtime errors" as opposed to "user asked us to drop this".

Probably a good alternative is to make a very small stats v3 structure
(we have migrated stats structures before) and extend for
each action/classifier/qdisc to add its extra counters using XSTATS.

Note:
If you are _mirroring_ packets - incrementing the drop counter is
misleading because the packet is not dropped by the system.
i.e the qdisc will not record it as dropped; it should for
redirect policy.  It is useful to be able to tell
them apart when you are collecting  analytics just for actions.
(if youve worked on a massive amount of machines you'll appreciate
being able to debug by looking at counters that reduce ambiguity).

cheers,
jamal


Re: [PATCH net-next v3 4/5] net/tc: introduce TC_ACT_REINJECT.

2018-07-25 Thread Marcelo Ricardo Leitner
On Wed, Jul 25, 2018 at 09:48:16AM -0700, Cong Wang wrote:
> On Wed, Jul 25, 2018 at 5:27 AM Jamal Hadi Salim  wrote:
> >
> > Those changes were there from the beginning (above patch did
> > not introduce them).
> > IIRC, the reason was to distinguish between policy intended
> > drops and drops because of errors.
> 
> There must be a limit for "overlimit" to make sense. There is
> no limit in mirred action's context, probably there is only
> such a limit in act_police. So, all rest should not touch overlimit.

+1



Re: [PATCH net-next v3 4/5] net/tc: introduce TC_ACT_REINJECT.

2018-07-25 Thread Cong Wang
On Wed, Jul 25, 2018 at 5:27 AM Jamal Hadi Salim  wrote:
>
> Those changes were there from the beginning (above patch did
> not introduce them).
> IIRC, the reason was to distinguish between policy intended
> drops and drops because of errors.

There must be a limit for "overlimit" to make sense. There is
no limit in mirred action's context, probably there is only
such a limit in act_police. So, all rest should not touch overlimit.


Re: [PATCH net-next v3 4/5] net/tc: introduce TC_ACT_REINJECT.

2018-07-25 Thread Jamal Hadi Salim

On 25/07/18 10:24 AM, Paolo Abeni wrote:

On Wed, 2018-07-25 at 08:27 -0400, Jamal Hadi Salim wrote:



Those changes were there from the beginning (above patch did
not introduce them).
IIRC, the reason was to distinguish between policy intended
drops and drops because of errors.


Double-checking to avoid misinterepration on my side: you are ok with
keeping the 'overlimits' increment, right?


Yes.

cheers,
jamal


Re: [PATCH net-next v3 4/5] net/tc: introduce TC_ACT_REINJECT.

2018-07-25 Thread Paolo Abeni
On Wed, 2018-07-25 at 08:27 -0400, Jamal Hadi Salim wrote:
> On 25/07/18 04:29 AM, Paolo Abeni wrote:
> > On Tue, 2018-07-24 at 13:50 -0700, Cong Wang wrote:
> 
> [..]
> > > > I fail to understand why overlimit is increased in your case
> > > > here. I guess you want to increase 'drops' instead.
> > > 
> > > Hmm, actually the current mirred code increases overlimit too.
> > > But I still don't think it makes sense.
> > 
> > Yep, I chose to increment 'overlimits' to preserve the current mirred
> > semantic.
> > 
> > AFAICS, that was first introduced with:
> > 
> > commit 8919bc13e8d92c5b082c5c0321567383a071f5bc
> > Author: Jamal Hadi Salim 
> > Date:   Mon Aug 15 05:25:40 2011 +
> > 
> >  net_sched: fix port mirror/redirect stats reporting
> > 
> > Likely increasing 'drops' would be "better", but I'm unsure we can
> > change this established behavior without affecting some user.
> > 
> 
> Those changes were there from the beginning (above patch did
> not introduce them).
> IIRC, the reason was to distinguish between policy intended
> drops and drops because of errors.

Double-checking to avoid misinterepration on my side: you are ok with
keeping the 'overlimits' increment, right?

Thanks,

Paolo 



Re: [PATCH net-next v3 4/5] net/tc: introduce TC_ACT_REINJECT.

2018-07-25 Thread Paolo Abeni
Hi,

On Wed, 2018-07-25 at 08:16 -0400, Jamal Hadi Salim wrote:
> +Cc Shmulik
> 
> Paolo - please also run the tdc tests (and add anymore if you
> feel they dont do coverage to your changes)

I run successfully tdc tests on a patched before posting. I plan to
rerun them before posting the v4.

> On 24/07/18 04:06 PM, Paolo Abeni wrote:
> > This is similar TC_ACT_REDIRECT, but with a slightly different
> > semantic:
> > - on ingress the mirred skbs are passed to the target device
> > network stack without any additional check not scrubbing.
> > - the rcu-protected stats provided via the tcf_result struct
> >are updated on error conditions.
> > 
> > This new tcfa_action value is not exposed to the user-space
> > and can be used only internally by clsact.
> > 
> > v1 -> v2: do not touch TC_ACT_REDIRECT code path, introduce
> >   a new action type instead
> > 
> > v2 -> v3:
> >   - rename the new action value TC_ACT_REINJECT, update the
> > helper accordingly
> >   - take care of uncloned reinjected packets in XDP generic
> > hook
> > 
> > Signed-off-by: Paolo Abeni 
> > ---
> >   include/net/pkt_cls.h |  3 +++
> >   include/net/sch_generic.h | 19 +++
> >   net/core/dev.c|  6 +-
> >   3 files changed, 27 insertions(+), 1 deletion(-)
> > 
> > diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
> > index 2081e4219f81..36ccfe2a303a 100644
> > --- a/include/net/pkt_cls.h
> > +++ b/include/net/pkt_cls.h
> > @@ -7,6 +7,9 @@
> >   #include 
> >   #include 
> >   
> > +/* TC action not accessible from user space */
> > +#define TC_ACT_REINJECT(TC_ACT_VALUE_MAX + 1)
> 
> Lets say in the future we add a new opcode.
> Will old kernel, new iproute2 (new value) work?

It will works as it currently does in similar situation: opcode unknown
to the kernel are treated as TC_ACT_UNSPEC even now. 

Cheers,

Paolo



Re: [PATCH net-next v3 4/5] net/tc: introduce TC_ACT_REINJECT.

2018-07-25 Thread Jiri Pirko
Wed, Jul 25, 2018 at 02:16:12PM CEST, j...@mojatatu.com wrote:

[...]

>> diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
>> index 2081e4219f81..36ccfe2a303a 100644
>> --- a/include/net/pkt_cls.h
>> +++ b/include/net/pkt_cls.h
>> @@ -7,6 +7,9 @@
>>   #include 
>>   #include 
>> +/* TC action not accessible from user space */
>> +#define TC_ACT_REINJECT (TC_ACT_VALUE_MAX + 1)
>
>Lets say in the future we add a new opcode.
>Will old kernel, new iproute2 (new value) work?

This is safe. See patch #2.


Re: [PATCH net-next v3 4/5] net/tc: introduce TC_ACT_REINJECT.

2018-07-25 Thread Jiri Pirko
Tue, Jul 24, 2018 at 10:06:42PM CEST, pab...@redhat.com wrote:
>This is similar TC_ACT_REDIRECT, but with a slightly different
>semantic:
>- on ingress the mirred skbs are passed to the target device
>network stack without any additional check not scrubbing.
>- the rcu-protected stats provided via the tcf_result struct
>  are updated on error conditions.
>
>This new tcfa_action value is not exposed to the user-space
>and can be used only internally by clsact.
>
>v1 -> v2: do not touch TC_ACT_REDIRECT code path, introduce
> a new action type instead
>
>v2 -> v3:
> - rename the new action value TC_ACT_REINJECT, update the
>   helper accordingly
> - take care of uncloned reinjected packets in XDP generic
>   hook
>
>Signed-off-by: Paolo Abeni 
>---
> include/net/pkt_cls.h |  3 +++
> include/net/sch_generic.h | 19 +++
> net/core/dev.c|  6 +-
> 3 files changed, 27 insertions(+), 1 deletion(-)
>
>diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
>index 2081e4219f81..36ccfe2a303a 100644
>--- a/include/net/pkt_cls.h
>+++ b/include/net/pkt_cls.h
>@@ -7,6 +7,9 @@
> #include 
> #include 
> 
>+/* TC action not accessible from user space */
>+#define TC_ACT_REINJECT   (TC_ACT_VALUE_MAX + 1)
>+
> /* Basic packet classifier frontend definitions. */
> 
> struct tcf_walker {
>diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
>index 056dc1083aa3..95e81a70f549 100644
>--- a/include/net/sch_generic.h
>+++ b/include/net/sch_generic.h
>@@ -235,6 +235,12 @@ struct tcf_result {
>   u32 classid;
>   };
>   const struct tcf_proto *goto_tp;
>+
>+  /* used by the TC_ACT_REINJECT action */
>+  struct {
>+  boolingress;
>+  struct gnet_stats_queue *qstats;
>+  };
>   };
> };
> 
>@@ -1091,4 +1097,17 @@ void mini_qdisc_pair_swap(struct mini_Qdisc_pair 
>*miniqp,
> void mini_qdisc_pair_init(struct mini_Qdisc_pair *miniqp, struct Qdisc *qdisc,
> struct mini_Qdisc __rcu **p_miniq);
> 
>+static inline void skb_tc_reinject(struct sk_buff *skb, struct tcf_result 
>*res)
>+{
>+  struct gnet_stats_queue *stats = res->qstats;
>+  int ret;
>+
>+  if (res->ingress)
>+  ret = netif_receive_skb(skb);
>+  else
>+  ret = dev_queue_xmit(skb);

Hmm. "reinject" by the name tells me that the packet should be injected
again. By "inject", I understand beginning of the rx path. However, this
does xmit as well :/ It is a bit misleading. Maybe "reinsert" would
sound better?



>+  if (ret && stats)
>+  qstats_overlimit_inc(res->qstats);
>+}
>+
> #endif
>diff --git a/net/core/dev.c b/net/core/dev.c
>index 14a748ee8cc9..826ec74fe1d9 100644
>--- a/net/core/dev.c
>+++ b/net/core/dev.c
>@@ -4252,7 +4252,7 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
>   /* Reinjected packets coming from act_mirred or similar should
>* not get XDP generic processing.
>*/
>-  if (skb_cloned(skb))
>+  if (skb_cloned(skb) || skb->tc_redirected)
>   return XDP_PASS;
> 
>   /* XDP packets must be linear and must have sufficient headroom
>@@ -4602,6 +4602,10 @@ sch_handle_ingress(struct sk_buff *skb, struct 
>packet_type **pt_prev, int *ret,
>   __skb_push(skb, skb->mac_len);
>   skb_do_redirect(skb);
>   return NULL;
>+  case TC_ACT_REINJECT:
>+  /* this does not scrub the packet, and updates stats on error */
>+  skb_tc_reinject(skb, _res);
>+  return NULL;
>   default:
>   break;
>   }
>-- 
>2.17.1
>


Re: [PATCH net-next v3 4/5] net/tc: introduce TC_ACT_REINJECT.

2018-07-25 Thread Jamal Hadi Salim

On 25/07/18 04:29 AM, Paolo Abeni wrote:

On Tue, 2018-07-24 at 13:50 -0700, Cong Wang wrote:


[..]

I fail to understand why overlimit is increased in your case
here. I guess you want to increase 'drops' instead.


Hmm, actually the current mirred code increases overlimit too.
But I still don't think it makes sense.


Yep, I chose to increment 'overlimits' to preserve the current mirred
semantic.

AFAICS, that was first introduced with:

commit 8919bc13e8d92c5b082c5c0321567383a071f5bc
Author: Jamal Hadi Salim 
Date:   Mon Aug 15 05:25:40 2011 +

 net_sched: fix port mirror/redirect stats reporting

Likely increasing 'drops' would be "better", but I'm unsure we can
change this established behavior without affecting some user.



Those changes were there from the beginning (above patch did
not introduce them).
IIRC, the reason was to distinguish between policy intended
drops and drops because of errors.

cheers,
jamal



Re: [PATCH net-next v3 4/5] net/tc: introduce TC_ACT_REINJECT.

2018-07-25 Thread Jamal Hadi Salim

+Cc Shmulik

Paolo - please also run the tdc tests (and add anymore if you
feel they dont do coverage to your changes)

On 24/07/18 04:06 PM, Paolo Abeni wrote:

This is similar TC_ACT_REDIRECT, but with a slightly different
semantic:
- on ingress the mirred skbs are passed to the target device
network stack without any additional check not scrubbing.
- the rcu-protected stats provided via the tcf_result struct
   are updated on error conditions.

This new tcfa_action value is not exposed to the user-space
and can be used only internally by clsact.

v1 -> v2: do not touch TC_ACT_REDIRECT code path, introduce
  a new action type instead

v2 -> v3:
  - rename the new action value TC_ACT_REINJECT, update the
helper accordingly
  - take care of uncloned reinjected packets in XDP generic
hook

Signed-off-by: Paolo Abeni 
---
  include/net/pkt_cls.h |  3 +++
  include/net/sch_generic.h | 19 +++
  net/core/dev.c|  6 +-
  3 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 2081e4219f81..36ccfe2a303a 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -7,6 +7,9 @@
  #include 
  #include 
  
+/* TC action not accessible from user space */

+#define TC_ACT_REINJECT(TC_ACT_VALUE_MAX + 1)


Lets say in the future we add a new opcode.
Will old kernel, new iproute2 (new value) work?
Maybe use a negative number below -1.
I am honestly still unclear about introducing this new
code.
Could you not use the result code to carry sufficient
info to indicate the intent?

cheers,
jamal


+
  /* Basic packet classifier frontend definitions. */
  
  struct tcf_walker {

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 056dc1083aa3..95e81a70f549 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -235,6 +235,12 @@ struct tcf_result {
u32 classid;
};
const struct tcf_proto *goto_tp;
+
+   /* used by the TC_ACT_REINJECT action */
+   struct {
+   boolingress;
+   struct gnet_stats_queue *qstats;
+   };
};
  };
  
@@ -1091,4 +1097,17 @@ void mini_qdisc_pair_swap(struct mini_Qdisc_pair *miniqp,

  void mini_qdisc_pair_init(struct mini_Qdisc_pair *miniqp, struct Qdisc *qdisc,
  struct mini_Qdisc __rcu **p_miniq);
  
+static inline void skb_tc_reinject(struct sk_buff *skb, struct tcf_result *res)

+{
+   struct gnet_stats_queue *stats = res->qstats;
+   int ret;
+
+   if (res->ingress)
+   ret = netif_receive_skb(skb);
+   else
+   ret = dev_queue_xmit(skb);
+   if (ret && stats)
+   qstats_overlimit_inc(res->qstats);
+}
+
  #endif
diff --git a/net/core/dev.c b/net/core/dev.c
index 14a748ee8cc9..826ec74fe1d9 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4252,7 +4252,7 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
/* Reinjected packets coming from act_mirred or similar should
 * not get XDP generic processing.
 */
-   if (skb_cloned(skb))
+   if (skb_cloned(skb) || skb->tc_redirected)
return XDP_PASS;
  
  	/* XDP packets must be linear and must have sufficient headroom

@@ -4602,6 +4602,10 @@ sch_handle_ingress(struct sk_buff *skb, struct 
packet_type **pt_prev, int *ret,
__skb_push(skb, skb->mac_len);
skb_do_redirect(skb);
return NULL;
+   case TC_ACT_REINJECT:
+   /* this does not scrub the packet, and updates stats on error */
+   skb_tc_reinject(skb, _res);
+   return NULL;
default:
break;
}





Re: [PATCH net-next v3 4/5] net/tc: introduce TC_ACT_REINJECT.

2018-07-25 Thread Paolo Abeni
On Tue, 2018-07-24 at 13:50 -0700, Cong Wang wrote:
> On Tue, Jul 24, 2018 at 1:38 PM Cong Wang  wrote:
> > 
> > On Tue, Jul 24, 2018 at 1:07 PM Paolo Abeni  wrote:
> > > +static inline void skb_tc_reinject(struct sk_buff *skb, struct 
> > > tcf_result *res)
> > > +{
> > > +   struct gnet_stats_queue *stats = res->qstats;
> > > +   int ret;
> > > +
> > > +   if (res->ingress)
> > > +   ret = netif_receive_skb(skb);
> > > +   else
> > > +   ret = dev_queue_xmit(skb);
> > > +   if (ret && stats)
> > > +   qstats_overlimit_inc(res->qstats);
> > 
> > Why increasing overlimit? Overlimit is typically increased
> > by traffic shapers to indicate there is no bandwidth to send
> > out the packet.
> > 
> > I fail to understand why overlimit is increased in your case
> > here. I guess you want to increase 'drops' instead.
> 
> Hmm, actually the current mirred code increases overlimit too.
> But I still don't think it makes sense.

Yep, I chose to increment 'overlimits' to preserve the current mirred
semantic. 

AFAICS, that was first introduced with:

commit 8919bc13e8d92c5b082c5c0321567383a071f5bc
Author: Jamal Hadi Salim 
Date:   Mon Aug 15 05:25:40 2011 +

net_sched: fix port mirror/redirect stats reporting

Likely increasing 'drops' would be "better", but I'm unsure we can
change this established behavior without affecting some user.

Anyway, I'm fine either way. Please advice.

Cheers,

Paolo



Re: [PATCH net-next v3 4/5] net/tc: introduce TC_ACT_REINJECT.

2018-07-24 Thread Cong Wang
On Tue, Jul 24, 2018 at 1:38 PM Cong Wang  wrote:
>
> On Tue, Jul 24, 2018 at 1:07 PM Paolo Abeni  wrote:
> > +static inline void skb_tc_reinject(struct sk_buff *skb, struct tcf_result 
> > *res)
> > +{
> > +   struct gnet_stats_queue *stats = res->qstats;
> > +   int ret;
> > +
> > +   if (res->ingress)
> > +   ret = netif_receive_skb(skb);
> > +   else
> > +   ret = dev_queue_xmit(skb);
> > +   if (ret && stats)
> > +   qstats_overlimit_inc(res->qstats);
>
> Why increasing overlimit? Overlimit is typically increased
> by traffic shapers to indicate there is no bandwidth to send
> out the packet.
>
> I fail to understand why overlimit is increased in your case
> here. I guess you want to increase 'drops' instead.

Hmm, actually the current mirred code increases overlimit too.
But I still don't think it makes sense.


Re: [PATCH net-next v3 4/5] net/tc: introduce TC_ACT_REINJECT.

2018-07-24 Thread Cong Wang
On Tue, Jul 24, 2018 at 1:07 PM Paolo Abeni  wrote:
> +static inline void skb_tc_reinject(struct sk_buff *skb, struct tcf_result 
> *res)
> +{
> +   struct gnet_stats_queue *stats = res->qstats;
> +   int ret;
> +
> +   if (res->ingress)
> +   ret = netif_receive_skb(skb);
> +   else
> +   ret = dev_queue_xmit(skb);
> +   if (ret && stats)
> +   qstats_overlimit_inc(res->qstats);

Why increasing overlimit? Overlimit is typically increased
by traffic shapers to indicate there is no bandwidth to send
out the packet.

I fail to understand why overlimit is increased in your case
here. I guess you want to increase 'drops' instead.