Re: [PATCH net 0/4] net/sched: forbid 'goto_chain' on fallback actions

2018-10-22 Thread Jamal Hadi Salim

On 2018-10-20 5:33 p.m., Davide Caratti wrote:

the following command:

  # tc actions add action police rate 1mbit burst 1k conform-exceed \
  > pass / goto chain 42

generates a NULL pointer dereference when packets exceed the configured
rate. Similarly, the following command:

  # tc actions add action pass random determ goto chain 42 2

makes the kernel crash with NULL dereference when the first packet does
not match the 'pass' action.

gact and police allow users to specify a fallback control action, that is
stored in the action private data. 'goto chain x' never worked for these
cases, since a->goto_chain handle was never initialized. There is only one
goto_chain handle per TC action, and it is designed to be non-NULL only if
tcf_action contains a 'goto chain' command. So, let's forbid 'goto chain'
on fallback actions.

Patch 1/4 and 2/4 change the .init() functions of police and gact, to let
them return an error when users try to set 'goto chain x' in the fallback
action. Patch 3/4 and 4/4 add TDC selftest coverage to this new behavior.



For the series,
Acked-by: Jamal Hadi Salim 

cheers,
jamal


Re: [PATCH net v2] net/sched: act_gact: properly init 'goto chain'

2018-10-18 Thread Jamal Hadi Salim
;tcf_action)
+#endif
+
  static inline bool __is_tcf_gact_act(const struct tc_action *a, int act,
 bool is_ext)
  {
@@ -26,8 +33,8 @@ static inline bool __is_tcf_gact_act(const struct tc_action 
*a, int act,
return false;
  
  	gact = to_gact(a);

-   if ((!is_ext && gact->tcf_action == act) ||
-   (is_ext && TC_ACT_EXT_CMP(gact->tcf_action, act)))
+   if ((!is_ext && GACT_PRIMARY_ACTION(gact) == act) ||
+   (is_ext && TC_ACT_EXT_CMP(GACT_PRIMARY_ACTION(gact), act)))
return true;
  
  #endif

diff --git a/net/sched/act_gact.c b/net/sched/act_gact.c
index cd1d9bd32ef9..49b32650efb9 100644
--- a/net/sched/act_gact.c
+++ b/net/sched/act_gact.c
@@ -31,7 +31,7 @@ static int gact_net_rand(struct tcf_gact *gact)
  {
smp_rmb(); /* coupled with smp_wmb() in tcf_gact_init() */
if (prandom_u32() % gact->tcfg_pval)
-   return gact->tcf_action;
+   return gact->tcfg_caction;
return gact->tcfg_paction;
  }
  
@@ -41,7 +41,7 @@ static int gact_determ(struct tcf_gact *gact)
  
  	smp_rmb(); /* coupled with smp_wmb() in tcf_gact_init() */

if (pack % gact->tcfg_pval)
-   return gact->tcf_action;
+   return gact->tcfg_caction;
return gact->tcfg_paction;
  }
  
@@ -88,6 +88,9 @@ static int tcf_gact_init(struct net *net, struct nlattr *nla,

p_parm = nla_data(tb[TCA_GACT_PROB]);
if (p_parm->ptype >= MAX_RAND)
return -EINVAL;
+   if (TC_ACT_EXT_CMP(p_parm->paction, TC_ACT_GOTO_CHAIN) &&
+   TC_ACT_EXT_CMP(parm->action, TC_ACT_GOTO_CHAIN))
+   return -EINVAL;


Rejection is a good solution[1].
Would be helpful to set an ext_ack to something like
"only one goto chain is supported currently"
I didnt follow why you needed to introduce tcfg_caction...

cheers,
jamal

[1]actions should allow multitude return opcodes, not
just two. The proper solution seems to be to just let
the caller (cls_api - who is aware of chains) to
deal with the chain selection. Sticking them in actions
speeds up lookup (and deal with refcnts) but i wonder
given this breakage in abstraction whether they belong
there...


Re: [PATCH net] net/sched: properly init chain in case of multiple control actions

2018-10-14 Thread Jamal Hadi Salim

On 2018-10-13 11:23 a.m., Davide Caratti wrote:

On Fri, 2018-10-12 at 13:57 -0700, Cong Wang wrote:

Why not just validate the fallback action in each action init()?
For example, checking tcfg_paction in tcf_gact_init().

I don't see the need of making it generic.


hello Cong, once again thanks for looking at this.

what you say is doable, and I evaluated doing it before proposing this
patch.

But I felt unconfortable, because I needed to pass struct tcf_proto *tp in
tcf_gact_init() to initialize a->goto_chain with the chain_idx encoded in
the fallback action. So, I would have changed all the init() functions in
all TC actions, just to fix two of them.

A (legal?) trick  is to let tcf_action store the fallback action when it
contains a 'goto chain' command, I just posted a proposal for gact. If you
think it's ok, I will test and post the same for act_police.



Need some more thought. So the issue here is the goto chain failed
the configured chain doesnt exist?

cheers,
jamal


Re: [PATCH net-next 0/9] net: Kernel side filtering for route dumps

2018-10-11 Thread Jamal Hadi Salim

On 2018-10-11 2:44 p.m., David Ahern wrote:

On 10/11/18 12:05 PM, Jamal Hadi Salim wrote:

On 2018-10-11 1:04 p.m., David Ahern wrote:




I meant the general API of users passing filter arguments as attributes
to the dump (or values in the header) -- KIND, MASTER, device index,
etc. This is an existing API and existing capability.



which i referred to as use-case driven. It is not unreasonable
to optimize for the most common - but every time somebody
comes up with something new you need to patch the kernel.



I disagree with your overall premise of bpf the end-all hammer. It is a
tool but not the only tool. For starters, you are proposing building the
message, run the filter on it, and potentially back the message up to
drop the recently added piece because the filter does not want it
included. That is still wasting a lot of cpu cycles to build and drop. I
am thinking about scale to 1 million routes -- I do not need the dump
loop building a message for 1 million entries only to drop 99% of them.
That is crazy.



My earlier suggestion was for somewhere before the skb is formed.
In the vicinity of xxx_fill_info()
The "create skb and drop" kind works already today with some
acrobatics needed for some cases with cbpf.

Is it unfeasible to add an ebpf hook at that point and ask a user
supplied code "is this ok to send?" - this is no different
than doing a "get by key" operation where key/filter is any arbitrary
construction of fields rtm understands (including the ones you
provided like table index) that are passed in the user program.
Classical "select" mechanism in database tables


The way the kernel manages route tables says I should pass in the table
id as it is a major differentiator on what is returned. From there
lookup the specific table (I need to fix this part per my response to
Andrew), and then only walk it. The existing semantics, capabilities
that exist for other dump commands is the most efficient for some of
these high level, big hammer filters.



Sure.


What you want gets into the tiniest of details and yes the imagination
can go wild with combinations of filter options. So maybe this scanning
of post-built messages is reasonable *after* the high level sorting is done.



That doesnt require any change.

cheers,
jamal


Re: [PATCH net-next 0/9] net: Kernel side filtering for route dumps

2018-10-11 Thread Jamal Hadi Salim

On 2018-10-11 1:04 p.m., David Ahern wrote:


You can already filter link dumps by kind. How? By passing in the KIND
attribute on a dump request. This type of filtering exists for link
dumps, neighbor dumps, fdb dumps. Why is there a push to make route
dumps different? Why can't they be consistent and use existing semantics?


I think you meant filtering by ifindex in neighbor.
note: I would argue that there are already "adhoc" ways of filtering
in place, mostly use case driven). Otherwise Sowmini wouldnt have to
craft that bpf filter. There are netlink users who have none or some
weird filtering involved. There is no arguement that your approach
works for rtm. But the rest of the users missing filters will require
similar kernel changes. Could this be made generic enough to benefit
other netlink users?
The problem is there's always one new attribute that would make sense
for some use case which requires a kernel change ("send me an event only
if you get link down" or "dump all ports with link down").

cheers,
jamal




Re: [PATCH net-next 0/9] net: Kernel side filtering for route dumps

2018-10-11 Thread Jamal Hadi Salim

On 2018-10-11 12:16 p.m., David Ahern wrote:




IMO, bpf at the fill_info stage is not appropriate.



Somewhere before the skb is formed (and nlmsg is built).
If you go as far as constructing it, then cBPF per what
Sowmini should work; but there will be constructs which
are trickier.




skb->sk (hence attached filter) should be available at that point.
Classical bpf per Sowmini's example maybe trickier.
Better - why dont we have an ebpf hook at this stage and then
we dont have to make changes to the kernel when someone adds
one more field to the filter?

BTW: useful for events as well - not just dumps (as the name
fib_dump_filter suggests)


you mean kernel notifications on internal events?
1. there is no user socket when notifications are created and the
*_fill_info is invoked

2. notifications are global going to potentially many sockets. For these
cases the existing sk_filter is appropriate.


#2 - netlink being a broad/multicast bus with attached listeners.

Yes, you can do it with cBPF but some complexity may occur. Example:
if i was interested to netdevice events of "kind = vxlan &&
admin flag is down" then that is non trivial to do with classical but
would be reasonably comfortable to do with ebpf.
Note:
That filter will work fine for dumps as well since the semantics
are the same.
the win is:  in the future when you just wanna add that one new
filter attribute you dont need a kernel patch in and roll in a
new production kernel.

cheers,
jamal


Re: [PATCH net-next 0/9] net: Kernel side filtering for route dumps

2018-10-11 Thread Jamal Hadi Salim

On 2018-10-11 11:46 a.m., Sowmini Varadhan wrote:

On (10/11/18 08:26), Stephen Hemminger wrote:

You can do the something like this already with BPF socket filters.
But writing BPF for multi-part messages is hard.


Indeed. And I was just experimenting with this for ARP just last week.
So to handle the caes of "ip neigh show a.b.c.d" without walking through
the entire arp table and filtering in userspace, you could add a sk_filter()
hook like this:



If this could be done a lot earlier aka at xxx_fill_info() bpf would
be a very good answer.
skb->sk (hence attached filter) should be available at that point.
Classical bpf per Sowmini's example maybe trickier.
Better - why dont we have an ebpf hook at this stage and then
we dont have to make changes to the kernel when someone adds
one more field to the filter?

BTW: useful for events as well - not just dumps (as the name
fib_dump_filter suggests)

cheers,
jamal


[PATCH net-next v2 10/12] net: sched: cls_u32: get rid of tp_c

2018-10-08 Thread Jamal Hadi Salim
From: Al Viro 

Both hnode ->tp_c and tp_c argument of u32_set_parms()
the latter is redundant, the former - never read...

Signed-off-by: Al Viro 
Signed-off-by: Jamal Hadi Salim 
---
 net/sched/cls_u32.c | 11 ---
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index 3ed2c9866b36..3d4c360f9b0c 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -79,7 +79,6 @@ struct tc_u_hnode {
struct tc_u_hnode __rcu *next;
u32 handle;
u32 prio;
-   struct tc_u_common  *tp_c;
int refcnt;
unsigned intdivisor;
struct idr  handle_idr;
@@ -390,7 +389,6 @@ static int u32_init(struct tcf_proto *tp)
tp_c->refcnt++;
RCU_INIT_POINTER(root_ht->next, tp_c->hlist);
rcu_assign_pointer(tp_c->hlist, root_ht);
-   root_ht->tp_c = tp_c;
 
rcu_assign_pointer(tp->root, root_ht);
tp->data = tp_c;
@@ -761,7 +759,7 @@ static const struct nla_policy u32_policy[TCA_U32_MAX + 1] 
= {
 };
 
 static int u32_set_parms(struct net *net, struct tcf_proto *tp,
-unsigned long base, struct tc_u_common *tp_c,
+unsigned long base,
 struct tc_u_knode *n, struct nlattr **tb,
 struct nlattr *est, bool ovr,
 struct netlink_ext_ack *extack)
@@ -782,7 +780,7 @@ static int u32_set_parms(struct net *net, struct tcf_proto 
*tp,
}
 
if (handle) {
-   ht_down = u32_lookup_ht(tp_c, handle);
+   ht_down = u32_lookup_ht(tp->data, handle);
 
if (!ht_down) {
NL_SET_ERR_MSG_MOD(extack, "Link hash table not 
found");
@@ -956,7 +954,7 @@ static int u32_change(struct net *net, struct sk_buff 
*in_skb,
if (!new)
return -ENOMEM;
 
-   err = u32_set_parms(net, tp, base, tp_c, new, tb,
+   err = u32_set_parms(net, tp, base, new, tb,
tca[TCA_RATE], ovr, extack);
 
if (err) {
@@ -1012,7 +1010,6 @@ static int u32_change(struct net *net, struct sk_buff 
*in_skb,
return err;
}
}
-   ht->tp_c = tp_c;
ht->refcnt = 1;
ht->divisor = divisor;
ht->handle = handle;
@@ -1123,7 +1120,7 @@ static int u32_change(struct net *net, struct sk_buff 
*in_skb,
}
 #endif
 
-   err = u32_set_parms(net, tp, base, tp_c, n, tb, tca[TCA_RATE], ovr,
+   err = u32_set_parms(net, tp, base, n, tb, tca[TCA_RATE], ovr,
extack);
if (err == 0) {
struct tc_u_knode __rcu **ins;
-- 
2.11.0



[PATCH net-next v2 07/12] net: sched: cls_u32: clean tc_u_common hashtable

2018-10-08 Thread Jamal Hadi Salim
From: Al Viro 

* calculate key *once*, not for each hash chain element
* let tc_u_hash() return the pointer to chain head rather than index -
callers are cleaner that way.

Signed-off-by: Al Viro 
Signed-off-by: Jamal Hadi Salim 
---
 net/sched/cls_u32.c | 24 +---
 1 file changed, 9 insertions(+), 15 deletions(-)

diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index c378168f4562..3f6fba831c57 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -343,19 +343,16 @@ static void *tc_u_common_ptr(const struct tcf_proto *tp)
return block->q;
 }
 
-static unsigned int tc_u_hash(const struct tcf_proto *tp)
+static struct hlist_head *tc_u_hash(void *key)
 {
-   return hash_ptr(tc_u_common_ptr(tp), U32_HASH_SHIFT);
+   return tc_u_common_hash + hash_ptr(key, U32_HASH_SHIFT);
 }
 
-static struct tc_u_common *tc_u_common_find(const struct tcf_proto *tp)
+static struct tc_u_common *tc_u_common_find(void *key)
 {
struct tc_u_common *tc;
-   unsigned int h;
-
-   h = tc_u_hash(tp);
-   hlist_for_each_entry(tc, _u_common_hash[h], hnode) {
-   if (tc->ptr == tc_u_common_ptr(tp))
+   hlist_for_each_entry(tc, tc_u_hash(key), hnode) {
+   if (tc->ptr == key)
return tc;
}
return NULL;
@@ -364,10 +361,8 @@ static struct tc_u_common *tc_u_common_find(const struct 
tcf_proto *tp)
 static int u32_init(struct tcf_proto *tp)
 {
struct tc_u_hnode *root_ht;
-   struct tc_u_common *tp_c;
-   unsigned int h;
-
-   tp_c = tc_u_common_find(tp);
+   void *key = tc_u_common_ptr(tp);
+   struct tc_u_common *tp_c = tc_u_common_find(key);
 
root_ht = kzalloc(sizeof(*root_ht), GFP_KERNEL);
if (root_ht == NULL)
@@ -385,12 +380,11 @@ static int u32_init(struct tcf_proto *tp)
kfree(root_ht);
return -ENOBUFS;
}
-   tp_c->ptr = tc_u_common_ptr(tp);
+   tp_c->ptr = key;
INIT_HLIST_NODE(_c->hnode);
idr_init(_c->handle_idr);
 
-   h = tc_u_hash(tp);
-   hlist_add_head(_c->hnode, _u_common_hash[h]);
+   hlist_add_head(_c->hnode, tc_u_hash(key));
}
 
tp_c->refcnt++;
-- 
2.11.0



[PATCH net-next v2 05/12] net: sched: cls_u32: get rid of tc_u_knode ->tp

2018-10-08 Thread Jamal Hadi Salim
From: Al Viro 

not used anymore

Signed-off-by: Al Viro 
Signed-off-by: Jamal Hadi Salim 
---
 net/sched/cls_u32.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index ef0f2e6ec422..810c49ac1bbe 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -68,7 +68,6 @@ struct tc_u_knode {
u32 mask;
u32 __percpu*pcpu_success;
 #endif
-   struct tcf_proto*tp;
struct rcu_work rwork;
/* The 'sel' field MUST be the last field in structure to allow for
 * tc_u32_keys allocated at end of structure.
@@ -896,7 +895,6 @@ static struct tc_u_knode *u32_init_knode(struct tcf_proto 
*tp,
/* Similarly success statistics must be moved as pointers */
new->pcpu_success = n->pcpu_success;
 #endif
-   new->tp = tp;
memcpy(>sel, s, sizeof(*s) + s->nkeys*sizeof(struct tc_u32_key));
 
if (tcf_exts_init(>exts, TCA_U32_ACT, TCA_U32_POLICE)) {
@@ -1112,7 +1110,6 @@ static int u32_change(struct net *net, struct sk_buff 
*in_skb,
n->handle = handle;
n->fshift = s->hmask ? ffs(ntohl(s->hmask)) - 1 : 0;
n->flags = flags;
-   n->tp = tp;
 
err = tcf_exts_init(>exts, TCA_U32_ACT, TCA_U32_POLICE);
if (err < 0)
-- 
2.11.0



[PATCH net-next v2 02/12] net: sched: cls_u32: disallow linking to root hnode

2018-10-08 Thread Jamal Hadi Salim
From: Al Viro 

Operation makes no sense.  Nothing will actually break if we do so
(depth limit in u32_classify() will prevent infinite loops), but
according to maintainers it's best prohibited outright.

NOTE: doing so guarantees that u32_destroy() will trigger the call
of u32_destroy_hnode(); we might want to make that unconditional.

Test:
tc qdisc add dev eth0 ingress
tc filter add dev eth0 parent : protocol ip prio 100 u32 \
link 800: offset at 0 mask 0f00 shift 6 plus 0 eat match ip protocol 6 ff
should fail with
Error: cls_u32: Not linking to root node

Signed-off-by: Al Viro 
Signed-off-by: Jamal Hadi Salim 
---
 net/sched/cls_u32.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index 622f4657da94..3357331a80a2 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -797,6 +797,10 @@ static int u32_set_parms(struct net *net, struct tcf_proto 
*tp,
NL_SET_ERR_MSG_MOD(extack, "Link hash table not 
found");
return -EINVAL;
}
+   if (ht_down->is_root) {
+   NL_SET_ERR_MSG_MOD(extack, "Not linking to root 
node");
+   return -EINVAL;
+   }
ht_down->refcnt++;
}
 
-- 
2.11.0



[PATCH net-next v2 06/12] net: sched: cls_u32: get rid of tc_u_common ->rcu

2018-10-08 Thread Jamal Hadi Salim
From: Al Viro 

unused

Signed-off-by: Al Viro 
Signed-off-by: Jamal Hadi Salim 
---
 net/sched/cls_u32.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index 810c49ac1bbe..c378168f4562 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -98,7 +98,6 @@ struct tc_u_common {
int refcnt;
struct idr  handle_idr;
struct hlist_node   hnode;
-   struct rcu_head rcu;
 };
 
 static inline unsigned int u32_hash_fold(__be32 key,
-- 
2.11.0



[PATCH net-next v2 01/12] net: sched: cls_u32: mark root hnode explicitly

2018-10-08 Thread Jamal Hadi Salim
From: Al Viro 

... and produce consistent error on attempt to delete such.
Existing check in u32_delete() is inconsistent - after

tc qdisc add dev eth0 ingress
tc filter add dev eth0 parent : protocol ip prio 100 handle 1: u32 \
divisor 1
tc filter add dev eth0 parent : protocol ip prio 200 handle 2: u32 \
divisor 1

both

tc filter delete dev eth0 parent : protocol ip prio 100 handle 801: u32

and

tc filter delete dev eth0 parent : protocol ip prio 100 handle 800: u32

will fail (at least with refcounting fixes), but the former will complain
about an attempt to remove a busy table, while the latter will recognize
it as root and yield "Not allowed to delete root node" instead.

The problem with the existing check is that several tcf_proto instances
might share the same tp->data and handle-to-hnode lookup will be the same
for all of them. So comparing an hnode to be deleted with tp->root won't
catch the case when one tp is used to try deleting the root of another.
Solution is trivial - mark the root hnodes explicitly upon allocation and
check for that.

Signed-off-by: Al Viro 
Signed-off-by: Jamal Hadi Salim 
---
 net/sched/cls_u32.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index f218ccf1e2d9..622f4657da94 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -84,6 +84,7 @@ struct tc_u_hnode {
int refcnt;
unsigned intdivisor;
struct idr  handle_idr;
+   boolis_root;
struct rcu_head rcu;
u32 flags;
/* The 'ht' field MUST be the last field in structure to allow for
@@ -377,6 +378,7 @@ static int u32_init(struct tcf_proto *tp)
root_ht->refcnt++;
root_ht->handle = tp_c ? gen_new_htid(tp_c, root_ht) : 0x8000;
root_ht->prio = tp->prio;
+   root_ht->is_root = true;
idr_init(_ht->handle_idr);
 
if (tp_c == NULL) {
@@ -692,7 +694,7 @@ static int u32_delete(struct tcf_proto *tp, void *arg, bool 
*last,
goto out;
}
 
-   if (root_ht == ht) {
+   if (ht->is_root) {
NL_SET_ERR_MSG_MOD(extack, "Not allowed to delete root node");
return -EINVAL;
}
-- 
2.11.0



[PATCH net-next v2 12/12] net: sched: cls_u32: simplify the hell out u32_delete() emptiness check

2018-10-08 Thread Jamal Hadi Salim
From: Al Viro 

Now that we have the knode count, we can instantly check if
any hnodes are non-empty.  And that kills the check for extra
references to root hnode - those could happen only if there was
a knode to carry such a link.

Signed-off-by: Al Viro 
Signed-off-by: Jamal Hadi Salim 
---
 net/sched/cls_u32.c | 48 +---
 1 file changed, 1 insertion(+), 47 deletions(-)

diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index 61593bee08db..ac79a40a0392 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -627,17 +627,6 @@ static int u32_destroy_hnode(struct tcf_proto *tp, struct 
tc_u_hnode *ht,
return -ENOENT;
 }
 
-static bool ht_empty(struct tc_u_hnode *ht)
-{
-   unsigned int h;
-
-   for (h = 0; h <= ht->divisor; h++)
-   if (rcu_access_pointer(ht->ht[h]))
-   return false;
-
-   return true;
-}
-
 static void u32_destroy(struct tcf_proto *tp, struct netlink_ext_ack *extack)
 {
struct tc_u_common *tp_c = tp->data;
@@ -675,13 +664,9 @@ static int u32_delete(struct tcf_proto *tp, void *arg, 
bool *last,
  struct netlink_ext_ack *extack)
 {
struct tc_u_hnode *ht = arg;
-   struct tc_u_hnode *root_ht = rtnl_dereference(tp->root);
struct tc_u_common *tp_c = tp->data;
int ret = 0;
 
-   if (ht == NULL)
-   goto out;
-
if (TC_U32_KEY(ht->handle)) {
u32_remove_hw_knode(tp, (struct tc_u_knode *)ht, extack);
ret = u32_delete_key(tp, (struct tc_u_knode *)ht);
@@ -702,38 +687,7 @@ static int u32_delete(struct tcf_proto *tp, void *arg, 
bool *last,
}
 
 out:
-   *last = true;
-   if (root_ht) {
-   if (root_ht->refcnt > 1) {
-   *last = false;
-   goto ret;
-   }
-   if (root_ht->refcnt == 1) {
-   if (!ht_empty(root_ht)) {
-   *last = false;
-   goto ret;
-   }
-   }
-   }
-
-   if (tp_c->refcnt > 1) {
-   *last = false;
-   goto ret;
-   }
-
-   if (tp_c->refcnt == 1) {
-   struct tc_u_hnode *ht;
-
-   for (ht = rtnl_dereference(tp_c->hlist);
-ht;
-ht = rtnl_dereference(ht->next))
-   if (!ht_empty(ht)) {
-   *last = false;
-   break;
-   }
-   }
-
-ret:
+   *last = tp_c->refcnt == 1 && tp_c->knodes == 0;
return ret;
 }
 
-- 
2.11.0



[PATCH net-next v2 04/12] net: sched: cls_u32: get rid of unused argument of u32_destroy_key()

2018-10-08 Thread Jamal Hadi Salim
From: Al Viro 

Signed-off-by: Al Viro 
Signed-off-by: Jamal Hadi Salim 
---
 net/sched/cls_u32.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index ce55eea448a0..ef0f2e6ec422 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -405,8 +405,7 @@ static int u32_init(struct tcf_proto *tp)
return 0;
 }
 
-static int u32_destroy_key(struct tcf_proto *tp, struct tc_u_knode *n,
-  bool free_pf)
+static int u32_destroy_key(struct tc_u_knode *n, bool free_pf)
 {
struct tc_u_hnode *ht = rtnl_dereference(n->ht_down);
 
@@ -440,7 +439,7 @@ static void u32_delete_key_work(struct work_struct *work)
  struct tc_u_knode,
  rwork);
rtnl_lock();
-   u32_destroy_key(key->tp, key, false);
+   u32_destroy_key(key, false);
rtnl_unlock();
 }
 
@@ -457,7 +456,7 @@ static void u32_delete_key_freepf_work(struct work_struct 
*work)
  struct tc_u_knode,
  rwork);
rtnl_lock();
-   u32_destroy_key(key->tp, key, true);
+   u32_destroy_key(key, true);
rtnl_unlock();
 }
 
@@ -600,7 +599,7 @@ static void u32_clear_hnode(struct tcf_proto *tp, struct 
tc_u_hnode *ht,
if (tcf_exts_get_net(>exts))
tcf_queue_work(>rwork, 
u32_delete_key_freepf_work);
else
-   u32_destroy_key(n->tp, n, true);
+   u32_destroy_key(n, true);
}
}
 }
@@ -971,13 +970,13 @@ static int u32_change(struct net *net, struct sk_buff 
*in_skb,
tca[TCA_RATE], ovr, extack);
 
if (err) {
-   u32_destroy_key(tp, new, false);
+   u32_destroy_key(new, false);
return err;
}
 
err = u32_replace_hw_knode(tp, new, flags, extack);
if (err) {
-   u32_destroy_key(tp, new, false);
+   u32_destroy_key(new, false);
return err;
}
 
-- 
2.11.0



[PATCH net-next v2 08/12] net: sched: cls_u32: pass tc_u_common to u32_set_parms() instead of tc_u_hnode

2018-10-08 Thread Jamal Hadi Salim
From: Al Viro 

the only thing we used ht for was ht->tp_c and callers can get that
without going through ->tp_c at all; start with lifting that into
the callers, next commits will massage those, eventually removing
->tp_c altogether.

Signed-off-by: Al Viro 
Signed-off-by: Jamal Hadi Salim 
---
 net/sched/cls_u32.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index 3f6fba831c57..53f34f8cde8b 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -761,7 +761,7 @@ static const struct nla_policy u32_policy[TCA_U32_MAX + 1] 
= {
 };
 
 static int u32_set_parms(struct net *net, struct tcf_proto *tp,
-unsigned long base, struct tc_u_hnode *ht,
+unsigned long base, struct tc_u_common *tp_c,
 struct tc_u_knode *n, struct nlattr **tb,
 struct nlattr *est, bool ovr,
 struct netlink_ext_ack *extack)
@@ -782,7 +782,7 @@ static int u32_set_parms(struct net *net, struct tcf_proto 
*tp,
}
 
if (handle) {
-   ht_down = u32_lookup_ht(ht->tp_c, handle);
+   ht_down = u32_lookup_ht(tp_c, handle);
 
if (!ht_down) {
NL_SET_ERR_MSG_MOD(extack, "Link hash table not 
found");
@@ -957,7 +957,7 @@ static int u32_change(struct net *net, struct sk_buff 
*in_skb,
return -ENOMEM;
 
err = u32_set_parms(net, tp, base,
-   rtnl_dereference(n->ht_up), new, tb,
+   rtnl_dereference(n->ht_up)->tp_c, new, tb,
tca[TCA_RATE], ovr, extack);
 
if (err) {
@@ -1124,7 +1124,7 @@ static int u32_change(struct net *net, struct sk_buff 
*in_skb,
}
 #endif
 
-   err = u32_set_parms(net, tp, base, ht, n, tb, tca[TCA_RATE], ovr,
+   err = u32_set_parms(net, tp, base, ht->tp_c, n, tb, tca[TCA_RATE], ovr,
extack);
if (err == 0) {
struct tc_u_knode __rcu **ins;
-- 
2.11.0



[PATCH net-next v2 03/12] net: sched: cls_u32: make sure that divisor is a power of 2

2018-10-08 Thread Jamal Hadi Salim
From: Al Viro 

Tested by modifying iproute2 to allow sending a divisor > 255

Tested-by: Jamal Hadi Salim 
Signed-off-by: Al Viro 
Signed-off-by: Jamal Hadi Salim 
---
 net/sched/cls_u32.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index 3357331a80a2..ce55eea448a0 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -994,7 +994,11 @@ static int u32_change(struct net *net, struct sk_buff 
*in_skb,
if (tb[TCA_U32_DIVISOR]) {
unsigned int divisor = nla_get_u32(tb[TCA_U32_DIVISOR]);
 
-   if (--divisor > 0x100) {
+   if (!is_power_of_2(divisor)) {
+   NL_SET_ERR_MSG_MOD(extack, "Divisor is not a power of 
2");
+   return -EINVAL;
+   }
+   if (divisor-- > 0x100) {
NL_SET_ERR_MSG_MOD(extack, "Exceeded maximum 256 hash 
buckets");
return -EINVAL;
}
-- 
2.11.0



[PATCH net-next v2 11/12] net: sched: cls_u32: keep track of knodes count in tc_u_common

2018-10-08 Thread Jamal Hadi Salim
From: Al Viro 

allows to simplify u32_delete() considerably

Signed-off-by: Al Viro 
Signed-off-by: Jamal Hadi Salim 
---
 net/sched/cls_u32.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index 3d4c360f9b0c..61593bee08db 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -97,6 +97,7 @@ struct tc_u_common {
int refcnt;
struct idr  handle_idr;
struct hlist_node   hnode;
+   longknodes;
 };
 
 static inline unsigned int u32_hash_fold(__be32 key,
@@ -452,6 +453,7 @@ static void u32_delete_key_freepf_work(struct work_struct 
*work)
 
 static int u32_delete_key(struct tcf_proto *tp, struct tc_u_knode *key)
 {
+   struct tc_u_common *tp_c = tp->data;
struct tc_u_knode __rcu **kp;
struct tc_u_knode *pkp;
struct tc_u_hnode *ht = rtnl_dereference(key->ht_up);
@@ -462,6 +464,7 @@ static int u32_delete_key(struct tcf_proto *tp, struct 
tc_u_knode *key)
 kp = >next, pkp = rtnl_dereference(*kp)) {
if (pkp == key) {
RCU_INIT_POINTER(*kp, key->next);
+   tp_c->knodes--;
 
tcf_unbind_filter(tp, >res);
idr_remove(>handle_idr, key->handle);
@@ -576,6 +579,7 @@ static int u32_replace_hw_knode(struct tcf_proto *tp, 
struct tc_u_knode *n,
 static void u32_clear_hnode(struct tcf_proto *tp, struct tc_u_hnode *ht,
struct netlink_ext_ack *extack)
 {
+   struct tc_u_common *tp_c = tp->data;
struct tc_u_knode *n;
unsigned int h;
 
@@ -583,6 +587,7 @@ static void u32_clear_hnode(struct tcf_proto *tp, struct 
tc_u_hnode *ht,
while ((n = rtnl_dereference(ht->ht[h])) != NULL) {
RCU_INIT_POINTER(ht->ht[h],
 rtnl_dereference(n->next));
+   tp_c->knodes--;
tcf_unbind_filter(tp, >res);
u32_remove_hw_knode(tp, n, extack);
idr_remove(>handle_idr, n->handle);
@@ -1141,6 +1146,7 @@ static int u32_change(struct net *net, struct sk_buff 
*in_skb,
 
RCU_INIT_POINTER(n->next, pins);
rcu_assign_pointer(*ins, n);
+   tp_c->knodes++;
*arg = n;
return 0;
}
-- 
2.11.0



[PATCH net-next v2 00/12] net: sched: cls_u32 Various improvements

2018-10-08 Thread Jamal Hadi Salim
From: Jamal Hadi Salim 

Various improvements from Al.

Changes from version 1: Add missing commit 

Al Viro (11):
  net: sched: cls_u32: mark root hnode explicitly
  net: sched: cls_u32: disallow linking to root hnode
  net: sched: cls_u32: make sure that divisor is a power of 2
  net: sched: cls_u32: get rid of unused argument of u32_destroy_key()
  net: sched: cls_u32: get rid of tc_u_knode ->tp
  net: sched: cls_u32: get rid of tc_u_common ->rcu
  net: sched: cls_u32: clean tc_u_common hashtable
  net: sched: cls_u32: pass tc_u_common to u32_set_parms() instead of
tc_u_hnode
  net: sched: cls_u32: the tp_c argument of u32_set_parms() is always
tp->data
  net: sched: cls_u32: get rid of tp_c
  net: sched: cls_u32: keep track of knodes count in tc_u_common
  net: sched: cls_u32: simplify the hell out u32_delete() emptiness
check

 net/sched/cls_u32.c | 121 +---
 1 file changed, 38 insertions(+), 83 deletions(-)

-- 
2.11.0



[PATCH net-next v2 09/12] net: sched: cls_u32: the tp_c argument of u32_set_parms() is always tp->data

2018-10-08 Thread Jamal Hadi Salim
From: Al Viro 

It must be tc_u_common associated with that tp (i.e. tp->data).
Proof:
* both ->ht_up and ->tp_c are assign-once
* ->tp_c of anything inserted into tp_c->hlist is tp_c
* hnodes never get reinserted into the lists or moved
between those, so anything found by u32_lookup_ht(tp->data, ...)
will have ->tp_c equal to tp->data.
* tp->root->tp_c == tp->data.
* ->ht_up of anything inserted into hnode->ht[...] is
equal to hnode.
* knodes never get reinserted into hash chains or moved
between those, so anything returned by u32_lookup_key(ht, ...)
will have ->ht_up equal to ht.
* any knode returned by u32_get(tp, ...) will have ->ht_up->tp_c
point to tp->data

Signed-off-by: Al Viro 
Signed-off-by: Jamal Hadi Salim 
---
 net/sched/cls_u32.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index 53f34f8cde8b..3ed2c9866b36 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -956,8 +956,7 @@ static int u32_change(struct net *net, struct sk_buff 
*in_skb,
if (!new)
return -ENOMEM;
 
-   err = u32_set_parms(net, tp, base,
-   rtnl_dereference(n->ht_up)->tp_c, new, tb,
+   err = u32_set_parms(net, tp, base, tp_c, new, tb,
tca[TCA_RATE], ovr, extack);
 
if (err) {
@@ -1124,7 +1123,7 @@ static int u32_change(struct net *net, struct sk_buff 
*in_skb,
}
 #endif
 
-   err = u32_set_parms(net, tp, base, ht->tp_c, n, tb, tca[TCA_RATE], ovr,
+   err = u32_set_parms(net, tp, base, tp_c, n, tb, tca[TCA_RATE], ovr,
extack);
if (err == 0) {
struct tc_u_knode __rcu **ins;
-- 
2.11.0



Re: [PATCH net-next 02/11] net: sched: cls_u32: make sure that divisor is a power of 2

2018-10-08 Thread Jamal Hadi Salim

On 2018-10-08 4:46 a.m., Sergei Shtylyov wrote:

Hello!

On 07.10.2018 19:38, Jamal Hadi Salim wrote:


From: Al Viro 

Tested by modifying iproute2 to to allow


    One "to" is enough, no? :-)



Will fix in updated version.

cheers,
jamal


Re: [PATCH net-next 00/11] net: sched: cls_u32 Various improvements

2018-10-08 Thread Jamal Hadi Salim

On 2018-10-08 2:11 a.m., Al Viro wrote:

On Sun, Oct 07, 2018 at 10:55:52PM -0700, David Miller wrote:

From: Al Viro 
Date: Mon, 8 Oct 2018 06:45:15 +0100


Er...  Both are due to missing in the very beginning of the series (well, on
top of "net: sched: cls_u32: fix hnode refcounting") commit


All dependencies like this must be explicitly stated.

And in such situations you actually should wait for the dependency to
get into 'net', eventually get merged into 'net-next', and then you
can submit the stuff that depends upon it.

Not the way this was done.


Point (and this commit is simply missing - it's not that it went into
net).  FWIW, this was simply "this is what the breakage is caused by",
not an attempt to amend the submission or anything like that...



My bad. I was wondering why it compiled for me. There is no dependency
on the single patch that went to net. I will submit an updated version.

cheers,
jamal


[PATCH net-next 08/11] net: sched: cls_u32: the tp_c argument of u32_set_parms() is always tp->data

2018-10-07 Thread Jamal Hadi Salim
From: Al Viro 

It must be tc_u_common associated with that tp (i.e. tp->data).
Proof:
* both ->ht_up and ->tp_c are assign-once
* ->tp_c of anything inserted into tp_c->hlist is tp_c
* hnodes never get reinserted into the lists or moved
between those, so anything found by u32_lookup_ht(tp->data, ...)
will have ->tp_c equal to tp->data.
* tp->root->tp_c == tp->data.
* ->ht_up of anything inserted into hnode->ht[...] is
equal to hnode.
* knodes never get reinserted into hash chains or moved
between those, so anything returned by u32_lookup_key(ht, ...)
will have ->ht_up equal to ht.
* any knode returned by u32_get(tp, ...) will have ->ht_up->tp_c
point to tp->data

Signed-off-by: Al Viro 
Signed-off-by: Jamal Hadi Salim 
---
 net/sched/cls_u32.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index 53f34f8cde8b..3ed2c9866b36 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -956,8 +956,7 @@ static int u32_change(struct net *net, struct sk_buff 
*in_skb,
if (!new)
return -ENOMEM;
 
-   err = u32_set_parms(net, tp, base,
-   rtnl_dereference(n->ht_up)->tp_c, new, tb,
+   err = u32_set_parms(net, tp, base, tp_c, new, tb,
tca[TCA_RATE], ovr, extack);
 
if (err) {
@@ -1124,7 +1123,7 @@ static int u32_change(struct net *net, struct sk_buff 
*in_skb,
}
 #endif
 
-   err = u32_set_parms(net, tp, base, ht->tp_c, n, tb, tca[TCA_RATE], ovr,
+   err = u32_set_parms(net, tp, base, tp_c, n, tb, tca[TCA_RATE], ovr,
extack);
if (err == 0) {
struct tc_u_knode __rcu **ins;
-- 
2.11.0



[PATCH net-next 00/11] net: sched: cls_u32 Various improvements

2018-10-07 Thread Jamal Hadi Salim
From: Jamal Hadi Salim 

Various improvements from Al.

Al Viro (11):
  net: sched: cls_u32: disallow linking to root hnode
  net: sched: cls_u32: make sure that divisor is a power of 2
  net: sched: cls_u32: get rid of unused argument of u32_destroy_key()
  net: sched: cls_u32: get rid of tc_u_knode ->tp
  net: sched: cls_u32: get rid of tc_u_common ->rcu
  net: sched: cls_u32: clean tc_u_common hashtable
  net: sched: cls_u32: pass tc_u_common to u32_set_parms() instead of
tc_u_hnode
  net: sched: cls_u32: the tp_c argument of u32_set_parms() is always
tp->data
  net: sched: cls_u32: keep track of knodes count in tc_u_common
  net: sched: cls_u32: simplify the hell out u32_delete() emptiness
check
  net: sched: cls_u32: get rid of tp_c

 net/sched/cls_u32.c | 117 
 1 file changed, 35 insertions(+), 82 deletions(-)

-- 
2.11.0



[PATCH net-next 05/11] net: sched: cls_u32: get rid of tc_u_common ->rcu

2018-10-07 Thread Jamal Hadi Salim
From: Al Viro 

unused

Signed-off-by: Al Viro 
Signed-off-by: Jamal Hadi Salim 
---
 net/sched/cls_u32.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index 810c49ac1bbe..c378168f4562 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -98,7 +98,6 @@ struct tc_u_common {
int refcnt;
struct idr  handle_idr;
struct hlist_node   hnode;
-   struct rcu_head rcu;
 };
 
 static inline unsigned int u32_hash_fold(__be32 key,
-- 
2.11.0



[PATCH net-next 04/11] net: sched: cls_u32: get rid of tc_u_knode ->tp

2018-10-07 Thread Jamal Hadi Salim
From: Al Viro 

not used anymore

Signed-off-by: Al Viro 
Signed-off-by: Jamal Hadi Salim 
---
 net/sched/cls_u32.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index ef0f2e6ec422..810c49ac1bbe 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -68,7 +68,6 @@ struct tc_u_knode {
u32 mask;
u32 __percpu*pcpu_success;
 #endif
-   struct tcf_proto*tp;
struct rcu_work rwork;
/* The 'sel' field MUST be the last field in structure to allow for
 * tc_u32_keys allocated at end of structure.
@@ -896,7 +895,6 @@ static struct tc_u_knode *u32_init_knode(struct tcf_proto 
*tp,
/* Similarly success statistics must be moved as pointers */
new->pcpu_success = n->pcpu_success;
 #endif
-   new->tp = tp;
memcpy(>sel, s, sizeof(*s) + s->nkeys*sizeof(struct tc_u32_key));
 
if (tcf_exts_init(>exts, TCA_U32_ACT, TCA_U32_POLICE)) {
@@ -1112,7 +1110,6 @@ static int u32_change(struct net *net, struct sk_buff 
*in_skb,
n->handle = handle;
n->fshift = s->hmask ? ffs(ntohl(s->hmask)) - 1 : 0;
n->flags = flags;
-   n->tp = tp;
 
err = tcf_exts_init(>exts, TCA_U32_ACT, TCA_U32_POLICE);
if (err < 0)
-- 
2.11.0



[PATCH net-next 09/11] net: sched: cls_u32: get rid of tp_c

2018-10-07 Thread Jamal Hadi Salim
From: Al Viro 

Both hnode ->tp_c and tp_c argument of u32_set_parms()
the latter is redundant, the former - never read...

Signed-off-by: Al Viro 
Signed-off-by: Jamal Hadi Salim 
---
 net/sched/cls_u32.c | 11 ---
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index 3ed2c9866b36..3d4c360f9b0c 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -79,7 +79,6 @@ struct tc_u_hnode {
struct tc_u_hnode __rcu *next;
u32 handle;
u32 prio;
-   struct tc_u_common  *tp_c;
int refcnt;
unsigned intdivisor;
struct idr  handle_idr;
@@ -390,7 +389,6 @@ static int u32_init(struct tcf_proto *tp)
tp_c->refcnt++;
RCU_INIT_POINTER(root_ht->next, tp_c->hlist);
rcu_assign_pointer(tp_c->hlist, root_ht);
-   root_ht->tp_c = tp_c;
 
rcu_assign_pointer(tp->root, root_ht);
tp->data = tp_c;
@@ -761,7 +759,7 @@ static const struct nla_policy u32_policy[TCA_U32_MAX + 1] 
= {
 };
 
 static int u32_set_parms(struct net *net, struct tcf_proto *tp,
-unsigned long base, struct tc_u_common *tp_c,
+unsigned long base,
 struct tc_u_knode *n, struct nlattr **tb,
 struct nlattr *est, bool ovr,
 struct netlink_ext_ack *extack)
@@ -782,7 +780,7 @@ static int u32_set_parms(struct net *net, struct tcf_proto 
*tp,
}
 
if (handle) {
-   ht_down = u32_lookup_ht(tp_c, handle);
+   ht_down = u32_lookup_ht(tp->data, handle);
 
if (!ht_down) {
NL_SET_ERR_MSG_MOD(extack, "Link hash table not 
found");
@@ -956,7 +954,7 @@ static int u32_change(struct net *net, struct sk_buff 
*in_skb,
if (!new)
return -ENOMEM;
 
-   err = u32_set_parms(net, tp, base, tp_c, new, tb,
+   err = u32_set_parms(net, tp, base, new, tb,
tca[TCA_RATE], ovr, extack);
 
if (err) {
@@ -1012,7 +1010,6 @@ static int u32_change(struct net *net, struct sk_buff 
*in_skb,
return err;
}
}
-   ht->tp_c = tp_c;
ht->refcnt = 1;
ht->divisor = divisor;
ht->handle = handle;
@@ -1123,7 +1120,7 @@ static int u32_change(struct net *net, struct sk_buff 
*in_skb,
}
 #endif
 
-   err = u32_set_parms(net, tp, base, tp_c, n, tb, tca[TCA_RATE], ovr,
+   err = u32_set_parms(net, tp, base, n, tb, tca[TCA_RATE], ovr,
extack);
if (err == 0) {
struct tc_u_knode __rcu **ins;
-- 
2.11.0



[PATCH net-next 01/11] net: sched: cls_u32: disallow linking to root hnode

2018-10-07 Thread Jamal Hadi Salim
From: Al Viro 

Operation makes no sense.  Nothing will actually break if we do so
(depth limit in u32_classify() will prevent infinite loops), but
according to maintainers it's best prohibited outright.

NOTE: doing so guarantees that u32_destroy() will trigger the call
of u32_destroy_hnode(); we might want to make that unconditional.

Test:
tc qdisc add dev eth0 ingress
tc filter add dev eth0 parent : protocol ip prio 100 u32 \
link 800: offset at 0 mask 0f00 shift 6 plus 0 eat match ip protocol 6 ff
should fail with
Error: cls_u32: Not linking to root node

Signed-off-by: Al Viro 
Signed-off-by: Jamal Hadi Salim 
---
 net/sched/cls_u32.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index 622f4657da94..3357331a80a2 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -797,6 +797,10 @@ static int u32_set_parms(struct net *net, struct tcf_proto 
*tp,
NL_SET_ERR_MSG_MOD(extack, "Link hash table not 
found");
return -EINVAL;
}
+   if (ht_down->is_root) {
+   NL_SET_ERR_MSG_MOD(extack, "Not linking to root 
node");
+   return -EINVAL;
+   }
ht_down->refcnt++;
}
 
-- 
2.11.0



[PATCH net-next 06/11] net: sched: cls_u32: clean tc_u_common hashtable

2018-10-07 Thread Jamal Hadi Salim
From: Al Viro 

* calculate key *once*, not for each hash chain element
* let tc_u_hash() return the pointer to chain head rather than index -
callers are cleaner that way.

Signed-off-by: Al Viro 
Signed-off-by: Jamal Hadi Salim 
---
 net/sched/cls_u32.c | 24 +---
 1 file changed, 9 insertions(+), 15 deletions(-)

diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index c378168f4562..3f6fba831c57 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -343,19 +343,16 @@ static void *tc_u_common_ptr(const struct tcf_proto *tp)
return block->q;
 }
 
-static unsigned int tc_u_hash(const struct tcf_proto *tp)
+static struct hlist_head *tc_u_hash(void *key)
 {
-   return hash_ptr(tc_u_common_ptr(tp), U32_HASH_SHIFT);
+   return tc_u_common_hash + hash_ptr(key, U32_HASH_SHIFT);
 }
 
-static struct tc_u_common *tc_u_common_find(const struct tcf_proto *tp)
+static struct tc_u_common *tc_u_common_find(void *key)
 {
struct tc_u_common *tc;
-   unsigned int h;
-
-   h = tc_u_hash(tp);
-   hlist_for_each_entry(tc, _u_common_hash[h], hnode) {
-   if (tc->ptr == tc_u_common_ptr(tp))
+   hlist_for_each_entry(tc, tc_u_hash(key), hnode) {
+   if (tc->ptr == key)
return tc;
}
return NULL;
@@ -364,10 +361,8 @@ static struct tc_u_common *tc_u_common_find(const struct 
tcf_proto *tp)
 static int u32_init(struct tcf_proto *tp)
 {
struct tc_u_hnode *root_ht;
-   struct tc_u_common *tp_c;
-   unsigned int h;
-
-   tp_c = tc_u_common_find(tp);
+   void *key = tc_u_common_ptr(tp);
+   struct tc_u_common *tp_c = tc_u_common_find(key);
 
root_ht = kzalloc(sizeof(*root_ht), GFP_KERNEL);
if (root_ht == NULL)
@@ -385,12 +380,11 @@ static int u32_init(struct tcf_proto *tp)
kfree(root_ht);
return -ENOBUFS;
}
-   tp_c->ptr = tc_u_common_ptr(tp);
+   tp_c->ptr = key;
INIT_HLIST_NODE(_c->hnode);
idr_init(_c->handle_idr);
 
-   h = tc_u_hash(tp);
-   hlist_add_head(_c->hnode, _u_common_hash[h]);
+   hlist_add_head(_c->hnode, tc_u_hash(key));
}
 
tp_c->refcnt++;
-- 
2.11.0



[PATCH net-next 07/11] net: sched: cls_u32: pass tc_u_common to u32_set_parms() instead of tc_u_hnode

2018-10-07 Thread Jamal Hadi Salim
From: Al Viro 

the only thing we used ht for was ht->tp_c and callers can get that
without going through ->tp_c at all; start with lifting that into
the callers, next commits will massage those, eventually removing
->tp_c altogether.

Signed-off-by: Al Viro 
Signed-off-by: Jamal Hadi Salim 
---
 net/sched/cls_u32.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index 3f6fba831c57..53f34f8cde8b 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -761,7 +761,7 @@ static const struct nla_policy u32_policy[TCA_U32_MAX + 1] 
= {
 };
 
 static int u32_set_parms(struct net *net, struct tcf_proto *tp,
-unsigned long base, struct tc_u_hnode *ht,
+unsigned long base, struct tc_u_common *tp_c,
 struct tc_u_knode *n, struct nlattr **tb,
 struct nlattr *est, bool ovr,
 struct netlink_ext_ack *extack)
@@ -782,7 +782,7 @@ static int u32_set_parms(struct net *net, struct tcf_proto 
*tp,
}
 
if (handle) {
-   ht_down = u32_lookup_ht(ht->tp_c, handle);
+   ht_down = u32_lookup_ht(tp_c, handle);
 
if (!ht_down) {
NL_SET_ERR_MSG_MOD(extack, "Link hash table not 
found");
@@ -957,7 +957,7 @@ static int u32_change(struct net *net, struct sk_buff 
*in_skb,
return -ENOMEM;
 
err = u32_set_parms(net, tp, base,
-   rtnl_dereference(n->ht_up), new, tb,
+   rtnl_dereference(n->ht_up)->tp_c, new, tb,
tca[TCA_RATE], ovr, extack);
 
if (err) {
@@ -1124,7 +1124,7 @@ static int u32_change(struct net *net, struct sk_buff 
*in_skb,
}
 #endif
 
-   err = u32_set_parms(net, tp, base, ht, n, tb, tca[TCA_RATE], ovr,
+   err = u32_set_parms(net, tp, base, ht->tp_c, n, tb, tca[TCA_RATE], ovr,
extack);
if (err == 0) {
struct tc_u_knode __rcu **ins;
-- 
2.11.0



[PATCH net-next 10/11] net: sched: cls_u32: keep track of knodes count in tc_u_common

2018-10-07 Thread Jamal Hadi Salim
From: Al Viro 

allows to simplify u32_delete() considerably

Signed-off-by: Al Viro 
Signed-off-by: Jamal Hadi Salim 
---
 net/sched/cls_u32.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index 3d4c360f9b0c..61593bee08db 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -97,6 +97,7 @@ struct tc_u_common {
int refcnt;
struct idr  handle_idr;
struct hlist_node   hnode;
+   longknodes;
 };
 
 static inline unsigned int u32_hash_fold(__be32 key,
@@ -452,6 +453,7 @@ static void u32_delete_key_freepf_work(struct work_struct 
*work)
 
 static int u32_delete_key(struct tcf_proto *tp, struct tc_u_knode *key)
 {
+   struct tc_u_common *tp_c = tp->data;
struct tc_u_knode __rcu **kp;
struct tc_u_knode *pkp;
struct tc_u_hnode *ht = rtnl_dereference(key->ht_up);
@@ -462,6 +464,7 @@ static int u32_delete_key(struct tcf_proto *tp, struct 
tc_u_knode *key)
 kp = >next, pkp = rtnl_dereference(*kp)) {
if (pkp == key) {
RCU_INIT_POINTER(*kp, key->next);
+   tp_c->knodes--;
 
tcf_unbind_filter(tp, >res);
idr_remove(>handle_idr, key->handle);
@@ -576,6 +579,7 @@ static int u32_replace_hw_knode(struct tcf_proto *tp, 
struct tc_u_knode *n,
 static void u32_clear_hnode(struct tcf_proto *tp, struct tc_u_hnode *ht,
struct netlink_ext_ack *extack)
 {
+   struct tc_u_common *tp_c = tp->data;
struct tc_u_knode *n;
unsigned int h;
 
@@ -583,6 +587,7 @@ static void u32_clear_hnode(struct tcf_proto *tp, struct 
tc_u_hnode *ht,
while ((n = rtnl_dereference(ht->ht[h])) != NULL) {
RCU_INIT_POINTER(ht->ht[h],
 rtnl_dereference(n->next));
+   tp_c->knodes--;
tcf_unbind_filter(tp, >res);
u32_remove_hw_knode(tp, n, extack);
idr_remove(>handle_idr, n->handle);
@@ -1141,6 +1146,7 @@ static int u32_change(struct net *net, struct sk_buff 
*in_skb,
 
RCU_INIT_POINTER(n->next, pins);
rcu_assign_pointer(*ins, n);
+   tp_c->knodes++;
*arg = n;
return 0;
}
-- 
2.11.0



[PATCH net-next 02/11] net: sched: cls_u32: make sure that divisor is a power of 2

2018-10-07 Thread Jamal Hadi Salim
From: Al Viro 

Tested by modifying iproute2 to to allow
sending a divisor > 255 

Tested-by: Jamal Hadi Salim 
Signed-off-by: Al Viro 
Signed-off-by: Jamal Hadi Salim 
---
 net/sched/cls_u32.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index 3357331a80a2..ce55eea448a0 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -994,7 +994,11 @@ static int u32_change(struct net *net, struct sk_buff 
*in_skb,
if (tb[TCA_U32_DIVISOR]) {
unsigned int divisor = nla_get_u32(tb[TCA_U32_DIVISOR]);
 
-   if (--divisor > 0x100) {
+   if (!is_power_of_2(divisor)) {
+   NL_SET_ERR_MSG_MOD(extack, "Divisor is not a power of 
2");
+   return -EINVAL;
+   }
+   if (divisor-- > 0x100) {
NL_SET_ERR_MSG_MOD(extack, "Exceeded maximum 256 hash 
buckets");
return -EINVAL;
}
-- 
2.11.0



[PATCH net-next 11/11] net: sched: cls_u32: simplify the hell out u32_delete() emptiness check

2018-10-07 Thread Jamal Hadi Salim
From: Al Viro 

Now that we have the knode count, we can instantly check if
any hnodes are non-empty.  And that kills the check for extra
references to root hnode - those could happen only if there was
a knode to carry such a link.

Signed-off-by: Al Viro 
Signed-off-by: Jamal Hadi Salim 
---
 net/sched/cls_u32.c | 48 +---
 1 file changed, 1 insertion(+), 47 deletions(-)

diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index 61593bee08db..ac79a40a0392 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -627,17 +627,6 @@ static int u32_destroy_hnode(struct tcf_proto *tp, struct 
tc_u_hnode *ht,
return -ENOENT;
 }
 
-static bool ht_empty(struct tc_u_hnode *ht)
-{
-   unsigned int h;
-
-   for (h = 0; h <= ht->divisor; h++)
-   if (rcu_access_pointer(ht->ht[h]))
-   return false;
-
-   return true;
-}
-
 static void u32_destroy(struct tcf_proto *tp, struct netlink_ext_ack *extack)
 {
struct tc_u_common *tp_c = tp->data;
@@ -675,13 +664,9 @@ static int u32_delete(struct tcf_proto *tp, void *arg, 
bool *last,
  struct netlink_ext_ack *extack)
 {
struct tc_u_hnode *ht = arg;
-   struct tc_u_hnode *root_ht = rtnl_dereference(tp->root);
struct tc_u_common *tp_c = tp->data;
int ret = 0;
 
-   if (ht == NULL)
-   goto out;
-
if (TC_U32_KEY(ht->handle)) {
u32_remove_hw_knode(tp, (struct tc_u_knode *)ht, extack);
ret = u32_delete_key(tp, (struct tc_u_knode *)ht);
@@ -702,38 +687,7 @@ static int u32_delete(struct tcf_proto *tp, void *arg, 
bool *last,
}
 
 out:
-   *last = true;
-   if (root_ht) {
-   if (root_ht->refcnt > 1) {
-   *last = false;
-   goto ret;
-   }
-   if (root_ht->refcnt == 1) {
-   if (!ht_empty(root_ht)) {
-   *last = false;
-   goto ret;
-   }
-   }
-   }
-
-   if (tp_c->refcnt > 1) {
-   *last = false;
-   goto ret;
-   }
-
-   if (tp_c->refcnt == 1) {
-   struct tc_u_hnode *ht;
-
-   for (ht = rtnl_dereference(tp_c->hlist);
-ht;
-ht = rtnl_dereference(ht->next))
-   if (!ht_empty(ht)) {
-   *last = false;
-   break;
-   }
-   }
-
-ret:
+   *last = tp_c->refcnt == 1 && tp_c->knodes == 0;
return ret;
 }
 
-- 
2.11.0



[PATCH net-next 03/11] net: sched: cls_u32: get rid of unused argument of u32_destroy_key()

2018-10-07 Thread Jamal Hadi Salim
From: Al Viro 

Signed-off-by: Al Viro 
Signed-off-by: Jamal Hadi Salim 
---
 net/sched/cls_u32.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index ce55eea448a0..ef0f2e6ec422 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -405,8 +405,7 @@ static int u32_init(struct tcf_proto *tp)
return 0;
 }
 
-static int u32_destroy_key(struct tcf_proto *tp, struct tc_u_knode *n,
-  bool free_pf)
+static int u32_destroy_key(struct tc_u_knode *n, bool free_pf)
 {
struct tc_u_hnode *ht = rtnl_dereference(n->ht_down);
 
@@ -440,7 +439,7 @@ static void u32_delete_key_work(struct work_struct *work)
  struct tc_u_knode,
  rwork);
rtnl_lock();
-   u32_destroy_key(key->tp, key, false);
+   u32_destroy_key(key, false);
rtnl_unlock();
 }
 
@@ -457,7 +456,7 @@ static void u32_delete_key_freepf_work(struct work_struct 
*work)
  struct tc_u_knode,
  rwork);
rtnl_lock();
-   u32_destroy_key(key->tp, key, true);
+   u32_destroy_key(key, true);
rtnl_unlock();
 }
 
@@ -600,7 +599,7 @@ static void u32_clear_hnode(struct tcf_proto *tp, struct 
tc_u_hnode *ht,
if (tcf_exts_get_net(>exts))
tcf_queue_work(>rwork, 
u32_delete_key_freepf_work);
else
-   u32_destroy_key(n->tp, n, true);
+   u32_destroy_key(n, true);
}
}
 }
@@ -971,13 +970,13 @@ static int u32_change(struct net *net, struct sk_buff 
*in_skb,
tca[TCA_RATE], ovr, extack);
 
if (err) {
-   u32_destroy_key(tp, new, false);
+   u32_destroy_key(new, false);
return err;
}
 
err = u32_replace_hw_knode(tp, new, flags, extack);
if (err) {
-   u32_destroy_key(tp, new, false);
+   u32_destroy_key(new, false);
return err;
}
 
-- 
2.11.0



[PATCH net 1/1] net: sched: cls_u32: fix hnode refcounting

2018-10-07 Thread Jamal Hadi Salim
From: Al Viro 

cls_u32.c misuses refcounts for struct tc_u_hnode - it counts references
via ->hlist and via ->tp_root together.  u32_destroy() drops the former
and, in case when there had been links, leaves the sucker on the list.
As the result, there's nothing to protect it from getting freed once links
are dropped.
That also makes the "is it busy" check incapable of catching the root
hnode - it *is* busy (there's a reference from tp), but we don't see it as
something separate.  "Is it our root?" check partially covers that, but
the problem exists for others' roots as well.

AFAICS, the minimal fix preserving the existing behaviour (where it doesn't
include oopsen, that is) would be this:
* count tp->root and tp_c->hlist as separate references.  I.e.
have u32_init() set refcount to 2, not 1.
* in u32_destroy() we always drop the former;
in u32_destroy_hnode() - the latter.

That way we have *all* references contributing to refcount.  List
removal happens in u32_destroy_hnode() (called only when ->refcnt is 1)
an in u32_destroy() in case of tc_u_common going away, along with
everything reachable from it.  IOW, that way we know that
u32_destroy_key() won't free something still on the list (or pointed to by
someone's ->root).

Reproducer:

tc qdisc add dev eth0 ingress
tc filter add dev eth0 parent : protocol ip prio 100 handle 1: \
u32 divisor 1
tc filter add dev eth0 parent : protocol ip prio 200 handle 2: \
u32 divisor 1
tc filter add dev eth0 parent : protocol ip prio 100 \
handle 1:0:11 u32 ht 1: link 801: offset at 0 mask 0f00 shift 6 \
plus 0 eat match ip protocol 6 ff
tc filter delete dev eth0 parent : protocol ip prio 200
tc filter change dev eth0 parent : protocol ip prio 100 \
handle 1:0:11 u32 ht 1: link 0: offset at 0 mask 0f00 shift 6 plus 0 \
eat match ip protocol 6 ff
tc filter delete dev eth0 parent : protocol ip prio 100

Signed-off-by: Al Viro 
Signed-off-by: Jamal Hadi Salim 
---
 net/sched/cls_u32.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index f218ccf1e2d9..b2c3406a2cf2 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -398,6 +398,7 @@ static int u32_init(struct tcf_proto *tp)
rcu_assign_pointer(tp_c->hlist, root_ht);
root_ht->tp_c = tp_c;
 
+   root_ht->refcnt++;
rcu_assign_pointer(tp->root, root_ht);
tp->data = tp_c;
return 0;
@@ -610,7 +611,7 @@ static int u32_destroy_hnode(struct tcf_proto *tp, struct 
tc_u_hnode *ht,
struct tc_u_hnode __rcu **hn;
struct tc_u_hnode *phn;
 
-   WARN_ON(ht->refcnt);
+   WARN_ON(--ht->refcnt);
 
u32_clear_hnode(tp, ht, extack);
 
@@ -649,7 +650,7 @@ static void u32_destroy(struct tcf_proto *tp, struct 
netlink_ext_ack *extack)
 
WARN_ON(root_ht == NULL);
 
-   if (root_ht && --root_ht->refcnt == 0)
+   if (root_ht && --root_ht->refcnt == 1)
u32_destroy_hnode(tp, root_ht, extack);
 
if (--tp_c->refcnt == 0) {
@@ -698,7 +699,6 @@ static int u32_delete(struct tcf_proto *tp, void *arg, bool 
*last,
}
 
if (ht->refcnt == 1) {
-   ht->refcnt--;
u32_destroy_hnode(tp, ht, extack);
} else {
NL_SET_ERR_MSG_MOD(extack, "Can not delete in-use filter");
@@ -708,11 +708,11 @@ static int u32_delete(struct tcf_proto *tp, void *arg, 
bool *last,
 out:
*last = true;
if (root_ht) {
-   if (root_ht->refcnt > 1) {
+   if (root_ht->refcnt > 2) {
*last = false;
goto ret;
}
-   if (root_ht->refcnt == 1) {
+   if (root_ht->refcnt == 2) {
if (!ht_empty(root_ht)) {
*last = false;
goto ret;
-- 
2.11.0



Re: Offloaded u32 classifier tables WAS (Re: [PATCH net 00/13] cls_u32 cleanups and fixes.

2018-09-10 Thread Jamal Hadi Salim

On 2018-09-10 8:25 a.m., Jamal Hadi Salim wrote:

On 2018-09-09 11:48 a.m., Al Viro wrote:



BTW, shouldn't we issue u32_clear_hw_hnode() every time
we destroy an hnode?  It's done on u32_delete(), it's
done (for root ht) on u32_destroy(), but it's not done
for any other hnodes when you remove the entire (not shared)
filter.  Looks fishy...



What you are saying makes sense, but that doesnt seem
like a new thing.
All hardware offload examples I have seen use root tables
only[1]. So I am not sure if they use any other tables
although the intel hardware at least seems very capable.
Look at ixgbe_main.c for example. Theres an explicit assumption
that root is always 0x800 but oresence of



"..but presence of other tables can be handled"

cheers,
jamal


Offloaded u32 classifier tables WAS (Re: [PATCH net 00/13] cls_u32 cleanups and fixes.

2018-09-10 Thread Jamal Hadi Salim

On 2018-09-09 11:48 a.m., Al Viro wrote:



BTW, shouldn't we issue u32_clear_hw_hnode() every time
we destroy an hnode?  It's done on u32_delete(), it's
done (for root ht) on u32_destroy(), but it's not done
for any other hnodes when you remove the entire (not shared)
filter.  Looks fishy...



What you are saying makes sense, but that doesnt seem
like a new thing.
All hardware offload examples I have seen use root tables
only[1]. So I am not sure if they use any other tables
although the intel hardware at least seems very capable.
Look at ixgbe_main.c for example. Theres an explicit assumption
that root is always 0x800 but oresence of

+Cc some of the NIC vendor folks..



cheers,
jamal

[1]
Here's a script posted by someone at Intel(Sridhar?) a while back
that adds 2 filters, one with skip-sw and the other with skip-hw
flag.

---
   # add ingress qdisc
   tc qdisc add dev p4p1 ingress

   # enable hw tc offload.
   ethtool -K p4p1 hw-tc-offload on

   # add u32 filter with skip-sw flag.
   tc filter add dev p4p1 parent : protocol ip prio 99 \
  handle 800:0:1 u32 ht 800: flowid 800:1 \
  skip-sw \
  match ip src 192.168.1.0/24 \
  action drop

   # add u32 filter with skip-hw flag.
   tc filter add dev p4p1 parent : protocol ip prio 99 \
  handle 800:0:2 u32 ht 800: flowid 800:2 \
  skip-hw \
  match ip src 192.168.2.0/24 \
  action drop



Re: [PATCH net 00/13] cls_u32 cleanups and fixes.

2018-09-10 Thread Jamal Hadi Salim

On 2018-09-09 10:15 a.m., Al Viro wrote:

[..]


Umm...  Interesting - TCA_U32_SEL is not the only thing that
gets ignored there; TCA_U32_MARK gets the same treatment.
And then there's a lovely question what to do with n->pf -
it's an array of n->sel.nkeys counters, and apparently we
want (at least in common cases) to avoid resetting those.



*If* we declare that ->nkeys mismatch means failure, it's
all relatively easy to implement.  Alternatively, we could
declare that selector change means resetting the stats.
Preferences?


I am now conflicted. I have sample scripts which showed
that at one point that worked. All of them seem to be
indicating the nkeys and stats never changed. So that
could be a starting point; however, if a policy changes
the match tuples then it is no longer the same rule imo.

Note: you can change the "actions" - of which the most
primitive is "set classid x:y" without changing what is
being matched. i.e changing the classid in that example
would work.

cheers,
jamal




Re: [PATCH net 00/13] cls_u32 cleanups and fixes.

2018-09-09 Thread Jamal Hadi Salim



Since you have the momentum here: i noticed something
unusual while i was trying to craft a test that would
vet some of your changes. This has nothing to do with
your changes, same happens on my stock debian laptop
with kernel:
4.17.0-0.bpo.3-amd64 #1 SMP Debian 4.17.17-1~bpo9+1 (2018-08-27)

Looking at git - possibly introduced around the time u32
lockless was being introduced and maybe even earlier
than that. Unfortunately i dont have time to dig
further.

To reproduce what i am referring to, here's a setup:

$tc filter add dev eth0 parent : protocol ip prio 102 u32 \
classid 1:2 match ip src 192.168.8.0/8
$tc filter replace dev eth0 parent : protocol ip prio 102 \
handle 800:0:800 u32 classid 1:2 match ip src 1.1.0.0/24

u32_change() code path should have allowed changing of the
keynode.

cheers,
jamal


Re: [PATCH net 13/13] net: sched: cls_u32: simplify the hell out u32_delete() emptiness check

2018-09-09 Thread Jamal Hadi Salim

On 2018-09-08 9:31 p.m., Al Viro wrote:

From: Al Viro 

Now that we have the knode count, we can instantly check if
any hnodes are non-empty.  And that kills the check for extra
references to root hnode - those could happen only if there was
a knode to carry such a link.

Signed-off-by: Al Viro 


Acked-by: Jamal Hadi Salim 

cheers,
jamal


Re: [PATCH net 12/13] net: sched: cls_u32: keep track of knodes count in tc_u_common

2018-09-09 Thread Jamal Hadi Salim

On 2018-09-08 9:31 p.m., Al Viro wrote:

From: Al Viro 

allows to simplify u32_delete() considerably

Signed-off-by: Al Viro 


Acked-by: Jamal Hadi Salim 

cheers,
jamal


Re: [PATCH net 11/13] net: sched: cls_u32: get rid of hnode ->tp_c and tp_c argument of u32_set_parms()

2018-09-09 Thread Jamal Hadi Salim

On 2018-09-08 9:31 p.m., Al Viro wrote:

From: Al Viro 



Acked-by: Jamal Hadi Salim 

cheers,
jamal


Re: [PATCH net 10/13] net: sched: cls_u32: the tp_c argument of u32_set_parms() is always tp->data

2018-09-09 Thread Jamal Hadi Salim

On 2018-09-08 9:31 p.m., Al Viro wrote:

From: Al Viro 




Acked-by: Jamal Hadi Salim 

cheers,
jamal


Re: [PATCH net 09/13] net: sched: cls_u32: pass tc_u_common to u32_set_parms() instead of tc_u_hnode

2018-09-09 Thread Jamal Hadi Salim

On 2018-09-08 9:31 p.m., Al Viro wrote:

From: Al Viro 

the only thing we used ht for was ht->tp_c and callers can get that
without going through ->tp_c at all; start with lifting that into
the callers, next commits will massage those, eventually removing
->tp_c altogether.

Signed-off-by: Al Viro 


Acked-by: Jamal Hadi Salim 

cheers,
jamal


Re: [PATCH net 08/13] net: sched: cls_u32: clean tc_u_common hashtable

2018-09-09 Thread Jamal Hadi Salim

On 2018-09-08 9:31 p.m., Al Viro wrote:

From: Al Viro 

* calculate key *once*, not for each hash chain element
* let tc_u_hash() return the pointer to chain head rather than index -
callers are cleaner that way.

Signed-off-by: Al Viro 


Acked-by: Jamal Hadi Salim 

cheers,
jamal


Re: [PATCH net 07/13] net: sched: cls_u32: get rid of tc_u_common ->rcu

2018-09-09 Thread Jamal Hadi Salim

On 2018-09-08 9:31 p.m., Al Viro wrote:

From: Al Viro 

unused

Signed-off-by: Al Viro 


Acked-by: Jamal Hadi Salim 

cheers,
jamal


Re: [PATCH net 06/13] net: sched: cls_u32: get rid of tc_u_knode ->tp

2018-09-09 Thread Jamal Hadi Salim

On 2018-09-08 9:31 p.m., Al Viro wrote:

From: Al Viro 

not used anymore

Signed-off-by: Al Viro 


Acked-by: Jamal Hadi Salim 

cheers,
jamal


Re: [PATCH net 05/13] net: sched: cls_u32: get rid of unused argument of u32_destroy_key()

2018-09-09 Thread Jamal Hadi Salim

On 2018-09-08 9:31 p.m., Al Viro wrote:

From: Al Viro 

Signed-off-by: Al Viro 


Acked-by: Jamal Hadi Salim 

cheers,
jamal


Re: [PATCH net 04/13] net: sched: cls_u32: make sure that divisor is a power of 2

2018-09-09 Thread Jamal Hadi Salim

On 2018-09-08 9:31 p.m., Al Viro wrote:

From: Al Viro 
 > Signed-off-by: Al Viro 


Test is by hacking user space iproute2 to allow
sending a divisor > 255

Tested-by: Jamal Hadi Salim 
Acked-by: Jamal Hadi Salim 

cheers,
jamal


Re: [PATCH net 02/13] net: sched: cls_u32: mark root hnode explicitly

2018-09-09 Thread Jamal Hadi Salim

On 2018-09-08 9:31 p.m., Al Viro wrote:

From: Al Viro 


Acked-by: Jamal Hadi Salim 

cheers,
jamal



Re: [PATCH net 03/13] net: sched: cls_u32: disallow linking to root hnode

2018-09-09 Thread Jamal Hadi Salim

On 2018-09-08 9:31 p.m., Al Viro wrote:

From: Al Viro 

Operation makes no sense.  Nothing will actually break if we do so
(depth limit in u32_classify() will prevent infinite loops), but
according to maintainers it's best prohibited outright.

NOTE: doing so guarantees that u32_destroy() will trigger the call
of u32_destroy_hnode(); we might want to make that unconditional.

Test:
tc qdisc add dev eth0 ingress
tc filter add dev eth0 parent : protocol ip prio 100 u32 \
link 800: offset at 0 mask 0f00 shift 6 plus 0 eat match ip protocol 6 
ff
should fail with
Error: cls_u32: Not linking to root node

Signed-off-by: Al Viro 


Tested-by: Jamal Hadi Salim 
Acked-by: Jamal Hadi Salim 

cheers,
jamal


Re: [PATCH net 01/13] net: sched: cls_u32: fix hnode refcounting

2018-09-09 Thread Jamal Hadi Salim

On 2018-09-08 9:31 p.m., Al Viro wrote:

From: Al Viro 

cls_u32.c misuses refcounts for struct tc_u_hnode - it counts references via
->hlist and via ->tp_root together.  u32_destroy() drops the former and, in
case when there had been links, leaves the sucker on the list.  As the result,
there's nothing to protect it from getting freed once links are dropped.
That also makes the "is it busy" check incapable of catching the root hnode -
it *is* busy (there's a reference from tp), but we don't see it as something
separate.  "Is it our root?" check partially covers that, but the problem
exists for others' roots as well.

AFAICS, the minimal fix preserving the existing behaviour (where it doesn't
include oopsen, that is) would be this:
 * count tp->root and tp_c->hlist as separate references.  I.e.
have u32_init() set refcount to 2, not 1.
* in u32_destroy() we always drop the former; in u32_destroy_hnode() -
the latter.

That way we have *all* references contributing to refcount.  List
removal happens in u32_destroy_hnode() (called only when ->refcnt is 1)
an in u32_destroy() in case of tc_u_common going away, along with everything
reachable from it.  IOW, that way we know that u32_destroy_key() won't
free something still on the list (or pointed to by someone's ->root).

Cc: sta...@vger.kernel.org
Signed-off-by: Al Viro 


per Cong's earlier remark - the reproducer going in the changelog
would be nice to show i.e this you posted earlier, otherwise:

Tested-by: Jamal Hadi Salim 
Acked-by: Jamal Hadi Salim 

Reminder:
--
tc qdisc add dev eth0 ingress
tc filter add dev eth0 parent : protocol ip prio 100 handle 1: u32 
divisor 1
tc filter add dev eth0 parent : protocol ip prio 200 handle 2: u32 
divisor 1
tc filter add dev eth0 parent : protocol ip prio 100 handle 1:0:11 
u32 ht 1: link 801: offset at 0 mask 0f00 shift 6 plus 0 eat match ip 
protocol 6 ff

tc filter delete dev eth0 parent : protocol ip prio 200
tc filter change dev eth0 parent : protocol ip prio 100 handle 
1:0:11 u32 ht 1: link 0: offset at 0 mask 0f00 shift 6 plus 0 eat match 
ip protocol 6 ff

tc filter delete dev eth0 parent ffff: protocol ip prio 100
---

cheers,
jamal


Re: [PATCH 1/7] fix hnode refcounting

2018-09-07 Thread Jamal Hadi Salim

To clarify with an example i used to test
your patches:

#0 add ingress filter
$TC qdisc add dev $P ingress
#1 add filter
$TC filter add dev $P parent : protocol ip prio 10 \
u32 match ip protocol 1 0xff
#2 display
$TC filter ls dev $P parent :

#3 try to delete root
$TC filter delete dev $P parent : protocol ip prio 10 \
handle 800: u32

#4 nothing changes..
$TC filter ls dev $P parent :
#5 delete filter
$TC filter delete dev $P parent : protocol ip prio 10 \
handle 800:0:800 u32
#6 filter gone but hash table still there..
$TC filter ls dev $P parent :
#7 delete tp
$TC filter delete dev $P parent : protocol ip prio 10 \
u32
#8 now it is gone..
$TC filter ls dev $P parent :

your patches show #6 filter as still active.
We want it to look like #8

Hope this helps.

cheers,
jamal


Re: [PATCH 1/7] fix hnode refcounting

2018-09-07 Thread Jamal Hadi Salim

On 2018-09-06 10:35 p.m., Al Viro wrote:

On Thu, Sep 06, 2018 at 06:21:09AM -0400, Jamal Hadi Salim wrote:

[..]


Argh...  Unfortunately, there's this: in u32_delete() we have
 if (root_ht) {
 if (root_ht->refcnt > 1) {
 *last = false;
 goto ret;
 }
 if (root_ht->refcnt == 1) {
 if (!ht_empty(root_ht)) {
 *last = false;
 goto ret;
 }
 }
 }
and that would need to be updated.  


It is not detrimental as you have it right now but
you are right an adjustment is needed...

Deleting of a root directly should not be allowed. But
you can flush a whole tp. Consider this:
--
sudo tc qdisc add dev $P ingress
sudo tc filter add dev $P parent : protocol ip prio 10 \
u32 match ip protocol 1 0xff

Which creates root ht 800

You shouldnt be allowed to do this:
--
tc filter delete dev $P parent : protocol ip prio 10 handle 800: u32
---

But you can delete the tp entirely as such:
---
tc filter delete dev $P parent : protocol ip prio 10 u32
--

The later will go via the destroy() path and flush all filters.

You should also be able to delete individual filters. ex:
$tc filter del dev $P parent : prio 10 handle 800:0:800 u32

Where that code you are referring to is important is when
the last filter deleted - we need the caller to know
and it destroys root.

i.e you should return last=true when the last filter is
deleted so root gets auto deleted (just like it was autocreated)


However, that logics is bloody odd
to start with.  First of all, root_ht has come from
struct tc_u_hnode *root_ht = rtnl_dereference(tp->root);
and the only place where it's ever modified is
 rcu_assign_pointer(tp->root, root_ht);
in u32_init(), where we'd bloody well checked that root_ht is non-NULL
(see
 if (root_ht == NULL)
 return -ENOBUFS;
upstream of that place) and where that assignment is inevitable on the
way to returning 0.  No matter what, if tp has passed u32_init() it
will have non-NULL ->root, forever.  And there is no way for tcf_proto
to be seen outside of tcf_proto_create() without ->init() having returned
0 - it gets freed before anyone sees it.



Yes, the check for root_ht is not necessary - but the check for the
last filter (and testing for last) is needed.


So this 'if (root_ht)' can't be false.  What's more, what the hell is the
whole thing checking?  We are in u32_delete().  It's called (as ->delete())
from tfilter_del_notify(), which is called from tc_del_tfilter().  If we
return 0 with *last true, we follow up calling tcf_proto_destroy().
OK, let's look at the logics in there:
* if there are links to root hnode => false
* if there's no links to root hnode and it has knodes => false
(BTW, if we ever get there with root_ht->refcnt < 1, we are obviously screwed)
* if there is a tcf_proto sharing tp->data => false (i.e. any filters
with different prio - don't bother)
* if tp is the only one with reference to tp->data and there are *any*
knodes => false.

Any extra links can come only from knodes in a non-empty hnode.  And it's not
a common case.  Shouldn't thIe whole thing be
* shared tp->data => false
* any non-empty hnode => false
instead?  Perhaps even with the knode counter in tp->data, avoiding any loops
in there, as well as the entire ht_empty()...

Now, in the very beginning of u32_delete() we have this:
 struct tc_u_hnode *ht = arg;

 if (ht == NULL)
 goto out;
OK, but the call of ->delete() is
 err = tp->ops->delete(tp, fh, last, extack);
and arg == NULL seen in u32_delete() means fh == NULL in tfilter_del_notify().
Which is called in
 if (!fh) {
...
} else {
 bool last;

 err = tfilter_del_notify(net, skb, n, tp, block,
  q, parent, fh, false, ,
  extack);
How can we ever get there with NULL fh?



Try:
tc filter delete dev $P parent : protocol ip prio 10 u32
tcm handle is 0, so will hit that code path.


The whole thing makes very little sense; looks like it used to live in
u32_destroy() prior to commit 763dbf6328e41 ("net_sched: move the empty tp
check from ->destroy() to ->delete()"), but looking at the rationale in
that commit...  I don't see how it fixes anything - sure, now we remove
tcf_proto from the list before calling ->destroy().  Without any RCU delays
in between.  How could it possibly solve any issues with ->classify()
called in parallel with ->destroy()?  cls_u32 (at least these days)
does try to survive u32_destroy() in parallel with u32_classify();
if any other classifiers do

Re: [PATCH 2/7] mark root hnode explicitly

2018-09-06 Thread Jamal Hadi Salim

On 2018-09-06 6:59 a.m., Al Viro wrote:

On Thu, Sep 06, 2018 at 06:34:00AM -0400, Jamal Hadi Salim wrote:

On 2018-09-06 6:28 a.m., Jamal Hadi Salim wrote:


[..]


Point, and that one is IMO enough to give up on using ->flags for
that.  How about simply

diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index 3f985f29ef30..d14048e38b5c 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -84,6 +84,7 @@ struct tc_u_hnode {
int refcnt;
unsigned intdivisor;
struct idr  handle_idr;
+   boolis_root;
struct rcu_head rcu;
u32 flags;
/* The 'ht' field MUST be the last field in structure to allow for
@@ -377,6 +378,7 @@ static int u32_init(struct tcf_proto *tp)
root_ht->refcnt++;
root_ht->handle = tp_c ? gen_new_htid(tp_c, root_ht) : 0x8000;
root_ht->prio = tp->prio;
+   root_ht->is_root = true;
idr_init(_ht->handle_idr);
  
  	if (tp_c == NULL) {

@@ -693,7 +695,7 @@ static int u32_delete(struct tcf_proto *tp, void *arg, bool 
*last,
goto out;
}
  
-	if (root_ht == ht) {

+   if (ht->is_root) {
NL_SET_ERR_MSG_MOD(extack, "Not allowed to delete root node");
return -EINVAL;
}
@@ -795,6 +797,10 @@ static int u32_set_parms(struct net *net, struct tcf_proto 
*tp,
NL_SET_ERR_MSG_MOD(extack, "Link hash table not 
found");
return -EINVAL;
}
+   if (ht_down->is_root) {
+   NL_SET_ERR_MSG_MOD(extack, "Not linking to root 
node");
+   return -EINVAL;
+   }
    ht_down->refcnt++;



Looks good to me ;->

cheers,
jamal




Re: [PATCH 2/7] mark root hnode explicitly

2018-09-06 Thread Jamal Hadi Salim


And a bunch of indentations...

cheers,
jamal
diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index 6d45ec4c218c..cb3bee12af78 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -485,7 +485,8 @@ static void u32_clear_hw_hnode(struct tcf_proto *tp, struct tc_u_hnode *h,
 	struct tc_cls_u32_offload cls_u32 = {};
 
 	tc_cls_common_offload_init(_u32.common, tp,
-		h->flags & ~TCA_CLS_FLAGS_U32_ROOT, extack);
+   h->flags & ~TCA_CLS_FLAGS_U32_ROOT,
+   extack);
 	cls_u32.command = TC_CLSU32_DELETE_HNODE;
 	cls_u32.hnode.divisor = h->divisor;
 	cls_u32.hnode.handle = h->handle;
@@ -1215,7 +1216,7 @@ static int u32_reoffload_hnode(struct tcf_proto *tp, struct tc_u_hnode *ht,
 	int err;
 
 	tc_cls_common_offload_init(_u32.common, tp,
-		ht->flags & ~TCA_CLS_FLAGS_U32_ROOT, extack);
+   ht->flags & ~TCA_CLS_FLAGS_U32_ROOT, extack);
 	cls_u32.command = add ? TC_CLSU32_NEW_HNODE : TC_CLSU32_DELETE_HNODE;
 	cls_u32.hnode.divisor = ht->divisor;
 	cls_u32.hnode.handle = ht->handle;


Re: [PATCH 7/7] clean tc_u_common hashtable

2018-09-06 Thread Jamal Hadi Salim

On 2018-09-05 3:04 p.m., Al Viro wrote:

From: Al Viro 

* calculate key *once*, not for each hash chain element
* let tc_u_hash() return the pointer to chain head rather than index -
callers are cleaner that way.

Signed-off-by: Al Viro 


Acked-by: Jamal Hadi Salim 

cheers,
jamal


Re: [PATCH 6/7] get rid of tc_u_common ->rcu

2018-09-06 Thread Jamal Hadi Salim

On 2018-09-05 3:04 p.m., Al Viro wrote:

From: Al Viro 

unused

Signed-off-by: Al Viro 


Acked-by: Jamal Hadi Salim 

cheers,
jamal


Re: [PATCH 5/7] get rid of tc_u_knode ->tp

2018-09-06 Thread Jamal Hadi Salim

On 2018-09-05 3:04 p.m., Al Viro wrote:

From: Al Viro 

not used anymore

Signed-off-by: Al Viro 


Acked-by: Jamal Hadi Salim 

cheers,
jamal


Re: [PATCH 4/7] get rid of unused argument of u32_destroy_key()

2018-09-06 Thread Jamal Hadi Salim

On 2018-09-05 3:04 p.m., Al Viro wrote:

From: Al Viro 

Signed-off-by: Al Viro 


Acked-by: Jamal Hadi Salim 

cheers,
jamal


Re: [PATCH 2/7] mark root hnode explicitly

2018-09-06 Thread Jamal Hadi Salim

On 2018-09-06 6:28 a.m., Jamal Hadi Salim wrote:

On 2018-09-05 3:04 p.m., Al Viro wrote:

From: Al Viro 

... and disallow deleting or linking to such

Signed-off-by: Al Viro 


Same comment as other one in regards to subject

Since the flag space is coming from htnode which is
exposed via uapi it makes sense to keep this one here
because it is for private use; but  a comment in
include/uapi/linux/pkt_cls.h that this flag or
maybe a set of bits is reserved for internal use.
Otherwise:

Acked-by: Jamal Hadi Salim 



Sorry, additional comment:
It makes sense to reject user space attempt to
set TCA_CLS_FLAGS_U32_ROOT
So my suggestion is to update tc_flags_valid() to
check for this.

cheers,
jamal



Re: [PATCH 3/7] make sure that divisor is a power of 2

2018-09-06 Thread Jamal Hadi Salim

On 2018-09-05 3:04 p.m., Al Viro wrote:

From: Al Viro 

Signed-off-by: Al Viro 


Acked-by: Jamal Hadi Salim 


cheers,
jamal


Re: [PATCH 2/7] mark root hnode explicitly

2018-09-06 Thread Jamal Hadi Salim

On 2018-09-05 3:04 p.m., Al Viro wrote:

From: Al Viro 

... and disallow deleting or linking to such

Signed-off-by: Al Viro 


Same comment as other one in regards to subject

Since the flag space is coming from htnode which is
exposed via uapi it makes sense to keep this one here
because it is for private use; but  a comment in
include/uapi/linux/pkt_cls.h that this flag or
maybe a set of bits is reserved for internal use.
Otherwise:

Acked-by: Jamal Hadi Salim 


cheers,
jamal


Re: [PATCH 1/7] fix hnode refcounting

2018-09-06 Thread Jamal Hadi Salim

On 2018-09-05 3:04 p.m., Al Viro wrote:

From: Al Viro 

cls_u32.c misuses refcounts for struct tc_u_hnode - it counts references via
->hlist and via ->tp_root together.  u32_destroy() drops the former and, in
case when there had been links, leaves the sucker on the list.  As the result,
there's nothing to protect it from getting freed once links are dropped.
That also makes the "is it busy" check incapable of catching the root hnode -
it *is* busy (there's a reference from tp), but we don't see it as something
separate.  "Is it our root?" check partially covers that, but the problem
exists for others' roots as well.

AFAICS, the minimal fix preserving the existing behaviour (where it doesn't
include oopsen, that is) would be this:
 * count tp->root and tp_c->hlist as separate references.  I.e.
have u32_init() set refcount to 2, not 1.
* in u32_destroy() we always drop the former; in u32_destroy_hnode() -
the latter.

That way we have *all* references contributing to refcount.  List
removal happens in u32_destroy_hnode() (called only when ->refcnt is 1)
an in u32_destroy() in case of tc_u_common going away, along with everything
reachable from it.  IOW, that way we know that u32_destroy_key() won't
free something still on the list (or pointed to by someone's ->root).

Cc: sta...@vger.kernel.org
Signed-off-by: Al Viro 


For networking patches, subject should be reflective of tree and
subsystem. Example for this one:
"[PATCH net 1/7]:net: sched: cls_u32: fix hnode refcounting"
Also useful to have a cover letter summarizing the patchset
in 0/7. Otherwise

Acked-by: Jamal Hadi Salim 

cheers,
jamal


Re: broken behaviour of TC filter delete

2018-08-26 Thread Jamal Hadi Salim

On 2018-08-25 9:02 a.m., Jiri Pirko wrote:

Fri, Aug 24, 2018 at 08:11:07PM CEST, xiyou.wangc...@gmail.com wrote:


ENOENT seems to be more logical to return when there's no more filter to delete.


Yeah, at least we should keep ENOENT for compatibility.

The bug here is chain 0 is gone after the last filter is gone,
so when you delete the filter again, it treats it as you specify
chain 0 which does not exist, so it hits EINVAL before ENOENT.


I understand. My concern is about consistency with other chains. Perhaps
-ENOENT for all chains in this case would be doable. What do you think?



ENOENT with extack describing whether chain or filter not found.

cheers,
jamal


Re: [PATCH net-next] net: sched: act_ife: disable bh when taking ife_mod_lock

2018-08-13 Thread Jamal Hadi Salim

On 2018-08-13 1:20 p.m., Vlad Buslov wrote:

Lockdep reports deadlock for following locking scenario in ife action:

Task one:
1) Executes ife action update.
2) Takes tcfa_lock.
3) Waits on ife_mod_lock which is already taken by task two.

Task two:

1) Executes any path that obtains ife_mod_lock without disabling bh (any
path that takes ife_mod_lock while holding tcfa_lock has bh disabled) like
loading a meta module, or creating new action.
2) Takes ife_mod_lock.
3) Task is preempted by rate estimator timer.
4) Timer callback waits on tcfa_lock which is taken by task one.

In described case tasks deadlock because they take same two locks in
different order. To prevent potential deadlock reported by lockdep, always
disable bh when obtaining ife_mod_lock.



Looks like your recent changes on net-next exposed this.

Acked-by: Jamal Hadi Salim 

cheers,
jamal


[PATCH net-next 04/13] net: sched: act_gact method rename for grep-ability and consistency

2018-08-12 Thread Jamal Hadi Salim
From: Jamal Hadi Salim 

Signed-off-by: Jamal Hadi Salim 
---
 net/sched/act_gact.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/sched/act_gact.c b/net/sched/act_gact.c
index bfccd34a3968..52a3e474d822 100644
--- a/net/sched/act_gact.c
+++ b/net/sched/act_gact.c
@@ -133,8 +133,8 @@ static int tcf_gact_init(struct net *net, struct nlattr 
*nla,
return ret;
 }
 
-static int tcf_gact(struct sk_buff *skb, const struct tc_action *a,
-   struct tcf_result *res)
+static int tcf_gact_act(struct sk_buff *skb, const struct tc_action *a,
+   struct tcf_result *res)
 {
struct tcf_gact *gact = to_gact(a);
int action = READ_ONCE(gact->tcf_action);
@@ -254,7 +254,7 @@ static struct tc_action_ops act_gact_ops = {
.kind   =   "gact",
.type   =   TCA_ACT_GACT,
.owner  =   THIS_MODULE,
-   .act=   tcf_gact,
+   .act=   tcf_gact_act,
.stats_update   =   tcf_gact_stats_update,
.dump   =   tcf_gact_dump,
.init   =   tcf_gact_init,
-- 
2.11.0



[PATCH net-next 11/13] net: sched: act_skbmod method rename for grep-ability and consistency

2018-08-12 Thread Jamal Hadi Salim
From: Jamal Hadi Salim 

Signed-off-by: Jamal Hadi Salim 
---
 net/sched/act_skbmod.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/sched/act_skbmod.c b/net/sched/act_skbmod.c
index e9c86ade3b40..d6a1af0c4171 100644
--- a/net/sched/act_skbmod.c
+++ b/net/sched/act_skbmod.c
@@ -24,7 +24,7 @@ static unsigned int skbmod_net_id;
 static struct tc_action_ops act_skbmod_ops;
 
 #define MAX_EDIT_LEN ETH_HLEN
-static int tcf_skbmod_run(struct sk_buff *skb, const struct tc_action *a,
+static int tcf_skbmod_act(struct sk_buff *skb, const struct tc_action *a,
  struct tcf_result *res)
 {
struct tcf_skbmod *d = to_skbmod(a);
@@ -270,7 +270,7 @@ static struct tc_action_ops act_skbmod_ops = {
.kind   =   "skbmod",
.type   =   TCA_ACT_SKBMOD,
.owner  =   THIS_MODULE,
-   .act=   tcf_skbmod_run,
+   .act=   tcf_skbmod_act,
.dump   =   tcf_skbmod_dump,
.init   =   tcf_skbmod_init,
.cleanup=   tcf_skbmod_cleanup,
-- 
2.11.0



[PATCH net-next 07/13] net: sched: act_pedit method rename for grep-ability and consistency

2018-08-12 Thread Jamal Hadi Salim
From: Jamal Hadi Salim 

Signed-off-by: Jamal Hadi Salim 
---
 net/sched/act_pedit.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c
index 3f62da72ab6a..8a7a7cb94e83 100644
--- a/net/sched/act_pedit.c
+++ b/net/sched/act_pedit.c
@@ -288,8 +288,8 @@ static int pedit_skb_hdr_offset(struct sk_buff *skb,
return ret;
 }
 
-static int tcf_pedit(struct sk_buff *skb, const struct tc_action *a,
-struct tcf_result *res)
+static int tcf_pedit_act(struct sk_buff *skb, const struct tc_action *a,
+struct tcf_result *res)
 {
struct tcf_pedit *p = to_pedit(a);
int i;
@@ -471,7 +471,7 @@ static struct tc_action_ops act_pedit_ops = {
.kind   =   "pedit",
.type   =   TCA_ACT_PEDIT,
.owner  =   THIS_MODULE,
-   .act=   tcf_pedit,
+   .act=   tcf_pedit_act,
.dump   =   tcf_pedit_dump,
.cleanup=   tcf_pedit_cleanup,
.init   =   tcf_pedit_init,
-- 
2.11.0



[PATCH net-next 10/13] net: sched: act_skbedit method rename for grep-ability and consistency

2018-08-12 Thread Jamal Hadi Salim
From: Jamal Hadi Salim 

Signed-off-by: Jamal Hadi Salim 
---
 net/sched/act_skbedit.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/sched/act_skbedit.c b/net/sched/act_skbedit.c
index a6db47ebec11..926d7bc4a89d 100644
--- a/net/sched/act_skbedit.c
+++ b/net/sched/act_skbedit.c
@@ -33,8 +33,8 @@
 static unsigned int skbedit_net_id;
 static struct tc_action_ops act_skbedit_ops;
 
-static int tcf_skbedit(struct sk_buff *skb, const struct tc_action *a,
-  struct tcf_result *res)
+static int tcf_skbedit_act(struct sk_buff *skb, const struct tc_action *a,
+  struct tcf_result *res)
 {
struct tcf_skbedit *d = to_skbedit(a);
struct tcf_skbedit_params *params;
@@ -310,7 +310,7 @@ static struct tc_action_ops act_skbedit_ops = {
.kind   =   "skbedit",
.type   =   TCA_ACT_SKBEDIT,
.owner  =   THIS_MODULE,
-   .act=   tcf_skbedit,
+   .act=   tcf_skbedit_act,
.dump   =   tcf_skbedit_dump,
.init   =   tcf_skbedit_init,
.cleanup=   tcf_skbedit_cleanup,
-- 
2.11.0



[PATCH net-next 00/13] net: sched: actions rename for grep-ability and consistency

2018-08-12 Thread Jamal Hadi Salim
From: Jamal Hadi Salim 

Having a structure (example tcf_mirred) and a function with the same name is
not good for readability or grepability.

This long overdue patchset improves it and make sure there is consistency
across all actions

Jamal Hadi Salim (13):
  net: sched: act_connmark method rename for grep-ability and
consistency
  net: sched: act_bpf method rename for grep-ability and consistency
  net: sched: act_sum method rename for grep-ability and consistency
  net: sched: act_gact method rename for grep-ability and consistency
  net: sched: act_ipt method rename for grep-ability and consistency
  net: sched: act_nat method rename for grep-ability and consistency
  net: sched: act_pedit method rename for grep-ability and consistency
  net: sched: act_police method rename for grep-ability and consistency
  net: sched: act_simple method rename for grep-ability and consistency
  net: sched: act_skbedit method rename for grep-ability and consistency
  net: sched: act_skbmod method rename for grep-ability and consistency
  net: sched: act_vlan method rename for grep-ability and consistency
  net: sched: act_mirred method rename for grep-ability and consistency

 net/sched/act_bpf.c  |  6 +++---
 net/sched/act_connmark.c |  6 +++---
 net/sched/act_csum.c |  6 +++---
 net/sched/act_gact.c |  6 +++---
 net/sched/act_ipt.c  |  8 
 net/sched/act_mirred.c   |  6 +++---
 net/sched/act_nat.c  |  6 +++---
 net/sched/act_pedit.c|  6 +++---
 net/sched/act_police.c   | 16 
 net/sched/act_simple.c   |  6 +++---
 net/sched/act_skbedit.c  |  6 +++---
 net/sched/act_skbmod.c   |  4 ++--
 net/sched/act_vlan.c |  6 +++---
 13 files changed, 44 insertions(+), 44 deletions(-)

-- 
2.11.0



[PATCH net-next 03/13] net: sched: act_sum method rename for grep-ability and consistency

2018-08-12 Thread Jamal Hadi Salim
From: Jamal Hadi Salim 

Signed-off-by: Jamal Hadi Salim 
---
 net/sched/act_csum.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/sched/act_csum.c b/net/sched/act_csum.c
index f01c59ba6d12..5596fae4e478 100644
--- a/net/sched/act_csum.c
+++ b/net/sched/act_csum.c
@@ -555,8 +555,8 @@ static int tcf_csum_ipv6(struct sk_buff *skb, u32 
update_flags)
return 0;
 }
 
-static int tcf_csum(struct sk_buff *skb, const struct tc_action *a,
-   struct tcf_result *res)
+static int tcf_csum_act(struct sk_buff *skb, const struct tc_action *a,
+   struct tcf_result *res)
 {
struct tcf_csum *p = to_tcf_csum(a);
struct tcf_csum_params *params;
@@ -670,7 +670,7 @@ static struct tc_action_ops act_csum_ops = {
.kind   = "csum",
.type   = TCA_ACT_CSUM,
.owner  = THIS_MODULE,
-   .act= tcf_csum,
+   .act= tcf_csum_act,
.dump   = tcf_csum_dump,
.init   = tcf_csum_init,
.cleanup= tcf_csum_cleanup,
-- 
2.11.0



[PATCH net-next 01/13] net: sched: act_connmark method rename for grep-ability and consistency

2018-08-12 Thread Jamal Hadi Salim
From: Jamal Hadi Salim 

Signed-off-by: Jamal Hadi Salim 
---
 net/sched/act_connmark.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/sched/act_connmark.c b/net/sched/act_connmark.c
index 2f9bc833d046..54c0bf54f2ac 100644
--- a/net/sched/act_connmark.c
+++ b/net/sched/act_connmark.c
@@ -31,8 +31,8 @@
 static unsigned int connmark_net_id;
 static struct tc_action_ops act_connmark_ops;
 
-static int tcf_connmark(struct sk_buff *skb, const struct tc_action *a,
-   struct tcf_result *res)
+static int tcf_connmark_act(struct sk_buff *skb, const struct tc_action *a,
+   struct tcf_result *res)
 {
const struct nf_conntrack_tuple_hash *thash;
struct nf_conntrack_tuple tuple;
@@ -209,7 +209,7 @@ static struct tc_action_ops act_connmark_ops = {
.kind   =   "connmark",
.type   =   TCA_ACT_CONNMARK,
.owner  =   THIS_MODULE,
-   .act=   tcf_connmark,
+   .act=   tcf_connmark_act,
.dump   =   tcf_connmark_dump,
.init   =   tcf_connmark_init,
.walk   =   tcf_connmark_walker,
-- 
2.11.0



[PATCH net-next 13/13] net: sched: act_mirred method rename for grep-ability and consistency

2018-08-12 Thread Jamal Hadi Salim
From: Jamal Hadi Salim 

Signed-off-by: Jamal Hadi Salim 
---
 net/sched/act_mirred.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
index 327be257033d..8ec216001077 100644
--- a/net/sched/act_mirred.c
+++ b/net/sched/act_mirred.c
@@ -190,8 +190,8 @@ static int tcf_mirred_init(struct net *net, struct nlattr 
*nla,
return ret;
 }
 
-static int tcf_mirred(struct sk_buff *skb, const struct tc_action *a,
- struct tcf_result *res)
+static int tcf_mirred_act(struct sk_buff *skb, const struct tc_action *a,
+ struct tcf_result *res)
 {
struct tcf_mirred *m = to_mirred(a);
struct sk_buff *skb2 = skb;
@@ -406,7 +406,7 @@ static struct tc_action_ops act_mirred_ops = {
.kind   =   "mirred",
.type   =   TCA_ACT_MIRRED,
.owner  =   THIS_MODULE,
-   .act=   tcf_mirred,
+   .act=   tcf_mirred_act,
.stats_update   =   tcf_stats_update,
.dump   =   tcf_mirred_dump,
.cleanup=   tcf_mirred_release,
-- 
2.11.0



[PATCH net-next 06/13] net: sched: act_nat method rename for grep-ability and consistency

2018-08-12 Thread Jamal Hadi Salim
From: Jamal Hadi Salim 

Signed-off-by: Jamal Hadi Salim 
---
 net/sched/act_nat.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/sched/act_nat.c b/net/sched/act_nat.c
index 4dd9188a72fd..822e903bfc25 100644
--- a/net/sched/act_nat.c
+++ b/net/sched/act_nat.c
@@ -93,8 +93,8 @@ static int tcf_nat_init(struct net *net, struct nlattr *nla, 
struct nlattr *est,
return ret;
 }
 
-static int tcf_nat(struct sk_buff *skb, const struct tc_action *a,
-  struct tcf_result *res)
+static int tcf_nat_act(struct sk_buff *skb, const struct tc_action *a,
+  struct tcf_result *res)
 {
struct tcf_nat *p = to_tcf_nat(a);
struct iphdr *iph;
@@ -311,7 +311,7 @@ static struct tc_action_ops act_nat_ops = {
.kind   =   "nat",
.type   =   TCA_ACT_NAT,
.owner  =   THIS_MODULE,
-   .act=   tcf_nat,
+   .act=   tcf_nat_act,
.dump   =   tcf_nat_dump,
.init   =   tcf_nat_init,
.walk   =   tcf_nat_walker,
-- 
2.11.0



[PATCH net-next 12/13] net: sched: act_vlan method rename for grep-ability and consistency

2018-08-12 Thread Jamal Hadi Salim
From: Jamal Hadi Salim 

Signed-off-by: Jamal Hadi Salim 
---
 net/sched/act_vlan.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/sched/act_vlan.c b/net/sched/act_vlan.c
index 5bde17fe3608..d1f5028384c9 100644
--- a/net/sched/act_vlan.c
+++ b/net/sched/act_vlan.c
@@ -22,8 +22,8 @@
 static unsigned int vlan_net_id;
 static struct tc_action_ops act_vlan_ops;
 
-static int tcf_vlan(struct sk_buff *skb, const struct tc_action *a,
-   struct tcf_result *res)
+static int tcf_vlan_act(struct sk_buff *skb, const struct tc_action *a,
+   struct tcf_result *res)
 {
struct tcf_vlan *v = to_vlan(a);
struct tcf_vlan_params *p;
@@ -307,7 +307,7 @@ static struct tc_action_ops act_vlan_ops = {
.kind   =   "vlan",
.type   =   TCA_ACT_VLAN,
.owner  =   THIS_MODULE,
-   .act=   tcf_vlan,
+   .act=   tcf_vlan_act,
.dump   =   tcf_vlan_dump,
.init   =   tcf_vlan_init,
.cleanup=   tcf_vlan_cleanup,
-- 
2.11.0



[PATCH net-next 09/13] net: sched: act_simple method rename for grep-ability and consistency

2018-08-12 Thread Jamal Hadi Salim
From: Jamal Hadi Salim 

Signed-off-by: Jamal Hadi Salim 
---
 net/sched/act_simple.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/sched/act_simple.c b/net/sched/act_simple.c
index 18e4452574cd..e616523ba3c1 100644
--- a/net/sched/act_simple.c
+++ b/net/sched/act_simple.c
@@ -28,8 +28,8 @@ static unsigned int simp_net_id;
 static struct tc_action_ops act_simp_ops;
 
 #define SIMP_MAX_DATA  32
-static int tcf_simp(struct sk_buff *skb, const struct tc_action *a,
-   struct tcf_result *res)
+static int tcf_simp_act(struct sk_buff *skb, const struct tc_action *a,
+   struct tcf_result *res)
 {
struct tcf_defact *d = to_defact(a);
 
@@ -207,7 +207,7 @@ static struct tc_action_ops act_simp_ops = {
.kind   =   "simple",
.type   =   TCA_ACT_SIMP,
.owner  =   THIS_MODULE,
-   .act=   tcf_simp,
+   .act=   tcf_simp_act,
.dump   =   tcf_simp_dump,
.cleanup=   tcf_simp_release,
.init   =   tcf_simp_init,
-- 
2.11.0



[PATCH net-next 02/13] net: sched: act_bpf method rename for grep-ability and consistency

2018-08-12 Thread Jamal Hadi Salim
From: Jamal Hadi Salim 

Signed-off-by: Jamal Hadi Salim 
---
 net/sched/act_bpf.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/sched/act_bpf.c b/net/sched/act_bpf.c
index 9e8a33f9fee3..9b30e62805c7 100644
--- a/net/sched/act_bpf.c
+++ b/net/sched/act_bpf.c
@@ -34,8 +34,8 @@ struct tcf_bpf_cfg {
 static unsigned int bpf_net_id;
 static struct tc_action_ops act_bpf_ops;
 
-static int tcf_bpf(struct sk_buff *skb, const struct tc_action *act,
-  struct tcf_result *res)
+static int tcf_bpf_act(struct sk_buff *skb, const struct tc_action *act,
+  struct tcf_result *res)
 {
bool at_ingress = skb_at_tc_ingress(skb);
struct tcf_bpf *prog = to_bpf(act);
@@ -406,7 +406,7 @@ static struct tc_action_ops act_bpf_ops __read_mostly = {
.kind   =   "bpf",
.type   =   TCA_ACT_BPF,
.owner  =   THIS_MODULE,
-   .act=   tcf_bpf,
+   .act=   tcf_bpf_act,
.dump   =   tcf_bpf_dump,
.cleanup=   tcf_bpf_cleanup,
.init   =   tcf_bpf_init,
-- 
2.11.0



[PATCH net-next 08/13] net: sched: act_police method rename for grep-ability and consistency

2018-08-12 Thread Jamal Hadi Salim
From: Jamal Hadi Salim 

Signed-off-by: Jamal Hadi Salim 
---
 net/sched/act_police.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/net/sched/act_police.c b/net/sched/act_police.c
index 88c16d80c1cf..06f0742db593 100644
--- a/net/sched/act_police.c
+++ b/net/sched/act_police.c
@@ -56,7 +56,7 @@ struct tc_police_compat {
 static unsigned int police_net_id;
 static struct tc_action_ops act_police_ops;
 
-static int tcf_act_police_walker(struct net *net, struct sk_buff *skb,
+static int tcf_police_walker(struct net *net, struct sk_buff *skb,
 struct netlink_callback *cb, int type,
 const struct tc_action_ops *ops,
 struct netlink_ext_ack *extack)
@@ -73,7 +73,7 @@ static const struct nla_policy police_policy[TCA_POLICE_MAX + 
1] = {
[TCA_POLICE_RESULT] = { .type = NLA_U32 },
 };
 
-static int tcf_act_police_init(struct net *net, struct nlattr *nla,
+static int tcf_police_init(struct net *net, struct nlattr *nla,
   struct nlattr *est, struct tc_action **a,
   int ovr, int bind, bool rtnl_held,
   struct netlink_ext_ack *extack)
@@ -203,7 +203,7 @@ static int tcf_act_police_init(struct net *net, struct 
nlattr *nla,
return err;
 }
 
-static int tcf_act_police(struct sk_buff *skb, const struct tc_action *a,
+static int tcf_police_act(struct sk_buff *skb, const struct tc_action *a,
  struct tcf_result *res)
 {
struct tcf_police *police = to_police(a);
@@ -267,7 +267,7 @@ static int tcf_act_police(struct sk_buff *skb, const struct 
tc_action *a,
return police->tcf_action;
 }
 
-static int tcf_act_police_dump(struct sk_buff *skb, struct tc_action *a,
+static int tcf_police_dump(struct sk_buff *skb, struct tc_action *a,
   int bind, int ref)
 {
unsigned char *b = skb_tail_pointer(skb);
@@ -335,10 +335,10 @@ static struct tc_action_ops act_police_ops = {
.kind   =   "police",
.type   =   TCA_ID_POLICE,
.owner  =   THIS_MODULE,
-   .act=   tcf_act_police,
-   .dump   =   tcf_act_police_dump,
-   .init   =   tcf_act_police_init,
-   .walk   =   tcf_act_police_walker,
+   .act=   tcf_police_act,
+   .dump   =   tcf_police_dump,
+   .init   =   tcf_police_init,
+   .walk   =   tcf_police_walker,
.lookup =   tcf_police_search,
.delete =   tcf_police_delete,
.size   =   sizeof(struct tcf_police),
-- 
2.11.0



[PATCH net-next 05/13] net: sched: act_ipt method rename for grep-ability and consistency

2018-08-12 Thread Jamal Hadi Salim
From: Jamal Hadi Salim 

Signed-off-by: Jamal Hadi Salim 
---
 net/sched/act_ipt.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/sched/act_ipt.c b/net/sched/act_ipt.c
index e149f0e66cb6..51f235bbeb5b 100644
--- a/net/sched/act_ipt.c
+++ b/net/sched/act_ipt.c
@@ -222,8 +222,8 @@ static int tcf_xt_init(struct net *net, struct nlattr *nla,
  bind);
 }
 
-static int tcf_ipt(struct sk_buff *skb, const struct tc_action *a,
-  struct tcf_result *res)
+static int tcf_ipt_act(struct sk_buff *skb, const struct tc_action *a,
+  struct tcf_result *res)
 {
int ret = 0, result = 0;
struct tcf_ipt *ipt = to_ipt(a);
@@ -348,7 +348,7 @@ static struct tc_action_ops act_ipt_ops = {
.kind   =   "ipt",
.type   =   TCA_ACT_IPT,
.owner  =   THIS_MODULE,
-   .act=   tcf_ipt,
+   .act=   tcf_ipt_act,
.dump   =   tcf_ipt_dump,
.cleanup=   tcf_ipt_release,
.init   =   tcf_ipt_init,
@@ -406,7 +406,7 @@ static struct tc_action_ops act_xt_ops = {
.kind   =   "xt",
.type   =   TCA_ACT_XT,
.owner  =   THIS_MODULE,
-   .act=   tcf_ipt,
+   .act=   tcf_ipt_act,
.dump   =   tcf_ipt_dump,
.cleanup=   tcf_ipt_release,
.init   =   tcf_xt_init,
-- 
2.11.0



Re: [PATCH net-next v5 1/4] net/sched: user-space can't set unknown tcfa_action values

2018-08-01 Thread Jamal Hadi Salim

On 31/07/18 10:40 AM, Paolo Abeni wrote:


If we choose to reject unknown opcodes, such user-space configuration
will fail.



I think that is a good thing. The kernel should not be accepting things
it doesnt understand. This is a good opportunity to enforce that.


What would happen before this patch is that configurations using such
TC_ACT_ value would be successful. This is why I proposed to keep
the fixup.



Note: Such behavior can only occur if tc(user space) allows you
to pass illegitimate values which today can only happen when you have a 
new user space but older kernel (with "old" starting with your current

changes).
iow, fixing a policy in a kernel which has no support for TC_ACT_
to translate intent to be TC_ACT_OK/PIPE is problematic (as i was
showing earlier).


I initially thought the kernel behavior in the above scenario would
match exactly TC_ACT_UNSPEC processing, but as you noted with the
example in your previous email, TC_ACT_UNSPEC processing is actually a
bit different.



I worry: I dont think we can get a good default for most use
cases. No point in maintaining faulty  expectations
(because  IMO: the user will - eventually - fix their scripts if they
dont see expected behavior).

cheers,
jamal


Re: [PATCH net-next v5 1/4] net/sched: user-space can't set unknown tcfa_action values

2018-07-31 Thread Jamal Hadi Salim

On 31/07/18 05:41 AM, Paolo Abeni wrote:


Before this patch, the kernel exposed the same behaviour for negative
value of 'bar', while, for positive 'bar' values, the overall behaviour
was more complex (some classifier always stops with unknown positive
action value, others go to lower prio).


>

Overall, the kernel behavior should be more well-defined now, but yes,
there is a change of behavior under some circumstances.

What about instead mapping undefined/unknown actions value to TC_ACT_OK
(still at initialization time)? >
this is what is already done by
tcf_action_exec() for faulty opcodes/graphs and by tcf_ipt() and would
handle the above example more conistently.



I think PIPE maybe more reasonable. You still continue the action graph
even on such a mis-configuration.

But let me see if i can make a convincing arguement for rejecting at
init time:
I would be _very suprised_ if someone is depending on a misconfiguration 
such as this in a deployment because they would get different results 
than what they are expecting and sooner or later fix it after a lot of

debugging and cursing. Your patch helps them notice it sooner. And a
rejection even much much sooner. With a rejection you dont get to
execute a "fixup" the kernel assumes for you.

BTW, I asked this earlier and Jiri said it was addressed in patch 2.
I just looked again and i may be missing something basic:
Lets say tomorrow in a new kernel we add new TC_ACT_XXX that then gets 
exposed to uapi - so user space tc is updated.

You then use the new tc specifying TC_ACT_XXX policy on kernel with your
changes.
If i read correctly because TC_ACT_XXX is out of bounds for current
kernel(which has your changes) you will fix it to be UNSPEC, no?

cheers,
jamal


Re: [PATCH net-next v5 1/4] net/sched: user-space can't set unknown tcfa_action values

2018-07-30 Thread Jamal Hadi Salim

On 30/07/18 12:41 PM, Paolo Abeni wrote:

On Mon, 2018-07-30 at 10:03 -0400, Jamal Hadi Salim wrote:

On 30/07/18 08:30 AM, Paolo Abeni wrote:

}
   
+	if (!tcf_action_valid(a->tcfa_action)) {

+   NL_SET_ERR_MSG(extack, "invalid action value, using TC_ACT_UNSPEC 
instead");
+   a->tcfa_action = TC_ACT_UNSPEC;
+   }
+
return a;
   



I think it would make a lot more sense to just reject the entry than
changing it underneath the user to a default value. Least element of
suprise.


I fear that would break existing (bad) users ?!? This way, such users
are notified they are doing something uncorrect, but still continue to
work.



By "bad users" I think you mean someone setting a policy expecting
one behavior but getting a different one? If yes, that policy was
already wrong/buggy. As an example, if i configured:

match xxx action foo action goo action bar action gah

where action goo has a bad opcode
If  you "fix it"  with TC_ACT_UNSPEC then basically the above
policy is now equivalent to:

match xxx action foo action goo

Infact if there was a lower prio rule in the chain
then lookup will continue there and produce even stranger
results.


cheers,
jamal




The patch can be changed to reject bad actions, if there is agreement,
but it would not look as the safest way to me.


Re: [PATCH net-next v5 1/4] net/sched: user-space can't set unknown tcfa_action values

2018-07-30 Thread Jamal Hadi Salim

On 30/07/18 08:30 AM, Paolo Abeni wrote:

}
  
+	if (!tcf_action_valid(a->tcfa_action)) {

+   NL_SET_ERR_MSG(extack, "invalid action value, using TC_ACT_UNSPEC 
instead");
+   a->tcfa_action = TC_ACT_UNSPEC;
+   }
+
return a;
  



I think it would make a lot more sense to just reject the entry than
changing it underneath the user to a default value. Least element of
suprise.

cheers,
jamal


Re: [PATCH net-next v3 4/5] net/tc: introduce TC_ACT_REINJECT.

2018-07-26 Thread Jamal Hadi Salim

On 25/07/18 01:09 PM, Marcelo Ricardo Leitner wrote:

On Wed, Jul 25, 2018 at 09:48:16AM -0700, Cong Wang wrote:

On Wed, Jul 25, 2018 at 5:27 AM Jamal Hadi Salim  wrote:


Those changes were there from the beginning (above patch did
not introduce them).
IIRC, the reason was to distinguish between policy intended
drops and drops because of errors.


There must be a limit for "overlimit" to make sense. There is
no limit in mirred action's context, probably there is only
such a limit in act_police. So, all rest should not touch overlimit.


+1



I agree we should at least record drop count(unrelated patch though).
we should keep overlimit (for no other reason other than this
has been around for at least 15 years).

On why "overlimit"? It is just a name for a counter that is useless
for most actions (but was still being transfered to user space).
It is the closest counter to counting "this failed because of
runtime errors" as opposed to "user asked us to drop this".

Probably a good alternative is to make a very small stats v3 structure
(we have migrated stats structures before) and extend for
each action/classifier/qdisc to add its extra counters using XSTATS.

Note:
If you are _mirroring_ packets - incrementing the drop counter is
misleading because the packet is not dropped by the system.
i.e the qdisc will not record it as dropped; it should for
redirect policy.  It is useful to be able to tell
them apart when you are collecting  analytics just for actions.
(if youve worked on a massive amount of machines you'll appreciate
being able to debug by looking at counters that reduce ambiguity).

cheers,
jamal


Re: [PATCH net-next v3 4/5] net/tc: introduce TC_ACT_REINJECT.

2018-07-25 Thread Jamal Hadi Salim

On 25/07/18 10:24 AM, Paolo Abeni wrote:

On Wed, 2018-07-25 at 08:27 -0400, Jamal Hadi Salim wrote:



Those changes were there from the beginning (above patch did
not introduce them).
IIRC, the reason was to distinguish between policy intended
drops and drops because of errors.


Double-checking to avoid misinterepration on my side: you are ok with
keeping the 'overlimits' increment, right?


Yes.

cheers,
jamal


Re: [PATCH net-next v3 4/5] net/tc: introduce TC_ACT_REINJECT.

2018-07-25 Thread Jamal Hadi Salim

On 25/07/18 04:29 AM, Paolo Abeni wrote:

On Tue, 2018-07-24 at 13:50 -0700, Cong Wang wrote:


[..]

I fail to understand why overlimit is increased in your case
here. I guess you want to increase 'drops' instead.


Hmm, actually the current mirred code increases overlimit too.
But I still don't think it makes sense.


Yep, I chose to increment 'overlimits' to preserve the current mirred
semantic.

AFAICS, that was first introduced with:

commit 8919bc13e8d92c5b082c5c0321567383a071f5bc
Author: Jamal Hadi Salim 
Date:   Mon Aug 15 05:25:40 2011 +

 net_sched: fix port mirror/redirect stats reporting

Likely increasing 'drops' would be "better", but I'm unsure we can
change this established behavior without affecting some user.



Those changes were there from the beginning (above patch did
not introduce them).
IIRC, the reason was to distinguish between policy intended
drops and drops because of errors.

cheers,
jamal



Re: [PATCH net-next v3 4/5] net/tc: introduce TC_ACT_REINJECT.

2018-07-25 Thread Jamal Hadi Salim

+Cc Shmulik

Paolo - please also run the tdc tests (and add anymore if you
feel they dont do coverage to your changes)

On 24/07/18 04:06 PM, Paolo Abeni wrote:

This is similar TC_ACT_REDIRECT, but with a slightly different
semantic:
- on ingress the mirred skbs are passed to the target device
network stack without any additional check not scrubbing.
- the rcu-protected stats provided via the tcf_result struct
   are updated on error conditions.

This new tcfa_action value is not exposed to the user-space
and can be used only internally by clsact.

v1 -> v2: do not touch TC_ACT_REDIRECT code path, introduce
  a new action type instead

v2 -> v3:
  - rename the new action value TC_ACT_REINJECT, update the
helper accordingly
  - take care of uncloned reinjected packets in XDP generic
hook

Signed-off-by: Paolo Abeni 
---
  include/net/pkt_cls.h |  3 +++
  include/net/sch_generic.h | 19 +++
  net/core/dev.c|  6 +-
  3 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 2081e4219f81..36ccfe2a303a 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -7,6 +7,9 @@
  #include 
  #include 
  
+/* TC action not accessible from user space */

+#define TC_ACT_REINJECT(TC_ACT_VALUE_MAX + 1)


Lets say in the future we add a new opcode.
Will old kernel, new iproute2 (new value) work?
Maybe use a negative number below -1.
I am honestly still unclear about introducing this new
code.
Could you not use the result code to carry sufficient
info to indicate the intent?

cheers,
jamal


+
  /* Basic packet classifier frontend definitions. */
  
  struct tcf_walker {

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 056dc1083aa3..95e81a70f549 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -235,6 +235,12 @@ struct tcf_result {
u32 classid;
};
const struct tcf_proto *goto_tp;
+
+   /* used by the TC_ACT_REINJECT action */
+   struct {
+   boolingress;
+   struct gnet_stats_queue *qstats;
+   };
};
  };
  
@@ -1091,4 +1097,17 @@ void mini_qdisc_pair_swap(struct mini_Qdisc_pair *miniqp,

  void mini_qdisc_pair_init(struct mini_Qdisc_pair *miniqp, struct Qdisc *qdisc,
  struct mini_Qdisc __rcu **p_miniq);
  
+static inline void skb_tc_reinject(struct sk_buff *skb, struct tcf_result *res)

+{
+   struct gnet_stats_queue *stats = res->qstats;
+   int ret;
+
+   if (res->ingress)
+   ret = netif_receive_skb(skb);
+   else
+   ret = dev_queue_xmit(skb);
+   if (ret && stats)
+   qstats_overlimit_inc(res->qstats);
+}
+
  #endif
diff --git a/net/core/dev.c b/net/core/dev.c
index 14a748ee8cc9..826ec74fe1d9 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4252,7 +4252,7 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
/* Reinjected packets coming from act_mirred or similar should
 * not get XDP generic processing.
 */
-   if (skb_cloned(skb))
+   if (skb_cloned(skb) || skb->tc_redirected)
return XDP_PASS;
  
  	/* XDP packets must be linear and must have sufficient headroom

@@ -4602,6 +4602,10 @@ sch_handle_ingress(struct sk_buff *skb, struct 
packet_type **pt_prev, int *ret,
__skb_push(skb, skb->mac_len);
skb_do_redirect(skb);
return NULL;
+   case TC_ACT_REINJECT:
+   /* this does not scrub the packet, and updates stats on error */
+   skb_tc_reinject(skb, _res);
+   return NULL;
default:
break;
}





Re: [PATCH net-next v3 3/5] tc/act: remove unneeded RCU lock in action callback

2018-07-25 Thread Jamal Hadi Salim

On 24/07/18 04:06 PM, Paolo Abeni wrote:

Each lockless action currently does its own RCU locking in ->act().
This is allows using plain RCU accessor, even if the context
is really RCU BH.

This change drops the per action RCU lock, replace the accessors
with _bh variant, cleans up a bit the surronding code and documents
the RCU status in the relevant header.
No functional nor performance change is intended.

The goal of this patch is clarifying that the RCU critical section
used by the tc actions extends up to the classifier's caller.



This and 2/5 seems to stand on their own merit.

cheers,
jamal


Re: [PATCH net-next v3 1/5] tc/act: user space can't use TC_ACT_REDIRECT directly

2018-07-25 Thread Jamal Hadi Salim

On 24/07/18 04:06 PM, Paolo Abeni wrote:


--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -895,6 +895,11 @@ struct tc_action *tcf_action_init_1(struct net *net, 
struct tcf_proto *tp,
}
}
  
+	if (a->tcfa_action == TC_ACT_REDIRECT) {

+   net_warn_ratelimited("TC_ACT_REDIRECT can't be used directly");
+   a->tcfa_action = TC_ACT_UNSPEC;
+   }
+


Why not just reject the rule instead of changing the code underneath the
user?

cheers,
jamal



Re: [PATCH net-next v3 5/5] act_mirred: use TC_ACT_REINJECT when possible

2018-07-25 Thread Jamal Hadi Salim

On 24/07/18 05:15 PM, Cong Wang wrote:

On Tue, Jul 24, 2018 at 1:07 PM Paolo Abeni  wrote:

+
+   /* let's the caller reinject the packet, if possible */
+   if (skb_at_tc_ingress(skb)) {
+   res->ingress = want_ingress;
+   res->qstats = this_cpu_ptr(m->common.cpu_qstats);
+   return TC_ACT_REINJECT;
+   }


Looks good to me, but here we no longer return user-specified
return value here, I am sure it is safe for TC_ACT_STOLEN, but
I am not sure if it is safe for other values, like TC_ACT_RECLASSIFY.

Jamal, is there any use case of returning !TC_ACT_STOLEN for
ingress redirections?


I cant think of one off top of my head.
There maybe a future use case where it is not so - maybe just allow
to return the user programmed action? that value will always be
TC_ACT_STOLEN if the rule was specified via iproute2/tc.

cheers,
jamal


Re: [patch net-next v2 0/9] net: sched: introduce chain templates support with offloading to mlxsw

2018-06-29 Thread Jamal Hadi Salim

On 29/06/18 04:39 AM, Jiri Pirko wrote:

Fri, Jun 29, 2018 at 12:25:53AM CEST, xiyou.wangc...@gmail.com wrote:

On Thu, Jun 28, 2018 at 6:10 AM Jiri Pirko  wrote:

Add a template of type flower allowing to insert rules matching on last
2 bytes of destination mac address:
# tc chaintemplate add dev dummy0 ingress proto ip flower dst_mac 
00:00:00:00:00:00/00:00:00:00:FF:FF

The template is now showed in the list:
# tc chaintemplate show dev dummy0 ingress
chaintemplate flower chain 0
   dst_mac 00:00:00:00:00:00/00:00:00:00:ff:ff
   eth_type ipv4

Add another template, this time for chain number 22:
# tc chaintemplate add dev dummy0 ingress proto ip chain 22 flower dst_ip 
0.0.0.0/16
# tc chaintemplate show dev dummy0 ingress
chaintemplate flower chain 0
   dst_mac 00:00:00:00:00:00/00:00:00:00:ff:ff
   eth_type ipv4
chaintemplate flower chain 22
   eth_type ipv4
   dst_ip 0.0.0.0/16


So, if I want to check the template of a chain, I have to use
'tc chaintemplate... chain X'.

If I want to check the filters in a chain, I have to use
'tc filter show  chain X'.

If you introduce 'tc chain', it would just need one command:
`tc chain show ... X` which could list its template first and
followed by filters in this chain, something like:

# tc chain show dev eth0 chain X
template: # could be none

filter1
...
filter2
...

Isn't it more elegant?


Well, that is just another iproute2 command. It would use the same
kernel uapi. Filters+templates. Sure, why not. Can be easily introduced.
Let's do it in a follow-up iproute2 patch.



Half a dozen or 6 - take your pick, really.
I would call the template an attribute as opposed to a stand alone
object i.e A chain of filters may have a template. If you have to
introduce a new object then Sridhar's suggested syntax seems appealing.

cheers,
jamal


Re: [patch net-next v2 0/9] net: sched: introduce chain templates support with offloading to mlxsw

2018-06-28 Thread Jamal Hadi Salim

On 28/06/18 09:22 AM, Jiri Pirko wrote:

Thu, Jun 28, 2018 at 03:13:30PM CEST, j...@mojatatu.com wrote:


On 26/06/18 03:59 AM, Jiri Pirko wrote:

From: Jiri Pirko 

For the TC clsact offload these days, some of HW drivers need
to hold a magic ball. The reason is, with the first inserted rule inside
HW they need to guess what fields will be used for the matching. If
later on this guess proves to be wrong and user adds a filter with a
different field to match, there's a problem. Mlxsw resolves it now with
couple of patterns. Those try to cover as many match fields as possible.
This aproach is far from optimal, both performance-wise and scale-wise.
Also, there is a combination of filters that in certain order won't
succeed.


Most of the time, when user inserts filters in chain, he knows right away
how the filters are going to look like - what type and option will they
have. For example, he knows that he will only insert filters of type
flower matching destination IP address. He can specify a template that
would cover all the filters in the chain.



Is this just restricted to hardware offload? Example it will make sense
for u32 in s/ware as well (i.e flexible TCAM like TCAM based
classification). i.e it is possible that rules the user enters
end up being worst case a linked list lookup, yes? And allocating
space for a tuple that is not in use is a waste of space.


I'm afraid I don't understand clearly what you say.


Well - I was trying to understand what you said ;->

I think what you are getting at is two issues:
a) space in the tcams - if the user is just going to enter
rules which use one tuple (dst ip for example) the hardware
would be better off told that this is the case so it doesnt
allocate space in anticipation that someone is going to
specify src ip later on.
b) lookup speed in tcams - without the template hint a
selection of rules may end up looking like a linked list
which is not optimal for lookup


This is not
restricted to hw offload. The templates apply to all filters, no matter
if they are offloaded or not.



Do you save anything with flower(in s/w) if you only added a template
with say dst ip/mask? I can see it will make sense for u32 which is more
flexible and protocol independent.

cheers,
jamal


Re: [patch net-next v2 0/9] net: sched: introduce chain templates support with offloading to mlxsw

2018-06-28 Thread Jamal Hadi Salim



On 26/06/18 03:59 AM, Jiri Pirko wrote:

From: Jiri Pirko 

For the TC clsact offload these days, some of HW drivers need
to hold a magic ball. The reason is, with the first inserted rule inside
HW they need to guess what fields will be used for the matching. If
later on this guess proves to be wrong and user adds a filter with a
different field to match, there's a problem. Mlxsw resolves it now with
couple of patterns. Those try to cover as many match fields as possible.
This aproach is far from optimal, both performance-wise and scale-wise.
Also, there is a combination of filters that in certain order won't
succeed.


>

Most of the time, when user inserts filters in chain, he knows right away
how the filters are going to look like - what type and option will they
have. For example, he knows that he will only insert filters of type
flower matching destination IP address. He can specify a template that
would cover all the filters in the chain.



Is this just restricted to hardware offload? Example it will make sense
for u32 in s/ware as well (i.e flexible TCAM like TCAM based
classification). i.e it is possible that rules the user enters
end up being worst case a linked list lookup, yes? And allocating
space for a tuple that is not in use is a waste of space.

If yes, then I would reword the above as something like:

For very flexible classifiers such as TCAM based ones,
one could add arbitrary tuple rules which tend to be inefficient both
from a space and lookup performance. One approach, taken by Mlxsw,
is to assume a multi filter tuple arrangement which is inefficient
from a space perspective when the user-specified rules dont make
use of pre-provisioned tuple space.
Typically users already know what tuples are of interest to them:
for example for ipv4 route lookup purposes they may just want to
lookup destination IP with a specified mask etc.
This feature allows user to provide good hints to the classifier to
optimize.



This patchset is providing the possibility to user to provide such
template  to kernel and propagate it all the way down to device
drivers.

See the examples below.

Create dummy device with clsact first:
# ip link add type dummy
# tc qdisc add dev dummy0 clsact

There is no template assigned by default:
# tc filter template show dev dummy0 ingress

Add a template of type flower allowing to insert rules matching on last
2 bytes of destination mac address:
# tc filter template add dev dummy0 ingress proto ip flower dst_mac 
00:00:00:00:00:00/00:00:00:00:FF:FF

The template is now showed in the list:
# tc filter template show dev dummy0 ingress
filter flower chain 0
   dst_mac 00:00:00:00:00:00/00:00:00:00:ff:ff
   eth_type ipv4

Add another template, this time for chain number 22:
# tc filter template add dev dummy0 ingress proto ip chain 22 flower dst_ip 
0.0.0.0/16
# tc filter template show dev dummy0 ingress
filter flower chain 0
   dst_mac 00:00:00:00:00:00/00:00:00:00:ff:ff
   eth_type ipv4
filter flower chain 22
   eth_type ipv4
   dst_ip 0.0.0.0/16

Add a filter that fits the template:
# tc filter add dev dummy0 ingress proto ip flower dst_mac 
aa:bb:cc:dd:ee:ff/00:00:00:00:00:0F action drop

Addition of filters that does not fit the template would fail:
# tc filter add dev dummy0 ingress proto ip flower dst_mac 
aa:11:22:33:44:55/00:00:00:FF:00:00 action drop
Error: Mask does not fit the template.
We have an error talking to the kernel, -1
# tc filter add dev dummy0 ingress proto ip flower dst_ip 10.0.0.1 action drop
Error: Mask does not fit the template.
We have an error talking to the kernel, -1

Additions of filters to chain 22:
# tc filter add dev dummy0 ingress proto ip chain 22 flower dst_ip 10.0.0.1/8 
action drop
# tc filter add dev dummy0 ingress proto ip chain 22 flower dst_ip 10.0.0.1 
action drop
Error: Mask does not fit the template.
We have an error talking to the kernel, -1
# tc filter add dev dummy0 ingress proto ip chain 22 flower dst_ip 10.0.0.1/24 
action drop
Error: Mask does not fit the template.
We have an error talking to the kernel, -1

Removal of a template from non-empty chain would fail:
# tc filter template del dev dummy0 ingress
Error: The chain is not empty, unable to delete template.
We have an error talking to the kernel, -1

Once the chain is flushed, the template could be removed:
# tc filter del dev dummy0 ingress
# tc filter template del dev dummy0 ingress



BTW: unlike the other comments on this - I think the syntax above
is fine ;-> Chain are already either explicitly or are implicitly
(case of chain 0) specified.

Assuming that one cant add a new template to a chain if it already
has at least one filter (even if no template has been added).

I like it - it may help making u32 more friendly to humans in some
cases.

cheers,
jamal


Re: [PATCH v2 net-next] net/sched: add skbprio scheduler

2018-06-24 Thread Jamal Hadi Salim

On 23/06/18 04:47 PM, Nishanth Devarajan wrote:
[..]


+   /* Drop the packet at the tail of the lowest priority qdisc. */
+   lp_qdisc = >qdiscs[lp];
+   to_drop = __skb_dequeue_tail(lp_qdisc);
+   BUG_ON(!to_drop);
+   qdisc_qstats_backlog_dec(sch, to_drop);
+   qdisc_drop(to_drop, sch, to_free);
+


Maybe also increase overlimit stat here? It will keep track
of low prio things dropped because you were congested.
Such a stat helps when debugging or collecting analytics.

Per Alex's comment, how about:

---
Skbprio (SKB Priority Queue) is a queueing discipline that
prioritizes packets according to their skb->priority field.
Under congestion, already-enqueued lower priority packets
will be dropped to make space available for higher priority
packets. Skbprio was conceived as a solution for
denial-of-service defenses that need to route packets with
different priorities as a means to overcome DoS attacks
as described in paper ...


cheers,
jamal


Re: [PATCH v4 net-next] net:sched: add action inheritdsfield to skbedit

2018-06-20 Thread Jamal Hadi Salim

On 19/06/18 08:39 AM, Michel Machado wrote:

 Notice that, different from skbmod, there's no field parm->flags in 
skbedit. Skbedit infers the flags in d->flags from the presence of the 
parameters of each of its actions. But SKBEDIT_F_INHERITDSFIELD has no 
parameter and adding field parm->flags breaks backward compatibility 
with user space as pointed out by Marcelo Ricardo Leitner. Our solution 
was to add TCA_SKBEDIT_FLAGS, so SKBEDIT_F_INHERITDSFIELD and future 
flag-only actions can be added.


Ok, that makes sense - thanks. I am not so sure about using
64 bits (32 bits would have been fine to match the size of
the kernel flags), but other than that LGTM.

Acked-by: Jamal Hadi Salim 

cheers,
jamal



Re: [PATCH v4 net-next] net:sched: add action inheritdsfield to skbedit

2018-06-19 Thread Jamal Hadi Salim



Hi Qiaobin,

Per my previous comments, why do we need the
TCA_SKBEDIT_FLAGS TLV? Isnt SKBEDIT_F_INHERITDSFIELD
sufficient? i.e in tcf_skbedit_init() check for
d->flags_F_INHERITDSFIELD then set skb->priority
and flags|=SKBEDIT_F_INHERITDSFIELD

Side note:
Infact the whole flags setting in the init function
seems to be a redundant given all the TLVs have
d->flags also set by user space.
But thats a different patch.

cheers,
jamal

On 12/06/18 11:42 AM, Fu, Qiaobin wrote:

The new action inheritdsfield copies the field DS of
IPv4 and IPv6 packets into skb->priority. This enables
later classification of packets based on the DS field.

v4:
*Not allow setting flags other than the expected ones.

*Allow dumping the pure flags.

Original idea by Jamal Hadi Salim 

Signed-off-by: Qiaobin Fu 
Reviewed-by: Michel Machado 
---

Note that the motivation for this patch is found in the following discussion:
https://www.spinics.net/lists/netdev/msg501061.html
---
diff --git a/include/uapi/linux/tc_act/tc_skbedit.h 
b/include/uapi/linux/tc_act/tc_skbedit.h
index fbcfe27a4e6c..6de6071ebed6 100644
--- a/include/uapi/linux/tc_act/tc_skbedit.h
+++ b/include/uapi/linux/tc_act/tc_skbedit.h
@@ -30,6 +30,7 @@
  #define SKBEDIT_F_MARK0x4
  #define SKBEDIT_F_PTYPE   0x8
  #define SKBEDIT_F_MASK0x10
+#define SKBEDIT_F_INHERITDSFIELD   0x20
  
  struct tc_skbedit {

tc_gen;
@@ -45,6 +46,7 @@ enum {
TCA_SKBEDIT_PAD,
TCA_SKBEDIT_PTYPE,
TCA_SKBEDIT_MASK,
+   TCA_SKBEDIT_FLAGS,
__TCA_SKBEDIT_MAX
  };
  #define TCA_SKBEDIT_MAX (__TCA_SKBEDIT_MAX - 1)
diff --git a/net/sched/act_skbedit.c b/net/sched/act_skbedit.c
index 6138d1d71900..9adbcfa3f5fe 100644
--- a/net/sched/act_skbedit.c
+++ b/net/sched/act_skbedit.c
@@ -23,6 +23,9 @@
  #include 
  #include 
  #include 
+#include 
+#include 
+#include 
  
  #include 

  #include 
@@ -41,6 +44,25 @@ static int tcf_skbedit(struct sk_buff *skb, const struct 
tc_action *a,
  
  	if (d->flags & SKBEDIT_F_PRIORITY)

skb->priority = d->priority;
+   if (d->flags & SKBEDIT_F_INHERITDSFIELD) {
+   int wlen = skb_network_offset(skb);
+
+   switch (tc_skb_protocol(skb)) {
+   case htons(ETH_P_IP):
+   wlen += sizeof(struct iphdr);
+   if (!pskb_may_pull(skb, wlen))
+   goto err;
+   skb->priority = ipv4_get_dsfield(ip_hdr(skb)) >> 2;
+   break;
+
+   case htons(ETH_P_IPV6):
+   wlen += sizeof(struct ipv6hdr);
+   if (!pskb_may_pull(skb, wlen))
+   goto err;
+   skb->priority = ipv6_get_dsfield(ipv6_hdr(skb)) >> 2;
+   break;
+   }
+   }
if (d->flags & SKBEDIT_F_QUEUE_MAPPING &&
skb->dev->real_num_tx_queues > d->queue_mapping)
skb_set_queue_mapping(skb, d->queue_mapping);
@@ -53,6 +75,10 @@ static int tcf_skbedit(struct sk_buff *skb, const struct 
tc_action *a,
  
  	spin_unlock(>tcf_lock);

return d->tcf_action;
+
+err:
+   spin_unlock(>tcf_lock);
+   return TC_ACT_SHOT;
  }
  
  static const struct nla_policy skbedit_policy[TCA_SKBEDIT_MAX + 1] = {

@@ -62,6 +88,7 @@ static const struct nla_policy skbedit_policy[TCA_SKBEDIT_MAX 
+ 1] = {
[TCA_SKBEDIT_MARK]  = { .len = sizeof(u32) },
[TCA_SKBEDIT_PTYPE] = { .len = sizeof(u16) },
[TCA_SKBEDIT_MASK]  = { .len = sizeof(u32) },
+   [TCA_SKBEDIT_FLAGS] = { .len = sizeof(u64) },
  };
  
  static int tcf_skbedit_init(struct net *net, struct nlattr *nla,

@@ -73,6 +100,7 @@ static int tcf_skbedit_init(struct net *net, struct nlattr 
*nla,
struct tc_skbedit *parm;
struct tcf_skbedit *d;
u32 flags = 0, *priority = NULL, *mark = NULL, *mask = NULL;
+   u64 *pure_flags = NULL;
u16 *queue_mapping = NULL, *ptype = NULL;
bool exists = false;
int ret = 0, err;
@@ -114,6 +142,12 @@ static int tcf_skbedit_init(struct net *net, struct nlattr 
*nla,
mask = nla_data(tb[TCA_SKBEDIT_MASK]);
}
  
+	if (tb[TCA_SKBEDIT_FLAGS] != NULL) {

+   pure_flags = nla_data(tb[TCA_SKBEDIT_FLAGS]);
+   if (*pure_flags & SKBEDIT_F_INHERITDSFIELD)
+   flags |= SKBEDIT_F_INHERITDSFIELD;
+   }
+
parm = nla_data(tb[TCA_SKBEDIT_PARMS]);
  
  	exists = tcf_idr_check(tn, parm->index, a, bind);

@@ -178,6 +212,7 @@ static int tcf_skbedit_dump(struct sk_buff *skb, struct 
tc_action *a,
.action  = d->tcf_action,
};
struct tcf_t t;
+   u64 pure_flags = 0;
  
  	if (nla_put(skb, TCA_SKBEDIT_PARMS, size

Re: [PATCH net-next v4 00/11] Modify action API for implementing lockless actions

2018-06-01 Thread Jamal Hadi Salim

On 31/05/18 08:38 AM, Vlad Buslov wrote:


Hi Jamal,

On current net-next I still have action with single reference after last
step:
~$ sudo $TC -s actions ls action skbedit
total acts 1

 action order 0:  skbedit mark 1 pipe

  index 1 ref 2 bind 1 installed 47 sec used 47 sec
 Action statistics:
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
~$ sudo $TC filter del dev lo parent : protocol ip prio 1 \

u32 match ip dst 127.0.0.1/32 flowid 1:1

~$ sudo $TC -s actions ls action skbedit
total acts 1

 action order 0:  skbedit mark 1 pipe

  index 1 ref 1 bind 0 installed 80 sec used 80 sec
 Action statistics:
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0

Which branch are you testing on?


You are correct - this is how it works now (I was thinking of a
very old version before Cong made some changes a while back).
Just vet this continues to work as above.

cheers,
jamal


Re: [PATCH net-next v4 00/11] Modify action API for implementing lockless actions

2018-05-31 Thread Jamal Hadi Salim

Hi Vlad,

Can you try one simple test below with these patches?

#create an action
sudo $TC actions add action skbedit mark 1 pipe
#
sudo $TC qdisc del dev lo parent :
sudo $TC qdisc add dev lo ingress
# bind action to filter
sudo $TC filter add dev lo parent : protocol ip prio 1 \
u32 match ip dst 127.0.0.1/32 flowid 1:1 action skbedit index 1

#now delete that action multiple times while it is still bound
sudo $TC actions del action skbedit index 1
sudo $TC actions del action skbedit index 1
sudo $TC actions del action skbedit index 1

#check the refcount and bindcount
sudo $TC -s actions ls action skbedit

#delete the filter (which should remove the bindcnt)

sudo $TC filter del dev lo parent : protocol ip prio 1 \
u32 match ip dst 127.0.0.1/32 flowid 1:1

#check the refcount and bindcount
sudo $TC -s actions ls action skbedit

Current behavior: i believe the action is gone in this last step.
Your patches may change behavior so that the action action is still
around. I dont think this is a big deal, but just wanted to be sure
it is not something more unexpected.

cheers,
jamal


Re: [PATCH] net: sched: split tc_ctl_tfilter into three handlers

2018-05-28 Thread Jamal Hadi Salim

On 28/05/18 12:02 PM, Jamal Hadi Salim wrote:

On 27/05/18 03:55 PM, Vlad Buslov wrote:

tc_ctl_tfilter handles three netlink message types: RTM_NEWTFILTER,
RTM_DELTFILTER, RTM_GETTFILTER. However, implementation of this function
involves a lot of branching on specific message type because most of the
code is message-specific. This significantly complicates adding new
functionality and doesn't provide much benefit of code reuse.

Split tc_ctl_tfilter to three standalone functions that handle filter 
new,

delete and get requests.

The only truly protocol independent part of tc_ctl_tfilter is code that
looks up queue, class, and block. Refactor this code to standalone
tcf_block_find function that is used by all three new handlers.

Signed-off-by: Vlad Buslov <vla...@mellanox.com>


FWIW, I like this separation - makes things more maintainable and
readable (we should do it in act_api as well).


Sorry - meant to point out that this belongs to net-next not net.

cheers,
jamal


  1   2   3   4   5   6   7   8   9   10   >