Re: [PATCH net] net: sched: cls: Fix offloading when ingress dev is vxlan

2018-06-06 Thread Paul Blakey




On 06/06/2018 00:27, Jakub Kicinski wrote:

On Tue, 05 Jun 2018 15:06:40 -0400 (EDT), David Miller wrote:

From: Jakub Kicinski 
Date: Tue, 5 Jun 2018 11:57:47 -0700


Do we still care about correctness and not breaking backward
compatibility?


Jakub let me know if you want me to revert this change.


Yes, I think this patch introduces a regression when block is shared
between offload capable and in-capable device, therefore it should be
reverted.  Shared blocks went through a number of review cycles to
ensure such cases are handled correctly.


Longer story about the actual issue which is never explained in the
commit message is this: in kernels 4.10 - 4.14 skip_sw flag was
supported on tunnels in cls_flower only:

static int fl_hw_replace_filter(struct tcf_proto *tp,
[...]
if (!tc_can_offload(dev)) {
if (tcf_exts_get_dev(dev, >exts, >hw_dev) ||
(f->hw_dev && !tc_can_offload(f->hw_dev))) {
f->hw_dev = dev;
return tc_skip_sw(f->flags) ? -EINVAL : 0;
}
dev = f->hw_dev;
cls_flower.egress_dev = true;
} else {
f->hw_dev = dev;
}


In 4.15 - 4.17 with addition of shared blocks egdev mechanism was
promoted to a generic TC thing supported on all filters but it no
longer works with skip_sw.

I'd argue skip_sw is incorrect for tunnels, because the rule will only
apply to traffic ingressing on ASIC ports, not all traffic which hits
the tunnel netdev.  Therefore we should keep the 4.15 - 4.17 behaviour.

But that's a side note, I don't think we should be breaking offload on
shared blocks whether we decide to support skip_sw on tunnels or not.



Maybe we can apply my patch logic of still trying the egress dev if the 
block has a single device, and not shared. Is that ok with you?


You're patch seems good as an add on, but the egress hw device (sw1np0) 
would go over the tc actions and see if it can offload such rule (output 
to software lo device) and would fail anyway.







[PATCH net] net: sched: cls: Fix offloading when ingress dev is vxlan

2018-06-05 Thread Paul Blakey
When using a vxlan device as the ingress dev, we count it as a
"no offload dev", so when such a rule comes and err stop is true,
we fail early and don't try the egdev route which can offload it
through the egress device.

Fix that by not calling the block offload if one of the devices
attached to it is not offload capable, but make sure egress on such case
is capable instead.

Fixes: caa7260156eb ("net: sched: keep track of offloaded filters [..]")
Reviewed-by: Roi Dayan 
Acked-by: Jiri Pirko 
Signed-off-by: Paul Blakey 
---
 net/sched/cls_api.c | 26 --
 1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index a57e112..2cd579f 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -734,10 +734,6 @@ static int tcf_block_cb_call(struct tcf_block *block, enum 
tc_setup_type type,
int ok_count = 0;
int err;
 
-   /* Make sure all netdevs sharing this block are offload-capable. */
-   if (block->nooffloaddevcnt && err_stop)
-   return -EOPNOTSUPP;
-
list_for_each_entry(block_cb, >cb_list, list) {
err = block_cb->cb(type, type_data, block_cb->cb_priv);
if (err) {
@@ -1580,21 +1576,31 @@ static int tc_exts_setup_cb_egdev_call(struct tcf_exts 
*exts,
 int tc_setup_cb_call(struct tcf_block *block, struct tcf_exts *exts,
 enum tc_setup_type type, void *type_data, bool err_stop)
 {
-   int ok_count;
+   int ok_count = 0;
int ret;
 
-   ret = tcf_block_cb_call(block, type, type_data, err_stop);
-   if (ret < 0)
-   return ret;
-   ok_count = ret;
+   if (!block->nooffloaddevcnt) {
+   ret = tcf_block_cb_call(block, type, type_data, err_stop);
+   if (ret < 0)
+   return ret;
+   ok_count = ret;
+   }
 
if (!exts || ok_count)
-   return ok_count;
+   goto skip_egress;
+
ret = tc_exts_setup_cb_egdev_call(exts, type, type_data, err_stop);
if (ret < 0)
return ret;
ok_count += ret;
 
+skip_egress:
+   /* if one of the netdevs sharing this block are not offload-capable
+* make sure we succeeded in egress instead.
+*/
+   if (block->nooffloaddevcnt && !ok_count && err_stop)
+   return -EOPNOTSUPP;
+
return ok_count;
 }
 EXPORT_SYMBOL(tc_setup_cb_call);
-- 
2.7.4



Re: [PATCH net-next V2 1/2] cls_flower: Fix missing free of rhashtable

2018-06-05 Thread Paul Blakey




On 05/06/2018 00:04, David Miller wrote:

From: Paul Blakey 
Date: Sun,  3 Jun 2018 10:06:13 +0300


When destroying the instance, destroy the head rhashtable.

Fixes: 05cd271fd61a ("cls_flower: Support multiple masks per priority")
Reported-by: Vlad Buslov 
Reviewed-by: Roi Dayan 
Reviewed-by: Jiri Pirko 
Signed-off-by: Paul Blakey 


Applied.



thanks.


[PATCH net-next V2 2/2] cls_flower: Fix comparing of old filter mask with new filter

2018-06-03 Thread Paul Blakey
We incorrectly compare the mask and the result is that we can't modify
an already existing rule.

Fix that by comparing correctly.

Fixes: 05cd271fd61a ("cls_flower: Support multiple masks per priority")
Reported-by: Vlad Buslov 
Reviewed-by: Roi Dayan 
Reviewed-by: Jiri Pirko 
Signed-off-by: Paul Blakey 
---

Changelog: v0 -> v2: rebased.

 net/sched/cls_flower.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 159efd9..2b5be42 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -877,7 +877,7 @@ static int fl_check_assign_mask(struct cls_fl_head *head,
return PTR_ERR(newmask);
 
fnew->mask = newmask;
-   } else if (fold && fold->mask == fnew->mask) {
+   } else if (fold && fold->mask != fnew->mask) {
return -EINVAL;
}
 
-- 
2.7.4



[PATCH net-next V2 1/2] cls_flower: Fix missing free of rhashtable

2018-06-03 Thread Paul Blakey
When destroying the instance, destroy the head rhashtable.

Fixes: 05cd271fd61a ("cls_flower: Support multiple masks per priority")
Reported-by: Vlad Buslov 
Reviewed-by: Roi Dayan 
Reviewed-by: Jiri Pirko 
Signed-off-by: Paul Blakey 
---

Changelog: v0 -> v2: rebased.

 net/sched/cls_flower.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 3786fea..159efd9 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -326,6 +326,8 @@ static void fl_destroy_sleepable(struct work_struct *work)
struct cls_fl_head *head = container_of(to_rcu_work(work),
struct cls_fl_head,
rwork);
+
+   rhashtable_destroy(>ht);
kfree(head);
module_put(THIS_MODULE);
 }
-- 
2.7.4



[PATCH net] cls_flower: Fix incorrect idr release when failing to modify rule

2018-05-30 Thread Paul Blakey
When we fail to modify a rule, we incorrectly release the idr handle
of the unmodified old rule.

Fix that by checking if we need to release it.

Fixes: fe2502e49b58 ("net_sched: remove cls_flower idr on failure")
Reported-by: Vlad Buslov 
Reviewed-by: Roi Dayan 
Acked-by: Jiri Pirko 
Signed-off-by: Paul Blakey 
---
 net/sched/cls_flower.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index d964e60..c79f6e7 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -977,7 +977,7 @@ static int fl_change(struct net *net, struct sk_buff 
*in_skb,
return 0;
 
 errout_idr:
-   if (fnew->handle)
+   if (!fold)
idr_remove(>handle_idr, fnew->handle);
 errout:
tcf_exts_destroy(>exts);
-- 
2.7.4



[PATCH net-next 2/2] cls_flower: Fix comparing of old filter mask with new filter

2018-05-30 Thread Paul Blakey
We incorrectly compare the mask and the result is that we can't modify
an already existing rule.

Fix that by comparing correctly.

Fixes: 05cd271fd61a ("cls_flower: Support multiple masks per priority")
Reported-by: Vlad Buslov 
Reviewed-by: Roi Dayan 
Reviewed-by: Jiri Pirko 
Signed-off-by: Paul Blakey 
---
 net/sched/cls_flower.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index ce5983b..3dc9a66 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -897,7 +897,7 @@ static int fl_check_assign_mask(struct cls_fl_head *head,
return PTR_ERR(newmask);
 
fnew->mask = newmask;
-   } else if (fold && fold->mask == fnew->mask) {
+   } else if (fold && fold->mask != fnew->mask) {
return -EINVAL;
}
 
-- 
2.7.4



[PATCH net-next 0/2] cls_flower: Various fixes

2018-05-30 Thread Paul Blakey
Two of the fixes are for my multiple mask patch

Paul Blakey (2):
  cls_flower: Fix missing free of rhashtable
  cls_flower: Fix comparing of old filter mask with new filter

 net/sched/cls_flower.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

-- 
2.7.4



[PATCH net-next 1/2] cls_flower: Fix missing free of rhashtable

2018-05-30 Thread Paul Blakey
When destroying the instance, destroy the head rhashtable.

Fixes: 05cd271fd61a ("cls_flower: Support multiple masks per priority")
Reported-by: Vlad Buslov 
Reviewed-by: Roi Dayan 
Reviewed-by: Jiri Pirko 
Signed-off-by: Paul Blakey 
---
 net/sched/cls_flower.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index d06e398..ce5983b 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -338,6 +338,8 @@ static void fl_destroy_sleepable(struct work_struct *work)
 {
struct cls_fl_head *head = container_of(work, struct cls_fl_head,
work);
+
+   rhashtable_destroy(>ht);
kfree(head);
module_put(THIS_MODULE);
 }
-- 
2.7.4



[PATCH net-next V2] cls_flower: Support multiple masks per priority

2018-04-30 Thread Paul Blakey
Currently flower doesn't support inserting filters with different masks
on a single priority, even if the actual flows (key + mask) inserted
aren't overlapping, as with the use case of offloading openvswitch
datapath flows. Instead one must go up one level, and assign different
priorities for each mask, which will create a different flower
instances.

This patch opens flower to support more than one mask per priority,
and a single flower instance. It does so by adding another hash table
on top of the existing one which will store the different masks,
and the filters that share it.

The user is left with the responsibility of ensuring non overlapping
flows, otherwise precedence is not guaranteed.

Signed-off-by: Paul Blakey <pa...@mellanox.com>
Signed-off-by: Jiri Pirko <j...@mellanox.com>
---

Changes:
V1 -> V2: in fl_init, removed unnessecry err variable, just return direct 
result instead.
  in fl_mask_range, removed extra new line.
  commit msg, spelling.

 net/sched/cls_flower.c | 275 +++--
 1 file changed, 174 insertions(+), 101 deletions(-)

diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index d964e60..eacaaf8 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -61,16 +61,18 @@ struct fl_flow_mask_range {
 struct fl_flow_mask {
struct fl_flow_key key;
struct fl_flow_mask_range range;
-   struct rcu_head rcu;
+   struct rhash_head ht_node;
+   struct rhashtable ht;
+   struct rhashtable_params filter_ht_params;
+   struct flow_dissector dissector;
+   struct list_head filters;
+   struct rcu_head rcu;
+   struct list_head list;
 };
 
 struct cls_fl_head {
struct rhashtable ht;
-   struct fl_flow_mask mask;
-   struct flow_dissector dissector;
-   bool mask_assigned;
-   struct list_head filters;
-   struct rhashtable_params ht_params;
+   struct list_head masks;
union {
struct work_struct work;
struct rcu_head rcu;
@@ -79,6 +81,7 @@ struct cls_fl_head {
 };
 
 struct cls_fl_filter {
+   struct fl_flow_mask *mask;
struct rhash_head ht_node;
struct fl_flow_key mkey;
struct tcf_exts exts;
@@ -94,6 +97,13 @@ struct cls_fl_filter {
struct net_device *hw_dev;
 };
 
+static const struct rhashtable_params mask_ht_params = {
+   .key_offset = offsetof(struct fl_flow_mask, key),
+   .key_len = sizeof(struct fl_flow_key),
+   .head_offset = offsetof(struct fl_flow_mask, ht_node),
+   .automatic_shrinking = true,
+};
+
 static unsigned short int fl_mask_range(const struct fl_flow_mask *mask)
 {
return mask->range.end - mask->range.start;
@@ -103,13 +113,19 @@ static void fl_mask_update_range(struct fl_flow_mask 
*mask)
 {
const u8 *bytes = (const u8 *) >key;
size_t size = sizeof(mask->key);
-   size_t i, first = 0, last = size - 1;
+   size_t i, first = 0, last;
 
-   for (i = 0; i < sizeof(mask->key); i++) {
+   for (i = 0; i < size; i++) {
+   if (bytes[i]) {
+   first = i;
+   break;
+   }
+   }
+   last = first;
+   for (i = size - 1; i != first; i--) {
if (bytes[i]) {
-   if (!first && i)
-   first = i;
last = i;
+   break;
}
}
mask->range.start = rounddown(first, sizeof(long));
@@ -140,12 +156,11 @@ static void fl_clear_masked_range(struct fl_flow_key *key,
memset(fl_key_get_start(key, mask), 0, fl_mask_range(mask));
 }
 
-static struct cls_fl_filter *fl_lookup(struct cls_fl_head *head,
+static struct cls_fl_filter *fl_lookup(struct fl_flow_mask *mask,
   struct fl_flow_key *mkey)
 {
-   return rhashtable_lookup_fast(>ht,
- fl_key_get_start(mkey, >mask),
- head->ht_params);
+   return rhashtable_lookup_fast(>ht, fl_key_get_start(mkey, mask),
+ mask->filter_ht_params);
 }
 
 static int fl_classify(struct sk_buff *skb, const struct tcf_proto *tp,
@@ -153,28 +168,28 @@ static int fl_classify(struct sk_buff *skb, const struct 
tcf_proto *tp,
 {
struct cls_fl_head *head = rcu_dereference_bh(tp->root);
struct cls_fl_filter *f;
+   struct fl_flow_mask *mask;
struct fl_flow_key skb_key;
struct fl_flow_key skb_mkey;
 
-   if (!atomic_read(>ht.nelems))
-   return -1;
-
-   fl_clear_masked_range(_key, >mask);
+   list_for_each_entry_rcu(mask, >masks, list) {
+   fl_clear_masked_range(_key, mask);
 
-   skb_key.indev_ifindex = skb->skb_iif;
-   /* skb_flow_dissect() does

[PATCH net-next] cls_flower: Support multiple masks per priority

2018-04-30 Thread Paul Blakey
Currently flower doesn't support inserting filters with different masks
on a single priority, even if the actual flows (key + mask) inserted
aren't overlapping, as with the use case of offloading openvswitch
datapath flows. Instead one must go up one level, and assign different
priorities for each mask, which will create a different flower
instances.

This patch opens flower to support more than one mask per priority,
and a single flower instance. It does so by adding another hash table
on top of the existing one which will store the different masks,
and the filters that share it.

The user is left with the responsibilty of ensuring non overlapping
flows, otherwise precedence is not guaranteed.

Signed-off-by: Paul Blakey <pa...@mellanox.com>
---
 net/sched/cls_flower.c | 279 +++--
 1 file changed, 179 insertions(+), 100 deletions(-)

diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index d964e60..258b385 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -61,16 +61,18 @@ struct fl_flow_mask_range {
 struct fl_flow_mask {
struct fl_flow_key key;
struct fl_flow_mask_range range;
-   struct rcu_head rcu;
+   struct rhash_head ht_node;
+   struct rhashtable ht;
+   struct rhashtable_params filter_ht_params;
+   struct flow_dissector dissector;
+   struct list_head filters;
+   struct rcu_head rcu;
+   struct list_head list;
 };
 
 struct cls_fl_head {
struct rhashtable ht;
-   struct fl_flow_mask mask;
-   struct flow_dissector dissector;
-   bool mask_assigned;
-   struct list_head filters;
-   struct rhashtable_params ht_params;
+   struct list_head masks;
union {
struct work_struct work;
struct rcu_head rcu;
@@ -79,6 +81,7 @@ struct cls_fl_head {
 };
 
 struct cls_fl_filter {
+   struct fl_flow_mask *mask;
struct rhash_head ht_node;
struct fl_flow_key mkey;
struct tcf_exts exts;
@@ -94,6 +97,13 @@ struct cls_fl_filter {
struct net_device *hw_dev;
 };
 
+static const struct rhashtable_params mask_ht_params = {
+   .key_offset = offsetof(struct fl_flow_mask, key),
+   .key_len = sizeof(struct fl_flow_key),
+   .head_offset = offsetof(struct fl_flow_mask, ht_node),
+   .automatic_shrinking = true,
+};
+
 static unsigned short int fl_mask_range(const struct fl_flow_mask *mask)
 {
return mask->range.end - mask->range.start;
@@ -103,15 +113,22 @@ static void fl_mask_update_range(struct fl_flow_mask 
*mask)
 {
const u8 *bytes = (const u8 *) >key;
size_t size = sizeof(mask->key);
-   size_t i, first = 0, last = size - 1;
+   size_t i, first = 0, last;
 
-   for (i = 0; i < sizeof(mask->key); i++) {
+   for (i = 0; i < size; i++) {
+   if (bytes[i]) {
+   first = i;
+   break;
+   }
+   }
+   last = first;
+   for (i = size - 1; i != first; i--) {
if (bytes[i]) {
-   if (!first && i)
-   first = i;
last = i;
+   break;
}
}
+
mask->range.start = rounddown(first, sizeof(long));
mask->range.end = roundup(last + 1, sizeof(long));
 }
@@ -140,12 +157,11 @@ static void fl_clear_masked_range(struct fl_flow_key *key,
memset(fl_key_get_start(key, mask), 0, fl_mask_range(mask));
 }
 
-static struct cls_fl_filter *fl_lookup(struct cls_fl_head *head,
+static struct cls_fl_filter *fl_lookup(struct fl_flow_mask *mask,
   struct fl_flow_key *mkey)
 {
-   return rhashtable_lookup_fast(>ht,
- fl_key_get_start(mkey, >mask),
- head->ht_params);
+   return rhashtable_lookup_fast(>ht, fl_key_get_start(mkey, mask),
+ mask->filter_ht_params);
 }
 
 static int fl_classify(struct sk_buff *skb, const struct tcf_proto *tp,
@@ -153,28 +169,28 @@ static int fl_classify(struct sk_buff *skb, const struct 
tcf_proto *tp,
 {
struct cls_fl_head *head = rcu_dereference_bh(tp->root);
struct cls_fl_filter *f;
+   struct fl_flow_mask *mask;
struct fl_flow_key skb_key;
struct fl_flow_key skb_mkey;
 
-   if (!atomic_read(>ht.nelems))
-   return -1;
-
-   fl_clear_masked_range(_key, >mask);
+   list_for_each_entry_rcu(mask, >masks, list) {
+   fl_clear_masked_range(_key, mask);
 
-   skb_key.indev_ifindex = skb->skb_iif;
-   /* skb_flow_dissect() does not set n_proto in case an unknown protocol,
-* so do it rather here.
-*/
-   skb_key.basic.n_proto = skb->protocol;
-   skb_flow_dissect_tunnel_info

Re: [PATCH net] test_rhashtable: Add missing rcu_read_lock()

2018-03-08 Thread Paul Blakey



On 08/03/2018 17:58, Herbert Xu wrote:

On Thu, Mar 08, 2018 at 01:54:57PM +0200, Paul Blakey wrote:

Suppress "suspicious rcu_dereference_protected() usage!" on duplicate
insertion test.

Fixes: 499ac3b60f65 ('test_rhashtable: add test case for rhl_table with 
duplicate objects')
Signed-off-by: Paul Blakey <pa...@mellanox.com>


This shouldn't be doing a table walk in the first place.

You should convert it to using the table walk interface.

Direct table walks are forbidden because they will break when
you hit a resize.

Cheers,




I know but I wanted to show the inner structure for the failure case, 
iirc walk doesn't provide this.


    ht: 
   bucket[1] -> [[ val 1 (tid=0) ]] -> [[ val 21 (tid=1) ]]
   -
[ 3599.715501]
    ht: 
   bucket[1] -> [[ val 1 (tid=2),  val 1 (tid=0) ]] -> [[ 
val 21 (tid=1) ]]

   -
[ 3599.738600]
    ht: 
   bucket[1] -> [[ val 21 (tid=1) ]] -> [[ val 1 (tid=0) ]]
   -
[ 3599.759443]
    ht: 
   bucket[1] -> [[ val 21 (tid=1) ]] -> [[ val 1 (tid=2), 
val 1 (tid=0) ]]


[PATCH net] test_rhashtable: Add missing rcu_read_lock()

2018-03-08 Thread Paul Blakey
Suppress "suspicious rcu_dereference_protected() usage!" on duplicate
insertion test.

Fixes: 499ac3b60f65 ('test_rhashtable: add test case for rhl_table with 
duplicate objects')
Signed-off-by: Paul Blakey <pa...@mellanox.com>
---
 lib/test_rhashtable.c | 14 +-
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/lib/test_rhashtable.c b/lib/test_rhashtable.c
index f4000c1..727d3ec 100644
--- a/lib/test_rhashtable.c
+++ b/lib/test_rhashtable.c
@@ -498,14 +498,16 @@ static unsigned int __init print_ht(struct rhltable *rhlt)
char buff[512] = "";
unsigned int i, cnt = 0;
 
+   rcu_read_lock();
+
ht = >ht;
-   tbl = rht_dereference(ht->tbl, ht);
+   tbl = rht_dereference_rcu(ht->tbl, ht);
for (i = 0; i < tbl->size; i++) {
struct rhash_head *pos, *next;
struct test_obj_rhl *p;
 
-   pos = rht_dereference(tbl->buckets[i], ht);
-   next = !rht_is_a_nulls(pos) ? rht_dereference(pos->next, ht) : 
NULL;
+   pos = rht_dereference_rcu(tbl->buckets[i], ht);
+   next = !rht_is_a_nulls(pos) ? rht_dereference_rcu(pos->next, 
ht) : NULL;
 
if (!rht_is_a_nulls(pos)) {
sprintf(buff, "%s\nbucket[%d] -> ", buff, i);
@@ -516,7 +518,7 @@ static unsigned int __init print_ht(struct rhltable *rhlt)
sprintf(buff, "%s[[", buff);
do {
pos = >rhead;
-   list = rht_dereference(list->next, ht);
+   list = rht_dereference_rcu(list->next, ht);
p = rht_obj(ht, pos);
 
sprintf(buff, "%s val %d (tid=%d)%s", buff, 
p->value.id, p->value.tid,
@@ -526,13 +528,15 @@ static unsigned int __init print_ht(struct rhltable *rhlt)
 
pos = next,
next = !rht_is_a_nulls(pos) ?
-   rht_dereference(pos->next, ht) : NULL;
+   rht_dereference_rcu(pos->next, ht) : NULL;
 
sprintf(buff, "%s]]%s", buff, !rht_is_a_nulls(pos) ? " 
-> " : "");
}
}
printk(KERN_ERR "\n ht: %s\n-\n", buff);
 
+   rcu_read_unlock();
+
return cnt;
 }
 
-- 
1.8.4.3



Re: [PATCH net v4 0/2] rhashtable: Fix rhltable duplicates insertion

2018-03-08 Thread Paul Blakey



On 07/03/2018 18:23, David Miller wrote:

From: Paul Blakey <pa...@mellanox.com>
Date: Wed,  7 Mar 2018 16:00:11 +0200


On our mlx5 driver fs_core.c, we use the rhltable interface to store
flow groups. We noticed that sometimes we get a warning that flow group isn't
found at removal. This rare case was caused when a specific scenario happened,
insertion of a flow group with a similar match criteria (a duplicate),
but only where the flow group rhash_head was second (or not first)
on the relevant rhashtable bucket list.

The first patch fixes it, and the second one adds a test that show
it is now working.

Paul.

v4 --> v3 changes:
 * Added Herbert Xu's ack (thanks)
 * Removed extra commit tags


I already applied v3, sorry.  But I did add Herbert's ACKs.
:-)



thanks


[PATCH net v4 2/2] test_rhashtable: add test case for rhltable with duplicate objects

2018-03-07 Thread Paul Blakey
Tries to insert duplicates in the middle of bucket's chain:
bucket 1:  [[val 21 (tid=1)]] -> [[ val 1 (tid=2),  val 1 (tid=0) ]]

Reuses tid to distinguish the elements insertion order.

Signed-off-by: Paul Blakey <pa...@mellanox.com>
---
 lib/test_rhashtable.c | 134 ++
 1 file changed, 134 insertions(+)

diff --git a/lib/test_rhashtable.c b/lib/test_rhashtable.c
index 76d3667..f4000c1 100644
--- a/lib/test_rhashtable.c
+++ b/lib/test_rhashtable.c
@@ -79,6 +79,21 @@ struct thread_data {
struct test_obj *objs;
 };
 
+static u32 my_hashfn(const void *data, u32 len, u32 seed)
+{
+   const struct test_obj_rhl *obj = data;
+
+   return (obj->value.id % 10) << RHT_HASH_RESERVED_SPACE;
+}
+
+static int my_cmpfn(struct rhashtable_compare_arg *arg, const void *obj)
+{
+   const struct test_obj_rhl *test_obj = obj;
+   const struct test_obj_val *val = arg->key;
+
+   return test_obj->value.id - val->id;
+}
+
 static struct rhashtable_params test_rht_params = {
.head_offset = offsetof(struct test_obj, node),
.key_offset = offsetof(struct test_obj, value),
@@ -87,6 +102,17 @@ struct thread_data {
.nulls_base = (3U << RHT_BASE_SHIFT),
 };
 
+static struct rhashtable_params test_rht_params_dup = {
+   .head_offset = offsetof(struct test_obj_rhl, list_node),
+   .key_offset = offsetof(struct test_obj_rhl, value),
+   .key_len = sizeof(struct test_obj_val),
+   .hashfn = jhash,
+   .obj_hashfn = my_hashfn,
+   .obj_cmpfn = my_cmpfn,
+   .nelem_hint = 128,
+   .automatic_shrinking = false,
+};
+
 static struct semaphore prestart_sem;
 static struct semaphore startup_sem = __SEMAPHORE_INITIALIZER(startup_sem, 0);
 
@@ -465,6 +491,112 @@ static int __init test_rhashtable_max(struct test_obj 
*array,
return err;
 }
 
+static unsigned int __init print_ht(struct rhltable *rhlt)
+{
+   struct rhashtable *ht;
+   const struct bucket_table *tbl;
+   char buff[512] = "";
+   unsigned int i, cnt = 0;
+
+   ht = >ht;
+   tbl = rht_dereference(ht->tbl, ht);
+   for (i = 0; i < tbl->size; i++) {
+   struct rhash_head *pos, *next;
+   struct test_obj_rhl *p;
+
+   pos = rht_dereference(tbl->buckets[i], ht);
+   next = !rht_is_a_nulls(pos) ? rht_dereference(pos->next, ht) : 
NULL;
+
+   if (!rht_is_a_nulls(pos)) {
+   sprintf(buff, "%s\nbucket[%d] -> ", buff, i);
+   }
+
+   while (!rht_is_a_nulls(pos)) {
+   struct rhlist_head *list = container_of(pos, struct 
rhlist_head, rhead);
+   sprintf(buff, "%s[[", buff);
+   do {
+   pos = >rhead;
+   list = rht_dereference(list->next, ht);
+   p = rht_obj(ht, pos);
+
+   sprintf(buff, "%s val %d (tid=%d)%s", buff, 
p->value.id, p->value.tid,
+   list? ", " : " ");
+   cnt++;
+   } while (list);
+
+   pos = next,
+   next = !rht_is_a_nulls(pos) ?
+   rht_dereference(pos->next, ht) : NULL;
+
+   sprintf(buff, "%s]]%s", buff, !rht_is_a_nulls(pos) ? " 
-> " : "");
+   }
+   }
+   printk(KERN_ERR "\n ht: %s\n-\n", buff);
+
+   return cnt;
+}
+
+static int __init test_insert_dup(struct test_obj_rhl *rhl_test_objects,
+ int cnt, bool slow)
+{
+   struct rhltable rhlt;
+   unsigned int i, ret;
+   const char *key;
+   int err = 0;
+
+   err = rhltable_init(, _rht_params_dup);
+   if (WARN_ON(err))
+   return err;
+
+   for (i = 0; i < cnt; i++) {
+   rhl_test_objects[i].value.tid = i;
+   key = rht_obj(, _test_objects[i].list_node.rhead);
+   key += test_rht_params_dup.key_offset;
+
+   if (slow) {
+   err = PTR_ERR(rhashtable_insert_slow(, key,
+
_test_objects[i].list_node.rhead));
+   if (err == -EAGAIN)
+   err = 0;
+   } else
+   err = rhltable_insert(,
+ _test_objects[i].list_node,
+ test_rht_params_dup);
+   if (WARN(err, "error %d on element %d/%d (%s)\n", err, i, cnt, 
slow? "slow" : "fast"))
+   goto skip_print;
+   }
+
+   ret =

[PATCH net v4 1/2] rhashtable: Fix rhlist duplicates insertion

2018-03-07 Thread Paul Blakey
When inserting duplicate objects (those with the same key),
current rhlist implementation messes up the chain pointers by
updating the bucket pointer instead of prev next pointer to the
newly inserted node. This causes missing elements on removal and
travesal.

Fix that by properly updating pprev pointer to point to
the correct rhash_head next pointer.

Fixes: ca26893f05e8 ('rhashtable: Add rhlist interface')
Signed-off-by: Paul Blakey <pa...@mellanox.com>
---
 include/linux/rhashtable.h | 4 +++-
 lib/rhashtable.c   | 4 +++-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index c9df252..668a21f 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -766,8 +766,10 @@ static inline void *__rhashtable_insert_fast(
if (!key ||
(params.obj_cmpfn ?
 params.obj_cmpfn(, rht_obj(ht, head)) :
-rhashtable_compare(, rht_obj(ht, head
+rhashtable_compare(, rht_obj(ht, head {
+   pprev = >next;
continue;
+   }
 
data = rht_obj(ht, head);
 
diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 3825c30..47de025 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -506,8 +506,10 @@ static void *rhashtable_lookup_one(struct rhashtable *ht,
if (!key ||
(ht->p.obj_cmpfn ?
 ht->p.obj_cmpfn(, rht_obj(ht, head)) :
-rhashtable_compare(, rht_obj(ht, head
+rhashtable_compare(, rht_obj(ht, head {
+   pprev = >next;
continue;
+   }
 
if (!ht->rhlist)
return rht_obj(ht, head);
-- 
1.8.4.3



[PATCH net v4 0/2] rhashtable: Fix rhltable duplicates insertion

2018-03-07 Thread Paul Blakey
On our mlx5 driver fs_core.c, we use the rhltable interface to store
flow groups. We noticed that sometimes we get a warning that flow group isn't
found at removal. This rare case was caused when a specific scenario happened,
insertion of a flow group with a similar match criteria (a duplicate),
but only where the flow group rhash_head was second (or not first)
on the relevant rhashtable bucket list.

The first patch fixes it, and the second one adds a test that show
it is now working.

Paul.

v4 --> v3 changes:
* Added Herbert Xu's ack (thanks)
* Removed extra commit tags
v3 --> v2 changes:
* Added missing fix in rhashtable_lookup_one code path as well.

v2 --> v1 changes:
* Changed commit messages to better reflect the change

Paul Blakey (2):
  rhashtable: Fix rhlist duplicates insertion
  test_rhashtable: add test case for rhltable with duplicate objects

 include/linux/rhashtable.h |   4 +-
 lib/rhashtable.c   |   4 +-
 lib/test_rhashtable.c  | 134 +
 3 files changed, 140 insertions(+), 2 deletions(-)

-- 
1.8.4.3



Re: [PATCH net v2 1/2] rhashtable: Fix rhlist duplicates insertion

2018-03-04 Thread Paul Blakey



On 04/03/2018 17:13, Mark Bloch wrote:



On 04/03/2018 15:26, Paul Blakey wrote:

When inserting duplicate objects (those with the same key),
current rhlist implementation messes up the chain pointers by
updating the bucket pointer instead of prev next pointer to the
newly inserted node. This causes missing elements on removal and
travesal.

Fix that by properly updating pprev pointer to point to
the correct rhash_head next pointer.

Fixes: ca26893f05e8 ('rhashtable: Add rhlist interface')
Signed-off-by: Paul Blakey <pa...@mellanox.com>
---
  include/linux/rhashtable.h | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index c9df252..668a21f 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -766,8 +766,10 @@ static inline void *__rhashtable_insert_fast(
if (!key ||
(params.obj_cmpfn ?
 params.obj_cmpfn(, rht_obj(ht, head)) :
-rhashtable_compare(, rht_obj(ht, head
+rhashtable_compare(, rht_obj(ht, head {
+   pprev = >next;


It seems rhashtable_lookup_one() might need the same fix.


yes was just about to send it!
it's in v3 with a test that shows it.




continue;
+   }
  
  		data = rht_obj(ht, head);
  



Mark



[PATCH net v3 0/2] rhashtable: Fix rhltable duplicates insertion

2018-03-04 Thread Paul Blakey
On our mlx5 driver fs_core.c, we use the rhltable interface to store
flow groups. We noticed that sometimes we get a warning that flow group isn't
found at removal. This rare case was caused when a specific scenario happened,
insertion of a flow group with a similar match criteria (a duplicate),
but only where the flow group rhash_head was second (or not first)
on the relevant rhashtable bucket list.

The first patch fixes it, and the second one adds a test that show
it is now working.

Paul.

v3 --> v2 changes:
* added missing fix in rhashtable_lookup_one code path as well.

v1 --> v2 changes:
* Changed commit messages to better reflect the change

Paul Blakey (2):
  rhashtable: Fix rhlist duplicates insertion
  test_rhashtable: add test case for rhltable with duplicate objects

 include/linux/rhashtable.h |   4 +-
 lib/rhashtable.c   |   4 +-
 lib/test_rhashtable.c  | 134 +
 3 files changed, 140 insertions(+), 2 deletions(-)

-- 
1.8.4.3



[PATCH net v3 2/2] test_rhashtable: add test case for rhltable with duplicate objects

2018-03-04 Thread Paul Blakey
Tries to insert duplicates in the middle of bucket's chain:
bucket 1:  [[val 21 (tid=1)]] -> [[ val 1 (tid=2),  val 1 (tid=0) ]]

Reuses tid to distinguish the elements insertion order.

Signed-off-by: Paul Blakey <pa...@mellanox.com>
---
 lib/test_rhashtable.c | 134 ++
 1 file changed, 134 insertions(+)

diff --git a/lib/test_rhashtable.c b/lib/test_rhashtable.c
index 76d3667..f4000c1 100644
--- a/lib/test_rhashtable.c
+++ b/lib/test_rhashtable.c
@@ -79,6 +79,21 @@ struct thread_data {
struct test_obj *objs;
 };
 
+static u32 my_hashfn(const void *data, u32 len, u32 seed)
+{
+   const struct test_obj_rhl *obj = data;
+
+   return (obj->value.id % 10) << RHT_HASH_RESERVED_SPACE;
+}
+
+static int my_cmpfn(struct rhashtable_compare_arg *arg, const void *obj)
+{
+   const struct test_obj_rhl *test_obj = obj;
+   const struct test_obj_val *val = arg->key;
+
+   return test_obj->value.id - val->id;
+}
+
 static struct rhashtable_params test_rht_params = {
.head_offset = offsetof(struct test_obj, node),
.key_offset = offsetof(struct test_obj, value),
@@ -87,6 +102,17 @@ struct thread_data {
.nulls_base = (3U << RHT_BASE_SHIFT),
 };
 
+static struct rhashtable_params test_rht_params_dup = {
+   .head_offset = offsetof(struct test_obj_rhl, list_node),
+   .key_offset = offsetof(struct test_obj_rhl, value),
+   .key_len = sizeof(struct test_obj_val),
+   .hashfn = jhash,
+   .obj_hashfn = my_hashfn,
+   .obj_cmpfn = my_cmpfn,
+   .nelem_hint = 128,
+   .automatic_shrinking = false,
+};
+
 static struct semaphore prestart_sem;
 static struct semaphore startup_sem = __SEMAPHORE_INITIALIZER(startup_sem, 0);
 
@@ -465,6 +491,112 @@ static int __init test_rhashtable_max(struct test_obj 
*array,
return err;
 }
 
+static unsigned int __init print_ht(struct rhltable *rhlt)
+{
+   struct rhashtable *ht;
+   const struct bucket_table *tbl;
+   char buff[512] = "";
+   unsigned int i, cnt = 0;
+
+   ht = >ht;
+   tbl = rht_dereference(ht->tbl, ht);
+   for (i = 0; i < tbl->size; i++) {
+   struct rhash_head *pos, *next;
+   struct test_obj_rhl *p;
+
+   pos = rht_dereference(tbl->buckets[i], ht);
+   next = !rht_is_a_nulls(pos) ? rht_dereference(pos->next, ht) : 
NULL;
+
+   if (!rht_is_a_nulls(pos)) {
+   sprintf(buff, "%s\nbucket[%d] -> ", buff, i);
+   }
+
+   while (!rht_is_a_nulls(pos)) {
+   struct rhlist_head *list = container_of(pos, struct 
rhlist_head, rhead);
+   sprintf(buff, "%s[[", buff);
+   do {
+   pos = >rhead;
+   list = rht_dereference(list->next, ht);
+   p = rht_obj(ht, pos);
+
+   sprintf(buff, "%s val %d (tid=%d)%s", buff, 
p->value.id, p->value.tid,
+   list? ", " : " ");
+   cnt++;
+   } while (list);
+
+   pos = next,
+   next = !rht_is_a_nulls(pos) ?
+   rht_dereference(pos->next, ht) : NULL;
+
+   sprintf(buff, "%s]]%s", buff, !rht_is_a_nulls(pos) ? " 
-> " : "");
+   }
+   }
+   printk(KERN_ERR "\n ht: %s\n-\n", buff);
+
+   return cnt;
+}
+
+static int __init test_insert_dup(struct test_obj_rhl *rhl_test_objects,
+ int cnt, bool slow)
+{
+   struct rhltable rhlt;
+   unsigned int i, ret;
+   const char *key;
+   int err = 0;
+
+   err = rhltable_init(, _rht_params_dup);
+   if (WARN_ON(err))
+   return err;
+
+   for (i = 0; i < cnt; i++) {
+   rhl_test_objects[i].value.tid = i;
+   key = rht_obj(, _test_objects[i].list_node.rhead);
+   key += test_rht_params_dup.key_offset;
+
+   if (slow) {
+   err = PTR_ERR(rhashtable_insert_slow(, key,
+
_test_objects[i].list_node.rhead));
+   if (err == -EAGAIN)
+   err = 0;
+   } else
+   err = rhltable_insert(,
+ _test_objects[i].list_node,
+ test_rht_params_dup);
+   if (WARN(err, "error %d on element %d/%d (%s)\n", err, i, cnt, 
slow? "slow" : "fast"))
+   goto skip_print;
+   }
+
+   ret =

[PATCH net v3 1/2] rhashtable: Fix rhlist duplicates insertion

2018-03-04 Thread Paul Blakey
When inserting duplicate objects (those with the same key),
current rhlist implementation messes up the chain pointers by
updating the bucket pointer instead of prev next pointer to the
newly inserted node. This causes missing elements on removal and
travesal.

Fix that by properly updating pprev pointer to point to
the correct rhash_head next pointer.

Issue: 1241076
Change-Id: I86b2c140bcb4aeb10b70a72a267ff590bb2b17e7
Fixes: ca26893f05e8 ('rhashtable: Add rhlist interface')
Signed-off-by: Paul Blakey <pa...@mellanox.com>
---
 include/linux/rhashtable.h | 4 +++-
 lib/rhashtable.c   | 4 +++-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index c9df252..668a21f 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -766,8 +766,10 @@ static inline void *__rhashtable_insert_fast(
if (!key ||
(params.obj_cmpfn ?
 params.obj_cmpfn(, rht_obj(ht, head)) :
-rhashtable_compare(, rht_obj(ht, head
+rhashtable_compare(, rht_obj(ht, head {
+   pprev = >next;
continue;
+   }
 
data = rht_obj(ht, head);
 
diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 3825c30..47de025 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -506,8 +506,10 @@ static void *rhashtable_lookup_one(struct rhashtable *ht,
if (!key ||
(ht->p.obj_cmpfn ?
 ht->p.obj_cmpfn(, rht_obj(ht, head)) :
-rhashtable_compare(, rht_obj(ht, head
+rhashtable_compare(, rht_obj(ht, head {
+   pprev = >next;
continue;
+   }
 
if (!ht->rhlist)
return rht_obj(ht, head);
-- 
1.8.4.3



Re: [PATCH net 1/2] rhashtable: Fix rhltable duplicates insertion

2018-03-04 Thread Paul Blakey



On 04/03/2018 14:57, Herbert Xu wrote:

On Sun, Mar 04, 2018 at 02:34:26PM +0200, Paul Blakey wrote:

When inserting duplicate objects (those with the same key),
current rhashtable implementation messes up the chain pointers by
updating the bucket pointer instead of prev next pointer to the
newly inserted node. This causes missing elements on removal and
travesal.

Fix that by properly updating pprev pointer to point to
the correct rhash_head next pointer.

Fixes: ca26893f05e8 ('rhashtable: Add rhlist interface')
Signed-off-by: Paul Blakey <pa...@mellanox.com>


Nack.  You must not insert objects with the same key through
rhashtable.  The reason is that we cannot reliably fetch all
of the objects with the same key during a resize.

If you need duplicate objects, you should use rhlist.

Cheers,




Hi, I meant the rhlist interface here, sent  v2.
Thanks,
Paul.


[PATCH net v2 0/2] rhlist: Fix rhltable duplicates insertion

2018-03-04 Thread Paul Blakey
On our mlx5 driver fs_core.c, we use the rhltable interface to store
flow groups. We noticed that sometimes we get a warning that flow group isn't
found at removal. This rare case was caused when a specific scenario happened,
insertion of a flow group with a similar match criteria (a duplicate),
but only where the flow group rhash_head was second (or not first)
on the relevant rhashtable bucket list.

The first patch fixes it, and the second one adds a test that show
it is now working.

Paul.

v1 --> v2 changes:
* Changed commit messages to better reflect the change

Paul Blakey (2):
  rhashtable: Fix rhlist duplicates insertion
  test_rhashtable: add test case for rhltable with duplicate objects

 include/linux/rhashtable.h |   4 +-
 lib/test_rhashtable.c  | 121 +
 2 files changed, 124 insertions(+), 1 deletion(-)

-- 
1.8.4.3



[PATCH net v2 1/2] rhashtable: Fix rhlist duplicates insertion

2018-03-04 Thread Paul Blakey
When inserting duplicate objects (those with the same key),
current rhlist implementation messes up the chain pointers by
updating the bucket pointer instead of prev next pointer to the
newly inserted node. This causes missing elements on removal and
travesal.

Fix that by properly updating pprev pointer to point to
the correct rhash_head next pointer.

Fixes: ca26893f05e8 ('rhashtable: Add rhlist interface')
Signed-off-by: Paul Blakey <pa...@mellanox.com>
---
 include/linux/rhashtable.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index c9df252..668a21f 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -766,8 +766,10 @@ static inline void *__rhashtable_insert_fast(
if (!key ||
(params.obj_cmpfn ?
 params.obj_cmpfn(, rht_obj(ht, head)) :
-rhashtable_compare(, rht_obj(ht, head
+rhashtable_compare(, rht_obj(ht, head {
+   pprev = >next;
continue;
+   }
 
data = rht_obj(ht, head);
 
-- 
1.8.4.3



[PATCH net v2 2/2] test_rhashtable: add test case for rhltable with duplicate objects

2018-03-04 Thread Paul Blakey
Tries to insert duplicates in the middle of bucket's chain:
bucket 1:  [[val 21 (tid=1)]] -> [[ val 1 (tid=2),  val 1 (tid=0) ]]

Reuses tid to distinguish the elements insertion order.

Signed-off-by: Paul Blakey <pa...@mellanox.com>
---
 lib/test_rhashtable.c | 121 ++
 1 file changed, 121 insertions(+)

diff --git a/lib/test_rhashtable.c b/lib/test_rhashtable.c
index 76d3667..4a5f331 100644
--- a/lib/test_rhashtable.c
+++ b/lib/test_rhashtable.c
@@ -79,6 +79,21 @@ struct thread_data {
struct test_obj *objs;
 };
 
+static u32 my_hashfn(const void *data, u32 len, u32 seed)
+{
+   const struct test_obj_rhl *obj = data;
+
+   return (obj->value.id % 10) << RHT_HASH_RESERVED_SPACE;
+}
+
+static int my_cmpfn(struct rhashtable_compare_arg *arg, const void *obj)
+{
+   const struct test_obj_rhl *test_obj = obj;
+   const struct test_obj_val *val = arg->key;
+
+   return test_obj->value.id - val->id;
+}
+
 static struct rhashtable_params test_rht_params = {
.head_offset = offsetof(struct test_obj, node),
.key_offset = offsetof(struct test_obj, value),
@@ -87,6 +102,17 @@ struct thread_data {
.nulls_base = (3U << RHT_BASE_SHIFT),
 };
 
+static struct rhashtable_params test_rht_params_dup = {
+   .head_offset = offsetof(struct test_obj_rhl, list_node),
+   .key_offset = offsetof(struct test_obj_rhl, value),
+   .key_len = sizeof(struct test_obj_val),
+   .hashfn = jhash,
+   .obj_hashfn = my_hashfn,
+   .obj_cmpfn = my_cmpfn,
+   .nelem_hint = 128,
+   .automatic_shrinking = false,
+};
+
 static struct semaphore prestart_sem;
 static struct semaphore startup_sem = __SEMAPHORE_INITIALIZER(startup_sem, 0);
 
@@ -465,6 +491,99 @@ static int __init test_rhashtable_max(struct test_obj 
*array,
return err;
 }
 
+static unsigned int __init print_ht(struct rhltable *rhlt)
+{
+   struct rhashtable *ht;
+   const struct bucket_table *tbl;
+   char buff[512] = "";
+   unsigned int i, cnt = 0;
+
+   ht = >ht;
+   tbl = rht_dereference(ht->tbl, ht);
+   for (i = 0; i < tbl->size; i++) {
+   struct rhash_head *pos, *next;
+   struct test_obj_rhl *p;
+
+   pos = rht_dereference(tbl->buckets[i], ht);
+   next = !rht_is_a_nulls(pos) ? rht_dereference(pos->next, ht) : 
NULL;
+
+   if (!rht_is_a_nulls(pos)) {
+   sprintf(buff, "%s\nbucket[%d] -> ", buff, i);
+   }
+
+   while (!rht_is_a_nulls(pos)) {
+   struct rhlist_head *list = container_of(pos, struct 
rhlist_head, rhead);
+   sprintf(buff, "%s[[", buff);
+   do {
+   pos = >rhead;
+   list = rht_dereference(list->next, ht);
+   p = rht_obj(ht, pos);
+
+   sprintf(buff, "%s val %d (tid=%d)%s", buff, 
p->value.id, p->value.tid,
+   list? ", " : " ");
+   cnt++;
+   } while (list);
+
+   pos = next,
+   next = !rht_is_a_nulls(pos) ?
+   rht_dereference(pos->next, ht) : NULL;
+
+   sprintf(buff, "%s]]%s", buff, !rht_is_a_nulls(pos) ? " 
-> " : "");
+   }
+   }
+   printk(KERN_ERR "\n ht: %s\n-\n", buff);
+
+   return cnt;
+}
+
+static int __init test_insert_dup(struct test_obj_rhl *rhl_test_objects,
+ int cnt)
+{
+   struct rhltable rhlt;
+   unsigned int i, ret;
+   int err;
+
+   err = rhltable_init(, _rht_params_dup);
+   if (WARN_ON(err))
+   return err;
+
+   for (i = 0; i < cnt; i++) {
+   rhl_test_objects[i].value.tid = i;
+   err = rhltable_insert(, _test_objects[i].list_node,
+ test_rht_params_dup);
+   if (WARN(err, "error %d on element %d\n", err, i))
+   goto skip_print;
+   }
+
+   ret = print_ht();
+   WARN(ret != cnt, "missing rhltable elements (%d != %d)\n", ret, cnt);
+
+skip_print:
+   rhltable_destroy();
+
+   return 0;
+}
+
+static int __init test_insert_duplicates_run(void)
+{
+   struct test_obj_rhl rhl_test_objects[3] = {};
+
+   pr_info("test inserting duplicates\n");
+
+   /* two different values that map to same bucket */
+   rhl_test_objects[0].value.id = 1;
+   rhl_test_objects[1].value.id = 21;
+
+   /* and another duplicate with same as [0] value
+* which will be second on the bucket li

[PATCH net 0/2] rhashtable: Fix rhltable duplicates insertion

2018-03-04 Thread Paul Blakey
On our mlx5 driver fs_core.c, we use the rhltable interface to store
flow groups. We noticed that sometimes we get a warning that flow group isn't
found at removal. This rare case was caused when a specific scenrio happened, 
insertion of a flow group with a similar match criteria (duplicate),
but only where the flow group rhash_head was second (or not first)
on the relevant rhashtable bucket list.

The first patch fixes it, and the second one adds a test that show
it is now working.

Paul Blakey (2):
  rhashtable: Fix rhltable duplicates insertion
  test_rhashtable: add test case for rhl_table with duplicate objects

 include/linux/rhashtable.h |   4 +-
 lib/test_rhashtable.c  | 121 +
 2 files changed, 124 insertions(+), 1 deletion(-)

-- 
1.8.4.3



[PATCH net 2/2] test_rhashtable: add test case for rhl_table with duplicate objects

2018-03-04 Thread Paul Blakey
Tries to insert duplicates in the middle of bucket's chain:
bucket 1:  [[val 21 (tid=1)]] -> [[ val 1 (tid=2),  val 1 (tid=0) ]]

Reuses tid to distinguish the elements insertion order.

Signed-off-by: Paul Blakey <pa...@mellanox.com>
---
 lib/test_rhashtable.c | 121 ++
 1 file changed, 121 insertions(+)

diff --git a/lib/test_rhashtable.c b/lib/test_rhashtable.c
index 76d3667..4a5f331 100644
--- a/lib/test_rhashtable.c
+++ b/lib/test_rhashtable.c
@@ -79,6 +79,21 @@ struct thread_data {
struct test_obj *objs;
 };
 
+static u32 my_hashfn(const void *data, u32 len, u32 seed)
+{
+   const struct test_obj_rhl *obj = data;
+
+   return (obj->value.id % 10) << RHT_HASH_RESERVED_SPACE;
+}
+
+static int my_cmpfn(struct rhashtable_compare_arg *arg, const void *obj)
+{
+   const struct test_obj_rhl *test_obj = obj;
+   const struct test_obj_val *val = arg->key;
+
+   return test_obj->value.id - val->id;
+}
+
 static struct rhashtable_params test_rht_params = {
.head_offset = offsetof(struct test_obj, node),
.key_offset = offsetof(struct test_obj, value),
@@ -87,6 +102,17 @@ struct thread_data {
.nulls_base = (3U << RHT_BASE_SHIFT),
 };
 
+static struct rhashtable_params test_rht_params_dup = {
+   .head_offset = offsetof(struct test_obj_rhl, list_node),
+   .key_offset = offsetof(struct test_obj_rhl, value),
+   .key_len = sizeof(struct test_obj_val),
+   .hashfn = jhash,
+   .obj_hashfn = my_hashfn,
+   .obj_cmpfn = my_cmpfn,
+   .nelem_hint = 128,
+   .automatic_shrinking = false,
+};
+
 static struct semaphore prestart_sem;
 static struct semaphore startup_sem = __SEMAPHORE_INITIALIZER(startup_sem, 0);
 
@@ -465,6 +491,99 @@ static int __init test_rhashtable_max(struct test_obj 
*array,
return err;
 }
 
+static unsigned int __init print_ht(struct rhltable *rhlt)
+{
+   struct rhashtable *ht;
+   const struct bucket_table *tbl;
+   char buff[512] = "";
+   unsigned int i, cnt = 0;
+
+   ht = >ht;
+   tbl = rht_dereference(ht->tbl, ht);
+   for (i = 0; i < tbl->size; i++) {
+   struct rhash_head *pos, *next;
+   struct test_obj_rhl *p;
+
+   pos = rht_dereference(tbl->buckets[i], ht);
+   next = !rht_is_a_nulls(pos) ? rht_dereference(pos->next, ht) : 
NULL;
+
+   if (!rht_is_a_nulls(pos)) {
+   sprintf(buff, "%s\nbucket[%d] -> ", buff, i);
+   }
+
+   while (!rht_is_a_nulls(pos)) {
+   struct rhlist_head *list = container_of(pos, struct 
rhlist_head, rhead);
+   sprintf(buff, "%s[[", buff);
+   do {
+   pos = >rhead;
+   list = rht_dereference(list->next, ht);
+   p = rht_obj(ht, pos);
+
+   sprintf(buff, "%s val %d (tid=%d)%s", buff, 
p->value.id, p->value.tid,
+   list? ", " : " ");
+   cnt++;
+   } while (list);
+
+   pos = next,
+   next = !rht_is_a_nulls(pos) ?
+   rht_dereference(pos->next, ht) : NULL;
+
+   sprintf(buff, "%s]]%s", buff, !rht_is_a_nulls(pos) ? " 
-> " : "");
+   }
+   }
+   printk(KERN_ERR "\n ht: %s\n-\n", buff);
+
+   return cnt;
+}
+
+static int __init test_insert_dup(struct test_obj_rhl *rhl_test_objects,
+ int cnt)
+{
+   struct rhltable rhlt;
+   unsigned int i, ret;
+   int err;
+
+   err = rhltable_init(, _rht_params_dup);
+   if (WARN_ON(err))
+   return err;
+
+   for (i = 0; i < cnt; i++) {
+   rhl_test_objects[i].value.tid = i;
+   err = rhltable_insert(, _test_objects[i].list_node,
+ test_rht_params_dup);
+   if (WARN(err, "error %d on element %d\n", err, i))
+   goto skip_print;
+   }
+
+   ret = print_ht();
+   WARN(ret != cnt, "missing rhltable elements (%d != %d)\n", ret, cnt);
+
+skip_print:
+   rhltable_destroy();
+
+   return 0;
+}
+
+static int __init test_insert_duplicates_run(void)
+{
+   struct test_obj_rhl rhl_test_objects[3] = {};
+
+   pr_info("test inserting duplicates\n");
+
+   /* two different values that map to same bucket */
+   rhl_test_objects[0].value.id = 1;
+   rhl_test_objects[1].value.id = 21;
+
+   /* and another duplicate with same as [0] value
+* which will be second on the bucket li

[PATCH net 1/2] rhashtable: Fix rhltable duplicates insertion

2018-03-04 Thread Paul Blakey
When inserting duplicate objects (those with the same key),
current rhashtable implementation messes up the chain pointers by
updating the bucket pointer instead of prev next pointer to the
newly inserted node. This causes missing elements on removal and
travesal.

Fix that by properly updating pprev pointer to point to
the correct rhash_head next pointer.

Fixes: ca26893f05e8 ('rhashtable: Add rhlist interface')
Signed-off-by: Paul Blakey <pa...@mellanox.com>
---
 include/linux/rhashtable.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index c9df252..668a21f 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -766,8 +766,10 @@ static inline void *__rhashtable_insert_fast(
if (!key ||
(params.obj_cmpfn ?
 params.obj_cmpfn(, rht_obj(ht, head)) :
-rhashtable_compare(, rht_obj(ht, head
+rhashtable_compare(, rht_obj(ht, head {
+   pprev = >next;
continue;
+   }
 
data = rht_obj(ht, head);
 
-- 
1.8.4.3



Re: [PATCH iproute2 v3 1/1] actions: Add support for user cookies

2017-03-22 Thread Paul Blakey

Hi Jamal,
I've tested the patch some and I've put some comments below:

On 22/03/2017 00:01, Jamal Hadi Salim wrote:

From: Jamal Hadi Salim 

Make use of 128b user cookies

Introduce optional 128-bit action cookie.
Like all other cookie schemes in the networking world (eg in protocols
like http or existing kernel fib protocol field, etc) the idea is to
save user state that when retrieved serves as a correlator. The kernel
_should not_ intepret it. The user can store whatever they wish in the
128 bits.

Sample exercise(showing variable length use of cookie)

.. create an accept action with cookie a1b2c3d4
sudo $TC actions add action ok index 1 cookie a1b2c3d4

.. dump all gact actions..
sudo $TC -s actions ls action gact

action order 0: gact action pass
 random type none pass val 0
 index 1 ref 1 bind 0 installed 5 sec used 5 sec
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
cookie a1b2c3d4

.. bind the accept action to a filter..
sudo $TC filter add dev lo parent : protocol ip prio 1 \
u32 match ip dst 127.0.0.1/32 flowid 1:1 action gact index 1

... send some traffic..
$ ping 127.0.0.1 -c 3
PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.020 ms
64 bytes from 127.0.0.1: icmp_seq=2 ttl=64 time=0.027 ms
64 bytes from 127.0.0.1: icmp_seq=3 ttl=64 time=0.038 ms

Signed-off-by: Jamal Hadi Salim 
---
 tc/m_action.c | 44 ++--
 1 file changed, 38 insertions(+), 6 deletions(-)

diff --git a/tc/m_action.c b/tc/m_action.c
index 05ef07e..6ba7615 100644
--- a/tc/m_action.c
+++ b/tc/m_action.c
@@ -150,18 +150,19 @@ new_cmd(char **argv)

 }

-int
-parse_action(int *argc_p, char ***argv_p, int tca_id, struct nlmsghdr *n)
+int parse_action(int *argc_p, char ***argv_p, int tca_id, struct nlmsghdr *n)
 {
int argc = *argc_p;
char **argv = *argv_p;
struct rtattr *tail, *tail2;
char k[16];
+   int act_ck_len = 0;
int ok = 0;
int eap = 0; /* expect action parameters */

int ret = 0;
int prio = 0;
+   unsigned char act_ck[TC_COOKIE_MAX_SIZE];

if (argc <= 0)
return -1;
@@ -215,16 +216,39 @@ done0:
addattr_l(n, MAX_MSG, ++prio, NULL, 0);
addattr_l(n, MAX_MSG, TCA_ACT_KIND, k, strlen(k) + 1);

-   ret = a->parse_aopt(a, , , TCA_ACT_OPTIONS, 
n);
+   ret = a->parse_aopt(a, , , TCA_ACT_OPTIONS,
+   n);

if (ret < 0) {
fprintf(stderr, "bad action parsing\n");
goto bad_val;
}
+
+   if (*argv && strcmp(*argv, "cookie") == 0) {
+   size_t slen;
+
+   NEXT_ARG();
+   slen = strlen(*argv);
+   if (slen > TC_COOKIE_MAX_SIZE * 2)
+   invarg("cookie cannot exceed %d\n",
+  *argv);
+


invarg doesn't take a format string, but two strings, name of arg that 
is wrong and why it is wrong.

Also you probably  meant to print TC_COOKIE_MAX_SIZE*2


+   if (hex2mem(*argv, act_ck, slen / 2) < 0)
+   invarg("cookie must be a hex string\n",
+  *argv);


same, *argv, should probably be "cookie" and msg should be "must be a 
hex string"


Also cookie of length 1 doesn't work here.


+
+   act_ck_len = slen;
+   argc--;
+   argv++;
+   }
+
+   if (act_ck_len)
+   addattr_l(n, MAX_MSG, TCA_ACT_COOKIE,
+ _ck, act_ck_len);
+
tail->rta_len = (void *) NLMSG_TAIL(n) - (void *) tail;
ok++;
}
-
}

if (eap > 0) {
@@ -245,8 +269,7 @@ bad_val:
return -1;
 }

-static int
-tc_print_one_action(FILE *f, struct rtattr *arg)
+static int tc_print_one_action(FILE *f, struct rtattr *arg)
 {

struct rtattr *tb[TCA_ACT_MAX + 1];
@@ -274,8 +297,17 @@ tc_print_one_action(FILE *f, struct rtattr *arg)
return err;

if (show_stats && tb[TCA_ACT_STATS]) {
+
fprintf(f, "\tAction statistics:\n");
print_tcstats2_attr(f, tb[TCA_ACT_STATS], "\t", NULL);
+   if (tb[TCA_ACT_COOKIE]) {
+   int strsz = RTA_PAYLOAD(tb[TCA_ACT_COOKIE]);
+   char b1[strsz+1];
+
+   fprintf(f, "\n\tcookie len %d %s ", strsz,
+   

Re: [PATCH net-next V4] tc: flower: Refactor matching flags to be more user friendly

2017-01-19 Thread Paul Blakey



On 19/01/2017 16:22, Jiri Benc wrote:

On Thu, 19 Jan 2017 16:17:48 +0200, Paul Blakey wrote:

+   while (token) {
+   if (!strncmp(token, "no", 2)) {
+   no = true;
+   token = strchr(token, '_') + 1;


This seems to still assume that "no" is followed by an underscore.
What about a simple token += 2?

 Jiri



Thanks again, v5 is sent.


[PATCH iproute2 net-next V5] tc: flower: Refactor matching flags to be more user friendly

2017-01-19 Thread Paul Blakey
Instead of "magic numbers" we can now specify each flag
by name. Prefix of "no"  (e.g nofrag) unsets the flag,
otherwise it wil be set.

Example:
# add a flower filter that will drop fragmented packets
tc filter add dev ens4f0 protocol ip parent : \
flower \
src_mac e4:1d:2d:fd:8b:01 \
dst_mac e4:1d:2d:fd:8b:02 \
indev ens4f0 \
ip_flags frag \
action drop

# add a flower filter that will drop non-fragmented packets
tc filter add dev ens4f0 protocol ip parent : \
flower \
src_mac e4:1d:2d:fd:8b:01 \
dst_mac e4:1d:2d:fd:8b:02 \
indev ens4f0 \
ip_flags nofrag \
action drop

Fixes: 22a8f019891c ('tc: flower: support matching flags')
Signed-off-by: Paul Blakey <pa...@mellanox.com>
Reviewed-by: Roi Dayan <r...@mellanox.com>
---

Hi,
Added a framework to add new flags more easily, such 
as the upcoming tcp_flags (see kernel cls_flower), and other ip_flags.

Thanks,
 Paul.


Changelog:

v5:
Fixed wrong use of strtok to skip old prefix.

v4:
Changed prefix in manpage as well.

v3:
Changed prefix to "no" instead of "no_".

v2:
Changed delimiter to "/" to avoid shell pipe errors.


 man/man8/tc-flower.8 |  12 +-
 tc/f_flower.c| 117 ---
 2 files changed, 102 insertions(+), 27 deletions(-)

diff --git a/man/man8/tc-flower.8 b/man/man8/tc-flower.8
index 2dd2c5e..2bdd2ef 100644
--- a/man/man8/tc-flower.8
+++ b/man/man8/tc-flower.8
@@ -46,7 +46,9 @@ flower \- flow based traffic control filter
 .BR enc_dst_ip " | " enc_src_ip " } { "
 .IR ipv4_address " | " ipv6_address " } | "
 .B enc_dst_port
-.IR port_number
+.IR port_number " | "
+.BR ip_flags
+.IR IP_FLAGS
 .SH DESCRIPTION
 The
 .B flower
@@ -183,13 +185,19 @@ prefix length. If the prefix is missing, \fBtc\fR assumes 
a full-length
 host match.  Dst port
 .I NUMBER
 is a 16 bit UDP dst port.
+.TP
+.BI ip_flags " IP_FLAGS"
+.I IP_FLAGS
+may be either
+.BR frag " or " nofrag
+to match on fragmented packets or not respectively.
 .SH NOTES
 As stated above where applicable, matches of a certain layer implicitly depend
 on the matches of the next lower layer. Precisely, layer one and two matches
 (\fBindev\fR,  \fBdst_mac\fR and \fBsrc_mac\fR)
 have no dependency, layer three matches
 (\fBip_proto\fR, \fBdst_ip\fR, \fBsrc_ip\fR, \fBarp_tip\fR, \fBarp_sip\fR,
-\fBarp_op\fR, \fBarp_tha\fR and \fBarp_sha\fR)
+\fBarp_op\fR, \fBarp_tha\fR, \fBarp_sha\fR and \fBip_flags\fR)
 depend on the
 .B protocol
 option of tc filter, layer four port matches
diff --git a/tc/f_flower.c b/tc/f_flower.c
index d301db3..b42e529 100644
--- a/tc/f_flower.c
+++ b/tc/f_flower.c
@@ -24,6 +24,10 @@
 #include "tc_util.h"
 #include "rt_names.h"
 
+enum flower_matching_flags {
+   FLOWER_IP_FLAGS,
+};
+
 enum flower_endpoint {
FLOWER_ENDPOINT_SRC,
FLOWER_ENDPOINT_DST
@@ -63,7 +67,7 @@ static void explain(void)
"   enc_dst_ip [ IPV4-ADDR | IPV6-ADDR ] 
|\n"
"   enc_src_ip [ IPV4-ADDR | IPV6-ADDR ] 
|\n"
"   enc_key_id [ KEY-ID ] |\n"
-   "   matching_flags MATCHING-FLAGS | \n"
+   "   ip_flags IP-FLAGS | \n"
"   enc_dst_port [ port_number ] }\n"
"   FILTERID := X:Y:Z\n"
"   MASKED_LLADDR := { LLADDR | LLADDR/MASK | LLADDR/BITS 
}\n"
@@ -136,28 +140,56 @@ static int flower_parse_vlan_eth_type(char *str, __be16 
eth_type, int type,
return 0;
 }
 
-static int flower_parse_matching_flags(char *str, int type, int mask_type,
-  struct nlmsghdr *n)
-{
-   __u32 mtf, mtf_mask;
-   char *c;
+struct flag_to_string {
+   int flag;
+   enum flower_matching_flags type;
+   char *string;
+};
 
-   c = strchr(str, '/');
-   if (c)
-   *c = '\0';
+static struct flag_to_string flags_str[] = {
+   { TCA_FLOWER_KEY_FLAGS_IS_FRAGMENT, FLOWER_IP_FLAGS, "frag" },
+};
 
-   if (get_u32(, str, 0))
-   return -1;
+static int flower_parse_matching_flags(char *str,
+  enum flower_matching_flags type,
+  __u32 *mtf, __u32 *mtf_mask)
+{
+   char *token;
+   bool no;
+   bool found;
+   int i;
 
-   if (c) {
-   if (get_u32(_mask, ++c, 0))
+   token = strtok(str, "/");
+
+   while (token) {
+   if (!strncmp(token, "no", 2)) {
+   

[PATCH net-next V4] tc: flower: Refactor matching flags to be more user friendly

2017-01-19 Thread Paul Blakey
Instead of "magic numbers" we can now specify each flag
by name. Prefix of "no"  (e.g nofrag) unsets the flag,
otherwise it wil be set.

Example:
# add a flower filter that will drop fragmented packets
tc filter add dev ens4f0 protocol ip parent : \
flower \
src_mac e4:1d:2d:fd:8b:01 \
dst_mac e4:1d:2d:fd:8b:02 \
indev ens4f0 \
ip_flags frag \
action drop

# add a flower filter that will drop non-fragmented packets
tc filter add dev ens4f0 protocol ip parent : \
flower \
src_mac e4:1d:2d:fd:8b:01 \
dst_mac e4:1d:2d:fd:8b:02 \
indev ens4f0 \
ip_flags nofrag \
action drop

Fixes: 22a8f019891c ('tc: flower: support matching flags')
Signed-off-by: Paul Blakey <pa...@mellanox.com>
Reviewed-by: Roi Dayan <r...@mellanox.com>
---

Hi,
Added a framework to add new flags more easily, such 
as the upcoming tcp_flags (see kernel cls_flower), and other ip_flags.

Thanks,
 Paul.


Changelog:

v4:
Changed prefix in manpage as well.

v3:
Changed prefix to "no" instead of "no_".

v2:
Changed delimiter to "/" to avoid shell pipe errors.


 man/man8/tc-flower.8 |  12 +-
 tc/f_flower.c| 117 ---
 2 files changed, 102 insertions(+), 27 deletions(-)

diff --git a/man/man8/tc-flower.8 b/man/man8/tc-flower.8
index 2dd2c5e..2bdd2ef 100644
--- a/man/man8/tc-flower.8
+++ b/man/man8/tc-flower.8
@@ -46,7 +46,9 @@ flower \- flow based traffic control filter
 .BR enc_dst_ip " | " enc_src_ip " } { "
 .IR ipv4_address " | " ipv6_address " } | "
 .B enc_dst_port
-.IR port_number
+.IR port_number " | "
+.BR ip_flags
+.IR IP_FLAGS
 .SH DESCRIPTION
 The
 .B flower
@@ -183,13 +185,19 @@ prefix length. If the prefix is missing, \fBtc\fR assumes 
a full-length
 host match.  Dst port
 .I NUMBER
 is a 16 bit UDP dst port.
+.TP
+.BI ip_flags " IP_FLAGS"
+.I IP_FLAGS
+may be either
+.BR frag " or " nofrag
+to match on fragmented packets or not respectively.
 .SH NOTES
 As stated above where applicable, matches of a certain layer implicitly depend
 on the matches of the next lower layer. Precisely, layer one and two matches
 (\fBindev\fR,  \fBdst_mac\fR and \fBsrc_mac\fR)
 have no dependency, layer three matches
 (\fBip_proto\fR, \fBdst_ip\fR, \fBsrc_ip\fR, \fBarp_tip\fR, \fBarp_sip\fR,
-\fBarp_op\fR, \fBarp_tha\fR and \fBarp_sha\fR)
+\fBarp_op\fR, \fBarp_tha\fR, \fBarp_sha\fR and \fBip_flags\fR)
 depend on the
 .B protocol
 option of tc filter, layer four port matches
diff --git a/tc/f_flower.c b/tc/f_flower.c
index d301db3..3dffe2b 100644
--- a/tc/f_flower.c
+++ b/tc/f_flower.c
@@ -24,6 +24,10 @@
 #include "tc_util.h"
 #include "rt_names.h"
 
+enum flower_matching_flags {
+   FLOWER_IP_FLAGS,
+};
+
 enum flower_endpoint {
FLOWER_ENDPOINT_SRC,
FLOWER_ENDPOINT_DST
@@ -63,7 +67,7 @@ static void explain(void)
"   enc_dst_ip [ IPV4-ADDR | IPV6-ADDR ] 
|\n"
"   enc_src_ip [ IPV4-ADDR | IPV6-ADDR ] 
|\n"
"   enc_key_id [ KEY-ID ] |\n"
-   "   matching_flags MATCHING-FLAGS | \n"
+   "   ip_flags IP-FLAGS | \n"
"   enc_dst_port [ port_number ] }\n"
"   FILTERID := X:Y:Z\n"
"   MASKED_LLADDR := { LLADDR | LLADDR/MASK | LLADDR/BITS 
}\n"
@@ -136,28 +140,56 @@ static int flower_parse_vlan_eth_type(char *str, __be16 
eth_type, int type,
return 0;
 }
 
-static int flower_parse_matching_flags(char *str, int type, int mask_type,
-  struct nlmsghdr *n)
-{
-   __u32 mtf, mtf_mask;
-   char *c;
+struct flag_to_string {
+   int flag;
+   enum flower_matching_flags type;
+   char *string;
+};
 
-   c = strchr(str, '/');
-   if (c)
-   *c = '\0';
+static struct flag_to_string flags_str[] = {
+   { TCA_FLOWER_KEY_FLAGS_IS_FRAGMENT, FLOWER_IP_FLAGS, "frag" },
+};
 
-   if (get_u32(, str, 0))
-   return -1;
+static int flower_parse_matching_flags(char *str,
+  enum flower_matching_flags type,
+  __u32 *mtf, __u32 *mtf_mask)
+{
+   char *token;
+   bool no;
+   bool found;
+   int i;
 
-   if (c) {
-   if (get_u32(_mask, ++c, 0))
+   token = strtok(str, "/");
+
+   while (token) {
+   if (!strncmp(token, "no", 2)) {
+   no = true;
+   token = strchr(toke

[PATCH iproute2 net-next V3] tc: flower: Refactor matching flags to be more user friendly

2017-01-19 Thread Paul Blakey
Instead of "magic numbers" we can now specify each flag
by name. Prefix of "no"  (e.g nofrag) unsets the flag,
otherwise it wil be set.

Example:
# add a flower filter that will drop fragmented packets
tc filter add dev ens4f0 protocol ip parent : \
flower \
src_mac e4:1d:2d:fd:8b:01 \
dst_mac e4:1d:2d:fd:8b:02 \
indev ens4f0 \
ip_flags frag \
action drop

# add a flower filter that will drop non-fragmented packets
tc filter add dev ens4f0 protocol ip parent : \
flower \
src_mac e4:1d:2d:fd:8b:01 \
dst_mac e4:1d:2d:fd:8b:02 \
indev ens4f0 \
ip_flags nofrag \
action drop

Fixes: 22a8f019891c ('tc: flower: support matching flags')
Signed-off-by: Paul Blakey <pa...@mellanox.com>
Reviewed-by: Roi Dayan <r...@mellanox.com>
---

Hi,
Added a framework to add new flags more easily, such 
as the upcoming tcp_flags (see kernel cls_flower), and other ip_flags.

Thanks,
 Paul.


Changelog:

v3:
Changed prefix to "no" instead of "no_".

v2:
Changed delimiter to "/" to avoid shell pipe errors.


 man/man8/tc-flower.8 |  12 +-
 tc/f_flower.c| 117 ---
 2 files changed, 102 insertions(+), 27 deletions(-)

diff --git a/man/man8/tc-flower.8 b/man/man8/tc-flower.8
index 2dd2c5e..77da10b 100644
--- a/man/man8/tc-flower.8
+++ b/man/man8/tc-flower.8
@@ -46,7 +46,9 @@ flower \- flow based traffic control filter
 .BR enc_dst_ip " | " enc_src_ip " } { "
 .IR ipv4_address " | " ipv6_address " } | "
 .B enc_dst_port
-.IR port_number
+.IR port_number " | "
+.BR ip_flags
+.IR IP_FLAGS
 .SH DESCRIPTION
 The
 .B flower
@@ -183,13 +185,19 @@ prefix length. If the prefix is missing, \fBtc\fR assumes 
a full-length
 host match.  Dst port
 .I NUMBER
 is a 16 bit UDP dst port.
+.TP
+.BI ip_flags " IP_FLAGS"
+.I IP_FLAGS
+may be either
+.BR frag " or " no_frag
+to match on fragmented packets or not respectively.
 .SH NOTES
 As stated above where applicable, matches of a certain layer implicitly depend
 on the matches of the next lower layer. Precisely, layer one and two matches
 (\fBindev\fR,  \fBdst_mac\fR and \fBsrc_mac\fR)
 have no dependency, layer three matches
 (\fBip_proto\fR, \fBdst_ip\fR, \fBsrc_ip\fR, \fBarp_tip\fR, \fBarp_sip\fR,
-\fBarp_op\fR, \fBarp_tha\fR and \fBarp_sha\fR)
+\fBarp_op\fR, \fBarp_tha\fR, \fBarp_sha\fR and \fBip_flags\fR)
 depend on the
 .B protocol
 option of tc filter, layer four port matches
diff --git a/tc/f_flower.c b/tc/f_flower.c
index d301db3..3dffe2b 100644
--- a/tc/f_flower.c
+++ b/tc/f_flower.c
@@ -24,6 +24,10 @@
 #include "tc_util.h"
 #include "rt_names.h"
 
+enum flower_matching_flags {
+   FLOWER_IP_FLAGS,
+};
+
 enum flower_endpoint {
FLOWER_ENDPOINT_SRC,
FLOWER_ENDPOINT_DST
@@ -63,7 +67,7 @@ static void explain(void)
"   enc_dst_ip [ IPV4-ADDR | IPV6-ADDR ] 
|\n"
"   enc_src_ip [ IPV4-ADDR | IPV6-ADDR ] 
|\n"
"   enc_key_id [ KEY-ID ] |\n"
-   "   matching_flags MATCHING-FLAGS | \n"
+   "   ip_flags IP-FLAGS | \n"
"   enc_dst_port [ port_number ] }\n"
"   FILTERID := X:Y:Z\n"
"   MASKED_LLADDR := { LLADDR | LLADDR/MASK | LLADDR/BITS 
}\n"
@@ -136,28 +140,56 @@ static int flower_parse_vlan_eth_type(char *str, __be16 
eth_type, int type,
return 0;
 }
 
-static int flower_parse_matching_flags(char *str, int type, int mask_type,
-  struct nlmsghdr *n)
-{
-   __u32 mtf, mtf_mask;
-   char *c;
+struct flag_to_string {
+   int flag;
+   enum flower_matching_flags type;
+   char *string;
+};
 
-   c = strchr(str, '/');
-   if (c)
-   *c = '\0';
+static struct flag_to_string flags_str[] = {
+   { TCA_FLOWER_KEY_FLAGS_IS_FRAGMENT, FLOWER_IP_FLAGS, "frag" },
+};
 
-   if (get_u32(, str, 0))
-   return -1;
+static int flower_parse_matching_flags(char *str,
+  enum flower_matching_flags type,
+  __u32 *mtf, __u32 *mtf_mask)
+{
+   char *token;
+   bool no;
+   bool found;
+   int i;
 
-   if (c) {
-   if (get_u32(_mask, ++c, 0))
+   token = strtok(str, "/");
+
+   while (token) {
+   if (!strncmp(token, "no", 2)) {
+   no = true;
+   token = strchr(token, '_') + 1;
+   } else
+ 

Re: [PATCH iproute2 net-next V3] tc: flower: Refactor matching flags to be more user friendly

2017-01-19 Thread Paul Blakey



On 19/01/2017 16:14, Paul Blakey wrote:

Instead of "magic numbers" we can now specify each flag
by name. Prefix of "no"  (e.g nofrag) unsets the flag,
otherwise it wil be set.

Example:
# add a flower filter that will drop fragmented packets
tc filter add dev ens4f0 protocol ip parent : \
flower \
src_mac e4:1d:2d:fd:8b:01 \
dst_mac e4:1d:2d:fd:8b:02 \
indev ens4f0 \
ip_flags frag \
action drop

# add a flower filter that will drop non-fragmented packets
tc filter add dev ens4f0 protocol ip parent : \
flower \
src_mac e4:1d:2d:fd:8b:01 \
dst_mac e4:1d:2d:fd:8b:02 \
indev ens4f0 \
ip_flags nofrag \
action drop

Fixes: 22a8f019891c ('tc: flower: support matching flags')
Signed-off-by: Paul Blakey <pa...@mellanox.com>
Reviewed-by: Roi Dayan <r...@mellanox.com>
---

Hi,
Added a framework to add new flags more easily, such
as the upcoming tcp_flags (see kernel cls_flower), and other ip_flags.

Thanks,
 Paul.


Changelog:

v3:
Changed prefix to "no" instead of "no_".

v2:
Changed delimiter to "/" to avoid shell pipe errors.


 man/man8/tc-flower.8 |  12 +-
 tc/f_flower.c| 117 ---
 2 files changed, 102 insertions(+), 27 deletions(-)

diff --git a/man/man8/tc-flower.8 b/man/man8/tc-flower.8
index 2dd2c5e..77da10b 100644
--- a/man/man8/tc-flower.8
+++ b/man/man8/tc-flower.8
@@ -46,7 +46,9 @@ flower \- flow based traffic control filter
 .BR enc_dst_ip " | " enc_src_ip " } { "
 .IR ipv4_address " | " ipv6_address " } | "
 .B enc_dst_port
-.IR port_number
+.IR port_number " | "
+.BR ip_flags
+.IR IP_FLAGS
 .SH DESCRIPTION
 The
 .B flower
@@ -183,13 +185,19 @@ prefix length. If the prefix is missing, \fBtc\fR assumes 
a full-length
 host match.  Dst port
 .I NUMBER
 is a 16 bit UDP dst port.
+.TP
+.BI ip_flags " IP_FLAGS"
+.I IP_FLAGS
+may be either
+.BR frag " or " no_frag
+to match on fragmented packets or not respectively.
 .SH NOTES
 As stated above where applicable, matches of a certain layer implicitly depend
 on the matches of the next lower layer. Precisely, layer one and two matches
 (\fBindev\fR,  \fBdst_mac\fR and \fBsrc_mac\fR)
 have no dependency, layer three matches
 (\fBip_proto\fR, \fBdst_ip\fR, \fBsrc_ip\fR, \fBarp_tip\fR, \fBarp_sip\fR,
-\fBarp_op\fR, \fBarp_tha\fR and \fBarp_sha\fR)
+\fBarp_op\fR, \fBarp_tha\fR, \fBarp_sha\fR and \fBip_flags\fR)
 depend on the
 .B protocol
 option of tc filter, layer four port matches
diff --git a/tc/f_flower.c b/tc/f_flower.c
index d301db3..3dffe2b 100644
--- a/tc/f_flower.c
+++ b/tc/f_flower.c
@@ -24,6 +24,10 @@
 #include "tc_util.h"
 #include "rt_names.h"

+enum flower_matching_flags {
+   FLOWER_IP_FLAGS,
+};
+
 enum flower_endpoint {
FLOWER_ENDPOINT_SRC,
FLOWER_ENDPOINT_DST
@@ -63,7 +67,7 @@ static void explain(void)
"   enc_dst_ip [ IPV4-ADDR | IPV6-ADDR ] 
|\n"
"   enc_src_ip [ IPV4-ADDR | IPV6-ADDR ] 
|\n"
"   enc_key_id [ KEY-ID ] |\n"
-   "   matching_flags MATCHING-FLAGS | \n"
+   "   ip_flags IP-FLAGS | \n"
"   enc_dst_port [ port_number ] }\n"
"   FILTERID := X:Y:Z\n"
"   MASKED_LLADDR := { LLADDR | LLADDR/MASK | LLADDR/BITS 
}\n"
@@ -136,28 +140,56 @@ static int flower_parse_vlan_eth_type(char *str, __be16 
eth_type, int type,
return 0;
 }

-static int flower_parse_matching_flags(char *str, int type, int mask_type,
-  struct nlmsghdr *n)
-{
-   __u32 mtf, mtf_mask;
-   char *c;
+struct flag_to_string {
+   int flag;
+   enum flower_matching_flags type;
+   char *string;
+};

-   c = strchr(str, '/');
-   if (c)
-   *c = '\0';
+static struct flag_to_string flags_str[] = {
+   { TCA_FLOWER_KEY_FLAGS_IS_FRAGMENT, FLOWER_IP_FLAGS, "frag" },
+};

-   if (get_u32(, str, 0))
-   return -1;
+static int flower_parse_matching_flags(char *str,
+  enum flower_matching_flags type,
+  __u32 *mtf, __u32 *mtf_mask)
+{
+   char *token;
+   bool no;
+   bool found;
+   int i;

-   if (c) {
-   if (get_u32(_mask, ++c, 0))
+   token = strtok(str, "/");
+
+   while (token) {
+   if (!strncmp(token, "no", 2)) {
+   no = true;
+   token = strchr(toke

Re: [PATCH iproute2 net-next V2] tc: flower: Refactor matching flags to be more user friendly

2017-01-19 Thread Paul Blakey



On 19/01/2017 16:04, Jiri Benc wrote:

On Thu, 19 Jan 2017 14:17:57 +0200, Paul Blakey wrote:

Instead of "magic numbers" we can now specify each flag
by name. Prefix of "no_"  (e.g no_frag) unsets the flag,
otherwise it wil be set.


Other parts of the tc tool use the "no" prefix (without the
underscore), such as "nofrag" in u32 and pedit, "noecn" in fq_codel and
pie, etc. We should follow the suit here.

Looks good to me otherwise, thanks a lot for doing the work!

 Jiri



Thanks, v3 is up.


Re: [PATCH iproute2 net-next] tc: flower: Refactor matching flags to be more user friendly

2017-01-19 Thread Paul Blakey



On 19/01/2017 13:32, Paul Blakey wrote:

Instead of "magic numbers" we can now specify each flag
by name. Prefix of "no_"  (e.g no_frag) unsets the flag,
otherwise it wil be set.

Example:
# add a flower filter that will drop fragmented packets
tc filter add dev ens4f0 protocol ip parent : \
flower \
src_mac e4:1d:2d:fd:8b:01 \
dst_mac e4:1d:2d:fd:8b:02 \
indev ens4f0 \
ip_flags frag \
action drop

# add a flower filter that will drop non-fragmented packets
tc filter add dev ens4f0 protocol ip parent : \
flower \
src_mac e4:1d:2d:fd:8b:01 \
dst_mac e4:1d:2d:fd:8b:02 \
indev ens4f0 \
ip_flags no_frag \
action drop

Fixes: 22a8f019891c ('tc: flower: support matching flags')
Signed-off-by: Paul Blakey <pa...@mellanox.com>
Reviewed-by: Roi Dayan <r...@mellanox.com>
---

Hi,
Added a framework to add new flags more easily, such
as the upcoming tcp_flags (see kernel cls_flower), and other ip_flags.

Thanks,
 Paul.


 man/man8/tc-flower.8 |  12 +-
 tc/f_flower.c| 117 ---
 2 files changed, 102 insertions(+), 27 deletions(-)

diff --git a/man/man8/tc-flower.8 b/man/man8/tc-flower.8
index 2dd2c5e..77da10b 100644
--- a/man/man8/tc-flower.8
+++ b/man/man8/tc-flower.8
@ -46,7 +46,9 @@ flower \- flow based traffic control filter
 .BR enc_dst_ip " | " enc_src_ip " } { "
 .IR ipv4_address " | " ipv6_address " } | "
 .B enc_dst_port
-.IR port_number
+.IR port_number " | "
+.BR ip_flags
+.IR IP_FLAGS
 .SH DESCRIPTION
 The
 .B flower
@@ -183,13 +185,19 @@ prefix length. If the prefix is missing, \fBtc\fR assumes 
a full-length
 host match.  Dst port
 .I NUMBER
 is a 16 bit UDP dst port.
+.TP
+.BI ip_flags " IP_FLAGS"
+.I IP_FLAGS
+may be either
+.BR frag " or " no_frag
+to match on fragmented packets or not respectively.
 .SH NOTES
 As stated above where applicable, matches of a certain layer implicitly depend
 on the matches of the next lower layer. Precisely, layer one and two matches
 (\fBindev\fR,  \fBdst_mac\fR and \fBsrc_mac\fR)
 have no dependency, layer three matches
 (\fBip_proto\fR, \fBdst_ip\fR, \fBsrc_ip\fR, \fBarp_tip\fR, \fBarp_sip\fR,
-\fBarp_op\fR, \fBarp_tha\fR and \fBarp_sha\fR)
+\fBarp_op\fR, \fBarp_tha\fR, \fBarp_sha\fR and \fBip_flags\fR)
 depend on the
 .B protocol
 option of tc filter, layer four port matches
diff --git a/tc/f_flower.c b/tc/f_flower.c
index d301db3..3bafc75 100644
--- a/tc/f_flower.c
+++ b/tc/f_flower.c
@@ -24,6 +24,10 @@
 #include "tc_util.h"
 #include "rt_names.h"

+enum flower_matching_flags {
+   FLOWER_IP_FLAGS,
+};
+
 enum flower_endpoint {
FLOWER_ENDPOINT_SRC,
FLOWER_ENDPOINT_DST
@@ -63,7 +67,7 @@ static void explain(void)
"   enc_dst_ip [ IPV4-ADDR | IPV6-ADDR ] 
|\n"
"   enc_src_ip [ IPV4-ADDR | IPV6-ADDR ] 
|\n"
"   enc_key_id [ KEY-ID ] |\n"
-   "   matching_flags MATCHING-FLAGS | \n"
+   "   ip_flags IP-FLAGS | \n"
"   enc_dst_port [ port_number ] }\n"
"   FILTERID := X:Y:Z\n"
"   MASKED_LLADDR := { LLADDR | LLADDR/MASK | LLADDR/BITS 
}\n"
@@ -136,28 +140,56 @@ static int flower_parse_vlan_eth_type(char *str, __be16 
eth_type, int type,
return 0;
 }

-static int flower_parse_matching_flags(char *str, int type, int mask_type,
-  struct nlmsghdr *n)
-{
-   __u32 mtf, mtf_mask;
-   char *c;
+struct flag_to_string {
+   int flag;
+   enum flower_matching_flags type;
+   char *string;
+};

-   c = strchr(str, '/');
-   if (c)
-   *c = '\0';
+static struct flag_to_string flags_str[] = {
+   { TCA_FLOWER_KEY_FLAGS_IS_FRAGMENT, FLOWER_IP_FLAGS, "frag" },
+};

-   if (get_u32(, str, 0))
-   return -1;
+static int flower_parse_matching_flags(char *str,
+  enum flower_matching_flags type,
+  __u32 *mtf, __u32 *mtf_mask)
+{
+   char *token;
+   bool no;
+   bool found;
+   int i;

-   if (c) {
-   if (get_u32(_mask, ++c, 0))
+   token = strtok(str, "|");
+
+   while (token) {
+   if (!strncmp(token, "no_", 3)) {
+   no = true;
+   token = strchr(token, '_') + 1;
+   } else
+   no = false;
+
+   found 

[PATCH iproute2 net-next V2] tc: flower: Refactor matching flags to be more user friendly

2017-01-19 Thread Paul Blakey
Instead of "magic numbers" we can now specify each flag
by name. Prefix of "no_"  (e.g no_frag) unsets the flag,
otherwise it wil be set.

Example:
# add a flower filter that will drop fragmented packets
tc filter add dev ens4f0 protocol ip parent : \
flower \
src_mac e4:1d:2d:fd:8b:01 \
dst_mac e4:1d:2d:fd:8b:02 \
indev ens4f0 \
ip_flags frag \
action drop

# add a flower filter that will drop non-fragmented packets
tc filter add dev ens4f0 protocol ip parent : \
flower \
src_mac e4:1d:2d:fd:8b:01 \
dst_mac e4:1d:2d:fd:8b:02 \
indev ens4f0 \
ip_flags no_frag \
action drop

Fixes: 22a8f019891c ('tc: flower: support matching flags')
Signed-off-by: Paul Blakey <pa...@mellanox.com>
Reviewed-by: Roi Dayan <r...@mellanox.com>
---

Hi,
Added a framework to add new flags more easily, such 
as the upcoming tcp_flags (see kernel cls_flower), and other ip_flags.

Thanks,
 Paul.


 man/man8/tc-flower.8 |  12 +-
 tc/f_flower.c| 117 ---
 2 files changed, 102 insertions(+), 27 deletions(-)

diff --git a/man/man8/tc-flower.8 b/man/man8/tc-flower.8
index 2dd2c5e..77da10b 100644
--- a/man/man8/tc-flower.8
+++ b/man/man8/tc-flower.8
@@ -46,7 +46,9 @@ flower \- flow based traffic control filter
 .BR enc_dst_ip " | " enc_src_ip " } { "
 .IR ipv4_address " | " ipv6_address " } | "
 .B enc_dst_port
-.IR port_number
+.IR port_number " | "
+.BR ip_flags
+.IR IP_FLAGS
 .SH DESCRIPTION
 The
 .B flower
@@ -183,13 +185,19 @@ prefix length. If the prefix is missing, \fBtc\fR assumes 
a full-length
 host match.  Dst port
 .I NUMBER
 is a 16 bit UDP dst port.
+.TP
+.BI ip_flags " IP_FLAGS"
+.I IP_FLAGS
+may be either
+.BR frag " or " no_frag
+to match on fragmented packets or not respectively.
 .SH NOTES
 As stated above where applicable, matches of a certain layer implicitly depend
 on the matches of the next lower layer. Precisely, layer one and two matches
 (\fBindev\fR,  \fBdst_mac\fR and \fBsrc_mac\fR)
 have no dependency, layer three matches
 (\fBip_proto\fR, \fBdst_ip\fR, \fBsrc_ip\fR, \fBarp_tip\fR, \fBarp_sip\fR,
-\fBarp_op\fR, \fBarp_tha\fR and \fBarp_sha\fR)
+\fBarp_op\fR, \fBarp_tha\fR, \fBarp_sha\fR and \fBip_flags\fR)
 depend on the
 .B protocol
 option of tc filter, layer four port matches
diff --git a/tc/f_flower.c b/tc/f_flower.c
index d301db3..af2ab55 100644
--- a/tc/f_flower.c
+++ b/tc/f_flower.c
@@ -24,6 +24,10 @@
 #include "tc_util.h"
 #include "rt_names.h"
 
+enum flower_matching_flags {
+   FLOWER_IP_FLAGS,
+};
+
 enum flower_endpoint {
FLOWER_ENDPOINT_SRC,
FLOWER_ENDPOINT_DST
@@ -63,7 +67,7 @@ static void explain(void)
"   enc_dst_ip [ IPV4-ADDR | IPV6-ADDR ] 
|\n"
"   enc_src_ip [ IPV4-ADDR | IPV6-ADDR ] 
|\n"
"   enc_key_id [ KEY-ID ] |\n"
-   "   matching_flags MATCHING-FLAGS | \n"
+   "   ip_flags IP-FLAGS | \n"
"   enc_dst_port [ port_number ] }\n"
"   FILTERID := X:Y:Z\n"
"   MASKED_LLADDR := { LLADDR | LLADDR/MASK | LLADDR/BITS 
}\n"
@@ -136,28 +140,56 @@ static int flower_parse_vlan_eth_type(char *str, __be16 
eth_type, int type,
return 0;
 }
 
-static int flower_parse_matching_flags(char *str, int type, int mask_type,
-  struct nlmsghdr *n)
-{
-   __u32 mtf, mtf_mask;
-   char *c;
+struct flag_to_string {
+   int flag;
+   enum flower_matching_flags type;
+   char *string;
+};
 
-   c = strchr(str, '/');
-   if (c)
-   *c = '\0';
+static struct flag_to_string flags_str[] = {
+   { TCA_FLOWER_KEY_FLAGS_IS_FRAGMENT, FLOWER_IP_FLAGS, "frag" },
+};
 
-   if (get_u32(, str, 0))
-   return -1;
+static int flower_parse_matching_flags(char *str,
+  enum flower_matching_flags type,
+  __u32 *mtf, __u32 *mtf_mask)
+{
+   char *token;
+   bool no;
+   bool found;
+   int i;
 
-   if (c) {
-   if (get_u32(_mask, ++c, 0))
+   token = strtok(str, "/");
+
+   while (token) {
+   if (!strncmp(token, "no_", 3)) {
+   no = true;
+   token = strchr(token, '_') + 1;
+   } else
+   no = false;
+
+   found = false;
+   for (i = 0; i < ARRAY_SIZE(flags_str); i++) {
+   if (type != f

[PATCH iproute2 net-next] tc: flower: Refactor matching flags to be more user friendly

2017-01-19 Thread Paul Blakey
Instead of "magic numbers" we can now specify each flag
by name. Prefix of "no_"  (e.g no_frag) unsets the flag,
otherwise it wil be set.

Example:
# add a flower filter that will drop fragmented packets
tc filter add dev ens4f0 protocol ip parent : \
flower \
src_mac e4:1d:2d:fd:8b:01 \
dst_mac e4:1d:2d:fd:8b:02 \
indev ens4f0 \
ip_flags frag \
action drop

# add a flower filter that will drop non-fragmented packets
tc filter add dev ens4f0 protocol ip parent : \
flower \
src_mac e4:1d:2d:fd:8b:01 \
dst_mac e4:1d:2d:fd:8b:02 \
indev ens4f0 \
ip_flags no_frag \
action drop

Fixes: 22a8f019891c ('tc: flower: support matching flags')
Signed-off-by: Paul Blakey <pa...@mellanox.com>
Reviewed-by: Roi Dayan <r...@mellanox.com>
---

Hi,
Added a framework to add new flags more easily, such 
as the upcoming tcp_flags (see kernel cls_flower), and other ip_flags.

Thanks,
 Paul.


 man/man8/tc-flower.8 |  12 +-
 tc/f_flower.c| 117 ---
 2 files changed, 102 insertions(+), 27 deletions(-)

diff --git a/man/man8/tc-flower.8 b/man/man8/tc-flower.8
index 2dd2c5e..77da10b 100644
--- a/man/man8/tc-flower.8
+++ b/man/man8/tc-flower.8
@ -46,7 +46,9 @@ flower \- flow based traffic control filter
 .BR enc_dst_ip " | " enc_src_ip " } { "
 .IR ipv4_address " | " ipv6_address " } | "
 .B enc_dst_port
-.IR port_number
+.IR port_number " | "
+.BR ip_flags
+.IR IP_FLAGS
 .SH DESCRIPTION
 The
 .B flower
@@ -183,13 +185,19 @@ prefix length. If the prefix is missing, \fBtc\fR assumes 
a full-length
 host match.  Dst port
 .I NUMBER
 is a 16 bit UDP dst port.
+.TP
+.BI ip_flags " IP_FLAGS"
+.I IP_FLAGS
+may be either
+.BR frag " or " no_frag
+to match on fragmented packets or not respectively.
 .SH NOTES
 As stated above where applicable, matches of a certain layer implicitly depend
 on the matches of the next lower layer. Precisely, layer one and two matches
 (\fBindev\fR,  \fBdst_mac\fR and \fBsrc_mac\fR)
 have no dependency, layer three matches
 (\fBip_proto\fR, \fBdst_ip\fR, \fBsrc_ip\fR, \fBarp_tip\fR, \fBarp_sip\fR,
-\fBarp_op\fR, \fBarp_tha\fR and \fBarp_sha\fR)
+\fBarp_op\fR, \fBarp_tha\fR, \fBarp_sha\fR and \fBip_flags\fR)
 depend on the
 .B protocol
 option of tc filter, layer four port matches
diff --git a/tc/f_flower.c b/tc/f_flower.c
index d301db3..3bafc75 100644
--- a/tc/f_flower.c
+++ b/tc/f_flower.c
@@ -24,6 +24,10 @@
 #include "tc_util.h"
 #include "rt_names.h"
 
+enum flower_matching_flags {
+   FLOWER_IP_FLAGS,
+};
+
 enum flower_endpoint {
FLOWER_ENDPOINT_SRC,
FLOWER_ENDPOINT_DST
@@ -63,7 +67,7 @@ static void explain(void)
"   enc_dst_ip [ IPV4-ADDR | IPV6-ADDR ] 
|\n"
"   enc_src_ip [ IPV4-ADDR | IPV6-ADDR ] 
|\n"
"   enc_key_id [ KEY-ID ] |\n"
-   "   matching_flags MATCHING-FLAGS | \n"
+   "   ip_flags IP-FLAGS | \n"
"   enc_dst_port [ port_number ] }\n"
"   FILTERID := X:Y:Z\n"
"   MASKED_LLADDR := { LLADDR | LLADDR/MASK | LLADDR/BITS 
}\n"
@@ -136,28 +140,56 @@ static int flower_parse_vlan_eth_type(char *str, __be16 
eth_type, int type,
return 0;
 }
 
-static int flower_parse_matching_flags(char *str, int type, int mask_type,
-  struct nlmsghdr *n)
-{
-   __u32 mtf, mtf_mask;
-   char *c;
+struct flag_to_string {
+   int flag;
+   enum flower_matching_flags type;
+   char *string;
+};
 
-   c = strchr(str, '/');
-   if (c)
-   *c = '\0';
+static struct flag_to_string flags_str[] = {
+   { TCA_FLOWER_KEY_FLAGS_IS_FRAGMENT, FLOWER_IP_FLAGS, "frag" },
+};
 
-   if (get_u32(, str, 0))
-   return -1;
+static int flower_parse_matching_flags(char *str,
+  enum flower_matching_flags type,
+  __u32 *mtf, __u32 *mtf_mask)
+{
+   char *token;
+   bool no;
+   bool found;
+   int i;
 
-   if (c) {
-   if (get_u32(_mask, ++c, 0))
+   token = strtok(str, "|");
+
+   while (token) {
+   if (!strncmp(token, "no_", 3)) {
+   no = true;
+   token = strchr(token, '_') + 1;
+   } else
+   no = false;
+
+   found = false;
+   for (i = 0; i < ARRAY_SIZE(flags_str); i++) {
+   if (type != f

Re: [PATCH iproute2 net-next] tc: flower: support matching flags

2017-01-19 Thread Paul Blakey



On 18/01/2017 14:41, Jiri Benc wrote:

On Wed, 4 Jan 2017 12:55:59 +0100, Jiri Benc wrote:

On Wed, 4 Jan 2017 13:51:13 +0200, Paul Blakey wrote:

It mimics the kernel packing of flags, I have no problem either way
(flags, or ip_flags/tcp_flags pairs), what do you think jiri?


What Simon says makes sense to me. ip_flags and tcp_flags sounds like
the best solution so far (even better than my original suggestion).


Is there any progress with the follow up patch? I don't think we want
iproute2 with the magic numbers to be released.

Thanks,

 Jiri





Hi,
I've posted a patch:
"[PATCH iproute2 net-next] tc: flower: Refactor matching flags to be 
more user friendly"


Thanks,
Paul.


Re: [PATCH net-next v4 1/2] net sched actions: Add support for user cookies

2017-01-18 Thread Paul Blakey



On 17/01/2017 13:11, Jamal Hadi Salim wrote:

From: Jamal Hadi Salim 

Introduce optional 128-bit action cookie.
Like all other cookie schemes in the networking world (eg in protocols
like http or existing kernel fib protocol field, etc) the idea is to save
user state that when retrieved serves as a correlator. The kernel
_should not_ intepret it.  The user can store whatever they wish in the
128 bits.

Sample exercise(showing variable length use of cookie)

.. create an accept action with cookie a1b2c3d4
sudo $TC actions add action ok index 1 cookie a1b2c3d4

.. dump all gact actions..
sudo $TC -s actions ls action gact

action order 0: gact action pass
 random type none pass val 0
 index 1 ref 1 bind 0 installed 5 sec used 5 sec
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
cookie a1b2c3d4

.. bind the accept action to a filter..
sudo $TC filter add dev lo parent : protocol ip prio 1 \
u32 match ip dst 127.0.0.1/32 flowid 1:1 action gact index 1

... send some traffic..
$ ping 127.0.0.1 -c 3
PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.020 ms
64 bytes from 127.0.0.1: icmp_seq=2 ttl=64 time=0.027 ms
64 bytes from 127.0.0.1: icmp_seq=3 ttl=64 time=0.038 ms

--- 127.0.0.1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2109ms
rtt min/avg/max/mdev = 0.020/0.028/0.038/0.008 ms 1

... show some stats
$ sudo $TC -s actions get action gact index 1

action order 1: gact action pass
 random type none pass val 0
 index 1 ref 2 bind 1 installed 204 sec used 5 sec
Action statistics:
Sent 12168 bytes 164 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
cookie a1b2c3d4

.. try longer cookie...
$ sudo $TC actions replace action ok index 1 cookie 1234567890abcdef
.. dump..
$ sudo $TC -s actions ls action gact

action order 1: gact action pass
 random type none pass val 0
 index 1 ref 2 bind 1 installed 204 sec used 5 sec
Action statistics:
Sent 12168 bytes 164 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
cookie 1234567890abcdef

Signed-off-by: Jamal Hadi Salim 
---
 include/net/act_api.h|  1 +
 include/net/pkt_cls.h|  8 
 include/uapi/linux/pkt_cls.h |  3 +++
 net/sched/act_api.c  | 25 +
 4 files changed, 37 insertions(+)

diff --git a/include/net/act_api.h b/include/net/act_api.h
index 1d71644..0692458 100644
--- a/include/net/act_api.h
+++ b/include/net/act_api.h
@@ -41,6 +41,7 @@ struct tc_action {
struct rcu_head tcfa_rcu;
struct gnet_stats_basic_cpu __percpu *cpu_bstats;
struct gnet_stats_queue __percpu *cpu_qstats;
+   struct tc_cookie*act_ck;
 };
 #define tcf_head   common.tcfa_head
 #define tcf_index  common.tcfa_index
diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index f0a0514..e0bc7e8 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -515,4 +515,12 @@ struct tc_cls_bpf_offload {
u32 gen_flags;
 };

+
+/* This structure holds cookie structure that is passed from user
+ * to the kernel for actions and classifiers
+ */
+struct tc_cookie {
+   unsigned char ck[TC_COOKIE_MAX_SIZE];
+   unsigned char ck_len;
+};
 #endif
diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index 1e5e1dd..2d2414e 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -4,6 +4,8 @@
 #include 
 #include 

+#define TC_COOKIE_MAX_SIZE 16
+
 /* Action attributes */
 enum {
TCA_ACT_UNSPEC,
@@ -12,6 +14,7 @@ enum {
TCA_ACT_INDEX,
TCA_ACT_STATS,
TCA_ACT_PAD,
+   TCA_ACT_COOKIE,
__TCA_ACT_MAX
 };

diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index f04715a..43f1f42 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 

@@ -33,6 +34,7 @@ static void free_tcf(struct rcu_head *head)

free_percpu(p->cpu_bstats);
free_percpu(p->cpu_qstats);
+   kfree(p->act_ck);
kfree(p);
 }

@@ -475,6 +477,12 @@ int tcf_action_destroy(struct list_head *actions, int bind)
goto nla_put_failure;
if (tcf_action_copy_stats(skb, a, 0))
goto nla_put_failure;
+   if (a->act_ck) {
+   if (nla_put(skb, TCA_ACT_COOKIE, a->act_ck->ck_len,
+   a->act_ck))
+   goto nla_put_failure;
+   }
+
nest = nla_nest_start(skb, TCA_OPTIONS);
if (nest == NULL)
goto nla_put_failure;
@@ -575,6 +583,23 @@ struct tc_action *tcf_action_init_1(struct net *net, 
struct nlattr *nla,
if (err < 0)
   

Re: [PATCH net-next] net/sched: cls_flower: Add user specified data

2017-01-17 Thread Paul Blakey



On 17/01/2017 13:23, Jamal Hadi Salim wrote:

On 17-01-16 04:51 AM, Jiri Pirko wrote:

Mon, Jan 16, 2017 at 08:54:18AM CET, pa...@mellanox.com wrote:







I think we should do it in a generic way, for every classifier, right
away. Same as Jamal is doing for actions. I think that first we should
get Jamal's patch merged and then do the same for classifiers.



Should be "trivial" like my patch.
Add a new TLV type in TCA_XXX enum in rtnetlink.h
in  tc_ctl_tfilter look for it and memcpy it
in tcf_fill_node set it on the new TCA_XXX if the cls struct
has it present.



That's exactly what I did before I realized it won't work per internal 
element (per handle), which is what I want. see my example.


So I'll probably implement it like actions dumping/init works - 
tcf_exts_init, tcf_exts_dump (calling a generic functions on all 
classifiers who want this).
I'll add something like get_cookie, dump_cookie. which get/set the 
TCA_COOKIE.



Thanks,
Paul.


Look at my patch (1 of 2) which I just reposted.

cheers,
jamal


[PATCH net-next] net/sched: cls_flower: Disallow duplicate internal elements

2017-01-16 Thread Paul Blakey
Flower currently allows having the same filter twice with the same
priority. Actions (and statistics update) will always execute on the
first inserted rule leaving the second rule unused.
This patch disallows that.

Signed-off-by: Paul Blakey <pa...@mellanox.com>
Acked-by: Jiri Pirko <j...@mellanox.com>
---
 net/sched/cls_flower.c | 17 ++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index a3bfda3..2793445 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -134,6 +134,14 @@ static void fl_clear_masked_range(struct fl_flow_key *key,
memset(fl_key_get_start(key, mask), 0, fl_mask_range(mask));
 }
 
+static struct cls_fl_filter *fl_lookup(struct cls_fl_head *head,
+  struct fl_flow_key *mkey)
+{
+   return rhashtable_lookup_fast(>ht,
+ fl_key_get_start(mkey, >mask),
+ head->ht_params);
+}
+
 static int fl_classify(struct sk_buff *skb, const struct tcf_proto *tp,
   struct tcf_result *res)
 {
@@ -181,9 +189,7 @@ static int fl_classify(struct sk_buff *skb, const struct 
tcf_proto *tp,
 
fl_set_masked_key(_mkey, _key, >mask);
 
-   f = rhashtable_lookup_fast(>ht,
-  fl_key_get_start(_mkey, >mask),
-  head->ht_params);
+   f = fl_lookup(head, _mkey);
if (f && !tc_skip_sw(f->flags)) {
*res = f->res;
return tcf_exts_exec(skb, >exts, res);
@@ -875,6 +881,11 @@ static int fl_change(struct net *net, struct sk_buff 
*in_skb,
goto errout;
 
if (!tc_skip_sw(fnew->flags)) {
+   if (!fold && fl_lookup(head, >mkey)) {
+   err = -EEXIST;
+   goto errout;
+   }
+
err = rhashtable_insert_fast(>ht, >ht_node,
 head->ht_params);
if (err)
-- 
2.7.4



Re: [PATCH net-next] net/sched: cls_flower: Add user specified data

2017-01-15 Thread Paul Blakey



On 15/01/2017 21:08, John Fastabend wrote:

On 17-01-15 09:36 AM, Paul Blakey wrote:



On 08/01/2017 19:12, Jiri Pirko wrote:

Mon, Jan 02, 2017 at 03:59:49PM CET, j...@mojatatu.com wrote:


We have been using a cookie as well for actions (which we have been
using but have been too lazy to submit so far). I am going to port
it over to the newer kernels and post it.


Hard to deal with something we can't look at :)



In our case that is intended to be opaque to the kernel i.e kernel
never inteprets it; in that case it is similar to the kernel
FIB protocol field.


In case of this patch, kernel also never interprets it. What makes you
think otherwise. Bot kernel, it is always a binary blob.




In your case - could this cookie have been a class/flowid
(a 32 bit)?
And would it not make more sense for it the cookie to be
generic to all classifiers? i.e why is it specific to flower?


Correct, makes sense to have it generic for all cls and perhaps also
acts.




Hi all,
I've started implementing a general cookie for all classifiers,
I added the cookie on tcf_proto struct but then I realized that it won't work as
I want. What I want is cookie per internal element (those that are identified by
handles), which we can have many per one tcf_proto:

tc filter add dev  parent : prio 1 cookie 0x123 basic action drop
tc filter add dev  parent : prio 1 cookie 0x456 "port6" basic action 
drop
tc filter add dev  parent : prio 1 cookie 0x777 basic action drop

So there is three options to do that:
1) Implement it in each classifier, using some generic function. (kinda like
stats, where classifiler_dump() calls tcf_exts_dump_stats())
2) Have a global hash table for cookies [prio,handle]->[cookie]
3) Have it on flower only for now :)

With the first one we will have to change each classifier (or leave it
unsupported as default).
Second is less code to change (instead of changing each classifier), but might
be slower. I'm afraid how it will affect dumping several filters.
Third is this patch.

Thanks,
Paul.


Avoid (2) it creates way more problems than its worth like is it locked/not
locked, how is it synced, collisions, etc. Seems way overkill.

I like the generic functionality of (1) but unless we see this pop up in
different filters I wouldn't require it for now. If you just do it in flower
(option 3) when its added to another classifier we can generalize it. As a
middle ground creating nice helper routines as needed goes a long way.

.John



Hi all,
So is everyone ok with leaving the commit as is, only adding user data
to flower for now?

Thanks,
Paul.





Re: [PATCH net-next] net/sched: cls_flower: Add user specified data

2017-01-15 Thread Paul Blakey



On 08/01/2017 19:12, Jiri Pirko wrote:

Mon, Jan 02, 2017 at 03:59:49PM CET, j...@mojatatu.com wrote:


We have been using a cookie as well for actions (which we have been
using but have been too lazy to submit so far). I am going to port
it over to the newer kernels and post it.


Hard to deal with something we can't look at :)



In our case that is intended to be opaque to the kernel i.e kernel
never inteprets it; in that case it is similar to the kernel
FIB protocol field.


In case of this patch, kernel also never interprets it. What makes you
think otherwise. Bot kernel, it is always a binary blob.




In your case - could this cookie have been a class/flowid
(a 32 bit)?
And would it not make more sense for it the cookie to be
generic to all classifiers? i.e why is it specific to flower?


Correct, makes sense to have it generic for all cls and perhaps also
acts.




Hi all,
I've started implementing a general cookie for all classifiers,
I added the cookie on tcf_proto struct but then I realized that it won't 
work as I want. What I want is cookie per internal element (those that 
are identified by handles), which we can have many per one tcf_proto:


tc filter add dev  parent : prio 1 cookie 0x123 basic action drop
tc filter add dev  parent : prio 1 cookie 0x456 "port6" basic 
action drop

tc filter add dev  parent : prio 1 cookie 0x777 basic action drop

So there is three options to do that:
1) Implement it in each classifier, using some generic function. (kinda 
like stats, where classifiler_dump() calls tcf_exts_dump_stats())

2) Have a global hash table for cookies [prio,handle]->[cookie]
3) Have it on flower only for now :)

With the first one we will have to change each classifier (or leave it 
unsupported as default).
Second is less code to change (instead of changing each classifier), but 
might be slower. I'm afraid how it will affect dumping several filters.

Third is this patch.

Thanks,
Paul.





cheers,
jamal

On 17-01-02 08:13 AM, Paul Blakey wrote:

This is to support saving extra data that might be helpful on retrieval.
First use case is upcoming openvswitch flow offloads, extra data will
include UFID and port mappings for each added flow.

Signed-off-by: Paul Blakey <pa...@mellanox.com>
Reviewed-by: Roi Dayan <r...@mellanox.com>
Acked-by: Jiri Pirko <j...@mellanox.com>
---
 include/uapi/linux/pkt_cls.h |  3 +++
 net/sched/cls_flower.c   | 22 +-
 2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index cb4bcdc..ca9bbe3 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -471,10 +471,13 @@ enum {
TCA_FLOWER_KEY_ICMPV6_TYPE, /* u8 */
TCA_FLOWER_KEY_ICMPV6_TYPE_MASK,/* u8 */

+   TCA_FLOWER_COOKIE,  /* binary */
+
__TCA_FLOWER_MAX,
 };

 #define TCA_FLOWER_MAX (__TCA_FLOWER_MAX - 1)
+#define FLOWER_MAX_COOKIE_SIZE 128

 enum {
TCA_FLOWER_KEY_FLAGS_IS_FRAGMENT = (1 << 0),
diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 333f8e2..e2f5b25 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -85,6 +85,8 @@ struct cls_fl_filter {
struct rcu_head rcu;
struct tc_to_netdev tc;
struct net_device *hw_dev;
+   size_t cookie_len;
+   long cookie[0];
 };

 static unsigned short int fl_mask_range(const struct fl_flow_mask *mask)
@@ -794,6 +796,9 @@ static int fl_change(struct net *net, struct sk_buff 
*in_skb,
struct cls_fl_filter *fnew;
struct nlattr *tb[TCA_FLOWER_MAX + 1];
struct fl_flow_mask mask = {};
+   const struct nlattr *attr;
+   size_t cookie_len = 0;
+   void *cookie;
int err;

if (!tca[TCA_OPTIONS])
@@ -806,10 +811,22 @@ static int fl_change(struct net *net, struct sk_buff 
*in_skb,
if (fold && handle && fold->handle != handle)
return -EINVAL;

-   fnew = kzalloc(sizeof(*fnew), GFP_KERNEL);
+   if (tb[TCA_FLOWER_COOKIE]) {
+   attr = tb[TCA_FLOWER_COOKIE];
+   cookie_len = nla_len(attr);
+   cookie = nla_data(attr);
+   if (cookie_len > FLOWER_MAX_COOKIE_SIZE)
+   return -EINVAL;
+   }
+
+   fnew = kzalloc(sizeof(*fnew) + cookie_len, GFP_KERNEL);
if (!fnew)
return -ENOBUFS;

+   fnew->cookie_len = cookie_len;
+   if (cookie_len)
+   memcpy(fnew->cookie, cookie, cookie_len);
+
err = tcf_exts_init(>exts, TCA_FLOWER_ACT, 0);
if (err < 0)
goto errout;
@@ -1151,6 +1168,9 @@ static int fl_dump(struct net *net, struct tcf_proto *tp, 
unsigned long fh,

nla_put_u32(skb, TCA_FLOWER_FLAGS, f->flags);

+   if (f->cookie_len)
+   nla_put(skb, TCA_FLOWER_COOKIE, f->cookie_len, f->cookie);
+
if (tcf_exts_dump(skb, >exts))
goto nla_put_failure;






Re: [PATCH iproute2 net-next] tc: flower: support matching flags

2017-01-04 Thread Paul Blakey



On 04/01/2017 12:33, Simon Horman wrote:

On Tue, Jan 03, 2017 at 01:54:34PM +0200, Paul Blakey wrote:

...

Hi Paul,


Matching name was from the idea that we are doing is matching.
And regarding documentation/flag names I didn't want tc tool to be need of a
update each time a new flag is introduced,
But I guess I can add two options like with ip_proto where you can specify
known flags by name but can also give a value.
What do you think about that?

flags  / <HEX'/'HEX>
FLAGS => frag/no_frag/tcp_syn/no_tcp_syn ['|']*
e.g: flags frag|no_tcp_syn or flags 0x01/0x15
and the mask will have a on bits corresponds only to those flags specified.

I suppose a flag is a flag and bitwise masking allows arbitrary matching
on one or more flags. But I wonder if, as per your example above,
it makes sense to mix IP (frag) and TCP flags in the same field of the
classifier.
It mimics the kernel packing of flags, I have no problem either way 
(flags, or ip_flags/tcp_flags pairs), what do you think jiri?




Re: [PATCH net-next] net/sched: cls_flower: Add user specified data

2017-01-04 Thread Paul Blakey



On 04/01/2017 12:14, Simon Horman wrote:

On Tue, Jan 03, 2017 at 02:22:05PM +0200, Paul Blakey wrote:


On 03/01/2017 13:44, Jamal Hadi Salim wrote:

On 17-01-02 11:33 PM, John Fastabend wrote:

On 17-01-02 05:22 PM, Jamal Hadi Salim wrote:

[..]

Like all cookie semantics it is for storing state. The receiver
(kernel)
is not just store it and not intepret it. The user when reading it back
simplifies what they have to do for their processing.


The tuple  really should be unique why
not use this for system wide mappings?


I think on a single machine should be enough, however:
typically the user wants to define the value in a manner that
in a distributed system it is unique. It would be trickier to
do so with well defined values such as above.


Just extend the tuple  that
should be unique in the domain of hostname's, or use some other domain
wide machine identifier.


May work for the case of filter identification. The nice thing for
allowing cookies is you can let the user define it define their
own scheme.


Although actions can be shared so the cookie can be shared across
filters. Maybe its useful but it doesn't uniquely identify a filter
in the shared case but the user would have to specify that case
so maybe its not important.


Note: the action cookies and filter cookies are unrelated/orthogonal.
Their basic concept of stashing something in the cookie to help improve
what user space does (in our case millions of actions of which some are
used for accounting) is similar.
I have no objections to the flow cookies; my main concern was it should
be applicable to all classifiers not just flower. And the arbitrary size
of the cookie that you pointed out is questionable.

cheers,
jamal


Hi all,
Our use case is replacing OVS rules with TC filters for HW offload, and
you're are right the cookie would
have saved us the mapping from OVS rule ufid to the tc filter handle/prio...
that was generated for it.
It also was going to be used to store other info like which OVS output port
corresponds to the ifindex,

Possibly off-topic but I am curious to know why you need to store the port.
My possibly naïve assumption is that a filter is attached to the netdev
corresponding to the input port and mirred or other actions are used to output
to netdevs corresponding to output ports.


Right, its for the output ports, OVS uses ovs port numbers and mirred 
action uses the device ifindex, so there is need

to translate it back to OVS port on dump.




so we need 128+32 for now. It helps us with dumping the the flows back, when
we lose data on crash
or restarting the user space daemon.
HW hints is another thing that might be helpful.
Its binary blob because user/app specifc and its usage might change in the
future and its and that's why there
is some headroom with size as well.




Re: [PATCH net-next] net/sched: cls_flower: Add user specified data

2017-01-03 Thread Paul Blakey



On 03/01/2017 13:44, Jamal Hadi Salim wrote:

On 17-01-02 11:33 PM, John Fastabend wrote:

On 17-01-02 05:22 PM, Jamal Hadi Salim wrote:


[..]
Like all cookie semantics it is for storing state. The receiver 
(kernel)

is not just store it and not intepret it. The user when reading it back
simplifies what they have to do for their processing.



The tuple  really should be unique why
not use this for system wide mappings?



I think on a single machine should be enough, however:
typically the user wants to define the value in a manner that
in a distributed system it is unique. It would be trickier to
do so with well defined values such as above.



Just extend the tuple  that
should be unique in the domain of hostname's, or use some other domain
wide machine identifier.



May work for the case of filter identification. The nice thing for
allowing cookies is you can let the user define it define their
own scheme.


Although actions can be shared so the cookie can be shared across
filters. Maybe its useful but it doesn't uniquely identify a filter
in the shared case but the user would have to specify that case
so maybe its not important.



Note: the action cookies and filter cookies are unrelated/orthogonal.
Their basic concept of stashing something in the cookie to help improve
what user space does (in our case millions of actions of which some are
used for accounting) is similar.
I have no objections to the flow cookies; my main concern was it should
be applicable to all classifiers not just flower. And the arbitrary size
of the cookie that you pointed out is questionable.

cheers,
jamal



Hi all,
Our use case is replacing OVS rules with TC filters for HW offload, and 
you're are right the cookie would
have saved us the mapping from OVS rule ufid to the tc filter 
handle/prio... that was generated for it.
It also was going to be used to store other info like which OVS output 
port corresponds to the ifindex,
so we need 128+32 for now. It helps us with dumping the the flows back, 
when we lose data on crash

or restarting the user space daemon.
HW hints is another thing that might be helpful.
Its binary blob because user/app specifc and its usage might change in 
the future and its and that's why there

is some headroom with size as well.


I don't mind having it on TC level but I didn't want to intervene with 
all filters/TC.





Re: [PATCH iproute2 net-next] tc: flower: support matching flags

2017-01-03 Thread Paul Blakey

On 02/01/2017 20:55, Jiri Benc wrote:

On Wed, 28 Dec 2016 15:06:49 +0200, Paul Blakey wrote:

Enhance flower to support matching on flags.

The 1st flag allows to match on whether the packet is
an IP fragment.

Example:

# add a flower filter that will drop fragmented packets
# (bit 0 of control flags)
tc filter add dev ens4f0 protocol ip parent : \
flower \
src_mac e4:1d:2d:fd:8b:01 \
dst_mac e4:1d:2d:fd:8b:02 \
indev ens4f0 \
matching_flags 0x1/0x1 \
action drop

This is very poor API. First, how is the user supposed to know what
those magic values in "matching_flags" mean? At the very least, it
should be documented in the man page.

Second, why "matching_flags"? That name suggests that those modify the
way the matching is done (to illustrate my point, I'd expect things
like "if the packet is too short, match this rule anyway" to be a
"matching flag"). But this is not the case. What's wrong with plain
"flags"? Or, if you want to be more specific, perhaps packet_flags?

Third, all of this looks very wrong anyway. There should be separate
keywords for individual flags. In this case, there should be an
"ip_fragment" flag. The tc tool should be responsible for putting the
flags together and creating the appropriate mask. The example would
then be:

tc filter add dev ens4f0 protocol ip parent : \
flower \
src_mac e4:1d:2d:fd:8b:01 \
dst_mac e4:1d:2d:fd:8b:02 \
indev ens4f0 \
ip_fragment yes\
action drop

I don't care whether it's "ip_fragment yes/no", "ip_fragment 1/0",
"ip_fragment/noip_fragment" or similar. The important thing is it's a
boolean flag; if specified, it's set to 0/1 and unmasked, if not
specified, it's wildcarded.

Stephen, I understand that you already applied this patch but given how
horrible the proposed API is and that's even undocumented in this
patch, please reconsider this. If this is released, the API is set in
stone and, frankly, it's very user unfriendly this way.

Paul, could you please prepare a patch that would introduce a more sane
API? I'd strongly prefer what I described under "third" but should you
strongly disagree, at least implement "second" and document the
currently known flag values.

Thanks,

  Jiri


Matching name was from the idea that we are doing is matching.
And regarding documentation/flag names I didn't want tc tool to be need 
of a update each time a new flag is introduced,
But I guess I can add two options like with ip_proto where you can 
specify known flags by name but can also give a value.

What do you think about that?

flags  / <HEX'/'HEX>
FLAGS => frag/no_frag/tcp_syn/no_tcp_syn ['|']*
e.g: flags frag|no_tcp_syn or flags 0x01/0x15
and the mask will have a on bits corresponds only to those flags specified.






[PATCH net-next] net/sched: cls_flower: Add user specified data

2017-01-02 Thread Paul Blakey
This is to support saving extra data that might be helpful on retrieval.
First use case is upcoming openvswitch flow offloads, extra data will
include UFID and port mappings for each added flow.

Signed-off-by: Paul Blakey <pa...@mellanox.com>
Reviewed-by: Roi Dayan <r...@mellanox.com>
Acked-by: Jiri Pirko <j...@mellanox.com>
---
 include/uapi/linux/pkt_cls.h |  3 +++
 net/sched/cls_flower.c   | 22 +-
 2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index cb4bcdc..ca9bbe3 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -471,10 +471,13 @@ enum {
TCA_FLOWER_KEY_ICMPV6_TYPE, /* u8 */
TCA_FLOWER_KEY_ICMPV6_TYPE_MASK,/* u8 */
 
+   TCA_FLOWER_COOKIE,  /* binary */
+
__TCA_FLOWER_MAX,
 };
 
 #define TCA_FLOWER_MAX (__TCA_FLOWER_MAX - 1)
+#define FLOWER_MAX_COOKIE_SIZE 128
 
 enum {
TCA_FLOWER_KEY_FLAGS_IS_FRAGMENT = (1 << 0),
diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 333f8e2..e2f5b25 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -85,6 +85,8 @@ struct cls_fl_filter {
struct rcu_head rcu;
struct tc_to_netdev tc;
struct net_device *hw_dev;
+   size_t cookie_len;
+   long cookie[0];
 };
 
 static unsigned short int fl_mask_range(const struct fl_flow_mask *mask)
@@ -794,6 +796,9 @@ static int fl_change(struct net *net, struct sk_buff 
*in_skb,
struct cls_fl_filter *fnew;
struct nlattr *tb[TCA_FLOWER_MAX + 1];
struct fl_flow_mask mask = {};
+   const struct nlattr *attr;
+   size_t cookie_len = 0;
+   void *cookie;
int err;
 
if (!tca[TCA_OPTIONS])
@@ -806,10 +811,22 @@ static int fl_change(struct net *net, struct sk_buff 
*in_skb,
if (fold && handle && fold->handle != handle)
return -EINVAL;
 
-   fnew = kzalloc(sizeof(*fnew), GFP_KERNEL);
+   if (tb[TCA_FLOWER_COOKIE]) {
+   attr = tb[TCA_FLOWER_COOKIE];
+   cookie_len = nla_len(attr);
+   cookie = nla_data(attr);
+   if (cookie_len > FLOWER_MAX_COOKIE_SIZE)
+   return -EINVAL;
+   }
+
+   fnew = kzalloc(sizeof(*fnew) + cookie_len, GFP_KERNEL);
if (!fnew)
return -ENOBUFS;
 
+   fnew->cookie_len = cookie_len;
+   if (cookie_len)
+   memcpy(fnew->cookie, cookie, cookie_len);
+
err = tcf_exts_init(>exts, TCA_FLOWER_ACT, 0);
if (err < 0)
goto errout;
@@ -1151,6 +1168,9 @@ static int fl_dump(struct net *net, struct tcf_proto *tp, 
unsigned long fh,
 
nla_put_u32(skb, TCA_FLOWER_FLAGS, f->flags);
 
+   if (f->cookie_len)
+   nla_put(skb, TCA_FLOWER_COOKIE, f->cookie_len, f->cookie);
+
if (tcf_exts_dump(skb, >exts))
goto nla_put_failure;
 
-- 
1.8.3.1



[PATCH iproute2 net-next] tc: flower: support matching flags

2016-12-28 Thread Paul Blakey
Enhance flower to support matching on flags.

The 1st flag allows to match on whether the packet is
an IP fragment.

Example:

# add a flower filter that will drop fragmented packets
# (bit 0 of control flags)
tc filter add dev ens4f0 protocol ip parent : \
flower \
src_mac e4:1d:2d:fd:8b:01 \
dst_mac e4:1d:2d:fd:8b:02 \
indev ens4f0 \
matching_flags 0x1/0x1 \
action drop

Signed-off-by: Paul Blakey <pa...@mellanox.com>
Signed-off-by: Or Gerlitz <ogerl...@mellanox.com>
Reviewed-by: Roi Dayan <r...@mellanox.com>
---
 man/man8/tc-flower.8 | 11 +++
 tc/f_flower.c| 53 +++-
 2 files changed, 63 insertions(+), 1 deletion(-)

diff --git a/man/man8/tc-flower.8 b/man/man8/tc-flower.8
index 90fdfba..40117a9 100644
--- a/man/man8/tc-flower.8
+++ b/man/man8/tc-flower.8
@@ -39,6 +39,8 @@ flower \- flow based traffic control filter
 .IR KEY-ID " | {"
 .BR enc_dst_ip " | " enc_src_ip " } { "
 .IR ipv4_address " | " ipv6_address " } | "
+.B matching_flags
+.IR MATCHING-FLAGS " }"
 .SH DESCRIPTION
 The
 .B flower
@@ -134,6 +136,15 @@ Match on IP tunnel metadata. Key id
 is a 32 bit tunnel key id (e.g. VNI for VXLAN tunnel).
 .I ADDRESS
 must be a valid IPv4 or IPv6 address.
+.TP
+.BI matching_flags " MATCHING-FLAGS"
+Match on various dissector flags.
+.I MATCHING-FLAGS
+may be specified with or without a mask:
+.BR FLAGS
+or
+.BR FLAGS/FLAGS_MASK
+where each arg is an unsigned 32bit value in hexadecimal format.
 .SH NOTES
 As stated above where applicable, matches of a certain layer implicitly depend
 on the matches of the next lower layer. Precisely, layer one and two matches
diff --git a/tc/f_flower.c b/tc/f_flower.c
index 5dac427..479f5e6 100644
--- a/tc/f_flower.c
+++ b/tc/f_flower.c
@@ -56,7 +56,8 @@ static void explain(void)
"   code ICMP-CODE }\n"
"   enc_dst_ip [ IPV4-ADDR | IPV6-ADDR ] 
|\n"
"   enc_src_ip [ IPV4-ADDR | IPV6-ADDR ] 
|\n"
-   "   enc_key_id [ KEY-ID ] }\n"
+   "   enc_key_id [ KEY-ID ] |\n"
+   "   matching_flags MATCHING-FLAGS }\n"
"   FILTERID := X:Y:Z\n"
"   ACTION-SPEC := ... look at individual actions\n"
"\n"
@@ -99,6 +100,31 @@ static int flower_parse_vlan_eth_type(char *str, __be16 
eth_type, int type,
return 0;
 }
 
+static int flower_parse_matching_flags(char *str, int type, int mask_type,
+  struct nlmsghdr *n)
+{
+   __u32 mtf, mtf_mask;
+   char *c;
+
+   c = strchr(str, '/');
+   if (c)
+   *c = '\0';
+
+   if (get_u32(, str, 0))
+   return -1;
+
+   if (c) {
+   if (get_u32(_mask, ++c, 0))
+   return -1;
+   } else {
+   mtf_mask = 0x;
+   }
+
+   addattr32(n, MAX_MSG, type, htonl(mtf));
+   addattr32(n, MAX_MSG, mask_type, htonl(mtf_mask));
+   return 0;
+}
+
 static int flower_parse_ip_proto(char *str, __be16 eth_type, int type,
 __u8 *p_ip_proto, struct nlmsghdr *n)
 {
@@ -314,6 +340,16 @@ static int flower_parse_opt(struct filter_util *qu, char 
*handle,
return -1;
}
addattr_l(n, MAX_MSG, TCA_FLOWER_CLASSID, , 4);
+   } else if (matches(*argv, "matching_flags") == 0) {
+   NEXT_ARG();
+   ret = flower_parse_matching_flags(*argv,
+ TCA_FLOWER_KEY_FLAGS,
+ 
TCA_FLOWER_KEY_FLAGS_MASK,
+ n);
+   if (ret < 0) {
+   fprintf(stderr, "Illegal \"matching_flags\"\n");
+   return -1;
+   }
} else if (matches(*argv, "skip_hw") == 0) {
flags |= TCA_CLS_FLAGS_SKIP_HW;
} else if (matches(*argv, "skip_sw") == 0) {
@@ -604,6 +640,17 @@ static void flower_print_ip_proto(FILE *f, __u8 
*p_ip_proto,
*p_ip_proto = ip_proto;
 }
 
+static void flower_print_matching_flags(FILE *f, char *name,
+   struct rtattr *attr,
+   struct rtattr *mask_attr)
+{
+   if (!mask_attr || RTA_PAYLOAD(mask_attr) != 4)
+   return;
+

[PATCH net] net/sched: cls_flower: Fix missing addr_type in classify

2016-12-28 Thread Paul Blakey
Since we now use a non zero mask on addr_type, we are matching on its
value (IPV4/IPV6). So before this fix, matching on enc_src_ip/enc_dst_ip
failed in SW/classify path since its value was zero.
This patch sets the proper value of addr_type for encapsulated packets.

Fixes: 970bfcd09791 ('net/sched: cls_flower: Use mask for addr_type')
Signed-off-by: Paul Blakey <pa...@mellanox.com>
Reviewed-by: Hadar Hen Zion <had...@mellanox.com>
---
 net/sched/cls_flower.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 333f8e2..970db7a 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -153,10 +153,14 @@ static int fl_classify(struct sk_buff *skb, const struct 
tcf_proto *tp,
 
switch (ip_tunnel_info_af(info)) {
case AF_INET:
+   skb_key.enc_control.addr_type =
+   FLOW_DISSECTOR_KEY_IPV4_ADDRS;
skb_key.enc_ipv4.src = key->u.ipv4.src;
skb_key.enc_ipv4.dst = key->u.ipv4.dst;
break;
case AF_INET6:
+   skb_key.enc_control.addr_type =
+   FLOW_DISSECTOR_KEY_IPV6_ADDRS;
skb_key.enc_ipv6.src = key->u.ipv6.src;
skb_key.enc_ipv6.dst = key->u.ipv6.dst;
break;
-- 
1.8.3.1



[PATCH net 2/2] net/sched: cls_flower: Use masked key when calling HW offloads

2016-12-14 Thread Paul Blakey
Zero bits on the mask signify a "don't care" on the corresponding bits
in key. Some HWs require those bits on the key to be zero. Since these
bits are masked anyway, it's okay to provide the masked key to all
drivers.

Fixes: 5b33f48842fa ('net/flower: Introduce hardware offload support')
Signed-off-by: Paul Blakey <pa...@mellanox.com>
Reviewed-by: Roi Dayan <r...@mellanox.com>
Acked-by: Jiri Pirko <j...@mellanox.com>
---
 net/sched/cls_flower.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 9758f5a..35ac28d 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -252,7 +252,7 @@ static int fl_hw_replace_filter(struct tcf_proto *tp,
offload.cookie = (unsigned long)f;
offload.dissector = dissector;
offload.mask = mask;
-   offload.key = >key;
+   offload.key = >mkey;
offload.exts = >exts;
 
tc->type = TC_SETUP_CLSFLOWER;
-- 
1.8.3.1



[PATCH net 0/2] net/sched: cls_flower: Fix mask handling

2016-12-14 Thread Paul Blakey
Hi,
The series fix how the mask is being handled.
Thanks.

Paul Blakey (2):
  net/sched: cls_flower: Use mask for addr_type
  net/sched: cls_flower: Use masked key when calling HW offloads

 net/sched/cls_flower.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

-- 
1.8.3.1



[PATCH net 1/2] net/sched: cls_flower: Use mask for addr_type

2016-12-14 Thread Paul Blakey
When addr_type is set, mask should also be set.

Fixes: 66530bdf85eb ('sched,cls_flower: set key address type when present')
Fixes: bc3103f1ed40 ('net/sched: cls_flower: Classify packet in ip tunnels')
Signed-off-by: Paul Blakey <pa...@mellanox.com>
Reviewed-by: Roi Dayan <r...@mellanox.com>
Acked-by: Jiri Pirko <j...@mellanox.com>
---
 net/sched/cls_flower.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index e040c51..9758f5a 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -509,6 +509,7 @@ static int fl_set_key(struct net *net, struct nlattr **tb,
 
if (tb[TCA_FLOWER_KEY_IPV4_SRC] || tb[TCA_FLOWER_KEY_IPV4_DST]) {
key->control.addr_type = FLOW_DISSECTOR_KEY_IPV4_ADDRS;
+   mask->control.addr_type = ~0;
fl_set_key_val(tb, >ipv4.src, TCA_FLOWER_KEY_IPV4_SRC,
   >ipv4.src, TCA_FLOWER_KEY_IPV4_SRC_MASK,
   sizeof(key->ipv4.src));
@@ -517,6 +518,7 @@ static int fl_set_key(struct net *net, struct nlattr **tb,
   sizeof(key->ipv4.dst));
} else if (tb[TCA_FLOWER_KEY_IPV6_SRC] || tb[TCA_FLOWER_KEY_IPV6_DST]) {
key->control.addr_type = FLOW_DISSECTOR_KEY_IPV6_ADDRS;
+   mask->control.addr_type = ~0;
fl_set_key_val(tb, >ipv6.src, TCA_FLOWER_KEY_IPV6_SRC,
   >ipv6.src, TCA_FLOWER_KEY_IPV6_SRC_MASK,
   sizeof(key->ipv6.src));
@@ -571,6 +573,7 @@ static int fl_set_key(struct net *net, struct nlattr **tb,
if (tb[TCA_FLOWER_KEY_ENC_IPV4_SRC] ||
tb[TCA_FLOWER_KEY_ENC_IPV4_DST]) {
key->enc_control.addr_type = FLOW_DISSECTOR_KEY_IPV4_ADDRS;
+   mask->enc_control.addr_type = ~0;
fl_set_key_val(tb, >enc_ipv4.src,
   TCA_FLOWER_KEY_ENC_IPV4_SRC,
   >enc_ipv4.src,
@@ -586,6 +589,7 @@ static int fl_set_key(struct net *net, struct nlattr **tb,
if (tb[TCA_FLOWER_KEY_ENC_IPV6_SRC] ||
tb[TCA_FLOWER_KEY_ENC_IPV6_DST]) {
key->enc_control.addr_type = FLOW_DISSECTOR_KEY_IPV6_ADDRS;
+   mask->enc_control.addr_type = ~0;
fl_set_key_val(tb, >enc_ipv6.src,
   TCA_FLOWER_KEY_ENC_IPV6_SRC,
   >enc_ipv6.src,
-- 
1.8.3.1



[PATCH iproute2] tc: flower: Fix usage message

2016-11-02 Thread Paul Blakey
Remove left over usage from removal of eth_type argument.

Fixes: 488b41d020fb ('tc: flower no need to specify the ethertype')
Signed-off-by: Paul Blakey <pa...@mellanox.com>
---
 man/man8/tc-flower.8 | 9 -
 tc/f_flower.c| 3 +--
 2 files changed, 1 insertion(+), 11 deletions(-)

diff --git a/man/man8/tc-flower.8 b/man/man8/tc-flower.8
index 74f7664..16ef261 100644
--- a/man/man8/tc-flower.8
+++ b/man/man8/tc-flower.8
@@ -23,8 +23,6 @@ flower \- flow based traffic control filter
 .R " | { "
 .BR dst_mac " | " src_mac " } "
 .IR mac_address " | "
-.BR eth_type " { " ipv4 " | " ipv6 " | " 802.1Q " | "
-.IR ETH_TYPE " } | "
 .B vlan_id
 .IR VID " | "
 .B vlan_prio
@@ -75,13 +73,6 @@ Do not process filter by hardware.
 .BI src_mac " mac_address"
 Match on source or destination MAC address.
 .TP
-.BI eth_type " ETH_TYPE"
-Match on the next protocol.
-.I ETH_TYPE
-may be either
-.BR ipv4 , ipv6 , 802.1Q ,
-or an unsigned 16bit value in hexadecimal format.
-.TP
 .BI vlan_id " VID"
 Match on vlan tag id.
 .I VID
diff --git a/tc/f_flower.c b/tc/f_flower.c
index 2d31d1a..f39b1f7 100644
--- a/tc/f_flower.c
+++ b/tc/f_flower.c
@@ -36,7 +36,6 @@ static void explain(void)
fprintf(stderr, "   vlan_ethtype [ ipv4 | ipv6 | 
ETH-TYPE ] |\n");
fprintf(stderr, "   dst_mac MAC-ADDR |\n");
fprintf(stderr, "   src_mac MAC-ADDR |\n");
-   fprintf(stderr, "   [ipv4 | ipv6 ] |\n");
fprintf(stderr, "   ip_proto [tcp | udp | IP-PROTO 
] |\n");
fprintf(stderr, "   dst_ip [ IPV4-ADDR | IPV6-ADDR 
] |\n");
fprintf(stderr, "   src_ip [ IPV4-ADDR | IPV6-ADDR 
] |\n");
@@ -45,7 +44,7 @@ static void explain(void)
fprintf(stderr, "   FILTERID := X:Y:Z\n");
fprintf(stderr, "   ACTION-SPEC := ... look at individual 
actions\n");
fprintf(stderr, "\n");
-   fprintf(stderr, "NOTE: CLASSID, ETH-TYPE, IP-PROTO are parsed as 
hexadecimal input.\n");
+   fprintf(stderr, "NOTE: CLASSID, IP-PROTO are parsed as hexadecimal 
input.\n");
fprintf(stderr, "NOTE: There can be only used one mask per one prio. If 
user needs\n");
fprintf(stderr, "  to specify different mask, he has to use 
different prio.\n");
 }
-- 
1.8.3.1