[PATCH net-next v2 1/1] netfilter: flowtable: Add FLOW_OFFLOAD_XMIT_UNSPEC xmit type

2021-04-13 Thread Roi Dayan
It could be xmit type was not set and would default to FLOW_OFFLOAD_XMIT_NEIGH
and in this type the gc expect to have a route info.
Fix that by adding FLOW_OFFLOAD_XMIT_UNSPEC which defaults to 0.

Fixes: 8b9229d15877 ("netfilter: flowtable: dst_check() from garbage collector 
path")
Signed-off-by: Roi Dayan 
---

Notes:
v2
- add FLOW_OFFLOAD_XMIT_UNSPEC instead of still using neigh as default and 
checking dst for null

 include/net/netfilter/nf_flow_table.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/net/netfilter/nf_flow_table.h 
b/include/net/netfilter/nf_flow_table.h
index 583b327d8fc0..9b42c6523b4d 100644
--- a/include/net/netfilter/nf_flow_table.h
+++ b/include/net/netfilter/nf_flow_table.h
@@ -90,7 +90,8 @@ enum flow_offload_tuple_dir {
 #define FLOW_OFFLOAD_DIR_MAX   IP_CT_DIR_MAX
 
 enum flow_offload_xmit_type {
-   FLOW_OFFLOAD_XMIT_NEIGH = 0,
+   FLOW_OFFLOAD_XMIT_UNSPEC= 0,
+   FLOW_OFFLOAD_XMIT_NEIGH,
FLOW_OFFLOAD_XMIT_XFRM,
FLOW_OFFLOAD_XMIT_DIRECT,
 };
-- 
2.26.2



Re: [PATCH net-next 1/1] netfilter: flowtable: Make sure dst_cache is valid before using it

2021-04-13 Thread Roi Dayan




On 2021-04-12 2:42 PM, Pablo Neira Ayuso wrote:

On Mon, Apr 12, 2021 at 11:26:35AM +0300, Roi Dayan wrote:



On 2021-04-11 1:58 PM, Pablo Neira Ayuso wrote:

Hi Roi,

On Sun, Apr 11, 2021 at 11:13:34AM +0300, Roi Dayan wrote:

It could be dst_cache was not set so check it's not null before using
it.


Could you give a try to this fix?

net/sched/act_ct.c leaves the xmit_type as FLOW_OFFLOAD_XMIT_UNSPEC
since it does not cache a route.

Thanks.



what do you mean? FLOW_OFFLOAD_XMIT_UNSPEC doesn't exists so default 0
is set.

do you suggest adding that enum option as 0?


Yes. This could be FLOW_OFFLOAD_XMIT_TC instead if you prefer.

enum flow_offload_xmit_type {
 FLOW_OFFLOAD_XMIT_TC= 0,
 FLOW_OFFLOAD_XMIT_NEIGH,
 FLOW_OFFLOAD_XMIT_XFRM,
 FLOW_OFFLOAD_XMIT_DIRECT,
};

so there is no need to check for no route in the
FLOW_OFFLOAD_XMIT_NEIGH case (it's assumed this type always has a
route).



thanks Pablo. were not sure I wanted to touch the enum.
I prefer unspec actually as you suggested initially.
it works fine by adding the enum.

i'll submit v2 with this suggestion.



Re: [PATCH net-next 1/1] netfilter: flowtable: Make sure dst_cache is valid before using it

2021-04-12 Thread Roi Dayan




On 2021-04-11 1:58 PM, Pablo Neira Ayuso wrote:

Hi Roi,

On Sun, Apr 11, 2021 at 11:13:34AM +0300, Roi Dayan wrote:

It could be dst_cache was not set so check it's not null before using
it.


Could you give a try to this fix?

net/sched/act_ct.c leaves the xmit_type as FLOW_OFFLOAD_XMIT_UNSPEC
since it does not cache a route.

Thanks.



what do you mean? FLOW_OFFLOAD_XMIT_UNSPEC doesn't exists so default 0
is set.

do you suggest adding that enum option as 0?

this is the current xmit_type enum

enum flow_offload_xmit_type {
   FLOW_OFFLOAD_XMIT_NEIGH = 0,
   FLOW_OFFLOAD_XMIT_XFRM,
   FLOW_OFFLOAD_XMIT_DIRECT,
};




Fixes: 8b9229d15877 ("netfilter: flowtable: dst_check() from garbage collector 
path")
Signed-off-by: Roi Dayan 
---
  net/netfilter/nf_flow_table_core.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/net/netfilter/nf_flow_table_core.c 
b/net/netfilter/nf_flow_table_core.c
index 76573bae6664..e426077aaed1 100644
--- a/net/netfilter/nf_flow_table_core.c
+++ b/net/netfilter/nf_flow_table_core.c
@@ -410,6 +410,8 @@ static bool flow_offload_stale_dst(struct 
flow_offload_tuple *tuple)
if (tuple->xmit_type == FLOW_OFFLOAD_XMIT_NEIGH ||
tuple->xmit_type == FLOW_OFFLOAD_XMIT_XFRM) {
dst = tuple->dst_cache;
+   if (!dst)
+   return false;
if (!dst_check(dst, tuple->dst_cookie))
return true;
}
--
2.26.2



[PATCH net-next 1/1] netfilter: flowtable: Make sure dst_cache is valid before using it

2021-04-11 Thread Roi Dayan
It could be dst_cache was not set so check it's not null before using
it.

Fixes: 8b9229d15877 ("netfilter: flowtable: dst_check() from garbage collector 
path")
Signed-off-by: Roi Dayan 
---
 net/netfilter/nf_flow_table_core.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/netfilter/nf_flow_table_core.c 
b/net/netfilter/nf_flow_table_core.c
index 76573bae6664..e426077aaed1 100644
--- a/net/netfilter/nf_flow_table_core.c
+++ b/net/netfilter/nf_flow_table_core.c
@@ -410,6 +410,8 @@ static bool flow_offload_stale_dst(struct 
flow_offload_tuple *tuple)
if (tuple->xmit_type == FLOW_OFFLOAD_XMIT_NEIGH ||
tuple->xmit_type == FLOW_OFFLOAD_XMIT_XFRM) {
dst = tuple->dst_cache;
+   if (!dst)
+   return false;
if (!dst_check(dst, tuple->dst_cookie))
return true;
}
-- 
2.26.2



Re: [PATCH] net: mellanox: mlx5: fix error return code of mlx5e_stats_flower()

2021-03-14 Thread Roi Dayan




On 2021-03-12 12:47 AM, Saeed Mahameed wrote:

On Tue, 2021-03-09 at 11:44 +0200, Roi Dayan wrote:



On 2021-03-09 10:32 AM, Jia-Ju Bai wrote:



On 2021/3/9 16:24, Roi Dayan wrote:



On 2021-03-09 10:20 AM, Roi Dayan wrote:



On 2021-03-06 3:47 PM, Jia-Ju Bai wrote:

When mlx5e_tc_get_counter() returns NULL to counter or
mlx5_devcom_get_peer_data() returns NULL to peer_esw, no
error return
code of mlx5e_stats_flower() is assigned.
To fix this bug, err is assigned with -EINVAL in these cases.

Reported-by: TOTE Robot 


Hey Jia-Ju, What are the conditions for this robot to raise a flag?
sometimes it is totally normal to abort a function and return 0.. i am
just curious to know ?



Signed-off-by: Jia-Ju Bai 
---
   drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 12
+---
   1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 0da69b98f38f..1f2c9da7bd35 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -4380,8 +4380,10 @@ int mlx5e_stats_flower(struct
net_device
*dev, struct mlx5e_priv *priv,
   if (mlx5e_is_offloaded_flow(flow) ||
flow_flag_test(flow, CT)) {
   counter = mlx5e_tc_get_counter(flow);
-    if (!counter)
+    if (!counter) {
+    err = -EINVAL;
   goto errout;
+    }
   mlx5_fc_query_cached(counter, &bytes, &packets,
&lastuse);
   }
@@ -4390,8 +4392,10 @@ int mlx5e_stats_flower(struct
net_device
*dev, struct mlx5e_priv *priv,
    * un-offloaded while the other rule is offloaded.
    */
   peer_esw = mlx5_devcom_get_peer_data(devcom,
MLX5_DEVCOM_ESW_OFFLOADS);
-    if (!peer_esw)
+    if (!peer_esw) {
+    err = -EINVAL;




This is not an error flow, i am curious what are the thresholds of this
robot ?


note here it's not an error. it could be there is no peer esw
so just continue with the stats update.


   goto out;
+    }
   if (flow_flag_test(flow, DUP) &&
   flow_flag_test(flow->peer_flow, OFFLOADED)) {
@@ -4400,8 +4404,10 @@ int mlx5e_stats_flower(struct
net_device
*dev, struct mlx5e_priv *priv,
   u64 lastuse2;
   counter = mlx5e_tc_get_counter(flow->peer_flow);
-    if (!counter)
+    if (!counter) {
+    err = -EINVAL;


this change is problematic. the current goto is to do stats
update with
the first counter stats we got but if you now want to return an
error
then you probably should not do any update at all.


Thanks for your reply :)
I am not sure whether an error code should be returned here?
If so, flow_stats_update(...) should not be called here?


Best wishes,
Jia-Ju Bai



basically flow and peer_flow should be valid and protected from
changes,
and counter should be valid.
it looks like the check here is more of a sanity check if something
goes
wrong but shouldn't. you can just let it be, update the stats from
the
first queried counter.



Roi, let's consider returning an error code here, we shouldn't be
silently returning if we are not expecting these errors,

why would mlx5e_stats_flower() be called if stats are not offloaded ?

Thanks,
Saeed.




yes we can return an error if peer counter missing.
I just pointed out we should probably not call flow_stats_update() if
we do return an error.
the other option, as today, is updating the stats with first counter
stats and because of that we didn't return an error.


Re: [PATCH] net: bonding: fix error return code of bond_neigh_init()

2021-03-10 Thread Roi Dayan




On 2021-03-08 5:11 AM, Jia-Ju Bai wrote:

When slave is NULL or slave_ops->ndo_neigh_setup is NULL, no error
return code of bond_neigh_init() is assigned.
To fix this bug, ret is assigned with -EINVAL in these cases.

Fixes: 9e99bfefdbce ("bonding: fix bond_neigh_init()")
Reported-by: TOTE Robot 
Signed-off-by: Jia-Ju Bai 
---
  drivers/net/bonding/bond_main.c | 8 ++--
  1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 74cbbb22470b..456315bef3a8 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3978,11 +3978,15 @@ static int bond_neigh_init(struct neighbour *n)
  
  	rcu_read_lock();

slave = bond_first_slave_rcu(bond);
-   if (!slave)
+   if (!slave) {
+   ret = -EINVAL;
goto out;
+   }
slave_ops = slave->dev->netdev_ops;
-   if (!slave_ops->ndo_neigh_setup)
+   if (!slave_ops->ndo_neigh_setup) {
+   ret = -EINVAL;
goto out;
+   }
  
  	/* TODO: find another way [1] to implement this.

 * Passing a zeroed structure is fragile,




Hi,

This breaks basic functionally that always worked. A slave doesn't need
to exists nor to implement ndo_neigh_setup.
Now trying to add a neigh entry because of that fails.
This commit needs to be reverted.

Thanks,
Roi


Re: [PATCH] net: mellanox: mlx5: fix error return code of mlx5e_stats_flower()

2021-03-09 Thread Roi Dayan




On 2021-03-09 10:32 AM, Jia-Ju Bai wrote:



On 2021/3/9 16:24, Roi Dayan wrote:



On 2021-03-09 10:20 AM, Roi Dayan wrote:



On 2021-03-06 3:47 PM, Jia-Ju Bai wrote:

When mlx5e_tc_get_counter() returns NULL to counter or
mlx5_devcom_get_peer_data() returns NULL to peer_esw, no error return
code of mlx5e_stats_flower() is assigned.
To fix this bug, err is assigned with -EINVAL in these cases.

Reported-by: TOTE Robot 
Signed-off-by: Jia-Ju Bai 
---
  drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 12 +---
  1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c

index 0da69b98f38f..1f2c9da7bd35 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -4380,8 +4380,10 @@ int mlx5e_stats_flower(struct net_device 
*dev, struct mlx5e_priv *priv,

  if (mlx5e_is_offloaded_flow(flow) || flow_flag_test(flow, CT)) {
  counter = mlx5e_tc_get_counter(flow);
-    if (!counter)
+    if (!counter) {
+    err = -EINVAL;
  goto errout;
+    }
  mlx5_fc_query_cached(counter, &bytes, &packets, &lastuse);
  }
@@ -4390,8 +4392,10 @@ int mlx5e_stats_flower(struct net_device 
*dev, struct mlx5e_priv *priv,

   * un-offloaded while the other rule is offloaded.
   */
  peer_esw = mlx5_devcom_get_peer_data(devcom, 
MLX5_DEVCOM_ESW_OFFLOADS);

-    if (!peer_esw)
+    if (!peer_esw) {
+    err = -EINVAL;


note here it's not an error. it could be there is no peer esw
so just continue with the stats update.


  goto out;
+    }
  if (flow_flag_test(flow, DUP) &&
  flow_flag_test(flow->peer_flow, OFFLOADED)) {
@@ -4400,8 +4404,10 @@ int mlx5e_stats_flower(struct net_device 
*dev, struct mlx5e_priv *priv,

  u64 lastuse2;
  counter = mlx5e_tc_get_counter(flow->peer_flow);
-    if (!counter)
+    if (!counter) {
+    err = -EINVAL;


this change is problematic. the current goto is to do stats update with
the first counter stats we got but if you now want to return an error
then you probably should not do any update at all.


Thanks for your reply :)
I am not sure whether an error code should be returned here?
If so, flow_stats_update(...) should not be called here?


Best wishes,
Jia-Ju Bai



basically flow and peer_flow should be valid and protected from changes,
and counter should be valid.
it looks like the check here is more of a sanity check if something goes
wrong but shouldn't. you can just let it be, update the stats from the
first queried counter.





  goto no_peer_counter;
+    }
  mlx5_fc_query_cached(counter, &bytes2, &packets2, &lastuse2);
  bytes += bytes2;








Re: [PATCH] net: mellanox: mlx5: fix error return code of mlx5e_stats_flower()

2021-03-09 Thread Roi Dayan




On 2021-03-09 10:20 AM, Roi Dayan wrote:



On 2021-03-06 3:47 PM, Jia-Ju Bai wrote:

When mlx5e_tc_get_counter() returns NULL to counter or
mlx5_devcom_get_peer_data() returns NULL to peer_esw, no error return
code of mlx5e_stats_flower() is assigned.
To fix this bug, err is assigned with -EINVAL in these cases.

Reported-by: TOTE Robot 
Signed-off-by: Jia-Ju Bai 
---
  drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 12 +---
  1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c

index 0da69b98f38f..1f2c9da7bd35 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -4380,8 +4380,10 @@ int mlx5e_stats_flower(struct net_device *dev, 
struct mlx5e_priv *priv,

  if (mlx5e_is_offloaded_flow(flow) || flow_flag_test(flow, CT)) {
  counter = mlx5e_tc_get_counter(flow);
-    if (!counter)
+    if (!counter) {
+    err = -EINVAL;
  goto errout;
+    }
  mlx5_fc_query_cached(counter, &bytes, &packets, &lastuse);
  }
@@ -4390,8 +4392,10 @@ int mlx5e_stats_flower(struct net_device *dev, 
struct mlx5e_priv *priv,

   * un-offloaded while the other rule is offloaded.
   */
  peer_esw = mlx5_devcom_get_peer_data(devcom, 
MLX5_DEVCOM_ESW_OFFLOADS);

-    if (!peer_esw)
+    if (!peer_esw) {
+    err = -EINVAL;


note here it's not an error. it could be there is no peer esw
so just continue with the stats update.


  goto out;
+    }
  if (flow_flag_test(flow, DUP) &&
  flow_flag_test(flow->peer_flow, OFFLOADED)) {
@@ -4400,8 +4404,10 @@ int mlx5e_stats_flower(struct net_device *dev, 
struct mlx5e_priv *priv,

  u64 lastuse2;
  counter = mlx5e_tc_get_counter(flow->peer_flow);
-    if (!counter)
+    if (!counter) {
+    err = -EINVAL;


this change is problematic. the current goto is to do stats update with
the first counter stats we got but if you now want to return an error
then you probably should not do any update at all.


  goto no_peer_counter;
+    }
  mlx5_fc_query_cached(counter, &bytes2, &packets2, &lastuse2);
  bytes += bytes2;






Re: [PATCH] net: mellanox: mlx5: fix error return code of mlx5e_stats_flower()

2021-03-09 Thread Roi Dayan




On 2021-03-06 3:47 PM, Jia-Ju Bai wrote:

When mlx5e_tc_get_counter() returns NULL to counter or
mlx5_devcom_get_peer_data() returns NULL to peer_esw, no error return
code of mlx5e_stats_flower() is assigned.
To fix this bug, err is assigned with -EINVAL in these cases.

Reported-by: TOTE Robot 
Signed-off-by: Jia-Ju Bai 
---
  drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 12 +---
  1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 0da69b98f38f..1f2c9da7bd35 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -4380,8 +4380,10 @@ int mlx5e_stats_flower(struct net_device *dev, struct 
mlx5e_priv *priv,
  
  	if (mlx5e_is_offloaded_flow(flow) || flow_flag_test(flow, CT)) {

counter = mlx5e_tc_get_counter(flow);
-   if (!counter)
+   if (!counter) {
+   err = -EINVAL;
goto errout;
+   }
  
  		mlx5_fc_query_cached(counter, &bytes, &packets, &lastuse);

}
@@ -4390,8 +4392,10 @@ int mlx5e_stats_flower(struct net_device *dev, struct 
mlx5e_priv *priv,
 * un-offloaded while the other rule is offloaded.
 */
peer_esw = mlx5_devcom_get_peer_data(devcom, MLX5_DEVCOM_ESW_OFFLOADS);
-   if (!peer_esw)
+   if (!peer_esw) {
+   err = -EINVAL;


note here it's not an error. it could be there is no peer esw
so just continue with the stats update.


goto out;
+   }
  
  	if (flow_flag_test(flow, DUP) &&

flow_flag_test(flow->peer_flow, OFFLOADED)) {
@@ -4400,8 +4404,10 @@ int mlx5e_stats_flower(struct net_device *dev, struct 
mlx5e_priv *priv,
u64 lastuse2;
  
  		counter = mlx5e_tc_get_counter(flow->peer_flow);

-   if (!counter)
+   if (!counter) {
+   err = -EINVAL;
goto no_peer_counter;
+   }
mlx5_fc_query_cached(counter, &bytes2, &packets2, &lastuse2);
  
  		bytes += bytes2;







Re: [PATCH] net/mlx5e: include net/nexthop.h where needed

2021-03-08 Thread Roi Dayan




On 2021-03-08 5:31 PM, Arnd Bergmann wrote:

From: Arnd Bergmann 

drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c:1510:12: error: 
implicit declaration of function 'fib_info_nh' 
[-Werror,-Wimplicit-function-declaration]
 fib_dev = fib_info_nh(fen_info->fi, 0)->fib_nh_dev;
   ^
drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c:1510:12: note: did 
you mean 'fib_info_put'?
include/net/ip_fib.h:529:20: note: 'fib_info_put' declared here
static inline void fib_info_put(struct fib_info *fi)
^

Fixes: 8914add2c9e5 ("net/mlx5e: Handle FIB events to update tunnel endpoint 
device")
Signed-off-by: Arnd Bergmann 
---
  drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c
index 6a116335bb21..32d06fe94acc 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c
@@ -2,6 +2,7 @@
  /* Copyright (c) 2021 Mellanox Technologies. */
  
  #include 

+#include 
  #include "tc_tun_encap.h"
  #include "en_tc.h"
  #include "tc_tun.h"



Hi,

I see internally we have a pending commit from Vlad fixing this already,
with few more fixes. "net/mlx5e: Add missing include"

I'll check why it's being held.

Thanks,
Roi


Re: [PATCH] net/mlx5: use kvfree() for memory allocated with kvzalloc()

2021-03-03 Thread Roi Dayan




On 2021-03-03 4:40 AM, angkery wrote:

From: Junlin Yang 

It is allocated with kvzalloc(), the corresponding release function
should not be kfree(), use kvfree() instead.

Generated by: scripts/coccinelle/api/kfree_mismatch.cocci

Signed-off-by: Junlin Yang 
---
  drivers/net/ethernet/mellanox/mlx5/core/esw/indir_table.c | 10 +-
  1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/indir_table.c 
b/drivers/net/ethernet/mellanox/mlx5/core/esw/indir_table.c
index 6f6772b..3da7bec 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/indir_table.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/indir_table.c
@@ -248,7 +248,7 @@ static int mlx5_esw_indir_table_rule_get(struct 
mlx5_eswitch *esw,
  err_ethertype:
kfree(rule);
  out:
-   kfree(rule_spec);
+   kvfree(rule_spec);
return err;
  }
  
@@ -328,7 +328,7 @@ static int mlx5_create_indir_recirc_group(struct mlx5_eswitch *esw,

e->recirc_cnt = 0;
  
  out:

-   kfree(in);
+   kvfree(in);
return err;
  }
  
@@ -347,7 +347,7 @@ static int mlx5_create_indir_fwd_group(struct mlx5_eswitch *esw,
  
  	spec = kvzalloc(sizeof(*spec), GFP_KERNEL);

if (!spec) {
-   kfree(in);
+   kvfree(in);
return -ENOMEM;
}
  
@@ -371,8 +371,8 @@ static int mlx5_create_indir_fwd_group(struct mlx5_eswitch *esw,

}
  
  err_out:

-   kfree(spec);
-   kfree(in);
+   kvfree(spec);
+   kvfree(in);
return err;
  }
  



thanks!

Reviewed-by: Roi Dayan 



Re: [PATCH iproute2] dcb: Fix compilation warning about reallocarray

2021-02-22 Thread Roi Dayan




On 2021-02-22 2:09 PM, Roi Dayan wrote:

To use reallocarray we need to add bsd/stdlib.h.

dcb_app.c: In function ‘dcb_app_table_push’:
dcb_app.c:68:25: warning: implicit declaration of function ‘reallocarray’; did 
you mean ‘realloc’?

Fixes: 8e9bed1493f5 ("dcb: Add a subtool for the DCB APP object")
Signed-off-by: Roi Dayan 
---
  dcb/dcb.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/dcb/dcb.c b/dcb/dcb.c
index 6640deef5688..32896c4d5732 100644
--- a/dcb/dcb.c
+++ b/dcb/dcb.c
@@ -5,6 +5,7 @@
  #include 
  #include 
  #include 
+#include 
  
  #include "dcb.h"

  #include "mnl_utils.h"



sorry for the resend. please ignore. sent v2 of the patch.


[PATCH iproute2] dcb: Fix compilation warning about reallocarray

2021-02-22 Thread Roi Dayan
To use reallocarray we need to add bsd/stdlib.h.

dcb_app.c: In function ‘dcb_app_table_push’:
dcb_app.c:68:25: warning: implicit declaration of function ‘reallocarray’; did 
you mean ‘realloc’?

Fixes: 8e9bed1493f5 ("dcb: Add a subtool for the DCB APP object")
Signed-off-by: Roi Dayan 
---
 dcb/dcb.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/dcb/dcb.c b/dcb/dcb.c
index 6640deef5688..32896c4d5732 100644
--- a/dcb/dcb.c
+++ b/dcb/dcb.c
@@ -5,6 +5,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "dcb.h"
 #include "mnl_utils.h"
-- 
2.8.0



[PATCH iproute2-next v2] dcb: Fix compilation warning about reallocarray

2021-02-22 Thread Roi Dayan
In older distros we need bsd/stdlib.h but newer distro doesn't
need it. Also old distro will need libbsd-devel installed and newer
doesn't. To remove a possible dependency on libbsd-devel replace usage
of reallocarray to realloc.

dcb_app.c: In function ‘dcb_app_table_push’:
dcb_app.c:68:25: warning: implicit declaration of function ‘reallocarray’; did 
you mean ‘realloc’?

Fixes: 8e9bed1493f5 ("dcb: Add a subtool for the DCB APP object")
Signed-off-by: Roi Dayan 
---

Notes:
v2
- tag for iproute next
- replace reallocarray with realloc instead of messing with libbsd

 dcb/dcb_app.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/dcb/dcb_app.c b/dcb/dcb_app.c
index 7ce80f85072b..c4816bc2997f 100644
--- a/dcb/dcb_app.c
+++ b/dcb/dcb_app.c
@@ -65,8 +65,7 @@ static void dcb_app_table_fini(struct dcb_app_table *tab)
 
 static int dcb_app_table_push(struct dcb_app_table *tab, struct dcb_app *app)
 {
-   struct dcb_app *apps = reallocarray(tab->apps, tab->n_apps + 1,
-   sizeof(*tab->apps));
+   struct dcb_app *apps = realloc(tab->apps, (tab->n_apps + 1) * 
sizeof(*tab->apps));
 
if (apps == NULL) {
perror("Cannot allocate APP table");
-- 
2.8.0



Re: [PATCH iproute2] dcb: Fix compilation warning about reallocarray

2021-02-22 Thread Roi Dayan




On 2021-02-22 12:51 PM, Roi Dayan wrote:

To use reallocarray we need to add bsd/stdlib.h.

dcb_app.c: In function ‘dcb_app_table_push’:
dcb_app.c:68:25: warning: implicit declaration of function ‘reallocarray’; did 
you mean ‘realloc’?

Fixes: 8e9bed1493f5 ("dcb: Add a subtool for the DCB APP object")
Signed-off-by: Roi Dayan 
---
  dcb/dcb.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/dcb/dcb.c b/dcb/dcb.c
index 6640deef5688..32896c4d5732 100644
--- a/dcb/dcb.c
+++ b/dcb/dcb.c
@@ -5,6 +5,7 @@
  #include 
  #include 
  #include 
+#include 
  
  #include "dcb.h"

  #include "mnl_utils.h"



It seems I need this include in old centos distro but
not in fedora 27 for example as it is part of glibc.
don't merge this please.




[PATCH iproute2] dcb: Fix compilation warning about reallocarray

2021-02-22 Thread Roi Dayan
To use reallocarray we need to add bsd/stdlib.h.

dcb_app.c: In function ‘dcb_app_table_push’:
dcb_app.c:68:25: warning: implicit declaration of function ‘reallocarray’; did 
you mean ‘realloc’?

Fixes: 8e9bed1493f5 ("dcb: Add a subtool for the DCB APP object")
Signed-off-by: Roi Dayan 
---
 dcb/dcb.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/dcb/dcb.c b/dcb/dcb.c
index 6640deef5688..32896c4d5732 100644
--- a/dcb/dcb.c
+++ b/dcb/dcb.c
@@ -5,6 +5,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "dcb.h"
 #include "mnl_utils.h"
-- 
2.8.0



Re: [PATCH iproute2 v4 1/1] build: Fix link errors on some systems

2021-02-22 Thread Roi Dayan




On 2021-01-12 2:21 PM, Petr Machata wrote:


Roi Dayan  writes:


Since moving get_rate() and get_size() from tc to lib, on some
systems we fail to link because of missing math lib.
Move the functions that require math lib to their own c file
and add -lm to dcb that now use those functions.

../lib/libutil.a(utils.o): In function `get_rate':
utils.c:(.text+0x10dc): undefined reference to `floor'
../lib/libutil.a(utils.o): In function `get_size':
utils.c:(.text+0x1394): undefined reference to `floor'
../lib/libutil.a(json_print.o): In function `sprint_size':
json_print.c:(.text+0x14c0): undefined reference to `rint'
json_print.c:(.text+0x14f4): undefined reference to `rint'
json_print.c:(.text+0x157c): undefined reference to `rint'

Fixes: f3be0e6366ac ("lib: Move get_rate(), get_rate64() from tc here")
Fixes: 44396bdfcc0a ("lib: Move get_size() from tc here")
Fixes: adbe5de96662 ("lib: Move sprint_size() from tc here, add print_size()")

Signed-off-by: Roi Dayan 


Looking good:

 $ ldd ip/ip | grep libm.so
 $ ldd dcb/dcb | grep libm.so
 libm.so.6 => /lib64/libm.so.6 (0x7f204d0c2000)

Reviewed-by: Petr Machata 
Tested-by: Petr Machata 



Hi,

Pinging about this commit. I noticed it was not taken.
anything you want me to update? push again?

Thanks,
Roi


Re: [PATCH iproute2 v4 1/1] build: Fix link errors on some systems

2021-02-22 Thread Roi Dayan




On 2021-02-22 12:36 PM, Roi Dayan wrote:



On 2021-01-12 2:21 PM, Petr Machata wrote:


Roi Dayan  writes:


Since moving get_rate() and get_size() from tc to lib, on some
systems we fail to link because of missing math lib.
Move the functions that require math lib to their own c file
and add -lm to dcb that now use those functions.

../lib/libutil.a(utils.o): In function `get_rate':
utils.c:(.text+0x10dc): undefined reference to `floor'
../lib/libutil.a(utils.o): In function `get_size':
utils.c:(.text+0x1394): undefined reference to `floor'
../lib/libutil.a(json_print.o): In function `sprint_size':
json_print.c:(.text+0x14c0): undefined reference to `rint'
json_print.c:(.text+0x14f4): undefined reference to `rint'
json_print.c:(.text+0x157c): undefined reference to `rint'

Fixes: f3be0e6366ac ("lib: Move get_rate(), get_rate64() from tc here")
Fixes: 44396bdfcc0a ("lib: Move get_size() from tc here")
Fixes: adbe5de96662 ("lib: Move sprint_size() from tc here, add 
print_size()")


Signed-off-by: Roi Dayan 


Looking good:

 $ ldd ip/ip | grep libm.so
 $ ldd dcb/dcb | grep libm.so
 libm.so.6 => /lib64/libm.so.6 (0x7f204d0c2000)

Reviewed-by: Petr Machata 
Tested-by: Petr Machata 



Hi,

Pinging about this commit. I noticed it was not taken.
anything you want me to update? push again?

Thanks,
Roi


sorry, my mistake. wrong git clone :)


Re: [PATCH net 1/1] netfilter: conntrack: Check offload bit on table dump

2021-02-07 Thread Roi Dayan




On 2021-02-03 2:50 PM, Florian Westphal wrote:

Roi Dayan  wrote:

Do you think rhashtable_insert_fast() in flow_offload_add() blocks for
dozens of seconds?


I'm not sure. but its not only that but also the time to be in
established state as only then we offload.


That makes it even more weird.  Timeout for established is even larger.
In case of TCP, its days... so I don't understand at all :/


Thats about the only thing I can see between 'offload bit gets set'
and 'timeout is extended' in flow_offload_add() that could at least
spend *some* time.


We hit this issue before more easily and pushed this fix

4203b19c2796 netfilter: flowtable: Set offload timeout when adding flow


This fix makes sense to me.


I just noted we didn't test correctly Pablo's suggestion instead of
to check the bit and extend the timeout in ctnetlink_dump_table() and
ct_seq_show() like GC does.


Ok.  Extending it there makes sense, but I still don't understand
why newly offloaded flows hit this problem.

Flow offload should never see a 'about to expire' ct entry.
The 'extend timeout from gc' is more to make sure GC doesn't reap
long-lived entries that have been offloaded aeons ago, not 'prevent
new flows from getting zapped...'



I'll add that an extended timeout from gc is if gc actually iterated
this conn since it was created.
I'll investigate this more and try to do more tests.
thanks for the info


Re: [PATCH net 1/1] netfilter: conntrack: Check offload bit on table dump

2021-02-02 Thread Roi Dayan




On 2021-02-01 5:25 PM, Florian Westphal wrote:

Roi Dayan  wrote:

TCP initial timeout is one minute, UDP 30 seconds.
That should surely be enough to do flow_offload_add (which extends
the timeout)?


Yes, flow_offload_add() extends the timeout. but it needs to finish.



Maybe something is doing flow_offload_add() for unconfirmed conntrack?

In unconfirmed conntrack case, ct->timeout is absolute timeout value, e.g. for
tcp it will be set to 60 * HZ.


When I hit the issue I printed jiffies and ct->timeout and saw they are
the same or very close but not an absolute number.


Thats strange, for new flows they should not be close at all.
UDP sets a 30 second timeout, TCP sets a 60 second initial timeout.

Do you think rhashtable_insert_fast() in flow_offload_add() blocks for
dozens of seconds?


I'm not sure. but its not only that but also the time to be in
established state as only then we offload.



Thats about the only thing I can see between 'offload bit gets set'
and 'timeout is extended' in flow_offload_add() that could at least
spend *some* time.


We hit this issue before more easily and pushed this fix

4203b19c2796 netfilter: flowtable: Set offload timeout when adding flow


This fix makes sense to me.




I just noted we didn't test correctly Pablo's suggestion instead of
to check the bit and extend the timeout in ctnetlink_dump_table() and
ct_seq_show() like GC does.
We replaced nf_conntrack module but not nf_conntrack_netlink and tested
with conntrack -L.
We redo the test and replaced correct modules and it looks ok now.
We still need to run the long test to be sure but is that still
a good option as a fix?



Re: [PATCH net 1/1] netfilter: conntrack: Check offload bit on table dump

2021-02-01 Thread Roi Dayan




On 2021-02-01 1:50 PM, Florian Westphal wrote:

Roi Dayan  wrote:

There is a 3rd caller nf_ct_gc_expired() which being called by 3
other callers:
nf_conntrack_find()
nf_conntrack_tuple_taken()
early_drop_list()


Hm. I'm not sure yet what path is triggering this bug.

Florian came up with the idea of setting a very large timeout for
offloaded flows (that are refreshed by the garbage collector) to avoid
the extra check from the packet path, so those 3 functions above never
hit the garbage collection path. This also applies for the ctnetlink
(conntrack -L) and the /proc/net/nf_conntrack sysctl paths that the
patch describes, those should not ever see an offloaded flow with a
small timeout.

nf_ct_offload_timeout() is called from:

#1 flow_offload_add() to set a very large timer.
#2 the garbage collector path, to refresh the timeout the very large
 offload timer.

Probably there is a race between setting the IPS_OFFLOAD and when
flow_offload_add() is called? Garbage collector gets in between and
zaps the connection. Is a newly offloaded connection that you observed
that is being removed?



yes. the flows being removed are newly offloaded connections.


If they are new, how can they be timed out already?

TCP initial timeout is one minute, UDP 30 seconds.
That should surely be enough to do flow_offload_add (which extends
the timeout)?


Yes, flow_offload_add() extends the timeout. but it needs to finish.



Maybe something is doing flow_offload_add() for unconfirmed conntrack?

In unconfirmed conntrack case, ct->timeout is absolute timeout value, e.g. for
tcp it will be set to 60 * HZ.


When I hit the issue I printed jiffies and ct->timeout and saw they are
the same or very close but not an absolute number.



conntrack confirmation adds jiffies32 to it to make it relative
to current time (this is before insertion into the conntrack table,
so GC isn't supposed to happen before this).



We hit this issue before more easily and pushed this fix

4203b19c2796 netfilter: flowtable: Set offload timeout when adding flow

That commit changed flow_offload_add() to extend ct timeout because on
we noticed on busy systems GC didn't finish a full iteration on all
conns and conns were cleaned.
I think we might have the same issue.

tcf_ct_flow_table_add() set the offload bit and calls flow_offload_add()

We do know the offload bit is set when conn it deleted, so we hit the
issue where timeout being tested after tcf_ct_flow_table_add() was 
called but before ct timeout was fixed. so flow_offload_add() didn't

finish and GC didn't start, or did start but did not finish full
iteration.


In any case adding test for the offload bit seems to be papering over
invalid/broken ct->timeout value.



Re: [PATCH net 1/1] netfilter: conntrack: Check offload bit on table dump

2021-01-31 Thread Roi Dayan




On 2021-02-01 5:08 AM, Pablo Neira Ayuso wrote:

Hi Roi,

On Sun, Jan 31, 2021 at 03:18:34PM +0200, Roi Dayan wrote:
[...]

Hi Pablo,

We did more tests with just updating the timeout in the 2 callers
and it's not enough. We reproduce the issue of rules being timed
out just now frim different place.


Thanks for giving it a try to my suggestion, it was not correct.


There is a 3rd caller nf_ct_gc_expired() which being called by 3
other callers:
nf_conntrack_find()
nf_conntrack_tuple_taken()
early_drop_list()


Hm. I'm not sure yet what path is triggering this bug.

Florian came up with the idea of setting a very large timeout for
offloaded flows (that are refreshed by the garbage collector) to avoid
the extra check from the packet path, so those 3 functions above never
hit the garbage collection path. This also applies for the ctnetlink
(conntrack -L) and the /proc/net/nf_conntrack sysctl paths that the
patch describes, those should not ever see an offloaded flow with a
small timeout.

nf_ct_offload_timeout() is called from:

#1 flow_offload_add() to set a very large timer.
#2 the garbage collector path, to refresh the timeout the very large
offload timer.

Probably there is a race between setting the IPS_OFFLOAD and when
flow_offload_add() is called? Garbage collector gets in between and
zaps the connection. Is a newly offloaded connection that you observed
that is being removed?



yes. the flows being removed are newly offloaded connections.
I don't think the race is between setting IPS_OFFLOAD and calling
flow_offload_add().
In act_ct.c tcf_ct_flow_table_add() we first set the bit
and then call flow_offload_add().
I think the race is more of getting into the other flows and gc still
didn't kick in.
I'm sure of this because we added trace dump in nf_ct_delete()
and while creating connections we also ran in a loop conntrack -L.
In the same way instead of conntrack we did the dump from
/proc/net/nf_conntrack and we also saw the trace from there.
In this scenario if we stopped the loop of the dump we didn't crash
and then in more tests we failed again after I tried to moving the bit
check to your suggestion.
But leaving the bit check as in this commit we didn't reproduce the
issue at all.


only early_drop_list() has a check to skip conns with offload bit
but without extending the timeout.
I didnt do a dump but the issue could be from the other 2 calls.

With current commit as is I didn't need to check more callers as I made
sure all callers will skip the non-offload gc.

Instead of updating more callers and there might be more callers
later why current commit is not enough?
We skip offloaded flows and soon gc_worker() will hit and will update
the timeout anyway.


Another possibility would be to check for the offload bit from
nf_ct_is_expired(), which is coming slighty before nf_ct_should_gc().
But this is also in the nf_conntrack_find() path.
> Florian?



Right. beside the dumps paths that just call nf_ct_should_gc() which
calls nf_ct_is_expired() again.
Moving the bit check into nf_ct_is_expired() should work the same as
this commit.
I'll wait for a final decision if you prefer the bit check in
nf_ct_is_expired() and we will test again.



Re: [PATCH net 1/1] netfilter: conntrack: Check offload bit on table dump

2021-01-31 Thread Roi Dayan




On 2021-01-31 12:01 PM, Roi Dayan wrote:



On 2021-01-30 2:01 PM, Pablo Neira Ayuso wrote:

Hi Roi,

On Thu, Jan 28, 2021 at 09:40:52AM +0200, Roi Dayan wrote:

Currently, offloaded flows might be deleted when executing conntrack -L
or cat /proc/net/nf_conntrack while rules being offloaded.
Ct timeout is not maintained for offloaded flows as aging
of offloaded flows are managed by the flow table offload infrastructure.

Don't do garbage collection for offloaded flows when dumping the
entries.

Fixes: 90964016e5d3 ("netfilter: nf_conntrack: add IPS_OFFLOAD status 
bit")

Signed-off-by: Roi Dayan 
Reviewed-by: Oz Shlomo 
---
  include/net/netfilter/nf_conntrack.h | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/net/netfilter/nf_conntrack.h 
b/include/net/netfilter/nf_conntrack.h

index 439379ca9ffa..87c85109946a 100644
--- a/include/net/netfilter/nf_conntrack.h
+++ b/include/net/netfilter/nf_conntrack.h
@@ -276,7 +276,7 @@ static inline bool nf_ct_is_expired(const struct 
nf_conn *ct)

  static inline bool nf_ct_should_gc(const struct nf_conn *ct)
  {
  return nf_ct_is_expired(ct) && nf_ct_is_confirmed(ct) &&
-   !nf_ct_is_dying(ct);
+   !nf_ct_is_dying(ct) && !test_bit(IPS_OFFLOAD_BIT, 
&ct->status);


The gc_worker() calls nf_ct_offload_timeout() if the flow if
offloaded, so it extends the timeout to skip the garbage collection.

Could you update ctnetlink_dump_table() and ct_seq_show() to extend
the timeout if the flow is offloaded?

Thanks.



sure. i'll submit v2.
thanks



Hi Pablo,

We did more tests with just updating the timeout in the 2 callers
and it's not enough. We reproduce the issue of rules being timed
out just now frim different place.
There is a 3rd caller nf_ct_gc_expired() which being called by 3
other callers:
nf_conntrack_find()
nf_conntrack_tuple_taken()
early_drop_list()

only early_drop_list() has a check to skip conns with offload bit
but without extending the timeout.
I didnt do a dump but the issue could be from the other 2 calls.

With current commit as is I didn't need to check more callers as I made
sure all callers will skip the non-offload gc.
Instead of updating more callers and there might be more callers
later why current commit is not enough?
We skip offloaded flows and soon gc_worker() will hit and will update
the timeout anyway.

Thanks,
Roi


[PATCH net 1/1] netfilter: conntrack: Check offload bit on table dump

2021-01-27 Thread Roi Dayan
Currently, offloaded flows might be deleted when executing conntrack -L
or cat /proc/net/nf_conntrack while rules being offloaded.
Ct timeout is not maintained for offloaded flows as aging
of offloaded flows are managed by the flow table offload infrastructure.

Don't do garbage collection for offloaded flows when dumping the
entries.

Fixes: 90964016e5d3 ("netfilter: nf_conntrack: add IPS_OFFLOAD status bit")
Signed-off-by: Roi Dayan 
Reviewed-by: Oz Shlomo 
---
 include/net/netfilter/nf_conntrack.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/net/netfilter/nf_conntrack.h 
b/include/net/netfilter/nf_conntrack.h
index 439379ca9ffa..87c85109946a 100644
--- a/include/net/netfilter/nf_conntrack.h
+++ b/include/net/netfilter/nf_conntrack.h
@@ -276,7 +276,7 @@ static inline bool nf_ct_is_expired(const struct nf_conn 
*ct)
 static inline bool nf_ct_should_gc(const struct nf_conn *ct)
 {
return nf_ct_is_expired(ct) && nf_ct_is_confirmed(ct) &&
-  !nf_ct_is_dying(ct);
+  !nf_ct_is_dying(ct) && !test_bit(IPS_OFFLOAD_BIT, &ct->status);
 }
 
 #defineNF_CT_DAY   (86400 * HZ)
-- 
2.26.2



[PATCH iproute2 v4 1/1] build: Fix link errors on some systems

2021-01-12 Thread Roi Dayan
Since moving get_rate() and get_size() from tc to lib, on some
systems we fail to link because of missing math lib.
Move the functions that require math lib to their own c file
and add -lm to dcb that now use those functions.

../lib/libutil.a(utils.o): In function `get_rate':
utils.c:(.text+0x10dc): undefined reference to `floor'
../lib/libutil.a(utils.o): In function `get_size':
utils.c:(.text+0x1394): undefined reference to `floor'
../lib/libutil.a(json_print.o): In function `sprint_size':
json_print.c:(.text+0x14c0): undefined reference to `rint'
json_print.c:(.text+0x14f4): undefined reference to `rint'
json_print.c:(.text+0x157c): undefined reference to `rint'

Fixes: f3be0e6366ac ("lib: Move get_rate(), get_rate64() from tc here")
Fixes: 44396bdfcc0a ("lib: Move get_size() from tc here")
Fixes: adbe5de96662 ("lib: Move sprint_size() from tc here, add print_size()")

Signed-off-by: Roi Dayan 
---

Notes:
v4
- Use SPDX license excerpt

v3
- Modify _IS_JSON_CONTEXT comparison order.

v2
- As suggested by Petr.
  Instead of adding -lm to all utils move the functions that
  require math lib to seperate c files and add -lm to dcb that
  use those functions.

 dcb/Makefile  |   1 +
 include/json_print.h  |   3 ++
 lib/Makefile  |   4 +-
 lib/json_print.c  |  33 
 lib/json_print_math.c |  37 +
 lib/utils.c   | 114 ---
 lib/utils_math.c  | 123 ++
 7 files changed, 166 insertions(+), 149 deletions(-)
 create mode 100644 lib/json_print_math.c
 create mode 100644 lib/utils_math.c

diff --git a/dcb/Makefile b/dcb/Makefile
index 4add954b4bba..7c09bb4f2e00 100644
--- a/dcb/Makefile
+++ b/dcb/Makefile
@@ -7,6 +7,7 @@ ifeq ($(HAVE_MNL),y)
 
 DCBOBJ = dcb.o dcb_buffer.o dcb_ets.o dcb_maxrate.o dcb_pfc.o
 TARGETS += dcb
+LDLIBS += -lm
 
 endif
 
diff --git a/include/json_print.h b/include/json_print.h
index 1a1ad5ffa552..6fcf9fd910ec 100644
--- a/include/json_print.h
+++ b/include/json_print.h
@@ -15,6 +15,9 @@
 #include "json_writer.h"
 #include "color.h"
 
+#define _IS_JSON_CONTEXT(type) (is_json_context() && (type & PRINT_JSON || 
type & PRINT_ANY))
+#define _IS_FP_CONTEXT(type)   (!is_json_context() && (type & PRINT_FP || type 
& PRINT_ANY))
+
 json_writer_t *get_json_writer(void);
 
 /*
diff --git a/lib/Makefile b/lib/Makefile
index 764c9137d0ec..6c98f9a61fdb 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -3,8 +3,8 @@ include ../config.mk
 
 CFLAGS += -fPIC
 
-UTILOBJ = utils.o rt_names.o ll_map.o ll_types.o ll_proto.o ll_addr.o \
-   inet_proto.o namespace.o json_writer.o json_print.o \
+UTILOBJ = utils.o utils_math.o rt_names.o ll_map.o ll_types.o ll_proto.o 
ll_addr.o \
+   inet_proto.o namespace.o json_writer.o json_print.o json_print_math.o \
names.o color.o bpf_legacy.o bpf_glue.o exec.o fs.o cg_map.o
 
 ifeq ($(HAVE_ELF),y)
diff --git a/lib/json_print.c b/lib/json_print.c
index b086123ad1f4..994a2f8d6ae0 100644
--- a/lib/json_print.c
+++ b/lib/json_print.c
@@ -11,16 +11,12 @@
 
 #include 
 #include 
-#include 
 
 #include "utils.h"
 #include "json_print.h"
 
 static json_writer_t *_jw;
 
-#define _IS_JSON_CONTEXT(type) ((type & PRINT_JSON || type & PRINT_ANY) && _jw)
-#define _IS_FP_CONTEXT(type) (!_jw && (type & PRINT_FP || type & PRINT_ANY))
-
 static void __new_json_obj(int json, bool have_array)
 {
if (json) {
@@ -342,32 +338,3 @@ int print_color_rate(bool use_iec, enum output_type type, 
enum color_attr color,
free(buf);
return rc;
 }
-
-char *sprint_size(__u32 sz, char *buf)
-{
-   long kilo = 1024;
-   long mega = kilo * kilo;
-   size_t len = SPRINT_BSIZE - 1;
-   double tmp = sz;
-
-   if (sz >= mega && fabs(mega * rint(tmp / mega) - sz) < 1024)
-   snprintf(buf, len, "%gMb", rint(tmp / mega));
-   else if (sz >= kilo && fabs(kilo * rint(tmp / kilo) - sz) < 16)
-   snprintf(buf, len, "%gKb", rint(tmp / kilo));
-   else
-   snprintf(buf, len, "%ub", sz);
-
-   return buf;
-}
-
-int print_color_size(enum output_type type, enum color_attr color,
-const char *key, const char *fmt, __u32 sz)
-{
-   SPRINT_BUF(buf);
-
-   if (_IS_JSON_CONTEXT(type))
-   return print_color_uint(type, color, key, "%u", sz);
-
-   sprint_size(sz, buf);
-   return print_color_string(type, color, key, fmt, buf);
-}
diff --git a/lib/json_print_math.c b/lib/json_print_math.c
new file mode 100644
index ..f4d504995924
--- /dev/null
+++ b/lib/json_print_math.c
@@ -0,0 +1,37 @@
+// SPDX-License-Identifier: GPL-2.0+
+
+#include 
+#include 
+#includ

Re: [PATCH iproute2 v3 1/1] build: Fix link errors on some systems

2021-01-12 Thread Roi Dayan




On 2021-01-11 6:51 PM, Petr Machata wrote:


Roi Dayan  writes:


diff --git a/lib/json_print_math.c b/lib/json_print_math.c
new file mode 100644
index ..3d560defcd3e
--- /dev/null
+++ b/lib/json_print_math.c
@@ -0,0 +1,46 @@
+/*
+ * json_print_math.c   "print regular or json output, based on 
json_writer".
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *


This should have a SPDX tag instead of the license excerpt:

// SPDX-License-Identifier: GPL-2.0+


+ * Authors:Julien Fortin, 
+ */


sprint_size() comes from TC and predates iproute2 git repo (2004),
whereas Cumulus Networks was around from 2010. So the authorship
declaration is likely inaccurate. I think it's also unnecessary, and
would just drop it.


diff --git a/lib/utils_math.c b/lib/utils_math.c
new file mode 100644
index ..d67affeb16c2
--- /dev/null
+++ b/lib/utils_math.c
@@ -0,0 +1,133 @@
+/*
+ * utils.c
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * Authors:Alexey Kuznetsov, 


The same here re license and authorship. The latter might in fact be
accurate in this case, but I would still drop it :)

Otherwise this looks good to me.



great thanks. sending v4 with the updates.


Re: [net-next 09/15] net/mlx5e: CT: Support offload of +trk+new ct rules

2021-01-09 Thread Roi Dayan




On 2021-01-08 11:59 PM, Marcelo Ricardo Leitner wrote:

Hi,

On Thu, Jan 07, 2021 at 09:30:48PM -0800, Saeed Mahameed wrote:

@@ -1429,6 +1600,14 @@ mlx5_tc_ct_add_ft_cb(struct mlx5_tc_ct_priv *ct_priv, 
u16 zone,
if (err)
goto err_insert;
  
+	nf_ct_zone_init(&ctzone, zone, NF_CT_DEFAULT_ZONE_DIR, 0);

+   ft->tmpl = nf_ct_tmpl_alloc(&init_net, &ctzone, GFP_KERNEL);


I didn't test but I think this will add a hard dependency to
nf_conntrack_core and will cause conntrack to always be loaded by
mlx5_core, which is not good for some use cases.
nf_ct_tmpl_alloc() is defined in nf_conntrack_core.c.

762f926d6f19 ("net/sched: act_ct: Make tcf_ct_flow_table_restore_skb
inline") was done similarly to avoid this.



right. we will take a look what we can do with this.
thanks


+   if (!ft->tmpl)
+   goto err_tmpl;
+
+   __set_bit(IPS_CONFIRMED_BIT, &ft->tmpl->status);
+   nf_conntrack_get(&ft->tmpl->ct_general);
+
err = nf_flow_table_offload_add_cb(ft->nf_ft,
   mlx5_tc_ct_block_flow_offload, ft);
if (err)


Re: [net-next 08/15] net/mlx5e: CT: Preparation for offloading +trk+new ct rules

2021-01-09 Thread Roi Dayan




On 2021-01-10 9:45 AM, Roi Dayan wrote:



On 2021-01-08 11:48 PM, Marcelo Ricardo Leitner wrote:

Hi,

On Thu, Jan 07, 2021 at 09:30:47PM -0800, Saeed Mahameed wrote:

From: Roi Dayan 

Connection tracking associates the connection state per packet. The
first packet of a connection is assigned with the +trk+new state. The
connection enters the established state once a packet is seen on the
other direction.

Currently we offload only the established flows. However, UDP traffic
using source port entropy (e.g. vxlan, RoCE) will never enter the
established state. Such protocols do not require stateful processing,
and therefore could be offloaded.


If it doesn't require stateful processing, please enlight me on why
conntrack is being used in the first place. What's the use case here?



The use case for example is when we have vxlan traffic but we do
conntrack on the inner packet (rules on the physical port) so
we never get established but on miss we can still offload as normal
vxlan traffic.



my mistake about "inner packet". we do CT on the underlay network, i.e.
the outer header.



The change in the model is that a miss on the CT table will be forwarded
to a new +trk+new ct table and a miss there will be forwarded to the 
slow

path table.


AFAICU this new +trk+new ct table is a wildcard match on sport with
specific dports. Also AFAICU, such entries will not be visible to the
userspace then. Is this right?

   Marcelo



right.


Re: [net-next 08/15] net/mlx5e: CT: Preparation for offloading +trk+new ct rules

2021-01-09 Thread Roi Dayan




On 2021-01-08 11:48 PM, Marcelo Ricardo Leitner wrote:

Hi,

On Thu, Jan 07, 2021 at 09:30:47PM -0800, Saeed Mahameed wrote:

From: Roi Dayan 

Connection tracking associates the connection state per packet. The
first packet of a connection is assigned with the +trk+new state. The
connection enters the established state once a packet is seen on the
other direction.

Currently we offload only the established flows. However, UDP traffic
using source port entropy (e.g. vxlan, RoCE) will never enter the
established state. Such protocols do not require stateful processing,
and therefore could be offloaded.


If it doesn't require stateful processing, please enlight me on why
conntrack is being used in the first place. What's the use case here?



The use case for example is when we have vxlan traffic but we do
conntrack on the inner packet (rules on the physical port) so
we never get established but on miss we can still offload as normal
vxlan traffic.



The change in the model is that a miss on the CT table will be forwarded
to a new +trk+new ct table and a miss there will be forwarded to the slow
path table.


AFAICU this new +trk+new ct table is a wildcard match on sport with
specific dports. Also AFAICU, such entries will not be visible to the
userspace then. Is this right?

   Marcelo



right.


[PATCH iproute2 v3 1/1] build: Fix link errors on some systems

2021-01-09 Thread Roi Dayan
Since moving get_rate() and get_size() from tc to lib, on some
systems we fail to link because of missing math lib.
Move the functions that require math lib to their own c file
and add -lm to dcb that now use those functions.

../lib/libutil.a(utils.o): In function `get_rate':
utils.c:(.text+0x10dc): undefined reference to `floor'
../lib/libutil.a(utils.o): In function `get_size':
utils.c:(.text+0x1394): undefined reference to `floor'
../lib/libutil.a(json_print.o): In function `sprint_size':
json_print.c:(.text+0x14c0): undefined reference to `rint'
json_print.c:(.text+0x14f4): undefined reference to `rint'
json_print.c:(.text+0x157c): undefined reference to `rint'

Fixes: f3be0e6366ac ("lib: Move get_rate(), get_rate64() from tc here")
Fixes: 44396bdfcc0a ("lib: Move get_size() from tc here")
Fixes: adbe5de96662 ("lib: Move sprint_size() from tc here, add print_size()")

Signed-off-by: Roi Dayan 
---

Notes:
v3
- Modify _IS_JSON_CONTEXT comparison order.

v2
- As suggested by Petr.
  Instead of adding -lm to all utils move the functions that
  require math lib to seperate c files and add -lm to dcb that
  use those functions.

 dcb/Makefile  |   1 +
 include/json_print.h  |   3 +
 lib/Makefile  |   4 +-
 lib/json_print.c  |  33 ---
 lib/json_print_math.c |  46 +++
 lib/utils.c   | 114 
 lib/utils_math.c  | 133 ++
 7 files changed, 185 insertions(+), 149 deletions(-)
 create mode 100644 lib/json_print_math.c
 create mode 100644 lib/utils_math.c

diff --git a/dcb/Makefile b/dcb/Makefile
index 4add954b4bba..7c09bb4f2e00 100644
--- a/dcb/Makefile
+++ b/dcb/Makefile
@@ -7,6 +7,7 @@ ifeq ($(HAVE_MNL),y)
 
 DCBOBJ = dcb.o dcb_buffer.o dcb_ets.o dcb_maxrate.o dcb_pfc.o
 TARGETS += dcb
+LDLIBS += -lm
 
 endif
 
diff --git a/include/json_print.h b/include/json_print.h
index 1a1ad5ffa552..6fcf9fd910ec 100644
--- a/include/json_print.h
+++ b/include/json_print.h
@@ -15,6 +15,9 @@
 #include "json_writer.h"
 #include "color.h"
 
+#define _IS_JSON_CONTEXT(type) (is_json_context() && (type & PRINT_JSON || 
type & PRINT_ANY))
+#define _IS_FP_CONTEXT(type)   (!is_json_context() && (type & PRINT_FP || type 
& PRINT_ANY))
+
 json_writer_t *get_json_writer(void);
 
 /*
diff --git a/lib/Makefile b/lib/Makefile
index 764c9137d0ec..6c98f9a61fdb 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -3,8 +3,8 @@ include ../config.mk
 
 CFLAGS += -fPIC
 
-UTILOBJ = utils.o rt_names.o ll_map.o ll_types.o ll_proto.o ll_addr.o \
-   inet_proto.o namespace.o json_writer.o json_print.o \
+UTILOBJ = utils.o utils_math.o rt_names.o ll_map.o ll_types.o ll_proto.o 
ll_addr.o \
+   inet_proto.o namespace.o json_writer.o json_print.o json_print_math.o \
names.o color.o bpf_legacy.o bpf_glue.o exec.o fs.o cg_map.o
 
 ifeq ($(HAVE_ELF),y)
diff --git a/lib/json_print.c b/lib/json_print.c
index b086123ad1f4..994a2f8d6ae0 100644
--- a/lib/json_print.c
+++ b/lib/json_print.c
@@ -11,16 +11,12 @@
 
 #include 
 #include 
-#include 
 
 #include "utils.h"
 #include "json_print.h"
 
 static json_writer_t *_jw;
 
-#define _IS_JSON_CONTEXT(type) ((type & PRINT_JSON || type & PRINT_ANY) && _jw)
-#define _IS_FP_CONTEXT(type) (!_jw && (type & PRINT_FP || type & PRINT_ANY))
-
 static void __new_json_obj(int json, bool have_array)
 {
if (json) {
@@ -342,32 +338,3 @@ int print_color_rate(bool use_iec, enum output_type type, 
enum color_attr color,
free(buf);
return rc;
 }
-
-char *sprint_size(__u32 sz, char *buf)
-{
-   long kilo = 1024;
-   long mega = kilo * kilo;
-   size_t len = SPRINT_BSIZE - 1;
-   double tmp = sz;
-
-   if (sz >= mega && fabs(mega * rint(tmp / mega) - sz) < 1024)
-   snprintf(buf, len, "%gMb", rint(tmp / mega));
-   else if (sz >= kilo && fabs(kilo * rint(tmp / kilo) - sz) < 16)
-   snprintf(buf, len, "%gKb", rint(tmp / kilo));
-   else
-   snprintf(buf, len, "%ub", sz);
-
-   return buf;
-}
-
-int print_color_size(enum output_type type, enum color_attr color,
-const char *key, const char *fmt, __u32 sz)
-{
-   SPRINT_BUF(buf);
-
-   if (_IS_JSON_CONTEXT(type))
-   return print_color_uint(type, color, key, "%u", sz);
-
-   sprint_size(sz, buf);
-   return print_color_string(type, color, key, fmt, buf);
-}
diff --git a/lib/json_print_math.c b/lib/json_print_math.c
new file mode 100644
index ..3d560defcd3e
--- /dev/null
+++ b/lib/json_print_math.c
@@ -0,0 +1,46 @@
+/*
+ * json_print_math.c   "print regular or json output, based on 
json_writer".
+ *
+ *

[PATCH iproute2 v2 1/1] build: Fix link errors on some systems

2021-01-06 Thread Roi Dayan
Since moving get_rate() and get_size() from tc to lib, on some
systems we fail to link because of missing math lib.
Move the functions that require math lib to their own c file
and add -lm to dcb that now use those functions.

../lib/libutil.a(utils.o): In function `get_rate':
utils.c:(.text+0x10dc): undefined reference to `floor'
../lib/libutil.a(utils.o): In function `get_size':
utils.c:(.text+0x1394): undefined reference to `floor'
../lib/libutil.a(json_print.o): In function `sprint_size':
json_print.c:(.text+0x14c0): undefined reference to `rint'
json_print.c:(.text+0x14f4): undefined reference to `rint'
json_print.c:(.text+0x157c): undefined reference to `rint'

Fixes: f3be0e6366ac ("lib: Move get_rate(), get_rate64() from tc here")
Fixes: 44396bdfcc0a ("lib: Move get_size() from tc here")
Fixes: adbe5de96662 ("lib: Move sprint_size() from tc here, add print_size()")

Signed-off-by: Roi Dayan 
---

Notes:
v2
- As suggested by Petr.
  Instead of adding -lm to all utils move the functions that
  require math lib to seperate c files and add -lm to dcb that
  use those functions.

 dcb/Makefile  |   1 +
 include/json_print.h  |   3 +
 lib/Makefile  |   4 +-
 lib/json_print.c  |  33 ---
 lib/json_print_math.c |  46 +++
 lib/utils.c   | 114 
 lib/utils_math.c  | 133 ++
 7 files changed, 185 insertions(+), 149 deletions(-)
 create mode 100644 lib/json_print_math.c
 create mode 100644 lib/utils_math.c

diff --git a/dcb/Makefile b/dcb/Makefile
index 4add954b4bba..7c09bb4f2e00 100644
--- a/dcb/Makefile
+++ b/dcb/Makefile
@@ -7,6 +7,7 @@ ifeq ($(HAVE_MNL),y)
 
 DCBOBJ = dcb.o dcb_buffer.o dcb_ets.o dcb_maxrate.o dcb_pfc.o
 TARGETS += dcb
+LDLIBS += -lm
 
 endif
 
diff --git a/include/json_print.h b/include/json_print.h
index 1a1ad5ffa552..9ec8626c4d85 100644
--- a/include/json_print.h
+++ b/include/json_print.h
@@ -15,6 +15,9 @@
 #include "json_writer.h"
 #include "color.h"
 
+#define _IS_JSON_CONTEXT(type) ((type & PRINT_JSON || type & PRINT_ANY) && 
is_json_context())
+#define _IS_FP_CONTEXT(type) (!is_json_context() && (type & PRINT_FP || type & 
PRINT_ANY))
+
 json_writer_t *get_json_writer(void);
 
 /*
diff --git a/lib/Makefile b/lib/Makefile
index 764c9137d0ec..6c98f9a61fdb 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -3,8 +3,8 @@ include ../config.mk
 
 CFLAGS += -fPIC
 
-UTILOBJ = utils.o rt_names.o ll_map.o ll_types.o ll_proto.o ll_addr.o \
-   inet_proto.o namespace.o json_writer.o json_print.o \
+UTILOBJ = utils.o utils_math.o rt_names.o ll_map.o ll_types.o ll_proto.o 
ll_addr.o \
+   inet_proto.o namespace.o json_writer.o json_print.o json_print_math.o \
names.o color.o bpf_legacy.o bpf_glue.o exec.o fs.o cg_map.o
 
 ifeq ($(HAVE_ELF),y)
diff --git a/lib/json_print.c b/lib/json_print.c
index b086123ad1f4..994a2f8d6ae0 100644
--- a/lib/json_print.c
+++ b/lib/json_print.c
@@ -11,16 +11,12 @@
 
 #include 
 #include 
-#include 
 
 #include "utils.h"
 #include "json_print.h"
 
 static json_writer_t *_jw;
 
-#define _IS_JSON_CONTEXT(type) ((type & PRINT_JSON || type & PRINT_ANY) && _jw)
-#define _IS_FP_CONTEXT(type) (!_jw && (type & PRINT_FP || type & PRINT_ANY))
-
 static void __new_json_obj(int json, bool have_array)
 {
if (json) {
@@ -342,32 +338,3 @@ int print_color_rate(bool use_iec, enum output_type type, 
enum color_attr color,
free(buf);
return rc;
 }
-
-char *sprint_size(__u32 sz, char *buf)
-{
-   long kilo = 1024;
-   long mega = kilo * kilo;
-   size_t len = SPRINT_BSIZE - 1;
-   double tmp = sz;
-
-   if (sz >= mega && fabs(mega * rint(tmp / mega) - sz) < 1024)
-   snprintf(buf, len, "%gMb", rint(tmp / mega));
-   else if (sz >= kilo && fabs(kilo * rint(tmp / kilo) - sz) < 16)
-   snprintf(buf, len, "%gKb", rint(tmp / kilo));
-   else
-   snprintf(buf, len, "%ub", sz);
-
-   return buf;
-}
-
-int print_color_size(enum output_type type, enum color_attr color,
-const char *key, const char *fmt, __u32 sz)
-{
-   SPRINT_BUF(buf);
-
-   if (_IS_JSON_CONTEXT(type))
-   return print_color_uint(type, color, key, "%u", sz);
-
-   sprint_size(sz, buf);
-   return print_color_string(type, color, key, fmt, buf);
-}
diff --git a/lib/json_print_math.c b/lib/json_print_math.c
new file mode 100644
index ..3d560defcd3e
--- /dev/null
+++ b/lib/json_print_math.c
@@ -0,0 +1,46 @@
+/*
+ * json_print_math.c   "print regular or json output, based on 
json_writer".
+ *
+ * This program is free software; you can redistribute it and/or

Re: [PATCH iproute2] build: Fix link errors on some systems

2021-01-06 Thread Roi Dayan




On 2021-01-06 4:24 PM, Petr Machata wrote:


Roi Dayan  writes:


On 2021-01-06 3:16 PM, Petr Machata wrote:

Regarding the publishing, the _jw reference can be changed to a call to
is_json_context(), which does the same thing. Then _jw can stay private
in json_print.c.
Exposing an _IS_JSON_CONTEXT / _IS_FP_CONTEXT might be odd on account of
the initial underscore, but since it's only used in implementations,
maybe it's OK?



With is_json_context() I cannot check the type passed by the caller.
i.e. PRINT_JSON, PRINT_FP, PRINT_ANY.


I meant s/_jw/is_json_context()/. Like this:

#define _IS_JSON_CONTEXT(type) \
 ((type & PRINT_JSON || type & PRINT_ANY) && is_json_context())

Then _jw can stay private.



right. thanks. i'll submit v2 for review.


Re: [PATCH iproute2] build: Fix link errors on some systems

2021-01-06 Thread Roi Dayan




On 2021-01-06 3:16 PM, Petr Machata wrote:


Roi Dayan  writes:


On 2021-01-06 10:42 AM, Roi Dayan wrote:


On 2021-01-04 6:07 PM, Petr Machata wrote:


I think that just adding an unnecessary -lm is more of a tidiness issue
than anything else. One way to avoid it is to split the -lm deps out
from util.c / json_print.c to like util_math.c / json_print_math.c. That
way they will be in an .o of their own, and won't be linked in unless
the binary in question needs the code. Then the binaries that do call it
can keep on linking in -lm like they did so far.

Thoughts?


ok fine by me.


I looked at this and for get_size()/rate/.. it went smooth.
but for print_color_size() there is an issue that it uses
_IS_JSON_CONTEXT and statuic *_jw which are defined in json_print.c
Is it ok to expose those in json_print.h now so json_print_math.c
could use?


You don't need json_print_math.h IMHO, it can all be backed by the same
header, just different implementation modules. From the API point of
view, I don't think the user should really care which of the symbols use
math (though of course they will have to know whether to link in -lm).


right ok.



Regarding the publishing, the _jw reference can be changed to a call to
is_json_context(), which does the same thing. Then _jw can stay private
in json_print.c.

Exposing an _IS_JSON_CONTEXT / _IS_FP_CONTEXT might be odd on account of
the initial underscore, but since it's only used in implementations,
maybe it's OK?



With is_json_context() I cannot check the type passed by the caller.
i.e. PRINT_JSON, PRINT_FP, PRINT_ANY.

From what I see now callers use is_json_context() to decide which print
to use. but in print_color_size() I should check the type to decide
which print.



Re: [PATCH iproute2] build: Fix link errors on some systems

2021-01-06 Thread Roi Dayan




On 2021-01-06 10:42 AM, Roi Dayan wrote:



On 2021-01-04 6:07 PM, Petr Machata wrote:


Roi Dayan  writes:


Since moving get_rate() and get_size() from tc to lib, on some
systems we fail to link because of missing the math lib.
Move the link flag from tc makefile to the main makefile.


Hmm, yeah, it gets optimized out on x86-64. The issue is reproducible
on any platform with -O0.


../lib/libutil.a(utils.o): In function `get_rate':
utils.c:(.text+0x10dc): undefined reference to `floor'
../lib/libutil.a(utils.o): In function `get_size':
utils.c:(.text+0x1394): undefined reference to `floor'
../lib/libutil.a(json_print.o): In function `sprint_size':
json_print.c:(.text+0x14c0): undefined reference to `rint'
json_print.c:(.text+0x14f4): undefined reference to `rint'
json_print.c:(.text+0x157c): undefined reference to `rint'

Fixes: f3be0e6366ac ("lib: Move get_rate(), get_rate64() from tc here")
Fixes: 44396bdfcc0a ("lib: Move get_size() from tc here")
Fixes: adbe5de96662 ("lib: Move sprint_size() from tc here, add 
print_size()")

Signed-off-by: Roi Dayan 
---
  Makefile    | 1 +
  tc/Makefile | 2 +-
  2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index e64c65992585..2a604ec58905 100644
--- a/Makefile
+++ b/Makefile
@@ -59,6 +59,7 @@ SUBDIRS=lib ip tc bridge misc netem genl tipc 
devlink rdma dcb man

  LIBNETLINK=../lib/libutil.a ../lib/libnetlink.a
  LDLIBS += $(LIBNETLINK)
+LDFLAGS += -lm
  all: config.mk
  @set -e; \
diff --git a/tc/Makefile b/tc/Makefile
index 5a517af20b7c..8d91900716c1 100644
--- a/tc/Makefile
+++ b/tc/Makefile
@@ -110,7 +110,7 @@ ifneq ($(TC_CONFIG_NO_XT),y)
  endif
  TCOBJ += $(TCMODULES)
-LDLIBS += -L. -lm
+LDLIBS += -L.
  ifeq ($(SHARED_LIBS),y)
  LDLIBS += -ldl


This will work, but it will give a libm dependency to all the tools.
util.c currently tries not to do that:

/* emulate ceil() without having to bring-in -lm and always be >= 
1 */

*val = t;
if (*val < t)
    *val += 1;

I think that just adding an unnecessary -lm is more of a tidiness issue
than anything else. One way to avoid it is to split the -lm deps out
from util.c / json_print.c to like util_math.c / json_print_math.c. That
way they will be in an .o of their own, and won't be linked in unless
the binary in question needs the code. Then the binaries that do call it
can keep on linking in -lm like they did so far.

Thoughts?



ok fine by me.


I looked at this and for get_size()/rate/.. it went smooth.
but for print_color_size() there is an issue that it uses
_IS_JSON_CONTEXT and statuic *_jw which are defined in json_print.c
Is it ok to expose those in json_print.h now so json_print_math.c
could use?


Re: [PATCH iproute2] build: Fix link errors on some systems

2021-01-06 Thread Roi Dayan




On 2021-01-04 6:07 PM, Petr Machata wrote:


Roi Dayan  writes:


Since moving get_rate() and get_size() from tc to lib, on some
systems we fail to link because of missing the math lib.
Move the link flag from tc makefile to the main makefile.


Hmm, yeah, it gets optimized out on x86-64. The issue is reproducible
on any platform with -O0.


../lib/libutil.a(utils.o): In function `get_rate':
utils.c:(.text+0x10dc): undefined reference to `floor'
../lib/libutil.a(utils.o): In function `get_size':
utils.c:(.text+0x1394): undefined reference to `floor'
../lib/libutil.a(json_print.o): In function `sprint_size':
json_print.c:(.text+0x14c0): undefined reference to `rint'
json_print.c:(.text+0x14f4): undefined reference to `rint'
json_print.c:(.text+0x157c): undefined reference to `rint'

Fixes: f3be0e6366ac ("lib: Move get_rate(), get_rate64() from tc here")
Fixes: 44396bdfcc0a ("lib: Move get_size() from tc here")
Fixes: adbe5de96662 ("lib: Move sprint_size() from tc here, add print_size()")
Signed-off-by: Roi Dayan 
---
  Makefile| 1 +
  tc/Makefile | 2 +-
  2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index e64c65992585..2a604ec58905 100644
--- a/Makefile
+++ b/Makefile
@@ -59,6 +59,7 @@ SUBDIRS=lib ip tc bridge misc netem genl tipc devlink rdma 
dcb man
  
  LIBNETLINK=../lib/libutil.a ../lib/libnetlink.a

  LDLIBS += $(LIBNETLINK)
+LDFLAGS += -lm
  
  all: config.mk

@set -e; \
diff --git a/tc/Makefile b/tc/Makefile
index 5a517af20b7c..8d91900716c1 100644
--- a/tc/Makefile
+++ b/tc/Makefile
@@ -110,7 +110,7 @@ ifneq ($(TC_CONFIG_NO_XT),y)
  endif
  
  TCOBJ += $(TCMODULES)

-LDLIBS += -L. -lm
+LDLIBS += -L.
  
  ifeq ($(SHARED_LIBS),y)

  LDLIBS += -ldl


This will work, but it will give a libm dependency to all the tools.
util.c currently tries not to do that:

/* emulate ceil() without having to bring-in -lm and always be >= 1 */
*val = t;
if (*val < t)
*val += 1;

I think that just adding an unnecessary -lm is more of a tidiness issue
than anything else. One way to avoid it is to split the -lm deps out
from util.c / json_print.c to like util_math.c / json_print_math.c. That
way they will be in an .o of their own, and won't be linked in unless
the binary in question needs the code. Then the binaries that do call it
can keep on linking in -lm like they did so far.

Thoughts?



ok fine by me.


[PATCH iproute2] build: Fix link errors on some systems

2020-12-30 Thread Roi Dayan
Since moving get_rate() and get_size() from tc to lib, on some
systems we fail to link because of missing the math lib.
Move the link flag from tc makefile to the main makefile.

../lib/libutil.a(utils.o): In function `get_rate':
utils.c:(.text+0x10dc): undefined reference to `floor'
../lib/libutil.a(utils.o): In function `get_size':
utils.c:(.text+0x1394): undefined reference to `floor'
../lib/libutil.a(json_print.o): In function `sprint_size':
json_print.c:(.text+0x14c0): undefined reference to `rint'
json_print.c:(.text+0x14f4): undefined reference to `rint'
json_print.c:(.text+0x157c): undefined reference to `rint'

Fixes: f3be0e6366ac ("lib: Move get_rate(), get_rate64() from tc here")
Fixes: 44396bdfcc0a ("lib: Move get_size() from tc here")
Fixes: adbe5de96662 ("lib: Move sprint_size() from tc here, add print_size()")
Signed-off-by: Roi Dayan 
---
 Makefile| 1 +
 tc/Makefile | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index e64c65992585..2a604ec58905 100644
--- a/Makefile
+++ b/Makefile
@@ -59,6 +59,7 @@ SUBDIRS=lib ip tc bridge misc netem genl tipc devlink rdma 
dcb man
 
 LIBNETLINK=../lib/libutil.a ../lib/libnetlink.a
 LDLIBS += $(LIBNETLINK)
+LDFLAGS += -lm
 
 all: config.mk
@set -e; \
diff --git a/tc/Makefile b/tc/Makefile
index 5a517af20b7c..8d91900716c1 100644
--- a/tc/Makefile
+++ b/tc/Makefile
@@ -110,7 +110,7 @@ ifneq ($(TC_CONFIG_NO_XT),y)
 endif
 
 TCOBJ += $(TCMODULES)
-LDLIBS += -L. -lm
+LDLIBS += -L.
 
 ifeq ($(SHARED_LIBS),y)
 LDLIBS += -ldl
-- 
1.8.3.1



Re: [PATCH iproute2 1/1] tc flower: fix parsing vlan_id and vlan_prio

2020-11-24 Thread Roi Dayan




On 2020-11-24 2:26 PM, Roi Dayan wrote:

When protocol is vlan then eth_type is set to the vlan eth type.
So when parsing vlan_id and vlan_prio need to check tc_proto
is vlan and not eth_type.

Fixes: 4c551369e083 ("tc flower: use right ethertype in icmp/arp parsing")
Signed-off-by: Roi Dayan 
---
  tc/f_flower.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tc/f_flower.c b/tc/f_flower.c
index 58e1140d7391..9b278f3c0e83 100644
--- a/tc/f_flower.c
+++ b/tc/f_flower.c
@@ -1432,7 +1432,7 @@ static int flower_parse_opt(struct filter_util *qu, char 
*handle,
__u16 vid;
  
  			NEXT_ARG();

-   if (!eth_type_vlan(eth_type)) {
+   if (!eth_type_vlan(tc_proto)) {
fprintf(stderr, "Can't set \"vlan_id\" if ethertype 
isn't 802.1Q or 802.1AD\n");
return -1;
}
@@ -1446,7 +1446,7 @@ static int flower_parse_opt(struct filter_util *qu, char 
*handle,
__u8 vlan_prio;
  
  			NEXT_ARG();

-   if (!eth_type_vlan(eth_type)) {
+   if (!eth_type_vlan(tc_proto)) {
fprintf(stderr, "Can't set \"vlan_prio\" if 
ethertype isn't 802.1Q or 802.1AD\n");
return -1;
}




sorry should have tagged as iproute2-next.
ignore this i sent with correct tag.


[PATCH iproute2 1/1] tc flower: fix parsing vlan_id and vlan_prio

2020-11-24 Thread Roi Dayan
When protocol is vlan then eth_type is set to the vlan eth type.
So when parsing vlan_id and vlan_prio need to check tc_proto
is vlan and not eth_type.

Fixes: 4c551369e083 ("tc flower: use right ethertype in icmp/arp parsing")
Signed-off-by: Roi Dayan 
---
 tc/f_flower.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tc/f_flower.c b/tc/f_flower.c
index 58e1140d7391..9b278f3c0e83 100644
--- a/tc/f_flower.c
+++ b/tc/f_flower.c
@@ -1432,7 +1432,7 @@ static int flower_parse_opt(struct filter_util *qu, char 
*handle,
__u16 vid;
 
NEXT_ARG();
-   if (!eth_type_vlan(eth_type)) {
+   if (!eth_type_vlan(tc_proto)) {
fprintf(stderr, "Can't set \"vlan_id\" if 
ethertype isn't 802.1Q or 802.1AD\n");
return -1;
}
@@ -1446,7 +1446,7 @@ static int flower_parse_opt(struct filter_util *qu, char 
*handle,
__u8 vlan_prio;
 
NEXT_ARG();
-   if (!eth_type_vlan(eth_type)) {
+   if (!eth_type_vlan(tc_proto)) {
fprintf(stderr, "Can't set \"vlan_prio\" if 
ethertype isn't 802.1Q or 802.1AD\n");
return -1;
}
-- 
2.25.4



[PATCH iproute2-next 1/1] tc flower: fix parsing vlan_id and vlan_prio

2020-11-24 Thread Roi Dayan
When protocol is vlan then eth_type is set to the vlan eth type.
So when parsing vlan_id and vlan_prio need to check tc_proto
is vlan and not eth_type.

Fixes: 4c551369e083 ("tc flower: use right ethertype in icmp/arp parsing")
Signed-off-by: Roi Dayan 
---
 tc/f_flower.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tc/f_flower.c b/tc/f_flower.c
index 58e1140d7391..9b278f3c0e83 100644
--- a/tc/f_flower.c
+++ b/tc/f_flower.c
@@ -1432,7 +1432,7 @@ static int flower_parse_opt(struct filter_util *qu, char 
*handle,
__u16 vid;
 
NEXT_ARG();
-   if (!eth_type_vlan(eth_type)) {
+   if (!eth_type_vlan(tc_proto)) {
fprintf(stderr, "Can't set \"vlan_id\" if 
ethertype isn't 802.1Q or 802.1AD\n");
return -1;
}
@@ -1446,7 +1446,7 @@ static int flower_parse_opt(struct filter_util *qu, char 
*handle,
__u8 vlan_prio;
 
NEXT_ARG();
-   if (!eth_type_vlan(eth_type)) {
+   if (!eth_type_vlan(tc_proto)) {
fprintf(stderr, "Can't set \"vlan_prio\" if 
ethertype isn't 802.1Q or 802.1AD\n");
return -1;
}
-- 
2.25.4



Re: [PATCH iproute2-next v2] tc flower: use right ethertype in icmp/arp parsing

2020-11-24 Thread Roi Dayan




On 2020-11-24 11:39 AM, Roi Dayan wrote:



On 2020-11-14 5:12 AM, David Ahern wrote:

On 11/10/20 12:53 AM, Zahari Doychev wrote:

Currently the icmp and arp parsing functions are called with incorrect
ethtype in case of vlan or cvlan filter options. In this case either
cvlan_ethtype or vlan_ethtype has to be used. The ethtype is now updated
each time a vlan ethtype is matched during parsing.

Signed-off-by: Zahari Doychev 
---
  tc/f_flower.c | 52 +++
  1 file changed, 23 insertions(+), 29 deletions(-)



Thanks, looks much better.

applied to iproute2-next.



Hi,

I didn't debug yet but with this commit I am failing to add a tc
rule I always could before. also the error msg doesn't make sense.

Example:

# tc filter add dev enp8s0f0 protocol 802.1Q parent : prio 1 flower\
  skip_hw dst_mac e4:11:22:11:4a:51 src_mac e4:11:22:11:4a:50\
  vlan_ethtype 0x800 vlan_id 100 vlan_prio 0 action vlan pop action\
  mirred egress redirect dev enp8s0f0_0


Can't set "vlan_id" if ethertype isn't 802.1Q or 802.1AD


I used protocol 802.1Q and vlan_ethtype 0x800.
am i missing something? the rule should look different now?

Thanks,
Roi



Hi,

I debugged this and it break vlan rules.
The issue is from this part of the change


@@ -1464,6 +1464,8 @@ static int flower_parse_opt(struct filter_util 
*qu, char *handle,

 &vlan_ethtype, n);
if (ret < 0)
return -1;
+   /* get new ethtype for later parsing  */
+   eth_type = vlan_ethtype;



Now params vlan_id, vlan_prio check if eth_type is vlan but it's not.
it's 0x0800 as the example as it was set to the vlan_ethtype.

Need to continue check the outer, so tc_proto.
i'll prep a fix commit for review.


Thanks,
Roi


Re: [PATCH iproute2-next v2] tc flower: use right ethertype in icmp/arp parsing

2020-11-24 Thread Roi Dayan




On 2020-11-14 5:12 AM, David Ahern wrote:

On 11/10/20 12:53 AM, Zahari Doychev wrote:

Currently the icmp and arp parsing functions are called with incorrect
ethtype in case of vlan or cvlan filter options. In this case either
cvlan_ethtype or vlan_ethtype has to be used. The ethtype is now updated
each time a vlan ethtype is matched during parsing.

Signed-off-by: Zahari Doychev 
---
  tc/f_flower.c | 52 +++
  1 file changed, 23 insertions(+), 29 deletions(-)



Thanks, looks much better.

applied to iproute2-next.



Hi,

I didn't debug yet but with this commit I am failing to add a tc
rule I always could before. also the error msg doesn't make sense.

Example:

# tc filter add dev enp8s0f0 protocol 802.1Q parent : prio 1 flower\
 skip_hw dst_mac e4:11:22:11:4a:51 src_mac e4:11:22:11:4a:50\
 vlan_ethtype 0x800 vlan_id 100 vlan_prio 0 action vlan pop action\
 mirred egress redirect dev enp8s0f0_0


Can't set "vlan_id" if ethertype isn't 802.1Q or 802.1AD


I used protocol 802.1Q and vlan_ethtype 0x800.
am i missing something? the rule should look different now?

Thanks,
Roi


Re: [PATCH mlx5-next v1 11/11] RDMA/mlx5: Remove IB representors dead code

2020-11-02 Thread Roi Dayan




On 2020-11-01 10:15 PM, Leon Romanovsky wrote:

From: Leon Romanovsky 

Delete dead code.

Signed-off-by: Leon Romanovsky 
---
  drivers/infiniband/hw/mlx5/ib_rep.c | 31 +++--
  drivers/infiniband/hw/mlx5/ib_rep.h | 31 -
  2 files changed, 7 insertions(+), 55 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/ib_rep.c 
b/drivers/infiniband/hw/mlx5/ib_rep.c
index 9810bdd7f3bc..a1a9450ed92c 100644
--- a/drivers/infiniband/hw/mlx5/ib_rep.c
+++ b/drivers/infiniband/hw/mlx5/ib_rep.c
@@ -13,7 +13,7 @@ mlx5_ib_set_vport_rep(struct mlx5_core_dev *dev, struct 
mlx5_eswitch_rep *rep)
struct mlx5_ib_dev *ibdev;
int vport_index;

-   ibdev = mlx5_ib_get_uplink_ibdev(dev->priv.eswitch);
+   ibdev = mlx5_eswitch_uplink_get_proto_dev(dev->priv.eswitch, REP_IB);
vport_index = rep->vport_index;

ibdev->port[vport_index].rep = rep;
@@ -74,6 +74,11 @@ mlx5_ib_vport_rep_load(struct mlx5_core_dev *dev, struct 
mlx5_eswitch_rep *rep)
return ret;
  }

+static void *mlx5_ib_rep_to_dev(struct mlx5_eswitch_rep *rep)
+{
+   return rep->rep_data[REP_IB].priv;
+}
+
  static void
  mlx5_ib_vport_rep_unload(struct mlx5_eswitch_rep *rep)
  {
@@ -91,40 +96,18 @@ mlx5_ib_vport_rep_unload(struct mlx5_eswitch_rep *rep)
__mlx5_ib_remove(dev, dev->profile, MLX5_IB_STAGE_MAX);
  }

-static void *mlx5_ib_vport_get_proto_dev(struct mlx5_eswitch_rep *rep)
-{
-   return mlx5_ib_rep_to_dev(rep);
-}
-
  static const struct mlx5_eswitch_rep_ops rep_ops = {
.load = mlx5_ib_vport_rep_load,
.unload = mlx5_ib_vport_rep_unload,
-   .get_proto_dev = mlx5_ib_vport_get_proto_dev,
+   .get_proto_dev = mlx5_ib_rep_to_dev,
  };

-struct mlx5_ib_dev *mlx5_ib_get_rep_ibdev(struct mlx5_eswitch *esw,
- u16 vport_num)
-{
-   return mlx5_eswitch_get_proto_dev(esw, vport_num, REP_IB);
-}
-
  struct net_device *mlx5_ib_get_rep_netdev(struct mlx5_eswitch *esw,
  u16 vport_num)
  {
return mlx5_eswitch_get_proto_dev(esw, vport_num, REP_ETH);
  }

-struct mlx5_ib_dev *mlx5_ib_get_uplink_ibdev(struct mlx5_eswitch *esw)
-{
-   return mlx5_eswitch_uplink_get_proto_dev(esw, REP_IB);
-}
-
-struct mlx5_eswitch_rep *mlx5_ib_vport_rep(struct mlx5_eswitch *esw,
-  u16 vport_num)
-{
-   return mlx5_eswitch_vport_rep(esw, vport_num);
-}
-
  struct mlx5_flow_handle *create_flow_rule_vport_sq(struct mlx5_ib_dev *dev,
   struct mlx5_ib_sq *sq,
   u16 port)
diff --git a/drivers/infiniband/hw/mlx5/ib_rep.h 
b/drivers/infiniband/hw/mlx5/ib_rep.h
index 93f562735e89..ce1dcb105dbd 100644
--- a/drivers/infiniband/hw/mlx5/ib_rep.h
+++ b/drivers/infiniband/hw/mlx5/ib_rep.h
@@ -12,11 +12,6 @@
  extern const struct mlx5_ib_profile raw_eth_profile;

  #ifdef CONFIG_MLX5_ESWITCH
-struct mlx5_ib_dev *mlx5_ib_get_rep_ibdev(struct mlx5_eswitch *esw,
- u16 vport_num);
-struct mlx5_ib_dev *mlx5_ib_get_uplink_ibdev(struct mlx5_eswitch *esw);
-struct mlx5_eswitch_rep *mlx5_ib_vport_rep(struct mlx5_eswitch *esw,
-  u16 vport_num);
  int mlx5r_rep_init(void);
  void mlx5r_rep_cleanup(void);
  struct mlx5_flow_handle *create_flow_rule_vport_sq(struct mlx5_ib_dev *dev,
@@ -25,26 +20,6 @@ struct mlx5_flow_handle *create_flow_rule_vport_sq(struct 
mlx5_ib_dev *dev,
  struct net_device *mlx5_ib_get_rep_netdev(struct mlx5_eswitch *esw,
  u16 vport_num);
  #else /* CONFIG_MLX5_ESWITCH */
-static inline
-struct mlx5_ib_dev *mlx5_ib_get_rep_ibdev(struct mlx5_eswitch *esw,
- u16 vport_num)
-{
-   return NULL;
-}
-
-static inline
-struct mlx5_ib_dev *mlx5_ib_get_uplink_ibdev(struct mlx5_eswitch *esw)
-{
-   return NULL;
-}
-
-static inline
-struct mlx5_eswitch_rep *mlx5_ib_vport_rep(struct mlx5_eswitch *esw,
-  u16 vport_num)
-{
-   return NULL;
-}
-
  static inline int mlx5r_rep_init(void) { return 0; }
  static inline void mlx5r_rep_cleanup(void) {}
  static inline
@@ -62,10 +37,4 @@ struct net_device *mlx5_ib_get_rep_netdev(struct 
mlx5_eswitch *esw,
return NULL;
  }
  #endif
-
-static inline
-struct mlx5_ib_dev *mlx5_ib_rep_to_dev(struct mlx5_eswitch_rep *rep)
-{
-   return rep->rep_data[REP_IB].priv;
-}
  #endif /* __MLX5_IB_REP_H__ */
--
2.28.0



Reviewed-by: Roi Dayan 


Re: [PATCH mlx5-next v1 10/11] net/mlx5: Simplify eswitch mode check

2020-11-02 Thread Roi Dayan
-3135,7 +3135,7 @@ static void mlx5e_modify_admin_state(struct mlx5_core_dev 
*mdev,

mlx5_set_port_admin_status(mdev, state);

-   if (!MLX5_ESWITCH_MANAGER(mdev) ||  mlx5_eswitch_mode(esw) == 
MLX5_ESWITCH_OFFLOADS)
+   if (mlx5_eswitch_mode(mdev) != MLX5_ESWITCH_LEGACY)
return;

if (state == MLX5_PORT_UP)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index e3a968e9e2a0..7548bab78654 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -271,8 +271,6 @@ mlx5e_tc_match_to_reg_set(struct mlx5_core_dev *mdev,
return 0;
  }

-#define esw_offloads_mode(esw) (mlx5_eswitch_mode(esw) == 
MLX5_ESWITCH_OFFLOADS)
-
  static struct mlx5_tc_ct_priv *
  get_ct_priv(struct mlx5e_priv *priv)
  {
@@ -280,7 +278,7 @@ get_ct_priv(struct mlx5e_priv *priv)
struct mlx5_rep_uplink_priv *uplink_priv;
struct mlx5e_rep_priv *uplink_rpriv;

-   if (esw_offloads_mode(esw)) {
+   if (is_mdev_switchdev_mode(priv->mdev)) {
uplink_rpriv = mlx5_eswitch_get_uplink_priv(esw, REP_ETH);
uplink_priv = &uplink_rpriv->uplink_priv;

@@ -297,7 +295,7 @@ mlx5_tc_rule_insert(struct mlx5e_priv *priv,
  {
struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;

-   if (esw_offloads_mode(esw))
+   if (is_mdev_switchdev_mode(priv->mdev))
return mlx5_eswitch_add_offloaded_rule(esw, spec, attr);

return  mlx5e_add_offloaded_nic_rule(priv, spec, attr);
@@ -310,7 +308,7 @@ mlx5_tc_rule_delete(struct mlx5e_priv *priv,
  {
struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;

-   if (esw_offloads_mode(esw)) {
+   if (is_mdev_switchdev_mode(priv->mdev)) {
mlx5_eswitch_del_offloaded_rule(esw, rule, attr);

return;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c 
b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index b652b4bde733..b44f28fb5518 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -2439,8 +2439,10 @@ int mlx5_eswitch_get_vport_stats(struct mlx5_eswitch 
*esw,
return err;
  }

-u8 mlx5_eswitch_mode(struct mlx5_eswitch *esw)
+u8 mlx5_eswitch_mode(struct mlx5_core_dev *dev)
  {
+   struct mlx5_eswitch *esw = dev->priv.eswitch;
+
return ESW_ALLOWED(esw) ? esw->mode : MLX5_ESWITCH_NONE;
  }
  EXPORT_SYMBOL_GPL(mlx5_eswitch_mode);
diff --git a/include/linux/mlx5/eswitch.h b/include/linux/mlx5/eswitch.h
index b0ae8020f13e..29fd832950e0 100644
--- a/include/linux/mlx5/eswitch.h
+++ b/include/linux/mlx5/eswitch.h
@@ -96,10 +96,10 @@ static inline u32 mlx5_eswitch_get_vport_metadata_mask(void)

  u32 mlx5_eswitch_get_vport_metadata_for_match(struct mlx5_eswitch *esw,
  u16 vport_num);
-u8 mlx5_eswitch_mode(struct mlx5_eswitch *esw);
+u8 mlx5_eswitch_mode(struct mlx5_core_dev *dev);
  #else  /* CONFIG_MLX5_ESWITCH */

-static inline u8 mlx5_eswitch_mode(struct mlx5_eswitch *esw)
+static inline u8 mlx5_eswitch_mode(struct mlx5_core_dev *dev)
  {
return MLX5_ESWITCH_NONE;
  }
@@ -136,4 +136,8 @@ mlx5_eswitch_get_vport_metadata_mask(void)
  }
  #endif /* CONFIG_MLX5_ESWITCH */

+static inline bool is_mdev_switchdev_mode(struct mlx5_core_dev *dev)
+{
+   return mlx5_eswitch_mode(dev) == MLX5_ESWITCH_OFFLOADS;
+}
  #endif
--
2.28.0



Reviewed-by: Roi Dayan 


Re: [PATCH mlx5-next v1 02/11] net/mlx5: Properly convey driver version to firmware

2020-11-02 Thread Roi Dayan




On 2020-11-01 10:15 PM, Leon Romanovsky wrote:

From: Leon Romanovsky 

mlx5 firmware expects driver version in specific format X.X.X, so
make it always correct and based on real kernel version aligned with
the driver.

Fixes: 012e50e109fd ("net/mlx5: Set driver version into firmware")
Signed-off-by: Leon Romanovsky 
---
  drivers/net/ethernet/mellanox/mlx5/core/main.c | 6 +-
  1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 8ff207aa1479..71e210f22f69 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -50,6 +50,7 @@
  #ifdef CONFIG_RFS_ACCEL
  #include 
  #endif
+#include 
  #include 
  #include "mlx5_core.h"
  #include "lib/eq.h"
@@ -233,7 +234,10 @@ static void mlx5_set_driver_version(struct mlx5_core_dev 
*dev)
strncat(string, ",", remaining_size);

remaining_size = max_t(int, 0, driver_ver_sz - strlen(string));
-   strncat(string, DRIVER_VERSION, remaining_size);
+
+   snprintf(string + strlen(string), remaining_size, "%u.%u.%u",
+(u8)(LINUX_VERSION_CODE >> 16), (u8)(LINUX_VERSION_CODE >> 8),
+(u16)(LINUX_VERSION_CODE & 0xff));

/*Send the command*/
MLX5_SET(set_driver_version_in, in, opcode,
--
2.28.0



Reviewed-by: Roi Dayan 


[PATCH net] net/sched: act_ct: Fix adding udp port mangle operation

2020-10-19 Thread Roi Dayan
Need to use the udp header type and not tcp.

Fixes: 9c26ba9b1f45 ("net/sched: act_ct: Instantiate flow table entry actions")
Signed-off-by: Roi Dayan 
Reviewed-by: Paul Blakey 
---
 net/sched/act_ct.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
index a780afdf570d..0bac241a4123 100644
--- a/net/sched/act_ct.c
+++ b/net/sched/act_ct.c
@@ -156,11 +156,11 @@ tcf_ct_flow_table_add_action_nat_udp(const struct 
nf_conntrack_tuple *tuple,
__be16 target_dst = target.dst.u.udp.port;
 
if (target_src != tuple->src.u.udp.port)
-   tcf_ct_add_mangle_action(action, FLOW_ACT_MANGLE_HDR_TYPE_TCP,
+   tcf_ct_add_mangle_action(action, FLOW_ACT_MANGLE_HDR_TYPE_UDP,
 offsetof(struct udphdr, source),
 0x, be16_to_cpu(target_src));
if (target_dst != tuple->dst.u.udp.port)
-   tcf_ct_add_mangle_action(action, FLOW_ACT_MANGLE_HDR_TYPE_TCP,
+   tcf_ct_add_mangle_action(action, FLOW_ACT_MANGLE_HDR_TYPE_UDP,
 offsetof(struct udphdr, dest),
 0x, be16_to_cpu(target_dst));
 }
-- 
2.8.4



[PATCH ethtool] ethtool.spec: Add bash completion script

2020-08-03 Thread Roi Dayan
After the additon of the bash completion script, packaging
using the default spec file fails for installed but not packaged
error. so package it.

Fixes: 9b802643d7bd ("ethtool: Add bash-completion script")
Signed-off-by: Roi Dayan 
---
 ethtool.spec.in | 1 +
 1 file changed, 1 insertion(+)

diff --git a/ethtool.spec.in b/ethtool.spec.in
index 9c01b07abf2b..75f9be6aafa6 100644
--- a/ethtool.spec.in
+++ b/ethtool.spec.in
@@ -34,6 +34,7 @@ make install DESTDIR=${RPM_BUILD_ROOT}
 %defattr(-,root,root)
 %{_sbindir}/ethtool
 %{_mandir}/man8/ethtool.8*
+%{_datadir}/bash-completion/completions/ethtool
 %doc AUTHORS COPYING NEWS README
 
 
-- 
2.8.4



Re: [PATCH net v2 1/2] netfilter: conntrack: Move nf_ct_offload_timeout to header file

2020-08-03 Thread Roi Dayan



On 2020-08-03 2:03 PM, Pablo Neira Ayuso wrote:
> On Mon, Aug 03, 2020 at 10:33:04AM +0300, Roi Dayan wrote:
>> To be used by callers from other modules.
>>
>> Signed-off-by: Roi Dayan 
>> Reviewed-by: Oz Shlomo 
>> ---
>>  include/net/netfilter/nf_conntrack.h | 12 
>>  net/netfilter/nf_conntrack_core.c| 12 
>>  2 files changed, 12 insertions(+), 12 deletions(-)
>>
>> diff --git a/include/net/netfilter/nf_conntrack.h 
>> b/include/net/netfilter/nf_conntrack.h
>> index 90690e37a56f..8481819ff632 100644
>> --- a/include/net/netfilter/nf_conntrack.h
>> +++ b/include/net/netfilter/nf_conntrack.h
>> @@ -279,6 +279,18 @@ static inline bool nf_ct_should_gc(const struct nf_conn 
>> *ct)
>> !nf_ct_is_dying(ct);
>>  }
>>  
>> +#define DAY (86400 * HZ)
>> +
>> +/* Set an arbitrary timeout large enough not to ever expire, this save
>> + * us a check for the IPS_OFFLOAD_BIT from the packet path via
>> + * nf_ct_is_expired().
>> + */
>> +static inline void nf_ct_offload_timeout(struct nf_conn *ct)
>> +{
>> +if (nf_ct_expires(ct) < DAY / 2)
>> +ct->timeout = nfct_time_stamp + DAY;
>> +}
>> +
>>  struct kernel_param;
>>  
> 
> For the record: I have renamed DAY to NF_CT_DAY to avoid a possible
> symbol name clash. No need to resend, I applied this small change
> before applying.
> 

thanks


[PATCH net v2 0/2] netfilter: conntrack: Fix CT offload timeout on heavily loaded systems

2020-08-03 Thread Roi Dayan
On heavily loaded systems the GC can take time to go over all existing
conns and reset their timeout. At that time other calls like from
nf_conntrack_in() can call of nf_ct_is_expired() and see the conn as
expired. To fix this when we set the offload bit we should also reset
the timeout instead of counting on GC to finish first iteration over
all conns before the initial timeout.

First commit is to expose the function that updates the timeout.
Second commit is to use it from flow_offload_add().

Roi Dayan (2):
  netfilter: conntrack: Move nf_ct_offload_timeout to header file
  netfilter: flowtable: Set offload timeout when adding flow

 include/net/netfilter/nf_conntrack.h | 12 
 net/netfilter/nf_conntrack_core.c| 12 
 net/netfilter/nf_flow_table_core.c   |  2 ++
 3 files changed, 14 insertions(+), 12 deletions(-)

-- 
2.8.4



[PATCH net v2 2/2] netfilter: flowtable: Set offload timeout when adding flow

2020-08-03 Thread Roi Dayan
On heavily loaded systems the GC can take time to go over all existing
conns and reset their timeout. At that time other calls like from
nf_conntrack_in() can call of nf_ct_is_expired() and see the conn as
expired. To fix this when we set the offload bit we should also reset
the timeout instead of counting on GC to finish first iteration over
all conns before the initial timeout.

Fixes: 90964016e5d3 ("netfilter: nf_conntrack: add IPS_OFFLOAD status bit")
Signed-off-by: Roi Dayan 
---

Notes:
v2
- timeout fix from flow_offload_add() instead of act_ct

 net/netfilter/nf_flow_table_core.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/netfilter/nf_flow_table_core.c 
b/net/netfilter/nf_flow_table_core.c
index b1eb5272b379..4f7a567c536e 100644
--- a/net/netfilter/nf_flow_table_core.c
+++ b/net/netfilter/nf_flow_table_core.c
@@ -243,6 +243,8 @@ int flow_offload_add(struct nf_flowtable *flow_table, 
struct flow_offload *flow)
return err;
}
 
+   nf_ct_offload_timeout(flow->ct);
+
if (nf_flowtable_hw_offload(flow_table)) {
__set_bit(NF_FLOW_HW, &flow->flags);
nf_flow_offload_add(flow_table, flow);
-- 
2.8.4



[PATCH net v2 1/2] netfilter: conntrack: Move nf_ct_offload_timeout to header file

2020-08-03 Thread Roi Dayan
To be used by callers from other modules.

Signed-off-by: Roi Dayan 
Reviewed-by: Oz Shlomo 
---
 include/net/netfilter/nf_conntrack.h | 12 
 net/netfilter/nf_conntrack_core.c| 12 
 2 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/include/net/netfilter/nf_conntrack.h 
b/include/net/netfilter/nf_conntrack.h
index 90690e37a56f..8481819ff632 100644
--- a/include/net/netfilter/nf_conntrack.h
+++ b/include/net/netfilter/nf_conntrack.h
@@ -279,6 +279,18 @@ static inline bool nf_ct_should_gc(const struct nf_conn 
*ct)
   !nf_ct_is_dying(ct);
 }
 
+#defineDAY (86400 * HZ)
+
+/* Set an arbitrary timeout large enough not to ever expire, this save
+ * us a check for the IPS_OFFLOAD_BIT from the packet path via
+ * nf_ct_is_expired().
+ */
+static inline void nf_ct_offload_timeout(struct nf_conn *ct)
+{
+   if (nf_ct_expires(ct) < DAY / 2)
+   ct->timeout = nfct_time_stamp + DAY;
+}
+
 struct kernel_param;
 
 int nf_conntrack_set_hashsize(const char *val, const struct kernel_param *kp);
diff --git a/net/netfilter/nf_conntrack_core.c 
b/net/netfilter/nf_conntrack_core.c
index 79cd9dde457b..947c6d9437c3 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -1344,18 +1344,6 @@ static bool gc_worker_can_early_drop(const struct 
nf_conn *ct)
return false;
 }
 
-#defineDAY (86400 * HZ)
-
-/* Set an arbitrary timeout large enough not to ever expire, this save
- * us a check for the IPS_OFFLOAD_BIT from the packet path via
- * nf_ct_is_expired().
- */
-static void nf_ct_offload_timeout(struct nf_conn *ct)
-{
-   if (nf_ct_expires(ct) < DAY / 2)
-   ct->timeout = nfct_time_stamp + DAY;
-}
-
 static void gc_worker(struct work_struct *work)
 {
unsigned int min_interval = max(HZ / GC_MAX_BUCKETS_DIV, 1u);
-- 
2.8.4



Re: [PATCH net 2/2] net/sched: act_ct: Set offload timeout when setting the offload bit

2020-08-03 Thread Roi Dayan



On 2020-07-29 8:10 PM, Marcelo Ricardo Leitner wrote:
> On Wed, Jul 29, 2020 at 03:55:53PM +0300, Roi Dayan wrote:
>>
>>
>> On 2020-07-28 5:42 PM, Marcelo Ricardo Leitner wrote:
>>> On Tue, Jul 28, 2020 at 02:57:59PM +0300, Roi Dayan wrote:
>>>> On heavily loaded systems the GC can take time to go over all existing
>>>> conns and reset their timeout. At that time other calls like from
>>>> nf_conntrack_in() can call of nf_ct_is_expired() and see the conn as
>>>> expired. To fix this when we set the offload bit we should also reset
>>>> the timeout instead of counting on GC to finish first iteration over
>>>> all conns before the initial timeout.
>>>>
>>>> Fixes: 64ff70b80fd4 ("net/sched: act_ct: Offload established connections 
>>>> to flow table")
>>>> Signed-off-by: Roi Dayan 
>>>> Reviewed-by: Oz Shlomo 
>>>> ---
>>>>  net/sched/act_ct.c | 2 ++
>>>>  1 file changed, 2 insertions(+)
>>>>
>>>> diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
>>>> index e9f3576cbf71..650c2d78a346 100644
>>>> --- a/net/sched/act_ct.c
>>>> +++ b/net/sched/act_ct.c
>>>> @@ -366,6 +366,8 @@ static void tcf_ct_flow_table_add(struct 
>>>> tcf_ct_flow_table *ct_ft,
>>>
>>> Extra context line:
>>> err = flow_offload_add(&ct_ft->nf_ft, entry);
>>>>if (err)
>>>>goto err_add;
>>>>  
>>>> +  nf_ct_offload_timeout(ct);
>>>> +
>>>
>>> What about adding this to flow_offload_add() instead?
>>> It is already adjusting the flow_offload timeout there and then it
>>> also effective for nft.
>>>
>>
>> As you said, in flow_offload_add() we adjust the flow timeout.
>> Here we adjust the conn timeout.
>> So it's outside flow_offload_add() which only touch the flow struct.
>> I guess it's like conn offload bit is set outside here and for nft.
> 
> Right, but
> 
>> What do you think?
> 
> I don't see why it can't update both. flow_offload_fixup_ct_timeout(),
> called by flow_offload_del(), is updating ct->timeout already. It
> looks consistent to me to update it in _add as well then. 
> 

I don't mind. just add is not consistent with del.
del also clears the ips_offload_bit but add doesn't add it.
i'll send v2 with your suggestion.

>>
>>>>return;
>>>>  
>>>>  err_add:
>>>> -- 
>>>> 2.8.4
>>>>


Re: [PATCH net 2/2] net/sched: act_ct: Set offload timeout when setting the offload bit

2020-07-29 Thread Roi Dayan



On 2020-07-28 5:42 PM, Marcelo Ricardo Leitner wrote:
> On Tue, Jul 28, 2020 at 02:57:59PM +0300, Roi Dayan wrote:
>> On heavily loaded systems the GC can take time to go over all existing
>> conns and reset their timeout. At that time other calls like from
>> nf_conntrack_in() can call of nf_ct_is_expired() and see the conn as
>> expired. To fix this when we set the offload bit we should also reset
>> the timeout instead of counting on GC to finish first iteration over
>> all conns before the initial timeout.
>>
>> Fixes: 64ff70b80fd4 ("net/sched: act_ct: Offload established connections to 
>> flow table")
>> Signed-off-by: Roi Dayan 
>> Reviewed-by: Oz Shlomo 
>> ---
>>  net/sched/act_ct.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
>> index e9f3576cbf71..650c2d78a346 100644
>> --- a/net/sched/act_ct.c
>> +++ b/net/sched/act_ct.c
>> @@ -366,6 +366,8 @@ static void tcf_ct_flow_table_add(struct 
>> tcf_ct_flow_table *ct_ft,
> 
> Extra context line:
>   err = flow_offload_add(&ct_ft->nf_ft, entry);
>>  if (err)
>>  goto err_add;
>>  
>> +nf_ct_offload_timeout(ct);
>> +
> 
> What about adding this to flow_offload_add() instead?
> It is already adjusting the flow_offload timeout there and then it
> also effective for nft.
> 

As you said, in flow_offload_add() we adjust the flow timeout.
Here we adjust the conn timeout.
So it's outside flow_offload_add() which only touch the flow struct.
I guess it's like conn offload bit is set outside here and for nft.
What do you think?

>>  return;
>>  
>>  err_add:
>> -- 
>> 2.8.4
>>


[PATCH net 1/2] netfilter: conntrack: Move nf_ct_offload_timeout to header file

2020-07-28 Thread Roi Dayan
To be used by callers from other modules.

Signed-off-by: Roi Dayan 
Reviewed-by: Oz Shlomo 
---
 include/net/netfilter/nf_conntrack.h | 12 
 net/netfilter/nf_conntrack_core.c| 12 
 2 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/include/net/netfilter/nf_conntrack.h 
b/include/net/netfilter/nf_conntrack.h
index 90690e37a56f..8481819ff632 100644
--- a/include/net/netfilter/nf_conntrack.h
+++ b/include/net/netfilter/nf_conntrack.h
@@ -279,6 +279,18 @@ static inline bool nf_ct_should_gc(const struct nf_conn 
*ct)
   !nf_ct_is_dying(ct);
 }
 
+#defineDAY (86400 * HZ)
+
+/* Set an arbitrary timeout large enough not to ever expire, this save
+ * us a check for the IPS_OFFLOAD_BIT from the packet path via
+ * nf_ct_is_expired().
+ */
+static inline void nf_ct_offload_timeout(struct nf_conn *ct)
+{
+   if (nf_ct_expires(ct) < DAY / 2)
+   ct->timeout = nfct_time_stamp + DAY;
+}
+
 struct kernel_param;
 
 int nf_conntrack_set_hashsize(const char *val, const struct kernel_param *kp);
diff --git a/net/netfilter/nf_conntrack_core.c 
b/net/netfilter/nf_conntrack_core.c
index 79cd9dde457b..947c6d9437c3 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -1344,18 +1344,6 @@ static bool gc_worker_can_early_drop(const struct 
nf_conn *ct)
return false;
 }
 
-#defineDAY (86400 * HZ)
-
-/* Set an arbitrary timeout large enough not to ever expire, this save
- * us a check for the IPS_OFFLOAD_BIT from the packet path via
- * nf_ct_is_expired().
- */
-static void nf_ct_offload_timeout(struct nf_conn *ct)
-{
-   if (nf_ct_expires(ct) < DAY / 2)
-   ct->timeout = nfct_time_stamp + DAY;
-}
-
 static void gc_worker(struct work_struct *work)
 {
unsigned int min_interval = max(HZ / GC_MAX_BUCKETS_DIV, 1u);
-- 
2.8.4



[PATCH net 0/2] netfilter: conntrack: Fix CT offload timeout on heavily loaded systems

2020-07-28 Thread Roi Dayan
On heavily loaded systems the GC can take time to go over all existing
conns and reset their timeout. At that time other calls like from
nf_conntrack_in() can call of nf_ct_is_expired() and see the conn as
expired. To fix this when we set the offload bit we should also reset
the timeout instead of counting on GC to finish first iteration over
all conns before the initial timeout.

First commit is to expose the function that updates the timeout.
Second commit is to use it from act_ct.

Roi Dayan (2):
  netfilter: conntrack: Move nf_ct_offload_timeout to header file
  net/sched: act_ct: Set offload timeout when setting the offload bit

 include/net/netfilter/nf_conntrack.h | 12 
 net/netfilter/nf_conntrack_core.c| 12 
 net/sched/act_ct.c   |  2 ++
 3 files changed, 14 insertions(+), 12 deletions(-)

-- 
2.8.4



[PATCH net 2/2] net/sched: act_ct: Set offload timeout when setting the offload bit

2020-07-28 Thread Roi Dayan
On heavily loaded systems the GC can take time to go over all existing
conns and reset their timeout. At that time other calls like from
nf_conntrack_in() can call of nf_ct_is_expired() and see the conn as
expired. To fix this when we set the offload bit we should also reset
the timeout instead of counting on GC to finish first iteration over
all conns before the initial timeout.

Fixes: 64ff70b80fd4 ("net/sched: act_ct: Offload established connections to 
flow table")
Signed-off-by: Roi Dayan 
Reviewed-by: Oz Shlomo 
---
 net/sched/act_ct.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
index e9f3576cbf71..650c2d78a346 100644
--- a/net/sched/act_ct.c
+++ b/net/sched/act_ct.c
@@ -366,6 +366,8 @@ static void tcf_ct_flow_table_add(struct tcf_ct_flow_table 
*ct_ft,
if (err)
goto err_add;
 
+   nf_ct_offload_timeout(ct);
+
return;
 
 err_add:
-- 
2.8.4



Re: [PATCH net-next v3 1/3] net/mlx5e: Implicitly decap the tunnel packet when necessary

2020-06-21 Thread Roi Dayan



On 2020-06-18 11:36 AM, xiangxia.m@gmail.com wrote:
> From: Tonghao Zhang 
> 
> The commit 0a7fcb78cc21 ("net/mlx5e: Support inner header rewrite with
> goto action"), will decapsulate the tunnel packets if there is a goto
> action in chain 0. But in some case, we don't want do that, for example:
> 
> $ tc filter add dev $VXLAN protocol ip parent : prio 1 chain 0  \
> flower enc_dst_ip 2.2.2.100 enc_dst_port 4789   \
> action goto chain 2
> $ tc filter add dev $VXLAN protocol ip parent : prio 1 chain 2  \
> flower dst_mac 00:11:22:33:44:55 enc_src_ip 2.2.2.200   \
> enc_dst_ip 2.2.2.100 enc_dst_port 4789 enc_key_id 100   \
> action tunnel_key unset action mirred egress redirect dev enp130s0f0_0
> $ tc filter add dev $VXLAN protocol ip parent : prio 1 chain 2  \
> flower dst_mac 00:11:22:33:44:66 enc_src_ip 2.2.2.200   \
> enc_dst_ip 2.2.2.100 enc_dst_port 4789 enc_key_id 200   \
> action tunnel_key unset action mirred egress redirect dev enp130s0f0_1
> 
> If there are pedit and goto actions, do the decapsulate and id mapping action.
> 

Hi Tonghao,

I think you might missed Paul's comments on V2?

Thanks,
Roi

> Signed-off-by: Tonghao Zhang 
> ---
>  .../ethernet/mellanox/mlx5/core/en/mapping.c  |  24 
>  .../ethernet/mellanox/mlx5/core/en/mapping.h  |   1 +
>  .../net/ethernet/mellanox/mlx5/core/en_tc.c   | 109 --
>  3 files changed, 99 insertions(+), 35 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/mapping.c 
> b/drivers/net/ethernet/mellanox/mlx5/core/en/mapping.c
> index ea321e528749..90306dde6b60 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/mapping.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/mapping.c
> @@ -74,6 +74,30 @@ int mapping_add(struct mapping_ctx *ctx, void *data, u32 
> *id)
>   return err;
>  }
>  
> +int mapping_find_by_data(struct mapping_ctx *ctx, void *data, u32 *id)
> +{
> + struct mapping_item *mi;
> + u32 hash_key;
> +
> + mutex_lock(&ctx->lock);
> +
> + hash_key = jhash(data, ctx->data_size, 0);
> + hash_for_each_possible(ctx->ht, mi, node, hash_key) {
> + if (!memcmp(data, mi->data, ctx->data_size))
> + goto found;
> + }
> +
> + mutex_unlock(&ctx->lock);
> + return -ENOENT;
> +
> +found:
> + if (id)
> + *id = mi->id;
> +
> + mutex_unlock(&ctx->lock);
> + return 0;
> +}
> +
>  static void mapping_remove_and_free(struct mapping_ctx *ctx,
>   struct mapping_item *mi)
>  {
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/mapping.h 
> b/drivers/net/ethernet/mellanox/mlx5/core/en/mapping.h
> index 285525cc5470..af501c9796b7 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/mapping.h
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/mapping.h
> @@ -9,6 +9,7 @@ struct mapping_ctx;
>  int mapping_add(struct mapping_ctx *ctx, void *data, u32 *id);
>  int mapping_remove(struct mapping_ctx *ctx, u32 id);
>  int mapping_find(struct mapping_ctx *ctx, u32 id, void *data);
> +int mapping_find_by_data(struct mapping_ctx *ctx, void *data, u32 *id);
>  
>  /* mapping uses an xarray to map data to ids in add(), and for find().
>   * For locking, it uses a internal xarray spin lock for add()/remove(),
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c 
> b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> index 7fc84f58e28a..05f8df8b53af 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> @@ -1836,7 +1836,8 @@ static void mlx5e_tc_del_flow(struct mlx5e_priv *priv,
>   }
>  }
>  
> -static int flow_has_tc_fwd_action(struct flow_cls_offload *f)
> +static int flow_has_tc_action(struct flow_cls_offload *f,
> +   enum flow_action_id action)
>  {
>   struct flow_rule *rule = flow_cls_offload_flow_rule(f);
>   struct flow_action *flow_action = &rule->action;
> @@ -1844,12 +1845,8 @@ static int flow_has_tc_fwd_action(struct 
> flow_cls_offload *f)
>   int i;
>  
>   flow_action_for_each(i, act, flow_action) {
> - switch (act->id) {
> - case FLOW_ACTION_GOTO:
> + if (act->id == action)
>   return true;
> - default:
> - continue;
> - }
>   }
>  
>   return false;
> @@ -1901,10 +1898,37 @@ enc_opts_is_dont_care_or_full_match(struct mlx5e_priv 
> *priv,
>  sizeof(*__dst));\
>  })
>  
> +static void mlx5e_make_tunnel_match_key(struct flow_cls_offload *f,
> + struct net_device *filter_dev,
> + struct tunnel_match_key *tunnel_key)
> +{
> + struct flow_rule *rule = flow_cls_offload_flow_rule(f);
> +
> + memset(tunnel_key, 0, sizeof(*tunnel_key));
> +

[PATCH net 0/2] remove dependency between mlx5, act_ct, nf_flow_table

2020-06-14 Thread Roi Dayan
Hi,

Some exported functions from act_ct and nf_flow_table being used in mlx5_core.
This leads that mlx5 module always require act_ct and nf_flow_table modules.
Those small exported functions can be moved to the header files to
avoid this module dependency.

Thanks,
Roi

Alaa Hleihel (2):
  net/sched: act_ct: Make tcf_ct_flow_table_restore_skb inline
  netfilter: flowtable: Make nf_flow_table_offload_add/del_cb inline

 include/net/netfilter/nf_flow_table.h | 49 ---
 include/net/tc_act/tc_ct.h| 11 +++-
 net/netfilter/nf_flow_table_core.c| 45 
 net/sched/act_ct.c| 11 
 4 files changed, 55 insertions(+), 61 deletions(-)

-- 
2.8.4



[PATCH net 2/2] netfilter: flowtable: Make nf_flow_table_offload_add/del_cb inline

2020-06-14 Thread Roi Dayan
From: Alaa Hleihel 

Currently, nf_flow_table_offload_add/del_cb are exported by nf_flow_table
module, therefore modules using them will have hard-dependency
on nf_flow_table and will require loading it all the time.

This can lead to an unnecessary overhead on systems that do not
use this API.

To relax the hard-dependency between the modules, we unexport these
functions and make them static inline.

Fixes: 978703f42549 ("netfilter: flowtable: Add API for registering to flow 
table events")
Signed-off-by: Alaa Hleihel 
Reviewed-by: Roi Dayan 
---
 include/net/netfilter/nf_flow_table.h | 49 ---
 net/netfilter/nf_flow_table_core.c| 45 
 2 files changed, 45 insertions(+), 49 deletions(-)

diff --git a/include/net/netfilter/nf_flow_table.h 
b/include/net/netfilter/nf_flow_table.h
index c54a7f707e50..8a8f0e64edc3 100644
--- a/include/net/netfilter/nf_flow_table.h
+++ b/include/net/netfilter/nf_flow_table.h
@@ -161,10 +161,51 @@ struct nf_flow_route {
 struct flow_offload *flow_offload_alloc(struct nf_conn *ct);
 void flow_offload_free(struct flow_offload *flow);
 
-int nf_flow_table_offload_add_cb(struct nf_flowtable *flow_table,
-flow_setup_cb_t *cb, void *cb_priv);
-void nf_flow_table_offload_del_cb(struct nf_flowtable *flow_table,
- flow_setup_cb_t *cb, void *cb_priv);
+static inline int
+nf_flow_table_offload_add_cb(struct nf_flowtable *flow_table,
+flow_setup_cb_t *cb, void *cb_priv)
+{
+   struct flow_block *block = &flow_table->flow_block;
+   struct flow_block_cb *block_cb;
+   int err = 0;
+
+   down_write(&flow_table->flow_block_lock);
+   block_cb = flow_block_cb_lookup(block, cb, cb_priv);
+   if (block_cb) {
+   err = -EEXIST;
+   goto unlock;
+   }
+
+   block_cb = flow_block_cb_alloc(cb, cb_priv, cb_priv, NULL);
+   if (IS_ERR(block_cb)) {
+   err = PTR_ERR(block_cb);
+   goto unlock;
+   }
+
+   list_add_tail(&block_cb->list, &block->cb_list);
+
+unlock:
+   up_write(&flow_table->flow_block_lock);
+   return err;
+}
+
+static inline void
+nf_flow_table_offload_del_cb(struct nf_flowtable *flow_table,
+flow_setup_cb_t *cb, void *cb_priv)
+{
+   struct flow_block *block = &flow_table->flow_block;
+   struct flow_block_cb *block_cb;
+
+   down_write(&flow_table->flow_block_lock);
+   block_cb = flow_block_cb_lookup(block, cb, cb_priv);
+   if (block_cb) {
+   list_del(&block_cb->list);
+   flow_block_cb_free(block_cb);
+   } else {
+   WARN_ON(true);
+   }
+   up_write(&flow_table->flow_block_lock);
+}
 
 int flow_offload_route_init(struct flow_offload *flow,
const struct nf_flow_route *route);
diff --git a/net/netfilter/nf_flow_table_core.c 
b/net/netfilter/nf_flow_table_core.c
index 42da6e337276..647680175213 100644
--- a/net/netfilter/nf_flow_table_core.c
+++ b/net/netfilter/nf_flow_table_core.c
@@ -387,51 +387,6 @@ static void nf_flow_offload_work_gc(struct work_struct 
*work)
queue_delayed_work(system_power_efficient_wq, &flow_table->gc_work, HZ);
 }
 
-int nf_flow_table_offload_add_cb(struct nf_flowtable *flow_table,
-flow_setup_cb_t *cb, void *cb_priv)
-{
-   struct flow_block *block = &flow_table->flow_block;
-   struct flow_block_cb *block_cb;
-   int err = 0;
-
-   down_write(&flow_table->flow_block_lock);
-   block_cb = flow_block_cb_lookup(block, cb, cb_priv);
-   if (block_cb) {
-   err = -EEXIST;
-   goto unlock;
-   }
-
-   block_cb = flow_block_cb_alloc(cb, cb_priv, cb_priv, NULL);
-   if (IS_ERR(block_cb)) {
-   err = PTR_ERR(block_cb);
-   goto unlock;
-   }
-
-   list_add_tail(&block_cb->list, &block->cb_list);
-
-unlock:
-   up_write(&flow_table->flow_block_lock);
-   return err;
-}
-EXPORT_SYMBOL_GPL(nf_flow_table_offload_add_cb);
-
-void nf_flow_table_offload_del_cb(struct nf_flowtable *flow_table,
- flow_setup_cb_t *cb, void *cb_priv)
-{
-   struct flow_block *block = &flow_table->flow_block;
-   struct flow_block_cb *block_cb;
-
-   down_write(&flow_table->flow_block_lock);
-   block_cb = flow_block_cb_lookup(block, cb, cb_priv);
-   if (block_cb) {
-   list_del(&block_cb->list);
-   flow_block_cb_free(block_cb);
-   } else {
-   WARN_ON(true);
-   }
-   up_write(&flow_table->flow_block_lock);
-}
-EXPORT_SYMBOL_GPL(nf_flow_table_offload_del_cb);
 
 static int nf_flow_nat_port_tcp(struct sk_buff *skb, unsigned int thoff,
__be16 port, __be16 new_port)
-- 
2.8.4



[PATCH net 1/2] net/sched: act_ct: Make tcf_ct_flow_table_restore_skb inline

2020-06-14 Thread Roi Dayan
From: Alaa Hleihel 

Currently, tcf_ct_flow_table_restore_skb is exported by act_ct
module, therefore modules using it will have hard-dependency
on act_ct and will require loading it all the time.

This can lead to an unnecessary overhead on systems that do not
use hardware connection tracking action (ct_metadata action) in
the first place.

To relax the hard-dependency between the modules, we unexport this
function and make it a static inline one.

Fixes: 30b0cf90c6dd ("net/sched: act_ct: Support restoring conntrack info on 
skbs")
Signed-off-by: Alaa Hleihel 
Reviewed-by: Roi Dayan 
---
 include/net/tc_act/tc_ct.h | 11 ++-
 net/sched/act_ct.c | 11 ---
 2 files changed, 10 insertions(+), 12 deletions(-)

diff --git a/include/net/tc_act/tc_ct.h b/include/net/tc_act/tc_ct.h
index 79654bcb9a29..8250d6f0a462 100644
--- a/include/net/tc_act/tc_ct.h
+++ b/include/net/tc_act/tc_ct.h
@@ -66,7 +66,16 @@ static inline struct nf_flowtable *tcf_ct_ft(const struct 
tc_action *a)
 #endif /* CONFIG_NF_CONNTRACK */
 
 #if IS_ENABLED(CONFIG_NET_ACT_CT)
-void tcf_ct_flow_table_restore_skb(struct sk_buff *skb, unsigned long cookie);
+static inline void
+tcf_ct_flow_table_restore_skb(struct sk_buff *skb, unsigned long cookie)
+{
+   enum ip_conntrack_info ctinfo = cookie & NFCT_INFOMASK;
+   struct nf_conn *ct;
+
+   ct = (struct nf_conn *)(cookie & NFCT_PTRMASK);
+   nf_conntrack_get(&ct->ct_general);
+   nf_ct_set(skb, ct, ctinfo);
+}
 #else
 static inline void
 tcf_ct_flow_table_restore_skb(struct sk_buff *skb, unsigned long cookie) { }
diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
index e29f0f45d688..e9f3576cbf71 100644
--- a/net/sched/act_ct.c
+++ b/net/sched/act_ct.c
@@ -1543,17 +1543,6 @@ static void __exit ct_cleanup_module(void)
destroy_workqueue(act_ct_wq);
 }
 
-void tcf_ct_flow_table_restore_skb(struct sk_buff *skb, unsigned long cookie)
-{
-   enum ip_conntrack_info ctinfo = cookie & NFCT_INFOMASK;
-   struct nf_conn *ct;
-
-   ct = (struct nf_conn *)(cookie & NFCT_PTRMASK);
-   nf_conntrack_get(&ct->ct_general);
-   nf_ct_set(skb, ct, ctinfo);
-}
-EXPORT_SYMBOL_GPL(tcf_ct_flow_table_restore_skb);
-
 module_init(ct_init_module);
 module_exit(ct_cleanup_module);
 MODULE_AUTHOR("Paul Blakey ");
-- 
2.8.4



[PATCH iproute2] ip address: Fix loop initial declarations are only allowed in C99

2020-06-11 Thread Roi Dayan
On some distros, i.e. rhel 7.6, compilation fails with the following:

ipaddress.c: In function ‘lookup_flag_data_by_name’:
ipaddress.c:1260:2: error: ‘for’ loop initial declarations are only allowed in 
C99 mode
  for (int i = 0; i < ARRAY_SIZE(ifa_flag_data); ++i) {
  ^
ipaddress.c:1260:2: note: use option -std=c99 or -std=gnu99 to compile your code

This commit fixes the single place needed for compilation to pass.

Fixes: 9d59c86e575b ("iproute2: ip addr: Organize flag properties structurally")
Signed-off-by: Roi Dayan 
---
 ip/ipaddress.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index 3b53933f4167..f97eaff3dbbf 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -1257,7 +1257,9 @@ static const struct ifa_flag_data_t {
 
 /* Returns a pointer to the data structure for a particular interface flag, or 
null if no flag could be found */
 static const struct ifa_flag_data_t* lookup_flag_data_by_name(const char* 
flag_name) {
-   for (int i = 0; i < ARRAY_SIZE(ifa_flag_data); ++i) {
+   unsigned int i;
+
+   for (i = 0; i < ARRAY_SIZE(ifa_flag_data); ++i) {
if (strcmp(flag_name, ifa_flag_data[i].name) == 0)
return &ifa_flag_data[i];
}
-- 
2.8.4



Re: [PATCH net] netfilter: flowtable: Add pending bit for offload work

2020-05-11 Thread Roi Dayan



On 2020-05-11 2:59 PM, Pablo Neira Ayuso wrote:
> On Mon, May 11, 2020 at 11:32:36AM +0300, Paul Blakey wrote:
>> On 5/11/2020 1:14 AM, Pablo Neira Ayuso wrote:
> [...]
 @@ -831,9 +832,14 @@ static void flow_offload_queue_work(struct 
 flow_offload_work *offload)
  {
struct flow_offload_work *offload;
  
 +  if (test_and_set_bit(NF_FLOW_HW_PENDING, &flow->flags))
 +  return NULL;
>>> In case of stats, it's fine to lose work.
>>>
>>> But how does this work for the deletion case? Does this falls back to
>>> the timeout deletion?
>>
>> We get to nf_flow_table_offload_del (delete) in these cases:
>>
>>> ---if (nf_flow_has_expired(flow) || nf_ct_is_dying(flow->ct) ||
>>> ---    test_bit(NF_FLOW_TEARDOWN, &flow->flags) {
>>> --->---   
>>> --->---    nf_flow_offload_del(flow_table, flow);
>>
>> Which are all persistent once set but the nf_flow_has_expired(flow). So we 
>> will
>> try the delete
>> again and again till pending flag is unset or the flow is 'saved' by the 
>> already
>> queued stats updating the timeout.
>> A pending stats update can't save the flow once it's marked for teardown or
>> (flow->ct is dying), only delay it.
> 
> Thanks for explaining.
> 
>> We didn't mention flush, like in table free. I guess we need to flush the
>> hardware workqueue
>> of any pending stats work, then queue the deletion, and flush again:
>> Adding nf_flow_table_offload_flush(flow_table), after
>> cancel_delayed_work_sync(&flow_table->gc_work);
> 
> The "flush" makes sure that stats work runs before the deletion, to
> ensure no races happen for in-transit work objects, right?
> 
> We might use alloc_ordered_workqueue() and let the workqueue handle
> this problem?
> 

ordered workqueue executes one work at a time.


Re: [PATCH net] netfilter: flowtable: Add pending bit for offload work

2020-05-11 Thread Roi Dayan



On 2020-05-11 2:59 PM, Pablo Neira Ayuso wrote:
> On Mon, May 11, 2020 at 11:32:36AM +0300, Paul Blakey wrote:
>> On 5/11/2020 1:14 AM, Pablo Neira Ayuso wrote:
> [...]
 @@ -831,9 +832,14 @@ static void flow_offload_queue_work(struct 
 flow_offload_work *offload)
  {
struct flow_offload_work *offload;
  
 +  if (test_and_set_bit(NF_FLOW_HW_PENDING, &flow->flags))
 +  return NULL;
>>> In case of stats, it's fine to lose work.
>>>
>>> But how does this work for the deletion case? Does this falls back to
>>> the timeout deletion?
>>
>> We get to nf_flow_table_offload_del (delete) in these cases:
>>
>>> ---if (nf_flow_has_expired(flow) || nf_ct_is_dying(flow->ct) ||
>>> ---    test_bit(NF_FLOW_TEARDOWN, &flow->flags) {
>>> --->---   
>>> --->---    nf_flow_offload_del(flow_table, flow);
>>
>> Which are all persistent once set but the nf_flow_has_expired(flow). So we 
>> will
>> try the delete
>> again and again till pending flag is unset or the flow is 'saved' by the 
>> already
>> queued stats updating the timeout.
>> A pending stats update can't save the flow once it's marked for teardown or
>> (flow->ct is dying), only delay it.
> 
> Thanks for explaining.
> 
>> We didn't mention flush, like in table free. I guess we need to flush the
>> hardware workqueue
>> of any pending stats work, then queue the deletion, and flush again:
>> Adding nf_flow_table_offload_flush(flow_table), after
>> cancel_delayed_work_sync(&flow_table->gc_work);
> 
> The "flush" makes sure that stats work runs before the deletion, to
> ensure no races happen for in-transit work objects, right?
> 
> We might use alloc_ordered_workqueue() and let the workqueue handle
> this problem?
> 

ordered workqueue executed one work at a time.


[PATCH net] netfilter: nf_flow_table_offload: Remove WQ_MEM_RECLAIM from workqueue

2020-05-10 Thread Roi Dayan
This workqueue is in charge of handling offloaded flow tasks like
add/del/stats we should not use WQ_MEM_RECLAIM flag.
The flag can result in the following warning.

[  485.557189] [ cut here ]
[  485.562976] workqueue: WQ_MEM_RECLAIM nf_flow_table_offload:flow_offload_worr
[  485.562985] WARNING: CPU: 7 PID: 3731 at kernel/workqueue.c:2610 check_flush0
[  485.590191] Kernel panic - not syncing: panic_on_warn set ...
[  485.597100] CPU: 7 PID: 3731 Comm: kworker/u112:8 Not tainted 5.7.0-rc1.21802
[  485.606629] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.4.3 01/177
[  485.615487] Workqueue: nf_flow_table_offload flow_offload_work_handler [nf_f]
[  485.624834] Call Trace:
[  485.628077]  dump_stack+0x50/0x70
[  485.632280]  panic+0xfb/0x2d7
[  485.636083]  ? check_flush_dependency+0x110/0x130
[  485.641830]  __warn.cold.12+0x20/0x2a
[  485.646405]  ? check_flush_dependency+0x110/0x130
[  485.652154]  ? check_flush_dependency+0x110/0x130
[  485.657900]  report_bug+0xb8/0x100
[  485.662187]  ? sched_clock_cpu+0xc/0xb0
[  485.666974]  do_error_trap+0x9f/0xc0
[  485.671464]  do_invalid_op+0x36/0x40
[  485.675950]  ? check_flush_dependency+0x110/0x130
[  485.681699]  invalid_op+0x28/0x30

Fixes: 7da182a998d6 ("netfilter: flowtable: Use work entry per offload command")
Reported-by: Marcelo Ricardo Leitner 
Signed-off-by: Roi Dayan 
Reviewed-by: Paul Blakey 
---
 net/netfilter/nf_flow_table_offload.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/nf_flow_table_offload.c 
b/net/netfilter/nf_flow_table_offload.c
index e3b099c14eff..148d3bd11fbc 100644
--- a/net/netfilter/nf_flow_table_offload.c
+++ b/net/netfilter/nf_flow_table_offload.c
@@ -1056,7 +1056,7 @@ static struct flow_indr_block_entry block_ing_entry = {
 int nf_flow_table_offload_init(void)
 {
nf_flow_offload_wq  = alloc_workqueue("nf_flow_table_offload",
- WQ_UNBOUND | WQ_MEM_RECLAIM, 0);
+ WQ_UNBOUND, 0);
if (!nf_flow_offload_wq)
return -ENOMEM;
 
-- 
2.8.4



Re: mlx5: Panic with conntrack offload

2020-05-10 Thread Roi Dayan



On 2020-04-24 7:10 AM, Marcelo Ricardo Leitner wrote:
> Hi Paul,
> 
> I'm triggering this panic out of 1802136023c01075, net-next today
> (disregard the hash at the end of the kernel version).
> I have a dual-port CX5 with VF LAG, with 1 guest and 2 VFs on it, with
> different subnets. Ovs OF flows steering each subnet to each VF and
> doing conntrack. (flows at the bottom here)
> 
> This happens after 2 or 5 netperf runs. I run netperf, wait for the
> flows to expire, then run it again.
> 
> I'm suspecting this was introduced in 9808dd0a2aee ("net/mlx5e: CT: Use
> rhashtable's ct entries instead of a separate list"), but I didn't
> bisect it yet.  I know it also happens with 2fcd80144b93ff908, FWIW.
> 
> Ideas?

+pablo

Hi Marcelo,

I investigated this and see the root.

In nf flow table offload we create a workqueue with flag WQ_MEM_RECLAIM.
Each CT add/del/stats is queued as work to that workqueue.
In del flow, when we reach mlx5 driver, we reach del_sw_flow_group
and want to release an rhashtable (&fg->ftes_hash).
The rhashtable implementation could also queue a work if rehash is needed.
that work is queued on system_wq which is not created with flag
WQ_MEM_RECLAIM. So if we are now in a del flow work and flush and destroy
the ftes_hash rhashtable and there is a rehash work queue to be flushed
first then we reach in check_flush_dependency() and see target_wq is
system_wq which doesn't have WQ_MEM_RECLAIM flag so we dont return right
away and now checking current_wq_worker() is the nf flow table offload
workqueue which has flag WQ_MEM_RECLAIM.
This generates the warn_once() you see.

WARN_ONCE(worker && ((worker->current_pwq->wq->flags &  
  (WQ_MEM_RECLAIM | __WQ_LEGACY)) == 
WQ_MEM_RECLAIM),
  "workqueue: WQ_MEM_RECLAIM %s:%ps is flushing !WQ_MEM_RECLAIM 
%s:%ps",
  worker->current_pwq->wq->name, worker->current_func,  
  target_wq->name, target_func);

It's only avoided if there is also internal flag __WQ_LEGACY which is being
added only for create workqueue wrappers that creates single thread
workqueue. but we do want multi threaded workqueue.

Using WQ_MEM_RECLAIM means all other wqs being used inside a work should
also have WQ_MEM_RECLAIM. because of rhashtable this is not the case.

WQ_MEM_RECLAIM means the wq is guaranteed to have at least one execution
context regardless of memory pressure.

I think we should maybe remove WQ_MEM_RECLAIM from nf flow table offload
workqueue.  Pablo, Oz, Paul ?

Thanks,
Roi


> 
> Thanks,
> Marcelo
> 
> [  485.557189] [ cut here ]
> [  485.562976] workqueue: WQ_MEM_RECLAIM 
> nf_flow_table_offload:flow_offload_worr
> [  485.562985] WARNING: CPU: 7 PID: 3731 at kernel/workqueue.c:2610 
> check_flush0
> [  485.590191] Kernel panic - not syncing: panic_on_warn set ...
> [  485.597100] CPU: 7 PID: 3731 Comm: kworker/u112:8 Not tainted 
> 5.7.0-rc1.21802
> [  485.606629] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.4.3 
> 01/177
> [  485.615487] Workqueue: nf_flow_table_offload flow_offload_work_handler 
> [nf_f]
> [  485.624834] Call Trace:
> [  485.628077]  dump_stack+0x50/0x70
> [  485.632280]  panic+0xfb/0x2d7
> [  485.636083]  ? check_flush_dependency+0x110/0x130
> [  485.641830]  __warn.cold.12+0x20/0x2a
> [  485.646405]  ? check_flush_dependency+0x110/0x130
> [  485.652154]  ? check_flush_dependency+0x110/0x130
> [  485.657900]  report_bug+0xb8/0x100
> [  485.662187]  ? sched_clock_cpu+0xc/0xb0
> [  485.666974]  do_error_trap+0x9f/0xc0
> [  485.671464]  do_invalid_op+0x36/0x40
> [  485.675950]  ? check_flush_dependency+0x110/0x130
> [  485.681699]  invalid_op+0x28/0x30
> [  485.685891] RIP: 0010:check_flush_dependency+0x110/0x130
> [  485.692324] Code: ff ff 48 8b 50 18 48 8d 8b b0 00 00 00 49 89 e8 48 81 c6 
> b0
> [  485.714353] RSP: 0018:a9474aea7a48 EFLAGS: 00010086
> [  485.720724] RAX:  RBX: 912c07c19400 RCX: 
> 
> [  485.729232] RDX: 0090 RSI: af67e1f0 RDI: 
> af67bd2c
> [  485.737737] RBP: ade8f8d0 R08: af67e160 R09: 
> 0002b6c0
> [  485.746240] R10: 017f622837fe R11: 0e93 R12: 
> 9148ad011780
> [  485.754751] R13: 914b3f771700 R14: 0001 R15: 
> 914b30f1c1d0
> [  485.763261]  ? rhashtable_insert_slow+0x470/0x470
> [  485.769056]  ? check_flush_dependency+0x110/0x130
> [  485.774856]  __flush_work+0x96/0x1d0
> [  485.779376]  ? work_busy+0x80/0x80
> [  485.783681]  __cancel_work_timer+0x103/0x190
> [  485.788950]  ? _cond_resched+0x15/0x30
> [  485.793634]  ? _cond_resched+0x15/0x30
> [  485.798321]  ? _cond_resched+0x15/0x30
> [  485.803008]  rhashtable_free_and_destroy+0x20/0x140
> [  485.808979]  del_sw_flow_group+0x45/0x2c0 [mlx5_core]
> [  485.815119]  tree_put_node+0xc3/0x150 [mlx5_core]
> [  485.820893]  mlx5_del_flow_rules+0x11c/0x240 [mlx5_core]
> [  485.827344]

Re: [PATCH net-next 1/3] net/mlx5e: Implicitly decap the tunnel packet when necessary

2020-04-30 Thread Roi Dayan



On 2020-04-28 8:24 AM, xiangxia.m@gmail.com wrote:
> From: Tonghao Zhang 
> 
> The commit 0a7fcb78cc21 ("net/mlx5e: Support inner header rewrite with
> goto action"), will decapitate the tunnel packets if there is a goto
> action in chain 0. But in some case, we don't want do that, for example:

Hi Zhang,

Thanks for your commit. i'll run more tests so i might have some comments later.
Can you use decapsulate instead of decapitate please.

> 
> $ tc filter add dev $VXLAN protocol ip parent : prio 1 chain 0\
>   flower enc_dst_ip 2.2.2.100 enc_dst_port 4789   \
>   action goto chain 2
> $ tc filter add dev $VXLAN protocol ip parent : prio 1 chain 2\
>   flower dst_mac 00:11:22:33:44:55 enc_src_ip 2.2.2.200   \
>   enc_dst_ip 2.2.2.100 enc_dst_port 4789 enc_key_id 100   \
>   action tunnel_key unset action mirred egress redirect dev enp130s0f0_0
> $ tc filter add dev $VXLAN protocol ip parent : prio 1 chain 2\
>   flower dst_mac 00:11:22:33:44:66 enc_src_ip 2.2.2.200   \
>   enc_dst_ip 2.2.2.100 enc_dst_port 4789 enc_key_id 200   \
>   action tunnel_key unset action mirred egress redirect dev enp130s0f0_1
> 
> In this patch, if there is a pedit action in chain, do the decapitation 
> action.

decapsulation

> if there are pedit and goto actions, do the decapitation and id mapping 
> action.
> 
> 8 test units:
> [1]:
> $ tc filter add dev $VXLAN protocol ip parent : prio 1 chain 0\
>   flower enc_dst_ip 2.2.2.100 enc_dst_port 4789   \
>   action goto chain 2
> $ tc filter add dev $VXLAN protocol ip parent : prio 1 chain 2\
>   flower enc_src_ip 2.2.2.200 enc_dst_ip 2.2.2.100\
>   enc_dst_port 4789 enc_key_id 100 dst_mac 00:11:22:33:44:55  \
>   action tunnel_key unset \
>   action mirred egress redirect dev enp130s0f0_0
> [2]:
> $ tc filter add dev $VXLAN protocol ip parent : prio 1 chain 0\
>   flower enc_dst_ip 2.2.2.100 enc_dst_port 4789 enc_key_id 100\
>   action goto chain 2
> $ tc filter add dev $VXLAN protocol ip parent : prio 1 chain 2\
>   flower enc_src_ip 2.2.2.200 enc_dst_ip 2.2.2.100\
>   enc_dst_port 4789 enc_key_id 100 dst_mac 00:11:22:33:44:55  \
>   action pedit ex munge eth src set 00:11:22:33:44:f0 \
>   action mirred egress redirect dev enp130s0f0_0
> [3]:
> $ tc filter add dev $VXLAN protocol ip parent : prio 1 chain 0\
>   flower enc_dst_ip 2.2.2.100 enc_dst_port 4789 enc_key_id 100\
>   action goto chain 2
> $ tc filter add dev $VXLAN protocol ip parent : prio 1 chain 2\
>   flower enc_src_ip 2.2.2.200 enc_dst_ip 2.2.2.100\
>   enc_dst_port 4789 enc_key_id 100 dst_mac 00:11:22:33:44:55  \
>   action tunnel_key unset \
>   action pedit ex munge eth src set 00:11:22:33:44:f0 \
>   action mirred egress redirect dev enp130s0f0_0
> [4]:
> $ tc filter add dev $VXLAN protocol ip parent : prio 1 chain 0\
>   flower enc_dst_ip 2.2.2.100 enc_dst_port 4789   \
>   enc_key_id 100 dst_mac 00:11:22:33:44:55\
>   action pedit ex munge eth src set 00:11:22:33:44:ff pipe\
>   action mirred egress redirect dev enp130s0f0_0

what about the case of only chain 0 without pedit?
I think now we skip matching tun? see comment below in parse_tunnel_attr().

> [5]:
> $ tc filter add dev $VXLAN protocol ip parent : prio 1 chain 0\
>   flower enc_src_ip 2.2.2.200 enc_dst_ip 2.2.2.100\
>   enc_dst_port 4789 enc_key_id 100 dst_mac 00:11:22:33:44:55  \
>   action pedit ex munge eth src set 00:11:22:33:44:ff \
>   action goto chain 2
> $ tc filter add dev $VXLAN protocol ip parent : prio 1 chain 2\
>   flower enc_src_ip 2.2.2.200 enc_dst_ip 2.2.2.100\
>   enc_dst_port 4789 enc_key_id 100 src_mac 00:11:22:33:44:ff  \
>   action tunnel_key unset \
>   action mirred egress redirect dev enp130s0f0_0
> [6]:
> $ tc filter add dev $VXLAN protocol ip parent : prio 1 chain 0\
>   flower enc_src_ip 2.2.2.200 enc_dst_ip 2.2.2.100\
>   enc_dst_port 4789 enc_key_id 100 dst_mac 00:11:22:33:44:55  \
>   action pedit ex munge eth src set 00:11:22:33:44:ff \
>   action goto chain 2
> $ tc filter add dev $VXLAN protocol ip parent : prio 1 chain 2\
>   flower enc_src_ip 2.2.2.200 enc_dst_ip 2.2.2.100\
>   enc_dst_port 4789 enc_key_id 100 src_mac 00:11:22:33:44:ff  \
>   action tunnel_key unset \
>   action pedit ex munge eth src set 00:11:22:33:44:f0 \
>   action mirred egress redirect dev enp130s0f0_0
> [7]:
> $ tc filter add dev $VXLAN protocol ip parent : prio 1 chain 0\
>   flower  enc_src_ip 2.2.2.200 

Re: [PATCH net-next 3/3] net/mlx5e: Fix the code style

2020-04-30 Thread Roi Dayan



On 2020-04-28 8:24 AM, xiangxia.m@gmail.com wrote:
> From: Tonghao Zhang 
> 
> Signed-off-by: Tonghao Zhang 
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c 
> b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
> index 55457f268495..6b68f47e7024 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
> @@ -1367,7 +1367,7 @@ static bool mlx5e_rep_has_offload_stats(const struct 
> net_device *dev, int attr_i
>  {
>   switch (attr_id) {
>   case IFLA_OFFLOAD_XSTATS_CPU_HIT:
> - return true;
> + return true;
>   }
>  
>   return false;
> 


Reviewed-by: Roi Dayan 


Re: [PATCH net-next 2/3] net/mlx5e: Introduce mlx5e_eswitch_rep_uplink_priv helper

2020-04-30 Thread Roi Dayan



On 2020-04-28 8:24 AM, xiangxia.m@gmail.com wrote:
> From: Tonghao Zhang 
> 
> Introduce the mlx5e_eswitch_rep_uplink_priv helper
> to make the codes readable.
> 
> Signed-off-by: Tonghao Zhang 
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c |  4 +--
>  drivers/net/ethernet/mellanox/mlx5/core/en_rep.h   |  9 +++
>  drivers/net/ethernet/mellanox/mlx5/core/en_tc.c| 30 
> +-
>  3 files changed, 17 insertions(+), 26 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c 
> b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
> index 16416eaac39e..c5d5e69ff147 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
> @@ -100,10 +100,8 @@ struct mlx5_ct_entry {
>  {
>   struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
>   struct mlx5_rep_uplink_priv *uplink_priv;
> - struct mlx5e_rep_priv *uplink_rpriv;
>  
> - uplink_rpriv = mlx5_eswitch_get_uplink_priv(esw, REP_ETH);
> - uplink_priv = &uplink_rpriv->uplink_priv;
> + uplink_priv = mlx5e_eswitch_rep_uplink_priv(esw);
>   return uplink_priv->ct_priv;
>  }
>  
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.h 
> b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.h
> index 6a2337900420..899ffa0872b7 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.h
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.h
> @@ -109,6 +109,15 @@ struct mlx5e_rep_priv *mlx5e_rep_to_rep_priv(struct 
> mlx5_eswitch_rep *rep)
>   return rep->rep_data[REP_ETH].priv;
>  }
>  
> +static inline struct mlx5_rep_uplink_priv *
> +mlx5e_eswitch_rep_uplink_priv(struct mlx5_eswitch *esw)
> +{
> + struct mlx5e_rep_priv *priv;
> +
> + priv = mlx5_eswitch_get_uplink_priv(esw, REP_ETH);
> + return &priv->uplink_priv;
> +}
> +

since its in en_rep.h, please use "mlx5e_rep_" prefix.
thanks

>  struct mlx5e_neigh {
>   struct net_device *dev;
>   union {
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c 
> b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> index 64f5c3f3dbb3..696544e2a25b 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> @@ -1250,12 +1250,10 @@ static void unready_flow_del(struct mlx5e_tc_flow 
> *flow)
>  static void add_unready_flow(struct mlx5e_tc_flow *flow)
>  {
>   struct mlx5_rep_uplink_priv *uplink_priv;
> - struct mlx5e_rep_priv *rpriv;
>   struct mlx5_eswitch *esw;
>  
>   esw = flow->priv->mdev->priv.eswitch;
> - rpriv = mlx5_eswitch_get_uplink_priv(esw, REP_ETH);
> - uplink_priv = &rpriv->uplink_priv;
> + uplink_priv = mlx5e_eswitch_rep_uplink_priv(esw);
>  
>   mutex_lock(&uplink_priv->unready_flows_lock);
>   unready_flow_add(flow, &uplink_priv->unready_flows);
> @@ -1265,12 +1263,10 @@ static void add_unready_flow(struct mlx5e_tc_flow 
> *flow)
>  static void remove_unready_flow(struct mlx5e_tc_flow *flow)
>  {
>   struct mlx5_rep_uplink_priv *uplink_priv;
> - struct mlx5e_rep_priv *rpriv;
>   struct mlx5_eswitch *esw;
>  
>   esw = flow->priv->mdev->priv.eswitch;
> - rpriv = mlx5_eswitch_get_uplink_priv(esw, REP_ETH);
> - uplink_priv = &rpriv->uplink_priv;
> + uplink_priv = mlx5e_eswitch_rep_uplink_priv(esw);
>  
>   mutex_lock(&uplink_priv->unready_flows_lock);
>   unready_flow_del(flow);
> @@ -1888,7 +1884,6 @@ static int mlx5e_get_flow_tunnel_id(struct mlx5e_priv 
> *priv,
>   struct mlx5e_tc_mod_hdr_acts *mod_hdr_acts;
>   struct flow_match_enc_opts enc_opts_match;
>   struct mlx5_rep_uplink_priv *uplink_priv;
> - struct mlx5e_rep_priv *uplink_rpriv;
>   struct tunnel_match_key tunnel_key;
>   bool enc_opts_is_dont_care = true;
>   u32 tun_id, enc_opts_id = 0;
> @@ -1897,8 +1892,7 @@ static int mlx5e_get_flow_tunnel_id(struct mlx5e_priv 
> *priv,
>   int err;
>  
>   esw = priv->mdev->priv.eswitch;
> - uplink_rpriv = mlx5_eswitch_get_uplink_priv(esw, REP_ETH);
> - uplink_priv = &uplink_rpriv->uplink_priv;
> + uplink_priv = mlx5e_eswitch_rep_uplink_priv(esw);
>  
>   mlx5e_make_tunnel_match_key(f, filter_dev, &tunnel_key);
>   err = mapping_add(uplink_priv->tunnel_mapping, &tunnel_key, &tun_id);
> @@ -1957,13 +1951,11 @@ static int mlx5e_lookup_flow_tunnel_id(struct 
> mlx5e_priv *priv,
>  u32 *tun_id)
>  {
>   struct mlx5_rep_uplink_priv *uplink_priv;
> - struct mlx5e_rep_priv *uplink_rpriv;
>   struct tunnel_match_key tunnel_key;
>   struct mlx5_eswitch *esw;
>  
>   esw = priv->mdev->priv.eswitch;
> - uplink_rpriv = mlx5_eswitch_get_uplink_priv(esw, REP_ETH);
> - uplink_priv = &uplink_rpriv->uplink_priv;
> + uplink_priv = mlx5e_eswitch_rep_uplink_priv(esw);
>  
>   mlx5e_make_tunnel_match_key(f, filter_dev, &tunnel_key);
>   return mapping_find_by_data(uplink_p

Re: [patch net-next v2] mlx5: Add missing init_net check in FIB notifier

2019-09-01 Thread Roi Dayan


On 2019-08-30 11:25 AM, Jiri Pirko wrote:
> From: Jiri Pirko 
> 
> Take only FIB events that are happening in init_net into account. No other
> namespaces are supported.
> 
> Signed-off-by: Jiri Pirko 
> ---
> v1->v2:
> - no change, just cced maintainers (fat finger made me avoid them in v1)
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/lag_mp.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag_mp.c 
> b/drivers/net/ethernet/mellanox/mlx5/core/lag_mp.c
> index e69766393990..5d20d615663e 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/lag_mp.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/lag_mp.c
> @@ -248,6 +248,9 @@ static int mlx5_lag_fib_event(struct notifier_block *nb,
>   struct net_device *fib_dev;
>   struct fib_info *fi;
>  
> + if (!net_eq(info->net, &init_net))
> + return NOTIFY_DONE;
> +
>   if (info->family != AF_INET)
>   return NOTIFY_DONE;
>  
> 

thanks

Acked-by: Roi Dayan 


Re: [PATCH net-next] net/mlx5e: Allow dropping specific tunnel packets

2019-08-01 Thread Roi Dayan


On 2019-08-01 11:40 AM, xiangxia.m@gmail.com wrote:
> From: Tonghao Zhang 
> 
> In some case, we don't want to allow specific tunnel packets
> to host that can avoid to take up high CPU (e.g network attacks).
> But other tunnel packets which not matched in hardware will be
> sent to host too.
> 
> $ tc filter add dev vxlan_sys_4789 \
>   protocol ip chain 0 parent : prio 1 handle 1 \
>   flower dst_ip 1.1.1.100 ip_proto tcp dst_port 80 \
>   enc_dst_ip 2.2.2.100 enc_key_id 100 enc_dst_port 4789 \
>   action tunnel_key unset pipe action drop
> 
> Signed-off-by: Tonghao Zhang 
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c 
> b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> index f3ed028..25d423e 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> @@ -2485,7 +2485,8 @@ static bool actions_match_supported(struct mlx5e_priv 
> *priv,
>  
>   if (flow_flag_test(flow, EGRESS) &&
>   !((actions & MLX5_FLOW_CONTEXT_ACTION_DECAP) ||
> -   (actions & MLX5_FLOW_CONTEXT_ACTION_VLAN_POP)))
> +   (actions & MLX5_FLOW_CONTEXT_ACTION_VLAN_POP) ||
> +   (actions & MLX5_FLOW_CONTEXT_ACTION_DROP)))
>   return false;
>  
>   if (actions & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR)
> 

thanks!

Reviewed-by: Roi Dayan 


Re: Bug or mis configuration for mlx5e lag and multipath

2019-05-28 Thread Roi Dayan


On 24/05/2019 04:02, wenxu wrote:
> Hi,
> 
> I can get the right log from demsg
> 
>  mlx5_core :81:00.0: modify lag map port 1:1 port 2:2
> 
> 
> I debug with the driver, I find the rule be add on mlx_pf0vf0 and the peer 
> one pf1,
> 
> So I think the esw0 and esw1 both have the rule.
> 
> The test case is based on the master branch of the net git tree.

ok thanks for reporting. we'll have to check it.

> 
> 在 2019/5/23 23:15, Roi Dayan 写道:
>>
>> On 20/05/2019 04:53, wenxu wrote:
>>> Hi Roi & Saeed,
>>>
>>> I just test the mlx5e lag and mutipath feature. There are some suituation 
>>> the outgoing can't be offloaded.
>>>
>>> ovs configureation as following.
>>>
>>> # ovs-vsctl show
>>> dfd71dfb-6e22-423e-b088-d2022103af6b
>>>     Bridge "br0"
>>>     Port "mlx_pf0vf0"
>>>     Interface "mlx_pf0vf0"
>>>     Port gre
>>>     Interface gre
>>>     type: gre
>>>     options: {key="1000", local_ip="172.168.152.75", 
>>> remote_ip="172.168.152.241"}
>>>     Port "br0"
>>>     Interface "br0"
>>>     type: internal
>>>
>>> set the mlx5e driver:
>>>
>>>
>>> modprobe mlx5_core
>>> echo 0 > /sys/class/net/eth2/device/sriov_numvfs
>>> echo 0 > /sys/class/net/eth3/device/sriov_numvfs
>>> echo 2 > /sys/class/net/eth2/device/sriov_numvfs
>>> echo 2 > /sys/class/net/eth3/device/sriov_numvfs
>>> lspci -nn | grep Mellanox
>>> echo :81:00.2 > /sys/bus/pci/drivers/mlx5_core/unbind
>>> echo :81:00.3 > /sys/bus/pci/drivers/mlx5_core/unbind
>>> echo :81:03.6 > /sys/bus/pci/drivers/mlx5_core/unbind
>>> echo :81:03.7 > /sys/bus/pci/drivers/mlx5_core/unbind
>>>
>>> devlink dev eswitch set pci/:81:00.0  mode switchdev encap enable
>>> devlink dev eswitch set pci/:81:00.1  mode switchdev encap enable
>>>
>>> modprobe bonding mode=802.3ad miimon=100 lacp_rate=1
>>> ip l del dev bond0
>>> ifconfig mlx_p0 down
>>> ifconfig mlx_p1 down
>>> ip l add dev bond0 type bond mode 802.3ad
>>> ifconfig bond0 172.168.152.75/24 up
>>> echo 1 > /sys/class/net/bond0/bonding/xmit_hash_policy
>>> ip l set dev mlx_p0 master bond0
>>> ip l set dev mlx_p1 master bond0
>>> ifconfig mlx_p0 up
>>> ifconfig mlx_p1 up
>>>
>>> systemctl start openvswitch
>>> ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
>>> systemctl restart openvswitch
>>>
>>>
>>> mlx_pf0vf0 is assigned to vm. The tc rule show in_hw
>>>
>>> # tc filter ls dev mlx_pf0vf0 ingress
>>> filter protocol ip pref 2 flower
>>> filter protocol ip pref 2 flower handle 0x1
>>>   dst_mac 8e:c0:bd:bf:72:c3
>>>   src_mac 52:54:00:00:12:75
>>>   eth_type ipv4
>>>   ip_tos 0/3
>>>   ip_flags nofrag
>>>   in_hw
>>>     action order 1: tunnel_key set
>>>     src_ip 172.168.152.75
>>>     dst_ip 172.168.152.241
>>>     key_id 1000 pipe
>>>     index 2 ref 1 bind 1
>>>  
>>>     action order 2: mirred (Egress Redirect to device gre_sys) stolen
>>>  index 2 ref 1 bind 1
>>>
>>> In the vm:  the mlx5e driver enable xps default (by the way I think it is 
>>> better not enable xps in default kernel can select queue by each flow),  in 
>>> the lag mode different vf queue associate with hw PF.
>>>
>>> with command taskset -c 2 ping 10.0.0.241
>>>
>>> the packet can be offloaded , the outgoing pf is mlx_p0
>>>
>>> but with command taskset -c 1 ping 10.0.0.241
>>>
>>> the packet can't be offloaded, I can capture the packet on the mlx_pf0vf0, 
>>> the outgoing pf is mlx_p1. Althrough the tc flower rule show in_hw
>>>
>>>
>>> I check with the driver  both mlx_pf0vf0 and peer(mlx_p1) install the tc 
>>> rule success
>>>
>>> I think it's a problem of lag mode. Or I miss some configureation?
>>>
>>>
>>> BR
>>>
>>> wenxu
>>>
>>>
>>>
>>>
>>>
>> Hi,
>>
>> we need to verify the driver detected to be in lag mode and
>> duplicated the offload rule to both eswitches.
>> do you see lag map messages in dmesg?
>> something like "lag map port 1:1 port 2:2"
>> this is to make sure the driver actually in lag mode.
>> in this mode a rule added to mlx_pf0vf0 will be added to esw of pf0 and esw 
>> of pf1.
>> then when u send a packet it could be handled in esw0 or esw1
>> if the rule is not in esw1 then it wont be offloaded when using pf1.
>>
>> thanks,
>> Roi


Re: Bug or mis configuration for mlx5e lag and multipath

2019-05-23 Thread Roi Dayan


On 20/05/2019 04:53, wenxu wrote:
> Hi Roi & Saeed,
> 
> I just test the mlx5e lag and mutipath feature. There are some suituation the 
> outgoing can't be offloaded.
> 
> ovs configureation as following.
> 
> # ovs-vsctl show
> dfd71dfb-6e22-423e-b088-d2022103af6b
>     Bridge "br0"
>     Port "mlx_pf0vf0"
>     Interface "mlx_pf0vf0"
>     Port gre
>     Interface gre
>     type: gre
>     options: {key="1000", local_ip="172.168.152.75", 
> remote_ip="172.168.152.241"}
>     Port "br0"
>     Interface "br0"
>     type: internal
> 
> set the mlx5e driver:
> 
> 
> modprobe mlx5_core
> echo 0 > /sys/class/net/eth2/device/sriov_numvfs
> echo 0 > /sys/class/net/eth3/device/sriov_numvfs
> echo 2 > /sys/class/net/eth2/device/sriov_numvfs
> echo 2 > /sys/class/net/eth3/device/sriov_numvfs
> lspci -nn | grep Mellanox
> echo :81:00.2 > /sys/bus/pci/drivers/mlx5_core/unbind
> echo :81:00.3 > /sys/bus/pci/drivers/mlx5_core/unbind
> echo :81:03.6 > /sys/bus/pci/drivers/mlx5_core/unbind
> echo :81:03.7 > /sys/bus/pci/drivers/mlx5_core/unbind
> 
> devlink dev eswitch set pci/:81:00.0  mode switchdev encap enable
> devlink dev eswitch set pci/:81:00.1  mode switchdev encap enable
> 
> modprobe bonding mode=802.3ad miimon=100 lacp_rate=1
> ip l del dev bond0
> ifconfig mlx_p0 down
> ifconfig mlx_p1 down
> ip l add dev bond0 type bond mode 802.3ad
> ifconfig bond0 172.168.152.75/24 up
> echo 1 > /sys/class/net/bond0/bonding/xmit_hash_policy
> ip l set dev mlx_p0 master bond0
> ip l set dev mlx_p1 master bond0
> ifconfig mlx_p0 up
> ifconfig mlx_p1 up
> 
> systemctl start openvswitch
> ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
> systemctl restart openvswitch
> 
> 
> mlx_pf0vf0 is assigned to vm. The tc rule show in_hw
> 
> # tc filter ls dev mlx_pf0vf0 ingress
> filter protocol ip pref 2 flower
> filter protocol ip pref 2 flower handle 0x1
>   dst_mac 8e:c0:bd:bf:72:c3
>   src_mac 52:54:00:00:12:75
>   eth_type ipv4
>   ip_tos 0/3
>   ip_flags nofrag
>   in_hw
>     action order 1: tunnel_key set
>     src_ip 172.168.152.75
>     dst_ip 172.168.152.241
>     key_id 1000 pipe
>     index 2 ref 1 bind 1
>  
>     action order 2: mirred (Egress Redirect to device gre_sys) stolen
>  index 2 ref 1 bind 1
> 
> In the vm:  the mlx5e driver enable xps default (by the way I think it is 
> better not enable xps in default kernel can select queue by each flow),  in 
> the lag mode different vf queue associate with hw PF.
> 
> with command taskset -c 2 ping 10.0.0.241
> 
> the packet can be offloaded , the outgoing pf is mlx_p0
> 
> but with command taskset -c 1 ping 10.0.0.241
> 
> the packet can't be offloaded, I can capture the packet on the mlx_pf0vf0, 
> the outgoing pf is mlx_p1. Althrough the tc flower rule show in_hw
> 
> 
> I check with the driver  both mlx_pf0vf0 and peer(mlx_p1) install the tc rule 
> success
> 
> I think it's a problem of lag mode. Or I miss some configureation?
> 
> 
> BR
> 
> wenxu
> 
> 
> 
> 
> 

Hi,

we need to verify the driver detected to be in lag mode and
duplicated the offload rule to both eswitches.
do you see lag map messages in dmesg?
something like "lag map port 1:1 port 2:2"
this is to make sure the driver actually in lag mode.
in this mode a rule added to mlx_pf0vf0 will be added to esw of pf0 and esw of 
pf1.
then when u send a packet it could be handled in esw0 or esw1
if the rule is not in esw1 then it wont be offloaded when using pf1.

thanks,
Roi


Re: [PATCH] net/mlx5e: Add bonding device for indr block to offload the packet received from bonding device

2019-05-19 Thread Roi Dayan


On 17/05/2019 11:21, we...@ucloud.cn wrote:
> From: wenxu 
> 
> The mlx5e support the lag mode. When add mlx_p0 and mlx_p1 to bond0.
> packet received from mlx_p0 or mlx_p1 and in the ingress tc flower
> forward to vf0. The tc rule can't be offloaded for the non indr
> rejistor block for the bonding device.
> 
> Signed-off-by: wenxu 
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c 
> b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
> index 91e24f1..134fa0b 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
> @@ -796,6 +796,7 @@ static int mlx5e_nic_rep_netdevice_event(struct 
> notifier_block *nb,
>   struct net_device *netdev = netdev_notifier_info_to_dev(ptr);
>  
>   if (!mlx5e_tc_tun_device_to_offload(priv, netdev) &&
> + !netif_is_bond_master(netdev) &&
>   !is_vlan_dev(netdev))
>   return NOTIFY_OK;
>  
> 

hmm. is this the only thing blocked you from offloading indirect from bond?
when u add the rule from bond like this this also need to make sure the rule
is duplicated to both eswitches in is_peer_flow_needed().

today we support bond offloading by using shared tc block.
i.e. you add a shared tc block to bond and its slaves.
then all rules you add to the shared block instead of a specific interface.

this is also how OVS currently offload the rules with bond.
when u add a bond port, ovs adds a shared tc block to the slaves.
then new rules added to the tc block instead of the bond interface.



Re: [PATCH] net/mlx5e: restrict the real_dev of vlan device is the same as uplink device

2019-05-19 Thread Roi Dayan


On 18/05/2019 06:10, wenxu wrote:
> There will be multiple vlan device which maybe not belong to the uplink rep 
> device, so wen can limit it
> 
> 在 2019/5/18 4:30, Saeed Mahameed 写道:
>> On Wed, 2019-05-15 at 17:25 +0800, we...@ucloud.cn wrote:
>>> From: wenxu 
>>>
>>> When register indr block for vlan device, it should check the
>>> real_dev
>>> of vlan device is same as uplink device. Or it will set offload rule
>>> to mlx5e which will never hit.
>>>
>> I would improve the commit message, it is not really clear to me what
>> is going on here.
>>
>> Anyway Roi and team, can you please provide feedback ..
>>
>>> Signed-off-by: wenxu 
>>> ---
>>>  drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
>>> b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
>>> index 91e24f1..a39fdac 100644
>>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
>>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
>>> @@ -796,7 +796,7 @@ static int mlx5e_nic_rep_netdevice_event(struct
>>> notifier_block *nb,
>>> struct net_device *netdev = netdev_notifier_info_to_dev(ptr);
>>>  
>>> if (!mlx5e_tc_tun_device_to_offload(priv, netdev) &&
>>> -   !is_vlan_dev(netdev))
>>> +   !(is_vlan_dev(netdev) && vlan_dev_real_dev(netdev) ==
>>> rpriv->netdev))
>>> return NOTIFY_OK;
>>>  
>>> switch (event) {

thanks!

you should add a fixes line
Fixes: 35a605db168c ("net/mlx5e: Offload TC e-switch rules with ingress VLAN 
device")

beside that all good.
Reviewed-by: Roi Dayan 




Re: [PATCH] net/mlx5e: Allow matching only enc_key_id/enc_dst_port for decapsulation action

2019-05-12 Thread Roi Dayan


On 06/05/2019 21:28, xiangxia.m@gmail.com wrote:
> From: Tonghao Zhang 
> 
> In some case, we don't care the enc_src_ip and enc_dst_ip, and
> if we don't match the field enc_src_ip and enc_dst_ip, we can use
> fewer flows in hardware when revice the tunnel packets. For example,
> the tunnel packets may be sent from different hosts, we must offload
> one rule for each host.
> 
>   $ tc filter add dev vxlan0 protocol ip parent : prio 1 \
>   flower dst_mac 00:11:22:33:44:00 \
>   enc_src_ip Host0_IP enc_dst_ip 2.2.2.100 \
>   enc_dst_port 4789 enc_key_id 100 \
>   action tunnel_key unset action mirred egress redirect dev eth0_1
> 
>   $ tc filter add dev vxlan0 protocol ip parent : prio 1 \
>   flower dst_mac 00:11:22:33:44:00 \
>   enc_src_ip Host1_IP enc_dst_ip 2.2.2.100 \
>   enc_dst_port 4789 enc_key_id 100 \
>   action tunnel_key unset action mirred egress redirect dev eth0_1
> 
> If we support flows which only match the enc_key_id and enc_dst_port,
> a flow can process the packets sent to VM which (mac 00:11:22:33:44:00).
> 
>   $ tc filter add dev vxlan0 protocol ip parent : prio 1 \
>   flower dst_mac 00:11:22:33:44:00 \
>   enc_dst_port 4789 enc_key_id 100 \
>   action tunnel_key unset action mirred egress redirect dev eth0_1
> 
> Signed-off-by: Tonghao Zhang 
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 27 
> +++--
>  1 file changed, 7 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c 
> b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> index 122f457..91e4db1 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> @@ -1339,7 +1339,6 @@ static int parse_tunnel_attr(struct mlx5e_priv *priv,
>   void *headers_v = MLX5_ADDR_OF(fte_match_param, spec->match_value,
>  outer_headers);
>   struct flow_rule *rule = tc_cls_flower_offload_flow_rule(f);
> - struct flow_match_control enc_control;
>   int err;
>  
>   err = mlx5e_tc_tun_parse(filter_dev, priv, spec, f,
> @@ -1350,9 +1349,7 @@ static int parse_tunnel_attr(struct mlx5e_priv *priv,
>   return err;
>   }
>  
> - flow_rule_match_enc_control(rule, &enc_control);
> -
> - if (enc_control.key->addr_type == FLOW_DISSECTOR_KEY_IPV4_ADDRS) {
> + if (flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_ENC_IPV4_ADDRS)) {
>   struct flow_match_ipv4_addrs match;
>  
>   flow_rule_match_enc_ipv4_addrs(rule, &match);
> @@ -1372,7 +1369,7 @@ static int parse_tunnel_attr(struct mlx5e_priv *priv,
>  
>   MLX5_SET_TO_ONES(fte_match_set_lyr_2_4, headers_c, ethertype);
>   MLX5_SET(fte_match_set_lyr_2_4, headers_v, ethertype, ETH_P_IP);
> - } else if (enc_control.key->addr_type == FLOW_DISSECTOR_KEY_IPV6_ADDRS) 
> {
> + } else if (flow_rule_match_key(rule, 
> FLOW_DISSECTOR_KEY_ENC_IPV6_ADDRS)) {
>   struct flow_match_ipv6_addrs match;
>  
>   flow_rule_match_enc_ipv6_addrs(rule, &match);
> @@ -1504,22 +1501,12 @@ static int __parse_cls_flower(struct mlx5e_priv *priv,
>   return -EOPNOTSUPP;
>   }
>  
> - if ((flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_ENC_IPV4_ADDRS) ||
> -  flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_ENC_KEYID) ||
> -  flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_ENC_PORTS)) &&
> - flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_ENC_CONTROL)) {
> - struct flow_match_control match;
> -
> - flow_rule_match_enc_control(rule, &match);
> - switch (match.key->addr_type) {
> - case FLOW_DISSECTOR_KEY_IPV4_ADDRS:
> - case FLOW_DISSECTOR_KEY_IPV6_ADDRS:
> - if (parse_tunnel_attr(priv, spec, f, filter_dev, 
> tunnel_match_level))
> - return -EOPNOTSUPP;
> - break;
> - default:
> + if (flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_ENC_IPV4_ADDRS) ||
> + flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_ENC_IPV6_ADDRS) ||
> + flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_ENC_KEYID) ||
> + flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_ENC_PORTS)) {
> +     if (parse_tunnel_attr(priv, spec, f, filter_dev, 
> tunnel_match_level))
>   return -EOPNOTSUPP;
> - }
>  
>   /* In decap flow, header pointers should point to the inner
>* headers, outer header were already set by parse_tunnel_attr
> 


Reviewed-by: Roi Dayan 


Re: [PATCH net v2 1/2] net/mlx5: Avoid panic when setting vport mac, getting vport config

2019-03-04 Thread Roi Dayan


On 04/03/2019 10:27, xiangxia.m@gmail.com wrote:
> From: Tonghao Zhang 
> 
> If we try to set VFs mac address on a VF (not PF) net device,
> the kernel will be crash. The commands are show as below:
> 
> $ echo 2 > /sys/class/net/$MLX_PF0/device/sriov_numvfs
> $ ip link set $MLX_VF0 vf 0 mac 00:11:22:33:44:00
> 
> [exception RIP: mlx5_eswitch_set_vport_mac+41]
> [b8b7079e3688] do_setlink at 8f67f85b
> [b8b7079e37a8] __rtnl_newlink at 8f683778
> [b8b7079e3b68] rtnl_newlink at 8f683a63
> [b8b7079e3b90] rtnetlink_rcv_msg at 8f67d812
> [b8b7079e3c10] netlink_rcv_skb at 8f6b88ab
> [b8b7079e3c60] netlink_unicast at 8f6b808f
> [b8b7079e3ca0] netlink_sendmsg at 8f6b8412
> [b8b7079e3d18] sock_sendmsg at 8f6452f6
> [b8b7079e3d30] ___sys_sendmsg at 8f645860
> [b8b7079e3eb0] __sys_sendmsg at 8f647a38
> [b8b7079e3f38] do_syscall_64 at 8f00401b
> [b8b7079e3f50] entry_SYSCALL_64_after_hwframe at 8f80008c
> 
> and
> 
> [exception RIP: mlx5_eswitch_get_vport_config+12]
> [a70607e57678] mlx5e_get_vf_config at c03c7f8f [mlx5_core]
> [a70607e57688] do_setlink at bc67fa59
> [a70607e577a8] __rtnl_newlink at bc683778
> [a70607e57b68] rtnl_newlink at bc683a63
> [a70607e57b90] rtnetlink_rcv_msg at bc67d812
> [a70607e57c10] netlink_rcv_skb at bc6b88ab
> [a70607e57c60] netlink_unicast at bc6b808f
> [a70607e57ca0] netlink_sendmsg at bc6b8412
> [a70607e57d18] sock_sendmsg at bc6452f6
> [a70607e57d30] ___sys_sendmsg at bc645860
> [a70607e57eb0] __sys_sendmsg at bc647a38
> [a70607e57f38] do_syscall_64 at bc00401b
> [a70607e57f50] entry_SYSCALL_64_after_hwframe at bc80008c
> 
> Fixes: a8d70a054a718 ("net/mlx5: E-Switch, Disallow vlan/spoofcheck setup if 
> not being esw manager")
> Cc: Eli Cohen 
> Signed-off-by: Tonghao Zhang 
> ---
> v1->v2: check only the esw is null, instead of ESW_ALLOWED.
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/eswitch.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c 
> b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
> index 6cb9710..385e5cc 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
> @@ -1871,7 +1871,7 @@ int mlx5_eswitch_set_vport_mac(struct mlx5_eswitch *esw,
>   u64 node_guid;
>   int err = 0;
>  
> - if (!MLX5_CAP_GEN(esw->dev, vport_group_manager))
> + if (!esw || !MLX5_CAP_GEN(esw->dev, vport_group_manager))
>   return -EPERM;
>   if (!LEGAL_VPORT(esw, vport) || is_multicast_ether_addr(mac))
>   return -EINVAL;
> @@ -1945,7 +1945,7 @@ int mlx5_eswitch_get_vport_config(struct mlx5_eswitch 
> *esw,
>  {
>   struct mlx5_vport *evport;
>  
> - if (!MLX5_CAP_GEN(esw->dev, vport_group_manager))
> + if (!esw || !MLX5_CAP_GEN(esw->dev, vport_group_manager))
>   return -EPERM;
>   if (!LEGAL_VPORT(esw, vport))
>   return -EINVAL;
> 

Reviewed-by: Roi Dayan 


Re: [PATCH net v2 2/2] net/mlx5: Avoid panic when setting vport rate

2019-03-04 Thread Roi Dayan


On 04/03/2019 10:27, xiangxia.m@gmail.com wrote:
> From: Tonghao Zhang 
> 
> If we try to set VFs rate on a VF (not PF) net device, the kernel
> will be crash. The commands are show as below:
> 
> $ echo 2 > /sys/class/net/$MLX_PF0/device/sriov_numvfs
> $ ip link set $MLX_VF0 vf 0 max_tx_rate 2 min_tx_rate 1
> 
> If not applied the first patch ("net/mlx5: Avoid panic when setting
> vport mac, getting vport config"), the command:
> 
> $ ip link set $MLX_VF0 vf 0 rate 100
> 
> can also crash the kernel.
> 
> [ 1650.006388] RIP: 0010:mlx5_eswitch_set_vport_rate+0x1f/0x260 [mlx5_core]
> [ 1650.007092]  do_setlink+0x982/0xd20
> [ 1650.007129]  __rtnl_newlink+0x528/0x7d0
> [ 1650.007374]  rtnl_newlink+0x43/0x60
> [ 1650.007407]  rtnetlink_rcv_msg+0x2a2/0x320
> [ 1650.007484]  netlink_rcv_skb+0xcb/0x100
> [ 1650.007519]  netlink_unicast+0x17f/0x230
> [ 1650.007554]  netlink_sendmsg+0x2d2/0x3d0
> [ 1650.007592]  sock_sendmsg+0x36/0x50
> [ 1650.007625]  ___sys_sendmsg+0x280/0x2a0
> [ 1650.007963]  __sys_sendmsg+0x58/0xa0
> [ 1650.007998]  do_syscall_64+0x5b/0x180
> [ 1650.009438]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
> Fixes: c9497c98901c ("net/mlx5: Add support for setting VF min rate")
> Cc: Mohamad Haj Yahia 
> Signed-off-by: Tonghao Zhang 
> ---
> v1->v2: change commit log
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/eswitch.c | 13 +
>  1 file changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c 
> b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
> index 385e5cc..9e339b2 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
> @@ -2116,19 +2116,24 @@ static int normalize_vports_min_rate(struct 
> mlx5_eswitch *esw, u32 divider)
>  int mlx5_eswitch_set_vport_rate(struct mlx5_eswitch *esw, int vport,
>   u32 max_rate, u32 min_rate)
>  {
> - u32 fw_max_bw_share = MLX5_CAP_QOS(esw->dev, max_tsar_bw_share);
> - bool min_rate_supported = MLX5_CAP_QOS(esw->dev, esw_bw_share) &&
> - fw_max_bw_share >= MLX5_MIN_BW_SHARE;
> - bool max_rate_supported = MLX5_CAP_QOS(esw->dev, esw_rate_limit);
>   struct mlx5_vport *evport;
> + u32 fw_max_bw_share;
>   u32 previous_min_rate;
>   u32 divider;
> + bool min_rate_supported;
> + bool max_rate_supported;
>   int err = 0;
>  
>   if (!ESW_ALLOWED(esw))
>   return -EPERM;
>   if (!LEGAL_VPORT(esw, vport))
>   return -EINVAL;
> +
> + fw_max_bw_share = MLX5_CAP_QOS(esw->dev, max_tsar_bw_share);
> + min_rate_supported = MLX5_CAP_QOS(esw->dev, esw_bw_share) &&
> +     fw_max_bw_share >= MLX5_MIN_BW_SHARE;
> + max_rate_supported = MLX5_CAP_QOS(esw->dev, esw_rate_limit);
> +
>   if ((min_rate && !min_rate_supported) || (max_rate && 
> !max_rate_supported))
>   return -EOPNOTSUPP;
>  
> 

Reviewed-by: Roi Dayan 


Re: [PATCH 1/2] net/mlx5: Avoid panic when setting vport mac, getting vport config

2019-03-03 Thread Roi Dayan


On 04/03/2019 03:26, Tonghao Zhang wrote:
> On Sun, Mar 3, 2019 at 8:42 PM Roi Dayan  wrote:
>>
>>
>>
>> On 03/03/2019 11:56, xiangxia.m@gmail.com wrote:
>>> From: Tonghao Zhang 
>>>
>>> If we try to set VFs mac address on a VF (not PF) net device,
>>> the kernel will be crash. The commands are show as below:
>>>
>>> $ echo 2 > /sys/class/net/$MLX_PF0/device/sriov_numvfs
>>> $ ip link set $MLX_VF0 vf 0 mac 00:11:22:33:44:00
>>>
>>> [exception RIP: mlx5_eswitch_set_vport_mac+41]
>>> [b8b7079e3688] do_setlink at 8f67f85b
>>> [b8b7079e37a8] __rtnl_newlink at 8f683778
>>> [b8b7079e3b68] rtnl_newlink at 8f683a63
>>> [b8b7079e3b90] rtnetlink_rcv_msg at 8f67d812
>>> [b8b7079e3c10] netlink_rcv_skb at 8f6b88ab
>>> [b8b7079e3c60] netlink_unicast at 8f6b808f
>>> [b8b7079e3ca0] netlink_sendmsg at 8f6b8412
>>> [b8b7079e3d18] sock_sendmsg at 8f6452f6
>>> [b8b7079e3d30] ___sys_sendmsg at 8f645860
>>> [b8b7079e3eb0] __sys_sendmsg at 8f647a38
>>> [b8b7079e3f38] do_syscall_64 at 8f00401b
>>> [b8b7079e3f50] entry_SYSCALL_64_after_hwframe at 8f80008c
>>>
>>> and
>>>
>>> [exception RIP: mlx5_eswitch_get_vport_config+12]
>>> [a70607e57678] mlx5e_get_vf_config at c03c7f8f [mlx5_core]
>>> [a70607e57688] do_setlink at bc67fa59
>>> [a70607e577a8] __rtnl_newlink at bc683778
>>> [a70607e57b68] rtnl_newlink at bc683a63
>>> [a70607e57b90] rtnetlink_rcv_msg at bc67d812
>>> [a70607e57c10] netlink_rcv_skb at bc6b88ab
>>> [a70607e57c60] netlink_unicast at bc6b808f
>>> [a70607e57ca0] netlink_sendmsg at bc6b8412
>>> [a70607e57d18] sock_sendmsg at bc6452f6
>>> [a70607e57d30] ___sys_sendmsg at bc645860
>>> [a70607e57eb0] __sys_sendmsg at bc647a38
>>> [a70607e57f38] do_syscall_64 at bc00401b
>>> [a70607e57f50] entry_SYSCALL_64_after_hwframe at bc80008c
>>>
>>> Fixes: a8d70a054a718 ("net/mlx5: E-Switch, Disallow vlan/spoofcheck setup 
>>> if not being esw manager")
>>> Cc: Eli Cohen 
>>> Signed-off-by: Tonghao Zhang 
>>> ---
>>>  drivers/net/ethernet/mellanox/mlx5/core/eswitch.c | 6 ++
>>>  1 file changed, 6 insertions(+)
>>>
>>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c 
>>> b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
>>> index 6cb9710..774edc9 100644
>>> --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
>>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
>>> @@ -1871,6 +1871,9 @@ int mlx5_eswitch_set_vport_mac(struct mlx5_eswitch 
>>> *esw,
>>>   u64 node_guid;
>>>   int err = 0;
>>>
>>> + if (!ESW_ALLOWED(esw))
>>> + return -EPERM;
>>> +
>>
>> this will introduce a bug with smart nic.
>> from the commit in the fixes line, in smart nic the PF
>> is not an esw manager so it will block changing vf mac
>> with the pf. the fix should be checking if esw is null first.
> and when i fix this bug, I review all the  eswitch NDOs, i find except
> mlx5e_set_vf_mac
> mlx5e_get_vf_config
> 
> other NDOs  use the ESW_ALLOWED to check, that include
> mlx5e_set_vf_vlan
> mlx5e_set_vf_spoofchk
> mlx5e_set_vf_trust
> mlx5e_set_vf_rate (the bug will be fixed in patch 2)
> mlx5e_set_vf_link_state
> mlx5e_get_vf_stats
> 

because some operations are limited to esw manager and some to vport
group manager.
the commit in your fixes list what is allowed.

1. set_vf_vlan - disallowed
2. set_vf_spoofchk - disallowed
3. set_vf_mac  - allowed
4. get_vf_config   - allowed
5. set_vf_trust- disallowed
6. set_vf_rate - disallowed
7. get_vf_stat - allowed
8. set_vf_link_state - disallowed

from this maybe get_vf_stat is the only one checking for esw manager but
maybe should be checked against vport manager.
I'll have to check about that one to be sure.

>>>   if (!MLX5_CAP_GEN(esw->dev, vport_group_manager))
>>>   return -EPERM;
>>>   if (!LEGAL_VPORT(esw, vport) || is_multicast_ether_addr(mac))
>>> @@ -1945,6 +1948,9 @@ int mlx5_eswitch_get_vport_config(struct mlx5_eswitch 
>>> *esw,
>>>  {
>>>   struct mlx5_vport *evport;
>>>
>>> + if (!ESW_ALLOWED(esw))
>>> + return -EPERM;
>>> +
>>>   if (!MLX5_CAP_GEN(esw->dev, vport_group_manager))
>>>   return -EPERM;
>>>   if (!LEGAL_VPORT(esw, vport))
>>>


Re: [PATCH 1/2] net/mlx5: Avoid panic when setting vport mac, getting vport config

2019-03-03 Thread Roi Dayan


On 04/03/2019 03:04, Tonghao Zhang wrote:
> On Sun, Mar 3, 2019 at 8:42 PM Roi Dayan  wrote:
>>
>>
>>
>> On 03/03/2019 11:56, xiangxia.m@gmail.com wrote:
>>> From: Tonghao Zhang 
>>>
>>> If we try to set VFs mac address on a VF (not PF) net device,
>>> the kernel will be crash. The commands are show as below:
>>>
>>> $ echo 2 > /sys/class/net/$MLX_PF0/device/sriov_numvfs
>>> $ ip link set $MLX_VF0 vf 0 mac 00:11:22:33:44:00
>>>
>>> [exception RIP: mlx5_eswitch_set_vport_mac+41]
>>> [b8b7079e3688] do_setlink at 8f67f85b
>>> [b8b7079e37a8] __rtnl_newlink at 8f683778
>>> [b8b7079e3b68] rtnl_newlink at 8f683a63
>>> [b8b7079e3b90] rtnetlink_rcv_msg at 8f67d812
>>> [b8b7079e3c10] netlink_rcv_skb at 8f6b88ab
>>> [b8b7079e3c60] netlink_unicast at 8f6b808f
>>> [b8b7079e3ca0] netlink_sendmsg at 8f6b8412
>>> [b8b7079e3d18] sock_sendmsg at 8f6452f6
>>> [b8b7079e3d30] ___sys_sendmsg at 8f645860
>>> [b8b7079e3eb0] __sys_sendmsg at 8f647a38
>>> [b8b7079e3f38] do_syscall_64 at 8f00401b
>>> [b8b7079e3f50] entry_SYSCALL_64_after_hwframe at 8f80008c
>>>
>>> and
>>>
>>> [exception RIP: mlx5_eswitch_get_vport_config+12]
>>> [a70607e57678] mlx5e_get_vf_config at c03c7f8f [mlx5_core]
>>> [a70607e57688] do_setlink at bc67fa59
>>> [a70607e577a8] __rtnl_newlink at bc683778
>>> [a70607e57b68] rtnl_newlink at bc683a63
>>> [a70607e57b90] rtnetlink_rcv_msg at bc67d812
>>> [a70607e57c10] netlink_rcv_skb at bc6b88ab
>>> [a70607e57c60] netlink_unicast at bc6b808f
>>> [a70607e57ca0] netlink_sendmsg at bc6b8412
>>> [a70607e57d18] sock_sendmsg at bc6452f6
>>> [a70607e57d30] ___sys_sendmsg at bc645860
>>> [a70607e57eb0] __sys_sendmsg at bc647a38
>>> [a70607e57f38] do_syscall_64 at bc00401b
>>> [a70607e57f50] entry_SYSCALL_64_after_hwframe at bc80008c
>>>
>>> Fixes: a8d70a054a718 ("net/mlx5: E-Switch, Disallow vlan/spoofcheck setup 
>>> if not being esw manager")
>>> Cc: Eli Cohen 
>>> Signed-off-by: Tonghao Zhang 
>>> ---
>>>  drivers/net/ethernet/mellanox/mlx5/core/eswitch.c | 6 ++
>>>  1 file changed, 6 insertions(+)
>>>
>>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c 
>>> b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
>>> index 6cb9710..774edc9 100644
>>> --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
>>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
>>> @@ -1871,6 +1871,9 @@ int mlx5_eswitch_set_vport_mac(struct mlx5_eswitch 
>>> *esw,
>>>   u64 node_guid;
>>>   int err = 0;
>>>
>>> + if (!ESW_ALLOWED(esw))
>>> + return -EPERM;
>>> +
>>
>> this will introduce a bug with smart nic.
>> from the commit in the fixes line, in smart nic the PF
>> is not an esw manager so it will block changing vf mac
>> with the pf. the fix should be checking if esw is null first.
> Thanks for your reply, I don't get the smart nic card and can't test
> it. So to fix this bug,
> we only check the esw is null right ?

correct. in smart nic we have PF and ECPF. PF is vport manager but not
esw manager and ECPF is the esw manager.
We set vf mac through the pf so the condition here should only be
vport group manager.


> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
> b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
> index 6cb9710..dc332ba 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
> @@ -1871,6 +1871,9 @@ int mlx5_eswitch_set_vport_mac(struct mlx5_eswitch *esw,
> u64 node_guid;
> int err = 0;
> 
> +   if (!esw)
> +   return -EPERM;
> +
> if (!MLX5_CAP_GEN(esw->dev, vport_group_manager))
> return -EPERM;

maybe just add the condition to the same if.
 if (!esw || !MLX5_CAP_GEN(esw->dev, vport_group_manager))
 return -EPERM;


> if (!LEGAL_VPORT(esw, vport) || is_multicast_ether_addr(mac))
> @@ -1945,6 +1948,9 @@ int mlx5_eswitch_get_vport_config(struct
> mlx5_eswitch *esw,
>  {
> struct mlx5_vport *evport;
> 
> +   if (!esw)
> +   return -EPERM;
> +
> if (!MLX5_CAP_GEN(esw->dev, vport_group_manager))
> return -EPERM;
> if (!LEGAL_VPORT(esw, vport))
> 
>>
>>>   if (!MLX5_CAP_GEN(esw->dev, vport_group_manager))
>>>   return -EPERM;
>>>   if (!LEGAL_VPORT(esw, vport) || is_multicast_ether_addr(mac))
>>> @@ -1945,6 +1948,9 @@ int mlx5_eswitch_get_vport_config(struct mlx5_eswitch 
>>> *esw,
>>>  {
>>>   struct mlx5_vport *evport;
>>>
>>> + if (!ESW_ALLOWED(esw))
>>> + return -EPERM;
>>> +
>>>   if (!MLX5_CAP_GEN(esw->dev, vport_group_manager))
>>>   return -EPERM;
>>>   if (!LEGAL_VPORT(esw, vport))
>>>


Re: [PATCH][next] net/mlx5e: Remove redundant assignment

2019-03-03 Thread Roi Dayan


On 02/03/2019 21:39, Gustavo A. R. Silva wrote:
> Remove redundant assignment to tun_entropy->enabled.
> 
> Addesses-Coverity-ID: 1477328 ("Unused value")
> Fixes: 97417f6182f8 ("net/mlx5e: Fix GRE key by controlling port tunnel 
> entropy calculation")

the commit doesn't fix any real issue but is more of a cleanup.
so I'm not sure if fixes line is relevant or not.
beside that looks ok.

Reviewed-by: Roi Dayan 


> Signed-off-by: Gustavo A. R. Silva 
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/lib/port_tun.c | 6 ++
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/port_tun.c 
> b/drivers/net/ethernet/mellanox/mlx5/core/lib/port_tun.c
> index 40f4a19b1ce1..be69c1d7941a 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/lib/port_tun.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/port_tun.c
> @@ -80,10 +80,8 @@ void mlx5_init_port_tun_entropy(struct mlx5_tun_entropy 
> *tun_entropy,
>   mlx5_query_port_tun_entropy(mdev, &entropy_flags);
>   tun_entropy->num_enabling_entries = 0;
>   tun_entropy->num_disabling_entries = 0;
> - tun_entropy->enabled = entropy_flags.calc_enabled;
> - tun_entropy->enabled =
> - (entropy_flags.calc_supported) ?
> - entropy_flags.calc_enabled : true;
> + tun_entropy->enabled = entropy_flags.calc_supported ?
> +entropy_flags.calc_enabled : true;
>  }
>  
>  static int mlx5_set_entropy(struct mlx5_tun_entropy *tun_entropy,
> 


Re: [PATCH net 2/2] net/mlx5: Avoid panic when setting VF rate

2019-03-03 Thread Roi Dayan


On 03/03/2019 16:12, Roi Dayan wrote:
> 
> 
> On 03/03/2019 11:56, xiangxia.m@gmail.com wrote:
>> From: Tonghao Zhang 
>>
>> If we try to set VFs rate on a VF (not PF) net device, the kernel
>> will be crash. The commands are show as below:
>>
>> $ echo 2 > /sys/class/net/$MLX_PF0/device/sriov_numvfs
>> $ ip link set $MLX_VF0 vf 0 rate 100
> 
> actually this didn't reproduce with this command + your first fix.
> 
> setting rate calls netlink IFLA_VF_TX_RATE which is calling
> nfo_get_vf_config() which is now returning error correctly.
> 
> we need the netlink call to be IFLA_VF_RATE. seems we get there
> when we set min/max_tx_rate.
> so i reproduce the error with this command
> $ ip link set dev $VF vf 0 max_tx_rate 2 max_tx_rate 1

I had a typo here. the cmd is
$ ip link set dev $VF vf 0 max_tx_rate 2 min_tx_rate 1

> 
> can you verify and fix the commit msg?
> 
>>
>> Fixes: c9497c98901c ("net/mlx5: Add support for setting VF min rate")
>> Cc: Mohamad Haj Yahia 
>> Signed-off-by: Tonghao Zhang 
>> ---
>>  drivers/net/ethernet/mellanox/mlx5/core/eswitch.c | 13 +
>>  1 file changed, 9 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c 
>> b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
>> index 774edc9..8c2f1e6 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
>> @@ -2122,19 +2122,24 @@ static int normalize_vports_min_rate(struct 
>> mlx5_eswitch *esw, u32 divider)
>>  int mlx5_eswitch_set_vport_rate(struct mlx5_eswitch *esw, int vport,
>>  u32 max_rate, u32 min_rate)
>>  {
>> -u32 fw_max_bw_share = MLX5_CAP_QOS(esw->dev, max_tsar_bw_share);
>> -bool min_rate_supported = MLX5_CAP_QOS(esw->dev, esw_bw_share) &&
>> -fw_max_bw_share >= MLX5_MIN_BW_SHARE;
>> -bool max_rate_supported = MLX5_CAP_QOS(esw->dev, esw_rate_limit);
>>  struct mlx5_vport *evport;
>> +u32 fw_max_bw_share;
>>  u32 previous_min_rate;
>>  u32 divider;
>> +bool min_rate_supported;
>> +bool max_rate_supported;
>>  int err = 0;
>>  
>>  if (!ESW_ALLOWED(esw))
>>  return -EPERM;
>>  if (!LEGAL_VPORT(esw, vport))
>>  return -EINVAL;
>> +
>> +fw_max_bw_share = MLX5_CAP_QOS(esw->dev, max_tsar_bw_share);
>> +min_rate_supported = MLX5_CAP_QOS(esw->dev, esw_bw_share) &&
>> +fw_max_bw_share >= MLX5_MIN_BW_SHARE;
>> +max_rate_supported = MLX5_CAP_QOS(esw->dev, esw_rate_limit);
>> +
>>  if ((min_rate && !min_rate_supported) || (max_rate && 
>> !max_rate_supported))
>>  return -EOPNOTSUPP;
>>  
>>


Re: [PATCH net 2/2] net/mlx5: Avoid panic when setting VF rate

2019-03-03 Thread Roi Dayan


On 03/03/2019 16:12, Roi Dayan wrote:
> 
> 
> On 03/03/2019 11:56, xiangxia.m@gmail.com wrote:
>> From: Tonghao Zhang 
>>
>> If we try to set VFs rate on a VF (not PF) net device, the kernel
>> will be crash. The commands are show as below:
>>
>> $ echo 2 > /sys/class/net/$MLX_PF0/device/sriov_numvfs
>> $ ip link set $MLX_VF0 vf 0 rate 100
> 
> actually this didn't reproduce with this command + your first fix.
> 
> setting rate calls netlink IFLA_VF_TX_RATE which is calling
> nfo_get_vf_config() which is now returning error correctly.
> 
> we need the netlink call to be IFLA_VF_RATE. seems we get there
> when we set min/max_tx_rate.
> so i reproduce the error with this command
> $ ip link set dev $VF vf 0 max_tx_rate 2 max_tx_rate 1

i had a typo here. the cmd is
> $ ip link set dev $VF vf 0 max_tx_rate 2 min_tx_rate 1

> 
> can you verify and fix the commit msg?
> 
>>
>> Fixes: c9497c98901c ("net/mlx5: Add support for setting VF min rate")
>> Cc: Mohamad Haj Yahia 
>> Signed-off-by: Tonghao Zhang 
>> ---
>>  drivers/net/ethernet/mellanox/mlx5/core/eswitch.c | 13 +
>>  1 file changed, 9 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c 
>> b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
>> index 774edc9..8c2f1e6 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
>> @@ -2122,19 +2122,24 @@ static int normalize_vports_min_rate(struct 
>> mlx5_eswitch *esw, u32 divider)
>>  int mlx5_eswitch_set_vport_rate(struct mlx5_eswitch *esw, int vport,
>>  u32 max_rate, u32 min_rate)
>>  {
>> -u32 fw_max_bw_share = MLX5_CAP_QOS(esw->dev, max_tsar_bw_share);
>> -bool min_rate_supported = MLX5_CAP_QOS(esw->dev, esw_bw_share) &&
>> -fw_max_bw_share >= MLX5_MIN_BW_SHARE;
>> -bool max_rate_supported = MLX5_CAP_QOS(esw->dev, esw_rate_limit);
>>  struct mlx5_vport *evport;
>> +u32 fw_max_bw_share;
>>  u32 previous_min_rate;
>>  u32 divider;
>> +bool min_rate_supported;
>> +bool max_rate_supported;
>>  int err = 0;
>>  
>>  if (!ESW_ALLOWED(esw))
>>  return -EPERM;
>>  if (!LEGAL_VPORT(esw, vport))
>>  return -EINVAL;
>> +
>> +fw_max_bw_share = MLX5_CAP_QOS(esw->dev, max_tsar_bw_share);
>> +min_rate_supported = MLX5_CAP_QOS(esw->dev, esw_bw_share) &&
>> +fw_max_bw_share >= MLX5_MIN_BW_SHARE;
>> +max_rate_supported = MLX5_CAP_QOS(esw->dev, esw_rate_limit);
>> +
>>  if ((min_rate && !min_rate_supported) || (max_rate && 
>> !max_rate_supported))
>>  return -EOPNOTSUPP;
>>  
>>


Re: [PATCH net 2/2] net/mlx5: Avoid panic when setting VF rate

2019-03-03 Thread Roi Dayan


On 03/03/2019 11:56, xiangxia.m@gmail.com wrote:
> From: Tonghao Zhang 
> 
> If we try to set VFs rate on a VF (not PF) net device, the kernel
> will be crash. The commands are show as below:
> 
> $ echo 2 > /sys/class/net/$MLX_PF0/device/sriov_numvfs
> $ ip link set $MLX_VF0 vf 0 rate 100

actually this didn't reproduce with this command + your first fix.

setting rate calls netlink IFLA_VF_TX_RATE which is calling
nfo_get_vf_config() which is now returning error correctly.

we need the netlink call to be IFLA_VF_RATE. seems we get there
when we set min/max_tx_rate.
so i reproduce the error with this command
$ ip link set dev $VF vf 0 max_tx_rate 2 max_tx_rate 1

can you verify and fix the commit msg?

> 
> Fixes: c9497c98901c ("net/mlx5: Add support for setting VF min rate")
> Cc: Mohamad Haj Yahia 
> Signed-off-by: Tonghao Zhang 
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/eswitch.c | 13 +
>  1 file changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c 
> b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
> index 774edc9..8c2f1e6 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
> @@ -2122,19 +2122,24 @@ static int normalize_vports_min_rate(struct 
> mlx5_eswitch *esw, u32 divider)
>  int mlx5_eswitch_set_vport_rate(struct mlx5_eswitch *esw, int vport,
>   u32 max_rate, u32 min_rate)
>  {
> - u32 fw_max_bw_share = MLX5_CAP_QOS(esw->dev, max_tsar_bw_share);
> - bool min_rate_supported = MLX5_CAP_QOS(esw->dev, esw_bw_share) &&
> - fw_max_bw_share >= MLX5_MIN_BW_SHARE;
> - bool max_rate_supported = MLX5_CAP_QOS(esw->dev, esw_rate_limit);
>   struct mlx5_vport *evport;
> + u32 fw_max_bw_share;
>   u32 previous_min_rate;
>   u32 divider;
> + bool min_rate_supported;
> + bool max_rate_supported;
>   int err = 0;
>  
>   if (!ESW_ALLOWED(esw))
>   return -EPERM;
>   if (!LEGAL_VPORT(esw, vport))
>   return -EINVAL;
> +
> + fw_max_bw_share = MLX5_CAP_QOS(esw->dev, max_tsar_bw_share);
> + min_rate_supported = MLX5_CAP_QOS(esw->dev, esw_bw_share) &&
> + fw_max_bw_share >= MLX5_MIN_BW_SHARE;
> + max_rate_supported = MLX5_CAP_QOS(esw->dev, esw_rate_limit);
> +
>   if ((min_rate && !min_rate_supported) || (max_rate && 
> !max_rate_supported))
>   return -EOPNOTSUPP;
>  
> 


Re: [PATCH 1/2] net/mlx5: Avoid panic when setting vport mac, getting vport config

2019-03-03 Thread Roi Dayan


On 03/03/2019 11:56, xiangxia.m@gmail.com wrote:
> From: Tonghao Zhang 
> 
> If we try to set VFs mac address on a VF (not PF) net device,
> the kernel will be crash. The commands are show as below:
> 
> $ echo 2 > /sys/class/net/$MLX_PF0/device/sriov_numvfs
> $ ip link set $MLX_VF0 vf 0 mac 00:11:22:33:44:00
> 
> [exception RIP: mlx5_eswitch_set_vport_mac+41]
> [b8b7079e3688] do_setlink at 8f67f85b
> [b8b7079e37a8] __rtnl_newlink at 8f683778
> [b8b7079e3b68] rtnl_newlink at 8f683a63
> [b8b7079e3b90] rtnetlink_rcv_msg at 8f67d812
> [b8b7079e3c10] netlink_rcv_skb at 8f6b88ab
> [b8b7079e3c60] netlink_unicast at 8f6b808f
> [b8b7079e3ca0] netlink_sendmsg at 8f6b8412
> [b8b7079e3d18] sock_sendmsg at 8f6452f6
> [b8b7079e3d30] ___sys_sendmsg at 8f645860
> [b8b7079e3eb0] __sys_sendmsg at 8f647a38
> [b8b7079e3f38] do_syscall_64 at 8f00401b
> [b8b7079e3f50] entry_SYSCALL_64_after_hwframe at 8f80008c
> 
> and
> 
> [exception RIP: mlx5_eswitch_get_vport_config+12]
> [a70607e57678] mlx5e_get_vf_config at c03c7f8f [mlx5_core]
> [a70607e57688] do_setlink at bc67fa59
> [a70607e577a8] __rtnl_newlink at bc683778
> [a70607e57b68] rtnl_newlink at bc683a63
> [a70607e57b90] rtnetlink_rcv_msg at bc67d812
> [a70607e57c10] netlink_rcv_skb at bc6b88ab
> [a70607e57c60] netlink_unicast at bc6b808f
> [a70607e57ca0] netlink_sendmsg at bc6b8412
> [a70607e57d18] sock_sendmsg at bc6452f6
> [a70607e57d30] ___sys_sendmsg at bc645860
> [a70607e57eb0] __sys_sendmsg at bc647a38
> [a70607e57f38] do_syscall_64 at bc00401b
> [a70607e57f50] entry_SYSCALL_64_after_hwframe at bc80008c
> 
> Fixes: a8d70a054a718 ("net/mlx5: E-Switch, Disallow vlan/spoofcheck setup if 
> not being esw manager")
> Cc: Eli Cohen 
> Signed-off-by: Tonghao Zhang 
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/eswitch.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c 
> b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
> index 6cb9710..774edc9 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
> @@ -1871,6 +1871,9 @@ int mlx5_eswitch_set_vport_mac(struct mlx5_eswitch *esw,
>   u64 node_guid;
>   int err = 0;
>  
> + if (!ESW_ALLOWED(esw))
> + return -EPERM;
> +

this will introduce a bug with smart nic.
from the commit in the fixes line, in smart nic the PF
is not an esw manager so it will block changing vf mac
with the pf. the fix should be checking if esw is null first.

>   if (!MLX5_CAP_GEN(esw->dev, vport_group_manager))
>   return -EPERM;
>   if (!LEGAL_VPORT(esw, vport) || is_multicast_ether_addr(mac))
> @@ -1945,6 +1948,9 @@ int mlx5_eswitch_get_vport_config(struct mlx5_eswitch 
> *esw,
>  {
>   struct mlx5_vport *evport;
>  
> + if (!ESW_ALLOWED(esw))
> + return -EPERM;
> +
>   if (!MLX5_CAP_GEN(esw->dev, vport_group_manager))
>   return -EPERM;
>   if (!LEGAL_VPORT(esw, vport))
> 


Re: [PATCH net-next v4 4/4] net/mlx5e: Return -EOPNOTSUPP when attempting to offload an unsupported action

2019-02-28 Thread Roi Dayan


On 27/02/2019 17:31, xiangxia.m@gmail.com wrote:
> From: Tonghao Zhang 
> 
> * Now the encapsulation is not supported for mlx5 VFs. When we try to
> offload that action, the -EINVAL is returned, but not -EOPNOTSUPP.
> This patch changes the returned value and ignore to confuse user.
> The command is shown as below [1].
> 
> * When max modify header action is zero, we return -EOPNOTSUPP
> directly. In this way, we can ignore wrong message info (e.g.
> "mlx5: parsed 0 pedit actions, can't do more"). This happens when
> offloading pedit actions on mlx(cx4) VFs. The command is shown as below [2].
> 
> For example: (p2p1_0 is VF net device)
> [1]
> $ tc filter add dev p2p1_0 protocol ip  parent : prio 1 flower skip_sw \
> src_mac e4:11:22:33:44:01\
> action tunnel_key set\
> src_ip 1.1.1.100\
> dst_ip 1.1.1.200\
> dst_port 4789 id 100\
> action mirred egress redirect dev vxlan0
> 
> [2]
> $ tc filter add dev p2p1_0 parent : protocol ip prio 1 \
> flower skip_sw dst_mac 00:10:56:fb:64:e8 \
> dst_ip 1.1.1.100 src_ip 1.1.1.200 \
> action pedit ex munge eth src set 00:10:56:b4:5d:20
> 
> Signed-off-by: Tonghao Zhang 
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 27 
> ++---
>  1 file changed, 20 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c 
> b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> index 56ac50d..52748e2 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> @@ -1999,6 +1999,15 @@ static int offload_pedit_fields(struct 
> pedit_headers_action *hdrs,
>   return 0;
>  }
>  
> +static int mlx5e_flow_namespace_max_modify_action(struct mlx5_core_dev *mdev,
> +   int namespace)
> +{
> + if (namespace == MLX5_FLOW_NAMESPACE_FDB) /* FDB offloading */
> + return MLX5_CAP_ESW_FLOWTABLE_FDB(mdev, 
> max_modify_header_actions);
> + else /* namespace is MLX5_FLOW_NAMESPACE_KERNEL - NIC offloading */
> + return MLX5_CAP_FLOWTABLE_NIC_RX(mdev, 
> max_modify_header_actions);
> +}
> +
>  static int alloc_mod_hdr_actions(struct mlx5e_priv *priv,
>struct pedit_headers_action *hdrs,
>int namespace,
> @@ -2010,11 +2019,7 @@ static int alloc_mod_hdr_actions(struct mlx5e_priv 
> *priv,
>   hdrs[TCA_PEDIT_KEY_EX_CMD_ADD].pedits;
>   action_size = MLX5_UN_SZ_BYTES(set_action_in_add_action_in_auto);
>  
> - if (namespace == MLX5_FLOW_NAMESPACE_FDB) /* FDB offloading */
> - max_actions = MLX5_CAP_ESW_FLOWTABLE_FDB(priv->mdev, 
> max_modify_header_actions);
> - else /* namespace is MLX5_FLOW_NAMESPACE_KERNEL - NIC offloading */
> - max_actions = MLX5_CAP_FLOWTABLE_NIC_RX(priv->mdev, 
> max_modify_header_actions);
> -
> + max_actions = mlx5e_flow_namespace_max_modify_action(priv->mdev, 
> namespace);
>   /* can get up to crazingly 16 HW actions in 32 bits pedit SW key */
>   max_actions = min(max_actions, nkeys * 16);
>  
> @@ -2047,6 +2052,12 @@ static int parse_tc_pedit_action(struct mlx5e_priv 
> *priv,
>   goto out_err;
>   }
>  
> + if (!mlx5e_flow_namespace_max_modify_action(priv->mdev, namespace)) {
> + NL_SET_ERR_MSG_MOD(extack,
> +"The pedit offload action is not supported");
> + goto out_err;
> + }
> +
>   mask = act->mangle.mask;
>   val = act->mangle.val;
>   offset = act->mangle.offset;
> @@ -2294,7 +2305,8 @@ static int parse_tc_nic_actions(struct mlx5e_priv *priv,
>   }
>   break;
>   default:
> - return -EINVAL;
> + NL_SET_ERR_MSG_MOD(extack, "The offload action is not 
> supported");
> + return -EOPNOTSUPP;
>   }
>   }
>  
> @@ -2616,7 +2628,8 @@ static int parse_tc_fdb_actions(struct mlx5e_priv *priv,
>   break;
>   }
>   default:
> - return -EINVAL;
> + NL_SET_ERR_MSG_MOD(extack, "The offload action is not 
> supported");
> + return -EOPNOTSUPP;
>   }
>   }
>  
> 

Reviewed-by: Roi Dayan 


Re: [PATCH net v1] net/mlx5e: Correctly use the namespace type when allocating pedit action

2019-02-27 Thread Roi Dayan


On 26/02/2019 14:28, xiangxia.m@gmail.com wrote:
> From: Tonghao Zhang 
> 
> The capacity of FDB offloading and NIC offloading table are
> different, and when allocating the pedit actions, we should
> use the correct namespace type.
> 
> Fixes: c500c86b0c75d ("net/mlx5e: support for two independent packet edit 
> actions")
> Cc: Pablo Neira Ayuso 
> Signed-off-by: Tonghao Zhang 
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c 
> b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> index 3a02b22..467ef9e 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> @@ -2635,7 +2635,7 @@ static int parse_tc_fdb_actions(struct mlx5e_priv *priv,
>  
>   if (hdrs[TCA_PEDIT_KEY_EX_CMD_SET].pedits ||
>   hdrs[TCA_PEDIT_KEY_EX_CMD_ADD].pedits) {
> - err = alloc_tc_pedit_action(priv, MLX5_FLOW_NAMESPACE_KERNEL,
> + err = alloc_tc_pedit_action(priv, MLX5_FLOW_NAMESPACE_FDB,
>   parse_attr, hdrs, extack);
>   if (err)
>   return err;
> 

Reviewed-by: Roi Dayan 


Re: [PATCH net-next v3 4/4] net/mlx5e: Return -EOPNOTSUPP when attempting to offload an unsupported action

2019-02-27 Thread Roi Dayan


On 26/02/2019 14:27, xiangxia.m@gmail.com wrote:
> From: Tonghao Zhang 
> 
> * Now the encapsulation is not supported for mlx5 VFs. When we try to
> offload that action, the -EINVAL is returned, but not -EOPNOTSUPP.
> This patch changes the returned value and ignore to confuse user.
> The command is shown as below [1].
> 
> * When max modify header action is zero, we return -EOPNOTSUPP
> directly. In this way, we can ignore wrong message info (e.g.
> "mlx5: parsed 0 pedit actions, can't do more"). This happens when
> offloading pedit actions on mlx(cx4) VFs. The command is shown as below [2].
> 
> For example: (p2p1_0 is VF net device)
> [1]
> $ tc filter add dev p2p1_0 protocol ip  parent : prio 1 flower skip_sw \
> src_mac e4:11:22:33:44:01\
> action tunnel_key set\
> src_ip 1.1.1.100\
> dst_ip 1.1.1.200\
> dst_port 4789 id 100\
> action mirred egress redirect dev vxlan0
> 
> [2]
> $ tc filter add dev p2p1_0 parent : protocol ip prio 1 \
> flower skip_sw dst_mac 00:10:56:fb:64:e8 \
> dst_ip 1.1.1.100 src_ip 1.1.1.200 \
> action pedit ex munge eth src set 00:10:56:b4:5d:20
> 
> Signed-off-by: Tonghao Zhang 
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 19 ---
>  1 file changed, 16 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c 
> b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> index 56ac50d..3a02b22 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> @@ -2035,7 +2035,7 @@ static int parse_tc_pedit_action(struct mlx5e_priv 
> *priv,
>struct netlink_ext_ack *extack)
>  {
>   u8 cmd = (act->id == FLOW_ACTION_MANGLE) ? 0 : 1;
> - int err = -EOPNOTSUPP;
> + int max_actions, err = -EOPNOTSUPP;
>   u32 mask, val, offset;
>   u8 htype;
>  
> @@ -2047,6 +2047,17 @@ static int parse_tc_pedit_action(struct mlx5e_priv 
> *priv,
>   goto out_err;
>   }
>  
> +if (namespace == MLX5_FLOW_NAMESPACE_FDB) /* FDB offloading */
> +max_actions = MLX5_CAP_ESW_FLOWTABLE_FDB(priv->mdev, 
> max_modify_header_actions);
> +else /* namespace is MLX5_FLOW_NAMESPACE_KERNEL - NIC offloading */
> +max_actions = MLX5_CAP_FLOWTABLE_NIC_RX(priv->mdev, 
> max_modify_header_actions);
> +

we have the exact same block with comments in alloc_mod_hdr_actions() which
is called later after we parse the action and now parsing the pedit fields.
your check for max_actions was there in prev v. why move it?
if like this I suggest we create a small static function to get max actions
for a namespace and call it from both places to get max actions.

> + if (!max_actions) {
> + NL_SET_ERR_MSG_MOD(extack,
> +"don't support pedit actions, can't 
> offload");

please rephrase the msg to be consistent
i.e. The pedit offload action is not supported

> + goto out_err;
> + }
> +
>   mask = act->mangle.mask;
>   val = act->mangle.val;
>   offset = act->mangle.offset;
> @@ -2294,7 +2305,8 @@ static int parse_tc_nic_actions(struct mlx5e_priv *priv,
>   }
>   break;
>   default:
> - return -EINVAL;
> + NL_SET_ERR_MSG_MOD(extack, "The offload action is not 
> supported");
> + return -EOPNOTSUPP;
>   }
>   }
>  
> @@ -2616,7 +2628,8 @@ static int parse_tc_fdb_actions(struct mlx5e_priv *priv,
>   break;
>   }
>   default:
> - return -EINVAL;
> + NL_SET_ERR_MSG_MOD(extack, "The offload action is not 
> supported");
> + return -EOPNOTSUPP;
>   }
>   }
>  
> 


Re: [PATCH net-next v2] net: sched: act_tunnel_key: fix metadata handling

2019-02-27 Thread Roi Dayan
005c73c202 RCX: 
> 7f28e91a8b87
> [  262.079087] RDX:  RSI: 7ffdc5c4e340 RDI: 
> 0003
> [  262.088122] RBP:  R08: 0001 R09: 
> 000c
> [  262.097157] R10: 000c R11: 0246 R12: 
> 0001
> [  262.106207] R13: 0067b4e0 R14: 7ffdc5c5248c R15: 
> 7ffdc5c52480
> [  262.115271] Modules linked in: act_tunnel_key act_skbmod act_simple 
> act_connmark nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 act_csum libcrc32c 
> act_meta_skbtcindex act_meta_skbprio act_meta_mark act_ife ife act_police 
> act_sample psample act_gact veth nfsv3 nfs_acl nfs lockd grace fscache bridge 
> stp llc intel_rapl sb_edac mlx5_ib x86_pkg_temp_thermal sunrpc 
> intel_powerclamp coretemp ib_uverbs kvm_intel ib_core kvm irqbypass mlx5_core 
> crct10dif_pclmul crc32_pclmul crc32c_intel igb ghash_clmulni_intel 
> intel_cstate mlxfw iTCO_wdt devlink intel_uncore iTCO_vendor_support 
> ipmi_ssif ptp mei_me intel_rapl_perf ioatdma joydev pps_core ses mei i2c_i801 
> pcspkr enclosure lpc_ich dca wmi ipmi_si ipmi_devintf ipmi_msghandler 
> acpi_pad acpi_power_meter pcc_cpufreq ast i2c_algo_bit drm_kms_helper ttm drm 
> mpt3sas raid_class scsi_transport_sas
> [  262.204393] CR2: 00b0
> [  262.210390] ---[ end trace 2e41d786f2c7901a ]---
> [  262.226790] RIP: 0010:dst_cache_destroy+0x21/0xa0
> [  262.234083] Code: f4 ff ff ff eb f6 0f 1f 00 0f 1f 44 00 00 41 56 41 55 49 
> c7 c6 60 fe 35 af 41 54 55 49 89 fc 53 bd ff ff ff ff e8 ef 98 73 ff <49> 83 
> 3c 24 00 75 35 eb 6c 4c 63 ed e8 de 98 73 ff 4a 8d 3c ed 40
> [  262.258311] RSP: 0018:888316447160 EFLAGS: 00010282
> [  262.266304] RAX:  RBX: 88835b3e2f00 RCX: 
> ad1c5071
> [  262.276251] RDX: 0003 RSI: dc00 RDI: 
> 0297
> [  262.286208] RBP:  R08: fbfff5dd4e89 R09: 
> fbfff5dd4e89
> [  262.296183] R10: 0001 R11: fbfff5dd4e88 R12: 
> 00b0
> [  262.306157] R13: 8883267a10c0 R14: af35fe60 R15: 
> 0001
> [  262.316139] FS:  7f28ea3e6400() GS:88836420() 
> knlGS:
> [  262.327146] CS:  0010 DS:  ES:  CR0: 80050033
> [  262.335815] CR2: 00b0 CR3: 0003178ae004 CR4: 
> 001606e0
> 
> Fixes: 41411e2fd6b8 ("net/sched: act_tunnel_key: Add dst_cache support")
> Signed-off-by: Vlad Buslov 
> ---
> Changes from V1 to V2:
> - Extract metadata->dst error handler fix into standalone patch that targets
>   net.
> 
>  net/sched/act_tunnel_key.c | 18 +-
>  1 file changed, 9 insertions(+), 9 deletions(-)
> 
> diff --git a/net/sched/act_tunnel_key.c b/net/sched/act_tunnel_key.c
> index 3404af72b4c1..fc2b884b170c 100644
> --- a/net/sched/act_tunnel_key.c
> +++ b/net/sched/act_tunnel_key.c
> @@ -201,8 +201,14 @@ static void tunnel_key_release_params(struct 
> tcf_tunnel_key_params *p)
>  {
>   if (!p)
>   return;
> - if (p->tcft_action == TCA_TUNNEL_KEY_ACT_SET)
> + if (p->tcft_action == TCA_TUNNEL_KEY_ACT_SET) {
> +#ifdef CONFIG_DST_CACHE
> + struct ip_tunnel_info *info = &p->tcft_enc_metadata->u.tun_info;
> +
> + dst_cache_destroy(&info->dst_cache);
> +#endif
>   dst_release(&p->tcft_enc_metadata->dst);
> + }
>   kfree_rcu(p, rcu);
>  }
>  
> @@ -384,7 +390,8 @@ static int tunnel_key_init(struct net *net, struct nlattr 
> *nla,
>  
>  release_dst_cache:
>  #ifdef CONFIG_DST_CACHE
> - dst_cache_destroy(&metadata->u.tun_info.dst_cache);
> + if (metadata)
> + dst_cache_destroy(&metadata->u.tun_info.dst_cache);
>  #endif
>  release_tun_meta:
>   dst_release(&metadata->dst);
> @@ -401,15 +408,8 @@ static void tunnel_key_release(struct tc_action *a)
>  {
>   struct tcf_tunnel_key *t = to_tunnel_key(a);
>   struct tcf_tunnel_key_params *params;
> -#ifdef CONFIG_DST_CACHE
> - struct ip_tunnel_info *info;
> -#endif
>  
>   params = rcu_dereference_protected(t->params, 1);
> -#ifdef CONFIG_DST_CACHE
> - info = ¶ms->tcft_enc_metadata->u.tun_info;
> - dst_cache_destroy(&info->dst_cache);
> -#endif
>   tunnel_key_release_params(params);
>  }
>  
> 

Reviewed-by: Roi Dayan 


Re: [PATCH net-next v2 1/5] net/mlx5e: Return -EOPNOTSUPP when modify header action zero

2019-02-26 Thread Roi Dayan


On 25/02/2019 12:40, xiangxia.m@gmail.com wrote:
> From: Tonghao Zhang 
> 
> When max modify header action is zero, we return -EOPNOTSUPP
> directly. In this way, we can ignore wrong message info (e.g.
> "mlx5: parsed 0 pedit actions, can't do more").
> 
> This happens when offloading pedit actions on mlx VFs.
> 
> For example:
> $ tc filter add dev mlx5_vf parent : protocol ip prio 1 \
>   flower skip_sw dst_mac 00:10:56:fb:64:e8 \
>   dst_ip 1.1.1.100 src_ip 1.1.1.200 \
>   action pedit ex munge eth src set 00:10:56:b4:5d:20
> 
> Signed-off-by: Tonghao Zhang 
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 12 ++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c 
> b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> index b38986e..708f819 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> @@ -2002,7 +2002,8 @@ static int offload_pedit_fields(struct 
> pedit_headers_action *hdrs,
>  static int alloc_mod_hdr_actions(struct mlx5e_priv *priv,
>struct pedit_headers_action *hdrs,
>int namespace,
> -  struct mlx5e_tc_flow_parse_attr *parse_attr)
> +  struct mlx5e_tc_flow_parse_attr *parse_attr,
> +  struct netlink_ext_ack *extack)
>  {
>   int nkeys, action_size, max_actions;
>  
> @@ -2015,6 +2016,12 @@ static int alloc_mod_hdr_actions(struct mlx5e_priv 
> *priv,
>   else /* namespace is MLX5_FLOW_NAMESPACE_KERNEL - NIC offloading */
>   max_actions = MLX5_CAP_FLOWTABLE_NIC_RX(priv->mdev, 
> max_modify_header_actions);
>  
> + if (!max_actions) {
> + NL_SET_ERR_MSG_MOD(extack,
> +"don't support pedit actions, can't 
> offload");

can we rephrase that to match the msg style you did in patch 5 ?
i.e. The pedit offload action is not supported


> + return -EOPNOTSUPP;
> + }
> +
>   /* can get up to crazingly 16 HW actions in 32 bits pedit SW key */
>   max_actions = min(max_actions, nkeys * 16);
>  
> @@ -2072,7 +2079,8 @@ static int alloc_tc_pedit_action(struct mlx5e_priv 
> *priv, int namespace,
>   u8 cmd;
>  
>   if (!parse_attr->mod_hdr_actions) {
> - err = alloc_mod_hdr_actions(priv, hdrs, namespace, parse_attr);
> + err = alloc_mod_hdr_actions(priv, hdrs,
> + namespace, parse_attr, extack);
>   if (err)
>   goto out_err;
>   }
> 


Re: [PATCH net-next v2 2/5] net/mlx5e: Make the log friendly when decapsulation offload not supported

2019-02-26 Thread Roi Dayan


On 25/02/2019 12:40, xiangxia.m@gmail.com wrote:
> From: Tonghao Zhang 
> 
> If we try to offload decapsulation actions to VFs hw, we get the log [1].
> It's not friendly, because the kind of net device is null, and we don't
> know what '0' means.
> 
> [1] "mlx5_core :05:01.2 vf_0: decapsulation offload is not supported for  
> net device (0)"
> 
> Signed-off-by: Tonghao Zhang 
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.c | 8 +---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.c 
> b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.c
> index bdcc5e7..6cbfbfa 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.c
> @@ -84,7 +84,7 @@ static const char *mlx5e_netdev_kind(struct net_device *dev)
>   if (dev->rtnl_link_ops)
>   return dev->rtnl_link_ops->kind;
>   else
> - return "";
> + return "unknown";
>  }
>  
>  static int mlx5e_route_lookup_ipv6(struct mlx5e_priv *priv,
> @@ -620,8 +620,10 @@ int mlx5e_tc_tun_parse(struct net_device *filter_dev,
>   headers_c, headers_v);
>   } else {
>   netdev_warn(priv->netdev,
> - "decapsulation offload is not supported for %s net 
> device (%d)\n",
> - mlx5e_netdev_kind(filter_dev), tunnel_type);
> + "decapsulation offload is not supported for %s 
> (kind: \"%s\")\n",
> + netdev_name(filter_dev),
> + mlx5e_netdev_kind(filter_dev));
> +
>   return -EOPNOTSUPP;
>   }
>   return err;
> 

Reviewed-by: Roi Dayan 


Re: [PATCH net-next v2 5/5] net/mlx5e: Return -EOPNOTSUPP when attempting to offload an unsupported action

2019-02-26 Thread Roi Dayan


On 25/02/2019 12:40, xiangxia.m@gmail.com wrote:
> From: Tonghao Zhang 
> 
> The encapsulation is not supported for mlx5 VFs. When we try to
> offload that action, the -EINVAL is returned, but not -EOPNOTSUPP.
> This patch changes the returned value and ignore to confuse user.
> 
> For example: (p2p1_0 is VF net device)
> tc filter add dev p2p1_0 protocol ip  parent : prio 1 flower skip_sw \
>   src_mac e4:11:22:33:44:01   \
>   action tunnel_key set   \
>   src_ip 1.1.1.100\
>   dst_ip 1.1.1.200\
>   dst_port 4789 id 100\
>   action mirred egress redirect dev vxlan0
> 
> "RTNETLINK answers: Invalid argument"
> 
> Signed-off-by: Tonghao Zhang 
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c 
> b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> index d9fcb14..f5029ea 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> @@ -2302,7 +2302,8 @@ static int parse_tc_nic_actions(struct mlx5e_priv *priv,
>   }
>   break;
>   default:
> - return -EINVAL;
> + NL_SET_ERR_MSG_MOD(extack, "The offload action is not 
> supported");
> + return -EOPNOTSUPP;
>   }
>   }
>  
> @@ -2624,7 +2625,8 @@ static int parse_tc_fdb_actions(struct mlx5e_priv *priv,
>   break;
>   }
>   default:
> - return -EINVAL;
> + NL_SET_ERR_MSG_MOD(extack, "The offload action is not 
> supported");
> + return -EOPNOTSUPP;
>   }
>   }
>  
> 

Reviewed-by: Roi Dayan 


Re: [PATCH net-next v2 4/5] net/mlx5e: Deletes unnecessary setting of esw_attr->parse_attr

2019-02-26 Thread Roi Dayan


On 25/02/2019 12:40, xiangxia.m@gmail.com wrote:
> From: Tonghao Zhang 
> 
> This patch deletes unnecessary setting of the esw_attr->parse_attr
> to parse_attr in parse_tc_fdb_actions() because it is already done
> by the mlx5e_flow_esw_attr_init() function.
> 
> Signed-off-by: Tonghao Zhang 
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c 
> b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> index e6583b9..d9fcb14 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> @@ -2566,7 +2566,6 @@ static int parse_tc_fdb_actions(struct mlx5e_priv *priv,
>   out_dev->ifindex;
>   parse_attr->tun_info[attr->out_count] = *info;
>   encap = false;
> - attr->parse_attr = parse_attr;
>   attr->dests[attr->out_count].flags |=
>   MLX5_ESW_DEST_ENCAP;
>   attr->out_count++;
> 

Reviewed-by: Roi Dayan 


Re: [PATCH net-next v2 3/5] net/mlx5e: Remove 'parse_attr' argument in parse_tc_fdb_actions()

2019-02-26 Thread Roi Dayan


On 25/02/2019 12:40, xiangxia.m@gmail.com wrote:
> From: Tonghao Zhang 
> 
> This patch is a little improvement. Simplify the parse_tc_fdb_actions().
> 
> Signed-off-by: Tonghao Zhang 
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c 
> b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> index 708f819..e6583b9 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> @@ -2475,13 +2475,13 @@ static int parse_tc_vlan_action(struct mlx5e_priv 
> *priv,
>  
>  static int parse_tc_fdb_actions(struct mlx5e_priv *priv,
>   struct flow_action *flow_action,
> - struct mlx5e_tc_flow_parse_attr *parse_attr,
>   struct mlx5e_tc_flow *flow,
>   struct netlink_ext_ack *extack)
>  {
>   struct pedit_headers_action hdrs[2] = {};
>   struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
>   struct mlx5_esw_flow_attr *attr = flow->esw_attr;
> + struct mlx5e_tc_flow_parse_attr *parse_attr = attr->parse_attr;
>   struct mlx5e_rep_priv *rpriv = priv->ppriv;
>   const struct ip_tunnel_info *info = NULL;
>   const struct flow_action_entry *act;
> @@ -2796,7 +2796,7 @@ static bool is_peer_flow_needed(struct mlx5e_tc_flow 
> *flow)
>   if (err)
>   goto err_free;
>  
> - err = parse_tc_fdb_actions(priv, &rule->action, parse_attr, flow, 
> extack);
> +     err = parse_tc_fdb_actions(priv, &rule->action, flow, extack);
>   if (err)
>   goto err_free;
>  
> 

Reviewed-by: Roi Dayan 


[PATCH net] net/sched: cls_flower: Remove old entries from rhashtable

2018-12-19 Thread Roi Dayan
When replacing a rule we add the new rule to the rhashtable
but only remove the old if not in skip_sw.
This commit fix this and remove the old rule anyway.

Fixes: 35cc3cefc4de ("net/sched: cls_flower: Reject duplicated rules also under 
skip_sw")
Signed-off-by: Roi Dayan 
Reviewed-by: Vlad Buslov 
---
 net/sched/cls_flower.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 71312d7bd8f4..208d940464d7 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -1258,10 +1258,9 @@ static int fl_change(struct net *net, struct sk_buff 
*in_skb,
fnew->flags |= TCA_CLS_FLAGS_NOT_IN_HW;
 
if (fold) {
-   if (!tc_skip_sw(fold->flags))
-   rhashtable_remove_fast(&fold->mask->ht,
-  &fold->ht_node,
-  fold->mask->filter_ht_params);
+   rhashtable_remove_fast(&fold->mask->ht,
+  &fold->ht_node,
+  fold->mask->filter_ht_params);
if (!tc_skip_hw(fold->flags))
fl_hw_destroy_filter(tp, fold, NULL);
}
-- 
2.7.5



[PATCH iproute2] tc: Fix output of ip attributes

2018-07-03 Thread Roi Dayan
Example output is of tos and ttl.
Befoe this fix the format used %x caused output of the pointer
instead of the intended string created in the out variable.

Fixes: e28b88a464c4 ("tc: jsonify flower filter")
Signed-off-by: Roi Dayan 
---
 tc/f_flower.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tc/f_flower.c b/tc/f_flower.c
index c710765179fb..1dfd57d286d9 100644
--- a/tc/f_flower.c
+++ b/tc/f_flower.c
@@ -1134,7 +1134,7 @@ static void flower_print_ip_attr(char *name, struct 
rtattr *key_attr,
if (mask_attr)
sprintf(out + done, "/%x", rta_getattr_u8(mask_attr));
 
-   sprintf(namefrm, "\n  %s %%x", name);
+   sprintf(namefrm, "\n  %s %%s", name);
print_string(PRINT_ANY, name, namefrm, out);
 }
 
-- 
2.7.5



Re: [PATCH net-next 0/2] cls_flower: Various fixes

2018-06-04 Thread Roi Dayan




On 03/06/2018 22:39, Jiri Pirko wrote:

Sun, Jun 03, 2018 at 08:33:25PM CEST, xiyou.wangc...@gmail.com wrote:

On Wed, May 30, 2018 at 1:17 AM, Paul Blakey  wrote:

Two of the fixes are for my multiple mask patch

Paul Blakey (2):
   cls_flower: Fix missing free of rhashtable
   cls_flower: Fix comparing of old filter mask with new filter


Both are bug fixes and one-line fixes, so definitely should go
to -net tree and -stable tree.


I agree.



it's because the commit they fix doesn't exists in net yet.



[PATCH iproute2-next] tc: Fix compilation error with old iptables

2018-03-27 Thread Roi Dayan
The compat_rev field does not exists in old versions of iptables.
e.g. iptables 1.4.

Fixes: dd29621578d2 ("tc: add em_ipt ematch for calling xtables matches from tc 
matching context")
Signed-off-by: Roi Dayan 
---
 tc/em_ipt.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tc/em_ipt.c b/tc/em_ipt.c
index 9d2b2f9b46d4..aa2edd63c550 100644
--- a/tc/em_ipt.c
+++ b/tc/em_ipt.c
@@ -41,7 +41,9 @@ static struct xtables_globals em_tc_ipt_globals = {
.program_version = "0.1",
.orig_opts = original_opts,
.opts = original_opts,
+#if (XTABLES_VERSION_CODE >= 11)
.compat_rev = xtables_compatible_revision,
+#endif
 };
 
 static struct xt_entry_match *fake_xt_entry_match(int data_size, void *data)
-- 
2.7.0



[PATCH net] net/sched: Fix update of lastuse in act modules implementing stats_update

2017-12-25 Thread Roi Dayan
We need to update lastuse to to the most updated value between what
is already set and the new value.
If HW matching fails, i.e. because of an issue, the stats are not updated
but it could be that software did match and updated lastuse.

Fixes: 5712bf9c5c30 ("net/sched: act_mirred: Use passed lastuse argument")
Fixes: 9fea47d93bcc ("net/sched: act_gact: Update statistics when offloaded to 
hardware")
Signed-off-by: Roi Dayan 
Reviewed-by: Paul Blakey 
Acked-by: Jiri Pirko 
---
 net/sched/act_gact.c   | 2 +-
 net/sched/act_mirred.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/sched/act_gact.c b/net/sched/act_gact.c
index e29a48e..a0ac42b 100644
--- a/net/sched/act_gact.c
+++ b/net/sched/act_gact.c
@@ -159,7 +159,7 @@ static void tcf_gact_stats_update(struct tc_action *a, u64 
bytes, u32 packets,
if (action == TC_ACT_SHOT)
this_cpu_ptr(gact->common.cpu_qstats)->drops += packets;
 
-   tm->lastuse = lastuse;
+   tm->lastuse = max_t(u64, tm->lastuse, lastuse);
 }
 
 static int tcf_gact_dump(struct sk_buff *skb, struct tc_action *a,
diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
index 8b3e5938..08b6184 100644
--- a/net/sched/act_mirred.c
+++ b/net/sched/act_mirred.c
@@ -239,7 +239,7 @@ static void tcf_stats_update(struct tc_action *a, u64 
bytes, u32 packets,
struct tcf_t *tm = &m->tcf_tm;
 
_bstats_cpu_update(this_cpu_ptr(a->cpu_bstats), bytes, packets);
-   tm->lastuse = lastuse;
+   tm->lastuse = max_t(u64, tm->lastuse, lastuse);
 }
 
 static int tcf_mirred_dump(struct sk_buff *skb, struct tc_action *a, int bind,
-- 
2.7.5



Re: [net] vxlan: fix use-after-free on deletion

2017-06-01 Thread Roi Dayan



On 01/06/2017 11:53, Jiri Benc wrote:

On Thu, 1 Jun 2017 11:43:35 +0300, Mark Bloch wrote:

Adding a vxlan interface to a socket isn't symmetrical, while adding
is done in vxlan_open() the deletion is done in vxlan_dellink().
This can cause a use-after-free error when we close the vxlan
interface before deleting it.

We add vxlan_vs_del_dev() to match vxlan_vs_add_dev() and call
it from vxlan_stop() to match the call from vxlan_open().

Signed-off-by: Mark Bloch 


Acked-by: Jiri Benc 



Hi, I did some tests and didn't reproduce the original kasan issue
reported.

Tested-by: Roi Dayan 


[PATCH iproute2] devlink: Add option to set and show eswitch encapsulation support

2017-05-20 Thread Roi Dayan
This is an e-switch global knob to enable HW support for applying
encapsulation/decapsulation to VF traffic as part of SRIOV e-switch offloading.

The actual encap/decap is carried out (along with the matching and other
actions) per offloaded e-switch rules, e.g as done when offloading the TC tunnel
key action.

Possible values are enable/disable.

Signed-off-by: Roi Dayan 
Reviewed-by: Jiri Pirko 
---
 devlink/devlink.c  | 48 +++-
 man/man8/devlink-dev.8 | 13 +
 2 files changed, 60 insertions(+), 1 deletion(-)

diff --git a/devlink/devlink.c b/devlink/devlink.c
index e22ee0a..f9bc16c 100644
--- a/devlink/devlink.c
+++ b/devlink/devlink.c
@@ -176,6 +176,7 @@ static void ifname_map_free(struct ifname_map *ifname_map)
 #define DL_OPT_ESWITCH_INLINE_MODE BIT(12)
 #define DL_OPT_DPIPE_TABLE_NAMEBIT(13)
 #define DL_OPT_DPIPE_TABLE_COUNTERSBIT(14)
+#define DL_OPT_ESWITCH_ENCAP_MODE  BIT(15)
 
 struct dl_opts {
uint32_t present; /* flags of present items */
@@ -195,6 +196,7 @@ struct dl_opts {
enum devlink_eswitch_inline_mode eswitch_inline_mode;
const char *dpipe_table_name;
bool dpipe_counters_enable;
+   bool eswitch_encap_mode;
 };
 
 struct dl {
@@ -299,6 +301,7 @@ static const enum mnl_attr_data_type 
devlink_policy[DEVLINK_ATTR_MAX + 1] = {
[DEVLINK_ATTR_SB_OCC_MAX] = MNL_TYPE_U32,
[DEVLINK_ATTR_ESWITCH_MODE] = MNL_TYPE_U16,
[DEVLINK_ATTR_ESWITCH_INLINE_MODE] = MNL_TYPE_U8,
+   [DEVLINK_ATTR_ESWITCH_ENCAP_MODE] = MNL_TYPE_U8,
[DEVLINK_ATTR_DPIPE_TABLES] = MNL_TYPE_NESTED,
[DEVLINK_ATTR_DPIPE_TABLE] = MNL_TYPE_NESTED,
[DEVLINK_ATTR_DPIPE_TABLE_NAME] = MNL_TYPE_STRING,
@@ -754,6 +757,19 @@ static int dpipe_counters_enable_get(const char *typestr,
return 0;
 }
 
+static int eswitch_encap_mode_get(const char *typestr, bool *p_mode)
+{
+   if (strcmp(typestr, "enable") == 0) {
+   *p_mode = true;
+   } else if (strcmp(typestr, "disable") == 0) {
+   *p_mode = false;
+   } else {
+   pr_err("Unknown eswitch encap mode \"%s\"\n", typestr);
+   return -EINVAL;
+   }
+   return 0;
+}
+
 static int dl_argv_parse(struct dl *dl, uint32_t o_required,
 uint32_t o_optional)
 {
@@ -908,7 +924,19 @@ static int dl_argv_parse(struct dl *dl, uint32_t 
o_required,
if (err)
return err;
o_found |= DL_OPT_DPIPE_TABLE_COUNTERS;
+   } else if (dl_argv_match(dl, "encap") &&
+  (o_all & DL_OPT_ESWITCH_ENCAP_MODE)) {
+   const char *typestr;
 
+   dl_arg_inc(dl);
+   err = dl_argv_str(dl, &typestr);
+   if (err)
+   return err;
+   err = eswitch_encap_mode_get(typestr,
+&opts->eswitch_encap_mode);
+   if (err)
+   return err;
+   o_found |= DL_OPT_ESWITCH_ENCAP_MODE;
} else {
pr_err("Unknown option \"%s\"\n", dl_argv(dl));
return -EINVAL;
@@ -986,6 +1014,13 @@ static int dl_argv_parse(struct dl *dl, uint32_t 
o_required,
pr_err("Dpipe table counter state expected\n");
return -EINVAL;
}
+
+   if ((o_required & DL_OPT_ESWITCH_ENCAP_MODE) &&
+   !(o_found & DL_OPT_ESWITCH_ENCAP_MODE)) {
+   pr_err("E-Switch encapsulation option expected.\n");
+   return -EINVAL;
+   }
+
return 0;
 }
 
@@ -1041,6 +1076,9 @@ static void dl_opts_put(struct nlmsghdr *nlh, struct dl 
*dl)
if (opts->present & DL_OPT_DPIPE_TABLE_COUNTERS)
mnl_attr_put_u8(nlh, DEVLINK_ATTR_DPIPE_TABLE_COUNTERS_ENABLED,
opts->dpipe_counters_enable);
+   if (opts->present & DL_OPT_ESWITCH_ENCAP_MODE)
+   mnl_attr_put_u8(nlh, DEVLINK_ATTR_ESWITCH_ENCAP_MODE,
+   opts->eswitch_encap_mode);
 }
 
 static int dl_argv_parse_put(struct nlmsghdr *nlh, struct dl *dl,
@@ -1097,6 +1135,7 @@ static void cmd_dev_help(void)
pr_err("Usage: devlink dev show [ DEV ]\n");
pr_err("   devlink dev eswitch set DEV [ mode { legacy | switchdev 
} ]\n");
pr_err("   [ inline-mode { none | link | 
network | transport } ]\n");
+   pr_err("   [ encap { disable | enable } 
]\n");
pr_err("   devlink dev eswitch show DEV\n")

[PATCH iproute2 net-next] devlink: Add json and pretty options to help and man

2017-03-06 Thread Roi Dayan
While at it also fixed missing double dash for long opts.

Signed-off-by: Roi Dayan 
---
 devlink/devlink.c  |  2 +-
 man/man8/devlink.8 | 14 --
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/devlink/devlink.c b/devlink/devlink.c
index c357580..e90226e 100644
--- a/devlink/devlink.c
+++ b/devlink/devlink.c
@@ -2470,7 +2470,7 @@ static void help(void)
 {
pr_err("Usage: devlink [ OPTIONS ] OBJECT { COMMAND | help }\n"
   "where  OBJECT := { dev | port | sb | monitor }\n"
-  "   OPTIONS := { -V[ersion] | -n[no-nice-names] }\n");
+  "   OPTIONS := { -V[ersion] | -n[no-nice-names] | -j[json] | 
-p[pretty] }\n");
 }
 
 static int dl_cmd(struct dl *dl)
diff --git a/man/man8/devlink.8 b/man/man8/devlink.8
index cf0563b..a480766 100644
--- a/man/man8/devlink.8
+++ b/man/man8/devlink.8
@@ -20,19 +20,29 @@ devlink \- Devlink tool
 .IR OPTIONS " := { "
 \fB\-V\fR[\fIersion\fR] |
 \fB\-n\fR[\fIno-nice-names\fR] }
+\fB\-j\fR[\fIjson\fR] }
+\fB\-p\fR[\fIpretty\fR] }
 
 .SH OPTIONS
 
 .TP
-.BR "\-V" , " -Version"
+.BR "\-V" , " --Version"
 Print the version of the
 .B devlink
 utility and exit.
 
 .TP
-.BR "\-n" , " -no-nice-names"
+.BR "\-n" , " --no-nice-names"
 Turn off printing out nice names, for example netdevice ifnames instead of 
devlink port identification.
 
+.TP
+.BR "\-j" , " --json"
+Generate JSON output.
+
+.TP
+.BR "\-p" , " --pretty"
+When combined with -j generate a pretty JSON output.
+
 .SS
 .I OBJECT
 
-- 
2.7.4



  1   2   >