Re: [PATCH net-next] liquidio: correct error msg text when removing VLAN ID

2018-07-16 Thread David Miller
From: Felix Manlunas 
Date: Mon, 16 Jul 2018 18:06:07 -0700

> From: Rick Farrington 
> 
> Signed-off-by: Rick Farrington 
> Signed-off-by: Felix Manlunas 

Applied.


Re: [PATCH v3 net-next] net/sched: add skbprio scheduler

2018-07-16 Thread Cong Wang
On Fri, Jul 13, 2018 at 9:51 PM Marcelo Ricardo Leitner
 wrote:
>
> Well, it would help if you didn't cut out key parts of my words.

Sorry about it, please allow me to copy and paste all of your words
here:

"Yes, but Michel wants to drop from other lower priorities if needed,
and that's not possible if you handle the limit already in a child
qdisc as they don't know about their siblings. The idea in the example
above is to discard it from whatever lower priority is needed, then
queue it. (ok, the example missed to check the priority level)"

So from your own words, you agreed "the idea in the example"
is not what Michel wants, because "is to discard it from whatever
lower priority is needed", as "Michel wants to drop from other lower
priorities if needed".

You also agreed Michel's requirement is not possible (to implement
in sch_prio) because "you handle the limit already in a child qdisc
as they don't know about their siblings" is also true.

Based on the above, I said it "disproves your point of adding a flag
to sch_prio".

What am I missing?
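
For readers following the argument, the behaviour being debated (discard from whatever lower priority holds packets, then queue the new one) can be pictured with a small user-space model. This is only an illustrative sketch and not the skbprio patch: the class name, the tail drop, and dropping the newcomer on a tie are assumptions made for the example.

```python
from collections import deque


class PrioDropQueue:
    """Toy qdisc: one packet limit shared by every priority band."""

    def __init__(self, limit, bands=4):
        self.limit = limit
        # bands[0] is the lowest priority, bands[-1] the highest.
        self.bands = [deque() for _ in range(bands)]

    def __len__(self):
        return sum(len(band) for band in self.bands)

    def enqueue(self, prio, pkt):
        """Queue pkt; return whichever packet was dropped, or None."""
        if len(self) < self.limit:
            self.bands[prio].append(pkt)
            return None
        # Full: look for the lowest band that currently holds packets.
        lowest = next((i for i, band in enumerate(self.bands) if band), None)
        if lowest is None or lowest >= prio:
            return pkt  # nothing of lower priority to evict
        victim = self.bands[lowest].pop()  # evict from the tail
        self.bands[prio].append(pkt)
        return victim

    def dequeue(self):
        """Serve the highest non-empty band first."""
        for band in reversed(self.bands):
            if band:
                return band.popleft()
        return None
```

The point of the model is that the eviction decision needs a view of all bands at once, which is what a child qdisc with its own private limit does not have.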



>
> >
> > What am I missing here?
> >
> > Are you going further by suggesting moving the limit out of prio?
> > Or are you going to expand your definition of "adding a flag"?
> > Perhaps two flags? :)
> >
> > I am very open for discussion to see how far we can go.
>
> I am not keen on continuing this discussion if you keep twisting my
> words just for fun.

No, I am seriously trying to understand what you are suggesting here.

Please be patient! I know I am stupid :)

Thanks!


Re: [PATCH v3 net-next 0/3] rds: IPv6 support

2018-07-16 Thread Ka-Cheong Poon

On 07/17/2018 12:20 AM, Sowmini Varadhan wrote:


> -  Looks like rds_connect() is checking things in the right order (thanks)
> However, rds_cancel_sent_to is still looking at the len to figure
> out the family.. as we move to ipv6,  it would be better if we allow
> the caller to specify struct sockaddr_storage, or even a union of
> sockaddr_in/sockaddr_in6, rather than require them to hint at which
> one of ipv4/ipv6 through the optlen.



The app can use either structure to make the call.  When the
app fills in the structure, it knows what it is filling in,
either sockaddr_in or sockaddr_in6.  So it knows the right size
to use.  The app can also use IPv4 mapped address in a sockaddr_in6
without a problem.
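
As a user-space illustration of the IPv4-mapped point, the sketch below uses Python's standard `ipaddress` module to show how an address carried in IPv6 form can still designate an IPv4 peer. The helper name `effective_family` is hypothetical and this is not RDS code.

```python
import ipaddress


def effective_family(addr_text):
    """Return ("ipv4" | "ipv6", address) for the peer actually addressed."""
    addr = ipaddress.ip_address(addr_text)
    if addr.version == 4:
        return "ipv4", addr
    # ipv4_mapped is None unless the address is of the form ::ffff:a.b.c.d
    mapped = addr.ipv4_mapped
    if mapped is not None:
        return "ipv4", mapped
    return "ipv6", addr
```

For example, `effective_family("::ffff:192.0.2.1")` reports an IPv4 peer even though the caller supplied an IPv6-formatted address.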



> Please see __sys_connect and move_addr_to_kernel if the user-kernel
> copy is the reason you are not doing this. Similar to inet_dgram_connect
> you can then check the sa_family and use that to figure out the
> "Assume IPv4" etc stuff.
>
> This would also make the CANCEL_SEND_TO API consistent with the bind/
> connect etc semantics.
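
The suggestion to key off sa_family rather than the option length can be sketched in user space as follows. This is a rough model under stated assumptions: it assumes the Linux layouts of sockaddr_in (16 bytes) and sockaddr_in6 (28 bytes) with a native-endian 16-bit family field first, and the helper names are hypothetical, not kernel or RDS functions.

```python
import socket
import struct

SOCKADDR_IN_LEN = 16   # family(2) + port(2) + addr(4) + padding(8)
SOCKADDR_IN6_LEN = 28  # family(2) + port(2) + flowinfo(4) + addr(16) + scope(4)


def pack_sockaddr_in(ip, port):
    return (struct.pack("=H", socket.AF_INET) + struct.pack("!H", port)
            + socket.inet_pton(socket.AF_INET, ip) + bytes(8))


def pack_sockaddr_in6(ip, port, flowinfo=0, scope_id=0):
    return (struct.pack("=H", socket.AF_INET6)
            + struct.pack("!HI", port, flowinfo)
            + socket.inet_pton(socket.AF_INET6, ip)
            + struct.pack("=I", scope_id))


def parse_sockaddr(buf):
    """Decide the family from the family field; length is only validated."""
    if len(buf) < 2:
        raise ValueError("buffer too short for a family field")
    (family,) = struct.unpack_from("=H", buf)
    (port,) = struct.unpack_from("!H", buf, 2) if len(buf) >= 4 else (0,)
    if family == socket.AF_INET:
        if len(buf) < SOCKADDR_IN_LEN:
            raise ValueError("short sockaddr_in")
        return "ipv4", socket.inet_ntop(socket.AF_INET, buf[4:8]), port
    if family == socket.AF_INET6:
        if len(buf) < SOCKADDR_IN6_LEN:
            raise ValueError("short sockaddr_in6")
        return "ipv6", socket.inet_ntop(socket.AF_INET6, buf[8:24]), port
    raise ValueError("unsupported address family")
```

With this shape a caller may hand over a larger buffer (for instance one sized like sockaddr_storage) holding an IPv4 address, and the parser still treats it as IPv4 because the length is no longer the discriminator.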



Could you please explain the inconsistency?  An app can use IPv4
mapped address in a sockaddr_in6 to operate on an IPv4 connection,
in case you are thinking of this new addition in v3 of the patch.



> -  net/rds/rds.h: thanks for moving RDS_CM_PORT to the rdma specific file.
>
> I am guessing (?) that you want to update the comment to talk about
> the non-existent "RDS over UDP" based on the title of the IANA registration?
> I would just like to re-iterate that this is actually inaccurate
> (and confusing to someone looking at this for the first time, since
> there is no RDS-over-UDP today). If it were up to me, I would update
> the comment to say
>
> /* The following ports, 16385, 18634, 18635, are registered with IANA as
>   * the ports to be used for "RDS over TCP and UDP".
>   * The current linux implementation supports RDS over TCP and IB, and uses
>   * the ports as follows: 18634 is the historical value used for the
>   * RDMA_CM listener port.  RDS/TCP uses port 16385.  After
>   * IPv6 work, RDMA_CM also uses 16385 as the listener port.  18634 is kept
>   * to ensure compatibility with older RDS modules.  Those ports are defined
>   * in each transport's header file.



Will update it to


/* The following ports, 16385, 18634, 18635, are registered with IANA as
 * the ports to be used for RDS over TCP and UDP.  Currently, only RDS over
 * TCP and RDS over IB/RDMA are implemented.  18634 is the historical value
 * used for the RDMA_CM listener port.  RDS/TCP uses port 16385.  After
 * IPv6 work, RDMA_CM also uses 16385 as the listener port.  18634 is kept
 * to ensure compatibility with older RDS modules.  Those ports are defined
 * in each transport's header file.
 */



--
K. Poon
ka-cheong.p...@oracle.com




Re: [PATCH ipsec-next] xfrm: Allow Set Mark to be Updated Using UPDSA

2018-07-16 Thread Eyal Birger
On Mon, 16 Jul 2018 15:27:26 -0700
Nathan Harold  wrote:

> < re-sent with apologies due to incorrect formatting last
> time... :-( >
> 
> Hi Eyal,
> 
> > If x1 points to a state previously found using
> > __xfrm_state_locate(x), won't __xfrm_state_bump_genids(x1) be
> > equivalent to x1->genid++ in this case?  
> 
> In the vanilla case this is true. IE, if there are no strange/abusive
> uses of the API such as the test below where multiple SAs can match
> the locate().
> 
> > Is it possible that other states will match all of x1 parameters?  
> 
> Yes.  Not sure if it's a bug or a feature, but it's possible for
> multiple SAs to match... for a depressing example, check out
> https://android-review.googlesource.com/c/kernel/tests/+/680958. There
> may be cases where something like this is desired behavior that I'm
> not aware of. Since this is control path, it felt to me like the
> formalism of using the xfrm_state_bump_genids() was worth not possibly
> walking into a different subtle bug later.

Ok. This is indeed depressing and also unexpected.

I wonder if this behavior could be fixed... I'd find it odd if anyone
is relying on being able to delete a 'no mark' state by supplying
parameters that do include an explicit mark. I have no idea if anyone
is relying on the state insertion order wrt marks - though it would
seem odd to me as well -- obviously such a change is unrelated to this
patch.

I now better understand the need to be cautious.

> 
> > Also, any idea why this isn't needed for other changes in the
> > state?  
> 
> The set_mark (output_mark) is somewhat special because changing this
> mark impacts the routing lookup, which up to now, none of the other
> parameters in the update_sa function do. A new output_mark can and
> will reroute packets to different interfaces. Thus, when we change
> this thing, we want to ensure that we always build a new bundle with a
> new route lookup based on the new set_mark. Since we
> removed the flow cache, things might *incidentally* seem to work right
> now; but, I think that's incidental rather than correct. By bumping
> the genid, we get the dst_entry->check() function to correctly return
> that the dst is obsolete when we call check(). I'm honestly not sure
> what corner cases we could land in if we didn't bump the genid in such
> a case.
> 
> There's definitely a lot going on behind the scenes in this little
> change that I only tenuously grasp, so it's possible that I'm being
> overly cautious in this case. Please let me know your further thoughts
> on whether we need to bump the genid. FYI once this patch is settled,
> I plan to upload a patch to update the xfrm_if_id, which I planned to
> nestle in to this same logic (and with similar, albeit possibly
> more-straightforward rationale).

Thanks so much for the clarification. Indeed there are nuances here and
I appreciate you taking the time to describe them.
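
To make the genid argument concrete, here is a toy model of why bumping a generation counter forces a fresh route lookup. It is a sketch only: `State`, `CachedBundle`, and `update_mark` are hypothetical names, and the real xfrm bundle validation is considerably more involved.

```python
class State:
    """Stand-in for an SA: an output mark plus a generation counter."""

    def __init__(self, output_mark):
        self.output_mark = output_mark
        self.genid = 0


class CachedBundle:
    """Stand-in for a cached dst: remembers the genid it was built against."""

    def __init__(self, state, lookup_route):
        self.state = state
        self.genid = state.genid
        # The route is resolved once, using the mark at build time.
        self.route = lookup_route(state.output_mark)

    def check(self):
        """True while the bundle is still valid for its state."""
        return self.genid == self.state.genid


def update_mark(state, new_mark):
    """Change the mark and bump genid so existing bundles go stale."""
    state.output_mark = new_mark
    state.genid += 1
```

Without the `genid += 1` line, a bundle built before the update would keep passing `check()` while still pointing at the route chosen for the old mark.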

FWIW you can add my:

Reviewed-by: Eyal Birger 

Thanks!
Eyal.


Re: [PATCH bpf-next 6/9] bpf: offload: allow program and map sharing per-ASIC

2018-07-16 Thread Jakub Kicinski
On Mon, 16 Jul 2018 20:00:47 -0700, Alexei Starovoitov wrote:
> On Mon, Jul 16, 2018 at 07:37:20PM -0700, Jakub Kicinski wrote:
> > Create a higher-level entity to represent a device/ASIC to allow
> > programs and maps to be shared between device ports.  The extra
> > work is required to make sure we don't destroy BPF objects as
> > soon as the netdev for which they were loaded gets destroyed,
> > as other ports may still be using them.  When netdev goes away
> > all of its BPF objects will be moved to other netdevs of the
> > device, and only destroyed when last netdev is unregistered.
> > 
> > Signed-off-by: Jakub Kicinski 
> > Reviewed-by: Quentin Monnet   
> ..
> > -bool bpf_offload_dev_match(struct bpf_prog *prog, struct bpf_map *map)
> > +static bool __bpf_offload_dev_match(struct bpf_prog *prog,
> > +   struct net_device *netdev)
> >  {
> > -   struct bpf_offloaded_map *offmap;
> > +   struct bpf_offload_netdev *ondev1, *ondev2;
> > struct bpf_prog_offload *offload;
> > -   bool ret;
> >  
> > if (!bpf_prog_is_dev_bound(prog->aux))
> > return false;
> > -   if (!bpf_map_is_dev_bound(map))
> > -   return bpf_map_offload_neutral(map);
> >  
> > -   down_read(&bpf_devs_lock);
> > offload = prog->aux->offload;
> > +   if (!offload)
> > +   return false;
> > +   if (offload->netdev == netdev)
> > +   return true;
> > +
> > +   ondev1 = bpf_offload_find_netdev(offload->netdev);
> > +   ondev2 = bpf_offload_find_netdev(netdev);
> > +
> > +   return ondev1 && ondev2 && ondev1->offdev == ondev2->offdev;
> > +}
> > +
> > +bool bpf_offload_dev_match(struct bpf_prog *prog, struct net_device *netdev)
> > +{
> > +   bool ret;
> > +
> > +   down_read(&bpf_devs_lock);
> > +   ret = __bpf_offload_dev_match(prog, netdev);
> > +   up_read(&bpf_devs_lock);
> > +
> > +   return ret;
> > +}
> > +EXPORT_SYMBOL_GPL(bpf_offload_dev_match);
> > +
> > +bool bpf_offload_match(struct bpf_prog *prog, struct bpf_map *map)
> > +{
> > +   struct bpf_offloaded_map *offmap;
> > +   bool ret;
> > +
> > +   if (!bpf_map_is_dev_bound(map))
> > +   return bpf_map_offload_neutral(map);
> > offmap = map_to_offmap(map);
> >  
> > -   ret = offload && offload->netdev == offmap->netdev;
> > +   down_read(&bpf_devs_lock);
> > +   ret = __bpf_offload_dev_match(prog, offmap->netdev);
> > up_read(&bpf_devs_lock);
> >  
> > return ret;
> >  }
> >
> ..
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index 9e2bf834f13a..2c5b923eef75 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -5054,7 +5054,7 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
> > }
> >  
> > if ((bpf_prog_is_dev_bound(prog->aux) || bpf_map_is_dev_bound(map)) &&
> > -   !bpf_offload_dev_match(prog, map)) {
> > +   !bpf_offload_match(prog, map)) {  
> 
> I'm confused by the new names and the renaming.
> Maybe split the renaming into a separate patch?
> Should the new bpf_offload_match() be called bpf_offload_prog_map_match()?
> Or some other name?
> Maybe adding comments to these functions will make it clear...

It is messy.  The new functions to register/unregister ASIC are called
bpf_offload_dev_*, hence it seemed like a good idea to call the
function exported to the drivers bpf_offload_dev_match() (see patches 7
and 8) to keep the driver API consistent.  But then the old function
which is only used by the verifier has to be renamed.

I will use bpf_offload_prog_map_match() and split to a separate patch.
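
Since the naming confusion is really about what is being compared, a user-space paraphrase of the matching rule in the quoted hunk may help. This is a sketch mirroring the quoted __bpf_offload_dev_match() logic, with a plain dict standing in for the netdev lookup; the function and parameter names are made up for the example.

```python
def offload_dev_match(prog_netdev, netdev, netdev_to_offdev):
    """Mirror of the quoted check: same netdev, or same offload device.

    prog_netdev      -- netdev the program was loaded for (None if orphaned)
    netdev           -- netdev we want to use the program on
    netdev_to_offdev -- mapping of registered netdevs to their ASIC/offdev
    """
    if prog_netdev is None:
        return False
    if prog_netdev == netdev:
        return True
    ondev1 = netdev_to_offdev.get(prog_netdev)
    ondev2 = netdev_to_offdev.get(netdev)
    return ondev1 is not None and ondev2 is not None and ondev1 == ondev2
```

The prog/map variant then reduces to calling this with the map's netdev as the second argument, which is why a name such as bpf_offload_prog_map_match() describes it more precisely.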


Re: [PATCH bpf-next 6/9] bpf: offload: allow program and map sharing per-ASIC

2018-07-16 Thread Alexei Starovoitov
On Mon, Jul 16, 2018 at 07:37:20PM -0700, Jakub Kicinski wrote:
> Create a higher-level entity to represent a device/ASIC to allow
> programs and maps to be shared between device ports.  The extra
> work is required to make sure we don't destroy BPF objects as
> soon as the netdev for which they were loaded gets destroyed,
> as other ports may still be using them.  When netdev goes away
> all of its BPF objects will be moved to other netdevs of the
> device, and only destroyed when last netdev is unregistered.
> 
> Signed-off-by: Jakub Kicinski 
> Reviewed-by: Quentin Monnet 
..
> -bool bpf_offload_dev_match(struct bpf_prog *prog, struct bpf_map *map)
> +static bool __bpf_offload_dev_match(struct bpf_prog *prog,
> + struct net_device *netdev)
>  {
> - struct bpf_offloaded_map *offmap;
> + struct bpf_offload_netdev *ondev1, *ondev2;
>   struct bpf_prog_offload *offload;
> - bool ret;
>  
>   if (!bpf_prog_is_dev_bound(prog->aux))
>   return false;
> - if (!bpf_map_is_dev_bound(map))
> - return bpf_map_offload_neutral(map);
>  
> - down_read(&bpf_devs_lock);
>   offload = prog->aux->offload;
> + if (!offload)
> + return false;
> + if (offload->netdev == netdev)
> + return true;
> +
> + ondev1 = bpf_offload_find_netdev(offload->netdev);
> + ondev2 = bpf_offload_find_netdev(netdev);
> +
> + return ondev1 && ondev2 && ondev1->offdev == ondev2->offdev;
> +}
> +
> +bool bpf_offload_dev_match(struct bpf_prog *prog, struct net_device *netdev)
> +{
> + bool ret;
> +
> + down_read(&bpf_devs_lock);
> + ret = __bpf_offload_dev_match(prog, netdev);
> + up_read(&bpf_devs_lock);
> +
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(bpf_offload_dev_match);
> +
> +bool bpf_offload_match(struct bpf_prog *prog, struct bpf_map *map)
> +{
> + struct bpf_offloaded_map *offmap;
> + bool ret;
> +
> + if (!bpf_map_is_dev_bound(map))
> + return bpf_map_offload_neutral(map);
>   offmap = map_to_offmap(map);
>  
> - ret = offload && offload->netdev == offmap->netdev;
> + down_read(&bpf_devs_lock);
> + ret = __bpf_offload_dev_match(prog, offmap->netdev);
>   up_read(&bpf_devs_lock);
>  
>   return ret;
>  }
>  
..
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 9e2bf834f13a..2c5b923eef75 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -5054,7 +5054,7 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
>   }
>  
>   if ((bpf_prog_is_dev_bound(prog->aux) || bpf_map_is_dev_bound(map)) &&
> - !bpf_offload_dev_match(prog, map)) {
> + !bpf_offload_match(prog, map)) {

I'm confused by the new names and the renaming.
Maybe split the renaming into a separate patch?
Should the new bpf_offload_match() be called bpf_offload_prog_map_match()?
Or some other name?
Maybe adding comments to these functions will make it clear...



[PATCH bpf-next 5/9] bpf: offload: aggregate offloads per-device

2018-07-16 Thread Jakub Kicinski
Currently we have two lists of offloaded objects - programs and maps.
Netdevice unregister notifier scans those lists to orphan objects
associated with the device being unregistered.  This puts unnecessary
(even if negligible) burden on all netdev unregister calls in
BPF-enabled kernels.  The lists of objects may potentially get long,
making the linear scan even more problematic.  There haven't been
complaints about this mechanism so far, but it is suboptimal.

Instead of relying on notifiers, make the few BPF-capable drivers
register explicitly for BPF offloads.  The programs and maps will
now be collected per-device not on a global list, and only scanned
for removal when driver unregisters from BPF offloads.
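
The change in bookkeeping can be pictured with a small user-space model. This is a sketch only, with hypothetical names; the patch itself keys an rhashtable by the net_device pointer, for which a dict is a loose stand-in here.

```python
class OffloadRegistry:
    """Per-netdev object lists instead of two global lists."""

    def __init__(self):
        self.netdevs = {}  # netdev name -> {"progs": [...], "maps": [...]}

    def register(self, netdev):
        if netdev in self.netdevs:
            raise ValueError("netdev already registered")
        self.netdevs[netdev] = {"progs": [], "maps": []}

    def bind_prog(self, netdev, prog):
        # Raises KeyError for a netdev that never registered for offload.
        self.netdevs[netdev]["progs"].append(prog)

    def bind_map(self, netdev, bpf_map):
        self.netdevs[netdev]["maps"].append(bpf_map)

    def unregister(self, netdev):
        """Return the objects to orphan; only this netdev's lists are walked."""
        entry = self.netdevs.pop(netdev)
        return entry["progs"] + entry["maps"]
```

Netdevs that never call `register()` add no cost at unregister time, and the work done in `unregister()` is proportional to one device's objects rather than to every offloaded object in the system.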

Signed-off-by: Jakub Kicinski 
Reviewed-by: Quentin Monnet 
---
 drivers/net/ethernet/netronome/nfp/bpf/main.c |  13 ++
 drivers/net/netdevsim/bpf.c   |   7 +
 include/linux/bpf.h   |   2 +
 kernel/bpf/offload.c  | 142 --
 4 files changed, 118 insertions(+), 46 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.c b/drivers/net/ethernet/netronome/nfp/bpf/main.c
index b95b94d008cf..dee039ada75c 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/main.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/main.c
@@ -404,6 +404,16 @@ static int nfp_bpf_parse_capabilities(struct nfp_app *app)
return -EINVAL;
 }
 
+static int nfp_bpf_ndo_init(struct nfp_app *app, struct net_device *netdev)
+{
+   return bpf_offload_dev_netdev_register(netdev);
+}
+
+static void nfp_bpf_ndo_uninit(struct nfp_app *app, struct net_device *netdev)
+{
+   bpf_offload_dev_netdev_unregister(netdev);
+}
+
 static int nfp_bpf_init(struct nfp_app *app)
 {
struct nfp_app_bpf *bpf;
@@ -466,6 +476,9 @@ const struct nfp_app_type app_bpf = {
 
.extra_cap  = nfp_bpf_extra_cap,
 
+   .ndo_init   = nfp_bpf_ndo_init,
+   .ndo_uninit = nfp_bpf_ndo_uninit,
+
.vnic_alloc = nfp_bpf_vnic_alloc,
.vnic_free  = nfp_bpf_vnic_free,
 
diff --git a/drivers/net/netdevsim/bpf.c b/drivers/net/netdevsim/bpf.c
index 357f9e62f306..c4a2829e0e1f 100644
--- a/drivers/net/netdevsim/bpf.c
+++ b/drivers/net/netdevsim/bpf.c
@@ -582,6 +582,8 @@ int nsim_bpf(struct net_device *dev, struct netdev_bpf *bpf)
 
 int nsim_bpf_init(struct netdevsim *ns)
 {
+   int err;
+
if (ns->sdev->refcnt == 1) {
INIT_LIST_HEAD(&ns->sdev->bpf_bound_progs);
INIT_LIST_HEAD(&ns->sdev->bpf_bound_maps);
@@ -592,6 +594,10 @@ int nsim_bpf_init(struct netdevsim *ns)
return -ENOMEM;
}
 
+   err = bpf_offload_dev_netdev_register(ns->netdev);
+   if (err)
+   return err;
+
debugfs_create_u32("bpf_offloaded_id", 0400, ns->ddir,
   &ns->bpf_offloaded_id);
 
@@ -625,6 +631,7 @@ void nsim_bpf_uninit(struct netdevsim *ns)
WARN_ON(ns->xdp.prog);
WARN_ON(ns->xdp_hw.prog);
WARN_ON(ns->bpf_offloaded);
+   bpf_offload_dev_netdev_unregister(ns->netdev);
 
if (ns->sdev->refcnt == 1) {
WARN_ON(!list_empty(&ns->sdev->bpf_bound_progs));
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 8827e797ff97..21c001c3285c 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -648,6 +648,8 @@ int bpf_map_offload_delete_elem(struct bpf_map *map, void *key);
 int bpf_map_offload_get_next_key(struct bpf_map *map,
 void *key, void *next_key);
 
+int bpf_offload_dev_netdev_register(struct net_device *netdev);
+void bpf_offload_dev_netdev_unregister(struct net_device *netdev);
 bool bpf_offload_dev_match(struct bpf_prog *prog, struct bpf_map *map);
 
 #if defined(CONFIG_NET) && defined(CONFIG_BPF_SYSCALL)
diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c
index ac747d5cf7c6..b914f94c53d4 100644
--- a/kernel/bpf/offload.c
+++ b/kernel/bpf/offload.c
@@ -18,19 +18,37 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
-/* Protects bpf_prog_offload_devs, bpf_map_offload_devs and offload members
+/* Protects offdevs, members of bpf_offload_netdev and offload members
  * of all progs.
  * RTNL lock cannot be taken when holding this lock.
  */
 static DECLARE_RWSEM(bpf_devs_lock);
-static LIST_HEAD(bpf_prog_offload_devs);
-static LIST_HEAD(bpf_map_offload_devs);
+
+struct bpf_offload_netdev {
+   struct rhash_head l;
+   struct net_device *netdev;
+   struct list_head progs;
+   struct list_head maps;
+};
+
+static const struct rhashtable_params offdevs_params = {
+   .nelem_hint = 4,
+   .key_len= sizeof(struct net_device *),
+   .key_offset = offsetof(struct bpf_offload_netdev, netdev),
+   .head_offset= offsetof(struct bpf_offload_netdev, l),
+   .automatic_shrinking= true,
+};
+
+static struct 

[PATCH bpf-next 7/9] netdevsim: allow program sharing between devices

2018-07-16 Thread Jakub Kicinski
Allow program sharing between devices which were linked together.

Signed-off-by: Jakub Kicinski 
Reviewed-by: Quentin Monnet 
---
 drivers/net/netdevsim/bpf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/netdevsim/bpf.c b/drivers/net/netdevsim/bpf.c
index 9eab29f67a0e..81444208b216 100644
--- a/drivers/net/netdevsim/bpf.c
+++ b/drivers/net/netdevsim/bpf.c
@@ -294,7 +294,7 @@ nsim_setup_prog_hw_checks(struct netdevsim *ns, struct netdev_bpf *bpf)
NSIM_EA(bpf->extack, "xdpoffload of non-bound program");
return -EINVAL;
}
-   if (bpf->prog->aux->offload->netdev != ns->netdev) {
+   if (!bpf_offload_dev_match(bpf->prog, ns->netdev)) {
NSIM_EA(bpf->extack, "program bound to different dev");
return -EINVAL;
}
-- 
2.17.1



[PATCH bpf-next 9/9] selftests/bpf: add test for sharing objects between netdevs

2018-07-16 Thread Jakub Kicinski
Add tests for sharing programs and maps between different netdevs.
Use netdevsim's ability to pretend multiple netdevs belong to the
same "ASIC".

Signed-off-by: Jakub Kicinski 
Reviewed-by: Quentin Monnet 
---
 tools/testing/selftests/bpf/test_offload.py | 146 +++-
 1 file changed, 142 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_offload.py b/tools/testing/selftests/bpf/test_offload.py
index ee1abef384ea..d59642e70f56 100755
--- a/tools/testing/selftests/bpf/test_offload.py
+++ b/tools/testing/selftests/bpf/test_offload.py
@@ -158,8 +158,9 @@ netns = [] # net namespaces to be removed
 else:
 return ret, out
 
-def bpftool(args, JSON=True, ns="", fail=True):
-return tool("bpftool", args, {"json":"-p"}, JSON=JSON, ns=ns, fail=fail)
+def bpftool(args, JSON=True, ns="", fail=True, include_stderr=False):
+return tool("bpftool", args, {"json":"-p"}, JSON=JSON, ns=ns,
+fail=fail, include_stderr=include_stderr)
 
 def bpftool_prog_list(expected=None, ns=""):
 _, progs = bpftool("prog show", JSON=True, ns=ns, fail=True)
@@ -201,6 +202,21 @@ netns = [] # net namespaces to be removed
 time.sleep(0.05)
 raise Exception("Time out waiting for map counts to stabilize want %d, have %d" % (expected, nmaps))
 
+def bpftool_prog_load(sample, file_name, maps=[], prog_type="xdp", dev=None,
+  fail=True, include_stderr=False):
+args = "prog load %s %s" % (os.path.join(bpf_test_dir, sample), file_name)
+if prog_type is not None:
+args += " type " + prog_type
+if dev is not None:
+args += " dev " + dev
+if len(maps):
+args += " map " + " map ".join(maps)
+
+res = bpftool(args, fail=fail, include_stderr=include_stderr)
+if res[0] == 0:
+files.append(file_name)
+return res
+
 def ip(args, force=False, JSON=True, ns="", fail=True, include_stderr=False):
 if force:
 args = "-force " + args
@@ -307,7 +323,9 @@ netns = [] # net namespaces to be removed
 Class for netdevsim netdevice and its attributes.
 """
 
-def __init__(self):
+def __init__(self, link=None):
+self.link = link
+
 self.dev = self._netdevsim_create()
 devs.append(self)
 
@@ -321,8 +339,9 @@ netns = [] # net namespaces to be removed
 return self.dev[key]
 
 def _netdevsim_create(self):
+link = "" if self.link is None else "link " + self.link.dev['ifname']
 _, old  = ip("link show")
-ip("link add sim%d type netdevsim")
+ip("link add sim%d {link} type netdevsim".format(link=link))
 _, new  = ip("link show")
 
 for dev in new:
@@ -848,6 +867,25 @@ netns = []
 sim.set_mtu(1500)
 
 sim.wait_for_flush()
+start_test("Test non-offload XDP attaching to HW...")
+bpftool_prog_load("sample_ret0.o", "/sys/fs/bpf/nooffload")
+nooffload = bpf_pinned("/sys/fs/bpf/nooffload")
+ret, _, err = sim.set_xdp(nooffload, "offload",
+  fail=False, include_stderr=True)
+fail(ret == 0, "attached non-offloaded XDP program to HW")
+check_extack_nsim(err, "xdpoffload of non-bound program.", args)
+rm("/sys/fs/bpf/nooffload")
+
+start_test("Test offload XDP attaching to drv...")
+bpftool_prog_load("sample_ret0.o", "/sys/fs/bpf/offload",
+  dev=sim['ifname'])
+offload = bpf_pinned("/sys/fs/bpf/offload")
+ret, _, err = sim.set_xdp(offload, "drv", fail=False, include_stderr=True)
+fail(ret == 0, "attached offloaded XDP program to drv")
+check_extack(err, "using device-bound program without HW_MODE flag is not supported.", args)
+rm("/sys/fs/bpf/offload")
+sim.wait_for_flush()
+
 start_test("Test XDP offload...")
 _, _, err = sim.set_xdp(obj, "offload", verbose=True, include_stderr=True)
 ipl = sim.ip_link_show(xdp=True)
@@ -1141,6 +1179,106 @@ netns = []
 fail(ret == 0,
  "netdevsim didn't refuse to create a map with offload disabled")
 
+sim.remove()
+
+start_test("Test multi-dev ASIC program reuse...")
+simA = NetdevSim()
+simB1 = NetdevSim()
+simB2 = NetdevSim(link=simB1)
+simB3 = NetdevSim(link=simB1)
+sims = (simA, simB1, simB2, simB3)
+simB = (simB1, simB2, simB3)
+
+bpftool_prog_load("sample_map_ret0.o", "/sys/fs/bpf/nsimA",
+  dev=simA['ifname'])
+progA = bpf_pinned("/sys/fs/bpf/nsimA")
+bpftool_prog_load("sample_map_ret0.o", "/sys/fs/bpf/nsimB",
+  dev=simB1['ifname'])
+progB = bpf_pinned("/sys/fs/bpf/nsimB")
+
+simA.set_xdp(progA, "offload", JSON=False)
+for d in simB:
+d.set_xdp(progB, "offload", JSON=False)
+
+start_test("Test multi-dev ASIC cross-dev replace...")
+ret, _ = simA.set_xdp(progB, "offload", force=True, JSON=False, fail=False)
+fail(ret == 0, "cross-ASIC program allowed")
+for d in simB:
+ret, _ = d.set_xdp(progA, "offload", 

[PATCH bpf-next 0/9] bpf: offload program and map sharing

2018-07-16 Thread Jakub Kicinski
Hi!

This patchset adds support for sharing BPF objects within one ASIC.
This will allow us to reuse the same program on multiple ports of
a device leading to better code store utilization.  It also enables
sharing maps between programs attached to different ports of a device.

Jakub Kicinski (9):
  netdevsim: add switch_id attribute
  netdevsim: add shared netdevsim devices
  netdevsim: associate bound programs with shared dev
  nfp: add .ndo_init() and .ndo_uninit() callbacks
  bpf: offload: aggregate offloads per-device
  bpf: offload: allow program and map sharing per-ASIC
  netdevsim: allow program sharing between devices
  nfp: bpf: allow program sharing within ASIC
  selftests/bpf: add test for sharing objects between netdevs

 drivers/net/ethernet/netronome/nfp/bpf/main.c |  23 ++
 drivers/net/ethernet/netronome/nfp/bpf/main.h |   4 +
 .../net/ethernet/netronome/nfp/bpf/offload.c  |  10 +-
 drivers/net/ethernet/netronome/nfp/nfp_app.c  |  17 ++
 drivers/net/ethernet/netronome/nfp/nfp_app.h  |   8 +
 .../ethernet/netronome/nfp/nfp_net_common.c   |   2 +
 .../net/ethernet/netronome/nfp/nfp_net_repr.c |   2 +
 drivers/net/netdevsim/bpf.c   |  50 +++-
 drivers/net/netdevsim/netdev.c| 103 +++-
 drivers/net/netdevsim/netdevsim.h |  23 +-
 include/linux/bpf.h   |  10 +-
 kernel/bpf/offload.c  | 223 ++
 kernel/bpf/verifier.c |   2 +-
 tools/testing/selftests/bpf/test_offload.py   | 151 +++-
 14 files changed, 543 insertions(+), 85 deletions(-)

-- 
2.17.1



[PATCH bpf-next 6/9] bpf: offload: allow program and map sharing per-ASIC

2018-07-16 Thread Jakub Kicinski
Create a higher-level entity to represent a device/ASIC to allow
programs and maps to be shared between device ports.  The extra
work is required to make sure we don't destroy BPF objects as
soon as the netdev for which they were loaded gets destroyed,
as other ports may still be using them.  When netdev goes away
all of its BPF objects will be moved to other netdevs of the
device, and only destroyed when last netdev is unregistered.
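
The lifetime rule described above can be modelled in a few lines of user-space code. This is a sketch under assumptions: the class name is hypothetical, and handing the objects to the first remaining netdev is an arbitrary choice made for the example, not necessarily what the patch does.

```python
class OffloadDev:
    """One ASIC with several netdevs (ports) sharing BPF objects."""

    def __init__(self):
        self.netdevs = {}  # netdev name -> list of bound objects

    def register(self, netdev):
        self.netdevs[netdev] = []

    def bind(self, netdev, obj):
        self.netdevs[netdev].append(obj)

    def unregister(self, netdev):
        """Return the objects destroyed by this unregistration."""
        objs = self.netdevs.pop(netdev)
        if self.netdevs:
            # Other ports remain: migrate instead of destroying.
            heir = next(iter(self.netdevs))
            self.netdevs[heir].extend(objs)
            return []
        # Last port gone: nothing can use the objects any more.
        return objs
```

Objects therefore outlive the port they were loaded for and disappear only together with the last port of the device.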

Signed-off-by: Jakub Kicinski 
Reviewed-by: Quentin Monnet 
---
 drivers/net/ethernet/netronome/nfp/bpf/main.c |  14 +-
 drivers/net/ethernet/netronome/nfp/bpf/main.h |   4 +
 drivers/net/netdevsim/bpf.c   |  17 ++-
 drivers/net/netdevsim/netdevsim.h |   3 +
 include/linux/bpf.h   |  12 +-
 kernel/bpf/offload.c  | 123 ++
 kernel/bpf/verifier.c |   2 +-
 7 files changed, 142 insertions(+), 33 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.c b/drivers/net/ethernet/netronome/nfp/bpf/main.c
index dee039ada75c..458f49235d06 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/main.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/main.c
@@ -406,12 +406,16 @@ static int nfp_bpf_parse_capabilities(struct nfp_app *app)
 
 static int nfp_bpf_ndo_init(struct nfp_app *app, struct net_device *netdev)
 {
-   return bpf_offload_dev_netdev_register(netdev);
+   struct nfp_app_bpf *bpf = app->priv;
+
+   return bpf_offload_dev_netdev_register(bpf->bpf_dev, netdev);
 }
 
 static void nfp_bpf_ndo_uninit(struct nfp_app *app, struct net_device *netdev)
 {
-   bpf_offload_dev_netdev_unregister(netdev);
+   struct nfp_app_bpf *bpf = app->priv;
+
+   bpf_offload_dev_netdev_unregister(bpf->bpf_dev, netdev);
 }
 
 static int nfp_bpf_init(struct nfp_app *app)
@@ -437,6 +441,11 @@ static int nfp_bpf_init(struct nfp_app *app)
if (err)
goto err_free_neutral_maps;
 
+   bpf->bpf_dev = bpf_offload_dev_create();
+   err = PTR_ERR_OR_ZERO(bpf->bpf_dev);
+   if (err)
+   goto err_free_neutral_maps;
+
return 0;
 
 err_free_neutral_maps:
@@ -455,6 +464,7 @@ static void nfp_bpf_clean(struct nfp_app *app)
 {
struct nfp_app_bpf *bpf = app->priv;
 
+   bpf_offload_dev_destroy(bpf->bpf_dev);
WARN_ON(!skb_queue_empty(&bpf->cmsg_replies));
WARN_ON(!list_empty(&bpf->map_list));
WARN_ON(bpf->maps_in_use || bpf->map_elems_in_use);
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.h b/drivers/net/ethernet/netronome/nfp/bpf/main.h
index 9845c1a2d4c2..bec935468f90 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/main.h
+++ b/drivers/net/ethernet/netronome/nfp/bpf/main.h
@@ -110,6 +110,8 @@ enum pkt_vec {
  * struct nfp_app_bpf - bpf app priv structure
  * @app:   backpointer to the app
  *
+ * @bpf_dev:   BPF offload device handle
+ *
  * @tag_allocator: bitmap of control message tags in use
  * @tag_alloc_next:next tag bit to allocate
  * @tag_alloc_last:next tag bit to be freed
@@ -150,6 +152,8 @@ enum pkt_vec {
 struct nfp_app_bpf {
struct nfp_app *app;
 
+   struct bpf_offload_dev *bpf_dev;
+
DECLARE_BITMAP(tag_allocator, U16_MAX + 1);
u16 tag_alloc_next;
u16 tag_alloc_last;
diff --git a/drivers/net/netdevsim/bpf.c b/drivers/net/netdevsim/bpf.c
index c4a2829e0e1f..9eab29f67a0e 100644
--- a/drivers/net/netdevsim/bpf.c
+++ b/drivers/net/netdevsim/bpf.c
@@ -592,11 +592,16 @@ int nsim_bpf_init(struct netdevsim *ns)
debugfs_create_dir("bpf_bound_progs", ns->sdev->ddir);
if (IS_ERR_OR_NULL(ns->sdev->ddir_bpf_bound_progs))
return -ENOMEM;
+
+   ns->sdev->bpf_dev = bpf_offload_dev_create();
+   err = PTR_ERR_OR_ZERO(ns->sdev->bpf_dev);
+   if (err)
+   return err;
}
 
-   err = bpf_offload_dev_netdev_register(ns->netdev);
+   err = bpf_offload_dev_netdev_register(ns->sdev->bpf_dev, ns->netdev);
if (err)
-   return err;
+   goto err_destroy_bdev;
 
debugfs_create_u32("bpf_offloaded_id", 0400, ns->ddir,
   &ns->bpf_offloaded_id);
@@ -624,6 +629,11 @@ int nsim_bpf_init(struct netdevsim *ns)
&ns->bpf_map_accept);
 
return 0;
+
+err_destroy_bdev:
+   if (ns->sdev->refcnt == 1)
+   bpf_offload_dev_destroy(ns->sdev->bpf_dev);
+   return err;
 }
 
 void nsim_bpf_uninit(struct netdevsim *ns)
@@ -631,10 +641,11 @@ void nsim_bpf_uninit(struct netdevsim *ns)
WARN_ON(ns->xdp.prog);
WARN_ON(ns->xdp_hw.prog);
WARN_ON(ns->bpf_offloaded);
-   bpf_offload_dev_netdev_unregister(ns->netdev);
+   bpf_offload_dev_netdev_unregister(ns->sdev->bpf_dev, ns->netdev);
 
if (ns->sdev->refcnt == 1) {

[PATCH bpf-next 2/9] netdevsim: add shared netdevsim devices

2018-07-16 Thread Jakub Kicinski
Factor out sharable netdevsim sub-object and use IFLA_LINK to link
netdevsims together at creation time.  Sharable object will have
its own DebugFS directory.
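
The sharing scheme amounts to a reference-counted sub-object that a new device either creates or joins. A minimal user-space sketch follows, assuming hypothetical names (`SharedDev`, `SimPort`) and passing the device id explicitly instead of using a global counter.

```python
class SharedDev:
    """Stand-in for the sharable sub-object (switch_id plus a refcount)."""

    def __init__(self, switch_id):
        self.switch_id = switch_id
        self.refcnt = 1


class SimPort:
    """Stand-in for one netdevsim device."""

    def __init__(self, dev_id, link=None):
        self.dev_id = dev_id
        if link is None:
            # No link given: this device starts its own shared object.
            self.sdev = SharedDev(switch_id=dev_id)
        else:
            # Link given: join the other device's shared object.
            self.sdev = link.sdev
            self.sdev.refcnt += 1

    def remove(self):
        """Drop our reference; return True if the shared object is freed."""
        self.sdev.refcnt -= 1
        return self.sdev.refcnt == 0
```

Devices created with a link report the switch_id of the device they joined, and the shared object goes away only when the last of them is removed, regardless of removal order.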

Signed-off-by: Jakub Kicinski 
Reviewed-by: Quentin Monnet 
---
 drivers/net/netdevsim/netdev.c| 87 ---
 drivers/net/netdevsim/netdevsim.h | 10 +++-
 2 files changed, 90 insertions(+), 7 deletions(-)

diff --git a/drivers/net/netdevsim/netdev.c b/drivers/net/netdevsim/netdev.c
index 9125637ef5d8..2d244551298b 100644
--- a/drivers/net/netdevsim/netdev.c
+++ b/drivers/net/netdevsim/netdev.c
@@ -152,8 +152,8 @@ nsim_port_attr_get(struct net_device *dev, struct switchdev_attr *attr)
 
switch (attr->id) {
case SWITCHDEV_ATTR_ID_PORT_PARENT_ID:
-   attr->u.ppid.id_len = sizeof(ns->switch_id);
-   memcpy(&attr->u.ppid.id, &ns->switch_id,
+   attr->u.ppid.id_len = sizeof(ns->sdev->switch_id);
+   memcpy(&attr->u.ppid.id, &ns->sdev->switch_id,
   attr->u.ppid.id_len);
return 0;
default:
@@ -167,19 +167,41 @@ static const struct switchdev_ops nsim_switchdev_ops = {
 
 static int nsim_init(struct net_device *dev)
 {
+   char sdev_ddir_name[10], sdev_link_name[32];
struct netdevsim *ns = netdev_priv(dev);
int err;
 
ns->netdev = dev;
-   ns->switch_id = nsim_dev_id;
-
ns->ddir = debugfs_create_dir(netdev_name(dev), nsim_ddir);
if (IS_ERR_OR_NULL(ns->ddir))
return -ENOMEM;
 
+   if (!ns->sdev) {
+   ns->sdev = kzalloc(sizeof(*ns->sdev), GFP_KERNEL);
+   if (!ns->sdev) {
+   err = -ENOMEM;
+   goto err_debugfs_destroy;
+   }
+   ns->sdev->refcnt = 1;
+   ns->sdev->switch_id = nsim_dev_id;
+   sprintf(sdev_ddir_name, "%u", ns->sdev->switch_id);
+   ns->sdev->ddir = debugfs_create_dir(sdev_ddir_name,
+   nsim_sdev_ddir);
+   if (IS_ERR_OR_NULL(ns->sdev->ddir)) {
+   err = PTR_ERR_OR_ZERO(ns->sdev->ddir) ?: -EINVAL;
+   goto err_sdev_free;
+   }
+   } else {
+   sprintf(sdev_ddir_name, "%u", ns->sdev->switch_id);
+   ns->sdev->refcnt++;
+   }
+
+   sprintf(sdev_link_name, "../../" DRV_NAME "_sdev/%s", sdev_ddir_name);
+   debugfs_create_symlink("sdev", ns->ddir, sdev_link_name);
+
err = nsim_bpf_init(ns);
if (err)
-   goto err_debugfs_destroy;
+   goto err_sdev_destroy;
 
ns->dev.id = nsim_dev_id++;
ns->dev.bus = &nsim_bus;
@@ -203,6 +225,12 @@ static int nsim_init(struct net_device *dev)
device_unregister(&ns->dev);
 err_bpf_uninit:
nsim_bpf_uninit(ns);
+err_sdev_destroy:
+   if (!--ns->sdev->refcnt) {
+   debugfs_remove_recursive(ns->sdev->ddir);
+err_sdev_free:
+   kfree(ns->sdev);
+   }
 err_debugfs_destroy:
debugfs_remove_recursive(ns->ddir);
return err;
@@ -216,6 +244,10 @@ static void nsim_uninit(struct net_device *dev)
nsim_devlink_teardown(ns);
debugfs_remove_recursive(ns->ddir);
nsim_bpf_uninit(ns);
+   if (!--ns->sdev->refcnt) {
+   debugfs_remove_recursive(ns->sdev->ddir);
+   kfree(ns->sdev);
+   }
 }
 
 static void nsim_free(struct net_device *dev)
@@ -494,14 +526,48 @@ static int nsim_validate(struct nlattr *tb[], struct 
nlattr *data[],
return 0;
 }
 
+static int nsim_newlink(struct net *src_net, struct net_device *dev,
+   struct nlattr *tb[], struct nlattr *data[],
+   struct netlink_ext_ack *extack)
+{
+   struct netdevsim *ns = netdev_priv(dev);
+
+   if (tb[IFLA_LINK]) {
+   struct net_device *joindev;
+   struct netdevsim *joinns;
+
+   joindev = __dev_get_by_index(src_net,
+nla_get_u32(tb[IFLA_LINK]));
+   if (!joindev)
+   return -ENODEV;
+   if (joindev->netdev_ops != &nsim_netdev_ops)
+   return -EINVAL;
+
+   joinns = netdev_priv(joindev);
+   if (!joinns->sdev || !joinns->sdev->refcnt)
+   return -EINVAL;
+   ns->sdev = joinns->sdev;
+   }
+
+   return register_netdevice(dev);
+}
+
+static void nsim_dellink(struct net_device *dev, struct list_head *head)
+{
+   unregister_netdevice_queue(dev, head);
+}
+
 static struct rtnl_link_ops nsim_link_ops __read_mostly = {
.kind   = DRV_NAME,
.priv_size  = sizeof(struct netdevsim),
.setup  = nsim_setup,
.validate   = nsim_validate,
+   .newlink= nsim_newlink,
+   .dellink= nsim_dellink,
 };
 
 struct dentry *nsim_ddir;
+struct dentry 

[PATCH bpf-next 8/9] nfp: bpf: allow program sharing within ASIC

2018-07-16 Thread Jakub Kicinski
Allow program sharing between netdevs of the same NFP ASIC.

Signed-off-by: Jakub Kicinski 
Reviewed-by: Quentin Monnet 
---
 drivers/net/ethernet/netronome/nfp/bpf/offload.c | 10 ++
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/bpf/offload.c 
b/drivers/net/ethernet/netronome/nfp/bpf/offload.c
index 78f44c4d95b4..49b03f7dbf46 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/offload.c
@@ -566,14 +566,8 @@ int nfp_net_bpf_offload(struct nfp_net *nn, struct 
bpf_prog *prog,
 {
int err;
 
-   if (prog) {
-   struct bpf_prog_offload *offload = prog->aux->offload;
-
-   if (!offload)
-   return -EINVAL;
-   if (offload->netdev != nn->dp.netdev)
-   return -EINVAL;
-   }
+   if (prog && !bpf_offload_dev_match(prog, nn->dp.netdev))
+   return -EINVAL;
 
if (prog && old_prog) {
u8 cap;
-- 
2.17.1



[PATCH bpf-next 3/9] netdevsim: associate bound programs with shared dev

2018-07-16 Thread Jakub Kicinski
Move bound program information from netdevsim to shared sub-object,
as programs will soon be shared between netdevs of the same ASIC.

Signed-off-by: Jakub Kicinski 
Reviewed-by: Quentin Monnet 
---
 drivers/net/netdevsim/bpf.c | 30 -
 drivers/net/netdevsim/netdevsim.h   | 11 
 tools/testing/selftests/bpf/test_offload.py |  5 ++--
 3 files changed, 27 insertions(+), 19 deletions(-)

diff --git a/drivers/net/netdevsim/bpf.c b/drivers/net/netdevsim/bpf.c
index c36d2a768202..357f9e62f306 100644
--- a/drivers/net/netdevsim/bpf.c
+++ b/drivers/net/netdevsim/bpf.c
@@ -238,8 +238,8 @@ static int nsim_bpf_create_prog(struct netdevsim *ns, 
struct bpf_prog *prog)
state->state = "verify";
 
/* Program id is not populated yet when we create the state. */
-   sprintf(name, "%u", ns->prog_id_gen++);
-   state->ddir = debugfs_create_dir(name, ns->ddir_bpf_bound_progs);
+   sprintf(name, "%u", ns->sdev->prog_id_gen++);
+   state->ddir = debugfs_create_dir(name, ns->sdev->ddir_bpf_bound_progs);
if (IS_ERR_OR_NULL(state->ddir)) {
kfree(state);
return -ENOMEM;
@@ -250,7 +250,7 @@ static int nsim_bpf_create_prog(struct netdevsim *ns, 
struct bpf_prog *prog)
&state->state, &nsim_bpf_string_fops);
debugfs_create_bool("loaded", 0400, state->ddir, &state->is_loaded);
 
-   list_add_tail(&state->l, &ns->bpf_bound_progs);
+   list_add_tail(&state->l, &ns->sdev->bpf_bound_progs);
 
prog->aux->offload->dev_priv = state;
 
@@ -497,7 +497,7 @@ nsim_bpf_map_alloc(struct netdevsim *ns, struct 
bpf_offloaded_map *offmap)
}
 
offmap->dev_ops = &nsim_bpf_map_ops;
-   list_add_tail(&nmap->l, &ns->bpf_bound_maps);
+   list_add_tail(&nmap->l, &ns->sdev->bpf_bound_maps);
 
return 0;
 
@@ -582,8 +582,15 @@ int nsim_bpf(struct net_device *dev, struct netdev_bpf 
*bpf)
 
 int nsim_bpf_init(struct netdevsim *ns)
 {
-   INIT_LIST_HEAD(&ns->bpf_bound_progs);
-   INIT_LIST_HEAD(&ns->bpf_bound_maps);
+   if (ns->sdev->refcnt == 1) {
+   INIT_LIST_HEAD(&ns->sdev->bpf_bound_progs);
+   INIT_LIST_HEAD(&ns->sdev->bpf_bound_maps);
+
+   ns->sdev->ddir_bpf_bound_progs =
+   debugfs_create_dir("bpf_bound_progs", ns->sdev->ddir);
+   if (IS_ERR_OR_NULL(ns->sdev->ddir_bpf_bound_progs))
+   return -ENOMEM;
+   }
 
debugfs_create_u32("bpf_offloaded_id", 0400, ns->ddir,
   &ns->bpf_offloaded_id);
@@ -593,10 +600,6 @@ int nsim_bpf_init(struct netdevsim *ns)
&ns->bpf_bind_accept);
debugfs_create_u32("bpf_bind_verifier_delay", 0600, ns->ddir,
   &ns->bpf_bind_verifier_delay);
-   ns->ddir_bpf_bound_progs =
-   debugfs_create_dir("bpf_bound_progs", ns->ddir);
-   if (IS_ERR_OR_NULL(ns->ddir_bpf_bound_progs))
-   return -ENOMEM;
 
ns->bpf_tc_accept = true;
debugfs_create_bool("bpf_tc_accept", 0600, ns->ddir,
@@ -619,9 +622,12 @@ int nsim_bpf_init(struct netdevsim *ns)
 
 void nsim_bpf_uninit(struct netdevsim *ns)
 {
-   WARN_ON(!list_empty(&ns->bpf_bound_progs));
-   WARN_ON(!list_empty(&ns->bpf_bound_maps));
WARN_ON(ns->xdp.prog);
WARN_ON(ns->xdp_hw.prog);
WARN_ON(ns->bpf_offloaded);
+
+   if (ns->sdev->refcnt == 1) {
+   WARN_ON(!list_empty(&ns->sdev->bpf_bound_progs));
+   WARN_ON(!list_empty(&ns->sdev->bpf_bound_maps));
+   }
 }
diff --git a/drivers/net/netdevsim/netdevsim.h 
b/drivers/net/netdevsim/netdevsim.h
index 8743ce74d2d9..98f26fa1e671 100644
--- a/drivers/net/netdevsim/netdevsim.h
+++ b/drivers/net/netdevsim/netdevsim.h
@@ -35,6 +35,12 @@ struct netdevsim_shared_dev {
u32 switch_id;
 
struct dentry *ddir;
+
+   struct dentry *ddir_bpf_bound_progs;
+   u32 prog_id_gen;
+
+   struct list_head bpf_bound_progs;
+   struct list_head bpf_bound_maps;
 };
 
 #define NSIM_IPSEC_MAX_SA_COUNT33
@@ -79,12 +85,8 @@ struct netdevsim {
struct xdp_attachment_info xdp;
struct xdp_attachment_info xdp_hw;
 
-   u32 prog_id_gen;
-
bool bpf_bind_accept;
u32 bpf_bind_verifier_delay;
-   struct dentry *ddir_bpf_bound_progs;
-   struct list_head bpf_bound_progs;
 
bool bpf_tc_accept;
bool bpf_tc_non_bound_accept;
@@ -92,7 +94,6 @@ struct netdevsim {
bool bpf_xdpoffload_accept;
 
bool bpf_map_accept;
-   struct list_head bpf_bound_maps;
 #if IS_ENABLED(CONFIG_NET_DEVLINK)
struct devlink *devlink;
 #endif
diff --git a/tools/testing/selftests/bpf/test_offload.py 
b/tools/testing/selftests/bpf/test_offload.py
index b746227eaff2..ee1abef384ea 100755
--- a/tools/testing/selftests/bpf/test_offload.py
+++ b/tools/testing/selftests/bpf/test_offload.py
@@ -314,6 +314,7 @@ netns = [] # net namespaces to be removed
 self.ns = ""
 
 

[PATCH bpf-next 1/9] netdevsim: add switch_id attribute

2018-07-16 Thread Jakub Kicinski
Grouping netdevsim devices into "ASICs" will soon be supported.
Add switch_id attribute to all netdevsims.  For now each netdevsim
will have its switch_id matching the device id.

Signed-off-by: Jakub Kicinski 
Reviewed-by: Quentin Monnet 
---
 drivers/net/netdevsim/netdev.c| 24 
 drivers/net/netdevsim/netdevsim.h |  1 +
 2 files changed, 25 insertions(+)

diff --git a/drivers/net/netdevsim/netdev.c b/drivers/net/netdevsim/netdev.c
index a7b179f0d954..9125637ef5d8 100644
--- a/drivers/net/netdevsim/netdev.c
+++ b/drivers/net/netdevsim/netdev.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "netdevsim.h"
 
@@ -144,12 +145,34 @@ static struct device_type nsim_dev_type = {
.release = nsim_dev_release,
 };
 
+static int
+nsim_port_attr_get(struct net_device *dev, struct switchdev_attr *attr)
+{
+   struct netdevsim *ns = netdev_priv(dev);
+
+   switch (attr->id) {
+   case SWITCHDEV_ATTR_ID_PORT_PARENT_ID:
+   attr->u.ppid.id_len = sizeof(ns->switch_id);
+   memcpy(&attr->u.ppid.id, &ns->switch_id,
+  attr->u.ppid.id_len);
+   return 0;
+   default:
+   return -EOPNOTSUPP;
+   }
+}
+
+static const struct switchdev_ops nsim_switchdev_ops = {
+   .switchdev_port_attr_get= nsim_port_attr_get,
+};
+
 static int nsim_init(struct net_device *dev)
 {
struct netdevsim *ns = netdev_priv(dev);
int err;
 
ns->netdev = dev;
+   ns->switch_id = nsim_dev_id;
+
ns->ddir = debugfs_create_dir(netdev_name(dev), nsim_ddir);
if (IS_ERR_OR_NULL(ns->ddir))
return -ENOMEM;
@@ -166,6 +189,7 @@ static int nsim_init(struct net_device *dev)
goto err_bpf_uninit;
 
SET_NETDEV_DEV(dev, &ns->dev);
+   SWITCHDEV_SET_OPS(dev, &nsim_switchdev_ops);
 
err = nsim_devlink_setup(ns);
if (err)
diff --git a/drivers/net/netdevsim/netdevsim.h 
b/drivers/net/netdevsim/netdevsim.h
index 0aeabbe81cc6..e2f232325259 100644
--- a/drivers/net/netdevsim/netdevsim.h
+++ b/drivers/net/netdevsim/netdevsim.h
@@ -59,6 +59,7 @@ struct netdevsim {
struct u64_stats_sync syncp;
 
struct device dev;
+   u32 switch_id;
 
struct dentry *ddir;
 
-- 
2.17.1



[PATCH bpf-next 4/9] nfp: add .ndo_init() and .ndo_uninit() callbacks

2018-07-16 Thread Jakub Kicinski
BPF code should unregister the offload capabilities from .ndo_uninit(),
to make sure the operation is atomic with unlist_netdevice().  Plumb
the init/uninit NDOs for vNICs and representors.

Signed-off-by: Jakub Kicinski 
Reviewed-by: Quentin Monnet 
---
 drivers/net/ethernet/netronome/nfp/nfp_app.c| 17 +
 drivers/net/ethernet/netronome/nfp/nfp_app.h|  8 
 .../net/ethernet/netronome/nfp/nfp_net_common.c |  2 ++
 .../net/ethernet/netronome/nfp/nfp_net_repr.c   |  2 ++
 4 files changed, 29 insertions(+)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_app.c 
b/drivers/net/ethernet/netronome/nfp/nfp_app.c
index f28b244f4ee7..69d4ae7a61f3 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_app.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_app.c
@@ -86,6 +86,23 @@ const char *nfp_app_mip_name(struct nfp_app *app)
return nfp_mip_name(app->pf->mip);
 }
 
+int nfp_app_ndo_init(struct net_device *netdev)
+{
+   struct nfp_app *app = nfp_app_from_netdev(netdev);
+
+   if (!app || !app->type->ndo_init)
+   return 0;
+   return app->type->ndo_init(app, netdev);
+}
+
+void nfp_app_ndo_uninit(struct net_device *netdev)
+{
+   struct nfp_app *app = nfp_app_from_netdev(netdev);
+
+   if (app && app->type->ndo_uninit)
+   app->type->ndo_uninit(app, netdev);
+}
+
 u64 *nfp_app_port_get_stats(struct nfp_port *port, u64 *data)
 {
if (!port || !port->app || !port->app->type->port_get_stats)
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_app.h 
b/drivers/net/ethernet/netronome/nfp/nfp_app.h
index ee74caacb015..afbc19aa66a8 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_app.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_app.h
@@ -78,6 +78,8 @@ extern const struct nfp_app_type app_abm;
  * @init:  perform basic app checks and init
  * @clean: clean app state
  * @extra_cap: extra capabilities string
+ * @ndo_init:  vNIC and repr netdev .ndo_init
+ * @ndo_uninit:    vNIC and repr netdev .ndo_uninit
  * @vnic_alloc:allocate vNICs (assign port types, etc.)
  * @vnic_free: free up app's vNIC state
  * @vnic_init: vNIC netdev was registered
@@ -117,6 +119,9 @@ struct nfp_app_type {
 
const char *(*extra_cap)(struct nfp_app *app, struct nfp_net *nn);
 
+   int (*ndo_init)(struct nfp_app *app, struct net_device *netdev);
+   void (*ndo_uninit)(struct nfp_app *app, struct net_device *netdev);
+
int (*vnic_alloc)(struct nfp_app *app, struct nfp_net *nn,
  unsigned int id);
void (*vnic_free)(struct nfp_app *app, struct nfp_net *nn);
@@ -200,6 +205,9 @@ static inline void nfp_app_clean(struct nfp_app *app)
app->type->clean(app);
 }
 
+int nfp_app_ndo_init(struct net_device *netdev);
+void nfp_app_ndo_uninit(struct net_device *netdev);
+
 static inline int nfp_app_vnic_alloc(struct nfp_app *app, struct nfp_net *nn,
 unsigned int id)
 {
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index a712e83c3f0f..279b8ab8a17b 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -3480,6 +3480,8 @@ static int nfp_net_set_mac_address(struct net_device 
*netdev, void *addr)
 }
 
 const struct net_device_ops nfp_net_netdev_ops = {
+   .ndo_init   = nfp_app_ndo_init,
+   .ndo_uninit = nfp_app_ndo_uninit,
.ndo_open   = nfp_net_netdev_open,
.ndo_stop   = nfp_net_netdev_close,
.ndo_start_xmit = nfp_net_tx,
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c
index d7b712f6362f..18a09cdcd9c6 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c
@@ -262,6 +262,8 @@ static int nfp_repr_open(struct net_device *netdev)
 }
 
 const struct net_device_ops nfp_repr_netdev_ops = {
+   .ndo_init   = nfp_app_ndo_init,
+   .ndo_uninit = nfp_app_ndo_uninit,
.ndo_open   = nfp_repr_open,
.ndo_stop   = nfp_repr_stop,
.ndo_start_xmit = nfp_repr_xmit,
-- 
2.17.1
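
Editor's note: the two wrappers in this patch follow an "optional hook" dispatch pattern, where a missing app or a missing callback is treated as success on init and as a no-op on uninit. The following is a minimal userspace sketch of that pattern only. The names struct app, struct app_type, app_ndo_init(), app_ndo_uninit() and the counting hooks are hypothetical stand-ins and not the nfp driver's definitions; an int port number replaces struct net_device.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver types; these are not the nfp definitions. */
struct app;

struct app_type {
	int (*ndo_init)(struct app *app, int port);    /* optional hook */
	void (*ndo_uninit)(struct app *app, int port); /* optional hook */
};

struct app {
	const struct app_type *type; /* always set once an app exists */
	int live_ports;              /* ports currently initialised */
};

/* No app or no hook counts as success, as in the wrapper added by the patch. */
static int app_ndo_init(struct app *app, int port)
{
	if (!app || !app->type->ndo_init)
		return 0;
	return app->type->ndo_init(app, port);
}

/* Teardown is skipped silently when there is nothing to call. */
static void app_ndo_uninit(struct app *app, int port)
{
	if (app && app->type->ndo_uninit)
		app->type->ndo_uninit(app, port);
}

/* Example hook: count the ports that have been initialised. */
static int counting_init(struct app *app, int port)
{
	(void)port;
	app->live_ports++;
	return 0;
}

/* Example hook: undo counting_init(). */
static void counting_uninit(struct app *app, int port)
{
	(void)port;
	assert(app->live_ports > 0); /* uninit must pair with an earlier init */
	app->live_ports--;
}

static const struct app_type type_with_hooks = {
	.ndo_init = counting_init,
	.ndo_uninit = counting_uninit,
};

/* An app type that implements neither hook. */
static const struct app_type type_without_hooks = { NULL, NULL };
```

With this shape, app types that do not care about netdev init/uninit need no stub functions, and callers never have to test for the hook themselves.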



[net-next:master 238/243] drivers/net/phy/mdio-thunder.c:40:8: error: implicit declaration of function 'pcim_enable_device'; did you mean 'pci_enable_device'?

2018-07-16 Thread kbuild test robot
Hi Alexander,

First bad commit (maybe != root cause):

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git 
master
head:   ccdb51717ba3bdc9585998e4ffd41d70c04dedea
commit: 7e2bc7fb65d544bb8598a0ab64e40ee9c60ded6e [238/243] net: cavium: Drop 
dependency of NET_VENDOR_CAVIUM on PCI
config: um-allmodconfig (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
git checkout 7e2bc7fb65d544bb8598a0ab64e40ee9c60ded6e
# save the attached .config to linux build tree
make ARCH=um 

All error/warnings (new ones prefixed by >>):

   drivers/net/phy/mdio-thunder.c: In function 'thunder_mdiobus_pci_probe':
>> drivers/net/phy/mdio-thunder.c:40:8: error: implicit declaration of function 
>> 'pcim_enable_device'; did you mean 'pci_enable_device'? 
>> [-Werror=implicit-function-declaration]
 err = pcim_enable_device(pdev);
   ^~
   pci_enable_device
   drivers/net/phy/mdio-thunder.c: At top level:
>> drivers/net/phy/mdio-thunder.c:151:1: warning: data definition has no type 
>> or storage class
module_pci_driver(thunder_mdiobus_driver);
^
>> drivers/net/phy/mdio-thunder.c:151:1: error: type defaults to 'int' in 
>> declaration of 'module_pci_driver' [-Werror=implicit-int]
>> drivers/net/phy/mdio-thunder.c:151:1: warning: parameter names (without 
>> types) in function declaration
   drivers/net/phy/mdio-thunder.c:144:26: warning: 'thunder_mdiobus_driver' 
defined but not used [-Wunused-variable]
static struct pci_driver thunder_mdiobus_driver = {
 ^~
   cc1: some warnings being treated as errors
--
   In file included from drivers/net/ethernet/cavium/liquidio/lio_main.c:31:0:
   drivers/net/ethernet/cavium/liquidio/octeon_main.h: In function 
'octeon_unmap_pci_barx':
>> drivers/net/ethernet/cavium/liquidio/octeon_main.h:97:3: error: implicit 
>> declaration of function 'pci_release_region'; did you mean 
>> 'pci_release_regions'? [-Werror=implicit-function-declaration]
  pci_release_region(oct->pci_dev, baridx * 2);
  ^~
  pci_release_regions
   drivers/net/ethernet/cavium/liquidio/octeon_main.h: In function 
'octeon_map_pci_barx':
>> drivers/net/ethernet/cavium/liquidio/octeon_main.h:111:6: error: implicit 
>> declaration of function 'pci_request_region'; did you mean 
>> 'pci_request_regions'? [-Werror=implicit-function-declaration]
 if (pci_request_region(oct->pci_dev, baridx * 2, DRV_NAME)) {
 ^~
 pci_request_regions
   drivers/net/ethernet/cavium/liquidio/lio_main.c: In function 'stop_pci_io':
>> drivers/net/ethernet/cavium/liquidio/lio_main.c:332:3: error: implicit 
>> declaration of function 'pci_disable_msi'; did you mean 'pci_disable_sriov'? 
>> [-Werror=implicit-function-declaration]
  pci_disable_msi(oct->pci_dev);
  ^~~
  pci_disable_sriov
   drivers/net/ethernet/cavium/liquidio/lio_main.c: In function 
'octeon_pci_flr':
>> drivers/net/ethernet/cavium/liquidio/lio_main.c:983:2: error: implicit 
>> declaration of function 'pci_cfg_access_lock'; did you mean '__access_ok'? 
>> [-Werror=implicit-function-declaration]
 pci_cfg_access_lock(oct->pci_dev);
 ^~~
 __access_ok
>> drivers/net/ethernet/cavium/liquidio/lio_main.c:989:7: error: implicit 
>> declaration of function '__pci_reset_function_locked' 
>> [-Werror=implicit-function-declaration]
 rc = __pci_reset_function_locked(oct->pci_dev);
  ^~~
>> drivers/net/ethernet/cavium/liquidio/lio_main.c:995:2: error: implicit 
>> declaration of function 'pci_cfg_access_unlock'; did you mean '__access_ok'? 
>> [-Werror=implicit-function-declaration]
 pci_cfg_access_unlock(oct->pci_dev);
 ^
 __access_ok
   drivers/net/ethernet/cavium/liquidio/lio_main.c: In function 
'octeon_destroy_resources':
>> drivers/net/ethernet/cavium/liquidio/lio_main.c:1063:20: error: invalid use 
>> of undefined type 'struct msix_entry'
   msix_entries[i].vector,
   ^
>> drivers/net/ethernet/cavium/liquidio/lio_main.c:1063:20: error: 
>> dereferencing pointer to incomplete type 'struct msix_entry'
   drivers/net/ethernet/cavium/liquidio/lio_main.c:1065:27: error: invalid use 
of undefined type 'struct msix_entry'
 free_irq(msix_entries[i].vector,
  ^
   drivers/net/ethernet/cavium/liquidio/lio_main.c:1071:25: error: invalid use 
of undefined type 'struct msix_entry'
   free_irq(msix_entries[i].vector, oct);
^
>> drivers/net/ethernet/cavium/liquidio/lio_main.c:1073:4: error: implicit 
>> declaration of function 'pci_disable_msix'; did you mean 
>> 'pci_disable_sriov'? [-Werror=implicit-function-declaration]
   pci_disable_msix(oct->pci_dev);
   ^~~~
   pci_disable_sriov
>> 

[PATCH net-next] xdp: fix uninitialized 'err' variable

2018-07-16 Thread Jakub Kicinski
Smatch caught an uninitialized variable error which GCC seems
to miss.

Fixes: a25717d2b604 ("xdp: support simultaneous driver and hw XDP attachment")
Signed-off-by: Jakub Kicinski 
---
 net/core/rtnetlink.c | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index e03258e954c8..92b6fa5d5f6e 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1414,14 +1414,17 @@ static int rtnl_xdp_fill(struct sk_buff *skb, struct 
net_device *dev)
 
prog_id = 0;
mode = XDP_ATTACHED_NONE;
-   if (rtnl_xdp_report_one(skb, dev, &prog_id, &mode, XDP_ATTACHED_SKB,
-   IFLA_XDP_SKB_PROG_ID, rtnl_xdp_prog_skb))
+   err = rtnl_xdp_report_one(skb, dev, &prog_id, &mode, XDP_ATTACHED_SKB,
+ IFLA_XDP_SKB_PROG_ID, rtnl_xdp_prog_skb);
+   if (err)
goto err_cancel;
-   if (rtnl_xdp_report_one(skb, dev, &prog_id, &mode, XDP_ATTACHED_DRV,
-   IFLA_XDP_DRV_PROG_ID, rtnl_xdp_prog_drv))
+   err = rtnl_xdp_report_one(skb, dev, &prog_id, &mode, XDP_ATTACHED_DRV,
+ IFLA_XDP_DRV_PROG_ID, rtnl_xdp_prog_drv);
+   if (err)
goto err_cancel;
-   if (rtnl_xdp_report_one(skb, dev, &prog_id, &mode, XDP_ATTACHED_HW,
-   IFLA_XDP_HW_PROG_ID, rtnl_xdp_prog_hw))
+   err = rtnl_xdp_report_one(skb, dev, &prog_id, &mode, XDP_ATTACHED_HW,
+ IFLA_XDP_HW_PROG_ID, rtnl_xdp_prog_hw);
+   if (err)
+ IFLA_XDP_HW_PROG_ID, rtnl_xdp_prog_hw);
+   if (err)
goto err_cancel;
 
err = nla_put_u8(skb, IFLA_XDP_ATTACHED, mode);
-- 
2.17.1
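
Editor's note: the fix stores each helper's return value in err before jumping to the shared unwind label, so the label never returns an unassigned variable. Below is a minimal userspace sketch of the corrected shape. report_one() and fill_fixed() are hypothetical names, and report_one() merely echoes the code it is given rather than doing any netlink work.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical helper standing in for a reporting call: 0 or a negative errno. */
static int report_one(int result)
{
	return result;
}

/*
 * Corrected pattern: every jump to err_cancel is preceded by an
 * assignment to err, so the value returned from the label is defined.
 */
static int fill_fixed(int first, int second)
{
	int err;

	err = report_one(first);
	if (err)
		goto err_cancel;
	err = report_one(second);
	if (err)
		goto err_cancel;
	return 0;

err_cancel:
	assert(err != 0); /* only reached after a failing call set err */
	return err;
}
```

In the pre-fix shape, `if (report_one(...)) goto err_cancel;` discards the helper's return value, which leaves err unassigned at the label when the first checks fail.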



[PATCH mlx5-next 5/8] net/mlx5: Add missing SET_DRIVER_VERSION command translation

2018-07-16 Thread Saeed Mahameed
From: Noa Osherovich 

When translating command opcodes to a string, the SET_DRIVER_VERSION
command was missing.

Fixes: 42ca502e179d0 ('net/mlx5_core: Use a macro in mlx5_command_str()')
Signed-off-by: Noa Osherovich 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c 
b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
index 10517b2a0643..041c18faea46 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
@@ -458,6 +458,7 @@ const char *mlx5_command_str(int command)
MLX5_COMMAND_STR_CASE(SET_HCA_CAP);
MLX5_COMMAND_STR_CASE(QUERY_ISSI);
MLX5_COMMAND_STR_CASE(SET_ISSI);
+   MLX5_COMMAND_STR_CASE(SET_DRIVER_VERSION);
MLX5_COMMAND_STR_CASE(CREATE_MKEY);
MLX5_COMMAND_STR_CASE(QUERY_MKEY);
MLX5_COMMAND_STR_CASE(DESTROY_MKEY);
-- 
2.17.0
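
Editor's note: the one-line fix works because the opcode-to-string table is built from a case-generating macro, so a command only gets a name if it is listed. The sketch below shows that general technique (token pasting plus stringification) in userspace C. CMD_STR_CASE, the CMD_OP_* enum and its numeric values are hypothetical and chosen for illustration; they are not the mlx5 definitions.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical opcodes; the numeric values are made up for this sketch. */
enum {
	CMD_OP_QUERY_ISSI = 1,
	CMD_OP_SET_ISSI = 2,
	CMD_OP_SET_DRIVER_VERSION = 3,
	CMD_OP_CREATE_MKEY = 4,
};

/* Paste the suffix onto CMD_OP_ for the case label, stringify it for the name. */
#define CMD_STR_CASE(name) case CMD_OP_##name: return #name

static const char *command_str(int command)
{
	switch (command) {
	CMD_STR_CASE(QUERY_ISSI);
	CMD_STR_CASE(SET_ISSI);
	CMD_STR_CASE(SET_DRIVER_VERSION); /* omitting this line yields the fallback */
	CMD_STR_CASE(CREATE_MKEY);
	default: return "unknown command opcode";
	}
}

/* Convenience check: does the opcode have a name in the table? */
static int command_is_known(int command)
{
	return strcmp(command_str(command), "unknown command opcode") != 0;
}
```

An opcode that is defined but not listed in the switch silently falls through to the default string, which is the symptom this patch addresses.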



[PATCH mlx5-next 8/8] net/mlx5: Fix tristate and description for MLX5 module

2018-07-16 Thread Saeed Mahameed
From: Eran Ben Elisha 

Current description did not include new devices. Fix that by providing the
correct generic description.

Signed-off-by: Eran Ben Elisha 
Signed-off-by: Saeed Mahameed 
---
 drivers/infiniband/hw/mlx5/Kconfig  | 2 +-
 drivers/net/ethernet/mellanox/mlx5/core/Kconfig | 6 +++---
 drivers/net/ethernet/mellanox/mlx5/core/main.c  | 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/Kconfig 
b/drivers/infiniband/hw/mlx5/Kconfig
index fb4d77be019b..0440966bc6ec 100644
--- a/drivers/infiniband/hw/mlx5/Kconfig
+++ b/drivers/infiniband/hw/mlx5/Kconfig
@@ -1,5 +1,5 @@
 config MLX5_INFINIBAND
-   tristate "Mellanox Connect-IB HCA support"
+   tristate "Mellanox 5th generation network adapters (ConnectX series) 
support"
depends on NETDEVICES && ETHERNET && PCI && MLX5_CORE
depends on INFINIBAND_USER_ACCESS || INFINIBAND_USER_ACCESS=n
---help---
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig 
b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
index 2545296a0c08..7a84dd07ced2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
@@ -3,7 +3,7 @@
 #
 
 config MLX5_CORE
-   tristate "Mellanox Technologies ConnectX-4 and Connect-IB core driver"
+   tristate "Mellanox 5th generation network adapters (ConnectX series) 
core driver"
depends on MAY_USE_DEVLINK
depends on PCI
imply PTP_1588_CLOCK
@@ -27,7 +27,7 @@ config MLX5_FPGA
   sandbox-specific client drivers.
 
 config MLX5_CORE_EN
-   bool "Mellanox Technologies ConnectX-4 Ethernet support"
+   bool "Mellanox 5th generation network adapters (ConnectX series) 
Ethernet support"
depends on NETDEVICES && ETHERNET && INET && PCI && MLX5_CORE
depends on IPV6=y || IPV6=n || MLX5_CORE=m
select PAGE_POOL
@@ -69,7 +69,7 @@ config MLX5_CORE_EN_DCB
  If unsure, set to Y
 
 config MLX5_CORE_IPOIB
-   bool "Mellanox Technologies ConnectX-4 IPoIB offloads support"
+   bool "Mellanox 5th generation network adapters (ConnectX series) IPoIB 
offloads support"
depends on MLX5_CORE_EN
default n
---help---
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 615005e63819..f9b950e1bd85 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -64,7 +64,7 @@
 #include "lib/clock.h"
 
 MODULE_AUTHOR("Eli Cohen ");
-MODULE_DESCRIPTION("Mellanox Connect-IB, ConnectX-4 core driver");
+MODULE_DESCRIPTION("Mellanox 5th generation network adapters (ConnectX series) 
core driver");
 MODULE_LICENSE("Dual BSD/GPL");
 MODULE_VERSION(DRIVER_VERSION);
 
-- 
2.17.0



[PATCH mlx5-next 4/8] net/mlx5: Add XRQ commands definitions

2018-07-16 Thread Saeed Mahameed
From: Max Gurtovoy 

Update mlx5 command list and error return function to handle XRQ
commands.

Signed-off-by: Max Gurtovoy 
Reviewed-by: Daniel Jurgens 
Reviewed-by: Artemy Kovalyov 
Reviewed-by: Leon Romanovsky 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c 
b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
index a94955302482..10517b2a0643 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
@@ -278,6 +278,7 @@ static int mlx5_internal_err_ret_value(struct mlx5_core_dev 
*dev, u16 op,
case MLX5_CMD_OP_DESTROY_PSV:
case MLX5_CMD_OP_DESTROY_SRQ:
case MLX5_CMD_OP_DESTROY_XRC_SRQ:
+   case MLX5_CMD_OP_DESTROY_XRQ:
case MLX5_CMD_OP_DESTROY_DCT:
case MLX5_CMD_OP_DEALLOC_Q_COUNTER:
case MLX5_CMD_OP_DESTROY_SCHEDULING_ELEMENT:
@@ -347,6 +348,9 @@ static int mlx5_internal_err_ret_value(struct mlx5_core_dev 
*dev, u16 op,
case MLX5_CMD_OP_CREATE_XRC_SRQ:
case MLX5_CMD_OP_QUERY_XRC_SRQ:
case MLX5_CMD_OP_ARM_XRC_SRQ:
+   case MLX5_CMD_OP_CREATE_XRQ:
+   case MLX5_CMD_OP_QUERY_XRQ:
+   case MLX5_CMD_OP_ARM_XRQ:
case MLX5_CMD_OP_CREATE_DCT:
case MLX5_CMD_OP_DRAIN_DCT:
case MLX5_CMD_OP_QUERY_DCT:
@@ -601,6 +605,10 @@ const char *mlx5_command_str(int command)
MLX5_COMMAND_STR_CASE(FPGA_QUERY_QP);
MLX5_COMMAND_STR_CASE(FPGA_QUERY_QP_COUNTERS);
MLX5_COMMAND_STR_CASE(FPGA_DESTROY_QP);
+   MLX5_COMMAND_STR_CASE(CREATE_XRQ);
+   MLX5_COMMAND_STR_CASE(DESTROY_XRQ);
+   MLX5_COMMAND_STR_CASE(QUERY_XRQ);
+   MLX5_COMMAND_STR_CASE(ARM_XRQ);
MLX5_COMMAND_STR_CASE(CREATE_GENERAL_OBJECT);
MLX5_COMMAND_STR_CASE(DESTROY_GENERAL_OBJECT);
default: return "unknown command opcode";
-- 
2.17.0



[PATCH mlx5-next 7/8] net/mlx5: Better return types for CQE API

2018-07-16 Thread Saeed Mahameed
From: Tariq Toukan 

Reduce sizes of return types.
Use bool for binary indication.

Signed-off-by: Tariq Toukan 
Signed-off-by: Saeed Mahameed 
---
 include/linux/mlx5/device.h | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h
index f8671c0a43aa..0566c6a94805 100644
--- a/include/linux/mlx5/device.h
+++ b/include/linux/mlx5/device.h
@@ -750,7 +750,7 @@ enum {
 
 #define MLX5_MINI_CQE_ARRAY_SIZE 8
 
-static inline int mlx5_get_cqe_format(struct mlx5_cqe64 *cqe)
+static inline u8 mlx5_get_cqe_format(struct mlx5_cqe64 *cqe)
 {
return (cqe->op_own >> 2) & 0x3;
 }
@@ -770,14 +770,14 @@ static inline u8 get_cqe_l3_hdr_type(struct mlx5_cqe64 
*cqe)
return (cqe->l4_l3_hdr_type >> 2) & 0x3;
 }
 
-static inline u8 cqe_is_tunneled(struct mlx5_cqe64 *cqe)
+static inline bool cqe_is_tunneled(struct mlx5_cqe64 *cqe)
 {
return cqe->outer_l3_tunneled & 0x1;
 }
 
-static inline int cqe_has_vlan(struct mlx5_cqe64 *cqe)
+static inline bool cqe_has_vlan(struct mlx5_cqe64 *cqe)
 {
-   return !!(cqe->l4_l3_hdr_type & 0x1);
+   return cqe->l4_l3_hdr_type & 0x1;
 }
 
 static inline u64 get_cqe_ts(struct mlx5_cqe64 *cqe)
-- 
2.17.0
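
Editor's note: dropping the `!!` in cqe_has_vlan() is safe because conversion to bool maps any nonzero value to true. The following userspace sketch contrasts the three return types. struct cqe here is a hypothetical one-field stand-in, and the helpers take the mask as a parameter purely so that a mask other than 0x1 can be demonstrated; in the patch itself both masks are 0x1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical one-field stand-in for the CQE; only the flag byte matters here. */
struct cqe {
	uint8_t l4_l3_hdr_type;
};

/* int return: !! is what normalises the masked value to exactly 0 or 1. */
static int has_flag_int(const struct cqe *c, uint8_t mask)
{
	return !!(c->l4_l3_hdr_type & mask);
}

/* bool return: the implicit conversion already yields true for any nonzero value. */
static bool has_flag_bool(const struct cqe *c, uint8_t mask)
{
	return c->l4_l3_hdr_type & mask;
}

/* u8 return without !!: the raw masked value is passed through unchanged. */
static uint8_t has_flag_u8(const struct cqe *c, uint8_t mask)
{
	return c->l4_l3_hdr_type & mask;
}
```

With a bool return type the result stays a clean true/false even if the mask later moves to a different bit, which an integer return type does not guarantee without `!!`.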



[PATCH mlx5-next 2/8] net/mlx5: Expose MPEGC (Management PCIe General Configuration) structures

2018-07-16 Thread Saeed Mahameed
From: Eran Ben Elisha 

This patch exposes PRM layout for handling MPEGC (Management PCIe
General Configuration).

This will be used in the downstream patch for configuring MPEGC via the
driver.

Signed-off-by: Eran Ben Elisha 
Reviewed-by: Moshe Shemesh 
Signed-off-by: Saeed Mahameed 
---
 include/linux/mlx5/driver.h   |  1 +
 include/linux/mlx5/mlx5_ifc.h | 23 +--
 2 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 4a4125b4279d..957199c20a0f 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -145,6 +145,7 @@ enum {
MLX5_REG_MPCNT   = 0x9051,
MLX5_REG_MTPPS   = 0x9053,
MLX5_REG_MTPPSE  = 0x9054,
+   MLX5_REG_MPEGC   = 0x9056,
MLX5_REG_MCQI= 0x9061,
MLX5_REG_MCC = 0x9062,
MLX5_REG_MCDA= 0x9063,
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index bd7b71f54d59..2de5feaeb74a 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -8049,6 +8049,19 @@ struct mlx5_ifc_peir_reg_bits {
u8 error_type[0x8];
 };
 
+struct mlx5_ifc_mpegc_reg_bits {
+   u8 reserved_at_0[0x30];
+   u8 field_select[0x10];
+
+   u8 tx_overflow_sense[0x1];
+   u8 mark_cqe[0x1];
+   u8 mark_cnp[0x1];
+   u8 reserved_at_43[0x1b];
+   u8 tx_lossy_overflow_oper[0x2];
+
+   u8 reserved_at_60[0x100];
+};
+
 struct mlx5_ifc_pcam_enhanced_features_bits {
u8 reserved_at_0[0x6d];
u8 rx_icrc_encapsulated_counter[0x1];
@@ -8097,7 +8110,11 @@ struct mlx5_ifc_pcam_reg_bits {
 };
 
 struct mlx5_ifc_mcam_enhanced_features_bits {
-   u8 reserved_at_0[0x7b];
+   u8 reserved_at_0[0x74];
+   u8 mark_tx_action_cnp[0x1];
+   u8 mark_tx_action_cqe[0x1];
+   u8 dynamic_tx_overflow[0x1];
+   u8 reserved_at_77[0x4];
u8 pcie_outbound_stalled[0x1];
u8 tx_overflow_buffer_pkt[0x1];
u8 mtpps_enh_out_per_adj[0x1];
@@ -8112,7 +8129,9 @@ struct mlx5_ifc_mcam_access_reg_bits {
u8 mcqi[0x1];
u8 reserved_at_1f[0x1];
 
-   u8 regs_95_to_68[0x1c];
+   u8 regs_95_to_87[0x9];
+   u8 mpegc[0x1];
+   u8 regs_85_to_68[0x12];
u8 tracer_registers[0x4];
 
u8 regs_63_to_32[0x20];
-- 
2.17.0
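
Editor's note: in these ifc layouts each array size is a field width in bits, and each reserved_at_N name encodes the bit offset at which that field starts, so a layout can be checked by summing widths. The sketch below does that arithmetic for the mpegc register from this patch. struct field and layout_total_bits() are hypothetical helpers; the offsets listed for the named (non-reserved) fields are derived here by summation and do not appear in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* One bit-field of a layout: its name, width in bits and expected start offset. */
struct field {
	const char *name;
	unsigned int width;
	unsigned int expected_offset;
};

/* Widths copied from mlx5_ifc_mpegc_reg_bits in the patch above. */
static const struct field mpegc_fields[] = {
	{ "reserved_at_0",          0x30,  0x00 },
	{ "field_select",           0x10,  0x30 },
	{ "tx_overflow_sense",      0x1,   0x40 },
	{ "mark_cqe",               0x1,   0x41 },
	{ "mark_cnp",               0x1,   0x42 },
	{ "reserved_at_43",         0x1b,  0x43 },
	{ "tx_lossy_overflow_oper", 0x2,   0x5e },
	{ "reserved_at_60",         0x100, 0x60 },
};

/* Returns the total width in bits, or -1 if a field does not start where expected. */
static int layout_total_bits(const struct field *f, size_t n)
{
	unsigned int off = 0;

	for (size_t i = 0; i < n; i++) {
		if (f[i].expected_offset != off)
			return -1;
		off += f[i].width;
	}
	return (int)off;
}
```

The same arithmetic confirms the two splits in this patch preserve the enclosing sizes: 0x9 + 0x1 + 0x12 equals the old 0x1c for regs_95_to_68, and 0x74 + 3 + 0x4 equals the old 0x7b for reserved_at_0 in the mcam enhanced features.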



[PATCH mlx5-next 1/8] net/mlx5: FW tracer, add hardware structures

2018-07-16 Thread Saeed Mahameed
From: Feras Daoud 

This change adds the infrastructure to mlx5 core fw tracer.
It introduces the following 4 new registers:
MLX5_REG_MTRC_CAP  - Used to read tracer capabilities
MLX5_REG_MTRC_CONF - Used to set tracer configurations
MLX5_REG_MTRC_STDB - Used to query tracer strings database
MLX5_REG_MTRC_CTRL - Used to control the tracer

Tracing capability can be checked through the mcam access register,
so the mcam access register interface now exposes the tracer
registers.

Signed-off-by: Feras Daoud 
Signed-off-by: Saeed Mahameed 
---
 include/linux/mlx5/driver.h   |  4 +++
 include/linux/mlx5/mlx5_ifc.h | 61 ++-
 2 files changed, 64 insertions(+), 1 deletion(-)

diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 1cb1c0317b77..4a4125b4279d 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -138,6 +138,10 @@ enum {
MLX5_REG_HOST_ENDIANNESS = 0x7004,
MLX5_REG_MCIA= 0x9014,
MLX5_REG_MLCR= 0x902b,
+   MLX5_REG_MTRC_CAP= 0x9040,
+   MLX5_REG_MTRC_CONF   = 0x9041,
+   MLX5_REG_MTRC_STDB   = 0x9042,
+   MLX5_REG_MTRC_CTRL   = 0x9043,
MLX5_REG_MPCNT   = 0x9051,
MLX5_REG_MTPPS   = 0x9053,
MLX5_REG_MTPPSE  = 0x9054,
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 1853e7fd6924..bd7b71f54d59 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -8112,7 +8112,9 @@ struct mlx5_ifc_mcam_access_reg_bits {
u8 mcqi[0x1];
u8 reserved_at_1f[0x1];
 
-   u8 regs_95_to_64[0x20];
+   u8 regs_95_to_68[0x1c];
+   u8 tracer_registers[0x4];
+
u8 regs_63_to_32[0x20];
u8 regs_31_to_0[0x20];
 };
@@ -9187,4 +9189,61 @@ struct mlx5_ifc_create_uctx_in_bits {
struct mlx5_ifc_uctx_bits uctx;
 };
 
+struct mlx5_ifc_mtrc_string_db_param_bits {
+   u8 string_db_base_address[0x20];
+
+   u8 reserved_at_20[0x8];
+   u8 string_db_size[0x18];
+};
+
+struct mlx5_ifc_mtrc_cap_bits {
+   u8 trace_owner[0x1];
+   u8 trace_to_memory[0x1];
+   u8 reserved_at_2[0x4];
+   u8 trc_ver[0x2];
+   u8 reserved_at_8[0x14];
+   u8 num_string_db[0x4];
+
+   u8 first_string_trace[0x8];
+   u8 num_string_trace[0x8];
+   u8 reserved_at_30[0x28];
+
+   u8 log_max_trace_buffer_size[0x8];
+
+   u8 reserved_at_60[0x20];
+
+   struct mlx5_ifc_mtrc_string_db_param_bits string_db_param[8];
+
+   u8 reserved_at_280[0x180];
+};
+
+struct mlx5_ifc_mtrc_conf_bits {
+   u8 reserved_at_0[0x1c];
+   u8 trace_mode[0x4];
+   u8 reserved_at_20[0x18];
+   u8 log_trace_buffer_size[0x8];
+   u8 trace_mkey[0x20];
+   u8 reserved_at_60[0x3a0];
+};
+
+struct mlx5_ifc_mtrc_stdb_bits {
+   u8 string_db_index[0x4];
+   u8 reserved_at_4[0x4];
+   u8 read_size[0x18];
+   u8 start_offset[0x20];
+   u8 string_db_data[0];
+};
+
+struct mlx5_ifc_mtrc_ctrl_bits {
+   u8 trace_status[0x2];
+   u8 reserved_at_2[0x2];
+   u8 arm_event[0x1];
+   u8 reserved_at_5[0xb];
+   u8 modify_field_select[0x10];
+   u8 reserved_at_20[0x2b];
+   u8 current_timestamp52_32[0x15];
+   u8 current_timestamp31_0[0x20];
+   u8 reserved_at_80[0x180];
+};
+
 #endif /* MLX5_IFC_H */
-- 
2.17.0



[PATCH mlx5-next 6/8] net/mlx5: Use ERR_CAST() instead of coding it

2018-07-16 Thread Saeed Mahameed
From: Roi Dayan 

This makes it clearer that rule is being used to return an error.

Signed-off-by: Roi Dayan 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c 
b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index 05e7a5112b74..29b86232f13a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -1825,7 +1825,7 @@ _mlx5_add_flow_rules(struct mlx5_flow_table *ft,
 
g = alloc_auto_flow_group(ft, spec);
if (IS_ERR(g)) {
-   rule = (void *)g;
+   rule = ERR_CAST(g);
up_write_ref_node(>node);
return rule;
}
-- 
2.17.0



[PATCH mlx5-next 3/8] net/mlx5: Add core support for double vlan push/pop steering action

2018-07-16 Thread Saeed Mahameed
From: Jianbo Liu 

As newer firmware supports double push/pop in a single FTE, we add
core bits and extend vlan action logic for it.

Signed-off-by: Jianbo Liu 
Reviewed-by: Or Gerlitz 
Signed-off-by: Saeed Mahameed 
---
 .../ethernet/mellanox/mlx5/core/diag/fs_tracepoint.h |  2 ++
 .../ethernet/mellanox/mlx5/core/eswitch_offloads.c   |  6 +++---
 drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c | 12 +---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c|  4 +++-
 include/linux/mlx5/fs.h  |  4 +++-
 include/linux/mlx5/mlx5_ifc.h| 11 +--
 6 files changed, 29 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/diag/fs_tracepoint.h 
b/drivers/net/ethernet/mellanox/mlx5/core/diag/fs_tracepoint.h
index 09f178a3fcab..0240aee9189e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/diag/fs_tracepoint.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/diag/fs_tracepoint.h
@@ -138,6 +138,8 @@ TRACE_EVENT(mlx5_fs_del_fg,
{MLX5_FLOW_CONTEXT_ACTION_MOD_HDR,   "MOD_HDR"},\
{MLX5_FLOW_CONTEXT_ACTION_VLAN_PUSH, "VLAN_PUSH"},\
{MLX5_FLOW_CONTEXT_ACTION_VLAN_POP,  "VLAN_POP"},\
+   {MLX5_FLOW_CONTEXT_ACTION_VLAN_PUSH_2,   "VLAN_PUSH_2"},\
+   {MLX5_FLOW_CONTEXT_ACTION_VLAN_POP_2,"VLAN_POP_2"},\
{MLX5_FLOW_CONTEXT_ACTION_FWD_NEXT_PRIO, "NEXT_PRIO"}
 
 TRACE_EVENT(mlx5_fs_set_fte,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c 
b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index cecd201f0b73..8f50ce80ff66 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -70,9 +70,9 @@ mlx5_eswitch_add_offloaded_rule(struct mlx5_eswitch *esw,
flow_act.action &= ~(MLX5_FLOW_CONTEXT_ACTION_VLAN_PUSH |
 MLX5_FLOW_CONTEXT_ACTION_VLAN_POP);
else if (flow_act.action & MLX5_FLOW_CONTEXT_ACTION_VLAN_PUSH) {
-   flow_act.vlan.ethtype = ntohs(attr->vlan_proto);
-   flow_act.vlan.vid = attr->vlan_vid;
-   flow_act.vlan.prio = attr->vlan_prio;
+   flow_act.vlan[0].ethtype = ntohs(attr->vlan_proto);
+   flow_act.vlan[0].vid = attr->vlan_vid;
+   flow_act.vlan[0].prio = attr->vlan_prio;
}
 
if (flow_act.action & MLX5_FLOW_CONTEXT_ACTION_FWD_DEST) {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c 
b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
index 5a00deff5457..6a62b84e57f4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
@@ -349,9 +349,15 @@ static int mlx5_cmd_set_fte(struct mlx5_core_dev *dev,
 
vlan = MLX5_ADDR_OF(flow_context, in_flow_context, push_vlan);
 
-   MLX5_SET(vlan, vlan, ethtype, fte->action.vlan.ethtype);
-   MLX5_SET(vlan, vlan, vid, fte->action.vlan.vid);
-   MLX5_SET(vlan, vlan, prio, fte->action.vlan.prio);
+   MLX5_SET(vlan, vlan, ethtype, fte->action.vlan[0].ethtype);
+   MLX5_SET(vlan, vlan, vid, fte->action.vlan[0].vid);
+   MLX5_SET(vlan, vlan, prio, fte->action.vlan[0].prio);
+
+   vlan = MLX5_ADDR_OF(flow_context, in_flow_context, push_vlan_2);
+
+   MLX5_SET(vlan, vlan, ethtype, fte->action.vlan[1].ethtype);
+   MLX5_SET(vlan, vlan, vid, fte->action.vlan[1].vid);
+   MLX5_SET(vlan, vlan, prio, fte->action.vlan[1].prio);
 
in_match_value = MLX5_ADDR_OF(flow_context, in_flow_context,
  match_value);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c 
b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index 49a75d31185e..05e7a5112b74 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -1464,7 +1464,9 @@ static bool check_conflicting_actions(u32 action1, u32 
action2)
 MLX5_FLOW_CONTEXT_ACTION_DECAP |
 MLX5_FLOW_CONTEXT_ACTION_MOD_HDR  |
 MLX5_FLOW_CONTEXT_ACTION_VLAN_POP |
-MLX5_FLOW_CONTEXT_ACTION_VLAN_PUSH))
+MLX5_FLOW_CONTEXT_ACTION_VLAN_PUSH |
+MLX5_FLOW_CONTEXT_ACTION_VLAN_POP_2 |
+MLX5_FLOW_CONTEXT_ACTION_VLAN_PUSH_2))
return true;
 
return false;
diff --git a/include/linux/mlx5/fs.h b/include/linux/mlx5/fs.h
index 757b4a30281e..c40f2fc68655 100644
--- a/include/linux/mlx5/fs.h
+++ b/include/linux/mlx5/fs.h
@@ -152,6 +152,8 @@ struct mlx5_fs_vlan {
 u8  prio;
 };
 
+#define MLX5_FS_VLAN_DEPTH 2
+
 struct mlx5_flow_act {
u32 action;
bool has_flow_tag;
@@ -159,7 +161,7 @@ struct mlx5_flow_act {
u32 encap_id;
u32 modify_id;
uintptr_t esp_id;
-   struct mlx5_fs_vlan 

[PATCH mlx5-next 0/8] Mellanox, mlx5 updates 2018-07-16

2018-07-16 Thread Saeed Mahameed
Hi,

This series includes mlx5 core infrastructure updates and fixes
aimed for mlx5-next branch.

In case of no objections, below patches will be applied to mlx5-next branch
and next mlx5 net-next pull request will start with a merge commit
pointing to the last patch in this series.

From Eran:
 - Add MPEGC (Management PCIe General Configuration) registers and bits
 - Fix tristate and description for MLX5 module

From Feras:
 - Add hardware structures for the firmware tracer

From Jianbo:
 - Core support for double vlan push/pop steering action

From Max:
 - Add XRQ commands definitions

From Noa:
 - Add missing SET_DRIVER_VERSION command translation

From Roi:
 - Use ERR_CAST() instead of coding it

From Tariq:
 - Better return types for CQE API

Thanks,
Saeed

--- 

Eran Ben Elisha (2):
  net/mlx5: Expose MPEGC (Management PCIe General Configuration)
structures
  net/mlx5: Fix tristate and description for MLX5 module

Feras Daoud (1):
  net/mlx5: FW tracer, add hardware structures

Jianbo Liu (1):
  net/mlx5: Add core support for double vlan push/pop steering action

Max Gurtovoy (1):
  net/mlx5: Add XRQ commands definitions

Noa Osherovich (1):
  net/mlx5: Add missing SET_DRIVER_VERSION command translation

Roi Dayan (1):
  net/mlx5: Use ERR_CAST() instead of coding it

Tariq Toukan (1):
  net/mlx5: Better return types for CQE API

 drivers/infiniband/hw/mlx5/Kconfig|  2 +-
 .../net/ethernet/mellanox/mlx5/core/Kconfig   |  6 +-
 drivers/net/ethernet/mellanox/mlx5/core/cmd.c |  9 ++
 .../mellanox/mlx5/core/diag/fs_tracepoint.h   |  2 +
 .../mellanox/mlx5/core/eswitch_offloads.c |  6 +-
 .../net/ethernet/mellanox/mlx5/core/fs_cmd.c  | 12 ++-
 .../net/ethernet/mellanox/mlx5/core/fs_core.c |  6 +-
 .../net/ethernet/mellanox/mlx5/core/main.c|  2 +-
 include/linux/mlx5/device.h   |  8 +-
 include/linux/mlx5/driver.h   |  5 +
 include/linux/mlx5/fs.h   |  4 +-
 include/linux/mlx5/mlx5_ifc.h | 93 ++-
 12 files changed, 133 insertions(+), 22 deletions(-)

-- 
2.17.0



[PATCH net-next] liquidio: correct error msg text when removing VLAN ID

2018-07-16 Thread Felix Manlunas
From: Rick Farrington 

Signed-off-by: Rick Farrington 
Signed-off-by: Felix Manlunas 
---
 drivers/net/ethernet/cavium/liquidio/lio_main.c| 2 +-
 drivers/net/ethernet/cavium/liquidio/lio_vf_main.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/cavium/liquidio/lio_main.c 
b/drivers/net/ethernet/cavium/liquidio/lio_main.c
index a60d5af..4edb158 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_main.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_main.c
@@ -2628,7 +2628,7 @@ static int liquidio_vlan_rx_kill_vid(struct net_device 
*netdev,
 
ret = octnet_send_nic_ctrl_pkt(lio->oct_dev, );
if (ret < 0) {
-   dev_err(>pci_dev->dev, "Add VLAN filter failed in core 
(ret: 0x%x)\n",
+   dev_err(>pci_dev->dev, "Del VLAN filter failed in core 
(ret: 0x%x)\n",
ret);
}
return ret;
diff --git a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c 
b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
index 7fa0212..b778357 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
@@ -1693,7 +1693,7 @@ liquidio_vlan_rx_kill_vid(struct net_device *netdev,
 
ret = octnet_send_nic_ctrl_pkt(lio->oct_dev, );
if (ret < 0) {
-   dev_err(>pci_dev->dev, "Add VLAN filter failed in core 
(ret: 0x%x)\n",
+   dev_err(>pci_dev->dev, "Del VLAN filter failed in core 
(ret: 0x%x)\n",
ret);
}
return ret;


Re: [PATCH v2 iproute2-next 06/31] tc/util: add print helpers for JSON

2018-07-16 Thread David Ahern
On 7/10/18 3:05 PM, Stephen Hemminger wrote:
> From: Stephen Hemminger 
> 
> Add a helper to print rate, time and size in numeric or pretty format
> based on JSON flag.
> 
> Signed-off-by: Stephen Hemminger 
> ---
>  tc/tc_util.c | 83 +---
>  tc/tc_util.h |  6 
>  2 files changed, 59 insertions(+), 30 deletions(-)

This one fails to compile on Stretch:

tc
CC   tc_util.o
tc_util.c:388:6: error: conflicting types for ‘print_time’
 void print_time(const char *key, const char *fmt, __u32 tm)
  ^~
In file included from tc_util.c:27:0:
tc_util.h:92:6: note: previous declaration of ‘print_time’ was here
 void print_time(const char *key, const char *fmt, __s32 tm);
  ^~
../config.mk:43: recipe for target 'tc_util.o' failed


Re: [PATCH][net-next][v2] net: convert gro_count to bitmask

2018-07-16 Thread kbuild test robot
Hi Li,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on net-next/master]
[also build test ERROR on next-20180713]
[cannot apply to v4.18-rc5]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Li-RongQing/net-convert-gro_count-to-bitmask/20180715-233722
config: i386-randconfig-s1-201828 (attached as .config)
compiler: gcc-6 (Debian 6.4.0-9) 6.4.0 20171026
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 
:: branch date: 15 hours ago
:: commit date: 15 hours ago

All errors (new ones prefixed by >>):

   In file included from arch/x86/include/asm/current.h:5:0,
from include/linux/sched.h:12,
from include/linux/uaccess.h:5,
from net/core/dev.c:75:
   net/core/dev.c: In function 'netdev_init':
>> include/linux/compiler.h:339:38: error: call to '__compiletime_assert_9285' 
>> declared with attribute error: BUILD_BUG_ON failed: GRO_HASH_BUCKETS > 
>> FIELD_SIZEOF(struct napi_struct, gro_bitmask)
 _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
 ^
   include/linux/compiler.h:319:4: note: in definition of macro 
'__compiletime_assert'
   prefix ## suffix();\
   ^~
   include/linux/compiler.h:339:2: note: in expansion of macro 
'_compiletime_assert'
 _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
 ^~~
   include/linux/build_bug.h:45:37: note: in expansion of macro 
'compiletime_assert'
#define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
^~
   include/linux/build_bug.h:69:2: note: in expansion of macro 
'BUILD_BUG_ON_MSG'
 BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
 ^~~~
   net/core/dev.c:9284:2: note: in expansion of macro 'BUILD_BUG_ON'
 BUILD_BUG_ON(GRO_HASH_BUCKETS >
 ^~~~
--
   In file included from arch/x86/include/asm/current.h:5:0,
from include/linux/sched.h:12,
from include/linux/uaccess.h:5,
from net//core/dev.c:75:
   net//core/dev.c: In function 'netdev_init':
>> include/linux/compiler.h:339:38: error: call to '__compiletime_assert_9285' 
>> declared with attribute error: BUILD_BUG_ON failed: GRO_HASH_BUCKETS > 
>> FIELD_SIZEOF(struct napi_struct, gro_bitmask)
 _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
 ^
   include/linux/compiler.h:319:4: note: in definition of macro 
'__compiletime_assert'
   prefix ## suffix();\
   ^~
   include/linux/compiler.h:339:2: note: in expansion of macro 
'_compiletime_assert'
 _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
 ^~~
   include/linux/build_bug.h:45:37: note: in expansion of macro 
'compiletime_assert'
#define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
^~
   include/linux/build_bug.h:69:2: note: in expansion of macro 
'BUILD_BUG_ON_MSG'
 BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
 ^~~~
   net//core/dev.c:9284:2: note: in expansion of macro 'BUILD_BUG_ON'
 BUILD_BUG_ON(GRO_HASH_BUCKETS >
 ^~~~

# 
https://github.com/0day-ci/linux/commit/b4ba3db381100e1869270a58dd2d9950ef0923de
git remote add linux-review https://github.com/0day-ci/linux
git remote update linux-review
git checkout b4ba3db381100e1869270a58dd2d9950ef0923de
vim +/__compiletime_assert_9285 +339 include/linux/compiler.h

9a8ab1c3 Daniel Santos 2013-02-21  325  
9a8ab1c3 Daniel Santos 2013-02-21  326  #define _compiletime_assert(condition, 
msg, prefix, suffix) \
9a8ab1c3 Daniel Santos 2013-02-21  327  __compiletime_assert(condition, 
msg, prefix, suffix)
9a8ab1c3 Daniel Santos 2013-02-21  328  
9a8ab1c3 Daniel Santos 2013-02-21  329  /**
9a8ab1c3 Daniel Santos 2013-02-21  330   * compiletime_assert - break build and 
emit msg if condition is false
9a8ab1c3 Daniel Santos 2013-02-21  331   * @condition: a compile-time constant 
condition to check
9a8ab1c3 Daniel Santos 2013-02-21  332   * @msg:   a message to emit if 
condition is false
9a8ab1c3 Daniel Santos 2013-02-21  333   *
9a8ab1c3 Daniel Santos 2013-02-21  334   * In tradition of POSIX assert, this 
macro will break the build if the
9a8ab1c3 Daniel Santos 2013-02-21  335   * supplied condition is *false*, 
emitting the supplied error message if the
9a8ab1c3 Daniel Santos 2013-02-21  336   * compiler has support to do so.
9a8ab1c3 Daniel Santos 2013-02-21  337   */
9a8ab1c3 Daniel Santos 2013-02-21  338  #define compiletime_assert(condition, 
msg) \
9a8ab1c3 Daniel Santos 2013-02-21 @339  

Re: [PATCH] net: cavium: Drop dependency of NET_VENDOR_CAVIUM on PCI

2018-07-16 Thread kbuild test robot
Hi Alexander,

I love your patch! Yet something to improve:

[auto build test ERROR on net-next/master]
[also build test ERROR on v4.18-rc5 next-20180713]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Alexander-Sverdlin/net-cavium-Drop-dependency-of-NET_VENDOR_CAVIUM-on-PCI/20180716-002448
config: s390-defconfig (attached as .config)
compiler: s390x-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
GCC_VERSION=7.2.0 make.cross ARCH=s390 
:: branch date: 15 hours ago
:: commit date: 15 hours ago

All error/warnings (new ones prefixed by >>):

   drivers/net/ethernet/cavium/common/cavium_ptp.c: In function 
'cavium_ptp_probe':
>> drivers/net/ethernet/cavium/common/cavium_ptp.c:235:8: error: implicit 
>> declaration of function 'pcim_enable_device'; did you mean 
>> 'pci_enable_device'? [-Werror=implicit-function-declaration]
 err = pcim_enable_device(pdev);
   ^~
   pci_enable_device
   drivers/net/ethernet/cavium/common/cavium_ptp.c: At top level:
>> drivers/net/ethernet/cavium/common/cavium_ptp.c:339:1: warning: data 
>> definition has no type or storage class
module_pci_driver(cavium_ptp_driver);
^
>> drivers/net/ethernet/cavium/common/cavium_ptp.c:339:1: error: type defaults 
>> to 'int' in declaration of 'module_pci_driver' [-Werror=implicit-int]
>> drivers/net/ethernet/cavium/common/cavium_ptp.c:339:1: warning: parameter 
>> names (without types) in function declaration
   drivers/net/ethernet/cavium/common/cavium_ptp.c:332:26: warning: 
'cavium_ptp_driver' defined but not used [-Wunused-variable]
static struct pci_driver cavium_ptp_driver = {
 ^
   cc1: some warnings being treated as errors

# 
https://github.com/0day-ci/linux/commit/c862aa8f427828f2c08fdc96494152690a2ec5d0
git remote add linux-review https://github.com/0day-ci/linux
git remote update linux-review
git checkout c862aa8f427828f2c08fdc96494152690a2ec5d0
vim +235 drivers/net/ethernet/cavium/common/cavium_ptp.c

8c56df37 Radoslaw Biernacki 2018-01-15  216  
8c56df37 Radoslaw Biernacki 2018-01-15  217  static int cavium_ptp_probe(struct 
pci_dev *pdev,
8c56df37 Radoslaw Biernacki 2018-01-15  218 const 
struct pci_device_id *ent)
8c56df37 Radoslaw Biernacki 2018-01-15  219  {
8c56df37 Radoslaw Biernacki 2018-01-15  220 struct device *dev = >dev;
8c56df37 Radoslaw Biernacki 2018-01-15  221 struct cavium_ptp *clock;
8c56df37 Radoslaw Biernacki 2018-01-15  222 struct cyclecounter *cc;
8c56df37 Radoslaw Biernacki 2018-01-15  223 u64 clock_cfg;
8c56df37 Radoslaw Biernacki 2018-01-15  224 u64 clock_comp;
8c56df37 Radoslaw Biernacki 2018-01-15  225 int err;
8c56df37 Radoslaw Biernacki 2018-01-15  226  
8c56df37 Radoslaw Biernacki 2018-01-15  227 clock = devm_kzalloc(dev, 
sizeof(*clock), GFP_KERNEL);
8c56df37 Radoslaw Biernacki 2018-01-15  228 if (!clock) {
8c56df37 Radoslaw Biernacki 2018-01-15  229 err = -ENOMEM;
8c56df37 Radoslaw Biernacki 2018-01-15  230 goto error;
8c56df37 Radoslaw Biernacki 2018-01-15  231 }
8c56df37 Radoslaw Biernacki 2018-01-15  232  
8c56df37 Radoslaw Biernacki 2018-01-15  233 clock->pdev = pdev;
8c56df37 Radoslaw Biernacki 2018-01-15  234  
8c56df37 Radoslaw Biernacki 2018-01-15 @235 err = pcim_enable_device(pdev);
8c56df37 Radoslaw Biernacki 2018-01-15  236 if (err)
8c56df37 Radoslaw Biernacki 2018-01-15  237 goto error_free;
8c56df37 Radoslaw Biernacki 2018-01-15  238  
8c56df37 Radoslaw Biernacki 2018-01-15  239 err = pcim_iomap_regions(pdev, 
1 << PCI_PTP_BAR_NO, pci_name(pdev));
8c56df37 Radoslaw Biernacki 2018-01-15  240 if (err)
8c56df37 Radoslaw Biernacki 2018-01-15  241 goto error_free;
8c56df37 Radoslaw Biernacki 2018-01-15  242  
8c56df37 Radoslaw Biernacki 2018-01-15  243 clock->reg_base = 
pcim_iomap_table(pdev)[PCI_PTP_BAR_NO];
8c56df37 Radoslaw Biernacki 2018-01-15  244  
8c56df37 Radoslaw Biernacki 2018-01-15  245 
spin_lock_init(>spin_lock);
8c56df37 Radoslaw Biernacki 2018-01-15  246  
8c56df37 Radoslaw Biernacki 2018-01-15  247 cc = >cycle_counter;
8c56df37 Radoslaw Biernacki 2018-01-15  248 cc->read = cavium_ptp_cc_read;
8c56df37 Radoslaw Biernacki 2018-01-15  249 cc->mask = 
CYCLECOUNTER_MASK(64);
8c56df37 Radoslaw Biernacki 2018-01-15  250 cc->mult = 1;
8c56df37 Radoslaw Biernacki 2018-01-15  251 cc->shift = 0;
8c56df37 Radoslaw Biernacki 2018-01-15  252  
8c56df37 Radoslaw Biernacki 2018-01-15  253 
timecounter_init(>time_counte

Re: [PATCH next] bonding: pass link-local packets to bonding master also.

2018-07-16 Thread Stephen Hemminger
On Mon, 16 Jul 2018 16:57:22 -0700
Mahesh Bandewar (महेश बंडेवार)  wrote:

> On Mon, Jul 16, 2018 at 4:33 PM, Stephen Hemminger
>  wrote:
> > On Sun, 15 Jul 2018 18:12:46 -0700
> > Mahesh Bandewar  wrote:
> >  
> >> From: Mahesh Bandewar 
> >>
> >> Commit b89f04c61efe ("bonding: deliver link-local packets with
> >> skb->dev set to link that packets arrived on") changed the behavior
> >> of how link-local-multicast packets are processed. The change in
> >> the behavior broke some legacy use cases where these packets are
> >> expected to arrive on bonding master device also.
> >>
> >> This patch passes the packet to the stack with the link it arrived
> >> on as well as passes to the bonding-master device to preserve the
> >> legacy use case.
> >>
> >> Reported-by: Michal Soltys 
> >> Signed-off-by: Mahesh Bandewar   
> >
> > Thanks for fixing this.
> >
> > Why not add a Fixes: tag instead of just talking about the commit?
> > That helps the stable maintainers know which versions of the kernel
> > need the patch.  
> Well, I thought about it. It's definitely 'related' but not sure it
> 'fixes' in true sense. It definitely fixes the broken legacy case
> though. Is that sufficient to add 'fixes' tag?

The previous commit caused a regression. Your change fixes the regression.


Re: [PATCH][net-next][v2] net: convert gro_count to bitmask

2018-07-16 Thread David Miller
From: Eric Dumazet 
Date: Mon, 16 Jul 2018 16:40:52 -0700

> I guess we could either use BITS_PER_LONG or :
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 
> c883b17ee0fe2c8a7ca2f2867560ba74004790a7..4f8b92d81d107fc9acd2499297435cbd9e9b5c67
>  100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -9282,7 +9282,7 @@ static struct hlist_head * __net_init 
> netdev_create_hash(void)

Committed thusly:


[PATCH] net: Fix GRO_HASH_BUCKETS assertion.

FIELD_SIZEOF() is in bytes, but we want bits.

Fixes: d9f37d01e294 ("net: convert gro_count to bitmask")
Suggested-by: Eric Dumazet 
Signed-off-by: David S. Miller 
---
 net/core/dev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index c883b17ee0fe..4f8b92d81d10 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -9282,7 +9282,7 @@ static struct hlist_head * __net_init 
netdev_create_hash(void)
 static int __net_init netdev_init(struct net *net)
 {
BUILD_BUG_ON(GRO_HASH_BUCKETS >
-   FIELD_SIZEOF(struct napi_struct, gro_bitmask));
+8 * FIELD_SIZEOF(struct napi_struct, gro_bitmask));
 
if (net != &init_net)
INIT_LIST_HEAD(&net->dev_base_head);
-- 
2.13.6



Re: [PATCH next] bonding: pass link-local packets to bonding master also.

2018-07-16 Thread महेश बंडेवार
On Mon, Jul 16, 2018 at 4:33 PM, Stephen Hemminger
 wrote:
> On Sun, 15 Jul 2018 18:12:46 -0700
> Mahesh Bandewar  wrote:
>
>> From: Mahesh Bandewar 
>>
>> Commit b89f04c61efe ("bonding: deliver link-local packets with
>> skb->dev set to link that packets arrived on") changed the behavior
>> of how link-local-multicast packets are processed. The change in
>> the behavior broke some legacy use cases where these packets are
>> expected to arrive on bonding master device also.
>>
>> This patch passes the packet to the stack with the link it arrived
>> on as well as passes to the bonding-master device to preserve the
>> legacy use case.
>>
>> Reported-by: Michal Soltys 
>> Signed-off-by: Mahesh Bandewar 
>
> Thanks for fixing this.
>
> Why not add a Fixes: tag instead of just talking about the commit?
> That helps the stable maintainers know which versions of the kernel
> need the patch.
Well, I thought about it. It's definitely 'related' but not sure it
'fixes' in true sense. It definitely fixes the broken legacy case
though. Is that sufficient to add 'fixes' tag?


Re: [PATCH next] bonding: pass link-local packets to bonding master also.

2018-07-16 Thread महेश बंडेवार
On Mon, Jul 16, 2018 at 2:24 PM, Jay Vosburgh
 wrote:
> Mahesh Bandewar  wrote:
>
>>From: Mahesh Bandewar 
>>
>>Commit b89f04c61efe ("bonding: deliver link-local packets with
>>skb->dev set to link that packets arrived on") changed the behavior
>>of how link-local-multicast packets are processed. The change in
>>the behavior broke some legacy use cases where these packets are
>>expected to arrive on bonding master device also.
>>
>>This patch passes the packet to the stack with the link it arrived
>>on as well as passes to the bonding-master device to preserve the
>>legacy use case.
>
> Michal, can you test this?  I'm travelling this week and won't
> be able to run the patch.
>
> Mahesh, will this confuse LLDP, et al, daemons that, e.g., bind
> to every possible interface and now see the same LLDP PDU (identical
> Chassis ID, Port ID, et al, TLVs) on multiple interfaces?
>
Well, it's hard to say. Previously, when these packets appeared only
on the bonding master, that service had to go to extra lengths to
figure out which link they actually came in on. With the earlier
change (SHA1: b89f04c61efe) it didn't have to, but with this patch the
best thing it can do is just ignore those packets coming from
(any) virtual devices. The only reason why I'm OK with this change is
that the L2 of a physical link is shared with a virtual link (the
bonding master), and hence both links receiving the same link-local
multicast seems acceptable. Making them appear only on the bonding
master is just wrong, while correcting that behavior breaks the legacy
use case, and here we are.

BTW, when links are aggregated and using LACP, these packets don't
arrive with the system MAC but with the real MAC of the sender and a
multicast destination MAC.

--mahesh..

> Thanks,
>
> -J
>
>>Reported-by: Michal Soltys 
>>Signed-off-by: Mahesh Bandewar 
>>---
>> drivers/net/bonding/bond_main.c | 17 +++--
>> 1 file changed, 15 insertions(+), 2 deletions(-)
>>
>>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>>index 9a2ea3c1f949..1d3b7d8448f2 100644
>>--- a/drivers/net/bonding/bond_main.c
>>+++ b/drivers/net/bonding/bond_main.c
>>@@ -1177,9 +1177,22 @@ static rx_handler_result_t bond_handle_frame(struct 
>>sk_buff **pskb)
>>   }
>>   }
>>
>>-  /* don't change skb->dev for link-local packets */
>>-  if (is_link_local_ether_addr(eth_hdr(skb)->h_dest))
>>+  /* Link-local multicast packets should be passed to the
>>+   * stack on the link they arrive as well as pass them to the
>>+   * bond-master device. These packets are mostly usable when
>>+   * stack receives it with the link on which they arrive
>>+   * (e.g. LLDP) but there may be some legacy behavior that
>>+   * expects these packets to appear on bonding master too.
>>+   */
>>+  if (is_link_local_ether_addr(eth_hdr(skb)->h_dest)) {
>>+  struct sk_buff *nskb = skb_clone(skb, GFP_ATOMIC);
>>+
>>+  if (nskb) {
>>+  nskb->dev = bond->dev;
>>+  netif_rx(nskb);
>>+  }
>>   return RX_HANDLER_PASS;
>>+  }
>>   if (bond_should_deliver_exact_match(skb, slave, bond))
>>   return RX_HANDLER_EXACT;
>>
>>--
>>2.18.0.203.gfac676dfb9-goog
>
> ---
> -Jay Vosburgh, jay.vosbu...@canonical.com


[net:master 66/72] drivers/net/hyperv/rndis_filter.c:1341:16: sparse: Using plain integer as NULL pointer

2018-07-16 Thread kbuild test robot
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git master
head:   3578a7ecb69920efc3885dbd610e98c00dbdf5db
commit: 916c5e1413be058d1c1f6e502db350df890730ce [66/72] hv/netvsc: fix 
handling of fallback to single queue mode
reproduce:
# apt-get install sparse
git checkout 916c5e1413be058d1c1f6e502db350df890730ce
make ARCH=x86_64 allmodconfig
make C=1 CF=-D__CHECK_ENDIAN__


sparse warnings: (new ones prefixed by >>)

   drivers/net/hyperv/rndis_filter.c:1307:31: sparse: expression using 
sizeof(void)
   drivers/net/hyperv/rndis_filter.c:1307:31: sparse: expression using 
sizeof(void)
   drivers/net/hyperv/rndis_filter.c:1310:31: sparse: expression using 
sizeof(void)
   drivers/net/hyperv/rndis_filter.c:1313:31: sparse: expression using 
sizeof(void)
   drivers/net/hyperv/rndis_filter.c:1313:31: sparse: expression using 
sizeof(void)
>> drivers/net/hyperv/rndis_filter.c:1341:16: sparse: Using plain integer as 
>> NULL pointer

vim +1341 drivers/net/hyperv/rndis_filter.c

  1224  
  1225  struct netvsc_device *rndis_filter_device_add(struct hv_device *dev,
  1226struct netvsc_device_info 
*device_info)
  1227  {
  1228  struct net_device *net = hv_get_drvdata(dev);
  1229  struct netvsc_device *net_device;
  1230  struct rndis_device *rndis_device;
  1231  struct ndis_recv_scale_cap rsscap;
  1232  u32 rsscap_size = sizeof(struct ndis_recv_scale_cap);
  1233  u32 mtu, size;
  1234  u32 num_possible_rss_qs;
  1235  int i, ret;
  1236  
  1237  rndis_device = get_rndis_device();
  1238  if (!rndis_device)
  1239  return ERR_PTR(-ENODEV);
  1240  
  1241  /* Let the inner driver handle this first to create the netvsc 
channel
  1242   * NOTE! Once the channel is created, we may get a receive 
callback
  1243   * (RndisFilterOnReceive()) before this call is completed
  1244   */
  1245  net_device = netvsc_device_add(dev, device_info);
  1246  if (IS_ERR(net_device)) {
  1247  kfree(rndis_device);
  1248  return net_device;
  1249  }
  1250  
  1251  /* Initialize the rndis device */
  1252  net_device->max_chn = 1;
  1253  net_device->num_chn = 1;
  1254  
  1255  net_device->extension = rndis_device;
  1256  rndis_device->ndev = net;
  1257  
  1258  /* Send the rndis initialization message */
  1259  ret = rndis_filter_init_device(rndis_device, net_device);
  1260  if (ret != 0)
  1261  goto err_dev_remv;
  1262  
  1263  /* Get the MTU from the host */
  1264  size = sizeof(u32);
  1265  ret = rndis_filter_query_device(rndis_device, net_device,
  1266  
RNDIS_OID_GEN_MAXIMUM_FRAME_SIZE,
  1267  , );
  1268  if (ret == 0 && size == sizeof(u32) && mtu < net->mtu)
  1269  net->mtu = mtu;
  1270  
  1271  /* Get the mac address */
  1272  ret = rndis_filter_query_device_mac(rndis_device, net_device);
  1273  if (ret != 0)
  1274  goto err_dev_remv;
  1275  
  1276  memcpy(device_info->mac_adr, rndis_device->hw_mac_adr, 
ETH_ALEN);
  1277  
  1278  /* Get friendly name as ifalias*/
  1279  if (!net->ifalias)
  1280  rndis_get_friendly_name(net, rndis_device, net_device);
  1281  
  1282  /* Query and set hardware capabilities */
  1283  ret = rndis_netdev_set_hwcaps(rndis_device, net_device);
  1284  if (ret != 0)
  1285  goto err_dev_remv;
  1286  
  1287  rndis_filter_query_device_link_status(rndis_device, net_device);
  1288  
  1289  netdev_dbg(net, "Device MAC %pM link state %s\n",
  1290 rndis_device->hw_mac_adr,
  1291 rndis_device->link_state ? "down" : "up");
  1292  
  1293  if (net_device->nvsp_version < NVSP_PROTOCOL_VERSION_5)
  1294  goto out;
  1295  
  1296  rndis_filter_query_link_speed(rndis_device, net_device);
  1297  
  1298  /* vRSS setup */
  1299  memset(, 0, rsscap_size);
  1300  ret = rndis_filter_query_device(rndis_device, net_device,
  1301  
OID_GEN_RECEIVE_SCALE_CAPABILITIES,
  1302  , _size);
  1303  if (ret || rsscap.num_recv_que < 2)
  1304  goto out;
  1305  
  1306  /* This guarantees that num_possible_rss_qs <= num_online_cpus 
*/
> 1307  num_possible_rss_qs = min_t(u32, num_online_cpus(),
  1308  rsscap.num_recv_que);
  1309  
  1310  net_device->max_chn = min_t(u32, VRSS_CHANNEL_MAX, 
num_possible_rss_qs);
  1311  

Re: [PATCH][net-next][v2] net: convert gro_count to bitmask

2018-07-16 Thread Eric Dumazet



On 07/12/2018 11:41 PM, Li RongQing wrote:
> gro_hash size is 192 bytes, and uses 3 cache lines, if there is few

 */
> @@ -9264,6 +9273,9 @@ static struct hlist_head * __net_init 
> netdev_create_hash(void)
>  /* Initialize per network namespace state */
>  static int __net_init netdev_init(struct net *net)
>  {
> + BUILD_BUG_ON(GRO_HASH_BUCKETS >
> + FIELD_SIZEOF(struct napi_struct, gro_bitmask));
> +

Sorry for the delay (patch is already merged)

This looks wrong to me. 

FIELD_SIZEOF() is in bytes not bits.

I guess we could either use BITS_PER_LONG or :

diff --git a/net/core/dev.c b/net/core/dev.c
index c883b17ee0fe2c8a7ca2f2867560ba74004790a7..4f8b92d81d107fc9acd2499297435cbd9e9b5c67 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -9282,7 +9282,7 @@ static struct hlist_head * __net_init netdev_create_hash(void)
 static int __net_init netdev_init(struct net *net)
 {
 	BUILD_BUG_ON(GRO_HASH_BUCKETS >
-		     FIELD_SIZEOF(struct napi_struct, gro_bitmask));
+		     8 * FIELD_SIZEOF(struct napi_struct, gro_bitmask));
 
 	if (net != &init_net)
 		INIT_LIST_HEAD(&net->dev_base_head);
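
The bytes-versus-bits point above can be illustrated with a small user-space sketch. This is not the kernel code: struct napi_like, the FIELD_BITS() helper and the bucket count of 8 are assumptions made for the example. FIELD_SIZEOF() yields a size in bytes, so a check that one bit per bucket fits in the mask has to scale it by the number of bits per byte.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel structure; only the mask matters. */
struct napi_like {
	unsigned long gro_bitmask;
};

/* Size of a struct member in bytes (same idea as the kernel macro). */
#define FIELD_SIZEOF(t, f) (sizeof(((t *)0)->f))

/* Size of a struct member in bits; FIELD_BITS is an illustrative name. */
#define FIELD_BITS(t, f) (CHAR_BIT * FIELD_SIZEOF(t, f))

/* Bucket count assumed for this sketch. */
#define GRO_HASH_BUCKETS 8

/* True if one bit per bucket fits in gro_bitmask. */
static bool buckets_fit(size_t buckets)
{
	return buckets <= FIELD_BITS(struct napi_like, gro_bitmask);
}

/* Runtime analogue of the build-time check discussed in the thread. */
static void check_bucket_count(void)
{
	assert(buckets_fit(GRO_HASH_BUCKETS));
}
```

With 8 buckets, comparing against the byte count would still pass where unsigned long is 8 bytes, but would reject a valid configuration where it is 4 bytes, which is why the comparison should be done in bits.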



Re: [PATCH net-next 4/4] act_mirred: use ACT_REDIRECT when possible

2018-07-16 Thread Cong Wang
On Fri, Jul 13, 2018 at 2:55 AM Paolo Abeni  wrote:
>
> When mirred is invoked from the ingress path, and it wants to redirect
> the processed packet, it can now use the ACT_REDIRECT action,
> filling the tcf_result accordingly.
>
> This avoids a skb_clone() in the TC S/W data path giving a ~10%
> improvement in forwarding performances. Overall TC S/W performances
> are now comparable to the kernel openswitch datapath.

Avoiding skb_clone() for redirection is cool, but why do we need to use
skb_do_redirect() here?

There is a subtle difference here:

skb_do_redirect() calls __bpf_rx_skb(), which calls
dev_forward_skb(), which scrubs the packet,

while the current mirred action doesn't scrub packets when
redirecting to ingress (from egress), although I forget whether that
is intentional.

Also, skb->skb_iif is unset in skb_do_redirect() when
redirecting to ingress; I recall we have to set it correctly
for input routing. Probably yet another reason why we
can't scrub it, unless my memory is wrong. :)

Thanks!


Re: [PATCH bpf-next 0/2] tools: bpf: build cleanups

2018-07-16 Thread Alexei Starovoitov
On Tue, Jul 17, 2018 at 12:34:03AM +0200, Daniel Borkmann wrote:
> On 07/16/2018 07:57 PM, Jakub Kicinski wrote:
> > Hi!
> > 
> > While tracking down the perf vs libbpf vs reallocarray build issue
> > I noticed libbpf is checking for a feature it never uses and that
> > bpftool's makefile attempt to reuse feature dump doesn't really
> > make sense.
> > 
> > Jakub Kicinski (2):
> >   tools: libbpf: remove libelf-getphdrnum feature detection
> >   tools: bpftool: don't pass FEATURES_DUMP to libbpf
> > 
> >  tools/bpf/bpftool/Makefile | 2 +-
> >  tools/lib/bpf/Makefile | 6 +-
> >  2 files changed, 2 insertions(+), 6 deletions(-)
> > 
> 
> Acked-by: Daniel Borkmann 

somehow cover letter didn't make it into patchworks,
so I applied both patches manually to bpf-next and propagated Daniel's Ack.
Thanks!



Re: [PATCH next] bonding: pass link-local packets to bonding master also.

2018-07-16 Thread Stephen Hemminger
On Sun, 15 Jul 2018 18:12:46 -0700
Mahesh Bandewar  wrote:

> From: Mahesh Bandewar 
> 
> Commit b89f04c61efe ("bonding: deliver link-local packets with
> skb->dev set to link that packets arrived on") changed the behavior
> of how link-local-multicast packets are processed. The change in
> the behavior broke some legacy use cases where these packets are
> expected to arrive on bonding master device also.
> 
> This patch passes the packet to the stack with the link it arrived
> on as well as passes to the bonding-master device to preserve the
> legacy use case.
> 
> Reported-by: Michal Soltys 
> Signed-off-by: Mahesh Bandewar 

Thanks for fixing this.

Why not add a Fixes: tag instead of just talking about the commit?
That helps the stable maintainers know which versions of the kernel
need the patch.


Re: [PATCH v2 net-next 01/14] net: Clear skb->tstamp only on the forwarding path

2018-07-16 Thread Eric Dumazet



On 07/16/2018 02:52 PM, Jesus Sanchez-Palencia wrote:
> Hi Eric,
> 
> 
> 
> On 07/13/2018 10:35 AM, Eric Dumazet wrote:
>>
>>
>> On 07/03/2018 03:42 PM, Jesus Sanchez-Palencia wrote:
>>> This is done in preparation for the upcoming time based transmission
>>> patchset. Now that skb->tstamp will be used to hold packet's txtime,
>>> we must ensure that it is being cleared when traversing namespaces.
>>> Also, doing that from skb_scrub_packet() before the early return would
>>> break our feature when tunnels are used.
>>>
>>> Signed-off-by: Jesus Sanchez-Palencia 
>>> ---
>>>  net/core/skbuff.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>>> index 1357f36c8a5e..c4e24ac27464 100644
>>> --- a/net/core/skbuff.c
>>> +++ b/net/core/skbuff.c
>>> @@ -4898,7 +4898,6 @@ EXPORT_SYMBOL(skb_try_coalesce);
>>>   */
>>>  void skb_scrub_packet(struct sk_buff *skb, bool xnet)
>>>  {
>>> -   skb->tstamp = 0;
>>> skb->pkt_type = PACKET_HOST;
>>> skb->skb_iif = 0;
>>> skb->ignore_df = 0;
>>> @@ -4912,6 +4911,7 @@ void skb_scrub_packet(struct sk_buff *skb, bool xnet)
>>>  
>>> ipvs_reset(skb);
>>> skb->mark = 0;
>>> +   skb->tstamp = 0;
>>>  }
>>>  EXPORT_SYMBOL_GPL(skb_scrub_packet);
>>>  
>>>
>>
>>
>>
>> I believe we had some misunderstanding here.
>>
>> What I meant by forwarding is the following case :
>>
>> - We receive a packet.
>> - netstamp_wanted is >0 (because at least one packet capture is active)
>> - __net_timestamp() is called and does :
>> skb->tstamp = ktime_get_real();
>>
>> Then this skb is forwarded into an interface where EDT is taken into
>> consideration by either a qdisc or a device.
>>
>> Since CLOCK_TAI is a different base than CLOCK_REALTIME, we might have a 
>> problem.
> 
> 
> I'm not sure we have a problem here. For the Tx path I only see
> net_timestamp_set() being called from dev_queue_xmit_nit(). And even there,
> it's a clone of the skb that gets timestamped.
> 
> I believe the original skb, which had the valid txtime copied into
> skb->tstamp, is not modified anywhere along that path.
> 
> What am I missing, please?
> 
> Thanks,
> Jesus
> 


I am simply stating that a Linux router, receiving packets on ethX and forwarding
them on ethY, could have a problem if ethY has a qdisc looking at skb->tstamp
and assuming a timestamp in the CLOCK_TAI base.

In this case, skb->tstamp would have been set at ingress (using CLOCK_REALTIME,
not CLOCK_TAI), and would be read at egress (assuming CLOCK_TAI).

The normal IPv4 routing path is in net/ipv4/ip_forward.c; no scrubbing ever
happens there, and no cloning either.

Your patch (Clear skb->tstamp only on the forwarding path) is not handling the
typical forward path, only the cases where 'scrubbing' is used.
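
To make the clock-base mismatch concrete, here is a minimal user-space sketch. Everything in it is hypothetical: struct pkt, hold_time_ns() and forward_clear_tstamp() are illustrative names, not kernel functions, and the 37 s TAI-UTC offset is hard-coded on the assumption that the system's TAI offset is configured. It shows how a receive timestamp taken in CLOCK_REALTIME looks like an already expired transmit time to an egress check that reads it as CLOCK_TAI, and how clearing it on the forwarding path avoids that.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* TAI has been 37 s ahead of UTC since 2017; hard-coded for this sketch. */
#define TAI_OFFSET_NS (37 * NSEC_PER_SEC)

/* Hypothetical packet: tstamp == 0 means "no timestamp set". */
struct pkt {
	int64_t tstamp;
};

/* Hypothetical forwarding hook: forget the receive timestamp. */
static void forward_clear_tstamp(struct pkt *p)
{
	assert(p != NULL);
	p->tstamp = 0;
}

/*
 * Hypothetical egress check: how long would a time-based qdisc hold the
 * packet if it reads tstamp as a CLOCK_TAI transmit time? A negative
 * result means the deadline already appears to have passed.
 */
static int64_t hold_time_ns(const struct pkt *p, int64_t now_tai)
{
	assert(p != NULL);
	if (p->tstamp == 0)
		return 0;	/* no transmit time requested: send right away */
	return p->tstamp - now_tai;
}
```

A packet stamped with the current CLOCK_REALTIME value appears 37 seconds late under these assumptions; once the stamp is cleared, the egress side treats it as having no transmit time at all.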



> 
> 
>>
>>
>> Solutions for this problem :
>>
>> 1) Convert all our skb->tstamp usages to CLOCK_TAI base.
>>
>> or
>>
>> 2) clear skb->tstamp in forwarding paths, including the ones not scrubbing 
>> the packet.
>>
>> My preference is 1), even if it is a bit more work.
>>


Re: [PATCH bpf-next 0/2] tools: bpf: build cleanups

2018-07-16 Thread Daniel Borkmann
On 07/16/2018 07:57 PM, Jakub Kicinski wrote:
> Hi!
> 
> While tracking down the perf vs libbpf vs reallocarray build issue
> I noticed libbpf is checking for a feature it never uses and that
> bpftool's makefile attempt to reuse feature dump doesn't really
> make sense.
> 
> Jakub Kicinski (2):
>   tools: libbpf: remove libelf-getphdrnum feature detection
>   tools: bpftool: don't pass FEATURES_DUMP to libbpf
> 
>  tools/bpf/bpftool/Makefile | 2 +-
>  tools/lib/bpf/Makefile | 6 +-
>  2 files changed, 2 insertions(+), 6 deletions(-)
> 

Acked-by: Daniel Borkmann 


Re: [PATCH ipsec-next] xfrm: Allow Set Mark to be Updated Using UPDSA

2018-07-16 Thread Nathan Harold
< re-sent with apologies due to incorrect formatting last time... :-( >

Hi Eyal,

> If x1 points to a state previously found using __xfrm_state_locate(x),
> won't __xfrm_state_bump_genids(x1) be equivalent to x1->genid++ in
> this case?

In the vanilla case this is true, i.e., if there are no strange/abusive
uses of the API such as the test below, where multiple SAs can match
the locate().

> Is it possible that other states will match all of x1 parameters?

Yes.  Not sure if it's a bug or a feature, but it's possible for
multiple SAs to match... for a depressing example, check out
https://android-review.googlesource.com/c/kernel/tests/+/680958. There
may be cases where something like this is desired behavior that I'm
not aware of. Since this is control path, it felt to me like the
formalism of using the xfrm_state_bump_genids() was worth not possibly
walking into a different subtle bug later.

> Also, any idea why this isn't needed for other changes in the state?

The set_mark (output_mark) is somewhat special because changing this
mark impacts the routing lookup, which up to now, none of the other
parameters in the update_sa function do. A new output_mark can and
will reroute packets to different interfaces. Thus, when we change
this thing, we want to ensure that we always build a new bundle with a
new route lookup based on the new set_mark. Since we
removed the flow cache, things might *incidentally* seem to work right
now; but, I think that's incidental rather than correct. By bumping
the genid, we get the dst_entry->check() function to correctly return
that the dst is obsolete when we call check(). I'm honestly not sure
what corner cases we could land in if we didn't bump the genid in such
a case.
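
The invalidation idea described above can be sketched in user space as a generation counter. This is a heavily simplified model, not the xfrm implementation: struct sa_state, struct bundle and the three helpers are hypothetical names. A cached bundle remembers the generation it was built from, and bumping the generation when the mark changes makes the next check report it as stale, which forces a fresh route lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for an SA. */
struct sa_state {
	uint32_t genid;		/* bumped when routing-relevant data changes */
	uint32_t output_mark;	/* mark used for the route lookup */
};

/* Hypothetical cached bundle built from an SA. */
struct bundle {
	const struct sa_state *sa;
	uint32_t sa_genid;	/* generation seen when the bundle was built */
	uint32_t route_mark;	/* mark the cached route was looked up with */
};

/* Build a bundle, recording the SA generation and the mark in use. */
static struct bundle bundle_build(const struct sa_state *sa)
{
	struct bundle b;

	assert(sa != NULL);
	b.sa = sa;
	b.sa_genid = sa->genid;
	b.route_mark = sa->output_mark;
	return b;
}

/* A bundle is still valid only if the SA generation has not moved on. */
static bool bundle_check(const struct bundle *b)
{
	assert(b != NULL);
	return b->sa_genid == b->sa->genid;
}

/* Change the mark and bump the generation so cached bundles go stale. */
static void sa_update_mark(struct sa_state *sa, uint32_t mark)
{
	assert(sa != NULL);
	sa->output_mark = mark;
	sa->genid++;
}
```

Without the increment in sa_update_mark(), a bundle built before the update would keep passing bundle_check() while still carrying the old mark.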

There's definitely a lot going on behind the scenes in this little
change that I only tenuously grasp, so it's possible that I'm being
overly cautious in this case. Please let me know your further thoughts
on whether we need to bump the genid. FYI once this patch is settled,
I plan to upload a patch to update the xfrm_if_id, which I planned to
nestle in to this same logic (and with similar, albeit possibly
more-straightforward rationale).

-Nathan

On Mon, Jul 2, 2018 at 10:14 PM, Eyal Birger  wrote:
> Hi Nathan,
>
> On Fri, 29 Jun 2018 15:07:10 -0700
> Nathan Harold  wrote:
>
>> Allow UPDSA to change "set mark" to permit
>> policy separation of packet routing decisions from
>> SA keying in systems that use mark-based routing.
>>
>> The set mark, used as a routing and firewall mark
>> for outbound packets, is made update-able which
>> allows routing decisions to be handled independently
>> of keying/SA creation. To maintain consistency with
>> other optional attributes, the set mark is only
>> updated if sent with a non-zero value.
>>
>> The per-SA lock and the xfrm_state_lock are taken in
>> that order to avoid a deadlock with
>> xfrm_timer_handler(), which also takes the locks in
>> that order.
>>
>> Signed-off-by: Nathan Harold 
>> Change-Id: Ia05c6733a94c1901cd1e54eb7c7e237704678d71
>> ---
>>  net/xfrm/xfrm_state.c | 9 +
>>  1 file changed, 9 insertions(+)
>>
>> diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
>> index e04a510ec992..c9ffcdfa89f6 100644
>> --- a/net/xfrm/xfrm_state.c
>> +++ b/net/xfrm/xfrm_state.c
>> @@ -1562,6 +1562,15 @@ int xfrm_state_update(struct xfrm_state *x)
>>   if (x1->curlft.use_time)
>>   xfrm_state_check_expire(x1);
>>
>> + if (x->props.smark.m || x->props.smark.v) {
>> + spin_lock_bh(&net->xfrm.xfrm_state_lock);
>> +
>> + x1->props.smark = x->props.smark;
>> +
>> + __xfrm_state_bump_genids(x1);
>
> So I'm trying to wrap my head around this genid thing :)
>
> If x1 points to a state previously found using __xfrm_state_locate(x),
> won't __xfrm_state_bump_genids(x1) be equivalent to x1->genid++ in
> this case?
>
> Is it possible that other states will match all of x1 parameters?
>
> Also, any idea why this isn't needed for other changes in the state?
>
> Thanks!
> Eyal.


[PATCH iproute2 net-next] ipneigh: exclude NTF_EXT_LEARNED from default filter

2018-07-16 Thread Roopa Prabhu
From: Roopa Prabhu 

NUD_NOARP entries are filtered out by default by iproute2.
We don't want NUD_NOARP entries with the NTF_EXT_LEARNED flag filtered out.
This patch extends the default filter check for ip neigh show
to include the NTF_EXT_LEARNED flag.

Signed-off-by: Roopa Prabhu 
---
 ip/ipneigh.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/ip/ipneigh.c b/ip/ipneigh.c
index bd6e5c5..a0af705 100644
--- a/ip/ipneigh.c
+++ b/ip/ipneigh.c
@@ -262,6 +262,7 @@ int print_neigh(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
 		return 0;
 	if (!(filter.state&r->ndm_state) &&
 	    !(r->ndm_flags & NTF_PROXY) &&
+	    !(r->ndm_flags & NTF_EXT_LEARNED) &&
 	    (r->ndm_state || !(filter.state&0x100)) &&
 	    (r->ndm_family != AF_DECnet))
 		return 0;
-- 
2.1.4
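
The effect of the extra condition can be sketched as a standalone predicate. This is an illustration, not the iproute2 source: struct ndmsg_like and neigh_is_shown() are hypothetical names, the flag values are copied from linux/neighbour.h so the example compiles on its own, the default state mask is an assumption, and the address-family test from the real code is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values mirror linux/neighbour.h; redefined so the sketch is standalone. */
#define NUD_REACHABLE	0x02
#define NUD_NOARP	0x40
#define NTF_PROXY	0x08
#define NTF_EXT_LEARNED	0x10

/* Assumed default state mask: every NUD state except NOARP. */
#define DEFAULT_STATE_FILTER (0xFF & ~NUD_NOARP)

/* Hypothetical subset of the neighbour message fields used here. */
struct ndmsg_like {
	uint16_t ndm_state;
	uint8_t ndm_flags;
};

/* True if the entry passes the state filter and would be printed. */
static bool neigh_is_shown(unsigned int filter_state,
			   const struct ndmsg_like *r)
{
	assert(r != NULL);
	if (!(filter_state & r->ndm_state) &&
	    !(r->ndm_flags & NTF_PROXY) &&
	    !(r->ndm_flags & NTF_EXT_LEARNED) &&	/* the added condition */
	    (r->ndm_state || !(filter_state & 0x100)))
		return false;
	return true;
}
```

Under these assumptions a plain NUD_NOARP entry is still hidden by default, while a NUD_NOARP entry carrying NTF_EXT_LEARNED is shown.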



Re: [PATCH 2/2] samples/bpf: test_cgrp2_sock2: fix an off by one

2018-07-16 Thread Alexei Starovoitov
On Fri, Jul 13, 2018 at 06:05:37PM +0300, Dan Carpenter wrote:
> "prog_cnt" is the number of elements which are filled out in prog_fd[]
> so the test should be >= instead of >.
> 
> Signed-off-by: Dan Carpenter 

since this is sample code I've applied both patches to bpf-next tree.
Thanks
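
The off-by-one being fixed can be shown with a minimal sketch. It is not the sample's code: fd_lookup() is a hypothetical helper. When cnt elements of an array are filled in, the valid indices are 0 to cnt - 1, so an index equal to cnt must be rejected, which is what ">=" does and ">" does not.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lookup: returns fds[idx], or -1 when idx is out of range. */
static int fd_lookup(const int *fds, size_t cnt, size_t idx)
{
	assert(fds != NULL);
	if (idx >= cnt)		/* ">" would let idx == cnt read past the end */
		return -1;
	return fds[idx];
}
```

With three filled elements, index 2 is the last valid one and index 3 is rejected.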



RE: [RFC PATCH mlx5-next 07/18] net/mlx5: Expose new packet reformat capabilities

2018-07-16 Thread Mark Bloch

> -----Original Message-----
> From: Or Gerlitz [mailto:gerlitz...@gmail.com]
> Sent: Monday, July 16, 2018 2:33 PM
> To: Mark Bloch 
> Cc: Doug Ledford ; Jason Gunthorpe
> ; Leon Romanovsky ; RDMA
> mailing list ; Saeed Mahameed
> ; linux-netdev 
> Subject: Re: [RFC PATCH mlx5-next 07/18] net/mlx5: Expose new packet
> reformat capabilities
> 
> On Mon, Jul 16, 2018 at 11:22 AM, Leon Romanovsky 
> wrote:
> > From: Mark Bloch 
> >
> > Expose new abilities when creating a packet reformat context.
> >
> > The new types which can be created are:
> > MLX5_REFORMAT_TYPE_L2_TO_L2_TUNNEL: Ability to create generic
> encap
> > opertion to be done by the HW.
> 
> opertion -> fix
> 
> > MLX5_REFORMAT_TYPE_L3_TUNNEL_TO_L2: Ability to create generic
> decap
> > opertion where the inner packet doesn't contain L2.
> 
> opertion -> fix
> 
> >
> > MLX5_REFORMAT_TYPE_L2_TO_L3_TUNNEL: Ability to create generic
> encap
> > opertion to be done by the HW. The L2 of the original packet
> 
> opertion -> fix

Thx, will be fixed.

> 
> > is dropped.
> >
> > Signed-off-by: Mark Bloch 
> > Signed-off-by: Leon Romanovsky 
> > ---
> >  include/linux/mlx5/mlx5_ifc.h | 20 +---
> >  1 file changed, 17 insertions(+), 3 deletions(-)
> >
> > diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
> > index 059ec97e7b32..c71d711d4893 100644
> > --- a/include/linux/mlx5/mlx5_ifc.h
> > +++ b/include/linux/mlx5/mlx5_ifc.h
> > @@ -341,8 +341,13 @@ struct mlx5_ifc_flow_table_prop_layout_bits {
> > u8 reserved_at_9[0x1];
> > u8 pop_vlan[0x1];
> > u8 push_vlan[0x1];
> > -   u8 reserved_at_c[0x14];
> > -
> > +   u8 reserved_at_c[0x3];
> > +   u8 reformat_and_vlan_action[0x1];
> 
> unused in downstream patches
> what is this BTW?

It's needed for completeness, to cover all the bits that deal with packet reformat.
The bit itself indicates whether the flow table supports
a reformat action together with a VLAN action (pop/push) in the same rule.
> 
> > +   u8 reserved_at_10[0x2];
> > +   u8 reformat_l3_tunnel_to_l2[0x1];
> > +   u8 reformat_l2_to_l3_tunnel[0x1];
> > +   u8 reformat_and_modify_action[0x1];
> 
> unused in downstream patches
> what is this BTW?

Bits to indicate whether the flow table supports the new packet reformat modes,
and setting a reformat action together with a modify action in the same rule.
Those will be used once a FW which exposes them is made available, but as
feature/cap flags I would like to expose them now.

Mark

> 
> 
> 
> > +   u8 reserved_at_15[0xb];
> > u8 reserved_at_20[0x2];
> > u8 log_max_ft_size[0x6];
> > u8 log_max_modify_header_context[0x8];
> > @@ -551,7 +556,13 @@ struct mlx5_ifc_flow_table_nic_cap_bits {
> > u8 nic_rx_multi_path_tirs[0x1];
> > u8 nic_rx_multi_path_tirs_fts[0x1];
> > u8 allow_sniffer_and_nic_rx_shared_tir[0x1];
> > -   u8 reserved_at_3[0x1fd];
> > +   u8 reserved_at_3[0x1d];
> > +   u8 encap_general_header[0x1];
> > +   u8 reserved_at_21[0xa];
> > +   u8 log_max_packet_reformat_context[0x5];
> > +   u8 reserved_at_30[0x6];
> > +   u8 max_encap_header_size[0xa];
> > +   u8 reserved_at_40[0x1c0];
> 
> we are inconsistent, for some fields the term "encap" remained whereas
> for other fields we moved to use "reformat" or "packet reformat" etc


Re: [PATCH v2 net-next 01/14] net: Clear skb->tstamp only on the forwarding path

2018-07-16 Thread Jesus Sanchez-Palencia
Hi Eric,



On 07/13/2018 10:35 AM, Eric Dumazet wrote:
> 
> 
> On 07/03/2018 03:42 PM, Jesus Sanchez-Palencia wrote:
>> This is done in preparation for the upcoming time based transmission
>> patchset. Now that skb->tstamp will be used to hold packet's txtime,
>> we must ensure that it is being cleared when traversing namespaces.
>> Also, doing that from skb_scrub_packet() before the early return would
>> break our feature when tunnels are used.
>>
>> Signed-off-by: Jesus Sanchez-Palencia 
>> ---
>>  net/core/skbuff.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>> index 1357f36c8a5e..c4e24ac27464 100644
>> --- a/net/core/skbuff.c
>> +++ b/net/core/skbuff.c
>> @@ -4898,7 +4898,6 @@ EXPORT_SYMBOL(skb_try_coalesce);
>>   */
>>  void skb_scrub_packet(struct sk_buff *skb, bool xnet)
>>  {
>> -skb->tstamp = 0;
>>  skb->pkt_type = PACKET_HOST;
>>  skb->skb_iif = 0;
>>  skb->ignore_df = 0;
>> @@ -4912,6 +4911,7 @@ void skb_scrub_packet(struct sk_buff *skb, bool xnet)
>>  
>>  ipvs_reset(skb);
>>  skb->mark = 0;
>> +skb->tstamp = 0;
>>  }
>>  EXPORT_SYMBOL_GPL(skb_scrub_packet);
>>  
>>
> 
> 
> 
> I believe we had some misunderstanding here.
> 
> What I meant by forwarding is the following case :
> 
> - We receive a packet.
> - netstamp_wanted is >0 (because at least one packet capture is active)
> - __net_timestamp() is called and does :
> skb->tstamp = ktime_get_real();
> 
> Then this skb is forwarded into an interface where EDT is taken into
> consideration by either a qdisc or a device.
> 
> Since CLOCK_TAI is a different base than CLOCK_REALTIME, we might have a 
> problem.


I'm not sure we have a problem here. For the Tx path I only see
net_timestamp_set() being called from dev_queue_xmit_nit(). And even there, it's
a clone of the skb that gets timestamped.

I believe the original skb, which had the valid txtime copied into skb->tstamp,
is not modified anywhere along that path.

What am I missing, please?

Thanks,
Jesus



> 
> 
> Solutions for this problem :
> 
> 1) Convert all our skb->tstamp usages to CLOCK_TAI base.
> 
> or
> 
> 2) clear skb->tstamp in forwarding paths, including the ones not scrubbing 
> the packet.
> 
> My preference is 1), even if it is a bit more work.
> 


Re: [PATCH net-next] cxgb4: collect ASIC LA dumps from ULP TX

2018-07-16 Thread David Miller
From: Rahul Lakkireddy 
Date: Mon, 16 Jul 2018 19:40:54 +0530

> From: Surendra Mobiya 
> 
> Signed-off-by: Surendra Mobiya 
> Signed-off-by: Rahul Lakkireddy 
> Signed-off-by: Ganesh Goudar 

Applied, thank you.


RE: [RFC PATCH rdma-next 13/18] RDMA/mlx5: Enable decap and packet reformat on flow tables

2018-07-16 Thread Mark Bloch

> -----Original Message-----
> From: Or Gerlitz [mailto:gerlitz...@gmail.com]
> Sent: Monday, July 16, 2018 2:24 PM
> To: Mark Bloch 
> Cc: Doug Ledford ; Jason Gunthorpe
> ; Leon Romanovsky ; RDMA
> mailing list ; Saeed Mahameed
> ; linux-netdev 
> Subject: Re: [RFC PATCH rdma-next 13/18] RDMA/mlx5: Enable decap and
> packet reformat on flow tables
> 
> On Mon, Jul 16, 2018 at 11:23 AM, Leon Romanovsky 
> wrote:
> > From: Mark Bloch 
> >
> > If NIC RX flow tables support decap opertion, enable it on creation.
> 
> opertion --> operation
> 
> > If NIC TX flow tables support reformat opertion, enable it on creation.
> 
> What is the trigger to use the decap flag on RX table or encap flag on
> TX table?
> 

Always enabling that carries no performance penalty, so that's what I do if
supported.
 
> Please note that we have a short blanket w.r.t mutual usage by

FDB and NIC steering tables have different limitations, so encap/decap on NIC
steering has nothing to do with the limitations the FDB has with those operations.

> NIC vs e-Switch  steering, did you consider to do that on demand?

The flow table needs to be created with those flags set if we want to attach
a decap/packet reformat action to it. BTW, there is no modify action for those
bits, so that's why I'm doing it on creation.

Mark


Re: [PATCH net 0/2] tg3: Update copyright and fix for tx timeout with 5762

2018-07-16 Thread David Miller
From: Siva Reddy Kallam 
Date: Mon, 16 Jul 2018 11:13:30 +0530

> From: Siva Reddy Kallam 
> 
> First patch:
> Update copyright
> 
> Second patch:
> Add higher cpu clock for 5762

Series applied, thank you.


Re: [PATCH net] ibmvnic: Fix error recovery on login failure

2018-07-16 Thread David Miller
From: John Allen 
Date: Mon, 16 Jul 2018 10:29:30 -0500

> Testing has uncovered a failure case that is not handled properly. In the
> event that a login fails and we are not able to recover on the spot, we
> return 0 from do_reset, preventing any error recovery code from being
> triggered.  Additionally, the state is set to "probed" meaning that when we
> are able to trigger the error recovery, the driver always comes up in the
> probed state. To handle the case properly, we need to return a failure code
> here and set the adapter state to the state that we entered the reset in
> indicating the state that we would like to come out of the recovery reset
> in.
> 
> Signed-off-by: John Allen 

Applied, thanks.


Re: [RFC PATCH mlx5-next 07/18] net/mlx5: Expose new packet reformat capabilities

2018-07-16 Thread Or Gerlitz
On Mon, Jul 16, 2018 at 11:22 AM, Leon Romanovsky  wrote:
> From: Mark Bloch 
>
> Expose new abilities when creating a packet reformat context.
>
> The new types which can be created are:
> MLX5_REFORMAT_TYPE_L2_TO_L2_TUNNEL: Ability to create generic encap
> opertion to be done by the HW.

opertion -> fix

> MLX5_REFORMAT_TYPE_L3_TUNNEL_TO_L2: Ability to create generic decap
> opertion where the inner packet doesn't contain L2.

opertion -> fix

>
> MLX5_REFORMAT_TYPE_L2_TO_L3_TUNNEL: Ability to create generic encap
> opertion to be done by the HW. The L2 of the original packet

opertion -> fix

> is dropped.
>
> Signed-off-by: Mark Bloch 
> Signed-off-by: Leon Romanovsky 
> ---
>  include/linux/mlx5/mlx5_ifc.h | 20 +---
>  1 file changed, 17 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
> index 059ec97e7b32..c71d711d4893 100644
> --- a/include/linux/mlx5/mlx5_ifc.h
> +++ b/include/linux/mlx5/mlx5_ifc.h
> @@ -341,8 +341,13 @@ struct mlx5_ifc_flow_table_prop_layout_bits {
> u8 reserved_at_9[0x1];
> u8 pop_vlan[0x1];
> u8 push_vlan[0x1];
> -   u8 reserved_at_c[0x14];
> -
> +   u8 reserved_at_c[0x3];
> +   u8 reformat_and_vlan_action[0x1];

unused in downstream patches
what is this BTW?

> +   u8 reserved_at_10[0x2];
> +   u8 reformat_l3_tunnel_to_l2[0x1];
> +   u8 reformat_l2_to_l3_tunnel[0x1];
> +   u8 reformat_and_modify_action[0x1];

unused in downstream patches
what is this BTW?



> +   u8 reserved_at_15[0xb];
> u8 reserved_at_20[0x2];
> u8 log_max_ft_size[0x6];
> u8 log_max_modify_header_context[0x8];
> @@ -551,7 +556,13 @@ struct mlx5_ifc_flow_table_nic_cap_bits {
> u8 nic_rx_multi_path_tirs[0x1];
> u8 nic_rx_multi_path_tirs_fts[0x1];
> u8 allow_sniffer_and_nic_rx_shared_tir[0x1];
> -   u8 reserved_at_3[0x1fd];
> +   u8 reserved_at_3[0x1d];
> +   u8 encap_general_header[0x1];
> +   u8 reserved_at_21[0xa];
> +   u8 log_max_packet_reformat_context[0x5];
> +   u8 reserved_at_30[0x6];
> +   u8 max_encap_header_size[0xa];
> +   u8 reserved_at_40[0x1c0];

we are inconsistent, for some fields the term "encap" remained whereas
for other fields we moved to use "reformat" or "packet reformat" etc


Re: [RFC PATCH mlx5-next 04/18] net/mlx5: Break encap/decap into two separated flags

2018-07-16 Thread Or Gerlitz
On Mon, Jul 16, 2018 at 11:22 AM, Leon Romanovsky  wrote:
> From: Mark Bloch 
>
> Today we are able to attach encap and decap actions only to the FDB.
> In preparation to enable those actions on the NIC flow tables break

tables break --> tables, break

> the single flag into two.


Re: [RFC PATCH mlx5-next 02/18] net/mlx5: Export modify header alloc/dealloc functions

2018-07-16 Thread Or Gerlitz
On Mon, Jul 16, 2018 at 11:22 AM, Leon Romanovsky  wrote:
> From: Mark Bloch 
>
> Those function will be used by the RDMA side to create modify header

function --> functions

> actions to be attached to flow steering rules via verbs.


Re: [PATCH next] bonding: pass link-local packets to bonding master also.

2018-07-16 Thread Jay Vosburgh
Mahesh Bandewar  wrote:

>From: Mahesh Bandewar 
>
>Commit b89f04c61efe ("bonding: deliver link-local packets with
>skb->dev set to link that packets arrived on") changed the behavior
>of how link-local-multicast packets are processed. The change in
>the behavior broke some legacy use cases where these packets are
>expected to arrive on bonding master device also.
>
>This patch passes the packet to the stack with the link it arrived
>on as well as passes to the bonding-master device to preserve the
>legacy use case.

Michal, can you test this?  I'm travelling this week and won't
be able to run the patch.

Mahesh, will this confuse LLDP, et al, daemons that, e.g., bind
to every possible interface and now see the same LLDP PDU (identical
Chassis ID, Port ID, et al, TLVs) on multiple interfaces?

Thanks,

-J

>Reported-by: Michal Soltys 
>Signed-off-by: Mahesh Bandewar 
>---
> drivers/net/bonding/bond_main.c | 17 +++--
> 1 file changed, 15 insertions(+), 2 deletions(-)
>
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index 9a2ea3c1f949..1d3b7d8448f2 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -1177,9 +1177,22 @@ static rx_handler_result_t bond_handle_frame(struct sk_buff **pskb)
>   }
>   }
> 
>-  /* don't change skb->dev for link-local packets */
>-  if (is_link_local_ether_addr(eth_hdr(skb)->h_dest))
>+  /* Link-local multicast packets should be passed to the
>+   * stack on the link they arrive as well as pass them to the
>+   * bond-master device. These packets are mostly usable when
>+   * stack receives it with the link on which they arrive
>+   * (e.g. LLDP) but there may be some legacy behavior that
>+   * expects these packets to appear on bonding master too.
>+   */
>+  if (is_link_local_ether_addr(eth_hdr(skb)->h_dest)) {
>+  struct sk_buff *nskb = skb_clone(skb, GFP_ATOMIC);
>+
>+  if (nskb) {
>+  nskb->dev = bond->dev;
>+  netif_rx(nskb);
>+  }
>   return RX_HANDLER_PASS;
>+  }
>   if (bond_should_deliver_exact_match(skb, slave, bond))
>   return RX_HANDLER_EXACT;
> 
>-- 
>2.18.0.203.gfac676dfb9-goog

---
-Jay Vosburgh, jay.vosbu...@canonical.com


Re: [RFC PATCH rdma-next 13/18] RDMA/mlx5: Enable decap and packet reformat on flow tables

2018-07-16 Thread Or Gerlitz
On Mon, Jul 16, 2018 at 11:23 AM, Leon Romanovsky  wrote:
> From: Mark Bloch 
>
> If NIC RX flow tables support decap opertion, enable it on creation.

opertion --> operation

> If NIC TX flow tables support reformat opertion, enable it on creation.

What is the trigger to use the decap flag on RX table or encap flag on
TX table?

Please note that we have a short blanket w.r.t. mutual usage by
NIC vs e-Switch steering; did you consider doing that on demand?


Re: [RFC PATCH mlx5-next 01/18] net/mlx5: Add proper NIC TX steering flow tables support

2018-07-16 Thread Or Gerlitz
On Mon, Jul 16, 2018 at 11:22 AM, Leon Romanovsky  wrote:
> From: Mark Bloch 
>
> Expose the ability to add steering rules to NIC TX flow tables.
> For now, we are only adding TX bypass (egress) which is used by the RDMA
> side. While we are here clean the switch logic.
>
> We expose the same number of priorities as the RX bypass.

What is the use-case / model for priorities in TX steering?

Are the TX prios used in a downstream patch, and if so, where?

Or.


Re: [PATCH v2 net] net/ipv6: Do not allow device only routes via the multipath API

2018-07-16 Thread David Miller
From: dsah...@kernel.org
Date: Sun, 15 Jul 2018 09:35:19 -0700

> From: David Ahern 
> 
> Eric reported that reverting the patch that fixed and simplified IPv6
> multipath routes means reverting back to invalid userspace notifications.
> eg.,
> $ ip -6 route add 2001:db8:1::/64 nexthop dev eth0 nexthop dev eth1
> 
> only generates a single notification:
> 2001:db8:1::/64 dev eth0 metric 1024 pref medium
> 
> While working on a fix for this problem I found another case that is just
> broken completely - a multipath route with a gateway followed by device
> followed by gateway:
> $ ip -6 ro add 2001:db8:103::/64
>   nexthop via 2001:db8:1::64
>   nexthop dev dummy2
>   nexthop via 2001:db8:3::64
> 
> In this case the device only route is dropped completely - no notification
> to userspace but no addition to the FIB either:
> 
> $ ip -6 ro ls
> 2001:db8:1::/64 dev dummy1 proto kernel metric 256 pref medium
> 2001:db8:2::/64 dev dummy2 proto kernel metric 256 pref medium
> 2001:db8:3::/64 dev dummy3 proto kernel metric 256 pref medium
> 2001:db8:103::/64 metric 1024
>   nexthop via 2001:db8:1::64 dev dummy1 weight 1
>   nexthop via 2001:db8:3::64 dev dummy3 weight 1 pref medium
> fe80::/64 dev dummy1 proto kernel metric 256 pref medium
> fe80::/64 dev dummy2 proto kernel metric 256 pref medium
> fe80::/64 dev dummy3 proto kernel metric 256 pref medium
> 
> Really, IPv6 multipath is just FUBAR'ed beyond repair when it comes to
> device only routes, so do not allow it at all.
> 
> This change will break any scripts relying on the mpath api for insert,
> but I don't see any other way to handle the permutations. Besides, since
> the routes are added to the FIB as standalone (non-multipath) routes the
> kernel is not doing what the user requested, so it might as well tell the
> user that.
> 
> Reported-by: Eric Dumazet 
> Signed-off-by: David Ahern 

Applied, thanks David.

Is this a -stable candidate?


Re: [PATCH net] net/mlx4_en: Don't reuse RX page when XDP is set

2018-07-16 Thread David Miller
From: Tariq Toukan 
Date: Sun, 15 Jul 2018 13:54:39 +0300

> From: Saeed Mahameed 
> 
> When a new rx packet arrives, the rx path will decide whether to reuse
> the remainder of the page or not according to one of the below conditions:
> 1. frag_info->frag_stride == PAGE_SIZE / 2
> 2. frags->page_offset + frag_info->frag_size > PAGE_SIZE;
> 
> The first condition is not met when XDP is set.
> For XDP, page_offset is always set to priv->rx_headroom which is
> XDP_PACKET_HEADROOM and frag_info->frag_size is around mtu size + some
> padding, still the 2nd release condition will hold since
> XDP_PACKET_HEADROOM + 1536 < PAGE_SIZE, as a result the page will not
> be released and will be _wrongly_ reused for next free rx descriptor.
> 
> In XDP there is an assumption to have a page per packet and reuse can
> break such assumption and might cause packet data corruptions.
> 
> Fix this by adding an extra condition (!priv->rx_headroom) to the 2nd
> case to avoid page reuse when XDP is set, since rx_headroom is set to 0
> for non XDP setup and set to XDP_PACKET_HEADROOM for XDP setup.
> 
> No additional cache line is required for the new condition.
> 
> Fixes: 34db548bfb95 ("mlx4: add page recycling in receive path")
> Signed-off-by: Saeed Mahameed 
> Signed-off-by: Tariq Toukan 
> Suggested-by: Martin KaFai Lau 

Applied and queued up for -stable.
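
The fix described in the commit message can be reduced to a small sketch of the second release condition. This is a simplification, not the mlx4 driver code: must_release_page() is a hypothetical name, the 4096-byte page size is an assumption, and XDP_PACKET_HEADROOM is redefined locally with the value from the kernel's uapi header. A non-zero rx_headroom, which the message says is only the case with XDP, forces the page to be released instead of reused.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE		4096u	/* assumed page size */
#define XDP_PACKET_HEADROOM	256u	/* redefined locally for the sketch */

/*
 * Hypothetical reduction of the second release condition: rx_headroom is
 * 0 without XDP and XDP_PACKET_HEADROOM with XDP.
 */
static bool must_release_page(uint32_t page_offset, uint32_t frag_size,
			      uint32_t rx_headroom)
{
	assert(frag_size <= PAGE_SIZE);
	if (rx_headroom)
		return true;	/* XDP: one page per packet, never reuse */
	return page_offset + frag_size > PAGE_SIZE;
}
```

Without the rx_headroom test, an XDP fragment at offset 256 with a 1536-byte size would satisfy neither release condition and the page would be reused.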


Re: [RFC net-next v1 1/1] net/sched: Introduce the taprio scheduler

2018-07-16 Thread Jakub Kicinski
On Mon, 16 Jul 2018 10:13:23 -0700, Vinicius Costa Gomes wrote:
> Hi Jiri,
> 
> Jiri Pirko  writes:
> 
> [...]
> 
> >>
> >>gates.sched  
> >
> > Any particular reason this has to be in file and not on the cmdline?  
> 
> The idea here was to keep longer schedules more manageable. And during
> testing I found it more ergonomic to have a file.
> 
> It also has the advantage that the file can be reused by other tools,
> dump-classifier (awful name, I admit), included in that github gist, is
> one example, it uses the schedule (and some more information) to
> calculate which packets would fall outside their "windows" in a pcap
> dump.
> 
> Anyway, if there are use cases that having the schedule in the command
> line helps, I would be happy to add it.

FWIW there is some precedent in cls_bpf/act_bpf for allowing specifying
potentially long sequences both in command line and as a file (cBPF
filters in that case - see man tc-bpf bytecode and bytecode-file).


Re: [PATCH net-next] mlxsw: spectrum: Expose counters for various packet sizes

2018-07-16 Thread David Miller
From: Ido Schimmel 
Date: Sun, 15 Jul 2018 10:45:42 +0300

> From: Jiri Pirko 
> 
> Expose counters ASIC has in the group of RFC 2819 counters that count
> number of packets within specific size range.
> 
> Signed-off-by: Jiri Pirko 
> Signed-off-by: Ido Schimmel 

Applied.


Re: [PATCH net-next] liquidio: fix hang when re-binding VF host drv after running DPDK VF driver

2018-07-16 Thread David Miller
From: Felix Manlunas 
Date: Fri, 13 Jul 2018 12:50:21 -0700

> From: Rick Farrington 
> 
> When configuring SLI_PKTn_OUTPUT_CONTROL, VF driver was assuming that IPTR
> mode was disabled by reset, which was not true.  Since DPDK driver had
> set IPTR mode previously, the VF driver (which uses buf-ptr-only mode) was
> not properly handling DROQ packets (i.e. it saw zero-length packets).
> 
> This represented an invalid hardware configuration which the driver could
> not handle.
> 
> Signed-off-by: Rick Farrington 
> Signed-off-by: Felix Manlunas 

Applied.


[PATCH net] af_unix: ensure POLLOUT on remote close() for connected dgram sockets

2018-07-16 Thread Jason Baron
Applications use ECONNREFUSED as returned from write() in order to
determine that a socket should be closed. When using connected dgram
unix sockets in a poll/write loop, this relies on POLLOUT being
signaled when the remote end closes. However, due to a race POLLOUT
can be missed when the remote closes:

  thread 1 (client)                      thread 2 (server)

  connect() to server
  write() returns -EAGAIN
  unix_dgram_poll()
   -> unix_recvq_full() is true
                                         close()
                                          -> unix_release_sock()
                                           -> wake_up_interruptible_all()
  unix_dgram_poll() (due to the
   wake_up_interruptible_all)
   -> unix_recvq_full() still is true
                                           -> free all skbs


Now thread 1 is stuck and will not receive any more wakeups. In this
case, when thread 1 gets the -EAGAIN, it has not queued any skbs;
otherwise the 'free all skbs' step would in fact cause a wakeup and
a POLLOUT return. So the race here is probably fairly rare because
it means there are no skbs that thread 1 queued and that thread 1
runs before the 'free all skbs' step. Nevertheless, this has been
observed when the syslog daemon closes /dev/log. Tested against
a reproducer that re-creates the syslog hang.

The proposed fix is to move the wake_up_interruptible_all() call
after the 'free all skbs' step.

Reported-by: Ian Lance Taylor 
Cc: Rainer Weikusat 
Signed-off-by: Jason Baron 
---
 net/unix/af_unix.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index e5473c0..de242cf 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -529,8 +529,6 @@ static void unix_release_sock(struct sock *sk, int embrion)
sk->sk_state = TCP_CLOSE;
unix_state_unlock(sk);
 
-   wake_up_interruptible_all(&u->peer_wait);
-
skpair = unix_peer(sk);
 
if (skpair != NULL) {
@@ -560,6 +558,9 @@ static void unix_release_sock(struct sock *sk, int embrion)
kfree_skb(skb);
}
 
+   /* after freeing skbs to make sure POLLOUT triggers */
+   wake_up_interruptible_all(&u->peer_wait);
+
if (path.dentry)
path_put(&path);
 
-- 
2.7.4
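
For readers following the ordering argument in the commit message above,
here is a toy, single-threaded model of it in Python. It is only a sketch:
the Server class, poll_writable() and release() are hypothetical names, the
queue limit is arbitrary, and it assumes the worst-case interleaving in
which the woken poller re-checks the queue at the instant of the wakeup.

```python
class Server:
    """Toy stand-in for the server socket's receive queue (not kernel code)."""

    def __init__(self, queued, limit):
        self.queue = ["skb"] * queued  # skbs queued by other senders
        self.limit = limit             # arbitrary backlog limit for the model

    def recvq_full(self):
        return len(self.queue) >= self.limit


def poll_writable(server):
    # Models the check the commit message attributes to unix_dgram_poll():
    # the client only sees POLLOUT once the peer's queue is no longer full.
    return not server.recvq_full()


def release(server, wake_after_free):
    """Model close() on the server; return what the woken poller observes."""
    observed = None
    if not wake_after_free:
        # Old order: wake first, so the poller re-checks a still-full queue.
        observed = poll_writable(server)
    server.queue.clear()  # the 'free all skbs' step
    if wake_after_free:
        # Proposed order: wake last, so the poller sees the drained queue.
        observed = poll_writable(server)
    return observed


old_order = release(Server(queued=10, limit=10), wake_after_free=False)
new_order = release(Server(queued=10, limit=10), wake_after_free=True)
print(old_order, new_order)
```

In the old order the only wakeup is consumed while the queue is still full,
which matches the stuck thread 1 described above.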



Re: [PATCH net] hv/netvsc: fix handling of fallback to single queue mode

2018-07-16 Thread David Miller
From: Stephen Hemminger 
Date: Fri, 13 Jul 2018 10:38:38 -0700

> The netvsc device may need to fallback to running in single queue
> mode if host side only wants to support single queue.
> 
> Recent change for handling mtu broke this in setup logic.
> 
> Reported-by: Dan Carpenter 
> Fixes: 3ffe64f1a641 ("hv_netvsc: split sub-channel setup into async and sync")
> Signed-off-by: Stephen Hemminger 

Applied.


Re: [PATCH net] ibmvnic: Revise RX/TX queue error messages

2018-07-16 Thread David Miller
From: Thomas Falcon 
Date: Fri, 13 Jul 2018 12:03:32 -0500

> During a device failover, there may be latency between the loss
> of the current backing device and a notification from firmware that
> a failover has occurred. This latency can result in a large amount of
> error printouts as firmware returns outgoing traffic with a generic
> error code. These are not necessarily errors in this case as the
> firmware is busy swapping in a new backing adapter and is not ready
> to send packets yet. This patch reclassifies those error codes as
> warnings with an explanation that a failover may be pending. All
> other return codes will be considered errors.
> 
> Signed-off-by: Thomas Falcon 

Applied.


Re: [PATCH net] ipv6: make DAD fail with enhanced DAD when nonce length differs

2018-07-16 Thread David Miller
From: Sabrina Dubroca 
Date: Fri, 13 Jul 2018 17:21:42 +0200

> Commit adc176c54722 ("ipv6 addrconf: Implemented enhanced DAD (RFC7527)")
> added enhanced DAD with a nonce length of 6 bytes. However, RFC7527
> doesn't specify the length of the nonce, other than being 6 + 8*k bytes,
> with integer k >= 0 (RFC3971 5.3.2). The current implementation simply
> assumes that the nonce will always be 6 bytes, but other systems are
> free to choose different sizes.
> 
> If another system sends a nonce of different length but with the same 6
> bytes prefix, it shouldn't be considered as the same nonce. Thus, check
> that the length of the received nonce is the same as the length we sent.
> 
> Ugly scapy test script running on veth0:
> 
> def loop():
>     pkt=sniff(iface="veth0", filter="icmp6", count=1)
>     pkt = pkt[0]
>     b = bytearray(pkt[Raw].load)
>     b[1] += 1
>     b += b'\xde\xad\xbe\xef\xde\xad\xbe\xef'
>     pkt[Raw].load = bytes(b)
>     pkt[IPv6].plen += 8
>     # fixup checksum after modifying the payload
>     pkt[IPv6].payload.cksum -= 0x3b44
>     if pkt[IPv6].payload.cksum < 0:
>         pkt[IPv6].payload.cksum += 0x
>     sendp(pkt, iface="veth0")
> 
> This should result in DAD failure for any address added to veth0's peer,
> but is currently ignored.
> 
> Fixes: adc176c54722 ("ipv6 addrconf: Implemented enhanced DAD (RFC7527)")
> Signed-off-by: Sabrina Dubroca 
> Reviewed-by: Stefano Brivio 

Applied and queued up for -stable, thank you!
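
The comparison rule described in the commit message can be sketched in a
few lines of Python. This is an illustration only, not the kernel code: the
function names are made up, and the 6 + 8*k length rule is taken from the
description above.

```python
def valid_nonce_length(n):
    # Per the description above: 6 + 8*k bytes, with integer k >= 0.
    return n >= 6 and (n - 6) % 8 == 0


def prefix_only_match(sent, received):
    # The old behaviour as described: only the first len(sent) bytes count.
    return received[:len(sent)] == sent


def same_nonce(sent, received):
    # The fixed behaviour: lengths must agree before the bytes are compared.
    return len(sent) == len(received) and sent == received


sent = bytes.fromhex("0102030405aa")                # 6-byte nonce we sent
received = sent + bytes.fromhex("deadbeefdeadbeef")  # same prefix, 14 bytes
print(prefix_only_match(sent, received), same_nonce(sent, received))
```

With the length check, the 14-byte nonce sharing our 6-byte prefix is no
longer treated as our own.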


Re: [PATCH] net: cavium: Drop dependency of NET_VENDOR_CAVIUM on PCI

2018-07-16 Thread David Miller
From: Alexander Sverdlin 
Date: Fri, 13 Jul 2018 17:04:28 +0200

> Octeon Ethernet drivers work perfectly without PCI.
> 
> Signed-off-by: Alexander Sverdlin 

Applied.


Re: [PATCH net-next] cxgb4: do not return DUPLEX_UNKNOWN when link is down

2018-07-16 Thread David Miller
From: Ganesh Goudar 
Date: Fri, 13 Jul 2018 17:56:55 +0530

> We were returning DUPLEX_UNKNOWN in get_link_ksettings() when
> the link was down.  Unfortunately, this causes a problem when
> "ethtool -s autoneg on" is issued for a link which is down because
> the ethtool code first reads the settings and then reapplies them
> with only the changes provided on the command line. This results
> in us diving into set_link_ksettings() with DUPLEX_UNKNOWN, which is
> not DUPLEX_FULL, so set_link_ksettings() throws an -EINVAL error.
> Do not return DUPLEX_UNKNOWN, to fix the issue.
> 
> Signed-off-by: Casey Leedom 
> Signed-off-by: Ganesh Goudar 

Applied.


Re: [PATCH][net-next][v2] net: convert gro_count to bitmask

2018-07-16 Thread David Miller
From: Li RongQing 
Date: Fri, 13 Jul 2018 14:41:36 +0800

> gro_hash size is 192 bytes and uses 3 cache lines. If there are few
> flows, gro_hash may not be fully used, so it is unnecessary to iterate
> over all of gro_hash in napi_gro_flush() and touch unneeded cache lines.
> 
> Convert gro_count to a bitmask, and rename it to gro_bitmask. Each bit
> represents an element of gro_hash; only flush a gro_hash element if the
> related bit is set, to speed up napi_gro_flush().
> 
> Also update gro_bitmask only if it will be changed, to reduce cache
> updates.
> 
> Suggested-by: Eric Dumazet 
> Signed-off-by: Li RongQing 
> Cc: Stefano Brivio 
> ---
> netperf shows no difference, maybe because my testing machine has large
> cache

Applied.
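
As a rough illustration of the bitmask idea in the commit message, the
Python sketch below visits only the buckets whose bit is set. The names are
hypothetical, and the bucket count of 8 is an assumption here (it is merely
consistent with the 192-byte figure quoted above).

```python
GRO_HASH_BUCKETS = 8  # assumed bucket count for this sketch


def flush_set_buckets(bitmask, buckets):
    """Flush only buckets whose bit is set; return the indices visited."""
    visited = []
    while bitmask:
        lowest = bitmask & -bitmask        # isolate the lowest set bit
        index = lowest.bit_length() - 1    # its position is the bucket index
        buckets[index].clear()             # stand-in for flushing the bucket
        visited.append(index)
        bitmask &= bitmask - 1             # clear that bit and continue
    return visited


buckets = [[] for _ in range(GRO_HASH_BUCKETS)]
buckets[1].append("flow-a")
buckets[5].extend(["flow-b", "flow-c"])
mask = (1 << 1) | (1 << 5)

visited = flush_set_buckets(mask, buckets)
print(visited)
```

Empty buckets are never touched, which is the cache-line saving the patch
is after.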


Re: [PATCH net-next] net: ip6_gre: get ipv6hdr after skb_cow_head()

2018-07-16 Thread David Miller
From: Prashant Bhole 
Date: Fri, 13 Jul 2018 14:40:50 +0900

> A KASAN:use-after-free bug was found related to ip6-erspan
> while running selftests/net/ip6_gre_headroom.sh
> 
> It happens because of following sequence:
> - ipv6hdr pointer is obtained from skb
> - skb_cow_head() is called, skb->head memory is reallocated
> - old data is accessed using ipv6hdr pointer
> 
> skb_cow_head() call was added in e41c7c68ea77 ("ip6erspan: make sure
> enough headroom at xmit."), but looking at the history there was a
> chance of similar bug because gre_handle_offloads() and pskb_trim()
> can also reallocate skb->head memory. Fixes tag points to commit
> which introduced possibility of this bug.
> 
> This patch moves ipv6hdr pointer assignment after skb_cow_head() call.
> 
> Fixes: 5a963eb61b7c ("ip6_gre: Add ERSPAN native tunnel support")
> Signed-off-by: Prashant Bhole 

This bug goes back to 4.16, therefore applied to 'net' and queued up
for -stable.

Please do not submit bug fixes against 'net-next' in this situation
in the future.

Thanks.


Re: [PATCH net] tun: Fix use-after-free on XDP_TX

2018-07-16 Thread David Miller
From: Toshiaki Makita 
Date: Fri, 13 Jul 2018 13:24:38 +0900

> On XDP_TX we need to free up the frame only when tun_xdp_tx() returns a
> negative value. A positive value indicates that the packet is
> successfully enqueued to the ptr_ring, so freeing the page causes
> use-after-free.
> 
> Fixes: 735fc4054b3a ("xdp: change ndo_xdp_xmit API to support bulking")
> Signed-off-by: Toshiaki Makita 

Applied, thank you.


Re: [PATCH] liquidio: Use %pad printk format for dma_addr_t values

2018-07-16 Thread David Miller
From: Helge Deller 
Date: Thu, 12 Jul 2018 22:36:29 +0200

> Use the existing %pad printk format to print dma_addr_t values.
> This avoids the following warnings when compiling on the parisc platform:
> 
> warning: format '%llx' expects argument of type 'long long unsigned int', but 
> argument 2 has type 'dma_addr_t {aka unsigned int}' [-Wformat=]
> 
> Signed-off-by: Helge Deller 

Applied.


Re: [PATCH net-next] net: phy: realtek: add missing entry for RTL8211C to mdio_device_id table

2018-07-16 Thread David Miller
From: Heiner Kallweit 
Date: Thu, 12 Jul 2018 21:45:08 +0200

> Add missing entry for RTL8211C to mdio_device_id table.
> 
> Signed-off-by: Heiner Kallweit 
> Fixes: cf87915cb9f8 ("net: phy: realtek: add support for RTL8211C")

Applied.


Re: [PATCH net-next v2 0/2] net: phy: add functionality to speed down PHY when waiting for WoL packet

2018-07-16 Thread David Miller
From: Heiner Kallweit 
Date: Thu, 12 Jul 2018 21:30:19 +0200

> Some network drivers include functionality to speed down the PHY when
> suspending and just waiting for a WoL packet because this saves energy.
> 
> This patch is based on our recent discussion about factoring out this
> functionality to phylib. First user will be the r8169 driver.
> 
> v2:
> - add warning comment to phy_speed_down regarding usage of sync = false
> - remove sync parameter from phy_speed_up

Series applied, thank you.


Re: [PATCH net-next] selftests: tls: add selftests for TLS sockets

2018-07-16 Thread David Miller
From: Dave Watson 
Date: Thu, 12 Jul 2018 10:59:20 -0700

> Add selftests for tls socket.  Tests various iov and message options,
> poll blocking and nonblocking behavior, partial message sends / receives,
> and control message data.  Tests should pass regardless of whether TLS
> is enabled in the kernel or not, and print a warning message if not.
> 
> Signed-off-by: Dave Watson 

This is great, thanks Dave!

Applied to net-next.


Re: [PATCH net] tls: Stricter error checking in zerocopy sendmsg path

2018-07-16 Thread David Miller
From: Dave Watson 
Date: Thu, 12 Jul 2018 08:03:43 -0700

> In the zerocopy sendmsg() path, there are error checks to revert
> the zerocopy if we get any error code.  syzkaller has discovered
> that tls_push_record can return -ECONNRESET, which is fatal, and
> happens after the point at which it is safe to revert the iter,
> as we've already passed the memory to do_tcp_sendpages.
> 
> Previously this code could return -ENOMEM and we would want to
> revert the iter, but AFAIK this no longer returns ENOMEM after
> a447da7d004 ("tls: fix waitall behavior in tls_sw_recvmsg"),
> so we fail for all error codes.
> 
> Reported-by: syzbot+c226690f7b3126c5e...@syzkaller.appspotmail.com
> Reported-by: syzbot+709f2810a6a05f11d...@syzkaller.appspotmail.com
> Signed-off-by: Dave Watson 
> Fixes: 3c4d7559159b ("tls: kernel TLS support")

Applied and queued up for -stable, thanks Dave.


[net-next:master 717/734] drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c:173:21: sparse: cast to restricted __be64

2018-07-16 Thread kbuild test robot
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
head:   568a74d491124c720e604ed3265722f969a5fb38
commit: afd3baaa938ce85dc738cd9279716cdb684cc707 [717/734] net/mlx5e: TLS, add software statistics
reproduce:
        # apt-get install sparse
        git checkout afd3baaa938ce85dc738cd9279716cdb684cc707
        make ARCH=x86_64 allmodconfig
        make C=1 CF=-D__CHECK_ENDIAN__


sparse warnings: (new ones prefixed by >>)

>> drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c:173:21: sparse: cast to restricted __be64
   drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c:174:52: sparse: incorrect type in argument 2 (different base types) @@expected unsigned int [unsigned] [usertype] handle @@got ed int [unsigned] [usertype] handle @@
   drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c:174:52: expected unsigned int [unsigned] [usertype] handle
   drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c:174:52: got restricted __be32 [usertype] handle

vim +173 drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c

   162  
   163  static void mlx5e_tls_resync_rx(struct net_device *netdev, struct sock *sk,
   164                                  u32 seq, u64 rcd_sn)
   165  {
   166          struct tls_context *tls_ctx = tls_get_ctx(sk);
   167          struct mlx5e_priv *priv = netdev_priv(netdev);
   168          struct mlx5e_tls_offload_context_rx *rx_ctx;
   169  
   170          rx_ctx = mlx5e_get_tls_rx_context(tls_ctx);
   171  
   172          netdev_info(netdev, "resyncing seq %d rcd %lld\n", seq,
 > 173                      be64_to_cpu(rcd_sn));
   174          mlx5_accel_tls_resync_rx(priv->mdev, rx_ctx->handle, seq, rcd_sn);
   175          atomic64_inc(&priv->tls->sw_stats.rx_tls_resync_reply);
   176  }
   177  

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation


RE: tc mqprio offload command error

2018-07-16 Thread Chopra, Manish
> -Original Message-
> From: Jesus Sanchez-Palencia 
> Sent: Monday, July 16, 2018 11:28 PM
> To: Alexander Duyck ; Chopra, Manish
> 
> Cc: Stephen Hemminger ; David Miller
> ; Jiri Pirko ;
> netdev@vger.kernel.org
> Subject: Re: tc mqprio offload command error
> 
> Hi,
> 
> 
> On 07/16/2018 10:20 AM, Alexander Duyck wrote:
> > On Sun, Jul 15, 2018 at 6:30 PM, Chopra, Manish
> >  wrote:
> >> Hello Folks,
> >>
> >> I am trying to set below command to try mqprio offload on 4.18 kernel. It
> >> is throwing the following error.
> >>
> >> # tc qdisc add dev eth0 root mqprio num_tc 2 map 1 1 1 1 0 0 0 0
> >> RTNETLINK answers: Numerical result out of range
> >>
> >> I can't really make out what's wrong with the above command, since this
> >> works fine with other OS kernels.
> >> Any thoughts if it is something broken on upstream kernel ?
> >>
> >> Thanks,
> >> Manish
> >
> > You might need to specify the traffic class for the 8 remaining
> > priorities. The full map size is 16 entries, not just 8. The default
> > value for the last 4 mapping entries is TC 3 which would be out of
> > range if you only have 2 TCs specified.
> 
> 
> In addition to that, you might hit the same bug we brought up [1] a while
> ago.
> If that is the case, a fix was just proposed here [2]. Note that other qdiscs
> might be broken as well, but we could only spot the issue with mqprio and
> netem so far.
> 
> [1] https://patchwork.ozlabs.org/patch/867860/#1893405
> [2] https://patchwork.ozlabs.org/patch/944565/
> 
> 

The issue is the same with all 16 prio-tc map entries supplied:

# tc qdisc add dev eth0 root mqprio num_tc 4 map 1 1 1 1 0 0 0 0 2 2 2 2 3 3 3 3
RTNETLINK answers: Numerical result out of range

Thanks Jesus, I will try the fix[2] and see.

Regards
-Manish



Re: [PATCH net v2] KEYS: DNS: fix parsing multiple options

2018-07-16 Thread David Miller
From: Eric Biggers 
Date: Wed, 11 Jul 2018 10:46:29 -0700

> From: Eric Biggers 
> 
> My recent fix for dns_resolver_preparse() printing very long strings was
> incomplete, as shown by syzbot which still managed to hit the
> WARN_ONCE() in set_precision() by adding a crafted "dns_resolver" key:
> 
> precision 50001 too large
> WARNING: CPU: 7 PID: 864 at lib/vsprintf.c:2164 vsnprintf+0x48a/0x5a0
> 
> The bug this time isn't just a printing bug, but also a logical error
> when multiple options ("#"-separated strings) are given in the key
> payload.  Specifically, when separating an option string into name and
> value, if there is no value then the name is incorrectly considered to
> end at the end of the key payload, rather than the end of the current
> option.  This bypasses validation of the option length, and also means
> that specifying multiple options is broken -- which presumably has gone
> unnoticed as there is currently only one valid option anyway.
> 
> A similar problem also applied to option values, as the kstrtoul() when
> parsing the "dnserror" option will read past the end of the current
> option and into the next option.
> 
> Fix these bugs by correctly computing the length of the option name and
> by copying the option value, null-terminated, into a temporary buffer.
> 
> Reproducer for the WARN_ONCE() that syzbot hit:
> 
> perl -e 'print "#A#", "\0" x 5' | keyctl padd dns_resolver desc @s
> 
> Reproducer for "dnserror" option being parsed incorrectly (expected
> behavior is to fail when seeing the unknown option "foo", actual
> behavior was to read the dnserror value as "1#foo" and fail there):
> 
> perl -e 'print "#dnserror=1#foo\0"' | keyctl padd dns_resolver desc @s
> 
> Reported-by: syzbot 
> Fixes: 4a2d789267e0 ("DNS: If the DNS server returns an error, allow that to be cached [ver #2]")
> Signed-off-by: Eric Biggers 
> ---
> 
> Changed since v1:
> - Also fix parsing the option values, not just option names.

Applied and queued up for -stable.
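
To make the parsing rule from the commit message concrete, here is a small
Python sketch of option splitting in which both the name and the value are
bounded by the current "#"-separated option. It is not the kernel parser:
parse_options() and KNOWN_OPTIONS are hypothetical, and the value check is
simply "ASCII decimal digits".

```python
KNOWN_OPTIONS = {"dnserror"}  # the only option name mentioned in the mail


def parse_options(payload):
    """Split 'data#opt1#opt2=val' into (data, [(name, value), ...])."""
    data, _, rest = payload.partition("#")
    options = []
    if not rest:
        return data, options
    for opt in rest.split("#"):
        # Each option is bounded by its own '#' separators, so neither the
        # name nor the value can run on into the following option.
        name, sep, value = opt.partition("=")
        if not name:
            raise ValueError("empty option name")
        if name not in KNOWN_OPTIONS:
            raise ValueError("unknown option: " + name)
        if not sep or not (value.isascii() and value.isdigit()):
            raise ValueError("bad value for option: " + name)
        options.append((name, int(value)))
    return data, options


# What the unbounded read would have seen as the dnserror value:
unbounded_value = "#dnserror=1#foo".partition("=")[2]
print(unbounded_value)
print(parse_options("#dnserror=1"))
```

With bounded options, the second reproducer above fails on the unknown
option "foo" rather than on a value of "1#foo".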


Re: [PATCHv2 net 0/2] multicast: init as INCLUDE when join SSM INCLUDE group

2018-07-16 Thread David Miller
From: Hangbin Liu 
Date: Tue, 10 Jul 2018 22:41:25 +0800

> Based on RFC3376 5.1 and RFC3810 6.1, we should init as INCLUDE when joining an
> SSM INCLUDE group. In my first version I only cleared the group change record. But
> this is not enough: when a new group joins, it will init as EXCLUDE and
> trigger a filter mode change in ip/ip6_mc_add_src(), which will clear all
> source addresses' sf_crcount. This will prevent earlier-joined addresses from
> sending state change records if multiple source addresses join at the same time.
> 
> In this v2 patchset, I fixed it by directly initializing the mode to INCLUDE
> for SSM JOIN_SOURCE_GROUP. I also split the original patch into two separated
> patches for IPv4 and IPv6.
> 
> Test: test by myself and customer.

Series applied, thanks!


Re: tc mqprio offload command error

2018-07-16 Thread Jesus Sanchez-Palencia
Hi,


On 07/16/2018 10:20 AM, Alexander Duyck wrote:
> On Sun, Jul 15, 2018 at 6:30 PM, Chopra, Manish
>  wrote:
>> Hello Folks,
>>
>> I am trying to set below command to try mqprio offload on 4.18 kernel. It is 
>> throwing the following error.
>>
>> # tc qdisc add dev eth0 root mqprio num_tc 2 map 1 1 1 1 0 0 0 0
>> RTNETLINK answers: Numerical result out of range
>>
>> I can't really make out what's wrong with the above command, since this 
>> works fine with other OS kernels.
>> Any thoughts if it is something broken on upstream kernel ?
>>
>> Thanks,
>> Manish
> 
> You might need to specify the traffic class for the 8 remaining
> priorities. The full map size is 16 entries, not just 8. The default
> value for the last 4 mapping entries is TC 3 which would be out of
> range if you only have 2 TCs specified.


In addition to that, you might hit the same bug we brought up [1] a while ago.
If that is the case, a fix was just proposed here [2]. Note that other qdiscs
might be broken as well, but we could only spot the issue with mqprio and netem
so far.

[1] https://patchwork.ozlabs.org/patch/867860/#1893405
[2] https://patchwork.ozlabs.org/patch/944565/


Regards,
Jesus


> 
> - Alex
> 


[PATCH bpf-next 2/2] tools: bpftool: don't pass FEATURES_DUMP to libbpf

2018-07-16 Thread Jakub Kicinski
bpftool does not export features it probed for, i.e.
FEATURE_DUMP_EXPORT is always empty, so don't try to communicate
the features to libbpf.  It has no effect.

Signed-off-by: Jakub Kicinski 
Reviewed-by: Quentin Monnet 
---
 tools/bpf/bpftool/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/bpf/bpftool/Makefile b/tools/bpf/bpftool/Makefile
index 6c4830e18879..74288a2197ab 100644
--- a/tools/bpf/bpftool/Makefile
+++ b/tools/bpf/bpftool/Makefile
@@ -26,7 +26,7 @@ LIBBPF = $(BPF_PATH)libbpf.a
 BPFTOOL_VERSION := $(shell make --no-print-directory -sC ../../.. kernelversion)
 
 $(LIBBPF): FORCE
-   $(Q)$(MAKE) -C $(BPF_DIR) OUTPUT=$(OUTPUT) $(OUTPUT)libbpf.a FEATURES_DUMP=$(FEATURE_DUMP_EXPORT)
+   $(Q)$(MAKE) -C $(BPF_DIR) OUTPUT=$(OUTPUT) $(OUTPUT)libbpf.a
 
 $(LIBBPF)-clean:
$(call QUIET_CLEAN, libbpf)
-- 
2.17.1



[PATCH v1 iproute2] tc: Do not use addattr_nest_compat on mqprio and netem

2018-07-16 Thread Jesus Sanchez-Palencia
Here we are partially reverting commit c14f9d92eee107
"treewide: Use addattr_nest()/addattr_nest_end() to handle nested
attributes".

As discussed in [1], changing from the 'manually' coded version that
used addattr_l() to addattr_nest_compat() wasn't functionally
equivalent, because now the messages have extra fields appended to it.

This introduced a regression since the implementation of parse_attr()
from both mqprio and netem can't handle this new message format.

Without this fix, mqprio returns an error. netem won't return an error
but its internal configuration ends up wrong.

As an example, this can be reproduced by the following commands when
this patch is not applied:

 1) mqprio
$ tc qdisc replace dev enp3s0 parent root handle 100 mqprio \
num_tc 3 map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
queues 1@0 1@1 2@2 hw 0

RTNETLINK answers: Numerical result out of range

 2) netem
$ tc qdisc add dev enp3s0 root netem rate 5kbit 20 100 5 \
distribution normal latency 1 1

$ tc -s qdisc

(...)
qdisc netem 8001: dev enp3s0 root refcnt 9 limit 1000 delay 0us  0us
 Sent 402 bytes 1 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
(...)

With this patch applied, the tc -s qdisc command above for netem instead
reads:

(...)
qdisc netem 8002: dev enp3s0 root refcnt 9 limit 1000 delay 0us  0us \
rate 5Kbit packetoverhead 20 cellsize 100 celloverhead 5
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
(...)

[1] https://patchwork.ozlabs.org/patch/867860/#1893405

Fixes: c14f9d92eee107 ("treewide: Use addattr_nest()/addattr_nest_end() to handle nested attributes")
Reported-by: Vinicius Costa Gomes 
Signed-off-by: Jesus Sanchez-Palencia 
---
 tc/q_mqprio.c | 5 +++--
 tc/q_netem.c  | 7 +--
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/tc/q_mqprio.c b/tc/q_mqprio.c
index 207d6441..89b46002 100644
--- a/tc/q_mqprio.c
+++ b/tc/q_mqprio.c
@@ -173,7 +173,8 @@ static int mqprio_parse_opt(struct qdisc_util *qu, int argc,
argc--; argv++;
}
 
-   tail = addattr_nest_compat(n, 1024, TCA_OPTIONS, &opt, sizeof(opt));
+   tail = NLMSG_TAIL(n);
+   addattr_l(n, 1024, TCA_OPTIONS, &opt, sizeof(opt));
 
if (flags & TC_MQPRIO_F_MODE)
addattr_l(n, 1024, TCA_MQPRIO_MODE,
@@ -208,7 +209,7 @@ static int mqprio_parse_opt(struct qdisc_util *qu, int argc,
addattr_nest_end(n, start);
}
 
-   addattr_nest_compat_end(n, tail);
+   tail->rta_len = (void *)NLMSG_TAIL(n) - (void *)tail;
 
return 0;
 }
diff --git a/tc/q_netem.c b/tc/q_netem.c
index 623ec903..9f9a9b3d 100644
--- a/tc/q_netem.c
+++ b/tc/q_netem.c
@@ -422,6 +422,8 @@ static int netem_parse_opt(struct qdisc_util *qu, int argc, char **argv,
}
}
 
+   tail = NLMSG_TAIL(n);
+
if (reorder.probability) {
if (opt.latency == 0) {
fprintf(stderr, "reordering not possible without specifying some delay\n");
@@ -450,7 +452,8 @@ static int netem_parse_opt(struct qdisc_util *qu, int argc, char **argv,
return -1;
}
 
-   tail = addattr_nest_compat(n, 1024, TCA_OPTIONS, &opt, sizeof(opt));
+   if (addattr_l(n, 1024, TCA_OPTIONS, &opt, sizeof(opt)) < 0)
+   return -1;
 
if (present[TCA_NETEM_CORR] &&
addattr_l(n, 1024, TCA_NETEM_CORR, &cor, sizeof(cor)) < 0)
@@ -509,7 +512,7 @@ static int netem_parse_opt(struct qdisc_util *qu, int argc, char **argv,
return -1;
free(dist_data);
}
-   addattr_nest_compat_end(n, tail);
+   tail->rta_len = (void *) NLMSG_TAIL(n) - (void *) tail;
return 0;
 }
 
-- 
2.18.0
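
For anyone unfamiliar with the "manually coded" layout this patch restores,
the Python sketch below builds the same shape of message: one TCA_OPTIONS
attribute whose payload is the fixed option struct followed directly by
child attributes, with the outer length patched at the end. It is only an
illustration. add_attr() and close_attr() are hypothetical helpers, the
8-byte opt blob is a placeholder rather than the real struct, and the
numeric attribute types are assumed to match the uapi headers.

```python
import struct

RTA_ALIGNTO = 4
TCA_OPTIONS = 2        # assumed value from the uapi headers
TCA_MQPRIO_MODE = 1    # assumed value from the uapi headers


def rta_align(length):
    return (length + RTA_ALIGNTO - 1) & ~(RTA_ALIGNTO - 1)


def add_attr(buf, rta_type, payload):
    """Append one attribute (u16 len, u16 type, payload, padding)."""
    start = len(buf)
    rta_len = 4 + len(payload)
    buf.extend(struct.pack("=HH", rta_len, rta_type))
    buf.extend(payload)
    buf.extend(b"\0" * (rta_align(rta_len) - rta_len))
    return start  # plays the role of 'tail' in the patch


def close_attr(buf, start):
    # Same idea as: tail->rta_len = NLMSG_TAIL(n) - tail
    struct.pack_into("=H", buf, start, len(buf) - start)


buf = bytearray()
opt = bytes(8)  # placeholder for the fixed option struct; real size differs
tail = add_attr(buf, TCA_OPTIONS, opt)
add_attr(buf, TCA_MQPRIO_MODE, struct.pack("=H", 1))
close_attr(buf, tail)

print(len(buf), struct.unpack_from("=HH", buf, 0))
```

The outer attribute ends up covering both the struct and the child
attribute, with no second nested header in between.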



[PATCH bpf-next 1/2] tools: libbpf: remove libelf-getphdrnum feature detection

2018-07-16 Thread Jakub Kicinski
libbpf does not depend on libelf-getphdrnum feature, don't check it.

$ git grep HAVE_ELF_GETPHDRNUM_SUPPORT
tools/perf/Makefile.config:CFLAGS += -DHAVE_ELF_GETPHDRNUM_SUPPORT
tools/perf/util/symbol-elf.c:#ifndef HAVE_ELF_GETPHDRNUM_SUPPORT

Signed-off-by: Jakub Kicinski 
Reviewed-by: Quentin Monnet 
---
 tools/lib/bpf/Makefile | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/tools/lib/bpf/Makefile b/tools/lib/bpf/Makefile
index 7a8e4c98ef1a..d49902e818b5 100644
--- a/tools/lib/bpf/Makefile
+++ b/tools/lib/bpf/Makefile
@@ -66,7 +66,7 @@ ifndef VERBOSE
 endif
 
 FEATURE_USER = .libbpf
-FEATURE_TESTS = libelf libelf-getphdrnum libelf-mmap bpf reallocarray
+FEATURE_TESTS = libelf libelf-mmap bpf reallocarray
 FEATURE_DISPLAY = libelf bpf
 
 INCLUDES = -I. -I$(srctree)/tools/include -I$(srctree)/tools/arch/$(ARCH)/include/uapi -I$(srctree)/tools/include/uapi -I$(srctree)/tools/perf
@@ -116,10 +116,6 @@ ifeq ($(feature-libelf-mmap), 1)
   override CFLAGS += -DHAVE_LIBELF_MMAP_SUPPORT
 endif
 
-ifeq ($(feature-libelf-getphdrnum), 1)
-  override CFLAGS += -DHAVE_ELF_GETPHDRNUM_SUPPORT
-endif
-
 ifeq ($(feature-reallocarray), 0)
   override CFLAGS += -DCOMPAT_NEED_REALLOCARRAY
 endif
-- 
2.17.1



[PATCH bpf-next 0/2] tools: bpf: build cleanups

2018-07-16 Thread Jakub Kicinski
Hi!

While tracking down the perf vs libbpf vs reallocarray build issue
I noticed libbpf is checking for a feature it never uses and that
bpftool's makefile attempt to reuse feature dump doesn't really
make sense.

Jakub Kicinski (2):
  tools: libbpf: remove libelf-getphdrnum feature detection
  tools: bpftool: don't pass FEATURES_DUMP to libbpf

 tools/bpf/bpftool/Makefile | 2 +-
 tools/lib/bpf/Makefile | 6 +-
 2 files changed, 2 insertions(+), 6 deletions(-)

-- 
2.17.1



[net-next:master 716/721] drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.c:329:66: sparse: incorrect type in argument 6 (different base types)

2018-07-16 Thread kbuild test robot
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
head:   aea06eb276d99590f400c877ca2bd74b4db91330
commit: 00aebab27c8752c7420dce286270ccedc70ac39a [716/721] net/mlx5e: TLS, add Innova TLS rx data path
reproduce:
        # apt-get install sparse
        git checkout 00aebab27c8752c7420dce286270ccedc70ac39a
        make ARCH=x86_64 allmodconfig
        make C=1 CF=-D__CHECK_ENDIAN__


sparse warnings: (new ones prefixed by >>)

>> drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.c:329:66: sparse: incorrect type in argument 6 (different base types) @@expected unsigned short const [unsigned] [usertype] hnum @@got  const [unsigned] [usertype] hnum @@
   drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.c:329:66: expected unsigned short const [unsigned] [usertype] hnum
   drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.c:329:66: got restricted __be16 [usertype] dest
>> include/net/tls.h:435:47: sparse: cast from restricted __be32

vim +329 drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.c

   302  
   303  static int tls_update_resync_sn(struct net_device *netdev,
   304                                  struct sk_buff *skb,
   305                                  struct mlx5e_tls_metadata *mdata)
   306  {
   307          struct sock *sk = NULL;
   308          struct iphdr *iph;
   309          struct tcphdr *th;
   310          __be32 seq;
   311  
   312          if (mdata->ethertype != htons(ETH_P_IP))
   313                  return -EINVAL;
   314  
   315          iph = (struct iphdr *)(mdata + 1);
   316  
   317          th = ((void *)iph) + iph->ihl * 4;
   318  
   319          if (iph->version == 4) {
   320                  sk = inet_lookup_established(dev_net(netdev), &tcp_hashinfo,
   321                                               iph->saddr, th->source, iph->daddr,
   322                                               th->dest, netdev->ifindex);
   323  #if IS_ENABLED(CONFIG_IPV6)
   324          } else {
   325                  struct ipv6hdr *ipv6h = (struct ipv6hdr *)iph;
   326  
   327                  sk = __inet6_lookup_established(dev_net(netdev), &tcp_hashinfo,
   328                                                  &ipv6h->saddr, th->source,
 > 329                                                  &ipv6h->daddr, th->dest,
   330                                                  netdev->ifindex, 0);
   331  #endif
   332          }
   333          if (!sk || sk->sk_state == TCP_TIME_WAIT)
   334                  goto out;
   335  
   336          skb->sk = sk;
   337          skb->destructor = sock_edemux;
   338  
   339          memcpy(&seq, &mdata->content.recv.sync_seq, sizeof(seq));
   340          tls_offload_rx_resync_request(sk, seq);
   341  out:
   342          return 0;
   343  }
   344  

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation


Re: tc mqprio offload command error

2018-07-16 Thread Alexander Duyck
On Sun, Jul 15, 2018 at 6:30 PM, Chopra, Manish
 wrote:
> Hello Folks,
>
> I am trying to set below command to try mqprio offload on 4.18 kernel. It is 
> throwing the following error.
>
> # tc qdisc add dev eth0 root mqprio num_tc 2 map 1 1 1 1 0 0 0 0
> RTNETLINK answers: Numerical result out of range
>
> I can't really make out what's wrong with the above command, since this works 
> fine with other OS kernels.
> Any thoughts if it is something broken on upstream kernel ?
>
> Thanks,
> Manish

You might need to specify the traffic class for the 8 remaining
priorities. The full map size is 16 entries, not just 8. The default
value for the last 4 mapping entries is TC 3 which would be out of
range if you only have 2 TCs specified.

- Alex
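
A quick way to see the point about the unspecified entries is the Python
sketch below. It is an illustration under assumptions: the helper names are
made up, and DEFAULT_MAP is hypothetical apart from its last four entries
being TC 3 as stated above. Whether a given tc version really leaves
unspecified entries at such defaults is not verified here.

```python
MAP_SIZE = 16  # full priority-to-TC map size, per the explanation above

# Hypothetical defaults; only the last four entries (TC 3) come from the mail.
DEFAULT_MAP = [0, 1, 2, 3, 4, 5, 6, 7, 0, 1, 1, 1, 3, 3, 3, 3]


def effective_map(user_entries, default=DEFAULT_MAP):
    """Entries given on the command line override the leading defaults."""
    if len(user_entries) > MAP_SIZE:
        raise ValueError("map has more than %d entries" % MAP_SIZE)
    return list(user_entries) + list(default[len(user_entries):])


def out_of_range(num_tc, prio_tc_map):
    """Return (priority, tc) pairs whose TC does not exist for num_tc."""
    return [(prio, tc) for prio, tc in enumerate(prio_tc_map) if tc >= num_tc]


short_map = effective_map([1, 1, 1, 1, 0, 0, 0, 0])
bad = out_of_range(2, short_map)
full_map = [1, 1, 1, 1, 0, 0, 0, 0, 2, 2, 2, 2, 3, 3, 3, 3]
print(bad, out_of_range(4, full_map))
```

Under these assumptions the 8-entry map with num_tc 2 leaves priorities
12-15 pointing at TC 3. A fully specified, in-range map passes this check,
so a failure in that case would have to come from somewhere else.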


Re: [PATCH v2 net] net/ipv6: Do not allow device only routes via the multipath API

2018-07-16 Thread David Ahern
On 7/16/18 10:09 AM, Eric Dumazet wrote:
> Yes, I guess we have no real choice for the moment.

It is unfortunate that we are forever stuck with this mess from a
short-sighted implementation years ago. From a uapi perspective, dev-only
nexthops and proper add-to/append/replace semantics should have been a
part of the code from the beginning.


Re: [RFC net-next v1 1/1] net/sched: Introduce the taprio scheduler

2018-07-16 Thread Vinicius Costa Gomes
Hi Jiri,

Jiri Pirko  writes:

[...]

>>
>>gates.sched
>
> Any particular reason this has to be in file and not on the cmdline?

The idea here was to keep longer schedules more manageable. And during
testing I found it more ergonomic to have a file.

It also has the advantage that the file can be reused by other tools.
One example is dump-classifier (awful name, I admit), included in that
github gist: it uses the schedule (and some more information) to
calculate which packets would fall outside their "windows" in a pcap
dump.

Anyway, if there are use cases where having the schedule on the command
line helps, I would be happy to add it.


Cheers,
--
Vinicius


Re: [PATCH mlx5-next] RDMA/mlx5: Don't use cached IRQ affinity mask

2018-07-16 Thread Steve Wise
Hey Max:


On 7/16/2018 11:46 AM, Max Gurtovoy wrote:
>
>
> On 7/16/2018 5:59 PM, Sagi Grimberg wrote:
>>
>>> Hi,
>>> I've tested this patch and seems problematic at this moment.
>>
>> Problematic how? what are you seeing?
>
> Connection failures and same error Steve saw:
>
> [Mon Jul 16 16:19:11 2018] nvme nvme0: Connect command failed, error
> wo/DNR bit: -16402
> [Mon Jul 16 16:19:11 2018] nvme nvme0: failed to connect queue: 2 ret=-18
>
>
>>
>>> maybe this is because of the bug that Steve mentioned in the NVMe
>>> mailing list. Sagi mentioned that we should fix it in the NVMe/RDMA
>>> initiator and I'll run his suggestion as well.
>>
>> Is your device irq affinity linear?
>
> When it's linear and the balancer is stopped the patch works.
>
>>
>>> BTW, when I run the blk_mq_map_queues it works for every irq affinity.
>>
>> But its probably not aligned to the device vector affinity.
>
> but I guess it's better in some cases.
>
> I've checked the situation before Leon's patch and set all the vectors
> to CPU 0. In this case (I think that this was the initial report by
> Steve), we use the affinity_hint (Israel's and Saeed's patches where we
> use dev->priv.irq_info[vector].mask) and it worked fine.
>
> Steve,
> Can you share your configuration (kernel, HCA, affinity map, connect
> command, lscpu) ?
> I want to repro it in my lab.
>

- linux-4.18-rc1 + the nvme/nvmet inline_data_size patches + patches to
enable ib_get_vector_affinity() in cxgb4 + sagi's patch + leon's mlx5
patch so I can change the affinity via procfs. 

- mlx5 MT27700 RoCE card, cxgb4 T62100-CR iWARP card

- The system has 2 numa nodes with 8 real cpus in each == 16 cpus all
online.  HT disabled.

- i'm testing over HW loopback for simplicity, so the node is both the
nvme target and host.  Connecting one device like this: nvme connect -t
rdma -a 172.16.2.1 -n nvme-nullb0

- to reproduce the nvme-rdma bug, just map any two hca cq comp vectors
to the same cpu. 

- lscpu output:

[root@stevo1 linux]# lscpu
Architecture:  x86_64
CPU op-mode(s):    32-bit, 64-bit
Byte Order:    Little Endian
CPU(s):    16
On-line CPU(s) list:   0-15
Thread(s) per core:    1
Core(s) per socket:    8
Socket(s): 2
NUMA node(s):  2
Vendor ID: GenuineIntel
CPU family:    6
Model: 45
Model name:    Intel(R) Xeon(R) CPU E5-2687W 0 @ 3.10GHz
Stepping:  7
CPU MHz:   3400.057
CPU max MHz:   3800.
CPU min MHz:   1200.
BogoMIPS:  6200.10
Virtualization:    VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache:  256K
L3 cache:  20480K
NUMA node0 CPU(s): 0-7
NUMA node1 CPU(s): 8-15
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe
syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good
nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor
ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2
x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm epb pti
tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln pts

Steve
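
The reproduction step above (mapping two completion vectors to the same CPU) and the question about "linear" affinity can be illustrated with a toy model. This is a sketch under a stated assumption, not code from the thread or from the kernel: it assumes each queue is assigned to the CPUs in its vector's affinity mask and that a later assignment overwrites an earlier one. The names `map_queues` and `unmapped_queues` are hypothetical.

```python
def map_queues(vector_cpus, nr_cpus):
    """Assign each CPU to the last queue whose vector affinity contains it."""
    cpu_to_queue = [None] * nr_cpus
    for queue, cpus in enumerate(vector_cpus):
        for cpu in cpus:
            cpu_to_queue[cpu] = queue  # later queues overwrite earlier ones
    return cpu_to_queue


def unmapped_queues(vector_cpus, nr_cpus):
    """Return the queues that ended up with no CPU at all."""
    mapped = {q for q in map_queues(vector_cpus, nr_cpus) if q is not None}
    return [q for q in range(len(vector_cpus)) if q not in mapped]


# "Linear" affinity: vector i is pinned to CPU i.
linear = [[cpu] for cpu in range(4)]

# Vectors 0 and 1 both pinned to CPU 0, as in the reproduction step.
collided = [[0], [0], [2], [3]]

print(unmapped_queues(linear, 4), unmapped_queues(collided, 4))
```

In this model the linear layout leaves no queue without a CPU, while the collided layout leaves queue 0 unmapped. Whether that is the exact mechanism behind the connect failure is not established by the messages here.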




Re: [PATCH RFC net-next] openvswitch: Queue upcalls to userspace in per-port round-robin order

2018-07-16 Thread Matteo Croce
On Tue, Jul 10, 2018 at 6:31 PM Pravin Shelar  wrote:
>
> On Wed, Jul 4, 2018 at 7:23 AM, Matteo Croce  wrote:
> > From: Stefano Brivio 
> >
> > Open vSwitch sends to userspace all received packets that have
> > no associated flow (thus doing an "upcall"). Then the userspace
> > program creates a new flow and determines the actions to apply
> > based on its configuration.
> >
> > When a single port generates a high rate of upcalls, it can
> > prevent other ports from dispatching their own upcalls. vswitchd
> > overcomes this problem by creating many netlink sockets for each
> > port, but it quickly exceeds any reasonable maximum number of
> > open files when dealing with huge amounts of ports.
> >
> > This patch queues all the upcalls into a list, ordering them in
> > a per-port round-robin fashion, and schedules a deferred work to
> > queue them to userspace.
> >
> > The algorithm to queue upcalls in a round-robin fashion,
> > provided by Stefano, is based on these two rules:
> >  - upcalls for a given port must be inserted after all the other
> >    occurrences of upcalls for the same port already in the queue,
> >    in order to avoid out-of-order upcalls for a given port
> >  - insertion happens once the highest upcall count for any given
> >    port (excluding the one currently at hand) is greater than the
> >    count for the port we're queuing to -- if this condition is
> >    never true, upcall is queued at the tail. This results in a
> >    per-port round-robin order.
> >
> > In order to implement a fair round-robin behaviour, a variable
> > queueing delay is introduced. This will be zero if the upcalls
> > rate is below a given threshold, and grows linearly with the
> > queue utilisation (i.e. upcalls rate) otherwise.
> >
> > This ensures fairness among ports under load and with few
> > netlink sockets.
> >
> Thanks for the patch.
> This patch is adding the following overhead for upcall handling:
> 1. kmalloc.
> 2. global spin-lock.
> 3. context switch to single worker thread.
> I think this could become a bottleneck on most multi-core systems.
> You have mentioned an issue with the existing fairness mechanism. Can you
> elaborate on it? I think we could improve that before implementing
> heavyweight fairness in upcall handling.

Hi Pravin,

vswitchd allocates N * P netlink sockets, where N is the number of
online CPU cores, and P the number of ports.
With some setups, this number can grow quite fast, also exceeding the
system maximum file descriptor limit.
I've seen a 48 core server failing with -EMFILE when trying to create
more than 65535 netlink sockets needed for handling 1800+ ports.
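
The N * P arithmetic can be worked through with the figures quoted above (48 cores, 1800 ports, a limit of 65535). This is only a small sketch; the helper names are hypothetical.

```python
def netlink_sockets(cores, ports):
    """Sockets opened under the N * P scheme described above."""
    return cores * ports


def max_ports(cores, fd_limit):
    """Largest port count whose N * P sockets still fit under fd_limit."""
    return fd_limit // cores


needed = netlink_sockets(48, 1800)   # sockets needed for 1800 ports
limit_ports = max_ports(48, 65535)   # ports that fit under the quoted limit

print(needed, limit_ports)
```

With these inputs the socket count is 86400, well above 65535, and the limit is already reached at 1365 ports.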

I made a previous attempt to reduce the sockets to one per CPU, but
this was discussed and rejected on ovs-dev because it would remove
fairness among ports[1].
I think that the current approach of opening a huge number of sockets
doesn't really work (it doesn't scale, for sure). It still needs some
queueing logic (either in kernel or user space) if we really want to
be sure that low-traffic ports get their upcall quota when other
ports are doing way more traffic.

If you are concerned about the kmalloc or spinlock, we can solve them
with kmem_cache or two copies of the list and rcu. I'll be happy to
discuss the implementation details, as long as we all agree that the
current implementation doesn't scale well and has an issue.
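
The two insertion rules quoted earlier in this message can be modelled on a plain list. This is one literal reading of those rules written in Python for illustration; it is not the patch's code, the name `rr_insert` is hypothetical, and the queue here holds only port identifiers.

```python
def rr_insert(queue, port):
    """Insert an upcall for `port` into `queue` (list of ports, head first)."""
    counts = {}   # running per-port upcall counts while walking the queue
    pos = None    # index to insert at; None means "not found yet"
    for i, queued_port in enumerate(queue):
        counts[queued_port] = counts.get(queued_port, 0) + 1
        if queued_port == port:
            # Rule 1: never insert before an existing upcall of the same port.
            pos = None
        if pos is None:
            # Rule 2: insert once some other port's count exceeds ours.
            highest_other = max(
                (c for p, c in counts.items() if p != port), default=0
            )
            if highest_other > counts.get(port, 0):
                pos = i + 1
    if pos is None:
        pos = len(queue)  # condition never met: queue at the tail
    queue.insert(pos, port)
    return queue


queue = []
for port in "AAABBB":   # three upcalls from port A, then three from port B
    rr_insert(queue, port)

print(queue)
```

In this model, three upcalls from port A followed by three from port B end up interleaved as A, B, A, B, A, B, and per-port ordering is preserved. The variable queueing delay mentioned in the patch description is not modelled.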

[1] https://mail.openvswitch.org/pipermail/ovs-dev/2018-February/344279.html

--
Matteo Croce
per aspera ad upstream


Re: [PATCH mlx5-next] RDMA/mlx5: Don't use cached IRQ affinity mask

2018-07-16 Thread Max Gurtovoy




On 7/16/2018 5:59 PM, Sagi Grimberg wrote:
>
>> Hi,
>> I've tested this patch and seems problematic at this moment.
>
> Problematic how? what are you seeing?

Connection failures and same error Steve saw:

[Mon Jul 16 16:19:11 2018] nvme nvme0: Connect command failed, error wo/DNR bit: -16402
[Mon Jul 16 16:19:11 2018] nvme nvme0: failed to connect queue: 2 ret=-18

>
>> maybe this is because of the bug that Steve mentioned in the NVMe
>> mailing list. Sagi mentioned that we should fix it in the NVMe/RDMA
>> initiator and I'll run his suggestion as well.
>
> Is your device irq affinity linear?

When it's linear and the balancer is stopped the patch works.

>
>> BTW, when I run the blk_mq_map_queues it works for every irq affinity.
>
> But its probably not aligned to the device vector affinity.

but I guess it's better in some cases.

I've checked the situation before Leon's patch and set all the vectors
to CPU 0. In this case (I think that this was the initial report by
Steve), we use the affinity_hint (Israel's and Saeed's patches where we
use dev->priv.irq_info[vector].mask) and it worked fine.


Steve,
Can you share your configuration (kernel, HCA, affinity map, connect
command, lscpu) ?

I want to repro it in my lab.

-Max.


Re: [PATCH v3 net-next 0/3] rds: IPv6 support

2018-07-16 Thread Sowmini Varadhan


-  Looks like rds_connect() is checking things in the right order (thanks)
   However, rds_cancel_sent_to is still looking at the len to figure
   out the family.. as we move to ipv6,  it would be better if we allow
   the caller to specify struct sockaddr_storage, or even a union of
   sockaddr_in/sockaddr_in6, rather than require them to hint at which 
   one of ipv4/ipv6 through the optlen.

   Please see __sys_connect and move_addr_to_kernel if the user-kernel
   copy is the reason you are not doing this. Similar to inet_dgram_connect
   you can then check the sa_family and use that to figure out the
   "Assume IPv4" etc stuff.

   This would also make the CANCEL_SEND_TO API consistent with the bind/
   connect etc semantics.
   
-  net/rds/rds.h: thanks for moving RDS_CM_PORT to the rdma specific file.

   I am guessing (?) that you want to update the comment to talk about
   the non-existent "RDS over UDP" based on the title of the IANA registration?
   I would just like to re-iterate that this is actually inaccurate
   (and confusing to someone looking at this for the first time, since
   there is no RDS-over-UDP today). If it were up to me, I would update
   the comment to say

/* The following ports, 16385, 18634, 18635, are registered with IANA as
 * the ports to be used for "RDS over TCP and UDP".
 * The current linux implementation supports RDS over TCP and IB, and uses
 * the ports as follows: 18634 is the historical value used for the
 * RDMA_CM listener port.  RDS/TCP uses port 16385.  After
 * IPv6 work, RDMA_CM also uses 16385 as the listener port.  18634 is kept
 * to ensure compatibility with older RDS modules.  Those ports are defined
 * in each transport's header file.
 */

IMHO that makes the comment look a little less odd (I've already explained
to you why RDS-over-UDP does not make much practical sense for the RDS
use-cases we anticipate). YMMV.

Thanks,

--Sowmini
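
The first point in the message above, deriving the address family from sa_family instead of from the length argument, can be sketched outside the kernel. This is a minimal Python illustration, not RDS code: the helper names are hypothetical, and it assumes the Linux layout in which sa_family is a 16-bit host-order field at the start of the structure.

```python
import socket
import struct

SOCKADDR_STORAGE_SIZE = 128  # sizeof(struct sockaddr_storage) on Linux


def family_by_len(buf):
    """Length-based guess: stops working once callers pass a larger buffer."""
    return {16: "ipv4", 28: "ipv6"}.get(len(buf), "unknown")


def family_by_sa_family(buf):
    """Read sa_family from the first two bytes (assumed Linux layout)."""
    (family,) = struct.unpack_from("=H", buf, 0)
    if family == socket.AF_INET:
        return "ipv4"
    if family == socket.AF_INET6:
        return "ipv6"
    return "unknown"


# A sockaddr_in: family, port (network order), 4-byte address, 8 bytes padding.
sin = (
    struct.pack("=H", socket.AF_INET)
    + struct.pack("!H", 0)
    + socket.inet_aton("192.0.2.1")
    + bytes(8)
)

# The same address placed in a sockaddr_storage-sized buffer.
storage = sin.ljust(SOCKADDR_STORAGE_SIZE, b"\0")

print(family_by_len(sin), family_by_len(storage), family_by_sa_family(storage))
```

The length-based guess recognises the 16-byte buffer but not the sockaddr_storage-sized one, while the sa_family check gives the same answer for both.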


Re: [PATCH v2 net] net/ipv6: Do not allow device only routes via the multipath API

2018-07-16 Thread Eric Dumazet



On 07/15/2018 09:35 AM, dsah...@kernel.org wrote:
> From: David Ahern 
> 
> Eric reported that reverting the patch that fixed and simplified IPv6
> multipath routes means reverting back to invalid userspace notifications.
> eg.,
> $ ip -6 route add 2001:db8:1::/64 nexthop dev eth0 nexthop dev eth1
> 
> only generates a single notification:
> 2001:db8:1::/64 dev eth0 metric 1024 pref medium
> 
> While working on a fix for this problem I found another case that is just
> broken completely - a multipath route with a gateway followed by device
> followed by gateway:
> $ ip -6 ro add 2001:db8:103::/64
>   nexthop via 2001:db8:1::64
>   nexthop dev dummy2
>   nexthop via 2001:db8:3::64
> 
> In this case the device only route is dropped completely - no notification
> to userspace but no addition to the FIB either:
> 
> $ ip -6 ro ls
> 2001:db8:1::/64 dev dummy1 proto kernel metric 256 pref medium
> 2001:db8:2::/64 dev dummy2 proto kernel metric 256 pref medium
> 2001:db8:3::/64 dev dummy3 proto kernel metric 256 pref medium
> 2001:db8:103::/64 metric 1024
>   nexthop via 2001:db8:1::64 dev dummy1 weight 1
>   nexthop via 2001:db8:3::64 dev dummy3 weight 1 pref medium
> fe80::/64 dev dummy1 proto kernel metric 256 pref medium
> fe80::/64 dev dummy2 proto kernel metric 256 pref medium
> fe80::/64 dev dummy3 proto kernel metric 256 pref medium
> 
> Really, IPv6 multipath is just FUBAR'ed beyond repair when it comes to
> device only routes, so do not allow it at all.
> 
> This change will break any scripts relying on the mpath api for insert,
> but I don't see any other way to handle the permutations. Besides, since
> the routes are added to the FIB as standalone (non-multipath) routes the
> kernel is not doing what the user requested, so it might as well tell the
> user that.

Yes, I guess we have no real choice for the moment.

Thanks David

Reviewed-by: Eric Dumazet 
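
The behaviour described in the quoted patch, refusing a multipath request that contains a device-only nexthop instead of silently dropping or splitting it, can be sketched as a simple validation step. This is an illustration in Python, not the kernel change itself; `check_multipath` and the dictionary representation of a nexthop are hypothetical.

```python
def check_multipath(nexthops):
    """Reject a multipath request if any nexthop lacks a gateway ("via")."""
    for index, nexthop in enumerate(nexthops):
        if "via" not in nexthop:
            raise ValueError(
                f"nexthop {index}: device-only nexthop not allowed "
                "in a multipath route"
            )
    return True


# The gateway / device / gateway case from the patch description.
mixed = [
    {"via": "2001:db8:1::64"},
    {"dev": "dummy2"},
    {"via": "2001:db8:3::64"},
]

try:
    check_multipath(mixed)
    rejected = False
except ValueError:
    rejected = True

print(rejected)
```

Failing the whole request tells the user up front that the route was not installed as asked, which is the rationale given in the patch description.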




[PATCH net] ibmvnic: Fix error recovery on login failure

2018-07-16 Thread John Allen
Testing has uncovered a failure case that is not handled properly. In the
event that a login fails and we are not able to recover on the spot, we
return 0 from do_reset, preventing any error recovery code from being
triggered.  Additionally, the state is set to "probed" meaning that when we
are able to trigger the error recovery, the driver always comes up in the
probed state. To handle the case properly, we need to return a failure code
here and set the adapter state to the state in which we entered the reset,
indicating the state in which we would like to come out of the recovery
reset.

Signed-off-by: John Allen 
---
diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index d0e196b..c1e23bb 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -1825,8 +1825,8 @@ static int do_reset(struct ibmvnic_adapter *adapter,
 
rc = ibmvnic_login(netdev);
if (rc) {
-   adapter->state = VNIC_PROBED;
-   return 0;
+   adapter->state = reset_state;
+   return rc;
}
 
if (adapter->reset_reason == VNIC_RESET_CHANGE_PARAM ||



[net-next:master 715/721] drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c:172:52: sparse: incorrect type in argument 2 (different base types)

2018-07-16 Thread kbuild test robot
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
head:   aea06eb276d99590f400c877ca2bd74b4db91330
commit: ca942c78f3237e09567d80ac19dffe9690c74d79 [715/721] net/mlx5e: TLS, add innova rx support
reproduce:
# apt-get install sparse
git checkout ca942c78f3237e09567d80ac19dffe9690c74d79
make ARCH=x86_64 allmodconfig
make C=1 CF=-D__CHECK_ENDIAN__


sparse warnings: (new ones prefixed by >>)

>> drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c:172:52: sparse: incorrect type in argument 2 (different base types) @@    expected unsigned int [unsigned] [usertype] handle @@    got restricted __be32 [usertype] handle @@
   drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c:172:52:    expected unsigned int [unsigned] [usertype] handle
   drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c:172:52:    got restricted __be32 [usertype] handle
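
The warning above reports a big-endian 32-bit value (`__be32`) being passed where a plain unsigned int is expected. Why sparse tracks this distinction can be shown with a short Python sketch. It is illustrative only, and the report itself does not say whether the fix is a byte-order conversion or a change to the parameter's type.

```python
import struct

# A 32-bit value stored in big-endian byte order, like a __be32 field.
handle_be = struct.pack(">I", 1)

# Interpreting the bytes with the correct byte order recovers the value.
as_big = int.from_bytes(handle_be, "big")

# Interpreting the same bytes as little-endian (what an x86 host would do
# if the value were used without conversion) gives a different number.
as_little = int.from_bytes(handle_be, "little")

print(as_big, as_little)
```

The same four bytes read as 1 in big-endian order and as 16777216 in little-endian order, so mixing the two types without an explicit conversion or annotation is exactly what the checker is meant to catch.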

vim +172 drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c

   162  
   163  static void mlx5e_tls_resync_rx(struct net_device *netdev, struct sock *sk,
   164  u32 seq, u64 rcd_sn)
   165  {
   166  struct tls_context *tls_ctx = tls_get_ctx(sk);
   167  struct mlx5e_priv *priv = netdev_priv(netdev);
   168  struct mlx5e_tls_offload_context_rx *rx_ctx;
   169  
   170  rx_ctx = mlx5e_get_tls_rx_context(tls_ctx);
   171  
 > 172  mlx5_accel_tls_resync_rx(priv->mdev, rx_ctx->handle, seq, rcd_sn);
   173  }
   174  

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


Re: [PATCH mlx5-next] RDMA/mlx5: Don't use cached IRQ affinity mask

2018-07-16 Thread Sagi Grimberg




> Hi,
> I've tested this patch and seems problematic at this moment.

Problematic how? what are you seeing?

> maybe this is because of the bug that Steve mentioned in the NVMe
> mailing list. Sagi mentioned that we should fix it in the NVMe/RDMA
> initiator and I'll run his suggestion as well.

Is your device irq affinity linear?

> BTW, when I run the blk_mq_map_queues it works for every irq affinity.

But its probably not aligned to the device vector affinity.


general protection fault in do_raw_spin_unlock

2018-07-16 Thread syzbot

Hello,

syzbot found the following crash on:

HEAD commit:     1d4eb636f0ab Add linux-next specific files for 20180716
git tree:        linux-next
console output:  https://syzkaller.appspot.com/x/log.txt?x=1186bf0c40
kernel config:   https://syzkaller.appspot.com/x/.config?x=ea5926dddb0db97a
dashboard link:  https://syzkaller.appspot.com/bug?extid=83a25334ef203851dc81
compiler:        gcc (GCC) 8.0.1 20180413 (experimental)
syzkaller repro: https://syzkaller.appspot.com/x/repro.syz?x=179ed0

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+83a25334ef203851d...@syzkaller.appspotmail.com

IPVS: ftp: loaded support on port[0] = 21
IPVS: ftp: loaded support on port[0] = 21
IPVS: ftp: loaded support on port[0] = 21
kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault:  [#1] SMP KASAN
CPU: 1 PID: 24 Comm: kworker/1:1 Not tainted 4.18.0-rc5-next-20180716+ #8
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

Workqueue: events p9_poll_workfn
RIP: 0010:debug_spin_unlock kernel/locking/spinlock_debug.c:97 [inline]
RIP: 0010:do_raw_spin_unlock+0x65/0x2f0 kernel/locking/spinlock_debug.c:134
Code: 0a bd 88 48 c7 85 78 ff ff ff b3 8a b5 41 48 c7 45 88 d0 3c 60 81 c7  
02 f1 f1 f1 f1 c7 42 04 04 f2 f2 f2 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48  
89 f8 65 48 8b 0c 25 28 00 00 00 48 89 4d d0 31 c9

RSP: 0018:8801d945f288 EFLAGS: 00010047
RAX: dc00 RBX:  RCX: 8770a045
RDX:  RSI: 0001 RDI: 0004
RBP: 8801d945f310 R08: 11003b28be45 R09: ed0035e7bd88
R10: ed0035e7bd88 R11: 8801af3dec43 R12: 
R13: 11003b28be51 R14: 8801d945f2e8 R15: 8801c5811d50
FS:  () GS:8801daf0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 0072c029 CR3: 0001b19fd000 CR4: 001406e0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Call Trace:
 __raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:159 [inline]
 _raw_spin_unlock_irqrestore+0x27/0xc0 kernel/locking/spinlock.c:184
 spin_unlock_irqrestore include/linux/spinlock.h:384 [inline]
 p9_conn_cancel+0x9b6/0xd30 net/9p/trans_fd.c:208
 p9_poll_mux net/9p/trans_fd.c:620 [inline]
 p9_poll_workfn+0x4b2/0x6d0 net/9p/trans_fd.c:1107
 process_one_work+0xc73/0x1ba0 kernel/workqueue.c:2153
 worker_thread+0x189/0x13c0 kernel/workqueue.c:2296
 kthread+0x345/0x410 kernel/kthread.c:246
 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:415
Modules linked in:
Dumping ftrace buffer:
   (ftrace buffer empty)
---[ end trace 4d86351f63a12683 ]---
RIP: 0010:debug_spin_unlock kernel/locking/spinlock_debug.c:97 [inline]
RIP: 0010:do_raw_spin_unlock+0x65/0x2f0 kernel/locking/spinlock_debug.c:134
Code: 0a bd 88 48 c7 85 78 ff ff ff b3 8a b5 41 48 c7 45 88 d0 3c 60 81 c7  
02 f1 f1 f1 f1 c7 42 04 04 f2 f2 f2 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48  
89 f8 65 48 8b 0c 25 28 00 00 00 48 89 4d d0 31 c9

RSP: 0018:8801d945f288 EFLAGS: 00010047
RAX: dc00 RBX:  RCX: 8770a045
RDX:  RSI: 0001 RDI: 0004
RBP: 8801d945f310 R08: 11003b28be45 R09: ed0035e7bd88
R10: ed0035e7bd88 R11: 8801af3dec43 R12: 
R13: 11003b28be51 R14: 8801d945f2e8 R15: 8801c5811d50
FS:  () GS:8801daf0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 0072c029 CR3: 0001b19fd000 CR4: 001406e0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkal...@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with  
syzbot.

syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches


Re: [PATCH mlx5-next] RDMA/mlx5: Don't use cached IRQ affinity mask

2018-07-16 Thread Max Gurtovoy

Hi,
I've tested this patch and seems problematic at this moment.
maybe this is because of the bug that Steve mentioned in the NVMe 
mailing list. Sagi mentioned that we should fix it in the NVMe/RDMA 
initiator and I'll run his suggestion as well.

BTW, when I run the blk_mq_map_queues it works for every irq affinity.

On 7/16/2018 1:30 PM, Leon Romanovsky wrote:
> On Mon, Jul 16, 2018 at 01:23:24PM +0300, Sagi Grimberg wrote:
>> Leon, I'd like to see a tested-by tag for this (at least
>> until I get some time to test it).
>
> Of course.
>
> Thanks
>
>> The patch itself looks fine to me.


-Max.


Re: [PATCH iproute2] ip: add support for seg6local End.BPF action

2018-07-16 Thread Stephen Hemminger
On Mon, 16 Jul 2018 14:47:41 +
Mathieu Xhonneux  wrote:

> This patch adds support for the End.BPF action of the seg6local
> lightweight tunnel. Functions from the BPF lightweight tunnel are
> re-used in this patch. Example:
> 
> $ ip -6 route add fc00::18 encap seg6local action End.BPF obj my_bpf.o
> sec my_func dev eth0
> 
> $ ip -6 route show fc00::18
> fc00::18  encap seg6local action End.BPF my_bpf.o:[my_func] dev eth0
> metric 1024 pref medium
> 
> Signed-off-by: Mathieu Xhonneux 

> ---
>  ip/iproute_lwtunnel.c | 122 
> +-
>  lib/bpf.c |   5 +++
>  2 files changed, 77 insertions(+), 50 deletions(-)
> 
> diff --git a/ip/iproute_lwtunnel.c b/ip/iproute_lwtunnel.c
> index 46a212c8..71c3d8a4 100644
> --- a/ip/iproute_lwtunnel.c
> +++ b/ip/iproute_lwtunnel.c
> @@ -177,6 +177,7 @@ static const char 
> *seg6_action_names[SEG6_LOCAL_ACTION_MAX + 1] = {
>   [SEG6_LOCAL_ACTION_END_S]   = "End.S",
>   [SEG6_LOCAL_ACTION_END_AS]  = "End.AS",
>   [SEG6_LOCAL_ACTION_END_AM]  = "End.AM",
> + [SEG6_LOCAL_ACTION_END_BPF] = "End.BPF",
>  };
>  
>  static const char *format_action_type(int action)
> @@ -250,6 +251,15 @@ static void print_encap_seg6local(FILE *fp, struct 
> rtattr *encap)
>   print_string(PRINT_ANY, "oif",
>"oif %s ", ll_index_to_name(oif));
>   }
> +
> + if (tb[SEG6_LOCAL_BPF]) {
> + struct rtattr *tb_bpf[LWT_BPF_PROG_MAX+1];
> +
> + parse_rtattr_nested(tb_bpf, LWT_BPF_PROG_MAX, 
> tb[SEG6_LOCAL_BPF]);
> +
> + if (tb_bpf[LWT_BPF_PROG_NAME])
> + fprintf(fp, "%s ", 
> rta_getattr_str(tb_bpf[LWT_BPF_PROG_NAME]));
> + }
>  }

Please use print_string to support JSON output.

