Re: [PATCH net-next] liquidio: correct error msg text when removing VLAN ID
From: Felix Manlunas
Date: Mon, 16 Jul 2018 18:06:07 -0700

> From: Rick Farrington
>
> Signed-off-by: Rick Farrington
> Signed-off-by: Felix Manlunas

Applied.
Re: [PATCH v3 net-next] net/sched: add skbprio scheduler
On Fri, Jul 13, 2018 at 9:51 PM Marcelo Ricardo Leitner wrote:
>
> Well, it would help if you didn't cut out key parts of my words.

Sorry about that. Please allow me to copy and paste all of your words here:

"Yes, but Michel wants to drop from other lower priorities if needed,
and that's not possible if you handle the limit already in a child
qdisc as they don't know about their siblings. The idea in the example
above is to discard it from whatever lower priority is needed, then
queue it. (ok, the example missed to check the priority level)"

So from your own words, you agreed "the idea in the example" is not
what Michel wants, because "is to discard it from whatever lower
priority is needed", as "Michel wants to drop from other lower
priorities if needed". You also agreed Michel's requirement is not
possible (to implement in sch_prio), because "you handle the limit
already in a child qdisc as they don't know about their siblings" is
also true.

Based on the above, I said it "disproves your point of adding a flag
to sch_prio". What am I missing?

> >
> > What am I missing here?
> >
> > Are you go further by suggesting moving the limit out of prio?
> > Or are you going to expand your definition of "adding a flag"?
> > Perhaps two flags? :)
> >
> > I am very open for discussion to see how far we can go.
>
> I am not keen on continuing this discussion if you keep twisting my
> words just for fun.

No, I am seriously trying to understand what you are suggesting here.
Please be patient! I know I am stupid :)

Thanks!
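Here is the limit argument in concrete form. Suppose a single qdisc owns every priority level and one shared limit. It can then evict a queued lower-priority packet before admitting a higher-priority one. A limit enforced inside each child qdisc cannot do this, because a child never sees its siblings' queues.

Below is a minimal user-space sketch of the single-qdisc variant. It assumes plain per-priority counters instead of real skb queues. The struct and function names are hypothetical, and this is not the skbprio code:

```c
#include <assert.h>
#include <stdbool.h>

#define NPRIO 4 /* hypothetical number of priority levels; 0 is the lowest */

struct prio_sched {
	unsigned int qlen[NPRIO]; /* packets queued at each priority */
	unsigned int total;       /* packets queued across all priorities */
	unsigned int limit;       /* one limit shared by every priority */
	unsigned int drops;       /* packets dropped, incoming or evicted */
};

/* Return the lowest priority that has a queued packet, or -1 if empty. */
static int lowest_nonempty(const struct prio_sched *q)
{
	for (int p = 0; p < NPRIO; p++)
		if (q->qlen[p])
			return p;
	return -1;
}

/* Enqueue one packet at @prio. If the shared limit is reached, evict one
 * packet from the lowest non-empty priority when that priority is strictly
 * lower than @prio; otherwise drop the incoming packet. Returns true if the
 * incoming packet was queued. */
static bool prio_enqueue(struct prio_sched *q, int prio)
{
	assert(prio >= 0 && prio < NPRIO);

	if (q->total >= q->limit) {
		int low = lowest_nonempty(q);

		if (low < 0 || low >= prio) {
			q->drops++; /* nothing of lower priority to evict */
			return false;
		}
		q->qlen[low]--; /* this step needs visibility of every band */
		q->total--;
		q->drops++;
	}
	q->qlen[prio]++;
	q->total++;
	return true;
}
```

The eviction branch is the part a per-child limit cannot express: it reads and modifies a queue other than the one the packet is destined for.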
Re: [PATCH v3 net-next 0/3] rds: IPv6 support
On 07/17/2018 12:20 AM, Sowmini Varadhan wrote:
> - Looks like rds_connect() is checking things in the right order (thanks)
>   However, rds_cancel_sent_to is still looking at the len to figure out
>   the family.. as we move to ipv6, it would be better if we allow the
>   caller to specify struct sockaddr_storage, or even a union of
>   sockaddr_in/sockaddr_in6, rather than require them to hint at which one
>   of ipv4/ipv6 through the optlen.

The app can use either structure to make the call. When the app fills
in the structure, it knows what it is filling in, either sockaddr_in or
sockaddr_in6, so it knows the right size to use. The app can also use
an IPv4-mapped address in a sockaddr_in6 without a problem.

>   Please see __sys_connect and move_addr_to_kernel if the user-kernel
>   copy is the reason you are not doing this. Similar to
>   inet_dgram_connect you can then check the sa_family and use that to
>   figure out the "Assume IPv4" etc stuff. This would also make the
>   CANCEL_SEND_TO API consistent with the bind/connect etc semantics.

Could you please explain the inconsistency? An app can use an IPv4-mapped
address in a sockaddr_in6 to operate on an IPv4 connection, in case you
are thinking of this new addition in v3 of the patch.

> - net/rds/rds.h: thanks for moving RDS_CM_PORT to the rdma specific
>   file. I am guessing (?) that you want to update the comment to talk
>   about the non-existent "RDS over UDP" based on the title of the IANA
>   registration? I would just like to re-iterate that this is actually
>   inaccurate (and confusing to someone looking at this for the first
>   time, since there is no RDS-over-UDP today). If it were up to me, I
>   would update the comment to say
>
>   /* The following ports, 16385, 18634, 18635, are registered with IANA as
>    * the ports to be used for "RDS over TCP and UDP".
>    * The current linux implementation supports RDS over TCP and IB, and uses
>    * the ports as follows: 18634 is the historical value used for the
>    * RDMA_CM listener port. RDS/TCP uses port 16385. After
>    * IPv6 work, RDMA_CM also uses 16385 as the listener port. 18634 is kept
>    * to ensure compatibility with older RDS modules. Those ports are defined
>    * in each transport's header file.

Will update it to

/* The following ports, 16385, 18634, 18635, are registered with IANA as
 * the ports to be used for RDS over TCP and UDP. Currently, only RDS over
 * TCP and RDS over IB/RDMA are implemented. 18634 is the historical value
 * used for the RDMA_CM listener port. RDS/TCP uses port 16385. After
 * IPv6 work, RDMA_CM also uses 16385 as the listener port. 18634 is kept
 * to ensure compatibility with older RDS modules. Those ports are defined
 * in each transport's header file.
 */

--
K. Poon
ka-cheong.p...@oracle.com
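Below is a minimal user-space sketch of the "check sa_family instead of optlen" suggestion. It takes a caller-supplied struct sockaddr_storage and also handles the IPv4-mapped case discussed above. classify_addr() and its return convention are hypothetical; this is not the RDS implementation:

```c
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Classify the address in @ss by its family, not by @len alone.
 * Returns 4 for IPv4 (plain or IPv4-mapped, with the address stored in @v4),
 * 6 for a native IPv6 address, or -1 if the family or length is invalid. */
static int classify_addr(const struct sockaddr_storage *ss, socklen_t len,
			 struct in_addr *v4)
{
	if (len < sizeof(sa_family_t))
		return -1;

	switch (ss->ss_family) {
	case AF_INET: {
		const struct sockaddr_in *sin = (const struct sockaddr_in *)ss;

		if (len < sizeof(*sin))
			return -1;
		*v4 = sin->sin_addr;
		return 4;
	}
	case AF_INET6: {
		const struct sockaddr_in6 *sin6 = (const struct sockaddr_in6 *)ss;

		if (len < sizeof(*sin6))
			return -1;
		if (IN6_IS_ADDR_V4MAPPED(&sin6->sin6_addr)) {
			/* ::ffff:a.b.c.d, the last four bytes hold the IPv4 address */
			memcpy(v4, &sin6->sin6_addr.s6_addr[12], sizeof(*v4));
			return 4;
		}
		return 6;
	}
	default:
		return -1;
	}
}
```

With this shape, the length is only a bounds check for the family the caller declared. That is the same division of labour connect() and bind() use.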
Re: [PATCH ipsec-next] xfrm: Allow Set Mark to be Updated Using UPDSA
On Mon, 16 Jul 2018 15:27:26 -0700 Nathan Harold wrote:
> < re-sent with apologies due to incorrect formatting last
> time... :-( >
>
> Hi Eyal,
>
> > If x1 points to a state previously found using
> > __xfrm_state_locate(x), won't __xfrm_state_bump_genids(x1) be
> > equivalent to x1->genid++ in this case?
>
> In the vanilla case this is true. IE, if there are no strange/abusive
> uses of the API such as the test below where multiple SAs can match
> the locate().
>
> > Is it possible that other states will match all of x1 parameters?
>
> Yes. Not sure if it's a bug or a feature, but it's possible for
> multiple SAs to match... for a depressing example, check out
> https://android-review.googlesource.com/c/kernel/tests/+/680958. There
> may be cases where something like this is desired behavior that I'm
> not aware of. Since this is control path, it felt to me like the
> formalism of using the xfrm_state_bump_genids() was worth not possibly
> walking into a different subtle bug later.

Ok. This is indeed depressing and also unexpected. I wonder if this
behavior could be fixed...

I'd find it odd if anyone is relying on being able to delete a 'no mark'
state by supplying parameters that do include an explicit mark. I have
no idea if anyone is relying on the state insertion order wrt marks,
though it would seem odd to me as well. Obviously, such a change is
unrelated to this patch.

I now better understand the need to be cautious.

>
> > Also, any idea why this isn't needed for other changes in the
> > state?
>
> The set_mark (output_mark) is somewhat special because changing this
> mark impacts the routing lookup, which up to now, none of the other
> parameters in the update_sa function do. A new output_mark can and
> will reroute packets to different interfaces. Thus, when we change
> this thing, we want to ensure that we always build a new bundle with a
> new bundle with a new route lookup based on the new set_mark. Since we
> removed the flow cache, things might *incidentally* seem to work right
> now; but, I think that's incidental rather than correct. By bumping
> the genid, we get the dst_entry->check() function to correctly return
> that the dst is obsolete when we call check(). I'm honestly not sure
> what corner cases we could land in if we didn't bump the genid in such
> a case.
>
> There's definitely a lot going on behind the scenes in this little
> change that I only tenuously grasp, so it's possible that I'm being
> overly cautious in this case. Please let me know your further thoughts
> on whether we need to bump the genid. FYI once this patch is settled,
> I plan to upload a patch to update the xfrm_if_id, which I planned to
> nestle in to this same logic (and with similar, albeit possibly
> more-straightforward rationale).

Thanks so much for the clarification. Indeed there are nuances here and
I appreciate you taking the time to describe them.

FWIW you can add my:

Reviewed-by: Eyal Birger

Thanks!
Eyal.
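Bumping the genid relies on a generation-counter check. A cached entry records the generation it was built under, and check() reports it obsolete once the counter has moved on.

Below is a minimal sketch of that pattern, using hypothetical names and a single counter per state. This is not the xfrm code; route_build() merely stands in for a real route lookup:

```c
#include <assert.h>
#include <stdbool.h>

struct sa_state {
	unsigned int genid;       /* bumped when a routing-relevant field changes */
	unsigned int output_mark; /* changing this can change the route */
};

struct cached_route {
	unsigned int genid;       /* generation the entry was built under */
	unsigned int mark;        /* mark used for the lookup */
};

/* Build a cache entry from the current state (stands in for a route lookup). */
static struct cached_route route_build(const struct sa_state *x)
{
	struct cached_route rt = { .genid = x->genid, .mark = x->output_mark };

	return rt;
}

/* Analogue of a dst check(): valid only if no bump happened since the build. */
static bool route_check(const struct cached_route *rt, const struct sa_state *x)
{
	return rt->genid == x->genid;
}

/* Change the mark and invalidate every entry cached under the old one. */
static void sa_update_mark(struct sa_state *x, unsigned int mark)
{
	x->output_mark = mark;
	x->genid++;
}
```

Without the increment in sa_update_mark(), route_check() would keep accepting an entry built with the old mark. That is the "incidentally seems to work" situation described above.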
Re: [PATCH bpf-next 6/9] bpf: offload: allow program and map sharing per-ASIC
On Mon, 16 Jul 2018 20:00:47 -0700, Alexei Starovoitov wrote:
> On Mon, Jul 16, 2018 at 07:37:20PM -0700, Jakub Kicinski wrote:
> > Create a higher-level entity to represent a device/ASIC to allow
> > programs and maps to be shared between device ports. The extra
> > work is required to make sure we don't destroy BPF objects as
> > soon as the netdev for which they were loaded gets destroyed,
> > as other ports may still be using them. When netdev goes away
> > all of its BPF objects will be moved to other netdevs of the
> > device, and only destroyed when last netdev is unregistered.
> >
> > Signed-off-by: Jakub Kicinski
> > Reviewed-by: Quentin Monnet
> ..
> > -bool bpf_offload_dev_match(struct bpf_prog *prog, struct bpf_map *map)
> > +static bool __bpf_offload_dev_match(struct bpf_prog *prog,
> > +				    struct net_device *netdev)
> >  {
> > -	struct bpf_offloaded_map *offmap;
> > +	struct bpf_offload_netdev *ondev1, *ondev2;
> >  	struct bpf_prog_offload *offload;
> > -	bool ret;
> >
> >  	if (!bpf_prog_is_dev_bound(prog->aux))
> >  		return false;
> > -	if (!bpf_map_is_dev_bound(map))
> > -		return bpf_map_offload_neutral(map);
> >
> > -	down_read(&bpf_devs_lock);
> >  	offload = prog->aux->offload;
> > +	if (!offload)
> > +		return false;
> > +	if (offload->netdev == netdev)
> > +		return true;
> > +
> > +	ondev1 = bpf_offload_find_netdev(offload->netdev);
> > +	ondev2 = bpf_offload_find_netdev(netdev);
> > +
> > +	return ondev1 && ondev2 && ondev1->offdev == ondev2->offdev;
> > +}
> > +
> > +bool bpf_offload_dev_match(struct bpf_prog *prog, struct net_device *netdev)
> > +{
> > +	bool ret;
> > +
> > +	down_read(&bpf_devs_lock);
> > +	ret = __bpf_offload_dev_match(prog, netdev);
> > +	up_read(&bpf_devs_lock);
> > +
> > +	return ret;
> > +}
> > +EXPORT_SYMBOL_GPL(bpf_offload_dev_match);
> > +
> > +bool bpf_offload_match(struct bpf_prog *prog, struct bpf_map *map)
> > +{
> > +	struct bpf_offloaded_map *offmap;
> > +	bool ret;
> > +
> > +	if (!bpf_map_is_dev_bound(map))
> > +		return bpf_map_offload_neutral(map);
> >  	offmap = map_to_offmap(map);
> >
> > -	ret = offload && offload->netdev == offmap->netdev;
> > +	down_read(&bpf_devs_lock);
> > +	ret = __bpf_offload_dev_match(prog, offmap->netdev);
> >  	up_read(&bpf_devs_lock);
> >
> >  	return ret;
> >  }
> >
> ..
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index 9e2bf834f13a..2c5b923eef75 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -5054,7 +5054,7 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
> >  	}
> >
> >  	if ((bpf_prog_is_dev_bound(prog->aux) || bpf_map_is_dev_bound(map)) &&
> > -	    !bpf_offload_dev_match(prog, map)) {
> > +	    !bpf_offload_match(prog, map)) {
>
> I'm confused with new names and renaming.
> May be split renaming into separate patch?
> Should new bpf_offload_match() be called bpf_offload_prog_map_match() ?
> or some other name?
> May be adding comments to these functions will make it clear...

It is messy. The new functions to register/unregister an ASIC are called
bpf_offload_dev_*, hence it seemed like a good idea to call the function
exported to the drivers bpf_offload_dev_match() (see patches 7 and 8) to
keep the driver API consistent. But then the old function, which is only
used by the verifier, has to be renamed. I will use
bpf_offload_prog_map_match() and split the rename into a separate patch.
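The rule in __bpf_offload_dev_match() can be stated in one sentence. A device-bound program matches a netdev if it was loaded for that netdev, or if both netdevs are registered against the same offload device.

Below is a simplified user-space model of just that rule. The rhashtable lookup is replaced by a direct pointer stored in each netdev, and all type and function names here are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct offload_dev { int id; };   /* one per ASIC */

struct netdev {
	const char *name;
	struct offload_dev *offdev;   /* NULL if not registered for offload */
};

struct prog {
	struct netdev *bound_netdev;  /* NULL if the program is not device-bound */
};

/* Same netdev, or two registered netdevs sharing one offload device. */
static bool prog_dev_match(const struct prog *prog, const struct netdev *netdev)
{
	const struct netdev *bound = prog->bound_netdev;

	if (!bound)
		return false;
	if (bound == netdev)
		return true;
	return bound->offdev && netdev->offdev &&
	       bound->offdev == netdev->offdev;
}
```

This mirrors the expectations in the selftest from patch 9. A program loaded for simB1 can be attached to simB2 and simB3, but not to simA.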
Re: [PATCH bpf-next 6/9] bpf: offload: allow program and map sharing per-ASIC
On Mon, Jul 16, 2018 at 07:37:20PM -0700, Jakub Kicinski wrote:
> Create a higher-level entity to represent a device/ASIC to allow
> programs and maps to be shared between device ports. The extra
> work is required to make sure we don't destroy BPF objects as
> soon as the netdev for which they were loaded gets destroyed,
> as other ports may still be using them. When netdev goes away
> all of its BPF objects will be moved to other netdevs of the
> device, and only destroyed when last netdev is unregistered.
>
> Signed-off-by: Jakub Kicinski
> Reviewed-by: Quentin Monnet
..
> -bool bpf_offload_dev_match(struct bpf_prog *prog, struct bpf_map *map)
> +static bool __bpf_offload_dev_match(struct bpf_prog *prog,
> +				    struct net_device *netdev)
>  {
> -	struct bpf_offloaded_map *offmap;
> +	struct bpf_offload_netdev *ondev1, *ondev2;
>  	struct bpf_prog_offload *offload;
> -	bool ret;
>
>  	if (!bpf_prog_is_dev_bound(prog->aux))
>  		return false;
> -	if (!bpf_map_is_dev_bound(map))
> -		return bpf_map_offload_neutral(map);
>
> -	down_read(&bpf_devs_lock);
>  	offload = prog->aux->offload;
> +	if (!offload)
> +		return false;
> +	if (offload->netdev == netdev)
> +		return true;
> +
> +	ondev1 = bpf_offload_find_netdev(offload->netdev);
> +	ondev2 = bpf_offload_find_netdev(netdev);
> +
> +	return ondev1 && ondev2 && ondev1->offdev == ondev2->offdev;
> +}
> +
> +bool bpf_offload_dev_match(struct bpf_prog *prog, struct net_device *netdev)
> +{
> +	bool ret;
> +
> +	down_read(&bpf_devs_lock);
> +	ret = __bpf_offload_dev_match(prog, netdev);
> +	up_read(&bpf_devs_lock);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(bpf_offload_dev_match);
> +
> +bool bpf_offload_match(struct bpf_prog *prog, struct bpf_map *map)
> +{
> +	struct bpf_offloaded_map *offmap;
> +	bool ret;
> +
> +	if (!bpf_map_is_dev_bound(map))
> +		return bpf_map_offload_neutral(map);
>  	offmap = map_to_offmap(map);
>
> -	ret = offload && offload->netdev == offmap->netdev;
> +	down_read(&bpf_devs_lock);
> +	ret = __bpf_offload_dev_match(prog, offmap->netdev);
>  	up_read(&bpf_devs_lock);
>
>  	return ret;
>  }
..
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 9e2bf834f13a..2c5b923eef75 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -5054,7 +5054,7 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
>  	}
>
>  	if ((bpf_prog_is_dev_bound(prog->aux) || bpf_map_is_dev_bound(map)) &&
> -	    !bpf_offload_dev_match(prog, map)) {
> +	    !bpf_offload_match(prog, map)) {

I'm confused by the new names and the renaming.
Maybe split the renaming into a separate patch?
Should the new bpf_offload_match() be called bpf_offload_prog_map_match(),
or some other name?
Maybe adding comments to these functions would make it clear...
[PATCH bpf-next 5/9] bpf: offload: aggregate offloads per-device
Currently we have two lists of offloaded objects: programs and maps.
The netdevice unregister notifier scans those lists to orphan objects
associated with the device being unregistered. This puts an unnecessary
(even if negligible) burden on all netdev unregister calls in
BPF-enabled kernels. The lists of objects may potentially get long,
making the linear scan even more problematic. There haven't been
complaints about this mechanism so far, but it is suboptimal.

Instead of relying on notifiers, make the few BPF-capable drivers
register explicitly for BPF offloads. The programs and maps will now
be collected per-device, not on a global list, and only scanned for
removal when a driver unregisters from BPF offloads.

Signed-off-by: Jakub Kicinski
Reviewed-by: Quentin Monnet
---
 drivers/net/ethernet/netronome/nfp/bpf/main.c |  13 ++
 drivers/net/netdevsim/bpf.c                   |   7 +
 include/linux/bpf.h                           |   2 +
 kernel/bpf/offload.c                          | 142 --
 4 files changed, 118 insertions(+), 46 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.c b/drivers/net/ethernet/netronome/nfp/bpf/main.c
index b95b94d008cf..dee039ada75c 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/main.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/main.c
@@ -404,6 +404,16 @@ static int nfp_bpf_parse_capabilities(struct nfp_app *app)
 	return -EINVAL;
 }
 
+static int nfp_bpf_ndo_init(struct nfp_app *app, struct net_device *netdev)
+{
+	return bpf_offload_dev_netdev_register(netdev);
+}
+
+static void nfp_bpf_ndo_uninit(struct nfp_app *app, struct net_device *netdev)
+{
+	bpf_offload_dev_netdev_unregister(netdev);
+}
+
 static int nfp_bpf_init(struct nfp_app *app)
 {
 	struct nfp_app_bpf *bpf;
@@ -466,6 +476,9 @@ const struct nfp_app_type app_bpf = {
 
 	.extra_cap	= nfp_bpf_extra_cap,
 
+	.ndo_init	= nfp_bpf_ndo_init,
+	.ndo_uninit	= nfp_bpf_ndo_uninit,
+
 	.vnic_alloc	= nfp_bpf_vnic_alloc,
 	.vnic_free	= nfp_bpf_vnic_free,
 
diff --git a/drivers/net/netdevsim/bpf.c b/drivers/net/netdevsim/bpf.c
index 357f9e62f306..c4a2829e0e1f 100644
--- a/drivers/net/netdevsim/bpf.c
+++ b/drivers/net/netdevsim/bpf.c
@@ -582,6 +582,8 @@ int nsim_bpf(struct net_device *dev, struct netdev_bpf *bpf)
 
 int nsim_bpf_init(struct netdevsim *ns)
 {
+	int err;
+
 	if (ns->sdev->refcnt == 1) {
 		INIT_LIST_HEAD(&ns->sdev->bpf_bound_progs);
 		INIT_LIST_HEAD(&ns->sdev->bpf_bound_maps);
@@ -592,6 +594,10 @@ int nsim_bpf_init(struct netdevsim *ns)
 			return -ENOMEM;
 	}
 
+	err = bpf_offload_dev_netdev_register(ns->netdev);
+	if (err)
+		return err;
+
 	debugfs_create_u32("bpf_offloaded_id", 0400, ns->ddir,
 			   &ns->bpf_offloaded_id);
 
@@ -625,6 +631,7 @@ void nsim_bpf_uninit(struct netdevsim *ns)
 	WARN_ON(ns->xdp.prog);
 	WARN_ON(ns->xdp_hw.prog);
 	WARN_ON(ns->bpf_offloaded);
+	bpf_offload_dev_netdev_unregister(ns->netdev);
 
 	if (ns->sdev->refcnt == 1) {
 		WARN_ON(!list_empty(&ns->sdev->bpf_bound_progs));
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 8827e797ff97..21c001c3285c 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -648,6 +648,8 @@ int bpf_map_offload_delete_elem(struct bpf_map *map, void *key);
 int bpf_map_offload_get_next_key(struct bpf_map *map,
 				 void *key, void *next_key);
 
+int bpf_offload_dev_netdev_register(struct net_device *netdev);
+void bpf_offload_dev_netdev_unregister(struct net_device *netdev);
 bool bpf_offload_dev_match(struct bpf_prog *prog, struct bpf_map *map);
 
 #if defined(CONFIG_NET) && defined(CONFIG_BPF_SYSCALL)
diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c
index ac747d5cf7c6..b914f94c53d4 100644
--- a/kernel/bpf/offload.c
+++ b/kernel/bpf/offload.c
@@ -18,19 +18,37 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
+#include
 #include
 #include
 
-/* Protects bpf_prog_offload_devs, bpf_map_offload_devs and offload members
+/* Protects offdevs, members of bpf_offload_netdev and offload members
  * of all progs.
  * RTNL lock cannot be taken when holding this lock.
  */
 static DECLARE_RWSEM(bpf_devs_lock);
-static LIST_HEAD(bpf_prog_offload_devs);
-static LIST_HEAD(bpf_map_offload_devs);
+
+struct bpf_offload_netdev {
+	struct rhash_head l;
+	struct net_device *netdev;
+	struct list_head progs;
+	struct list_head maps;
+};
+
+static const struct rhashtable_params offdevs_params = {
+	.nelem_hint		= 4,
+	.key_len		= sizeof(struct net_device *),
+	.key_offset		= offsetof(struct bpf_offload_netdev, netdev),
+	.head_offset		= offsetof(struct bpf_offload_netdev, l),
+	.automatic_shrinking	= true,
+};
+
+static struct
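The core of this patch replaces the two global lists with one entry per registered netdev, found by hashing the net_device pointer. Unregistering a device then only touches that device's own entry.

Below is a minimal user-space sketch of that lookup-by-pointer structure. A small fixed-size chained hash table stands in for rhashtable, and counters stand in for the progs/maps lists. All names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define NBUCKETS 8 /* hypothetical fixed size; rhashtable resizes itself */

struct net_device;                 /* opaque: only its address is used as key */

struct offload_netdev {
	struct offload_netdev *next;   /* bucket chain */
	const struct net_device *netdev;
	unsigned int nprogs, nmaps;    /* stand-ins for the progs/maps lists */
};

static struct offload_netdev *buckets[NBUCKETS];

static size_t ptr_bucket(const void *p)
{
	return ((uintptr_t)p >> 4) % NBUCKETS;
}

static struct offload_netdev *ondev_find(const struct net_device *netdev)
{
	for (struct offload_netdev *e = buckets[ptr_bucket(netdev)]; e; e = e->next)
		if (e->netdev == netdev)
			return e;
	return NULL;
}

/* Analogue of a driver registering a netdev for BPF offload. */
static int ondev_register(const struct net_device *netdev)
{
	struct offload_netdev *e = calloc(1, sizeof(*e));
	size_t b = ptr_bucket(netdev);

	if (!e)
		return -1;
	e->netdev = netdev;
	e->next = buckets[b];
	buckets[b] = e;
	return 0;
}

/* Only this netdev's entry is touched; no global list is scanned. */
static void ondev_unregister(const struct net_device *netdev)
{
	struct offload_netdev **pp = &buckets[ptr_bucket(netdev)];

	while (*pp && (*pp)->netdev != netdev)
		pp = &(*pp)->next;
	if (*pp) {
		struct offload_netdev *e = *pp;

		*pp = e->next;
		free(e); /* the real code would also orphan e's progs and maps */
	}
}
```

Netdevs that never register for offloads do not appear in the table at all. This is why ordinary netdev unregistration no longer pays for the scan.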
[PATCH bpf-next 7/9] netdevsim: allow program sharing between devices
Allow program sharing between devices which were linked together.

Signed-off-by: Jakub Kicinski
Reviewed-by: Quentin Monnet
---
 drivers/net/netdevsim/bpf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/netdevsim/bpf.c b/drivers/net/netdevsim/bpf.c
index 9eab29f67a0e..81444208b216 100644
--- a/drivers/net/netdevsim/bpf.c
+++ b/drivers/net/netdevsim/bpf.c
@@ -294,7 +294,7 @@ nsim_setup_prog_hw_checks(struct netdevsim *ns, struct netdev_bpf *bpf)
 		NSIM_EA(bpf->extack, "xdpoffload of non-bound program");
 		return -EINVAL;
 	}
-	if (bpf->prog->aux->offload->netdev != ns->netdev) {
+	if (!bpf_offload_dev_match(bpf->prog, ns->netdev)) {
 		NSIM_EA(bpf->extack, "program bound to different dev");
 		return -EINVAL;
 	}
-- 
2.17.1
[PATCH bpf-next 9/9] selftests/bpf: add test for sharing objects between netdevs
Add tests for sharing programs and maps between different netdevs.
Use netdevsim's ability to pretend multiple netdevs belong to the
same "ASIC".

Signed-off-by: Jakub Kicinski
Reviewed-by: Quentin Monnet
---
 tools/testing/selftests/bpf/test_offload.py | 146 +++-
 1 file changed, 142 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_offload.py b/tools/testing/selftests/bpf/test_offload.py
index ee1abef384ea..d59642e70f56 100755
--- a/tools/testing/selftests/bpf/test_offload.py
+++ b/tools/testing/selftests/bpf/test_offload.py
@@ -158,8 +158,9 @@ netns = [] # net namespaces to be removed
     else:
         return ret, out
 
-def bpftool(args, JSON=True, ns="", fail=True):
-    return tool("bpftool", args, {"json":"-p"}, JSON=JSON, ns=ns, fail=fail)
+def bpftool(args, JSON=True, ns="", fail=True, include_stderr=False):
+    return tool("bpftool", args, {"json":"-p"}, JSON=JSON, ns=ns,
+                fail=fail, include_stderr=include_stderr)
 
 def bpftool_prog_list(expected=None, ns=""):
     _, progs = bpftool("prog show", JSON=True, ns=ns, fail=True)
@@ -201,6 +202,21 @@ netns = [] # net namespaces to be removed
         time.sleep(0.05)
     raise Exception("Time out waiting for map counts to stabilize want %d, have %d" % (expected, nmaps))
 
+def bpftool_prog_load(sample, file_name, maps=[], prog_type="xdp", dev=None,
+                      fail=True, include_stderr=False):
+    args = "prog load %s %s" % (os.path.join(bpf_test_dir, sample), file_name)
+    if prog_type is not None:
+        args += " type " + prog_type
+    if dev is not None:
+        args += " dev " + dev
+    if len(maps):
+        args += " map " + " map ".join(maps)
+
+    res = bpftool(args, fail=fail, include_stderr=include_stderr)
+    if res[0] == 0:
+        files.append(file_name)
+    return res
+
 def ip(args, force=False, JSON=True, ns="", fail=True, include_stderr=False):
     if force:
         args = "-force " + args
@@ -307,7 +323,9 @@ netns = [] # net namespaces to be removed
     Class for netdevsim netdevice and its attributes.
     """
 
-    def __init__(self):
+    def __init__(self, link=None):
+        self.link = link
+
         self.dev = self._netdevsim_create()
         devs.append(self)
 
@@ -321,8 +339,9 @@ netns = [] # net namespaces to be removed
         return self.dev[key]
 
     def _netdevsim_create(self):
+        link = "" if self.link is None else "link " + self.link.dev['ifname']
         _, old = ip("link show")
-        ip("link add sim%d type netdevsim")
+        ip("link add sim%d {link} type netdevsim".format(link=link))
         _, new = ip("link show")
 
         for dev in new:
@@ -848,6 +867,25 @@ netns = []
     sim.set_mtu(1500)
     sim.wait_for_flush()
 
+    start_test("Test non-offload XDP attaching to HW...")
+    bpftool_prog_load("sample_ret0.o", "/sys/fs/bpf/nooffload")
+    nooffload = bpf_pinned("/sys/fs/bpf/nooffload")
+    ret, _, err = sim.set_xdp(nooffload, "offload",
+                              fail=False, include_stderr=True)
+    fail(ret == 0, "attached non-offloaded XDP program to HW")
+    check_extack_nsim(err, "xdpoffload of non-bound program.", args)
+    rm("/sys/fs/bpf/nooffload")
+
+    start_test("Test offload XDP attaching to drv...")
+    bpftool_prog_load("sample_ret0.o", "/sys/fs/bpf/offload",
+                      dev=sim['ifname'])
+    offload = bpf_pinned("/sys/fs/bpf/offload")
+    ret, _, err = sim.set_xdp(offload, "drv", fail=False, include_stderr=True)
+    fail(ret == 0, "attached offloaded XDP program to drv")
+    check_extack(err, "using device-bound program without HW_MODE flag is not supported.", args)
+    rm("/sys/fs/bpf/offload")
+    sim.wait_for_flush()
+
     start_test("Test XDP offload...")
     _, _, err = sim.set_xdp(obj, "offload", verbose=True, include_stderr=True)
     ipl = sim.ip_link_show(xdp=True)
@@ -1141,6 +1179,106 @@ netns = []
     fail(ret == 0, "netdevsim didn't refuse to create a map with offload disabled")
 
+    sim.remove()
+
+    start_test("Test multi-dev ASIC program reuse...")
+    simA = NetdevSim()
+    simB1 = NetdevSim()
+    simB2 = NetdevSim(link=simB1)
+    simB3 = NetdevSim(link=simB1)
+    sims = (simA, simB1, simB2, simB3)
+    simB = (simB1, simB2, simB3)
+
+    bpftool_prog_load("sample_map_ret0.o", "/sys/fs/bpf/nsimA",
+                      dev=simA['ifname'])
+    progA = bpf_pinned("/sys/fs/bpf/nsimA")
+    bpftool_prog_load("sample_map_ret0.o", "/sys/fs/bpf/nsimB",
+                      dev=simB1['ifname'])
+    progB = bpf_pinned("/sys/fs/bpf/nsimB")
+
+    simA.set_xdp(progA, "offload", JSON=False)
+    for d in simB:
+        d.set_xdp(progB, "offload", JSON=False)
+
+    start_test("Test multi-dev ASIC cross-dev replace...")
+    ret, _ = simA.set_xdp(progB, "offload", force=True, JSON=False, fail=False)
+    fail(ret == 0, "cross-ASIC program allowed")
+    for d in simB:
+        ret, _ = d.set_xdp(progA, "offload",
[PATCH bpf-next 0/9] bpf: offload program and map sharing
Hi!

This patchset adds support for sharing BPF objects within one ASIC.
This will allow us to reuse the same program on multiple ports of a
device, leading to better code store utilization. It also enables
sharing maps between programs attached to different ports of a device.

Jakub Kicinski (9):
  netdevsim: add switch_id attribute
  netdevsim: add shared netdevsim devices
  netdevsim: associate bound programs with shared dev
  nfp: add .ndo_init() and .ndo_uninit() callbacks
  bpf: offload: aggregate offloads per-device
  bpf: offload: allow program and map sharing per-ASIC
  netdevsim: allow program sharing between devices
  nfp: bpf: allow program sharing within ASIC
  selftests/bpf: add test for sharing objects between netdevs

 drivers/net/ethernet/netronome/nfp/bpf/main.c |  23 ++
 drivers/net/ethernet/netronome/nfp/bpf/main.h |   4 +
 .../net/ethernet/netronome/nfp/bpf/offload.c  |  10 +-
 drivers/net/ethernet/netronome/nfp/nfp_app.c  |  17 ++
 drivers/net/ethernet/netronome/nfp/nfp_app.h  |   8 +
 .../ethernet/netronome/nfp/nfp_net_common.c   |   2 +
 .../net/ethernet/netronome/nfp/nfp_net_repr.c |   2 +
 drivers/net/netdevsim/bpf.c                   |  50 +++-
 drivers/net/netdevsim/netdev.c                | 103 +++-
 drivers/net/netdevsim/netdevsim.h             |  23 +-
 include/linux/bpf.h                           |  10 +-
 kernel/bpf/offload.c                          | 223 ++
 kernel/bpf/verifier.c                         |   2 +-
 tools/testing/selftests/bpf/test_offload.py   | 151 +++-
 14 files changed, 543 insertions(+), 85 deletions(-)

-- 
2.17.1
[PATCH bpf-next 6/9] bpf: offload: allow program and map sharing per-ASIC
Create a higher-level entity to represent a device/ASIC to allow
programs and maps to be shared between device ports. The extra
work is required to make sure we don't destroy BPF objects as
soon as the netdev for which they were loaded gets destroyed,
as other ports may still be using them. When netdev goes away
all of its BPF objects will be moved to other netdevs of the
device, and only destroyed when last netdev is unregistered.

Signed-off-by: Jakub Kicinski
Reviewed-by: Quentin Monnet
---
 drivers/net/ethernet/netronome/nfp/bpf/main.c |  14 +-
 drivers/net/ethernet/netronome/nfp/bpf/main.h |   4 +
 drivers/net/netdevsim/bpf.c                   |  17 ++-
 drivers/net/netdevsim/netdevsim.h             |   3 +
 include/linux/bpf.h                           |  12 +-
 kernel/bpf/offload.c                          | 123 ++
 kernel/bpf/verifier.c                         |   2 +-
 7 files changed, 142 insertions(+), 33 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.c b/drivers/net/ethernet/netronome/nfp/bpf/main.c
index dee039ada75c..458f49235d06 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/main.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/main.c
@@ -406,12 +406,16 @@ static int nfp_bpf_parse_capabilities(struct nfp_app *app)
 
 static int nfp_bpf_ndo_init(struct nfp_app *app, struct net_device *netdev)
 {
-	return bpf_offload_dev_netdev_register(netdev);
+	struct nfp_app_bpf *bpf = app->priv;
+
+	return bpf_offload_dev_netdev_register(bpf->bpf_dev, netdev);
 }
 
 static void nfp_bpf_ndo_uninit(struct nfp_app *app, struct net_device *netdev)
 {
-	bpf_offload_dev_netdev_unregister(netdev);
+	struct nfp_app_bpf *bpf = app->priv;
+
+	bpf_offload_dev_netdev_unregister(bpf->bpf_dev, netdev);
 }
 
 static int nfp_bpf_init(struct nfp_app *app)
@@ -437,6 +441,11 @@ static int nfp_bpf_init(struct nfp_app *app)
 	if (err)
 		goto err_free_neutral_maps;
 
+	bpf->bpf_dev = bpf_offload_dev_create();
+	err = PTR_ERR_OR_ZERO(bpf->bpf_dev);
+	if (err)
+		goto err_free_neutral_maps;
+
 	return 0;
 
 err_free_neutral_maps:
@@ -455,6 +464,7 @@ static void nfp_bpf_clean(struct nfp_app *app)
 {
 	struct nfp_app_bpf *bpf = app->priv;
 
+	bpf_offload_dev_destroy(bpf->bpf_dev);
 	WARN_ON(!skb_queue_empty(&bpf->cmsg_replies));
 	WARN_ON(!list_empty(&bpf->map_list));
 	WARN_ON(bpf->maps_in_use || bpf->map_elems_in_use);
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.h b/drivers/net/ethernet/netronome/nfp/bpf/main.h
index 9845c1a2d4c2..bec935468f90 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/main.h
+++ b/drivers/net/ethernet/netronome/nfp/bpf/main.h
@@ -110,6 +110,8 @@ enum pkt_vec {
  * struct nfp_app_bpf - bpf app priv structure
  * @app:		backpointer to the app
  *
+ * @bpf_dev:		BPF offload device handle
+ *
  * @tag_allocator:	bitmap of control message tags in use
  * @tag_alloc_next:	next tag bit to allocate
  * @tag_alloc_last:	next tag bit to be freed
@@ -150,6 +152,8 @@ enum pkt_vec {
 struct nfp_app_bpf {
 	struct nfp_app *app;
 
+	struct bpf_offload_dev *bpf_dev;
+
 	DECLARE_BITMAP(tag_allocator, U16_MAX + 1);
 	u16 tag_alloc_next;
 	u16 tag_alloc_last;
diff --git a/drivers/net/netdevsim/bpf.c b/drivers/net/netdevsim/bpf.c
index c4a2829e0e1f..9eab29f67a0e 100644
--- a/drivers/net/netdevsim/bpf.c
+++ b/drivers/net/netdevsim/bpf.c
@@ -592,11 +592,16 @@ int nsim_bpf_init(struct netdevsim *ns)
 			debugfs_create_dir("bpf_bound_progs", ns->sdev->ddir);
 		if (IS_ERR_OR_NULL(ns->sdev->ddir_bpf_bound_progs))
 			return -ENOMEM;
+
+		ns->sdev->bpf_dev = bpf_offload_dev_create();
+		err = PTR_ERR_OR_ZERO(ns->sdev->bpf_dev);
+		if (err)
+			return err;
 	}
 
-	err = bpf_offload_dev_netdev_register(ns->netdev);
+	err = bpf_offload_dev_netdev_register(ns->sdev->bpf_dev, ns->netdev);
 	if (err)
-		return err;
+		goto err_destroy_bdev;
 
 	debugfs_create_u32("bpf_offloaded_id", 0400, ns->ddir,
 			   &ns->bpf_offloaded_id);
@@ -624,6 +629,11 @@ int nsim_bpf_init(struct netdevsim *ns)
 			    &ns->bpf_map_accept);
 
 	return 0;
+
+err_destroy_bdev:
+	if (ns->sdev->refcnt == 1)
+		bpf_offload_dev_destroy(ns->sdev->bpf_dev);
+	return err;
 }
 
 void nsim_bpf_uninit(struct netdevsim *ns)
@@ -631,10 +641,11 @@ void nsim_bpf_uninit(struct netdevsim *ns)
 	WARN_ON(ns->xdp.prog);
 	WARN_ON(ns->xdp_hw.prog);
 	WARN_ON(ns->bpf_offloaded);
-	bpf_offload_dev_netdev_unregister(ns->netdev);
+	bpf_offload_dev_netdev_unregister(ns->sdev->bpf_dev, ns->netdev);
 
 	if (ns->sdev->refcnt == 1) {
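The commit message says two things about a departing netdev's BPF objects. They move to another netdev of the same device, and they are destroyed only when the last netdev unregisters.

Below is a minimal user-space sketch of that hand-off, assuming a fixed-size array of ports per device. The names are hypothetical, and the kernel code uses lists and an rhashtable instead:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PORTS 4 /* hypothetical upper bound on ports per device */

struct port;

struct offload_obj {
	struct port *owner;   /* netdev the program/map is currently bound to */
	int destroyed;
};

struct asic {
	struct port *ports[MAX_PORTS]; /* registered netdevs of this device */
	size_t nports;
};

struct port {
	struct asic *asic;
};

/* Remove @p from its device. Objects it owned move to a remaining port;
 * if @p was the last port, those objects are destroyed instead. */
static void port_unregister(struct port *p, struct offload_obj *objs, size_t nobjs)
{
	struct asic *a = p->asic;
	struct port *heir = NULL;
	size_t i, j = 0;

	/* Compact the port array without @p. */
	for (i = 0; i < a->nports; i++)
		if (a->ports[i] != p)
			a->ports[j++] = a->ports[i];
	a->nports = j;
	if (j)
		heir = a->ports[0];

	for (i = 0; i < nobjs; i++) {
		if (objs[i].owner != p)
			continue;
		if (heir)
			objs[i].owner = heir;   /* hand over, keep the object alive */
		else
			objs[i].destroyed = 1;  /* last port gone: destroy */
	}
}
```

Keeping objects alive on a surviving port is what lets simB2 and simB3 keep running a program that was originally loaded for simB1.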
[PATCH bpf-next 2/9] netdevsim: add shared netdevsim devices
Factor out a sharable netdevsim sub-object and use IFLA_LINK to link
netdevsims together at creation time. The sharable object will have
its own DebugFS directory.

Signed-off-by: Jakub Kicinski
Reviewed-by: Quentin Monnet
---
 drivers/net/netdevsim/netdev.c    | 87 ---
 drivers/net/netdevsim/netdevsim.h | 10 +++-
 2 files changed, 90 insertions(+), 7 deletions(-)

diff --git a/drivers/net/netdevsim/netdev.c b/drivers/net/netdevsim/netdev.c
index 9125637ef5d8..2d244551298b 100644
--- a/drivers/net/netdevsim/netdev.c
+++ b/drivers/net/netdevsim/netdev.c
@@ -152,8 +152,8 @@ nsim_port_attr_get(struct net_device *dev, struct switchdev_attr *attr)
 
 	switch (attr->id) {
 	case SWITCHDEV_ATTR_ID_PORT_PARENT_ID:
-		attr->u.ppid.id_len = sizeof(ns->switch_id);
-		memcpy(&attr->u.ppid.id, &ns->switch_id,
+		attr->u.ppid.id_len = sizeof(ns->sdev->switch_id);
+		memcpy(&attr->u.ppid.id, &ns->sdev->switch_id,
 		       attr->u.ppid.id_len);
 		return 0;
 	default:
@@ -167,19 +167,41 @@ static const struct switchdev_ops nsim_switchdev_ops = {
 
 static int nsim_init(struct net_device *dev)
 {
+	char sdev_ddir_name[10], sdev_link_name[32];
 	struct netdevsim *ns = netdev_priv(dev);
 	int err;
 
 	ns->netdev = dev;
-	ns->switch_id = nsim_dev_id;
-
 	ns->ddir = debugfs_create_dir(netdev_name(dev), nsim_ddir);
 	if (IS_ERR_OR_NULL(ns->ddir))
 		return -ENOMEM;
 
+	if (!ns->sdev) {
+		ns->sdev = kzalloc(sizeof(*ns->sdev), GFP_KERNEL);
+		if (!ns->sdev) {
+			err = -ENOMEM;
+			goto err_debugfs_destroy;
+		}
+		ns->sdev->refcnt = 1;
+		ns->sdev->switch_id = nsim_dev_id;
+		sprintf(sdev_ddir_name, "%u", ns->sdev->switch_id);
+		ns->sdev->ddir = debugfs_create_dir(sdev_ddir_name,
+						    nsim_sdev_ddir);
+		if (IS_ERR_OR_NULL(ns->sdev->ddir)) {
+			err = PTR_ERR_OR_ZERO(ns->sdev->ddir) ?: -EINVAL;
+			goto err_sdev_free;
+		}
+	} else {
+		sprintf(sdev_ddir_name, "%u", ns->sdev->switch_id);
+		ns->sdev->refcnt++;
+	}
+
+	sprintf(sdev_link_name, "../../" DRV_NAME "_sdev/%s", sdev_ddir_name);
+	debugfs_create_symlink("sdev", ns->ddir, sdev_link_name);
+
 	err = nsim_bpf_init(ns);
 	if (err)
-		goto err_debugfs_destroy;
+		goto err_sdev_destroy;
 
 	ns->dev.id = nsim_dev_id++;
 	ns->dev.bus = &nsim_bus;
@@ -203,6 +225,12 @@ static int nsim_init(struct net_device *dev)
 	device_unregister(&ns->dev);
 err_bpf_uninit:
 	nsim_bpf_uninit(ns);
+err_sdev_destroy:
+	if (!--ns->sdev->refcnt) {
+		debugfs_remove_recursive(ns->sdev->ddir);
+err_sdev_free:
+		kfree(ns->sdev);
+	}
 err_debugfs_destroy:
 	debugfs_remove_recursive(ns->ddir);
 	return err;
@@ -216,6 +244,10 @@ static void nsim_uninit(struct net_device *dev)
 	nsim_devlink_teardown(ns);
 	debugfs_remove_recursive(ns->ddir);
 	nsim_bpf_uninit(ns);
+	if (!--ns->sdev->refcnt) {
+		debugfs_remove_recursive(ns->sdev->ddir);
+		kfree(ns->sdev);
+	}
 }
 
 static void nsim_free(struct net_device *dev)
@@ -494,14 +526,48 @@ static int nsim_validate(struct nlattr *tb[], struct nlattr *data[],
 	return 0;
 }
 
+static int nsim_newlink(struct net *src_net, struct net_device *dev,
+			struct nlattr *tb[], struct nlattr *data[],
+			struct netlink_ext_ack *extack)
+{
+	struct netdevsim *ns = netdev_priv(dev);
+
+	if (tb[IFLA_LINK]) {
+		struct net_device *joindev;
+		struct netdevsim *joinns;
+
+		joindev = __dev_get_by_index(src_net,
+					     nla_get_u32(tb[IFLA_LINK]));
+		if (!joindev)
+			return -ENODEV;
+		if (joindev->netdev_ops != &nsim_netdev_ops)
+			return -EINVAL;
+
+		joinns = netdev_priv(joindev);
+		if (!joinns->sdev || !joinns->sdev->refcnt)
+			return -EINVAL;
+		ns->sdev = joinns->sdev;
+	}
+
+	return register_netdevice(dev);
+}
+
+static void nsim_dellink(struct net_device *dev, struct list_head *head)
+{
+	unregister_netdevice_queue(dev, head);
+}
+
 static struct rtnl_link_ops nsim_link_ops __read_mostly = {
 	.kind		= DRV_NAME,
 	.priv_size	= sizeof(struct netdevsim),
 	.setup		= nsim_setup,
 	.validate	= nsim_validate,
+	.newlink	= nsim_newlink,
+	.dellink	= nsim_dellink,
 };
 
 struct dentry *nsim_ddir;
+struct dentry
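This patch gives the shared sdev a simple ownership rule:

- A netdevsim created without IFLA_LINK allocates a new sdev with refcnt 1.
- Each netdevsim linked to an existing one takes another reference.
- Whichever netdevsim is uninitialized last frees the sdev.

Below is a minimal user-space sketch of that rule, with hypothetical names and no locking shown. In the patch, the join check happens in nsim_newlink() and the increment in nsim_init(); this sketch merges them into one function:

```c
#include <assert.h>
#include <stdlib.h>

struct shared_dev {
	unsigned int refcnt;
	unsigned int switch_id;
};

struct sim {
	struct shared_dev *sdev;
};

static unsigned int next_id; /* analogue of nsim_dev_id */

/* @link is an existing sim to join, or NULL to create a new shared device. */
static int sim_init(struct sim *s, struct sim *link)
{
	if (link) {
		if (!link->sdev || !link->sdev->refcnt)
			return -1;          /* like the -EINVAL in nsim_newlink() */
		s->sdev = link->sdev;
		s->sdev->refcnt++;
		return 0;
	}
	s->sdev = calloc(1, sizeof(*s->sdev));
	if (!s->sdev)
		return -1;
	s->sdev->refcnt = 1;
	s->sdev->switch_id = next_id++;
	return 0;
}

/* Drop this sim's reference; the last one out frees the shared device. */
static void sim_uninit(struct sim *s)
{
	if (!--s->sdev->refcnt)
		free(s->sdev);
	s->sdev = NULL;
}
```

Because linked sims share one sdev, they also share its switch_id. This is why `ip link add ... link simB1 type netdevsim` produces ports that report the same parent ID.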
[PATCH bpf-next 8/9] nfp: bpf: allow program sharing within ASIC
Allow program sharing between netdevs of the same NFP ASIC.

Signed-off-by: Jakub Kicinski
Reviewed-by: Quentin Monnet
---
 drivers/net/ethernet/netronome/nfp/bpf/offload.c | 10 ++
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/bpf/offload.c b/drivers/net/ethernet/netronome/nfp/bpf/offload.c
index 78f44c4d95b4..49b03f7dbf46 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/offload.c
@@ -566,14 +566,8 @@ int nfp_net_bpf_offload(struct nfp_net *nn, struct bpf_prog *prog,
 {
 	int err;
 
-	if (prog) {
-		struct bpf_prog_offload *offload = prog->aux->offload;
-
-		if (!offload)
-			return -EINVAL;
-		if (offload->netdev != nn->dp.netdev)
-			return -EINVAL;
-	}
+	if (prog && !bpf_offload_dev_match(prog, nn->dp.netdev))
+		return -EINVAL;
 
 	if (prog && old_prog) {
 		u8 cap;
-- 
2.17.1
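The new check relaxes the rule from "the program must have been loaded for this exact netdev" to "the program must have been loaded for a netdev on the same device". The helper name `bpf_offload_dev_match()` comes from the patch; the sketch below only illustrates that idea with hypothetical structures (`struct asic`, `struct port`, `struct prog`) and does not reproduce the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins: one ASIC, a netdev on it, an offloaded program. */
struct asic { int id; };
struct port { const struct asic *asic; };
struct prog { const struct port *bound; };

/* Old rule: the program may only run on the netdev it was loaded for. */
static bool prog_matches_netdev(const struct prog *p, const struct port *dev)
{
	return p->bound && p->bound == dev;
}

/* New rule: any netdev backed by the same ASIC may use the program. */
static bool prog_matches_asic(const struct prog *p, const struct port *dev)
{
	assert(dev != NULL);
	return p->bound && p->bound->asic == dev->asic;
}
```

Keeping the comparison at the device level means the driver no longer needs to know which particular netdev the program was verified for, only that it targets the same hardware.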
[PATCH bpf-next 3/9] netdevsim: associate bound programs with shared dev
Move bound program information from netdevsim to shared sub-object,
as programs will soon be shared between netdevs of the same ASIC.

Signed-off-by: Jakub Kicinski
Reviewed-by: Quentin Monnet
---
 drivers/net/netdevsim/bpf.c                 | 30 -
 drivers/net/netdevsim/netdevsim.h           | 11
 tools/testing/selftests/bpf/test_offload.py |  5 ++--
 3 files changed, 27 insertions(+), 19 deletions(-)

diff --git a/drivers/net/netdevsim/bpf.c b/drivers/net/netdevsim/bpf.c
index c36d2a768202..357f9e62f306 100644
--- a/drivers/net/netdevsim/bpf.c
+++ b/drivers/net/netdevsim/bpf.c
@@ -238,8 +238,8 @@ static int nsim_bpf_create_prog(struct netdevsim *ns, struct bpf_prog *prog)
 	state->state = "verify";
 
 	/* Program id is not populated yet when we create the state. */
-	sprintf(name, "%u", ns->prog_id_gen++);
-	state->ddir = debugfs_create_dir(name, ns->ddir_bpf_bound_progs);
+	sprintf(name, "%u", ns->sdev->prog_id_gen++);
+	state->ddir = debugfs_create_dir(name, ns->sdev->ddir_bpf_bound_progs);
 	if (IS_ERR_OR_NULL(state->ddir)) {
 		kfree(state);
 		return -ENOMEM;
@@ -250,7 +250,7 @@ static int nsim_bpf_create_prog(struct netdevsim *ns, struct bpf_prog *prog)
 			    &state->state, &nsim_bpf_string_fops);
 	debugfs_create_bool("loaded", 0400, state->ddir, &state->is_loaded);
 
-	list_add_tail(&state->l, &ns->bpf_bound_progs);
+	list_add_tail(&state->l, &ns->sdev->bpf_bound_progs);
 
 	prog->aux->offload->dev_priv = state;
 
@@ -497,7 +497,7 @@ nsim_bpf_map_alloc(struct netdevsim *ns, struct bpf_offloaded_map *offmap)
 	}
 
 	offmap->dev_ops = &nsim_bpf_map_ops;
-	list_add_tail(&nmap->l, &ns->bpf_bound_maps);
+	list_add_tail(&nmap->l, &ns->sdev->bpf_bound_maps);
 
 	return 0;
 
@@ -582,8 +582,15 @@ int nsim_bpf(struct net_device *dev, struct netdev_bpf *bpf)
 
 int nsim_bpf_init(struct netdevsim *ns)
 {
-	INIT_LIST_HEAD(&ns->bpf_bound_progs);
-	INIT_LIST_HEAD(&ns->bpf_bound_maps);
+	if (ns->sdev->refcnt == 1) {
+		INIT_LIST_HEAD(&ns->sdev->bpf_bound_progs);
+		INIT_LIST_HEAD(&ns->sdev->bpf_bound_maps);
+
+		ns->sdev->ddir_bpf_bound_progs =
+			debugfs_create_dir("bpf_bound_progs", ns->sdev->ddir);
+		if (IS_ERR_OR_NULL(ns->sdev->ddir_bpf_bound_progs))
+			return -ENOMEM;
+	}
 
 	debugfs_create_u32("bpf_offloaded_id", 0400, ns->ddir,
 			   &ns->bpf_offloaded_id);
@@ -593,10 +600,6 @@ int nsim_bpf_init(struct netdevsim *ns)
 			    &ns->bpf_bind_accept);
 	debugfs_create_u32("bpf_bind_verifier_delay", 0600, ns->ddir,
 			   &ns->bpf_bind_verifier_delay);
-	ns->ddir_bpf_bound_progs =
-		debugfs_create_dir("bpf_bound_progs", ns->ddir);
-	if (IS_ERR_OR_NULL(ns->ddir_bpf_bound_progs))
-		return -ENOMEM;
 
 	ns->bpf_tc_accept = true;
 	debugfs_create_bool("bpf_tc_accept", 0600, ns->ddir,
@@ -619,9 +622,12 @@ int nsim_bpf_init(struct netdevsim *ns)
 
 void nsim_bpf_uninit(struct netdevsim *ns)
 {
-	WARN_ON(!list_empty(&ns->bpf_bound_progs));
-	WARN_ON(!list_empty(&ns->bpf_bound_maps));
 	WARN_ON(ns->xdp.prog);
 	WARN_ON(ns->xdp_hw.prog);
 	WARN_ON(ns->bpf_offloaded);
+
+	if (ns->sdev->refcnt == 1) {
+		WARN_ON(!list_empty(&ns->sdev->bpf_bound_progs));
+		WARN_ON(!list_empty(&ns->sdev->bpf_bound_maps));
+	}
 }
diff --git a/drivers/net/netdevsim/netdevsim.h b/drivers/net/netdevsim/netdevsim.h
index 8743ce74d2d9..98f26fa1e671 100644
--- a/drivers/net/netdevsim/netdevsim.h
+++ b/drivers/net/netdevsim/netdevsim.h
@@ -35,6 +35,12 @@ struct netdevsim_shared_dev {
 
 	u32 switch_id;
 	struct dentry *ddir;
+
+	struct dentry *ddir_bpf_bound_progs;
+	u32 prog_id_gen;
+
+	struct list_head bpf_bound_progs;
+	struct list_head bpf_bound_maps;
 };
 
 #define NSIM_IPSEC_MAX_SA_COUNT		33
@@ -79,12 +85,8 @@ struct netdevsim {
 	struct xdp_attachment_info xdp;
 	struct xdp_attachment_info xdp_hw;
 
-	u32 prog_id_gen;
-
 	bool bpf_bind_accept;
 	u32 bpf_bind_verifier_delay;
-	struct dentry *ddir_bpf_bound_progs;
-	struct list_head bpf_bound_progs;
 
 	bool bpf_tc_accept;
 	bool bpf_tc_non_bound_accept;
@@ -92,7 +94,6 @@ struct netdevsim {
 	bool bpf_xdpoffload_accept;
 
 	bool bpf_map_accept;
-	struct list_head bpf_bound_maps;
 #if IS_ENABLED(CONFIG_NET_DEVLINK)
 	struct devlink *devlink;
 #endif
diff --git a/tools/testing/selftests/bpf/test_offload.py b/tools/testing/selftests/bpf/test_offload.py
index b746227eaff2..ee1abef384ea 100755
--- a/tools/testing/selftests/bpf/test_offload.py
+++ b/tools/testing/selftests/bpf/test_offload.py
@@ -314,6 +314,7 @@ netns = [] # net namespaces to be removed
         self.ns = ""
[PATCH bpf-next 1/9] netdevsim: add switch_id attribute
Grouping netdevsim devices into "ASICs" will soon be supported. Add
switch_id attribute to all netdevsims. For now each netdevsim will
have its switch_id matching the device id.

Signed-off-by: Jakub Kicinski
Reviewed-by: Quentin Monnet
---
 drivers/net/netdevsim/netdev.c    | 24
 drivers/net/netdevsim/netdevsim.h |  1 +
 2 files changed, 25 insertions(+)

diff --git a/drivers/net/netdevsim/netdev.c b/drivers/net/netdevsim/netdev.c
index a7b179f0d954..9125637ef5d8 100644
--- a/drivers/net/netdevsim/netdev.c
+++ b/drivers/net/netdevsim/netdev.c
@@ -22,6 +22,7 @@
 #include
 #include
 #include
+#include
 
 #include "netdevsim.h"
 
@@ -144,12 +145,34 @@ static struct device_type nsim_dev_type = {
 	.release = nsim_dev_release,
 };
 
+static int
+nsim_port_attr_get(struct net_device *dev, struct switchdev_attr *attr)
+{
+	struct netdevsim *ns = netdev_priv(dev);
+
+	switch (attr->id) {
+	case SWITCHDEV_ATTR_ID_PORT_PARENT_ID:
+		attr->u.ppid.id_len = sizeof(ns->switch_id);
+		memcpy(&attr->u.ppid.id, &ns->switch_id,
+		       attr->u.ppid.id_len);
+		return 0;
+	default:
+		return -EOPNOTSUPP;
+	}
+}
+
+static const struct switchdev_ops nsim_switchdev_ops = {
+	.switchdev_port_attr_get	= nsim_port_attr_get,
+};
+
 static int nsim_init(struct net_device *dev)
 {
 	struct netdevsim *ns = netdev_priv(dev);
 	int err;
 
 	ns->netdev = dev;
+	ns->switch_id = nsim_dev_id;
+
 	ns->ddir = debugfs_create_dir(netdev_name(dev), nsim_ddir);
 	if (IS_ERR_OR_NULL(ns->ddir))
 		return -ENOMEM;
@@ -166,6 +189,7 @@ static int nsim_init(struct net_device *dev)
 		goto err_bpf_uninit;
 
 	SET_NETDEV_DEV(dev, &ns->dev);
+	SWITCHDEV_SET_OPS(dev, &nsim_switchdev_ops);
 
 	err = nsim_devlink_setup(ns);
 	if (err)
diff --git a/drivers/net/netdevsim/netdevsim.h b/drivers/net/netdevsim/netdevsim.h
index 0aeabbe81cc6..e2f232325259 100644
--- a/drivers/net/netdevsim/netdevsim.h
+++ b/drivers/net/netdevsim/netdevsim.h
@@ -59,6 +59,7 @@ struct netdevsim {
 	struct u64_stats_sync syncp;
 
 	struct device dev;
+	u32 switch_id;
 
 	struct dentry *ddir;
 
-- 
2.17.1
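Ports that report the same switchdev parent ID are treated by user space as belonging to the same switch; this patch makes each netdevsim report its `switch_id` as that ID by copying the raw `u32` bytes into a length-plus-bytes identifier. A userspace sketch of the copy-and-compare idea, assuming a hypothetical `struct parent_id` loosely modelled on that identifier (the 32-byte capacity is an assumption of this sketch):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PARENT_ID_MAX_LEN 32 /* assumed capacity for this sketch */

/* Hypothetical length-plus-bytes identifier. */
struct parent_id {
	unsigned char id[PARENT_ID_MAX_LEN];
	unsigned char id_len;
};

/* Fill the parent ID from a 32-bit switch ID, as the patch does with memcpy. */
static void fill_parent_id(struct parent_id *pid, uint32_t switch_id)
{
	pid->id_len = sizeof(switch_id);
	assert(pid->id_len <= PARENT_ID_MAX_LEN);
	memcpy(pid->id, &switch_id, pid->id_len);
}

/* Two ports belong to the same switch if their parent IDs are identical. */
static bool same_switch(const struct parent_id *a, const struct parent_id *b)
{
	return a->id_len == b->id_len && !memcmp(a->id, b->id, a->id_len);
}
```

Only the first `id_len` bytes are ever compared, so the unused tail of the buffer does not need to be initialised.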
[PATCH bpf-next 4/9] nfp: add .ndo_init() and .ndo_uninit() callbacks
BPF code should unregister the offload capabilities from .ndo_uninit(),
to make sure the operation is atomic with unlist_netdevice(). Plumb
the init/uninit NDOs for vNICs and representors.

Signed-off-by: Jakub Kicinski
Reviewed-by: Quentin Monnet
---
 drivers/net/ethernet/netronome/nfp/nfp_app.c    | 17 +
 drivers/net/ethernet/netronome/nfp/nfp_app.h    |  8
 .../net/ethernet/netronome/nfp/nfp_net_common.c |  2 ++
 .../net/ethernet/netronome/nfp/nfp_net_repr.c   |  2 ++
 4 files changed, 29 insertions(+)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_app.c b/drivers/net/ethernet/netronome/nfp/nfp_app.c
index f28b244f4ee7..69d4ae7a61f3 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_app.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_app.c
@@ -86,6 +86,23 @@ const char *nfp_app_mip_name(struct nfp_app *app)
 	return nfp_mip_name(app->pf->mip);
 }
 
+int nfp_app_ndo_init(struct net_device *netdev)
+{
+	struct nfp_app *app = nfp_app_from_netdev(netdev);
+
+	if (!app || !app->type->ndo_init)
+		return 0;
+	return app->type->ndo_init(app, netdev);
+}
+
+void nfp_app_ndo_uninit(struct net_device *netdev)
+{
+	struct nfp_app *app = nfp_app_from_netdev(netdev);
+
+	if (app && app->type->ndo_uninit)
+		app->type->ndo_uninit(app, netdev);
+}
+
 u64 *nfp_app_port_get_stats(struct nfp_port *port, u64 *data)
 {
 	if (!port || !port->app || !port->app->type->port_get_stats)
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_app.h b/drivers/net/ethernet/netronome/nfp/nfp_app.h
index ee74caacb015..afbc19aa66a8 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_app.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_app.h
@@ -78,6 +78,8 @@ extern const struct nfp_app_type app_abm;
  * @init:	perform basic app checks and init
  * @clean:	clean app state
  * @extra_cap:	extra capabilities string
+ * @ndo_init:	vNIC and repr netdev .ndo_init
+ * @ndo_uninit:	vNIC and repr netdev .ndo_unint
  * @vnic_alloc:	allocate vNICs (assign port types, etc.)
  * @vnic_free:	free up app's vNIC state
  * @vnic_init:	vNIC netdev was registered
@@ -117,6 +119,9 @@ struct nfp_app_type {
 
 	const char *(*extra_cap)(struct nfp_app *app, struct nfp_net *nn);
 
+	int (*ndo_init)(struct nfp_app *app, struct net_device *netdev);
+	void (*ndo_uninit)(struct nfp_app *app, struct net_device *netdev);
+
 	int (*vnic_alloc)(struct nfp_app *app, struct nfp_net *nn,
 			  unsigned int id);
 	void (*vnic_free)(struct nfp_app *app, struct nfp_net *nn);
@@ -200,6 +205,9 @@ static inline void nfp_app_clean(struct nfp_app *app)
 		app->type->clean(app);
 }
 
+int nfp_app_ndo_init(struct net_device *netdev);
+void nfp_app_ndo_uninit(struct net_device *netdev);
+
 static inline int nfp_app_vnic_alloc(struct nfp_app *app, struct nfp_net *nn,
 				     unsigned int id)
 {
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index a712e83c3f0f..279b8ab8a17b 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -3480,6 +3480,8 @@ static int nfp_net_set_mac_address(struct net_device *netdev, void *addr)
 }
 
 const struct net_device_ops nfp_net_netdev_ops = {
+	.ndo_init		= nfp_app_ndo_init,
+	.ndo_uninit		= nfp_app_ndo_uninit,
 	.ndo_open		= nfp_net_netdev_open,
 	.ndo_stop		= nfp_net_netdev_close,
 	.ndo_start_xmit		= nfp_net_tx,
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c b/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c
index d7b712f6362f..18a09cdcd9c6 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c
@@ -262,6 +262,8 @@ static int nfp_repr_open(struct net_device *netdev)
 }
 
 const struct net_device_ops nfp_repr_netdev_ops = {
+	.ndo_init		= nfp_app_ndo_init,
+	.ndo_uninit		= nfp_app_ndo_uninit,
 	.ndo_open		= nfp_repr_open,
 	.ndo_stop		= nfp_repr_stop,
 	.ndo_start_xmit		= nfp_repr_xmit,
-- 
2.17.1
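`nfp_app_ndo_init()` and `nfp_app_ndo_uninit()` are thin wrappers that forward to an app-specific callback only when one is provided, treating a missing app or callback as success. The same optional-callback pattern as a self-contained sketch, with hypothetical `struct app_type` / `struct app` types standing in for the driver's:

```c
#include <assert.h>
#include <stddef.h>

struct app;

/* Hypothetical per-app operations table; every callback is optional. */
struct app_type {
	int (*ndo_init)(struct app *app);
	void (*ndo_uninit)(struct app *app);
};

struct app {
	const struct app_type *type;
	int inits; /* counts active inits, for demonstration only */
};

/* No app, or no callback, means there is nothing to do: report success. */
static int app_ndo_init(struct app *app)
{
	if (!app || !app->type->ndo_init)
		return 0;
	return app->type->ndo_init(app);
}

static void app_ndo_uninit(struct app *app)
{
	if (app && app->type->ndo_uninit)
		app->type->ndo_uninit(app);
}

/* Example callbacks an app might provide. */
static int count_init(struct app *app)
{
	app->inits++;
	return 0;
}

static void count_uninit(struct app *app)
{
	assert(app->inits > 0);
	app->inits--;
}
```

Putting the NULL checks in the wrapper lets both the vNIC and representor `net_device_ops` tables point at the same two functions regardless of which app is loaded.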
[net-next:master 238/243] drivers/net/phy/mdio-thunder.c:40:8: error: implicit declaration of function 'pcim_enable_device'; did you mean 'pci_enable_device'?
Hi Alexander,

First bad commit (maybe != root cause):

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
head:   ccdb51717ba3bdc9585998e4ffd41d70c04dedea
commit: 7e2bc7fb65d544bb8598a0ab64e40ee9c60ded6e [238/243] net: cavium: Drop dependency of NET_VENDOR_CAVIUM on PCI
config: um-allmodconfig (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
        git checkout 7e2bc7fb65d544bb8598a0ab64e40ee9c60ded6e
        # save the attached .config to linux build tree
        make ARCH=um

All error/warnings (new ones prefixed by >>):

   drivers/net/phy/mdio-thunder.c: In function 'thunder_mdiobus_pci_probe':
>> drivers/net/phy/mdio-thunder.c:40:8: error: implicit declaration of function 'pcim_enable_device'; did you mean 'pci_enable_device'? [-Werror=implicit-function-declaration]
     err = pcim_enable_device(pdev);
           ^~
           pci_enable_device
   drivers/net/phy/mdio-thunder.c: At top level:
>> drivers/net/phy/mdio-thunder.c:151:1: warning: data definition has no type or storage class
    module_pci_driver(thunder_mdiobus_driver);
    ^
>> drivers/net/phy/mdio-thunder.c:151:1: error: type defaults to 'int' in declaration of 'module_pci_driver' [-Werror=implicit-int]
>> drivers/net/phy/mdio-thunder.c:151:1: warning: parameter names (without types) in function declaration
   drivers/net/phy/mdio-thunder.c:144:26: warning: 'thunder_mdiobus_driver' defined but not used [-Wunused-variable]
    static struct pci_driver thunder_mdiobus_driver = {
                             ^~
   cc1: some warnings being treated as errors
--
   In file included from drivers/net/ethernet/cavium/liquidio/lio_main.c:31:0:
   drivers/net/ethernet/cavium/liquidio/octeon_main.h: In function 'octeon_unmap_pci_barx':
>> drivers/net/ethernet/cavium/liquidio/octeon_main.h:97:3: error: implicit declaration of function 'pci_release_region'; did you mean 'pci_release_regions'? [-Werror=implicit-function-declaration]
      pci_release_region(oct->pci_dev, baridx * 2);
      ^~
      pci_release_regions
   drivers/net/ethernet/cavium/liquidio/octeon_main.h: In function 'octeon_map_pci_barx':
>> drivers/net/ethernet/cavium/liquidio/octeon_main.h:111:6: error: implicit declaration of function 'pci_request_region'; did you mean 'pci_request_regions'? [-Werror=implicit-function-declaration]
     if (pci_request_region(oct->pci_dev, baridx * 2, DRV_NAME)) {
         ^~
         pci_request_regions
   drivers/net/ethernet/cavium/liquidio/lio_main.c: In function 'stop_pci_io':
>> drivers/net/ethernet/cavium/liquidio/lio_main.c:332:3: error: implicit declaration of function 'pci_disable_msi'; did you mean 'pci_disable_sriov'? [-Werror=implicit-function-declaration]
      pci_disable_msi(oct->pci_dev);
      ^~~
      pci_disable_sriov
   drivers/net/ethernet/cavium/liquidio/lio_main.c: In function 'octeon_pci_flr':
>> drivers/net/ethernet/cavium/liquidio/lio_main.c:983:2: error: implicit declaration of function 'pci_cfg_access_lock'; did you mean '__access_ok'? [-Werror=implicit-function-declaration]
     pci_cfg_access_lock(oct->pci_dev);
     ^~~
     __access_ok
>> drivers/net/ethernet/cavium/liquidio/lio_main.c:989:7: error: implicit declaration of function '__pci_reset_function_locked' [-Werror=implicit-function-declaration]
     rc = __pci_reset_function_locked(oct->pci_dev);
          ^~~
>> drivers/net/ethernet/cavium/liquidio/lio_main.c:995:2: error: implicit declaration of function 'pci_cfg_access_unlock'; did you mean '__access_ok'? [-Werror=implicit-function-declaration]
     pci_cfg_access_unlock(oct->pci_dev);
     ^
     __access_ok
   drivers/net/ethernet/cavium/liquidio/lio_main.c: In function 'octeon_destroy_resources':
>> drivers/net/ethernet/cavium/liquidio/lio_main.c:1063:20: error: invalid use of undefined type 'struct msix_entry'
          msix_entries[i].vector,
                       ^
>> drivers/net/ethernet/cavium/liquidio/lio_main.c:1063:20: error: dereferencing pointer to incomplete type 'struct msix_entry'
   drivers/net/ethernet/cavium/liquidio/lio_main.c:1065:27: error: invalid use of undefined type 'struct msix_entry'
        free_irq(msix_entries[i].vector,
                              ^
   drivers/net/ethernet/cavium/liquidio/lio_main.c:1071:25: error: invalid use of undefined type 'struct msix_entry'
        free_irq(msix_entries[i].vector, oct);
                            ^
>> drivers/net/ethernet/cavium/liquidio/lio_main.c:1073:4: error: implicit declaration of function 'pci_disable_msix'; did you mean 'pci_disable_sriov'? [-Werror=implicit-function-declaration]
       pci_disable_msix(oct->pci_dev);
       ^~~~
       pci_disable_sriov
>>
[PATCH net-next] xdp: fix uninitialized 'err' variable
Smatch caught an uninitialized variable error which GCC seems to miss.

Fixes: a25717d2b604 ("xdp: support simultaneous driver and hw XDP attachment")
Signed-off-by: Jakub Kicinski
---
 net/core/rtnetlink.c | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index e03258e954c8..92b6fa5d5f6e 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1414,14 +1414,17 @@ static int rtnl_xdp_fill(struct sk_buff *skb, struct net_device *dev)
 
 	prog_id = 0;
 	mode = XDP_ATTACHED_NONE;
-	if (rtnl_xdp_report_one(skb, dev, &prog_id, &mode, XDP_ATTACHED_SKB,
-				IFLA_XDP_SKB_PROG_ID, rtnl_xdp_prog_skb))
+	err = rtnl_xdp_report_one(skb, dev, &prog_id, &mode, XDP_ATTACHED_SKB,
+				  IFLA_XDP_SKB_PROG_ID, rtnl_xdp_prog_skb);
+	if (err)
 		goto err_cancel;
-	if (rtnl_xdp_report_one(skb, dev, &prog_id, &mode, XDP_ATTACHED_DRV,
-				IFLA_XDP_DRV_PROG_ID, rtnl_xdp_prog_drv))
+	err = rtnl_xdp_report_one(skb, dev, &prog_id, &mode, XDP_ATTACHED_DRV,
+				  IFLA_XDP_DRV_PROG_ID, rtnl_xdp_prog_drv);
+	if (err)
 		goto err_cancel;
-	if (rtnl_xdp_report_one(skb, dev, &prog_id, &mode, XDP_ATTACHED_HW,
-				IFLA_XDP_HW_PROG_ID, rtnl_xdp_prog_hw))
+	err = rtnl_xdp_report_one(skb, dev, &prog_id, &mode, XDP_ATTACHED_HW,
+				  IFLA_XDP_HW_PROG_ID, rtnl_xdp_prog_hw);
+	if (err)
 		goto err_cancel;
 
 	err = nla_put_u8(skb, IFLA_XDP_ATTACHED, mode);
-- 
2.17.1
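Before the fix, the three `if (rtnl_xdp_report_one(...)) goto err_cancel;` branches jumped to the error path without assigning `err`, so the value returned from the error label was never set by the failing call. The corrected pattern (store each call's return value in `err`, then test it) in a userspace sketch; `report_one()` and `fill()` are hypothetical stand-ins for the rtnetlink helpers, with a simple counter playing the role of remaining message space:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical reporter: fails with -EMSGSIZE once 'room' runs out. */
static int report_one(int *room)
{
	if (*room <= 0)
		return -EMSGSIZE;
	(*room)--;
	return 0;
}

/* Report three attachment modes; propagate the first failure's code. */
static int fill(int room)
{
	int err;

	assert(room >= 0);

	err = report_one(&room);
	if (err)
		goto err_cancel;
	err = report_one(&room);
	if (err)
		goto err_cancel;
	err = report_one(&room);
	if (err)
		goto err_cancel;

	return 0;

err_cancel:
	/* err always holds the failing call's return value here. */
	return err;
}
```

Assigning before testing costs nothing and makes every path into the error label carry a defined error code, which is exactly what the static checker wanted to see.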
[PATCH mlx5-next 5/8] net/mlx5: Add missing SET_DRIVER_VERSION command translation
From: Noa Osherovich

When translating command opcodes to a string, SET_DRIVER_VERSION
command was missing.

Fixes: 42ca502e179d0 ('net/mlx5_core: Use a macro in mlx5_command_str()')
Signed-off-by: Noa Osherovich
Signed-off-by: Saeed Mahameed
---
 drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
index 10517b2a0643..041c18faea46 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
@@ -458,6 +458,7 @@ const char *mlx5_command_str(int command)
 	MLX5_COMMAND_STR_CASE(SET_HCA_CAP);
 	MLX5_COMMAND_STR_CASE(QUERY_ISSI);
 	MLX5_COMMAND_STR_CASE(SET_ISSI);
+	MLX5_COMMAND_STR_CASE(SET_DRIVER_VERSION);
 	MLX5_COMMAND_STR_CASE(CREATE_MKEY);
 	MLX5_COMMAND_STR_CASE(QUERY_MKEY);
 	MLX5_COMMAND_STR_CASE(DESTROY_MKEY);
-- 
2.17.0
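`mlx5_command_str()` maps opcodes to names through a helper macro, so a command missing from the list is reported as "unknown command opcode". A sketch of the token-pasting and stringizing technique, assuming the macro has the shape `case PREFIX_##name: return #name` (the `CMD_OP_*` opcode values below are made up for illustration and are not the mlx5 values):

```c
#include <assert.h>
#include <string.h>

/* Hypothetical opcode values, for illustration only. */
enum {
	CMD_OP_SET_ISSI           = 0x10b,
	CMD_OP_SET_DRIVER_VERSION = 0x10d,
	CMD_OP_CREATE_MKEY        = 0x200,
};

/* Paste the prefix on for the case label; stringize the bare name. */
#define COMMAND_STR_CASE(cmd) case CMD_OP_##cmd: return #cmd

static const char *command_str(int command)
{
	switch (command) {
	COMMAND_STR_CASE(SET_ISSI);
	COMMAND_STR_CASE(SET_DRIVER_VERSION);
	COMMAND_STR_CASE(CREATE_MKEY);
	default: return "unknown command opcode";
	}
}
```

For example, `COMMAND_STR_CASE(SET_DRIVER_VERSION);` expands to `case CMD_OP_SET_DRIVER_VERSION: return "SET_DRIVER_VERSION";`, so a single line per opcode is all a fix like this one needs.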
[PATCH mlx5-next 8/8] net/mlx5: Fix tristate and description for MLX5 module
From: Eran Ben Elisha

Current description did not include new devices. Fix that by providing
the correct generic description.

Signed-off-by: Eran Ben Elisha
Signed-off-by: Saeed Mahameed
---
 drivers/infiniband/hw/mlx5/Kconfig              | 2 +-
 drivers/net/ethernet/mellanox/mlx5/core/Kconfig | 6 +++---
 drivers/net/ethernet/mellanox/mlx5/core/main.c  | 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/Kconfig b/drivers/infiniband/hw/mlx5/Kconfig
index fb4d77be019b..0440966bc6ec 100644
--- a/drivers/infiniband/hw/mlx5/Kconfig
+++ b/drivers/infiniband/hw/mlx5/Kconfig
@@ -1,5 +1,5 @@
 config MLX5_INFINIBAND
-	tristate "Mellanox Connect-IB HCA support"
+	tristate "Mellanox 5th generation network adapters (ConnectX series) support"
 	depends on NETDEVICES && ETHERNET && PCI && MLX5_CORE
 	depends on INFINIBAND_USER_ACCESS || INFINIBAND_USER_ACCESS=n
 	---help---
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
index 2545296a0c08..7a84dd07ced2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
@@ -3,7 +3,7 @@
 #
 
 config MLX5_CORE
-	tristate "Mellanox Technologies ConnectX-4 and Connect-IB core driver"
+	tristate "Mellanox 5th generation network adapters (ConnectX series) core driver"
 	depends on MAY_USE_DEVLINK
 	depends on PCI
 	imply PTP_1588_CLOCK
@@ -27,7 +27,7 @@ config MLX5_FPGA
 	  sandbox-specific client drivers.
 
 config MLX5_CORE_EN
-	bool "Mellanox Technologies ConnectX-4 Ethernet support"
+	bool "Mellanox 5th generation network adapters (ConnectX series) Ethernet support"
 	depends on NETDEVICES && ETHERNET && INET && PCI && MLX5_CORE
 	depends on IPV6=y || IPV6=n || MLX5_CORE=m
 	select PAGE_POOL
@@ -69,7 +69,7 @@ config MLX5_CORE_EN_DCB
 	  If unsure, set to Y
 
 config MLX5_CORE_IPOIB
-	bool "Mellanox Technologies ConnectX-4 IPoIB offloads support"
+	bool "Mellanox 5th generation network adapters (connectX series) IPoIB offloads support"
 	depends on MLX5_CORE_EN
 	default n
 	---help---
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 615005e63819..f9b950e1bd85 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -64,7 +64,7 @@
 #include "lib/clock.h"
 
 MODULE_AUTHOR("Eli Cohen ");
-MODULE_DESCRIPTION("Mellanox Connect-IB, ConnectX-4 core driver");
+MODULE_DESCRIPTION("Mellanox 5th generation network adapters (ConnectX series) core driver");
 MODULE_LICENSE("Dual BSD/GPL");
 MODULE_VERSION(DRIVER_VERSION);
 
-- 
2.17.0
[PATCH mlx5-next 4/8] net/mlx5: Add XRQ commands definitions
From: Max Gurtovoy

Update mlx5 command list and error return function to handle XRQ
commands.

Signed-off-by: Max Gurtovoy
Reviewed-by: Daniel Jurgens
Reviewed-by: Artemy Kovalyov
Reviewed-by: Leon Romanovsky
Signed-off-by: Saeed Mahameed
---
 drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 8
 1 file changed, 8 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
index a94955302482..10517b2a0643 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
@@ -278,6 +278,7 @@ static int mlx5_internal_err_ret_value(struct mlx5_core_dev *dev, u16 op,
 	case MLX5_CMD_OP_DESTROY_PSV:
 	case MLX5_CMD_OP_DESTROY_SRQ:
 	case MLX5_CMD_OP_DESTROY_XRC_SRQ:
+	case MLX5_CMD_OP_DESTROY_XRQ:
 	case MLX5_CMD_OP_DESTROY_DCT:
 	case MLX5_CMD_OP_DEALLOC_Q_COUNTER:
 	case MLX5_CMD_OP_DESTROY_SCHEDULING_ELEMENT:
@@ -347,6 +348,9 @@ static int mlx5_internal_err_ret_value(struct mlx5_core_dev *dev, u16 op,
 	case MLX5_CMD_OP_CREATE_XRC_SRQ:
 	case MLX5_CMD_OP_QUERY_XRC_SRQ:
 	case MLX5_CMD_OP_ARM_XRC_SRQ:
+	case MLX5_CMD_OP_CREATE_XRQ:
+	case MLX5_CMD_OP_QUERY_XRQ:
+	case MLX5_CMD_OP_ARM_XRQ:
 	case MLX5_CMD_OP_CREATE_DCT:
 	case MLX5_CMD_OP_DRAIN_DCT:
 	case MLX5_CMD_OP_QUERY_DCT:
@@ -601,6 +605,10 @@ const char *mlx5_command_str(int command)
 	MLX5_COMMAND_STR_CASE(FPGA_QUERY_QP);
 	MLX5_COMMAND_STR_CASE(FPGA_QUERY_QP_COUNTERS);
 	MLX5_COMMAND_STR_CASE(FPGA_DESTROY_QP);
+	MLX5_COMMAND_STR_CASE(CREATE_XRQ);
+	MLX5_COMMAND_STR_CASE(DESTROY_XRQ);
+	MLX5_COMMAND_STR_CASE(QUERY_XRQ);
+	MLX5_COMMAND_STR_CASE(ARM_XRQ);
 	MLX5_COMMAND_STR_CASE(CREATE_GENERAL_OBJECT);
 	MLX5_COMMAND_STR_CASE(DESTROY_GENERAL_OBJECT);
 	default: return "unknown command opcode";
-- 
2.17.0
[PATCH mlx5-next 7/8] net/mlx5: Better return types for CQE API
From: Tariq Toukan

Reduce sizes of return types.
Use bool for binary indication.

Signed-off-by: Tariq Toukan
Signed-off-by: Saeed Mahameed
---
 include/linux/mlx5/device.h | 8
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h
index f8671c0a43aa..0566c6a94805 100644
--- a/include/linux/mlx5/device.h
+++ b/include/linux/mlx5/device.h
@@ -750,7 +750,7 @@ enum {
 
 #define MLX5_MINI_CQE_ARRAY_SIZE 8
 
-static inline int mlx5_get_cqe_format(struct mlx5_cqe64 *cqe)
+static inline u8 mlx5_get_cqe_format(struct mlx5_cqe64 *cqe)
 {
 	return (cqe->op_own >> 2) & 0x3;
 }
@@ -770,14 +770,14 @@ static inline u8 get_cqe_l3_hdr_type(struct mlx5_cqe64 *cqe)
 	return (cqe->l4_l3_hdr_type >> 2) & 0x3;
 }
 
-static inline u8 cqe_is_tunneled(struct mlx5_cqe64 *cqe)
+static inline bool cqe_is_tunneled(struct mlx5_cqe64 *cqe)
 {
 	return cqe->outer_l3_tunneled & 0x1;
 }
 
-static inline int cqe_has_vlan(struct mlx5_cqe64 *cqe)
+static inline bool cqe_has_vlan(struct mlx5_cqe64 *cqe)
 {
-	return !!(cqe->l4_l3_hdr_type & 0x1);
+	return cqe->l4_l3_hdr_type & 0x1;
 }
 
 static inline u64 get_cqe_ts(struct mlx5_cqe64 *cqe)
-- 
2.17.0
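Returning `bool` makes the truth-value conversion implicit: in C, converting any nonzero scalar to `bool` yields `1` (`true`), which is why the `!!` in `cqe_has_vlan()` can be dropped. An integer return type passes the masked value through unchanged instead. A small sketch of the difference; the `0x2` mask is a hypothetical flag chosen to make the effect visible, since the functions in the patch only mask bit `0x1`:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Integer return: the masked bit is returned as-is (0 or 2 here). */
static uint8_t flag_as_u8(uint8_t reg)
{
	return reg & 0x2;
}

/* bool return: any nonzero result is normalised to true (1). */
static bool flag_as_bool(uint8_t reg)
{
	return reg & 0x2;
}
```

With `bool`, callers can compare the result against `true` or store it in a 1-bit field safely without the caller or callee adding an explicit `!!`.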
[PATCH mlx5-next 2/8] net/mlx5: Expose MPEGC (Management PCIe General Configuration) structures
From: Eran Ben Elisha

This patch exposes PRM layout for handling MPEGC (Management PCIe
General Configuration). This will be used in the downstream patch for
configuring MPEGC via the driver.

Signed-off-by: Eran Ben Elisha
Reviewed-by: Moshe Shemesh
Signed-off-by: Saeed Mahameed
---
 include/linux/mlx5/driver.h   |  1 +
 include/linux/mlx5/mlx5_ifc.h | 23 +--
 2 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 4a4125b4279d..957199c20a0f 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -145,6 +145,7 @@ enum {
 	MLX5_REG_MPCNT = 0x9051,
 	MLX5_REG_MTPPS = 0x9053,
 	MLX5_REG_MTPPSE = 0x9054,
+	MLX5_REG_MPEGC = 0x9056,
 	MLX5_REG_MCQI = 0x9061,
 	MLX5_REG_MCC = 0x9062,
 	MLX5_REG_MCDA = 0x9063,
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index bd7b71f54d59..2de5feaeb74a 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -8049,6 +8049,19 @@ struct mlx5_ifc_peir_reg_bits {
 	u8 error_type[0x8];
 };
 
+struct mlx5_ifc_mpegc_reg_bits {
+	u8 reserved_at_0[0x30];
+	u8 field_select[0x10];
+
+	u8 tx_overflow_sense[0x1];
+	u8 mark_cqe[0x1];
+	u8 mark_cnp[0x1];
+	u8 reserved_at_43[0x1b];
+	u8 tx_lossy_overflow_oper[0x2];
+
+	u8 reserved_at_60[0x100];
+};
+
 struct mlx5_ifc_pcam_enhanced_features_bits {
 	u8 reserved_at_0[0x6d];
 	u8 rx_icrc_encapsulated_counter[0x1];
@@ -8097,7 +8110,11 @@ struct mlx5_ifc_pcam_reg_bits {
 };
 
 struct mlx5_ifc_mcam_enhanced_features_bits {
-	u8 reserved_at_0[0x7b];
+	u8 reserved_at_0[0x74];
+	u8 mark_tx_action_cnp[0x1];
+	u8 mark_tx_action_cqe[0x1];
+	u8 dynamic_tx_overflow[0x1];
+	u8 reserved_at_77[0x4];
 	u8 pcie_outbound_stalled[0x1];
 	u8 tx_overflow_buffer_pkt[0x1];
 	u8 mtpps_enh_out_per_adj[0x1];
@@ -8112,7 +8129,9 @@ struct mlx5_ifc_mcam_access_reg_bits {
 	u8 mcqi[0x1];
 	u8 reserved_at_1f[0x1];
 
-	u8 regs_95_to_68[0x1c];
+	u8 regs_95_to_87[0x9];
+	u8 mpegc[0x1];
+	u8 regs_85_to_68[0x12];
 	u8 tracer_registers[0x4];
 
 	u8 regs_63_to_32[0x20];
-- 
2.17.0
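In `mlx5_ifc.h`, each `u8 name[0xN]` member describes an N-bit field, laid out back to back in big-endian 32-bit dwords, and the `reserved_at_X` names record the bit offset: three 1-bit flags at 0x40 to 0x42 are followed by `reserved_at_43`, and `0x43 + 0x1b = 0x5e` leaves exactly two bits for `tx_lossy_overflow_oper` before `reserved_at_60`. The sketch below mimics that addressing on a dword array that is assumed to be already converted to host byte order. It is an illustration in the spirit of the driver's `MLX5_GET()`/`MLX5_SET()` accessors, not their definition; `get_bits()`, `set_bits()` and the `MPEGC_*` constants are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Bit offset/size pairs read off mlx5_ifc_mpegc_reg_bits in the patch. */
#define MPEGC_FIELD_SELECT_OFF            0x30
#define MPEGC_FIELD_SELECT_SZ             0x10
#define MPEGC_MARK_CQE_OFF                0x41
#define MPEGC_MARK_CQE_SZ                 0x1
#define MPEGC_TX_LOSSY_OVERFLOW_OPER_OFF  0x5e
#define MPEGC_TX_LOSSY_OVERFLOW_OPER_SZ   0x2

/* Layout bit 0 is the MSB of dword 0; a field never crosses a dword. */
static uint32_t get_bits(const uint32_t *dw, unsigned int off, unsigned int sz)
{
	assert(sz >= 1 && sz <= 32 && (off & 0x1f) + sz <= 32);
	unsigned int shift = 32 - sz - (off & 0x1f);
	uint32_t mask = sz == 32 ? 0xffffffffu : (1u << sz) - 1;

	return (dw[off / 32] >> shift) & mask;
}

static void set_bits(uint32_t *dw, unsigned int off, unsigned int sz,
		     uint32_t val)
{
	assert(sz >= 1 && sz <= 32 && (off & 0x1f) + sz <= 32);
	unsigned int shift = 32 - sz - (off & 0x1f);
	uint32_t mask = sz == 32 ? 0xffffffffu : (1u << sz) - 1;

	dw[off / 32] = (dw[off / 32] & ~(mask << shift)) |
		       ((val & mask) << shift);
}
```

For example, `mark_cqe` at bit 0x41 lands in dword 2 with a shift of `32 - 1 - 1 = 30`, i.e. mask `0x40000000`, while the 2-bit field at 0x5e occupies the two least significant bits of the same dword.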
[PATCH mlx5-next 1/8] net/mlx5: FW tracer, add hardware structures
From: Feras Daoud

This change adds the infrastructure to mlx5 core fw tracer.
It introduces the following 4 new registers:
MLX5_REG_MTRC_CAP  - Used to read tracer capabilities
MLX5_REG_MTRC_CONF - Used to set tracer configurations
MLX5_REG_MTRC_STDB - Used to query tracer strings database
MLX5_REG_MTRC_CTRL - Used to control the tracer

Tracing capability can be checked using the mcam access register;
therefore, the mcam access register interface will expose the tracer
registers.

Signed-off-by: Feras Daoud
Signed-off-by: Saeed Mahameed
---
 include/linux/mlx5/driver.h   |  4 +++
 include/linux/mlx5/mlx5_ifc.h | 61 ++-
 2 files changed, 64 insertions(+), 1 deletion(-)

diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 1cb1c0317b77..4a4125b4279d 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -138,6 +138,10 @@ enum {
 	MLX5_REG_HOST_ENDIANNESS = 0x7004,
 	MLX5_REG_MCIA = 0x9014,
 	MLX5_REG_MLCR = 0x902b,
+	MLX5_REG_MTRC_CAP = 0x9040,
+	MLX5_REG_MTRC_CONF = 0x9041,
+	MLX5_REG_MTRC_STDB = 0x9042,
+	MLX5_REG_MTRC_CTRL = 0x9043,
 	MLX5_REG_MPCNT = 0x9051,
 	MLX5_REG_MTPPS = 0x9053,
 	MLX5_REG_MTPPSE = 0x9054,
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 1853e7fd6924..bd7b71f54d59 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -8112,7 +8112,9 @@ struct mlx5_ifc_mcam_access_reg_bits {
 	u8 mcqi[0x1];
 	u8 reserved_at_1f[0x1];
 
-	u8 regs_95_to_64[0x20];
+	u8 regs_95_to_68[0x1c];
+	u8 tracer_registers[0x4];
+
 	u8 regs_63_to_32[0x20];
 	u8 regs_31_to_0[0x20];
 };
@@ -9187,4 +9189,61 @@ struct mlx5_ifc_create_uctx_in_bits {
 	struct mlx5_ifc_uctx_bits uctx;
 };
 
+struct mlx5_ifc_mtrc_string_db_param_bits {
+	u8 string_db_base_address[0x20];
+
+	u8 reserved_at_20[0x8];
+	u8 string_db_size[0x18];
+};
+
+struct mlx5_ifc_mtrc_cap_bits {
+	u8 trace_owner[0x1];
+	u8 trace_to_memory[0x1];
+	u8 reserved_at_2[0x4];
+	u8 trc_ver[0x2];
+	u8 reserved_at_8[0x14];
+	u8 num_string_db[0x4];
+
+	u8 first_string_trace[0x8];
+	u8 num_string_trace[0x8];
+	u8 reserved_at_30[0x28];
+
+	u8 log_max_trace_buffer_size[0x8];
+
+	u8 reserved_at_60[0x20];
+
+	struct mlx5_ifc_mtrc_string_db_param_bits string_db_param[8];
+
+	u8 reserved_at_280[0x180];
+};
+
+struct mlx5_ifc_mtrc_conf_bits {
+	u8 reserved_at_0[0x1c];
+	u8 trace_mode[0x4];
+	u8 reserved_at_20[0x18];
+	u8 log_trace_buffer_size[0x8];
+	u8 trace_mkey[0x20];
+	u8 reserved_at_60[0x3a0];
+};
+
+struct mlx5_ifc_mtrc_stdb_bits {
+	u8 string_db_index[0x4];
+	u8 reserved_at_4[0x4];
+	u8 read_size[0x18];
+	u8 start_offset[0x20];
+	u8 string_db_data[0];
+};
+
+struct mlx5_ifc_mtrc_ctrl_bits {
+	u8 trace_status[0x2];
+	u8 reserved_at_2[0x2];
+	u8 arm_event[0x1];
+	u8 reserved_at_5[0xb];
+	u8 modify_field_select[0x10];
+	u8 reserved_at_20[0x2b];
+	u8 current_timestamp52_32[0x15];
+	u8 current_timestamp31_0[0x20];
+	u8 reserved_at_80[0x180];
+};
+
 #endif /* MLX5_IFC_H */
-- 
2.17.0
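`mlx5_ifc_mtrc_ctrl_bits` splits the current timestamp into a 21-bit high part (`current_timestamp52_32[0x15]`) and a 32-bit low part (`current_timestamp31_0[0x20]`), i.e. a 53-bit value (`0x4b + 0x15 = 0x60`, so the high part ends exactly on a dword boundary). How a consumer might recombine the two halves, as a sketch with a hypothetical helper name that is not part of this patch:

```c
#include <assert.h>
#include <stdint.h>

/* Bits 52..32 come from the 21-bit field, bits 31..0 from the 32-bit one. */
static uint64_t mtrc_timestamp(uint32_t ts52_32, uint32_t ts31_0)
{
	assert(ts52_32 < (1u << 21)); /* field is 0x15 = 21 bits wide */
	return ((uint64_t)ts52_32 << 32) | ts31_0;
}
```

The cast to `uint64_t` before shifting matters: shifting the 32-bit value by 32 directly would be undefined behaviour.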
[PATCH mlx5-next 6/8] net/mlx5: Use ERR_CAST() instead of coding it
From: Roi Dayan

This makes it more readable that rule is being used to return an err.

Signed-off-by: Roi Dayan
Signed-off-by: Saeed Mahameed
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index 05e7a5112b74..29b86232f13a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -1825,7 +1825,7 @@ _mlx5_add_flow_rules(struct mlx5_flow_table *ft,
 
 	g = alloc_auto_flow_group(ft, spec);
 	if (IS_ERR(g)) {
-		rule = (void *)g;
+		rule = ERR_CAST(g);
 		up_write_ref_node(&ft->node);
 		return rule;
 	}
-- 
2.17.0
[PATCH mlx5-next 3/8] net/mlx5: Add core support for double vlan push/pop steering action
From: Jianbo Liu As newer firmware supports double push/pop in a single FTE, we add core bits and extend vlan action logic for it. Signed-off-by: Jianbo Liu Reviewed-by: Or Gerlitz Signed-off-by: Saeed Mahameed --- .../ethernet/mellanox/mlx5/core/diag/fs_tracepoint.h | 2 ++ .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 6 +++--- drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c | 12 +--- drivers/net/ethernet/mellanox/mlx5/core/fs_core.c| 4 +++- include/linux/mlx5/fs.h | 4 +++- include/linux/mlx5/mlx5_ifc.h| 11 +-- 6 files changed, 29 insertions(+), 10 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/diag/fs_tracepoint.h b/drivers/net/ethernet/mellanox/mlx5/core/diag/fs_tracepoint.h index 09f178a3fcab..0240aee9189e 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/diag/fs_tracepoint.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/diag/fs_tracepoint.h @@ -138,6 +138,8 @@ TRACE_EVENT(mlx5_fs_del_fg, {MLX5_FLOW_CONTEXT_ACTION_MOD_HDR, "MOD_HDR"},\ {MLX5_FLOW_CONTEXT_ACTION_VLAN_PUSH, "VLAN_PUSH"},\ {MLX5_FLOW_CONTEXT_ACTION_VLAN_POP, "VLAN_POP"},\ + {MLX5_FLOW_CONTEXT_ACTION_VLAN_PUSH_2, "VLAN_PUSH_2"},\ + {MLX5_FLOW_CONTEXT_ACTION_VLAN_POP_2,"VLAN_POP_2"},\ {MLX5_FLOW_CONTEXT_ACTION_FWD_NEXT_PRIO, "NEXT_PRIO"} TRACE_EVENT(mlx5_fs_set_fte, diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c index cecd201f0b73..8f50ce80ff66 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c @@ -70,9 +70,9 @@ mlx5_eswitch_add_offloaded_rule(struct mlx5_eswitch *esw, flow_act.action &= ~(MLX5_FLOW_CONTEXT_ACTION_VLAN_PUSH | MLX5_FLOW_CONTEXT_ACTION_VLAN_POP); else if (flow_act.action & MLX5_FLOW_CONTEXT_ACTION_VLAN_PUSH) { - flow_act.vlan.ethtype = ntohs(attr->vlan_proto); - flow_act.vlan.vid = attr->vlan_vid; - flow_act.vlan.prio = attr->vlan_prio; + flow_act.vlan[0].ethtype = 
ntohs(attr->vlan_proto); + flow_act.vlan[0].vid = attr->vlan_vid; + flow_act.vlan[0].prio = attr->vlan_prio; } if (flow_act.action & MLX5_FLOW_CONTEXT_ACTION_FWD_DEST) { diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c index 5a00deff5457..6a62b84e57f4 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c @@ -349,9 +349,15 @@ static int mlx5_cmd_set_fte(struct mlx5_core_dev *dev, vlan = MLX5_ADDR_OF(flow_context, in_flow_context, push_vlan); - MLX5_SET(vlan, vlan, ethtype, fte->action.vlan.ethtype); - MLX5_SET(vlan, vlan, vid, fte->action.vlan.vid); - MLX5_SET(vlan, vlan, prio, fte->action.vlan.prio); + MLX5_SET(vlan, vlan, ethtype, fte->action.vlan[0].ethtype); + MLX5_SET(vlan, vlan, vid, fte->action.vlan[0].vid); + MLX5_SET(vlan, vlan, prio, fte->action.vlan[0].prio); + + vlan = MLX5_ADDR_OF(flow_context, in_flow_context, push_vlan_2); + + MLX5_SET(vlan, vlan, ethtype, fte->action.vlan[1].ethtype); + MLX5_SET(vlan, vlan, vid, fte->action.vlan[1].vid); + MLX5_SET(vlan, vlan, prio, fte->action.vlan[1].prio); in_match_value = MLX5_ADDR_OF(flow_context, in_flow_context, match_value); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c index 49a75d31185e..05e7a5112b74 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c @@ -1464,7 +1464,9 @@ static bool check_conflicting_actions(u32 action1, u32 action2) MLX5_FLOW_CONTEXT_ACTION_DECAP | MLX5_FLOW_CONTEXT_ACTION_MOD_HDR | MLX5_FLOW_CONTEXT_ACTION_VLAN_POP | -MLX5_FLOW_CONTEXT_ACTION_VLAN_PUSH)) +MLX5_FLOW_CONTEXT_ACTION_VLAN_PUSH | +MLX5_FLOW_CONTEXT_ACTION_VLAN_POP_2 | +MLX5_FLOW_CONTEXT_ACTION_VLAN_PUSH_2)) return true; return false; diff --git a/include/linux/mlx5/fs.h b/include/linux/mlx5/fs.h index 757b4a30281e..c40f2fc68655 100644 --- a/include/linux/mlx5/fs.h 
+++ b/include/linux/mlx5/fs.h @@ -152,6 +152,8 @@ struct mlx5_fs_vlan { u8 prio; }; +#define MLX5_FS_VLAN_DEPTH 2 + struct mlx5_flow_act { u32 action; bool has_flow_tag; @@ -159,7 +161,7 @@ struct mlx5_flow_act { u32 encap_id; u32 modify_id; uintptr_t esp_id; - struct mlx5_fs_vlan
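To make the data-structure change concrete, here is a userspace sketch built on simplified stand-ins for the kernel types. The struct and flag names (`fs_vlan`, `flow_act`, `ACT_VLAN_PUSH`, `ACT_VLAN_PUSH_2`) and their bit values are hypothetical. Only `MLX5_FS_VLAN_DEPTH` and the `vlan[]` indexing come from the patch. The sketch shows how one flow action can carry two VLAN headers once `vlan` becomes an array, with the second push flag set only when a second header is supplied.

```c
#include <stdint.h>

/* Hypothetical userspace stand-ins; only MLX5_FS_VLAN_DEPTH and the
 * vlan[] indexing are taken from the patch. */
#define MLX5_FS_VLAN_DEPTH 2

#define ACT_VLAN_PUSH   (1u << 0)	/* hypothetical bit values */
#define ACT_VLAN_PUSH_2 (1u << 1)

struct fs_vlan {
	uint16_t ethtype;	/* host byte order, like ntohs(attr->vlan_proto) */
	uint16_t vid;
	uint8_t prio;
};

struct flow_act {
	uint32_t action;
	struct fs_vlan vlan[MLX5_FS_VLAN_DEPTH];
};

/* Fill vlan[0] (and vlan[1] when two headers are given) and set the
 * matching push flags. Returns 0 on success, -1 for an invalid count. */
static int flow_act_push_vlans(struct flow_act *act,
			       const struct fs_vlan *hdrs, int nhdrs)
{
	if (nhdrs < 1 || nhdrs > MLX5_FS_VLAN_DEPTH)
		return -1;

	act->vlan[0] = hdrs[0];
	act->action |= ACT_VLAN_PUSH;
	if (nhdrs == 2) {
		act->vlan[1] = hdrs[1];
		act->action |= ACT_VLAN_PUSH_2;
	}
	return 0;
}
```

For example, passing an 802.1ad header (ethtype 0x88a8, VID 100) and an 802.1Q header (ethtype 0x8100, VID 200) sets both flags. The patch does not say which array slot ends up outermost on the wire, so the sketch makes no claim about it.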
[PATCH mlx5-next 0/8] Mellanox, mlx5 updates 2018-07-16
Hi, This series includes mlx5 core infrastructure updates and fixes aimed at the mlx5-next branch. If there are no objections, the patches below will be applied to the mlx5-next branch, and the next mlx5 net-next pull request will start with a merge commit pointing to the last patch in this series. From Eran: - Add MPEGC (Management PCIe General Configuration) registers and bits - Fix tristate and description for MLX5 module From Feras: - Add hardware structures for the firmware tracer From Jianbo: - Core support for double vlan push/pop steering action From Max: - Add XRQ commands definitions From Noa: - Add missing SET_DRIVER_VERSION command translation From Roi: - Use ERR_CAST() instead of coding it From Tariq: - Better return types for CQE API Thanks, Saeed --- Eran Ben Elisha (2): net/mlx5: Expose MPEGC (Management PCIe General Configuration) structures net/mlx5: Fix tristate and description for MLX5 module Feras Daoud (1): net/mlx5: FW tracer, add hardware structures Jianbo Liu (1): net/mlx5: Add core support for double vlan push/pop steering action Max Gurtovoy (1): net/mlx5: Add XRQ commands definitions Noa Osherovich (1): net/mlx5: Add missing SET_DRIVER_VERSION command translation Roi Dayan (1): net/mlx5: Use ERR_CAST() instead of coding it Tariq Toukan (1): net/mlx5: Better return types for CQE API drivers/infiniband/hw/mlx5/Kconfig| 2 +- .../net/ethernet/mellanox/mlx5/core/Kconfig | 6 +- drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 9 ++ .../mellanox/mlx5/core/diag/fs_tracepoint.h | 2 + .../mellanox/mlx5/core/eswitch_offloads.c | 6 +- .../net/ethernet/mellanox/mlx5/core/fs_cmd.c | 12 ++- .../net/ethernet/mellanox/mlx5/core/fs_core.c | 6 +- .../net/ethernet/mellanox/mlx5/core/main.c| 2 +- include/linux/mlx5/device.h | 8 +- include/linux/mlx5/driver.h | 5 + include/linux/mlx5/fs.h | 4 +- include/linux/mlx5/mlx5_ifc.h | 93 ++- 12 files changed, 133 insertions(+), 22 deletions(-) -- 2.17.0
[PATCH net-next] liquidio: correct error msg text when removing VLAN ID
From: Rick Farrington Signed-off-by: Rick Farrington Signed-off-by: Felix Manlunas --- drivers/net/ethernet/cavium/liquidio/lio_main.c| 2 +- drivers/net/ethernet/cavium/liquidio/lio_vf_main.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/cavium/liquidio/lio_main.c b/drivers/net/ethernet/cavium/liquidio/lio_main.c index a60d5af..4edb158 100644 --- a/drivers/net/ethernet/cavium/liquidio/lio_main.c +++ b/drivers/net/ethernet/cavium/liquidio/lio_main.c @@ -2628,7 +2628,7 @@ static int liquidio_vlan_rx_kill_vid(struct net_device *netdev, ret = octnet_send_nic_ctrl_pkt(lio->oct_dev, ); if (ret < 0) { - dev_err(>pci_dev->dev, "Add VLAN filter failed in core (ret: 0x%x)\n", + dev_err(>pci_dev->dev, "Del VLAN filter failed in core (ret: 0x%x)\n", ret); } return ret; diff --git a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c index 7fa0212..b778357 100644 --- a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c +++ b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c @@ -1693,7 +1693,7 @@ liquidio_vlan_rx_kill_vid(struct net_device *netdev, ret = octnet_send_nic_ctrl_pkt(lio->oct_dev, ); if (ret < 0) { - dev_err(>pci_dev->dev, "Add VLAN filter failed in core (ret: 0x%x)\n", + dev_err(>pci_dev->dev, "Del VLAN filter failed in core (ret: 0x%x)\n", ret); } return ret;
Re: [PATCH v2 iproute2-next 06/31] tc/util: add print helpers for JSON
On 7/10/18 3:05 PM, Stephen Hemminger wrote: > From: Stephen Hemminger > > Add a helper to print rate, time and size in numeric or pretty format > based on JSON flag. > > Signed-off-by: Stephen Hemminger > --- > tc/tc_util.c | 83 +--- > tc/tc_util.h | 6 > 2 files changed, 59 insertions(+), 30 deletions(-) This one fails to compile on Stretch: tc CC tc_util.o tc_util.c:388:6: error: conflicting types for ‘print_time’ void print_time(const char *key, const char *fmt, __u32 tm) ^~ In file included from tc_util.c:27:0: tc_util.h:92:6: note: previous declaration of ‘print_time’ was here void print_time(const char *key, const char *fmt, __s32 tm); ^~ ../config.mk:43: recipe for target 'tc_util.o' failed
Re: [PATCH][net-next][v2] net: convert gro_count to bitmask
Hi Li, Thank you for the patch! Yet something to improve: [auto build test ERROR on net-next/master] [also build test ERROR on next-20180713] [cannot apply to v4.18-rc5] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Li-RongQing/net-convert-gro_count-to-bitmask/20180715-233722 config: i386-randconfig-s1-201828 (attached as .config) compiler: gcc-6 (Debian 6.4.0-9) 6.4.0 20171026 reproduce: # save the attached .config to linux build tree make ARCH=i386 :: branch date: 15 hours ago :: commit date: 15 hours ago All errors (new ones prefixed by >>): In file included from arch/x86/include/asm/current.h:5:0, from include/linux/sched.h:12, from include/linux/uaccess.h:5, from net/core/dev.c:75: net/core/dev.c: In function 'netdev_init': >> include/linux/compiler.h:339:38: error: call to '__compiletime_assert_9285' >> declared with attribute error: BUILD_BUG_ON failed: GRO_HASH_BUCKETS > >> FIELD_SIZEOF(struct napi_struct, gro_bitmask) _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__) ^ include/linux/compiler.h:319:4: note: in definition of macro '__compiletime_assert' prefix ## suffix();\ ^~ include/linux/compiler.h:339:2: note: in expansion of macro '_compiletime_assert' _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__) ^~~ include/linux/build_bug.h:45:37: note: in expansion of macro 'compiletime_assert' #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg) ^~ include/linux/build_bug.h:69:2: note: in expansion of macro 'BUILD_BUG_ON_MSG' BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition) ^~~~ net/core/dev.c:9284:2: note: in expansion of macro 'BUILD_BUG_ON' BUILD_BUG_ON(GRO_HASH_BUCKETS > ^~~~ -- In file included from arch/x86/include/asm/current.h:5:0, from include/linux/sched.h:12, from include/linux/uaccess.h:5, from net//core/dev.c:75: net//core/dev.c: In function 'netdev_init': >> 
include/linux/compiler.h:339:38: error: call to '__compiletime_assert_9285' >> declared with attribute error: BUILD_BUG_ON failed: GRO_HASH_BUCKETS > >> FIELD_SIZEOF(struct napi_struct, gro_bitmask) _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__) ^ include/linux/compiler.h:319:4: note: in definition of macro '__compiletime_assert' prefix ## suffix();\ ^~ include/linux/compiler.h:339:2: note: in expansion of macro '_compiletime_assert' _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__) ^~~ include/linux/build_bug.h:45:37: note: in expansion of macro 'compiletime_assert' #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg) ^~ include/linux/build_bug.h:69:2: note: in expansion of macro 'BUILD_BUG_ON_MSG' BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition) ^~~~ net//core/dev.c:9284:2: note: in expansion of macro 'BUILD_BUG_ON' BUILD_BUG_ON(GRO_HASH_BUCKETS > ^~~~ # https://github.com/0day-ci/linux/commit/b4ba3db381100e1869270a58dd2d9950ef0923de git remote add linux-review https://github.com/0day-ci/linux git remote update linux-review git checkout b4ba3db381100e1869270a58dd2d9950ef0923de vim +/__compiletime_assert_9285 +339 include/linux/compiler.h 9a8ab1c3 Daniel Santos 2013-02-21 325 9a8ab1c3 Daniel Santos 2013-02-21 326 #define _compiletime_assert(condition, msg, prefix, suffix) \ 9a8ab1c3 Daniel Santos 2013-02-21 327 __compiletime_assert(condition, msg, prefix, suffix) 9a8ab1c3 Daniel Santos 2013-02-21 328 9a8ab1c3 Daniel Santos 2013-02-21 329 /** 9a8ab1c3 Daniel Santos 2013-02-21 330 * compiletime_assert - break build and emit msg if condition is false 9a8ab1c3 Daniel Santos 2013-02-21 331 * @condition: a compile-time constant condition to check 9a8ab1c3 Daniel Santos 2013-02-21 332 * @msg: a message to emit if condition is false 9a8ab1c3 Daniel Santos 2013-02-21 333 * 9a8ab1c3 Daniel Santos 2013-02-21 334 * In tradition of POSIX assert, this macro will break the build if the 9a8ab1c3 Daniel Santos 
2013-02-21 335 * supplied condition is *false*, emitting the supplied error message if the 9a8ab1c3 Daniel Santos 2013-02-21 336 * compiler has support to do so. 9a8ab1c3 Daniel Santos 2013-02-21 337 */ 9a8ab1c3 Daniel Santos 2013-02-21 338 #define compiletime_assert(condition, msg) \ 9a8ab1c3 Daniel Santos 2013-02-21 @339
Re: [PATCH] net: cavium: Drop dependency of NET_VENDOR_CAVIUM on PCI
Hi Alexander, I love your patch! Yet something to improve: [auto build test ERROR on net-next/master] [also build test ERROR on v4.18-rc5 next-20180713] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Alexander-Sverdlin/net-cavium-Drop-dependency-of-NET_VENDOR_CAVIUM-on-PCI/20180716-002448 config: s390-defconfig (attached as .config) compiler: s390x-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0 reproduce: wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # save the attached .config to linux build tree GCC_VERSION=7.2.0 make.cross ARCH=s390 :: branch date: 15 hours ago :: commit date: 15 hours ago All error/warnings (new ones prefixed by >>): drivers/net/ethernet/cavium/common/cavium_ptp.c: In function 'cavium_ptp_probe': >> drivers/net/ethernet/cavium/common/cavium_ptp.c:235:8: error: implicit >> declaration of function 'pcim_enable_device'; did you mean >> 'pci_enable_device'? 
[-Werror=implicit-function-declaration] err = pcim_enable_device(pdev); ^~ pci_enable_device drivers/net/ethernet/cavium/common/cavium_ptp.c: At top level: >> drivers/net/ethernet/cavium/common/cavium_ptp.c:339:1: warning: data >> definition has no type or storage class module_pci_driver(cavium_ptp_driver); ^ >> drivers/net/ethernet/cavium/common/cavium_ptp.c:339:1: error: type defaults >> to 'int' in declaration of 'module_pci_driver' [-Werror=implicit-int] >> drivers/net/ethernet/cavium/common/cavium_ptp.c:339:1: warning: parameter >> names (without types) in function declaration drivers/net/ethernet/cavium/common/cavium_ptp.c:332:26: warning: 'cavium_ptp_driver' defined but not used [-Wunused-variable] static struct pci_driver cavium_ptp_driver = { ^ cc1: some warnings being treated as errors # https://github.com/0day-ci/linux/commit/c862aa8f427828f2c08fdc96494152690a2ec5d0 git remote add linux-review https://github.com/0day-ci/linux git remote update linux-review git checkout c862aa8f427828f2c08fdc96494152690a2ec5d0 vim +235 drivers/net/ethernet/cavium/common/cavium_ptp.c 8c56df37 Radoslaw Biernacki 2018-01-15 216 8c56df37 Radoslaw Biernacki 2018-01-15 217 static int cavium_ptp_probe(struct pci_dev *pdev, 8c56df37 Radoslaw Biernacki 2018-01-15 218 const struct pci_device_id *ent) 8c56df37 Radoslaw Biernacki 2018-01-15 219 { 8c56df37 Radoslaw Biernacki 2018-01-15 220 struct device *dev = >dev; 8c56df37 Radoslaw Biernacki 2018-01-15 221 struct cavium_ptp *clock; 8c56df37 Radoslaw Biernacki 2018-01-15 222 struct cyclecounter *cc; 8c56df37 Radoslaw Biernacki 2018-01-15 223 u64 clock_cfg; 8c56df37 Radoslaw Biernacki 2018-01-15 224 u64 clock_comp; 8c56df37 Radoslaw Biernacki 2018-01-15 225 int err; 8c56df37 Radoslaw Biernacki 2018-01-15 226 8c56df37 Radoslaw Biernacki 2018-01-15 227 clock = devm_kzalloc(dev, sizeof(*clock), GFP_KERNEL); 8c56df37 Radoslaw Biernacki 2018-01-15 228 if (!clock) { 8c56df37 Radoslaw Biernacki 2018-01-15 229 err = -ENOMEM; 8c56df37 Radoslaw 
Biernacki 2018-01-15 230 goto error; 8c56df37 Radoslaw Biernacki 2018-01-15 231 } 8c56df37 Radoslaw Biernacki 2018-01-15 232 8c56df37 Radoslaw Biernacki 2018-01-15 233 clock->pdev = pdev; 8c56df37 Radoslaw Biernacki 2018-01-15 234 8c56df37 Radoslaw Biernacki 2018-01-15 @235 err = pcim_enable_device(pdev); 8c56df37 Radoslaw Biernacki 2018-01-15 236 if (err) 8c56df37 Radoslaw Biernacki 2018-01-15 237 goto error_free; 8c56df37 Radoslaw Biernacki 2018-01-15 238 8c56df37 Radoslaw Biernacki 2018-01-15 239 err = pcim_iomap_regions(pdev, 1 << PCI_PTP_BAR_NO, pci_name(pdev)); 8c56df37 Radoslaw Biernacki 2018-01-15 240 if (err) 8c56df37 Radoslaw Biernacki 2018-01-15 241 goto error_free; 8c56df37 Radoslaw Biernacki 2018-01-15 242 8c56df37 Radoslaw Biernacki 2018-01-15 243 clock->reg_base = pcim_iomap_table(pdev)[PCI_PTP_BAR_NO]; 8c56df37 Radoslaw Biernacki 2018-01-15 244 8c56df37 Radoslaw Biernacki 2018-01-15 245 spin_lock_init(>spin_lock); 8c56df37 Radoslaw Biernacki 2018-01-15 246 8c56df37 Radoslaw Biernacki 2018-01-15 247 cc = >cycle_counter; 8c56df37 Radoslaw Biernacki 2018-01-15 248 cc->read = cavium_ptp_cc_read; 8c56df37 Radoslaw Biernacki 2018-01-15 249 cc->mask = CYCLECOUNTER_MASK(64); 8c56df37 Radoslaw Biernacki 2018-01-15 250 cc->mult = 1; 8c56df37 Radoslaw Biernacki 2018-01-15 251 cc->shift = 0; 8c56df37 Radoslaw Biernacki 2018-01-15 252 8c56df37 Radoslaw Biernacki 2018-01-15 253 timecounter_init(>time_counte
Re: [PATCH next] bonding: pass link-local packets to bonding master also.
On Mon, 16 Jul 2018 16:57:22 -0700 Mahesh Bandewar (महेश बंडेवार) wrote: > On Mon, Jul 16, 2018 at 4:33 PM, Stephen Hemminger > wrote: > > On Sun, 15 Jul 2018 18:12:46 -0700 > > Mahesh Bandewar wrote: > > > >> From: Mahesh Bandewar > >> > >> Commit b89f04c61efe ("bonding: deliver link-local packets with > >> skb->dev set to link that packets arrived on") changed the behavior > >> of how link-local-multicast packets are processed. The change in > >> the behavior broke some legacy use cases where these packets are > >> expected to arrive on bonding master device also. > >> > >> This patch passes the packet to the stack with the link it arrived > >> on as well as passes to the bonding-master device to preserve the > >> legacy use case. > >> > >> Reported-by: Michal Soltys > >> Signed-off-by: Mahesh Bandewar > > > > Thanks for fixing this. > > > > Why not add a Fixes: tag instead of just talking about the commit? > > That helps the stable maintainers know which versions of the kernel > > need the patch. > Well, I thought about it. It's definitely 'related' but not sure it > 'fixes' in true sense. It definitely fixes the broken legacy case > though. Is that sufficient to add 'fixes' tag? The previous commit caused a regression. Your change fixes the regression.
Re: [PATCH][net-next][v2] net: convert gro_count to bitmask
From: Eric Dumazet Date: Mon, 16 Jul 2018 16:40:52 -0700 > I guess we could either use BITS_PER_LONG or : > > diff --git a/net/core/dev.c b/net/core/dev.c > index > c883b17ee0fe2c8a7ca2f2867560ba74004790a7..4f8b92d81d107fc9acd2499297435cbd9e9b5c67 > 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -9282,7 +9282,7 @@ static struct hlist_head * __net_init > netdev_create_hash(void) Commited thusly: [PATCH] net: Fix GRO_HASH_BUCKETS assertion. FIELD_SIZEOF() is in bytes, but we want bits. Fixes: d9f37d01e294 ("net: convert gro_count to bitmask") Suggested-by: Eric Dumazet Signed-off-by: David S. Miller --- net/core/dev.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/core/dev.c b/net/core/dev.c index c883b17ee0fe..4f8b92d81d10 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -9282,7 +9282,7 @@ static struct hlist_head * __net_init netdev_create_hash(void) static int __net_init netdev_init(struct net *net) { BUILD_BUG_ON(GRO_HASH_BUCKETS > - FIELD_SIZEOF(struct napi_struct, gro_bitmask)); +8 * FIELD_SIZEOF(struct napi_struct, gro_bitmask)); if (net != _net) INIT_LIST_HEAD(>dev_base_head); -- 2.13.6
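The unit mismatch also explains why only the 32-bit randconfig build broke. The sketch below uses userspace stand-ins and assumes `GRO_HASH_BUCKETS` is 8 and `gro_bitmask` is an `unsigned long`. With 4-byte longs (i386) the original check compares 8 > 4 and fires. With 8-byte longs (x86_64) it compares 8 > 8 and passes, even though it is comparing bits with bytes.

```c
/* Userspace stand-ins. The kernel's FIELD_SIZEOF(t, f) is
 * sizeof(((t *)0)->f), i.e. a size in bytes. */
#define FIELD_SIZEOF(t, f) (sizeof(((t *)0)->f))
#define GRO_HASH_BUCKETS 8	/* assumed value */

struct napi_like {
	unsigned long gro_bitmask;	/* one bit per GRO hash bucket */
};

/* Original check: compares a bucket (bit) count with a byte count.
 * Returns 1 where BUILD_BUG_ON would have fired. */
static int buggy_check_fires(void)
{
	return GRO_HASH_BUCKETS > FIELD_SIZEOF(struct napi_like, gro_bitmask);
}

/* Committed check: convert bytes to bits before comparing. */
_Static_assert(!(GRO_HASH_BUCKETS >
		 8 * FIELD_SIZEOF(struct napi_like, gro_bitmask)),
	       "gro_bitmask has fewer bits than GRO_HASH_BUCKETS");
```

Eric's other suggestion, comparing against `BITS_PER_LONG`, states the same bound directly in bits.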
Re: [PATCH next] bonding: pass link-local packets to bonding master also.
On Mon, Jul 16, 2018 at 4:33 PM, Stephen Hemminger wrote: > On Sun, 15 Jul 2018 18:12:46 -0700 > Mahesh Bandewar wrote: > >> From: Mahesh Bandewar >> >> Commit b89f04c61efe ("bonding: deliver link-local packets with >> skb->dev set to link that packets arrived on") changed the behavior >> of how link-local-multicast packets are processed. The change in >> the behavior broke some legacy use cases where these packets are >> expected to arrive on bonding master device also. >> >> This patch passes the packet to the stack with the link it arrived >> on as well as passes to the bonding-master device to preserve the >> legacy use case. >> >> Reported-by: Michal Soltys >> Signed-off-by: Mahesh Bandewar > > Thanks for fixing this. > > Why not add a Fixes: tag instead of just talking about the commit? > That helps the stable maintainers know which versions of the kernel > need the patch. Well, I thought about it. It's definitely 'related' but not sure it 'fixes' in true sense. It definitely fixes the broken legacy case though. Is that sufficient to add 'fixes' tag?
Re: [PATCH next] bonding: pass link-local packets to bonding master also.
On Mon, Jul 16, 2018 at 2:24 PM, Jay Vosburgh wrote: > Mahesh Bandewar wrote: > >>From: Mahesh Bandewar >> >>Commit b89f04c61efe ("bonding: deliver link-local packets with >>skb->dev set to link that packets arrived on") changed the behavior >>of how link-local-multicast packets are processed. The change in >>the behavior broke some legacy use cases where these packets are >>expected to arrive on bonding master device also. >> >>This patch passes the packet to the stack with the link it arrived >>on as well as passes to the bonding-master device to preserve the >>legacy use case. > > Michal, can you test this? I'm travelling this week and won't > be able to run the patch. > > Mahesh, will this confuse LLDP, et al, daemons that, e.g., bind > to every possible interface and now see the same LLDP PDU (identical > Chassis ID, Port ID, et al, TLVs) on multiple interfaces? > Well, it's hard to say. In the previous world, when these packets used to appear only on the bonding master, that service had to go to extra lengths to figure out which link a packet actually came in on. With the earlier change (SHA1: b89f04c61efe) it didn't have to, but with this patch the best thing they could do is just ignore those packets coming from (any) virtual device. The only reason I'm OK with this change is that the L2 of a physical link is shared with a virtual link (the bonding master), and hence both links receiving the same link-local multicast seems acceptable. Making them appear only on the bonding master is just wrong, while correcting that behavior breaks the legacy use case, and here we are. BTW, when links are aggregated and using LACP, these packets don't arrive with the system MAC but with the real MAC of the sender and a multicast destination MAC. --mahesh..
> Thanks, > > -J > >>Reported-by: Michal Soltys >>Signed-off-by: Mahesh Bandewar >>--- >> drivers/net/bonding/bond_main.c | 17 +++-- >> 1 file changed, 15 insertions(+), 2 deletions(-) >> >>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c >>index 9a2ea3c1f949..1d3b7d8448f2 100644 >>--- a/drivers/net/bonding/bond_main.c >>+++ b/drivers/net/bonding/bond_main.c >>@@ -1177,9 +1177,22 @@ static rx_handler_result_t bond_handle_frame(struct >>sk_buff **pskb) >> } >> } >> >>- /* don't change skb->dev for link-local packets */ >>- if (is_link_local_ether_addr(eth_hdr(skb)->h_dest)) >>+ /* Link-local multicast packets should be passed to the >>+ * stack on the link they arrive as well as pass them to the >>+ * bond-master device. These packets are mostly usable when >>+ * stack receives it with the link on which they arrive >>+ * (e.g. LLDP) but there may be some legacy behavior that >>+ * expects these packets to appear on bonding master too. >>+ */ >>+ if (is_link_local_ether_addr(eth_hdr(skb)->h_dest)) { >>+ struct sk_buff *nskb = skb_clone(skb, GFP_ATOMIC); >>+ >>+ if (nskb) { >>+ nskb->dev = bond->dev; >>+ netif_rx(nskb); >>+ } >> return RX_HANDLER_PASS; >>+ } >> if (bond_should_deliver_exact_match(skb, slave, bond)) >> return RX_HANDLER_EXACT; >> >>-- >>2.18.0.203.gfac676dfb9-goog > > --- > -Jay Vosburgh, jay.vosbu...@canonical.com
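Mahesh suggests that daemons which bind to every interface could ignore link-local frames seen on virtual devices. One way a userspace daemon might tell the two kinds of interface apart is sketched below. The sysfs heuristic (physical NICs have a `/sys/class/net/<ifname>/device` link, software devices such as bond, veth and lo do not) is an assumption of this sketch, not something proposed in the thread, and both function names are hypothetical.

```c
#include <stdio.h>
#include <unistd.h>

/* Heuristic (assumption): a physical NIC has a "device" link under
 * /sys/class/net/<ifname>/, software devices such as bond0, veth or lo
 * do not. Returns 1 for virtual, 0 for physical, -1 on a bad name. */
static int iface_is_virtual(const char *ifname)
{
	char path[128];
	int n = snprintf(path, sizeof(path), "/sys/class/net/%s/device", ifname);

	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	return access(path, F_OK) != 0;
}

/* Hypothetical policy for a daemon bound to every interface: handle a
 * link-local frame (e.g. LLDP) only on the physical slave it arrived on,
 * and ignore the copy delivered to the bonding master. */
static int should_handle_link_local(const char *ifname)
{
	return iface_is_virtual(ifname) == 0;
}
```

A heuristic like this keeps each LLDP PDU attributed to the slave it arrived on, which addresses Jay's concern about identical PDUs on multiple interfaces without relying on the bond master's copy.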
[net:master 66/72] drivers/net/hyperv/rndis_filter.c:1341:16: sparse: Using plain integer as NULL pointer
tree: https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git master head: 3578a7ecb69920efc3885dbd610e98c00dbdf5db commit: 916c5e1413be058d1c1f6e502db350df890730ce [66/72] hv/netvsc: fix handling of fallback to single queue mode reproduce: # apt-get install sparse git checkout 916c5e1413be058d1c1f6e502db350df890730ce make ARCH=x86_64 allmodconfig make C=1 CF=-D__CHECK_ENDIAN__ sparse warnings: (new ones prefixed by >>) drivers/net/hyperv/rndis_filter.c:1307:31: sparse: expression using sizeof(void) drivers/net/hyperv/rndis_filter.c:1307:31: sparse: expression using sizeof(void) drivers/net/hyperv/rndis_filter.c:1310:31: sparse: expression using sizeof(void) drivers/net/hyperv/rndis_filter.c:1313:31: sparse: expression using sizeof(void) drivers/net/hyperv/rndis_filter.c:1313:31: sparse: expression using sizeof(void) >> drivers/net/hyperv/rndis_filter.c:1341:16: sparse: Using plain integer as >> NULL pointer vim +1341 drivers/net/hyperv/rndis_filter.c 1224 1225 struct netvsc_device *rndis_filter_device_add(struct hv_device *dev, 1226struct netvsc_device_info *device_info) 1227 { 1228 struct net_device *net = hv_get_drvdata(dev); 1229 struct netvsc_device *net_device; 1230 struct rndis_device *rndis_device; 1231 struct ndis_recv_scale_cap rsscap; 1232 u32 rsscap_size = sizeof(struct ndis_recv_scale_cap); 1233 u32 mtu, size; 1234 u32 num_possible_rss_qs; 1235 int i, ret; 1236 1237 rndis_device = get_rndis_device(); 1238 if (!rndis_device) 1239 return ERR_PTR(-ENODEV); 1240 1241 /* Let the inner driver handle this first to create the netvsc channel 1242 * NOTE! 
Once the channel is created, we may get a receive callback 1243 * (RndisFilterOnReceive()) before this call is completed 1244 */ 1245 net_device = netvsc_device_add(dev, device_info); 1246 if (IS_ERR(net_device)) { 1247 kfree(rndis_device); 1248 return net_device; 1249 } 1250 1251 /* Initialize the rndis device */ 1252 net_device->max_chn = 1; 1253 net_device->num_chn = 1; 1254 1255 net_device->extension = rndis_device; 1256 rndis_device->ndev = net; 1257 1258 /* Send the rndis initialization message */ 1259 ret = rndis_filter_init_device(rndis_device, net_device); 1260 if (ret != 0) 1261 goto err_dev_remv; 1262 1263 /* Get the MTU from the host */ 1264 size = sizeof(u32); 1265 ret = rndis_filter_query_device(rndis_device, net_device, 1266 RNDIS_OID_GEN_MAXIMUM_FRAME_SIZE, 1267 , ); 1268 if (ret == 0 && size == sizeof(u32) && mtu < net->mtu) 1269 net->mtu = mtu; 1270 1271 /* Get the mac address */ 1272 ret = rndis_filter_query_device_mac(rndis_device, net_device); 1273 if (ret != 0) 1274 goto err_dev_remv; 1275 1276 memcpy(device_info->mac_adr, rndis_device->hw_mac_adr, ETH_ALEN); 1277 1278 /* Get friendly name as ifalias*/ 1279 if (!net->ifalias) 1280 rndis_get_friendly_name(net, rndis_device, net_device); 1281 1282 /* Query and set hardware capabilities */ 1283 ret = rndis_netdev_set_hwcaps(rndis_device, net_device); 1284 if (ret != 0) 1285 goto err_dev_remv; 1286 1287 rndis_filter_query_device_link_status(rndis_device, net_device); 1288 1289 netdev_dbg(net, "Device MAC %pM link state %s\n", 1290 rndis_device->hw_mac_adr, 1291 rndis_device->link_state ? 
"down" : "up"); 1292 1293 if (net_device->nvsp_version < NVSP_PROTOCOL_VERSION_5) 1294 goto out; 1295 1296 rndis_filter_query_link_speed(rndis_device, net_device); 1297 1298 /* vRSS setup */ 1299 memset(, 0, rsscap_size); 1300 ret = rndis_filter_query_device(rndis_device, net_device, 1301 OID_GEN_RECEIVE_SCALE_CAPABILITIES, 1302 , _size); 1303 if (ret || rsscap.num_recv_que < 2) 1304 goto out; 1305 1306 /* This guarantees that num_possible_rss_qs <= num_online_cpus */ > 1307 num_possible_rss_qs = min_t(u32, num_online_cpus(), 1308 rsscap.num_recv_que); 1309 1310 net_device->max_chn = min_t(u32, VRSS_CHANNEL_MAX, num_possible_rss_qs); 1311
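The excerpt stops at line 1311, so the statement that sparse flags at line 1341 is not shown. In general, this warning fires when a function that returns a pointer uses the literal `0` as its null pointer. The two hypothetical helpers below return the same value, but only the second one passes sparse cleanly.

```c
#include <stddef.h>

struct netvsc_device;	/* an incomplete type is enough for pointer returns */

/* Hypothetical helpers. Both return a null pointer, but sparse reports
 * "Using plain integer as NULL pointer" for the first one. */
static struct netvsc_device *flagged_by_sparse(void)
{
	return 0;
}

static struct netvsc_device *sparse_clean(void)
{
	return NULL;
}
```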
Re: [PATCH][net-next][v2] net: convert gro_count to bitmask
On 07/12/2018 11:41 PM, Li RongQing wrote: > gro_hash size is 192 bytes, and uses 3 cache lines, if there is few */ > @@ -9264,6 +9273,9 @@ static struct hlist_head * __net_init > netdev_create_hash(void) > /* Initialize per network namespace state */ > static int __net_init netdev_init(struct net *net) > { > + BUILD_BUG_ON(GRO_HASH_BUCKETS > > + FIELD_SIZEOF(struct napi_struct, gro_bitmask)); > + Sorry for the delay (patch is already merged) This looks wrong to me. FIELD_SIZEOF() is in bytes not bits. I guess we could either use BITS_PER_LONG or : diff --git a/net/core/dev.c b/net/core/dev.c index c883b17ee0fe2c8a7ca2f2867560ba74004790a7..4f8b92d81d107fc9acd2499297435cbd9e9b5c67 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -9282,7 +9282,7 @@ static struct hlist_head * __net_init netdev_create_hash(void) static int __net_init netdev_init(struct net *net) { BUILD_BUG_ON(GRO_HASH_BUCKETS > - FIELD_SIZEOF(struct napi_struct, gro_bitmask)); +8 * FIELD_SIZEOF(struct napi_struct, gro_bitmask)); if (net != _net) INIT_LIST_HEAD(>dev_base_head);
Re: [PATCH net-next 4/4] act_mirred: use ACT_REDIRECT when possible
On Fri, Jul 13, 2018 at 2:55 AM Paolo Abeni wrote: > > When mirred is invoked from the ingress path, and it wants to redirect > the processed packet, it can now use the ACT_REDIRECT action, > filling the tcf_result accordingly. > > This avoids a skb_clone() in the TC S/W data path giving a ~10% > improvement in forwarding performances. Overall TC S/W performances > are now comparable to the kernel openswitch datapath. Avoiding skb_clone() for redirection is cool, but why do we need to use skb_do_redirect() here? There is a subtle difference here: skb_do_redirect() calls __bpf_rx_skb(), which calls dev_forward_skb() (which scrubs the packet), while the current mirred action doesn't scrub packets when redirecting to ingress (from egress), though I forget whether that is intentional. Also, skb->skb_iif is unset in skb_do_redirect() when redirecting to ingress; I recall we have to set it correctly for input routing. Probably yet another reason why we can't scrub it, unless my memory goes wrong. :) Thanks!
Re: [PATCH bpf-next 0/2] tools: bpf: build cleanups
On Tue, Jul 17, 2018 at 12:34:03AM +0200, Daniel Borkmann wrote: > On 07/16/2018 07:57 PM, Jakub Kicinski wrote: > > Hi! > > > > While tracking down the perf vs libbpf vs reallocarray build issue > > I noticed libbpf is checking for a feature it never uses and that > > bpftool's makefile attempt to reuse feature dump doesn't really > > make sense. > > > > Jakub Kicinski (2): > > tools: libbpf: remove libelf-getphdrnum feature detection > > tools: bpftool: don't pass FEATURES_DUMP to libbpf > > > > tools/bpf/bpftool/Makefile | 2 +- > > tools/lib/bpf/Makefile | 6 +- > > 2 files changed, 2 insertions(+), 6 deletions(-) > > > > Acked-by: Daniel Borkmann Somehow the cover letter didn't make it into patchwork, so I applied both patches manually to bpf-next and propagated Daniel's Ack. Thanks!
Re: [PATCH next] bonding: pass link-local packets to bonding master also.
On Sun, 15 Jul 2018 18:12:46 -0700 Mahesh Bandewar wrote: > From: Mahesh Bandewar > > Commit b89f04c61efe ("bonding: deliver link-local packets with > skb->dev set to link that packets arrived on") changed the behavior > of how link-local-multicast packets are processed. The change in > the behavior broke some legacy use cases where these packets are > expected to arrive on bonding master device also. > > This patch passes the packet to the stack with the link it arrived > on as well as passes to the bonding-master device to preserve the > legacy use case. > > Reported-by: Michal Soltys > Signed-off-by: Mahesh Bandewar Thanks for fixing this. Why not add a Fixes: tag instead of just talking about the commit? That helps the stable maintainers know which versions of the kernel need the patch.
Re: [PATCH v2 net-next 01/14] net: Clear skb->tstamp only on the forwarding path
On 07/16/2018 02:52 PM, Jesus Sanchez-Palencia wrote: > Hi Eric, > > > > On 07/13/2018 10:35 AM, Eric Dumazet wrote: >> >> >> On 07/03/2018 03:42 PM, Jesus Sanchez-Palencia wrote: >>> This is done in preparation for the upcoming time based transmission >>> patchset. Now that skb->tstamp will be used to hold packet's txtime, >>> we must ensure that it is being cleared when traversing namespaces. >>> Also, doing that from skb_scrub_packet() before the early return would >>> break our feature when tunnels are used. >>> >>> Signed-off-by: Jesus Sanchez-Palencia >>> --- >>> net/core/skbuff.c | 2 +- >>> 1 file changed, 1 insertion(+), 1 deletion(-) >>> >>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c >>> index 1357f36c8a5e..c4e24ac27464 100644 >>> --- a/net/core/skbuff.c >>> +++ b/net/core/skbuff.c >>> @@ -4898,7 +4898,6 @@ EXPORT_SYMBOL(skb_try_coalesce); >>> */ >>> void skb_scrub_packet(struct sk_buff *skb, bool xnet) >>> { >>> - skb->tstamp = 0; >>> skb->pkt_type = PACKET_HOST; >>> skb->skb_iif = 0; >>> skb->ignore_df = 0; >>> @@ -4912,6 +4911,7 @@ void skb_scrub_packet(struct sk_buff *skb, bool xnet) >>> >>> ipvs_reset(skb); >>> skb->mark = 0; >>> + skb->tstamp = 0; >>> } >>> EXPORT_SYMBOL_GPL(skb_scrub_packet); >>> >>> >> >> >> >> I believe we had some misunderstanding here. >> >> What I meant by forwarding is the following case : >> >> - We receive a packet. >> - netstamp_wanted is >0 (because at least one packet capture is active) >> - __net_timestamp() is called and does : >> skb->tstamp = ktime_get_real(); >> >> Then this skb is forwarded into an interface where EDT is taken into >> consideration by either a qdisc or a device. >> >> Since CLOCK_TAI is a different base than CLOCK_REALTIME, we might have a >> problem. > > > I'm not sure we have a problem here. For the Tx path I only see > net_timestamp_set() being called from dev_queue_xmit_nit(). And even there, > it's > a clone of the skb that gets timestamped. 
> > I believe the original skb, which had the valid txtime copied into > skb->tstamp, > is not modified anywhere along that path. > > What am I missing, please? > > Thanks, > Jesus > I am simply stating that a linux router, receiving packet on ethX and forwarding them on ethY, could have a problem if ethY has a qdisc looking at skb->tstamp assuming a timestamp in CLOCK_TAI base. In this case, skb->tstamp would have been set at ingress (not using CLOCK_TAI but CLOCK_REALTIME), and would be read at egress (assuming CLOCK_TAI) Normal IPV4 routing path would be in net/ipv4/ip_forward.c, no scrubbing ever happens, and no cloning either. Your patch (Clear skb->tstamp only on the forwarding path) is not handling the typical forward path, only the cases where 'scrubbing' is used. > > >> >> >> Solutions for this problem : >> >> 1) Convert all our skb->tstamp usages to CLOCK_TAI base. >> >> or >> >> 2) clear skb->tstamp in forwarding paths, including the ones not scrubbing >> the packet. >> >> My preference is 1), even if it is a bit more work. >>
Re: [PATCH bpf-next 0/2] tools: bpf: build cleanups
On 07/16/2018 07:57 PM, Jakub Kicinski wrote: > Hi! > > While tracking down the perf vs libbpf vs reallocarray build issue > I noticed libbpf is checking for a feature it never uses and that > bpftool's makefile attempt to reuse feature dump doesn't really > make sense. > > Jakub Kicinski (2): > tools: libbpf: remove libelf-getphdrnum feature detection > tools: bpftool: don't pass FEATURES_DUMP to libbpf > > tools/bpf/bpftool/Makefile | 2 +- > tools/lib/bpf/Makefile | 6 +- > 2 files changed, 2 insertions(+), 6 deletions(-) > Acked-by: Daniel Borkmann
Re: [PATCH ipsec-next] xfrm: Allow Set Mark to be Updated Using UPDSA
< re-sent with apologies due to incorrect formatting last time... :-( > Hi Eyal, > If x1 points to a state previously found using __xfrm_state_locate(x), > won't __xfrm_state_bump_genids(x1) be equivalent to x1->genid++ in > this case? In the vanilla case this is true, i.e., if there are no strange/abusive uses of the API such as the test below where multiple SAs can match the locate(). > Is it possible that other states will match all of x1 parameters? Yes. Not sure if it's a bug or a feature, but it's possible for multiple SAs to match... for a depressing example, check out https://android-review.googlesource.com/c/kernel/tests/+/680958. There may be cases where something like this is desired behavior that I'm not aware of. Since this is the control path, it felt to me like the formalism of using xfrm_state_bump_genids() was worth it to avoid walking into a different subtle bug later. > Also, any idea why this isn't needed for other changes in the state? The set_mark (output_mark) is somewhat special because changing this mark impacts the routing lookup, which, up to now, none of the other parameters in the update_sa function do. A new output_mark can and will reroute packets to different interfaces. Thus, when we change this thing, we want to ensure that we always build a new bundle with a new route lookup based on the new set_mark. Since we removed the flow cache, things might *incidentally* seem to work right now, but I think that's incidental rather than correct. By bumping the genid, we get the dst_entry->check() function to correctly return that the dst is obsolete when we call check(). I'm honestly not sure what corner cases we could land in if we didn't bump the genid in such a case. There's definitely a lot going on behind the scenes in this little change that I only tenuously grasp, so it's possible that I'm being overly cautious in this case. Please let me know your further thoughts on whether we need to bump the genid.
FYI once this patch is settled, I plan to upload a patch to update the xfrm_if_id, which I planned to nestle into this same logic (and with similar, albeit possibly more straightforward, rationale). -Nathan On Mon, Jul 2, 2018 at 10:14 PM, Eyal Birger wrote: > Hi Nathan, > > On Fri, 29 Jun 2018 15:07:10 -0700 > Nathan Harold wrote: > >> Allow UPDSA to change "set mark" to permit >> policy separation of packet routing decisions from >> SA keying in systems that use mark-based routing. >> >> The set mark, used as a routing and firewall mark >> for outbound packets, is made update-able which >> allows routing decisions to be handled independently >> of keying/SA creation. To maintain consistency with >> other optional attributes, the set mark is only >> updated if sent with a non-zero value. >> >> The per-SA lock and the xfrm_state_lock are taken in >> that order to avoid a deadlock with >> xfrm_timer_handler(), which also takes the locks in >> that order. >> >> Signed-off-by: Nathan Harold >> Change-Id: Ia05c6733a94c1901cd1e54eb7c7e237704678d71 >> --- >> net/xfrm/xfrm_state.c | 9 + >> 1 file changed, 9 insertions(+) >> >> diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c >> index e04a510ec992..c9ffcdfa89f6 100644 >> --- a/net/xfrm/xfrm_state.c >> +++ b/net/xfrm/xfrm_state.c >> @@ -1562,6 +1562,15 @@ int xfrm_state_update(struct xfrm_state *x) >> if (x1->curlft.use_time) >> xfrm_state_check_expire(x1); >> >> + if (x->props.smark.m || x->props.smark.v) { >> + spin_lock_bh(&net->xfrm.xfrm_state_lock); >> + >> + x1->props.smark = x->props.smark; >> + >> + __xfrm_state_bump_genids(x1); > > So I'm trying to wrap my head around this genid thing :) > > If x1 points to a state previously found using __xfrm_state_locate(x), > won't __xfrm_state_bump_genids(x1) be equivalent to x1->genid++ in > this case? > > Is it possible that other states will match all of x1 parameters? > > Also, any idea why this isn't needed for other changes in the state? > > Thanks! > Eyal.
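Nathan's point is that bumping the genid is what makes a cached route report itself obsolete the next time check() runs, which forces a fresh lookup with the new mark. A minimal userspace model of that generation-counter pattern follows. It assumes a bundle simply snapshots the SA's genid when it is built; every struct and function name is hypothetical and not part of the xfrm API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for an SA. */
struct sa_example {
	uint32_t genid;       /* bumped whenever a routing-relevant field changes */
	uint32_t output_mark; /* mark used for the outbound route lookup */
};

/* Hypothetical, simplified stand-in for a cached route bundle. */
struct bundle_example {
	uint32_t genid_snapshot; /* SA genid recorded when the route was built */
	uint32_t output_mark;    /* mark the route lookup was done with */
};

/* "Route lookup": build a bundle from the SA's current state. */
static void bundle_build(struct bundle_example *b, const struct sa_example *sa)
{
	b->genid_snapshot = sa->genid;
	b->output_mark = sa->output_mark;
}

/* Analogue of a dst check(): a stale snapshot means "obsolete, redo lookup". */
static bool bundle_still_valid(const struct bundle_example *b,
			       const struct sa_example *sa)
{
	return b->genid_snapshot == sa->genid;
}

/* Change the mark and bump the generation so cached bundles are invalidated. */
static void sa_update_mark(struct sa_example *sa, uint32_t mark)
{
	sa->output_mark = mark;
	sa->genid++;
}
```

Without the genid++ in sa_update_mark(), bundle_still_valid() would keep returning true and the old route, looked up with the old mark, would stay in use.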
[PATCH iproute2 net-next] ipneigh: exclude NTF_EXT_LEARNED from default filter
From: Roopa Prabhu NUD_NOARP entries are filtered out by default by iproute2. We don't want NUD_NOARP entries with the NTF_EXT_LEARNED flag filtered out. This patch extends the default filter check for ip neigh show to include the NTF_EXT_LEARNED flag. Signed-off-by: Roopa Prabhu --- ip/ipneigh.c | 1 + 1 file changed, 1 insertion(+) diff --git a/ip/ipneigh.c b/ip/ipneigh.c index bd6e5c5..a0af705 100644 --- a/ip/ipneigh.c +++ b/ip/ipneigh.c @@ -262,6 +262,7 @@ int print_neigh(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg) return 0; if (!(filter.state&r->ndm_state) && !(r->ndm_flags & NTF_PROXY) && + !(r->ndm_flags & NTF_EXT_LEARNED) && (r->ndm_state || !(filter.state&0x100)) && (r->ndm_family != AF_DECnet)) return 0; -- 2.1.4
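The default filter predicate after this patch can be modelled on its own. This is a simplified sketch, not iproute2 code. The AF_DECnet clause is dropped, the helper name is hypothetical, and it assumes the default show filter is every NUD state except NUD_NOARP (0xFF & ~NUD_NOARP). The flag and state values are copied from <linux/neighbour.h>.

```c
#include <stdbool.h>

/* Values as defined in <linux/neighbour.h>. */
#define EX_NUD_REACHABLE    0x02
#define EX_NUD_NOARP        0x40
#define EX_NTF_PROXY        0x08
#define EX_NTF_EXT_LEARNED  0x10

/*
 * Simplified model of the patched "ip neigh show" skip test.
 * Returns true if the entry is hidden by the given filter state.
 */
static bool neigh_hidden_by_default(unsigned int ndm_state,
				    unsigned int ndm_flags,
				    unsigned int filter_state)
{
	return !(filter_state & ndm_state) &&
	       !(ndm_flags & EX_NTF_PROXY) &&
	       !(ndm_flags & EX_NTF_EXT_LEARNED) && /* the new clause */
	       (ndm_state || !(filter_state & 0x100));
}
```

With the new clause, a NUD_NOARP entry stays visible as long as it carries NTF_EXT_LEARNED, just as proxy entries already did.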
Re: [PATCH 2/2] samples/bpf: test_cgrp2_sock2: fix an off by one
On Fri, Jul 13, 2018 at 06:05:37PM +0300, Dan Carpenter wrote: > "prog_cnt" is the number of elements which are filled out in prog_fd[] > so the test should be >= instead of >. > > Signed-off-by: Dan Carpenter since this is sample code I've applied both patches to bpf-next tree. Thanks
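The off-by-one is the usual count-versus-index mismatch. With prog_cnt valid entries, the last valid index is prog_cnt - 1, so an index equal to prog_cnt must already be rejected. A hypothetical bounds-checked accessor (not the sample's code) shows the corrected comparison:

```c
#include <errno.h>

/*
 * prog_cnt is the number of filled-in entries in prog_fd[], so valid
 * indexes are 0 .. prog_cnt - 1 and the rejection test uses >=, not >.
 */
static int pick_prog_fd(const int *prog_fd, int prog_cnt, int idx)
{
	if (idx < 0 || idx >= prog_cnt)
		return -EINVAL;
	return prog_fd[idx];
}
```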
RE: [RFC PATCH mlx5-next 07/18] net/mlx5: Expose new packet reformat capabilities
> -Original Message- > From: Or Gerlitz [mailto:gerlitz...@gmail.com] > Sent: Monday, July 16, 2018 2:33 PM > To: Mark Bloch > Cc: Doug Ledford ; Jason Gunthorpe > ; Leon Romanovsky ; RDMA > mailing list ; Saeed Mahameed > ; linux-netdev > Subject: Re: [RFC PATCH mlx5-next 07/18] net/mlx5: Expose new packet > reformat capabilities > > On Mon, Jul 16, 2018 at 11:22 AM, Leon Romanovsky > wrote: > > From: Mark Bloch > > > > Expose new abilities when creating a packet reformat context. > > > > The new types which can be created are: > > MLX5_REFORMAT_TYPE_L2_TO_L2_TUNNEL: Ability to create generic > encap > > opertion to be done by the HW. > > opertion -> fix > > > MLX5_REFORMAT_TYPE_L3_TUNNEL_TO_L2: Ability to create generic > decap > > opertion where the inner packet doesn't contain L2. > > opertion -> fix > > > > > MLX5_REFORMAT_TYPE_L2_TO_L3_TUNNEL: Ability to create generic > encap > > opertion to be done by the HW. The L2 of the original packet > > opertion -> fix Thx, will be fixed. > > > is dropped. > > > > Signed-off-by: Mark Bloch > > Signed-off-by: Leon Romanovsky > > --- > > include/linux/mlx5/mlx5_ifc.h | 20 +--- > > 1 file changed, 17 insertions(+), 3 deletions(-) > > > > diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h > > index 059ec97e7b32..c71d711d4893 100644 > > --- a/include/linux/mlx5/mlx5_ifc.h > > +++ b/include/linux/mlx5/mlx5_ifc.h > > @@ -341,8 +341,13 @@ struct mlx5_ifc_flow_table_prop_layout_bits { > > u8 reserved_at_9[0x1]; > > u8 pop_vlan[0x1]; > > u8 push_vlan[0x1]; > > - u8 reserved_at_c[0x14]; > > - > > + u8 reserved_at_c[0x3]; > > + u8 reformat_and_vlan_action[0x1]; > > unused in downstream patches > what is this BTW? It's needed for completeness for all the bits that deal with packet reformat. The bit itself indicates whether the flow table supports a reformat action together with a VLAN action (pop/push) in the same rule. 
> > > + u8 reserved_at_10[0x2]; > > + u8 reformat_l3_tunnel_to_l2[0x1]; > > + u8 reformat_l2_to_l3_tunnel[0x1]; > > + u8 reformat_and_modify_action[0x1]; > > unused in downstream patches > what is this BTW? Bits to indicate whether the flow table supports the new packet reformat modes, and setting a reformat action together with a modify action in the same rule. Those will be used once a FW that exposes them is made available, but as feature/cap flags I would like to expose them now. Mark > > > > > + u8 reserved_at_15[0xb]; > > u8 reserved_at_20[0x2]; > > u8 log_max_ft_size[0x6]; > > u8 log_max_modify_header_context[0x8]; > > @@ -551,7 +556,13 @@ struct mlx5_ifc_flow_table_nic_cap_bits { > > u8 nic_rx_multi_path_tirs[0x1]; > > u8 nic_rx_multi_path_tirs_fts[0x1]; > > u8 allow_sniffer_and_nic_rx_shared_tir[0x1]; > > - u8 reserved_at_3[0x1fd]; > > + u8 reserved_at_3[0x1d]; > > + u8 encap_general_header[0x1]; > > + u8 reserved_at_21[0xa]; > > + u8 log_max_packet_reformat_context[0x5]; > > + u8 reserved_at_30[0x6]; > > + u8 max_encap_header_size[0xa]; > > + u8 reserved_at_40[0x1c0]; > > we are inconsistent, for some fields the term "encap" remained whereas > for other fields we moved to use "reformat" or "packet reformat" etc
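Mark describes these bits as capabilities for combining actions in a single rule. A sketch of how a caller might gate a rule on them follows. The struct, enum and function names are hypothetical, and the driver's actual capability accessors are not shown.

```c
#include <stdbool.h>

/* Hypothetical mirror of two of the new flow-table capability bits. */
struct ft_caps_example {
	bool reformat_and_vlan_action;   /* reformat + pop/push VLAN in one rule */
	bool reformat_and_modify_action; /* reformat + modify header in one rule */
};

/* Hypothetical action flags for a steering rule. */
enum rule_action_example {
	ACT_REFORMAT = 1 << 0,
	ACT_VLAN     = 1 << 1, /* pop or push VLAN */
	ACT_MODIFY   = 1 << 2, /* modify header */
};

/* Reject action combinations the firmware has not advertised. */
static bool rule_actions_supported(const struct ft_caps_example *caps,
				   unsigned int actions)
{
	if (!(actions & ACT_REFORMAT))
		return true; /* the new bits only constrain rules that reformat */
	if ((actions & ACT_VLAN) && !caps->reformat_and_vlan_action)
		return false;
	if ((actions & ACT_MODIFY) && !caps->reformat_and_modify_action)
		return false;
	return true;
}
```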
Re: [PATCH v2 net-next 01/14] net: Clear skb->tstamp only on the forwarding path
Hi Eric, On 07/13/2018 10:35 AM, Eric Dumazet wrote: > > > On 07/03/2018 03:42 PM, Jesus Sanchez-Palencia wrote: >> This is done in preparation for the upcoming time based transmission >> patchset. Now that skb->tstamp will be used to hold packet's txtime, >> we must ensure that it is being cleared when traversing namespaces. >> Also, doing that from skb_scrub_packet() before the early return would >> break our feature when tunnels are used. >> >> Signed-off-by: Jesus Sanchez-Palencia >> --- >> net/core/skbuff.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/net/core/skbuff.c b/net/core/skbuff.c >> index 1357f36c8a5e..c4e24ac27464 100644 >> --- a/net/core/skbuff.c >> +++ b/net/core/skbuff.c >> @@ -4898,7 +4898,6 @@ EXPORT_SYMBOL(skb_try_coalesce); >> */ >> void skb_scrub_packet(struct sk_buff *skb, bool xnet) >> { >> -skb->tstamp = 0; >> skb->pkt_type = PACKET_HOST; >> skb->skb_iif = 0; >> skb->ignore_df = 0; >> @@ -4912,6 +4911,7 @@ void skb_scrub_packet(struct sk_buff *skb, bool xnet) >> >> ipvs_reset(skb); >> skb->mark = 0; >> +skb->tstamp = 0; >> } >> EXPORT_SYMBOL_GPL(skb_scrub_packet); >> >> > > > > I believe we had some misunderstanding here. > > What I meant by forwarding is the following case : > > - We receive a packet. > - netstamp_wanted is >0 (because at least one packet capture is active) > - __net_timestamp() is called and does : > skb->tstamp = ktime_get_real(); > > Then this skb is forwarded into an interface where EDT is taken into > consideration by either a qdisc or a device. > > Since CLOCK_TAI is a different base than CLOCK_REALTIME, we might have a > problem. I'm not sure we have a problem here. For the Tx path I only see net_timestamp_set() being called from dev_queue_xmit_nit(). And even there, it's a clone of the skb that gets timestamped. I believe the original skb, which had the valid txtime copied into skb->tstamp, is not modified anywhere along that path. What am I missing, please? 
Thanks, Jesus > > > Solutions for this problem : > > 1) Convert all our skb->tstamp usages to CLOCK_TAI base. > > or > > 2) clear skb->tstamp in forwarding paths, including the ones not scrubbing > the packet. > > My preference is 1), even if it is a bit more work. >
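Eric's concern rests on the two clock bases differing: __net_timestamp() fills skb->tstamp from the real-time clock, while the txtime consumers expect CLOCK_TAI. A small userspace probe of that offset follows. It assumes Linux, where CLOCK_TAI is clock id 11, and the helper name is hypothetical. The offset is commonly 37 s when the kernel's TAI offset has been configured (for example by an NTP daemon) and 0 when it has not, so a real-time stamp read as a TAI txtime could be off by that much.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <time.h>

#ifndef CLOCK_TAI
#define CLOCK_TAI 11 /* Linux clock id for International Atomic Time */
#endif

/*
 * Return CLOCK_TAI - CLOCK_REALTIME in whole seconds (rounded to nearest),
 * or -1 if either clock cannot be read.
 */
static long tai_minus_realtime_sec(void)
{
	struct timespec rt, tai;
	long long diff_ns;

	if (clock_gettime(CLOCK_REALTIME, &rt) != 0 ||
	    clock_gettime(CLOCK_TAI, &tai) != 0)
		return -1;

	diff_ns = (long long)(tai.tv_sec - rt.tv_sec) * 1000000000LL +
		  (tai.tv_nsec - rt.tv_nsec);
	return (long)((diff_ns + 500000000LL) / 1000000000LL);
}
```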
Re: [PATCH net-next] cxgb4: collect ASIC LA dumps from ULP TX
From: Rahul Lakkireddy Date: Mon, 16 Jul 2018 19:40:54 +0530 > From: Surendra Mobiya > > Signed-off-by: Surendra Mobiya > Signed-off-by: Rahul Lakkireddy > Signed-off-by: Ganesh Goudar Applied, thank you.
RE: [RFC PATCH rdma-next 13/18] RDMA/mlx5: Enable decap and packet reformat on flow tables
> -Original Message- > From: Or Gerlitz [mailto:gerlitz...@gmail.com] > Sent: Monday, July 16, 2018 2:24 PM > To: Mark Bloch > Cc: Doug Ledford ; Jason Gunthorpe > ; Leon Romanovsky ; RDMA > mailing list ; Saeed Mahameed > ; linux-netdev > Subject: Re: [RFC PATCH rdma-next 13/18] RDMA/mlx5: Enable decap and > packet reformat on flow tables > > On Mon, Jul 16, 2018 at 11:23 AM, Leon Romanovsky > wrote: > > From: Mark Bloch > > > > If NIC RX flow tables support decap opertion, enable it on creation. > > opertion --> operation > > > If NIC TX flow tables support reformat opertion, enable it on creation. > > What is the trigger to use the decap flag on RX table or encap flag on > TX table? > It has no performance penalty to always enable that, so that's what I do if supported. > Please note that we have a short blanket w.r.t mutual usage by FDB and NIC steering tables have different limitations, so encap/decap on NIC steering have nothing to do with the limitations the FDB has with those operations. > NIC vs e-Switch steering, did you consider to do that on demand? The flow table needs to be created with those flags set if we want to attach decap/packet reformat action to it. BTW, there is no modify action for those bits so that's why I'm doing it on creation. Mark
Re: [PATCH net 0/2] tg3: Update copyright and fix for tx timeout with 5762
From: Siva Reddy Kallam Date: Mon, 16 Jul 2018 11:13:30 +0530 > From: Siva Reddy Kallam > > First patch: > Update copyright > > Second patch: > Add higher cpu clock for 5762 Series applied, thank you.
Re: [PATCH net] ibmvnic: Fix error recovery on login failure
From: John Allen Date: Mon, 16 Jul 2018 10:29:30 -0500 > Testing has uncovered a failure case that is not handled properly. In the > event that a login fails and we are not able to recover on the spot, we > return 0 from do_reset, preventing any error recovery code from being > triggered. Additionally, the state is set to "probed" meaning that when we > are able to trigger the error recovery, the driver always comes up in the > probed state. To handle the case properly, we need to return a failure code > here and set the adapter state to the state that we entered the reset in > indicating the state that we would like to come out of the recovery reset > in. > > Signed-off-by: John Allen Applied, thanks.
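The fix amounts to remembering the state the reset was entered in and propagating the login error instead of returning 0. The following is a simplified sketch with hypothetical names; the real ibmvnic do_reset() does much more.

```c
#include <errno.h>

enum vnic_state_example { VNIC_PROBED, VNIC_OPEN, VNIC_CLOSED };

struct adapter_example {
	enum vnic_state_example state;
};

/* Hypothetical login step: 0 on success, negative errno on failure. */
typedef int (*login_fn_example)(struct adapter_example *);

static int login_ok_example(struct adapter_example *a) { (void)a; return 0; }
static int login_fail_example(struct adapter_example *a) { (void)a; return -EIO; }

static int do_reset_example(struct adapter_example *a, login_fn_example login)
{
	enum vnic_state_example reset_state = a->state; /* where we want to end up */
	int rc;

	a->state = VNIC_PROBED; /* device is being re-initialised */
	rc = login(a);

	/* Success or failure, report the state we want to come back up in,
	 * rather than leaving the adapter "probed". */
	a->state = reset_state;
	return rc; /* non-zero lets the caller's error recovery run */
}
```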
Re: [RFC PATCH mlx5-next 07/18] net/mlx5: Expose new packet reformat capabilities
On Mon, Jul 16, 2018 at 11:22 AM, Leon Romanovsky wrote: > From: Mark Bloch > > Expose new abilities when creating a packet reformat context. > > The new types which can be created are: > MLX5_REFORMAT_TYPE_L2_TO_L2_TUNNEL: Ability to create generic encap > opertion to be done by the HW. opertion -> fix > MLX5_REFORMAT_TYPE_L3_TUNNEL_TO_L2: Ability to create generic decap > opertion where the inner packet doesn't contain L2. opertion -> fix > > MLX5_REFORMAT_TYPE_L2_TO_L3_TUNNEL: Ability to create generic encap > opertion to be done by the HW. The L2 of the original packet opertion -> fix > is dropped. > > Signed-off-by: Mark Bloch > Signed-off-by: Leon Romanovsky > --- > include/linux/mlx5/mlx5_ifc.h | 20 +--- > 1 file changed, 17 insertions(+), 3 deletions(-) > > diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h > index 059ec97e7b32..c71d711d4893 100644 > --- a/include/linux/mlx5/mlx5_ifc.h > +++ b/include/linux/mlx5/mlx5_ifc.h > @@ -341,8 +341,13 @@ struct mlx5_ifc_flow_table_prop_layout_bits { > u8 reserved_at_9[0x1]; > u8 pop_vlan[0x1]; > u8 push_vlan[0x1]; > - u8 reserved_at_c[0x14]; > - > + u8 reserved_at_c[0x3]; > + u8 reformat_and_vlan_action[0x1]; unused in downstream patches what is this BTW? > + u8 reserved_at_10[0x2]; > + u8 reformat_l3_tunnel_to_l2[0x1]; > + u8 reformat_l2_to_l3_tunnel[0x1]; > + u8 reformat_and_modify_action[0x1]; unused in downstream patches what is this BTW? 
> + u8 reserved_at_15[0xb]; > u8 reserved_at_20[0x2]; > u8 log_max_ft_size[0x6]; > u8 log_max_modify_header_context[0x8]; > @@ -551,7 +556,13 @@ struct mlx5_ifc_flow_table_nic_cap_bits { > u8 nic_rx_multi_path_tirs[0x1]; > u8 nic_rx_multi_path_tirs_fts[0x1]; > u8 allow_sniffer_and_nic_rx_shared_tir[0x1]; > - u8 reserved_at_3[0x1fd]; > + u8 reserved_at_3[0x1d]; > + u8 encap_general_header[0x1]; > + u8 reserved_at_21[0xa]; > + u8 log_max_packet_reformat_context[0x5]; > + u8 reserved_at_30[0x6]; > + u8 max_encap_header_size[0xa]; > + u8 reserved_at_40[0x1c0]; we are inconsistent, for some fields the term "encap" remained whereas for other fields we moved to use "reformat" or "packet reformat" etc
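mlx5_ifc.h names reserved fields by their starting bit, so a hunk that carves new fields out of a reserved range must keep every reserved_at_<X> at bit X and must end where the old range ended. The hypothetical helper below (not part of the kernel) walks the field widths from the two hunks above and checks exactly that. The flow-table hunk must run from bit 0xc to 0x20, and the NIC-cap hunk from bit 0x3 to 0x200, which is where the old reserved_at_3[0x1fd] ended.

```c
#include <stddef.h>

/* One field of an mlx5_ifc-style bit layout. */
struct ifc_field_example {
	unsigned int width; /* width in bits */
	int named_offset;   /* X from a reserved_at_<X> name, or -1 if unnamed */
};

/*
 * Walk the fields starting at start_bit. Return the bit just past the last
 * field, or -1 if some reserved_at_<X> does not actually start at bit X.
 */
static int ifc_check_layout(const struct ifc_field_example *f, size_t n,
			    unsigned int start_bit)
{
	unsigned int off = start_bit;
	size_t i;

	for (i = 0; i < n; i++) {
		if (f[i].named_offset >= 0 &&
		    (unsigned int)f[i].named_offset != off)
			return -1;
		off += f[i].width;
	}
	return (int)off;
}
```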
Re: [RFC PATCH mlx5-next 04/18] net/mlx5: Break encap/decap into two separated flags
On Mon, Jul 16, 2018 at 11:22 AM, Leon Romanovsky wrote: > From: Mark Bloch > > Today we are able to attach encap and decap actions only to the FDB. > In preparation to enable those actions on the NIC flow tables break tables break --> tables, break > the single flag into two.
Re: [RFC PATCH mlx5-next 02/18] net/mlx5: Export modify header alloc/dealloc functions
On Mon, Jul 16, 2018 at 11:22 AM, Leon Romanovsky wrote: > From: Mark Bloch > > Those function will be used by the RDMA side to create modify header function --> functions > actions to be attached to flow steering rules via verbs.
Re: [PATCH next] bonding: pass link-local packets to bonding master also.
Mahesh Bandewar wrote: >From: Mahesh Bandewar > >Commit b89f04c61efe ("bonding: deliver link-local packets with >skb->dev set to link that packets arrived on") changed the behavior >of how link-local-multicast packets are processed. The change in >the behavior broke some legacy use cases where these packets are >expected to arrive on bonding master device also. > >This patch passes the packet to the stack with the link it arrived >on as well as passes to the bonding-master device to preserve the >legacy use case. Michal, can you test this? I'm travelling this week and won't be able to run the patch. Mahesh, will this confuse LLDP, et al, daemons that, e.g., bind to every possible interface and now see the same LLDP PDU (identical Chassis ID, Port ID, et al, TLVs) on multiple interfaces? Thanks, -J >Reported-by: Michal Soltys >Signed-off-by: Mahesh Bandewar >--- > drivers/net/bonding/bond_main.c | 17 +++-- > 1 file changed, 15 insertions(+), 2 deletions(-) > >diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c >index 9a2ea3c1f949..1d3b7d8448f2 100644 >--- a/drivers/net/bonding/bond_main.c >+++ b/drivers/net/bonding/bond_main.c >@@ -1177,9 +1177,22 @@ static rx_handler_result_t bond_handle_frame(struct >sk_buff **pskb) > } > } > >- /* don't change skb->dev for link-local packets */ >- if (is_link_local_ether_addr(eth_hdr(skb)->h_dest)) >+ /* Link-local multicast packets should be passed to the >+ * stack on the link they arrive as well as pass them to the >+ * bond-master device. These packets are mostly usable when >+ * stack receives it with the link on which they arrive >+ * (e.g. LLDP) but there may be some legacy behavior that >+ * expects these packets to appear on bonding master too. 
>+ */ >+ if (is_link_local_ether_addr(eth_hdr(skb)->h_dest)) { >+ struct sk_buff *nskb = skb_clone(skb, GFP_ATOMIC); >+ >+ if (nskb) { >+ nskb->dev = bond->dev; >+ netif_rx(nskb); >+ } > return RX_HANDLER_PASS; >+ } > if (bond_should_deliver_exact_match(skb, slave, bond)) > return RX_HANDLER_EXACT; > >-- >2.18.0.203.gfac676dfb9-goog --- -Jay Vosburgh, jay.vosbu...@canonical.com
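The new branch is keyed on is_link_local_ether_addr(), which matches the IEEE 802.1D reserved group 01:80:C2:00:00:00 through 01:80:C2:00:00:0F (LLDP uses ...:0E). The following is a userspace re-implementation of that test under a hypothetical name. It shows which destinations would be cloned up to the bond master in addition to being delivered on the slave, and that double delivery is what Jay's LLDP question is about.

```c
#include <stdbool.h>
#include <stdint.h>

/* True for 01:80:C2:00:00:00 .. 01:80:C2:00:00:0F. */
static bool is_link_local_example(const uint8_t addr[6])
{
	static const uint8_t base[5] = { 0x01, 0x80, 0xc2, 0x00, 0x00 };
	int i;

	for (i = 0; i < 5; i++)
		if (addr[i] != base[i])
			return false;
	return (addr[5] & 0xf0) == 0; /* only the low nibble may vary */
}
```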
Re: [RFC PATCH rdma-next 13/18] RDMA/mlx5: Enable decap and packet reformat on flow tables
On Mon, Jul 16, 2018 at 11:23 AM, Leon Romanovsky wrote: > From: Mark Bloch > > If NIC RX flow tables support decap opertion, enable it on creation. opertion --> operation > If NIC TX flow tables support reformat opertion, enable it on creation. What is the trigger to use the decap flag on RX table or encap flag on TX table? Please note that we have a short blanket w.r.t mutual usage by NIC vs e-Switch steering, did you consider to do that on demand?
Re: [RFC PATCH mlx5-next 01/18] net/mlx5: Add proper NIC TX steering flow tables support
On Mon, Jul 16, 2018 at 11:22 AM, Leon Romanovsky wrote: > From: Mark Bloch > > Expose the ability to add steering rules to NIC TX flow tables. > For now, we are only adding TX bypass (egress) which is used by the RDMA > side. While we are here clean the switch logic. > > We expose the same number of priorities as the RX bypass. What is the use-case / model for priorities in TX steering? Are these (TX prios) used in a downstream patch, and if so, where? Or.
Re: [PATCH v2 net] net/ipv6: Do not allow device only routes via the multipath API
From: dsah...@kernel.org Date: Sun, 15 Jul 2018 09:35:19 -0700 > From: David Ahern > > Eric reported that reverting the patch that fixed and simplified IPv6 > multipath routes means reverting back to invalid userspace notifications. > eg., > $ ip -6 route add 2001:db8:1::/64 nexthop dev eth0 nexthop dev eth1 > > only generates a single notification: > 2001:db8:1::/64 dev eth0 metric 1024 pref medium > > While working on a fix for this problem I found another case that is just > broken completely - a multipath route with a gateway followed by device > followed by gateway: > $ ip -6 ro add 2001:db8:103::/64 > nexthop via 2001:db8:1::64 > nexthop dev dummy2 > nexthop via 2001:db8:3::64 > > In this case the device only route is dropped completely - no notification > to userspace but no addition to the FIB either: > > $ ip -6 ro ls > 2001:db8:1::/64 dev dummy1 proto kernel metric 256 pref medium > 2001:db8:2::/64 dev dummy2 proto kernel metric 256 pref medium > 2001:db8:3::/64 dev dummy3 proto kernel metric 256 pref medium > 2001:db8:103::/64 metric 1024 > nexthop via 2001:db8:1::64 dev dummy1 weight 1 > nexthop via 2001:db8:3::64 dev dummy3 weight 1 pref medium > fe80::/64 dev dummy1 proto kernel metric 256 pref medium > fe80::/64 dev dummy2 proto kernel metric 256 pref medium > fe80::/64 dev dummy3 proto kernel metric 256 pref medium > > Really, IPv6 multipath is just FUBAR'ed beyond repair when it comes to > device only routes, so do not allow it at all. > > This change will break any scripts relying on the mpath api for insert, > but I don't see any other way to handle the permutations. Besides, since > the routes are added to the FIB as standalone (non-multipath) routes the > kernel is not doing what the user requested, so it might as well tell the > user that. > > Reported-by: Eric Dumazet > Signed-off-by: David Ahern Applied, thanks David. Is this a -stable candidate?
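The resulting rule is simple to state: in an IPv6 multipath request, every nexthop must carry a gateway, and a single device-only nexthop rejects the whole request. The following is a sketch with hypothetical types. The exact errno and error message the kernel uses are assumptions here, not taken from the patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified nexthop from an RTA_MULTIPATH request. */
struct nexthop_example {
	bool has_gateway; /* "via <addr>" given */
	int ifindex;      /* "dev <name>" */
};

/* Reject the whole request if any nexthop is device-only. */
static int validate_mpath_example(const struct nexthop_example *nh, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (!nh[i].has_gateway)
			return -EINVAL;
	return 0;
}
```

The mixed gateway/device/gateway case from the commit message, which used to silently drop the middle nexthop, now fails loudly.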
Re: [PATCH net] net/mlx4_en: Don't reuse RX page when XDP is set
From: Tariq Toukan Date: Sun, 15 Jul 2018 13:54:39 +0300 > From: Saeed Mahameed > > When a new rx packet arrives, the rx path will decide whether to reuse > the remainder of the page or not according to one of the below conditions: > 1. frag_info->frag_stride == PAGE_SIZE / 2 > 2. frags->page_offset + frag_info->frag_size > PAGE_SIZE; > > The first condition is not met when XDP is set. > For XDP, page_offset is always set to priv->rx_headroom which is > XDP_PACKET_HEADROOM and frag_info->frag_size is around mtu size + some > padding, so the 2nd release condition will not be met either, since > XDP_PACKET_HEADROOM + 1536 < PAGE_SIZE. As a result the page will not > be released and will be _wrongly_ reused for the next free rx descriptor. > > In XDP there is an assumption to have a page per packet and reuse can > break such assumption and might cause packet data corruptions. > > Fix this by adding an extra condition (!priv->rx_headroom) to the 2nd > case to avoid page reuse when XDP is set, since rx_headroom is set to 0 > for non XDP setup and set to XDP_PACKET_HEADROOM for XDP setup. > > No additional cache line is required for the new condition. > > Fixes: 34db548bfb95 ("mlx4: add page recycling in receive path") > Signed-off-by: Saeed Mahameed > Signed-off-by: Tariq Toukan > Suggested-by: Martin KaFai Lau Applied and queued up for -stable.
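The post-fix release decision can be modelled as a pure function. This sketch uses hypothetical names and takes next_offset as the page offset after the per-fragment advance. It also omits the refcount/NUMA checks of the half-page branch and assumes a 4096-byte page. XDP_PACKET_HEADROOM is 256 in the kernel.

```c
#include <stdbool.h>

#define EX_PAGE_SIZE           4096u
#define EX_XDP_PACKET_HEADROOM 256u

/* Returns true if the RX page must be released rather than reused. */
static bool rx_page_release_example(unsigned int frag_stride,
				    unsigned int next_offset,
				    unsigned int frag_size,
				    unsigned int rx_headroom)
{
	if (frag_stride == EX_PAGE_SIZE / 2)
		return false; /* half-page flipping (refcount checks omitted) */
	if (!rx_headroom)     /* the added condition: non-XDP setups only */
		return next_offset + frag_size > EX_PAGE_SIZE;
	return true;          /* XDP: one page per packet, never reuse */
}
```

Before the fix, the XDP case fell into the second branch, and since 256 + 1536 is well below 4096, the page was kept for the next descriptor.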
Re: [RFC net-next v1 1/1] net/sched: Introduce the taprio scheduler
On Mon, 16 Jul 2018 10:13:23 -0700, Vinicius Costa Gomes wrote: > Hi Jiri, > > Jiri Pirko writes: > > [...] > > >> > >>gates.sched > > > > Any particular reason this has to be in file and not on the cmdline? > > The idea here was to keep longer schedules more manageable. And during > testing I found it more ergonomic to have a file. > > It also has the advantage that the file can be reused by other tools, > dump-classifier (awful name, I admit), included in that github gist, is > one example, it uses the schedule (and some more information) to > calculate which packets would fall outside their "windows" in a pcap > dump. > > Anyway, if there are use cases that having the schedule in the command > line helps, I would be happy to add it. FWIW there is some precedent in cls_bpf/act_bpf for allowing specifying potentially long sequences both in command line and as a file (cBPF filters in that case - see man tc-bpf bytecode and bytecode-file).
Re: [PATCH net-next] mlxsw: spectrum: Expose counters for various packet sizes
From: Ido Schimmel Date: Sun, 15 Jul 2018 10:45:42 +0300 > From: Jiri Pirko > > Expose counters ASIC has in the group of RFC 2819 counters that count > number of packets within specific size range. > > Signed-off-by: Jiri Pirko > Signed-off-by: Ido Schimmel Applied.
Re: [PATCH net-next] liquidio: fix hang when re-binding VF host drv after running DPDK VF driver
From: Felix Manlunas Date: Fri, 13 Jul 2018 12:50:21 -0700 > From: Rick Farrington > > When configuring SLI_PKTn_OUTPUT_CONTROL, VF driver was assuming that IPTR > mode was disabled by reset, which was not true. Since DPDK driver had > set IPTR mode previously, the VF driver (which uses buf-ptr-only mode) was > not properly handling DROQ packets (i.e. it saw zero-length packets). > > This represented an invalid hardware configuration which the driver could > not handle. > > Signed-off-by: Rick Farrington > Signed-off-by: Felix Manlunas Applied.
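The general lesson is to program the mode explicitly instead of trusting the reset value, because another driver (here DPDK) may have changed it. The following is a sketch with a hypothetical bit position, since the real SLI_PKTn_OUTPUT_CONTROL layout is not given in this thread.

```c
#include <stdint.h>

/* Hypothetical IPTR mode bit; not the real register layout. */
#define EX_OUTPUT_CTL_IPTR (1ULL << 0)

/* Force buf-ptr-only mode regardless of what a previous driver left set. */
static uint64_t output_ctl_buf_ptr_only(uint64_t reg)
{
	return reg & ~EX_OUTPUT_CTL_IPTR; /* other bits are preserved */
}
```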
[PATCH net] af_unix: ensure POLLOUT on remote close() for connected dgram sockets
Applications use ECONNREFUSED as returned from write() in order to determine that a socket should be closed. When using connected dgram unix sockets in a poll/write loop, this relies on POLLOUT being signaled when the remote end closes. However, due to a race, POLLOUT can be missed when the remote closes: thread 1 (client) thread 2 (server) connect() to server write() returns -EAGAIN unix_dgram_poll() -> unix_recvq_full() is true close() ->unix_release_sock() ->wake_up_interruptible_all() unix_dgram_poll() (due to the wake_up_interruptible_all) -> unix_recvq_full() still is true ->free all skbs Now thread 1 is stuck and will not receive any more wakeups. In this case, when thread 1 gets the -EAGAIN, it has not queued any skbs; otherwise the 'free all skbs' step would in fact cause a wakeup and a POLLOUT return. So the race here is probably fairly rare, because it requires that thread 1 has no queued skbs and that it runs before the 'free all skbs' step. Nevertheless, this has been observed when the syslog daemon closes /dev/log. Tested against a reproducer that re-creates the syslog hang. The proposed fix is to move the wake_up_interruptible_all() call after the 'free all skbs' step. Reported-by: Ian Lance Taylor Cc: Rainer Weikusat Signed-off-by: Jason Baron --- net/unix/af_unix.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index e5473c0..de242cf 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -529,8 +529,6 @@ static void unix_release_sock(struct sock *sk, int embrion) sk->sk_state = TCP_CLOSE; unix_state_unlock(sk); - wake_up_interruptible_all(&u->peer_wait); - skpair = unix_peer(sk); if (skpair != NULL) { @@ -560,6 +558,9 @@ static void unix_release_sock(struct sock *sk, int embrion) kfree_skb(skb); } + /* after freeing skbs to make sure POLLOUT triggers */ + wake_up_interruptible_all(&u->peer_wait); + if (path.dentry) path_put(&path); -- 2.7.4
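The ordering problem can be reduced to a toy model in which the poller re-checks writability exactly once, when it is woken. All names are hypothetical, and the model ignores locking and the real poll logic. It only illustrates why the wakeup must come after the queue has been drained.

```c
#include <stdbool.h>

/* Toy model of the server's receive queue as seen by the poller. */
struct peer_example {
	int qlen;
	int max_qlen;
};

static bool recvq_full(const struct peer_example *p)
{
	return p->qlen >= p->max_qlen;
}

/*
 * Simulate the close path. The woken poller checks the queue once, at
 * wakeup time. Returns whether that single check reports POLLOUT.
 */
static bool poller_sees_pollout(bool wake_after_free)
{
	struct peer_example server = { .qlen = 10, .max_qlen = 10 };
	bool writable = false;

	if (!wake_after_free)
		writable = !recvq_full(&server); /* old order: wake before free */
	server.qlen = 0;                         /* "free all skbs" */
	if (wake_after_free)
		writable = !recvq_full(&server); /* patched order: free, then wake */
	return writable;
}
```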
Re: [PATCH net] hv/netvsc: fix handling of fallback to single queue mode
From: Stephen Hemminger Date: Fri, 13 Jul 2018 10:38:38 -0700 > The netvsc device may need to fallback to running in single queue > mode if host side only wants to support single queue. > > Recent change for handling mtu broke this in setup logic. > > Reported-by: Dan Carpenter > Fixes: 3ffe64f1a641 ("hv_netvsc: split sub-channel setup into async and sync") > Signed-off-by: Stephen Hemminger Applied.
Re: [PATCH net] ibmvnic: Revise RX/TX queue error messages
From: Thomas Falcon Date: Fri, 13 Jul 2018 12:03:32 -0500 > During a device failover, there may be latency between the loss > of the current backing device and a notification from firmware that > a failover has occurred. This latency can result in a large amount of > error printouts as firmware returns outgoing traffic with a generic > error code. These are not necessarily errors in this case as the > firmware is busy swapping in a new backing adapter and is not ready > to send packets yet. This patch reclassifies those error codes as > warnings with an explanation that a failover may be pending. All > other return codes will be considered errors. > > Signed-off-by: Thomas Falcon Applied.
Re: [PATCH net] ipv6: make DAD fail with enhanced DAD when nonce length differs
From: Sabrina Dubroca Date: Fri, 13 Jul 2018 17:21:42 +0200 > Commit adc176c54722 ("ipv6 addrconf: Implemented enhanced DAD (RFC7527)") > added enhanced DAD with a nonce length of 6 bytes. However, RFC7527 > doesn't specify the length of the nonce, other than being 6 + 8*k bytes, > with integer k >= 0 (RFC3971 5.3.2). The current implementation simply > assumes that the nonce will always be 6 bytes, but other systems are > free to choose different sizes. > > If another system sends a nonce of different length but with the same 6-byte > prefix, it shouldn't be considered as the same nonce. Thus, check > that the length of the received nonce is the same as the length we sent. > > Ugly scapy test script running on veth0: > > def loop(): > pkt=sniff(iface="veth0", filter="icmp6", count=1) > pkt = pkt[0] > b = bytearray(pkt[Raw].load) > b[1] += 1 > b += b'\xde\xad\xbe\xef\xde\xad\xbe\xef' > pkt[Raw].load = bytes(b) > pkt[IPv6].plen += 8 > # fixup checksum after modifying the payload > pkt[IPv6].payload.cksum -= 0x3b44 > if pkt[IPv6].payload.cksum < 0: > pkt[IPv6].payload.cksum += 0x > sendp(pkt, iface="veth0") > > This should result in DAD failure for any address added to veth0's peer, > but is currently ignored. > > Fixes: adc176c54722 ("ipv6 addrconf: Implemented enhanced DAD (RFC7527)") > Signed-off-by: Sabrina Dubroca > Reviewed-by: Stefano Brivio Applied and queued up for -stable, thank you!
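The length check can be sketched as two small predicates. The names are hypothetical and this is not the addrconf code. Nonce lengths follow the 6 + 8*k rule quoted above, and a received nonce counts as our own looped-back probe only if both its length and its bytes match.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Nonce lengths allowed by the 6 + 8*k rule, k >= 0. */
static bool nonce_len_valid(size_t len)
{
	return len >= 6 && (len - 6) % 8 == 0;
}

/*
 * A longer nonce that merely shares our 6-byte prefix belongs to another
 * node, so it must not be treated as ours (DAD should fail instead).
 */
static bool nonce_is_ours(const unsigned char *sent, size_t sent_len,
			  const unsigned char *rcvd, size_t rcvd_len)
{
	return sent_len == rcvd_len && memcmp(sent, rcvd, sent_len) == 0;
}
```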
Re: [PATCH] net: cavium: Drop dependency of NET_VENDOR_CAVIUM on PCI
From: Alexander Sverdlin Date: Fri, 13 Jul 2018 17:04:28 +0200 > Octeon Ethernet drivers work perfectly without PCI. > > Signed-off-by: Alexander Sverdlin Applied.
Re: [PATCH net-next] cxgb4: do not return DUPLEX_UNKNOWN when link is down
From: Ganesh Goudar Date: Fri, 13 Jul 2018 17:56:55 +0530 > We were returning DUPLEX_UNKNOWN in get_link_ksettings() when > the link was down. Unfortunately, this causes a problem when > "ethtool -s autoneg on" is issued for a link which is down because > the ethtool code first reads the settings and then reapplies them > with only the changes provided on the command line. This results > in us diving into set_link_ksettings() with DUPLEX_UNKNOWN, which is > not DUPLEX_FULL, so set_link_ksettings() throws an -EINVAL error. > Fix the issue by not returning DUPLEX_UNKNOWN. > > Signed-off-by: Casey Leedom > Signed-off-by: Ganesh Goudar Applied.
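The failure is a read-modify-write round trip. The toy model below assumes a driver that only accepts full duplex and that, after the fix, reports DUPLEX_FULL while the link is down. The function names are hypothetical, and the DUPLEX_* values are copied from <linux/ethtool.h>.

```c
#include <errno.h>
#include <stdint.h>

/* Values from <linux/ethtool.h>. */
#define EX_DUPLEX_FULL    0x01
#define EX_DUPLEX_UNKNOWN 0xff

/* Hypothetical set(): this driver only accepts full duplex. */
static int set_duplex_example(uint8_t duplex)
{
	return duplex == EX_DUPLEX_FULL ? 0 : -EINVAL;
}

/* Hypothetical get() while the link is down: before vs. after the fix. */
static uint8_t get_duplex_link_down_example(int fixed)
{
	return fixed ? EX_DUPLEX_FULL : EX_DUPLEX_UNKNOWN;
}

/* "ethtool -s autoneg on": read settings, change autoneg, write them back. */
static int ethtool_autoneg_on_example(int fixed)
{
	uint8_t duplex = get_duplex_link_down_example(fixed);

	/* autoneg would be flipped here; duplex is written back unchanged */
	return set_duplex_example(duplex);
}
```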
Re: [PATCH][net-next][v2] net: convert gro_count to bitmask
From: Li RongQing Date: Fri, 13 Jul 2018 14:41:36 +0800 > gro_hash size is 192 bytes and uses 3 cache lines. If there are few > flows, gro_hash may not be fully used, so it is unnecessary to iterate > over all of gro_hash in napi_gro_flush() and touch unneeded cache lines. > > Convert gro_count to a bitmask and rename it to gro_bitmask. Each bit > represents an element of gro_hash; only flush a gro_hash element if the > related bit is set, to speed up napi_gro_flush(). > > Also, update gro_bitmask only if it will change, to reduce cache > updates. > > Suggested-by: Eric Dumazet > Signed-off-by: Li RongQing > Cc: Stefano Brivio > --- > netperf shows no difference, maybe because my testing machine has a large > cache. Applied.
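The bitmask idea in miniature: set a bucket's bit when the bucket goes from empty to non-empty, and have the flush visit only set bits. The following is a sketch with hypothetical names. It assumes 8 hash buckets and uses the GCC/Clang __builtin_ctzl() builtin; the kernel's napi code differs in detail.

```c
#define EX_GRO_HASH_BUCKETS 8

struct gro_example {
	unsigned long bitmask;          /* bit i set <=> bucket i non-empty */
	int count[EX_GRO_HASH_BUCKETS]; /* packets held per bucket */
};

static void gro_hold(struct gro_example *g, unsigned int bucket)
{
	/* Touch the shared bitmask only on the empty -> non-empty transition. */
	if (!g->count[bucket]++)
		g->bitmask |= 1UL << bucket;
}

/* Flush only buckets whose bit is set; returns how many were visited. */
static int gro_flush(struct gro_example *g)
{
	int visited = 0;

	while (g->bitmask) {
		unsigned int i = (unsigned int)__builtin_ctzl(g->bitmask);

		g->count[i] = 0;              /* "flush" the bucket */
		g->bitmask &= g->bitmask - 1; /* clear the lowest set bit */
		visited++;
	}
	return visited;
}
```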
Re: [PATCH net-next] net: ip6_gre: get ipv6hdr after skb_cow_head()
From: Prashant Bhole Date: Fri, 13 Jul 2018 14:40:50 +0900 > A KASAN:use-after-free bug was found related to ip6-erspan > while running selftests/net/ip6_gre_headroom.sh > > It happens because of following sequence: > - ipv6hdr pointer is obtained from skb > - skb_cow_head() is called, skb->head memory is reallocated > - old data is accessed using ipv6hdr pointer > > skb_cow_head() call was added in e41c7c68ea77 ("ip6erspan: make sure > enough headroom at xmit."), but looking at the history there was a > chance of similar bug because gre_handle_offloads() and pskb_trim() > can also reallocate skb->head memory. Fixes tag points to commit > which introduced possibility of this bug. > > This patch moves ipv6hdr pointer assignment after skb_cow_head() call. > > Fixes: 5a963eb61b7c ("ip6_gre: Add ERSPAN native tunnel support") > Signed-off-by: Prashant Bhole This bug goes back to 4.16, therefore applied to 'net' and queued up for -stable. Please do not submit bug fixes against 'net-next' in this situation in the future. Thanks.
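This bug is the classic stale pointer after reallocation: any pointer derived from skb->head before skb_cow_head() may dangle afterwards. Below is a userspace analogue with hypothetical names, in which the header pointer is derived only after the possible reallocation.

```c
#include <stdlib.h>
#include <string.h>

struct pkt_example {
	unsigned char *buf; /* start of allocation; may move */
	size_t headroom;    /* bytes before the packet data */
	size_t len;         /* packet data length */
};

static unsigned char *pkt_data(const struct pkt_example *p)
{
	return p->buf + p->headroom;
}

/* Analogue of skb_cow_head(): ensure headroom, possibly moving the buffer. */
static int ensure_headroom_example(struct pkt_example *p, size_t need)
{
	unsigned char *n;

	if (p->headroom >= need)
		return 0;
	n = malloc(need + p->len);
	if (!n)
		return -1;
	memcpy(n + need, pkt_data(p), p->len);
	free(p->buf); /* any pointer into the old buffer now dangles */
	p->buf = n;
	p->headroom = need;
	return 0;
}

/* Correct order: make room first, then derive the header pointer. */
static unsigned char read_first_byte_example(struct pkt_example *p, size_t need)
{
	if (ensure_headroom_example(p, need))
		return 0;
	return pkt_data(p)[0];
}
```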
Re: [PATCH net] tun: Fix use-after-free on XDP_TX
From: Toshiaki Makita Date: Fri, 13 Jul 2018 13:24:38 +0900 > On XDP_TX we need to free up the frame only when tun_xdp_tx() returns a > negative value. A positive value indicates that the packet is > successfully enqueued to the ptr_ring, so freeing the page causes > use-after-free. > > Fixes: 735fc4054b3a ("xdp: change ndo_xdp_xmit API to support bulking") > Signed-off-by: Toshiaki Makita Applied, thank you.
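Here is the ownership rule from the commit message as a sketch with hypothetical names. A negative return means the caller still owns the frame and must free it. A positive return means the ring owns it, so freeing it would be a use-after-free.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for tun_xdp_tx(): >0 queued, <0 not taken. */
static int xdp_tx_example(bool ring_full)
{
	return ring_full ? -ENOSPC : 1;
}

/* True only when the caller still owns the page and must free it. */
static bool must_free_after_xdp_tx(int ret)
{
	return ret < 0;
}
```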
Re: [PATCH] liquidio: Use %pad printk format for dma_addr_t values
From: Helge Deller Date: Thu, 12 Jul 2018 22:36:29 +0200 > Use the existing %pad printk format to print dma_addr_t values. > This avoids the following warnings when compiling on the parisc platform: > > warning: format '%llx' expects argument of type 'long long unsigned int', but > argument 2 has type 'dma_addr_t {aka unsigned int}' [-Wformat=] > > Signed-off-by: Helge Deller Applied.
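%pad exists because dma_addr_t changes width with the build. In the kernel it is passed by reference, e.g. printk("%pad\n", &addr). Userspace has no %pad, so the sketch below shows the equivalent portable technique of widening explicitly. The typedef and helper are hypothetical and model the 32-bit configuration from the warning.

```c
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical 32-bit dma_addr_t configuration, as in the parisc warning. */
typedef uint32_t dma_addr_example_t;

/* Widen explicitly so one format string is correct for every width. */
static int format_dma_example(char *out, size_t n, dma_addr_example_t addr)
{
	return snprintf(out, n, "%#" PRIx64, (uint64_t)addr);
}
```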
Re: [PATCH net-next] net: phy: realtek: add missing entry for RTL8211C to mdio_device_id table
From: Heiner Kallweit Date: Thu, 12 Jul 2018 21:45:08 +0200 > Add missing entry for RTL8211C to mdio_device_id table. > > Signed-off-by: Heiner Kallweit > Fixes: cf87915cb9f8 ("net: phy: realtek: add support for RTL8211C") Applied.
Re: [PATCH net-next v2 0/2] net: phy: add functionality to speed down PHY when waiting for WoL packet
From: Heiner Kallweit Date: Thu, 12 Jul 2018 21:30:19 +0200 > Some network drivers include functionality to speed down the PHY when > suspending and just waiting for a WoL packet because this saves energy. > > This patch is based on our recent discussion about factoring out this > functionality to phylib. First user will be the r8169 driver. > > v2: > - add warning comment to phy_speed_down regarding usage of sync = false > - remove sync parameter from phy_speed_up Series applied, thank you.
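As a rough illustration of "speed down", the driver wants the lowest speed that both link partners advertise. Here is a sketch using a hypothetical speed bitmask (phylib works with ethtool link-mode masks instead).

```c
#include <stdint.h>

/* Hypothetical speed bits, ordered so lower bits mean lower speeds. */
#define EX_SPEED_10   (1u << 0)
#define EX_SPEED_100  (1u << 1)
#define EX_SPEED_1000 (1u << 2)

/* Lowest speed advertised by both sides, or 0 if there is none. */
static uint32_t lowest_common_speed(uint32_t local_adv, uint32_t partner_adv)
{
	uint32_t common = local_adv & partner_adv;

	return common & -common; /* isolate the lowest set bit */
}
```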
Re: [PATCH net-next] selftests: tls: add selftests for TLS sockets
From: Dave Watson Date: Thu, 12 Jul 2018 10:59:20 -0700 > Add selftests for tls socket. Tests various iov and message options, > poll blocking and nonblocking behavior, partial message sends / receives, > and control message data. Tests should pass regardless of whether TLS > is enabled in the kernel or not, and print a warning message if not. > > Signed-off-by: Dave Watson This is great, thanks Dave! Applied to net-next.
Re: [PATCH net] tls: Stricter error checking in zerocopy sendmsg path
From: Dave Watson Date: Thu, 12 Jul 2018 08:03:43 -0700 > In the zerocopy sendmsg() path, there are error checks to revert > the zerocopy if we get any error code. syzkaller has discovered > that tls_push_record can return -ECONNRESET, which is fatal, and > happens after the point at which it is safe to revert the iter, > as we've already passed the memory to do_tcp_sendpages. > > Previously this code could return -ENOMEM and we would want to > revert the iter, but AFAIK this no longer returns ENOMEM after > a447da7d004 ("tls: fix waitall behavior in tls_sw_recvmsg"), > so we fail for all error codes. > > Reported-by: syzbot+c226690f7b3126c5e...@syzkaller.appspotmail.com > Reported-by: syzbot+709f2810a6a05f11d...@syzkaller.appspotmail.com > Signed-off-by: Dave Watson > Fixes: 3c4d7559159b ("tls: kernel TLS support") Applied and queued up for -stable, thanks Dave.
[net-next:master 717/734] drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c:173:21: sparse: cast to restricted __be64
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
head:   568a74d491124c720e604ed3265722f969a5fb38
commit: afd3baaa938ce85dc738cd9279716cdb684cc707 [717/734] net/mlx5e: TLS, add software statistics
reproduce:
        # apt-get install sparse
        git checkout afd3baaa938ce85dc738cd9279716cdb684cc707
        make ARCH=x86_64 allmodconfig
        make C=1 CF=-D__CHECK_ENDIAN__

sparse warnings: (new ones prefixed by >>)

>> drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c:173:21: sparse: cast to restricted __be64
>> drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c:173:21: sparse: cast to restricted __be64
>> drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c:173:21: sparse: cast to restricted __be64
>> drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c:173:21: sparse: cast to restricted __be64
>> drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c:173:21: sparse: cast to restricted __be64
>> drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c:173:21: sparse: cast to restricted __be64
>> drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c:173:21: sparse: cast to restricted __be64
>> drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c:173:21: sparse: cast to restricted __be64
>> drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c:173:21: sparse: cast to restricted __be64
>> drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c:173:21: sparse: cast to restricted __be64
   drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c:174:52: sparse: incorrect type in argument 2 (different base types) @@    expected unsigned int [unsigned] [usertype] handle @@    got ed int [unsigned] [usertype] handle @@
   drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c:174:52:    expected unsigned int [unsigned] [usertype] handle
   drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c:174:52:    got restricted __be32 [usertype] handle

vim +173 drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c

   162
   163  static void mlx5e_tls_resync_rx(struct net_device *netdev, struct sock *sk,
   164                                  u32 seq, u64 rcd_sn)
   165  {
   166          struct tls_context *tls_ctx = tls_get_ctx(sk);
   167          struct mlx5e_priv *priv = netdev_priv(netdev);
   168          struct mlx5e_tls_offload_context_rx *rx_ctx;
   169
   170          rx_ctx = mlx5e_get_tls_rx_context(tls_ctx);
   171
   172          netdev_info(netdev, "resyncing seq %d rcd %lld\n", seq,
 > 173                      be64_to_cpu(rcd_sn));
   174          mlx5_accel_tls_resync_rx(priv->mdev, rx_ctx->handle, seq, rcd_sn);
   175          atomic64_inc(>tls->sw_stats.rx_tls_resync_reply);
   176  }
   177

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
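Both warnings are byte-order type mismatches: `rcd_sn` is declared `u64` (host order) but is handed to be64_to_cpu(), which expects a `__be64`, and `rx_ctx->handle` is a `__be32` passed where a plain `u32` is expected. The report does not say which side is wrong; that depends on the byte order the caller actually supplies. The userspace sketch below (glibc/musl `<endian.h>`, with hypothetical helper names) shows the conversion pair and why it only round-trips when each direction is applied to the right kind of value. In userspace both types are plain uint64_t, so nothing flags a mix-up; sparse's `__bitwise` annotations exist to catch exactly that.

```c
#define _DEFAULT_SOURCE
#include <endian.h>
#include <stdint.h>

/* Host order -> wire (big-endian) order, like the kernel's cpu_to_be64(). */
static uint64_t seq_to_wire(uint64_t host)
{
    return htobe64(host);
}

/* Wire order -> host order, like be64_to_cpu(). Only meaningful when the
 * argument really is big-endian; on a little-endian host, applying it to
 * a host-order value silently byte-swaps it. */
static uint64_t seq_from_wire(uint64_t wire)
{
    return be64toh(wire);
}
```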
RE: tc mqprio offload command error
> -----Original Message----- > From: Jesus Sanchez-Palencia > Sent: Monday, July 16, 2018 11:28 PM > To: Alexander Duyck ; Chopra, Manish > > Cc: Stephen Hemminger ; David Miller > ; Jiri Pirko ; > netdev@vger.kernel.org > Subject: Re: tc mqprio offload command error > > External Email > > Hi, > > > On 07/16/2018 10:20 AM, Alexander Duyck wrote: > > On Sun, Jul 15, 2018 at 6:30 PM, Chopra, Manish > > wrote: > >> Hello Folks, > >> > >> I am trying to set below command to try mqprio offload on 4.18 kernel. It > is throwing the following error. > >> > >> # tc qdisc add dev eth0 root mqprio num_tc 2 map 1 1 1 1 0 0 0 0 > >> RTNETLINK answers: Numerical result out of range > >> > >> I can't really make out what's wrong with the above command, since this > works fine with other OS kernels. > >> Any thoughts if it is something broken on upstream kernel ? > >> > >> Thanks, > >> Manish > > > > You might need to specify the traffic class for the 8 remaining > > priorities. The full map size is 16 entries, not just 8. The default > > value for the last 4 mapping entries is TC 3 which would be out of > > range if you only have 2 TCs specified. > > > In addition to that, you might hit the same bug we brought up [1] a while > ago. > If that is the case, a fix was just proposed here [2]. Note that other qdiscs > might be broken as well, but we could only spot the issue with mqprio and > netem so far. > > [1] https://patchwork.ozlabs.org/patch/867860/#1893405 > [2] https://patchwork.ozlabs.org/patch/944565/ > > The issue is the same with all 16 prio-tc map entries supplied: # tc qdisc add dev eth0 root mqprio num_tc 4 map 1 1 1 1 0 0 0 0 2 2 2 2 3 3 3 3 RTNETLINK answers: Numerical result out of range Thanks Jesus, I will try the fix [2] and see. Regards -Manish
Re: [PATCH net v2] KEYS: DNS: fix parsing multiple options
From: Eric Biggers Date: Wed, 11 Jul 2018 10:46:29 -0700 > From: Eric Biggers > > My recent fix for dns_resolver_preparse() printing very long strings was > incomplete, as shown by syzbot which still managed to hit the > WARN_ONCE() in set_precision() by adding a crafted "dns_resolver" key: > > precision 50001 too large > WARNING: CPU: 7 PID: 864 at lib/vsprintf.c:2164 vsnprintf+0x48a/0x5a0 > > The bug this time isn't just a printing bug, but also a logical error > when multiple options ("#"-separated strings) are given in the key > payload. Specifically, when separating an option string into name and > value, if there is no value then the name is incorrectly considered to > end at the end of the key payload, rather than the end of the current > option. This bypasses validation of the option length, and also means > that specifying multiple options is broken -- which presumably has gone > unnoticed as there is currently only one valid option anyway. > > A similar problem also applied to option values, as the kstrtoul() when > parsing the "dnserror" option will read past the end of the current > option and into the next option. > > Fix these bugs by correctly computing the length of the option name and > by copying the option value, null-terminated, into a temporary buffer. > > Reproducer for the WARN_ONCE() that syzbot hit: > > perl -e 'print "#A#", "\0" x 5' | keyctl padd dns_resolver desc @s > > Reproducer for "dnserror" option being parsed incorrectly (expected > behavior is to fail when seeing the unknown option "foo", actual > behavior was to read the dnserror value as "1#foo" and fail there): > > perl -e 'print "#dnserror=1#foo\0"' | keyctl padd dns_resolver desc @s > > Reported-by: syzbot > Fixes: 4a2d789267e0 ("DNS: If the DNS server returns an error, allow that to > be cached [ver #2]") > Signed-off-by: Eric Biggers > --- > > Changed since v1: > - Also fix parsing the option values, not just option names. Applied and queued up for -stable.
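The fix described above (bound each option name by the end of the current option, not the end of the payload, and copy the value into a NUL-terminated temporary before converting it) can be sketched in userspace C. This is not the kernel's dns_resolver code: parse_options(), MAX_OPT_VALUE and the choice to skip empty options are assumptions for illustration; only the "dnserror" option name and the expected failure on "#dnserror=1#foo" come from the patch.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define MAX_OPT_VALUE 15   /* hypothetical bound on a value's length */

/* Parses "#"-separated options in buf[0..len). The only option known to
 * this sketch is "dnserror=N"; empty options are skipped (an assumption).
 * Returns 0 and sets *dnserror on success, or -EINVAL. */
static int parse_options(const char *buf, size_t len, unsigned long *dnserror)
{
    const char *end = buf + len;
    const char *opt = buf;

    while (opt < end) {
        const char *next, *eq;
        size_t opt_len, name_len, val_len;

        if (*opt != '#')
            return -EINVAL;
        opt++;                                  /* skip the separator */
        next = memchr(opt, '#', (size_t)(end - opt));
        if (!next)
            next = end;
        opt_len = (size_t)(next - opt);
        if (opt_len == 0) {
            opt = next;
            continue;
        }

        /* The name ends at '=' or at the end of THIS option,
         * never at the end of the whole payload. */
        eq = memchr(opt, '=', opt_len);
        name_len = eq ? (size_t)(eq - opt) : opt_len;
        val_len = eq ? opt_len - name_len - 1 : 0;

        if (name_len == 8 && memcmp(opt, "dnserror", 8) == 0) {
            char tmp[MAX_OPT_VALUE + 1];
            char *stop;

            if (val_len == 0 || val_len > MAX_OPT_VALUE)
                return -EINVAL;
            /* Copy the value, NUL-terminated, so strtoul() cannot run
             * on into the next option. */
            memcpy(tmp, eq + 1, val_len);
            tmp[val_len] = '\0';
            errno = 0;
            *dnserror = strtoul(tmp, &stop, 10);
            if (errno != 0 || *stop != '\0')
                return -EINVAL;
        } else {
            return -EINVAL;                     /* unknown option */
        }
        opt = next;
    }
    return 0;
}
```

With this structure "#dnserror=1#foo" yields dnserror = 1 and then fails on the unknown option "foo", which is the behaviour the patch description names as expected.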
Re: [PATCHv2 net 0/2] multicast: init as INCLUDE when join SSM INCLUDE group
From: Hangbin Liu Date: Tue, 10 Jul 2018 22:41:25 +0800 > Based on RFC3376 5.1 and RFC3810 6.1, we should init as INCLUDE when joining an SSM > INCLUDE group. In my first version I only cleared the group change record. But > this is not enough: when a new group is joined, it will init as EXCLUDE and > trigger a filter mode change in ip/ip6_mc_add_src(), which will clear all > source addresses' sf_crcount. This prevents earlier-joined addresses from sending > state change records if multiple source addresses are joined at the same time. > > In this v2 patchset, I fixed it by directly initializing the mode to INCLUDE > for SSM JOIN_SOURCE_GROUP. I also split the original patch into two separate > patches for IPv4 and IPv6. > > Test: tested by myself and the customer. Series applied, thanks!
Re: tc mqprio offload command error
Hi, On 07/16/2018 10:20 AM, Alexander Duyck wrote: > On Sun, Jul 15, 2018 at 6:30 PM, Chopra, Manish > wrote: >> Hello Folks, >> >> I am trying to set below command to try mqprio offload on 4.18 kernel. It is >> throwing the following error. >> >> # tc qdisc add dev eth0 root mqprio num_tc 2 map 1 1 1 1 0 0 0 0 >> RTNETLINK answers: Numerical result out of range >> >> I can't really make out what's wrong with the above command, since this >> works fine with other OS kernels. >> Any thoughts if it is something broken on upstream kernel ? >> >> Thanks, >> Manish > > You might need to specify the traffic class for the 8 remaining > priorities. The full map size is 16 entries, not just 8. The default > value for the last 4 mapping entries is TC 3 which would be out of > range if you only have 2 TCs specified. In addition to that, you might hit the same bug we brought up [1] a while ago. If that is the case, a fix was just proposed here [2]. Note that other qdiscs might be broken as well, but we could only spot the issue with mqprio and netem so far. [1] https://patchwork.ozlabs.org/patch/867860/#1893405 [2] https://patchwork.ozlabs.org/patch/944565/ Regards, Jesus > > - Alex >
[PATCH bpf-next 2/2] tools: bpftool: don't pass FEATURES_DUMP to libbpf
bpftool does not export features it probed for, i.e. FEATURE_DUMP_EXPORT is always empty, so don't try to communicate the features to libbpf. It has no effect. Signed-off-by: Jakub Kicinski Reviewed-by: Quentin Monnet --- tools/bpf/bpftool/Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/bpf/bpftool/Makefile b/tools/bpf/bpftool/Makefile index 6c4830e18879..74288a2197ab 100644 --- a/tools/bpf/bpftool/Makefile +++ b/tools/bpf/bpftool/Makefile @@ -26,7 +26,7 @@ LIBBPF = $(BPF_PATH)libbpf.a BPFTOOL_VERSION := $(shell make --no-print-directory -sC ../../.. kernelversion) $(LIBBPF): FORCE - $(Q)$(MAKE) -C $(BPF_DIR) OUTPUT=$(OUTPUT) $(OUTPUT)libbpf.a FEATURES_DUMP=$(FEATURE_DUMP_EXPORT) + $(Q)$(MAKE) -C $(BPF_DIR) OUTPUT=$(OUTPUT) $(OUTPUT)libbpf.a $(LIBBPF)-clean: $(call QUIET_CLEAN, libbpf) -- 2.17.1
[PATCH v1 iproute2] tc: Do not use addattr_nest_compat on mqprio and netem
Here we are partially reverting commit c14f9d92eee107 "treewide: Use addattr_nest()/addattr_nest_end() to handle nested attributes". As discussed in [1], changing from the 'manually' coded version that used addattr_l() to addattr_nest_compat() wasn't functionally equivalent, because now the messages have extra fields appended to them. This introduced a regression since the implementation of parse_attr() from both mqprio and netem can't handle this new message format. Without this fix, mqprio returns an error. netem won't return an error but its internal configuration ends up wrong. As an example, this can be reproduced by the following commands when this patch is not applied:

1) mqprio
$ tc qdisc replace dev enp3s0 parent root handle 100 mqprio \
  num_tc 3 map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
  queues 1@0 1@1 2@2 hw 0
RTNETLINK answers: Numerical result out of range

2) netem
$ tc qdisc add dev enp3s0 root netem rate 5kbit 20 100 5 \
  distribution normal latency 1 1
$ tc -s qdisc
(...)
qdisc netem 8001: dev enp3s0 root refcnt 9 limit 1000 delay 0us 0us
 Sent 402 bytes 1 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
(...)

With this patch applied, the tc -s qdisc command above for netem instead reads:
(...)
qdisc netem 8002: dev enp3s0 root refcnt 9 limit 1000 delay 0us 0us \
  rate 5Kbit packetoverhead 20 cellsize 100 celloverhead 5
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
(...)
[1] https://patchwork.ozlabs.org/patch/867860/#1893405 Fixes: c14f9d92eee107 ("treewide: Use addattr_nest()/addattr_nest_end() to handle nested attributes") Reported-by: Vinicius Costa Gomes Signed-off-by: Jesus Sanchez-Palencia --- tc/q_mqprio.c | 5 +++-- tc/q_netem.c | 7 +-- 2 files changed, 8 insertions(+), 4 deletions(-) diff --git a/tc/q_mqprio.c b/tc/q_mqprio.c index 207d6441..89b46002 100644 --- a/tc/q_mqprio.c +++ b/tc/q_mqprio.c @@ -173,7 +173,8 @@ static int mqprio_parse_opt(struct qdisc_util *qu, int argc, argc--; argv++; } - tail = addattr_nest_compat(n, 1024, TCA_OPTIONS, , sizeof(opt)); + tail = NLMSG_TAIL(n); + addattr_l(n, 1024, TCA_OPTIONS, , sizeof(opt)); if (flags & TC_MQPRIO_F_MODE) addattr_l(n, 1024, TCA_MQPRIO_MODE, @@ -208,7 +209,7 @@ static int mqprio_parse_opt(struct qdisc_util *qu, int argc, addattr_nest_end(n, start); } - addattr_nest_compat_end(n, tail); + tail->rta_len = (void *)NLMSG_TAIL(n) - (void *)tail; return 0; } diff --git a/tc/q_netem.c b/tc/q_netem.c index 623ec903..9f9a9b3d 100644 --- a/tc/q_netem.c +++ b/tc/q_netem.c @@ -422,6 +422,8 @@ static int netem_parse_opt(struct qdisc_util *qu, int argc, char **argv, } } + tail = NLMSG_TAIL(n); + if (reorder.probability) { if (opt.latency == 0) { fprintf(stderr, "reordering not possible without specifying some delay\n"); @@ -450,7 +452,8 @@ static int netem_parse_opt(struct qdisc_util *qu, int argc, char **argv, return -1; } - tail = addattr_nest_compat(n, 1024, TCA_OPTIONS, , sizeof(opt)); + if (addattr_l(n, 1024, TCA_OPTIONS, , sizeof(opt)) < 0) + return -1; if (present[TCA_NETEM_CORR] && addattr_l(n, 1024, TCA_NETEM_CORR, , sizeof(cor)) < 0) @@ -509,7 +512,7 @@ static int netem_parse_opt(struct qdisc_util *qu, int argc, char **argv, return -1; free(dist_data); } - addattr_nest_compat_end(n, tail); + tail->rta_len = (void *) NLMSG_TAIL(n) - (void *) tail; return 0; } -- 2.18.0
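The "manual" layout this patch restores is: one TCA_OPTIONS attribute whose payload is the fixed struct, followed directly by the child attributes, with the outer length patched at the end (`tail->rta_len = (void *)NLMSG_TAIL(n) - (void *)tail`). As we understand iproute2's helper, addattr_nest_compat() additionally opens a nested attribute header after the struct, which would be the "extra fields" that mqprio's and netem's parsers do not expect. The sketch below models only the manual layout with a simplified, self-defined attribute header; `struct attr`, `add_attr()`, `close_outer()` and the type numbers in the usage are hypothetical, not the real `<linux/rtnetlink.h>` API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified model of struct rtattr: a 4-byte header (length, type)
 * followed by a payload padded to a 4-byte boundary. */
struct attr {
    uint16_t len;    /* header + payload, excluding trailing padding */
    uint16_t type;
};
#define ATTR_ALIGN(x) (((x) + 3u) & ~(size_t)3u)

struct msg {
    unsigned char buf[256];
    size_t used;     /* plays the role of nlmsg_len */
};

/* Appends one attribute and returns its offset, or (size_t)-1 if it does
 * not fit. Hypothetical helper in the role of addattr_l(). */
static size_t add_attr(struct msg *m, uint16_t type, const void *data, uint16_t dlen)
{
    size_t off = m->used;
    struct attr a = { (uint16_t)(sizeof(struct attr) + dlen), type };
    size_t padded = ATTR_ALIGN((size_t)a.len);

    if (off + padded > sizeof m->buf)
        return (size_t)-1;
    memcpy(m->buf + off, &a, sizeof a);
    memcpy(m->buf + off + sizeof a, data, dlen);
    memset(m->buf + off + a.len, 0, padded - a.len);   /* zero padding */
    m->used = off + padded;
    return off;
}

/* Counterpart of "tail->rta_len = NLMSG_TAIL(n) - tail": stretch the
 * outer attribute so it covers everything appended after it. */
static void close_outer(struct msg *m, size_t tail)
{
    struct attr a;

    memcpy(&a, m->buf + tail, sizeof a);
    a.len = (uint16_t)(m->used - tail);
    memcpy(m->buf + tail, &a, sizeof a);
}
```

In this layout a parser finds the first child attribute right after the aligned struct (offset 12 for a 6-byte struct), with no second header in between.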
[PATCH bpf-next 1/2] tools: libbpf: remove libelf-getphdrnum feature detection
libbpf does not depend on libelf-getphdrnum feature, don't check it. $ git grep HAVE_ELF_GETPHDRNUM_SUPPORT tools/perf/Makefile.config:CFLAGS += -DHAVE_ELF_GETPHDRNUM_SUPPORT tools/perf/util/symbol-elf.c:#ifndef HAVE_ELF_GETPHDRNUM_SUPPORT Signed-off-by: Jakub Kicinski Reviewed-by: Quentin Monnet --- tools/lib/bpf/Makefile | 6 +- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/tools/lib/bpf/Makefile b/tools/lib/bpf/Makefile index 7a8e4c98ef1a..d49902e818b5 100644 --- a/tools/lib/bpf/Makefile +++ b/tools/lib/bpf/Makefile @@ -66,7 +66,7 @@ ifndef VERBOSE endif FEATURE_USER = .libbpf -FEATURE_TESTS = libelf libelf-getphdrnum libelf-mmap bpf reallocarray +FEATURE_TESTS = libelf libelf-mmap bpf reallocarray FEATURE_DISPLAY = libelf bpf INCLUDES = -I. -I$(srctree)/tools/include -I$(srctree)/tools/arch/$(ARCH)/include/uapi -I$(srctree)/tools/include/uapi -I$(srctree)/tools/perf @@ -116,10 +116,6 @@ ifeq ($(feature-libelf-mmap), 1) override CFLAGS += -DHAVE_LIBELF_MMAP_SUPPORT endif -ifeq ($(feature-libelf-getphdrnum), 1) - override CFLAGS += -DHAVE_ELF_GETPHDRNUM_SUPPORT -endif - ifeq ($(feature-reallocarray), 0) override CFLAGS += -DCOMPAT_NEED_REALLOCARRAY endif -- 2.17.1
[PATCH bpf-next 0/2] tools: bpf: build cleanups
Hi! While tracking down the perf vs libbpf vs reallocarray build issue I noticed libbpf is checking for a feature it never uses, and that the bpftool Makefile's attempt to reuse the feature dump doesn't really make sense. Jakub Kicinski (2): tools: libbpf: remove libelf-getphdrnum feature detection tools: bpftool: don't pass FEATURES_DUMP to libbpf tools/bpf/bpftool/Makefile | 2 +- tools/lib/bpf/Makefile | 6 +- 2 files changed, 2 insertions(+), 6 deletions(-) -- 2.17.1
[net-next:master 716/721] drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.c:329:66: sparse: incorrect type in argument 6 (different base types)
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
head:   aea06eb276d99590f400c877ca2bd74b4db91330
commit: 00aebab27c8752c7420dce286270ccedc70ac39a [716/721] net/mlx5e: TLS, add Innova TLS rx data path
reproduce:
        # apt-get install sparse
        git checkout 00aebab27c8752c7420dce286270ccedc70ac39a
        make ARCH=x86_64 allmodconfig
        make C=1 CF=-D__CHECK_ENDIAN__

sparse warnings: (new ones prefixed by >>)

>> drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.c:329:66: sparse: incorrect type in argument 6 (different base types) @@    expected unsigned short const [unsigned] [usertype] hnum @@    got const [unsigned] [usertype] hnum @@
   drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.c:329:66:    expected unsigned short const [unsigned] [usertype] hnum
   drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.c:329:66:    got restricted __be16 [usertype] dest
>> include/net/tls.h:435:47: sparse: cast from restricted __be32

vim +329 drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.c

   302
   303  static int tls_update_resync_sn(struct net_device *netdev,
   304                                  struct sk_buff *skb,
   305                                  struct mlx5e_tls_metadata *mdata)
   306  {
   307          struct sock *sk = NULL;
   308          struct iphdr *iph;
   309          struct tcphdr *th;
   310          __be32 seq;
   311
   312          if (mdata->ethertype != htons(ETH_P_IP))
   313                  return -EINVAL;
   314
   315          iph = (struct iphdr *)(mdata + 1);
   316
   317          th = ((void *)iph) + iph->ihl * 4;
   318
   319          if (iph->version == 4) {
   320                  sk = inet_lookup_established(dev_net(netdev), _hashinfo,
   321                                               iph->saddr, th->source, iph->daddr,
   322                                               th->dest, netdev->ifindex);
   323  #if IS_ENABLED(CONFIG_IPV6)
   324          } else {
   325                  struct ipv6hdr *ipv6h = (struct ipv6hdr *)iph;
   326
   327                  sk = __inet6_lookup_established(dev_net(netdev), _hashinfo,
   328                                                  >saddr, th->source,
 > 329                                                  >daddr, th->dest,
   330                                                  netdev->ifindex, 0);
   331  #endif
   332          }
   333          if (!sk || sk->sk_state == TCP_TIME_WAIT)
   334                  goto out;
   335
   336          skb->sk = sk;
   337          skb->destructor = sock_edemux;
   338
   339          memcpy(, >content.recv.sync_seq, sizeof(seq));
   340          tls_offload_rx_resync_request(sk, seq);
   341  out:
   342          return 0;
   343  }
   344

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
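The argument-6 warning says that `th->dest`, a network-order `__be16`, is passed where __inet6_lookup_established() expects a host-order `unsigned short` named `hnum`. The usual remedy is to convert first, i.e. pass `ntohs(th->dest)`. The userspace sketch below shows the same conversion with a hypothetical port struct and lookup function; it is an illustration of the byte-order rule, not the mlx5 code.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Hypothetical on-wire port pair, stored in network byte order like the
 * __be16 source/dest fields of struct tcphdr. */
struct wire_ports {
    uint16_t source;   /* big-endian */
    uint16_t dest;     /* big-endian */
};

/* Hypothetical lookup keyed by a host-order port number ("hnum"). */
static int lookup_by_hnum(uint16_t hnum)
{
    return hnum == 443;
}

static int lookup_dest(const struct wire_ports *p)
{
    /* Convert before use: passing p->dest directly would compare a
     * byte-swapped value on little-endian hosts. */
    return lookup_by_hnum(ntohs(p->dest));
}
```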
Re: tc mqprio offload command error
On Sun, Jul 15, 2018 at 6:30 PM, Chopra, Manish wrote: > Hello Folks, > > I am trying to set below command to try mqprio offload on 4.18 kernel. It is > throwing the following error. > > # tc qdisc add dev eth0 root mqprio num_tc 2 map 1 1 1 1 0 0 0 0 > RTNETLINK answers: Numerical result out of range > > I can't really make out what's wrong with the above command, since this works > fine with other OS kernels. > Any thoughts if it is something broken on upstream kernel ? > > Thanks, > Manish You might need to specify the traffic class for the 8 remaining priorities. The full map size is 16 entries, not just 8. The default value for the last 4 mapping entries is TC 3 which would be out of range if you only have 2 TCs specified. - Alex
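Alex's point can be checked mechanically: all 16 priority-to-TC entries are validated against num_tc, including the ones the user did not type, which keep their defaults. The sketch below is a hypothetical model of that check, not the kernel's mqprio code; the all-3 default map in the usage is an assumption chosen to match "TC 3" in the explanation. Note that it only accounts for the first failure: the error Manish still sees with a full 16-entry map is the separate iproute2 regression Jesus points to.

```c
#include <stddef.h>

#define PRIO_MAP_LEN 16   /* 16 priority-to-TC entries, per the thread */

/* Returns the index of the first map entry that names a traffic class
 * >= num_tc, or -1 if every entry is in range. */
static int first_bad_entry(const unsigned char map[PRIO_MAP_LEN], unsigned num_tc)
{
    for (int i = 0; i < PRIO_MAP_LEN; i++)
        if (map[i] >= num_tc)
            return i;
    return -1;
}

/* Overlays a possibly shorter user-supplied map onto the current map,
 * leaving entries the user did not specify at their previous values. */
static void apply_user_map(unsigned char map[PRIO_MAP_LEN],
                           const unsigned char *user, size_t n)
{
    for (size_t i = 0; i < n && i < PRIO_MAP_LEN; i++)
        map[i] = user[i];
}
```

With num_tc 2 and only eight user entries, the first leftover default (index 8) is reported as out of range; supplying all 16 entries below 2 passes.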
Re: [PATCH v2 net] net/ipv6: Do not allow device only routes via the multipath API
On 7/16/18 10:09 AM, Eric Dumazet wrote: > Yes, I guess we have no real choice for the moment. It is unfortunate that we are forever stuck with this mess from a short-sighted implementation years ago. From a uapi perspective, dev-only nexthops and proper add-to/append/replace semantics should have been a part of the code from the beginning.
Re: [RFC net-next v1 1/1] net/sched: Introduce the taprio scheduler
Hi Jiri, Jiri Pirko writes: [...] >> >>gates.sched > > Any particular reason this has to be in file and not on the cmdline? The idea here was to keep longer schedules more manageable. And during testing I found it more ergonomic to have a file. It also has the advantage that the file can be reused by other tools: dump-classifier (awful name, I admit), included in that GitHub gist, is one example. It uses the schedule (and some more information) to calculate which packets would fall outside their "windows" in a pcap dump. Anyway, if there are use cases where having the schedule on the command line helps, I would be happy to add it. Cheers, -- Vinicius
Re: [PATCH mlx5-next] RDMA/mlx5: Don't use cached IRQ affinity mask
Hey Max: On 7/16/2018 11:46 AM, Max Gurtovoy wrote: > > > On 7/16/2018 5:59 PM, Sagi Grimberg wrote: >> >>> Hi, >>> I've tested this patch and it seems problematic at this moment. >> >> Problematic how? what are you seeing? > > Connection failures and the same error Steve saw: > > [Mon Jul 16 16:19:11 2018] nvme nvme0: Connect command failed, error > wo/DNR bit: -16402 > [Mon Jul 16 16:19:11 2018] nvme nvme0: failed to connect queue: 2 ret=-18 > > >> >>> maybe this is because of the bug that Steve mentioned in the NVMe >>> mailing list. Sagi mentioned that we should fix it in the NVMe/RDMA >>> initiator and I'll run his suggestion as well. >> >> Is your device irq affinity linear? > > When it's linear and the balancer is stopped the patch works. > >> >>> BTW, when I run the blk_mq_map_queues it works for every irq affinity. >> >> But it's probably not aligned to the device vector affinity. > > but I guess it's better in some cases. > > I've checked the situation before Leon's patch and set all the vectors > to CPU 0. In this case (I think that this was the initial report by > Steve), we use the affinity_hint (Israel's and Saeed's patches where we > use dev->priv.irq_info[vector].mask) and it worked fine. > > Steve, > Can you share your configuration (kernel, HCA, affinity map, connect > command, lscpu)? > I want to repro it in my lab. > - linux-4.18-rc1 + the nvme/nvmet inline_data_size patches + patches to enable ib_get_vector_affinity() in cxgb4 + Sagi's patch + Leon's mlx5 patch so I can change the affinity via procfs. - mlx5 MT27700 RoCE card, cxgb4 T62100-CR iWARP card - The system has 2 NUMA nodes with 8 real CPUs each, 16 CPUs in total, all online. HT disabled. - I'm testing over HW loopback for simplicity, so the node is both the nvme target and host. Connecting one device like this: nvme connect -t rdma -a 172.16.2.1 -n nvme-nullb0 - to reproduce the nvme-rdma bug, just map any two HCA CQ completion vectors to the same CPU.
- lscpu output:

[root@stevo1 linux]# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                16
On-line CPU(s) list:   0-15
Thread(s) per core:    1
Core(s) per socket:    8
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 45
Model name:            Intel(R) Xeon(R) CPU E5-2687W 0 @ 3.10GHz
Stepping:              7
CPU MHz:               3400.057
CPU max MHz:           3800.
CPU min MHz:           1200.
BogoMIPS:              6200.10
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              20480K
NUMA node0 CPU(s):     0-7
NUMA node1 CPU(s):     8-15
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm epb pti tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln pts

Steve
Re: [PATCH RFC net-next] openvswitch: Queue upcalls to userspace in per-port round-robin order
On Tue, Jul 10, 2018 at 6:31 PM Pravin Shelar wrote: > > On Wed, Jul 4, 2018 at 7:23 AM, Matteo Croce wrote: > > From: Stefano Brivio > > > > Open vSwitch sends to userspace all received packets that have > > no associated flow (thus doing an "upcall"). Then the userspace > > program creates a new flow and determines the actions to apply > > based on its configuration. > > > > When a single port generates a high rate of upcalls, it can > > prevent other ports from dispatching their own upcalls. vswitchd > > overcomes this problem by creating many netlink sockets for each > > port, but it quickly exceeds any reasonable maximum number of > > open files when dealing with huge amounts of ports. > > > > This patch queues all the upcalls into a list, ordering them in > > a per-port round-robin fashion, and schedules a deferred work to > > queue them to userspace. > > > > The algorithm to queue upcalls in a round-robin fashion, > > provided by Stefano, is based on these two rules: > > - upcalls for a given port must be inserted after all the other > >occurrences of upcalls for the same port already in the queue, > >in order to avoid out-of-order upcalls for a given port > > - insertion happens once the highest upcall count for any given > >port (excluding the one currently at hand) is greater than the > >count for the port we're queuing to -- if this condition is > >never true, upcall is queued at the tail. This results in a > >per-port round-robin order. > > > > In order to implement a fair round-robin behaviour, a variable > > queueing delay is introduced. This will be zero if the upcalls > > rate is below a given threshold, and grows linearly with the > > queue utilisation (i.e. upcalls rate) otherwise. > > > > This ensures fairness among ports under load and with few > > netlink sockets. > > > Thanks for the patch. > This patch is adding following overhead for upcall handling: > 1. kmalloc. > 2. global spin-lock. > 3. context switch to single worker thread. 
> I think this could become bottle neck on most of multi core systems. > You have mentioned issue with existing fairness mechanism, Can you > elaborate on those, I think we could improve that before implementing > heavy weight fairness in upcall handling. Hi Pravin, vswitchd allocates N * P netlink sockets, where N is the number of online CPU cores, and P the number of ports. With some setups, this number can grow quite fast, also exceeding the system maximum file descriptor limit. I've seen a 48-core server fail with -EMFILE when trying to create the more than 65535 netlink sockets needed to handle 1800+ ports (48 * 1800 = 86400). I made a previous attempt to reduce the sockets to one per CPU, but this was discussed and rejected on ovs-dev because it would remove fairness among ports [1]. I think that the current approach of opening a huge number of sockets doesn't really work (it certainly doesn't scale), and it still needs some queueing logic (either in kernel or user space) if we really want to be sure that low-traffic ports get their upcall quota when other ports are doing far more traffic. If you are concerned about the kmalloc or the spinlock, we can solve them with a kmem_cache, or two copies of the list and RCU. I'll be happy to discuss the implementation details, as long as we all agree that the current implementation doesn't scale well and has an issue. [1] https://mail.openvswitch.org/pipermail/ovs-dev/2018-February/344279.html -- Matteo Croce per aspera ad upstream
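The two queueing rules and the variable delay quoted from the patch description can be sketched as follows. This is our reading of the rules, not the patch's code: `struct upcall_queue`, `rr_insert_pos()`, `rr_enqueue()` and `queue_delay_us()` are hypothetical, the queue holds only port numbers, and "count" is taken to mean the number of upcalls a port already has in the queue. Rule 1 keeps a port's upcalls in order; rule 2 places the new upcall as soon as some other port has more upcalls ahead of it than this port has queued, otherwise at the tail.

```c
#include <stddef.h>
#include <string.h>

#define MAX_PORTS 16
#define QUEUE_CAP 64

/* Hypothetical upcall queue holding only port numbers (< MAX_PORTS). */
struct upcall_queue {
    int port[QUEUE_CAP];
    size_t len;
};

static size_t rr_insert_pos(const struct upcall_queue *q, int port)
{
    int seen[MAX_PORTS] = {0};
    int mine = 0, max_other = 0;
    size_t after_last_mine = 0;

    /* Rule 1: never go before an upcall already queued for this port. */
    for (size_t i = 0; i < q->len; i++)
        if (q->port[i] == port) {
            mine++;
            after_last_mine = i + 1;
        }

    /* Rule 2: insert once another port's count so far exceeds ours. */
    for (size_t i = 0; i < q->len; i++) {
        int p = q->port[i];

        if (p != port && ++seen[p] > max_other)
            max_other = seen[p];
        if (i + 1 >= after_last_mine && max_other > mine)
            return i + 1;
    }
    return q->len;              /* condition never true: tail */
}

static int rr_enqueue(struct upcall_queue *q, int port)
{
    size_t pos;

    if (q->len == QUEUE_CAP || port < 0 || port >= MAX_PORTS)
        return -1;
    pos = rr_insert_pos(q, port);
    memmove(&q->port[pos + 1], &q->port[pos], (q->len - pos) * sizeof(int));
    q->port[pos] = port;
    q->len++;
    return 0;
}

/* Queueing delay: zero up to `threshold` (must be < QUEUE_CAP), then
 * growing linearly with queue length up to max_us at full capacity. */
static unsigned long queue_delay_us(size_t len, size_t threshold, unsigned long max_us)
{
    if (len <= threshold)
        return 0;
    return max_us * (len - threshold) / (QUEUE_CAP - threshold);
}
```

Enqueuing upcalls from ports 0, 0, 0, 1, 1, 2 in that order yields the queue 0, 2, 1, 0, 1, 0: each "round" contains at most one upcall per port, and no port's upcalls are reordered.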
Re: [PATCH mlx5-next] RDMA/mlx5: Don't use cached IRQ affinity mask
On 7/16/2018 5:59 PM, Sagi Grimberg wrote: Hi, I've tested this patch and it seems problematic at this moment. Problematic how? what are you seeing? Connection failures and the same error Steve saw: [Mon Jul 16 16:19:11 2018] nvme nvme0: Connect command failed, error wo/DNR bit: -16402 [Mon Jul 16 16:19:11 2018] nvme nvme0: failed to connect queue: 2 ret=-18 maybe this is because of the bug that Steve mentioned in the NVMe mailing list. Sagi mentioned that we should fix it in the NVMe/RDMA initiator and I'll run his suggestion as well. Is your device irq affinity linear? When it's linear and the balancer is stopped the patch works. BTW, when I run the blk_mq_map_queues it works for every irq affinity. But it's probably not aligned to the device vector affinity. but I guess it's better in some cases. I've checked the situation before Leon's patch and set all the vectors to CPU 0. In this case (I think that this was the initial report by Steve), we use the affinity_hint (Israel's and Saeed's patches where we use dev->priv.irq_info[vector].mask) and it worked fine. Steve, Can you share your configuration (kernel, HCA, affinity map, connect command, lscpu)? I want to repro it in my lab. -Max.
Re: [PATCH v3 net-next 0/3] rds: IPv6 support
- Looks like rds_connect() is checking things in the right order (thanks). However, rds_cancel_sent_to is still looking at the len to figure out the family. As we move to IPv6, it would be better if we allow the caller to specify struct sockaddr_storage, or even a union of sockaddr_in/sockaddr_in6, rather than require them to hint at which one of IPv4/IPv6 through the optlen. Please see __sys_connect and move_addr_to_kernel if the user-kernel copy is the reason you are not doing this. Similar to inet_dgram_connect you can then check the sa_family and use that to figure out the "Assume IPv4" etc stuff. This would also make the CANCEL_SEND_TO API consistent with the bind/connect etc semantics. - net/rds/rds.h: thanks for moving RDS_CM_PORT to the rdma specific file. I am guessing (?) that you want to update the comment to talk about the non-existent "RDS over UDP" based on the title of the IANA registration? I would just like to reiterate that this is actually inaccurate (and confusing to someone looking at this for the first time, since there is no RDS-over-UDP today). If it were up to me, I would update the comment to say

/* The following ports, 16385, 18634, 18635, are registered with IANA as
 * the ports to be used for "RDS over TCP and UDP".
 * The current linux implementation supports RDS over TCP and IB, and uses
 * the ports as follows: 18634 is the historical value used for the
 * RDMA_CM listener port. RDS/TCP uses port 16385. After
 * IPv6 work, RDMA_CM also uses 16385 as the listener port. 18634 is kept
 * to ensure compatibility with older RDS modules. Those ports are defined
 * in each transport's header file.

IMHO that makes the comment look a little less odd (I've already explained to you why RDS-over-UDP does not make much practical sense for the RDS use-cases we anticipate). YMMV. Thanks, --Sowmini
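The suggestion to decide the family from `sa_family` rather than from optlen can be sketched in userspace C. This is a hypothetical helper, not RDS code: `addr_family()` reads the family from a `struct sockaddr_storage`, uses the length only to check that enough bytes were supplied for that family, and also reports IPv4-mapped IPv6 addresses (::ffff:a.b.c.d), since the thread notes an application may use those to refer to an IPv4 peer.

```c
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Returns AF_INET or AF_INET6 based on ss->ss_family, or -EINVAL if the
 * family is unknown or `len` is too short for it. Sets *mapped_v4 when an
 * AF_INET6 address is IPv4-mapped. */
static int addr_family(const struct sockaddr_storage *ss, socklen_t len, int *mapped_v4)
{
    *mapped_v4 = 0;
    if (len < sizeof(sa_family_t))
        return -EINVAL;

    switch (ss->ss_family) {
    case AF_INET:
        return len >= sizeof(struct sockaddr_in) ? AF_INET : -EINVAL;
    case AF_INET6: {
        const struct sockaddr_in6 *sin6 = (const struct sockaddr_in6 *)ss;

        if (len < sizeof(struct sockaddr_in6))
            return -EINVAL;
        *mapped_v4 = IN6_IS_ADDR_V4MAPPED(&sin6->sin6_addr) ? 1 : 0;
        return AF_INET6;
    }
    default:
        return -EINVAL;
    }
}
```

The length still matters, but only as a bounds check; it no longer has to double as the family hint.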
Re: [PATCH v2 net] net/ipv6: Do not allow device only routes via the multipath API
On 07/15/2018 09:35 AM, dsah...@kernel.org wrote: > From: David Ahern > > Eric reported that reverting the patch that fixed and simplified IPv6 > multipath routes means reverting back to invalid userspace notifications. > eg., > $ ip -6 route add 2001:db8:1::/64 nexthop dev eth0 nexthop dev eth1 > > only generates a single notification: > 2001:db8:1::/64 dev eth0 metric 1024 pref medium > > While working on a fix for this problem I found another case that is just > broken completely - a multipath route with a gateway followed by device > followed by gateway: > $ ip -6 ro add 2001:db8:103::/64 > nexthop via 2001:db8:1::64 > nexthop dev dummy2 > nexthop via 2001:db8:3::64 > > In this case the device only route is dropped completely - no notification > to userspace but no addition to the FIB either: > > $ ip -6 ro ls > 2001:db8:1::/64 dev dummy1 proto kernel metric 256 pref medium > 2001:db8:2::/64 dev dummy2 proto kernel metric 256 pref medium > 2001:db8:3::/64 dev dummy3 proto kernel metric 256 pref medium > 2001:db8:103::/64 metric 1024 > nexthop via 2001:db8:1::64 dev dummy1 weight 1 > nexthop via 2001:db8:3::64 dev dummy3 weight 1 pref medium > fe80::/64 dev dummy1 proto kernel metric 256 pref medium > fe80::/64 dev dummy2 proto kernel metric 256 pref medium > fe80::/64 dev dummy3 proto kernel metric 256 pref medium > > Really, IPv6 multipath is just FUBAR'ed beyond repair when it comes to > device only routes, so do not allow it at all. > > This change will break any scripts relying on the mpath api for insert, > but I don't see any other way to handle the permutations. Besides, since > the routes are added to the FIB as standalone (non-multipath) routes the > kernel is not doing what the user requested, so it might as well tell the > user that. Yes, I guess we have no real choice for the moment. Thanks David Reviewed-by: Eric Dumazet
[PATCH net] ibmvnic: Fix error recovery on login failure
Testing has uncovered a failure case that is not handled properly. In the
event that a login fails and we are not able to recover on the spot, we
return 0 from do_reset, preventing any error recovery code from being
triggered. Additionally, the state is set to "probed", meaning that when we
are able to trigger the error recovery, the driver always comes up in the
probed state. To handle this case properly, we need to return a failure code
here and set the adapter state back to the state in which we entered the
reset, which is the state we would like to come out of the recovery reset in.

Signed-off-by: John Allen
---
diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index d0e196b..c1e23bb 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -1825,8 +1825,8 @@ static int do_reset(struct ibmvnic_adapter *adapter,
 
 	rc = ibmvnic_login(netdev);
 	if (rc) {
-		adapter->state = VNIC_PROBED;
-		return 0;
+		adapter->state = reset_state;
+		return rc;
 	}
 
 	if (adapter->reset_reason == VNIC_RESET_CHANGE_PARAM ||
[net-next:master 715/721] drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c:172:52: sparse: incorrect type in argument 2 (different base types)
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
head:   aea06eb276d99590f400c877ca2bd74b4db91330
commit: ca942c78f3237e09567d80ac19dffe9690c74d79 [715/721] net/mlx5e: TLS, add innova rx support
reproduce:
        # apt-get install sparse
        git checkout ca942c78f3237e09567d80ac19dffe9690c74d79
        make ARCH=x86_64 allmodconfig
        make C=1 CF=-D__CHECK_ENDIAN__

sparse warnings: (new ones prefixed by >>)

>> drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c:172:52: sparse: incorrect type in argument 2 (different base types) @@ expected unsigned int [unsigned] [usertype] handle @@ got ed int [unsigned] [usertype] handle @@
   drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c:172:52:    expected unsigned int [unsigned] [usertype] handle
   drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c:172:52:    got restricted __be32 [usertype] handle

vim +172 drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c

   162	
   163	static void mlx5e_tls_resync_rx(struct net_device *netdev, struct sock *sk,
   164					u32 seq, u64 rcd_sn)
   165	{
   166		struct tls_context *tls_ctx = tls_get_ctx(sk);
   167		struct mlx5e_priv *priv = netdev_priv(netdev);
   168		struct mlx5e_tls_offload_context_rx *rx_ctx;
   169	
   170		rx_ctx = mlx5e_get_tls_rx_context(tls_ctx);
   171	
 > 172		mlx5_accel_tls_resync_rx(priv->mdev, rx_ctx->handle, seq, rcd_sn);
   173	}
   174	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
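The warning says rx_ctx->handle is a restricted __be32 (big-endian) value while argument 2 of mlx5_accel_tls_resync_rx is a plain u32. Sparse flags this because mixing the two silently drops the byte-order information. The userspace sketch below imitates that check without sparse by wrapping the big-endian value in a one-member struct, so the C compiler itself refuses to pass it where a plain integer is expected. All names here (be32_t, to_be32, from_be32, resync_rx_wire, resync_rx_host) are hypothetical. Whether the real fix should convert the value or change the callee to take __be32 depends on whether the callee hands the handle to hardware unchanged, which this report does not say.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for sparse's "restricted __be32": a distinct type
 * that cannot be mixed with plain integers without an explicit conversion. */
typedef struct { uint32_t raw; } be32_t;

static be32_t to_be32(uint32_t host)
{
	be32_t v = { htonl(host) };
	return v;
}

static uint32_t from_be32(be32_t v)
{
	return ntohl(v.raw);
}

/* Option 1: the callee keeps the handle in wire (big-endian) order,
 * e.g. because it copies it into a device command as-is. */
static uint32_t resync_rx_wire(be32_t handle)
{
	return handle.raw;
}

/* Option 2: the callee wants a host-order number; callers must convert. */
static uint32_t resync_rx_host(uint32_t handle)
{
	return handle;
}
```

Here, `resync_rx_host(h)` with `h` of type `be32_t` does not compile. The caller must write either `resync_rx_host(from_be32(h))` or call `resync_rx_wire(h)`, which mirrors the two ways the sparse warning could be resolved.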
Re: [PATCH mlx5-next] RDMA/mlx5: Don't use cached IRQ affinity mask
> Hi,
> I've tested this patch and it seems problematic at this moment.

Problematic how? What are you seeing?

> Maybe this is because of the bug that Steve mentioned in the NVMe
> mailing list. Sagi mentioned that we should fix it in the NVMe/RDMA
> initiator and I'll run his suggestion as well.

Is your device irq affinity linear?

> BTW, when I run the blk_mq_map_queues it works for every irq affinity.

But it's probably not aligned to the device vector affinity.
general protection fault in do_raw_spin_unlock
Hello,

syzbot found the following crash on:

HEAD commit:    1d4eb636f0ab Add linux-next specific files for 20180716
git tree:       linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=1186bf0c40
kernel config:  https://syzkaller.appspot.com/x/.config?x=ea5926dddb0db97a
dashboard link: https://syzkaller.appspot.com/bug?extid=83a25334ef203851dc81
compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
syzkaller repro:https://syzkaller.appspot.com/x/repro.syz?x=179ed0

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+83a25334ef203851d...@syzkaller.appspotmail.com

IPVS: ftp: loaded support on port[0] = 21
IPVS: ftp: loaded support on port[0] = 21
IPVS: ftp: loaded support on port[0] = 21
kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: [#1] SMP KASAN
CPU: 1 PID: 24 Comm: kworker/1:1 Not tainted 4.18.0-rc5-next-20180716+ #8
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: events p9_poll_workfn
RIP: 0010:debug_spin_unlock kernel/locking/spinlock_debug.c:97 [inline]
RIP: 0010:do_raw_spin_unlock+0x65/0x2f0 kernel/locking/spinlock_debug.c:134
Code: 0a bd 88 48 c7 85 78 ff ff ff b3 8a b5 41 48 c7 45 88 d0 3c 60 81 c7 02 f1 f1 f1 f1 c7 42 04 04 f2 f2 f2 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 65 48 8b 0c 25 28 00 00 00 48 89 4d d0 31 c9
RSP: 0018:8801d945f288 EFLAGS: 00010047
RAX: dc00 RBX: RCX: 8770a045
RDX: RSI: 0001 RDI: 0004
RBP: 8801d945f310 R08: 11003b28be45 R09: ed0035e7bd88
R10: ed0035e7bd88 R11: 8801af3dec43 R12:
R13: 11003b28be51 R14: 8801d945f2e8 R15: 8801c5811d50
FS: () GS:8801daf0() knlGS:
CS: 0010 DS: ES: CR0: 80050033
CR2: 0072c029 CR3: 0001b19fd000 CR4: 001406e0
DR0: DR1: DR2:
DR3: DR6: fffe0ff0 DR7: 0400
Call Trace:
 __raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:159 [inline]
 _raw_spin_unlock_irqrestore+0x27/0xc0 kernel/locking/spinlock.c:184
 spin_unlock_irqrestore include/linux/spinlock.h:384 [inline]
 p9_conn_cancel+0x9b6/0xd30 net/9p/trans_fd.c:208
 p9_poll_mux net/9p/trans_fd.c:620 [inline]
 p9_poll_workfn+0x4b2/0x6d0 net/9p/trans_fd.c:1107
 process_one_work+0xc73/0x1ba0 kernel/workqueue.c:2153
 worker_thread+0x189/0x13c0 kernel/workqueue.c:2296
 kthread+0x345/0x410 kernel/kthread.c:246
 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:415
Modules linked in:
Dumping ftrace buffer:
   (ftrace buffer empty)
---[ end trace 4d86351f63a12683 ]---
RIP: 0010:debug_spin_unlock kernel/locking/spinlock_debug.c:97 [inline]
RIP: 0010:do_raw_spin_unlock+0x65/0x2f0 kernel/locking/spinlock_debug.c:134
Code: 0a bd 88 48 c7 85 78 ff ff ff b3 8a b5 41 48 c7 45 88 d0 3c 60 81 c7 02 f1 f1 f1 f1 c7 42 04 04 f2 f2 f2 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 65 48 8b 0c 25 28 00 00 00 48 89 4d d0 31 c9
RSP: 0018:8801d945f288 EFLAGS: 00010047
RAX: dc00 RBX: RCX: 8770a045
RDX: RSI: 0001 RDI: 0004
RBP: 8801d945f310 R08: 11003b28be45 R09: ed0035e7bd88
R10: ed0035e7bd88 R11: 8801af3dec43 R12:
R13: 11003b28be51 R14: 8801d945f2e8 R15: 8801c5811d50
FS: () GS:8801daf0() knlGS:
CS: 0010 DS: ES: CR0: 80050033
CR2: 0072c029 CR3: 0001b19fd000 CR4: 001406e0
DR0: DR1: DR2:
DR3: DR6: fffe0ff0 DR7: 0400

---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkal...@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with syzbot.
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches
Re: [PATCH mlx5-next] RDMA/mlx5: Don't use cached IRQ affinity mask
Hi,
I've tested this patch and it seems problematic at this moment.
Maybe this is because of the bug that Steve mentioned in the NVMe
mailing list. Sagi mentioned that we should fix it in the NVMe/RDMA
initiator and I'll run his suggestion as well.
BTW, when I run the blk_mq_map_queues it works for every irq affinity.

On 7/16/2018 1:30 PM, Leon Romanovsky wrote:
> On Mon, Jul 16, 2018 at 01:23:24PM +0300, Sagi Grimberg wrote:
>> Leon, I'd like to see a tested-by tag for this (at least until I get
>> some time to test it).
>
> Of course.
>
> Thanks
>
>> The patch itself looks fine to me.

-Max.
Re: [PATCH iproute2] ip: add support for seg6local End.BPF action
On Mon, 16 Jul 2018 14:47:41 +
Mathieu Xhonneux wrote:

> This patch adds support for the End.BPF action of the seg6local
> lightweight tunnel. Functions from the BPF lightweight tunnel are
> re-used in this patch. Example:
>
> $ ip -6 route add fc00::18 encap seg6local action End.BPF obj my_bpf.o sec my_func dev eth0
>
> $ ip -6 route show fc00::18
> fc00::18 encap seg6local action End.BPF my_bpf.o:[my_func] dev eth0 metric 1024 pref medium
>
> Signed-off-by: Mathieu Xhonneux
> ---
>  ip/iproute_lwtunnel.c | 122 +-
>  lib/bpf.c             |   5 +++
>  2 files changed, 77 insertions(+), 50 deletions(-)
>
> diff --git a/ip/iproute_lwtunnel.c b/ip/iproute_lwtunnel.c
> index 46a212c8..71c3d8a4 100644
> --- a/ip/iproute_lwtunnel.c
> +++ b/ip/iproute_lwtunnel.c
> @@ -177,6 +177,7 @@ static const char *seg6_action_names[SEG6_LOCAL_ACTION_MAX + 1] = {
>  	[SEG6_LOCAL_ACTION_END_S] = "End.S",
>  	[SEG6_LOCAL_ACTION_END_AS] = "End.AS",
>  	[SEG6_LOCAL_ACTION_END_AM] = "End.AM",
> +	[SEG6_LOCAL_ACTION_END_BPF] = "End.BPF",
>  };
>
>  static const char *format_action_type(int action)
> @@ -250,6 +251,15 @@ static void print_encap_seg6local(FILE *fp, struct rtattr *encap)
>  		print_string(PRINT_ANY, "oif",
>  			     "oif %s ", ll_index_to_name(oif));
>  	}
> +
> +	if (tb[SEG6_LOCAL_BPF]) {
> +		struct rtattr *tb_bpf[LWT_BPF_PROG_MAX+1];
> +
> +		parse_rtattr_nested(tb_bpf, LWT_BPF_PROG_MAX, tb[SEG6_LOCAL_BPF]);
> +
> +		if (tb_bpf[LWT_BPF_PROG_NAME])
> +			fprintf(fp, "%s ", rta_getattr_str(tb_bpf[LWT_BPF_PROG_NAME]));
> +	}
>  }

Please use print_string to support JSON output.
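The review point is that a bare fprintf() only ever produces text output, so the BPF program name would be missing from `ip -j` output. The existing "oif" code in the same function already uses print_string(PRINT_ANY, key, fmt, value) for this. By analogy, the new code would presumably become something like `print_string(PRINT_ANY, "bpf", "%s ", rta_getattr_str(tb_bpf[LWT_BPF_PROG_NAME]));`, where the "bpf" key is only a guess. The sketch below is a hypothetical stand-in, not iproute2's implementation. It shows the idea behind PRINT_ANY: one call that emits a key/value pair in JSON mode and the formatted text otherwise. It writes into a buffer so the result can be checked, and it does no JSON string escaping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical global mirroring "was -j/-json given on the command line". */
static bool json_mode;

/* Hypothetical imitation of print_string(PRINT_ANY, key, fmt, value):
 * JSON mode emits "key":"value" (no escaping in this sketch), text mode
 * formats value with fmt. Returns what snprintf() returns. */
static int demo_print_string(char *buf, size_t len, const char *key,
			     const char *fmt, const char *value)
{
	assert(buf != NULL && key != NULL && fmt != NULL && value != NULL);
	if (json_mode)
		return snprintf(buf, len, "\"%s\":\"%s\"", key, value);
	return snprintf(buf, len, fmt, value);
}
```

Because the key and the text format travel together in one call, the text output stays byte-for-byte the same as the fprintf() version while JSON consumers also get the field.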