Re: [PATCH net-next] ip: silence udp zerocopy smatch false positive
From: Willem de Bruijn Date: Sat, 8 Dec 2018 06:22:46 -0500 > From: Willem de Bruijn > > extra_uref is used in __ip(6)_append_data only if uarg is set. > > Smatch sees that the variable is passed to sock_zerocopy_put_abort. > This function accesses it only when uarg is set, but smatch cannot > infer this. > > Make this dependency explicit. > > Fixes: 52900d22288e ("udp: elide zerocopy operation in hot path") > Signed-off-by: Willem de Bruijn I looked and can't figure out a better way to fix this :) Applied, thanks Willem.
Re: [net-next, RFC, 4/8] net: core: add recycle capabilities on skbs via page_pool API
From: Ilias Apalodimas
Date: Sat, 8 Dec 2018 16:57:28 +0200

> The patchset speeds up the mvneta driver on the default network
> stack. The only change that was needed was to adapt the driver to
> using the page_pool API. The speed improvements we are seeing on
> specific workloads (i.e. 256b < packet < 400b) are almost 3x.
>
> Lots of high speed drivers are doing similar recycling tricks themselves (and
> there's no common code, everyone is doing something similar though). All we are
> trying to do is provide a unified API to make that easier for the rest. Another
> advantage is that if some drivers switch to the API, adding XDP
> functionality on them is pretty trivial.

Yeah this is a very important point moving forward.

Jesse Brandeburg brought the following up to me at LPC and I'd like to
develop it further.

Right now we tell driver authors to write a new driver as SKB based,
and once they've done all of that work we tell them to basically
shoe-horn XDP support into that somewhat different framework.

Instead, the model should be the other way around, because with a raw
meta-data free set of data buffers we can always construct an SKB or
pass it to XDP.

So drivers should be targeting some raw data buffer kind of interface
which takes care of all of this stuff. If the buffers get wrapped
into an SKB and get pushed into the traditional networking stack, the
driver shouldn't know or care. Likewise if it ends up being processed
with XDP, it should not need to know or care.

All of those details should be behind a common layer. Then we can
control:

1) Buffer handling, recycling, "fast paths"

2) Statistics

3) XDP feature sets

We can consolidate behavior and semantics across all of the drivers
if we do this. No more talk about "supporting all XDP features",
and the inconsistencies we have because of that.

The whole common statistics discussion could be resolved with this
common layer as well.

We'd be able to control and properly optimize everything.
Re: [net-next, RFC, 4/8] net: core: add recycle capabilities on skbs via page_pool API
From: Jesper Dangaard Brouer Date: Sat, 8 Dec 2018 12:36:10 +0100 > The annoying part is actually that depending on the kernel config > options CONFIG_XFRM, CONFIG_NF_CONNTRACK and CONFIG_BRIDGE_NETFILTER, > whether there is a cache-line split, where mem_info gets moved into the > next cacheline. Note that Florian Westphal's work (trying to help MP-TCP) would eliminate this variability.
Re: [net-next PATCH RFC 4/8] net: core: add recycle capabilities on skbs via page_pool API
From: Jesper Dangaard Brouer Date: Fri, 07 Dec 2018 00:25:47 +0100 > @@ -744,6 +745,10 @@ struct sk_buff { > head_frag:1, > xmit_more:1, > pfmemalloc:1; > + /* TODO: Future idea, extend mem_info with __u8 flags, and > + * move bits head_frag and pfmemalloc there. > + */ > + struct xdp_mem_info mem_info; This is 4 bytes right? I guess I can live with this. Please do some microbenchmarks to make sure this doesn't show any obvious regressions. Thanks.
Re: [net-next PATCH RFC 1/8] page_pool: add helper functions for DMA
From: Jesper Dangaard Brouer
Date: Fri, 07 Dec 2018 00:25:32 +0100

> From: Ilias Apalodimas
>
> Add helper functions for retrieving dma_addr_t stored in page_private and
> unmapping dma addresses, mapped via the page_pool API.
>
> Signed-off-by: Ilias Apalodimas
> Signed-off-by: Jesper Dangaard Brouer

This isn't going to work on 32-bit platforms where dma_addr_t is a u64,
because the page private is unsigned long.

Grep for PHYS_ADDR_T_64BIT under arch/ to see the vast majority of the
cases where this happens, then ARCH_DMA_ADDR_T_64BIT.
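The objection above can be illustrated in userspace. This is only a sketch under stated assumptions: `ulong32_t` and `dma_addr64_t` are hypothetical stand-ins for a 32-bit `unsigned long` and a 64-bit `dma_addr_t`, and none of this is the page_pool code itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins: on a 32-bit kernel with 64-bit DMA addressing,
 * unsigned long is 32 bits wide while dma_addr_t is 64 bits wide. */
typedef uint32_t ulong32_t;    /* models "unsigned long" on a 32-bit platform */
typedef uint64_t dma_addr64_t; /* models a 64-bit dma_addr_t */

/* Store a DMA address in a field as wide as page private, then read it back. */
dma_addr64_t roundtrip_via_private(dma_addr64_t dma)
{
	ulong32_t private_field = (ulong32_t)dma; /* the upper 32 bits are dropped here */

	return (dma_addr64_t)private_field;
}

/* True when the address comes back unchanged. */
bool survives_roundtrip(dma_addr64_t dma)
{
	return roundtrip_via_private(dma) == dma;
}
```

Any address below 4 GiB survives the round trip; an address with bits set above bit 31 comes back truncated, which is the failure mode the reply describes.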
Re: [PATCH] Revert "net/ibm/emac: wrong bit is used for STA control"
From: Benjamin Herrenschmidt Date: Fri, 07 Dec 2018 15:05:04 +1100 > This reverts commit 624ca9c33c8a853a4a589836e310d776620f4ab9. > > This commit is completely bogus. The STACR register has two formats, old > and new, depending on the version of the IP block used. There's a pair of > device-tree properties that can be used to specify the format used: > > has-inverted-stacr-oc > has-new-stacr-staopc > > What this commit did was to change the bit definition used with the old > parts to match the new parts. This of course breaks the driver on all > the old ones. > > Instead, the author should have set the appropriate properties in the > device-tree for the variant used on his board. > > Signed-off-by: Benjamin Herrenschmidt > --- > > Found while setting up some old ppc440 boxes for test/CI Applied, thanks.
Re: [PATCH] net-udp: deprioritize cpu match for udp socket lookup
From: Maciej Żenczykowski Date: Fri, 7 Dec 2018 16:46:36 -0800 >> This doesn't apply to the current net tree. >> >> Also "net-udp: " is a weird subsystem prefix, just use "udp: ". >> >> Thank you. > > Interesting... this patch was on top of net-next/master, and it still > rebases cleanly on current net-next/master. > > Would you like it on net/master instead? It indeed doesn't apply > cleanly there... Well, it is a bug fix isn't it? Or is this more like a behavioral feature?
Re: [PATCH net-next 0/4] tc-testing: implement command timeouts and better results tracking
From: Lucas Bates Date: Thu, 6 Dec 2018 17:42:23 -0500 > Patch 1 adds a timeout feature for any command tdc launches in a subshell. > This prevents tdc from hanging indefinitely. > > Patches 2-4 introduce a new method for tracking and generating test case > results, and implements it across the core script and all applicable > plugins. Series applied.
Re: [PATCH net v2 0/2] Fix slab out-of-bounds on insufficient headroom for IPv6 packets
From: Stefano Brivio Date: Thu, 6 Dec 2018 19:30:35 +0100 > Patch 1/2 fixes a slab out-of-bounds occurring with short SCTP packets over > IPv4 over L2TP over IPv6 on a configuration with relatively low HEADER_MAX. > > Patch 2/2 makes sure we avoid writing before the allocated buffer in > neigh_hh_output() in case the headroom is enough for the unaligned hardware > header size, but not enough for the aligned one, and that we warn if we hit > this condition. Series applied and queued up for -stable, thanks.
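The condition described for patch 2/2 (headroom large enough for the raw header but not for the aligned one) can be sketched in userspace C. The two macros follow the kernel's HH_DATA_MOD and HH_DATA_ALIGN definitions as I understand them, so treat the values as assumptions; `headroom_fits_cached_header()` is a hypothetical helper, not a kernel function.

```c
#include <stdbool.h>

/* Assumed to match the kernel definitions; verify against include/linux/netdevice.h. */
#define HH_DATA_MOD 16
#define HH_DATA_ALIGN(len) (((len) + (HH_DATA_MOD - 1)) & ~(HH_DATA_MOD - 1))

/* Hypothetical helper: the cached hardware header is copied in aligned
 * chunks, so the headroom has to cover the aligned length, not just hh_len. */
bool headroom_fits_cached_header(unsigned int headroom, unsigned int hh_len)
{
	unsigned int hh_alen = HH_DATA_ALIGN(hh_len);

	return headroom >= hh_alen;
}
```

With a 14-byte Ethernet header, 14 bytes of headroom are not enough because the aligned length is 16; that is the case where a write would land before the allocated buffer.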
Re: [PATCH net] tcp: lack of available data can also cause TSO defer
From: Eric Dumazet Date: Thu, 6 Dec 2018 09:58:24 -0800 > tcp_tso_should_defer() can return true in three different cases : > > 1) We are cwnd-limited > 2) We are rwnd-limited > 3) We are application limited. > > Neal pointed out that my recent fix went too far, since > it assumed that if we were not in 1) case, we must be rwnd-limited > > Fix this by properly populating the is_cwnd_limited and > is_rwnd_limited booleans. > > After this change, we can finally move the silly check for FIN > flag only for the application-limited case. > > The same move for EOR bit will be handled in net-next, > since commit 1c09f7d073b1 ("tcp: do not try to defer skbs > with eor mark (MSG_EOR)") is scheduled for linux-4.21 > > Tested by running 200 concurrent netperf -t TCP_RR -- -r 6,100 > and checking none of them was rwnd_limited in the chrono_stat > output from "ss -ti" command. > > Fixes: 41727549de3e ("tcp: Do not underestimate rwnd_limited") > Signed-off-by: Eric Dumazet > Suggested-by: Neal Cardwell > Reviewed-by: Neal Cardwell > Acked-by: Soheil Hassas Yeganeh > Reviewed-by: Yuchung Cheng Applied.
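The three cases listed in the commit message can be sketched as a small classifier. This is a loose userspace model of the final decision only: the function and enum names are hypothetical, the earlier send-now checks of tcp_tso_should_defer() are left out, and the window comparison is an assumption about how the limiting factor is picked.

```c
#include <stdbool.h>

enum tso_decision_sketch {
	SEND_NOW,
	DEFER_CWND_LIMITED,
	DEFER_RWND_LIMITED,
	DEFER_APP_LIMITED,
};

/* cong_win and send_win are the bytes allowed by the congestion window and
 * by the receiver window; skb_len is the data queued in this skb. */
enum tso_decision_sketch tso_defer_reason_sketch(unsigned int cong_win,
						 unsigned int send_win,
						 unsigned int skb_len,
						 bool has_fin)
{
	if (cong_win < send_win) {
		if (cong_win <= skb_len)
			return DEFER_CWND_LIMITED;
	} else {
		if (send_win <= skb_len)
			return DEFER_RWND_LIMITED;
	}

	/* Neither window is the bottleneck, so the application has not queued
	 * enough data. With FIN set no more data will follow: do not wait. */
	if (has_fin)
		return SEND_NOW;

	return DEFER_APP_LIMITED;
}
```

The point of the fix is visible in the last branch: not being cwnd-limited does not imply being rwnd-limited, and the FIN check only matters in the application-limited case.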
Re: [PATCH] net-udp: deprioritize cpu match for udp socket lookup
From: Maciej Żenczykowski Date: Wed, 5 Dec 2018 12:59:17 -0800 > From: Maciej Żenczykowski > > During udp socket lookup cpu match should be lowest priority, > hence it should increase score by only 1. > > The next priority is delivering v4 to v4 sockets, and v6 to v6 sockets. > The v6 code path doesn't have to deal with this so it always gets > a score of '4'. The v4 code path uses '4' or '2' depending on > whether we're delivering to a v4 socket or a dualstack v6 socket. > > This is more important than cpu match, so has to be greater than > the '1' bump in score from cpu match. > > All other matches (src/dst ip, src port) are even *more* important, > so need to bump score by 4 for ipv4. > > For ipv6 we could simply bump by 2, but let's keep the two code > paths as similar as possible. > > (also, while at it, remove two unnecessary unconditional score bumps) > > Signed-off-by: Maciej Żenczykowski This doesn't apply to the current net tree. Also "net-udp: " is a weird subsystem prefix, just use "udp: ". Thank you.
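The priority ordering in the quoted commit message can be checked with a small scoring sketch. The function name and the `exact_matches` parameter are hypothetical; only the weights (1 for a cpu match, 2 or 4 for the socket family, 4 for each more specific match) come from the description.

```c
#include <stdbool.h>

/* Hypothetical model of the IPv4 lookup score described in the patch.
 * native_v4_socket: true for an AF_INET socket, false for a dualstack v6 one.
 * exact_matches:    number of matching src/dst address or port criteria. */
int udp4_score_sketch(bool native_v4_socket, unsigned int exact_matches,
		      bool cpu_match)
{
	int score = native_v4_socket ? 4 : 2;

	score += 4 * (int)exact_matches; /* most important criteria */
	if (cpu_match)
		score += 1;              /* lowest priority: tie-breaker only */

	return score;
}
```

A cpu match on a dualstack socket (2 + 1) still loses to a plain v4 socket (4), and one extra address or port match on a dualstack socket (2 + 4) beats a v4 socket that only has the cpu match (4 + 1).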
Re: [Patch v2 net-next] call sk_dst_reset when set SO_DONTROUTE
From: yupeng
Date: Wed, 5 Dec 2018 18:56:28 -0800

> after set SO_DONTROUTE to 1, the IP layer should not route packets if
> the dest IP address is not in link scope. But if the socket has cached
> the dst_entry, such packets would be routed until the sk_dst_cache
> expires. So we should clean the sk_dst_cache when a user sets
> SO_DONTROUTE option. Below are server/client python scripts which
> could reproduce this issue:
 ...
> Signed-off-by: yupeng

Applied.
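For readers who have not used the option, this is how userspace toggles it. The sketch is not the reproducer elided from the quoted message; `dontroute_roundtrip()` is a hypothetical helper that only sets the option and reads it back.

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Set SO_DONTROUTE on a fresh UDP socket and read it back.
 * Returns 1 if the option reads back as enabled, 0 if not, -1 on error. */
int dontroute_roundtrip(void)
{
	int on = 1;
	int value = 0;
	int result;
	socklen_t len = sizeof(value);
	int fd = socket(AF_INET, SOCK_DGRAM, 0);

	if (fd < 0)
		return -1;

	if (setsockopt(fd, SOL_SOCKET, SO_DONTROUTE, &on, sizeof(on)) < 0 ||
	    getsockopt(fd, SOL_SOCKET, SO_DONTROUTE, &value, &len) < 0)
		result = -1;
	else
		result = value != 0;

	close(fd);
	return result;
}
```

The bug in the quoted description arises after this call when the socket has already sent traffic: the route cached before the option was set keeps being used until it expires.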
Re: [PATCH v2 net-next] neighbor: Improve garbage collection
From: David Ahern Date: Fri, 7 Dec 2018 12:24:57 -0800 > From: David Ahern > > The existing garbage collection algorithm has a number of problems: ... > This patch addresses these problems as follows: > > 1. Use of a separate list_head to track entries that can be garbage >collected along with a separate counter. PERMANENT entries are not >added to this list. > >The gc_thresh parameters are only compared to the new counter, not the >total entries in the table. The forced_gc function is updated to only >walk this new gc_list looking for entries to evict. > > 2. Entries are added to the list head at the tail and removed from the >front. > > 3. Entries are only evicted if they were last updated more than 5 seconds >ago, adhering to the original intent of gc_thresh2. > > 4. Forced gc is stopped once the number of gc_entries drops below >gc_thresh2. > > 5. Since gc checks do not apply to PERMANENT entries, gc levels are skipped >when allocating a new neighbor for a PERMANENT entry. By extension this >means there are no explicit limits on the number of PERMANENT entries >that can be created, but this is no different than FIB entries or FDB >entries. > > Signed-off-by: David Ahern > --- > v2 > - remove on_gc_list boolean in favor of !list_empty > - fix neigh_alloc to add new entry to tail of list_head Again, looks great, applied.
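Points 2 through 4 of the quoted list can be modelled with a plain array standing in for the gc_list. This is a simplified sketch, not the patch: the struct, the function names and the use of seconds instead of jiffies are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

#define GC_MIN_AGE 5 /* seconds; entries updated more recently are kept */

/* Hypothetical stand-in for a neighbour entry on the gc list. */
struct gc_entry_sketch {
	unsigned long updated; /* time of last update, in seconds */
	bool evicted;
};

/* list[] is ordered front to tail: new entries are appended at the tail, so
 * the walk starts with the oldest insertions. PERMANENT entries are never on
 * this list. Returns the number of entries evicted. */
size_t forced_gc_sketch(struct gc_entry_sketch *list, size_t n,
			size_t *gc_entries, size_t gc_thresh2,
			unsigned long now)
{
	size_t evicted = 0;

	/* Stop as soon as the counter drops below gc_thresh2. */
	for (size_t i = 0; i < n && *gc_entries >= gc_thresh2; i++) {
		if (list[i].evicted)
			continue;
		if (now - list[i].updated > GC_MIN_AGE) {
			list[i].evicted = true;
			(*gc_entries)--;
			evicted++;
		}
	}
	return evicted;
}

/* Usage example: five entries, threshold 3, current time 100. */
size_t forced_gc_demo(void)
{
	struct gc_entry_sketch list[] = {
		{ 0, false }, { 1, false }, { 2, false }, { 98, false }, { 99, false },
	};
	size_t gc_entries = 5;

	return forced_gc_sketch(list, 5, &gc_entries, 3, 100);
}
```

In the demo the three old entries are evicted and the walk stops once the counter is below the threshold, so the two entries updated within the last 5 seconds are never touched.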
Re: [PATCH V2] net: dsa: ksz: Add reset GPIO handling
From: Marek Vasut Date: Fri, 7 Dec 2018 23:59:58 +0100 > On 12/07/2018 11:24 PM, Andrew Lunn wrote: >> On Fri, Dec 07, 2018 at 10:51:36PM +0100, Marek Vasut wrote: >>> Add code to handle optional reset GPIO in the KSZ switch driver. The switch >>> has a reset GPIO line which can be controlled by the CPU, so make sure it is >>> configured correctly in such setups. >> >> Hi Marek > > Hi Andrew, > >> Please make this a patch series, not two individual patches. > > This actually is an individual patch, it doesn't depend on anything. > Or do you mean a series with the DT documentation change ? Yes, but all of this stuff is building up for one single purpose, and that is to support a new mode of operation with DSA or whatever. So please group them together in a series with an appropriate header posting.
Re: [PATCH net-next] neighbor: Add protocol attribute
From: Eric Dumazet
Date: Fri, 7 Dec 2018 15:03:04 -0800

> On 12/07/2018 02:24 PM, David Ahern wrote:
>> On 12/7/18 3:20 PM, Eric Dumazet wrote:
>>
>> /* --- cacheline 3 boundary (192 bytes) --- */
>> struct hh_cache            hh;                   /*   192    48 */
>>
>> ...
>>
>> but does not change the actual allocation size which is rounded to 512.
>>
>
> I have not talked about the allocation size, but alignment of ->ha field,
> which is kind of assuming long alignment, in a strange way.

Right, neigh->ha[] should probably be kept 8-byte aligned.
Re: [PATCH net-next] neighbor: Add protocol attribute
On 12/7/18 3:20 PM, Eric Dumazet wrote:
>
>
> On 12/07/2018 01:49 PM, David Ahern wrote:
>> From: David Ahern
>>
>> Similar to routes and rules, add protocol attribute to neighbor entries
>> for easier tracking of how each was created.
>>
>> Signed-off-by: David Ahern
>> ---
>>  include/net/neighbour.h        |  2 ++
>>  include/uapi/linux/neighbour.h |  1 +
>>  net/core/neighbour.c           | 24 +++-
>>  3 files changed, 26 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/net/neighbour.h b/include/net/neighbour.h
>> index 6c13072910ab..e93c59df9501 100644
>> --- a/include/net/neighbour.h
>> +++ b/include/net/neighbour.h
>> @@ -149,6 +149,7 @@ struct neighbour {
>>  	__u8			nud_state;
>>  	__u8			type;
>>  	__u8			dead;
>> +	u8			protocol;
>>  	seqlock_t		ha_lock;
>>  	unsigned char		ha[ALIGN(MAX_ADDR_LEN, sizeof(unsigned long))];
>
> This looks like ha[] alignment would change, I am not sure how critical it is.

Just adds 4 bytes to neighbour:

...
	/* --- cacheline 2 boundary (128 bytes) --- */
	long unsigned int          used;                 /*   128     8 */
	atomic_t                   probes;               /*   136     4 */
	__u8                       flags;                /*   140     1 */
	__u8                       nud_state;            /*   141     1 */
	__u8                       type;                 /*   142     1 */
	__u8                       dead;                 /*   143     1 */
	u8                         protocol;             /*   144     1 */

	/* XXX 3 bytes hole, try to pack */

	seqlock_t                  ha_lock;              /*   148     8 */
	unsigned char              ha[32];               /*   156    32 */

	/* XXX 4 bytes hole, try to pack */

	/* --- cacheline 3 boundary (192 bytes) --- */
	struct hh_cache            hh;                   /*   192    48 */
...

but does not change the actual allocation size which is rounded to 512.
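The alignment question in this exchange can be checked with offsetof() on a userspace model of the fields that start at the 128-byte cacheline boundary. The struct and type names below are hypothetical, and `seqlock_model` assumes the seqlock is two 32-bit words (8 bytes, 4-byte aligned), which is what the offsets in the layout dump imply.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed shape of seqlock_t for this sketch: 8 bytes, 4-byte aligned. */
struct seqlock_model {
	uint32_t seq;
	uint32_t lock;
};

/* Fields from "used" onwards, before the patch. */
struct neigh_tail_before {
	uint64_t used;
	uint32_t probes;
	uint8_t flags, nud_state, type, dead;
	struct seqlock_model ha_lock;
	unsigned char ha[32];
};

/* The same fields with the new one-byte member added after "dead". */
struct neigh_tail_after {
	uint64_t used;
	uint32_t probes;
	uint8_t flags, nud_state, type, dead;
	uint8_t protocol;
	struct seqlock_model ha_lock;
	unsigned char ha[32];
};

size_t ha_offset_before(void)
{
	return offsetof(struct neigh_tail_before, ha);
}

size_t ha_offset_after(void)
{
	return offsetof(struct neigh_tail_after, ha);
}
```

Relative to the cacheline boundary, ha[] moves from offset 24 to offset 28 (156 in the dump), so it is no longer 8-byte aligned even though the allocation size does not change.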
[PATCH net-next] neighbor: Add protocol attribute
From: David Ahern

Similar to routes and rules, add protocol attribute to neighbor entries
for easier tracking of how each was created.

Signed-off-by: David Ahern
---
 include/net/neighbour.h        |  2 ++
 include/uapi/linux/neighbour.h |  1 +
 net/core/neighbour.c           | 24 +++-
 3 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/include/net/neighbour.h b/include/net/neighbour.h
index 6c13072910ab..e93c59df9501 100644
--- a/include/net/neighbour.h
+++ b/include/net/neighbour.h
@@ -149,6 +149,7 @@ struct neighbour {
 	__u8			nud_state;
 	__u8			type;
 	__u8			dead;
+	u8			protocol;
 	seqlock_t		ha_lock;
 	unsigned char		ha[ALIGN(MAX_ADDR_LEN, sizeof(unsigned long))];
 	struct hh_cache		hh;
@@ -173,6 +174,7 @@ struct pneigh_entry {
 	possible_net_t		net;
 	struct net_device	*dev;
 	u8			flags;
+	u8			protocol;
 	u8			key[0];
 };

diff --git a/include/uapi/linux/neighbour.h b/include/uapi/linux/neighbour.h
index 998155444e0d..cd144e3099a3 100644
--- a/include/uapi/linux/neighbour.h
+++ b/include/uapi/linux/neighbour.h
@@ -28,6 +28,7 @@ enum {
 	NDA_MASTER,
 	NDA_LINK_NETNSID,
 	NDA_SRC_VNI,
+	NDA_PROTOCOL,  /* Originator of entry */
 	__NDA_MAX
 };

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index c3b58712e98b..56984695585d 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -1799,6 +1799,7 @@ static int neigh_add(struct sk_buff *skb, struct nlmsghdr *nlh,
 	struct net_device *dev = NULL;
 	struct neighbour *neigh;
 	void *dst, *lladdr;
+	u8 protocol = 0;
 	int err;

 	ASSERT_RTNL();
@@ -1838,6 +1839,14 @@ static int neigh_add(struct sk_buff *skb, struct nlmsghdr *nlh,
 	dst = nla_data(tb[NDA_DST]);
 	lladdr = tb[NDA_LLADDR] ? nla_data(tb[NDA_LLADDR]) : NULL;

+	if (tb[NDA_PROTOCOL]) {
+		if (nla_len(tb[NDA_PROTOCOL]) != sizeof(u8)) {
+			NL_SET_ERR_MSG(extack, "Invalid protocol attribute");
+			goto out;
+		}
+		protocol = nla_get_u8(tb[NDA_PROTOCOL]);
+	}
+
 	if (ndm->ndm_flags & NTF_PROXY) {
 		struct pneigh_entry *pn;

@@ -1845,6 +1854,8 @@ static int neigh_add(struct sk_buff *skb, struct nlmsghdr *nlh,
 		pn = pneigh_lookup(tbl, net, dst, dev, 1);
 		if (pn) {
 			pn->flags = ndm->ndm_flags;
+			if (protocol)
+				pn->protocol = protocol;
 			err = 0;
 		}
 		goto out;
@@ -1893,6 +1904,10 @@ static int neigh_add(struct sk_buff *skb, struct nlmsghdr *nlh,
 	} else
 		err = __neigh_update(neigh, lladdr, ndm->ndm_state, flags,
 				     NETLINK_CB(skb).portid, extack);
+
+	if (protocol)
+		neigh->protocol = protocol;
+
 	neigh_release(neigh);

 out:
@@ -2386,6 +2401,9 @@ static int neigh_fill_info(struct sk_buff *skb, struct neighbour *neigh,
 	    nla_put(skb, NDA_CACHEINFO, sizeof(ci), &ci))
 		goto nla_put_failure;

+	if (neigh->protocol && nla_put_u8(skb, NDA_PROTOCOL, neigh->protocol))
+		goto nla_put_failure;
+
 	nlmsg_end(skb, nlh);
 	return 0;

@@ -2417,6 +2435,9 @@ static int pneigh_fill_info(struct sk_buff *skb, struct pneigh_entry *pn,
 	if (nla_put(skb, NDA_DST, tbl->key_len, pn->key))
 		goto nla_put_failure;

+	if (pn->protocol && nla_put_u8(skb, NDA_PROTOCOL, pn->protocol))
+		goto nla_put_failure;
+
 	nlmsg_end(skb, nlh);
 	return 0;

@@ -3072,7 +3093,8 @@ static inline size_t neigh_nlmsg_size(void)
 	       + nla_total_size(MAX_ADDR_LEN) /* NDA_DST */
 	       + nla_total_size(MAX_ADDR_LEN) /* NDA_LLADDR */
 	       + nla_total_size(sizeof(struct nda_cacheinfo))
-	       + nla_total_size(4); /* NDA_PROBES */
+	       + nla_total_size(4)  /* NDA_PROBES */
+	       + nla_total_size(1); /* NDA_PROTOCOL */
 }

 static void __neigh_notify(struct neighbour *n, int type, int flags,
--
2.11.0
Re: [PATCH iproute2-next 0/2] devlink: Add support for 'fw_load_policy' generic parameter
On 12/4/18 3:14 AM, Shalom Toledo wrote: > Patch #1 add string to uint conversion support for generic parameters. > Patch #2 add string to uint support for 'fw_load_policy' generic parameter > > Shalom Toledo (2): > devlink: Add string to uint{8,16,32} conversion for generic parameters > devlink: Add support for 'fw_load_policy' generic parameter > > devlink/devlink.c| 156 --- > include/uapi/linux/devlink.h | 5 ++ > 2 files changed, 151 insertions(+), 10 deletions(-) > applied to iproute2-next. Thanks
Re: [PATCH 1/5] net: dsa: ksz: Add MIB counter reading support
Every patch series should have a header posting with Subject of the form "[PATCH 0/N] ..." explaining what the series does at a high level, how it does it, and why it does it that way.
Re: [PATCH v2 net-next 0/4] net: aquantia: add RSS configuration
From: Igor Russkikh Date: Fri, 7 Dec 2018 14:00:09 + > In this patchset few bugs related to RSS are fixed and RSS table and > hash key configuration is added. > > We also do increase max number of HW rings upto 8. > > v2: removed extra arg check Series applied.
[PATCH v2 net-next] neighbor: Improve garbage collection
From: David Ahern

The existing garbage collection algorithm has a number of problems:

1. The gc algorithm will not evict PERMANENT entries as those entries
   are managed by userspace, yet the existing algorithm walks the entire
   hash table which means it always considers PERMANENT entries when
   looking for entries to evict. In some use cases (e.g., EVPN) there
   can be tens of thousands of PERMANENT entries leading to wasted
   CPU cycles when gc kicks in. As an example, with 32k permanent
   entries, neigh_alloc has been observed taking more than 4 msec per
   invocation.

2. Currently, when the number of neighbor entries hits gc_thresh2 and
   the last flush for the table was more than 5 seconds ago, gc kicks in
   and walks the entire hash table evicting *all* entries not in PERMANENT
   or REACHABLE state and not marked as externally learned. There is no
   discriminator on when the neigh entry was created or if it just moved
   from REACHABLE to another NUD_VALID state (e.g., NUD_STALE).

   It is possible for entries to be created or for established neighbor
   entries to be moved to STALE (e.g., an external node sends an ARP
   request) right before the 5 second window lapses:

        -----|---------x|----------|-----
            t-5         t         t+5

   If that happens those entries are evicted during gc causing unnecessary
   thrashing on neighbor entries and userspace caches trying to track them.

   Further, this contradicts the description of gc_thresh2 which says
   "Entries older than 5 seconds will be cleared".

   One workaround is to make gc_thresh2 == gc_thresh3 but that negates the
   whole point of having separate thresholds.

3. Clearing *all* neigh non-PERMANENT/REACHABLE/externally learned
   entries when gc_thresh2 is exceeded is overkill and contributes to
   thrashing especially during startup.

This patch addresses these problems as follows:

1. Use of a separate list_head to track entries that can be garbage
   collected along with a separate counter. PERMANENT entries are not
   added to this list.

   The gc_thresh parameters are only compared to the new counter, not the
   total entries in the table. The forced_gc function is updated to only
   walk this new gc_list looking for entries to evict.

2. Entries are added to the list head at the tail and removed from the
   front.

3. Entries are only evicted if they were last updated more than 5 seconds
   ago, adhering to the original intent of gc_thresh2.

4. Forced gc is stopped once the number of gc_entries drops below
   gc_thresh2.

5. Since gc checks do not apply to PERMANENT entries, gc levels are skipped
   when allocating a new neighbor for a PERMANENT entry. By extension this
   means there are no explicit limits on the number of PERMANENT entries
   that can be created, but this is no different than FIB entries or FDB
   entries.

Signed-off-by: David Ahern
---
v2
- remove on_gc_list boolean in favor of !list_empty
- fix neigh_alloc to add new entry to tail of list_head

 Documentation/networking/ip-sysctl.txt |   4 +-
 include/net/neighbour.h                |   3 +
 net/core/neighbour.c                   | 119 +++--
 3 files changed, 90 insertions(+), 36 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index af2a69439b93..acdfb5d2bcaa 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -108,8 +108,8 @@ neigh/default/gc_thresh2 - INTEGER
 	Default: 512

 neigh/default/gc_thresh3 - INTEGER
-	Maximum number of neighbor entries allowed. Increase this
-	when using large numbers of interfaces and when communicating
+	Maximum number of non-PERMANENT neighbor entries allowed. Increase
+	this when using large numbers of interfaces and when communicating
 	with large numbers of directly-connected peers.
 	Default: 1024

diff --git a/include/net/neighbour.h b/include/net/neighbour.h
index f58b384aa6c9..6c13072910ab 100644
--- a/include/net/neighbour.h
+++ b/include/net/neighbour.h
@@ -154,6 +154,7 @@ struct neighbour {
 	struct hh_cache		hh;
 	int			(*output)(struct neighbour *, struct sk_buff *);
 	const struct neigh_ops	*ops;
+	struct list_head	gc_list;
 	struct rcu_head		rcu;
 	struct net_device	*dev;
 	u8			primary_key[0];
@@ -214,6 +215,8 @@ struct neigh_table {
 	struct timer_list	proxy_timer;
 	struct sk_buff_head	proxy_queue;
 	atomic_t		entries;
+	atomic_t		gc_entries;
+	struct list_head	gc_list;
 	rwlock_t		lock;
 	unsigned long		last_rand;
 	struct neigh_statistics	__percpu *stats;
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 6d479b5562be..c3b58712e98b 100644
--- a/net/core/neighbo
Re: [PATCH net] ipv6: sr: properly initialize flowi6 prior passing to ip6_route_output
From: Shmulik Ladkani Date: Fri, 7 Dec 2018 09:50:17 +0200 > In 'seg6_output', stack variable 'struct flowi6 fl6' was missing > initialization. > > Fixes: 6c8702c60b88 ("ipv6: sr: add support for SRH encapsulation and > injection with lwtunnels") > Signed-off-by: Shmulik Ladkani Applied and queued up for -stable, thanks.
Re: [PATCH net-next] neighbour: Improve garbage collection
On 12/6/18 8:59 PM, David Miller wrote: > But why do you need the on_gc_list boolean state? f mental blockage. v2 coming up.
Re: [PATCH] Revert "net/ibm/emac: wrong bit is used for STA control"
Looks like your posting was empty?
Re: [PATCH net-next] neighbour: Improve garbage collection
From: David Ahern
Date: Thu, 6 Dec 2018 14:38:44 -0800

> The existing garbage collection algorithm has a number of problems:

Thanks for working on this!

I totally agree with what you are doing, especially the separate
gc_list.

But why do you need the on_gc_list boolean state? That's equivalent
to "!list_empty(&neigh->gc_list)" and seems redundant.
[PATCH net-next] neighbour: Improve garbage collection
From: David Ahern

The existing garbage collection algorithm has a number of problems:

1. The gc algorithm will not evict PERMANENT entries as those entries
   are managed by userspace, yet the existing algorithm walks the entire
   hash table which means it always considers PERMANENT entries when
   looking for entries to evict. In some use cases (e.g., EVPN) there
   can be tens of thousands of PERMANENT entries leading to wasted
   CPU cycles when gc kicks in. As an example, with 32k permanent
   entries, neigh_alloc has been observed taking more than 4 msec per
   invocation.

2. Currently, when the number of neighbor entries hits gc_thresh2 and
   the last flush for the table was more than 5 seconds ago, gc kicks in
   and walks the entire hash table evicting *all* entries not in PERMANENT
   or REACHABLE state and not marked as externally learned. There is no
   discriminator on when the neigh entry was created or if it just moved
   from REACHABLE to another NUD_VALID state (e.g., NUD_STALE).

   It is possible for entries to be created or for established neighbor
   entries to be moved to STALE (e.g., an external node sends an ARP
   request) right before the 5 second window lapses:

        -----|---------x|----------|-----
            t-5         t         t+5

   If that happens those entries are evicted during gc causing unnecessary
   thrashing on neighbor entries and userspace caches trying to track them.

   Further, this contradicts the description of gc_thresh2 which says
   "Entries older than 5 seconds will be cleared".

   One workaround is to make gc_thresh2 == gc_thresh3 but that negates the
   whole point of having separate thresholds.

3. Clearing *all* neigh non-PERMANENT/REACHABLE/externally learned
   entries when gc_thresh2 is exceeded is overkill and contributes to
   thrashing especially during startup.

This patch addresses these problems as follows:

1. use of a separate list_head to track entries that can be garbage
   collected along with a separate counter. PERMANENT entries are not
   added to this list.

   The gc_thresh parameters are only compared to the new counter, not the
   total entries in the table. The forced_gc function is updated to only
   walk this new gc_list looking for entries to evict.

2. Entries are added to the list head at the tail and removed from the
   front.

3. Entries are only evicted if they were last updated more than 5 seconds
   ago, adhering to the original intent of gc_thresh2.

4. Forced gc is stopped once the number of gc_entries drops below
   gc_thresh2.

5. Since gc checks do not apply to PERMANENT entries, gc levels are skipped
   when allocating a new neighbor for a PERMANENT entry. By extension this
   means there are no explicit limits on the number of PERMANENT entries
   that can be created, but this is no different than FIB entries or FDB
   entries.

Signed-off-by: David Ahern
---
 Documentation/networking/ip-sysctl.txt |   4 +-
 include/net/neighbour.h                |   4 ++
 net/core/neighbour.c                   | 122 +++--
 3 files changed, 93 insertions(+), 37 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index af2a69439b93..acdfb5d2bcaa 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -108,8 +108,8 @@ neigh/default/gc_thresh2 - INTEGER
 	Default: 512

 neigh/default/gc_thresh3 - INTEGER
-	Maximum number of neighbor entries allowed. Increase this
-	when using large numbers of interfaces and when communicating
+	Maximum number of non-PERMANENT neighbor entries allowed. Increase
+	this when using large numbers of interfaces and when communicating
 	with large numbers of directly-connected peers.
 	Default: 1024

diff --git a/include/net/neighbour.h b/include/net/neighbour.h
index f58b384aa6c9..846ad8da91eb 100644
--- a/include/net/neighbour.h
+++ b/include/net/neighbour.h
@@ -154,6 +154,8 @@ struct neighbour {
 	struct hh_cache		hh;
 	int			(*output)(struct neighbour *, struct sk_buff *);
 	const struct neigh_ops	*ops;
+	struct list_head	gc_list;
+	bool			on_gc_list;
 	struct rcu_head		rcu;
 	struct net_device	*dev;
 	u8			primary_key[0];
@@ -214,6 +216,8 @@ struct neigh_table {
 	struct timer_list	proxy_timer;
 	struct sk_buff_head	proxy_queue;
 	atomic_t		entries;
+	atomic_t		gc_entries;
+	struct list_head	gc_list;
 	rwlock_t		lock;
 	unsigned long		last_rand;
 	struct neigh_statistics	__percpu *stats;
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 6d479b5562be..ab11e94ec44d 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -118,6 +118,36 @@ uns
Re: [PATCH net-next 2/2] net: dsa: Set the master device's MTU to account for DSA overheads
From: Andrew Lunn Date: Thu, 6 Dec 2018 21:48:46 +0100 > David has already accepted the patchset, so i will add a followup > patch. Yeah sorry for jumping the gun, the changes looked pretty straightforward to me. :-/
Re: [PATCH net-next v2 0/8] Pass extack to NETDEV_PRE_UP
From: Petr Machata Date: Thu, 6 Dec 2018 17:05:35 + > Drivers may need to validate configuration of a device that's about to > be upped. An example is mlxsw, which needs to check the configuration of > a VXLAN device attached to an offloaded bridge. Should the validation > fail, there's currently no way to communicate details of the failure to > the user, beyond an error number. > > Therefore this patch set extends the NETDEV_PRE_UP event to include > extack, if available. ... Series applied, thank you.
Re: [PATCH net 0/4] mlxsw: Various fixes
From: Ido Schimmel Date: Thu, 6 Dec 2018 17:44:48 + > Patches #1 and #2 fix two VxLAN related issues. The first patch removes > warnings that can currently be triggered from user space. Second patch > avoids leaking a FID in an error path. > > Patch #3 fixes a too strict check that causes certain host routes not to > be promoted to perform GRE decapsulation in hardware. > > Last patch avoids a use-after-free when deleting a VLAN device via an > ioctl when it is enslaved to a bridge. I have a patchset for net-next > that reworks this code and makes the driver more robust. Series applied.
Re: mv88e6060: Turn e6060 driver into e6065 driver
From: Pavel Machek Date: Thu, 6 Dec 2018 14:03:45 +0100 > @@ -79,7 +82,7 @@ static enum dsa_tag_protocol > mv88e6060_get_tag_protocol(struct dsa_switch *ds, > { >//return DSA_TAG_PROTO_QCA; >//return DSA_TAG_PROTO_TRAILER; These C++ style comments are not in any of my tree(s). Your patch submission really needs to shape up if you want your patches to be considered seriously. Thank you.
Re: [PATCH] mv88e6060: Warn about errors
Plain printk() calls are never appropriate. Please explicitly use pr_warn() or similar. If there is a device context available, either a generic device or a netdev, use one of the dev_*() or netdev_*() variants.
Re: [PATCH] tcp: fix code style in tcp_recvmsg()
From: Pedro Tammela Date: Thu, 6 Dec 2018 10:45:28 -0200 > 2 goto labels are indented with a tab. remove the tabs and > keep the code style consistent. > > Signed-off-by: Pedro Tammela Applied to net-next.
Re: [PATCH net-next 0/2] Adjust MTU of DSA master interface
From: Andrew Lunn Date: Thu, 6 Dec 2018 11:36:03 +0100 > DSA makes use of additional headers to direct a frame in/out of a > specific port of the switch. When the slave interfaces uses an MTU of > 1500, the master interface can be asked to handle frames with an MTU > of 1504, or 1508 bytes. Some Ethernet interfaces won't > transmit/receive frames which are bigger than their MTU. > > Automate the increasing of the MTU on the master interface, by adding > to each tagging driver how much overhead they need, and then calling > dev_set_mtu() of the master interface to increase its MTU as needed. Series applied, thanks Andrew.
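The arithmetic behind the 1504 and 1508 figures is simply the slave MTU plus the tag overhead of the tagging protocol in use. The struct and function below are a hypothetical sketch; the tagger names are placeholders and the real patch applies the result to the master via dev_set_mtu().

```c
/* Hypothetical description of a tagging protocol: only the overhead matters here. */
struct tagger_sketch {
	const char *name;
	unsigned int overhead; /* extra header bytes added to each frame */
};

/* MTU the master interface needs so that a full-size slave frame plus tag fits. */
unsigned int master_mtu_sketch(unsigned int slave_mtu,
			       const struct tagger_sketch *tagger)
{
	return slave_mtu + tagger->overhead;
}

/* Usage example with two placeholder taggers adding 4 and 8 bytes. */
unsigned int master_mtu_demo(unsigned int overhead)
{
	struct tagger_sketch tagger = { "placeholder", overhead };

	return master_mtu_sketch(1500, &tagger);
}
```

With a 1500-byte slave MTU, a 4-byte tag requires 1504 on the master and an 8-byte tag requires 1508, matching the numbers in the cover letter.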
Re: [PATCH][net-next] tun: align write-heavy flow entry members to a cache line
From: Li RongQing Date: Thu, 6 Dec 2018 16:08:17 +0800 > tun flow entry 'updated' fields are written when receive > every packet. Thus if a flow is receiving packets from a > particular flow entry, it'll cause false-sharing with > all the other who has looked it up, so move it in its own > cache line > > and update 'queue_index' and 'update' field only when > they are changed to reduce the cache false-sharing. > > Signed-off-by: Zhang Yu > Signed-off-by: Wang Li > Signed-off-by: Li RongQing Applied.
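Both ideas in the commit message (give the write-heavy field its own cache line, and skip stores that would not change anything) can be sketched in C11. The struct is a hypothetical reduction of a flow entry, the 64-byte line size is an assumption, and the kernel would use its own alignment annotation rather than `alignas`.

```c
#include <stdalign.h>
#include <stddef.h>

#define CACHE_LINE_SKETCH 64 /* assumed cache line size */

/* Hypothetical flow entry: read-mostly lookup fields first, the frequently
 * written timestamp pushed onto a separate cache line. */
struct flow_entry_sketch {
	unsigned int rxhash;
	unsigned int queue_index;
	alignas(CACHE_LINE_SKETCH) unsigned long updated;
};

/* Write only what changed, so readers of an unchanged entry keep their
 * cache line in shared state. Returns the number of stores performed. */
int flow_update_sketch(struct flow_entry_sketch *e, unsigned int queue_index,
		       unsigned long now)
{
	int stores = 0;

	if (e->queue_index != queue_index) {
		e->queue_index = queue_index;
		stores++;
	}
	if (e->updated != now) {
		e->updated = now;
		stores++;
	}
	return stores;
}

size_t updated_offset_sketch(void)
{
	return offsetof(struct flow_entry_sketch, updated);
}
```

A packet arriving in the same tick on the same queue then causes no stores at all, and lookups that only read rxhash never share a line with the timestamp.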
Re: [PATCH][net-next] tun: remove unnecessary check in tun_flow_update
From: Li RongQing Date: Thu, 6 Dec 2018 16:28:11 +0800 > caller has guaranteed that rxhash is not zero > > Signed-off-by: Li RongQing Applied.
Re: [PATCH 1/2] net: linkwatch: send change uevent on link changes
From: Jouke Witteveen Date: Thu, 6 Dec 2018 09:59:20 +0100 > On Thu, Dec 6, 2018 at 1:34 AM David Miller wrote: >> >> From: Jouke Witteveen >> Date: Wed, 5 Dec 2018 23:38:17 +0100 >> >> > Can you elaborate a bit? I may not be aware of the policy you have in >> > mind. >> >> When we have a user facing interface to do something, we don't create >> another one unless it is absolutely, positively, unavoidable. > > Obviously, if I would have known this I would not have gone through > the trouble of investigating and proposing this patch. It was an > honest attempt at making the kernel better. > Where could I have found this policy? I have looked on kernel.org/doc, > but couldn't find it. It is not formally documented but it is a concern we raise every time a duplicate piece of user facing functionality is proposed.
Re: [PATCH net] sctp: fix pr_warn max_data argument type mismatch
From: Jakub Audykowicz Date: Thu, 6 Dec 2018 08:58:37 +0100 > My previous patch introduced a compilation warning regarding a type > mismatch (int vs size_t). This is a one-letter fix for good housekeeping. > > Signed-off-by: Jakub Audykowicz Still wrong and I fixed it when I applied your patch. You need to use the 'Z' prefix for size_t, so %Zu in this case.
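For comparison, ISO C99 spells the size_t length modifier as a lowercase z. The uppercase Z named in the reply above is an older, non-standard spelling, so which form a given kernel tree's printk accepts is best checked against that tree's printk-formats documentation. A userspace sketch of the standard form, with a hypothetical helper name:

```c
#include <stddef.h>
#include <stdio.h>

/* Format a size_t with the C99 "z" length modifier. Returns what
 * snprintf() returns: the number of characters that would be written. */
static int format_max_data(char *buf, size_t buflen, size_t max_data)
{
	return snprintf(buf, buflen, "max_data=%zu", max_data);
}
```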
Re: [PATCH net-next] neighbor: Add extack messages for add and delete commands
From: David Ahern Date: Wed, 5 Dec 2018 20:02:29 -0800 > From: David Ahern > > Add extack messages for failures in neigh_add and neigh_delete. > > Signed-off-by: David Ahern Looks good, applied, thanks David.
Re: [PATCH net] ipv4: ipv6: netfilter: Adjust the frag mem limit when truesize changes
From: Jiri Wiesner Date: Wed, 5 Dec 2018 16:55:29 +0100 > The *_frag_reasm() functions are susceptible to miscalculating the byte > count of packet fragments in case the truesize of a head buffer changes. > The truesize member may be changed by the call to skb_unclone(), leaving > the fragment memory limit counter unbalanced even if all fragments are > processed. This miscalculation goes unnoticed as long as the network > namespace which holds the counter is not destroyed. > > Should an attempt be made to destroy a network namespace that holds an > unbalanced fragment memory limit counter the cleanup of the namespace > never finishes. The thread handling the cleanup gets stuck in > inet_frags_exit_net() waiting for the percpu counter to reach zero. The > thread is usually in running state with a stacktrace similar to: > > PID: 1073 TASK: 880626711440 CPU: 1 COMMAND: "kworker/u48:4" > #5 [880621563d48] _raw_spin_lock at 815f5480 > #6 [880621563d48] inet_evict_bucket at 8158020b > #7 [880621563d80] inet_frags_exit_net at 8158051c > #8 [880621563db0] ops_exit_list at 814f5856 > #9 [880621563dd8] cleanup_net at 814f67c0 > #10 [880621563e38] process_one_work at 81096f14 > > It is not possible to create new network namespaces, and processes > that call unshare() end up being stuck in uninterruptible sleep state > waiting to acquire the net_mutex. > > The bug was observed in the IPv6 netfilter code by Per Sundstrom. > I thank him for his analysis of the problem. The parts of this patch > that apply to IPv4 and IPv6 fragment reassembly are preemptive measures. > > Signed-off-by: Jiri Wiesner > Reported-by: Per Sundstrom Nice catch. Applied and queued up for -stable, thanks!
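The accounting problem described above can be sketched in userspace C. The counter type and helper names are hypothetical stand-ins, not the kernel's API. The point is only that when the head buffer's truesize changes between charging and uncharging, the difference has to be added to the counter or it never returns to zero.

```c
/* Hypothetical stand-in for the per-namespace fragment memory counter. */
struct frag_counter {
	long mem;
};

static void add_frag_mem(struct frag_counter *c, long bytes)
{
	c->mem += bytes;
}

static void sub_frag_mem(struct frag_counter *c, long bytes)
{
	c->mem -= bytes;
}

/* Re-account a buffer whose truesize changed after it was charged,
 * for example because an unclone operation reallocated it. */
static void account_truesize_change(struct frag_counter *c,
				    unsigned int old_truesize,
				    unsigned int new_truesize)
{
	long delta = (long)new_truesize - (long)old_truesize;

	if (delta)
		add_frag_mem(c, delta);
}
```

Without the adjustment, charging 1000 bytes and later uncharging 1200 would leave the counter at -200, which is the kind of imbalance that keeps the namespace cleanup waiting.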
Re: [PATCH net 1/3] flex_array: make FLEX_ARRAY_BASE_SIZE the same value of FLEX_ARRAY_PART_SIZE
From: Xin Long Date: Wed, 5 Dec 2018 14:49:40 +0800 > This patch is to separate the base data memory from struct flex_array and > save it into a page. With this change, total_nr_elements of a flex_array > can grow or shrink without having the old element's memory changed when > the new size of the flex_array crosses FLEX_ARRAY_BASE_SIZE, which will > be added in the next patch. > > Suggested-by: Neil Horman > Signed-off-by: Xin Long This needs to be reviewed by the flex array hackers and lkml. It can't just get reviewed on netdev alone.
Re: [PATCH v2 net-next 1/1] net: netem: use a list in addition to rbtree
From: Peter Oskolkov Date: Tue, 4 Dec 2018 11:55:56 -0800 > When testing high-bandwidth TCP streams with large windows, > high latency, and low jitter, netem consumes a lot of CPU cycles > doing rbtree rebalancing. > > This patch uses a linear list/queue in addition to the rbtree: > if an incoming packet is past the tail of the linear queue, it is > added there, otherwise it is inserted into the rbtree. > > Without this patch, perf shows netem_enqueue, netem_dequeue, > and rb_* functions among the top offenders. With this patch, > only netem_enqueue is noticeable if jitter is low/absent. > > Suggested-by: Eric Dumazet > Signed-off-by: Peter Oskolkov Applied, thanks.
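The enqueue decision described in the commit message can be sketched as below. This is a userspace illustration with hypothetical names; treating an empty list and equal timestamps as "goes to the tail" is an assumption, not something the message states.

```c
#include <stdint.h>

enum enqueue_target {
	TO_LIST_TAIL,	/* cheap append, keeps the list time-ordered */
	TO_RBTREE	/* out-of-order packet, needs a sorted insert */
};

struct netem_sketch {
	int list_empty;
	uint64_t tail_time_to_send;	/* send time of the last list entry */
};

/* Decide where a packet with the given send time goes, and update the
 * tail bookkeeping when it is appended to the list. */
static enum enqueue_target pick_target(struct netem_sketch *q,
				       uint64_t time_to_send)
{
	if (q->list_empty || time_to_send >= q->tail_time_to_send) {
		q->list_empty = 0;
		q->tail_time_to_send = time_to_send;
		return TO_LIST_TAIL;
	}
	return TO_RBTREE;
}
```

With low jitter nearly every packet takes the first branch, which is why the rbtree rebalancing cost drops out of the profile.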
Re: [PATCH net] sctp: frag_point sanity check
From: Jakub Audykowicz Date: Tue, 4 Dec 2018 20:27:41 +0100 > If for some reason an association's fragmentation point is zero, > sctp_datamsg_from_user will endlessly try to divide a message > into zero-sized chunks. This eventually causes kernel panic due to > running out of memory. > > Although this situation is quite unlikely, it has occurred before as > reported. I propose to add this simple last-ditch sanity check due to > the severity of the potential consequences. > > Signed-off-by: Jakub Audykowicz Applied.
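A minimal sketch of why a zero fragmentation point has to be rejected before chunking starts. The helper name and the error return are illustrative assumptions; the applied patch may handle the case differently, for example by warning and falling back to a minimum value.

```c
#include <stddef.h>

/* Number of chunks needed to carry msg_len bytes when each chunk holds
 * at most frag_point bytes, or -1 if frag_point is unusable. A loop
 * that carves off frag_point bytes per iteration would never finish
 * when frag_point is zero. */
static long count_chunks(size_t msg_len, size_t frag_point)
{
	if (frag_point == 0)
		return -1;
	return (long)((msg_len + frag_point - 1) / frag_point);
}
```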
[PATCH net-next] neighbor: Add extack messages for add and delete commands
From: David Ahern Add extack messages for failures in neigh_add and neigh_delete. Signed-off-by: David Ahern --- net/core/neighbour.c | 55 +--- 1 file changed, 39 insertions(+), 16 deletions(-) diff --git a/net/core/neighbour.c b/net/core/neighbour.c index 41954e42a2de..6d479b5562be 100644 --- a/net/core/neighbour.c +++ b/net/core/neighbour.c @@ -1137,8 +1137,9 @@ static void neigh_update_hhs(struct neighbour *neigh) Caller MUST hold reference count on the entry. */ -int neigh_update(struct neighbour *neigh, const u8 *lladdr, u8 new, -u32 flags, u32 nlmsg_pid) +static int __neigh_update(struct neighbour *neigh, const u8 *lladdr, + u8 new, u32 flags, u32 nlmsg_pid, + struct netlink_ext_ack *extack) { u8 old; int err; @@ -1155,8 +1156,10 @@ int neigh_update(struct neighbour *neigh, const u8 *lladdr, u8 new, if (!(flags & NEIGH_UPDATE_F_ADMIN) && (old & (NUD_NOARP | NUD_PERMANENT))) goto out; - if (neigh->dead) + if (neigh->dead) { + NL_SET_ERR_MSG(extack, "Neighbor entry is now dead"); goto out; + } neigh_update_ext_learned(neigh, flags, ); @@ -1193,8 +1196,10 @@ int neigh_update(struct neighbour *neigh, const u8 *lladdr, u8 new, use it, otherwise discard the request. 
*/ err = -EINVAL; - if (!(old & NUD_VALID)) + if (!(old & NUD_VALID)) { + NL_SET_ERR_MSG(extack, "No link layer address given"); goto out; + } lladdr = neigh->ha; } @@ -1307,6 +1312,12 @@ int neigh_update(struct neighbour *neigh, const u8 *lladdr, u8 new, return err; } + +int neigh_update(struct neighbour *neigh, const u8 *lladdr, u8 new, +u32 flags, u32 nlmsg_pid) +{ + return __neigh_update(neigh, lladdr, new, flags, nlmsg_pid, NULL); +} EXPORT_SYMBOL(neigh_update); /* Update the neigh to listen temporarily for probe responses, even if it is @@ -1678,8 +1689,10 @@ static int neigh_delete(struct sk_buff *skb, struct nlmsghdr *nlh, goto out; dst_attr = nlmsg_find_attr(nlh, sizeof(*ndm), NDA_DST); - if (dst_attr == NULL) + if (!dst_attr) { + NL_SET_ERR_MSG(extack, "Network address not specified"); goto out; + } ndm = nlmsg_data(nlh); if (ndm->ndm_ifindex) { @@ -1694,8 +1707,10 @@ static int neigh_delete(struct sk_buff *skb, struct nlmsghdr *nlh, if (tbl == NULL) return -EAFNOSUPPORT; - if (nla_len(dst_attr) < (int)tbl->key_len) + if (nla_len(dst_attr) < (int)tbl->key_len) { + NL_SET_ERR_MSG(extack, "Invalid network address"); goto out; + } if (ndm->ndm_flags & NTF_PROXY) { err = pneigh_delete(tbl, net, nla_data(dst_attr), dev); @@ -1711,10 +1726,9 @@ static int neigh_delete(struct sk_buff *skb, struct nlmsghdr *nlh, goto out; } - err = neigh_update(neigh, NULL, NUD_FAILED, - NEIGH_UPDATE_F_OVERRIDE | - NEIGH_UPDATE_F_ADMIN, - NETLINK_CB(skb).portid); + err = __neigh_update(neigh, NULL, NUD_FAILED, +NEIGH_UPDATE_F_OVERRIDE | NEIGH_UPDATE_F_ADMIN, +NETLINK_CB(skb).portid, extack); write_lock_bh(>lock); neigh_release(neigh); neigh_remove_one(neigh, tbl); @@ -1744,8 +1758,10 @@ static int neigh_add(struct sk_buff *skb, struct nlmsghdr *nlh, goto out; err = -EINVAL; - if (tb[NDA_DST] == NULL) + if (!tb[NDA_DST]) { + NL_SET_ERR_MSG(extack, "Network address not specified"); goto out; + } ndm = nlmsg_data(nlh); if (ndm->ndm_ifindex) { @@ -1755,16 +1771,21 @@ static int 
neigh_add(struct sk_buff *skb, struct nlmsghdr *nlh, goto out; } - if (tb[NDA_LLADDR] && nla_len(tb[NDA_LLADDR]) < dev->addr_len) + if (tb[NDA_LLADDR] && nla_len(tb[NDA_LLADDR]) < dev->addr_len) { + NL_SET_ERR_MSG(extack, "Invalid link address"); goto out; + } } tbl = neigh_find_table(ndm->ndm_family); if (tbl == NULL) return -EAFNOSUPPORT; - if (nla_len(tb[NDA_DST]) < (int)tbl->key_len) + if (nla_len(tb[NDA_DST]) < (int)tbl->key_len) { + NL_SET_ERR_MSG(extack, "Invalid network address"); goto
Re: [PATCH net-next 2/7] neighbor: Fold ___neigh_lookup_noref into __neigh_lookup_noref
From: David Ahern Date: Wed, 5 Dec 2018 17:46:37 -0700 > ok. patches 5-7 are not dependent on 1-4. Should I re-send outside of > this set? Yes, please respin. Thanks David.
Re: [pull request][net-next V2 0/7] Mellanox, mlx5e updates 2018-12-04
From: Saeed Mahameed Date: Wed, 5 Dec 2018 16:12:58 -0800 > The following series is for mlx5e netdevice driver, it adds ethtool > support for RX hash fields configuration and some misc updates, please > see tag log below. > > Please pull and let me know if there's any problem. > > v1->v2: > - Move static const array to c file. > - Remove unnecessary blank line > - Add #include > - Print priv flag name rather than its hex value Pulled, thanks Saeed.
Re: [PATCH net-next 2/7] neighbor: Fold ___neigh_lookup_noref into __neigh_lookup_noref
On 12/5/18 5:46 PM, David Ahern wrote: > ok. patches 5-7 are not dependent on 1-4. Should I re-send outside of > this set? bleh. 5 is. I'll re-send.
Re: [PATCH net-next 2/7] neighbor: Fold ___neigh_lookup_noref into __neigh_lookup_noref
On 12/5/18 5:44 PM, David Miller wrote: > From: David Ahern > Date: Wed, 5 Dec 2018 15:34:09 -0800 > >> @@ -270,37 +270,25 @@ static inline bool neigh_key_eq128(const struct >> neighbour *n, const void *pkey) >> (n32[2] ^ p32[2]) | (n32[3] ^ p32[3])) == 0; >> } >> >> -static inline struct neighbour *___neigh_lookup_noref( >> -struct neigh_table *tbl, >> -bool (*key_eq)(const struct neighbour *n, const void *pkey), >> -__u32 (*hash)(const void *pkey, >> - const struct net_device *dev, >> - __u32 *hash_rnd), >> -const void *pkey, >> -struct net_device *dev) >> +static inline struct neighbour *__neigh_lookup_noref(struct neigh_table >> *tbl, >> + const void *pkey, >> + struct net_device *dev) >> { > > Sorry, we can't do this. > > The whole point of how this is laid out is so that the entire hash traversal, > including the hash function, is expanded inline. > > This demux is extremely critical on the output side, it must be the > smallest number of cycles possible. It was the only way I could justify > not caching neigh entries in the routes any more when I wrote this code. > > Even before retpoline, putting an indirect call here is painful. With > retpoline it is deadly. > > Please avoid removing the full inline expansion of the neigh lookup in the > ipv6 > and ipv4 data paths. > ok. patches 5-7 are not dependent on 1-4. Should I re-send outside of this set?
Re: [PATCH net-next 2/7] neighbor: Fold ___neigh_lookup_noref into __neigh_lookup_noref
From: David Ahern Date: Wed, 5 Dec 2018 15:34:09 -0800 > @@ -270,37 +270,25 @@ static inline bool neigh_key_eq128(const struct > neighbour *n, const void *pkey) > (n32[2] ^ p32[2]) | (n32[3] ^ p32[3])) == 0; > } > > -static inline struct neighbour *___neigh_lookup_noref( > - struct neigh_table *tbl, > - bool (*key_eq)(const struct neighbour *n, const void *pkey), > - __u32 (*hash)(const void *pkey, > - const struct net_device *dev, > - __u32 *hash_rnd), > - const void *pkey, > - struct net_device *dev) > +static inline struct neighbour *__neigh_lookup_noref(struct neigh_table *tbl, > + const void *pkey, > + struct net_device *dev) > { Sorry, we can't do this. The whole point of how this is laid out is so that the entire hash traversal, including the hash function, is expanded inline. This demux is extremely critical on the output side, it must be the smallest number of cycles possible. It was the only way I could justify not caching neigh entries in the routes any more when I wrote this code. Even before retpoline, putting an indirect call here is painful. With retpoline it is deadly. Please avoid removing the full inline expansion of the neigh lookup in the ipv6 and ipv4 data paths. Thank you.
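The objection above can be illustrated in userspace C. When the hash and compare callbacks are passed to an inline walker as function names, the compiler sees constants at the call site and can expand everything inline. When they are loaded from a table at run time, each use is an indirect call. All names below are hypothetical and the hash is a stand-in, not the kernel's.

```c
#include <stddef.h>
#include <stdint.h>

struct entry {
	uint32_t key;
	struct entry *next;
};

static inline uint32_t hash32(uint32_t key)
{
	return key * 2654435761u;	/* stand-in multiplicative hash */
}

static inline int key_eq32(const struct entry *e, uint32_t key)
{
	return e->key == key;
}

/* Generic walker taking the callbacks as parameters. shift must be in
 * the range 1..31 for the bucket computation to be defined here. */
static inline struct entry *lookup_generic(struct entry **buckets,
					   unsigned int shift,
					   uint32_t (*hash)(uint32_t),
					   int (*eq)(const struct entry *, uint32_t),
					   uint32_t key)
{
	struct entry *e;

	for (e = buckets[hash(key) >> (32 - shift)]; e; e = e->next)
		if (eq(e, key))
			return e;
	return NULL;
}

/* Fast path: callbacks are compile-time constants, so the whole
 * traversal, hash included, can be expanded inline. */
static inline struct entry *lookup_v4(struct entry **buckets,
				      unsigned int shift, uint32_t key)
{
	return lookup_generic(buckets, shift, hash32, key_eq32, key);
}

struct table {
	uint32_t (*hash)(uint32_t);
	int (*eq)(const struct entry *, uint32_t);
	struct entry **buckets;
	unsigned int shift;
};

/* Slow path: callbacks come from memory, so each use is an indirect call. */
static struct entry *lookup_via_table(const struct table *t, uint32_t key)
{
	return lookup_generic(t->buckets, t->shift, t->hash, t->eq, key);
}
```

Both paths return the same entry; the difference is only in what the compiler can prove at the call site, which is the cost being discussed in this thread.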
Re: [PATCH net] tcp: fix NULL ref in tail loss probe
From: Yuchung Cheng Date: Wed, 5 Dec 2018 14:38:38 -0800 > TCP loss probe timer may fire when the retransmission queue is empty but > has a non-zero tp->packets_out counter. tcp_send_loss_probe will call > tcp_rearm_rto which triggers NULL pointer reference by fetching the > retransmission queue head in its sub-routines. > > Add a more detailed warning to help catch the root cause of the inflight > accounting inconsistency. > > Reported-by: Rafael Tinoco > Signed-off-by: Yuchung Cheng > Signed-off-by: Eric Dumazet > Signed-off-by: Neal Cardwell Applied, thanks for working to diagnose this so quickly.
Re: [PATCH 1/2] net: linkwatch: send change uevent on link changes
From: Jouke Witteveen Date: Wed, 5 Dec 2018 23:38:17 +0100 > Can you elaborate a bit? I may not be aware of the policy you have in > mind. When we have a user facing interface to do something, we don't create another one unless it is absolutely, positively, unavoidable.
Re: [PATCH net] tcp: Do not underestimate rwnd_limited
From: Eric Dumazet Date: Wed, 5 Dec 2018 14:24:31 -0800 > If available rwnd is too small, tcp_tso_should_defer() > can decide it is worth waiting before splitting a TSO packet. > > This really means we are rwnd limited. > > Fixes: 5615f88614a4 ("tcp: instrument how long TCP is limited by receive > window") > Signed-off-by: Eric Dumazet Applied and queued up for -stable, thanks Eric.
Re: pull-request: bpf 2018-12-05
From: Alexei Starovoitov Date: Wed, 5 Dec 2018 13:23:22 -0800 > The following pull-request contains BPF updates for your *net* tree. > > The main changes are: > > 1) fix bpf uapi pointers for 32-bit architectures, from Daniel. > > 2) improve verifier ability to handle progs with a lot of branches, from > Alexei. > > 3) strict btf checks, from Yonghong. > > 4) bpf_sk_lookup api cleanup, from Joe. > > 5) other misc fixes > > Please consider pulling these changes from: > > git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git Pulled, thank you.
Re: [PATCH net-next 0/6] u32 to linkmode fixes
From: Andrew Lunn Date: Wed, 5 Dec 2018 21:49:39 +0100 > This patchset fixes issues found in the last patchset which converted > the phydev advertise etc, from a u32 to a linux bitmap. Most of the > issues are the result of clearing bits which should not have been > cleared. To make the API clearer, the idea from Heiner Kallweit was > used, with _mod_ to indicate the function modifies just the bits it > needs to, or _to_ to clear all bits and just set the bits that need to be > set. Series applied, thanks Andrew. Please always list the Fixes tag first in the future. I fixed it up for you this time. Thanks again.
Re: [PATCH net] net: use skb_list_del_init() to remove from RX sublists
From: Edward Cree Date: Tue, 4 Dec 2018 17:37:57 + > list_del() leaves the skb->next pointer poisoned, which can then lead to > a crash in e.g. OVS forwarding. For example, setting up an OVS VXLAN > forwarding bridge on sfc as per: ... > So, in all listified-receive handling, instead pull skbs off the lists with > skb_list_del_init(). > > Fixes: 9af86f933894 ("net: core: fix use-after-free in > __netif_receive_skb_list_core") > Fixes: 7da517a3bc52 ("net: core: Another step of skb receive list processing") > Fixes: a4ca8b7df73c ("net: ipv4: fix drop handling in ip_list_rcv() and > ip_list_rcv_finish()") > Fixes: d8269e2cbf90 ("net: ipv6: listify ipv6_rcv() and ip6_rcv_finish()") > Signed-off-by: Edward Cree Applied and queued up for -stable > I'm not sure if these are the right Fixes tags, or if I should instead be > fingering some commit that made dev_hard_start_xmit() more sensitive to > skb->next. > Also, I only saw a crash from the list_del() in > __netif_receive_skb_list_core() > but I converted all of them in the listified RX path, in case any others > have similar ways to escape into paths that care about skb->next. I think we should use skb_list_del_init() on skb->list in all cases except where we immediately queue it onto another list in a trivially auditable way. Therefore I think what you did is the way to go. Thanks.
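The difference between the two removal styles can be shown with a small userspace list. The poison constants and function names are stand-ins chosen for illustration. The assumption is only that the "init" variant leaves the next pointer NULL, which is what later code testing that pointer expects.

```c
#include <stddef.h>

struct node {
	struct node *next, *prev;
};

/* Stand-ins for poison values: non-NULL and not valid to dereference. */
#define POISON_NEXT_SKETCH ((struct node *)0x100)
#define POISON_PREV_SKETCH ((struct node *)0x122)

static void node_unlink(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

/* Removal that poisons the node's links: later code that tests
 * "n->next != NULL" will wrongly believe the node is still chained. */
static void del_poison(struct node *n)
{
	node_unlink(n);
	n->next = POISON_NEXT_SKETCH;
	n->prev = POISON_PREV_SKETCH;
}

/* Removal that marks the node as not on any list. */
static void del_init_null(struct node *n)
{
	node_unlink(n);
	n->next = NULL;
	n->prev = NULL;
}
```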
[PATCH net-next 2/7] neighbor: Fold ___neigh_lookup_noref into __neigh_lookup_noref
From: David Ahern There are no more direct callers of ___neigh_lookup_noref so no need for it to be a standalone helper. Signed-off-by: David Ahern --- include/net/neighbour.h | 22 +- 1 file changed, 5 insertions(+), 17 deletions(-) diff --git a/include/net/neighbour.h b/include/net/neighbour.h index f58b384aa6c9..aac87bc2d96b 100644 --- a/include/net/neighbour.h +++ b/include/net/neighbour.h @@ -270,37 +270,25 @@ static inline bool neigh_key_eq128(const struct neighbour *n, const void *pkey) (n32[2] ^ p32[2]) | (n32[3] ^ p32[3])) == 0; } -static inline struct neighbour *___neigh_lookup_noref( - struct neigh_table *tbl, - bool (*key_eq)(const struct neighbour *n, const void *pkey), - __u32 (*hash)(const void *pkey, - const struct net_device *dev, - __u32 *hash_rnd), - const void *pkey, - struct net_device *dev) +static inline struct neighbour *__neigh_lookup_noref(struct neigh_table *tbl, +const void *pkey, +struct net_device *dev) { struct neigh_hash_table *nht = rcu_dereference_bh(tbl->nht); struct neighbour *n; u32 hash_val; - hash_val = hash(pkey, dev, nht->hash_rnd) >> (32 - nht->hash_shift); + hash_val = tbl->hash(pkey, dev, nht->hash_rnd) >> (32 - nht->hash_shift); for (n = rcu_dereference_bh(nht->hash_buckets[hash_val]); n != NULL; n = rcu_dereference_bh(n->next)) { - if (n->dev == dev && key_eq(n, pkey)) + if (n->dev == dev && tbl->key_eq(n, pkey)) return n; } return NULL; } -static inline struct neighbour *__neigh_lookup_noref(struct neigh_table *tbl, -const void *pkey, -struct net_device *dev) -{ - return ___neigh_lookup_noref(tbl, tbl->key_eq, tbl->hash, pkey, dev); -} - void neigh_table_init(int index, struct neigh_table *tbl); int neigh_table_clear(int index, struct neigh_table *tbl); struct neighbour *neigh_lookup(struct neigh_table *tbl, const void *pkey, -- 2.11.0
[PATCH net-next 5/7] neighbor: Create a neigh_hash helper
From: David Ahern Consolidate calculations of the neighbor hash into a single helper. Signed-off-by: David Ahern --- include/net/neighbour.h | 10 +- net/core/neighbour.c| 15 +-- 2 files changed, 14 insertions(+), 11 deletions(-) diff --git a/include/net/neighbour.h b/include/net/neighbour.h index aac87bc2d96b..092493a8c91b 100644 --- a/include/net/neighbour.h +++ b/include/net/neighbour.h @@ -270,6 +270,14 @@ static inline bool neigh_key_eq128(const struct neighbour *n, const void *pkey) (n32[2] ^ p32[2]) | (n32[3] ^ p32[3])) == 0; } +static inline u32 neigh_hash(struct neigh_table *tbl, +struct neigh_hash_table *nht, +const void *pkey, +struct net_device *dev) +{ + return tbl->hash(pkey, dev, nht->hash_rnd) >> (32 - nht->hash_shift); +} + static inline struct neighbour *__neigh_lookup_noref(struct neigh_table *tbl, const void *pkey, struct net_device *dev) @@ -278,7 +286,7 @@ static inline struct neighbour *__neigh_lookup_noref(struct neigh_table *tbl, struct neighbour *n; u32 hash_val; - hash_val = tbl->hash(pkey, dev, nht->hash_rnd) >> (32 - nht->hash_shift); + hash_val = neigh_hash(tbl, nht, pkey, dev); for (n = rcu_dereference_bh(nht->hash_buckets[hash_val]); n != NULL; n = rcu_dereference_bh(n->next)) { diff --git a/net/core/neighbour.c b/net/core/neighbour.c index 41954e42a2de..53e30c15882d 100644 --- a/net/core/neighbour.c +++ b/net/core/neighbour.c @@ -151,9 +151,8 @@ bool neigh_remove_one(struct neighbour *ndel, struct neigh_table *tbl) nht = rcu_dereference_protected(tbl->nht, lockdep_is_held(>lock)); - hash_val = tbl->hash(pkey, ndel->dev, nht->hash_rnd); - hash_val = hash_val >> (32 - nht->hash_shift); + hash_val = neigh_hash(tbl, nht, pkey, ndel->dev); np = >hash_buckets[hash_val]; while ((n = rcu_dereference_protected(*np, lockdep_is_held(>lock { @@ -434,10 +433,7 @@ static struct neigh_hash_table *neigh_hash_grow(struct neigh_table *tbl, lockdep_is_held(>lock)); n != NULL; n = next) { - hash = tbl->hash(n->primary_key, n->dev, -new_nht->hash_rnd); - 
- hash >>= (32 - new_nht->hash_shift); + hash = neigh_hash(tbl, new_nht, n->primary_key, n->dev); next = rcu_dereference_protected(n->next, lockdep_is_held(>lock)); @@ -485,9 +481,9 @@ struct neighbour *neigh_lookup_nodev(struct neigh_table *tbl, struct net *net, NEIGH_CACHE_STAT_INC(tbl, lookups); rcu_read_lock_bh(); - nht = rcu_dereference_bh(tbl->nht); - hash_val = tbl->hash(pkey, NULL, nht->hash_rnd) >> (32 - nht->hash_shift); + nht = rcu_dereference_bh(tbl->nht); + hash_val = neigh_hash(tbl, nht, pkey, NULL); for (n = rcu_dereference_bh(nht->hash_buckets[hash_val]); n != NULL; n = rcu_dereference_bh(n->next)) { @@ -553,13 +549,12 @@ struct neighbour *__neigh_create(struct neigh_table *tbl, const void *pkey, if (atomic_read(>entries) > (1 << nht->hash_shift)) nht = neigh_hash_grow(tbl, nht->hash_shift + 1); - hash_val = tbl->hash(n->primary_key, dev, nht->hash_rnd) >> (32 - nht->hash_shift); - if (n->parms->dead) { rc = ERR_PTR(-EINVAL); goto out_tbl_unlock; } + hash_val = neigh_hash(tbl, nht, n->primary_key, dev); for (n1 = rcu_dereference_protected(nht->hash_buckets[hash_val], lockdep_is_held(>lock)); n1 != NULL; -- 2.11.0
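The expression being consolidated here takes the top hash_shift bits of a 32-bit hash as the bucket index. A userspace sketch of just that step, with a hypothetical function name and assuming hash_shift is between 1 and 32:

```c
#include <stdint.h>

/* Keep the top hash_shift bits of a 32-bit hash as the bucket index.
 * hash_shift == 0 would shift by 32, which is undefined in C, so
 * callers are assumed to pass a value in 1..32. */
static inline uint32_t bucket_index(uint32_t hash, unsigned int hash_shift)
{
	return hash >> (32 - hash_shift);
}
```

For a table with 2^hash_shift buckets the result is always a valid index, since at most hash_shift bits survive the shift.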
[PATCH net-next 6/7] neighbor: Skip the duplicate lookup in neigh_add
From: David Ahern When adding a new neighbor via rtnetlink, neigh_add does a lookup and if the result is NULL calls __neigh_lookup_errno to create a new entry if the NLM_F_CREATE flag is set. But, __neigh_lookup_errno calls neigh_lookup again before neigh_create; the neigh_lookup is redundant. Replace the call to __neigh_lookup_errno with a call to __neigh_create to more efficiently achieve the same result and prepare for the next patch. Signed-off-by: David Ahern --- net/core/neighbour.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/core/neighbour.c b/net/core/neighbour.c index 53e30c15882d..e324467e9a71 100644 --- a/net/core/neighbour.c +++ b/net/core/neighbour.c @@ -1785,7 +1785,7 @@ static int neigh_add(struct sk_buff *skb, struct nlmsghdr *nlh, goto out; } - neigh = __neigh_lookup_errno(tbl, dst, dev); + neigh = __neigh_create(tbl, dst, dev, true); if (IS_ERR(neigh)) { err = PTR_ERR(neigh); goto out; -- 2.11.0
[PATCH net-next 0/7] neighbor: cleanups plus extack for add and delete
From: David Ahern cleanups: - remove open coding of key and hash functions for ipv4 and ipv6 and then collapse hash functions - collapse now unnecessary ___neigh_lookup_noref helper - create helper for neigh hash computation - remove duplicate lookup in neigh_add After that add extack messages for neighbor add and delete. David Ahern (7): neighbor: Remove open coding of key and hash functions neighbor: Fold ___neigh_lookup_noref into __neigh_lookup_noref net/ipv4: Move arp_hashfn into arp_hash net/ipv6: Move ndisc_hashfn to ndisc_hash neighbor: Create a neigh_hash helper neighbor: Skip the duplicate lookup in neigh_add neighbor: Add extack messages for add and delete commands include/net/arp.h | 10 +-- include/net/ndisc.h | 12 + include/net/neighbour.h | 30 + net/core/filter.c | 3 +-- net/core/neighbour.c| 72 ++--- net/ipv4/arp.c | 5 +++- net/ipv6/ndisc.c| 7 - 7 files changed, 71 insertions(+), 68 deletions(-) -- 2.11.0
[PATCH net-next 7/7] neighbor: Add extack messages for add and delete commands
From: David Ahern Add extack messages for failures in neigh_add and neigh_delete. Also, require NDA_DST length to be exactly the key length for the table otherwise it is an unexpected address and can lead to unexpected entries. e.g., IPv4 table sent and IPv6 address (using a modified ip): $ ip neigh add 2001:db8:1::1 dev foo $ ip neigh ls dev foo 32.1.13.184 dev foo lladdr 72:ed:f1:d9:20:9a PERMANENT Signed-off-by: David Ahern --- net/core/neighbour.c | 55 +--- 1 file changed, 39 insertions(+), 16 deletions(-) diff --git a/net/core/neighbour.c b/net/core/neighbour.c index e324467e9a71..916a99fbb306 100644 --- a/net/core/neighbour.c +++ b/net/core/neighbour.c @@ -1132,8 +1132,9 @@ static void neigh_update_hhs(struct neighbour *neigh) Caller MUST hold reference count on the entry. */ -int neigh_update(struct neighbour *neigh, const u8 *lladdr, u8 new, -u32 flags, u32 nlmsg_pid) +static int __neigh_update(struct neighbour *neigh, const u8 *lladdr, + u8 new, u32 flags, u32 nlmsg_pid, + struct netlink_ext_ack *extack) { u8 old; int err; @@ -1150,8 +1151,10 @@ int neigh_update(struct neighbour *neigh, const u8 *lladdr, u8 new, if (!(flags & NEIGH_UPDATE_F_ADMIN) && (old & (NUD_NOARP | NUD_PERMANENT))) goto out; - if (neigh->dead) + if (neigh->dead) { + NL_SET_ERR_MSG(extack, "Neighbor entry is now dead"); goto out; + } neigh_update_ext_learned(neigh, flags, ); @@ -1188,8 +1191,10 @@ int neigh_update(struct neighbour *neigh, const u8 *lladdr, u8 new, use it, otherwise discard the request. 
*/ err = -EINVAL; - if (!(old & NUD_VALID)) + if (!(old & NUD_VALID)) { + NL_SET_ERR_MSG(extack, "No link layer address given"); goto out; + } lladdr = neigh->ha; } @@ -1302,6 +1307,12 @@ int neigh_update(struct neighbour *neigh, const u8 *lladdr, u8 new, return err; } + +int neigh_update(struct neighbour *neigh, const u8 *lladdr, u8 new, +u32 flags, u32 nlmsg_pid) +{ + return __neigh_update(neigh, lladdr, new, flags, nlmsg_pid, NULL); +} EXPORT_SYMBOL(neigh_update); /* Update the neigh to listen temporarily for probe responses, even if it is @@ -1673,8 +1684,10 @@ static int neigh_delete(struct sk_buff *skb, struct nlmsghdr *nlh, goto out; dst_attr = nlmsg_find_attr(nlh, sizeof(*ndm), NDA_DST); - if (dst_attr == NULL) + if (!dst_attr) { + NL_SET_ERR_MSG(extack, "Network address not specified"); goto out; + } ndm = nlmsg_data(nlh); if (ndm->ndm_ifindex) { @@ -1689,8 +1702,10 @@ static int neigh_delete(struct sk_buff *skb, struct nlmsghdr *nlh, if (tbl == NULL) return -EAFNOSUPPORT; - if (nla_len(dst_attr) < (int)tbl->key_len) + if (nla_len(dst_attr) < (int)tbl->key_len) { + NL_SET_ERR_MSG(extack, "Invalid network address"); goto out; + } if (ndm->ndm_flags & NTF_PROXY) { err = pneigh_delete(tbl, net, nla_data(dst_attr), dev); @@ -1706,10 +1721,9 @@ static int neigh_delete(struct sk_buff *skb, struct nlmsghdr *nlh, goto out; } - err = neigh_update(neigh, NULL, NUD_FAILED, - NEIGH_UPDATE_F_OVERRIDE | - NEIGH_UPDATE_F_ADMIN, - NETLINK_CB(skb).portid); + err = __neigh_update(neigh, NULL, NUD_FAILED, +NEIGH_UPDATE_F_OVERRIDE | NEIGH_UPDATE_F_ADMIN, +NETLINK_CB(skb).portid, extack); write_lock_bh(>lock); neigh_release(neigh); neigh_remove_one(neigh, tbl); @@ -1739,8 +1753,10 @@ static int neigh_add(struct sk_buff *skb, struct nlmsghdr *nlh, goto out; err = -EINVAL; - if (tb[NDA_DST] == NULL) + if (!tb[NDA_DST]) { + NL_SET_ERR_MSG(extack, "Network address not specified"); goto out; + } ndm = nlmsg_data(nlh); if (ndm->ndm_ifindex) { @@ -1750,16 +1766,21 @@ static int 
neigh_add(struct sk_buff *skb, struct nlmsghdr *nlh, goto out; } - if (tb[NDA_LLADDR] && nla_len(tb[NDA_LLADDR]) < dev->addr_len) + if (tb[NDA_LLADDR] && nla_len(tb[NDA_LLADDR]) < dev->addr_len) { + NL_SET_ERR_MSG(extack, "Invalid link address"); g
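The odd 32.1.13.184 entry in the commit message is what you get when the first four bytes of 2001:db8:1::1 are read as an IPv4 key. The userspace sketch below reproduces that arithmetic and contrasts a "not shorter than" check with an exact-length check. The helper names are hypothetical, and the exact-length form follows the commit message rather than the visible part of the diff.

```c
#include <stddef.h>
#include <stdio.h>

/* Check that only rejects attributes shorter than the key. */
static int dst_len_not_short(size_t attr_len, size_t key_len)
{
	return !(attr_len < key_len);
}

/* Check that requires the attribute to be exactly the key length. */
static int dst_len_exact(size_t attr_len, size_t key_len)
{
	return attr_len == key_len;
}

/* Render the first four bytes of an address as a dotted quad, which is
 * how a 16-byte IPv6 address ends up looking in an IPv4 table. */
static void format_ipv4_key(const unsigned char *addr, char *buf,
			    size_t buflen)
{
	snprintf(buf, buflen, "%u.%u.%u.%u",
		 addr[0], addr[1], addr[2], addr[3]);
}
```

Since 0x20 0x01 0x0d 0xb8 is 32, 1, 13, 184 in decimal, the looser check lets the 16-byte address through and the table stores the truncated key.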
[PATCH net-next 3/7] net/ipv4: Move arp_hashfn into arp_hash
From: David Ahern There are no more direct references to arp_hashfn so fold it into arp_hash, the hash callback for arp. Signed-off-by: David Ahern --- include/net/arp.h | 8 net/ipv4/arp.c| 5 - 2 files changed, 4 insertions(+), 9 deletions(-) diff --git a/include/net/arp.h b/include/net/arp.h index a5091f13cd3e..9f433c077b67 100644 --- a/include/net/arp.h +++ b/include/net/arp.h @@ -10,14 +10,6 @@ extern struct neigh_table arp_tbl; -static inline u32 arp_hashfn(const void *pkey, const struct net_device *dev, u32 *hash_rnd) -{ - u32 key = *(const u32 *)pkey; - u32 val = key ^ hash32_ptr(dev); - - return val * hash_rnd[0]; -} - static inline struct neighbour *__ipv4_neigh_lookup_noref(struct net_device *dev, u32 key) { if (dev->flags & (IFF_LOOPBACK | IFF_POINTOPOINT)) diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c index 850a6f13a082..6b88211287ae 100644 --- a/net/ipv4/arp.c +++ b/net/ipv4/arp.c @@ -213,7 +213,10 @@ static u32 arp_hash(const void *pkey, const struct net_device *dev, __u32 *hash_rnd) { - return arp_hashfn(pkey, dev, hash_rnd); + u32 key = *(const u32 *)pkey; + u32 val = key ^ hash32_ptr(dev); + + return val * hash_rnd[0]; } static bool arp_key_eq(const struct neighbour *neigh, const void *pkey) -- 2.11.0
[PATCH net-next 4/7] net/ipv6: Move ndisc_hashfn to ndisc_hash
From: David Ahern There are no more direct references to ndisc_hashfn so fold it into ndisc_hash, the hash callback for ndisc. Signed-off-by: David Ahern --- include/net/ndisc.h | 10 -- net/ipv6/ndisc.c| 7 ++- 2 files changed, 6 insertions(+), 11 deletions(-) diff --git a/include/net/ndisc.h b/include/net/ndisc.h index c354345c679b..83a84f68901b 100644 --- a/include/net/ndisc.h +++ b/include/net/ndisc.h @@ -364,16 +364,6 @@ static inline u8 *ndisc_opt_addr_data(struct nd_opt_hdr *p, ndisc_addr_option_pad(dev->type)); } -static inline u32 ndisc_hashfn(const void *pkey, const struct net_device *dev, __u32 *hash_rnd) -{ - const u32 *p32 = pkey; - - return (((p32[0] ^ hash32_ptr(dev)) * hash_rnd[0]) + - (p32[1] * hash_rnd[1]) + - (p32[2] * hash_rnd[2]) + - (p32[3] * hash_rnd[3])); -} - static inline struct neighbour *__ipv6_neigh_lookup_noref(struct net_device *dev, const void *pkey) { return __neigh_lookup_noref(&nd_tbl, pkey, dev); diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c index 659ecf4e4b3c..304a32b3c3f5 100644 --- a/net/ipv6/ndisc.c +++ b/net/ipv6/ndisc.c @@ -311,7 +311,12 @@ static u32 ndisc_hash(const void *pkey, const struct net_device *dev, __u32 *hash_rnd) { - return ndisc_hashfn(pkey, dev, hash_rnd); + const u32 *p32 = pkey; + + return (((p32[0] ^ hash32_ptr(dev)) * hash_rnd[0]) + +(p32[1] * hash_rnd[1]) + +(p32[2] * hash_rnd[2]) + +(p32[3] * hash_rnd[3])); } static bool ndisc_key_eq(const struct neighbour *n, const void *pkey) -- 2.11.0
[PATCH net-next 1/7] neighbor: Remove open coding of key and hash functions
From: David Ahern ___neigh_lookup_noref takes the key and hash functions as inputs, yet those are part of the operations listed in the neigh_table which is also passed as an argument. Remove the open coding of these internal implementations by converting uses of ___neigh_lookup_noref to __neigh_lookup_noref. For IPv4, arp_key_eq is essentially a call to neigh_key_eq32 and arp_hash is a call to arp_hashfn. Similarly for IPv6, ndisc_key_eq calls neigh_key_eq128 and ndisc_hash calls ndisc_hashfn. So the change in helpers is a no-op. Signed-off-by: David Ahern --- include/net/arp.h | 2 +- include/net/ndisc.h | 2 +- net/core/filter.c | 3 +-- 3 files changed, 3 insertions(+), 4 deletions(-) diff --git a/include/net/arp.h b/include/net/arp.h index 977aabfcdc03..a5091f13cd3e 100644 --- a/include/net/arp.h +++ b/include/net/arp.h @@ -23,7 +23,7 @@ static inline struct neighbour *__ipv4_neigh_lookup_noref(struct net_device *dev if (dev->flags & (IFF_LOOPBACK | IFF_POINTOPOINT)) key = INADDR_ANY; - return ___neigh_lookup_noref(&arp_tbl, neigh_key_eq32, arp_hashfn, &key, dev); + return __neigh_lookup_noref(&arp_tbl, &key, dev); } static inline struct neighbour *__ipv4_neigh_lookup(struct net_device *dev, u32 key) diff --git a/include/net/ndisc.h b/include/net/ndisc.h index ddfbb591e2c5..c354345c679b 100644 --- a/include/net/ndisc.h +++ b/include/net/ndisc.h @@ -376,7 +376,7 @@ static inline u32 ndisc_hashfn(const void *pkey, const struct net_device *dev, _ static inline struct neighbour *__ipv6_neigh_lookup_noref(struct net_device *dev, const void *pkey) { - return ___neigh_lookup_noref(&nd_tbl, neigh_key_eq128, ndisc_hashfn, pkey, dev); + return __neigh_lookup_noref(&nd_tbl, pkey, dev); } static inline struct neighbour *__ipv6_neigh_lookup(struct net_device *dev, const void *pkey) diff --git a/net/core/filter.c b/net/core/filter.c index bd0df75dc7b6..f10cc675783c 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -4668,8 +4668,7 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct 
bpf_fib_lookup *params, * not needed here. Can not use __ipv6_neigh_lookup_noref here * because we need to get nd_tbl via the stub */ - neigh = ___neigh_lookup_noref(ipv6_stub->nd_tbl, neigh_key_eq128, - ndisc_hashfn, dst, dev); + neigh = __neigh_lookup_noref(ipv6_stub->nd_tbl, dst, dev); if (!neigh) return BPF_FIB_LKUP_RET_NO_NEIGH; -- 2.11.0
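The idea behind the conversion can be illustrated outside the kernel. The following is only a userspace sketch, not the neighbour code: `struct table`, `lookup_explicit()`, `lookup()` and the other names are hypothetical stand-ins. It shows a table that already carries its key-compare and hash callbacks, so a wrapper can take them from the table instead of having every caller repeat them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 8u

struct entry {
    uint32_t key;
    struct entry *next;
};

/* Hypothetical stand-in for a neigh_table: the table owns its callbacks. */
struct table {
    int (*key_eq)(const struct entry *e, const void *pkey);
    uint32_t (*hash)(const void *pkey);
    struct entry *buckets[NBUCKETS];
};

/* "Open-coded" form: the caller passes callbacks the table already has. */
static struct entry *lookup_explicit(struct table *tbl,
                                     int (*key_eq)(const struct entry *, const void *),
                                     uint32_t (*hash)(const void *),
                                     const void *pkey)
{
    struct entry *e;

    assert(tbl && key_eq && hash && pkey);
    for (e = tbl->buckets[hash(pkey) % NBUCKETS]; e; e = e->next)
        if (key_eq(e, pkey))
            return e;
    return NULL;
}

/* Wrapper form: take the callbacks from the table itself. */
static struct entry *lookup(struct table *tbl, const void *pkey)
{
    return lookup_explicit(tbl, tbl->key_eq, tbl->hash, pkey);
}

/* Example 32-bit key callbacks (illustrative only). */
static int key_eq32(const struct entry *e, const void *pkey)
{
    return e->key == *(const uint32_t *)pkey;
}

static uint32_t hash32(const void *pkey)
{
    return *(const uint32_t *)pkey * 2654435761u;
}

/* Insert at the head of the bucket chosen by the table's own hash. */
static void insert(struct table *tbl, struct entry *e)
{
    uint32_t b = tbl->hash(&e->key) % NBUCKETS;

    e->next = tbl->buckets[b];
    tbl->buckets[b] = e;
}
```

As long as the callbacks stored in the table are the same ones the callers used to pass by hand, both forms return the same entry, which is the "no-op" argument made in the commit message.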
Re: [PATCH net] macvlan: remove duplicate check
From: Matteo Croce Date: Tue, 4 Dec 2018 18:05:42 +0100 > Following commit 59f997b088d2 ("macvlan: return correct error value"), > there is a duplicate check for mac addresses both in macvlan_sync_address() > and macvlan_set_mac_address(). > As the former calls the latter, remove the one in macvlan_set_mac_address() > and move the one in macvlan_sync_address() before any other check. > > Signed-off-by: Matteo Croce Hmmm, doesn't this change behavior? For the handling of the NETDEV_CHANGEADDR event in macvlan_device_event() we would make it to macvlan_sync_address(), and if IFF_UP is false, we would elide the macvlan_addr_busy() check and just copy the MAC address over and return. Now, we would always perform the macvlan_addr_busy() check. Please, if this is OK, explain and document this behavioral change in the commit message. Thank you.
Re: [PATCH 0/3] net: macb: DMA race condition fixes
From: Anssi Hannula Date: Fri, 30 Nov 2018 20:21:34 +0200 > Here are a couple of race condition fixes for the macb driver. The first > two are issues observed on real HW. It looks like there is still an active discussion about the memory barriers in patch #3 being excessive. Once that is sorted out to everyone's satisfaction, would you please repost this series with appropriate ACKs, reviewed-by's, tested-by's, etc. added? Thank you.
Re: [PATCH 1/3] net: macb: fix random memory corruption on RX with 64-bit DMA
From: Anssi Hannula Date: Fri, 30 Nov 2018 20:21:35 +0200 > @@ -682,6 +682,11 @@ static void macb_set_addr(struct macb *bp, struct > macb_dma_desc *desc, dma_addr_ > if (bp->hw_dma_cap & HW_DMA_CAP_64B) { > desc_64 = macb_64b_desc(bp, desc); > desc_64->addrh = upper_32_bits(addr); > + /* The low bits of RX address contain the RX_USED bit, clearing > + * of which allows packet RX. Make sure the high bits are also > + * visible to HW at that point. > + */ > + dma_wmb(); > } I agree that dma_wmb() is what should be used here. We are ordering CPU stores with DMA visibility, which is exactly what the dma_*() are for. If it doesn't work properly on some architecture's implementation of dma_*(), those should be fixed rather than papering over it in the drivers.
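The ordering requirement can be shown with a userspace analogy. This is a sketch under loud assumptions: C11 fences order stores between threads, not between a CPU and a DMA device, so this is not a substitute for dma_wmb(). `struct desc64`, `desc_set_addr()` and `desc_consume()` are hypothetical names, and a second thread would play the role of the hardware. The pattern is the same as in the quoted hunk: write the high half, fence, then write the low half that carries the ownership bit.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define RX_USED 0x1u  /* hypothetical ownership bit in the low address word */

struct desc64 {
    _Atomic uint32_t addr;   /* low 32 bits, including RX_USED */
    _Atomic uint32_t addrh;  /* high 32 bits */
};

/* Producer side: publish the high word before clearing RX_USED. */
static void desc_set_addr(struct desc64 *d, uint64_t dma_addr)
{
    assert((dma_addr & RX_USED) == 0);
    atomic_store_explicit(&d->addrh, (uint32_t)(dma_addr >> 32),
                          memory_order_relaxed);
    /* Plays the role of dma_wmb() in this analogy. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&d->addr, (uint32_t)dma_addr,
                          memory_order_relaxed);
}

/* Consumer side: only read the high word after seeing RX_USED clear. */
static int desc_consume(struct desc64 *d, uint64_t *out)
{
    uint32_t lo = atomic_load_explicit(&d->addr, memory_order_relaxed);
    uint32_t hi;

    if (lo & RX_USED)
        return 0;  /* descriptor not handed over yet */
    atomic_thread_fence(memory_order_acquire);
    hi = atomic_load_explicit(&d->addrh, memory_order_relaxed);
    *out = ((uint64_t)hi << 32) | lo;
    return 1;
}
```

Without the release fence, a consumer could observe RX_USED cleared while still reading a stale high word, which matches the corruption described in the patch subject.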
Re: [PATCH 1/2] net: linkwatch: send change uevent on link changes
From: Jouke Witteveen Date: Wed, 5 Dec 2018 14:50:31 +0100 > For example, I maintain a network manager that delegates the actual > networking work to specialized programs. Basically "I've implemented things using separate programs" > Basically, it is an implementation of network manager logic in shell > script. For such a shell script, it is easy to respond to uevents > (via udev, or alternatives), but responding to rtnetlink messages > would require a separate program. And "In order to use rtnetlink I'll need a separate program!" (╯°□°)╯︵ ┻━┻ So it's ok to use the separate program paradigm for dividing up the tasks, but not for processing events? I'm not convinced. Either use the facility we have or extend it to fill a valid missing need. I'm not applying these patches, your logic doesn't add up and it's inconsistent with our clear goals of not duplicating functionality.
Re: [PATCH bpf-next 2/7] ppc: bpf: implement jitting of BPF_ALU | BPF_ARSH | BPF_*
From: Jiong Wang Date: Wed, 05 Dec 2018 11:28:32 + > Indeed. Doubled checked the ISA doc,"Bit 32 of RS is replicated to fill > RA0:31.". > > Will fix both places in v2. See, sparc64 isn't so weird :-)
Re: [PATCH net-next] tcp: reduce POLLOUT events caused by TCP_NOTSENT_LOWAT
From: Eric Dumazet Date: Tue, 4 Dec 2018 07:58:17 -0800 > TCP_NOTSENT_LOWAT socket option or sysctl was added in linux-3.12 > as a step to enable bigger tcp sndbuf limits. > > It works reasonably well, but the following happens : > > Once the limit is reached, TCP stack generates > an [E]POLLOUT event for every incoming ACK packet. > > This causes a high number of context switches. > > This patch implements the strategy David Miller added > in sock_def_write_space() : > > - If TCP socket has a notsent_lowat constraint of X bytes, >allow sendmsg() to fill up to X bytes, but send [E]POLLOUT >only if number of notsent bytes is below X/2 > > This considerably reduces TCP_NOTSENT_LOWAT overhead, > while allowing to keep the pipe full. ... > Signed-off-by: Eric Dumazet > Acked-by: Soheil Hassas Yeganeh Applied, thanks Eric.
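The X versus X/2 strategy from the commit message can be sketched as two predicates. This is an illustration only, not the kernel implementation; `can_queue_more()` and `should_wake_writer()` are hypothetical names, and the exact rounding of X/2 here is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* sendmsg() may keep filling while fewer than lowat bytes are unsent. */
static bool can_queue_more(uint32_t notsent_bytes, uint32_t lowat)
{
    assert(lowat > 0);
    return notsent_bytes < lowat;
}

/* An [E]POLLOUT wakeup is only worth sending below half the limit. */
static bool should_wake_writer(uint32_t notsent_bytes, uint32_t lowat)
{
    assert(lowat > 0);
    return notsent_bytes < lowat / 2;
}
```

With a limit of 1000 bytes, a writer that stopped at the limit is not woken by ACKs that bring the unsent amount down to, say, 900 or 600 bytes. It is woken once fewer than 500 bytes remain, and can then refill up to the full limit in one go, which is where the reduction in context switches comes from.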
Re: [PATCH v2 2/2] net: mvpp2: fix phylink handling of invalid PHY modes
From: Baruch Siach Date: Tue, 4 Dec 2018 16:03:53 +0200 > The .validate phylink callback should empty the supported bitmap when > the interface mode is invalid. > > Cc: Maxime Chevallier > Cc: Antoine Tenart > Reported-by: Russell King > Signed-off-by: Baruch Siach Applied.
Re: [PATCH v2 1/2] net: mvpp2: fix detection of 10G SFP modules
From: Baruch Siach Date: Tue, 4 Dec 2018 16:03:52 +0200 > The mvpp2_phylink_validate() relies on the interface field of > phylink_link_state to determine valid link modes. However, when called > from phylink_sfp_module_insert() this field in not initialized. The > default switch case then excludes 10G link modes. This allows 10G SFP > modules that are detected correctly to be configured at max rate of > 2.5G. > > Catch the uninitialized PHY mode case, and allow 10G rates. > > Fixes: d97c9f4ab000b ("net: mvpp2: 1000baseX support") > Cc: Maxime Chevallier > Cc: Antoine Tenart > Acked-by: Russell King > Signed-off-by: Baruch Siach Applied.
Re: [PATCH v2 net-next] ip6_tunnel: Adding support of mapping rules for MAP-E tunnel
From: Felix Jia Date: Mon, 3 Dec 2018 16:39:31 +1300 > +int > +ip6_get_addrport(struct iphdr *iph, __be32 *saddr4, __be32 *daddr4, > + __be16 *sport4, __be16 *dport4, __u8 *proto, int *icmperr) > +{ This looks like something the flow dissector can do already, please look into utilizing that common piece of infrastructure instead of reimplementing it. > + u8 *ptr; > + struct iphdr *icmpiph = NULL; > + struct tcphdr *tcph, *icmptcph; > + struct udphdr *udph, *icmpudph; > + struct icmphdr *icmph, *icmpicmph; Please always order local variables from longest to shortest line. Please audit your entire submission for this problem. > +static struct ip6_tnl_rule *ip6_tnl_rule_find(struct net_device *dev, > + __be32 _dst) > +{ > + u32 dst = ntohl(_dst); > + struct ip6_rule_list *pos = NULL; > + struct ip6_tnl *t = netdev_priv(dev); > + > + list_for_each_entry(pos, >rules.list, list) { > + int mask = > + 0x ^ ((1 << (32 - pos->data.ipv4_prefixlen)) - 1); > + if ((dst & mask) == ntohl(pos->data.ipv4_subnet.s_addr)) > + return >data; > + } > + return NULL; > +} How will this scale with large numbers of rules? This rule facility seems to be designed in a way that sophisticated (at least as fast as "O(log N)") lookup schemes aren't even possible, and, even worse, the ordering matters.
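To make the ordering concern concrete, here is a small userspace sketch, assuming host-byte-order addresses; `struct rule`, `prefix_mask()`, `first_match()` and `longest_match()` are hypothetical names and none of this is the submitted code. A first-match linear scan returns whichever overlapping rule happens to come first, while longest-prefix matching gives the same answer regardless of insertion order and is the semantic that tree-based lookups can implement. The mask helper also guards the zero-length prefix, where a shift by 32 would be undefined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct rule {
    uint32_t subnet;     /* host byte order */
    unsigned prefixlen;  /* 0..32 */
};

/* Network mask for a prefix length, avoiding an undefined shift by 32. */
static uint32_t prefix_mask(unsigned prefixlen)
{
    assert(prefixlen <= 32);
    return prefixlen ? (uint32_t)(0xffffffffu << (32 - prefixlen)) : 0;
}

static int rule_matches(const struct rule *r, uint32_t dst)
{
    return (dst & prefix_mask(r->prefixlen)) == r->subnet;
}

/* O(N) scan, first hit wins: the result depends on rule order. */
static const struct rule *first_match(const struct rule *rules, size_t n,
                                      uint32_t dst)
{
    for (size_t i = 0; i < n; i++)
        if (rule_matches(&rules[i], dst))
            return &rules[i];
    return NULL;
}

/* Still O(N) here, but order-independent: the most specific rule wins. */
static const struct rule *longest_match(const struct rule *rules, size_t n,
                                        uint32_t dst)
{
    const struct rule *best = NULL;

    for (size_t i = 0; i < n; i++)
        if (rule_matches(&rules[i], dst) &&
            (!best || rules[i].prefixlen > best->prefixlen))
            best = &rules[i];
    return best;
}
```

With 10.0.0.0/8 listed before 10.1.0.0/16, a lookup of 10.1.2.3 hits the /8 under first-match and the /16 under longest-match. Swapping the two rules changes only the first-match result.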
Re: [PATCH net-next V2 0/2] net/sched: act_tunnel_key: support key-less tunnels
From: Or Gerlitz Date: Sun, 2 Dec 2018 14:55:19 +0200 > This short series from Adi Nissim allows to support key-less tunnels > by the tc tunnel key actions, which is needed for some GRE use-cases. > > changes from V0: > - addresses build warning spotted by kbuild, make sure to always init >to zero the tunnel key Series applied to net-next, thank you.
Re: [PATCH 1/2] net: linkwatch: send change uevent on link changes
From: Jouke Witteveen Date: Sat, 1 Dec 2018 17:00:21 +0100 > Make it easy for userspace to respond to acquisition/loss of carrier. > The uevent is picked up by udev and, on systems with systemd, the > device unit of the interface announces a configuration reload. > > Signed-off-by: Jouke Witteveen > --- > I did not want to change the commit message into a systemd-howto, but > subscribing to udev events can be done through a line like > ReloadPropagatedFrom=sys-subsystem-net-devices-%i.device > in a systemd unit file. I want to hear more about "why". If we have the rtnetlink message that can be listened for, userspace ought to use that. That's what it is there for.
Re: [PATCH net] rtnetlink: ndo_dflt_fdb_dump() only work for ARPHRD_ETHER devices
From: Eric Dumazet Date: Tue, 4 Dec 2018 09:40:35 -0800 > kmsan was able to trigger a kernel-infoleak using a gre device [1] > > nlmsg_populate_fdb_fill() has a hard coded assumption > that dev->addr_len is ETH_ALEN, as normally guaranteed > for ARPHRD_ETHER devices. > > A similar issue was fixed recently in commit da71577545a5 > ("rtnetlink: Disallow FDB configuration for non-Ethernet device") ... > Fixes: d83b06036048 ("net: add fdb generic dump routine") > Signed-off-by: Eric Dumazet Applied and queued up for -stable, thanks Eric.
Re: [PATCH net-next v2 0/3] net: bridge: convert multicast to generic rhashtable
From: Nikolay Aleksandrov Date: Wed, 5 Dec 2018 01:45:16 +0200 > On a related note I saw Paul's call_rcu patches hit, so I'll wait for those > to go in and will rebase on top of them before sending the v3 as the bridge > change will have a conflict with this set. They aren't going in via my tree, so I wouldn't wait for that before you respin.
Re: [RFC bpf-next 1/7] bpf: interpreter support BPF_ALU | BPF_ARSH
From: Alexei Starovoitov Date: Tue, 4 Dec 2018 12:16:04 -0800 > You already did :) Amazing, I'll take the rest of the day off, thanks! :)
Re: [RFC bpf-next 1/7] bpf: interpreter support BPF_ALU | BPF_ARSH
From: Jiong Wang Date: Tue, 4 Dec 2018 20:14:11 + > On 04/12/2018 20:10, David Miller wrote: >> From: Alexei Starovoitov >> Date: Tue, 4 Dec 2018 11:29:55 -0800 >> >>> I guess sparc doesn't really have 32 subregisters. All registers >>> are considered 64-bit. It has 32-bit alu ops on 64-bit registers >>> instead. >> Right. >> >> Anyways, sparc will require two instructions because of this, the >> 'sra' then a 'srl' by zero bits to clear the top 32-bits. >> >> I'll code up the sparc JIT part when this goes in. > > Hmm, I had been going through all JIT backends, and saw there is > do_alu32_trunc after jitting sra for BPF_ALU. That's what needed? Yes, it clears the top 32-bits of a register after a 32-bit ALU op because BPF's semantics require it. In fact, we call it too much, we even call it for 32-bit shift right instructions which automatically clear those top bits. I've been meaning to optimize that. Meanwhile, again the answer to your question is yes.
Re: [RFC bpf-next 1/7] bpf: interpreter support BPF_ALU | BPF_ARSH
From: Alexei Starovoitov Date: Tue, 4 Dec 2018 11:29:55 -0800 > I guess sparc doesn't really have 32 subregisters. All registers > are considered 64-bit. It has 32-bit alu ops on 64-bit registers > instead. Right. Anyways, sparc will require two instructions because of this, the 'sra' then a 'srl' by zero bits to clear the top 32-bits. I'll code up the sparc JIT part when this goes in.
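The two-instruction sequence can be modelled in plain C. This is a sketch of the semantics as described in this thread, not JIT code; `arsh32()`, `bpf_alu32_arsh()`, `model_sra()` and `model_srl0()` are hypothetical helper names. The model treats 'sra' as a 32-bit arithmetic shift whose result is sign-extended to 64 bits, and 'srl' by zero as clearing the top 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Portable 32-bit arithmetic shift right on a raw bit pattern. */
static uint32_t arsh32(uint32_t v, unsigned n)
{
    uint32_t r;

    assert(n < 32);
    r = v >> n;
    if ((v & 0x80000000u) && n)
        r |= ~(0xffffffffu >> n);  /* replicate the sign bit */
    return r;
}

/* BPF_ALU | BPF_ARSH: shift the low 32 bits, zero the high 32 bits. */
static uint64_t bpf_alu32_arsh(uint64_t reg, unsigned n)
{
    return (uint64_t)arsh32((uint32_t)reg, n);
}

/* Model of 'sra': 32-bit shift, result sign-extended to 64 bits. */
static uint64_t model_sra(uint64_t reg, unsigned n)
{
    uint64_t r = arsh32((uint32_t)reg, n);

    if (r & 0x80000000u)
        r |= 0xffffffff00000000ull;
    return r;
}

/* Model of 'srl' by zero bits: clears the top 32 bits. */
static uint64_t model_srl0(uint64_t reg)
{
    return reg & 0xffffffffull;
}
```

For a register whose low word has bit 31 set, `model_sra()` alone leaves the upper half all ones, and only the follow-up `model_srl0()` brings it in line with the zero-extending BPF definition.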
Re: [PATCH net-next] net: netem: use a list in addition to rbtree
From: Peter Oskolkov Date: Tue, 4 Dec 2018 11:10:55 -0800 > Thanks, Stephen! > > I don't care much about braces either. David, do you want me to send a > new patch with braces moved around? Single statement basic blocks definitely must not have curly braces, please remove them and repost. Thank you.
RE: [PATCH net 1/2] net/mlx4_en: Change min MTU size to ETH_MIN_MTU
From: Eric Dumazet > Sent: 04 December 2018 17:04 > On 12/04/2018 08:59 AM, David Laight wrote: > > From: Tariq Toukan > >> Sent: 02 December 2018 12:35 > >> From: Eran Ben Elisha > >> > >> NIC driver minimal MTU size shall be set to ETH_MIN_MTU, as defined in > >> the RFC791 and in the network stack. Remove old mlx4_en only define for > >> it, which was set to wrong value. > > ... > >> > >> - /* MTU range: 46 - hw-specific max */ > >> - dev->min_mtu = MLX4_EN_MIN_MTU; > >> + /* MTU range: 68 - hw-specific max */ > >> + dev->min_mtu = ETH_MIN_MTU; > >>dev->max_mtu = priv->max_mtu; > > > > Where does 68 come from? > > Min IPv4 MTU per RFC791 Maybe I'm just confused and these are the ranges that the 'maximum mtu' can be set to. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
RE: [PATCH net 1/2] net/mlx4_en: Change min MTU size to ETH_MIN_MTU
From: Eric Dumazet > Sent: 04 December 2018 17:04 > > On 12/04/2018 08:59 AM, David Laight wrote: > > From: Tariq Toukan > >> Sent: 02 December 2018 12:35 > >> From: Eran Ben Elisha > >> > >> NIC driver minimal MTU size shall be set to ETH_MIN_MTU, as defined in > >> the RFC791 and in the network stack. Remove old mlx4_en only define for > >> it, which was set to wrong value. > > ... > >> > >> - /* MTU range: 46 - hw-specific max */ > >> - dev->min_mtu = MLX4_EN_MIN_MTU; > >> + /* MTU range: 68 - hw-specific max */ > >> + dev->min_mtu = ETH_MIN_MTU; > >>dev->max_mtu = priv->max_mtu; > > > > Where does 68 come from? > > Min IPv4 MTU per RFC791 Which has nothing to do with an ethernet driver. Indeed, IIRC, it is the smallest maximum frame size that IPv4 can work over. David
RE: [PATCH net 1/2] net/mlx4_en: Change min MTU size to ETH_MIN_MTU
From: Tariq Toukan > Sent: 02 December 2018 12:35 > From: Eran Ben Elisha > > NIC driver minimal MTU size shall be set to ETH_MIN_MTU, as defined in > the RFC791 and in the network stack. Remove old mlx4_en only define for > it, which was set to wrong value. ... > > - /* MTU range: 46 - hw-specific max */ > - dev->min_mtu = MLX4_EN_MIN_MTU; > + /* MTU range: 68 - hw-specific max */ > + dev->min_mtu = ETH_MIN_MTU; > dev->max_mtu = priv->max_mtu; Where does 68 come from? The minimum size of an ethernet packet including the mac addresses and CRC is 64 bytes - but that would never be an 'mtu'. Since 64 - 46 = 18, the 46 probably excludes both MAC addresses, the ethertype/length and the CRC. This is 'sort of' the minimum mtu for an ethernet frame. I'm not sure which values are supposed to be in dev->min/max_mtu. David
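The two numbers in this sub-thread come from different layers, and the arithmetic can be written down. This is a sketch with locally defined constants whose names are made up for illustration. The values are the usual Ethernet field sizes and the RFC 791 figures: an IPv4 header of up to 60 octets plus a minimum fragment of 8 octets.

```c
#include <assert.h>

enum {
    FRAME_MIN_WITH_FCS = 64,  /* minimum Ethernet frame, header and CRC included */
    MAC_ADDR_LEN       = 6,   /* one MAC address */
    ETHERTYPE_LEN      = 2,   /* ethertype/length field */
    FCS_LEN            = 4,   /* CRC */
    IPV4_MAX_HDR       = 60,  /* largest IPv4 header */
    IPV4_MIN_FRAG      = 8,   /* smallest IPv4 fragment payload */
};

/* 18 bytes: two MAC addresses, ethertype/length and CRC. */
static int eth_overhead(void)
{
    return 2 * MAC_ADDR_LEN + ETHERTYPE_LEN + FCS_LEN;
}

/* 46 bytes: the smallest payload of a minimum-size Ethernet frame. */
static int eth_min_payload(void)
{
    return FRAME_MIN_WITH_FCS - eth_overhead();
}

/* 68 bytes: what every IPv4 link must carry without fragmenting. */
static int ipv4_min_mtu(void)
{
    return IPV4_MAX_HDR + IPV4_MIN_FRAG;
}

static_assert(FRAME_MIN_WITH_FCS -
              (2 * MAC_ADDR_LEN + ETHERTYPE_LEN + FCS_LEN) == 46,
              "64-byte frame minus 18 bytes of overhead leaves 46");
```

So 46 is a property of Ethernet framing, while 68 is an IPv4 requirement on the link, which is the distinction the replies above are circling around.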
Re: [PATCH net-next 1/4] indirect call wrappers: helpers to speed-up indirect calls of builtin
From: Paolo Abeni Date: Tue, 04 Dec 2018 12:27:51 +0100 > On Mon, 2018-12-03 at 10:04 -0800, Eric Dumazet wrote: >> On 12/03/2018 03:40 AM, Paolo Abeni wrote: >> > This header define a bunch of helpers that allow avoiding the >> > retpoline overhead when calling builtin functions via function pointers. >> > It boils down to explicitly comparing the function pointers to >> > known builtin functions and eventually invoke directly the latter. >> > >> > The macros defined here implement the boilerplate for the above schema >> > and will be used by the next patches. >> > >> > rfc -> v1: >> > - use branch prediction hint, as suggested by Eric >> > >> > Suggested-by: Eric Dumazet >> > Signed-off-by: Paolo Abeni >> > --- >> > include/linux/indirect_call_wrapper.h | 77 +++ >> > 1 file changed, 77 insertions(+) >> > create mode 100644 include/linux/indirect_call_wrapper.h >> >> This needs to be discussed more broadly, please include lkml > > Agreed. @David: please let me know if you prefer a repost or a v2 with > the expanded recipients list. v2 probably works better and will help me better keep track of things. Thanks for asking.
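For readers who have not seen the technique, here is a minimal userspace sketch of "compare the pointer to a known builtin and call it directly". The macro name `CALL_DIRECT_1` and the helper functions are hypothetical and are not taken from the proposed header; `__builtin_expect` is a GCC/Clang extension used here for the branch prediction hint.

```c
#include <assert.h>

/* Branch prediction hint; GCC/Clang builtin. */
#define likely_(x) __builtin_expect(!!(x), 1)

/*
 * If fp equals the known function, emit a direct call to it (which the
 * compiler can inline and which needs no retpoline); otherwise fall back
 * to the ordinary indirect call.
 */
#define CALL_DIRECT_1(fp, known, ...) \
    (likely_((fp) == (known)) ? known(__VA_ARGS__) : (fp)(__VA_ARGS__))

typedef int (*op_fn)(int);

static int add_one(int x)
{
    return x + 1;
}

static int times_two(int x)
{
    return x * 2;
}

/* The common case is assumed to be add_one; anything else still works. */
static int apply(op_fn fp, int x)
{
    assert(fp);
    return CALL_DIRECT_1(fp, add_one, x);
}
```

The result is the same either way; only the call path differs, so the trick pays off only when the guessed target really is the common one.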
Re: [RFC bpf-next 1/7] bpf: interpreter support BPF_ALU | BPF_ARSH
From: Jiong Wang Date: Tue, 4 Dec 2018 04:56:29 -0500 > This patch implements interpreting BPF_ALU | BPF_ARSH. Do arithmetic right > shift on low 32-bit sub-register, and zero the high 32 bits. > > Reviewed-by: Jakub Kicinski > Signed-off-by: Jiong Wang I just want to say that this behavior is interesting because on most cpus that have a 32-bit and 64-bit variant, the 32-bit arithmetic right shift typically sign extends to 64-bit rather than zero extends which is what is being defined here. Well, definitely, sparc64 behaves this way.
Re: [PATCH net-next 0/4] mlxsw: Add one-armed router support
From: Ido Schimmel Date: Tue, 4 Dec 2018 08:15:09 + > Up until now, when a packet was routed by the ASIC through the same > router interface (RIF) from which it ingressed from, the ASIC passed the > sole copy of the packet to the kernel. This allowed the kernel to route > the packet and also potentially generate an ICMP redirect. > > There are scenarios (e.g., "one-armed router") where packets are > intentionally routed this way and are therefore not deemed as > exceptions. In such scenarios the current method of trapping packets to > the CPU is problematic, as it results in major packet loss. > > This patchset solves the problem by having the ASIC forward the packet, > but also send a copy to the CPU, which gives the kernel the opportunity > to generate required exceptions. > > To prevent the kernel from forwarding such packets again, the driver > marks them with 'offload_l3_fwd_mark', which causes the kernel to > consume them in ip{,6}_forward_finish(). > > Patch #1 renames 'offload_mr_fwd_mark' to 'offload_l3_fwd_mark'. When > set, the field indicates that a packet was already forwarded in L3 > (unicast / multicast) by a capable device. > > Patch #2 teaches the kernel to consume unicast packets that have > 'offload_l3_fwd_mark' set. > > Patch #3 changes mlxsw to mirror loopbacked (iRIF == eRIF) packets, > instead of trapping them. > > Patch #4 adds a test case for above mentioned scenario. Series applied, thank you.
Re: consistency for statistics with XDP mode
From: David Ahern Date: Mon, 3 Dec 2018 17:15:03 -0700 > So, instead of a program tag which the program writer controls, how > about some config knob that an admin controls that says at attach time > use standard stats? How about, instead of replacing it is in addition to, and admin can override? I'm all for choice so how can I object? :)
Re: [PATCH net-next 0/2] mlx4_core cleanups
From: Tariq Toukan Date: Sun, 2 Dec 2018 17:40:24 +0200 > This patchset by Erez contains cleanups to the mlx4_core driver. > > Patch 1 replaces -EINVAL with -EOPNOTSUPP for unsupported operations. > Patch 2 fixes some coding style issues. > > Series generated against net-next commit: > 97e6c858a26e net: usb: aqc111: Initialize wol_cfg with memset in > aqc111_suspend Series applied, thanks.
Re: [PATCH net-next v2] net: phy: Also request modules for C45 IDs
From: Jose Abreu Date: Sun, 2 Dec 2018 16:33:14 +0100 > Logic of phy_device_create() requests PHY modules according to PHY ID > but for C45 PHYs we use different field for the IDs. > > Let's also request the modules for these IDs. > > Changes from v1: > - Only request C22 modules if C45 are not present (Andrew) > > Signed-off-by: Jose Abreu Applied, thanks Jose. Florian, just for the record, I actually like the changelogs to be in the commit messages. It can help people understand that something was deliberately implemented a certain way and alternative approaches were considered.
Re: [PATCH net-next v2 00/14] octeontx2-af: NIX and NPC enhancements
From: Jerin Jacob Date: Sun, 2 Dec 2018 18:17:35 +0530 > This patchset is a continuation to earlier submitted four patch > series to add a new driver for Marvell's OcteonTX2 SOC's > Resource virtualization unit (RVU) admin function driver. > > 1. octeontx2-af: Add RVU Admin Function driver >https://www.spinics.net/lists/netdev/msg528272.html > 2. octeontx2-af: NPA and NIX blocks initialization >https://www.spinics.net/lists/netdev/msg529163.html > 3. octeontx2-af: NPC parser and NIX blocks initialization >https://www.spinics.net/lists/netdev/msg530252.html > 4. octeontx2-af: NPC MCAM support and FLR handling >https://www.spinics.net/lists/netdev/msg534392.html > > This patch series adds support for below > > NPC block: > - Add NPC(mkex) profile support for various Key extraction configurations > > NIX block: > - Enable dynamic RSS flow key algorithm configuration > - Enhancements on Rx checksum and error checks > - Add support for Tx packet marking support > - TL1 schedule queue allocation enhancements > - Add LSO format configuration mbox > - VLAN TPID configuration > - Skip multicast entry init for broadcast tables ... Series applied, thanks.
Re: [PATCH net 0/2] mlx4 fixes for 4.20-rc
From: Tariq Toukan Date: Sun, 2 Dec 2018 14:34:35 +0200 > This patchset includes small fixes for the mlx4_en driver. > > First patch by Eran fixes the value used to init the netdevice's > min_mtu field. > Please queue it to -stable >= v4.10. > > Second patch by Saeed adds missing Kconfig build dependencies. > > Series generated against net commit: > 35b827b6d061 tun: forbid iface creation with rtnl ops Series applied and patch #1 queued up for -stable, thanks.
Re: consistency for statistics with XDP mode
On 12/3/18 5:00 PM, David Miller wrote: > From: Toke Høiland-Jørgensen > Date: Mon, 03 Dec 2018 22:00:32 +0200 > >> I wonder if it would be possible to support both the "give me user >> normal stats" case and the "let me do whatever I want" case by a >> combination of userspace tooling and maybe a helper or two? >> >> I.e., create a "do_stats()" helper (please pick a better name), which >> will either just increment the desired counters, or set a flag so the >> driver can do it at napi poll exit. With this, the userspace tooling >> could have a "--give-me-normal-stats" switch (or some other interface), >> which would inject a call instruction to that helper at the start of the >> program. >> >> This would enable the normal counters in a relatively painless way, >> while still letting people opt out if they don't want to pay the cost in >> terms of overhead. And having the userspace tooling inject the helper >> call helps support the case where the admin didn't write the XDP >> programs being loaded. >> >> Any reason why that wouldn't work? > > I think this is a good idea, or even an attribute tag that gets added > to the XDP program that controls stats handling. > My argument is that the ebpf program writer should *not* get that choice; the admin of the box should. Program writers make mistakes. Box admins / customer support are the ones that have to deal with those mistakes. Program writers - especially for xdp - are going to be focused on benchmarks; admins are focused on the big picture and should be given the option of trading a small amount of performance for simpler management. So, instead of a program tag which the program writer controls, how about some config knob that an admin controls that says at attach time use standard stats?
Re: [iproute2-next PATCH v6] tc: flower: Classify packets based port ranges
On 12/3/18 4:58 PM, Nambiar, Amritha wrote: > A previous version v3 of this patch was already applied to iproute2-next. > https://patchwork.ozlabs.org/patch/998644/ > > I think that needs to be reverted for this v6 to apply clean. ugh. That's embarrassing. Looks like I inadvertently pushed the older one. Reverted and applied. Thanks,
Re: [PATCH net] macvlan: return correct error value
From: Matteo Croce Date: Sat, 1 Dec 2018 00:26:27 +0100 > A MAC address must be unique among all the macvlan devices with the same > lower device. The only exception is the passthru [sic] mode, > which shares the lower device address. > > When duplicate addresses are detected, EBUSY is returned when bringing > the interface up: > > # ip link add macvlan0 link eth0 type macvlan > # read addr # ip link set macvlan0 address $addr > # ip link set macvlan0 up > RTNETLINK answers: Device or resource busy > > Use correct error code which is EADDRINUSE, and do the check also > earlier, on address change: > > # ip link set macvlan0 address $addr > RTNETLINK answers: Address already in use > > Signed-off-by: Matteo Croce Applied, thanks Matteo.
Re: consistency for statistics with XDP mode
From: Toke Høiland-Jørgensen Date: Mon, 03 Dec 2018 22:00:32 +0200 > I wonder if it would be possible to support both the "give me user > normal stats" case and the "let me do whatever I want" case by a > combination of userspace tooling and maybe a helper or two? > > I.e., create a "do_stats()" helper (please pick a better name), which > will either just increment the desired counters, or set a flag so the > driver can do it at napi poll exit. With this, the userspace tooling > could have a "--give-me-normal-stats" switch (or some other interface), > which would inject a call instruction to that helper at the start of the > program. > > This would enable the normal counters in a relatively painless way, > while still letting people opt out if they don't want to pay the cost in > terms of overhead. And having the userspace tooling inject the helper > call helps support the case where the admin didn't write the XDP > programs being loaded. > > Any reason why that wouldn't work? I think this is a good idea, or even an attribute tag that gets added to the XDP program that controls stats handling.
Re: [PATCH net-next v4 0/3] udp msg_zerocopy
From: Willem de Bruijn Date: Fri, 30 Nov 2018 15:32:38 -0500 > Enable MSG_ZEROCOPY for udp sockets Series applied, thanks for keeping up with this.
Re: [PATCH 0/3] net: macb: DMA race condition fixes
From: Date: Mon, 3 Dec 2018 08:26:52 + > Can you please delay a bit the acceptance of this series, I would like > that we assess these findings with tests on our hardware before applying > them. Sure.
Re: [PATCH net] sctp: kfree_rcu asoc
From: Xin Long Date: Sat, 1 Dec 2018 01:36:59 +0800 > In sctp_hash_transport/sctp_epaddr_lookup_transport, it dereferences > a transport's asoc under rcu_read_lock while asoc is freed not after > a grace period, which leads to a use-after-free panic. > > This patch fixes it by calling kfree_rcu to make asoc be freed after > a grace period. > > Note that only the asoc's memory is delayed to free in the patch, it > won't cause sk to linger longer. > > Thanks Neil and Marcelo to make this clear. > > Fixes: 7fda702f9315 ("sctp: use new rhlist interface on sctp transport > rhashtable") > Fixes: cd2b70875058 ("sctp: check duplicate node before inserting a new > transport") > Reported-by: syzbot+0b05d8aa7cb185107...@syzkaller.appspotmail.com > Reported-by: syzbot+aad231d51b1923158...@syzkaller.appspotmail.com > Suggested-by: Neil Horman > Signed-off-by: Xin Long Applied and queued up for -stable, thanks.
Re: [PATCH net] net/ibmvnic: Fix RTNL deadlock during device reset
From: Thomas Falcon Date: Fri, 30 Nov 2018 10:59:08 -0600 > Commit a5681e20b541 ("net/ibmnvic: Fix deadlock problem > in reset") made the change to hold the RTNL lock during > driver reset but still calls netdev_notify_peers, which > results in a deadlock. Instead, use call_netdevice_notifiers, > which is functionally the same except that it does not > take the RTNL lock again. > > Fixes: a5681e20b541 ("net/ibmnvic: Fix deadlock problem in reset") > > Signed-off-by: Thomas Falcon Applied.
Re: [PATCH net] rtnetlink: Refine sanity checks in rtnl_fdb_{add|del}
From: Ido Schimmel Date: Fri, 30 Nov 2018 19:00:24 +0200 > Yes, agree. Patch is good. I'll tag your v2. This means, I assume, that a new version of this fix is coming. Eric, is this correct?