[PATCH] fec: Use gpio_set_value_cansleep()
From: Fabio Estevam

We are in a context where we can sleep, and the FEC PHY reset gpio may be on an I2C expander. Use the cansleep() variant when setting the GPIO value.

Based on a patch from Russell King for pci-mvebu.c.

Signed-off-by: Fabio Estevam
---
 drivers/net/ethernet/freescale/fec_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index 501e143..b2a3220 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -3261,7 +3261,7 @@ static void fec_reset_phy(struct platform_device *pdev)
 		return;
 	}
 	msleep(msec);
-	gpio_set_value(phy_reset, 1);
+	gpio_set_value_cansleep(phy_reset, 1);
 }
 #else /* CONFIG_OF */
 static void fec_reset_phy(struct platform_device *pdev)
-- 
1.9.1
--
To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
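For readers unfamiliar with the distinction, here is a minimal userspace sketch of it. The names model_gpio, model_gpio_set_value() and model_gpio_set_value_cansleep() are hypothetical stand-ins, not kernel API. The model assumes that the non-sleeping setter complains when the controller can sleep (for example, when it sits behind an I2C expander), which is the situation the patch avoids.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a GPIO line; not the kernel's data structure. */
struct model_gpio {
	bool can_sleep;	/* true if the controller sits behind I2C/SPI */
	int value;	/* last value written to the line */
	int warnings;	/* counts misuse of the non-sleeping setter */
};

/* Models the non-sleeping setter: fine for memory-mapped GPIOs,
 * flagged as misuse when the controller may sleep. */
static void model_gpio_set_value(struct model_gpio *g, int v)
{
	if (g->can_sleep) {
		g->warnings++;
		fprintf(stderr, "warning: non-sleeping setter on sleeping GPIO\n");
	}
	g->value = v;
}

/* Models the cansleep variant: valid for both kinds of controller,
 * provided the caller itself is allowed to sleep. */
static void model_gpio_set_value_cansleep(struct model_gpio *g, int v)
{
	g->value = v;
}
```

In fec_reset_phy() the caller already sleeps (it calls msleep()), so the cansleep variant is the safe choice whichever controller drives the reset line.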
[PATCH net 2/2] ipv6: protect mtu calculation of wrap-around and infinite loop by rounding issues
Raw sockets with hdrincl enabled can insert ipv6 extension headers right into the data stream. In case we need to fragment those packets, we reparse the options header to find the place where we can insert the fragment header. If the extension headers exceed the link's MTU we actually cannot make progress in such a case.

Instead of ending up in broken arithmetic or rounding towards 0 and entering an endless loop in ip6_fragment, just prevent those cases by aborting early and signal -EMSGSIZE to user space.

This is the second version of the patch which doesn't use the overflow_usub function, which got reverted for now.

Suggested-by: Linus Torvalds
Cc: Linus Torvalds
Reported-by: Dmitry Vyukov
Cc: Dmitry Vyukov
Signed-off-by: Hannes Frederic Sowa
---
 net/ipv6/ip6_output.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index d03d6da..f84ec4e 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -584,6 +584,8 @@ int ip6_fragment(struct sock *sk, struct sk_buff *skb,
 		if (np->frag_size)
 			mtu = np->frag_size;
 	}
+	if (mtu < hlen + sizeof(struct frag_hdr) + 8)
+		goto fail_toobig;
 	mtu -= hlen + sizeof(struct frag_hdr);
 
 	frag_id = ipv6_select_ident(net, &ipv6_hdr(skb)->daddr,
-- 
2.4.3
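To make the two failure modes concrete, here is a small standalone sketch of the arithmetic. The function names and MODEL_FRAG_HDR_LEN are hypothetical; the constant assumes the IPv6 fragment header is 8 bytes, and the rounding step models fragment payloads being cut to a multiple of 8.

```c
#include <errno.h>

#define MODEL_FRAG_HDR_LEN 8u	/* assumed size of the IPv6 fragment header */

/* Unguarded budget: wraps around to a huge value when the headers exceed
 * the MTU, because the subtraction is done on unsigned ints. */
static unsigned int model_unchecked_budget(unsigned int mtu, unsigned int hlen)
{
	return mtu - (hlen + MODEL_FRAG_HDR_LEN);
}

/* Guarded budget, following the check added by the patch: require room for
 * the headers plus at least one 8-byte unit of payload, else -EMSGSIZE. */
static int model_frag_payload(unsigned int mtu, unsigned int hlen)
{
	if (mtu < hlen + MODEL_FRAG_HDR_LEN + 8)
		return -EMSGSIZE;
	mtu -= hlen + MODEL_FRAG_HDR_LEN;
	/* Non-final fragments carry a multiple of 8 bytes; without the
	 * guard this could round down to 0 and never make progress. */
	return (int)(mtu & ~7u);
}
```

With an MTU of 1280 and 40 bytes of headers the budget is 1232 bytes; with 100 bytes of headers on a 40-byte MTU the unguarded version wraps, while the guarded one reports -EMSGSIZE.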
[PATCH net 1/2] Revert "Merge branch 'ipv6-overflow-arith'"
Linus dislikes these changes. To not hold up the net-merge let's revert it for now and fix the bug like Linus suggested.

This reverts commit ec3661b42257d9a06cf0d318175623ac7a660113, reversing changes made to c80dbe04612986fd6104b4a1be21681b113b5ac9.

Cc: Linus Torvalds
Signed-off-by: Hannes Frederic Sowa
---
Sorry for delaying the net pull request!

 include/linux/compiler-gcc.h | 4
 include/linux/overflow-arith.h | 18 --
 net/ipv6/ip6_output.c | 6 +-
 3 files changed, 1 insertion(+), 27 deletions(-)
 delete mode 100644 include/linux/overflow-arith.h

diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
index 82c159e..dfaa7b3 100644
--- a/include/linux/compiler-gcc.h
+++ b/include/linux/compiler-gcc.h
@@ -237,10 +237,6 @@
 #define KASAN_ABI_VERSION 3
 #endif
 
-#if GCC_VERSION >= 5
-#define CC_HAVE_BUILTIN_OVERFLOW
-#endif
-
 #endif /* gcc version >= 4 specific checks */
 
 #if !defined(__noclone)
diff --git a/include/linux/overflow-arith.h b/include/linux/overflow-arith.h
deleted file mode 100644
index e12ccf8..000
--- a/include/linux/overflow-arith.h
+++ /dev/null
@@ -1,18 +0,0 @@
-#pragma once
-
-#include
-
-#ifdef CC_HAVE_BUILTIN_OVERFLOW
-
-#define overflow_usub __builtin_usub_overflow
-
-#else
-
-static inline bool overflow_usub(unsigned int a, unsigned int b,
-				 unsigned int *res)
-{
-	*res = a - b;
-	return *res > a ? true : false;
-}
-
-#endif
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 8dddb45..d03d6da 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -28,7 +28,6 @@
 #include
 #include
-#include
 #include
 #include
 #include
@@ -585,10 +584,7 @@ int ip6_fragment(struct sock *sk, struct sk_buff *skb,
 		if (np->frag_size)
 			mtu = np->frag_size;
 	}
-
-	if (overflow_usub(mtu, hlen + sizeof(struct frag_hdr), &mtu) ||
-	    mtu <= 7)
-		goto fail_toobig;
+	mtu -= hlen + sizeof(struct frag_hdr);
 
 	frag_id = ipv6_select_ident(net, &ipv6_hdr(skb)->daddr,
 				    &ipv6_hdr(skb)->saddr);
-- 
2.4.3
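For reference, the fallback removed by this revert detects wrap-around by comparing the result with the minuend. A standalone sketch of that idea follows; model_overflow_usub() is a hypothetical name used here so as not to suggest the helper still exists.

```c
#include <stdbool.h>

/* Sketch of the removed fallback: subtract, then report whether the
 * unsigned subtraction wrapped around. */
static inline bool model_overflow_usub(unsigned int a, unsigned int b,
				       unsigned int *res)
{
	*res = a - b;		/* unsigned subtraction wraps modulo 2^N */
	return *res > a;	/* a wrapped result is always larger than a */
}
```

Note that a result of exactly 0 is not an overflow, which is why the reverted ip6_fragment() code also had to test mtu <= 7 separately.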
[v6, 4/6] fsl/fman: Add FMan SP support
From: Igal Liberman

The Storage Profiles contain parameters that are used by the FMan for frame reception and transmission. Signed-off-by: Igal Liberman --- drivers/net/ethernet/freescale/fman/Makefile |2 +- drivers/net/ethernet/freescale/fman/fman_sp.c | 167 + drivers/net/ethernet/freescale/fman/fman_sp.h | 103 +++ 3 files changed, 271 insertions(+), 1 deletion(-) create mode 100644 drivers/net/ethernet/freescale/fman/fman_sp.c create mode 100644 drivers/net/ethernet/freescale/fman/fman_sp.h diff --git a/drivers/net/ethernet/freescale/fman/Makefile b/drivers/net/ethernet/freescale/fman/Makefile index 43360d70..5141532 100644 --- a/drivers/net/ethernet/freescale/fman/Makefile +++ b/drivers/net/ethernet/freescale/fman/Makefile @@ -2,5 +2,5 @@ subdir-ccflags-y += -I$(srctree)/drivers/net/ethernet/freescale/fman obj-y += fsl_fman.o fsl_fman_mac.o -fsl_fman-objs := fman_muram.o fman.o +fsl_fman-objs := fman_muram.o fman.o fman_sp.o fsl_fman_mac-objs := fman_dtsec.o fman_memac.o fman_tgec.o diff --git a/drivers/net/ethernet/freescale/fman/fman_sp.c b/drivers/net/ethernet/freescale/fman/fman_sp.c new file mode 100644 index 000..f36c622 --- /dev/null +++ b/drivers/net/ethernet/freescale/fman/fman_sp.c @@ -0,0 +1,167 @@ +/* + * Copyright 2008 - 2015 Freescale Semiconductor Inc. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. 
+ * * Neither the name of Freescale Semiconductor nor the + * names of its contributors may be used to endorse or promote products + * derived from this software without specific prior written permission. + * + * + * ALTERNATIVELY, this software may be distributed under the terms of the + * GNU General Public License ("GPL") as published by the Free Software + * Foundation, either version 2 of that License or (at your option) any + * later version. + * + * THIS SOFTWARE IS PROVIDED BY Freescale Semiconductor ``AS IS'' AND ANY + * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED + * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE + * DISCLAIMED. IN NO EVENT SHALL Freescale Semiconductor BE LIABLE FOR ANY + * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES + * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; + * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND + * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS + * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ */ + +#include "fman_sp.h" +#include "fman.h" + +void fman_sp_set_buf_pools_in_asc_order_of_buf_sizes(struct fman_ext_pools +*fm_ext_pools, +u8 *ordered_array, +u16 *sizes_array) +{ + u16 buf_size = 0; + int i = 0, j = 0, k = 0; + + /* First we copy the external buffers pools information +* to an ordered local array +*/ + for (i = 0; i < fm_ext_pools->num_of_pools_used; i++) { + /* get pool size */ + buf_size = fm_ext_pools->ext_buf_pool[i].size; + + /* keep sizes in an array according to poolId +* for direct access +*/ + sizes_array[fm_ext_pools->ext_buf_pool[i].id] = buf_size; + + /* save poolId in an ordered array according to size */ + for (j = 0; j <= i; j++) { + /* this is the next free place in the array */ + if (j == i) + ordered_array[i] = + fm_ext_pools->ext_buf_pool[i].id; + else { + /* find the right place for this poolId */ + if (buf_size < sizes_array[ordered_array[j]]) { + /* move the pool_ids one place ahead +* to make room for this poolId +*/ + for (k = i; k > j; k--) + ordered_array[k] = +
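The quoted function is cut off above, but its comments describe an insertion into an array ordered by buffer size. A compact standalone sketch of that ordering step is given below. The struct and function names are hypothetical and simplified from the patch, and the sizes array is assumed to be large enough to be indexed by pool id.

```c
#include <stdint.h>

#define MODEL_MAX_POOLS 8	/* hypothetical bound on pool ids */

/* Simplified stand-in for one external buffer pool entry. */
struct model_pool {
	uint8_t id;
	uint16_t size;
};

/* Fill ordered[] with pool ids in ascending order of buffer size and
 * sizes[] with each pool's size, indexed by pool id for direct access. */
static void model_order_pools(const struct model_pool *pools, int n,
			      uint8_t *ordered, uint16_t *sizes)
{
	for (int i = 0; i < n; i++) {
		uint16_t size = pools[i].size;
		int j = i;	/* next free slot in ordered[] */

		sizes[pools[i].id] = size;

		/* Move larger pools one place ahead to make room. */
		while (j > 0 && size < sizes[ordered[j - 1]]) {
			ordered[j] = ordered[j - 1];
			j--;
		}
		ordered[j] = pools[i].id;
	}
}
```

Pools of equal size keep their original relative order, since only strictly larger entries are shifted.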
Re: [Bug 106241] New: shutdown(3)/close(3) behaviour is incorrect for sockets in accept(3)
On Wed, 2015-10-28 at 12:35 +, Al Viro wrote:
> [Linus and Dave added, Solaris and NetBSD folks dropped from Cc]
>
> On Tue, Oct 27, 2015 at 05:13:56PM -0700, Eric Dumazet wrote:
> > On Tue, 2015-10-27 at 23:17 +, Al Viro wrote:
> >
> > > * [Linux-specific aside] our __alloc_fd() can degrade quite badly
> > > with some use patterns. The cacheline pingpong in the bitmap is probably
> > > inevitable, unless we accept considerably heavier memory footprint,
> > > but we also have a case when alloc_fd() takes O(n) and it's _not_ hard
> > > to trigger - close(3);open(...); will have the next open() after that
> > > scanning the entire in-use bitmap. I think I see a way to improve it
> > > without slowing the normal case down, but I'll need to experiment a
> > > bit before I post patches. Anybody with examples of real-world loads
> > > that make our descriptor allocator to degrade is very welcome to post
> > > the reproducers...
> >
> > Well, I do have real-world loads, but quite hard to setup in a lab :(
> >
> > Note that we also hit the 'struct cred'->usage refcount for every
> > open()/close()/sock_alloc(), and simply moving uid/gid out of the first
> > cache line really helps, as current_fsuid() and current_fsgid() no
> > longer forces a pingpong.
> >
> > I moved seldom used fields on the first cache line, so that overall
> > memory usage did not change (192 bytes on 64 bit arches)
> > [snip]
>
> Makes sense, but there's a funny thing about that refcount - the part
> coming from ->f_cred is the most frequently changed *and* almost all
> places using ->f_cred are just looking at its fields and do not manipulate
> its refcount. The only exception (do_process_acct()) is easy to eliminate
> just by storing a separate reference to the current creds of acct(2) caller
> and using it instead of looking at ->f_cred. What's more, the place where we
> grab what will be ->f_cred is guaranteed to have a non-f_cred reference *and*
> most of the time such a reference is there for dropping ->f_cred (in
> file_free()/file_free_rcu()).
>
> With that change in kernel/acct.c done, we could do the following:
> 	a) split the cred refcount into the normal and percpu parts and
> add a spinlock in there.
> 	b) have put_cred() do this:
> 		if (atomic_dec_and_test(&cred->usage)) {
> 			this_cpu_add(&cred->f_cred_usage, 1);
> 			call_rcu(&cred->rcu, put_f_cred_rcu);
> 		}
> 	c) have get_empty_filp() increment current_cred ->f_cred_usage with
> this_cpu_add()
> 	d) have file_free() do
> 		percpu_counter_dec(&nr_files);
> 		rcu_read_lock();
> 		if (likely(atomic_read(&f->f_cred->usage))) {
> 			this_cpu_add(&f->f_cred->f_cred_usage, -1);
> 			rcu_read_unlock();
> 			call_rcu(&f->f_u.fu_rcuhead, file_free_rcu_light);
> 		} else {
> 			rcu_read_unlock();
> 			call_rcu(&f->f_u.fu_rcuhead, file_free_rcu);
> 		}
> file_free_rcu() being
> 	static void file_free_rcu(struct rcu_head *head)
> 	{
> 		struct file *f = container_of(head, struct file, f_u.fu_rcuhead);
> 		put_f_cred(&f->f_cred->rcu);
> 		kmem_cache_free(filp_cachep, f);
> 	}
> and file_free_rcu_light() - the same sans put_f_cred();
>
> with put_f_cred() doing
> 	spin_lock cred->lock
> 	this_cpu_add(&cred->f_cred_usage, -1);
> 	find the sum of cred->f_cred_usage
> 	spin_unlock cred->lock
> 	if the sum has not reached 0
> 		return
> 	current put_cred_rcu(cred)
>
> IOW, let's try to get rid of cross-cpu stores in ->f_cred grabbing and
> (most of) ->f_cred dropping.
>
> Note that there are two paths leading to put_f_cred() in the above - via
> call_rcu() on &cred->rcu and from file_free_rcu() called via call_rcu() on
> &f->f_u.fu_rcuhead. Both are RCU-delayed and they can happen in parallel -
> different rcu_head are used.
>
> atomic_read() check in file_free() might give false positives if it comes
> just before put_cred() on another CPU kills the last non-f_cred reference.
> It's not a problem, since put_f_cred() from that put_cred() won't be
> executed until we drop rcu_read_lock(), so we can safely decrement the
> cred->f_cred_usage without cred->lock here (and we are guaranteed that we
> won't
> be dropping the last of that - the same put_cred() would've incremented
> ->f_cred_usage).
>
> Does anybody see problems with that approach? I'm going to grab some sleep
> (only a couple of hours so far tonight ;-/), will cook an incremental to
> Eric's
> field-reordering patch when I get up...

Before I take a deep look at your suggestion, are you sure plain use of include/linux/percpu-refcount.h infra is not possible for struct cred ?

Thanks !
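The split-refcount proposal quoted above can be followed more easily with a toy model. The sketch below is single-threaded userspace C and only tracks the counting. All names are hypothetical, the per-CPU array stands in for a real percpu variable, and the locking and RCU delays from the proposal appear only as comments.

```c
#define MODEL_NR_CPUS 4

/* Toy stand-in for struct cred with the proposed split refcount. */
struct model_cred {
	int usage;				/* "normal" references */
	long f_cred_usage[MODEL_NR_CPUS];	/* per-CPU ->f_cred deltas */
	int freed;				/* set once the cred would be freed */
};

/* get_empty_filp() side: a new file takes its reference on the local CPU. */
static void model_get_f_cred(struct model_cred *c, int cpu)
{
	c->f_cred_usage[cpu]++;
}

/* put_f_cred(): drop one per-CPU reference, then sum all CPUs. The proposal
 * does this under cred->lock; the cred goes away only when the sum is 0. */
static void model_put_f_cred(struct model_cred *c, int cpu)
{
	long sum = 0;

	c->f_cred_usage[cpu]--;
	for (int i = 0; i < MODEL_NR_CPUS; i++)
		sum += c->f_cred_usage[i];
	if (sum == 0)
		c->freed = 1;
}

/* put_cred(): the last normal reference is converted into one per-CPU
 * reference and handed to put_f_cred() (RCU-delayed in the proposal). */
static void model_put_cred(struct model_cred *c, int cpu)
{
	if (--c->usage == 0) {
		c->f_cred_usage[cpu]++;
		model_put_f_cred(c, cpu);
	}
}

/* file_free(): light path while normal references remain, otherwise the
 * full put_f_cred() with the sum. */
static void model_file_free(struct model_cred *c, int cpu)
{
	if (c->usage)
		c->f_cred_usage[cpu]--;	/* local store only, no lock */
	else
		model_put_f_cred(c, cpu);
}
```

Individual per-CPU values may go negative when a file is allocated on one CPU and freed on another; only the sum across CPUs is meaningful.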
Re: [PATCH] xfrm: dst_entries_init() per-net dst_ops
On Tue, Oct 27, 2015 at 12:15 PM, wrote:
> From: Dan Streetman
>
> The ipv4 and ipv6 xfrms each create a template dst_ops object, and
> perform dst_entries_init() on the template objects. Then each net
> namespace has its net.xfrm.xfrm[46]_dst_ops field set to the template
> values. The problem with that is the dst_ops.pcpuc_entries field is
> a percpu counter and cannot be used correctly by simply copying it to
> another object.
>
> The result of this is a very subtle bug; changes to the dst entries
> counter from one net namespace may sometimes get applied to a different
> net namespace dst entries counter. This is because of how the percpu
> counter works; it has a main count field as well as a pointer to the
> percpu variables. Each net namespace maintains its own main count
> variable, but all point to one set of percpu variables. When any net
> namespace happens to change one of the percpu variables to outside its
> small batch range, its count is moved to the net namespace's main count
> variable. So with multiple net namespaces operating concurrently, the
> dst_ops entries counter can stray from the actual value that it should
> be; if counts are consistently moved from one net namespace to another
> (which my testing showed is likely), then one net namespace winds up
> with a negative dst_ops count (which is reported as 0) while another
> winds up with a continually increasing count, eventually reaching its
> gc_thresh limit, which causes all new traffic on the net namespace to
> fail with -ENOBUFS.
>
> This removes the dst_entries_init (and dst_entries_destroy) call for
> the template dst_ops objects; their counters will never be used.
> Instead dst_entries_init is called for each net namespace's dst_ops
> object, right after copying its values from the template, and

Well I'm not sure why my test kernel booted, while the test robot found the bug of GFP_KERNEL percpu counter alloc during atomic context. Thanks test robot! I'll update the patch and resend.

> dst_entries_destroy is called when the net namespace is removed.
>
> Signed-off-by: Dan Streetman
> Signed-off-by: Dan Streetman
> ---
> net/ipv4/xfrm4_policy.c | 5 +++--
> net/ipv6/xfrm6_policy.c | 10 --
> net/xfrm/xfrm_policy.c | 25 +++--
> 3 files changed, 30 insertions(+), 10 deletions(-)
>
> diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c
> index f2606b9..5f747ee 100644
> --- a/net/ipv4/xfrm4_policy.c
> +++ b/net/ipv4/xfrm4_policy.c
> @@ -235,6 +235,9 @@ static void xfrm4_dst_ifdown(struct dst_entry *dst,
> struct net_device *dev,
>  	xfrm_dst_ifdown(dst, dev);
>  }
>
> +/* This is used as a template only; the dst_entries counter is not
> + * initialized for this, but must be on per-net copies of this
> + */
>  static struct dst_ops xfrm4_dst_ops = {
>  	.family = AF_INET,
>  	.gc = xfrm4_garbage_collect,
> @@ -325,8 +328,6 @@ static void __init xfrm4_policy_init(void)
>
>  void __init xfrm4_init(void)
>  {
> -	dst_entries_init(&xfrm4_dst_ops);
> -
>  	xfrm4_state_init();
>  	xfrm4_policy_init();
>  	xfrm4_protocol_init();
> diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c
> index 2cc5840..b895ec1 100644
> --- a/net/ipv6/xfrm6_policy.c
> +++ b/net/ipv6/xfrm6_policy.c
> @@ -279,6 +279,9 @@ static void xfrm6_dst_ifdown(struct dst_entry *dst,
> struct net_device *dev,
>  	xfrm_dst_ifdown(dst, dev);
>  }
>
> +/* This is used as a template only; the dst_entries counter is not
> + * initialized for this, but must be on per-net copies of this
> + */
>  static struct dst_ops xfrm6_dst_ops = {
>  	.family = AF_INET6,
>  	.gc = xfrm6_garbage_collect,
> @@ -376,13 +379,9 @@ int __init xfrm6_init(void)
>  {
>  	int ret;
>
> -	dst_entries_init(&xfrm6_dst_ops);
> -
>  	ret = xfrm6_policy_init();
> -	if (ret) {
> -		dst_entries_destroy(&xfrm6_dst_ops);
> +	if (ret)
>  		goto out;
> -	}
>  	ret = xfrm6_state_init();
>  	if (ret)
>  		goto out_policy;
> @@ -411,5 +410,4 @@ void xfrm6_fini(void)
>  	xfrm6_protocol_fini();
>  	xfrm6_policy_fini();
>  	xfrm6_state_fini();
> -	dst_entries_destroy(&xfrm6_dst_ops);
>  }
> diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
> index 09bfcba..5381719 100644
> --- a/net/xfrm/xfrm_policy.c
> +++ b/net/xfrm/xfrm_policy.c
> @@ -2896,12 +2896,32 @@ static void __net_init xfrm_dst_ops_init(struct net
> *net)
>
>  	rcu_read_lock();
>  	afinfo = rcu_dereference(xfrm_policy_afinfo[AF_INET]);
> -	if (afinfo)
> +	if (afinfo) {
>  		net->xfrm.xfrm4_dst_ops = *afinfo->dst_ops;
> +		dst_entries_init(&net->xfrm.xfrm4_dst_ops);
> +	}
>  #if IS_ENABLED(CONFIG_IPV6)
>
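The bug described in the quoted commit message can be reproduced in miniature. The following userspace sketch models a batched per-CPU counter. The names, the batch size of 4 and the two-CPU array are all hypothetical, and the folding rule is only an approximation of what the kernel's percpu counter does.

```c
#define MODEL_NR_CPUS 2
#define MODEL_BATCH 4	/* hypothetical batch size */

/* Toy percpu counter: a main count plus a pointer to per-CPU deltas.
 * Copying the struct copies the pointer, so copies share the deltas. */
struct model_percpu_counter {
	long count;	/* main count, private to each copy */
	long *counters;	/* per-CPU deltas, shared between copies */
};

/* Add to the local delta; once it leaves the batch range, fold it into the
 * main count of whichever counter object happened to make the call. */
static void model_counter_add(struct model_percpu_counter *c, int cpu,
			      long amount)
{
	long v = c->counters[cpu] + amount;

	if (v >= MODEL_BATCH || v < -MODEL_BATCH) {
		c->count += v;
		c->counters[cpu] = 0;
	} else {
		c->counters[cpu] = v;
	}
}
```

If two namespaces are struct copies of one template, three entries added by the first and one by the second all end up in the second namespace's main count, which is the drift the patch fixes by initializing a counter per namespace.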
Re: [PATCH v7 02/10] ss: created formatters for json and hr
> I did not take over maintenance responsibility (whatever that means
> to you precisely). I merely reviewed the patches, focussing on the
> technical aspects of both implementation and patch management.

Ah, I meant the maintenance of iproute2 as a whole. Though, obviously I must have misconceived that.

>> Those resentments were related to the patchsets complexity and
>> size.
>
> I didn't see any problem with that in the first place. It is indeed
> a big change, achieving something like that without a big patch set
> is unlikely.
>

Fine, I was just bringing that up again since Stephen Hemminger raised it. My reasoning here is that I just don't want to kick off restarting work with objections still in the minds, since we are already at V7 now.
Re: BUG: fsl FEC ethernet tx checksum offloading doesn't work with RMII interface
On Wed, 28 Oct 2015 10:31:17 -0200 Fabio Estevam wrote:

> On Wed, Oct 28, 2015 at 9:19 AM, David Jander wrote:
>
> > Sorry, I somehow assumed it was obvious I'd report against latest
> > mainline... I'm on 4.3-rc7.
>
> Are you able to find out a previous kernel version that does not
> exhibit this failure?

I can search further down, but 4.1 is also broken. Are there specific changes or versions you are suspicious of?

Russell mentioned something similar being fixed in the past... any pointers to this fix, so I can investigate whether this has any relation?

Best regards,

-- 
David Jander
Protonic Holland.
Re: BUG: fsl FEC ethernet tx checksum offloading doesn't work with RMII interface
On Wed, Oct 28, 2015 at 9:19 AM, David Jander wrote:

> Sorry, I somehow assumed it was obvious I'd report against latest mainline...
> I'm on 4.3-rc7.

Are you able to find out a previous kernel version that does not exhibit this failure?
[v6, 6/6] fsl/fman: Add FMan MAC driver
From: Igal Liberman

This patch adds the Ethernet MAC driver supporting the three different types of MACs: dTSEC, tGEC and mEMAC. Signed-off-by: Igal Liberman --- drivers/net/ethernet/freescale/fman/Makefile |3 +- drivers/net/ethernet/freescale/fman/mac.c| 980 ++ drivers/net/ethernet/freescale/fman/mac.h| 97 +++ 3 files changed, 1079 insertions(+), 1 deletion(-) create mode 100644 drivers/net/ethernet/freescale/fman/mac.c create mode 100644 drivers/net/ethernet/freescale/fman/mac.h diff --git a/drivers/net/ethernet/freescale/fman/Makefile b/drivers/net/ethernet/freescale/fman/Makefile index 2eb0b9b..51fd2e6 100644 --- a/drivers/net/ethernet/freescale/fman/Makefile +++ b/drivers/net/ethernet/freescale/fman/Makefile @@ -1,6 +1,7 @@ subdir-ccflags-y += -I$(srctree)/drivers/net/ethernet/freescale/fman -obj-y += fsl_fman.o fsl_fman_mac.o +obj-y += fsl_fman.o fsl_fman_mac.o fsl_mac.o fsl_fman-objs := fman_muram.o fman.o fman_sp.o fman_port.o fsl_fman_mac-objs := fman_dtsec.o fman_memac.o fman_tgec.o +fsl_mac-objs += mac.o diff --git a/drivers/net/ethernet/freescale/fman/mac.c b/drivers/net/ethernet/freescale/fman/mac.c new file mode 100644 index 000..17a5a5c --- /dev/null +++ b/drivers/net/ethernet/freescale/fman/mac.c @@ -0,0 +1,980 @@ +/* Copyright 2008-2015 Freescale Semiconductor, Inc. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. 
+ * * Neither the name of Freescale Semiconductor nor the + * names of its contributors may be used to endorse or promote products + * derived from this software without specific prior written permission. + * + * + * ALTERNATIVELY, this software may be distributed under the terms of the + * GNU General Public License ("GPL") as published by the Free Software + * Foundation, either version 2 of that License or (at your option) any + * later version. + * + * THIS SOFTWARE IS PROVIDED BY Freescale Semiconductor ``AS IS'' AND ANY + * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED + * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE + * DISCLAIMED. IN NO EVENT SHALL Freescale Semiconductor BE LIABLE FOR ANY + * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES + * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; + * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND + * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS + * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ */ + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "mac.h" +#include "fman_mac.h" +#include "fman_dtsec.h" +#include "fman_tgec.h" +#include "fman_memac.h" + +#define MAC_DESCRIPTION "FSL FMan MAC API based driver" + +MODULE_LICENSE("Dual BSD/GPL"); + +MODULE_AUTHOR("Emil Medve "); + +MODULE_DESCRIPTION(MAC_DESCRIPTION); + +struct mac_priv_s { + struct device *dev; + void __iomem *vaddr; + u8 cell_index; + phy_interface_t phy_if; + struct fman *fman; + struct device_node *phy_node; + /* List of multicast addresses */ + struct list_head mc_addr_list; + struct platform_device *eth_dev; + struct fixed_phy_status *fixed_link; + u16 speed; + u16 max_speed; + + int (*enable)(struct fman_mac *mac_dev, enum comm_mode mode); + int (*disable)(struct fman_mac *mac_dev, enum comm_mode mode); +}; + +struct mac_address { + u8 addr[ETH_ALEN]; + struct list_head list; +}; + +static void mac_exception(void *_mac_dev, enum fman_mac_exceptions ex) +{ + struct mac_device *mac_dev; + struct mac_priv_s *priv; + + mac_dev = (struct mac_device *)_mac_dev; + priv = mac_dev->priv; + + if (ex == FM_MAC_EX_10G_RX_FIFO_OVFL) { + /* don't flag RX FIFO after the first */ + mac_dev->set_exception(mac_dev->fman_mac, +
Re: [RFC PATCH net-next 1/4] perf tools: Enable pre-event inherit setting by config terms
On Wed, Oct 28, 2015 at 10:55:02AM +, Wang Nan wrote:

SNIP

> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> index f820906..397fb4e 100644
> --- a/tools/perf/util/evsel.c
> +++ b/tools/perf/util/evsel.c
> @@ -653,6 +653,15 @@ static void apply_config_terms(struct perf_evsel *evsel,
>  		case PERF_EVSEL__CONFIG_TERM_STACK_USER:
>  			dump_size = term->val.stack_user;
>  			break;
> +		case PERF_EVSEL__CONFIG_TERM_INHERIT:
> +			/*
> +			 * attr->inherit should has already been set by
> +			 * perf_evsel__config. If user explicitly set
> +			 * inherit using config terms, override global
> +			 * opt->no_inherit setting.
> +			 */
> +			attr->inherit = term->val.inherit ? 1 : 0;
> +			break;
>  		default:
>  			break;
>  		}
> diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
> index 9a95e73..e402f83 100644
> --- a/tools/perf/util/evsel.h
> +++ b/tools/perf/util/evsel.h
> @@ -43,6 +43,7 @@ enum {
>  	PERF_EVSEL__CONFIG_TERM_TIME,
>  	PERF_EVSEL__CONFIG_TERM_CALLGRAPH,
>  	PERF_EVSEL__CONFIG_TERM_STACK_USER,
> +	PERF_EVSEL__CONFIG_TERM_INHERIT,
>  	PERF_EVSEL__CONFIG_TERM_MAX,
>  };
>
> @@ -55,6 +56,7 @@ struct perf_evsel_config_term {
>  		bool time;
>  		char *callgraph;
>  		u64 stack_user;
> +		u64 inherit;

seems like bool would be enough

jirka
Re: [RESEND PATCH 07/10] net: wireless: iwlegacy: Remove unneeded variable ret
Hello.

On 10/27/2015 10:02 PM, Punit Vara wrote:

This patch is to the 3945-mac.c file that fixes up following warning by coccicheck:

drivers/net/wireless/iwlegacy/3945-mac.c:247:5-8: Unneeded variable: "ret". Return "- EOPNOTSUPP" on line 249

Return -EOPNOTSUPP directly instead of return using ret

Signed-off-by: Punit Vara
---
 drivers/net/wireless/iwlegacy/3945-mac.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/net/wireless/iwlegacy/3945-mac.c b/drivers/net/wireless/iwlegacy/3945-mac.c
index af1b3e6..ff4dc44 100644
--- a/drivers/net/wireless/iwlegacy/3945-mac.c
+++ b/drivers/net/wireless/iwlegacy/3945-mac.c
@@ -244,9 +244,7 @@ il3945_set_dynamic_key(struct il_priv *il, struct ieee80211_key_conf *keyconf,
 static int
 il3945_remove_static_key(struct il_priv *il)
 {
-	int ret = -EOPNOTSUPP;
-
-	return ret;
+	return -EOPNOTSUPP;
 }

 static int
@@ -529,7 +527,6 @@ il3945_tx_skb(struct il_priv *il,
 		if (unlikely(tid >= MAX_TID_COUNT))
 			goto drop;
 	}
-

Unrelated white space change.

MBR, Sergei
Re: [PATCH] xfrm: dst_entries_init() per-net dst_ops
From: Dan Streetman
Date: Wed, 28 Oct 2015 09:32:47 -0400

> Well I'm not sure why my test kernel booted, while the test robot
> found the bug of GFP_KERNEL percpu counter alloc during atomic
> context. Thanks test robot!

It's because of the kernel config options you (don't) have enabled.
Re: [GIT] Networking
From: Linus Torvalds
Date: Wed, 28 Oct 2015 18:39:56 +0900

> Get rid of it. And I don't *ever* want to see that shit again.

No problem, I'll revert it all.

I asked Hannes to repost his patches to linux-kernel hoping someone would review and say it stunk or not, give him some feedback, or whatever, and nobody reviewed the changes at all...
Re: [PATCH 3/9] powerpc32: checksum_wrappers_64 becomes checksum_wrappers
Hi Scott,

> I wonder why it was 64-bit specific in the first place.

I think it was part of a series where I added my 64bit assembly checksum routines, and I didn't step back and think that the wrapper code would be useful on 32 bit.

Anton
Re: [RESEND PATCH 07/10] net: wireless: iwlegacy: Remove unneeded variable ret
On 10/28/2015 4:35 PM, Sergei Shtylyov wrote:

This patch is to the 3945-mac.c file that fixes up following warning by coccicheck:

drivers/net/wireless/iwlegacy/3945-mac.c:247:5-8: Unneeded variable: "ret". Return "- EOPNOTSUPP" on line 249

Return -EOPNOTSUPP directly instead of return using ret

Signed-off-by: Punit Vara
---
 drivers/net/wireless/iwlegacy/3945-mac.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/net/wireless/iwlegacy/3945-mac.c b/drivers/net/wireless/iwlegacy/3945-mac.c
index af1b3e6..ff4dc44 100644
--- a/drivers/net/wireless/iwlegacy/3945-mac.c
+++ b/drivers/net/wireless/iwlegacy/3945-mac.c
@@ -244,9 +244,7 @@ il3945_set_dynamic_key(struct il_priv *il, struct ieee80211_key_conf *keyconf,
 static int
 il3945_remove_static_key(struct il_priv *il)
 {
-	int ret = -EOPNOTSUPP;
-
-	return ret;
+	return -EOPNOTSUPP;
 }

 static int
@@ -529,7 +527,6 @@ il3945_tx_skb(struct il_priv *il,
 		if (unlikely(tid >= MAX_TID_COUNT))
 			goto drop;
 	}
-

Unrelated white space change.

And I've already complained about it! Please remove this hunk.

MBR, Sergei
[v6, 1/6] fsl/fman: Add FMan MURAM support
From: Igal Liberman

Add Frame Manager Multi-User RAM support. This internal FMan memory block is used by the FMan hardware modules, the management being made through the generic allocator. The FMan Internal memory, for example, is used for allocating transmit and receive FIFOs. Signed-off-by: Igal Liberman --- drivers/net/ethernet/freescale/Kconfig |1 + drivers/net/ethernet/freescale/Makefile |2 + drivers/net/ethernet/freescale/fman/Kconfig |8 ++ drivers/net/ethernet/freescale/fman/Makefile |5 + drivers/net/ethernet/freescale/fman/fman_muram.c | 159 ++ drivers/net/ethernet/freescale/fman/fman_muram.h | 51 +++ 6 files changed, 226 insertions(+) create mode 100644 drivers/net/ethernet/freescale/fman/Kconfig create mode 100644 drivers/net/ethernet/freescale/fman/Makefile create mode 100644 drivers/net/ethernet/freescale/fman/fman_muram.c create mode 100644 drivers/net/ethernet/freescale/fman/fman_muram.h diff --git a/drivers/net/ethernet/freescale/Kconfig b/drivers/net/ethernet/freescale/Kconfig index ff76d4e..f3f89cc 100644 --- a/drivers/net/ethernet/freescale/Kconfig +++ b/drivers/net/ethernet/freescale/Kconfig @@ -53,6 +53,7 @@ config FEC_MPC52xx_MDIO If compiled as module, it will be called fec_mpc52xx_phy. 
source "drivers/net/ethernet/freescale/fs_enet/Kconfig" +source "drivers/net/ethernet/freescale/fman/Kconfig" config FSL_PQ_MDIO tristate "Freescale PQ MDIO" diff --git a/drivers/net/ethernet/freescale/Makefile b/drivers/net/ethernet/freescale/Makefile index 71debd1..4097c58 100644 --- a/drivers/net/ethernet/freescale/Makefile +++ b/drivers/net/ethernet/freescale/Makefile @@ -17,3 +17,5 @@ gianfar_driver-objs := gianfar.o \ gianfar_ethtool.o obj-$(CONFIG_UCC_GETH) += ucc_geth_driver.o ucc_geth_driver-objs := ucc_geth.o ucc_geth_ethtool.o + +obj-$(CONFIG_FSL_FMAN) += fman/ diff --git a/drivers/net/ethernet/freescale/fman/Kconfig b/drivers/net/ethernet/freescale/fman/Kconfig new file mode 100644 index 000..66b7296 --- /dev/null +++ b/drivers/net/ethernet/freescale/fman/Kconfig @@ -0,0 +1,8 @@ +config FSL_FMAN + bool "FMan support" + depends on FSL_SOC || COMPILE_TEST + select GENERIC_ALLOCATOR + default n + help + Freescale Data-Path Acceleration Architecture Frame Manager + (FMan) support diff --git a/drivers/net/ethernet/freescale/fman/Makefile b/drivers/net/ethernet/freescale/fman/Makefile new file mode 100644 index 000..fc2e194 --- /dev/null +++ b/drivers/net/ethernet/freescale/fman/Makefile @@ -0,0 +1,5 @@ +subdir-ccflags-y += -I$(srctree)/drivers/net/ethernet/freescale/fman + +obj-y += fsl_fman.o + +fsl_fman-objs := fman_muram.o diff --git a/drivers/net/ethernet/freescale/fman/fman_muram.c b/drivers/net/ethernet/freescale/fman/fman_muram.c new file mode 100644 index 000..35d4a50 --- /dev/null +++ b/drivers/net/ethernet/freescale/fman/fman_muram.c @@ -0,0 +1,159 @@ +/* + * Copyright 2008-2015 Freescale Semiconductor Inc. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. 
+ * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * * Neither the name of Freescale Semiconductor nor the + * names of its contributors may be used to endorse or promote products + * derived from this software without specific prior written permission. + * + * + * ALTERNATIVELY, this software may be distributed under the terms of the + * GNU General Public License ("GPL") as published by the Free Software + * Foundation, either version 2 of that License or (at your option) any + * later version. + * + * THIS SOFTWARE IS PROVIDED BY Freescale Semiconductor ``AS IS'' AND ANY + * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED + * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE + * DISCLAIMED. IN NO EVENT SHALL Freescale Semiconductor BE LIABLE FOR ANY + * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES + * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; + * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND + * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS + * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +#include "fman_muram.h" + +#include +#include +#include
[v6, 3/6] fsl/fman: Add FMan MAC support
From: Igal Liberman

Add the Data Path Acceleration Architecture Frame Manager MAC support. This patch adds the FMan MAC configuration, initialization and runtime control routines. This patch contains support for these types of MACs:
- dTSEC: Three speed Ethernet controller (10/100/1000 Mbps)
- tGEC: 10G Ethernet controller (10 Gbps)
- mEMAC: Multi-rate Ethernet MAC (10/100/1000/10000 Mbps)

Different FMan revisions have different type and number of MACs.

Signed-off-by: Igal Liberman
--- drivers/net/ethernet/freescale/fman/Makefile |3 +- .../net/ethernet/freescale/fman/crc_mac_addr_ext.h | 314 drivers/net/ethernet/freescale/fman/fman_dtsec.c | 1608 drivers/net/ethernet/freescale/fman/fman_dtsec.h | 59 + drivers/net/ethernet/freescale/fman/fman_mac.h | 276 drivers/net/ethernet/freescale/fman/fman_memac.c | 1307 drivers/net/ethernet/freescale/fman/fman_memac.h | 60 + drivers/net/ethernet/freescale/fman/fman_tgec.c| 798 ++ drivers/net/ethernet/freescale/fman/fman_tgec.h| 55 + 9 files changed, 4479 insertions(+), 1 deletion(-) create mode 100644 drivers/net/ethernet/freescale/fman/crc_mac_addr_ext.h create mode 100644 drivers/net/ethernet/freescale/fman/fman_dtsec.c create mode 100644 drivers/net/ethernet/freescale/fman/fman_dtsec.h create mode 100644 drivers/net/ethernet/freescale/fman/fman_mac.h create mode 100644 drivers/net/ethernet/freescale/fman/fman_memac.c create mode 100644 drivers/net/ethernet/freescale/fman/fman_memac.h create mode 100644 drivers/net/ethernet/freescale/fman/fman_tgec.c create mode 100644 drivers/net/ethernet/freescale/fman/fman_tgec.h diff --git a/drivers/net/ethernet/freescale/fman/Makefile b/drivers/net/ethernet/freescale/fman/Makefile index fb5a7f0..43360d70 100644 --- a/drivers/net/ethernet/freescale/fman/Makefile +++ b/drivers/net/ethernet/freescale/fman/Makefile @@ -1,5 +1,6 @@ subdir-ccflags-y += -I$(srctree)/drivers/net/ethernet/freescale/fman -obj-y += fsl_fman.o +obj-y += fsl_fman.o fsl_fman_mac.o fsl_fman-objs := fman_muram.o fman.o 
+fsl_fman_mac-objs := fman_dtsec.o fman_memac.o fman_tgec.o diff --git a/drivers/net/ethernet/freescale/fman/crc_mac_addr_ext.h b/drivers/net/ethernet/freescale/fman/crc_mac_addr_ext.h new file mode 100644 index 000..92f2e87 --- /dev/null +++ b/drivers/net/ethernet/freescale/fman/crc_mac_addr_ext.h @@ -0,0 +1,314 @@ +/* + * Copyright 2008-2015 Freescale Semiconductor Inc. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * * Neither the name of Freescale Semiconductor nor the + * names of its contributors may be used to endorse or promote products + * derived from this software without specific prior written permission. + * + * + * ALTERNATIVELY, this software may be distributed under the terms of the + * GNU General Public License ("GPL") as published by the Free Software + * Foundation, either version 2 of that License or (at your option) any + * later version. + * + * THIS SOFTWARE IS PROVIDED BY Freescale Semiconductor ``AS IS'' AND ANY + * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED + * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE + * DISCLAIMED. 
IN NO EVENT SHALL Freescale Semiconductor BE LIABLE FOR ANY + * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES + * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; + * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND + * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS + * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +/* Define a macro that calculates the crc value of an Ethernet MAC address + * (48 bit address) + */ + +#ifndef __crc_mac_addr_ext_h +#define __crc_mac_addr_ext_h + +#include + +static u32 crc_table[256] = { + 0x00000000, + 0x77073096, + 0xee0e612c, + 0x990951ba, + 0x076dc419, + 0x706af48f, + 0xe963a535, + 0x9e6495a3, + 0x0edb8832, + 0x79dcb8a4, + 0xe0d5e91e, + 0x97d2d988, + 0x09b64c2b, + 0x7eb17cbd, + 0xe7b82d07, + 0x90bf1d91, + 0x1db71064, + 0x6ab020f2, + 0xf3b97148, +
[v6, 2/6] fsl/fman: Add FMan support
From: Igal Liberman

Add the Data Path Acceleration Architecture Frame Manager Driver. The FMan embeds a series of hardware blocks that implement a group of Ethernet interfaces. This patch adds the FMan configuration, initialization and runtime control routines. The FMan driver supports several hardware versions differentiated by things like:
- Different type of MACs
- Number of MAC and ports
- Available resources
- Different hardware errata

Signed-off-by: Igal Liberman
--- drivers/net/ethernet/freescale/fman/Makefile |2 +- drivers/net/ethernet/freescale/fman/fman.c | 2896 ++ drivers/net/ethernet/freescale/fman/fman.h | 329 +++ 3 files changed, 3226 insertions(+), 1 deletion(-) create mode 100644 drivers/net/ethernet/freescale/fman/fman.c create mode 100644 drivers/net/ethernet/freescale/fman/fman.h diff --git a/drivers/net/ethernet/freescale/fman/Makefile b/drivers/net/ethernet/freescale/fman/Makefile index fc2e194..fb5a7f0 100644 --- a/drivers/net/ethernet/freescale/fman/Makefile +++ b/drivers/net/ethernet/freescale/fman/Makefile @@ -2,4 +2,4 @@ subdir-ccflags-y += -I$(srctree)/drivers/net/ethernet/freescale/fman obj-y += fsl_fman.o -fsl_fman-objs := fman_muram.o +fsl_fman-objs := fman_muram.o fman.o diff --git a/drivers/net/ethernet/freescale/fman/fman.c b/drivers/net/ethernet/freescale/fman/fman.c new file mode 100644 index 000..c8923c68 --- /dev/null +++ b/drivers/net/ethernet/freescale/fman/fman.c @@ -0,0 +1,2896 @@ +/* + * Copyright 2008-2015 Freescale Semiconductor Inc. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. 
+ * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * * Neither the name of Freescale Semiconductor nor the + * names of its contributors may be used to endorse or promote products + * derived from this software without specific prior written permission. + * + * + * ALTERNATIVELY, this software may be distributed under the terms of the + * GNU General Public License ("GPL") as published by the Free Software + * Foundation, either version 2 of that License or (at your option) any + * later version. + * + * THIS SOFTWARE IS PROVIDED BY Freescale Semiconductor ``AS IS'' AND ANY + * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED + * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE + * DISCLAIMED. IN NO EVENT SHALL Freescale Semiconductor BE LIABLE FOR ANY + * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES + * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; + * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND + * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS + * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ */ + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include "fman.h" +#include "fman_muram.h" +#include + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +/* General defines */ +#define FMAN_LIODN_TBL 64 /* size of LIODN table */ +#define MAX_NUM_OF_MACS10 +#define FM_NUM_OF_FMAN_CTRL_EVENT_REGS 4 +#define BASE_RX_PORTID 0x08 +#define BASE_TX_PORTID 0x28 + +/* Modules registers offsets */ +#define BMI_OFFSET 0x0008 +#define QMI_OFFSET 0x00080400 +#define DMA_OFFSET 0x000C2000 +#define FPM_OFFSET 0x000C3000 +#define IMEM_OFFSET0x000C4000 +#define CGP_OFFSET 0x000DB000 + +/* Exceptions bit map */ +#define EX_DMA_BUS_ERROR 0x8000 +#define EX_DMA_READ_ECC0x4000 +#define EX_DMA_SYSTEM_WRITE_ECC0x2000 +#define EX_DMA_FM_WRITE_ECC0x1000 +#define EX_FPM_STALL_ON_TASKS 0x0800 +#define EX_FPM_SINGLE_ECC 0x0400 +#define EX_FPM_DOUBLE_ECC 0x0200 +#define EX_QMI_SINGLE_ECC 0x0100 +#define EX_QMI_DEQ_FROM_UNKNOWN_PORTID 0x0080 +#define EX_QMI_DOUBLE_ECC 0x0040 +#define EX_BMI_LIST_RAM_ECC0x0020 +#define EX_BMI_STORAGE_PROFILE_ECC 0x0010 +#define EX_BMI_STATISTICS_RAM_ECC 0x0008 +#define EX_IRAM_ECC0x0004 +#define EX_MURAM_ECC
Re: [RFC PATCH net-next 2/4] perf tools: Introduce bpf-output event
Hello.

On 10/28/2015 1:55 PM, Wang Nan wrote:

> Commit a43eec304259a6c637f4014a6d4767159b6a3aa3 (bpf: introduce
> bpf_perf_event_output() helper) add a helper to enable BPF program

   You haven't run the patch thru scripts/checkpatch.pl, I guess? It now enforces a certain style of citing a commit.

> output data to perf ring buffer through a new type of perf event
> PERF_COUNT_SW_BPF_OUTPUT. This patch enable perf to create perf
> event of that type. Now perf user can use following cmdline to
> receive output data from BPF programs:
>
>  # perf record -a -e evt=bpf-output/no-inherit/ \
>                   -e ./test_bpf_output.c/maps.bpf-output.event=evt/ ls
>  # perf script
>        perf 12927 [004] 355971.129276: 0 evt=bpf-output/no-inherit/: 811ed5f1 sys_write
>        perf 12927 [004] 355971.129279: 0 evt=bpf-output/no-inherit/: 811ed5f1 sys_write
>  ...
>
> Signed-off-by: Wang Nan
> Cc: Alexei Starovoitov
> Cc: Arnaldo Carvalho de Melo
> Cc: Brendan Gregg
> Cc: David S. Miller
> ---
>  tools/perf/util/evsel.c        | 6 ++
>  tools/perf/util/parse-events.c | 4
>  tools/perf/util/parse-events.l | 1 +
>  3 files changed, 11 insertions(+)
>
> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> index 397fb4e..f01defb 100644
> --- a/tools/perf/util/evsel.c
> +++ b/tools/perf/util/evsel.c
> @@ -224,6 +224,12 @@ struct perf_evsel *perf_evsel__new_idx(struct perf_event_attr *attr, int idx)
>  	if (evsel != NULL)
>  		perf_evsel__init(evsel, attr, idx);
>
> +	if ((evsel->attr.type == PERF_TYPE_SOFTWARE) &&
> +	    (evsel->attr.config == PERF_COUNT_SW_BPF_OUTPUT)) {

   Inner parens not necessary here.

[...]

MBR, Sergei
Re: [PATCH net] amd-xgbe: Fix race between access of desc and desc index
On 10/27/2015 09:50 PM, David Miller wrote:
> From: Tom Lendacky
> Date: Mon, 26 Oct 2015 17:13:54 -0500
>
>> During Tx cleanup it's still possible for the descriptor data to be
>> read ahead of the descriptor index. A memory barrier is required between
>> the read of the descriptor index and the start of the Tx cleanup loop.
>> This allows a change to a lighter-weight barrier in the Tx transmit
>> routine just before updating the current descriptor index.
>>
>> Since the memory barrier does result in extra overhead on arm64, keep
>> the previous change to not chase the current descriptor value. This
>> prevents the execution of the barrier for each loop performed.
>>
>> Suggested-by: Alexander Duyck
>> Signed-off-by: Tom Lendacky
>
> Applied, thanks.

Thanks David. Could you queue this up for the 4.1 and 4.2 stable trees?

Thanks,
Tom
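The ordering described in the commit message can be sketched in userspace C11 as follows. This is only an analogue under stated assumptions, not the amd-xgbe code: `struct ring`, `ring_publish()` and `ring_cleanup()` are hypothetical names, C11 acquire/release atomics stand in for the kernel barrier primitives, and there is no ring-full check. The consumer snapshots the index once before the loop, so the ordering cost is paid once per cleanup pass rather than once per descriptor.

```c
#include <stdatomic.h>
#include <stddef.h>

#define RING_SIZE 8u	/* hypothetical ring size, power of two */

struct desc {
	unsigned int len;	/* payload written by the producer */
};

struct ring {
	struct desc descs[RING_SIZE];
	atomic_uint cur;	/* next slot the producer will fill */
	unsigned int dirty;	/* next slot the consumer will clean */
};

/* Producer: fill the descriptor first, then publish the new index. */
static void ring_publish(struct ring *r, unsigned int len)
{
	unsigned int cur = atomic_load_explicit(&r->cur, memory_order_relaxed);

	r->descs[cur & (RING_SIZE - 1)].len = len;
	/* Release: the descriptor store above becomes visible before the index. */
	atomic_store_explicit(&r->cur, cur + 1, memory_order_release);
}

/* Consumer: read the index once, then walk only the slots it covers. */
static unsigned int ring_cleanup(struct ring *r)
{
	/* Acquire: descriptor reads below cannot be moved ahead of this load. */
	unsigned int cur = atomic_load_explicit(&r->cur, memory_order_acquire);
	unsigned int total = 0;

	/* The index is not re-read ("chased") inside the loop. */
	while (r->dirty != cur) {
		total += r->descs[r->dirty & (RING_SIZE - 1)].len;
		r->dirty++;
	}
	return total;
}
```

Descriptors published after the snapshot are simply picked up by the next cleanup pass.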
Re: BUG: fsl FEC ethernet tx checksum offloading doesn't work with RMII interface
On Wed, Oct 28, 2015 at 10:48:54AM +0100, David Jander wrote:
>
> Hi all,
>
> I was unable to figure out who's maintaining
> drivers/net/ethernet/freescale/fec_main.c, so I hope someone can help out on
> this list...
>
> We have a board with a RMII phy connected to an i.MX6S. The hardware seems to
> be ok, since I can receive and transmit ethernet frames without drops or
> errors. However only simple things like ping and dhcp seemed to work. TCP/IP
> connections could not be made. When looking at both ends with tcpdump, I
> realized that all transmitted packages arrived at the other end with the TCP
> and IP header checksums zeroed-out.
>
> After issuing the following command, TCP/IP started working correctly:
>
> $ ethtool --offload eth0 tx off
>
> This works around the issue. For some reason, when the FEC is in RMII mode, it
> isn't filling in the checksums.
>
> On another board with an RGMII phy the same kernel works fine without the need
> to disable offloading. What can possibly relate this functionality to the
> choice of MAC interface?

You don't mention which kernel version you're using. There has been a bug
here with older kernels...

--
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.
Re: BUG: fsl FEC ethernet tx checksum offloading doesn't work with RMII interface
On Wed, 28 Oct 2015 11:14:14 + Russell King - ARM Linux wrote:

> On Wed, Oct 28, 2015 at 10:48:54AM +0100, David Jander wrote:
> >
> > Hi all,
> >
> > I was unable to figure out who's maintaining
> > drivers/net/ethernet/freescale/fec_main.c, so I hope someone can help out
> > on this list...
> >
> > We have a board with a RMII phy connected to an i.MX6S. The hardware seems
> > to be ok, since I can receive and transmit ethernet frames without drops or
> > errors. However only simple things like ping and dhcp seemed to work.
> > TCP/IP connections could not be made. When looking at both ends with
> > tcpdump, I realized that all transmitted packages arrived at the other end
> > with the TCP and IP header checksums zeroed-out.
> >
> > After issuing the following command, TCP/IP started working correctly:
> >
> > $ ethtool --offload eth0 tx off
> >
> > This works around the issue. For some reason, when the FEC is in RMII
> > mode, it isn't filling in the checksums.
> >
> > On another board with an RGMII phy the same kernel works fine without the
> > need to disable offloading. What can possibly relate this functionality to
> > the choice of MAC interface?
>
> You don't mention which kernel version you're using. There has been a bug
> here with older kernels...

Sorry, I somehow assumed it was obvious I'd report against latest mainline...
I'm on 4.3-rc7.

Best regards,

--
David Jander
Protonic Holland.
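For readers following the symptom in this thread: with Tx checksum offload enabled, the stack leaves the checksum fields for the hardware to fill in, and `ethtool --offload eth0 tx off` moves that work back to software. A minimal sketch of the software side, assuming the standard one's-complement Internet checksum from RFC 1071 (the function name `inet_checksum` is made up for illustration; this is not the FEC driver's or the kernel's code):

```c
#include <stddef.h>
#include <stdint.h>

/*
 * One's-complement sum over the buffer, taken as big-endian 16-bit words
 * and folded back to 16 bits (RFC 1071). Returns the value that belongs
 * in the checksum field when that field is zero in the input.
 */
static uint16_t inet_checksum(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;

	while (len > 1) {
		sum += (uint32_t)data[0] << 8 | data[1];
		data += 2;
		len -= 2;
	}
	if (len)			/* odd trailing byte, padded with zero */
		sum += (uint32_t)data[0] << 8;
	while (sum >> 16)		/* fold carries back into the low 16 bits */
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}
```

A receiver runs the same sum over the header including the checksum field and expects 0. A header that arrives with the field still zeroed generally fails that check, which would match TCP connections failing while the link itself passes frames.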
Re: [Bug 106241] New: shutdown(3)/close(3) behaviour is incorrect for sockets in accept(3)
[Linus and Dave added, Solaris and NetBSD folks dropped from Cc] On Tue, Oct 27, 2015 at 05:13:56PM -0700, Eric Dumazet wrote: > On Tue, 2015-10-27 at 23:17 +, Al Viro wrote: > > > * [Linux-specific aside] our __alloc_fd() can degrade quite badly > > with some use patterns. The cacheline pingpong in the bitmap is probably > > inevitable, unless we accept considerably heavier memory footprint, > > but we also have a case when alloc_fd() takes O(n) and it's _not_ hard > > to trigger - close(3);open(...); will have the next open() after that > > scanning the entire in-use bitmap. I think I see a way to improve it > > without slowing the normal case down, but I'll need to experiment a > > bit before I post patches. Anybody with examples of real-world loads > > that make our descriptor allocator to degrade is very welcome to post > > the reproducers... > > Well, I do have real-world loads, but quite hard to setup in a lab :( > > Note that we also hit the 'struct cred'->usage refcount for every > open()/close()/sock_alloc(), and simply moving uid/gid out of the first > cache line really helps, as current_fsuid() and current_fsgid() no > longer forces a pingpong. > > I moved seldom used fields on the first cache line, so that overall > memory usage did not change (192 bytes on 64 bit arches) [snip] Makes sense, but there's a funny thing about that refcount - the part coming from ->f_cred is the most frequently changed *and* almost all places using ->f_cred are just looking at its fields and do not manipulate its refcount. The only exception (do_process_acct()) is easy to eliminate just by storing a separate reference to the current creds of acct(2) caller and using it instead of looking at ->f_cred. What's more, the place where we grab what will be ->f_cred is guaranteed to have a non-f_cred reference *and* most of the time such a reference is there for dropping ->f_cred (in file_free()/file_free_rcu()). 
With that change in kernel/acct.c done, we could do the following:
	a) split the cred refcount into the normal and percpu parts and
add a spinlock in there.
	b) have put_cred() do this:
		if (atomic_dec_and_test(&cred->usage)) {
			this_cpu_add(&cred->f_cred_usage, 1);
			call_rcu(&cred->rcu, put_f_cred_rcu);
		}
	c) have get_empty_filp() increment current_cred ->f_cred_usage with
this_cpu_add()
	d) have file_free() do
		percpu_counter_dec(&nr_files);
		rcu_read_lock();
		if (likely(atomic_read(&f->f_cred->usage))) {
			this_cpu_add(&f->f_cred->f_cred_usage, -1);
			rcu_read_unlock();
			call_rcu(&f->f_u.fu_rcuhead, file_free_rcu_light);
		} else {
			rcu_read_unlock();
			call_rcu(&f->f_u.fu_rcuhead, file_free_rcu);
		}
file_free_rcu() being
static void file_free_rcu(struct rcu_head *head)
{
	struct file *f = container_of(head, struct file, f_u.fu_rcuhead);
	put_f_cred(&f->f_cred->rcu);
	kmem_cache_free(filp_cachep, f);
}
and file_free_rcu_light() - the same sans put_f_cred();

with put_f_cred() doing
	spin_lock cred->lock
	this_cpu_add(&cred->f_cred_usage, -1);
	find the sum of cred->f_cred_usage
	spin_unlock cred->lock
	if the sum has not reached 0
		return
	current put_cred_rcu(cred)

IOW, let's try to get rid of cross-cpu stores in ->f_cred grabbing and
(most of) ->f_cred dropping.

Note that there are two paths leading to put_f_cred() in the above - via
call_rcu() on &cred->rcu and from file_free_rcu() called via call_rcu() on
&f->f_u.fu_rcuhead. Both are RCU-delayed and they can happen in parallel -
different rcu_head are used.

atomic_read() check in file_free() might give false positives if it comes
just before put_cred() on another CPU kills the last non-f_cred reference.
It's not a problem, since put_f_cred() from that put_cred() won't be
executed until we drop rcu_read_lock(), so we can safely decrement the
cred->f_cred_usage without cred->lock here (and we are guaranteed that we
won't be dropping the last of that - the same put_cred() would've incremented
->f_cred_usage).

Does anybody see problems with that approach?
I'm going to grab some sleep (only a couple of hours so far tonight ;-/), will cook an incremental to Eric's field-reordering patch when I get up...
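The bookkeeping in the proposal above can be modelled in a few lines of single-threaded userspace C. This is a rough sketch under loud assumptions, not kernel code: `struct cred_model` and all function names are hypothetical, the CPU is passed in explicitly instead of using real per-CPU storage, and both the RCU deferral and cred->lock are left out, so it only illustrates the counting, not the concurrency argument.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define NR_CPUS 4	/* hypothetical CPU count for the model */

struct cred_model {
	atomic_int usage;		/* ordinary (non-file) references */
	long f_cred_usage[NR_CPUS];	/* per-CPU deltas for ->f_cred references */
};

/* get_empty_filp() side: take an f_cred reference on the local CPU. */
static void f_cred_get(struct cred_model *c, int cpu)
{
	c->f_cred_usage[cpu] += 1;
}

/* put_f_cred(): drop one reference, sum all CPUs; true means "free it". */
static bool f_cred_put_slow(struct cred_model *c, int cpu)
{
	long sum = 0;

	c->f_cred_usage[cpu] -= 1;
	for (int i = 0; i < NR_CPUS; i++)
		sum += c->f_cred_usage[i];
	return sum == 0;
}

/* put_cred(): the last ordinary reference is converted into one f_cred
 * reference, which is then dropped via the slow path (RCU-delayed in the
 * proposal, immediate in this model).
 */
static bool cred_put(struct cred_model *c, int cpu)
{
	if (atomic_fetch_sub(&c->usage, 1) == 1) {
		c->f_cred_usage[cpu] += 1;
		return f_cred_put_slow(c, cpu);
	}
	return false;
}

/* file_free(): local decrement while ordinary references remain,
 * otherwise the summing slow path.
 */
static bool file_free_model(struct cred_model *c, int cpu)
{
	if (atomic_load(&c->usage) != 0) {
		c->f_cred_usage[cpu] -= 1;
		return false;
	}
	return f_cred_put_slow(c, cpu);
}
```

In the model, a file opened on one CPU and freed on another leaves +1 and -1 in two different slots. Only the slow path ever looks at the sum, which is the point of the scheme: the common get and put touch nothing but local counters.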
[v6, 0/6] Freescale DPAA FMan
From: Igal Liberman

The Freescale Data Path Acceleration Architecture (DPAA) is a set of hardware components on specific QorIQ multicore processors. This architecture provides the infrastructure to support simplified sharing of networking interfaces and accelerators by multiple CPU cores and the accelerators.

One of the DPAA accelerators is the Frame Manager (FMan) which contains a series of hardware blocks: ports, Ethernet MACs, a multi user RAM (MURAM) and Storage Profile (SP).

This patch set introduces the FMan drivers. Each driver configures and initializes the corresponding FMan hardware module (described above). The MAC driver offers support for three different types of MACs (eTSEC, TGEC, MEMAC).

v5 --> v6:
- Addressed feedback from Scott:
- Moved kernel doc to source files
- Removed a series of configurable settings
- Miscellaneous code updates

v4 --> v5:
- Addressed feedback from David Miller:
- Removed driver layering
- Reduce namespace pollution
- Reduce code complexity and size

v3 --> v4:
- Remove device_initcall call in driver registration (redundant)
- Remove hot/cold labels
- Minor update in FMan Clock read from device-tree
- Update fixed-link support
- Addressed feedback from Stephen Hemminger
- Remove bogus blank line

v2 --> v3:
- Addressed feedback from Scott:
- Remove typedefs
- Remove unnecessary memory barriers
- Remove unnecessary casting
- Remove KConfig options
- Remove early_params
- Remove Hungarian notation
- Remove __packed__ attribute and padding from structures
- Remove unlikely attribute (where it's not needed)
- Use proper error codes and remove unnecessary prints
- Use proper values for sleep routines
- Replace complex Macros with functions
- Improve device tree processing code
- Use symbolic defines
- Add time-out in busy-wait loops
- Removed exit code (loadable module support will be added later)
- Fixed "fixed-link" issue raised by Joakim Tjernlund

v1 --> v2:
- Addressed feedback from Paul Bolle:
- General feedback of FMan Driver layer
- Remove Errata defines
- Aligned comments to Kernel Doc
- Remove Loadable Module support (not yet supported)
- Removed not needed KConfig dependencies
- Addressed feedback from Scott Wood
- Use Kernel ioread/iowrite services
- Squash FLIB source and header patches together

This submission is based on the prior Freescale DPAA FMan V3,RFC submission. Several issues addressed in this submission:
- Reduced MAC layering and complexity
- Reduced code base
- T1024/T2080 10G best effort support

Igal Liberman (6):
  fsl/fman: Add FMan MURAM support
  fsl/fman: Add FMan support
  fsl/fman: Add FMan MAC support
  fsl/fman: Add FMan SP support
  fsl/fman: Add FMan Port Support
  fsl/fman: Add FMan MAC driver

 drivers/net/ethernet/freescale/Kconfig | 1 +
 drivers/net/ethernet/freescale/Makefile | 2 +
 drivers/net/ethernet/freescale/fman/Kconfig | 8 +
 drivers/net/ethernet/freescale/fman/Makefile | 7 +
 .../net/ethernet/freescale/fman/crc_mac_addr_ext.h | 314 +++
 drivers/net/ethernet/freescale/fman/fman.c | 2896
 drivers/net/ethernet/freescale/fman/fman.h | 329 +++
 drivers/net/ethernet/freescale/fman/fman_dtsec.c | 1608 +++
 drivers/net/ethernet/freescale/fman/fman_dtsec.h | 59 +
 drivers/net/ethernet/freescale/fman/fman_mac.h | 276 ++
 drivers/net/ethernet/freescale/fman/fman_memac.c | 1307 +
 drivers/net/ethernet/freescale/fman/fman_memac.h | 60 +
 drivers/net/ethernet/freescale/fman/fman_muram.c | 159 ++
 drivers/net/ethernet/freescale/fman/fman_muram.h | 51 +
 drivers/net/ethernet/freescale/fman/fman_port.c | 1800
 drivers/net/ethernet/freescale/fman/fman_port.h | 151 +
 drivers/net/ethernet/freescale/fman/fman_sp.c | 167 ++
 drivers/net/ethernet/freescale/fman/fman_sp.h | 103 +
 drivers/net/ethernet/freescale/fman/fman_tgec.c | 798 ++
 drivers/net/ethernet/freescale/fman/fman_tgec.h | 55 +
 drivers/net/ethernet/freescale/fman/mac.c | 980 +++
 drivers/net/ethernet/freescale/fman/mac.h | 97 +
 22 files changed, 11228 insertions(+)
 create mode 100644 drivers/net/ethernet/freescale/fman/Kconfig
 create mode 100644 drivers/net/ethernet/freescale/fman/Makefile
 create mode 100644 drivers/net/ethernet/freescale/fman/crc_mac_addr_ext.h
 create mode
[v6, 5/6] fsl/fman: Add FMan Port Support
From: Igal Liberman

Add the Data Path Acceleration Architecture Frame Manager Port Driver. The FMan driver uses a module called "Port" to represent the physical TX and RX ports. Each FMan version has different number of physical ports. This patch adds the FMan Port configuration, initialization and runtime control routines for both TX and RX.

Signed-off-by: Igal Liberman
--- drivers/net/ethernet/freescale/fman/Makefile|2 +- drivers/net/ethernet/freescale/fman/fman_port.c | 1800 +++ drivers/net/ethernet/freescale/fman/fman_port.h | 151 ++ 3 files changed, 1952 insertions(+), 1 deletion(-) create mode 100644 drivers/net/ethernet/freescale/fman/fman_port.c create mode 100644 drivers/net/ethernet/freescale/fman/fman_port.h diff --git a/drivers/net/ethernet/freescale/fman/Makefile b/drivers/net/ethernet/freescale/fman/Makefile index 5141532..2eb0b9b 100644 --- a/drivers/net/ethernet/freescale/fman/Makefile +++ b/drivers/net/ethernet/freescale/fman/Makefile @@ -2,5 +2,5 @@ subdir-ccflags-y += -I$(srctree)/drivers/net/ethernet/freescale/fman obj-y += fsl_fman.o fsl_fman_mac.o -fsl_fman-objs := fman_muram.o fman.o fman_sp.o +fsl_fman-objs := fman_muram.o fman.o fman_sp.o fman_port.o fsl_fman_mac-objs := fman_dtsec.o fman_memac.o fman_tgec.o diff --git a/drivers/net/ethernet/freescale/fman/fman_port.c b/drivers/net/ethernet/freescale/fman/fman_port.c new file mode 100644 index 000..462f83d --- /dev/null +++ b/drivers/net/ethernet/freescale/fman/fman_port.c @@ -0,0 +1,1800 @@ +/* + * Copyright 2008 - 2015 Freescale Semiconductor Inc. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. 
+ * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * * Neither the name of Freescale Semiconductor nor the + * names of its contributors may be used to endorse or promote products + * derived from this software without specific prior written permission. + * + * + * ALTERNATIVELY, this software may be distributed under the terms of the + * GNU General Public License ("GPL") as published by the Free Software + * Foundation, either version 2 of that License or (at your option) any + * later version. + * + * THIS SOFTWARE IS PROVIDED BY Freescale Semiconductor ``AS IS'' AND ANY + * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED + * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE + * DISCLAIMED. IN NO EVENT SHALL Freescale Semiconductor BE LIABLE FOR ANY + * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES + * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; + * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND + * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS + * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ */ + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include "fman_port.h" +#include "fman.h" +#include "fman_sp.h" + +#include +#include +#include +#include +#include +#include +#include + +/* Queue ID */ +#define DFLT_FQ_ID 0x00FF + +/* General defines */ +#define PORT_BMI_FIFO_UNITS0x100 + +#define MAX_PORT_FIFO_SIZE(bmi_max_fifo_size) \ + min((u32)bmi_max_fifo_size, (u32)1024 * FMAN_BMI_FIFO_UNITS) + +#define PORT_CG_MAP_NUM8 +#define PORT_PRS_RESULT_WORDS_NUM 8 +#define PORT_IC_OFFSET_UNITS 0x10 + +#define MIN_EXT_BUF_SIZE 64 + +#define BMI_PORT_REGS_OFFSET 0 +#define QMI_PORT_REGS_OFFSET 0x400 + +/* Default values */ +#define DFLT_PORT_BUFFER_PREFIX_CONTEXT_DATA_ALIGN \ + DFLT_FM_SP_BUFFER_PREFIX_CONTEXT_DATA_ALIGN + +#define DFLT_PORT_CUT_BYTES_FROM_END 4 + +#define DFLT_PORT_ERRORS_TO_DISCARDFM_PORT_FRM_ERR_CLS_DISCARD +#define DFLT_PORT_MAX_FRAME_LENGTH 9600 + +#define DFLT_PORT_RX_FIFO_PRI_ELEVATION_LEV(bmi_max_fifo_size) \ + MAX_PORT_FIFO_SIZE(bmi_max_fifo_size) + +#define DFLT_PORT_RX_FIFO_THRESHOLD(major, bmi_max_fifo_size) \ + (major == 6 ? \ + MAX_PORT_FIFO_SIZE(bmi_max_fifo_size) : \ + (MAX_PORT_FIFO_SIZE(bmi_max_fifo_size) * 3 / 4))\ + +#define DFLT_PORT_EXTRA_NUM_OF_FIFO_BUFS 0 + +/* QMI defines */ +#define
Re: [RFC PATCH net-next 1/4] perf tools: Enable pre-event inherit setting by config terms
On Wed, Oct 28, 2015 at 02:21:26PM +0100, Jiri Olsa wrote:
> On Wed, Oct 28, 2015 at 10:55:02AM +, Wang Nan wrote:
> > @@ -55,6 +56,7 @@ struct perf_evsel_config_term {
> > 		bool	time;
> > 		char	*callgraph;
> > 		u64	stack_user;
> > +		u64	inherit;
>
> seems like bool would be enough

Ok, will change this and move it to a more suitable place member alignment wise.

Can I, with this change, slap an Acked-by: jirka?

- Arnaldo
Re: [PATCH] xfrm: dst_entries_init() per-net dst_ops
Hello,

On Wed, Oct 28, 2015, at 14:32, Dan Streetman wrote:
> On Tue, Oct 27, 2015 at 12:15 PM, wrote:
> > From: Dan Streetman
> >
> > The ipv4 and ipv6 xfrms each create a template dst_ops object, and
> > perform dst_entries_init() on the template objects. Then each net
> > namespace has its net.xfrm.xfrm[46]_dst_ops field set to the template
> > values. The problem with that is the dst_ops.pcpuc_entries field is
> > a percpu counter and cannot be used correctly by simply copying it to
> > another object.

How hard would it be to split off the counters from the dst_ops struct?
We could make dst_ops instances const and have normal pointers to them
and keep the dst_entries as a small array in net namespace?

Bye,
Hannes
Re: [RFC PATCH net-next 1/4] perf tools: Enable pre-event inherit setting by config terms
On Wed, Oct 28, 2015 at 10:42:13AM -0300, Arnaldo Carvalho de Melo wrote:
> On Wed, Oct 28, 2015 at 02:21:26PM +0100, Jiri Olsa wrote:
> > On Wed, Oct 28, 2015 at 10:55:02AM +, Wang Nan wrote:
> > > @@ -55,6 +56,7 @@ struct perf_evsel_config_term {
> > >  		bool	time;
> > >  		char	*callgraph;
> > >  		u64	stack_user;
> > > +		u64	inherit;
> >
> > seems like bool would be enough
>
> Ok, will change this and move it to a more suitable place member
> alignment wise.

Nah, switched it to bool, but no need to move it around, that is a
union...

> Can I, with this change, slap an Acked-by: jirka?
>
> - Arnaldo
Re: [GIT] Networking
On Wed, Oct 28 2015, Hannes Frederic Sowa wrote:

> Hi Linus,
>
> On Wed, Oct 28, 2015, at 10:39, Linus Torvalds wrote:
>> Get rid of it. And I don't *ever* want to see that shit again.
>
> I don't want to give up on that this easily:
>
> In future I would like to see an interface like this. It is often hard
> to do correct overflow/wrap-around tests and it would be great if there
> are helper functions which could easily and without a lot of thinking be
> used by people to remove those problems from the kernel.

I agree - proper overflow checking can be really hard. Quick, assuming a
and b have the same unsigned integer type, is 'a+b
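For reference, the kind of wrap-around test under discussion can be sketched in plain userspace C as below. This is only an illustration: the helper names are made up for this note and are not the overflow_usub() interface mentioned in the thread, and the sketch assumes fixed-width 32-bit unsigned operands.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers, named for illustration only. */

/* Store a + b in *sum and return true if the addition wrapped. */
static bool sketch_uadd_wraps(uint32_t a, uint32_t b, uint32_t *sum)
{
	*sum = a + b;		/* unsigned arithmetic wraps modulo 2^32 */
	return *sum < a;	/* wrapped iff the result is below an operand */
}

/* Store a - b in *diff and return true if the subtraction wrapped. */
static bool sketch_usub_wraps(uint32_t a, uint32_t b, uint32_t *diff)
{
	*diff = a - b;
	return b > a;		/* wrapped iff we subtracted more than we had */
}
```

A caller would check the return value before trusting the result, which is the same shape as the "mtu < hlen + ..." guard added in the ip6_fragment patch earlier on the list.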
Re: [PATCH] xfrm: dst_entries_init() per-net dst_ops
On Wed, Oct 28, 2015 at 9:42 AM, Hannes Frederic Sowa wrote:
> Hello,
>
> On Wed, Oct 28, 2015, at 14:32, Dan Streetman wrote:
>> On Tue, Oct 27, 2015 at 12:15 PM, wrote:
>> > From: Dan Streetman
>> >
>> > The ipv4 and ipv6 xfrms each create a template dst_ops object, and
>> > perform dst_entries_init() on the template objects. Then each net
>> > namespace has its net.xfrm.xfrm[46]_dst_ops field set to the template
>> > values. The problem with that is the dst_ops.pcpuc_entries field is
>> > a percpu counter and cannot be used correctly by simply copying it to
>> > another object.
>
> How hard would it be to split off the counters from the dst_ops struct?
> We could make dst_ops instances const and have normal pointers to them
> and keep the dst_entries as a small array in net namespace?

Well, the dst_ops->pcpuc_entries counter is used in dst.c which just
gets a struct dst_ops *, so it doesn't have access to the owning net
namespace. And, not all dst_ops users have a per-net-namespace dst_ops;
ipv4/route.c for example uses a global "ipv4_dst_ops" object. So it
probably does need to stay owned by dst_ops.

>
> Bye,
> Hannes
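To illustrate why a plain struct copy of something holding a percpu counter goes wrong, here is a small userspace model. It is an assumption-laden sketch, not kernel code: every sketch_* name is hypothetical, and the "per-CPU" storage is just a heap array. It shows only one failure mode (the copy aliases the template's storage); the real percpu_counter has further internal state that a copy would also duplicate incorrectly.

```c
#include <stdlib.h>

#define SKETCH_NR_CPUS 4

/* Hypothetical stand-in for a percpu counter: one heap slot per CPU. */
struct sketch_pcpu_counter {
	long *slots;
};

/* Hypothetical stand-in for dst_ops, reduced to its counter. */
struct sketch_dst_ops {
	struct sketch_pcpu_counter entries;
};

/* Allocate zeroed per-CPU slots; returns 0 on success, -1 on failure. */
static int sketch_counter_init(struct sketch_pcpu_counter *c)
{
	c->slots = calloc(SKETCH_NR_CPUS, sizeof(*c->slots));
	return c->slots ? 0 : -1;
}

static void sketch_counter_add(struct sketch_pcpu_counter *c, int cpu, long n)
{
	c->slots[cpu] += n;
}

/* Sum all per-CPU slots. */
static long sketch_counter_sum(const struct sketch_pcpu_counter *c)
{
	long sum = 0;

	for (int i = 0; i < SKETCH_NR_CPUS; i++)
		sum += c->slots[i];
	return sum;
}
```

In this model, assigning an initialised template to a per-namespace object copies only the pointer, so both objects count into the same slots; giving each object its own sketch_counter_init() call keeps them independent, which is the direction the patch title ("dst_entries_init() per-net dst_ops") points in.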
[PATCH net-next 3/4] sfc: Use cpu_to_mem() to support memoryless nodes
From: Bert Kenward

With CONFIG_HAVE_MEMORYLESS_NODES cpu_to_node() may return nodes without
memory, which is not a good choice when later using that to allocate
memory. cpu_to_mem() instead provides the most appropriate NUMA node to
allocate from.

Signed-off-by: Shradha Shah
---
 drivers/net/ethernet/sfc/efx.c        | 2 +-
 drivers/net/ethernet/sfc/net_driver.h | 4 ++--
 drivers/net/ethernet/sfc/rx.c         | 3 ++-
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/sfc/efx.c b/drivers/net/ethernet/sfc/efx.c
index 89fbd03..84f9e90 100644
--- a/drivers/net/ethernet/sfc/efx.c
+++ b/drivers/net/ethernet/sfc/efx.c
@@ -445,7 +445,7 @@ efx_alloc_channel(struct efx_nic *efx, int i, struct efx_channel *old_channel)
 	channel->efx = efx;
 	channel->channel = i;
 	channel->type = &efx_default_channel_type;
-	channel->irq_node = NUMA_NO_NODE;
+	channel->irq_mem_node = NUMA_NO_NODE;
 
 	for (j = 0; j < EFX_TXQ_TYPES; j++) {
 		tx_queue = &channel->tx_queue[j];
diff --git a/drivers/net/ethernet/sfc/net_driver.h b/drivers/net/ethernet/sfc/net_driver.h
index 0ab9080a..bab6cc0 100644
--- a/drivers/net/ethernet/sfc/net_driver.h
+++ b/drivers/net/ethernet/sfc/net_driver.h
@@ -419,7 +419,7 @@ enum efx_sync_events_state {
  * @sync_events_state: Current state of sync events on this channel
  * @sync_timestamp_major: Major part of the last ptp sync event
  * @sync_timestamp_minor: Minor part of the last ptp sync event
- * @irq_node: NUMA node of interrupt
+ * @irq_mem_node: Memory NUMA node of interrupt
  */
 struct efx_channel {
 	struct efx_nic *efx;
@@ -479,7 +479,7 @@ struct efx_channel {
 	u32 sync_timestamp_major;
 	u32 sync_timestamp_minor;
 
-	int irq_node;
+	int irq_mem_node;
 };
 
 #ifdef CONFIG_NET_RX_BUSY_POLL
diff --git a/drivers/net/ethernet/sfc/rx.c b/drivers/net/ethernet/sfc/rx.c
index c5ef1e8..095d1af 100644
--- a/drivers/net/ethernet/sfc/rx.c
+++ b/drivers/net/ethernet/sfc/rx.c
@@ -171,7 +171,8 @@ static int efx_init_rx_buffers(struct efx_rx_queue *rx_queue, bool atomic)
 			struct efx_channel *channel;
 
 			channel = efx_rx_queue_channel(rx_queue);
-			page = alloc_pages_node(channel->irq_node, __GFP_COMP |
+			page = alloc_pages_node(channel->irq_mem_node,
+						__GFP_COMP |
 						(atomic ?
 						 (GFP_ATOMIC | __GFP_NOWARN)
 						 : GFP_KERNEL),
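The difference between the two lookups can be modelled roughly as follows. This is a hedged userspace sketch: sketch_node_to_mem(), the has_memory table and the distance table are all hypothetical, and the real cpu_to_mem() relies on the kernel's own fallback ordering rather than this simple nearest-node search.

```c
#define SKETCH_NR_NODES 3

/*
 * Hypothetical model: return the node itself if it has memory, otherwise
 * the closest node (by the given distance table) that does, or -1 if no
 * node has memory at all.
 */
static int sketch_node_to_mem(int node,
			      const int has_memory[SKETCH_NR_NODES],
			      const int distance[SKETCH_NR_NODES][SKETCH_NR_NODES])
{
	int best = -1;

	if (has_memory[node])
		return node;

	for (int n = 0; n < SKETCH_NR_NODES; n++) {
		if (!has_memory[n])
			continue;	/* skip memoryless candidates */
		if (best < 0 || distance[node][n] < distance[node][best])
			best = n;
	}
	return best;
}
```

With a memoryless node 1 sitting closer to node 0 than to node 2, the model returns node 0 for a CPU on node 1, which is the kind of answer an allocation call needs.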
[PATCH net-next 2/4] sfc: allocate rx pages on the same node as the interrupt
From: Daniel Pieczko

When the interrupt servicing a channel is on a NUMA node that is
not local to the device, performance is improved by allocating
rx pages on the node local to the interrupt (remote to the device).

The performance-optimal case, where interrupts and applications
are pinned to CPUs on the same node as the device, is not altered
by this change.

This change gave a 1% improvement in transaction rate using Nginx
with all interrupts and Nginx threads on the node remote to the
device. It also gave a small reduction in round-trip latency,
again with the interrupt and application on a different node to
the device.

Allocating rx pages based on the channel->irq_node value is only
valid for the initial driver-load interrupt affinities; if an
interrupt is moved later, the wrong node may be used for the
allocation.

Signed-off-by: Shradha Shah
---
 drivers/net/ethernet/sfc/efx.c        |  1 +
 drivers/net/ethernet/sfc/net_driver.h |  3 +++
 drivers/net/ethernet/sfc/rx.c         | 14 +-
 3 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/sfc/efx.c b/drivers/net/ethernet/sfc/efx.c
index 974637d..89fbd03 100644
--- a/drivers/net/ethernet/sfc/efx.c
+++ b/drivers/net/ethernet/sfc/efx.c
@@ -445,6 +445,7 @@ efx_alloc_channel(struct efx_nic *efx, int i, struct efx_channel *old_channel)
 	channel->efx = efx;
 	channel->channel = i;
 	channel->type = &efx_default_channel_type;
+	channel->irq_node = NUMA_NO_NODE;
 
 	for (j = 0; j < EFX_TXQ_TYPES; j++) {
 		tx_queue = &channel->tx_queue[j];
diff --git a/drivers/net/ethernet/sfc/net_driver.h b/drivers/net/ethernet/sfc/net_driver.h
index ad56231..0ab9080a 100644
--- a/drivers/net/ethernet/sfc/net_driver.h
+++ b/drivers/net/ethernet/sfc/net_driver.h
@@ -419,6 +419,7 @@ enum efx_sync_events_state {
  * @sync_events_state: Current state of sync events on this channel
  * @sync_timestamp_major: Major part of the last ptp sync event
  * @sync_timestamp_minor: Minor part of the last ptp sync event
+ * @irq_node: NUMA node of interrupt
  */
 struct efx_channel {
 	struct efx_nic *efx;
@@ -477,6 +478,8 @@ struct efx_channel {
 	enum efx_sync_events_state sync_events_state;
 	u32 sync_timestamp_major;
 	u32 sync_timestamp_minor;
+
+	int irq_node;
 };
 
 #ifdef CONFIG_NET_RX_BUSY_POLL
diff --git a/drivers/net/ethernet/sfc/rx.c b/drivers/net/ethernet/sfc/rx.c
index 3f0e129..c5ef1e8 100644
--- a/drivers/net/ethernet/sfc/rx.c
+++ b/drivers/net/ethernet/sfc/rx.c
@@ -168,11 +168,15 @@ static int efx_init_rx_buffers(struct efx_rx_queue *rx_queue, bool atomic)
 			 * context in such a case.  So, use __GFP_NO_WARN
 			 * in case of atomic.
 			 */
-			page = alloc_pages(__GFP_COLD | __GFP_COMP |
-					   (atomic ?
-					    (GFP_ATOMIC | __GFP_NOWARN)
-					    : GFP_KERNEL),
-					   efx->rx_buffer_order);
+			struct efx_channel *channel;
+
+			channel = efx_rx_queue_channel(rx_queue);
+			page = alloc_pages_node(channel->irq_node, __GFP_COMP |
+						(atomic ?
+						 (GFP_ATOMIC | __GFP_NOWARN)
+						 : GFP_KERNEL),
+						efx->rx_buffer_order);
+
 			if (unlikely(page == NULL))
 				return -ENOMEM;
 			dma_addr =
Re: [PATCH 6/6] net: phy: Stop 'phy-state-machine' and 'phy_change' work on remove
> > void phy_disconnect(struct phy_device *phydev)
> > {
> > 	if (phydev->irq > 0)
> > 		phy_stop_interrupts(phydev);
> >
> > 	phy_stop_machine(phydev);
> >
> > 	phydev->adjust_link = NULL;
> >
> > 	phy_detach(phydev);
> > }
>
> And this does not yet get called. It probably needs to be in
> dsa_switch_destroy() just before unregister_netdev() of the slave
> devices.
>
> However, the ordering in dsa_switch_destroy() looks wrong. The fixed
> phys are destroyed before the slave devices. They should probably be
> destroyed after the slave devices, or at least after the
> phy_disconnect() is called.
>
> Andrew
>

Andrew, Florian,

Thanks for the review, a call to phy_disconnect was missing in
dsa_switch_destroy.

I will post a new patchset with the correct fix, a switch to
delayed_work and a separate dsa_slave_destroy function for sake of
maintenance ease.

Neil
[RFC PATCH v2 2/4] net: dsa: bcm_sf2: cleanup resources in remove callback
Implement a remove callback allowing the switch driver to cleanup
resources it used: interrupts and remapped register ranges.

Signed-off-by: Florian Fainelli
Signed-off-by: Neil Armstrong
---
 drivers/net/dsa/bcm_sf2.c | 20
 1 file changed, 20 insertions(+)

diff --git a/drivers/net/dsa/bcm_sf2.c b/drivers/net/dsa/bcm_sf2.c
index 6f946fe..e0be318 100644
--- a/drivers/net/dsa/bcm_sf2.c
+++ b/drivers/net/dsa/bcm_sf2.c
@@ -1054,6 +1054,25 @@ out_unmap:
 	return ret;
 }
 
+static void bcm_sf2_sw_remove(struct dsa_switch *ds)
+{
+	struct bcm_sf2_priv *priv = ds_to_priv(ds);
+	void __iomem **base;
+	unsigned int i;
+
+	/* Disable all interrupts and free them */
+	bcm_sf2_intr_disable(priv);
+
+	free_irq(priv->irq0, priv);
+	free_irq(priv->irq1, priv);
+
+	base = &priv->core;
+	for (i = 0; i < BCM_SF2_REGS_NUM; i++) {
+		iounmap(*base);
+		base++;
+	}
+}
+
 static int bcm_sf2_sw_set_addr(struct dsa_switch *ds, u8 *addr)
 {
 	return 0;
@@ -1367,6 +1386,7 @@ static struct dsa_switch_driver bcm_sf2_switch_driver = {
 	.tag_protocol		= DSA_TAG_PROTO_BRCM,
 	.priv_size		= sizeof(struct bcm_sf2_priv),
 	.probe			= bcm_sf2_sw_probe,
+	.remove			= bcm_sf2_sw_remove,
 	.setup			= bcm_sf2_sw_setup,
 	.set_addr		= bcm_sf2_sw_set_addr,
 	.get_phy_flags		= bcm_sf2_sw_get_phy_flags,
-- 
1.9.1
[RFC PATCH v2 3/4] net: dsa: mv88e6xxx: add common and ppu remove function
With the previously introduced remove callback, add a
mv88e6xxx common remove function to cleanup all resources.

Signed-off-by: Neil Armstrong
---
 drivers/net/dsa/mv88e6xxx.c | 18 ++
 drivers/net/dsa/mv88e6xxx.h |  2 ++
 2 files changed, 20 insertions(+)

diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
index b1b14f5..6287096 100644
--- a/drivers/net/dsa/mv88e6xxx.c
+++ b/drivers/net/dsa/mv88e6xxx.c
@@ -331,6 +331,16 @@ void mv88e6xxx_ppu_state_init(struct dsa_switch *ds)
 	ps->ppu_timer.function = mv88e6xxx_ppu_reenable_timer;
 }
 
+void mv88e6xxx_ppu_state_remove(struct dsa_switch *ds)
+{
+	struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
+
+	del_timer_sync(&ps->ppu_timer);
+
+	cancel_work_sync(&ps->bridge_work);
+	flush_work(&ps->bridge_work);
+}
+
 int mv88e6xxx_phy_read_ppu(struct dsa_switch *ds, int addr, int regnum)
 {
 	int ret;
@@ -2083,6 +2093,14 @@ int mv88e6xxx_setup_common(struct dsa_switch *ds)
 	return 0;
 }
 
+void mv88e6xxx_remove_common(struct dsa_switch *ds)
+{
+	struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
+
+	cancel_work_sync(&ps->bridge_work);
+	flush_work(&ps->bridge_work);
+}
+
 int mv88e6xxx_setup_global(struct dsa_switch *ds)
 {
 	struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
diff --git a/drivers/net/dsa/mv88e6xxx.h b/drivers/net/dsa/mv88e6xxx.h
index 6f9ed5d..64d37a0 100644
--- a/drivers/net/dsa/mv88e6xxx.h
+++ b/drivers/net/dsa/mv88e6xxx.h
@@ -417,6 +417,7 @@ struct mv88e6xxx_hw_stat {
 int mv88e6xxx_switch_reset(struct dsa_switch *ds, bool ppu_active);
 int mv88e6xxx_setup_ports(struct dsa_switch *ds);
 int mv88e6xxx_setup_common(struct dsa_switch *ds);
+void mv88e6xxx_remove_common(struct dsa_switch *ds);
 int mv88e6xxx_setup_global(struct dsa_switch *ds);
 int __mv88e6xxx_reg_read(struct mii_bus *bus, int sw_addr, int addr, int reg);
 int mv88e6xxx_reg_read(struct dsa_switch *ds, int addr, int reg);
@@ -431,6 +432,7 @@ int mv88e6xxx_phy_read_indirect(struct dsa_switch *ds, int port, int regnum);
 int mv88e6xxx_phy_write_indirect(struct dsa_switch *ds, int port, int regnum,
 				 u16 val);
 void mv88e6xxx_ppu_state_init(struct dsa_switch *ds);
+void mv88e6xxx_ppu_state_remove(struct dsa_switch *ds);
 int mv88e6xxx_phy_read_ppu(struct dsa_switch *ds, int addr, int regnum);
 int mv88e6xxx_phy_write_ppu(struct dsa_switch *ds, int addr,
 			    int regnum, u16 val);
-- 
1.9.1
[RFC PATCH v2 1/4] net: dsa: allow switch drivers to cleanup their resources
Some switch drivers might request interrupts, remap register ranges,
allow such drivers to implement a "remove" callback doing just that.

Signed-off-by: Florian Fainelli
Signed-off-by: Neil Armstrong
---
 include/net/dsa.h | 1 +
 net/dsa/dsa.c     | 4
 2 files changed, 5 insertions(+)

diff --git a/include/net/dsa.h b/include/net/dsa.h
index 98ccbde..0e1502c 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -212,6 +212,7 @@ struct dsa_switch_driver {
 	 */
 	char	*(*probe)(struct device *host_dev, int sw_addr);
 	int	(*setup)(struct dsa_switch *ds);
+	void	(*remove)(struct dsa_switch *ds);
 	int	(*set_addr)(struct dsa_switch *ds, u8 *addr);
 	u32	(*get_phy_flags)(struct dsa_switch *ds, int port);
 
diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index 1eba07f..f462fc5 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -459,6 +459,10 @@ static void dsa_switch_destroy(struct dsa_switch *ds)
 	}
 
 	mdiobus_unregister(ds->slave_mii_bus);
+
+	/* Leave a chance to the driver to cleanup */
+	if (ds->drv->remove)
+		ds->drv->remove(ds);
 }
 
 #ifdef CONFIG_PM_SLEEP
-- 
1.9.1
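The optional-callback pattern this patch relies on can be sketched in userspace C as below. All sketch_* types and functions are hypothetical stand-ins and not the DSA structures themselves; the point is only that the core tests the function pointer before calling it, so drivers that do not set .remove keep working unchanged.

```c
#include <stddef.h>

struct sketch_switch;

/* Hypothetical driver ops table; remove is optional and may be NULL. */
struct sketch_switch_driver {
	int  (*setup)(struct sketch_switch *ds);
	void (*remove)(struct sketch_switch *ds);
};

struct sketch_switch {
	const struct sketch_switch_driver *drv;
	int resources_held;	/* stands in for IRQs, mappings, ... */
};

/* Example driver-side remove: release whatever setup acquired. */
static void sketch_release(struct sketch_switch *ds)
{
	ds->resources_held = 0;
}

/* Core-side teardown: give the driver a chance to clean up, if it asked. */
static void sketch_switch_destroy(struct sketch_switch *ds)
{
	if (ds->drv->remove)
		ds->drv->remove(ds);
}
```

A driver opts in simply by filling in the pointer in its ops table, as the bcm_sf2 and mv88e6xxx patches in this series do with .remove.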
Re: [RFC PATCH net-next 1/4] perf tools: Enable pre-event inherit setting by config terms
On Wed, Oct 28, 2015 at 10:42:13AM -0300, Arnaldo Carvalho de Melo wrote:
> On Wed, Oct 28, 2015 at 02:21:26PM +0100, Jiri Olsa wrote:
> > On Wed, Oct 28, 2015 at 10:55:02AM +, Wang Nan wrote:
> > > @@ -55,6 +56,7 @@ struct perf_evsel_config_term {
> > >  		bool	time;
> > >  		char	*callgraph;
> > >  		u64	stack_user;
> > > +		u64	inherit;
> >
> > seems like bool would be enough
>
> Ok, will change this and move it to a more suitable place member
> alignment wise.
>
> Can I, with this change, slap an Acked-by: jirka?

yep

jirka
[RFC PATCH v2 0/4] net: dsa: cleanup dsa driver
Introduce a new remove callback to allow DSA drivers to cleanup their
resources. Then add a remove implementation for bcm_sf2 and mv88e6xxx.

This patch was not tested due to a lack of hardware.

v2: add remove callback patch to the series

Neil Armstrong (4):
  net: dsa: allow switch drivers to cleanup their resources
  net: dsa: bcm_sf2: cleanup resources in remove callback
  net: dsa: mv88e6xxx: add common and ppu remove function
  net: dsa: make usage of mv88e6xxx common remove function

 drivers/net/dsa/bcm_sf2.c         | 20
 drivers/net/dsa/mv88e6123_61_65.c |  1 +
 drivers/net/dsa/mv88e6131.c       |  8
 drivers/net/dsa/mv88e6171.c       |  1 +
 drivers/net/dsa/mv88e6352.c       |  1 +
 drivers/net/dsa/mv88e6xxx.c       | 18 ++
 drivers/net/dsa/mv88e6xxx.h       |  2 ++
 include/net/dsa.h                 |  1 +
 net/dsa/dsa.c                     |  4
 9 files changed, 56 insertions(+)

-- 
1.9.1
Re: [PATCH v7 02/10] ss: created formatters for json and hr
On Wed, Oct 28, 2015 at 12:57:28PM +0100, Matthias Tafelmeier wrote:
> >> Those resentments were related to the patchsets complexity and
> >> size.
> >
> > I didn't see any problem with that in the first place. It is indeed
> > a big change, achieving something like that without a big patch set
> > is unlikely.
> >
>
> Fine, I was just repounding that since Steven Hemminger raised that.
> My reasoning here is that I just don't want to kick off restarting
> work with objections still in the minds – since we are already at V7
> now.

Yeah, sorry for not having looked into this earlier. Also, I neither
have nor claim any power of veto. Apart from that, I'm not against this
patch series in general, just trying to help raise its quality a bit.

Eventually, we don't set anything in stone so everything can be
fixed/improved later on. Except Git history of course, which is
important to get right in relation to bisecting.

Cheers, Phil
Re: [RFC PATCH v2 3/4] net: dsa: mv88e6xxx: add common and ppu remove function
On Wed, Oct 28, 2015 at 03:13:16PM +0100, Neil Armstrong wrote:
> With the previously introduced remove callback, add a
> mv88e6xxx common remove function to cleanup all resources.
>
> Signed-off-by: Neil Armstrong
> ---
>  drivers/net/dsa/mv88e6xxx.c | 18 ++
>  drivers/net/dsa/mv88e6xxx.h |  2 ++
>  2 files changed, 20 insertions(+)
>
> diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
> index b1b14f5..6287096 100644
> --- a/drivers/net/dsa/mv88e6xxx.c
> +++ b/drivers/net/dsa/mv88e6xxx.c
> @@ -331,6 +331,16 @@ void mv88e6xxx_ppu_state_init(struct dsa_switch *ds)
>  	ps->ppu_timer.function = mv88e6xxx_ppu_reenable_timer;
>  }
>
> +void mv88e6xxx_ppu_state_remove(struct dsa_switch *ds)
> +{
> +	struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
> +
> +	del_timer_sync(&ps->ppu_timer);
> +
> +	cancel_work_sync(&ps->bridge_work);
> +	flush_work(&ps->bridge_work);
> +}
> +

You add this function, but you don't use it anywhere? Also, why
cancel bridge work, not ppu_work? Or has that been consolidated
in some patch I'm missing?

	Andrew
[PATCH net-next 0/4] sfc: NUMA support
This patch series adds support for
- allocating rx pages local to the interrupt
- setting affinity hint to influence IRQs to be allocated on the same
  NUMA node as the one where the card resides.

Alexandra Kossovsky (1):
  sfc: use __GFP_NOWARN when allocating RX pages from atomic context.

Bert Kenward (2):
  sfc: Use cpu_to_mem() to support memoryless nodes
  sfc: set and clear interrupt affinity hints

Daniel Pieczko (1):
  sfc: allocate rx pages on the same node as the interrupt

 drivers/net/ethernet/sfc/efx.c        | 36 +++
 drivers/net/ethernet/sfc/net_driver.h |  3 +++
 drivers/net/ethernet/sfc/rx.c         | 18 +++---
 3 files changed, 54 insertions(+), 3 deletions(-)
[PATCH net-next 4/4] sfc: set and clear interrupt affinity hints
From: Bert Kenward

Use cpumask_local_spread to provide interrupt affinity hints
for each queue. This will spread interrupts across NUMA local
CPUs first, extending to remote nodes if needed.

Signed-off-by: Shradha Shah
---
 drivers/net/ethernet/sfc/efx.c | 35 +++
 1 file changed, 35 insertions(+)

diff --git a/drivers/net/ethernet/sfc/efx.c b/drivers/net/ethernet/sfc/efx.c
index 84f9e90..93c4c0e 100644
--- a/drivers/net/ethernet/sfc/efx.c
+++ b/drivers/net/ethernet/sfc/efx.c
@@ -1489,6 +1489,30 @@ static int efx_probe_interrupts(struct efx_nic *efx)
 	return 0;
 }
 
+#if defined(CONFIG_SMP)
+static void efx_set_interrupt_affinity(struct efx_nic *efx)
+{
+	struct efx_channel *channel;
+	unsigned int cpu;
+
+	efx_for_each_channel(channel, efx) {
+		cpu = cpumask_local_spread(channel->channel,
+					   pcibus_to_node(efx->pci_dev->bus));
+
+		irq_set_affinity_hint(channel->irq, cpumask_of(cpu));
+		channel->irq_mem_node = cpu_to_mem(cpu);
+	}
+}
+
+static void efx_clear_interrupt_affinity(struct efx_nic *efx)
+{
+	struct efx_channel *channel;
+
+	efx_for_each_channel(channel, efx)
+		irq_set_affinity_hint(channel->irq, NULL);
+}
+#endif /* CONFIG_SMP */
+
 static int efx_soft_enable_interrupts(struct efx_nic *efx)
 {
 	struct efx_channel *channel, *end_channel;
@@ -2932,6 +2956,9 @@ static void efx_pci_remove_main(struct efx_nic *efx)
 	cancel_work_sync(&efx->reset_work);
 
 	efx_disable_interrupts(efx);
+#if defined(CONFIG_SMP)
+	efx_clear_interrupt_affinity(efx);
+#endif
 	efx_nic_fini_interrupt(efx);
 	efx_fini_port(efx);
 	efx->type->fini(efx);
@@ -3081,6 +3108,11 @@ static int efx_pci_probe_main(struct efx_nic *efx)
 	rc = efx_nic_init_interrupt(efx);
 	if (rc)
 		goto fail5;
+
+#if defined(CONFIG_SMP)
+	efx_set_interrupt_affinity(efx);
+#endif
+
 	rc = efx_enable_interrupts(efx);
 	if (rc)
 		goto fail6;
@@ -3088,6 +3120,9 @@ static int efx_pci_probe_main(struct efx_nic *efx)
 	return 0;
 
  fail6:
+#if defined(CONFIG_SMP)
+	efx_clear_interrupt_affinity(efx);
+#endif
 	efx_nic_fini_interrupt(efx);
  fail5:
 	efx_fini_port(efx);
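The "local CPUs first, then remote" ordering described in the commit message can be modelled in userspace as below. This is a sketch under stated assumptions: sketch_local_spread() and the cpu_node table are hypothetical, all CPUs are treated as online, and the kernel's cpumask_local_spread() works on cpumasks rather than a plain array.

```c
#define SKETCH_NR_CPUS 8

/*
 * Hypothetical model: return the i-th CPU, counting CPUs on 'node' first
 * and the remaining CPUs afterwards. cpu_node[c] is the NUMA node of CPU c.
 */
static int sketch_local_spread(unsigned int i, int node,
			       const int cpu_node[SKETCH_NR_CPUS])
{
	unsigned int cpu;

	i %= SKETCH_NR_CPUS;	/* wrap so every index maps to some CPU */

	/* First pass: CPUs local to the requested node. */
	for (cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
		if (cpu_node[cpu] == node && i-- == 0)
			return (int)cpu;

	/* Second pass: CPUs on every other node. */
	for (cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
		if (cpu_node[cpu] != node && i-- == 0)
			return (int)cpu;

	return -1;		/* not reached while i < SKETCH_NR_CPUS */
}
```

With four CPUs per node and the device on node 1, channels 0 to 3 land on the node 1 CPUs and channels 4 to 7 spill over to node 0, matching the behaviour the patch describes.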
Re: [PATCH 2/6] net: dsa: allow switch drivers to cleanup their resources
On 10/27/2015 05:59 PM, Vivien Didelot wrote:
> On Oct. Tuesday 27 (44) 04:43 PM, Neil Armstrong wrote:
>>
>> Yes, I didn't know how to handle this since it was part of a larger patch.
>>
>> I forgot to add this into the cover-letter but I wanted to send an RFC
>> series with your bcm remove patch and a mv88e6xxx remove experimental code.
>>
>> Yet, the mv88e6060 does not make usage of this.
>
> So this patch must be part of your RFC for module removal instead of
> this patchset.
>
> Thanks,
> -v
>

Vivien, Florian,

Thanks for the review, I will integrate it in the other RFC patchset
with the correct Signed-off-by tag.

Neil
Re: [RFC PATCH v2 1/4] net: dsa: allow switch drivers to cleanup their resources
On Wed, Oct 28, 2015 at 03:12:57PM +0100, Neil Armstrong wrote:
> Some switch drivers might request interrupts, remap register ranges,
> allow such drivers to implement a "remove" callback doing just that.
>
> Signed-off-by: Florian Fainelli
> Signed-off-by: Neil Armstrong
> ---
>  include/net/dsa.h | 1 +
>  net/dsa/dsa.c     | 4
>  2 files changed, 5 insertions(+)
>
> diff --git a/include/net/dsa.h b/include/net/dsa.h
> index 98ccbde..0e1502c 100644
> --- a/include/net/dsa.h
> +++ b/include/net/dsa.h
> @@ -212,6 +212,7 @@ struct dsa_switch_driver {
>  	 */
>  	char	*(*probe)(struct device *host_dev, int sw_addr);
>  	int	(*setup)(struct dsa_switch *ds);
> +	void	(*remove)(struct dsa_switch *ds);
>  	int	(*set_addr)(struct dsa_switch *ds, u8 *addr);
>  	u32	(*get_phy_flags)(struct dsa_switch *ds, int port);
>
> diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
> index 1eba07f..f462fc5 100644
> --- a/net/dsa/dsa.c
> +++ b/net/dsa/dsa.c
> @@ -459,6 +459,10 @@ static void dsa_switch_destroy(struct dsa_switch *ds)
>  	}
>
>  	mdiobus_unregister(ds->slave_mii_bus);
> +
> +	/* Leave a chance to the driver to cleanup */

A nitpick:

	/* Give the driver a chance to cleanup */

would be better English.

Reviewed-by: Andrew Lunn

	Andrew
Re: [Bug 106241] New: shutdown(3)/close(3) behaviour is incorrect for sockets in accept(3)
On Wed, 2015-10-28 at 06:24 -0700, Eric Dumazet wrote:

> Before I take a deep look at your suggestion, are you sure plain use of
> include/linux/percpu-refcount.h infra is not possible for struct cred ?

BTW, I am not convinced we need to spend so much energy and per-cpu
memory for struct cred refcount.

The big problem is fd array spinlock of course and bitmap search for
POSIX compliance.

The cache line thrashing in struct cred is a minor one ;)
[PATCH net-next 1/4] sfc: use __GFP_NOWARN when allocating RX pages from atomic context.
From: Alexandra Kossovsky

If we fail to allocate a page when in atomic context this is handled
by scheduling a fill in non-atomic context. As such, a warning is not
needed.

Signed-off-by: Shradha Shah
---
 drivers/net/ethernet/sfc/rx.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/sfc/rx.c b/drivers/net/ethernet/sfc/rx.c
index 809ea461..3f0e129 100644
--- a/drivers/net/ethernet/sfc/rx.c
+++ b/drivers/net/ethernet/sfc/rx.c
@@ -163,8 +163,15 @@ static int efx_init_rx_buffers(struct efx_rx_queue *rx_queue, bool atomic)
 	do {
 		page = efx_reuse_page(rx_queue);
 		if (page == NULL) {
+			/* GFP_ATOMIC may fail because of various reasons,
+			 * and we re-schedule rx_fill from non-atomic
+			 * context in such a case.  So, use __GFP_NO_WARN
+			 * in case of atomic.
+			 */
 			page = alloc_pages(__GFP_COLD | __GFP_COMP |
-					   (atomic ? GFP_ATOMIC : GFP_KERNEL),
+					   (atomic ?
+					    (GFP_ATOMIC | __GFP_NOWARN)
+					    : GFP_KERNEL),
 					   efx->rx_buffer_order);
 			if (unlikely(page == NULL))
 				return -ENOMEM;
[RFC PATCH v2 4/4] net: dsa: make usage of mv88e6xxx common remove function
Make usage of previously introduced mv88e6xxx common remove function
in all mv88e6xxx drivers.

Signed-off-by: Neil Armstrong
---
 drivers/net/dsa/mv88e6123_61_65.c | 1 +
 drivers/net/dsa/mv88e6131.c       | 8
 drivers/net/dsa/mv88e6171.c       | 1 +
 drivers/net/dsa/mv88e6352.c       | 1 +
 4 files changed, 11 insertions(+)

diff --git a/drivers/net/dsa/mv88e6123_61_65.c b/drivers/net/dsa/mv88e6123_61_65.c
index 4bcfd68..1773c99 100644
--- a/drivers/net/dsa/mv88e6123_61_65.c
+++ b/drivers/net/dsa/mv88e6123_61_65.c
@@ -122,6 +122,7 @@ struct dsa_switch_driver mv88e6123_61_65_switch_driver = {
 	.priv_size		= sizeof(struct mv88e6xxx_priv_state),
 	.probe			= mv88e6123_61_65_probe,
 	.setup			= mv88e6123_61_65_setup,
+	.remove			= mv88e6xxx_remove_common,
 	.set_addr		= mv88e6xxx_set_addr_indirect,
 	.phy_read		= mv88e6xxx_phy_read,
 	.phy_write		= mv88e6xxx_phy_write,
diff --git a/drivers/net/dsa/mv88e6131.c b/drivers/net/dsa/mv88e6131.c
index c73121c..0f559b4 100644
--- a/drivers/net/dsa/mv88e6131.c
+++ b/drivers/net/dsa/mv88e6131.c
@@ -137,6 +137,13 @@ static int mv88e6131_setup(struct dsa_switch *ds)
 	return mv88e6xxx_setup_ports(ds);
 }
 
+static void mv88e6131_remove(struct dsa_switch *ds)
+{
+	mv88e6xxx_ppu_state_remove(ds);
+
+	mv88e6xxx_remove_common(ds);
+}
+
 static int mv88e6131_port_to_phy_addr(struct dsa_switch *ds, int port)
 {
 	struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
@@ -175,6 +182,7 @@ struct dsa_switch_driver mv88e6131_switch_driver = {
 	.priv_size		= sizeof(struct mv88e6xxx_priv_state),
 	.probe			= mv88e6131_probe,
 	.setup			= mv88e6131_setup,
+	.remove			= mv88e6131_remove,
 	.set_addr		= mv88e6xxx_set_addr_direct,
 	.phy_read		= mv88e6131_phy_read,
 	.phy_write		= mv88e6131_phy_write,
diff --git a/drivers/net/dsa/mv88e6171.c b/drivers/net/dsa/mv88e6171.c
index 2c8eb6f..382529b 100644
--- a/drivers/net/dsa/mv88e6171.c
+++ b/drivers/net/dsa/mv88e6171.c
@@ -101,6 +101,7 @@ struct dsa_switch_driver mv88e6171_switch_driver = {
 	.priv_size		= sizeof(struct mv88e6xxx_priv_state),
 	.probe			= mv88e6171_probe,
 	.setup			= mv88e6171_setup,
+	.remove			= mv88e6xxx_remove_common,
 	.set_addr		= mv88e6xxx_set_addr_indirect,
 	.phy_read		= mv88e6xxx_phy_read_indirect,
 	.phy_write		= mv88e6xxx_phy_write_indirect,
diff --git a/drivers/net/dsa/mv88e6352.c b/drivers/net/dsa/mv88e6352.c
index cbf4dd8..7938901 100644
--- a/drivers/net/dsa/mv88e6352.c
+++ b/drivers/net/dsa/mv88e6352.c
@@ -321,6 +321,7 @@ struct dsa_switch_driver mv88e6352_switch_driver = {
 	.priv_size		= sizeof(struct mv88e6xxx_priv_state),
 	.probe			= mv88e6352_probe,
 	.setup			= mv88e6352_setup,
+	.remove			= mv88e6xxx_remove_common,
 	.set_addr		= mv88e6xxx_set_addr_indirect,
 	.phy_read		= mv88e6xxx_phy_read_indirect,
 	.phy_write		= mv88e6xxx_phy_write_indirect,
-- 
1.9.1
Re: [RFC PATCH v2 3/4] net: dsa: mv88e6xxx: add common and ppu remove function
On Wed, Oct 28, 2015 at 03:37:02PM +0100, Neil Armstrong wrote:
> Hi Andrew,
>
> On 10/28/2015 03:35 PM, Andrew Lunn wrote:
> > On Wed, Oct 28, 2015 at 03:13:16PM +0100, Neil Armstrong wrote:
> >> diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
> >> index b1b14f5..6287096 100644
> >> --- a/drivers/net/dsa/mv88e6xxx.c
> >> +++ b/drivers/net/dsa/mv88e6xxx.c
> >> @@ -331,6 +331,16 @@ void mv88e6xxx_ppu_state_init(struct dsa_switch *ds)
> >>  	ps->ppu_timer.function = mv88e6xxx_ppu_reenable_timer;
> >>  }
> >>
> >> +void mv88e6xxx_ppu_state_remove(struct dsa_switch *ds)
> >> +{
> >> +	struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
> >> +
> >> +	del_timer_sync(&ps->ppu_timer);
> >> +
> >> +	cancel_work_sync(&ps->bridge_work);
> >> +	flush_work(&ps->bridge_work);
> >> +}
> >> +
> >
> > You add this function, but you don't use it anywhere? Also, why
> > cancel bridge work, not ppu_work? Or has that been consolidated
> > in some patch i'm missing?
> >
> >	Andrew
> >
>
> It's called in the next patch, in mv88e6131_remove for mv88e6131.

Hi Neil

It would be better to split this out into a patch of its own, and
include the mv88e6131 change with it.

	Andrew
Re: [RFC PATCH v2 3/4] net: dsa: mv88e6xxx: add common and ppu remove function
Hi Andrew, On 10/28/2015 03:35 PM, Andrew Lunn wrote: > On Wed, Oct 28, 2015 at 03:13:16PM +0100, Neil Armstrong wrote: >> diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c >> index b1b14f5..6287096 100644 >> --- a/drivers/net/dsa/mv88e6xxx.c >> +++ b/drivers/net/dsa/mv88e6xxx.c >> @@ -331,6 +331,16 @@ void mv88e6xxx_ppu_state_init(struct dsa_switch *ds) >> ps->ppu_timer.function = mv88e6xxx_ppu_reenable_timer; >> } >> >> +void mv88e6xxx_ppu_state_remove(struct dsa_switch *ds) >> +{ >> + struct mv88e6xxx_priv_state *ps = ds_to_priv(ds); >> + >> + del_timer_sync(&ps->ppu_timer); >> + >> + cancel_work_sync(&ps->bridge_work); >> + flush_work(&ps->bridge_work); >> +} >> + > > You add this function, but you don't use it anywhere? Also, why > cancel bridge work, not ppu_work? Or has that been consolidated > in some patch i'm missing? > > Andrew > It's called in the next patch, in mv88e6131_remove for mv88e6131. Neil
[GIT] Networking
This may look a bit scary this late in the release cycle, but as is typically the case it's predominantly small driver fixes all over the place. 1) Fix two regressions in ipv6 route lookups, particularly wrt. output interface specifications in the lookup key. From David Ahern. 2) Fix checks in ipv6 IPSEC tunnel pre-encap fragmentation, from Herbert Xu. 3) Fix mis-advertisement of 1000BASE-T on bcm63xx_enet, from Simon Arlott. 4) Some smsc phys misbehave with energy detect mode enabled, so add a DT property and disable it on such switches. From Heiko Schocher. 5) Fix TSO corruption on TX in mv643xx_eth, from Philipp Kirchhofer. 6) Fix regression added by removal of openvswitch vport stats, from James Morse. 7) Vendor Kconfig options should be bool, not tristate, from Andreas Schwab. 8) Use non-_BH() net stats bump in tcp_xmit_probe_skb(), otherwise we barf during TCP REPAIR operations. 9) Fix various bugs in openvswitch conntrack support, from Joe Stringer. 10) Fix NETLINK_LIST_MEMBERSHIPS locking, from David Herrmann. 11) Don't have VSOCK do sock_put() in interrupt context, from Jorgen Hansen. 12) Fix skb_realloc_headroom() failures properly in ISDN, from Karsten Keil. 13) Add some device IDs to qmi_wwan, from Bjorn Mork. 14) Fix ovs egress tunnel information when using lwtunnel devices, from Pravin B Shelar. 15) Add missing NETIF_F_FRAGLIST to macvtap feature list, from Jason Wang. 16) Fix incorrect handling of throw routes when the result of the throw cannot find a match, from Xin Long. 17) Protect ipv6 MTU calculations from wrap-around, from Hannes Frederic Sowa. 18) Fix failed autonegotiation on KSZ9031 micrel PHYs, from Nathan Sullivan. 19) Add missing memory barriers in descriptor accesses of xgbe driver, from Thomas Lendacky. 20) Fix release condition test in pppoe_release(), from Guillaume Nault. 21) Fix gianfar bugs wrt. filter configuration, from Claudiu Manoil. 22) Fix violations of RX buffer alignment in sh_eth driver, from Sergei Shtylyov. 
23) Fix missing of_node_put() calls in various places around the networking code, from Julia Lawall. 24) Fix incorrect leaf node walking in ipv4 routing tree, from Alexander Duyck. 25) RDS doesn't check pskb_pull()/pskb_trim() return values, from Sowmini Varadhan. 26) Fix VLAN configuration in mlx4 driver, from Jack Morgenstein. Please pull, thanks a lot. The following changes since commit 1099f86044111e9a7807f09523e42d4c9d0fb781: Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (2015-10-19 09:55:40 -0700) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git master for you to fetch changes up to e18f6ac30d31433d8cd9ccf693d3cdd5d2e66ef9: Merge branch 'mlx4-fixes' (2015-10-27 20:27:45 -0700) Alexander Duyck (1): fib_trie: leaf_walk_rcu should not compute key if key is less than pn->key Andreas Schwab (1): net: cavium: change NET_VENDOR_CAVIUM to bool Andrew F. Davis (1): net: phy: dp83848: Add TI DP83848 Ethernet PHY Andrew Shewmaker (1): tcp: allow dctcp alpha to drop to zero Bjørn Mork (1): qmi_wwan: add Sierra Wireless MC74xx/EM74xx Carol L Soto (1): net/mlx4: Copy/set only sizeof struct mlx4_eqe bytes Claudiu Manoil (4): gianfar: Remove duplicated argument to bitwise OR gianfar: Don't enable the Filer w/o the Parser gianfar: Fix Rx BSY error handling MAINTAINERS: Add entry for gianfar ethernet driver Dan Carpenter (1): irda: precedence bug in irlmp_seq_hb_idx() David Ahern (2): net: Really fix vti6 with oif in dst lookups net: ipv6: Dont add RT6_LOOKUP_F_IFACE flag if saddr set David Daney (1): net: thunderx: Rewrite silicon revision tests. David Herrmann (1): netlink: fix locking around NETLINK_LIST_MEMBERSHIPS David S. 
Miller (12): Merge branch 'smsc-energy-detect' Merge branch 'mv643xx-fixes' Merge git://git.kernel.org/.../pablo/nf Merge branch 'isdn-null-deref' Merge branch 'master' of git://git.kernel.org/.../klassert/ipsec Merge branch 'master' of git://git.kernel.org/.../jkirsher/net-queue Merge branch 'ipv6-overflow-arith' Merge branch 'thunderx-fixes' Merge branch 'gianfar-fixes' Merge branch 'sh_eth-fixes' Merge branch 'net_of_node_put' Merge branch 'mlx4-fixes' Eric Dumazet (1): ipv6: gre: support SIT encapsulation Florian Westphal (1): netfilter: sync with packet rx also after removing queue entries Gao feng (1): vsock: fix missing cleanup when misc_register failed Guillaume Nault (1): ppp: fix pppoe_dev deletion condition in pppoe_release() Hannes Frederic Sowa (2): overflow-arith: begin to add support for overflow builtin functions ipv6: protect mtu calculation of wrap-around and infinite loop by
Re: [PATCH v2 1/3] virtio_net: Stop doing DMA from the stack
On Tue, Oct 27, 2015 at 10:30:19PM -0700, Andy Lutomirski wrote: > From: Andy Lutomirski > > Once virtio starts using the DMA API, we won't be able to safely DMA > from the stack. virtio-net does a couple of config DMA requests > from small stack buffers -- switch to using dynamically-allocated > memory. > > This should have no effect on any performance-critical code paths. > > Cc: netdev@vger.kernel.org > Cc: "Michael S. Tsirkin" > Cc: virtualizat...@lists.linux-foundation.org > Reviewed-by: Joerg Roedel > Signed-off-by: Andy Lutomirski > --- > > Hi Michael and DaveM- > > This is a prerequisite for the virtio DMA fixing project. It works > as a standalone patch, though. Would it make sense to apply it to > an appropriate networking tree now? > > drivers/net/virtio_net.c | 53 > > 1 file changed, 36 insertions(+), 17 deletions(-) > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > index d8838dedb7a4..4f10f8a58811 100644 > --- a/drivers/net/virtio_net.c > +++ b/drivers/net/virtio_net.c > @@ -976,31 +976,43 @@ static bool virtnet_send_command(struct virtnet_info > *vi, u8 class, u8 cmd, >struct scatterlist *out) > { > struct scatterlist *sgs[4], hdr, stat; > - struct virtio_net_ctrl_hdr ctrl; > - virtio_net_ctrl_ack status = ~0; > + > + struct { > + struct virtio_net_ctrl_hdr ctrl; > + virtio_net_ctrl_ack status; > + } *buf; > + > unsigned out_num = 0, tmp; > + bool ret; > > /* Caller should know better */ > BUG_ON(!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ)); > > - ctrl.class = class; > - ctrl.cmd = cmd; > + buf = kmalloc(sizeof(*buf), GFP_ATOMIC); > + if (!buf) > + return false; This is problematic. The command is never retried, the error is propagated to userspace. > + buf->status = ~0; > + > + buf->ctrl.class = class; > + buf->ctrl.cmd = cmd; > /* Add header */ > - sg_init_one(&hdr, &ctrl, sizeof(ctrl)); > + sg_init_one(&hdr, &buf->ctrl, sizeof(buf->ctrl)); > sgs[out_num++] = &hdr; > > if (out) > sgs[out_num++] = out; > > /* Add return status. 
*/ > - sg_init_one(&stat, &status, sizeof(status)); > + sg_init_one(&stat, &buf->status, sizeof(buf->status)); > sgs[out_num] = &stat; > > BUG_ON(out_num + 1 > ARRAY_SIZE(sgs)); > virtqueue_add_sgs(vi->cvq, sgs, out_num, 1, vi, GFP_ATOMIC); > > - if (unlikely(!virtqueue_kick(vi->cvq))) > - return status == VIRTIO_NET_OK; > + if (unlikely(!virtqueue_kick(vi->cvq))) { > + ret = (buf->status == VIRTIO_NET_OK); > + goto out; > + } > > /* Spin for a response, the kick causes an ioport write, trapping >* into the hypervisor, so the request should be handled immediately. > @@ -1009,7 +1021,11 @@ static bool virtnet_send_command(struct virtnet_info > *vi, u8 class, u8 cmd, > !virtqueue_is_broken(vi->cvq)) > cpu_relax(); > > - return status == VIRTIO_NET_OK; > + ret = (buf->status == VIRTIO_NET_OK); > + > +out: > + kfree(buf); > + return ret; > } > > static int virtnet_set_mac_address(struct net_device *dev, void *p) > @@ -1151,7 +1167,7 @@ static void virtnet_set_rx_mode(struct net_device *dev) > { > struct virtnet_info *vi = netdev_priv(dev); > struct scatterlist sg[2]; > - u8 promisc, allmulti; > + u8 *cmdbyte; > struct virtio_net_ctrl_mac *mac_data; > struct netdev_hw_addr *ha; > int uc_count; > @@ -1163,22 +1179,25 @@ static void virtnet_set_rx_mode(struct net_device > *dev) > if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_RX)) > return; > > - promisc = ((dev->flags & IFF_PROMISC) != 0); > - allmulti = ((dev->flags & IFF_ALLMULTI) != 0); > + cmdbyte = kmalloc(sizeof(*cmdbyte), GFP_ATOMIC); > + if (!cmdbyte) > + return; Here the error is ignored, rx mode will be incorrect. OTOH it looks like that's already the case. > > - sg_init_one(sg, &promisc, sizeof(promisc)); > + sg_init_one(sg, cmdbyte, sizeof(*cmdbyte)); > > + *cmdbyte = ((dev->flags & IFF_PROMISC) != 0); > if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_RX, > VIRTIO_NET_CTRL_RX_PROMISC, sg)) > dev_warn(&dev->dev, "Failed to %sable promisc mode.\n", > - promisc ? "en" : "dis"); > - > - sg_init_one(sg, &allmulti, sizeof(allmulti)); > + *cmdbyte ? "en" : "dis"); > > + *cmdbyte = ((dev->flags & IFF_ALLMULTI) != 0); > if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_RX, > VIRTIO_NET_CTRL_RX_ALLMULTI, sg)) > dev_warn(&dev->dev, "Failed to %sable allmulti mode.\n", > - allmulti ? "en" : "dis"); > +
Re: [PATCH v1 1/3] virtio-net: Using single MSIX IRQ for TX/RX Q pair
On Wed, Oct 28, 2015 at 11:13:39AM +0800, Jason Wang wrote: > > > On 10/27/2015 04:38 PM, Michael S. Tsirkin wrote: > > On Mon, Oct 26, 2015 at 10:52:47AM -0700, Ravi Kerur wrote: > >> Ported earlier patch from Jason Wang (dated 12/26/2014). > >> > >> This patch tries to reduce the number of MSIX irqs required for > >> virtio-net by sharing a MSIX irq for each TX/RX queue pair through > >> channels. If transport support channel, about half of the MSIX irqs > >> were reduced. > >> > >> Signed-off-by: Ravi Kerur> > Why bother BTW? > > The reason is we want to save the number of interrupt vectors used. > Booting a guest with 256 queues with current driver will result all > tx/rx queues shares a single vector. This is suboptimal. With a single CPU? But what configures so many queues? Why do it? > With this > series, half could be saved. At cost of e.g. inability to balance the interrupts. > And more complex policy could be applied on > top (e.g limit the number of vectors used by driver). If that's the motivation, I'd like to see a draft of that more complex policy first. > > Looks like this is adding a bunch of overhead > > on data path - to what end? > > I agree some benchmark is needed for this. > > > Maybe you have a huge number of these devices ... but in that case, how > > about sharing the config interrupt instead? > > That's only possible if host supports VIRTIO_1 > > (so we can detect config interrupt by reading the ISR). > > > > > > > >> --- > >> drivers/net/virtio_net.c | 29 - > >> 1 file changed, 28 insertions(+), 1 deletion(-) > >> > >> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > >> index d8838ded..d705cce 100644 > >> --- a/drivers/net/virtio_net.c > >> +++ b/drivers/net/virtio_net.c > >> @@ -72,6 +72,9 @@ struct send_queue { > >> > >>/* Name of the send queue: output.$index */ > >>char name[40]; > >> + > >> + /* Name of the channel, shared with irq. 
*/ > >> + char channel_name[40]; > >> }; > >> > >> /* Internal representation of a receive virtqueue */ > >> @@ -1529,6 +1532,8 @@ static int virtnet_find_vqs(struct virtnet_info *vi) > >>int ret = -ENOMEM; > >>int i, total_vqs; > >>const char **names; > >> + const char **channel_names; > >> + unsigned *channels; > >> > >>/* We expect 1 RX virtqueue followed by 1 TX virtqueue, followed by > >> * possible N-1 RX/TX queue pairs used in multiqueue mode, followed by > >> @@ -1548,6 +1553,17 @@ static int virtnet_find_vqs(struct virtnet_info *vi) > >>if (!names) > >>goto err_names; > >> > >> + channel_names = kmalloc_array(vi->max_queue_pairs, > >> +sizeof(*channel_names), > >> +GFP_KERNEL); > >> + if (!channel_names) > >> + goto err_channel_names; > >> + > >> + channels = kmalloc_array(total_vqs, sizeof(*channels), > >> + GFP_KERNEL); > >> + if (!channels) > >> + goto err_channels; > >> + > >>/* Parameters for control virtqueue, if any */ > >>if (vi->has_cvq) { > >>callbacks[total_vqs - 1] = NULL; > >> @@ -1562,10 +1578,15 @@ static int virtnet_find_vqs(struct virtnet_info > >> *vi) > >>sprintf(vi->sq[i].name, "output.%d", i); > >>names[rxq2vq(i)] = vi->rq[i].name; > >>names[txq2vq(i)] = vi->sq[i].name; > >> + sprintf(vi->sq[i].channel_name, "txrx.%d", i); > >> + channel_names[i] = vi->sq[i].channel_name; > >> + channels[rxq2vq(i)] = i; > >> + channels[txq2vq(i)] = i; > >>} > >> > >>ret = vi->vdev->config->find_vqs(vi->vdev, total_vqs, vqs, callbacks, > >> - names); > >> + names, channels, channel_names, > >> + vi->max_queue_pairs); > >>if (ret) > >>goto err_find; > >> > >> @@ -1580,6 +1601,8 @@ static int virtnet_find_vqs(struct virtnet_info *vi) > >>vi->sq[i].vq = vqs[txq2vq(i)]; > >>} > >> > >> + kfree(channels); > >> + kfree(channel_names); > >>kfree(names); > >>kfree(callbacks); > >>kfree(vqs); > >> @@ -1587,6 +1610,10 @@ static int virtnet_find_vqs(struct virtnet_info *vi) > >>return 0; > >> > >> err_find: > >> + kfree(channels); > >> +err_channels: > >> + 
kfree(channel_names); > >> +err_channel_names: > >>kfree(names); > >> err_names: > >>kfree(callbacks); > >> -- > >> 1.9.1
Re: [PATCH v2 net-next 0/4] Automatic adjustment of max frame size
On 15/10/28 (Wed) 13:58, Stephen Hemminger wrote: > On Mon, 26 Oct 2015 12:40:55 +0900 Toshiaki Makita wrote: ... Thank you for taking a look at the patch set. I'm not sure if I fully understand you, so please correct me if I misread you. > The problem is that you require changing network device drivers and device specific knowledge about what will work or not. Because of that the modification can't be automated. I'm not sure what you mean by "device specific knowledge" and "automated"... Indeed, this requires a change in each driver. But the required changes in drivers should mostly reuse the ndo_change_mtu implementation code and should not be hard. We can progressively implement ndo_enc_hdr_len for each driver. If the max frame size cannot be changed on a certain NIC, the vlan driver will emit a warning message and make the MTU smaller, then userspace can handle it (patch 3). If needed, maybe we can expose this feature via ethtool. > Also, this affects even more layered devices like tunnels etc. Yes, if tunnel devices start to utilize this framework. This is one of the purposes of my patch set. > The problem is quite large, and this patch only begins to address it. Yes, this is the first step to address the problem. > It seems to me that just having the vlan driver to a sane auto default is the best solution. For now, this patch implementation is limited to vlan. For other protocols, auto-expansion may not be suitable and may need some knob to use the framework. If you mean just making the MTU smaller on the vlan device instead of adjusting the max frame size of the real device, then it would not work. 802.1ad HW switches, at any rate, send 1526-byte frames, so they will be dropped on the real device. > It might cause a smaller MTU than ideal, but at least it will still work. Then the user can manually set a larger MTU if they know their hardware will work. 
Toshiaki Makita
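For reference, the 1526-byte figure mentioned above is consistent with a 1500-byte MTU plus a 14-byte Ethernet header, two 4-byte VLAN tags (an 802.1ad outer tag and an 802.1Q inner tag) and a 4-byte FCS. The sketch below just spells out that arithmetic in plain C. The enum and function names are made up for illustration and are not kernel symbols, and the breakdown is an assumption about how the figure was counted.

```c
#include <assert.h>

/* Illustrative constants, not kernel definitions. */
enum {
	SKETCH_ETH_HDR_LEN  = 14, /* dst MAC + src MAC + EtherType */
	SKETCH_VLAN_TAG_LEN = 4,  /* one 802.1Q or 802.1ad tag */
	SKETCH_FCS_LEN      = 4   /* frame check sequence */
};

/* Bytes on the wire for `mtu` bytes of payload behind `n_tags` VLAN tags. */
unsigned int sketch_max_frame_len(unsigned int mtu, unsigned int n_tags)
{
	return SKETCH_ETH_HDR_LEN + n_tags * SKETCH_VLAN_TAG_LEN + mtu +
	       SKETCH_FCS_LEN;
}
```

With these assumptions, an untagged 1500-byte MTU gives 1518 bytes, a single tag gives 1522, and the double-tagged case gives the 1526 bytes quoted in the mail, which is why shrinking only the vlan device's MTU does not help when the real device still drops frames above its own limit.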
Re: [PATCH 1/1] commit c6825c0976fa7893692e0e43b09740b419b23c09 upstream.
On Mon, 26 Oct 2015 21:06:33 +0100 Pablo Neira Ayuso wrote: > Hi, > > On Mon, Oct 26, 2015 at 11:55:39AM -0700, Ani Sinha wrote: > > netfilter: nf_conntrack: fix RCU race in nf_conntrack_find_get > > Please, no need to Cc everyone here. Please, submit your Netfilter > patches to netfilter-de...@vger.kernel.org. > > Moreover, it would be great if the subject includes something > descriptive on what you need, for this I'd suggest: > > [PATCH -stable 3.4,backport] netfilter: nf_conntrack: fix RCU race in > nf_conntrack_find_get > > I'm including Neal P. Murphy, he said he would help testing these > backports, getting a Tested-by: tag usually speeds up things too. I hammered it a couple nights ago. First test was 5000 processes on 6 SMP CPUs opening and closing a port on a 'remote' host using the usual random source ports. Only got up to 32000 conntracks. The generator was a 64-bit Smoothwall KVM without the patch. The traffic passed through a 32-bit Smoothwall KVM with the patch. The target was on the VM host. No problems encountered. I suspect I didn't come close to triggering the original problem. Second test was a couple thousand processes all using the same source IP and port and dest IP and port. Still no problems. But these were perl scripts (and they used lots of RAM); perhaps a short C program would let me run more. Any ideas on how I might test it more brutally? N
Re: [PATCH v1 1/3] virtio-net: Using single MSIX IRQ for TX/RX Q pair
On 10/28/2015 03:21 PM, Michael S. Tsirkin wrote: > On Wed, Oct 28, 2015 at 11:13:39AM +0800, Jason Wang wrote: >> >> On 10/27/2015 04:38 PM, Michael S. Tsirkin wrote: >>> On Mon, Oct 26, 2015 at 10:52:47AM -0700, Ravi Kerur wrote: Ported earlier patch from Jason Wang (dated 12/26/2014). This patch tries to reduce the number of MSIX irqs required for virtio-net by sharing a MSIX irq for each TX/RX queue pair through channels. If transport support channel, about half of the MSIX irqs were reduced. Signed-off-by: Ravi Kerur >>> Why bother BTW? >> The reason is we want to save the number of interrupt vectors used. >> Booting a guest with 256 queues with current driver will result all >> tx/rx queues shares a single vector. This is suboptimal. > With a single CPU? Even for smp guests. Or you want a per-cpu interrupt? > But what configures so many queues? Why do it? Something like cpu hot add. > >> With this >> series, half could be saved. > At cost of e.g. inability to balance the interrupts. Didn't follow. Btw, most physical cards share an irq for each tx/rx queue pair. > >> And more complex policy could be applied on >> top (e.g limit the number of vectors used by driver). > If that's the motivation, I'd like to see a draft of that more complex > policy first. How about something like: 1) Driver provides a min and max number of vectors it needs. 2) Virtio pci can then use pci_enable_msix_range() and return the actual number of vectors to driver. 3) Then driver can divide the virtqueues into different groups > >>> Looks like this is adding a bunch of overhead >>> on data path - to what end? >> I agree some benchmark is needed for this. >> >>> Maybe you have a huge number of these devices ... but in that case, how >>> about sharing the config interrupt instead? >>> That's only possible if host supports VIRTIO_1 >>> (so we can detect config interrupt by reading the ISR). 
>>> >>> >>> --- drivers/net/virtio_net.c | 29 - 1 file changed, 28 insertions(+), 1 deletion(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index d8838ded..d705cce 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -72,6 +72,9 @@ struct send_queue { /* Name of the send queue: output.$index */ char name[40]; + + /* Name of the channel, shared with irq. */ + char channel_name[40]; }; /* Internal representation of a receive virtqueue */ @@ -1529,6 +1532,8 @@ static int virtnet_find_vqs(struct virtnet_info *vi) int ret = -ENOMEM; int i, total_vqs; const char **names; + const char **channel_names; + unsigned *channels; /* We expect 1 RX virtqueue followed by 1 TX virtqueue, followed by * possible N-1 RX/TX queue pairs used in multiqueue mode, followed by @@ -1548,6 +1553,17 @@ static int virtnet_find_vqs(struct virtnet_info *vi) if (!names) goto err_names; + channel_names = kmalloc_array(vi->max_queue_pairs, +sizeof(*channel_names), +GFP_KERNEL); + if (!channel_names) + goto err_channel_names; + + channels = kmalloc_array(total_vqs, sizeof(*channels), + GFP_KERNEL); + if (!channels) + goto err_channels; + /* Parameters for control virtqueue, if any */ if (vi->has_cvq) { callbacks[total_vqs - 1] = NULL; @@ -1562,10 +1578,15 @@ static int virtnet_find_vqs(struct virtnet_info *vi) sprintf(vi->sq[i].name, "output.%d", i); names[rxq2vq(i)] = vi->rq[i].name; names[txq2vq(i)] = vi->sq[i].name; + sprintf(vi->sq[i].channel_name, "txrx.%d", i); + channel_names[i] = vi->sq[i].channel_name; + channels[rxq2vq(i)] = i; + channels[txq2vq(i)] = i; } ret = vi->vdev->config->find_vqs(vi->vdev, total_vqs, vqs, callbacks, - names); + names, channels, channel_names, + vi->max_queue_pairs); if (ret) goto err_find; @@ -1580,6 +1601,8 @@ static int virtnet_find_vqs(struct virtnet_info *vi) vi->sq[i].vq = vqs[txq2vq(i)]; } + kfree(channels); + kfree(channel_names); kfree(names); kfree(callbacks); kfree(vqs); @@ -1587,6 +1610,10 @@ static int 
virtnet_find_vqs(struct virtnet_info *vi) return 0; err_find: + kfree(channels); +err_channels: + kfree(channel_names); +err_channel_names: kfree(names); err_names: kfree(callbacks);
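The three-step policy proposed in this reply (the driver states a min and max vector count, the transport grants some number in that range, the driver then groups its virtqueues) might look roughly like the userspace sketch below. Everything here is hypothetical: sketch_grant_vectors() only imitates the "a count within [min, max], or an error" shape of the answer and is not the PCI API, and round-robin is just one possible grouping, not something the thread settled on.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the transport's answer: grant as many vectors
 * as are available within [min_vecs, max_vecs], or -1 if the minimum
 * cannot be met. */
int sketch_grant_vectors(int min_vecs, int max_vecs, int available)
{
	if (min_vecs < 1 || max_vecs < min_vecs || available < min_vecs)
		return -1;
	return available < max_vecs ? available : max_vecs;
}

/* Step 3: spread queue pairs over the granted vectors round-robin.
 * group_of_pair[i] receives the vector index shared by RX i and TX i.
 * The caller must pass granted > 0. */
void sketch_group_queue_pairs(int *group_of_pair, size_t n_pairs, int granted)
{
	size_t i;

	for (i = 0; i < n_pairs; i++)
		group_of_pair[i] = (int)(i % (size_t)granted);
}
```

With five queue pairs and three granted vectors, pairs 0 and 3 end up on vector 0, pairs 1 and 4 on vector 1, and pair 2 on vector 2. Whether such sharing hurts interrupt balancing is exactly the open question raised earlier in the thread.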
Re: [PATCH v7 02/10] ss: created formatters for json and hr
> > Well, then we should wait for another voice aimed at the complexity of > the patchset before amending and resending me the patchset. > > Well, I perceive that, after Sutter has taken over the maintenance responsibility and answered accordingly, the outstanding resentments are resolved. Those resentments were related to the patchset's complexity and size. Right? -- Matthias Tafelmeier
Re: [PATCH v7 10/10] ss: activate json_writer excluded logic
On Wed, Oct 28, 2015 at 11:39:41AM +0900, Stephen Hemminger wrote: > On Tue, 27 Oct 2015 14:21:03 +0100 > Phil Sutter wrote: > > > On Thu, Sep 10, 2015 at 09:35:08PM +0200, Matthias Tafelmeier wrote: > > > This small patch extends the lib json_writer module for formerly > > > deactivated functionality. > > > > Why was it deactivated in the first place? > > The code came from another project that wasn't using this > function. Ah, I didn't get that the functions he uncomments were not added by his series in the first place. Still: - This patch should come before 02/10 which makes use of the uncommented functions here. - jsonw_null() and jsonw_null_field() are still unused, no need to uncomment them.
Re: [PATCH 0/8] mm: memcontrol: account socket memory in unified hierarchy
On Tue, Oct 27, 2015 at 09:01:08AM -0700, Johannes Weiner wrote: ... > > > But regardless of tcp window control, we need to account socket memory > > > in the main memory accounting pool where pressure is shared (to the > > > best of our abilities) between all accounted memory consumers. > > > > > > > No objections to this point. However, I really don't like the idea to > > charge tcp window size to memory.current instead of charging individual > > pages consumed by the workload for storing socket buffers, because it is > > inconsistent with what we have now. Can't we charge individual skb pages > > as we do in case of other kmem allocations? > > Absolutely, both work for me. I chose that route because it's where > the networking code already tracks and accounts memory consumed, so it > seemed like a better site to hook into. > > But I understand your concerns. We want to track this stuff as close > to the memory allocators as possible. Exactly. > > > > But also, there are people right now for whom the socket buffers cause > > > system OOM, but the existing memcg's hard tcp window limit that > > > exists absolutely wrecks network performance for them. It's not usable > > > the way it is. It'd be much better to have the socket buffers exert > > > pressure on the shared pool, and then propagate the overall pressure > > > back to individual consumers with reclaim, shrinkers, vmpressure etc. > > > > This might or might not work. I'm not an expert to judge. But if you do > > this only for memcg leaving the global case as it is, networking people > > won't budge IMO. So could you please start such a major rework from the > > global case? Could you please try to deprecate the tcp window limits not > > only in the legacy memcg hierarchy, but also system-wide in order to > > attract attention of networking experts? > > I'm definitely interested in addressing this globally as well. > > The idea behind this was to use the memcg part as a testbed. 
cgroup2 > is going to be new and people are prepared for hiccups when migrating > their applications to it; and they can roll back to cgroup1 and tcp > window limits at any time should they run into problems in production. Then you'd better not touch existing tcp limits at all, because they just work, and the logic behind them is very close to that of global tcp limits. I don't think one can simplify it somehow. Moreover, frankly I still have my reservations about this vmpressure propagation to skb you're proposing. It might work, but I doubt it will allow us to throw away explicit tcp limit, as I explained previously. So, even with your approach I think we can still need per memcg tcp limit *unless* you get rid of global tcp limit somehow. > > So this seemed like a good way to prove a new mechanism before rolling > it out to every single Linux setup, rather than switch everybody over > after the limited scope testing I can do as a developer on my own. > > Keep in mind that my patches are not committing anything in terms of > interface, so we retain all the freedom to fix and tune the way this > is implemented, including the freedom to re-add tcp window limits in > case the pressure balancing is not a comprehensive solution. > I really dislike this kind of proof. It looks like you're trying to push something you think is right covertly, w/o having a proper discussion with networking people and then say that it just works and hence should be done globally, but what if it won't? Revert it? We already have a lot of dubious stuff in memcg that should be reverted, so let's please try to avoid this kind of mistakes in future. Note, I say "w/o having a proper discussion with networking people", because I don't think they will really care *unless* you change the global logic, simply because most of them aren't very interested in memcg AFAICS. 
That effectively means you lose a chance to listen to networking experts, who could point you at design flaws and propose an improvement right away. Let's please not miss such an opportunity. You said that you'd seen this problem happen w/o cgroups, so you have a use case that might need fixing at the global level. IMO it shouldn't be difficult to prepare an RFC patch for the global case first and see what people think about it. Thanks, Vladimir
Re: [PATCH v7 02/10] ss: created formatters for json and hr
On Wed, Oct 28, 2015 at 09:07:47AM +0100, Matthias Tafelmeier wrote: > > > > > Well, then we should wait for another voice aimed at the complexity of > > the patchset before amending and resending me the patchset. > > > > > > Well, I perceive that after Sutter has taken over the maintenance > responsibilitiy and answered accordingly that the outstanding > resentments as resolved. I did not take over maintenance responsibility (whatever that means to you precisely). I merely reviewed the patches, focussing on the technical aspects of both implementation and patch management. Regarding the concept itself, I think the usability of filters in combination with json output is worth a discussion as well. > Those resentments were related to the patchsets complexity and size. I didn't see any problem with that in the first place. It is indeed a big change, achieving something like that without a big patch set is unlikely.
BUG: fsl FEC ethernet tx checksum offloading doesn't work with RMII interface
Hi all, I was unable to figure out who's maintaining drivers/net/ethernet/freescale/fec_main.c, so I hope someone can help out on this list... We have a board with an RMII phy connected to an i.MX6S. The hardware seems to be ok, since I can receive and transmit ethernet frames without drops or errors. However only simple things like ping and dhcp seemed to work. TCP/IP connections could not be made. When looking at both ends with tcpdump, I realized that all transmitted packets arrived at the other end with the TCP and IP header checksums zeroed-out. After issuing the following command, TCP/IP started working correctly: $ ethtool --offload eth0 tx off This works around the issue. For some reason, when the FEC is in RMII mode, it isn't filling in the checksums. On another board with an RGMII phy the same kernel works fine without the need to disable offloading. What can possibly relate this functionality to the choice of MAC interface? Best regards, -- David Jander Protonic Holland.
Re: [PATCH v7 02/10] ss: created formatters for json and hr
> Yeah, sorry for not having looked into this earlier. Also, I neither > have nor claim any power of veto. No big issue. Maybe Stephen can clarify things. I mean, acknowledge that there are no further objections. > Apart from that, I'm not against this > patch series in general, just trying to help raise its quality a bit. Many thanks for that. > Eventually, we don't set anything in stone so everything can be > fixed/improved later on. Except Git history of course, which is > important to get right in relation to bisecting. Absolutely! -- BR Matthias
Re: [Bug 106241] New: shutdown(3)/close(3) behaviour is incorrect for sockets in accept(3)
On 27/10/2015 23:17, Al Viro wrote:

> Frankly, as far as I'm concerned, the bottom line is
> * there are two variants of semantics in that area and there's not much
> that could be done about that.

Yes, that seems to be the case.

> * POSIX is vague enough for both variants to comply with it (it's also
> very badly written in the area in question).

On that aspect I disagree, the POSIX semantics seem clear to me, and are different to the Linux behaviour.

> * I don't see any way to implement something similar to Solaris behaviour
> without a huge increase of memory footprint or massive cacheline pingpong.
> Solaris appears to go for memory footprint from hell - cacheline per
> descriptor (instead of a pointer per descriptor).

Yes, that does seem to be the case. Thanks for the detailed explanation you've provided as to why that's so.

> * the benefits of Solaris-style behaviour are not obvious - all things
> equal it would be interesting, but the things are very much not equal.
> What's more, if your userland code is such that accept() argument could be
> closed by another thread, the caller *cannot* do anything with said
> argument after accept() returns, no matter which variant of semantics is
> used.

Yes, irrespective of how you terminate the accept, once it returns with an error it's unsafe to use the FD, with the exception of failures such as EAGAIN, EINTR etc. However the shutdown() behaviour of Linux is not POSIX compliant and allowing an accept to continue on an FD that's been closed doesn't seem correct either.

> * [Linux-specific aside] our __alloc_fd() can degrade quite badly with
> some use patterns. The cacheline pingpong in the bitmap is probably
> inevitable, unless we accept considerably heavier memory footprint,
> but we also have a case when alloc_fd() takes O(n) and it's _not_ hard
> to trigger - close(3);open(...); will have the next open() after that
> scanning the entire in-use bitmap. I think I see a way to improve it
> without slowing the normal case down, but I'll need to experiment a
> bit before I post patches. Anybody with examples of real-world loads
> that make our descriptor allocator degrade is very welcome to post
> the reproducers...

It looks like the remaining discussion is going to be about Linux implementation details so I'll bow out at this point. Thanks again for all the helpful explanation.

--
Alan Burlison
--
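For readers following the allocator aside, here is a minimal user-space sketch of a lowest-free-descriptor search driven by a "next free" hint. It is only a model under stated assumptions: a plain array of flags stands in for the kernel's bitmap, and `fd_table`, `fd_alloc`, `fd_close` and `probes_after_close_reopen` are hypothetical names, not the kernel's `__alloc_fd()`. It shows why the close(3);open(...) pattern makes the following open() walk every descriptor in use.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FDS 1024

struct fd_table {
	unsigned char in_use[MAX_FDS]; /* 1 = descriptor allocated */
	size_t next_fd;                /* hint: no free slot below this index */
	size_t probes;                 /* slots examined by the last fd_alloc() */
};

/* Return the lowest free descriptor, or -1 if the table is full. */
static int fd_alloc(struct fd_table *t)
{
	size_t fd;

	t->probes = 0;
	for (fd = t->next_fd; fd < MAX_FDS; fd++) {
		t->probes++;
		if (!t->in_use[fd]) {
			t->in_use[fd] = 1;
			t->next_fd = fd + 1;
			return (int)fd;
		}
	}
	return -1;
}

/* Free a descriptor and pull the hint back if it was below it. */
static void fd_close(struct fd_table *t, int fd)
{
	assert(fd >= 0 && fd < MAX_FDS);
	t->in_use[fd] = 0;
	if ((size_t)fd < t->next_fd)
		t->next_fd = (size_t)fd;
}

/*
 * Open n descriptors (3 < n < MAX_FDS), close(3), reopen, then report
 * how many slots the open after that had to examine.
 */
static size_t probes_after_close_reopen(size_t n)
{
	struct fd_table t = { .next_fd = 0 };
	size_t i;

	for (i = 0; i < n; i++)
		fd_alloc(&t);
	fd_close(&t, 3);
	fd_alloc(&t);	/* reuses 3, hint moves to 4 */
	fd_alloc(&t);	/* walks 4..n-1 before finding n */
	return t.probes;
}
```

In this model the cost of the last open grows linearly with the number of descriptors already in use, which is the O(n) case described in the quoted text.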
[PATCH net-next] use vzalloc() instead of vmalloc() as counterstmp is not cleared before it is used in get_counters(). counterstmp might be leaked partially when it is sent to userland later on.
--- net/bridge/netfilter/ebtables.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c index f46ca41..26922e9 100644 --- a/net/bridge/netfilter/ebtables.c +++ b/net/bridge/netfilter/ebtables.c @@ -989,7 +989,7 @@ static int do_replace_finish(struct net *net, struct ebt_replace *repl, the check on the size is done later, when we have the lock */ if (repl->num_counters) { unsigned long size = repl->num_counters * sizeof(*counterstmp); - counterstmp = vmalloc(size); + counterstmp = vzalloc(size); if (!counterstmp) return -ENOMEM; } @@ -1410,7 +1410,7 @@ static int copy_counters_to_user(struct ebt_table *t, return -EINVAL; } - counterstmp = vmalloc(nentries * sizeof(*counterstmp)); + counterstmp = vzalloc(nentries * sizeof(*counterstmp)); if (!counterstmp) return -ENOMEM; -- 2.6.1 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next 2/4] sfc: allocate rx pages on the same node as the interrupt
On Wed, 2015-10-28 at 15:01 +, Shradha Shah wrote: > From: Daniel Pieczko> > When the interrupt servicing a channel is on a NUMA node that is > not local to the device, performance is improved by allocating > rx pages on the node local to the interrupt (remote to the device) > > The performance-optimal case, where interrupts and applications > are pinned to CPUs on the same node as the device, is not altered > by this change. > > This change gave a 1% improvement in transaction rate using Nginx > with all interrupts and Nginx threads on the node remote to the > device. It also gave a small reduction in round-trip latency, > again with the interrupt and application on a different node to > the device. > > Allocating rx pages based on the channel->irq_node value is only > valid for the initial driver-load interrupt affinities; if an > interrupt is moved later, the wrong node may be used for the > allocation. > > Signed-off-by: Shradha Shah > --- > drivers/net/ethernet/sfc/efx.c| 1 + > drivers/net/ethernet/sfc/net_driver.h | 3 +++ > drivers/net/ethernet/sfc/rx.c | 14 +- > 3 files changed, 13 insertions(+), 5 deletions(-) > > diff --git a/drivers/net/ethernet/sfc/efx.c b/drivers/net/ethernet/sfc/efx.c > index 974637d..89fbd03 100644 > --- a/drivers/net/ethernet/sfc/efx.c > +++ b/drivers/net/ethernet/sfc/efx.c > @@ -445,6 +445,7 @@ efx_alloc_channel(struct efx_nic *efx, int i, struct > efx_channel *old_channel) > channel->efx = efx; > channel->channel = i; > channel->type = _default_channel_type; > + channel->irq_node = NUMA_NO_NODE; > > for (j = 0; j < EFX_TXQ_TYPES; j++) { > tx_queue = >tx_queue[j]; > diff --git a/drivers/net/ethernet/sfc/net_driver.h > b/drivers/net/ethernet/sfc/net_driver.h > index ad56231..0ab9080a 100644 > --- a/drivers/net/ethernet/sfc/net_driver.h > +++ b/drivers/net/ethernet/sfc/net_driver.h > @@ -419,6 +419,7 @@ enum efx_sync_events_state { > * @sync_events_state: Current state of sync events on this channel > * @sync_timestamp_major: Major 
part of the last ptp sync event > * @sync_timestamp_minor: Minor part of the last ptp sync event > + * @irq_node: NUMA node of interrupt > */ > struct efx_channel { > struct efx_nic *efx; > @@ -477,6 +478,8 @@ struct efx_channel { > enum efx_sync_events_state sync_events_state; > u32 sync_timestamp_major; > u32 sync_timestamp_minor; > + > + int irq_node; > }; > > #ifdef CONFIG_NET_RX_BUSY_POLL > diff --git a/drivers/net/ethernet/sfc/rx.c b/drivers/net/ethernet/sfc/rx.c > index 3f0e129..c5ef1e8 100644 > --- a/drivers/net/ethernet/sfc/rx.c > +++ b/drivers/net/ethernet/sfc/rx.c > @@ -168,11 +168,15 @@ static int efx_init_rx_buffers(struct efx_rx_queue > *rx_queue, bool atomic) >* context in such a case. So, use __GFP_NO_WARN >* in case of atomic. >*/ > - page = alloc_pages(__GFP_COLD | __GFP_COMP | > -(atomic ? > - (GFP_ATOMIC | __GFP_NOWARN) > - : GFP_KERNEL), > -efx->rx_buffer_order); > + struct efx_channel *channel; > + > + channel = efx_rx_queue_channel(rx_queue); > + page = alloc_pages_node(channel->irq_node, __GFP_COMP | > + (atomic ? > + (GFP_ATOMIC | __GFP_NOWARN) > + : GFP_KERNEL), > + efx->rx_buffer_order); > + > if (unlikely(page == NULL)) > return -ENOMEM; > dma_addr = > Sorry, I do not understand this patch, and why the following one is not squashed on this one. irq_node is always NUMA_NO_NODE (in this patch) So you claim a 1% improvement, switching from alloc_pages(...) to alloc_pages_node(NUMA_NO_NODE, ...) ??? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
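To make the objection above concrete, here is a small sketch of the node choice, assuming the page allocator falls back to the caller's local node when it is handed NUMA_NO_NODE. `pick_alloc_node` is a hypothetical stand-in written for illustration, not a kernel function, and it ignores memory policies.

```c
#include <assert.h>

#define NUMA_NO_NODE (-1)

/*
 * Hypothetical stand-in for the node selection done inside the page
 * allocator: an explicit node is honoured, NUMA_NO_NODE means "local".
 */
static int pick_alloc_node(int requested_node, int local_node)
{
	assert(local_node >= 0);
	if (requested_node == NUMA_NO_NODE)
		return local_node;
	return requested_node;
}
```

Under that assumption, a channel whose irq_node is never changed from NUMA_NO_NODE ends up allocating on the same node as before, so any measurable gain would have to come from the patch that actually sets irq_node.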
[PATCH net-next] net: bridge: use vzalloc() instead of vmalloc() as counterstmp is not cleared before it is used in get_counters(). counterstmp might be leaked partially when it is sent to userland la
--- net/bridge/netfilter/ebtables.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c index f46ca41..26922e9 100644 --- a/net/bridge/netfilter/ebtables.c +++ b/net/bridge/netfilter/ebtables.c @@ -989,7 +989,7 @@ static int do_replace_finish(struct net *net, struct ebt_replace *repl, the check on the size is done later, when we have the lock */ if (repl->num_counters) { unsigned long size = repl->num_counters * sizeof(*counterstmp); - counterstmp = vmalloc(size); + counterstmp = vzalloc(size); if (!counterstmp) return -ENOMEM; } @@ -1410,7 +1410,7 @@ static int copy_counters_to_user(struct ebt_table *t, return -EINVAL; } - counterstmp = vmalloc(nentries * sizeof(*counterstmp)); + counterstmp = vzalloc(nentries * sizeof(*counterstmp)); if (!counterstmp) return -ENOMEM; -- 2.6.1 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC] unix: fix use-after-free in unix_dgram_poll()
Rainer Weikusat writes:
> Jason Baron writes:

[...]

>> 2)
>>
>> For the case of epoll() in edge triggered mode we need to ensure that
>> when we return -EAGAIN from unix_dgram_sendmsg() when unix_recvq_full()
>> is true, we need to add a unix_peer_wake_connect() call to guarantee a
>> wakeup. Otherwise, we are going to potentially hang there.
>
> I consider this necessary.

(As already discussed privately) just doing this would open up another way for sockets to be enqueued on the peer_wait queue of the peer forever even though no one wants to be notified of write space availability. Here's another RFC patch addressing the issues so far plus this one by breaking the connection to the peer socket from the wake up relaying function. This has the nice additional property that the dgram_poll code becomes somewhat simpler as the "dequeued where we didn't enqueue" situation can no longer occur and the not-so-nice additional property that the connect and disconnect functions need to take the peer_wait.lock spinlock explicitly so that this lock is used to ensure that no two threads modify the private pointer of the client wait_queue_t.

I've also moved the check, possibly enqueue then recheck and possibly dequeue dance into a pair of functions as this code would be identical for both unix_dgram_poll and unix_dgram_sendmsg (I'm not really happy with the names, though).
--- --- linux-2-6.b/net/unix/af_unix.c 2015-10-28 16:06:29.581960497 + +++ linux-2-6/net/unix/af_unix.c2015-10-28 16:14:55.326065483 + @@ -115,6 +115,8 @@ #include #include +#define POLL_OUT_ALL (POLLOUT | POLLWRNORM | POLLWRBAND) + static struct hlist_head unix_socket_table[UNIX_HASH_SIZE + 1]; static DEFINE_SPINLOCK(unix_table_lock); static atomic_long_t unix_nr_socks; @@ -303,6 +305,117 @@ found: return s; } +/* + * Support code for asymmetrically connected dgram sockets + * + * If a datagram socket is connected to a socket not itself connected + * to the first socket (eg, /dev/log), clients may only enqueue more + * messages if the present receive queue of the server socket is not + * "too large". This means there's a second writability condition poll + * and sendmsg need to test. The dgram recv code will do a wake up on + * the peer_wait wait queue of a socket upon reception of a datagram + * which needs to be propagated to sleeping writers since these might + * not yet have sent anything. This can't be accomplished via + * poll_wait because the lifetime of the server socket might be less + * than that of its clients if these break their association with it + * or if the server socket is closed while clients are still connected + * to it and there's no way to inform "a polling implementation" that + * it should let go of a certain wait queue + * + * In order to achieve wake up propagation, a wait_queue_t of the + * client socket is thus enqueued on the peer_wait queue of the server + * socket whose wake function does a wake_up on the ordinary client + * socket wait queue. This connection is established whenever a write + * (or poll for write) hit the flow control condition and broken when + * the connection to the server socket is dissolved or after a wake up + * was relayed. 
+ */ + +static int unix_dgram_peer_wake_relay(wait_queue_t *q, unsigned mode, int flags, + void *key) +{ + struct unix_sock *u; + wait_queue_head_t *u_sleep; + + u = container_of(q, struct unix_sock, peer_wake); + + __remove_wait_queue(&unix_sk(u->peer_wake.private)->peer_wait, + &u->peer_wake); + u->peer_wake.private = NULL; + + /* relaying can only happen while the wq still exists */ + u_sleep = sk_sleep(&u->sk); + if (u_sleep) + wake_up_interruptible_poll(u_sleep, key); + + return 0; +} + +static int unix_dgram_peer_wake_connect(struct sock *sk, struct sock *other) +{ + struct unix_sock *u, *u_other; + int rc; + + u = unix_sk(sk); + u_other = unix_sk(other); + rc = 0; + + spin_lock(&u_other->peer_wait.lock); + + if (!u->peer_wake.private) { + u->peer_wake.private = other; + __add_wait_queue(&u_other->peer_wait, &u->peer_wake); + + rc = 1; + } + + spin_unlock(&u_other->peer_wait.lock); + return rc; +} + +static int unix_dgram_peer_wake_disconnect(struct sock *sk, struct sock *other) +{ + struct unix_sock *u, *u_other; + int rc; + + u = unix_sk(sk); + u_other = unix_sk(other); + rc = 0; + + spin_lock(&u_other->peer_wait.lock); + + if (u->peer_wake.private == other) { + __remove_wait_queue(&u_other->peer_wait, &u->peer_wake); + u->peer_wake.private = NULL; + + rc = 1; + } + + spin_unlock(&u_other->peer_wait.lock); + return rc; +} + +static inline int
Re: [RFC] unix: fix use-after-free in unix_dgram_poll()
On 10/28/2015 12:46 PM, Rainer Weikusat wrote:
> Rainer Weikusat writes:
>> Jason Baron writes:
>
> [...]
>
>>> 2)
>>>
>>> For the case of epoll() in edge triggered mode we need to ensure that
>>> when we return -EAGAIN from unix_dgram_sendmsg() when unix_recvq_full()
>>> is true, we need to add a unix_peer_wake_connect() call to guarantee a
>>> wakeup. Otherwise, we are going to potentially hang there.
>>
>> I consider this necessary.
>
> (As already discussed privately) just doing this would open up another
> way for sockets to be enqueued on the peer_wait queue of the peer
> forever despite no one wants to be notified of write space
> availability. Here's another RFC patch addressing the issues so far plus
> this one by breaking the connection to the peer socket from the wake up
> relaying function. This has the nice additional property that the
> dgram_poll code becomes somewhat simpler as the "dequeued where we
> didn't enqueue" situation can no longer occur and the not-so-nice
> additional property that the connect and disconnect functions need to
> take the peer_wait.lock spinlock explicitly so that this lock is used to
> ensure that no two threads modify the private pointer of the client
> wait_queue_t.

Hmmm...I thought these were already all guarded by unix_state_lock(sk). In any case, the rest of the patch overall looks good to me.

Thanks,

-Jason
Re: [PATCH net-next 4/4] sfc: set and clear interrupt affinity hints
Hello.

On 10/28/2015 06:02 PM, Shradha Shah wrote:

> From: Bert Kenward
>
> Use cpumask_local_spread to provide interrupt affinity hints
> for each queue. This will spread interrupts across NUMA local
> CPUs first, extending to remote nodes if needed.
>
> Signed-off-by: Shradha Shah
> ---
>  drivers/net/ethernet/sfc/efx.c | 35 +++
>  1 file changed, 35 insertions(+)
>
> diff --git a/drivers/net/ethernet/sfc/efx.c b/drivers/net/ethernet/sfc/efx.c
> index 84f9e90..93c4c0e 100644
> --- a/drivers/net/ethernet/sfc/efx.c
> +++ b/drivers/net/ethernet/sfc/efx.c
> @@ -1489,6 +1489,30 @@ static int efx_probe_interrupts(struct efx_nic *efx)
>  	return 0;
>  }
>
> +#if defined(CONFIG_SMP)
> +static void efx_set_interrupt_affinity(struct efx_nic *efx)
> +{
> +	struct efx_channel *channel;
> +	unsigned int cpu;
> +
> +	efx_for_each_channel(channel, efx) {
> +		cpu = cpumask_local_spread(channel->channel,
> +					   pcibus_to_node(efx->pci_dev->bus));
> +
> +		irq_set_affinity_hint(channel->irq, cpumask_of(cpu));
> +		channel->irq_mem_node = cpu_to_mem(cpu);
> +	}
> +}
> +
> +static void efx_clear_interrupt_affinity(struct efx_nic *efx)
> +{
> +	struct efx_channel *channel;
> +
> +	efx_for_each_channel(channel, efx)
> +		irq_set_affinity_hint(channel->irq, NULL);
> +}
> +#endif /* CONFIG_SMP */
> +
>  static int efx_soft_enable_interrupts(struct efx_nic *efx)
>  {
>  	struct efx_channel *channel, *end_channel;
> @@ -2932,6 +2956,9 @@ static void efx_pci_remove_main(struct efx_nic *efx)
>  	cancel_work_sync(&efx->reset_work);
>
>  	efx_disable_interrupts(efx);
> +#if defined(CONFIG_SMP)
> +	efx_clear_interrupt_affinity(efx);
> +#endif

   Please just define an empty function for the !SMP case instead of the ugly #ifdef'fery in the function bodies.

[...]

MBR, Sergei
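A minimal sketch of the pattern requested in this review, written as a self-contained user-space model. `struct nic` and the `nic_*` functions are hypothetical names invented for illustration; they are not the sfc driver's code, and the "affinity hint" is reduced to a flag.

```c
#include <assert.h>

/* Build with -DCONFIG_SMP to select the real helpers. */

struct nic {
	int affinity_hints_set; /* hypothetical state, stands in for per-IRQ hints */
};

#ifdef CONFIG_SMP
static void nic_set_interrupt_affinity(struct nic *nic)
{
	nic->affinity_hints_set = 1;
}

static void nic_clear_interrupt_affinity(struct nic *nic)
{
	nic->affinity_hints_set = 0;
}
#else
/* !SMP: empty stubs, so the callers below need no #ifdef */
static inline void nic_set_interrupt_affinity(struct nic *nic) { (void)nic; }
static inline void nic_clear_interrupt_affinity(struct nic *nic) { (void)nic; }
#endif

/* Call sites are identical in both configurations. */
static void nic_probe(struct nic *nic)
{
	assert(nic);
	nic_set_interrupt_affinity(nic);
}

static void nic_remove(struct nic *nic)
{
	assert(nic);
	nic_clear_interrupt_affinity(nic);
}
```

The conditional compilation is confined to one place next to the definitions, and the stubs compile away to nothing, so the probe and remove paths read the same with or without SMP.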
Re: [PATCH net-next] net: bridge: use vzalloc() instead of vmalloc() as counterstmp is not cleared before it is used in get_counters(). counterstmp might be leaked partially when it is sent to userlan
On Wed, Oct 28, 2015 at 09:10:20PM +0300, Sergei Shtylyov wrote:
> Hello.
>
>    Your subject is too long, it should have been partially placed in the
> changelog. You didn't sign off on the patch, so it can't be applied.
>
> MBR, Sergei

Thank you. Please reject this patch. I re-sent a proper one in another mail.
Re: [PATCH net-next] net: bridge: use vzalloc() instead of vmalloc() as counterstmp is not cleared before it is used in get_counters(). counterstmp might be leaked partially when it is sent to userlan
Please reject this patch. I sent a proper one with the sign-off later on.
[PATCH net 1/1] tipc: linearize arriving NAME_DISTR and LINK_PROTO buffers
Testing of the new UDP bearer has revealed that reception of NAME_DISTRIBUTOR, LINK_PROTOCOL/RESET and LINK_PROTOCOL/ACTIVATE message buffers is not prepared for the case that those may be non-linear. We now linearize all such buffers before they are delivered up to the generic reception layer. In order for the commit to apply cleanly to 'net' and 'stable', we do the change in the function tipc_udp_recv() for now. Later, we will post a commit to 'net-next' moving the linearization to generic code, in tipc_named_rcv() and tipc_link_proto_rcv(). Fixes: commit d0f91938bede ("tipc: add ip/udp media type") Signed-off-by: Jon Maloy--- net/tipc/udp_media.c | 5 + 1 file changed, 5 insertions(+) diff --git a/net/tipc/udp_media.c b/net/tipc/udp_media.c index 6e648d9..cd7c5f1 100644 --- a/net/tipc/udp_media.c +++ b/net/tipc/udp_media.c @@ -48,6 +48,7 @@ #include #include "core.h" #include "bearer.h" +#include "msg.h" /* IANA assigned UDP port */ #define UDP_PORT_DEFAULT 6118 @@ -222,6 +223,10 @@ static int tipc_udp_recv(struct sock *sk, struct sk_buff *skb) { struct udp_bearer *ub; struct tipc_bearer *b; + int usr = msg_user(buf_msg(skb)); + + if ((usr == LINK_PROTOCOL) || (usr == NAME_DISTRIBUTOR)) + skb_linearize(skb); ub = rcu_dereference_sk_user_data(sk); if (!ub) { -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
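As background for the change above, here is a rough user-space sketch of what "linearizing" a buffer means: data scattered over fragments is copied into one contiguous area so that later code can read it with plain pointer arithmetic. `struct pkt`, `struct frag` and `pkt_linearize` are hypothetical and far simpler than the kernel's sk_buff; the sketch returns 0 or -ENOMEM so that a caller can react to allocation failure.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define MAX_FRAGS 4

struct frag {
	const unsigned char *data;
	size_t len;
};

/* Hypothetical, much simplified stand-in for a non-linear packet buffer. */
struct pkt {
	unsigned char *head;	/* contiguous (linear) part, heap allocated */
	size_t head_len;
	struct frag frags[MAX_FRAGS];
	size_t nr_frags;
};

/* Pull every fragment into the linear area; returns 0 or -ENOMEM. */
static int pkt_linearize(struct pkt *p)
{
	size_t total = p->head_len;
	size_t off = p->head_len;
	unsigned char *buf;
	size_t i;

	assert(p->nr_frags <= MAX_FRAGS);
	if (p->nr_frags == 0)
		return 0;		/* already linear */

	for (i = 0; i < p->nr_frags; i++)
		total += p->frags[i].len;
	if (total == 0) {		/* only empty fragments, nothing to copy */
		p->nr_frags = 0;
		return 0;
	}

	buf = realloc(p->head, total);
	if (!buf)
		return -ENOMEM;		/* original packet is left untouched */

	for (i = 0; i < p->nr_frags; i++) {
		if (p->frags[i].len)
			memcpy(buf + off, p->frags[i].data, p->frags[i].len);
		off += p->frags[i].len;
	}
	p->head = buf;
	p->head_len = total;
	p->nr_frags = 0;
	return 0;
}
```

After a successful call the whole message is reachable through `head`, which is the property the protocol and name-distribution receive paths rely on.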
[PATCH net-next] ipv4: use l4 hash for locally generated multipath flows
This patch changes how the multipath hash is computed for locally generated UDP or TCP flows: the hash now also includes L4 information (source and destination port).

This allows better utilization of the available paths when the existing flows have the same source IP and the same destination IP: with the L3 hash, even when multiple connections are in place simultaneously, a single path will be used, while with the L4 hash we can use all the available paths.

Signed-off-by: Paolo Abeni
---
 include/net/ip_fib.h     | 12 ++++++++++++
 net/ipv4/fib_semantics.c |  3 ++-
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index ac5c6e8..56bf68c 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -328,6 +328,18 @@ static inline int fib_multipath_hash(__be32 saddr, __be32 daddr)
 	return jhash_2words(saddr, daddr, fib_multipath_secret) >> 1;
 }
 
+static inline int fib_multipath_output_hash(const struct flowi4 *fl4)
+{
+	if ((fl4->flowi4_proto == IPPROTO_TCP) ||
+	    (fl4->flowi4_proto == IPPROTO_UDP))
+		return jhash_3words(fl4->saddr, fl4->daddr,
+				    *((__u32 *)&fl4->uli.ports),
+				    fib_multipath_secret) >> 1;
+
+	return jhash_2words(fl4->saddr, fl4->daddr, fib_multipath_secret) >> 1;
+}
+
+
 void fib_select_multipath(struct fib_result *res, int hash);
 void fib_select_path(struct net *net, struct fib_result *res,
 		     struct flowi4 *fl4, int mp_hash);
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index 42778d9..8a18349 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -1564,7 +1564,8 @@ void fib_select_path(struct net *net, struct fib_result *res,
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
 	if (res->fi->fib_nhs > 1 && fl4->flowi4_oif == 0) {
 		if (mp_hash < 0)
-			mp_hash = fib_multipath_hash(fl4->saddr, fl4->daddr);
+			mp_hash = fib_multipath_output_hash(fl4);
+
 		fib_select_multipath(res, mp_hash);
 	}
 	else
--
1.8.3.1
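The effect described in the changelog can be illustrated with a small stand-alone sketch. Everything here is an assumption made for illustration: `mix32` is a simple deterministic mixer and NOT the kernel's jhash, `struct flow`, `hash_l3`, `hash_l4` and `pick_path` are hypothetical names, and the final right shift used by the patch is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in 32-bit mixer; not jhash, just something deterministic. */
static uint32_t mix32(uint32_t h, uint32_t v)
{
	h ^= v;
	h *= 0x9e3779b1u;
	h ^= h >> 15;
	return h;
}

struct flow {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
};

/* L3 hash: only the addresses contribute. */
static uint32_t hash_l3(const struct flow *f, uint32_t secret)
{
	return mix32(mix32(secret, f->saddr), f->daddr);
}

/* L4 hash: addresses plus both ports packed into one word. */
static uint32_t hash_l4(const struct flow *f, uint32_t secret)
{
	uint32_t ports = ((uint32_t)f->sport << 16) | f->dport;

	return mix32(mix32(mix32(secret, f->saddr), f->daddr), ports);
}

/* Map a hash onto one of npaths next hops. */
static unsigned int pick_path(uint32_t hash, unsigned int npaths)
{
	assert(npaths > 0);
	return hash % npaths;
}
```

Two connections between the same pair of hosts always get the same `hash_l3` value and therefore the same path, while `hash_l4` gives them different hash values, so `pick_path` has a chance to spread them over the available next hops.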
[PATCH 1/1 net-next] net: bridge: use vzalloc() instead of vmalloc() for counterstmp
counterstmp is not cleared before it is used in get_counters(). It might be leaked partially when it is sent to userland later on.

Signed-off-by: Loganaden Velvindron
---
 net/bridge/netfilter/ebtables.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c
index f46ca41..26922e9 100644
--- a/net/bridge/netfilter/ebtables.c
+++ b/net/bridge/netfilter/ebtables.c
@@ -989,7 +989,7 @@ static int do_replace_finish(struct net *net, struct ebt_replace *repl,
 	   the check on the size is done later, when we have the lock */
 	if (repl->num_counters) {
 		unsigned long size = repl->num_counters * sizeof(*counterstmp);
-		counterstmp = vmalloc(size);
+		counterstmp = vzalloc(size);
 		if (!counterstmp)
 			return -ENOMEM;
 	}
@@ -1410,7 +1410,7 @@ static int copy_counters_to_user(struct ebt_table *t,
 		return -EINVAL;
 	}
 
-	counterstmp = vmalloc(nentries * sizeof(*counterstmp));
+	counterstmp = vzalloc(nentries * sizeof(*counterstmp));
 	if (!counterstmp)
 		return -ENOMEM;
 
--
2.6.1
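A user-space analogy for the change, offered as a sketch only: `calloc()` plays the role of `vzalloc()` here, and `struct counter` and `snapshot_counters` are hypothetical names. The point is that when a temporary buffer is handed out in full but only partly filled, zeroing it at allocation time guarantees the unfilled part carries no stale memory contents.

```c
#include <assert.h>
#include <stdlib.h>

struct counter {
	unsigned long long pcnt;	/* packet count */
	unsigned long long bcnt;	/* byte count */
};

/*
 * Allocate n zero-filled counters, fill only the first `filled`
 * entries, and return the buffer as it would be copied out to the
 * caller. The caller frees the result.
 */
static struct counter *snapshot_counters(size_t n, size_t filled)
{
	struct counter *tmp;
	size_t i;

	assert(filled <= n);
	tmp = calloc(n, sizeof(*tmp));
	if (!tmp)
		return NULL;
	for (i = 0; i < filled; i++) {
		tmp[i].pcnt = i + 1;
		tmp[i].bcnt = 64 * (i + 1);
	}
	return tmp;
}
```

With a non-zeroing allocator the entries past `filled` would hold whatever the memory contained before, which is the partial leak the changelog refers to.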
Re: [PATCH net-next] net: bridge: use vzalloc() instead of vmalloc() as counterstmp is not cleared before it is used in get_counters(). counterstmp might be leaked partially when it is sent to userlan
Hello.

   Your subject is too long, it should have been partially placed in the changelog. You didn't sign off on the patch, so it can't be applied.

MBR, Sergei
RE: [PATCH net-next] hyperv: Add handler for RNDIS_STATUS_NETWORK_CHANGE event
> -----Original Message-----
> From: Richard Weinberger [mailto:richard.weinber...@gmail.com]
> Sent: Tuesday, October 27, 2015 6:36 PM
> To: David Miller
> Cc: Haiyang Zhang; o...@aepfle.de; Greg Kroah-Hartman;
> netdev@vger.kernel.org; jasow...@redhat.com;
> driverdev-de...@linuxdriverproject.org; LKML
> Subject: Re: [PATCH net-next] hyperv: Add handler for
> RNDIS_STATUS_NETWORK_CHANGE event
>
> On Mon, Jun 23, 2014 at 10:10 PM, David Miller wrote:
> > From: Haiyang Zhang
> > Date: Mon, 23 Jun 2014 16:09:59 +
> >
> >> So, what's the equivalent or similar command to "network restart" on
> >> SLES12? Could you update the command line for the usermodehelper when
> >> porting this patch to SLES 12?
> >
> > No, you are not going to keep the usermodehelper invocation in your
> > driver please remove it. It is absolutely inappropriate, and I
> > strictly do not want to keep it in there because other people will
> > copy it and then we'll have a real mess on our hands.
>
> Sorry for digging up this old thread.
> While talking with some guys about usermodehelper abuses I came across
> this gem.
> Mainline still contains that "/etc/init.d/network restart" code.
> Haiyang, care to cleanup?

I will clean up the usermode helper soon.

Thanks,
- Haiyang
Re: [PATCH 0/8] mm: memcontrol: account socket memory in unified hierarchy
On Wed, Oct 28, 2015 at 11:20:03AM +0300, Vladimir Davydov wrote: > Then you'd better not touch existing tcp limits at all, because they > just work, and the logic behind them is very close to that of global tcp > limits. I don't think one can simplify it somehow. Uhm, no, there is a crapload of boilerplate code and complication that seems entirely unnecessary. The only thing missing from my patch seems to be the part where it enters memory pressure state when the limit is hit. I'm adding this for completeness, but I doubt it even matters. > Moreover, frankly I still have my reservations about this vmpressure > propagation to skb you're proposing. It might work, but I doubt it > will allow us to throw away explicit tcp limit, as I explained > previously. So, even with your approach I think we can still need > per memcg tcp limit *unless* you get rid of global tcp limit > somehow. Having the hard limit as a failsafe (or a minimum for other consumers) is one thing, and certainly something I'm open to for cgroupv2, should we have problems with load startup up after a socket memory landgrab. That being said, if the VM is struggling to reclaim pages, or is even swapping, it makes perfect sense to let the socket memory scheduler know it shouldn't continue to increase its footprint until the VM recovers. Regardless of any hard limitations/minimum guarantees. This is what my patch does and it seems pretty straight-forward to me. I don't really understand why this is so controversial. The *next* step would be to figure out whether we can actually *reclaim* memory in the network subsystem--shrink windows and steal buffers back--and that might even be an avenue to replace tcp window limits. But it's not necessary for *this* patch series to be useful. > > So this seemed like a good way to prove a new mechanism before rolling > > it out to every single Linux setup, rather than switch everybody over > > after the limited scope testing I can do as a developer on my own. 
> > > > Keep in mind that my patches are not committing anything in terms of > > interface, so we retain all the freedom to fix and tune the way this > > is implemented, including the freedom to re-add tcp window limits in > > case the pressure balancing is not a comprehensive solution. > > I really dislike this kind of proof. It looks like you're trying to > push something you think is right covertly, w/o having a proper > discussion with networking people and then say that it just works > and hence should be done globally, but what if it won't? Revert it? > We already have a lot of dubious stuff in memcg that should be > reverted, so let's please try to avoid this kind of mistakes in > future. Note, I say "w/o having a proper discussion with networking > people", because I don't think they will really care *unless* you > change the global logic, simply because most of them aren't very > interested in memcg AFAICS. Come on, Dave is the first To and netdev is CC'd. They might not care about memcg, but "pushing things covertly" is a bit of a stretch. > That effectively means you loose a chance to listen to networking > experts, who could point you at design flaws and propose an improvement > right away. Let's please not miss such an opportunity. You said that > you'd seen this problem happen w/o cgroups, so you have a use case that > might need fixing at the global level. IMO it shouldn't be difficult to > prepare an RFC patch for the global case first and see what people think > about it. No, the problem we are running into is when network memory is not tracked per cgroup. The lack of containment means that the socket memory consumption of individual cgroups can trigger system OOM. We tried using the per-memcg tcp limits, and that prevents the OOMs for sure, but it's horrendous for network performance. There is no "stop growing" phase, it just keeps going full throttle until it hits the wall hard. 
Now, we could probably try to replicate the global knobs and add a per-memcg soft limit. But you know better than anyone else how hard it is to estimate the overall workingset size of a workload, and the margins on containerized loads are razor-thin. Performance is much more sensitive to input errors, and oftentimes parameters must be adjusted continuously during the runtime of a workload. It'd be disastrous to rely on yet more static, error-prone user input here.

What all this means to me is that fixing it on the cgroup level has higher priority. But it also means that once we have figured it out under such a high-pressure environment, it's much easier to apply to the global case and potentially replace the soft limit there. This seems like a better approach to me than starting globally, only to realize that the solution is not workable for cgroups and we need yet something else.
[BUG] Erroneous behavior in try_to_coalesce
Hello, Recently I observed 2 crashes on one of my server with the following backtraces: [22751.889645] [ cut here ] [22751.889660] WARNING: CPU: 38 PID: 12807 at net/core/skbuff.c:3498 skb_try_coalesce+0x34b/0x360() [22751.889661] Modules linked in: tcp_diag inet_diag xt_LOG xt_limit xt_addrtype xt_multiport xt_pkt type xt_conntrack netconsole act_police cls_basic sch_ingress veth ipv6 openvswitch gre vxlan ip_tun nel xt_owner xt_state iptable_mangle xt_nat iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat xt_CT nf_conntrack iptable_raw ext2 dm_thin_pool dm_bio_prison dm_persistent_data dm_bufio dm_mirror dm_region_hash dm_log ixgbe i2c_i801 lpc_ich mfd_core igb i2c_algo_bit ioapic ses enclosure ioatdma dca ipmi_devintf ipmi_si ipmi_msghandler aacraid [22751.889704] CPU: 38 PID: 12807 Comm: handler22 Not tainted 3.12.49-clouder2 #2 [22751.889706] Hardware name: Supermicro PIO-617R-TLN4F+-ST031/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.0b 05/27/2014 [22751.889708] 0daa 883fff4839e8 81643c91 0daa [22751.889716] 883fff483a28 81089acc 883fff483b68 [22751.889721] 8832bd282b00 882e6b0190e8 883fff483aa4 05b4 [22751.889726] Call Trace: [22751.889728][] dump_stack+0x58/0x7f [22751.889739] [] warn_slowpath_common+0x8c/0xc0 [22751.889742] [] warn_slowpath_null+0x1a/0x20 [22751.889745] [] skb_try_coalesce+0x34b/0x360 [22751.889752] [] tcp_try_coalesce+0x69/0xc0 [22751.889755] [] tcp_queue_rcv+0x53/0x130 [22751.889758] [] tcp_data_queue+0x1d3/0xd40 [22751.889761] [] tcp_rcv_established+0x319/0x5e0 [22751.889767] [] ? nf_nat_ipv4_fn+0x1e1/0x270 [iptable_nat] [22751.889771] [] tcp_v4_do_rcv+0x152/0x3d0 [22751.889777] [] ? security_sock_rcv_skb+0x16/0x20 [22751.889781] [] ? sk_filter+0x37/0xf0 [22751.889784] [] tcp_v4_rcv+0x6b7/0x730 [22751.889787] [] ? ip_rcv+0x3a0/0x3a0 [22751.889791] [] ? nf_hook_slow+0x85/0x130 [22751.889794] [] ? 
ip_rcv+0x3a0/0x3a0 [22751.889796] [] ip_local_deliver_finish+0xc2/0x250 [22751.889799] [] ip_local_deliver+0x88/0x90 [22751.889802] [] ip_rcv_finish+0x119/0x380 [22751.889804] [] ip_rcv+0x2c5/0x3a0 [22751.889809] [] ? netdev_frame_hook+0xb5/0x130 [openvswitch] [22751.889815] [] __netif_receive_skb_core+0x626/0x7e0 [22751.889818] [] __netif_receive_skb+0x27/0x70 [22751.889820] [] process_backlog+0xd9/0x1e0 [22751.889823] [] net_rx_action+0x12c/0x280 [22751.889828] [] __do_softirq+0x137/0x2e0 [22751.889832] [] call_softirq+0x1c/0x30 [22751.889833][] do_softirq+0x8d/0xc0 [22751.889843] [] ? ovs_packet_cmd_execute+0x217/0x250 [openvswitch] [22751.889846] [] local_bh_enable+0xdb/0xf0 [22751.889849] [] ovs_packet_cmd_execute+0x217/0x250 [openvswitch] [22751.889853] [] genl_family_rcv_msg+0x221/0x390 [22751.889856] [] ? genl_family_rcv_msg+0x390/0x390 [22751.889858] [] genl_rcv_msg+0x63/0xb0 [22751.889861] [] netlink_rcv_skb+0xa9/0xd0 [22751.889864] [] genl_rcv+0x2c/0x40 [22751.889867] [] netlink_unicast+0x10f/0x190 [22751.889869] [] netlink_sendmsg+0x2bb/0x650 [22751.889874] [] ? __pollwait+0xf0/0xf0 [22751.889881] [] sock_sendmsg+0x90/0xc0 [22751.889883] [] ? __pollwait+0xf0/0xf0 [22751.889887] [] ? local_bh_enable_ip+0x87/0xf0 [22751.889890] [] ? _raw_spin_unlock_bh+0x24/0x30 [22751.889894] [] ? verify_iovec+0x8d/0x110 [22751.889898] [] ___sys_sendmsg+0x417/0x440 [22751.889904] [] ? 
ep_poll+0x144/0x370

And then later the actual crash occurred:

[44923.628546] BUG: unable to handle kernel paging request at 00820299
[44923.629139] IP: [] kfree_skb_list+0x18/0x30
[44923.629463] PGD 35cc3b5067 PUD 0
[44923.629823] Oops: [#1] SMP
[44923.630182] Modules linked in: tcp_diag inet_diag xt_LOG xt_limit xt_addrtype xt_multiport xt_pkttype xt_conntrack netconsole act_police cls_basic sch_ingress veth ipv6 openvswitch gre vxlan ip_tunnel xt_owner xt_state iptable_mangle xt_nat iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat xt_CT nf_conntrack iptable_raw ext2 dm_thin_pool dm_bio_prison dm_persistent_data dm_bufio dm_mirror dm_region_hash dm_log ixgbe i2c_i801 lpc_ich mfd_core igb i2c_algo_bit ioapic ses enclosure ioatdma dca ipmi_devintf ipmi_si ipmi_msghandler aacraid
[44923.634368] CPU: 10 PID: 39391 Comm: kworker/u80:0 Tainted: G W 3.12.49-clouder2 #2
[44923.634851] Hardware name: Supermicro PIO-617R-TLN4F+-ST031/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.0b 05/27/2014
[44923.635340] Workqueue: dm-thin do_worker [dm_thin_pool]
[44923.635653] task: 881918cb0810 ti: 880d5a4ea000 task.ti: 880d5a4ea000
[44923.635926] RIP: 0010:[] [] kfree_skb_list+0x18/0x30
[44923.636251] RSP: 0018:883fff003cd0 EFLAGS: 00010206
[44923.636521] RAX: RBX: 882e5622be00 RCX: 883fd12b9800
[44923.636791] RDX: 0100 RSI: 0040 RDI: 00820299
[44923.637064] RBP: 883fff003ce0 R08: 00dc R09: 0003
[44923.637336] R10: 0003 R11:
Re: [BUG] Erroneous behavior in try_to_coalesce
On Thu, 2015-10-29 at 04:19 +0900, Nikolay Borisov wrote:

> Could you please comment whether it looks viable so that I can resend
> as a proper fix? Also the interesting question is what kind of packets
> could trigger this warn_on_once? In both traces ovs_packet_cmd_execute
> is present so I suspect it might be possible that somehow openvswitch is
> injecting wrong packets which make the kernel crash.

The bug is in the packet producer, not in try_to_coalesce().

This issue comes up on netdev from time to time...

The WARN_ON() in try_to_coalesce() is an attempt to detect a producer that lied about truesize, which can lead to OOM in case of abuse.

Do not paper over the bug, find the root cause and fix it, thanks.
RE: [Intel-wired-lan] [net PATCH v2] ixgbe: Reset interface after enabling SR-IOV
-----Original Message-----
From: Intel-wired-lan [mailto:intel-wired-lan-boun...@lists.osuosl.org] On Behalf Of Alexander Duyck
Sent: Tuesday, October 20, 2015 1:28 PM
To: netdev@vger.kernel.org; intel-wired-...@lists.osuosl.org
Subject: [Intel-wired-lan] [net PATCH v2] ixgbe: Reset interface after enabling SR-IOV

Enabling SR-IOV and then bringing the interface up was resulting in the PF MAC addresses getting into a bad state. Specifically the MAC address was enabled for both VF 0 and the PF. This resulted in some odd behaviors such as VF 0 receiving a copy of the PF's traffic, which in turn enables the ability for VF 0 to spoof the PF.

A workaround for this issue appears to be to bring up the interface first and then enable SR-IOV as this way the reset is then triggered in the existing code.

In order to correct this I have added a change to ixgbe_setup_tc where if the interface is down we still will at least call ixgbe_reset so that the MAC addresses for the device are reset to the correct pools.

Steps to reproduce issue:
    modprobe ixgbe
    echo 7 > /sys/bus/pci/devices/\:01\:00.1/sriov_numvfs
    ifconfig enp1s0f1 up
    ethregs -s 1:00.1 | grep MPSAR | grep -v

Result:
    MPSAR[0] 0081
    MPSAR[254] 0001

Expected Result, behavior after patch:
    MPSAR[0] 0080
    MPSAR[254] 0080

Signed-off-by: Alexander Duyck
---

Tested-by: Darin Miller
[GIT] [4.4] NFC update
Hi David, This is the NFC pull request for 4.4. It's a bit bigger than usual, the 3 main culprits being: - A new driver for Intel's Fields Peak NCI chipset. In order to support this chipset we had to export a few NCI routines and extend the driver NCI ops to not only support proprietary commands but also core ones. - Support for vendor commands for both STM drivers, st-nci and st21nfca. Those vendor commands allow to run factory tests through the NFC netlink interface. - New i2c and SPI support for the Marvell driver, together with firmware download support for this driver's core. Besides that we also have: - A few file renames in the STM drivers, to keep the naming consistent between drivers. - Some improvements and fixes on the NCI HCI layer, mostly to properly reach a secure element over a legacy HCI link. - A few fixes for the s3fwrn5 and trf7970a drivers. The following changes since commit f6d3125fa3c2f55ddf7cf69365c41089de6cfae6: Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (2015-10-02 07:21:25 -0700) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next.git tags/nfc-next-4.4-2 for you to fetch changes up to f11631748ee6973f85238109a3fa8ab8e760e5a4: NFC: nci: non-static functions can not be inline (2015-10-28 06:44:45 +0100) Axel Lin (2): nfc: s3fwrn5: Make NFC_S3FWRN5 select CRYPTO nfc: s3fwrn5: i2c: Use devm_request_threaded_irq to avoid irq leak Christophe JAILLET (1): NFC: nfcwilink: Drop a useless static qualifier Christophe Ricard (35): NFC: st-nci: Align st-nci driver with other nfc driver NFC: st-nci: include st-nci.h instead of ndlc.h NFC: st21nfca: Align st21nfca driver with other nfc driver NFC: st-nci: Fix incorrect spi buffer size NFC: nci: Fix incorrect data chaining when sending data NFC: nci: Fix improper management of HCI return code NFC: nci: extract pipe value using NCI_HCP_MSG_GET_PIPE NFC: nci: add nci_hci_clear_all_pipes functions NFC: nci: Call nci_hci_clear_all_pipes at 
HCI initial activation. NFC: nci: Create pipe on specific gate in nci_hci_connect_gate NFC: st-nci: Remove HCI init_data.gates initialization in load_session NFC: st21nfca: Remove HCI gates initialization in load_session NFC: st-nci: Open NCI_HCI_LINK_MGMT_PIPE NFC: st21nfca: Open NFC_HCI_LINK_MGMT_PIPE NFC: st-nci: Keep st_nci_gates unchanged in load_session NFC: st21nfca: Keep st21nfca_gates unchanged in load_session NFC: st-nci: initialize gate_count in st_nci_hci_network_init NFC: st-nci: Add support for NCI_HCI_IDENTITY_MGMT_GATE NFC: st-nci: Fix st_nci_gates offset NFC: st21nfca: Fix st21nfca_gates offset NFC: st-nci: Add support for proprietary commands NFC: st-nci: Add error messages when an unexpected HCI event occurs NFC: netlink: Add missing NFC_ATTR comments NFC: st-nci: Add ese-present/uicc-present dts properties NFC: st-nci: Increase delay between 2 secure element activations NFC: st-nci: Fix host_list verification after SE activation NFC: st21nfca: Fix host_list verification after SEactivation NFC: netlink: Add mode parameter to deactivate_target functions NFC: st-nci: Add few code style fixes NFC: st21nfca: Add few code style fixes NFC: st21nfca: Add error messages for unexpected HCI events NFC: st-nci: Disable irq when powering the device up NFC: st-nci: remove duplicated skb dump NFC: st-nci: Replace st21nfcb by st_nci in makefile NFC: st21nfca: Add support for proprietary commands Javier Martinez Canillas (1): NFC: trf7970a: Add OF match table Jean Delvare (3): NFC: pn544: Auto-select core module NFC: microread: Auto-select core module NFC: nfcmrvl: Auto-select core module Julia Lawall (2): NFC: nxp-nci: constify nxp_nci_phy_ops structure NFC: delete null dereference Robert Dolca (11): NFC: nci: Export nci data send API NFC: nci: Add function to get max packet size for conn NFC: nci: Introduce new core opcodes NFC: nci: Do not call post_setup when setup fails NFC: nci: Introduce nci_core_cmd NFC: nci: Allow the driver to set handler for core nci 
ops NFC: nci: rename nci_prop_ops to nci_driver_ops NFC: nci: fix possible crash in nci_core_conn_create NFC: nci: add nci_get_conn_info_by_id function NFC: Add Intel Fields Peak NFC solution driver NFC: nci: non-static functions can not be inline Samuel Ortiz (2): NFC: nci: Use __nci_request for exported routines NFC: st-nci: Rename st-nci_se.c Valentin Rothberg (1): NFC: s3fwrn5: Remove superfluous cflags Vincent Cuissard (9): NFC: nfcmrvl: remove unneeded version defines NFC: NCI: export nci_send_frame and nci_send_cmd
Re: [Bug 106241] New: shutdown(3)/close(3) behaviour is incorrect for sockets in accept(3)
On Thu, 2015-10-29 at 00:15 +0000, Al Viro wrote:
> On Wed, Oct 28, 2015 at 04:08:29PM -0700, Eric Dumazet wrote:
> > > > Except for legacy stuff and stdin/stdout/stderr games, I really doubt
> > > > lot of applications absolutely rely on the POSIX thing...
> > >
> > > We obviously can't turn that into default behaviour, though. BTW, what
> > > distribution do you have in mind for those random descriptors? Uniform
> > > on [0,INT_MAX] is a bad idea for obvious reasons - you'll blow the
> > > memory footprint pretty soon...
> >
> > Simply [0 , fdt->max_fds] is working well in most cases.
>
> Umm... So first you dup2() to establish the ->max_fds you want, then
> do such opens?

Yes, dup2() is done at program startup, knowing the expected max load (in terms of concurrent fds) + ~10 % (the actual fd array size can be more than this because of power-of-two rounding in alloc_fdtable()).

But this is an optimization: if you do not use the initial dup2(), the fd array can be automatically expanded if needed (all slots are in use).

> What used/unused ratio do you expect to deal with?
> And what kind of locking are you going to use? Keep in mind that
> e.g. dup2() is dependent on the lack of allocations while it's working,
> so it's not as simple as "we don't need no stinkin' ->files_lock"...

No locking change. files->file_lock is still taken.

We only want to minimize time to find an empty slot.

The trick is to not start bitmap search at files->next_fd, but a random point. This is a win if we assume there are enough holes.

	low = start;
	if (low < files->next_fd)
		low = files->next_fd;

	res = -1;
	if (flags & O_FD_FASTALLOC) {
		random_point = pick_random_between(low, fdt->max_fds);

		res = find_next_zero_bit(fdt->open_fds, fdt->max_fds,
					 random_point);
		/* No empty slot found, try the other range */
		if (res >= fdt->max_fds) {
			res = find_next_zero_bit(fdt->open_fds,
						 random_point, low);
			if (res >= random_point)
				res = -1;
		}
	}
	...
Re: [Bug 106241] New: shutdown(3)/close(3) behaviour is incorrect for sockets in accept(3)
On Wed, Oct 28, 2015 at 08:29:41PM -0700, Eric Dumazet wrote:

> But this is an optimization : If you do not use the initial dup2(), the
> fd array can be automatically expanded if needed (all slots are in use)

Whee...

> No locking change. files->file_lock is still taken.
>
> We only want to minimize time to find an empty slot.

Then I'd say that my variant is going to win. It *will* lead to cacheline pingpong in more cases than yours, but I'm quite sure that it will be a win as far as the total amount of cachelines accessed.

> The trick is to not start bitmap search at files->next_fd, but a random
> point. This is a win if we assume there are enough holes.
>
> 	low = start;
> 	if (low < files->next_fd)
> 		low = files->next_fd;
>
> 	res = -1;
> 	if (flags & O_FD_FASTALLOC) {
> 		random_point = pick_random_between(low, fdt->max_fds);
>
> 		res = find_next_zero_bit(fdt->open_fds, fdt->max_fds,
> 					 random_point);
> 		/* No empty slot found, try the other range */
> 		if (res >= fdt->max_fds) {
> 			res = find_next_zero_bit(fdt->open_fds,
> 						 random_point, low);
> 			if (res >= random_point)
> 				res = -1;
> 		}
> 	}

Have you tried to experiment with that in userland? I mean, emulate that thing in normal userland code, count the cacheline accesses and drive it with the use patterns collected from actual applications.

I can sit down and play with math expectations, but I suspect that it's easier to experiment. It's nothing but an intuition (I hadn't seriously done probability theory in quite a while, and my mathematical tastes run more to geometry and topology anyway), but... I would expect it to degrade badly when the bitmap is reasonably dense.

Note, BTW, that vmalloc'ed memory gets populated as you read it, and it's not cheap - it's done via #PF triggered in kernel mode, with handler noticing that the faulting address is in vmalloc range and doing the right thing. IOW, if your bitmap is very sparse, the price of page faults needs to be taken into account.

AFAICS, the only benefit of that thing is keeping dirtied cachelines far from each other. Which might be a win overall, but I'm not convinced that the rest won't offset the effect of that...
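
As a starting point for the userland experiment suggested in this message, here is a minimal sketch; it is not code from this thread. It assumes 64-byte cachelines. next_zero_bit() is a hypothetical stand-in that follows the argument order of the kernel's find_next_zero_bit(addr, size, offset); alloc_from() and lines_touched are illustrative names. The start point is passed in by the caller, so a driver can feed it either random values or a recorded trace and compare the number of cachelines examined.

```c
#include <limits.h>
#include <stddef.h>

#define BITS_PER_WORD      (sizeof(unsigned long) * CHAR_BIT)
#define BITS_PER_CACHELINE 512	/* assumption: 64-byte cachelines */

/* Cachelines examined so far, counted once per line per search pass. */
static unsigned long lines_touched;

/*
 * Userland stand-in for find_next_zero_bit(addr, size, offset):
 * index of the first clear bit in [offset, size), or size if none.
 */
static size_t next_zero_bit(const unsigned long *map, size_t size, size_t offset)
{
	size_t last_line = (size_t)-1;

	for (size_t i = offset; i < size; i++) {
		size_t line = i / BITS_PER_CACHELINE;

		if (line != last_line) {
			lines_touched++;
			last_line = line;
		}
		if (!(map[i / BITS_PER_WORD] & (1UL << (i % BITS_PER_WORD))))
			return i;
	}
	return size;
}

/*
 * Search [start, max) first, then wrap around to [low, start).
 * Marks the slot as used and returns it, or -1 if no slot is free.
 */
static long alloc_from(unsigned long *map, size_t low, size_t max, size_t start)
{
	size_t res = next_zero_bit(map, max, start);

	if (res >= max) {
		res = next_zero_bit(map, start, low);
		if (res >= start)
			return -1;
	}
	map[res / BITS_PER_WORD] |= 1UL << (res % BITS_PER_WORD);
	return (long)res;
}
```

Resetting lines_touched between runs and replaying the same open/close trace with start fixed at the lowest-free hint versus a random value gives a rough comparison of the two policies at different bitmap densities.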
Re: [V5, 2/6] fsl/fman: Add FMan support
On Tue, 2015-10-27 at 11:32 -0500, Liberman Igal-B31950 wrote:
> > > +
> > > +struct device *fman_get_device(struct fman *fman) {
> > > +	return fman->dev;
> > > +}
> >
> > Is this really necessary?
> >
>
> Fman port needs fman->dev, fman structure is opaque, so yes, it's needed.

Why is opacity being maintained from one part of the fman driver to another? Isn't this the sort of excessive layering that was complained about?

> > > +	/* In B4 rev 2.0 (and above) the MURAM size is 512KB.
> > > +	 * Check the SVR and update MURAM size if required.
> > > +	 */
> > > +	u32 svr;
> > > +
> > > +	svr = mfspr(SPRN_SVR);
> > > +
> > > +	if ((SVR_SOC_VER(svr) == SVR_B4860) && (SVR_MAJ(svr) >= 2))
> > > +		fman->dts_params.muram_size = 0x80000;
> > > +	}
> >
> > Why wasn't the MURAM size described in the device tree, as it was with
> > CPM/QE?
> >
>
> MURAM size described by the device-tree.
> In B4860 rev 2.0 (and above) MURAM size is bigger.
> This is workaround, in order to have the same device tree for all B4860
> revisions.

We don't support b4860 prior to rev 2.0 (due to e6500 core errata) so this is irrelevant. Fix the device tree.

> > > +
> > > +	of_node_put(muram_node);
> > > +	of_node_put(fm_node);
> > > +
> > > +	err = devm_request_irq(_dev->dev, irq, fman_irq,
> > > +			       IRQF_NO_SUSPEND, "fman", fman);
> > > +	if (err < 0) {
> > > +		pr_err("Error: allocating irq %d (error = %d)\n", irq, err);
> > > +		goto fman_free;
> > > +	}
> >
> > Why IRQF_NO_SUSPEND?
> >
>
> It shouldn't be IRQF_NO_SUSPEND for now, removed.

Why just "for now"?

-Scott
Re: [Bug 106241] New: shutdown(3)/close(3) behaviour is incorrect for sockets in accept(3)
On Wed, 2015-10-28 at 21:13 +, Al Viro wrote: > On Wed, Oct 28, 2015 at 07:47:57AM -0700, Eric Dumazet wrote: > > On Wed, 2015-10-28 at 06:24 -0700, Eric Dumazet wrote: > > > > > Before I take a deep look at your suggestion, are you sure plain use of > > > include/linux/percpu-refcount.h infra is not possible for struct cred ? > > > > BTW, I am not convinced we need to spend so much energy and per-cpu > > memory for struct cred refcount. > > > > The big problem is fd array spinlock of course and bitmap search for > > POSIX compliance. > > > > The cache line trashing in struct cred is a minor one ;) > > percpu-refcount isn't convenient - the only such candidate for ref_kill in > there is "all other references are gone", and that can happen in > interesting locking environments. I doubt that it would be a good fit, TBH... OK then ... > Cacheline pingpong on the descriptors bitmap is probably inevitable, but > it's not the only problem in the existing implementation - close a small > descriptor when you've got a lot of them and look for the second open > after that. _That_ can lead to thousands of cachelines being read through, > all under the table spinlock. It's literally orders of magnitude worse. > And if the first open after that close happens to be for a short-living > descriptor, you'll get the same situation back in your face as soon as you > close it. > > I think we can seriously improve that without screwing the fast path by > adding "summary" bitmaps once the primary grows past the cacheline worth > of bits. With bits in the summary bitmap corresponding to cacheline-sized > chunks of the primary, being set iff all bits in the corresponding chunk > are set. If the summary map grows larger than one cacheline, add the > second-order one (that happens at quarter million descriptors and serves > until 128 million; adding the third-order map is probably worthless). 
> > I want to maintain the same kind of "everything below this is known to be > in use" thing as we do now. Allocation would start with looking into the > same place in primary bitmap where we'd looked now and similar search > forward for zero bit. _However_, it would stop at cacheline boundary. > If nothing had been found, we look in the corresponding place in the > summary bitmap and search for zero bit there. Again, no more than up > to the cacheline boundary. If something is found, we've got a chunk in > the primary known to contain a zero bit; if not - go to the second-level > and search there, etc. > > When a zero bit in the primary had been found, check if it's within the > rlimit (passed to __alloc_fd() explicitly) and either bugger off or set > that bit. If there are zero bits left in the same word - we are done, > otherwise check the still unread words in the cacheline and see if all > of them are ~0UL. If all of them are, set the bit in summary bitmap, etc. > > Normal case is exactly the same as now - one cacheline accessed and modified. > We might end up touching more than that, but it's going to be rare and > the cases when it happens are very likely to lead to much worse amount of > memory traffic with the current code. > > Freeing is done by zeroing the bit in primary, checking for other zero bits > nearby and buggering off if there are such. If the entire cacheline used > to be all-bits-set, clear the bit in summary and, if there's a second-order > summary, get the bit in there clear as well - it's probably not worth > bothering with checking that all the cacheline in summary bitmap had been > all-bits-set. Again, the normal case is the same as now. > > It'll need profiling and tuning, but AFAICS it's doable without making the > things worse than they are now, and it should get rid of those O(N) fetches > under spinlock cases. And yes, those are triggerable and visible in > profiles. IMO it's worth trying to fix... 
>

Well, all this complexity goes away with a O_FD_FASTALLOC / SOCK_FD_FASTALLOC bit in various fd allocations, which specifically tells the kernel we do not care getting the lowest possible fd as POSIX mandates.

With this bit set, the bitmap search can start at a random point, and we find a slot in O(1): one cache line miss, if you have at least one free bit/slot per 512 bits (64-byte cache line).

#ifndef O_FD_FASTALLOC
#define O_FD_FASTALLOC 0x4000
#endif

#ifndef SOCK_FD_FASTALLOC
#define SOCK_FD_FASTALLOC O_FD_FASTALLOC
#endif

...
// active sockets
socket(AF_INET, SOCK_STREAM | SOCK_FD_FASTALLOC, 0);
...
// passive sockets
accept4(sockfd, ..., SOCK_FD_FASTALLOC);
...

Except for legacy stuff and stdin/stdout/stderr games, I really doubt lot of applications absolutely rely on the POSIX thing...
Re: [Bug 106241] New: shutdown(3)/close(3) behaviour is incorrect for sockets in accept(3)
On Wed, Oct 28, 2015 at 07:47:57AM -0700, Eric Dumazet wrote: > On Wed, 2015-10-28 at 06:24 -0700, Eric Dumazet wrote: > > > Before I take a deep look at your suggestion, are you sure plain use of > > include/linux/percpu-refcount.h infra is not possible for struct cred ? > > BTW, I am not convinced we need to spend so much energy and per-cpu > memory for struct cred refcount. > > The big problem is fd array spinlock of course and bitmap search for > POSIX compliance. > > The cache line trashing in struct cred is a minor one ;) percpu-refcount isn't convenient - the only such candidate for ref_kill in there is "all other references are gone", and that can happen in interesting locking environments. I doubt that it would be a good fit, TBH... Cacheline pingpong on the descriptors bitmap is probably inevitable, but it's not the only problem in the existing implementation - close a small descriptor when you've got a lot of them and look for the second open after that. _That_ can lead to thousands of cachelines being read through, all under the table spinlock. It's literally orders of magnitude worse. And if the first open after that close happens to be for a short-living descriptor, you'll get the same situation back in your face as soon as you close it. I think we can seriously improve that without screwing the fast path by adding "summary" bitmaps once the primary grows past the cacheline worth of bits. With bits in the summary bitmap corresponding to cacheline-sized chunks of the primary, being set iff all bits in the corresponding chunk are set. If the summary map grows larger than one cacheline, add the second-order one (that happens at quarter million descriptors and serves until 128 million; adding the third-order map is probably worthless). I want to maintain the same kind of "everything below this is known to be in use" thing as we do now. 
Allocation would start with looking into the same place in primary bitmap where we'd looked now and similar search forward for zero bit. _However_, it would stop at cacheline boundary. If nothing had been found, we look in the corresponding place in the summary bitmap and search for zero bit there. Again, no more than up to the cacheline boundary. If something is found, we've got a chunk in the primary known to contain a zero bit; if not - go to the second-level and search there, etc.

When a zero bit in the primary had been found, check if it's within the rlimit (passed to __alloc_fd() explicitly) and either bugger off or set that bit. If there are zero bits left in the same word - we are done, otherwise check the still unread words in the cacheline and see if all of them are ~0UL. If all of them are, set the bit in summary bitmap, etc.

Normal case is exactly the same as now - one cacheline accessed and modified. We might end up touching more than that, but it's going to be rare and the cases when it happens are very likely to lead to much worse amount of memory traffic with the current code.

Freeing is done by zeroing the bit in primary, checking for other zero bits nearby and buggering off if there are such. If the entire cacheline used to be all-bits-set, clear the bit in summary and, if there's a second-order summary, get the bit in there clear as well - it's probably not worth bothering with checking that all the cacheline in summary bitmap had been all-bits-set. Again, the normal case is the same as now.

It'll need profiling and tuning, but AFAICS it's doable without making the things worse than they are now, and it should get rid of those O(N) fetches under spinlock cases. And yes, those are triggerable and visible in profiles. IMO it's worth trying to fix...
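
The summary-bitmap idea described in this message can be sketched in userland roughly as follows. This is a simplified illustration under stated assumptions, not the proposed kernel change: it uses a single summary level, a fixed table size (NCHUNKS is a hypothetical value), 64-byte cachelines, no locking and no rlimit check, and the names fdmap, fdmap_alloc() and fdmap_free() are invented for the example. The two static assertions spell out the thresholds quoted above: one cacheline of summary bits covers 512 * 512 = 262144 descriptors, and one cacheline of second-order bits covers 512^3 = 134217728 (128M).

```c
#include <stddef.h>
#include <stdint.h>

#define WORD_BITS   64u
#define CHUNK_BITS  512u			/* assumption: one 64-byte cacheline of bits */
#define CHUNK_WORDS (CHUNK_BITS / WORD_BITS)
#define NCHUNKS     64u				/* hypothetical table size: 32768 descriptors */

_Static_assert(CHUNK_BITS * CHUNK_BITS == 262144u, "quarter million");
_Static_assert(CHUNK_BITS * CHUNK_BITS * CHUNK_BITS == 134217728u, "128 million");

struct fdmap {
	uint64_t primary[NCHUNKS * CHUNK_WORDS];	/* bit set: descriptor in use */
	uint64_t summary[NCHUNKS / WORD_BITS];		/* bit set: whole chunk in use */
};

static int chunk_full(const struct fdmap *m, size_t chunk)
{
	for (size_t w = 0; w < CHUNK_WORDS; w++)
		if (m->primary[chunk * CHUNK_WORDS + w] != UINT64_MAX)
			return 0;
	return 1;
}

/* Lowest clear bit of one chunk at or after offset `from`, or -1. */
static long chunk_find_zero(const struct fdmap *m, size_t chunk, size_t from)
{
	for (size_t b = from; b < CHUNK_BITS; b++) {
		uint64_t word = m->primary[chunk * CHUNK_WORDS + b / WORD_BITS];

		if (!(word & (UINT64_C(1) << (b % WORD_BITS))))
			return (long)(chunk * CHUNK_BITS + b);
	}
	return -1;
}

/* Allocate the lowest free descriptor >= start, or -1 if none is left. */
static long fdmap_alloc(struct fdmap *m, size_t start)
{
	size_t chunk = start / CHUNK_BITS;
	size_t bit;
	long fd;

	if (chunk >= NCHUNKS)
		return -1;

	/* Fast path: stay within the cacheline that holds `start`. */
	fd = chunk_find_zero(m, chunk, start % CHUNK_BITS);
	if (fd < 0) {
		/* Skip full chunks via the summary instead of reading them. */
		for (chunk++; chunk < NCHUNKS; chunk++)
			if (!(m->summary[chunk / WORD_BITS] &
			      (UINT64_C(1) << (chunk % WORD_BITS))))
				break;
		if (chunk == NCHUNKS)
			return -1;
		fd = chunk_find_zero(m, chunk, 0);
	}

	bit = (size_t)fd;
	m->primary[bit / WORD_BITS] |= UINT64_C(1) << (bit % WORD_BITS);
	if (chunk_full(m, chunk))
		m->summary[chunk / WORD_BITS] |= UINT64_C(1) << (chunk % WORD_BITS);
	return fd;
}

/* Free a descriptor; its chunk can no longer be full. */
static void fdmap_free(struct fdmap *m, size_t fd)
{
	size_t chunk = fd / CHUNK_BITS;

	m->primary[fd / WORD_BITS] &= ~(UINT64_C(1) << (fd % WORD_BITS));
	m->summary[chunk / WORD_BITS] &= ~(UINT64_C(1) << (chunk % WORD_BITS));
}
```

The sketch clears the summary bit unconditionally on free, which has the same effect as clearing it only when the chunk used to be full. It also scans the summary bit by bit for brevity; a real implementation would use word-at-a-time searches and stop at cacheline boundaries as the message describes.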
RE: [Intel-wired-lan] [PATCH] fm10k:Fix error handling in the function fm10k_resume
-----Original Message-----
From: Intel-wired-lan [mailto:intel-wired-lan-boun...@lists.osuosl.org] On Behalf Of Nicholas Krause
Sent: Saturday, October 17, 2015 9:21 AM
To: Kirsher, Jeffrey T
Cc: linux-ker...@vger.kernel.org; intel-wired-...@lists.osuosl.org; netdev@vger.kernel.org
Subject: [Intel-wired-lan] [PATCH] fm10k:Fix error handling in the function fm10k_resume

This fixes error handling to properly check if the call to the function fm10k_mbx_request_irq has failed by returning an error code and if so return immediately to the caller of fm10k_resume to properly signal a failure has occurred when accepting to resume this network

Signed-off-by: Nicholas Krause
---

Tested-by: Krishneil Singh
Re: [Bug 106241] New: shutdown(3)/close(3) behaviour is incorrect for sockets in accept(3)
On Wed, Oct 28, 2015 at 02:44:28PM -0700, Eric Dumazet wrote:

> Well, all this complexity goes away with a O_FD_FASTALLOC /
> SOCK_FD_FASTALLOC bit in various fd allocations, which specifically
> tells the kernel we do not care getting the lowest possible fd as POSIX
> mandates.

... which won't do a damn thing for existing userland.

> Except for legacy stuff and stdin/stdout/stderr games, I really doubt
> lot of applications absolutely rely on the POSIX thing...

We obviously can't turn that into default behaviour, though. BTW, what distribution do you have in mind for those random descriptors? Uniform on [0,INT_MAX] is a bad idea for obvious reasons - you'll blow the memory footprint pretty soon...
[net-next PATCH] RDS: convert bind hash table to re-sizable hashtable
To further improve the RDS connection scalability on massive systems where the number of sockets grows into the tens of thousands, there is a need for a larger bind hashtable. A pre-allocated 8K or 16K table is not very flexible in terms of memory utilisation. The rhashtable infrastructure gives us the flexibility to grow the hashtable based on use and also comes with built-in efficient bucket (chain) handling.

Reviewed-by: David Miller
Signed-off-by: Santosh Shilimkar
Signed-off-by: Santosh Shilimkar
---
 net/rds/af_rds.c | 10 -
 net/rds/bind.c | 126 +++
 net/rds/rds.h | 7 +++-
 3 files changed, 57 insertions(+), 86 deletions(-)

diff --git a/net/rds/af_rds.c b/net/rds/af_rds.c
index 384ea1e..b5476aeb 100644
--- a/net/rds/af_rds.c
+++ b/net/rds/af_rds.c
@@ -573,6 +573,7 @@ static void rds_exit(void)
 	rds_threads_exit();
 	rds_stats_exit();
 	rds_page_exit();
+	rds_bind_lock_destroy();
 	rds_info_deregister_func(RDS_INFO_SOCKETS, rds_sock_info);
 	rds_info_deregister_func(RDS_INFO_RECV_MESSAGES, rds_sock_inc_info);
 }
@@ -582,11 +583,14 @@ static int rds_init(void)
 {
 	int ret;
 
-	rds_bind_lock_init();
+	ret = rds_bind_lock_init();
+	if (ret)
+		goto out;
 
 	ret = rds_conn_init();
 	if (ret)
-		goto out;
+		goto out_bind;
+
 	ret = rds_threads_init();
 	if (ret)
 		goto out_conn;
@@ -620,6 +624,8 @@ out_conn:
 	rds_conn_exit();
 	rds_cong_exit();
 	rds_page_exit();
+out_bind:
+	rds_bind_lock_destroy();
 out:
 	return ret;
 }
diff --git a/net/rds/bind.c b/net/rds/bind.c
index 6192566..2b00222 100644
--- a/net/rds/bind.c
+++ b/net/rds/bind.c
@@ -38,54 +38,17 @@
 #include
 #include "rds.h"
 
-struct bind_bucket {
-	rwlock_t		lock;
-	struct hlist_head	head;
+static struct rhashtable bind_hash_table;
+
+static struct rhashtable_params ht_parms = {
+	.nelem_hint = 768,
+	.key_len = sizeof(u64),
+	.key_offset = offsetof(struct rds_sock, rs_bound_key),
+	.head_offset = offsetof(struct rds_sock, rs_bound_node),
+	.max_size = 16384,
+	.min_size = 1024,
 };
 
-#define BIND_HASH_SIZE 1024
-static struct bind_bucket bind_hash_table[BIND_HASH_SIZE];
-
-static struct bind_bucket *hash_to_bucket(__be32 addr, __be16 port)
-{
-	return bind_hash_table + (jhash_2words((u32)addr, (u32)port, 0) &
-				  (BIND_HASH_SIZE - 1));
-}
-
-/* must hold either read or write lock (write lock for insert != NULL) */
-static struct rds_sock *rds_bind_lookup(struct bind_bucket *bucket,
-					__be32 addr, __be16 port,
-					struct rds_sock *insert)
-{
-	struct rds_sock *rs;
-	struct hlist_head *head = &bucket->head;
-	u64 cmp;
-	u64 needle = ((u64)be32_to_cpu(addr) << 32) | be16_to_cpu(port);
-
-	hlist_for_each_entry(rs, head, rs_bound_node) {
-		cmp = ((u64)be32_to_cpu(rs->rs_bound_addr) << 32) |
-		      be16_to_cpu(rs->rs_bound_port);
-
-		if (cmp == needle) {
-			rds_sock_addref(rs);
-			return rs;
-		}
-	}
-
-	if (insert) {
-		/*
-		 * make sure our addr and port are set before
-		 * we are added to the list.
-		 */
-		insert->rs_bound_addr = addr;
-		insert->rs_bound_port = port;
-		rds_sock_addref(insert);
-
-		hlist_add_head(&insert->rs_bound_node, head);
-	}
-	return NULL;
-}
-
 /*
  * Return the rds_sock bound at the given local address.
  *
@@ -94,18 +57,14 @@ static struct rds_sock *rds_bind_lookup(struct bind_bucket *bucket,
  */
 struct rds_sock *rds_find_bound(__be32 addr, __be16 port)
 {
+	u64 key = ((u64)addr << 32) | port;
 	struct rds_sock *rs;
-	unsigned long flags;
-	struct bind_bucket *bucket = hash_to_bucket(addr, port);
 
-	read_lock_irqsave(&bucket->lock, flags);
-	rs = rds_bind_lookup(bucket, addr, port, NULL);
-	read_unlock_irqrestore(&bucket->lock, flags);
-
-	if (rs && sock_flag(rds_rs_to_sk(rs), SOCK_DEAD)) {
-		rds_sock_put(rs);
+	rs = rhashtable_lookup_fast(&bind_hash_table, &key, ht_parms);
+	if (rs && !sock_flag(rds_rs_to_sk(rs), SOCK_DEAD))
+		rds_sock_addref(rs);
+	else
 		rs = NULL;
-	}
 
 	rdsdebug("returning rs %p for %pI4:%u\n", rs, &addr,
 		 ntohs(port));
@@ -116,10 +75,9 @@ struct rds_sock *rds_find_bound(__be32 addr, __be16 port)
 /* returns -ve errno or +ve port */
 static int rds_add_bound(struct rds_sock *rs, __be32 addr, __be16 *port)
 {
-	unsigned long flags;
 	int ret = -EADDRINUSE;
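
The patch looks sockets up by a single fixed-length key (key_len = sizeof(u64)) built from the bound address and port. A minimal userland sketch of that packing follows. It uses plain stdint types instead of the kernel's __be32/__be16 and ignores byte order; bind_key(), key_addr() and key_port() are illustrative names, not functions from the patch.

```c
#include <stdint.h>

/* Pack a 32-bit address and a 16-bit port into one 64-bit lookup key. */
static uint64_t bind_key(uint32_t addr, uint16_t port)
{
	return ((uint64_t)addr << 32) | port;
}

/* Recover the address half of a key. */
static uint32_t key_addr(uint64_t key)
{
	return (uint32_t)(key >> 32);
}

/* Recover the port half of a key. */
static uint16_t key_port(uint64_t key)
{
	return (uint16_t)(key & 0xffff);
}
```

Because the address occupies the upper 32 bits and the port the lower 16, two different (address, port) pairs can never produce the same key, so comparing keys is equivalent to comparing both fields.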
Re: [Bug 106241] New: shutdown(3)/close(3) behaviour is incorrect for sockets in accept(3)
On Wed, 2015-10-28 at 22:33 +0000, Al Viro wrote:
> On Wed, Oct 28, 2015 at 02:44:28PM -0700, Eric Dumazet wrote:
>
> > Well, all this complexity goes away with a O_FD_FASTALLOC /
> > SOCK_FD_FASTALLOC bit in various fd allocations, which specifically
> > tells the kernel we do not care getting the lowest possible fd as POSIX
> > mandates.
>
> ... which won't do a damn thing for existing userland.

For the userland that needs 5,000,000+ sockets, I can tell you they will use this flag as soon as they are aware it exists ;)

>
> > Except for legacy stuff and stdin/stdout/stderr games, I really doubt
> > lot of applications absolutely rely on the POSIX thing...
>
> We obviously can't turn that into default behaviour, though. BTW, what
> distribution do you have in mind for those random descriptors? Uniform
> on [0,INT_MAX] is a bad idea for obvious reasons - you'll blow the
> memory footprint pretty soon...

Simply [0 , fdt->max_fds] is working well in most cases.
Re: [PATCH 2/2] e1000e: Fix msi-x interrupt automask
On 10/22/2015 05:32 PM, Benjamin Poirier wrote:

Since the introduction of 82574 support in e1000e, the driver has worked on the assumption that msi-x interrupt generation is automatically disabled after each irq. As it turns out, this is not the case. Currently, rx interrupts can fire multiple times before and during napi processing. This can be a problem for users because frames that arrive in a certain window (after adapter->clean_rx() but before napi_complete_done() has cleared NAPI_STATE_SCHED) generate an interrupt which does not lead to napi_schedule(). These frames sit in the rx queue until another frame arrives (a tcp retransmit for example).

While the EIAC and CTRL_EXT registers are properly configured for irq automask, the modification of IAM in e1000_configure_msix() is what prevents automask from working as intended.

This patch removes that erroneous write and fixes interrupt rearming for tx and "other" interrupts. Since e1000_msix_other() reads ICR, all interrupts must be rearmed in that function.

Reported-by: Frank Steiner
Signed-off-by: Benjamin Poirier
---
 drivers/net/ethernet/intel/e1000e/netdev.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index a228167..8881256 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -1921,7 +1921,8 @@ static irqreturn_t e1000_msix_other(int __always_unused irq, void *data)

 no_link_interrupt:
 	if (!test_bit(__E1000_DOWN, &adapter->state))
-		ew32(IMS, E1000_IMS_LSC | E1000_IMS_OTHER);
+		ew32(IMS, adapter->eiac_mask | E1000_IMS_OTHER |
+		     E1000_IMS_LSC);

 	return IRQ_HANDLED;
 }

I would argue your first patch probably didn't go far enough to remove dead code. Specifically you should only ever get into this function if LSC is set. There are no other causes that should trigger this. As such you could probably remove the ICR read, and instead replace it with an ICR write of the LSC bit since OTHER is already cleared via EIAC.

@@ -1940,6 +1941,9 @@ static irqreturn_t e1000_intr_msix_tx(int __always_unused irq, void *data)
 		/* Ring was not completely cleaned, so fire another interrupt */
 		ew32(ICS, tx_ring->ims_val);

+	if (!test_bit(__E1000_DOWN, &adapter->state))
+		ew32(IMS, E1000_IMS_TXQ0);
+
 	return IRQ_HANDLED;
 }

I think what you need to set here is tx_ring->ims_val, not E1000_IMS_TXQ0.

@@ -2027,11 +2031,7 @@ static void e1000_configure_msix(struct e1000_adapter *adapter)

 	/* enable MSI-X PBA support */
 	ctrl_ext = er32(CTRL_EXT);
-	ctrl_ext |= E1000_CTRL_EXT_PBA_CLR;
-
-	/* Auto-Mask Other interrupts upon ICR read */
-	ew32(IAM, ~E1000_EIAC_MASK_82574 | E1000_IMS_OTHER);
-	ctrl_ext |= E1000_CTRL_EXT_EIAME;
+	ctrl_ext |= E1000_CTRL_EXT_PBA_CLR | E1000_CTRL_EXT_EIAME;
 	ew32(CTRL_EXT, ctrl_ext);
 	e1e_flush();
 }
Re: [Bug 106241] New: shutdown(3)/close(3) behaviour is incorrect for sockets in accept(3)
On Wed, Oct 28, 2015 at 04:08:29PM -0700, Eric Dumazet wrote:

> > > Except for legacy stuff and stdin/stdout/stderr games, I really doubt
> > > lot of applications absolutely rely on the POSIX thing...
> >
> > We obviously can't turn that into default behaviour, though.  BTW, what
> > distribution do you have in mind for those random descriptors?  Uniform
> > on [0,INT_MAX] is a bad idea for obvious reasons - you'll blow the
> > memory footprint pretty soon...
>
> Simply [0 , fdt->max_fds] is working well in most cases.

Umm...  So first you dup2() to establish the ->max_fds you want, then
do such opens?  What used/unused ratio do you expect to deal with?
And what kind of locking are you going to use?  Keep in mind that
e.g. dup2() is dependent on the lack of allocations while it's working,
so it's not as simple as "we don't need no stinkin' ->files_lock"...
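For readers following the thread, the difference between the POSIX "lowest free descriptor" rule and the relaxed allocation being discussed can be sketched in userspace C. This is only an illustration under stated assumptions: `struct fd_table`, `MAX_FDS`, `alloc_lowest()` and `alloc_from_hint()` are hypothetical names, the table is a plain array rather than the kernel's fdtable bitmaps, and no locking is modelled (which is exactly the open question raised above).

```c
#include <stdbool.h>

#define MAX_FDS 64	/* hypothetical table size, stands in for ->max_fds */

/* Hypothetical descriptor table: used[fd] is true when fd is taken. */
struct fd_table {
	bool used[MAX_FDS];
};

/* POSIX-style allocation: always return the lowest free descriptor. */
static int alloc_lowest(struct fd_table *t)
{
	for (int fd = 0; fd < MAX_FDS; fd++) {
		if (!t->used[fd]) {
			t->used[fd] = true;
			return fd;
		}
	}
	return -1;	/* table full */
}

/*
 * Relaxed allocation: start at a caller-supplied hint somewhere in
 * [0, MAX_FDS) and take the first free slot, wrapping around once.
 * The hint could come from a random number generator.
 */
static int alloc_from_hint(struct fd_table *t, unsigned int hint)
{
	for (unsigned int i = 0; i < MAX_FDS; i++) {
		unsigned int fd = (hint + i) % MAX_FDS;

		if (!t->used[fd]) {
			t->used[fd] = true;
			return (int)fd;
		}
	}
	return -1;	/* table full */
}
```

With a sparse table, the hinted search usually succeeds on the first probe, whereas the lowest-free search has to skip every descriptor already in use; how well that holds depends on the used/unused ratio asked about above.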
[patch net-next 02/12] switchdev: Make flood to CPU optional
From: Ido Schimmel

In certain use cases it is not always desirable for the switch device to
flood traffic to CPU port. Instead, only certain packet types (e.g. STP,
LACP) should be trapped to it.

Signed-off-by: Ido Schimmel
Signed-off-by: Jiri Pirko
---
 Documentation/networking/switchdev.txt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/networking/switchdev.txt b/Documentation/networking/switchdev.txt
index ce510e1..9199413 100644
--- a/Documentation/networking/switchdev.txt
+++ b/Documentation/networking/switchdev.txt
@@ -278,8 +278,8 @@ Flooding L2 domain
 For a given L2 VLAN domain, the switch device should flood multicast/broadcast
 and unknown unicast packets to all ports in domain, if allowed by port's
 current STP state.  The switch driver, knowing which ports are within which
-vlan L2 domain, can program the switch device for flooding.  The packet should
-also be sent to the port netdev for processing by the bridge driver.  The
+vlan L2 domain, can program the switch device for flooding.  The packet may
+be sent to the port netdev for processing by the bridge driver.  The
 bridge should not reflood the packet to the same ports the device flooded,
 otherwise there will be duplicate packets on the wire.

-- 
1.9.3
[patch net-next 05/12] mlxsw: spectrum: Add support for flood control
From: Ido SchimmelAdd or remove a bridged port from the flooding domain of unknown unicast packets according to user configuration. Signed-off-by: Ido Schimmel Signed-off-by: Jiri Pirko --- drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 1 + drivers/net/ethernet/mellanox/mlxsw/spectrum.h | 1 + .../ethernet/mellanox/mlxsw/spectrum_switchdev.c | 113 ++--- 3 files changed, 78 insertions(+), 37 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c index e30b2da..3be4a23 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c @@ -1227,6 +1227,7 @@ static int mlxsw_sp_port_create(struct mlxsw_sp *mlxsw_sp, u8 local_port) mlxsw_sp_port->local_port = local_port; mlxsw_sp_port->learning = 1; mlxsw_sp_port->learning_sync = 1; + mlxsw_sp_port->uc_flood = 1; mlxsw_sp_port->pvid = 1; mlxsw_sp_port->pcpu_stats = diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h index b4d8393..4365c8b 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h @@ -86,6 +86,7 @@ struct mlxsw_sp_port { u8 stp_state; u8 learning:1, learning_sync:1, + uc_flood:1, bridged:1; u16 pvid; /* 802.1Q bridge VLANs */ diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c index c3881c9..1f3b12e 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c @@ -66,7 +66,8 @@ static int mlxsw_sp_port_attr_get(struct net_device *dev, case SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS: attr->u.brport_flags = (mlxsw_sp_port->learning ? BR_LEARNING : 0) | - (mlxsw_sp_port->learning_sync ? BR_LEARNING_SYNC : 0); + (mlxsw_sp_port->learning_sync ? BR_LEARNING_SYNC : 0) | + (mlxsw_sp_port->uc_flood ? 
BR_FLOOD : 0); break; default: return -EOPNOTSUPP; @@ -123,15 +124,89 @@ static int mlxsw_sp_port_attr_stp_state_set(struct mlxsw_sp_port *mlxsw_sp_port, return mlxsw_sp_port_stp_state_set(mlxsw_sp_port, state); } +static int __mlxsw_sp_port_flood_set(struct mlxsw_sp_port *mlxsw_sp_port, +u16 fid_begin, u16 fid_end, bool set, +bool only_uc) +{ + struct mlxsw_sp *mlxsw_sp = mlxsw_sp_port->mlxsw_sp; + u16 range = fid_end - fid_begin + 1; + char *sftr_pl; + int err; + + sftr_pl = kmalloc(MLXSW_REG_SFTR_LEN, GFP_KERNEL); + if (!sftr_pl) + return -ENOMEM; + + mlxsw_reg_sftr_pack(sftr_pl, MLXSW_SP_FLOOD_TABLE_UC, fid_begin, + MLXSW_REG_SFGC_TABLE_TYPE_FID_OFFEST, range, + mlxsw_sp_port->local_port, set); + err = mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(sftr), sftr_pl); + if (err) + goto buffer_out; + + /* Flooding control allows one to decide whether a given port will +* flood unicast traffic for which there is no FDB entry. +*/ + if (only_uc) + goto buffer_out; + + mlxsw_reg_sftr_pack(sftr_pl, MLXSW_SP_FLOOD_TABLE_BM, fid_begin, + MLXSW_REG_SFGC_TABLE_TYPE_FID_OFFEST, range, + mlxsw_sp_port->local_port, set); + err = mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(sftr), sftr_pl); + +buffer_out: + kfree(sftr_pl); + return err; +} + +static int mlxsw_sp_port_uc_flood_set(struct mlxsw_sp_port *mlxsw_sp_port, + bool set) +{ + struct net_device *dev = mlxsw_sp_port->dev; + u16 vid, last_visited_vid; + int err; + + for_each_set_bit(vid, mlxsw_sp_port->active_vlans, VLAN_N_VID) { + err = __mlxsw_sp_port_flood_set(mlxsw_sp_port, vid, vid, set, + true); + if (err) { + last_visited_vid = vid; + goto err_port_flood_set; + } + } + + return 0; + +err_port_flood_set: + for_each_set_bit(vid, mlxsw_sp_port->active_vlans, last_visited_vid) + __mlxsw_sp_port_flood_set(mlxsw_sp_port, vid, vid, !set, true); + netdev_err(dev, "Failed to configure unicast flooding\n"); + return err; +} + static int mlxsw_sp_port_attr_br_flags_set(struct mlxsw_sp_port *mlxsw_sp_port, struct switchdev_trans 
*trans, unsigned long
[patch net-next 04/12] mlxsw: spectrum: Add support for VLAN ranges in flooding configuration
From: Ido SchimmelWhen enabling a range of VLANs on a bridged port we can configure flooding for these VLANs by one register access instead of calling the same register for each VLAN. This is accomplished by using the 'range' field of the Switch Flooding Table Register (SFTR). Signed-off-by: Ido Schimmel Signed-off-by: Jiri Pirko --- .../ethernet/mellanox/mlxsw/spectrum_switchdev.c | 40 +++--- 1 file changed, 20 insertions(+), 20 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c index c39b7a1..c3881c9 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c @@ -248,9 +248,11 @@ static int mlxsw_sp_port_fid_unmap(struct mlxsw_sp_port *mlxsw_sp_port, u16 fid) } static int __mlxsw_sp_port_flood_set(struct mlxsw_sp_port *mlxsw_sp_port, -u16 fid, bool set, bool only_uc) +u16 fid_begin, u16 fid_end, bool set, +bool only_uc) { struct mlxsw_sp *mlxsw_sp = mlxsw_sp_port->mlxsw_sp; + u16 range = fid_end - fid_begin + 1; char *sftr_pl; int err; @@ -258,8 +260,8 @@ static int __mlxsw_sp_port_flood_set(struct mlxsw_sp_port *mlxsw_sp_port, if (!sftr_pl) return -ENOMEM; - mlxsw_reg_sftr_pack(sftr_pl, MLXSW_SP_FLOOD_TABLE_UC, fid, - MLXSW_REG_SFGC_TABLE_TYPE_FID_OFFEST, 0, + mlxsw_reg_sftr_pack(sftr_pl, MLXSW_SP_FLOOD_TABLE_UC, fid_begin, + MLXSW_REG_SFGC_TABLE_TYPE_FID_OFFEST, range, mlxsw_sp_port->local_port, set); err = mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(sftr), sftr_pl); if (err) @@ -271,8 +273,8 @@ static int __mlxsw_sp_port_flood_set(struct mlxsw_sp_port *mlxsw_sp_port, if (only_uc) goto buffer_out; - mlxsw_reg_sftr_pack(sftr_pl, MLXSW_SP_FLOOD_TABLE_BM, fid, - MLXSW_REG_SFGC_TABLE_TYPE_FID_OFFEST, 0, + mlxsw_reg_sftr_pack(sftr_pl, MLXSW_SP_FLOOD_TABLE_BM, fid_begin, + MLXSW_REG_SFGC_TABLE_TYPE_FID_OFFEST, range, mlxsw_sp_port->local_port, set); err = mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(sftr), 
sftr_pl); @@ -345,14 +347,13 @@ static int __mlxsw_sp_port_vlans_add(struct mlxsw_sp_port *mlxsw_sp_port, netdev_err(dev, "Failed to map FID=%d", vid); return err; } + } - err = __mlxsw_sp_port_flood_set(mlxsw_sp_port, vid, true, - false); - if (err) { - netdev_err(dev, "Failed to set flooding for FID=%d", - vid); - return err; - } + err = __mlxsw_sp_port_flood_set(mlxsw_sp_port, vid_begin, vid_end, + true, false); + if (err) { + netdev_err(dev, "Failed to configure flooding\n"); + return err; } for (vid = vid_begin; vid <= vid_end; @@ -530,15 +531,14 @@ static int __mlxsw_sp_port_vlans_del(struct mlxsw_sp_port *mlxsw_sp_port, if (init) goto out; - for (vid = vid_begin; vid <= vid_end; vid++) { - err = __mlxsw_sp_port_flood_set(mlxsw_sp_port, vid, false, - false); - if (err) { - netdev_err(dev, "Failed to clear flooding for FID=%d", - vid); - return err; - } + err = __mlxsw_sp_port_flood_set(mlxsw_sp_port, vid_begin, vid_end, + false, false); + if (err) { + netdev_err(dev, "Failed to clear flooding\n"); + return err; + } + for (vid = vid_begin; vid <= vid_end; vid++) { /* Remove FID mapping in case of Virtual mode */ err = mlxsw_sp_port_fid_unmap(mlxsw_sp_port, vid); if (err) { -- 1.9.3 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
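The saving described in the commit message can be illustrated with a small userspace sketch. It is not the driver code: `struct fake_dev` and `fake_flood_write()` are hypothetical stand-ins that only count register accesses, and the stand-in ignores the range value it is given. Only the `fid_end - fid_begin + 1` computation is taken from the patch.

```c
#include <stdint.h>

/* Hypothetical stand-in for the device: it only counts register writes. */
struct fake_dev {
	unsigned int reg_writes;
};

/* Hypothetical register write; the arguments are ignored by this stand-in. */
static int fake_flood_write(struct fake_dev *dev, uint16_t fid_begin,
			    uint16_t range, int set)
{
	(void)fid_begin;
	(void)range;
	(void)set;
	dev->reg_writes++;
	return 0;
}

/* Old shape: one register write per VLAN in [vid_begin, vid_end]. */
static int flood_set_per_vid(struct fake_dev *dev, uint16_t vid_begin,
			     uint16_t vid_end, int set)
{
	unsigned int vid;	/* wider than u16 so vid_end == 65535 terminates */
	int err;

	for (vid = vid_begin; vid <= vid_end; vid++) {
		err = fake_flood_write(dev, (uint16_t)vid, 1, set);
		if (err)
			return err;
	}
	return 0;
}

/* New shape: a single write, with the range computed as in the patch. */
static int flood_set_range(struct fake_dev *dev, uint16_t fid_begin,
			   uint16_t fid_end, int set)
{
	uint16_t range = fid_end - fid_begin + 1;

	return fake_flood_write(dev, fid_begin, range, set);
}
```

For VLANs 10 through 19, the per-VLAN loop issues ten writes while the range variant issues one, which is the reduction the patch is after.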
Re: [PATCH 02/25] IB/mthca, net/mlx4: remove counting semaphores
Hi Arnd,

> Since we want to make counting semaphores go away,

Why do we want to make counting semaphores go away? Completely, or just
for binary use cases?

I have a use case in iser target code where a counting semaphore is the
best suited synchronizing mechanism. I have a single thread handling
connect requests (one at a time) while connect requests are event driven
and come asynchronously. This is why I use a queue and a counting
semaphore to handle this situation.

I'd need to think of a new strategy to handle this without counting
semaphores and I'm not entirely sure it would be simpler.

> this patch replaces the semaphore counting the event-driven
> commands with an open-coded wait-queue, which should
> be an equivalent transformation of the code, although
> it does not make it any nicer.
>
> As far as I can tell, there is a preexisting race condition
> regarding the cmd->use_events flag, which is not protected
> by any lock. When this flag is toggled while another command
> is being started, that command gets stuck until the mode is
> toggled back.
A better solution that would solve the race condition and at the same time improve the code readability would create a new locking primitive that replaces both semaphores, like static int mlx4_use_events(struct mlx4_cmd *cmd) { int ret = -EAGAIN; spin_lock(>lock); if (cmd->use_events && cmd->commands < cmd->max_commands) { cmd->commands++; ret = 1; } else if (!cmd->use_events && cmd->commands == 0) { cmd->commands = 1; ret = 0; } spin_unlock(>lock); return ret; } static bool mlx4_use_events(struct mlx4_cmd *cmd) { int ret; wait_event(cmd->events_wq, ret = __mlx4_use_events(cmd) >= 0); return ret; } Cc: Roland DreierCc: Eli Cohen Cc: Yevgeny Petrilin Cc: netdev@vger.kernel.org Cc: linux-r...@vger.kernel.org Signed-off-by: Arnd Bergmann Conflicts: drivers/net/mlx4/cmd.c drivers/net/mlx4/mlx4.h --- drivers/infiniband/hw/mthca/mthca_cmd.c | 12 drivers/infiniband/hw/mthca/mthca_dev.h | 3 ++- drivers/net/ethernet/mellanox/mlx4/cmd.c | 12 drivers/net/ethernet/mellanox/mlx4/mlx4.h | 3 ++- 4 files changed, 20 insertions(+), 10 deletions(-) diff --git a/drivers/infiniband/hw/mthca/mthca_cmd.c b/drivers/infiniband/hw/mthca/mthca_cmd.c index 9d3e5c1ac60e..aad1852e8e10 100644 --- a/drivers/infiniband/hw/mthca/mthca_cmd.c +++ b/drivers/infiniband/hw/mthca/mthca_cmd.c @@ -417,7 +417,8 @@ static int mthca_cmd_wait(struct mthca_dev *dev, int err = 0; struct mthca_cmd_context *context; - down(>cmd.event_sem); + wait_event(dev->cmd.event_wait, + atomic_add_unless(>cmd.commands, -1, 0)); spin_lock(>cmd.context_lock); BUG_ON(dev->cmd.free_head < 0); @@ -459,7 +460,8 @@ out: dev->cmd.free_head = context - dev->cmd.context; spin_unlock(>cmd.context_lock); - up(>cmd.event_sem); + atomic_inc(>cmd.commands); + wake_up(>cmd.event_wait); return err; } @@ -571,7 +573,8 @@ int mthca_cmd_use_events(struct mthca_dev *dev) dev->cmd.context[dev->cmd.max_cmds - 1].next = -1; dev->cmd.free_head = 0; - sema_init(>cmd.event_sem, dev->cmd.max_cmds); + init_waitqueue_head(>cmd.event_wait); + 
atomic_set(>cmd.commands, dev->cmd.max_cmds); spin_lock_init(>cmd.context_lock); for (dev->cmd.token_mask = 1; @@ -597,7 +600,8 @@ void mthca_cmd_use_polling(struct mthca_dev *dev) dev->cmd.flags &= ~MTHCA_CMD_USE_EVENTS; for (i = 0; i < dev->cmd.max_cmds; ++i) - down(>cmd.event_sem); + wait_event(dev->cmd.event_wait, + atomic_add_unless(>cmd.commands, -1, 0)); kfree(dev->cmd.context); diff --git a/drivers/infiniband/hw/mthca/mthca_dev.h b/drivers/infiniband/hw/mthca/mthca_dev.h index 7e6a6d64ad4e..3055f5c12ac8 100644 --- a/drivers/infiniband/hw/mthca/mthca_dev.h +++ b/drivers/infiniband/hw/mthca/mthca_dev.h @@ -121,7 +121,8 @@ struct mthca_cmd { struct pci_pool *pool; struct mutex hcr_mutex; struct semaphore poll_sem; - struct semaphore event_sem; + wait_queue_head_t event_wait; + atomic_t commands; int max_cmds; spinlock_tcontext_lock; int free_head; diff --git a/drivers/net/ethernet/mellanox/mlx4/cmd.c b/drivers/net/ethernet/mellanox/mlx4/cmd.c index 78f5a1a0b8c8..60134a4245ef 100644 --- a/drivers/net/ethernet/mellanox/mlx4/cmd.c +++ b/drivers/net/ethernet/mellanox/mlx4/cmd.c @@ -273,7 +273,8 @@ static int mlx4_cmd_wait(struct mlx4_dev *dev, u64 in_param, u64 *out_param, struct
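The "open-coded" replacement in the quoted patch relies on an atomic counter that is decremented only while it is non-zero. A minimal userspace sketch of that counting logic, using C11 atomics instead of the kernel's atomic_t, might look like the following. The names `add_unless()`, `try_acquire_slot()` and `release_slot()` are hypothetical, and the sleeping and waking that the patch does through a wait queue is deliberately left out.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Add `a` to *v unless *v currently equals `u`.
 * Returns true if the addition happened, false if *v was `u`.
 */
static bool add_unless(atomic_int *v, int a, int u)
{
	int cur = atomic_load(v);

	while (cur != u) {
		/* On failure, cur is reloaded with the current value. */
		if (atomic_compare_exchange_weak(v, &cur, cur + a))
			return true;
	}
	return false;
}

/* Take one command slot if any is left; never drops below zero. */
static bool try_acquire_slot(atomic_int *slots)
{
	return add_unless(slots, -1, 0);
}

/* Give a slot back; a real implementation would also wake a waiter. */
static void release_slot(atomic_int *slots)
{
	atomic_fetch_add(slots, 1);
}
```

A caller that fails to get a slot would have to sleep until a release happens, which is the role of the wait queue in the patch; this sketch covers only the counting.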
[patch net-next 00/12] mlxsw: driver update
From: Jiri Pirko

This driver update mainly adds support for the user to set up flooding
on a specified port via a bridge flag. Also, there is a fix in ageing
time conversion. The rest is just cosmetics.

Ido Schimmel (4):
  switchdev: Add support for flood control
  switchdev: Make flood to CPU optional
  mlxsw: spectrum: Add support for VLAN ranges in flooding configuration
  mlxsw: spectrum: Add support for flood control

Jiri Pirko (6):
  mlxsw: spectrum: move "bridged" bool to u8 flags
  mlxsw: reg: Fix description for reg_sfd_uc_sub_port
  mlxsw: reg: Fix desription typos of couple of SFN items
  mlxsw: reg: Avoid unnecessary line wrap for mlxsw_reg_sfd_uc_unpack
  mlxsw: spectrum: Fix ageing time value
  mlxsw: spectrum: Make mlxsw_sp_port_switchdev_ops static

Or Gerlitz (2):
  mlxsw: Put constant on the right side of comparisons
  mlxsw: Put braces on all arms of branch statement

 Documentation/networking/switchdev.txt | 7 +-
 drivers/net/ethernet/mellanox/mlxsw/core.c | 4 +-
 drivers/net/ethernet/mellanox/mlxsw/pci.c | 3 +-
 drivers/net/ethernet/mellanox/mlxsw/reg.h | 18 +--
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 5 +-
 drivers/net/ethernet/mellanox/mlxsw/spectrum.h | 7 +-
 .../ethernet/mellanox/mlxsw/spectrum_switchdev.c | 144 +
 drivers/net/ethernet/mellanox/mlxsw/switchx2.c | 2 +-
 net/switchdev/switchdev.c | 5 +-
 9 files changed, 122 insertions(+), 73 deletions(-)

-- 
1.9.3