[PATCH] fec: Use gpio_set_value_cansleep()

2015-10-28 Thread Fabio Estevam
From: Fabio Estevam 

We are in a context where we can sleep, and the FEC PHY reset gpio
may be on an I2C expander. Use the cansleep() variant when
setting the GPIO value.

Based on a patch from Russell King for pci-mvebu.c.

Signed-off-by: Fabio Estevam 
---
 drivers/net/ethernet/freescale/fec_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index 501e143..b2a3220 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -3261,7 +3261,7 @@ static void fec_reset_phy(struct platform_device *pdev)
 		return;
 	}
 	msleep(msec);
-	gpio_set_value(phy_reset, 1);
+	gpio_set_value_cansleep(phy_reset, 1);
 }
 #else /* CONFIG_OF */
 static void fec_reset_phy(struct platform_device *pdev)
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
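The sleeping rule behind the FEC patch above can be illustrated with a small userspace model. In gpiolib, the non-sleeping setters warn (WARN_ON()) when the GPIO controller is marked as able to sleep, as an I2C expander is, whereas gpio_set_value_cansleep() may block and therefore belongs in process context. Everything below is hypothetical: `struct model_gpio`, `model_set_value()` and `model_set_value_cansleep()` are invented names, and the `warnings` counter stands in for the kernel's WARN_ON(). It is a sketch of the rule, not gpiolib code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace model of one GPIO line (not a gpiolib type). */
struct model_gpio {
	bool can_sleep;	/* e.g. a pin behind an I2C expander */
	int value;
	int warnings;	/* stands in for WARN_ON() splats */
};

/* Models gpio_set_value(): intended for atomic context, so it complains
 * when the controller may sleep, but still drives the line. */
static void model_set_value(struct model_gpio *g, int value)
{
	assert(value == 0 || value == 1);
	if (g->can_sleep) {
		g->warnings++;
		fprintf(stderr, "WARN: should be using the _cansleep() variant\n");
	}
	g->value = value;
}

/* Models gpio_set_value_cansleep(): the caller may sleep, so a slow
 * (e.g. I2C) controller is fine and nothing is reported. */
static void model_set_value_cansleep(struct model_gpio *g, int value)
{
	assert(value == 0 || value == 1);
	g->value = value;
}
```

In the patched fec_reset_phy(), the call follows msleep(), so the function already runs in a context that may sleep; that is what makes the cansleep variant safe there.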


[PATCH net 2/2] ipv6: protect mtu calculation of wrap-around and infinite loop by rounding issues

2015-10-28 Thread Hannes Frederic Sowa
Raw sockets with hdrincl enabled can insert IPv6 extension headers
right into the data stream. If we need to fragment those packets,
we reparse the options header to find the place where we can insert
the fragment header. If the extension headers exceed the link's MTU,
we cannot make progress at all.

Instead of ending up with broken arithmetic, or rounding towards 0 and
entering an endless loop in ip6_fragment, just prevent those cases by
aborting early and signaling -EMSGSIZE to user space.

This is the second version of the patch; it doesn't use the
overflow_usub function, which has been reverted for now.

Suggested-by: Linus Torvalds 
Cc: Linus Torvalds 
Reported-by: Dmitry Vyukov 
Cc: Dmitry Vyukov 
Signed-off-by: Hannes Frederic Sowa 
---
 net/ipv6/ip6_output.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index d03d6da..f84ec4e 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -584,6 +584,8 @@ int ip6_fragment(struct sock *sk, struct sk_buff *skb,
 		if (np->frag_size)
 			mtu = np->frag_size;
 	}
+	if (mtu < hlen + sizeof(struct frag_hdr) + 8)
+		goto fail_toobig;
 	mtu -= hlen + sizeof(struct frag_hdr);
 
 	frag_id = ipv6_select_ident(net, &ipv6_hdr(skb)->daddr,
-- 
2.4.3
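The arithmetic that the new check guards can be reproduced outside the kernel. The sketch below assumes sizeof(struct frag_hdr) is 8 bytes and roughly mirrors the slow path of ip6_fragment(): the unfragmentable header length `hlen` plus the fragment header is subtracted from the MTU, and a non-final fragment's payload is rounded down to a multiple of 8. `FRAG_HDR_LEN`, `payload_unguarded()` and `frag_payload_room()` are hypothetical names. Without the guard, an MTU smaller than the headers wraps around to a huge unsigned value, and an MTU leaving fewer than 8 bytes rounds down to a 0-byte payload that never makes progress.

```c
#include <assert.h>
#include <stdbool.h>

#define FRAG_HDR_LEN 8u	/* assumed sizeof(struct frag_hdr) */

/* Unguarded: "mtu -= hlen + sizeof(struct frag_hdr)" followed by the
 * round-down to a multiple of 8 used for non-final fragments. */
static unsigned int payload_unguarded(unsigned int mtu, unsigned int hlen)
{
	mtu -= hlen + FRAG_HDR_LEN;	/* may wrap around */
	return mtu & ~7u;		/* may round down to 0 */
}

/* Guarded, following the patch: refuse (the kernel jumps to fail_toobig
 * and reports -EMSGSIZE) unless at least 8 payload bytes fit. */
static bool frag_payload_room(unsigned int mtu, unsigned int hlen,
			      unsigned int *room)
{
	if (mtu < hlen + FRAG_HDR_LEN + 8)
		return false;
	*room = (mtu - (hlen + FRAG_HDR_LEN)) & ~7u;
	assert(*room >= 8);
	return true;
}
```

The `+ 8` in the patch is what excludes the second case: any MTU that passes the check leaves room for at least one 8-byte unit of payload.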



[PATCH net 1/2] Revert "Merge branch 'ipv6-overflow-arith'"

2015-10-28 Thread Hannes Frederic Sowa
Linus dislikes these changes. To avoid holding up the net merge, let's
revert them for now and fix the bug the way Linus suggested.

This reverts commit ec3661b42257d9a06cf0d318175623ac7a660113, reversing
changes made to c80dbe04612986fd6104b4a1be21681b113b5ac9.

Cc: Linus Torvalds 
Signed-off-by: Hannes Frederic Sowa 
---
Sorry for delaying the net pull request!

 include/linux/compiler-gcc.h   |  4 
 include/linux/overflow-arith.h | 18 --
 net/ipv6/ip6_output.c  |  6 +-
 3 files changed, 1 insertion(+), 27 deletions(-)
 delete mode 100644 include/linux/overflow-arith.h

diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
index 82c159e..dfaa7b3 100644
--- a/include/linux/compiler-gcc.h
+++ b/include/linux/compiler-gcc.h
@@ -237,10 +237,6 @@
 #define KASAN_ABI_VERSION 3
 #endif
 
-#if GCC_VERSION >= 5
-#define CC_HAVE_BUILTIN_OVERFLOW
-#endif
-
 #endif /* gcc version >= 4 specific checks */
 
 #if !defined(__noclone)
diff --git a/include/linux/overflow-arith.h b/include/linux/overflow-arith.h
deleted file mode 100644
index e12ccf8..000
--- a/include/linux/overflow-arith.h
+++ /dev/null
@@ -1,18 +0,0 @@
-#pragma once
-
-#include 
-
-#ifdef CC_HAVE_BUILTIN_OVERFLOW
-
-#define overflow_usub __builtin_usub_overflow
-
-#else
-
-static inline bool overflow_usub(unsigned int a, unsigned int b,
-unsigned int *res)
-{
-   *res = a - b;
-   return *res > a ? true : false;
-}
-
-#endif
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 8dddb45..d03d6da 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -28,7 +28,6 @@
 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -585,10 +584,7 @@ int ip6_fragment(struct sock *sk, struct sk_buff *skb,
 		if (np->frag_size)
 			mtu = np->frag_size;
 	}
-
-	if (overflow_usub(mtu, hlen + sizeof(struct frag_hdr), &mtu) ||
-	    mtu <= 7)
-		goto fail_toobig;
+	mtu -= hlen + sizeof(struct frag_hdr);
 
 	frag_id = ipv6_select_ident(net, &ipv6_hdr(skb)->daddr,
 				    &ipv6_hdr(skb)->saddr);
-- 
2.4.3
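The file deleted by this revert provided overflow-checked unsigned subtraction. The sketch below keeps the deleted fallback's logic under a hypothetical name and adds a variant built on GCC's `__builtin_sub_overflow` (available from GCC 5; the reverted header used the unsigned-int form `__builtin_usub_overflow`). On other compilers the sketch simply reuses the fallback. Either way, a non-wrapping result can still be too small, which is why the reverted ip6_fragment() code also tested `mtu <= 7`.

```c
#include <assert.h>
#include <stdbool.h>

/* Same logic as the fallback deleted from overflow-arith.h: unsigned
 * subtraction wraps modulo 2^N, so a result larger than the minuend
 * means the subtraction underflowed. */
static inline bool usub_overflow_fallback(unsigned int a, unsigned int b,
					  unsigned int *res)
{
	*res = a - b;
	return *res > a;
}

#if defined(__GNUC__) && __GNUC__ >= 5
/* GCC 5+ builtin: stores the wrapped result and returns true if the
 * exact result does not fit in *res. */
static inline bool usub_overflow_builtin(unsigned int a, unsigned int b,
					 unsigned int *res)
{
	return __builtin_sub_overflow(a, b, res);
}
#else
/* Other compilers (Clang reports __GNUC__ == 4): use the portable one. */
#define usub_overflow_builtin usub_overflow_fallback
#endif
```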



[v6, 4/6] fsl/fman: Add FMan SP support

2015-10-28 Thread igal.liberman
From: Igal Liberman 

The Storage Profiles contain parameters that are used
by the FMan for frame reception and transmission.

Signed-off-by: Igal Liberman 
---
 drivers/net/ethernet/freescale/fman/Makefile  |2 +-
 drivers/net/ethernet/freescale/fman/fman_sp.c |  167 +
 drivers/net/ethernet/freescale/fman/fman_sp.h |  103 +++
 3 files changed, 271 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/freescale/fman/fman_sp.c
 create mode 100644 drivers/net/ethernet/freescale/fman/fman_sp.h

diff --git a/drivers/net/ethernet/freescale/fman/Makefile b/drivers/net/ethernet/freescale/fman/Makefile
index 43360d70..5141532 100644
--- a/drivers/net/ethernet/freescale/fman/Makefile
+++ b/drivers/net/ethernet/freescale/fman/Makefile
@@ -2,5 +2,5 @@ subdir-ccflags-y +=  -I$(srctree)/drivers/net/ethernet/freescale/fman
 
 obj-y  += fsl_fman.o fsl_fman_mac.o
 
-fsl_fman-objs  := fman_muram.o fman.o
+fsl_fman-objs  := fman_muram.o fman.o fman_sp.o
 fsl_fman_mac-objs := fman_dtsec.o fman_memac.o fman_tgec.o
diff --git a/drivers/net/ethernet/freescale/fman/fman_sp.c b/drivers/net/ethernet/freescale/fman/fman_sp.c
new file mode 100644
index 000..f36c622
--- /dev/null
+++ b/drivers/net/ethernet/freescale/fman/fman_sp.c
@@ -0,0 +1,167 @@
+/*
+ * Copyright 2008 - 2015 Freescale Semiconductor Inc.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in the
+ *   documentation and/or other materials provided with the distribution.
+ * * Neither the name of Freescale Semiconductor nor the
+ *   names of its contributors may be used to endorse or promote products
+ *   derived from this software without specific prior written permission.
+ *
+ *
+ * ALTERNATIVELY, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") as published by the Free Software
+ * Foundation, either version 2 of that License or (at your option) any
+ * later version.
+ *
+ * THIS SOFTWARE IS PROVIDED BY Freescale Semiconductor ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ * DISCLAIMED. IN NO EVENT SHALL Freescale Semiconductor BE LIABLE FOR ANY
+ * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include "fman_sp.h"
+#include "fman.h"
+
+void fman_sp_set_buf_pools_in_asc_order_of_buf_sizes(struct fman_ext_pools
+						     *fm_ext_pools,
+						     u8 *ordered_array,
+						     u16 *sizes_array)
+{
+	u16 buf_size = 0;
+	int i = 0, j = 0, k = 0;
+
+	/* First we copy the external buffers pools information
+	 * to an ordered local array
+	 */
+	for (i = 0; i < fm_ext_pools->num_of_pools_used; i++) {
+		/* get pool size */
+		buf_size = fm_ext_pools->ext_buf_pool[i].size;
+
+		/* keep sizes in an array according to poolId
+		 * for direct access
+		 */
+		sizes_array[fm_ext_pools->ext_buf_pool[i].id] = buf_size;
+
+		/* save poolId in an ordered array according to size */
+		for (j = 0; j <= i; j++) {
+			/* this is the next free place in the array */
+			if (j == i)
+				ordered_array[i] =
+				    fm_ext_pools->ext_buf_pool[i].id;
+			else {
+				/* find the right place for this poolId */
+				if (buf_size < sizes_array[ordered_array[j]]) {
+					/* move the pool_ids one place ahead
+					 * to make room for this poolId
+					 */
+					for (k = i; k > j; k--)
+						ordered_array[k] =
+   

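fman_sp_set_buf_pools_in_asc_order_of_buf_sizes() (cut off in this archive) produces two arrays: `sizes_array`, indexed by pool ID, and `ordered_array`, holding pool IDs by ascending buffer size, filled by shifting larger entries one slot ahead. That is an insertion sort. The standalone sketch below expresses the same idea as a conventional insertion sort; `struct ext_pool`, `MAX_POOLS` and `order_pools_by_size()` are hypothetical stand-ins, not the driver's fman_ext_pools types.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_POOLS 8	/* hypothetical upper bound on pool IDs */

/* Hypothetical stand-in for one entry of fman_ext_pools. */
struct ext_pool {
	uint8_t id;
	uint16_t size;
};

/* Fill sizes[] by pool ID and ordered[] with pool IDs sorted by
 * ascending buffer size. */
static void order_pools_by_size(const struct ext_pool *pools, int n,
				uint8_t *ordered, uint16_t *sizes)
{
	for (int i = 0; i < n; i++) {
		uint8_t id = pools[i].id;
		int j = i;

		assert(id < MAX_POOLS);
		sizes[id] = pools[i].size;

		/* Shift strictly larger entries one place ahead. */
		while (j > 0 && sizes[ordered[j - 1]] > pools[i].size) {
			ordered[j] = ordered[j - 1];
			j--;
		}
		ordered[j] = id;
	}
}
```

Because the comparison is strict, pools with equal buffer sizes keep their original order, which matches the driver's `buf_size < sizes_array[...]` test.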
Re: [Bug 106241] New: shutdown(3)/close(3) behaviour is incorrect for sockets in accept(3)

2015-10-28 Thread Eric Dumazet
On Wed, 2015-10-28 at 12:35 +, Al Viro wrote:
> [Linus and Dave added, Solaris and NetBSD folks dropped from Cc]
> 
> On Tue, Oct 27, 2015 at 05:13:56PM -0700, Eric Dumazet wrote:
> > On Tue, 2015-10-27 at 23:17 +, Al Viro wrote:
> > 
> > >   * [Linux-specific aside] our __alloc_fd() can degrade quite badly
> > > with some use patterns.  The cacheline pingpong in the bitmap is probably
> > > inevitable, unless we accept considerably heavier memory footprint,
> > > but we also have a case when alloc_fd() takes O(n) and it's _not_ hard
> > > to trigger - close(3);open(...); will have the next open() after that
> > > scanning the entire in-use bitmap.  I think I see a way to improve it
> > > without slowing the normal case down, but I'll need to experiment a
> > > bit before I post patches.  Anybody with examples of real-world loads
> > > that make our descriptor allocator degrade is very welcome to post
> > > the reproducers...
> > 
> > Well, I do have real-world loads, but quite hard to setup in a lab :(
> > 
> > Note that we also hit the 'struct cred'->usage refcount for every
> > open()/close()/sock_alloc(), and simply moving uid/gid out of the first
> > cache line really helps, as current_fsuid() and current_fsgid() no
> > longer force a pingpong.
> > 
> > I moved seldom-used fields onto the first cache line, so that overall
> > memory usage did not change (192 bytes on 64-bit arches)
> 
> [snip]
> 
> Makes sense, but there's a funny thing about that refcount - the part
> coming from ->f_cred is the most frequently changed *and* almost all
> places using ->f_cred are just looking at its fields and do not manipulate
> its refcount.  The only exception (do_process_acct()) is easy to eliminate
> just by storing a separate reference to the current creds of acct(2) caller
> and using it instead of looking at ->f_cred.  What's more, the place where we
> grab what will be ->f_cred is guaranteed to have a non-f_cred reference *and*
> most of the time such a reference is there for dropping ->f_cred (in
> file_free()/file_free_rcu()).
> 
> With that change in kernel/acct.c done, we could do the following:
>   a) split the cred refcount into the normal and percpu parts and
> add a spinlock in there.
>   b) have put_cred() do this:
>   if (atomic_dec_and_test(&cred->usage)) {
>       this_cpu_add(&cred->f_cred_usage, 1);
>       call_rcu(&cred->rcu, put_f_cred_rcu);
>   }
>   c) have get_empty_filp() increment current_cred ->f_cred_usage with
> this_cpu_add()
>   d) have file_free() do
>   percpu_counter_dec(&nr_files);
>   rcu_read_lock();
>   if (likely(atomic_read(&f->f_cred->usage))) {
>       this_cpu_add(&f->f_cred->f_cred_usage, -1);
>       rcu_read_unlock();
>       call_rcu(&f->f_u.fu_rcuhead, file_free_rcu_light);
>   } else {
>       rcu_read_unlock();
>       call_rcu(&f->f_u.fu_rcuhead, file_free_rcu);
>   }
> file_free_rcu() being
> static void file_free_rcu(struct rcu_head *head)
> {
>   struct file *f = container_of(head, struct file, f_u.fu_rcuhead);
>   put_f_cred(&f->f_cred->rcu);
>   kmem_cache_free(filp_cachep, f);
> }
> and file_free_rcu_light() - the same sans put_f_cred();
> 
> with put_f_cred() doing
>   spin_lock cred->lock
>   this_cpu_add(&cred->f_cred_usage, -1);
>   find the sum of cred->f_cred_usage
>   spin_unlock cred->lock
>   if the sum has not reached 0
>       return
>   current put_cred_rcu(cred)
> 
> IOW, let's try to get rid of cross-cpu stores in ->f_cred grabbing and
> (most of) ->f_cred dropping.
> 
> Note that there are two paths leading to put_f_cred() in the above - via
> call_rcu() on &cred->rcu and from file_free_rcu() called via call_rcu() on
> &f->f_u.fu_rcuhead.  Both are RCU-delayed and they can happen in parallel -
> different rcu_head are used.
> 
> atomic_read() check in file_free() might give false positives if it comes
> just before put_cred() on another CPU kills the last non-f_cred reference.
> It's not a problem, since put_f_cred() from that put_cred() won't be
> executed until we drop rcu_read_lock(), so we can safely decrement the
> cred->f_cred_usage without cred->lock here (and we are guaranteed that we won't
> be dropping the last of that - the same put_cred() would've incremented
> ->f_cred_usage).
> 
> Does anybody see problems with that approach?  I'm going to grab some sleep
> (only a couple of hours so far tonight ;-/), will cook an incremental to Eric's
> field-reordering patch when I get up...

Before I take a deep look at your suggestion, are you sure plain use of
include/linux/percpu-refcount.h infra is not possible for struct cred?

Thanks !
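The O(n) case Al describes for __alloc_fd() can be reproduced with a toy descriptor table. This is a simplified userspace model, not kernel code: it keeps one open flag per descriptor and a hint modeled on `next_fd` in struct files_struct, which is lowered on close and moved past each newly allocated descriptor, and it counts how many slots each search examines. `struct fd_table`, `MAX_FDS`, `model_alloc_fd()` and `model_close_fd()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_FDS 1024	/* hypothetical table size for the model */

/* Toy descriptor table: open flags plus a next_fd-style hint. */
struct fd_table {
	bool open[MAX_FDS];
	int next_fd;
	long scanned;	/* slots examined, to expose the O(n) case */
};

/* Lowest free descriptor at or above the hint, or -1 if full. */
static int model_alloc_fd(struct fd_table *t)
{
	for (int fd = t->next_fd; fd < MAX_FDS; fd++) {
		t->scanned++;
		if (!t->open[fd]) {
			t->open[fd] = true;
			t->next_fd = fd + 1;
			return fd;
		}
	}
	return -1;
}

/* Closing lowers the hint so the freed slot is reused first. */
static void model_close_fd(struct fd_table *t, int fd)
{
	assert(fd >= 0 && fd < MAX_FDS && t->open[fd]);
	t->open[fd] = false;
	if (fd < t->next_fd)
		t->next_fd = fd;
}
```

After close(3), reopening finds slot 3 immediately, but it leaves the hint at 4, so the next open() walks across every in-use slot before finding a free one.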

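The bookkeeping in Al's proposal (steps a) through d) plus put_f_cred()) can be sanity-checked in a single-threaded model. Everything here is hypothetical: there is no RCU, no spinlock and no real per-CPU storage, only an array indexed by a CPU number, and the `model_*` functions merely borrow their roles from the proposal. References held through ->f_cred live only in the per-CPU counters, other references in `usage`; the cred is freed once `usage` has dropped to zero and the per-CPU counters sum to zero. Individual per-CPU counters may go negative; only the sum is meaningful.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS_MODEL 4	/* hypothetical CPU count */

/* Toy version of the split refcount; not kernel code. */
struct model_cred {
	int usage;				/* non-f_cred references */
	long f_cred_usage[NR_CPUS_MODEL];	/* per-CPU ->f_cred refs */
	bool freed;
};

/* put_f_cred(): drop one per-CPU count, free if the total reaches 0. */
static void model_put_f_cred(struct model_cred *c, int cpu)
{
	long sum = 0;

	c->f_cred_usage[cpu]--;
	for (int i = 0; i < NR_CPUS_MODEL; i++)
		sum += c->f_cred_usage[i];
	if (sum == 0)
		c->freed = true;
}

/* put_cred(): on the last normal reference, add a +1 bias on this CPU
 * and hand over to put_f_cred() (RCU-delayed in the proposal). */
static void model_put_cred(struct model_cred *c, int cpu)
{
	assert(c->usage > 0);
	if (--c->usage == 0) {
		c->f_cred_usage[cpu]++;
		model_put_f_cred(c, cpu);
	}
}

/* get_empty_filp(): a new file takes a per-CPU reference only. */
static void model_get_empty_filp(struct model_cred *c, int cpu)
{
	c->f_cred_usage[cpu]++;
}

/* file_free(): cheap per-CPU decrement while normal refs remain,
 * otherwise the slow path that may free the cred. */
static void model_file_free(struct model_cred *c, int cpu)
{
	if (c->usage)
		c->f_cred_usage[cpu]--;
	else
		model_put_f_cred(c, cpu);
}
```

In this single-threaded model the +1 added in model_put_cred() is undone immediately. In the proposal the matching -1 runs only after an RCU grace period, and per Al's note that bias is what guarantees a racing file_free() on the light path never drops the last count.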


Re: [PATCH] xfrm: dst_entries_init() per-net dst_ops

2015-10-28 Thread Dan Streetman
On Tue, Oct 27, 2015 at 12:15 PM,   wrote:
> From: Dan Streetman 
>
> The ipv4 and ipv6 xfrms each create a template dst_ops object, and
> perform dst_entries_init() on the template objects.  Then each net
> namespace has its net.xfrm.xfrm[46]_dst_ops field set to the template
> values.  The problem with that is the dst_ops.pcpuc_entries field is
> a percpu counter and cannot be used correctly by simply copying it to
> another object.
>
> The result of this is a very subtle bug; changes to the dst entries
> counter from one net namespace may sometimes get applied to a different
> net namespace dst entries counter.  This is because of how the percpu
> counter works; it has a main count field as well as a pointer to the
> percpu variables.  Each net namespace maintains its own main count
> variable, but all point to one set of percpu variables.  When any net
> namespace happens to change one of the percpu variables to outside its
> small batch range, its count is moved to the net namespace's main count
> variable.  So with multiple net namespaces operating concurrently, the
> dst_ops entries counter can stray from the actual value that it should
> be; if counts are consistently moved from one net namespace to another
> (which my testing showed is likely), then one net namespace winds up
> with a negative dst_ops count (which is reported as 0) while another
> winds up with a continually increasing count, eventually reaching its
> gc_thresh limit, which causes all new traffic on the net namespace to
> fail with -ENOBUFS.
>
> This removes the dst_entries_init (and dst_entries_destroy) call for
> the template dst_ops objects; their counters will never be used.
> Instead dst_entries_init is called for each net namespace's dst_ops
> object, right after copying its values from the template, and

Well I'm not sure why my test kernel booted, while the test robot
found the bug of GFP_KERNEL percpu counter alloc during atomic
context.  Thanks test robot!

I'll update the patch and resend.


> dst_entries_destroy is called when the net namespace is removed.
>
> Signed-off-by: Dan Streetman 
> Signed-off-by: Dan Streetman 
> ---
>  net/ipv4/xfrm4_policy.c |  5 +++--
>  net/ipv6/xfrm6_policy.c | 10 --
>  net/xfrm/xfrm_policy.c  | 25 +++--
>  3 files changed, 30 insertions(+), 10 deletions(-)
>
> diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c
> index f2606b9..5f747ee 100644
> --- a/net/ipv4/xfrm4_policy.c
> +++ b/net/ipv4/xfrm4_policy.c
> @@ -235,6 +235,9 @@ static void xfrm4_dst_ifdown(struct dst_entry *dst, struct net_device *dev,
> xfrm_dst_ifdown(dst, dev);
>  }
>
> +/* This is used as a template only; the dst_entries counter is not
> + * initialized for this, but must be on per-net copies of this
> + */
>  static struct dst_ops xfrm4_dst_ops = {
> .family =   AF_INET,
> .gc =   xfrm4_garbage_collect,
> @@ -325,8 +328,6 @@ static void __init xfrm4_policy_init(void)
>
>  void __init xfrm4_init(void)
>  {
> -   dst_entries_init(&xfrm4_dst_ops);
> -
> xfrm4_state_init();
> xfrm4_policy_init();
> xfrm4_protocol_init();
> diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c
> index 2cc5840..b895ec1 100644
> --- a/net/ipv6/xfrm6_policy.c
> +++ b/net/ipv6/xfrm6_policy.c
> @@ -279,6 +279,9 @@ static void xfrm6_dst_ifdown(struct dst_entry *dst, struct net_device *dev,
> xfrm_dst_ifdown(dst, dev);
>  }
>
> +/* This is used as a template only; the dst_entries counter is not
> + * initialized for this, but must be on per-net copies of this
> + */
>  static struct dst_ops xfrm6_dst_ops = {
> .family =   AF_INET6,
> .gc =   xfrm6_garbage_collect,
> @@ -376,13 +379,9 @@ int __init xfrm6_init(void)
>  {
> int ret;
>
> -   dst_entries_init(&xfrm6_dst_ops);
> -
> ret = xfrm6_policy_init();
> -   if (ret) {
> -   dst_entries_destroy(&xfrm6_dst_ops);
> +   if (ret)
> goto out;
> -   }
> ret = xfrm6_state_init();
> if (ret)
> goto out_policy;
> @@ -411,5 +410,4 @@ void xfrm6_fini(void)
> xfrm6_protocol_fini();
> xfrm6_policy_fini();
> xfrm6_state_fini();
> -   dst_entries_destroy(&xfrm6_dst_ops);
>  }
> diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
> index 09bfcba..5381719 100644
> --- a/net/xfrm/xfrm_policy.c
> +++ b/net/xfrm/xfrm_policy.c
> @@ -2896,12 +2896,32 @@ static void __net_init xfrm_dst_ops_init(struct net *net)
>
> rcu_read_lock();
> afinfo = rcu_dereference(xfrm_policy_afinfo[AF_INET]);
> -   if (afinfo)
> +   if (afinfo) {
> net->xfrm.xfrm4_dst_ops = *afinfo->dst_ops;
> +   dst_entries_init(&net->xfrm.xfrm4_dst_ops);
> +   }
>  #if IS_ENABLED(CONFIG_IPV6)
> 
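The counter drift described in the patch can be reproduced with a toy version of a per-CPU counter: a global count plus a pointer to per-CPU deltas that are folded into the global count once they reach a batch size. `struct toy_counter`, `toy_add()`, `toy_sum()`, `BATCH` and `NR_CPUS_MODEL` are hypothetical and heavily simplified (no locking, no real per-CPU storage, arbitrary batch size). Copying the struct copies the pointer, so two "namespaces" end up sharing one set of per-CPU slots, and one namespace's pending delta gets folded into the other's global count.

```c
#include <assert.h>

#define NR_CPUS_MODEL 2	/* hypothetical CPU count */
#define BATCH 32	/* hypothetical batch size */

/* Toy per-CPU counter: global count plus a pointer to per-CPU deltas.
 * A struct copy shares the per-CPU slots, as in the bug. */
struct toy_counter {
	long count;
	long *pcpu;	/* NR_CPUS_MODEL slots */
};

/* Keep small deltas per CPU; fold them into ->count at +/-BATCH. */
static void toy_add(struct toy_counter *c, int cpu, long amount)
{
	long v;

	assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
	v = c->pcpu[cpu] + amount;
	if (v >= BATCH || v <= -BATCH) {
		c->count += v;
		c->pcpu[cpu] = 0;
	} else {
		c->pcpu[cpu] = v;
	}
}

/* Exact value: global count plus every per-CPU delta. */
static long toy_sum(const struct toy_counter *c)
{
	long s = c->count;

	for (int i = 0; i < NR_CPUS_MODEL; i++)
		s += c->pcpu[i];
	return s;
}
```

The second half of the test mirrors the approach of the patch: every copy gets its own per-CPU storage, which in the kernel corresponds to calling dst_entries_init() on each per-net dst_ops copy rather than on the template.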

Re: [PATCH v7 02/10] ss: created formatters for json and hr

2015-10-28 Thread Matthias Tafelmeier
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA384

> I did not take over maintenance responsibility (whatever that means
> to you precisely). I merely reviewed the patches, focussing on the 
> technical aspects of both implementation and patch management.

Ah, I meant the maintenance of iproute2 as a whole. Though obviously I
must have misunderstood that.

>> Those resentments were related to the patchset's complexity and
>> size.
> 
> I didn't see any problem with that in the first place. It is indeed
> a big change, achieving something like that without a big patch set
> is unlikely.
> 

Fine, I was just reiterating that since Stephen Hemminger raised it.
My reasoning here is that I just don't want to start reworking things
with objections still on people's minds, since we are already at v7
now.

-BEGIN PGP SIGNATURE-
Version: GnuPG v1

iQEcBAEBCQAGBQJWMLgnAAoJEOAWT1uK3zQ7dMkH/jHps8no3c23LRXGnVaX08Ap
Eha6XWU9pHrCHAM2AF6XI8aKERjS00ycuC12rFKoPZC2sjSXv4PTGFJq9w8AF71K
os5PPi1iZRFQ/0tti7pMkGTmUwRrtHmdfGNKvu79oRJfADaqaNtpZV+4UiS2bPCP
jy+89mA02XXgJpNkJgG/md6wNFHEsJBUGtcx3KSWqYXHHpV2FJoN1H8P28ESVAJA
H2o1De6g7XBbSpigiHX8X69CkzjZor5cYyWF6W5lUNXhGCQ4xqmGJycNKjM3Et/g
OPXvcaRKwv2R06pSYzkQ17tsnm9u8+R/v3nQvFDJGD0+zZJsc+c2by2KTQt6qm4=
=ONCK
-END PGP SIGNATURE-


Re: BUG: fsl FEC ethernet tx checksum offloading doesn't work with RMII interface

2015-10-28 Thread David Jander
On Wed, 28 Oct 2015 10:31:17 -0200
Fabio Estevam  wrote:

> On Wed, Oct 28, 2015 at 9:19 AM, David Jander  wrote:
> 
> > Sorry, I somehow assumed it was obvious I'd report against latest
> > mainline... I'm on 4.3-rc7.
> 
> Are you able to find out a previous kernel version that does not
> exhibit this failure?

I can search further down, but 4.1 is also broken.
Are there specific changes or versions you are suspicious of?
Russell mentioned something similar being fixed in the past... any pointers to
this fix, so I can investigate whether it is related?

Best regards,

-- 
David Jander
Protonic Holland.


Re: BUG: fsl FEC ethernet tx checksum offloading doesn't work with RMII interface

2015-10-28 Thread Fabio Estevam
On Wed, Oct 28, 2015 at 9:19 AM, David Jander  wrote:

> Sorry, I somehow assumed it was obvious I'd report against latest mainline...
> I'm on 4.3-rc7.

Are you able to find out a previous kernel version that does not
exhibit this failure?


[v6, 6/6] fsl/fman: Add FMan MAC driver

2015-10-28 Thread igal.liberman
From: Igal Liberman 

This patch adds the Ethernet MAC driver supporting the three
different types of MACs: dTSEC, tGEC and mEMAC.

Signed-off-by: Igal Liberman 
---
 drivers/net/ethernet/freescale/fman/Makefile |3 +-
 drivers/net/ethernet/freescale/fman/mac.c|  980 ++
 drivers/net/ethernet/freescale/fman/mac.h|   97 +++
 3 files changed, 1079 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/freescale/fman/mac.c
 create mode 100644 drivers/net/ethernet/freescale/fman/mac.h

diff --git a/drivers/net/ethernet/freescale/fman/Makefile b/drivers/net/ethernet/freescale/fman/Makefile
index 2eb0b9b..51fd2e6 100644
--- a/drivers/net/ethernet/freescale/fman/Makefile
+++ b/drivers/net/ethernet/freescale/fman/Makefile
@@ -1,6 +1,7 @@
 subdir-ccflags-y +=  -I$(srctree)/drivers/net/ethernet/freescale/fman
 
-obj-y  += fsl_fman.o fsl_fman_mac.o
+obj-y  += fsl_fman.o fsl_fman_mac.o fsl_mac.o
 
 fsl_fman-objs  := fman_muram.o fman.o fman_sp.o fman_port.o
 fsl_fman_mac-objs := fman_dtsec.o fman_memac.o fman_tgec.o
+fsl_mac-objs += mac.o
diff --git a/drivers/net/ethernet/freescale/fman/mac.c b/drivers/net/ethernet/freescale/fman/mac.c
new file mode 100644
index 000..17a5a5c
--- /dev/null
+++ b/drivers/net/ethernet/freescale/fman/mac.c
@@ -0,0 +1,980 @@
+/* Copyright 2008-2015 Freescale Semiconductor, Inc.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ * * Redistributions of source code must retain the above copyright
+ *  notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *  notice, this list of conditions and the following disclaimer in the
+ *  documentation and/or other materials provided with the distribution.
+ * * Neither the name of Freescale Semiconductor nor the
+ *  names of its contributors may be used to endorse or promote products
+ *  derived from this software without specific prior written permission.
+ *
+ *
+ * ALTERNATIVELY, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") as published by the Free Software
+ * Foundation, either version 2 of that License or (at your option) any
+ * later version.
+ *
+ * THIS SOFTWARE IS PROVIDED BY Freescale Semiconductor ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ * DISCLAIMED. IN NO EVENT SHALL Freescale Semiconductor BE LIABLE FOR ANY
+ * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "mac.h"
+#include "fman_mac.h"
+#include "fman_dtsec.h"
+#include "fman_tgec.h"
+#include "fman_memac.h"
+
+#define MAC_DESCRIPTION "FSL FMan MAC API based driver"
+
+MODULE_LICENSE("Dual BSD/GPL");
+
+MODULE_AUTHOR("Emil Medve ");
+
+MODULE_DESCRIPTION(MAC_DESCRIPTION);
+
+struct mac_priv_s {
+	struct device			*dev;
+	void __iomem			*vaddr;
+	u8				cell_index;
+	phy_interface_t			phy_if;
+	struct fman			*fman;
+	struct device_node		*phy_node;
+	/* List of multicast addresses */
+	struct list_head		mc_addr_list;
+	struct platform_device		*eth_dev;
+	struct fixed_phy_status		*fixed_link;
+	u16				speed;
+	u16				max_speed;
+
+	int (*enable)(struct fman_mac *mac_dev, enum comm_mode mode);
+	int (*disable)(struct fman_mac *mac_dev, enum comm_mode mode);
+};
+
+struct mac_address {
+   u8 addr[ETH_ALEN];
+   struct list_head list;
+};
+
+static void mac_exception(void *_mac_dev, enum fman_mac_exceptions ex)
+{
+   struct mac_device   *mac_dev;
+   struct mac_priv_s   *priv;
+
+   mac_dev = (struct mac_device *)_mac_dev;
+   priv = mac_dev->priv;
+
+   if (ex == FM_MAC_EX_10G_RX_FIFO_OVFL) {
+   /* don't flag RX FIFO after the first */
+   mac_dev->set_exception(mac_dev->fman_mac,
+  

Re: [RFC PATCH net-next 1/4] perf tools: Enable pre-event inherit setting by config terms

2015-10-28 Thread Jiri Olsa
On Wed, Oct 28, 2015 at 10:55:02AM +, Wang Nan wrote:

SNIP

> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> index f820906..397fb4e 100644
> --- a/tools/perf/util/evsel.c
> +++ b/tools/perf/util/evsel.c
> @@ -653,6 +653,15 @@ static void apply_config_terms(struct perf_evsel *evsel,
>   case PERF_EVSEL__CONFIG_TERM_STACK_USER:
>   dump_size = term->val.stack_user;
>   break;
> + case PERF_EVSEL__CONFIG_TERM_INHERIT:
> + /*
> +  * attr->inherit should have already been set by
> +  * perf_evsel__config. If user explicitly set
> +  * inherit using config terms, override global
> +  * opt->no_inherit setting.
> +  */
> + attr->inherit = term->val.inherit ? 1 : 0;
> + break;
>   default:
>   break;
>   }
> diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
> index 9a95e73..e402f83 100644
> --- a/tools/perf/util/evsel.h
> +++ b/tools/perf/util/evsel.h
> @@ -43,6 +43,7 @@ enum {
>   PERF_EVSEL__CONFIG_TERM_TIME,
>   PERF_EVSEL__CONFIG_TERM_CALLGRAPH,
>   PERF_EVSEL__CONFIG_TERM_STACK_USER,
> + PERF_EVSEL__CONFIG_TERM_INHERIT,
>   PERF_EVSEL__CONFIG_TERM_MAX,
>  };
>  
> @@ -55,6 +56,7 @@ struct perf_evsel_config_term {
>   booltime;
>   char*callgraph;
>   u64 stack_user;
> + u64 inherit;

seems like bool would be enough

jirka
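The precedence the patch sets up (global default first, per-event config term last) and the suggestion to store the value as a bool can be sketched with trimmed-down stand-ins. `struct model_attr`, `struct model_term` and `model_config()` are hypothetical; the sketch assumes, as the patch comment says, that the global `no_inherit` option has already been applied before config terms are processed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-ins for the perf structures. */
struct model_attr {
	unsigned int inherit : 1;
};

struct model_term {
	bool has_inherit;	/* user wrote inherit=... for this event */
	bool inherit;		/* a bool is enough, as suggested in review */
};

/* Apply the global default first, then let a per-event term override. */
static void model_config(struct model_attr *attr, bool no_inherit,
			 const struct model_term *term)
{
	attr->inherit = !no_inherit;
	if (term != NULL && term->has_inherit)
		attr->inherit = term->inherit;
}
```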


Re: [RESEND PATCH 07/10] net: wireless: iwlegacy: Remove unneeded variable ret

2015-10-28 Thread Sergei Shtylyov

Hello.

On 10/27/2015 10:02 PM, Punit Vara wrote:


This patch to the 3945-mac.c file fixes the following warning
reported by coccicheck:

drivers/net/wireless/iwlegacy/3945-mac.c:247:5-8: Unneeded variable:
"ret". Return "- EOPNOTSUPP" on line 249

Return -EOPNOTSUPP directly instead of returning it via ret

Signed-off-by: Punit Vara 
---
  drivers/net/wireless/iwlegacy/3945-mac.c | 5 +
  1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/net/wireless/iwlegacy/3945-mac.c b/drivers/net/wireless/iwlegacy/3945-mac.c
index af1b3e6..ff4dc44 100644
--- a/drivers/net/wireless/iwlegacy/3945-mac.c
+++ b/drivers/net/wireless/iwlegacy/3945-mac.c
@@ -244,9 +244,7 @@ il3945_set_dynamic_key(struct il_priv *il, struct ieee80211_key_conf *keyconf,
  static int
  il3945_remove_static_key(struct il_priv *il)
  {
-   int ret = -EOPNOTSUPP;
-
-   return ret;
+   return -EOPNOTSUPP;
  }

  static int
@@ -529,7 +527,6 @@ il3945_tx_skb(struct il_priv *il,
if (unlikely(tid >= MAX_TID_COUNT))
goto drop;
}
-


   Unrelated white space change.

MBR, Sergei



Re: [PATCH] xfrm: dst_entries_init() per-net dst_ops

2015-10-28 Thread David Miller
From: Dan Streetman 
Date: Wed, 28 Oct 2015 09:32:47 -0400

> Well I'm not sure why my test kernel booted, while the test robot
> found the bug of GFP_KERNEL percpu counter alloc during atomic
> context.  Thanks test robot!

It's because of the kernel config options you (don't) have
enabled.



Re: [GIT] Networking

2015-10-28 Thread David Miller
From: Linus Torvalds 
Date: Wed, 28 Oct 2015 18:39:56 +0900

> Get rid of it. And I don't *ever* want to see that shit again.

No problem, I'll revert it all.

I asked Hannes to repost his patches to linux-kernel hoping someone
would review and say it stunk or not, give him some feedback, or
whatever, and nobody reviewed the changes at all...



Re: [PATCH 3/9] powerpc32: checksum_wrappers_64 becomes checksum_wrappers

2015-10-28 Thread Anton Blanchard
Hi Scott,

> I wonder why it was 64-bit specific in the first place.

I think it was part of a series where I added my 64-bit assembly checksum
routines, and I didn't step back and think that the wrapper code would
be useful on 32-bit.

Anton


Re: [RESEND PATCH 07/10] net: wireless: iwlegacy: Remove unneeded variable ret

2015-10-28 Thread Sergei Shtylyov

On 10/28/2015 4:35 PM, Sergei Shtylyov wrote:


This patch to the 3945-mac.c file fixes the following warning
reported by coccicheck:

drivers/net/wireless/iwlegacy/3945-mac.c:247:5-8: Unneeded variable:
"ret". Return "- EOPNOTSUPP" on line 249

Return -EOPNOTSUPP directly instead of returning it via ret

Signed-off-by: Punit Vara 
---
  drivers/net/wireless/iwlegacy/3945-mac.c | 5 +
  1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/net/wireless/iwlegacy/3945-mac.c b/drivers/net/wireless/iwlegacy/3945-mac.c
index af1b3e6..ff4dc44 100644
--- a/drivers/net/wireless/iwlegacy/3945-mac.c
+++ b/drivers/net/wireless/iwlegacy/3945-mac.c
@@ -244,9 +244,7 @@ il3945_set_dynamic_key(struct il_priv *il, struct ieee80211_key_conf *keyconf,
  static int
  il3945_remove_static_key(struct il_priv *il)
  {
-int ret = -EOPNOTSUPP;
-
-return ret;
+return -EOPNOTSUPP;
  }

  static int
@@ -529,7 +527,6 @@ il3945_tx_skb(struct il_priv *il,
  if (unlikely(tid >= MAX_TID_COUNT))
  goto drop;
  }
-


Unrelated white space change.


   And I've already complained about it! Please remove this hunk.

MBR, Sergei



[v6, 1/6] fsl/fman: Add FMan MURAM support

2015-10-28 Thread igal.liberman
From: Igal Liberman 

Add Frame Manager Multi-User RAM (MURAM) support.
This internal FMan memory block is used by the
FMan hardware modules and is managed through the
kernel's generic allocator.

The FMan Internal memory, for example, is used for
allocating transmit and receive FIFOs.

Signed-off-by: Igal Liberman 
---
 drivers/net/ethernet/freescale/Kconfig   |1 +
 drivers/net/ethernet/freescale/Makefile  |2 +
 drivers/net/ethernet/freescale/fman/Kconfig  |8 ++
 drivers/net/ethernet/freescale/fman/Makefile |5 +
 drivers/net/ethernet/freescale/fman/fman_muram.c |  159 ++
 drivers/net/ethernet/freescale/fman/fman_muram.h |   51 +++
 6 files changed, 226 insertions(+)
 create mode 100644 drivers/net/ethernet/freescale/fman/Kconfig
 create mode 100644 drivers/net/ethernet/freescale/fman/Makefile
 create mode 100644 drivers/net/ethernet/freescale/fman/fman_muram.c
 create mode 100644 drivers/net/ethernet/freescale/fman/fman_muram.h

diff --git a/drivers/net/ethernet/freescale/Kconfig b/drivers/net/ethernet/freescale/Kconfig
index ff76d4e..f3f89cc 100644
--- a/drivers/net/ethernet/freescale/Kconfig
+++ b/drivers/net/ethernet/freescale/Kconfig
@@ -53,6 +53,7 @@ config FEC_MPC52xx_MDIO
  If compiled as module, it will be called fec_mpc52xx_phy.
 
 source "drivers/net/ethernet/freescale/fs_enet/Kconfig"
+source "drivers/net/ethernet/freescale/fman/Kconfig"
 
 config FSL_PQ_MDIO
tristate "Freescale PQ MDIO"
diff --git a/drivers/net/ethernet/freescale/Makefile b/drivers/net/ethernet/freescale/Makefile
index 71debd1..4097c58 100644
--- a/drivers/net/ethernet/freescale/Makefile
+++ b/drivers/net/ethernet/freescale/Makefile
@@ -17,3 +17,5 @@ gianfar_driver-objs := gianfar.o \
gianfar_ethtool.o
 obj-$(CONFIG_UCC_GETH) += ucc_geth_driver.o
 ucc_geth_driver-objs := ucc_geth.o ucc_geth_ethtool.o
+
+obj-$(CONFIG_FSL_FMAN) += fman/
diff --git a/drivers/net/ethernet/freescale/fman/Kconfig b/drivers/net/ethernet/freescale/fman/Kconfig
new file mode 100644
index 000..66b7296
--- /dev/null
+++ b/drivers/net/ethernet/freescale/fman/Kconfig
@@ -0,0 +1,8 @@
+config FSL_FMAN
+   bool "FMan support"
+   depends on FSL_SOC || COMPILE_TEST
+   select GENERIC_ALLOCATOR
+   default n
+   help
+   Freescale Data-Path Acceleration Architecture Frame Manager
+   (FMan) support
diff --git a/drivers/net/ethernet/freescale/fman/Makefile b/drivers/net/ethernet/freescale/fman/Makefile
new file mode 100644
index 000..fc2e194
--- /dev/null
+++ b/drivers/net/ethernet/freescale/fman/Makefile
@@ -0,0 +1,5 @@
+subdir-ccflags-y +=  -I$(srctree)/drivers/net/ethernet/freescale/fman
+
+obj-y  += fsl_fman.o
+
+fsl_fman-objs  := fman_muram.o
diff --git a/drivers/net/ethernet/freescale/fman/fman_muram.c b/drivers/net/ethernet/freescale/fman/fman_muram.c
new file mode 100644
index 000..35d4a50
--- /dev/null
+++ b/drivers/net/ethernet/freescale/fman/fman_muram.c
@@ -0,0 +1,159 @@
+/*
+ * Copyright 2008-2015 Freescale Semiconductor Inc.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in the
+ *   documentation and/or other materials provided with the distribution.
+ * * Neither the name of Freescale Semiconductor nor the
+ *   names of its contributors may be used to endorse or promote products
+ *   derived from this software without specific prior written permission.
+ *
+ *
+ * ALTERNATIVELY, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") as published by the Free Software
+ * Foundation, either version 2 of that License or (at your option) any
+ * later version.
+ *
+ * THIS SOFTWARE IS PROVIDED BY Freescale Semiconductor ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ * DISCLAIMED. IN NO EVENT SHALL Freescale Semiconductor BE LIABLE FOR ANY
+ * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include "fman_muram.h"
+
+#include 
+#include 
+#include 

[v6, 3/6] fsl/fman: Add FMan MAC support

2015-10-28 Thread igal.liberman
From: Igal Liberman 

Add the Data Path Acceleration Architecture Frame Manager MAC support.
This patch adds the FMan MAC configuration, initialization and
runtime control routines.
It supports the following types of MACs:
- dTSEC: Three-speed Ethernet controller (10/100/1000 Mbps)
- tGEC: 10G Ethernet controller (10 Gbps)
- mEMAC: Multi-rate Ethernet MAC (10/100/1000/10000 Mbps)
Different FMan revisions have different types and numbers of MACs.

Signed-off-by: Igal Liberman 
---
 drivers/net/ethernet/freescale/fman/Makefile   |3 +-
 .../net/ethernet/freescale/fman/crc_mac_addr_ext.h |  314 
 drivers/net/ethernet/freescale/fman/fman_dtsec.c   | 1608 
 drivers/net/ethernet/freescale/fman/fman_dtsec.h   |   59 +
 drivers/net/ethernet/freescale/fman/fman_mac.h |  276 
 drivers/net/ethernet/freescale/fman/fman_memac.c   | 1307 
 drivers/net/ethernet/freescale/fman/fman_memac.h   |   60 +
 drivers/net/ethernet/freescale/fman/fman_tgec.c|  798 ++
 drivers/net/ethernet/freescale/fman/fman_tgec.h|   55 +
 9 files changed, 4479 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/freescale/fman/crc_mac_addr_ext.h
 create mode 100644 drivers/net/ethernet/freescale/fman/fman_dtsec.c
 create mode 100644 drivers/net/ethernet/freescale/fman/fman_dtsec.h
 create mode 100644 drivers/net/ethernet/freescale/fman/fman_mac.h
 create mode 100644 drivers/net/ethernet/freescale/fman/fman_memac.c
 create mode 100644 drivers/net/ethernet/freescale/fman/fman_memac.h
 create mode 100644 drivers/net/ethernet/freescale/fman/fman_tgec.c
 create mode 100644 drivers/net/ethernet/freescale/fman/fman_tgec.h

diff --git a/drivers/net/ethernet/freescale/fman/Makefile b/drivers/net/ethernet/freescale/fman/Makefile
index fb5a7f0..43360d70 100644
--- a/drivers/net/ethernet/freescale/fman/Makefile
+++ b/drivers/net/ethernet/freescale/fman/Makefile
@@ -1,5 +1,6 @@
 subdir-ccflags-y +=  -I$(srctree)/drivers/net/ethernet/freescale/fman
 
-obj-y  += fsl_fman.o
+obj-y  += fsl_fman.o fsl_fman_mac.o
 
 fsl_fman-objs  := fman_muram.o fman.o
+fsl_fman_mac-objs := fman_dtsec.o fman_memac.o fman_tgec.o
diff --git a/drivers/net/ethernet/freescale/fman/crc_mac_addr_ext.h b/drivers/net/ethernet/freescale/fman/crc_mac_addr_ext.h
new file mode 100644
index 000..92f2e87
--- /dev/null
+++ b/drivers/net/ethernet/freescale/fman/crc_mac_addr_ext.h
@@ -0,0 +1,314 @@
+/*
+ * Copyright 2008-2015 Freescale Semiconductor Inc.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in the
+ *   documentation and/or other materials provided with the distribution.
+ * * Neither the name of Freescale Semiconductor nor the
+ *   names of its contributors may be used to endorse or promote products
+ *   derived from this software without specific prior written permission.
+ *
+ *
+ * ALTERNATIVELY, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") as published by the Free Software
+ * Foundation, either version 2 of that License or (at your option) any
+ * later version.
+ *
+ * THIS SOFTWARE IS PROVIDED BY Freescale Semiconductor ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ * DISCLAIMED. IN NO EVENT SHALL Freescale Semiconductor BE LIABLE FOR ANY
+ * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/* Define a macro that calculates the CRC value of an Ethernet MAC address
+ * (48-bit address)
+ */
+
+#ifndef __crc_mac_addr_ext_h
+#define __crc_mac_addr_ext_h
+
+#include 
+
+static u32 crc_table[256] = {
+   0x,
+   0x77073096,
+   0xee0e612c,
+   0x990951ba,
+   0x076dc419,
+   0x706af48f,
+   0xe963a535,
+   0x9e6495a3,
+   0x0edb8832,
+   0x79dcb8a4,
+   0xe0d5e91e,
+   0x97d2d988,
+   0x09b64c2b,
+   0x7eb17cbd,
+   0xe7b82d07,
+   0x90bf1d91,
+   0x1db71064,
+   0x6ab020f2,
+   0xf3b97148,
+   

[v6, 2/6] fsl/fman: Add FMan support

2015-10-28 Thread igal.liberman
From: Igal Liberman 

Add the Data Path Acceleration Architecture Frame Manager driver.
The FMan embeds a series of hardware blocks that implement a group
of Ethernet interfaces. This patch adds the FMan configuration,
initialization and runtime control routines.

The FMan driver supports several hardware versions,
differentiated by things like:
- Different types of MACs
- Number of MACs and ports
- Available resources
- Different hardware errata

Signed-off-by: Igal Liberman 
---
 drivers/net/ethernet/freescale/fman/Makefile |2 +-
 drivers/net/ethernet/freescale/fman/fman.c   | 2896 ++
 drivers/net/ethernet/freescale/fman/fman.h   |  329 +++
 3 files changed, 3226 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/freescale/fman/fman.c
 create mode 100644 drivers/net/ethernet/freescale/fman/fman.h

diff --git a/drivers/net/ethernet/freescale/fman/Makefile b/drivers/net/ethernet/freescale/fman/Makefile
index fc2e194..fb5a7f0 100644
--- a/drivers/net/ethernet/freescale/fman/Makefile
+++ b/drivers/net/ethernet/freescale/fman/Makefile
@@ -2,4 +2,4 @@ subdir-ccflags-y +=  -I$(srctree)/drivers/net/ethernet/freescale/fman
 
 obj-y  += fsl_fman.o
 
-fsl_fman-objs  := fman_muram.o
+fsl_fman-objs  := fman_muram.o fman.o
diff --git a/drivers/net/ethernet/freescale/fman/fman.c b/drivers/net/ethernet/freescale/fman/fman.c
new file mode 100644
index 000..c8923c68
--- /dev/null
+++ b/drivers/net/ethernet/freescale/fman/fman.c
@@ -0,0 +1,2896 @@
+/*
+ * Copyright 2008-2015 Freescale Semiconductor Inc.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in the
+ *   documentation and/or other materials provided with the distribution.
+ * * Neither the name of Freescale Semiconductor nor the
+ *   names of its contributors may be used to endorse or promote products
+ *   derived from this software without specific prior written permission.
+ *
+ *
+ * ALTERNATIVELY, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") as published by the Free Software
+ * Foundation, either version 2 of that License or (at your option) any
+ * later version.
+ *
+ * THIS SOFTWARE IS PROVIDED BY Freescale Semiconductor ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ * DISCLAIMED. IN NO EVENT SHALL Freescale Semiconductor BE LIABLE FOR ANY
+ * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include "fman.h"
+#include "fman_muram.h"
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/* General defines */
+#define FMAN_LIODN_TBL 64  /* size of LIODN table */
+#define MAX_NUM_OF_MACS 10
+#define FM_NUM_OF_FMAN_CTRL_EVENT_REGS 4
+#define BASE_RX_PORTID 0x08
+#define BASE_TX_PORTID 0x28
+
+/* Modules registers offsets */
+#define BMI_OFFSET 0x0008
+#define QMI_OFFSET 0x00080400
+#define DMA_OFFSET 0x000C2000
+#define FPM_OFFSET 0x000C3000
+#define IMEM_OFFSET 0x000C4000
+#define CGP_OFFSET 0x000DB000
+
+/* Exceptions bit map */
+#define EX_DMA_BUS_ERROR   0x8000
+#define EX_DMA_READ_ECC 0x4000
+#define EX_DMA_SYSTEM_WRITE_ECC 0x2000
+#define EX_DMA_FM_WRITE_ECC 0x1000
+#define EX_FPM_STALL_ON_TASKS  0x0800
+#define EX_FPM_SINGLE_ECC  0x0400
+#define EX_FPM_DOUBLE_ECC  0x0200
+#define EX_QMI_SINGLE_ECC  0x0100
+#define EX_QMI_DEQ_FROM_UNKNOWN_PORTID 0x0080
+#define EX_QMI_DOUBLE_ECC  0x0040
+#define EX_BMI_LIST_RAM_ECC 0x0020
+#define EX_BMI_STORAGE_PROFILE_ECC 0x0010
+#define EX_BMI_STATISTICS_RAM_ECC  0x0008
+#define EX_IRAM_ECC 0x0004
+#define EX_MURAM_ECC  

Re: [RFC PATCH net-next 2/4] perf tools: Introduce bpf-output event

2015-10-28 Thread Sergei Shtylyov

Hello.

On 10/28/2015 1:55 PM, Wang Nan wrote:


Commit a43eec304259a6c637f4014a6d4767159b6a3aa3 (bpf: introduce
bpf_perf_event_output() helper) added a helper to enable BPF programs to


   You haven't run the patch thru scripts/checkpatch.pl, I guess? It now 
enforces a certain style of citing a commit.



output data to the perf ring buffer through a new type of perf event,
PERF_COUNT_SW_BPF_OUTPUT. This patch enables perf to create perf
events of that type. Now perf users can use the following command line to
receive output data from BPF programs:

  # perf record -a -e evt=bpf-output/no-inherit/ \
   -e ./test_bpf_output.c/maps.bpf-output.event=evt/ ls

  # perf script
perf 12927 [004] 355971.129276:  0 evt=bpf-output/no-inherit/:  811ed5f1 sys_write
perf 12927 [004] 355971.129279:  0 evt=bpf-output/no-inherit/:  811ed5f1 sys_write
...

Signed-off-by: Wang Nan 
Cc: Alexei Starovoitov 
Cc: Arnaldo Carvalho de Melo 
Cc: Brendan Gregg 
Cc: David S. Miller 
---
  tools/perf/util/evsel.c| 6 ++
  tools/perf/util/parse-events.c | 4 
  tools/perf/util/parse-events.l | 1 +
  3 files changed, 11 insertions(+)

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 397fb4e..f01defb 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -224,6 +224,12 @@ struct perf_evsel *perf_evsel__new_idx(struct perf_event_attr *attr, int idx)
if (evsel != NULL)
perf_evsel__init(evsel, attr, idx);

+   if ((evsel->attr.type == PERF_TYPE_SOFTWARE) &&
+   (evsel->attr.config == PERF_COUNT_SW_BPF_OUTPUT)) {


   Inner parens not necessary here.

[...]

MBR, Sergei



Re: [PATCH net] amd-xgbe: Fix race between access of desc and desc index

2015-10-28 Thread Tom Lendacky

On 10/27/2015 09:50 PM, David Miller wrote:

From: Tom Lendacky 
Date: Mon, 26 Oct 2015 17:13:54 -0500


During Tx cleanup it's still possible for the descriptor data to be
read ahead of the descriptor index. A memory barrier is required between
the read of the descriptor index and the start of the Tx cleanup loop.
This allows a change to a lighter-weight barrier in the Tx transmit
routine just before updating the current descriptor index.

Since the memory barrier does result in extra overhead on arm64, keep
the previous change to not chase the current descriptor value. This
prevents the execution of the barrier for each loop performed.

Suggested-by: Alexander Duyck 
Signed-off-by: Tom Lendacky 


Applied, thanks.



Thanks David.  Could you queue this up for the 4.1 and 4.2 stable
trees?

Thanks,
Tom


Re: BUG: fsl FEC ethernet tx checksum offloading doesn't work with RMII interface

2015-10-28 Thread Russell King - ARM Linux
On Wed, Oct 28, 2015 at 10:48:54AM +0100, David Jander wrote:
> 
> Hi all,
> 
> I was unable to figure out who's maintaining
> drivers/net/ethernet/freescale/fec_main.c, so I hope someone can help out on
> this list...
> 
> We have a board with a RMII phy connected to an i.MX6S. The hardware seems to
> be ok, since I can receive and transmit ethernet frames without drops or
> errors. However only simple things like ping and dhcp seemed to work. TCP/IP
> connections could not be made. When looking at both ends with tcpdump, I
> realized that all transmitted packets arrived at the other end with the TCP
> and IP header checksums zeroed-out.
> 
> After issuing the following command, TCP/IP started working correctly:
> 
> $ ethtool --offload eth0 tx off
> 
> This works around the issue. For some reason, when the FEC is in RMII mode, it
> isn't filling in the checksums.
> 
> On another board with an RGMII phy the same kernel works fine without the need
> to disable offloading. What can possibly relate this functionality to the
> choice of MAC interface?

You don't mention which kernel version you're using.  There has been a bug
here with older kernels...

-- 
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


Re: BUG: fsl FEC ethernet tx checksum offloading doesn't work with RMII interface

2015-10-28 Thread David Jander
On Wed, 28 Oct 2015 11:14:14 +0000
Russell King - ARM Linux  wrote:

> On Wed, Oct 28, 2015 at 10:48:54AM +0100, David Jander wrote:
> > 
> > Hi all,
> > 
> > I was unable to figure out who's maintaining
> > drivers/net/ethernet/freescale/fec_main.c, so I hope someone can help out
> > on this list...
> > 
> > We have a board with a RMII phy connected to an i.MX6S. The hardware seems
> > to be ok, since I can receive and transmit ethernet frames without drops or
> > errors. However only simple things like ping and dhcp seemed to work.
> > TCP/IP connections could not be made. When looking at both ends with
> > tcpdump, I realized that all transmitted packets arrived at the other end
> > with the TCP and IP header checksums zeroed-out.
> > 
> > After issuing the following command, TCP/IP started working correctly:
> > 
> > $ ethtool --offload eth0 tx off
> > 
> > This works around the issue. For some reason, when the FEC is in RMII
> > mode, it isn't filling in the checksums.
> > 
> > On another board with an RGMII phy the same kernel works fine without the
> > need to disable offloading. What can possibly relate this functionality to
> > the choice of MAC interface?
> 
> You don't mention which kernel version you're using.  There has been a bug
> here with older kernels...

Sorry, I somehow assumed it was obvious I'd report against latest mainline...
I'm on 4.3-rc7.

Best regards,

-- 
David Jander
Protonic Holland.


Re: [Bug 106241] New: shutdown(3)/close(3) behaviour is incorrect for sockets in accept(3)

2015-10-28 Thread Al Viro
[Linus and Dave added, Solaris and NetBSD folks dropped from Cc]

On Tue, Oct 27, 2015 at 05:13:56PM -0700, Eric Dumazet wrote:
> On Tue, 2015-10-27 at 23:17 +0000, Al Viro wrote:
> 
> > * [Linux-specific aside] our __alloc_fd() can degrade quite badly
> > with some use patterns.  The cacheline pingpong in the bitmap is probably
> > inevitable, unless we accept considerably heavier memory footprint,
> > but we also have a case when alloc_fd() takes O(n) and it's _not_ hard
> > to trigger - close(3);open(...); will have the next open() after that
> > scanning the entire in-use bitmap.  I think I see a way to improve it
> > without slowing the normal case down, but I'll need to experiment a
> > bit before I post patches.  Anybody with examples of real-world loads
> > that make our descriptor allocator to degrade is very welcome to post
> > the reproducers...
> 
> Well, I do have real-world loads, but quite hard to setup in a lab :(
> 
> Note that we also hit the 'struct cred'->usage refcount for every
> open()/close()/sock_alloc(), and simply moving uid/gid out of the first
> cache line really helps, as current_fsuid() and current_fsgid() no
> longer force a pingpong.
> 
> I moved seldom used fields on the first cache line, so that overall
> memory usage did not change (192 bytes on 64 bit arches)

[snip]

Makes sense, but there's a funny thing about that refcount - the part
coming from ->f_cred is the most frequently changed *and* almost all
places using ->f_cred are just looking at its fields and do not manipulate
its refcount.  The only exception (do_process_acct()) is easy to eliminate
just by storing a separate reference to the current creds of acct(2) caller
and using it instead of looking at ->f_cred.  What's more, the place where we
grab what will be ->f_cred is guaranteed to have a non-f_cred reference *and*
most of the time such a reference is there for dropping ->f_cred (in
file_free()/file_free_rcu()).

With that change in kernel/acct.c done, we could do the following:
a) split the cred refcount into the normal and percpu parts and
add a spinlock in there.
b) have put_cred() do this:
	if (atomic_dec_and_test(&cred->usage)) {
		this_cpu_add(&cred->f_cred_usage, 1);
		call_rcu(&cred->rcu, put_f_cred_rcu);
	}
c) have get_empty_filp() increment current_cred ->f_cred_usage with
this_cpu_add()
d) have file_free() do
	percpu_counter_dec(&nr_files);
	rcu_read_lock();
	if (likely(atomic_read(&f->f_cred->usage))) {
		this_cpu_add(&f->f_cred->f_cred_usage, -1);
		rcu_read_unlock();
		call_rcu(&f->f_u.fu_rcuhead, file_free_rcu_light);
	} else {
		rcu_read_unlock();
		call_rcu(&f->f_u.fu_rcuhead, file_free_rcu);
	}
file_free_rcu() being
static void file_free_rcu(struct rcu_head *head)
{
	struct file *f = container_of(head, struct file, f_u.fu_rcuhead);
	put_f_cred(&f->f_cred->rcu);
	kmem_cache_free(filp_cachep, f);
}
and file_free_rcu_light() - the same sans put_f_cred();

with put_f_cred() doing
spin_lock cred->lock
this_cpu_add(&cred->f_cred_usage, -1);
find the sum of cred->f_cred_usage
spin_unlock cred->lock
if the sum has not reached 0
return
current put_cred_rcu(cred)

IOW, let's try to get rid of cross-cpu stores in ->f_cred grabbing and
(most of) ->f_cred dropping.

Note that there are two paths leading to put_f_cred() in the above - via
call_rcu() on &cred->rcu and from file_free_rcu() called via call_rcu() on
&f->f_u.fu_rcuhead.  Both are RCU-delayed and they can happen in parallel -
different rcu_head are used.

atomic_read() check in file_free() might give false positives if it comes
just before put_cred() on another CPU kills the last non-f_cred reference.
It's not a problem, since put_f_cred() from that put_cred() won't be
executed until we drop rcu_read_lock(), so we can safely decrement the
cred->f_cred_usage without cred->lock here (and we are guaranteed that we won't
be dropping the last of that - the same put_cred() would've incremented
->f_cred_usage).

Does anybody see problems with that approach?  I'm going to grab some sleep
(only a couple of hours so far tonight ;-/), will cook an incremental to Eric's
field-reordering patch when I get up...


[v6, 0/6] Freescale DPAA FMan

2015-10-28 Thread igal.liberman
From: Igal Liberman 

The Freescale Data Path Acceleration Architecture (DPAA) is a set
of hardware components on specific QorIQ multicore processors.
This architecture provides the infrastructure to support
simplified sharing of networking interfaces and accelerators
by multiple CPU cores and the accelerators.

One of the DPAA accelerators is the Frame Manager (FMan),
which contains a series of hardware blocks: ports, Ethernet MACs,
a Multi-User RAM (MURAM) and Storage Profile (SP).

This patch set introduces the FMan drivers.
Each driver configures and initializes the corresponding
FMan hardware module (described above).
The MAC driver offers support for three different
types of MACs (dTSEC, tGEC, mEMAC).

v5 --> v6:
- Addressed feedback from Scott:
- Moved kernel doc to source files
- Removed a series of configurable settings
- Miscellaneous code updates

v4 --> v5:
- Addressed feedback from David Miller:
- Removed driver layering
- Reduce namespace pollution
- Reduce code complexity and size

v3 --> v4:
- Remove device_initcall call in driver registration (redundant)
- Remove hot/cold labels
- Minor update in FMan Clock read from device-tree
- Update fixed-link support
- Addressed feedback from Stephen Hemminger
- Remove bogus blank line

v2 --> v3:
- Addressed feedback from Scott:
- Remove typedefs
- Remove unnecessary memory barriers
- Remove unnecessary casting
- Remove KConfig options
- Remove early_params
- Remove Hungarian notation
- Remove __packed__  attribute and padding from structures
- Remove unlikely attribute (where it's not needed)
- Use proper error codes and remove unnecessary prints
- Use proper values for sleep routines
- Replace complex Macros with functions
- Improve device tree processing code
- Use symbolic defines
- Add time-out in busy-wait loops
- Removed exit code (loadable module support will be added later)
- Fixed "fixed-link" issue raised by Joakim Tjernlund

v1 --> v2:
- Addressed feedback from Paul Bolle:
- General feedback of FMan Driver layer
- Remove Errata defines
- Aligned comments to Kernel Doc
- Remove Loadable Module support (not yet supported)
- Removed not needed KConfig dependencies
- Addressed feedback from Scott Wood
- Use Kernel ioread/iowrite services
- Squash FLIB source and header patches together

This submission is based on the prior Freescale DPAA FMan V3,RFC submission.
Several issues are addressed in this submission:
- Reduced MAC layering and complexity
- Reduced code base
- T1024/T2080 10G best effort support

Igal Liberman (6):
  fsl/fman: Add FMan MURAM support
  fsl/fman: Add FMan support
  fsl/fman: Add FMan MAC support
  fsl/fman: Add FMan SP support
  fsl/fman: Add FMan Port Support
  fsl/fman: Add FMan MAC driver

 drivers/net/ethernet/freescale/Kconfig |1 +
 drivers/net/ethernet/freescale/Makefile|2 +
 drivers/net/ethernet/freescale/fman/Kconfig|8 +
 drivers/net/ethernet/freescale/fman/Makefile   |7 +
 .../net/ethernet/freescale/fman/crc_mac_addr_ext.h |  314 +++
 drivers/net/ethernet/freescale/fman/fman.c | 2896 
 drivers/net/ethernet/freescale/fman/fman.h |  329 +++
 drivers/net/ethernet/freescale/fman/fman_dtsec.c   | 1608 +++
 drivers/net/ethernet/freescale/fman/fman_dtsec.h   |   59 +
 drivers/net/ethernet/freescale/fman/fman_mac.h |  276 ++
 drivers/net/ethernet/freescale/fman/fman_memac.c   | 1307 +
 drivers/net/ethernet/freescale/fman/fman_memac.h   |   60 +
 drivers/net/ethernet/freescale/fman/fman_muram.c   |  159 ++
 drivers/net/ethernet/freescale/fman/fman_muram.h   |   51 +
 drivers/net/ethernet/freescale/fman/fman_port.c| 1800 
 drivers/net/ethernet/freescale/fman/fman_port.h|  151 +
 drivers/net/ethernet/freescale/fman/fman_sp.c  |  167 ++
 drivers/net/ethernet/freescale/fman/fman_sp.h  |  103 +
 drivers/net/ethernet/freescale/fman/fman_tgec.c|  798 ++
 drivers/net/ethernet/freescale/fman/fman_tgec.h|   55 +
 drivers/net/ethernet/freescale/fman/mac.c  |  980 +++
 drivers/net/ethernet/freescale/fman/mac.h  |   97 +
 22 files changed, 11228 insertions(+)
 create mode 100644 drivers/net/ethernet/freescale/fman/Kconfig
 create mode 100644 drivers/net/ethernet/freescale/fman/Makefile
 create mode 100644 drivers/net/ethernet/freescale/fman/crc_mac_addr_ext.h
 create mode 

[v6, 5/6] fsl/fman: Add FMan Port Support

2015-10-28 Thread igal.liberman
From: Igal Liberman 

Add the Data Path Acceleration Architecture Frame Manager Port driver.
The FMan driver uses a module called "Port" to represent the physical
TX and RX ports.
Each FMan version has a different number of physical ports.
This patch adds the FMan Port configuration, initialization and
runtime control routines for both TX and RX.

Signed-off-by: Igal Liberman 
---
 drivers/net/ethernet/freescale/fman/Makefile|2 +-
 drivers/net/ethernet/freescale/fman/fman_port.c | 1800 +++
 drivers/net/ethernet/freescale/fman/fman_port.h |  151 ++
 3 files changed, 1952 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/freescale/fman/fman_port.c
 create mode 100644 drivers/net/ethernet/freescale/fman/fman_port.h

diff --git a/drivers/net/ethernet/freescale/fman/Makefile b/drivers/net/ethernet/freescale/fman/Makefile
index 5141532..2eb0b9b 100644
--- a/drivers/net/ethernet/freescale/fman/Makefile
+++ b/drivers/net/ethernet/freescale/fman/Makefile
@@ -2,5 +2,5 @@ subdir-ccflags-y +=  -I$(srctree)/drivers/net/ethernet/freescale/fman
 
 obj-y  += fsl_fman.o fsl_fman_mac.o
 
-fsl_fman-objs  := fman_muram.o fman.o fman_sp.o
+fsl_fman-objs  := fman_muram.o fman.o fman_sp.o fman_port.o
 fsl_fman_mac-objs := fman_dtsec.o fman_memac.o fman_tgec.o
diff --git a/drivers/net/ethernet/freescale/fman/fman_port.c b/drivers/net/ethernet/freescale/fman/fman_port.c
new file mode 100644
index 000..462f83d
--- /dev/null
+++ b/drivers/net/ethernet/freescale/fman/fman_port.c
@@ -0,0 +1,1800 @@
+/*
+ * Copyright 2008 - 2015 Freescale Semiconductor Inc.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in the
+ *   documentation and/or other materials provided with the distribution.
+ * * Neither the name of Freescale Semiconductor nor the
+ *   names of its contributors may be used to endorse or promote products
+ *   derived from this software without specific prior written permission.
+ *
+ *
+ * ALTERNATIVELY, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") as published by the Free Software
+ * Foundation, either version 2 of that License or (at your option) any
+ * later version.
+ *
+ * THIS SOFTWARE IS PROVIDED BY Freescale Semiconductor ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ * DISCLAIMED. IN NO EVENT SHALL Freescale Semiconductor BE LIABLE FOR ANY
+ * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include "fman_port.h"
+#include "fman.h"
+#include "fman_sp.h"
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/* Queue ID */
+#define DFLT_FQ_ID 0x00FF
+
+/* General defines */
+#define PORT_BMI_FIFO_UNITS	0x100
+
+#define MAX_PORT_FIFO_SIZE(bmi_max_fifo_size)  \
+   min((u32)bmi_max_fifo_size, (u32)1024 * FMAN_BMI_FIFO_UNITS)
+
+#define PORT_CG_MAP_NUM	8
+#define PORT_PRS_RESULT_WORDS_NUM  8
+#define PORT_IC_OFFSET_UNITS   0x10
+
+#define MIN_EXT_BUF_SIZE   64
+
+#define BMI_PORT_REGS_OFFSET   0
+#define QMI_PORT_REGS_OFFSET   0x400
+
+/* Default values */
+#define DFLT_PORT_BUFFER_PREFIX_CONTEXT_DATA_ALIGN \
+   DFLT_FM_SP_BUFFER_PREFIX_CONTEXT_DATA_ALIGN
+
+#define DFLT_PORT_CUT_BYTES_FROM_END   4
+
+#define DFLT_PORT_ERRORS_TO_DISCARD	FM_PORT_FRM_ERR_CLS_DISCARD
+#define DFLT_PORT_MAX_FRAME_LENGTH 9600
+
+#define DFLT_PORT_RX_FIFO_PRI_ELEVATION_LEV(bmi_max_fifo_size) \
+   MAX_PORT_FIFO_SIZE(bmi_max_fifo_size)
+
+#define DFLT_PORT_RX_FIFO_THRESHOLD(major, bmi_max_fifo_size)  \
+   (major == 6 ?   \
+   MAX_PORT_FIFO_SIZE(bmi_max_fifo_size) : \
+   (MAX_PORT_FIFO_SIZE(bmi_max_fifo_size) * 3 / 4))\
+
+#define DFLT_PORT_EXTRA_NUM_OF_FIFO_BUFS   0
+
+/* QMI defines */
+#define 

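For a feel of the default Rx FIFO threshold defined in this patch, here are MAX_PORT_FIFO_SIZE() and DFLT_PORT_RX_FIFO_THRESHOLD() rewritten as plain C functions with a few worked values. This is a sketch under one assumption: FMAN_BMI_FIFO_UNITS lives in fman.h, which this excerpt does not show, so the 0x100 used here is illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the real FMAN_BMI_FIFO_UNITS comes from fman.h (not shown
 * in this excerpt); 0x100 is used here for illustration. */
#define FMAN_BMI_FIFO_UNITS 0x100u

/* Function form of MAX_PORT_FIFO_SIZE(): cap the BMI FIFO size at
 * 1024 FIFO units. */
static uint32_t max_port_fifo_size(uint32_t bmi_max_fifo_size)
{
	uint32_t cap = 1024u * FMAN_BMI_FIFO_UNITS;

	return bmi_max_fifo_size < cap ? bmi_max_fifo_size : cap;
}

/* Function form of DFLT_PORT_RX_FIFO_THRESHOLD(): major revision 6
 * defaults to the full (capped) FIFO size, other revisions to 3/4 of it. */
static uint32_t dflt_rx_fifo_threshold(unsigned int major,
				       uint32_t bmi_max_fifo_size)
{
	uint32_t max = max_port_fifo_size(bmi_max_fifo_size);

	return major == 6 ? max : max * 3 / 4;
}
```

With the assumed unit size, the cap is 0x40000 bytes, so a 0x80000-byte BMI FIFO yields a threshold of 0x40000 on revision 6 and 0x30000 otherwise.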
Re: [RFC PATCH net-next 1/4] perf tools: Enable pre-event inherit setting by config terms

2015-10-28 Thread Arnaldo Carvalho de Melo
On Wed, Oct 28, 2015 at 02:21:26PM +0100, Jiri Olsa wrote:
> On Wed, Oct 28, 2015 at 10:55:02AM +, Wang Nan wrote:
> > @@ -55,6 +56,7 @@ struct perf_evsel_config_term {
> > bool    time;
> > char    *callgraph;
> > u64 stack_user;
> > +   u64 inherit;
> 
> seems like bool would be enough

Ok, will change this and move it to a more suitable place member
alignment wise.

Can I, with this change, slap an Acked-by: jirka?

- Arnaldo


Re: [PATCH] xfrm: dst_entries_init() per-net dst_ops

2015-10-28 Thread Hannes Frederic Sowa
Hello,

On Wed, Oct 28, 2015, at 14:32, Dan Streetman wrote:
> On Tue, Oct 27, 2015 at 12:15 PM,   wrote:
> > From: Dan Streetman 
> >
> > The ipv4 and ipv6 xfrms each create a template dst_ops object, and
> > perform dst_entries_init() on the template objects.  Then each net
> > namespace has its net.xfrm.xfrm[46]_dst_ops field set to the template
> > values.  The problem with that is the dst_ops.pcpuc_entries field is
> > a percpu counter and cannot be used correctly by simply copying it to
> > another object.

How hard would it be to split off the counters from the dst_ops struct?
We could make dst_ops instances const and have normal pointers to them
and keep the dst_entries as a small array in net namespace?

Bye,
Hannes

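Why a bitwise copy of an initialized per-CPU counter goes wrong can be shown with a small userspace model. pcpu_counter_model and ops_model are hypothetical; the only property borrowed from struct percpu_counter is that the per-CPU storage is allocated separately and referenced by pointer, so copying the struct copies the pointer, not the storage.

```c
#include <assert.h>
#include <stdlib.h>

#define NR_CPUS_MODEL 4

/* Hypothetical model: the counter only holds a pointer to separately
 * allocated per-CPU slots. */
struct pcpu_counter_model {
	long *slots;	/* one slot per CPU */
};

/* Hypothetical ops object embedding the counter, loosely like dst_ops. */
struct ops_model {
	int family;
	struct pcpu_counter_model entries;
};

static int counter_init(struct pcpu_counter_model *c)
{
	c->slots = calloc(NR_CPUS_MODEL, sizeof(*c->slots));
	return c->slots ? 0 : -1;
}

static void counter_add(struct pcpu_counter_model *c, int cpu, long v)
{
	c->slots[cpu] += v;
}

/* Sum all per-CPU slots, as percpu_counter_sum() does conceptually. */
static long counter_sum(const struct pcpu_counter_model *c)
{
	long sum = 0;

	for (int i = 0; i < NR_CPUS_MODEL; i++)
		sum += c->slots[i];
	return sum;
}

static void counter_destroy(struct pcpu_counter_model *c)
{
	free(c->slots);
	c->slots = NULL;
}
```

Initializing the template once and then assigning it to every namespace leaves all copies sharing one set of slots: counts leak across namespaces, and destroying the counter through any copy frees it for all of them.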

Re: [RFC PATCH net-next 1/4] perf tools: Enable pre-event inherit setting by config terms

2015-10-28 Thread Arnaldo Carvalho de Melo
On Wed, Oct 28, 2015 at 10:42:13AM -0300, Arnaldo Carvalho de Melo wrote:
> On Wed, Oct 28, 2015 at 02:21:26PM +0100, Jiri Olsa wrote:
> > On Wed, Oct 28, 2015 at 10:55:02AM +, Wang Nan wrote:
> > > @@ -55,6 +56,7 @@ struct perf_evsel_config_term {
> > >   bool    time;
> > >   char    *callgraph;
> > >   u64 stack_user;
> > > + u64 inherit;
> > 
> > seems like bool would be enough
> 
> Ok, will change this and move it to a more suitable place member
> alignment wise.

Nah, switched it to bool, but no need to move it around, that is a
union...
 
> Can I, with this change, slap an Acked-by: jirka?
> 
> - Arnaldo

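Arnaldo's follow-up rests on a basic property of C unions: every member starts at offset 0 and the union is sized for its largest member, so narrowing one member from u64 to bool neither changes the layout nor creates a better place to move it. A simplified stand-in for the config term's value union (member list abbreviated and hypothetical) makes this checkable at compile time:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, abbreviated stand-ins for the config term value union. */
union term_val_u64 {
	uint64_t period;
	uint64_t freq;
	bool time;
	char *callgraph;
	uint64_t stack_user;
	uint64_t inherit;	/* as originally posted */
};

union term_val_bool {
	uint64_t period;
	uint64_t freq;
	bool time;
	char *callgraph;
	uint64_t stack_user;
	bool inherit;		/* as changed after review */
};

/* The u64 members still dominate, so size and alignment are unchanged. */
_Static_assert(sizeof(union term_val_u64) == sizeof(union term_val_bool),
	       "narrowing one union member must not change the union size");
```

Every member, including inherit, sits at offset 0, which is why there is no "more suitable place" for it inside the union.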

Re: [GIT] Networking

2015-10-28 Thread Rasmus Villemoes
On Wed, Oct 28 2015, Hannes Frederic Sowa  wrote:

> Hi Linus,
>
> On Wed, Oct 28, 2015, at 10:39, Linus Torvalds wrote:
>> Get rid of it. And I don't *ever* want to see that shit again.
>
> I don't want to give up on that this easily:
>
> In future I would like to see an interface like this. It is often hard
> to do correct overflow/wrap-around tests and it would be great if there
> are helper functions which could easily and without a lot of thinking be
> used by people to remove those problems from the kernel.

I agree - proper overflow checking can be really hard. Quick, assuming a
and b have the same unsigned integer type, is 'a+b

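A small illustration of why such checks are easy to get wrong. These helpers are hypothetical and are not the interface discussed in the thread (which, per the ipv6 patch above, was reverted); they assume a 32-bit int.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Unsigned arithmetic wraps modulo 2^N. For operands of unsigned int
 * width, a + b wrapped exactly when the result is smaller than a. */
static bool add_wrapped_u32(uint32_t a, uint32_t b)
{
	return a + b < a;
}

/* Types narrower than int are promoted to int before the addition, so
 * this sum never wraps and the test is always false. */
static bool add_wrapped_u16_naive(uint16_t a, uint16_t b)
{
	return a + b < a;
}

/* Truncating the result back to 16 bits restores the intended check. */
static bool add_wrapped_u16(uint16_t a, uint16_t b)
{
	return (uint16_t)(a + b) < a;
}

/* For a subtraction like "mtu - hlen", test before subtracting. */
static bool sub_would_wrap_u32(uint32_t a, uint32_t b)
{
	return b > a;
}
```

GCC (5 and later) and Clang also provide __builtin_add_overflow() and __builtin_sub_overflow(), which perform the operation and report wrap-around without these type-width pitfalls.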
Re: [PATCH] xfrm: dst_entries_init() per-net dst_ops

2015-10-28 Thread Dan Streetman
On Wed, Oct 28, 2015 at 9:42 AM, Hannes Frederic Sowa
 wrote:
> Hello,
>
> On Wed, Oct 28, 2015, at 14:32, Dan Streetman wrote:
>> On Tue, Oct 27, 2015 at 12:15 PM,   wrote:
>> > From: Dan Streetman 
>> >
>> > The ipv4 and ipv6 xfrms each create a template dst_ops object, and
>> > perform dst_entries_init() on the template objects.  Then each net
>> > namespace has its net.xfrm.xfrm[46]_dst_ops field set to the template
>> > values.  The problem with that is the dst_ops.pcpuc_entries field is
>> > a percpu counter and cannot be used correctly by simply copying it to
>> > another object.
>
> How hard would it be to split off the counters from the dst_ops struct?
> We could make dst_ops instances const and have normal pointers to them
> and keep the dst_entries as a small array in net namespace?

Well, the dst_ops->pcpuc_entries counter is used in dst.c which just
gets a struct dst_ops *, so it doesn't have access to the owning net
namespace.  And, not all dst_ops users have a per-net-namespace
dst_ops; ipv4/route.c for example uses a global "ipv4_dst_ops" object.
So it probably does need to stay owned by dst_ops.


>
> Bye,
> Hannes

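Dan's point is that dst.c reaches the counter only through a struct dst_ops pointer, so the counter has to stay inside each dst_ops. The approach named in the subject line, running dst_entries_init() on each per-namespace dst_ops, looks roughly like the model below: copy the template, then give the copy its own counter. All names are hypothetical userspace stand-ins.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NCPU 4

/* Hypothetical model types: the counter owns separately allocated
 * per-CPU slots. */
struct counter { long *slots; };
struct ops { int family; struct counter entries; };

/* Shared template; its counter is never initialized or used directly.
 * The family value is arbitrary. */
static const struct ops ops_template = { .family = 10 };

/* Per-namespace setup: copy the template, then give the copy a fresh
 * counter instead of inheriting the template's slots pointer. */
static int ns_ops_init(struct ops *ns_ops)
{
	memcpy(ns_ops, &ops_template, sizeof(*ns_ops));
	ns_ops->entries.slots = calloc(NCPU, sizeof(long));
	return ns_ops->entries.slots ? 0 : -1;
}

static void ns_ops_fini(struct ops *ns_ops)
{
	free(ns_ops->entries.slots);
	ns_ops->entries.slots = NULL;
}

/* Like dst.c, these helpers only see the ops pointer. */
static void entries_inc(struct ops *ops, int cpu)
{
	ops->entries.slots[cpu]++;
}

static long entries_get(const struct ops *ops)
{
	long sum = 0;

	for (int i = 0; i < NCPU; i++)
		sum += ops->entries.slots[i];
	return sum;
}
```

Each namespace's counter is now independent, and code holding only an ops pointer can still reach the right one.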

[PATCH net-next 3/4] sfc: Use cpu_to_mem() to support memoryless nodes

2015-10-28 Thread Shradha Shah
From: Bert Kenward 

With CONFIG_HAVE_MEMORYLESS_NODES, cpu_to_node() may return
nodes without memory, which is not a good choice when that node
is later used to allocate memory. cpu_to_mem() instead provides
the most appropriate NUMA node to allocate from.

Signed-off-by: Shradha Shah 
---
 drivers/net/ethernet/sfc/efx.c| 2 +-
 drivers/net/ethernet/sfc/net_driver.h | 4 ++--
 drivers/net/ethernet/sfc/rx.c | 3 ++-
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/sfc/efx.c b/drivers/net/ethernet/sfc/efx.c
index 89fbd03..84f9e90 100644
--- a/drivers/net/ethernet/sfc/efx.c
+++ b/drivers/net/ethernet/sfc/efx.c
@@ -445,7 +445,7 @@ efx_alloc_channel(struct efx_nic *efx, int i, struct efx_channel *old_channel)
channel->efx = efx;
channel->channel = i;
channel->type = _default_channel_type;
-   channel->irq_node = NUMA_NO_NODE;
+   channel->irq_mem_node = NUMA_NO_NODE;
 
for (j = 0; j < EFX_TXQ_TYPES; j++) {
tx_queue = >tx_queue[j];
diff --git a/drivers/net/ethernet/sfc/net_driver.h b/drivers/net/ethernet/sfc/net_driver.h
index 0ab9080a..bab6cc0 100644
--- a/drivers/net/ethernet/sfc/net_driver.h
+++ b/drivers/net/ethernet/sfc/net_driver.h
@@ -419,7 +419,7 @@ enum efx_sync_events_state {
  * @sync_events_state: Current state of sync events on this channel
  * @sync_timestamp_major: Major part of the last ptp sync event
  * @sync_timestamp_minor: Minor part of the last ptp sync event
- * @irq_node: NUMA node of interrupt
+ * @irq_mem_node: Memory NUMA node of interrupt
  */
 struct efx_channel {
struct efx_nic *efx;
@@ -479,7 +479,7 @@ struct efx_channel {
u32 sync_timestamp_major;
u32 sync_timestamp_minor;
 
-   int irq_node;
+   int irq_mem_node;
 };
 
 #ifdef CONFIG_NET_RX_BUSY_POLL
diff --git a/drivers/net/ethernet/sfc/rx.c b/drivers/net/ethernet/sfc/rx.c
index c5ef1e8..095d1af 100644
--- a/drivers/net/ethernet/sfc/rx.c
+++ b/drivers/net/ethernet/sfc/rx.c
@@ -171,7 +171,8 @@ static int efx_init_rx_buffers(struct efx_rx_queue *rx_queue, bool atomic)
struct efx_channel *channel;
 
channel = efx_rx_queue_channel(rx_queue);
-   page = alloc_pages_node(channel->irq_node, __GFP_COMP |
+   page = alloc_pages_node(channel->irq_mem_node,
+   __GFP_COMP |
(atomic ?
 (GFP_ATOMIC | __GFP_NOWARN)
 : GFP_KERNEL),


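To make the memoryless-node case concrete, here is a userspace model of the idea behind cpu_to_mem(): start from the CPU's node and pick the nearest node that actually has memory. The topology, distance table and nearest_memory_node() are hypothetical; the kernel works this mapping out ahead of time per CPU rather than scanning a table on every call.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_NODES 3

/* Hypothetical topology: node 1 has CPUs but no memory. */
static const bool node_has_memory[NR_NODES] = { true, false, true };

/* Hypothetical distance table (lower is closer, 10 = local). */
static const int node_distance[NR_NODES][NR_NODES] = {
	{ 10, 20, 30 },
	{ 20, 10, 20 },
	{ 30, 20, 10 },
};

/* Return the closest node with memory; ties go to the lowest node id.
 * Returns -1 only if no node has memory. */
static int nearest_memory_node(int node)
{
	int best = -1;

	for (int n = 0; n < NR_NODES; n++) {
		if (!node_has_memory[n])
			continue;
		if (best < 0 || node_distance[node][n] < node_distance[node][best])
			best = n;
	}
	return best;
}
```

For a CPU on node 1, cpu_to_node() would report node 1, which has nothing to allocate from; the model instead returns node 0.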

[PATCH net-next 2/4] sfc: allocate rx pages on the same node as the interrupt

2015-10-28 Thread Shradha Shah
From: Daniel Pieczko 

When the interrupt servicing a channel is on a NUMA node that is
not local to the device, performance is improved by allocating
rx pages on the node local to the interrupt (remote to the device).

The performance-optimal case, where interrupts and applications
are pinned to CPUs on the same node as the device, is not altered
by this change.

This change gave a 1% improvement in transaction rate using Nginx
with all interrupts and Nginx threads on the node remote to the
device. It also gave a small reduction in round-trip latency,
again with the interrupt and application on a different node from
the device.

Allocating rx pages based on the channel->irq_node value is only
valid for the initial driver-load interrupt affinities; if an
interrupt is moved later, the wrong node may be used for the
allocation.

Signed-off-by: Shradha Shah 
---
 drivers/net/ethernet/sfc/efx.c|  1 +
 drivers/net/ethernet/sfc/net_driver.h |  3 +++
 drivers/net/ethernet/sfc/rx.c | 14 +-
 3 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/sfc/efx.c b/drivers/net/ethernet/sfc/efx.c
index 974637d..89fbd03 100644
--- a/drivers/net/ethernet/sfc/efx.c
+++ b/drivers/net/ethernet/sfc/efx.c
@@ -445,6 +445,7 @@ efx_alloc_channel(struct efx_nic *efx, int i, struct efx_channel *old_channel)
channel->efx = efx;
channel->channel = i;
channel->type = _default_channel_type;
+   channel->irq_node = NUMA_NO_NODE;
 
for (j = 0; j < EFX_TXQ_TYPES; j++) {
tx_queue = >tx_queue[j];
diff --git a/drivers/net/ethernet/sfc/net_driver.h b/drivers/net/ethernet/sfc/net_driver.h
index ad56231..0ab9080a 100644
--- a/drivers/net/ethernet/sfc/net_driver.h
+++ b/drivers/net/ethernet/sfc/net_driver.h
@@ -419,6 +419,7 @@ enum efx_sync_events_state {
  * @sync_events_state: Current state of sync events on this channel
  * @sync_timestamp_major: Major part of the last ptp sync event
  * @sync_timestamp_minor: Minor part of the last ptp sync event
+ * @irq_node: NUMA node of interrupt
  */
 struct efx_channel {
struct efx_nic *efx;
@@ -477,6 +478,8 @@ struct efx_channel {
enum efx_sync_events_state sync_events_state;
u32 sync_timestamp_major;
u32 sync_timestamp_minor;
+
+   int irq_node;
 };
 
 #ifdef CONFIG_NET_RX_BUSY_POLL
diff --git a/drivers/net/ethernet/sfc/rx.c b/drivers/net/ethernet/sfc/rx.c
index 3f0e129..c5ef1e8 100644
--- a/drivers/net/ethernet/sfc/rx.c
+++ b/drivers/net/ethernet/sfc/rx.c
@@ -168,11 +168,15 @@ static int efx_init_rx_buffers(struct efx_rx_queue *rx_queue, bool atomic)
 * context in such a case.  So, use __GFP_NO_WARN
 * in case of atomic.
 */
-   page = alloc_pages(__GFP_COLD | __GFP_COMP |
-  (atomic ?
-   (GFP_ATOMIC | __GFP_NOWARN)
-   : GFP_KERNEL),
-  efx->rx_buffer_order);
+   struct efx_channel *channel;
+
+   channel = efx_rx_queue_channel(rx_queue);
+   page = alloc_pages_node(channel->irq_node, __GFP_COMP |
+   (atomic ?
+(GFP_ATOMIC | __GFP_NOWARN)
+: GFP_KERNEL),
+   efx->rx_buffer_order);
+
if (unlikely(page == NULL))
return -ENOMEM;
dma_addr =


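The node choice in this patch reduces to a small rule, sketched below as a userspace model. channel_model, rx_alloc_node() and MODEL_NUMA_NO_NODE are hypothetical names; the only kernel behaviour relied on is that alloc_pages_node() treats NUMA_NO_NODE (-1) as "use the node of the current CPU".

```c
#include <assert.h>

/* Stand-in for the kernel's NUMA_NO_NODE, which is -1. */
#define MODEL_NUMA_NO_NODE (-1)

/* Hypothetical per-channel state: the node recorded for the channel's
 * interrupt when affinity is set up at driver load. */
struct channel_model {
	int irq_node;
};

/* Node used for an rx page allocation: the recorded interrupt node if
 * there is one, otherwise the node of the CPU doing the allocation. */
static int rx_alloc_node(const struct channel_model *ch, int current_node)
{
	return ch->irq_node == MODEL_NUMA_NO_NODE ? current_node : ch->irq_node;
}
```

Because irq_node is only written at setup, moving the interrupt later (for example through /proc/irq/N/smp_affinity) leaves the rule pointing at the old node, which is the limitation the commit message calls out.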

Re: [PATCH 6/6] net: phy: Stop 'phy-state-machine' and 'phy_change' work on remove

2015-10-28 Thread Neil Armstrong
> > void phy_disconnect(struct phy_device *phydev)
> > {
> > if (phydev->irq > 0)
> > phy_stop_interrupts(phydev);
> >
> > phy_stop_machine(phydev);
> >
> > phydev->adjust_link = NULL;
> >
> > phy_detach(phydev);
> > }
> 
> And this does not yet get called. It probably needs to be in
> dsa_switch_destroy() just before unregister_netdev() of the slave
> devices.
> 
> However, the ordering in dsa_switch_destroy() looks wrong. The fixed
> phys are destroyed before the slave devices. They should probably be
> destroyed after the slave devices, or at least after the
> phy_disconnect() is called.
> 
>  Andrew
> 

Andrew, Florian,

Thanks for the review; a call to phy_disconnect() was missing in
dsa_switch_destroy().

I will post a new patchset with the correct fix, a switch to delayed_work and
a separate dsa_slave_destroy function for sake of maintenance ease.

Neil



[RFC PATCH v2 2/4] net: dsa: bcm_sf2: cleanup resources in remove callback

2015-10-28 Thread Neil Armstrong
Implement a remove callback allowing the switch driver to clean up
the resources it used: interrupts and remapped register ranges.

Signed-off-by: Florian Fainelli 
Signed-off-by: Neil Armstrong 
---
 drivers/net/dsa/bcm_sf2.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/drivers/net/dsa/bcm_sf2.c b/drivers/net/dsa/bcm_sf2.c
index 6f946fe..e0be318 100644
--- a/drivers/net/dsa/bcm_sf2.c
+++ b/drivers/net/dsa/bcm_sf2.c
@@ -1054,6 +1054,25 @@ out_unmap:
return ret;
 }

+static void bcm_sf2_sw_remove(struct dsa_switch *ds)
+{
+   struct bcm_sf2_priv *priv = ds_to_priv(ds);
+   void __iomem **base;
+   unsigned int i;
+
+   /* Disable all interrupts and free them */
+   bcm_sf2_intr_disable(priv);
+
+   free_irq(priv->irq0, priv);
+   free_irq(priv->irq1, priv);
+
+   base = >core;
+   for (i = 0; i < BCM_SF2_REGS_NUM; i++) {
+   iounmap(*base);
+   base++;
+   }
+}
+
 static int bcm_sf2_sw_set_addr(struct dsa_switch *ds, u8 *addr)
 {
return 0;
@@ -1367,6 +1386,7 @@ static struct dsa_switch_driver bcm_sf2_switch_driver = {
.tag_protocol   = DSA_TAG_PROTO_BRCM,
.priv_size  = sizeof(struct bcm_sf2_priv),
.probe  = bcm_sf2_sw_probe,
+   .remove = bcm_sf2_sw_remove,
.setup  = bcm_sf2_sw_setup,
.set_addr   = bcm_sf2_sw_set_addr,
.get_phy_flags  = bcm_sf2_sw_get_phy_flags,
-- 
1.9.1


[RFC PATCH v2 3/4] net: dsa: mv88e6xxx: add common and ppu remove function

2015-10-28 Thread Neil Armstrong
With the previously introduced remove callback, add an
mv88e6xxx common remove function to clean up all resources.

Signed-off-by: Neil Armstrong 
---
 drivers/net/dsa/mv88e6xxx.c | 18 ++
 drivers/net/dsa/mv88e6xxx.h |  2 ++
 2 files changed, 20 insertions(+)

diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
index b1b14f5..6287096 100644
--- a/drivers/net/dsa/mv88e6xxx.c
+++ b/drivers/net/dsa/mv88e6xxx.c
@@ -331,6 +331,16 @@ void mv88e6xxx_ppu_state_init(struct dsa_switch *ds)
ps->ppu_timer.function = mv88e6xxx_ppu_reenable_timer;
 }

+void mv88e6xxx_ppu_state_remove(struct dsa_switch *ds)
+{
+   struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
+   
+   del_timer_sync(>ppu_timer);
+
+   cancel_work_sync(>bridge_work);
+   flush_work(>bridge_work);
+}
+
 int mv88e6xxx_phy_read_ppu(struct dsa_switch *ds, int addr, int regnum)
 {
int ret;
@@ -2083,6 +2093,14 @@ int mv88e6xxx_setup_common(struct dsa_switch *ds)
return 0;
 }

+void mv88e6xxx_remove_common(struct dsa_switch *ds)
+{
+   struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
+
+   cancel_work_sync(>bridge_work);
+   flush_work(>bridge_work);
+}
+
 int mv88e6xxx_setup_global(struct dsa_switch *ds)
 {
struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
diff --git a/drivers/net/dsa/mv88e6xxx.h b/drivers/net/dsa/mv88e6xxx.h
index 6f9ed5d..64d37a0 100644
--- a/drivers/net/dsa/mv88e6xxx.h
+++ b/drivers/net/dsa/mv88e6xxx.h
@@ -417,6 +417,7 @@ struct mv88e6xxx_hw_stat {
 int mv88e6xxx_switch_reset(struct dsa_switch *ds, bool ppu_active);
 int mv88e6xxx_setup_ports(struct dsa_switch *ds);
 int mv88e6xxx_setup_common(struct dsa_switch *ds);
+void mv88e6xxx_remove_common(struct dsa_switch *ds);
 int mv88e6xxx_setup_global(struct dsa_switch *ds);
 int __mv88e6xxx_reg_read(struct mii_bus *bus, int sw_addr, int addr, int reg);
 int mv88e6xxx_reg_read(struct dsa_switch *ds, int addr, int reg);
@@ -431,6 +432,7 @@ int mv88e6xxx_phy_read_indirect(struct dsa_switch *ds, int port, int regnum);
 int mv88e6xxx_phy_write_indirect(struct dsa_switch *ds, int port, int regnum,
 u16 val);
 void mv88e6xxx_ppu_state_init(struct dsa_switch *ds);
+void mv88e6xxx_ppu_state_remove(struct dsa_switch *ds);
 int mv88e6xxx_phy_read_ppu(struct dsa_switch *ds, int addr, int regnum);
 int mv88e6xxx_phy_write_ppu(struct dsa_switch *ds, int addr,
int regnum, u16 val);
-- 
1.9.1


[RFC PATCH v2 1/4] net: dsa: allow switch drivers to cleanup their resources

2015-10-28 Thread Neil Armstrong
Some switch drivers might request interrupts or remap register ranges;
allow such drivers to implement a "remove" callback to release them.

Signed-off-by: Florian Fainelli 
Signed-off-by: Neil Armstrong 
---
 include/net/dsa.h | 1 +
 net/dsa/dsa.c | 4 
 2 files changed, 5 insertions(+)

diff --git a/include/net/dsa.h b/include/net/dsa.h
index 98ccbde..0e1502c 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -212,6 +212,7 @@ struct dsa_switch_driver {
 */
char*(*probe)(struct device *host_dev, int sw_addr);
int (*setup)(struct dsa_switch *ds);
+   void(*remove)(struct dsa_switch *ds);
int (*set_addr)(struct dsa_switch *ds, u8 *addr);
u32 (*get_phy_flags)(struct dsa_switch *ds, int port);

diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index 1eba07f..f462fc5 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -459,6 +459,10 @@ static void dsa_switch_destroy(struct dsa_switch *ds)
}

mdiobus_unregister(ds->slave_mii_bus);
+
+   /* Leave a chance to the driver to cleanup */
+   if (ds->drv->remove)
+   ds->drv->remove(ds);
 }

 #ifdef CONFIG_PM_SLEEP
-- 
1.9.1

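The new hook follows the usual kernel pattern for optional operations: the core checks the function pointer and calls it only if the driver provides one. A minimal userspace model with hypothetical names (sw_model, sw_driver_model, switch_destroy()) covering a driver with and without remove:

```c
#include <assert.h>
#include <stddef.h>

struct sw_model;

/* Hypothetical, trimmed-down driver ops: remove is optional (may be NULL). */
struct sw_driver_model {
	int (*setup)(struct sw_model *sw);
	void (*remove)(struct sw_model *sw);
};

struct sw_model {
	const struct sw_driver_model *drv;
	int irqs_held;	/* stands in for IRQs and mappings the driver owns */
};

static int demo_setup(struct sw_model *sw)
{
	sw->irqs_held = 2;
	return 0;
}

static void demo_remove(struct sw_model *sw)
{
	sw->irqs_held = 0;
}

/* Mirrors the tail of dsa_switch_destroy(): core teardown first, then
 * the driver gets a chance to release what it acquired. */
static void switch_destroy(struct sw_model *sw)
{
	/* ... core teardown, e.g. unregistering the slave MII bus ... */
	if (sw->drv->remove)
		sw->drv->remove(sw);
}

static const struct sw_driver_model with_remove = { demo_setup, demo_remove };
static const struct sw_driver_model without_remove = { demo_setup, NULL };
```

Drivers that never set .remove, like the mv88e6060 mentioned later in the thread, keep working unchanged because the NULL check skips the call.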

Re: [RFC PATCH net-next 1/4] perf tools: Enable pre-event inherit setting by config terms

2015-10-28 Thread Jiri Olsa
On Wed, Oct 28, 2015 at 10:42:13AM -0300, Arnaldo Carvalho de Melo wrote:
> On Wed, Oct 28, 2015 at 02:21:26PM +0100, Jiri Olsa wrote:
> > On Wed, Oct 28, 2015 at 10:55:02AM +, Wang Nan wrote:
> > > @@ -55,6 +56,7 @@ struct perf_evsel_config_term {
> > >   bool    time;
> > >   char    *callgraph;
> > >   u64 stack_user;
> > > + u64 inherit;
> > 
> > seems like bool would be enough
> 
> Ok, will change this and move it to a more suitable place member
> alignment wise.
> 
> Can I, with this change, slap an Acked-by: jirka?

yep

jirka


[RFC PATCH v2 0/4] net: dsa: cleanup dsa driver

2015-10-28 Thread Neil Armstrong
Introduce a new remove callback to allow DSA drivers to clean up their
resources.
Then add a remove implementation for bcm_sf2 and mv88e6xxx.

This patch was not tested due to a lack of hardware.

v2: add remove callback patch to the series

Neil Armstrong (4):
  net: dsa: allow switch drivers to cleanup their resources
  net: dsa: bcm_sf2: cleanup resources in remove callback
  net: dsa: mv88e6xxx: add common and ppu remove function
  net: dsa: make usage of mv88e6xxx common remove function

 drivers/net/dsa/bcm_sf2.c | 20 
 drivers/net/dsa/mv88e6123_61_65.c |  1 +
 drivers/net/dsa/mv88e6131.c   |  8 
 drivers/net/dsa/mv88e6171.c   |  1 +
 drivers/net/dsa/mv88e6352.c   |  1 +
 drivers/net/dsa/mv88e6xxx.c   | 18 ++
 drivers/net/dsa/mv88e6xxx.h   |  2 ++
 include/net/dsa.h |  1 +
 net/dsa/dsa.c |  4 
 9 files changed, 56 insertions(+)

-- 
1.9.1



Re: [PATCH v7 02/10] ss: created formatters for json and hr

2015-10-28 Thread Phil Sutter
On Wed, Oct 28, 2015 at 12:57:28PM +0100, Matthias Tafelmeier wrote:
> >> Those resentments were related to the patchsets complexity and
> >> size.
> > 
> > I didn't see any problem with that in the first place. It is indeed
> > a big change, achieving something like that without a big patch set
> > is unlikely.
> > 
> 
> Fine, I was just reiterating that since Stephen Hemminger raised that.
> My reasoning here is that I just don't want to kick off restarting
> work with objections still in the minds – since we are already at V7
> now.

Yeah, sorry for not having looked into this earlier. Also, I neither
have nor claim any power of veto. Apart from that, I'm not against this
patch series in general, just trying to help raise its quality a bit.
After all, we don't set anything in stone so everything can be
fixed/improved later on. Except Git history of course, which is
important to get right in relation to bisecting.

Cheers, Phil


Re: [RFC PATCH v2 3/4] net: dsa: mv88e6xxx: add common and ppu remove function

2015-10-28 Thread Andrew Lunn
On Wed, Oct 28, 2015 at 03:13:16PM +0100, Neil Armstrong wrote:
> With the previously introduced remove callback, add a
> mv88e6xxx common remove function to cleanup all resources.
> 
> Signed-off-by: Neil Armstrong 
> ---
>  drivers/net/dsa/mv88e6xxx.c | 18 ++
>  drivers/net/dsa/mv88e6xxx.h |  2 ++
>  2 files changed, 20 insertions(+)
> 
> diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
> index b1b14f5..6287096 100644
> --- a/drivers/net/dsa/mv88e6xxx.c
> +++ b/drivers/net/dsa/mv88e6xxx.c
> @@ -331,6 +331,16 @@ void mv88e6xxx_ppu_state_init(struct dsa_switch *ds)
>   ps->ppu_timer.function = mv88e6xxx_ppu_reenable_timer;
>  }
> 
> +void mv88e6xxx_ppu_state_remove(struct dsa_switch *ds)
> +{
> + struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
> + 
> + del_timer_sync(>ppu_timer);
> +
> + cancel_work_sync(>bridge_work);
> + flush_work(>bridge_work);
> +}
> +

You add this function, but you don't use it anywhere?  Also, why
cancel bridge work, not ppu_work? Or has that been consolidated
in some patch i'm missing?

   Andrew

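Andrew's question is about teardown order as much as naming: a timer whose callback queues work has to be stopped before that work is cancelled, otherwise the callback can requeue it, and the work cancelled should be the one the timer actually queues (Andrew suggests ppu_work). The single-threaded model below, with hypothetical names, simulates a timer expiry in that window. It also helps to know that cancel_work_sync() already waits for a running instance to finish, so a flush_work() immediately after it should have nothing left to wait for.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a driver whose timer callback queues a work item. */
struct ppu_model {
	bool timer_armed;
	bool work_queued;
};

/* Timer expiry: the callback queues the work item. */
static void timer_expires(struct ppu_model *m)
{
	if (m->timer_armed) {
		m->timer_armed = false;
		m->work_queued = true;
	}
}

static void del_timer_model(struct ppu_model *m) { m->timer_armed = false; }
static void cancel_work_model(struct ppu_model *m) { m->work_queued = false; }

/* Safe order: once the timer is gone it cannot requeue the work. */
static void teardown_timer_first(struct ppu_model *m)
{
	del_timer_model(m);
	cancel_work_model(m);
}

/* Unsafe order: an expiry between the two steps requeues the work
 * after it was cancelled. */
static void teardown_work_first_racy(struct ppu_model *m)
{
	cancel_work_model(m);
	timer_expires(m);	/* simulated expiry in the window */
	del_timer_model(m);
}
```

The patch already calls del_timer_sync() first, which matches the safe order; the open point is only which work item follows it.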

[PATCH net-next 0/4] sfc: NUMA support

2015-10-28 Thread Shradha Shah
This patch series adds support for 
- allocating rx pages local to the interrupt
- setting affinity hint to influence IRQs to be allocated on the
same NUMA node as the one where the card resides.

Alexandra Kossovsky (1):
  sfc: use __GFP_NOWARN when allocating RX pages from atomic context.

Bert Kenward (2):
  sfc: Use cpu_to_mem() to support memoryless nodes
  sfc: set and clear interrupt affinity hints

Daniel Pieczko (1):
  sfc: allocate rx pages on the same node as the interrupt

 drivers/net/ethernet/sfc/efx.c| 36 +++
 drivers/net/ethernet/sfc/net_driver.h |  3 +++
 drivers/net/ethernet/sfc/rx.c | 18 +++---
 3 files changed, 54 insertions(+), 3 deletions(-)



[PATCH net-next 4/4] sfc: set and clear interrupt affinity hints

2015-10-28 Thread Shradha Shah
From: Bert Kenward 

Use cpumask_local_spread to provide interrupt affinity hints
for each queue. This will spread interrupts across NUMA local
CPUs first, extending to remote nodes if needed.

Signed-off-by: Shradha Shah 
---
 drivers/net/ethernet/sfc/efx.c | 35 +++
 1 file changed, 35 insertions(+)

diff --git a/drivers/net/ethernet/sfc/efx.c b/drivers/net/ethernet/sfc/efx.c
index 84f9e90..93c4c0e 100644
--- a/drivers/net/ethernet/sfc/efx.c
+++ b/drivers/net/ethernet/sfc/efx.c
@@ -1489,6 +1489,30 @@ static int efx_probe_interrupts(struct efx_nic *efx)
return 0;
 }
 
+#if defined(CONFIG_SMP)
+static void efx_set_interrupt_affinity(struct efx_nic *efx)
+{
+   struct efx_channel *channel;
+   unsigned int cpu;
+
+   efx_for_each_channel(channel, efx) {
+   cpu = cpumask_local_spread(channel->channel,
+  pcibus_to_node(efx->pci_dev->bus));
+
+   irq_set_affinity_hint(channel->irq, cpumask_of(cpu));
+   channel->irq_mem_node = cpu_to_mem(cpu);
+   }
+}
+
+static void efx_clear_interrupt_affinity(struct efx_nic *efx)
+{
+   struct efx_channel *channel;
+
+   efx_for_each_channel(channel, efx)
+   irq_set_affinity_hint(channel->irq, NULL);
+}
+#endif /* CONFIG_SMP */
+
 static int efx_soft_enable_interrupts(struct efx_nic *efx)
 {
struct efx_channel *channel, *end_channel;
@@ -2932,6 +2956,9 @@ static void efx_pci_remove_main(struct efx_nic *efx)
cancel_work_sync(>reset_work);
 
efx_disable_interrupts(efx);
+#if defined(CONFIG_SMP)
+   efx_clear_interrupt_affinity(efx);
+#endif
efx_nic_fini_interrupt(efx);
efx_fini_port(efx);
efx->type->fini(efx);
@@ -3081,6 +3108,11 @@ static int efx_pci_probe_main(struct efx_nic *efx)
rc = efx_nic_init_interrupt(efx);
if (rc)
goto fail5;
+
+#if defined(CONFIG_SMP)
+   efx_set_interrupt_affinity(efx);
+#endif
+
rc = efx_enable_interrupts(efx);
if (rc)
goto fail6;
@@ -3088,6 +3120,9 @@ static int efx_pci_probe_main(struct efx_nic *efx)
return 0;
 
  fail6:
+#if defined(CONFIG_SMP)
+   efx_clear_interrupt_affinity(efx);
+#endif
efx_nic_fini_interrupt(efx);
  fail5:
efx_fini_port(efx);

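cpumask_local_spread(i, node) returns the i-th CPU when CPUs on the given node are counted first and the remaining online CPUs after them, with i wrapped so that a CPU is always returned. The model below reimplements that ordering in userspace over a hypothetical six-CPU, two-node topology with all CPUs assumed online; in the patch, i is channel->channel and node comes from pcibus_to_node().

```c
#include <assert.h>

#define NR_CPUS_MODEL 6

/* Hypothetical topology: CPU -> NUMA node. */
static const int cpu_node[NR_CPUS_MODEL] = { 0, 0, 0, 1, 1, 1 };

/* Userspace model of cpumask_local_spread(i, node). */
static int local_spread(unsigned int i, int node)
{
	int cpu;

	i %= NR_CPUS_MODEL;	/* wrap: always return some CPU */

	/* First pass: CPUs on the requested node (all CPUs if node < 0). */
	for (cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		if ((node < 0 || cpu_node[cpu] == node) && i-- == 0)
			return cpu;

	/* Second pass: the remaining, remote CPUs. */
	for (cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		if (node >= 0 && cpu_node[cpu] != node && i-- == 0)
			return cpu;

	return -1;	/* not reached for valid input */
}
```

With the card on node 1, channels 0 to 2 land on CPUs 3 to 5, channel 3 spills over to CPU 0, and channel 6 wraps back to CPU 3. With node == -1 (for example if pcibus_to_node() reports no node) the model, like the kernel helper, simply walks all CPUs in order.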

Re: [PATCH 2/6] net: dsa: allow switch drivers to cleanup their resources

2015-10-28 Thread Neil Armstrong
On 10/27/2015 05:59 PM, Vivien Didelot wrote:
> On Oct. Tuesday 27 (44) 04:43 PM, Neil Armstrong wrote:
>>
>> Yes, I didn't know how to handle this since it was part of a larger patch.
>>
>> I forgot to add this into the cover-letter but I wanted to send an RFC series
>> with your bcm remove patch and a mv88e6xxx remove experimental code.
>>
>> Yet, the mv88e6060 does not make usage of this.
> 
> So this patch must be part of your RFC for module removal instead of
> this patchset.
> 
> Thanks,
> -v
> 

Vivien, Florian,

Thanks for the review; I will integrate it into the other RFC patchset with the
correct Signed-off-by tag.

Neil


Re: [RFC PATCH v2 1/4] net: dsa: allow switch drivers to cleanup their resources

2015-10-28 Thread Andrew Lunn
On Wed, Oct 28, 2015 at 03:12:57PM +0100, Neil Armstrong wrote:
> Some switch drivers might request interrupts, remap register ranges,
> allow such drivers to implement a "remove" callback doing just that.
> 
> Signed-off-by: Florian Fainelli 
> Signed-off-by: Neil Armstrong 
> ---
>  include/net/dsa.h | 1 +
>  net/dsa/dsa.c | 4 
>  2 files changed, 5 insertions(+)
> 
> diff --git a/include/net/dsa.h b/include/net/dsa.h
> index 98ccbde..0e1502c 100644
> --- a/include/net/dsa.h
> +++ b/include/net/dsa.h
> @@ -212,6 +212,7 @@ struct dsa_switch_driver {
>*/
>   char*(*probe)(struct device *host_dev, int sw_addr);
>   int (*setup)(struct dsa_switch *ds);
> + void(*remove)(struct dsa_switch *ds);
>   int (*set_addr)(struct dsa_switch *ds, u8 *addr);
>   u32 (*get_phy_flags)(struct dsa_switch *ds, int port);
> 
> diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
> index 1eba07f..f462fc5 100644
> --- a/net/dsa/dsa.c
> +++ b/net/dsa/dsa.c
> @@ -459,6 +459,10 @@ static void dsa_switch_destroy(struct dsa_switch *ds)
>   }
> 
>   mdiobus_unregister(ds->slave_mii_bus);
> +
> + /* Leave a chance to the driver to cleanup */

A nitpick:

 /* Give the driver a chance to cleanup */

would be better English.

Reviewed-by: Andrew Lunn 


Andrew


Re: [Bug 106241] New: shutdown(3)/close(3) behaviour is incorrect for sockets in accept(3)

2015-10-28 Thread Eric Dumazet
On Wed, 2015-10-28 at 06:24 -0700, Eric Dumazet wrote:

> Before I take a deep look at your suggestion, are you sure plain use of
> include/linux/percpu-refcount.h infra is not possible for struct cred ?

BTW, I am not convinced we need to spend so much energy and per-cpu
memory for struct cred refcount.

The big problem is fd array spinlock of course and bitmap search for
POSIX compliance.

The cache line thrashing in struct cred is a minor one ;)





[PATCH net-next 1/4] sfc: use __GFP_NOWARN when allocating RX pages from atomic context.

2015-10-28 Thread Shradha Shah
From: Alexandra Kossovsky 

If we fail to allocate a page when in atomic context this is
handled by scheduling a fill in non-atomic context.
As such, a warning is not needed.

Signed-off-by: Shradha Shah 
---
 drivers/net/ethernet/sfc/rx.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/sfc/rx.c b/drivers/net/ethernet/sfc/rx.c
index 809ea461..3f0e129 100644
--- a/drivers/net/ethernet/sfc/rx.c
+++ b/drivers/net/ethernet/sfc/rx.c
@@ -163,8 +163,15 @@ static int efx_init_rx_buffers(struct efx_rx_queue *rx_queue, bool atomic)
do {
page = efx_reuse_page(rx_queue);
if (page == NULL) {
+   /* GFP_ATOMIC may fail because of various reasons,
+* and we re-schedule rx_fill from non-atomic
+* context in such a case.  So, use __GFP_NO_WARN
+* in case of atomic.
+*/
page = alloc_pages(__GFP_COLD | __GFP_COMP |
-  (atomic ? GFP_ATOMIC : GFP_KERNEL),
+  (atomic ?
+   (GFP_ATOMIC | __GFP_NOWARN)
+   : GFP_KERNEL),
   efx->rx_buffer_order);
if (unlikely(page == NULL))
return -ENOMEM;


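The reasoning in this patch, that an atomic allocation failure is expected and recoverable, can be modeled as below. The flag constants and rxq_model are hypothetical stand-ins, not the kernel's GFP_* values or sfc's structures; only the flag selection mirrors the patched expression.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bits standing in for GFP_ATOMIC, GFP_KERNEL,
 * __GFP_NOWARN and __GFP_COMP. */
enum {
	M_GFP_ATOMIC = 1 << 0,
	M_GFP_KERNEL = 1 << 1,
	M_GFP_NOWARN = 1 << 2,
	M_GFP_COMP   = 1 << 3,
};

/* Same shape as the patched flag choice: only atomic attempts are
 * silenced, because their failure has a fallback. */
static unsigned int rx_gfp_flags(bool atomic)
{
	return M_GFP_COMP | (atomic ? (M_GFP_ATOMIC | M_GFP_NOWARN) : M_GFP_KERNEL);
}

struct rxq_model {
	bool atomic_alloc_fails;	/* simulated memory pressure */
	bool slow_fill_scheduled;	/* deferred refill from process context */
	int filled;
};

/* Hypothetical refill step: an atomic failure schedules a non-atomic
 * refill instead of warning. Returns -1 on failure, 0 on success. */
static int rx_fill(struct rxq_model *q, bool atomic)
{
	unsigned int gfp = rx_gfp_flags(atomic);

	if ((gfp & M_GFP_ATOMIC) && q->atomic_alloc_fails) {
		q->slow_fill_scheduled = true;
		return -1;
	}
	q->filled++;
	return 0;
}
```

A failed atomic attempt leaves a deferred refill queued, and the later non-atomic attempt, which keeps the default warning behaviour, fills the ring.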

[RFC PATCH v2 4/4] net: dsa: make usage of mv88e6xxx common remove function

2015-10-28 Thread Neil Armstrong
Make use of the previously introduced mv88e6xxx common remove
function in all mv88e6xxx drivers.

Signed-off-by: Neil Armstrong 
---
 drivers/net/dsa/mv88e6123_61_65.c | 1 +
 drivers/net/dsa/mv88e6131.c   | 8 
 drivers/net/dsa/mv88e6171.c   | 1 +
 drivers/net/dsa/mv88e6352.c   | 1 +
 4 files changed, 11 insertions(+)

diff --git a/drivers/net/dsa/mv88e6123_61_65.c b/drivers/net/dsa/mv88e6123_61_65.c
index 4bcfd68..1773c99 100644
--- a/drivers/net/dsa/mv88e6123_61_65.c
+++ b/drivers/net/dsa/mv88e6123_61_65.c
@@ -122,6 +122,7 @@ struct dsa_switch_driver mv88e6123_61_65_switch_driver = {
.priv_size  = sizeof(struct mv88e6xxx_priv_state),
.probe  = mv88e6123_61_65_probe,
.setup  = mv88e6123_61_65_setup,
+   .remove = mv88e6xxx_remove_common,
.set_addr   = mv88e6xxx_set_addr_indirect,
.phy_read   = mv88e6xxx_phy_read,
.phy_write  = mv88e6xxx_phy_write,
diff --git a/drivers/net/dsa/mv88e6131.c b/drivers/net/dsa/mv88e6131.c
index c73121c..0f559b4 100644
--- a/drivers/net/dsa/mv88e6131.c
+++ b/drivers/net/dsa/mv88e6131.c
@@ -137,6 +137,13 @@ static int mv88e6131_setup(struct dsa_switch *ds)
return mv88e6xxx_setup_ports(ds);
 }

+static void mv88e6131_remove(struct dsa_switch *ds)
+{
+   mv88e6xxx_ppu_state_remove(ds);
+
+   mv88e6xxx_remove_common(ds);
+}
+
 static int mv88e6131_port_to_phy_addr(struct dsa_switch *ds, int port)
 {
struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
@@ -175,6 +182,7 @@ struct dsa_switch_driver mv88e6131_switch_driver = {
.priv_size  = sizeof(struct mv88e6xxx_priv_state),
.probe  = mv88e6131_probe,
.setup  = mv88e6131_setup,
+   .remove = mv88e6131_remove,
.set_addr   = mv88e6xxx_set_addr_direct,
.phy_read   = mv88e6131_phy_read,
.phy_write  = mv88e6131_phy_write,
diff --git a/drivers/net/dsa/mv88e6171.c b/drivers/net/dsa/mv88e6171.c
index 2c8eb6f..382529b 100644
--- a/drivers/net/dsa/mv88e6171.c
+++ b/drivers/net/dsa/mv88e6171.c
@@ -101,6 +101,7 @@ struct dsa_switch_driver mv88e6171_switch_driver = {
.priv_size  = sizeof(struct mv88e6xxx_priv_state),
.probe  = mv88e6171_probe,
.setup  = mv88e6171_setup,
+   .remove = mv88e6xxx_remove_common,
.set_addr   = mv88e6xxx_set_addr_indirect,
.phy_read   = mv88e6xxx_phy_read_indirect,
.phy_write  = mv88e6xxx_phy_write_indirect,
diff --git a/drivers/net/dsa/mv88e6352.c b/drivers/net/dsa/mv88e6352.c
index cbf4dd8..7938901 100644
--- a/drivers/net/dsa/mv88e6352.c
+++ b/drivers/net/dsa/mv88e6352.c
@@ -321,6 +321,7 @@ struct dsa_switch_driver mv88e6352_switch_driver = {
.priv_size  = sizeof(struct mv88e6xxx_priv_state),
.probe  = mv88e6352_probe,
.setup  = mv88e6352_setup,
+   .remove = mv88e6xxx_remove_common,
.set_addr   = mv88e6xxx_set_addr_indirect,
.phy_read   = mv88e6xxx_phy_read_indirect,
.phy_write  = mv88e6xxx_phy_write_indirect,
-- 
1.9.1
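The pattern in this patch (a shared `.remove` hook, which one chip wraps so it can tear down extra state first) can be sketched in userspace C. Everything here is hypothetical: `struct sketch_switch`, `struct sketch_driver` and the `sketch_*` functions only imitate the shape of `struct dsa_switch_driver` and the mv88e6xxx helpers, and the "teardown" merely records the call order.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for struct dsa_switch: records teardown order. */
struct sketch_switch {
	char log[64];
};

/* Hypothetical stand-in for the .remove member of a switch driver. */
struct sketch_driver {
	const char *name;
	void (*remove)(struct sketch_switch *ds);
};

static void sketch_remove_common(struct sketch_switch *ds)
{
	strcat(ds->log, "common;");
}

static void sketch_ppu_state_remove(struct sketch_switch *ds)
{
	strcat(ds->log, "ppu;");
}

/* Like mv88e6131_remove(): chip-specific PPU teardown, then common. */
static void sketch_6131_remove(struct sketch_switch *ds)
{
	sketch_ppu_state_remove(ds);
	sketch_remove_common(ds);
}

static const struct sketch_driver sketch_drivers[] = {
	{ "6123_61_65", sketch_remove_common },
	{ "6131",       sketch_6131_remove },
	{ "6171",       sketch_remove_common },
	{ "6352",       sketch_remove_common },
};
```

Only the chip with extra PPU state needs a wrapper; the others point `.remove` straight at the common function.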


Re: [RFC PATCH v2 3/4] net: dsa: mv88e6xxx: add common and ppu remove function

2015-10-28 Thread Andrew Lunn
On Wed, Oct 28, 2015 at 03:37:02PM +0100, Neil Armstrong wrote:
> Hi Andrew,
> 
> On 10/28/2015 03:35 PM, Andrew Lunn wrote:
> > On Wed, Oct 28, 2015 at 03:13:16PM +0100, Neil Armstrong wrote:
> >> diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
> >> index b1b14f5..6287096 100644
> >> --- a/drivers/net/dsa/mv88e6xxx.c
> >> +++ b/drivers/net/dsa/mv88e6xxx.c
> >> @@ -331,6 +331,16 @@ void mv88e6xxx_ppu_state_init(struct dsa_switch *ds)
> >>ps->ppu_timer.function = mv88e6xxx_ppu_reenable_timer;
> >>  }
> >>
> >> +void mv88e6xxx_ppu_state_remove(struct dsa_switch *ds)
> >> +{
> >> +  struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
> >> +  
> >> +  del_timer_sync(&ps->ppu_timer);
> >> +
> >> +  cancel_work_sync(&ps->bridge_work);
> >> +  flush_work(&ps->bridge_work);
> >> +}
> >> +
> > 
> > You add this function, but you don't use it anywhere?  Also, why
> > cancel bridge work, not ppu_work? Or has that been consolidated
> > in some patch i'm missing?
> > 
> >Andrew
> > 
> 
> It's called in the next patch, in mv88e6131_remove for mv88e6131.

Hi Neil

It would be better to split this out into a patch of its own, and
include the mv88e6131 change with it.

Andrew


Re: [RFC PATCH v2 3/4] net: dsa: mv88e6xxx: add common and ppu remove function

2015-10-28 Thread Neil Armstrong
Hi Andrew,

On 10/28/2015 03:35 PM, Andrew Lunn wrote:
> On Wed, Oct 28, 2015 at 03:13:16PM +0100, Neil Armstrong wrote:
>> diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
>> index b1b14f5..6287096 100644
>> --- a/drivers/net/dsa/mv88e6xxx.c
>> +++ b/drivers/net/dsa/mv88e6xxx.c
>> @@ -331,6 +331,16 @@ void mv88e6xxx_ppu_state_init(struct dsa_switch *ds)
>>  ps->ppu_timer.function = mv88e6xxx_ppu_reenable_timer;
>>  }
>>
>> +void mv88e6xxx_ppu_state_remove(struct dsa_switch *ds)
>> +{
>> +struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
>> +
>> +del_timer_sync(&ps->ppu_timer);
>> +
>> +cancel_work_sync(&ps->bridge_work);
>> +flush_work(&ps->bridge_work);
>> +}
>> +
> 
> You add this function, but you don't use it anywhere?  Also, why
> cancel bridge work, not ppu_work? Or has that been consolidated
> in some patch i'm missing?
> 
>Andrew
> 

It's called in the next patch, in mv88e6131_remove for mv88e6131.

Neil


[GIT] Networking

2015-10-28 Thread David Miller

This may look a bit scary this late in the release cycle, but as is typically
the case it's predominantly small driver fixes all over the place.

1) Fix two regressions in ipv6 route lookups, particularly wrt. output
   interface specifications in the lookup key.  From David Ahern.

2) Fix checks in ipv6 IPSEC tunnel pre-encap fragmentation, from
   Herbert Xu.

3) Fix mis-advertisement of 1000BASE-T on bcm63xx_enet, from Simon
   Arlott.

4) Some smsc phys misbehave with energy detect mode enabled, so add a
   DT property and disable it on such switches.  From Heiko Schocher.

5) Fix TSO corruption on TX in mv643xx_eth, from Philipp Kirchhofer.

6) Fix regression added by removal of openvswitch vport stats, from
   James Morse.

7) Vendor Kconfig options should be bool, not tristate, from Andreas
   Schwab.

8) Use non-_BH() net stats bump in tcp_xmit_probe_skb(), otherwise
   we barf during TCP REPAIR operations.

9) Fix various bugs in openvswitch conntrack support, from Joe
   Stringer.

10) Fix NETLINK_LIST_MEMBERSHIPS locking, from David Herrmann.

11) Don't have VSOCK do sock_put() in interrupt context, from Jorgen
Hansen.

12) Fix skb_realloc_headroom() failures properly in ISDN, from Karsten
Keil.

13) Add some device IDs to qmi_wwan, from Bjorn Mork.

14) Fix ovs egress tunnel information when using lwtunnel devices,
from Pravin B Shelar.

15) Add missing NETIF_F_FRAGLIST to macvtap feature list, from Jason
Wang.

16) Fix incorrect handling of throw routes when the result of the
throw cannot find a match, from Xin Long.

17) Protect ipv6 MTU calculations from wrap-around, from Hannes
Frederic Sowa.

18) Fix failed autonegotiation on KSZ9031 micrel PHYs, from Nathan
Sullivan.

19) Add missing memory barriers in descriptor accesses in the xgbe driver,
from Thomas Lendacky.

20) Fix release condition test in pppoe_release(), from Guillaume Nault.

21) Fix gianfar bugs wrt. filter configuration, from Claudiu Manoil.

22) Fix violations of RX buffer alignment in sh_eth driver, from Sergei
Shtylyov.

23) Fix missing of_node_put() calls in various places around the
networking code, from Julia Lawall.

24) Fix incorrect leaf node walking in the ipv4 routing tree, from Alexander
Duyck.

25) RDS doesn't check pskb_pull()/pskb_trim() return values, from
Sowmini Varadhan.

26) Fix VLAN configuration in mlx4 driver, from Jack Morgenstein.
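Item 17 concerns unsigned arithmetic in IPv6 fragmentation. The following is a rough userspace illustration, not the actual `ip6_fragment()` code or the fix itself: if the unfragmentable headers plus the 8-byte fragment header do not fit in the MTU, the unsigned subtraction wraps around, and if the per-fragment payload rounds down to 0 (non-final fragments must carry a multiple of 8 bytes) a fragmentation loop can never make progress. `frag_payload_len()` and `SKETCH_FRAG_HDR_LEN` are hypothetical names; the sketch rejects both cases up front with `-EMSGSIZE`.

```c
#include <assert.h>
#include <errno.h>

#define SKETCH_FRAG_HDR_LEN 8u /* size of the IPv6 fragment header */

/* Hypothetical helper: payload bytes per non-final fragment, or
 * -EMSGSIZE when the headers leave no usable room. */
static int frag_payload_len(unsigned int mtu, unsigned int hlen)
{
	unsigned int room;

	/* Compare before subtracting so the unsigned math cannot wrap. */
	if (hlen + SKETCH_FRAG_HDR_LEN >= mtu)
		return -EMSGSIZE;

	room = mtu - hlen - SKETCH_FRAG_HDR_LEN;
	room &= ~7u; /* fragment offsets are expressed in 8-byte units */
	if (room == 0)
		return -EMSGSIZE; /* 0-byte fragments would loop forever */

	return (int)room;
}
```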

Please pull, thanks a lot.

The following changes since commit 1099f86044111e9a7807f09523e42d4c9d0fb781:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (2015-10-19 09:55:40 -0700)

are available in the git repository at:


  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git master

for you to fetch changes up to e18f6ac30d31433d8cd9ccf693d3cdd5d2e66ef9:

  Merge branch 'mlx4-fixes' (2015-10-27 20:27:45 -0700)



Alexander Duyck (1):
  fib_trie: leaf_walk_rcu should not compute key if key is less than pn->key

Andreas Schwab (1):
  net: cavium: change NET_VENDOR_CAVIUM to bool

Andrew F. Davis (1):
  net: phy: dp83848: Add TI DP83848 Ethernet PHY

Andrew Shewmaker (1):
  tcp: allow dctcp alpha to drop to zero

Bjørn Mork (1):
  qmi_wwan: add Sierra Wireless MC74xx/EM74xx

Carol L Soto (1):
  net/mlx4: Copy/set only sizeof struct mlx4_eqe bytes

Claudiu Manoil (4):
  gianfar: Remove duplicated argument to bitwise OR
  gianfar: Don't enable the Filer w/o the Parser
  gianfar: Fix Rx BSY error handling
  MAINTAINERS: Add entry for gianfar ethernet driver

Dan Carpenter (1):
  irda: precedence bug in irlmp_seq_hb_idx()

David Ahern (2):
  net: Really fix vti6 with oif in dst lookups
  net: ipv6: Dont add RT6_LOOKUP_F_IFACE flag if saddr set

David Daney (1):
  net: thunderx: Rewrite silicon revision tests.

David Herrmann (1):
  netlink: fix locking around NETLINK_LIST_MEMBERSHIPS

David S. Miller (12):
  Merge branch 'smsc-energy-detect'
  Merge branch 'mv643xx-fixes'
  Merge git://git.kernel.org/.../pablo/nf
  Merge branch 'isdn-null-deref'
  Merge branch 'master' of git://git.kernel.org/.../klassert/ipsec
  Merge branch 'master' of git://git.kernel.org/.../jkirsher/net-queue
  Merge branch 'ipv6-overflow-arith'
  Merge branch 'thunderx-fixes'
  Merge branch 'gianfar-fixes'
  Merge branch 'sh_eth-fixes'
  Merge branch 'net_of_node_put'
  Merge branch 'mlx4-fixes'

Eric Dumazet (1):
  ipv6: gre: support SIT encapsulation

Florian Westphal (1):
  netfilter: sync with packet rx also after removing queue entries

Gao feng (1):
  vsock: fix missing cleanup when misc_register failed

Guillaume Nault (1):
  ppp: fix pppoe_dev deletion condition in pppoe_release()

Hannes Frederic Sowa (2):
  overflow-arith: begin to add support for overflow builtin functions
  ipv6: protect mtu calculation of wrap-around and infinite loop by 

Re: [PATCH v2 1/3] virtio_net: Stop doing DMA from the stack

2015-10-28 Thread Michael S. Tsirkin
On Tue, Oct 27, 2015 at 10:30:19PM -0700, Andy Lutomirski wrote:
> From: Andy Lutomirski 
> 
> Once virtio starts using the DMA API, we won't be able to safely DMA
> from the stack.  virtio-net does a couple of config DMA requests
> from small stack buffers -- switch to using dynamically-allocated
> memory.
> 
> This should have no effect on any performance-critical code paths.
> 
> Cc: netdev@vger.kernel.org
> Cc: "Michael S. Tsirkin" 
> Cc: virtualizat...@lists.linux-foundation.org
> Reviewed-by: Joerg Roedel 
> Signed-off-by: Andy Lutomirski 
> ---
> 
> Hi Michael and DaveM-
> 
> This is a prerequisite for the virtio DMA fixing project.  It works
> as a standalone patch, though.  Would it make sense to apply it to
> an appropriate networking tree now?
> 
>  drivers/net/virtio_net.c | 53 
> 
>  1 file changed, 36 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index d8838dedb7a4..4f10f8a58811 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -976,31 +976,43 @@ static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd,
>struct scatterlist *out)
>  {
>   struct scatterlist *sgs[4], hdr, stat;
> - struct virtio_net_ctrl_hdr ctrl;
> - virtio_net_ctrl_ack status = ~0;
> +
> + struct {
> + struct virtio_net_ctrl_hdr ctrl;
> + virtio_net_ctrl_ack status;
> + } *buf;
> +
>   unsigned out_num = 0, tmp;
> + bool ret;
>  
>   /* Caller should know better */
>   BUG_ON(!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ));
>  
> - ctrl.class = class;
> - ctrl.cmd = cmd;
> + buf = kmalloc(sizeof(*buf), GFP_ATOMIC);
> + if (!buf)
> + return false;

This is problematic. The command is never retried, the error
is propagated to userspace.

> + buf->status = ~0;
> +
> + buf->ctrl.class = class;
> + buf->ctrl.cmd = cmd;
>   /* Add header */
> - sg_init_one(&hdr, &ctrl, sizeof(ctrl));
> + sg_init_one(&hdr, &buf->ctrl, sizeof(buf->ctrl));
>   sgs[out_num++] = &hdr;
>  
>   if (out)
>   sgs[out_num++] = out;
>  
>   /* Add return status. */
> - sg_init_one(&stat, &status, sizeof(status));
> + sg_init_one(&stat, &buf->status, sizeof(buf->status));
>   sgs[out_num] = &stat;
>  
>   BUG_ON(out_num + 1 > ARRAY_SIZE(sgs));
>   virtqueue_add_sgs(vi->cvq, sgs, out_num, 1, vi, GFP_ATOMIC);
>  
> - if (unlikely(!virtqueue_kick(vi->cvq)))
> - return status == VIRTIO_NET_OK;
> + if (unlikely(!virtqueue_kick(vi->cvq))) {
> + ret = (buf->status == VIRTIO_NET_OK);
> + goto out;
> + }
>  
>   /* Spin for a response, the kick causes an ioport write, trapping
>* into the hypervisor, so the request should be handled immediately.
> @@ -1009,7 +1021,11 @@ static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd,
>  !virtqueue_is_broken(vi->cvq))
>   cpu_relax();
>  
> - return status == VIRTIO_NET_OK;
> + ret = (buf->status == VIRTIO_NET_OK);
> +
> +out:
> + kfree(buf);
> + return ret;
>  }
>  
>  static int virtnet_set_mac_address(struct net_device *dev, void *p)
> @@ -1151,7 +1167,7 @@ static void virtnet_set_rx_mode(struct net_device *dev)
>  {
>   struct virtnet_info *vi = netdev_priv(dev);
>   struct scatterlist sg[2];
> - u8 promisc, allmulti;
> + u8 *cmdbyte;
>   struct virtio_net_ctrl_mac *mac_data;
>   struct netdev_hw_addr *ha;
>   int uc_count;
> @@ -1163,22 +1179,25 @@ static void virtnet_set_rx_mode(struct net_device *dev)
>   if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_RX))
>   return;
>  
> - promisc = ((dev->flags & IFF_PROMISC) != 0);
> - allmulti = ((dev->flags & IFF_ALLMULTI) != 0);
> + cmdbyte = kmalloc(sizeof(*cmdbyte), GFP_ATOMIC);
> + if (!cmdbyte)
> + return;

Here the error is ignored, rx mode will be incorrect.
OTOH it looks like that's already the case.

>  
> - sg_init_one(sg, &promisc, sizeof(promisc));
> + sg_init_one(sg, cmdbyte, sizeof(*cmdbyte));
>  
> + *cmdbyte = ((dev->flags & IFF_PROMISC) != 0);
>   if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_RX,
> VIRTIO_NET_CTRL_RX_PROMISC, sg))
>   dev_warn(&dev->dev, "Failed to %sable promisc mode.\n",
> -  promisc ? "en" : "dis");
> -
> - sg_init_one(sg, &allmulti, sizeof(allmulti));
> +  *cmdbyte ? "en" : "dis");
>  
> + *cmdbyte = ((dev->flags & IFF_ALLMULTI) != 0);
>   if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_RX,
> VIRTIO_NET_CTRL_RX_ALLMULTI, sg))
>   dev_warn(&dev->dev, "Failed to %sable allmulti mode.\n",
> -  allmulti ? "en" : "dis");
> + 
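The core idea of this patch, keeping the control header and the status byte together in one heap allocation so that nothing handed to the device lives on the stack, and freeing it on every exit path, can be modelled in plain userspace C. This is only a sketch: `struct sketch_ctrl_hdr`, `sketch_send_command()` and `fake_device()` are hypothetical, `malloc()` stands in for `kmalloc(..., GFP_ATOMIC)`, the device callback replaces the virtqueue kick and spin, and `SKETCH_OK`/`SKETCH_ERR` are assumed values rather than the real `VIRTIO_NET_OK`/`VIRTIO_NET_ERR` definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_OK  0
#define SKETCH_ERR 1

/* Hypothetical stand-in for struct virtio_net_ctrl_hdr. */
struct sketch_ctrl_hdr {
	uint8_t class;
	uint8_t cmd;
};

/* "Device" that reads the header and writes the status in place. */
typedef void (*sketch_device_fn)(const struct sketch_ctrl_hdr *hdr,
				 uint8_t *status);

static bool sketch_send_command(uint8_t class, uint8_t cmd,
				sketch_device_fn device)
{
	/* Header and status share one heap buffer, as in the patch. */
	struct {
		struct sketch_ctrl_hdr ctrl;
		uint8_t status;
	} *buf;
	bool ret;

	buf = malloc(sizeof(*buf));
	if (!buf)
		return false; /* allocation failure looks like a failed command */

	buf->status = (uint8_t)~0; /* poison until the device answers */
	buf->ctrl.class = class;
	buf->ctrl.cmd = cmd;

	device(&buf->ctrl, &buf->status);

	ret = (buf->status == SKETCH_OK);
	free(buf); /* single exit path releases the buffer */
	return ret;
}

/* Example device: accepts only class 0. */
static void fake_device(const struct sketch_ctrl_hdr *hdr, uint8_t *status)
{
	*status = (hdr->class == 0) ? SKETCH_OK : SKETCH_ERR;
}
```

As in the patch, an allocation failure is reported as a failed command, which is the behaviour Michael objects to above, since the caller never retries.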

Re: [PATCH v1 1/3] virtio-net: Using single MSIX IRQ for TX/RX Q pair

2015-10-28 Thread Michael S. Tsirkin
On Wed, Oct 28, 2015 at 11:13:39AM +0800, Jason Wang wrote:
> 
> 
> On 10/27/2015 04:38 PM, Michael S. Tsirkin wrote:
> > On Mon, Oct 26, 2015 at 10:52:47AM -0700, Ravi Kerur wrote:
> >> Ported earlier patch from Jason Wang (dated 12/26/2014).
> >>
> >> This patch tries to reduce the number of MSIX irqs required for
> >> virtio-net by sharing a MSIX irq for each TX/RX queue pair through
> >> channels. If the transport supports channels, about half of the MSIX
> >> irqs are saved.
> >>
> >> Signed-off-by: Ravi Kerur 
> > Why bother BTW? 
> 
> The reason is we want to save the number of interrupt vectors used.
> Booting a guest with 256 queues with the current driver will result in all
> tx/rx queues sharing a single vector. This is suboptimal.

With a single CPU? But what configures so many queues? Why do it?

> With this
> series, half could be saved.

At cost of e.g. inability to balance the interrupts.

> And more complex policy could be applied on
> top (e.g limit the number of vectors used by driver).

If that's the motivation, I'd like to see a draft of that more complex
policy first.

> > Looks like this is adding a bunch of overhead
> > on data path - to what end?
> 
> I agree some benchmark is needed for this.
> 
> > Maybe you have a huge number of these devices ... but in that case, how
> > about sharing the config interrupt instead?
> > That's only possible if host supports VIRTIO_1
> > (so we can detect config interrupt by reading the ISR).
> >
> >
> >
> >> ---
> >>  drivers/net/virtio_net.c | 29 -
> >>  1 file changed, 28 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> >> index d8838ded..d705cce 100644
> >> --- a/drivers/net/virtio_net.c
> >> +++ b/drivers/net/virtio_net.c
> >> @@ -72,6 +72,9 @@ struct send_queue {
> >>  
> >>/* Name of the send queue: output.$index */
> >>char name[40];
> >> +
> >> +  /* Name of the channel, shared with irq. */
> >> +  char channel_name[40];
> >>  };
> >>  
> >>  /* Internal representation of a receive virtqueue */
> >> @@ -1529,6 +1532,8 @@ static int virtnet_find_vqs(struct virtnet_info *vi)
> >>int ret = -ENOMEM;
> >>int i, total_vqs;
> >>const char **names;
> >> +  const char **channel_names;
> >> +  unsigned *channels;
> >>  
> >>/* We expect 1 RX virtqueue followed by 1 TX virtqueue, followed by
> >> * possible N-1 RX/TX queue pairs used in multiqueue mode, followed by
> >> @@ -1548,6 +1553,17 @@ static int virtnet_find_vqs(struct virtnet_info *vi)
> >>if (!names)
> >>goto err_names;
> >>  
> >> +  channel_names = kmalloc_array(vi->max_queue_pairs,
> >> +sizeof(*channel_names),
> >> +GFP_KERNEL);
> >> +  if (!channel_names)
> >> +  goto err_channel_names;
> >> +
> >> +  channels = kmalloc_array(total_vqs, sizeof(*channels),
> >> +   GFP_KERNEL);
> >> +  if (!channels)
> >> +  goto err_channels;
> >> +
> >>/* Parameters for control virtqueue, if any */
> >>if (vi->has_cvq) {
> >>callbacks[total_vqs - 1] = NULL;
> >> @@ -1562,10 +1578,15 @@ static int virtnet_find_vqs(struct virtnet_info *vi)
> >>sprintf(vi->sq[i].name, "output.%d", i);
> >>names[rxq2vq(i)] = vi->rq[i].name;
> >>names[txq2vq(i)] = vi->sq[i].name;
> >> +  sprintf(vi->sq[i].channel_name, "txrx.%d", i);
> >> +  channel_names[i] = vi->sq[i].channel_name;
> >> +  channels[rxq2vq(i)] = i;
> >> +  channels[txq2vq(i)] = i;
> >>}
> >>  
> >>ret = vi->vdev->config->find_vqs(vi->vdev, total_vqs, vqs, callbacks,
> >> -   names);
> >> +   names, channels, channel_names,
> >> +   vi->max_queue_pairs);
> >>if (ret)
> >>goto err_find;
> >>  
> >> @@ -1580,6 +1601,8 @@ static int virtnet_find_vqs(struct virtnet_info *vi)
> >>vi->sq[i].vq = vqs[txq2vq(i)];
> >>}
> >>  
> >> +  kfree(channels);
> >> +  kfree(channel_names);
> >>kfree(names);
> >>kfree(callbacks);
> >>kfree(vqs);
> >> @@ -1587,6 +1610,10 @@ static int virtnet_find_vqs(struct virtnet_info *vi)
> >>return 0;
> >>  
> >>  err_find:
> >> +  kfree(channels);
> >> +err_channels:
> >> +  kfree(channel_names);
> >> +err_channel_names:
> >>kfree(names);
> >>  err_names:
> >>kfree(callbacks);
> >> -- 
> >> 1.9.1
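The vq-to-channel mapping in this patch can be checked in isolation. The sketch below assumes virtio_net's queue layout, RX queue i at virtqueue index 2*i and TX queue i at 2*i+1 (what the driver's `rxq2vq()`/`txq2vq()` compute; they are re-declared locally here), and ignores the control virtqueue. `fill_channels()` is a hypothetical helper that fills the array the same way the hunk does and counts the distinct channels, i.e. the number of shared vectors: N queue pairs need N vectors instead of 2N, the "about half" from the commit message.

```c
#include <assert.h>

/* Local copies of virtio_net's queue-index helpers. */
static int rxq2vq(int rxq) { return rxq * 2; }
static int txq2vq(int txq) { return txq * 2 + 1; }

/* Hypothetical helper: put both virtqueues of pair i on channel i, then
 * return the number of distinct channels (i.e. shared vectors). */
static int fill_channels(unsigned *channels, int max_queue_pairs)
{
	unsigned nvec = 0;

	for (int i = 0; i < max_queue_pairs; i++) {
		channels[rxq2vq(i)] = i;
		channels[txq2vq(i)] = i;
	}

	/* Channels are numbered densely from 0, so max + 1 is the count. */
	for (int vq = 0; vq < 2 * max_queue_pairs; vq++)
		if (channels[vq] + 1 > nvec)
			nvec = channels[vq] + 1;

	return (int)nvec;
}
```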


Re: [PATCH v2 net-next 0/4] Automatic adjustment of max frame size

2015-10-28 Thread Toshiaki Makita

On 15/10/28 (Wed) 13:58, Stephen Hemminger wrote:

> On Mon, 26 Oct 2015 12:40:55 +0900
> Toshiaki Makita  wrote:
>
> ...

Thank you for taking a look at the patch set.
I'm not sure if I fully understand you, so please correct me if I 
misread you.



> The problem is that you require changing network device drivers
> and device-specific knowledge about what will work or not. Because
> of that the modification can't be automated.


I'm not sure what you mean by "device specific knowledge" and "automated"...
Indeed, this requires changes in each driver.
But the required driver changes should mostly reuse the existing
ndo_change_mtu implementation code and are not hard. We can progressively
implement ndo_enc_hdr_len for each driver.
If the max frame size cannot be changed on a certain NIC, the vlan driver
will emit a warning message and make the MTU smaller, and userspace can
then handle it (patch 3). If needed, maybe we can expose this feature via ethtool.




> Also, this affects even more layered devices like tunnels etc.


Yes, if tunnel devices start to utilize this framework. This is one of
the purposes of my patch set.



> The problem is quite large, and this patch only begins to address it.


Yes, this is the first step to address the problem.



> It seems to me that just having the vlan driver do a sane
> auto default is the best solution.


For now, this implementation is limited to vlan. For other
protocols, auto-expansion may not be suitable, and a knob may be
needed to use the framework.


If you mean just making the MTU smaller on the vlan device instead of
adjusting the max frame size of the real device, then it would not work.
802.1ad HW switches, at any rate, send 1526-byte frames, so they will be dropped on
the real device.
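The 1526-byte figure can be checked with standard Ethernet sizes: a 1500-byte MTU plus the 14-byte Ethernet header, two 4-byte tags (the 802.1ad S-tag and the inner 802.1Q C-tag) and the 4-byte FCS. The helper and the `SKETCH_*` constants below are hypothetical, mirroring the kernel's `ETH_HLEN`, `VLAN_HLEN` and `ETH_FCS_LEN`, and are not part of the patch set.

```c
#include <assert.h>

#define SKETCH_ETH_HLEN    14 /* dst MAC + src MAC + EtherType */
#define SKETCH_VLAN_HLEN    4 /* one 802.1Q or 802.1ad tag */
#define SKETCH_ETH_FCS_LEN  4 /* trailing CRC */

/* Hypothetical helper: on-wire frame size for a given MTU and tag count. */
static int max_frame_size(int mtu, int vlan_tags)
{
	return mtu + SKETCH_ETH_HLEN + vlan_tags * SKETCH_VLAN_HLEN +
	       SKETCH_ETH_FCS_LEN;
}
```

With no tags this gives the classic 1518 bytes, with one 802.1Q tag 1522, and with the 802.1ad double tag 1526, which a real device limited to 1522-byte frames would drop.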



> It might cause a smaller MTU
> than ideal, but at least it will still work. Then the user can
> manually set a larger MTU if they know their hardware will work.


Toshiaki Makita



Re: [PATCH 1/1] commit c6825c0976fa7893692e0e43b09740b419b23c09 upstream.

2015-10-28 Thread Neal P. Murphy
On Mon, 26 Oct 2015 21:06:33 +0100
Pablo Neira Ayuso  wrote:

> Hi,
> 
> On Mon, Oct 26, 2015 at 11:55:39AM -0700, Ani Sinha wrote:
> > netfilter: nf_conntrack: fix RCU race in nf_conntrack_find_get
> 
> Please, no need to Cc everyone here. Please, submit your Netfilter
> patches to netfilter-de...@vger.kernel.org.
> 
> Moreover, it would be great if the subject includes something
> descriptive on what you need, for this I'd suggest:
> 
> [PATCH -stable 3.4,backport] netfilter: nf_conntrack: fix RCU race in nf_conntrack_find_get
> 
> I'm including Neal P. Murphy, he said he would help testing these
> backports, getting a Tested-by: tag usually speeds up things too.

I hammered it a couple nights ago. First test was 5000 processes on 6 SMP CPUs 
opening and closing a port on a 'remote' host using the usual random source 
ports. Only got up to 32000 conntracks. The generator was a 64-bit Smoothwall 
KVM without the patch. The traffic passed through a 32-bit Smoothwall KVM with 
the patch. The target was on the VM host. No problems encountered. I suspect I 
didn't come close to triggering the original problem. Second test was a couple 
thousand processes all using the same source IP and port and dest IP and port. 
Still no problems. But these were perl scripts (and they used lots of RAM); 
perhaps a short C program would let me run more.

Any ideas on how I might test it more brutally?

N


Re: [PATCH v1 1/3] virtio-net: Using single MSIX IRQ for TX/RX Q pair

2015-10-28 Thread Jason Wang


On 10/28/2015 03:21 PM, Michael S. Tsirkin wrote:
> On Wed, Oct 28, 2015 at 11:13:39AM +0800, Jason Wang wrote:
>>
>> On 10/27/2015 04:38 PM, Michael S. Tsirkin wrote:
>>> On Mon, Oct 26, 2015 at 10:52:47AM -0700, Ravi Kerur wrote:
 Ported earlier patch from Jason Wang (dated 12/26/2014).

 This patch tries to reduce the number of MSIX irqs required for
 virtio-net by sharing a MSIX irq for each TX/RX queue pair through
 channels. If transport support channel, about half of the MSIX irqs
 were reduced.

 Signed-off-by: Ravi Kerur 
>>> Why bother BTW? 
>> The reason is we want to save the number of interrupt vectors used.
>> Booting a guest with 256 queues with the current driver will result in all
>> tx/rx queues sharing a single vector. This is suboptimal.
> With a single CPU? 

Even for SMP guests. Or do you want a per-cpu interrupt?

> But what configures so many queues? Why do it?

Something like cpu hot add.

>
>> With this
>> series, half could be saved.
> At cost of e.g. inability to balance the interrupts.

Didn't follow. Btw, most physical cards share one irq per tx/rx queue pair.

>
>> And more complex policy could be applied on
>> top (e.g limit the number of vectors used by driver).
> If that's the motivation, I'd like to see a draft of that more complex
> policy first.

How about something like:

1) Driver provides a min and max number of vectors it needs.
2) Virtio pci can then use pci_enable_msix_range() and return the actual
number of vectors to driver.
3) Then driver can divide the virtqueues into different groups
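Step 3 of this proposal could look roughly like the following. It is a hypothetical sketch: it assumes the PCI layer has already reported how many vectors were actually granted (in the kernel, `pci_enable_msix_range()` returns that count or a negative errno), and `assign_vq_groups()` simply spreads the queue pairs round-robin over those vectors while keeping each RX/TX pair on one vector, using the same RX-at-2*i, TX-at-2*i+1 layout as virtio_net.

```c
#include <assert.h>

/* Hypothetical: give each virtqueue a vector index. Queue pair i
 * (RX vq 2*i, TX vq 2*i+1) goes to vector i % nvec, so pairs share
 * vectors round-robin when fewer vectors than pairs were granted.
 * nvec must be >= 1, i.e. at least the driver's minimum. */
static void assign_vq_groups(int *vq_vector, int queue_pairs, int nvec)
{
	for (int i = 0; i < queue_pairs; i++) {
		int v = i % nvec;

		vq_vector[2 * i] = v;     /* RX */
		vq_vector[2 * i + 1] = v; /* TX */
	}
}
```

Round-robin is only one possible policy; a driver could just as well pack contiguous pairs onto each vector.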

>
>>> Looks like this is adding a bunch of overhead
>>> on data path - to what end?
>> I agree some benchmark is needed for this.
>>
>>> Maybe you have a huge number of these devices ... but in that case, how
>>> about sharing the config interrupt instead?
>>> That's only possible if host supports VIRTIO_1
>>> (so we can detect config interrupt by reading the ISR).
>>>
>>>
>>>
 ---
  drivers/net/virtio_net.c | 29 -
  1 file changed, 28 insertions(+), 1 deletion(-)

 diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
 index d8838ded..d705cce 100644
 --- a/drivers/net/virtio_net.c
 +++ b/drivers/net/virtio_net.c
 @@ -72,6 +72,9 @@ struct send_queue {
  
/* Name of the send queue: output.$index */
char name[40];
 +
 +  /* Name of the channel, shared with irq. */
 +  char channel_name[40];
  };
  
  /* Internal representation of a receive virtqueue */
 @@ -1529,6 +1532,8 @@ static int virtnet_find_vqs(struct virtnet_info *vi)
int ret = -ENOMEM;
int i, total_vqs;
const char **names;
 +  const char **channel_names;
 +  unsigned *channels;
  
/* We expect 1 RX virtqueue followed by 1 TX virtqueue, followed by
 * possible N-1 RX/TX queue pairs used in multiqueue mode, followed by
 @@ -1548,6 +1553,17 @@ static int virtnet_find_vqs(struct virtnet_info *vi)
if (!names)
goto err_names;
  
 +  channel_names = kmalloc_array(vi->max_queue_pairs,
 +sizeof(*channel_names),
 +GFP_KERNEL);
 +  if (!channel_names)
 +  goto err_channel_names;
 +
 +  channels = kmalloc_array(total_vqs, sizeof(*channels),
 +   GFP_KERNEL);
 +  if (!channels)
 +  goto err_channels;
 +
/* Parameters for control virtqueue, if any */
if (vi->has_cvq) {
callbacks[total_vqs - 1] = NULL;
 @@ -1562,10 +1578,15 @@ static int virtnet_find_vqs(struct virtnet_info *vi)
sprintf(vi->sq[i].name, "output.%d", i);
names[rxq2vq(i)] = vi->rq[i].name;
names[txq2vq(i)] = vi->sq[i].name;
 +  sprintf(vi->sq[i].channel_name, "txrx.%d", i);
 +  channel_names[i] = vi->sq[i].channel_name;
 +  channels[rxq2vq(i)] = i;
 +  channels[txq2vq(i)] = i;
}
  
ret = vi->vdev->config->find_vqs(vi->vdev, total_vqs, vqs, callbacks,
 -   names);
 +   names, channels, channel_names,
 +   vi->max_queue_pairs);
if (ret)
goto err_find;
  
 @@ -1580,6 +1601,8 @@ static int virtnet_find_vqs(struct virtnet_info *vi)
vi->sq[i].vq = vqs[txq2vq(i)];
}
  
 +  kfree(channels);
 +  kfree(channel_names);
kfree(names);
kfree(callbacks);
kfree(vqs);
 @@ -1587,6 +1610,10 @@ static int virtnet_find_vqs(struct virtnet_info *vi)
return 0;
  
  err_find:
 +  kfree(channels);
 +err_channels:
 +  kfree(channel_names);
 +err_channel_names:
kfree(names);
  err_names:
kfree(callbacks);

Re: [PATCH v7 02/10] ss: created formatters for json and hr

2015-10-28 Thread Matthias Tafelmeier

>
> Well, then we should wait for another voice aimed at the complexity of
> the patchset before amending and resending me the patchset.
>
>

Well, I take it that, since Sutter has taken over the maintenance
responsibility and answered accordingly, the outstanding
reservations are resolved.
Those reservations were related to the patchset's complexity and size.

Right?

-- 
Matthias Tafelmeier






Re: [PATCH v7 10/10] ss: activate json_writer excluded logic

2015-10-28 Thread Phil Sutter
On Wed, Oct 28, 2015 at 11:39:41AM +0900, Stephen Hemminger wrote:
> On Tue, 27 Oct 2015 14:21:03 +0100
> Phil Sutter  wrote:
> 
> > On Thu, Sep 10, 2015 at 09:35:08PM +0200, Matthias Tafelmeier wrote:
> > > This small patch extends the lib json_writer module for formerly
> > > deactivated functionality.  
> > 
> > Why was it deactivated in the first place?
> 
> The code came from another project that wasn't using this
> function.

Ah, I didn't get that the functions he uncomments were not added by his
series in the first place. Still:
- This patch should come before 02/10 which makes use of the uncommented
  functions here.
- jsonw_null() and jsonw_null_field() are still unused, no need to
  uncomment them.


Re: [PATCH 0/8] mm: memcontrol: account socket memory in unified hierarchy

2015-10-28 Thread Vladimir Davydov
On Tue, Oct 27, 2015 at 09:01:08AM -0700, Johannes Weiner wrote:
...
> > > But regardless of tcp window control, we need to account socket memory
> > > in the main memory accounting pool where pressure is shared (to the
> > > best of our abilities) between all accounted memory consumers.
> > > 
> > 
> > No objections to this point. However, I really don't like the idea to
> > charge tcp window size to memory.current instead of charging individual
> > pages consumed by the workload for storing socket buffers, because it is
> > inconsistent with what we have now. Can't we charge individual skb pages
> > as we do in case of other kmem allocations?
> 
> Absolutely, both work for me. I chose that route because it's where
> the networking code already tracks and accounts memory consumed, so it
> seemed like a better site to hook into.
> 
> But I understand your concerns. We want to track this stuff as close
> to the memory allocators as possible.

Exactly.

> 
> > > But also, there are people right now for whom the socket buffers cause
> > > system OOM, but the existing memcg's hard tcp window limitq that
> > > exists absolutely wrecks network performance for them. It's not usable
> > > the way it is. It'd be much better to have the socket buffers exert
> > > pressure on the shared pool, and then propagate the overall pressure
> > > back to individual consumers with reclaim, shrinkers, vmpressure etc.
> > 
> > This might or might not work. I'm not an expert to judge. But if you do
> > this only for memcg leaving the global case as it is, networking people
> > won't budge IMO. So could you please start such a major rework from the
> > global case? Could you please try to deprecate the tcp window limits not
> > only in the legacy memcg hierarchy, but also system-wide in order to
> > attract attention of networking experts?
> 
> I'm definitely interested in addressing this globally as well.
> 
> The idea behind this was to use the memcg part as a testbed. cgroup2
> is going to be new and people are prepared for hiccups when migrating
> their applications to it; and they can roll back to cgroup1 and tcp
> window limits at any time should they run into problems in production.

Then you'd better not touch existing tcp limits at all, because they
just work, and the logic behind them is very close to that of global tcp
limits. I don't think one can simplify it somehow. Moreover, frankly I
still have my reservations about this vmpressure propagation to skb
you're proposing. It might work, but I doubt it will allow us to throw
away the explicit tcp limit, as I explained previously. So, even with your
approach I think we may still need a per-memcg tcp limit *unless* you get
rid of the global tcp limit somehow.

> 
> So this seemed like a good way to prove a new mechanism before rolling
> it out to every single Linux setup, rather than switch everybody over
> after the limited scope testing I can do as a developer on my own.
> 
> Keep in mind that my patches are not committing anything in terms of
> interface, so we retain all the freedom to fix and tune the way this
> is implemented, including the freedom to re-add tcp window limits in
> case the pressure balancing is not a comprehensive solution.
> 

I really dislike this kind of proof. It looks like you're trying to push
something you think is right covertly, w/o having a proper discussion
with networking people and then say that it just works and hence should
be done globally, but what if it won't? Revert it? We already have a lot
of dubious stuff in memcg that should be reverted, so let's please try
to avoid this kind of mistake in the future. Note, I say "w/o having a
proper discussion with networking people", because I don't think they
will really care *unless* you change the global logic, simply because
most of them aren't very interested in memcg AFAICS.

That effectively means you lose a chance to listen to networking
experts, who could point you at design flaws and propose an improvement
right away. Let's please not miss such an opportunity. You said that
you'd seen this problem happen w/o cgroups, so you have a use case that
might need fixing at the global level. IMO it shouldn't be difficult to
prepare an RFC patch for the global case first and see what people think
about it.

Thanks,
Vladimir


Re: [PATCH v7 02/10] ss: created formatters for json and hr

2015-10-28 Thread Phil Sutter
On Wed, Oct 28, 2015 at 09:07:47AM +0100, Matthias Tafelmeier wrote:
> 
> >
> > Well, then we should wait for another voice on the complexity of
> > the patchset before I amend and resend the patchset.
> >
> >
> 
> Well, I consider the outstanding reservations resolved, now that
> Sutter has taken over the maintenance responsibility and answered
> accordingly.

I did not take over maintenance responsibility (whatever that means to
you precisely). I merely reviewed the patches, focussing on the
technical aspects of both implementation and patch management.

Regarding the concept itself, I think the usability of filters in
combination with json output is worth a discussion as well.

> Those reservations were related to the patchset's complexity and size.

I didn't see any problem with that in the first place. It is indeed a
big change; achieving something like that without a big patch set is
unlikely.


BUG: fsl FEC ethernet tx checksum offloading doesn't work with RMII interface

2015-10-28 Thread David Jander

Hi all,

I was unable to figure out who's maintaining
drivers/net/ethernet/freescale/fec_main.c, so I hope someone can help out on
this list...

We have a board with an RMII PHY connected to an i.MX6S. The hardware seems to
be ok, since I can receive and transmit ethernet frames without drops or
errors. However only simple things like ping and dhcp seemed to work. TCP/IP
connections could not be made. When looking at both ends with tcpdump, I
realized that all transmitted packets arrived at the other end with the TCP
and IP header checksums zeroed-out.

After issuing the following command, TCP/IP started working correctly:

$ ethtool --offload eth0 tx off

This works around the issue. For some reason, when the FEC is in RMII mode, it
isn't filling in the checksums.
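To confirm from a tcpdump capture that a header really went out with a zeroed checksum, rather than a merely wrong one, the checksum can be recomputed from the captured bytes. A minimal sketch of the RFC 1071 Internet checksum, as used for the IPv4 header, follows; the helper name inet_checksum() is illustrative, and the TCP checksum additionally covers a pseudo-header that this sketch does not build.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * RFC 1071 Internet checksum: one's-complement sum of 16-bit big-endian
 * words, carries folded back into the low 16 bits, then complemented.
 * Over a header whose checksum field is zero it yields the value that
 * belongs in that field; over a correctly checksummed header it yields 0.
 */
static uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    assert(data != NULL || len == 0);
    while (len > 1) {
        sum += ((uint32_t)data[0] << 8) | data[1];
        data += 2;
        len -= 2;
    }
    if (len == 1)                       /* odd length: pad with a zero byte */
        sum += (uint32_t)data[0] << 8;
    while (sum >> 16)                   /* fold carries back in */
        sum = (sum & 0xFFFFu) + (sum >> 16);
    return (uint16_t)~sum;
}
```

A header captured with its checksum field zeroed will normally fail the verification pass (a non-zero result), which is what distinguishes "not filled in" from "filled in correctly".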

On another board with an RGMII phy the same kernel works fine without the need
to disable offloading. What can possibly relate this functionality to the
choice of MAC interface?

Best regards,

-- 
David Jander
Protonic Holland.


Re: [PATCH v7 02/10] ss: created formatters for json and hr

2015-10-28 Thread Matthias Tafelmeier

> Yeah, sorry for not having looked into this earlier. Also, I neither
> have nor claim any power of veto.

No big issue. Maybe Stephen can clarify things, i.e. confirm that there
are no further objections.

>  Apart from that, I'm not against this
> patch series in general, just trying to help raise its quality a bit.

Many thanks for that.

> Ultimately, we don't set anything in stone, so everything can be
> fixed/improved later on. Except Git history of course, which is
> important to get right in relation to bisecting.
Absolutely!

-- 
BR

Matthias





Re: [Bug 106241] New: shutdown(3)/close(3) behaviour is incorrect for sockets in accept(3)

2015-10-28 Thread Alan Burlison

On 27/10/2015 23:17, Al Viro wrote:


> Frankly, as far as I'm concerned, the bottom line is
> * there are two variants of semantics in that area and there's not
> much that could be done about that.


Yes, that seems to be the case.


> * POSIX is vague enough for both variants to comply with it (it's
> also very badly written in the area in question).


On that aspect I disagree, the POSIX semantics seem clear to me, and are 
different to the Linux behaviour.



> * I don't see any way to implement something similar to Solaris
> behaviour without a huge increase of memory footprint or massive cacheline
> pingpong.  Solaris appears to go for memory footprint from hell - cacheline
> per descriptor (instead of a pointer per descriptor).


Yes, that does seem to be the case. Thanks for the detailed explanation 
you've provided as to why that's so.



> * the benefits of Solaris-style behaviour are not obvious - all things
> equal it would be interesting, but the things are very much not equal.  What's
> more, if your userland code is such that accept() argument could be closed by
> another thread, the caller *cannot* do anything with said argument after
> accept() returns, no matter which variant of semantics is used.


Yes, irrespective of how you terminate the accept, once it returns with
an error it's unsafe to use the FD, with the exception of failures such
as EAGAIN, EINTR etc. However, the shutdown() behaviour of Linux is not
POSIX compliant, and allowing an accept() on an FD that's been closed to
continue doesn't seem correct either.
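In code, the rule above amounts to retrying accept() only for a transient failure and otherwise treating the descriptor as done with. A minimal POSIX sketch that restarts only on EINTR and hands every other failure, including EAGAIN/EWOULDBLOCK on a non-blocking socket, back to the caller (who would typically wait in poll() before trying again); the helper name accept_restart_eintr() is illustrative.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>

/*
 * Accept one connection on listen_fd, restarting only when the call was
 * interrupted by a signal (EINTR). Any other failure returns -1 with
 * errno preserved, so the caller decides whether to poll() and retry.
 *
 * If another thread may close listen_fd, the caller should not touch it
 * after this returns: the number may already refer to a different file.
 */
static int accept_restart_eintr(int listen_fd)
{
    int fd;

    do {
        fd = accept(listen_fd, NULL, NULL);
    } while (fd < 0 && errno == EINTR);

    return fd;
}
```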



> * [Linux-specific aside] our __alloc_fd() can degrade quite badly
> with some use patterns.  The cacheline pingpong in the bitmap is probably
> inevitable, unless we accept considerably heavier memory footprint,
> but we also have a case when alloc_fd() takes O(n) and it's _not_ hard
> to trigger - close(3);open(...); will have the next open() after that
> scanning the entire in-use bitmap.  I think I see a way to improve it
> without slowing the normal case down, but I'll need to experiment a
> bit before I post patches.  Anybody with examples of real-world loads
> that make our descriptor allocator degrade is very welcome to post
> the reproducers...


It looks like the remaining discussion is going to be about Linux 
implementation details so I'll bow out at this point. Thanks again for 
all the helpful explanation.
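The close(3); open(...) case from the aside above can be reproduced in a deliberately simplified, hypothetical model; this is not the kernel's __alloc_fd(), which scans a bitmap a word at a time, but it keeps the same next_fd hint. close() lowers the hint, the following open() reuses the freed slot and moves the hint to just past it, and the open() after that has to walk over every occupied slot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_FDS 4096

/* Hypothetical descriptor table: in-use flag per slot plus a search hint. */
struct toy_fdtable {
    bool in_use[TOY_MAX_FDS];
    int next_fd;            /* every slot below next_fd is in use */
    size_t last_scan;       /* occupied slots skipped by the last alloc */
};

/* Return the lowest free descriptor at or above next_fd, or -1 if full. */
static int toy_alloc_fd(struct toy_fdtable *t)
{
    int fd = t->next_fd;

    t->last_scan = 0;
    while (fd < TOY_MAX_FDS && t->in_use[fd]) {
        fd++;
        t->last_scan++;
    }
    if (fd >= TOY_MAX_FDS)
        return -1;
    t->in_use[fd] = true;
    t->next_fd = fd + 1;    /* everything below fd + 1 is now in use */
    return fd;
}

static void toy_close_fd(struct toy_fdtable *t, int fd)
{
    assert(fd >= 0 && fd < TOY_MAX_FDS && t->in_use[fd]);
    t->in_use[fd] = false;
    if (fd < t->next_fd)
        t->next_fd = fd;    /* a lower slot is free again */
}
```

With descriptors 0..999 open, close(3) followed by an open() returns 3 without scanning, and the next open() skips 996 occupied slots before returning 1000.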


--
Alan Burlison


[PATCH net-next] use vzalloc() instead of vmalloc() as counterstmp is not cleared before it is used in get_counters(). counterstmp might be leaked partially when it is sent to userland later on.

2015-10-28 Thread Loganaden Velvindron
---
 net/bridge/netfilter/ebtables.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c
index f46ca41..26922e9 100644
--- a/net/bridge/netfilter/ebtables.c
+++ b/net/bridge/netfilter/ebtables.c
@@ -989,7 +989,7 @@ static int do_replace_finish(struct net *net, struct ebt_replace *repl,
   the check on the size is done later, when we have the lock */
if (repl->num_counters) {
unsigned long size = repl->num_counters * sizeof(*counterstmp);
-   counterstmp = vmalloc(size);
+   counterstmp = vzalloc(size);
if (!counterstmp)
return -ENOMEM;
}
@@ -1410,7 +1410,7 @@ static int copy_counters_to_user(struct ebt_table *t,
return -EINVAL;
}
 
-   counterstmp = vmalloc(nentries * sizeof(*counterstmp));
+   counterstmp = vzalloc(nentries * sizeof(*counterstmp));
if (!counterstmp)
return -ENOMEM;
 
-- 
2.6.1



Re: [PATCH net-next 2/4] sfc: allocate rx pages on the same node as the interrupt

2015-10-28 Thread Eric Dumazet
On Wed, 2015-10-28 at 15:01 +, Shradha Shah wrote:
> From: Daniel Pieczko 
> 
> When the interrupt servicing a channel is on a NUMA node that is
> not local to the device, performance is improved by allocating
> rx pages on the node local to the interrupt (remote to the device).
> 
> The performance-optimal case, where interrupts and applications
> are pinned to CPUs on the same node as the device, is not altered
> by this change.
> 
> This change gave a 1% improvement in transaction rate using Nginx
> with all interrupts and Nginx threads on the node remote to the
> device. It also gave a small reduction in round-trip latency,
> again with the interrupt and application on a different node to
> the device.
> 
> Allocating rx pages based on the channel->irq_node value is only
> valid for the initial driver-load interrupt affinities; if an
> interrupt is moved later, the wrong node may be used for the
> allocation.
> 
> Signed-off-by: Shradha Shah 
> ---
>  drivers/net/ethernet/sfc/efx.c|  1 +
>  drivers/net/ethernet/sfc/net_driver.h |  3 +++
>  drivers/net/ethernet/sfc/rx.c | 14 +-
>  3 files changed, 13 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/ethernet/sfc/efx.c b/drivers/net/ethernet/sfc/efx.c
> index 974637d..89fbd03 100644
> --- a/drivers/net/ethernet/sfc/efx.c
> +++ b/drivers/net/ethernet/sfc/efx.c
> @@ -445,6 +445,7 @@ efx_alloc_channel(struct efx_nic *efx, int i, struct efx_channel *old_channel)
>   channel->efx = efx;
>   channel->channel = i;
>   channel->type = &efx_default_channel_type;
> + channel->irq_node = NUMA_NO_NODE;
>  
>   for (j = 0; j < EFX_TXQ_TYPES; j++) {
>   tx_queue = &channel->tx_queue[j];
> diff --git a/drivers/net/ethernet/sfc/net_driver.h b/drivers/net/ethernet/sfc/net_driver.h
> index ad56231..0ab9080a 100644
> --- a/drivers/net/ethernet/sfc/net_driver.h
> +++ b/drivers/net/ethernet/sfc/net_driver.h
> @@ -419,6 +419,7 @@ enum efx_sync_events_state {
>   * @sync_events_state: Current state of sync events on this channel
>   * @sync_timestamp_major: Major part of the last ptp sync event
>   * @sync_timestamp_minor: Minor part of the last ptp sync event
> + * @irq_node: NUMA node of interrupt
>   */
>  struct efx_channel {
>   struct efx_nic *efx;
> @@ -477,6 +478,8 @@ struct efx_channel {
>   enum efx_sync_events_state sync_events_state;
>   u32 sync_timestamp_major;
>   u32 sync_timestamp_minor;
> +
> + int irq_node;
>  };
>  
>  #ifdef CONFIG_NET_RX_BUSY_POLL
> diff --git a/drivers/net/ethernet/sfc/rx.c b/drivers/net/ethernet/sfc/rx.c
> index 3f0e129..c5ef1e8 100644
> --- a/drivers/net/ethernet/sfc/rx.c
> +++ b/drivers/net/ethernet/sfc/rx.c
> @@ -168,11 +168,15 @@ static int efx_init_rx_buffers(struct efx_rx_queue *rx_queue, bool atomic)
>* context in such a case.  So, use __GFP_NO_WARN
>* in case of atomic.
>*/
> - page = alloc_pages(__GFP_COLD | __GFP_COMP |
> -(atomic ?
> - (GFP_ATOMIC | __GFP_NOWARN)
> - : GFP_KERNEL),
> -efx->rx_buffer_order);
> + struct efx_channel *channel;
> +
> + channel = efx_rx_queue_channel(rx_queue);
> + page = alloc_pages_node(channel->irq_node, __GFP_COMP |
> + (atomic ?
> +  (GFP_ATOMIC | __GFP_NOWARN)
> +  : GFP_KERNEL),
> + efx->rx_buffer_order);
> +
>   if (unlikely(page == NULL))
>   return -ENOMEM;
>   dma_addr =
> 


Sorry, I do not understand this patch, and why the following one is not
squashed into this one.

irq_node is always NUMA_NO_NODE (in this patch)

So you claim a 1% improvement, switching from alloc_pages(...) to
alloc_pages_node(NUMA_NO_NODE, ...) ???





[PATCH net-next] net: bridge: use vzalloc() instead of vmalloc() as counterstmp is not cleared before it is used in get_counters(). counterstmp might be leaked partially when it is sent to userland la

2015-10-28 Thread Loganaden Velvindron
---
 net/bridge/netfilter/ebtables.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c
index f46ca41..26922e9 100644
--- a/net/bridge/netfilter/ebtables.c
+++ b/net/bridge/netfilter/ebtables.c
@@ -989,7 +989,7 @@ static int do_replace_finish(struct net *net, struct ebt_replace *repl,
   the check on the size is done later, when we have the lock */
if (repl->num_counters) {
unsigned long size = repl->num_counters * sizeof(*counterstmp);
-   counterstmp = vmalloc(size);
+   counterstmp = vzalloc(size);
if (!counterstmp)
return -ENOMEM;
}
@@ -1410,7 +1410,7 @@ static int copy_counters_to_user(struct ebt_table *t,
return -EINVAL;
}
 
-   counterstmp = vmalloc(nentries * sizeof(*counterstmp));
+   counterstmp = vzalloc(nentries * sizeof(*counterstmp));
if (!counterstmp)
return -ENOMEM;
 
-- 
2.6.1



[RFC] unix: fix use-after-free in unix_dgram_poll()

2015-10-28 Thread Rainer Weikusat
Rainer Weikusat  writes:
> Jason Baron  writes:

[...]

>> 2)
>>
>> For the case of epoll() in edge triggered mode we need to ensure that
>> when we return -EAGAIN from unix_dgram_sendmsg() when unix_recvq_full()
>> is true, we need to add a unix_peer_wake_connect() call to guarantee a
>> wakeup. Otherwise, we are going to potentially hang there.
>
> I consider this necessary.

(As already discussed privately) just doing this would open up another
way for sockets to be enqueued on the peer_wait queue of the peer
forever even though no one wants to be notified of write space
availability. Here's another RFC patch addressing the issues so far plus
this one by breaking the connection to the peer socket from the wake up
relaying function. This has the nice additional property that the
dgram_poll code becomes somewhat simpler as the "dequeued where we
didn't enqueue" situation can no longer occur and the not-so-nice
additional property that the connect and disconnect functions need to
take the peer_wait.lock spinlock explicitly so that this lock is used to
ensure that no two threads modify the private pointer of the client
wait_queue_t.

I've also moved the check, possibly enqueue then recheck and possibly
dequeue dance into a pair of functions as this code would be identical
for both unix_dgram_poll and unix_dgram_sendmsg (I'm not really happy
with the names, though).

---
--- linux-2-6.b/net/unix/af_unix.c  2015-10-28 16:06:29.581960497 +
+++ linux-2-6/net/unix/af_unix.c2015-10-28 16:14:55.326065483 +
@@ -115,6 +115,8 @@
 #include 
 #include 
 
+#define POLL_OUT_ALL   (POLLOUT | POLLWRNORM | POLLWRBAND)
+
 static struct hlist_head unix_socket_table[UNIX_HASH_SIZE + 1];
 static DEFINE_SPINLOCK(unix_table_lock);
 static atomic_long_t unix_nr_socks;
@@ -303,6 +305,117 @@ found:
return s;
 }
 
+/*
+ * Support code for asymmetrically connected dgram sockets
+ *
+ * If a datagram socket is connected to a socket not itself connected
+ * to the first socket (eg, /dev/log), clients may only enqueue more
+ * messages if the present receive queue of the server socket is not
+ * "too large". This means there's a second writability condition poll
+ * and sendmsg need to test. The dgram recv code will do a wake up on
+ * the peer_wait wait queue of a socket upon reception of a datagram
+ * which needs to be propagated to sleeping writers since these might
+ * not yet have sent anything. This can't be accomplished via
+ * poll_wait because the lifetime of the server socket might be less
+ * than that of its clients if these break their association with it
+ * or if the server socket is closed while clients are still connected
+ * to it and there's no way to inform "a polling implementation" that
+ * it should let go of a certain wait queue.
+ *
+ * In order to achieve wake up propagation, a wait_queue_t of the
+ * client socket is thus enqueued on the peer_wait queue of the server
+ * socket whose wake function does a wake_up on the ordinary client
+ * socket wait queue. This connection is established whenever a write
+ * (or poll for write) hits the flow control condition and broken when
+ * the connection to the server socket is dissolved or after a wake up
+ * was relayed.
+ */
+
+static int unix_dgram_peer_wake_relay(wait_queue_t *q, unsigned mode, int flags,
+   void *key)
+{
+   struct unix_sock *u;
+   wait_queue_head_t *u_sleep;
+
+   u = container_of(q, struct unix_sock, peer_wake);
+
+   __remove_wait_queue(&unix_sk(u->peer_wake.private)->peer_wait,
+   &u->peer_wake);
+   u->peer_wake.private = NULL;
+
+   /* relaying can only happen while the wq still exists */
+   u_sleep = sk_sleep(&u->sk);
+   if (u_sleep)
+   wake_up_interruptible_poll(u_sleep, key);
+
+   return 0;
+}
+
+static int unix_dgram_peer_wake_connect(struct sock *sk, struct sock *other)
+{
+   struct unix_sock *u, *u_other;
+   int rc;
+
+   u = unix_sk(sk);
+   u_other = unix_sk(other);
+   rc = 0;
+
+   spin_lock(&u_other->peer_wait.lock);
+
+   if (!u->peer_wake.private) {
+   u->peer_wake.private = other;
+   __add_wait_queue(&u_other->peer_wait, &u->peer_wake);
+
+   rc = 1;
+   }
+
+   spin_unlock(&u_other->peer_wait.lock);
+   return rc;
+}
+
+static int unix_dgram_peer_wake_disconnect(struct sock *sk, struct sock *other)
+{
+   struct unix_sock *u, *u_other;
+   int rc;
+
+   u = unix_sk(sk);
+   u_other = unix_sk(other);
+   rc = 0;
+
+   spin_lock(&u_other->peer_wait.lock);
+
+   if (u->peer_wake.private == other) {
+   __remove_wait_queue(&u_other->peer_wait, &u->peer_wake);
+   u->peer_wake.private = NULL;
+
+   rc = 1;
+   }
+
+   spin_unlock(&u_other->peer_wait.lock);
+   return rc;
+}
+
+static inline int 

Re: [RFC] unix: fix use-after-free in unix_dgram_poll()

2015-10-28 Thread Jason Baron
On 10/28/2015 12:46 PM, Rainer Weikusat wrote:
> Rainer Weikusat  writes:
>> Jason Baron  writes:
> 
> [...]
> 
>>> 2)
>>>
>>> For the case of epoll() in edge triggered mode we need to ensure that
>>> when we return -EAGAIN from unix_dgram_sendmsg() when unix_recvq_full()
>>> is true, we need to add a unix_peer_wake_connect() call to guarantee a
>>> wakeup. Otherwise, we are going to potentially hang there.
>>
>> I consider this necessary.
> 
> (As already discussed privately) just doing this would open up another
> way for sockets to be enqueued on the peer_wait queue of the peer
> forever even though no one wants to be notified of write space
> availability. Here's another RFC patch addressing the issues so far plus
> this one by breaking the connection to the peer socket from the wake up
> relaying function. This has the nice additional property that the
> dgram_poll code becomes somewhat simpler as the "dequeued where we
> didn't enqueue" situation can no longer occur and the not-so-nice
> additional property that the connect and disconnect functions need to
> take the peer_wait.lock spinlock explicitly so that this lock is used to
> ensure that no two threads modify the private pointer of the client
> wait_queue_t.

Hmmm...I thought these were already all guarded by unix_state_lock(sk).
In any case, the rest of the patch looks good to me.

Thanks,

-Jason



Re: [PATCH net-next 4/4] sfc: set and clear interrupt affinity hints

2015-10-28 Thread Sergei Shtylyov

Hello.

On 10/28/2015 06:02 PM, Shradha Shah wrote:


From: Bert Kenward 

Use cpumask_local_spread to provide interrupt affinity hints
for each queue. This will spread interrupts across NUMA local
CPUs first, extending to remote nodes if needed.

Signed-off-by: Shradha Shah 
---
  drivers/net/ethernet/sfc/efx.c | 35 +++
  1 file changed, 35 insertions(+)

diff --git a/drivers/net/ethernet/sfc/efx.c b/drivers/net/ethernet/sfc/efx.c
index 84f9e90..93c4c0e 100644
--- a/drivers/net/ethernet/sfc/efx.c
+++ b/drivers/net/ethernet/sfc/efx.c
@@ -1489,6 +1489,30 @@ static int efx_probe_interrupts(struct efx_nic *efx)
return 0;
  }

+#if defined(CONFIG_SMP)
+static void efx_set_interrupt_affinity(struct efx_nic *efx)
+{
+   struct efx_channel *channel;
+   unsigned int cpu;
+
+   efx_for_each_channel(channel, efx) {
+   cpu = cpumask_local_spread(channel->channel,
+  pcibus_to_node(efx->pci_dev->bus));
+
+   irq_set_affinity_hint(channel->irq, cpumask_of(cpu));
+   channel->irq_mem_node = cpu_to_mem(cpu);
+   }
+}
+
+static void efx_clear_interrupt_affinity(struct efx_nic *efx)
+{
+   struct efx_channel *channel;
+
+   efx_for_each_channel(channel, efx)
+   irq_set_affinity_hint(channel->irq, NULL);
+}
+#endif /* CONFIG_SMP */
+
  static int efx_soft_enable_interrupts(struct efx_nic *efx)
  {
struct efx_channel *channel, *end_channel;
@@ -2932,6 +2956,9 @@ static void efx_pci_remove_main(struct efx_nic *efx)
cancel_work_sync(&efx->reset_work);

efx_disable_interrupts(efx);
+#if defined(CONFIG_SMP)
+   efx_clear_interrupt_affinity(efx);
+#endif


   Please just define an empty function for the !SMP case instead of the ugly
#ifdef'fery in the function bodies.
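The suggestion is the usual kernel idiom: build the real helpers under CONFIG_SMP and provide empty static inline stubs otherwise, so call sites such as efx_pci_remove_main() need no #if of their own. A reduced sketch; struct efx_nic here is a hypothetical one-field stand-in for the driver's structure, and efx_pci_remove_main_sketch() is an illustrative name, not driver code.

```c
/* Hypothetical stand-in; the real struct efx_nic is much larger. */
struct efx_nic {
    int affinity_hints_set;
};

#if defined(CONFIG_SMP)
static void efx_set_interrupt_affinity(struct efx_nic *efx)
{
    /* Real driver: irq_set_affinity_hint() for each channel. */
    efx->affinity_hints_set = 1;
}

static void efx_clear_interrupt_affinity(struct efx_nic *efx)
{
    /* Real driver: irq_set_affinity_hint(channel->irq, NULL). */
    efx->affinity_hints_set = 0;
}
#else
/* !SMP: empty stubs, so callers need no conditionals of their own. */
static inline void efx_set_interrupt_affinity(struct efx_nic *efx)
{
}

static inline void efx_clear_interrupt_affinity(struct efx_nic *efx)
{
}
#endif

/* Illustrative caller: no #if around the call. */
static void efx_pci_remove_main_sketch(struct efx_nic *efx)
{
    efx_clear_interrupt_affinity(efx);
}
```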


[...]

MBR, Sergei
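The "NUMA local CPUs first, extending to remote nodes" behaviour described in the quoted commit message can be modelled in a few lines of userspace C. This is a simplified sketch of cpumask_local_spread() that assumes every CPU is online and omits the NUMA_NO_NODE case; cpu_node[] is a hypothetical CPU-to-node table.

```c
#include <assert.h>

/*
 * Simplified model of cpumask_local_spread(i, node): CPUs on 'node' are
 * handed out first, in ascending order, then the remaining CPUs; i wraps
 * modulo the CPU count. cpu_node[c] is the NUMA node of CPU c.
 */
static int toy_local_spread(unsigned int i, int node, const int *cpu_node,
                            int ncpus)
{
    int c;

    assert(ncpus > 0);
    i %= (unsigned int)ncpus;
    for (c = 0; c < ncpus; c++)         /* local CPUs first */
        if (cpu_node[c] == node && i-- == 0)
            return c;
    for (c = 0; c < ncpus; c++)         /* then everything else */
        if (cpu_node[c] != node && i-- == 0)
            return c;
    return -1;                          /* unreachable for ncpus > 0 */
}
```

With cpu_node = {0, 0, 1, 1, 0, 1} and node 1, channel indices 0..5 map to CPUs 2, 3, 5, 0, 1, 4, and index 6 wraps back to CPU 2.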



Re: [PATCH net-next] net: bridge: use vzalloc() instead of vmalloc() as counterstmp is not cleared before it is used in get_counters(). counterstmp might be leaked partially when it is sent to userlan

2015-10-28 Thread Loganaden Velvindron
On Wed, Oct 28, 2015 at 09:10:20PM +0300, Sergei Shtylyov wrote:
> Hello.
> 
>Your subject is too long, it should have been placed in the changelog
> partially. You didn't sign off on the patch, so it can't be applied.
> 
> MBR, Sergei

Thank you. Please reject this patch. I re-sent a proper one in another mail.


Re: [PATCH net-next] net: bridge: use vzalloc() instead of vmalloc() as counterstmp is not cleared before it is used in get_counters(). counterstmp might be leaked partially when it is sent to userlan

2015-10-28 Thread Loganaden Velvindron
Please reject this patch. I sent a proper one with the sign-off later on.




[PATCH net 1/1] tipc: linearize arriving NAME_DISTR and LINK_PROTO buffers

2015-10-28 Thread Jon Maloy
Testing of the new UDP bearer has revealed that reception of
NAME_DISTRIBUTOR, LINK_PROTOCOL/RESET and LINK_PROTOCOL/ACTIVATE
message buffers is not prepared for the case that those may be
non-linear.

We now linearize all such buffers before they are delivered up to the
generic reception layer.

In order for the commit to apply cleanly to 'net' and 'stable', we do
the change in the function tipc_udp_recv() for now. Later, we will post
a commit to 'net-next' moving the linearization to generic code, in
tipc_named_rcv() and tipc_link_proto_rcv().

Fixes: commit d0f91938bede ("tipc: add ip/udp media type")
Signed-off-by: Jon Maloy 
---
 net/tipc/udp_media.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/net/tipc/udp_media.c b/net/tipc/udp_media.c
index 6e648d9..cd7c5f1 100644
--- a/net/tipc/udp_media.c
+++ b/net/tipc/udp_media.c
@@ -48,6 +48,7 @@
 #include 
 #include "core.h"
 #include "bearer.h"
+#include "msg.h"
 
 /* IANA assigned UDP port */
 #define UDP_PORT_DEFAULT   6118
@@ -222,6 +223,10 @@ static int tipc_udp_recv(struct sock *sk, struct sk_buff *skb)
 {
struct udp_bearer *ub;
struct tipc_bearer *b;
+   int usr = msg_user(buf_msg(skb));
+
+   if ((usr == LINK_PROTOCOL) || (usr == NAME_DISTRIBUTOR))
+   skb_linearize(skb);
 
ub = rcu_dereference_sk_user_data(sk);
if (!ub) {
-- 
1.9.1
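skb_linearize() pulls a non-linear skb's paged fragments into one contiguous data area, so code that parses the message with plain pointer arithmetic sees every byte. A userspace analogy, with a hypothetical fragment list (struct toy_frag) standing in for the skb's paged data. In the kernel, skb_linearize() returns 0 on success or -ENOMEM on failure; this toy version reports -ENOSPC when its fixed output buffer is too small.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fragment descriptor standing in for an skb's paged data. */
struct toy_frag {
    const uint8_t *data;
    size_t len;
};

/*
 * Copy all fragments, in order, into one contiguous buffer.
 * Returns the number of bytes copied, or -ENOSPC if 'out' is too small.
 */
static long toy_linearize(const struct toy_frag *frags, size_t nfrags,
                          uint8_t *out, size_t cap)
{
    size_t total = 0;

    assert(frags != NULL || nfrags == 0);
    for (size_t i = 0; i < nfrags; i++) {
        if (frags[i].len > cap - total)     /* total <= cap always holds */
            return -ENOSPC;
        memcpy(out + total, frags[i].data, frags[i].len);
        total += frags[i].len;
    }
    return (long)total;
}
```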



[PATCH net-next] ipv4: use l4 hash for locally generated multipath flows

2015-10-28 Thread Paolo Abeni
This patch changes how the multipath hash is computed for locally
generated UDP or TCP flows: now the hash also includes l4 information
(source and destination port).

This allows better utilization of the available paths when the existing
flows have the same source IP and the same destination IP: with l3 hash,
even when multiple connections are in place simultaneously, a single path
will be used, while with l4 hash we can use all the available paths.

Signed-off-by: Paolo Abeni 
---
 include/net/ip_fib.h | 12 
 net/ipv4/fib_semantics.c |  3 ++-
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index ac5c6e8..56bf68c 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -328,6 +328,18 @@ static inline int fib_multipath_hash(__be32 saddr, __be32 daddr)
return jhash_2words(saddr, daddr, fib_multipath_secret) >> 1;
 }
 
+static inline int fib_multipath_output_hash(const struct flowi4 *fl4)
+{
+   if ((fl4->flowi4_proto == IPPROTO_TCP) ||
+   (fl4->flowi4_proto == IPPROTO_UDP))
+   return jhash_3words(fl4->saddr, fl4->daddr,
+   *((__u32 *)&fl4->uli.ports),
+   fib_multipath_secret) >> 1;
+
+   return jhash_2words(fl4->saddr, fl4->daddr, fib_multipath_secret) >> 1;
+}
+
+
 void fib_select_multipath(struct fib_result *res, int hash);
 void fib_select_path(struct net *net, struct fib_result *res,
 struct flowi4 *fl4, int mp_hash);
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index 42778d9..8a18349 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -1564,7 +1564,8 @@ void fib_select_path(struct net *net, struct fib_result *res,
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
if (res->fi->fib_nhs > 1 && fl4->flowi4_oif == 0) {
if (mp_hash < 0)
-   mp_hash = fib_multipath_hash(fl4->saddr, fl4->daddr);
+   mp_hash = fib_multipath_output_hash(fl4);
+
fib_select_multipath(res, mp_hash);
}
else
-- 
1.8.3.1
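The difference between the L3 and L4 hash can be seen in a small userspace model. FNV-1a is swapped in for the kernel's jhash purely to keep the sketch self-contained, and the path is chosen as hash % npaths rather than with fib_select_multipath()'s per-nexthop bounds; struct toy_flow and the helper names are hypothetical. As in the patch, ports are mixed in only for TCP and UDP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { TOY_PROTO_ICMP = 1, TOY_PROTO_TCP = 6, TOY_PROTO_UDP = 17 };

/* Hypothetical, simplified flow key (cf. struct flowi4). */
struct toy_flow {
    uint32_t saddr;
    uint32_t daddr;
    uint16_t sport;
    uint16_t dport;
    uint8_t proto;
};

/* FNV-1a step over a buffer; stands in for jhash in this sketch only. */
static uint32_t fnv1a(uint32_t h, const void *buf, size_t len)
{
    const uint8_t *p = buf;

    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* L3 hash: addresses only. L4 hash: addresses plus ports for TCP/UDP. */
static uint32_t toy_multipath_hash(const struct toy_flow *f, bool use_l4)
{
    uint32_t h = 2166136261u;           /* FNV-1a offset basis */

    h = fnv1a(h, &f->saddr, sizeof(f->saddr));
    h = fnv1a(h, &f->daddr, sizeof(f->daddr));
    if (use_l4 && (f->proto == TOY_PROTO_TCP || f->proto == TOY_PROTO_UDP)) {
        h = fnv1a(h, &f->sport, sizeof(f->sport));
        h = fnv1a(h, &f->dport, sizeof(f->dport));
    }
    return h;
}

static unsigned int toy_select_path(const struct toy_flow *f, bool use_l4,
                                    unsigned int npaths)
{
    assert(npaths > 0);
    return toy_multipath_hash(f, use_l4) % npaths;
}
```

Two TCP flows between the same addresses that differ only in source port always share a path under the L3 hash. Under the L4 hash they can diverge; with this FNV-1a setup and two paths, source ports 40000 and 40001 are guaranteed to differ, because the hash's low bit is the XOR of the low bits of all input bytes (plus a constant).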



[PATCH 1/1 net-next] net: bridge: use vzalloc() instead of vmalloc() for counterstmp

2015-10-28 Thread Loganaden Velvindron
counterstmp is not cleared before it is used in get_counters(). It might be
partially leaked when it is sent to userland later on.

Signed-off-by: Loganaden Velvindron 
---
 net/bridge/netfilter/ebtables.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c
index f46ca41..26922e9 100644
--- a/net/bridge/netfilter/ebtables.c
+++ b/net/bridge/netfilter/ebtables.c
@@ -989,7 +989,7 @@ static int do_replace_finish(struct net *net, struct ebt_replace *repl,
   the check on the size is done later, when we have the lock */
if (repl->num_counters) {
unsigned long size = repl->num_counters * sizeof(*counterstmp);
-   counterstmp = vmalloc(size);
+   counterstmp = vzalloc(size);
if (!counterstmp)
return -ENOMEM;
}
@@ -1410,7 +1410,7 @@ static int copy_counters_to_user(struct ebt_table *t,
return -EINVAL;
}
 
-   counterstmp = vmalloc(nentries * sizeof(*counterstmp));
+   counterstmp = vzalloc(nentries * sizeof(*counterstmp));
if (!counterstmp)
return -ENOMEM;
 
-- 
2.6.1
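The change relies on vzalloc() returning zeroed memory, so any counter slot that is not written before the copy to userland comes out as zeros instead of leftover data. The same idea in userspace, with calloc() in the role of vzalloc() and malloc() in that of vmalloc(); struct toy_counter is a hypothetical stand-in for struct ebt_counter, and the partial fill only mimics the scenario the commit message describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct ebt_counter. */
struct toy_counter {
    uint64_t pcnt;   /* packets */
    uint64_t bcnt;   /* bytes */
};

/*
 * calloc() hands back zero-filled memory, whereas malloc() (like
 * vmalloc()) leaves whatever was there before.
 */
static struct toy_counter *toy_alloc_counters(size_t n)
{
    return calloc(n, sizeof(struct toy_counter));
}

/* Write only the first 'filled' entries, leaving the rest untouched. */
static void toy_fill_counters(struct toy_counter *c, size_t filled)
{
    assert(c != NULL || filled == 0);
    for (size_t i = 0; i < filled; i++) {
        c[i].pcnt = i + 1;
        c[i].bcnt = 100 * (i + 1);
    }
}
```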



Re: [PATCH net-next] net: bridge: use vzalloc() instead of vmalloc() as counterstmp is not cleared before it is used in get_counters(). counterstmp might be leaked partially when it is sent to userlan

2015-10-28 Thread Sergei Shtylyov

Hello.

   Your subject is too long, it should have been placed in the changelog 
partially. You didn't sign off on the patch, so it can't be applied.


MBR, Sergei



RE: [PATCH net-next] hyperv: Add handler for RNDIS_STATUS_NETWORK_CHANGE event

2015-10-28 Thread Haiyang Zhang


> -Original Message-
> From: Richard Weinberger [mailto:richard.weinber...@gmail.com]
> Sent: Tuesday, October 27, 2015 6:36 PM
> To: David Miller 
> Cc: Haiyang Zhang ; o...@aepfle.de; Greg Kroah-
> Hartman ; netdev@vger.kernel.org;
> jasow...@redhat.com; driverdev-de...@linuxdriverproject.org; LKML
> 
> Subject: Re: [PATCH net-next] hyperv: Add handler for
> RNDIS_STATUS_NETWORK_CHANGE event
> 
> On Mon, Jun 23, 2014 at 10:10 PM, David Miller 
> wrote:
> > From: Haiyang Zhang 
> > Date: Mon, 23 Jun 2014 16:09:59 +
> >
> >> So, what's the equivalent or similar command to "network restart" on
> >> SLES12? Could you update the command line for the usermodehelper when
> >> porting this patch to SLES 12?
> >
> > No, you are not going to keep the usermodehelper invocation in your
> > driver please remove it.  It is absolutely inappropriate, and I
> > strictly do not want to keep it in there because other people will
> > copy it and then we'll have a real mess on our hands.
> 
> Sorry for digging up this old thread.
> While talking with some guys about usermodehelper abuses I came across
> this gem.
> Mainline still contains that "/etc/init.d/network restart" code.
> Haiyang, care to cleanup?

I will clean up the usermode helper soon.

Thanks,
- Haiyang


Re: [PATCH 0/8] mm: memcontrol: account socket memory in unified hierarchy

2015-10-28 Thread Johannes Weiner
On Wed, Oct 28, 2015 at 11:20:03AM +0300, Vladimir Davydov wrote:
> Then you'd better not touch existing tcp limits at all, because they
> just work, and the logic behind them is very close to that of global tcp
> limits. I don't think one can simplify it somehow.

Uhm, no, there is a crapload of boilerplate code and complication that
seems entirely unnecessary. The only thing missing from my patch seems
to be the part where it enters memory pressure state when the limit is
hit. I'm adding this for completeness, but I doubt it even matters.

> Moreover, frankly I still have my reservations about this vmpressure
> propagation to skb you're proposing. It might work, but I doubt it
> will allow us to throw away explicit tcp limit, as I explained
> previously. So, even with your approach I think we may still need
> per memcg tcp limit *unless* you get rid of global tcp limit
> somehow.

Having the hard limit as a failsafe (or a minimum for other consumers)
is one thing, and certainly something I'm open to for cgroupv2, should
we have problems with loads starting up after a socket memory landgrab.

That being said, if the VM is struggling to reclaim pages, or is even
swapping, it makes perfect sense to let the socket memory scheduler
know it shouldn't continue to increase its footprint until the VM
recovers. Regardless of any hard limitations/minimum guarantees.

This is what my patch does and it seems pretty straightforward to
me. I don't really understand why this is so controversial.

The *next* step would be to figure out whether we can actually
*reclaim* memory in the network subsystem--shrink windows and steal
buffers back--and that might even be an avenue to replace tcp window
limits. But it's not necessary for *this* patch series to be useful.

> > So this seemed like a good way to prove a new mechanism before rolling
> > it out to every single Linux setup, rather than switch everybody over
> > after the limited scope testing I can do as a developer on my own.
> > 
> > Keep in mind that my patches are not committing anything in terms of
> > interface, so we retain all the freedom to fix and tune the way this
> > is implemented, including the freedom to re-add tcp window limits in
> > case the pressure balancing is not a comprehensive solution.
> 
> I really dislike this kind of proof. It looks like you're trying to
> push something you think is right covertly, w/o having a proper
> discussion with networking people and then say that it just works
> and hence should be done globally, but what if it won't? Revert it?
> We already have a lot of dubious stuff in memcg that should be
> reverted, so let's please try to avoid this kind of mistakes in
> future. Note, I say "w/o having a proper discussion with networking
> people", because I don't think they will really care *unless* you
> change the global logic, simply because most of them aren't very
> interested in memcg AFAICS.

Come on, Dave is the first To and netdev is CC'd. They might not care
about memcg, but "pushing things covertly" is a bit of a stretch.

> That effectively means you lose a chance to listen to networking
> experts, who could point you at design flaws and propose an improvement
> right away. Let's please not miss such an opportunity. You said that
> you'd seen this problem happen w/o cgroups, so you have a use case that
> might need fixing at the global level. IMO it shouldn't be difficult to
> prepare an RFC patch for the global case first and see what people think
> about it.

No, the problem we are running into is when network memory is not
tracked per cgroup. The lack of containment means that the socket
memory consumption of individual cgroups can trigger system OOM.

We tried using the per-memcg tcp limits, and that prevents the OOMs
for sure, but it's horrendous for network performance. There is no
"stop growing" phase, it just keeps going full throttle until it hits
the wall hard.

Now, we could probably try to replicate the global knobs and add a
per-memcg soft limit. But you know better than anyone else how hard it
is to estimate the overall workingset size of a workload, and the
margins on containerized loads are razor-thin. Performance is much
more sensitive to input errors, and often times parameters must be
adjusted continuously during the runtime of a workload. It'd be
disastrous to rely on yet more static, error-prone user input here.

What all this means to me is that fixing it on the cgroup level has
higher priority. But it also means that once we have figured it out under
such a high-pressure environment, it's much easier to apply to the
global case and potentially replace the soft limit there.

This seems like a better approach to me than starting globally, only
to realize that the solution is not workable for cgroups and we need
yet something else.

[BUG] Erroneous behavior in try_to_coalesce

2015-10-28 Thread Nikolay Borisov
Hello,

Recently I observed 2 crashes on one of my servers with the following backtraces:

[22751.889645] [ cut here ]
[22751.889660] WARNING: CPU: 38 PID: 12807 at net/core/skbuff.c:3498 skb_try_coalesce+0x34b/0x360()
[22751.889661] Modules linked in: tcp_diag inet_diag xt_LOG xt_limit xt_addrtype xt_multiport xt_pkttype xt_conntrack netconsole act_police cls_basic sch_ingress veth ipv6 openvswitch gre vxlan ip_tunnel xt_owner xt_state iptable_mangle xt_nat iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat xt_CT nf_conntrack iptable_raw ext2 dm_thin_pool dm_bio_prison dm_persistent_data dm_bufio dm_mirror dm_region_hash dm_log ixgbe i2c_i801 lpc_ich mfd_core igb i2c_algo_bit ioapic ses enclosure ioatdma dca ipmi_devintf ipmi_si ipmi_msghandler aacraid
[22751.889704] CPU: 38 PID: 12807 Comm: handler22 Not tainted 3.12.49-clouder2 #2
[22751.889706] Hardware name: Supermicro PIO-617R-TLN4F+-ST031/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.0b 05/27/2014
[22751.889708]  0daa 883fff4839e8 81643c91 0daa
[22751.889716]   883fff483a28 81089acc 883fff483b68
[22751.889721]  8832bd282b00 882e6b0190e8 883fff483aa4 05b4
[22751.889726] Call Trace:
[22751.889728][] dump_stack+0x58/0x7f
[22751.889739]  [] warn_slowpath_common+0x8c/0xc0
[22751.889742]  [] warn_slowpath_null+0x1a/0x20
[22751.889745]  [] skb_try_coalesce+0x34b/0x360
[22751.889752]  [] tcp_try_coalesce+0x69/0xc0
[22751.889755]  [] tcp_queue_rcv+0x53/0x130
[22751.889758]  [] tcp_data_queue+0x1d3/0xd40
[22751.889761]  [] tcp_rcv_established+0x319/0x5e0
[22751.889767]  [] ? nf_nat_ipv4_fn+0x1e1/0x270 [iptable_nat]
[22751.889771]  [] tcp_v4_do_rcv+0x152/0x3d0
[22751.889777]  [] ? security_sock_rcv_skb+0x16/0x20
[22751.889781]  [] ? sk_filter+0x37/0xf0
[22751.889784]  [] tcp_v4_rcv+0x6b7/0x730
[22751.889787]  [] ? ip_rcv+0x3a0/0x3a0
[22751.889791]  [] ? nf_hook_slow+0x85/0x130
[22751.889794]  [] ? ip_rcv+0x3a0/0x3a0
[22751.889796]  [] ip_local_deliver_finish+0xc2/0x250
[22751.889799]  [] ip_local_deliver+0x88/0x90
[22751.889802]  [] ip_rcv_finish+0x119/0x380
[22751.889804]  [] ip_rcv+0x2c5/0x3a0
[22751.889809]  [] ? netdev_frame_hook+0xb5/0x130 [openvswitch]
[22751.889815]  [] __netif_receive_skb_core+0x626/0x7e0
[22751.889818]  [] __netif_receive_skb+0x27/0x70
[22751.889820]  [] process_backlog+0xd9/0x1e0
[22751.889823]  [] net_rx_action+0x12c/0x280
[22751.889828]  [] __do_softirq+0x137/0x2e0
[22751.889832]  [] call_softirq+0x1c/0x30
[22751.889833][] do_softirq+0x8d/0xc0
[22751.889843]  [] ? ovs_packet_cmd_execute+0x217/0x250 [openvswitch]
[22751.889846]  [] local_bh_enable+0xdb/0xf0
[22751.889849]  [] ovs_packet_cmd_execute+0x217/0x250 [openvswitch]
[22751.889853]  [] genl_family_rcv_msg+0x221/0x390
[22751.889856]  [] ? genl_family_rcv_msg+0x390/0x390
[22751.889858]  [] genl_rcv_msg+0x63/0xb0
[22751.889861]  [] netlink_rcv_skb+0xa9/0xd0
[22751.889864]  [] genl_rcv+0x2c/0x40
[22751.889867]  [] netlink_unicast+0x10f/0x190
[22751.889869]  [] netlink_sendmsg+0x2bb/0x650
[22751.889874]  [] ? __pollwait+0xf0/0xf0
[22751.889881]  [] sock_sendmsg+0x90/0xc0
[22751.889883]  [] ? __pollwait+0xf0/0xf0
[22751.889887]  [] ? local_bh_enable_ip+0x87/0xf0
[22751.889890]  [] ? _raw_spin_unlock_bh+0x24/0x30
[22751.889894]  [] ? verify_iovec+0x8d/0x110
[22751.889898]  [] ___sys_sendmsg+0x417/0x440
[22751.889904]  [] ? ep_poll+0x144/0x370

And then later the actual crash occurred:

[44923.628546] BUG: unable to handle kernel paging request at 00820299
[44923.629139] IP: [] kfree_skb_list+0x18/0x30
[44923.629463] PGD 35cc3b5067 PUD 0
[44923.629823] Oops:  [#1] SMP
[44923.630182] Modules linked in: tcp_diag inet_diag xt_LOG xt_limit xt_addrtype xt_multiport xt_pkttype xt_conntrack netconsole act_police cls_basic sch_ingress veth ipv6 openvswitch gre vxlan ip_tunnel xt_owner xt_state iptable_mangle xt_nat iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat xt_CT nf_conntrack iptable_raw ext2 dm_thin_pool dm_bio_prison dm_persistent_data dm_bufio dm_mirror dm_region_hash dm_log ixgbe i2c_i801 lpc_ich mfd_core igb i2c_algo_bit ioapic ses enclosure ioatdma dca ipmi_devintf ipmi_si ipmi_msghandler aacraid
[44923.634368] CPU: 10 PID: 39391 Comm: kworker/u80:0 Tainted: G W 3.12.49-clouder2 #2
[44923.634851] Hardware name: Supermicro PIO-617R-TLN4F+-ST031/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.0b 05/27/2014
[44923.635340] Workqueue: dm-thin do_worker [dm_thin_pool]
[44923.635653] task: 881918cb0810 ti: 880d5a4ea000 task.ti: 880d5a4ea000
[44923.635926] RIP: 0010:[]  [] kfree_skb_list+0x18/0x30
[44923.636251] RSP: 0018:883fff003cd0  EFLAGS: 00010206
[44923.636521] RAX:  RBX: 882e5622be00 RCX: 883fd12b9800
[44923.636791] RDX: 0100 RSI: 0040 RDI: 00820299
[44923.637064] RBP: 883fff003ce0 R08: 00dc R09: 0003
[44923.637336] R10: 0003 R11: 

Re: [BUG] Erroneous behavior in try_to_coalesce

2015-10-28 Thread Eric Dumazet
On Thu, 2015-10-29 at 04:19 +0900, Nikolay Borisov wrote:

> 
> 
> Could you please comment whether it looks viable so that I can resend
> as a proper fix? Also the interesting question is what kind of packets
> could trigger this warn_on_once? In both traces ovs_packet_cmd_execute
> is present so I suspect it might be possible that somehow openvswitch is
> injecting wrong packets which make the kernel crash.

The bug is in the packet producer, not in try_to_coalesce().

This issue comes up on netdev from time to time...

The WARN_ON() in try_to_coalesce() is an attempt to detect a producer
that lied about truesize, which can lead to OOM in case of abuse.

Do not paper over the bug, find the root cause and fix it, thanks.
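To make the invariant concrete, here is a much-simplified userland model of what that warning guards, loosely patterned on the skb_try_coalesce() check reported at net/core/skbuff.c in the trace. The struct, its field names and the fixed TOY_OVERHEAD value are assumptions for illustration only, not the kernel's definitions: an honest producer reports a truesize that covers the fixed overhead plus the payload it hands over.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical, simplified packet descriptor (not struct sk_buff). */
struct toy_pkt {
	unsigned int len;       /* payload bytes carried by the packet */
	unsigned int truesize;  /* memory the producer says the packet uses */
};

/* Assumed fixed per-packet metadata cost, for illustration only. */
#define TOY_OVERHEAD 256u

/* Honest producers report truesize >= overhead + payload; a smaller value
 * means the packet costs more memory than the socket is charged for. */
static bool truesize_is_honest(const struct toy_pkt *p)
{
	return p->truesize >= TOY_OVERHEAD &&
	       p->truesize - TOY_OVERHEAD >= p->len;
}

/* Memory charged to a receive queue holding n such packets. */
static unsigned long charged(const struct toy_pkt *p, unsigned long n)
{
	assert(n == 0 || p->truesize <= ULONG_MAX / n); /* no overflow */
	return n * p->truesize;
}
```

With a packet of len 1500 reported at truesize 512, 1000 queued copies are charged 512000 bytes while really holding at least 1756000, which is the kind of accounting gap that lets abusive traffic push a machine toward OOM.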


RE: [Intel-wired-lan] [net PATCH v2] ixgbe: Reset interface after enabling SR-IOV

2015-10-28 Thread Miller, Darin J


-Original Message-
From: Intel-wired-lan [mailto:intel-wired-lan-boun...@lists.osuosl.org] On 
Behalf Of Alexander Duyck
Sent: Tuesday, October 20, 2015 1:28 PM
To: netdev@vger.kernel.org; intel-wired-...@lists.osuosl.org
Subject: [Intel-wired-lan] [net PATCH v2] ixgbe: Reset interface after enabling 
SR-IOV

Enabling SR-IOV and then bringing the interface up was resulting in the PF MAC 
addresses getting into a bad state.  Specifically the MAC address was enabled 
for both VF 0 and the PF.  This resulted in some odd behaviors such as VF 0 
receiving a copy of the PF's traffic, which in turn allows VF 0 
to spoof the PF.

A workaround for this issue appears to be to bring up the interface first and 
then enable SR-IOV; in that order, the existing code triggers the reset.

In order to correct this I have added a change to ixgbe_setup_tc so that, if the 
interface is down, we still at least call ixgbe_reset, which resets the device's 
MAC addresses to the correct pools.

Steps to reproduce issue:
modprobe ixgbe
echo 7 > /sys/bus/pci/devices/\:01\:00.1/sriov_numvfs
ifconfig enp1s0f1 up
ethregs -s 1:00.1 | grep MPSAR | grep -v 

Result:
MPSAR[0]   0081
MPSAR[254] 0001

Expected Result, behavior after patch:
MPSAR[0]   0080
MPSAR[254] 0080

Signed-off-by: Alexander Duyck 
---

Tested-by: Darin Miller 


[GIT] [4.4] NFC update

2015-10-28 Thread Samuel Ortiz
Hi David,

This is the NFC pull request for 4.4.

It's a bit bigger than usual, the 3 main culprits being:

- A new driver for Intel's Fields Peak NCI chipset. In order to
  support this chipset we had to export a few NCI routines and
  extend the driver NCI ops to not only support proprietary
  commands but also core ones.

- Support for vendor commands for both STM drivers, st-nci
  and st21nfca. These vendor commands make it possible to run factory tests
  through the NFC netlink interface.

- New i2c and SPI support for the Marvell driver, together with
  firmware download support for this driver's core.

Besides that we also have:

- A few file renames in the STM drivers, to keep the naming
  consistent between drivers.

- Some improvements and fixes on the NCI HCI layer, mostly to
  properly reach a secure element over a legacy HCI link.

- A few fixes for the s3fwrn5 and trf7970a drivers.


The following changes since commit f6d3125fa3c2f55ddf7cf69365c41089de6cfae6:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (2015-10-02 07:21:25 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next.git tags/nfc-next-4.4-2

for you to fetch changes up to f11631748ee6973f85238109a3fa8ab8e760e5a4:

  NFC: nci: non-static functions can not be inline (2015-10-28 06:44:45 +0100)


Axel Lin (2):
  nfc: s3fwrn5: Make NFC_S3FWRN5 select CRYPTO
  nfc: s3fwrn5: i2c: Use devm_request_threaded_irq to avoid irq leak

Christophe JAILLET (1):
  NFC: nfcwilink: Drop a useless static qualifier

Christophe Ricard (35):
  NFC: st-nci: Align st-nci driver with other nfc driver
  NFC: st-nci: include st-nci.h instead of ndlc.h
  NFC: st21nfca: Align st21nfca driver with other nfc driver
  NFC: st-nci: Fix incorrect spi buffer size
  NFC: nci: Fix incorrect data chaining when sending data
  NFC: nci: Fix improper management of HCI return code
  NFC: nci: extract pipe value using NCI_HCP_MSG_GET_PIPE
  NFC: nci: add nci_hci_clear_all_pipes functions
  NFC: nci: Call nci_hci_clear_all_pipes at HCI initial activation.
  NFC: nci: Create pipe on specific gate in nci_hci_connect_gate
  NFC: st-nci: Remove HCI init_data.gates initialization in load_session
  NFC: st21nfca: Remove HCI gates initialization in load_session
  NFC: st-nci: Open NCI_HCI_LINK_MGMT_PIPE
  NFC: st21nfca: Open NFC_HCI_LINK_MGMT_PIPE
  NFC: st-nci: Keep st_nci_gates unchanged in load_session
  NFC: st21nfca: Keep st21nfca_gates unchanged in load_session
  NFC: st-nci: initialize gate_count in st_nci_hci_network_init
  NFC: st-nci: Add support for NCI_HCI_IDENTITY_MGMT_GATE
  NFC: st-nci: Fix st_nci_gates offset
  NFC: st21nfca: Fix st21nfca_gates offset
  NFC: st-nci: Add support for proprietary commands
  NFC: st-nci: Add error messages when an unexpected HCI event occurs
  NFC: netlink: Add missing NFC_ATTR comments
  NFC: st-nci: Add ese-present/uicc-present dts properties
  NFC: st-nci: Increase delay between 2 secure element activations
  NFC: st-nci: Fix host_list verification after SE activation
  NFC: st21nfca: Fix host_list verification after SEactivation
  NFC: netlink: Add mode parameter to deactivate_target functions
  NFC: st-nci: Add few code style fixes
  NFC: st21nfca: Add few code style fixes
  NFC: st21nfca: Add error messages for unexpected HCI events
  NFC: st-nci: Disable irq when powering the device up
  NFC: st-nci: remove duplicated skb dump
  NFC: st-nci: Replace st21nfcb by st_nci in makefile
  NFC: st21nfca: Add support for proprietary commands

Javier Martinez Canillas (1):
  NFC: trf7970a: Add OF match table

Jean Delvare (3):
  NFC: pn544: Auto-select core module
  NFC: microread: Auto-select core module
  NFC: nfcmrvl: Auto-select core module

Julia Lawall (2):
  NFC: nxp-nci: constify nxp_nci_phy_ops structure
  NFC: delete null dereference

Robert Dolca (11):
  NFC: nci: Export nci data send API
  NFC: nci: Add function to get max packet size for conn
  NFC: nci: Introduce new core opcodes
  NFC: nci: Do not call post_setup when setup fails
  NFC: nci: Introduce nci_core_cmd
  NFC: nci: Allow the driver to set handler for core nci ops
  NFC: nci: rename nci_prop_ops to nci_driver_ops
  NFC: nci: fix possible crash in nci_core_conn_create
  NFC: nci: add nci_get_conn_info_by_id function
  NFC: Add Intel Fields Peak NFC solution driver
  NFC: nci: non-static functions can not be inline

Samuel Ortiz (2):
  NFC: nci: Use __nci_request for exported routines
  NFC: st-nci: Rename st-nci_se.c

Valentin Rothberg (1):
  NFC: s3fwrn5: Remove superfluous cflags

Vincent Cuissard (9):
  NFC: nfcmrvl: remove unneeded version defines
  NFC: NCI: export nci_send_frame and nci_send_cmd 

Re: [Bug 106241] New: shutdown(3)/close(3) behaviour is incorrect for sockets in accept(3)

2015-10-28 Thread Eric Dumazet
On Thu, 2015-10-29 at 00:15 +, Al Viro wrote:
> On Wed, Oct 28, 2015 at 04:08:29PM -0700, Eric Dumazet wrote:
> > > > Except for legacy stuff and stdin/stdout/stderr games, I really doubt
> > > > many applications absolutely rely on the POSIX thing...
> > > 
> > > We obviously can't turn that into default behaviour, though.  BTW, what
> > > distribution do you have in mind for those random descriptors?  Uniform
> > > on [0,INT_MAX] is a bad idea for obvious reasons - you'll blow the
> > > memory footprint pretty soon...
> > 
> > Simply [0 , fdt->max_fds] is working well in most cases.
> 
> Umm...  So first you dup2() to establish the ->max_fds you want, then
> do such opens?

Yes, dup2() is done at program startup, knowing the expected max load
(in terms of concurrent fds) + ~10 % (the actual fd array size can be larger
than this because of power-of-two rounding in alloc_fdtable())

But this is an optimization: if you do not use the initial dup2(), the
fd array is automatically expanded when needed (all slots are in use).
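The startup trick described above can be sketched in userland C roughly as follows. presize_fd_table() is a hypothetical helper and /dev/null is just a convenient descriptor to duplicate. The sketch relies on the kernel not shrinking a process's descriptor table when the high slot is closed again, and it clamps the target below RLIMIT_NOFILE because dup2() rejects descriptors at or above that limit.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/resource.h>
#include <unistd.h>

/* Grow the fd table to cover about expected_fds descriptors (+~10%) by
 * duplicating a descriptor onto a high slot and closing it again.
 * Returns the slot that was used, or -1 on error. */
static int presize_fd_table(int expected_fds)
{
	int target = expected_fds + expected_fds / 10; /* ~10% headroom */
	struct rlimit rl;
	int fd;

	assert(expected_fds > 0);
	/* dup2() fails with EBADF at or above RLIMIT_NOFILE, so clamp. */
	if (getrlimit(RLIMIT_NOFILE, &rl) == 0 && rl.rlim_cur != RLIM_INFINITY &&
	    (rlim_t)target >= rl.rlim_cur)
		target = (int)rl.rlim_cur - 1;

	fd = open("/dev/null", O_RDONLY);
	if (fd < 0)
		return -1;
	if (fd >= target) {            /* table already covers target */
		close(fd);
		return fd;
	}
	if (dup2(fd, target) < 0) {
		close(fd);
		return -1;
	}
	close(target);  /* frees the slot; the expanded table remains */
	close(fd);
	return target;
}
```

Calling it once early, before any threads start opening sockets, keeps later table expansions (and the copying they imply) off the hot path.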

>   What used/unused ratio do you expect to deal with?
> And what kind of locking are you going to use?  Keep in mind that
> e.g. dup2() is dependent on the lack of allocations while it's working,
> so it's not as simple as "we don't need no stinkin' ->files_lock"...

No locking change. files->file_lock is still taken.

We only want to minimize time to find an empty slot.

The trick is to not start bitmap search at files->next_fd, but a random
point. This is a win if we assume there are enough holes.

low = start;
if (low < files->next_fd)
	low = files->next_fd;

res = -1;
if (flags & O_FD_FASTALLOC) {
	random_point = pick_random_between(low, fdt->max_fds);

	/* Search [random_point, max_fds) first */
	res = find_next_zero_bit(fdt->open_fds, fdt->max_fds,
				 random_point);
	/* No empty slot found, try the other range [low, random_point) */
	if (res >= fdt->max_fds) {
		res = find_next_zero_bit(fdt->open_fds,
					 random_point, low);
		if (res >= random_point)
			res = -1;
	}
}
...
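The random-start search above can be emulated in userland with a minimal sketch like the following. It assumes a plain unsigned long bitmap (set bit = slot in use), reimplements find_next_zero_bit() bit by bit (the kernel's version works a word at a time), and takes random_point as a parameter so the wraparound can be exercised deterministically. Since the kernel helper's arguments are (addr, size, offset), the second pass over [low, random_point) is find_next_zero_bit(map, random_point, low). pick_random_between() is the hypothetical helper from the proposal, implemented here with rand().

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#define UL_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Userland stand-in for find_next_zero_bit(addr, size, offset): returns
 * the first zero bit in [offset, size), or size if there is none. */
static unsigned long find_next_zero_bit(const unsigned long *map,
					unsigned long size,
					unsigned long offset)
{
	for (unsigned long i = offset; i < size; i++)
		if (!(map[i / UL_BITS] & (1UL << (i % UL_BITS))))
			return i;
	return size;
}

/* Hypothetical pick_random_between(): a value in [low, high), high > low. */
static unsigned long pick_random_between(unsigned long low, unsigned long high)
{
	return low + (unsigned long)rand() % (high - low);
}

/* Mark slot i as unused (clear its bit). */
static void mark_free(unsigned long *map, unsigned long i)
{
	map[i / UL_BITS] &= ~(1UL << (i % UL_BITS));
}

/* Scan [random_point, max) first, then wrap to [low, random_point).
 * Returns a free slot, or -1 if none exists in [low, max). */
static long fastalloc_search(const unsigned long *map, unsigned long low,
			     unsigned long max, unsigned long random_point)
{
	unsigned long res;

	assert(low <= random_point && random_point < max);
	res = find_next_zero_bit(map, max, random_point);
	if (res < max)
		return (long)res;
	res = find_next_zero_bit(map, random_point, low); /* size, offset */
	return res < random_point ? (long)res : -1;
}
```

Driving fastalloc_search() with bitmaps recorded from real applications, and counting the distinct cachelines each call reads, would be one way to measure how the search behaves as the bitmap gets denser.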


Re: [Bug 106241] New: shutdown(3)/close(3) behaviour is incorrect for sockets in accept(3)

2015-10-28 Thread Al Viro
On Wed, Oct 28, 2015 at 08:29:41PM -0700, Eric Dumazet wrote:

> But this is an optimization : If you do not use the initial dup2(), the
> fd array can be automatically expanded if needed (all slots are in use)

Whee...

> No locking change. files->file_lock is still taken.
> 
> We only want to minimize time to find an empty slot.

Then I'd say that my variant is going to win.  It *will* lead to
cacheline pingpong in more cases than yours, but I'm quite sure that
it will be a win as far as the total amount of cachelines accessed.

> The trick is to not start bitmap search at files->next_fd, but a random
> point. This is a win if we assume there are enough holes.
> 
> low = start;
> if (low < files->next_fd)
> 	low = files->next_fd;
> 
> res = -1;
> if (flags & O_FD_FASTALLOC) {
> 	random_point = pick_random_between(low, fdt->max_fds);
> 
> 	res = find_next_zero_bit(fdt->open_fds, fdt->max_fds,
> 				 random_point);
> 	/* No empty slot found, try the other range */
> 	if (res >= fdt->max_fds) {
> 		res = find_next_zero_bit(fdt->open_fds,
> 					 random_point, low);
> 		if (res >= random_point)
> 			res = -1;
> 	}
> }

Have you tried to experiment with that in userland?  I mean, emulate that
thing in normal userland code, count the cacheline accesses and drive it
with the use patterns collected from actual applications.

I can sit down and play with math expectations, but I suspect that it's
easier to experiment.  It's nothing but an intuition (I hadn't seriously
done probability theory in quite a while, and my mathematical tastes run
more to geometry and topology anyway), but... I would expect it to degrade
badly when the bitmap is reasonably dense.

Note, BTW, that vmalloc'ed memory gets populated as you read it, and it's
not cheap - it's done via #PF triggered in kernel mode, with handler
noticing that the faulting address is in vmalloc range and doing the
right thing.  IOW, if your bitmap is very sparse, the price of page faults
needs to be taken into account.

AFAICS, the only benefit of that thing is keeping dirtied cachelines far
from each other.  Which might be a win overall, but I'm not convinced that
the rest won't offset the effect of that...


Re: [V5, 2/6] fsl/fman: Add FMan support

2015-10-28 Thread Scott Wood
On Tue, 2015-10-27 at 11:32 -0500, Liberman Igal-B31950 wrote:

> > > +
> > > +struct device *fman_get_device(struct fman *fman) {
> > > + return fman->dev;
> > > +}
> > 
> > Is this really necessary?
> > 
> 
> Fman port needs fman->dev, fman structure is opaque, so yes, it's needed.

Why is opacity being maintained from one part of the fman driver to another?  
Isn't this the sort of excessive layering that was complained about?


> > > + /* In B4 rev 2.0 (and above) the MURAM size is 512KB.
> > > +  * Check the SVR and update MURAM size if required.
> > > +  */
> > > + u32 svr;
> > > +
> > > + svr = mfspr(SPRN_SVR);
> > > +
> > > + if ((SVR_SOC_VER(svr) == SVR_B4860) && (SVR_MAJ(svr) >= 2))
> > > + fman->dts_params.muram_size = 0x8;
> > > + }
> > 
> > Why wasn't the MURAM size described in the device tree, as it was with
> > CPM/QE?
> > 
> 
> The MURAM size is described by the device tree.
> In B4860 rev 2.0 (and above) the MURAM size is bigger.
> This is a workaround, in order to have the same device tree for all B4860
> revisions.

We don't support b4860 prior to rev 2.0 (due to e6500 core errata) so this is 
irrelevant.  Fix the device tree.

> > > +
> > > + of_node_put(muram_node);
> > > + of_node_put(fm_node);
> > > +
> > > + err = devm_request_irq(_dev->dev, irq, fman_irq,
> > > +IRQF_NO_SUSPEND, "fman", fman);
> > > + if (err < 0) {
> > > + pr_err("Error: allocating irq %d (error = %d)\n", irq, err);
> > > + goto fman_free;
> > > + }
> > 
> > Why IRQF_NO_SUSPEND?
> > 
> 
> It shouldn't be IRQF_NO_SUSPEND for now, removed. 

Why just "for now"?

-Scott


Re: [Bug 106241] New: shutdown(3)/close(3) behaviour is incorrect for sockets in accept(3)

2015-10-28 Thread Eric Dumazet
On Wed, 2015-10-28 at 21:13 +, Al Viro wrote:
> On Wed, Oct 28, 2015 at 07:47:57AM -0700, Eric Dumazet wrote:
> > On Wed, 2015-10-28 at 06:24 -0700, Eric Dumazet wrote:
> > 
> > > Before I take a deep look at your suggestion, are you sure plain use of
> > > include/linux/percpu-refcount.h infra is not possible for struct cred ?
> > 
> > BTW, I am not convinced we need to spend so much energy and per-cpu
> > memory for struct cred refcount.
> > 
> > The big problem is fd array spinlock of course and bitmap search for
> > POSIX compliance.
> > 
> > The cache line thrashing in struct cred is a minor one ;)
> 
> percpu-refcount isn't convenient - the only such candidate for ref_kill in
> there is "all other references are gone", and that can happen in
> interesting locking environments.  I doubt that it would be a good fit, TBH...

OK then ...

> Cacheline pingpong on the descriptors bitmap is probably inevitable, but
> it's not the only problem in the existing implementation - close a small
> descriptor when you've got a lot of them and look for the second open
> after that.  _That_ can lead to thousands of cachelines being read through,
> all under the table spinlock.  It's literally orders of magnitude worse.
> And if the first open after that close happens to be for a short-lived
> descriptor, you'll get the same situation back in your face as soon as you
> close it.
> 
> I think we can seriously improve that without screwing the fast path by
> adding "summary" bitmaps once the primary grows past the cacheline worth
> of bits.  With bits in the summary bitmap corresponding to cacheline-sized
> chunks of the primary, being set iff all bits in the corresponding chunk
> are set.  If the summary map grows larger than one cacheline, add the
> second-order one (that happens at quarter million descriptors and serves
> until 128 million; adding the third-order map is probably worthless).
> 
> I want to maintain the same kind of "everything below this is known to be
> in use" thing as we do now.  Allocation would start with looking into the
> same place in primary bitmap where we'd looked now and similar search
> forward for zero bit.  _However_, it would stop at cacheline boundary.
> If nothing had been found, we look in the corresponding place in the
> summary bitmap and search for zero bit there.  Again, no more than up
> to the cacheline boundary.  If something is found, we've got a chunk in
> the primary known to contain a zero bit; if not - go to the second-level
> and search there, etc.
> 
> When a zero bit in the primary had been found, check if it's within the
> rlimit (passed to __alloc_fd() explicitly) and either bugger off or set
> that bit.  If there are zero bits left in the same word - we are done,
> otherwise check the still unread words in the cacheline and see if all
> of them are ~0UL.  If all of them are, set the bit in summary bitmap, etc.
> 
> Normal case is exactly the same as now - one cacheline accessed and modified.
> We might end up touching more than that, but it's going to be rare and
> the cases when it happens are very likely to lead to much worse amount of
> memory traffic with the current code.
> 
> Freeing is done by zeroing the bit in primary, checking for other zero bits
> nearby and buggering off if there are such.  If the entire cacheline used
> to be all-bits-set, clear the bit in summary and, if there's a second-order
> summary, get the bit in there clear as well - it's probably not worth
> bothering with checking that all the cacheline in summary bitmap had been
> all-bits-set.  Again, the normal case is the same as now.
> 
> It'll need profiling and tuning, but AFAICS it's doable without making the
> things worse than they are now, and it should get rid of those O(N) fetches
> under spinlock cases.  And yes, those are triggerable and visible in
> profiles.  IMO it's worth trying to fix...
> 

Well, all this complexity goes away with an O_FD_FASTALLOC /
SOCK_FD_FASTALLOC bit in various fd allocations, which specifically
tells the kernel we do not care about getting the lowest possible fd as POSIX
mandates.

With this bit set, the bitmap search can start at a random point, and we
find a slot in O(1): one cache line miss, if you have at least one free
bit/slot per 512 bits (64-byte cache line).

#ifndef O_FD_FASTALLOC
#define O_FD_FASTALLOC 0x4000
#endif

#ifndef SOCK_FD_FASTALLOC
#define SOCK_FD_FASTALLOC O_FD_FASTALLOC
#endif

... // active sockets
socket(AF_INET, SOCK_STREAM | SOCK_FD_FASTALLOC, 0);
... // passive sockets
accept4(sockfd, ..., SOCK_FD_FASTALLOC);
...

Except for legacy stuff and stdin/stdout/stderr games, I really doubt
many applications absolutely rely on the POSIX thing...


Re: [Bug 106241] New: shutdown(3)/close(3) behaviour is incorrect for sockets in accept(3)

2015-10-28 Thread Al Viro
On Wed, Oct 28, 2015 at 07:47:57AM -0700, Eric Dumazet wrote:
> On Wed, 2015-10-28 at 06:24 -0700, Eric Dumazet wrote:
> 
> > Before I take a deep look at your suggestion, are you sure plain use of
> > include/linux/percpu-refcount.h infra is not possible for struct cred ?
> 
> BTW, I am not convinced we need to spend so much energy and per-cpu
> memory for struct cred refcount.
> 
> The big problem is fd array spinlock of course and bitmap search for
> POSIX compliance.
> 
> The cache line thrashing in struct cred is a minor one ;)

percpu-refcount isn't convenient - the only such candidate for ref_kill in
there is "all other references are gone", and that can happen in
interesting locking environments.  I doubt that it would be a good fit, TBH...

Cacheline pingpong on the descriptors bitmap is probably inevitable, but
it's not the only problem in the existing implementation - close a small
descriptor when you've got a lot of them and look for the second open
after that.  _That_ can lead to thousands of cachelines being read through,
all under the table spinlock.  It's literally orders of magnitude worse.
And if the first open after that close happens to be for a short-lived
descriptor, you'll get the same situation back in your face as soon as you
close it.

I think we can seriously improve that without screwing the fast path by
adding "summary" bitmaps once the primary grows past the cacheline worth
of bits.  With bits in the summary bitmap corresponding to cacheline-sized
chunks of the primary, being set iff all bits in the corresponding chunk
are set.  If the summary map grows larger than one cacheline, add the
second-order one (that happens at quarter million descriptors and serves
until 128 million; adding the third-order map is probably worthless).
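Assuming 64-byte cachelines, the thresholds quoted above follow from 512 bits per cacheline, with each summary bit standing for one full cacheline of the level below:

```latex
\begin{aligned}
\text{bits per cacheline} &= 64 \times 8 = 512 = 2^{9} \\
\text{first-order summary outgrows one cacheline at} &\; 2^{9} \cdot 2^{9} = 2^{18} = 262\,144 \text{ descriptors} \\
\text{one cacheline of second-order summary covers} &\; 2^{9} \cdot 2^{18} = 2^{27} = 134\,217\,728 \text{ descriptors}
\end{aligned}
```

So "quarter million" is $2^{18}$ and "128 million" is 128 Mi ($2^{27}$) descriptors.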

I want to maintain the same kind of "everything below this is known to be
in use" thing as we do now.  Allocation would start with looking into the
same place in primary bitmap where we'd looked now and similar search
forward for zero bit.  _However_, it would stop at cacheline boundary.
If nothing had been found, we look in the corresponding place in the
summary bitmap and search for zero bit there.  Again, no more than up
to the cacheline boundary.  If something is found, we've got a chunk in
the primary known to contain a zero bit; if not - go to the second-level
and search there, etc.

When a zero bit in the primary had been found, check if it's within the
rlimit (passed to __alloc_fd() explicitly) and either bugger off or set
that bit.  If there are zero bits left in the same word - we are done,
otherwise check the still unread words in the cacheline and see if all
of them are ~0UL.  If all of them are, set the bit in summary bitmap, etc.

The normal case is exactly the same as now - one cacheline accessed and modified.
We might end up touching more than that, but it's going to be rare, and
the cases where it happens are very likely to generate far more
memory traffic with the current code.

Freeing is done by zeroing the bit in primary, checking for other zero bits
nearby and buggering off if there are such.  If the entire cacheline used
to be all-bits-set, clear the bit in summary and, if there's a second-order
summary, get the bit in there clear as well - it's probably not worth
bothering to check whether the whole summary cacheline had been
all-bits-set.  Again, the normal case is the same as now.
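Below is a minimal userspace sketch of the allocate/free scheme described above. It rests on several simplifying assumptions:

- only a single summary level;
- 64-byte cachelines and a fixed 4-cacheline table;
- bit-by-bit scanning instead of a word-at-a-time ffz();
- no locking (in the kernel all of this would run under the table spinlock, files->file_lock).

The struct and function names are hypothetical, not existing fs/file.c code.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <string.h>

#define BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)
#define WORDS_PER_LINE	(64 / sizeof(unsigned long))	/* 64-byte cachelines */
#define BITS_PER_LINE	(WORDS_PER_LINE * BITS_PER_WORD)	/* 512 */
#define NLINES		4
#define MAX_FDS		(NLINES * BITS_PER_LINE)

struct fd_bitmaps {
	unsigned long open[MAX_FDS / BITS_PER_WORD];	/* primary: bit set = fd in use */
	unsigned long full;	/* summary: bit i set iff line i of open[] is all ones */
	unsigned int next_fd;	/* every fd below this is known to be in use */
};

static void fd_bitmaps_init(struct fd_bitmaps *b)
{
	memset(b, 0, sizeof(*b));
}

static bool fd_test(const unsigned long *map, unsigned int nr)
{
	return (map[nr / BITS_PER_WORD] >> (nr % BITS_PER_WORD)) & 1UL;
}

/* First zero bit in [start, end of start's cacheline), or -1. */
static int zero_in_line(const unsigned long *map, unsigned int start)
{
	unsigned int end = (start / BITS_PER_LINE + 1) * BITS_PER_LINE;

	for (unsigned int nr = start; nr < end; nr++)
		if (!fd_test(map, nr))
			return (int)nr;
	return -1;
}

static bool line_full(const unsigned long *map, unsigned int line)
{
	for (unsigned int w = line * WORDS_PER_LINE; w < (line + 1) * WORDS_PER_LINE; w++)
		if (map[w] != ~0UL)
			return false;
	return true;
}

/* Lowest free fd, or -1 if the table is full or the fd would be >= limit. */
static int fd_alloc(struct fd_bitmaps *b, unsigned int limit)
{
	int fd = -1;

	if (b->next_fd < MAX_FDS)
		fd = zero_in_line(b->open, b->next_fd);	/* stop at the line boundary */
	if (fd < 0) {
		/* Ask the summary for the next cacheline that has a zero bit. */
		unsigned int line = b->next_fd / BITS_PER_LINE + 1;

		while (line < NLINES && (b->full & (1UL << line)))
			line++;
		if (line >= NLINES)
			return -1;
		fd = zero_in_line(b->open, line * BITS_PER_LINE);
		assert(fd >= 0);	/* summary bit clear => line has a zero */
	}
	if ((unsigned int)fd >= limit)
		return -1;		/* over the rlimit */

	b->open[fd / BITS_PER_WORD] |= 1UL << (fd % BITS_PER_WORD);
	if (line_full(b->open, fd / BITS_PER_LINE))
		b->full |= 1UL << (fd / BITS_PER_LINE);
	b->next_fd = fd + 1;
	return fd;
}

static void fd_free(struct fd_bitmaps *b, unsigned int fd)
{
	unsigned int line = fd / BITS_PER_LINE;

	assert(fd < MAX_FDS && fd_test(b->open, fd));
	b->open[fd / BITS_PER_WORD] &= ~(1UL << (fd % BITS_PER_WORD));
	/* Touch the summary only if this line used to be all-bits-set. */
	if (b->full & (1UL << line))
		b->full &= ~(1UL << line);
	if (fd < b->next_fd)
		b->next_fd = fd;
}
```

How often each cacheline is touched:

- The common path reads and writes only the primary cacheline that holds next_fd.
- The summary word is read only when that line is exhausted.
- The summary is written only when a line becomes full or stops being full.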

It'll need profiling and tuning, but AFAICS it's doable without making
things worse than they are now, and it should get rid of those O(N) fetches
under spinlock cases.  And yes, those are triggerable and visible in
profiles.  IMO it's worth trying to fix...

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [Intel-wired-lan] [PATCH] fm10k:Fix error handling in the function fm10k_resume

2015-10-28 Thread Singh, Krishneil K


-Original Message-
From: Intel-wired-lan [mailto:intel-wired-lan-boun...@lists.osuosl.org] On 
Behalf Of Nicholas Krause
Sent: Saturday, October 17, 2015 9:21 AM
To: Kirsher, Jeffrey T 
Cc: linux-ker...@vger.kernel.org; intel-wired-...@lists.osuosl.org; 
netdev@vger.kernel.org
Subject: [Intel-wired-lan] [PATCH] fm10k:Fix error handling in the function 
fm10k_resume

This fixes the error handling to properly check whether the call to
fm10k_mbx_request_irq has failed by returning an error code and, if so, to
return immediately to the caller of fm10k_resume to properly signal that a
failure has occurred when attempting to resume this network

Signed-off-by: Nicholas Krause 
---

Tested-by: Krishneil Singh 



Re: [Bug 106241] New: shutdown(3)/close(3) behaviour is incorrect for sockets in accept(3)

2015-10-28 Thread Al Viro
On Wed, Oct 28, 2015 at 02:44:28PM -0700, Eric Dumazet wrote:

> Well, all this complexity goes away with a O_FD_FASTALLOC /
> SOCK_FD_FASTALLOC bit in various fd allocations, which specifically
> tells the kernel we do not care getting the lowest possible fd as POSIX
> mandates.

... which won't do a damn thing for existing userland.

> Except for legacy stuff and stdin/stdout/stderr games, I really doubt
> lot of applications absolutely rely on the POSIX thing...

We obviously can't turn that into default behaviour, though.  BTW, what
distribution do you have in mind for those random descriptors?  Uniform
on [0,INT_MAX] is a bad idea for obvious reasons - you'll blow the
memory footprint pretty soon...


[net-next PATCH] RDS: convert bind hash table to re-sizable hashtable

2015-10-28 Thread Santosh Shilimkar
To further improve RDS connection scalability on massive systems,
where the number of sockets grows into the tens of thousands, there
is a need for a larger bind hash table. A pre-allocated 8K or 16K table is
not very flexible in terms of memory utilisation. The rhashtable
infrastructure gives us the flexibility to grow the hashtable based
on use, and it also comes with built-in efficient bucket (chain) handling.
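The new table is keyed on a single u64 (`.key_len = sizeof(u64)` in ht_parms). rds_find_bound() builds that key as `((u64)addr << 32) | port`. Below is a userspace illustration of that packing. It uses stdint types in place of the kernel's __be32/__be16/u64, and the helper name rds_bind_key() is hypothetical. The values are combined without byte-order conversion. That is sufficient for an exact-match hash key, as long as the bind path fills rs_bound_key the same way.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors .key_len = sizeof(u64): address and port must both fit. */
static_assert(sizeof(uint64_t) >= sizeof(uint32_t) + sizeof(uint16_t),
	      "key must hold address and port");

/* Hypothetical helper: IPv4 address in the high 32 bits, port in the low 16. */
static uint64_t rds_bind_key(uint32_t addr, uint16_t port)
{
	return ((uint64_t)addr << 32) | port;
}
```

Because address and port occupy disjoint bit ranges, distinct (addr, port) pairs always yield distinct keys.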

Reviewed-by: David Miller 
Signed-off-by: Santosh Shilimkar 
---
 net/rds/af_rds.c |  10 -
 net/rds/bind.c   | 126 +++
 net/rds/rds.h|   7 +++-
 3 files changed, 57 insertions(+), 86 deletions(-)

diff --git a/net/rds/af_rds.c b/net/rds/af_rds.c
index 384ea1e..b5476aeb 100644
--- a/net/rds/af_rds.c
+++ b/net/rds/af_rds.c
@@ -573,6 +573,7 @@ static void rds_exit(void)
rds_threads_exit();
rds_stats_exit();
rds_page_exit();
+   rds_bind_lock_destroy();
rds_info_deregister_func(RDS_INFO_SOCKETS, rds_sock_info);
rds_info_deregister_func(RDS_INFO_RECV_MESSAGES, rds_sock_inc_info);
 }
@@ -582,11 +583,14 @@ static int rds_init(void)
 {
int ret;
 
-   rds_bind_lock_init();
+   ret = rds_bind_lock_init();
+   if (ret)
+   goto out;
 
ret = rds_conn_init();
if (ret)
-   goto out;
+   goto out_bind;
+
ret = rds_threads_init();
if (ret)
goto out_conn;
@@ -620,6 +624,8 @@ out_conn:
rds_conn_exit();
rds_cong_exit();
rds_page_exit();
+out_bind:
+   rds_bind_lock_destroy();
 out:
return ret;
 }
diff --git a/net/rds/bind.c b/net/rds/bind.c
index 6192566..2b00222 100644
--- a/net/rds/bind.c
+++ b/net/rds/bind.c
@@ -38,54 +38,17 @@
 #include 
 #include "rds.h"
 
-struct bind_bucket {
-   rwlock_t            lock;
-   struct hlist_head   head;
+static struct rhashtable bind_hash_table;
+
+static struct rhashtable_params ht_parms = {
+   .nelem_hint = 768,
+   .key_len = sizeof(u64),
+   .key_offset = offsetof(struct rds_sock, rs_bound_key),
+   .head_offset = offsetof(struct rds_sock, rs_bound_node),
+   .max_size = 16384,
+   .min_size = 1024,
 };
 
-#define BIND_HASH_SIZE 1024
-static struct bind_bucket bind_hash_table[BIND_HASH_SIZE];
-
-static struct bind_bucket *hash_to_bucket(__be32 addr, __be16 port)
-{
-   return bind_hash_table + (jhash_2words((u32)addr, (u32)port, 0) &
- (BIND_HASH_SIZE - 1));
-}
-
-/* must hold either read or write lock (write lock for insert != NULL) */
-static struct rds_sock *rds_bind_lookup(struct bind_bucket *bucket,
-   __be32 addr, __be16 port,
-   struct rds_sock *insert)
-{
-   struct rds_sock *rs;
-   struct hlist_head *head = &bucket->head;
-   u64 cmp;
-   u64 needle = ((u64)be32_to_cpu(addr) << 32) | be16_to_cpu(port);
-
-   hlist_for_each_entry(rs, head, rs_bound_node) {
-   cmp = ((u64)be32_to_cpu(rs->rs_bound_addr) << 32) |
- be16_to_cpu(rs->rs_bound_port);
-
-   if (cmp == needle) {
-   rds_sock_addref(rs);
-   return rs;
-   }
-   }
-
-   if (insert) {
-   /*
-* make sure our addr and port are set before
-* we are added to the list.
-*/
-   insert->rs_bound_addr = addr;
-   insert->rs_bound_port = port;
-   rds_sock_addref(insert);
-
-   hlist_add_head(&insert->rs_bound_node, head);
-   }
-   return NULL;
-}
-
 /*
  * Return the rds_sock bound at the given local address.
  *
@@ -94,18 +57,14 @@ static struct rds_sock *rds_bind_lookup(struct bind_bucket *bucket,
  */
 struct rds_sock *rds_find_bound(__be32 addr, __be16 port)
 {
+   u64 key = ((u64)addr << 32) | port;
struct rds_sock *rs;
-   unsigned long flags;
-   struct bind_bucket *bucket = hash_to_bucket(addr, port);
 
-   read_lock_irqsave(&bucket->lock, flags);
-   rs = rds_bind_lookup(bucket, addr, port, NULL);
-   read_unlock_irqrestore(&bucket->lock, flags);
-
-   if (rs && sock_flag(rds_rs_to_sk(rs), SOCK_DEAD)) {
-   rds_sock_put(rs);
+   rs = rhashtable_lookup_fast(&bind_hash_table, &key, ht_parms);
+   if (rs && !sock_flag(rds_rs_to_sk(rs), SOCK_DEAD))
+   rds_sock_addref(rs);
+   else
rs = NULL;
-   }
 
rdsdebug("returning rs %p for %pI4:%u\n", rs, &addr,
ntohs(port));
@@ -116,10 +75,9 @@ struct rds_sock *rds_find_bound(__be32 addr, __be16 port)
 /* returns -ve errno or +ve port */
 static int rds_add_bound(struct rds_sock *rs, __be32 addr, __be16 *port)
 {
-   unsigned long flags;
int ret = -EADDRINUSE;

Re: [Bug 106241] New: shutdown(3)/close(3) behaviour is incorrect for sockets in accept(3)

2015-10-28 Thread Eric Dumazet
On Wed, 2015-10-28 at 22:33 +, Al Viro wrote:
> On Wed, Oct 28, 2015 at 02:44:28PM -0700, Eric Dumazet wrote:
> 
> > Well, all this complexity goes away with a O_FD_FASTALLOC /
> > SOCK_FD_FASTALLOC bit in various fd allocations, which specifically
> > tells the kernel we do not care getting the lowest possible fd as POSIX
> > mandates.
> 
> ... which won't do a damn thing for existing userland.

For userland that needs 5,000,000+ sockets, I can tell you they will be
using this flag as soon as they are aware it exists ;)

> 
> > Except for legacy stuff and stdin/stdout/stderr games, I really doubt
> > lot of applications absolutely rely on the POSIX thing...
> 
> We obviously can't turn that into default behaviour, though.  BTW, what
> distribution do you have in mind for those random descriptors?  Uniform
> on [0,INT_MAX] is a bad idea for obvious reasons - you'll blow the
> memory footprint pretty soon...

Simply [0, fdt->max_fds] works well in most cases.
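One way to read that suggestion, as a hedged sketch: pick a uniformly random starting slot and take the first free descriptor at or after it, wrapping around once. fdt->max_fds is one past the last valid slot, so the sketch draws from the half-open range [0, max_fds). Several parts are assumptions made for illustration: the function name, the use of rand(), and the absence of any locking or table growth. A long run of used slots after the random start still costs a linear scan, just as in the current code.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

#define BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)

static bool fd_in_use(const unsigned long *open_fds, unsigned int fd)
{
	return (open_fds[fd / BITS_PER_WORD] >> (fd % BITS_PER_WORD)) & 1UL;
}

/*
 * Hypothetical "fast alloc": start at a random slot in [0, max_fds) and
 * take the first free descriptor at or after it, wrapping around once.
 * Returns -1 if every slot is in use.  No locking is shown; the kernel
 * would still have to serialize this under the table spinlock.
 */
static int fd_alloc_random(unsigned long *open_fds, unsigned int max_fds)
{
	unsigned int start;

	assert(max_fds > 0);
	start = (unsigned int)rand() % max_fds;	/* slightly biased; fine for a sketch */
	for (unsigned int i = 0; i < max_fds; i++) {
		unsigned int fd = (start + i) % max_fds;

		if (!fd_in_use(open_fds, fd)) {
			open_fds[fd / BITS_PER_WORD] |= 1UL << (fd % BITS_PER_WORD);
			return (int)fd;
		}
	}
	return -1;
}
```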


Re: [PATCH 2/2] e1000e: Fix msi-x interrupt automask

2015-10-28 Thread Alexander Duyck

On 10/22/2015 05:32 PM, Benjamin Poirier wrote:

> Since the introduction of 82574 support in e1000e, the driver has worked on
> the assumption that msi-x interrupt generation is automatically disabled
> after each irq. As it turns out, this is not the case. Currently, rx
> interrupts can fire multiple times before and during napi processing. This
> can be a problem for users because frames that arrive in a certain window
> (after adapter->clean_rx() but before napi_complete_done() has cleared
> NAPI_STATE_SCHED) generate an interrupt which does not lead to
> napi_schedule(). These frames sit in the rx queue until another frame
> arrives (a tcp retransmit for example).
>
> While the EIAC and CTRL_EXT registers are properly configured for irq
> automask, the modification of IAM in e1000_configure_msix() is what
> prevents automask from working as intended.
>
> This patch removes that erroneous write and fixes interrupt rearming for tx
> and "other" interrupts. Since e1000_msix_other() reads ICR, all interrupts
> must be rearmed in that function.
>
> Reported-by: Frank Steiner
> Signed-off-by: Benjamin Poirier
> ---
>   drivers/net/ethernet/intel/e1000e/netdev.c | 12 ++--
>   1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
> index a228167..8881256 100644
> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> @@ -1921,7 +1921,8 @@ static irqreturn_t e1000_msix_other(int __always_unused irq, void *data)
>
>   no_link_interrupt:
> if (!test_bit(__E1000_DOWN, &adapter->state))
> -   ew32(IMS, E1000_IMS_LSC | E1000_IMS_OTHER);
> +   ew32(IMS, adapter->eiac_mask | E1000_IMS_OTHER |
> +E1000_IMS_LSC);
>
> return IRQ_HANDLED;
>   }

I would argue your first patch probably didn't go far enough to remove
dead code.  Specifically you should only ever get into this function if
LSC is set.  There are no other causes that should trigger this.  As
such you could probably remove the ICR read, and instead replace it with
an ICR write of the LSC bit since OTHER is already cleared via EIAC.

> @@ -1940,6 +1941,9 @@ static irqreturn_t e1000_intr_msix_tx(int __always_unused irq, void *data)
> /* Ring was not completely cleaned, so fire another interrupt */
> ew32(ICS, tx_ring->ims_val);
>
> +   if (!test_bit(__E1000_DOWN, &adapter->state))
> +   ew32(IMS, E1000_IMS_TXQ0);
> +
> return IRQ_HANDLED;
>   }

I think what you need to set here is tx_ring->ims_val, not E1000_IMS_TXQ0.

> @@ -2027,11 +2031,7 @@ static void e1000_configure_msix(struct e1000_adapter *adapter)
>
> /* enable MSI-X PBA support */
> ctrl_ext = er32(CTRL_EXT);
> -   ctrl_ext |= E1000_CTRL_EXT_PBA_CLR;
> -
> -   /* Auto-Mask Other interrupts upon ICR read */
> -   ew32(IAM, ~E1000_EIAC_MASK_82574 | E1000_IMS_OTHER);
> -   ctrl_ext |= E1000_CTRL_EXT_EIAME;
> +   ctrl_ext |= E1000_CTRL_EXT_PBA_CLR | E1000_CTRL_EXT_EIAME;
> ew32(CTRL_EXT, ctrl_ext);
> e1e_flush();
>   }


Re: [Bug 106241] New: shutdown(3)/close(3) behaviour is incorrect for sockets in accept(3)

2015-10-28 Thread Al Viro
On Wed, Oct 28, 2015 at 04:08:29PM -0700, Eric Dumazet wrote:
> > > Except for legacy stuff and stdin/stdout/stderr games, I really doubt
> > > lot of applications absolutely rely on the POSIX thing...
> > 
> > We obviously can't turn that into default behaviour, though.  BTW, what
> > distribution do you have in mind for those random descriptors?  Uniform
> > on [0,INT_MAX] is a bad idea for obvious reasons - you'll blow the
> > memory footprint pretty soon...
> 
> Simply [0 , fdt->max_fds] is working well in most cases.

Umm...  So first you dup2() to establish the ->max_fds you want, then
do such opens?  What used/unused ratio do you expect to deal with?
And what kind of locking are you going to use?  Keep in mind that
e.g. dup2() is dependent on the lack of allocations while it's working,
so it's not as simple as "we don't need no stinkin' ->files_lock"...


[patch net-next 02/12] switchdev: Make flood to CPU optional

2015-10-28 Thread Jiri Pirko
From: Ido Schimmel 

In certain use cases it is not always desirable for the switch device to
flood traffic to the CPU port. Instead, only certain packet types (e.g.
STP, LACP) should be trapped to it.

Signed-off-by: Ido Schimmel 
Signed-off-by: Jiri Pirko 
---
 Documentation/networking/switchdev.txt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/networking/switchdev.txt b/Documentation/networking/switchdev.txt
index ce510e1..9199413 100644
--- a/Documentation/networking/switchdev.txt
+++ b/Documentation/networking/switchdev.txt
@@ -278,8 +278,8 @@ Flooding L2 domain
 For a given L2 VLAN domain, the switch device should flood multicast/broadcast
 and unknown unicast packets to all ports in domain, if allowed by port's
 current STP state.  The switch driver, knowing which ports are within which
-vlan L2 domain, can program the switch device for flooding.  The packet should
-also be sent to the port netdev for processing by the bridge driver.  The
+vlan L2 domain, can program the switch device for flooding.  The packet may
+be sent to the port netdev for processing by the bridge driver.  The
 bridge should not reflood the packet to the same ports the device flooded,
 otherwise there will be duplicate packets on the wire.
 
-- 
1.9.3



[patch net-next 05/12] mlxsw: spectrum: Add support for flood control

2015-10-28 Thread Jiri Pirko
From: Ido Schimmel 

Add or remove a bridged port from the flooding domain of unknown unicast
packets according to user configuration.

Signed-off-by: Ido Schimmel 
Signed-off-by: Jiri Pirko 
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c |   1 +
 drivers/net/ethernet/mellanox/mlxsw/spectrum.h |   1 +
 .../ethernet/mellanox/mlxsw/spectrum_switchdev.c   | 113 ++---
 3 files changed, 78 insertions(+), 37 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index e30b2da..3be4a23 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -1227,6 +1227,7 @@ static int mlxsw_sp_port_create(struct mlxsw_sp *mlxsw_sp, u8 local_port)
mlxsw_sp_port->local_port = local_port;
mlxsw_sp_port->learning = 1;
mlxsw_sp_port->learning_sync = 1;
+   mlxsw_sp_port->uc_flood = 1;
mlxsw_sp_port->pvid = 1;
 
mlxsw_sp_port->pcpu_stats =
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
index b4d8393..4365c8b 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
@@ -86,6 +86,7 @@ struct mlxsw_sp_port {
u8 stp_state;
u8 learning:1,
   learning_sync:1,
+  uc_flood:1,
   bridged:1;
u16 pvid;
/* 802.1Q bridge VLANs */
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
index c3881c9..1f3b12e 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
@@ -66,7 +66,8 @@ static int mlxsw_sp_port_attr_get(struct net_device *dev,
case SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS:
attr->u.brport_flags =
(mlxsw_sp_port->learning ? BR_LEARNING : 0) |
-   (mlxsw_sp_port->learning_sync ? BR_LEARNING_SYNC : 0);
+   (mlxsw_sp_port->learning_sync ? BR_LEARNING_SYNC : 0) |
+   (mlxsw_sp_port->uc_flood ? BR_FLOOD : 0);
break;
default:
return -EOPNOTSUPP;
@@ -123,15 +124,89 @@ static int mlxsw_sp_port_attr_stp_state_set(struct mlxsw_sp_port *mlxsw_sp_port,
return mlxsw_sp_port_stp_state_set(mlxsw_sp_port, state);
 }
 
+static int __mlxsw_sp_port_flood_set(struct mlxsw_sp_port *mlxsw_sp_port,
+u16 fid_begin, u16 fid_end, bool set,
+bool only_uc)
+{
+   struct mlxsw_sp *mlxsw_sp = mlxsw_sp_port->mlxsw_sp;
+   u16 range = fid_end - fid_begin + 1;
+   char *sftr_pl;
+   int err;
+
+   sftr_pl = kmalloc(MLXSW_REG_SFTR_LEN, GFP_KERNEL);
+   if (!sftr_pl)
+   return -ENOMEM;
+
+   mlxsw_reg_sftr_pack(sftr_pl, MLXSW_SP_FLOOD_TABLE_UC, fid_begin,
+   MLXSW_REG_SFGC_TABLE_TYPE_FID_OFFEST, range,
+   mlxsw_sp_port->local_port, set);
+   err = mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(sftr), sftr_pl);
+   if (err)
+   goto buffer_out;
+
+   /* Flooding control allows one to decide whether a given port will
+* flood unicast traffic for which there is no FDB entry.
+*/
+   if (only_uc)
+   goto buffer_out;
+
+   mlxsw_reg_sftr_pack(sftr_pl, MLXSW_SP_FLOOD_TABLE_BM, fid_begin,
+   MLXSW_REG_SFGC_TABLE_TYPE_FID_OFFEST, range,
+   mlxsw_sp_port->local_port, set);
+   err = mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(sftr), sftr_pl);
+
+buffer_out:
+   kfree(sftr_pl);
+   return err;
+}
+
+static int mlxsw_sp_port_uc_flood_set(struct mlxsw_sp_port *mlxsw_sp_port,
+ bool set)
+{
+   struct net_device *dev = mlxsw_sp_port->dev;
+   u16 vid, last_visited_vid;
+   int err;
+
+   for_each_set_bit(vid, mlxsw_sp_port->active_vlans, VLAN_N_VID) {
+   err = __mlxsw_sp_port_flood_set(mlxsw_sp_port, vid, vid, set,
+   true);
+   if (err) {
+   last_visited_vid = vid;
+   goto err_port_flood_set;
+   }
+   }
+
+   return 0;
+
+err_port_flood_set:
+   for_each_set_bit(vid, mlxsw_sp_port->active_vlans, last_visited_vid)
+   __mlxsw_sp_port_flood_set(mlxsw_sp_port, vid, vid, !set, true);
+   netdev_err(dev, "Failed to configure unicast flooding\n");
+   return err;
+}
+
 static int mlxsw_sp_port_attr_br_flags_set(struct mlxsw_sp_port *mlxsw_sp_port,
   struct switchdev_trans *trans,
   unsigned long 

[patch net-next 04/12] mlxsw: spectrum: Add support for VLAN ranges in flooding configuration

2015-10-28 Thread Jiri Pirko
From: Ido Schimmel 

When enabling a range of VLANs on a bridged port, we can configure
flooding for all of these VLANs with a single register access instead of
writing the same register once per VLAN. This is accomplished by using the
'range' field of the Switch Flooding Table Register (SFTR).
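In the patch itself, the range comes straight from the switchdev VLAN object (vid_begin..vid_end). The register's 'range' field is then set to fid_end - fid_begin + 1. The hypothetical helper below illustrates why ranges pay off: it collapses an arbitrary set of active VIDs into maximal runs of consecutive VIDs. Each run would need one SFTR write instead of one write per VLAN. The names and the bool-array representation are assumptions, not mlxsw code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define VLAN_N_VID	4096

struct vid_range {
	unsigned int begin, end;	/* inclusive, like vid_begin/vid_end */
};

/*
 * Hypothetical helper: collapse the active VIDs into maximal runs of
 * consecutive VIDs.  Each run could be programmed with one register
 * write whose 'range' field is end - begin + 1.  Returns the run count.
 */
static size_t vid_runs(const bool active[VLAN_N_VID],
		       struct vid_range *out, size_t max_out)
{
	size_t n = 0;
	unsigned int vid = 0;

	while (vid < VLAN_N_VID) {
		if (!active[vid]) {
			vid++;
			continue;
		}
		assert(n < max_out);
		out[n].begin = vid;
		while (vid + 1 < VLAN_N_VID && active[vid + 1])
			vid++;
		out[n].end = vid;
		n++;
		vid++;
	}
	return n;
}
```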

Signed-off-by: Ido Schimmel 
Signed-off-by: Jiri Pirko 
---
 .../ethernet/mellanox/mlxsw/spectrum_switchdev.c   | 40 +++---
 1 file changed, 20 insertions(+), 20 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
index c39b7a1..c3881c9 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
@@ -248,9 +248,11 @@ static int mlxsw_sp_port_fid_unmap(struct mlxsw_sp_port *mlxsw_sp_port, u16 fid)
 }
 
 static int __mlxsw_sp_port_flood_set(struct mlxsw_sp_port *mlxsw_sp_port,
-u16 fid, bool set, bool only_uc)
+u16 fid_begin, u16 fid_end, bool set,
+bool only_uc)
 {
struct mlxsw_sp *mlxsw_sp = mlxsw_sp_port->mlxsw_sp;
+   u16 range = fid_end - fid_begin + 1;
char *sftr_pl;
int err;
 
@@ -258,8 +260,8 @@ static int __mlxsw_sp_port_flood_set(struct mlxsw_sp_port *mlxsw_sp_port,
if (!sftr_pl)
return -ENOMEM;
 
-   mlxsw_reg_sftr_pack(sftr_pl, MLXSW_SP_FLOOD_TABLE_UC, fid,
-   MLXSW_REG_SFGC_TABLE_TYPE_FID_OFFEST, 0,
+   mlxsw_reg_sftr_pack(sftr_pl, MLXSW_SP_FLOOD_TABLE_UC, fid_begin,
+   MLXSW_REG_SFGC_TABLE_TYPE_FID_OFFEST, range,
mlxsw_sp_port->local_port, set);
err = mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(sftr), sftr_pl);
if (err)
@@ -271,8 +273,8 @@ static int __mlxsw_sp_port_flood_set(struct mlxsw_sp_port *mlxsw_sp_port,
if (only_uc)
goto buffer_out;
 
-   mlxsw_reg_sftr_pack(sftr_pl, MLXSW_SP_FLOOD_TABLE_BM, fid,
-   MLXSW_REG_SFGC_TABLE_TYPE_FID_OFFEST, 0,
+   mlxsw_reg_sftr_pack(sftr_pl, MLXSW_SP_FLOOD_TABLE_BM, fid_begin,
+   MLXSW_REG_SFGC_TABLE_TYPE_FID_OFFEST, range,
mlxsw_sp_port->local_port, set);
err = mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(sftr), sftr_pl);
 
@@ -345,14 +347,13 @@ static int __mlxsw_sp_port_vlans_add(struct mlxsw_sp_port *mlxsw_sp_port,
netdev_err(dev, "Failed to map FID=%d", vid);
return err;
}
+   }
 
-   err = __mlxsw_sp_port_flood_set(mlxsw_sp_port, vid, true,
-   false);
-   if (err) {
-   netdev_err(dev, "Failed to set flooding for FID=%d",
-  vid);
-   return err;
-   }
+   err = __mlxsw_sp_port_flood_set(mlxsw_sp_port, vid_begin, vid_end,
+   true, false);
+   if (err) {
+   netdev_err(dev, "Failed to configure flooding\n");
+   return err;
}
 
for (vid = vid_begin; vid <= vid_end;
@@ -530,15 +531,14 @@ static int __mlxsw_sp_port_vlans_del(struct mlxsw_sp_port *mlxsw_sp_port,
if (init)
goto out;
 
-   for (vid = vid_begin; vid <= vid_end; vid++) {
-   err = __mlxsw_sp_port_flood_set(mlxsw_sp_port, vid, false,
-   false);
-   if (err) {
-   netdev_err(dev, "Failed to clear flooding for FID=%d",
-  vid);
-   return err;
-   }
+   err = __mlxsw_sp_port_flood_set(mlxsw_sp_port, vid_begin, vid_end,
+   false, false);
+   if (err) {
+   netdev_err(dev, "Failed to clear flooding\n");
+   return err;
+   }
 
+   for (vid = vid_begin; vid <= vid_end; vid++) {
/* Remove FID mapping in case of Virtual mode */
err = mlxsw_sp_port_fid_unmap(mlxsw_sp_port, vid);
if (err) {
-- 
1.9.3



Re: [PATCH 02/25] IB/mthca, net/mlx4: remove counting semaphores

2015-10-28 Thread Sagi Grimberg

Hi Arnd,


Since we want to make counting semaphores go away,


Why do we want to make counting semaphores go away? Completely,
or just for binary use cases?

I have a use case in iser target code where a counting semaphore is the
best suited synchronizing mechanism.

I have a single thread handling connect requests (one at a time) while
connect requests are event driven and come asynchronously. This is
why I use a queue and a counting semaphore to handle this situation.

I'd need to come up with a new strategy to handle this without counting
semaphores and I'm not entirely sure it would be simpler.


this patch replaces the semaphore counting the event-driven
commands with an open-coded wait-queue, which should
be an equivalent transformation of the code, although
it does not make it any nicer.
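The core of this transformation is that `atomic_add_unless(&dev->cmd.commands, -1, 0)` behaves like a non-blocking down(). It takes a slot only if the counter is non-zero, and wait_event() sleeps until that succeeds. Below is a userspace C11 sketch of the same counter logic. The names add_unless(), try_take_slot() and put_slot() are hypothetical, and the sleeping/wake-up half (wait_event()/wake_up()) is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Userspace stand-in for the kernel's atomic_add_unless(v, a, u):
 * atomically add a to *v unless *v == u; return true if it added.
 */
static bool add_unless(atomic_int *v, int a, int u)
{
	int old = atomic_load(v);

	while (old != u) {
		/* On failure, old is refreshed with the current value. */
		if (atomic_compare_exchange_weak(v, &old, old + a))
			return true;
	}
	return false;
}

/* Non-blocking "down": take a command slot if any is left. */
static bool try_take_slot(atomic_int *commands)
{
	return add_unless(commands, -1, 0);
}

/* "up": give the slot back; the patch then calls wake_up(). */
static void put_slot(atomic_int *commands)
{
	atomic_fetch_add(commands, 1);
}
```

Unlike down(), the counter check never sleeps on its own. In the patch, the caller wraps it in wait_event(dev->cmd.event_wait, ...), which re-evaluates the condition after every wake_up().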

As far as I can tell, there is a preexisting race condition
regarding the cmd->use_events flag, which is not protected
by any lock. When this flag is toggled while another command
is being started, that command gets stuck until the mode is
toggled back.

A better solution that would solve the race condition and
at the same time improve the code readability would create
a new locking primitive that replaces both semaphores, like

/* 1: event-mode slot taken, 0: polling mode taken, -EAGAIN: must wait */
static int __mlx4_use_events(struct mlx4_cmd *cmd)
{
	int ret = -EAGAIN;

	spin_lock(&cmd->lock);
	if (cmd->use_events && cmd->commands < cmd->max_commands) {
		cmd->commands++;
		ret = 1;
	} else if (!cmd->use_events && cmd->commands == 0) {
		cmd->commands = 1;
		ret = 0;
	}
	spin_unlock(&cmd->lock);
	return ret;
}

static bool mlx4_use_events(struct mlx4_cmd *cmd)
{
	int ret;

	/* Sleep until either an event slot or the polling slot is ours. */
	wait_event(cmd->events_wq, (ret = __mlx4_use_events(cmd)) >= 0);
	return ret;
}

Cc: Roland Dreier 
Cc: Eli Cohen 
Cc: Yevgeny Petrilin 
Cc: netdev@vger.kernel.org
Cc: linux-r...@vger.kernel.org
Signed-off-by: Arnd Bergmann 

Conflicts:

drivers/net/mlx4/cmd.c
drivers/net/mlx4/mlx4.h
---
  drivers/infiniband/hw/mthca/mthca_cmd.c   | 12 
  drivers/infiniband/hw/mthca/mthca_dev.h   |  3 ++-
  drivers/net/ethernet/mellanox/mlx4/cmd.c  | 12 
  drivers/net/ethernet/mellanox/mlx4/mlx4.h |  3 ++-
  4 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/drivers/infiniband/hw/mthca/mthca_cmd.c b/drivers/infiniband/hw/mthca/mthca_cmd.c
index 9d3e5c1ac60e..aad1852e8e10 100644
--- a/drivers/infiniband/hw/mthca/mthca_cmd.c
+++ b/drivers/infiniband/hw/mthca/mthca_cmd.c
@@ -417,7 +417,8 @@ static int mthca_cmd_wait(struct mthca_dev *dev,
int err = 0;
struct mthca_cmd_context *context;

-   down(&dev->cmd.event_sem);
+   wait_event(dev->cmd.event_wait,
+  atomic_add_unless(&dev->cmd.commands, -1, 0));

spin_lock(&dev->cmd.context_lock);
BUG_ON(dev->cmd.free_head < 0);
@@ -459,7 +460,8 @@ out:
dev->cmd.free_head = context - dev->cmd.context;
spin_unlock(&dev->cmd.context_lock);

-   up(&dev->cmd.event_sem);
+   atomic_inc(&dev->cmd.commands);
+   wake_up(&dev->cmd.event_wait);
return err;
  }

@@ -571,7 +573,8 @@ int mthca_cmd_use_events(struct mthca_dev *dev)
dev->cmd.context[dev->cmd.max_cmds - 1].next = -1;
dev->cmd.free_head = 0;

-   sema_init(&dev->cmd.event_sem, dev->cmd.max_cmds);
+   init_waitqueue_head(&dev->cmd.event_wait);
+   atomic_set(&dev->cmd.commands, dev->cmd.max_cmds);
spin_lock_init(&dev->cmd.context_lock);

for (dev->cmd.token_mask = 1;
@@ -597,7 +600,8 @@ void mthca_cmd_use_polling(struct mthca_dev *dev)
dev->cmd.flags &= ~MTHCA_CMD_USE_EVENTS;

for (i = 0; i < dev->cmd.max_cmds; ++i)
-   down(&dev->cmd.event_sem);
+   wait_event(dev->cmd.event_wait,
+  atomic_add_unless(&dev->cmd.commands, -1, 0));

kfree(dev->cmd.context);

diff --git a/drivers/infiniband/hw/mthca/mthca_dev.h b/drivers/infiniband/hw/mthca/mthca_dev.h
index 7e6a6d64ad4e..3055f5c12ac8 100644
--- a/drivers/infiniband/hw/mthca/mthca_dev.h
+++ b/drivers/infiniband/hw/mthca/mthca_dev.h
@@ -121,7 +121,8 @@ struct mthca_cmd {
struct pci_pool  *pool;
struct mutex  hcr_mutex;
struct semaphore  poll_sem;
-   struct semaphore  event_sem;
+   wait_queue_head_t event_wait;
+   atomic_t  commands;
int   max_cmds;
spinlock_t    context_lock;
int   free_head;
diff --git a/drivers/net/ethernet/mellanox/mlx4/cmd.c b/drivers/net/ethernet/mellanox/mlx4/cmd.c
index 78f5a1a0b8c8..60134a4245ef 100644
--- a/drivers/net/ethernet/mellanox/mlx4/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx4/cmd.c
@@ -273,7 +273,8 @@ static int mlx4_cmd_wait(struct mlx4_dev *dev, u64 in_param, u64 *out_param,
struct 

[patch net-next 00/12] mlxsw: driver update

2015-10-28 Thread Jiri Pirko
From: Jiri Pirko 

This driver update mainly adds support for users to set up flooding
on a specified port via a bridge flag. Also, there is a fix in the ageing
time conversion. The rest is just cosmetics.

Ido Schimmel (4):
  switchdev: Add support for flood control
  switchdev: Make flood to CPU optional
  mlxsw: spectrum: Add support for VLAN ranges in flooding configuration
  mlxsw: spectrum: Add support for flood control

Jiri Pirko (6):
  mlxsw: spectrum: move "bridged" bool to u8 flags
  mlxsw: reg: Fix description for reg_sfd_uc_sub_port
  mlxsw: reg: Fix desription typos of couple of SFN items
  mlxsw: reg: Avoid unnecessary line wrap for mlxsw_reg_sfd_uc_unpack
  mlxsw: spectrum: Fix ageing time value
  mlxsw: spectrum: Make mlxsw_sp_port_switchdev_ops static

Or Gerlitz (2):
  mlxsw: Put constant on the right side of comparisons
  mlxsw: Put braces on all arms of branch statement

 Documentation/networking/switchdev.txt |   7 +-
 drivers/net/ethernet/mellanox/mlxsw/core.c |   4 +-
 drivers/net/ethernet/mellanox/mlxsw/pci.c  |   3 +-
 drivers/net/ethernet/mellanox/mlxsw/reg.h  |  18 +--
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c |   5 +-
 drivers/net/ethernet/mellanox/mlxsw/spectrum.h |   7 +-
 .../ethernet/mellanox/mlxsw/spectrum_switchdev.c   | 144 +
 drivers/net/ethernet/mellanox/mlxsw/switchx2.c |   2 +-
 net/switchdev/switchdev.c  |   5 +-
 9 files changed, 122 insertions(+), 73 deletions(-)

-- 
1.9.3
