date:20160526

Re: [PATCH 1/1] net: nps_enet: Disable interrupts before napi reschedule

2016-05-26 Thread Vineet Gupta

Hi Elad, Noam,

On Thursday 26 May 2016 11:23 PM, Alexey Brodkin wrote:

> 
> We just bumped into the same problem (data exchange hangs on the very first
> "ping") with released Linux v4.6 and linux-next on our nSIM OSCI virtual
> platform.
> 
> I believe it was commit 05c00d82f4d1 ("net: nps_enet: bug fix - handle lost
> tx interrupts") that introduced the problem. At least reverting it I got
> networking working.
> 
> And indeed that patch fixes mentioned issue.
> In other words...
> 
> Tested-by: Alexey Brodkin 

FWIW, we now actively use the same driver (and the same SystemC model) in one of
our simulation platforms used for regression testing. So please try to keep the
arc mailing list on CC for any nps_enet driver patches, so we are in the loop
and know what is going on!

Thx,
-Vineet

Re: IPv6 extension header privileges

2016-05-26 Thread YOSHIFUJI Hideaki

Hi,

Tom Herbert wrote:
> Hi,
> 
> In ipv6_sockglue.c I noticed:
> 
> /* hop-by-hop / destination options are privileged option */
> retv = -EPERM;
> if (optname != IPV6_RTHDR && !ns_capable(net->user_ns, CAP_NET_RAW))
>         break;
> 
> Can anyone provide that rationale as to why these are privileged ops?

It is better to disallow by default for security.
FreeBSD does this in the same way.
We may have sysctl bitmaps, of course.

--yoshfuji

> 
> Thanks,
> Tom
> 

-- 
Hideaki Yoshifuji 
Technical Division, MIRACLE LINUX CORPORATION

[PATCH] vxlan: Accept user specified MTU value when create new vxlan link

2016-05-26 Thread Chen Haiquan

When creating a new vxlan link, for example:
  ip link add vtap mtu 1440 type vxlan vni 1 dev eth0

the "mtu" argument has no effect, because it is never copied into conf->mtu.
The vxlan_dev_configure() function therefore uses the default value.

This problem was introduced by commit 0dfbdf4102b9 ("vxlan: Factor out device
configuration").

Fixes: 0dfbdf4102b9 ("vxlan: Factor out device configuration")

Signed-off-by:  Chen Haiquan 
---
 drivers/net/vxlan.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 8ff30c3..f999db2 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -3086,6 +3086,9 @@ static int vxlan_newlink(struct net *src_net, struct net_device *dev,
if (data[IFLA_VXLAN_REMCSUM_NOPARTIAL])
conf.flags |= VXLAN_F_REMCSUM_NOPARTIAL;
 
+   if (tb[IFLA_MTU])
+   conf.mtu = nla_get_u32(tb[IFLA_MTU]);
+
err = vxlan_dev_configure(src_net, dev, &conf);
switch (err) {
case -ENODEV:
-- 
2.7.4
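
For readers following along, here is a minimal userspace sketch of the behaviour the patch above describes. It is not the driver code: the struct names, the helper names and the 1500-byte fallback are assumptions made for illustration only. It shows that the requested MTU reaches the configuration step only if it is copied into the config structure first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed fallback for illustration; the real driver computes its default differently. */
#define MODEL_DEFAULT_MTU 1500u

/* Hypothetical stand-in for the parsed netlink attribute table (tb[IFLA_MTU]). */
struct link_attrs_model {
	bool has_mtu;
	uint32_t mtu;
};

/* Hypothetical stand-in for the vxlan config; mtu == 0 means "not requested". */
struct vxlan_conf_model {
	uint32_t mtu;
};

/* Mirrors the three added lines: copy the attribute only when it is present. */
static void model_newlink_fill(struct vxlan_conf_model *conf,
			       const struct link_attrs_model *tb)
{
	assert(conf != NULL && tb != NULL);
	conf->mtu = 0;
	if (tb->has_mtu)
		conf->mtu = tb->mtu;
}

/* Configuration step: honour a requested MTU, otherwise use the fallback. */
static uint32_t model_configure_mtu(const struct vxlan_conf_model *conf)
{
	assert(conf != NULL);
	return conf->mtu ? conf->mtu : MODEL_DEFAULT_MTU;
}
```

Without the copy in model_newlink_fill(), the configuration step would always see 0 and pick the fallback, which is the symptom reported for "ip link add ... mtu 1440".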

Re: BUG: net/netlink: KASAN: use-after-free in netlink_sock_destruct

2016-05-26 Thread Baozeng Ding



On 2016/5/26 23:06, Eric Dumazet wrote:
> On Thu, 2016-05-26 at 22:48 +0800, Baozeng Ding wrote:
>> Hi all,
>> I've got the following report use-after-free in netlink_sock_destruct while 
>> running syzkaller.
>> Unfortunately no reproducer.The kernel version is 4.6 (May 15, on commit 
>> 2dcd0af568b0cf583645c8a317dd12e344b1c72a). Thanks.
>>
>> ==
>> BUG: KASAN: use-after-free in kfree_skb+0x28c/0x310 at addr 880036c1179c
>> Read of size 4 by task syz-executor/21618
>> =
>> BUG skbuff_head_cache (Tainted: GW  ): kasan: bad access detected
>> -
>>
>> Disabling lock debugging due to kernel taint
>> INFO: Slab 0xeadb0400 objects=25 used=3 fp=0x880036c116c0 
>> flags=0x1fffc004080
>> INFO: Object 0x880036c11680 @offset=5760 fp=0x
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
>> rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
>>  0002 88006da07c40 8295f5f1 88003e0fc5c0
>>  880036c11680 eadb0400 880036c1 88006da07c70
>>  8171144d 88003e0fc5c0 eadb0400 880036c11680
>> Call Trace:
>>  [< inline >] __dump_stack /lib/dump_stack.c:15
>>  [] dump_stack+0xb3/0x112 /lib/dump_stack.c:51
>>  [] print_trailer+0x10d/0x190 /mm/slub.c:667
>>  [] object_err+0x2f/0x40 /mm/slub.c:674
>>  [< inline >] print_address_description /mm/kasan/report.c:179
>>  [] kasan_report_error+0x218/0x530 /mm/kasan/report.c:275
>>  [] ? debug_check_no_locks_freed+0x290/0x290 
>> /kernel/locking/lockdep.c:4212
>>  [< inline >] kasan_report /mm/kasan/report.c:297
>>  [] __asan_report_load4_noabort+0x3e/0x40 
>> /mm/kasan/report.c:317
>>  [< inline >] ? atomic_read /include/linux/compiler.h:222
>>  [] ? kfree_skb+0x28c/0x310 /net/core/skbuff.c:699
>>  [< inline >] atomic_read /include/linux/compiler.h:222
>>  [] kfree_skb+0x28c/0x310 /net/core/skbuff.c:699
>>  [] netlink_sock_destruct+0xeb/0x2b0 
>> /net/netlink/af_netlink.c:334
>>  [] ? __netlink_create+0x1d0/0x1d0 
>> /net/netlink/af_netlink.c:577
>>  [] sk_destruct+0x4a/0x4f0 /net/core/sock.c:1429
>>  [] __sk_free+0x57/0x200 /net/core/sock.c:1459
>>  [] sk_free+0x30/0x40 /net/core/sock.c:1470
>>  [< inline >] sock_put /include/net/sock.h:1506
>>  [] deferred_put_nlk_sk+0x34/0x40 
>> /net/netlink/af_netlink.c:652
>>  [< inline >] __rcu_reclaim /kernel/rcu/rcu.h:118
>>  [< inline >] rcu_do_batch /kernel/rcu/tree.c:2681
>>  [< inline >] invoke_rcu_callbacks /kernel/rcu/tree.c:2947
>>  [< inline >] __rcu_process_callbacks /kernel/rcu/tree.c:2914
>>  [] rcu_process_callbacks+0xa71/0x11d0 
>> /kernel/rcu/tree.c:2931
>>  [< inline >] ? __rcu_reclaim /kernel/rcu/rcu.h:108
>>  [< inline >] ? rcu_do_batch /kernel/rcu/tree.c:2681
>>  [< inline >] ? invoke_rcu_callbacks /kernel/rcu/tree.c:2947
>>  [< inline >] ? __rcu_process_callbacks /kernel/rcu/tree.c:2914
>>  [] ? rcu_process_callbacks+0xa1c/0x11d0 
>> /kernel/rcu/tree.c:2931
>>  [] ? __netlink_deliver_tap+0x7c0/0x7c0 
>> /net/netlink/af_netlink.c:204
>>  [] __do_softirq+0x22b/0x8da /kernel/softirq.c:273
>>  [< inline >] invoke_softirq /kernel/softirq.c:350
>>  [] irq_exit+0x15d/0x190 /kernel/softirq.c:391
>>  [< inline >] exiting_irq /./arch/x86/include/asm/apic.h:658
>>  [] smp_apic_timer_interrupt+0x7b/0xa0 
>> /arch/x86/kernel/apic/apic.c:932
>>  [] apic_timer_interrupt+0x8c/0xa0 
>> /arch/x86/entry/entry_64.S:454
>>  [< inline >] ? atomic_add_return 
>> /./arch/x86/include/asm/atomic.h:156
>>  [< inline >] ? kref_get /include/linux/kref.h:46
>>  [] ? klist_next+0x177/0x400 /lib/klist.c:393
>>  [< inline >] ? kref_get /include/linux/kref.h:46
>>  [] ? klist_next+0x168/0x400 /lib/klist.c:393
>>  [] class_dev_iter_next+0x8b/0xd0 /drivers/base/class.c:324
>>  [] ? tty_get_pgrp+0x80/0x80 /drivers/tty/tty_io.c:2525
>>  [] class_find_device+0x101/0x1c0 /drivers/base/class.c:428
>>  [] ? class_for_each_device+0x1d0/0x1d0 
>> /drivers/base/class.c:375
>>  [< inline >] tty_get_device /drivers/tty/tty_io.c:3139
>>  [] alloc_tty_struct+0x5fb/0x840 /drivers/tty/tty_io.c:3183
>>  [] ? do_SAK_work+0x20/0x20 /drivers/tty/tty_io.c:3112
>>  [] ? mutex_lock_interruptible_nested+0x980/0x980 ??:?
>>  [] tty_init_dev+0x78/0x4b0 /drivers/tty/tty_io.c:1532
>>  [< inline >] tty_open_by_driver /drivers/tty/tty_io.c:2065
>>  [] tty_open+0xd31/0x1050 /drivers/tty/tty_io.c:2113
>>  [] ? tty_init_dev+0x4b0/0x4b0 /drivers/tty/tty_io.c:1543
>>  [< inline >] ? spin_unlock /include/linux/spinlock.h:347
>>  [] ? chrdev_open+0xbf/0x4c0 /fs/char_dev.c:376
>>  [] ? tty_init_dev+0x4b0/0x4b0 /drivers/tty/tty_io.c:1543
>>  [] chrdev_open+0x22a/0x4c0

RE: [Intel-wired-lan] [RFC PATCH net] e1000e: keep vlan interfaces functional after rxvlan off

2016-05-26 Thread Brown, Aaron F

> From: Intel-wired-lan [mailto:intel-wired-lan-boun...@lists.osuosl.org] On
> Behalf Of Jeff Kirsher
> Sent: Wednesday, May 18, 2016 2:40 PM
> To: Jarod Wilson ; linux-ker...@vger.kernel.org;
> Avargil, Raanan 
> Cc: netdev@vger.kernel.org; intel-wired-...@lists.osuosl.org
> Subject: Re: [Intel-wired-lan] [RFC PATCH net] e1000e: keep vlan interfaces
> functional after rxvlan off
> 
> On Tue, 2016-05-17 at 15:03 -0400, Jarod Wilson wrote:
> > I've got a bug report about an e1000e interface, where a vlan interface
> > is
> > set up on top of it:
> >
> > $ ip link add link ens1f0 name ens1f0.99 type vlan id 99
> > $ ip link set ens1f0 up
> > $ ip link set ens1f0.99 up
> > $ ip addr add 192.168.99.92 dev ens1f0.99
> >
> > At this point, I can ping another host on vlan 99, ip 192.168.99.91.
> > However, if I do the following:
> >
> > $ ethtool -K ens1f0 rxvlan off
> >
> > Then no traffic passes on ens1f0.99. It comes back if I toggle rxvlan on
> > again. I'm not sure if this is actually intended behavior, or if there's
> > a
> > lack of software vlan stripping fallback, or what, but things continue to
> > work if I simply don't call e1000e_vlan_strip_disable() if there are
> > active vlans (plagiarizing a function from the e1000 driver here) on the
> > interface.
> >
> > Also slipped a related-ish fix to the kerneldoc text for
> > e1000e_vlan_strip_disable here...
> >
> > CC: Jeff Kirsher 
> > CC: intel-wired-...@lists.osuosl.org
> > CC: netdev@vger.kernel.org
> > Signed-off-by: Jarod Wilson 
> > ---
> >  drivers/net/ethernet/intel/e1000e/netdev.c | 15 +--
> >  1 file changed, 13 insertions(+), 2 deletions(-)
> 
> Raanan, please review this patch.  Even though it is an RFC I will be
> adding it to my queue for testing.
> http://patchwork.ozlabs.org/patch/623238/

Yup, without this patch, disabling rxvlan offload does indeed break vlan
connectivity, and with the patch I can disable and re-enable rxvlan offloads as
much as I care to.  It also makes it through my regression tests without
problems.

Tested-by: Aaron Brown 

This is from a functional ("does it work?") testing perspective, so a review is
probably still in order.

[RFC PATCH 13/16] net: dsa: mv88e6xxx: Refactor MDIO so driver registers mdio bus

2016-05-26 Thread Andrew Lunn

Have the switch driver register its own MDIO bus. This allows for an
mdio property in the device tree, with child nodes for phys, which
can be referenced via phandles, etc.

Signed-off-by: Andrew Lunn 
---
 drivers/net/dsa/mv88e6xxx.c | 204 ++--
 drivers/net/dsa/mv88e6xxx.h |   6 ++
 2 files changed, 144 insertions(+), 66 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
index 11845eccf670..8fbc771f0475 100644
--- a/drivers/net/dsa/mv88e6xxx.c
+++ b/drivers/net/dsa/mv88e6xxx.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -238,16 +239,16 @@ int mv88e6xxx_set_addr(struct dsa_switch *ds, u8 *addr)
return mv88e6xxx_set_addr_direct(ds, addr);
 }
 
-static int _mv88e6xxx_phy_read(struct mv88e6xxx_priv_state *ps, int addr,
-  int regnum)
+static int _mv88e6xxx_mdio_read(struct mv88e6xxx_priv_state *ps, int addr,
+   int regnum)
 {
if (addr >= 0)
return _mv88e6xxx_reg_read(ps, addr, regnum);
return 0x;
 }
 
-static int _mv88e6xxx_phy_write(struct mv88e6xxx_priv_state *ps, int addr,
-   int regnum, u16 val)
+static int _mv88e6xxx_mdio_write(struct mv88e6xxx_priv_state *ps, int addr,
+int regnum, u16 val)
 {
if (addr >= 0)
return _mv88e6xxx_reg_write(ps, addr, regnum, val);
@@ -378,8 +379,8 @@ void mv88e6xxx_ppu_state_init(struct mv88e6xxx_priv_state *ps)
ps->ppu_timer.function = mv88e6xxx_ppu_reenable_timer;
 }
 
-static int mv88e6xxx_phy_read_ppu(struct mv88e6xxx_priv_state *ps, int addr,
- int regnum)
+static int mv88e6xxx_mdio_read_ppu(struct mv88e6xxx_priv_state *ps, int addr,
+  int regnum)
 {
int ret;
 
@@ -392,8 +393,8 @@ static int mv88e6xxx_phy_read_ppu(struct mv88e6xxx_priv_state *ps, int addr,
return ret;
 }
 
-static int mv88e6xxx_phy_write_ppu(struct mv88e6xxx_priv_state *ps, int addr,
-  int regnum, u16 val)
+static int mv88e6xxx_mdio_write_ppu(struct mv88e6xxx_priv_state *ps, int addr,
+   int regnum, u16 val)
 {
int ret;
 
@@ -829,7 +830,7 @@ static int mv88e6xxx_wait(struct mv88e6xxx_priv_state *ps, int reg,
return ret;
 }
 
-static int _mv88e6xxx_phy_wait(struct mv88e6xxx_priv_state *ps)
+static int _mv88e6xxx_mdio_wait(struct mv88e6xxx_priv_state *ps)
 {
return _mv88e6xxx_wait(ps, REG_GLOBAL2, GLOBAL2_SMI_OP,
   GLOBAL2_SMI_OP_BUSY);
@@ -1076,8 +1077,8 @@ static int _mv88e6xxx_atu_wait(struct mv88e6xxx_priv_state *ps)
   GLOBAL_ATU_OP_BUSY);
 }
 
-static int _mv88e6xxx_phy_read_indirect(struct mv88e6xxx_priv_state *ps,
-   int addr, int regnum)
+static int _mv88e6xxx_mdio_read_indirect(struct mv88e6xxx_priv_state *ps,
+int addr, int regnum)
 {
int ret;
 
@@ -1087,7 +1088,7 @@ static int _mv88e6xxx_phy_read_indirect(struct mv88e6xxx_priv_state *ps,
if (ret < 0)
return ret;
 
-   ret = _mv88e6xxx_phy_wait(ps);
+   ret = _mv88e6xxx_mdio_wait(ps);
if (ret < 0)
return ret;
 
@@ -1096,8 +1097,8 @@ static int _mv88e6xxx_phy_read_indirect(struct mv88e6xxx_priv_state *ps,
return ret;
 }
 
-static int _mv88e6xxx_phy_write_indirect(struct mv88e6xxx_priv_state *ps,
-int addr, int regnum, u16 val)
+static int _mv88e6xxx_mdio_write_indirect(struct mv88e6xxx_priv_state *ps,
+ int addr, int regnum, u16 val)
 {
int ret;
 
@@ -1109,7 +1110,7 @@ static int _mv88e6xxx_phy_write_indirect(struct mv88e6xxx_priv_state *ps,
   GLOBAL2_SMI_OP_22_WRITE | (addr << 5) |
   regnum);
 
-   return _mv88e6xxx_phy_wait(ps);
+   return _mv88e6xxx_mdio_wait(ps);
 }
 
 static int mv88e6xxx_get_eee(struct dsa_switch *ds, int port,
@@ -1123,7 +1124,7 @@ static int mv88e6xxx_get_eee(struct dsa_switch *ds, int port,
 
mutex_lock(&ps->smi_mutex);
 
-   reg = _mv88e6xxx_phy_read_indirect(ps, port, 16);
+   reg = _mv88e6xxx_mdio_read_indirect(ps, port, 16);
if (reg < 0)
goto out;
 
@@ -1154,7 +1155,7 @@ static int mv88e6xxx_set_eee(struct dsa_switch *ds, int port,
 
mutex_lock(&ps->smi_mutex);
 
-   ret = _mv88e6xxx_phy_read_indirect(ps, port, 16);
+   ret = _mv88e6xxx_mdio_read_indirect(ps, port, 16);
if (ret < 0)
goto out;
 
@@ -1164,7 +1165,7 @@ static int mv88e6xxx_set_eee(struct dsa_switch *ds, int port,
if (e->tx_lpi_enabled)
reg |= 0x0100;
 
-   ret =

[RFC PATCH 14/16] net: dsa: Add new binding implementation

2016-05-26 Thread Andrew Lunn

The existing DSA binding has a number of limitations and problems. The
main problem is that it cannot represent a switch as a linux device,
hanging off some bus. It is limited to one CPU port. The DSA platform
device is artificial, and does not really represent hardware.

Implement a new binding which can be embedded into any type of node on
a bus to represent one switch device, and its links to other switches.

Signed-off-by: Andrew Lunn 
Signed-off-by: Florian Fainelli 
---
 drivers/net/dsa/mv88e6xxx.c |   7 +
 include/net/dsa.h   |  20 ++
 net/dsa/Makefile|   2 +-
 net/dsa/dsa.c   |   1 +
 net/dsa/dsa2.c  | 653 
 net/dsa/dsa_priv.h  |   2 +-
 net/dsa/slave.c |   8 +-
 7 files changed, 689 insertions(+), 4 deletions(-)
 create mode 100644 net/dsa/dsa2.c

diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
index 8fbc771f0475..a6ccfdfbe225 100644
--- a/drivers/net/dsa/mv88e6xxx.c
+++ b/drivers/net/dsa/mv88e6xxx.c
@@ -3749,6 +3749,12 @@ int mv88e6xxx_probe(struct mdio_device *mdiodev)
 
dev_set_drvdata(dev, ds);
 
+   err = dsa_register_switch(ds, mdiodev->dev.of_node);
+   if (err) {
+   mv88e6xxx_mdio_unregister(ps);
+   return err;
+   }
+
dev_info(dev, "switch 0x%x probed: %s, revision %u\n",
 prod_num, ps->info->name, rev);
 
@@ -3760,6 +3766,7 @@ static void mv88e6xxx_remove(struct mdio_device *mdiodev)
struct dsa_switch *ds = dev_get_drvdata(&mdiodev->dev);
struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
 
+   dsa_unregister_switch(ds);
put_device(&ps->bus->dev);
 
mv88e6xxx_mdio_unregister(ps);
diff --git a/include/net/dsa.h b/include/net/dsa.h
index adb75422bc6c..032f6efa4b3e 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -86,6 +86,17 @@ struct dsa_platform_data {
 struct packet_type;
 
 struct dsa_switch_tree {
+   struct list_headlist;
+
+   /* Tree identifier */
+   u32 tree;
+
+   /* Number of switches attached to this tree */
+   struct kref refcount;
+
+   /* Has this tree been applied to the hardware? */
+   bool applied;
+
/*
 * Configuration data for the platform device that owns
 * this dsa switch tree instance.
@@ -172,9 +183,15 @@ struct dsa_switch {
 #endif
 
/*
+* The lower device this switch uses to talk to the host
+*/
+   struct net_device *master_netdev;
+
+   /*
 * Slave mii_bus and devices for the individual ports.
 */
u32 dsa_port_mask;
+   u32 cpu_port_mask;
u32 enabled_port_mask;
u32 phys_mii_mask;
struct dsa_port ports[DSA_MAX_PORTS];
@@ -363,4 +380,7 @@ static inline bool dsa_uses_tagged_protocol(struct dsa_switch_tree *dst)
 {
return dst->rcv != NULL;
 }
+
+void dsa_unregister_switch(struct dsa_switch *ds);
+int dsa_register_switch(struct dsa_switch *ds, struct device_node *np);
 #endif
diff --git a/net/dsa/Makefile b/net/dsa/Makefile
index da06ed1df620..8af4ded70f1c 100644
--- a/net/dsa/Makefile
+++ b/net/dsa/Makefile
@@ -1,6 +1,6 @@
 # the core
 obj-$(CONFIG_NET_DSA) += dsa_core.o
-dsa_core-y += dsa.o slave.o
+dsa_core-y += dsa.o slave.o dsa2.o
 
 # tagging formats
 dsa_core-$(CONFIG_NET_DSA_TAG_BRCM) += tag_brcm.o
diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index b1787e2f4bb3..b3cada3ecab7 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -294,6 +294,7 @@ static int dsa_switch_setup_one(struct dsa_switch *ds, struct device *parent)
}
dst->cpu_switch = index;
dst->cpu_port = i;
+   ds->cpu_port_mask |= 1 << i;
} else if (!strcmp(name, "dsa")) {
ds->dsa_port_mask |= 1 << i;
} else {
diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
new file mode 100644
index ..4e5051bed643
--- /dev/null
+++ b/net/dsa/dsa2.c
@@ -0,0 +1,653 @@
+/*
+ * net/dsa/dsa2.c - Hardware switch handling, binding version 2
+ * Copyright (c) 2008-2009 Marvell Semiconductor
+ * Copyright (c) 2013 Florian Fainelli 
+ * Copyright (c) 2016 Andrew Lunn 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "dsa_priv.h"
+
+static LIST_HEAD(dsa_switch_trees);
+static DEFINE_MUTEX(dsa2_mutex);
+
+static struct dsa_switch_tree *dsa_get_dst(u32 tree)
+{
+   struct dsa_switch_tree *dst;
+
+

[RFC PATCH 12/16] dsa: Make mdio bus optional

2016-05-26 Thread Andrew Lunn

The switch may want to instantiate its own MDIO bus. Only do it
centrally if the switch has not already created one, and the read op
is implemented.

Signed-off-by: Andrew Lunn 
---
 net/dsa/dsa.c | 24 +---
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index 49cdf143c822..b1787e2f4bb3 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -340,17 +340,18 @@ static int dsa_switch_setup_one(struct dsa_switch *ds, struct device *parent)
if (ret < 0)
goto out;
 
-   ds->slave_mii_bus = devm_mdiobus_alloc(parent);
-   if (ds->slave_mii_bus == NULL) {
-   ret = -ENOMEM;
-   goto out;
-   }
-   dsa_slave_mii_bus_init(ds);
-
-   ret = mdiobus_register(ds->slave_mii_bus);
-   if (ret < 0)
-   goto out;
+   if (!ds->slave_mii_bus && drv->phy_read) {
+   ds->slave_mii_bus = devm_mdiobus_alloc(parent);
+   if (!ds->slave_mii_bus) {
+   ret = -ENOMEM;
+   goto out;
+   }
+   dsa_slave_mii_bus_init(ds);
 
+   ret = mdiobus_register(ds->slave_mii_bus);
+   if (ret < 0)
+   goto out;
+   }
 
/*
 * Create network devices for physical switch ports.
@@ -493,7 +494,8 @@ static void dsa_switch_destroy(struct dsa_switch *ds)
dsa_cpu_dsa_destroy(ds->ports[port].dn);
}
 
-   mdiobus_unregister(ds->slave_mii_bus);
+   if (ds->slave_mii_bus && ds->drv->phy_read)
+   mdiobus_unregister(ds->slave_mii_bus);
 }
 
 #ifdef CONFIG_PM_SLEEP
-- 
2.8.1
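
A rough userspace model of the rule in patch 12/16 may help when reviewing it. Everything here is hypothetical scaffolding (struct switch_model, struct bus_model and the helper names are not kernel symbols); only the two conditions are taken from the diff above. The core creates and registers a bus only when none exists yet and a read op is available, and it unregisters under the matching condition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an MDIO bus; only tracks registration state. */
struct bus_model {
	bool registered;
};

/* Hypothetical stand-in for the switch: bus pointer plus the driver's read op. */
struct switch_model {
	struct bus_model *slave_mii_bus;        /* NULL until someone creates it */
	int (*phy_read)(int addr, int regnum);  /* may be NULL */
	struct bus_model central_bus;           /* storage for a centrally created bus */
};

/* Dummy read op used only to make phy_read non-NULL in the model. */
static int model_phy_read(int addr, int regnum)
{
	(void)addr;
	(void)regnum;
	return 0;
}

/* Returns true if the core created the bus, false if it left things alone. */
static bool model_setup_bus(struct switch_model *ds)
{
	assert(ds != NULL);
	if (!ds->slave_mii_bus && ds->phy_read) {
		ds->slave_mii_bus = &ds->central_bus;
		ds->slave_mii_bus->registered = true;
		return true;
	}
	return false;
}

/* Mirrors the destroy path: unregister only under the condition used in the diff. */
static void model_destroy_bus(struct switch_model *ds)
{
	assert(ds != NULL);
	if (ds->slave_mii_bus && ds->phy_read)
		ds->slave_mii_bus->registered = false;
}
```

In this model a switch that brings its own bus and leaves phy_read unset is never touched by the core, which is the case the commit message is aiming for.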

[RFC PATCH 11/16] net: dsa: Refactor selection of tag ops into a function

2016-05-26 Thread Andrew Lunn

Replace the two switch statements with an array lookup, and store the
result in the dsa tree structure. The drivers no longer need to know
the selected tag protocol, so remove it from the dsa switch structure.

Signed-off-by: Andrew Lunn 
---
 include/net/dsa.h  |  8 +-
 net/dsa/dsa.c  | 71 ++
 net/dsa/dsa_priv.h |  1 +
 net/dsa/slave.c| 35 +--
 4 files changed, 54 insertions(+), 61 deletions(-)

diff --git a/include/net/dsa.h b/include/net/dsa.h
index ea0e7d30342b..adb75422bc6c 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -26,6 +26,7 @@ enum dsa_tag_protocol {
DSA_TAG_PROTO_TRAILER,
DSA_TAG_PROTO_EDSA,
DSA_TAG_PROTO_BRCM,
+   _DSA_TAG_LAST,
 };
 
 #define DSA_MAX_SWITCHES   4
@@ -100,7 +101,6 @@ struct dsa_switch_tree {
   struct net_device *dev,
   struct packet_type *pt,
   struct net_device *orig_dev);
-   enum dsa_tag_protocol   tag_protocol;
 
/*
 * Original copy of the master netdev ethtool_ops
@@ -117,6 +117,12 @@ struct dsa_switch_tree {
 * Data for the individual switch chips.
 */
struct dsa_switch   *ds[DSA_MAX_SWITCHES];
+
+   /*
+* Tagging protocol operations for adding and removing an
+* encapsulation tag.
+*/
+   const struct dsa_device_ops *tag_ops;
 };
 
 struct dsa_port {
diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index 20eede3facf5..49cdf143c822 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -29,6 +29,33 @@
 
 char dsa_driver_version[] = "0.1";
 
+static struct sk_buff *dsa_slave_notag_xmit(struct sk_buff *skb,
+   struct net_device *dev)
+{
+   /* Just return the original SKB */
+   return skb;
+}
+
+static const struct dsa_device_ops none_ops = {
+   .xmit   = dsa_slave_notag_xmit,
+   .rcv= NULL,
+};
+
+const struct dsa_device_ops *dsa_device_ops[_DSA_TAG_LAST] = {
+#ifdef CONFIG_NET_DSA_TAG_DSA
+   [DSA_TAG_PROTO_DSA] = &dsa_netdev_ops,
+#endif
+#ifdef CONFIG_NET_DSA_TAG_EDSA
+   [DSA_TAG_PROTO_EDSA] = &edsa_netdev_ops,
+#endif
+#ifdef CONFIG_NET_DSA_TAG_TRAILER
+   [DSA_TAG_PROTO_TRAILER] = &trailer_netdev_ops,
+#endif
+#ifdef CONFIG_NET_DSA_TAG_BRCM
+   [DSA_TAG_PROTO_BRCM] = &brcm_netdev_ops,
+#endif
+   [DSA_TAG_PROTO_NONE] = &none_ops,
+};
 
 /* switch driver registration ***/
 static DEFINE_MUTEX(dsa_switch_drivers_mutex);
@@ -225,6 +252,20 @@ static int dsa_cpu_dsa_setups(struct dsa_switch *ds, struct device *dev)
return 0;
 }
 
+const struct dsa_device_ops *dsa_resolve_tag_protocol(int tag_protocol)
+{
+   const struct dsa_device_ops *ops;
+
+   if (tag_protocol >= _DSA_TAG_LAST)
+   return ERR_PTR(-EINVAL);
+   ops = dsa_device_ops[tag_protocol];
+
+   if (!ops)
+   return ERR_PTR(-ENOPROTOOPT);
+
+   return ops;
+}
+
 static int dsa_switch_setup_one(struct dsa_switch *ds, struct device *parent)
 {
struct dsa_switch_driver *drv = ds->drv;
@@ -277,35 +318,13 @@ static int dsa_switch_setup_one(struct dsa_switch *ds, struct device *parent)
 * switch.
 */
if (dst->cpu_switch == index) {
-   switch (drv->tag_protocol) {
-#ifdef CONFIG_NET_DSA_TAG_DSA
-   case DSA_TAG_PROTO_DSA:
-   dst->rcv = dsa_netdev_ops.rcv;
-   break;
-#endif
-#ifdef CONFIG_NET_DSA_TAG_EDSA
-   case DSA_TAG_PROTO_EDSA:
-   dst->rcv = edsa_netdev_ops.rcv;
-   break;
-#endif
-#ifdef CONFIG_NET_DSA_TAG_TRAILER
-   case DSA_TAG_PROTO_TRAILER:
-   dst->rcv = trailer_netdev_ops.rcv;
-   break;
-#endif
-#ifdef CONFIG_NET_DSA_TAG_BRCM
-   case DSA_TAG_PROTO_BRCM:
-   dst->rcv = brcm_netdev_ops.rcv;
-   break;
-#endif
-   case DSA_TAG_PROTO_NONE:
-   break;
-   default:
-   ret = -ENOPROTOOPT;
+   dst->tag_ops = dsa_resolve_tag_protocol(drv->tag_protocol);
+   if (IS_ERR(dst->tag_ops)) {
+   ret = PTR_ERR(dst->tag_ops);
goto out;
}
 
-   dst->tag_protocol = drv->tag_protocol;
+   dst->rcv = dst->tag_ops->rcv;
}
 
memcpy(ds->rtable, cd->rtable, sizeof(ds->rtable));
diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index dbea5d9e7f75..72f7b8989cfb 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -53,6 +53,7 @@ extern char dsa_driver_version[];
 int dsa_cpu_dsa_setup(struct dsa_switch *ds, struct device *dev,
  struct device_node *port_dn, int port);
 void
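
The array lookup described in the commit message of patch 11/16 can be modelled in plain userspace C as follows. This is only a sketch under stated assumptions: the enum, struct and function names are hypothetical, the function returns an errno-style code through an int instead of the kernel's ERR_PTR() convention, and the extra negative-index check is an addition of the sketch, not part of the patch. Entries left NULL in the table stand for taggers that were compiled out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical mirror of the protocol enum, ending in a "last" sentinel. */
enum model_tag_proto {
	MODEL_TAG_NONE,
	MODEL_TAG_DSA,
	MODEL_TAG_TRAILER,
	MODEL_TAG_EDSA,
	MODEL_TAG_BRCM,
	MODEL_TAG_LAST,
};

/* Hypothetical stand-in for the ops structure. */
struct tag_ops_model {
	const char *name;
};

static const struct tag_ops_model model_none_ops = { "none" };
static const struct tag_ops_model model_edsa_ops = { "edsa" };

/* Only two taggers are "built in" here; the other slots stay NULL. */
static const struct tag_ops_model *const model_ops_table[MODEL_TAG_LAST] = {
	[MODEL_TAG_NONE] = &model_none_ops,
	[MODEL_TAG_EDSA] = &model_edsa_ops,
};

/*
 * Returns 0 and stores the ops in *out, -EINVAL for an out-of-range
 * protocol, or -ENOPROTOOPT when the slot is empty.
 */
static int model_resolve_tag_ops(int proto, const struct tag_ops_model **out)
{
	assert(out != NULL);
	if (proto < 0 || proto >= MODEL_TAG_LAST)
		return -EINVAL;
	if (!model_ops_table[proto])
		return -ENOPROTOOPT;
	*out = model_ops_table[proto];
	return 0;
}
```

Compared with the two switch statements being removed, the table keeps the protocol-to-ops mapping in one place, and the sentinel gives the bounds check a single constant to compare against.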

[RFC PATCH 10/16] net: dsa: mv88e6xxx: Only support EDSA tagging

2016-05-26 Thread Andrew Lunn

The merged driver no longer offers the option to use DSA tagging. So
remove the code to set up the switch to do DSA tagging and hard code
the use of EDSA.

Signed-off-by: Andrew Lunn 
---
 drivers/net/dsa/mv88e6xxx.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
index 492801a6398c..11845eccf670 100644
--- a/drivers/net/dsa/mv88e6xxx.c
+++ b/drivers/net/dsa/mv88e6xxx.c
@@ -2725,11 +2725,8 @@ static int mv88e6xxx_setup_port(struct mv88e6xxx_priv_state *ps, int port)
if (mv88e6xxx_6352_family(ps) || mv88e6xxx_6351_family(ps) ||
mv88e6xxx_6165_family(ps) || mv88e6xxx_6097_family(ps) ||
mv88e6xxx_6320_family(ps)) {
-   if (ds->dst->tag_protocol == DSA_TAG_PROTO_EDSA)
-   reg |= PORT_CONTROL_FRAME_ETHER_TYPE_DSA;
-   else
-   reg |= PORT_CONTROL_FRAME_MODE_DSA;
-   reg |= PORT_CONTROL_FORWARD_UNKNOWN |
+   reg |= PORT_CONTROL_FRAME_ETHER_TYPE_DSA |
+   PORT_CONTROL_FORWARD_UNKNOWN |
PORT_CONTROL_FORWARD_UNKNOWN_MC;
}
 
@@ -2737,7 +2734,6 @@ static int mv88e6xxx_setup_port(struct mv88e6xxx_priv_state *ps, int port)
mv88e6xxx_6165_family(ps) || mv88e6xxx_6097_family(ps) ||
mv88e6xxx_6095_family(ps) || mv88e6xxx_6065_family(ps) ||
mv88e6xxx_6185_family(ps) || mv88e6xxx_6320_family(ps)) {
-   if (ds->dst->tag_protocol == DSA_TAG_PROTO_EDSA)
reg |= PORT_CONTROL_EGRESS_ADD_TAG;
}
}
-- 
2.8.1

[RFC PATCH 15/16] arm: dt: vf610-zii-devel-b: Make use of new DSA binding

2016-05-26 Thread Andrew Lunn

Hang the three switches off the three MDIO busses using the new DSA
binding. Also, make use of the mdio-bus and explicitly list the phys
on one device. This is not required, but good for testing.

Signed-off-by: Andrew Lunn 
---
 arch/arm/boot/dts/vf610-zii-dev-rev-b.dts | 328 --
 1 file changed, 170 insertions(+), 158 deletions(-)

diff --git a/arch/arm/boot/dts/vf610-zii-dev-rev-b.dts b/arch/arm/boot/dts/vf610-zii-dev-rev-b.dts
index 6c60b7f91104..5c1fcab4a6f7 100644
--- a/arch/arm/boot/dts/vf610-zii-dev-rev-b.dts
+++ b/arch/arm/boot/dts/vf610-zii-dev-rev-b.dts
@@ -85,187 +85,199 @@
reg = <1>;
#address-cells = <1>;
#size-cells = <0>;
+
+   switch0: switch0@0 {
+   compatible = "marvell,mv88e6085";
+   #address-cells = <1>;
+   #size-cells = <0>;
+   reg = <0>;
+   dsa,member = <0 0>;
+
+   ports {
+   #address-cells = <1>;
+   #size-cells = <0>;
+   port@0 {
+   reg = <0>;
+   label = "lan0";
+   };
+
+   port@1 {
+   reg = <1>;
+   label = "lan1";
+   };
+
+   port@2 {
+   reg = <2>;
+   label = "lan2";
+   };
+
+   switch0port5: port@5 {
+   reg = <5>;
+   label = "dsa";
+   phy-mode = "rgmii-txid";
+   link = <
+   >;
+   fixed-link {
+   speed = <1000>;
+   full-duplex;
+   };
+   };
+
+   port@6 {
+   reg = <6>;
+   label = "cpu";
+   ethernet = <>;
+   fixed-link {
+   speed = <100>;
+   full-duplex;
+   };
+   };
+   };
+   };
};
 
mdio_mux_2: mdio@2 {
reg = <2>;
#address-cells = <1>;
#size-cells = <0>;
-   };
-
-   mdio_mux_4: mdio@4 {
-   reg = <4>;
-   #address-cells = <1>;
-   #size-cells = <0>;
-   };
-
-   mdio_mux_8: mdio@8 {
-   reg = <8>;
-   #address-cells = <1>;
-   #size-cells = <0>;
-   };
-   };
-
-   dsa {
-   compatible = "marvell,dsa";
-   #address-cells = <2>;
-   #size-cells = <0>;
-   dsa,ethernet = <>;
-   dsa,mii-bus = <&mdio_mux_1>;
-
-   /* 6352 - Primary - 7 ports */
-   switch0: switch@0-0 {
-   #address-cells = <1>;
-   #size-cells = <0>;
-   reg = <0x00 0>;
-   eeprom-length = <512>;
 
-   port@0 {
+   switch1: switch1@0 {
+   compatible = "marvell,mv88e6085";
+   #address-cells = <1>;
+   #size-cells = <0>;
reg = <0>;
-   label = "lan0";
-   };
-
-   port@1 {
-   reg = <1>;
-   label = "lan1";
-   };
-
-   port@2 {
-   reg = <2>;
-   label = "lan2";
-   };
-
-   switch0port5: port@5 {
-   reg = <5>;
-

[RFC PATCH 16/16] dsa: Document new binding

2016-05-26 Thread Andrew Lunn

Add the new binding to the documentation of the existing binding.
Mark the old binding as deprecated.

Signed-off-by: Andrew Lunn 
Signed-off-by: Florian Fainelli 
---
 Documentation/devicetree/bindings/net/dsa/dsa.txt | 278 +-
 1 file changed, 276 insertions(+), 2 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/dsa/dsa.txt b/Documentation/devicetree/bindings/net/dsa/dsa.txt
index 9f4807f90c31..8c9e1b80cb65 100644
--- a/Documentation/devicetree/bindings/net/dsa/dsa.txt
+++ b/Documentation/devicetree/bindings/net/dsa/dsa.txt
@@ -1,5 +1,279 @@
-Marvell Distributed Switch Architecture Device Tree Bindings
-
+Distributed Switch Architecture Device Tree Bindings
+
+
+Two bindings exist, one of which has been deprecated due to
+limitations.
+
+Current Binding
+---
+
+Switches are true Linux devices and can be probed by any means. Once
+probed, they register to the DSA framework, passing a node
+pointer. This node is expected to fulfil the following binding, and
+may contain additional properties as required by the device it is
+embedded within.
+
+Required properties:
+
+- ports: A container for child nodes representing switch ports.
+
+Optional properties:
+
+- dsa,member   : A two element list indicates which DSA cluster, and position
+ within the cluster a switch takes. <0 0> is cluster 0,
+ switch 0. <0 1> is cluster 0, switch 1. <1 0> is cluster 1,
+ switch 0. A switch not part of any cluster (single device
+ hanging off a CPU port) must not specify this property
+
+The ports container has the following properties
+
+Required properties:
+
+- #address-cells   : Must be 1
+- #size-cells  : Must be 0
+
+Each port child node must have the following mandatory properties:
+- reg  : Describes the port address in the switch
+- label: Describes the label associated with this port, which
+  will become the netdev name. Special labels are
+ "cpu" to indicate a CPU port and "dsa" to
+ indicate an uplink/downlink port between switches in
+ the cluster.
+
+A port labelled "dsa" has the following mandatory property:
+
+- link : Should be a list of phandles to other switch's DSA
+ port. This port is used as the outgoing port
+ towards the phandle ports. The full routing
+ information must be given, not just the one hop
+ routes to neighbouring switches.
+
+A port labelled "cpu" has the following mandatory property:
+
+- ethernet : Should be a phandle to a valid Ethernet device node.
+  This host device is what the switch port is
+ connected to.
+
+Port child nodes may also contain the following optional standardised
+properties, described in binding documents:
+
+- phy-handle   : Phandle to a PHY on an MDIO bus. See
+ Documentation/devicetree/bindings/net/ethernet.txt
+ for details.
+
+- phy-mode : See
+ Documentation/devicetree/bindings/net/ethernet.txt
+ for details.
+
+- fixed-link   : Fixed-link subnode describing a link to a non-MDIO
+ managed entity. See
+ Documentation/devicetree/bindings/net/fixed-link.txt
+ for details.
+
+Example
+
+The following example shows three switches on three MDIO busses,
+linked into one DSA cluster.
+
+ {
+   #address-cells = <1>;
+   #size-cells = <0>;
+
+   switch0: switch0@0 {
+   compatible = "marvell,mv88e6085";
+   #address-cells = <1>;
+   #size-cells = <0>;
+   reg = <0>;
+
+   dsa,member = <0 0>;
+
+   ports {
+   #address-cells = <1>;
+   #size-cells = <0>;
+   port@0 {
+   reg = <0>;
+   label = "lan0";
+   };
+
+   port@1 {
+   reg = <1>;
+   label = "lan1";
+   };
+
+   port@2 {
+   reg = <2>;
+   label = "lan2";
+   };
+
+   switch0port5: port@5 {
+   reg = <5>;
+   label = "dsa";
+   phy-mode = "rgmii-txid";
+   link = <
+
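
As a reading aid for the "dsa,member" property documented above, here is a small userspace sketch of how the two cells could be interpreted. It is not taken from the patch series: the struct and function names are hypothetical, and treating an absent property as tree 0, switch 0 is an assumption of the sketch (the binding text only says that a stand-alone switch must not specify the property).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type: which cluster (tree) and which position in it. */
struct dsa_member_model {
	uint32_t tree;
	uint32_t index;
};

/*
 * cells/ncells model the raw property value. ncells == 0 means the
 * property is absent; the sketch then assumes tree 0, switch 0.
 * Any length other than 0 or 2 is rejected with -EINVAL.
 */
static int model_parse_member(const uint32_t *cells, size_t ncells,
			      struct dsa_member_model *out)
{
	assert(out != NULL);
	if (ncells == 0) {
		out->tree = 0;
		out->index = 0;
		return 0;
	}
	if (cells == NULL || ncells != 2)
		return -EINVAL;
	out->tree = cells[0];   /* first cell: DSA cluster */
	out->index = cells[1];  /* second cell: position within the cluster */
	return 0;
}
```

With this reading, <0 1> yields cluster 0, switch 1 and <1 0> yields cluster 1, switch 0, matching the examples given in the binding text.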

[RFC PATCH 00/16] New DSA bind, switches as devices

2016-05-26 Thread Andrew Lunn

This is an RFC patchset and should not be accepted yet.

The interesting patches here are the last three. They implement a new
binding for DSA, which removes a few limitations of the current DSA
binding. In particular, it allows switches to be true Linux devices.
These devices can be on any type of bus, unlike the old DSA binding
which assumes MDIO. See the commit log for more details. The second to
last patch modifies an existing board's device tree to use the new
binding, giving a good example of how switches can be true MDIO
devices. The last patch documents the new binding.

I know both John Crispin and Bryan Whitehead are interested in
implementing DSA drivers, hence I have CC'd you. Comments welcome.

Thanks go to Florian and Vivien for reviewing, testing and bug fixing
these patches.

Andrew Lunn (15):
  dsa: slave: chip data is optional, don't dereference NULL
  dsa: slave: Remove MDIO address from switch MDIO bus name
  dsa: tag_{e}dsa.c: Remove dependency on platform data
  dsa: Add a ports structure and use it in the switch structure
  dsa: Move port device node into port structure
  dsa: Remove dynamic allocate of routing table
  dsa: Copy the routing table into the switch structure
  dsa: dsa: Split up creating/destroying of DSA and CPU ports
  net: dsa: mv88e6xxx: Only support EDSA tagging
  net: dsa: Refactor selection of tag ops into a function
  dsa: Make mdio bus optional
  net: dsa: mv88e6xxx: Refactor MDIO so driver registers mdio bus
  net: dsa: Add new binding implementation
  arm: dt: vf610-zii-devel-b: Make use of new DSA binding
  dsa: Document new binding

Vivien Didelot (1):
  net: dsa: mv88e6xxx: fix circular lock in PPU work

 Documentation/devicetree/bindings/net/dsa/dsa.txt | 278 -
 arch/arm/boot/dts/vf610-zii-dev-rev-b.dts | 328 +--
 drivers/net/dsa/bcm_sf2.c |   4 +-
 drivers/net/dsa/mv88e6xxx.c   | 264 ++---
 drivers/net/dsa/mv88e6xxx.h   |   6 +
 include/net/dsa.h |  57 +-
 net/dsa/Makefile  |   2 +-
 net/dsa/dsa.c | 210 ---
 net/dsa/dsa2.c| 653 ++
 net/dsa/dsa_priv.h|   6 +-
 net/dsa/slave.c   |  57 +-
 net/dsa/tag_brcm.c|   4 +-
 net/dsa/tag_dsa.c |  10 +-
 net/dsa/tag_edsa.c|  10 +-
 net/dsa/tag_trailer.c |   4 +-
 15 files changed, 1485 insertions(+), 408 deletions(-)
 create mode 100644 net/dsa/dsa2.c

-- 
2.8.1

[RFC PATCH 05/16] dsa: Add a ports structure and use it in the switch structure

2016-05-26 Thread Andrew Lunn

There are going to be more per-port members added to the switch
structure. So add a port structure and move the netdev into it.

Signed-off-by: Andrew Lunn 
---
 drivers/net/dsa/bcm_sf2.c   |  4 ++--
 drivers/net/dsa/mv88e6xxx.c | 27 ---
 include/net/dsa.h   |  8 ++--
 net/dsa/dsa.c   |  8 
 net/dsa/slave.c |  4 ++--
 net/dsa/tag_brcm.c  |  4 ++--
 net/dsa/tag_dsa.c   |  4 ++--
 net/dsa/tag_edsa.c  |  4 ++--
 net/dsa/tag_trailer.c   |  4 ++--
 9 files changed, 38 insertions(+), 29 deletions(-)

diff --git a/drivers/net/dsa/bcm_sf2.c b/drivers/net/dsa/bcm_sf2.c
index 10ddd5a5dfb6..73df91bb0466 100644
--- a/drivers/net/dsa/bcm_sf2.c
+++ b/drivers/net/dsa/bcm_sf2.c
@@ -804,7 +804,7 @@ static int bcm_sf2_sw_fdb_dump(struct dsa_switch *ds, int port,
   int (*cb)(struct switchdev_obj *obj))
 {
struct bcm_sf2_priv *priv = ds_to_priv(ds);
-   struct net_device *dev = ds->ports[port];
+   struct net_device *dev = ds->ports[port].netdev;
struct bcm_sf2_arl_entry results[2];
unsigned int count = 0;
int ret;
@@ -1248,7 +1248,7 @@ static void bcm_sf2_sw_fixed_link_update(struct dsa_switch *ds, int port,
 * state machine and make it go in PHY_FORCING state instead.
 */
if (!status->link)
-   netif_carrier_off(ds->ports[port]);
+   netif_carrier_off(ds->ports[port].netdev);
status->duplex = 1;
} else {
status->link = 1;
diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
index c361036e7f9c..85332d9a245a 100644
--- a/drivers/net/dsa/mv88e6xxx.c
+++ b/drivers/net/dsa/mv88e6xxx.c
@@ -1327,7 +1327,7 @@ static int _mv88e6xxx_port_state(struct mv88e6xxx_priv_state *ps, int port,
if (ret)
return ret;
 
-   netdev_dbg(ds->ports[port], "PortState %s (was %s)\n",
+   netdev_dbg(ds->ports[port].netdev, "PortState %s (was %s)\n",
   mv88e6xxx_port_state_names[state],
   mv88e6xxx_port_state_names[oldstate]);
}
@@ -1405,7 +1405,8 @@ static void mv88e6xxx_port_stp_state_set(struct dsa_switch *ds, int port,
mutex_unlock(&ps->smi_mutex);
 
if (err)
-   netdev_err(ds->ports[port], "failed to update state to %s\n",
+   netdev_err(ds->ports[port].netdev,
+  "failed to update state to %s\n",
   mv88e6xxx_port_state_names[stp_state]);
 }
 
@@ -1431,8 +1432,8 @@ static int _mv88e6xxx_port_pvid(struct mv88e6xxx_priv_state *ps, int port,
if (ret < 0)
return ret;
 
-   netdev_dbg(ds->ports[port], "DefaultVID %d (was %d)\n", *new,
-  pvid);
+   netdev_dbg(ds->ports[port].netdev,
+  "DefaultVID %d (was %d)\n", *new, pvid);
}
 
if (old)
@@ -1847,7 +1848,8 @@ static int _mv88e6xxx_port_fid(struct mv88e6xxx_priv_state *ps, int port,
if (ret < 0)
return ret;
 
-   netdev_dbg(ds->ports[port], "FID %d (was %d)\n", *new, fid);
+   netdev_dbg(ds->ports[port].netdev,
+  "FID %d (was %d)\n", *new, fid);
}
 
if (old)
@@ -2028,7 +2030,7 @@ static int mv88e6xxx_port_check_hw_vlan(struct dsa_switch *ds, int port,
ps->ports[port].bridge_dev)
break; /* same bridge, check next VLAN */
 
-   netdev_warn(ds->ports[port],
+   netdev_warn(ds->ports[port].netdev,
"hardware VLAN %d already used by %s\n",
vlan.vid,
netdev_name(ps->ports[i].bridge_dev));
@@ -2078,7 +2080,7 @@ static int mv88e6xxx_port_vlan_filtering(struct dsa_switch *ds, int port,
if (ret < 0)
goto unlock;
 
-   netdev_dbg(ds->ports[port], "802.1Q Mode %s (was %s)\n",
+   netdev_dbg(ds->ports[port].netdev, "802.1Q Mode %s (was %s)\n",
   mv88e6xxx_port_8021q_mode_names[new],
   mv88e6xxx_port_8021q_mode_names[old]);
}
@@ -2147,11 +2149,12 @@ static void mv88e6xxx_port_vlan_add(struct dsa_switch *ds, int port,
 
for (vid = vlan->vid_begin; vid <= vlan->vid_end; ++vid)
if (_mv88e6xxx_port_vlan_add(ps, port, vid, untagged))
-   netdev_err(ds->ports[port], "failed to add VLAN %d%c\n",
+   netdev_err(ds->ports[port].netdev,
+  "failed to add VLAN %d%c\n",
   vid, untagged ? 'u' : 't');
 
if (pvid

[RFC PATCH 09/16] dsa: dsa: Split up creating/destroying of DSA and CPU ports

2016-05-26 Thread Andrew Lunn

Refactor the code to set up a single DSA/CPU port into a function of
its own, and export it, so it can be used by the new binding.

Similarly, refactor the destroy code into a function.

Signed-off-by: Andrew Lunn 
---
 net/dsa/dsa.c  | 86 --
 net/dsa/dsa_priv.h |  3 ++
 2 files changed, 54 insertions(+), 35 deletions(-)

diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index bfe1d03d4730..20eede3facf5 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -180,36 +180,47 @@ __ATTRIBUTE_GROUPS(dsa_hwmon);
 #endif /* CONFIG_NET_DSA_HWMON */
 
 /* basic switch operations **/
-static int dsa_cpu_dsa_setup(struct dsa_switch *ds, struct net_device *master)
+int dsa_cpu_dsa_setup(struct dsa_switch *ds, struct device *dev,
+ struct device_node *port_dn, int port)
 {
-   struct device_node *port_dn;
struct phy_device *phydev;
-   int ret, port, mode;
+   int ret, mode;
+
+   if (of_phy_is_fixed_link(port_dn)) {
+   ret = of_phy_register_fixed_link(port_dn);
+   if (ret) {
+   dev_err(dev, "failed to register fixed PHY\n");
+   return ret;
+   }
+   phydev = of_phy_find_device(port_dn);
+
+   mode = of_get_phy_mode(port_dn);
+   if (mode < 0)
+   mode = PHY_INTERFACE_MODE_NA;
+   phydev->interface = mode;
+
+   genphy_config_init(phydev);
+   genphy_read_status(phydev);
+   if (ds->drv->adjust_link)
+   ds->drv->adjust_link(ds, port, phydev);
+   }
+
+   return 0;
+}
+
+static int dsa_cpu_dsa_setups(struct dsa_switch *ds, struct device *dev)
+{
+   struct device_node *port_dn;
+   int ret, port;
 
for (port = 0; port < DSA_MAX_PORTS; port++) {
if (!(dsa_is_cpu_port(ds, port) || dsa_is_dsa_port(ds, port)))
continue;
 
port_dn = ds->ports[port].dn;
-   if (of_phy_is_fixed_link(port_dn)) {
-   ret = of_phy_register_fixed_link(port_dn);
-   if (ret) {
-   netdev_err(master,
-  "failed to register fixed PHY\n");
-   return ret;
-   }
-   phydev = of_phy_find_device(port_dn);
-
-   mode = of_get_phy_mode(port_dn);
-   if (mode < 0)
-   mode = PHY_INTERFACE_MODE_NA;
-   phydev->interface = mode;
-
-   genphy_config_init(phydev);
-   genphy_read_status(phydev);
-   if (ds->drv->adjust_link)
-   ds->drv->adjust_link(ds, port, phydev);
-   }
+   ret = dsa_cpu_dsa_setup(ds, dev, port_dn, port);
+   if (ret)
+   return ret;
}
return 0;
 }
@@ -340,7 +351,7 @@ static int dsa_switch_setup_one(struct dsa_switch *ds, struct device *parent)
}
 
/* Perform configuration of the CPU and DSA ports */
-   ret = dsa_cpu_dsa_setup(ds, dst->master_netdev);
+   ret = dsa_cpu_dsa_setups(ds, parent);
if (ret < 0) {
netdev_err(dst->master_netdev, "[%d] : can't configure CPU and DSA ports\n",
   index);
@@ -423,10 +434,21 @@ dsa_switch_setup(struct dsa_switch_tree *dst, int index,
return ds;
 }
 
-static void dsa_switch_destroy(struct dsa_switch *ds)
+void dsa_cpu_dsa_destroy(struct device_node *port_dn)
 {
-   struct device_node *port_dn;
struct phy_device *phydev;
+
+   if (of_phy_is_fixed_link(port_dn)) {
+   phydev = of_phy_find_device(port_dn);
+   if (phydev) {
+   phy_device_free(phydev);
+   fixed_phy_unregister(phydev);
+   }
+   }
+}
+
+static void dsa_switch_destroy(struct dsa_switch *ds)
+{
int port;
 
 #ifdef CONFIG_NET_DSA_HWMON
@@ -445,17 +467,11 @@ static void dsa_switch_destroy(struct dsa_switch *ds)
dsa_slave_destroy(ds->ports[port].netdev);
}
 
-   /* Remove any fixed link PHYs */
+   /* Disable configuration of the CPU and DSA ports */
for (port = 0; port < DSA_MAX_PORTS; port++) {
-   port_dn = ds->ports[port].dn;
-   if (of_phy_is_fixed_link(port_dn)) {
-   phydev = of_phy_find_device(port_dn);
-   if (phydev) {
-   phy_device_free(phydev);
-   of_node_put(port_dn);
-   fixed_phy_unregister(phydev);
-   }
-   }
+   if ((dsa_is_cpu_port(ds, port) ||

[RFC PATCH 01/16] dsa: slave: chip data is optional, don't dereference NULL

2016-05-26 Thread Andrew Lunn

The new binding does not make use of dsa_chip_data, a.k.a. cd.  When
retrieving the size of the EEPROM attached to a switch, don't assume
there is a cd attached to the switch structure.
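
As a rough illustration of the guarded lookup described above, here is a
minimal user-space sketch. The structure and function names are hypothetical
stand-ins for illustration only, not the kernel's definitions; the point is
simply that the optional pointer is tested before it is dereferenced, and
that the driver callback remains the fallback.

```c
#include <stddef.h>

/* Hypothetical, trimmed-down stand-ins for the kernel structures. */
struct chip_data_sketch {
	int eeprom_len;
};

struct eeprom_switch_sketch {
	struct chip_data_sketch *cd;	/* may be NULL, e.g. with the new binding */
	int (*get_eeprom_len)(void);	/* optional driver callback */
};

/* Return the EEPROM length, tolerating a missing chip data structure. */
static int eeprom_len_sketch(const struct eeprom_switch_sketch *ds)
{
	/* Only look at cd->eeprom_len when cd is actually present. */
	if (ds->cd && ds->cd->eeprom_len)
		return ds->cd->eeprom_len;

	if (ds->get_eeprom_len)
		return ds->get_eeprom_len();

	return 0;
}
```

With no chip data and no callback the sketch reports a length of 0 instead
of faulting.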

Signed-off-by: Andrew Lunn 
---
 net/dsa/slave.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 152436cdab30..135a91706755 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -615,7 +615,7 @@ static int dsa_slave_get_eeprom_len(struct net_device *dev)
struct dsa_slave_priv *p = netdev_priv(dev);
struct dsa_switch *ds = p->parent;
 
-   if (ds->cd->eeprom_len)
+   if (ds->cd && ds->cd->eeprom_len)
return ds->cd->eeprom_len;
 
if (ds->drv->get_eeprom_len)
-- 
2.8.1

[RFC PATCH 03/16] dsa: slave: Remove MDIO address from switch MDIO bus name

2016-05-26 Thread Andrew Lunn

The DSA layer should no longer assume the switch is connected to an
MDIO bus. As a result, we cannot use the address on the MDIO bus when
forming the name of the switch's internal MDIO bus for its built-in and
possibly external PHYs. The switch index is sufficient to make the
name unique, so drop the MDIO address.
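
For illustration, the resulting bus id can be sketched in plain C as below.
The buffer size and the helper name are assumptions made for this example
(the kernel uses MII_BUS_ID_SIZE and writes into ds->slave_mii_bus->id); only
the "dsa-%d" format string is taken from the patch.

```c
#include <stdio.h>

/* Assumed buffer size for the sketch; the kernel uses MII_BUS_ID_SIZE. */
#define BUS_ID_SIZE_SKETCH 32

/* Build the bus id from the switch index alone, without an MDIO address. */
static void format_bus_id_sketch(char *id, size_t len, int switch_index)
{
	snprintf(id, len, "dsa-%d", switch_index);
}
```

Because each switch in a tree has its own index, two switches get distinct
ids without needing to know what bus they sit on.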

Signed-off-by: Andrew Lunn 
---
 net/dsa/slave.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 135a91706755..f640a48a6ff3 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -49,8 +49,7 @@ void dsa_slave_mii_bus_init(struct dsa_switch *ds)
ds->slave_mii_bus->name = "dsa slave smi";
ds->slave_mii_bus->read = dsa_slave_phy_read;
ds->slave_mii_bus->write = dsa_slave_phy_write;
-   snprintf(ds->slave_mii_bus->id, MII_BUS_ID_SIZE, "dsa-%d:%.2x",
-   ds->index, ds->cd->sw_addr);
+   snprintf(ds->slave_mii_bus->id, MII_BUS_ID_SIZE, "dsa-%d", ds->index);
ds->slave_mii_bus->parent = ds->dev;
ds->slave_mii_bus->phy_mask = ~ds->phys_mii_mask;
 }
-- 
2.8.1

[RFC PATCH 07/16] dsa: Remove dynamic allocate of routing table

2016-05-26 Thread Andrew Lunn

With a maximum of four switches, the routing table is no larger than
the pointer to it. Removing the dynamic allocation makes the code simpler.
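
The size argument can be checked with a small user-space sketch. The names
below are hypothetical, and the limit of four switches is taken from the
text above. Note that an embedded array can never be NULL, so the sketch
marks unused entries with -1 explicitly; that sentinel is an assumption
carried over from the allocation code this patch removes.

```c
#include <stdint.h>
#include <string.h>

#define MAX_SWITCHES_SKETCH 4	/* stands in for DSA_MAX_SWITCHES */

/* Hypothetical stand-in for the chip data with an embedded table. */
struct chip_rtable_sketch {
	int8_t rtable[MAX_SWITCHES_SKETCH];	/* 4 bytes, no allocation */
};

/* Mark every entry as "no valid uplink/downlink". */
static void rtable_init_sketch(struct chip_rtable_sketch *cd)
{
	memset(cd->rtable, -1, sizeof(cd->rtable));
}
```

Four one-byte entries take four bytes, which is at most the size of the
pointer that used to refer to the allocated table.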

Signed-off-by: Andrew Lunn 
---
 drivers/net/dsa/mv88e6xxx.c |  3 +--
 include/net/dsa.h   | 10 +-
 net/dsa/dsa.c   | 12 
 3 files changed, 6 insertions(+), 19 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
index 85332d9a245a..d622c0fb76cc 100644
--- a/drivers/net/dsa/mv88e6xxx.c
+++ b/drivers/net/dsa/mv88e6xxx.c
@@ -3024,8 +3024,7 @@ static int mv88e6xxx_setup_global(struct mv88e6xxx_priv_state *ps)
for (i = 0; i < 32; i++) {
int nexthop = 0x1f;
 
-   if (ps->ds->cd->rtable &&
-   i != ps->ds->index && i < ps->ds->dst->pd->nr_chips)
+   if (i != ps->ds->index && i < ps->ds->dst->pd->nr_chips)
nexthop = ps->ds->cd->rtable[i] & 0x1f;
 
err = _mv88e6xxx_reg_write(
diff --git a/include/net/dsa.h b/include/net/dsa.h
index 8314197d028f..37e8e179d85a 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -58,12 +58,12 @@ struct dsa_chip_data {
struct device_node *port_dn[DSA_MAX_PORTS];
 
/*
-* An array (with nr_chips elements) of which element [a]
-* indicates which port on this switch should be used to
-* send packets to that are destined for switch a.  Can be
-* NULL if there is only one switch chip.
+* An array of which element [a] indicates which port on this
+* switch should be used to send packets to that are destined
+* for switch a.  Can be NULL if there is only one switch
+* chip.
 */
-   s8  *rtable;
+   s8  rtable[DSA_MAX_SWITCHES];
 };
 
 struct dsa_platform_data {
diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index 5907f8cd13b6..6177dd750847 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -587,17 +587,6 @@ static int dsa_of_setup_routing_table(struct dsa_platform_data *pd,
if (link_sw_addr >= pd->nr_chips)
return -EINVAL;
 
-   /* First time routing table allocation */
-   if (!cd->rtable) {
-   cd->rtable = kmalloc_array(pd->nr_chips, sizeof(s8),
-  GFP_KERNEL);
-   if (!cd->rtable)
-   return -ENOMEM;
-
-   /* default to no valid uplink/downlink */
-   memset(cd->rtable, -1, pd->nr_chips * sizeof(s8));
-   }
-
cd->rtable[link_sw_addr] = port_index;
 
return 0;
@@ -639,7 +628,6 @@ static void dsa_of_free_platform_data(struct dsa_platform_data *pd)
kfree(pd->chip[i].port_names[port_index]);
port_index++;
}
-   kfree(pd->chip[i].rtable);
 
/* Drop our reference to the MDIO bus device */
if (pd->chip[i].host_dev)
-- 
2.8.1

[RFC PATCH 06/16] dsa: Move port device node into port structure

2016-05-26 Thread Andrew Lunn

Move the port device node structure into the port structure, from the
chip data. This information is needed in the next step of implementing
the new binding.

The chip data structure is used while parsing the whole old binding,
before the individual switch structures exist. With the new bindings,
this is reversed, the switches exist first, and the interconnections
between the switches is derived from the individual switch
bindings. Thus this chip data structure becomes unneeded.

Signed-off-by: Andrew Lunn 
---
 include/net/dsa.h | 1 +
 net/dsa/dsa.c | 8 
 net/dsa/slave.c   | 5 ++---
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/include/net/dsa.h b/include/net/dsa.h
index 9aed8572037c..8314197d028f 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -121,6 +121,7 @@ struct dsa_switch_tree {
 
 struct dsa_port {
struct net_device   *netdev;
+   struct device_node  *dn;
 };
 
 struct dsa_switch {
diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index 18086e0cc617..5907f8cd13b6 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -182,7 +182,6 @@ __ATTRIBUTE_GROUPS(dsa_hwmon);
 /* basic switch operations **/
 static int dsa_cpu_dsa_setup(struct dsa_switch *ds, struct net_device *master)
 {
-   struct dsa_chip_data *cd = ds->cd;
struct device_node *port_dn;
struct phy_device *phydev;
int ret, port, mode;
@@ -191,7 +190,7 @@ static int dsa_cpu_dsa_setup(struct dsa_switch *ds, struct net_device *master)
if (!(dsa_is_cpu_port(ds, port) || dsa_is_dsa_port(ds, port)))
continue;
 
-   port_dn = cd->port_dn[port];
+   port_dn = ds->ports[port].dn;
if (of_phy_is_fixed_link(port_dn)) {
ret = of_phy_register_fixed_link(port_dn);
if (ret) {
@@ -325,6 +324,8 @@ static int dsa_switch_setup_one(struct dsa_switch *ds, struct device *parent)
 * Create network devices for physical switch ports.
 */
for (i = 0; i < DSA_MAX_PORTS; i++) {
+   ds->ports[i].dn = cd->port_dn[i];
+
if (!(ds->enabled_port_mask & (1 << i)))
continue;
 
@@ -424,7 +425,6 @@ static void dsa_switch_destroy(struct dsa_switch *ds)
 {
struct device_node *port_dn;
struct phy_device *phydev;
-   struct dsa_chip_data *cd = ds->cd;
int port;
 
 #ifdef CONFIG_NET_DSA_HWMON
@@ -445,7 +445,7 @@ static void dsa_switch_destroy(struct dsa_switch *ds)
 
/* Remove any fixed link PHYs */
for (port = 0; port < DSA_MAX_PORTS; port++) {
-   port_dn = cd->port_dn[port];
+   port_dn = ds->ports[port].dn;
if (of_phy_is_fixed_link(port_dn)) {
phydev = of_phy_find_device(port_dn);
if (phydev) {
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 169abacbc6ce..52f1183c42a0 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -998,13 +998,12 @@ static int dsa_slave_phy_setup(struct dsa_slave_priv *p,
struct net_device *slave_dev)
 {
struct dsa_switch *ds = p->parent;
-   struct dsa_chip_data *cd = ds->cd;
struct device_node *phy_dn, *port_dn;
bool phy_is_fixed = false;
u32 phy_flags = 0;
int mode, ret;
 
-   port_dn = cd->port_dn[p->port];
+   port_dn = ds->ports[p->port].dn;
mode = of_get_phy_mode(port_dn);
if (mode < 0)
mode = PHY_INTERFACE_MODE_NA;
@@ -1146,7 +1145,7 @@ int dsa_slave_create(struct dsa_switch *ds, struct device *parent,
 NULL);
 
SET_NETDEV_DEV(slave_dev, parent);
-   slave_dev->dev.of_node = ds->cd->port_dn[port];
+   slave_dev->dev.of_node = ds->ports[port].dn;
slave_dev->vlan_features = master->vlan_features;
 
p = netdev_priv(slave_dev);
-- 
2.8.1

[RFC PATCH 04/16] dsa: tag_{e}dsa.c: Remove dependency on platform data

2016-05-26 Thread Andrew Lunn

The platform data nr_chips is used when validating a received packet,
to ensure it comes from a known switch chip. The number of possible
switches is limited to DSA_MAX_SWITCHES, so use this as the first
validation step. The new binding allows holes in the dst->ds[] array,
so also ensure there is a valid dsa_switch for this packet.
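
The two validation steps can be sketched outside the kernel as follows. All
type and function names here are hypothetical, and the limits of 4 switches
and 12 ports are assumed values standing in for DSA_MAX_SWITCHES and
DSA_MAX_PORTS. The order of the checks is what matters: bound the index
first, then test the slot, then look at the port.

```c
#include <stddef.h>

#define MAX_SWITCHES_SKETCH 4	/* stands in for DSA_MAX_SWITCHES */
#define MAX_PORTS_SKETCH 12	/* stands in for DSA_MAX_PORTS */

struct rx_switch_sketch {
	const char *ports[MAX_PORTS_SKETCH];	/* NULL means unused port */
};

struct rx_tree_sketch {
	/* The array may contain holes, so any slot can be NULL. */
	struct rx_switch_sketch *ds[MAX_SWITCHES_SKETCH];
};

/* Return the receiving port's name, or NULL if the frame must be dropped. */
static const char *rx_lookup_sketch(const struct rx_tree_sketch *dst,
				    unsigned int source_device,
				    unsigned int source_port)
{
	const struct rx_switch_sketch *ds;

	/* Step 1: the index must fit the fixed-size array. */
	if (source_device >= MAX_SWITCHES_SKETCH)
		return NULL;

	/* Step 2: the slot may be a hole. */
	ds = dst->ds[source_device];
	if (!ds)
		return NULL;

	if (source_port >= MAX_PORTS_SKETCH)
		return NULL;

	return ds->ports[source_port];
}
```

A tree with a single switch registered at index 2 accepts frames from
device 2 and drops frames that claim to come from devices 0, 1 or 3.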

Signed-off-by: Andrew Lunn 
---
 net/dsa/tag_dsa.c  | 6 +-
 net/dsa/tag_edsa.c | 6 +-
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/net/dsa/tag_dsa.c b/net/dsa/tag_dsa.c
index aa780e4ac0bd..f9832f097681 100644
--- a/net/dsa/tag_dsa.c
+++ b/net/dsa/tag_dsa.c
@@ -107,9 +107,13 @@ static int dsa_rcv(struct sk_buff *skb, struct net_device *dev,
 * Check that the source device exists and that the source
 * port is a registered DSA port.
 */
-   if (source_device >= dst->pd->nr_chips)
+   if (source_device >= DSA_MAX_SWITCHES)
goto out_drop;
+
ds = dst->ds[source_device];
+   if (!ds)
+   goto out_drop;
+
if (source_port >= DSA_MAX_PORTS || ds->ports[source_port] == NULL)
goto out_drop;
 
diff --git a/net/dsa/tag_edsa.c b/net/dsa/tag_edsa.c
index 2288c8098c42..3890aac8190f 100644
--- a/net/dsa/tag_edsa.c
+++ b/net/dsa/tag_edsa.c
@@ -120,9 +120,13 @@ static int edsa_rcv(struct sk_buff *skb, struct net_device *dev,
 * Check that the source device exists and that the source
 * port is a registered DSA port.
 */
-   if (source_device >= dst->pd->nr_chips)
+   if (source_device >= DSA_MAX_SWITCHES)
goto out_drop;
+
ds = dst->ds[source_device];
+   if (!ds)
+   goto out_drop;
+
if (source_port >= DSA_MAX_PORTS || ds->ports[source_port] == NULL)
goto out_drop;
 
-- 
2.8.1

[RFC PATCH 08/16] dsa: Copy the routing table into the switch structure

2016-05-26 Thread Andrew Lunn

The new binding will not have a chip data structure; it will place the
routing directly into the switch structure. To enable backwards
compatibility, copy the routing from the chip data into the switch
structure.
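
A minimal sketch of the copy and of the lookup that then uses the switch's
own table is shown below. The structures are simplified, hypothetical
stand-ins, and the CPU switch and port are passed as plain arguments here
rather than read from the switch tree as the kernel does.

```c
#include <stdint.h>
#include <string.h>

#define MAX_SWITCHES_SKETCH 4	/* stands in for DSA_MAX_SWITCHES */

struct old_chip_sketch {
	int8_t rtable[MAX_SWITCHES_SKETCH];
};

struct new_switch_sketch {
	int index;
	int8_t rtable[MAX_SWITCHES_SKETCH];
};

/* Old binding path: copy the routing from chip data into the switch. */
static void copy_rtable_sketch(struct new_switch_sketch *ds,
			       const struct old_chip_sketch *cd)
{
	memcpy(ds->rtable, cd->rtable, sizeof(ds->rtable));
}

/* Port to use when sending towards the CPU, read from the switch itself. */
static int upstream_port_sketch(const struct new_switch_sketch *ds,
				int cpu_switch, int cpu_port)
{
	if (cpu_switch == ds->index)
		return cpu_port;

	return ds->rtable[cpu_switch];
}
```

After the copy, code that needs a route only touches the switch structure,
so it behaves the same whether the table came from chip data or from the new
binding.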

Signed-off-by: Andrew Lunn 
---
 drivers/net/dsa/mv88e6xxx.c |  4 ++--
 include/net/dsa.h   | 10 +-
 net/dsa/dsa.c   |  2 ++
 3 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
index d622c0fb76cc..492801a6398c 100644
--- a/drivers/net/dsa/mv88e6xxx.c
+++ b/drivers/net/dsa/mv88e6xxx.c
@@ -3024,8 +3024,8 @@ static int mv88e6xxx_setup_global(struct mv88e6xxx_priv_state *ps)
for (i = 0; i < 32; i++) {
int nexthop = 0x1f;
 
-   if (i != ps->ds->index && i < ps->ds->dst->pd->nr_chips)
-   nexthop = ps->ds->cd->rtable[i] & 0x1f;
+   if (i != ds->index && i < DSA_MAX_SWITCHES)
+   nexthop = ds->rtable[i] & 0x1f;
 
err = _mv88e6xxx_reg_write(
ps, REG_GLOBAL2,
diff --git a/include/net/dsa.h b/include/net/dsa.h
index 37e8e179d85a..ea0e7d30342b 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -149,6 +149,14 @@ struct dsa_switch {
 */
struct dsa_switch_driver*drv;
 
+   /*
+* An array of which element [a] indicates which port on this
+* switch should be used to send packets to that are destined
+* for switch a.  Can be NULL if there is only one switch
+* chip.
+*/
+   s8  rtable[DSA_MAX_SWITCHES];
+
 #ifdef CONFIG_NET_DSA_HWMON
/*
 * Hardware monitoring information
@@ -195,7 +203,7 @@ static inline u8 dsa_upstream_port(struct dsa_switch *ds)
if (dst->cpu_switch == ds->index)
return dst->cpu_port;
else
-   return ds->cd->rtable[dst->cpu_switch];
+   return ds->rtable[dst->cpu_switch];
 }
 
 struct switchdev_trans;
diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index 6177dd750847..bfe1d03d4730 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -297,6 +297,8 @@ static int dsa_switch_setup_one(struct dsa_switch *ds, struct device *parent)
dst->tag_protocol = drv->tag_protocol;
}
 
+   memcpy(ds->rtable, cd->rtable, sizeof(ds->rtable));
+
/*
 * Do basic register setup.
 */
-- 
2.8.1

[RFC PATCH 02/16] net: dsa: mv88e6xxx: fix circular lock in PPU work

2016-05-26 Thread Andrew Lunn

From: Vivien Didelot 

Lock debugging shows that there is a possible circular lock in the PPU
work code. Switch the lock order of smi_mutex and ppu_mutex to fix this.
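
To make the failure mode concrete, here is a toy order tracker in plain C.
It is only a sketch of the idea behind the report, not how lockdep is
implemented, and every name in it is hypothetical. It remembers which lock
was taken while another was held and flags the first acquisition that
inverts an order seen earlier.

```c
#include <stdbool.h>

enum lock_id_sketch { LOCK_SMI, LOCK_PPU, LOCK_COUNT_SKETCH };

struct order_tracker_sketch {
	/* seen[a][b] is true once b has been taken while a was held. */
	bool seen[LOCK_COUNT_SKETCH][LOCK_COUNT_SKETCH];
};

/*
 * Record "inner taken while outer held". Returns false if the opposite
 * order was recorded earlier, i.e. the two orders together could deadlock.
 */
static bool record_order_sketch(struct order_tracker_sketch *t,
				enum lock_id_sketch outer,
				enum lock_id_sketch inner)
{
	if (t->seen[inner][outer])
		return false;

	t->seen[outer][inner] = true;
	return true;
}
```

In the trace below, the PHY read path takes ppu_mutex while holding
smi_mutex, and the PPU work takes smi_mutex while holding ppu_mutex; feeding
those two orders to the tracker makes the second call fail. If every path
uses the same order, no call fails.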

Here's the full trace:

[4.341325] ======================================================
[4.347519] [ INFO: possible circular locking dependency detected ]
[4.353800] 4.6.0 #4 Not tainted
[4.357039] -------------------------------------------------------
[4.363315] kworker/0:1/328 is trying to acquire lock:
[4.368463]  (&ps->smi_mutex){+.+.+.}, at: [<8049c758>] mv88e6xxx_reg_read+0x30/0x54
[4.376313]
[4.376313] but task is already holding lock:
[4.382160]  (&ps->ppu_mutex){+.+...}, at: [<8049cac0>] mv88e6xxx_ppu_reenable_work+0x28/0xd4
[4.390772]
[4.390772] which lock already depends on the new lock.
[4.390772]
[4.398963]
[4.398963] the existing dependency chain (in reverse order) is:
[4.406461]
[4.406461] -> #1 (&ps->ppu_mutex){+.+...}:
[4.410897][<806d86bc>] mutex_lock_nested+0x54/0x360
[4.416606][<8049a800>] mv88e6xxx_ppu_access_get+0x28/0x100
[4.422906][<8049b778>] mv88e6xxx_phy_read+0x90/0xdc
[4.428599][<806a4534>] dsa_slave_phy_read+0x3c/0x40
[4.434300][<804943ec>] mdiobus_read+0x68/0x80
[4.439481][<804939d4>] get_phy_device+0x58/0x1d8
[4.444914][<80493ed0>] mdiobus_scan+0x24/0xf4
[4.450078][<8049409c>] __mdiobus_register+0xfc/0x1ac
[4.455857][<806a40b0>] dsa_probe+0x860/0xca8
[4.460934][<8043246c>] platform_drv_probe+0x5c/0xc0
[4.466627][<804305a0>] driver_probe_device+0x118/0x450
[4.472589][<80430b00>] __device_attach_driver+0xac/0x128
[4.478724][<8042e350>] bus_for_each_drv+0x74/0xa8
[4.484235][<804302d8>] __device_attach+0xc4/0x154
[4.489755][<80430cec>] device_initial_probe+0x1c/0x20
[4.495612][<8042f620>] bus_probe_device+0x98/0xa0
[4.501123][<8042fbd0>] deferred_probe_work_func+0x4c/0xd4
[4.507328][<8013a794>] process_one_work+0x1a8/0x604
[4.513030][<8013ac54>] worker_thread+0x64/0x528
[4.518367][<801409e8>] kthread+0xec/0x100
[4.523201][<80108f30>] ret_from_fork+0x14/0x24
[4.528462]
[4.528462] -> #0 (&ps->smi_mutex){+.+.+.}:
[4.532895][<8015ad5c>] lock_acquire+0xb4/0x1dc
[4.538154][<806d86bc>] mutex_lock_nested+0x54/0x360
[4.543856][<8049c758>] mv88e6xxx_reg_read+0x30/0x54
[4.549549][<8049cad8>] mv88e6xxx_ppu_reenable_work+0x40/0xd4
[4.556022][<8013a794>] process_one_work+0x1a8/0x604
[4.561707][<8013ac54>] worker_thread+0x64/0x528
[4.567053][<801409e8>] kthread+0xec/0x100
[4.571878][<80108f30>] ret_from_fork+0x14/0x24
[4.577139]
[4.577139] other info that might help us debug this:
[4.577139]
[4.585159]  Possible unsafe locking scenario:
[4.585159]
[4.591093]        CPU0                    CPU1
[4.595631]        ----                    ----
[4.600169]   lock(&ps->ppu_mutex);
[4.603693]                                lock(&ps->smi_mutex);
[4.609742]                                lock(&ps->ppu_mutex);
[4.615790]   lock(&ps->smi_mutex);
[4.619314]
[4.619314]  *** DEADLOCK ***
[4.619314]
[4.625256] 3 locks held by kworker/0:1/328:
[4.629537]  #0:  ("events"){.+.+..}, at: [<8013a704>] process_one_work+0x118/0x604
[4.637288]  #1:  ((&ps->ppu_work)){+.+...}, at: [<8013a704>] process_one_work+0x118/0x604
[4.645653]  #2:  (&ps->ppu_mutex){+.+...}, at: [<8049cac0>] mv88e6xxx_ppu_reenable_work+0x28/0xd4
[4.654714]
[4.654714] stack backtrace:
[4.659098] CPU: 0 PID: 328 Comm: kworker/0:1 Not tainted 4.6.0 #4
[4.665286] Hardware name: Freescale Vybrid VF5xx/VF6xx (Device Tree)
[4.671748] Workqueue: events mv88e6xxx_ppu_reenable_work
[4.677174] Backtrace:
[4.679674] [<8010d354>] (dump_backtrace) from [<8010d5a0>] (show_stack+0x20/0x24)
[4.687252]  r6:80fb3c88 r5:80fb3c88 r4:80fb4728 r3:0002
[4.693003] [<8010d580>] (show_stack) from [<803b45e8>] (dump_stack+0x24/0x28)
[4.700246] [<803b45c4>] (dump_stack) from [<80157398>] (print_circular_bug+0x208/0x32c)
[4.708361] [<80157190>] (print_circular_bug) from [<8015a630>] (__lock_acquire+0x185c/0x1b80)
[4.716982]  r10:9ec22a00 r9:0060 r8:8164b6bc r7:0040 r6:0003 r5:8163a5b4
[4.724905]  r4:0003 r3:9ec22de8
[4.728537] [<80158dd4>] (__lock_acquire) from [<8015ad5c>] (lock_acquire+0xb4/0x1dc)
[4.736378]  r10:6013

[PATCH 2/2] net: pktgen: Call destroy_hrtimer_on_stack()

2016-05-26 Thread Guenter Roeck

If CONFIG_DEBUG_OBJECTS_TIMERS=y, hrtimer_init_on_stack() requires
a matching call to destroy_hrtimer_on_stack() to clean up timer
debug objects.
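
The pairing requirement is what motivates the single exit label in the diff
below. A stripped-down sketch of that shape, with hypothetical counting
stubs in place of the real hrtimer calls, looks like this:

```c
/* Hypothetical stand-ins that only count calls; not the hrtimer API. */
static int live_timers_sketch;

static void timer_init_on_stack_sketch(void)
{
	live_timers_sketch++;
}

static void timer_destroy_on_stack_sketch(void)
{
	live_timers_sketch--;
}

/* Returns the next transmit time; every path leaves through "out". */
static long spin_sketch(long remaining_ns, long spin_until, long delay)
{
	long next_tx;

	timer_init_on_stack_sketch();

	if (remaining_ns <= 0)
		goto out;	/* the early exit still reaches the destroy call */

	/* ... the real function would sleep or busy-wait here ... */

out:
	next_tx = spin_until + delay;
	timer_destroy_on_stack_sketch();
	return next_tx;
}
```

Whichever branch is taken, the init and destroy counts stay balanced, which
is the property the debug objects code checks for.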

Signed-off-by: Guenter Roeck 
---
 net/core/pktgen.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index 8604ae245960..8b02df0d354d 100644
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -2245,10 +2245,8 @@ static void spin(struct pktgen_dev *pkt_dev, ktime_t spin_until)
hrtimer_set_expires(&t.timer, spin_until);
 
remaining = ktime_to_ns(hrtimer_expires_remaining(&t.timer));
-   if (remaining <= 0) {
-   pkt_dev->next_tx = ktime_add_ns(spin_until, pkt_dev->delay);
-   return;
-   }
+   if (remaining <= 0)
+   goto out;
 
start_time = ktime_get();
if (remaining < 10) {
@@ -2273,7 +2271,9 @@ static void spin(struct pktgen_dev *pkt_dev, ktime_t spin_until)
}
 
pkt_dev->idle_acc += ktime_to_ns(ktime_sub(end_time, start_time));
+out:
pkt_dev->next_tx = ktime_add_ns(spin_until, pkt_dev->delay);
+   destroy_hrtimer_on_stack(&t.timer);
 }
 
 static inline void set_pkt_overhead(struct pktgen_dev *pkt_dev)
-- 
2.5.0

[PATCH 1/2] timer: Export destroy_hrtimer_on_stack()

2016-05-26 Thread Guenter Roeck

hrtimer_init_on_stack() needs a matching call to
destroy_hrtimer_on_stack(), so both need to be exported.

Signed-off-by: Guenter Roeck 
---
 kernel/time/hrtimer.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 8c7392c4fdbd..e99df0ff1d42 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -425,6 +425,7 @@ void destroy_hrtimer_on_stack(struct hrtimer *timer)
 {
debug_object_free(timer, &hrtimer_debug_descr);
 }
+EXPORT_SYMBOL_GPL(destroy_hrtimer_on_stack);
 
 #else
 static inline void debug_hrtimer_init(struct hrtimer *timer) { }
-- 
2.5.0

[PATCH][RT] netpoll: Always take poll_lock when doing polling

2016-05-26 Thread Steven Rostedt

[ Alison, can you try this patch ]

This uses netpoll_poll_lock()/unlock() to synchronize netpoll and napi
poll operations. Without this method, the synchronization is done by
looping on NAPI_STATE_SCHED 'bitset'. This method works fine on a non-rt
kernel because a softirq can not be preempted, and the thread poll is
called with local_bh_disable() which prevents softirqs from running and
preempting it. But on rt, this code can be preempted.  Thus, the code may
be preempted out while holding the NAPI_STATE_SCHED 'bitset', opening a
window for a livelock.

For example:

   

   napi_schedule_prep()
test_and_set_bit(NAPI_STATE_SCHED, &n->state)

   

   sk_busy_loop()

  do {
   rc = busy_poll()
   ret = napi_schedule_prep()
return !test_and_set_bit(NAPI_STATE_SCHED, &n->state)
 
   if (!ret) return 0
   
  } while (...) /* for ever */

This isn't a problem in non PREEMPT_RT because the napi_schedule_prep()
can not be preempted. But because it can in PREEMPT_RT, we need to add
some extra locking. netpoll_poll_lock()/unlock() work well here, but they
need to be added around any call to busy_poll().

Using IS_ENABLED(CONFIG_PREEMPT_RT_FULL) will allow gcc to optimize out
the extra calls to poll_lock.
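
The effect of a compile-time constant in the condition can be sketched as
follows. RT_FULL_SKETCH is a hypothetical macro standing in for
IS_ENABLED(CONFIG_PREEMPT_RT_FULL), and the structure is a simplified
stand-in for the napi state; this is not the kernel's implementation.

```c
#include <stdbool.h>

/* Hypothetical build-time switch; 0 models a non-RT build. */
#ifndef RT_FULL_SKETCH
#define RT_FULL_SKETCH 0
#endif

struct napi_lock_sketch {
	bool has_npinfo;	/* netpoll attached to the device */
	int lock_depth;		/* counts lock acquisitions in this sketch */
};

/* Take the poll lock if netpoll is attached, or always on an RT build. */
static bool poll_lock_sketch(struct napi_lock_sketch *napi)
{
	/*
	 * With RT_FULL_SKETCH defined as 0 the first operand is a constant
	 * false, so the compiler can reduce this to the npinfo test alone.
	 */
	if (RT_FULL_SKETCH || napi->has_npinfo) {
		napi->lock_depth++;
		return true;
	}

	return false;
}
```

On a non-RT build the behaviour is unchanged: the lock is only taken when
netpoll is in use.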

Tested-by: "Luis Claudio R. Goncalves" 
Reviewed-by: Daniel Bristot de Oliveira 
Signed-off-by: Steven Rostedt 
---
 include/linux/netpoll.h |2 +-
 include/net/busy_poll.h |   14 +-
 2 files changed, 14 insertions(+), 2 deletions(-)

Index: linux-rt.git/include/linux/netpoll.h
===
--- linux-rt.git.orig/include/linux/netpoll.h   2016-05-26 18:31:09.183150389 -0400
+++ linux-rt.git/include/linux/netpoll.h   2016-05-26 18:52:02.657014280 -0400
@@ -77,7 +77,7 @@ static inline void *netpoll_poll_lock(st
 {
struct net_device *dev = napi->dev;
 
-   if (dev && dev->npinfo) {
+   if (dev && (IS_ENABLED(CONFIG_PREEMPT_RT_FULL) || dev->npinfo)) {
spin_lock(&napi->poll_lock);
napi->poll_owner = smp_processor_id();
return napi;
Index: linux-rt.git/include/net/busy_poll.h
===
--- linux-rt.git.orig/include/net/busy_poll.h   2016-05-26 18:31:09.183150389 -0400
+++ linux-rt.git/include/net/busy_poll.h   2016-05-26 19:10:12.134266713 -0400
@@ -25,6 +25,7 @@
 #define _LINUX_NET_BUSY_POLL_H
 
 #include <linux/netdevice.h>
+#include <linux/netpoll.h>
 #include <net/ip.h>
 
 #ifdef CONFIG_NET_RX_BUSY_POLL
@@ -97,7 +98,18 @@ static inline bool sk_busy_loop(struct s
goto out;
 
do {
-   rc = ops->ndo_busy_poll(napi);
+   /* When RT is enabled, napi_schedule_prep() can be preempted
+* with NAPI_STATE_SCHED set, causing the busy_poll() function
+* to always return zero, and this loop may never exit.
+* In that case, we must always take the netpoll_poll_lock.
+*/
+   if (IS_ENABLED(CONFIG_PREEMPT_RT_FULL)) {
+   void *have = netpoll_poll_lock(napi);
+   rc = ops->ndo_busy_poll(napi);
+   netpoll_poll_unlock(have);
+   } else {
+   rc = ops->ndo_busy_poll(napi);
+   }
 
if (rc == LL_FLUSH_FAILED)
break; /* permanent failure */

[PATCH,RFC] macvlan: Handle broadcasts inline if we have only a few macvlans.

2016-05-26 Thread Lennert Buytenhek

Commit 412ca1550cbecb2c ("macvlan: Move broadcasts into a work queue")
moved processing of all macvlan multicasts into a work queue.  This
causes a noticeable performance regression when there is heavy multicast
traffic on the underlying interface for multicast groups that the
macvlan subinterfaces are not members of, in which case we end up
cloning all those packets and then freeing them again from a work queue
without really doing any useful work with them in between.

The commit message for commit 412ca1550cbecb2c says:

|   Fundamentally, we need to ensure that the amount of work handled
|   in each netif_rx backlog run is constrained.  As broadcasts are
|   anything but constrained, it either needs to be limited per run
|   or moved to process context.

This patch moves multicast handling back into macvlan_handle_frame()
context if there are 100 or fewer macvlan subinterfaces, while keeping
the work queue when there are more macvlan subinterfaces than that.
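
The dispatch rule described above amounts to a simple threshold test. The
sketch below uses a hypothetical helper name and constant; only the limit of
100 subinterfaces comes from the description.

```c
#include <stdbool.h>

/* Limit taken from the description above; the macro name is made up. */
#define INLINE_LIMIT_SKETCH 100

/*
 * Decide where a multicast frame is processed: inline in the receive
 * path for small setups, deferred to the work queue for large ones.
 */
static bool handle_inline_sketch(unsigned int nr_macvlans)
{
	return nr_macvlans <= INLINE_LIMIT_SKETCH;
}
```

A port with a single macvlan, as in the test program below, stays on the
inline path, while a port with 101 or more subinterfaces keeps the bounded
per-run work that the earlier commit introduced.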

I played around with keeping track of the number of macvlan
subinterfaces that have each multicast filter bit set, but that ended
up being more complicated than I liked.  Conditionalising the work
queue deferring on the total number of macvlan subinterfaces seems
like a fair compromise.

On a quickly whipped together test program that creates an ethertap
interface with a single macvlan subinterface and then blasts 16 Mi
multicast packets through the ethertap interface for a multicast
group that the macvlan subinterface is not a member of, run time goes
from (vanilla kernel):

# time ./stress
real    0m41.864s
user    0m0.622s
sys     0m20.754s

to (with this patch):

# time ./stress
real    0m16.539s
user    0m0.519s
sys     0m15.949s

Reported-by: Grant Zhang 
Signed-off-by: Lennert Buytenhek 
---
 drivers/net/macvlan.c | 71 ---
 1 file changed, 45 insertions(+), 26 deletions(-)

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index cb01023..02934a5 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -231,7 +231,8 @@ static unsigned int mc_hash(const struct macvlan_dev *vlan,
 static void macvlan_broadcast(struct sk_buff *skb,
  const struct macvlan_port *port,
  struct net_device *src,
- enum macvlan_mode mode)
+ enum macvlan_mode mode,
+ bool do_rx_softirq)
 {
const struct ethhdr *eth = eth_hdr(skb);
const struct macvlan_dev *vlan;
@@ -254,17 +255,49 @@ static void macvlan_broadcast(struct sk_buff *skb,
 
err = NET_RX_DROP;
nskb = skb_clone(skb, GFP_ATOMIC);
-   if (likely(nskb))
+   if (likely(nskb)) {
err = macvlan_broadcast_one(
nskb, vlan, eth,
-   mode == MACVLAN_MODE_BRIDGE) ?:
- netif_rx_ni(nskb);
+   mode == MACVLAN_MODE_BRIDGE);
+   if (err == 0) {
+   if (do_rx_softirq)
+   err = netif_rx_ni(nskb);
+   else
+   err = netif_rx(nskb);
+   }
+   }
macvlan_count_rx(vlan, skb->len + ETH_HLEN,
 err == NET_RX_SUCCESS, true);
}
}
 }
 
+static void macvlan_process_one(struct sk_buff *skb,
+   struct macvlan_port *port,
+   const struct macvlan_dev *src,
+   bool do_rx_softirq)
+{
+   if (!src)
+   /* frame comes from an external address */
+   macvlan_broadcast(skb, port, NULL,
+ MACVLAN_MODE_PRIVATE |
+ MACVLAN_MODE_VEPA|
+ MACVLAN_MODE_PASSTHRU|
+ MACVLAN_MODE_BRIDGE, do_rx_softirq);
+   else if (src->mode == MACVLAN_MODE_VEPA)
+   /* flood to everyone except source */
+   macvlan_broadcast(skb, port, src->dev,
+ MACVLAN_MODE_VEPA |
+ MACVLAN_MODE_BRIDGE, do_rx_softirq);
+   else
+   /*
+* flood only to VEPA ports, bridge ports
+* already saw the frame on the way out.
+*/
+   macvlan_broadcast(skb, port, src->dev,
+ MACVLAN_MODE_VEPA, do_rx_softirq);
+}
+
 static void

[PATCH V4] brcmfmac: print errors if creating interface fails

2016-05-26 Thread Rafał Miłecki

This is helpful for debugging. Without this, all I was getting from the "iw"
command when creating a P2P interface failed was:
> command failed: Too many open files in system (-23)

Signed-off-by: Rafał Miłecki 
---
V2: s/in/if/ in commit message
V3: Add one more error message as suggested by Arend
V4: Also update brcmf_cfg80211_add_iface & print error for AP
---
 drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c | 8 ++--
 drivers/net/wireless/broadcom/brcm80211/brcmfmac/p2p.c  | 3 ++-
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c 
b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
index 3d09d23..e7975a3 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
@@ -669,14 +669,18 @@ static struct wireless_dev 
*brcmf_cfg80211_add_iface(struct wiphy *wiphy,
return ERR_PTR(-EOPNOTSUPP);
case NL80211_IFTYPE_AP:
wdev = brcmf_ap_add_vif(wiphy, name, flags, params);
-   if (!IS_ERR(wdev))
+   if (IS_ERR(wdev))
+   brcmf_err("Failed to create AP interface %s: %d\n", 
name, PTR_ERR(wdev));
+   else
brcmf_cfg80211_update_proto_addr_mode(wdev);
return wdev;
case NL80211_IFTYPE_P2P_CLIENT:
case NL80211_IFTYPE_P2P_GO:
case NL80211_IFTYPE_P2P_DEVICE:
wdev = brcmf_p2p_add_vif(wiphy, name, name_assign_type, type, 
flags, params);
-   if (!IS_ERR(wdev))
+   if (IS_ERR(wdev))
+   brcmf_err("Failed to create P2P interface %s: %d\n", 
name, PTR_ERR(wdev));
+   else
brcmf_cfg80211_update_proto_addr_mode(wdev);
return wdev;
case NL80211_IFTYPE_UNSPECIFIED:
diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/p2p.c 
b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/p2p.c
index 1652a48..bc26aec 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/p2p.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/p2p.c
@@ -2031,7 +2031,7 @@ static int brcmf_p2p_request_p2p_if(struct brcmf_p2p_info 
*p2p,
err = brcmf_fil_iovar_data_set(ifp, "p2p_ifadd", &if_request,
   sizeof(if_request));
if (err)
-   return err;
+   brcmf_err("p2p_ifadd failed %d\n", err);
 
return err;
 }
@@ -2185,6 +2185,7 @@ struct wireless_dev *brcmf_p2p_add_vif(struct wiphy 
*wiphy, const char *name,
err = brcmf_p2p_request_p2p_if(&cfg->p2p, ifp, cfg->p2p.int_addr,
   iftype);
if (err) {
+   brcmf_err("Failed to request P2P virtual interface %s\n", name);
brcmf_cfg80211_arm_vif_event(cfg, NULL);
goto fail;
}
-- 
1.8.4.5

Re: How can I test ndo_tx_timeout()?

2016-05-26 Thread Timur Tabi


Florian Fainelli wrote:

> One way to do this could be to never reclaim the SKBs you just
> transmitted which could be achieved by disabling the TX completion
> interrupt permanently for instance.


Thanks, that was the trick.

--
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora
Forum, a Linux Foundation collaborative project.

Re: How can I test ndo_tx_timeout()?

2016-05-26 Thread Florian Fainelli

On 05/26/2016 01:40 PM, Timur Tabi wrote:
> Is there an easy way to test my driver's response to a
> ndo_tx_timeout() call?  That is, force dev_watchdog() to think that a
> timeout has occurred?
> 

One way to do this could be to never reclaim the SKBs you just
transmitted which could be achieved by disabling the TX completion
interrupt permanently for instance. Alternatively you might stop your tx
queue, avoid checking for this condition (netif_tx_queue_stopped..) and
still try to push SKBs and transmit them, combining the two should be
fairly effective.

There could be better ways to test that, though these would at least try
to reproduce real world scenarios where transmit flow control is broken.
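
As a rough userspace model of why the first suggestion works: the watchdog only complains about a queue that is stopped and whose last transmit is older than the timeout. If TX completions never arrive, the ring fills, the queue stays stopped, and that condition eventually holds. This is a sketch only, not the kernel's dev_watchdog() code; struct txq_model, tick_after() and watchdog_fires() are hypothetical names made up for illustration.

```c
#include <stdbool.h>

/* Hypothetical userspace model of one TX queue; not a kernel structure. */
struct txq_model {
	bool stopped;              /* queue stopped because the ring is full */
	unsigned long trans_start; /* tick of the last transmit */
};

/* Wrap-safe "a is after b", in the spirit of the kernel's time_after(). */
static bool tick_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* The timeout is reported only for a stopped queue with a stale timestamp. */
static bool watchdog_fires(const struct txq_model *q, unsigned long now,
			   unsigned long timeo)
{
	return q->stopped && tick_after(now, q->trans_start + timeo);
}
```

In this model a queue that is never stopped never trips the check, which is why simply dropping packets in the driver is not enough to provoke the timeout.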
-- 
Florian

Re: [PATCH percpu/for-4.7-fixes 1/2] percpu: fix synchronization between chunk->map_extend_work and chunk destruction

2016-05-26 Thread Vlastimil Babka

On 26.5.2016 21:21, Tejun Heo wrote:
> Hello,
> 
> On Thu, May 26, 2016 at 11:19:06AM +0200, Vlastimil Babka wrote:
>>> if (is_atomic) {
>>> margin = 3;
>>>
>>> if (chunk->map_alloc <
>>> -   chunk->map_used + PCPU_ATOMIC_MAP_MARGIN_LOW &&
>>> -   pcpu_async_enabled)
>>> -   schedule_work(&chunk->map_extend_work);
>>> +   chunk->map_used + PCPU_ATOMIC_MAP_MARGIN_LOW) {
>>> +   if (list_empty(&chunk->map_extend_list)) {
> 
>> So why this list_empty condition? Doesn't it deserve a comment then? And
> 
> Because doing list_add() twice corrupts the list.  I'm not sure that
> deserves a comment.  We can do list_move() instead but that isn't
> necessarily better.

Ugh, right, somehow I thought it was testing pcpu_map_extend_chunks.
My second question was based on the assumption that the list can have only one
item. Sorry about the noise.

>> isn't using a list an overkill in that case?
> 
> That would require rebalance work to scan all chunks whenever it's
> scheduled and if a lot of atomic allocations are taking place, it has
> some possibility to become expensive with a lot of chunks.
> 
> Thanks.
>

[PATCH net] sfc: use flow dissector helpers for aRFS

2016-05-26 Thread Edward Cree

Signed-off-by: Edward Cree 
---
This seems to work in my testing, but I first looked at the flow dissector
API less than four hours ago, so I might be doing it wrong.

I still think that this is too big a change for 'net' and that it's better
to take the original fix now and then I'll respin this patch for net-next
when it opens up.  But I'm also happy for you to take this now in which
case I'll respin Jon's patch 2/2 on top of the result.  Your choice.

 drivers/net/ethernet/sfc/rx.c | 76 +--
 1 file changed, 23 insertions(+), 53 deletions(-)

diff --git a/drivers/net/ethernet/sfc/rx.c b/drivers/net/ethernet/sfc/rx.c
index 8956995..adbce33 100644
--- a/drivers/net/ethernet/sfc/rx.c
+++ b/drivers/net/ethernet/sfc/rx.c
@@ -842,33 +842,15 @@ int efx_filter_rfs(struct net_device *net_dev, const 
struct sk_buff *skb,
struct efx_nic *efx = netdev_priv(net_dev);
struct efx_channel *channel;
struct efx_filter_spec spec;
-   const __be16 *ports;
-   __be16 ether_type;
-   int nhoff;
+   struct flow_keys fk;
int rc;
 
-   /* The core RPS/RFS code has already parsed and validated
-* VLAN, IP and transport headers.  We assume they are in the
-* header area.
-*/
-
-   if (skb->protocol == htons(ETH_P_8021Q)) {
-   const struct vlan_hdr *vh =
-   (const struct vlan_hdr *)skb->data;
-
-   /* We can't filter on the IP 5-tuple and the vlan
-* together, so just strip the vlan header and filter
-* on the IP part.
-*/
-   EFX_BUG_ON_PARANOID(skb_headlen(skb) < sizeof(*vh));
-   ether_type = vh->h_vlan_encapsulated_proto;
-   nhoff = sizeof(struct vlan_hdr);
-   } else {
-   ether_type = skb->protocol;
-   nhoff = 0;
-   }
+   if (!skb_flow_dissect_flow_keys(skb, &fk, 0))
+   return -EPROTONOSUPPORT;
 
-   if (ether_type != htons(ETH_P_IP) && ether_type != htons(ETH_P_IPV6))
+   if (fk.basic.n_proto != htons(ETH_P_IP) && fk.basic.n_proto != 
htons(ETH_P_IPV6))
+   return -EPROTONOSUPPORT;
+   if (fk.control.flags & FLOW_DIS_IS_FRAGMENT)
return -EPROTONOSUPPORT;
 
efx_filter_init_rx(&spec, EFX_FILTER_PRI_HINT,
@@ -878,34 +863,19 @@ int efx_filter_rfs(struct net_device *net_dev, const 
struct sk_buff *skb,
EFX_FILTER_MATCH_ETHER_TYPE | EFX_FILTER_MATCH_IP_PROTO |
EFX_FILTER_MATCH_LOC_HOST | EFX_FILTER_MATCH_LOC_PORT |
EFX_FILTER_MATCH_REM_HOST | EFX_FILTER_MATCH_REM_PORT;
-   spec.ether_type = ether_type;
-
-   if (ether_type == htons(ETH_P_IP)) {
-   const struct iphdr *ip =
-   (const struct iphdr *)(skb->data + nhoff);
-
-   EFX_BUG_ON_PARANOID(skb_headlen(skb) < nhoff + sizeof(*ip));
-   if (ip_is_fragment(ip))
-   return -EPROTONOSUPPORT;
-   spec.ip_proto = ip->protocol;
-   spec.rem_host[0] = ip->saddr;
-   spec.loc_host[0] = ip->daddr;
-   EFX_BUG_ON_PARANOID(skb_headlen(skb) < nhoff + 4 * ip->ihl + 4);
-   ports = (const __be16 *)(skb->data + nhoff + 4 * ip->ihl);
+   spec.ether_type = fk.basic.n_proto;
+   spec.ip_proto = fk.basic.ip_proto;
+
+   if (fk.basic.n_proto == htons(ETH_P_IP)) {
+   spec.rem_host[0] = fk.addrs.v4addrs.src;
+   spec.loc_host[0] = fk.addrs.v4addrs.dst;
} else {
-   const struct ipv6hdr *ip6 =
-   (const struct ipv6hdr *)(skb->data + nhoff);
-
-   EFX_BUG_ON_PARANOID(skb_headlen(skb) <
-   nhoff + sizeof(*ip6) + 4);
-   spec.ip_proto = ip6->nexthdr;
-   memcpy(spec.rem_host, &ip6->saddr, sizeof(ip6->saddr));
-   memcpy(spec.loc_host, &ip6->daddr, sizeof(ip6->daddr));
-   ports = (const __be16 *)(ip6 + 1);
+   memcpy(spec.rem_host, &fk.addrs.v6addrs.src, sizeof(struct in6_addr));
+   memcpy(spec.loc_host, &fk.addrs.v6addrs.dst, sizeof(struct in6_addr));
}
 
-   spec.rem_port = ports[0];
-   spec.loc_port = ports[1];
+   spec.rem_port = fk.ports.src;
+   spec.loc_port = fk.ports.dst;
 
rc = efx->type->filter_rfs_insert(efx, &spec);
if (rc < 0)
@@ -916,18 +886,18 @@ int efx_filter_rfs(struct net_device *net_dev, const 
struct sk_buff *skb,
channel = efx_get_channel(efx, skb_get_rx_queue(skb));
++channel->rfs_filters_added;
 
-   if (ether_type == htons(ETH_P_IP))
+   if (spec.ether_type == htons(ETH_P_IP))
netif_info(efx, rx_status, efx->net_dev,
   "steering %s %pI4:%u:%pI4:%u to queue %u [flow %u 
filter %d]\n",
   (spec.ip_proto == IPPROTO_TCP) ? "TCP" : "UDP",
-

How can I test ndo_tx_timeout()?

2016-05-26 Thread Timur Tabi

Is there an easy way to test my driver's response to a
ndo_tx_timeout() call?  That is, force dev_watchdog() to think that a
timeout has occurred?

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.

Re: [PATCH net 1/2] sfc: handle nonlinear SKBs in efx_filter_rfs()

2016-05-26 Thread David Miller

From: Eric Dumazet 
Date: Thu, 26 May 2016 09:56:36 -0700

> I truly believe that every time a driver has a private flow dissector,
> it always has at least one bug.

+1

Re: [PATCH net 0/4] net/mlx4_en: fix stats

2016-05-26 Thread David Miller

From: Or Gerlitz 
Date: Thu, 26 May 2016 18:19:34 +0300

> Eric, sure, we were on a transition period and Tariq was not fully
> familiar with that practice,
> I'd like to make sure the move is finalized internally and then we'll
> send the patch..
> 
> On a somehow related note, Dave, Eric's patches were sent after our
> Wed working hours ended and accepted before our Thu working hours
> started.. could we get a better chance to review driver patches
> before acceptance?  I know there were times where we've screwed up
> and things didn't get fast attention, but we're working to improve
> so... get us a chance [1]?

If Eric's patches are clearly correct, I will apply them if I want to.

Sorry.

Re: [PATCH net 4/4] net/mlx4_en: get rid of private net_device_stats

2016-05-26 Thread David Miller

From: Eric Dumazet 
Date: Thu, 26 May 2016 05:54:10 -0700

> These stats being computed using deltas, this can not work as is.
> 
> I believe the rule is to not clear the netdev stats at open()

+1

Any driver clearing stats at open is broken and will break many,
many, networking tools.  And bonding too.

Re: [PATCH net 4/4] net/mlx4_en: get rid of private net_device_stats

2016-05-26 Thread David Miller

From: Tariq Toukan 
Date: Thu, 26 May 2016 12:38:58 +0300

> I am aware that clearing the stats structure might be redundant today,
> as the function is called only within mlx4_en_open, but we might want
> to call the function in other flows in the future.

You really should not arbitrarily clear the statistics; a down/up is not
supposed to reset them, for example.

Re: [PATCH] Accept user specified MTU value when create new vxlan link

2016-05-26 Thread David Miller

From: Chen Haiquan 
Date: Thu, 26 May 2016 17:09:01 +0800

> Fixes: 0dfbdf4102b9303d3ddf2177c0220098ff99f6de
> (vxlan: Factor out device configuration) which missed MTU value specified
> by user when create new vxlan link.
> 
> Signed-off-by:  Chen Haiquan 

This is not a properly formatted patch submission.

First of all, your Subject line should have a proper subsystem prefix
specified.  That way people who scan the GIT shortlog can tell where
your change is taking place by scanning the beginning of every commit
header line.

[PATCH] vxlan: Use specified MTU value when creating new link.

Second, your Fixes tag is not the place where you explain the problem
and how you are fixing it.

You put that in the commit message proper, then you give the Fixes:
tag.

Have you looked at how other people format their commit messages?
Have you seen anyone else do things the way you have done so here?
Always mimic other patch submitters, and learn by example.  If you
still can't figure out how to format something properly, ask the
list.

Thanks.

Re: [PATCH v2] net: alx: use custom skb allocator

2016-05-26 Thread David Miller

From: Feng Tang 
Date: Thu, 26 May 2016 16:41:55 +0800

> Maybe the driver maintainer from Atheros could take a look, as they
> can reach all the real HWs :)

Don't hold your breath.

Re: [PATCH net 0/8] qed*: Bug fixes

2016-05-26 Thread David Miller

From: Yuval Mintz 
Date: Thu, 26 May 2016 11:01:16 +0300

> This series contains several small fixes, most of which deal with
> either 100g support, sriov or bandwidth configurations.

Series applied, but in future the GFP_KERNEL fix should be done
differently.

Instead of passing a boolean sleepable state around, pass a "gfp_t"
which makes it clear at the call sites what that argument is
controlling.

That way you wouldn't have that mess of "sleepable ? GFP_X : GFP_Y"
constructs all over the place.
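
A minimal userspace sketch of the difference, for illustration only. The gfp_t typedef, the flag values and model_zalloc() below are stand-ins invented for this example (they are not the kernel definitions), and alloc_cfg_bool()/alloc_cfg_gfp() are hypothetical helpers, not functions from the qed driver:

```c
#include <stdlib.h>

/* Stand-ins for the kernel's gfp_t and flags; the numeric values are made up. */
typedef unsigned int gfp_t;
#define GFP_ATOMIC 0x1u
#define GFP_KERNEL 0x2u

/* Stand-in for kzalloc(): records which flags the caller asked for. */
static gfp_t last_flags;

static void *model_zalloc(size_t size, gfp_t flags)
{
	last_flags = flags;
	return calloc(1, size);
}

/* Boolean style: each layer has to translate the flag again. */
static void *alloc_cfg_bool(size_t size, int sleepable)
{
	return model_zalloc(size, sleepable ? GFP_KERNEL : GFP_ATOMIC);
}

/* gfp_t style: the caller states the allocation context once. */
static void *alloc_cfg_gfp(size_t size, gfp_t flags)
{
	return model_zalloc(size, flags);
}
```

With the second form a call site reads alloc_cfg_gfp(len, GFP_ATOMIC), so the allocation context is visible without looking up what a bare "true" or "false" argument means.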

Re: [PATCH percpu/for-4.7-fixes 1/2] percpu: fix synchronization between chunk->map_extend_work and chunk destruction

2016-05-26 Thread Tejun Heo

Hello,

On Thu, May 26, 2016 at 11:19:06AM +0200, Vlastimil Babka wrote:
> > if (is_atomic) {
> > margin = 3;
> > 
> > if (chunk->map_alloc <
> > -   chunk->map_used + PCPU_ATOMIC_MAP_MARGIN_LOW &&
> > -   pcpu_async_enabled)
> > -   schedule_work(&chunk->map_extend_work);
> > +   chunk->map_used + PCPU_ATOMIC_MAP_MARGIN_LOW) {
> > +   if (list_empty(&chunk->map_extend_list)) {

> So why this list_empty condition? Doesn't it deserve a comment then? And

Because doing list_add() twice corrupts the list.  I'm not sure that
deserves a comment.  We can do list_move() instead but that isn't
necessarily better.

> isn't using a list an overkill in that case?

That would require rebalance work to scan all chunks whenever it's
scheduled and if a lot of atomic allocations are taking place, it has
some possibility to become expensive with a lot of chunks.

Thanks.
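
The point about adding twice can be seen in a small userspace model of a circular doubly linked list. This is only a sketch in the style of <linux/list.h>, not the percpu code; struct list_node, list_add_after(), queue_once() and list_length() are hypothetical names. It assumes every entry is initialised to point at itself before first use and is re-initialised when it is removed.

```c
#include <stdbool.h>

/* Minimal model of a circular doubly linked list node. */
struct list_node {
	struct list_node *next, *prev;
};

static void list_init(struct list_node *n)
{
	n->next = n;
	n->prev = n;
}

/* A node that points at itself is on no list. */
static bool list_is_empty(const struct list_node *n)
{
	return n->next == n;
}

/* Insert "entry" right after "head"; "entry" must not be on a list already. */
static void list_add_after(struct list_node *entry, struct list_node *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* Queue an entry only if it is not already queued. */
static bool queue_once(struct list_node *entry, struct list_node *head)
{
	if (!list_is_empty(entry))
		return false; /* adding again would overwrite its links */
	list_add_after(entry, head);
	return true;
}

static int list_length(const struct list_node *head)
{
	int n = 0;

	for (const struct list_node *p = head->next; p != head; p = p->next)
		n++;
	return n;
}
```

Without the guard, calling list_add_after() on an entry that is already first on the list would set its next pointer to itself, so a later walk over the list would never reach the head again.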

-- 
tejun

Re: [PATCH RFT 1/2] phylib: add device reset GPIO support

2016-05-26 Thread Uwe Kleine-König

On Thu, May 26, 2016 at 11:00:55AM +0200, Linus Walleij wrote:
> On Thu, May 12, 2016 at 8:42 PM, Uwe Kleine-König
>  wrote:
> 
> > [added Linus Walleij to Cc, there is a question for you/him below]
> (...)
> >> +void mdio_device_reset(struct mdio_device *mdiodev, int value)
> >> +{
> >> + if (mdiodev->reset)
> >> + gpiod_set_value(mdiodev->reset, value);
> >
> > Before v4.6-rc1~108^2~91 it was not necessary to check for the first
> > parameter being non-NULL before calling gpiod_set_value. Linus, did you
> > change this on purpose?
> 
> Not really. And AFAICT it is still not necessary: what changed is that
> an error message will be printed by VALIDATE_DESC() if you do that.
> And that is proper I guess? I think it's sloppy code to randomly pass in
> NULL to a call and just expect it to bail out, it seems more like
> exercising the error path than something you'd normally rely on.
> 
> Or am I getting things wrong?

Is the following sloppy?

somegpio = gpiod_get_optional(dev, "some", GPIOD_OUT_LOW);
if (IS_ERR(somegpio))
return PTR_ERR(somegpio);
gpiod_set_value(somegpio, 1);

If not (as I assume) you really changed something as this might trigger
the warning.

Best regards
Uwe

-- 
Pengutronix e.K.   | Uwe Kleine-König|
Industrial Linux Solutions | http://www.pengutronix.de/  |

Re: IPv6 extension header privileges

2016-05-26 Thread Tom Herbert

On Mon, May 23, 2016 at 11:11 AM, Tom Herbert  wrote:
> On Sun, May 22, 2016 at 4:56 AM, Sowmini Varadhan
>  wrote:
>>
>>> > Tom Herbert wrote:
>>> > If you don't mind I'll change this to make specific options are
>>> > privileged and not all hbh and destopt. There is talk in IETF about
>>> > reinventing IP extensibility within UDP since the kernel APIs don't
>>> > allow setting EH. I would like to avoid that :-)
>>
>>> On 21.05.2016 19:46, Sowmini Varadhan wrote:
>>> > Do you mean this
>>> >   http://www.ietf.org/mail-archive/web/spud/current/msg00365.html
>>
>> On (05/22/16 03:08), Hannes Frederic Sowa wrote:
>>> Hmm, haven't read carefully but isn't that just plain TCP in UDP? I saw
>>> extension headers mentioned but haven't grasped why they deem necessary.
>>
>> Tom should clarify what he meant, but perhaps he was referring to other
>> threads discussing v6 EH. In any case, I dont think the way least-privileges
>> for EH are implemented in an OS is directly relevant or causational for
>> whether or not the kernel should be bypassed - looks like there are a lot
>> of other drafts floating around, arguing for implementing various tcp/ip
>> protocols in uspace and beyond, motivated by various reasons.
>>
> It's a deployment conundrum. Suppose tomorrow that IANA registers some
> new hbh option that would be useful to the network, but is of no
> interest to the kernel other than it needs to be set in packets when
> the user requests it. In the white list model, there is no problem
> getting support for such a thing into the upstream kernel, the time
> frame for that is one release cycle. Neither is there any problem
> updating the apps to set the option, for instance we can update FB app
> to do this within a week. The problem is that getting something into
> the kernel does not make it useful, the kernel needs to actually be
> deployed which is mostly out of our control (for those of us who don't
> own the client platform). So to get the options deployed on clients
> (particularly Android) takes much, much longer. And if the
> feature requires explicit action to be enabled, like turning on a sysctl,
> it is going to take even longer, possibly an indeterminate amount of
> time, to ever get enabled.
>
Thinking about this some more, the per option white list is a better
approach. If we allow an open ended mechanism for applications to
signal the network with arbitrary data (like user specified hbh
options would be), then use of that mechanism will inevitably be
exploited by some authorities to force users to hand over private data
about their communications. It's better to not build in back doors to
security...

Tom
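
A sketch of what a per option white list check could look like, as a userspace model only. Nothing here is an existing kernel interface: unpriv_opts[], opt_whitelisted() and opts_permitted() are hypothetical, and the white list contents (just the Pad1 and PadN padding options, types 0 and 1) are an assumption chosen for the example. The function walks the TLV area of a hop-by-hop or destination options header and lets an unprivileged caller through only if every option type is on the list.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical white list of option types an unprivileged user may set. */
static const uint8_t unpriv_opts[] = { 0x00 /* Pad1 */, 0x01 /* PadN */ };

static bool opt_whitelisted(uint8_t type)
{
	for (size_t i = 0; i < sizeof(unpriv_opts); i++)
		if (unpriv_opts[i] == type)
			return true;
	return false;
}

/* "opts" points at the TLV area, after the 2-byte extension header prefix. */
static bool opts_permitted(const uint8_t *opts, size_t len, bool privileged)
{
	size_t off = 0;

	if (privileged)
		return true;
	while (off < len) {
		uint8_t type = opts[off];

		if (!opt_whitelisted(type))
			return false;
		if (type == 0x00) {	/* Pad1 is a single byte, no length */
			off++;
			continue;
		}
		if (off + 1 >= len)	/* truncated TLV */
			return false;
		off += 2 + (size_t)opts[off + 1];
	}
	return off == len;		/* reject options overrunning the area */
}
```

Adding a newly registered option for unprivileged use then means adding one entry to the list, while anything unknown still needs the privileged path.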

> Tom
>
>> Moving back to the topic here:
>>
>>> > Hannes Frederic Sowa wrote:
>>>  A white list of certain registered IPv6 IANA-options for non-priv 
>>>  whould
>>
>>> On 21.05.2016 19:46, Sowmini Varadhan wrote:
>>> > Problem is that APIs are not IANA'ed.
>>> > Even RFC 3542 is just Informational.
>>> >
>>> > And even the classic socket API's that come down from BSD are not
>>> > ietf'ed or iana'ed.
>>
>> On (05/22/16 03:08), Hannes Frederic Sowa wrote:
>>> I think I don't completely understand this. IANA is numbering registry
>>> and if we have the proper option number allocated we can make sensible
>>> decisions and put options on the white list or provide a more complete
>>> sensible implementation of the specification in the kernel.
>>
>> IANA registers internet protocol (and related) numbers.
>> So, for example, the IP_TOS value is not really documented in IANA,
>> and it ends up being 1 on linux, 3 on freebsd.  Or, to take another example,
>> IP_PKTINFO is "8" on linux, 0x1a on solaris and 25 in netbsd.
>>
>> But TOS, and the various code-points (which actually go out
>> in the packet, and are needed for proper interop in the network)
>> are documented in iana/ietf etc.
>>
>>> E.g. if an option for encapsulation is going to be specified, normal
>>> users should not be able to set those, like with CALIPSO or some VNI
>>> inside hop-by-hop options. That should probably be controlled by a
>>> routing table or a flow matching subsystem, in the kernel.
>>
>> sure, I completely agree with that. And I strongly suspect that's why
>> rfc3542 puts down a wildcard "may" - so that some options may be privileged,
>> others not. Which options are "privileged" (and even the definition
>> of "privileged") are entirely up to the OS implementation. (and even *how*
>> least privileges/RBAC are implemented, can vary from OS to OS).
>>
>>> I think it is also in favor of the IETF to get those numbers specified
>>> and allocated in a proper way, otherwise security won't be manageable at
>>> all any more.
>>
>> see above.. Even rfc793 actually does not talk about POSIX APIs
>> but speaks in generalities, since the focus is on what goes on the wire.
>> In

Re: usbnet: smsc95xx: fix link detection for disabled autonegotiation

2016-05-26 Thread Florian Fainelli

On 05/26/2016 04:01 AM, Christoph Fritz wrote:
> On Thu, 2016-05-26 at 04:31 +0200, Andrew Lunn wrote:
>> On Thu, May 26, 2016 at 04:06:47AM +0200, Christoph Fritz wrote:
>>> To detect link status up/down for connections where autonegotiation is
>>> explicitly disabled, we don't get an irq but need to poll the status
>>> register for link up/down detection.
>>> This patch adds a workqueue to poll for link status.
>>
>> Did you consider using the phylib? It probably does the needed polling
>> already, and it looks like the functions needed to implement an MDIO
>> bus are already in place.
> 
> smsc95xx supports a relatively wide range of PHYs which I don't have
> access to for testing. So I prefer the least invasive approach (with
> this patch), as most of the other usbnet drivers do.

My reading of the driver is that it only supports its internal PHY, so
it should be pretty straightforward to extend drivers/net/phy/smsc.c to
support it?

> 
> A merge to phylib while paying attention to all the suspend modes and
> testing the wide range of PHYs would surely be the right thing to do.

Yes, the suspend stuff could be a little tricky, but not impossible; the
microchip lan78xx is a user of PHYLIB and it seems to work okay.
-- 
Florian

Re: [PATCH 1/1] net: nps_enet: Disable interrupts before napi reschedule

2016-05-26 Thread Alexey Brodkin

Hi Elad,

On Thu, 2016-05-26 at 15:00 +0300, Elad Kanfi wrote:
> From: Elad Kanfi 
> 
> Since NAPI works by shutting down event interrupts when there's
> work and turning them on when there's none, the net driver must
> make sure that interrupts are disabled when it reschedules polling.
> By calling napi_reschedule, the driver switches to polling mode,
> therefore there should be no interrupt interference.
> Any received packets will be handled in nps_enet_poll by polling the HW
> indication of received packet until all packets are handled.
> 
> Signed-off-by: Elad Kanfi 
> Acked-by: Noam Camus 
> ---
>  drivers/net/ethernet/ezchip/nps_enet.c |4 +++-
>  1 files changed, 3 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/net/ethernet/ezchip/nps_enet.c 
> b/drivers/net/ethernet/ezchip/nps_enet.c
> index 085f912..06f0317 100644
> --- a/drivers/net/ethernet/ezchip/nps_enet.c
> +++ b/drivers/net/ethernet/ezchip/nps_enet.c
> @@ -205,8 +205,10 @@ static int nps_enet_poll(struct napi_struct *napi, int 
> budget)
>    * re-adding ourselves to the poll list.
>    */
>  
> - if (priv->tx_skb && !tx_ctrl_ct)
> + if (priv->tx_skb && !tx_ctrl_ct) {
> + nps_enet_reg_set(priv, NPS_ENET_REG_BUF_INT_ENABLE, 0);
>   napi_reschedule(napi);
> + }
>   }
>  
>   return work_done;

We just bumped into the same problem (data exchange hangs on the very first
"ping") with released Linux v4.6 and linux-next on our nSIM OSCI virtual
platform.

I believe it was commit 05c00d82f4d1 ("net: nps_enet: bug fix - handle lost tx
interrupts") that introduced the problem. At least, after reverting it I got
networking working.

And indeed this patch fixes the mentioned issue.
In other words...

Tested-by: Alexey Brodkin 

P.S. Given my observation is correct please add following to your commit
message if you ever do a respin:
-->8---
Fixes: 05c00d82f4d1 ("net: nps_enet: bug fix - handle lost tx interrupts")

Cc:  # 4.6.x
-->8---

Re: [patch] ptp: oops in ptp_ioctl()

2016-05-26 Thread Richard Cochran

On Thu, May 26, 2016 at 09:46:22AM +0300, Dan Carpenter wrote:
> If we pass ERR_PTR(-EFAULT) to kfree() then it's going to oops.

Thanks for catching this.

Acked-by: Richard Cochran

Re: [PATCH net 0/4] net/mlx4_en: fix stats

2016-05-26 Thread Eric Dumazet

On Thu, 2016-05-26 at 18:19 +0300, Or Gerlitz wrote:
> On Thu, May 26, 2016 at 3:57 PM, Eric Dumazet  wrote:
> > On Thu, 2016-05-26 at 12:44 +0300, Tariq Toukan wrote:
> >> Hi Eric,
> >>
> >> > mlx4 has various bugs in its ndo_get_stats() and related functions.
> >> > This patch series address the obvious issues.
> >> > Remaining ones will be discussed later.
> >> >
> >> Thanks for the fixes.
> >> I see they were already applied.
> >> I reviewed them all and replied to patch 4/4, the rest look good to me.
> >> Please CC me as the maintainer of mlx4_en on future patches.
> >
> > If you are mlx4_en maintainer, please submit an official patch so that
> > non Mellanox employees can get this information using the normal way ?
> 
> Eric, sure, we were on a transition period and Tariq was not fully
> familiar with that practice,
> I'd like to make sure the move is finalized internally and then we'll
> send the patch..

Sure ! Please note I gave a polite answer, I am sorry if you felt any
hidden intent from my side.

I did CC the official mlx4_en maintainer on this patch series, by simply
looking at MAINTAINERS file.

[PATCH] net: l2tp: Make l2tp_ip6 namespace aware

2016-05-26 Thread Shmulik Ladkani

l2tp_ip6 tunnel and session lookups were still using init_net, although
the l2tp core infrastructure already supports lookups keyed by 'net'.

As a result, l2tp_ip6_recv discarded packets for tunnels/sessions
created in namespaces other than the init_net.

Fix, by using dev_net(skb->dev) or sock_net(sk) where appropriate.

Signed-off-by: Shmulik Ladkani 
---
 net/l2tp/l2tp_ip6.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/net/l2tp/l2tp_ip6.c b/net/l2tp/l2tp_ip6.c
index c6f5df1bed..6c54e03fe9 100644
--- a/net/l2tp/l2tp_ip6.c
+++ b/net/l2tp/l2tp_ip6.c
@@ -128,6 +128,7 @@ static inline struct sock *l2tp_ip6_bind_lookup(struct net 
*net,
  */
 static int l2tp_ip6_recv(struct sk_buff *skb)
 {
+   struct net *net = dev_net(skb->dev);
struct sock *sk;
u32 session_id;
u32 tunnel_id;
@@ -154,7 +155,7 @@ static int l2tp_ip6_recv(struct sk_buff *skb)
}
 
/* Ok, this is a data packet. Lookup the session. */
-   session = l2tp_session_find(&init_net, NULL, session_id);
+   session = l2tp_session_find(net, NULL, session_id);
if (session == NULL)
goto discard;
 
@@ -188,14 +189,14 @@ pass_up:
goto discard;
 
tunnel_id = ntohl(*(__be32 *) &skb->data[4]);
-   tunnel = l2tp_tunnel_find(&init_net, tunnel_id);
+   tunnel = l2tp_tunnel_find(net, tunnel_id);
if (tunnel != NULL)
sk = tunnel->sock;
else {
struct ipv6hdr *iph = ipv6_hdr(skb);
 
read_lock_bh(&l2tp_ip6_lock);
-   sk = __l2tp_ip6_bind_lookup(&init_net, &iph->daddr,
+   sk = __l2tp_ip6_bind_lookup(net, &iph->daddr,
0, tunnel_id);
read_unlock_bh(&l2tp_ip6_lock);
}
@@ -263,6 +264,7 @@ static int l2tp_ip6_bind(struct sock *sk, struct sockaddr 
*uaddr, int addr_len)
struct inet_sock *inet = inet_sk(sk);
struct ipv6_pinfo *np = inet6_sk(sk);
struct sockaddr_l2tpip6 *addr = (struct sockaddr_l2tpip6 *) uaddr;
+   struct net *net = sock_net(sk);
__be32 v4addr = 0;
int addr_type;
int err;
@@ -286,7 +288,7 @@ static int l2tp_ip6_bind(struct sock *sk, struct sockaddr 
*uaddr, int addr_len)
 
err = -EADDRINUSE;
read_lock_bh(&l2tp_ip6_lock);
-   if (__l2tp_ip6_bind_lookup(&init_net, &addr->l2tp_addr,
+   if (__l2tp_ip6_bind_lookup(net, &addr->l2tp_addr,
   sk->sk_bound_dev_if, addr->l2tp_conn_id))
goto out_in_use;
read_unlock_bh(&l2tp_ip6_lock);
@@ -456,7 +458,7 @@ static int l2tp_ip6_backlog_recv(struct sock *sk, struct 
sk_buff *skb)
return 0;
 
 drop:
-   IP_INC_STATS(&init_net, IPSTATS_MIB_INDISCARDS);
+   IP_INC_STATS(sock_net(sk), IPSTATS_MIB_INDISCARDS);
kfree_skb(skb);
return -1;
 }
-- 
2.7.4

Re: [PATCH net 1/2] sfc: handle nonlinear SKBs in efx_filter_rfs()

2016-05-26 Thread Edward Cree

On 26/05/16 17:56, Eric Dumazet wrote:
> Lot of magic here. Yet another flow dissection.
>
> Seems to be a good place to use net/core/flow_dissector.c helpers.
Fair point, but that doesn't really feel like 'net' material.
I'll look into flow_dissector and see if I can get something ready for when 
net-next opens back up.  But in the meantime I think this fix is still needed.

Re: [PATCH net 1/2] sfc: handle nonlinear SKBs in efx_filter_rfs()

2016-05-26 Thread Eric Dumazet

On Thu, 2016-05-26 at 17:01 +0100, Edward Cree wrote:
> Previously efx_filter_rfs() assumed that the headers it needed (802.1Q, IP)
>  would be present in the linear data area of the SKB.
> When running with debugging I found that this is not always the case and
>  that in fact the data may all be paged.
> So now use skb_header_pointer() to extract the data.
> 
> Also replace EFX_BUG_ON_PARANOID checks for insufficient data with checks
>  that return -EINVAL, as this case is possible if the received packet was
>  too short.
> 
> Signed-off-by: Edward Cree 
> ---
>  drivers/net/ethernet/sfc/rx.c | 39 +++
>  1 file changed, 23 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/net/ethernet/sfc/rx.c b/drivers/net/ethernet/sfc/rx.c
> index 8956995..52790f0 100644
> --- a/drivers/net/ethernet/sfc/rx.c
> +++ b/drivers/net/ethernet/sfc/rx.c
> @@ -842,25 +842,32 @@ int efx_filter_rfs(struct net_device *net_dev, const 
> struct sk_buff *skb,
>   struct efx_nic *efx = netdev_priv(net_dev);
>   struct efx_channel *channel;
>   struct efx_filter_spec spec;
> + /* 60 octets is the maximum length of an IPv4 header (all IPv6 headers
> +  * are 40 octets), and we pull 4 more to get the port numbers
> +  */
> + #define EFX_RFS_HEADER_LENGTH   (sizeof(struct vlan_hdr) + 60 + 4)

Lot of magic here. Yet another flow dissection.

Seems to be a good place to use net/core/flow_dissector.c helpers.

I truly believe that every time a driver has a private flow dissector,
it always has at least one bug.
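
For readers following along, here is a minimal userspace sketch of the access
pattern the patch switches to: ask for a contiguous view of N header bytes, and
copy into a scratch buffer only when the bytes are not all in the linear area.
This is not the kernel's skb_header_pointer(); struct toy_skb and
toy_header_pointer() are hypothetical names invented for the illustration, and
the model assumes a single paged region.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an sk_buff: a linear head plus one paged tail. */
struct toy_skb {
	const unsigned char *head;	/* linear data area */
	size_t head_len;
	const unsigned char *paged;	/* non-linear (paged) data */
	size_t paged_len;
};

/*
 * Return a pointer to len contiguous bytes starting at offset.
 * If the range lies entirely in the linear area, point straight into it;
 * otherwise copy the bytes into scratch and return scratch.
 * Returns NULL when the packet is too short for the request.
 */
static const void *toy_header_pointer(const struct toy_skb *skb, size_t offset,
				      size_t len, void *scratch)
{
	size_t total = skb->head_len + skb->paged_len;
	unsigned char *out = scratch;
	size_t i;

	assert(scratch != NULL);

	if (offset > total || len > total - offset)
		return NULL;			/* packet too short */
	if (offset + len <= skb->head_len)
		return skb->head + offset;	/* fully linear, no copy */

	for (i = 0; i < len; i++) {
		size_t pos = offset + i;

		out[i] = pos < skb->head_len ? skb->head[pos]
					     : skb->paged[pos - skb->head_len];
	}
	return scratch;
}
```

The caller only has to handle the NULL case, which corresponds to the -EINVAL
returns in the patch; whether the data was linear or paged no longer matters.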

[PATCH] Documentation: ip-sysctl.txt: clarify secure_redirects

2016-05-26 Thread Eric Garver

Clarify how secure_redirects works. Mention that RFC1122 always applies.

Signed-off-by: Eric Garver 
---
 Documentation/networking/ip-sysctl.txt | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt 
b/Documentation/networking/ip-sysctl.txt
index 6c7f365b1515..9ae929395b24 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -1036,15 +1036,17 @@ proxy_arp_pvlan - BOOLEAN
 
 shared_media - BOOLEAN
Send(router) or accept(host) RFC1620 shared media redirects.
-   Overrides ip_secure_redirects.
+   Overrides secure_redirects.
shared_media for the interface will be enabled if at least one of
conf/{all,interface}/shared_media is set to TRUE,
it will be disabled otherwise
default TRUE
 
 secure_redirects - BOOLEAN
-   Accept ICMP redirect messages only for gateways,
-   listed in default gateway list.
+   Accept ICMP redirect messages only to gateways listed in the
+   interface's current gateway list. Even if disabled, RFC1122 redirect
+   rules still apply.
+   Overridden by shared_media.
secure_redirects for the interface will be enabled if at least one of
conf/{all,interface}/secure_redirects is set to TRUE,
it will be disabled otherwise
-- 
2.5.5

[PATCH net 2/2] sfc: Track RPS flow IDs per channel instead of per function

2016-05-26 Thread Edward Cree

From: Jon Cooper 

Otherwise we get confused when two flows on different channels get the
 same flow ID.

Signed-off-by: Edward Cree 
---
 drivers/net/ethernet/sfc/efx.c| 32 +++-
 drivers/net/ethernet/sfc/net_driver.h | 12 
 drivers/net/ethernet/sfc/rx.c | 29 +
 3 files changed, 56 insertions(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/sfc/efx.c b/drivers/net/ethernet/sfc/efx.c
index 0705ec86..097f363 100644
--- a/drivers/net/ethernet/sfc/efx.c
+++ b/drivers/net/ethernet/sfc/efx.c
@@ -1726,14 +1726,33 @@ static int efx_probe_filters(struct efx_nic *efx)
 
 #ifdef CONFIG_RFS_ACCEL
if (efx->type->offload_features & NETIF_F_NTUPLE) {
-   efx->rps_flow_id = kcalloc(efx->type->max_rx_ip_filters,
-  sizeof(*efx->rps_flow_id),
-  GFP_KERNEL);
-   if (!efx->rps_flow_id) {
+   struct efx_channel *channel;
+   int i, success = 1;
+
+   efx_for_each_channel(channel, efx) {
+   channel->rps_flow_id =
+   kcalloc(efx->type->max_rx_ip_filters,
+   sizeof(*channel->rps_flow_id),
+   GFP_KERNEL);
+   if (!channel->rps_flow_id)
+   success = 0;
+   else
+   for (i = 0;
+i < efx->type->max_rx_ip_filters;
+++i)
+   channel->rps_flow_id[i] =
+   RPS_FLOW_ID_INVALID;
+   }
+
+   if (!success) {
+   efx_for_each_channel(channel, efx)
+   kfree(channel->rps_flow_id);
efx->type->filter_table_remove(efx);
rc = -ENOMEM;
goto out_unlock;
}
+
+   efx->rps_expire_index = efx->rps_expire_channel = 0;
}
 #endif
 out_unlock:
@@ -1744,7 +1763,10 @@ out_unlock:
 static void efx_remove_filters(struct efx_nic *efx)
 {
 #ifdef CONFIG_RFS_ACCEL
-   kfree(efx->rps_flow_id);
+   struct efx_channel *channel;
+
+   efx_for_each_channel(channel, efx)
+   kfree(channel->rps_flow_id);
 #endif
down_write(&efx->filter_sem);
efx->type->filter_table_remove(efx);
diff --git a/drivers/net/ethernet/sfc/net_driver.h 
b/drivers/net/ethernet/sfc/net_driver.h
index 38c4223..d13ddf9 100644
--- a/drivers/net/ethernet/sfc/net_driver.h
+++ b/drivers/net/ethernet/sfc/net_driver.h
@@ -403,6 +403,8 @@ enum efx_sync_events_state {
  * @event_test_cpu: Last CPU to handle interrupt or test event for this channel
  * @irq_count: Number of IRQs since last adaptive moderation decision
  * @irq_mod_score: IRQ moderation score
+ * @rps_flow_id: Flow IDs of filters allocated for accelerated RFS,
+ *  indexed by filter ID
  * @n_rx_tobe_disc: Count of RX_TOBE_DISC errors
  * @n_rx_ip_hdr_chksum_err: Count of RX IP header checksum errors
  * @n_rx_tcp_udp_chksum_err: Count of RX TCP and UDP checksum errors
@@ -446,6 +448,8 @@ struct efx_channel {
unsigned int irq_mod_score;
 #ifdef CONFIG_RFS_ACCEL
unsigned int rfs_filters_added;
+#define RPS_FLOW_ID_INVALID 0x
+   u32 *rps_flow_id;
 #endif
 
unsigned n_rx_tobe_disc;
@@ -889,9 +893,9 @@ struct vfdi_status;
  * @filter_sem: Filter table rw_semaphore, for freeing the table
  * @filter_lock: Filter table lock, for mere content changes
  * @filter_state: Architecture-dependent filter table state
- * @rps_flow_id: Flow IDs of filters allocated for accelerated RFS,
- * indexed by filter ID
- * @rps_expire_index: Next index to check for expiry in @rps_flow_id
+ * @rps_expire_channel: Next channel to check for expiry
+ * @rps_expire_index: Next index to check for expiry in
+ * @rps_expire_channel's @rps_flow_id
  * @active_queues: Count of RX and TX queues that haven't been flushed and 
drained.
  * @rxq_flush_pending: Count of number of receive queues that need to be 
flushed.
  * Decremented when the efx_flush_rx_queue() is called.
@@ -1035,7 +1039,7 @@ struct efx_nic {
spinlock_t filter_lock;
void *filter_state;
 #ifdef CONFIG_RFS_ACCEL
-   u32 *rps_flow_id;
+   unsigned int rps_expire_channel;
unsigned int rps_expire_index;
 #endif
 
diff --git a/drivers/net/ethernet/sfc/rx.c b/drivers/net/ethernet/sfc/rx.c
index 52790f0..1533c08 100644
--- a/drivers/net/ethernet/sfc/rx.c
+++ b/drivers/net/ethernet/sfc/rx.c
@@ -855,6 +855,9 @@ int efx_filter_rfs(struct net_device *net_dev, const struct 
sk_buff *skb,
int nhoff;
int rc;
 
+   if (flow_id == RPS_FLOW_ID_INVALID)
+

[PATCH net 1/2] sfc: handle nonlinear SKBs in efx_filter_rfs()

2016-05-26 Thread Edward Cree

Previously efx_filter_rfs() assumed that the headers it needed (802.1Q, IP)
 would be present in the linear data area of the SKB.
When running with debugging I found that this is not always the case and
 that in fact the data may all be paged.
So now use skb_header_pointer() to extract the data.

Also replace EFX_BUG_ON_PARANOID checks for insufficient data with checks
 that return -EINVAL, as this case is possible if the received packet was
 too short.

Signed-off-by: Edward Cree 
---
 drivers/net/ethernet/sfc/rx.c | 39 +++
 1 file changed, 23 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/sfc/rx.c b/drivers/net/ethernet/sfc/rx.c
index 8956995..52790f0 100644
--- a/drivers/net/ethernet/sfc/rx.c
+++ b/drivers/net/ethernet/sfc/rx.c
@@ -842,25 +842,32 @@ int efx_filter_rfs(struct net_device *net_dev, const 
struct sk_buff *skb,
struct efx_nic *efx = netdev_priv(net_dev);
struct efx_channel *channel;
struct efx_filter_spec spec;
+   /* 60 octets is the maximum length of an IPv4 header (all IPv6 headers
+* are 40 octets), and we pull 4 more to get the port numbers
+*/
+   #define EFX_RFS_HEADER_LENGTH   (sizeof(struct vlan_hdr) + 60 + 4)
+   unsigned char header[EFX_RFS_HEADER_LENGTH];
+   int headlen = min_t(int, EFX_RFS_HEADER_LENGTH, skb->len);
+   #undef EFX_RFS_HEADER_LENGTH
+   void *hptr;
const __be16 *ports;
__be16 ether_type;
int nhoff;
int rc;
 
-   /* The core RPS/RFS code has already parsed and validated
-* VLAN, IP and transport headers.  We assume they are in the
-* header area.
-*/
+   hptr = skb_header_pointer(skb, 0, headlen, header);
+   if (!hptr)
+   return -EINVAL;
 
if (skb->protocol == htons(ETH_P_8021Q)) {
-   const struct vlan_hdr *vh =
-   (const struct vlan_hdr *)skb->data;
+   const struct vlan_hdr *vh = hptr;
 
/* We can't filter on the IP 5-tuple and the vlan
 * together, so just strip the vlan header and filter
 * on the IP part.
 */
-   EFX_BUG_ON_PARANOID(skb_headlen(skb) < sizeof(*vh));
+   if (headlen < sizeof(*vh))
+   return -EINVAL;
ether_type = vh->h_vlan_encapsulated_proto;
nhoff = sizeof(struct vlan_hdr);
} else {
@@ -881,23 +888,23 @@ int efx_filter_rfs(struct net_device *net_dev, const 
struct sk_buff *skb,
spec.ether_type = ether_type;
 
if (ether_type == htons(ETH_P_IP)) {
-   const struct iphdr *ip =
-   (const struct iphdr *)(skb->data + nhoff);
+   const struct iphdr *ip = hptr + nhoff;
 
-   EFX_BUG_ON_PARANOID(skb_headlen(skb) < nhoff + sizeof(*ip));
+   if (headlen < nhoff + sizeof(*ip))
+   return -EINVAL;
if (ip_is_fragment(ip))
return -EPROTONOSUPPORT;
spec.ip_proto = ip->protocol;
spec.rem_host[0] = ip->saddr;
spec.loc_host[0] = ip->daddr;
-   EFX_BUG_ON_PARANOID(skb_headlen(skb) < nhoff + 4 * ip->ihl + 4);
-   ports = (const __be16 *)(skb->data + nhoff + 4 * ip->ihl);
+   if (headlen < nhoff + 4 * ip->ihl + 4)
+   return -EINVAL;
+   ports = (const __be16 *)(hptr + nhoff + 4 * ip->ihl);
} else {
-   const struct ipv6hdr *ip6 =
-   (const struct ipv6hdr *)(skb->data + nhoff);
+   const struct ipv6hdr *ip6 = (hptr + nhoff);
 
-   EFX_BUG_ON_PARANOID(skb_headlen(skb) <
-   nhoff + sizeof(*ip6) + 4);
+   if (headlen < nhoff + sizeof(*ip6) + 4)
+   return -EINVAL;
spec.ip_proto = ip6->nexthdr;
memcpy(spec.rem_host, &ip6->saddr, sizeof(ip6->saddr));
memcpy(spec.loc_host, &ip6->daddr, sizeof(ip6->daddr));

[PATCH net 0/2] sfc: aRFS fixes

2016-05-26 Thread Edward Cree

The issue fixed in patch #1 was found two years ago and might not still
 happen on current kernels, but (a) we didn't figure out what caused it,
 and (b) the fix should be harmless even if it's unnecessary.

Edward Cree (1):
  sfc: handle nonlinear SKBs in efx_filter_rfs()

Jon Cooper (1):
  sfc: Track RPS flow IDs per channel instead of per function

 drivers/net/ethernet/sfc/efx.c| 32 ++---
 drivers/net/ethernet/sfc/net_driver.h | 12 ---
 drivers/net/ethernet/sfc/rx.c | 68 ++-
 3 files changed, 79 insertions(+), 33 deletions(-)
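
To illustrate the idea behind patch #2 outside the driver: one flow-ID table
per channel, every slot initialised to an "invalid" sentinel, and all tables
freed if any allocation fails. This is a userspace sketch, not the sfc code;
the toy_* names are hypothetical and the sentinel value is an assumption made
for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_FLOW_ID_INVALID 0xFFFFFFFFu	/* assumed sentinel for this sketch */

struct toy_channel {
	uint32_t *flow_id;	/* indexed by filter ID */
};

/* Free every channel's table; safe on channels whose table is NULL. */
static void toy_free_flow_ids(struct toy_channel *ch, size_t n_channels)
{
	size_t c;

	for (c = 0; c < n_channels; c++) {
		free(ch[c].flow_id);
		ch[c].flow_id = NULL;
	}
}

/*
 * Allocate one table per channel and mark every slot invalid.
 * On any allocation failure, free all tables and return -1.
 */
static int toy_alloc_flow_ids(struct toy_channel *ch, size_t n_channels,
			      size_t n_filters)
{
	size_t c, i;
	int success = 1;

	assert(ch != NULL);

	for (c = 0; c < n_channels; c++) {
		ch[c].flow_id = calloc(n_filters, sizeof(*ch[c].flow_id));
		if (!ch[c].flow_id) {
			success = 0;
			continue;
		}
		for (i = 0; i < n_filters; i++)
			ch[c].flow_id[i] = TOY_FLOW_ID_INVALID;
	}

	if (!success) {
		toy_free_flow_ids(ch, n_channels);
		return -1;
	}
	return 0;
}
```

With separate tables, the same flow ID can be recorded on two channels without
one entry being mistaken for the other.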

Re: [RFC PATCH 0/7] tou: Transports over UDP - part I

2016-05-26 Thread Alex Elsayed

Tom Herbert writes:

> 
> Transports over UDP is intended to encapsulate TCP and other transport
> protocols directly and securely in UDP.
> 
> The goal of this work is twofold:
> 
> 1) Allow applications to run their own transport layer stack (i.e.from
>userspace). This eliminates dependencies on the OS (e.g. solves a
>major dependency issue for Facebook on clients).
> 
> 2) Make transport layer headers (all of L4) invisible to the network
>so that they can't do intrusive actions at L4. This will be enforced
>with DTLS in use.

Just popping in to note that this has significant similarities with the
DeDiS group's Tng project[1], which takes the approach of splitting the
"transport layer" into four sub-layers:

1.) Endpoint (What port?)
2.) Flow (Congestion control)
3.) Isolation (Integrity, confidentiality, and preventing middlebox mangling)
4.) Semantic (End-to-end guarantees, fate-sharing, stream vs. dgram vs.
seqpacket, etc)

[1] http://dedis.cs.yale.edu/2009/tng/

Re: [PATCH net 0/4] net/mlx4_en: fix stats

2016-05-26 Thread Or Gerlitz

On Thu, May 26, 2016 at 3:57 PM, Eric Dumazet  wrote:
> On Thu, 2016-05-26 at 12:44 +0300, Tariq Toukan wrote:
>> Hi Eric,
>>
>> > mlx4 has various bugs in its ndo_get_stats() and related functions.
>> > This patch series address the obvious issues.
>> > Remaining ones will be discussed later.
>> >
>> Thanks for the fixes.
>> I see they were already applied.
>> I reviewed them all and replied to patch 4/4, the rest look good to me.
>> Please CC me as the maintainer of mlx4_en on future patches.
>
> If you are mlx4_en maintainer, please submit an official patch so that
> non Mellanox employees can get this information using the normal way ?

Eric, sure. We were in a transition period and Tariq was not fully
familiar with that practice. I'd like to make sure the move is finalized
internally, and then we'll send the patch.

On a somewhat related note: Dave, Eric's patches were sent after our Wed
working hours ended and accepted before our Thu working hours started. Could
we get a better chance to review driver patches before acceptance? I know
there were times where we've screwed up and things didn't get fast attention,
but we're working to improve, so... give us a chance [1]?

Or.

[1] When it comes to weekends, the IL WW ends on Thu, when the US WW still
has almost two full days to go (Thu, Fri), so here the response latency might
be bigger, but Sun-Thu we should be responding on the same day.

Re: BUG: net/netlink: KASAN: use-after-free in netlink_sock_destruct

2016-05-26 Thread Eric Dumazet

On Thu, 2016-05-26 at 22:48 +0800, Baozeng Ding wrote:
> Hi all,
> I've got the following report use-after-free in netlink_sock_destruct while 
> running syzkaller.
> Unfortunately no reproducer.The kernel version is 4.6 (May 15, on commit 
> 2dcd0af568b0cf583645c8a317dd12e344b1c72a). Thanks.
> 
> ==
> BUG: KASAN: use-after-free in kfree_skb+0x28c/0x310 at addr 880036c1179c
> Read of size 4 by task syz-executor/21618
> =
> BUG skbuff_head_cache (Tainted: GW  ): kasan: bad access detected
> -
> 
> Disabling lock debugging due to kernel taint
> INFO: Slab 0xeadb0400 objects=25 used=3 fp=0x880036c116c0 
> flags=0x1fffc004080
> INFO: Object 0x880036c11680 @offset=5760 fp=0x
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
>  0002 88006da07c40 8295f5f1 88003e0fc5c0
>  880036c11680 eadb0400 880036c1 88006da07c70
>  8171144d 88003e0fc5c0 eadb0400 880036c11680
> Call Trace:
>  [< inline >] __dump_stack /lib/dump_stack.c:15
>  [] dump_stack+0xb3/0x112 /lib/dump_stack.c:51
>  [] print_trailer+0x10d/0x190 /mm/slub.c:667
>  [] object_err+0x2f/0x40 /mm/slub.c:674
>  [< inline >] print_address_description /mm/kasan/report.c:179
>  [] kasan_report_error+0x218/0x530 /mm/kasan/report.c:275
>  [] ? debug_check_no_locks_freed+0x290/0x290 
> /kernel/locking/lockdep.c:4212
>  [< inline >] kasan_report /mm/kasan/report.c:297
>  [] __asan_report_load4_noabort+0x3e/0x40 
> /mm/kasan/report.c:317
>  [< inline >] ? atomic_read /include/linux/compiler.h:222
>  [] ? kfree_skb+0x28c/0x310 /net/core/skbuff.c:699
>  [< inline >] atomic_read /include/linux/compiler.h:222
>  [] kfree_skb+0x28c/0x310 /net/core/skbuff.c:699
>  [] netlink_sock_destruct+0xeb/0x2b0 
> /net/netlink/af_netlink.c:334
>  [] ? __netlink_create+0x1d0/0x1d0 
> /net/netlink/af_netlink.c:577
>  [] sk_destruct+0x4a/0x4f0 /net/core/sock.c:1429
>  [] __sk_free+0x57/0x200 /net/core/sock.c:1459
>  [] sk_free+0x30/0x40 /net/core/sock.c:1470
>  [< inline >] sock_put /include/net/sock.h:1506
>  [] deferred_put_nlk_sk+0x34/0x40 
> /net/netlink/af_netlink.c:652
>  [< inline >] __rcu_reclaim /kernel/rcu/rcu.h:118
>  [< inline >] rcu_do_batch /kernel/rcu/tree.c:2681
>  [< inline >] invoke_rcu_callbacks /kernel/rcu/tree.c:2947
>  [< inline >] __rcu_process_callbacks /kernel/rcu/tree.c:2914
>  [] rcu_process_callbacks+0xa71/0x11d0 
> /kernel/rcu/tree.c:2931
>  [< inline >] ? __rcu_reclaim /kernel/rcu/rcu.h:108
>  [< inline >] ? rcu_do_batch /kernel/rcu/tree.c:2681
>  [< inline >] ? invoke_rcu_callbacks /kernel/rcu/tree.c:2947
>  [< inline >] ? __rcu_process_callbacks /kernel/rcu/tree.c:2914
>  [] ? rcu_process_callbacks+0xa1c/0x11d0 
> /kernel/rcu/tree.c:2931
>  [] ? __netlink_deliver_tap+0x7c0/0x7c0 
> /net/netlink/af_netlink.c:204
>  [] __do_softirq+0x22b/0x8da /kernel/softirq.c:273
>  [< inline >] invoke_softirq /kernel/softirq.c:350
>  [] irq_exit+0x15d/0x190 /kernel/softirq.c:391
>  [< inline >] exiting_irq /./arch/x86/include/asm/apic.h:658
>  [] smp_apic_timer_interrupt+0x7b/0xa0 
> /arch/x86/kernel/apic/apic.c:932
>  [] apic_timer_interrupt+0x8c/0xa0 
> /arch/x86/entry/entry_64.S:454
>  [< inline >] ? atomic_add_return /./arch/x86/include/asm/atomic.h:156
>  [< inline >] ? kref_get /include/linux/kref.h:46
>  [] ? klist_next+0x177/0x400 /lib/klist.c:393
>  [< inline >] ? kref_get /include/linux/kref.h:46
>  [] ? klist_next+0x168/0x400 /lib/klist.c:393
>  [] class_dev_iter_next+0x8b/0xd0 /drivers/base/class.c:324
>  [] ? tty_get_pgrp+0x80/0x80 /drivers/tty/tty_io.c:2525
>  [] class_find_device+0x101/0x1c0 /drivers/base/class.c:428
>  [] ? class_for_each_device+0x1d0/0x1d0 
> /drivers/base/class.c:375
>  [< inline >] tty_get_device /drivers/tty/tty_io.c:3139
>  [] alloc_tty_struct+0x5fb/0x840 /drivers/tty/tty_io.c:3183
>  [] ? do_SAK_work+0x20/0x20 /drivers/tty/tty_io.c:3112
>  [] ? mutex_lock_interruptible_nested+0x980/0x980 ??:?
>  [] tty_init_dev+0x78/0x4b0 /drivers/tty/tty_io.c:1532
>  [< inline >] tty_open_by_driver /drivers/tty/tty_io.c:2065
>  [] tty_open+0xd31/0x1050 /drivers/tty/tty_io.c:2113
>  [] ? tty_init_dev+0x4b0/0x4b0 /drivers/tty/tty_io.c:1543
>  [< inline >] ? spin_unlock /include/linux/spinlock.h:347
>  [] ? chrdev_open+0xbf/0x4c0 /fs/char_dev.c:376
>  [] ? tty_init_dev+0x4b0/0x4b0 /drivers/tty/tty_io.c:1543
>  [] chrdev_open+0x22a/0x4c0 /fs/char_dev.c:388
>  [] ? cdev_put+0x60/0x60 /fs/char_dev.c:338
>  [] ? __fsnotify_parent+0x5e/0x2b0 /fs/notify/fsnotify.c:98
>  [] ?

BUG: net/netlink: KASAN: use-after-free in netlink_sock_destruct

2016-05-26 Thread Baozeng Ding

Hi all,
I've got the following use-after-free report in netlink_sock_destruct while
running syzkaller.
Unfortunately there is no reproducer. The kernel version is 4.6 (May 15, on
commit 2dcd0af568b0cf583645c8a317dd12e344b1c72a). Thanks.

==
BUG: KASAN: use-after-free in kfree_skb+0x28c/0x310 at addr 880036c1179c
Read of size 4 by task syz-executor/21618
=
BUG skbuff_head_cache (Tainted: GW  ): kasan: bad access detected
-

Disabling lock debugging due to kernel taint
INFO: Slab 0xeadb0400 objects=25 used=3 fp=0x880036c116c0 
flags=0x1fffc004080
INFO: Object 0x880036c11680 @offset=5760 fp=0x
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
 0002 88006da07c40 8295f5f1 88003e0fc5c0
 880036c11680 eadb0400 880036c1 88006da07c70
 8171144d 88003e0fc5c0 eadb0400 880036c11680
Call Trace:
 [< inline >] __dump_stack /lib/dump_stack.c:15
 [] dump_stack+0xb3/0x112 /lib/dump_stack.c:51
 [] print_trailer+0x10d/0x190 /mm/slub.c:667
 [] object_err+0x2f/0x40 /mm/slub.c:674
 [< inline >] print_address_description /mm/kasan/report.c:179
 [] kasan_report_error+0x218/0x530 /mm/kasan/report.c:275
 [] ? debug_check_no_locks_freed+0x290/0x290 
/kernel/locking/lockdep.c:4212
 [< inline >] kasan_report /mm/kasan/report.c:297
 [] __asan_report_load4_noabort+0x3e/0x40 
/mm/kasan/report.c:317
 [< inline >] ? atomic_read /include/linux/compiler.h:222
 [] ? kfree_skb+0x28c/0x310 /net/core/skbuff.c:699
 [< inline >] atomic_read /include/linux/compiler.h:222
 [] kfree_skb+0x28c/0x310 /net/core/skbuff.c:699
 [] netlink_sock_destruct+0xeb/0x2b0 
/net/netlink/af_netlink.c:334
 [] ? __netlink_create+0x1d0/0x1d0 
/net/netlink/af_netlink.c:577
 [] sk_destruct+0x4a/0x4f0 /net/core/sock.c:1429
 [] __sk_free+0x57/0x200 /net/core/sock.c:1459
 [] sk_free+0x30/0x40 /net/core/sock.c:1470
 [< inline >] sock_put /include/net/sock.h:1506
 [] deferred_put_nlk_sk+0x34/0x40 
/net/netlink/af_netlink.c:652
 [< inline >] __rcu_reclaim /kernel/rcu/rcu.h:118
 [< inline >] rcu_do_batch /kernel/rcu/tree.c:2681
 [< inline >] invoke_rcu_callbacks /kernel/rcu/tree.c:2947
 [< inline >] __rcu_process_callbacks /kernel/rcu/tree.c:2914
 [] rcu_process_callbacks+0xa71/0x11d0 /kernel/rcu/tree.c:2931
 [< inline >] ? __rcu_reclaim /kernel/rcu/rcu.h:108
 [< inline >] ? rcu_do_batch /kernel/rcu/tree.c:2681
 [< inline >] ? invoke_rcu_callbacks /kernel/rcu/tree.c:2947
 [< inline >] ? __rcu_process_callbacks /kernel/rcu/tree.c:2914
 [] ? rcu_process_callbacks+0xa1c/0x11d0 
/kernel/rcu/tree.c:2931
 [] ? __netlink_deliver_tap+0x7c0/0x7c0 
/net/netlink/af_netlink.c:204
 [] __do_softirq+0x22b/0x8da /kernel/softirq.c:273
 [< inline >] invoke_softirq /kernel/softirq.c:350
 [] irq_exit+0x15d/0x190 /kernel/softirq.c:391
 [< inline >] exiting_irq /./arch/x86/include/asm/apic.h:658
 [] smp_apic_timer_interrupt+0x7b/0xa0 
/arch/x86/kernel/apic/apic.c:932
 [] apic_timer_interrupt+0x8c/0xa0 
/arch/x86/entry/entry_64.S:454
 [< inline >] ? atomic_add_return /./arch/x86/include/asm/atomic.h:156
 [< inline >] ? kref_get /include/linux/kref.h:46
 [] ? klist_next+0x177/0x400 /lib/klist.c:393
 [< inline >] ? kref_get /include/linux/kref.h:46
 [] ? klist_next+0x168/0x400 /lib/klist.c:393
 [] class_dev_iter_next+0x8b/0xd0 /drivers/base/class.c:324
 [] ? tty_get_pgrp+0x80/0x80 /drivers/tty/tty_io.c:2525
 [] class_find_device+0x101/0x1c0 /drivers/base/class.c:428
 [] ? class_for_each_device+0x1d0/0x1d0 
/drivers/base/class.c:375
 [< inline >] tty_get_device /drivers/tty/tty_io.c:3139
 [] alloc_tty_struct+0x5fb/0x840 /drivers/tty/tty_io.c:3183
 [] ? do_SAK_work+0x20/0x20 /drivers/tty/tty_io.c:3112
 [] ? mutex_lock_interruptible_nested+0x980/0x980 ??:?
 [] tty_init_dev+0x78/0x4b0 /drivers/tty/tty_io.c:1532
 [< inline >] tty_open_by_driver /drivers/tty/tty_io.c:2065
 [] tty_open+0xd31/0x1050 /drivers/tty/tty_io.c:2113
 [] ? tty_init_dev+0x4b0/0x4b0 /drivers/tty/tty_io.c:1543
 [< inline >] ? spin_unlock /include/linux/spinlock.h:347
 [] ? chrdev_open+0xbf/0x4c0 /fs/char_dev.c:376
 [] ? tty_init_dev+0x4b0/0x4b0 /drivers/tty/tty_io.c:1543
 [] chrdev_open+0x22a/0x4c0 /fs/char_dev.c:388
 [] ? cdev_put+0x60/0x60 /fs/char_dev.c:338
 [] ? __fsnotify_parent+0x5e/0x2b0 /fs/notify/fsnotify.c:98
 [] ? security_file_open+0x89/0x190 /security/security.c:840
 [] do_dentry_open+0x6a2/0xcb0 /fs/open.c:736
 [] ? cdev_put+0x60/0x60 /fs/char_dev.c:338
 [] vfs_open+0x113/0x210 /fs/open.c:849
 [] ? may_open+0x1cd/0x260 /fs/namei.c:2776
 [< inline >]

Re: [PATCH RESEND 7/8] pipe: account to kmemcg

2016-05-26 Thread Eric Dumazet

On Thu, 2016-05-26 at 16:59 +0300, Vladimir Davydov wrote:
> On Thu, May 26, 2016 at 04:04:55PM +0900, Minchan Kim wrote:
> > On Wed, May 25, 2016 at 01:30:11PM +0300, Vladimir Davydov wrote:
> > > On Tue, May 24, 2016 at 01:04:33PM -0700, Eric Dumazet wrote:
> > > > On Tue, 2016-05-24 at 19:13 +0300, Vladimir Davydov wrote:
> > > > > On Tue, May 24, 2016 at 05:59:02AM -0700, Eric Dumazet wrote:
> > > > > ...
> > > > > > > +static int anon_pipe_buf_steal(struct pipe_inode_info *pipe,
> > > > > > > +struct pipe_buffer *buf)
> > > > > > > +{
> > > > > > > + struct page *page = buf->page;
> > > > > > > +
> > > > > > > + if (page_count(page) == 1) {
> > > > > > 
> > > > > > This looks racy : some cpu could have temporarily elevated page 
> > > > > > count.
> > > > > 
> > > > > All pipe operations (pipe_buf_operations->get, ->release, ->steal) are
> > > > > supposed to be called under pipe_lock. So, if we see a 
> > > > > pipe_buffer->page
> > > > > with refcount of 1 in ->steal, that means that we are the only its 
> > > > > user
> > > > > and it can't be spliced to another pipe.
> > > > > 
> > > > > In fact, I just copied the code from generic_pipe_buf_steal, adding
> > > > > kmemcg related checks along the way, so it should be fine.
> > > > 
> > > > So you guarantee that no other cpu might have done
> > > > get_page_unless_zero() right before this test ?
> > > 
> > > Each pipe_buffer holds a reference to its page. If we find page's
> > > refcount to be 1 here, then it can be referenced only by our
> > > pipe_buffer. And the refcount cannot be increased by a parallel thread,
> > > because we hold pipe_lock, which rules out splice, and otherwise it's
> > > impossible to reach the page as it is not on lru. That said, I think I
> > > guarantee that this should be safe.
> > 
> > I don't know kmemcg internal and pipe stuff so my comment might be
> > totally crap.
> > 
> > No one cannot guarantee any CPU cannot held a reference of a page.
> > Look at get_page_unless_zero usecases.
> > 
> > 1. balloon_page_isolate
> > 
> > It can hold a reference in random page and then verify the page
> > is balloon page. Otherwise, just put.
> > 
> > 2. page_idle_get_page
> > 
> > It has PageLRU check but it's racy so it can hold a reference
> > of randome page and then verify within zone->lru_lock. If it's
> > not LRU page, just put.
> 
> Well, I see your concern now - even if a page is not on lru and we
> locked all structs pointing to it, it can always get accessed by pfn in
> a completely unrelated thread, like in examples you gave above. That's a
> fair point.
> 
> However, I still think that it's OK in case of pipe buffers. What can
> happen if somebody takes a transient reference to a pipe buffer page? At
> worst, we'll see page_count > 1 due to temporary ref and abort stealing,
> falling back on copying instead. That's OK, because stealing is not
> guaranteed. Can a function that takes a transient ref to page by pfn
> mistakenly assume that this is a page it's interested in? I don't think
> so, because this page has no marks on it except special _mapcount value,
> which should only be set on kmemcg pages.

Well, all this information deserves to be in the changelog.

Maybe in 6 months, this will be incredibly useful for bug hunting.

pipes can be used to exchange data (or pages) between processes in
different domains.

If kmemcg is not precise, this could be used by some attackers to force
some processes to consume all their budget and eventually not be able to
allocate new pages.

[RFC net-next 1/1] ethtool: Add support for set eeprom metadata.

2016-05-26 Thread Sudarsana Reddy Kalluru

Currently the ethtool implementation does not have a way to pass metadata
for eeprom-related operations. Some adapters have a complicated
non-volatile memory implementation that requires additional information;
there are drivers [bnx2x and bnxt] that use the ‘magic’ field in the
{G,S}EEPROM commands for that purpose, although that’s not its intended usage.

This patch adds a provision to pass the eeprom metadata for
%ETHTOOL_SEEPROM/%ETHTOOL_GEEPROM implementations. The driver caches the
user-provided metadata and assigns a magic value, which the application
needs to use for the subsequent {G,S}EEPROM command.
---
 include/linux/ethtool.h  |  1 +
 include/uapi/linux/ethtool.h | 29 +
 net/core/ethtool.c   | 25 +
 3 files changed, 55 insertions(+)

diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
index 9ded8c6..6f98c2a 100644
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -316,6 +316,7 @@ struct ethtool_ops {
  struct ethtool_eeprom *, u8 *);
int (*set_eeprom)(struct net_device *,
  struct ethtool_eeprom *, u8 *);
+   int (*set_meeprom)(struct net_device *, struct ethtool_meeprom *);
int (*get_coalesce)(struct net_device *, struct ethtool_coalesce *);
int (*set_coalesce)(struct net_device *, struct ethtool_coalesce *);
void(*get_ringparam)(struct net_device *,
diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
index 9222db8..f0f28a8 100644
--- a/include/uapi/linux/ethtool.h
+++ b/include/uapi/linux/ethtool.h
@@ -279,6 +279,13 @@ struct ethtool_regs {
  * The value passed in to %ETHTOOL_SEEPROM must match the value
  * returned by %ETHTOOL_GEEPROM for the same device.  This is
  * unused when @cmd is %ETHTOOL_GMODULEEEPROM.
+ * Depending on @cmd, the value passed must match:
+ * - For %ETHTOOL_GMODULEEEPROM, no requirements.
+ * - For %ETHTOOL_GEEPROM, must match the value returned by the
+ * previous %ETHTOOL_SEEPROMMETA in case device supports it.
+ * - For %ETHTOOL_SEEPROM, must match value returned by the previous
+ * %ETHTOOL_SEEPROMMETA in case device supports it, or by
+ * previous %ETHTOOL_GEEPROM otherwise.
  * @offset: Offset within the EEPROM to begin reading/writing, in bytes
  * @len: On entry, number of bytes to read/write.  On successful
  * return, number of bytes actually read/written.  In case of
@@ -298,6 +305,27 @@ struct ethtool_eeprom {
 };
 
 /**
+ * struct ethtool_meeprom - Set EEPROM metadata
+ * @cmd: Command number = %ETHTOOL_SEEPROMMETA
+ * @metadata: EEPROM metadata that applies to the subsequent %ETHTOOL_SEEPROM
+ * or %ETHTOOL_GEEPROM command.
+ * @magic: A 'magic cookie' value to be used by the subsequent
+ * %ETHTOOL_SEEPROM/%ETHTOOL_GEEPROM command. User-provided eeprom metadata
+ * will be associated with a magic value. The value passed in to
+ * %ETHTOOL_SEEPROM or %ETHTOOL_GEEPROM must match the value returned by
+ * %ETHTOOL_SEEPROMMETA for the same device.
+ * Essentially, the user sends eeprom metadata, followed by an
+ * %ETHTOOL_SEEPROM or %ETHTOOL_GEEPROM command. It is the driver's
+ * responsibility to map the metadata to the subsequent eeprom command,
+ * using the magic to validate correctness.
+ */
+struct ethtool_meeprom {
+   __u32 cmd;
+   __u32 metadata;
+   __u32 magic;
+};
+
+/**
  * struct ethtool_eee - Energy Efficient Ethernet information
  * @cmd: ETHTOOL_{G,S}EEE
  * @supported: Mask of %SUPPORTED_* flags for the speed/duplex combinations
@@ -1315,6 +1343,7 @@ struct ethtool_per_queue_op {
 #define ETHTOOL_GLINKSETTINGS  0x004c /* Get ethtool_link_settings */
 #define ETHTOOL_SLINKSETTINGS  0x004d /* Set ethtool_link_settings */
 
+#define ETHTOOL_SEEPROMMETA0x004e /* Set eeprom metadata */
 
 /* compatibility with older code */
 #define SPARC_ETH_GSET ETHTOOL_GSET
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index bdb4013..c5efadd 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -2421,6 +2421,28 @@ static int ethtool_set_per_queue(struct net_device *dev, 
void __user *useraddr)
};
 }
 
+static int ethtool_set_meeprom(struct net_device *dev, void __user *useraddr)
+{
+   const struct ethtool_ops *ops = dev->ethtool_ops;
+   struct ethtool_meeprom meeprom;
+   int ret;
+
+   if (!ops->set_meeprom)
+   return -EOPNOTSUPP;
+
+   if (copy_from_user(&meeprom, useraddr, sizeof(meeprom)))
+   return -EFAULT;
+
+   ret = ops->set_meeprom(dev, &meeprom);
+   if (ret)
+   return ret;
+
+   if (copy_to_user(useraddr, &meeprom, sizeof(meeprom)))
+   return -EFAULT;
+
+   return 0;
+}
+
 /* The main entry point in this file.  Called from net/core/dev_ioctl.c */
 
 int dev_ethtool(struct net *net, struct ifreq *ifr)
@@

[PATCH] net: Fragment large datagrams even when IP_HDRINCL is set.

2016-05-26 Thread Alan Davey

One of the bugs documented in the raw(7) man page is as follows: When the
IP_HDRINCL option is set, datagrams will not be fragmented and are limited to
the interface MTU.

This patch fixes the bug by removing the check for "length > rt->dst.dev->mtu"
in raw_send_hdrinc() (net/ipv4/raw.c).  Datagrams are no longer limited to the
interface MTU size if the IP_HDRINCL option is set, but are fragmented, if
necessary, in the same way as all other datagrams.

Signed-off-by: Alan Davey 
---
 net/ipv4/raw.c | 5 -
 1 file changed, 5 deletions(-)

diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index 8d22de7..de690b3 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -351,11 +351,6 @@ static int raw_send_hdrinc(struct sock *sk, struct flowi4 
*fl4,
struct rtable *rt = *rtp;
int hlen, tlen;
 
-   if (length > rt->dst.dev->mtu) {
-   ip_local_error(sk, EMSGSIZE, fl4->daddr, inet->inet_dport,
-  rt->dst.dev->mtu);
-   return -EMSGSIZE;
-   }
if (flags&MSG_PROBE)
goto out;
 
-- 
1.8.3.1
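
For a sense of what "fragmented, if necessary" means in numbers, here is a
small sketch of standard IPv4 fragmentation arithmetic: every fragment except
the last carries a payload that is a multiple of 8 bytes, and each fragment
repeats the IP header. toy_ipv4_fragments() is a hypothetical helper written
for this note, not the kernel's fragmentation code, and it assumes the header
length is the same in every fragment.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Number of IPv4 fragments needed to send a datagram of total_len bytes
 * (IP header included) over a link with the given MTU.
 */
static size_t toy_ipv4_fragments(size_t total_len, size_t hdr_len, size_t mtu)
{
	size_t payload, max_payload, per_frag, count = 1;

	assert(hdr_len >= 20 && total_len >= hdr_len && mtu >= hdr_len + 8);

	payload = total_len - hdr_len;
	max_payload = mtu - hdr_len;		/* what the last fragment may carry */
	per_frag = max_payload & ~(size_t)7;	/* non-final fragments: multiple of 8 */

	while (payload > max_payload) {
		payload -= per_frag;
		count++;
	}
	return count;
}
```

For example, a 3000-byte datagram with a 20-byte header over a 1500-byte MTU
splits into payloads of 1480, 1480 and 20 bytes, i.e. three fragments.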

Re: [PATCH RESEND 7/8] pipe: account to kmemcg

2016-05-26 Thread Vladimir Davydov

On Thu, May 26, 2016 at 04:04:55PM +0900, Minchan Kim wrote:
> On Wed, May 25, 2016 at 01:30:11PM +0300, Vladimir Davydov wrote:
> > On Tue, May 24, 2016 at 01:04:33PM -0700, Eric Dumazet wrote:
> > > On Tue, 2016-05-24 at 19:13 +0300, Vladimir Davydov wrote:
> > > > On Tue, May 24, 2016 at 05:59:02AM -0700, Eric Dumazet wrote:
> > > > ...
> > > > > > +static int anon_pipe_buf_steal(struct pipe_inode_info *pipe,
> > > > > > +  struct pipe_buffer *buf)
> > > > > > +{
> > > > > > +   struct page *page = buf->page;
> > > > > > +
> > > > > > +   if (page_count(page) == 1) {
> > > > > 
> > > > > This looks racy : some cpu could have temporarily elevated page count.
> > > > 
> > > > All pipe operations (pipe_buf_operations->get, ->release, ->steal) are
> > > > supposed to be called under pipe_lock. So, if we see a pipe_buffer->page
> > > > with refcount of 1 in ->steal, that means that we are the only its user
> > > > and it can't be spliced to another pipe.
> > > > 
> > > > In fact, I just copied the code from generic_pipe_buf_steal, adding
> > > > kmemcg related checks along the way, so it should be fine.
> > > 
> > > So you guarantee that no other cpu might have done
> > > get_page_unless_zero() right before this test ?
> > 
> > Each pipe_buffer holds a reference to its page. If we find page's
> > refcount to be 1 here, then it can be referenced only by our
> > pipe_buffer. And the refcount cannot be increased by a parallel thread,
> > because we hold pipe_lock, which rules out splice, and otherwise it's
> > impossible to reach the page as it is not on lru. That said, I think I
> > guarantee that this should be safe.
> 
> I don't know kmemcg internal and pipe stuff so my comment might be
> totally crap.
> 
> No one can guarantee that no CPU holds a reference to a page.
> Look at get_page_unless_zero usecases.
> 
> 1. balloon_page_isolate
> 
> It can hold a reference in random page and then verify the page
> is balloon page. Otherwise, just put.
> 
> 2. page_idle_get_page
> 
> It has PageLRU check but it's racy so it can hold a reference
> of a random page and then verify within zone->lru_lock. If it's
> not LRU page, just put.

Well, I see your concern now - even if a page is not on lru and we
locked all structs pointing to it, it can always get accessed by pfn in
a completely unrelated thread, like in examples you gave above. That's a
fair point.

However, I still think that it's OK in case of pipe buffers. What can
happen if somebody takes a transient reference to a pipe buffer page? At
worst, we'll see page_count > 1 due to temporary ref and abort stealing,
falling back on copying instead. That's OK, because stealing is not
guaranteed. Can a function that takes a transient ref to page by pfn
mistakenly assume that this is a page it's interested in? I don't think
so, because this page has no marks on it except special _mapcount value,
which should only be set on kmemcg pages.

Thanks,
Vladimir
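
The argument above (a transient extra reference only makes the steal fail,
and the caller then falls back to copying) can be sketched in plain C. This
is a simplified single-threaded userspace model, not the kernel code:
struct fake_page, try_steal() and steal_or_copy() are hypothetical names,
and a plain int stands in for the real page refcount.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 64

/* Hypothetical stand-in for a page with a reference count. */
struct fake_page {
	int refcount;
	unsigned char data[BUF_SIZE];
};

/* Stealing is allowed only when the caller holds the sole reference. */
static bool try_steal(const struct fake_page *page)
{
	assert(page != NULL);
	return page->refcount == 1;
}

/*
 * Hand over src itself when it can be stolen; otherwise copy its
 * contents into spare. Returns true if stolen, false if copied.
 */
static bool steal_or_copy(struct fake_page *src, struct fake_page *spare,
			  struct fake_page **out)
{
	assert(spare != NULL && out != NULL);

	if (try_steal(src)) {
		*out = src;
		return true;
	}

	/* Someone else holds a reference: fall back to copying. */
	memcpy(spare->data, src->data, BUF_SIZE);
	*out = spare;
	return false;
}
```

In this model a second reference never corrupts anything; it only turns
the cheap path into the copy path, which mirrors the "stealing is not
guaranteed" point made in the reply.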

Re: [PATCH v2] net: alx: use custom skb allocator

2016-05-26 Thread Eric Dumazet

On Thu, 2016-05-26 at 16:41 +0800, Feng Tang wrote:
> On Wed, May 25, 2016 at 07:53:41PM -0400, David Miller wrote:

> > 
> > But now that we have at least two instances of this code we really
> > need to put a common version somewhere. :-/
> 
> I agree, and furthermore I noticed there are some similar routines
> in the 4 individual Atheros drivers atlx/alx/atl1c/atl1e, which may
> be unified by a simple framework for them all. Maybe the driver
> maintainer from Atheros could take a look, as they can reach all
> the real HWs :)

Note that you could also use the napi_get_frags() API that other drivers
use, and you attach page frags to the skb.

(Ie use small skb->head allocations where the stack will pull the
headers, but use your own page based allocations for the frames)

This might allow you to use the page reuse trick that some Intel drivers
use.

Look for igb_can_reuse_rx_page() for an example.

Thanks.

[PATCH net] ieee802154: fix logic error in ieee802154_llsec_parse_dev_addr

2016-05-26 Thread Baozeng Ding

Fix a logic error to avoid potential null pointer dereference.

Signed-off-by: Baozeng Ding 
---
 net/ieee802154/nl802154.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ieee802154/nl802154.c b/net/ieee802154/nl802154.c
index ca207db..116187b 100644
--- a/net/ieee802154/nl802154.c
+++ b/net/ieee802154/nl802154.c
@@ -1289,8 +1289,8 @@ ieee802154_llsec_parse_dev_addr(struct nlattr *nla,
 nl802154_dev_addr_policy))
return -EINVAL;
 
-   if (!attrs[NL802154_DEV_ADDR_ATTR_PAN_ID] &&
-   !attrs[NL802154_DEV_ADDR_ATTR_MODE] &&
+   if (!attrs[NL802154_DEV_ADDR_ATTR_PAN_ID] ||
+   !attrs[NL802154_DEV_ADDR_ATTR_MODE] ||
!(attrs[NL802154_DEV_ADDR_ATTR_SHORT] ||
  attrs[NL802154_DEV_ADDR_ATTR_EXTENDED]))
return -EINVAL;
-- 
1.9.1
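
To illustrate why the && to || change matters, here is a small standalone
sketch of the two predicates. The enum values and function names are
hypothetical stand-ins for the NL802154_DEV_ADDR_ATTR_* indices; only the
boolean structure is taken from the hunk above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the netlink attribute indices. */
enum { ATTR_PAN_ID, ATTR_MODE, ATTR_SHORT, ATTR_EXTENDED, ATTR_MAX };

/* Old check: rejects only when every required attribute is missing. */
static bool rejected_old(const void *const *attrs)
{
	assert(attrs != NULL);
	return !attrs[ATTR_PAN_ID] && !attrs[ATTR_MODE] &&
	       !(attrs[ATTR_SHORT] || attrs[ATTR_EXTENDED]);
}

/* New check: rejects as soon as any required attribute is missing. */
static bool rejected_new(const void *const *attrs)
{
	assert(attrs != NULL);
	return !attrs[ATTR_PAN_ID] || !attrs[ATTR_MODE] ||
	       !(attrs[ATTR_SHORT] || attrs[ATTR_EXTENDED]);
}
```

With only the PAN ID present, the old predicate lets the request through
and later code would dereference the missing MODE attribute; the new
predicate rejects it.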

RE: [PATCH net 4/4] net/mlx4_en: get rid of private net_device_stats

2016-05-26 Thread David Laight

From: Tariq Toukan
> Sent: 26 May 2016 10:39
...
> I am aware that clearing the stats structure might be redundant today,
> as the function is called only within mlx4_en_open, but we might want to
> call the function in other flows in the future.

My personal view is that stats should never be cleared.
Any code that wants stats deltas should be remembering the old
values itself.

For SNMP (etc) you may want to save the time when the stats block
was created (when everything would be zero).

David

Re: [PATCH net 4/4] net/mlx4_en: get rid of private net_device_stats

2016-05-26 Thread Eric Dumazet

On Thu, 2016-05-26 at 12:38 +0300, Tariq Toukan wrote:
> Hi Eric,
> 
> Thanks for the fix.
> 
> > We do not need to clear fields that are already 0.
> Why is it always true that dev->stats is already 0 at the point ndo_open 
> is called?

netdev structs are zero allocated. (kzalloc)

Re: [PATCH net 0/4] net/mlx4_en: fix stats

2016-05-26 Thread Eric Dumazet

On Thu, 2016-05-26 at 12:44 +0300, Tariq Toukan wrote:
> Hi Eric,
> 
> > mlx4 has various bugs in its ndo_get_stats() and related functions.
> > This patch series address the obvious issues.
> > Remaining ones will be discussed later.
> >
> Thanks for the fixes.
> I see they were already applied.
> I reviewed them all and replied to patch 4/4, the rest look good to me.
> Please CC me as the maintainer of mlx4_en on future patches.

If you are mlx4_en maintainer, please submit an official patch so that
non Mellanox employees can get this information using the normal way ?

Thanks.

diff --git a/MAINTAINERS b/MAINTAINERS
index 8d397157981c..4a35221e4819 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7250,6 +7250,7 @@ F:drivers/scsi/megaraid/
 
 MELLANOX ETHERNET DRIVER (mlx4_en)
 M: Eugenia Emantayev 
+M: Tariq Toukan 
 L: netdev@vger.kernel.org
 S: Supported
 W: http://www.mellanox.com

Re: [PATCH net 4/4] net/mlx4_en: get rid of private net_device_stats

2016-05-26 Thread Eric Dumazet

On Thu, 2016-05-26 at 12:38 +0300, Tariq Toukan wrote:
> Hi Eric,
> 
> Thanks for the fix.
> 
> > We do not need to clear fields that are already 0.
> Why is it always true that dev->stats is already 0 at the point ndo_open 
> is called?
> Is it true also in a flow of open -> stop -> open? I searched the kernel 
> stack for this but couldn't find.
> > @@ -1877,7 +1877,6 @@ static void mlx4_en_clear_stats(struct net_device *dev)
> > if (mlx4_en_DUMP_ETH_STATS(mdev, priv->port, 1))
> > en_dbg(HW, priv, "Failed dumping statistics\n");
> >   
> > -   memset(&priv->stats, 0, sizeof(priv->stats));
> > memset(&priv->pstats, 0, sizeof(priv->pstats));
> > memset(&priv->pkstats, 0, sizeof(priv->pkstats));
> > memset(&priv->port_stats, 0, sizeof(priv->port_stats));
> The role of this function is to clear the stats, no matter when and 
> where it is called.
> I am aware that clearing the stats structure might be redundant today, 
> as the function is called only within mlx4_en_open, but we might want to 
> call the function in other flows in the future.

This is the ' non obvious fix' we need to discuss for net-next

When we enslave a mlx4 NIC in a bonding driver, we sometimes get
incorrect bond stats because mlx4 stats are reset.

These stats being computed using deltas, this can not work as is.

I believe the rule is to not clear the netdev stats at open()

Anyway, for this particular path, it does not matter, as
mlx4_en_DUMP_ETH_STATS() will set all the fields to their value.

[PATCH 1/1] net: nps_enet: Disable interrupts before napi reschedule

2016-05-26 Thread Elad Kanfi

From: Elad Kanfi 

Since NAPI works by shutting down event interrupts when there's
work and turning them on when there's none, the net driver must
make sure that interrupts are disabled when it reschedules polling.
By calling napi_reschedule, the driver switches to polling mode,
therefore there should be no interrupt interference.
Any received packets will be handled in nps_enet_poll by polling the HW
indication of received packet until all packets are handled.

Signed-off-by: Elad Kanfi 
Acked-by: Noam Camus 
---
 drivers/net/ethernet/ezchip/nps_enet.c |4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/ezchip/nps_enet.c b/drivers/net/ethernet/ezchip/nps_enet.c
index 085f912..06f0317 100644
--- a/drivers/net/ethernet/ezchip/nps_enet.c
+++ b/drivers/net/ethernet/ezchip/nps_enet.c
@@ -205,8 +205,10 @@ static int nps_enet_poll(struct napi_struct *napi, int budget)
 * re-adding ourselves to the poll list.
 */
 
-   if (priv->tx_skb && !tx_ctrl_ct)
+   if (priv->tx_skb && !tx_ctrl_ct) {
+   nps_enet_reg_set(priv, NPS_ENET_REG_BUF_INT_ENABLE, 0);
napi_reschedule(napi);
+   }
}
 
return work_done;
-- 
1.7.1

[PATCH v1 1/1] net/lapb: use %*ph to dump buffers

2016-05-26 Thread Andy Shevchenko

Use %*ph specifier to dump small buffers in hex format instead of doing this
byte-by-byte.

Signed-off-by: Andy Shevchenko 
---
 net/lapb/lapb_in.c   |  5 ++---
 net/lapb/lapb_out.c  |  4 +---
 net/lapb/lapb_subr.c | 14 +-
 3 files changed, 8 insertions(+), 15 deletions(-)

diff --git a/net/lapb/lapb_in.c b/net/lapb/lapb_in.c
index 5dba899..1824708 100644
--- a/net/lapb/lapb_in.c
+++ b/net/lapb/lapb_in.c
@@ -444,10 +444,9 @@ static void lapb_state3_machine(struct lapb_cb *lapb, struct sk_buff *skb,
break;
 
case LAPB_FRMR:
-   lapb_dbg(1, "(%p) S3 RX FRMR(%d) %02X %02X %02X %02X %02X\n",
+   lapb_dbg(1, "(%p) S3 RX FRMR(%d) %5ph\n",
 lapb->dev, frame->pf,
-skb->data[0], skb->data[1], skb->data[2],
-skb->data[3], skb->data[4]);
+skb->data);
lapb_establish_data_link(lapb);
lapb_dbg(0, "(%p) S3 -> S1\n", lapb->dev);
lapb_requeue_frames(lapb);
diff --git a/net/lapb/lapb_out.c b/net/lapb/lapb_out.c
index ba4d015..482c94d 100644
--- a/net/lapb/lapb_out.c
+++ b/net/lapb/lapb_out.c
@@ -148,9 +148,7 @@ void lapb_transmit_buffer(struct lapb_cb *lapb, struct sk_buff *skb, int type)
}
}
 
-   lapb_dbg(2, "(%p) S%d TX %02X %02X %02X\n",
-lapb->dev, lapb->state,
-skb->data[0], skb->data[1], skb->data[2]);
+   lapb_dbg(2, "(%p) S%d TX %3ph\n", lapb->dev, lapb->state, skb->data);
 
if (!lapb_data_transmit(lapb, skb))
kfree_skb(skb);
diff --git a/net/lapb/lapb_subr.c b/net/lapb/lapb_subr.c
index 9d0a426..3c1914d 100644
--- a/net/lapb/lapb_subr.c
+++ b/net/lapb/lapb_subr.c
@@ -113,9 +113,7 @@ int lapb_decode(struct lapb_cb *lapb, struct sk_buff *skb,
 {
frame->type = LAPB_ILLEGAL;
 
-   lapb_dbg(2, "(%p) S%d RX %02X %02X %02X\n",
-lapb->dev, lapb->state,
-skb->data[0], skb->data[1], skb->data[2]);
+   lapb_dbg(2, "(%p) S%d RX %3ph\n", lapb->dev, lapb->state, skb->data);
 
/* We always need to look at 2 bytes, sometimes we need
 * to look at 3 and those cases are handled below.
@@ -284,10 +282,9 @@ void lapb_transmit_frmr(struct lapb_cb *lapb)
dptr++;
*dptr++ = lapb->frmr_type;
 
-   lapb_dbg(1, "(%p) S%d TX FRMR %02X %02X %02X %02X %02X\n",
+   lapb_dbg(1, "(%p) S%d TX FRMR %5ph\n",
 lapb->dev, lapb->state,
-skb->data[1], skb->data[2], skb->data[3],
-skb->data[4], skb->data[5]);
+&skb->data[1]);
} else {
dptr= skb_put(skb, 4);
*dptr++ = LAPB_FRMR;
@@ -299,9 +296,8 @@ void lapb_transmit_frmr(struct lapb_cb *lapb)
dptr++;
*dptr++ = lapb->frmr_type;
 
-   lapb_dbg(1, "(%p) S%d TX FRMR %02X %02X %02X\n",
-lapb->dev, lapb->state, skb->data[1],
-skb->data[2], skb->data[3]);
+   lapb_dbg(1, "(%p) S%d TX FRMR %3ph\n",
+lapb->dev, lapb->state, &skb->data[1]);
}
 
lapb_transmit_buffer(lapb, skb, LAPB_RESPONSE);
-- 
2.8.1
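
For readers unfamiliar with the specifier: in the kernel's printk, "%Nph"
prints N bytes from the pointer as two-digit hex values separated by
spaces (lower-case, as far as I know, so the letter case differs from the
%02X calls being replaced). The following userspace function only
emulates that output format; format_ph() is a hypothetical helper, not a
kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Write len bytes of buf into out as lower-case hex pairs separated by
 * single spaces, e.g. "01 87 00". Returns the number of characters
 * written (excluding the terminating NUL), or -1 if out is too small.
 */
static int format_ph(char *out, size_t out_len,
		     const unsigned char *buf, size_t len)
{
	size_t pos = 0;

	assert(out != NULL && buf != NULL);
	if (out_len == 0)
		return -1;
	out[0] = '\0';

	for (size_t i = 0; i < len; i++) {
		int n = snprintf(out + pos, out_len - pos,
				 i ? " %02x" : "%02x", buf[i]);

		if (n < 0 || (size_t)n >= out_len - pos)
			return -1;
		pos += (size_t)n;
	}
	return (int)pos;
}
```

Five bytes therefore take 14 characters, which is what a "%5ph" in the
FRMR debug lines above would produce for skb->data.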

Re: usbnet: smsc95xx: fix link detection for disabled autonegotiation

2016-05-26 Thread Christoph Fritz

On Thu, 2016-05-26 at 04:31 +0200, Andrew Lunn wrote:
> On Thu, May 26, 2016 at 04:06:47AM +0200, Christoph Fritz wrote:
> > To detect link status up/down for connections where autonegotiation is
> > explicitly disabled, we don't get an irq but need to poll the status
> > register for link up/down detection.
> > This patch adds a workqueue to poll for link status.
> 
> Did you consider using the phylib? It probably does the needed polling
> already, and it looks like the functions needed to implement an MDIO
> bus are already in place.

smsc95xx supports a relative wide range of PHYs which I don't have
access to in regard of testing. So I prefer the least invasive one (with
this patch) as mostly all of the other usbnet drivers do.

A merge to phylib while paying attention to all the suspend modes and
testing the wide range of PHYs would surely be the right thing to do.

Any thoughts on that?

Thanks
  -- Christoph

Re: [PATCH percpu/for-4.7-fixes 2/2] percpu: fix synchronization between synchronous map extension and chunk destruction

2016-05-26 Thread Vlastimil Babka


On 05/25/2016 05:45 PM, Tejun Heo wrote:

> For non-atomic allocations, pcpu_alloc() can try to extend the area
> map synchronously after dropping pcpu_lock; however, the extension
> wasn't synchronized against chunk destruction and the chunk might get
> freed while extension is in progress.
>
> This patch fixes the bug by putting most of non-atomic allocations
> under pcpu_alloc_mutex to synchronize against pcpu_balance_work which
> is responsible for async chunk management including destruction.
>
> Signed-off-by: Tejun Heo 
> Reported-and-tested-by: Alexei Starovoitov 
> Reported-by: Vlastimil Babka 
> Reported-by: Sasha Levin 
> Cc: sta...@vger.kernel.org # v3.18+
> Fixes: 1a4d76076cda ("percpu: implement asynchronous chunk population")


Didn't spot any problems this time.

Thanks

Re: [PATCH net 0/4] net/mlx4_en: fix stats

2016-05-26 Thread Tariq Toukan


Hi Eric,


> mlx4 has various bugs in its ndo_get_stats() and related functions.
> This patch series address the obvious issues.
> Remaining ones will be discussed later.


Thanks for the fixes.
I see they were already applied.
I reviewed them all and replied to patch 4/4, the rest look good to me.
Please CC me as the maintainer of mlx4_en on future patches.

Regards,
Tariq

Re: [PATCH net 4/4] net/mlx4_en: get rid of private net_device_stats

2016-05-26 Thread Tariq Toukan


Hi Eric,

Thanks for the fix.


> We do not need to clear fields that are already 0.
Why is it always true that dev->stats is already 0 at the point ndo_open 
is called?
Is it true also in a flow of open -> stop -> open? I searched the kernel 
stack for this but couldn't find.

> @@ -1877,7 +1877,6 @@ static void mlx4_en_clear_stats(struct net_device *dev)
> if (mlx4_en_DUMP_ETH_STATS(mdev, priv->port, 1))
> en_dbg(HW, priv, "Failed dumping statistics\n");
>   
> -	memset(&priv->stats, 0, sizeof(priv->stats));
> memset(&priv->pstats, 0, sizeof(priv->pstats));
> memset(&priv->pkstats, 0, sizeof(priv->pkstats));
> memset(&priv->port_stats, 0, sizeof(priv->port_stats));
The role of this function is to clear the stats, no matter when and 
where it is called.
I am aware that clearing the stats structure might be redundant today, 
as the function is called only within mlx4_en_open, but we might want to 
call the function in other flows in the future.


Regards,
Tariq

Re: [PATCH net v2] virtio_net: fix virtnet_open and virtnet_probe competing for try_fill_recv

2016-05-26 Thread Michael S. Tsirkin

On Thu, May 26, 2016 at 02:27:59PM +0800, wangyunjian wrote:
> In function virtnet_open() and virtnet_probe(), func try_fill_recv()
> will be executed at the same time. VQ in virtqueue_add() is not protected
> well and BUG_ON will be triggered when virtio_net.ko is being removed.
> 
> Signed-off-by: Yunjian Wang 

Acked-by: Michael S. Tsirkin 

> ---
>  drivers/net/virtio_net.c | 18 ++
>  1 file changed, 2 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 49d84e5..e0638e5 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1925,24 +1925,11 @@ static int virtnet_probe(struct virtio_device *vdev)
> 
> virtio_device_ready(vdev);
> 
> -   /* Last of all, set up some receive buffers. */
> -   for (i = 0; i < vi->curr_queue_pairs; i++) {
> -   try_fill_recv(vi, &vi->rq[i], GFP_KERNEL);
> -
> -   /* If we didn't even get one input buffer, we're useless. */
> -   if (vi->rq[i].vq->num_free ==
> -   virtqueue_get_vring_size(vi->rq[i].vq)) {
> -   free_unused_bufs(vi);
> -   err = -ENOMEM;
> -   goto free_recv_bufs;
> -   }
> -   }
> -
> vi->nb.notifier_call = &virtnet_cpu_callback;
> err = register_hotcpu_notifier(&vi->nb);
> if (err) {
> pr_debug("virtio_net: registering cpu notifier failed\n");
> -   goto free_recv_bufs;
> +   goto free_unregister_netdev;
> }
> 
> /* Assume link up if device can't report link status,
> @@ -1960,10 +1947,9 @@ static int virtnet_probe(struct virtio_device *vdev)
> 
> return 0;
> 
> -free_recv_bufs:
> +free_unregister_netdev:
> vi->vdev->config->reset(vdev);
> 
> -   free_receive_bufs(vi);
> unregister_netdev(dev);
>  free_vqs:
> cancel_delayed_work_sync(&vi->refill);
> --
> 1.7.12.4
>

Re: [PATCH net v2] virtio_net: fix virtnet_open and virtnet_probe competing for try_fill_recv

2016-05-26 Thread Jason Wang




On 2016-05-26 14:27, wangyunjian wrote:

> In function virtnet_open() and virtnet_probe(), func try_fill_recv()
> will be executed at the same time. VQ in virtqueue_add() is not protected
> well and BUG_ON will be triggered when virtio_net.ko is being removed.
>
> Signed-off-by: Yunjian Wang 
> ---
>   drivers/net/virtio_net.c | 18 ++
>   1 file changed, 2 insertions(+), 16 deletions(-)


The patch is needed for stable.

Acked-by: Jason Wang 



> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 49d84e5..e0638e5 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1925,24 +1925,11 @@ static int virtnet_probe(struct virtio_device *vdev)
>
>  virtio_device_ready(vdev);
>
> -   /* Last of all, set up some receive buffers. */
> -   for (i = 0; i < vi->curr_queue_pairs; i++) {
> -   try_fill_recv(vi, &vi->rq[i], GFP_KERNEL);
> -
> -   /* If we didn't even get one input buffer, we're useless. */
> -   if (vi->rq[i].vq->num_free ==
> -   virtqueue_get_vring_size(vi->rq[i].vq)) {
> -   free_unused_bufs(vi);
> -   err = -ENOMEM;
> -   goto free_recv_bufs;
> -   }
> -   }
> -
>  vi->nb.notifier_call = &virtnet_cpu_callback;
>  err = register_hotcpu_notifier(&vi->nb);
>  if (err) {
>  pr_debug("virtio_net: registering cpu notifier failed\n");
> -   goto free_recv_bufs;
> +   goto free_unregister_netdev;
>  }
>
>  /* Assume link up if device can't report link status,
> @@ -1960,10 +1947,9 @@ static int virtnet_probe(struct virtio_device *vdev)
>
>  return 0;
>
> -free_recv_bufs:
> +free_unregister_netdev:
>  vi->vdev->config->reset(vdev);
>
> -   free_receive_bufs(vi);
>  unregister_netdev(dev);
>   free_vqs:
>  cancel_delayed_work_sync(&vi->refill);
> --
> 1.7.12.4

Re: [PATCH percpu/for-4.7-fixes 1/2] percpu: fix synchronization between chunk->map_extend_work and chunk destruction

2016-05-26 Thread Vlastimil Babka


On 05/25/2016 05:44 PM, Tejun Heo wrote:

> Atomic allocations can trigger async map extensions which is serviced
> by chunk->map_extend_work.  pcpu_balance_work which is responsible for
> destroying idle chunks wasn't synchronizing properly against
> chunk->map_extend_work and may end up freeing the chunk while the work
> item is still in flight.
>
> This patch fixes the bug by rolling async map extension operations
> into pcpu_balance_work.
>
> Signed-off-by: Tejun Heo 
> Reported-and-tested-by: Alexei Starovoitov 
> Reported-by: Vlastimil Babka 
> Reported-by: Sasha Levin 
> Cc: sta...@vger.kernel.org # v3.18+
> Fixes: 9c824b6a172c ("percpu: make sure chunk->map array has available space")


I didn't spot issues, but I'm not that familiar with the code, so it doesn't 
mean much. Just one question below:



> ---
>  mm/percpu.c |   57 -
>  1 file changed, 36 insertions(+), 21 deletions(-)
>
> --- a/mm/percpu.c
> +++ b/mm/percpu.c
> @@ -112,7 +112,7 @@ struct pcpu_chunk {
> int map_used;   /* # of map entries used before the sentry */
> int map_alloc;  /* # of map entries allocated */
> int *map;   /* allocation map */
> -   struct work_struct  map_extend_work;/* async ->map[] extension */
> +   struct list_head    map_extend_list;/* on pcpu_map_extend_chunks */
>
> void *data;  /* chunk data */
> int first_free; /* no free below this */
> @@ -166,6 +166,9 @@ static DEFINE_MUTEX(pcpu_alloc_mutex);  /
>
>  static struct list_head *pcpu_slot __read_mostly; /* chunk list slots */
>
> +/* chunks which need their map areas extended, protected by pcpu_lock */
> +static LIST_HEAD(pcpu_map_extend_chunks);
> +
>  /*
>   * The number of empty populated pages, protected by pcpu_lock.  The
>   * reserved chunk doesn't contribute to the count.
> @@ -395,13 +398,19 @@ static int pcpu_need_to_extend(struct pc
>  {
> int margin, new_alloc;
>
> +   lockdep_assert_held(&pcpu_lock);
> +
> if (is_atomic) {
> margin = 3;
>
> if (chunk->map_alloc <
> -   chunk->map_used + PCPU_ATOMIC_MAP_MARGIN_LOW &&
> -   pcpu_async_enabled)
> -   schedule_work(&chunk->map_extend_work);
> +   chunk->map_used + PCPU_ATOMIC_MAP_MARGIN_LOW) {
> +   if (list_empty(&chunk->map_extend_list)) {


So why this list_empty condition? Doesn't it deserve a comment then? And isn't 
using a list an overkill in that case?


Thanks.


> +   list_add_tail(&chunk->map_extend_list,
> + &pcpu_map_extend_chunks);
> +   pcpu_schedule_balance_work();
> +   }
> +   }
> } else {
> margin = PCPU_ATOMIC_MAP_MARGIN_HIGH;
> }


[...]

Re: [PATCH RFT 1/2] phylib: add device reset GPIO support

2016-05-26 Thread Linus Walleij

On Thu, May 12, 2016 at 8:42 PM, Uwe Kleine-König
 wrote:

> [added Linus Walleij to Cc, there is a question for you/him below]
(...)
>> +void mdio_device_reset(struct mdio_device *mdiodev, int value)
>> +{
>> + if (mdiodev->reset)
>> + gpiod_set_value(mdiodev->reset, value);
>
> Before v4.6-rc1~108^2~91 it was not necessary to check for the first
> parameter being non-NULL before calling gpiod_set_value. Linus, did you
> change this on purpose?

Not really. And AFAICT it is still not necessary: what changed is that
an error message will be printed by VALIDATE_DESC() if you do that.
And that is proper I guess? I think it's sloppy code to randomly pass in
NULL to a call and just expect it to bail out, it seems more like
exercising the error path than something you'd normally rely on.

Or am I getting things wrong?

Yours,
Linus Walleij

Re: [PATCH v2] net: alx: use custom skb allocator

2016-05-26 Thread Feng Tang

On Wed, May 25, 2016 at 07:53:41PM -0400, David Miller wrote:
> From: Feng Tang 
> Date: Wed, 25 May 2016 14:49:54 +0800
> 
> > This patch follows Eric Dumazet's commit 7b70176421 for Atheros
> > atl1c driver to fix one exactly same bug in alx driver, that the
> > network link will be lost in 1-5 minutes after the device is up.
> > 
> > My laptop Lenovo Y580 with Atheros AR8161 ethernet device hit the
> > same problem with kernel 4.4, and it will be cured by Jarod Wilson's
> > commit c406700c for alx driver which get merged in 4.5. But there
> > are still some alx devices can't function well even with Jarod's
> > patch, while this patch could make them work fine. More details on
> > https://bugzilla.kernel.org/show_bug.cgi?id=70761
> > 
> > The debug shows the issue is very likely to be related with the RX
> > DMA address, specifically 0x...f80, if RX buffer get 0x...f80 several
> > times, there will be RX overflow error and device will stop working.
> > 
> > For kernel 4.5.0 with Jarod's patch which works fine with my
> > AR8161/Lenovo Y580, if I made some change to the
> > __netdev_alloc_skb
> > --> __alloc_page_frag()
> > to make the allocated buffer can get an address with 0x...f80,
> > then the same error happens. If I make it to 0x...f40 or 0xfc0,
> > everything will be still fine. So I tend to believe that the
> > 0x..f80 address cause the silicon to behave abnormally.
> > 
> > Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=70761
> > Cc: Eric Dumazet 
> > Cc: Johannes Berg 
> > Cc: Jarod Wilson 
> > Signed-off-by: Feng Tang 
> > Tested-by: Ole Lukoie 
> 
> Looks good, applied, thanks.

Thanks for reviewing and taking it.

> 
> But now that we have at least two instances of this code we really
> need to put a common version somewhere. :-/

I agree, and furthermore I noticed there are some similar routines
in the 4 individual Atheros drivers atlx/alx/atl1c/atl1e, which may
be unified by a simple framework for them all. Maybe the driver
maintainer from Atheros could take a look, as they can reach all
the real HWs :)

Thanks,
Feng
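
The address observation quoted above can be turned into a tiny standalone
helper. This is only a sketch under stated assumptions, not the alx
driver code: it assumes that "0x...f80" refers to the low 12 bits of the
buffer address, and that shifting such a buffer by 0x40 (landing on
0x...fc0, which the report says works) is acceptable. The names are
hypothetical, and the caller is assumed to have over-allocated the buffer
by at least 0x40 bytes.

```c
#include <assert.h>
#include <stdint.h>

#define BAD_LOW_BITS	0xf80u	/* offset reported to upset the hardware */
#define ADJUST_BYTES	0x40u	/* shift that moves 0x...f80 to 0x...fc0 */

/*
 * Return an address that does not end in 0xf80. Addresses with any
 * other low 12 bits are returned unchanged.
 */
static uintptr_t avoid_bad_dma_offset(uintptr_t addr)
{
	assert(addr <= UINTPTR_MAX - ADJUST_BYTES);

	if ((addr & 0xfffu) == BAD_LOW_BITS)
		addr += ADJUST_BYTES;
	return addr;
}
```

Whether a shared helper for the Atheros drivers would look like this is
an open question; the sketch only makes the arithmetic explicit.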

Re: [PATCH V3] brcmfmac: print error if p2p_ifadd firmware command fails

2016-05-26 Thread Arend Van Spriel

On 26-5-2016 0:44, Rafał Miłecki wrote:
> This is helpful for debugging, without this all I was getting from "iw"
> command on device with BCM43602 was:
>> command failed: Too many open files in system (-23)
> 
> Signed-off-by: Rafał Miłecki 
> ---
> V2: s/in/if/ in commit message
> V3: Add one more error message as suggested by Arend
> ---
>  drivers/net/wireless/broadcom/brcm80211/brcmfmac/p2p.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/p2p.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/p2p.c
> index 1652a48..bc26aec 100644
> --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/p2p.c
> +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/p2p.c
> @@ -2031,7 +2031,7 @@ static int brcmf_p2p_request_p2p_if(struct brcmf_p2p_info *p2p,
>   err = brcmf_fil_iovar_data_set(ifp, "p2p_ifadd", &if_request,
>  sizeof(if_request));
>   if (err)
> - return err;
> + brcmf_err("p2p_ifadd failed %d\n", err);
>  
>   return err;
>  }
> @@ -2185,6 +2185,7 @@ struct wireless_dev *brcmf_p2p_add_vif(struct wiphy *wiphy, const char *name,
>   err = brcmf_p2p_request_p2p_if(&cfg->p2p, ifp, cfg->p2p.int_addr,
>  iftype);
>   if (err) {
> + brcmf_err("Failed to request P2P virtual interface %s\n", name);
>   brcmf_cfg80211_arm_vif_event(cfg, NULL);
>   goto fail;
>   }

Not exactly what I meant. I meant "instead of", and the function I would
like the error message in is brcmf_cfg80211_add_iface(), so you also get
an error message when AP interface creation fails.

Regards,
Arend

[PATCH net 3/8] qede: Don't expose self-test for VFs

2016-05-26 Thread Yuval Mintz

PFs and VFs differ in their registered ethtool operations,
but they're using a common function for get_sset_count().
As a result, `ethtool -i' for a VF would indicate it supports
selftest, although that's not the case.

Signed-off-by: Yuval Mintz 
---
 drivers/net/ethernet/qlogic/qede/qede_ethtool.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/qlogic/qede/qede_ethtool.c b/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
index 1bc7535..ad3cae3 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
@@ -230,7 +230,10 @@ static int qede_get_sset_count(struct net_device *dev, int stringset)
case ETH_SS_PRIV_FLAGS:
return QEDE_PRI_FLAG_LEN;
case ETH_SS_TEST:
-   return QEDE_ETHTOOL_TEST_MAX;
+   if (!IS_VF(edev))
+   return QEDE_ETHTOOL_TEST_MAX;
+   else
+   return 0;
default:
DP_VERBOSE(edev, QED_MSG_DEBUG,
   "Unsupported stringset 0x%08x\n", stringset);
-- 
1.9.3

[PATCH net 6/8] qed: Add missing 100g init mode

2016-05-26 Thread Yuval Mintz

Some of the HW configurations are currently missing for 100g devices.
This can cause various classification issues, as well as prevent device
from fully reaching line-rate.

Signed-off-by: Yuval Mintz 
---
 drivers/net/ethernet/qlogic/qed/qed_dev.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/net/ethernet/qlogic/qed/qed_dev.c b/drivers/net/ethernet/qlogic/qed/qed_dev.c
index 920eadd..2a7c875 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_dev.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_dev.c
@@ -588,7 +588,14 @@ static void qed_calc_hw_mode(struct qed_hwfn *p_hwfn)
 
hw_mode |= 1 << MODE_ASIC;
 
+   if (p_hwfn->cdev->num_hwfns > 1)
+   hw_mode |= 1 << MODE_100G;
+
p_hwfn->hw_info.hw_mode = hw_mode;
+
+   DP_VERBOSE(p_hwfn, (NETIF_MSG_PROBE | NETIF_MSG_IFUP),
+  "Configuring function for hw_mode: 0x%08x\n",
+  p_hwfn->hw_info.hw_mode);
 }
 
 /* Init run time data for all PFs on an engine. */
-- 
1.9.3

[PATCH net 7/8] qed: Prevent 100g from working in MSI

2016-05-26 Thread Yuval Mintz

From: Sudarsana Reddy Kalluru 

Adapter can support 100g in both MSIx and INTa, but not in MSI.

Signed-off-by: Sudarsana Reddy Kalluru 
Signed-off-by: Yuval Mintz 
---
 drivers/net/ethernet/qlogic/qed/qed_dev.c  |  5 +
 drivers/net/ethernet/qlogic/qed/qed_main.c | 18 ++
 2 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qed/qed_dev.c b/drivers/net/ethernet/qlogic/qed/qed_dev.c
index 2a7c875..579c6d5 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_dev.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_dev.c
@@ -835,6 +835,11 @@ int qed_hw_init(struct qed_dev *cdev,
u32 load_code, param;
int rc, mfw_rc, i;
 
+   if ((int_mode == QED_INT_MODE_MSI) && (cdev->num_hwfns > 1)) {
+   DP_NOTICE(cdev, "MSI mode is not supported for CMT devices\n");
+   return -EINVAL;
+   }
+
if (IS_PF(cdev)) {
rc = qed_init_fw_data(cdev, bin_fw_data);
if (rc != 0)
diff --git a/drivers/net/ethernet/qlogic/qed/qed_main.c b/drivers/net/ethernet/qlogic/qed/qed_main.c
index 8b22f87..7530646 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_main.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_main.c
@@ -413,15 +413,17 @@ static int qed_set_int_mode(struct qed_dev *cdev, bool force_mode)
/* Fallthrough */
 
case QED_INT_MODE_MSI:
-   rc = pci_enable_msi(cdev->pdev);
-   if (!rc) {
-   int_params->out.int_mode = QED_INT_MODE_MSI;
-   goto out;
-   }
+   if (cdev->num_hwfns == 1) {
+   rc = pci_enable_msi(cdev->pdev);
+   if (!rc) {
+   int_params->out.int_mode = QED_INT_MODE_MSI;
+   goto out;
+   }
 
-   DP_NOTICE(cdev, "Failed to enable MSI\n");
-   if (force_mode)
-   goto out;
+   DP_NOTICE(cdev, "Failed to enable MSI\n");
+   if (force_mode)
+   goto out;
+   }
/* Fallthrough */
 
case QED_INT_MODE_INTA:
-- 
1.9.3

[PATCH net 4/8] qed: Fix allocation in interrupt context

2016-05-26 Thread Yuval Mintz

From: Sudarsana Reddy Kalluru 

Commit 39651abd2814 ("qed: add support for dcbx") is re-configuring
the QM hw-block as part of its sequence. This is done in attention
handling context which is non-sleepable, yet memory is allocated in
this flow using GFP_KERNEL.

Signed-off-by: Sudarsana Reddy Kalluru 
Signed-off-by: Yuval Mintz 
---
 drivers/net/ethernet/qlogic/qed/qed_dev.c | 27 ---
 1 file changed, 16 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qed/qed_dev.c b/drivers/net/ethernet/qlogic/qed/qed_dev.c
index 089016f..7aeed2f 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_dev.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_dev.c
@@ -155,7 +155,7 @@ void qed_resc_free(struct qed_dev *cdev)
}
 }
 
-static int qed_init_qm_info(struct qed_hwfn *p_hwfn)
+static int qed_init_qm_info(struct qed_hwfn *p_hwfn, bool b_sleepable)
 {
u8 num_vports, vf_offset = 0, i, vport_id, num_ports, curr_queue = 0;
struct qed_qm_info *qm_info = &p_hwfn->qm_info;
@@ -182,23 +182,28 @@ static int qed_init_qm_info(struct qed_hwfn *p_hwfn)
 
/* PQs will be arranged as follows: First per-TC PQ then pure-LB quete.
 */
-   qm_info->qm_pq_params = kzalloc(sizeof(*qm_info->qm_pq_params) *
-   num_pqs, GFP_KERNEL);
+   qm_info->qm_pq_params = kcalloc(num_pqs,
+   sizeof(struct init_qm_pq_params),
+   b_sleepable ? GFP_KERNEL : GFP_ATOMIC);
if (!qm_info->qm_pq_params)
goto alloc_err;
 
-   qm_info->qm_vport_params = kzalloc(sizeof(*qm_info->qm_vport_params) *
-  num_vports, GFP_KERNEL);
+   qm_info->qm_vport_params = kcalloc(num_vports,
+  sizeof(struct init_qm_vport_params),
+  b_sleepable ? GFP_KERNEL
+  : GFP_ATOMIC);
if (!qm_info->qm_vport_params)
goto alloc_err;
 
-   qm_info->qm_port_params = kzalloc(sizeof(*qm_info->qm_port_params) *
- MAX_NUM_PORTS, GFP_KERNEL);
+   qm_info->qm_port_params = kcalloc(MAX_NUM_PORTS,
+ sizeof(struct init_qm_port_params),
+ b_sleepable ? GFP_KERNEL
+ : GFP_ATOMIC);
if (!qm_info->qm_port_params)
goto alloc_err;
 
-   qm_info->wfq_data = kcalloc(num_vports, sizeof(*qm_info->wfq_data),
-   GFP_KERNEL);
+   qm_info->wfq_data = kcalloc(num_vports, sizeof(struct qed_wfq_data),
+   b_sleepable ? GFP_KERNEL : GFP_ATOMIC);
if (!qm_info->wfq_data)
goto alloc_err;
 
@@ -299,7 +304,7 @@ int qed_qm_reconf(struct qed_hwfn *p_hwfn, struct qed_ptt *p_ptt)
qed_qm_info_free(p_hwfn);
 
/* initialize qed's qm data structure */
-   rc = qed_init_qm_info(p_hwfn);
+   rc = qed_init_qm_info(p_hwfn, false);
if (rc)
return rc;
 
@@ -388,7 +393,7 @@ int qed_resc_alloc(struct qed_dev *cdev)
goto alloc_err;
 
/* Prepare and process QM requirements */
-   rc = qed_init_qm_info(p_hwfn);
+   rc = qed_init_qm_info(p_hwfn, true);
if (rc)
goto alloc_err;
 
-- 
1.9.3
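
Note on the fix above: the patch does two things at once. It picks the allocation flags from the caller's context (qed_qm_reconf() passes false, qed_resc_alloc() passes true), and it moves from kzalloc(n * size) to kcalloc(n, size), which refuses a multiplication that would overflow. The following is only a userspace sketch of both ideas. The names alloc_mode, pick_alloc_mode and array_zalloc are hypothetical and are not part of the qed driver or the kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel's GFP_KERNEL / GFP_ATOMIC flags. */
enum alloc_mode { ALLOC_MAY_SLEEP, ALLOC_NO_SLEEP };

/* Choose the allocation mode from the caller's context, mirroring the
 * patch's "b_sleepable ? GFP_KERNEL : GFP_ATOMIC" expression. */
enum alloc_mode pick_alloc_mode(bool b_sleepable)
{
	return b_sleepable ? ALLOC_MAY_SLEEP : ALLOC_NO_SLEEP;
}

/* Zeroed array allocation that fails instead of letting n * size wrap
 * around, which is the property kcalloc() adds over kzalloc(n * size). */
void *array_zalloc(size_t n, size_t size)
{
	if (size != 0 && n > SIZE_MAX / size)
		return NULL;
	return calloc(n, size);
}
```

In the driver the flag choice matters because an attention handler must not sleep, while the probe-time path can afford a blocking allocation.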

[PATCH net 8/8] qed: Don't config min BW on 100g on link flap

2016-05-26 Thread Yuval Mintz

Currently 100g devices don't support minimum/maximum BW configurations,
yet link flaps might cause the driver to attempt to do such a
configuration. Prevent this just as we do for the maximum BW.

Signed-off-by: Yuval Mintz 
---
 drivers/net/ethernet/qlogic/qed/qed_dev.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/net/ethernet/qlogic/qed/qed_dev.c b/drivers/net/ethernet/qlogic/qed/qed_dev.c
index 579c6d5..2d89e8c 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_dev.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_dev.c
@@ -2105,6 +2105,13 @@ void qed_configure_vp_wfq_on_link_change(struct qed_dev *cdev, u32 min_pf_rate)
 {
int i;
 
+   if (cdev->num_hwfns > 1) {
+   DP_VERBOSE(cdev,
+  NETIF_MSG_LINK,
+  "WFQ configuration is not supported for this device\n");
+   return;
+   }
+
for_each_hwfn(cdev, i) {
struct qed_hwfn *p_hwfn = &cdev->hwfns[i];
 
-- 
1.9.3

[PATCH net 2/8] qede: Reload on GRO changes

2016-05-26 Thread Yuval Mintz

Since the driver uses a FW-based GRO implementation, this affects its
ability to cope with GRO being enabled or disabled. As a result, the
driver must perform an inner reload whenever the offload configuration
of this feature changes.

[Failure to do so means the network stack would continue to receive
aggregated packets even though the user requested the feature be disabled].

Signed-off-by: Yuval Mintz 
---
 drivers/net/ethernet/qlogic/qede/qede_main.c | 24 
 1 file changed, 24 insertions(+)

diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c
index 3bb1428..5d00d14 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_main.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_main.c
@@ -2091,6 +2091,29 @@ static void qede_vlan_mark_nonconfigured(struct qede_dev *edev)
edev->accept_any_vlan = false;
 }
 
+int qede_set_features(struct net_device *dev, netdev_features_t features)
+{
+   struct qede_dev *edev = netdev_priv(dev);
+   netdev_features_t changes = features ^ dev->features;
+   bool need_reload = false;
+
+   /* No action needed if hardware GRO is disabled during driver load */
+   if (changes & NETIF_F_GRO) {
+   if (dev->features & NETIF_F_GRO)
+   need_reload = !edev->gro_disable;
+   else
+   need_reload = edev->gro_disable;
+   }
+
+   if (need_reload && netif_running(edev->ndev)) {
+   dev->features = features;
+   qede_reload(edev, NULL, NULL);
+   return 1;
+   }
+
+   return 0;
+}
+
 #ifdef CONFIG_QEDE_VXLAN
 static void qede_add_vxlan_port(struct net_device *dev,
sa_family_t sa_family, __be16 port)
@@ -2175,6 +2198,7 @@ static const struct net_device_ops qede_netdev_ops = {
 #endif
.ndo_vlan_rx_add_vid = qede_vlan_rx_add_vid,
.ndo_vlan_rx_kill_vid = qede_vlan_rx_kill_vid,
+   .ndo_set_features = qede_set_features,
.ndo_get_stats64 = qede_get_stats64,
 #ifdef CONFIG_QED_SRIOV
.ndo_set_vf_link_state = qede_set_vf_link_state,
-- 
1.9.3
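
Note on the patch above: qede_set_features() finds the changed bits with an XOR of the requested and current feature masks, then decides whether a reload is needed by comparing the direction of the GRO change with the driver's gro_disable state. A minimal userspace sketch of that decision follows. F_GRO and gro_needs_reload() are hypothetical names standing in for NETIF_F_GRO and the inline logic of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for NETIF_F_GRO. */
#define F_GRO (UINT64_C(1) << 0)

/* Returns true when a reload would be needed, following the patch:
 * only a change of the GRO bit matters, and the direction of the
 * change is compared against the driver's own gro_disable state. */
bool gro_needs_reload(uint64_t active, uint64_t requested, bool gro_disable)
{
	uint64_t changes = active ^ requested;	/* bits that differ */

	if (!(changes & F_GRO))
		return false;

	if (active & F_GRO)		/* GRO is on now, request turns it off */
		return !gro_disable;

	return gro_disable;		/* GRO is off now, request turns it on */
}
```

For example, gro_needs_reload(F_GRO, 0, false) is true: the user turns GRO off while the device is still aggregating, so the device has to be reloaded.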

[PATCH net 5/8] qed: Save min/max accross dcbx-change

2016-05-26 Thread Yuval Mintz

When DCBx re-negotiation is occurring, the PF's configurations for
maximum and minimum bandwidth guarantees are currently lost.

Signed-off-by: Yuval Mintz 
---
 drivers/net/ethernet/qlogic/qed/qed_dev.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qed/qed_dev.c b/drivers/net/ethernet/qlogic/qed/qed_dev.c
index 7aeed2f..920eadd 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_dev.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_dev.c
@@ -161,6 +161,8 @@ static int qed_init_qm_info(struct qed_hwfn *p_hwfn, bool b_sleepable)
struct qed_qm_info *qm_info = &p_hwfn->qm_info;
struct init_qm_port_params *p_qm_port;
u16 num_pqs, multi_cos_tcs = 1;
+   u8 pf_wfq = qm_info->pf_wfq;
+   u32 pf_rl = qm_info->pf_rl;
u16 num_vfs = 0;
 
 #ifdef CONFIG_QED_SRIOV
@@ -269,10 +271,10 @@ static int qed_init_qm_info(struct qed_hwfn *p_hwfn, bool b_sleepable)
for (i = 0; i < qm_info->num_vports; i++)
qm_info->qm_vport_params[i].vport_wfq = 1;
 
-   qm_info->pf_wfq = 0;
-   qm_info->pf_rl = 0;
qm_info->vport_rl_en = 1;
qm_info->vport_wfq_en = 1;
+   qm_info->pf_rl = pf_rl;
+   qm_info->pf_wfq = pf_wfq;
 
return 0;
 
-- 
1.9.3
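
Note on the patch above: the fix is a save-and-restore around re-initialization. The PF's pf_wfq and pf_rl values are copied into locals at function entry and written back at the end, instead of being reset to 0. The sketch below shows the pattern in userspace. struct qm_info_sketch and qm_info_reinit() are hypothetical, and the memset() is an assumption made here to model "everything else gets reset"; it is not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, reduced version of the fields touched by the patch. */
struct qm_info_sketch {
	uint8_t  pf_wfq;	/* minimum bandwidth weight */
	uint32_t pf_rl;		/* maximum bandwidth rate limit */
	uint8_t  vport_rl_en;
	uint8_t  vport_wfq_en;
};

/* Re-initialize the structure while preserving the two PF settings. */
void qm_info_reinit(struct qm_info_sketch *qm)
{
	/* Save the user-visible configuration before it is wiped. */
	uint8_t pf_wfq = qm->pf_wfq;
	uint32_t pf_rl = qm->pf_rl;

	memset(qm, 0, sizeof(*qm));	/* assumption: models the reset */

	qm->vport_rl_en = 1;
	qm->vport_wfq_en = 1;

	/* Restore what would previously have been zeroed. */
	qm->pf_rl = pf_rl;
	qm->pf_wfq = pf_wfq;
}
```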

[PATCH net 1/8] qede: Fix VF minimum BW setting

2016-05-26 Thread Yuval Mintz

VF is currently ignoring the minimum provided by the API,
mistakenly using the maximum for minimum as well.

Signed-off-by: Yuval Mintz 
---
 drivers/net/ethernet/qlogic/qede/qede_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c
index 337e839..3bb1428 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_main.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_main.c
@@ -1824,7 +1824,7 @@ static int qede_set_vf_rate(struct net_device *dev, int vfidx,
 {
struct qede_dev *edev = netdev_priv(dev);
 
-   return edev->ops->iov->set_rate(edev->cdev, vfidx, max_tx_rate,
+   return edev->ops->iov->set_rate(edev->cdev, vfidx, min_tx_rate,
max_tx_rate);
 }
 
-- 
1.9.3

[PATCH net 0/8] qed*: Bug fixes

2016-05-26 Thread Yuval Mintz

This series contains several small fixes, most of which deal with
either 100g support, sriov or bandwidth configurations.

Dave,

Please consider applying this to `net'.

Thanks,
Yuval

Sudarsana Reddy Kalluru (2):
  qed: Fix allocation in interrupt context
  qed: Prevent 100g from working in MSI

Yuval Mintz (6):
  qede: Fix VF minimum BW setting
  qede: Reload on GRO changes
  qede: Don't expose self-test for VFs
  qed: Save min/max accross dcbx-change
  qed: Add missing 100g init mode
  qed: Don't config min BW on 100g on link flap

 drivers/net/ethernet/qlogic/qed/qed_dev.c   | 52 ++---
 drivers/net/ethernet/qlogic/qed/qed_main.c  | 18 +
 drivers/net/ethernet/qlogic/qede/qede_ethtool.c |  5 ++-
 drivers/net/ethernet/qlogic/qede/qede_main.c| 26 -
 4 files changed, 78 insertions(+), 23 deletions(-)

-- 
1.9.3

Re: [v10, 7/7] mmc: sdhci-of-esdhc: fix host version for T4240-R1.0-R2.0

2016-05-26 Thread Ulf Hansson

On 26 May 2016 at 06:05, Yangbo Lu  wrote:
> Hi Uffe,
>
> Could we merge this patchset? ...
> It has been a long time to wait for Arnd's response...
>
> Thanks a lot.
>
>

As we are still in the merge window I won't queue anything but fixes.
Let's give Arnd another week or so to respond.

Kind regards
Uffe

> Best regards,
> Yangbo Lu
>
>
>> -Original Message-
>> From: Yangbo Lu
>> Sent: Friday, May 20, 2016 2:06 PM
>> To: 'Scott Wood'; Arnd Bergmann; linux-arm-ker...@lists.infradead.org
>> Cc: linuxppc-...@lists.ozlabs.org; Mark Rutland;
>> devicet...@vger.kernel.org; ulf.hans...@linaro.org; Russell King; Bhupesh
>> Sharma; netdev@vger.kernel.org; Joerg Roedel; Kumar Gala; linux-
>> m...@vger.kernel.org; linux-ker...@vger.kernel.org; Yang-Leo Li;
>> io...@lists.linux-foundation.org; Rob Herring; linux-...@vger.kernel.org;
>> Claudiu Manoil; Santosh Shilimkar; Xiaobo Xie; linux-...@vger.kernel.org;
>> Qiang Zhao
>> Subject: RE: [v10, 7/7] mmc: sdhci-of-esdhc: fix host version for T4240-
>> R1.0-R2.0
>>
>> Hi Arnd,
>>
>> Any comments?
>> Please reply when you see the email since we want this eSDHC issue to be
>> fixed soon.
>>
>> All the patches are Freescale-specific and is to fix the eSDHC driver.
>> Could we let them merged first if you were talking about a new way of
>> abstracting getting SoC version.
>>
>>
>> Thanks :)
>>
>>
>> Best regards,
>> Yangbo Lu
>>
>

Re: [PATCH RESEND 7/8] pipe: account to kmemcg

2016-05-26 Thread Minchan Kim

On Wed, May 25, 2016 at 01:30:11PM +0300, Vladimir Davydov wrote:
> On Tue, May 24, 2016 at 01:04:33PM -0700, Eric Dumazet wrote:
> > On Tue, 2016-05-24 at 19:13 +0300, Vladimir Davydov wrote:
> > > On Tue, May 24, 2016 at 05:59:02AM -0700, Eric Dumazet wrote:
> > > ...
> > > > > +static int anon_pipe_buf_steal(struct pipe_inode_info *pipe,
> > > > > +struct pipe_buffer *buf)
> > > > > +{
> > > > > + struct page *page = buf->page;
> > > > > +
> > > > > + if (page_count(page) == 1) {
> > > > 
> > > > This looks racy : some cpu could have temporarily elevated page count.
> > > 
> > > All pipe operations (pipe_buf_operations->get, ->release, ->steal) are
> > > supposed to be called under pipe_lock. So, if we see a pipe_buffer->page
> > > with refcount of 1 in ->steal, that means that we are the only its user
> > > and it can't be spliced to another pipe.
> > > 
> > > In fact, I just copied the code from generic_pipe_buf_steal, adding
> > > kmemcg related checks along the way, so it should be fine.
> > 
> > So you guarantee that no other cpu might have done
> > get_page_unless_zero() right before this test ?
> 
> Each pipe_buffer holds a reference to its page. If we find page's
> refcount to be 1 here, then it can be referenced only by our
> pipe_buffer. And the refcount cannot be increased by a parallel thread,
> because we hold pipe_lock, which rules out splice, and otherwise it's
> impossible to reach the page as it is not on lru. That said, I think I
> guarantee that this should be safe.

I don't know the kmemcg internals or the pipe code, so my comment might be
totally crap.

No one can guarantee that another CPU does not hold a reference to a page.
Look at the get_page_unless_zero() use cases.

1. balloon_page_isolate

It can take a reference on a random page and then verify that the page
is a balloon page. Otherwise, it just puts the reference.

2. page_idle_get_page

It has a PageLRU check, but that check is racy, so it can take a reference
on a random page and then verify it under zone->lru_lock. If it's
not an LRU page, it just puts the reference.
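
To make the concern above concrete, here is a minimal userspace sketch using C11 atomics. It is not the kernel's implementation. struct obj, get_unless_zero(), put_ref() and sole_owner() are hypothetical names that only model the pattern: a speculative "get unless zero" can transiently raise the count, so a "count == 1" test can fail (or stop being true) without any real second user.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical object with a reference count, standing in for a page. */
struct obj {
	atomic_int refcount;
};

/* Take a reference unless the count has already dropped to zero. */
bool get_unless_zero(struct obj *o)
{
	int old = atomic_load(&o->refcount);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&o->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference taken earlier. */
void put_ref(struct obj *o)
{
	atomic_fetch_sub(&o->refcount, 1);
}

/* The check under discussion: "are we the only holder?" */
bool sole_owner(struct obj *o)
{
	return atomic_load(&o->refcount) == 1;
}
```

If one thread runs sole_owner() while another sits between get_unless_zero() and put_ref() on the same object, the answer depends purely on timing, which is the race being pointed out.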

[patch] ptp: oops in ptp_ioctl()

2016-05-26 Thread Dan Carpenter

If we pass ERR_PTR(-EFAULT) to kfree() then it's going to oops.

Fixes: 2ece068e1b1d ('ptp: use memdup_user().')
Signed-off-by: Dan Carpenter 

diff --git a/drivers/ptp/ptp_chardev.c b/drivers/ptp/ptp_chardev.c
index 0b1ac6b..d637c93 100644
--- a/drivers/ptp/ptp_chardev.c
+++ b/drivers/ptp/ptp_chardev.c
@@ -211,6 +211,7 @@ long ptp_ioctl(struct posix_clock *pc, unsigned int cmd, unsigned long arg)
sysoff = memdup_user((void __user *)arg, sizeof(*sysoff));
if (IS_ERR(sysoff)) {
err = PTR_ERR(sysoff);
+   sysoff = NULL;
break;
}
if (sysoff->n_samples > PTP_MAX_SAMPLES) {
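
Note on the fix above: memdup_user() reports failure by returning an error code encoded as a pointer, so the common cleanup path must not hand that value to kfree(). The sketch below re-creates the idea in userspace. err_ptr(), ptr_err() and is_err() are simplified, hypothetical re-creations of the kernel's ERR_PTR(), PTR_ERR() and IS_ERR(), and handle_request(), dup_fail() and dup_ok() are made up for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value as a pointer in the top 4095 addresses. */
void *err_ptr(long error) { return (void *)(intptr_t)error; }
long ptr_err(const void *ptr) { return (long)(intptr_t)ptr; }
bool is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Two hypothetical "duplicate the user buffer" helpers. */
void *dup_fail(void) { return err_ptr(-EFAULT); }
void *dup_ok(void) { return calloc(1, 16); }

/* Mirrors the fixed flow: on error, clear the pointer so that the
 * shared cleanup below frees NULL rather than an encoded errno. */
int handle_request(void *(*dup_fn)(void))
{
	int err = 0;
	void *buf = dup_fn();

	if (is_err(buf)) {
		err = (int)ptr_err(buf);
		buf = NULL;
	}

	free(buf);	/* free(NULL) is a no-op, like kfree(NULL) */
	return err;
}
```

Without the "buf = NULL" line, the cleanup would call free() on an address such as (void *)-EFAULT, which is the userspace analogue of the oops described in the commit message.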

[PATCH net v2] virtio_net: fix virtnet_open and virtnet_probe competing for try_fill_recv

2016-05-26 Thread wangyunjian

try_fill_recv() can be executed from virtnet_open() and virtnet_probe()
at the same time. The VQ is not well protected in virtqueue_add(), and a
BUG_ON is triggered when virtio_net.ko is being removed.

Signed-off-by: Yunjian Wang 
---
 drivers/net/virtio_net.c | 18 ++
 1 file changed, 2 insertions(+), 16 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 49d84e5..e0638e5 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1925,24 +1925,11 @@ static int virtnet_probe(struct virtio_device *vdev)

virtio_device_ready(vdev);

-   /* Last of all, set up some receive buffers. */
-   for (i = 0; i < vi->curr_queue_pairs; i++) {
-   try_fill_recv(vi, &vi->rq[i], GFP_KERNEL);
-
-   /* If we didn't even get one input buffer, we're useless. */
-   if (vi->rq[i].vq->num_free ==
-   virtqueue_get_vring_size(vi->rq[i].vq)) {
-   free_unused_bufs(vi);
-   err = -ENOMEM;
-   goto free_recv_bufs;
-   }
-   }
-
vi->nb.notifier_call = &virtnet_cpu_callback;
err = register_hotcpu_notifier(&vi->nb);
if (err) {
pr_debug("virtio_net: registering cpu notifier failed\n");
-   goto free_recv_bufs;
+   goto free_unregister_netdev;
}

/* Assume link up if device can't report link status,
@@ -1960,10 +1947,9 @@ static int virtnet_probe(struct virtio_device *vdev)

return 0;

-free_recv_bufs:
+free_unregister_netdev:
vi->vdev->config->reset(vdev);

-   free_receive_bufs(vi);
unregister_netdev(dev);
 free_vqs:
cancel_delayed_work_sync(&vi->refill);
--
1.7.12.4

Re: [PATCH] virtio_net: fix virtnet_open and virtnet_probe competing for try_fill_recv

2016-05-26 Thread wangyunjian

Jason Wang,

Thanks for your comments.

I have made a new patch according to your comments.


-----Original Message-----
From: Jason Wang [mailto:jasow...@redhat.com]
Sent: May 26, 2016 9:41
To: wangyunjian; da...@davemloft.net; netdev@vger.kernel.org; m...@redhat.com; Liuyongan
Subject: Re: [PATCH] virtio_net: fix virtnet_open and virtnet_probe competing for try_fill_recv



On 2016-05-25 20:33, wangyunjian wrote:
> try_fill_recv() can be executed from virtnet_open() and virtnet_probe()
> at the same time. The VQ is not well protected in virtqueue_add(), and a
> BUG_ON is triggered when virtio_net.ko is being removed.
>
> Test Script:
> for (( i=0; i<500; i=i+1 )); do
>   rmmod virtio_net
>   modprobe virtio_net
>   ifconfig eth0 up
> done
>
> [  302.336996] ------------[ cut here ]------------
> [  302.338794] kernel BUG at virtio_ring.c:750!
> [  302.340013] invalid opcode:  [#1] SMP
> [  302.340013] last sysfs file: /sys/devices/pci:00/:00:03.0/virtio0/device
> [  302.340013] CPU 0
> [  302.340013] Modules linked in: virtio_balloon virtio_net(-) virtio_pci virtio_ring virtio ipv6 af_packet microcode acpiphp pci_hotplug fuse loop dm_mod rtc_cmos tpm_tis rtc_core tpm i2c_piix4 rtc_lib container button floppy pcspkr tpm_bios i2c_core joydev sg usbhid hid uhci_hcd ehci_hcd usbcore edd ext3 mbcache jbd fan processor ide_pci_generic piix ide_core ata_generic ata_piix libata thermal thermal_sys hwmon sd_mod crc_t10dif kvm_ivshmem(N) scsi_mod pv_channel(N) [last unloaded: virtio]
> [  302.340013] Supported: Yes
> [  302.340013] Pid: 8410, comm: rmmod Tainted: GN  2.6.32.12-0.7-default #1 Standard PC (i440FX + PIIX, 1996)
> [  302.340013] RIP: 0010:[]  [] virtqueue_detach_unused_buf+0xb9/0xc0 [virtio_ring]
> [  302.340013] RSP: 0018:88000c7a9e08  EFLAGS: 00010283
> [  302.340013] RAX: 00f4 RBX: 0100 RCX: 4d8e
> [  302.340013] RDX: 880001e0 RSI: 0046 RDI: 81a71570
> [  302.340013] RBP: 88000c987000 R08:  R09: 000a
> [  302.340013] R10:  R11:  R12: 0400
> [  302.340013] R13:  R14: 7fff92bc1758 R15: 0001
> [  302.340013] FS:  7f8b3995d700() GS:880001e0() knlGS:
> [  302.340013] CS:  0010 DS:  ES:  CR0: 8005003b
> [  302.340013] CR2: 7fff196433e0 CR3: 0d07e000 CR4: 06f0
> [  302.340013] DR0:  DR1:  DR2: 
> [  302.340013] DR3:  DR6: 0ff0 DR7: 0400
> [  302.340013] Process rmmod (pid: 8410, threadinfo 88000c7a8000, task 88000c7aa200)
> [  302.340013] Stack:
> [  302.340013]  88000fbb3780  88000c987000 a034b178
> [  302.340013] <0> 88000fbb3850 88000fbb3780 a034efc0 88000fbb3850
> [  302.340013] <0>  a034b299 88000fbb3780 a034b406
> [  302.340013] Call Trace:
> [  302.340013]  [] free_unused_bufs+0x88/0x120 [virtio_net]
> [  302.340013]  [] remove_vq_common+0x19/0x30 [virtio_net]
> [  302.340013]  [] virtnet_remove+0x46/0x80 [virtio_net]
> [  302.340013]  [] virtio_dev_remove+0x15/0x60 [virtio]
> [  302.340013]  [] __device_release_driver+0x91/0x110
> [  302.340013]  [] driver_detach+0xa8/0xb0
> [  302.340013]  [] bus_remove_driver+0x82/0x110
> [  302.340013]  [] sys_delete_module+0x1c4/0x290
> [  302.340013]  [] system_call_fastpath+0x16/0x1b
> [  302.340013]  [<7f8b394c7de7>] 0x7f8b394c7de7
> [  302.340013] Code: c3 01 49 83 c4 04 e8 30 10 07 e1 8b 55 38 39 da 77 d0 8b 75 2c 31 c0 48 c7 c7 ba 4b 32 a0 e8 18 10 07 e1 8b 45 2c 3b 45 38 74 82 <0f> 0b eb fe 0f 1f 00 48 83 ec 28 48 89 6c 24 08 48 89 1c 24 48
> [  302.340013] RIP  [] virtqueue_detach_unused_buf+0xb9/0xc0 [virtio_ring]
> [  302.340013]  RSP 
> [  302.438579] ---[ end trace 1e583bdb5b2644f1 ]---
>
>
> Signed-off-by: Yunjian Wang 
> ---
>   drivers/net/virtio_net.c | 4 
>   1 file changed, 4 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 49d84e5..4528ef8 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -818,10 +818,6 @@ static int virtnet_open(struct net_device *dev)
>  int i;
>
>  for (i = 0; i < vi->max_queue_pairs; i++) {
> -   if (i < vi->curr_queue_pairs)
> -   /* Make sure we have some buffers: if oom use wq. */
> -   if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL))
> -   schedule_delayed_work(&vi->refill, 0);
>  virtnet_napi_enable(&vi->rq[i]);
>  }
>
> --
> 1.7.12.4
>

Thanks a lot for spotting the issue.

But the fix looks questionable, what if we increase the number of queues before 
we open it? I
