[PATCH] block: DAC960: fix ifnullfree.cocci warnings

2017-03-10 Thread kbuild test robot
drivers/block/DAC960.c:441:3-19: WARNING: NULL check before freeing functions 
like kfree, debugfs_remove, debugfs_remove_recursive or usb_free_urb is not 
needed. Maybe consider reorganizing relevant code to avoid passing NULL values.
drivers/block/DAC960.c:446:1-17: WARNING: NULL check before freeing functions 
like kfree, debugfs_remove, debugfs_remove_recursive or usb_free_urb is not 
needed. Maybe consider reorganizing relevant code to avoid passing NULL values.

 NULL check before some freeing functions is not needed.

 Based on checkpatch warning
 "kfree(NULL) is safe this check is probably not required"
 and kfreeaddr.cocci by Julia Lawall.

Generated by: scripts/coccinelle/free/ifnullfree.cocci

CC: Romain Perier 
Signed-off-by: Fengguang Wu 
---

 drivers/block/DAC960.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

--- a/drivers/block/DAC960.c
+++ b/drivers/block/DAC960.c
@@ -437,13 +437,11 @@ static void DAC960_DestroyAuxiliaryStruc
   Controller->CurrentStatusBuffer = NULL;
 }
 
-  if (ScatterGatherPool != NULL)
-   dma_pool_destroy(ScatterGatherPool);
+  dma_pool_destroy(ScatterGatherPool);
   if (Controller->FirmwareType == DAC960_V1_Controller)
return;
 
-  if (RequestSensePool != NULL)
-   dma_pool_destroy(RequestSensePool);
+  dma_pool_destroy(RequestSensePool);
 
   for (i = 0; i < DAC960_MaxLogicalDrives; i++) {
kfree(Controller->V2.LogicalDeviceInformation[i]);
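The cleanup relies on the freeing functions themselves tolerating NULL, the same guarantee kfree() and dma_pool_destroy() give. A minimal userspace sketch of the pattern (the `pool` type and helpers here are hypothetical, not kernel code):

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative userspace analogue: like kfree() and dma_pool_destroy(),
 * a destroy function can simply accept NULL, which makes caller-side
 * "if (p != NULL)" guards redundant. */
struct pool {
	int *buf;
};

static void pool_destroy(struct pool *p)
{
	if (!p)			/* NULL-safe by design */
		return;
	free(p->buf);
	free(p);
}

static int create_and_destroy(void)
{
	struct pool *p = calloc(1, sizeof(*p));

	if (!p)
		return -1;
	p->buf = calloc(4, sizeof(int));
	pool_destroy(p);	/* normal path */
	pool_destroy(NULL);	/* no caller-side NULL check needed */
	return 0;
}
```

Because the NULL case is handled once inside the destroy function, every call site shrinks by a branch, which is exactly what ifnullfree.cocci flags.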


Re: [PATCH v5 01/19] block: DAC960: Replace PCI pool old API

2017-03-10 Thread kbuild test robot
Hi Romain,

[auto build test WARNING on scsi/for-next]
[also build test WARNING on v4.11-rc1 next-20170310]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Romain-Perier/Replace-PCI-pool-by-DMA-pool-API/20170311-133849
base:   https://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git for-next


coccinelle warnings: (new ones prefixed by >>)

>> drivers/block/DAC960.c:441:3-19: WARNING: NULL check before freeing 
>> functions like kfree, debugfs_remove, debugfs_remove_recursive or 
>> usb_free_urb is not needed. Maybe consider reorganizing relevant code to 
>> avoid passing NULL values.
   drivers/block/DAC960.c:446:1-17: WARNING: NULL check before freeing 
functions like kfree, debugfs_remove, debugfs_remove_recursive or usb_free_urb 
is not needed. Maybe consider reorganizing relevant code to avoid passing NULL 
values.

Please review and possibly fold the followup patch.

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


[PATCH net] selftests/bpf: fix broken build

2017-03-10 Thread Alexei Starovoitov
Recent merge of the 'linux-kselftest-4.11-rc1' tree broke the bpf test build.
None of the tests were building, and test_verifier.c had tons of compiler errors.
Fix it and add #ifdef CAP_IS_SUPPORTED to support old versions of libcap.
Tested on CentOS 6.8 and 7.

Signed-off-by: Alexei Starovoitov 
---
 tools/include/uapi/linux/bpf_perf_event.h   | 18 ++++++++++++++++++
 tools/testing/selftests/bpf/Makefile        |  4 +++-
 tools/testing/selftests/bpf/test_verifier.c |  4 ++++
 3 files changed, 25 insertions(+), 1 deletion(-)
 create mode 100644 tools/include/uapi/linux/bpf_perf_event.h

diff --git a/tools/include/uapi/linux/bpf_perf_event.h b/tools/include/uapi/linux/bpf_perf_event.h
new file mode 100644
index 000000000000..067427259820
--- /dev/null
+++ b/tools/include/uapi/linux/bpf_perf_event.h
@@ -0,0 +1,18 @@
+/* Copyright (c) 2016 Facebook
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+#ifndef _UAPI__LINUX_BPF_PERF_EVENT_H__
+#define _UAPI__LINUX_BPF_PERF_EVENT_H__
+
+#include 
+#include 
+
+struct bpf_perf_event_data {
+   struct pt_regs regs;
+   __u64 sample_period;
+};
+
+#endif /* _UAPI__LINUX_BPF_PERF_EVENT_H__ */
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 4b498265dae6..67531f47781b 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -1,12 +1,14 @@
 LIBDIR := ../../../lib
 BPFOBJ := $(LIBDIR)/bpf/bpf.o
 
-CFLAGS += -Wall -O2 -lcap -I../../../include/uapi -I$(LIBDIR)
+CFLAGS += -Wall -O2 -lcap -I../../../include/uapi -I$(LIBDIR) $(BPFOBJ)
 
 TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map
 
 TEST_PROGS := test_kmod.sh
 
+all: $(TEST_GEN_PROGS)
+
 .PHONY: all clean force
 
 # force a rebuild of BPFOBJ when its dependencies are updated
diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
index e1f5b9eea1e8..d1555e4240c0 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -8,6 +8,8 @@
  * License as published by the Free Software Foundation.
  */
 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -4583,10 +4585,12 @@ static bool is_admin(void)
cap_flag_value_t sysadmin = CAP_CLEAR;
const cap_value_t cap_val = CAP_SYS_ADMIN;
 
+#ifdef CAP_IS_SUPPORTED
if (!CAP_IS_SUPPORTED(CAP_SETFCAP)) {
perror("cap_get_flag");
return false;
}
+#endif
caps = cap_get_proc();
if (!caps) {
perror("cap_get_proc");
-- 
2.8.0
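The CAP_IS_SUPPORTED guard above is a standard compile-time feature probe. A small sketch of the pattern in isolation (FEATURE_PROBE is a hypothetical macro standing in for CAP_IS_SUPPORTED):

```c
#include <assert.h>

/* Sketch of the guard pattern: when a macro may be absent from older
 * library headers (as CAP_IS_SUPPORTED is from old libcap), wrap its
 * use in #ifdef so the probe compiles out and the code optimistically
 * proceeds. FEATURE_PROBE is hypothetical, for illustration only. */
static int feature_supported(void)
{
#ifdef FEATURE_PROBE
	if (!FEATURE_PROBE())
		return 0;	/* probe exists and reports "unsupported" */
#endif
	return 1;		/* no probe available: assume supported */
}
```

Old toolchains simply never see the probe, so the file builds on both libcap generations without a configure step.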



[PATCH] netfilter: Force fake conntrack entry to be at least 8 bytes aligned

2017-03-10 Thread Steven Rostedt (VMware)
Since the nfct and nfctinfo have been combined, the nf_conn structure
must be at least 8-byte aligned, as the 3 LSB bits are used for the
nfctinfo. But there is a fake nf_conn structure to denote untracked
connections, which is created by a PER_CPU construct. This does not
guarantee that it will be 8-byte aligned, and it can break the logic in
determining the correct nfctinfo.

I triggered this on a 32bit machine with the following error:

BUG: unable to handle kernel NULL pointer dereference at 0af4
IP: nf_ct_deliver_cached_events+0x1b/0xfb
*pdpt = 31962001 *pde =  

Oops:  [#1] SMP
Modules linked in: ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ipv6 crc_ccitt ppdev r8169 parport_pc parport
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.10.0-test+ #75
Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
task: c126ec00 task.stack: c1258000
EIP: nf_ct_deliver_cached_events+0x1b/0xfb
EFLAGS: 00010202 CPU: 0
EAX: 0021cd01 EBX:  ECX: 27b0c767 EDX: 32bcb17a
ESI: f34135c0 EDI: f34135c0 EBP: f2debd60 ESP: f2debd3c
 DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
CR0: 80050033 CR2: 0af4 CR3: 309a0440 CR4: 001406f0
Call Trace:
 
 ? ipv6_skip_exthdr+0xac/0xcb
 ipv6_confirm+0x10c/0x119 [nf_conntrack_ipv6]
 nf_hook_slow+0x22/0xc7
 nf_hook+0x9a/0xad [ipv6]
 ? ip6t_do_table+0x356/0x379 [ip6_tables]
 ? ip6_fragment+0x9e9/0x9e9 [ipv6]
 ip6_output+0xee/0x107 [ipv6]
 ? ip6_fragment+0x9e9/0x9e9 [ipv6]
 dst_output+0x36/0x4d [ipv6]
 NF_HOOK.constprop.37+0xb2/0xba [ipv6]
 ? icmp6_dst_alloc+0x2c/0xfd [ipv6]
 ? local_bh_enable+0x14/0x14 [ipv6]
 mld_sendpack+0x1c5/0x281 [ipv6]
 ? mark_held_locks+0x40/0x5c
 mld_ifc_timer_expire+0x1f6/0x21e [ipv6]
 call_timer_fn+0x135/0x283
 ? detach_if_pending+0x55/0x55
 ? mld_dad_timer_expire+0x3e/0x3e [ipv6]
 __run_timers+0x111/0x14b
 ? mld_dad_timer_expire+0x3e/0x3e [ipv6]
 run_timer_softirq+0x1c/0x36
 __do_softirq+0x185/0x37c
 ? test_ti_thread_flag.constprop.19+0xd/0xd
 do_softirq_own_stack+0x22/0x28
 
 irq_exit+0x5a/0xa4
 smp_apic_timer_interrupt+0x2a/0x34
 apic_timer_interrupt+0x37/0x3c

By using DEFINE/DECLARE_PER_CPU_ALIGNED we can enforce at least 8-byte
alignment, as all cache line sizes are at least 8 bytes.

Fixes: a9e419dc7be6 ("netfilter: merge ctinfo into nfct pointer storage area")
Signed-off-by: Steven Rostedt (VMware) 
---
diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h
index f540f9a..1960587 100644
--- a/include/net/netfilter/nf_conntrack.h
+++ b/include/net/netfilter/nf_conntrack.h
@@ -244,7 +244,7 @@ extern s32 (*nf_ct_nat_offset)(const struct nf_conn *ct,
   u32 seq);
 
 /* Fake conntrack entry for untracked connections */
-DECLARE_PER_CPU(struct nf_conn, nf_conntrack_untracked);
+DECLARE_PER_CPU_ALIGNED(struct nf_conn, nf_conntrack_untracked);
 static inline struct nf_conn *nf_ct_untracked_get(void)
 {
return raw_cpu_ptr(&nf_conntrack_untracked);
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 071b97f..ffb78e5 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -181,7 +181,11 @@ EXPORT_SYMBOL_GPL(nf_conntrack_htable_size);
 unsigned int nf_conntrack_max __read_mostly;
 seqcount_t nf_conntrack_generation __read_mostly;
 
-DEFINE_PER_CPU(struct nf_conn, nf_conntrack_untracked);
+/* nf_conn must be 8 bytes aligned, as the 3 LSB bits are used
+ * for the nfctinfo. We cheat by (ab)using the PER CPU cache line
+ * alignment to enforce this.
+ */
+DEFINE_PER_CPU_ALIGNED(struct nf_conn, nf_conntrack_untracked);
 EXPORT_PER_CPU_SYMBOL(nf_conntrack_untracked);
 
 static unsigned int nf_conntrack_hash_rnd __read_mostly;
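Why the 8-byte alignment matters can be sketched in userspace: with 8-byte-aligned objects, the 3 low bits of the pointer are always zero and can carry a small tag, exactly as ctinfo rides in the nfct pointer. The names below are illustrative, not kernel code:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace sketch of the nfct/nfctinfo packing: an 8-byte-aligned
 * pointer has its 3 low bits free to hold a small enum, so a
 * misaligned object would corrupt the unpacked pointer. */
#define CTINFO_MASK 0x7UL

static unsigned long pack_ct(const void *ct, unsigned int info)
{
	/* the scheme only works when the object is at least 8-byte aligned */
	assert(((uintptr_t)ct & CTINFO_MASK) == 0);
	return (uintptr_t)ct | (info & CTINFO_MASK);
}

static void *unpack_ct(unsigned long v)
{
	return (void *)(v & ~CTINFO_MASK);
}

static unsigned int unpack_info(unsigned long v)
{
	return (unsigned int)(v & CTINFO_MASK);
}
```

The per-cpu fake entry is the one object whose alignment was not guaranteed, which is why the patch forces it with the _ALIGNED variants.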


[PATCH] usbnet: smsc95xx: Reduce logging noise

2017-03-10 Thread Guenter Roeck
An insert/remove stress test generated the following log message sequence.

usb 7-1.1: New USB device found, idVendor=0424, idProduct=ec00
usb 7-1.1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
smsc95xx 7-1.1:1.0 (unnamed net_device) (uninitialized):
Failed to read reg index 0x0114: -22
smsc95xx 7-1.1:1.0 (unnamed net_device) (uninitialized):
Error reading MII_ACCESS
smsc95xx 7-1.1:1.0 (unnamed net_device) (uninitialized):
MII is busy in smsc95xx_mdio_read

... repeat 100+ times ...

smsc95xx 7-1.1:1.0 (unnamed net_device) (uninitialized):
MII is busy in smsc95xx_mdio_read
smsc95xx 7-1.1:1.0 (unnamed net_device) (uninitialized):
timeout on PHY Reset
smsc95xx 7-1.1:1.0 (unnamed net_device) (uninitialized):
Failed to init PHY
smsc95xx 7-1.1:1.0 (unnamed net_device) (uninitialized):
Failed to read reg index 0x: -22
smsc95xx: probe of 7-1.1:1.0 failed with error -22
usb usb7: USB disconnect, device number 1
usb 7-1: USB disconnect, device number 2
usb 7-1.1: USB disconnect, device number 3

Use netdev_dbg() instead of netdev_warn() for the repeating messages
to reduce logging noise.

Signed-off-by: Guenter Roeck 
---
 drivers/net/usb/smsc95xx.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/usb/smsc95xx.c b/drivers/net/usb/smsc95xx.c
index 831aa33d078a..f69e9e031973 100644
--- a/drivers/net/usb/smsc95xx.c
+++ b/drivers/net/usb/smsc95xx.c
@@ -100,8 +100,8 @@ static int __must_check __smsc95xx_read_reg(struct usbnet *dev, u32 index,
 | USB_TYPE_VENDOR | USB_RECIP_DEVICE,
 0, index, &buf, 4);
if (unlikely(ret < 0)) {
-   netdev_warn(dev->net, "Failed to read reg index 0x%08x: %d\n",
-   index, ret);
+   netdev_dbg(dev->net, "Failed to read reg index 0x%08x: %d\n",
+  index, ret);
return ret;
}
 
@@ -174,7 +174,7 @@ static int __must_check __smsc95xx_phy_wait_not_busy(struct usbnet *dev,
do {
ret = __smsc95xx_read_reg(dev, MII_ADDR, &val, in_pm);
if (ret < 0) {
-   netdev_warn(dev->net, "Error reading MII_ACCESS\n");
+   netdev_dbg(dev->net, "Error reading MII_ACCESS\n");
return ret;
}
 
@@ -197,7 +197,7 @@ static int __smsc95xx_mdio_read(struct net_device *netdev, int phy_id, int idx,
/* confirm MII not busy */
ret = __smsc95xx_phy_wait_not_busy(dev, in_pm);
if (ret < 0) {
-   netdev_warn(dev->net, "MII is busy in smsc95xx_mdio_read\n");
+   netdev_dbg(dev->net, "MII is busy in smsc95xx_mdio_read\n");
goto done;
}
 
-- 
2.7.4



[net-next sample action optimization 2/3] openvswitch: Refactor recirc key allocation.

2017-03-10 Thread Andy Zhou
The logic of allocating and copying the key for each 'exec_actions_level'
was specific to execute_recirc(). However, future patches will reuse
it as well. Refactor the logic into its own function, clone_key().

Signed-off-by: Andy Zhou 
---
 net/openvswitch/actions.c | 71 ---
 1 file changed, 43 insertions(+), 28 deletions(-)

diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 75182e9..a622c19 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2007-2014 Nicira, Inc.
+ * Copyright (c) 2007-2017 Nicira, Inc.
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of version 2 of the GNU General Public
@@ -75,7 +75,7 @@ struct ovs_frag_data {
 
 #define DEFERRED_ACTION_FIFO_SIZE 10
 #define OVS_RECURSION_LIMIT 5
-#define OVS_DEFERRED_ACTION_THRESHOLD (OVS_RECURSION_LIMIT - 2)
+#define OVS_DEFERRED_ACITON_THRESHOLD (OVS_RECURSION_LIMIT - 2)
 struct action_fifo {
int head;
int tail;
@@ -83,14 +83,31 @@ struct action_fifo {
struct deferred_action fifo[DEFERRED_ACTION_FIFO_SIZE];
 };
 
-struct recirc_keys {
-   struct sw_flow_key key[OVS_DEFERRED_ACTION_THRESHOLD];
+struct action_flow_keys {
+   struct sw_flow_key key[OVS_DEFERRED_ACITON_THRESHOLD];
 };
 
 static struct action_fifo __percpu *action_fifos;
-static struct recirc_keys __percpu *recirc_keys;
+static struct action_flow_keys __percpu *flow_keys;
 static DEFINE_PER_CPU(int, exec_actions_level);
 
+/* Make a clone of the 'key', using the pre-allocated percpu 'flow_keys'
+ * space. Return NULL if out of key spaces.
+ */
+static struct sw_flow_key *clone_key(const struct sw_flow_key *key_)
+{
+   struct action_flow_keys *keys = this_cpu_ptr(flow_keys);
+   int level = this_cpu_read(exec_actions_level);
+   struct sw_flow_key *key = NULL;
+
+   if (level <= OVS_DEFERRED_ACITON_THRESHOLD) {
+   key = &keys->key[level - 1];
+   *key = *key_;
+   }
+
+   return key;
+}
+
 static void action_fifo_init(struct action_fifo *fifo)
 {
fifo->head = 0;
@@ -1090,8 +1107,8 @@ static int execute_recirc(struct datapath *dp, struct sk_buff *skb,
  struct sw_flow_key *key,
  const struct nlattr *a, int rem)
 {
+   struct sw_flow_key *recirc_key;
struct deferred_action *da;
-   int level;
 
if (!is_flow_key_valid(key)) {
int err;
@@ -1115,29 +1132,27 @@ static int execute_recirc(struct datapath *dp, struct sk_buff *skb,
return 0;
}
 
-   level = this_cpu_read(exec_actions_level);
-   if (level <= OVS_DEFERRED_ACTION_THRESHOLD) {
-   struct recirc_keys *rks = this_cpu_ptr(recirc_keys);
-   struct sw_flow_key *recirc_key = &rks->key[level - 1];
-
-   *recirc_key = *key;
+   /* If we are within the limit of 'OVS_DEFERRED_ACITON_THRESHOLD',
+* recirc immediately, otherwise, defer it for later execution.
+*/
+   recirc_key = clone_key(key);
+   if (recirc_key) {
recirc_key->recirc_id = nla_get_u32(a);
ovs_dp_process_packet(skb, recirc_key);
-
-   return 0;
-   }
-
-   da = add_deferred_actions(skb, key, NULL, 0);
-   if (da) {
-   da->pkt_key.recirc_id = nla_get_u32(a);
} else {
-   kfree_skb(skb);
-
-   if (net_ratelimit())
-   pr_warn("%s: deferred action limit reached, drop recirc action\n",
-   ovs_dp_name(dp));
+   da = add_deferred_actions(skb, key, NULL, 0);
+   if (da) {
+   recirc_key = &da->pkt_key;
+   recirc_key->recirc_id = nla_get_u32(a);
+   } else {
+   /* Log an error in case action fifo is full. */
+   kfree_skb(skb);
+   if (net_ratelimit())
+   pr_warn("%s: deferred action limit reached, drop recirc action\n",
+   ovs_dp_name(dp));
+   }
}
-
return 0;
 }
 
@@ -1327,8 +1342,8 @@ int action_fifos_init(void)
if (!action_fifos)
return -ENOMEM;
 
-   recirc_keys = alloc_percpu(struct recirc_keys);
-   if (!recirc_keys) {
+   flow_keys = alloc_percpu(struct action_flow_keys);
+   if (!flow_keys) {
free_percpu(action_fifos);
return -ENOMEM;
}
@@ -1339,5 +1354,5 @@ int action_fifos_init(void)
 void action_fifos_exit(void)
 {
free_percpu(action_fifos);
-   free_percpu(recirc_keys);
+   free_percpu(flow_keys);
 }
-- 
1.8.3.1
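The clone_key() helper introduced above can be sketched in userspace: a fixed pre-allocated array of scratch keys indexed by the current nesting level, returning NULL when the budget is exhausted so the caller falls back to deferral. Sizes and types below are simplified for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of the clone_key() idea: per-level scratch keys, NULL once
 * the nesting budget runs out. Not the actual OVS data structures. */
#define DEFERRED_THRESHOLD 3

struct flow_key {
	int fields[8];
};

static struct flow_key scratch_keys[DEFERRED_THRESHOLD];
static int exec_level;		/* 1-based, like exec_actions_level */

static struct flow_key *clone_key(const struct flow_key *src)
{
	struct flow_key *key = NULL;

	if (exec_level >= 1 && exec_level <= DEFERRED_THRESHOLD) {
		key = &scratch_keys[exec_level - 1];
		*key = *src;	/* struct copy, like *key = *key_ */
	}
	return key;
}
```

Returning NULL instead of failing lets the caller keep the deferred-action fallback path, as execute_recirc() does in the patch.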



[net-next sample action optimization 0/3]

2017-03-10 Thread Andy Zhou
The sample action can be used to translate the OpenFlow 'clone' action.
However, its implementation has not been sufficiently optimized for this
use case. This series attempts to close the gap.

Patch 3 commit message has more details on the specific optimizations
implemented.

Andy Zhou (3):
  openvswitch: Deferred fifo API change.
  openvswitch: Refactor recirc key allocation.
  openvswitch: Optimize sample action for the clone use cases

 include/uapi/linux/openvswitch.h |  13 +++
 net/openvswitch/actions.c| 194 ++-
 net/openvswitch/datapath.h   |   1 +
 net/openvswitch/flow_netlink.c   | 126 +
 4 files changed, 213 insertions(+), 121 deletions(-)

-- 
1.8.3.1



[net-next sample action optimization 3/3] openvswitch: Optimize sample action for the clone use cases

2017-03-10 Thread Andy Zhou
With the introduction of the OpenFlow 'clone' action, the OVS user space
can now translate the 'clone' action into the kernel datapath 'sample'
action with 100% probability, to ensure that the clone semantics,
namely that the packet seen by the clone action is the same as the
packet seen by the action after clone, are faithfully carried out
in the datapath.

While the sample action in the datapath has the matching semantics,
its implementation is only optimized for its original use.
Specifically, there are two limitations: First, there is a three-level
nesting restriction, enforced at flow download time. This
limit turns out to be too restrictive for the 'clone' use case.
Second, the implementation avoids a recursive call only if the sample
action list has a single userspace action.

The main optimization implemented in this series removes the static
nesting limit check and instead implements a run-time recursion limit
check and recursion avoidance similar to that of the 'recirc' action.
This optimization solves both issues above.

One related optimization attempts to avoid copying the flow key as
long as the enclosed actions do not change it. The
detection is performed only once, at flow download time.

Another related optimization is to rewrite the action list
at flow download time in order to save the fast path from parsing
the sample action list in its original form repeatedly.

Signed-off-by: Andy Zhou 
---
 include/uapi/linux/openvswitch.h |  13 
 net/openvswitch/actions.c| 111 ++
 net/openvswitch/datapath.h   |   1 +
 net/openvswitch/flow_netlink.c   | 126 ---
 4 files changed, 162 insertions(+), 89 deletions(-)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index 7f41f7d..0dfe69b 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -578,10 +578,23 @@ enum ovs_sample_attr {
OVS_SAMPLE_ATTR_PROBABILITY, /* u32 number */
OVS_SAMPLE_ATTR_ACTIONS, /* Nested OVS_ACTION_ATTR_* attributes. */
__OVS_SAMPLE_ATTR_MAX,
+
+#ifdef __KERNEL__
+   OVS_SAMPLE_ATTR_ARG  /* struct sample_arg  */
+#endif
 };
 
 #define OVS_SAMPLE_ATTR_MAX (__OVS_SAMPLE_ATTR_MAX - 1)
 
+#ifdef __KERNEL__
+struct sample_arg {
+   bool exec;   /* When true, actions in sample will not
+   change flow keys. False otherwise. */
+   u32  probability;/* Same value as
+   'OVS_SAMPLE_ATTR_PROBABILITY'. */
+};
+#endif
+
 /**
  * enum ovs_userspace_attr - Attributes for %OVS_ACTION_ATTR_USERSPACE action.
  * @OVS_USERSPACE_ATTR_PID: u32 Netlink PID to which the %OVS_PACKET_CMD_ACTION
diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index a622c19..8428438 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -928,73 +928,72 @@ static int output_userspace(struct datapath *dp, struct sk_buff *skb,
return ovs_dp_upcall(dp, skb, key, &upcall, cutlen);
 }
 
+
+/* When 'last' is true, sample() should always consume the 'skb'.
+ * Otherwise, sample() should keep 'skb' intact regardless what
+ * actions are executed within sample().
+ */
 static int sample(struct datapath *dp, struct sk_buff *skb,
  struct sw_flow_key *key, const struct nlattr *attr,
- const struct nlattr *actions, int actions_len)
+ bool last)
 {
-   const struct nlattr *acts_list = NULL;
-   const struct nlattr *a;
-   int rem;
-   u32 cutlen = 0;
-
-   for (a = nla_data(attr), rem = nla_len(attr); rem > 0;
-a = nla_next(a, &rem)) {
-   u32 probability;
+   struct nlattr *actions;
+   struct nlattr *sample_arg;
+   struct sk_buff *clone_skb;
+   struct sw_flow_key *orig = key;
+   int rem = nla_len(attr);
+   int err = 0;
+   const struct sample_arg *arg;
 
-   switch (nla_type(a)) {
-   case OVS_SAMPLE_ATTR_PROBABILITY:
-   probability = nla_get_u32(a);
-   if (!probability || prandom_u32() > probability)
-   return 0;
-   break;
+   /* The first action is always 'OVS_SAMPLE_ATTR_ARG'. */
+   sample_arg = nla_data(attr);
+   arg = nla_data(sample_arg);
+   actions = nla_next(sample_arg, &rem);
 
-   case OVS_SAMPLE_ATTR_ACTIONS:
-   acts_list = a;
-   break;
-   }
+   if ((arg->probability != U32_MAX) &&
+   (!arg->probability || prandom_u32() > arg->probability)) {
+   if (last)
+   consume_skb(skb);
+   return 0;
}
 
-   rem = nla_len(acts_list);
-   a = nla_data(acts_list);
-
-   /* Actions list is empty, do nothing */
-   if 
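The probability test added in this hunk has a fast path worth spelling out: with probability U32_MAX (the 100% 'clone' case) the random draw can never exceed it, so the enclosed actions always run. A minimal sketch of that decision, with simplified names:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the reworked sample() probability check. PROB_MAX stands in
 * for U32_MAX; 'rnd' models the prandom_u32() draw. */
#define PROB_MAX 0xffffffffu

static int should_run_sample(uint32_t probability, uint32_t rnd)
{
	if (probability != PROB_MAX &&
	    (!probability || rnd > probability))
		return 0;	/* skip the enclosed action list */
	return 1;
}
```

Checking for PROB_MAX first means the clone case pays no random-number cost at all, while partial-probability sampling behaves as before.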

[net-next sample action optimization 1/3] openvswitch: Deferred fifo API change.

2017-03-10 Thread Andy Zhou
The add_deferred_actions() API currently requires actions to be passed in
as a fully encoded netlink message. So far both the 'sample' and 'recirc'
actions happen to carry actions as fully encoded netlink messages.
However, this requirement is more restrictive than necessary; a future
patch will need to pass in action lists that are not fully encoded
by themselves.

Signed-off-by: Andy Zhou 
Acked-by: Joe Stringer 
---
 net/openvswitch/actions.c | 18 +++---
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index c82301c..75182e9 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -51,6 +51,7 @@ static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
 struct deferred_action {
struct sk_buff *skb;
const struct nlattr *actions;
+   int actions_len;
 
/* Store pkt_key clone when creating deferred action. */
struct sw_flow_key pkt_key;
@@ -119,8 +120,9 @@ static struct deferred_action *action_fifo_put(struct action_fifo *fifo)
 
 /* Return true if fifo is not full */
 static struct deferred_action *add_deferred_actions(struct sk_buff *skb,
-   const struct sw_flow_key *key,
-   const struct nlattr *attr)
+   const struct sw_flow_key *key,
+   const struct nlattr *actions,
+   const int actions_len)
 {
struct action_fifo *fifo;
struct deferred_action *da;
@@ -129,7 +131,8 @@ static struct deferred_action *add_deferred_actions(struct sk_buff *skb,
da = action_fifo_put(fifo);
if (da) {
da->skb = skb;
-   da->actions = attr;
+   da->actions = actions;
+   da->actions_len = actions_len;
da->pkt_key = *key;
}
 
@@ -966,7 +969,8 @@ static int sample(struct datapath *dp, struct sk_buff *skb,
/* Skip the sample action when out of memory. */
return 0;
 
-   if (!add_deferred_actions(skb, key, a)) {
+   if (!add_deferred_actions(skb, key, nla_data(acts_list),
+ nla_len(acts_list))) {
if (net_ratelimit())
pr_warn("%s: deferred actions limit reached, dropping sample action\n",
ovs_dp_name(dp));
@@ -1123,7 +1127,7 @@ static int execute_recirc(struct datapath *dp, struct sk_buff *skb,
return 0;
}
 
-   da = add_deferred_actions(skb, key, NULL);
+   da = add_deferred_actions(skb, key, NULL, 0);
if (da) {
da->pkt_key.recirc_id = nla_get_u32(a);
} else {
@@ -1278,10 +1282,10 @@ static void process_deferred_actions(struct datapath *dp)
struct sk_buff *skb = da->skb;
struct sw_flow_key *key = &da->pkt_key;
const struct nlattr *actions = da->actions;
+   int actions_len = da->actions_len;
 
if (actions)
-   do_execute_actions(dp, skb, key, actions,
-  nla_len(actions));
+   do_execute_actions(dp, skb, key, actions, actions_len);
else
ovs_dp_process_packet(skb, key);
} while (!action_fifo_is_empty(fifo));
-- 
1.8.3.1
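The API change above boils down to the FIFO entry carrying an explicit length next to the actions pointer. A userspace sketch of that shape, with sizes simplified for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of the deferred-action FIFO after the change: an explicit
 * actions_len means the action list no longer has to be a
 * self-delimiting netlink attribute. Not the actual OVS types. */
#define FIFO_SIZE 4

struct deferred_entry {
	const void *actions;
	int actions_len;
};

static struct deferred_entry fifo[FIFO_SIZE];
static int fifo_head;

static struct deferred_entry *add_deferred(const void *actions, int len)
{
	struct deferred_entry *e;

	if (fifo_head >= FIFO_SIZE)
		return NULL;	/* full: caller drops the packet */
	e = &fifo[fifo_head++];
	e->actions = actions;
	e->actions_len = len;
	return e;
}
```

The consumer then passes (actions, actions_len) straight through, as process_deferred_actions() does after the patch, instead of recomputing the length with nla_len().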



Re: [PATCH net-next v2] net: Add sysctl to toggle early demux for tcp and udp

2017-03-10 Thread Tom Herbert
On Fri, Mar 10, 2017 at 4:22 PM, Eric Dumazet  wrote:
> On Fri, 2017-03-10 at 08:33 -0800, Tom Herbert wrote:
>
>> Okay, now I'm confused. You're saying that when early demux was added
>> for IPv6 performance improved, but this patch is allowing early demux
>> to be disabled on the basis that it hurts performance for unconnected
>> UDP workloads. While it's true that early demux in the case results in
>> another UDP lookup, Eric's changes to make it lockless have made that
>> lookup very cheap. So we really need numbers to justify this patch.
>>
>
> Fact that the lookup is lockless does not avoid a cache line miss.
>
> Early demux computes a hash based on the 4-tuple, and looks up a hash
> table which does not fit in cpu caches.
>
> A cache line miss per packet is expensive when handling millions of UDP
> packets per second (with millions of 4-tuples).
>
>> Even if the numbers were to show a benefit, we still have the problem
>> that this creates a bimodal performance characteristic, e.g. what if
>> the work load were 1/2 connected and 1/2 unconnected in real life, or
>> what if the user incorrectly guesses the actual workload. Maybe a
>> deeper solution to investigate is making early demux work with
>> unconnected sockets.
>
> Sure, but forcing all UDP applications to perform IP early demux is not
> better.
>
All these hypotheses are quite testable, and it should be obvious that
if a patch is supposed to improve performance there should be some
effort to quantify the impact.
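The per-packet cost Eric describes can be made concrete with a toy version of the work early demux does: hash the 4-tuple, then index a table whose working set (millions of tuples) will not stay cache-resident. The multipliers below are arbitrary odd constants, not the kernel's actual hash:

```c
#include <assert.h>
#include <stdint.h>

/* Toy 4-tuple hash of the kind early demux performs per packet. */
struct four_tuple {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
};

static uint32_t tuple_hash(const struct four_tuple *t)
{
	uint32_t h = t->saddr * 2654435761u;

	h ^= t->daddr * 2246822519u;
	h ^= (((uint32_t)t->sport << 16) | t->dport) * 3266489917u;
	h ^= h >> 16;		/* final mix is a bijection on 32 bits */
	return h;
}
```

The hash itself is cheap; the expense is the dependent load into a bucket array far larger than the cache, which is why locklessness alone does not settle the performance question.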


[PATCH] drop_monitor: use setup_timer

2017-03-10 Thread Geliang Tang
Use setup_timer() instead of init_timer() to simplify the code.

Signed-off-by: Geliang Tang 
---
 net/core/drop_monitor.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/net/core/drop_monitor.c b/net/core/drop_monitor.c
index fb55327..70ccda2 100644
--- a/net/core/drop_monitor.c
+++ b/net/core/drop_monitor.c
@@ -412,9 +412,8 @@ static int __init init_net_drop_monitor(void)
for_each_possible_cpu(cpu) {
data = &per_cpu(dm_cpu_data, cpu);
INIT_WORK(&data->dm_alert_work, send_dm_alert);
-   init_timer(&data->send_timer);
-   data->send_timer.data = (unsigned long)data;
-   data->send_timer.function = sched_send_work;
+   setup_timer(&data->send_timer, sched_send_work,
+   (unsigned long)data);
spin_lock_init(&data->lock);
reset_per_cpu_data(data);
}
-- 
2.9.3
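What setup_timer() consolidates can be shown with a small userspace model (this is an illustrative mock of the timer_list API, not the real kernel implementation): init_timer() plus the two field assignments collapse into one call.

```c
#include <assert.h>

/* Userspace model of the old timer API, for illustration only. */
struct timer_list {
	void (*function)(unsigned long);
	unsigned long data;
	int initialized;
};

static void init_timer(struct timer_list *t)
{
	t->initialized = 1;
}

static void setup_timer(struct timer_list *t,
			void (*fn)(unsigned long), unsigned long data)
{
	init_timer(t);
	t->function = fn;
	t->data = data;
}

static unsigned long fired_with;

static void handler(unsigned long data)
{
	fired_with = data;	/* records what the "timer" fired with */
}
```

The conversion is purely mechanical, which is why the same one-liner change applies to both drop_monitor and ambassador below.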



[PATCH] ambassador: use setup_timer

2017-03-10 Thread Geliang Tang
Use setup_timer() instead of init_timer() to simplify the code.

Signed-off-by: Geliang Tang 
---
 drivers/atm/ambassador.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/atm/ambassador.c b/drivers/atm/ambassador.c
index 4a61079..906705e 100644
--- a/drivers/atm/ambassador.c
+++ b/drivers/atm/ambassador.c
@@ -2267,9 +2267,8 @@ static int amb_probe(struct pci_dev *pci_dev,
dev->atm_dev->ci_range.vpi_bits = NUM_VPI_BITS;
dev->atm_dev->ci_range.vci_bits = NUM_VCI_BITS;
 
-   init_timer(&dev->housekeeping);
-   dev->housekeeping.function = do_housekeeping;
-   dev->housekeeping.data = (unsigned long) dev;
+   setup_timer(&dev->housekeeping, do_housekeeping,
+   (unsigned long)dev);
mod_timer(&dev->housekeeping, jiffies);
 
// enable host interrupts
-- 
2.9.3



[PATCH] x86-32: fix tlb flushing when lguest clears PGE

2017-03-10 Thread Daniel Borkmann
Fengguang reported [1] random corruptions from various locations on
x86-32 after commits d2852a224050 ("arch: add ARCH_HAS_SET_MEMORY
config") and 9d876e79df6a ("bpf: fix unlocking of jited image when
module ronx not set") that uses the former. While x86-32 doesn't
have a JIT like x86_64, the bpf_prog_lock_ro() and bpf_prog_unlock_ro()
got enabled due to ARCH_HAS_SET_MEMORY, whereas Fengguang's test
kernel doesn't have module support built in and therefore never
had the DEBUG_SET_MODULE_RONX setting enabled.

After investigating the crashes further, it turned out that using
set_memory_ro() and set_memory_rw() didn't have the desired effect,
for example, setting the pages as read-only on x86-32 would still
let probe_kernel_write() succeed without error. This behavior would
manifest itself in situations where the vmalloc'ed buffer was accessed
prior to set_memory_*() such as in case of bpf_prog_alloc(). In
cases where it wasn't, the page attribute changes seemed to have
taken effect, leading to the conclusion that a TLB invalidate
didn't happen. Moreover, it turned out that this issue reproduced
with qemu in "-cpu kvm64" mode, but not for "-cpu host". When the
issue occurs, change_page_attr_set_clr() did trigger a TLB flush
as expected via __flush_tlb_all() through cpa_flush_range(), though.

There are 3 variants for issuing a TLB flush: invpcid_flush_all()
(depends on CPU feature bits X86_FEATURE_INVPCID, X86_FEATURE_PGE),
cr4 based flush (depends on X86_FEATURE_PGE), and cr3 based flush.
For "-cpu host" case in my setup, the flush used invpcid_flush_all()
variant, whereas for "-cpu kvm64", the flush was cr4 based. Switching
the kvm64 case to cr3 manually worked fine, and further investigating
the cr4 one turned out that X86_CR4_PGE bit was not set in cr4
register, meaning the __native_flush_tlb_global_irq_disabled() wrote
cr4 twice with the same value instead of clearing X86_CR4_PGE in the
first write to trigger the flush.

It turned out that X86_CR4_PGE was cleared from cr4 during init
from lguest_arch_host_init() via adjust_pge(). The X86_FEATURE_PGE
bit is also cleared from there due to concerns that using PGE in the
guest kernel can lead to hard-to-trace bugs (see bff672e630a0
("lguest: documentation V: Host") in init()). The CPU feature bits
are cleared in dynamic boot_cpu_data, but they never propagated to
__flush_tlb_all() as it uses static_cpu_has() instead of boot_cpu_has()
for testing which variant of TLB flushing to use, meaning they still
used the old setting of the host kernel.

Clearing via setup_clear_cpu_cap(X86_FEATURE_PGE) so this would
propagate to static_cpu_has() checks is too late at this point as
sections have been patched already, so for now, it seems reasonable
to switch back to boot_cpu_has(X86_FEATURE_PGE) as it was prior to
commit c109bf95992b ("x86/cpufeature: Remove cpu_has_pge"). This
lets the TLB flush trigger via cr3 as originally intended, properly
makes the new page attributes visible and thus fixes the crashes
seen by Fengguang.

  [1] https://lkml.org/lkml/2017/3/1/344

Fixes: c109bf95992b ("x86/cpufeature: Remove cpu_has_pge")
Reported-by: Fengguang Wu 
Signed-off-by: Daniel Borkmann 
Cc: Borislav Petkov 
Cc: Linus Torvalds 
Cc: Thomas Gleixner 
Cc: Kees Cook 
Cc: Laura Abbott 
Cc: Ingo Molnar 
Cc: H. Peter Anvin 
Cc: Rusty Russell 
Cc: Alexei Starovoitov 
Cc: David S. Miller 
---
 arch/x86/include/asm/tlbflush.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index 6fa8594..fc5abff 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -188,7 +188,7 @@ static inline void __native_flush_tlb_single(unsigned long addr)
 
 static inline void __flush_tlb_all(void)
 {
-   if (static_cpu_has(X86_FEATURE_PGE))
+   if (boot_cpu_has(X86_FEATURE_PGE))
__flush_tlb_global();
else
__flush_tlb();
-- 
1.9.3
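The static_cpu_has()/boot_cpu_has() distinction at the heart of the fix can be modeled simply: one freezes its answer at "patch time", the other rereads the feature word. This is a deliberately simplified userspace model, not the real alternatives machinery:

```c
#include <assert.h>

/* Simplified model: a feature bit cleared after "patching" (as lguest
 * clears PGE) is only visible to the dynamic check. Bit 0 stands in
 * for X86_FEATURE_PGE. */
static unsigned long cpu_features = 1UL;
static int patched_answer = -1;

static int model_static_cpu_has(void)
{
	if (patched_answer < 0)		/* "section patching": freeze once */
		patched_answer = (int)(cpu_features & 1UL);
	return patched_answer;
}

static int model_boot_cpu_has(void)
{
	return (int)(cpu_features & 1UL);	/* consults current state */
}
```

Once the answer is frozen, no later change to the feature word can reach callers of the static variant, which is exactly why __flush_tlb_all() kept choosing the cr4 path after lguest had cleared PGE.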



Re: [PATCH net-next v2] net: Add sysctl to toggle early demux for tcp and udp

2017-03-10 Thread Eric Dumazet
On Fri, 2017-03-10 at 08:33 -0800, Tom Herbert wrote:

> Okay, now I'm confused. You're saying that when early demux was added
> for IPv6 performance improved, but this patch is allowing early demux
> to be disabled on the basis that it hurts performance for unconnected
> UDP workloads. While it's true that early demux in the case results in
> another UDP lookup, Eric's changes to make it lockless have made that
> lookup very cheap. So we really need numbers to justify this patch.
> 

Fact that the lookup is lockless does not avoid a cache line miss.

Early demux computes a hash based on the 4-tuple, and looks up a hash
table which does not fit in cpu caches.

A cache line miss per packet is expensive when handling millions of UDP
packets per second (with millions of 4-tuples).

> Even if the numbers were to show a benefit, we still have the problem
> that this creates a bimodal performance characteristic, e.g. what if
> the workload were 1/2 connected and 1/2 unconnected in real life, or
> what if the user incorrectly guesses the actual workload. Maybe a
> deeper solution to investigate is making early demux work with
> unconnected sockets.

Sure, but forcing all UDP applications to perform IP early demux is not
better.





Re: [RFC net-next sample action optimization 3/3] openvswitch: Optimize sample action for the clone use cases

2017-03-10 Thread Joe Stringer
On 10 March 2017 at 14:07, Andy Zhou  wrote:
> On Thu, Mar 9, 2017 at 11:46 AM, Joe Stringer  wrote:
>> On 7 March 2017 at 16:15, Andy Zhou  wrote:
>>> With the introduction of open flow 'clone' action, the OVS user space
>>> can now translate the 'clone' action into kernel datapath 'sample'
>>> action, with 100% probability, to ensure that the clone semantics,
>>> which is that the packet seen by the clone action is the same as the
>>> packet seen by the action after clone, is faithfully carried out
>>> in the datapath.
>>>
>>> While the sample action in the datpath has the matching semantics,
>>> its implementation is only optimized for its original use.
>>> Specifically, there are two limitation: First, there is a 3 level of
>>> nesting restriction, enforced at the flow downloading time. This
>>> limit turns out to be too restrictive for the 'clone' use case.
>>> Second, the implementation avoid recursive call only if the sample
>>> action list has a single userspace action.
>>>
>>> The optimization implemented in this series removes the static
>>> nesting limit check, instead, implement the run time recursion limit
>>> check, and recursion avoidance similar to that of the 'recirc' action.
>>> This optimization solve both #1 and #2 issues above.
>>>
>>> Another optimization implemented is to avoid coping flow key as
>>
>> *copying
>>
>>> long as the actions enclosed does not change the flow key. The
>>> detection is performed only once at the flow downloading time.
>>>
>>> The third optimization implemented is to rewrite the action list
>>> at flow downloading time in order to save the fast path from parsing
>>> the sample action list in its original form repeatedly.
>>
>> Whenever there is an enumeration (1, 2, 3; ..another..., third thing
>> implemented) in a commit message, I have to ask whether each "another
>> change..." should be a separate patch. It certainly makes it easier to
>> review.
>>
They are all part of the same implementation. Splitting them probably won't
make much sense. I think I will drop #2 and #3 in the commit message since
#1 is the main optimization.

Fair enough. You don't have to drop them from the commit message; they
make the intention of all of the changes clearer.

>> I ran this through the OVS kernel tests and it's working correctly
>> from that point of view, but I didn't get a chance to dig in and
>> ensure for example, runtime behaviour of several nested
>> sample(actions(sample(actions(sample(actions(output)) handles
>> reasonably when it runs out of stack and deferred actions space. At a
>> high level though, this seems pretty straightforward.
>>
>> Several comments below, thanks.
>>
>>>
>>> Signed-off-by: Andy Zhou 
>>> ---
>>>  net/openvswitch/actions.c  | 106 ++--
>>>  net/openvswitch/datapath.h |   7 +++
>>>  net/openvswitch/flow_netlink.c | 118 
>>> -
>>>  3 files changed, 140 insertions(+), 91 deletions(-)
>>>
>>> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
>>> index 259aea9..2e8c372 100644
>>> --- a/net/openvswitch/actions.c
>>> +++ b/net/openvswitch/actions.c
>>> @@ -930,71 +930,52 @@ static int output_userspace(struct datapath *dp, 
>>> struct sk_buff *skb,
>>>  }
>>>
>>>  static int sample(struct datapath *dp, struct sk_buff *skb,
>>> - struct sw_flow_key *key, const struct nlattr *attr,
>>> - const struct nlattr *actions, int actions_len)
>>> + struct sw_flow_key *key, const struct nlattr *attr)
>>>  {
>>> -   const struct nlattr *acts_list = NULL;
>>> -   const struct nlattr *a;
>>> -   int rem;
>>> -   u32 cutlen = 0;
>>> -
>>> -   for (a = nla_data(attr), rem = nla_len(attr); rem > 0;
>>> -a = nla_next(a, &rem)) {
>>> -   u32 probability;
>>> -
>>> -   switch (nla_type(a)) {
>>> -   case OVS_SAMPLE_ATTR_PROBABILITY:
>>> -   probability = nla_get_u32(a);
>>> -   if (!probability || prandom_u32() > probability)
>>> -   return 0;
>>> -   break;
>>> -
>>> -   case OVS_SAMPLE_ATTR_ACTIONS:
>>> -   acts_list = a;
>>> -   break;
>>> -   }
>>> -   }
>>> +   struct nlattr *actions;
>>> +   struct nlattr *sample_arg;
>>> +   struct sw_flow_key *orig = key;
>>> +   int rem = nla_len(attr);
>>> +   int err = 0;
>>> +   const struct sample_arg *arg;
>>>
>>> -   rem = nla_len(acts_list);
>>> -   a = nla_data(acts_list);
>>> +   /* The first action is always 'OVS_SAMPLE_ATTR_AUX'. */
>>
>> Is it? This is the only reference to OVS_SAMPLE_ATTR_AUX I can see.
>>
>>> +   sample_arg = nla_data(attr);
>>
>> We could do this in the parent call, like several other actions do.
>
> What do you mean?


Re: [PATCH] net: tun: use new api ethtool_{get|set}_link_ksettings

2017-03-10 Thread Eric Dumazet
On Fri, 2017-03-10 at 23:57 +0200, Michael S. Tsirkin wrote:
> On Fri, Mar 10, 2017 at 10:18:07PM +0100, Philippe Reynes wrote:
> > The ethtool api {get|set}_settings is deprecated.
> > We move this driver to new api {get|set}_link_ksettings.
> > 
> > As I don't have the hardware,
> 
> What kind of hardware?

TGIF ;)




[PATCH net-next 2/3] vxlan: fix snooping for link-local IPv6 addresses

2017-03-10 Thread Matthias Schiffer
If VXLAN is run over link-local IPv6 addresses, it is necessary to store
the ifindex in the FDB entries. Otherwise, the used interface is undefined
and unicast communication will most likely fail.

Signed-off-by: Matthias Schiffer 
---
 drivers/net/vxlan.c | 20 +++-
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index cc0ace73d02e..4c0ef8bbad71 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -941,16 +941,24 @@ static int vxlan_fdb_dump(struct sk_buff *skb, struct 
netlink_callback *cb,
  */
 static bool vxlan_snoop(struct net_device *dev,
union vxlan_addr *src_ip, const u8 *src_mac,
-   __be32 vni)
+   __u32 src_ifindex, __be32 vni)
 {
struct vxlan_dev *vxlan = netdev_priv(dev);
struct vxlan_fdb *f;
+   __u32 ifindex = 0;
+
+#if IS_ENABLED(CONFIG_IPV6)
+   if (src_ip->sa.sa_family == AF_INET6 &&
+   (ipv6_addr_type(&src_ip->sin6.sin6_addr) & IPV6_ADDR_LINKLOCAL))
+   ifindex = src_ifindex;
+#endif
 
f = vxlan_find_mac(vxlan, src_mac, vni);
if (likely(f)) {
struct vxlan_rdst *rdst = first_remote_rcu(f);
 
-   if (likely(vxlan_addr_equal(&rdst->remote_ip, src_ip)))
+   if (likely(vxlan_addr_equal(&rdst->remote_ip, src_ip) &&
+  rdst->remote_ifindex == ifindex))
return false;
 
/* Don't migrate static entries, drop packets */
@@ -977,7 +985,7 @@ static bool vxlan_snoop(struct net_device *dev,
 vxlan->cfg.dst_port,
 vni,
 vxlan->default_dst.remote_vni,
-0, NTF_SELF);
+ifindex, NTF_SELF);
spin_unlock(&vxlan->hash_lock);
}
 
@@ -1246,6 +1254,7 @@ static bool vxlan_set_mac(struct vxlan_dev *vxlan,
  struct sk_buff *skb, __be32 vni)
 {
union vxlan_addr saddr;
+   __u32 ifindex = skb->dev->ifindex;
 
skb_reset_mac_header(skb);
skb->protocol = eth_type_trans(skb, vxlan->dev);
@@ -1267,7 +1276,7 @@ static bool vxlan_set_mac(struct vxlan_dev *vxlan,
}
 
if ((vxlan->flags & VXLAN_F_LEARN) &&
-   vxlan_snoop(skb->dev, &saddr, eth_hdr(skb)->h_source, vni))
+   vxlan_snoop(skb->dev, &saddr, eth_hdr(skb)->h_source, ifindex, vni))
return false;
 
return true;
@@ -1974,7 +1983,8 @@ static void vxlan_encap_bypass(struct sk_buff *skb, 
struct vxlan_dev *src_vxlan,
}
 
if (dst_vxlan->flags & VXLAN_F_LEARN)
-   vxlan_snoop(skb->dev, &saddr, eth_hdr(skb)->h_source, vni);
+   vxlan_snoop(skb->dev, &saddr, eth_hdr(skb)->h_source, 0,
+   vni);
 
u64_stats_update_begin(_stats->syncp);
tx_stats->tx_packets++;
-- 
2.12.0



[PATCH net-next 0/3] VXLAN over IPv6 link-local

2017-03-10 Thread Matthias Schiffer
Running VXLANs over IPv6 link-local addresses allows them to be used as a
drop-in replacement for VLANs, avoiding the allocation of additional outer
IP addresses to run the VXLAN over.

The first patch is basically a bugfix, disallowing the use of link-local
addresses without specifying an interface; it doesn't seem important enough
for net/stable though (and without the second patch, allowing link-local
addresses to be specified at all does not result in a working configuration
anyway). The second patch then actually makes VXLAN over link-local IPv6
work by passing interface indices in the right places.

The final patch lifts the limitation of not allowing multiple VXLANs with
the same VNI and port, as long as link-local IPv6 addresses are used and
different interfaces are specified. Again, this brings VXLAN a bit closer
to VLANs feature-wise.


Matthias Schiffer (3):
  vxlan: don't allow link-local IPv6 local/remote addresses without
interface
  vxlan: fix snooping for link-local IPv6 addresses
  vxlan: allow multiple VXLANs with same VNI for IPv6 link-local
addresses

 drivers/net/vxlan.c | 120 +---
 1 file changed, 87 insertions(+), 33 deletions(-)

--
2.12.0


[PATCH net-next 1/3] vxlan: don't allow link-local IPv6 local/remote addresses without interface

2017-03-10 Thread Matthias Schiffer
Signed-off-by: Matthias Schiffer 
---
 drivers/net/vxlan.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index e375560cc74e..cc0ace73d02e 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -2922,6 +2922,18 @@ static int vxlan_dev_configure(struct net *src_net, 
struct net_device *dev,
pr_info("multicast destination requires interface to be 
specified\n");
return -EINVAL;
}
+#if IS_ENABLED(CONFIG_IPV6)
+   else if (!conf->remote_ifindex &&
+((conf->saddr.sa.sa_family == AF_INET6 &&
+ (ipv6_addr_type(&conf->saddr.sin6.sin6_addr) &
+  IPV6_ADDR_LINKLOCAL)) ||
+ (dst->remote_ip.sa.sa_family == AF_INET6 &&
+  (ipv6_addr_type(&dst->remote_ip.sin6.sin6_addr) &
+   IPV6_ADDR_LINKLOCAL)))) {
+   pr_info("link-local local/remote addresses require interface to 
be specified\n");
+   return -EINVAL;
+   }
+#endif
 
if (conf->mtu) {
int max_mtu = ETH_MAX_MTU;
-- 
2.12.0



[PATCH net-next 3/3] vxlan: allow multiple VXLANs with same VNI for IPv6 link-local addresses

2017-03-10 Thread Matthias Schiffer
As link-local addresses are only valid for a single interface, we can allow
the same VNI to be used for multiple independent VXLANs, as long as the
interfaces used are distinct. This way, VXLANs can always be used as a
drop-in replacement for VLANs with a greater ID space.

This also extends VNI lookup to respect the ifindex when link-local IPv6
addresses are used, so using the same VNI on multiple interfaces can
actually work.

Signed-off-by: Matthias Schiffer 
---
 drivers/net/vxlan.c | 88 -
 1 file changed, 60 insertions(+), 28 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 4c0ef8bbad71..9cd6f6b92cf4 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -224,7 +224,8 @@ static struct vxlan_sock *vxlan_find_sock(struct net *net, 
sa_family_t family,
return NULL;
 }
 
-static struct vxlan_dev *vxlan_vs_find_vni(struct vxlan_sock *vs, __be32 vni)
+static struct vxlan_dev *vxlan_vs_find_vni(struct vxlan_sock *vs, int ifindex,
+  __be32 vni)
 {
struct vxlan_dev *vxlan;
 
@@ -233,17 +234,30 @@ static struct vxlan_dev *vxlan_vs_find_vni(struct 
vxlan_sock *vs, __be32 vni)
vni = 0;
 
hlist_for_each_entry_rcu(vxlan, vni_head(vs, vni), hlist) {
-   if (vxlan->default_dst.remote_vni == vni)
-   return vxlan;
+   if (vxlan->default_dst.remote_vni != vni)
+   continue;
+
+   if (IS_ENABLED(CONFIG_IPV6)) {
+   const struct vxlan_config *cfg = &vxlan->cfg;
+
+   if (cfg->remote_ifindex != 0 &&
+   cfg->remote_ifindex != ifindex &&
+   cfg->saddr.sa.sa_family == AF_INET6 &&
+   (ipv6_addr_type(&cfg->saddr.sin6.sin6_addr) &
+IPV6_ADDR_LINKLOCAL))
+   continue;
+   }
+
+   return vxlan;
}
 
return NULL;
 }
 
 /* Look up VNI in a per net namespace table */
-static struct vxlan_dev *vxlan_find_vni(struct net *net, __be32 vni,
-   sa_family_t family, __be16 port,
-   u32 flags)
+static struct vxlan_dev *vxlan_find_vni(struct net *net, int ifindex,
+   __be32 vni, sa_family_t family,
+   __be16 port, u32 flags)
 {
struct vxlan_sock *vs;
 
@@ -251,7 +265,7 @@ static struct vxlan_dev *vxlan_find_vni(struct net *net, 
__be32 vni,
if (!vs)
return NULL;
 
-   return vxlan_vs_find_vni(vs, vni);
+   return vxlan_vs_find_vni(vs, ifindex, vni);
 }
 
 /* Fill in neighbour message in skbuff. */
@@ -1342,7 +1356,7 @@ static int vxlan_rcv(struct sock *sk, struct sk_buff *skb)
 
vni = vxlan_vni(vxlan_hdr(skb)->vx_vni);
 
-   vxlan = vxlan_vs_find_vni(vs, vni);
+   vxlan = vxlan_vs_find_vni(vs, skb->dev->ifindex, vni);
if (!vxlan)
goto drop;
 
@@ -2002,8 +2016,10 @@ static void vxlan_encap_bypass(struct sk_buff *skb, 
struct vxlan_dev *src_vxlan,
 }
 
 static int encap_bypass_if_local(struct sk_buff *skb, struct net_device *dev,
-struct vxlan_dev *vxlan, union vxlan_addr 
*daddr,
-__be16 dst_port, __be32 vni, struct dst_entry 
*dst,
+struct vxlan_dev *vxlan,
+union vxlan_addr *daddr,
+__be16 dst_port, int dst_ifindex, __be32 vni,
+struct dst_entry *dst,
 u32 rt_flags)
 {
 #if IS_ENABLED(CONFIG_IPV6)
@@ -2019,7 +2035,7 @@ static int encap_bypass_if_local(struct sk_buff *skb, 
struct net_device *dev,
struct vxlan_dev *dst_vxlan;
 
dst_release(dst);
-   dst_vxlan = vxlan_find_vni(vxlan->net, vni,
+   dst_vxlan = vxlan_find_vni(vxlan->net, dst_ifindex, vni,
   daddr->sa.sa_family, dst_port,
   vxlan->flags);
if (!dst_vxlan) {
@@ -2051,6 +2067,7 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct 
net_device *dev,
struct dst_entry *ndst = NULL;
__be32 vni, label;
__u8 tos, ttl;
+   int ifindex;
int err;
u32 flags = vxlan->flags;
bool udp_sum = false;
@@ -2071,6 +2088,7 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct 
net_device *dev,
 
dst_port = rdst->remote_port ? rdst->remote_port : 
vxlan->cfg.dst_port;
vni = (rdst->remote_vni) ? : default_vni;
+   ifindex = rdst->remote_ifindex;
local_ip = vxlan->cfg.saddr;
dst_cache = >dst_cache;

[PATCH] mpls: Do not decrement alive counter for unregister events

2017-03-10 Thread David Ahern
Multipath routes can be rendered useless when a device in one of the
paths is deleted. For example:

$ ip -f mpls ro ls
100
nexthop as to 200 via inet 172.16.2.2  dev virt12
nexthop as to 300 via inet 172.16.3.2  dev br0
101
nexthop as to 201 via inet6 2000:2::2  dev virt12
nexthop as to 301 via inet6 2000:3::2  dev br0

$ ip li del br0

When br0 is deleted the other hop is not considered in
mpls_select_multipath because of the alive check -- rt_nhn_alive
is 0.

rt_nhn_alive is decremented once in mpls_ifdown when the device is taken
down (NETDEV_DOWN) and again when it is deleted (NETDEV_UNREGISTER). For
a 2 hop route, deleting one device drops the alive count to 0. Since
devices are taken down before unregistering, the decrement on
NETDEV_UNREGISTER is redundant.

Fixes: c89359a42e2a4 ("mpls: support for dead routes")
Signed-off-by: David Ahern 
---
 net/mpls/af_mpls.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index ccdac9c44fdc..22a9971aa484 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -1288,7 +1288,8 @@ static void mpls_ifdown(struct net_device *dev, int event)
/* fall through */
case NETDEV_CHANGE:
nh->nh_flags |= RTNH_F_LINKDOWN;
-   ACCESS_ONCE(rt->rt_nhn_alive) = 
rt->rt_nhn_alive - 1;
+   if (event != NETDEV_UNREGISTER)
+   ACCESS_ONCE(rt->rt_nhn_alive) = 
rt->rt_nhn_alive - 1;
break;
}
if (event == NETDEV_UNREGISTER)
-- 
2.1.4



Re: [RFC net-next sample action optimization 3/3] openvswitch: Optimize sample action for the clone use cases

2017-03-10 Thread Andy Zhou
On Thu, Mar 9, 2017 at 11:46 AM, Joe Stringer  wrote:
> On 7 March 2017 at 16:15, Andy Zhou  wrote:
>> With the introduction of open flow 'clone' action, the OVS user space
>> can now translate the 'clone' action into kernel datapath 'sample'
>> action, with 100% probability, to ensure that the clone semantics,
>> which is that the packet seen by the clone action is the same as the
>> packet seen by the action after clone, is faithfully carried out
>> in the datapath.
>>
>> While the sample action in the datpath has the matching semantics,
>> its implementation is only optimized for its original use.
>> Specifically, there are two limitation: First, there is a 3 level of
>> nesting restriction, enforced at the flow downloading time. This
>> limit turns out to be too restrictive for the 'clone' use case.
>> Second, the implementation avoid recursive call only if the sample
>> action list has a single userspace action.
>>
>> The optimization implemented in this series removes the static
>> nesting limit check, instead, implement the run time recursion limit
>> check, and recursion avoidance similar to that of the 'recirc' action.
>> This optimization solve both #1 and #2 issues above.
>>
>> Another optimization implemented is to avoid coping flow key as
>
> *copying
>
>> long as the actions enclosed does not change the flow key. The
>> detection is performed only once at the flow downloading time.
>>
>> The third optimization implemented is to rewrite the action list
>> at flow downloading time in order to save the fast path from parsing
>> the sample action list in its original form repeatedly.
>
> Whenever there is an enumeration (1, 2, 3; ..another..., third thing
> implemented) in a commit message, I have to ask whether each "another
> change..." should be a separate patch. It certainly makes it easier to
> review.
>
They are all part of the same implementation. Splitting them probably won't
make much sense. I think I will drop #2 and #3 in the commit message since
#1 is the main optimization.

> I ran this through the OVS kernel tests and it's working correctly
> from that point of view, but I didn't get a chance to dig in and
> ensure for example, runtime behaviour of several nested
> sample(actions(sample(actions(sample(actions(output)) handles
> reasonably when it runs out of stack and deferred actions space. At a
> high level though, this seems pretty straightforward.
>
> Several comments below, thanks.
>
>>
>> Signed-off-by: Andy Zhou 
>> ---
>>  net/openvswitch/actions.c  | 106 ++--
>>  net/openvswitch/datapath.h |   7 +++
>>  net/openvswitch/flow_netlink.c | 118 
>> -
>>  3 files changed, 140 insertions(+), 91 deletions(-)
>>
>> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
>> index 259aea9..2e8c372 100644
>> --- a/net/openvswitch/actions.c
>> +++ b/net/openvswitch/actions.c
>> @@ -930,71 +930,52 @@ static int output_userspace(struct datapath *dp, 
>> struct sk_buff *skb,
>>  }
>>
>>  static int sample(struct datapath *dp, struct sk_buff *skb,
>> - struct sw_flow_key *key, const struct nlattr *attr,
>> - const struct nlattr *actions, int actions_len)
>> + struct sw_flow_key *key, const struct nlattr *attr)
>>  {
>> -   const struct nlattr *acts_list = NULL;
>> -   const struct nlattr *a;
>> -   int rem;
>> -   u32 cutlen = 0;
>> -
>> -   for (a = nla_data(attr), rem = nla_len(attr); rem > 0;
>> -a = nla_next(a, &rem)) {
>> -   u32 probability;
>> -
>> -   switch (nla_type(a)) {
>> -   case OVS_SAMPLE_ATTR_PROBABILITY:
>> -   probability = nla_get_u32(a);
>> -   if (!probability || prandom_u32() > probability)
>> -   return 0;
>> -   break;
>> -
>> -   case OVS_SAMPLE_ATTR_ACTIONS:
>> -   acts_list = a;
>> -   break;
>> -   }
>> -   }
>> +   struct nlattr *actions;
>> +   struct nlattr *sample_arg;
>> +   struct sw_flow_key *orig = key;
>> +   int rem = nla_len(attr);
>> +   int err = 0;
>> +   const struct sample_arg *arg;
>>
>> -   rem = nla_len(acts_list);
>> -   a = nla_data(acts_list);
>> +   /* The first action is always 'OVS_SAMPLE_ATTR_AUX'. */
>
> Is it? This is the only reference to OVS_SAMPLE_ATTR_AUX I can see.
>
>> +   sample_arg = nla_data(attr);
>
> We could do this in the parent call, like several other actions do.

What do you mean?

>
> 
>
>> @@ -1246,9 +1227,24 @@ static int do_execute_actions(struct datapath *dp, 
>> struct sk_buff *skb,
>> err = execute_masked_set_action(skb, key, 
>> nla_data(a));
>> break;
>>
>> -   case OVS_ACTION_ATTR_SAMPLE:
>> -

Re: [PATCH] net: tun: use new api ethtool_{get|set}_link_ksettings

2017-03-10 Thread Michael S. Tsirkin
On Fri, Mar 10, 2017 at 10:18:07PM +0100, Philippe Reynes wrote:
> The ethtool api {get|set}_settings is deprecated.
> We move this driver to new api {get|set}_link_ksettings.
> 
> As I don't have the hardware,

What kind of hardware?

> I'd be very pleased if
> someone may test this patch.
> 
> Signed-off-by: Philippe Reynes 
> ---
>  drivers/net/tun.c |   24 +++-
>  1 files changed, 11 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index dc1b1dd..c418f0a 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -2430,18 +2430,16 @@ static void tun_chr_show_fdinfo(struct seq_file *m, 
> struct file *f)
>  
>  /* ethtool interface */
>  
> -static int tun_get_settings(struct net_device *dev, struct ethtool_cmd *cmd)
> -{
> - cmd->supported  = 0;
> - cmd->advertising= 0;
> - ethtool_cmd_speed_set(cmd, SPEED_10);
> - cmd->duplex = DUPLEX_FULL;
> - cmd->port   = PORT_TP;
> - cmd->phy_address= 0;
> - cmd->transceiver= XCVR_INTERNAL;
> - cmd->autoneg= AUTONEG_DISABLE;
> - cmd->maxtxpkt   = 0;
> - cmd->maxrxpkt   = 0;
> +static int tun_get_link_ksettings(struct net_device *dev,
> +   struct ethtool_link_ksettings *cmd)
> +{
> + ethtool_link_ksettings_zero_link_mode(cmd, supported);
> + ethtool_link_ksettings_zero_link_mode(cmd, advertising);
> + cmd->base.speed = SPEED_10;
> + cmd->base.duplex= DUPLEX_FULL;
> + cmd->base.port  = PORT_TP;
> + cmd->base.phy_address   = 0;
> + cmd->base.autoneg   = AUTONEG_DISABLE;
>   return 0;
>  }
>  
> @@ -2504,7 +2502,6 @@ static int tun_set_coalesce(struct net_device *dev,
>  }
>  
>  static const struct ethtool_ops tun_ethtool_ops = {
> - .get_settings   = tun_get_settings,
>   .get_drvinfo= tun_get_drvinfo,
>   .get_msglevel   = tun_get_msglevel,
>   .set_msglevel   = tun_set_msglevel,
> @@ -2512,6 +2509,7 @@ static int tun_set_coalesce(struct net_device *dev,
>   .get_ts_info= ethtool_op_get_ts_info,
>   .get_coalesce   = tun_get_coalesce,
>   .set_coalesce   = tun_set_coalesce,
> + .get_link_ksettings = tun_get_link_ksettings,
>  };
>  
>  static int tun_queue_resize(struct tun_struct *tun)
> -- 
> 1.7.4.4


Re: [RFC net-next sample action optimization 2/3] openvswitch: Refactor recirc key allocation.

2017-03-10 Thread Andy Zhou
On Thu, Mar 9, 2017 at 11:11 AM, Joe Stringer  wrote:
> On 7 March 2017 at 16:15, Andy Zhou  wrote:
>> The logic of allocating and copy key for each 'exec_actions_level'
>> was specific to execute_recirc(). However, future patches will reuse
>> as well.  Refactor the logic into its own function clone_key().
>>
>> Signed-off-by: Andy Zhou 
>> ---
>
> 
>
>> @@ -83,14 +83,32 @@ struct action_fifo {
>> struct deferred_action fifo[DEFERRED_ACTION_FIFO_SIZE];
>>  };
>>
>> -struct recirc_keys {
>> -   struct sw_flow_key key[OVS_DEFERRED_ACTION_THRESHOLD];
>> +struct action_flow_keys {
>> +   struct sw_flow_key key[OVS_ACTION_RECURSION_THRESHOLD];
>>  };
>
> I thought the old struct name was clearer on how it would be used -
> for when actions are deferred.

O.K. I will revert it.
>
>>
>>  static struct action_fifo __percpu *action_fifos;
>> -static struct recirc_keys __percpu *recirc_keys;
>> +static struct action_flow_keys __percpu *flow_keys;
>>  static DEFINE_PER_CPU(int, exec_actions_level);
>>
>> +/* Make a clone of the 'key', using the pre-allocated percpu 'flow_keys'
>> + * space. Since the storage is pre-allocated, the caller does not
>> + * need to check for NULL return pointer.
>> + */
>
> Hmm? if level > OVS_ACTION_RECURSION_THRESHOLD, this function returns NULL.
Thanks for catching this. I will update the comment.


[PATCH net v4] xen-netback: fix race condition on XenBus disconnect

2017-03-10 Thread Igor Druzhinin
In some cases during XenBus disconnect event handling and subsequent
queue resource release there may be some TX handlers active on
other processors. Use RCU in order to synchronize with them.

Signed-off-by: Igor Druzhinin 
---
v4:
 * Use READ_ONCE instead of rcu_dereference to stop sparse complaining

v3:
 * Fix unintended semantic change in xenvif_get_ethtool_stats
 * Dropped extra code

v2:
 * Add protection for xenvif_get_ethtool_stats
 * Additional comments and fixes
---
 drivers/net/xen-netback/interface.c | 26 +-
 drivers/net/xen-netback/netback.c   |  2 +-
 drivers/net/xen-netback/xenbus.c| 20 ++--
 3 files changed, 28 insertions(+), 20 deletions(-)

diff --git a/drivers/net/xen-netback/interface.c 
b/drivers/net/xen-netback/interface.c
index 829b26c..8397f6c 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -165,13 +165,17 @@ static int xenvif_start_xmit(struct sk_buff *skb, struct 
net_device *dev)
 {
struct xenvif *vif = netdev_priv(dev);
struct xenvif_queue *queue = NULL;
-   unsigned int num_queues = vif->num_queues;
+   unsigned int num_queues;
u16 index;
struct xenvif_rx_cb *cb;
 
BUG_ON(skb->dev != dev);
 
-   /* Drop the packet if queues are not set up */
+   /* Drop the packet if queues are not set up.
+* This handler should be called inside an RCU read section
+* so we don't need to enter it here explicitly.
+*/
+   num_queues = READ_ONCE(vif->num_queues);
if (num_queues < 1)
goto drop;
 
@@ -222,18 +226,18 @@ static struct net_device_stats *xenvif_get_stats(struct 
net_device *dev)
 {
struct xenvif *vif = netdev_priv(dev);
struct xenvif_queue *queue = NULL;
+   unsigned int num_queues;
u64 rx_bytes = 0;
u64 rx_packets = 0;
u64 tx_bytes = 0;
u64 tx_packets = 0;
unsigned int index;
 
-   spin_lock(&vif->lock);
-   if (vif->queues == NULL)
-   goto out;
+   rcu_read_lock();
+   num_queues = READ_ONCE(vif->num_queues);
 
/* Aggregate tx and rx stats from each queue */
-   for (index = 0; index < vif->num_queues; ++index) {
+   for (index = 0; index < num_queues; ++index) {
queue = &vif->queues[index];
rx_bytes += queue->stats.rx_bytes;
rx_packets += queue->stats.rx_packets;
@@ -241,8 +245,7 @@ static struct net_device_stats *xenvif_get_stats(struct 
net_device *dev)
tx_packets += queue->stats.tx_packets;
}
 
-out:
-   spin_unlock(&vif->lock);
+   rcu_read_unlock();
 
vif->dev->stats.rx_bytes = rx_bytes;
vif->dev->stats.rx_packets = rx_packets;
@@ -378,10 +381,13 @@ static void xenvif_get_ethtool_stats(struct net_device 
*dev,
 struct ethtool_stats *stats, u64 * data)
 {
struct xenvif *vif = netdev_priv(dev);
-   unsigned int num_queues = vif->num_queues;
+   unsigned int num_queues;
int i;
unsigned int queue_index;
 
+   rcu_read_lock();
+   num_queues = READ_ONCE(vif->num_queues);
+
for (i = 0; i < ARRAY_SIZE(xenvif_stats); i++) {
unsigned long accum = 0;
for (queue_index = 0; queue_index < num_queues; ++queue_index) {
@@ -390,6 +396,8 @@ static void xenvif_get_ethtool_stats(struct net_device *dev,
}
data[i] = accum;
}
+
+   rcu_read_unlock();
 }
 
 static void xenvif_get_strings(struct net_device *dev, u32 stringset, u8 * 
data)
diff --git a/drivers/net/xen-netback/netback.c 
b/drivers/net/xen-netback/netback.c
index f9bcf4a..602d408 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -214,7 +214,7 @@ static void xenvif_fatal_tx_err(struct xenvif *vif)
netdev_err(vif->dev, "fatal error; disabling device\n");
vif->disabled = true;
/* Disable the vif from queue 0's kthread */
-   if (vif->queues)
+   if (vif->num_queues)
xenvif_kick_thread(&vif->queues[0]);
 }
 
diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
index d2d7cd9..a56d3ea 100644
--- a/drivers/net/xen-netback/xenbus.c
+++ b/drivers/net/xen-netback/xenbus.c
@@ -495,26 +495,26 @@ static void backend_disconnect(struct backend_info *be)
struct xenvif *vif = be->vif;
 
if (vif) {
+   unsigned int num_queues = vif->num_queues;
unsigned int queue_index;
-   struct xenvif_queue *queues;
 
xen_unregister_watchers(vif);
 #ifdef CONFIG_DEBUG_FS
xenvif_debugfs_delif(vif);
 #endif /* CONFIG_DEBUG_FS */
xenvif_disconnect_data(vif);
-   for (queue_index = 0;
-queue_index < vif->num_queues;
-++queue_index)
- 

Re: [PATCH v2] net: ethernet: aquantia: set net_device mtu when mtu is changed

2017-03-10 Thread Pavel Belous



On 10.03.2017 17:47, David Arcari wrote:

Hi,

On 03/09/2017 05:43 PM, Lino Sanfilippo wrote:

Hi,

On 09.03.2017 22:03, David Arcari wrote:

When the aquantia device mtu is changed the net_device structure is not
updated.  As a result the ip command does not properly reflect the mtu change.

Commit 5513e16421cb incorrectly assumed that __dev_set_mtu() was making the
assignment ndev->mtu = new_mtu;  This is not true in the case where the driver
has a ndo_change_mtu routine.

Fixes: 5513e16421cb ("net: ethernet: aquantia: Fixes for aq_ndev_change_mtu")

v2: no longer close/open net-device after mtu change

Cc: Pavel Belous 
Signed-off-by: David Arcari 
---
 drivers/net/ethernet/aquantia/atlantic/aq_main.c | 10 ++
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_main.c 
b/drivers/net/ethernet/aquantia/atlantic/aq_main.c
index dad6362..bba5ebd 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_main.c
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_main.c
@@ -96,15 +96,9 @@ static int aq_ndev_change_mtu(struct net_device *ndev, int 
new_mtu)
struct aq_nic_s *aq_nic = netdev_priv(ndev);
int err = aq_nic_set_mtu(aq_nic, new_mtu + ETH_HLEN);

-   if (err < 0)
-   goto err_exit;
+   if (!err)
+   ndev->mtu = new_mtu;

-   if (netif_running(ndev)) {
-   aq_ndev_close(ndev);
-   aq_ndev_open(ndev);
-   }
-
-err_exit:


Removing the restart has nothing to do with the bug you want to fix here, has 
it?
I suggest to send a separate patch for this.

Regards,
Lino



I'm fine with that.  Pavel does that work for you?

It would mean that the original version of this patch should be applied and
either you or I could send the follow-up patch.

Best,

-Dave



David,

Yes, I think it is better to make this a separate patch.
I can prepare a patch for the "close/open netdev" change myself.

Probably you need to send the original version of this patch as "v3" (I'm
not sure whether it is possible to discard v2 and apply v1 instead).


Thank you,
Pavel


[PATCH] net: tun: use new api ethtool_{get|set}_link_ksettings

2017-03-10 Thread Philippe Reynes
The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.

As I don't have the hardware, I'd be very pleased if
someone may test this patch.

Signed-off-by: Philippe Reynes 
---
 drivers/net/tun.c |   24 +++-
 1 files changed, 11 insertions(+), 13 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index dc1b1dd..c418f0a 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -2430,18 +2430,16 @@ static void tun_chr_show_fdinfo(struct seq_file *m, 
struct file *f)
 
 /* ethtool interface */
 
-static int tun_get_settings(struct net_device *dev, struct ethtool_cmd *cmd)
-{
-   cmd->supported  = 0;
-   cmd->advertising= 0;
-   ethtool_cmd_speed_set(cmd, SPEED_10);
-   cmd->duplex = DUPLEX_FULL;
-   cmd->port   = PORT_TP;
-   cmd->phy_address= 0;
-   cmd->transceiver= XCVR_INTERNAL;
-   cmd->autoneg= AUTONEG_DISABLE;
-   cmd->maxtxpkt   = 0;
-   cmd->maxrxpkt   = 0;
+static int tun_get_link_ksettings(struct net_device *dev,
+ struct ethtool_link_ksettings *cmd)
+{
+   ethtool_link_ksettings_zero_link_mode(cmd, supported);
+   ethtool_link_ksettings_zero_link_mode(cmd, advertising);
+   cmd->base.speed = SPEED_10;
+   cmd->base.duplex= DUPLEX_FULL;
+   cmd->base.port  = PORT_TP;
+   cmd->base.phy_address   = 0;
+   cmd->base.autoneg   = AUTONEG_DISABLE;
return 0;
 }
 
@@ -2504,7 +2502,6 @@ static int tun_set_coalesce(struct net_device *dev,
 }
 
 static const struct ethtool_ops tun_ethtool_ops = {
-   .get_settings   = tun_get_settings,
.get_drvinfo= tun_get_drvinfo,
.get_msglevel   = tun_get_msglevel,
.set_msglevel   = tun_set_msglevel,
@@ -2512,6 +2509,7 @@ static int tun_set_coalesce(struct net_device *dev,
.get_ts_info= ethtool_op_get_ts_info,
.get_coalesce   = tun_get_coalesce,
.set_coalesce   = tun_set_coalesce,
+   .get_link_ksettings = tun_get_link_ksettings,
 };
 
 static int tun_queue_resize(struct tun_struct *tun)
-- 
1.7.4.4



[PATCH net-next v3 2/2] mpls: allow TTL propagation from IP packets to be configured

2017-03-10 Thread Robert Shearman
Allow TTL propagation from IP packets to MPLS packets to be
configured. Add a new optional LWT attribute, MPLS_IPTUNNEL_TTL, which
allows the TTL to be set in the resulting MPLS packet, with the value
of 0 having the semantics of enabling propagation of the TTL from the
IP header (i.e. non-zero values disable propagation).

Also allow the configuration to be overridden globally by reusing the
same sysctl to control whether the TTL is propagated from IP packets
into the MPLS header. If the per-LWT attribute is set then it
overrides the global configuration. If the TTL isn't propagated then a
default TTL value is used which can be configured via a new sysctl,
"net.mpls.default_ttl". This is kept separate from the configuration
of whether IP TTL propagation is enabled as it can be used in the
future when non-IP payloads are supported (i.e. where there is no
payload TTL that can be propagated).

Signed-off-by: Robert Shearman 
---
 Documentation/networking/mpls-sysctl.txt |  8 
 include/net/mpls_iptunnel.h  |  2 +
 include/net/netns/mpls.h |  1 +
 include/uapi/linux/mpls_iptunnel.h   |  2 +
 net/mpls/af_mpls.c   | 11 +
 net/mpls/mpls_iptunnel.c | 73 ++--
 6 files changed, 84 insertions(+), 13 deletions(-)

diff --git a/Documentation/networking/mpls-sysctl.txt 
b/Documentation/networking/mpls-sysctl.txt
index 9badd1d6685f..2f24a1912a48 100644
--- a/Documentation/networking/mpls-sysctl.txt
+++ b/Documentation/networking/mpls-sysctl.txt
@@ -30,6 +30,14 @@ ip_ttl_propagate - BOOL
0 - disabled / RFC 3443 [Short] Pipe Model
1 - enabled / RFC 3443 Uniform Model (default)
 
+default_ttl - INTEGER
+   Default TTL value to use for MPLS packets where it cannot be
+   propagated from an IP header, either because one isn't present
+   or ip_ttl_propagate has been disabled.
+
+   Possible values: 1 - 255
+   Default: 255
+
 conf//input - BOOL
Control whether packets can be input on this interface.
 
diff --git a/include/net/mpls_iptunnel.h b/include/net/mpls_iptunnel.h
index 179253f9dcfd..a18af6a16eb5 100644
--- a/include/net/mpls_iptunnel.h
+++ b/include/net/mpls_iptunnel.h
@@ -19,6 +19,8 @@
 struct mpls_iptunnel_encap {
u32 label[MAX_NEW_LABELS];
u8  labels;
+   u8  ttl_propagate;
+   u8  default_ttl;
 };
 
 static inline struct mpls_iptunnel_encap *mpls_lwtunnel_encap(struct 
lwtunnel_state *lwtstate)
diff --git a/include/net/netns/mpls.h b/include/net/netns/mpls.h
index 08652eedabb2..6608b3693385 100644
--- a/include/net/netns/mpls.h
+++ b/include/net/netns/mpls.h
@@ -10,6 +10,7 @@ struct ctl_table_header;
 
 struct netns_mpls {
int ip_ttl_propagate;
+   int default_ttl;
size_t platform_labels;
struct mpls_route __rcu * __rcu *platform_label;
 
diff --git a/include/uapi/linux/mpls_iptunnel.h 
b/include/uapi/linux/mpls_iptunnel.h
index d80a0498f77e..f5e45095b0bb 100644
--- a/include/uapi/linux/mpls_iptunnel.h
+++ b/include/uapi/linux/mpls_iptunnel.h
@@ -16,11 +16,13 @@
 /* MPLS tunnel attributes
  * [RTA_ENCAP] = {
  * [MPLS_IPTUNNEL_DST]
+ * [MPLS_IPTUNNEL_TTL]
  * }
  */
 enum {
MPLS_IPTUNNEL_UNSPEC,
MPLS_IPTUNNEL_DST,
+   MPLS_IPTUNNEL_TTL,
__MPLS_IPTUNNEL_MAX,
 };
 #define MPLS_IPTUNNEL_MAX (__MPLS_IPTUNNEL_MAX - 1)
diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index 0e1046f21af8..0c5d111abe36 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -34,6 +34,7 @@
 static int zero = 0;
 static int one = 1;
 static int label_limit = (1 << 20) - 1;
+static int ttl_max = 255;
 
 static void rtmsg_lfib(int event, u32 label, struct mpls_route *rt,
   struct nlmsghdr *nlh, struct net *net, u32 portid,
@@ -2042,6 +2043,15 @@ static const struct ctl_table mpls_table[] = {
.extra1 = &zero,
.extra2 = &one,
},
+   {
+   .procname   = "default_ttl",
+   .data   = MPLS_NS_SYSCTL_OFFSET(mpls.default_ttl),
+   .maxlen = sizeof(int),
+   .mode   = 0644,
+   .proc_handler   = proc_dointvec_minmax,
+   .extra1 = &one,
+   .extra2 = &ttl_max,
+   },
{ }
 };
 
@@ -2053,6 +2063,7 @@ static int mpls_net_init(struct net *net)
net->mpls.platform_labels = 0;
net->mpls.platform_label = NULL;
net->mpls.ip_ttl_propagate = 1;
+   net->mpls.default_ttl = 255;
 
table = kmemdup(mpls_table, sizeof(mpls_table), GFP_KERNEL);
if (table == NULL)
diff --git a/net/mpls/mpls_iptunnel.c b/net/mpls/mpls_iptunnel.c
index e4e4424f9eb1..22f71fce0bfb 100644
--- a/net/mpls/mpls_iptunnel.c
+++ b/net/mpls/mpls_iptunnel.c
@@ -29,6 +29,7 @@
 
 static const struct nla_policy mpls_iptunnel_policy[MPLS_IPTUNNEL_MAX + 1] = {

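Assuming an iproute2 build that understands the new MPLS_IPTUNNEL_TTL attribute (the `ttl` keyword below is illustrative, not confirmed syntax) and root privileges, the per-LWT and global knobs described above might be exercised like this:

```shell
# Per-route: impose label 200 with a fixed TTL of 64
# (a value of 0 would instead mean "propagate from the IP header").
ip route add 192.0.2.0/24 encap mpls 200 ttl 64 via 10.1.1.2

# Global fallback used when propagation is disabled or no IP header exists.
sysctl -w net.mpls.default_ttl=64
```

Per the patch, the per-LWT attribute, when present, overrides the sysctl.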
[PATCH net-next v3 1/2] mpls: allow TTL propagation to IP packets to be configured

2017-03-10 Thread Robert Shearman
Provide the ability to control on a per-route basis whether the TTL
value from an MPLS packet is propagated to an IPv4/IPv6 packet when
the last label is popped as per the theoretical model in RFC 3443
through a new route attribute, RTA_TTL_PROPAGATE which can be 0 to
mean disable propagation and 1 to mean enable propagation.

In order to provide the ability to change the behaviour for packets
arriving with IPv4/IPv6 Explicit Null labels and to provide an easy
way for a user to change the behaviour for all existing routes without
having to reprogram them, a global knob is provided. This is done
through the addition of a new per-namespace sysctl,
"net.mpls.ip_ttl_propagate", which defaults to enabled. If the
per-route attribute is set (either enabled or disabled) then it
overrides the global configuration.

Signed-off-by: Robert Shearman 
---
 Documentation/networking/mpls-sysctl.txt | 11 
 include/net/netns/mpls.h |  2 +
 include/uapi/linux/rtnetlink.h   |  1 +
 net/mpls/af_mpls.c   | 87 +---
 net/mpls/internal.h  |  7 +++
 5 files changed, 100 insertions(+), 8 deletions(-)

diff --git a/Documentation/networking/mpls-sysctl.txt 
b/Documentation/networking/mpls-sysctl.txt
index 15d8d16934fd..9badd1d6685f 100644
--- a/Documentation/networking/mpls-sysctl.txt
+++ b/Documentation/networking/mpls-sysctl.txt
@@ -19,6 +19,17 @@ platform_labels - INTEGER
Possible values: 0 - 1048575
Default: 0
 
+ip_ttl_propagate - BOOL
+   Control whether TTL is propagated from the IPv4/IPv6 header to
+   the MPLS header on imposing labels and propagated from the
+   MPLS header to the IPv4/IPv6 header on popping the last label.
+
+   If disabled, the MPLS transport network will appear as a
+   single hop to transit traffic.
+
+   0 - disabled / RFC 3443 [Short] Pipe Model
+   1 - enabled / RFC 3443 Uniform Model (default)
+
 conf//input - BOOL
Control whether packets can be input on this interface.
 
diff --git a/include/net/netns/mpls.h b/include/net/netns/mpls.h
index d29203651c01..08652eedabb2 100644
--- a/include/net/netns/mpls.h
+++ b/include/net/netns/mpls.h
@@ -9,8 +9,10 @@ struct mpls_route;
 struct ctl_table_header;
 
 struct netns_mpls {
+   int ip_ttl_propagate;
size_t platform_labels;
struct mpls_route __rcu * __rcu *platform_label;
+
struct ctl_table_header *ctl;
 };
 
diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index 6546917d605a..30fb25e851db 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -319,6 +319,7 @@ enum rtattr_type_t {
RTA_EXPIRES,
RTA_PAD,
RTA_UID,
+   RTA_TTL_PROPAGATE,
__RTA_MAX
 };
 
diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index 3818686182b2..0e1046f21af8 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -32,6 +32,7 @@
 #define MPLS_NEIGH_TABLE_UNSPEC (NEIGH_LINK_TABLE + 1)
 
 static int zero = 0;
+static int one = 1;
 static int label_limit = (1 << 20) - 1;
 
 static void rtmsg_lfib(int event, u32 label, struct mpls_route *rt,
@@ -220,8 +221,8 @@ static struct mpls_nh *mpls_select_multipath(struct 
mpls_route *rt,
return &rt->rt_nh[nh_index];
 }
 
-static bool mpls_egress(struct mpls_route *rt, struct sk_buff *skb,
-   struct mpls_entry_decoded dec)
+static bool mpls_egress(struct net *net, struct mpls_route *rt,
+   struct sk_buff *skb, struct mpls_entry_decoded dec)
 {
enum mpls_payload_type payload_type;
bool success = false;
@@ -246,22 +247,46 @@ static bool mpls_egress(struct mpls_route *rt, struct 
sk_buff *skb,
switch (payload_type) {
case MPT_IPV4: {
struct iphdr *hdr4 = ip_hdr(skb);
+   u8 new_ttl;
skb->protocol = htons(ETH_P_IP);
+
+   /* If propagating TTL, take the decremented TTL from
+* the incoming MPLS header, otherwise decrement the
+* TTL, but only if not 0 to avoid underflow.
+*/
+   if (rt->rt_ttl_propagate == MPLS_TTL_PROP_ENABLED ||
+   (rt->rt_ttl_propagate == MPLS_TTL_PROP_DEFAULT &&
+net->mpls.ip_ttl_propagate))
+   new_ttl = dec.ttl;
+   else
+   new_ttl = hdr4->ttl ? hdr4->ttl - 1 : 0;
+
csum_replace2(>check,
  htons(hdr4->ttl << 8),
- htons(dec.ttl << 8));
-   hdr4->ttl = dec.ttl;
+ htons(new_ttl << 8));
+   hdr4->ttl = new_ttl;
success = true;
break;
}
case MPT_IPV6: {
struct ipv6hdr *hdr6 = ipv6_hdr(skb);
skb->protocol = htons(ETH_P_IPV6);
-   hdr6->hop_limit = 

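The csum_replace2() call in the hunk above is the kernel's incremental IPv4 header checksum update. A standalone Python sketch of the underlying RFC 1624 arithmetic (helper names mirror the kernel's, but the implementation is mine):

```python
def csum16_fold(x):
    # Fold carries back into the low 16 bits (ones'-complement addition).
    while x > 0xffff:
        x = (x & 0xffff) + (x >> 16)
    return x

def csum_replace2(check, old, new):
    # RFC 1624, eqn. 3: HC' = ~(~HC + ~m + m'), all 16-bit ones'-complement.
    # 'check' is the current header checksum; 'old'/'new' are the 16-bit
    # word being replaced -- here htons(ttl << 8), as in mpls_egress().
    s = (~check & 0xffff) + (~old & 0xffff) + (new & 0xffff)
    return ~csum16_fold(s) & 0xffff
```

Decrementing the TTL from 0x40 to 0x3f changes the TTL/protocol word from 0x4006 to 0x3f06; the incremental update must agree with a full recomputation of the header sum.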
[PATCH net-next v3 0/2] mpls: allow TTL propagation to/from IP packets to be configured

2017-03-10 Thread Robert Shearman
It is sometimes desirable to present an MPLS transport network as a
single hop to traffic transiting it because it prevents confusion when
diagnosing failures. An example of where confusion can be generated is
when addresses used in the provider network overlap with addresses in
the overlay network and the addresses get exposed through ICMP errors
generated as packets transit the provider network.

In addition, RFC 3443 defines two methods of deriving the TTL for an
outgoing packet: the Uniform Model, where the TTL is propagated to/from
the MPLS header, and the Pipe and Short Pipe Models (with and without
PHP), where the TTL is not propagated to/from the MPLS header.

Changes in v3:
 - decrement ttl on popping last label when not doing ttl propagation,
   as suggested by David Ahern.
 - add comment to describe what the somewhat complex conditionals are
   doing to work out what ttl to use in mpls_iptunnel.c.
 - rearrange fields in struct netns_mpls to keep the platform
   label fields together, as suggested by David Ahern.

Changes in v2:
 - add references to RFC 3443 as suggested by David Ahern
 - fix setting of skb->protocol as noticed by David Ahern
 - implement per-route/per-LWT configurability as suggested by Eric
   Biederman
 - split into two patches for ease of review

Robert Shearman (2):
  mpls: allow TTL propagation to IP packets to be configured
  mpls: allow TTL propagation from IP packets to be configured

 Documentation/networking/mpls-sysctl.txt | 19 +++
 include/net/mpls_iptunnel.h  |  2 +
 include/net/netns/mpls.h |  3 +
 include/uapi/linux/mpls_iptunnel.h   |  2 +
 include/uapi/linux/rtnetlink.h   |  1 +
 net/mpls/af_mpls.c   | 98 +---
 net/mpls/internal.h  |  7 +++
 net/mpls/mpls_iptunnel.c | 73 +++-
 8 files changed, 184 insertions(+), 21 deletions(-)

-- 
2.1.4

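The egress behaviour the series implements can be summarized in a few lines. This is a simplified model of the mpls_egress() logic, not the kernel code itself; `propagate` stands for rt_ttl_propagate already resolved against the net.mpls.ip_ttl_propagate sysctl:

```python
def egress_ip_ttl(mpls_ttl, ip_ttl, propagate):
    # Uniform Model: the payload inherits the (already decremented)
    # TTL from the popped MPLS header.
    if propagate:
        return mpls_ttl
    # Pipe / Short Pipe Model: the MPLS hops are invisible; decrement
    # the payload TTL by one, guarding against underflow at 0.
    return ip_ttl - 1 if ip_ttl else 0
```

With propagation enabled the IP TTL reflects every LSR crossed; with it disabled the transit network costs the packet exactly one hop.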


Re: [PATCH v2] selinux: check for address length in selinux_socket_bind()

2017-03-10 Thread Paul Moore
On Fri, Mar 10, 2017 at 7:01 AM, Paul Moore  wrote:
> On Thu, Mar 9, 2017 at 2:12 AM, David Miller  wrote:
>> From: Alexander Potapenko 
>> Date: Mon,  6 Mar 2017 19:46:14 +0100
>>
>>> KMSAN (KernelMemorySanitizer, a new error detection tool) reports use of
>>> uninitialized memory in selinux_socket_bind():
>>  ...
>>> (the line numbers are relative to 4.8-rc6, but the bug persists upstream)
>>>
>>> , when I run the following program as root:
>>  ...
>>> (for different values of |size| other error reports are printed).
>>>
>>> This happens because bind() unconditionally copies |size| bytes of
>>> |addr| to the kernel, leaving the rest uninitialized. Then
>>> security_socket_bind() reads the IP address bytes, including the
>>> uninitialized ones, to determine the port, or e.g. pass them further to
>>> sel_netnode_find(), which uses them to calculate a hash.
>>>
>>> Signed-off-by: Alexander Potapenko 
>>
>> Are the SELINUX folks going to pick this up or should I?
>
> Yes, it's on my list of things to merge, I was just a bit distracted
> this week with yet another audit problem.  I'm going to start making
> my way through the patch backlog today.

Just merged into selinux/next, thanks.  My apologies for the delay.

-- 
paul moore
www.paul-moore.com


Re: net/sctp: recursive locking in sctp_do_peeloff

2017-03-10 Thread Dmitry Vyukov
On Fri, Mar 10, 2017 at 8:46 PM, Marcelo Ricardo Leitner
 wrote:
> On Fri, Mar 10, 2017 at 4:11 PM, Dmitry Vyukov  wrote:
>> Hello,
>>
>> I've got the following recursive locking report while running
>> syzkaller fuzzer on net-next/9c28286b1b4b9bce6e35dd4c8a1265f03802a89a:
>>
>> [ INFO: possible recursive locking detected ]
>> 4.10.0+ #14 Not tainted
>> -
>> syz-executor3/5560 is trying to acquire lock:
>>  (sk_lock-AF_INET6){+.+.+.}, at: [] lock_sock
>> include/net/sock.h:1460 [inline]
>>  (sk_lock-AF_INET6){+.+.+.}, at: []
>> sctp_close+0xcd/0x9d0 net/sctp/socket.c:1497
>>
>> but task is already holding lock:
>>  (sk_lock-AF_INET6){+.+.+.}, at: [] lock_sock
>> include/net/sock.h:1460 [inline]
>>  (sk_lock-AF_INET6){+.+.+.}, at: []
>> sctp_getsockopt+0x450/0x67e0 net/sctp/socket.c:6611
>>
>> other info that might help us debug this:
>>  Possible unsafe locking scenario:
>>
>>CPU0
>>
>>   lock(sk_lock-AF_INET6);
>>   lock(sk_lock-AF_INET6);
>>
>>  *** DEADLOCK ***
>>
>>  May be due to missing lock nesting notation
>
> Pretty much the case, I suppose. The lock held by sctp_getsockopt() is
> on one socket, while the other lock that sctp_close() is getting later
> is on the newly created (which failed) socket during peeloff
> operation.


Does this mean we should never lock two sockets at a time, except for
this case? If so, it probably suggests that this case should not do it
either.


> I don't know how to fix this nesting notation in this situation, but
> any idea why sock_create failed? Seems security_socket_post_create()
> failed in there, so sock_release was called with sock->ops still
> valid.

No idea. The fuzzer frequently creates low memory conditions, but
there are no alloc failures messages in the log (maybe some allocation
used NOWARN?).


Re: net/sctp: recursive locking in sctp_do_peeloff

2017-03-10 Thread Marcelo Ricardo Leitner
On Fri, Mar 10, 2017 at 4:11 PM, Dmitry Vyukov  wrote:
> Hello,
>
> I've got the following recursive locking report while running
> syzkaller fuzzer on net-next/9c28286b1b4b9bce6e35dd4c8a1265f03802a89a:
>
> [ INFO: possible recursive locking detected ]
> 4.10.0+ #14 Not tainted
> -
> syz-executor3/5560 is trying to acquire lock:
>  (sk_lock-AF_INET6){+.+.+.}, at: [] lock_sock
> include/net/sock.h:1460 [inline]
>  (sk_lock-AF_INET6){+.+.+.}, at: []
> sctp_close+0xcd/0x9d0 net/sctp/socket.c:1497
>
> but task is already holding lock:
>  (sk_lock-AF_INET6){+.+.+.}, at: [] lock_sock
> include/net/sock.h:1460 [inline]
>  (sk_lock-AF_INET6){+.+.+.}, at: []
> sctp_getsockopt+0x450/0x67e0 net/sctp/socket.c:6611
>
> other info that might help us debug this:
>  Possible unsafe locking scenario:
>
>CPU0
>
>   lock(sk_lock-AF_INET6);
>   lock(sk_lock-AF_INET6);
>
>  *** DEADLOCK ***
>
>  May be due to missing lock nesting notation

Pretty much the case, I suppose. The lock held by sctp_getsockopt() is
on one socket, while the other lock that sctp_close() is getting later
is on the newly created (which failed) socket during peeloff
operation.

I don't know how to fix this nesting notation in this situation, but
any idea why sock_create failed? Seems security_socket_post_create()
failed in there, so sock_release was called with sock->ops still
valid.

>
> 1 lock held by syz-executor3/5560:
>  #0:  (sk_lock-AF_INET6){+.+.+.}, at: [] lock_sock
> include/net/sock.h:1460 [inline]
>  #0:  (sk_lock-AF_INET6){+.+.+.}, at: []
> sctp_getsockopt+0x450/0x67e0 net/sctp/socket.c:6611
>
> stack backtrace:
> CPU: 0 PID: 5560 Comm: syz-executor3 Not tainted 4.10.0+ #14
> Hardware name: Google Google Compute Engine/Google Compute Engine,
> BIOS Google 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:16 [inline]
>  dump_stack+0x2ee/0x3ef lib/dump_stack.c:52
>  print_deadlock_bug kernel/locking/lockdep.c:1729 [inline]
>  check_deadlock kernel/locking/lockdep.c:1773 [inline]
>  validate_chain kernel/locking/lockdep.c:2251 [inline]
>  __lock_acquire+0xef2/0x3430 kernel/locking/lockdep.c:3340
>  lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3755
>  lock_sock_nested+0xcb/0x120 net/core/sock.c:2536
>  lock_sock include/net/sock.h:1460 [inline]
>  sctp_close+0xcd/0x9d0 net/sctp/socket.c:1497
>  inet_release+0xed/0x1c0 net/ipv4/af_inet.c:425
>  inet6_release+0x50/0x70 net/ipv6/af_inet6.c:432
>  sock_release+0x8d/0x1e0 net/socket.c:597
>  __sock_create+0x38b/0x870 net/socket.c:1226
>  sock_create+0x7f/0xa0 net/socket.c:1237
>  sctp_do_peeloff+0x1a2/0x440 net/sctp/socket.c:4879
>  sctp_getsockopt_peeloff net/sctp/socket.c:4914 [inline]
>  sctp_getsockopt+0x111a/0x67e0 net/sctp/socket.c:6628
>  sock_common_getsockopt+0x95/0xd0 net/core/sock.c:2690
>  SYSC_getsockopt net/socket.c:1817 [inline]
>  SyS_getsockopt+0x240/0x380 net/socket.c:1799
>  entry_SYSCALL_64_fastpath+0x1f/0xc2
> RIP: 0033:0x44fb79
> RSP: 002b:7f35f232bb58 EFLAGS: 0212 ORIG_RAX: 0037
> RAX: ffda RBX: 0084 RCX: 0044fb79
> RDX: 0066 RSI: 0084 RDI: 0006
> RBP: 0006 R08: 20119000 R09: 
> R10: 2058dff8 R11: 0212 R12: 00708000
> R13: 0103 R14: 0001 R15: 

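The lockdep report in this thread flags two acquisitions of the same lock *class* on one call path; for two distinct sockets that is a false positive fixable with nesting annotations, but if it really were the same lock instance it would deadlock. A toy of the genuinely fatal case in Python, unrelated to SCTP internals, using a timeout so the failed re-acquisition is observable instead of hanging:

```python
import threading

lock = threading.Lock()  # non-reentrant, like lock_sock() on one socket

lock.acquire()           # outer path, e.g. sctp_getsockopt()

# Inner path (e.g. sctp_close() on the failed peeled-off socket) tries
# to take the same lock again in the same thread: it can never succeed.
reacquired = lock.acquire(timeout=0.1)
print("reacquired:", reacquired)
```

With two separate Lock objects the second acquire would succeed, which is why the kernel case needs nesting notation rather than a restructure.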

net/sctp: recursive locking in sctp_do_peeloff

2017-03-10 Thread Dmitry Vyukov
Hello,

I've got the following recursive locking report while running
syzkaller fuzzer on net-next/9c28286b1b4b9bce6e35dd4c8a1265f03802a89a:

[ INFO: possible recursive locking detected ]
4.10.0+ #14 Not tainted
-
syz-executor3/5560 is trying to acquire lock:
 (sk_lock-AF_INET6){+.+.+.}, at: [] lock_sock
include/net/sock.h:1460 [inline]
 (sk_lock-AF_INET6){+.+.+.}, at: []
sctp_close+0xcd/0x9d0 net/sctp/socket.c:1497

but task is already holding lock:
 (sk_lock-AF_INET6){+.+.+.}, at: [] lock_sock
include/net/sock.h:1460 [inline]
 (sk_lock-AF_INET6){+.+.+.}, at: []
sctp_getsockopt+0x450/0x67e0 net/sctp/socket.c:6611

other info that might help us debug this:
 Possible unsafe locking scenario:

   CPU0
   
  lock(sk_lock-AF_INET6);
  lock(sk_lock-AF_INET6);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

1 lock held by syz-executor3/5560:
 #0:  (sk_lock-AF_INET6){+.+.+.}, at: [] lock_sock
include/net/sock.h:1460 [inline]
 #0:  (sk_lock-AF_INET6){+.+.+.}, at: []
sctp_getsockopt+0x450/0x67e0 net/sctp/socket.c:6611

stack backtrace:
CPU: 0 PID: 5560 Comm: syz-executor3 Not tainted 4.10.0+ #14
Hardware name: Google Google Compute Engine/Google Compute Engine,
BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:16 [inline]
 dump_stack+0x2ee/0x3ef lib/dump_stack.c:52
 print_deadlock_bug kernel/locking/lockdep.c:1729 [inline]
 check_deadlock kernel/locking/lockdep.c:1773 [inline]
 validate_chain kernel/locking/lockdep.c:2251 [inline]
 __lock_acquire+0xef2/0x3430 kernel/locking/lockdep.c:3340
 lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3755
 lock_sock_nested+0xcb/0x120 net/core/sock.c:2536
 lock_sock include/net/sock.h:1460 [inline]
 sctp_close+0xcd/0x9d0 net/sctp/socket.c:1497
 inet_release+0xed/0x1c0 net/ipv4/af_inet.c:425
 inet6_release+0x50/0x70 net/ipv6/af_inet6.c:432
 sock_release+0x8d/0x1e0 net/socket.c:597
 __sock_create+0x38b/0x870 net/socket.c:1226
 sock_create+0x7f/0xa0 net/socket.c:1237
 sctp_do_peeloff+0x1a2/0x440 net/sctp/socket.c:4879
 sctp_getsockopt_peeloff net/sctp/socket.c:4914 [inline]
 sctp_getsockopt+0x111a/0x67e0 net/sctp/socket.c:6628
 sock_common_getsockopt+0x95/0xd0 net/core/sock.c:2690
 SYSC_getsockopt net/socket.c:1817 [inline]
 SyS_getsockopt+0x240/0x380 net/socket.c:1799
 entry_SYSCALL_64_fastpath+0x1f/0xc2
RIP: 0033:0x44fb79
RSP: 002b:7f35f232bb58 EFLAGS: 0212 ORIG_RAX: 0037
RAX: ffda RBX: 0084 RCX: 0044fb79
RDX: 0066 RSI: 0084 RDI: 0006
RBP: 0006 R08: 20119000 R09: 
R10: 2058dff8 R11: 0212 R12: 00708000
R13: 0103 R14: 0001 R15: 


[PATCH net-next 09/13] nfp: store dma direction in data path structure

2017-03-10 Thread Jakub Kicinski
Instead of testing whether xdp_prog is present, store the DMA direction
in the data path structure.

Signed-off-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/nfp_net.h   | 11 --
 .../net/ethernet/netronome/nfp/nfp_net_common.c| 45 --
 2 files changed, 24 insertions(+), 32 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net.h 
b/drivers/net/ethernet/netronome/nfp/nfp_net.h
index 5a92f6e41dae..db92463da440 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net.h
@@ -438,6 +438,7 @@ struct nfp_stat_pair {
  * @bpf_offload_skip_sw:  Offloaded BPF program will not be rerun by cls_bpf
  * @bpf_offload_xdp:   Offloaded BPF program is XDP
 * @chained_metadata_format:  Firmware will use new metadata format
+ * @rx_dma_dir:Mapping direction for RX buffers
  * @ctrl:  Local copy of the control register/word.
  * @fl_bufsz:  Currently configured size of the freelist buffers
  * @rx_offset: Offset in the RX buffers where packet data starts
@@ -458,10 +459,12 @@ struct nfp_net_dp {
struct device *dev;
struct net_device *netdev;
 
-   unsigned is_vf:1;
-   unsigned bpf_offload_skip_sw:1;
-   unsigned bpf_offload_xdp:1;
-   unsigned chained_metadata_format:1;
+   u8 is_vf:1;
+   u8 bpf_offload_skip_sw:1;
+   u8 bpf_offload_xdp:1;
+   u8 chained_metadata_format:1;
+
+   u8 rx_dma_dir;
 
u32 ctrl;
u32 fl_bufsz;
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index a9359da64f80..ab03f2f301cd 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -85,20 +85,18 @@ void nfp_net_get_fw_version(struct nfp_net_fw_version 
*fw_ver,
put_unaligned_le32(reg, fw_ver);
 }
 
-static dma_addr_t
-nfp_net_dma_map_rx(struct nfp_net_dp *dp, void *frag, int direction)
+static dma_addr_t nfp_net_dma_map_rx(struct nfp_net_dp *dp, void *frag)
 {
return dma_map_single(dp->dev, frag + NFP_NET_RX_BUF_HEADROOM,
  dp->fl_bufsz - NFP_NET_RX_BUF_NON_DATA,
- direction);
+ dp->rx_dma_dir);
 }
 
-static void
-nfp_net_dma_unmap_rx(struct nfp_net_dp *dp, dma_addr_t dma_addr,
-int direction)
+static void nfp_net_dma_unmap_rx(struct nfp_net_dp *dp, dma_addr_t dma_addr)
 {
dma_unmap_single(dp->dev, dma_addr,
-dp->fl_bufsz - NFP_NET_RX_BUF_NON_DATA, direction);
+dp->fl_bufsz - NFP_NET_RX_BUF_NON_DATA,
+dp->rx_dma_dir);
 }
 
 /* Firmware reconfig
@@ -991,8 +989,7 @@ static void nfp_net_xdp_complete(struct nfp_net_tx_ring 
*tx_ring)
if (!tx_ring->txbufs[idx].frag)
continue;
 
-   nfp_net_dma_unmap_rx(dp, tx_ring->txbufs[idx].dma_addr,
-DMA_BIDIRECTIONAL);
+   nfp_net_dma_unmap_rx(dp, tx_ring->txbufs[idx].dma_addr);
__free_page(virt_to_page(tx_ring->txbufs[idx].frag));
 
done_pkts++;
@@ -1037,8 +1034,7 @@ nfp_net_tx_ring_reset(struct nfp_net_dp *dp, struct 
nfp_net_tx_ring *tx_ring)
tx_buf = &tx_ring->txbufs[idx];
 
if (tx_ring == r_vec->xdp_ring) {
-   nfp_net_dma_unmap_rx(dp, tx_buf->dma_addr,
-DMA_BIDIRECTIONAL);
+   nfp_net_dma_unmap_rx(dp, tx_buf->dma_addr);
__free_page(virt_to_page(tx_ring->txbufs[idx].frag));
} else {
struct sk_buff *skb = tx_ring->txbufs[idx].skb;
@@ -1139,7 +1135,6 @@ static void *
 nfp_net_rx_alloc_one(struct nfp_net_dp *dp, struct nfp_net_rx_ring *rx_ring,
 dma_addr_t *dma_addr)
 {
-   int direction;
void *frag;
 
if (!dp->xdp_prog)
@@ -1151,9 +1146,7 @@ nfp_net_rx_alloc_one(struct nfp_net_dp *dp, struct 
nfp_net_rx_ring *rx_ring,
return NULL;
}
 
-   direction = dp->xdp_prog ? DMA_BIDIRECTIONAL : DMA_FROM_DEVICE;
-
-   *dma_addr = nfp_net_dma_map_rx(dp, frag, direction);
+   *dma_addr = nfp_net_dma_map_rx(dp, frag);
if (dma_mapping_error(dp->dev, *dma_addr)) {
nfp_net_free_frag(frag, dp->xdp_prog);
nn_dp_warn(dp, "Failed to map DMA RX buffer\n");
@@ -1163,9 +1156,7 @@ nfp_net_rx_alloc_one(struct nfp_net_dp *dp, struct 
nfp_net_rx_ring *rx_ring,
return frag;
 }
 
-static void *
-nfp_net_napi_alloc_one(struct nfp_net_dp *dp, int direction,
-  dma_addr_t *dma_addr)
+static void *nfp_net_napi_alloc_one(struct nfp_net_dp *dp, dma_addr_t 
*dma_addr)
 {
void *frag;
 
@@ -1178,7 +1169,7 @@ 

[PATCH net-next 11/13] nfp: reorganize pkt_off variable

2017-03-10 Thread Jakub Kicinski
Rename the pkt_off variable to dma_off; it should hold the data offset
counted from the beginning of the DMA mapping.  Compute the value only
in the XDP context.

Signed-off-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/nfp_net_common.c | 19 ++-
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 513f55dd746b..0e4fa6802733 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -1453,7 +1453,7 @@ nfp_net_rx_drop(struct nfp_net_r_vector *r_vec, struct 
nfp_net_rx_ring *rx_ring,
 static bool
 nfp_net_tx_xdp_buf(struct nfp_net_dp *dp, struct nfp_net_rx_ring *rx_ring,
   struct nfp_net_tx_ring *tx_ring,
-  struct nfp_net_rx_buf *rxbuf, unsigned int pkt_off,
+  struct nfp_net_rx_buf *rxbuf, unsigned int dma_off,
   unsigned int pkt_len)
 {
struct nfp_net_tx_buf *txbuf;
@@ -1484,14 +1484,14 @@ nfp_net_tx_xdp_buf(struct nfp_net_dp *dp, struct 
nfp_net_rx_ring *rx_ring,
txbuf->pkt_cnt = 1;
txbuf->real_len = pkt_len;
 
-   dma_sync_single_for_device(dp->dev, rxbuf->dma_addr + pkt_off,
+   dma_sync_single_for_device(dp->dev, rxbuf->dma_addr + dma_off,
   pkt_len, DMA_BIDIRECTIONAL);
 
/* Build TX descriptor */
txd = &tx_ring->txds[wr_idx];
txd->offset_eop = PCIE_DESC_TX_EOP;
txd->dma_len = cpu_to_le16(pkt_len);
-   nfp_desc_set_dma_addr(txd, rxbuf->dma_addr + pkt_off);
+   nfp_desc_set_dma_addr(txd, rxbuf->dma_addr + dma_off);
txd->data_len = cpu_to_le16(pkt_len);
 
txd->flags = 0;
@@ -1541,7 +1541,7 @@ static int nfp_net_rx(struct nfp_net_rx_ring *rx_ring, 
int budget)
tx_ring = r_vec->xdp_ring;
 
while (pkts_polled < budget) {
-   unsigned int meta_len, data_len, data_off, pkt_len, pkt_off;
+   unsigned int meta_len, data_len, data_off, pkt_len;
struct nfp_net_rx_buf *rxbuf;
struct nfp_net_rx_desc *rxd;
dma_addr_t new_dma_addr;
@@ -1579,10 +1579,9 @@ static int nfp_net_rx(struct nfp_net_rx_ring *rx_ring, 
int budget)
pkt_len = data_len - meta_len;
 
if (dp->rx_offset == NFP_NET_CFG_RX_OFFSET_DYNAMIC)
-   pkt_off = meta_len;
+   data_off = NFP_NET_RX_BUF_HEADROOM + meta_len;
else
-   pkt_off = dp->rx_offset;
-   data_off = NFP_NET_RX_BUF_HEADROOM + pkt_off;
+   data_off = NFP_NET_RX_BUF_HEADROOM + dp->rx_offset;
 
/* Stats update */
u64_stats_update_begin(&r_vec->rx_sync);
@@ -1592,10 +1591,12 @@ static int nfp_net_rx(struct nfp_net_rx_ring *rx_ring, 
int budget)
 
if (xdp_prog && !(rxd->rxd.flags & PCIE_DESC_RX_BPF &&
  dp->bpf_offload_xdp)) {
+   unsigned int dma_off;
int act;
 
+   dma_off = data_off - NFP_NET_RX_BUF_HEADROOM;
dma_sync_single_for_cpu(dp->dev,
-   rxbuf->dma_addr + pkt_off,
+   rxbuf->dma_addr + dma_off,
pkt_len, DMA_BIDIRECTIONAL);
act = nfp_net_run_xdp(xdp_prog, rxbuf->frag + data_off,
  pkt_len);
@@ -1605,7 +1606,7 @@ static int nfp_net_rx(struct nfp_net_rx_ring *rx_ring, 
int budget)
case XDP_TX:
if (unlikely(!nfp_net_tx_xdp_buf(dp, rx_ring,
 tx_ring, rxbuf,
-pkt_off,
+dma_off,
 pkt_len)))
trace_xdp_exception(dp->netdev,
xdp_prog, act);
-- 
2.11.0

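The offset juggling in this patch is easier to see outside the diff. A small model follows; NFP_NET_RX_BUF_HEADROOM's value is an assumption for illustration only, since only the relationships between the offsets matter:

```python
NFP_NET_RX_BUF_HEADROOM = 64  # illustrative value, not taken from the driver

def rx_offsets(meta_len, rx_offset, dynamic):
    # data_off: where packet data starts, measured from the buffer start.
    if dynamic:  # NFP_NET_CFG_RX_OFFSET_DYNAMIC case
        data_off = NFP_NET_RX_BUF_HEADROOM + meta_len
    else:
        data_off = NFP_NET_RX_BUF_HEADROOM + rx_offset
    # dma_off: the same point measured from the start of the DMA mapping,
    # which begins right after the headroom (see nfp_net_dma_map_rx()).
    dma_off = data_off - NFP_NET_RX_BUF_HEADROOM
    return data_off, dma_off
```

The renamed dma_off is exactly data_off minus the headroom, which is why it is only computed in the XDP path that needs to sync the DMA mapping.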


[PATCH net-next 07/13] nfp: use dp to carry xdp_prog at reconfig time

2017-03-10 Thread Jakub Kicinski
Use the xdp_prog member of the data path struct to carry the xdp_prog
to the alloc/free functions.

Signed-off-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/nfp_net.h   |  1 -
 .../net/ethernet/netronome/nfp/nfp_net_common.c| 82 +-
 .../net/ethernet/netronome/nfp/nfp_net_ethtool.c   |  6 +-
 3 files changed, 37 insertions(+), 52 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net.h 
b/drivers/net/ethernet/netronome/nfp/nfp_net.h
index 84774c281b61..19dacc3f1269 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net.h
@@ -815,7 +815,6 @@ nfp_net_irqs_assign(struct nfp_net *nn, struct msix_entry 
*irq_entries,
 struct nfp_net_dp *nfp_net_clone_dp(struct nfp_net *nn);
 int
 nfp_net_ring_reconfig(struct nfp_net *nn, struct nfp_net_dp *new,
- struct bpf_prog **xdp_prog,
  struct nfp_net_ring_set *rx, struct nfp_net_ring_set *tx);
 
 #ifdef CONFIG_NFP_DEBUG
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 862e86cb5688..6ab824a48d1d 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -1130,21 +1130,19 @@ nfp_net_free_frag(void *frag, bool xdp)
  * @dp:NFP Net data path struct
  * @rx_ring:   RX ring structure of the skb
  * @dma_addr:  Pointer to storage for DMA address (output param)
- * @xdp:   Whether XDP is enabled
  *
 * This function will allocate a new page frag, map it for DMA.
  *
  * Return: allocated page frag or NULL on failure.
  */
 static void *
-nfp_net_rx_alloc_one(struct nfp_net_dp *dp,
-struct nfp_net_rx_ring *rx_ring, dma_addr_t *dma_addr,
-bool xdp)
+nfp_net_rx_alloc_one(struct nfp_net_dp *dp, struct nfp_net_rx_ring *rx_ring,
+dma_addr_t *dma_addr)
 {
int direction;
void *frag;
 
-   if (!xdp)
+   if (!dp->xdp_prog)
frag = netdev_alloc_frag(dp->fl_bufsz);
else
frag = page_address(alloc_page(GFP_KERNEL | __GFP_COLD));
@@ -1153,11 +1151,11 @@ nfp_net_rx_alloc_one(struct nfp_net_dp *dp,
return NULL;
}
 
-   direction = xdp ? DMA_BIDIRECTIONAL : DMA_FROM_DEVICE;
+   direction = dp->xdp_prog ? DMA_BIDIRECTIONAL : DMA_FROM_DEVICE;
 
*dma_addr = nfp_net_dma_map_rx(dp, frag, direction);
if (dma_mapping_error(dp->dev, *dma_addr)) {
-   nfp_net_free_frag(frag, xdp);
+   nfp_net_free_frag(frag, dp->xdp_prog);
nn_dp_warn(dp, "Failed to map DMA RX buffer\n");
return NULL;
}
@@ -1253,7 +1251,6 @@ static void nfp_net_rx_ring_reset(struct nfp_net_rx_ring 
*rx_ring)
  * nfp_net_rx_ring_bufs_free() - Free any buffers currently on the RX ring
  * @dp:NFP Net data path struct
  * @rx_ring:   RX ring to remove buffers from
- * @xdp:   Whether XDP is enabled
  *
  * Assumes that the device is stopped and buffers are in [0, ring->cnt - 1)
  * entries.  After device is disabled nfp_net_rx_ring_reset() must be called
@@ -1261,9 +1258,9 @@ static void nfp_net_rx_ring_reset(struct nfp_net_rx_ring 
*rx_ring)
  */
 static void
 nfp_net_rx_ring_bufs_free(struct nfp_net_dp *dp,
- struct nfp_net_rx_ring *rx_ring, bool xdp)
+ struct nfp_net_rx_ring *rx_ring)
 {
-   int direction = xdp ? DMA_BIDIRECTIONAL : DMA_FROM_DEVICE;
+   int direction = dp->xdp_prog ? DMA_BIDIRECTIONAL : DMA_FROM_DEVICE;
unsigned int i;
 
for (i = 0; i < rx_ring->cnt - 1; i++) {
@@ -1276,7 +1273,7 @@ nfp_net_rx_ring_bufs_free(struct nfp_net_dp *dp,
 
nfp_net_dma_unmap_rx(dp, rx_ring->rxbufs[i].dma_addr,
 direction);
-   nfp_net_free_frag(rx_ring->rxbufs[i].frag, xdp);
+   nfp_net_free_frag(rx_ring->rxbufs[i].frag, dp->xdp_prog);
rx_ring->rxbufs[i].dma_addr = 0;
rx_ring->rxbufs[i].frag = NULL;
}
@@ -1286,11 +1283,10 @@ nfp_net_rx_ring_bufs_free(struct nfp_net_dp *dp,
  * nfp_net_rx_ring_bufs_alloc() - Fill RX ring with buffers (don't give to FW)
  * @dp:NFP Net data path struct
  * @rx_ring:   RX ring to remove buffers from
- * @xdp:   Whether XDP is enabled
  */
 static int
 nfp_net_rx_ring_bufs_alloc(struct nfp_net_dp *dp,
-  struct nfp_net_rx_ring *rx_ring, bool xdp)
+  struct nfp_net_rx_ring *rx_ring)
 {
struct nfp_net_rx_buf *rxbufs;
unsigned int i;
@@ -1299,10 +1295,9 @@ nfp_net_rx_ring_bufs_alloc(struct nfp_net_dp *dp,
 
for (i = 0; i < rx_ring->cnt - 1; i++) {
rxbufs[i].frag =
-   nfp_net_rx_alloc_one(dp, rx_ring, &rxbufs[i].dma_addr,
-

[PATCH net-next 10/13] nfp: validate rx offset from the BAR and size down it's field

2017-03-10 Thread Jakub Kicinski
NFP_NET_CFG_RX_OFFSET is 32 bits wide; make sure what we read from
there is reasonable for packet headroom.  This allows us to store
the rx_offset in an 8-bit variable.

Signed-off-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/nfp_net.h|  6 +++---
 drivers/net/ethernet/netronome/nfp/nfp_net_common.c | 14 +++---
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net.h 
b/drivers/net/ethernet/netronome/nfp/nfp_net.h
index db92463da440..5f0547c6efb8 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net.h
@@ -439,9 +439,9 @@ struct nfp_stat_pair {
  * @bpf_offload_xdp:   Offloaded BPF program is XDP
  * @chained_metadata_format:  Firemware will use new metadata format
  * @rx_dma_dir:Mapping direction for RX buffers
+ * @rx_offset: Offset in the RX buffers where packet data starts
  * @ctrl:  Local copy of the control register/word.
  * @fl_bufsz:  Currently configured size of the freelist buffers
- * @rx_offset: Offset in the RX buffers where packet data starts
  * @xdp_prog:  Installed XDP program
  * @tx_rings:  Array of pre-allocated TX ring structures
  * @rx_rings:  Array of pre-allocated RX ring structures
@@ -466,11 +466,11 @@ struct nfp_net_dp {
 
u8 rx_dma_dir;
 
+   u8 rx_offset;
+
u32 ctrl;
u32 fl_bufsz;
 
-   u32 rx_offset;
-
struct bpf_prog *xdp_prog;
 
struct nfp_net_tx_ring *tx_rings;
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index ab03f2f301cd..513f55dd746b 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -3124,10 +3124,18 @@ int nfp_net_netdev_init(struct net_device *netdev)
nfp_net_write_mac_addr(nn);
 
/* Determine RX packet/metadata boundary offset */
-   if (nn->fw_ver.major >= 2)
-   nn->dp.rx_offset = nn_readl(nn, NFP_NET_CFG_RX_OFFSET);
-   else
+   if (nn->fw_ver.major >= 2) {
+   u32 reg;
+
+   reg = nn_readl(nn, NFP_NET_CFG_RX_OFFSET);
+   if (reg > NFP_NET_MAX_PREPEND) {
+   nn_err(nn, "Invalid rx offset: %d\n", reg);
+   return -EINVAL;
+   }
+   nn->dp.rx_offset = reg;
+   } else {
nn->dp.rx_offset = NFP_NET_RX_OFFSET;
+   }
 
/* Set default MTU and Freelist buffer size */
if (nn->max_mtu < NFP_NET_DEFAULT_MTU)
-- 
2.11.0



[PATCH net-next 05/13] nfp: use dp to carry fl_bufsz at reconfig time

2017-03-10 Thread Jakub Kicinski
Use fl_bufsz member of data path struct to carry desired size of
free list entries.

Signed-off-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/nfp_net.h   |  3 --
 .../net/ethernet/netronome/nfp/nfp_net_common.c| 56 +++---
 2 files changed, 27 insertions(+), 32 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net.h 
b/drivers/net/ethernet/netronome/nfp/nfp_net.h
index 74f6d485351f..ab5865b955dd 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net.h
@@ -316,8 +316,6 @@ struct nfp_net_rx_buf {
  * @rxds:   Virtual address of FL/RX ring in host memory
  * @dma:DMA address of the FL/RX ring
  * @size:   Size, in bytes, of the FL/RX ring (needed to free)
- * @bufsz: Buffer allocation size for convenience of management routines
- * (NOTE: this is in second cache line, do not use on fast path!)
  */
 struct nfp_net_rx_ring {
struct nfp_net_r_vector *r_vec;
@@ -339,7 +337,6 @@ struct nfp_net_rx_ring {
 
dma_addr_t dma;
unsigned int size;
-   unsigned int bufsz;
 } cacheline_aligned;
 
 /**
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 52f0e9dfd15a..92d4c2991a85 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -86,19 +86,19 @@ void nfp_net_get_fw_version(struct nfp_net_fw_version 
*fw_ver,
 }
 
 static dma_addr_t
-nfp_net_dma_map_rx(struct nfp_net_dp *dp, void *frag, unsigned int bufsz,
-  int direction)
+nfp_net_dma_map_rx(struct nfp_net_dp *dp, void *frag, int direction)
 {
return dma_map_single(dp->dev, frag + NFP_NET_RX_BUF_HEADROOM,
- bufsz - NFP_NET_RX_BUF_NON_DATA, direction);
+ dp->fl_bufsz - NFP_NET_RX_BUF_NON_DATA,
+ direction);
 }
 
 static void
 nfp_net_dma_unmap_rx(struct nfp_net_dp *dp, dma_addr_t dma_addr,
-unsigned int bufsz, int direction)
+int direction)
 {
dma_unmap_single(dp->dev, dma_addr,
-bufsz - NFP_NET_RX_BUF_NON_DATA, direction);
+dp->fl_bufsz - NFP_NET_RX_BUF_NON_DATA, direction);
 }
 
 /* Firmware reconfig
@@ -992,7 +992,7 @@ static void nfp_net_xdp_complete(struct nfp_net_tx_ring 
*tx_ring)
continue;
 
nfp_net_dma_unmap_rx(dp, tx_ring->txbufs[idx].dma_addr,
-dp->fl_bufsz, DMA_BIDIRECTIONAL);
+DMA_BIDIRECTIONAL);
__free_page(virt_to_page(tx_ring->txbufs[idx].frag));
 
done_pkts++;
@@ -1038,7 +1038,7 @@ nfp_net_tx_ring_reset(struct nfp_net_dp *dp, struct 
nfp_net_tx_ring *tx_ring)
 
if (tx_ring == r_vec->xdp_ring) {
nfp_net_dma_unmap_rx(dp, tx_buf->dma_addr,
-dp->fl_bufsz, DMA_BIDIRECTIONAL);
+DMA_BIDIRECTIONAL);
__free_page(virt_to_page(tx_ring->txbufs[idx].frag));
} else {
struct sk_buff *skb = tx_ring->txbufs[idx].skb;
@@ -1130,7 +1130,6 @@ nfp_net_free_frag(void *frag, bool xdp)
  * @dp:NFP Net data path struct
  * @rx_ring:   RX ring structure of the skb
  * @dma_addr:  Pointer to storage for DMA address (output param)
- * @fl_bufsz:  size of freelist buffers
  * @xdp:   Whether XDP is enabled
  *
  * This function will allcate a new page frag, map it for DMA.
@@ -1140,13 +1139,13 @@ nfp_net_free_frag(void *frag, bool xdp)
 static void *
 nfp_net_rx_alloc_one(struct nfp_net_dp *dp,
 struct nfp_net_rx_ring *rx_ring, dma_addr_t *dma_addr,
-unsigned int fl_bufsz, bool xdp)
+bool xdp)
 {
int direction;
void *frag;
 
if (!xdp)
-   frag = netdev_alloc_frag(fl_bufsz);
+   frag = netdev_alloc_frag(dp->fl_bufsz);
else
frag = page_address(alloc_page(GFP_KERNEL | __GFP_COLD));
if (!frag) {
@@ -1156,7 +1155,7 @@ nfp_net_rx_alloc_one(struct nfp_net_dp *dp,
 
direction = xdp ? DMA_BIDIRECTIONAL : DMA_FROM_DEVICE;
 
-   *dma_addr = nfp_net_dma_map_rx(dp, frag, fl_bufsz, direction);
+   *dma_addr = nfp_net_dma_map_rx(dp, frag, direction);
if (dma_mapping_error(dp->dev, *dma_addr)) {
nfp_net_free_frag(frag, xdp);
nn_dp_warn(dp, "Failed to map DMA RX buffer\n");
@@ -1181,7 +1180,7 @@ nfp_net_napi_alloc_one(struct nfp_net_dp *dp, int 
direction,
return NULL;
}
 
-   *dma_addr = nfp_net_dma_map_rx(dp, frag, dp->fl_bufsz, direction);
+   *dma_addr = nfp_net_dma_map_rx(dp, frag, 

[PATCH net-next 13/13] nfp: add support for xdp_adjust_head()

2017-03-10 Thread Jakub Kicinski
Support prepending data from XDP.  We are already always allocating
some headroom because FW may prepend metadata to packets.
xdp_adjust_head() can be supported by making sure that headroom is
big enough for XDP.  In case FW had prepended metadata to the packet,
however, we have to move it out of the way before we call XDP.

Signed-off-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/nfp_net.h   |  2 +
 .../net/ethernet/netronome/nfp/nfp_net_common.c| 98 +++---
 2 files changed, 70 insertions(+), 30 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net.h 
b/drivers/net/ethernet/netronome/nfp/nfp_net.h
index 5f0547c6efb8..4d45f4573b57 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net.h
@@ -439,6 +439,7 @@ struct nfp_stat_pair {
  * @bpf_offload_xdp:   Offloaded BPF program is XDP
  * @chained_metadata_format:  Firemware will use new metadata format
  * @rx_dma_dir:Mapping direction for RX buffers
+ * @rx_dma_off:Offset at which DMA packets (for XDP headroom)
  * @rx_offset: Offset in the RX buffers where packet data starts
  * @ctrl:  Local copy of the control register/word.
  * @fl_bufsz:  Currently configured size of the freelist buffers
@@ -465,6 +466,7 @@ struct nfp_net_dp {
u8 chained_metadata_format:1;
 
u8 rx_dma_dir;
+   u8 rx_dma_off;
 
u8 rx_offset;
 
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index fe7c3f6d820d..f134f1808b9a 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -1100,6 +1100,7 @@ nfp_net_calc_fl_bufsz(struct nfp_net_dp *dp)
unsigned int fl_bufsz;
 
fl_bufsz = NFP_NET_RX_BUF_HEADROOM;
+   fl_bufsz += dp->rx_dma_off;
if (dp->rx_offset == NFP_NET_CFG_RX_OFFSET_DYNAMIC)
fl_bufsz += NFP_NET_MAX_PREPEND;
else
@@ -1181,11 +1182,13 @@ static void *nfp_net_napi_alloc_one(struct nfp_net_dp 
*dp, dma_addr_t *dma_addr)
 
 /**
  * nfp_net_rx_give_one() - Put mapped skb on the software and hardware rings
+ * @dp:NFP Net data path struct
  * @rx_ring:   RX ring structure
  * @frag:  page fragment buffer
  * @dma_addr:  DMA address of skb mapping
  */
-static void nfp_net_rx_give_one(struct nfp_net_rx_ring *rx_ring,
+static void nfp_net_rx_give_one(const struct nfp_net_dp *dp,
+   struct nfp_net_rx_ring *rx_ring,
void *frag, dma_addr_t dma_addr)
 {
unsigned int wr_idx;
@@ -1199,7 +1202,8 @@ static void nfp_net_rx_give_one(struct nfp_net_rx_ring 
*rx_ring,
/* Fill freelist descriptor */
rx_ring->rxds[wr_idx].fld.reserved = 0;
rx_ring->rxds[wr_idx].fld.meta_len_dd = 0;
-   nfp_desc_set_dma_addr(&rx_ring->rxds[wr_idx].fld, dma_addr);
+   nfp_desc_set_dma_addr(&rx_ring->rxds[wr_idx].fld,
+ dma_addr + dp->rx_dma_off);
 
rx_ring->wr_p++;
rx_ring->wr_ptr_add++;
@@ -1296,14 +1300,17 @@ nfp_net_rx_ring_bufs_alloc(struct nfp_net_dp *dp,
 
 /**
  * nfp_net_rx_ring_fill_freelist() - Give buffers from the ring to FW
+ * @dp: NFP Net data path struct
  * @rx_ring: RX ring to fill
  */
-static void nfp_net_rx_ring_fill_freelist(struct nfp_net_rx_ring *rx_ring)
+static void
+nfp_net_rx_ring_fill_freelist(struct nfp_net_dp *dp,
+ struct nfp_net_rx_ring *rx_ring)
 {
unsigned int i;
 
for (i = 0; i < rx_ring->cnt - 1; i++)
-   nfp_net_rx_give_one(rx_ring, rx_ring->rxbufs[i].frag,
+   nfp_net_rx_give_one(dp, rx_ring, rx_ring->rxbufs[i].frag,
rx_ring->rxbufs[i].dma_addr);
 }
 
@@ -1429,8 +1436,9 @@ nfp_net_parse_meta(struct net_device *netdev, struct 
sk_buff *skb,
 }
 
 static void
-nfp_net_rx_drop(struct nfp_net_r_vector *r_vec, struct nfp_net_rx_ring 
*rx_ring,
-   struct nfp_net_rx_buf *rxbuf, struct sk_buff *skb)
+nfp_net_rx_drop(const struct nfp_net_dp *dp, struct nfp_net_r_vector *r_vec,
+   struct nfp_net_rx_ring *rx_ring, struct nfp_net_rx_buf *rxbuf,
+   struct sk_buff *skb)
 {
	u64_stats_update_begin(&r_vec->rx_sync);
r_vec->rx_drops++;
@@ -1442,7 +1450,7 @@ nfp_net_rx_drop(struct nfp_net_r_vector *r_vec, struct 
nfp_net_rx_ring *rx_ring,
if (skb && rxbuf && skb->head == rxbuf->frag)
page_ref_inc(virt_to_head_page(rxbuf->frag));
if (rxbuf)
-   nfp_net_rx_give_one(rx_ring, rxbuf->frag, rxbuf->dma_addr);
+   nfp_net_rx_give_one(dp, rx_ring, rxbuf->frag, rxbuf->dma_addr);
if (skb)
dev_kfree_skb_any(skb);
 }
@@ -1460,16 +1468,16 @@ nfp_net_tx_xdp_buf(struct nfp_net_dp *dp, struct 

[PATCH net-next 06/13] nfp: use dp to carry mtu at reconfig time

2017-03-10 Thread Jakub Kicinski
Move the mtu member from ring set to data path struct.

Signed-off-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/nfp_net.h   |  4 +++-
 .../net/ethernet/netronome/nfp/nfp_net_common.c| 23 +++---
 .../net/ethernet/netronome/nfp/nfp_net_ethtool.c   |  2 --
 3 files changed, 15 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net.h 
b/drivers/net/ethernet/netronome/nfp/nfp_net.h
index ab5865b955dd..84774c281b61 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net.h
@@ -452,6 +452,7 @@ struct nfp_stat_pair {
  * @num_tx_rings:  Currently configured number of TX rings
  * @num_stack_tx_rings:Number of TX rings used by the stack (not XDP)
  * @num_rx_rings:  Currently configured number of RX rings
+ * @mtu:   Device MTU
  */
 struct nfp_net_dp {
struct device *dev;
@@ -484,6 +485,8 @@ struct nfp_net_dp {
unsigned int num_tx_rings;
unsigned int num_stack_tx_rings;
unsigned int num_rx_rings;
+
+   unsigned int mtu;
 };
 
 /**
@@ -610,7 +613,6 @@ struct nfp_net {
 
 struct nfp_net_ring_set {
unsigned int n_rings;
-   unsigned int mtu;
unsigned int dcnt;
void *rings;
 };
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 92d4c2991a85..862e86cb5688 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -1099,7 +1099,7 @@ static void nfp_net_tx_timeout(struct net_device *netdev)
 /* Receive processing
  */
 static unsigned int
-nfp_net_calc_fl_bufsz(struct nfp_net_dp *dp, unsigned int mtu)
+nfp_net_calc_fl_bufsz(struct nfp_net_dp *dp)
 {
unsigned int fl_bufsz;
 
@@ -1108,7 +1108,7 @@ nfp_net_calc_fl_bufsz(struct nfp_net_dp *dp, unsigned int 
mtu)
fl_bufsz += NFP_NET_MAX_PREPEND;
else
fl_bufsz += dp->rx_offset;
-   fl_bufsz += ETH_HLEN + VLAN_HLEN * 2 + mtu;
+   fl_bufsz += ETH_HLEN + VLAN_HLEN * 2 + dp->mtu;
 
fl_bufsz = SKB_DATA_ALIGN(fl_bufsz);
fl_bufsz += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
@@ -1935,12 +1935,13 @@ nfp_net_rx_ring_set_swap(struct nfp_net *nn, struct 
nfp_net_dp *dp,
struct nfp_net_dp new_dp = *dp;
 
dp->fl_bufsz = nn->dp.fl_bufsz;
-   s->mtu = nn->dp.netdev->mtu;
+   dp->mtu = nn->dp.netdev->mtu;
s->dcnt = nn->dp.rxd_cnt;
s->rings = nn->dp.rx_rings;
s->n_rings = nn->dp.num_rx_rings;
 
-   nn->dp.netdev->mtu = new.mtu;
+   nn->dp.mtu = new_dp.mtu;
+   nn->dp.netdev->mtu = new_dp.mtu;
nn->dp.fl_bufsz = new_dp.fl_bufsz;
nn->dp.rxd_cnt = new.dcnt;
nn->dp.rx_rings = new.rings;
@@ -2255,7 +2256,6 @@ static int nfp_net_netdev_open(struct net_device *netdev)
struct nfp_net *nn = netdev_priv(netdev);
struct nfp_net_ring_set rx = {
.n_rings = nn->dp.num_rx_rings,
-   .mtu = nn->dp.netdev->mtu,
.dcnt = nn->dp.rxd_cnt,
};
struct nfp_net_ring_set tx = {
@@ -2466,6 +2466,8 @@ static void nfp_net_dp_swap(struct nfp_net *nn, struct 
nfp_net_dp *dp)
 
*dp = nn->dp;
nn->dp = new_dp;
+
+   nn->dp.netdev->mtu = new_dp.mtu;
 }
 
 static int
@@ -2554,7 +2556,6 @@ nfp_net_ring_reconfig_down(struct nfp_net *nn, struct 
nfp_net_dp *dp,
 {
nfp_net_dp_swap(nn, dp);
 
-   nn->dp.netdev->mtu = rx ? rx->mtu : nn->dp.netdev->mtu;
nn->dp.rxd_cnt = rx ? rx->dcnt : nn->dp.rxd_cnt;
nn->dp.txd_cnt = tx ? tx->dcnt : nn->dp.txd_cnt;
nn->dp.num_rx_rings = rx ? rx->n_rings : nn->dp.num_rx_rings;
@@ -2572,8 +2573,7 @@ nfp_net_ring_reconfig(struct nfp_net *nn, struct 
nfp_net_dp *dp,
 {
int r, err;
 
-   dp->fl_bufsz = nfp_net_calc_fl_bufsz(dp,
-rx ? rx->mtu : nn->dp.netdev->mtu);
+   dp->fl_bufsz = nfp_net_calc_fl_bufsz(dp);
 
dp->num_stack_tx_rings = tx ? tx->n_rings : dp->num_tx_rings;
if (*xdp_prog)
@@ -2659,7 +2659,6 @@ static int nfp_net_change_mtu(struct net_device *netdev, 
int new_mtu)
struct nfp_net *nn = netdev_priv(netdev);
struct nfp_net_ring_set rx = {
.n_rings = nn->dp.num_rx_rings,
-   .mtu = new_mtu,
.dcnt = nn->dp.rxd_cnt,
};
struct nfp_net_dp *dp;
@@ -2668,6 +2667,8 @@ static int nfp_net_change_mtu(struct net_device *netdev, 
int new_mtu)
if (!dp)
return -ENOMEM;
 
+   dp->mtu = new_mtu;
+
	return nfp_net_ring_reconfig(nn, dp, &nn->dp.xdp_prog, &rx, NULL);
 }
 
@@ -2988,7 +2989,6 @@ static int nfp_net_xdp_setup(struct nfp_net *nn, struct 
bpf_prog *prog)
 {
struct nfp_net_ring_set rx = {
.n_rings = nn->dp.num_rx_rings,
-   .mtu 

[PATCH net-next 12/13] nfp: prepare metadata handling for xdp_adjust_head()

2017-03-10 Thread Jakub Kicinski
XDP may require us to move metadata to make room for pushing
headers.  Track the metadata location with a pointer and pass
it explicitly to functions.

While at it validate that meta_len from the descriptor is not
bogus.

Signed-off-by: Jakub Kicinski 
---
 .../net/ethernet/netronome/nfp/nfp_net_common.c| 28 +++---
 1 file changed, 19 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 0e4fa6802733..fe7c3f6d820d 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -1385,24 +1385,21 @@ static void nfp_net_set_hash(struct net_device *netdev, 
struct sk_buff *skb,
 
 static void
 nfp_net_set_hash_desc(struct net_device *netdev, struct sk_buff *skb,
- struct nfp_net_rx_desc *rxd)
+ void *data, struct nfp_net_rx_desc *rxd)
 {
-   struct nfp_net_rx_hash *rx_hash;
+   struct nfp_net_rx_hash *rx_hash = data;
 
if (!(rxd->rxd.flags & PCIE_DESC_RX_RSS))
return;
 
-   rx_hash = (struct nfp_net_rx_hash *)(skb->data - sizeof(*rx_hash));
-
	nfp_net_set_hash(netdev, skb, get_unaligned_be32(&rx_hash->hash_type),
			 &rx_hash->hash);
 }
 
 static void *
 nfp_net_parse_meta(struct net_device *netdev, struct sk_buff *skb,
-  int meta_len)
+  void *data, int meta_len)
 {
-   u8 *data = skb->data - meta_len;
u32 meta_info;
 
meta_info = get_unaligned_be32(data);
@@ -1546,6 +1543,7 @@ static int nfp_net_rx(struct nfp_net_rx_ring *rx_ring, 
int budget)
struct nfp_net_rx_desc *rxd;
dma_addr_t new_dma_addr;
void *new_frag;
+   u8 *meta;
 
idx = rx_ring->rd_p & (rx_ring->cnt - 1);
 
@@ -1589,6 +1587,17 @@ static int nfp_net_rx(struct nfp_net_rx_ring *rx_ring, 
int budget)
r_vec->rx_bytes += pkt_len;
u64_stats_update_end(_vec->rx_sync);
 
+   /* Pointer to start of metadata */
+   meta = rxbuf->frag + data_off - meta_len;
+
+   if (unlikely(meta_len > NFP_NET_MAX_PREPEND ||
+(dp->rx_offset && meta_len > dp->rx_offset))) {
+   nn_dp_warn(dp, "oversized RX packet metadata %u\n",
+  meta_len);
+   nfp_net_rx_drop(r_vec, rx_ring, rxbuf, NULL);
+   continue;
+   }
+
if (xdp_prog && !(rxd->rxd.flags & PCIE_DESC_RX_BPF &&
  dp->bpf_offload_xdp)) {
unsigned int dma_off;
@@ -1641,12 +1650,13 @@ static int nfp_net_rx(struct nfp_net_rx_ring *rx_ring, 
int budget)
skb_put(skb, pkt_len);
 
if (!dp->chained_metadata_format) {
-   nfp_net_set_hash_desc(dp->netdev, skb, rxd);
+   nfp_net_set_hash_desc(dp->netdev, skb, meta, rxd);
} else if (meta_len) {
void *end;
 
-   end = nfp_net_parse_meta(dp->netdev, skb, meta_len);
-   if (unlikely(end != skb->data)) {
+   end = nfp_net_parse_meta(dp->netdev, skb, meta,
+meta_len);
+   if (unlikely(end != meta + meta_len)) {
nn_dp_warn(dp, "invalid RX packet metadata\n");
nfp_net_rx_drop(r_vec, rx_ring, NULL, skb);
continue;
-- 
2.11.0



[PATCH net-next 03/13] nfp: pass new data path to ring reconfig

2017-03-10 Thread Jakub Kicinski
Make callers of nfp_net_ring_reconfig() pass a newly allocated data
path structure.  We will gradually make use of that structure
instead of passing parameters around to all the allocation functions.
This commit adds allocation and propagation of the new data path
struct; no parameters are converted yet.

Signed-off-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/nfp_net.h   |   5 +-
 .../net/ethernet/netronome/nfp/nfp_net_common.c| 108 ++---
 .../net/ethernet/netronome/nfp/nfp_net_ethtool.c   |  14 ++-
 3 files changed, 91 insertions(+), 36 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net.h 
b/drivers/net/ethernet/netronome/nfp/nfp_net.h
index 90a44fad6bd5..74f6d485351f 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net.h
@@ -812,8 +812,11 @@ void nfp_net_irqs_disable(struct pci_dev *pdev);
 void
 nfp_net_irqs_assign(struct nfp_net *nn, struct msix_entry *irq_entries,
unsigned int n);
+
+struct nfp_net_dp *nfp_net_clone_dp(struct nfp_net *nn);
 int
-nfp_net_ring_reconfig(struct nfp_net *nn, struct bpf_prog **xdp_prog,
+nfp_net_ring_reconfig(struct nfp_net *nn, struct nfp_net_dp *new,
+ struct bpf_prog **xdp_prog,
  struct nfp_net_ring_set *rx, struct nfp_net_ring_set *tx);
 
 #ifdef CONFIG_NFP_DEBUG
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 951d511643f1..7afefb44b642 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -1127,6 +1127,7 @@ nfp_net_free_frag(void *frag, bool xdp)
 
 /**
  * nfp_net_rx_alloc_one() - Allocate and map page frag for RX
+ * @dp:NFP Net data path struct
  * @rx_ring:   RX ring structure of the skb
  * @dma_addr:  Pointer to storage for DMA address (output param)
  * @fl_bufsz:  size of freelist buffers
@@ -1137,10 +1138,10 @@ nfp_net_free_frag(void *frag, bool xdp)
  * Return: allocated page frag or NULL on failure.
  */
 static void *
-nfp_net_rx_alloc_one(struct nfp_net_rx_ring *rx_ring, dma_addr_t *dma_addr,
+nfp_net_rx_alloc_one(struct nfp_net_dp *dp,
+struct nfp_net_rx_ring *rx_ring, dma_addr_t *dma_addr,
 unsigned int fl_bufsz, bool xdp)
 {
-   struct nfp_net_dp *dp = &rx_ring->r_vec->nfp_net->dp;
int direction;
void *frag;
 
@@ -1299,7 +1300,7 @@ nfp_net_rx_ring_bufs_alloc(struct nfp_net_dp *dp,
 
for (i = 0; i < rx_ring->cnt - 1; i++) {
rxbufs[i].frag =
-   nfp_net_rx_alloc_one(rx_ring, &rxbufs[i].dma_addr,
+   nfp_net_rx_alloc_one(dp, rx_ring, &rxbufs[i].dma_addr,
 rx_ring->bufsz, xdp);
if (!rxbufs[i].frag) {
nfp_net_rx_ring_bufs_free(dp, rx_ring, xdp);
@@ -1784,7 +1785,8 @@ nfp_net_tx_ring_alloc(struct nfp_net_tx_ring *tx_ring, 
u32 cnt, bool is_xdp)
 }
 
 static struct nfp_net_tx_ring *
-nfp_net_tx_ring_set_prepare(struct nfp_net *nn, struct nfp_net_ring_set *s,
+nfp_net_tx_ring_set_prepare(struct nfp_net *nn, struct nfp_net_dp *dp,
+   struct nfp_net_ring_set *s,
unsigned int num_stack_tx_rings)
 {
struct nfp_net_tx_ring *rings;
@@ -1900,10 +1902,10 @@ nfp_net_rx_ring_alloc(struct nfp_net_rx_ring *rx_ring, 
unsigned int fl_bufsz,
 }
 
 static struct nfp_net_rx_ring *
-nfp_net_rx_ring_set_prepare(struct nfp_net *nn, struct nfp_net_ring_set *s,
-   bool xdp)
+nfp_net_rx_ring_set_prepare(struct nfp_net *nn, struct nfp_net_dp *dp,
+   struct nfp_net_ring_set *s, bool xdp)
 {
-   unsigned int fl_bufsz = nfp_net_calc_fl_bufsz(&nn->dp, s->mtu);
+   unsigned int fl_bufsz = nfp_net_calc_fl_bufsz(dp, s->mtu);
struct nfp_net_rx_ring *rings;
unsigned int r;
 
@@ -1917,7 +1919,7 @@ nfp_net_rx_ring_set_prepare(struct nfp_net *nn, struct 
nfp_net_ring_set *s,
	if (nfp_net_rx_ring_alloc(&rings[r], fl_bufsz, s->dcnt))
goto err_free_prev;
 
-   if (nfp_net_rx_ring_bufs_alloc(&nn->dp, &rings[r], xdp))
+   if (nfp_net_rx_ring_bufs_alloc(dp, &rings[r], xdp))
goto err_free_ring;
}
 
@@ -1925,7 +1927,7 @@ nfp_net_rx_ring_set_prepare(struct nfp_net *nn, struct 
nfp_net_ring_set *s,
 
 err_free_prev:
while (r--) {
-   nfp_net_rx_ring_bufs_free(&nn->dp, &rings[r], xdp);
+   nfp_net_rx_ring_bufs_free(dp, &rings[r], xdp);
err_free_ring:
	nfp_net_rx_ring_free(&rings[r]);
}
@@ -2295,14 +2297,15 @@ static int nfp_net_netdev_open(struct net_device 
*netdev)
goto err_cleanup_vec_p;
}
 
-   nn->dp.rx_rings = nfp_net_rx_ring_set_prepare(nn, &rx, nn->dp.xdp_prog);
+   nn->dp.rx_rings = 

[PATCH net-next 02/13] nfp: move control BAR pointer into data path structure

2017-03-10 Thread Jakub Kicinski
The control BAR pointer is used to unmask interrupts, so it should
be in the first cacheline of the adapter structure.

Signed-off-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/nfp_net.h| 21 +++--
 .../net/ethernet/netronome/nfp/nfp_net_ethtool.c| 14 +++---
 drivers/net/ethernet/netronome/nfp/nfp_net_main.c   |  2 +-
 drivers/net/ethernet/netronome/nfp/nfp_netvf_main.c |  4 ++--
 4 files changed, 21 insertions(+), 20 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net.h 
b/drivers/net/ethernet/netronome/nfp/nfp_net.h
index 7d2c38604372..90a44fad6bd5 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net.h
@@ -447,6 +447,7 @@ struct nfp_stat_pair {
  * @xdp_prog:  Installed XDP program
  * @tx_rings:  Array of pre-allocated TX ring structures
  * @rx_rings:  Array of pre-allocated RX ring structures
+ * @ctrl_bar:  Pointer to mapped control BAR
  *
  * @txd_cnt:   Size of the TX ring in number of descriptors
  * @rxd_cnt:   Size of the RX ring in number of descriptors
@@ -474,6 +475,8 @@ struct nfp_net_dp {
struct nfp_net_tx_ring *tx_rings;
struct nfp_net_rx_ring *rx_rings;
 
+   u8 __iomem *ctrl_bar;
+
/* Cold data follows */
 
unsigned int txd_cnt;
@@ -527,7 +530,6 @@ struct nfp_net_dp {
  * @vxlan_ports:   VXLAN ports for RX inner csum offload communicated to HW
  * @vxlan_usecnt:  IPv4/IPv6 VXLAN port use counts
  * @qcp_cfg:Pointer to QCP queue used for configuration 
notification
- * @ctrl_bar:   Pointer to mapped control BAR
  * @tx_bar: Pointer to mapped TX queues
  * @rx_bar: Pointer to mapped FL/RX queues
  * @debugfs_dir:   Device directory in debugfs
@@ -595,7 +597,6 @@ struct nfp_net {
 
u8 __iomem *qcp_cfg;
 
-   u8 __iomem *ctrl_bar;
u8 __iomem *tx_bar;
u8 __iomem *rx_bar;
 
@@ -622,42 +623,42 @@ struct nfp_net_ring_set {
  */
 static inline u16 nn_readb(struct nfp_net *nn, int off)
 {
-   return readb(nn->ctrl_bar + off);
+   return readb(nn->dp.ctrl_bar + off);
 }
 
 static inline void nn_writeb(struct nfp_net *nn, int off, u8 val)
 {
-   writeb(val, nn->ctrl_bar + off);
+   writeb(val, nn->dp.ctrl_bar + off);
 }
 
 static inline u16 nn_readw(struct nfp_net *nn, int off)
 {
-   return readw(nn->ctrl_bar + off);
+   return readw(nn->dp.ctrl_bar + off);
 }
 
 static inline void nn_writew(struct nfp_net *nn, int off, u16 val)
 {
-   writew(val, nn->ctrl_bar + off);
+   writew(val, nn->dp.ctrl_bar + off);
 }
 
 static inline u32 nn_readl(struct nfp_net *nn, int off)
 {
-   return readl(nn->ctrl_bar + off);
+   return readl(nn->dp.ctrl_bar + off);
 }
 
 static inline void nn_writel(struct nfp_net *nn, int off, u32 val)
 {
-   writel(val, nn->ctrl_bar + off);
+   writel(val, nn->dp.ctrl_bar + off);
 }
 
 static inline u64 nn_readq(struct nfp_net *nn, int off)
 {
-   return readq(nn->ctrl_bar + off);
+   return readq(nn->dp.ctrl_bar + off);
 }
 
 static inline void nn_writeq(struct nfp_net *nn, int off, u64 val)
 {
-   writeq(val, nn->ctrl_bar + off);
+   writeq(val, nn->dp.ctrl_bar + off);
 }
 
 /* Flush posted PCI writes by reading something without side effects */
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
index 4620c1bba96e..969c30589f23 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
@@ -307,7 +307,7 @@ static void nfp_net_get_stats(struct net_device *netdev,
break;
 
case NFP_NET_DEV_ET_STATS:
-   io_p = nn->ctrl_bar + nfp_net_et_stats[i].off;
+   io_p = nn->dp.ctrl_bar + nfp_net_et_stats[i].off;
data[i] = readq(io_p);
break;
}
@@ -339,15 +339,15 @@ static void nfp_net_get_stats(struct net_device *netdev,
for (j = 0; j < NN_ET_RVEC_GATHER_STATS; j++)
data[i++] = gathered_stats[j];
for (j = 0; j < nn->dp.num_tx_rings; j++) {
-   io_p = nn->ctrl_bar + NFP_NET_CFG_TXR_STATS(j);
+   io_p = nn->dp.ctrl_bar + NFP_NET_CFG_TXR_STATS(j);
data[i++] = readq(io_p);
-   io_p = nn->ctrl_bar + NFP_NET_CFG_TXR_STATS(j) + 8;
+   io_p = nn->dp.ctrl_bar + NFP_NET_CFG_TXR_STATS(j) + 8;
data[i++] = readq(io_p);
}
for (j = 0; j < nn->dp.num_rx_rings; j++) {
-   io_p = nn->ctrl_bar + NFP_NET_CFG_RXR_STATS(j);
+   io_p = nn->dp.ctrl_bar + NFP_NET_CFG_RXR_STATS(j);
data[i++] = readq(io_p);
-   io_p = nn->ctrl_bar + NFP_NET_CFG_RXR_STATS(j) + 8;
+   io_p = 

[PATCH net-next 08/13] nfp: switch to using data path structures for reconfiguration

2017-03-10 Thread Jakub Kicinski
Instead of passing around sets of rings and their parameters just
store all information in the data path structure.

We will no longer use xchg() on XDP programs when we swap programs
while the traffic is guaranteed not to be flowing.  This allows us
to simply assign the entire data path structures instead of copying
field by field.

The optimization to reallocate only the rings on the side (RX/TX)
which has been changed is also removed since it seems like it's not
worth the code complexity.

Signed-off-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/nfp_net.h   |  10 +-
 .../net/ethernet/netronome/nfp/nfp_net_common.c| 260 ++---
 .../net/ethernet/netronome/nfp/nfp_net_ethtool.c   |  46 +---
 3 files changed, 89 insertions(+), 227 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net.h 
b/drivers/net/ethernet/netronome/nfp/nfp_net.h
index 19dacc3f1269..5a92f6e41dae 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net.h
@@ -611,12 +611,6 @@ struct nfp_net {
struct nfp_eth_table_port *eth_port;
 };
 
-struct nfp_net_ring_set {
-   unsigned int n_rings;
-   unsigned int dcnt;
-   void *rings;
-};
-
 /* Functions to read/write from/to a BAR
  * Performs any endian conversion necessary.
  */
@@ -813,9 +807,7 @@ nfp_net_irqs_assign(struct nfp_net *nn, struct msix_entry 
*irq_entries,
unsigned int n);
 
 struct nfp_net_dp *nfp_net_clone_dp(struct nfp_net *nn);
-int
-nfp_net_ring_reconfig(struct nfp_net *nn, struct nfp_net_dp *new,
- struct nfp_net_ring_set *rx, struct nfp_net_ring_set *tx);
+int nfp_net_ring_reconfig(struct nfp_net *nn, struct nfp_net_dp *new);
 
 #ifdef CONFIG_NFP_DEBUG
 void nfp_net_debugfs_create(void);
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 6ab824a48d1d..a9359da64f80 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -1740,20 +1740,20 @@ static void nfp_net_tx_ring_free(struct nfp_net_tx_ring 
*tx_ring)
 
 /**
  * nfp_net_tx_ring_alloc() - Allocate resource for a TX ring
+ * @dp:NFP Net data path struct
  * @tx_ring:   TX Ring structure to allocate
- * @cnt:   Ring buffer count
  * @is_xdp:True if ring will be used for XDP
  *
  * Return: 0 on success, negative errno otherwise.
  */
 static int
-nfp_net_tx_ring_alloc(struct nfp_net_tx_ring *tx_ring, u32 cnt, bool is_xdp)
+nfp_net_tx_ring_alloc(struct nfp_net_dp *dp, struct nfp_net_tx_ring *tx_ring,
+ bool is_xdp)
 {
struct nfp_net_r_vector *r_vec = tx_ring->r_vec;
-   struct nfp_net_dp *dp = &r_vec->nfp_net->dp;
int sz;
 
-   tx_ring->cnt = cnt;
+   tx_ring->cnt = dp->txd_cnt;
 
tx_ring->size = sizeof(*tx_ring->txds) * tx_ring->cnt;
tx_ring->txds = dma_zalloc_coherent(dp->dev, tx_ring->size,
@@ -1777,61 +1777,45 @@ nfp_net_tx_ring_alloc(struct nfp_net_tx_ring *tx_ring, 
u32 cnt, bool is_xdp)
return -ENOMEM;
 }
 
-static struct nfp_net_tx_ring *
-nfp_net_tx_ring_set_prepare(struct nfp_net *nn, struct nfp_net_dp *dp,
-   struct nfp_net_ring_set *s)
+static int nfp_net_tx_rings_prepare(struct nfp_net *nn, struct nfp_net_dp *dp)
 {
-   struct nfp_net_tx_ring *rings;
unsigned int r;
 
-   rings = kcalloc(s->n_rings, sizeof(*rings), GFP_KERNEL);
-   if (!rings)
-   return NULL;
+   dp->tx_rings = kcalloc(dp->num_tx_rings, sizeof(*dp->tx_rings),
+  GFP_KERNEL);
+   if (!dp->tx_rings)
+   return -ENOMEM;
 
-   for (r = 0; r < s->n_rings; r++) {
+   for (r = 0; r < dp->num_tx_rings; r++) {
int bias = 0;
 
if (r >= dp->num_stack_tx_rings)
bias = dp->num_stack_tx_rings;
 
-   nfp_net_tx_ring_init(&rings[r], &nn->r_vecs[r - bias], r);
+   nfp_net_tx_ring_init(&dp->tx_rings[r], &nn->r_vecs[r - bias],
+r);
 
-   if (nfp_net_tx_ring_alloc(&rings[r], s->dcnt, bias))
+   if (nfp_net_tx_ring_alloc(dp, &dp->tx_rings[r], bias))
goto err_free_prev;
}
 
-   return s->rings = rings;
+   return 0;
 
 err_free_prev:
while (r--)
-   nfp_net_tx_ring_free(&rings[r]);
-   kfree(rings);
-   return NULL;
-}
-
-static void
-nfp_net_tx_ring_set_swap(struct nfp_net *nn, struct nfp_net_ring_set *s)
-{
-   struct nfp_net_ring_set new = *s;
-
-   s->dcnt = nn->dp.txd_cnt;
-   s->rings = nn->dp.tx_rings;
-   s->n_rings = nn->dp.num_tx_rings;
-
-   nn->dp.txd_cnt = new.dcnt;
-   nn->dp.tx_rings = new.rings;
-   nn->dp.num_tx_rings = new.n_rings;
+   nfp_net_tx_ring_free(&dp->tx_rings[r]);
+   kfree(dp->tx_rings);
+   

[PATCH net-next 04/13] nfp: use dp to carry number of stack tx rings and vectors

2017-03-10 Thread Jakub Kicinski
Instead of passing variables around use dp to store number of tx rings
for the stack and number of IRQ vectors.

Signed-off-by: Jakub Kicinski 
---
 .../net/ethernet/netronome/nfp/nfp_net_common.c| 62 +++---
 1 file changed, 31 insertions(+), 31 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 7afefb44b642..52f0e9dfd15a 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -1786,8 +1786,7 @@ nfp_net_tx_ring_alloc(struct nfp_net_tx_ring *tx_ring, u32 cnt, bool is_xdp)
 
 static struct nfp_net_tx_ring *
 nfp_net_tx_ring_set_prepare(struct nfp_net *nn, struct nfp_net_dp *dp,
-   struct nfp_net_ring_set *s,
-   unsigned int num_stack_tx_rings)
+   struct nfp_net_ring_set *s)
 {
struct nfp_net_tx_ring *rings;
unsigned int r;
@@ -1799,8 +1798,8 @@ nfp_net_tx_ring_set_prepare(struct nfp_net *nn, struct nfp_net_dp *dp,
for (r = 0; r < s->n_rings; r++) {
int bias = 0;
 
-   if (r >= num_stack_tx_rings)
-   bias = num_stack_tx_rings;
+   if (r >= dp->num_stack_tx_rings)
+   bias = dp->num_stack_tx_rings;
 
	nfp_net_tx_ring_init(&rings[r], &nn->r_vecs[r - bias], r);
 
@@ -2304,8 +2303,7 @@ static int nfp_net_netdev_open(struct net_device *netdev)
goto err_cleanup_vec;
}
 
-   nn->dp.tx_rings = nfp_net_tx_ring_set_prepare(nn, &nn->dp, &tx,
- nn->dp.num_stack_tx_rings);
+   nn->dp.tx_rings = nfp_net_tx_ring_set_prepare(nn, &nn->dp, &tx);
if (!nn->dp.tx_rings) {
err = -ENOMEM;
goto err_free_rx_rings;
@@ -2466,10 +2464,16 @@ static void nfp_net_rss_init_itbl(struct nfp_net *nn)
ethtool_rxfh_indir_default(i, nn->dp.num_rx_rings);
 }
 
+static void nfp_net_dp_swap(struct nfp_net *nn, struct nfp_net_dp *dp)
+{
+   struct nfp_net_dp new_dp = *dp;
+
+   *dp = nn->dp;
+   nn->dp = new_dp;
+}
+
 static int
 nfp_net_ring_swap_enable(struct nfp_net *nn, struct nfp_net_dp *dp,
-unsigned int *num_vecs,
-unsigned int *stack_tx_rings,
 struct bpf_prog **xdp_prog,
 struct nfp_net_ring_set *rx,
 struct nfp_net_ring_set *tx)
@@ -2482,8 +2486,8 @@ nfp_net_ring_swap_enable(struct nfp_net *nn, struct nfp_net_dp *dp,
if (tx)
nfp_net_tx_ring_set_swap(nn, tx);
 
-   swap(*num_vecs, nn->dp.num_r_vecs);
-   swap(*stack_tx_rings, nn->dp.num_stack_tx_rings);
+   swap(dp->num_r_vecs, nn->dp.num_r_vecs);
+   swap(dp->num_stack_tx_rings, nn->dp.num_stack_tx_rings);
	*xdp_prog = xchg(&nn->dp.xdp_prog, *xdp_prog);
 
for (r = 0; r < nn->max_r_vecs; r++)
@@ -2550,17 +2554,16 @@ static void
 nfp_net_ring_reconfig_down(struct nfp_net *nn, struct nfp_net_dp *dp,
   struct bpf_prog **xdp_prog,
   struct nfp_net_ring_set *rx,
-  struct nfp_net_ring_set *tx,
-  unsigned int stack_tx_rings, unsigned int num_vecs)
+  struct nfp_net_ring_set *tx)
 {
+   nfp_net_dp_swap(nn, dp);
+
nn->dp.netdev->mtu = rx ? rx->mtu : nn->dp.netdev->mtu;
	nn->dp.fl_bufsz = nfp_net_calc_fl_bufsz(&nn->dp, nn->dp.netdev->mtu);
nn->dp.rxd_cnt = rx ? rx->dcnt : nn->dp.rxd_cnt;
nn->dp.txd_cnt = tx ? tx->dcnt : nn->dp.txd_cnt;
nn->dp.num_rx_rings = rx ? rx->n_rings : nn->dp.num_rx_rings;
nn->dp.num_tx_rings = tx ? tx->n_rings : nn->dp.num_tx_rings;
-   nn->dp.num_stack_tx_rings = stack_tx_rings;
-   nn->dp.num_r_vecs = num_vecs;
	*xdp_prog = xchg(&nn->dp.xdp_prog, *xdp_prog);
 
if (!netif_is_rxfh_configured(nn->dp.netdev))
@@ -2572,31 +2575,31 @@ nfp_net_ring_reconfig(struct nfp_net *nn, struct nfp_net_dp *dp,
  struct bpf_prog **xdp_prog,
  struct nfp_net_ring_set *rx, struct nfp_net_ring_set *tx)
 {
-   unsigned int stack_tx_rings, num_vecs, r;
-   int err;
+   int r, err;
 
-   stack_tx_rings = tx ? tx->n_rings : dp->num_tx_rings;
+   dp->num_stack_tx_rings = tx ? tx->n_rings : dp->num_tx_rings;
if (*xdp_prog)
-   stack_tx_rings -= rx ? rx->n_rings : dp->num_rx_rings;
+   dp->num_stack_tx_rings -= rx ? rx->n_rings : dp->num_rx_rings;
 
-   num_vecs = max(rx ? rx->n_rings : dp->num_rx_rings, stack_tx_rings);
+   dp->num_r_vecs = max(rx ? rx->n_rings : dp->num_rx_rings,
+dp->num_stack_tx_rings);
 
err = nfp_net_check_config(nn, dp, *xdp_prog, rx, tx);
if (err)
 

[PATCH net-next 01/13] nfp: separate data path information from the reset of adapter structure

2017-03-10 Thread Jakub Kicinski
Move all data path information into a separate structure.  This way
we will be able to allocate new data path with all new rings etc.
and swap it in easily.

No functional changes.

Signed-off-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/nfp_net.h   | 105 ++--
 .../net/ethernet/netronome/nfp/nfp_net_common.c| 602 +++--
 .../net/ethernet/netronome/nfp/nfp_net_debugfs.c   |   4 +-
 .../net/ethernet/netronome/nfp/nfp_net_ethtool.c   |  64 +--
 drivers/net/ethernet/netronome/nfp/nfp_net_main.c  |  30 +-
 .../net/ethernet/netronome/nfp/nfp_net_offload.c   |  30 +-
 .../net/ethernet/netronome/nfp/nfp_netvf_main.c|  15 +-
 7 files changed, 436 insertions(+), 414 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net.h b/drivers/net/ethernet/netronome/nfp/nfp_net.h
index 34f8c439f42f..7d2c38604372 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net.h
@@ -50,14 +50,14 @@
 
 #include "nfp_net_ctrl.h"
 
-#define nn_err(nn, fmt, args...)  netdev_err((nn)->netdev, fmt, ## args)
-#define nn_warn(nn, fmt, args...) netdev_warn((nn)->netdev, fmt, ## args)
-#define nn_info(nn, fmt, args...) netdev_info((nn)->netdev, fmt, ## args)
-#define nn_dbg(nn, fmt, args...)  netdev_dbg((nn)->netdev, fmt, ## args)
-#define nn_warn_ratelimit(nn, fmt, args...)\
+#define nn_err(nn, fmt, args...)  netdev_err((nn)->dp.netdev, fmt, ## args)
+#define nn_warn(nn, fmt, args...) netdev_warn((nn)->dp.netdev, fmt, ## args)
+#define nn_info(nn, fmt, args...) netdev_info((nn)->dp.netdev, fmt, ## args)
+#define nn_dbg(nn, fmt, args...)  netdev_dbg((nn)->dp.netdev, fmt, ## args)
+#define nn_dp_warn(dp, fmt, args...)   \
do {\
if (unlikely(net_ratelimit()))  \
-   netdev_warn((nn)->netdev, fmt, ## args);\
+   netdev_warn((dp)->netdev, fmt, ## args);\
} while (0)
 
 /* Max time to wait for NFP to respond on updates (in seconds) */
@@ -434,18 +434,62 @@ struct nfp_stat_pair {
 };
 
 /**
- * struct nfp_net - NFP network device structure
+ * struct nfp_net_dp - NFP network device datapath data structure
  * @dev:   Backpointer to struct device
- * @netdev: Backpointer to net_device structure
- * @is_vf:  Is the driver attached to a VF?
+ * @netdev:		Backpointer to net_device structure
+ * @is_vf:		Is the driver attached to a VF?
  * @bpf_offload_skip_sw:  Offloaded BPF program will not be rerun by cls_bpf
  * @bpf_offload_xdp:   Offloaded BPF program is XDP
  * @chained_metadata_format:  Firemware will use new metadata format
- * @ctrl:   Local copy of the control register/word.
- * @fl_bufsz:   Currently configured size of the freelist buffers
+ * @ctrl:  Local copy of the control register/word.
+ * @fl_bufsz:  Currently configured size of the freelist buffers
  * @rx_offset: Offset in the RX buffers where packet data starts
  * @xdp_prog:  Installed XDP program
- * @fw_ver: Firmware version
+ * @tx_rings:  Array of pre-allocated TX ring structures
+ * @rx_rings:  Array of pre-allocated RX ring structures
+ *
+ * @txd_cnt:   Size of the TX ring in number of descriptors
+ * @rxd_cnt:   Size of the RX ring in number of descriptors
+ * @num_r_vecs:		Number of used ring vectors
+ * @num_tx_rings:	Currently configured number of TX rings
+ * @num_stack_tx_rings:	Number of TX rings used by the stack (not XDP)
+ * @num_rx_rings:  Currently configured number of RX rings
+ */
+struct nfp_net_dp {
+   struct device *dev;
+   struct net_device *netdev;
+
+   unsigned is_vf:1;
+   unsigned bpf_offload_skip_sw:1;
+   unsigned bpf_offload_xdp:1;
+   unsigned chained_metadata_format:1;
+
+   u32 ctrl;
+   u32 fl_bufsz;
+
+   u32 rx_offset;
+
+   struct bpf_prog *xdp_prog;
+
+   struct nfp_net_tx_ring *tx_rings;
+   struct nfp_net_rx_ring *rx_rings;
+
+   /* Cold data follows */
+
+   unsigned int txd_cnt;
+   unsigned int rxd_cnt;
+
+   unsigned int num_r_vecs;
+
+   unsigned int num_tx_rings;
+   unsigned int num_stack_tx_rings;
+   unsigned int num_rx_rings;
+};
+
+/**
+ * struct nfp_net - NFP network device structure
+ * @dp:			Datapath structure
+ * @fw_ver:		Firmware version
 * @cap:		Capabilities advertised by the Firmware
 * @max_mtu:		Maximum support MTU advertised by the Firmware
  * @rss_hfunc: RSS selected hash function
@@ -457,17 +501,9 @@ struct nfp_stat_pair {
  * @rx_filter_change:  Jiffies when statistics last changed
  * @rx_filter_stats_timer:  Timer for polling 

[PATCH net-next 00/13] nfp: XDP adjust head support

2017-03-10 Thread Jakub Kicinski
Hi!

This series adds support for XDP adjust head.  Bulk of the code
is actually just paying technical debt.  On reconfiguration request
nfp was allocating new resources separately leaving device running
with the existing set of rings.  We used to manage the new resources
in special ring set structures.  This set is simply separating the
datapath part of the device structure from the control information
allowing the new datapath structure to be allocated with all new
memory and rings.  The swap operation is now greatly simplified.
We also save a lot of parameter passing this way.  Hopefully the churn
is worth the negative diffstat.

Support for XDP adjust head is done in a pretty standard way.  NFP
is a bit special because it prepends metadata before packet data
so we have to do a bit of memcpying in case XDP will run.  We also
luck out a little bit because the fact that we already have prepend
space allocated means that one byte is enough to store the extra XDP
space (256 bytes of standard prepend space is a bit inconvenient since
it would normally require 16bits or boolean with additional shifts).

Jakub Kicinski (13):
  nfp: separate data path information from the reset of adapter
structure
  nfp: move control BAR pointer into data path structure
  nfp: pass new data path to ring reconfig
  nfp: use dp to carry number of stack tx rings and vectors
  nfp: use dp to carry fl_bufsz at reconfig time
  nfp: use dp to carry mtu at reconfig time
  nfp: use dp to carry xdp_prog at reconfig time
  nfp: switch to using data path structures for reconfiguration
  nfp: store dma direction in data path structure
  nfp: validate rx offset from the BAR and size down it's field
  nfp: reorganize pkt_off variable
  nfp: prepare metadata handling for xdp_adjust_head()
  nfp: add support for xdp_adjust_head()

 drivers/net/ethernet/netronome/nfp/nfp_net.h   | 150 ++--
 .../net/ethernet/netronome/nfp/nfp_net_common.c| 999 ++---
 .../net/ethernet/netronome/nfp/nfp_net_debugfs.c   |   4 +-
 .../net/ethernet/netronome/nfp/nfp_net_ethtool.c   | 104 +--
 drivers/net/ethernet/netronome/nfp/nfp_net_main.c  |  32 +-
 .../net/ethernet/netronome/nfp/nfp_net_offload.c   |  30 +-
 .../net/ethernet/netronome/nfp/nfp_netvf_main.c|  19 +-
 7 files changed, 655 insertions(+), 683 deletions(-)

-- 
2.11.0



[PATCH v5 net-next 1/9] net: stmmac: multiple queues dt configuration

2017-03-10 Thread Joao Pinto
This patch adds the multiple queues configuration in the Device Tree.
It was also created a set of structures to keep the RX and TX queues
configurations to be used in the driver.

Signed-off-by: Joao Pinto 
---
Changes v4->v5:
- patch title update (stmicro replaced by stmmac)
Changes v2->v4:
- Just to keep up with patch-set version
Changes v1->v2:
- RX and TX queues child nodes had bad handle

 Documentation/devicetree/bindings/net/stmmac.txt   | 40 ++
 .../net/ethernet/stmicro/stmmac/stmmac_platform.c  | 91 ++
 include/linux/stmmac.h | 30 +++
 3 files changed, 161 insertions(+)

diff --git a/Documentation/devicetree/bindings/net/stmmac.txt b/Documentation/devicetree/bindings/net/stmmac.txt
index d3bfc2b..4107e67 100644
--- a/Documentation/devicetree/bindings/net/stmmac.txt
+++ b/Documentation/devicetree/bindings/net/stmmac.txt
@@ -72,6 +72,27 @@ Optional properties:
- snps,mb: mixed-burst
- snps,rb: rebuild INCRx Burst
 - mdio: with compatible = "snps,dwmac-mdio", create and register mdio bus.
+- Multiple RX Queues parameters: below the list of all the parameters to
+configure the multiple RX queues:
+   - snps,rx-queues-to-use: number of RX queues to be used in the driver
+   - Choose one of these RX scheduling algorithms:
+   - snps,rx-sched-sp: Strict priority
+   - snps,rx-sched-wsp: Weighted Strict priority
+   - For each RX queue
+   - Choose one of these modes:
+   - snps,dcb-algorithm: Queue to be enabled as DCB
+   - snps,avb-algorithm: Queue to be enabled as AVB
+   - snps,map-to-dma-channel: Channel to map
+- Multiple TX Queues parameters: below the list of all the parameters to
+configure the multiple TX queues:
+   - snps,tx-queues-to-use: number of TX queues to be used in the driver
+   - Choose one of these TX scheduling algorithms:
+   - snps,tx-sched-wrr: Weighted Round Robin
+   - snps,tx-sched-wfq: Weighted Fair Queuing
+   - snps,tx-sched-dwrr: Deficit Weighted Round Robin
+   - snps,tx-sched-sp: Strict priority
+   - For each TX queue
+   - snps,weight: TX queue weight (if using a weighted algorithm)
 
 Examples:
 
@@ -81,6 +102,23 @@ Examples:
snps,blen = <256 128 64 32 0 0 0>;
};
 
+   mtl_rx_setup: rx-queues-config {
+   snps,rx-queues-to-use = <1>;
+   snps,rx-sched-sp;
+   queue0 {
+   snps,dcb-algorithm;
+   snps,map-to-dma-channel = <0x0>;
+   };
+   };
+
+   mtl_tx_setup: tx-queues-config {
+   snps,tx-queues-to-use = <1>;
+   snps,tx-sched-wrr;
+   queue0 {
+   snps,weight = <0x10>;
+   };
+   };
+
gmac0: ethernet@e080 {
compatible = "st,spear600-gmac";
reg = <0xe080 0x8000>;
@@ -104,4 +142,6 @@ Examples:
phy1: ethernet-phy@0 {
};
};
+   snps,mtl-rx-config = <&mtl_rx_setup>;
+   snps,mtl-tx-config = <&mtl_tx_setup>;
};
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
index 433a842..ff6af8d 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
@@ -132,6 +132,95 @@ static struct stmmac_axi *stmmac_axi_setup(struct platform_device *pdev)
 }
 
 /**
+ * stmmac_mtl_setup - parse DT parameters for multiple queues configuration
+ * @pdev: platform device
+ */
+static void stmmac_mtl_setup(struct platform_device *pdev,
+struct plat_stmmacenet_data *plat)
+{
+   struct device_node *q_node;
+   struct device_node *rx_node;
+   struct device_node *tx_node;
+   u8 queue = 0;
+
+   rx_node = of_parse_phandle(pdev->dev.of_node, "snps,mtl-rx-config", 0);
+   if (!rx_node)
+   return;
+
+   tx_node = of_parse_phandle(pdev->dev.of_node, "snps,mtl-tx-config", 0);
+   if (!tx_node) {
+   of_node_put(rx_node);
+   return;
+   }
+
+   /* Processing RX queues common config */
+   if (of_property_read_u8(rx_node, "snps,rx-queues-to-use",
+   &plat->rx_queues_to_use))
+   plat->rx_queues_to_use = 1;
+
+   if (of_property_read_bool(rx_node, "snps,rx-sched-sp"))
+   plat->rx_sched_algorithm = MTL_RX_ALGORITHM_SP;
+   else if (of_property_read_bool(rx_node, "snps,rx-sched-wsp"))
+   plat->rx_sched_algorithm = MTL_RX_ALGORITHM_WSP;
+   else
+   plat->rx_sched_algorithm = MTL_RX_ALGORITHM_SP;
+
+   /* Processing 

[PATCH v5 net-next 9/9] net: stmmac: configuration of CBS in case of a TX AVB queue

2017-03-10 Thread Joao Pinto
This patch adds the configuration of the AVB Credit-Based Shaper.

Signed-off-by: Joao Pinto 
---
Changes v4->v5:
- patch title update (stmicro replaced by stmmac)
Changes v3->v4:
- variable init was simplified in some functions
Changes v1->v3:
- Added in v3

 Documentation/devicetree/bindings/net/stmmac.txt   | 22 +--
 drivers/net/ethernet/stmicro/stmmac/common.h   |  4 ++
 drivers/net/ethernet/stmicro/stmmac/dwmac4.h   | 33 
 drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c  | 46 +-
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c  | 29 ++
 .../net/ethernet/stmicro/stmmac/stmmac_platform.c  | 29 --
 include/linux/stmmac.h | 12 --
 7 files changed, 164 insertions(+), 11 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/stmmac.txt b/Documentation/devicetree/bindings/net/stmmac.txt
index 4107e67..9b4d5dd 100644
--- a/Documentation/devicetree/bindings/net/stmmac.txt
+++ b/Documentation/devicetree/bindings/net/stmmac.txt
@@ -92,8 +92,15 @@ Optional properties:
- snps,tx-sched-dwrr: Deficit Weighted Round Robin
- snps,tx-sched-sp: Strict priority
- For each TX queue
-   - snps,weight: TX queue weight (if using a weighted algorithm)
-
+   - snps,weight: TX queue weight (if using a DCB weight algorithm)
+   - Choose one of these modes:
+   - snps,dcb-algorithm: TX queue will be working in DCB
+   - snps,avb-algorithm: TX queue will be working in AVB
+   - Configure the Credit-Based Shaper (if AVB mode is selected):
+   - snps,send_slope: CBS send slope value
+   - snps,idle_slope: CBS idle slope value
+   - snps,high_credit: CBS hiCredit value
+   - snps,low_credit: CBS loCredit value
 Examples:
 
stmmac_axi_setup: stmmac-axi-config {
@@ -112,10 +119,19 @@ Examples:
};
 
mtl_tx_setup: tx-queues-config {
-   snps,tx-queues-to-use = <1>;
+   snps,tx-queues-to-use = <2>;
snps,tx-sched-wrr;
queue0 {
snps,weight = <0x10>;
+   snps,dcb-algorithm;
+   };
+
+   queue1 {
+   snps,avb-algorithm;
+   snps,send_slope = <0x1000>;
+   snps,idle_slope = <0x1000>;
+   snps,high_credit = <0x3E800>;
+   snps,low_credit = <0xFFC18000>;
};
};
 
diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h b/drivers/net/ethernet/stmicro/stmmac/common.h
index 86f43ac..e3ced07 100644
--- a/drivers/net/ethernet/stmicro/stmmac/common.h
+++ b/drivers/net/ethernet/stmicro/stmmac/common.h
@@ -464,6 +464,10 @@ struct stmmac_ops {
u32 weight, u32 queue);
/* RX MTL queue to RX dma mapping */
void (*map_mtl_to_dma)(struct mac_device_info *hw, u32 queue, u32 chan);
+   /* Configure AV Algorithm */
+   void (*config_cbs)(struct mac_device_info *hw, u32 send_slope,
+  u32 idle_slope, u32 high_credit, u32 low_credit,
+  u32 queue);
/* Dump MAC registers */
void (*dump_regs)(struct mac_device_info *hw, u32 *reg_space);
/* Handle extra events on specific interrupts hw dependent */
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4.h b/drivers/net/ethernet/stmicro/stmmac/dwmac4.h
index 5ca4d64..cf0e602 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4.h
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4.h
@@ -218,6 +218,15 @@ enum power_event {
 #define MTL_OP_MODE_RTC_96 (2 << MTL_OP_MODE_RTC_SHIFT)
 #define MTL_OP_MODE_RTC_128(3 << MTL_OP_MODE_RTC_SHIFT)
 
+/* MTL ETS Control register */
+#define MTL_ETS_CTRL_BASE_ADDR 0x0d10
+#define MTL_ETS_CTRL_BASE_OFFSET   0x40
+#define MTL_ETSX_CTRL_BASE_ADDR(x) (MTL_ETS_CTRL_BASE_ADDR + \
+   ((x) * MTL_ETS_CTRL_BASE_OFFSET))
+
+#define MTL_ETS_CTRL_CC		BIT(3)
+#define MTL_ETS_CTRL_AVALG BIT(2)
+
 /* MTL Queue Quantum Weight */
 #define MTL_TXQ_WEIGHT_BASE_ADDR   0x0d18
 #define MTL_TXQ_WEIGHT_BASE_OFFSET 0x40
@@ -225,6 +234,30 @@ enum power_event {
((x) * MTL_TXQ_WEIGHT_BASE_OFFSET))
 #define MTL_TXQ_WEIGHT_ISCQW_MASK  GENMASK(20, 0)
 
+/* MTL sendSlopeCredit register */
+#define MTL_SEND_SLP_CRED_BASE_ADDR0x0d1c
+#define MTL_SEND_SLP_CRED_OFFSET   0x40
+#define MTL_SEND_SLP_CREDX_BASE_ADDR(x)(MTL_SEND_SLP_CRED_BASE_ADDR + \
+   ((x) * MTL_SEND_SLP_CRED_OFFSET))
+
+#define MTL_SEND_SLP_CRED_SSC_MASK 

[PATCH v5 net-next 4/9] net: stmmac: mtl rx queue enabled as dcb or avb

2017-03-10 Thread Joao Pinto
This patch introduces the enabling of RX queues as DCB or as AVB based
on configuration.

Signed-off-by: Joao Pinto 
---
Changes v4->v5:
- patch title update (stmicro replaced by stmmac)
Changes v3->v4:
- variable init was simplified in some functions
Changes v1->v3:
- Just to keep up with patch-set version

 drivers/net/ethernet/stmicro/stmmac/common.h  |  2 +-
 drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c |  8 ++--
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 19 +++
 3 files changed, 14 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h b/drivers/net/ethernet/stmicro/stmmac/common.h
index a25b2f8..ad89c47 100644
--- a/drivers/net/ethernet/stmicro/stmmac/common.h
+++ b/drivers/net/ethernet/stmicro/stmmac/common.h
@@ -454,7 +454,7 @@ struct stmmac_ops {
/* Enable and verify that the IPC module is supported */
int (*rx_ipc)(struct mac_device_info *hw);
/* Enable RX Queues */
-   void (*rx_queue_enable)(struct mac_device_info *hw, u32 queue);
+   void (*rx_queue_enable)(struct mac_device_info *hw, u8 mode, u32 queue);
/* Program RX Algorithms */
void (*prog_mtl_rx_algorithms)(struct mac_device_info *hw, u32 rx_alg);
/* Program TX Algorithms */
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
index fda6cfa..21a696e 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
@@ -59,13 +59,17 @@ static void dwmac4_core_init(struct mac_device_info *hw, int mtu)
writel(value, ioaddr + GMAC_INT_EN);
 }
 
-static void dwmac4_rx_queue_enable(struct mac_device_info *hw, u32 queue)
+static void dwmac4_rx_queue_enable(struct mac_device_info *hw,
+  u8 mode, u32 queue)
 {
void __iomem *ioaddr = hw->pcsr;
u32 value = readl(ioaddr + GMAC_RXQ_CTRL0);
 
value &= GMAC_RX_QUEUE_CLEAR(queue);
-   value |= GMAC_RX_AV_QUEUE_ENABLE(queue);
+   if (mode == MTL_RX_AVB)
+   value |= GMAC_RX_AV_QUEUE_ENABLE(queue);
+   else if (mode == MTL_RX_DCB)
+   value |= GMAC_RX_DCB_QUEUE_ENABLE(queue);
 
writel(value, ioaddr + GMAC_RXQ_CTRL0);
 }
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 735540f..c3e2dbf 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -1256,19 +1256,14 @@ static void free_dma_desc_resources(struct stmmac_priv *priv)
  */
 static void stmmac_mac_enable_rx_queues(struct stmmac_priv *priv)
 {
-   int rx_count = priv->dma_cap.number_rx_queues;
-   int queue = 0;
-
-   /* If GMAC does not have multiple queues, then this is not necessary*/
-   if (rx_count == 1)
-   return;
+   u32 rx_queues_count = priv->plat->rx_queues_to_use;
+   int queue;
+   u8 mode;
 
-   /**
-*  If the core is synthesized with multiple rx queues / multiple
-*  dma channels, then rx queues will be disabled by default.
-*  For now only rx queue 0 is enabled.
-*/
-   priv->hw->mac->rx_queue_enable(priv->hw, queue);
+   for (queue = 0; queue < rx_queues_count; queue++) {
+   mode = priv->plat->rx_queues_cfg[queue].mode_to_use;
+   priv->hw->mac->rx_queue_enable(priv->hw, mode, queue);
+   }
 }
 
 /**
-- 
2.9.3



[PATCH v5 net-next 8/9] net: stmmac: mac debug prepared for multiple queues

2017-03-10 Thread Joao Pinto
This patch prepares mac debug dump for multiple queues.

Signed-off-by: Joao Pinto 
---
Changes v4->v5:
- patch title update (stmicro replaced by stmmac)
Changes v3->v4:
- Debug function now gets all rx / tx queues data
Changes v1->v3:
- Just to keep up with patch-set version

 drivers/net/ethernet/stmicro/stmmac/common.h   |   3 +-
 .../net/ethernet/stmicro/stmmac/dwmac1000_core.c   |   3 +-
 drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c  | 107 +++--
 .../net/ethernet/stmicro/stmmac/stmmac_ethtool.c   |   5 +-
 4 files changed, 64 insertions(+), 54 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h b/drivers/net/ethernet/stmicro/stmmac/common.h
index 6a348d3..86f43ac 100644
--- a/drivers/net/ethernet/stmicro/stmmac/common.h
+++ b/drivers/net/ethernet/stmicro/stmmac/common.h
@@ -488,7 +488,8 @@ struct stmmac_ops {
void (*reset_eee_mode)(struct mac_device_info *hw);
void (*set_eee_timer)(struct mac_device_info *hw, int ls, int tw);
void (*set_eee_pls)(struct mac_device_info *hw, int link);
-   void (*debug)(void __iomem *ioaddr, struct stmmac_extra_stats *x);
+   void (*debug)(void __iomem *ioaddr, struct stmmac_extra_stats *x,
+ u32 rx_queues, u32 tx_queues);
/* PCS calls */
void (*pcs_ctrl_ane)(void __iomem *ioaddr, bool ane, bool srgmi_ral,
 bool loopback);
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
index 3a95ad9..7f78f77 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
@@ -413,7 +413,8 @@ static void dwmac1000_get_adv_lp(void __iomem *ioaddr, struct rgmii_adv *adv)
dwmac_get_adv_lp(ioaddr, GMAC_PCS_BASE, adv);
 }
 
-static void dwmac1000_debug(void __iomem *ioaddr, struct stmmac_extra_stats *x)
+static void dwmac1000_debug(void __iomem *ioaddr, struct stmmac_extra_stats *x,
+   u32 rx_queues, u32 tx_queues)
 {
u32 value = readl(ioaddr + GMAC_DEBUG);
 
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
index f0f2dce..670cfee 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
@@ -469,64 +469,69 @@ static int dwmac4_irq_status(struct mac_device_info *hw,
return ret;
 }
 
-static void dwmac4_debug(void __iomem *ioaddr, struct stmmac_extra_stats *x)
+static void dwmac4_debug(void __iomem *ioaddr, struct stmmac_extra_stats *x,
+u32 rx_queues, u32 tx_queues)
 {
u32 value;
-
-   /*  Currently only channel 0 is supported */
-   value = readl(ioaddr + MTL_CHAN_TX_DEBUG(STMMAC_CHAN0));
-
-   if (value & MTL_DEBUG_TXSTSFSTS)
-   x->mtl_tx_status_fifo_full++;
-   if (value & MTL_DEBUG_TXFSTS)
-   x->mtl_tx_fifo_not_empty++;
-   if (value & MTL_DEBUG_TWCSTS)
-   x->mmtl_fifo_ctrl++;
-   if (value & MTL_DEBUG_TRCSTS_MASK) {
-   u32 trcsts = (value & MTL_DEBUG_TRCSTS_MASK)
->> MTL_DEBUG_TRCSTS_SHIFT;
-   if (trcsts == MTL_DEBUG_TRCSTS_WRITE)
-   x->mtl_tx_fifo_read_ctrl_write++;
-   else if (trcsts == MTL_DEBUG_TRCSTS_TXW)
-   x->mtl_tx_fifo_read_ctrl_wait++;
-   else if (trcsts == MTL_DEBUG_TRCSTS_READ)
-   x->mtl_tx_fifo_read_ctrl_read++;
-   else
-   x->mtl_tx_fifo_read_ctrl_idle++;
+   u32 queue;
+
+   for (queue = 0; queue < tx_queues; queue++) {
+   value = readl(ioaddr + MTL_CHAN_TX_DEBUG(queue));
+
+   if (value & MTL_DEBUG_TXSTSFSTS)
+   x->mtl_tx_status_fifo_full++;
+   if (value & MTL_DEBUG_TXFSTS)
+   x->mtl_tx_fifo_not_empty++;
+   if (value & MTL_DEBUG_TWCSTS)
+   x->mmtl_fifo_ctrl++;
+   if (value & MTL_DEBUG_TRCSTS_MASK) {
+   u32 trcsts = (value & MTL_DEBUG_TRCSTS_MASK)
+>> MTL_DEBUG_TRCSTS_SHIFT;
+   if (trcsts == MTL_DEBUG_TRCSTS_WRITE)
+   x->mtl_tx_fifo_read_ctrl_write++;
+   else if (trcsts == MTL_DEBUG_TRCSTS_TXW)
+   x->mtl_tx_fifo_read_ctrl_wait++;
+   else if (trcsts == MTL_DEBUG_TRCSTS_READ)
+   x->mtl_tx_fifo_read_ctrl_read++;
+   else
+   x->mtl_tx_fifo_read_ctrl_idle++;
+   }
+   if (value & MTL_DEBUG_TXPAUSED)
+   x->mac_tx_in_pause++;
}
-   if (value & MTL_DEBUG_TXPAUSED)
-   x->mac_tx_in_pause++;
 
-  

[PATCH v5 net-next 3/9] net: stmmac: configure tx queue weight

2017-03-10 Thread Joao Pinto
This patch adds TX queues weight programming.

Signed-off-by: Joao Pinto 
---
Changes v4->v5:
- patch title update (stmicro replaced by stmmac)
Changes v3->v4:
- variable init was simplified in some functions
Changes v2->v3:
- local variable declarations from longest to shortest line
Changes v1->v2:
- Just to keep up with patch-set version

 drivers/net/ethernet/stmicro/stmmac/common.h  |  3 +++
 drivers/net/ethernet/stmicro/stmmac/dwmac4.h  |  7 +++
 drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c | 12 
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 20 
 4 files changed, 42 insertions(+)

diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h b/drivers/net/ethernet/stmicro/stmmac/common.h
index 5a0a781..a25b2f8 100644
--- a/drivers/net/ethernet/stmicro/stmmac/common.h
+++ b/drivers/net/ethernet/stmicro/stmmac/common.h
@@ -459,6 +459,9 @@ struct stmmac_ops {
void (*prog_mtl_rx_algorithms)(struct mac_device_info *hw, u32 rx_alg);
/* Program TX Algorithms */
void (*prog_mtl_tx_algorithms)(struct mac_device_info *hw, u32 tx_alg);
+   /* Set MTL TX queues weight */
+   void (*set_mtl_tx_queue_weight)(struct mac_device_info *hw,
+   u32 weight, u32 queue);
/* Dump MAC registers */
void (*dump_regs)(struct mac_device_info *hw, u32 *reg_space);
/* Handle extra events on specific interrupts hw dependent */
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4.h b/drivers/net/ethernet/stmicro/stmmac/dwmac4.h
index 748ab6f..7d77e78 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4.h
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4.h
@@ -211,6 +211,13 @@ enum power_event {
 #define MTL_OP_MODE_RTC_96 (2 << MTL_OP_MODE_RTC_SHIFT)
 #define MTL_OP_MODE_RTC_128(3 << MTL_OP_MODE_RTC_SHIFT)
 
+/* MTL Queue Quantum Weight */
+#define MTL_TXQ_WEIGHT_BASE_ADDR   0x0d18
+#define MTL_TXQ_WEIGHT_BASE_OFFSET 0x40
+#define MTL_TXQX_WEIGHT_BASE_ADDR(x)   (MTL_TXQ_WEIGHT_BASE_ADDR + \
+   ((x) * MTL_TXQ_WEIGHT_BASE_OFFSET))
+#define MTL_TXQ_WEIGHT_ISCQW_MASK  GENMASK(20, 0)
+
 /*  MTL debug */
 #define MTL_DEBUG_TXSTSFSTS	BIT(5)
 #define MTL_DEBUG_TXFSTS   BIT(4)
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
index f966755..fda6cfa 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
@@ -116,6 +116,17 @@ static void dwmac4_prog_mtl_tx_algorithms(struct mac_device_info *hw,
}
 }
 
+static void dwmac4_set_mtl_tx_queue_weight(struct mac_device_info *hw,
+  u32 weight, u32 queue)
+{
+   void __iomem *ioaddr = hw->pcsr;
+   u32 value = readl(ioaddr + MTL_TXQX_WEIGHT_BASE_ADDR(queue));
+
+   value &= ~MTL_TXQ_WEIGHT_ISCQW_MASK;
+   value |= weight & MTL_TXQ_WEIGHT_ISCQW_MASK;
+   writel(value, ioaddr + MTL_TXQX_WEIGHT_BASE_ADDR(queue));
+}
+
 static void dwmac4_dump_regs(struct mac_device_info *hw, u32 *reg_space)
 {
void __iomem *ioaddr = hw->pcsr;
@@ -505,6 +516,7 @@ static const struct stmmac_ops dwmac4_ops = {
.rx_queue_enable = dwmac4_rx_queue_enable,
.prog_mtl_rx_algorithms = dwmac4_prog_mtl_rx_algorithms,
.prog_mtl_tx_algorithms = dwmac4_prog_mtl_tx_algorithms,
+   .set_mtl_tx_queue_weight = dwmac4_set_mtl_tx_queue_weight,
.dump_regs = dwmac4_dump_regs,
.host_irq_status = dwmac4_irq_status,
.flow_ctrl = dwmac4_flow_ctrl,
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index af57f8d..735540f 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -1645,6 +1645,23 @@ static void stmmac_init_tx_coalesce(struct stmmac_priv *priv)
 }
 
 /**
+ *  stmmac_set_tx_queue_weight - Set TX queue weight
+ *  @priv: driver private structure
+ *  Description: It is used for setting TX queues weight
+ */
+static void stmmac_set_tx_queue_weight(struct stmmac_priv *priv)
+{
+   u32 tx_queues_count = priv->plat->tx_queues_to_use;
+   u32 weight;
+   u32 queue;
+
+   for (queue = 0; queue < tx_queues_count; queue++) {
+   weight = priv->plat->tx_queues_cfg[queue].weight;
+   priv->hw->mac->set_mtl_tx_queue_weight(priv->hw, weight, queue);
+   }
+}
+
+/**
  *  stmmac_mtl_configuration - Configure MTL
  *  @priv: driver private structure
  *  Description: It is used for configurring MTL
@@ -1654,6 +1671,9 @@ static void stmmac_mtl_configuration(struct stmmac_priv 
*priv)
u32 rx_queues_count = priv->plat->rx_queues_to_use;
u32 tx_queues_count = priv->plat->tx_queues_to_use;
 
+   if (tx_queues_count > 1 && 

[PATCH v5 net-next 6/9] net: stmmac: flow_ctrl functions adapted to mtl

2017-03-10 Thread Joao Pinto
This patch adapts the flow_ctrl functions to prepare them for multiple queues.

Signed-off-by: Joao Pinto 
---
Changes v4->v5:
- patch title update (stmicro replaced by stmmac)
Changes v3->v4:
- variable init simplified in some functions
Changes v1->v3:
- Just to keep up with patch-set version

 drivers/net/ethernet/stmicro/stmmac/common.h |  2 +-
 drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c |  3 ++-
 drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c  |  3 ++-
 drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c| 20 +---
 drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c |  3 ++-
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c| 17 ++---
 6 files changed, 34 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h 
b/drivers/net/ethernet/stmicro/stmmac/common.h
index 32f5f25..5532633 100644
--- a/drivers/net/ethernet/stmicro/stmmac/common.h
+++ b/drivers/net/ethernet/stmicro/stmmac/common.h
@@ -473,7 +473,7 @@ struct stmmac_ops {
void (*set_filter)(struct mac_device_info *hw, struct net_device *dev);
/* Flow control setting */
void (*flow_ctrl)(struct mac_device_info *hw, unsigned int duplex,
- unsigned int fc, unsigned int pause_time);
+ unsigned int fc, unsigned int pause_time, u32 tx_cnt);
/* Set power management mode (e.g. magic frame) */
void (*pmt)(struct mac_device_info *hw, unsigned long mode);
/* Set/Get Unicast MAC addresses */
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c 
b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
index 19b9b30..3a95ad9 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
@@ -216,7 +216,8 @@ static void dwmac1000_set_filter(struct mac_device_info *hw,
 
 
 static void dwmac1000_flow_ctrl(struct mac_device_info *hw, unsigned int 
duplex,
-   unsigned int fc, unsigned int pause_time)
+   unsigned int fc, unsigned int pause_time,
+   u32 tx_cnt)
 {
void __iomem *ioaddr = hw->pcsr;
/* Set flow such that DZPQ in Mac Register 6 is 0,
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c 
b/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c
index e370cce..524135e 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c
@@ -131,7 +131,8 @@ static void dwmac100_set_filter(struct mac_device_info *hw,
 }
 
 static void dwmac100_flow_ctrl(struct mac_device_info *hw, unsigned int duplex,
-  unsigned int fc, unsigned int pause_time)
+  unsigned int fc, unsigned int pause_time,
+  u32 tx_cnt)
 {
void __iomem *ioaddr = hw->pcsr;
unsigned int flow = MAC_FLOW_CTRL_ENABLE;
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c 
b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
index e9b153f..3069def 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
@@ -336,11 +336,12 @@ static void dwmac4_set_filter(struct mac_device_info *hw,
 }
 
 static void dwmac4_flow_ctrl(struct mac_device_info *hw, unsigned int duplex,
-unsigned int fc, unsigned int pause_time)
+unsigned int fc, unsigned int pause_time,
+u32 tx_cnt)
 {
void __iomem *ioaddr = hw->pcsr;
-   u32 channel = STMMAC_CHAN0; /* FIXME */
unsigned int flow = 0;
+   u32 queue = 0;
 
pr_debug("GMAC Flow-Control:\n");
if (fc & FLOW_RX) {
@@ -350,13 +351,18 @@ static void dwmac4_flow_ctrl(struct mac_device_info *hw, 
unsigned int duplex,
}
if (fc & FLOW_TX) {
pr_debug("\tTransmit Flow-Control ON\n");
-   flow |= GMAC_TX_FLOW_CTRL_TFE;
-   writel(flow, ioaddr + GMAC_QX_TX_FLOW_CTRL(channel));
 
-   if (duplex) {
+   if (duplex)
pr_debug("\tduplex mode: PAUSE %d\n", pause_time);
-   flow |= (pause_time << GMAC_TX_FLOW_CTRL_PT_SHIFT);
-   writel(flow, ioaddr + GMAC_QX_TX_FLOW_CTRL(channel));
+
+   for (queue = 0; queue < tx_cnt; queue++) {
+   flow |= GMAC_TX_FLOW_CTRL_TFE;
+
+   if (duplex)
+   flow |=
+   (pause_time << GMAC_TX_FLOW_CTRL_PT_SHIFT);
+
+   writel(flow, ioaddr + GMAC_QX_TX_FLOW_CTRL(queue));
}
}
 }
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c 
b/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c
index 85d6411..4a5dc89 100644
--- 

[PATCH v5 net-next 2/9] net: stmmac: configure mtl rx and tx algorithms

2017-03-10 Thread Joao Pinto
This patch adds programming of the RX and TX scheduling algorithms.
It introduces the multiple-queue configuration function
(stmmac_mtl_configuration) in stmmac_main.

Signed-off-by: Joao Pinto 
---
Changes v4->v5:
- patch title update (stmicro replaced by stmmac)
Changes v3->v4:
- Just to keep up with patch-set version
Changes v2->v3:
- Switch statements with a tab
Changes v1->v2:
- Just to keep up with patch-set version

 drivers/net/ethernet/stmicro/stmmac/common.h  |  4 ++
 drivers/net/ethernet/stmicro/stmmac/dwmac4.h  | 10 +
 drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c | 48 +++
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 31 +--
 4 files changed, 90 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h 
b/drivers/net/ethernet/stmicro/stmmac/common.h
index 04d9245..5a0a781 100644
--- a/drivers/net/ethernet/stmicro/stmmac/common.h
+++ b/drivers/net/ethernet/stmicro/stmmac/common.h
@@ -455,6 +455,10 @@ struct stmmac_ops {
int (*rx_ipc)(struct mac_device_info *hw);
/* Enable RX Queues */
void (*rx_queue_enable)(struct mac_device_info *hw, u32 queue);
+   /* Program RX Algorithms */
+   void (*prog_mtl_rx_algorithms)(struct mac_device_info *hw, u32 rx_alg);
+   /* Program TX Algorithms */
+   void (*prog_mtl_tx_algorithms)(struct mac_device_info *hw, u32 tx_alg);
/* Dump MAC registers */
void (*dump_regs)(struct mac_device_info *hw, u32 *reg_space);
/* Handle extra events on specific interrupts hw dependent */
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4.h 
b/drivers/net/ethernet/stmicro/stmmac/dwmac4.h
index db45134..748ab6f 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4.h
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4.h
@@ -161,6 +161,16 @@ enum power_event {
 #define GMAC_HI_REG_AE BIT(31)
 
 /*  MTL registers */
+#define MTL_OPERATION_MODE 0x0c00
+#define MTL_OPERATION_SCHALG_MASK  GENMASK(6, 5)
+#define MTL_OPERATION_SCHALG_WRR   (0x0 << 5)
+#define MTL_OPERATION_SCHALG_WFQ   (0x1 << 5)
+#define MTL_OPERATION_SCHALG_DWRR  (0x2 << 5)
+#define MTL_OPERATION_SCHALG_SP(0x3 << 5)
+#define MTL_OPERATION_RAA  BIT(2)
+#define MTL_OPERATION_RAA_SP   (0x0 << 2)
+#define MTL_OPERATION_RAA_WSP  (0x1 << 2)
+
 #define MTL_INT_STATUS 0x0c20
 #define MTL_INT_Q0 BIT(0)
 
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c 
b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
index 1e79e65..f966755 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
@@ -70,6 +70,52 @@ static void dwmac4_rx_queue_enable(struct mac_device_info 
*hw, u32 queue)
writel(value, ioaddr + GMAC_RXQ_CTRL0);
 }
 
+static void dwmac4_prog_mtl_rx_algorithms(struct mac_device_info *hw,
+ u32 rx_alg)
+{
+   void __iomem *ioaddr = hw->pcsr;
+   u32 value = readl(ioaddr + MTL_OPERATION_MODE);
+
+   value &= ~MTL_OPERATION_RAA;
+   switch (rx_alg) {
+   case MTL_RX_ALGORITHM_SP:
+   value |= MTL_OPERATION_RAA_SP;
+   break;
+   case MTL_RX_ALGORITHM_WSP:
+   value |= MTL_OPERATION_RAA_WSP;
+   break;
+   default:
+   break;
+   }
+
+   writel(value, ioaddr + MTL_OPERATION_MODE);
+}
+
+static void dwmac4_prog_mtl_tx_algorithms(struct mac_device_info *hw,
+ u32 tx_alg)
+{
+   void __iomem *ioaddr = hw->pcsr;
+   u32 value = readl(ioaddr + MTL_OPERATION_MODE);
+
+   value &= ~MTL_OPERATION_SCHALG_MASK;
+   switch (tx_alg) {
+   case MTL_TX_ALGORITHM_WRR:
+   value |= MTL_OPERATION_SCHALG_WRR;
+   break;
+   case MTL_TX_ALGORITHM_WFQ:
+   value |= MTL_OPERATION_SCHALG_WFQ;
+   break;
+   case MTL_TX_ALGORITHM_DWRR:
+   value |= MTL_OPERATION_SCHALG_DWRR;
+   break;
+   case MTL_TX_ALGORITHM_SP:
+   value |= MTL_OPERATION_SCHALG_SP;
+   break;
+   default:
+   break;
+   }
+
+   writel(value, ioaddr + MTL_OPERATION_MODE);
+}
+
 static void dwmac4_dump_regs(struct mac_device_info *hw, u32 *reg_space)
 {
void __iomem *ioaddr = hw->pcsr;
@@ -457,6 +503,8 @@ static const struct stmmac_ops dwmac4_ops = {
.core_init = dwmac4_core_init,
.rx_ipc = dwmac4_rx_ipc_enable,
.rx_queue_enable = dwmac4_rx_queue_enable,
+   .prog_mtl_rx_algorithms = dwmac4_prog_mtl_rx_algorithms,
+   .prog_mtl_tx_algorithms = dwmac4_prog_mtl_tx_algorithms,
.dump_regs = dwmac4_dump_regs,
.host_irq_status = dwmac4_irq_status,
.flow_ctrl = dwmac4_flow_ctrl,
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c 

[PATCH v5 net-next 7/9] net: stmmac: prepare irq_status for mtl

2017-03-10 Thread Joao Pinto
This patch prepares the MAC IRQ status handling for multiple queues.

Signed-off-by: Joao Pinto 
---
Changes v4->v5:
- patch title update (stmicro replaced by stmmac)
Changes v3->v4:
- Build error in dwmac4_irq_mtl_status()
- variable init simplified in some functions
Changes v2->v3:
- local variable declarations from longest to shortest line
Changes v1->v2:
- Just to keep up with patch-set version

 drivers/net/ethernet/stmicro/stmmac/common.h  |  2 ++
 drivers/net/ethernet/stmicro/stmmac/dwmac4.h  |  2 +-
 drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c | 40 ++-
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c |  5 +++
 4 files changed, 33 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h 
b/drivers/net/ethernet/stmicro/stmmac/common.h
index 5532633..6a348d3 100644
--- a/drivers/net/ethernet/stmicro/stmmac/common.h
+++ b/drivers/net/ethernet/stmicro/stmmac/common.h
@@ -469,6 +469,8 @@ struct stmmac_ops {
/* Handle extra events on specific interrupts hw dependent */
int (*host_irq_status)(struct mac_device_info *hw,
   struct stmmac_extra_stats *x);
+   /* Handle MTL interrupts */
+   int (*host_mtl_irq_status)(struct mac_device_info *hw, u32 chan);
/* Multicast filter setting */
void (*set_filter)(struct mac_device_info *hw, struct net_device *dev);
/* Flow control setting */
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4.h 
b/drivers/net/ethernet/stmicro/stmmac/dwmac4.h
index 9dd8ac1..5ca4d64 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4.h
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4.h
@@ -172,7 +172,7 @@ enum power_event {
 #define MTL_OPERATION_RAA_WSP  (0x1 << 2)
 
 #define MTL_INT_STATUS 0x0c20
-#define MTL_INT_Q0 BIT(0)
+#define MTL_INT_QX(x)  BIT(x)
 
 #define MTL_RXQ_DMA_MAP0   0x0c30 /* queue 0 to 3 */
 #define MTL_RXQ_DMA_MAP1   0x0c34 /* queue 4 to 7 */
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c 
b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
index 3069def..f0f2dce 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
@@ -416,11 +416,34 @@ static void dwmac4_phystatus(void __iomem *ioaddr, struct 
stmmac_extra_stats *x)
}
 }
 
+static int dwmac4_irq_mtl_status(struct mac_device_info *hw, u32 chan)
+{
+   void __iomem *ioaddr = hw->pcsr;
+   u32 mtl_int_qx_status;
+   int ret = 0;
+
+   mtl_int_qx_status = readl(ioaddr + MTL_INT_STATUS);
+
+   /* Check MTL Interrupt */
+   if (mtl_int_qx_status & MTL_INT_QX(chan)) {
+   /* read Queue x Interrupt status */
+   u32 status = readl(ioaddr + MTL_CHAN_INT_CTRL(chan));
+
+   if (status & MTL_RX_OVERFLOW_INT) {
+   /*  clear Interrupt */
+   writel(status | MTL_RX_OVERFLOW_INT,
+  ioaddr + MTL_CHAN_INT_CTRL(chan));
+   ret = CORE_IRQ_MTL_RX_OVERFLOW;
+   }
+   }
+
+   return ret;
+}
+
 static int dwmac4_irq_status(struct mac_device_info *hw,
 struct stmmac_extra_stats *x)
 {
void __iomem *ioaddr = hw->pcsr;
-   u32 mtl_int_qx_status;
u32 intr_status;
int ret = 0;
 
@@ -439,20 +462,6 @@ static int dwmac4_irq_status(struct mac_device_info *hw,
x->irq_receive_pmt_irq_n++;
}
 
-   mtl_int_qx_status = readl(ioaddr + MTL_INT_STATUS);
-   /* Check MTL Interrupt: Currently only one queue is used: Q0. */
-   if (mtl_int_qx_status & MTL_INT_Q0) {
-   /* read Queue 0 Interrupt status */
-   u32 status = readl(ioaddr + MTL_CHAN_INT_CTRL(STMMAC_CHAN0));
-
-   if (status & MTL_RX_OVERFLOW_INT) {
-   /*  clear Interrupt */
-   writel(status | MTL_RX_OVERFLOW_INT,
-  ioaddr + MTL_CHAN_INT_CTRL(STMMAC_CHAN0));
-   ret = CORE_IRQ_MTL_RX_OVERFLOW;
-   }
-   }
-
dwmac_pcs_isr(ioaddr, GMAC_PCS_BASE, intr_status, x);
if (intr_status & PCS_RGSMIIIS_IRQ)
dwmac4_phystatus(ioaddr, x);
@@ -554,6 +563,7 @@ static const struct stmmac_ops dwmac4_ops = {
.map_mtl_to_dma = dwmac4_map_mtl_dma,
.dump_regs = dwmac4_dump_regs,
.host_irq_status = dwmac4_irq_status,
+   .host_mtl_irq_status = dwmac4_irq_mtl_status,
.flow_ctrl = dwmac4_flow_ctrl,
.pmt = dwmac4_pmt,
.set_umac_addr = dwmac4_set_umac_addr,
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c 
b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index ed57409..3be77d4 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ 

[PATCH v5 net-next 5/9] net: stmmac: mapping mtl rx to dma channel

2017-03-10 Thread Joao Pinto
This patch adds the mapping of RX queues to DMA channels based on
the configuration.

Signed-off-by: Joao Pinto 
---
Changes v4->v5:
- patch title update (stmicro replaced by stmmac)
Changes v3->v4:
- variable init was simplified in some functions
Changes v1->v3:
- Just to keep up with patch-set version

 drivers/net/ethernet/stmicro/stmmac/common.h  |  2 ++
 drivers/net/ethernet/stmicro/stmmac/dwmac4.h  |  7 +++
 drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c | 25 +++
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 21 +++
 4 files changed, 55 insertions(+)

diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h 
b/drivers/net/ethernet/stmicro/stmmac/common.h
index ad89c47..32f5f25 100644
--- a/drivers/net/ethernet/stmicro/stmmac/common.h
+++ b/drivers/net/ethernet/stmicro/stmmac/common.h
@@ -462,6 +462,8 @@ struct stmmac_ops {
/* Set MTL TX queues weight */
void (*set_mtl_tx_queue_weight)(struct mac_device_info *hw,
u32 weight, u32 queue);
+   /* RX MTL queue to RX dma mapping */
+   void (*map_mtl_to_dma)(struct mac_device_info *hw, u32 queue, u32 chan);
/* Dump MAC registers */
void (*dump_regs)(struct mac_device_info *hw, u32 *reg_space);
/* Handle extra events on specific interrupts hw dependent */
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4.h 
b/drivers/net/ethernet/stmicro/stmmac/dwmac4.h
index 7d77e78..9dd8ac1 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4.h
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4.h
@@ -174,6 +174,13 @@ enum power_event {
 #define MTL_INT_STATUS 0x0c20
 #define MTL_INT_Q0 BIT(0)
 
+#define MTL_RXQ_DMA_MAP0   0x0c30 /* queue 0 to 3 */
+#define MTL_RXQ_DMA_MAP1   0x0c34 /* queue 4 to 7 */
+#define MTL_RXQ_DMA_Q04MDMACH_MASK GENMASK(3, 0)
+#define MTL_RXQ_DMA_Q04MDMACH(x)   ((x) << 0)
+#define MTL_RXQ_DMA_QXMDMACH_MASK(x)   GENMASK(11 + (8 * ((x) - 1)), 8 * (x))
+#define MTL_RXQ_DMA_QXMDMACH(chan, q)  ((chan) << (8 * (q)))
+
 #define MTL_CHAN_BASE_ADDR 0x0d00
 #define MTL_CHAN_BASE_OFFSET   0x40
 #define MTL_CHANX_BASE_ADDR(x) (MTL_CHAN_BASE_ADDR + \
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c 
b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
index 21a696e..e9b153f 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
@@ -131,6 +131,30 @@ static void dwmac4_set_mtl_tx_queue_weight(struct 
mac_device_info *hw,
writel(value, ioaddr + MTL_TXQX_WEIGHT_BASE_ADDR(queue));
 }
 
+static void dwmac4_map_mtl_dma(struct mac_device_info *hw, u32 queue, u32 chan)
+{
+   void __iomem *ioaddr = hw->pcsr;
+   u32 value;
+
+   if (queue < 4)
+   value = readl(ioaddr + MTL_RXQ_DMA_MAP0);
+   else
+   value = readl(ioaddr + MTL_RXQ_DMA_MAP1);
+
+   if (queue == 0 || queue == 4) {
+   value &= ~MTL_RXQ_DMA_Q04MDMACH_MASK;
+   value |= MTL_RXQ_DMA_Q04MDMACH(chan);
+   } else {
+   value &= ~MTL_RXQ_DMA_QXMDMACH_MASK(queue);
+   value |= MTL_RXQ_DMA_QXMDMACH(chan, queue);
+   }
+
+   if (queue < 4)
+   writel(value, ioaddr + MTL_RXQ_DMA_MAP0);
+   else
+   writel(value, ioaddr + MTL_RXQ_DMA_MAP1);
+}
+
 static void dwmac4_dump_regs(struct mac_device_info *hw, u32 *reg_space)
 {
void __iomem *ioaddr = hw->pcsr;
@@ -521,6 +545,7 @@ static const struct stmmac_ops dwmac4_ops = {
.prog_mtl_rx_algorithms = dwmac4_prog_mtl_rx_algorithms,
.prog_mtl_tx_algorithms = dwmac4_prog_mtl_tx_algorithms,
.set_mtl_tx_queue_weight = dwmac4_set_mtl_tx_queue_weight,
+   .map_mtl_to_dma = dwmac4_map_mtl_dma,
.dump_regs = dwmac4_dump_regs,
.host_irq_status = dwmac4_irq_status,
.flow_ctrl = dwmac4_flow_ctrl,
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c 
b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index c3e2dbf..619bcc6 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -1657,6 +1657,23 @@ static void stmmac_set_tx_queue_weight(struct 
stmmac_priv *priv)
 }
 
 /**
+ *  stmmac_rx_queue_dma_chan_map - Map RX queue to RX dma channel
+ *  @priv: driver private structure
+ *  Description: It is used for mapping RX queues to RX dma channels
+ */
+static void stmmac_rx_queue_dma_chan_map(struct stmmac_priv *priv)
+{
+   u32 rx_queues_count = priv->plat->rx_queues_to_use;
+   u32 queue;
+   u32 chan;
+
+   for (queue = 0; queue < rx_queues_count; queue++) {
+   chan = priv->plat->rx_queues_cfg[queue].chan;
+   priv->hw->mac->map_mtl_to_dma(priv->hw, queue, chan);
+   }
+}
+
+/**
  *  

[PATCH v5 net-next 0/9] prepare mac operations for multiple queues

2017-03-10 Thread Joao Pinto
As agreed with David Miller, this patch-set is the first of 3 to enable
multiple queues in stmmac.

This first one concentrates on MAC operations, adding functionality such as:
a) Configuration through DT
b) RX and TX scheduling algorithms programming
c) TX queues weight programming (essential in weighted algorithms)
d) RX enable as DCB or AVB (preparing for future AVB support)
e) Mapping RX queue to DMA channel
f) IRQ treatment prepared for multiple queues
g) Debug dump prepared for multiple queues
h) CBS configuration

In v3 patch-set version I included a new patch to enable CBS configuration
(Patch 9).

Joao Pinto (9):
  net: stmmac: multiple queues dt configuration
  net: stmmac: configure mtl rx and tx algorithms
  net: stmmac: configure tx queue weight
  net: stmmac: mtl rx queue enabled as dcb or avb
  net: stmmac: mapping mtl rx to dma channel
  net: stmmac: flow_ctrl functions adapted to mtl
  net: stmmac: prepare irq_status for mtl
  net: stmmac: mac debug prepared for multiple queues
  net: stmmac: configuration of CBS in case of a TX AVB queue

 Documentation/devicetree/bindings/net/stmmac.txt   |  58 +++-
 drivers/net/ethernet/stmicro/stmmac/common.h   |  22 +-
 .../net/ethernet/stmicro/stmmac/dwmac1000_core.c   |   6 +-
 .../net/ethernet/stmicro/stmmac/dwmac100_core.c|   3 +-
 drivers/net/ethernet/stmicro/stmmac/dwmac4.h   |  59 +++-
 drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c  | 302 -
 .../net/ethernet/stmicro/stmmac/stmmac_ethtool.c   |   8 +-
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c  | 142 --
 .../net/ethernet/stmicro/stmmac/stmmac_platform.c  | 114 
 include/linux/stmmac.h |  36 +++
 10 files changed, 647 insertions(+), 103 deletions(-)

-- 
2.9.3



Re: [PATCH net] amd-xgbe: Enable IRQs only if napi_complete_done() is true

2017-03-10 Thread David Miller
From: Tom Lendacky 
Date: Fri, 10 Mar 2017 08:19:37 -0600

> On 3/9/2017 8:31 PM, David Miller wrote:
>> From: Tom Lendacky 
>> Date: Thu, 9 Mar 2017 17:48:23 -0600
>>
>>> Depending on the hardware, the amd-xgbe driver may use
>>> disable_irq_nosync()
>>> and enable_irq() when an interrupt is received to process Rx
>>> packets. If
>>> the napi_complete_done() return value isn't checked an unbalanced
>>> enable
>>> for the IRQ could result, generating a warning stack trace.
>>>
>>> Update the driver to only enable interrupts if napi_complete_done()
>>> returns
>>> true.
>>>
>>> Reported-by: Jeremy Linton 
>>> Signed-off-by: Tom Lendacky 
>>
>> Applied, thanks.
> 
> Thanks David!  The change to napi_complete_done() from void to bool
> occurred in 4.10, can you queue this fix up against 4.10 stable?

Sure, done.


Re: amd-xgbe: unbalanced irq enable in v4.11-rc1

2017-03-10 Thread Mark Rutland
On Fri, Mar 10, 2017 at 11:39:42AM -0600, Tom Lendacky wrote:
> On 3/10/2017 11:19 AM, Mark Rutland wrote:
> >Hi,
> >
> >I'm seeing the below splat when transferring data over the network, using
> >v4.11-rc1 on an AMD Seattle platform. I don't see the splat with v4.10.
> >
> >Looking at just the driver, I couldn't see any suspicious changes. Reverting
> >commit 402168b4c2dc0734 ("amd-xgbe: Stop the PHY before releasing 
> >interrupts")
> >doesn't change matters.
> >
> >Any ideas?
> 
> Yes, patch submitted.  Please see:
> 
>   http://marc.info/?l=linux-netdev&m=148910333810442&w=2

Ah, thanks for the pointer!

That appears to solve the issue for me.

Thanks,
Mark.


Re: [PATCH net 0/5] Mellanox mlx5 fixes 2017-03-09

2017-03-10 Thread David Miller
From: Saeed Mahameed 
Date: Fri, 10 Mar 2017 14:33:00 +0200

> This series contains some mlx5 core and ethernet driver fixes.
> 
> For -stable:
> net/mlx5e: remove IEEE/CEE mode check when setting DCBX mode (for kernel >= 
> 4.10)
> net/mlx5e: Avoid wrong identification of rules on deletion (for kernel >= 4.9)
> net/mlx5: Don't save PCI state when PCI error is detected (for kernel >= 4.9)
> net/mlx5: Fix create autogroup prev initializer (for kernel >=4.9)

Series applied, thanks.


Re: [PATCH v4 net-next 0/9] prepare mac operations for multiple queues

2017-03-10 Thread Joao Pinto
At 5:50 PM on 3/10/2017, David Miller wrote:
> From: Joao Pinto 
> Date: Fri, 10 Mar 2017 12:18:25 +
> 
>> As agreed with David Miller, this patch-set is the first of 3 to enable
>> multiple queues in stmmac.
> 
> I have one more request to make.
> 
> The name of the driver is "stmmac" therefore I'd like you to use
> "stmmac: " as the subsystem prefix in your Subject lines just as
> everyone else who works on this driver does.  Please don't use
> "stmicro: " thank you.
> 

Ok, I will do that. Thanks.


Re: [PATCH v4 net-next 0/9] prepare mac operations for multiple queues

2017-03-10 Thread David Miller
From: Joao Pinto 
Date: Fri, 10 Mar 2017 12:18:25 +

> As agreed with David Miller, this patch-set is the first of 3 to enable
> multiple queues in stmmac.

I have one more request to make.

The name of the driver is "stmmac" therefore I'd like you to use
"stmmac: " as the subsystem prefix in your Subject lines just as
everyone else who works on this driver does.  Please don't use
"stmicro: " thank you.


Re: net: BUG in unix_notinflight

2017-03-10 Thread Cong Wang
On Tue, Mar 7, 2017 at 2:23 PM, Nikolay Borisov
 wrote:
>
>>>
>>>
>>> New report from linux-next/c0b7b2b33bd17f7155956d0338ce92615da686c9
>>>
>>> [ cut here ]
>>> kernel BUG at net/unix/garbage.c:149!
>>> invalid opcode:  [#1] SMP KASAN
>>> Dumping ftrace buffer:
>>>(ftrace buffer empty)
>>> Modules linked in:
>>> CPU: 0 PID: 1806 Comm: syz-executor7 Not tainted 4.10.0-next-20170303+ #6
>>> Hardware name: Google Google Compute Engine/Google Compute Engine,
>>> BIOS Google 01/01/2011
>>> task: 880121c64740 task.stack: 88012c9e8000
>>> RIP: 0010:unix_notinflight+0x417/0x5d0 net/unix/garbage.c:149
>>> RSP: 0018:88012c9ef0f8 EFLAGS: 00010297
>>> RAX: 880121c64740 RBX: 11002593de23 RCX: 8801c490c628
>>> RDX:  RSI: 11002593de27 RDI: 8557e504
>>> RBP: 88012c9ef220 R08: 0001 R09: 
>>> R10: dc00 R11: ed002593de55 R12: 8801c490c0c0
>>> R13: 88012c9ef1f8 R14: 85101620 R15: dc00
>>> FS:  013d3940() GS:8801dbe0() knlGS:
>>> CS:  0010 DS:  ES:  CR0: 80050033
>>> CR2: 01fd8cd8 CR3: 0001cce69000 CR4: 001426f0
>>> Call Trace:
>>>  unix_detach_fds.isra.23+0xfa/0x170 net/unix/af_unix.c:1490
>>>  unix_destruct_scm+0xf4/0x200 net/unix/af_unix.c:1499
>>
>> The problem here is there is no lock protecting concurrent unix_detach_fds()
>> even though unix_notinflight() is already serialized, if we call
>> unix_notinflight()
>> twice on the same file pointer, we trigger this bug...
>>
>> I don't know what is the right lock here to serialize it.
>>
>
>
> I reported something similar a while ago
> https://lists.gt.net/linux/kernel/2534612
>
> And Miklos Szeredi then produced the following patch :
>
> https://patchwork.kernel.org/patch/9305121/
>
> However, this was never applied. I wonder if the patch makes sense?

I doubt it is the same case. According to Miklos' description,
the case he tried to fix is MSG_PEEK, but Dmitry's test case does not
set it... They are different problems probably.


[PATCH] mpls: Send route delete notifications when router module is unloaded

2017-03-10 Thread David Ahern
When the mpls_router module is unloaded, mpls routes are deleted but
notifications are not sent to userspace, leaving userspace caches
out of sync. Add a call to mpls_notify_route() in mpls_net_exit() as
the routes are freed.

Fixes: 0189197f44160 ("mpls: Basic routing support")
Signed-off-by: David Ahern 
---
 net/mpls/af_mpls.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index 3818686182b2..a1477989ed0b 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -2028,6 +2028,7 @@ static void mpls_net_exit(struct net *net)
for (index = 0; index < platform_labels; index++) {
struct mpls_route *rt = rtnl_dereference(platform_label[index]);
RCU_INIT_POINTER(platform_label[index], NULL);
+   mpls_notify_route(net, index, rt, NULL, NULL);
mpls_rt_free(rt);
}
rtnl_unlock();
-- 
2.1.4



Re: [patch net-next 0/2] ipv4: fib: FIB notifications cleanup

2017-03-10 Thread David Miller
From: Jiri Pirko 
Date: Fri, 10 Mar 2017 08:56:17 +0100

> Ido says:
> 
> The first patch moves the core FIB notification code to a separate file,
> so that code related to FIB rules is placed in fib_rules.c and not
> fib_trie.c. The reason for the change will become even more apparent in
> follow-up patchset where we extend the FIB rules notifications.
> 
> Second patch removes a redundant argument.

Looks good, series applied.


Re: amd-xgbe: unbalanced irq enable in v4.11-rc1

2017-03-10 Thread Tom Lendacky

On 3/10/2017 11:19 AM, Mark Rutland wrote:

Hi,

I'm seeing the below splat when transferring data over the network, using
v4.11-rc1 on an AMD Seattle platform. I don't see the splat with v4.10.

Looking at just the driver, I couldn't see any suspicious changes. Reverting
commit 402168b4c2dc0734 ("amd-xgbe: Stop the PHY before releasing interrupts")
doesn't change matters.

Any ideas?


Yes, patch submitted.  Please see:

http://marc.info/?l=linux-netdev&m=148910333810442&w=2

Thanks,
Tom



Thanks,
Mark.

[  106.114135] Unbalanced enable for IRQ 34
[  106.118157] [ cut here ]
[  106.122793] WARNING: CPU: 0 PID: 0 at kernel/irq/manage.c:529 
__enable_irq+0x168/0x1d8
[  106.130708] Modules linked in:
[  106.133766]
[  106.135262] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 
4.11.0-rc1-1-g2411f70 #5
[  106.142830] Hardware name: AMD Seattle (Rev.B0) Development Board 
(Overdrive) (DT)
[  106.150398] task: 2a9e2000 task.stack: 2a9c
[  106.156320] PC is at __enable_irq+0x168/0x1d8
[  106.160679] LR is at __enable_irq+0x168/0x1d8
[  106.165036] pc : [] lr : [] pstate: 
01c5
[  106.172428] sp : 8003fff17c50
[  106.175742] x29: 8003fff17c50 x28: 800359255160
[  106.181063] x27: 10006b24aa2d x26: 800359255170
[  106.186385] x25: 2a9ca000 x24: 800359255168
[  106.191706] x23: 80036902ca80 x22: 80036bbe8f78
[  106.197027] x21: 10006d77d1ef x20: 
[  106.202348] x19: 80036bbe8f00 x18: 2a9e27b8
[  106.207668] x17: 7e2b5d90 x16: 
[  106.212989] x15: 1fffe4000153c4f7 x14: 6003f559
[  106.218310] x13: 2ac45000 x12: 2b803000
[  106.223631] x11: 1fffe40001ba7a8d x10: 
[  106.228951] x9 : 10007ffe2ee0 x8 : 2a9e27e8
[  106.234272] x7 : 2d2d3000 x6 : 
[  106.239592] x5 : 41b58ab3 x4 : 1fffe40001ba3a10
[  106.244913] x3 : 1fffe40001ba3a10 x2 : dfff2000
[  106.250234] x1 : 10007ffe2f58 x0 : 001c
[  106.23]
[  106.257044] ---[ end trace 86935ef2e5e5c498 ]---
[  106.261658] Call trace:
[  106.264106] Exception stack(0x8003fff17a00 to 0x8003fff17b30)
[  106.270550] 7a00: 80036bbe8f00 0001 8003fff17c50 
282a1340
[  106.278383] 7a20: 01c5 003d 2a9ca000 
800359255170
[  106.286216] 7a40: 10006b24aa2d 2a9e2000 8003fff17b60 
73870f20
[  106.294048] 7a60: 41b58ab3 2a7164f0 28081ac8 
2a05d4a0
[  106.301881] 7a80: 80036902ca80 800359255168 2a9ca000 
800359255170
[  106.309713] 7aa0: 8003fff17c50 8003fff17c50 8003fff17c10 
ffc8
[  106.317546] 7ac0: 41b58ab3 2a72a340 28299380 
2a9e2000
[  106.325379] 7ae0: 8003fff17b20 28bf8f08 80036bbe8f00 
80036d8002c0
[  106.333211] 7b00: 80036d8003f8 10007ffe2f70  

[  106.341040] 7b20: 001c 10007ffe2f58
[  106.345921] [] __enable_irq+0x168/0x1d8
[  106.351323] [] enable_irq+0xa8/0x118
[  106.356467] [] xgbe_one_poll+0x190/0x280
[  106.361956] [] net_rx_action+0x6ac/0xd88
[  106.367444] [] __do_softirq+0x324/0xb70
[  106.372845] [] irq_exit+0x1cc/0x338
[  106.377900] [] __handle_domain_irq+0xdc/0x230
[  106.383821] [] gic_handle_irq+0x6c/0xe0
[  106.389220] Exception stack(0x2a9c3d40 to 0x2a9c3e70)
[  106.395663] 3d40: 2a9e27bc 0007  
1fffe4000153c4f7
[  106.403495] 3d60: 0004   
2d2d3000
[  106.411327] 3d80: 2a9e27c0 040001538762  
0003
[  106.419160] 3da0: 2b803000 2ac45000 10007ffe4281 
1fffe4000153c4f7
[  106.426992] 3dc0:  7e2b5d90 2a9e27b8 
2a9e2000
[  106.434824] 3de0: 2a9e2000  2a9cfeb0 
0003
[  106.442656] 3e00: 00ff 2a9cfe18 2a901058 

[  106.450488] 3e20: 008004810018 2a9c3e70 2808a1d8 
2a9c3e70
[  106.458321] 3e40: 2808a1dc 6145  
2a901058
[  106.466150] 3e60:  2808a1d8
[  106.471029] [] el1_irq+0xb8/0x130
[  106.475909] [] arch_cpu_idle+0x1c/0x28
[  106.481225] [] default_idle_call+0x34/0x78
[  106.486888] [] do_idle+0x21c/0x3c0
[  106.491856] [] cpu_startup_entry+0x24/0x28
[  106.497517] [] rest_init+0x1ec/0x200
[  106.502659] [] start_kernel+0x5d8/0x604
[  106.508061] [] __primary_switched+0x6c/0x74



Re: [patch net-next 00/10] mlxsw: Preparations for VRF offload

2017-03-10 Thread David Miller
From: Jiri Pirko 
Date: Fri, 10 Mar 2017 08:53:33 +0100

> Ido says:
> 
> This patchset aims to prepare the mlxsw driver for VRF offload. The
> follow-up patchsets that introduce VRF support can be found here:
> https://github.com/idosch/linux/tree/idosch-next
> 
> The first four patches are mainly concerned with the netdevice
> notification block. There are no functional changes, but merely
> restructuring to more easily integrate VRF enslavement.
> 
> Patches 5-10 remove various assumptions throughout the code about a
> single virtual router (VR) and also restructure the internal data
> structures to more accurately represent the device's operation.

Series applied, thanks.


Re: [PATCH net] rxrpc: Wake up the transmitter if Rx window size increases on the peer

2017-03-10 Thread David Miller
From: David Howells 
Date: Fri, 10 Mar 2017 07:48:49 +

> The RxRPC ACK packet may contain an extension that includes the peer's
> current Rx window size for this call.  We adjust the local Tx window size
> to match.  However, the transmitter can stall if the receive window is
> reduced to 0 by the peer and then reopened.
> 
> This is because the normal way that the transmitter is re-energised is by
> dropping something out of our Tx queue and thus making space.  When a
> single gap is made, the transmitter is woken up.  However, because there's
> nothing in the Tx queue at this point, this doesn't happen.
> 
> To fix this, perform a wake_up() any time we see the peer's Rx window size
> increasing.
> 
> The observable symptom is that calls start failing on ETIMEDOUT and the
> following:
> 
>   kAFS: SERVER DEAD state=-62
> 
> appears in dmesg.
> 
> Signed-off-by: David Howells 

Applied, thanks David.


Re: [PATCH net] act_connmark: avoid crashing on malformed nlattrs with null parms

2017-03-10 Thread Cong Wang
On Fri, Mar 10, 2017 at 7:55 AM, Etienne Noss  wrote:
> tcf_connmark_init does not check in its configuration if TCA_CONNMARK_PARMS
> is set, resulting in a null pointer dereference when trying to access it.
>
> [501099.043007] BUG: unable to handle kernel NULL pointer dereference at 
> 0004
> [501099.043039] IP: [] tcf_connmark_init+0x8b/0x180 
> [act_connmark]
> ...
> [501099.044334] Call Trace:
> [501099.044345]  [] ? tcf_action_init_1+0x198/0x1b0
> [501099.044363]  [] ? tcf_action_init+0xb0/0x120
> [501099.044380]  [] ? tcf_exts_validate+0xc4/0x110
> [501099.044398]  [] ? u32_set_parms+0xa7/0x270 [cls_u32]
> [501099.044417]  [] ? u32_change+0x680/0x87b [cls_u32]
> [501099.044436]  [] ? tc_ctl_tfilter+0x4dd/0x8a0
> [501099.044454]  [] ? security_capable+0x41/0x60
> [501099.044471]  [] ? rtnetlink_rcv_msg+0xe1/0x220
> [501099.044490]  [] ? rtnl_newlink+0x870/0x870
> [501099.044507]  [] ? netlink_rcv_skb+0xa1/0xc0
> [501099.044524]  [] ? rtnetlink_rcv+0x24/0x30
> [501099.044541]  [] ? netlink_unicast+0x184/0x230
> [501099.044558]  [] ? netlink_sendmsg+0x2f8/0x3b0
> [501099.044576]  [] ? sock_sendmsg+0x30/0x40
> [501099.044592]  [] ? SYSC_sendto+0xd3/0x150
> [501099.044608]  [] ? __do_page_fault+0x2d1/0x510
> [501099.044626]  [] ? system_call_fast_compare_end+0xc/0x9b
>
> Signed-off-by: Étienne Noss 
> Signed-off-by: Victorien Molle 

Fixes:  22a5dc0e5e3e ("net: sched: Introduce connmark action")
Acked-by: Cong Wang 


> ---
>  net/sched/act_connmark.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/net/sched/act_connmark.c b/net/sched/act_connmark.c
> index ab8062909962..f9bb43c25697 100644
> --- a/net/sched/act_connmark.c
> +++ b/net/sched/act_connmark.c
> @@ -113,6 +113,9 @@ static int tcf_connmark_init(struct net *net, struct 
> nlattr *nla,
> if (ret < 0)
> return ret;
>
> +   if (!tb[TCA_CONNMARK_PARMS])
> +   return -EINVAL;
> +
> parm = nla_data(tb[TCA_CONNMARK_PARMS]);
>
> if (!tcf_hash_check(tn, parm->index, a, bind)) {
> --
> 2.11.0
>

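The fix above follows a common pattern: verify that a parsed attribute table actually contains the mandatory entry before calling nla_data() on it. A minimal user-space sketch of that pattern (the struct, table, and helper below are illustrative stand-ins, not the kernel's types):

```c
#include <assert.h>
#include <stddef.h>

#define MY_EINVAL 22 /* stand-in for the kernel's EINVAL */

struct parms { int mark; };

/* tb[] mimics a parsed netlink attribute table; optional entries are NULL.
 * Bailing out early with -EINVAL avoids dereferencing a missing attribute. */
static int init_action(struct parms *tb[], int idx, struct parms *out)
{
    if (tb[idx] == NULL)
        return -MY_EINVAL;
    *out = *tb[idx];
    return 0;
}
```

The original crash happened exactly because the NULL case was dereferenced instead of rejected.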

amd-xgbe: unbalanced irq enable in v4.11-rc1

2017-03-10 Thread Mark Rutland
Hi,

I'm seeing the below splat when transferring data over the network, using
v4.11-rc1 on an AMD Seattle platform. I don't see the splat with v4.10.

Looking at just the driver, I couldn't see any suspicious changes. Reverting
commit 402168b4c2dc0734 ("amd-xgbe: Stop the PHY before releasing interrupts")
doesn't change matters.

Any ideas?

Thanks,
Mark.

[  106.114135] Unbalanced enable for IRQ 34
[  106.118157] [ cut here ]
[  106.122793] WARNING: CPU: 0 PID: 0 at kernel/irq/manage.c:529 
__enable_irq+0x168/0x1d8
[  106.130708] Modules linked in:
[  106.133766] 
[  106.135262] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 
4.11.0-rc1-1-g2411f70 #5
[  106.142830] Hardware name: AMD Seattle (Rev.B0) Development Board 
(Overdrive) (DT)
[  106.150398] task: 2a9e2000 task.stack: 2a9c
[  106.156320] PC is at __enable_irq+0x168/0x1d8
[  106.160679] LR is at __enable_irq+0x168/0x1d8
[  106.165036] pc : [] lr : [] pstate: 
01c5
[  106.172428] sp : 8003fff17c50
[  106.175742] x29: 8003fff17c50 x28: 800359255160 
[  106.181063] x27: 10006b24aa2d x26: 800359255170 
[  106.186385] x25: 2a9ca000 x24: 800359255168 
[  106.191706] x23: 80036902ca80 x22: 80036bbe8f78 
[  106.197027] x21: 10006d77d1ef x20:  
[  106.202348] x19: 80036bbe8f00 x18: 2a9e27b8 
[  106.207668] x17: 7e2b5d90 x16:  
[  106.212989] x15: 1fffe4000153c4f7 x14: 6003f559 
[  106.218310] x13: 2ac45000 x12: 2b803000 
[  106.223631] x11: 1fffe40001ba7a8d x10:  
[  106.228951] x9 : 10007ffe2ee0 x8 : 2a9e27e8 
[  106.234272] x7 : 2d2d3000 x6 :  
[  106.239592] x5 : 41b58ab3 x4 : 1fffe40001ba3a10 
[  106.244913] x3 : 1fffe40001ba3a10 x2 : dfff2000 
[  106.250234] x1 : 10007ffe2f58 x0 : 001c 
[  106.23] 
[  106.257044] ---[ end trace 86935ef2e5e5c498 ]---
[  106.261658] Call trace:
[  106.264106] Exception stack(0x8003fff17a00 to 0x8003fff17b30)
[  106.270550] 7a00: 80036bbe8f00 0001 8003fff17c50 
282a1340
[  106.278383] 7a20: 01c5 003d 2a9ca000 
800359255170
[  106.286216] 7a40: 10006b24aa2d 2a9e2000 8003fff17b60 
73870f20
[  106.294048] 7a60: 41b58ab3 2a7164f0 28081ac8 
2a05d4a0
[  106.301881] 7a80: 80036902ca80 800359255168 2a9ca000 
800359255170
[  106.309713] 7aa0: 8003fff17c50 8003fff17c50 8003fff17c10 
ffc8
[  106.317546] 7ac0: 41b58ab3 2a72a340 28299380 
2a9e2000
[  106.325379] 7ae0: 8003fff17b20 28bf8f08 80036bbe8f00 
80036d8002c0
[  106.333211] 7b00: 80036d8003f8 10007ffe2f70  

[  106.341040] 7b20: 001c 10007ffe2f58
[  106.345921] [] __enable_irq+0x168/0x1d8
[  106.351323] [] enable_irq+0xa8/0x118
[  106.356467] [] xgbe_one_poll+0x190/0x280
[  106.361956] [] net_rx_action+0x6ac/0xd88
[  106.367444] [] __do_softirq+0x324/0xb70
[  106.372845] [] irq_exit+0x1cc/0x338
[  106.377900] [] __handle_domain_irq+0xdc/0x230
[  106.383821] [] gic_handle_irq+0x6c/0xe0
[  106.389220] Exception stack(0x2a9c3d40 to 0x2a9c3e70)
[  106.395663] 3d40: 2a9e27bc 0007  
1fffe4000153c4f7
[  106.403495] 3d60: 0004   
2d2d3000
[  106.411327] 3d80: 2a9e27c0 040001538762  
0003
[  106.419160] 3da0: 2b803000 2ac45000 10007ffe4281 
1fffe4000153c4f7
[  106.426992] 3dc0:  7e2b5d90 2a9e27b8 
2a9e2000
[  106.434824] 3de0: 2a9e2000  2a9cfeb0 
0003
[  106.442656] 3e00: 00ff 2a9cfe18 2a901058 

[  106.450488] 3e20: 008004810018 2a9c3e70 2808a1d8 
2a9c3e70
[  106.458321] 3e40: 2808a1dc 6145  
2a901058
[  106.466150] 3e60:  2808a1d8
[  106.471029] [] el1_irq+0xb8/0x130
[  106.475909] [] arch_cpu_idle+0x1c/0x28
[  106.481225] [] default_idle_call+0x34/0x78
[  106.486888] [] do_idle+0x21c/0x3c0
[  106.491856] [] cpu_startup_entry+0x24/0x28
[  106.497517] [] rest_init+0x1ec/0x200
[  106.502659] [] start_kernel+0x5d8/0x604
[  106.508061] [] __primary_switched+0x6c/0x74



Re: [PATCH v2] fjes: Do not load fjes driver if system does not have extended socket device.

2017-03-10 Thread Yasuaki Ishimatsu



On 03/09/2017 08:35 PM, David Miller wrote:

From: Yasuaki Ishimatsu 
Date: Wed, 8 Mar 2017 16:05:18 -0500


The fjes driver is used only by FUJITSU servers, and almost all other
servers in the world never use it. But currently, if ACPI PNP0C02
is defined in the ACPI table, the following message is always shown:

 "FUJITSU Extended Socket Network Device Driver - version 1.2
  - Copyright (c) 2015 FUJITSU LIMITED"

The message makes users confused because there is no reason that
the message is shown in other vendor servers.

To avoid the confusion, the patch adds a check that the server
has an extended socket device or not.

Signed-off-by: Yasuaki Ishimatsu 
CC: Taku Izumi 
---
v2:
- Order local variable declarations from longest to shortest line


This patch does not apply cleanly to the net tree.



Which tree did you apply the patch to?

The patch can apply to net-next tree with no conflicts as follows:

# git clone git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git
Cloning into 'net-next'...
remote: Counting objects: 5265118, done.
remote: Compressing objects: 100% (805485/805485), done.
Receiving objects: 100% (5265118/5265118), 910.11 MiB | 23.42 MiB/s, done.
remote: Total 5265118 (delta 4419240), reused 5264459 (delta 4418809)
Resolving deltas: 100% (4419240/4419240), done.
Checking out files: 100% (58005/58005), done.
# head -n 30 fjes.patch
Subject: [PATCH v2] fjes: Do not load fjes driver if system does not have 
extended socket device.
Date: Wed, 8 Mar 2017 16:05:18 -0500
From: Yasuaki Ishimatsu 
To: netdev@vger.kernel.org
CC: David Miller , izumi.t...@jp.fujitsu.com

The fjes driver is used only by FUJITSU servers, and almost all other
servers in the world never use it. But currently, if ACPI PNP0C02
is defined in the ACPI table, the following message is always shown:

 "FUJITSU Extended Socket Network Device Driver - version 1.2
  - Copyright (c) 2015 FUJITSU LIMITED"

The message makes users confused because there is no reason that
the message is shown in other vendor servers.

To avoid the confusion, the patch adds a check that the server
has an extended socket device or not.

Signed-off-by: Yasuaki Ishimatsu 
CC: Taku Izumi 
---
v2:
- Order local variable declarations from longest to shortest line

 drivers/net/fjes/fjes_main.c | 52 +++-
 1 file changed, 47 insertions(+), 5 deletions(-)

diff --git a/drivers/net/fjes/fjes_main.c b/drivers/net/fjes/fjes_main.c
index b77e4ecf..a57c2cb 100644
# cd net-next/
# git am ../fjes.patch
Applying: fjes: Do not load fjes driver if system does not have extended socket 
device.
#


Thanks,
Yasuaki Ishimatsu


Re: [PATCH iproute2] man: Fix formatting of vrf parameter of ip-link show command

2017-03-10 Thread Stephen Hemminger
On Thu, 9 Mar 2017 12:56:14 +
Robert Shearman  wrote:

> Add missing opening " [" for the vrf parameter.
> 
> Signed-off-by: Robert Shearman 

Applied


Re: [PATCH iproute2] iplink: add support for afstats subcommand

2017-03-10 Thread Stephen Hemminger
On Thu, 9 Mar 2017 12:43:36 +
Robert Shearman  wrote:

> Add support for new afstats subcommand. This uses the new
> IFLA_STATS_AF_SPEC attribute of RTM_GETSTATS messages to show
> per-device, AF-specific stats. At the moment the kernel only supports
> MPLS AF stats, so that is all that's implemented here.
> 
> The print_num function is exposed from ipaddress.c to be used for
> printing the new stats so that the human-readable option, if set, can
> be respected.
> 
> Example of use:
> 
> $ ./ip/ip -f mpls link afstats dev eth1
> 3: eth1
> mpls:
> RX: bytes  packets  errors  dropped  noroute
> 9016   98   0   00
> TX: bytes  packets  errors  dropped
> 7232   113  0   0
> 
> Signed-off-by: Robert Shearman 

Applied


Re: [iproute PATCH] man: ss.8: Add missing protocols to description of -A

2017-03-10 Thread Stephen Hemminger
On Thu,  9 Mar 2017 17:07:33 +0100
Phil Sutter  wrote:

> The list was missing dccp and sctp protocols.
> 
> Signed-off-by: Phil Sutter 

Applied, thanks


[PATCH v2 6/9] net: stmmac: Parse FIFO sizes from feature registers

2017-03-10 Thread Thierry Reding
From: Thierry Reding 

Newer versions of this core encode the FIFO sizes in one of the feature
registers. Use these sizes as default, but still allow device tree to
override them for backwards compatibility.

Reviewed-by: Mikko Perttunen 
Signed-off-by: Thierry Reding 
---
Changes in v2:
- provide macros for the FIFO size fields in the feature register
- add comment about FIFO size encoding to clarify the computation

 drivers/net/ethernet/stmicro/stmmac/common.h  | 3 +++
 drivers/net/ethernet/stmicro/stmmac/dwmac4.h  | 2 ++
 drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c  | 5 +
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 3 +++
 4 files changed, 13 insertions(+)

diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h 
b/drivers/net/ethernet/stmicro/stmmac/common.h
index 04d9245b7149..3ca36744007b 100644
--- a/drivers/net/ethernet/stmicro/stmmac/common.h
+++ b/drivers/net/ethernet/stmicro/stmmac/common.h
@@ -324,6 +324,9 @@ struct dma_features {
unsigned int number_tx_queues;
/* Alternate (enhanced) DESC mode */
unsigned int enh_desc;
+   /* TX and RX FIFO sizes */
+   unsigned int tx_fifo_size;
+   unsigned int rx_fifo_size;
 };
 
 /* GMAC TX FIFO is 8K, Rx FIFO is 16K */
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4.h 
b/drivers/net/ethernet/stmicro/stmmac/dwmac4.h
index db45134fddf0..83f5e953e291 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4.h
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4.h
@@ -148,6 +148,8 @@ enum power_event {
 /* MAC HW features1 bitmap */
 #define GMAC_HW_FEAT_AVSEL BIT(20)
 #define GMAC_HW_TSOEN  BIT(18)
+#define GMAC_HW_TXFIFOSIZE GENMASK(10, 6)
+#define GMAC_HW_RXFIFOSIZE GENMASK(4, 0)
 
 /* MAC HW features2 bitmap */
 #define GMAC_HW_FEAT_TXCHCNT   GENMASK(21, 18)
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c 
b/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c
index f97b0d5d9987..55270933bae1 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c
@@ -294,6 +294,11 @@ static void dwmac4_get_hw_feature(void __iomem *ioaddr,
hw_cap = readl(ioaddr + GMAC_HW_FEATURE1);
dma_cap->av = (hw_cap & GMAC_HW_FEAT_AVSEL) >> 20;
dma_cap->tsoen = (hw_cap & GMAC_HW_TSOEN) >> 18;
+   /* RX and TX FIFO sizes are encoded as log2(n / 128). Undo that by
+* shifting and store the sizes in bytes.
+*/
+   dma_cap->tx_fifo_size = 128 << ((hw_cap & GMAC_HW_TXFIFOSIZE) >> 6);
+   dma_cap->rx_fifo_size = 128 << ((hw_cap & GMAC_HW_RXFIFOSIZE) >> 0);
/* MAC HW feature2 */
hw_cap = readl(ioaddr + GMAC_HW_FEATURE2);
/* TX and RX number of channels */
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c 
b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index eba9088e1f61..78f6ec2d165b 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -1281,6 +1281,9 @@ static void stmmac_dma_operation_mode(struct stmmac_priv 
*priv)
 {
int rxfifosz = priv->plat->rx_fifo_size;
 
+   if (rxfifosz == 0)
+   rxfifosz = priv->dma_cap.rx_fifo_size;
+
if (priv->plat->force_thresh_dma_mode)
priv->hw->dma->dma_mode(priv->ioaddr, tc, tc, rxfifosz);
else if (priv->plat->force_sf_dma_mode || priv->plat->tx_coe) {
-- 
2.12.0
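The encoding that the hunk above undoes can be sanity-checked in isolation: the hardware stores each FIFO size as log2(bytes / 128), so decoding is a shift. The helper below mirrors the `128 << field` computation (the function name is illustrative, not from the driver):

```c
#include <assert.h>

/* Decode a FIFO size field encoded as log2(bytes / 128). */
static unsigned int decode_fifo_size(unsigned int field)
{
    return 128u << field;
}
```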



[PATCH v2 9/9] net: stmmac: dwc-qos: Add Tegra186 support

2017-03-10 Thread Thierry Reding
From: Thierry Reding 

The NVIDIA Tegra186 SoC contains an instance of the Synopsys DWC
ethernet QOS IP core. The binding that it uses is slightly different
from existing ones because of the integration (clocks, resets, ...).

Signed-off-by: Thierry Reding 
---
Changes in v2:
- use readl_poll_timeout_atomic() instead of open-coding it
- add define for GMAC_1US_TIC_COUNTER register
- don't read register before overwriting it
- check for errors from clock API
- add missing space before }

 .../ethernet/stmicro/stmmac/dwmac-dwc-qos-eth.c| 247 +
 drivers/net/ethernet/stmicro/stmmac/dwmac4.h   |   1 +
 2 files changed, 248 insertions(+)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-dwc-qos-eth.c 
b/drivers/net/ethernet/stmicro/stmmac/dwmac-dwc-qos-eth.c
index 319232021bb7..dd6a2f9791cc 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-dwc-qos-eth.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-dwc-qos-eth.c
@@ -14,17 +14,34 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include "stmmac_platform.h"
+#include "dwmac4.h"
+
+struct tegra_eqos {
+   struct device *dev;
+   void __iomem *regs;
+
+   struct reset_control *rst;
+   struct clk *clk_master;
+   struct clk *clk_slave;
+   struct clk *clk_tx;
+   struct clk *clk_rx;
+
+   struct gpio_desc *reset;
+};
 
 static int dwc_eth_dwmac_config_dt(struct platform_device *pdev,
   struct plat_stmmacenet_data *plat_dat)
@@ -158,6 +175,230 @@ static int dwc_qos_remove(struct platform_device *pdev)
return 0;
 }
 
+#define SDMEMCOMPPADCTRL 0x8800
+#define  SDMEMCOMPPADCTRL_PAD_E_INPUT_OR_E_PWRD BIT(31)
+
+#define AUTO_CAL_CONFIG 0x8804
+#define  AUTO_CAL_CONFIG_START BIT(31)
+#define  AUTO_CAL_CONFIG_ENABLE BIT(29)
+
+#define AUTO_CAL_STATUS 0x880c
+#define  AUTO_CAL_STATUS_ACTIVE BIT(31)
+
+static void tegra_eqos_fix_speed(void *priv, unsigned int speed)
+{
+   struct tegra_eqos *eqos = priv;
+   unsigned long rate = 125000000;
+   bool needs_calibration = false;
+   u32 value;
+   int err;
+
+   switch (speed) {
+   case SPEED_1000:
+   needs_calibration = true;
+   rate = 125000000;
+   break;
+
+   case SPEED_100:
+   needs_calibration = true;
+   rate = 25000000;
+   break;
+
+   case SPEED_10:
+   rate = 2500000;
+   break;
+
+   default:
+   dev_err(eqos->dev, "invalid speed %u\n", speed);
+   break;
+   }
+
+   if (needs_calibration) {
+   /* calibrate */
+   value = readl(eqos->regs + SDMEMCOMPPADCTRL);
+   value |= SDMEMCOMPPADCTRL_PAD_E_INPUT_OR_E_PWRD;
+   writel(value, eqos->regs + SDMEMCOMPPADCTRL);
+
+   udelay(1);
+
+   value = readl(eqos->regs + AUTO_CAL_CONFIG);
+   value |= AUTO_CAL_CONFIG_START | AUTO_CAL_CONFIG_ENABLE;
+   writel(value, eqos->regs + AUTO_CAL_CONFIG);
+
+   err = readl_poll_timeout_atomic(eqos->regs + AUTO_CAL_STATUS,
+   value,
+   value & AUTO_CAL_STATUS_ACTIVE,
+   1, 10);
+   if (err < 0) {
+   dev_err(eqos->dev, "calibration did not start\n");
+   goto failed;
+   }
+
+   err = readl_poll_timeout_atomic(eqos->regs + AUTO_CAL_STATUS,
+   value,
+   (value & 
AUTO_CAL_STATUS_ACTIVE) == 0,
+   20, 200);
+   if (err < 0) {
+   dev_err(eqos->dev, "calibration didn't finish\n");
+   goto failed;
+   }
+
+   failed:
+   value = readl(eqos->regs + SDMEMCOMPPADCTRL);
+   value &= ~SDMEMCOMPPADCTRL_PAD_E_INPUT_OR_E_PWRD;
+   writel(value, eqos->regs + SDMEMCOMPPADCTRL);
+   } else {
+   value = readl(eqos->regs + AUTO_CAL_CONFIG);
+   value &= ~AUTO_CAL_CONFIG_ENABLE;
+   writel(value, eqos->regs + AUTO_CAL_CONFIG);
+   }
+
+   err = clk_set_rate(eqos->clk_tx, rate);
+   if (err < 0)
+   dev_err(eqos->dev, "failed to set TX rate: %d\n", err);
+}
+
+static int tegra_eqos_init(struct platform_device *pdev, void *priv)
+{
+   struct tegra_eqos *eqos = priv;
+   unsigned long rate;
+   u32 value;
+
+   rate = clk_get_rate(eqos->clk_slave);
+
+   value = (rate / 1000000) - 1;
+   writel(value, eqos->regs + GMAC_1US_TIC_COUNTER);
+
+   return 0;
+}
+
+static void 

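tegra_eqos_init() above programs how many slave-clock cycles make up one microsecond. A sketch of that computation, assuming GMAC_1US_TIC_COUNTER holds the cycles-per-microsecond count minus one (the helper name is mine, not the driver's):

```c
#include <assert.h>

/* Cycles of the CSR/slave clock per 1 us, minus one, as programmed into
 * the 1-us tick counter register. */
static unsigned int us_tic_counter(unsigned long rate_hz)
{
    return (unsigned int)(rate_hz / 1000000UL - 1);
}
```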
[PATCH v2 1/9] net: stmmac: Rename clk_ptp_ref clock to ptp_ref

2017-03-10 Thread Thierry Reding
From: Thierry Reding 

There aren't currently any users of the "clk_ptp_ref", but there are
other references to "ptp_ref", so I'm leaning towards considering that a
typo. Fix it.

Cc: Mark Rutland 
Cc: Rob Herring 
Cc: devicet...@vger.kernel.org
Signed-off-by: Thierry Reding 
---
 Documentation/devicetree/bindings/net/stmmac.txt  | 6 +++---
 drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/stmmac.txt 
b/Documentation/devicetree/bindings/net/stmmac.txt
index d3bfc2b30fb5..11b27dfd1627 100644
--- a/Documentation/devicetree/bindings/net/stmmac.txt
+++ b/Documentation/devicetree/bindings/net/stmmac.txt
@@ -28,9 +28,9 @@ Optional properties:
   clocks may be specified in derived bindings.
 - clock-names: One name for each entry in the clocks property, the
   first one should be "stmmaceth" and the second one should be "pclk".
-- clk_ptp_ref: this is the PTP reference clock; in case of the PTP is
-  available this clock is used for programming the Timestamp Addend Register.
-  If not passed then the system clock will be used and this is fine on some
+- ptp_ref: this is the PTP reference clock; in case of the PTP is available
+  this clock is used for programming the Timestamp Addend Register. If not
+  passed then the system clock will be used and this is fine on some
   platforms.
 - tx-fifo-depth: See ethernet.txt file in the same directory
 - rx-fifo-depth: See ethernet.txt file in the same directory
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c 
b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
index 0ba1caf18619..f2d94eafeb0a 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
@@ -359,7 +359,7 @@ stmmac_probe_config_dt(struct platform_device *pdev, const 
char **mac)
clk_prepare_enable(plat->pclk);
 
/* Fall-back to main clock in case of no PTP ref is passed */
-   plat->clk_ptp_ref = devm_clk_get(&pdev->dev, "clk_ptp_ref");
+   plat->clk_ptp_ref = devm_clk_get(&pdev->dev, "ptp_ref");
if (IS_ERR(plat->clk_ptp_ref)) {
plat->clk_ptp_rate = clk_get_rate(plat->stmmac_clk);
plat->clk_ptp_ref = NULL;
-- 
2.12.0



[PATCH v2 7/9] net: stmmac: Program RX queue size and flow control

2017-03-10 Thread Thierry Reding
From: Thierry Reding 

Program the receive queue size based on the RX FIFO size and enable
hardware flow control for large FIFOs.

Signed-off-by: Thierry Reding 
---
Changes in v2:
- add comments to clarify flow control threshold programming

 drivers/net/ethernet/stmicro/stmmac/dwmac4.h | 12 ++
 drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c | 52 +++-
 2 files changed, 62 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4.h 
b/drivers/net/ethernet/stmicro/stmmac/dwmac4.h
index 83f5e953e291..3b1828b4d294 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4.h
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4.h
@@ -182,6 +182,7 @@ enum power_event {
 #define MTL_OP_MODE_TSFBIT(1)
 
 #define MTL_OP_MODE_TQS_MASK   GENMASK(24, 16)
+#define MTL_OP_MODE_TQS_SHIFT  16
 
 #define MTL_OP_MODE_TTC_MASK   0x70
 #define MTL_OP_MODE_TTC_SHIFT  4
@@ -195,6 +196,17 @@ enum power_event {
 #define MTL_OP_MODE_TTC_384(6 << MTL_OP_MODE_TTC_SHIFT)
 #define MTL_OP_MODE_TTC_512(7 << MTL_OP_MODE_TTC_SHIFT)
 
+#define MTL_OP_MODE_RQS_MASK   GENMASK(29, 20)
+#define MTL_OP_MODE_RQS_SHIFT  20
+
+#define MTL_OP_MODE_RFD_MASK   GENMASK(19, 14)
+#define MTL_OP_MODE_RFD_SHIFT  14
+
+#define MTL_OP_MODE_RFA_MASK   GENMASK(13, 8)
+#define MTL_OP_MODE_RFA_SHIFT  8
+
+#define MTL_OP_MODE_EHFC   BIT(7)
+
 #define MTL_OP_MODE_RTC_MASK   0x18
 #define MTL_OP_MODE_RTC_SHIFT  3
 
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c 
b/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c
index 55270933bae1..6ac6b2600a7c 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c
@@ -183,8 +183,9 @@ static void dwmac4_rx_watchdog(void __iomem *ioaddr, u32 
riwt)
 }
 
 static void dwmac4_dma_chan_op_mode(void __iomem *ioaddr, int txmode,
-   int rxmode, u32 channel)
+   int rxmode, u32 channel, int rxfifosz)
 {
+   unsigned int rqs = rxfifosz / 256 - 1;
u32 mtl_tx_op, mtl_rx_op, mtl_rx_int;
 
/* Following code only done for channel 0, other channels not yet
@@ -250,6 +251,53 @@ static void dwmac4_dma_chan_op_mode(void __iomem *ioaddr, 
int txmode,
mtl_rx_op |= MTL_OP_MODE_RTC_128;
}
 
+   mtl_rx_op &= ~MTL_OP_MODE_RQS_MASK;
+   mtl_rx_op |= rqs << MTL_OP_MODE_RQS_SHIFT;
+
+   /* enable flow control only if each channel gets 4 KiB or more FIFO */
+   if (rxfifosz >= 4096) {
+   unsigned int rfd, rfa;
+
+   mtl_rx_op |= MTL_OP_MODE_EHFC;
+
+   /* Set Threshold for Activating Flow Control to min 2 frames,
+* i.e. 1500 * 2 = 3000 bytes.
+*
+* Set Threshold for Deactivating Flow Control to min 1 frame,
+* i.e. 1500 bytes.
+*/
+   switch (rxfifosz) {
+   case 4096:
+   /* This violates the above formula because of FIFO size
+* limit therefore overflow may occur in spite of this.
+*/
+   rfd = 0x03; /* Full-2.5K */
+   rfa = 0x01; /* Full-1.5K */
+   break;
+
+   case 8192:
+   rfd = 0x06; /* Full-4K */
+   rfa = 0x0a; /* Full-6K */
+   break;
+
+   case 16384:
+   rfd = 0x06; /* Full-4K */
+   rfa = 0x12; /* Full-10K */
+   break;
+
+   default:
+   rfd = 0x06; /* Full-4K */
+   rfa = 0x1e; /* Full-16K */
+   break;
+   }
+
+   mtl_rx_op &= ~MTL_OP_MODE_RFD_MASK;
+   mtl_rx_op |= rfd << MTL_OP_MODE_RFD_SHIFT;
+
+   mtl_rx_op &= ~MTL_OP_MODE_RFA_MASK;
+   mtl_rx_op |= rfa << MTL_OP_MODE_RFA_SHIFT;
+   }
+
writel(mtl_rx_op, ioaddr + MTL_CHAN_RX_OP_MODE(channel));
 
/* Enable MTL RX overflow */
@@ -262,7 +310,7 @@ static void dwmac4_dma_operation_mode(void __iomem *ioaddr, 
int txmode,
  int rxmode, int rxfifosz)
 {
/* Only Channel 0 is actually configured and used */
-   dwmac4_dma_chan_op_mode(ioaddr, txmode, rxmode, 0);
+   dwmac4_dma_chan_op_mode(ioaddr, txmode, rxmode, 0, rxfifosz);
 }
 
 static void dwmac4_get_hw_feature(void __iomem *ioaddr,
-- 
2.12.0
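Two encodings in the hunk above are easy to get wrong: the RQS field holds the FIFO size in 256-byte blocks minus one, and the RFA/RFD constants appear to encode a threshold of "FIFO full minus (1 KiB + 512 bytes per step)", which matches the Full-1.5K/Full-4K comments. A small sketch under that reading (helper names are mine, not the driver's):

```c
#include <assert.h>

/* RX queue size field: FIFO bytes in 256-byte blocks, minus one. */
static unsigned int rqs_field(unsigned int rxfifosz)
{
    return rxfifosz / 256 - 1;
}

/* RFA/RFD appear to encode "FIFO full minus N bytes" with
 * N = 1024 + field * 512 (e.g. 0x01 -> Full-1.5K, 0x06 -> Full-4K). */
static unsigned int bytes_below_full(unsigned int field)
{
    return 1024 + field * 512;
}
```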



[PATCH v2 8/9] net: stmmac: dwc-qos: Split out ->probe() and ->remove()

2017-03-10 Thread Thierry Reding
From: Thierry Reding 

Split out the binding specific parts of ->probe() and ->remove() to
enable the driver to support variants of the binding. This is useful in
order to keep backwards-compatibility while making it easy for a sub-
driver to deal only with the updated bindings rather than having to add
compatibility quirks all over the place.

Reviewed-by: Mikko Perttunen 
Reviewed-By: Joao Pinto 
Signed-off-by: Thierry Reding 
---
Changes in v2:
- check return values from the clock API

 .../ethernet/stmicro/stmmac/dwmac-dwc-qos-eth.c| 124 -
 1 file changed, 98 insertions(+), 26 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-dwc-qos-eth.c 
b/drivers/net/ethernet/stmicro/stmmac/dwmac-dwc-qos-eth.c
index 1a3fa3d9f855..319232021bb7 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-dwc-qos-eth.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-dwc-qos-eth.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -106,13 +107,80 @@ static int dwc_eth_dwmac_config_dt(struct platform_device 
*pdev,
return 0;
 }
 
+static void *dwc_qos_probe(struct platform_device *pdev,
+  struct plat_stmmacenet_data *plat_dat,
+  struct stmmac_resources *stmmac_res)
+{
+   int err;
+
+   plat_dat->stmmac_clk = devm_clk_get(&pdev->dev, "apb_pclk");
+   if (IS_ERR(plat_dat->stmmac_clk)) {
+   dev_err(&pdev->dev, "apb_pclk clock not found.\n");
+   return ERR_CAST(plat_dat->stmmac_clk);
+   }
+
+   err = clk_prepare_enable(plat_dat->stmmac_clk);
+   if (err < 0) {
+   dev_err(&pdev->dev, "failed to enable apb_pclk clock: %d\n",
+   err);
+   return ERR_PTR(err);
+   }
+
+   plat_dat->pclk = devm_clk_get(&pdev->dev, "phy_ref_clk");
+   if (IS_ERR(plat_dat->pclk)) {
+   dev_err(&pdev->dev, "phy_ref_clk clock not found.\n");
+   err = PTR_ERR(plat_dat->pclk);
+   goto disable;
+   }
+
+   err = clk_prepare_enable(plat_dat->pclk);
+   if (err < 0) {
+   dev_err(&pdev->dev, "failed to enable phy_ref clock: %d\n",
+   err);
+   goto disable;
+   }
+
+   return NULL;
+
+disable:
+   clk_disable_unprepare(plat_dat->stmmac_clk);
+   return ERR_PTR(err);
+}
+
+static int dwc_qos_remove(struct platform_device *pdev)
+{
+   struct net_device *ndev = platform_get_drvdata(pdev);
+   struct stmmac_priv *priv = netdev_priv(ndev);
+
+   clk_disable_unprepare(priv->plat->pclk);
+   clk_disable_unprepare(priv->plat->stmmac_clk);
+
+   return 0;
+}
+
+struct dwc_eth_dwmac_data {
+   void *(*probe)(struct platform_device *pdev,
+  struct plat_stmmacenet_data *data,
+  struct stmmac_resources *res);
+   int (*remove)(struct platform_device *pdev);
+};
+
+static const struct dwc_eth_dwmac_data dwc_qos_data = {
+   .probe = dwc_qos_probe,
+   .remove = dwc_qos_remove,
+};
+
 static int dwc_eth_dwmac_probe(struct platform_device *pdev)
 {
+   const struct dwc_eth_dwmac_data *data;
struct plat_stmmacenet_data *plat_dat;
struct stmmac_resources stmmac_res;
struct resource *res;
+   void *priv;
int ret;
 
+   data = of_device_get_match_data(&pdev->dev);
+
memset(&stmmac_res, 0, sizeof(struct stmmac_resources));
 
/**
@@ -138,39 +206,26 @@ static int dwc_eth_dwmac_probe(struct platform_device 
*pdev)
if (IS_ERR(plat_dat))
return PTR_ERR(plat_dat);
 
-   plat_dat->stmmac_clk = devm_clk_get(&pdev->dev, "apb_pclk");
-   if (IS_ERR(plat_dat->stmmac_clk)) {
-   dev_err(&pdev->dev, "apb_pclk clock not found.\n");
-   ret = PTR_ERR(plat_dat->stmmac_clk);
-   plat_dat->stmmac_clk = NULL;
-   goto err_remove_config_dt;
+   priv = data->probe(pdev, plat_dat, &stmmac_res);
+   if (IS_ERR(priv)) {
+   ret = PTR_ERR(priv);
+   dev_err(&pdev->dev, "failed to probe subdriver: %d\n", ret);
+   goto remove_config;
}
-   clk_prepare_enable(plat_dat->stmmac_clk);
-
-   plat_dat->pclk = devm_clk_get(&pdev->dev, "phy_ref_clk");
-   if (IS_ERR(plat_dat->pclk)) {
-   dev_err(&pdev->dev, "phy_ref_clk clock not found.\n");
-   ret = PTR_ERR(plat_dat->pclk);
-   plat_dat->pclk = NULL;
-   goto err_out_clk_dis_phy;
-   }
-   clk_prepare_enable(plat_dat->pclk);
 
ret = dwc_eth_dwmac_config_dt(pdev, plat_dat);
if (ret)
-   goto err_out_clk_dis_aper;
+   goto remove;
 
ret = stmmac_dvr_probe(&pdev->dev, plat_dat, &stmmac_res);
if (ret)
-   goto err_out_clk_dis_aper;
+   goto remove;
 
-   return 0;
+   return ret;
 

[PATCH v2 3/9] net: stmmac: Disable PTP reference clock on error

2017-03-10 Thread Thierry Reding
From: Thierry Reding 

If an error occurs while opening the device, make sure to disable the
PTP reference clock.

Signed-off-by: Thierry Reding 
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c 
b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 7c38c9baf238..6060410d2b9e 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -1754,6 +1754,13 @@ static int stmmac_hw_setup(struct net_device *dev, bool 
init_ptp)
return 0;
 }
 
+static void stmmac_hw_teardown(struct net_device *dev)
+{
+   struct stmmac_priv *priv = netdev_priv(dev);
+
+   clk_disable_unprepare(priv->plat->clk_ptp_ref);
+}
+
 /**
  *  stmmac_open - open entry point of the driver
  *  @dev : pointer to the device structure.
@@ -1863,6 +1870,7 @@ static int stmmac_open(struct net_device *dev)
phy_stop(dev->phydev);
 
del_timer_sync(&priv->txtimer);
+   stmmac_hw_teardown(dev);
 init_error:
free_dma_desc_resources(priv);
 dma_desc_error:
-- 
2.12.0



[PATCH v2 0/9] net: stmmac: Fixes and Tegra186 support

2017-03-10 Thread Thierry Reding
From: Thierry Reding <tred...@nvidia.com>

Hi everyone,

This series of patches start with a few cleanups that I ran across while
adding Tegra186 support to the stmmac driver. It then adds code for FIFO
size parsing from feature registers and finally enables support for the
incarnation of the Synopsys DWC QOS IP found on NVIDIA Tegra186 SoCs.

This is based on next-20170310.

Changes in v2:
- address review comments by Mikko and Joao
- add two additional cleanup patches

Thanks,
Thierry

Thierry Reding (9):
  net: stmmac: Rename clk_ptp_ref clock to ptp_ref
  net: stmmac: Stop PHY and remove TX timer on error
  net: stmmac: Disable PTP reference clock on error
  net: stmmac: Balance PTP reference clock enable/disable
  net: stmmac: Check for DMA mapping errors
  net: stmmac: Parse FIFO sizes from feature registers
  net: stmmac: Program RX queue size and flow control
  net: stmmac: dwc-qos: Split out ->probe() and ->remove()
  net: stmmac: dwc-qos: Add Tegra186 support

 Documentation/devicetree/bindings/net/stmmac.txt   |   6 +-
 drivers/net/ethernet/stmicro/stmmac/common.h   |   3 +
 .../ethernet/stmicro/stmmac/dwmac-dwc-qos-eth.c| 371 +++--
 drivers/net/ethernet/stmicro/stmmac/dwmac4.h   |  15 +
 drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c   |  57 +++-
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c  |  23 +-
 .../net/ethernet/stmicro/stmmac/stmmac_platform.c  |   3 +-
 7 files changed, 444 insertions(+), 34 deletions(-)

-- 
2.12.0



[PATCH v2 4/9] net: stmmac: Balance PTP reference clock enable/disable

2017-03-10 Thread Thierry Reding
From: Thierry Reding 

clk_prepare_enable() and clk_disable_unprepare() for this clock aren't
properly balanced, which can trigger a WARN_ON() in the common clock
framework.

Reviewed-By: Joao Pinto 
Signed-off-by: Thierry Reding 
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 4 
 drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c | 1 -
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c 
b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 6060410d2b9e..ce6d7e791f3f 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -1711,6 +1711,10 @@ static int stmmac_hw_setup(struct net_device *dev, bool 
init_ptp)
stmmac_mmc_setup(priv);
 
if (init_ptp) {
+   ret = clk_prepare_enable(priv->plat->clk_ptp_ref);
+   if (ret < 0)
+   netdev_warn(priv->dev, "failed to enable PTP reference 
clock: %d\n", ret);
+
ret = stmmac_init_ptp(priv);
if (ret == -EOPNOTSUPP)
netdev_warn(priv->dev, "PTP not supported by HW\n");
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c 
b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
index f2d94eafeb0a..fe49c3105755 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
@@ -365,7 +365,6 @@ stmmac_probe_config_dt(struct platform_device *pdev, const 
char **mac)
plat->clk_ptp_ref = NULL;
	dev_warn(&pdev->dev, "PTP uses main clock\n");
} else {
-   clk_prepare_enable(plat->clk_ptp_ref);
plat->clk_ptp_rate = clk_get_rate(plat->clk_ptp_ref);
	dev_dbg(&pdev->dev, "PTP rate %d\n", plat->clk_ptp_rate);
}
-- 
2.12.0
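The invariant this patch restores can be modelled outside the kernel. The sketch below uses toy names (not the real `<linux/clk.h>` API) to show why every clk_prepare_enable() must be matched by exactly one clk_disable_unprepare():

```c
#include <assert.h>

/* Toy model (not the real <linux/clk.h> API) of the refcounting this
 * patch rebalances: every toy_clk_prepare_enable() must be matched by
 * exactly one toy_clk_disable_unprepare(), otherwise the enable count
 * drifts and the real common clock framework eventually WARN_ON()s. */
struct toy_clk { int enable_count; };

static int toy_clk_prepare_enable(struct toy_clk *clk)
{
	clk->enable_count++;	/* the real call can fail; the toy cannot */
	return 0;
}

static void toy_clk_disable_unprepare(struct toy_clk *clk)
{
	assert(clk->enable_count > 0);	/* unbalanced disable would warn */
	clk->enable_count--;
}

/* Run n open/close cycles, enabling on open and disabling on close,
 * and report the final enable count (0 when the calls are balanced). */
static int toy_open_close_cycles(int n)
{
	struct toy_clk ptp_ref = { 0 };
	int i;

	for (i = 0; i < n; i++) {
		toy_clk_prepare_enable(&ptp_ref);	/* stmmac_hw_setup() */
		toy_clk_disable_unprepare(&ptp_ref);	/* stmmac_release() */
	}
	return ptp_ref.enable_count;
}
```

Before the patch, probe enabled the clock once more than release disabled it, so the count never returned to zero.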



[PATCH v2 5/9] net: stmmac: Check for DMA mapping errors

2017-03-10 Thread Thierry Reding
From: Thierry Reding 

When DMA mapping an SKB fragment, the mapping must be checked for
errors, otherwise the DMA debug code will complain upon unmap.

Signed-off-by: Thierry Reding 
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c 
b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index ce6d7e791f3f..eba9088e1f61 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -2079,6 +2079,8 @@ static netdev_tx_t stmmac_tso_xmit(struct sk_buff *skb, 
struct net_device *dev)
des = skb_frag_dma_map(priv->device, frag, 0,
   skb_frag_size(frag),
   DMA_TO_DEVICE);
+   if (dma_mapping_error(priv->device, des))
+   goto dma_map_err;
 
stmmac_tso_allocator(priv, des, skb_frag_size(frag),
 (i == nfrags - 1));
-- 
2.12.0
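The pattern being added is generic: a DMA mapping routine can fail, and the handle must be tested with a mapping-error check before use. A minimal sketch with illustrative names (not the real DMA API):

```c
#include <assert.h>
#include <stdint.h>

/* Toy sketch of the pattern the patch adds: a mapping routine can
 * fail, so the caller must test the returned handle before handing a
 * bad address to the hardware. Names are illustrative only. */
#define TOY_BAD_DMA ((uint64_t)-1)

static uint64_t toy_dma_map(int frag_ok)
{
	return frag_ok ? 0x1000 : TOY_BAD_DMA;	/* bad handle on failure */
}

static int toy_dma_mapping_error(uint64_t handle)
{
	return handle == TOY_BAD_DMA;
}

/* Map nfrags fragments; fragment fail_at (or -1 for none) fails.
 * Returns the number mapped, stopping at the first mapping error,
 * which mirrors the "goto dma_map_err" unwind in the driver. */
static int toy_map_frags(int nfrags, int fail_at)
{
	int i;

	for (i = 0; i < nfrags; i++) {
		uint64_t des = toy_dma_map(i != fail_at);

		if (toy_dma_mapping_error(des))
			return i;	/* bail before touching the handle */
	}
	return nfrags;
}
```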



[PATCH v2 2/9] net: stmmac: Stop PHY and remove TX timer on error

2017-03-10 Thread Thierry Reding
From: Thierry Reding 

If an error occurs while opening the device, make sure that both the TX
timer and the PHY are properly cleaned up.

Signed-off-by: Thierry Reding 
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c 
b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 4498a3861aa3..7c38c9baf238 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -1821,7 +1821,7 @@ static int stmmac_open(struct net_device *dev)
netdev_err(priv->dev,
   "%s: ERROR: allocating the IRQ %d (error: %d)\n",
   __func__, dev->irq, ret);
-   goto init_error;
+   goto irq_error;
}
 
/* Request the Wake IRQ in case of another line is used for WoL */
@@ -1858,7 +1858,11 @@ static int stmmac_open(struct net_device *dev)
free_irq(priv->wol_irq, dev);
 wolirq_error:
free_irq(dev->irq, dev);
+irq_error:
+   if (dev->phydev)
+   phy_stop(dev->phydev);
 
+   del_timer_sync(&priv->txtimer);
 init_error:
free_dma_desc_resources(priv);
 dma_desc_error:
-- 
2.12.0
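The goto-ladder unwind this patch extends is a standard kernel error-handling shape: each failure jumps to a label that undoes everything acquired so far, in reverse order. A toy sketch (illustrative state, not the driver's real fields):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy of the goto-ladder unwind the patch extends: each failure jumps
 * to a label that releases everything acquired so far, in reverse
 * order of acquisition. All names here are illustrative. */
struct toy_state { bool dma, phy, timer, irq; };

static int toy_open(struct toy_state *s, bool irq_fails)
{
	s->dma = true;			/* alloc DMA descriptor resources */
	s->phy = true;			/* phy_start() */
	s->timer = true;		/* TX timer armed */
	if (irq_fails)
		goto irq_error;		/* the label the patch introduces */
	s->irq = true;
	return 0;

irq_error:
	s->phy = false;			/* phy_stop() */
	s->timer = false;		/* del_timer_sync() */
	/* fall through to what the init_error label would free */
	s->dma = false;			/* free_dma_desc_resources() */
	return -1;
}
```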



Re: [PATCH net-next v2] net: Add sysctl to toggle early demux for tcp and udp

2017-03-10 Thread Tom Herbert
On Thu, Mar 9, 2017 at 9:26 PM, Subash Abhinov Kasiviswanathan
 wrote:
> On 2017-03-09 20:42, Tom Herbert wrote:
>>
>> On Thu, Mar 9, 2017 at 7:31 PM, Subash Abhinov Kasiviswanathan
>>  wrote:
>>>
>>> Certain systems process significant unconnected UDP workloads.
>>> It would be preferable to disable UDP early demux for those systems
>>> and enable it for TCP only.
>>>
>> Presumably you want this for performance reasons. Can you provide some
>> before and after numbers?
>
>
> Hi Tom
>
> We are working on UDPv6 performance issues seen on an Android ARM64 system.
> Adding an early demux handler (link below) for it helped to increase
> performance
> (800Mbps -> 870Mbps). This helps because Android statistics rules do
> multiple
> socket lookup when no socket is associated with the skb.
>
> https://www.mail-archive.com/netdev@vger.kernel.org/msg157003.html
>
> Eric mentioned that server loads usually see more unconnected load and he
> preferred to turn off early demux for UDP, hence this patch. I don't have
> numbers
> for unconnected loads as of now though.
>
Subash,

Okay, now I'm confused. You're saying that when early demux was added
for IPv6 performance improved, but this patch is allowing early demux
to be disabled on the basis that it hurts performance for unconnected
UDP workloads. While it's true that early demux in the case results in
another UDP lookup, Eric's changes to make it lockless have made that
lookup very cheap. So we really need numbers to justify this patch.

Even if the numbers were to show a benefit, we still have the problem
that this creates a bimodal performance characteristic, e.g. what if
the work load were 1/2 connected and 1/2 unconnected in real life, or
what it the user incorrectly guesses the actual workload. Maybe a
deeper solution to investigate is making early demux work with
unconnected sockets.

Tom

> --
> Qualcomm Innovation Center, Inc.
> The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a
> Linux Foundation Collaborative Project


Re: [PATCH v3] {net,IB}/{rxe,usnic}: Utilize generic mac to eui32 function

2017-03-10 Thread Leon Romanovsky
On Wed, Mar 08, 2017 at 03:48:57PM +0200, Yuval Shaia wrote:
> On Wed, Mar 08, 2017 at 09:40:41AM +0200, Leon Romanovsky wrote:
> > On Tue, Mar 07, 2017 at 09:31:58PM +0200, Yuval Shaia wrote:
> > > This logic seems to be duplicated in (at least) three separate files.
> > > Move it to one place so code can be re-use.
> > >
> > > Signed-off-by: Yuval Shaia 
> > > ---
> > > v0 -> v1:
> > >   * Add missing #include
> > >   * Rename to genaddrconf_ifid_eui48
> > > v1 -> v2:
> > >   * Reset eui[0] to default if dev_id is used
> > > v2 -> v3:
> > >   * Add helper function to avoid re-setting eui[0] to default if
> > > dev_id is used
> > > ---
> > >  drivers/infiniband/hw/usnic/usnic_common_util.h | 11 +++
> > >  drivers/infiniband/sw/rxe/rxe_net.c | 11 ++-
> > >  include/net/addrconf.h  | 25 
> > > +++--
> > >  3 files changed, 24 insertions(+), 23 deletions(-)
> >
> > Not promising statistics :)
>
> It is not fair, i also added 4 new blank lines :)
>
> >
> > >
> > > diff --git a/drivers/infiniband/hw/usnic/usnic_common_util.h 
> > > b/drivers/infiniband/hw/usnic/usnic_common_util.h
> > > index b54986d..d91b035 100644
> > > --- a/drivers/infiniband/hw/usnic/usnic_common_util.h
> > > +++ b/drivers/infiniband/hw/usnic/usnic_common_util.h
> > > @@ -34,6 +34,8 @@
> > >  #ifndef USNIC_CMN_UTIL_H
> > >  #define USNIC_CMN_UTIL_H
> > >
> > > +#include 
> > > +
> > >  static inline void
> > >  usnic_mac_to_gid(const char *const mac, char *raw_gid)
> > >  {
> > > @@ -57,14 +59,7 @@ usnic_mac_ip_to_gid(const char *const mac, const 
> > > __be32 inaddr, char *raw_gid)
> > >   raw_gid[1] = 0x80;
> > >   memset(&raw_gid[2], 0, 2);
> > >   memcpy(&raw_gid[4], &inaddr, 4);
> > > - raw_gid[8] = mac[0]^2;
> > > - raw_gid[9] = mac[1];
> > > - raw_gid[10] = mac[2];
> > > - raw_gid[11] = 0xff;
> > > - raw_gid[12] = 0xfe;
> > > - raw_gid[13] = mac[3];
> > > - raw_gid[14] = mac[4];
> > > - raw_gid[15] = mac[5];
> > > + addrconf_addr_eui48(&raw_gid[8], mac);
> > >  }
> > >
> > >  static inline void
> > > diff --git a/drivers/infiniband/sw/rxe/rxe_net.c 
> > > b/drivers/infiniband/sw/rxe/rxe_net.c
> > > index d8610960..ab8ea23 100644
> > > --- a/drivers/infiniband/sw/rxe/rxe_net.c
> > > +++ b/drivers/infiniband/sw/rxe/rxe_net.c
> > > @@ -38,6 +38,7 @@
> > >  #include 
> > >  #include 
> > >  #include 
> > > +#include 
> > >  #include 
> > >  #include 
> > >
> > > @@ -86,18 +87,10 @@ struct rxe_recv_sockets recv_sockets;
> > >
> > >  static __be64 rxe_mac_to_eui64(struct net_device *ndev)
> > >  {
> >
> > It is worth to drop this wrapper completely. The rxe_mac_to_eui64 is
> > called twice in the same file.
>
> hmmm, this is a classic wrapper to adapt an existing API signature to
> usage. And in this case to return __be64 from existing function which
> returns void.
>
> What we can do is the opposite - get rid of rxe_node_guid and rxe_port_guid
> and call directly to rxe_mac_to_eui64. This is based on the assumption that
> node_guid and port_guid are the same in Soft RoCE.
> What do you think?

As long as you will add a meaningful comment in these places.
Right now, the knowledge of node_guid == port_guid in RXE is very clear,
it is worth to preserve it (knowledge).

>
> >
> > > - unsigned char *mac_addr = ndev->dev_addr;
> > >   __be64 eui64;
> > >   unsigned char *dst = (unsigned char *)&eui64;
> > >
> > > - dst[0] = mac_addr[0] ^ 2;
> > > - dst[1] = mac_addr[1];
> > > - dst[2] = mac_addr[2];
> > > - dst[3] = 0xff;
> > > - dst[4] = 0xfe;
> > > - dst[5] = mac_addr[3];
> > > - dst[6] = mac_addr[4];
> > > - dst[7] = mac_addr[5];
> > > + addrconf_addr_eui48(dst, ndev->dev_addr);
> > >
> > >   return eui64;
> > >  }
> > > diff --git a/include/net/addrconf.h b/include/net/addrconf.h
> > > index 17c6fd8..28274ed 100644
> > > --- a/include/net/addrconf.h
> > > +++ b/include/net/addrconf.h
> > > @@ -103,12 +103,25 @@ int addrconf_prefix_rcv_add_addr(struct net *net, 
> > > struct net_device *dev,
> > >u32 addr_flags, bool sllao, bool tokenized,
> > >__u32 valid_lft, u32 prefered_lft);
> > >
> > > +static inline void addrconf_addr_eui48_xor(u8 *eui, const char *const 
> > > addr, bool xor)
> > > +{
> > > + memcpy(eui, addr, 3);
> > > + if (xor)
> > > + eui[0] ^= 2;
> > > + eui[3] = 0xFF;
> > > + eui[4] = 0xFE;
> > > + memcpy(eui + 5, addr + 3, 3);
> > > +}
> > > +
> > > +static inline void addrconf_addr_eui48(u8 *eui, const char *const addr)
> > > +{
> > > + addrconf_addr_eui48_xor(eui, addr, true);
> >
> > Just put your "eui[0] ^= 2" here and remove redundant "if (xor)".
> > > +}
> > > +
> > >  static inline int addrconf_ifid_eui48(u8 *eui, struct net_device *dev)
> > >  {
> > >   if (dev->addr_len != ETH_ALEN)
> > >   return -1;
> > > - memcpy(eui, dev->dev_addr, 3);
> > > - memcpy(eui + 5, dev->dev_addr + 3, 3);
> > >
> > >   /*
> > >* The zSeries OSA network cards can be shared among various
> 
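The transform the proposed helper centralizes is the standard modified EUI-64 construction: copy the first three MAC octets, flip the universal/local bit, insert 0xFF 0xFE, then copy the last three octets (RFC 4291, Appendix A). A standalone re-implementation for illustration:

```c
#include <stdint.h>
#include <string.h>

/* Standalone re-implementation (for illustration only) of what the
 * proposed addrconf_addr_eui48() helper does: build a modified EUI-64
 * interface identifier from a 48-bit MAC address. */
static void eui48_to_eui64(uint8_t eui[8], const uint8_t mac[6])
{
	memcpy(eui, mac, 3);
	eui[0] ^= 2;		/* toggle the universal/local bit */
	eui[3] = 0xFF;		/* fixed filler octets in the middle */
	eui[4] = 0xFE;
	memcpy(eui + 5, mac + 3, 3);
}
```

For example, MAC 00:0c:29:0c:47:d5 yields the interface ID 02:0c:29:ff:fe:0c:47:d5.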

Re: [PATCH net] amd-xgbe: Enable IRQs only if napi_complete_done() is true

2017-03-10 Thread Jeremy Linton

On 03/09/2017 05:48 PM, Tom Lendacky wrote:
> Depending on the hardware, the amd-xgbe driver may use disable_irq_nosync()
> and enable_irq() when an interrupt is received to process Rx packets. If
> the napi_complete_done() return value isn't checked an unbalanced enable
> for the IRQ could result, generating a warning stack trace.
>
> Update the driver to only enable interrupts if napi_complete_done() returns
> true.

I've been running this for a few hours now and haven't seen the warning.
So it appears to be fixed. Thanks!

I guess Dave M picked it up already, but I will toss this in anyway.

Tested-by: Jeremy Linton 

> Reported-by: Jeremy Linton 
> Signed-off-by: Tom Lendacky 
> ---
>  drivers/net/ethernet/amd/xgbe/xgbe-drv.c |   10 ++
>  1 file changed, 2 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c 
> b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
> index 248f60d..ffea985 100644
> --- a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
> +++ b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
> @@ -2272,10 +2272,7 @@ static int xgbe_one_poll(struct napi_struct *napi, int 
> budget)
> 	processed = xgbe_rx_poll(channel, budget);
>
> 	/* If we processed everything, we are done */
> -	if (processed < budget) {
> -		/* Turn off polling */
> -		napi_complete_done(napi, processed);
> -
> +	if ((processed < budget) && napi_complete_done(napi, processed)) {
> 		/* Enable Tx and Rx interrupts */
> 		if (pdata->channel_irq_mode)
> 			xgbe_enable_rx_tx_int(pdata, channel);
> @@ -2317,10 +2314,7 @@ static int xgbe_all_poll(struct napi_struct *napi, int 
> budget)
> 	} while ((processed < budget) && (processed != last_processed));
>
> 	/* If we processed everything, we are done */
> -	if (processed < budget) {
> -		/* Turn off polling */
> -		napi_complete_done(napi, processed);
> -
> +	if ((processed < budget) && napi_complete_done(napi, processed)) {
> 		/* Enable Tx and Rx interrupts */
> 		xgbe_enable_rx_tx_ints(pdata);
> 	}
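The fixed pattern can be modelled with toy names (not the real NAPI API): the interrupt is re-enabled only when the completion call confirms polling really stopped; if NAPI was rescheduled meanwhile, it reports false and the IRQ must stay masked:

```c
#include <stdbool.h>

/* Toy model (illustrative, not the real NAPI API) of the fixed
 * pattern: the device interrupt may only be re-enabled when the
 * completion call confirms polling stopped. */
struct toy_chan { int irq_enables; };

static bool toy_napi_complete_done(bool rescheduled)
{
	return !rescheduled;	/* false => stay in polled mode */
}

static void toy_poll(struct toy_chan *ch, int processed, int budget,
		     bool rescheduled)
{
	/* both conditions now gate the enable, as in the patch */
	if (processed < budget && toy_napi_complete_done(rescheduled))
		ch->irq_enables++;
}
```

With the old code, the enable ran whenever `processed < budget`, even if NAPI had been rescheduled, producing the unbalanced enable_irq() warning.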





Re: [patch net-next 2/2] ipv4: fib: Remove redundant argument

2017-03-10 Thread David Ahern
On 3/10/17 12:56 AM, Jiri Pirko wrote:
> From: Ido Schimmel 
> 
> We always pass the same event type to fib_notify() and
> fib_rules_notify(), so we can safely drop this argument.
> 
> Signed-off-by: Ido Schimmel 
> Signed-off-by: Jiri Pirko 
> ---
>  include/net/ip_fib.h|  9 +++--
>  net/ipv4/fib_notifier.c |  4 ++--
>  net/ipv4/fib_rules.c|  5 ++---
>  net/ipv4/fib_trie.c | 15 ++-
>  4 files changed, 13 insertions(+), 20 deletions(-)
> 

Acked-by: David Ahern 


Re: [patch net-next 1/2] ipv4: fib: Move FIB notification code to a separate file

2017-03-10 Thread David Ahern
On 3/10/17 12:56 AM, Jiri Pirko wrote:
> From: Ido Schimmel 
> 
> Most of the code concerned with the FIB notification chain currently
> resides in fib_trie.c, but this isn't really appropriate, as the FIB
> notification chain is also used for FIB rules.
> 
> Therefore, it makes sense to move the common FIB notification code to a
> separate file and have it export the relevant functions, which can be
> invoked by its different users (e.g., fib_trie.c, fib_rules.c).
> 
> Signed-off-by: Ido Schimmel 
> Signed-off-by: Jiri Pirko 
> ---
>  include/net/ip_fib.h| 15 
>  net/ipv4/Makefile   |  2 +-
>  net/ipv4/fib_notifier.c | 86 +++
>  net/ipv4/fib_rules.c|  9 +
>  net/ipv4/fib_trie.c | 97 
> +
>  5 files changed, 113 insertions(+), 96 deletions(-)
>  create mode 100644 net/ipv4/fib_notifier.c
> 

Acked-by: David Ahern 


[PATCH RFC] virtio_net: fix mergeable bufs error handling

2017-03-10 Thread Michael S. Tsirkin
On xdp error we try to free head_skb without having
initialized it, that's clearly bogus.

Fixes: f600b6905015 ("virtio_net: Add XDP support")
Cc: John Fastabend 
Signed-off-by: Michael S. Tsirkin 
---

I'm cleaning up a bunch of issues in this code, thus
RFC, will test and post it all together.

 drivers/net/virtio_net.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 11773d6..e0fb3707 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -570,7 +570,7 @@ static struct sk_buff *receive_mergeable(struct net_device 
*dev,
u16 num_buf;
struct page *page;
int offset;
-   struct sk_buff *head_skb, *curr_skb;
+   struct sk_buff *head_skb = NULL, *curr_skb;
struct bpf_prog *xdp_prog;
unsigned int truesize;
 
-- 
MST


[PATCH net] act_connmark: avoid crashing on malformed nlattrs with null parms

2017-03-10 Thread Etienne Noss
tcf_connmark_init() does not check whether TCA_CONNMARK_PARMS is present in
its configuration, resulting in a NULL pointer dereference when trying to
access it.

[501099.043007] BUG: unable to handle kernel NULL pointer dereference at 
0004
[501099.043039] IP: [] tcf_connmark_init+0x8b/0x180 
[act_connmark]
...
[501099.044334] Call Trace:
[501099.044345]  [] ? tcf_action_init_1+0x198/0x1b0
[501099.044363]  [] ? tcf_action_init+0xb0/0x120
[501099.044380]  [] ? tcf_exts_validate+0xc4/0x110
[501099.044398]  [] ? u32_set_parms+0xa7/0x270 [cls_u32]
[501099.044417]  [] ? u32_change+0x680/0x87b [cls_u32]
[501099.044436]  [] ? tc_ctl_tfilter+0x4dd/0x8a0
[501099.044454]  [] ? security_capable+0x41/0x60
[501099.044471]  [] ? rtnetlink_rcv_msg+0xe1/0x220
[501099.044490]  [] ? rtnl_newlink+0x870/0x870
[501099.044507]  [] ? netlink_rcv_skb+0xa1/0xc0
[501099.044524]  [] ? rtnetlink_rcv+0x24/0x30
[501099.044541]  [] ? netlink_unicast+0x184/0x230
[501099.044558]  [] ? netlink_sendmsg+0x2f8/0x3b0
[501099.044576]  [] ? sock_sendmsg+0x30/0x40
[501099.044592]  [] ? SYSC_sendto+0xd3/0x150
[501099.044608]  [] ? __do_page_fault+0x2d1/0x510
[501099.044626]  [] ? system_call_fast_compare_end+0xc/0x9b

Signed-off-by: Étienne Noss 
Signed-off-by: Victorien Molle 
---
 net/sched/act_connmark.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/sched/act_connmark.c b/net/sched/act_connmark.c
index ab8062909962..f9bb43c25697 100644
--- a/net/sched/act_connmark.c
+++ b/net/sched/act_connmark.c
@@ -113,6 +113,9 @@ static int tcf_connmark_init(struct net *net, struct nlattr 
*nla,
if (ret < 0)
return ret;
 
+   if (!tb[TCA_CONNMARK_PARMS])
+   return -EINVAL;
+
parm = nla_data(tb[TCA_CONNMARK_PARMS]);
 
if (!tcf_hash_check(tn, parm->index, a, bind)) {
-- 
2.11.0
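The check being added follows a general netlink rule: a parsed attribute table may contain NULL slots for attributes the sender omitted, so required attributes must be tested before the nla_data()-style dereference. A toy sketch with illustrative names:

```c
#include <stddef.h>

/* Toy sketch (illustrative names, not the real netlink API) of the
 * check the patch adds: slot TOY_PARMS may be NULL if the sender
 * omitted the attribute, so it must be tested before dereference. */
enum { TOY_UNSPEC, TOY_PARMS, TOY_MAX };

struct toy_parms { int index; };

static int toy_connmark_init(struct toy_parms *tb[TOY_MAX], int *index_out)
{
	if (!tb[TOY_PARMS])
		return -22;	/* -EINVAL: required attribute missing */
	*index_out = tb[TOY_PARMS]->index;	/* safe: checked above */
	return 0;
}
```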



[net-next v1] vxlan: use appropriate family on L3 miss

2017-03-10 Thread Vincent Bernat
When sending an L3 miss, the family is set to AF_INET even for IPv6. This
confuses userland (e.g. "ip monitor"). Ensure we send the
appropriate family in this case. For an L2 miss, keep using AF_INET.

Signed-off-by: Vincent Bernat 
---
 drivers/net/vxlan.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index e375560cc74e..168257aa8ace 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -276,9 +276,9 @@ static int vxlan_fdb_info(struct sk_buff *skb, struct 
vxlan_dev *vxlan,
send_eth = send_ip = true;
 
if (type == RTM_GETNEIGH) {
-   ndm->ndm_family = AF_INET;
	send_ip = !vxlan_addr_any(&rdst->remote_ip);
send_eth = !is_zero_ether_addr(fdb->eth_addr);
+   ndm->ndm_family = send_ip ? rdst->remote_ip.sa.sa_family : 
AF_INET;
} else
ndm->ndm_family = AF_BRIDGE;
ndm->ndm_state = fdb->state;
-- 
2.11.0



Re: [PATCH/RFC net-next 1/2] flow dissector: ND support

2017-03-10 Thread Jiri Pirko
Fri, Mar 10, 2017 at 04:20:21PM CET, simon.hor...@netronome.com wrote:
>On Fri, Mar 10, 2017 at 03:27:32PM +0100, Jiri Pirko wrote:
>> Fri, Mar 10, 2017 at 03:19:13PM CET, simon.hor...@netronome.com wrote:
>> >On Tue, Feb 21, 2017 at 04:28:10PM +0100, Jiri Pirko wrote:
>> >> Thu, Feb 02, 2017 at 11:37:34AM CET, simon.hor...@netronome.com wrote:
>> >> >Allow dissection of Neighbour Discovery target IP, and source and
>> >> >destination link-layer addresses for neighbour solicitation and
>> >> >advertisement messages.
>> >> >
>> >> >Signed-off-by: Simon Horman 
>> >> >---
>> >> 
>> >> [...]
>> >> 
>> >> >@@ -633,6 +702,18 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
>> >> >  
>> >> > FLOW_DISSECTOR_KEY_ICMP,
>> >> >  target_container);
>> >> > key_icmp->icmp = skb_flow_get_be16(skb, nhoff, data, 
>> >> > hlen);
>> >> >+
>> >> >+if (dissector_uses_key(flow_dissector, 
>> >> >FLOW_DISSECTOR_KEY_ND) &&
>> >> >+ip_proto == IPPROTO_IPV6 && key_icmp->code == 0 &&
>> >> 
>> >> IPPROTO_IPV6 say "IPv6-in-IPv4 tunnelling". Please use "NEXTHDR_IPV6"
>> >> instead.
>> >
>> >Thanks, will do.
>> >
>> >> >+(key_icmp->type == NDISC_NEIGHBOUR_SOLICITATION ||
>> >> >+ key_icmp->type == NDISC_NEIGHBOUR_ADVERTISEMENT)) {
>> >> >+key_nd = 
>> >> >skb_flow_dissector_target(flow_dissector,
>> >> >+   
>> >> >FLOW_DISSECTOR_KEY_ND,
>> >> >+   
>> >> >target_container);
>> >> >+if (!(skb_flow_dissect_nd(skb, key_nd, data, 
>> >> >nhoff,
>> >> >+  hlen, 
>> >> >ipv6_payload_len)))
>> >> >+goto out_bad;
>> >> >+}
>> >> 
>> >> You should put this under "switch (ip_proto) {"
>> >
>> >I see that makes sense in terms of the check against ip_proto.
>> >But I added it here to allow checking against key_icmp->code
>> >and key_icmp->type as well.
>> 
>> Sure. Just add under "switch (ip_proto) {" and call a wrapper nd
>> function from there. In that function, you check dissector_uses_key and
>> other needed things.
>
>Sorry, but I'm still a little unclear on how that interacts with
>the dissection of key_icmp.

you do:
if (key_icmp->code != 0)
return

Inside that function. Or something like that. Up to you. Just look at
__skb_flow_dissect_arp for example. First it checks dissector_uses_key,
then it does other checks.


Re: [PATCH/RFC net-next 1/2] flow dissector: ND support

2017-03-10 Thread Simon Horman
On Fri, Mar 10, 2017 at 03:27:32PM +0100, Jiri Pirko wrote:
> Fri, Mar 10, 2017 at 03:19:13PM CET, simon.hor...@netronome.com wrote:
> >On Tue, Feb 21, 2017 at 04:28:10PM +0100, Jiri Pirko wrote:
> >> Thu, Feb 02, 2017 at 11:37:34AM CET, simon.hor...@netronome.com wrote:
> >> >Allow dissection of Neighbour Discovery target IP, and source and
> >> >destination link-layer addresses for neighbour solicitation and
> >> >advertisement messages.
> >> >
> >> >Signed-off-by: Simon Horman 
> >> >---
> >> 
> >> [...]
> >> 
> >> >@@ -633,6 +702,18 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
> >> >   FLOW_DISSECTOR_KEY_ICMP,
> >> >   target_container);
> >> >  key_icmp->icmp = skb_flow_get_be16(skb, nhoff, data, hlen);
> >> >+
> >> >+ if (dissector_uses_key(flow_dissector, FLOW_DISSECTOR_KEY_ND) &&
> >> >+ ip_proto == IPPROTO_IPV6 && key_icmp->code == 0 &&
> >> 
> >> IPPROTO_IPV6 say "IPv6-in-IPv4 tunnelling". Please use "NEXTHDR_IPV6"
> >> instead.
> >
> >Thanks, will do.
> >
> >> >+ (key_icmp->type == NDISC_NEIGHBOUR_SOLICITATION ||
> >> >+  key_icmp->type == NDISC_NEIGHBOUR_ADVERTISEMENT)) {
> >> >+ key_nd = skb_flow_dissector_target(flow_dissector,
> >> >+
> >> >FLOW_DISSECTOR_KEY_ND,
> >> >+target_container);
> >> >+ if (!(skb_flow_dissect_nd(skb, key_nd, data, nhoff,
> >> >+   hlen, ipv6_payload_len)))
> >> >+ goto out_bad;
> >> >+ }
> >> 
> >> You should put this under "switch (ip_proto) {"
> >
> >I see that makes sense in terms of the check against ip_proto.
> >But I added it here to allow checking against key_icmp->code
> >and key_icmp->type as well.
> 
> Sure. Just add under "switch (ip_proto) {" and call a wrapper nd
> function from there. In that function, you check dissector_uses_key and
> other needed things.

Sorry, but I'm still a little unclear on how that interacts with
the dissection of key_icmp.


Re: [PATCH v2] net: ethernet: aquantia: set net_device mtu when mtu is changed

2017-03-10 Thread David Arcari
Hi,

On 03/09/2017 05:43 PM, Lino Sanfilippo wrote:
> Hi,
> 
> On 09.03.2017 22:03, David Arcari wrote:
>> When the aquantia device mtu is changed the net_device structure is not
>> updated.  As a result the ip command does not properly reflect the mtu 
>> change.
>>
>> Commit 5513e16421cb incorrectly assumed that __dev_set_mtu() was making the
>> assignment ndev->mtu = new_mtu;  This is not true in the case where the 
>> driver
>> has a ndo_change_mtu routine.
>>
>> Fixes: 5513e16421cb ("net: ethernet: aquantia: Fixes for aq_ndev_change_mtu")
>>
>> v2: no longer close/open net-device after mtu change
>>
>> Cc: Pavel Belous 
>> Signed-off-by: David Arcari 
>> ---
>>  drivers/net/ethernet/aquantia/atlantic/aq_main.c | 10 ++
>>  1 file changed, 2 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_main.c 
>> b/drivers/net/ethernet/aquantia/atlantic/aq_main.c
>> index dad6362..bba5ebd 100644
>> --- a/drivers/net/ethernet/aquantia/atlantic/aq_main.c
>> +++ b/drivers/net/ethernet/aquantia/atlantic/aq_main.c
>> @@ -96,15 +96,9 @@ static int aq_ndev_change_mtu(struct net_device *ndev, 
>> int new_mtu)
>>  struct aq_nic_s *aq_nic = netdev_priv(ndev);
>>  int err = aq_nic_set_mtu(aq_nic, new_mtu + ETH_HLEN);
>>  
>> -if (err < 0)
>> -goto err_exit;
>> +if (!err)
>> +ndev->mtu = new_mtu;
>>  
>> -if (netif_running(ndev)) {
>> -aq_ndev_close(ndev);
>> -aq_ndev_open(ndev);
>> -}
>> -
>> -err_exit:
> 
> Removing the restart has nothing to do with the bug you want to fix here, has 
> it?
> I suggest to send a separate patch for this.
> 
> Regards,
> Lino
> 

I'm fine with that.  Pavel does that work for you?

It would mean that the original version of this patch should be applied and
either you or I could send the follow-up patch.

Best,

-Dave


[PATCH] vlan: support ingress/egress priority map flushing

2017-03-10 Thread Thierry Du Tre

When sending packets via a vlan device we can manipulate the priority bits in 
the vlan header (PCP) via a mapping based on tc class value.
Similarly, when packets are received via a vlan device, the PCP value can be 
mapped onto a tc class value, which is available for iptables rules and tc 
queueing disciplines.

One can use the vconfig utility (set_ingress_map/set_egress_map), or any other 
application that calls the vlan ioctl handler, to set both ingress and egress 
mapping entries.

The resulting map can be printed via /proc/net/vlan/<device>, e.g.:
# cat /proc/net/vlan/vlan11
vlan11  VID: 11  REORDER_HDR: 1  dev->priv_flags: 1
 total frames received 52331849
  total bytes received  17451834908
  Broadcast/Multicast Rcvd  1525155

  total frames transmitted 98569270
   total bytes transmitted 144870211289
Device: eth_test
INGRESS priority mappings: 0:0  1:1  2:2  3:3  4:0  5:0  6:0 7:0
 EGRESS priority mappings: 0:7

The current API offers only GET and SET operations, so there is no way to reset 
all entries at once.
This patch adds a FLUSH operation for both the ingress and egress maps, which 
can then be used by vconfig or other applications.

Signed-off-by: Thierry Du Tre 
---
 include/uapi/linux/if_vlan.h |  2 ++
 net/8021q/vlan.c | 16 
 net/8021q/vlan.h |  2 ++
 net/8021q/vlan_dev.c | 23 +++
 4 files changed, 43 insertions(+)

diff --git a/include/uapi/linux/if_vlan.h b/include/uapi/linux/if_vlan.h
index 7e5e6b3..6bdff52 100644
--- a/include/uapi/linux/if_vlan.h
+++ b/include/uapi/linux/if_vlan.h
@@ -20,6 +20,8 @@
 enum vlan_ioctl_cmds {
ADD_VLAN_CMD,
DEL_VLAN_CMD,
+   FLUSH_VLAN_INGRESS_PRIORITY_CMD,
+   FLUSH_VLAN_EGRESS_PRIORITY_CMD,
SET_VLAN_INGRESS_PRIORITY_CMD,
SET_VLAN_EGRESS_PRIORITY_CMD,
GET_VLAN_INGRESS_PRIORITY_CMD,
diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
index 467069b..8988419 100644
--- a/net/8021q/vlan.c
+++ b/net/8021q/vlan.c
@@ -539,6 +539,22 @@ static int vlan_ioctl_handler(struct net *net, void __user 
*arg)
}
 
 	switch (args.cmd) {

+   case FLUSH_VLAN_INGRESS_PRIORITY_CMD:
+   err = -EPERM;
+   if (!capable(CAP_NET_ADMIN))
+   break;
+   vlan_dev_flush_ingress_priority(dev);
+   err = 0;
+   break;
+
+   case FLUSH_VLAN_EGRESS_PRIORITY_CMD:
+   err = -EPERM;
+   if (!capable(CAP_NET_ADMIN))
+   break;
+   vlan_dev_flush_egress_priority(dev);
+   err = 0;
+   break;
+
case SET_VLAN_INGRESS_PRIORITY_CMD:
err = -EPERM;
if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
diff --git a/net/8021q/vlan.h b/net/8021q/vlan.h
index df8bd65..d8d90ca 100644
--- a/net/8021q/vlan.h
+++ b/net/8021q/vlan.h
@@ -97,6 +97,8 @@ static inline struct net_device *vlan_find_dev(struct 
net_device *real_dev,
(i) % VLAN_N_VID)))
 
 /* found in vlan_dev.c */

+void vlan_dev_flush_ingress_priority(const struct net_device *dev);
+void vlan_dev_flush_egress_priority(const struct net_device *dev);
 void vlan_dev_set_ingress_priority(const struct net_device *dev,
   u32 skb_prio, u16 vlan_prio);
 int vlan_dev_set_egress_priority(const struct net_device *dev,
diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index e97ab82..8fd91c3 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -159,6 +159,29 @@ static int vlan_dev_change_mtu(struct net_device *dev, int 
new_mtu)
return 0;
 }
 
+void vlan_dev_flush_ingress_priority(const struct net_device *dev)

+{
+   struct vlan_dev_priv *vlan = vlan_dev_priv(dev);
+
+   memset(vlan->ingress_priority_map, 0, 
sizeof(vlan->ingress_priority_map));
+   vlan->nr_ingress_mappings = 0;
+}
+
+void vlan_dev_flush_egress_priority(const struct net_device *dev)
+{
+   struct vlan_dev_priv *vlan = vlan_dev_priv(dev);
+   struct vlan_priority_tci_mapping *mp;
+   int i;
+
+   for (i = 0; i < ARRAY_SIZE(vlan->egress_priority_map); i++) {
+   while ((mp = vlan->egress_priority_map[i]) != NULL) {
+   vlan->egress_priority_map[i] = mp->next;
+   kfree(mp);
+   }
+   }
+   vlan->nr_egress_mappings = 0;
+}
+
 void vlan_dev_set_ingress_priority(const struct net_device *dev,
   u32 skb_prio, u16 vlan_prio)
 {
--
2.7.4
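The egress-map flush in the patch is a bucketed-linked-list teardown: walk every bucket and free entries until each head pointer is NULL. A standalone toy with illustrative types:

```c
#include <stdlib.h>

/* Toy of the egress-map flush in the patch (illustrative types): the
 * egress map is a small array of singly linked lists, so flushing
 * walks every bucket and frees entries until each head is NULL. */
struct toy_entry { struct toy_entry *next; };

#define TOY_BUCKETS 16

static int toy_flush_egress(struct toy_entry *map[TOY_BUCKETS])
{
	int i, freed = 0;

	for (i = 0; i < TOY_BUCKETS; i++) {
		struct toy_entry *mp;

		while ((mp = map[i]) != NULL) {
			map[i] = mp->next;	/* unlink before freeing */
			free(mp);
			freed++;
		}
	}
	return freed;
}
```

Unlinking before free mirrors the patch exactly; it keeps the bucket head valid at every step, so the flush is safe even if it is interrupted or re-run.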



Re: [PATCH/RFC net-next 1/2] flow dissector: ND support

2017-03-10 Thread Jiri Pirko
Fri, Mar 10, 2017 at 03:19:13PM CET, simon.hor...@netronome.com wrote:
>On Tue, Feb 21, 2017 at 04:28:10PM +0100, Jiri Pirko wrote:
>> Thu, Feb 02, 2017 at 11:37:34AM CET, simon.hor...@netronome.com wrote:
>> >Allow dissection of Neighbour Discovery target IP, and source and
>> >destination link-layer addresses for neighbour solicitation and
>> >advertisement messages.
>> >
>> >Signed-off-by: Simon Horman 
>> >---
>> 
>> [...]
>> 
>> >@@ -633,6 +702,18 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
>> > FLOW_DISSECTOR_KEY_ICMP,
>> > target_container);
>> >key_icmp->icmp = skb_flow_get_be16(skb, nhoff, data, hlen);
>> >+
>> >+   if (dissector_uses_key(flow_dissector, FLOW_DISSECTOR_KEY_ND) &&
>> >+   ip_proto == IPPROTO_IPV6 && key_icmp->code == 0 &&
>> 
>> IPPROTO_IPV6 say "IPv6-in-IPv4 tunnelling". Please use "NEXTHDR_IPV6"
>> instead.
>
>Thanks, will do.
>
>> >+   (key_icmp->type == NDISC_NEIGHBOUR_SOLICITATION ||
>> >+key_icmp->type == NDISC_NEIGHBOUR_ADVERTISEMENT)) {
>> >+   key_nd = skb_flow_dissector_target(flow_dissector,
>> >+  
>> >FLOW_DISSECTOR_KEY_ND,
>> >+  target_container);
>> >+   if (!(skb_flow_dissect_nd(skb, key_nd, data, nhoff,
>> >+ hlen, ipv6_payload_len)))
>> >+   goto out_bad;
>> >+   }
>> 
>> You should put this under "switch (ip_proto) {"
>
>I see that makes sense in terms of the check against ip_proto.
>But I added it here to allow checking against key_icmp->code
>and key_icmp->type as well.

Sure. Just add under "switch (ip_proto) {" and call a wrapper nd
function from there. In that function, you check dissector_uses_key and
other needed things.



Re: [PATCH net] amd-xgbe: Enable IRQs only if napi_complete_done() is true

2017-03-10 Thread Tom Lendacky

On 3/9/2017 8:31 PM, David Miller wrote:
> From: Tom Lendacky 
> Date: Thu, 9 Mar 2017 17:48:23 -0600
>
>> Depending on the hardware, the amd-xgbe driver may use disable_irq_nosync()
>> and enable_irq() when an interrupt is received to process Rx packets. If
>> the napi_complete_done() return value isn't checked an unbalanced enable
>> for the IRQ could result, generating a warning stack trace.
>>
>> Update the driver to only enable interrupts if napi_complete_done() returns
>> true.
>>
>> Reported-by: Jeremy Linton 
>> Signed-off-by: Tom Lendacky 
>
> Applied, thanks.

Thanks David!  The change to napi_complete_done() from void to bool
occurred in 4.10, can you queue this fix up against 4.10 stable?

Thanks,
Tom





Re: [PATCH/RFC net-next 1/2] flow dissector: ND support

2017-03-10 Thread Simon Horman
On Tue, Feb 21, 2017 at 04:28:10PM +0100, Jiri Pirko wrote:
> Thu, Feb 02, 2017 at 11:37:34AM CET, simon.hor...@netronome.com wrote:
> >Allow dissection of Neighbour Discovery target IP, and source and
> >destination link-layer addresses for neighbour solicitation and
> >advertisement messages.
> >
> >Signed-off-by: Simon Horman 
> >---
> 
> [...]
> 
> >@@ -633,6 +702,18 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
> >  FLOW_DISSECTOR_KEY_ICMP,
> >  target_container);
> > key_icmp->icmp = skb_flow_get_be16(skb, nhoff, data, hlen);
> >+
> >+if (dissector_uses_key(flow_dissector, FLOW_DISSECTOR_KEY_ND) &&
> >+ip_proto == IPPROTO_IPV6 && key_icmp->code == 0 &&
> 
> IPPROTO_IPV6 say "IPv6-in-IPv4 tunnelling". Please use "NEXTHDR_IPV6"
> instead.

Thanks, will do.

> >+(key_icmp->type == NDISC_NEIGHBOUR_SOLICITATION ||
> >+ key_icmp->type == NDISC_NEIGHBOUR_ADVERTISEMENT)) {
> >+key_nd = skb_flow_dissector_target(flow_dissector,
> >+   
> >FLOW_DISSECTOR_KEY_ND,
> >+   target_container);
> >+if (!(skb_flow_dissect_nd(skb, key_nd, data, nhoff,
> >+  hlen, ipv6_payload_len)))
> >+goto out_bad;
> >+}
> 
> You should put this under "switch (ip_proto) {"

I see that makes sense in terms of the check against ip_proto.
But I added it here to allow checking against key_icmp->code
and key_icmp->type as well.

