[net-next] i40e(vf): remove i40e_ethtool_stats.h header file

2018-09-07 Thread Jacob Keller
Essentially reverts commit 8fd75c58a09a ("i40e: move ethtool
stats boiler plate code to i40e_ethtool_stats.h", 2018-08-30), and
additionally moves the similar code in i40evf into i40evf_ethtool.c.

The code was intially moved from i40e_ethtool.c into i40e_ethtool_stats.h
as a way of better logically organizing the code. This has two problems.
First, we can't have an inline function with variadic arguments on all
platforms. Second, it gave the appearance that we had plans to share
code between the i40e and i40evf drivers, due to having a near copy of
the contents in the i40evf/i40e_ethtool_stats.h file.

Patches which actually attempt to combine or share code between the i40e
and i40evf drivers have not materialized, and are likely a ways off.

Rather than fixing the one function which causes build issues, just move
this code back into the i40e_ethtool.c and i40evf_ethtool.c files. Note
that we also change these functions back from static inlines to just
statics, since they're no longer in a header file.

We can revisit this if/when work is done to actually attempt to share
code between drivers. Alternatively, this stats code could be made more
generic so that it can be shared across drivers as part of ethtool
kernel work.

Signed-off-by: Jacob Keller 
---
Sorry aboutthe delay, our iwl queue maintainer Jeff is on vacation, and
discussion about how best to resolve this issue was/is ongoing in the
IWL list.

I opted to just move the contents of these files back into
i40e_ethtool.c and i40evf_ethtool.c respectively, as I think it's better
than only moving the single offending function back into the .c file.
This is different from the approach suggested by Wang, Dongsheng
 on Intel Wired LAN, but I think it's a
better approach for now.

I've also kindly asked if Aaron Brown could test this out.

 .../net/ethernet/intel/i40e/i40e_ethtool.c| 219 -
 .../ethernet/intel/i40e/i40e_ethtool_stats.h  | 221 --
 .../intel/i40evf/i40e_ethtool_stats.h | 221 --
 .../ethernet/intel/i40evf/i40evf_ethtool.c| 219 -
 4 files changed, 436 insertions(+), 444 deletions(-)
 delete mode 100644 drivers/net/ethernet/intel/i40e/i40e_ethtool_stats.h
 delete mode 100644 drivers/net/ethernet/intel/i40evf/i40e_ethtool_stats.h

diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c 
b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index d7d3974beca2..87fe2e60602f 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -6,7 +6,224 @@
 #include "i40e.h"
 #include "i40e_diag.h"
 
-#include "i40e_ethtool_stats.h"
+/* ethtool statistics helpers */
+
+/**
+ * struct i40e_stats - definition for an ethtool statistic
+ * @stat_string: statistic name to display in ethtool -S output
+ * @sizeof_stat: the sizeof() the stat, must be no greater than sizeof(u64)
+ * @stat_offset: offsetof() the stat from a base pointer
+ *
+ * This structure defines a statistic to be added to the ethtool stats buffer.
+ * It defines a statistic as offset from a common base pointer. Stats should
+ * be defined in constant arrays using the I40E_STAT macro, with every element
+ * of the array using the same _type for calculating the sizeof_stat and
+ * stat_offset.
+ *
+ * The @sizeof_stat is expected to be sizeof(u8), sizeof(u16), sizeof(u32) or
+ * sizeof(u64). Other sizes are not expected and will produce a WARN_ONCE from
+ * the i40e_add_ethtool_stat() helper function.
+ *
+ * The @stat_string is interpreted as a format string, allowing formatted
+ * values to be inserted while looping over multiple structures for a given
+ * statistics array. Thus, every statistic string in an array should have the
+ * same type and number of format specifiers, to be formatted by variadic
+ * arguments to the i40e_add_stat_string() helper function.
+ **/
+struct i40e_stats {
+   char stat_string[ETH_GSTRING_LEN];
+   int sizeof_stat;
+   int stat_offset;
+};
+
+/* Helper macro to define an i40e_stat structure with proper size and type.
+ * Use this when defining constant statistics arrays. Note that @_type expects
+ * only a type name and is used multiple times.
+ */
+#define I40E_STAT(_type, _name, _stat) { \
+   .stat_string = _name, \
+   .sizeof_stat = FIELD_SIZEOF(_type, _stat), \
+   .stat_offset = offsetof(_type, _stat) \
+}
+
+/* Helper macro for defining some statistics directly copied from the netdev
+ * stats structure.
+ */
+#define I40E_NETDEV_STAT(_net_stat) \
+   I40E_STAT(struct rtnl_link_stats64, #_net_stat, _net_stat)
+
+/* Helper macro for defining some statistics related to queues */
+#define I40E_QUEUE_STAT(_name, _stat) \
+   I40E_STAT(struct i40e_ring, _name, _stat)
+
+/* Stats associated with a Tx or Rx ring */
+static const struct i40e_stats i40e_gstrings_queue_stats[] = {
+   I40E_QUEUE_STAT("%s-%u.packets", stats.packets),

[net v2] sch_fq_codel: zero q->flows_cnt when fq_codel_init fails

2018-07-10 Thread Jacob Keller
When fq_codel_init fails, qdisc_create_dflt will cleanup by using
qdisc_destroy. This function calls the ->reset() op prior to calling the
->destroy() op.

Unfortunately, during the failure flow for sch_fq_codel, the ->flows
parameter is not initialized, so the fq_codel_reset function will null
pointer dereference.

   kernel: BUG: unable to handle kernel NULL pointer dereference at 
0008
   kernel: IP: fq_codel_reset+0x58/0xd0 [sch_fq_codel]
   kernel: PGD 0 P4D 0
   kernel: Oops:  [#1] SMP PTI
   kernel: Modules linked in: i40iw i40e(OE) xt_CHECKSUM iptable_mangle 
ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat 
nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack tun bridge stp llc 
devlink ebtable_filter ebtables ip6table_filter ip6_tables rpcrdma ib_isert 
iscsi_target_mod sunrpc ib_iser libiscsi scsi_transport_iscsi ib_srpt 
target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_ucm ib_uverbs 
ib_umad rdma_cm ib_cm iw_cm intel_rapl sb_edac x86_pkg_temp_thermal 
intel_powerclamp coretemp kvm irqbypass crct10dif_pclmul crc32_pclmul 
ghash_clmulni_intel intel_cstate iTCO_wdt iTCO_vendor_support intel_uncore 
ib_core intel_rapl_perf mei_me mei joydev i2c_i801 lpc_ich ioatdma shpchp wmi 
sch_fq_codel xfs libcrc32c mgag200 ixgbe drm_kms_helper isci ttm firewire_ohci
   kernel:  mdio drm igb libsas crc32c_intel firewire_core ptp pps_core 
scsi_transport_sas crc_itu_t dca i2c_algo_bit ipmi_si ipmi_devintf 
ipmi_msghandler [last unloaded: i40e]
   kernel: CPU: 10 PID: 4219 Comm: ip Tainted: G   OE
4.16.13custom-fq-codel-test+ #3
   kernel: Hardware name: Intel Corporation S2600CO/S2600CO, BIOS 
SE5C600.86B.02.05.0004.051120151007 05/11/2015
   kernel: RIP: 0010:fq_codel_reset+0x58/0xd0 [sch_fq_codel]
   kernel: RSP: 0018:bfbf4c1fb620 EFLAGS: 00010246
   kernel: RAX: 0400 RBX:  RCX: 05b9
   kernel: RDX:  RSI: 9d03264a60c0 RDI: 9cfd17b31c00
   kernel: RBP: 0001 R08: 000260c0 R09: b679c3e9
   kernel: R10: f1dab06a0e80 R11: 9cfd163af800 R12: 9cfd17b31c00
   kernel: R13: 0001 R14: 9cfd153de600 R15: 0001
   kernel: FS:  7fdec2f92800() GS:9d032648() 
knlGS:
   kernel: CS:  0010 DS:  ES:  CR0: 80050033
   kernel: CR2: 0008 CR3: 000c1956a006 CR4: 000606e0
   kernel: Call Trace:
   kernel:  qdisc_destroy+0x56/0x140
   kernel:  qdisc_create_dflt+0x8b/0xb0
   kernel:  mq_init+0xc1/0xf0
   kernel:  qdisc_create_dflt+0x5a/0xb0
   kernel:  dev_activate+0x205/0x230
   kernel:  __dev_open+0xf5/0x160
   kernel:  __dev_change_flags+0x1a3/0x210
   kernel:  dev_change_flags+0x21/0x60
   kernel:  do_setlink+0x660/0xdf0
   kernel:  ? down_trylock+0x25/0x30
   kernel:  ? xfs_buf_trylock+0x1a/0xd0 [xfs]
   kernel:  ? rtnl_newlink+0x816/0x990
   kernel:  ? _xfs_buf_find+0x327/0x580 [xfs]
   kernel:  ? _cond_resched+0x15/0x30
   kernel:  ? kmem_cache_alloc+0x20/0x1b0
   kernel:  ? rtnetlink_rcv_msg+0x200/0x2f0
   kernel:  ? rtnl_calcit.isra.30+0x100/0x100
   kernel:  ? netlink_rcv_skb+0x4c/0x120
   kernel:  ? netlink_unicast+0x19e/0x260
   kernel:  ? netlink_sendmsg+0x1ff/0x3c0
   kernel:  ? sock_sendmsg+0x36/0x40
   kernel:  ? ___sys_sendmsg+0x295/0x2f0
   kernel:  ? ebitmap_cmp+0x6d/0x90
   kernel:  ? dev_get_by_name_rcu+0x73/0x90
   kernel:  ? skb_dequeue+0x52/0x60
   kernel:  ? __inode_wait_for_writeback+0x7f/0xf0
   kernel:  ? bit_waitqueue+0x30/0x30
   kernel:  ? fsnotify_grab_connector+0x3c/0x60
   kernel:  ? __sys_sendmsg+0x51/0x90
   kernel:  ? do_syscall_64+0x74/0x180
   kernel:  ? entry_SYSCALL_64_after_hwframe+0x3d/0xa2
   kernel: Code: 00 00 48 89 87 00 02 00 00 8b 87 a0 01 00 00 85 c0 0f 84 84 00 
00 00 31 ed 48 63 dd 83 c5 01 48 c1 e3 06 49 03 9c 24 90 01 00 00 <48> 8b 73 08 
48 8b 3b e8 6c 9a 4f f6 48 8d 43 10 48 c7 03 00 00
   kernel: RIP: fq_codel_reset+0x58/0xd0 [sch_fq_codel] RSP: bfbf4c1fb620
   kernel: CR2: 0008
   kernel: ---[ end trace e81a62bede66274e ]---

This is caused because flows_cnt is non-zero, but flows hasn't been
initialized. fq_codel_init has left the private data in a partially
initialized state.

To fix this, reset flows_cnt to 0 when we fail to initialize.
Additionally, to make the state more consistent, also cleanup the flows
pointer when the allocation of backlogs fails.

This fixes the NULL pointer dereference, since both the for-loop and
memset in fq_codel_reset will be no-ops when flow_cnt is zero.

Signed-off-by: Jacob Keller 
---
 net/sched/sch_fq_codel.c | 25 ++---
 1 file changed, 18 insertions(+), 7 deletions(-)

diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
index cd2e0e342fb6..6c0a9d5dbf94 100644
--- a/net/sched/sch_fq_codel.c
+++ b/net/sched/sch_fq_codel.c
@@ -479,24 +479,28 @@ static int fq_codel_init(struct Qdisc *sch, struct nlattr 

[net] sch_fq_codel: zero q->flows_cnt when fq_codel_init fails

2018-07-09 Thread Jacob Keller
When fq_codel_init fails, qdisc_create_dflt will cleanup by using
qdisc_destroy. This function calls the ->reset() op prior to calling the
->destroy() op.

Unfortunately, during the failure flow for sch_fq_codel, the ->flows
parameter is not initialized, so the fq_codel_reset function will null
pointer dereference.

   kernel: BUG: unable to handle kernel NULL pointer dereference at 
0008
   kernel: IP: fq_codel_reset+0x58/0xd0 [sch_fq_codel]
   kernel: PGD 0 P4D 0
   kernel: Oops:  [#1] SMP PTI
   kernel: Modules linked in: i40iw i40e(OE) xt_CHECKSUM iptable_mangle 
ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat 
nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack tun bridge stp llc 
devlink ebtable_filter ebtables ip6table_filter ip6_tables rpcrdma ib_isert 
iscsi_target_mod sunrpc ib_iser libiscsi scsi_transport_iscsi ib_srpt 
target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_ucm ib_uverbs 
ib_umad rdma_cm ib_cm iw_cm intel_rapl sb_edac x86_pkg_temp_thermal 
intel_powerclamp coretemp kvm irqbypass crct10dif_pclmul crc32_pclmul 
ghash_clmulni_intel intel_cstate iTCO_wdt iTCO_vendor_support intel_uncore 
ib_core intel_rapl_perf mei_me mei joydev i2c_i801 lpc_ich ioatdma shpchp wmi 
sch_fq_codel xfs libcrc32c mgag200 ixgbe drm_kms_helper isci ttm firewire_ohci
   kernel:  mdio drm igb libsas crc32c_intel firewire_core ptp pps_core 
scsi_transport_sas crc_itu_t dca i2c_algo_bit ipmi_si ipmi_devintf 
ipmi_msghandler [last unloaded: i40e]
   kernel: CPU: 10 PID: 4219 Comm: ip Tainted: G   OE
4.16.13custom-fq-codel-test+ #3
   kernel: Hardware name: Intel Corporation S2600CO/S2600CO, BIOS 
SE5C600.86B.02.05.0004.051120151007 05/11/2015
   kernel: RIP: 0010:fq_codel_reset+0x58/0xd0 [sch_fq_codel]
   kernel: RSP: 0018:bfbf4c1fb620 EFLAGS: 00010246
   kernel: RAX: 0400 RBX:  RCX: 05b9
   kernel: RDX:  RSI: 9d03264a60c0 RDI: 9cfd17b31c00
   kernel: RBP: 0001 R08: 000260c0 R09: b679c3e9
   kernel: R10: f1dab06a0e80 R11: 9cfd163af800 R12: 9cfd17b31c00
   kernel: R13: 0001 R14: 9cfd153de600 R15: 0001
   kernel: FS:  7fdec2f92800() GS:9d032648() 
knlGS:
   kernel: CS:  0010 DS:  ES:  CR0: 80050033
   kernel: CR2: 0008 CR3: 000c1956a006 CR4: 000606e0
   kernel: Call Trace:
   kernel:  qdisc_destroy+0x56/0x140
   kernel:  qdisc_create_dflt+0x8b/0xb0
   kernel:  mq_init+0xc1/0xf0
   kernel:  qdisc_create_dflt+0x5a/0xb0
   kernel:  dev_activate+0x205/0x230
   kernel:  __dev_open+0xf5/0x160
   kernel:  __dev_change_flags+0x1a3/0x210
   kernel:  dev_change_flags+0x21/0x60
   kernel:  do_setlink+0x660/0xdf0
   kernel:  ? down_trylock+0x25/0x30
   kernel:  ? xfs_buf_trylock+0x1a/0xd0 [xfs]
   kernel:  ? rtnl_newlink+0x816/0x990
   kernel:  ? _xfs_buf_find+0x327/0x580 [xfs]
   kernel:  ? _cond_resched+0x15/0x30
   kernel:  ? kmem_cache_alloc+0x20/0x1b0
   kernel:  ? rtnetlink_rcv_msg+0x200/0x2f0
   kernel:  ? rtnl_calcit.isra.30+0x100/0x100
   kernel:  ? netlink_rcv_skb+0x4c/0x120
   kernel:  ? netlink_unicast+0x19e/0x260
   kernel:  ? netlink_sendmsg+0x1ff/0x3c0
   kernel:  ? sock_sendmsg+0x36/0x40
   kernel:  ? ___sys_sendmsg+0x295/0x2f0
   kernel:  ? ebitmap_cmp+0x6d/0x90
   kernel:  ? dev_get_by_name_rcu+0x73/0x90
   kernel:  ? skb_dequeue+0x52/0x60
   kernel:  ? __inode_wait_for_writeback+0x7f/0xf0
   kernel:  ? bit_waitqueue+0x30/0x30
   kernel:  ? fsnotify_grab_connector+0x3c/0x60
   kernel:  ? __sys_sendmsg+0x51/0x90
   kernel:  ? do_syscall_64+0x74/0x180
   kernel:  ? entry_SYSCALL_64_after_hwframe+0x3d/0xa2
   kernel: Code: 00 00 48 89 87 00 02 00 00 8b 87 a0 01 00 00 85 c0 0f 84 84 00 
00 00 31 ed 48 63 dd 83 c5 01 48 c1 e3 06 49 03 9c 24 90 01 00 00 <48> 8b 73 08 
48 8b 3b e8 6c 9a 4f f6 48 8d 43 10 48 c7 03 00 00
   kernel: RIP: fq_codel_reset+0x58/0xd0 [sch_fq_codel] RSP: bfbf4c1fb620
   kernel: CR2: 0008
   kernel: ---[ end trace e81a62bede66274e ]---

This is caused because flows_cnt is non-zero, but flows hasn't been
initialized. fq_codel_init has left the private data in a partially
initialized state.

To fix this, reset flows_cnt to 0 when we fail to initialize.
Additionally, to make the state more consistent, also cleanup the flows
pointer when the allocation of backlogs fails.

This fixes the NULL pointer dereference, since both the for-loop and
memset in fq_codel_reset will be no-ops when flow_cnt is zero.

Signed-off-by: Jacob Keller 
---
 net/sched/sch_fq_codel.c | 25 ++---
 1 file changed, 18 insertions(+), 7 deletions(-)

diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
index cd2e0e342fb6..df7390b4185c 100644
--- a/net/sched/sch_fq_codel.c
+++ b/net/sched/sch_fq_codel.c
@@ -479,24 +479,28 @@ static int fq_codel_init(struct Qdisc *sch, struct nlattr 

Re: [PATCH v5 03/14] PCI: Add pcie_bandwidth_capable() to compute max supported link bandwidth

2018-04-02 Thread Jacob Keller
On Mon, Apr 2, 2018 at 7:05 AM, Bjorn Helgaas  wrote:
> +/* PCIe speed to Mb/s reduced by encoding overhead */
> +#define PCIE_SPEED2MBS_ENC(speed) \
> +   ((speed) == PCIE_SPEED_16_0GT ? (16000*(128/130)) : \
> +(speed) == PCIE_SPEED_8_0GT  ?  (8000*(128/130)) : \
> +(speed) == PCIE_SPEED_5_0GT  ?  (5000*(8/10)) : \
> +(speed) == PCIE_SPEED_2_5GT  ?  (2500*(8/10)) : \
> +0)
> +

Should this be "(speed * x ) / y" instead? wouldn't they calculate
128/130 and truncate that to zero before multiplying by the speed? Or
are compilers smart enough to do this the other way to avoid the
losses?

Thanks,
Jake


[RFC PATCH] net: limit maximum number of packets to mark with xmit_more

2017-08-25 Thread Jacob Keller
Under some circumstances, such as with many stacked devices, it is
possible that dev_hard_start_xmit will bundle many packets together, and
mark them all with xmit_more.

Most drivers respond to xmit_more by skipping tail bumps on packet
rings, or similar behavior as long as xmit_more is set. This is
a performance win since it means drivers can avoid notifying hardware of
new packets repeat daily, and thus avoid wasting unnecessary PCIe or other
bandwidth.

This use of xmit_more comes with a trade off because bundling too many
packets can increase latency of the Tx packets. To avoid this, we should
limit the maximum number of packets with xmit_more.

Driver authors could modify their drivers to check for some determined
limit, but this requires all drivers to be modified in order to gain
advantage.

Instead, add a sysctl "xmit_more_max" which can be used to configure the
maximum number of xmit_more skbs to send in a sequence. This ensures
that all drivers benefit, and allows system administrators the option to
tune the value to their environment.

Signed-off-by: Jacob Keller <jacob.e.kel...@intel.com>
---

Stray thoughts and further questions

Is this the right approach? Did I miss any other places where we should
limit? Does the limit make sense? Should it instead be a per-device
tuning nob instead of a global? Is 32 a good default?

 Documentation/sysctl/net.txt |  6 ++
 include/linux/netdevice.h|  2 ++
 net/core/dev.c   | 10 +-
 net/core/sysctl_net_core.c   |  7 +++
 4 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/Documentation/sysctl/net.txt b/Documentation/sysctl/net.txt
index b67044a2575f..3d995e8f4448 100644
--- a/Documentation/sysctl/net.txt
+++ b/Documentation/sysctl/net.txt
@@ -230,6 +230,12 @@ netdev_max_backlog
 Maximum number  of  packets,  queued  on  the  INPUT  side, when the interface
 receives packets faster than kernel can process them.
 
+xmit_more_max
+-
+
+Maximum number of packets in a row to mark with skb->xmit_more. A value of zero
+indicates no limit.
+
 netdev_rss_key
 --
 
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index c5475b37a631..6341452aed09 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3321,6 +3321,8 @@ void dev_queue_xmit_nit(struct sk_buff *skb, struct 
net_device *dev);
 extern int netdev_budget;
 extern unsigned intnetdev_budget_usecs;
 
+extern unsigned int sysctl_xmit_more_max;
+
 /* Called by rtnetlink.c:rtnl_unlock() */
 void netdev_run_todo(void);
 
diff --git a/net/core/dev.c b/net/core/dev.c
index 270b54754821..d9946d29c3a5 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2983,12 +2983,19 @@ struct sk_buff *dev_hard_start_xmit(struct sk_buff 
*first, struct net_device *de
 {
struct sk_buff *skb = first;
int rc = NETDEV_TX_OK;
+   int xmit_count = 0;
+   bool more = true;
 
while (skb) {
struct sk_buff *next = skb->next;
 
+   if (sysctl_xmit_more_max)
+   more = xmit_count++ < sysctl_xmit_more_max;
+   if (!more)
+   xmit_count = 0;
+
skb->next = NULL;
-   rc = xmit_one(skb, dev, txq, next != NULL);
+   rc = xmit_one(skb, dev, txq, more && next != NULL);
if (unlikely(!dev_xmit_complete(rc))) {
skb->next = next;
goto out;
@@ -3523,6 +3530,7 @@ EXPORT_SYMBOL(netdev_max_backlog);
 int netdev_tstamp_prequeue __read_mostly = 1;
 int netdev_budget __read_mostly = 300;
 unsigned int __read_mostly netdev_budget_usecs = 2000;
+unsigned int __read_mostly sysctl_xmit_more_max = 32;
 int weight_p __read_mostly = 64;   /* old backlog weight */
 int dev_weight_rx_bias __read_mostly = 1;  /* bias for backlog weight */
 int dev_weight_tx_bias __read_mostly = 1;  /* bias for output_queue quota */
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index b7cd9aafe99e..6950e702e101 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -460,6 +460,13 @@ static struct ctl_table net_core_table[] = {
.proc_handler   = proc_dointvec_minmax,
.extra1 = ,
},
+   {
+   .procname   = "xmit_more_max",
+   .data   = _xmit_more_max,
+   .maxlen = sizeof(unsigned int),
+   .mode   = 0644,
+   .prox_handler   = proc_dointvec
+   },
{ }
 };
 
-- 
2.14.1.436.g33e61a4f0239



[PATCH v2] i40e/i40evf: fix out-of-bounds read of cpumask

2017-08-22 Thread Jacob Keller
When responding to an affinity hint we directly copied a cpumask value,
intsead of using cpumask_copy. According to cpumask.h this is not
correct because cpumask_t is only guaranteed to have enough space for
the number of CPUs in the system, and may not be as big as we expect.
Thus a direct copy results in an out-of-bound read and potentially
a crash if the pages are aligned just right. This will be easily
detected on a kernel with KASAN enabled:

KASAN reports:
[   25.242312] BUG: KASAN: slab-out-of-bounds in 
i40e_irq_affinity_notify+0x30/0x50 [i40e] at addr 880462eea960
[   25.242315] Read of size 1024 by task kworker/2:1/170
[   25.242322] CPU: 2 PID: 170 Comm: kworker/2:1 Not tainted 
4.11.0-22.el7a.x86_64 #1
[   25.242325] Hardware name: HP ProLiant DL380 Gen9, BIOS P89 05/06/2015
[   25.242336] Workqueue: events irq_affinity_notify
[   25.242340] Call Trace:
[   25.242350]  dump_stack+0x63/0x8d
[   25.242358]  kasan_object_err+0x21/0x70
[   25.242364]  kasan_report+0x288/0x540
[   25.242397]  ? i40e_irq_affinity_notify+0x30/0x50 [i40e]
[   25.242403]  check_memory_region+0x13c/0x1a0
[   25.242408]  __asan_loadN+0xf/0x20
[   25.242440]  i40e_irq_affinity_notify+0x30/0x50 [i40e]
[   25.242446]  irq_affinity_notify+0x1b4/0x230
[   25.242452]  ? irq_set_affinity_notifier+0x130/0x130
[   25.242457]  ? kasan_slab_free+0x89/0xc0
[   25.242466]  process_one_work+0x32f/0x6f0
[   25.242472]  worker_thread+0x89/0x770
[   25.242481]  ? pci_mmcfg_check_reserved+0xc0/0xc0
[   25.242488]  kthread+0x18c/0x1e0
[   25.242493]  ? process_one_work+0x6f0/0x6f0
[   25.242499]  ? kthread_create_on_node+0xc0/0xc0
[   25.242506]  ret_from_fork+0x2c/0x40
[   25.242511] Object at 880462eea960, in cache kmalloc-8 size: 8
[   25.242513] Allocated:
[   25.242514] PID = 170
[   25.242522]  save_stack_trace+0x1b/0x20
[   25.242529]  save_stack+0x46/0xd0
[   25.242533]  kasan_kmalloc+0xad/0xe0
[   25.242537]  __kmalloc_node+0x12c/0x2b0
[   25.242542]  alloc_cpumask_var_node+0x3c/0x60
[   25.242546]  alloc_cpumask_var+0xe/0x10
[   25.242550]  irq_affinity_notify+0x94/0x230
[   25.242555]  process_one_work+0x32f/0x6f0
[   25.242559]  worker_thread+0x89/0x770
[   25.242564]  kthread+0x18c/0x1e0
[   25.242568]  ret_from_fork+0x2c/0x40
[   25.242569] Freed:
[   25.242570] PID = 0
[   25.242572] (stack is not available)
[   25.242573] Memory state around the buggy address:
[   25.242578]  880462eea800: fc fc 00 fc fc 00 fc fc 00 fc fc 00 fc fc fb 
fc
[   25.242582]  880462eea880: fc fb fc fc fb fc fc 00 fc fc 00 fc fc 00 fc 
fc
[   25.242586] >880462eea900: 00 fc fc 00 fc fc 00 fc fc fb fc fc 00 fc fc 
fc
[   25.242588]   ^
[   25.242592]  880462eea980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc 
fc
[   25.242596]  880462eeaa00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc 
fc
[   25.242597] 
==

Fixes: 96db776a3682 ("i40e/i40evf: fix interrupt affinity bug", 2016-09-14)
Signed-off-by: Jacob Keller <jacob.e.kel...@intel.com>
Cc: sta...@vger.kernel.org # 4.10+
---
This updates the commit message for the original fix, and indicates that
it fixes a potential crash, as well as tagged the commit for stable and
added a Fixes to indicate which commit this fixes.

 drivers/net/ethernet/intel/i40e/i40e_main.c | 2 +-
 drivers/net/ethernet/intel/i40evf/i40evf_main.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c 
b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 397f1bcaed3e..50a7260b32c2 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -3450,7 +3450,7 @@ static void i40e_irq_affinity_notify(struct 
irq_affinity_notify *notify,
struct i40e_q_vector *q_vector =
container_of(notify, struct i40e_q_vector, affinity_notify);
 
-   q_vector->affinity_mask = *mask;
+   cpumask_copy(_vector->affinity_mask, mask);
 }
 
 /**
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c 
b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
index 1ffd55e06a49..87175a14740e 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
@@ -520,7 +520,7 @@ static void i40evf_irq_affinity_notify(struct 
irq_affinity_notify *notify,
struct i40e_q_vector *q_vector =
container_of(notify, struct i40e_q_vector, affinity_notify);
 
-   q_vector->affinity_mask = *mask;
+   cpumask_copy(_vector->affinity_mask, mask);
 }
 
 /**
-- 
2.14.1.323.g792488f9a5e1



[RFC PATCH] net: don't set __LINK_STATE_START until after dev->open() call

2017-08-07 Thread Jacob Keller
Fix an issue with relying on netif_running() which could be true during
when dev->open() handler is being called, even if it would exit with
a failure. This ensures the state does not get set and removed with
a narrow race for other callers to read it as open when infact it never
finished opening.

Signed-off-by: Jacob Keller <jacob.e.kel...@intel.com>
---
I found this as a result of debugging a race condition in the i40evf
driver, in which we assumed that netif_running() would not be true until
after dev->open() had been called and succeeded. Unfortunately we can't
hold the rtnl_lock() while checking netif_running() because it would
cause a deadlock between our reset task and our ndo_open handler.

I am wondering whether the proposed change is acceptable here, or
whether some ndo_open handlers rely on __LINK_STATE_START being true
prior to their being called?

 net/core/dev.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 1d75499add72..11953af90427 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1362,8 +1362,6 @@ static int __dev_open(struct net_device *dev)
if (ret)
return ret;
 
-   set_bit(__LINK_STATE_START, >state);
-
if (ops->ndo_validate_addr)
ret = ops->ndo_validate_addr(dev);
 
@@ -1372,9 +1370,8 @@ static int __dev_open(struct net_device *dev)
 
netpoll_poll_enable(dev);
 
-   if (ret)
-   clear_bit(__LINK_STATE_START, >state);
-   else {
+   if (!ret)
+   set_bit(__LINK_STATE_START, >state);
dev->flags |= IFF_UP;
dev_set_rx_mode(dev);
dev_activate(dev);
-- 
2.14.0.rc1.251.g593d8d6362ce



[PATCH] i40e: don't truncate match_method assignment

2016-12-09 Thread Jacob Keller
The .match_method field is a u8, so we shouldn't be casting to a u16,
and because it is only one byte, we do not need to byte swap anything.
Just assign the value directly. This avoids issues on Big Endian
architectures which would have byte swapped and then incorrectly
truncated the value.

Signed-off-by: Jacob Keller <jacob.e.kel...@intel.com>
Cc: Stephen Rothwell <s...@canb.auug.org.au>
Cc: Bimmy Pujari <bimmy.puj...@intel.com>
---
Not sure if this was already in Jeff's queue, but since it's an obvious
fix for the issue found by Stephen, I thought I'd send it out now just
to make sure. Thanks for catching this, and sorry we didn't find the fix
earlier.

 drivers/net/ethernet/intel/i40e/i40e_main.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c 
b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 69a51a4119d6..6ccf18464339 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -2257,8 +2257,7 @@ int i40e_sync_vsi_filters(struct i40e_vsi *vsi)
}
add_list[num_add].queue_number = 0;
/* set invalid match method for later detection */
-   add_list[num_add].match_method =
-   cpu_to_le16((u16)I40E_AQC_MM_ERR_NO_RES);
+   add_list[num_add].match_method = I40E_AQC_MM_ERR_NO_RES;
cmd_flags |= I40E_AQC_MACVLAN_ADD_PERFECT_MATCH;
add_list[num_add].flags = cpu_to_le16(cmd_flags);
num_add++;
-- 
2.11.0.rc2.152.g4d04e67



[PATCH RFC v1 1/2] rxclass: use ethtool_get_flow_spec_ring to display queue and VF

2016-12-06 Thread Jacob Keller
Recent kernels have made the ring_cookie store both the Queue index as
well as a VF identifier. This allows for drivers to direct traffic to
specific VF queues, without having the user have to understand the
physical queue layout. Add support to display this notation so that
users do not have to manually parse the value.

Signed-off-by: Jacob Keller <jacob.e.kel...@intel.com>
---
 rxclass.c | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/rxclass.c b/rxclass.c
index c7bfebaf6e22..7f0e765d3b47 100644
--- a/rxclass.c
+++ b/rxclass.c
@@ -247,11 +247,19 @@ static void rxclass_print_nfc_rule(struct 
ethtool_rx_flow_spec *fsp)
 
rxclass_print_nfc_spec_ext(fsp);
 
-   if (fsp->ring_cookie != RX_CLS_FLOW_DISC)
-   fprintf(stdout, "\tAction: Direct to queue %llu\n",
-   fsp->ring_cookie);
-   else
+   if (fsp->ring_cookie != RX_CLS_FLOW_DISC) {
+   u64 vf = ethtool_get_flow_spec_ring_vf(fsp->ring_cookie);
+   u64 queue = ethtool_get_flow_spec_ring(fsp->ring_cookie);
+
+   if (vf)
+   fprintf(stdout, "\tAction: Direct to queue %llu\n",
+   queue);
+   else
+   fprintf(stdout, "\tAction: Direct to VF %llu queue 
%llu\n",
+   vf, queue);
+   } else {
fprintf(stdout, "\tAction: Drop\n");
+   }
 
fprintf(stdout, "\n");
 }
-- 
2.11.0.rc2.152.g4d04e67



[PATCH RFC v1 0/2] ethtool: add support for VF in ring_cookie

2016-12-06 Thread Jacob Keller
The ring_cookie value for an ntuple filter is used by the driver to
determine what queue to direct the traffic towards. Since the main
function has cnotrol over all queues it is possible to use this to
direct traffic to a queue on a virtual function rather than a queue on
the physical function. However, this was cumbersome to do if you did not
know exactly how the device laid out virtual function queues.

To alleviate this, the kernel side of the ethtool API gained a couple of
functions used to partition the ring_cookie value into a queue index,
and a virtual function index. It was assumed that no device was likely
to have more than 2^32 queues, and currently no device has more than 2^8
virtual functions, so the split was chosen such that the lower 32 bits
of the ring_cookie represent the queue, and the next 8 bits would
represent a virtual function id.

This is useful, but lacked support on the ethtool command line side. It
is possible to simply use hex notation, and to "know" that the driver
works in this manner. However, this makes use of and knowledge of the
feature not widespread.

This series attempts to fix that with two patches. First, we make use of
the access functions when displaying the queue. Since no device
currently supports queues beyond 2^32 it is unlikely that any driver
which does not honor this partitioning of the ring_cookie would have its
index incorrectly interpreted. This makes it so that the user will see
the queue index and virtual function index split apart instead of being
displayed as a combined base-10 number.

Additionally, the second patch adds support for a new notation for the
action. Instead of just using a number, we add support for the special
syntax "X/Y" where X would be a number from 0-256 which is the virtual
function index, while Y would be a number from 0 to 2^32 which is the
queue index. This makes it much easier to write the encoded ring_cookie
value.

This series is sent as an RFC so that others may comment on the design
of the new syntax, and offer alternative suggestions. Some of the ones I
have considered:

a) just leave the second part alone, and display the VF+queue split, but
do not provide an easier syntax for writing the split value. This is
easy to do, but leaves the feature as a sort of hidden gem.
Additionally, the user would most easily do this by writing it as hex
notation such as

  action 0x10005

However, this is cumbersome. It can be partially alleviated by using
shell scripting on top of ethtool, but again is somewhat of a pain.

b) actually adding a new key work so that you would do something like
"queue x" and "vf y" or similar. However, this is much more difficult to
correctly implement. The current implementation of handling the values
assumes that each type argument is followed by a single argument which
fits into a well defined section of the structure. Because the structure
is not specified as le64 or be64, we can't use a union or similar to
make it correct (since placement in the union ordering would depend on
byte ordering of the field!) so we would have to re-architect the
handling of values to allow multiple keywords to combine into a single
value. This seemed incredibly cumbersome.

The disadvantage of the new syntax is that it *is* new syntax that we
have to parse ourselves, and that may not be obvious to users. However,
we can document it, and the current syntax will work for all drivers,
and the new syntax will generally only work on drivers which use the
ring_cookie partitioning in this manner (as no current driver has queues
beyond 2^32, as was decided when the partitioning scheme was added to
the ethtool header file).

I like the use of / but am open to alternative suggestions for a simple
sequence. Note that we cannot use spaces, as this would move into the
argv solution of (b) which I do not have time to implement, and would be
a much more massive project.

Jacob Keller (2):
  rxclass: use ethtool_get_flow_spec_ring to display queue and VF
  ethtool: add support to specify ring_cookie as VF/queue

 ethtool.8.in |  3 ++-
 rxclass.c| 74 +++-
 2 files changed, 65 insertions(+), 12 deletions(-)

-- 
2.11.0.rc2.152.g4d04e67



[PATCH RFC v1 2/2] ethtool: add support to specify ring_cookie as VF/queue

2016-12-06 Thread Jacob Keller
Add support to allow users to specify the ring_cookie as a VF/queue
value, ie: '0/25' means direct to the physical function queue 25, while
'3/10' means to direct to VF id 3, queue 10. It is driver dependent to
determine what the VF id is, excepting that 0 indicates the physical
function. This allows the user to more easily specify the matching
notation as it is understood by the kernel and some drivers. Since no
driver exists today with over 2,147,483,647 queues, this notation will
simply fail on drivers which don't recognize the new partitioning.

Signed-off-by: Jacob Keller <jacob.e.kel...@intel.com>
---
 ethtool.8.in |  3 ++-
 rxclass.c| 58 +++---
 2 files changed, 53 insertions(+), 8 deletions(-)

diff --git a/ethtool.8.in b/ethtool.8.in
index 5c36c06385f6..cb165fa4c77a 100644
--- a/ethtool.8.in
+++ b/ethtool.8.in
@@ -826,7 +826,8 @@ Specifies the Rx queue to send packets to, or some other 
action.
 nokeep;
 lB l.
 -1 Drop the matched flow
-0 or higherRx queue to route the flow
+0 or higherRoute flow to specific Rx queue on the physical function
+N/MRoute flow to Rx queue M of specific virtual function N (0 indicates 
the physical function)
 .TE
 .TP
 .BI loc \ N
diff --git a/rxclass.c b/rxclass.c
index 7f0e765d3b47..de25550529b4 100644
--- a/rxclass.c
+++ b/rxclass.c
@@ -614,6 +614,7 @@ typedef enum {
OPT_IP4,
OPT_IP6,
OPT_MAC,
+   OPT_ACTION,
 } rule_opt_type_t;
 
 #define NFC_FLAG_RING  0x001
@@ -654,7 +655,7 @@ static const struct rule_opts rule_nfc_tcp_ip4[] = {
{ "dst-port", OPT_BE16, NFC_FLAG_DPORT,
  offsetof(struct ethtool_rx_flow_spec, h_u.tcp_ip4_spec.pdst),
  offsetof(struct ethtool_rx_flow_spec, m_u.tcp_ip4_spec.pdst) },
-   { "action", OPT_U64, NFC_FLAG_RING,
+   { "action", OPT_ACTION, NFC_FLAG_RING,
  offsetof(struct ethtool_rx_flow_spec, ring_cookie), -1 },
{ "loc", OPT_U32, NFC_FLAG_LOC,
  offsetof(struct ethtool_rx_flow_spec, location), -1 },
@@ -685,7 +686,7 @@ static const struct rule_opts rule_nfc_esp_ip4[] = {
{ "spi", OPT_BE32, NFC_FLAG_SPI,
  offsetof(struct ethtool_rx_flow_spec, h_u.esp_ip4_spec.spi),
  offsetof(struct ethtool_rx_flow_spec, m_u.esp_ip4_spec.spi) },
-   { "action", OPT_U64, NFC_FLAG_RING,
+   { "action", OPT_ACTION, NFC_FLAG_RING,
  offsetof(struct ethtool_rx_flow_spec, ring_cookie), -1 },
{ "loc", OPT_U32, NFC_FLAG_LOC,
  offsetof(struct ethtool_rx_flow_spec, location), -1 },
@@ -728,7 +729,7 @@ static const struct rule_opts rule_nfc_usr_ip4[] = {
{ "dst-port", OPT_BE16, NFC_FLAG_DPORT,
  offsetof(struct ethtool_rx_flow_spec, h_u.usr_ip4_spec.l4_4_bytes) + 
2,
  offsetof(struct ethtool_rx_flow_spec, m_u.usr_ip4_spec.l4_4_bytes) + 
2 },
-   { "action", OPT_U64, NFC_FLAG_RING,
+   { "action", OPT_ACTION, NFC_FLAG_RING,
  offsetof(struct ethtool_rx_flow_spec, ring_cookie), -1 },
{ "loc", OPT_U32, NFC_FLAG_LOC,
  offsetof(struct ethtool_rx_flow_spec, location), -1 },
@@ -762,7 +763,7 @@ static const struct rule_opts rule_nfc_tcp_ip6[] = {
{ "dst-port", OPT_BE16, NFC_FLAG_DPORT,
  offsetof(struct ethtool_rx_flow_spec, h_u.tcp_ip6_spec.pdst),
  offsetof(struct ethtool_rx_flow_spec, m_u.tcp_ip6_spec.pdst) },
-   { "action", OPT_U64, NFC_FLAG_RING,
+   { "action", OPT_ACTION, NFC_FLAG_RING,
  offsetof(struct ethtool_rx_flow_spec, ring_cookie), -1 },
{ "loc", OPT_U32, NFC_FLAG_LOC,
  offsetof(struct ethtool_rx_flow_spec, location), -1 },
@@ -793,7 +794,7 @@ static const struct rule_opts rule_nfc_esp_ip6[] = {
{ "spi", OPT_BE32, NFC_FLAG_SPI,
  offsetof(struct ethtool_rx_flow_spec, h_u.esp_ip6_spec.spi),
  offsetof(struct ethtool_rx_flow_spec, m_u.esp_ip6_spec.spi) },
-   { "action", OPT_U64, NFC_FLAG_RING,
+   { "action", OPT_ACTION, NFC_FLAG_RING,
  offsetof(struct ethtool_rx_flow_spec, ring_cookie), -1 },
{ "loc", OPT_U32, NFC_FLAG_LOC,
  offsetof(struct ethtool_rx_flow_spec, location), -1 },
@@ -836,7 +837,7 @@ static const struct rule_opts rule_nfc_usr_ip6[] = {
{ "dst-port", OPT_BE16, NFC_FLAG_DPORT,
  offsetof(struct ethtool_rx_flow_spec, h_u.usr_ip6_spec.l4_4_bytes) + 
2,
  offsetof(struct ethtool_rx_flow_spec, m_u.usr_ip6_spec.l4_4_bytes) + 
2 },
-   { "action", OPT_U64, NFC_FLAG_RING,
+   { "action", OPT_ACTION, NFC_FLAG_RING,
  offsetof(struct ethtool_rx_flow_spec, ring_cookie), -1 },
{ "loc", OPT_U32, NFC_FLAG_LOC,
  offsetof(struct ethtool_rx_flow_spec, locati

[PATCH] ethtool: mark mask values as ULL values

2016-11-29 Thread Jacob Keller
If compiling with signed checks enabled, there will be warnings
generated by the ETHTOOL_RX_FLOW_SPEC_RING(_VF) masks. These are
currently marked as signed constants, which will generate warnings when
masking with unsigned values. Avoid this by marking them explicitly as
unsigned values.

Signed-off-by: Jacob Keller <jacob.e.kel...@intel.com>
---
 include/uapi/linux/ethtool.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
index ed650eef9e54..a716112b2b01 100644
--- a/include/uapi/linux/ethtool.h
+++ b/include/uapi/linux/ethtool.h
@@ -891,8 +891,8 @@ struct ethtool_rx_flow_spec {
  * space for this at this time. If a future patch consumes the next
  * byte it should be aware of this possiblity.
  */
-#define ETHTOOL_RX_FLOW_SPEC_RING  0xLL
-#define ETHTOOL_RX_FLOW_SPEC_RING_VF   0x00FFLL
+#define ETHTOOL_RX_FLOW_SPEC_RING  0xULL
+#define ETHTOOL_RX_FLOW_SPEC_RING_VF   0x00FFULL
 #define ETHTOOL_RX_FLOW_SPEC_RING_VF_OFF 32
 static inline __u64 ethtool_get_flow_spec_ring(__u64 ring_cookie)
 {
-- 
2.11.0.rc2.152.g4d04e67



[PATCH RFC v3] ethtool: implement helper to get flow_type value

2016-11-29 Thread Jacob Keller
Often a driver wants to store the flow type and thus it must mask the
extra fields. This is a task that could grow more complex as more flags
are added in the future. Add a helper function that masks the flags for
marking additional fields.

Modify drivers in drivers/net/ethernet that currently check for FLOW_EXT
and FLOW_MAC_EXT to use the helper. Currently this is only the mellanox
drivers.

I chose not to modify other drivers as I'm actually unsure whether we
should always mask the flow type even for drivers which don't recognize
the newer flags. On the one hand, today's drivers (generally)
automatically fail when a new flag is used because they won't mask it
and their checks against flow_type will not match. On the other hand, it
means another place that you have to update when you begin implementing
a flag.

An alternative is to have the driver store a set of flags that it knows
about, and then have ethtool core do the check for us to discard frames.
I haven't implemented this quite yet.

Signed-off-by: Jacob Keller <jacob.e.kel...@intel.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_ethtool.c | 4 ++--
 drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c | 6 +++---
 include/uapi/linux/ethtool.h| 5 +
 3 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c 
b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
index 487a58f9c192..d8f9839ce2a3 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
@@ -1270,7 +1270,7 @@ static int mlx4_en_validate_flow(struct net_device *dev,
return -EINVAL;
}
 
-   switch (cmd->fs.flow_type & ~(FLOW_EXT | FLOW_MAC_EXT)) {
+   switch (ethtool_get_flow_spec_type(cmd->fs.flow_type)) {
case TCP_V4_FLOW:
case UDP_V4_FLOW:
if (cmd->fs.m_u.tcp_ip4_spec.tos)
@@ -1493,7 +1493,7 @@ static int mlx4_en_ethtool_to_net_trans_rule(struct 
net_device *dev,
if (err)
return err;
 
-   switch (cmd->fs.flow_type & ~(FLOW_EXT | FLOW_MAC_EXT)) {
+   switch (ethtool_get_flow_spec_type(cmd->fs.flow_type)) {
case ETHER_FLOW:
spec_l2 = kzalloc(sizeof(*spec_l2), GFP_KERNEL);
if (!spec_l2)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c
index 3691451c728c..066e6c5cf38b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c
@@ -63,7 +63,7 @@ static struct mlx5e_ethtool_table *get_flow_table(struct 
mlx5e_priv *priv,
int table_size;
int prio;
 
-   switch (fs->flow_type & ~(FLOW_EXT | FLOW_MAC_EXT)) {
+   switch (ethtool_get_flow_spec_type(fs->flow_type)) {
case TCP_V4_FLOW:
case UDP_V4_FLOW:
max_tuples = ETHTOOL_NUM_L3_L4_FTS;
@@ -147,7 +147,7 @@ static int set_flow_attrs(u32 *match_c, u32 *match_v,
 outer_headers);
void *outer_headers_v = MLX5_ADDR_OF(fte_match_param, match_v,
 outer_headers);
-   u32 flow_type = fs->flow_type & ~(FLOW_EXT | FLOW_MAC_EXT);
+   u32 flow_type = ethtool_get_flow_spec_type(fs->flow_type);
struct ethtool_tcpip4_spec *l4_mask;
struct ethtool_tcpip4_spec *l4_val;
struct ethtool_usrip4_spec *l3_mask;
@@ -393,7 +393,7 @@ static int validate_flow(struct mlx5e_priv *priv,
fs->ring_cookie != RX_CLS_FLOW_DISC)
return -EINVAL;
 
-   switch (fs->flow_type & ~(FLOW_EXT | FLOW_MAC_EXT)) {
+   switch (ethtool_get_flow_spec_type(fs->flow_type)) {
case ETHER_FLOW:
eth_mask = >m_u.ether_spec;
if (!is_zero_ether_addr(eth_mask->h_dest))
diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
index f0db7788f887..ed650eef9e54 100644
--- a/include/uapi/linux/ethtool.h
+++ b/include/uapi/linux/ethtool.h
@@ -1583,6 +1583,11 @@ static inline int ethtool_validate_duplex(__u8 duplex)
 #defineFLOW_EXT0x8000
 #defineFLOW_MAC_EXT0x4000
 
+static inline __u32 ethtool_get_flow_spec_type(__u32 flow_type)
+{
+   return flow_type & ~(FLOW_EXT | FLOW_MAC_EXT);
+}
+
 /* L3-L4 network traffic flow hash options */
 #defineRXH_L2DA(1 << 1)
 #defineRXH_VLAN(1 << 2)
-- 
2.11.0.rc2.152.g4d04e67



[PATCH RFC v2] ethtool: implement helper to get flow_type value

2016-11-28 Thread Jacob Keller
Often a driver wants to store the flow type and thus it must mask the
extra fields. This is a task that could grow more complex as more flags
are added in the future. Add a helper function that masks the flags for
marking additional fields.

Modify drivers in drivers/net/ethernet that currently check for FLOW_EXT
and FLOW_MAC_EXT to use the helper. Currently this is only the mellanox
drivers.

I chose not to modify other drivers as I'm actually unsure whether we
should always mask the flow type even for drivers which don't recognize
the newer flags. On the one hand, today's drivers (generally)
automatically fail when a new flag is used because they won't mask it
and their checks against flow_type will not match. On the other hand, it
means another place that you have to update when you begin implementing
a flag.

An alternative is to have the driver store a set of flags that it knows
about, and then have ethtool core do the check for us to discard frames.
I haven't implemented this quite yet.

Signed-off-by: Jacob Keller <jacob.e.kel...@intel.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_ethtool.c | 4 ++--
 drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c | 6 +++---
 include/uapi/linux/ethtool.h| 5 +
 3 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c 
b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
index 487a58f9c192..d8f9839ce2a3 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
@@ -1270,7 +1270,7 @@ static int mlx4_en_validate_flow(struct net_device *dev,
return -EINVAL;
}
 
-   switch (cmd->fs.flow_type & ~(FLOW_EXT | FLOW_MAC_EXT)) {
+   switch (ethtool_get_flow_spec_type(cmd->fs.flow_type)) {
case TCP_V4_FLOW:
case UDP_V4_FLOW:
if (cmd->fs.m_u.tcp_ip4_spec.tos)
@@ -1493,7 +1493,7 @@ static int mlx4_en_ethtool_to_net_trans_rule(struct 
net_device *dev,
if (err)
return err;
 
-   switch (cmd->fs.flow_type & ~(FLOW_EXT | FLOW_MAC_EXT)) {
+   switch (ethtool_get_flow_spec_type(cmd->fs.flow_type)) {
case ETHER_FLOW:
spec_l2 = kzalloc(sizeof(*spec_l2), GFP_KERNEL);
if (!spec_l2)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c
index 3691451c728c..066e6c5cf38b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c
@@ -63,7 +63,7 @@ static struct mlx5e_ethtool_table *get_flow_table(struct 
mlx5e_priv *priv,
int table_size;
int prio;
 
-   switch (fs->flow_type & ~(FLOW_EXT | FLOW_MAC_EXT)) {
+   switch (ethtool_get_flow_spec_type(fs->flow_type)) {
case TCP_V4_FLOW:
case UDP_V4_FLOW:
max_tuples = ETHTOOL_NUM_L3_L4_FTS;
@@ -147,7 +147,7 @@ static int set_flow_attrs(u32 *match_c, u32 *match_v,
 outer_headers);
void *outer_headers_v = MLX5_ADDR_OF(fte_match_param, match_v,
 outer_headers);
-   u32 flow_type = fs->flow_type & ~(FLOW_EXT | FLOW_MAC_EXT);
+   u32 flow_type = ethtool_get_flow_spec_type(fs->flow_type);
struct ethtool_tcpip4_spec *l4_mask;
struct ethtool_tcpip4_spec *l4_val;
struct ethtool_usrip4_spec *l3_mask;
@@ -393,7 +393,7 @@ static int validate_flow(struct mlx5e_priv *priv,
fs->ring_cookie != RX_CLS_FLOW_DISC)
return -EINVAL;
 
-   switch (fs->flow_type & ~(FLOW_EXT | FLOW_MAC_EXT)) {
+   switch (ethtool_get_flow_spec_type(fs->flow_type)) {
case ETHER_FLOW:
eth_mask = >m_u.ether_spec;
if (!is_zero_ether_addr(eth_mask->h_dest))
diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
index f0db7788f887..e92ad725c9d0 100644
--- a/include/uapi/linux/ethtool.h
+++ b/include/uapi/linux/ethtool.h
@@ -1583,6 +1583,11 @@ static inline int ethtool_validate_duplex(__u8 duplex)
 #defineFLOW_EXT0x8000
 #defineFLOW_MAC_EXT0x4000
 
+static inline __u32 ethtool_get_flow_spec_type(__u32 flow_type)
+{
+   return flow_type & (FLOW_EXT | FLOW_MAC_EXT);
+}
+
 /* L3-L4 network traffic flow hash options */
 #defineRXH_L2DA(1 << 1)
 #defineRXH_VLAN(1 << 2)
-- 
2.11.0.rc2.152.g4d04e67



[PATCH RFC v1] ethtool: implement helper to get flow_type value

2016-11-22 Thread Jacob Keller
Often a driver wants to store the flow type and thus it must mask the
extra fields. This is a task that could grow more complex as more flags
are added in the future. Add a helper function that masks the flags for
marking additional fields.

Modify drivers in drivers/net/ethernet that currently check for FLOW_EXT
and FLOW_MAC_EXT to use the helper. Currently this is only the mellanox
drivers.

I chose not to modify other drivers as I'm actually unsure whether we
should always mask the flow type even for drivers which don't recognize
the newer flags. On the one hand, today's drivers (generally)
automatically fail when a new flag is used because they won't mask it
and their checks against flow_type will not match. On the other hand, it
means another place that you have to update when you begin implementing
a flag.

An alternative is to have the driver store a set of flags that it knows
about, and then have ethtool core do the check for us to discard frames.
I haven't implemented this quite yet.

Signed-off-by: Jacob Keller <jacob.e.kel...@intel.com>
---
I plan on using this helper when fixing the mask code for ntuple filters
in the Intel i40e driver. I wanted to see whether this approach was
acceptable, and whether we should implement additional checks. The
primary reason is that today's drivers are "fail closed" in that a new
flag type will probably fail on drivers due to checking for flow types
they recognize. Since drivers only remove the masked bits they recognize
this works. However, this gets cumbersome if new additional flags get
added in the future. I would like some sort of helper, but if we
encourage its use, and a new flag gets added, the helper will then
unforunately make the driver "fail open" in that a new flag will get
ignored as the driver won't know to return -EINVAL.

I think the right solution will be to add some sort of checks in core
ethtool which we can basically set the recognized flags in some way for
all drivers such that the ethtool core can drop requests for flows with
unknown flag types. I'm unsure how to implement this though.

Thoughts?

 drivers/net/ethernet/mellanox/mlx4/en_ethtool.c |  4 ++--
 drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c |  6 +++---
 include/uapi/linux/ethtool.h| 11 ---
 3 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c 
b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
index 487a58f9c192..d8f9839ce2a3 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
@@ -1270,7 +1270,7 @@ static int mlx4_en_validate_flow(struct net_device *dev,
return -EINVAL;
}
 
-   switch (cmd->fs.flow_type & ~(FLOW_EXT | FLOW_MAC_EXT)) {
+   switch (ethtool_get_flow_spec_type(cmd->fs.flow_type)) {
case TCP_V4_FLOW:
case UDP_V4_FLOW:
if (cmd->fs.m_u.tcp_ip4_spec.tos)
@@ -1493,7 +1493,7 @@ static int mlx4_en_ethtool_to_net_trans_rule(struct 
net_device *dev,
if (err)
return err;
 
-   switch (cmd->fs.flow_type & ~(FLOW_EXT | FLOW_MAC_EXT)) {
+   switch (ethtool_get_flow_spec_type(cmd->fs.flow_type)) {
case ETHER_FLOW:
spec_l2 = kzalloc(sizeof(*spec_l2), GFP_KERNEL);
if (!spec_l2)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c
index 3691451c728c..066e6c5cf38b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c
@@ -63,7 +63,7 @@ static struct mlx5e_ethtool_table *get_flow_table(struct 
mlx5e_priv *priv,
int table_size;
int prio;
 
-   switch (fs->flow_type & ~(FLOW_EXT | FLOW_MAC_EXT)) {
+   switch (ethtool_get_flow_spec_type(fs->flow_type)) {
case TCP_V4_FLOW:
case UDP_V4_FLOW:
max_tuples = ETHTOOL_NUM_L3_L4_FTS;
@@ -147,7 +147,7 @@ static int set_flow_attrs(u32 *match_c, u32 *match_v,
 outer_headers);
void *outer_headers_v = MLX5_ADDR_OF(fte_match_param, match_v,
 outer_headers);
-   u32 flow_type = fs->flow_type & ~(FLOW_EXT | FLOW_MAC_EXT);
+   u32 flow_type = ethtool_get_flow_spec_type(fs->flow_type);
struct ethtool_tcpip4_spec *l4_mask;
struct ethtool_tcpip4_spec *l4_val;
struct ethtool_usrip4_spec *l3_mask;
@@ -393,7 +393,7 @@ static int validate_flow(struct mlx5e_priv *priv,
fs->ring_cookie != RX_CLS_FLOW_DISC)
return -EINVAL;
 
-   switch (fs->flow_type & ~(FLOW_EXT | FLOW_MAC_EXT)) {
+   switch (ethtool_get_flow_spec_type(fs->flow_type)) {
case ETHER_FLOW:
eth_mask = >m_u.ether_

[PATCH v2] ethtool: check size of user memory before copying strings and stats

2016-03-01 Thread Jacob Keller
Since at least 2005, (oldest commit in ethtool.git), the userspace
ethtool implementation has given the size of the memory it has allocated
as the actual size in the ethtool data structures. We previously blindly
ignore this and overwrite the requested size with the current size
returned by .get_strings or .get_sset_count. This can cause problems if
these values aren't static.

Since ethtool has always given the expected size, first perform a check
to ensure that the current size is no larger than the requested size. If
it is, return with -ENOMEM, as we do not have enough memory to save the
results.

This protects against buffer overrun should the driver have a non-static
number of statistics, tests, or private flags, and the value changes
between the ethtool userspace call to .get_sset_count and the actual
calls to populate the requested memory.

Update the header files to indicate the expected behavior of userspace.
This change shouldn't break current or previous userspace as they have
consistently included the length correctly since as far back as we can
check.

Signed-off-by: Jacob Keller <jacob.e.kel...@intel.com>
Reported-by: Alexander Duyck <alexander.du...@gmail.com>
---

- v2
* use -ENOSPC instead of -ENOMEM at the suggestion of Mark Rustad

 include/uapi/linux/ethtool.h | 19 +--
 net/core/ethtool.c   | 12 
 2 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
index 37fd6dc33de4..0d7f4855dc0b 100644
--- a/include/uapi/linux/ethtool.h
+++ b/include/uapi/linux/ethtool.h
@@ -560,13 +560,16 @@ enum ethtool_stringset {
  * struct ethtool_gstrings - string set for data tagging
  * @cmd: Command number = %ETHTOOL_GSTRINGS
  * @string_set: String set ID; one of  ethtool_stringset
- * @len: On return, the number of strings in the string set
+ * @len: On entry, the number of strings which fit into the allocated space.
+ *   On return, the number of strings in the string set
  * @data: Buffer for strings.  Each string is null-padded to a size of
  * %ETH_GSTRING_LEN.
  *
  * Users must use %ETHTOOL_GSSET_INFO to find the number of strings in
  * the string set.  They must allocate a buffer of the appropriate
- * size immediately following this structure.
+ * size immediately following this structure. They must set the len to the
+ * number of strings which will fit into the allocated space following this
+ * structure.
  */
 struct ethtool_gstrings {
__u32   cmd;
@@ -622,13 +625,15 @@ enum ethtool_test_flags {
  * @flags: A bitmask of flags from  ethtool_test_flags.  Some
  * flags may be set by the user on entry; others may be set by
  * the driver on return.
- * @len: On return, the number of test results
+ * @len: On entry, the number of test results that fit in the allocated space
+ *   On return, the number of test results
  * @data: Array of test results
  *
  * Users must use %ETHTOOL_GSSET_INFO or %ETHTOOL_GDRVINFO to find the
  * number of test results that will be returned.  They must allocate a
  * buffer of the appropriate size (8 * number of results) immediately
- * following this structure.
+ * following this structure. They must set the len to the number of tests
+ * results which will fit into the allocated space following the structure.
  */
 struct ethtool_test {
__u32   cmd;
@@ -641,13 +646,15 @@ struct ethtool_test {
 /**
  * struct ethtool_stats - device-specific statistics
  * @cmd: Command number = %ETHTOOL_GSTATS
- * @n_stats: On return, the number of statistics
+ * @n_stats: On entry, the number of stats that fit into the allocated space.
+ *   On return, the number of statistics
  * @data: Array of statistics
  *
  * Users must use %ETHTOOL_GSSET_INFO or %ETHTOOL_GDRVINFO to find the
  * number of statistics that will be returned.  They must allocate a
  * buffer of the appropriate size (8 * number of statistics)
- * immediately following this structure.
+ * immediately following this structure. They must set n_stats to the number
+ * of stats which fit into the allocated space at the end of the structure.
  */
 struct ethtool_stats {
__u32   cmd;
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 2966cd0d7c93..8ce7d1b5756d 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -1766,6 +1766,9 @@ static int ethtool_self_test(struct net_device *dev, char 
__user *useraddr)
if (copy_from_user(, useraddr, sizeof(test)))
return -EFAULT;
 
+   if (test_len > test.len)
+   return -ENOSPC;
+
test.len = test_len;
data = kmalloc(test_len * sizeof(u64), GFP_USER);
if (!data)
@@ -1799,6 +1802,9 @@ static int ethtool_get_strings(struct net_device *dev, 
void __user *useraddr)
if (ret < 0)
return ret;
 
+   if (ret > gstrings.len)
+   return -ENOSPC;
+
gstrings.len = ret;
 
data =

[PATCH] ethtool: check size of user memory before copying strings and stats

2016-03-01 Thread Jacob Keller
Since at least 2005, (oldest commit in ethtool.git), the userspace
ethtool implementation has given the size of the memory it has allocated
as the actual size in the ethtool data structures. We previously blindly
ignore this and overwrite the requested size with the current size
returned by .get_strings or .get_sset_count. This can cause problems if
these values aren't static.

Since ethtool has always given the expected size, first perform a check
to ensure that the current size is no larger than the requested size. If
it is, return with -ENOMEM, as we do not have enough memory to save the
results.

This protects against buffer overrun should the driver have a non-static
number of statistics, tests, or private flags, and the value changes
between the ethtool userspace call to .get_sset_count and the actual
calls to populate the requested memory.

Update the header files to indicate the expected behavior of userspace.
This change shouldn't break current or previous userspace as they have
consistently included the length correctly since as far back as we can
check.

Signed-off-by: Jacob Keller <jacob.e.kel...@intel.com>
Reported-by: Alexander Duyck <alexander.du...@gmail.com>
---
 include/uapi/linux/ethtool.h | 19 +--
 net/core/ethtool.c   | 12 
 2 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
index 37fd6dc33de4..0d7f4855dc0b 100644
--- a/include/uapi/linux/ethtool.h
+++ b/include/uapi/linux/ethtool.h
@@ -560,13 +560,16 @@ enum ethtool_stringset {
  * struct ethtool_gstrings - string set for data tagging
  * @cmd: Command number = %ETHTOOL_GSTRINGS
  * @string_set: String set ID; one of  ethtool_stringset
- * @len: On return, the number of strings in the string set
+ * @len: On entry, the number of strings which fit into the allocated space.
+ *   On return, the number of strings in the string set
  * @data: Buffer for strings.  Each string is null-padded to a size of
  * %ETH_GSTRING_LEN.
  *
  * Users must use %ETHTOOL_GSSET_INFO to find the number of strings in
  * the string set.  They must allocate a buffer of the appropriate
- * size immediately following this structure.
+ * size immediately following this structure. They must set the len to the
+ * number of strings which will fit into the allocated space following this
+ * structure.
  */
 struct ethtool_gstrings {
__u32   cmd;
@@ -622,13 +625,15 @@ enum ethtool_test_flags {
  * @flags: A bitmask of flags from  ethtool_test_flags.  Some
  * flags may be set by the user on entry; others may be set by
  * the driver on return.
- * @len: On return, the number of test results
+ * @len: On entry, the number of test results that fit in the allocated space
+ *   On return, the number of test results
  * @data: Array of test results
  *
  * Users must use %ETHTOOL_GSSET_INFO or %ETHTOOL_GDRVINFO to find the
  * number of test results that will be returned.  They must allocate a
  * buffer of the appropriate size (8 * number of results) immediately
- * following this structure.
+ * following this structure. They must set the len to the number of tests
+ * results which will fit into the allocated space following the structure.
  */
 struct ethtool_test {
__u32   cmd;
@@ -641,13 +646,15 @@ struct ethtool_test {
 /**
  * struct ethtool_stats - device-specific statistics
  * @cmd: Command number = %ETHTOOL_GSTATS
- * @n_stats: On return, the number of statistics
+ * @n_stats: On entry, the number of stats that fit into the allocated space.
+ *   On return, the number of statistics
  * @data: Array of statistics
  *
  * Users must use %ETHTOOL_GSSET_INFO or %ETHTOOL_GDRVINFO to find the
  * number of statistics that will be returned.  They must allocate a
  * buffer of the appropriate size (8 * number of statistics)
- * immediately following this structure.
+ * immediately following this structure. They must set n_stats to the number
+ * of stats which fit into the allocated space at the end of the structure.
  */
 struct ethtool_stats {
__u32   cmd;
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 2966cd0d7c93..20abcc43787e 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -1766,6 +1766,9 @@ static int ethtool_self_test(struct net_device *dev, char 
__user *useraddr)
if (copy_from_user(, useraddr, sizeof(test)))
return -EFAULT;
 
+   if (test_len > test.len)
+   return -ENOMEM;
+
test.len = test_len;
data = kmalloc(test_len * sizeof(u64), GFP_USER);
if (!data)
@@ -1799,6 +1802,9 @@ static int ethtool_get_strings(struct net_device *dev, 
void __user *useraddr)
if (ret < 0)
return ret;
 
+   if (ret > gstrings.len)
+   return -ENOMEM;
+
gstrings.len = ret;
 
data = kcalloc(gstrings.len, ETH_GSTRING_LEN, GFP_USER);
@@ -1898,6 +1904,9 @@ static 

[PATCH 2/4] ethtool: ensure channel counts are within bounds during SCHANNELS

2016-02-08 Thread Jacob Keller
Add a sanity check to ensure that all requested channel sizes are within
bounds, which should reduce errors in driver implementation.

Signed-off-by: Jacob Keller <jacob.e.kel...@intel.com>
---
 net/core/ethtool.c | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 8a40948c1fc6..71aff99157c3 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -1276,16 +1276,27 @@ static noinline_for_stack int 
ethtool_get_channels(struct net_device *dev,
 static noinline_for_stack int ethtool_set_channels(struct net_device *dev,
   void __user *useraddr)
 {
-   struct ethtool_channels channels;
+   struct ethtool_channels channels, max;
u32 max_rx = 0;
int ret;
 
-   if (!dev->ethtool_ops->set_channels)
+   if (!dev->ethtool_ops->set_channels || !dev->ethtool_ops->get_channels)
return -EOPNOTSUPP;
 
if (copy_from_user(, useraddr, sizeof(channels)))
return -EFAULT;
 
+   ret = dev->ethtool_ops->get_channels(dev, );
+   if (ret)
+   return ret;
+
+   /* ensure new counts are within the maximums */
+   if ((channels.rx_count > max.max_rx) ||
+   (channels.tx_count > max.max_tx) ||
+   (channels.combined_count > max.max_combined) ||
+   (channels.other_count > max.max_other))
+   return -EINVAL;
+
/* ensure the new Rx count fits within the configured Rx flow
 * indirection table settings */
if (netif_is_rxfh_configured(dev) &&
-- 
2.7.0.236.gda096a0.dirty



[PATCH 0/3] ethtool: correct {GS}CHANNELS and {GS}RXFH conflict

2016-02-08 Thread Jacob Keller
This patch series fixes up ethtool_set_channels operation which
allowed modifying the RXFH table indirectly by reducing the number of
queues below the current max queue used by the Rx flow table. Most
drivers incorrectly allowed this to destroy the Rx flow table and
would then start by reinitializing it to default settings. However,
drivers are not able to correctly handle the conflict since there was
no way to differentiate between the default settings and the user
requested explicit settings.

To fix this, implement a new netdev private flag which we use to
indicate whether the RXFH has been user configured. If someone has
a better alternative of how to store this information, let me know.
I do not think that priv_flags is the best solution but I have not had
any better idea.

Secondly, we add a function which just calls the driver's get_rxfh
callback to determine the current indirection table. Loop through this
and we can determine the current highest queue that will be used by
RSS.

Now, modify ethtool_set_channels to add a check ensuring that if (a)
we have had rxfh configured by user, (b) we can get the maximum RSS
queue currently used, then we ensure that the newly requested Rx count
(or combined count) is at least as high as this maximum RSS queue. The
reasoning here is that we can always safely increase the number of
queues. If we decrease the queues we must ensure that the decrease
does not go lower than the highest in-use queue for the Rx flow table.

Drivers may still need to be patched if they currently overwrite the
Rx flow table during channel configuration. If the driver currently
always resets Rx flow table when increasing number of queues it must
be patched to only do this when netif_is_rxfh_configured returns
false.

The second two patches are suggestions which I think are valid and
noticed while implementing the above changes. The second patch simply
adds a check to ensure that all provided channel counts fit within
driver defined maximums.

The third patch I am unsure on so would like comment. I believe that
combined queue counts and separate tx/rx queue counts should not be
configurable at the same time (the only driver which allowed both does
requires they can't be defined at the same time). Other drivers
support either separate or combined but not both. If this isn't the
case then go ahead and ignore the third patch.

The fourth patch fixes fm10k to correctly recofigure the RSS reta
table whenever it is still unconfigured. This means that the default
state will provide RSS to every queue. Once the user has configured
RXFH, then we should maintain it. In addition, since the case where we
must reconfigure the RSS table in this case should now no longer
occur, add a dev_err message to indicate the user that we did so.

Jacob Keller (4):
  ethtool: correctly ensure {GS}CHANNELS doesn't conflict with GS{RXFH}
  ethtool: ensure channel counts are within bounds during SCHANNELS
  ethtool: can't set combined and tx/rx channel counts at the same time
  fm10k: don't reinitialize RSS flow table when RXFH configured

 drivers/net/ethernet/intel/fm10k/fm10k_main.c | 10 +++-
 include/linux/netdevice.h |  8 +++
 net/core/ethtool.c| 79 ++-
 3 files changed, 93 insertions(+), 4 deletions(-)

-- 
2.7.0.236.gda096a0.dirty



[PATCH 4/4] fm10k: don't reinitialize RSS flow table when RXFH configured

2016-02-08 Thread Jacob Keller
Also print an error message incase we do have to reconfigure as this
should no longer happen anymore due to ethtool changes. If it somehow
does occur, user should be made aware of it.

Signed-off-by: Jacob Keller <jacob.e.kel...@intel.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k_main.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_main.c 
b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
index 134ce4daa994..4e58f9155883 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_main.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
@@ -1937,8 +1937,10 @@ static void fm10k_init_reta(struct fm10k_intfc 
*interface)
u16 i, rss_i = interface->ring_feature[RING_F_RSS].indices;
u32 reta, base;
 
-   /* If the netdev is initialized we have to maintain table if possible */
-   if (interface->netdev->reg_state != NETREG_UNINITIALIZED) {
+   /* If the Rx flow indirection table has been configured manually, we
+* need to maintain it when possible.
+*/
+   if (netif_is_rxfh_configured(interface->netdev)) {
for (i = FM10K_RETA_SIZE; i--;) {
reta = interface->reta[i];
if reta << 24) >> 24) < rss_i) &&
@@ -1946,6 +1948,10 @@ static void fm10k_init_reta(struct fm10k_intfc 
*interface)
(((reta <<  8) >> 24) < rss_i) &&
(((reta)   >> 24) < rss_i))
continue;
+
+   /* this should never happen */
+   dev_err(>pdev->dev,
+   "RSS indirection table assigned flows out of 
queue bounds. Reconfiguring.\n");
goto repopulate_reta;
}
 
-- 
2.7.0.236.gda096a0.dirty



[PATCH 3/4] ethtool: can't set combined and tx/rx channel counts at the same time

2016-02-08 Thread Jacob Keller
Based on reading current implementations of {GS}CHANNELS, most drivers
either support only combined counts or only separate counts. At least
one driver supported setting combined or separate channels. No driver
supported combined and separate channels setting at the same time, and
this patch adds a sanity check to prevent such requests. I am not 100%
sure if this is correct as I was not able to find documentation and
I think it currently depended on driver implementation.

Signed-off-by: Jacob Keller <jacob.e.kel...@intel.com>
---
 net/core/ethtool.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 71aff99157c3..7d4cef7b7176 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -1297,6 +1297,11 @@ static noinline_for_stack int 
ethtool_set_channels(struct net_device *dev,
(channels.other_count > max.max_other))
return -EINVAL;
 
+   /* can't set combined and separate channels at the same time */
+   if ((channels.combined_count &&
+(channels.rx_count || channels.tx_count))
+   return -EINVAL;
+
/* ensure the new Rx count fits within the configured Rx flow
 * indirection table settings */
if (netif_is_rxfh_configured(dev) &&
-- 
2.7.0.236.gda096a0.dirty



[PATCH 1/4] ethtool: correctly ensure {GS}CHANNELS doesn't conflict with GS{RXFH}

2016-02-08 Thread Jacob Keller
Ethernet drivers implementing both {GS}RXFH and {GS}CHANNELS ethtool ops
incorrectly allow SCHANNELS when it would conflict with the settings
from SRXFH. This occurs because it is not possible for drivers to
understand whether their Rx flow indirection table has been configured
or is in the default state. In addition, drivers currently behave in
various ways when increasing the number of Rx channels.

Some drivers will always destroy the Rx flow indirection table when this
occurs, whether it has been set by the user or not. Other drivers will
attempt to preserve the table even if the user has never modified it
from the default driver settings. Neither of these situation is
desirable because it leads to unexpected behavior or loss of user
configuration.

The correct behavior is to simply return -EINVAL when SCHANNELS would
conflict with the current Rx flow table settings. However, it should
only do so if the current settings were modified by the user. If we
required that the new settings never conflict with the current (default)
Rx flow settings, we would force users to first reduce their Rx flow
settings and then reduce the number of Rx channels.

This patch proposes a solution implemented in net/core/ethtool.c which
ensures that all drivers behave correctly. It checks whether the RXFH
table has been configured to non-default settings, and stores this
information in a private netdev flag. When the number of channels is
requested to change, it first ensures that the current Rx flow table is
not going to assign flows to now disabled channels.

Signed-off-by: Jacob Keller <jacob.e.kel...@intel.com>
---
 include/linux/netdevice.h |  8 +++
 net/core/ethtool.c| 59 +++
 2 files changed, 67 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 289c2314d766..044fa2d88c7f 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1291,6 +1291,7 @@ struct net_device_ops {
  * @IFF_OPENVSWITCH: device is a Open vSwitch master
  * @IFF_L3MDEV_SLAVE: device is enslaved to an L3 master device
  * @IFF_TEAM: device is a team device
+ * @IFF_RXFH_CONFIGURED: device has had Rx Flow indirection table configured
  */
 enum netdev_priv_flags {
IFF_802_1Q_VLAN = 1<<0,
@@ -1318,6 +1319,7 @@ enum netdev_priv_flags {
IFF_OPENVSWITCH = 1<<22,
IFF_L3MDEV_SLAVE= 1<<23,
IFF_TEAM= 1<<24,
+   IFF_RXFH_CONFIGURED = 1<<25,
 };
 
 #define IFF_802_1Q_VLANIFF_802_1Q_VLAN
@@ -1345,6 +1347,7 @@ enum netdev_priv_flags {
 #define IFF_OPENVSWITCHIFF_OPENVSWITCH
 #define IFF_L3MDEV_SLAVE   IFF_L3MDEV_SLAVE
 #define IFF_TEAM   IFF_TEAM
+#define IFF_RXFH_CONFIGUREDIFF_RXFH_CONFIGURED
 
 /**
  * struct net_device - The DEVICE structure.
@@ -4045,6 +4048,11 @@ static inline bool netif_is_lag_port(const struct 
net_device *dev)
return netif_is_bond_slave(dev) || netif_is_team_port(dev);
 }
 
+static inline bool netif_is_rxfh_configured(const struct net_device *dev)
+{
+   return dev->priv_flags & IFF_RXFH_CONFIGURED;
+}
+
 /* This device needs to keep skb dst for qdisc enqueue or ndo_start_xmit() */
 static inline void netif_keep_dst(struct net_device *dev)
 {
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index daf04709dd3c..8a40948c1fc6 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -642,6 +642,39 @@ void netdev_rss_key_fill(void *buffer, size_t len)
 }
 EXPORT_SYMBOL(netdev_rss_key_fill);
 
+static inline int ethool_get_max_rxfh_channel(struct net_device *dev, u32 *max)
+{
+   u32 dev_size, current_max = 0;
+   u32 *indir;
+   int ret, i;
+
+   if (!dev->ethtool_ops->get_rxfh_indir_size ||
+   !dev->ethtool_ops->get_rxfh)
+   return -EOPNOTSUPP;
+   dev_size = dev->ethtool_ops->get_rxfh_indir_size(dev);
+   if (dev_size == 0)
+   return -EOPNOTSUPP;
+
+   indir = kcalloc(dev_size, sizeof(indir[0]), GFP_USER);
+   if (!indir)
+   return -ENOMEM;
+
+   ret = dev->ethtool_ops->get_rxfh(dev, indir, NULL, NULL);
+   if (ret)
+   goto out;
+
+   for (i = dev_size; i--;) {
+   if (indir[i] > current_max)
+   current_max = indir[i];
+   }
+
+   *max = current_max;
+
+out:
+   kfree(indir);
+   return ret;
+}
+
 static noinline_for_stack int ethtool_get_rxfh_indir(struct net_device *dev,
 void __user *useraddr)
 {
@@ -738,6 +771,14 @@ static noinline_for_stack int 
ethtool_set_rxfh_indir(struct net_device *dev,
}
 
ret = ops->set_rxfh(dev, indir, NULL, ETH_RSS_HASH_NO_CHANGE);
+   if (ret)
+   goto out;
+
+  

[PATCH 3/4] fm10k: don't reinitialize RSS flow table when RXFH configured

2016-02-08 Thread Jacob Keller
Also print an error message incase we do have to reconfigure as this
should no longer happen anymore due to ethtool changes. If it somehow
does occur, user should be made aware of it.

Signed-off-by: Jacob Keller <jacob.e.kel...@intel.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k_main.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_main.c 
b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
index 134ce4daa994..4e58f9155883 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_main.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
@@ -1937,8 +1937,10 @@ static void fm10k_init_reta(struct fm10k_intfc 
*interface)
u16 i, rss_i = interface->ring_feature[RING_F_RSS].indices;
u32 reta, base;
 
-   /* If the netdev is initialized we have to maintain table if possible */
-   if (interface->netdev->reg_state != NETREG_UNINITIALIZED) {
+   /* If the Rx flow indirection table has been configured manually, we
+* need to maintain it when possible.
+*/
+   if (netif_is_rxfh_configured(interface->netdev)) {
for (i = FM10K_RETA_SIZE; i--;) {
reta = interface->reta[i];
if reta << 24) >> 24) < rss_i) &&
@@ -1946,6 +1948,10 @@ static void fm10k_init_reta(struct fm10k_intfc 
*interface)
(((reta <<  8) >> 24) < rss_i) &&
(((reta)   >> 24) < rss_i))
continue;
+
+   /* this should never happen */
+   dev_err(>pdev->dev,
+   "RSS indirection table assigned flows out of 
queue bounds. Reconfiguring.\n");
goto repopulate_reta;
}
 
-- 
2.7.0.236.gda096a0.dirty



[PATCH v2 4/4] ethtool: support setting default Rx flow indirection table

2016-02-08 Thread Jacob Keller
Signed-off-by: Jacob Keller <jacob.e.kel...@intel.com>
---
 ethtool.8.in |  5 -
 ethtool.c| 46 --
 2 files changed, 36 insertions(+), 15 deletions(-)

diff --git a/ethtool.8.in b/ethtool.8.in
index eeffa70415b5..b7d56fbe4484 100644
--- a/ethtool.8.in
+++ b/ethtool.8.in
@@ -296,7 +296,7 @@ ethtool \- query or control network driver and hardware 
settings
 .IR N \ |
 .BI weight\  W0
 .IR W1
-.RB ...\ ]
+.RB ...\ | \ default \ ]
 .HP
 .B ethtool \-f|\-\-flash
 .I devname file
@@ -805,6 +805,9 @@ Sets the receive flow hash indirection table to spread 
flows between
 receive queues according to the given weights.  The sum of the weights
 must be non-zero and must not exceed the size of the indirection table.
 .TP
+.BI default
+Sets the receive flow hash indirection table to its default value.
+.TP
 .B \-f \-\-flash
 Write a firmware image to flash or other non-volatile memory on the
 device.
diff --git a/ethtool.c b/ethtool.c
index 92c40b823f2c..c05b50f830a7 100644
--- a/ethtool.c
+++ b/ethtool.c
@@ -3228,13 +3228,11 @@ static int do_grxfh(struct cmd_context *ctx)
return 0;
 }
 
-static int fill_indir_table(u32 *indir_size, u32 *indir, int rxfhindir_equal,
-   char **rxfhindir_weight, u32 num_weights)
+static int fill_indir_table(u32 *indir_size, u32 *indir, int rxfhindir_default,
+   int rxfhindir_equal, char **rxfhindir_weight,
+   u32 num_weights)
 {
u32 i;
-   /*
-* "*indir_size == 0" ==> reset indir to default
-*/
if (rxfhindir_equal) {
for (i = 0; i < *indir_size; i++)
indir[i] = i % rxfhindir_equal;
@@ -3268,6 +3266,9 @@ static int fill_indir_table(u32 *indir_size, u32 *indir, 
int rxfhindir_equal,
}
indir[i] = j;
}
+   } else if (rxfhindir_default) {
+   /* "*indir_size == 0" ==> reset indir to default */
+   *indir_size = 0;
} else {
*indir_size = ETH_RXFH_INDIR_NO_CHANGE;
}
@@ -3275,8 +3276,9 @@ static int fill_indir_table(u32 *indir_size, u32 *indir, 
int rxfhindir_equal,
return 0;
 }
 
-static int do_srxfhindir(struct cmd_context *ctx, int rxfhindir_equal,
-char **rxfhindir_weight, u32 num_weights)
+static int do_srxfhindir(struct cmd_context *ctx, int rxfhindir_default,
+int rxfhindir_equal, char **rxfhindir_weight,
+u32 num_weights)
 {
struct ethtool_rxfh_indir indir_head;
struct ethtool_rxfh_indir *indir;
@@ -3301,7 +3303,8 @@ static int do_srxfhindir(struct cmd_context *ctx, int 
rxfhindir_equal,
indir->cmd = ETHTOOL_SRXFHINDIR;
indir->size = indir_head.size;
 
-   if (fill_indir_table(>size, indir->ring_index, rxfhindir_equal,
+   if (fill_indir_table(>size, indir->ring_index,
+rxfhindir_default, rxfhindir_equal,
 rxfhindir_weight, num_weights)) {
free(indir);
return 1;
@@ -3323,7 +3326,7 @@ static int do_srxfh(struct cmd_context *ctx)
struct ethtool_rxfh rss_head = {0};
struct ethtool_rxfh *rss;
struct ethtool_rxnfc ring_count;
-   int rxfhindir_equal = 0;
+   int rxfhindir_equal = 0, rxfhindir_default = 0;
char **rxfhindir_weight = NULL;
char *rxfhindir_key = NULL;
char *hkey = NULL;
@@ -3332,7 +3335,7 @@ static int do_srxfh(struct cmd_context *ctx)
u32 entry_size = sizeof(rss_head.rss_config[0]);
u32 num_weights = 0;
 
-   if (ctx->argc < 2)
+   if (ctx->argc < 1)
exit_bad_args();
 
while (arg_num < ctx->argc) {
@@ -3357,6 +3360,9 @@ static int do_srxfh(struct cmd_context *ctx)
if (!rxfhindir_key)
exit_bad_args();
++arg_num;
+   } else if (!strcmp(ctx->argp[arg_num], "default")) {
+   ++arg_num;
+   rxfhindir_default = 1;
} else {
exit_bad_args();
}
@@ -3368,6 +3374,18 @@ static int do_srxfh(struct cmd_context *ctx)
return 1;
}
 
+   if (rxfhindir_equal && rxfhindir_default) {
+   fprintf(stderr,
+   "Equal and default options are mutually exclusive\n");
+   return 1;
+   }
+
+   if (rxfhindir_weight && rxfhindir_default) {
+   fprintf(stderr,
+   "Weight and default options are mutually exclusive\n");
+   return 1;
+   }
+
ring_count.cmd = ETHTOOL_GRXRINGS;
err = send_ioctl(ctx, _count);
if (err <

[PATCH v2 0/4] ethtool: correct {GS}CHANNELS and {GS}RXFH conflict

2016-02-08 Thread Jacob Keller
This patch series fixes up ethtool_set_channels operation which
allowed modifying the RXFH table indirectly by reducing the number of
queues below the current max queue used by the Rx flow table. Most
drivers incorrectly allowed this to destroy the Rx flow table and
would then start by reinitializing it to default settings. However,
drivers are not able to correctly handle the conflict since there was
no way to differentiate between the default settings and the user
requested explicit settings.

To fix this, implement a new netdev private flag which we use to
indicate whether the RXFH has been user configured. If someone has
a better alternative of how to store this information, let me know.
I am not sure that priv_flags is the best solution but I have not had
any better idea.

Secondly, we add a function which just calls the driver's get_rxfh
callback to determine the current indirection table. Loop through this
and we can determine the current highest queue that will be used by
RSS.

Now, modify ethtool_set_channels to add a check ensuring that if (a)
we have had rxfh configured by user, (b) we can get the maximum RSS
queue currently used, then we ensure that the newly requested Rx count
(or combined count) is at least as high as this maximum RSS queue. The
reasoning here is that we can always safely increase the number of
queues. If we decrease the queues we must ensure that the decrease
does not go lower than the highest in-use queue for the Rx flow table.

Drivers may still need to be patched if they currently overwrite the
Rx flow table during channel configuration. If the driver currently
always resets Rx flow table when increasing number of queues it must
be patched to only do this when netif_is_rxfh_configured returns
false.

The second patch simply adds a check to ensure that all provided
channel counts fit within driver defined maximums.

The third patch fixes fm10k to correctly reconfigure the RSS reta
table whenever it is still unconfigured. This means that the default
state will provide RSS to every queue. Once the user has configured
RXFH, then we should maintain it. In addition, since the case where we
must reconfigure the RSS table in this case should now no longer
occur, add a dev_err message to indicate the user that we did so.

I have also supplied an ethtool patch to enable setting the default Rx
flow indirection table. Without this, current ethtool does not support
sending an indir_size of 0, and thus does not correctly support
configuring back to the default.

Changes in v2:
* fixed compile error
* fixed incorrect comparison with max_rx_in_use
* adjusted looping over dev_size
* removed inline on function
* dropped patch about separating combined vs asymmetric channels
* verified behavior using fm10k driver

Jacob Keller (3):
  ethtool: correctly ensure {GS}CHANNELS doesn't conflict with GS{RXFH}
  ethtool: ensure channel counts are within bounds during SCHANNELS
  fm10k: don't reinitialize RSS flow table when RXFH configured

 drivers/net/ethernet/intel/fm10k/fm10k_main.c | 10 +++-
 include/linux/netdevice.h |  8 +++
 net/core/ethtool.c| 71 ++-
 3 files changed, 85 insertions(+), 4 deletions(-)

-- 
2.7.0.236.gda096a0.dirty



[PATCH 2/4] ethtool: ensure channel counts are within bounds during SCHANNELS

2016-02-08 Thread Jacob Keller
Add a sanity check to ensure that all requested channel sizes are within
bounds, which should reduce errors in driver implementation.

Signed-off-by: Jacob Keller <jacob.e.kel...@intel.com>
---
 net/core/ethtool.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 59aebaf9ed54..dc4f6632f33b 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -1274,15 +1274,24 @@ static noinline_for_stack int 
ethtool_get_channels(struct net_device *dev,
 static noinline_for_stack int ethtool_set_channels(struct net_device *dev,
   void __user *useraddr)
 {
-   struct ethtool_channels channels;
+   struct ethtool_channels channels, max;
u32 max_rx_in_use = 0;
 
-   if (!dev->ethtool_ops->set_channels)
+   if (!dev->ethtool_ops->set_channels || !dev->ethtool_ops->get_channels)
return -EOPNOTSUPP;
 
if (copy_from_user(, useraddr, sizeof(channels)))
return -EFAULT;
 
+   dev->ethtool_ops->get_channels(dev, );
+
+   /* ensure new counts are within the maximums */
+   if ((channels.rx_count > max.max_rx) ||
+   (channels.tx_count > max.max_tx) ||
+   (channels.combined_count > max.max_combined) ||
+   (channels.other_count > max.max_other))
+   return -EINVAL;
+
/* ensure the new Rx count fits within the configured Rx flow
 * indirection table settings */
if (netif_is_rxfh_configured(dev) &&
-- 
2.7.0.236.gda096a0.dirty



[PATCH 1/4] ethtool: correctly ensure {GS}CHANNELS doesn't conflict with GS{RXFH}

2016-02-08 Thread Jacob Keller
Ethernet drivers implementing both {GS}RXFH and {GS}CHANNELS ethtool ops
incorrectly allow SCHANNELS when it would conflict with the settings
from SRXFH. This occurs because it is not possible for drivers to
understand whether their Rx flow indirection table has been configured
or is in the default state. In addition, drivers currently behave in
various ways when increasing the number of Rx channels.

Some drivers will always destroy the Rx flow indirection table when this
occurs, whether it has been set by the user or not. Other drivers will
attempt to preserve the table even if the user has never modified it
from the default driver settings. Neither of these situation is
desirable because it leads to unexpected behavior or loss of user
configuration.

The correct behavior is to simply return -EINVAL when SCHANNELS would
conflict with the current Rx flow table settings. However, it should
only do so if the current settings were modified by the user. If we
required that the new settings never conflict with the current (default)
Rx flow settings, we would force users to first reduce their Rx flow
settings and then reduce the number of Rx channels.

This patch proposes a solution implemented in net/core/ethtool.c which
ensures that all drivers behave correctly. It checks whether the RXFH
table has been configured to non-default settings, and stores this
information in a private netdev flag. When the number of channels is
requested to change, it first ensures that the current Rx flow table is
not going to assign flows to now disabled channels.

Signed-off-by: Jacob Keller <jacob.e.kel...@intel.com>
---
 include/linux/netdevice.h |  8 +++
 net/core/ethtool.c| 55 +++
 2 files changed, 63 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 289c2314d766..044fa2d88c7f 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1291,6 +1291,7 @@ struct net_device_ops {
  * @IFF_OPENVSWITCH: device is a Open vSwitch master
  * @IFF_L3MDEV_SLAVE: device is enslaved to an L3 master device
  * @IFF_TEAM: device is a team device
+ * @IFF_RXFH_CONFIGURED: device has had Rx Flow indirection table configured
  */
 enum netdev_priv_flags {
IFF_802_1Q_VLAN = 1<<0,
@@ -1318,6 +1319,7 @@ enum netdev_priv_flags {
IFF_OPENVSWITCH = 1<<22,
IFF_L3MDEV_SLAVE= 1<<23,
IFF_TEAM= 1<<24,
+   IFF_RXFH_CONFIGURED = 1<<25,
 };
 
 #define IFF_802_1Q_VLANIFF_802_1Q_VLAN
@@ -1345,6 +1347,7 @@ enum netdev_priv_flags {
 #define IFF_OPENVSWITCHIFF_OPENVSWITCH
 #define IFF_L3MDEV_SLAVE   IFF_L3MDEV_SLAVE
 #define IFF_TEAM   IFF_TEAM
+#define IFF_RXFH_CONFIGUREDIFF_RXFH_CONFIGURED
 
 /**
  * struct net_device - The DEVICE structure.
@@ -4045,6 +4048,11 @@ static inline bool netif_is_lag_port(const struct 
net_device *dev)
return netif_is_bond_slave(dev) || netif_is_team_port(dev);
 }
 
+static inline bool netif_is_rxfh_configured(const struct net_device *dev)
+{
+   return dev->priv_flags & IFF_RXFH_CONFIGURED;
+}
+
 /* This device needs to keep skb dst for qdisc enqueue or ndo_start_xmit() */
 static inline void netif_keep_dst(struct net_device *dev)
 {
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index daf04709dd3c..59aebaf9ed54 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -642,6 +642,37 @@ void netdev_rss_key_fill(void *buffer, size_t len)
 }
 EXPORT_SYMBOL(netdev_rss_key_fill);
 
+static int ethtool_get_max_rxfh_channel(struct net_device *dev, u32 *max)
+{
+   u32 dev_size, current_max = 0;
+   u32 *indir;
+   int ret;
+
+   if (!dev->ethtool_ops->get_rxfh_indir_size ||
+   !dev->ethtool_ops->get_rxfh)
+   return -EOPNOTSUPP;
+   dev_size = dev->ethtool_ops->get_rxfh_indir_size(dev);
+   if (dev_size == 0)
+   return -EOPNOTSUPP;
+
+   indir = kcalloc(dev_size, sizeof(indir[0]), GFP_USER);
+   if (!indir)
+   return -ENOMEM;
+
+   ret = dev->ethtool_ops->get_rxfh(dev, indir, NULL, NULL);
+   if (ret)
+   goto out;
+
+   while (dev_size--)
+   current_max = max(current_max, indir[dev_size]);
+
+   *max = current_max;
+
+out:
+   kfree(indir);
+   return ret;
+}
+
 static noinline_for_stack int ethtool_get_rxfh_indir(struct net_device *dev,
 void __user *useraddr)
 {
@@ -738,6 +769,14 @@ static noinline_for_stack int 
ethtool_set_rxfh_indir(struct net_device *dev,
}
 
ret = ops->set_rxfh(dev, indir, NULL, ETH_RSS_HASH_NO_CHANGE);
+   if (ret)
+   goto out;
+
+   /* indicate whether rxfh was set to 

[PATCH 0/2] ethtool: {SG}RXFH indirection deficiency

2016-02-05 Thread Jacob Keller
This patch set adds a new ethtool operation .reset_rxfh_indir which is
used by the core ethtool stack to properly indicate to a driver that
the default RSS indirection table has been requested. Current behavior
for notifying the default settings is indistinguishable from an
explicit request. There is no easy way to look at the indirection
table and tell if it matches the default, either. To allow drivers the
ability to correctly report -EINVAL when changing the number of
channels, add the new operation suggested. I chose to use a new
ethtool op instead of an additional flag since this has a lower impact
and we already use NULL on the *indir variable in set_rxfh to indicate
no change was requested.

The second patch in the series is an example implementation of the
.reset_rxfh_indir operation along with fixes to the fm10k_set_channels
to prevent changing the number of channels if it would interfere
with the current redirection table.

Jacob Keller (2):
  ethtool: support notifying drivers when user requests default rxfh
table
  fm10k: correctly report error when changing number of channels

 drivers/net/ethernet/intel/fm10k/fm10k.h |  2 ++
 drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c | 45 
 drivers/net/ethernet/intel/fm10k/fm10k_main.c| 11 --
 include/linux/ethtool.h  |  3 ++
 net/core/ethtool.c   |  8 +
 5 files changed, 66 insertions(+), 3 deletions(-)

-- 
2.7.0.236.gda096a0.dirty



[PATCH 1/2] ethtool: support notifying drivers when user requests default rxfh table

2016-02-05 Thread Jacob Keller
Currently, userspace ethtool supports requesting the default indirection
table by passing 0 as the indir_size. However, ops->set_rxfh does not
distinguish between user requesting default via indir_size=0 and user
requesting explicitly settings which are equivalent of the default.

This causes problems because other driver changes such as number of
channels (queues) should fail when they will not work with other current
settings such as the indirection table.

If there is no way to indicate when the driver is in default RSS table
state compared to user set states, then we wouldn't ever allow changing
the number of queues at all. To fix this, drivers must be able to
distinguish between requested settings and the configured defaults.

We can't use a value of NULL to the *indir pointer as this is already
used to indicate no change to the indirection table. Instead, implement
a new callback ops->reset_rxfh_indir which is used whenever the user
requests size of zero. This has lower impact than adding a new flag and
can be implemented by drivers as necessary.

In this way we have the following scenarios now:

(a) driver is in default configuration, and is free to change RSS
settings as necessary due to other changes such as number of queues.
This makes sense since we can essentially consider this as "RSS
indirection has not been configured". This is the default state of a new
device. In this state, I think the reasonable default is "Always RSS to
all enabled queues".

(b) user has requested RSS indirection settings, should change the
driver state to "RSS indirection table has been configured". Now if the
user requests a change in the number of queues, we can properly indicate
an error when these settings would conflict with the requested RSS
indirection table.

(c) The user can request default settings via the {GS}RXFH operations
and the driver will get a clear indication that it should reset to the
default "RSS indirection table has not been configured" mode. In this
way it will be able to then change the number of queues and go about
business.

If we don't have a way to properly indicate that we've reset to default
then we are not able to implement the proposed behavior, so this patch
adds a new method to properly indicate that we have reset to the default
indirection table.

Signed-off-by: Jacob Keller <jacob.e.kel...@intel.com>
Cc: Dave Miller <da...@davemloft.net>
---
This is an alternative proposal to my previous patch. I do not believe
it is possible to obtain desired behavior without this patch, as it is
not possible for the driver to distinguish default settings from user
configured RSS table. If we don't do that, then the user will never be
able to reduce the number of queues without first modifying the RSS
redirection table, which seems wrong to me.

 include/linux/ethtool.h | 3 +++
 net/core/ethtool.c  | 8 
 2 files changed, 11 insertions(+)

diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
index 653dc9c4ebac..700ac5658d34 100644
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -186,6 +186,8 @@ static inline u32 ethtool_rxfh_indir_default(u32 index, u32 
n_rx_rings)
  * will remain unchanged.
  * Returns a negative error code or zero. An error code must be returned
  * if at least one unsupported change was requested.
+ * @reset_rxfh_indir: Reset the contents of the RX flow hash indirection table
+ * to driver defaults. Returns a negative error code or zero.
  * @get_channels: Get number of channels.
  * @set_channels: Set number of channels.  Returns a negative error code or
  * zero.
@@ -262,6 +264,7 @@ struct ethtool_ops {
u8 *hfunc);
int (*set_rxfh)(struct net_device *, const u32 *indir,
const u8 *key, const u8 hfunc);
+   int (*reset_rxfh_indir)(struct net_device *);
void(*get_channels)(struct net_device *, struct ethtool_channels *);
int (*set_channels)(struct net_device *, struct ethtool_channels *);
int (*get_dump_flag)(struct net_device *, struct ethtool_dump *);
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index daf04709dd3c..4c6a1c2b8b61 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -725,7 +725,13 @@ static noinline_for_stack int 
ethtool_set_rxfh_indir(struct net_device *dev,
if (ret)
goto out;
 
+   /* user_size == 0 means reset the indir table to default. */
if (user_size == 0) {
+   if (ops->reset_rxfh_indir) {
+   err = ops->reset_rxfh_indir(dev);
+   goto out;
+   }
+
for (i = 0; i < dev_size; i++)
indir[i] = ethtool_rxfh_indir_default(i, rx_rings.data);
} else {
@@ -880,6 +886,8 @@ static noinline_for_stack int ethtool_set_rxfh(struct 
net_device *dev,

[PATCH 2/2] fm10k: correctly report error when changing number of channels

2016-02-05 Thread Jacob Keller
Previously, the fm10k driver would incorrectly allow changing the number
of combined channels when this would have altered user configured RSS
indirection table. With the new ethtool operation .reset_rxfh_indir, we
are now able to correctly handle the changes to number of channels. This
requires several changes:

(a) we must first store whether or not the RSS redirection table has
been manually configured, as there is no way to tell the default
table from an explicit request. Do this by implementing
reset_rxfh_indir ethtool op, and storing a flag to indicate how the
redirection table is set.

(b) replace the fm10k_init_reta code with a check of intialized
netdevice to instead check the new FM10K_FLAG_RETA_TABLE_CONFIGURED.
This will ensure that the table is always repopulated if we're in
the default (unconfigured) state. We still must repopulate if
somehow the reta table is changed to what will become an invalid
state due to new RSS limits. However, since this should no longer
happen, add a dev_err() call to indicate clearly to user what
happened.

(c) modify the fm10k_set_channels call to check that th new count is
within bounds on the reta table. This check is only enforced if the
user has manually configured the RSS indirection table, since the
driver should be free to repopulate its own default configuration.

Signed-off-by: Jacob Keller <jacob.e.kel...@intel.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k.h |  2 ++
 drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c | 45 
 drivers/net/ethernet/intel/fm10k/fm10k_main.c| 11 --
 3 files changed, 55 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k.h 
b/drivers/net/ethernet/intel/fm10k/fm10k.h
index 83f386714e87..983bdda9509b 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k.h
+++ b/drivers/net/ethernet/intel/fm10k/fm10k.h
@@ -268,6 +268,7 @@ struct fm10k_intfc {
 #define FM10K_FLAG_RX_TS_ENABLED   (u32)(BIT(3))
 #define FM10K_FLAG_SWPRI_CONFIG(u32)(BIT(4))
 #define FM10K_FLAG_DEBUG_STATS (u32)(BIT(5))
+#define FM10K_FLAG_RETA_TABLE_CONFIGURED   (u32)(BIT(6))
int xcast_mode;
 
/* Tx fast path data */
@@ -475,6 +476,7 @@ netdev_tx_t fm10k_xmit_frame_ring(struct sk_buff *skb,
 void fm10k_tx_timeout_reset(struct fm10k_intfc *interface);
 bool fm10k_check_tx_hang(struct fm10k_ring *tx_ring);
 void fm10k_alloc_rx_buffers(struct fm10k_ring *rx_ring, u16 cleaned_count);
+void fm10k_init_reta(struct fm10k_intfc *interface);
 
 /* PCI */
 void fm10k_mbx_free_irq(struct fm10k_intfc *);
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c 
b/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c
index 6a9f9886cb98..febfa2b009ea 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c
@@ -1060,6 +1060,8 @@ static int fm10k_set_reta(struct net_device *netdev, 
const u32 *indir)
if (!indir)
return 0;
 
+   interface->flags |= FM10K_FLAG_RETA_TABLE_CONFIGURED;
+
/* Verify user input. */
rss_i = interface->ring_feature[RING_F_RSS].indices;
for (i = fm10k_get_reta_size(netdev); i--;) {
@@ -1137,6 +1139,25 @@ static int fm10k_set_rssh(struct net_device *netdev, 
const u32 *indir,
return 0;
 }
 
+static int fm10k_reset_rssh(struct net_device *netdev)
+{
+   struct fm10k_intfc *interface = netdev_priv(netdev);
+   struct fm10k_hw *hw = >hw;
+   int i;
+
+   /* user has requested default configuration so clear configured flag */
+   interface->flags &= ~FM10K_FLAG_RETA_TABLE_CONFIGURED;
+
+   /* initialize the reta table to driver defaults */
+   fm10k_init_reta(interface);
+
+   /* write the new RETA table to hardware */
+   for (i = 0; i < FM10K_RETA_SIZE; i++)
+   fm10k_write_reg(hw, FM10K_RETA(0, i), interface->reta[i]);
+
+   return 0;
+}
+
 static unsigned int fm10k_max_channels(struct net_device *dev)
 {
struct fm10k_intfc *interface = netdev_priv(dev);
@@ -1173,6 +1194,8 @@ static int fm10k_set_channels(struct net_device *dev,
struct fm10k_intfc *interface = netdev_priv(dev);
unsigned int count = ch->combined_count;
struct fm10k_hw *hw = >hw;
+   u32 reta0, reta1, reta2, reta3;
+   int i, rss_i = 0;
 
/* verify they are not requesting separate vectors */
if (!count || ch->rx_count || ch->tx_count)
@@ -1186,6 +1209,27 @@ static int fm10k_set_channels(struct net_device *dev,
if (count > fm10k_max_channels(dev))
return -EINVAL;
 
+   /* determine the current number of queues used by the reta table */
+   for (i = FM10K_RETA_SIZE; i--;) {
+   reta0 = (interface->reta[i] << 24) >> 24;
+   reta1 = (interface->reta[i] << 

[PATCH] ethtool: add support for dynamic mode in {SG}RXFH commands

2016-02-02 Thread Jacob Keller
Signed-off-by: Jacob Keller <jacob.e.kel...@intel.com>
---
 ethtool-copy.h |  8 +++-
 ethtool.c  | 36 +++-
 2 files changed, 42 insertions(+), 2 deletions(-)

diff --git a/ethtool-copy.h b/ethtool-copy.h
index d23ffc4c38b4..620dcea06d25 100644
--- a/ethtool-copy.h
+++ b/ethtool-copy.h
@@ -890,6 +890,10 @@ struct ethtool_rxfh_indir {
  * hardware hash key.
  * @hfunc: Defines the current RSS hash function used by HW (or to be set to).
  * Valid values are one of the %ETH_RSS_HASH_*.
+ * @dynamic: Indicate whether the device driver may use dynamic RSS settings
+ * which change due to various run time factors, such as number of
+ * queues. When false driver must attempt to preserve RSS settings when
+ * possible. When true driver may override any requested RSS settings.
  * @rsvd:  Reserved for future extensions.
  * @rss_config: RX ring/queue index for each hash value i.e., indirection table
  * of @indir_size __u32 elements, followed by hash key of @key_size
@@ -900,6 +904,7 @@ struct ethtool_rxfh_indir {
  * %ETH_RXFH_INDIR_NO_CHANGE means that indir table setting is not requested
  * and a @indir_size of zero means the indir table should be reset to default
  * values. An hfunc of zero means that hash function setting is not requested.
+ * If dynamic is true, driver may ignore any other settings requested.
  */
 struct ethtool_rxfh {
__u32   cmd;
@@ -907,7 +912,8 @@ struct ethtool_rxfh {
__u32   indir_size;
__u32   key_size;
__u8hfunc;
-   __u8rsvd8[3];
+   __u8dynamic;
+   __u8rsvd8[2];
__u32   rsvd32;
__u32   rss_config[0];
 };
diff --git a/ethtool.c b/ethtool.c
index 92c40b823f2c..29b9279b6b1c 100644
--- a/ethtool.c
+++ b/ethtool.c
@@ -917,6 +917,19 @@ static int convert_string_to_hashkey(char *rss_hkey, u32 
key_size,
return 2;
 }
 
+static u8 parse_dynamic(const char *rss_dynamic, u8 *dynamic)
+{
+   if (!strcmp(rss_dynamic, "on")) {
+   *dynamic = 1;
+   return 0;
+   } else if (!strcmp(rss_dynamic, "off")) {
+   *dynamic = 0;
+   return 0;
+   } else {
+   return 2;
+   }
+}
+
 static int parse_hkey(char **rss_hkey, u32 key_size,
  const char *rss_hkey_string)
 {
@@ -3213,6 +3226,11 @@ static int do_grxfh(struct cmd_context *ctx)
indir_bytes = rss->indir_size * sizeof(rss->rss_config[0]);
hkey = ((char *)rss->rss_config + indir_bytes);
 
+   if (rss->dynamic)
+   printf("Dynamic mode enabled\n");
+   else
+   printf("Static mode enabled\n");
+
printf("RSS hash key:\n");
if (!rss->key_size)
printf("Operation not supported\n");
@@ -3326,11 +3344,13 @@ static int do_srxfh(struct cmd_context *ctx)
int rxfhindir_equal = 0;
char **rxfhindir_weight = NULL;
char *rxfhindir_key = NULL;
+   char *rxfh_dynamic = NULL;
char *hkey = NULL;
int err = 0;
u32 arg_num = 0, indir_bytes = 0;
u32 entry_size = sizeof(rss_head.rss_config[0]);
u32 num_weights = 0;
+   u8 dynamic = 0;
 
if (ctx->argc < 2)
exit_bad_args();
@@ -3357,6 +3377,12 @@ static int do_srxfh(struct cmd_context *ctx)
if (!rxfhindir_key)
exit_bad_args();
++arg_num;
+   } else if (!strcmp(ctx->argp[arg_num], "dynamic")) {
+   ++arg_num;
+   rxfh_dynamic = ctx->argp[arg_num];
+   if (!rxfhindir_key)
+   exit_bad_args();
+   ++arg_num;
} else {
exit_bad_args();
}
@@ -3392,6 +3418,12 @@ static int do_srxfh(struct cmd_context *ctx)
return err;
}
 
+   if (rxfh_dynamic) {
+   err = parse_dynamic(rxfh_dynamic, );
+   if (err)
+   return err;
+   }
+
if (rxfhindir_equal || rxfhindir_weight)
indir_bytes = rss_head.indir_size * entry_size;
 
@@ -3403,6 +3435,7 @@ static int do_srxfh(struct cmd_context *ctx)
rss->cmd = ETHTOOL_SRSSH;
rss->indir_size = rss_head.indir_size;
rss->key_size = rss_head.key_size;
+   rss->dynamic = dynamic;
 
if (fill_indir_table(>indir_size, rss->rss_config, rxfhindir_equal,
 rxfhindir_weight, num_weights)) {
@@ -4112,7 +4145,8 @@ static const struct option {
{ "-X|--set-rxfh-indir|--rxfh", 1, do_srxfh,
  "Set Rx flow hash indirection and/or hash key",
  " [ equal N | weight W0 W1 ... ]\n"
- "  

[PATCH 1/2] ethtool: add dynamic flag to ETHTOOL_{GS}RXFH commands

2016-02-02 Thread Jacob Keller
Ethtool supports a few operations for modifying and controlling
a device's RSS table. Sometimes, changes in other features of the device
may require (or desire) changes to the RSS table. Currently there is no
method to indicate to the driver whether the current RSS table settings
should be maintained or overridden.

A simple example of this is for when the number of receive queues is
changed, there are two possibilities. First, the number of queues is
decreased. This must result in a reprogramming of the RSS table since it
will no longer match correctly and may attempt to assign traffic to
a queue which is now disabled. In this case drivers have a clear
indication of what to do.

The second case, is when the number of queues has increased. In this
case, the current RSS table may be preserved. However, doing so would
result in the new queues being unused for RSS. But if the driver chooses
to destroy the RSS configuration it may result in unwanted behavior as
now the user's configured changes are lost.

This patch attempts to resolve this (and other similar) issues by
indicating a new flag "dynamic" which can be set by the user when
calling the ethtool interface.

This flag indicates to the driver that it may overwrite settings in the
RSS table. If false, it indicates the driver should do what it can to
preserve the RSS table changes requested by the user. That is, for cases
where it can preserve the table it must. If the value is set true, it
means the driver may or may not apply the current settings and is free
to change the values as necessary. The current default is set to false,
as this is how most drivers appear to behave today.

Signed-off-by: Jacob Keller <jacob.e.kel...@intel.com>
Cc: Lendacky, Thomas <thomas.lenda...@amd.com>
Cc: Yuval Mintz <yuval...@broadcom.com>
Cc: Michael Chan <mc...@broadcom.com>
Cc: Matt Carlson <mcarl...@broadcom.com>
Cc: Sunil Goutham <sgout...@cavium.com>
Cc: Hariprasad Shenai <haripra...@chelsio.com>
Cc: Govindarajulu Varadarajan <_gov...@gmx.com>
Cc: Kalesh AP <kalesh.pura...@emulex.com>
Cc: Andrew Lunn <and...@lunn.ch>
Cc: Shannon Nelson <shannon.nel...@intel.com>
Cc: Mitch Williams <mitch.a.willi...@intel.com>
Cc: Carolyn Wyborny <carolyn.wybo...@intel.com>
Cc: Emil Tantilov <emil.s.tanti...@intel.com>
Cc: Thomas Petazzoni <thomas.petazz...@free-electrons.com>
Cc: Amir Vadai <am...@mellanox.com>
Cc: Achiad Shochat <ach...@mellanox.com>
Cc: Ben Hutchings <bhutchi...@solarflare.com>
Cc: Michał Mirosław <mirq-li...@rere.qmqm.pl>
Cc: Alexander Duyck <adu...@mirantis.com>

---
 drivers/net/ethernet/amd/xgbe/xgbe-ethtool.c   |  7 --
 .../net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c|  7 --
 drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c  |  5 +++-
 drivers/net/ethernet/broadcom/tg3.c|  8 +--
 .../net/ethernet/cavium/thunder/nicvf_ethtool.c|  7 --
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_ethtool.c |  7 --
 drivers/net/ethernet/cisco/enic/enic_ethtool.c |  7 --
 drivers/net/ethernet/emulex/benet/be_ethtool.c |  7 --
 drivers/net/ethernet/hisilicon/hns/hns_ethtool.c   |  8 +--
 drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c   |  7 --
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c |  7 --
 drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c |  6 +++--
 drivers/net/ethernet/intel/igb/igb_ethtool.c   |  6 +++--
 drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c   |  7 --
 drivers/net/ethernet/intel/ixgbevf/ethtool.c   |  5 +++-
 drivers/net/ethernet/marvell/mvneta.c  |  8 +--
 drivers/net/ethernet/mellanox/mlx4/en_ethtool.c|  7 --
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |  7 --
 .../net/ethernet/netronome/nfp/nfp_net_ethtool.c   |  7 --
 drivers/net/ethernet/sfc/ethtool.c |  7 --
 include/linux/ethtool.h|  4 ++--
 include/uapi/linux/ethtool.h   |  8 ++-
 net/core/ethtool.c | 27 +-
 23 files changed, 124 insertions(+), 52 deletions(-)

diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-ethtool.c 
b/drivers/net/ethernet/amd/xgbe/xgbe-ethtool.c
index 6040293db9c1..4eecd225db7c 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-ethtool.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-ethtool.c
@@ -509,11 +509,14 @@ static u32 xgbe_get_rxfh_indir_size(struct net_device 
*netdev)
 }
 
 static int xgbe_get_rxfh(struct net_device *netdev, u32 *indir, u8 *key,
-u8 *hfunc)
+u8 *hfunc, u8 *dynamic)
 {
struct xgbe_prv_data *pdata = netdev_priv(netdev);
unsigned int i;
 
+   if (dynamic)
+   *dynamic = false;
+
if (indir) {
for (i = 0; i < ARRAY_SIZE(pdata->rss_table); i++)
ind

[PATCH 2/2] fm10k: support dynamic mode for RSS table control

2016-02-02 Thread Jacob Keller
Add support for the new dynamic flag from set_rxfh and get_rxfh. For now
the only known dynamic reason to change RSS is when number of queues is
changed. The default mode for the driver will be dynamic, indicating
that the driver is free to change its own default setting as it sees
fit. Since the current default userspace ethtool operation will only
send false as the value to dynamic this results in the behavior that the
driver configured RSS settings are capable of being modified but user
supplied changes will be preserved as long as possible.

In practice, this resolves an issue where decreasing the number of
queues and then increasing them shall no longer break RSS unless the
user has manually configured a non-dynamic RSS table setting.

If the user has configured the RSS table without setting dynamic, the
driver will do its best to maintain the configuration. If this is not
possible, we will indicate that the mode is now dynamic and reset to
driver defaults.

Signed-off-by: Jacob Keller <jacob.e.kel...@intel.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k.h |  1 +
 drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c | 10 +-
 drivers/net/ethernet/intel/fm10k/fm10k_main.c|  8 +---
 3 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k.h 
b/drivers/net/ethernet/intel/fm10k/fm10k.h
index 83f386714e87..5402b5c55247 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k.h
+++ b/drivers/net/ethernet/intel/fm10k/fm10k.h
@@ -268,6 +268,7 @@ struct fm10k_intfc {
 #define FM10K_FLAG_RX_TS_ENABLED   (u32)(BIT(3))
 #define FM10K_FLAG_SWPRI_CONFIG(u32)(BIT(4))
 #define FM10K_FLAG_DEBUG_STATS (u32)(BIT(5))
+#define FM10K_FLAG_STATIC_RETA_TBL (u32)(BIT(6))
int xcast_mode;
 
/* Tx fast path data */
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c 
b/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c
index af11c4c1b256..8799296ff86e 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c
@@ -1100,7 +1100,7 @@ static int fm10k_get_rssh(struct net_device *netdev, u32 
*indir, u8 *key,
*hfunc = ETH_RSS_HASH_TOP;
 
if (dynamic)
-   *dynamic = false;
+   *dynamic = !(interface->flags & FM10K_FLAG_STATIC_RETA_TBL);
 
err = fm10k_get_reta(netdev, indir);
if (err || !key)
@@ -1123,6 +1123,14 @@ static int fm10k_set_rssh(struct net_device *netdev, 
const u32 *indir,
if (hfunc != ETH_RSS_HASH_NO_CHANGE && hfunc != ETH_RSS_HASH_TOP)
return -EOPNOTSUPP;
 
+   /* If dynamic mode is not requested, enable the static flag. We'll
+* still attempt to populate the RETA table using the provided
+* settings if possible. */
+   if (dynamic)
+   interface->flags &= ~FM10K_FLAG_STATIC_RETA_TBL;
+   else
+   interface->flags |= FM10K_FLAG_STATIC_RETA_TBL;
+
err = fm10k_set_reta(netdev, indir);
if (err || !key)
return err;
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_main.c 
b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
index 134ce4daa994..8c75f60028a6 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_main.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
@@ -1932,13 +1932,13 @@ static void fm10k_assign_rings(struct fm10k_intfc 
*interface)
fm10k_cache_ring_rss(interface);
 }
 
-static void fm10k_init_reta(struct fm10k_intfc *interface)
+void fm10k_init_reta(struct fm10k_intfc *interface)
 {
u16 i, rss_i = interface->ring_feature[RING_F_RSS].indices;
u32 reta, base;
 
-   /* If the netdev is initialized we have to maintain table if possible */
-   if (interface->netdev->reg_state != NETREG_UNINITIALIZED) {
+   /* Maintain static user provided table if possible. */
+   if (interface->flags & FM10K_FLAG_STATIC_RETA_TBL) {
for (i = FM10K_RETA_SIZE; i--;) {
reta = interface->reta[i];
if reta << 24) >> 24) < rss_i) &&
@@ -1954,6 +1954,8 @@ static void fm10k_init_reta(struct fm10k_intfc *interface)
}
 
 repopulate_reta:
+   interface->flags &= ~FM10K_FLAG_STATIC_RETA_TBL;
+
/* Populate the redirection table 4 entries at a time.  To do this
 * we are generating the results for n and n+2 and then interleaving
 * those with the results with n+1 and n+3.
-- 
2.6.3.505.g5cc1fd1



[PATCH] ethtool: add dynamic flag to {SG}RXFH

2016-02-02 Thread Jacob Keller
This patch series proposes the addition of a dynamic flag to the
ethtool {SG}RXFH operations. The primary reasoning for this is so that
drivers may indicate when they destroyed configured RSS settings, and
can determine when they have more liberty to remove user's settings.
The default mode shall be the current static mode, where drivers
should make best effort to maintain the RSS table settings if
possible. However, if the user sets dynamic mode flag, then the driver
is free (if necessary or useful) to modify the requested settings. The
primary reason for this is when queue sizes change dynamically. If you
increase the number of available queues, the RSS table may be
il-configured and the driver might which to change the settings.
Today, most drivers attempt to maintain the RSS table when possible.
This means that a user can observe functioning RSS, decrease the
number of queues, and then increase them again. Under current
functionality, drivers may end up never re-writing the RSS table back
to the default when the queues are increased. Even worse, if a driver
does do this today, they may have destroyed some specific settings the
user configured in the RSS table. Instead, use the dynamic mode value
which the driver will use to indicate whether or not the current
settings might change due to dynamic factors.

This series includes support to fix all the driver function pointers,
and a patch to enhance fm10k driver to support this feature. The
previous behavior of the fm10k driver was especially problematic, and
the current patch series attempts to resolve this.

It is possible to have most of the behavior handled purely in driver,
but then we lose any ability to communicate this to the user via
ethtool.

Jacob Keller (3):
  ethtool: add dynamic flag to ETHTOOL_{GS}RXFH commands
  fm10k: support dynamic mode for RSS table control
  ethtool: add support for dynamic mode in {SG}RXFH commands

 drivers/net/ethernet/amd/xgbe/xgbe-ethtool.c |  7 +--
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c  |  7 +--
 drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c|  5 -
 drivers/net/ethernet/broadcom/tg3.c  |  8 ++--
 drivers/net/ethernet/cavium/thunder/nicvf_ethtool.c  |  7 +--
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_ethtool.c   |  7 +--
 drivers/net/ethernet/cisco/enic/enic_ethtool.c   |  7 +--
 drivers/net/ethernet/emulex/benet/be_ethtool.c   |  7 +--
 drivers/net/ethernet/hisilicon/hns/hns_ethtool.c |  8 ++--
 drivers/net/ethernet/intel/fm10k/fm10k.h |  1 +
 drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c | 15 +--
 drivers/net/ethernet/intel/fm10k/fm10k_main.c|  8 +---
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c   |  7 +--
 drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c   |  6 --
 drivers/net/ethernet/intel/igb/igb_ethtool.c |  6 --
 drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c |  7 +--
 drivers/net/ethernet/intel/ixgbevf/ethtool.c |  5 -
 drivers/net/ethernet/marvell/mvneta.c|  8 ++--
 drivers/net/ethernet/mellanox/mlx4/en_ethtool.c  |  7 +--
 drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c |  7 +--
 drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c |  7 +--
 drivers/net/ethernet/sfc/ethtool.c   |  7 +--
 include/linux/ethtool.h  |  4 ++--
 include/uapi/linux/ethtool.h |  8 +++-
 net/core/ethtool.c   | 27 
---
 25 files changed, 138 insertions(+), 55 deletions(-)

 ethtool-copy.h |  8 +++-
 ethtool.c  | 36 +++-
 2 files changed, 42 insertions(+), 2 deletions(-)

-- 
2.6.3.505.g5cc1fd1