[net-next] i40e(vf): remove i40e_ethtool_stats.h header file
Essentially reverts commit 8fd75c58a09a ("i40e: move ethtool stats boiler plate code to i40e_ethtool_stats.h", 2018-08-30), and additionally moves the similar code in i40evf into i40evf_ethtool.c. The code was intially moved from i40e_ethtool.c into i40e_ethtool_stats.h as a way of better logically organizing the code. This has two problems. First, we can't have an inline function with variadic arguments on all platforms. Second, it gave the appearance that we had plans to share code between the i40e and i40evf drivers, due to having a near copy of the contents in the i40evf/i40e_ethtool_stats.h file. Patches which actually attempt to combine or share code between the i40e and i40evf drivers have not materialized, and are likely a ways off. Rather than fixing the one function which causes build issues, just move this code back into the i40e_ethtool.c and i40evf_ethtool.c files. Note that we also change these functions back from static inlines to just statics, since they're no longer in a header file. We can revisit this if/when work is done to actually attempt to share code between drivers. Alternatively, this stats code could be made more generic so that it can be shared across drivers as part of ethtool kernel work. Signed-off-by: Jacob Keller --- Sorry aboutthe delay, our iwl queue maintainer Jeff is on vacation, and discussion about how best to resolve this issue was/is ongoing in the IWL list. I opted to just move the contents of these files back into i40e_ethtool.c and i40evf_ethtool.c respectively, as I think it's better than only moving the single offending function back into the .c file. This is different from the approach suggested by Wang, Dongsheng on Intel Wired LAN, but I think it's a better approach for now. I've also kindly asked if Aaron Brown could test this out. .../net/ethernet/intel/i40e/i40e_ethtool.c| 219 - .../ethernet/intel/i40e/i40e_ethtool_stats.h | 221 -- .../intel/i40evf/i40e_ethtool_stats.h | 221 -- .../ethernet/intel/i40evf/i40evf_ethtool.c| 219 - 4 files changed, 436 insertions(+), 444 deletions(-) delete mode 100644 drivers/net/ethernet/intel/i40e/i40e_ethtool_stats.h delete mode 100644 drivers/net/ethernet/intel/i40evf/i40e_ethtool_stats.h diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c index d7d3974beca2..87fe2e60602f 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c +++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c @@ -6,7 +6,224 @@ #include "i40e.h" #include "i40e_diag.h" -#include "i40e_ethtool_stats.h" +/* ethtool statistics helpers */ + +/** + * struct i40e_stats - definition for an ethtool statistic + * @stat_string: statistic name to display in ethtool -S output + * @sizeof_stat: the sizeof() the stat, must be no greater than sizeof(u64) + * @stat_offset: offsetof() the stat from a base pointer + * + * This structure defines a statistic to be added to the ethtool stats buffer. + * It defines a statistic as offset from a common base pointer. Stats should + * be defined in constant arrays using the I40E_STAT macro, with every element + * of the array using the same _type for calculating the sizeof_stat and + * stat_offset. + * + * The @sizeof_stat is expected to be sizeof(u8), sizeof(u16), sizeof(u32) or + * sizeof(u64). Other sizes are not expected and will produce a WARN_ONCE from + * the i40e_add_ethtool_stat() helper function. + * + * The @stat_string is interpreted as a format string, allowing formatted + * values to be inserted while looping over multiple structures for a given + * statistics array. Thus, every statistic string in an array should have the + * same type and number of format specifiers, to be formatted by variadic + * arguments to the i40e_add_stat_string() helper function. + **/ +struct i40e_stats { + char stat_string[ETH_GSTRING_LEN]; + int sizeof_stat; + int stat_offset; +}; + +/* Helper macro to define an i40e_stat structure with proper size and type. + * Use this when defining constant statistics arrays. Note that @_type expects + * only a type name and is used multiple times. + */ +#define I40E_STAT(_type, _name, _stat) { \ + .stat_string = _name, \ + .sizeof_stat = FIELD_SIZEOF(_type, _stat), \ + .stat_offset = offsetof(_type, _stat) \ +} + +/* Helper macro for defining some statistics directly copied from the netdev + * stats structure. + */ +#define I40E_NETDEV_STAT(_net_stat) \ + I40E_STAT(struct rtnl_link_stats64, #_net_stat, _net_stat) + +/* Helper macro for defining some statistics related to queues */ +#define I40E_QUEUE_STAT(_name, _stat) \ + I40E_STAT(struct i40e_ring, _name, _stat) + +/* Stats associated with a Tx or Rx ring */ +static const struct i40e_stats i40e_gstrings_queue_stats[] = { + I40E_QUEUE_STAT("%s-%u.packets", stats.packets),
[net v2] sch_fq_codel: zero q->flows_cnt when fq_codel_init fails
When fq_codel_init fails, qdisc_create_dflt will cleanup by using qdisc_destroy. This function calls the ->reset() op prior to calling the ->destroy() op. Unfortunately, during the failure flow for sch_fq_codel, the ->flows parameter is not initialized, so the fq_codel_reset function will null pointer dereference. kernel: BUG: unable to handle kernel NULL pointer dereference at 0008 kernel: IP: fq_codel_reset+0x58/0xd0 [sch_fq_codel] kernel: PGD 0 P4D 0 kernel: Oops: [#1] SMP PTI kernel: Modules linked in: i40iw i40e(OE) xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack tun bridge stp llc devlink ebtable_filter ebtables ip6table_filter ip6_tables rpcrdma ib_isert iscsi_target_mod sunrpc ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate iTCO_wdt iTCO_vendor_support intel_uncore ib_core intel_rapl_perf mei_me mei joydev i2c_i801 lpc_ich ioatdma shpchp wmi sch_fq_codel xfs libcrc32c mgag200 ixgbe drm_kms_helper isci ttm firewire_ohci kernel: mdio drm igb libsas crc32c_intel firewire_core ptp pps_core scsi_transport_sas crc_itu_t dca i2c_algo_bit ipmi_si ipmi_devintf ipmi_msghandler [last unloaded: i40e] kernel: CPU: 10 PID: 4219 Comm: ip Tainted: G OE 4.16.13custom-fq-codel-test+ #3 kernel: Hardware name: Intel Corporation S2600CO/S2600CO, BIOS SE5C600.86B.02.05.0004.051120151007 05/11/2015 kernel: RIP: 0010:fq_codel_reset+0x58/0xd0 [sch_fq_codel] kernel: RSP: 0018:bfbf4c1fb620 EFLAGS: 00010246 kernel: RAX: 0400 RBX: RCX: 05b9 kernel: RDX: RSI: 9d03264a60c0 RDI: 9cfd17b31c00 kernel: RBP: 0001 R08: 000260c0 R09: b679c3e9 kernel: R10: f1dab06a0e80 R11: 9cfd163af800 R12: 9cfd17b31c00 kernel: R13: 0001 R14: 9cfd153de600 R15: 0001 kernel: FS: 7fdec2f92800() GS:9d032648() knlGS: kernel: CS: 0010 DS: ES: CR0: 80050033 kernel: CR2: 0008 CR3: 000c1956a006 CR4: 000606e0 kernel: Call Trace: kernel: qdisc_destroy+0x56/0x140 kernel: qdisc_create_dflt+0x8b/0xb0 kernel: mq_init+0xc1/0xf0 kernel: qdisc_create_dflt+0x5a/0xb0 kernel: dev_activate+0x205/0x230 kernel: __dev_open+0xf5/0x160 kernel: __dev_change_flags+0x1a3/0x210 kernel: dev_change_flags+0x21/0x60 kernel: do_setlink+0x660/0xdf0 kernel: ? down_trylock+0x25/0x30 kernel: ? xfs_buf_trylock+0x1a/0xd0 [xfs] kernel: ? rtnl_newlink+0x816/0x990 kernel: ? _xfs_buf_find+0x327/0x580 [xfs] kernel: ? _cond_resched+0x15/0x30 kernel: ? kmem_cache_alloc+0x20/0x1b0 kernel: ? rtnetlink_rcv_msg+0x200/0x2f0 kernel: ? rtnl_calcit.isra.30+0x100/0x100 kernel: ? netlink_rcv_skb+0x4c/0x120 kernel: ? netlink_unicast+0x19e/0x260 kernel: ? netlink_sendmsg+0x1ff/0x3c0 kernel: ? sock_sendmsg+0x36/0x40 kernel: ? ___sys_sendmsg+0x295/0x2f0 kernel: ? ebitmap_cmp+0x6d/0x90 kernel: ? dev_get_by_name_rcu+0x73/0x90 kernel: ? skb_dequeue+0x52/0x60 kernel: ? __inode_wait_for_writeback+0x7f/0xf0 kernel: ? bit_waitqueue+0x30/0x30 kernel: ? fsnotify_grab_connector+0x3c/0x60 kernel: ? __sys_sendmsg+0x51/0x90 kernel: ? do_syscall_64+0x74/0x180 kernel: ? entry_SYSCALL_64_after_hwframe+0x3d/0xa2 kernel: Code: 00 00 48 89 87 00 02 00 00 8b 87 a0 01 00 00 85 c0 0f 84 84 00 00 00 31 ed 48 63 dd 83 c5 01 48 c1 e3 06 49 03 9c 24 90 01 00 00 <48> 8b 73 08 48 8b 3b e8 6c 9a 4f f6 48 8d 43 10 48 c7 03 00 00 kernel: RIP: fq_codel_reset+0x58/0xd0 [sch_fq_codel] RSP: bfbf4c1fb620 kernel: CR2: 0008 kernel: ---[ end trace e81a62bede66274e ]--- This is caused because flows_cnt is non-zero, but flows hasn't been initialized. fq_codel_init has left the private data in a partially initialized state. To fix this, reset flows_cnt to 0 when we fail to initialize. Additionally, to make the state more consistent, also cleanup the flows pointer when the allocation of backlogs fails. This fixes the NULL pointer dereference, since both the for-loop and memset in fq_codel_reset will be no-ops when flow_cnt is zero. Signed-off-by: Jacob Keller --- net/sched/sch_fq_codel.c | 25 ++--- 1 file changed, 18 insertions(+), 7 deletions(-) diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c index cd2e0e342fb6..6c0a9d5dbf94 100644 --- a/net/sched/sch_fq_codel.c +++ b/net/sched/sch_fq_codel.c @@ -479,24 +479,28 @@ static int fq_codel_init(struct Qdisc *sch, struct nlattr
[net] sch_fq_codel: zero q->flows_cnt when fq_codel_init fails
When fq_codel_init fails, qdisc_create_dflt will cleanup by using qdisc_destroy. This function calls the ->reset() op prior to calling the ->destroy() op. Unfortunately, during the failure flow for sch_fq_codel, the ->flows parameter is not initialized, so the fq_codel_reset function will null pointer dereference. kernel: BUG: unable to handle kernel NULL pointer dereference at 0008 kernel: IP: fq_codel_reset+0x58/0xd0 [sch_fq_codel] kernel: PGD 0 P4D 0 kernel: Oops: [#1] SMP PTI kernel: Modules linked in: i40iw i40e(OE) xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack tun bridge stp llc devlink ebtable_filter ebtables ip6table_filter ip6_tables rpcrdma ib_isert iscsi_target_mod sunrpc ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate iTCO_wdt iTCO_vendor_support intel_uncore ib_core intel_rapl_perf mei_me mei joydev i2c_i801 lpc_ich ioatdma shpchp wmi sch_fq_codel xfs libcrc32c mgag200 ixgbe drm_kms_helper isci ttm firewire_ohci kernel: mdio drm igb libsas crc32c_intel firewire_core ptp pps_core scsi_transport_sas crc_itu_t dca i2c_algo_bit ipmi_si ipmi_devintf ipmi_msghandler [last unloaded: i40e] kernel: CPU: 10 PID: 4219 Comm: ip Tainted: G OE 4.16.13custom-fq-codel-test+ #3 kernel: Hardware name: Intel Corporation S2600CO/S2600CO, BIOS SE5C600.86B.02.05.0004.051120151007 05/11/2015 kernel: RIP: 0010:fq_codel_reset+0x58/0xd0 [sch_fq_codel] kernel: RSP: 0018:bfbf4c1fb620 EFLAGS: 00010246 kernel: RAX: 0400 RBX: RCX: 05b9 kernel: RDX: RSI: 9d03264a60c0 RDI: 9cfd17b31c00 kernel: RBP: 0001 R08: 000260c0 R09: b679c3e9 kernel: R10: f1dab06a0e80 R11: 9cfd163af800 R12: 9cfd17b31c00 kernel: R13: 0001 R14: 9cfd153de600 R15: 0001 kernel: FS: 7fdec2f92800() GS:9d032648() knlGS: kernel: CS: 0010 DS: ES: CR0: 80050033 kernel: CR2: 0008 CR3: 000c1956a006 CR4: 000606e0 kernel: Call Trace: kernel: qdisc_destroy+0x56/0x140 kernel: qdisc_create_dflt+0x8b/0xb0 kernel: mq_init+0xc1/0xf0 kernel: qdisc_create_dflt+0x5a/0xb0 kernel: dev_activate+0x205/0x230 kernel: __dev_open+0xf5/0x160 kernel: __dev_change_flags+0x1a3/0x210 kernel: dev_change_flags+0x21/0x60 kernel: do_setlink+0x660/0xdf0 kernel: ? down_trylock+0x25/0x30 kernel: ? xfs_buf_trylock+0x1a/0xd0 [xfs] kernel: ? rtnl_newlink+0x816/0x990 kernel: ? _xfs_buf_find+0x327/0x580 [xfs] kernel: ? _cond_resched+0x15/0x30 kernel: ? kmem_cache_alloc+0x20/0x1b0 kernel: ? rtnetlink_rcv_msg+0x200/0x2f0 kernel: ? rtnl_calcit.isra.30+0x100/0x100 kernel: ? netlink_rcv_skb+0x4c/0x120 kernel: ? netlink_unicast+0x19e/0x260 kernel: ? netlink_sendmsg+0x1ff/0x3c0 kernel: ? sock_sendmsg+0x36/0x40 kernel: ? ___sys_sendmsg+0x295/0x2f0 kernel: ? ebitmap_cmp+0x6d/0x90 kernel: ? dev_get_by_name_rcu+0x73/0x90 kernel: ? skb_dequeue+0x52/0x60 kernel: ? __inode_wait_for_writeback+0x7f/0xf0 kernel: ? bit_waitqueue+0x30/0x30 kernel: ? fsnotify_grab_connector+0x3c/0x60 kernel: ? __sys_sendmsg+0x51/0x90 kernel: ? do_syscall_64+0x74/0x180 kernel: ? entry_SYSCALL_64_after_hwframe+0x3d/0xa2 kernel: Code: 00 00 48 89 87 00 02 00 00 8b 87 a0 01 00 00 85 c0 0f 84 84 00 00 00 31 ed 48 63 dd 83 c5 01 48 c1 e3 06 49 03 9c 24 90 01 00 00 <48> 8b 73 08 48 8b 3b e8 6c 9a 4f f6 48 8d 43 10 48 c7 03 00 00 kernel: RIP: fq_codel_reset+0x58/0xd0 [sch_fq_codel] RSP: bfbf4c1fb620 kernel: CR2: 0008 kernel: ---[ end trace e81a62bede66274e ]--- This is caused because flows_cnt is non-zero, but flows hasn't been initialized. fq_codel_init has left the private data in a partially initialized state. To fix this, reset flows_cnt to 0 when we fail to initialize. Additionally, to make the state more consistent, also cleanup the flows pointer when the allocation of backlogs fails. This fixes the NULL pointer dereference, since both the for-loop and memset in fq_codel_reset will be no-ops when flow_cnt is zero. Signed-off-by: Jacob Keller --- net/sched/sch_fq_codel.c | 25 ++--- 1 file changed, 18 insertions(+), 7 deletions(-) diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c index cd2e0e342fb6..df7390b4185c 100644 --- a/net/sched/sch_fq_codel.c +++ b/net/sched/sch_fq_codel.c @@ -479,24 +479,28 @@ static int fq_codel_init(struct Qdisc *sch, struct nlattr
Re: [PATCH v5 03/14] PCI: Add pcie_bandwidth_capable() to compute max supported link bandwidth
On Mon, Apr 2, 2018 at 7:05 AM, Bjorn Helgaaswrote: > +/* PCIe speed to Mb/s reduced by encoding overhead */ > +#define PCIE_SPEED2MBS_ENC(speed) \ > + ((speed) == PCIE_SPEED_16_0GT ? (16000*(128/130)) : \ > +(speed) == PCIE_SPEED_8_0GT ? (8000*(128/130)) : \ > +(speed) == PCIE_SPEED_5_0GT ? (5000*(8/10)) : \ > +(speed) == PCIE_SPEED_2_5GT ? (2500*(8/10)) : \ > +0) > + Should this be "(speed * x ) / y" instead? wouldn't they calculate 128/130 and truncate that to zero before multiplying by the speed? Or are compilers smart enough to do this the other way to avoid the losses? Thanks, Jake
[RFC PATCH] net: limit maximum number of packets to mark with xmit_more
Under some circumstances, such as with many stacked devices, it is possible that dev_hard_start_xmit will bundle many packets together, and mark them all with xmit_more. Most drivers respond to xmit_more by skipping tail bumps on packet rings, or similar behavior as long as xmit_more is set. This is a performance win since it means drivers can avoid notifying hardware of new packets repeat daily, and thus avoid wasting unnecessary PCIe or other bandwidth. This use of xmit_more comes with a trade off because bundling too many packets can increase latency of the Tx packets. To avoid this, we should limit the maximum number of packets with xmit_more. Driver authors could modify their drivers to check for some determined limit, but this requires all drivers to be modified in order to gain advantage. Instead, add a sysctl "xmit_more_max" which can be used to configure the maximum number of xmit_more skbs to send in a sequence. This ensures that all drivers benefit, and allows system administrators the option to tune the value to their environment. Signed-off-by: Jacob Keller <jacob.e.kel...@intel.com> --- Stray thoughts and further questions Is this the right approach? Did I miss any other places where we should limit? Does the limit make sense? Should it instead be a per-device tuning nob instead of a global? Is 32 a good default? Documentation/sysctl/net.txt | 6 ++ include/linux/netdevice.h| 2 ++ net/core/dev.c | 10 +- net/core/sysctl_net_core.c | 7 +++ 4 files changed, 24 insertions(+), 1 deletion(-) diff --git a/Documentation/sysctl/net.txt b/Documentation/sysctl/net.txt index b67044a2575f..3d995e8f4448 100644 --- a/Documentation/sysctl/net.txt +++ b/Documentation/sysctl/net.txt @@ -230,6 +230,12 @@ netdev_max_backlog Maximum number of packets, queued on the INPUT side, when the interface receives packets faster than kernel can process them. +xmit_more_max +- + +Maximum number of packets in a row to mark with skb->xmit_more. A value of zero +indicates no limit. + netdev_rss_key -- diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index c5475b37a631..6341452aed09 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -3321,6 +3321,8 @@ void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev); extern int netdev_budget; extern unsigned intnetdev_budget_usecs; +extern unsigned int sysctl_xmit_more_max; + /* Called by rtnetlink.c:rtnl_unlock() */ void netdev_run_todo(void); diff --git a/net/core/dev.c b/net/core/dev.c index 270b54754821..d9946d29c3a5 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -2983,12 +2983,19 @@ struct sk_buff *dev_hard_start_xmit(struct sk_buff *first, struct net_device *de { struct sk_buff *skb = first; int rc = NETDEV_TX_OK; + int xmit_count = 0; + bool more = true; while (skb) { struct sk_buff *next = skb->next; + if (sysctl_xmit_more_max) + more = xmit_count++ < sysctl_xmit_more_max; + if (!more) + xmit_count = 0; + skb->next = NULL; - rc = xmit_one(skb, dev, txq, next != NULL); + rc = xmit_one(skb, dev, txq, more && next != NULL); if (unlikely(!dev_xmit_complete(rc))) { skb->next = next; goto out; @@ -3523,6 +3530,7 @@ EXPORT_SYMBOL(netdev_max_backlog); int netdev_tstamp_prequeue __read_mostly = 1; int netdev_budget __read_mostly = 300; unsigned int __read_mostly netdev_budget_usecs = 2000; +unsigned int __read_mostly sysctl_xmit_more_max = 32; int weight_p __read_mostly = 64; /* old backlog weight */ int dev_weight_rx_bias __read_mostly = 1; /* bias for backlog weight */ int dev_weight_tx_bias __read_mostly = 1; /* bias for output_queue quota */ diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c index b7cd9aafe99e..6950e702e101 100644 --- a/net/core/sysctl_net_core.c +++ b/net/core/sysctl_net_core.c @@ -460,6 +460,13 @@ static struct ctl_table net_core_table[] = { .proc_handler = proc_dointvec_minmax, .extra1 = , }, + { + .procname = "xmit_more_max", + .data = _xmit_more_max, + .maxlen = sizeof(unsigned int), + .mode = 0644, + .prox_handler = proc_dointvec + }, { } }; -- 2.14.1.436.g33e61a4f0239
[PATCH v2] i40e/i40evf: fix out-of-bounds read of cpumask
When responding to an affinity hint we directly copied a cpumask value, intsead of using cpumask_copy. According to cpumask.h this is not correct because cpumask_t is only guaranteed to have enough space for the number of CPUs in the system, and may not be as big as we expect. Thus a direct copy results in an out-of-bound read and potentially a crash if the pages are aligned just right. This will be easily detected on a kernel with KASAN enabled: KASAN reports: [ 25.242312] BUG: KASAN: slab-out-of-bounds in i40e_irq_affinity_notify+0x30/0x50 [i40e] at addr 880462eea960 [ 25.242315] Read of size 1024 by task kworker/2:1/170 [ 25.242322] CPU: 2 PID: 170 Comm: kworker/2:1 Not tainted 4.11.0-22.el7a.x86_64 #1 [ 25.242325] Hardware name: HP ProLiant DL380 Gen9, BIOS P89 05/06/2015 [ 25.242336] Workqueue: events irq_affinity_notify [ 25.242340] Call Trace: [ 25.242350] dump_stack+0x63/0x8d [ 25.242358] kasan_object_err+0x21/0x70 [ 25.242364] kasan_report+0x288/0x540 [ 25.242397] ? i40e_irq_affinity_notify+0x30/0x50 [i40e] [ 25.242403] check_memory_region+0x13c/0x1a0 [ 25.242408] __asan_loadN+0xf/0x20 [ 25.242440] i40e_irq_affinity_notify+0x30/0x50 [i40e] [ 25.242446] irq_affinity_notify+0x1b4/0x230 [ 25.242452] ? irq_set_affinity_notifier+0x130/0x130 [ 25.242457] ? kasan_slab_free+0x89/0xc0 [ 25.242466] process_one_work+0x32f/0x6f0 [ 25.242472] worker_thread+0x89/0x770 [ 25.242481] ? pci_mmcfg_check_reserved+0xc0/0xc0 [ 25.242488] kthread+0x18c/0x1e0 [ 25.242493] ? process_one_work+0x6f0/0x6f0 [ 25.242499] ? kthread_create_on_node+0xc0/0xc0 [ 25.242506] ret_from_fork+0x2c/0x40 [ 25.242511] Object at 880462eea960, in cache kmalloc-8 size: 8 [ 25.242513] Allocated: [ 25.242514] PID = 170 [ 25.242522] save_stack_trace+0x1b/0x20 [ 25.242529] save_stack+0x46/0xd0 [ 25.242533] kasan_kmalloc+0xad/0xe0 [ 25.242537] __kmalloc_node+0x12c/0x2b0 [ 25.242542] alloc_cpumask_var_node+0x3c/0x60 [ 25.242546] alloc_cpumask_var+0xe/0x10 [ 25.242550] irq_affinity_notify+0x94/0x230 [ 25.242555] process_one_work+0x32f/0x6f0 [ 25.242559] worker_thread+0x89/0x770 [ 25.242564] kthread+0x18c/0x1e0 [ 25.242568] ret_from_fork+0x2c/0x40 [ 25.242569] Freed: [ 25.242570] PID = 0 [ 25.242572] (stack is not available) [ 25.242573] Memory state around the buggy address: [ 25.242578] 880462eea800: fc fc 00 fc fc 00 fc fc 00 fc fc 00 fc fc fb fc [ 25.242582] 880462eea880: fc fb fc fc fb fc fc 00 fc fc 00 fc fc 00 fc fc [ 25.242586] >880462eea900: 00 fc fc 00 fc fc 00 fc fc fb fc fc 00 fc fc fc [ 25.242588] ^ [ 25.242592] 880462eea980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 25.242596] 880462eeaa00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 25.242597] == Fixes: 96db776a3682 ("i40e/i40evf: fix interrupt affinity bug", 2016-09-14) Signed-off-by: Jacob Keller <jacob.e.kel...@intel.com> Cc: sta...@vger.kernel.org # 4.10+ --- This updates the commit message for the original fix, and indicates that it fixes a potential crash, as well as tagged the commit for stable and added a Fixes to indicate which commit this fixes. drivers/net/ethernet/intel/i40e/i40e_main.c | 2 +- drivers/net/ethernet/intel/i40evf/i40evf_main.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c index 397f1bcaed3e..50a7260b32c2 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_main.c +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c @@ -3450,7 +3450,7 @@ static void i40e_irq_affinity_notify(struct irq_affinity_notify *notify, struct i40e_q_vector *q_vector = container_of(notify, struct i40e_q_vector, affinity_notify); - q_vector->affinity_mask = *mask; + cpumask_copy(_vector->affinity_mask, mask); } /** diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c b/drivers/net/ethernet/intel/i40evf/i40evf_main.c index 1ffd55e06a49..87175a14740e 100644 --- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c +++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c @@ -520,7 +520,7 @@ static void i40evf_irq_affinity_notify(struct irq_affinity_notify *notify, struct i40e_q_vector *q_vector = container_of(notify, struct i40e_q_vector, affinity_notify); - q_vector->affinity_mask = *mask; + cpumask_copy(_vector->affinity_mask, mask); } /** -- 2.14.1.323.g792488f9a5e1
[RFC PATCH] net: don't set __LINK_STATE_START until after dev->open() call
Fix an issue with relying on netif_running() which could be true during when dev->open() handler is being called, even if it would exit with a failure. This ensures the state does not get set and removed with a narrow race for other callers to read it as open when infact it never finished opening. Signed-off-by: Jacob Keller <jacob.e.kel...@intel.com> --- I found this as a result of debugging a race condition in the i40evf driver, in which we assumed that netif_running() would not be true until after dev->open() had been called and succeeded. Unfortunately we can't hold the rtnl_lock() while checking netif_running() because it would cause a deadlock between our reset task and our ndo_open handler. I am wondering whether the proposed change is acceptable here, or whether some ndo_open handlers rely on __LINK_STATE_START being true prior to their being called? net/core/dev.c | 7 ++- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/net/core/dev.c b/net/core/dev.c index 1d75499add72..11953af90427 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -1362,8 +1362,6 @@ static int __dev_open(struct net_device *dev) if (ret) return ret; - set_bit(__LINK_STATE_START, >state); - if (ops->ndo_validate_addr) ret = ops->ndo_validate_addr(dev); @@ -1372,9 +1370,8 @@ static int __dev_open(struct net_device *dev) netpoll_poll_enable(dev); - if (ret) - clear_bit(__LINK_STATE_START, >state); - else { + if (!ret) + set_bit(__LINK_STATE_START, >state); dev->flags |= IFF_UP; dev_set_rx_mode(dev); dev_activate(dev); -- 2.14.0.rc1.251.g593d8d6362ce
[PATCH] i40e: don't truncate match_method assignment
The .match_method field is a u8, so we shouldn't be casting to a u16, and because it is only one byte, we do not need to byte swap anything. Just assign the value directly. This avoids issues on Big Endian architectures which would have byte swapped and then incorrectly truncated the value. Signed-off-by: Jacob Keller <jacob.e.kel...@intel.com> Cc: Stephen Rothwell <s...@canb.auug.org.au> Cc: Bimmy Pujari <bimmy.puj...@intel.com> --- Not sure if this was already in Jeff's queue, but since it's an obvious fix for the issue found by Stephen, I thought I'd send it out now just to make sure. Thanks for catching this, and sorry we didn't find the fix earlier. drivers/net/ethernet/intel/i40e/i40e_main.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c index 69a51a4119d6..6ccf18464339 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_main.c +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c @@ -2257,8 +2257,7 @@ int i40e_sync_vsi_filters(struct i40e_vsi *vsi) } add_list[num_add].queue_number = 0; /* set invalid match method for later detection */ - add_list[num_add].match_method = - cpu_to_le16((u16)I40E_AQC_MM_ERR_NO_RES); + add_list[num_add].match_method = I40E_AQC_MM_ERR_NO_RES; cmd_flags |= I40E_AQC_MACVLAN_ADD_PERFECT_MATCH; add_list[num_add].flags = cpu_to_le16(cmd_flags); num_add++; -- 2.11.0.rc2.152.g4d04e67
[PATCH RFC v1 1/2] rxclass: use ethtool_get_flow_spec_ring to display queue and VF
Recent kernels have made the ring_cookie store both the Queue index as well as a VF identifier. This allows for drivers to direct traffic to specific VF queues, without having the user have to understand the physical queue layout. Add support to display this notation so that users do not have to manually parse the value. Signed-off-by: Jacob Keller <jacob.e.kel...@intel.com> --- rxclass.c | 16 1 file changed, 12 insertions(+), 4 deletions(-) diff --git a/rxclass.c b/rxclass.c index c7bfebaf6e22..7f0e765d3b47 100644 --- a/rxclass.c +++ b/rxclass.c @@ -247,11 +247,19 @@ static void rxclass_print_nfc_rule(struct ethtool_rx_flow_spec *fsp) rxclass_print_nfc_spec_ext(fsp); - if (fsp->ring_cookie != RX_CLS_FLOW_DISC) - fprintf(stdout, "\tAction: Direct to queue %llu\n", - fsp->ring_cookie); - else + if (fsp->ring_cookie != RX_CLS_FLOW_DISC) { + u64 vf = ethtool_get_flow_spec_ring_vf(fsp->ring_cookie); + u64 queue = ethtool_get_flow_spec_ring(fsp->ring_cookie); + + if (vf) + fprintf(stdout, "\tAction: Direct to queue %llu\n", + queue); + else + fprintf(stdout, "\tAction: Direct to VF %llu queue %llu\n", + vf, queue); + } else { fprintf(stdout, "\tAction: Drop\n"); + } fprintf(stdout, "\n"); } -- 2.11.0.rc2.152.g4d04e67
[PATCH RFC v1 0/2] ethtool: add support for VF in ring_cookie
The ring_cookie value for an ntuple filter is used by the driver to determine what queue to direct the traffic towards. Since the main function has cnotrol over all queues it is possible to use this to direct traffic to a queue on a virtual function rather than a queue on the physical function. However, this was cumbersome to do if you did not know exactly how the device laid out virtual function queues. To alleviate this, the kernel side of the ethtool API gained a couple of functions used to partition the ring_cookie value into a queue index, and a virtual function index. It was assumed that no device was likely to have more than 2^32 queues, and currently no device has more than 2^8 virtual functions, so the split was chosen such that the lower 32 bits of the ring_cookie represent the queue, and the next 8 bits would represent a virtual function id. This is useful, but lacked support on the ethtool command line side. It is possible to simply use hex notation, and to "know" that the driver works in this manner. However, this makes use of and knowledge of the feature not widespread. This series attempts to fix that with two patches. First, we make use of the access functions when displaying the queue. Since no device currently supports queues beyond 2^32 it is unlikely that any driver which does not honor this partitioning of the ring_cookie would have its index incorrectly interpreted. This makes it so that the user will see the queue index and virtual function index split apart instead of being displayed as a combined base-10 number. Additionally, the second patch adds support for a new notation for the action. Instead of just using a number, we add support for the special syntax "X/Y" where X would be a number from 0-256 which is the virtual function index, while Y would be a number from 0 to 2^32 which is the queue index. This makes it much easier to write the encoded ring_cookie value. This series is sent as an RFC so that others may comment on the design of the new syntax, and offer alternative suggestions. Some of the ones I have considered: a) just leave the second part alone, and display the VF+queue split, but do not provide an easier syntax for writing the split value. This is easy to do, but leaves the feature as a sort of hidden gem. Additionally, the user would most easily do this by writing it as hex notation such as action 0x10005 However, this is cumbersome. It can be partially alleviated by using shell scripting on top of ethtool, but again is somewhat of a pain. b) actually adding a new key work so that you would do something like "queue x" and "vf y" or similar. However, this is much more difficult to correctly implement. The current implementation of handling the values assumes that each type argument is followed by a single argument which fits into a well defined section of the structure. Because the structure is not specified as le64 or be64, we can't use a union or similar to make it correct (since placement in the union ordering would depend on byte ordering of the field!) so we would have to re-architect the handling of values to allow multiple keywords to combine into a single value. This seemed incredibly cumbersome. The disadvantage of the new syntax is that it *is* new syntax that we have to parse ourselves, and that may not be obvious to users. However, we can document it, and the current syntax will work for all drivers, and the new syntax will generally only work on drivers which use the ring_cookie partitioning in this manner (as no current driver has queues beyond 2^32, as was decided when the partitioning scheme was added to the ethtool header file). I like the use of / but am open to alternative suggestions for a simple sequence. Note that we cannot use spaces, as this would move into the argv solution of (b) which I do not have time to implement, and would be a much more massive project. Jacob Keller (2): rxclass: use ethtool_get_flow_spec_ring to display queue and VF ethtool: add support to specify ring_cookie as VF/queue ethtool.8.in | 3 ++- rxclass.c| 74 +++- 2 files changed, 65 insertions(+), 12 deletions(-) -- 2.11.0.rc2.152.g4d04e67
[PATCH RFC v1 2/2] ethtool: add support to specify ring_cookie as VF/queue
Add support to allow users to specify the ring_cookie as a VF/queue value, ie: '0/25' means direct to the physical function queue 25, while '3/10' means to direct to VF id 3, queue 10. It is driver dependent to determine what the VF id is, excepting that 0 indicates the physical function. This allows the user to more easily specify the matching notation as it is understood by the kernel and some drivers. Since no driver exists today with over 2,147,483,647 queues, this notation will simply fail on drivers which don't recognize the new partitioning. Signed-off-by: Jacob Keller <jacob.e.kel...@intel.com> --- ethtool.8.in | 3 ++- rxclass.c| 58 +++--- 2 files changed, 53 insertions(+), 8 deletions(-) diff --git a/ethtool.8.in b/ethtool.8.in index 5c36c06385f6..cb165fa4c77a 100644 --- a/ethtool.8.in +++ b/ethtool.8.in @@ -826,7 +826,8 @@ Specifies the Rx queue to send packets to, or some other action. nokeep; lB l. -1 Drop the matched flow -0 or higherRx queue to route the flow +0 or higherRoute flow to specific Rx queue on the physical function +N/MRoute flow to Rx queue M of specific virtual function N (0 indicates the physical function) .TE .TP .BI loc \ N diff --git a/rxclass.c b/rxclass.c index 7f0e765d3b47..de25550529b4 100644 --- a/rxclass.c +++ b/rxclass.c @@ -614,6 +614,7 @@ typedef enum { OPT_IP4, OPT_IP6, OPT_MAC, + OPT_ACTION, } rule_opt_type_t; #define NFC_FLAG_RING 0x001 @@ -654,7 +655,7 @@ static const struct rule_opts rule_nfc_tcp_ip4[] = { { "dst-port", OPT_BE16, NFC_FLAG_DPORT, offsetof(struct ethtool_rx_flow_spec, h_u.tcp_ip4_spec.pdst), offsetof(struct ethtool_rx_flow_spec, m_u.tcp_ip4_spec.pdst) }, - { "action", OPT_U64, NFC_FLAG_RING, + { "action", OPT_ACTION, NFC_FLAG_RING, offsetof(struct ethtool_rx_flow_spec, ring_cookie), -1 }, { "loc", OPT_U32, NFC_FLAG_LOC, offsetof(struct ethtool_rx_flow_spec, location), -1 }, @@ -685,7 +686,7 @@ static const struct rule_opts rule_nfc_esp_ip4[] = { { "spi", OPT_BE32, NFC_FLAG_SPI, offsetof(struct ethtool_rx_flow_spec, h_u.esp_ip4_spec.spi), offsetof(struct ethtool_rx_flow_spec, m_u.esp_ip4_spec.spi) }, - { "action", OPT_U64, NFC_FLAG_RING, + { "action", OPT_ACTION, NFC_FLAG_RING, offsetof(struct ethtool_rx_flow_spec, ring_cookie), -1 }, { "loc", OPT_U32, NFC_FLAG_LOC, offsetof(struct ethtool_rx_flow_spec, location), -1 }, @@ -728,7 +729,7 @@ static const struct rule_opts rule_nfc_usr_ip4[] = { { "dst-port", OPT_BE16, NFC_FLAG_DPORT, offsetof(struct ethtool_rx_flow_spec, h_u.usr_ip4_spec.l4_4_bytes) + 2, offsetof(struct ethtool_rx_flow_spec, m_u.usr_ip4_spec.l4_4_bytes) + 2 }, - { "action", OPT_U64, NFC_FLAG_RING, + { "action", OPT_ACTION, NFC_FLAG_RING, offsetof(struct ethtool_rx_flow_spec, ring_cookie), -1 }, { "loc", OPT_U32, NFC_FLAG_LOC, offsetof(struct ethtool_rx_flow_spec, location), -1 }, @@ -762,7 +763,7 @@ static const struct rule_opts rule_nfc_tcp_ip6[] = { { "dst-port", OPT_BE16, NFC_FLAG_DPORT, offsetof(struct ethtool_rx_flow_spec, h_u.tcp_ip6_spec.pdst), offsetof(struct ethtool_rx_flow_spec, m_u.tcp_ip6_spec.pdst) }, - { "action", OPT_U64, NFC_FLAG_RING, + { "action", OPT_ACTION, NFC_FLAG_RING, offsetof(struct ethtool_rx_flow_spec, ring_cookie), -1 }, { "loc", OPT_U32, NFC_FLAG_LOC, offsetof(struct ethtool_rx_flow_spec, location), -1 }, @@ -793,7 +794,7 @@ static const struct rule_opts rule_nfc_esp_ip6[] = { { "spi", OPT_BE32, NFC_FLAG_SPI, offsetof(struct ethtool_rx_flow_spec, h_u.esp_ip6_spec.spi), offsetof(struct ethtool_rx_flow_spec, m_u.esp_ip6_spec.spi) }, - { "action", OPT_U64, NFC_FLAG_RING, + { "action", OPT_ACTION, NFC_FLAG_RING, offsetof(struct ethtool_rx_flow_spec, ring_cookie), -1 }, { "loc", OPT_U32, NFC_FLAG_LOC, offsetof(struct ethtool_rx_flow_spec, location), -1 }, @@ -836,7 +837,7 @@ static const struct rule_opts rule_nfc_usr_ip6[] = { { "dst-port", OPT_BE16, NFC_FLAG_DPORT, offsetof(struct ethtool_rx_flow_spec, h_u.usr_ip6_spec.l4_4_bytes) + 2, offsetof(struct ethtool_rx_flow_spec, m_u.usr_ip6_spec.l4_4_bytes) + 2 }, - { "action", OPT_U64, NFC_FLAG_RING, + { "action", OPT_ACTION, NFC_FLAG_RING, offsetof(struct ethtool_rx_flow_spec, ring_cookie), -1 }, { "loc", OPT_U32, NFC_FLAG_LOC, offsetof(struct ethtool_rx_flow_spec, locati
[PATCH] ethtool: mark mask values as ULL values
If compiling with signed checks enabled, there will be warnings generated by the ETHTOOL_RX_FLOW_SPEC_RING(_VF) masks. These are currently marked as signed constants, which will generate warnings when masking with unsigned values. Avoid this by marking them explicitly as unsigned values. Signed-off-by: Jacob Keller <jacob.e.kel...@intel.com> --- include/uapi/linux/ethtool.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h index ed650eef9e54..a716112b2b01 100644 --- a/include/uapi/linux/ethtool.h +++ b/include/uapi/linux/ethtool.h @@ -891,8 +891,8 @@ struct ethtool_rx_flow_spec { * space for this at this time. If a future patch consumes the next * byte it should be aware of this possiblity. */ -#define ETHTOOL_RX_FLOW_SPEC_RING 0xLL -#define ETHTOOL_RX_FLOW_SPEC_RING_VF 0x00FFLL +#define ETHTOOL_RX_FLOW_SPEC_RING 0xULL +#define ETHTOOL_RX_FLOW_SPEC_RING_VF 0x00FFULL #define ETHTOOL_RX_FLOW_SPEC_RING_VF_OFF 32 static inline __u64 ethtool_get_flow_spec_ring(__u64 ring_cookie) { -- 2.11.0.rc2.152.g4d04e67
[PATCH RFC v3] ethtool: implement helper to get flow_type value
Often a driver wants to store the flow type and thus it must mask the extra fields. This is a task that could grow more complex as more flags are added in the future. Add a helper function that masks the flags for marking additional fields. Modify drivers in drivers/net/ethernet that currently check for FLOW_EXT and FLOW_MAC_EXT to use the helper. Currently this is only the mellanox drivers. I chose not to modify other drivers as I'm actually unsure whether we should always mask the flow type even for drivers which don't recognize the newer flags. On the one hand, today's drivers (generally) automatically fail when a new flag is used because they won't mask it and their checks against flow_type will not match. On the other hand, it means another place that you have to update when you begin implementing a flag. An alternative is to have the driver store a set of flags that it knows about, and then have ethtool core do the check for us to discard frames. I haven't implemented this quite yet. Signed-off-by: Jacob Keller <jacob.e.kel...@intel.com> --- drivers/net/ethernet/mellanox/mlx4/en_ethtool.c | 4 ++-- drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c | 6 +++--- include/uapi/linux/ethtool.h| 5 + 3 files changed, 10 insertions(+), 5 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c index 487a58f9c192..d8f9839ce2a3 100644 --- a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c +++ b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c @@ -1270,7 +1270,7 @@ static int mlx4_en_validate_flow(struct net_device *dev, return -EINVAL; } - switch (cmd->fs.flow_type & ~(FLOW_EXT | FLOW_MAC_EXT)) { + switch (ethtool_get_flow_spec_type(cmd->fs.flow_type)) { case TCP_V4_FLOW: case UDP_V4_FLOW: if (cmd->fs.m_u.tcp_ip4_spec.tos) @@ -1493,7 +1493,7 @@ static int mlx4_en_ethtool_to_net_trans_rule(struct net_device *dev, if (err) return err; - switch (cmd->fs.flow_type & ~(FLOW_EXT | FLOW_MAC_EXT)) { + switch (ethtool_get_flow_spec_type(cmd->fs.flow_type)) { case ETHER_FLOW: spec_l2 = kzalloc(sizeof(*spec_l2), GFP_KERNEL); if (!spec_l2) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c index 3691451c728c..066e6c5cf38b 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c @@ -63,7 +63,7 @@ static struct mlx5e_ethtool_table *get_flow_table(struct mlx5e_priv *priv, int table_size; int prio; - switch (fs->flow_type & ~(FLOW_EXT | FLOW_MAC_EXT)) { + switch (ethtool_get_flow_spec_type(fs->flow_type)) { case TCP_V4_FLOW: case UDP_V4_FLOW: max_tuples = ETHTOOL_NUM_L3_L4_FTS; @@ -147,7 +147,7 @@ static int set_flow_attrs(u32 *match_c, u32 *match_v, outer_headers); void *outer_headers_v = MLX5_ADDR_OF(fte_match_param, match_v, outer_headers); - u32 flow_type = fs->flow_type & ~(FLOW_EXT | FLOW_MAC_EXT); + u32 flow_type = ethtool_get_flow_spec_type(fs->flow_type); struct ethtool_tcpip4_spec *l4_mask; struct ethtool_tcpip4_spec *l4_val; struct ethtool_usrip4_spec *l3_mask; @@ -393,7 +393,7 @@ static int validate_flow(struct mlx5e_priv *priv, fs->ring_cookie != RX_CLS_FLOW_DISC) return -EINVAL; - switch (fs->flow_type & ~(FLOW_EXT | FLOW_MAC_EXT)) { + switch (ethtool_get_flow_spec_type(fs->flow_type)) { case ETHER_FLOW: eth_mask = >m_u.ether_spec; if (!is_zero_ether_addr(eth_mask->h_dest)) diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h index f0db7788f887..ed650eef9e54 100644 --- a/include/uapi/linux/ethtool.h +++ b/include/uapi/linux/ethtool.h @@ -1583,6 +1583,11 @@ static inline int ethtool_validate_duplex(__u8 duplex) #defineFLOW_EXT0x8000 #defineFLOW_MAC_EXT0x4000 +static inline __u32 ethtool_get_flow_spec_type(__u32 flow_type) +{ + return flow_type & ~(FLOW_EXT | FLOW_MAC_EXT); +} + /* L3-L4 network traffic flow hash options */ #defineRXH_L2DA(1 << 1) #defineRXH_VLAN(1 << 2) -- 2.11.0.rc2.152.g4d04e67
[PATCH RFC v2] ethtool: implement helper to get flow_type value
Often a driver wants to store the flow type and thus it must mask the extra fields. This is a task that could grow more complex as more flags are added in the future. Add a helper function that masks the flags for marking additional fields. Modify drivers in drivers/net/ethernet that currently check for FLOW_EXT and FLOW_MAC_EXT to use the helper. Currently this is only the mellanox drivers. I chose not to modify other drivers as I'm actually unsure whether we should always mask the flow type even for drivers which don't recognize the newer flags. On the one hand, today's drivers (generally) automatically fail when a new flag is used because they won't mask it and their checks against flow_type will not match. On the other hand, it means another place that you have to update when you begin implementing a flag. An alternative is to have the driver store a set of flags that it knows about, and then have ethtool core do the check for us to discard frames. I haven't implemented this quite yet. Signed-off-by: Jacob Keller <jacob.e.kel...@intel.com> --- drivers/net/ethernet/mellanox/mlx4/en_ethtool.c | 4 ++-- drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c | 6 +++--- include/uapi/linux/ethtool.h| 5 + 3 files changed, 10 insertions(+), 5 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c index 487a58f9c192..d8f9839ce2a3 100644 --- a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c +++ b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c @@ -1270,7 +1270,7 @@ static int mlx4_en_validate_flow(struct net_device *dev, return -EINVAL; } - switch (cmd->fs.flow_type & ~(FLOW_EXT | FLOW_MAC_EXT)) { + switch (ethtool_get_flow_spec_type(cmd->fs.flow_type)) { case TCP_V4_FLOW: case UDP_V4_FLOW: if (cmd->fs.m_u.tcp_ip4_spec.tos) @@ -1493,7 +1493,7 @@ static int mlx4_en_ethtool_to_net_trans_rule(struct net_device *dev, if (err) return err; - switch (cmd->fs.flow_type & ~(FLOW_EXT | FLOW_MAC_EXT)) { + switch (ethtool_get_flow_spec_type(cmd->fs.flow_type)) { case ETHER_FLOW: spec_l2 = kzalloc(sizeof(*spec_l2), GFP_KERNEL); if (!spec_l2) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c index 3691451c728c..066e6c5cf38b 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c @@ -63,7 +63,7 @@ static struct mlx5e_ethtool_table *get_flow_table(struct mlx5e_priv *priv, int table_size; int prio; - switch (fs->flow_type & ~(FLOW_EXT | FLOW_MAC_EXT)) { + switch (ethtool_get_flow_spec_type(fs->flow_type)) { case TCP_V4_FLOW: case UDP_V4_FLOW: max_tuples = ETHTOOL_NUM_L3_L4_FTS; @@ -147,7 +147,7 @@ static int set_flow_attrs(u32 *match_c, u32 *match_v, outer_headers); void *outer_headers_v = MLX5_ADDR_OF(fte_match_param, match_v, outer_headers); - u32 flow_type = fs->flow_type & ~(FLOW_EXT | FLOW_MAC_EXT); + u32 flow_type = ethtool_get_flow_spec_type(fs->flow_type); struct ethtool_tcpip4_spec *l4_mask; struct ethtool_tcpip4_spec *l4_val; struct ethtool_usrip4_spec *l3_mask; @@ -393,7 +393,7 @@ static int validate_flow(struct mlx5e_priv *priv, fs->ring_cookie != RX_CLS_FLOW_DISC) return -EINVAL; - switch (fs->flow_type & ~(FLOW_EXT | FLOW_MAC_EXT)) { + switch (ethtool_get_flow_spec_type(fs->flow_type)) { case ETHER_FLOW: eth_mask = >m_u.ether_spec; if (!is_zero_ether_addr(eth_mask->h_dest)) diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h index f0db7788f887..e92ad725c9d0 100644 --- a/include/uapi/linux/ethtool.h +++ b/include/uapi/linux/ethtool.h @@ -1583,6 +1583,11 @@ static inline int ethtool_validate_duplex(__u8 duplex) #defineFLOW_EXT0x8000 #defineFLOW_MAC_EXT0x4000 +static inline __u32 ethtool_get_flow_spec_type(__u32 flow_type) +{ + return flow_type & (FLOW_EXT | FLOW_MAC_EXT); +} + /* L3-L4 network traffic flow hash options */ #defineRXH_L2DA(1 << 1) #defineRXH_VLAN(1 << 2) -- 2.11.0.rc2.152.g4d04e67
[PATCH RFC v1] ethtool: implement helper to get flow_type value
Often a driver wants to store the flow type and thus it must mask the extra fields. This is a task that could grow more complex as more flags are added in the future. Add a helper function that masks the flags for marking additional fields. Modify drivers in drivers/net/ethernet that currently check for FLOW_EXT and FLOW_MAC_EXT to use the helper. Currently this is only the mellanox drivers. I chose not to modify other drivers as I'm actually unsure whether we should always mask the flow type even for drivers which don't recognize the newer flags. On the one hand, today's drivers (generally) automatically fail when a new flag is used because they won't mask it and their checks against flow_type will not match. On the other hand, it means another place that you have to update when you begin implementing a flag. An alternative is to have the driver store a set of flags that it knows about, and then have ethtool core do the check for us to discard frames. I haven't implemented this quite yet. Signed-off-by: Jacob Keller <jacob.e.kel...@intel.com> --- I plan on using this helper when fixing the mask code for ntuple filters in the Intel i40e driver. I wanted to see whether this approach was acceptable, and whether we should implement additional checks. The primary reason is that today's drivers are "fail closed" in that a new flag type will probably fail on drivers due to checking for flow types they recognize. Since drivers only remove the masked bits they recognize this works. However, this gets cumbersome if new additional flags get added in the future. I would like some sort of helper, but if we encourage its use, and a new flag gets added, the helper will then unforunately make the driver "fail open" in that a new flag will get ignored as the driver won't know to return -EINVAL. I think the right solution will be to add some sort of checks in core ethtool which we can basically set the recognized flags in some way for all drivers such that the ethtool core can drop requests for flows with unknown flag types. I'm unsure how to implement this though. Thoughts? drivers/net/ethernet/mellanox/mlx4/en_ethtool.c | 4 ++-- drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c | 6 +++--- include/uapi/linux/ethtool.h| 11 --- 3 files changed, 13 insertions(+), 8 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c index 487a58f9c192..d8f9839ce2a3 100644 --- a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c +++ b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c @@ -1270,7 +1270,7 @@ static int mlx4_en_validate_flow(struct net_device *dev, return -EINVAL; } - switch (cmd->fs.flow_type & ~(FLOW_EXT | FLOW_MAC_EXT)) { + switch (ethtool_get_flow_spec_type(cmd->fs.flow_type)) { case TCP_V4_FLOW: case UDP_V4_FLOW: if (cmd->fs.m_u.tcp_ip4_spec.tos) @@ -1493,7 +1493,7 @@ static int mlx4_en_ethtool_to_net_trans_rule(struct net_device *dev, if (err) return err; - switch (cmd->fs.flow_type & ~(FLOW_EXT | FLOW_MAC_EXT)) { + switch (ethtool_get_flow_spec_type(cmd->fs.flow_type)) { case ETHER_FLOW: spec_l2 = kzalloc(sizeof(*spec_l2), GFP_KERNEL); if (!spec_l2) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c index 3691451c728c..066e6c5cf38b 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c @@ -63,7 +63,7 @@ static struct mlx5e_ethtool_table *get_flow_table(struct mlx5e_priv *priv, int table_size; int prio; - switch (fs->flow_type & ~(FLOW_EXT | FLOW_MAC_EXT)) { + switch (ethtool_get_flow_spec_type(fs->flow_type)) { case TCP_V4_FLOW: case UDP_V4_FLOW: max_tuples = ETHTOOL_NUM_L3_L4_FTS; @@ -147,7 +147,7 @@ static int set_flow_attrs(u32 *match_c, u32 *match_v, outer_headers); void *outer_headers_v = MLX5_ADDR_OF(fte_match_param, match_v, outer_headers); - u32 flow_type = fs->flow_type & ~(FLOW_EXT | FLOW_MAC_EXT); + u32 flow_type = ethtool_get_flow_spec_type(fs->flow_type); struct ethtool_tcpip4_spec *l4_mask; struct ethtool_tcpip4_spec *l4_val; struct ethtool_usrip4_spec *l3_mask; @@ -393,7 +393,7 @@ static int validate_flow(struct mlx5e_priv *priv, fs->ring_cookie != RX_CLS_FLOW_DISC) return -EINVAL; - switch (fs->flow_type & ~(FLOW_EXT | FLOW_MAC_EXT)) { + switch (ethtool_get_flow_spec_type(fs->flow_type)) { case ETHER_FLOW: eth_mask = >m_u.ether_
[PATCH v2] ethtool: check size of user memory before copying strings and stats
Since at least 2005, (oldest commit in ethtool.git), the userspace ethtool implementation has given the size of the memory it has allocated as the actual size in the ethtool data structures. We previously blindly ignore this and overwrite the requested size with the current size returned by .get_strings or .get_sset_count. This can cause problems if these values aren't static. Since ethtool has always given the expected size, first perform a check to ensure that the current size is no larger than the requested size. If it is, return with -ENOMEM, as we do not have enough memory to save the results. This protects against buffer overrun should the driver have a non-static number of statistics, tests, or private flags, and the value changes between the ethtool userspace call to .get_sset_count and the actual calls to populate the requested memory. Update the header files to indicate the expected behavior of userspace. This change shouldn't break current or previous userspace as they have consistently included the length correctly since as far back as we can check. Signed-off-by: Jacob Keller <jacob.e.kel...@intel.com> Reported-by: Alexander Duyck <alexander.du...@gmail.com> --- - v2 * use -ENOSPC instead of -ENOMEM at the suggestion of Mark Rustad include/uapi/linux/ethtool.h | 19 +-- net/core/ethtool.c | 12 2 files changed, 25 insertions(+), 6 deletions(-) diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h index 37fd6dc33de4..0d7f4855dc0b 100644 --- a/include/uapi/linux/ethtool.h +++ b/include/uapi/linux/ethtool.h @@ -560,13 +560,16 @@ enum ethtool_stringset { * struct ethtool_gstrings - string set for data tagging * @cmd: Command number = %ETHTOOL_GSTRINGS * @string_set: String set ID; one of ethtool_stringset - * @len: On return, the number of strings in the string set + * @len: On entry, the number of strings which fit into the allocated space. + * On return, the number of strings in the string set * @data: Buffer for strings. Each string is null-padded to a size of * %ETH_GSTRING_LEN. * * Users must use %ETHTOOL_GSSET_INFO to find the number of strings in * the string set. They must allocate a buffer of the appropriate - * size immediately following this structure. + * size immediately following this structure. They must set the len to the + * number of strings which will fit into the allocated space following this + * structure. */ struct ethtool_gstrings { __u32 cmd; @@ -622,13 +625,15 @@ enum ethtool_test_flags { * @flags: A bitmask of flags from ethtool_test_flags. Some * flags may be set by the user on entry; others may be set by * the driver on return. - * @len: On return, the number of test results + * @len: On entry, the number of test results that fit in the allocated space + * On return, the number of test results * @data: Array of test results * * Users must use %ETHTOOL_GSSET_INFO or %ETHTOOL_GDRVINFO to find the * number of test results that will be returned. They must allocate a * buffer of the appropriate size (8 * number of results) immediately - * following this structure. + * following this structure. They must set the len to the number of tests + * results which will fit into the allocated space following the structure. */ struct ethtool_test { __u32 cmd; @@ -641,13 +646,15 @@ struct ethtool_test { /** * struct ethtool_stats - device-specific statistics * @cmd: Command number = %ETHTOOL_GSTATS - * @n_stats: On return, the number of statistics + * @n_stats: On entry, the number of stats that fit into the allocated space. + * On return, the number of statistics * @data: Array of statistics * * Users must use %ETHTOOL_GSSET_INFO or %ETHTOOL_GDRVINFO to find the * number of statistics that will be returned. They must allocate a * buffer of the appropriate size (8 * number of statistics) - * immediately following this structure. + * immediately following this structure. They must set n_stats to the number + * of stats which fit into the allocated space at the end of the structure. */ struct ethtool_stats { __u32 cmd; diff --git a/net/core/ethtool.c b/net/core/ethtool.c index 2966cd0d7c93..8ce7d1b5756d 100644 --- a/net/core/ethtool.c +++ b/net/core/ethtool.c @@ -1766,6 +1766,9 @@ static int ethtool_self_test(struct net_device *dev, char __user *useraddr) if (copy_from_user(, useraddr, sizeof(test))) return -EFAULT; + if (test_len > test.len) + return -ENOSPC; + test.len = test_len; data = kmalloc(test_len * sizeof(u64), GFP_USER); if (!data) @@ -1799,6 +1802,9 @@ static int ethtool_get_strings(struct net_device *dev, void __user *useraddr) if (ret < 0) return ret; + if (ret > gstrings.len) + return -ENOSPC; + gstrings.len = ret; data =
[PATCH] ethtool: check size of user memory before copying strings and stats
Since at least 2005, (oldest commit in ethtool.git), the userspace ethtool implementation has given the size of the memory it has allocated as the actual size in the ethtool data structures. We previously blindly ignore this and overwrite the requested size with the current size returned by .get_strings or .get_sset_count. This can cause problems if these values aren't static. Since ethtool has always given the expected size, first perform a check to ensure that the current size is no larger than the requested size. If it is, return with -ENOMEM, as we do not have enough memory to save the results. This protects against buffer overrun should the driver have a non-static number of statistics, tests, or private flags, and the value changes between the ethtool userspace call to .get_sset_count and the actual calls to populate the requested memory. Update the header files to indicate the expected behavior of userspace. This change shouldn't break current or previous userspace as they have consistently included the length correctly since as far back as we can check. Signed-off-by: Jacob Keller <jacob.e.kel...@intel.com> Reported-by: Alexander Duyck <alexander.du...@gmail.com> --- include/uapi/linux/ethtool.h | 19 +-- net/core/ethtool.c | 12 2 files changed, 25 insertions(+), 6 deletions(-) diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h index 37fd6dc33de4..0d7f4855dc0b 100644 --- a/include/uapi/linux/ethtool.h +++ b/include/uapi/linux/ethtool.h @@ -560,13 +560,16 @@ enum ethtool_stringset { * struct ethtool_gstrings - string set for data tagging * @cmd: Command number = %ETHTOOL_GSTRINGS * @string_set: String set ID; one of ethtool_stringset - * @len: On return, the number of strings in the string set + * @len: On entry, the number of strings which fit into the allocated space. + * On return, the number of strings in the string set * @data: Buffer for strings. Each string is null-padded to a size of * %ETH_GSTRING_LEN. * * Users must use %ETHTOOL_GSSET_INFO to find the number of strings in * the string set. They must allocate a buffer of the appropriate - * size immediately following this structure. + * size immediately following this structure. They must set the len to the + * number of strings which will fit into the allocated space following this + * structure. */ struct ethtool_gstrings { __u32 cmd; @@ -622,13 +625,15 @@ enum ethtool_test_flags { * @flags: A bitmask of flags from ethtool_test_flags. Some * flags may be set by the user on entry; others may be set by * the driver on return. - * @len: On return, the number of test results + * @len: On entry, the number of test results that fit in the allocated space + * On return, the number of test results * @data: Array of test results * * Users must use %ETHTOOL_GSSET_INFO or %ETHTOOL_GDRVINFO to find the * number of test results that will be returned. They must allocate a * buffer of the appropriate size (8 * number of results) immediately - * following this structure. + * following this structure. They must set the len to the number of tests + * results which will fit into the allocated space following the structure. */ struct ethtool_test { __u32 cmd; @@ -641,13 +646,15 @@ struct ethtool_test { /** * struct ethtool_stats - device-specific statistics * @cmd: Command number = %ETHTOOL_GSTATS - * @n_stats: On return, the number of statistics + * @n_stats: On entry, the number of stats that fit into the allocated space. + * On return, the number of statistics * @data: Array of statistics * * Users must use %ETHTOOL_GSSET_INFO or %ETHTOOL_GDRVINFO to find the * number of statistics that will be returned. They must allocate a * buffer of the appropriate size (8 * number of statistics) - * immediately following this structure. + * immediately following this structure. They must set n_stats to the number + * of stats which fit into the allocated space at the end of the structure. */ struct ethtool_stats { __u32 cmd; diff --git a/net/core/ethtool.c b/net/core/ethtool.c index 2966cd0d7c93..20abcc43787e 100644 --- a/net/core/ethtool.c +++ b/net/core/ethtool.c @@ -1766,6 +1766,9 @@ static int ethtool_self_test(struct net_device *dev, char __user *useraddr) if (copy_from_user(, useraddr, sizeof(test))) return -EFAULT; + if (test_len > test.len) + return -ENOMEM; + test.len = test_len; data = kmalloc(test_len * sizeof(u64), GFP_USER); if (!data) @@ -1799,6 +1802,9 @@ static int ethtool_get_strings(struct net_device *dev, void __user *useraddr) if (ret < 0) return ret; + if (ret > gstrings.len) + return -ENOMEM; + gstrings.len = ret; data = kcalloc(gstrings.len, ETH_GSTRING_LEN, GFP_USER); @@ -1898,6 +1904,9 @@ static
[PATCH 2/4] ethtool: ensure channel counts are within bounds during SCHANNELS
Add a sanity check to ensure that all requested channel sizes are within bounds, which should reduce errors in driver implementation. Signed-off-by: Jacob Keller <jacob.e.kel...@intel.com> --- net/core/ethtool.c | 15 +-- 1 file changed, 13 insertions(+), 2 deletions(-) diff --git a/net/core/ethtool.c b/net/core/ethtool.c index 8a40948c1fc6..71aff99157c3 100644 --- a/net/core/ethtool.c +++ b/net/core/ethtool.c @@ -1276,16 +1276,27 @@ static noinline_for_stack int ethtool_get_channels(struct net_device *dev, static noinline_for_stack int ethtool_set_channels(struct net_device *dev, void __user *useraddr) { - struct ethtool_channels channels; + struct ethtool_channels channels, max; u32 max_rx = 0; int ret; - if (!dev->ethtool_ops->set_channels) + if (!dev->ethtool_ops->set_channels || !dev->ethtool_ops->get_channels) return -EOPNOTSUPP; if (copy_from_user(, useraddr, sizeof(channels))) return -EFAULT; + ret = dev->ethtool_ops->get_channels(dev, ); + if (ret) + return ret; + + /* ensure new counts are within the maximums */ + if ((channels.rx_count > max.max_rx) || + (channels.tx_count > max.max_tx) || + (channels.combined_count > max.max_combined) || + (channels.other_count > max.max_other)) + return -EINVAL; + /* ensure the new Rx count fits within the configured Rx flow * indirection table settings */ if (netif_is_rxfh_configured(dev) && -- 2.7.0.236.gda096a0.dirty
[PATCH 0/3] ethtool: correct {GS}CHANNELS and {GS}RXFH conflict
This patch series fixes up ethtool_set_channels operation which allowed modifying the RXFH table indirectly by reducing the number of queues below the current max queue used by the Rx flow table. Most drivers incorrectly allowed this to destroy the Rx flow table and would then start by reinitializing it to default settings. However, drivers are not able to correctly handle the conflict since there was no way to differentiate between the default settings and the user requested explicit settings. To fix this, implement a new netdev private flag which we use to indicate whether the RXFH has been user configured. If someone has a better alternative of how to store this information, let me know. I do not think that priv_flags is the best solution but I have not had any better idea. Secondly, we add a function which just calls the driver's get_rxfh callback to determine the current indirection table. Loop through this and we can determine the current highest queue that will be used by RSS. Now, modify ethtool_set_channels to add a check ensuring that if (a) we have had rxfh configured by user, (b) we can get the maximum RSS queue currently used, then we ensure that the newly requested Rx count (or combined count) is at least as high as this maximum RSS queue. The reasoning here is that we can always safely increase the number of queues. If we decrease the queues we must ensure that the decrease does not go lower than the highest in-use queue for the Rx flow table. Drivers may still need to be patched if they currently overwrite the Rx flow table during channel configuration. If the driver currently always resets Rx flow table when increasing number of queues it must be patched to only do this when netif_is_rxfh_configured returns false. The second two patches are suggestions which I think are valid and noticed while implementing the above changes. The second patch simply adds a check to ensure that all provided channel counts fit within driver defined maximums. The third patch I am unsure on so would like comment. I believe that combined queue counts and separate tx/rx queue counts should not be configurable at the same time (the only driver which allowed both does requires they can't be defined at the same time). Other drivers support either separate or combined but not both. If this isn't the case then go ahead and ignore the third patch. The fourth patch fixes fm10k to correctly recofigure the RSS reta table whenever it is still unconfigured. This means that the default state will provide RSS to every queue. Once the user has configured RXFH, then we should maintain it. In addition, since the case where we must reconfigure the RSS table in this case should now no longer occur, add a dev_err message to indicate the user that we did so. Jacob Keller (4): ethtool: correctly ensure {GS}CHANNELS doesn't conflict with GS{RXFH} ethtool: ensure channel counts are within bounds during SCHANNELS ethtool: can't set combined and tx/rx channel counts at the same time fm10k: don't reinitialize RSS flow table when RXFH configured drivers/net/ethernet/intel/fm10k/fm10k_main.c | 10 +++- include/linux/netdevice.h | 8 +++ net/core/ethtool.c| 79 ++- 3 files changed, 93 insertions(+), 4 deletions(-) -- 2.7.0.236.gda096a0.dirty
[PATCH 4/4] fm10k: don't reinitialize RSS flow table when RXFH configured
Also print an error message incase we do have to reconfigure as this should no longer happen anymore due to ethtool changes. If it somehow does occur, user should be made aware of it. Signed-off-by: Jacob Keller <jacob.e.kel...@intel.com> --- drivers/net/ethernet/intel/fm10k/fm10k_main.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_main.c b/drivers/net/ethernet/intel/fm10k/fm10k_main.c index 134ce4daa994..4e58f9155883 100644 --- a/drivers/net/ethernet/intel/fm10k/fm10k_main.c +++ b/drivers/net/ethernet/intel/fm10k/fm10k_main.c @@ -1937,8 +1937,10 @@ static void fm10k_init_reta(struct fm10k_intfc *interface) u16 i, rss_i = interface->ring_feature[RING_F_RSS].indices; u32 reta, base; - /* If the netdev is initialized we have to maintain table if possible */ - if (interface->netdev->reg_state != NETREG_UNINITIALIZED) { + /* If the Rx flow indirection table has been configured manually, we +* need to maintain it when possible. +*/ + if (netif_is_rxfh_configured(interface->netdev)) { for (i = FM10K_RETA_SIZE; i--;) { reta = interface->reta[i]; if reta << 24) >> 24) < rss_i) && @@ -1946,6 +1948,10 @@ static void fm10k_init_reta(struct fm10k_intfc *interface) (((reta << 8) >> 24) < rss_i) && (((reta) >> 24) < rss_i)) continue; + + /* this should never happen */ + dev_err(>pdev->dev, + "RSS indirection table assigned flows out of queue bounds. Reconfiguring.\n"); goto repopulate_reta; } -- 2.7.0.236.gda096a0.dirty
[PATCH 3/4] ethtool: can't set combined and tx/rx channel counts at the same time
Based on reading current implementations of {GS}CHANNELS, most drivers either support only combined counts or only separate counts. At least one driver supported setting combined or separate channels. No driver supported combined and separate channels setting at the same time, and this patch adds a sanity check to prevent such requests. I am not 100% sure if this is correct as I was not able to find documentation and I think it currently depended on driver implementation. Signed-off-by: Jacob Keller <jacob.e.kel...@intel.com> --- net/core/ethtool.c | 5 + 1 file changed, 5 insertions(+) diff --git a/net/core/ethtool.c b/net/core/ethtool.c index 71aff99157c3..7d4cef7b7176 100644 --- a/net/core/ethtool.c +++ b/net/core/ethtool.c @@ -1297,6 +1297,11 @@ static noinline_for_stack int ethtool_set_channels(struct net_device *dev, (channels.other_count > max.max_other)) return -EINVAL; + /* can't set combined and separate channels at the same time */ + if ((channels.combined_count && +(channels.rx_count || channels.tx_count)) + return -EINVAL; + /* ensure the new Rx count fits within the configured Rx flow * indirection table settings */ if (netif_is_rxfh_configured(dev) && -- 2.7.0.236.gda096a0.dirty
[PATCH 1/4] ethtool: correctly ensure {GS}CHANNELS doesn't conflict with GS{RXFH}
Ethernet drivers implementing both {GS}RXFH and {GS}CHANNELS ethtool ops incorrectly allow SCHANNELS when it would conflict with the settings from SRXFH. This occurs because it is not possible for drivers to understand whether their Rx flow indirection table has been configured or is in the default state. In addition, drivers currently behave in various ways when increasing the number of Rx channels. Some drivers will always destroy the Rx flow indirection table when this occurs, whether it has been set by the user or not. Other drivers will attempt to preserve the table even if the user has never modified it from the default driver settings. Neither of these situation is desirable because it leads to unexpected behavior or loss of user configuration. The correct behavior is to simply return -EINVAL when SCHANNELS would conflict with the current Rx flow table settings. However, it should only do so if the current settings were modified by the user. If we required that the new settings never conflict with the current (default) Rx flow settings, we would force users to first reduce their Rx flow settings and then reduce the number of Rx channels. This patch proposes a solution implemented in net/core/ethtool.c which ensures that all drivers behave correctly. It checks whether the RXFH table has been configured to non-default settings, and stores this information in a private netdev flag. When the number of channels is requested to change, it first ensures that the current Rx flow table is not going to assign flows to now disabled channels. Signed-off-by: Jacob Keller <jacob.e.kel...@intel.com> --- include/linux/netdevice.h | 8 +++ net/core/ethtool.c| 59 +++ 2 files changed, 67 insertions(+) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 289c2314d766..044fa2d88c7f 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1291,6 +1291,7 @@ struct net_device_ops { * @IFF_OPENVSWITCH: device is a Open vSwitch master * @IFF_L3MDEV_SLAVE: device is enslaved to an L3 master device * @IFF_TEAM: device is a team device + * @IFF_RXFH_CONFIGURED: device has had Rx Flow indirection table configured */ enum netdev_priv_flags { IFF_802_1Q_VLAN = 1<<0, @@ -1318,6 +1319,7 @@ enum netdev_priv_flags { IFF_OPENVSWITCH = 1<<22, IFF_L3MDEV_SLAVE= 1<<23, IFF_TEAM= 1<<24, + IFF_RXFH_CONFIGURED = 1<<25, }; #define IFF_802_1Q_VLANIFF_802_1Q_VLAN @@ -1345,6 +1347,7 @@ enum netdev_priv_flags { #define IFF_OPENVSWITCHIFF_OPENVSWITCH #define IFF_L3MDEV_SLAVE IFF_L3MDEV_SLAVE #define IFF_TEAM IFF_TEAM +#define IFF_RXFH_CONFIGUREDIFF_RXFH_CONFIGURED /** * struct net_device - The DEVICE structure. @@ -4045,6 +4048,11 @@ static inline bool netif_is_lag_port(const struct net_device *dev) return netif_is_bond_slave(dev) || netif_is_team_port(dev); } +static inline bool netif_is_rxfh_configured(const struct net_device *dev) +{ + return dev->priv_flags & IFF_RXFH_CONFIGURED; +} + /* This device needs to keep skb dst for qdisc enqueue or ndo_start_xmit() */ static inline void netif_keep_dst(struct net_device *dev) { diff --git a/net/core/ethtool.c b/net/core/ethtool.c index daf04709dd3c..8a40948c1fc6 100644 --- a/net/core/ethtool.c +++ b/net/core/ethtool.c @@ -642,6 +642,39 @@ void netdev_rss_key_fill(void *buffer, size_t len) } EXPORT_SYMBOL(netdev_rss_key_fill); +static inline int ethool_get_max_rxfh_channel(struct net_device *dev, u32 *max) +{ + u32 dev_size, current_max = 0; + u32 *indir; + int ret, i; + + if (!dev->ethtool_ops->get_rxfh_indir_size || + !dev->ethtool_ops->get_rxfh) + return -EOPNOTSUPP; + dev_size = dev->ethtool_ops->get_rxfh_indir_size(dev); + if (dev_size == 0) + return -EOPNOTSUPP; + + indir = kcalloc(dev_size, sizeof(indir[0]), GFP_USER); + if (!indir) + return -ENOMEM; + + ret = dev->ethtool_ops->get_rxfh(dev, indir, NULL, NULL); + if (ret) + goto out; + + for (i = dev_size; i--;) { + if (indir[i] > current_max) + current_max = indir[i]; + } + + *max = current_max; + +out: + kfree(indir); + return ret; +} + static noinline_for_stack int ethtool_get_rxfh_indir(struct net_device *dev, void __user *useraddr) { @@ -738,6 +771,14 @@ static noinline_for_stack int ethtool_set_rxfh_indir(struct net_device *dev, } ret = ops->set_rxfh(dev, indir, NULL, ETH_RSS_HASH_NO_CHANGE); + if (ret) + goto out; + +
[PATCH 3/4] fm10k: don't reinitialize RSS flow table when RXFH configured
Also print an error message incase we do have to reconfigure as this should no longer happen anymore due to ethtool changes. If it somehow does occur, user should be made aware of it. Signed-off-by: Jacob Keller <jacob.e.kel...@intel.com> --- drivers/net/ethernet/intel/fm10k/fm10k_main.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_main.c b/drivers/net/ethernet/intel/fm10k/fm10k_main.c index 134ce4daa994..4e58f9155883 100644 --- a/drivers/net/ethernet/intel/fm10k/fm10k_main.c +++ b/drivers/net/ethernet/intel/fm10k/fm10k_main.c @@ -1937,8 +1937,10 @@ static void fm10k_init_reta(struct fm10k_intfc *interface) u16 i, rss_i = interface->ring_feature[RING_F_RSS].indices; u32 reta, base; - /* If the netdev is initialized we have to maintain table if possible */ - if (interface->netdev->reg_state != NETREG_UNINITIALIZED) { + /* If the Rx flow indirection table has been configured manually, we +* need to maintain it when possible. +*/ + if (netif_is_rxfh_configured(interface->netdev)) { for (i = FM10K_RETA_SIZE; i--;) { reta = interface->reta[i]; if reta << 24) >> 24) < rss_i) && @@ -1946,6 +1948,10 @@ static void fm10k_init_reta(struct fm10k_intfc *interface) (((reta << 8) >> 24) < rss_i) && (((reta) >> 24) < rss_i)) continue; + + /* this should never happen */ + dev_err(>pdev->dev, + "RSS indirection table assigned flows out of queue bounds. Reconfiguring.\n"); goto repopulate_reta; } -- 2.7.0.236.gda096a0.dirty
[PATCH v2 4/4] ethtool: support setting default Rx flow indirection table
Signed-off-by: Jacob Keller <jacob.e.kel...@intel.com> --- ethtool.8.in | 5 - ethtool.c| 46 -- 2 files changed, 36 insertions(+), 15 deletions(-) diff --git a/ethtool.8.in b/ethtool.8.in index eeffa70415b5..b7d56fbe4484 100644 --- a/ethtool.8.in +++ b/ethtool.8.in @@ -296,7 +296,7 @@ ethtool \- query or control network driver and hardware settings .IR N \ | .BI weight\ W0 .IR W1 -.RB ...\ ] +.RB ...\ | \ default \ ] .HP .B ethtool \-f|\-\-flash .I devname file @@ -805,6 +805,9 @@ Sets the receive flow hash indirection table to spread flows between receive queues according to the given weights. The sum of the weights must be non-zero and must not exceed the size of the indirection table. .TP +.BI default +Sets the receive flow hash indirection table to its default value. +.TP .B \-f \-\-flash Write a firmware image to flash or other non-volatile memory on the device. diff --git a/ethtool.c b/ethtool.c index 92c40b823f2c..c05b50f830a7 100644 --- a/ethtool.c +++ b/ethtool.c @@ -3228,13 +3228,11 @@ static int do_grxfh(struct cmd_context *ctx) return 0; } -static int fill_indir_table(u32 *indir_size, u32 *indir, int rxfhindir_equal, - char **rxfhindir_weight, u32 num_weights) +static int fill_indir_table(u32 *indir_size, u32 *indir, int rxfhindir_default, + int rxfhindir_equal, char **rxfhindir_weight, + u32 num_weights) { u32 i; - /* -* "*indir_size == 0" ==> reset indir to default -*/ if (rxfhindir_equal) { for (i = 0; i < *indir_size; i++) indir[i] = i % rxfhindir_equal; @@ -3268,6 +3266,9 @@ static int fill_indir_table(u32 *indir_size, u32 *indir, int rxfhindir_equal, } indir[i] = j; } + } else if (rxfhindir_default) { + /* "*indir_size == 0" ==> reset indir to default */ + *indir_size = 0; } else { *indir_size = ETH_RXFH_INDIR_NO_CHANGE; } @@ -3275,8 +3276,9 @@ static int fill_indir_table(u32 *indir_size, u32 *indir, int rxfhindir_equal, return 0; } -static int do_srxfhindir(struct cmd_context *ctx, int rxfhindir_equal, -char **rxfhindir_weight, u32 num_weights) +static int do_srxfhindir(struct cmd_context *ctx, int rxfhindir_default, +int rxfhindir_equal, char **rxfhindir_weight, +u32 num_weights) { struct ethtool_rxfh_indir indir_head; struct ethtool_rxfh_indir *indir; @@ -3301,7 +3303,8 @@ static int do_srxfhindir(struct cmd_context *ctx, int rxfhindir_equal, indir->cmd = ETHTOOL_SRXFHINDIR; indir->size = indir_head.size; - if (fill_indir_table(>size, indir->ring_index, rxfhindir_equal, + if (fill_indir_table(>size, indir->ring_index, +rxfhindir_default, rxfhindir_equal, rxfhindir_weight, num_weights)) { free(indir); return 1; @@ -3323,7 +3326,7 @@ static int do_srxfh(struct cmd_context *ctx) struct ethtool_rxfh rss_head = {0}; struct ethtool_rxfh *rss; struct ethtool_rxnfc ring_count; - int rxfhindir_equal = 0; + int rxfhindir_equal = 0, rxfhindir_default = 0; char **rxfhindir_weight = NULL; char *rxfhindir_key = NULL; char *hkey = NULL; @@ -3332,7 +3335,7 @@ static int do_srxfh(struct cmd_context *ctx) u32 entry_size = sizeof(rss_head.rss_config[0]); u32 num_weights = 0; - if (ctx->argc < 2) + if (ctx->argc < 1) exit_bad_args(); while (arg_num < ctx->argc) { @@ -3357,6 +3360,9 @@ static int do_srxfh(struct cmd_context *ctx) if (!rxfhindir_key) exit_bad_args(); ++arg_num; + } else if (!strcmp(ctx->argp[arg_num], "default")) { + ++arg_num; + rxfhindir_default = 1; } else { exit_bad_args(); } @@ -3368,6 +3374,18 @@ static int do_srxfh(struct cmd_context *ctx) return 1; } + if (rxfhindir_equal && rxfhindir_default) { + fprintf(stderr, + "Equal and default options are mutually exclusive\n"); + return 1; + } + + if (rxfhindir_weight && rxfhindir_default) { + fprintf(stderr, + "Weight and default options are mutually exclusive\n"); + return 1; + } + ring_count.cmd = ETHTOOL_GRXRINGS; err = send_ioctl(ctx, _count); if (err <
[PATCH v2 0/4] ethtool: correct {GS}CHANNELS and {GS}RXFH conflict
This patch series fixes up ethtool_set_channels operation which allowed modifying the RXFH table indirectly by reducing the number of queues below the current max queue used by the Rx flow table. Most drivers incorrectly allowed this to destroy the Rx flow table and would then start by reinitializing it to default settings. However, drivers are not able to correctly handle the conflict since there was no way to differentiate between the default settings and the user requested explicit settings. To fix this, implement a new netdev private flag which we use to indicate whether the RXFH has been user configured. If someone has a better alternative of how to store this information, let me know. I am not sure that priv_flags is the best solution but I have not had any better idea. Secondly, we add a function which just calls the driver's get_rxfh callback to determine the current indirection table. Loop through this and we can determine the current highest queue that will be used by RSS. Now, modify ethtool_set_channels to add a check ensuring that if (a) we have had rxfh configured by user, (b) we can get the maximum RSS queue currently used, then we ensure that the newly requested Rx count (or combined count) is at least as high as this maximum RSS queue. The reasoning here is that we can always safely increase the number of queues. If we decrease the queues we must ensure that the decrease does not go lower than the highest in-use queue for the Rx flow table. Drivers may still need to be patched if they currently overwrite the Rx flow table during channel configuration. If the driver currently always resets Rx flow table when increasing number of queues it must be patched to only do this when netif_is_rxfh_configured returns false. The second patch simply adds a check to ensure that all provided channel counts fit within driver defined maximums. The third patch fixes fm10k to correctly reconfigure the RSS reta table whenever it is still unconfigured. This means that the default state will provide RSS to every queue. Once the user has configured RXFH, then we should maintain it. In addition, since the case where we must reconfigure the RSS table in this case should now no longer occur, add a dev_err message to indicate the user that we did so. I have also supplied an ethtool patch to enable setting the default Rx flow indirection table. Without this, current ethtool does not support sending an indir_size of 0, and thus does not correctly support configuring back to the default. Changes in v2: * fixed compile error * fixed incorrect comparison with max_rx_in_use * adjusted looping over dev_size * removed inline on function * dropped patch about separating combined vs asymmetric channels * verified behavior using fm10k driver Jacob Keller (3): ethtool: correctly ensure {GS}CHANNELS doesn't conflict with GS{RXFH} ethtool: ensure channel counts are within bounds during SCHANNELS fm10k: don't reinitialize RSS flow table when RXFH configured drivers/net/ethernet/intel/fm10k/fm10k_main.c | 10 +++- include/linux/netdevice.h | 8 +++ net/core/ethtool.c| 71 ++- 3 files changed, 85 insertions(+), 4 deletions(-) -- 2.7.0.236.gda096a0.dirty
[PATCH 2/4] ethtool: ensure channel counts are within bounds during SCHANNELS
Add a sanity check to ensure that all requested channel sizes are within bounds, which should reduce errors in driver implementation. Signed-off-by: Jacob Keller <jacob.e.kel...@intel.com> --- net/core/ethtool.c | 13 +++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/net/core/ethtool.c b/net/core/ethtool.c index 59aebaf9ed54..dc4f6632f33b 100644 --- a/net/core/ethtool.c +++ b/net/core/ethtool.c @@ -1274,15 +1274,24 @@ static noinline_for_stack int ethtool_get_channels(struct net_device *dev, static noinline_for_stack int ethtool_set_channels(struct net_device *dev, void __user *useraddr) { - struct ethtool_channels channels; + struct ethtool_channels channels, max; u32 max_rx_in_use = 0; - if (!dev->ethtool_ops->set_channels) + if (!dev->ethtool_ops->set_channels || !dev->ethtool_ops->get_channels) return -EOPNOTSUPP; if (copy_from_user(, useraddr, sizeof(channels))) return -EFAULT; + dev->ethtool_ops->get_channels(dev, ); + + /* ensure new counts are within the maximums */ + if ((channels.rx_count > max.max_rx) || + (channels.tx_count > max.max_tx) || + (channels.combined_count > max.max_combined) || + (channels.other_count > max.max_other)) + return -EINVAL; + /* ensure the new Rx count fits within the configured Rx flow * indirection table settings */ if (netif_is_rxfh_configured(dev) && -- 2.7.0.236.gda096a0.dirty
[PATCH 1/4] ethtool: correctly ensure {GS}CHANNELS doesn't conflict with GS{RXFH}
Ethernet drivers implementing both {GS}RXFH and {GS}CHANNELS ethtool ops incorrectly allow SCHANNELS when it would conflict with the settings from SRXFH. This occurs because it is not possible for drivers to understand whether their Rx flow indirection table has been configured or is in the default state. In addition, drivers currently behave in various ways when increasing the number of Rx channels. Some drivers will always destroy the Rx flow indirection table when this occurs, whether it has been set by the user or not. Other drivers will attempt to preserve the table even if the user has never modified it from the default driver settings. Neither of these situation is desirable because it leads to unexpected behavior or loss of user configuration. The correct behavior is to simply return -EINVAL when SCHANNELS would conflict with the current Rx flow table settings. However, it should only do so if the current settings were modified by the user. If we required that the new settings never conflict with the current (default) Rx flow settings, we would force users to first reduce their Rx flow settings and then reduce the number of Rx channels. This patch proposes a solution implemented in net/core/ethtool.c which ensures that all drivers behave correctly. It checks whether the RXFH table has been configured to non-default settings, and stores this information in a private netdev flag. When the number of channels is requested to change, it first ensures that the current Rx flow table is not going to assign flows to now disabled channels. Signed-off-by: Jacob Keller <jacob.e.kel...@intel.com> --- include/linux/netdevice.h | 8 +++ net/core/ethtool.c| 55 +++ 2 files changed, 63 insertions(+) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 289c2314d766..044fa2d88c7f 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1291,6 +1291,7 @@ struct net_device_ops { * @IFF_OPENVSWITCH: device is a Open vSwitch master * @IFF_L3MDEV_SLAVE: device is enslaved to an L3 master device * @IFF_TEAM: device is a team device + * @IFF_RXFH_CONFIGURED: device has had Rx Flow indirection table configured */ enum netdev_priv_flags { IFF_802_1Q_VLAN = 1<<0, @@ -1318,6 +1319,7 @@ enum netdev_priv_flags { IFF_OPENVSWITCH = 1<<22, IFF_L3MDEV_SLAVE= 1<<23, IFF_TEAM= 1<<24, + IFF_RXFH_CONFIGURED = 1<<25, }; #define IFF_802_1Q_VLANIFF_802_1Q_VLAN @@ -1345,6 +1347,7 @@ enum netdev_priv_flags { #define IFF_OPENVSWITCHIFF_OPENVSWITCH #define IFF_L3MDEV_SLAVE IFF_L3MDEV_SLAVE #define IFF_TEAM IFF_TEAM +#define IFF_RXFH_CONFIGUREDIFF_RXFH_CONFIGURED /** * struct net_device - The DEVICE structure. @@ -4045,6 +4048,11 @@ static inline bool netif_is_lag_port(const struct net_device *dev) return netif_is_bond_slave(dev) || netif_is_team_port(dev); } +static inline bool netif_is_rxfh_configured(const struct net_device *dev) +{ + return dev->priv_flags & IFF_RXFH_CONFIGURED; +} + /* This device needs to keep skb dst for qdisc enqueue or ndo_start_xmit() */ static inline void netif_keep_dst(struct net_device *dev) { diff --git a/net/core/ethtool.c b/net/core/ethtool.c index daf04709dd3c..59aebaf9ed54 100644 --- a/net/core/ethtool.c +++ b/net/core/ethtool.c @@ -642,6 +642,37 @@ void netdev_rss_key_fill(void *buffer, size_t len) } EXPORT_SYMBOL(netdev_rss_key_fill); +static int ethtool_get_max_rxfh_channel(struct net_device *dev, u32 *max) +{ + u32 dev_size, current_max = 0; + u32 *indir; + int ret; + + if (!dev->ethtool_ops->get_rxfh_indir_size || + !dev->ethtool_ops->get_rxfh) + return -EOPNOTSUPP; + dev_size = dev->ethtool_ops->get_rxfh_indir_size(dev); + if (dev_size == 0) + return -EOPNOTSUPP; + + indir = kcalloc(dev_size, sizeof(indir[0]), GFP_USER); + if (!indir) + return -ENOMEM; + + ret = dev->ethtool_ops->get_rxfh(dev, indir, NULL, NULL); + if (ret) + goto out; + + while (dev_size--) + current_max = max(current_max, indir[dev_size]); + + *max = current_max; + +out: + kfree(indir); + return ret; +} + static noinline_for_stack int ethtool_get_rxfh_indir(struct net_device *dev, void __user *useraddr) { @@ -738,6 +769,14 @@ static noinline_for_stack int ethtool_set_rxfh_indir(struct net_device *dev, } ret = ops->set_rxfh(dev, indir, NULL, ETH_RSS_HASH_NO_CHANGE); + if (ret) + goto out; + + /* indicate whether rxfh was set to
[PATCH 0/2] ethtool: {SG}RXFH indirection deficiency
This patch set adds a new ethtool operation .reset_rxfh_indir which is used by the core ethtool stack to properly indicate to a driver that the default RSS indirection table has been requested. Current behavior for notifying the default settings is indistinguishable from an explicit request. There is no easy way to look at the indirection table and tell if it matches the default, either. To allow drivers the ability to correctly report -EINVAL when changing the number of channels, add the new operation suggested. I chose to use a new ethtool op instead of an additional flag since this has a lower impact and we already use NULL on the *indir variable in set_rxfh to indicate no change was requested. The second patch in the series is an example implementation of the .reset_rxfh_indir operation along with fixes to the fm10k_set_channels to prevent changing the number of channels if it would interfere with the current redirection table. Jacob Keller (2): ethtool: support notifying drivers when user requests default rxfh table fm10k: correctly report error when changing number of channels drivers/net/ethernet/intel/fm10k/fm10k.h | 2 ++ drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c | 45 drivers/net/ethernet/intel/fm10k/fm10k_main.c| 11 -- include/linux/ethtool.h | 3 ++ net/core/ethtool.c | 8 + 5 files changed, 66 insertions(+), 3 deletions(-) -- 2.7.0.236.gda096a0.dirty
[PATCH 1/2] ethtool: support notifying drivers when user requests default rxfh table
Currently, userspace ethtool supports requesting the default indirection table by passing 0 as the indir_size. However, ops->set_rxfh does not distinguish between user requesting default via indir_size=0 and user requesting explicitly settings which are equivalent of the default. This causes problems because other driver changes such as number of channels (queues) should fail when they will not work with other current settings such as the indirection table. If there is no way to indicate when the driver is in default RSS table state compared to user set states, then we wouldn't ever allow changing the number of queues at all. To fix this, drivers must be able to distinguish between requested settings and the configured defaults. We can't use a value of NULL to the *indir pointer as this is already used to indicate no change to the indirection table. Instead, implement a new callback ops->reset_rxfh_indir which is used whenever the user requests size of zero. This has lower impact than adding a new flag and can be implemented by drivers as necessary. In this way we have the following scenarios now: (a) driver is in default configuration, and is free to change RSS settings as necessary due to other changes such as number of queues. This makes sense since we can essentially consider this as "RSS indirection has not been configured". This is the default state of a new device. In this state, I think the reasonable default is "Always RSS to all enabled queues". (b) user has requested RSS indirection settings, should change the driver state to "RSS indirection table has been configured". Now if the user requests a change in the number of queues, we can properly indicate an error when these settings would conflict with the requested RSS indirection table. (c) The user can request default settings via the {GS}RXFH operations and the driver will get a clear indication that it should reset to the default "RSS indirection table has not been configured" mode. In this way it will be able to then change the number of queues and go about business. If we don't have a way to properly indicate that we've reset to default then we are not able to implement the proposed behavior, so this patch adds a new method to properly indicate that we have reset to the default indirection table. Signed-off-by: Jacob Keller <jacob.e.kel...@intel.com> Cc: Dave Miller <da...@davemloft.net> --- This is an alternative proposal to my previous patch. I do not believe it is possible to obtain desired behavior without this patch, as it is not possible for the driver to distinguish default settings from user configured RSS table. If we don't do that, then the user will never be able to reduce the number of queues without first modifying the RSS redirection table, which seems wrong to me. include/linux/ethtool.h | 3 +++ net/core/ethtool.c | 8 2 files changed, 11 insertions(+) diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h index 653dc9c4ebac..700ac5658d34 100644 --- a/include/linux/ethtool.h +++ b/include/linux/ethtool.h @@ -186,6 +186,8 @@ static inline u32 ethtool_rxfh_indir_default(u32 index, u32 n_rx_rings) * will remain unchanged. * Returns a negative error code or zero. An error code must be returned * if at least one unsupported change was requested. + * @reset_rxfh_indir: Reset the contents of the RX flow hash indirection table + * to driver defaults. Returns a negative error code or zero. * @get_channels: Get number of channels. * @set_channels: Set number of channels. Returns a negative error code or * zero. @@ -262,6 +264,7 @@ struct ethtool_ops { u8 *hfunc); int (*set_rxfh)(struct net_device *, const u32 *indir, const u8 *key, const u8 hfunc); + int (*reset_rxfh_indir)(struct net_device *); void(*get_channels)(struct net_device *, struct ethtool_channels *); int (*set_channels)(struct net_device *, struct ethtool_channels *); int (*get_dump_flag)(struct net_device *, struct ethtool_dump *); diff --git a/net/core/ethtool.c b/net/core/ethtool.c index daf04709dd3c..4c6a1c2b8b61 100644 --- a/net/core/ethtool.c +++ b/net/core/ethtool.c @@ -725,7 +725,13 @@ static noinline_for_stack int ethtool_set_rxfh_indir(struct net_device *dev, if (ret) goto out; + /* user_size == 0 means reset the indir table to default. */ if (user_size == 0) { + if (ops->reset_rxfh_indir) { + err = ops->reset_rxfh_indir(dev); + goto out; + } + for (i = 0; i < dev_size; i++) indir[i] = ethtool_rxfh_indir_default(i, rx_rings.data); } else { @@ -880,6 +886,8 @@ static noinline_for_stack int ethtool_set_rxfh(struct net_device *dev,
[PATCH 2/2] fm10k: correctly report error when changing number of channels
Previously, the fm10k driver would incorrectly allow changing the number of combined channels when this would have altered user configured RSS indirection table. With the new ethtool operation .reset_rxfh_indir, we are now able to correctly handle the changes to number of channels. This requires several changes: (a) we must first store whether or not the RSS redirection table has been manually configured, as there is no way to tell the default table from an explicit request. Do this by implementing reset_rxfh_indir ethtool op, and storing a flag to indicate how the redirection table is set. (b) replace the fm10k_init_reta code with a check of intialized netdevice to instead check the new FM10K_FLAG_RETA_TABLE_CONFIGURED. This will ensure that the table is always repopulated if we're in the default (unconfigured) state. We still must repopulate if somehow the reta table is changed to what will become an invalid state due to new RSS limits. However, since this should no longer happen, add a dev_err() call to indicate clearly to user what happened. (c) modify the fm10k_set_channels call to check that th new count is within bounds on the reta table. This check is only enforced if the user has manually configured the RSS indirection table, since the driver should be free to repopulate its own default configuration. Signed-off-by: Jacob Keller <jacob.e.kel...@intel.com> --- drivers/net/ethernet/intel/fm10k/fm10k.h | 2 ++ drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c | 45 drivers/net/ethernet/intel/fm10k/fm10k_main.c| 11 -- 3 files changed, 55 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/intel/fm10k/fm10k.h b/drivers/net/ethernet/intel/fm10k/fm10k.h index 83f386714e87..983bdda9509b 100644 --- a/drivers/net/ethernet/intel/fm10k/fm10k.h +++ b/drivers/net/ethernet/intel/fm10k/fm10k.h @@ -268,6 +268,7 @@ struct fm10k_intfc { #define FM10K_FLAG_RX_TS_ENABLED (u32)(BIT(3)) #define FM10K_FLAG_SWPRI_CONFIG(u32)(BIT(4)) #define FM10K_FLAG_DEBUG_STATS (u32)(BIT(5)) +#define FM10K_FLAG_RETA_TABLE_CONFIGURED (u32)(BIT(6)) int xcast_mode; /* Tx fast path data */ @@ -475,6 +476,7 @@ netdev_tx_t fm10k_xmit_frame_ring(struct sk_buff *skb, void fm10k_tx_timeout_reset(struct fm10k_intfc *interface); bool fm10k_check_tx_hang(struct fm10k_ring *tx_ring); void fm10k_alloc_rx_buffers(struct fm10k_ring *rx_ring, u16 cleaned_count); +void fm10k_init_reta(struct fm10k_intfc *interface); /* PCI */ void fm10k_mbx_free_irq(struct fm10k_intfc *); diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c b/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c index 6a9f9886cb98..febfa2b009ea 100644 --- a/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c +++ b/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c @@ -1060,6 +1060,8 @@ static int fm10k_set_reta(struct net_device *netdev, const u32 *indir) if (!indir) return 0; + interface->flags |= FM10K_FLAG_RETA_TABLE_CONFIGURED; + /* Verify user input. */ rss_i = interface->ring_feature[RING_F_RSS].indices; for (i = fm10k_get_reta_size(netdev); i--;) { @@ -1137,6 +1139,25 @@ static int fm10k_set_rssh(struct net_device *netdev, const u32 *indir, return 0; } +static int fm10k_reset_rssh(struct net_device *netdev) +{ + struct fm10k_intfc *interface = netdev_priv(netdev); + struct fm10k_hw *hw = >hw; + int i; + + /* user has requested default configuration so clear configured flag */ + interface->flags &= ~FM10K_FLAG_RETA_TABLE_CONFIGURED; + + /* initialize the reta table to driver defaults */ + fm10k_init_reta(interface); + + /* write the new RETA table to hardware */ + for (i = 0; i < FM10K_RETA_SIZE; i++) + fm10k_write_reg(hw, FM10K_RETA(0, i), interface->reta[i]); + + return 0; +} + static unsigned int fm10k_max_channels(struct net_device *dev) { struct fm10k_intfc *interface = netdev_priv(dev); @@ -1173,6 +1194,8 @@ static int fm10k_set_channels(struct net_device *dev, struct fm10k_intfc *interface = netdev_priv(dev); unsigned int count = ch->combined_count; struct fm10k_hw *hw = >hw; + u32 reta0, reta1, reta2, reta3; + int i, rss_i = 0; /* verify they are not requesting separate vectors */ if (!count || ch->rx_count || ch->tx_count) @@ -1186,6 +1209,27 @@ static int fm10k_set_channels(struct net_device *dev, if (count > fm10k_max_channels(dev)) return -EINVAL; + /* determine the current number of queues used by the reta table */ + for (i = FM10K_RETA_SIZE; i--;) { + reta0 = (interface->reta[i] << 24) >> 24; + reta1 = (interface->reta[i] <<
[PATCH] ethtool: add support for dynamic mode in {SG}RXFH commands
Signed-off-by: Jacob Keller <jacob.e.kel...@intel.com> --- ethtool-copy.h | 8 +++- ethtool.c | 36 +++- 2 files changed, 42 insertions(+), 2 deletions(-) diff --git a/ethtool-copy.h b/ethtool-copy.h index d23ffc4c38b4..620dcea06d25 100644 --- a/ethtool-copy.h +++ b/ethtool-copy.h @@ -890,6 +890,10 @@ struct ethtool_rxfh_indir { * hardware hash key. * @hfunc: Defines the current RSS hash function used by HW (or to be set to). * Valid values are one of the %ETH_RSS_HASH_*. + * @dynamic: Indicate whether the device driver may use dynamic RSS settings + * which change due to various run time factors, such as number of + * queues. When false driver must attempt to preserve RSS settings when + * possible. When true driver may override any requested RSS settings. * @rsvd: Reserved for future extensions. * @rss_config: RX ring/queue index for each hash value i.e., indirection table * of @indir_size __u32 elements, followed by hash key of @key_size @@ -900,6 +904,7 @@ struct ethtool_rxfh_indir { * %ETH_RXFH_INDIR_NO_CHANGE means that indir table setting is not requested * and a @indir_size of zero means the indir table should be reset to default * values. An hfunc of zero means that hash function setting is not requested. + * If dynamic is true, driver may ignore any other settings requested. */ struct ethtool_rxfh { __u32 cmd; @@ -907,7 +912,8 @@ struct ethtool_rxfh { __u32 indir_size; __u32 key_size; __u8hfunc; - __u8rsvd8[3]; + __u8dynamic; + __u8rsvd8[2]; __u32 rsvd32; __u32 rss_config[0]; }; diff --git a/ethtool.c b/ethtool.c index 92c40b823f2c..29b9279b6b1c 100644 --- a/ethtool.c +++ b/ethtool.c @@ -917,6 +917,19 @@ static int convert_string_to_hashkey(char *rss_hkey, u32 key_size, return 2; } +static u8 parse_dynamic(const char *rss_dynamic, u8 *dynamic) +{ + if (!strcmp(rss_dynamic, "on")) { + *dynamic = 1; + return 0; + } else if (!strcmp(rss_dynamic, "off")) { + *dynamic = 0; + return 0; + } else { + return 2; + } +} + static int parse_hkey(char **rss_hkey, u32 key_size, const char *rss_hkey_string) { @@ -3213,6 +3226,11 @@ static int do_grxfh(struct cmd_context *ctx) indir_bytes = rss->indir_size * sizeof(rss->rss_config[0]); hkey = ((char *)rss->rss_config + indir_bytes); + if (rss->dynamic) + printf("Dynamic mode enabled\n"); + else + printf("Static mode enabled\n"); + printf("RSS hash key:\n"); if (!rss->key_size) printf("Operation not supported\n"); @@ -3326,11 +3344,13 @@ static int do_srxfh(struct cmd_context *ctx) int rxfhindir_equal = 0; char **rxfhindir_weight = NULL; char *rxfhindir_key = NULL; + char *rxfh_dynamic = NULL; char *hkey = NULL; int err = 0; u32 arg_num = 0, indir_bytes = 0; u32 entry_size = sizeof(rss_head.rss_config[0]); u32 num_weights = 0; + u8 dynamic = 0; if (ctx->argc < 2) exit_bad_args(); @@ -3357,6 +3377,12 @@ static int do_srxfh(struct cmd_context *ctx) if (!rxfhindir_key) exit_bad_args(); ++arg_num; + } else if (!strcmp(ctx->argp[arg_num], "dynamic")) { + ++arg_num; + rxfh_dynamic = ctx->argp[arg_num]; + if (!rxfhindir_key) + exit_bad_args(); + ++arg_num; } else { exit_bad_args(); } @@ -3392,6 +3418,12 @@ static int do_srxfh(struct cmd_context *ctx) return err; } + if (rxfh_dynamic) { + err = parse_dynamic(rxfh_dynamic, ); + if (err) + return err; + } + if (rxfhindir_equal || rxfhindir_weight) indir_bytes = rss_head.indir_size * entry_size; @@ -3403,6 +3435,7 @@ static int do_srxfh(struct cmd_context *ctx) rss->cmd = ETHTOOL_SRSSH; rss->indir_size = rss_head.indir_size; rss->key_size = rss_head.key_size; + rss->dynamic = dynamic; if (fill_indir_table(>indir_size, rss->rss_config, rxfhindir_equal, rxfhindir_weight, num_weights)) { @@ -4112,7 +4145,8 @@ static const struct option { { "-X|--set-rxfh-indir|--rxfh", 1, do_srxfh, "Set Rx flow hash indirection and/or hash key", " [ equal N | weight W0 W1 ... ]\n" - "
[PATCH 1/2] ethtool: add dynamic flag to ETHTOOL_{GS}RXFH commands
Ethtool supports a few operations for modifying and controlling a device's RSS table. Sometimes, changes in other features of the device may require (or desire) changes to the RSS table. Currently there is no method to indicate to the driver whether the current RSS table settings should be maintained or overridden. A simple example of this is for when the number of receive queues is changed, there are two possibilities. First, the number of queues is decreased. This must result in a reprogramming of the RSS table since it will no longer match correctly and may attempt to assign traffic to a queue which is now disabled. In this case drivers have a clear indication of what to do. The second case, is when the number of queues has increased. In this case, the current RSS table may be preserved. However, doing so would result in the new queues being unused for RSS. But if the driver chooses to destroy the RSS configuration it may result in unwanted behavior as now the user's configured changes are lost. This patch attempts to resolve this (and other similar) issues by indicating a new flag "dynamic" which can be set by the user when calling the ethtool interface. This flag indicates to the driver that it may overwrite settings in the RSS table. If false, it indicates the driver should do what it can to preserve the RSS table changes requested by the user. That is, for cases where it can preserve the table it must. If the value is set true, it means the driver may or may not apply the current settings and is free to change the values as necessary. The current default is set to false, as this is how most drivers appear to behave today. Signed-off-by: Jacob Keller <jacob.e.kel...@intel.com> Cc: Lendacky, Thomas <thomas.lenda...@amd.com> Cc: Yuval Mintz <yuval...@broadcom.com> Cc: Michael Chan <mc...@broadcom.com> Cc: Matt Carlson <mcarl...@broadcom.com> Cc: Sunil Goutham <sgout...@cavium.com> Cc: Hariprasad Shenai <haripra...@chelsio.com> Cc: Govindarajulu Varadarajan <_gov...@gmx.com> Cc: Kalesh AP <kalesh.pura...@emulex.com> Cc: Andrew Lunn <and...@lunn.ch> Cc: Shannon Nelson <shannon.nel...@intel.com> Cc: Mitch Williams <mitch.a.willi...@intel.com> Cc: Carolyn Wyborny <carolyn.wybo...@intel.com> Cc: Emil Tantilov <emil.s.tanti...@intel.com> Cc: Thomas Petazzoni <thomas.petazz...@free-electrons.com> Cc: Amir Vadai <am...@mellanox.com> Cc: Achiad Shochat <ach...@mellanox.com> Cc: Ben Hutchings <bhutchi...@solarflare.com> Cc: Michał Mirosław <mirq-li...@rere.qmqm.pl> Cc: Alexander Duyck <adu...@mirantis.com> --- drivers/net/ethernet/amd/xgbe/xgbe-ethtool.c | 7 -- .../net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c| 7 -- drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 5 +++- drivers/net/ethernet/broadcom/tg3.c| 8 +-- .../net/ethernet/cavium/thunder/nicvf_ethtool.c| 7 -- drivers/net/ethernet/chelsio/cxgb4/cxgb4_ethtool.c | 7 -- drivers/net/ethernet/cisco/enic/enic_ethtool.c | 7 -- drivers/net/ethernet/emulex/benet/be_ethtool.c | 7 -- drivers/net/ethernet/hisilicon/hns/hns_ethtool.c | 8 +-- drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c | 7 -- drivers/net/ethernet/intel/i40e/i40e_ethtool.c | 7 -- drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c | 6 +++-- drivers/net/ethernet/intel/igb/igb_ethtool.c | 6 +++-- drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c | 7 -- drivers/net/ethernet/intel/ixgbevf/ethtool.c | 5 +++- drivers/net/ethernet/marvell/mvneta.c | 8 +-- drivers/net/ethernet/mellanox/mlx4/en_ethtool.c| 7 -- .../net/ethernet/mellanox/mlx5/core/en_ethtool.c | 7 -- .../net/ethernet/netronome/nfp/nfp_net_ethtool.c | 7 -- drivers/net/ethernet/sfc/ethtool.c | 7 -- include/linux/ethtool.h| 4 ++-- include/uapi/linux/ethtool.h | 8 ++- net/core/ethtool.c | 27 +- 23 files changed, 124 insertions(+), 52 deletions(-) diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-ethtool.c b/drivers/net/ethernet/amd/xgbe/xgbe-ethtool.c index 6040293db9c1..4eecd225db7c 100644 --- a/drivers/net/ethernet/amd/xgbe/xgbe-ethtool.c +++ b/drivers/net/ethernet/amd/xgbe/xgbe-ethtool.c @@ -509,11 +509,14 @@ static u32 xgbe_get_rxfh_indir_size(struct net_device *netdev) } static int xgbe_get_rxfh(struct net_device *netdev, u32 *indir, u8 *key, -u8 *hfunc) +u8 *hfunc, u8 *dynamic) { struct xgbe_prv_data *pdata = netdev_priv(netdev); unsigned int i; + if (dynamic) + *dynamic = false; + if (indir) { for (i = 0; i < ARRAY_SIZE(pdata->rss_table); i++) ind
[PATCH 2/2] fm10k: support dynamic mode for RSS table control
Add support for the new dynamic flag from set_rxfh and get_rxfh. For now the only known dynamic reason to change RSS is when number of queues is changed. The default mode for the driver will be dynamic, indicating that the driver is free to change its own default setting as it sees fit. Since the current default userspace ethtool operation will only send false as the value to dynamic this results in the behavior that the driver configured RSS settings are capable of being modified but user supplied changes will be preserved as long as possible. In practice, this resolves an issue where decreasing the number of queues and then increasing them shall no longer break RSS unless the user has manually configured a non-dynamic RSS table setting. If the user has configured the RSS table without setting dynamic, the driver will do its best to maintain the configuration. If this is not possible, we will indicate that the mode is now dynamic and reset to driver defaults. Signed-off-by: Jacob Keller <jacob.e.kel...@intel.com> --- drivers/net/ethernet/intel/fm10k/fm10k.h | 1 + drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c | 10 +- drivers/net/ethernet/intel/fm10k/fm10k_main.c| 8 +--- 3 files changed, 15 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/intel/fm10k/fm10k.h b/drivers/net/ethernet/intel/fm10k/fm10k.h index 83f386714e87..5402b5c55247 100644 --- a/drivers/net/ethernet/intel/fm10k/fm10k.h +++ b/drivers/net/ethernet/intel/fm10k/fm10k.h @@ -268,6 +268,7 @@ struct fm10k_intfc { #define FM10K_FLAG_RX_TS_ENABLED (u32)(BIT(3)) #define FM10K_FLAG_SWPRI_CONFIG(u32)(BIT(4)) #define FM10K_FLAG_DEBUG_STATS (u32)(BIT(5)) +#define FM10K_FLAG_STATIC_RETA_TBL (u32)(BIT(6)) int xcast_mode; /* Tx fast path data */ diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c b/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c index af11c4c1b256..8799296ff86e 100644 --- a/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c +++ b/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c @@ -1100,7 +1100,7 @@ static int fm10k_get_rssh(struct net_device *netdev, u32 *indir, u8 *key, *hfunc = ETH_RSS_HASH_TOP; if (dynamic) - *dynamic = false; + *dynamic = !(interface->flags & FM10K_FLAG_STATIC_RETA_TBL); err = fm10k_get_reta(netdev, indir); if (err || !key) @@ -1123,6 +1123,14 @@ static int fm10k_set_rssh(struct net_device *netdev, const u32 *indir, if (hfunc != ETH_RSS_HASH_NO_CHANGE && hfunc != ETH_RSS_HASH_TOP) return -EOPNOTSUPP; + /* If dynamic mode is not requested, enable the static flag. We'll +* still attempt to populate the RETA table using the provided +* settings if possible. */ + if (dynamic) + interface->flags &= ~FM10K_FLAG_STATIC_RETA_TBL; + else + interface->flags |= FM10K_FLAG_STATIC_RETA_TBL; + err = fm10k_set_reta(netdev, indir); if (err || !key) return err; diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_main.c b/drivers/net/ethernet/intel/fm10k/fm10k_main.c index 134ce4daa994..8c75f60028a6 100644 --- a/drivers/net/ethernet/intel/fm10k/fm10k_main.c +++ b/drivers/net/ethernet/intel/fm10k/fm10k_main.c @@ -1932,13 +1932,13 @@ static void fm10k_assign_rings(struct fm10k_intfc *interface) fm10k_cache_ring_rss(interface); } -static void fm10k_init_reta(struct fm10k_intfc *interface) +void fm10k_init_reta(struct fm10k_intfc *interface) { u16 i, rss_i = interface->ring_feature[RING_F_RSS].indices; u32 reta, base; - /* If the netdev is initialized we have to maintain table if possible */ - if (interface->netdev->reg_state != NETREG_UNINITIALIZED) { + /* Maintain static user provided table if possible. */ + if (interface->flags & FM10K_FLAG_STATIC_RETA_TBL) { for (i = FM10K_RETA_SIZE; i--;) { reta = interface->reta[i]; if reta << 24) >> 24) < rss_i) && @@ -1954,6 +1954,8 @@ static void fm10k_init_reta(struct fm10k_intfc *interface) } repopulate_reta: + interface->flags &= ~FM10K_FLAG_STATIC_RETA_TBL; + /* Populate the redirection table 4 entries at a time. To do this * we are generating the results for n and n+2 and then interleaving * those with the results with n+1 and n+3. -- 2.6.3.505.g5cc1fd1
[PATCH] ethtool: add dynamic flag to {SG}RXFH
This patch series proposes the addition of a dynamic flag to the ethtool {SG}RXFH operations. The primary reasoning for this is so that drivers may indicate when they destroyed configured RSS settings, and can determine when they have more liberty to remove user's settings. The default mode shall be the current static mode, where drivers should make best effort to maintain the RSS table settings if possible. However, if the user sets dynamic mode flag, then the driver is free (if necessary or useful) to modify the requested settings. The primary reason for this is when queue sizes change dynamically. If you increase the number of available queues, the RSS table may be il-configured and the driver might which to change the settings. Today, most drivers attempt to maintain the RSS table when possible. This means that a user can observe functioning RSS, decrease the number of queues, and then increase them again. Under current functionality, drivers may end up never re-writing the RSS table back to the default when the queues are increased. Even worse, if a driver does do this today, they may have destroyed some specific settings the user configured in the RSS table. Instead, use the dynamic mode value which the driver will use to indicate whether or not the current settings might change due to dynamic factors. This series includes support to fix all the driver function pointers, and a patch to enhance fm10k driver to support this feature. The previous behavior of the fm10k driver was especially problematic, and the current patch series attempts to resolve this. It is possible to have most of the behavior handled purely in driver, but then we lose any ability to communicate this to the user via ethtool. Jacob Keller (3): ethtool: add dynamic flag to ETHTOOL_{GS}RXFH commands fm10k: support dynamic mode for RSS table control ethtool: add support for dynamic mode in {SG}RXFH commands drivers/net/ethernet/amd/xgbe/xgbe-ethtool.c | 7 +-- drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c | 7 +-- drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c| 5 - drivers/net/ethernet/broadcom/tg3.c | 8 ++-- drivers/net/ethernet/cavium/thunder/nicvf_ethtool.c | 7 +-- drivers/net/ethernet/chelsio/cxgb4/cxgb4_ethtool.c | 7 +-- drivers/net/ethernet/cisco/enic/enic_ethtool.c | 7 +-- drivers/net/ethernet/emulex/benet/be_ethtool.c | 7 +-- drivers/net/ethernet/hisilicon/hns/hns_ethtool.c | 8 ++-- drivers/net/ethernet/intel/fm10k/fm10k.h | 1 + drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c | 15 +-- drivers/net/ethernet/intel/fm10k/fm10k_main.c| 8 +--- drivers/net/ethernet/intel/i40e/i40e_ethtool.c | 7 +-- drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c | 6 -- drivers/net/ethernet/intel/igb/igb_ethtool.c | 6 -- drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c | 7 +-- drivers/net/ethernet/intel/ixgbevf/ethtool.c | 5 - drivers/net/ethernet/marvell/mvneta.c| 8 ++-- drivers/net/ethernet/mellanox/mlx4/en_ethtool.c | 7 +-- drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c | 7 +-- drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c | 7 +-- drivers/net/ethernet/sfc/ethtool.c | 7 +-- include/linux/ethtool.h | 4 ++-- include/uapi/linux/ethtool.h | 8 +++- net/core/ethtool.c | 27 --- 25 files changed, 138 insertions(+), 55 deletions(-) ethtool-copy.h | 8 +++- ethtool.c | 36 +++- 2 files changed, 42 insertions(+), 2 deletions(-) -- 2.6.3.505.g5cc1fd1