RE: [RFC 5/5] i40e/ethtool: support coalesce setting by queue

2015-12-21 Thread Nelson, Shannon


> From: Liang, Kan
> Sent: Friday, December 18, 2015 1:06 PM
> 
> Hi sln,
> 
> Thanks for the comments. I will fix it in V2.
> 
> Could you please check if the following code can find and set/get vector
> correctly?
> 
> Get rx_usecs and tx_usecs per queue
> 
> + if (queue > 0) {
> + if (queue >= vsi->num_queue_pairs) {
> + netif_info(pf, drv, netdev, "Invalid queue number\n");

Adding queue range information here like in the next chunk would be nice.

Otherwise, I think the rest of this looks reasonable.

sln

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [RFC 5/5] i40e/ethtool: support coalesce setting by queue

2015-12-17 Thread Nelson, Shannon
> From: Kan Liang 
> 
> This patch implements set_per_queue_coalesce for i40e driver.
> For i40e driver, only rx and tx usecs has per queue value. Changing
> these two parameters only impact the specific queue. For other interrupt
> coalescing parameters, they are shared among queues. The change to one
> queue will impact all queues.
> 
> Signed-off-by: Kan Liang 
> ---
>  drivers/net/ethernet/intel/i40e/i40e_ethtool.c | 55 +++--
> -
>  1 file changed, 41 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
> b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
> index b41f0be..5a35fdb 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
> @@ -1901,14 +1901,29 @@ static int i40e_get_per_queue_coalesce(struct
> net_device *netdev, int queue,
>   return __i40e_get_coalesce(netdev, ec, queue);
>  }
> 
> -static int i40e_set_coalesce(struct net_device *netdev,
> -  struct ethtool_coalesce *ec)
> +static void i40e_set_itr_for_queue(struct i40e_vsi *vsi, int queue, u16
> vector)
>  {
> - struct i40e_netdev_priv *np = netdev_priv(netdev);
>   struct i40e_q_vector *q_vector;
> - struct i40e_vsi *vsi = np->vsi;
>   struct i40e_pf *pf = vsi->back;
>   struct i40e_hw *hw = &pf->hw;
> + u16 intrl = INTRL_USEC_TO_REG(vsi->int_rate_limit);
> +
> + q_vector = vsi->q_vectors[queue];

Again, the problem here is that there can be more queues than vectors, so 
you'll need to check the requested queue number against vsi->num_queue_pairs 
for the max queue check, then look at vsi->tx_rings[queue].q_vector and/or 
vsi->rx_rings[queue].q_vector to find the actual q_vector for the requested 
queue.

Also, this means that it would be possible for the user to try to assign 
different values to queues on the same vector.  How do you want to handle that? 
 My first inclination is to not do anything and just let the last value 
assigned win.

sln

> + q_vector->rx.itr = ITR_TO_REG(vsi->rx_itr_setting);
> + wr32(hw, I40E_PFINT_ITRN(0, vector - 1), q_vector->rx.itr);
> + q_vector->tx.itr = ITR_TO_REG(vsi->tx_itr_setting);
> + wr32(hw, I40E_PFINT_ITRN(1, vector - 1), q_vector->tx.itr);
> + wr32(hw, I40E_PFINT_RATEN(vector - 1), intrl);
> + i40e_flush(hw);
> +}
> +
> +static int __i40e_set_coalesce(struct net_device *netdev,
> +struct ethtool_coalesce *ec,
> +int queue)
> +{
> + struct i40e_netdev_priv *np = netdev_priv(netdev);
> + struct i40e_vsi *vsi = np->vsi;
> + struct i40e_pf *pf = vsi->back;
>   u16 vector;
>   int i;
> 
> @@ -1964,21 +1979,32 @@ static int i40e_set_coalesce(struct net_device
> *netdev,
>   else
>   vsi->tx_itr_setting &= ~I40E_ITR_DYNAMIC;
> 
> - for (i = 0; i < vsi->num_q_vectors; i++, vector++) {
> - u16 intrl = INTRL_USEC_TO_REG(vsi->int_rate_limit);
> -
> - q_vector = vsi->q_vectors[i];
> - q_vector->rx.itr = ITR_TO_REG(vsi->rx_itr_setting);
> - wr32(hw, I40E_PFINT_ITRN(0, vector - 1), q_vector->rx.itr);
> - q_vector->tx.itr = ITR_TO_REG(vsi->tx_itr_setting);
> - wr32(hw, I40E_PFINT_ITRN(1, vector - 1), q_vector->tx.itr);
> - wr32(hw, I40E_PFINT_RATEN(vector - 1), intrl);
> - i40e_flush(hw);
> + if (queue < 0) {
> + for (i = 0; i < vsi->num_q_vectors; i++, vector++)
> + i40e_set_itr_for_queue(vsi, i, vector);
> + } else {
> + if (queue >= vsi->num_q_vectors) {
> + netif_info(pf, drv, netdev, "Invalid queue value, queue
> range is 0 - %d\n", vsi->num_q_vectors - 1);
> + return -EINVAL;
> + }
> + i40e_set_itr_for_queue(vsi, queue, vector + queue);
>   }
> 
>   return 0;
>  }
> 
> +static int i40e_set_coalesce(struct net_device *netdev,
> +  struct ethtool_coalesce *ec)
> +{
> + return __i40e_set_coalesce(netdev, ec, -1);
> +}
> +
> +static int i40e_set_per_queue_coalesce(struct net_device *netdev, int
> queue,
> +struct ethtool_coalesce *ec)
> +{
> + return __i40e_set_coalesce(netdev, ec, queue);
> +}
> +
>  /**
>   * i40e_get_rss_hash_opts - Get RSS hash Input Set for each flow type
>   * @pf: pointer to the physical function struct
> @@ -2818,6 +2844,7 @@ static const struct ethtool_ops i40e_ethtool_ops = {
>   .get_priv_flags = i40e_get_priv_flags,
>   .set_priv_flags = i40e_set_priv_flags,
>   .get_per_queue_coalesce = i40e_get_per_queue_coalesce,
> + .set_per_queue_coalesce = i40e_set_per_queue_coalesce,
>  };
> 
>  void i40e_set_ethtool_ops(struct net_device *netdev)
> --
> 1.7.11.7

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kern

RE: [RFC 4/5] i40e/ethtool: support coalesce getting by queue

2015-12-17 Thread Nelson, Shannon
> From: Kan Liang 
> 
> This patch implements get_per_queue_coalesce for i40e driver.
> For i40e driver, only rx and tx usecs has per queue value. So only these
> two parameters are read from specific registers. For other interrupt
> coalescing parameters, they are shared among queues. The values which
> are stored in vsi will be return.
> 
> Signed-off-by: Kan Liang 
> ---
>  drivers/net/ethernet/intel/i40e/i40e_ethtool.c | 34
> --
>  1 file changed, 32 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
> b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
> index 3f385ff..b41f0be 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
> @@ -1843,11 +1843,16 @@ static int i40e_set_phys_id(struct net_device
> *netdev,
>   * 125us (8000 interrupts per second) == ITR(62)
>   */
> 
> -static int i40e_get_coalesce(struct net_device *netdev,
> -  struct ethtool_coalesce *ec)
> +static int __i40e_get_coalesce(struct net_device *netdev,
> +struct ethtool_coalesce *ec,
> +int queue)
>  {
>   struct i40e_netdev_priv *np = netdev_priv(netdev);
>   struct i40e_vsi *vsi = np->vsi;
> + struct i40e_pf *pf = vsi->back;
> + struct i40e_hw *hw = &pf->hw;
> + struct i40e_q_vector *q_vector;
> + u16 vector;
> 
>   ec->tx_max_coalesced_frames_irq = vsi->work_limit;
>   ec->rx_max_coalesced_frames_irq = vsi->work_limit;
> @@ -1869,9 +1874,33 @@ static int i40e_get_coalesce(struct net_device
> *netdev,
>   ec->rx_coalesce_usecs_high = vsi->int_rate_limit;
>   ec->tx_coalesce_usecs_high = vsi->int_rate_limit;
> 
> + if (queue > 0) {
> + if (queue >= vsi->num_q_vectors) {
> + netif_info(pf, drv, netdev, "Invalid queue number\n");
> + return -EINVAL;
> + }
> +
> + q_vector = vsi->q_vectors[queue];

The problem here is that there can be more queues than vectors and they're not 
spread round-robin style.  For example, if the VSI was only given 4 vectors, 
but has 8 queues, there will be 2 queues per vector paired up such that V0 <= 
{Q0, Q1}, V1 <= {Q2, Q3} ...

You'll need to check the requested queue number against vsi->num_queue_pairs 
for the max queue check, then look at vsi->tx_rings[queue].q_vector and/or 
vsi->rx_rings[queue].q_vector to find the actual q_vector for the requested 
queue.

sln

> + vector = vsi->base_vector + queue;
> +
> + ec->rx_coalesce_usecs = ITR_REG_TO_USEC(rd32(hw,
> I40E_PFINT_ITRN(0, vector - 1)));
> + ec->tx_coalesce_usecs = ITR_REG_TO_USEC(rd32(hw,
> I40E_PFINT_ITRN(1, vector - 1)));
> + }
>   return 0;
>  }
> 
> +static int i40e_get_coalesce(struct net_device *netdev,
> +  struct ethtool_coalesce *ec)
> +{
> + return __i40e_get_coalesce(netdev, ec, -1);
> +}
> +
> +static int i40e_get_per_queue_coalesce(struct net_device *netdev, int
> queue,
> +struct ethtool_coalesce *ec)
> +{
> + return __i40e_get_coalesce(netdev, ec, queue);
> +}
> +
>  static int i40e_set_coalesce(struct net_device *netdev,
>struct ethtool_coalesce *ec)
>  {
> @@ -2788,6 +2817,7 @@ static const struct ethtool_ops i40e_ethtool_ops = {
>   .get_ts_info= i40e_get_ts_info,
>   .get_priv_flags = i40e_get_priv_flags,
>   .set_priv_flags = i40e_set_priv_flags,
> + .get_per_queue_coalesce = i40e_get_per_queue_coalesce,
>  };
> 
>  void i40e_set_ethtool_ops(struct net_device *netdev)
> --
> 1.7.11.7

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


WARN trace - skb_warn_bad_offload - vxlan - large udp packet - udp checksum disabled

2015-12-14 Thread Nelson, Shannon
Using a slightly modified version of udpspam (see diff below - hopefully not 
mangled by corporate email servers), where I set the SO_NO_CHECK socket option 
and can specify a large buffer size, I can reliably get the following WARN 
trace.  I have reproduced this on both ixgbe and i40e drivers using 
"udpspam-no-check  6000".

It looks to me like this is in the Tx path before we get to the actual NIC 
drivers, but I may be wrong.


[ 1757.644324] [ cut here ]
[ 1757.644333] WARNING: CPU: 22 PID: 5537 at net/core/dev.c:2423 
skb_warn_bad_offload+0x104/0x111()
[ 1757.644340] ixgbe: caps=(0x080660314bb3, 0x) len=6092 
data_len=6000 gso_size=1528 gso_type=1026 ip_summed=0
[ 1757.644343] Modules linked in: nfnetlink_queue nfnetlink_log nfnetlink vxlan 
ip6_udp_tunnel udp_tunnel rfcomm xt_CHECKSUM bnep bluetooth rfkill tun fuse 
ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 
nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_nat 
ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_mangle 
ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_mangle 
iptable_security iptable_raw x86_pkg_temp_thermal coretemp kvm_intel kvm joydev 
iTCO_wdt ipmi_devintf iTCO_vendor_support irqbypass crct10dif_pclmul ixgbe igb 
crc32_pclmul ipmi_si ptp pps_core sb_edac lpc_ich crc32c_intel pcspkr edac_core 
i2c_i801 mfd_core mdio ipmi_msghandler mei_me shpchp tpm_tis ioatdma mei dca 
wmi tpm binfmt_misc uinput mgag200 i2c_algo_bit drm_kms_helper
[ 1757.63]  ttm drm isci libsas firewire_ohci firewire_core i2c_core 
scsi_transport_sas crc_itu_t
[ 1757.644475] CPU: 22 PID: 5537 Comm: udpspam-no-chec Tainted: GW  
 4.4.0-rc3+ #1
[ 1757.644480] Hardware name: Intel Corporation S2600CO/S2600CO, BIOS 
SE5C600.86B.02.03.0003.041920141333 04/19/2014
[ 1757.644482]   c92b03e4 88081907b410 
8138f918
[ 1757.644488]  88081907b458 88081907b448 8109c036 
8808196e9500
[ 1757.644494]  880816f6 0402  
88080e8eb2ac
[ 1757.644499] Call Trace:
[ 1757.644509]  [] dump_stack+0x44/0x5c
[ 1757.644514]  [] warn_slowpath_common+0x86/0xc0
[ 1757.644518]  [] warn_slowpath_fmt+0x5c/0x80
[ 1757.644523]  [] ? ___ratelimit+0x8c/0xf0
[ 1757.644539]  [] skb_warn_bad_offload+0x104/0x111
[ 1757.644549]  [] __skb_gso_segment+0x7f/0xd0
[ 1757.644563]  [] 
validate_xmit_skb.isra.104.part.105+0x11f/0x2a0
[ 1757.644572]  [] validate_xmit_skb_list+0x3b/0x60
[ 1757.644579]  [] sch_direct_xmit+0xc1/0x1f0
[ 1757.644585]  [] __dev_queue_xmit+0x21b/0x510
[ 1757.644589]  [] dev_queue_xmit+0x10/0x20
[ 1757.644593]  [] ip_finish_output2+0x23f/0x310
[ 1757.644598]  [] ip_finish_output+0x139/0x1f0
[ 1757.644605]  [] ? nf_hook_slow+0x76/0xd0
[ 1757.644610]  [] ip_output+0x6e/0xe0
[ 1757.644615]  [] ? __ip_local_out+0x42/0x100
[ 1757.644620]  [] ? ip_fragment.constprop.49+0x80/0x80
[ 1757.644627]  [] ip_local_out+0x35/0x40
[ 1757.644634]  [] iptunnel_xmit+0x12d/0x150
[ 1757.644640]  [] udp_tunnel_xmit_skb+0xea/0x100 [udp_tunnel]
[ 1757.644648]  [] vxlan_xmit_one+0xac6/0x1280 [vxlan]
[ 1757.644659]  [] ? vprintk_emit+0x2f2/0x4f0
[ 1757.644675]  [] ? printk+0x5d/0x74
[ 1757.644681]  [] ? warn_slowpath_common+0x95/0xc0
[ 1757.644688]  [] vxlan_xmit+0x172/0xd44 [vxlan]
[ 1757.644694]  [] ? inet_gso_segment+0x163/0x360
[ 1757.644711]  [] dev_hard_start_xmit+0x22e/0x3b0
[ 1757.644721]  [] __dev_queue_xmit+0x414/0x510
[ 1757.644734]  [] dev_queue_xmit+0x10/0x20
[ 1757.644747]  [] ip_finish_output2+0x23f/0x310
[ 1757.644758]  [] ip_finish_output+0x139/0x1f0
[ 1757.644763]  [] ? nf_hook_slow+0x76/0xd0
[ 1757.644768]  [] ip_output+0x6e/0xe0
[ 1757.644775]  [] ? __ip_local_out+0x42/0x100
[ 1757.644780]  [] ? ip_fragment.constprop.49+0x80/0x80
[ 1757.644785]  [] ip_local_out+0x35/0x40
[ 1757.644793]  [] ip_send_skb+0x19/0x40
[ 1757.644800]  [] udp_send_skb+0x16d/0x270
[ 1757.644807]  [] udp_sendmsg+0x2c8/0x9a0
[ 1757.644812]  [] ? ip_reply_glue_bits+0x60/0x60
[ 1757.644825]  [] inet_sendmsg+0x67/0xa0
[ 1757.644838]  [] sock_sendmsg+0x38/0x50
[ 1757.644852]  [] SYSC_sendto+0x102/0x190
[ 1757.644860]  [] ? __audit_syscall_entry+0xaf/0x100
[ 1757.644867]  [] ? do_audit_syscall_entry+0x66/0x70
[ 1757.644873]  [] ? syscall_trace_enter_phase1+0x11f/0x140
[ 1757.644879]  [] ? syscall_slow_exit_work+0x3f/0x9f
[ 1757.644883]  [] SyS_sendto+0xe/0x10
[ 1757.644890]  [] entry_SYSCALL_64_fastpath+0x12/0x71
[ 1757.644895] ---[ end trace ec9dfd887c59f41f ]---


Here are the udpspam.c diffs from the original that I found at
http://oss.sgi.com/archives/netdev/2001-10/txtmtEDzF63p0.txt


--- udpspam.c   2015-12-14 16:56:18.287053786 -0800
+++ udpspam-no-check.c  2015-12-14 17:02:21.979047972 -0800
@@ -42,8 +42,8 @@ typedef unsigned long dword;
 
 /* Globals */
 
-#define PAYLOAD_SIZE 4
-static char payload[PAYLOAD_SIZE];
+static int payload_size = 4;
+static char *payload;
 
 /* Socket functions 

RE: [PATCH v6] i40e: Look up MAC address in Open Firmware or IDPROM

2015-11-05 Thread Nelson, Shannon
> From: David Miller [mailto:da...@davemloft.net]
> Sent: Thursday, November 05, 2015 8:29 AM
> 
> From: Sowmini Varadhan 
> Date: Thu, 5 Nov 2015 11:28:31 -0500
> 
> > On (11/05/15 11:05), David Miller wrote:
> >> From: David Miller 
> >> Date: Thu, 05 Nov 2015 10:31:26 -0500 (EST)
> >>
> >> > I'll see if I can cook something up.
> >>
> >> How does this look?
> >
> > Looks good to me,
> >
> > Do you want me to respin patch v7 with this? Or update ixgbe/i40e to
> use
> > this later, after this goes in?
> 
> The intention is to let your patch go in as-is, then try and update
> ixgbe/i40e later in net-next or similar.

That works for us as well - thanks!
sln


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v6] i40e: Look up MAC address in Open Firmware or IDPROM

2015-11-04 Thread Nelson, Shannon
> From: Sowmini Varadhan [mailto:sowmini.varad...@oracle.com]
> Sent: Wednesday, November 04, 2015 3:21 PM
> 
> This is the i40e equivalent of commit c762dff24c06 ("ixgbe: Look up MAC
> address in Open Firmware or IDPROM").
> 
> As with that fix, attempt to look up the MAC address in Open Firmware
> on systems that support it, and use IDPROM on SPARC if no OF address
> is found.
> 
> In the case of the i40e there is an assumption that the default mac
> address has already been set up as the primary mac filter on probe,
> so if this filter is obtained from the Open Firmware or IDPROM, an
> explicit write is needed via i40e_aq_mac_address_write() and
> i40e_aq_add_macvlan() invocation.
> 
> Reviewed-by: Martin K. Petersen 
> Signed-off-by: Sowmini Varadhan 
> ---
> v2, v3: Andy Shevchenko comments
> v4: Shannon Nelson review: explicitly set up mac filters before
> register_netdev
> v5: Shannon Nelson code style comments
> v6: Shannon Nelson code style comments
> 
>  drivers/net/ethernet/intel/i40e/i40e_main.c |   83
> ++-
>  1 files changed, 82 insertions(+), 1 deletions(-)
> 

Acked-by: Shannon Nelson 


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v5] i40e: Look up MAC address in Open Firmware or IDPROM

2015-11-04 Thread Nelson, Shannon
> From: Sowmini Varadhan [mailto:sowmini.varad...@oracle.com]
> Sent: Wednesday, November 04, 2015 11:40 AM
> 
> 
> This is the i40e equivalent of commit c762dff24c06 ("ixgbe: Look up MAC
> address in Open Firmware or IDPROM").
> 
> As with that fix, attempt to look up the MAC address in Open Firmware
> on systems that support it, and use IDPROM on SPARC if no OF address
> is found.
> 
> In the case of the i40e there is an assumption that the default mac
> address has already been set up as the primary mac filter on probe,
> so if this filter is obtained from the Open Firmware or IDPROM, an
> explicit write is needed via i40e_aq_mac_address_write() and
> i40e_aq_add_macvlan() invocation.
> 
> Reviewed-by: Martin K. Petersen 
> Signed-off-by: Sowmini Varadhan 
> ---
> v2, v3: Andy Shevchenko comments
> v4: Shannon Nelson review: explicitly set up mac filters before
> register_netdev
> v5: Shannon Nelson code style comments
> 
>  drivers/net/ethernet/intel/i40e/i40e_main.c |   84
> ++-
>  1 files changed, 83 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c
> b/drivers/net/ethernet/intel/i40e/i40e_main.c
> index b825f97..a3883cf 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_main.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
> @@ -24,6 +24,15 @@
>   *
> 
> **
> /
> 
> +#include 
> +#include 
> +#include 
> +
> +#ifdef CONFIG_SPARC
> +#include 
> +#include 
> +#endif
> +
>  /* Local includes */
>  #include "i40e.h"
>  #include "i40e_diag.h"
> @@ -9213,6 +9222,44 @@ static struct i40e_vsi
> *i40e_vsi_reinit_setup(struct i40e_vsi *vsi)
>  }
> 
>  /**
> + * i40e_macaddr_init - explicitly write the mac address filters. This
> + * is needed when the macaddr has been obtained by other means than
> + * the default, e.g., from Open Firmware or IDPROM.

Note that this should be a simple single line, function name and short summary; 
anything more detailed goes into a description after the variables.


[...]

> 
>  /**
> + * i40e_get_platform_mac_addr - get mac address from Open Firmware
> + * or IDPROM if supported by the platform

Again, single line.

Thanks for your work on this, Sowmini.  If you can do a quick repost with these 
little function header comment bits tweaked, I'm willing to ACK this patch and 
I think we'll be ready for Jeff to include it into his tree.

sln

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v5] i40e: Look up MAC address in Open Firmware or IDPROM

2015-11-04 Thread Nelson, Shannon
> From: Andy Shevchenko [mailto:andy.shevche...@gmail.com]
> Sent: Wednesday, November 04, 2015 11:59 AM
> 
> On Wed, Nov 4, 2015 at 9:39 PM, Sowmini Varadhan
>  wrote:
> >
> > This is the i40e equivalent of commit c762dff24c06 ("ixgbe: Look up MAC
> > address in Open Firmware or IDPROM").

[...]

> > +   }
> > +
> > +   memset(&element, 0, sizeof(element));
> > +   ether_addr_copy(element.mac_addr, macaddr);
> > +   element.flags = cpu_to_le16(I40E_AQC_MACVLAN_ADD_PERFECT_MATCH);
> > +   ret = i40e_aq_add_macvlan(&vsi->back->hw, vsi->seid, &element,
> 1, NULL);
> > +   aq_err = vsi->back->hw.aq.asq_last_status;
> 
> Do you really need a separate variable (aq_err)?

These are two separate error values that we're tracking - one from the 
communication between the driver and the firmware (aq_err) and one from the 
driver activity.  Sometimes there may be an AQ error that we want to report, 
but it might not actually be a driver error.  Alternatively, there are times 
when the AQ error needs to get interpreted different ways depending on which 
task the driver is performing.  Lastly, the AQ error gives us more detail on 
whatever the transaction error may have been which gives us more useful debug 
info.

sln


RE: [PATCH v3 net] i40e: Look up MAC address in Open Firmware or IDPROM

2015-11-02 Thread Nelson, Shannon


> -Original Message-
> From: Sowmini Varadhan [mailto:sowmini.varad...@oracle.com]
> Sent: Sunday, November 01, 2015 4:07 PM
> 
> On (11/01/15 21:03), Nelson, Shannon wrote:
> > .. In the meantime, be sure to test what happens over a reset, such as
> what
> > happens when the MTU is changed.  This will make sure that the replay
> > of mac and vlan filters happens correctly.  You'll want to test this
> > with and without vlans.
> 
> I assume you mean .1q (aka linux macvlan) as opposed to access/trunk
> vlans?

Yes, this is what I had in mind.

> I will test that tomorrow but I did a quick sanity check on mtu, as well
> as turning tso on/off which also restarts the driver (I believe), and
> it was "fine", i.e., able to ping offlink hosts.
> 
> --Sowmini

Great, thanks.

sln
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v4 RFC net] i40e: Look up MAC address in Open Firmware or IDPROM

2015-11-02 Thread Nelson, Shannon


> -Original Message-
> From: Sowmini Varadhan [mailto:sowmini.varad...@oracle.com]
> Sent: Sunday, November 01, 2015 8:25 AM
> 
> This is the i40e equivalent of commit c762dff24c06 ("ixgbe: Look up MAC
> address in Open Firmware or IDPROM").
> 
> As with that fix, attempt to look up the MAC address in Open Firmware
> on systems that support it, and use IDPROM on SPARC if no OF address
> is found.
> 
> In the case of the i40e there is an assumption that the default mac
> address has already been set up as the primary mac filter on probe,
> so if this filter is obtained from the Open Firmware or IDPROM, an
> explicit write is needed via i40e_aq_mac_address_write() and
> i40e_aq_add_macvlan() invocation.
> 
> Reviewers: please check if invoking i40e_macaddr_init() on
> platforms that use the default mac address (i.e., when it is not from
> OF or idprom) will cause harm, and if it is necessary/possible to
> move this invocation to an earlier point in i40e_probe().
> 
> Reviewed-by: Martin K. Petersen 
> Signed-off-by: Sowmini Varadhan 
> ---
> v2, v3: Andy Shevchenko comments
> v4:

Thanks, Sowmini, this looks reasonable to me, aside from a few simple stylistic 
comments below.

sln

> 
>  drivers/net/ethernet/intel/i40e/i40e_main.c |   53
> ++-
>  1 files changed, 52 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c
> b/drivers/net/ethernet/intel/i40e/i40e_main.c
> index b825f97..3c81c0c 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_main.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
> @@ -24,6 +24,15 @@
>   *
> 
> **
> /
> 
> +#include 
> +#include 
> +#include 
> +
> +#ifdef CONFIG_SPARC
> +#include 
> +#include 
> +#endif
> +
>  /* Local includes */
>  #include "i40e.h"
>  #include "i40e_diag.h"
> @@ -9212,6 +9221,25 @@ static struct i40e_vsi
> *i40e_vsi_reinit_setup(struct i40e_vsi *vsi)
>   return NULL;
>  }
> 

Please add a function comment in the style of the rest of the code.

> +static int i40e_macaddr_init( struct i40e_vsi *vsi, u8 *macaddr)

No space between '(' and 'struct'

> +{
> + int ret;
> + struct i40e_aqc_add_macvlan_element_data element;
> +
> + ret = i40e_aq_mac_address_write(&vsi->back->hw,
> + I40E_AQC_WRITE_TYPE_LAA_WOL,
> + macaddr, NULL);
> + if (ret)
> + return -EADDRNOTAVAIL;

Please report the AQ error here as we do elsewhere in the code

> +
> + memset(&element, 0, sizeof(element));
> + ether_addr_copy(element.mac_addr, macaddr);
> + element.flags = cpu_to_le16(I40E_AQC_MACVLAN_ADD_PERFECT_MATCH);
> + i40e_aq_add_macvlan(&vsi->back->hw, vsi->seid, &element, 1, NULL);

Please check for and report any AQ error returned from this call as we do 
elsewhere in the code.

> +
> + return ret;
> +}
> +
>  /**
>   * i40e_vsi_setup - Set up a VSI by a given type
>   * @pf: board private structure
> @@ -9341,6 +9369,9 @@ struct i40e_vsi *i40e_vsi_setup(struct i40e_pf *pf,
> u8 type,
>   ret = i40e_config_netdev(vsi);
>   if (ret)
>   goto err_netdev;
> + ret = i40e_macaddr_init(vsi, pf->hw.mac.addr);
> + if (ret)
> + goto err_netdev;
>   ret = register_netdev(vsi->netdev);
>   if (ret)
>   goto err_netdev;
> @@ -10162,6 +10193,24 @@ static void i40e_print_features(struct i40e_pf
> *pf)
>   kfree(string);
>  }
> 

Function comment needed

> +static int i40e_get_platform_mac_addr(struct pci_dev *pdev, u8 *mac_addr)
> +{
> + struct device_node *dp = pci_device_to_OF_node(pdev);
> + const unsigned char *addr;
> +
> + addr = of_get_mac_address(dp);
> + if (addr) {
> + ether_addr_copy(mac_addr, addr);
> + return 0;
> + }
> +#ifdef CONFIG_SPARC
> + ether_addr_copy(mac_addr, idprom->id_ethaddr);
> + return 0;
> +#else
> + return -EINVAL;
> +#endif /* CONFIG_SPARC */
> +}
> +
>  /**
>   * i40e_probe - Device initialization routine
>   * @pdev: PCI device information struct
> @@ -10360,7 +10409,9 @@ static int i40e_probe(struct pci_dev *pdev, const
> struct pci_device_id *ent)
>   i40e_aq_stop_lldp(hw, true, NULL);
>   }
> 
> - i40e_get_mac_addr(hw, hw->mac.addr);
> + err = i40e_get_platform_mac_addr(pdev, hw->mac.addr);
> + if (err)
> + i40e_get_mac_addr(hw, hw->mac.addr);
>   if (!is_valid_ether_addr(hw->mac.addr)) {
>   dev_info(&pdev->dev, "invalid MAC address %pM\n", hw-
> >mac.addr);
>   err = -EIO;
> --
> 1.7.1

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v3 net] i40e: Look up MAC address in Open Firmware or IDPROM

2015-11-01 Thread Nelson, Shannon
> -Original Message-
> From: Sowmini Varadhan [mailto:sowmini.varad...@oracle.com]
> Sent: Sunday, November 01, 2015 8:24 AM
> 

[...]

> So I figured out why it all "seemed to work" - my test env had another
> obscure init process that was marking the link promiscuous.  I guess
> that was having the side-effect of somehow setting the filters above.
> 
> But looks like there's more to getting this right than just calling
> i40e_aq_mac_address_write() - I think it also needs a
> i40e_aq_add_macvlan().
> 
> I was able to get this to work by calling a the core part of
> i40e_set_mac just before register_netdev. In my patch (RFC patch
> in a separate thread - please review) I now have this sequence in
> i40e_probe
> 
>   err = i40e_get_platform_mac_addr(pdev, hw->mac.addr);
>   if (err)
>   i40e_get_mac_addr(hw, hw->mac.addr);
>:
>   i40e_setup_pf_switch(..);
> 
> And the resulting i40e_vsi_setup() from i40e_setup_pf_switch()
> will end up doing the right thing by invoking the guts of
> i40e_set_mac(), which is basically the  sequence:
>   i40e_aq_mac_address_write()
>   i40e_aq_add_macvlan()
> 
> I dont know if it is necessary/possible/important to set up the
> filters sooner in the sequence- the add_macvlan needs an "seid",
> and I could not tell when (in the ":" code above) the right seid
> can be found.
> 
> Please review the RFC patch I'll be sending shortly.
> 
> --Sowmini

Yeah, because of the underlying HW we need to manage, and the fact that we 
can't ask it for the filters it knows, it becomes a bit convoluted.  To manage 
and replay the filter lists, we use the i40e_{add,del}_filter() routines on the 
individual VSI filter lists.  I'm thinking you're on the right track, but we 
may not want to bypass the VSI's filter list.

My brain's not in gear this weekend so I'll review the patch later.  In the 
meantime, be sure to test what happens over a reset, such as what happens when 
the MTU is changed.  This will make sure that the replay of mac and vlan 
filters happens correctly.  You'll want to test this with and without vlans.

sln

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v3 net] i40e: Look up MAC address in Open Firmware or IDPROM

2015-10-30 Thread Nelson, Shannon


> -Original Message-
> From: Sowmini Varadhan [mailto:sowmini.varad...@oracle.com]
> Sent: Friday, October 30, 2015 12:25 PM
> 
> On (10/30/15 18:57), Nelson, Shannon wrote:
> > > >
> > > > Going along with this being the equivalent of the ixgbe patch, I'd
> > > > prefer the new code to be in i40e_main.c, rather than in i40e_common.c.
> > > > In the design of our drivers, the common file is essentially a device
> > > > specific layer, and the OS and platform related stuff should stay in
> > > > i40e_main.c.  That would also take care of one of the include file nits
> > > > that Andy mentioned.
>:
> > I'm not sure about any deeper wisdom, but the HW/FW model that this
> > device uses is rather different from our previous devices, so our SW
> > abstractions are a little different from ixgbe and igb.
> 
> Ok, in that case I can make i40e_main.c to do something like
> 
> static int i40e_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
> {
>:
>if (i40e_get_platform_mac_addr(..) != I40E_SUCCESS)
>   i40e_get_mac_addr(..);
>:
> }
> 
> and i agree that doing this from i40e_main will simplify a lot of the
> other stuff around getting the pdev etc.

The more common idiom in our driver would be

err = i40e_get_platform_mac_addr(..);
if (err) {
...
}

> 
> [Discussion about ndo_set_mac_address..]
> 
> > In the cases in which I'm aware, the platform manufacturer normally will
> > burn a different mac-address into the device's eeprom at manufacturing
> > time if they want something other than what came from Intel.  I suppose a
> > Device Tree oriented platform could have a different address in the DT,
> > but somehow that needs to get communicated to the device driver so that
> > it can tell the HW.
> >
> > My question to you remains: does this ndo_set_mac_address happen from
> > the stack above the driver or not?
> 
> I'm probably even less clued in to the DT arch than you in this case,
> so input from the intel experts would be useful here.

This worries me - if you're not clued into the DT architecture, why are you 
making these changes?  Have you tested this beyond a compile?  Do you have a DT 
model to try this against?

> 
> But both in this case, and for the ixgbe template on which I tried
> to model this, the OF/idprom probing happens from the ->probe when the
> driver comes up, and ndo_set_mac_address is not involved.
> 
> I dont know if it is easier (or even feasible to do this from ->probe)
> to just call the i40e_set_mac to make sure all the state-bits in
> the driver are correctly set.
> 
> --Sowmini

In looking at a couple other drivers, I see the difference being that they 
typically are writing the primary mac filter on probe (and any other reset), 
whereas the i40e "knows" that the default mac address is already set up as the 
filter and doesn't bother with a redundant write.  If you want to add this Open 
Filter code, you'll need to arrange for this write to happen.  You can't call 
i40e_set_mac() to do it, but you can see the i40e_aq_mac_address_write() code 
there that is involved in updating the mac address as an example.  You probably 
will want to look at section 4.2.1.5.3 of the XL710 data sheet in order to know 
how to use i40e_aq_mac_address_write() for your situation.

sln

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v3 net] i40e: Look up MAC address in Open Firmware or IDPROM

2015-10-30 Thread Nelson, Shannon
> -Original Message-
> From: Sowmini Varadhan [mailto:sowmini.varad...@oracle.com]
> Sent: Friday, October 30, 2015 11:37 AM
> To: 
> Subject: Re: [PATCH v3 net] i40e: Look up MAC address in Open Firmware or
> IDPROM
> 
> On (10/30/15 18:28), Nelson, Shannon wrote:
> >
> > Going along with this being the equivalent of the ixgbe patch, I'd
> > prefer the new code to be in i40e_main.c, rather than in i40e_common.c.
> > In the design of our drivers, the common file is essentially a device
> > specific layer, and the OS and platform related stuff should stay in
> > i40e_main.c.  That would also take care of one of the include file nits
> > that Andy mentioned.
> 
> ok, and that was my initial instinct as well, except that I noticed
> that the existing i40e code tries interesting variations from the
> typical intel driver model by calling i40e_get_mac_addr() which lives in
> (for reasons not obvious to me) i40e_common.c
> 
> So I assumed there must be some other deeper wisdom at work here.

I'm not sure about any deeper wisdom, but the HW/FW model that this device uses 
is rather different from our previous devices, so our SW abstractions are a 
little different from ixgbe and igb.

> 
> > At the risk of flying my Device Tree ignorance for all to see, I have
> > a couple questions on how this is used.
> >
> > Since the mac address is specific to the individual device, how does it
> > get into the device tree?  I thought the device tree was compiled ahead
> > of time for the platform it is used on.  Is this a generic DT pattern
> > just in case some platform actually has the mac-address?  Or does the
> > device mac-address get saved into the DT on the first time through this
> > path so it can be found next time?
> >
> > If the Device Tree has a different mac address than the HW, when
> > does the kernel tell the HW to use a different mac address?  Does the
> > DT management eventually call the ndo_set_mac_address call so the HW
> > knows to use a different mac address?
> 
> yes, and here I was hoping for some feedback from the intel folks as
> well. Commit c762dff24c06 sets hw->mac.perm_addr. I dont know
> if there is some similar i40e state that needs to be set.
> 
> Please share insights.
> 
> --Sowmini

Forgive me if you're already up with this, I just want to be sure we're 
thinking along the same lines.

The HW has a mac address burned into its own eeprom which it will use by 
default.  The driver merely queries the HW to find out what that mac address is 
and report it to the OS so it can be registered correctly in the netdev.

If a different mac address is desired, perhaps from the user issuing an "ip 
link" or ifconfig command, the OS will call the netdev's ndo_set_mac_address to 
tell the driver.  In our case, the driver then tells the HW device to filter 
for the new mac address rather than the eeprom'd address.  This remains until a 
reset when the HW goes back to its eeprom'd address.

In the cases in which I'm aware, the platform manufacturer normally will burn a 
different mac-address into the device's eeprom at manufacturing time if they 
want something other than what came from Intel.  I suppose a Device Tree 
oriented platform could have a different address in the DT, but somehow that 
needs to get communicated to the device driver so that it can tell the HW.

My question to you remains: does this ndo_set_mac_address happen from the stack 
above the driver or not?

Thanks,
sln


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v3 net] i40e: Look up MAC address in Open Firmware or IDPROM

2015-10-30 Thread Nelson, Shannon
> -Original Message-
> From: Sowmini Varadhan [mailto:sowmini.varad...@oracle.com]
> Sent: Friday, October 30, 2015 8:04 AM
> To: 
> Subject: [PATCH v3 net] i40e: Look up MAC address in Open Firmware or IDPROM
> 
> 
> This is the i40e equivalent of commit c762dff24c06 ("ixgbe: Look up MAC
> address in Open Firmware or IDPROM").
> 
> As with that fix, attempt to look up the MAC address in Open Firmware
> on systems that support it, and use IDPROM on SPARC if no OF address
> is found.

Going along with this being the equivalent of the ixgbe patch, I'd prefer the 
new code to be in i40e_main.c, rather than in i40e_common.c.  In the design of 
our drivers, the common file is essentially a device specific layer, and the OS 
and platform related stuff should stay in i40e_main.c.  That would also take 
care of one of the include file nits that Andy mentioned.


At the risk of flying my Device Tree ignorance for all to see, I have a couple 
questions on how this is used.

Since the mac address is specific to the individual device, how does it get 
into the device tree?  I thought the device tree was compiled ahead of time for 
the platform it is used on.  Is this a generic DT pattern just in case some 
platform actually has the mac-address?  Or does the device mac-address get 
saved into the DT on the first time through this path so it can be found next 
time?

If the Device Tree has a different mac address than the HW, when does the 
kernel tell the HW to use a different mac address?  Does the DT management 
eventually call the ndo_set_mac_address call so the HW knows to use a different 
mac address?

Thanks,
sln


> 
> Reviewed-by: Martin K. Petersen 
> Signed-off-by: Sowmini Varadhan 
> ---
> v2: andy shevchenko comments
> v3: more andy shevchenko comments
>  drivers/net/ethernet/intel/i40e/i40e_common.c |   30 
> +
>  1 files changed, 30 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_common.c
> b/drivers/net/ethernet/intel/i40e/i40e_common.c
> index 2d74c6e..3edfe19 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_common.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_common.c
> @@ -24,6 +24,14 @@
>   *
> 
> **/
> 
> +#include 
> +#include 
> +#include 
> +#ifdef CONFIG_SPARC
> +#include 
> +#include 
> +#endif
> +#include "i40e.h"
>  #include "i40e_type.h"
>  #include "i40e_adminq.h"
>  #include "i40e_prototype.h"
> @@ -1008,6 +1016,25 @@ i40e_status i40e_aq_mac_address_write(struct i40e_hw
> *hw,
>   return status;
>  }
> 
> +static int i40e_get_platform_mac_addr(struct i40e_hw *hw, u8 *mac_addr)
> +{
> + struct i40e_pf *pf = hw->back;
> + struct device_node *dp = pci_device_to_OF_node(pf->pdev);
> + const unsigned char *addr;
> +
> + addr = of_get_mac_address(dp);
> + if (addr) {
> + ether_addr_copy(mac_addr, addr);
> + return 0;
> + }
> +
> +#ifdef CONFIG_SPARC
> + ether_addr_copy(mac_addr, idprom->id_ethaddr);
> + return 0;
> +#endif /* CONFIG_SPARC */
> + return 1;
> +}
> +
>  /**
>   * i40e_get_mac_addr - get MAC address
>   * @hw: pointer to the HW structure
> @@ -1021,6 +1048,9 @@ i40e_status i40e_get_mac_addr(struct i40e_hw *hw, u8
> *mac_addr)
>   i40e_status status;
>   u16 flags = 0;
> 
> + if (!i40e_get_platform_mac_addr(hw, mac_addr))
> + return I40E_SUCCESS;
> +
>   status = i40e_aq_mac_address_read(hw, &flags, &addrs, NULL);
> 
>   if (flags & I40E_AQC_LAN_ADDR_VALID)
> --
> 1.7.1

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH] intel: i40e: fix confused code

2015-10-20 Thread Nelson, Shannon
> -Original Message-
> From: Rasmus Villemoes [mailto:li...@rasmusvillemoes.dk]
> Sent: Saturday, October 17, 2015 1:58 PM
> Subject: [PATCH] intel: i40e: fix confused code
> 
> This code is pretty confused. The variable name 'bytes_not_copied'
> clearly indicates that the programmer knew the semantics of
> copy_{to,from}_user, but then the return value is checked for being
> negative and used as a -Exxx return value.
> 
> I'm not sure this is the proper fix, but at least we get rid of the
> dead code which pretended to check for access faults.
> 
> Signed-off-by: Rasmus Villemoes 
> ---

Acked-by: Shannon Nelson 


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH] intel: i40e: fix confused code

2015-10-20 Thread Nelson, Shannon
> From: Rasmus Villemoes [mailto:li...@rasmusvillemoes.dk]
> Sent: Tuesday, October 20, 2015 12:22 AM
> 
> On Mon, Oct 19 2015, "Nelson, Shannon"  wrote:
> 
> >> From: Rasmus Villemoes [mailto:li...@rasmusvillemoes.dk]
> >> Sent: Saturday, October 17, 2015 1:58 PM
> >> Subject: [PATCH] intel: i40e: fix confused code
> >>
> >> This code is pretty confused. The variable name 'bytes_not_copied'
> >> clearly indicates that the programmer knew the semantics of
> >> copy_{to,from}_user, but then the return value is checked for being
> >> negative and used as a -Exxx return value.
> >>
> >> I'm not sure this is the proper fix, but at least we get rid of the
> >> dead code which pretended to check for access faults.
> >>
> >> Signed-off-by: Rasmus Villemoes 
> >
> > I believe this patch is unnecessary: if the value is negative, then it
> > already is an error code giving some potentially useful information.
> > When I dig into the copy_to_user() code, I see in the comments for
> > put_user() that -EFAULT is the error being returned.
> 
> Thanks, this was precisely the kind of confusion I'm talking about:
> copy_{from,to}_user _never_ returns a negative value. It returns
> precisely what the very explicit variable name hints.
> 
> This is in contrast to the single-scalar functions get_user/put_user,
> which do return -EFAULT for error and 0 for success.
> 
> (See also lines 479-519 of Documentation/DocBook/kernel-hacking.tmpl).

I like the comment about the moronic interface for copy_to/from_user...

Yes, I see where I turned left instead of right.  This would be good to fix up.

sln
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH] intel: i40e: fix confused code

2015-10-19 Thread Nelson, Shannon
> From: Rasmus Villemoes [mailto:li...@rasmusvillemoes.dk]
> Sent: Saturday, October 17, 2015 1:58 PM
> Subject: [PATCH] intel: i40e: fix confused code
> 
> This code is pretty confused. The variable name 'bytes_not_copied'
> clearly indicates that the programmer knew the semantics of
> copy_{to,from}_user, but then the return value is checked for being
> negative and used as a -Exxx return value.
> 
> I'm not sure this is the proper fix, but at least we get rid of the
> dead code which pretended to check for access faults.
> 
> Signed-off-by: Rasmus Villemoes 

I believe this patch is unnecessary: if the value is negative, then it already 
is an error code giving some potentially useful information.  When I dig into 
the copy_to_user() code, I see in the comments for put_user() that -EFAULT is 
the error being returned.  Also, if somewhere else along the line there is some 
other error, I'd prefer to return that value rather than stomp on it with my 
own error code.  

sln

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH -mm] [RFC] I/OAT: Handle incoming udp through ioatdma

2007-11-29 Thread Nelson, Shannon
>From: J Hadi Salim [mailto:[EMAIL PROTECTED] On Behalf Of jamal
>
>On Thu, 2007-29-11 at 12:08 -0800, Nelson, Shannon wrote:
>> [RFC] I/OAT: Handle incoming udp through ioatdma
>> 
>> From: Shannon Nelson <[EMAIL PROTECTED]>
>> 
>> If the incoming udp packet is larger than 
>sysctl_udp_dma_copybreak, try
>> pushing it through the ioatdma asynchronous memcpy.  This is 
>very much
>> the
>> same as the tcp copy offload.  This is an RFC because we 
>know there are
>> stability problems under high traffic.
>
>
>What stability problems?

Under a heavy stress test combining TCP and UDP traffic we would get a
kernel panic from a NULL dereference in dma_unpin_iovec_pages().  Remove
this patch and the panic goes away.  Unfortunately, this problem is
below our priority line so it has received little attention since then.
We know of interest in this patch, however, so decided to release it
into the wild and see if it garners any other attention.

Part of the panic message:

Unable to handle kernel NULL pointer dereference at 
RIP: 
 [] set_page_dirty_lock+0xe/0x3a
PGD 2b91f067 PUD 2a04b067 PMD 0 
Oops: 0002 [1] SMP 
CPU 5 
Modules linked in: ioatdma dca igb i2c_dev i2c_core e1000
Pid: 10998, comm: netserver Not tainted 2.6.22.9_CB-2.05_patched #1
RIP: 0010:[]  []
set_page_dirty_lock+0xe/0x3a
RSP: 0018:810028fedb68  EFLAGS: 00010246
RAX:  RBX: 81003afea648 RCX: 81002a382b88
RDX: 810028fedfd8 RSI: 0282 RDI: 
RBP:  R08: 0001 R09: 
R10: 806c13e0 R11: 0246 R12: 
R13: 0001 R14:  R15: 81003afea660
FS:  2b23ba4177c0() GS:810001164e40()
knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2:  CR3: 29831000 CR4: 06e0
Process netserver (pid: 10998, threadinfo 810028fec000, task
81003a881590)
Stack:  810039887c80 81003afea648 81003afea640
8048545a
 fff4 810028fedf38 81003afea658 81003afea670
 81003afea640 80485657 81003afea660 
Call Trace:
 [] dma_unpin_iovec_pages+0x31/0x6e
 [] dma_pin_iovec_pages+0x1c0/0x1d9
 [] udp_recvmsg+0x94/0x43e
 [] sock_common_recvmsg+0x30/0x45
 [] sock_recvmsg+0xd5/0xed
 [] mutex_lock+0xd/0x1e
 [] autoremove_wake_function+0x0/0x2e
 [] find_get_page+0x21/0x50
 [] filemap_nopage+0x180/0x2b0
 [] __handle_mm_fault+0x404/0x9fc
 [] getnstimeofday+0x32/0x8d
 [] getnstimeofday+0x32/0x8d
 [] sys_recvfrom+0xe2/0x130
 [] enqueue_hrtimer+0x64/0x6b
 [] hrtimer_start+0xf2/0x104
 [] do_setitimer+0x15e/0x329
 [] alarm_setitimer+0x35/0x65
 [] system_call+0x7e/0x83


Code: f0 0f ba 6d 00 00 19 c0 85 c0 74 08 48 89 ef e8 89 ce ff ff 
RIP  [] set_page_dirty_lock+0xe/0x3a
 RSP 
CR2: 

>
>Is there some magic sysctl_udp_dma_copybreak threshold value where you
>start seeing the benefit of IOAT-ing? Since you mentioned
>"students", it would be interesting to see data where
>udp starts benefitting.

As I said, this is low on our priority list, so this data has not been
gathered.

>cheers,
>jamal

Thanks for your interest.
sln
--
==
Mr. Shannon Nelson LAN Access Division, Intel Corp.
[EMAIL PROTECTED]I don't speak for Intel
(503) 712-7659Parents can't afford to be squeamish.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH -mm] [RFC] I/OAT: Handle incoming udp through ioatdma

2007-11-29 Thread Nelson, Shannon
[RFC] I/OAT: Handle incoming udp through ioatdma

From: Shannon Nelson <[EMAIL PROTECTED]>

If the incoming udp packet is larger than sysctl_udp_dma_copybreak, try
pushing it through the ioatdma asynchronous memcpy.  This is very much
the
same as the tcp copy offload.  This is an RFC because we know there are
stability problems under high traffic.

This code was originally proposed by the Capstone students at Portland
State University: Aaron Armstrong, Greg Nishikawa, Sean Gayner, Toai
Nguyen,
Stephen Bekefi, and Derek Chiles.

Signed-off-by: Shannon Nelson <[EMAIL PROTECTED]>
---

 include/net/udp.h   |5 +++
 net/core/user_dma.c |1 +
 net/ipv4/udp.c  |   79
---
 3 files changed, 81 insertions(+), 4 deletions(-)

diff --git a/include/net/udp.h b/include/net/udp.h
index 98755eb..d5e05d8 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -173,4 +173,9 @@ extern void udp_proc_unregister(struct
udp_seq_afinfo *afinfo);
 extern int  udp4_proc_init(void);
 extern void udp4_proc_exit(void);
 #endif
+
+#ifdef CONFIG_NET_DMA
+extern int sysctl_udp_dma_copybreak;
+#endif
+
 #endif /* _UDP_H */
diff --git a/net/core/user_dma.c b/net/core/user_dma.c
index 0ad1cd5..e876ca4 100644
--- a/net/core/user_dma.c
+++ b/net/core/user_dma.c
@@ -34,6 +34,7 @@
 #define NET_DMA_DEFAULT_COPYBREAK 4096
 
 int sysctl_tcp_dma_copybreak = NET_DMA_DEFAULT_COPYBREAK;
+int sysctl_udp_dma_copybreak = NET_DMA_DEFAULT_COPYBREAK;
 
 /**
  * dma_skb_copy_datagram_iovec - Copy a datagram to an iovec.
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 69d4bd1..3b6d91c 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -102,6 +102,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include "udp_impl.h"
 
 /*
@@ -819,6 +821,11 @@ int udp_recvmsg(struct kiocb *iocb, struct sock
*sk, struct msghdr *msg,
unsigned int ulen, copied;
int err;
int is_udplite = IS_UDPLITE(sk);
+#ifdef CONFIG_NET_DMA
+   struct dma_chan *dma_chan = NULL;
+   struct dma_pinned_list  *pinned_list = NULL;
+   dma_cookie_tdma_cookie = 0;
+#endif
 
/*
 *  Check any passed addresses
@@ -829,6 +836,18 @@ int udp_recvmsg(struct kiocb *iocb, struct sock
*sk, struct msghdr *msg,
if (flags & MSG_ERRQUEUE)
return ip_recv_error(sk, msg, len);
 
+#ifdef CONFIG_NET_DMA
+   preempt_disable();
+   if ((len > sysctl_udp_dma_copybreak) && 
+   !(flags & MSG_PEEK) &&
+   __get_cpu_var(softnet_data).net_dma) {
+
+   preempt_enable_no_resched();
+   pinned_list = dma_pin_iovec_pages(msg->msg_iov, len);
+   } else
+   preempt_enable_no_resched();
+#endif
+
 try_again:
skb = skb_recv_datagram(sk, flags, noblock, &err);
if (!skb)
@@ -852,10 +871,30 @@ try_again:
goto csum_copy_err;
}
 
-   if (skb_csum_unnecessary(skb))
-   err = skb_copy_datagram_iovec(skb, sizeof(struct
udphdr),
- msg->msg_iov, copied
);
-   else {
+   if (skb_csum_unnecessary(skb)) {
+#ifdef CONFIG_NET_DMA
+   if (pinned_list && !dma_chan)
+   dma_chan = get_softnet_dma();
+   if (dma_chan) {
+   dma_cookie = dma_skb_copy_datagram_iovec(
+   dma_chan, skb, sizeof(struct
udphdr),
+   msg->msg_iov, copied,
pinned_list);
+   if (dma_cookie < 0) {
+   printk(KERN_ALERT "dma_cookie < 0\n");
+
+   /* Exception. Bailout! */
+   if (!copied)
+   copied = -EFAULT;
+   goto out_free;
+   }
+   err = 0;
+   }
+   else
+#endif
+   err = skb_copy_datagram_iovec(skb,
+ sizeof(struct
udphdr),
+ msg->msg_iov,
copied);
+   } else {
err = skb_copy_and_csum_datagram_iovec(skb,
sizeof(struct udphdr), msg->msg_iov);
 
if (err == -EINVAL)
@@ -882,6 +921,35 @@ try_again:
if (flags & MSG_TRUNC)
err = ulen;
 
+#ifdef CONFIG_NET_DMA
+   if (dma_chan) {
+   struct sk_buff *skb;
+   dma_cookie_t done, used;
+
+   dma_async_memcpy_issue_pending(dma_chan);
+
+   while (dma_async_memcpy_complete(dma_chan, dma_cookie,
&done,
+   &used) == DMA_IN_PROGRESS) {
+   /* do partial cleanup of sk_async_wait_queue */
+   while ((skb =
skb_peek(&sk->sk_async_wait_queue)) &&
+
(dma_async_is_complete(skb->dma_cooki

RE: [2.6 patch] unexport softnet_data

2007-11-07 Thread Nelson, Shannon
>From: David Miller [mailto:[EMAIL PROTECTED] 
>Sent: Tuesday, November 06, 2007 11:41 PM
>
>From: "Nelson, Shannon" <[EMAIL PROTECTED]>
>Date: Fri, 26 Oct 2007 08:38:55 -0700
>
>> There is no creation of a pinned_list yet in this path, so I 
>don't think
>> this would do us much good.
>
>The pinned list is created in the generic net/ipv4/tcp.c code, not in
>the address family specific code.
>
>So your comments do not apply as a reason for this patch to not go in.
>My patch has been in Linus's tree for weeks, and until you show me a
>real reason why the check shouldn't be there in the ipv6 path, it's
>staying.

Understood.

Thanks,
sln
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [2.6 patch] unexport softnet_data

2007-10-26 Thread Nelson, Shannon
>-Original Message-
>From: David Miller [mailto:[EMAIL PROTECTED] 
>
>From: Adrian Bunk <[EMAIL PROTECTED]>
>Date: Wed, 24 Oct 2007 18:24:25 +0200
>
>> The EXPORT_PER_CPU_SYMBOL(softnet_data) is no longer used.
>> 
>> Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>
>
>I wanted to apply this, but in validing the patch I noticed
>what appears to be an omission in TCP ipv6.
>
>It seems that NET_DMA support there is only half-cooked and
>the following patch is needed (and thus there is a modular
>use of softnet_data again).
>
>Looking at Christopher Leech's original TCP I/O AT commit:
>
>1a2449a87bb7606113b1aa1a9d3c3e78ef189a1c
>
>this appears to just be an oversight.
>
>If one of the current I/O AT folks can look this over and
>confirm I'd appreciate it.
>
>Thanks!
>
>[TCP]: Add missing I/O AT code to ipv6 side.
>
>Signed-off-by: David S. Miller <[EMAIL PROTECTED]>
>
>diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
>index 32dc329..06fa4ba 100644
>--- a/net/ipv6/tcp_ipv6.c
>+++ b/net/ipv6/tcp_ipv6.c
>@@ -1732,6 +1732,8 @@ process:
>   if (!sock_owned_by_user(sk)) {
> #ifdef CONFIG_NET_DMA
>   struct tcp_sock *tp = tcp_sk(sk);
>+  if (!tp->ucopy.dma_chan && tp->ucopy.pinned_list)
>+  tp->ucopy.dma_chan = get_softnet_dma();
>   if (tp->ucopy.dma_chan)
>   ret = tcp_v6_do_rcv(sk, skb);
>   else
>

There is no creation of a pinned_list yet in this path, so I don't think
this would do us much good.  Anyway, I believe the next bit is simply
checking to see if we can process it right away (if we have a dma
channel or a reader task is waiting) otherwise stick it into backlog.
Once we go to process it, if it is an IP packet we'll drop into the ipv4
processing code and end up in tcp_recv() which can create the
pinned_list and process from there.

I'm sure someone will correct me if I'm wrong, but I believe this bit it
not needed.  I suggest a NAK for this patch.

sln
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: net-2.6.24 plans

2007-09-17 Thread Nelson, Shannon
>From: [EMAIL PROTECTED] 
>[mailto:[EMAIL PROTECTED] On Behalf Of David Miller
>Sent: Monday, September 17, 2007 2:18 PM
>To: netdev@vger.kernel.org
>Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]; 
>[EMAIL PROTECTED]
>Subject: Re: net-2.6.24 plans
[...]
>
>Andrew, I removed the troublesome IOAT patch.  The only conflicts
>and fixes you should need at this point are:
>
[...]

Which IOAT patch are you referring to?

sln
--
==
Mr. Shannon Nelson LAN Access Division, Intel Corp.
[EMAIL PROTECTED]I don't speak for Intel
(503) 712-7659Parents can't afford to be squeamish.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] IOAT: Fix ioatdma descriptor cache miss

2007-08-14 Thread Nelson, Shannon
 
(copying from linux-kernel to net-dev)

The layout for struct ioat_desc_sw is non-optimal and causes an extra
cache hit for every descriptor processed.  By tightening up the struct
layout and removing one item, we pull in the fields that get used in
the speedpath and get a little better performance.


Before:
---
struct ioat_desc_sw {
struct ioat_dma_descriptor * hw; /* 0 8
*/
struct list_head   node; /* 816
*/
inttx_cnt;   /*24 4
*/

/* XXX 4 bytes hole, try to pack */

dma_addr_t src;  /*32 8
*/
__u32  src_len;  /*40 4
*/

/* XXX 4 bytes hole, try to pack */

dma_addr_t dst;  /*48 8
*/
__u32  dst_len;  /*56 4
*/

/* XXX 4 bytes hole, try to pack */

/* --- cacheline 1 boundary (64 bytes) --- */
struct dma_async_tx_descriptor async_tx; /*64   144
*/
/* --- cacheline 3 boundary (192 bytes) was 16 bytes ago --- */

/* size: 208, cachelines: 4 */
/* sum members: 196, holes: 3, sum holes: 12 */
/* last cacheline: 16 bytes */
};  /* definitions: 1 */


After:
--

struct ioat_desc_sw {
struct ioat_dma_descriptor * hw; /* 0 8
*/
struct list_head   node; /* 816
*/
inttx_cnt;   /*24 4
*/
__u32  len;  /*28 4
*/
dma_addr_t src;  /*32 8
*/
dma_addr_t dst;  /*40 8
*/
struct dma_async_tx_descriptor async_tx; /*48   144
*/
/* --- cacheline 3 boundary (192 bytes) --- */

/* size: 192, cachelines: 3 */
};  /* definitions: 1 */


Signed-off-by: Shannon Nelson <[EMAIL PROTECTED]>
---

 drivers/dma/ioatdma.c |7 +++
 drivers/dma/ioatdma.h |3 +--
 2 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/dma/ioatdma.c b/drivers/dma/ioatdma.c
index 5fbe56b..2d1f178 100644
--- a/drivers/dma/ioatdma.c
+++ b/drivers/dma/ioatdma.c
@@ -347,8 +347,7 @@ ioat_dma_prep_memcpy(struct dma_chan *chan, size_t
len, int int_en)
new->async_tx.ack = 0; /* client is in control of this ack */
new->async_tx.cookie = -EBUSY;
 
-   pci_unmap_len_set(new, src_len, orig_len);
-   pci_unmap_len_set(new, dst_len, orig_len);
+   pci_unmap_len_set(new, len, orig_len);
spin_unlock_bh(&ioat_chan->desc_lock);
 
return new ? &new->async_tx : NULL;
@@ -423,11 +422,11 @@ static void ioat_dma_memcpy_cleanup(struct
ioat_dma_chan *chan)
*/
pci_unmap_page(chan->device->pdev,
pci_unmap_addr(desc, dst),
-   pci_unmap_len(desc, dst_len),
+   pci_unmap_len(desc, len),
PCI_DMA_FROMDEVICE);
pci_unmap_page(chan->device->pdev,
pci_unmap_addr(desc, src),
-   pci_unmap_len(desc, src_len),
+   pci_unmap_len(desc, len),
PCI_DMA_TODEVICE);
}
 
diff --git a/drivers/dma/ioatdma.h b/drivers/dma/ioatdma.h
index d372647..bf4dad7 100644
--- a/drivers/dma/ioatdma.h
+++ b/drivers/dma/ioatdma.h
@@ -111,10 +111,9 @@ struct ioat_desc_sw {
struct ioat_dma_descriptor *hw;
struct list_head node;
int tx_cnt;
+   DECLARE_PCI_UNMAP_LEN(len)
DECLARE_PCI_UNMAP_ADDR(src)
-   DECLARE_PCI_UNMAP_LEN(src_len)
DECLARE_PCI_UNMAP_ADDR(dst)
-   DECLARE_PCI_UNMAP_LEN(dst_len)
struct dma_async_tx_descriptor async_tx;
 };
 
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: NET_DMA: where do we ever call dma_skb_copy_datagram_iovec() with NULL pinned_list?

2007-07-24 Thread Nelson, Shannon
Al Viro:
>
>   AFAICS, all callers of dma_skb_copy_datagram_iovec()
>are either
>   * recursive for fragments, pass pinned_list unchanged or
>   * called from tcp, with pinned_list coming from
>tp->ucopy.pinned_list and only when tp->ucopy.dma_chan is non-NULL.
>
>Now, all non-NULL assignments to ->dma_chan have the same form:
>   if (!tp->ucopy.dma_chan && tp->ucopy.pinned_list)
>   tp->ucopy.dma_chan = get_softnet_dma();
>IOW, if ->ucopy.pinned_list stays NULL, ->ucopy.dma_chan will 
>do the same.
>
>Moreover, any place that resets ->ucopy.pinned_list will also reset
>->ucopy.dma_chan.
>
>IOW, we can't ever get non-NULL tp->ucopy.dma_chan while 
>tp->ucopy.pinned_list
>is NULL.  So how can we ever get to the dma_memcpy_to_kernel_iovec()?

It looks like this is "extra" code.  The history seems to be that it was
thought to be useful for internal copying, perhaps for smbfs or iSCSI,
as the comment suggests.  However, since no one is using it, it can
probably come out.  If there is no argument, I'll post a patch to remove
it.

sln
--
==
Mr. Shannon Nelson LAN Access Division, Intel Corp.
[EMAIL PROTECTED]I don't speak for Intel
(503) 712-7659Parents can't afford to be squeamish.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html