date:20160405

Re: [RFC PATCH 4/5] mlx4: add support for fast rx drop bpf program

2016-04-05 Thread Brenden Blanco

On Tue, Apr 05, 2016 at 05:15:20PM +0300, Or Gerlitz wrote:
> On 4/4/2016 9:50 PM, Alexei Starovoitov wrote:
> >On Mon, Apr 04, 2016 at 08:22:03AM -0700, Eric Dumazet wrote:
> >>A single flow is able to use 40Gbit on those 40Gbit NIC, so there is not
> >>a single 10GB trunk used for a given flow.
> >>
> >>This 14Mpps thing seems to be a queue limitation on mlx4.
> >yeah, could be queueing related. Multiple cpus can send ~30Mpps of the same
> >64 byte packet, but mlx4 can only receive 14.5Mpps. Odd.
> >
> >Or (and other mellanox guys), what is really going on inside 40G nic?
> 
> Hi Alexei,
> 
> Not that I know everything that goes inside there, and not that if I
> knew it all I could have posted that here (I heard HWs sometimes
> have IP)... but, anyway, as for your questions:
> 
> ConnectX3 40Gbs NICs can receive more than 10Gbs worth of packets
> (14.5M) in a single ring, and Mellanox 100Gbs NICs can receive more
> than 25Gbs worth of packets (37.5M) in a single ring. People who use
> DPDK (...) do see these numbers, and AFAIU we are now attempting to see
> that in the kernel with XDP :)
> 
> I realize that we might have some issues with the mlx4 driver's
> reporting of HW drops. Eran (cc-ed) and co. are looking into that.
Thanks!
> 
> In parallel, I would suggest some experiments that might shed more
> light. On the TX side, run
> 
> $ ./pktgen_sample03_burst_single_flow.sh -i $DEV -d $IP -m $MAC -t 4
> 
> On the RX side, skip RSS and force the packets that match that
> traffic pattern to go to (say) ring (==action) 0:
> 
> $ ethtool -U $DEV flow-type ip4 dst-mac $MAC dst-ip $IP action 0 loc 0

I added the module parameter:
  options mlx4_core log_num_mgm_entry_size=-1
With this I was able to reach >20 Mpps, regardless of the ethtool
settings mentioned above.

 25.31%  ksoftirqd/0  [mlx4_en]         [k] mlx4_en_process_rx_cq
 20.18%  ksoftirqd/0  [mlx4_en]         [k] mlx4_en_alloc_frags
  8.42%  ksoftirqd/0  [mlx4_en]         [k] mlx4_en_free_frag
  5.59%  swapper      [kernel.vmlinux]  [k] poll_idle
  5.38%  ksoftirqd/0  [kernel.vmlinux]  [k] get_page_from_freelist
  3.06%  ksoftirqd/0  [mlx4_en]         [k] mlx4_call_bpf
  2.73%  ksoftirqd/0  [mlx4_en]         [k] 0x0001cf94
  2.72%  ksoftirqd/0  [kernel.vmlinux]  [k] free_pages_prepare
  2.19%  ksoftirqd/0  [kernel.vmlinux]  [k] percpu_array_map_lookup_elem
  2.08%  ksoftirqd/0  [kernel.vmlinux]  [k] sk_load_byte_positive_offset
  1.72%  ksoftirqd/0  [kernel.vmlinux]  [k] free_one_page
  1.59%  ksoftirqd/0  [kernel.vmlinux]  [k] bpf_map_lookup_elem
  1.30%  ksoftirqd/0  [mlx4_en]         [k] 0x0001cfc1
  1.07%  ksoftirqd/0  [kernel.vmlinux]  [k] __alloc_pages_nodemask
  1.00%  ksoftirqd/0  [mlx4_en]         [k] mlx4_alloc_pages.isra.23

> 
> To go back to RSS, remove the rule:
> 
> $ ethtool -U $DEV delete action 0
> 
> FWIW (not that I see how it helps you now), you can do a HW drop on
> the RX side with ring -1:
> 
> $ ethtool -U $DEV flow-type ip4 dst-mac $MAC dst-ip $IP action -1 loc 0
> 
> Or.
> 

Here also is the output from the two machines using a tool to get
ethtool delta stats at 1 second intervals:

--- sender ---
         tx_packets: 20,246,059
           tx_bytes: 1,214,763,540 bps = 9,267.91 Mbps
          xmit_more: 19,463,226
      queue_stopped: 36,982
         wake_queue: 36,982
           rx_pause: 6,351
  tx_pause_duration: 124,974
tx_pause_transition: 3,176
  tx_novlan_packets: 20,244,344
    tx_novlan_bytes: 1,295,629,440 bps = 9,884.86 Mbps
        tx0_packets: 5,151,029
          tx0_bytes: 309,061,680 bps = 2,357.95 Mbps
        tx1_packets: 5,094,532
          tx1_bytes: 305,671,920 bps = 2,332.9 Mbps
        tx2_packets: 5,130,996
          tx2_bytes: 307,859,760 bps = 2,348.78 Mbps
        tx3_packets: 5,135,513
          tx3_bytes: 308,130,780 bps = 2,350.85 Mbps
               UP 0: 9,389.68 Mbps = 100.00%
               UP 0: 20,512,070 Tran/sec = 100.00%

--- receiver ---
         rx_packets: 20,207,929
           rx_bytes: 1,212,475,740 bps = 9,250.45 Mbps
         rx_dropped: 236,604
  rx_pause_duration: 128,436
rx_pause_transition: 3,258
           tx_pause: 6,516
  rx_novlan_packets: 20,208,906
    rx_novlan_bytes: 1,293,369,984 bps = 9,867.62 Mbps
        rx0_packets: 20,444,526
          rx0_bytes: 1,226,671,560 bps = 9,358.76 Mbps

[net-next 02/14] i40e: Enable Geneve offload for FW API ver > 1.4 for XL710/X710 devices

2016-04-05 Thread Jeff Kirsher

From: Anjali Singhai Jain 

This patch enables the capability to do Geneve Rx offload on XL710/X710
devices with FW API version higher than 1.4.

Change-ID: I9a8f87772c48d7d67dc85e3701d2e0b845034c0b
Signed-off-by: Anjali Singhai Jain 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 297fd39..fdcb50a 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -9158,6 +9158,12 @@ static int i40e_config_netdev(struct i40e_vsi *vsi)
I40E_VLAN_ANY, false, true);
spin_unlock_bh(>mac_filter_list_lock);
}
+   } else if ((pf->hw.aq.api_maj_ver > 1) ||
+  ((pf->hw.aq.api_maj_ver == 1) &&
+   (pf->hw.aq.api_min_ver > 4))) {
+   /* Supported in FW API version higher than 1.4 */
+   pf->flags |= I40E_FLAG_GENEVE_OFFLOAD_CAPABLE;
+   pf->auto_disable_flags = I40E_FLAG_HW_ATR_EVICT_CAPABLE;
} else {
/* relate the VSI_VMDQ name to the VSI_MAIN name */
snprintf(netdev->name, IFNAMSIZ, "%sv%%d",
-- 
2.5.5

[net-next 01/14] i40e: remove redundant check on vsi->active_vlans

2016-04-05 Thread Jeff Kirsher

From: Colin King 

active_vlans is an unsigned long array, hence a null check on this
array is superfluous and can be removed.

Detected with static analysis by smatch:

drivers/net/ethernet/intel/i40e/i40e_debugfs.c:386
  i40e_dbg_dump_vsi_seid() warn: this array is probably
  non-NULL. 'vsi->active_vlans'

Signed-off-by: Colin Ian King 
Acked-by: Shannon Nelson 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_debugfs.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
index 0c97733..83dccf1 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
@@ -147,9 +147,8 @@ static void i40e_dbg_dump_vsi_seid(struct i40e_pf *pf, int seid)
dev_info(>pdev->dev, "vlan_features = 0x%08lx\n",
 (unsigned long int)nd->vlan_features);
}
-   if (vsi->active_vlans)
-   dev_info(>pdev->dev,
-"vlgrp: & = %p\n", vsi->active_vlans);
+   dev_info(>pdev->dev,
+"vlgrp: & = %p\n", vsi->active_vlans);
dev_info(>pdev->dev,
 "state = %li flags = 0x%08lx, netdev_registered = %i, current_netdev_flags = 0x%04x\n",
 vsi->state, vsi->flags,
-- 
2.5.5

[net-next 10/14] i40e: Fix for supported link modes in 10GBaseT PHY's

2016-04-05 Thread Jeff Kirsher

From: Avinash Dayanand 

100baseT/Full is now listed as a supported link mode for the 10GBaseT PHY.
This fix lists all of the link modes supported by the 10GBaseT PHY.

Change-ID: If2be3212ef0fef85fd5d6e4550c7783de2f915e9
Signed-off-by: Avinash Dayanand 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index 410d237..8a83d45 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -313,6 +313,13 @@ static void i40e_get_settings_link_up(struct i40e_hw *hw,
ecmd->advertising |= ADVERTISED_1baseT_Full;
if (hw_link_info->requested_speeds & I40E_LINK_SPEED_1GB)
ecmd->advertising |= ADVERTISED_1000baseT_Full;
+   /* adding 100baseT support for 10GBASET_PHY */
+   if (pf->flags & I40E_FLAG_HAVE_10GBASET_PHY) {
+   ecmd->supported |= SUPPORTED_100baseT_Full;
+   ecmd->advertising |= ADVERTISED_100baseT_Full |
+ADVERTISED_1000baseT_Full |
+ADVERTISED_1baseT_Full;
+   }
break;
case I40E_PHY_TYPE_1000BASE_T_OPTICAL:
ecmd->supported = SUPPORTED_Autoneg |
@@ -325,6 +332,15 @@ static void i40e_get_settings_link_up(struct i40e_hw *hw,
  SUPPORTED_100baseT_Full;
if (hw_link_info->requested_speeds & I40E_LINK_SPEED_100MB)
ecmd->advertising |= ADVERTISED_100baseT_Full;
+   /* firmware detects 10G phy as 100M phy at 100M speed */
+   if (pf->flags & I40E_FLAG_HAVE_10GBASET_PHY) {
+   ecmd->supported |= SUPPORTED_1baseT_Full |
+  SUPPORTED_1000baseT_Full;
+   ecmd->advertising |= ADVERTISED_Autoneg |
+ADVERTISED_100baseT_Full |
+ADVERTISED_1000baseT_Full |
+ADVERTISED_1baseT_Full;
+   }
break;
case I40E_PHY_TYPE_10GBASE_CR1_CU:
case I40E_PHY_TYPE_10GBASE_CR1:
-- 
2.5.5

[net-next 05/14] i40e: Add new device ID for X722

2016-04-05 Thread Jeff Kirsher

From: Catherine Sullivan 

The new device ID is 0x37D3 and it should follow the same flows and
branding string as for 0x37D0.

Change-ID: Ia5ad4a1910268c4666a3fd46a7afffbec55b4fc2
Signed-off-by: Catherine Sullivan 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_common.c   | 1 +
 drivers/net/ethernet/intel/i40e/i40e_devids.h   | 1 +
 drivers/net/ethernet/intel/i40e/i40e_main.c | 1 +
 drivers/net/ethernet/intel/i40evf/i40e_common.c | 1 +
 drivers/net/ethernet/intel/i40evf/i40e_devids.h | 1 +
 5 files changed, 5 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_common.c b/drivers/net/ethernet/intel/i40e/i40e_common.c
index 8276a13..ebcc0d3 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_common.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_common.c
@@ -60,6 +60,7 @@ static i40e_status i40e_set_mac_type(struct i40e_hw *hw)
case I40E_DEV_ID_SFP_X722:
case I40E_DEV_ID_1G_BASE_T_X722:
case I40E_DEV_ID_10G_BASE_T_X722:
+   case I40E_DEV_ID_SFP_I_X722:
hw->mac.type = I40E_MAC_X722;
break;
default:
diff --git a/drivers/net/ethernet/intel/i40e/i40e_devids.h b/drivers/net/ethernet/intel/i40e/i40e_devids.h
index 99257fc..dd4457d 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_devids.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_devids.h
@@ -44,6 +44,7 @@
 #define I40E_DEV_ID_SFP_X722   0x37D0
 #define I40E_DEV_ID_1G_BASE_T_X722 0x37D1
 #define I40E_DEV_ID_10G_BASE_T_X7220x37D2
+#define I40E_DEV_ID_SFP_I_X722 0x37D3
 
 #define i40e_is_40G_device(d)  ((d) == I40E_DEV_ID_QSFP_A  || \
 (d) == I40E_DEV_ID_QSFP_B  || \
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index fdcb50a..73d4bea 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -90,6 +90,7 @@ static const struct pci_device_id i40e_pci_tbl[] = {
{PCI_VDEVICE(INTEL, I40E_DEV_ID_SFP_X722), 0},
{PCI_VDEVICE(INTEL, I40E_DEV_ID_1G_BASE_T_X722), 0},
{PCI_VDEVICE(INTEL, I40E_DEV_ID_10G_BASE_T_X722), 0},
+   {PCI_VDEVICE(INTEL, I40E_DEV_ID_SFP_I_X722), 0},
{PCI_VDEVICE(INTEL, I40E_DEV_ID_20G_KR2), 0},
{PCI_VDEVICE(INTEL, I40E_DEV_ID_20G_KR2_A), 0},
/* required last entry */
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_common.c b/drivers/net/ethernet/intel/i40evf/i40e_common.c
index 771ac6a..4db0c03 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_common.c
+++ b/drivers/net/ethernet/intel/i40evf/i40e_common.c
@@ -58,6 +58,7 @@ i40e_status i40e_set_mac_type(struct i40e_hw *hw)
case I40E_DEV_ID_SFP_X722:
case I40E_DEV_ID_1G_BASE_T_X722:
case I40E_DEV_ID_10G_BASE_T_X722:
+   case I40E_DEV_ID_SFP_I_X722:
hw->mac.type = I40E_MAC_X722;
break;
case I40E_DEV_ID_X722_VF:
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_devids.h b/drivers/net/ethernet/intel/i40evf/i40e_devids.h
index ca8b58c..7023570 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_devids.h
+++ b/drivers/net/ethernet/intel/i40evf/i40e_devids.h
@@ -44,6 +44,7 @@
 #define I40E_DEV_ID_SFP_X722   0x37D0
 #define I40E_DEV_ID_1G_BASE_T_X722 0x37D1
 #define I40E_DEV_ID_10G_BASE_T_X7220x37D2
+#define I40E_DEV_ID_SFP_I_X722 0x37D3
 #define I40E_DEV_ID_X722_VF0x37CD
 #define I40E_DEV_ID_X722_VF_HV 0x37D9
 
-- 
2.5.5

[net-next 11/14] i40e: Lower some message levels

2016-04-05 Thread Jeff Kirsher

From: Mitch Williams 

These conditions can happen any time VFs are enabled or disabled and are
not really indicative of fatal problems unless they happen continuously.

Lower the log level so that people don't get scared.

Change-ID: I1ceb4adbd10d03cbeed54d1f5b7f20d60328351d
Signed-off-by: Mitch Williams 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index 169c256..9924503 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -1232,8 +1232,8 @@ static int i40e_vc_send_msg_to_vf(struct i40e_vf *vf, u32 v_opcode,
/* single place to detect unsuccessful return values */
if (v_retval) {
vf->num_invalid_msgs++;
-   dev_err(>pdev->dev, "VF %d failed opcode %d, error: %d\n",
-   vf->vf_id, v_opcode, v_retval);
+   dev_info(>pdev->dev, "VF %d failed opcode %d, retval: %d\n",
+vf->vf_id, v_opcode, v_retval);
if (vf->num_invalid_msgs >
I40E_DEFAULT_NUM_INVALID_MSGS_ALLOWED) {
dev_err(>pdev->dev,
@@ -1251,9 +1251,9 @@ static int i40e_vc_send_msg_to_vf(struct i40e_vf *vf, u32 v_opcode,
aq_ret = i40e_aq_send_msg_to_vf(hw, abs_vf_id,  v_opcode, v_retval,
msg, msglen, NULL);
if (aq_ret) {
-   dev_err(>pdev->dev,
-   "Unable to send the message to VF %d aq_err %d\n",
-   vf->vf_id, pf->hw.aq.asq_last_status);
+   dev_info(>pdev->dev,
+"Unable to send the message to VF %d aq_err %d\n",
+vf->vf_id, pf->hw.aq.asq_last_status);
return -EIO;
}
 
-- 
2.5.5

[net-next 09/14] i40evf: Fix get_rss_aq

2016-04-05 Thread Jeff Kirsher

From: Catherine Sullivan 

We were passing in the seed where we should just be passing false,
because we want the VSI table, not the PF table.

Change-ID: I9b633ab06eb59468087f0c0af8539857e99f9495
Signed-off-by: Catherine Sullivan 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40evf/i40evf_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
index 6561a33..2d1fe56 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
@@ -1341,7 +1341,7 @@ static int i40evf_get_rss_aq(struct i40e_vsi *vsi, const u8 *seed,
}
 
if (lut) {
-   ret = i40evf_aq_get_rss_lut(hw, vsi->id, seed, lut, lut_size);
+   ret = i40evf_aq_get_rss_lut(hw, vsi->id, false, lut, lut_size);
if (ret) {
dev_err(>pdev->dev,
"Cannot get RSS lut, err %s aq_err %s\n",
-- 
2.5.5

[net-next 08/14] i40e: Disable link polling

2016-04-05 Thread Jeff Kirsher

From: Shannon Nelson 

Periodic link polling was added when the link events were found not to be
trustworthy.  This was the case early on, but was likely because the link
event mask was being used incorrectly.  As this has been fixed in recent
code, we can disable the link polling to lessen the AQ traffic.

Change-ID: Id890b5ee3c2d04381fc76ffa434777644f5d8eb0
Signed-off-by: Shannon Nelson 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 73d4bea..184f3f9 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -8448,7 +8448,6 @@ static int i40e_sw_init(struct i40e_pf *pf)
/* Set default capability flags */
pf->flags = I40E_FLAG_RX_CSUM_ENABLED |
I40E_FLAG_MSI_ENABLED |
-   I40E_FLAG_LINK_POLLING_ENABLED |
I40E_FLAG_MSIX_ENABLED;
 
if (iommu_present(_bus_type))
-- 
2.5.5

[net-next 04/14] i40evf: Fix VLAN features

2016-04-05 Thread Jeff Kirsher

From: Mitch Williams 

Users of ethtool were being given the mistaken impression that this
driver was able to change its VLAN tagging features, and were
disappointed that this was not actually the case. Implement the
ndo_fix_features method so that we can adjust these flags as needed to
avoid false impressions.

Change-ID: I08584f103a4fa73d6a4128d472e4ef44dcfda57f
Signed-off-by: Mitch Williams 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40evf/i40evf_main.c | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
index e397368..2d018b4 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
@@ -2252,6 +2252,28 @@ static int i40evf_change_mtu(struct net_device *netdev, int new_mtu)
return 0;
 }
 
+#define I40EVF_VLAN_FEATURES (NETIF_F_HW_VLAN_CTAG_TX |\
+ NETIF_F_HW_VLAN_CTAG_RX |\
+ NETIF_F_HW_VLAN_CTAG_FILTER)
+
+/**
+ * i40evf_fix_features - fix up the netdev feature bits
+ * @netdev: our net device
+ * @features: desired feature bits
+ *
+ * Returns fixed-up features bits
+ **/
+static netdev_features_t i40evf_fix_features(struct net_device *netdev,
+netdev_features_t features)
+{
+   struct i40evf_adapter *adapter = netdev_priv(netdev);
+
+   features &= ~I40EVF_VLAN_FEATURES;
+   if (adapter->vf_res->vf_offload_flags & I40E_VIRTCHNL_VF_OFFLOAD_VLAN)
+   features |= I40EVF_VLAN_FEATURES;
+   return features;
+}
+
 static const struct net_device_ops i40evf_netdev_ops = {
.ndo_open   = i40evf_open,
.ndo_stop   = i40evf_close,
@@ -2264,6 +2286,7 @@ static const struct net_device_ops i40evf_netdev_ops = {
.ndo_tx_timeout = i40evf_tx_timeout,
.ndo_vlan_rx_add_vid= i40evf_vlan_rx_add_vid,
.ndo_vlan_rx_kill_vid   = i40evf_vlan_rx_kill_vid,
+   .ndo_fix_features   = i40evf_fix_features,
 #ifdef CONFIG_NET_POLL_CONTROLLER
.ndo_poll_controller= i40evf_netpoll,
 #endif
-- 
2.5.5

[net-next 12/14] i40e: Request PHY media event at reset time

2016-04-05 Thread Jeff Kirsher

From: Shannon Nelson 

Add the Media Not Available flag to the link event mask.  It seems
that this event arrives first when a DA cable is pulled out, but there
is no follow-up Link Down event; if you're not looking for MEDIA_NA,
you get no event at all, even though the link is now down.

Change-ID: cb3340a2849805bb881f64f6f2ae810eef46eba7
Signed-off-by: Shannon Nelson 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 184f3f9..d2c0106 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -6859,6 +6859,7 @@ static void i40e_reset_and_rebuild(struct i40e_pf *pf, bool reinit)
 */
ret = i40e_aq_set_phy_int_mask(>hw,
   ~(I40E_AQ_EVENT_LINK_UPDOWN |
+I40E_AQ_EVENT_MEDIA_NA |
 I40E_AQ_EVENT_MODULE_QUAL_FAIL), NULL);
if (ret)
dev_info(>pdev->dev, "set phy mask fail, err %s aq_err %s\n",
@@ -11070,6 +11071,7 @@ static int i40e_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 */
err = i40e_aq_set_phy_int_mask(>hw,
   ~(I40E_AQ_EVENT_LINK_UPDOWN |
+I40E_AQ_EVENT_MEDIA_NA |
 I40E_AQ_EVENT_MODULE_QUAL_FAIL), NULL);
if (err)
dev_info(>pdev->dev, "set phy mask fail, err %s aq_err %s\n",
-- 
2.5.5

[net-next 07/14] i40evf: Add longer wait after remove module

2016-04-05 Thread Jeff Kirsher

From: Mitch Williams 

Upon module remove, wait a little longer after requesting a reset before
checking to see if the firmware responded. This change prevents double
resets when the firmware is busy.

Change-ID: Ieedc988ee82fac1f32a074bf4d9e4dba426bfa58
Signed-off-by: Mitch Williams 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40evf/i40evf_main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
index 2d018b4..6561a33 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
@@ -2854,11 +2854,11 @@ static void i40evf_remove(struct pci_dev *pdev)
adapter->state = __I40EVF_REMOVE;
adapter->aq_required = 0;
i40evf_request_reset(adapter);
-   msleep(20);
+   msleep(50);
/* If the FW isn't responding, kick it once, but only once. */
if (!i40evf_asq_done(hw)) {
i40evf_request_reset(adapter);
-   msleep(20);
+   msleep(50);
}
 
if (adapter->msix_entries) {
-- 
2.5.5

[net-next 03/14] i40e: Remove unused variable

2016-04-05 Thread Jeff Kirsher

From: Mitch Williams 

This variable is vestigial, a remnant of the primordial code from which
this driver spawned. We can safely remove it.

Change-ID: I24e0fe338e7c7c50d27dc5515564f33caefbb93a
Signed-off-by: Mitch Williams 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index 47b9e62..150002e 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -1311,8 +1311,8 @@ static int i40e_vc_get_vf_resources_msg(struct i40e_vf *vf, u8 *msg)
struct i40e_pf *pf = vf->pf;
i40e_status aq_ret = 0;
struct i40e_vsi *vsi;
-   int i = 0, len = 0;
int num_vsis = 1;
+   int len = 0;
int ret;
 
if (!test_bit(I40E_VF_STAT_INIT, >vf_states)) {
@@ -1374,15 +1374,14 @@ static int i40e_vc_get_vf_resources_msg(struct i40e_vf *vf, u8 *msg)
vfres->num_queue_pairs = vf->num_queue_pairs;
vfres->max_vectors = pf->hw.func_caps.num_msix_vectors_vf;
if (vf->lan_vsi_idx) {
-   vfres->vsi_res[i].vsi_id = vf->lan_vsi_id;
-   vfres->vsi_res[i].vsi_type = I40E_VSI_SRIOV;
-   vfres->vsi_res[i].num_queue_pairs = vsi->alloc_queue_pairs;
+   vfres->vsi_res[0].vsi_id = vf->lan_vsi_id;
+   vfres->vsi_res[0].vsi_type = I40E_VSI_SRIOV;
+   vfres->vsi_res[0].num_queue_pairs = vsi->alloc_queue_pairs;
/* VFs only use TC 0 */
-   vfres->vsi_res[i].qset_handle
+   vfres->vsi_res[0].qset_handle
  = le16_to_cpu(vsi->info.qs_handle[0]);
-   ether_addr_copy(vfres->vsi_res[i].default_mac_addr,
+   ether_addr_copy(vfres->vsi_res[0].default_mac_addr,
vf->default_lan_addr.addr);
-   i++;
}
set_bit(I40E_VF_STAT_ACTIVE, >vf_states);
 
-- 
2.5.5

[net-next 14/14] i40e/i40evf: Fix TSO checksum pseudo-header adjustment

2016-04-05 Thread Jeff Kirsher

From: Alexander Duyck 

With IPv4 and IPv6 now using the same format for checksums based on the
length of the frame, we need to update the i40e and i40evf drivers so that
they correctly account for lengths greater than or equal to 64K.

With this patch the driver should now correctly update checksums for frames
up to 16776960 bytes in length, which should be more than large enough for
all possible TSO frames in the near future.

Signed-off-by: Alexander Duyck 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 11 ---
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c | 11 ---
 2 files changed, 8 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 5bef5b0..5d5fa53 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -2304,10 +2304,8 @@ static int i40e_tso(struct i40e_ring *tx_ring, struct sk_buff *skb,
l4_offset = l4.hdr - skb->data;
 
/* remove payload length from outer checksum */
-   paylen = (__force u16)l4.udp->check;
-   paylen += ntohs((__force __be16)1) *
-   (u16)~(skb->len - l4_offset);
-   l4.udp->check = ~csum_fold((__force __wsum)paylen);
+   paylen = skb->len - l4_offset;
+   csum_replace_by_diff(>check, htonl(paylen));
}
 
/* reset pointers to inner headers */
@@ -2327,9 +2325,8 @@ static int i40e_tso(struct i40e_ring *tx_ring, struct sk_buff *skb,
l4_offset = l4.hdr - skb->data;
 
/* remove payload length from inner checksum */
-   paylen = (__force u16)l4.tcp->check;
-   paylen += ntohs((__force __be16)1) * (u16)~(skb->len - l4_offset);
-   l4.tcp->check = ~csum_fold((__force __wsum)paylen);
+   paylen = skb->len - l4_offset;
+   csum_replace_by_diff(>check, htonl(paylen));
 
/* compute length of segmentation header */
*hdr_len = (l4.tcp->doff * 4) + l4_offset;
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
index 570348d..04aabc5 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
@@ -1571,10 +1571,8 @@ static int i40e_tso(struct i40e_ring *tx_ring, struct sk_buff *skb,
l4_offset = l4.hdr - skb->data;
 
/* remove payload length from outer checksum */
-   paylen = (__force u16)l4.udp->check;
-   paylen += ntohs((__force __be16)1) *
-   (u16)~(skb->len - l4_offset);
-   l4.udp->check = ~csum_fold((__force __wsum)paylen);
+   paylen = skb->len - l4_offset;
+   csum_replace_by_diff(>check, htonl(paylen));
}
 
/* reset pointers to inner headers */
@@ -1594,9 +1592,8 @@ static int i40e_tso(struct i40e_ring *tx_ring, struct sk_buff *skb,
l4_offset = l4.hdr - skb->data;
 
/* remove payload length from inner checksum */
-   paylen = (__force u16)l4.tcp->check;
-   paylen += ntohs((__force __be16)1) * (u16)~(skb->len - l4_offset);
-   l4.tcp->check = ~csum_fold((__force __wsum)paylen);
+   paylen = skb->len - l4_offset;
+   csum_replace_by_diff(>check, htonl(paylen));
 
/* compute length of segmentation header */
*hdr_len = (l4.tcp->doff * 4) + l4_offset;
-- 
2.5.5

[net-next 13/14] i40e/i40evf: Bump patch from 1.5.1 to 1.5.2

2016-04-05 Thread Jeff Kirsher

From: Avinash Dayanand 

Signed-off-by: Avinash Dayanand 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 2 +-
 drivers/net/ethernet/intel/i40evf/i40evf_main.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index d2c0106..d6147f8 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -46,7 +46,7 @@ static const char i40e_driver_string[] =
 
 #define DRV_VERSION_MAJOR 1
 #define DRV_VERSION_MINOR 5
-#define DRV_VERSION_BUILD 1
+#define DRV_VERSION_BUILD 2
 #define DRV_VERSION __stringify(DRV_VERSION_MAJOR) "." \
 __stringify(DRV_VERSION_MINOR) "." \
 __stringify(DRV_VERSION_BUILD)DRV_KERN
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
index 2d1fe56..f4dada0 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
@@ -38,7 +38,7 @@ static const char i40evf_driver_string[] =
 
 #define DRV_VERSION_MAJOR 1
 #define DRV_VERSION_MINOR 5
-#define DRV_VERSION_BUILD 1
+#define DRV_VERSION_BUILD 2
 #define DRV_VERSION __stringify(DRV_VERSION_MAJOR) "." \
 __stringify(DRV_VERSION_MINOR) "." \
 __stringify(DRV_VERSION_BUILD) \
-- 
2.5.5

[net-next 06/14] i40e: Make VF resets more reliable

2016-04-05 Thread Jeff Kirsher

From: Mitch Williams 

Clear the VFLR bit immediately after triggering a reset instead of
waiting until after cleanup is complete. Make sure to trigger a reset
every time, not just if the PF is up.

These changes fix a problem where VF resets would get lost by the PF,
preventing the VF driver from initializing.

Change-ID: I5945cf2884095b7b0554867c64df8617e71d9d29
Signed-off-by: Mitch Williams 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 14 ++
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index 150002e..169c256 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -937,6 +937,10 @@ void i40e_reset_vf(struct i40e_vf *vf, bool flr)
wr32(hw, I40E_VPGEN_VFRTRIG(vf->vf_id), reg);
i40e_flush(hw);
}
+   /* clear the VFLR bit in GLGEN_VFLRSTAT */
+   reg_idx = (hw->func_caps.vf_base_id + vf->vf_id) / 32;
+   bit_idx = (hw->func_caps.vf_base_id + vf->vf_id) % 32;
+   wr32(hw, I40E_GLGEN_VFLRSTAT(reg_idx), BIT(bit_idx));
 
if (i40e_quiesce_vf_pci(vf))
dev_err(>pdev->dev, "VF %d PCI transactions stuck\n",
@@ -989,10 +993,6 @@ complete_reset:
/* tell the VF the reset is done */
wr32(hw, I40E_VFGEN_RSTAT1(vf->vf_id), I40E_VFR_VFACTIVE);
 
-   /* clear the VFLR bit in GLGEN_VFLRSTAT */
-   reg_idx = (hw->func_caps.vf_base_id + vf->vf_id) / 32;
-   bit_idx = (hw->func_caps.vf_base_id + vf->vf_id) % 32;
-   wr32(hw, I40E_GLGEN_VFLRSTAT(reg_idx), BIT(bit_idx));
i40e_flush(hw);
clear_bit(__I40E_VF_DISABLE, >state);
 }
@@ -2296,11 +2296,9 @@ int i40e_vc_process_vflr_event(struct i40e_pf *pf)
/* read GLGEN_VFLRSTAT register to find out the flr VFs */
vf = >vf[vf_id];
reg = rd32(hw, I40E_GLGEN_VFLRSTAT(reg_idx));
-   if (reg & BIT(bit_idx)) {
+   if (reg & BIT(bit_idx))
/* i40e_reset_vf will clear the bit in GLGEN_VFLRSTAT */
-   if (!test_bit(__I40E_DOWN, >state))
-   i40e_reset_vf(vf, true);
-   }
+   i40e_reset_vf(vf, true);
}
 
return 0;
-- 
2.5.5

[net-next 00/14][pull request] 40GbE Intel Wired LAN Driver Updates 2016-04-05

2016-04-05 Thread Jeff Kirsher

This series contains updates to i40e and i40evf only.

Colin Ian King cleans up a redundant NULL check which was found by static
analysis.

Anjali enables Geneve receive offload for XL710/X710 devices with FW API
version higher than 1.4.

Mitch cleans up an unused variable in i40e_vc_get_vf_resources_msg().
He fixes the VF driver's VLAN feature flags so that ethtool no longer
suggests that VLAN tagging features can be changed when they cannot, and
fixes a problem where VF resets would get lost by the PF, preventing the
VF driver from initializing.  He also puts users' minds at ease by
lowering some message levels, since many of these conditions can happen
any time VFs are enabled or disabled and are not really indicative of
fatal problems unless they happen continuously.

Shannon disables link polling to lessen admin queue traffic, now that
the link event mask usage has been fixed.

Alex Duyck fixes the i40e and i40evf drivers to correctly update
checksums for frames up to 16776960 bytes in length, which should be
more than large enough for all possible TSO frames in the near future.

The following are changes since commit 4da46cebbd3b4dc445195a9672c99c1353af5695:
  net/core/dev: Warn on a too-short GRO frame
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue 40GbE

Alexander Duyck (1):
  i40e/i40evf: Fix TSO checksum pseudo-header adjustment

Anjali Singhai Jain (1):
  i40e: Enable Geneve offload for FW API ver > 1.4 for XL710/X710
devices

Avinash Dayanand (2):
  i40e: Fix for supported link modes in 10GBaseT PHY's
  i40e/i40evf: Bump patch from 1.5.1 to 1.5.2

Catherine Sullivan (2):
  i40e: Add new device ID for X722
  i40evf: Fix get_rss_aq

Colin King (1):
  i40e: remove redundant check on vsi->active_vlans

Mitch Williams (5):
  i40e: Remove unused variable
  i40evf: Fix VLAN features
  i40e: Make VF resets more reliable
  i40evf: Add longer wait after remove module
  i40e: Lower some message levels

Shannon Nelson (2):
  i40e: Disable link polling
  i40e: Request phy media event at reset time

 drivers/net/ethernet/intel/i40e/i40e_common.c      |  1 +
 drivers/net/ethernet/intel/i40e/i40e_debugfs.c     |  5 ++-
 drivers/net/ethernet/intel/i40e/i40e_devids.h      |  1 +
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c     | 16 ++
 drivers/net/ethernet/intel/i40e/i40e_main.c        | 12 +--
 drivers/net/ethernet/intel/i40e/i40e_txrx.c        | 11 +++
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 37 ++
 drivers/net/ethernet/intel/i40evf/i40e_common.c    |  1 +
 drivers/net/ethernet/intel/i40evf/i40e_devids.h    |  1 +
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c      | 11 +++
 drivers/net/ethernet/intel/i40evf/i40evf_main.c    | 31 +++---
 11 files changed, 84 insertions(+), 43 deletions(-)

-- 
2.5.5
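Alex's fix concerns adjusting the TCP pseudo-header checksum for TSO frames whose payload length no longer fits in 16 bits. Assuming the usual TSO arrangement, in which the driver removes the total payload length from the pseudo-header sum so that the hardware can add each segment's own length, the one's-complement arithmetic can be sketched in userspace C as below. This is not the i40e code; the helper names are hypothetical and the values are host-order for simplicity.

```c
#include <stdint.h>

/* 32-bit one's-complement add with end-around carry. */
static uint32_t csum_add32(uint32_t sum, uint32_t v)
{
	sum += v;
	return sum + (sum < v);	/* fold the carry back into bit 0 */
}

/* Fold a 32-bit one's-complement sum to 16 bits (not complemented). */
static uint16_t csum_fold16(uint32_t sum)
{
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Uncomplemented pseudo-header sum over saddr, daddr, proto and length. */
static uint16_t pseudo_sum(uint32_t saddr, uint32_t daddr,
			   uint8_t proto, uint32_t len)
{
	uint32_t sum = csum_add32(saddr, daddr);

	sum = csum_add32(sum, proto);
	sum = csum_add32(sum, len);
	return csum_fold16(sum);
}

/* Remove the full 32-bit length: adding ~len is one's-complement
 * subtraction. */
static uint16_t csum_remove_len(uint16_t sum, uint32_t len)
{
	return csum_fold16(csum_add32(sum, ~len));
}

/* For comparison only: remove just the low 16 bits of len. */
static uint16_t csum_remove_len16(uint16_t sum, uint32_t len)
{
	return csum_fold16(csum_add32(sum, (uint16_t)~(uint16_t)len));
}
```

Subtracting in one's-complement arithmetic is just adding the bitwise complement and folding. For a length of 16776960 (0xFFFF00), subtracting only the low 16 bits yields a different folded sum than subtracting the full value, which is why the length has to be handled as a 32-bit quantity.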

Re: [PATCH net-next 1/3] net: dsa: make the STP state function return void

2016-04-05 Thread Vivien Didelot

Hi Andrew,

Andrew Lunn  writes:

>> -- port_stp_update: bridge layer function invoked when a given switch port STP
>> +- port_stp_state: bridge layer function invoked when a given switch port STP
>
> port_stp_state_set might be a better name, to make it clear it is
> setting the state, not getting the current state, etc. Most of the
> other functions are _add, _prepare, _join, _leave, so _set would fit
> the pattern.

I agree, I'm changing that.

> Changing to a void makes sense.

Thanks,
Vivien

Re: [PATCH net-next 2/3] net: dsa: make the FDB add function return void

2016-04-05 Thread Vivien Didelot

Hi Andrew,

Andrew Lunn  writes:

>>  mutex_lock(>smi_mutex);
>> -ret = _mv88e6xxx_port_fdb_load(ds, port, fdb->addr, fdb->vid, state);
>> +if (_mv88e6xxx_port_fdb_load(ds, port, fdb->addr, fdb->vid, state))
>> +netdev_warn(ds->ports[port], "cannot load address\n");
>
> In the SF2 driver you use pr_err, but here netdev_warn. We probably
> should be consistent if we error or warn. I would use netdev_error,
> since if this fails we probably have a real hardware problem.

I used pr_err in the SF2 driver to be consistent with the rest of the
code which only uses pr_err and pr_info.

I was thinking about adding ds_err and ds_port_err to print errors for
ds->master_dev and ds->ports[port], but that might be overkill. What do
you think? Or should we keep a helper local to the driver for the moment,
such as mvsw_err?

I tend to use warn for cases where the user cannot really do anything
about the situation, but a hardware problem is indeed critical, so I
agree with you that error is better than warn here.

Thanks,
Vivien

[PATCH net-next V3 13/16] net: fec: detect tx int lost

2016-04-05 Thread Troy Kisky

If a tx interrupt is lost, there is no need to reset
the FEC. Just mark the event and call napi_schedule.

Signed-off-by: Troy Kisky 

---
v3: no change
---
 drivers/net/ethernet/freescale/fec_main.c | 38 ++-
 1 file changed, 37 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index be875fd..445443d 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -1094,14 +1094,50 @@ fec_stop(struct net_device *ndev)
}
 }
 
+static const uint txint_flags[] = {
+   FEC_ENET_TXF_0, FEC_ENET_TXF_1, FEC_ENET_TXF_2
+};
 
 static void
 fec_timeout(struct net_device *ndev)
 {
struct fec_enet_private *fep = netdev_priv(ndev);
+   struct bufdesc *bdp;
+   unsigned short status;
+   int i;
+   uint events = 0;
 
-   fec_dump(ndev);
+   for (i = 0; i < fep->num_tx_queues; i++) {
+   struct fec_enet_priv_tx_q *txq = fep->tx_queue[i];
+   int index;
+   struct sk_buff *skb = NULL;
 
+   bdp = txq->dirty_tx;
+   while (1) {
+   bdp = fec_enet_get_nextdesc(bdp, >bd);
+   if (bdp == txq->bd.cur)
+   break;
+   index = fec_enet_get_bd_index(bdp, >bd);
+   skb = txq->tx_skbuff[index];
+   if (skb) {
+   status = fec16_to_cpu(bdp->cbd_sc);
+   if ((status & BD_ENET_TX_READY) == 0)
+   events |= txint_flags[i];
+   break;
+   }
+   }
+   }
+   if (events) {
+   fep->events |= events;
+   /* Disable the RX/TX interrupt */
+   writel(FEC_NAPI_IMASK, fep->hwp + FEC_IMASK);
+   napi_schedule(>napi);
+   netif_wake_queue(fep->netdev);
+   pr_err("%s: tx int lost\n", __func__);
+   return;
+   }
+
+   fec_dump(ndev);
ndev->stats.tx_errors++;
 
schedule_work(>tx_timeout_work);
-- 
2.5.0
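The check added to fec_timeout() walks each tx ring from the descriptor after dirty_tx up to bd.cur and looks at the first descriptor that owns an skb: if the hardware has already cleared its READY bit, the frame went out and only the completion interrupt was missed. A simplified userspace simulation of that walk follows; the structures, ring size and bit value are assumptions, not the driver's definitions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE	8
#define BD_TX_READY	0x8000	/* assumed "owned by hardware" bit */

struct sim_bd {
	uint16_t sc;		/* status/control word */
	void *skb;		/* non-NULL on the BD that owns a frame */
};

struct sim_txq {
	struct sim_bd bd[RING_SIZE];
	size_t dirty;		/* last BD already reclaimed */
	size_t cur;		/* next BD the driver will fill */
};

/* True if the oldest pending frame was already sent by the hardware
 * (READY cleared) although no completion was processed. */
static bool tx_int_lost(const struct sim_txq *q)
{
	size_t i;

	for (i = (q->dirty + 1) % RING_SIZE; i != q->cur;
	     i = (i + 1) % RING_SIZE) {
		if (q->bd[i].skb)
			return !(q->bd[i].sc & BD_TX_READY);
	}
	return false;	/* nothing pending between dirty and cur */
}
```

Only the first descriptor carrying an skb is inspected, as in the patch, so the check costs at most one pass over the in-flight descriptors.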

[PATCH net-next V3 09/16] net: fec: eliminate calls to fec_enet_get_prevdesc

2016-04-05 Thread Troy Kisky

Eliminating calls to fec_enet_get_prevdesc shrinks
the code a little.

Signed-off-by: Troy Kisky 

---
v3: Change commit message

s/unsigned status/unsigned int status/ as requested
---
 drivers/net/ethernet/freescale/fec_main.c | 37 +--
 1 file changed, 11 insertions(+), 26 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index 21d2cd0..349fda1 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -758,6 +758,7 @@ static void fec_enet_bd_init(struct net_device *dev)
struct bufdesc *bdp;
unsigned int i;
unsigned int q;
+   unsigned int status;
 
for (q = 0; q < fep->num_rx_queues; q++) {
/* Initialize the receive buffer descriptors. */
@@ -765,19 +766,13 @@ static void fec_enet_bd_init(struct net_device *dev)
bdp = rxq->bd.base;
 
for (i = 0; i < rxq->bd.ring_size; i++) {
-
/* Initialize the BD for every fragment in the page. */
-   if (bdp->cbd_bufaddr)
-   bdp->cbd_sc = cpu_to_fec16(BD_ENET_RX_EMPTY);
-   else
-   bdp->cbd_sc = cpu_to_fec16(0);
+   status = bdp->cbd_bufaddr ? BD_ENET_RX_EMPTY : 0;
+   if (bdp == rxq->bd.last)
+   status |= BD_SC_WRAP;
+   bdp->cbd_sc = cpu_to_fec16(status);
bdp = fec_enet_get_nextdesc(bdp, >bd);
}
-
-   /* Set the last buffer to wrap */
-   bdp = fec_enet_get_prevdesc(bdp, >bd);
-   bdp->cbd_sc |= cpu_to_fec16(BD_SC_WRAP);
-
rxq->bd.cur = rxq->bd.base;
}
 
@@ -789,18 +784,16 @@ static void fec_enet_bd_init(struct net_device *dev)
 
for (i = 0; i < txq->bd.ring_size; i++) {
/* Initialize the BD for every fragment in the page. */
-   bdp->cbd_sc = cpu_to_fec16(0);
if (txq->tx_skbuff[i]) {
dev_kfree_skb_any(txq->tx_skbuff[i]);
txq->tx_skbuff[i] = NULL;
}
bdp->cbd_bufaddr = cpu_to_fec32(0);
+   bdp->cbd_sc = cpu_to_fec16((bdp == txq->bd.last) ?
+   BD_SC_WRAP : 0);
bdp = fec_enet_get_nextdesc(bdp, >bd);
}
-
-   /* Set the last buffer to wrap */
bdp = fec_enet_get_prevdesc(bdp, >bd);
-   bdp->cbd_sc |= cpu_to_fec16(BD_SC_WRAP);
txq->dirty_tx = bdp;
}
 }
@@ -2717,19 +2710,16 @@ fec_enet_alloc_rxq_buffers(struct net_device *ndev, unsigned int queue)
}
 
rxq->rx_skbuff[i] = skb;
-   bdp->cbd_sc = cpu_to_fec16(BD_ENET_RX_EMPTY);
 
if (fep->bufdesc_ex) {
struct bufdesc_ex *ebdp = (struct bufdesc_ex *)bdp;
ebdp->cbd_esc = cpu_to_fec32(BD_ENET_RX_INT);
}
+   bdp->cbd_sc = cpu_to_fec16(BD_ENET_RX_EMPTY |
+   ((bdp == rxq->bd.last) ? BD_SC_WRAP : 0));
 
bdp = fec_enet_get_nextdesc(bdp, >bd);
}
-
-   /* Set the last buffer to wrap. */
-   bdp = fec_enet_get_prevdesc(bdp, >bd);
-   bdp->cbd_sc |= cpu_to_fec16(BD_SC_WRAP);
return 0;
 
  err_alloc:
@@ -2752,21 +2742,16 @@ fec_enet_alloc_txq_buffers(struct net_device *ndev, unsigned int queue)
if (!txq->tx_bounce[i])
goto err_alloc;
 
-   bdp->cbd_sc = cpu_to_fec16(0);
bdp->cbd_bufaddr = cpu_to_fec32(0);
 
if (fep->bufdesc_ex) {
struct bufdesc_ex *ebdp = (struct bufdesc_ex *)bdp;
ebdp->cbd_esc = cpu_to_fec32(BD_ENET_TX_INT);
}
-
+   bdp->cbd_sc = cpu_to_fec16((bdp == txq->bd.last) ?
+   BD_SC_WRAP : 0);
bdp = fec_enet_get_nextdesc(bdp, >bd);
}
-
-   /* Set the last buffer to wrap. */
-   bdp = fec_enet_get_prevdesc(bdp, >bd);
-   bdp->cbd_sc |= cpu_to_fec16(BD_SC_WRAP);
-
return 0;
 
  err_alloc:
-- 
2.5.0
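The single-pass initialization this patch switches to can be illustrated with a userspace sketch over a plain status array: each status word is computed and written exactly once, with the wrap bit folded in on the last entry, so no step back via fec_enet_get_prevdesc is needed. The bit values below are assumptions for illustration, not taken from fec.h.

```c
#include <stddef.h>
#include <stdint.h>

#define BD_RX_EMPTY	0x8000	/* assumed bit values */
#define BD_SC_WRAP_BIT	0x2000

/* Initialize every RX descriptor status in one pass; stale bits are
 * overwritten because nothing is read back. */
static void init_rx_ring(uint16_t *sc, const uint32_t *bufaddr, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		uint16_t status = bufaddr[i] ? BD_RX_EMPTY : 0;

		if (i == n - 1)
			status |= BD_SC_WRAP_BIT;	/* last entry wraps */
		sc[i] = status;
	}
}
```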

[PATCH net-next V3 14/16] net: fec: create subroutine reset_tx_queue

2016-04-05 Thread Troy Kisky

Create subroutine reset_tx_queue to have one place
to release any queued tx skbs.

Signed-off-by: Troy Kisky 

---
v3: change commit message
---
 drivers/net/ethernet/freescale/fec_main.c | 50 +++
 1 file changed, 25 insertions(+), 25 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index 445443d..a38acf2 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -752,12 +752,33 @@ fec_enet_start_xmit(struct sk_buff *skb, struct net_device *ndev)
return NETDEV_TX_OK;
 }
 
+static void reset_tx_queue(struct fec_enet_private *fep,
+  struct fec_enet_priv_tx_q *txq)
+{
+   struct bufdesc *bdp = txq->bd.base;
+   unsigned int i;
+
+   txq->bd.cur = bdp;
+   for (i = 0; i < txq->bd.ring_size; i++) {
+   /* Initialize the BD for every fragment in the page. */
+   if (txq->tx_skbuff[i]) {
+   dev_kfree_skb_any(txq->tx_skbuff[i]);
+   txq->tx_skbuff[i] = NULL;
+   }
+   bdp->cbd_bufaddr = cpu_to_fec32(0);
+   bdp->cbd_sc = cpu_to_fec16((bdp == txq->bd.last) ?
+  BD_SC_WRAP : 0);
+   bdp = fec_enet_get_nextdesc(bdp, >bd);
+   }
+   bdp = fec_enet_get_prevdesc(bdp, >bd);
+   txq->dirty_tx = bdp;
+}
+
 /* Init RX & TX buffer descriptors
  */
 static void fec_enet_bd_init(struct net_device *dev)
 {
struct fec_enet_private *fep = netdev_priv(dev);
-   struct fec_enet_priv_tx_q *txq;
struct fec_enet_priv_rx_q *rxq;
struct bufdesc *bdp;
unsigned int i;
@@ -780,26 +801,8 @@ static void fec_enet_bd_init(struct net_device *dev)
rxq->bd.cur = rxq->bd.base;
}
 
-   for (q = 0; q < fep->num_tx_queues; q++) {
-   /* ...and the same for transmit */
-   txq = fep->tx_queue[q];
-   bdp = txq->bd.base;
-   txq->bd.cur = bdp;
-
-   for (i = 0; i < txq->bd.ring_size; i++) {
-   /* Initialize the BD for every fragment in the page. */
-   if (txq->tx_skbuff[i]) {
-   dev_kfree_skb_any(txq->tx_skbuff[i]);
-   txq->tx_skbuff[i] = NULL;
-   }
-   bdp->cbd_bufaddr = cpu_to_fec32(0);
-   bdp->cbd_sc = cpu_to_fec16((bdp == txq->bd.last) ?
-   BD_SC_WRAP : 0);
-   bdp = fec_enet_get_nextdesc(bdp, >bd);
-   }
-   bdp = fec_enet_get_prevdesc(bdp, >bd);
-   txq->dirty_tx = bdp;
-   }
+   for (q = 0; q < fep->num_tx_queues; q++)
+   reset_tx_queue(fep, fep->tx_queue[q]);
 }
 
 static void fec_enet_active_rxring(struct net_device *ndev)
@@ -2648,13 +2651,10 @@ static void fec_enet_free_buffers(struct net_device *ndev)
 
for (q = 0; q < fep->num_tx_queues; q++) {
txq = fep->tx_queue[q];
-   bdp = txq->bd.base;
+   reset_tx_queue(fep, txq);
for (i = 0; i < txq->bd.ring_size; i++) {
kfree(txq->tx_bounce[i]);
txq->tx_bounce[i] = NULL;
-   skb = txq->tx_skbuff[i];
-   txq->tx_skbuff[i] = NULL;
-   dev_kfree_skb(skb);
}
}
 }
-- 
2.5.0
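reset_tx_queue() gives one place that drops queued frames and returns the ring to its initial state. A hypothetical userspace analogue, with heap buffers standing in for skbs and an assumed wrap-bit value, might look like this:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define TXQ_SIZE	4
#define BD_WRAP		0x2000	/* assumed wrap-bit value */

struct sim_ring {
	uint16_t sc[TXQ_SIZE];
	uint32_t bufaddr[TXQ_SIZE];
	void *skb[TXQ_SIZE];	/* heap buffers standing in for sk_buffs */
	size_t cur, dirty;
};

/* Free any queued buffers and reset every descriptor. */
static void sim_reset_tx_queue(struct sim_ring *q)
{
	for (size_t i = 0; i < TXQ_SIZE; i++) {
		free(q->skb[i]);	/* free(NULL) is a no-op */
		q->skb[i] = NULL;
		q->bufaddr[i] = 0;
		q->sc[i] = (i == TXQ_SIZE - 1) ? BD_WRAP : 0;
	}
	q->cur = 0;
	/* Like txq->dirty_tx: the descriptor just before the base. */
	q->dirty = TXQ_SIZE - 1;
}
```

Both fec_enet_bd_init() and fec_enet_free_buffers() can then call the same routine, which is what the patch does with the real reset_tx_queue().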

[PATCH net-next V3 05/16] net: fec: reduce interrupts

2016-04-05 Thread Troy Kisky

By clearing the NAPI interrupts in the NAPI routine
and not in the interrupt handler, we can reduce the
number of interrupts. We also no longer need the
software status variables, as the event registers
still hold the pending events.

Also note that if a full budget of packets is received,
the next call to fec_enet_rx_napi will now continue to
receive the previously pending packets.

To check that this actually reduces interrupts, run
this command before and after applying the patch:

cat /proc/interrupts |grep ether; \
ping -s2800 192.168.0.201 -f -c1000 ; \
cat /proc/interrupts |grep ether

In my test, there were 2996 interrupts before this
patch and 2010 interrupts after it.
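The pattern described here is that the hard interrupt handler only masks RX/TX and schedules NAPI, leaving the event bits pending; the poll routine then acknowledges them and unmasks when it is done. Since the handler diff in this posting is cut off, the following is only a simulation of that pattern, with plain variables standing in for the IEVENT/IMASK registers; the bit values and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define EV_RXF		0x1u	/* hypothetical event bits */
#define EV_TXF		0x2u
#define EV_MII		0x4u
#define DEFAULT_IMASK	(EV_RXF | EV_TXF | EV_MII)
#define NAPI_IMASK	EV_MII

struct sim_regs {
	uint32_t ievent;	/* pending events (write-1-to-clear in hw) */
	uint32_t imask;		/* enabled interrupt sources */
};

/* Hard irq: returns true if NAPI should be scheduled.  RX/TX events are
 * not acked here; they stay pending for the poll routine. */
static bool sim_irq(struct sim_regs *r)
{
	uint32_t events = r->ievent & r->imask;

	if (!(events & (EV_RXF | EV_TXF)))
		return false;
	r->imask = NAPI_IMASK;	/* no further RX/TX irqs while polling */
	return true;
}

/* Poll side: ack RX/TX, process the rings, then unmask again. */
static void sim_poll(struct sim_regs *r)
{
	uint32_t events = r->ievent & (EV_RXF | EV_TXF);

	r->ievent &= ~events;	/* the ack happens here, not in the irq */
	/* ... RX/TX ring processing would go here ... */
	r->imask = DEFAULT_IMASK;
}
```

Events that arrive while RX/TX are masked are picked up by the running poll instead of raising another interrupt, which is where the reduction in interrupt count comes from.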

Signed-off-by: Troy Kisky 

---
v3:
Fix introduced bug of checking for FEC_ENET_TS_TIMER
before calling fec_ptp_check_pps_event

Changed commit message to show measured changes.

Used netdev_info instead of pr_info.

Fugang Duan suggested splitting TX and RX into two NAPI
contexts, but that should be a separate patch as it
is unrelated to what this patch does.
---
 drivers/net/ethernet/freescale/fec.h  |   6 +-
 drivers/net/ethernet/freescale/fec_main.c | 118 +++---
 2 files changed, 45 insertions(+), 79 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fec.h b/drivers/net/ethernet/freescale/fec.h
index 6dd0ba8..9d5bdc6 100644
--- a/drivers/net/ethernet/freescale/fec.h
+++ b/drivers/net/ethernet/freescale/fec.h
@@ -505,11 +505,7 @@ struct fec_enet_private {
 
unsigned int total_tx_ring_size;
unsigned int total_rx_ring_size;
-
-   unsigned long work_tx;
-   unsigned long work_rx;
-   unsigned long work_ts;
-   unsigned long work_mdio;
+   uint events;
 
struct  platform_device *pdev;
 
diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index b4d46f8..918ac82 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -70,8 +70,6 @@ static void fec_enet_itr_coal_init(struct net_device *ndev);
 
 #define DRIVER_NAME "fec"
 
-#define FEC_ENET_GET_QUQUE(_x) ((_x == 0) ? 1 : ((_x == 1) ? 2 : 0))
-
 /* Pause frame feild and FIFO threshold */
 #define FEC_ENET_FCE   (1 << 5)
 #define FEC_ENET_RSEM_V 0x84
@@ -1257,21 +1255,6 @@ static void fec_txq(struct net_device *ndev, struct fec_enet_priv_tx_q *txq)
writel(0, txq->bd.reg_desc_active);
 }
 
-static void
-fec_enet_tx(struct net_device *ndev)
-{
-   struct fec_enet_private *fep = netdev_priv(ndev);
-   struct fec_enet_priv_tx_q *txq;
-   u16 queue_id;
-   /* First process class A queue, then Class B and Best Effort queue */
-   for_each_set_bit(queue_id, >work_tx, FEC_ENET_MAX_TX_QS) {
-   clear_bit(queue_id, >work_tx);
-   txq = fep->tx_queue[FEC_ENET_GET_QUQUE(queue_id)];
-   fec_txq(ndev, txq);
-   }
-   return;
-}
-
 static int
 fec_enet_new_rxbdp(struct net_device *ndev, struct bufdesc *bdp, struct sk_buff *skb)
 {
@@ -1505,70 +1488,34 @@ rx_processing_done:
return pkt_received;
 }
 
-static int
-fec_enet_rx(struct net_device *ndev, int budget)
-{
-   int pkt_received = 0;
-   u16 queue_id;
-   struct fec_enet_private *fep = netdev_priv(ndev);
-   struct fec_enet_priv_rx_q *rxq;
-
-   for_each_set_bit(queue_id, >work_rx, FEC_ENET_MAX_RX_QS) {
-   clear_bit(queue_id, >work_rx);
-   rxq = fep->rx_queue[FEC_ENET_GET_QUQUE(queue_id)];
-   pkt_received += fec_rxq(ndev, rxq, budget - pkt_received);
-   }
-   return pkt_received;
-}
-
-static bool
-fec_enet_collect_events(struct fec_enet_private *fep, uint int_events)
-{
-   if (int_events == 0)
-   return false;
-
-   if (int_events & FEC_ENET_RXF_0)
-   fep->work_rx |= (1 << 2);
-   if (int_events & FEC_ENET_RXF_1)
-   fep->work_rx |= (1 << 0);
-   if (int_events & FEC_ENET_RXF_2)
-   fep->work_rx |= (1 << 1);
-
-   if (int_events & FEC_ENET_TXF_0)
-   fep->work_tx |= (1 << 2);
-   if (int_events & FEC_ENET_TXF_1)
-   fep->work_tx |= (1 << 0);
-   if (int_events & FEC_ENET_TXF_2)
-   fep->work_tx |= (1 << 1);
-
-   return true;
-}
-
 static irqreturn_t
 fec_enet_interrupt(int irq, void *dev_id)
 {
struct net_device *ndev = dev_id;
struct fec_enet_private *fep = netdev_priv(ndev);
-   uint int_events;
irqreturn_t ret = IRQ_NONE;
+   uint eir = readl(fep->hwp + FEC_IEVENT);
+   uint int_events = eir & readl(fep->hwp + FEC_IMASK);
 
-   int_events = readl(fep->hwp + FEC_IEVENT);
-   writel(int_events, fep->hwp + FEC_IEVENT);
-   fec_enet_collect_events(fep, int_events);
-
-   if ((fep->work_tx || fep->work_rx) && fep->link) {
-   ret = IRQ_HANDLED;
-
+   if (int_events & (FEC_ENET_RXF | FEC_ENET_TXF)) {
if (napi_schedule_prep(>napi)) {

[PATCH net-next V3 10/16] net: fec: move restart test for efficiency

2016-04-05 Thread Troy Kisky

Move the restart test earlier in fec_txq(), which saves one comparison.

Signed-off-by: Troy Kisky 

---
v3: change commit message
---
 drivers/net/ethernet/freescale/fec_main.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index 349fda1..a2a9dca 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -1157,8 +1157,13 @@ static void fec_txq(struct net_device *ndev, struct fec_enet_priv_tx_q *txq)
/* Order the load of bd.cur and cbd_sc */
rmb();
status = fec16_to_cpu(READ_ONCE(bdp->cbd_sc));
-   if (status & BD_ENET_TX_READY)
+   if (status & BD_ENET_TX_READY) {
+   if (!readl(txq->bd.reg_desc_active)) {
+   /* ERR006358 has hit, restart tx */
+   writel(0, txq->bd.reg_desc_active);
+   }
break;
+   }
 
index = fec_enet_get_bd_index(bdp, >bd);
 
@@ -1230,11 +1235,6 @@ static void fec_txq(struct net_device *ndev, struct fec_enet_priv_tx_q *txq)
netif_tx_wake_queue(nq);
}
}
-
-   /* ERR006538: Keep the transmitter going */
-   if (bdp != txq->bd.cur &&
-   readl(txq->bd.reg_desc_active) == 0)
-   writel(0, txq->bd.reg_desc_active);
 }
 
 static int
-- 
2.5.0

[PATCH net-next V3 11/16] net: fec: clear cbd_sc after transmission to help with debugging

2016-04-05 Thread Troy Kisky

When the tx queue is dumped, it is easier to see that this
entry is idle if cbd_sc is cleared after transmission.

Signed-off-by: Troy Kisky 

---
v3: change commit message
---
 drivers/net/ethernet/freescale/fec_main.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index a2a9dca..f96ea97 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -1164,6 +1164,8 @@ static void fec_txq(struct net_device *ndev, struct fec_enet_priv_tx_q *txq)
}
break;
}
+   bdp->cbd_sc = cpu_to_fec16((bdp == txq->bd.last) ?
+  BD_SC_WRAP : 0);
 
index = fec_enet_get_bd_index(bdp, >bd);
 
-- 
2.5.0

[PATCH net-next V3 12/16] net: fec: dump all tx queues in fec_dump

2016-04-05 Thread Troy Kisky

Dump all tx queues, not just queue 0.
Also, disable FEC interrupts first.
They will be re-enabled in fec_restart.

Signed-off-by: Troy Kisky 

---
v3: no change
---
 drivers/net/ethernet/freescale/fec_main.c | 40 +--
 1 file changed, 22 insertions(+), 18 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index f96ea97..be875fd 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -271,28 +271,32 @@ static void swap_buffer2(void *dst_buf, void *src_buf, int len)
 static void fec_dump(struct net_device *ndev)
 {
struct fec_enet_private *fep = netdev_priv(ndev);
-   struct bufdesc *bdp;
-   struct fec_enet_priv_tx_q *txq;
-   int index = 0;
+   int i;
 
+   /* Disable all FEC interrupts */
+   writel(0, fep->hwp + FEC_IMASK);
netdev_info(ndev, "TX ring dump\n");
pr_info("Nr SC addr   len  SKB\n");
 
-   txq = fep->tx_queue[0];
-   bdp = txq->bd.base;
-
-   do {
-   pr_info("%3u %c%c 0x%04x 0x%08x %4u %p\n",
-   index,
-   bdp == txq->bd.cur ? 'S' : ' ',
-   bdp == txq->dirty_tx ? 'H' : ' ',
-   fec16_to_cpu(bdp->cbd_sc),
-   fec32_to_cpu(bdp->cbd_bufaddr),
-   fec16_to_cpu(bdp->cbd_datlen),
-   txq->tx_skbuff[index]);
-   bdp = fec_enet_get_nextdesc(bdp, >bd);
-   index++;
-   } while (bdp != txq->bd.base);
+   for (i = 0; i < fep->num_tx_queues; i++) {
+   struct fec_enet_priv_tx_q *txq = fep->tx_queue[i];
+   struct bufdesc *bdp = txq->bd.base;
+   int index = 0;
+
+   pr_info("tx queue %d\n", i);
+   do {
+   pr_info("%3u %c%c 0x%04x 0x%08x %4u %p\n",
+   index,
+   bdp == txq->bd.cur ? 'S' : ' ',
+   bdp == txq->dirty_tx ? 'H' : ' ',
+   fec16_to_cpu(bdp->cbd_sc),
+   fec32_to_cpu(bdp->cbd_bufaddr),
+   fec16_to_cpu(bdp->cbd_datlen),
+   txq->tx_skbuff[index]);
+   bdp = fec_enet_get_nextdesc(bdp, >bd);
+   index++;
+   } while (bdp != txq->bd.base);
+   }
 }
 
 static inline bool is_ipv4_pkt(struct sk_buff *skb)
-- 
2.5.0

[PATCH net-next V3 03/16] net: fec: return IRQ_HANDLED if fec_ptp_check_pps_event handled it

2016-04-05 Thread Troy Kisky

fec_ptp_check_pps_event will return 1 if FEC_T_TF_MASK caused
an interrupt. Don't return IRQ_NONE in this case.

Signed-off-by: Troy Kisky 

---
v3: New patch, came from feedback from another patch.
---
 drivers/net/ethernet/freescale/fec_main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index a011719..7993040 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -1579,8 +1579,8 @@ fec_enet_interrupt(int irq, void *dev_id)
}
 
if (fep->ptp_clock)
-   fec_ptp_check_pps_event(fep);
-
+   if (fec_ptp_check_pps_event(fep))
+   ret = IRQ_HANDLED;
return ret;
 }
 
-- 
2.5.0

[PATCH net-next V3 16/16] net: fec: don't set cbd_bufaddr unless no mapping error

2016-04-05 Thread Troy Kisky

Not assigning cbd_bufaddr on a mapping error prevents trying to
unmap the invalid address if the FEC is reset.

Signed-off-by: Troy Kisky 

---
v3: no change
---
 drivers/net/ethernet/freescale/fec_main.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index 101d820..c2ed8be 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -600,7 +600,7 @@ fec_enet_txq_put_hdr_tso(struct fec_enet_priv_tx_q *txq,
int hdr_len = skb_transport_offset(skb) + tcp_hdrlen(skb);
struct bufdesc_ex *ebdp = container_of(bdp, struct bufdesc_ex, desc);
void *bufaddr;
-   unsigned long dmabuf;
+   dma_addr_t dmabuf;
unsigned int estatus = 0;
 
bufaddr = txq->tso_hdrs + index * TSO_HEADER_SIZE;
@@ -1295,17 +1295,21 @@ fec_enet_new_rxbdp(struct net_device *ndev, struct bufdesc *bdp, struct sk_buff
 {
struct  fec_enet_private *fep = netdev_priv(ndev);
int off;
+   dma_addr_t dmabuf;
 
off = ((unsigned long)skb->data) & fep->rx_align;
if (off)
skb_reserve(skb, fep->rx_align + 1 - off);
 
-   bdp->cbd_bufaddr = cpu_to_fec32(dma_map_single(>pdev->dev, skb->data, FEC_ENET_RX_FRSIZE - fep->rx_align, DMA_FROM_DEVICE));
-   if (dma_mapping_error(>pdev->dev, fec32_to_cpu(bdp->cbd_bufaddr))) {
+   dmabuf = dma_map_single(>pdev->dev, skb->data,
+   FEC_ENET_RX_FRSIZE - fep->rx_align,
+   DMA_FROM_DEVICE);
+   if (dma_mapping_error(>pdev->dev, dmabuf)) {
if (net_ratelimit())
netdev_err(ndev, "Rx DMA memory map failed\n");
return -ENOMEM;
}
+   bdp->cbd_bufaddr = cpu_to_fec32(dmabuf);
 
return 0;
 }
-- 
2.5.0
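The fix maps into a local dma_addr_t and stores it in the descriptor only after dma_mapping_error() has been checked. The same ordering is shown below with a hypothetical mapper and error cookie standing in for the DMA API, which is not available in userspace; -1 stands in for -ENOMEM.

```c
#include <stdint.h>

#define SIM_MAPPING_ERROR	UINT32_MAX	/* hypothetical error cookie */

struct sim_bd {
	uint32_t bufaddr;
};

/* Hypothetical mapper: fails for NULL, otherwise returns a fake bus
 * address. */
static uint32_t sim_map(const void *cpu_addr)
{
	return cpu_addr ? 0x1000u : SIM_MAPPING_ERROR;
}

/* Map into a local and publish to the descriptor only on success, so a
 * later reset never sees (and tries to unmap) the error value. */
static int sim_new_rxbdp(struct sim_bd *bdp, const void *data)
{
	uint32_t dmabuf = sim_map(data);

	if (dmabuf == SIM_MAPPING_ERROR)
		return -1;
	bdp->bufaddr = dmabuf;
	return 0;
}
```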

[PATCH net-next V3 15/16] net: fec: call dma_unmap_single on mapped tx buffers at restart

2016-04-05 Thread Troy Kisky

Make sure any pending tx buffers are unmapped when the
fec is restarted.

Signed-off-by: Troy Kisky 

---
v3: no change
---
 drivers/net/ethernet/freescale/fec_main.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index a38acf2..101d820 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -404,6 +404,7 @@ dma_mapping_error:
bdp = fec_enet_get_nextdesc(bdp, >bd);
dma_unmap_single(>pdev->dev, fec32_to_cpu(bdp->cbd_bufaddr),
 fec16_to_cpu(bdp->cbd_datlen), DMA_TO_DEVICE);
+   bdp->cbd_bufaddr = cpu_to_fec32(0);
}
return ERR_PTR(-ENOMEM);
 }
@@ -761,11 +762,18 @@ static void reset_tx_queue(struct fec_enet_private *fep,
txq->bd.cur = bdp;
for (i = 0; i < txq->bd.ring_size; i++) {
/* Initialize the BD for every fragment in the page. */
+   if (bdp->cbd_bufaddr) {
+   if (!IS_TSO_HEADER(txq, fec32_to_cpu(bdp->cbd_bufaddr)))
+   dma_unmap_single(>pdev->dev,
+fec32_to_cpu(bdp->cbd_bufaddr),
+fec16_to_cpu(bdp->cbd_datlen),
+DMA_TO_DEVICE);
+   bdp->cbd_bufaddr = cpu_to_fec32(0);
+   }
if (txq->tx_skbuff[i]) {
dev_kfree_skb_any(txq->tx_skbuff[i]);
txq->tx_skbuff[i] = NULL;
}
-   bdp->cbd_bufaddr = cpu_to_fec32(0);
bdp->cbd_sc = cpu_to_fec16((bdp == txq->bd.last) ?
   BD_SC_WRAP : 0);
bdp = fec_enet_get_nextdesc(bdp, >bd);
@@ -2643,6 +2651,7 @@ static void fec_enet_free_buffers(struct net_device *ndev)
 fec32_to_cpu(bdp->cbd_bufaddr),
 FEC_ENET_RX_FRSIZE - fep->rx_align,
 DMA_FROM_DEVICE);
+   bdp->cbd_bufaddr = cpu_to_fec32(0);
dev_kfree_skb(skb);
}
bdp = fec_enet_get_nextdesc(bdp, >bd);
-- 
2.5.0
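At restart, reset_tx_queue() now unmaps any descriptor with a nonzero cbd_bufaddr unless IS_TSO_HEADER() says the address points into the queue's preallocated TSO header area. IS_TSO_HEADER() is not defined in this patch; the sketch below assumes it is a simple range check over ring_size * TSO_HEADER_SIZE bytes starting at tso_hdrs_dma, and it counts unmaps instead of calling the DMA API. All names here are hypothetical stand-ins.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING	4
#define HDR_SZ	128	/* stand-in for TSO_HEADER_SIZE */

struct sim_txq {
	uint32_t bufaddr[RING];
	uint32_t tso_hdrs_dma;	/* base of the preallocated header area */
	unsigned int unmapped;	/* counts simulated dma_unmap_single() */
};

/* Assumed range check: headers are not mapped per packet, so they must
 * not be unmapped either. */
static bool sim_is_tso_header(const struct sim_txq *q, uint32_t addr)
{
	return addr >= q->tso_hdrs_dma &&
	       addr < q->tso_hdrs_dma + RING * HDR_SZ;
}

static void sim_reset_unmap(struct sim_txq *q)
{
	for (size_t i = 0; i < RING; i++) {
		if (!q->bufaddr[i])
			continue;	/* nothing mapped here */
		if (!sim_is_tso_header(q, q->bufaddr[i]))
			q->unmapped++;	/* would call dma_unmap_single() */
		q->bufaddr[i] = 0;
	}
}
```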

[PATCH net-next V3 07/16] net: fec: don't clear all rx queue bits when just one is being checked

2016-04-05 Thread Troy Kisky

FEC_ENET_RXF covers 3 separate bits, but we only check one queue
at a time. So, when the last queue is being checked, it is
wrong to clear the interrupt for the first queue.

Also, since tx/rx interrupts are now cleared in the napi
routine rather than in the interrupt handler, this write is
no longer needed here.

Signed-off-by: Troy Kisky 

---
v3: change commit message
---
 drivers/net/ethernet/freescale/fec_main.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index 17140ea..3cd0cdf 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -1339,8 +1339,6 @@ static int fec_rxq(struct net_device *ndev, struct fec_enet_priv_rx_q *rxq,
break;
pkt_received++;
 
-   writel(FEC_ENET_RXF, fep->hwp + FEC_IEVENT);
-
/* Check for errors. */
status ^= BD_ENET_RX_LAST;
if (status & (BD_ENET_RX_LG | BD_ENET_RX_SH | BD_ENET_RX_NO |
-- 
2.5.0
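The event register is write-1-to-clear, so writing all of FEC_ENET_RXF while servicing one queue also discards the pending events of the other queues. A small simulation makes the difference visible; the bit values and helper names are hypothetical, and a plain variable stands in for the register.

```c
#include <stdint.h>

#define RXF_0	0x1u	/* hypothetical per-queue RX-frame bits */
#define RXF_1	0x2u
#define RXF_2	0x4u
#define RXF_ALL	(RXF_0 | RXF_1 | RXF_2)

static const uint32_t rxf_bits[] = { RXF_0, RXF_1, RXF_2 };

/* Simulated write-1-to-clear: every bit written as 1 is cleared. */
static void w1c(uint32_t *ievent, uint32_t bits)
{
	*ievent &= ~bits;
}

/* Acknowledge only the queue being serviced. */
static void ack_rx_queue(uint32_t *ievent, unsigned int q)
{
	w1c(ievent, rxf_bits[q]);
}
```

With queues 0 and 2 both pending, acking RXF_ALL while servicing queue 2 loses queue 0's event, whereas acking only queue 2's bit leaves it pending.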

[PATCH net-next V3 08/16] net: fec: set cbd_sc without relying on previous value

2016-04-05 Thread Troy Kisky

Relying on the wrap bit of cbd_sc to stay valid once
initialized, when the controller also writes to this field,
seems undesirable, since we can easily know what the value
should be.
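The patch builds each TX status word from values the driver already knows (TC and READY, plus WRAP on the last ring entry and LAST on the final BD of a frame) instead of read-modify-writing cbd_sc. A minimal sketch of that computation follows; the bit values are assumptions for illustration rather than the ones in fec.h.

```c
#include <stdbool.h>
#include <stdint.h>

#define TX_READY	0x8000u	/* assumed bit values */
#define TX_WRAP		0x2000u
#define TX_LAST		0x0800u
#define TX_TC		0x0400u

/* Build the status word purely from driver state; nothing the
 * controller wrote earlier is read back. */
static uint16_t tx_status(bool last_in_ring, bool last_of_frame)
{
	uint16_t status = TX_TC | TX_READY;

	if (last_in_ring)
		status |= TX_WRAP;
	if (last_of_frame)
		status |= TX_LAST;
	return status;
}
```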

Signed-off-by: Troy Kisky 

---
v3: change commit message
---
 drivers/net/ethernet/freescale/fec_main.c | 38 +--
 1 file changed, 11 insertions(+), 27 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index 3cd0cdf..21d2cd0 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -340,9 +340,8 @@ fec_enet_txq_submit_frag_skb(struct fec_enet_priv_tx_q *txq,
bdp = fec_enet_get_nextdesc(bdp, >bd);
ebdp = (struct bufdesc_ex *)bdp;
 
-   status = fec16_to_cpu(bdp->cbd_sc);
-   status &= ~BD_ENET_TX_STATS;
-   status |= (BD_ENET_TX_TC | BD_ENET_TX_READY);
+   status = BD_ENET_TX_TC | BD_ENET_TX_READY |
+   ((bdp == txq->bd.last) ? BD_SC_WRAP : 0);
frag_len = skb_shinfo(skb)->frags[frag].size;
 
/* Handle the last BD specially */
@@ -436,8 +435,6 @@ static int fec_enet_txq_submit_skb(struct fec_enet_priv_tx_q *txq,
/* Fill in a Tx ring entry */
bdp = txq->bd.cur;
last_bdp = bdp;
-   status = fec16_to_cpu(bdp->cbd_sc);
-   status &= ~BD_ENET_TX_STATS;
 
/* Set buffer length and buffer pointer */
bufaddr = skb->data;
@@ -462,6 +459,8 @@ static int fec_enet_txq_submit_skb(struct fec_enet_priv_tx_q *txq,
return NETDEV_TX_OK;
}
 
+   status = BD_ENET_TX_TC | BD_ENET_TX_READY |
+   ((bdp == txq->bd.last) ? BD_SC_WRAP : 0);
if (nr_frags) {
last_bdp = fec_enet_txq_submit_frag_skb(txq, skb, ndev);
if (IS_ERR(last_bdp)) {
@@ -512,7 +511,6 @@ static int fec_enet_txq_submit_skb(struct fec_enet_priv_tx_q *txq,
/* Send it on its way.  Tell FEC it's ready, interrupt when done,
 * it's the last BD of the frame, and to put the CRC on the end.
 */
-   status |= (BD_ENET_TX_READY | BD_ENET_TX_TC);
bdp->cbd_sc = cpu_to_fec16(status);
 
/* If this was the last BD in the ring, start at the beginning again. */
@@ -544,11 +542,6 @@ fec_enet_txq_put_data_tso(struct fec_enet_priv_tx_q *txq, struct sk_buff *skb,
unsigned int estatus = 0;
dma_addr_t addr;
 
-   status = fec16_to_cpu(bdp->cbd_sc);
-   status &= ~BD_ENET_TX_STATS;
-
-   status |= (BD_ENET_TX_TC | BD_ENET_TX_READY);
-
if (((unsigned long) data) & fep->tx_align ||
fep->quirks & FEC_QUIRK_SWAP_FRAME) {
memcpy(txq->tx_bounce[index], data, size);
@@ -578,15 +571,16 @@ fec_enet_txq_put_data_tso(struct fec_enet_priv_tx_q *txq, struct sk_buff *skb,
ebdp->cbd_esc = cpu_to_fec32(estatus);
}
 
+   status = BD_ENET_TX_TC | BD_ENET_TX_READY |
+   ((bdp == txq->bd.last) ? BD_SC_WRAP : 0);
/* Handle the last BD specially */
if (last_tcp)
-   status |= (BD_ENET_TX_LAST | BD_ENET_TX_TC);
+   status |= BD_ENET_TX_LAST;
if (is_last) {
status |= BD_ENET_TX_INTR;
if (fep->bufdesc_ex)
ebdp->cbd_esc |= cpu_to_fec32(BD_ENET_TX_INT);
}
-
bdp->cbd_sc = cpu_to_fec16(status);
 
return 0;
@@ -602,13 +596,8 @@ fec_enet_txq_put_hdr_tso(struct fec_enet_priv_tx_q *txq,
struct bufdesc_ex *ebdp = container_of(bdp, struct bufdesc_ex, desc);
void *bufaddr;
unsigned long dmabuf;
-   unsigned short status;
unsigned int estatus = 0;
 
-   status = fec16_to_cpu(bdp->cbd_sc);
-   status &= ~BD_ENET_TX_STATS;
-   status |= (BD_ENET_TX_TC | BD_ENET_TX_READY);
-
bufaddr = txq->tso_hdrs + index * TSO_HEADER_SIZE;
dmabuf = txq->tso_hdrs_dma + index * TSO_HEADER_SIZE;
if (((unsigned long)bufaddr) & fep->tx_align ||
@@ -641,8 +630,8 @@ fec_enet_txq_put_hdr_tso(struct fec_enet_priv_tx_q *txq,
ebdp->cbd_esc = cpu_to_fec32(estatus);
}
 
-   bdp->cbd_sc = cpu_to_fec16(status);
-
+   bdp->cbd_sc = cpu_to_fec16(BD_ENET_TX_TC | BD_ENET_TX_READY |
+   ((bdp == txq->bd.last) ? BD_SC_WRAP : 0));
return 0;
 }
 
@@ -1454,12 +1443,6 @@ static int fec_rxq(struct net_device *ndev, struct fec_enet_priv_rx_q *rxq,
}
 
 rx_processing_done:
-   /* Clear the status flags for this buffer */
-   status &= ~BD_ENET_RX_STATS;
-
-   /* Mark the buffer empty */
-   status |= BD_ENET_RX_EMPTY;
-
if (fep->bufdesc_ex) {
struct bufdesc_ex *ebdp = (struct bufdesc_ex *)bdp;
 
@@ -1471,7 +1454,8 @@ rx_processing_done:

[PATCH net-next V3 02/16] net: fec: remove unused interrupt FEC_ENET_TS_TIMER

2016-04-05 Thread Troy Kisky

FEC_ENET_TS_TIMER is not checked in the interrupt routine
so there is no need to enable it.

Signed-off-by: Troy Kisky 

---
v3: New patch

Frank Li said "TS_TIMER should never be triggered."
when discussing another patch.
---
 drivers/net/ethernet/freescale/fec.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fec.h b/drivers/net/ethernet/freescale/fec.h
index 195122e..6dd0ba8 100644
--- a/drivers/net/ethernet/freescale/fec.h
+++ b/drivers/net/ethernet/freescale/fec.h
@@ -374,8 +374,8 @@ struct bufdesc_ex {
 #define FEC_ENET_TS_AVAIL   ((uint)0x0001)
 #define FEC_ENET_TS_TIMER   ((uint)0x8000)
 
-#define FEC_DEFAULT_IMASK (FEC_ENET_TXF | FEC_ENET_RXF | FEC_ENET_MII | FEC_ENET_TS_TIMER)
-#define FEC_NAPI_IMASK (FEC_ENET_MII | FEC_ENET_TS_TIMER)
+#define FEC_DEFAULT_IMASK (FEC_ENET_TXF | FEC_ENET_RXF | FEC_ENET_MII)
+#define FEC_NAPI_IMASK FEC_ENET_MII
 #define FEC_RX_DISABLED_IMASK (FEC_DEFAULT_IMASK & (~FEC_ENET_RXF))
 
 /* ENET interrupt coalescing macro define */
-- 
2.5.0

[PATCH net-next V3 06/16] net: fec: split off napi routine with 3 queues

2016-04-05 Thread Troy Kisky

If we only have 1 tx/rx queue, we need not check
the other queues.

Signed-off-by: Troy Kisky 

---
v3: rebase changes only, fep is no longer passed as a parameter to
fec_rxq/fec_txq
---
 drivers/net/ethernet/freescale/fec_main.c | 39 +--
 1 file changed, 37 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index 918ac82..17140ea 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -1524,7 +1524,7 @@ fec_enet_interrupt(int irq, void *dev_id)
return ret;
 }
 
-static int fec_enet_rx_napi(struct napi_struct *napi, int budget)
+static int fec_enet_napi_q3(struct napi_struct *napi, int budget)
 {
struct net_device *ndev = napi->dev;
struct fec_enet_private *fep = netdev_priv(ndev);
@@ -1564,6 +1564,39 @@ static int fec_enet_rx_napi(struct napi_struct *napi, int budget)
return pkts;
 }
 
+static int fec_enet_napi_q1(struct napi_struct *napi, int budget)
+{
+   struct net_device *ndev = napi->dev;
+   struct fec_enet_private *fep = netdev_priv(ndev);
+   int pkts = 0;
+   uint events;
+
+   do {
+   events = readl(fep->hwp + FEC_IEVENT);
+   if (fep->events) {
+   events |= fep->events;
+   fep->events = 0;
+   }
+   events &= FEC_ENET_RXF_0 | FEC_ENET_TXF_0;
+   if (!events) {
+   if (budget) {
+   napi_complete(napi);
+   writel(FEC_DEFAULT_IMASK, fep->hwp + FEC_IMASK);
+   }
+   return pkts;
+   }
+
+   writel(events, fep->hwp + FEC_IEVENT);
+   if (events & FEC_ENET_RXF_0)
+   pkts += fec_rxq(ndev, fep->rx_queue[0],
+   budget - pkts);
+   if (events & FEC_ENET_TXF_0)
+   fec_txq(ndev, fep->tx_queue[0]);
+   } while (pkts < budget);
+   fep->events |= FEC_ENET_RXF_0;  /* save for next callback */
+   return pkts;
+}
+
 /* - */
 static void fec_get_mac(struct net_device *ndev)
 {
@@ -3123,7 +3156,9 @@ static int fec_enet_init(struct net_device *ndev)
ndev->ethtool_ops = _enet_ethtool_ops;
 
writel(FEC_RX_DISABLED_IMASK, fep->hwp + FEC_IMASK);
-	netif_napi_add(ndev, &fep->napi, fec_enet_rx_napi, NAPI_POLL_WEIGHT);
+	netif_napi_add(ndev, &fep->napi, (fep->num_rx_queues |
+		       fep->num_tx_queues) == 1 ? fec_enet_napi_q1 :
+		       fec_enet_napi_q3, NAPI_POLL_WEIGHT);
 
if (fep->quirks & FEC_QUIRK_HAS_VLAN)
/* enable hw VLAN support */
-- 
2.5.0
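The single-queue poll routine in this patch re-reads the event register on every pass, acknowledges what it saw, services RX and TX queue 0, and, when the budget runs out, keeps an RX event in fep->events so the next poll resumes without waiting for another interrupt. A user-space sketch of that control flow, assuming a simulated event register; `struct sim_dev`, `EV_RXF0`/`EV_TXF0`, `sim_rxq()` and `sim_poll()` are hypothetical stand-ins, not driver code.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for FEC_ENET_RXF_0 / FEC_ENET_TXF_0. */
#define EV_RXF0 0x1u
#define EV_TXF0 0x2u

/* Simulated device state (hypothetical, not struct fec_enet_private). */
struct sim_dev {
	unsigned int ievent;   /* pending bits in the simulated event register */
	unsigned int saved;    /* events carried over to the next poll */
	int rx_pending;        /* frames waiting in RX queue 0 */
	int tx_cleaned;        /* number of TX completion passes */
	bool irq_enabled;      /* set when polling completes and IRQs are unmasked */
};

/* Receive at most 'limit' frames from RX queue 0. */
static int sim_rxq(struct sim_dev *d, int limit)
{
	int n = d->rx_pending < limit ? d->rx_pending : limit;

	d->rx_pending -= n;
	return n;
}

/* Same loop structure as fec_enet_napi_q1(): run until no events remain
 * or the budget is used up. */
static int sim_poll(struct sim_dev *d, int budget)
{
	int pkts = 0;
	unsigned int events;

	do {
		events = d->ievent | d->saved;	/* register read + saved events */
		d->saved = 0;
		events &= EV_RXF0 | EV_TXF0;
		if (!events) {
			if (budget)
				d->irq_enabled = true;	/* napi_complete() + unmask */
			return pkts;
		}
		d->ievent &= ~events;	/* acknowledge, like writing FEC_IEVENT */
		if (events & EV_RXF0)
			pkts += sim_rxq(d, budget - pkts);
		if (events & EV_TXF0)
			d->tx_cleaned++;
	} while (pkts < budget);
	d->saved |= EV_RXF0;	/* budget exhausted: resume RX on the next poll */
	return pkts;
}
```

With 100 frames queued and a budget of 64, the first poll returns 64 and keeps the RX event saved with interrupts still masked; the second poll returns the remaining 36 and re-enables interrupts.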

[PATCH net-next V3 00/16] net: fec: cleanup and fixes

2016-04-05 Thread Troy Kisky

V3 has

1 dropped patch "net: fec: print more debug info in fec_timeout"
2 new patches
0002-net-fec-remove-unused-interrupt-FEC_ENET_TS_TIMER.patch
0003-net-fec-return-IRQ_HANDLED-if-fec_ptp_check_pps_even.patch

1 combined patch
0004-net-fec-pass-rxq-txq-to-fec_enet_rx-tx_queue-instead.patch

The changes are noted in the individual patches.

My measured performance of this series is:

before patch set
365 Mbit/s Tx / 407 Mbit/s Rx

after patch set
374 Mbit/s Tx / 427 Mbit/s Rx
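Expressed as relative gains, the measurements above amount to roughly 2.5% more Tx and 4.9% more Rx throughput. The arithmetic as a trivial helper; the name `pct_gain()` is illustrative only.

```c
/* Relative change, in percent, between two throughput measurements. */
static double pct_gain(double before, double after)
{
	return (after - before) / before * 100.0;
}
```

For example, pct_gain(365.0, 374.0) is about 2.47 and pct_gain(407.0, 427.0) is about 4.91.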


Troy Kisky (16):
  net: fec: only check queue 0 if RXF_0/TXF_0 interrupt is set
  net: fec: remove unused interrupt FEC_ENET_TS_TIMER
  net: fec: return IRQ_HANDLED if fec_ptp_check_pps_event handled it
  net: fec: pass rxq/txq to fec_enet_rx/tx_queue instead of queue_id
  net: fec: reduce interrupts
  net: fec: split off napi routine with 3 queues
  net: fec: don't clear all rx queue bits when just one is being checked
  net: fec: set cbd_sc without relying on previous value
  net: fec: eliminate calls to fec_enet_get_prevdesc
  net: fec: move restart test for efficiency
  net: fec: clear cbd_sc after transmission to help with debugging
  net: fec: dump all tx queues in fec_dump
  net: fec: detect tx int lost
  net: fec: create subroutine reset_tx_queue
  net: fec: call dma_unmap_single on mapped tx buffers at restart
  net: fec: don't set cbd_bufaddr unless no mapping error

 drivers/net/ethernet/freescale/fec.h  |  10 +-
 drivers/net/ethernet/freescale/fec_main.c | 410 --
 2 files changed, 218 insertions(+), 202 deletions(-)

-- 
2.5.0

[PATCH net-next V3 04/16] net: fec: pass rxq/txq to fec_enet_rx/tx_queue instead of queue_id

2016-04-05 Thread Troy Kisky

The queue_id is the qid member of struct bufdesc_prop.
Passing rxq/txq will allow the macro FEC_ENET_GET_QUQUE to be removed
in the next patch.

Signed-off-by: Troy Kisky 
Acked-by: Fugang Duan 

---
v3:
add Acked-by
combine with "net: fec: pass txq to fec_enet_tx_queue instead of queue_id"
reverted change that passed fep as a parameter
---
 drivers/net/ethernet/freescale/fec_main.c | 29 +++--
 1 file changed, 11 insertions(+), 18 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index 7993040..b4d46f8 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -1156,25 +1156,18 @@ fec_enet_hwtstamp(struct fec_enet_private *fep, unsigned ts,
hwtstamps->hwtstamp = ns_to_ktime(ns);
 }
 
-static void
-fec_enet_tx_queue(struct net_device *ndev, u16 queue_id)
+static void fec_txq(struct net_device *ndev, struct fec_enet_priv_tx_q *txq)
 {
-   struct  fec_enet_private *fep;
+   struct  fec_enet_private *fep = netdev_priv(ndev);
struct bufdesc *bdp;
unsigned short status;
struct  sk_buff *skb;
-   struct fec_enet_priv_tx_q *txq;
struct netdev_queue *nq;
int index = 0;
int entries_free;
 
-   fep = netdev_priv(ndev);
-
-   queue_id = FEC_ENET_GET_QUQUE(queue_id);
-
-   txq = fep->tx_queue[queue_id];
/* get next bdp of dirty_tx */
-   nq = netdev_get_tx_queue(ndev, queue_id);
+   nq = netdev_get_tx_queue(ndev, txq->bd.qid);
bdp = txq->dirty_tx;
 
/* get next bdp of dirty_tx */
@@ -1268,11 +1261,13 @@ static void
 fec_enet_tx(struct net_device *ndev)
 {
struct fec_enet_private *fep = netdev_priv(ndev);
+   struct fec_enet_priv_tx_q *txq;
u16 queue_id;
/* First process class A queue, then Class B and Best Effort queue */
for_each_set_bit(queue_id, &fep->work_tx, FEC_ENET_MAX_TX_QS) {
clear_bit(queue_id, &fep->work_tx);
-   fec_enet_tx_queue(ndev, queue_id);
+   txq = fep->tx_queue[FEC_ENET_GET_QUQUE(queue_id)];
+   fec_txq(ndev, txq);
}
return;
 }
@@ -1328,11 +1323,10 @@ static bool fec_enet_copybreak(struct net_device *ndev, struct sk_buff **skb,
  * not been given to the system, we just set the empty indicator,
  * effectively tossing the packet.
  */
-static int
-fec_enet_rx_queue(struct net_device *ndev, int budget, u16 queue_id)
+static int fec_rxq(struct net_device *ndev, struct fec_enet_priv_rx_q *rxq,
+  int budget)
 {
struct fec_enet_private *fep = netdev_priv(ndev);
-   struct fec_enet_priv_rx_q *rxq;
struct bufdesc *bdp;
unsigned short status;
struct  sk_buff *skb_new = NULL;
@@ -1350,8 +1344,6 @@ fec_enet_rx_queue(struct net_device *ndev, int budget, u16 queue_id)
 #ifdef CONFIG_M532x
flush_cache_all();
 #endif
-   queue_id = FEC_ENET_GET_QUQUE(queue_id);
-   rxq = fep->rx_queue[queue_id];
 
/* First, grab all of the stats for the incoming packet.
 * These get messed up if we get called due to a busy condition.
@@ -1519,11 +1511,12 @@ fec_enet_rx(struct net_device *ndev, int budget)
int pkt_received = 0;
u16 queue_id;
struct fec_enet_private *fep = netdev_priv(ndev);
+   struct fec_enet_priv_rx_q *rxq;
 
for_each_set_bit(queue_id, &fep->work_rx, FEC_ENET_MAX_RX_QS) {
clear_bit(queue_id, &fep->work_rx);
-   pkt_received += fec_enet_rx_queue(ndev,
-   budget - pkt_received, queue_id);
+   rxq = fep->rx_queue[FEC_ENET_GET_QUQUE(queue_id)];
+   pkt_received += fec_rxq(ndev, rxq, budget - pkt_received);
}
return pkt_received;
 }
-- 
2.5.0
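With this patch the per-queue workers receive the queue pointer directly, and the work-bit to queue-index translation happens once, in fec_enet_tx()/fec_enet_rx(). Patch 01/16 in this thread shows the bit assignment (RXF_1 -> bit 0, RXF_2 -> bit 1, RXF_0 -> bit 2), so an ascending scan of the work bits services queue 1, then queue 2, then queue 0. A user-space sketch of that dispatch; `work_bit_to_queue()` is a hypothetical helper derived from that bit assignment (not a copy of the driver's FEC_ENET_GET_QUQUE() macro), and `struct rxq` stands in for struct fec_enet_priv_rx_q.

```c
#define NUM_QUEUES 3

/* Hypothetical stand-in for struct fec_enet_priv_rx_q. */
struct rxq {
	int qid;        /* queue index, like bd.qid in the driver */
	int serviced;   /* how many times this queue was polled */
};

/* Work bit -> queue index, following the assignment in patch 01/16:
 * bit 0 <- RXF_1, bit 1 <- RXF_2, bit 2 <- RXF_0. */
static int work_bit_to_queue(int bit)
{
	return bit == 2 ? 0 : bit + 1;
}

/* The per-queue worker only sees the queue it must service. */
static void service_rxq(struct rxq *q)
{
	q->serviced++;
}

/* Scan work bits in ascending order, clear each one, service the mapped
 * queue and record the visit order. Returns the number of queues visited. */
static int service_all(struct rxq queues[NUM_QUEUES], unsigned long *work,
		       int order[NUM_QUEUES])
{
	int n = 0;

	for (int bit = 0; bit < NUM_QUEUES; bit++) {
		struct rxq *q;

		if (!(*work & (1UL << bit)))
			continue;
		*work &= ~(1UL << bit);
		q = &queues[work_bit_to_queue(bit)];
		service_rxq(q);
		order[n++] = q->qid;
	}
	return n;
}
```

The lookup from bit to queue lives in one place (the dispatcher), while service_rxq(), like fec_rxq()/fec_txq() after this patch, never sees a raw queue_id.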

[PATCH net-next V3 01/16] net: fec: only check queue 0 if RXF_0/TXF_0 interrupt is set

2016-04-05 Thread Troy Kisky

Previously, queue 0 was always checked if any queue caused an interrupt.
It is better to mark queue 0 only if queue 0 itself caused an interrupt.

Signed-off-by: Troy Kisky 
Acked-by: Fugang Duan 

---
v3: add Acked-by
---
 drivers/net/ethernet/freescale/fec_main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index 08243c2..a011719 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -1534,14 +1534,14 @@ fec_enet_collect_events(struct fec_enet_private *fep, uint int_events)
if (int_events == 0)
return false;
 
-   if (int_events & FEC_ENET_RXF)
+   if (int_events & FEC_ENET_RXF_0)
fep->work_rx |= (1 << 2);
if (int_events & FEC_ENET_RXF_1)
fep->work_rx |= (1 << 0);
if (int_events & FEC_ENET_RXF_2)
fep->work_rx |= (1 << 1);
 
-   if (int_events & FEC_ENET_TXF)
+   if (int_events & FEC_ENET_TXF_0)
fep->work_tx |= (1 << 2);
if (int_events & FEC_ENET_TXF_1)
fep->work_tx |= (1 << 0);
-- 
2.5.0
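The commit message implies that FEC_ENET_RXF matches an RX event on any queue, so testing it marked queue 0 for service whenever queue 1 or 2 interrupted. The sketch contrasts the old and new tests. The event bit values and the combined `EV_RXF_ANY` mask are hypothetical placeholders rather than the FEC register layout; the work-bit assignment (queue 0 -> bit 2, queue 1 -> bit 0, queue 2 -> bit 1) is taken from the hunk above.

```c
/* Hypothetical per-queue RX event bits (not the FEC register layout). */
#define EV_RXF_0   (1u << 0)
#define EV_RXF_1   (1u << 1)
#define EV_RXF_2   (1u << 2)
#define EV_RXF_ANY (EV_RXF_0 | EV_RXF_1 | EV_RXF_2)

/* Build the RX work mask with the driver's bit assignment:
 * queue 0 -> bit 2, queue 1 -> bit 0, queue 2 -> bit 1.
 * 'queue0_test' is the mask checked for queue 0: EV_RXF_ANY models the
 * old test, EV_RXF_0 the new one. */
static unsigned int collect_rx_work(unsigned int int_events,
				    unsigned int queue0_test)
{
	unsigned int work = 0;

	if (int_events & queue0_test)
		work |= 1u << 2;
	if (int_events & EV_RXF_1)
		work |= 1u << 0;
	if (int_events & EV_RXF_2)
		work |= 1u << 1;
	return work;
}
```

With only queue 1 interrupting, the old test yields work bits 0 and 2 (queue 0 is polled needlessly), while the new test yields bit 0 alone.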

Re: [PATCH net-next] net: dsa: document missing functions

2016-04-05 Thread Vivien Didelot

Hi Andrew,

Andrew Lunn  writes:

> On Tue, Apr 05, 2016 at 11:22:40AM -0400, Vivien Didelot wrote:
>> Add description for the missing port_vlan_prepare, port_fdb_prepare,
>> port_fdb_dump functions in the DSA documentation.
>> 
>> Signed-off-by: Vivien Didelot 
>
> Hi Vivien
>
> A few English improvements:
>
>> ---
>>  Documentation/networking/dsa/dsa.txt | 16 
>>  1 file changed, 16 insertions(+)
>> 
>> diff --git a/Documentation/networking/dsa/dsa.txt b/Documentation/networking/dsa/dsa.txt
>> index 3b196c3..8ba3369 100644
>> --- a/Documentation/networking/dsa/dsa.txt
>> +++ b/Documentation/networking/dsa/dsa.txt
>> @@ -542,6 +542,12 @@ Bridge layer
>>  Bridge VLAN filtering
>>  -
>>  
>> +- port_vlan_prepare: bridge layer function invoked when the bridge prepares the
>> +  configuration of a VLAN on the given port. If the operation is not
>> +  programmable, this function should return -EOPNOTSUPP to inform the bridge
>
> s/programmable/supported by the hardware
>
>> +  code to fallback to a software implementation. No hardware programmation
>
> s/programmation/setup
>
>> +  must be done in this function. See port_vlan_add for this and details.
>> +
>>  - port_vlan_add: bridge layer function invoked when a VLAN is configured
>>(tagged or untagged) for the given switch port
>>  
>> @@ -552,6 +558,12 @@ Bridge VLAN filtering
>>function that the driver has to call for each VLAN the given port is a member
>>of. A switchdev object is used to carry the VID and bridge flags.
>>  
>> +- port_fdb_prepare: bridge layer function invoked when the bridge prepares the
>> +  installation of a Forwarding Database entry. If the operation is not
>> +  programmable, this function should return -EOPNOTSUPP to inform the bridge
>
> s/programmable/supported
>
>> +  code to fallback to a software implementation. No hardware programmation
>
> s/programmation/setup

Done, v2 on its way.

Thanks,
Vivien
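The documented contract is two-phase: the prepare callback only decides whether the hardware can take the object and returns -EOPNOTSUPP if it cannot, in which case the bridge keeps handling it in software; nothing is written to hardware until the add callback runs. A user-space sketch of that contract with hypothetical types and functions (`struct fake_switch`, `fdb_prepare()`, `fdb_add()`, `bridge_add_fdb()`) that only loosely mirror the DSA/switchdev callbacks; the -ENOSPC "table full" case is an added assumption to show that other errors are also meant to surface during prepare.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical switch model with a small hardware FDB. */
struct fake_switch {
	bool has_fdb;   /* can the hardware offload FDB entries at all? */
	int fdb_used;
	int fdb_size;
};

/* Prepare phase: decide whether the entry can be offloaded.
 * Nothing is written to the (simulated) hardware here. */
static int fdb_prepare(const struct fake_switch *sw)
{
	if (!sw->has_fdb)
		return -EOPNOTSUPP;	/* bridge keeps the entry in software */
	if (sw->fdb_used >= sw->fdb_size)
		return -ENOSPC;		/* hypothetical: table full, reject now */
	return 0;
}

/* Commit phase: program the hardware. */
static void fdb_add(struct fake_switch *sw)
{
	sw->fdb_used++;
}

/* Caller side: 1 = offloaded, 0 = software fallback, < 0 = error. */
static int bridge_add_fdb(struct fake_switch *sw)
{
	int err = fdb_prepare(sw);

	if (err == -EOPNOTSUPP)
		return 0;
	if (err)
		return err;
	fdb_add(sw);
	return 1;
}
```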

Re: [RFC PATCH 5/6] ppp: define reusable device creation functions

2016-04-05 Thread Stephen Hemminger

On Tue, 5 Apr 2016 23:14:56 +0200
Guillaume Nault  wrote:

> On Tue, Apr 05, 2016 at 08:28:32AM -0700, Stephen Hemminger wrote:
> > On Tue, 5 Apr 2016 02:56:29 +0200
> > Guillaume Nault  wrote:
> > 
> > > Move PPP device initialisation and registration out of
> > > ppp_create_interface().
> > > This prepares code for device registration with rtnetlink.
> > > 
> > 
> > Does PPP module autoload correctly based on the netlink attributes?
> > 
> Patch #6 has MODULE_ALIAS_RTNL_LINK("ppp"). This works fine for
> auto-loading ppp_generic when creating a PPP device with rtnetlink.
> Is there anything else required?
> 

That should be enough.

Re: [RFC PATCH net 3/4] ipv6: datagram: Update dst cache of a connected datagram sk during pmtu update

2016-04-05 Thread Martin KaFai Lau

On Mon, Apr 04, 2016 at 01:45:02PM -0700, Cong Wang wrote:
> I see your point, but calling __ip6_datagram_connect() seems overkill
> here, we don't need to update so many things in the pmtu update context,
> at least IPv4 doesn't do that either. I don't think you have to do that.
>
> So why is just updating the dst cache (and some addr cache) here not
> enough?
I am not sure I understand.  I could be missing something.

This patch uses ip6_datagram_dst_update() to do the route lookup and
sk->sk_dst_cache update.  ip6_datagram_dst_update() is
created in the first two refactoring patches and is also used by
__ip6_datagram_connect().

Which operations in ip6_datagram_dst_update() could be saved
during the pmtu update?

Re: [net PATCH v2 2/2] ipv4/GRO: Make GRO conform to RFC 6864

2016-04-05 Thread Marcelo Ricardo Leitner

On Tue, Apr 05, 2016 at 12:36:40PM -0300, Tom Herbert wrote:
> On Tue, Apr 5, 2016 at 12:07 PM, Edward Cree  wrote:
> > On 05/04/16 05:32, Herbert Xu wrote:
> >> On Mon, Apr 04, 2016 at 09:26:55PM -0700, Alexander Duyck wrote:
> >>> The question I would have is what are you really losing with increment
> >>> from 0 versus fixed 0?  From what I see it is essentially just garbage
> >>> in/garbage out.
> >> GRO is meant to be lossless, that is, you should not be able to
> >> detect its presence from the outside.  If you lose information then
> >> you're breaking this rule and people will soon start asking for it
> >> to be disabled in various situations.
> >>
> >> I'm not against doing this per se but it should not be part of the
> >> default configuration.
> > I'm certainly in favour of this being configurable - indeed IMHO it should
> > also be possible to configure GRO with the 'looser' semantics of LRO, so
> > that people who want that can get it without all the horrible "don't confuse
> > Slow Start" hacks, and so that LRO can go away (AIUI the only reasons it
> > exists are (a) improved performance from the 'loose' semantics and (b) old
> > kernels without GRO.  We may not be able to kill (b) but we can certainly
> > address (a)).
> >
> > But I don't agree that the default has to be totally lossless; anyone who is
> > caring about the ID fields in atomic datagrams is breaking the RFCs, and can
> > be assumed to Know What They're Doing sufficiently to configure this.
> >
> > On the gripping hand, I feel like GRO+TSO is the wrong model for speeding up
> > forwarding/routing workloads.  Instead we should be looking into having lists
> > of SKBs traverse the stack together, splitting the list whenever e.g. the
> > destination changes.  That seems like it ought to be much more efficient than
> > rewriting headers twice, once to coalesce a superframe and once to segment it
> > again - and it also means this worry about GRO being lossless can go away.
> > But until someone tries implementing skb batches, we won't know for sure if
> > it works (and I don't have time right now ;)
> >
> Ed,
> 
> I thought about that some. It seems like we would want to do both GRO
> and retain all the individual packets in the skb so that we could use
> those for forwarding instead of GSO as I think you're saying. This

Retaining the individual packets would also help to make GRO feasible
for SCTP.  SCTP needs to know where each packet ended because of AUTH
chunks and we cannot rely on something like gso_size as each original
packet had its own size.

I could do it for tx side (see my SCTP/GSO RFC patches) using
skb_gro_receive() with a specially crafted header skb, but I'm not
seeing a way to do it in rx side as I cannot guarantee incoming skbs
will follow that pattern.

  Marcelo

> would work great in the plain forwarding case, but one problem
> is what to do if the host modifies the super packet (for instance when
> forwarding over a tunnel we might add encapsulation header). This
> should work in GSO (although we need to address the limitations around
> 1 encap level), not sure this is easy if we need to add a header to
> each packet in a batch.
> 
> Tom
> 
> 
> 
> > -Ed
>
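The disagreement is over how strictly GRO should check the IPv4 ID field when coalescing. RFC 6864 says the ID of an atomic datagram (DF set, not fragmented) carries no meaning, so a sender may keep it fixed, e.g. at 0, instead of incrementing it. The sketch contrasts a strict rule (IDs must increase by one, so the original IDs can be regenerated exactly on resegmentation) with a looser rule that also accepts a constant ID when DF is set. The enum, struct and `ids_mergeable()` are hypothetical, fragments are assumed to have been excluded already, and this is not the logic of the patch under discussion.

```c
#include <stdbool.h>
#include <stdint.h>

enum id_policy { ID_STRICT_INCREMENT, ID_ALLOW_FIXED_IF_ATOMIC };

/* Minimal view of the IPv4 fields that matter here. */
struct ipv4_meta {
	uint16_t id;
	bool df;        /* Don't Fragment */
};

/* May 'next' be coalesced directly after 'prev'? */
static bool ids_mergeable(const struct ipv4_meta *prev,
			  const struct ipv4_meta *next, enum id_policy policy)
{
	uint16_t expected = (uint16_t)(prev->id + 1);	/* wraps at 65535 */

	if (prev->df != next->df)
		return false;
	if (next->id == expected)
		return true;	/* lossless: IDs can be regenerated exactly */
	/* RFC 6864: the ID of an atomic datagram has no meaning. */
	return policy == ID_ALLOW_FIXED_IF_ATOMIC && next->df &&
	       next->id == prev->id;
}
```

Under the loose rule, a run of DF packets with ID 0 coalesces, but resegmenting it with incrementing IDs would no longer reproduce the original headers, which is the "lossless" property Herbert wants to keep in the default configuration.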

Re: [PATCH v3 -next] net/core/dev: Warn on a too-short GRO frame

2016-04-05 Thread David Miller

From: Aaron Conole 
Date: Sat,  2 Apr 2016 15:26:43 -0400

> From: Aaron Conole 
> 
> When signaling that a GRO frame is ready to be processed, the network stack
> correctly checks length and aborts processing when a frame is less than 14
> bytes. However, such a condition is really indicative of a broken driver,
> and should be loudly signaled, rather than silently dropped as is the case
> today.
> 
> Convert the condition to use net_warn_ratelimited() to ensure the stack
> loudly complains about such broken drivers.
> 
> Signed-off-by: Aaron Conole 

Applied, thanks.

Re: [PATCH v2] net: remove unimplemented RTNH_F_PERVASIVE

2016-04-05 Thread David Miller

From: Quentin Armitage 
Date: Sat,  2 Apr 2016 17:51:28 +0100

> Linux 2.1.68 introduced RTNH_F_PERVASIVE, but it had no implementation
> and couldn't be enabled since the required config parameter wasn't in
> any Kconfig file (see commit d088dde7b196 ("ipv4: obsolete config in
> kernel source (IP_ROUTE_PERVASIVE)")).
> 
> This commit removes all remaining references to RTNH_F_PERVASIVE.
> Although this will cause userspace applications that were using the
> flag to fail to build, they will be alerted to the fact that using
> RTNH_F_PERVASIVE was not achieving anything.
> 
> Signed-off-by: Quentin Armitage 

Can't really delete values like this from user-visible headers.  It
can break the build.

What if some library or tool has a table translating RTNH_F_* values
into strings to display to the user?  Those sources will stop building
if I apply your changes.
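The concern is easy to make concrete: user-space code that prints next-hop flags typically keys a table on the RTNH_F_* constants from the UAPI header, so removing one of them becomes a compile error in that code. A minimal example of such a table using the constants from <linux/rtnetlink.h>; the table and `rtnh_flag_name()` are illustrative and not taken from any particular tool.

```c
#include <stddef.h>
#include <linux/rtnetlink.h>

/* Removing RTNH_F_PERVASIVE from the UAPI header breaks this table. */
static const struct {
	unsigned int flag;
	const char *name;
} rtnh_flag_names[] = {
	{ RTNH_F_DEAD,      "dead" },
	{ RTNH_F_PERVASIVE, "pervasive" },
	{ RTNH_F_ONLINK,    "onlink" },
};

/* Return the display name of a single flag, or NULL if unknown. */
static const char *rtnh_flag_name(unsigned int flag)
{
	for (size_t i = 0; i < sizeof(rtnh_flag_names) / sizeof(rtnh_flag_names[0]); i++)
		if (rtnh_flag_names[i].flag == flag)
			return rtnh_flag_names[i].name;
	return NULL;
}
```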

Re: [RFC PATCH net 3/4] ipv6: datagram: Update dst cache of a connected datagram sk during pmtu update

2016-04-05 Thread David Miller

From: Cong Wang 
Date: Mon, 4 Apr 2016 13:45:02 -0700

> On Sat, Apr 2, 2016 at 7:33 PM, Martin KaFai Lau  wrote:
>> One thing to note is that this patch uses the addresses from the sk
>> instead of iph when updating sk->sk_dst_cache.  It is basically the
>> same logic that __ip6_datagram_connect() is doing, hence the
>> refactoring work in the first two patches.
>>
>> AFAIK, a UDP socket can become connected after sending out some
>> datagrams in the unconnected state, or it can be connected
>> multiple times to different destinations.  I did some quick
>> tests but I could be wrong.
>>
>> I am thinking if there could be a chance that the skb->data, which
>> has the original outgoing iph, is not related to the current
>> connected address.  If it is possible, we have to specifically
>> use the addresses in the sk instead of skb->data (i.e. iph) when
>> updating the sk->sk_dst_cache.
>>
>> If we need to use the sk addresses (and other info) to find out a
>> new dst for a connected udp socket, it is better not doing it while
>> the userland is connecting to somewhere else.
>>
>> If the above case is impossible, we can keep using the info from iph to
>> do the dst update for a connected-udp sk without taking the lock.
> 
> I see your point, but calling __ip6_datagram_connect() seems overkill
> here, we don't need to update so many things in the pmtu update context,
> at least IPv4 doesn't do that either. I don't think you have to do that.
> 
> So why is just updating the dst cache (and some addr cache) here not
> enough?

I think we are steadily getting closer to a version of this fix that
we have some agreement on, right?

Martin can you address Cong's feedback and spin another version of this
series?

Thanks.
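Martin's premise (a UDP socket can send unconnected datagrams and later be connected, and can be re-connected to a different destination) can be checked from user space. A small POSIX sketch over IPv4 loopback, chosen only because it is always available; the same connect() rules apply to AF_INET6 datagram sockets. The port numbers are arbitrary and nothing needs to be listening, since connect() on a UDP socket just records the default peer (and picks a route and local address) without exchanging packets. `loopback()` and `udp_reconnect_demo()` are hypothetical helpers.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

static struct sockaddr_in loopback(unsigned short port)
{
	struct sockaddr_in sa;

	memset(&sa, 0, sizeof(sa));
	sa.sin_family = AF_INET;
	sa.sin_port = htons(port);
	sa.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	return sa;
}

/* Returns 0 if an unconnected send, a connect() and a re-connect() to a
 * different peer all succeed and the socket ends up with the new peer. */
static int udp_reconnect_demo(void)
{
	struct sockaddr_in a = loopback(40001), b = loopback(40002), peer;
	socklen_t len = sizeof(peer);
	int ret = -1;
	int fd = socket(AF_INET, SOCK_DGRAM, 0);

	if (fd < 0)
		return -1;
	/* 1. Send while unconnected. */
	if (sendto(fd, "x", 1, 0, (struct sockaddr *)&a, sizeof(a)) != 1)
		goto out;
	/* 2. Connect to A, then 3. re-connect to B. */
	if (connect(fd, (struct sockaddr *)&a, sizeof(a)) < 0)
		goto out;
	if (connect(fd, (struct sockaddr *)&b, sizeof(b)) < 0)
		goto out;
	/* The socket now reports B as its peer. */
	if (getpeername(fd, (struct sockaddr *)&peer, &len) < 0)
		goto out;
	ret = (peer.sin_port == b.sin_port) ? 0 : -1;
out:
	close(fd);
	return ret;
}
```

This is why the cached route and the addresses in the socket can legitimately differ from those in an earlier outgoing packet's IPv6 header.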

Re: [net-next PATCH 2/2 v4] ibmvnic: enable RX checksum offload

2016-04-05 Thread David Miller

From: Thomas Falcon 
Date: Fri,  1 Apr 2016 17:20:35 -0500

> Enable RX Checksum offload feature in the ibmvnic driver.
> 
> Signed-off-by: Thomas Falcon 

Applied.

Re: [PATCH net-next 2/3] net: dsa: make the FDB add function return void

2016-04-05 Thread Andrew Lunn

On Tue, Apr 05, 2016 at 11:24:34AM -0400, Vivien Didelot wrote:
> The switchdev design implies that a software error should not happen in
> the commit phase since it must have been previously reported in the
> prepare phase. If a hardware error occurs during the commit phase,
> there is nothing switchdev can do about it.
> 
> The DSA layer separates port_fdb_prepare and port_fdb_add for simplicity
> and convenience. If a hardware error occurs during the commit phase,
> there is no need to report it outside the DSA driver itself.
> 
> Make the DSA port_fdb_add routine return void for explicitness.
> 
> Signed-off-by: Vivien Didelot 
> ---
>  drivers/net/dsa/bcm_sf2.c   |  9 +
>  drivers/net/dsa/mv88e6xxx.c | 12 +---
>  drivers/net/dsa/mv88e6xxx.h |  6 +++---
>  include/net/dsa.h   |  2 +-
>  net/dsa/slave.c | 16 
>  5 files changed, 22 insertions(+), 23 deletions(-)
> 
> diff --git a/drivers/net/dsa/bcm_sf2.c b/drivers/net/dsa/bcm_sf2.c
> index b847624..feebeaa 100644
> --- a/drivers/net/dsa/bcm_sf2.c
> +++ b/drivers/net/dsa/bcm_sf2.c
> @@ -722,13 +722,14 @@ static int bcm_sf2_sw_fdb_prepare(struct dsa_switch *ds, int port,
>   return 0;
>  }
>  
> -static int bcm_sf2_sw_fdb_add(struct dsa_switch *ds, int port,
> -   const struct switchdev_obj_port_fdb *fdb,
> -   struct switchdev_trans *trans)
> +static void bcm_sf2_sw_fdb_add(struct dsa_switch *ds, int port,
> +const struct switchdev_obj_port_fdb *fdb,
> +struct switchdev_trans *trans)
>  {
>   struct bcm_sf2_priv *priv = ds_to_priv(ds);
>  
> - return bcm_sf2_arl_op(priv, 0, port, fdb->addr, fdb->vid, true);
> + if (bcm_sf2_arl_op(priv, 0, port, fdb->addr, fdb->vid, true))
> + pr_err("%s: failed to add address\n", __func__);
>  }
>  
>  static int bcm_sf2_sw_fdb_del(struct dsa_switch *ds, int port,
> diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
> index 5a2e46d..bca9a2c 100644
> --- a/drivers/net/dsa/mv88e6xxx.c
> +++ b/drivers/net/dsa/mv88e6xxx.c
> @@ -2090,21 +2090,19 @@ int mv88e6xxx_port_fdb_prepare(struct dsa_switch *ds, int port,
>   return 0;
>  }
>  
> -int mv88e6xxx_port_fdb_add(struct dsa_switch *ds, int port,
> -const struct switchdev_obj_port_fdb *fdb,
> -struct switchdev_trans *trans)
> +void mv88e6xxx_port_fdb_add(struct dsa_switch *ds, int port,
> + const struct switchdev_obj_port_fdb *fdb,
> + struct switchdev_trans *trans)
>  {
>   int state = is_multicast_ether_addr(fdb->addr) ?
>   GLOBAL_ATU_DATA_STATE_MC_STATIC :
>   GLOBAL_ATU_DATA_STATE_UC_STATIC;
>   struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
> - int ret;
>  
>   mutex_lock(&ps->smi_mutex);
> - ret = _mv88e6xxx_port_fdb_load(ds, port, fdb->addr, fdb->vid, state);
> + if (_mv88e6xxx_port_fdb_load(ds, port, fdb->addr, fdb->vid, state))
> + netdev_warn(ds->ports[port], "cannot load address\n");

In the SF2 driver you use pr_err, but here netdev_warn. We should
probably be consistent about whether we error or warn. I would use
netdev_err(), since if this fails we probably have a real hardware problem.

  Andrew
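The design argument is that once the prepare phase has succeeded, a failure while committing can only be a hardware problem that switchdev cannot roll back, so the commit callback may as well return void and log the failure. A user-space sketch of that shape with hypothetical `struct fake_hw`, `hw_load_entry()` and `commit_fdb_add()`; following the review comment it logs at error level (in a driver that would be something like netdev_err()), here simply to stderr.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical hardware model. */
struct fake_hw {
	bool broken;        /* simulate a failing register write */
	int entries;
	int errors_logged;
};

/* Hypothetical low-level write; returns 0 or a negative error code. */
static int hw_load_entry(struct fake_hw *hw)
{
	if (hw->broken)
		return -EIO;
	hw->entries++;
	return 0;
}

/* Commit phase: there is nobody to report to, so log and move on. */
static void commit_fdb_add(struct fake_hw *hw, int port)
{
	int err = hw_load_entry(hw);

	if (err) {
		fprintf(stderr, "port %d: failed to load FDB entry (%d)\n",
			port, err);
		hw->errors_logged++;
	}
}
```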

Re: [net-next PATCH 1/2 v4] ibmvnic: map L2/L3/L4 header descriptors to firmware

2016-04-05 Thread David Miller

From: Thomas Falcon 
Date: Fri,  1 Apr 2016 17:20:34 -0500

> Allow the VNIC driver to provide descriptors containing
> L2/L3/L4 headers to firmware.  This feature is needed
> for greater hardware compatibility and enablement of checksum
> and TCP offloading features.
> 
> A new function is included for the hypervisor call,
> H_SEND_SUBCRQ_INDIRECT, allowing a DMA-mapped array of SCRQ
> descriptor elements to be sent to the VNIC server.
> 
> These additions will help fully enable checksum offloading as
> well as other features as they are included later.
> 
> Signed-off-by: Thomas Falcon 

Applied.

Re: [PATCH] ip6_tunnel: set rtnl_link_ops before calling register_netdevice

2016-04-05 Thread David Miller

From: Thadeu Lima de Souza Cascardo 
Date: Fri,  1 Apr 2016 17:17:50 -0300

> When creating an ip6tnl tunnel with ip tunnel, rtnl_link_ops is not set
> before ip6_tnl_create2 is called. When register_netdevice is called, there
> is no linkinfo attribute in the NEWLINK message because of that.
> 
> Setting rtnl_link_ops before calling register_netdevice fixes that.
> 
> Signed-off-by: Thadeu Lima de Souza Cascardo 

Applied and queued up for -stable.

Re: [PATCH net-next 1/3] net: dsa: make the STP state function return void

2016-04-05 Thread Andrew Lunn

> -- port_stp_update: bridge layer function invoked when a given switch port STP
> +- port_stp_state: bridge layer function invoked when a given switch port STP

Hi Vivien

port_stp_state_set might be a better name, to make it clear it is
setting the state, not getting the current state, etc. Most of the
other functions are _add, _prepare, _join, _leave, so _set would fit
the pattern.

Changing to a void makes sense.

 Andrew

Re: [net PATCH v2 2/2] ipv4/GRO: Make GRO conform to RFC 6864

2016-04-05 Thread David Miller

From: Edward Cree 
Date: Tue, 5 Apr 2016 16:07:49 +0100

> On the gripping hand, I feel like GRO+TSO is the wrong model for
> speeding up forwarding/routing workloads.  Instead we should be
> looking into having lists of SKBs traverse the stack together,
> splitting the list whenever e.g. the destination changes.

"Destination" is a very complicated beast.  It's not just a
destination IP address.

It's not even just a full saddr/daddr/TOS triplet.

Packets can be forwarded around based upon any key whatsoever in the
headers.  Netfilter can mangle them based upon arbitrary bits in the
packet, as can the packet scheduler classifier actions.

It's therefore not profitable to try this at all, it's completely
pointless unless all the keys match up exactly.

This is why GRO _is_ the proper model to speed this stuff and do
bulk processing, because it still presents a full "packet" to all
of these layers to mangle, rewrite, route, and do whatever else
however they like.
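Edward's proposal and the objection to it can be illustrated together: a batch of packets can stay together only while every header field that a later layer might match on is identical, so the split test has to compare the full key, not just the destination address. A sketch that splits a run of packets into batches whenever the key changes; `struct pkt_key` is a hypothetical and deliberately incomplete key (real routing, netfilter and classifier decisions can depend on arbitrary header bits), and `split_batches()` is illustrative only.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, incomplete key: a real one would have to cover every
 * header field that any later layer might match on. */
struct pkt_key {
	uint32_t saddr, daddr;
	uint8_t tos, ttl, proto;
	uint16_t sport, dport;
};

/* Field-by-field compare (memcmp would also compare struct padding). */
static bool key_equal(const struct pkt_key *a, const struct pkt_key *b)
{
	return a->saddr == b->saddr && a->daddr == b->daddr &&
	       a->tos == b->tos && a->ttl == b->ttl &&
	       a->proto == b->proto && a->sport == b->sport &&
	       a->dport == b->dport;
}

/* Split pkts[0..n) into runs of identical keys; store each run's length
 * in run_len[] and return the number of runs. */
static size_t split_batches(const struct pkt_key *pkts, size_t n,
			    size_t *run_len)
{
	size_t runs = 0;

	for (size_t i = 0; i < n; i++) {
		if (i == 0 || !key_equal(&pkts[i - 1], &pkts[i]))
			run_len[runs++] = 0;
		run_len[runs - 1]++;
	}
	return runs;
}
```

In a run where one packet differs from its neighbours only in TTL, the batch is split three ways, which illustrates how quickly batches fragment once the key is complete.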

Re: [PATCH net-next] net: dsa: document missing functions

2016-04-05 Thread Andrew Lunn

On Tue, Apr 05, 2016 at 11:22:40AM -0400, Vivien Didelot wrote:
> Add description for the missing port_vlan_prepare, port_fdb_prepare,
> port_fdb_dump functions in the DSA documentation.
> 
> Signed-off-by: Vivien Didelot 

Hi Vivien

A few English improvements:

> ---
>  Documentation/networking/dsa/dsa.txt | 16 
>  1 file changed, 16 insertions(+)
> 
> diff --git a/Documentation/networking/dsa/dsa.txt b/Documentation/networking/dsa/dsa.txt
> index 3b196c3..8ba3369 100644
> --- a/Documentation/networking/dsa/dsa.txt
> +++ b/Documentation/networking/dsa/dsa.txt
> @@ -542,6 +542,12 @@ Bridge layer
>  Bridge VLAN filtering
>  -
>  
> +- port_vlan_prepare: bridge layer function invoked when the bridge prepares the
> +  configuration of a VLAN on the given port. If the operation is not
> +  programmable, this function should return -EOPNOTSUPP to inform the bridge

s/programmable/supported by the hardware

> +  code to fallback to a software implementation. No hardware programmation

s/programmation/setup

> +  must be done in this function. See port_vlan_add for this and details.
> +
>  - port_vlan_add: bridge layer function invoked when a VLAN is configured
>(tagged or untagged) for the given switch port
>  
> @@ -552,6 +558,12 @@ Bridge VLAN filtering
>function that the driver has to call for each VLAN the given port is a member
>of. A switchdev object is used to carry the VID and bridge flags.
>  
> +- port_fdb_prepare: bridge layer function invoked when the bridge prepares the
> +  installation of a Forwarding Database entry. If the operation is not
> +  programmable, this function should return -EOPNOTSUPP to inform the bridge

s/programmable/supported

> +  code to fallback to a software implementation. No hardware programmation

s/programmation/setup

Andrew

Re: [RESEND PATCH net-next v2 0/3] bcmgenet cleanups

2016-04-05 Thread David Miller

From: Petri Gynther 
Date: Tue,  5 Apr 2016 13:59:58 -0700

> Three cleanup patches for bcmgenet.

Series applied, thanks.

Re: [PATCH] sctp: Fix error handling for switch statement case in the function sctp_cmd_interprete

2016-04-05 Thread Bastien Philbert



On 2016-04-05 07:29 PM, David Miller wrote:
> From: Daniel Borkmann 
> Date: Tue, 05 Apr 2016 23:53:52 +0200
> 
>> On 04/05/2016 11:36 PM, Bastien Philbert wrote:
>>> This fixes error handling for the switch statement case
>>> SCTP_CMD_SEND_PKT by making the error value of the call
>>> to sctp_packet_transmit equal the variable error due to
>>> this function being able to fail with an error code. In
>>
>> What actual issue have you observed that you fix?
>>
>>> addition allow the call to sctp_ootb_pkt_free afterwards
>>> to free up the no longer in use sctp packet even if the
>>> call to the function sctp_packet_transmit fails in order
>>> to avoid a memory leak here for not freeing the sctp
>>
>> Not sure how this relates to your code?
> 
> Bastien, I'm seeing a clear negative pattern with the bug fixes
> you are submitting.
> 
> Just now you submitted the ICMP change which obviously was never
> tested because it tried to take the RTNL mutex in atomic context,
> and now this sctp thing.
> 
> If you don't start actually testing your changes and explaining
> clearly what the problem actually is, how you discovered it,
> and how you actually tested your patch, I will start completely
> ignoring your patch submissions.
> 
OK, I will be more careful with my future patches. Sorry about those
two patches :(
Bastien

Re: [net 0/3][pull request] Intel Wired LAN Driver Updates 2016-04-05

2016-04-05 Thread David Miller

From: Jeff Kirsher 
Date: Tue,  5 Apr 2016 15:30:48 -0700

> This series contains updates to i40e and e1000.

Pulled, thanks Jeff.

Re: [PATCH] sctp: Fix error handling for switch statement case in the function sctp_cmd_interprete

2016-04-05 Thread David Miller

From: Daniel Borkmann 
Date: Tue, 05 Apr 2016 23:53:52 +0200

> On 04/05/2016 11:36 PM, Bastien Philbert wrote:
>> This fixes error handling for the switch statement case
>> SCTP_CMD_SEND_PKT by making the error value of the call
>> to sctp_packet_transmit equal the variable error due to
>> this function being able to fail with an error code. In
> 
> What actual issue have you observed that you fix?
> 
>> addition allow the call to sctp_ootb_pkt_free afterwards
>> to free up the no longer in use sctp packet even if the
>> call to the function sctp_packet_transmit fails in order
>> to avoid a memory leak here for not freeing the sctp
> 
> Not sure how this relates to your code?

Bastien, I'm seeing a clear negative pattern with the bug fixes
you are submitting.

Just now you submitted the ICMP change which obviously was never
tested because it tried to take the RTNL mutex in atomic context,
and now this sctp thing.

If you don't start actually testing your changes and explaining
clearly what the problem actually is, how you discovered it,
and how you actually tested your patch, I will start completely
ignoring your patch submissions.

Re: [RESEND PATCH net-next v2 3/3] net: bcmgenet: cleanup for dmadesc_set()

2016-04-05 Thread Florian Fainelli

2016-04-05 14:00 GMT-07:00 Petri Gynther :
> dmadesc_set() is used for setting the Tx buffer DMA address, length,
> and status bits on a Tx ring descriptor when a frame is being Tx'ed.
>
> Always set the Tx buffer DMA address first, before updating the length
> and status bits, i.e. giving the Tx descriptor to the hardware.
>
> The reason this is a cleanup rather than a fix is that the hardware
> won't transmit anything from a Tx ring until the TDMA producer index
> has been incremented. As long as the dmadesc_set() writes complete
> before the TDMA producer index write, life is good.
>
> Signed-off-by: Petri Gynther 

Acked-by: Florian Fainelli 
--
Florian
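The ordering argument in the commit message is the usual descriptor-ring rule: every descriptor write must be visible before the producer-index write that hands the slot to the hardware. A user-space sketch of that rule with C11 atomics, where a release store to the producer index plays the role of the TDMA producer-index register write and an acquire load plays the consumer. The ring layout and names are hypothetical, not bcmgenet's, there is no full-ring check, and real MMIO/DMA ordering uses the kernel's barrier and writel() rules rather than C11 atomics.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define RING_SIZE 8

struct tx_desc {
	uint64_t addr;         /* buffer DMA address */
	uint32_t len_status;   /* length and status bits */
};

struct tx_ring {
	struct tx_desc desc[RING_SIZE];
	_Atomic uint32_t prod; /* producer index seen by the "hardware" */
	uint32_t cons;         /* consumer index, owned by the "hardware" */
};

static void ring_init(struct tx_ring *r)
{
	memset(r->desc, 0, sizeof(r->desc));
	atomic_init(&r->prod, 0);
	r->cons = 0;
}

/* Producer: address first, then length/status (the cleanup's program
 * order), then publish. Only the release store guarantees the consumer
 * sees both fields, which is why reordering them is harmless here. */
static void ring_post(struct tx_ring *r, uint64_t addr, uint32_t len_status)
{
	uint32_t p = atomic_load_explicit(&r->prod, memory_order_relaxed);
	struct tx_desc *d = &r->desc[p % RING_SIZE];

	d->addr = addr;
	d->len_status = len_status;
	atomic_store_explicit(&r->prod, p + 1, memory_order_release);
}

/* Consumer: only slots below the acquired producer index are valid. */
static int ring_consume(struct tx_ring *r, struct tx_desc *out)
{
	uint32_t p = atomic_load_explicit(&r->prod, memory_order_acquire);

	if (r->cons == p)
		return 0;
	*out = r->desc[r->cons % RING_SIZE];
	r->cons++;
	return 1;
}
```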

Re: [PATCH] sctp: Fix error handling for switch statement case in the function sctp_cmd_interprete

2016-04-05 Thread Bastien Philbert



On 2016-04-05 06:12 PM, Marcelo Ricardo Leitner wrote:
> On Tue, Apr 05, 2016 at 05:36:41PM -0400, Bastien Philbert wrote:
>> This fixes error handling for the switch statement case
>> SCTP_CMD_SEND_PKT by making the error value of the call
>> to sctp_packet_transmit equal the variable error due to
>> this function being able to fail with an error code. In
>> addition allow the call to sctp_ootb_pkt_free afterwards
>> to free up the no longer in use sctp packet even if the
>> call to the function sctp_packet_transmit fails in order
>> to avoid a memory leak here for not freeing the sctp
> 
> This leak shouldn't exist as sctp_packet_transmit() will free the packet
> if it returns ENOMEM, through the nomem: handling.
> 
> But about making it visible to the user, that looks interesting to me
> although I cannot foresee yet its effects, like the comment at the end
> of sctp_packet_transmit() on not returning EHOSTUNREACH. Did you check
> it?
> 
I was aware of the -EHOSTUNREACH issue but assumed that this needs to be
known to functions internal to the kernel. To rephrase: does it matter
whether the callers of this function know that sctp_packet_transmit failed,
or care if it fails? Or is the new error check unnecessary, because the
cleanup we already do elsewhere is enough? Again, if there is a certain
test you would like me to run on this patch to make sure it's OK, I don't
mind; just let me know :).
Cheers,
Bastien
>>
>> Signed-off-by: Bastien Philbert 
>> ---
>>  net/sctp/sm_sideeffect.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c
>> index 7fe56d0..f3a8b58 100644
>> --- a/net/sctp/sm_sideeffect.c
>> +++ b/net/sctp/sm_sideeffect.c
>> @@ -1434,7 +1434,7 @@ static int sctp_cmd_interpreter(sctp_event_t event_type,
>>  case SCTP_CMD_SEND_PKT:
>>  /* Send a full packet to our peer.  */
>>  packet = cmd->obj.packet;
>> -sctp_packet_transmit(packet, gfp);
>> +error = sctp_packet_transmit(packet, gfp);
>>  sctp_ootb_pkt_free(packet);
>>  break;
>>  
>> -- 
>> 2.5.0

[net 2/3] e1000: Do not overestimate descriptor counts in Tx pre-check

2016-04-05 Thread Jeff Kirsher

From: Alexander Duyck 

The current code path is capable of grossly overestimating the number of
descriptors needed to transmit a new frame.  This specifically occurs if
the skb contains a number of 4K pages.  The issue is that the logic for
determining the descriptors needed is ((S) >> (X)) + 1.  When X is 12,
this means that we were indicating that we required 2 descriptors for
each 4K page when we only needed one.

This change corrects this by adding (1 << (X)) - 1 to the S value instead
of adding 1 after the fact.  This way we get an accurate count of the
descriptors needed, as we are essentially doing a DIV_ROUND_UP().
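To make the arithmetic concrete, here is a small standalone C sketch that
evaluates both forms of the macro for 4K descriptors (X = 12). The helper
names are hypothetical, and DIV_ROUND_UP is defined locally with the same
((n) + (d) - 1) / (d) shape as the kernel macro so the new form can be
checked against it.

```c
/* Old and new forms of the e1000 macro, copied from the patch. */
#define TXD_USE_COUNT_OLD(S, X) (((S) >> (X)) + 1)
#define TXD_USE_COUNT_NEW(S, X) (((S) + ((1 << (X)) - 1)) >> (X))

/* Local copy of the kernel's rounding-up division, for comparison. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical helpers: descriptors needed for 'size' bytes when each
 * descriptor carries at most 1 << 12 = 4096 bytes. */
static unsigned int txd_count_old(unsigned int size)
{
	return TXD_USE_COUNT_OLD(size, 12);
}

static unsigned int txd_count_new(unsigned int size)
{
	return TXD_USE_COUNT_NEW(size, 12);
}
```

For a frame made of four 4K page fragments, the old form reserves 8
descriptors where 4 suffice; the new form only rounds up a partial page.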

Reported-by: Ivan Suzdal 
Signed-off-by: Alexander Duyck 
Tested-by: Aaron Brown 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/e1000/e1000_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
index 3fc7bde..d213fb4 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
@@ -3106,7 +3106,7 @@ static int e1000_maybe_stop_tx(struct net_device *netdev,
return __e1000_maybe_stop_tx(netdev, size);
 }
 
-#define TXD_USE_COUNT(S, X) (((S) >> (X)) + 1)
+#define TXD_USE_COUNT(S, X) (((S) + ((1 << (X)) - 1)) >> (X))
 static netdev_tx_t e1000_xmit_frame(struct sk_buff *skb,
struct net_device *netdev)
 {
-- 
2.5.5

[net 1/3] i40e: fix errant PCIe bandwidth message

2016-04-05 Thread Jeff Kirsher

From: Jesse Brandeburg 

There was an error introduced with commit 3fced535079a ("i40e: X722 is
on the IOSF bus and does not report the PCI bus info"), where code was
added but the enabling flag was never set.
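The failure mode can be shown in miniature: a check gated on a
capability flag silently stays active if the flag is never ORed into the
flags word at init time. The sketch below uses made-up bit values (not
the real I40E_FLAG_* definitions) and assumes, as the commit message
implies, that the PCIe bandwidth report is skipped when the "no PCI link
check" flag is set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions, for illustration only. */
#define SKETCH_FLAG_WB_ON_ITR_CAPABLE  (1ULL << 0)
#define SKETCH_FLAG_NO_PCI_LINK_CHECK  (1ULL << 1)
#define SKETCH_FLAG_100M_SGMII_CAPABLE (1ULL << 2)

/* Build an X722-style flags word, with or without the one-line fix. */
static uint64_t sketch_x722_flags(bool with_fix)
{
	uint64_t flags = SKETCH_FLAG_WB_ON_ITR_CAPABLE |
			 SKETCH_FLAG_100M_SGMII_CAPABLE;

	if (with_fix)
		flags |= SKETCH_FLAG_NO_PCI_LINK_CHECK;
	return flags;
}

/* Probe-time gate: report PCIe link width/speed only when the device
 * has not opted out of the check. */
static bool sketch_report_pcie_bandwidth(uint64_t flags)
{
	return !(flags & SKETCH_FLAG_NO_PCI_LINK_CHECK);
}
```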

CC: Anjali Singhai Jain 
CC: Stefan Assman 
Fixes: 3fced535079a ("i40e: X722 is on the IOSF bus ...")
Reported-by: Steve Best 
Signed-off-by: Jesse Brandeburg 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 6700643..3449129 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -8559,6 +8559,7 @@ static int i40e_sw_init(struct i40e_pf *pf)
 I40E_FLAG_OUTER_UDP_CSUM_CAPABLE |
 I40E_FLAG_WB_ON_ITR_CAPABLE |
 I40E_FLAG_MULTIPLE_TCP_UDP_RSS_PCTYPE |
+I40E_FLAG_NO_PCI_LINK_CHECK |
 I40E_FLAG_100M_SGMII_CAPABLE |
 I40E_FLAG_USE_SET_LLDP_MIB |
 I40E_FLAG_GENEVE_OFFLOAD_CAPABLE;
-- 
2.5.5

[net 3/3] e1000: Double Tx descriptors needed check for 82544

2016-04-05 Thread Jeff Kirsher

From: Alexander Duyck 

The 82544 has code that adds one additional descriptor per data buffer.
However, we weren't taking that into account when determining the descriptors
needed for the next transmit at the end of the xmit_frame path.

This change takes that into account by doubling the number of descriptors
needed for the 82544 so that we can avoid a potential issue where we could
hang the Tx ring by loading frames with xmit_more enabled and then stopping
the ring without writing the tail.

In addition it adds a few more descriptors to account for some additional
workarounds that have been added over time.
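The reserve the patch computes can be reproduced as a quick sanity check.
This sketch assumes 4 KiB pages, for which MAX_SKB_FRAGS evaluates to 17
in kernels of this era; the constant and helper name are hypothetical
stand-ins, and the breakdown follows the comment in the patch below.

```c
#include <stdbool.h>

/* Assumption: 4 KiB pages, so MAX_SKB_FRAGS = 65536 / 4096 + 1 = 17. */
#define SKETCH_MAX_SKB_FRAGS 17

/* Hypothetical helper reproducing the reserve checked at the end of
 * e1000_xmit_frame() after this patch. */
static int desc_needed_sketch(bool pcix_82544)
{
	int desc_needed = (SKETCH_MAX_SKB_FRAGS + 1) /* data descriptors */
			  + 1                        /* context descriptor */
			  + 2                        /* keep head from touching tail */
			  + 3;                       /* workarounds */

	/* The 82544 may need a second descriptor per data buffer. */
	if (pcix_82544)
		desc_needed += SKETCH_MAX_SKB_FRAGS + 1;

	return desc_needed;
}
```

Under these assumptions the reserve grows from the old
MAX_SKB_FRAGS + 2 = 19 to 24, or to 42 on an 82544.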

Signed-off-by: Alexander Duyck 
Tested-by: Aaron Brown 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/e1000/e1000_main.c | 19 ++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
index d213fb4..ae90d4f 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
@@ -3256,12 +3256,29 @@ static netdev_tx_t e1000_xmit_frame(struct sk_buff *skb,
 nr_frags, mss);
 
if (count) {
+   /* The descriptors needed is higher than other Intel drivers
+* due to a number of workarounds.  The breakdown is below:
+* Data descriptors: MAX_SKB_FRAGS + 1
+* Context Descriptor: 1
+* Keep head from touching tail: 2
+* Workarounds: 3
+*/
+   int desc_needed = MAX_SKB_FRAGS + 7;
+
netdev_sent_queue(netdev, skb->len);
skb_tx_timestamp(skb);
 
e1000_tx_queue(adapter, tx_ring, tx_flags, count);
+
+   /* 82544 potentially requires twice as many data descriptors
+* in order to guarantee buffers don't end on evenly-aligned
+* dwords
+*/
+   if (adapter->pcix_82544)
+   desc_needed += MAX_SKB_FRAGS + 1;
+
/* Make sure there is space in the ring for the next send. */
-   e1000_maybe_stop_tx(netdev, tx_ring, MAX_SKB_FRAGS + 2);
+   e1000_maybe_stop_tx(netdev, tx_ring, desc_needed);
 
if (!skb->xmit_more ||
netif_xmit_stopped(netdev_get_tx_queue(netdev, 0))) {
-- 
2.5.5

[net 0/3][pull request] Intel Wired LAN Driver Updates 2016-04-05

2016-04-05 Thread Jeff Kirsher

This series contains updates to i40e and e1000.

Jesse fixes an issue where code was added by a previous commit but the
flag to enable it was never set.

Alex fixes the e1000 driver so that it no longer grossly overestimates
the descriptors needed to transmit a frame, and makes it account for the
extra descriptors the 82544 may need.

The following are changes since commit eb8e97715f29a1240cdf67b0df725be27433259f:
  sctp: use list_* in sctp_list_dequeue
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue master

Alexander Duyck (2):
  e1000: Do not overestimate descriptor counts in Tx pre-check
  e1000: Double Tx descriptors needed check for 82544

Jesse Brandeburg (1):
  i40e: fix errant PCIe bandwidth message

 drivers/net/ethernet/intel/e1000/e1000_main.c | 21 +++--
 drivers/net/ethernet/intel/i40e/i40e_main.c   |  1 +
 2 files changed, 20 insertions(+), 2 deletions(-)

-- 
2.5.5

Re: [PATCH] sctp: Fix error handling for switch statement case in the function sctp_cmd_interprete

2016-04-05 Thread Marcelo Ricardo Leitner

On Tue, Apr 05, 2016 at 05:36:41PM -0400, Bastien Philbert wrote:
> This fixes error handling for the switch statement case
> SCTP_CMD_SEND_PKT by making the error value of the call
> to sctp_packet_transmit equal the variable error due to
> this function being able to fail with a error code. In
> addition allow the call to sctp_ootb_pkt_free afterwards
> to free up the no longer in use sctp packet even if the
> call to the function sctp_packet_transmit fails in order
> to avoid a memory leak here for not freeing the sctp

This leak shouldn't exist as sctp_packet_transmit() will free the packet
if it returns ENOMEM, through the nomem: handling.

But about making it visible to the user, that looks interesting to me
although I cannot foresee yet its effects, like the comment at the end
of sctp_packet_transmit() on not returning EHOSTUNREACH. Did you check
it?

> 
> Signed-off-by: Bastien Philbert 
> ---
>  net/sctp/sm_sideeffect.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c
> index 7fe56d0..f3a8b58 100644
> --- a/net/sctp/sm_sideeffect.c
> +++ b/net/sctp/sm_sideeffect.c
> @@ -1434,7 +1434,7 @@ static int sctp_cmd_interpreter(sctp_event_t event_type,
>   case SCTP_CMD_SEND_PKT:
>   /* Send a full packet to our peer.  */
>   packet = cmd->obj.packet;
> - sctp_packet_transmit(packet, gfp);
> + error = sctp_packet_transmit(packet, gfp);
>   sctp_ootb_pkt_free(packet);
>   break;
>  
> -- 
> 2.5.0
> 
>

Re: [RFC PATCH 1/5] bpf: add PHYS_DEV prog type for early driver filter

2016-04-05 Thread Alexei Starovoitov

On Tue, Apr 05, 2016 at 11:29:05AM +0200, Jesper Dangaard Brouer wrote:
> > 
> > Of course, there are other pieces to accelerate:
> >  12.71%  ksoftirqd/1  [mlx4_en]         [k] mlx4_en_alloc_frags
> >   6.87%  ksoftirqd/1  [mlx4_en]         [k] mlx4_en_free_frag
> >   4.20%  ksoftirqd/1  [kernel.vmlinux]  [k] get_page_from_freelist
> >   4.09%  swapper      [mlx4_en]         [k] mlx4_en_process_rx_cq
> > and I think Jesper's work on batch allocation is going help that a lot.
> 
> Actually, it looks like all of this "overhead" comes from the page
> alloc/free (+ dma unmap/map). We would need a page-pool recycle
> mechanism to solve/remove this overhead.  For the early drop case we
> might be able to hack recycle the page directly in the driver (and also
> avoid dma_unmap/map cycle).

Exactly. A cache of allocated and mapped pages will help both the drop
and redirect use cases a lot. After TX completion we can recycle the
still-mapped page into the cache (we need to make sure to map pages
PCI_DMA_BIDIRECTIONAL) and RX can refill the ring with it. For the load
balancer steady state we won't have any calls to the page allocator or
to DMA map/unmap.
Being able to do a cheap per-cpu pool like this is a huge advantage
that no kernel bypass can have. I'm pretty sure it will be
possible to avoid local_cmpxchg as well.
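A minimal single-owner sketch of the recycle idea, assuming one cache per
RX ring (or per CPU) that is only touched from that context, so no atomics
or locks appear. malloc()/free() stand in for the page allocation plus DMA
map/unmap that the real driver pays for; the names and the 64-entry size
are hypothetical, and this is not the page-pool API.

```c
#include <stdlib.h>

#define SKETCH_POOL_SIZE 64

/* Hypothetical per-ring cache of "mapped pages". */
struct page_cache_sketch {
	void *slots[SKETCH_POOL_SIZE];
	unsigned int count;
	unsigned long slow_allocs; /* refills that missed the cache */
};

/* RX refill: reuse a recycled page if one is available. */
static void *cache_get(struct page_cache_sketch *c)
{
	if (c->count)
		return c->slots[--c->count];
	c->slow_allocs++;
	return malloc(4096); /* stands in for alloc_page() + dma_map_page() */
}

/* TX completion or early drop: keep the page mapped for reuse. */
static void cache_put(struct page_cache_sketch *c, void *page)
{
	if (c->count < SKETCH_POOL_SIZE) {
		c->slots[c->count++] = page;
		return;
	}
	free(page); /* stands in for dma_unmap_page() + put_page() */
}
```

In steady state every refill is served from slots[], so slow_allocs stops
growing; that counter is the userspace analogue of the alloc/free entries
in the profile quoted above.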

Re: [PATCH] sctp: Fix error handling for switch statement case in the function sctp_cmd_interprete

2016-04-05 Thread Bastien Philbert



On 2016-04-05 05:53 PM, Daniel Borkmann wrote:
> On 04/05/2016 11:36 PM, Bastien Philbert wrote:
>> This fixes error handling for the switch statement case
>> SCTP_CMD_SEND_PKT by making the error value of the call
>> to sctp_packet_transmit equal the variable error due to
>> this function being able to fail with a error code. In
> 
> What actual issue have you observed that you fix?
> 
The issue here is basically that sctp_packet_transmit
can return an error if it fails to transmit the sk_buff
passed as a parameter. It seems that we should signal
the user/caller(s) when an sctp packet transmission
fails here. If you would like, I can resend a V2 with a
better commit message if this explains the issue better.
Bastien
>> addition allow the call to sctp_ootb_pkt_free afterwards
>> to free up the no longer in use sctp packet even if the
>> call to the function sctp_packet_transmit fails in order
>> to avoid a memory leak here for not freeing the sctp
> 
> Not sure how this relates to your code?
> 
>> Signed-off-by: Bastien Philbert 
>> ---
>>   net/sctp/sm_sideeffect.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c
>> index 7fe56d0..f3a8b58 100644
>> --- a/net/sctp/sm_sideeffect.c
>> +++ b/net/sctp/sm_sideeffect.c
>> @@ -1434,7 +1434,7 @@ static int sctp_cmd_interpreter(sctp_event_t event_type,
>>   case SCTP_CMD_SEND_PKT:
>>   /* Send a full packet to our peer.  */
>>   packet = cmd->obj.packet;
>> -sctp_packet_transmit(packet, gfp);
>> +error = sctp_packet_transmit(packet, gfp);
>>   sctp_ootb_pkt_free(packet);
>>   break;
>>
>>
>

Re: [PATCH] sctp: Fix error handling for switch statement case in the function sctp_cmd_interprete

2016-04-05 Thread Daniel Borkmann


On 04/05/2016 11:36 PM, Bastien Philbert wrote:
> This fixes error handling for the switch statement case
> SCTP_CMD_SEND_PKT by making the error value of the call
> to sctp_packet_transmit equal the variable error due to
> this function being able to fail with a error code. In

What actual issue have you observed that you fix?

> addition allow the call to sctp_ootb_pkt_free afterwards
> to free up the no longer in use sctp packet even if the
> call to the function sctp_packet_transmit fails in order
> to avoid a memory leak here for not freeing the sctp

Not sure how this relates to your code?

> Signed-off-by: Bastien Philbert
> ---
>   net/sctp/sm_sideeffect.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c
> index 7fe56d0..f3a8b58 100644
> --- a/net/sctp/sm_sideeffect.c
> +++ b/net/sctp/sm_sideeffect.c
> @@ -1434,7 +1434,7 @@ static int sctp_cmd_interpreter(sctp_event_t event_type,
> case SCTP_CMD_SEND_PKT:
> /* Send a full packet to our peer.  */
> packet = cmd->obj.packet;
> -   sctp_packet_transmit(packet, gfp);
> +   error = sctp_packet_transmit(packet, gfp);
> sctp_ootb_pkt_free(packet);
> break;

[PATCH] sctp: Fix error handling for switch statement case in the function sctp_cmd_interprete

2016-04-05 Thread Bastien Philbert

This fixes error handling for the switch statement case
SCTP_CMD_SEND_PKT by making the error value of the call
to sctp_packet_transmit equal the variable error due to
this function being able to fail with an error code. In
addition allow the call to sctp_ootb_pkt_free afterwards
to free up the no longer in use sctp packet even if the
call to the function sctp_packet_transmit fails in order
to avoid a memory leak here for not freeing the sctp

Signed-off-by: Bastien Philbert 
---
 net/sctp/sm_sideeffect.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c
index 7fe56d0..f3a8b58 100644
--- a/net/sctp/sm_sideeffect.c
+++ b/net/sctp/sm_sideeffect.c
@@ -1434,7 +1434,7 @@ static int sctp_cmd_interpreter(sctp_event_t event_type,
case SCTP_CMD_SEND_PKT:
/* Send a full packet to our peer.  */
packet = cmd->obj.packet;
-   sctp_packet_transmit(packet, gfp);
+   error = sctp_packet_transmit(packet, gfp);
sctp_ootb_pkt_free(packet);
break;
 
-- 
2.5.0

Re: [PATCH] ipv6: icmp: Add protection from concurrent users in the function icmpv6_echo_reply

2016-04-05 Thread Hannes Frederic Sowa


On 05.04.2016 23:27, Bastien Philbert wrote:
> This adds protection from concurrenct users in the function
> icmpv6_echo_reply around the call to the function __in6_dev_get
> by locking/unlocking around this call with calls to the functions
> rtnl_lock and rtnl_unlock to protect against concurrent users
> when calling this function in icmpv6_echo_reply as stated in the
> comments for locking requirements for the function, __in6_dev_get.
>
> Signed-off-by: Bastien Philbert
> ---
>   net/ipv6/icmp.c | 2 ++
>   1 file changed, 2 insertions(+)
>
> diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c
> index 0a37ddc..798434f 100644
> --- a/net/ipv6/icmp.c
> +++ b/net/ipv6/icmp.c
> @@ -607,7 +607,9 @@ static void icmpv6_echo_reply(struct sk_buff *skb)
>
> hlimit = ip6_sk_dst_hoplimit(np, &fl6, dst);
>
> +   rtnl_lock();
> idev = __in6_dev_get(skb->dev);
> +   rtnl_unlock();
>
> msg.skb = skb;
> msg.offset = 0;

We can't hold rtnl_lock in BH context. Have you seen an RCU verifier
report? I am sure we hold the RCU read lock at this point.

Bye,
Hannes

[PATCH] ipv6: icmp: Add protection from concurrent users in the function icmpv6_echo_reply

2016-04-05 Thread Bastien Philbert

This adds protection from concurrent users in the function
icmpv6_echo_reply around the call to the function __in6_dev_get
by locking/unlocking around this call with calls to the functions
rtnl_lock and rtnl_unlock to protect against concurrent users
when calling this function in icmpv6_echo_reply as stated in the
comments for locking requirements for the function, __in6_dev_get.

Signed-off-by: Bastien Philbert 
---
 net/ipv6/icmp.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c
index 0a37ddc..798434f 100644
--- a/net/ipv6/icmp.c
+++ b/net/ipv6/icmp.c
@@ -607,7 +607,9 @@ static void icmpv6_echo_reply(struct sk_buff *skb)
 
hlimit = ip6_sk_dst_hoplimit(np, &fl6, dst);
 
+   rtnl_lock();
idev = __in6_dev_get(skb->dev);
+   rtnl_unlock();
 
msg.skb = skb;
msg.offset = 0;
-- 
2.5.0

Re: [RFC PATCH 6/6] ppp: add rtnetlink device creation support

2016-04-05 Thread Guillaume Nault

On Tue, Apr 05, 2016 at 07:18:14PM +0200, walter harms wrote:
> 
> 
> Am 05.04.2016 02:56, schrieb Guillaume Nault:
> > @@ -1043,12 +1048,39 @@ static int ppp_dev_configure(struct net *src_net, struct net_device *dev,
> >  const struct ppp_config *conf)
> >  {
> > struct ppp *ppp = netdev_priv(dev);
> > +   struct file *file;
> > int indx;
> > +   int err;
> > +
> > +   if (conf->fd < 0) {
> > +   file = conf->file;
> > +   if (!file) {
> > +   err = -EBADF;
> > +   goto out;
> 
> why not just return -EBADF;
> 
> > +   }
> > +   } else {
> > +   file = fget(conf->fd);
> > +   if (!file) {
> > +   err = -EBADF;
> > +   goto out;
>   
> why not just return -EBADF;
> 
Just because the 'out' label is declared anyway and because this
centralises the return point. But I agree returning -EBADF directly
could be more readable. I don't have a strong opinion.

Re: [PATCH net-next 05/10] bnxt_en: Add get_eee() and set_eee() ethtool support.

2016-04-05 Thread Ben Hutchings

On Tue, 2016-04-05 at 03:36 -0700, Michael Chan wrote:
> On Tue, Apr 5, 2016 at 3:07 AM, Ben Hutchings  wrote:
[...]
> > > +static int bnxt_get_eee(struct net_device *dev, struct ethtool_eee *edata)
> > > +{
> > > + struct bnxt *bp = netdev_priv(dev);
> > > +
> > > + if (!(bp->flags & BNXT_FLAG_EEE_CAP))
> > > + return -EOPNOTSUPP;
> > > +
> > > + *edata = bp->eee;
> > > + if (!bp->eee.eee_enabled) {
> > > + edata->advertised = 0;
> > > + edata->tx_lpi_enabled = 0;
> > What about tx_lpi_timer?
> We want to keep the tx_lpi_timer value so that it can be used again
> when it is turned on again.
> 
> The user doesn't have to figure out what value to use if he just wants
> to use the default or the last value.

OK, that seems like a good reason.

> > 
> > 
> > And, wouldn't it make more sense to do these fixups to the internal
> > state in bnxt_set_eee()?
> I don't understand.  If the user is enabling EEE, we take all the
> parameters.  If he is disabling, we don't take any of the parameters.

Right - it's just a bit weird that you keep the internal state in a
struct ethtool_eee but get_eee() returns a slightly different version
of the state.
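The pattern under discussion, keeping the full saved state while
reporting a masked copy, looks roughly like this. The struct is a
simplified, hypothetical stand-in for struct ethtool_eee with only the
fields mentioned in this exchange.

```c
#include <stdint.h>

/* Hypothetical subset of struct ethtool_eee. */
struct eee_sketch {
	uint32_t advertised;
	uint32_t eee_enabled;
	uint32_t tx_lpi_enabled;
	uint32_t tx_lpi_timer;
};

/* get_eee()-style view: when EEE is off, hide the advertisement and the
 * LPI enable, but keep tx_lpi_timer so a later "on" can reuse the last
 * value. The saved state itself is never modified. */
static struct eee_sketch eee_report(const struct eee_sketch *saved)
{
	struct eee_sketch out = *saved;

	if (!saved->eee_enabled) {
		out.advertised = 0;
		out.tx_lpi_enabled = 0;
	}
	return out;
}
```

Doing the fixups in set_eee() instead would lose the saved advertisement,
which is the trade-off being discussed.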

Ben.

-- 
Ben Hutchings
No political challenge can be met by shopping. - George Monbiot


Re: [RFC PATCH 5/6] ppp: define reusable device creation functions

2016-04-05 Thread Guillaume Nault

On Tue, Apr 05, 2016 at 08:28:32AM -0700, Stephen Hemminger wrote:
> On Tue, 5 Apr 2016 02:56:29 +0200
> Guillaume Nault  wrote:
> 
> > Move PPP device initialisation and registration out of
> > ppp_create_interface().
> > This prepares code for device registration with rtnetlink.
> > 
> 
> Does PPP module autoload correctly based on the netlink attributes?
> 
Patch #6 has MODULE_ALIAS_RTNL_LINK("ppp"). This works fine for
auto-loading ppp_generic when creating a PPP device with rtnetlink.
Is there anything else required?

Re: [RFC PATCH 0/6] ppp: add rtnetlink support

2016-04-05 Thread Guillaume Nault

On Tue, Apr 05, 2016 at 08:27:45AM -0700, Stephen Hemminger wrote:
> On Tue, 5 Apr 2016 02:56:17 +0200
> Guillaume Nault  wrote:
> 
> > The rtnetlink handlers implemented in this series are minimal, and can
> > only replace the PPPIOCNEWUNIT ioctl. The rest of PPP ioctls remains
> > necessary for any other operation on channels and units.
> > It is perfectly possible to mix PPP devices created by rtnl
> > and by ioctl(PPPIOCNEWUNIT). Devices will behave in the same way,
> > except for a few specific cases (as detailed in patch #6).
> 
> What blocks PPP from being fully netlink (use attributes),
> 
I just didn't implement other netlink attributes because I wanted to
get the foundations validated first. Implementing PPP unit ioctls with
rtnetlink attributes shouldn't be a problem because there's a 1:1
mapping between units and netdevices. So we could have some kind of
feature parity (I'm not sure if all ioctls are worth a netlink
attribute though).

But there's the problem of getting the unit identifier of a PPP device.
If that device was created with a kernel-assigned name and index, then
the user space daemon has no ifindex or ifname for building an
RTM_GETLINK message. So the ability to retrieve the unit identifier with
rtnetlink wouldn't be enough to fully replace ioctls on units.

If by "fully netlink", you also meant implementing a netlink
replacement for all supported ioctls, then that's going to be even
trickier. A genetlink API would probably need to be created for
handling generic operations on PPP channels. But that wouldn't be
enough since unknown ioctls on channels are passed to the
chan->ops->ioctl() callback. So netlink support would also have to be
added to the channel handlers (pptp, pppoatm, sync_ppp, irda...).

> and work with same API set independent of how device was created.
> Special cases are nuisance and source of bugs.
> 
It looks like handling rtnetlink messages in ioctl based PPP devices is
just a matter of assigning ->rtnl_link_ops in ppp_create_interface().
I'll consider that for v3.

> > I'm sending the series only as RFC this time, because there are a few
> > points I'm unsatisfied with.
> > 
> > First, I'm not fond of passing file descriptors as netlink attributes,
> > as done with IFLA_PPP_DEV_FD (which is filled with a /dev/ppp fd). But
> > given how PPP units work, we have to associate a /dev/ppp fd somehow.
> > 
> > More importantly, the locking constraints of PPP are quite problematic.
> > The rtnetlink handler has to associate the new PPP unit with the
> > /dev/ppp file descriptor passed as parameter. This requires holding the
> > ppp_mutex (see e8e56ffd9d29 "ppp: ensure file->private_data can't be
> > overridden"), while the rtnetlink callback is already protected by
> > rtnl_lock(). Since other parts of the module take these locks in
> > reverse order, most of this series deals with preparing the code for
> > inverting the dependency between rtnl_lock and ppp_mutex. Some more
> > work is needed on that part (see patch #4 for details), but I wanted
> > to be sure that approach is worth it before spending some more time on
> > it.
> 
> One other way to handle the locking is to use trylock. Yes, it just
> pushes the problem back to userspace, but that is how lock reordering was
> handled in sysfs.
>
If that's considered a valid approach, then I'll use it for v3. That'd
simplify things nicely.

[RESEND PATCH net-next v2 3/3] net: bcmgenet: cleanup for dmadesc_set()

2016-04-05 Thread Petri Gynther

dmadesc_set() is used for setting the Tx buffer DMA address, length,
and status bits on a Tx ring descriptor when a frame is being Tx'ed.

Always set the Tx buffer DMA address first, before updating the length
and status bits, i.e. giving the Tx descriptor to the hardware.

The reason this is a cleanup rather than a fix is that the hardware
won't transmit anything from a Tx ring until the TDMA producer index
has been incremented. As long as the dmadesc_set() writes complete
before the TDMA producer index write, life is good.
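The ordering argument can be sketched in ordinary memory with C11
atomics. This is only an analogy: the real GENET descriptors are device
registers written through MMIO accessors, not plain structs, and the ring
and helper names below are hypothetical. What carries over is that
nothing is handed to the consumer until the producer index moves, so the
release store is what orders the descriptor writes.

```c
#include <stdatomic.h>
#include <stdint.h>

#define SKETCH_RING_SIZE 8

/* Hypothetical in-memory stand-ins for a Tx descriptor and ring. */
struct tx_desc_sketch {
	uint64_t addr;
	uint32_t length_status;
};

struct tx_ring_sketch {
	struct tx_desc_sketch desc[SKETCH_RING_SIZE];
	_Atomic uint32_t prod_index; /* consumer only reads slots below this */
};

/* Same order as the patched dmadesc_set(): address, then length/status. */
static void desc_set_sketch(struct tx_desc_sketch *d, uint64_t addr,
			    uint32_t val)
{
	d->addr = addr;
	d->length_status = val;
}

/* Fill the next slot, then publish it with a release store so both
 * descriptor writes are visible before the new producer index. */
static void ring_post_sketch(struct tx_ring_sketch *r, uint64_t addr,
			     uint32_t val)
{
	uint32_t idx = atomic_load_explicit(&r->prod_index,
					    memory_order_relaxed);

	desc_set_sketch(&r->desc[idx % SKETCH_RING_SIZE], addr, val);
	atomic_store_explicit(&r->prod_index, idx + 1, memory_order_release);
}
```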

Signed-off-by: Petri Gynther 
---
 drivers/net/ethernet/broadcom/genet/bcmgenet.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
index d77cd6d..f7b42b9 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
@@ -104,8 +104,8 @@ static inline void dmadesc_set_addr(struct bcmgenet_priv *priv,
 static inline void dmadesc_set(struct bcmgenet_priv *priv,
   void __iomem *d, dma_addr_t addr, u32 val)
 {
-   dmadesc_set_length_status(priv, d, val);
dmadesc_set_addr(priv, d, addr);
+   dmadesc_set_length_status(priv, d, val);
 }
 
 static inline dma_addr_t dmadesc_get_addr(struct bcmgenet_priv *priv,
-- 
2.8.0.rc3.226.g39d4020

[RESEND PATCH net-next v2 1/3] net: bcmgenet: cleanup for bcmgenet_xmit()

2016-04-05 Thread Petri Gynther

1. Readability: Move nr_frags assignment a few lines down in order
   to bundle index -> ring -> txq calculations together.
2. Readability: Add parentheses around nr_frags + 1.
3. Minor fix: Stop the Tx queue and throw the error message only if
   the Tx queue hasn't already been stopped.
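The behaviour change in item 3 can be modelled outside the driver: a full
ring still returns busy every time, but the queue is stopped and the
error logged only once. The struct, enum values and function name are
hypothetical stand-ins (the real code uses netif_tx_queue_stopped(),
netif_tx_stop_queue() and NETDEV_TX_BUSY).

```c
#include <stdbool.h>

/* Hypothetical stand-in for the Tx queue state. */
struct txq_sketch {
	bool stopped;
	unsigned int err_msgs; /* counts netdev_err()-style messages */
};

enum { SKETCH_TX_OK = 0, SKETCH_TX_BUSY = 1 };

/* Mirrors the patched check in bcmgenet_xmit(). */
static int xmit_check_sketch(struct txq_sketch *q, unsigned int free_bds,
			     unsigned int nr_frags)
{
	if (free_bds <= (nr_frags + 1)) {
		if (!q->stopped) {
			q->stopped = true;
			q->err_msgs++;
		}
		return SKETCH_TX_BUSY;
	}
	return SKETCH_TX_OK;
}
```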

Signed-off-by: Petri Gynther 
Acked-by: Florian Fainelli 
---
 drivers/net/ethernet/broadcom/genet/bcmgenet.c | 14 +-
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
index cf6445d..7f85a84 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
@@ -1447,15 +1447,19 @@ static netdev_tx_t bcmgenet_xmit(struct sk_buff *skb, struct net_device *dev)
else
index -= 1;
 
-   nr_frags = skb_shinfo(skb)->nr_frags;
ring = &priv->tx_rings[index];
txq = netdev_get_tx_queue(dev, ring->queue);
 
+   nr_frags = skb_shinfo(skb)->nr_frags;
+
spin_lock_irqsave(&ring->lock, flags);
-   if (ring->free_bds <= nr_frags + 1) {
-   netif_tx_stop_queue(txq);
-   netdev_err(dev, "%s: tx ring %d full when queue %d awake\n",
-  __func__, index, ring->queue);
+   if (ring->free_bds <= (nr_frags + 1)) {
+   if (!netif_tx_queue_stopped(txq)) {
+   netif_tx_stop_queue(txq);
+   netdev_err(dev,
+  "%s: tx ring %d full when queue %d awake\n",
+  __func__, index, ring->queue);
+   }
ret = NETDEV_TX_BUSY;
goto out;
}
-- 
2.8.0.rc3.226.g39d4020

[RESEND PATCH net-next v2 2/3] net: bcmgenet: cleanup for bcmgenet_xmit_frag()

2016-04-05 Thread Petri Gynther

Add frag_size = skb_frag_size(frag) and use it when needed.

Signed-off-by: Petri Gynther 
Acked-by: Florian Fainelli 
---
 drivers/net/ethernet/broadcom/genet/bcmgenet.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
index 7f85a84..d77cd6d 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
@@ -1331,6 +1331,7 @@ static int bcmgenet_xmit_frag(struct net_device *dev,
struct bcmgenet_priv *priv = netdev_priv(dev);
struct device *kdev = &priv->pdev->dev;
struct enet_cb *tx_cb_ptr;
+   unsigned int frag_size;
dma_addr_t mapping;
int ret;
 
@@ -1338,10 +1339,12 @@ static int bcmgenet_xmit_frag(struct net_device *dev,
 
if (unlikely(!tx_cb_ptr))
BUG();
+
tx_cb_ptr->skb = NULL;
 
-   mapping = skb_frag_dma_map(kdev, frag, 0,
-  skb_frag_size(frag), DMA_TO_DEVICE);
+   frag_size = skb_frag_size(frag);
+
+   mapping = skb_frag_dma_map(kdev, frag, 0, frag_size, DMA_TO_DEVICE);
ret = dma_mapping_error(kdev, mapping);
if (ret) {
priv->mib.tx_dma_failed++;
@@ -1351,10 +1354,10 @@ static int bcmgenet_xmit_frag(struct net_device *dev,
}
 
dma_unmap_addr_set(tx_cb_ptr, dma_addr, mapping);
-   dma_unmap_len_set(tx_cb_ptr, dma_len, frag->size);
+   dma_unmap_len_set(tx_cb_ptr, dma_len, frag_size);
 
dmadesc_set(priv, tx_cb_ptr->bd_addr, mapping,
-   (frag->size << DMA_BUFLENGTH_SHIFT) | dma_desc_flags |
+   (frag_size << DMA_BUFLENGTH_SHIFT) | dma_desc_flags |
(priv->hw_params->qtag_mask << DMA_TX_QTAG_SHIFT));
 
return 0;
-- 
2.8.0.rc3.226.g39d4020

[RESEND PATCH net-next v2 0/3] bcmgenet cleanups

2016-04-05 Thread Petri Gynther

Three cleanup patches for bcmgenet.

Petri Gynther (3):
  net: bcmgenet: cleanup for bcmgenet_xmit()
  net: bcmgenet: cleanup for bcmgenet_xmit_frag()
  net: bcmgenet: cleanup for dmadesc_set()

 drivers/net/ethernet/broadcom/genet/bcmgenet.c | 27 --
 1 file changed, 17 insertions(+), 10 deletions(-)

-- 
2.8.0.rc3.226.g39d4020

[PATCH] Revert "netpoll: Fix extra refcount release in netpoll_cleanup()"

2016-04-05 Thread Bjorn Helgaas

This reverts commit 543e3a8da5a4c453e992d5351ef405d5e32f27d7.

Direct callers of __netpoll_setup() depend on it to set np->dev,
so we can't simply move that assignment up to netpoll_setup().

Reported-by: Bart Van Assche 
Signed-off-by: Bjorn Helgaas 
---
 net/core/netpoll.c |3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index a57bd17..94acfc8 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -603,6 +603,7 @@ int __netpoll_setup(struct netpoll *np, struct net_device *ndev)
const struct net_device_ops *ops;
int err;
 
+   np->dev = ndev;
strlcpy(np->dev_name, ndev->name, IFNAMSIZ);
INIT_WORK(&np->cleanup_work, netpoll_async_cleanup);
 
@@ -669,7 +670,6 @@ int netpoll_setup(struct netpoll *np)
goto unlock;
}
dev_hold(ndev);
-   np->dev = ndev;
 
if (netdev_master_upper_dev_get(ndev)) {
np_err(np, "%s is a slave device, aborting\n", np->dev_name);
@@ -770,7 +770,6 @@ int netpoll_setup(struct netpoll *np)
return 0;
 
 put:
-   np->dev = NULL;
dev_put(ndev);
 unlock:
rtnl_unlock();

[PATCH net] bridge, netem: mark mailing lists as moderated

2016-04-05 Thread Stephen Hemminger

I moderate these (lightly loaded) lists to block spam.

Signed-off-by: Stephen Hemminger 
---
 MAINTAINERS | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 67d99dd..8355536 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4303,7 +4303,7 @@ F:drivers/net/ethernet/agere/
 
 ETHERNET BRIDGE
 M: Stephen Hemminger 
-L: bri...@lists.linux-foundation.org
+L: bri...@lists.linux-foundation.org (moderated for non-subscribers)
 L: netdev@vger.kernel.org
 W: http://www.linuxfoundation.org/en/Net:Bridge
 S: Maintained
@@ -7576,7 +7576,7 @@ F:drivers/infiniband/hw/nes/
 
 NETEM NETWORK EMULATOR
 M: Stephen Hemminger 
-L: ne...@lists.linux-foundation.org
+L: ne...@lists.linux-foundation.org (moderated for non-subscribers)
 S: Maintained
 F: net/sched/sch_netem.c
 
-- 
2.1.4

[PATCH net-next v2 3/3] tipc: reduce transmission rate of reset messages when link is down

2016-04-05 Thread Jon Maloy

When a link is down, it will continuously try to re-establish contact
with the peer by sending out a RESET or an ACTIVATE message at each
timeout interval. The default value for this interval is currently
375 ms. This is wasteful, and may become a problem in very large
clusters with dozens or hundreds of nodes being down simultaneously.

We now introduce a simple backoff algorithm for these cases. The
first five messages are sent at the default rate; thereafter a message
is sent only every 16th timer interval.

This will cover the vast majority of link recycling cases, since the
endpoint starting last will transmit at the higher speed, and the link
should normally be established well before the rate needs to be
reduced.

The only case where we will see a degradation of link re-establishment
is when the endpoints remain intact, and a glitch in the transmission
media is causing the link reset. We will then experience a worst-case
re-establishing time of 6 seconds, something we deem acceptable.
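Tracing the new condition by hand is easy to get wrong because of the
post-increment. The standalone sketch below copies the predicate from the
patch into a hypothetical helper: ticks 0 to 4 transmit, then only ticks
15, 31, 47 and so on, so with the 375 ms default interval the gap becomes
16 * 375 ms = 6 s.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same predicate as tipc_link_timeout() after the patch. '||' is a
 * sequence point, so the modulo test sees the incremented counter. */
static bool rst_should_xmit(uint16_t *rst_cnt)
{
	return (*rst_cnt)++ <= 4 || !(*rst_cnt % 16);
}

/* Longest gap between RESET/ACTIVATE messages once backoff applies. */
static unsigned int worst_case_gap_ms(unsigned int timer_intv_ms)
{
	return 16 * timer_intv_ms;
}
```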

Acked-by: Ying Xue 
Signed-off-by: Jon Maloy 
---
 net/tipc/link.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/net/tipc/link.c b/net/tipc/link.c
index 7d2bb3e..42cdbd1 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -140,6 +140,7 @@ struct tipc_link {
char if_name[TIPC_MAX_IF_NAME];
u32 priority;
char net_plane;
+   u16 rst_cnt;
 
/* Failover/synch */
u16 drop_point;
@@ -701,8 +702,6 @@ static void link_profile_stats(struct tipc_link *l)
 
 /* tipc_link_timeout - perform periodic task as instructed from node timeout
  */
-/* tipc_link_timeout - perform periodic task as instructed from node timeout
- */
 int tipc_link_timeout(struct tipc_link *l, struct sk_buff_head *xmitq)
 {
int rc = 0;
@@ -730,11 +729,13 @@ int tipc_link_timeout(struct tipc_link *l, struct sk_buff_head *xmitq)
l->silent_intv_cnt++;
break;
case LINK_RESET:
-   xmit = true;
+   if ((l->rst_cnt++ <= 4) || !(l->rst_cnt % 16))
+   xmit = true;
mtyp = RESET_MSG;
break;
case LINK_ESTABLISHING:
-   xmit = true;
+   if ((l->rst_cnt++ <= 4) || !(l->rst_cnt % 16))
+   xmit = true;
mtyp = ACTIVATE_MSG;
break;
case LINK_PEER_RESET:
@@ -833,6 +834,7 @@ void tipc_link_reset(struct tipc_link *l)
l->rcv_nxt = 1;
l->acked = 0;
l->silent_intv_cnt = 0;
+   l->rst_cnt = 0;
l->stats.recv_info = 0;
l->stale_count = 0;
l->bc_peer_is_up = false;
-- 
1.9.1

[PATCH net-next v2 1/3] tipc: eliminate buffer leak in bearer layer

2016-04-05 Thread Jon Maloy

When enabling a bearer we create a 'neighbor discoverer' instance by
calling the function tipc_disc_create() before the bearer is actually
registered in the list of enabled bearers. Because of this, the very
first discovery broadcast message, created by the mentioned function,
is lost, since it cannot find any valid bearer to use. Furthermore,
the send function used, tipc_bearer_xmit_skb(), does not free the given
buffer when it cannot find a bearer, resulting in the leak of exactly
one send buffer each time a bearer is enabled.

This commit fixes this problem by introducing two changes:

1) Instead of attempting to send the discovery message directly, we let
   tipc_disc_create() return the discovery buffer to the calling
   function, tipc_enable_bearer(), so that the latter can send it
   when the enabling sequence is finished.

2) In tipc_bearer_xmit_skb(), as well as in the two other transmit
   functions at the bearer layer, we now free the indicated buffer or
   buffer chain when a valid bearer cannot be found.

Acked-by: Ying Xue 
Signed-off-by: Jon Maloy 
---
 net/tipc/bearer.c   | 51 ++-
 net/tipc/discover.c |  7 ++-
 net/tipc/discover.h |  2 +-
 3 files changed, 29 insertions(+), 31 deletions(-)

diff --git a/net/tipc/bearer.c b/net/tipc/bearer.c
index 27a5406..20566e9 100644
--- a/net/tipc/bearer.c
+++ b/net/tipc/bearer.c
@@ -205,6 +205,7 @@ static int tipc_enable_bearer(struct net *net, const char *name,
struct tipc_bearer *b;
struct tipc_media *m;
struct tipc_bearer_names b_names;
+   struct sk_buff *skb;
char addr_string[16];
u32 bearer_id;
u32 with_this_prio;
@@ -301,7 +302,7 @@ restart:
b->net_plane = bearer_id + 'A';
b->priority = priority;
 
-   res = tipc_disc_create(net, b, &b->bcast_addr);
+   res = tipc_disc_create(net, b, &b->bcast_addr, &skb);
if (res) {
bearer_disable(net, b);
pr_warn("Bearer <%s> rejected, discovery object creation failed\n",
@@ -310,7 +311,8 @@ restart:
}
 
rcu_assign_pointer(tn->bearer_list[bearer_id], b);
-
+   if (skb)
+   tipc_bearer_xmit_skb(net, bearer_id, skb, &b->bcast_addr);
pr_info("Enabled bearer <%s>, discovery domain %s, priority %u\n",
name,
tipc_addr_string_fill(addr_string, disc_domain), priority);
@@ -450,6 +452,8 @@ void tipc_bearer_xmit_skb(struct net *net, u32 bearer_id,
b = rcu_dereference_rtnl(tn->bearer_list[bearer_id]);
if (likely(b))
b->media->send_msg(net, skb, b, dest);
+   else
+   kfree_skb(skb);
rcu_read_unlock();
 }
 
@@ -468,11 +472,11 @@ void tipc_bearer_xmit(struct net *net, u32 bearer_id,
 
rcu_read_lock();
b = rcu_dereference_rtnl(tn->bearer_list[bearer_id]);
-   if (likely(b)) {
-   skb_queue_walk_safe(xmitq, skb, tmp) {
-   __skb_dequeue(xmitq);
-   b->media->send_msg(net, skb, b, dst);
-   }
+   if (unlikely(!b))
+   __skb_queue_purge(xmitq);
+   skb_queue_walk_safe(xmitq, skb, tmp) {
+   __skb_dequeue(xmitq);
+   b->media->send_msg(net, skb, b, dst);
}
rcu_read_unlock();
 }
@@ -490,14 +494,14 @@ void tipc_bearer_bc_xmit(struct net *net, u32 bearer_id,
 
rcu_read_lock();
b = rcu_dereference_rtnl(tn->bearer_list[bearer_id]);
-   if (likely(b)) {
-   skb_queue_walk_safe(xmitq, skb, tmp) {
-   hdr = buf_msg(skb);
-   msg_set_non_seq(hdr, 1);
-   msg_set_mc_netid(hdr, net_id);
-   __skb_dequeue(xmitq);
-   b->media->send_msg(net, skb, b, &b->bcast_addr);
-   }
+   if (unlikely(!b))
+   __skb_queue_purge(xmitq);
+   skb_queue_walk_safe(xmitq, skb, tmp) {
+   hdr = buf_msg(skb);
+   msg_set_non_seq(hdr, 1);
+   msg_set_mc_netid(hdr, net_id);
+   __skb_dequeue(xmitq);
+   b->media->send_msg(net, skb, b, &b->bcast_addr);
}
rcu_read_unlock();
 }
@@ -513,24 +517,21 @@ void tipc_bearer_bc_xmit(struct net *net, u32 bearer_id,
  * ignores packets sent using interface multicast, and traffic sent to other
  * nodes (which can happen if interface is running in promiscuous mode).
  */
-static int tipc_l2_rcv_msg(struct sk_buff *buf, struct net_device *dev,
+static int tipc_l2_rcv_msg(struct sk_buff *skb, struct net_device *dev,
   struct packet_type *pt, struct net_device *orig_dev)
 {
struct tipc_bearer *b;
 
rcu_read_lock();
b = rcu_dereference_rtnl(dev->tipc_ptr);
-   if (likely(b)) {
-   if (likely(buf->pkt_type <= PACKET_BROADCAST)) {
-   buf->next = NULL;
-
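
The ownership rule this patch enforces can be sketched in isolation. A transmit function that is handed a buffer must either pass it on or free it, and the first discovery message is sent only once the bearer is registered. Everything below is a hypothetical model: struct buf, xmit(), disc_create() and enable_bearer() stand in for sk_buff, tipc_bearer_xmit_skb(), tipc_disc_create() and tipc_enable_bearer(). It is not TIPC code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define MAX_BEARERS 3

/* Hypothetical stand-in for struct sk_buff. */
struct buf {
	int len;
};

static bool bearer_enabled[MAX_BEARERS];
static int bufs_live;   /* allocated and not yet freed */
static int bufs_sent;   /* handed to a (pretend) device */

static struct buf *buf_alloc(int len)
{
	struct buf *b = malloc(sizeof(*b));

	if (b) {
		b->len = len;
		bufs_live++;
	}
	return b;
}

static void buf_free(struct buf *b)
{
	if (!b)
		return;
	free(b);
	bufs_live--;
}

/* Like tipc_bearer_xmit_skb() after the fix: the callee always consumes
 * the buffer, passing it on if the bearer exists and freeing it if not. */
static void xmit(int bearer_id, struct buf *b)
{
	if (bearer_enabled[bearer_id]) {
		bufs_sent++;   /* pretend the device sent it... */
		buf_free(b);   /* ...and released it on completion */
		return;
	}
	buf_free(b);           /* no bearer: drop rather than leak */
}

/* Like tipc_disc_create() after the fix: hand the first discovery
 * message back to the caller instead of sending it immediately. */
static int disc_create(struct buf **first)
{
	*first = buf_alloc(64);
	return *first ? 0 : -1;
}

/* Like tipc_enable_bearer(): register the bearer first, then send. */
static int enable_bearer(int bearer_id)
{
	struct buf *first = NULL;

	if (disc_create(&first))
		return -1;
	bearer_enabled[bearer_id] = true;
	xmit(bearer_id, first);
	return 0;
}
```

In this model, calling xmit() on a bearer that was never enabled brings bufs_live back to zero. Before the fix, one buffer would have been left outstanding.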

[PATCH net-next v2 0/3] tipc: some small fixes

2016-04-05 Thread Jon Maloy

When running TIPC in large clusters, we see behavior that may
become problematic in the future. This series picks some
low-hanging fruit in this regard, and also fixes a couple of
other minor issues.

v2: Corrected typos in commit #3, as per feedback from S. Shtylyov

Jon Maloy (3):
  tipc: eliminate buffer leak in bearer layer
  tipc: stricter filtering of packets in bearer layer
  tipc: reduce transmission rate of reset messages when link is down

 net/tipc/bearer.c   | 101 ++--
 net/tipc/discover.c |   7 ++--
 net/tipc/discover.h |   2 +-
 net/tipc/link.c |  10 +++---
 net/tipc/msg.h  |   5 +++
 5 files changed, 73 insertions(+), 52 deletions(-)

-- 
1.9.1

[PATCH net-next v2 2/3] tipc: stricter filtering of packets in bearer layer

2016-04-05 Thread Jon Maloy

Resetting a bearer/interface, and with it all of its links, is not an
atomic action. This becomes particularly evident in very large
clusters, where a lot of traffic may occur on the remaining links
while we are busy shutting them down. In extreme cases, we may even
see links being re-created and re-established before we have finished
the job.

To solve this, we now temporarily detach the bearer from the interface
when the bearer is reset. This inhibits all packet reception, while
sending is still possible. For the latter, we use the fact that the
device's TIPC pointer (dev->tipc_ptr) is now NULL to filter which
packets may be sent in this situation, i.e., outgoing RESET messages
only. This filtering speeds up the neighbors' detection of the loss
event and saves us from unnecessary probing.

Acked-by: Ying Xue 
Signed-off-by: Jon Maloy 
---
 net/tipc/bearer.c | 50 +-
 net/tipc/msg.h|  5 +
 2 files changed, 38 insertions(+), 17 deletions(-)

diff --git a/net/tipc/bearer.c b/net/tipc/bearer.c
index 20566e9..6f11c62 100644
--- a/net/tipc/bearer.c
+++ b/net/tipc/bearer.c
@@ -337,23 +337,16 @@ static int tipc_reset_bearer(struct net *net, struct tipc_bearer *b)
  */
 static void bearer_disable(struct net *net, struct tipc_bearer *b)
 {
-   struct tipc_net *tn = net_generic(net, tipc_net_id);
-   u32 i;
+   struct tipc_net *tn = tipc_net(net);
+   int bearer_id = b->identity;
 
pr_info("Disabling bearer <%s>\n", b->name);
b->media->disable_media(b);
-
-   tipc_node_delete_links(net, b->identity);
+   tipc_node_delete_links(net, bearer_id);
RCU_INIT_POINTER(b->media_ptr, NULL);
if (b->link_req)
tipc_disc_delete(b->link_req);
-
-   for (i = 0; i < MAX_BEARERS; i++) {
-   if (b == rtnl_dereference(tn->bearer_list[i])) {
-   RCU_INIT_POINTER(tn->bearer_list[i], NULL);
-   break;
-   }
-   }
+   RCU_INIT_POINTER(tn->bearer_list[bearer_id], NULL);
kfree_rcu(b, rcu);
 }
 
@@ -396,7 +389,7 @@ void tipc_disable_l2_media(struct tipc_bearer *b)
 
 /**
  * tipc_l2_send_msg - send a TIPC packet out over an L2 interface
- * @buf: the packet to be sent
+ * @skb: the packet to be sent
  * @b: the bearer through which the packet is to be sent
  * @dest: peer destination address
  */
@@ -405,17 +398,21 @@ int tipc_l2_send_msg(struct net *net, struct sk_buff *skb,
 {
struct net_device *dev;
int delta;
+   void *tipc_ptr;
 
dev = (struct net_device *)rcu_dereference_rtnl(b->media_ptr);
if (!dev)
return 0;
 
+   /* Send RESET message even if bearer is detached from device */
+   tipc_ptr = rtnl_dereference(dev->tipc_ptr);
+   if (unlikely(!tipc_ptr && !msg_is_reset(buf_msg(skb))))
+   goto drop;
+
delta = dev->hard_header_len - skb_headroom(skb);
if ((delta > 0) &&
-   pskb_expand_head(skb, SKB_DATA_ALIGN(delta), 0, GFP_ATOMIC)) {
-   kfree_skb(skb);
-   return 0;
-   }
+   pskb_expand_head(skb, SKB_DATA_ALIGN(delta), 0, GFP_ATOMIC))
+   goto drop;
 
skb_reset_network_header(skb);
skb->dev = dev;
@@ -424,6 +421,9 @@ int tipc_l2_send_msg(struct net *net, struct sk_buff *skb,
dev->dev_addr, skb->len);
dev_queue_xmit(skb);
return 0;
+drop:
+   kfree_skb(skb);
+   return 0;
 }
 
 int tipc_bearer_mtu(struct net *net, u32 bearer_id)
@@ -549,9 +549,18 @@ static int tipc_l2_device_event(struct notifier_block *nb, unsigned long evt,
 {
struct net_device *dev = netdev_notifier_info_to_dev(ptr);
struct net *net = dev_net(dev);
+   struct tipc_net *tn = tipc_net(net);
struct tipc_bearer *b;
+   int i;
 
b = rtnl_dereference(dev->tipc_ptr);
+   if (!b) {
+   for (i = 0; i < MAX_BEARERS; b = NULL, i++) {
+   b = rtnl_dereference(tn->bearer_list[i]);
+   if (b && (b->media_ptr == dev))
+   break;
+   }
+   }
if (!b)
return NOTIFY_DONE;
 
@@ -561,13 +570,20 @@ static int tipc_l2_device_event(struct notifier_block *nb, unsigned long evt,
case NETDEV_CHANGE:
if (netif_carrier_ok(dev))
break;
+   case NETDEV_UP:
+   rcu_assign_pointer(dev->tipc_ptr, b);
+   break;
case NETDEV_GOING_DOWN:
+   RCU_INIT_POINTER(dev->tipc_ptr, NULL);
+   synchronize_net();
+   tipc_reset_bearer(net, b);
+   break;
case NETDEV_CHANGEMTU:
tipc_reset_bearer(net, b);
break;
case NETDEV_CHANGEADDR:
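
The resulting send and receive policy can be summarized in a small model. The model assumes the only states that matter are "attached" (dev->tipc_ptr set) and "detached" (NULL). The types and functions below are hypothetical. In the patch, the send-side check lives in tipc_l2_send_msg(), and the attach and detach steps live in tipc_l2_device_event().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types; not the kernel's definitions. */
enum msg_kind { MSG_DATA, MSG_RESET };
enum dev_event { EV_UP, EV_GOING_DOWN };

struct bearer {
	int id;
};

struct device {
	struct bearer *tipc_ptr;  /* NULL while the bearer is detached */
	struct bearer *owner;     /* bearer whose media_ptr is this device */
};

/* Send side: a detached bearer may still emit RESET messages only. */
static bool may_send(const struct device *dev, enum msg_kind kind)
{
	return dev->tipc_ptr != NULL || kind == MSG_RESET;
}

/* Receive side: nothing is delivered while detached. */
static bool may_receive(const struct device *dev)
{
	return dev->tipc_ptr != NULL;
}

static void device_event(struct device *dev, enum dev_event ev)
{
	switch (ev) {
	case EV_UP:
		dev->tipc_ptr = dev->owner;   /* re-attach */
		break;
	case EV_GOING_DOWN:
		/* Detach first; the patch then calls synchronize_net() and
		 * tipc_reset_bearer(), which this model does not simulate. */
		dev->tipc_ptr = NULL;
		break;
	}
}
```

The owner field plays the role of the bearer_list scan the patch adds. It lets a detached device find its bearer again on NETDEV_UP.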

[PATCH net-next] bpf, verifier: further improve search pruning

2016-04-05 Thread Daniel Borkmann

The verifier needs to go through every path of the program in
order to check that it terminates safely, which can mean processing
a large number of instructions, e.g. for more branchy programs.
With search pruning from f1bca824dabb ("bpf: add search pruning
optimization to verifier"), the search space can already be reduced
significantly when the verifier detects that a previously walked
path with the same register and stack contents has already
terminated (see the verifier's states_equal()), so the search can
skip walking those states.

When working with larger programs of > ~2000 (out of max 4096)
insns, we found that the current limit of 32k processed instructions
is easily hit. One case we ran into is that the search space cannot
be pruned due to branches at the beginning of the program that make
use of certain stack slots (STACK_MISC), which are never used in the
remaining program (STACK_INVALID). Therefore, the verifier needs to
walk the paths with those slots in STACK_INVALID state, but also all
remaining paths with a stack layout where the slots are in
STACK_MISC, which can nearly double the search space needed. After
various experiments, we find that a limit of 64k processed insns is
a more reasonable choice when dealing with larger programs in
practice. This still allows rejecting extreme, crafted cases that
can have a much higher complexity (e.g. > ~300k) within the 4096
insns limit because search pruning cannot take effect.

Furthermore, we found that many states can be pruned after a call
instruction; e.g. we were able to reduce the search state by ~35%
in some cases with this heuristic, at the cost of keeping somewhat
more states in env->explored_states. Call instructions are usually
preceded by a number of register assignments and/or stack stores,
so search pruning has a better chance to succeed in the
states_equal() test. The current code marks the branch targets with
STATE_LIST_MARK in the case of conditional jumps, and the next
(t + 1) instruction in the case of unconditional jumps, so that e.g.
a backjump will walk it. We also experimented with using
t + insns[t].off + 1 as the marker in the unconditional-jump case
instead of t + 1, reasoning that the two branches of execution that
converge after the label might have more potential for pruning. This
was slightly better, but not significantly better than the current
state, perhaps also because clang does not often generate back
jumps. Hence, we left that as is for now.

Signed-off-by: Daniel Borkmann 
Acked-by: Alexei Starovoitov 
---
 kernel/bpf/verifier.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 2e08f8e..212e52a 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -202,6 +202,9 @@ struct verifier_env {
bool allow_ptr_leaks;
 };
 
+#define BPF_COMPLEXITY_LIMIT_INSNS 65536
+#define BPF_COMPLEXITY_LIMIT_STACK 1024
+
 /* verbose verifier prints what it's seeing
  * bpf_check() is called under lock, so no race to access these global vars
  */
@@ -454,7 +457,7 @@ static struct verifier_state *push_stack(struct verifier_env *env, int insn_idx,
elem->next = env->head;
env->head = elem;
env->stack_size++;
-   if (env->stack_size > 1024) {
+   if (env->stack_size > BPF_COMPLEXITY_LIMIT_STACK) {
verbose("BPF program is too complex\n");
goto err;
}
@@ -1539,6 +1542,8 @@ peek_stack:
goto peek_stack;
else if (ret < 0)
goto err_free;
+   if (t + 1 < insn_cnt)
+   env->explored_states[t + 1] = STATE_LIST_MARK;
} else if (opcode == BPF_JA) {
if (BPF_SRC(insns[t].code) != BPF_K) {
ret = -EINVAL;
@@ -1743,7 +1748,7 @@ static int do_check(struct verifier_env *env)
insn = &insns[insn_idx];
class = BPF_CLASS(insn->code);
 
-   if (++insn_processed > 32768) {
+   if (++insn_processed > BPF_COMPLEXITY_LIMIT_INSNS) {
verbose("BPF program is too large. Proccessed %d insn\n",
insn_processed);
return -E2BIG;
-- 
1.9.3
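The prune-point rules described in the commit message can be illustrated with a simplified scan. This is a sketch under stated assumptions. The instruction model and mark_prune_points() are hypothetical. The real check_cfg() marks points during a depth-first walk of reachable instructions, whereas this sketch simply visits every instruction in order.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified instruction model: only the control-flow
 * kind and the jump offset are kept. */
enum insn_kind { INSN_OTHER, INSN_CALL, INSN_JA, INSN_JCOND, INSN_EXIT };

struct insn {
	enum insn_kind kind;
	int off;   /* jump offset, relative to the next instruction */
};

/* Mark where explored states are kept for states_equal() pruning:
 *  - conditional jump:   the branch target, t + off + 1
 *  - unconditional jump: the next instruction, t + 1
 *  - call (added here):  the next instruction, t + 1 */
static void mark_prune_points(const struct insn *insns, int cnt, bool *mark)
{
	memset(mark, 0, (size_t)cnt * sizeof(*mark));
	for (int t = 0; t < cnt; t++) {
		int target = -1;

		switch (insns[t].kind) {
		case INSN_JCOND:
			target = t + insns[t].off + 1;
			break;
		case INSN_JA:
		case INSN_CALL:
			target = t + 1;
			break;
		default:
			break;
		}
		if (target >= 0 && target < cnt)
			mark[target] = true;
	}
}
```

In the test program, the call at index 2 makes index 3 a prune point, which is what this patch adds. The conditional jump at index 1 marks its target (4), and the unconditional jump at index 4 marks the following instruction (5).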

Re: [PATCH net-next v2 0/3] udp: support SO_PEEK_OFF

2016-04-05 Thread David Miller

From: Willem de Bruijn 
Date: Tue,  5 Apr 2016 12:41:13 -0400

> From: Willem de Bruijn 
> 
> Support peeking at a non-zero offset for UDP sockets. Match the
> existing behavior on Unix datagram sockets.
> 
> 1/3 makes the sk_peek_offset functions safe to use outside locks
> 2/3 removes udp headers before enqueue, to simplify offset arithmetic
> 3/3 introduces SO_PEEK_OFF support, with Unix socket peek semantics.
> 
> Changes
>   v1->v2
> - squash patches 3 and 4

Series applied, thanks Willem.

Re: [net-next 00/18][pull request] 40GbE Intel Wired LAN Driver Updates 2016-04-05

2016-04-05 Thread David Miller

From: Jeff Kirsher 
Date: Tue,  5 Apr 2016 13:17:17 -0700

> This series contains updates to i40e and i40evf only.

This looks fine, pulled, thanks Jeff.

[net-next 09/18] i40e: Fix up return code

2016-04-05 Thread Jeff Kirsher

From: Jesse Brandeburg 

The code in i40e_common.c typically uses i40e_status as a return
code, but this one case was missed.

Signed-off-by: Jesse Brandeburg 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_common.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_common.c b/drivers/net/ethernet/intel/i40e/i40e_common.c
index b0fd684..8276a13 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_common.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_common.c
@@ -1901,13 +1901,13 @@ i40e_status i40e_aq_set_phy_int_mask(struct i40e_hw *hw,
  *
  * Reset the external PHY.
  **/
-enum i40e_status_code i40e_aq_set_phy_debug(struct i40e_hw *hw, u8 cmd_flags,
-   struct i40e_asq_cmd_details *cmd_details)
+i40e_status i40e_aq_set_phy_debug(struct i40e_hw *hw, u8 cmd_flags,
+ struct i40e_asq_cmd_details *cmd_details)
 {
struct i40e_aq_desc desc;
struct i40e_aqc_set_phy_debug *cmd =
(struct i40e_aqc_set_phy_debug *)&desc.params.raw;
-   enum i40e_status_code status;
+   i40e_status status;
 
i40e_fill_default_direct_cmd_desc(&desc,
  i40e_aqc_opc_set_phy_debug);
-- 
2.5.5

[net-next 11/18] i40e: Assure that adminq is alive in debug mode

2016-04-05 Thread Jeff Kirsher

From: Shannon Nelson 

When dropping into debug mode in a failed probe, make sure that
the AdminQ is left alive for possible hands-on debugging of driver
and firmware state.

Move the mutex_init calls earlier in probe so that if init fails,
the admin queue interface is still available for debugging purposes.

Signed-off-by: Shannon Nelson 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 2464dca..56d4416 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -10822,6 +10822,12 @@ static int i40e_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
hw->bus.func = PCI_FUNC(pdev->devfn);
pf->instance = pfs_found;
 
+   /* set up the locks for the AQ, do this only once in probe
+* and destroy them only once in remove
+*/
+   mutex_init(&hw->aq.asq_mutex);
+   mutex_init(&hw->aq.arq_mutex);
+
if (debug != -1) {
pf->msg_enable = pf->hw.debug_mask;
pf->msg_enable = debug;
@@ -10867,12 +10873,6 @@ static int i40e_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
/* set up a default setting for link flow control */
pf->hw.fc.requested_mode = I40E_FC_NONE;
 
-   /* set up the locks for the AQ, do this only once in probe
-* and destroy them only once in remove
-*/
-   mutex_init(&hw->aq.asq_mutex);
-   mutex_init(&hw->aq.arq_mutex);
-
err = i40e_init_adminq(hw);
if (err) {
if (err == I40E_ERR_FIRMWARE_API_VERSION)
@@ -11265,7 +11265,6 @@ err_init_lan_hmc:
kfree(pf->qp_pile);
 err_sw_init:
 err_adminq_setup:
-   (void)i40e_shutdown_adminq(hw);
 err_pf_reset:
iounmap(hw->hw_addr);
 err_ioremap:
-- 
2.5.5

[net-next 04/18] i40e/i40evf: Fix handling of boolean logic in polling routines

2016-04-05 Thread Jeff Kirsher

From: Alexander Duyck 

In the polling routines for i40e and i40evf we were using bitwise operators
to avoid the side effects of the logical operators, specifically the fact
that with "||" the second operand is skipped if the first is true, and with
"&&" it is skipped if the first is false.  This fixes an earlier patch
that converted the bitwise operators to logical operators; instead, we
replace the entire expression with an if statement, since that makes what
we are trying to do more readable.

Fixes: 1a36d7fadd14 ("i40e/i40evf: use logical operators, not bitwise")
Signed-off-by: Alexander Duyck 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 13 -
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c | 13 -
 2 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 9af1411..8fb2a96 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -1975,9 +1975,11 @@ int i40e_napi_poll(struct napi_struct *napi, int budget)
 * budget and be more aggressive about cleaning up the Tx descriptors.
 */
i40e_for_each_ring(ring, q_vector->tx) {
-   clean_complete = clean_complete &&
-i40e_clean_tx_irq(ring, vsi->work_limit);
-   arm_wb = arm_wb || ring->arm_wb;
+   if (!i40e_clean_tx_irq(ring, vsi->work_limit)) {
+   clean_complete = false;
+   continue;
+   }
+   arm_wb |= ring->arm_wb;
ring->arm_wb = false;
}
 
@@ -1999,8 +2001,9 @@ int i40e_napi_poll(struct napi_struct *napi, int budget)
cleaned = i40e_clean_rx_irq_1buf(ring, budget_per_ring);
 
work_done += cleaned;
-   /* if we didn't clean as many as budgeted, we must be done */
-   clean_complete = clean_complete && (budget_per_ring > cleaned);
+   /* if we clean as many as budgeted, we must not be done */
+   if (cleaned >= budget_per_ring)
+   clean_complete = false;
}
 
/* If work not completed, return budget and polling will return */
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
index 5f9c1bb..839a6df 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
@@ -1411,9 +1411,11 @@ int i40evf_napi_poll(struct napi_struct *napi, int budget)
 * budget and be more aggressive about cleaning up the Tx descriptors.
 */
i40e_for_each_ring(ring, q_vector->tx) {
-   clean_complete = clean_complete &&
-i40e_clean_tx_irq(ring, vsi->work_limit);
-   arm_wb = arm_wb || ring->arm_wb;
+   if (!i40e_clean_tx_irq(ring, vsi->work_limit)) {
+   clean_complete = false;
+   continue;
+   }
+   arm_wb |= ring->arm_wb;
ring->arm_wb = false;
}
 
@@ -1435,8 +1437,9 @@ int i40evf_napi_poll(struct napi_struct *napi, int budget)
cleaned = i40e_clean_rx_irq_1buf(ring, budget_per_ring);
 
work_done += cleaned;
-   /* if we didn't clean as many as budgeted, we must be done */
-   clean_complete = clean_complete && (budget_per_ring > cleaned);
+   /* if we clean as many as budgeted, we must not be done */
+   if (cleaned >= budget_per_ring)
+   clean_complete = false;
}
 
/* If work not completed, return budget and polling will return */
-- 
2.5.5
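The short-circuit problem described in the commit message is easy to reproduce outside the driver. In this hypothetical sketch, clean_ring() stands in for i40e_clean_tx_irq(), and the arm_wb handling is left out.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_RINGS 3

/* Hypothetical per-ring state: each call "cleans" the ring and reports
 * whether it finished within its budget. */
struct ring {
	bool finishes;   /* what clean_ring() will return */
	int cleaned;     /* how many times clean_ring() ran on this ring */
};

static bool clean_ring(struct ring *r)
{
	r->cleaned++;
	return r->finishes;
}

/* Form introduced by the earlier patch: once clean_complete is false,
 * && short-circuits and the remaining rings are never cleaned. */
static bool poll_short_circuit(struct ring *rings, int n)
{
	bool clean_complete = true;

	for (int i = 0; i < n; i++)
		clean_complete = clean_complete && clean_ring(&rings[i]);
	return clean_complete;
}

/* Form used by this patch: always clean, then record the result. */
static bool poll_if(struct ring *rings, int n)
{
	bool clean_complete = true;

	for (int i = 0; i < n; i++)
		if (!clean_ring(&rings[i]))
			clean_complete = false;
	return clean_complete;
}
```

When the first ring reports incomplete work, the && version never touches the other two rings. The if version cleans all three and still reports that more polling is needed.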

[net-next 03/18] i40evf: remove dead code

2016-04-05 Thread Jeff Kirsher

From: Alan Cox 

The only error case is when the allocation fails, in which case the
cleanup loop does nothing at all, so remove it.

Signed-off-by: Alan Cox 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40evf/i40evf_main.c | 11 +--
 1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
index 4b70aae..820ad94 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
@@ -1507,7 +1507,7 @@ static int i40evf_alloc_q_vectors(struct i40evf_adapter *adapter)
adapter->q_vectors = kcalloc(num_q_vectors, sizeof(*q_vector),
 GFP_KERNEL);
if (!adapter->q_vectors)
-   goto err_out;
+   return -ENOMEM;
 
for (q_idx = 0; q_idx < num_q_vectors; q_idx++) {
q_vector = &adapter->q_vectors[q_idx];
@@ -1519,15 +1519,6 @@ static int i40evf_alloc_q_vectors(struct i40evf_adapter *adapter)
}
 
return 0;
-
-err_out:
-   while (q_idx) {
-   q_idx--;
-   q_vector = &adapter->q_vectors[q_idx];
-   netif_napi_del(&q_vector->napi);
-   }
-   kfree(adapter->q_vectors);
-   return -ENOMEM;
 }
 
 /**
-- 
2.5.5

[net-next 13/18] i40e: Notify VFs of all resets

2016-04-05 Thread Jeff Kirsher

From: Mitch Williams 

Notify VFs in the reset interrupt handler, instead of the actual
reset initiation code. This allows the VFs to get properly notified for
all resets, including resets initiated by different PFs on the same
physical device.

Signed-off-by: Mitch Williams 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index e615f66..98bc749 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -5534,8 +5534,6 @@ void i40e_do_reset(struct i40e_pf *pf, u32 reset_flags)
 
WARN_ON(in_interrupt());
 
-   if (i40e_check_asq_alive(&pf->hw))
-   i40e_vc_notify_reset(pf);
 
/* do the biggest reset indicated */
if (reset_flags & BIT_ULL(__I40E_GLOBAL_RESET_REQUESTED)) {
@@ -6738,6 +6736,8 @@ static void i40e_prep_for_reset(struct i40e_pf *pf)
clear_bit(__I40E_RESET_INTR_RECEIVED, &pf->state);
if (test_and_set_bit(__I40E_RESET_RECOVERY_PENDING, &pf->state))
return;
+   if (i40e_check_asq_alive(&pf->hw))
+   i40e_vc_notify_reset(pf);
 
dev_dbg(&pf->pdev->dev, "Tearing down internal switch for reset\n");
 
-- 
2.5.5

[net-next 12/18] i40e: Remove timer and task only if created

2016-04-05 Thread Jeff Kirsher

From: Shannon Nelson 

In some error scenarios, we may find ourselves trying to remove a
timer or work task that was never created.  This causes the kernel
some consternation, so don't do it.

Signed-off-by: Shannon Nelson 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 56d4416..e615f66 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -11306,8 +11306,10 @@ static void i40e_remove(struct pci_dev *pdev)
/* no more scheduling of any task */
set_bit(__I40E_SUSPENDED, &pf->state);
set_bit(__I40E_DOWN, &pf->state);
-   del_timer_sync(&pf->service_timer);
-   cancel_work_sync(&pf->service_task);
+   if (pf->service_timer.data)
+   del_timer_sync(&pf->service_timer);
+   if (pf->service_task.func)
+   cancel_work_sync(&pf->service_task);
 
if (pf->flags & I40E_FLAG_SRIOV_ENABLED) {
i40e_free_vfs(pf);
-- 
2.5.5

[net-next 06/18] i40e/i40evf: Fix casting in transmit code

2016-04-05 Thread Jeff Kirsher

From: Jesse Brandeburg 

Simple cast to fix a sparse warning.

Fixes: 5453205cd097 ("i40e/i40evf: Enable support for SKB_GSO_UDP_TUNNEL_CSUM")

Signed-off-by: Jesse Brandeburg 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 5 +++--
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c | 5 +++--
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 01cff07..5bef5b0 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -2305,7 +2305,8 @@ static int i40e_tso(struct i40e_ring *tx_ring, struct sk_buff *skb,
 
/* remove payload length from outer checksum */
paylen = (__force u16)l4.udp->check;
-   paylen += ntohs(1) * (u16)~(skb->len - l4_offset);
+   paylen += ntohs((__force __be16)1) *
+   (u16)~(skb->len - l4_offset);
l4.udp->check = ~csum_fold((__force __wsum)paylen);
}
 
@@ -2327,7 +2328,7 @@ static int i40e_tso(struct i40e_ring *tx_ring, struct sk_buff *skb,
 
/* remove payload length from inner checksum */
paylen = (__force u16)l4.tcp->check;
-   paylen += ntohs(1) * (u16)~(skb->len - l4_offset);
+   paylen += ntohs((__force __be16)1) * (u16)~(skb->len - l4_offset);
l4.tcp->check = ~csum_fold((__force __wsum)paylen);
 
/* compute length of segmentation header */
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
index 9e91136..570348d 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
@@ -1572,7 +1572,8 @@ static int i40e_tso(struct i40e_ring *tx_ring, struct sk_buff *skb,
 
/* remove payload length from outer checksum */
paylen = (__force u16)l4.udp->check;
-   paylen += ntohs(1) * (u16)~(skb->len - l4_offset);
+   paylen += ntohs((__force __be16)1) *
+   (u16)~(skb->len - l4_offset);
l4.udp->check = ~csum_fold((__force __wsum)paylen);
}
 
@@ -1594,7 +1595,7 @@ static int i40e_tso(struct i40e_ring *tx_ring, struct sk_buff *skb,
 
/* remove payload length from inner checksum */
paylen = (__force u16)l4.tcp->check;
-   paylen += ntohs(1) * (u16)~(skb->len - l4_offset);
+   paylen += ntohs((__force __be16)1) * (u16)~(skb->len - l4_offset);
l4.tcp->check = ~csum_fold((__force __wsum)paylen);
 
/* compute length of segmentation header */
-- 
2.5.5
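The patch only adds a cast, but the surrounding arithmetic is worth a worked example. Our reading, which the patch does not state, is as follows. ntohs(1) evaluates to 256 on little-endian hosts and to 1 on big-endian ones. Multiplying a 16-bit value by 256 and then folding to 16 bits with end-around carry is exactly a byte swap. The helpers below are hypothetical. fold32() mirrors the folding step of the kernel's csum_fold() without its final inversion.

```c
#include <assert.h>
#include <stdint.h>

/* Fold a 32-bit one's complement sum to 16 bits with end-around carry. */
static uint16_t fold32(uint32_t sum)
{
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

static uint16_t bswap16(uint16_t v)
{
	return (uint16_t)((v << 8) | (v >> 8));
}

/* multiplier plays the role of ntohs(1): 256 on little-endian hosts,
 * 1 on big-endian hosts. */
static uint16_t scale_and_fold(uint16_t v, uint32_t multiplier)
{
	return fold32((uint32_t)v * multiplier);
}

/* Remove a length from a folded one's complement sum by adding its
 * complement, as i40e_tso() does with (u16)~(skb->len - l4_offset)
 * (without the byte-order multiplier). */
static uint16_t csum_sub_len(uint16_t sum, uint16_t len)
{
	return fold32((uint32_t)sum + (uint16_t)~len);
}
```

Combined, these give the i40e_tso() pattern. Adding ntohs(1) * (u16)~len to the checksum field, which is read as a raw network-order value, converts the complemented length to network byte order and subtracts the length in one's complement arithmetic.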

[net-next 17/18] i40e: Change comment to reflect correct function name

2016-04-05 Thread Jeff Kirsher

From: Mitch Williams 

Minor correction to the comment so that it reflects the correct function name.

Signed-off-by: Mitch Williams 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index 291d628..47b9e62 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -63,7 +63,7 @@ static void i40e_vc_vf_broadcast(struct i40e_pf *pf,
 }
 
 /**
- * i40e_vc_notify_link_state
+ * i40e_vc_notify_vf_link_state
  * @vf: pointer to the VF structure
  *
  * send a link status message to a single VF
-- 
2.5.5

[net-next 15/18] i40e: Change unknown event error msg to ignore message

2016-04-05 Thread Jeff Kirsher

From: Shannon Nelson 

An unknown event from the firmware is not really an error; we are just
posting a useful FYI notice, so this patch removes the word "Error" and
reports the event as ignored.

Signed-off-by: Shannon Nelson 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 98bc749..3841005 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -6371,7 +6371,7 @@ static void i40e_clean_adminq_subtask(struct i40e_pf *pf)
break;
default:
dev_info(&pf->pdev->dev,
-"ARQ Error: Unknown event 0x%04x received\n",
+"ARQ: Unknown event 0x%04x ignored\n",
 opcode);
break;
}
-- 
2.5.5

[net-next 08/18] i40e: Save off VSI resource count when updating VSI

2016-04-05 Thread Jeff Kirsher

From: Kevin Scott 

When updating a VSI, save off the number of allocated and unallocated
VSIs as we do when adding a VSI.

Signed-off-by: Kevin Scott 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_common.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_common.c b/drivers/net/ethernet/intel/i40e/i40e_common.c
index 4596294..b0fd684 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_common.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_common.c
@@ -2157,6 +2157,9 @@ i40e_status i40e_aq_update_vsi_params(struct i40e_hw *hw,
struct i40e_aq_desc desc;
struct i40e_aqc_add_get_update_vsi *cmd =
(struct i40e_aqc_add_get_update_vsi *)&desc.params.raw;
+   struct i40e_aqc_add_get_update_vsi_completion *resp =
+   (struct i40e_aqc_add_get_update_vsi_completion *)
+   &desc.params.raw;
i40e_status status;
 
i40e_fill_default_direct_cmd_desc(&desc,
@@ -2168,6 +2171,9 @@ i40e_status i40e_aq_update_vsi_params(struct i40e_hw *hw,
status = i40e_asq_send_command(hw, &desc, &vsi_ctx->info,
sizeof(vsi_ctx->info), cmd_details);
 
+   vsi_ctx->vsis_allocated = le16_to_cpu(resp->vsi_used);
+   vsi_ctx->vsis_unallocated = le16_to_cpu(resp->vsi_free);
+
return status;
 }
 
-- 
2.5.5

[net-next 05/18] i40e/i40evf: Add support for bulk free in Tx cleanup

2016-04-05 Thread Jeff Kirsher

From: Alexander Duyck 

This patch enables bulk Tx clean for skbs.  In order to enable it, we need
to pass the napi_budget value, as it is used to determine whether we are
truly running in NAPI mode or are simply being called from netpoll with a
budget of 0.  In order to avoid adding too many more parameters, I thought
it best to pass the VSI directly, in a fashion similar to what we do with
the q_vector in igb and ixgbe.

Signed-off-by: Alexander Duyck 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 20 +++-
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c | 20 +++-
 2 files changed, 22 insertions(+), 18 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 8fb2a96..01cff07 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -636,19 +636,21 @@ u32 i40e_get_tx_pending(struct i40e_ring *ring, bool in_sw)
 
 /**
  * i40e_clean_tx_irq - Reclaim resources after transmit completes
- * @tx_ring:  tx ring to clean
- * @budget:   how many cleans we're allowed
+ * @vsi: the VSI we care about
+ * @tx_ring: Tx ring to clean
+ * @napi_budget: Used to determine if we are in netpoll
  *
  * Returns true if there's any budget left (e.g. the clean is finished)
  **/
-static bool i40e_clean_tx_irq(struct i40e_ring *tx_ring, int budget)
+static bool i40e_clean_tx_irq(struct i40e_vsi *vsi,
+ struct i40e_ring *tx_ring, int napi_budget)
 {
u16 i = tx_ring->next_to_clean;
struct i40e_tx_buffer *tx_buf;
struct i40e_tx_desc *tx_head;
struct i40e_tx_desc *tx_desc;
-   unsigned int total_packets = 0;
-   unsigned int total_bytes = 0;
+   unsigned int total_bytes = 0, total_packets = 0;
+   unsigned int budget = vsi->work_limit;
 
tx_buf = &tx_ring->tx_bi[i];
tx_desc = I40E_TX_DESC(tx_ring, i);
@@ -678,7 +680,7 @@ static bool i40e_clean_tx_irq(struct i40e_ring *tx_ring, int budget)
total_packets += tx_buf->gso_segs;
 
/* free the skb */
-   dev_consume_skb_any(tx_buf->skb);
+   napi_consume_skb(tx_buf->skb, napi_budget);
 
/* unmap skb header data */
dma_unmap_single(tx_ring->dev,
@@ -749,7 +751,7 @@ static bool i40e_clean_tx_irq(struct i40e_ring *tx_ring, int budget)
 
if (budget &&
((j / (WB_STRIDE + 1)) == 0) && (j != 0) &&
-   !test_bit(__I40E_DOWN, &tx_ring->vsi->state) &&
+   !test_bit(__I40E_DOWN, &vsi->state) &&
(I40E_DESC_UNUSED(tx_ring) != tx_ring->count))
tx_ring->arm_wb = true;
}
@@ -767,7 +769,7 @@ static bool i40e_clean_tx_irq(struct i40e_ring *tx_ring, int budget)
smp_mb();
if (__netif_subqueue_stopped(tx_ring->netdev,
 tx_ring->queue_index) &&
-  !test_bit(__I40E_DOWN, &tx_ring->vsi->state)) {
+  !test_bit(__I40E_DOWN, &vsi->state)) {
netif_wake_subqueue(tx_ring->netdev,
tx_ring->queue_index);
++tx_ring->tx_stats.restart_queue;
@@ -1975,7 +1977,7 @@ int i40e_napi_poll(struct napi_struct *napi, int budget)
 * budget and be more aggressive about cleaning up the Tx descriptors.
 */
i40e_for_each_ring(ring, q_vector->tx) {
-   if (!i40e_clean_tx_irq(ring, vsi->work_limit)) {
+   if (!i40e_clean_tx_irq(vsi, ring, budget)) {
clean_complete = false;
continue;
}
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
index 839a6df..9e91136 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
@@ -155,19 +155,21 @@ u32 i40evf_get_tx_pending(struct i40e_ring *ring, bool in_sw)
 
 /**
  * i40e_clean_tx_irq - Reclaim resources after transmit completes
- * @tx_ring:  tx ring to clean
- * @budget:   how many cleans we're allowed
+ * @vsi: the VSI we care about
+ * @tx_ring: Tx ring to clean
+ * @napi_budget: Used to determine if we are in netpoll
  *
  * Returns true if there's any budget left (e.g. the clean is finished)
  **/
-static bool i40e_clean_tx_irq(struct i40e_ring *tx_ring, int budget)
+static bool i40e_clean_tx_irq(struct i40e_vsi *vsi,
+ struct i40e_ring *tx_ring, int napi_budget)
 {
u16 i = tx_ring->next_to_clean;
struct i40e_tx_buffer *tx_buf;
struct i40e_tx_desc *tx_head;
struct i40e_tx_desc *tx_desc;
-   unsigned int total_packets = 0;
-
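After this patch the number of descriptors cleaned per call comes from `vsi->work_limit`, while the NAPI budget is passed through only so that `napi_consume_skb()` can tell NAPI context (non-zero budget, frees may be batched) from netpoll (zero budget, free immediately). The userspace model below is a sketch of that decision, not the kernel's implementation: `struct free_cache`, `model_consume()`, `model_flush()` and the batch size of 64 are hypothetical, and `free()` stands in for the allocator's bulk-free path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical batch size for this model; not taken from the patch. */
#define MODEL_CACHE_SIZE 64u

/* Userspace stand-in for a per-CPU deferred-free cache. */
struct free_cache {
	void *objs[MODEL_CACHE_SIZE];
	size_t count;           /* objects currently waiting to be freed */
	size_t immediate_frees; /* objects freed one at a time */
	size_t bulk_flushes;    /* number of batched flushes performed */
};

/* Free every pending object in one pass (models a bulk free). */
static void model_flush(struct free_cache *c)
{
	for (size_t i = 0; i < c->count; i++)
		free(c->objs[i]);
	if (c->count)
		c->bulk_flushes++;
	c->count = 0;
}

/*
 * Models the budget check: budget == 0 means "not called from NAPI
 * (e.g. netpoll)", so free right away; otherwise defer the object and
 * flush the whole batch once the cache fills up.
 */
static void model_consume(struct free_cache *c, void *obj, int budget)
{
	if (!obj)
		return;
	if (budget == 0) {
		free(obj);
		c->immediate_frees++;
		return;
	}
	assert(c->count < MODEL_CACHE_SIZE); /* flush below keeps room */
	c->objs[c->count++] = obj;
	if (c->count == MODEL_CACHE_SIZE)
		model_flush(c);
}
```

In this model, consuming 130 objects with a non-zero budget triggers two batched flushes and leaves two objects pending, while a zero budget bypasses the cache entirely.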

[net-next 02/18] i40e/i40evf: Allow up to 12K bytes of data per Tx descriptor instead of 8K

2016-04-05 Thread Jeff Kirsher

From: Alexander Duyck 

From what I can tell, the practical limitation on the size of the Tx data
buffer is that the buffer-size field in the Tx descriptor is only 14 bits
wide.  As such, we cannot use 16K as is typically done in the other Intel
drivers.  However, artificially limiting ourselves to 8K can be expensive,
as it means we will consume up to 10 descriptors (1 context, 1 for the
header, and 9 for a payload that is not 8K aligned) in a single send.

I propose that we reduce this by increasing the maximum data for a 4K
aligned block to 12K.  This reduces the number of descriptors used for a
32K aligned block by 1 (12K + 12K + 8K instead of 4 x 8K).  In addition,
the 14-bit field still leaves 4K - 1 of space unused, which we can use as
a bit of extra padding when dealing with data that is not aligned to 4K.

By aligning the descriptors after the first to 4K, we can improve the
efficiency of PCIe accesses, since we avoid using byte enables and can
fetch full TLP transactions after the first fetch of the buffer.  Below
are the results of testing before and after this patch:

Recv   Send   Send                         Utilization     Service Demand
Socket Socket Message  Elapsed             Send    Recv    Send    Recv
Size   Size   Size     Time    Throughput  local   remote  local   remote
bytes  bytes  bytes    secs.   10^6bits/s  % S     % U     us/KB   us/KB
Before:
87380  16384  16384    10.00   33682.24    20.27   -1.00   0.592   -1.00
After:
87380  16384  16384    10.00   34204.08    20.54   -1.00   0.590   -1.00

So the net result of this patch is a small gain in throughput (from
33682.24 to 34204.08 10^6 bits/s, roughly 1.5%) due to a reduction in
overhead for putting together the frame.

Signed-off-by: Alexander Duyck 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_fcoe.c   |  2 +-
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 13 +++---
 drivers/net/ethernet/intel/i40e/i40e_txrx.h   | 35 ---
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c | 13 +++---
 drivers/net/ethernet/intel/i40evf/i40e_txrx.h | 35 ---
 5 files changed, 83 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_fcoe.c b/drivers/net/ethernet/intel/i40e/i40e_fcoe.c
index 8ad162c..92d2208 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_fcoe.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_fcoe.c
@@ -1371,7 +1371,7 @@ static netdev_tx_t i40e_fcoe_xmit_frame(struct sk_buff *skb,
if (i40e_chk_linearize(skb, count)) {
if (__skb_linearize(skb))
goto out_drop;
-   count = TXD_USE_COUNT(skb->len);
+   count = i40e_txd_use_count(skb->len);
tx_ring->tx_stats.tx_linearize++;
}
 
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 084d0ab..9af1411 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -2717,6 +2717,8 @@ static inline void i40e_tx_map(struct i40e_ring *tx_ring, struct sk_buff *skb,
tx_bi = first;
 
for (frag = &skb_shinfo(skb)->frags[0];; frag++) {
+   unsigned int max_data = I40E_MAX_DATA_PER_TXD_ALIGNED;
+
if (dma_mapping_error(tx_ring->dev, dma))
goto dma_error;
 
@@ -2724,12 +2726,14 @@ static inline void i40e_tx_map(struct i40e_ring *tx_ring, struct sk_buff *skb,
dma_unmap_len_set(tx_bi, len, size);
dma_unmap_addr_set(tx_bi, dma, dma);
 
+   /* align size to end of page */
+   max_data += -dma & (I40E_MAX_READ_REQ_SIZE - 1);
tx_desc->buffer_addr = cpu_to_le64(dma);
 
while (unlikely(size > I40E_MAX_DATA_PER_TXD)) {
tx_desc->cmd_type_offset_bsz =
build_ctob(td_cmd, td_offset,
-  I40E_MAX_DATA_PER_TXD, td_tag);
+  max_data, td_tag);
 
tx_desc++;
i++;
@@ -2740,9 +2744,10 @@ static inline void i40e_tx_map(struct i40e_ring *tx_ring, struct sk_buff *skb,
i = 0;
}
 
-   dma += I40E_MAX_DATA_PER_TXD;
-   size -= I40E_MAX_DATA_PER_TXD;
+   dma += max_data;
+   size -= max_data;
 
+   max_data = I40E_MAX_DATA_PER_TXD_ALIGNED;
tx_desc->buffer_addr = cpu_to_le64(dma);
}
 
@@ -2892,7 +2897,7 @@ static netdev_tx_t i40e_xmit_frame_ring(struct sk_buff *skb,
if (i40e_chk_linearize(skb, count)) {
if (__skb_linearize(skb))
goto out_drop;
-
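The splitting loop in the i40e_tx_map() hunk above can be modelled in userspace to check the descriptor arithmetic from the commit message. This is a sketch under stated assumptions: the `MODEL_*` constants are derived from the text (a 14-bit size field and 4K read requests) rather than copied from i40e_txrx.h, whose changes are not visible in this truncated excerpt, and `descs_old()`/`descs_new()` are hypothetical helpers that count descriptors for a single DMA-mapped buffer.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values: 14-bit size field (16K - 1 max), 4K read requests. */
#define MODEL_MAX_READ_REQ_SIZE 4096u
#define MODEL_MAX_DATA_PER_TXD  (16u * 1024u - 1u)
#define MODEL_MAX_DATA_PER_TXD_ALIGNED \
	(MODEL_MAX_DATA_PER_TXD & ~(MODEL_MAX_READ_REQ_SIZE - 1u))

/* 12K plus up to 4K - 1 of alignment padding still fits in 14 bits. */
_Static_assert(MODEL_MAX_DATA_PER_TXD_ALIGNED == 12u * 1024u,
	       "aligned limit should be 12K");
_Static_assert(MODEL_MAX_DATA_PER_TXD_ALIGNED + (MODEL_MAX_READ_REQ_SIZE - 1u) ==
	       MODEL_MAX_DATA_PER_TXD,
	       "padded first chunk must fit the 14-bit field");

/* Old scheme: fixed 8K chunks while more than 8K remains. */
static unsigned int descs_old(uint64_t dma, uint32_t size)
{
	const uint32_t max = 8192u;
	unsigned int n = 0;

	(void)dma; /* the old scheme ignores buffer alignment */
	while (size > max) {
		size -= max;
		n++;
	}
	return n + 1; /* the final descriptor carries the remainder */
}

/* New scheme: first chunk padded to the next 4K boundary, then 12K. */
static unsigned int descs_new(uint64_t dma, uint32_t size)
{
	uint32_t max_data = MODEL_MAX_DATA_PER_TXD_ALIGNED;
	unsigned int n = 0;

	/* bytes from dma up to the next 4K boundary (0 if aligned) */
	max_data += (uint32_t)(-dma & (MODEL_MAX_READ_REQ_SIZE - 1u));

	while (size > MODEL_MAX_DATA_PER_TXD) {
		assert(max_data <= MODEL_MAX_DATA_PER_TXD);
		dma += max_data;
		size -= max_data;
		n++;
		max_data = MODEL_MAX_DATA_PER_TXD_ALIGNED;
	}
	return n + 1; /* the final descriptor may carry up to 16K - 1 */
}
```

Under these assumptions a 4K-aligned 32K buffer needs 4 descriptors with the old 8K limit and 3 with the new scheme, in line with the commit message; a 4K-aligned 64K buffer drops from 8 to 6.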

[net-next 07/18] i40e/i40evf: Remove I40E_MAX_USER_PRIORITY define

2016-04-05 Thread Jeff Kirsher

From: Catherine Sullivan 

This patch removes the unneeded duplicate definition of
I40E_MAX_USER_PRIORITY from i40e.h.

Signed-off-by: Catherine Sullivan 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index f208570..d25b3be 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -244,7 +244,6 @@ struct i40e_fdir_filter {
 #define I40E_DCB_PRIO_TYPE_STRICT  0
 #define I40E_DCB_PRIO_TYPE_ETS 1
 #define I40E_DCB_STRICT_PRIO_CREDITS   127
-#define I40E_MAX_USER_PRIORITY 8
 /* DCB per TC information data structure */
 struct i40e_tc_info {
u16 qoffset;/* Queue offset from base queue */
-- 
2.5.5

[net-next 16/18] i40evf: Add additional check for reset

2016-04-05 Thread Jeff Kirsher

From: Mitch Williams 

If the driver happens to read a register while the device is undergoing
reset, it will receive a value of 0xdeadbeef instead of a valid value.
Unfortunately, the driver may misinterpret this as a valid value,
especially if it is just looking at individual bits.

Add an explicit check for this value when we are looking for admin queue
errors, and trigger reset recovery if we find it.

Signed-off-by: Mitch Williams 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40evf/i40evf_main.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
index 820ad94..d783c1b 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
@@ -1994,6 +1994,8 @@ static void i40evf_adminq_task(struct work_struct *work)
 
/* check for error indications */
val = rd32(hw, hw->aq.arq.len);
+   if (val == 0xdeadbeef) /* indicates device in reset */
+   goto freedom;
oldval = val;
if (val & I40E_VF_ARQLEN1_ARQVFE_MASK) {
dev_info(&adapter->pdev->dev, "ARQ VF Error detected\n");
-- 
2.5.5
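The check in this patch has to come before any bit tests, because 0xdeadbeef has many bits set and would otherwise look like a register full of error flags. The sketch below illustrates that ordering in userspace; the bit positions, the `MODEL_*` names, `enum arq_status`, and `classify_arq_len()` are hypothetical stand-ins, not the i40evf register definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical error-flag bit positions, for illustration only. */
#define MODEL_ARQ_VF_ERR    (1u << 28)
#define MODEL_ARQ_OVERFLOW  (1u << 29)
#define MODEL_ARQ_CRITICAL  (1u << 30)
#define MODEL_ARQ_ERR_MASK  (MODEL_ARQ_VF_ERR | MODEL_ARQ_OVERFLOW | \
			     MODEL_ARQ_CRITICAL)
#define MODEL_RESET_PATTERN 0xdeadbeefu

enum arq_status { ARQ_OK, ARQ_ERRORS, ARQ_IN_RESET };

/* Classify a raw ARQ length register read, sentinel first. */
static enum arq_status classify_arq_len(uint32_t val)
{
	/* Device in reset: the bits in this value are not real flags. */
	if (val == MODEL_RESET_PATTERN)
		return ARQ_IN_RESET;
	if (val & MODEL_ARQ_ERR_MASK)
		return ARQ_ERRORS;
	return ARQ_OK;
}

/* Same test without the sentinel check, for comparison. */
static enum arq_status classify_arq_len_naive(uint32_t val)
{
	if (val & MODEL_ARQ_ERR_MASK)
		return ARQ_ERRORS;
	return ARQ_OK;
}
```

With these assumed bit positions the naive version reports 0xdeadbeef as an error condition, which is the kind of misinterpretation the commit message describes.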

[net-next 00/18][pull request] 40GbE Intel Wired LAN Driver Updates 2016-04-05

2016-04-05 Thread Jeff Kirsher

This series contains updates to i40e and i40evf only.

Stefan replaces dev_close() with ndo_stop() for the ethtool offline
self-test, since dev_close() clears IFF_UP, which removes the interface
routes and addresses.

Alex bumps the maximum data per Tx descriptor from 8K to 12K, which
reduces the overhead of putting together a frame and gives a small gain in
throughput.  Also fixed an issue in the polling routines where we were
using bitwise operators to avoid the side effects of the logical
operators.  Then added support for bulk free of skbs in Tx cleanup.
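Why the choice of operator matters in a polling loop can be shown in plain C: with `&&`, once one ring reports that it is not finished, the right-hand call is skipped and later rings are never cleaned. The sketch below contrasts that with the explicit `if (!clean(...)) complete = false;` form used in the i40e_napi_poll() hunk earlier in this excerpt. `struct model_ring`, `model_clean()`, and the two poll functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical ring model: counts how often each ring is cleaned. */
struct model_ring {
	int cleaned; /* number of times the cleanup routine ran */
	bool done;   /* what the cleanup routine reports */
};

static bool model_clean(struct model_ring *r)
{
	r->cleaned++;
	return r->done;
}

/* && short-circuits: after the first "not done" ring, later rings
 * are skipped entirely because model_clean() is never called. */
static bool poll_short_circuit(struct model_ring *rings, int n)
{
	bool complete = true;

	for (int i = 0; i < n; i++)
		complete = complete && model_clean(&rings[i]);
	return complete;
}

/* Explicit form: every ring is cleaned, the result tracked separately. */
static bool poll_explicit(struct model_ring *rings, int n)
{
	bool complete = true;

	for (int i = 0; i < n; i++) {
		if (!model_clean(&rings[i]))
			complete = false;
	}
	return complete;
}
```

Both functions return the same overall result; they differ only in whether every ring's cleanup actually runs.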

Jesse fixed a sparse warning about type casting in the transmit code and
fixed i40e_aq_set_phy_debug() to use i40e_status as its return type.

Catherine removes a duplicated define and bumps the driver versions.

Shannon fixed the interrupt cleanup so that IRQs are freed only if they
were actually set up.  Also fixed the error paths that tried to remove a
non-existent timer or work task, which gives the kernel heartburn.

Mitch moves the notification of resets into the reset interrupt handler
instead of the actual reset initiation code.  This allows the VFs to be
properly notified of all resets, including resets initiated by different
PFs on the same physical device.  Also moved the clearing of the VFLR bit
to after reset processing instead of before it, since clearing it first
could lead to double resets on VF init, and fixed a code comment to match
the actual function name.

The following are changes since commit 15f41e2ba13a6726632e44b1180e805a61e470ad:
  Merge branch 'tcp-udp-misc'
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue 40GbE

Alan Cox (1):
  i40evf: remove dead code

Alexander Duyck (3):
  i40e/i40evf: Allow up to 12K bytes of data per Tx descriptor instead
of 8K
  i40e/i40evf: Fix handling of boolean logic in polling routines
  i40e/i40evf: Add support for bulk free in Tx cleanup

Catherine Sullivan (2):
  i40e/i40evf: Remove I40E_MAX_USER_PRIORITY define
  i40e/i40evf: Bump patch from 1.4.25 to 1.5.1

Jesse Brandeburg (2):
  i40e/i40evf: Fix casting in transmit code
  i40e: Fix up return code

Kevin Scott (1):
  i40e: Save off VSI resource count when updating VSI

Mitch Williams (4):
  i40e: Notify VFs of all resets
  i40e: Added code to prevent double resets
  i40evf: Add additional check for reset
  i40e: Change comment to reflect correct function name

Shannon Nelson (4):
  i40e: Remove MSIx only if created
  i40e: Assure that adminq is alive in debug mode
  i40e: Remove timer and task only if created
  i40e: Change unknown event error msg to ignore message

Stefan Assmann (1):
  i40e: call ndo_stop() instead of dev_close() when running offline
selftest

 drivers/net/ethernet/intel/i40e/i40e.h             |  3 +-
 drivers/net/ethernet/intel/i40e/i40e_common.c      | 12 --
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c     |  4 +-
 drivers/net/ethernet/intel/i40e/i40e_fcoe.c        |  2 +-
 drivers/net/ethernet/intel/i40e/i40e_main.c        | 35 +++-
 drivers/net/ethernet/intel/i40e/i40e_txrx.c        | 49 +-
 drivers/net/ethernet/intel/i40e/i40e_txrx.h        | 35 ++--
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 13 +++---
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c      | 49 +-
 drivers/net/ethernet/intel/i40evf/i40e_txrx.h      | 35 ++--
 drivers/net/ethernet/intel/i40evf/i40evf_main.c    | 17 +++-
 11 files changed, 166 insertions(+), 88 deletions(-)

-- 
2.5.5

[net-next 10/18] i40e: Remove MSIx only if created

2016-04-05 Thread Jeff Kirsher

From: Shannon Nelson 

When cleaning up the interrupt handling, clean up the IRQs only if
we actually got them set up.  There are a couple of error recovery
paths that were violating this and causing the kernel a bit of
indigestion.

Signed-off-by: Shannon Nelson 
Reviewed-by: Williams, Mitch A 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 650336e..2464dca 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -4164,7 +4164,7 @@ static void i40e_clear_interrupt_scheme(struct i40e_pf *pf)
int i;
 
i40e_stop_misc_vector(pf);
-   if (pf->flags & I40E_FLAG_MSIX_ENABLED) {
+   if (pf->flags & I40E_FLAG_MSIX_ENABLED && pf->msix_entries) {
synchronize_irq(pf->msix_entries[0].vector);
free_irq(pf->msix_entries[0].vector, pf);
}
-- 
2.5.5
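The fix guards the teardown on both the MSI-X flag and the `msix_entries` pointer, so a path where the flag is set but the entries were never allocated (the exact error paths are not described in the commit message) no longer tries to free IRQs that do not exist. A userspace sketch of that guard follows; `struct model_pf`, the flag value, and the use of `free()` in place of `free_irq()` are hypothetical. Note that `&` binds tighter than `&&`, so the unparenthesized condition in the patch means `(flags & MSIX) && entries`.

```c
#include <assert.h>
#include <stdlib.h>

#define MODEL_FLAG_MSIX_ENABLED (1u << 0) /* hypothetical flag bit */

struct model_pf {
	unsigned int flags;
	int *msix_entries; /* NULL until vectors are actually set up */
	int irqs_freed;    /* how many times teardown really ran */
};

/* Tear down only what was created: the flag alone is not enough. */
static void model_clear_interrupt_scheme(struct model_pf *pf)
{
	/* parsed as (pf->flags & MSIX_ENABLED) && pf->msix_entries */
	if (pf->flags & MODEL_FLAG_MSIX_ENABLED && pf->msix_entries) {
		pf->irqs_freed++;
		free(pf->msix_entries);
		pf->msix_entries = NULL; /* makes a repeat call a no-op */
	}
}
```

Clearing the pointer after freeing also makes a second call harmless in this model.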

[net-next 18/18] i40e/i40evf: Bump patch from 1.4.25 to 1.5.1

2016-04-05 Thread Jeff Kirsher

From: Catherine Sullivan 

Signed-off-by: Catherine Sullivan 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 4 ++--
 drivers/net/ethernet/intel/i40evf/i40evf_main.c | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 3841005..297fd39 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -45,8 +45,8 @@ static const char i40e_driver_string[] =
 #define DRV_KERN "-k"
 
 #define DRV_VERSION_MAJOR 1
-#define DRV_VERSION_MINOR 4
-#define DRV_VERSION_BUILD 25
+#define DRV_VERSION_MINOR 5
+#define DRV_VERSION_BUILD 1
 #define DRV_VERSION __stringify(DRV_VERSION_MAJOR) "." \
 __stringify(DRV_VERSION_MINOR) "." \
 __stringify(DRV_VERSION_BUILD)DRV_KERN
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
index d783c1b..e397368 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
@@ -37,8 +37,8 @@ static const char i40evf_driver_string[] =
 #define DRV_KERN "-k"
 
 #define DRV_VERSION_MAJOR 1
-#define DRV_VERSION_MINOR 4
-#define DRV_VERSION_BUILD 15
+#define DRV_VERSION_MINOR 5
+#define DRV_VERSION_BUILD 1
 #define DRV_VERSION __stringify(DRV_VERSION_MAJOR) "." \
 __stringify(DRV_VERSION_MINOR) "." \
 __stringify(DRV_VERSION_BUILD) \
-- 
2.5.5
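The DRV_VERSION string in these hunks is built by stringifying each numeric macro and relying on the compiler to concatenate adjacent string literals. The sketch below reproduces that in userspace with a locally defined two-level macro, since the kernel's `__stringify()` is not available outside the kernel; `model_stringify`, `model_stringify_1`, and `model_driver_version` are hypothetical names. With the i40e values it yields "1.5.1-k"; the i40evf DRV_VERSION line is cut off after its backslash in this excerpt, so its suffix is not shown.

```c
#include <assert.h>
#include <string.h>

/* Two-level stringification, modelled on the kernel's __stringify():
 * the outer macro expands its argument first, the inner one applies #. */
#define model_stringify_1(x) #x
#define model_stringify(x) model_stringify_1(x)

#define DRV_KERN "-k"
#define DRV_VERSION_MAJOR 1
#define DRV_VERSION_MINOR 5
#define DRV_VERSION_BUILD 1
/* Adjacent string literals are concatenated at compile time. */
#define DRV_VERSION model_stringify(DRV_VERSION_MAJOR) "." \
	model_stringify(DRV_VERSION_MINOR) "." \
	model_stringify(DRV_VERSION_BUILD) DRV_KERN

static const char model_driver_version[] = DRV_VERSION;
```

A single-level `#x` would produce "DRV_VERSION_MAJOR" instead of "1", which is why the extra indirection is needed.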
