[26/74] IPoIB: Fix send lockup due to missed TX completion

2013-04-07 Thread Ben Hutchings
3.2.43-rc1 review patch.  If anyone has any objections, please let me know.

--

From: Mike Marciniszyn 

commit 1ee9e2aa7b31427303466776f455d43e5e3c9275 upstream.

Commit f0dc117abdfa ("IPoIB: Fix TX queue lockup with mixed UD/CM
traffic") attempts to solve an issue where unprocessed UD send
completions can deadlock the netdev.

That patch doesn't fully resolve the issue: if more than half of the
tx_outstanding sends were UD and all of the destinations are RC
reachable, arming the CQ alone doesn't help, because the UD
completions it was armed for may already be sitting in the CQ.

This patch passes IB_CQ_REPORT_MISSED_EVENTS to ib_req_notify_cq().
If the return value is greater than 0, the UD send CQ completion
callback is called directly to re-arm the send completion timer.
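
As an illustration only, the three-way return handling described above
can be sketched in user space.  Everything named mock_* or MOCK_* below
is hypothetical and is not the kernel API; the sketch only assumes the
return convention of ib_req_notify_cq() (negative on error, 0 when
armed, positive when IB_CQ_REPORT_MISSED_EVENTS was passed and
completions may have been missed).

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical flag values for this mock; not the kernel's definitions. */
enum {
	MOCK_CQ_NEXT_COMP            = 1 << 0,
	MOCK_CQ_REPORT_MISSED_EVENTS = 1 << 1,
};

/* Hypothetical stand-in for the send CQ plus the driver's poll timer. */
struct mock_cq {
	int pending;          /* completions already queued, not yet polled */
	int armed;            /* set once a notification has been requested */
	int poll_timer_armed; /* set when the completion handler has run */
	int broken;           /* nonzero simulates a failing notify request */
};

/*
 * Mimics the assumed return convention: <0 on error, 0 when armed,
 * >0 when the caller asked for missed-event reporting and completions
 * were already pending at the time of the request.
 */
static int mock_req_notify_cq(struct mock_cq *cq, int flags)
{
	assert(cq != NULL);
	if (cq->broken)
		return -1;
	cq->armed = 1;
	if ((flags & MOCK_CQ_REPORT_MISSED_EVENTS) && cq->pending > 0)
		return 1;
	return 0;
}

/* Stand-in for the send completion callback: it only re-arms the timer. */
static void mock_send_comp_handler(struct mock_cq *cq)
{
	cq->poll_timer_armed = 1;
}

/*
 * Arm the CQ and, if completions may have been missed, invoke the
 * handler directly.  Returns -1 on failure, 1 if the handler was
 * invoked, 0 if arming alone was sufficient.
 */
static int arm_and_catch_up(struct mock_cq *cq)
{
	int rc = mock_req_notify_cq(cq,
			MOCK_CQ_NEXT_COMP | MOCK_CQ_REPORT_MISSED_EVENTS);

	if (rc < 0) {
		fprintf(stderr, "request notify on send CQ failed\n");
		return -1;
	}
	if (rc > 0) {
		mock_send_comp_handler(cq);
		return 1;
	}
	return 0;
}
```

With a mock CQ that already holds pending completions,
arm_and_catch_up() returns 1 and sets poll_timer_armed; with an empty
CQ it returns 0 and only arms the CQ.  The first case is the one that
a plain IB_CQ_NEXT_COMP request cannot detect.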

This issue is seen in very large parallel filesystem deployments
and the patch has been shown to correct the issue.

Reviewed-by: Dean Luick 
Signed-off-by: Mike Marciniszyn 
Signed-off-by: Roland Dreier 
Signed-off-by: Ben Hutchings 
---
 drivers/infiniband/ulp/ipoib/ipoib_cm.c |8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -755,9 +755,13 @@ void ipoib_cm_send(struct net_device *de
 		if (++priv->tx_outstanding == ipoib_sendq_size) {
 			ipoib_dbg(priv, "TX ring 0x%x full, stopping kernel net queue\n",
 				  tx->qp->qp_num);
-			if (ib_req_notify_cq(priv->send_cq, IB_CQ_NEXT_COMP))
-				ipoib_warn(priv, "request notify on send CQ failed\n");
 			netif_stop_queue(dev);
+			rc = ib_req_notify_cq(priv->send_cq,
+				IB_CQ_NEXT_COMP | IB_CQ_REPORT_MISSED_EVENTS);
+			if (rc < 0)
+				ipoib_warn(priv, "request notify on send CQ failed\n");
+			else if (rc)
+				ipoib_send_comp_handler(priv->send_cq, dev);
 		}
 	}
 }