Hi Mike,

I've finally tracked down the mysterious I/O stall I've been chasing
for the last two months.

The problem is in iscsi_conn_queue_work(): we're calling
queue_work() without checking its return value.

But queue_work() is effectively a no-op when the work item is
already pending, so we'll be missing quite a few invocations (on my
NetApp target I've counted up to 120000 missed invocations ...).
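
For reference, the unpatched helper (as quoted in the patch context
below) simply fires and forgets; queue_work() returns nonzero only if
it actually queued the work, and 0 if the work item was already
pending:

inline void iscsi_conn_queue_work(struct iscsi_conn *conn)
{
	struct Scsi_Host *shost = conn->session->host;
	struct iscsi_host *ihost = shost_priv(shost);

	if (ihost->workq)
		/* Return value dropped: 0 here means xmitwork was already
		 * pending, and this invocation is silently lost. */
		queue_work(ihost->workq, &conn->xmitwork);
}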

In addition, iscsi_check_cmdsn_window_closed() doesn't actually
check whether the CmdSN window has been closed (in the sense that
we're not allowed to send any PDUs), but rather whether a new PDU
_to be queued_ would be rejected. There might still be PDUs in the
queue, waiting to be transferred.
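
To illustrate: the check is essentially the following (paraphrased
from my reading of libiscsi.c, so take the details as a sketch); it
only compares the CmdSN the next command would get against the window
the target advertised, and never looks at the queues themselves:

static int iscsi_check_cmdsn_window_closed(struct iscsi_conn *conn)
{
	struct iscsi_session *session = conn->session;

	/* "Closed" here only means that the next CmdSN we would assign
	 * no longer fits into the window advertised by the target.
	 * PDUs already sitting on cmdqueue/requeue are not considered. */
	if (!iscsi_sna_lte(session->queued_cmdsn, session->max_cmdsn))
		return -ENOSPC;
	return 0;
}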

So if we're hitting queuecommand() hard enough we might run into
this scenario:

- iscsi_data_xmit:
   Transfers last Data-out PDU in requeue list; all queues are empty.
   -> xmit_task()
      -> unlock session

- queuecommand()
    called repeatedly until iscsi_check_cmdsn_window_closed() triggers,
    giving the xmit thread no chance to grab the session lock
    -> iscsi_conn_queue_work() does nothing, as the work is already pending

- iscsi_data_xmit:
      -> locks session
    -> returns from xmit_task()
  -> end of iscsi_data_xmit()

-> I/O stall.

So we really should check whether we've missed any queue_work()
calls, and restart iscsi_data_xmit() if so.
A proposed patch is attached.

Note that in theory we might loop here when ->xmit_task() returns
an error. But that loop should terminate, as the lists will have
been changed by whoever called queue_work() in the meantime.
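
To make that argument a bit more concrete, here is how the two
pieces of the patch play together (excerpt of the hunks below; the
comments are only for this mail, they're not part of the patch):

	/* iscsi_conn_queue_work(): if the work was already pending,
	 * record that we missed an invocation. */
	if (ihost->workq &&
	    !queue_work(ihost->workq, &conn->xmitwork))
		set_bit(ISCSI_WORKQUEUE_BIT, &conn->suspend_tx);

	/* End of iscsi_data_xmit(), still under the session lock:
	 * only restart when the bit got set in the meantime _and_
	 * there is still something on the lists. The bit can only
	 * have been set by a concurrent iscsi_conn_queue_work(),
	 * i.e. the lists have just been touched, which is the
	 * reasoning above for why we shouldn't spin forever on an
	 * unchanged queue when ->xmit_task() keeps failing. */
	if (test_bit(ISCSI_WORKQUEUE_BIT, &conn->suspend_tx) &&
	    (!list_empty(&conn->cmdqueue) || !list_empty(&conn->requeue)))
		goto check_mgmt;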

Comments etc. welcome.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                   zSeries & Storage
h...@suse.de                          +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)


From 1ab84cdf687ad989a807e0e01fa136601fa3ac2b Mon Sep 17 00:00:00 2001
From: Hannes Reinecke <h...@suse.de>
Date: Fri, 14 Aug 2009 13:50:15 +0200
Subject: [PATCH] libiscsi: I/O stall under heavy load

When the system is under load we might miss quite a few
queue_work() invocations. So we need to check at the end of
iscsi_data_xmit() whether we've missed any and, if there's still
data in the queues, restart ourselves.

Signed-off-by: Hannes Reinecke <h...@suse.de>
---
 drivers/scsi/libiscsi.c |   16 +++++++++++-----
 include/scsi/libiscsi.h |    2 ++
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
index 0356e3d..75870dd 100644
--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -104,8 +104,9 @@ inline void iscsi_conn_queue_work(struct iscsi_conn *conn)
 	struct Scsi_Host *shost = conn->session->host;
 	struct iscsi_host *ihost = shost_priv(shost);
 
-	if (ihost->workq)
-		queue_work(ihost->workq, &conn->xmitwork);
+	if (ihost->workq &&
+	    !queue_work(ihost->workq, &conn->xmitwork))
+		set_bit(ISCSI_WORKQUEUE_BIT, &conn->suspend_tx);
 }
 EXPORT_SYMBOL_GPL(iscsi_conn_queue_work);
 
@@ -1441,6 +1442,7 @@ static int iscsi_data_xmit(struct iscsi_conn *conn)
 	int rc = 0;
 
 	spin_lock_bh(&conn->session->lock);
+	clear_bit(ISCSI_WORKQUEUE_BIT, &conn->suspend_tx);
 	if (test_bit(ISCSI_SUSPEND_BIT, &conn->suspend_tx)) {
 		ISCSI_DBG_SESSION(conn->session, "Tx suspended!\n");
 		spin_unlock_bh(&conn->session->lock);
@@ -1527,6 +1529,10 @@ check_mgmt:
 		if (!list_empty(&conn->mgmtqueue))
 			goto check_mgmt;
 	}
+	/* Check if we've missed workqueue invocations */
+	if (test_bit(ISCSI_WORKQUEUE_BIT, &conn->suspend_tx) &&
+	    (!list_empty(&conn->cmdqueue) || !list_empty(&conn->requeue)) )
+		goto check_mgmt;
 	spin_unlock_bh(&conn->session->lock);
 	return -ENODATA;
 
@@ -1895,9 +1901,9 @@ void iscsi_suspend_tx(struct iscsi_conn *conn)
 	struct Scsi_Host *shost = conn->session->host;
 	struct iscsi_host *ihost = shost_priv(shost);
 
+	set_bit(ISCSI_SUSPEND_BIT, &conn->suspend_tx);
 	if (ihost->workq)
 		flush_workqueue(ihost->workq);
-	set_bit(ISCSI_SUSPEND_BIT, &conn->suspend_tx);
 }
 EXPORT_SYMBOL_GPL(iscsi_suspend_tx);
 
@@ -1956,6 +1962,7 @@ static enum blk_eh_timer_return iscsi_eh_cmd_timed_out(struct scsi_cmnd *sc)
 	task = (struct iscsi_task *)sc->SCp.ptr;
 	if (!task)
 		goto done;
+#if 0
 	/*
 	 * If we have sent (at least queued to the network layer) a pdu or
 	 * recvd one for the task since the last timeout ask for
@@ -1972,7 +1979,7 @@ static enum blk_eh_timer_return iscsi_eh_cmd_timed_out(struct scsi_cmnd *sc)
 		rc = BLK_EH_RESET_TIMER;
 		goto done;
 	}
-
+#endif
 	if (!conn->recv_timeout && !conn->ping_timeout)
 		goto done;
 	/*
@@ -2183,7 +2190,6 @@ success:
 success_unlocked:
 	ISCSI_DBG_EH(session, "abort success [sc %p itt 0x%x]\n",
 		     sc, task->itt);
-	memset(hdr, 0, sizeof(*hdr));
 	mutex_unlock(&session->eh_mutex);
 	return SUCCESS;
 
diff --git a/include/scsi/libiscsi.h b/include/scsi/libiscsi.h
index 61afeb5..574e48b 100644
--- a/include/scsi/libiscsi.h
+++ b/include/scsi/libiscsi.h
@@ -63,6 +63,8 @@ enum {
 
 /* Connection suspend "bit" */
 #define ISCSI_SUSPEND_BIT		1
+/* Missed workqueue calls */
+#define ISCSI_WORKQUEUE_BIT		2
 
 #define ISCSI_ITT_MASK			0x1fff
 #define ISCSI_TOTAL_CMDS_MAX		4096
-- 
1.6.0.2
