Hi Mike,
finally I've found the mysterious I/O stall I've been chasing
for the last two months.
Problem is iscsi_conn_queue_work(); we're just calling
queue_work() without checking the return value.
However, queue_work() will be effectively a no-op when
there is some work already running, so we'll be missing
quite a few invocations (on my NetApp target I've been
counting up to 120000 missed invocations ...).
In addition to that, iscsi_check_cmdsn_window_closed()
doesn't actually check if the cmdsn window has been closed
(in the sense that we're not allowed to send any PDUs),
but rather that the new PDU _to be queued_ will be rejected.
There might still be PDUs in the queue, waiting to be
transferred.
So if we're hitting queuecommand hard enough we might be running into this
scenario:
- iscsi_data_xmit:
Transfers last Data-out PDU in requeue list; all queues are empty.
-> xmit_task()
-> unlock session
- queuecommand()
being called repeatedly until iscsi_check_cmdsn triggers, giving
xmit thread no chance to grab the session lock
-> iscsi_conn_queue_work() does nothing as work is already pending
- iscsi_data_xmit:
-> locks session
-> returns from xmit_task()
-> end of iscsi_data_xmit()
-> I/O stall.
So we really should check if we've missed some queue_work calls,
and restart iscsi_data_xmit() if so.
Proposed patch is attached.
Note that in theory we might loop here when ->xmit_task() returns
an error. But this loop should be closed as we have had some
changes to the lists as someone called queue_work in the meantime.
Comments etc welcome.
Cheers,
Hannes
--
Dr. Hannes Reinecke zSeries & Storage
[email protected] +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups
"open-iscsi" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~----------~----~----~----~------~----~------~--~---
>From 1ab84cdf687ad989a807e0e01fa136601fa3ac2b Mon Sep 17 00:00:00 2001
From: Hannes Reinecke <[email protected]>
Date: Fri, 14 Aug 2009 13:50:15 +0200
Subject: [PATCH] libiscsi: I/O stall under heavy load
When the system is under load we might be missing quite some
invocations to queue_work(). So we need to check at the end
of iscsi_data_xmit() if we've missed some, and, given that
there's still some data in the queue, restart ourselves.
Signed-off-by: Hannes Reinecke <[email protected]>
---
drivers/scsi/libiscsi.c | 16 +++++++++++-----
include/scsi/libiscsi.h | 2 ++
2 files changed, 13 insertions(+), 5 deletions(-)
diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
index 0356e3d..75870dd 100644
--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -104,8 +104,9 @@ inline void iscsi_conn_queue_work(struct iscsi_conn *conn)
struct Scsi_Host *shost = conn->session->host;
struct iscsi_host *ihost = shost_priv(shost);
- if (ihost->workq)
- queue_work(ihost->workq, &conn->xmitwork);
+ if (ihost->workq &&
+ !queue_work(ihost->workq, &conn->xmitwork))
+ set_bit(ISCSI_WORKQUEUE_BIT, &conn->suspend_tx);
}
EXPORT_SYMBOL_GPL(iscsi_conn_queue_work);
@@ -1441,6 +1442,7 @@ static int iscsi_data_xmit(struct iscsi_conn *conn)
int rc = 0;
spin_lock_bh(&conn->session->lock);
+ clear_bit(ISCSI_WORKQUEUE_BIT, &conn->suspend_tx);
if (test_bit(ISCSI_SUSPEND_BIT, &conn->suspend_tx)) {
ISCSI_DBG_SESSION(conn->session, "Tx suspended!\n");
spin_unlock_bh(&conn->session->lock);
@@ -1527,6 +1529,10 @@ check_mgmt:
if (!list_empty(&conn->mgmtqueue))
goto check_mgmt;
}
+ /* Check if we've missed workqueue invocations */
+ if (test_bit(ISCSI_WORKQUEUE_BIT, &conn->suspend_tx) &&
+ (!list_empty(&conn->cmdqueue) || !list_empty(&conn->requeue)) )
+ goto check_mgmt;
spin_unlock_bh(&conn->session->lock);
return -ENODATA;
@@ -1895,9 +1901,9 @@ void iscsi_suspend_tx(struct iscsi_conn *conn)
struct Scsi_Host *shost = conn->session->host;
struct iscsi_host *ihost = shost_priv(shost);
+ set_bit(ISCSI_SUSPEND_BIT, &conn->suspend_tx);
if (ihost->workq)
flush_workqueue(ihost->workq);
- set_bit(ISCSI_SUSPEND_BIT, &conn->suspend_tx);
}
EXPORT_SYMBOL_GPL(iscsi_suspend_tx);
@@ -1956,6 +1962,7 @@ static enum blk_eh_timer_return iscsi_eh_cmd_timed_out(struct scsi_cmnd *sc)
task = (struct iscsi_task *)sc->SCp.ptr;
if (!task)
goto done;
+#if 0
/*
* If we have sent (at least queued to the network layer) a pdu or
* recvd one for the task since the last timeout ask for
@@ -1972,7 +1979,7 @@ static enum blk_eh_timer_return iscsi_eh_cmd_timed_out(struct scsi_cmnd *sc)
rc = BLK_EH_RESET_TIMER;
goto done;
}
-
+#endif
if (!conn->recv_timeout && !conn->ping_timeout)
goto done;
/*
@@ -2183,7 +2190,6 @@ success:
success_unlocked:
ISCSI_DBG_EH(session, "abort success [sc %p itt 0x%x]\n",
sc, task->itt);
- memset(hdr, 0, sizeof(*hdr));
mutex_unlock(&session->eh_mutex);
return SUCCESS;
diff --git a/include/scsi/libiscsi.h b/include/scsi/libiscsi.h
index 61afeb5..574e48b 100644
--- a/include/scsi/libiscsi.h
+++ b/include/scsi/libiscsi.h
@@ -63,6 +63,8 @@ enum {
/* Connection suspend "bit" */
#define ISCSI_SUSPEND_BIT 1
+/* Missed workqueue calls */
+#define ISCSI_WORKQUEUE_BIT 2
#define ISCSI_ITT_MASK 0x1fff
#define ISCSI_TOTAL_CMDS_MAX 4096
--
1.6.0.2