Re: [PATCH] I/O stall finally solved

2009-08-18 Thread Mike Christie

On 08/17/2009 10:00 AM, Hannes Reinecke wrote:
> Mike Christie wrote:
>> On 08/14/2009 07:01 AM, Hannes Reinecke wrote:
>>> Hi Mike,
>>>
>>> finally I've found the mysterious I/O stall I've been chasing
>>> for the last two months.
>>>
>>> Problem is iscsi_conn_queue_work(); we're just calling
>>> queue_work() without checking the return value.
>>>
>>> However, queue_work() will be effectively a no-op when
>>> there is some work already running, so we'll be missing
>>> quite a few invocations (on my NetApp target I've been
>>> counting up to 12 missed invocations ...).
>>>
>> Huh, I think I am misunderstanding you or I must be misreading workqueue
>> code or probably both :(
>>
>> When you say running, do you mean the work fn is running, or that the
>> work is queued (WORK_STRUCT_PENDING bit is set but the work fn is not
>> yet called)?
>>
>>
>> I thought this happens when queue_work returns 1:
>>
>> run_workqueue
>>  work_clear_pending
>>  f(work)
>>
>> so if we called queue_work while iscsi_data_xmit was running then
>> queue_work's pending test:
>>
>>   if (!test_and_set_bit(WORK_STRUCT_PENDING, work_data_bits(work))) {
>>   BUG_ON(!list_empty(&work->entry));
>>   __queue_work(wq_per_cpu(wq, cpu), work);
>>
>>
>> would actually queue the work to be run. So I thought we could actually
>> be getting extra calls to our work function, because we could call
>> queue_work right after the WORK_STRUCT_PENDING bit cleared but the work
>> fn is not yet running.
>>
>> If the queue_work returns 0, then I thought iscsi_data_xmit is not yet
>> running, so when it does run it would handle the task added to the list
>> by the caller of queue_work (we do list_add then queue_work so it should
>> be on the list when the work function would run if the work is pending
>> when we call queue_work).
>>
> Yeah, you are right. My assumptions were too simplistic.
> (Read: next time I should read the code more thoroughly ...)
>
> However, I still have the feeling there is potential
> for an I/O stall; this iscsi_check_cmdsn_window_closed() check
> simply doesn't feel safe.

The NetApp target you are using has a fixed window for your test, right?
You could just set can_queue to that window and see if the problem still occurs.
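
Something like this in the host template is all I mean (an illustrative
sketch only; 128 is a stand-in for whatever window the target
advertises, and the other fields are elided):

static struct scsi_host_template iscsi_sw_tcp_sht = {
	.module		= THIS_MODULE,
	.name		= "iSCSI Initiator over TCP/IP",
	.can_queue	= 128,	/* pin to the target's CmdSN window */
	/* ... remaining fields as in the driver ... */
};

With can_queue capped like that, the midlayer should never push more
commands at us than the window allows, so the
iscsi_check_cmdsn_window_closed() path would not come into play.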


>
>> Hey, for the list_add+queue_work sequence do I need to add some sort of
>> barrier?
>>
> Not sure; we might have to, as queue_work potentially runs on another CPU.
>
>>
>>> In addition to that, iscsi_check_cmdsn_window_closed()
>>> doesn't actually check if the cmdsn window has been closed
>>> (in the sense that we're not allowed to send any PDUs),
>>> but rather that the new PDU _to be queued_ will be rejected.
>>> There might still be PDUs in the queue, waiting to be
>>> transferred.
>>>
>>> So if we're hitting queuecommand hard enough we might be running into this
>>> scenario:
>>>
>>> - iscsi_data_xmit:
>>>     Transfers last Data-out PDU in requeue list; all queues are empty.
>>>     -> xmit_task()
>>>        -> unlock session
>>>
>>> - queuecommand()
>>>     being called repeatedly until iscsi_check_cmdsn triggers, giving
>>>     the xmit thread no chance to grab the session lock
>>>     -> iscsi_conn_queue_work() does nothing as work is already pending
>> Yeah, so I thought here, because iscsi_data_xmit is running, we hit this:
>>
>> run_workqueue
>>  work_clear_pending
>>
>> //WORK_STRUCT_PENDING is now cleared so queue_work will queue the work
>> (queue_work returns 1).
>>
>>  f(work)
>>
>>
>>
> But we will still be stuck, e.g. if sendpage() returns a retryable
> error other than -EAGAIN.
> If such a thing exists; will have to check.


On your other patch, where you modified the queue_work call in
iscsi_update_cmdsn, I was asking whether, after a sendpage,
iscsi_sw_tcp_write_space is always called when space finally opens up.
If so, iscsi_sw_tcp_write_space will call iscsi_conn_queue_work, and
that should start us up again.

If iscsi_sw_tcp_write_space does not always get called, then we have to
add some code to iscsi_complete_command to wake up the xmit thread when
a command still needs to be sent, and we have to modify iscsi_data_xmit
to do a delayed queue_work when there are no commands in flight.
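
For reference, here is the write-space hook as I read it in iscsi_tcp.c
of this era (a condensed sketch, not a verbatim copy):

static void iscsi_sw_tcp_write_space(struct sock *sk)
{
	struct iscsi_conn *conn = sk->sk_user_data;
	struct iscsi_tcp_conn *tcp_conn = conn->dd_data;
	struct iscsi_sw_tcp_conn *tcp_sw_conn = tcp_conn->dd_data;

	tcp_sw_conn->old_write_space(sk);	/* chain to TCP's default hook */
	iscsi_conn_queue_work(conn);		/* rekick iscsi_data_xmit */
}

As long as the networking core invokes sk->sk_write_space every time
the send buffer drains, this is the wakeup that gets us unstuck after
an -EAGAIN from sendpage.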




Re: [PATCH] I/O stall finally solved

2009-08-18 Thread Mike Christie

On 08/17/2009 10:00 AM, Hannes Reinecke wrote:
> However, I still have the feeling there is potential
> for an I/O stall; this iscsi_check_cmdsn_window_closed() check
> simply doesn't feel safe.
>

Do you still have that test in iscsi_data_xmit or iscsi_xmit_task? If so,
there are the other problems I mentioned on the patch where you modified
iscsi_update_cmdsn's iscsi_conn_queue_work call.

If you just mean the iscsi_check_cmdsn_window_closed call in
iscsi_queuecommand, then I have been hitting problems with fcoe.ko when
it returns SCSI_MLQUEUE_HOST_BUSY. Throughput drops from 1XX MB/s to a
couple of MB/s. I thought there might be a bug in the
SCSI_MLQUEUE_HOST_BUSY handling code.

If you know the window size of your target, you can remove the
iscsi_check_cmdsn_window_closed call and set can_queue to it. Or you can
just set can_queue to 1 command and see if you still hit the same
problem.




Re: [PATCH] I/O stall finally solved

2009-08-17 Thread Hannes Reinecke

Mike Christie wrote:
> On 08/14/2009 07:01 AM, Hannes Reinecke wrote:
>> Hi Mike,
>>
>> finally I've found the mysterious I/O stall I've been chasing
>> for the last two months.
>>
>> Problem is iscsi_conn_queue_work(); we're just calling
>> queue_work() without checking the return value.
>>
>> However, queue_work() will be effectively a no-op when
>> there is some work already running, so we'll be missing
>> quite a few invocations (on my NetApp target I've been
>> counting up to 12 missed invocations ...).
>>
> 
> Huh, I think I am misunderstanding you or I must be misreading workqueue 
> code or probably both :(
> 
> When you say running, do you mean the work fn is running, or that the
> work is queued (WORK_STRUCT_PENDING bit is set but the work fn is not
> yet called)?
> 
> 
> I thought this happens when queue_work returns 1:
> 
> run_workqueue
>   work_clear_pending
>   f(work)
> 
> so if we called queue_work while iscsi_data_xmit was running then 
> queue_work's pending test:
> 
>  if (!test_and_set_bit(WORK_STRUCT_PENDING, work_data_bits(work))) {
>  BUG_ON(!list_empty(&work->entry));
>  __queue_work(wq_per_cpu(wq, cpu), work);
> 
> 
> would actually queue the work to be run. So I thought we could actually 
> be getting extra calls to our work function, because we could call 
> queue_work right after the WORK_STRUCT_PENDING bit cleared but the work 
> fn is not yet running.
> 
> If the queue_work returns 0, then I thought iscsi_data_xmit is not yet 
> running, so when it does run it would handle the task added to the list 
> by the caller of queue_work (we do list_add then queue_work so it should 
> be on the list when the work function would run if the work is pending 
> when we call queue_work).
> 
Yeah, you are right. My assumptions were too simplistic.
(Read: next time I should read the code more thoroughly ...)

However, I still have the feeling there is potential
for an I/O stall; this iscsi_check_cmdsn_window_closed() check
simply doesn't feel safe.

> 
> Hey, for the list_add+queue_work sequence do I need to add some sort of 
> barrier?
> 
Not sure; we might have to, as queue_work potentially runs on another CPU.
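
Thinking about it some more: the pairing is probably safe as-is, since
both sides only touch the lists under the session lock, and
queue_work()'s test_and_set_bit() is a full barrier when it actually
queues the work. A condensed sketch of the pattern (not verbatim
libiscsi):

static void queue_task_sketch(struct iscsi_conn *conn,
			      struct iscsi_task *task)
{
	struct iscsi_session *session = conn->session;

	spin_lock_bh(&session->lock);
	list_add_tail(&task->running, &conn->cmdqueue);
	spin_unlock_bh(&session->lock);

	/*
	 * Either queue_work() sets PENDING (a full barrier) and the
	 * work fn runs later, taking session->lock before it walks
	 * cmdqueue, or PENDING was already set and the still-pending
	 * work fn will take the lock and see the new entry anyway.
	 */
	iscsi_conn_queue_work(conn);
}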

> 
> 
>> In addition to that, iscsi_check_cmdsn_window_closed()
>> doesn't actually check if the cmdsn window has been closed
>> (in the sense that we're not allowed to send any PDUs),
>> but rather that the new PDU _to be queued_ will be rejected.
>> There might still be PDUs in the queue, waiting to be
>> transferred.
>>
>> So if we're hitting queuecommand hard enough we might be running into this
>> scenario:
>>
>> - iscsi_data_xmit:
>>     Transfers last Data-out PDU in requeue list; all queues are empty.
>>     -> xmit_task()
>>        -> unlock session
>>
>> - queuecommand()
>>     being called repeatedly until iscsi_check_cmdsn triggers, giving
>>     the xmit thread no chance to grab the session lock
>>     -> iscsi_conn_queue_work() does nothing as work is already pending
> 
> Yeah, so I thought here, because iscsi_data_xmit is running, we hit this:
> 
> run_workqueue
>   work_clear_pending
> 
> //WORK_STRUCT_PENDING is now cleared so queue_work will queue the work 
> (queue_work returns 1).
> 
>   f(work)
> 
> 
> 
But we will still be stuck, e.g. if sendpage() returns a retryable
error other than -EAGAIN.
If such a thing exists; will have to check.
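
If one does exist, something along these lines (purely hypothetical;
neither the error checks nor the xmitdwork field exist in the driver)
could avoid the stall by backing off and rekicking the xmit work:

static void iscsi_conn_requeue_delayed(struct iscsi_conn *conn, int err)
{
	struct iscsi_host *ihost = shost_priv(conn->session->host);

	/* assumed-retryable cases; -EAGAIN is already covered by the
	 * write_space callback */
	if (err == -ENOBUFS || err == -ENOMEM)
		queue_delayed_work(ihost->workq, &conn->xmitdwork, HZ / 10);
}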

Anyway, hope to have some more details tomorrow.
Running failover tests against a NetApp Filer.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries & Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)




Re: [PATCH] I/O stall finally solved

2009-08-15 Thread Mike Christie

On 08/14/2009 07:01 AM, Hannes Reinecke wrote:
> Hi Mike,
>
> finally I've found the mysterious I/O stall I've been chasing
> for the last two months.
>
> Problem is iscsi_conn_queue_work(); we're just calling
> queue_work() without checking the return value.
>
> However, queue_work() will be effectively a no-op when
> there is some work already running, so we'll be missing
> quite a few invocations (on my NetApp target I've been
> counting up to 12 missed invocations ...).
>

Huh, I think I am misunderstanding you or I must be misreading workqueue 
code or probably both :(

When you say running, do you mean the work fn is running, or that the
work is queued (WORK_STRUCT_PENDING bit is set but the work fn is not
yet called)?


I thought this happens when queue_work returns 1:

run_workqueue
work_clear_pending
f(work)

so if we called queue_work while iscsi_data_xmit was running then 
queue_work's pending test:

 if (!test_and_set_bit(WORK_STRUCT_PENDING, work_data_bits(work))) {
 BUG_ON(!list_empty(&work->entry));
 __queue_work(wq_per_cpu(wq, cpu), work);


would actually queue the work to be run. So I thought we could actually 
be getting extra calls to our work function, because we could call 
queue_work right after the WORK_STRUCT_PENDING bit cleared but the work 
fn is not yet running.

If the queue_work returns 0, then I thought iscsi_data_xmit is not yet 
running, so when it does run it would handle the task added to the list 
by the caller of queue_work (we do list_add then queue_work so it should 
be on the list when the work function would run if the work is pending 
when we call queue_work).
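
To spell out the interleaving I have in mind (a sketch of my reading of
the workqueue code, not authoritative):

/*
 *   CPU0 (worker thread)            CPU1 (queuecommand path)
 *   --------------------            ------------------------
 *   run_workqueue()
 *     work_clear_pending(work)
 *                                   list_add(task) under session lock
 *                                   queue_work() returns 1: PENDING was
 *                                   clear, so the work is queued again
 *     f(work) == iscsi_data_xmit()  (this run may already drain the
 *                                    new task)
 *   run_workqueue()
 *     f(work)                       (extra run; finds the queues empty
 *                                    and just returns -ENODATA)
 *
 * So a queue_work() racing with a running work fn yields an extra
 * invocation, never a lost one, and a return of 0 means the work fn
 * has not started yet and will still see anything list_add()ed before
 * the call.
 */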


Hey, for the list_add+queue_work sequence do I need to add some sort of 
barrier?



> In addition to that, iscsi_check_cmdsn_window_closed()
> doesn't actually check if the cmdsn window has been closed
> (in the sense that we're not allowed to send any PDUs),
> but rather that the new PDU _to be queued_ will be rejected.
> There might still be PDUs in the queue, waiting to be
> transferred.
>
> So if we're hitting queuecommand hard enough we might be running into this
> scenario:
>
> - iscsi_data_xmit:
>     Transfers last Data-out PDU in requeue list; all queues are empty.
>     -> xmit_task()
>        -> unlock session
>
> - queuecommand()
>     being called repeatedly until iscsi_check_cmdsn triggers, giving
>     the xmit thread no chance to grab the session lock
>     -> iscsi_conn_queue_work() does nothing as work is already pending

Yeah, so I thought here, because iscsi_data_xmit is running, we hit this:

run_workqueue
work_clear_pending

//WORK_STRUCT_PENDING is now cleared so queue_work will queue the work 
(queue_work returns 1).

f(work)



>
> - iscsi_data_xmit:
>        -> locks session
>     -> returns from xmit_task()
>     -> end of iscsi_data_xmit()
>
> -> I/O stall.
>
> So we really should check if we've missed some queue_work calls,
> and restart iscsi_data_xmit() if so.
> Proposed patch is attached.
>




[PATCH] I/O stall finally solved

2009-08-14 Thread Hannes Reinecke
Hi Mike,

finally I've found the mysterious I/O stall I've been chasing
for the last two months.

Problem is iscsi_conn_queue_work(); we're just calling
queue_work() without checking the return value.

However, queue_work() will be effectively a no-op when
there is some work already running, so we'll be missing
quite a few invocations (on my NetApp target I've been
counting up to 12 missed invocations ...).

In addition to that, iscsi_check_cmdsn_window_closed()
doesn't actually check if the cmdsn window has been closed
(in the sense that we're not allowed to send any PDUs),
but rather that the new PDU _to be queued_ will be rejected.
There might still be PDUs in the queue, waiting to be
transferred.

So if we're hitting queuecommand hard enough we might be running into this
scenario:

- iscsi_data_xmit:
    Transfers last Data-out PDU in requeue list; all queues are empty.
    -> xmit_task()
       -> unlock session

- queuecommand()
    being called repeatedly until iscsi_check_cmdsn triggers, giving
    the xmit thread no chance to grab the session lock
    -> iscsi_conn_queue_work() does nothing as work is already pending

- iscsi_data_xmit:
       -> locks session
    -> returns from xmit_task()
    -> end of iscsi_data_xmit()

-> I/O stall.

So we really should check if we've missed some queue_work calls,
and restart iscsi_data_xmit() if so.
Proposed patch is attached.

Note that in theory we might loop here when ->xmit_task() returns
an error. But this loop should terminate: we only go around again when
someone has called queue_work in the meantime, which means the lists
have changed and we can make progress.

Comments etc welcome.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries & Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)


From 1ab84cdf687ad989a807e0e01fa136601fa3ac2b Mon Sep 17 00:00:00 2001
From: Hannes Reinecke 
Date: Fri, 14 Aug 2009 13:50:15 +0200
Subject: [PATCH] libiscsi: I/O stall under heavy load

When the system is under load we might miss quite a few invocations
of queue_work(). So we need to check at the end of iscsi_data_xmit()
whether we have missed some and, if there is still data in the queue,
restart ourselves.

Signed-off-by: Hannes Reinecke 
---
 drivers/scsi/libiscsi.c |   16 +++++++++++-----
 include/scsi/libiscsi.h |    2 ++
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
index 0356e3d..75870dd 100644
--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -104,8 +104,9 @@ inline void iscsi_conn_queue_work(struct iscsi_conn *conn)
 	struct Scsi_Host *shost = conn->session->host;
 	struct iscsi_host *ihost = shost_priv(shost);
 
-	if (ihost->workq)
-		queue_work(ihost->workq, &conn->xmitwork);
+	if (ihost->workq &&
+	!queue_work(ihost->workq, &conn->xmitwork))
+		set_bit(ISCSI_WORKQUEUE_BIT, &conn->suspend_tx);
 }
 EXPORT_SYMBOL_GPL(iscsi_conn_queue_work);
 
@@ -1441,6 +1442,7 @@ static int iscsi_data_xmit(struct iscsi_conn *conn)
 	int rc = 0;
 
 	spin_lock_bh(&conn->session->lock);
+	clear_bit(ISCSI_WORKQUEUE_BIT, &conn->suspend_tx);
 	if (test_bit(ISCSI_SUSPEND_BIT, &conn->suspend_tx)) {
 		ISCSI_DBG_SESSION(conn->session, "Tx suspended!\n");
 		spin_unlock_bh(&conn->session->lock);
@@ -1527,6 +1529,10 @@ check_mgmt:
 		if (!list_empty(&conn->mgmtqueue))
 			goto check_mgmt;
 	}
+	/* Check if we've missed workqueue invocations */
+	if (test_bit(ISCSI_WORKQUEUE_BIT, &conn->suspend_tx) &&
+	(!list_empty(&conn->cmdqueue) || !list_empty(&conn->requeue)) )
+		goto check_mgmt;
 	spin_unlock_bh(&conn->session->lock);
 	return -ENODATA;
 
@@ -1895,9 +1901,9 @@ void iscsi_suspend_tx(struct iscsi_conn *conn)
 	struct Scsi_Host *shost = conn->session->host;
 	struct iscsi_host *ihost = shost_priv(shost);
 
+	set_bit(ISCSI_SUSPEND_BIT, &conn->suspend_tx);
 	if (ihost->workq)
 		flush_workqueue(ihost->workq);
-	set_bit(ISCSI_SUSPEND_BIT, &conn->suspend_tx);
 }
 EXPORT_SYMBOL_GPL(iscsi_suspend_tx);
 
@@ -1956,6 +1962,7 @@ static enum blk_eh_timer_return iscsi_eh_cmd_timed_out(struct scsi_cmnd *sc)
 	task = (struct iscsi_task *)sc->SCp.ptr;
 	if (!task)
 		goto done;
+#if 0
 	/*
 	 * If we have sent (at least queued to the network layer) a pdu or
 	 * recvd one for the task since the last timeout ask for
@@ -1972,7 +1979,7 @@ static enum blk_eh_timer_return iscsi_eh_cmd_timed_out(struct scsi_cmnd *sc)
 		rc = BLK_EH_RESET_TIMER;
 		goto done;
 	}
-
+#endif
 	if (!conn->recv_timeout && !conn->