Re: [PATCH] I/O stall finally solved
On 08/17/2009 10:00 AM, Hannes Reinecke wrote:
> Mike Christie wrote:
>> On 08/14/2009 07:01 AM, Hannes Reinecke wrote:
>>> Hi Mike,
>>>
>>> finally I've found the mysterious I/O stall I've been chasing
>>> for the last two months.
>>>
>>> Problem is iscsi_conn_queue_work(); we're just calling
>>> queue_work() without checking the return value.
>>>
>>> However, queue_work() will be effectively a no-op when
>>> there is some work already running, so we'll be missing
>>> quite a few invocations (on my NetApp target I've been
>>> counting up to 12 missed invocations ...).
>>>
>> Huh, I think I am misunderstanding you or I must be misreading workqueue
>> code or probably both :(
>>
>> When you say running, do you mean the work fn is running or that the
>> work is queued (WORK_STRUCT_PENDING bit is set but the work fn is not
>> yet called)?
>>
>> I thought this happens when queue_work returns 1:
>>
>> run_workqueue
>>   work_clear_pending
>>   f(work)
>>
>> so if we called queue_work while iscsi_data_xmit was running then
>> queue_work's pending test:
>>
>> if (!test_and_set_bit(WORK_STRUCT_PENDING, work_data_bits(work))) {
>>         BUG_ON(!list_empty(&work->entry));
>>         __queue_work(wq_per_cpu(wq, cpu), work);
>>
>> would actually queue the work to be run. So I thought we could actually
>> be getting extra calls to our work function, because we could call
>> queue_work right after the WORK_STRUCT_PENDING bit cleared but the work
>> fn is not yet running.
>>
>> If queue_work returns 0, then I thought iscsi_data_xmit is not yet
>> running, so when it does run it will handle the task added to the list
>> by the caller of queue_work (we do list_add then queue_work, so it
>> should be on the list when the work function runs if the work is
>> pending when we call queue_work).
>>
> Yeah, you are right. My assumptions were too simplistic.
> (Read: next time I should read the code more thoroughly ...)
>
> However, I still have the feeling there is still potential
> for an I/O stall; this iscsi_check_cmdsn_window_closed() check
> simply doesn't feel safe.

The NetApp target you are using has a fixed window for your test, right?
You could just set can_queue to that window and see if the problem occurs.

>> Hey, for the list_add+queue_work sequence do I need to add some sort of
>> barrier?
>>
> Not sure; might have to, as queue_work is potentially run on another CPU.
>
>>
>>> In addition to that, iscsi_check_cmdsn_window_closed()
>>> doesn't actually check if the cmdsn window has been closed
>>> (in the sense that we're not allowed to send any PDUs),
>>> but rather that the new PDU _to be queued_ will be rejected.
>>> There might still be PDUs in the queue, waiting to be
>>> transferred.
>>>
>>> So if we're hitting queuecommand hard enough we might be running into
>>> this scenario:
>>>
>>> - iscsi_data_xmit:
>>>   Transfers last Data-out PDU in requeue list; all queues are empty.
>>>   -> xmit_task()
>>>   -> unlock session
>>>
>>> - queuecommand()
>>>   being called repeatedly until iscsi_check_cmdsn triggers, giving the
>>>   xmit thread no chance to grab the session lock
>>>   -> iscsi_conn_queue_work() does nothing as work is already pending
>>
>> Yeah, so I thought here, because iscsi_data_xmit is running, we hit this:
>>
>> run_workqueue
>>   work_clear_pending
>>
>>   // WORK_STRUCT_PENDING is now cleared, so queue_work will queue the
>>   // work (queue_work returns 1).
>>
>>   f(work)
>>
> But we will still be stuck, e.g. if sendpage() returns a retryable
> error other than -EAGAIN.
> If such a thing exists. Will have to check.

I was asking on your other patch, where you modified the queue_work call
in iscsi_update_cmdsn: when the sendpage happens, will
iscsi_sw_tcp_write_space always be called when space finally opens up?
If so, then iscsi_sw_tcp_write_space will call iscsi_conn_queue_work and
that should start us up again.
If iscsi_sw_tcp_write_space does not always get called, then we have to
add some code to iscsi_complete_command to wake up the xmit thread when a
command still needs to be sent, and we have to modify iscsi_data_xmit to
do a delayed queue_work when there are no commands in flight.

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups "open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to open-iscsi+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---
Re: [PATCH] I/O stall finally solved
On 08/17/2009 10:00 AM, Hannes Reinecke wrote:
> However, I still have the feeling there is still potential
> for an I/O stall; this iscsi_check_cmdsn_window_closed() check
> simply doesn't feel safe.
>

Do you still have that test in iscsi_data_xmit or iscsi_xmit_task? If so,
there are the other problems I mentioned in the patch where you modified
iscsi_update_cmdsn's iscsi_conn_queue_work call.

If you just mean the iscsi_check_cmdsn_window_closed call in
iscsi_queuecommand, then I have been hitting problems with fcoe.ko when
it returns SCSI_MLQUEUE_HOST_BUSY. The throughput drops from 1XX MB/s to
a couple MB/s. I thought there might be a bug in the
SCSI_MLQUEUE_HOST_BUSY handling code.

If you know the window size of your target, you can remove the
iscsi_check_cmdsn_window_closed call and set can_queue to it. Or you can
also just set can_queue to 1 command and see if you still hit the same
problem.
Re: [PATCH] I/O stall finally solved
Mike Christie wrote:
> On 08/14/2009 07:01 AM, Hannes Reinecke wrote:
>> Hi Mike,
>>
>> finally I've found the mysterious I/O stall I've been chasing
>> for the last two months.
>>
>> Problem is iscsi_conn_queue_work(); we're just calling
>> queue_work() without checking the return value.
>>
>> However, queue_work() will be effectively a no-op when
>> there is some work already running, so we'll be missing
>> quite a few invocations (on my NetApp target I've been
>> counting up to 12 missed invocations ...).
>>
> Huh, I think I am misunderstanding you or I must be misreading workqueue
> code or probably both :(
>
> When you say running, do you mean the work fn is running or that the
> work is queued (WORK_STRUCT_PENDING bit is set but the work fn is not
> yet called)?
>
> I thought this happens when queue_work returns 1:
>
> run_workqueue
>   work_clear_pending
>   f(work)
>
> so if we called queue_work while iscsi_data_xmit was running then
> queue_work's pending test:
>
> if (!test_and_set_bit(WORK_STRUCT_PENDING, work_data_bits(work))) {
>         BUG_ON(!list_empty(&work->entry));
>         __queue_work(wq_per_cpu(wq, cpu), work);
>
> would actually queue the work to be run. So I thought we could actually
> be getting extra calls to our work function, because we could call
> queue_work right after the WORK_STRUCT_PENDING bit cleared but the work
> fn is not yet running.
>
> If queue_work returns 0, then I thought iscsi_data_xmit is not yet
> running, so when it does run it will handle the task added to the list
> by the caller of queue_work (we do list_add then queue_work, so it
> should be on the list when the work function runs if the work is
> pending when we call queue_work).
>

Yeah, you are right. My assumptions were too simplistic.
(Read: next time I should read the code more thoroughly ...)

However, I still have the feeling there is still potential
for an I/O stall; this iscsi_check_cmdsn_window_closed() check
simply doesn't feel safe.
>
> Hey, for the list_add+queue_work sequence do I need to add some sort of
> barrier?
>

Not sure; might have to, as queue_work is potentially run on another CPU.

>> In addition to that, iscsi_check_cmdsn_window_closed()
>> doesn't actually check if the cmdsn window has been closed
>> (in the sense that we're not allowed to send any PDUs),
>> but rather that the new PDU _to be queued_ will be rejected.
>> There might still be PDUs in the queue, waiting to be
>> transferred.
>>
>> So if we're hitting queuecommand hard enough we might be running into
>> this scenario:
>>
>> - iscsi_data_xmit:
>>   Transfers last Data-out PDU in requeue list; all queues are empty.
>>   -> xmit_task()
>>   -> unlock session
>>
>> - queuecommand()
>>   being called repeatedly until iscsi_check_cmdsn triggers, giving the
>>   xmit thread no chance to grab the session lock
>>   -> iscsi_conn_queue_work() does nothing as work is already pending
>
> Yeah, so I thought here, because iscsi_data_xmit is running, we hit this:
>
> run_workqueue
>   work_clear_pending
>
>   // WORK_STRUCT_PENDING is now cleared, so queue_work will queue the
>   // work (queue_work returns 1).
>
>   f(work)
>

But we will still be stuck, e.g. if sendpage() returns a retryable error
other than -EAGAIN. If such a thing exists. Will have to check.

Anyway, hope to have some more details tomorrow. Running failover tests
against a NetApp Filer.

Cheers,

Hannes
--
Dr. Hannes Reinecke                   zSeries & Storage
h...@suse.de                          +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)
Re: [PATCH] I/O stall finally solved
On 08/14/2009 07:01 AM, Hannes Reinecke wrote:
> Hi Mike,
>
> finally I've found the mysterious I/O stall I've been chasing
> for the last two months.
>
> Problem is iscsi_conn_queue_work(); we're just calling
> queue_work() without checking the return value.
>
> However, queue_work() will be effectively a no-op when
> there is some work already running, so we'll be missing
> quite a few invocations (on my NetApp target I've been
> counting up to 12 missed invocations ...).
>

Huh, I think I am misunderstanding you or I must be misreading workqueue
code or probably both :(

When you say running, do you mean the work fn is running or that the
work is queued (WORK_STRUCT_PENDING bit is set but the work fn is not
yet called)?

I thought this happens when queue_work returns 1:

run_workqueue
  work_clear_pending
  f(work)

so if we called queue_work while iscsi_data_xmit was running then
queue_work's pending test:

if (!test_and_set_bit(WORK_STRUCT_PENDING, work_data_bits(work))) {
        BUG_ON(!list_empty(&work->entry));
        __queue_work(wq_per_cpu(wq, cpu), work);

would actually queue the work to be run. So I thought we could actually
be getting extra calls to our work function, because we could call
queue_work right after the WORK_STRUCT_PENDING bit cleared but the work
fn is not yet running.

If queue_work returns 0, then I thought iscsi_data_xmit is not yet
running, so when it does run it will handle the task added to the list
by the caller of queue_work (we do list_add then queue_work, so it
should be on the list when the work function runs if the work is pending
when we call queue_work).

Hey, for the list_add+queue_work sequence do I need to add some sort of
barrier?

> In addition to that, iscsi_check_cmdsn_window_closed()
> doesn't actually check if the cmdsn window has been closed
> (in the sense that we're not allowed to send any PDUs),
> but rather that the new PDU _to be queued_ will be rejected.
> There might still be PDUs in the queue, waiting to be
> transferred.
>
> So if we're hitting queuecommand hard enough we might be running into
> this scenario:
>
> - iscsi_data_xmit:
>   Transfers last Data-out PDU in requeue list; all queues are empty.
>   -> xmit_task()
>   -> unlock session
>
> - queuecommand()
>   being called repeatedly until iscsi_check_cmdsn triggers, giving the
>   xmit thread no chance to grab the session lock
>   -> iscsi_conn_queue_work() does nothing as work is already pending

Yeah, so I thought here, because iscsi_data_xmit is running, we hit this:

run_workqueue
  work_clear_pending

  // WORK_STRUCT_PENDING is now cleared, so queue_work will queue the
  // work (queue_work returns 1).

  f(work)

> - iscsi_data_xmit:
>   -> locks session
> - returns from xmit_task()
>   -> end of iscsi_data_xmit()
>
> -> I/O stall.
>
> So we really should check if we've missed some queue_work calls,
> and restart iscsi_data_xmit() if so.
> Proposed patch is attached.
>
[PATCH] I/O stall finally solved
Hi Mike,

finally I've found the mysterious I/O stall I've been chasing
for the last two months.

Problem is iscsi_conn_queue_work(); we're just calling
queue_work() without checking the return value.

However, queue_work() will be effectively a no-op when
there is some work already running, so we'll be missing
quite a few invocations (on my NetApp target I've been
counting up to 12 missed invocations ...).

In addition to that, iscsi_check_cmdsn_window_closed()
doesn't actually check if the cmdsn window has been closed
(in the sense that we're not allowed to send any PDUs),
but rather that the new PDU _to be queued_ will be rejected.
There might still be PDUs in the queue, waiting to be
transferred.

So if we're hitting queuecommand hard enough we might be running into
this scenario:

- iscsi_data_xmit:
  Transfers last Data-out PDU in requeue list; all queues are empty.
  -> xmit_task()
  -> unlock session

- queuecommand()
  being called repeatedly until iscsi_check_cmdsn triggers, giving the
  xmit thread no chance to grab the session lock
  -> iscsi_conn_queue_work() does nothing as work is already pending

- iscsi_data_xmit:
  -> locks session
  -> returns from xmit_task()
  -> end of iscsi_data_xmit()

-> I/O stall.

So we really should check if we've missed some queue_work calls,
and restart iscsi_data_xmit() if so. Proposed patch is attached.

Note that in theory we might loop here when ->xmit_task() returns an
error. But this loop should be closed, as the lists will have been
changed by whoever called queue_work in the meantime.

Comments etc. welcome.

Cheers,

Hannes
--
Dr. Hannes Reinecke                   zSeries & Storage
h...@suse.de                          +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)
>From 1ab84cdf687ad989a807e0e01fa136601fa3ac2b Mon Sep 17 00:00:00 2001
From: Hannes Reinecke
Date: Fri, 14 Aug 2009 13:50:15 +0200
Subject: [PATCH] libiscsi: I/O stall under heavy load

When the system is under load we might be missing quite some invocations
to queue_work(). So we need to check at the end of iscsi_data_xmit() if
we've missed some, and, given that there's still some data in the queue,
restart ourselves.

Signed-off-by: Hannes Reinecke
---
 drivers/scsi/libiscsi.c | 16 +++-
 include/scsi/libiscsi.h |  2 ++
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
index 0356e3d..75870dd 100644
--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -104,8 +104,9 @@ inline void iscsi_conn_queue_work(struct iscsi_conn *conn)
 	struct Scsi_Host *shost = conn->session->host;
 	struct iscsi_host *ihost = shost_priv(shost);
 
-	if (ihost->workq)
-		queue_work(ihost->workq, &conn->xmitwork);
+	if (ihost->workq &&
+	    !queue_work(ihost->workq, &conn->xmitwork))
+		set_bit(ISCSI_WORKQUEUE_BIT, &conn->suspend_tx);
 }
 EXPORT_SYMBOL_GPL(iscsi_conn_queue_work);
 
@@ -1441,6 +1442,7 @@ static int iscsi_data_xmit(struct iscsi_conn *conn)
 	int rc = 0;
 
 	spin_lock_bh(&conn->session->lock);
+	clear_bit(ISCSI_WORKQUEUE_BIT, &conn->suspend_tx);
 	if (test_bit(ISCSI_SUSPEND_BIT, &conn->suspend_tx)) {
 		ISCSI_DBG_SESSION(conn->session, "Tx suspended!\n");
 		spin_unlock_bh(&conn->session->lock);
@@ -1527,6 +1529,10 @@ check_mgmt:
 		if (!list_empty(&conn->mgmtqueue))
 			goto check_mgmt;
 	}
+	/* Check if we've missed workqueue invocations */
+	if (test_bit(ISCSI_WORKQUEUE_BIT, &conn->suspend_tx) &&
+	    (!list_empty(&conn->cmdqueue) || !list_empty(&conn->requeue)))
+		goto check_mgmt;
 	spin_unlock_bh(&conn->session->lock);
 	return -ENODATA;
 
@@ -1895,9 +1901,9 @@ void iscsi_suspend_tx(struct iscsi_conn *conn)
 	struct Scsi_Host *shost = conn->session->host;
 	struct iscsi_host *ihost = shost_priv(shost);
 
+	set_bit(ISCSI_SUSPEND_BIT, &conn->suspend_tx);
 	if (ihost->workq)
 		flush_workqueue(ihost->workq);
-	set_bit(ISCSI_SUSPEND_BIT, &conn->suspend_tx);
 }
 EXPORT_SYMBOL_GPL(iscsi_suspend_tx);
 
@@ -1956,6 +1962,7 @@ static enum blk_eh_timer_return iscsi_eh_cmd_timed_out(struct scsi_cmnd *sc)
 	task = (struct iscsi_task *)sc->SCp.ptr;
 	if (!task)
 		goto done;
+#if 0
 	/*
 	 * If we have sent (at least queued to the network layer) a pdu or
 	 * recvd one for the task since the last timeout ask for
@@ -1972,7 +1979,7 @@ static enum blk_eh_timer_return iscsi_eh_cmd_timed_out(struct scsi_cmnd *sc)
 		rc = BLK_EH_RESET_TIMER;
 		goto done;
 	}
-
+#endif
 	if (!conn->recv_timeout && !conn->