Re: [PATCH] libiscsi: Check for CmdSN window before sending data
Mike Christie wrote: On 07/23/2009 05:20 AM, Boaz Harrosh wrote: On 07/23/2009 12:24 PM, Mike Christie wrote: But you still might be hitting a problem where the target does not like data-outs when it closed the window. Maybe they interpreted the RFC differently. You should ask the HP target guys for more info. Also your patch might be working because I think it ends up throttling the connection, so IO does not timeout because pipes are backed up (the slow down from the throttling is one of the problems we hit with the patch I did before which was pretty much the same as you posted). I think that what Hannes patch is doing is exactly that off-by-one command. So maybe you are right and it is an HP bug where the window check is off-by-one or they do not like data-outs after window close. Try to compare less-one in queuecommand and see if it helps the same? Hannes, did you try this? I am attaching a patch that should give you this behavior. It should do the same as checking the session-cmdsn in iscsi_task_xmit where the cmdsn is already incremented so we end up not sending a task with cmdsn == maxcmdsn. I talked to the msa2000 target guys and they said they allow a cmdsn with maxcmdsn, so that patch should not help. Did it do anything? One thing they mentioned was that the target can send QUEUE_FULLs. The iscsi initiator does not really do anything with these now. Other FC drivers use track queue full. I am not sure how the queue fulls would cause a IO hang. Maybe because we are hitting too hard, something nasty ends up happening. Maybe the requeueing logic is bad in the ml. I did the attached patch to print out if we get a queue full. If so we call the track queue full function like fc. The problem is that it starts reducing the queue depth from the queue_depth. If sdev_busy was less than queue_depth then it will take some time to ramp down, so I am not sure if this will be that helpful. At least we can see if we are getting q fulls from the target - let me know at least if we are seeing those. --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups open-iscsi group. To post to this group, send email to open-iscsi@googlegroups.com To unsubscribe from this group, send email to open-iscsi+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/open-iscsi -~--~~~~--~~--~--~--- diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c index a751f62..6a21755 100644 --- a/drivers/scsi/libiscsi.c +++ b/drivers/scsi/libiscsi.c @@ -738,6 +738,25 @@ invalid_datalen: ISCSI_DBG_SESSION(session, copied %d bytes of sense\n, min_t(uint16_t, senselen, SCSI_SENSE_BUFFERSIZE)); + } else if (rhdr-cmd_status == SAM_STAT_TASK_SET_FULL) { + struct scsi_device *tmp_sdev; + struct scsi_device *sdev = sc-device; + + iscsi_session_printk(KERN_ERR, session, Got QUEUE FULL %u %u +%u %u\n, session-max_cmdsn, +session-exp_cmdsn, session-cmdsn, +session-queued_cmdsn); + + + shost_for_each_device(tmp_sdev, sdev-host) { + if (tmp_sdev-id != sdev-id) +continue; + + if (tmp_sdev-queue_depth 1) { +scsi_track_queue_full(tmp_sdev, + tmp_sdev-queue_depth - 1); + } + } } if (rhdr-flags (ISCSI_FLAG_CMD_BIDI_UNDERFLOW |
Re: [PATCH] libiscsi: Check for CmdSN window before sending data
Hey Hannes, Do you only need this patch with your cmdsn patch? The code below is a leftover from when I had the cmdsn check in iscsi_datra_xmit. I am not sure why we would need to wake the xmit thread. The reasons I have: 1. queuecommand or __iscsi_conn_send_pdu adds new task. 2. sendpage returns -EAGAIN. For this iscsi_tcp should call wake the xmit thread from write_space right? Is there another error sendpage could throw where write_space is not called like maybe a connection problem? 3. xmit thread was suspended for eh purposes. With your cmdsn patch 4. If you return from iscsi_data_xmit because you think the window is closed. Did you try the queued_cmdsn lt vs lte patch, or heard back from the hp guys about their handling of data-outs when the window is closed btw? Hannes Reinecke wrote: There is another issue leading to an I/O stall: when there is a command pending to be retried, ie conn-task is set after we exit xmitworker(), the next incoming PDU will not trigger the xmit thread as the queues might be empty, although a command is still pending. Try this: diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c index 716cc34..4493cd4 100644 --- a/drivers/scsi/libiscsi.c +++ b/drivers/scsi/libiscsi.c @@ -133,9 +133,7 @@ iscsi_update_cmdsn(struct iscsi_session *session, struct iscsi_ nopin *hdr) * if the window closed with IO queued, then kick the * xmit thread */ - if (!list_empty(session-leadconn-cmdqueue) || - !list_empty(session-leadconn-mgmtqueue)) - iscsi_conn_queue_work(session-leadconn); + iscsi_conn_queue_work(session-leadconn); } } EXPORT_SYMBOL_GPL(iscsi_update_cmdsn); Cheers, Hannes --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups open-iscsi group. To post to this group, send email to open-iscsi@googlegroups.com To unsubscribe from this group, send email to open-iscsi+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/open-iscsi -~--~~~~--~~--~--~---
Re: [PATCH] libiscsi: Check for CmdSN window before sending data
Mike Christie wrote: Hey Hannes, Do you only need this patch with your cmdsn patch? The code below is a leftover from when I had the cmdsn check in iscsi_datra_xmit. I am not sure why we would need to wake the xmit thread. The reasons I have: 1. queuecommand or __iscsi_conn_send_pdu adds new task. 2. sendpage returns -EAGAIN. For this iscsi_tcp should call wake the xmit thread from write_space right? Is there another error sendpage could throw where write_space is not called like maybe a connection problem? 3. xmit thread was suspended for eh purposes. Oh yeah, if it is somethimg like above, then the window can be open or closed or there might not even be any commands in flight, so maybe we should add a queue_work in iscsi_complete_pdu and a add a queue_delayed in iscsi_data_xmit if there is no commands in flight. But let me know when we are hitting it so we can figure out how to best handle it. With your cmdsn patch 4. If you return from iscsi_data_xmit because you think the window is closed. Did you try the queued_cmdsn lt vs lte patch, or heard back from the hp guys about their handling of data-outs when the window is closed btw? Hannes Reinecke wrote: There is another issue leading to an I/O stall: when there is a command pending to be retried, ie conn-task is set after we exit xmitworker(), the next incoming PDU will not trigger the xmit thread as the queues might be empty, although a command is still pending. Try this: diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c index 716cc34..4493cd4 100644 --- a/drivers/scsi/libiscsi.c +++ b/drivers/scsi/libiscsi.c @@ -133,9 +133,7 @@ iscsi_update_cmdsn(struct iscsi_session *session, struct iscsi_ nopin *hdr) * if the window closed with IO queued, then kick the * xmit thread */ - if (!list_empty(session-leadconn-cmdqueue) || - !list_empty(session-leadconn-mgmtqueue)) - iscsi_conn_queue_work(session-leadconn); + iscsi_conn_queue_work(session-leadconn); } } EXPORT_SYMBOL_GPL(iscsi_update_cmdsn); Cheers, Hannes --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups open-iscsi group. To post to this group, send email to open-iscsi@googlegroups.com To unsubscribe from this group, send email to open-iscsi+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/open-iscsi -~--~~~~--~~--~--~---
Re: [PATCH] libiscsi: Check for CmdSN window before sending data
On 07/23/2009 05:20 AM, Boaz Harrosh wrote: On 07/23/2009 12:24 PM, Mike Christie wrote: But you still might be hitting a problem where the target does not like data-outs when it closed the window. Maybe they interpreted the RFC differently. You should ask the HP target guys for more info. Also your patch might be working because I think it ends up throttling the connection, so IO does not timeout because pipes are backed up (the slow down from the throttling is one of the problems we hit with the patch I did before which was pretty much the same as you posted). I think that what Hannes patch is doing is exactly that off-by-one command. So maybe you are right and it is an HP bug where the window check is off-by-one or they do not like data-outs after window close. Try to compare less-one in queuecommand and see if it helps the same? Hannes, did you try this? I am attaching a patch that should give you this behavior. It should do the same as checking the session-cmdsn in iscsi_task_xmit where the cmdsn is already incremented so we end up not sending a task with cmdsn == maxcmdsn. --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups open-iscsi group. To post to this group, send email to open-iscsi@googlegroups.com To unsubscribe from this group, send email to open-iscsi+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/open-iscsi -~--~~~~--~~--~--~--- diff --git a/drivers/message/fusion/mptsas.c b/drivers/message/fusion/mptsas.c index 55ff252..7324aa5 100644 --- a/drivers/message/fusion/mptsas.c +++ b/drivers/message/fusion/mptsas.c diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c index 98df30b..2d4ee28 100644 --- a/drivers/scsi/libiscsi.c +++ b/drivers/scsi/libiscsi.c @@ -1265,7 +1271,7 @@ static int iscsi_check_cmdsn_window_closed(struct iscsi_conn *conn) /* * Check for iSCSI window and take care of CmdSN wrap-around */ - if (!iscsi_sna_lte(session-queued_cmdsn, session-max_cmdsn)) { + if (!iscsi_sna_lt(session-queued_cmdsn, session-max_cmdsn)) { ISCSI_DBG_SESSION(session, iSCSI CmdSN closed. ExpCmdSn %u MaxCmdSN %u CmdSN %u/%u\n, session-exp_cmdsn, session-max_cmdsn,
Re: [PATCH] libiscsi: Check for CmdSN window before sending data
Hannes Reinecke wrote: Mike Christie wrote: On 07/22/2009 12:00 PM, Mike Christie wrote: No, wrong. The check in queuecommand is by no means relevant to the actual window. We're checking the target window at the time queuecommand is run, but we're _generating_ the CmdSN only much later after we've dequeued the command. The check in queuecommand is for scsi cmd pdus and checks the session-queued_cmdsn against the MaxCmdSn. The queued_cmdsn is just a preallocation of CmdSn values. So if the scsi layer sends us X commands, the initiator checks if the window has X spaces open. If it does, then we add the task to the cmdqueue, and increase queued_cmdsn. This is supposed to prevent the scsi layer from sending us too many commands and them sitting in the driver cmdqueue and timing out. The cmdsn is then allocated when we put the cmd on the wire. So the Maybe this is causing some confusion. We do not increment the cmdsn value for immediate commands (how we send a tmf or nop) and for data-out pdus. We only increment it for scsi cmd pdus, so the only place I check for the window having space is in queuecommand. If you are worried about a nop or data out increasing the cmdsn value in one thread and a scsi cmd pdu increasing it in another, then it will not happen. We only this one path to worry about. If it were so easy. I have this patch: diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c index 716cc34..90905f2 100644 --- a/drivers/scsi/libiscsi.c +++ b/drivers/scsi/libiscsi.c @@ -1212,6 +1212,21 @@ static int iscsi_xmit_task(struct iscsi_conn *conn) struct iscsi_task *task = conn-task; int rc; + /* Check for target CmdSN window */ + if (conn-session-state == ISCSI_STATE_LOGGED_IN + iscsi_sna_lt(conn-session-max_cmdsn, +conn-session-cmdsn)) { + iscsi_conn_printk(KERN_INFO, conn, + target window closed cmd %u max %u, + conn-session-cmdsn, + conn-session-max_cmdsn); + +#if 0 + /* Window closed, wait for CmdSN update */ + return -EPROTO; +#endif + } + __iscsi_get_task(task); spin_unlock_bh(conn-session-lock); rc = conn-session-tt-xmit_task(task); against the current linux-2.6-iscsi tree. When running my test harness (10 concurrent bonnie runs over 10 iscsi LUNs), I'm seeing these messages: connection2:0: target window closed cmd 61225 max 61224 connection2:0: target window closed cmd 61258 max 61257 connection2:0: target window closed cmd 61291 max 61290 connection2:0: target window closed cmd 61324 max 61323 connection2:0: target window closed cmd 61357 max 61356 connection2:0: target window closed cmd 61413 max 61412 connection1:0: target window closed cmd 61321 max 61320 connection1:0: target window closed cmd 61361 max 61360 connection1:0: target window closed cmd 61401 max 61400 connection1:0: target window closed cmd 61449 max 61448 connection1:0: target window closed cmd 61482 max 61481 connection1:0: target window closed cmd 61531 max 61530 connection1:0: target window closed cmd 61579 max 61578 connection1:0: target window closed cmd 61626 max 61625 connection1:0: target window closed cmd 61674 max 61673 connection1:0: target window closed cmd 61717 max 61716 connection1:0: target window closed cmd 61754 max 61753 connection1:0: target window closed cmd 61802 max 61801 connection1:0: target window closed cmd 61837 max 61836 connection1:0: target window closed cmd 61874 max 61873 connection2:0: target window closed cmd 62205 max 62204 connection2:0: target window closed cmd 62326 max 62325 connection2:0: target window closed cmd 62390 max 62389 connection2:0: target window closed cmd 62486 max 62485 connection2:0: target window closed cmd 62522 max 62521 connection2:0: target window closed cmd 62686 max 62685 connection2:0: target window closed cmd 62813 max 62812 connection2:0: target window closed cmd 62881 max 62880 connection2:0: target window closed cmd 62914 max 62913 So there _is_ a race window. It might be that my This is expected like I said in the other mail, because before iscsi_xmit_task is called the session-cmdsn is incremented by iscsi_prep_scsi_cmd_pdu when sending scsi cmd pdus. For data-outs and nops, then it will probably be higher too, but like I said in the other mail that is ok. iscsi_data_xmit-iscsi_prep_scsi_cmd_pdu iscsi_prep_scsi_cmd_pdu() { hdr-cmdsn = session-cmdsn; session-cmdsn++; } Then we call iscsi_xmit_task() and your check is spitting out and testing the new/updated session-cmdsn. The scsi cmd pdu that gets sent out is going to have a cmdsn that is less than or equal to maxcmdsn. Try printing out the task-cmdsn for scsi cmd pdus. --~--~-~--~~~---~--~~ You
Re: [PATCH] libiscsi: Check for CmdSN window before sending data
Mike Christie wrote: Hannes Reinecke wrote: Mike Christie wrote: On 07/22/2009 12:00 PM, Mike Christie wrote: No, wrong. The check in queuecommand is by no means relevant to the actual window. We're checking the target window at the time queuecommand is run, but we're _generating_ the CmdSN only much later after we've dequeued the command. The check in queuecommand is for scsi cmd pdus and checks the session-queued_cmdsn against the MaxCmdSn. The queued_cmdsn is just a preallocation of CmdSn values. So if the scsi layer sends us X commands, the initiator checks if the window has X spaces open. If it does, then we add the task to the cmdqueue, and increase queued_cmdsn. This is supposed to prevent the scsi layer from sending us too many commands and them sitting in the driver cmdqueue and timing out. The cmdsn is then allocated when we put the cmd on the wire. So the Maybe this is causing some confusion. We do not increment the cmdsn value for immediate commands (how we send a tmf or nop) and for data-out pdus. We only increment it for scsi cmd pdus, so the only place I check for the window having space is in queuecommand. If you are worried about a nop or data out increasing the cmdsn value in one thread and a scsi cmd pdu increasing it in another, then it will not happen. We only this one path to worry about. If it were so easy. I have this patch: diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c index 716cc34..90905f2 100644 --- a/drivers/scsi/libiscsi.c +++ b/drivers/scsi/libiscsi.c @@ -1212,6 +1212,21 @@ static int iscsi_xmit_task(struct iscsi_conn *conn) struct iscsi_task *task = conn-task; int rc; + /* Check for target CmdSN window */ + if (conn-session-state == ISCSI_STATE_LOGGED_IN + iscsi_sna_lt(conn-session-max_cmdsn, +conn-session-cmdsn)) { + iscsi_conn_printk(KERN_INFO, conn, + target window closed cmd %u max %u, + conn-session-cmdsn, + conn-session-max_cmdsn); + +#if 0 + /* Window closed, wait for CmdSN update */ + return -EPROTO; +#endif + } + __iscsi_get_task(task); spin_unlock_bh(conn-session-lock); rc = conn-session-tt-xmit_task(task); against the current linux-2.6-iscsi tree. When running my test harness (10 concurrent bonnie runs over 10 iscsi LUNs), I'm seeing these messages: connection2:0: target window closed cmd 61225 max 61224 connection2:0: target window closed cmd 61258 max 61257 connection2:0: target window closed cmd 61291 max 61290 connection2:0: target window closed cmd 61324 max 61323 connection2:0: target window closed cmd 61357 max 61356 connection2:0: target window closed cmd 61413 max 61412 connection1:0: target window closed cmd 61321 max 61320 connection1:0: target window closed cmd 61361 max 61360 connection1:0: target window closed cmd 61401 max 61400 connection1:0: target window closed cmd 61449 max 61448 connection1:0: target window closed cmd 61482 max 61481 connection1:0: target window closed cmd 61531 max 61530 connection1:0: target window closed cmd 61579 max 61578 connection1:0: target window closed cmd 61626 max 61625 connection1:0: target window closed cmd 61674 max 61673 connection1:0: target window closed cmd 61717 max 61716 connection1:0: target window closed cmd 61754 max 61753 connection1:0: target window closed cmd 61802 max 61801 connection1:0: target window closed cmd 61837 max 61836 connection1:0: target window closed cmd 61874 max 61873 connection2:0: target window closed cmd 62205 max 62204 connection2:0: target window closed cmd 62326 max 62325 connection2:0: target window closed cmd 62390 max 62389 connection2:0: target window closed cmd 62486 max 62485 connection2:0: target window closed cmd 62522 max 62521 connection2:0: target window closed cmd 62686 max 62685 connection2:0: target window closed cmd 62813 max 62812 connection2:0: target window closed cmd 62881 max 62880 connection2:0: target window closed cmd 62914 max 62913 So there _is_ a race window. It might be that my This is expected like I said in the other mail, because before iscsi_xmit_task is called the session-cmdsn is incremented by iscsi_prep_scsi_cmd_pdu when sending scsi cmd pdus. For data-outs and nops, then it will probably be higher too, but like I said in the other mail that is ok. iscsi_data_xmit-iscsi_prep_scsi_cmd_pdu iscsi_prep_scsi_cmd_pdu() { hdr-cmdsn = session-cmdsn; session-cmdsn++; } Then we call iscsi_xmit_task() and your check is spitting out and testing the new/updated session-cmdsn. The scsi cmd pdu that gets sent out is going to have a cmdsn that is less than or equal to maxcmdsn. Try printing out the task-cmdsn for scsi cmd pdus. Fsck. You are correct. Testing.
Re: [PATCH] libiscsi: Check for CmdSN window before sending data
Hannes Reinecke wrote: Mike Christie wrote: Hannes Reinecke wrote: Mike Christie wrote: On 07/22/2009 12:00 PM, Mike Christie wrote: No, wrong. The check in queuecommand is by no means relevant to the actual window. We're checking the target window at the time queuecommand is run, but we're _generating_ the CmdSN only much later after we've dequeued the command. The check in queuecommand is for scsi cmd pdus and checks the session-queued_cmdsn against the MaxCmdSn. The queued_cmdsn is just a preallocation of CmdSn values. So if the scsi layer sends us X commands, the initiator checks if the window has X spaces open. If it does, then we add the task to the cmdqueue, and increase queued_cmdsn. This is supposed to prevent the scsi layer from sending us too many commands and them sitting in the driver cmdqueue and timing out. The cmdsn is then allocated when we put the cmd on the wire. So the Maybe this is causing some confusion. We do not increment the cmdsn value for immediate commands (how we send a tmf or nop) and for data-out pdus. We only increment it for scsi cmd pdus, so the only place I check for the window having space is in queuecommand. If you are worried about a nop or data out increasing the cmdsn value in one thread and a scsi cmd pdu increasing it in another, then it will not happen. We only this one path to worry about. If it were so easy. I have this patch: diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c index 716cc34..90905f2 100644 --- a/drivers/scsi/libiscsi.c +++ b/drivers/scsi/libiscsi.c @@ -1212,6 +1212,21 @@ static int iscsi_xmit_task(struct iscsi_conn *conn) struct iscsi_task *task = conn-task; int rc; + /* Check for target CmdSN window */ + if (conn-session-state == ISCSI_STATE_LOGGED_IN + iscsi_sna_lt(conn-session-max_cmdsn, +conn-session-cmdsn)) { + iscsi_conn_printk(KERN_INFO, conn, + target window closed cmd %u max %u, + conn-session-cmdsn, + conn-session-max_cmdsn); + +#if 0 + /* Window closed, wait for CmdSN update */ + return -EPROTO; +#endif + } + __iscsi_get_task(task); spin_unlock_bh(conn-session-lock); rc = conn-session-tt-xmit_task(task); against the current linux-2.6-iscsi tree. When running my test harness (10 concurrent bonnie runs over 10 iscsi LUNs), I'm seeing these messages: connection2:0: target window closed cmd 61225 max 61224 connection2:0: target window closed cmd 61258 max 61257 connection2:0: target window closed cmd 61291 max 61290 connection2:0: target window closed cmd 61324 max 61323 connection2:0: target window closed cmd 61357 max 61356 connection2:0: target window closed cmd 61413 max 61412 connection1:0: target window closed cmd 61321 max 61320 connection1:0: target window closed cmd 61361 max 61360 connection1:0: target window closed cmd 61401 max 61400 connection1:0: target window closed cmd 61449 max 61448 connection1:0: target window closed cmd 61482 max 61481 connection1:0: target window closed cmd 61531 max 61530 connection1:0: target window closed cmd 61579 max 61578 connection1:0: target window closed cmd 61626 max 61625 connection1:0: target window closed cmd 61674 max 61673 connection1:0: target window closed cmd 61717 max 61716 connection1:0: target window closed cmd 61754 max 61753 connection1:0: target window closed cmd 61802 max 61801 connection1:0: target window closed cmd 61837 max 61836 connection1:0: target window closed cmd 61874 max 61873 connection2:0: target window closed cmd 62205 max 62204 connection2:0: target window closed cmd 62326 max 62325 connection2:0: target window closed cmd 62390 max 62389 connection2:0: target window closed cmd 62486 max 62485 connection2:0: target window closed cmd 62522 max 62521 connection2:0: target window closed cmd 62686 max 62685 connection2:0: target window closed cmd 62813 max 62812 connection2:0: target window closed cmd 62881 max 62880 connection2:0: target window closed cmd 62914 max 62913 So there _is_ a race window. It might be that my This is expected like I said in the other mail, because before iscsi_xmit_task is called the session-cmdsn is incremented by iscsi_prep_scsi_cmd_pdu when sending scsi cmd pdus. For data-outs and nops, then it will probably be higher too, but like I said in the other mail that is ok. iscsi_data_xmit-iscsi_prep_scsi_cmd_pdu iscsi_prep_scsi_cmd_pdu() { hdr-cmdsn = session-cmdsn; session-cmdsn++; } Then we call iscsi_xmit_task() and your check is spitting out and testing the new/updated session-cmdsn. The scsi cmd pdu that gets sent out is going to have a cmdsn that is less than or equal to maxcmdsn. Try printing out the task-cmdsn for scsi cmd pdus. Fsck. You are
Re: [PATCH] libiscsi: Check for CmdSN window before sending data
Mike Christie wrote: Hannes Reinecke wrote: [ .. ] Fsck. You are correct. But you still might be hitting a problem where the target does not like data-outs when it closed the window. Maybe they interpreted the RFC differently. You should ask the HP target guys for more info. Also your patch might be working because I think it ends up throttling the connection, so IO does not timeout because pipes are backed up (the slow down from the throttling is one of the problems we hit with the patch I did before which was pretty much the same as you posted). I actually had quite good results with it, so it can't be too bad :-) IE the test HPs running continued for over a day, whereas previously it'd stall after some hours. I think I can replicate this problem now too. It was by accident. I am using a EQL target remotely (I am in the middle of the US and the target is on the west coast so there is a good deal of space between us and the connection is slow) and I am seeing the problem where the network layer is just not taking any more data so eventually something times out (if I turn off nops then scsi command timer fires and if I also increase that to 10 minutes then the EQL target will actually send me a nop and I cannot send that because the network layer just keeps returning -AGAIN). Are you still seeing that problem? Basically sendpage/sendmsg just keeps returning -EGAIN. We even get woken up by iscsi_sw_tcp_write_space, but the sk_wmem_queued and sk_sndbuf values are basically stuck and so no space ever opens up for some reason (I attached the debug patch I am using). I tried your patch hoping it might help, but it does not help for this problem here. Maybe it is different issues. It is. There is another issue leading to an I/O stall: when there is a command pending to be retried, ie conn-task is set after we exit xmitworker(), the next incoming PDU will not trigger the xmit thread as the queues might be empty, although a command is still pending. Try this: diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c index 716cc34..4493cd4 100644 --- a/drivers/scsi/libiscsi.c +++ b/drivers/scsi/libiscsi.c @@ -133,9 +133,7 @@ iscsi_update_cmdsn(struct iscsi_session *session, struct iscsi_ nopin *hdr) * if the window closed with IO queued, then kick the * xmit thread */ - if (!list_empty(session-leadconn-cmdqueue) || - !list_empty(session-leadconn-mgmtqueue)) - iscsi_conn_queue_work(session-leadconn); + iscsi_conn_queue_work(session-leadconn); } } EXPORT_SYMBOL_GPL(iscsi_update_cmdsn); Cheers, Hannes -- Dr. Hannes Reinecke zSeries Storage h...@suse.de +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: Markus Rex, HRB 16746 (AG Nürnberg) --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups open-iscsi group. To post to this group, send email to open-iscsi@googlegroups.com To unsubscribe from this group, send email to open-iscsi+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/open-iscsi -~--~~~~--~~--~--~---
Re: [PATCH] libiscsi: Check for CmdSN window before sending data
On 07/23/2009 12:24 PM, Mike Christie wrote: But you still might be hitting a problem where the target does not like data-outs when it closed the window. Maybe they interpreted the RFC differently. You should ask the HP target guys for more info. Also your patch might be working because I think it ends up throttling the connection, so IO does not timeout because pipes are backed up (the slow down from the throttling is one of the problems we hit with the patch I did before which was pretty much the same as you posted). I think that what Hannes patch is doing is exactly that off-by-one command. So maybe you are right and it is an HP bug where the window check is off-by-one or they do not like data-outs after window close. Try to compare less-one in queuecommand and see if it helps the same? I think I can replicate this problem now too. It was by accident. I am using a EQL target remotely (I am in the middle of the US and the target is on the west coast so there is a good deal of space between us and the connection is slow) and I am seeing the problem where the network layer is just not taking any more data so eventually something times out (if I turn off nops then scsi command timer fires and if I also increase that to 10 minutes then the EQL target will actually send me a nop and I cannot send that because the network layer just keeps returning -AGAIN). Are you still seeing that problem? Basically sendpage/sendmsg just keeps returning -EGAIN. We even get woken up by iscsi_sw_tcp_write_space, but the sk_wmem_queued and sk_sndbuf values are basically stuck and so no space ever opens up for some reason (I attached the debug patch I am using). I've used in the passed tgt with open-iscsi over an internet connection from Israel to US and it did work. Like a simple mount of ext3 and some read writes. But I've never put it into heavy load. Do you see this problem only on an heavy load or a single long dd will cause it? I tried your patch hoping it might help, but it does not help for this problem here. Maybe it is different issues. Boaz --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups open-iscsi group. To post to this group, send email to open-iscsi@googlegroups.com To unsubscribe from this group, send email to open-iscsi+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/open-iscsi -~--~~~~--~~--~--~---
Re: [PATCH] libiscsi: Check for CmdSN window before sending data
Hannes Reinecke wrote: Mike Christie wrote: Hannes Reinecke wrote: [ .. ] Fsck. You are correct. But you still might be hitting a problem where the target does not like data-outs when it closed the window. Maybe they interpreted the RFC differently. You should ask the HP target guys for more info. Also your patch might be working because I think it ends up throttling the connection, so IO does not timeout because pipes are backed up (the slow down from the throttling is one of the problems we hit with the patch I did before which was pretty much the same as you posted). I actually had quite good results with it, so it can't be too bad :-) IE the test HPs running continued for over a day, whereas previously it'd stall after some hours. Yeah, I saw the bz update too. I tested your patch out here localy just to double check that is works like what we had before. With a istor target write throughput goes from 50 MB/s to 15 MB/s. It eventually dies (ping timeout) because the window is closed and it does not open until we finish sending the data-outs for the currently running commands. So READs are just fine. We hit the window closed check but we continue to make progress on the READs because data-ins are processed like normal and the window opens back up as commands are completed. So someone is doing something screwy. I am testing a EQL box locally now and will try some others just to double check. --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups open-iscsi group. To post to this group, send email to open-iscsi@googlegroups.com To unsubscribe from this group, send email to open-iscsi+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/open-iscsi -~--~~~~--~~--~--~---
Re: [PATCH] libiscsi: Check for CmdSN window before sending data
On 07/23/2009 07:01 PM, Mike Christie wrote: Boaz Harrosh wrote: I think I can replicate this problem now too. It was by accident. I am using a EQL target remotely (I am in the middle of the US and the target is on the west coast so there is a good deal of space between us and the connection is slow) and I am seeing the problem where the network layer is just not taking any more data so eventually something times out (if I turn off nops then scsi command timer fires and if I also increase that to 10 minutes then the EQL target will actually send me a nop and I cannot send that because the network layer just keeps returning -AGAIN). Are you still seeing that problem? Basically sendpage/sendmsg just keeps returning -EGAIN. We even get woken up by iscsi_sw_tcp_write_space, but the sk_wmem_queued and sk_sndbuf values are basically stuck and so no space ever opens up for some reason (I attached the debug patch I am using). I've used in the passed tgt with open-iscsi over an internet connection from Israel to US and it did work. Like a simple mount of ext3 and some read writes. But I've never put it into heavy load. Do you see this problem only on an heavy load or a single long dd will cause it? Heavy load. The target has a window of 128 commands. I am setting the shost-can_queue to 1024 and the scsi_device queue_depth to 256. I then run multiple: disktest -PT -T130 -h1 -K256 -B256k -ID /dev/sdb -D 0:100 disktest -PT -T130 -h1 -K256 -B256k -ID /dev/sdb -D 0:100 disktest -PT -T130 -h1 -K256 -B256k -ID /dev/sdb -D 0:100 shost-host_busy and turn on debugging I can see the driver running 128 commands. With this the writes just die. sendpage/sendmsg just starts returning EAGAIN. I turned off nops and set the scsi cmnd timer to 1,200. And so we can go 1,200 seconds just getting -EAGAIN. The strange thing is that the network is fine on the recv side. The target is sending me a nop-in as a ping, and we are reading that in fine. Also if I just do READ tests: disktest -PT -T130 -h1 -K256 -B256k -ID /dev/sdb disktest -PT -T130 -h1 -K256 -B256k -ID /dev/sdb disktest -PT -T130 -h1 -K256 -B256k -ID /dev/sdb it works just fine. It is slow as heck. I get less than 1 MB/s throughput, but we always make forward progress. If I just set the scsi_device queue_depth to 1 and run that write workload it works just fine. What if you set scsi_device queue_depth to 126 or can_queue to 126 commandSN window left at 128 ? Boaz --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups open-iscsi group. To post to this group, send email to open-iscsi@googlegroups.com To unsubscribe from this group, send email to open-iscsi+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/open-iscsi -~--~~~~--~~--~--~---
Re: [PATCH] libiscsi: Check for CmdSN window before sending data
Mike Christie wrote: Hannes Reinecke wrote: Mike Christie wrote: Hannes Reinecke wrote: [ .. ] Fsck. You are correct. But you still might be hitting a problem where the target does not like data-outs when it closed the window. Maybe they interpreted the RFC differently. You should ask the HP target guys for more info. Also your patch might be working because I think it ends up throttling the connection, so IO does not timeout because pipes are backed up (the slow down from the throttling is one of the problems we hit with the patch I did before which was pretty much the same as you posted). I actually had quite good results with it, so it can't be too bad :-) IE the test HPs running continued for over a day, whereas previously it'd stall after some hours. Yeah, I saw the bz update too. I tested your patch out here localy just to double check that is works like what we had before. With a istor target write throughput goes from 50 MB/s to 15 MB/s. It eventually dies (ping timeout) because the window is closed and it does not open until we finish sending the data-outs for the currently running commands. So READs are just fine. We hit the window closed check but we continue to make progress on the READs because data-ins are processed like normal and the window opens back up as commands are completed. So someone is doing something screwy. I am testing a EQL box locally now and will try some others just to double check. Hannes, could you test without your cmdsn patch, but then set things up so we do not have to send data-out pdus? You want immediate data = Yes. node.session.iscsi.ImmediateData = Yes node.session.iscsi.FirstBurstLength = 131072 Then either send IOs that are smaller than 131072 or set the /sys/block/sdc/queue/max_hw_sectors to 128 For the target you would want ImmediateData = Yes FirstBurstLength = 131072 MaxRecvDataSegmentLength = 131072 I guess you would want to lower the initiator values to match what the target is going to ask for if you cannot modify the target. I am seeing a problem here when having to send data-outs even when the window is open, throughput dies from 111 to .61 MB/s. --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups open-iscsi group. To post to this group, send email to open-iscsi@googlegroups.com To unsubscribe from this group, send email to open-iscsi+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/open-iscsi -~--~~~~--~~--~--~---
Re: [PATCH] libiscsi: Check for CmdSN window before sending data
Mike Christie wrote: Mike Christie wrote: Hannes Reinecke wrote: Mike Christie wrote: Hannes Reinecke wrote: [ .. ] Fsck. You are correct. But you still might be hitting a problem where the target does not like data-outs when it closed the window. Maybe they interpreted the RFC differently. You should ask the HP target guys for more info. Also your patch might be working because I think it ends up throttling the connection, so IO does not timeout because pipes are backed up (the slow down from the throttling is one of the problems we hit with the patch I did before which was pretty much the same as you posted). I actually had quite good results with it, so it can't be too bad :-) IE the test HPs running continued for over a day, whereas previously it'd stall after some hours. Yeah, I saw the bz update too. I tested your patch out here localy just to double check that is works like what we had before. With a istor target write throughput goes from 50 MB/s to 15 MB/s. It eventually dies (ping timeout) because the window is closed and it does not open until we finish sending the data-outs for the currently running commands. So READs are just fine. We hit the window closed check but we continue to make progress on the READs because data-ins are processed like normal and the window opens back up as commands are completed. So someone is doing something screwy. I am testing a EQL box locally now and will try some others just to double check. Hannes, could you test without your cmdsn patch, but then set things up so we do not have to send data-out pdus? You want immediate data = Yes. node.session.iscsi.ImmediateData = Yes node.session.iscsi.FirstBurstLength = 131072 Then either send IOs that are smaller than 131072 or set the /sys/block/sdc/queue/max_hw_sectors to 128 For the target you would want ImmediateData = Yes FirstBurstLength = 131072 MaxRecvDataSegmentLength = 131072 I guess you would want to lower the initiator values to match what the target is going to ask for if you cannot modify the target. I am seeing a problem here when having to send data-outs even when the window is open, throughput dies from 111 to .61 MB/s. It might not be the data-out in response to a R2T. I ran with this: ImmediateData = No InitialR2T = Yes when you login make sure that iscsiadm -m session -P 3 has ImmediateData: No InitialR2T: Yes And now it all works. Before I was barfing with IET and EQL. I could not replicate it on a iStor because it always uses those values. It might be due to a data-out when InitialR2T=yes or if we have sent immediate data with the scsi cmnd and then send a data-out in response to a r2t. will try that next. --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups open-iscsi group. To post to this group, send email to open-iscsi@googlegroups.com To unsubscribe from this group, send email to open-iscsi+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/open-iscsi -~--~~~~--~~--~--~---
Re: [PATCH] libiscsi: Check for CmdSN window before sending data
Amazingly the RFC says The target MUST silently ignore any non-immediate command outside of this range(...), but it does not say something like the Initiator SHOULD NOT send ... any command out of range... Regards, Ulrich On 21 Jul 2009 at 15:21, Mike Christie wrote: On 07/21/2009 11:35 AM, Boaz Harrosh wrote: On 07/21/2009 03:33 PM, Hannes Reinecke wrote: When sending a PDU we're just increase the CmdSN number without any check for MaxCmdSN. This results in unexpected ping timeouts and even connection stalls. So we should make sure to check CmdSN against MaxCmdSN before transferring a PDU, and just retry until MaxCmdSN has been updated. Signed-off-by: Hannes Reineckeh...@suse.de --- drivers/scsi/libiscsi.c | 13 + 1 files changed, 13 insertions(+), 0 deletions(-) diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c index 21ed45f..ffb1338 100644 --- a/drivers/scsi/libiscsi.c +++ b/drivers/scsi/libiscsi.c @@ -1273,6 +1273,19 @@ static int iscsi_data_xmit(struct iscsi_conn *conn) goto again; } + if (conn-session-state == ISCSI_STATE_LOGGED_IN + !iscsi_sna_lte(conn-session-cmdsn, + conn-session-max_cmdsn)) { + /* Window closed, wait for MaxCmdSN update */ + ISCSI_DBG_SESSION(conn-session, +target window closed, +cmd %u max %u\n, +conn-session-cmdsn, +conn-session-max_cmdsn); + spin_unlock_bh(conn-session-lock); + return -ENODATA; + } I am not sure if we should be needing this if the target is operating within the RFC (there is one exception but I am not sure if you are hitting it). In 3.2.2.1, I saw this: An iSCSI target MUST be able to handle at least one immediate task management command and one immediate non-task-management iSCSI command per connection at any time. 3.2.2.1 also has: The target MUST silently ignore any non-immediate command outside of this range or non-immediate duplicates within the range. The CmdSN carried by immediate commands may lie outside the ExpCmdSN to MaxCmdSN range. I took this to mean even when the window is closed we can send a nop as immediate. What do you guys think? The initiator will only send 1 TMF as immediate per session at a time and it will only send one nop as a ping marked as immediate at a time. The only exception to use sending more than one non tmf immediate cmd is if the target sends us a nop-in we could have sent two nop-outs marked as immediate (for the nop-out in response to the target's nop-in, 10.18.1 says we have to set the I bit). If we send too many nops marked as immediate we should be getting a reject pdu right? If so then I think we just need the attached patch which adds some code to handle rejected immediate pdus. The patch is made over scsi-rc-fixes and is only compile tested. Are you only seeing this with the one target? Could we confirm with them that they will accept one non tmf immediate command? If I am reading the RFC wrong, then for your patch, we want to move the check to below the check_mgmt label because iscsi_data_xmit can send multiple pdus. You probably just want to move it to iscsi_prep_mgmt_task(). Also I think we want to dequeue a nop as a ping so it does not timeout while the cmd window is closed (the problem would be is if the window was closed and then the connection goes bad - we would not be able to catch that). + Looks good, From the time queuecommand did the check (iscsi_check_cmdsn_window_closed) a management command came in without checking and stuffed an entry into the task queue. Good catch. Just nit picking you could use iscsi_check_cmdsn_window_closed() above just for style optimization. And also you might want to add the below cleanup. One question though. Do we want to return -ENODATA which means we get woken up only at next command submission, or we want goto again to wake up after a while for retry? I'd say the later since it might be the last command in a set and We do the same thing for the goto if rc == -ENODATA and if you return -ENODATA. If the cmd window opens iscsi_update_cmdsn will call queue_work, so the xmit thread is woken up. Or if mem/space opens up in the socket, iscsi_tcp will call queue_work to wake it, or if a new command comes in that will also kick the thread. The only time we would retry immediately is if -EAGAIN is returned from iscsi_data_xmit. The goto again is probably better so it is consistent with the other code. then we get sleepy with queues full. --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups open-iscsi group. To post to this group, send email
Re: [PATCH] libiscsi: Check for CmdSN window before sending data
Mike Christie wrote: [ .. ] I am not sure if we should be needing this if the target is operating within the RFC (there is one exception but I am not sure if you are hitting it). In 3.2.2.1, I saw this: An iSCSI target MUST be able to handle at least one immediate task management command and one immediate non-task-management iSCSI command per connection at any time. 3.2.2.1 also has: The target MUST silently ignore any non-immediate command outside of this range or non-immediate duplicates within the range. The CmdSN carried by immediate commands may lie outside the ExpCmdSN to MaxCmdSN range. I took this to mean even when the window is closed we can send a nop as immediate. What do you guys think? Totally beside the point. We're not sending NOPs outside the CmdSN range, we're sending _data_ PDUs outside the CmdSN range. Just make it a printk in the above patch and start hitting the target hard. You'll see an amazing number of messages ... There is _nothing_ in the code which checks if the data PDU we're about to send has a CmdSN within the target window. And then we're hitting the quoted text and the target drops the PDU, leading to a nice I/O stall. Which is the I/O stall I'm fighting with since _months_. The initiator will only send 1 TMF as immediate per session at a time and it will only send one nop as a ping marked as immediate at a time. The only exception to use sending more than one non tmf immediate cmd is if the target sends us a nop-in we could have sent two nop-outs marked as immediate (for the nop-out in response to the target's nop-in, 10.18.1 says we have to set the I bit). If we send too many nops marked as immediate we should be getting a reject pdu right? If so then I think we just need the attached patch which adds some code to handle rejected immediate pdus. The patch is made over scsi-rc-fixes and is only compile tested. Are you only seeing this with the one target? Could we confirm with them that they will accept one non tmf immediate command? If I am reading the RFC wrong, then for your patch, we want to move the check to below the check_mgmt label because iscsi_data_xmit can send multiple pdus. You probably just want to move it to iscsi_prep_mgmt_task(). Also I think we want to dequeue a nop as a ping so it does not timeout while the cmd window is closed (the problem would be is if the window was closed and then the connection goes bad - we would not be able to catch that). Yes, you are probably correct in that we'd need to move it into the individual queue loops to be able to transmit as many PDUs as possible. With the original patch we're running into the risk of hitting the same error when enough PDUs are queued. I'll update the patch. + Looks good, From the time queuecommand did the check (iscsi_check_cmdsn_window_closed) a management command came in without checking and stuffed an entry into the task queue. Good catch. No, wrong. The check in queuecommand is by no means relevant to the actual window. We're checking the target window at the time queuecommand is run, but we're _generating_ the CmdSN only much later after we've dequeued the command. And it's quite feasible to flood the xmitqueue with more commands than can be transmitted, so the CmdSN window won't be updated for a long time. In addition we're not incrementing the This allows us to stuff quite a few commands in the xmitqueue. You can easily check this by eg doing a journal replay. That puts out quite a lot of I/O in a short time: Jul 22 12:29:46 esk kernel: [ 2164.102874] connection1:0: cmd target window closed, cmd 5319 max 5318 Jul 22 12:29:46 esk kernel: [ 2164.138952] connection1:0: cmd target window closed, cmd 5339 max 5338 Jul 22 12:29:46 esk kernel: [ 2164.177920] connection1:0: cmd target window closed, cmd 5362 max 5361 Jul 22 12:29:46 esk kernel: [ 2164.213620] connection1:0: cmd target window closed, cmd 5382 max 5381 Jul 22 12:29:46 esk kernel: [ 2164.251724] connection1:0: cmd target window closed, cmd 5402 max 5401 Jul 22 12:29:46 esk kernel: [ 2154.265954] connection2:0: cmd target window closed, cmd 5283 max 5282 Jul 22 12:29:46 esk kernel: [ 2164.298269] connection1:0: cmd target window closed, cmd 5445 max 5444 Jul 22 12:29:46 esk kernel: [ 2164.298380] connection1:0: cmd target window closed, cmd 5446 max 5445 Jul 22 12:29:46 esk kernel: [ 2164.298757] connection1:0: cmd target window closed, cmd 5447 max 5446 Jul 22 12:29:46 esk kernel: [ 2164.299374] connection1:0: cmd target window closed, cmd 5448 max 5447 Jul 22 12:29:46 esk kernel: [ 2164.299971] connection1:0: cmd target window closed, cmd 5449 max 5448 Jul 22 12:29:46 esk kernel: [ 2164.300717] connection1:0: cmd target window closed, cmd 5450 max 5449 Jul 22 12:29:46 esk kernel: [ 2164.301455] connection1:0: cmd target window closed, cmd 5451 max 5450 Jul 22 12:29:46 esk kernel: [ 2164.302187] connection1:0: cmd target window
Re: [PATCH] libiscsi: Check for CmdSN window before sending data
Hannes Reinecke wrote: Mike Christie wrote: [ .. ] I am not sure if we should be needing this if the target is operating within the RFC (there is one exception but I am not sure if you are hitting it). In 3.2.2.1, I saw this: An iSCSI target MUST be able to handle at least one immediate task management command and one immediate non-task-management iSCSI command per connection at any time. 3.2.2.1 also has: The target MUST silently ignore any non-immediate command outside of this range or non-immediate duplicates within the range. The CmdSN carried by immediate commands may lie outside the ExpCmdSN to MaxCmdSN range. I took this to mean even when the window is closed we can send a nop as immediate. What do you guys think? Totally beside the point. We're not sending NOPs outside the CmdSN range, we're sending _data_ PDUs outside the CmdSN range. Just make it a printk in the above patch and start hitting the target hard. You'll see an amazing number of messages ... There is _nothing_ in the code which checks if the data PDU we're about to send has a CmdSN within the target window. And then we're hitting the quoted text and the target drops the PDU, leading to a nice I/O stall. Which is the I/O stall I'm fighting with since _months_. The initiator will only send 1 TMF as immediate per session at a time and it will only send one nop as a ping marked as immediate at a time. The only exception to use sending more than one non tmf immediate cmd is if the target sends us a nop-in we could have sent two nop-outs marked as immediate (for the nop-out in response to the target's nop-in, 10.18.1 says we have to set the I bit). If we send too many nops marked as immediate we should be getting a reject pdu right? If so then I think we just need the attached patch which adds some code to handle rejected immediate pdus. The patch is made over scsi-rc-fixes and is only compile tested. Are you only seeing this with the one target? Could we confirm with them that they will accept one non tmf immediate command? If I am reading the RFC wrong, then for your patch, we want to move the check to below the check_mgmt label because iscsi_data_xmit can send multiple pdus. You probably just want to move it to iscsi_prep_mgmt_task(). Also I think we want to dequeue a nop as a ping so it does not timeout while the cmd window is closed (the problem would be is if the window was closed and then the connection goes bad - we would not be able to catch that). Yes, you are probably correct in that we'd need to move it into the individual queue loops to be able to transmit as many PDUs as possible. With the original patch we're running into the risk of hitting the same error when enough PDUs are queued. I'll update the patch. So, this looks far better: diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c index 21ed45f..8303676 100644 --- a/drivers/scsi/libiscsi.c +++ b/drivers/scsi/libiscsi.c @@ -1212,6 +1212,13 @@ static int iscsi_xmit_task(struct iscsi_conn *conn) struct iscsi_task *task = conn-task; int rc; + /* Check for target CmdSN window */ + if (conn-session-state == ISCSI_STATE_LOGGED_IN + iscsi_sna_lt(conn-session-max_cmdsn, +conn-session-cmdsn)) + /* Window closed, wait for CmdSN update */ + return -EPROTO; + __iscsi_get_task(task); spin_unlock_bh(conn-session-lock); rc = conn-session-tt-xmit_task(task); There is one issue still outstanding: Even with this patch the iscsi stack feels it necessary to send NOPs: Jul 22 14:03:40 esk kernel: [ 485.274672] connection2:0: running ctask itt 90 rc -71 Jul 22 14:03:40 esk kernel: [ 485.297249] connection2:0: running ctask itt 8 rc -71 Jul 22 14:03:40 esk kernel: [ 547.508673] connection1:0: running ctask itt 46 rc -71 Jul 22 14:03:41 esk kernel: [ 548.155353] connection1:0: running ctask itt 85 rc -71 Jul 22 14:03:45 esk kernel: [ 550.291112] connection2:0: Sending nopout,cmd 283450 max 283512 exp 283449 Jul 22 14:03:45 esk kernel: [ 550.294583] connection2:0: mgmtpdu [itt xa09 p 81007a1ffa40] queued Jul 22 14:03:45 esk kernel: [ 487.595817] connection2:0: mgmtpdu [op 0x0 hdr-itt 0xa09 datalen 0 cmdsn 283450/283450] Jul 22 14:03:45 esk kernel: [ 487.598659] connection2:0: mgmtpdu [itt 0xa09 p 81007a1ffa40] done Jul 22 14:03:45 esk kernel: [ 539.429005] connection2:0: mgmtpdu [itt xa09 p 81007a1ffa40] finished Jul 22 14:03:45 esk kernel: [ 487.603722] connection2:0: running ctask itt 82 rc -71 Jul 22 14:03:45 esk kernel: [ 487.604826] connection2:0: running ctask itt 38 rc -71 And a wireshark trace reveals that indeed there is a network hickup before the NOP is sent during which time simply _nothing_ happens on the line. Well, not between the target and the initiator; can't tell about the overall load. (By way of clarification: 'running ctask
Re: [PATCH] libiscsi: Check for CmdSN window before sending data
On 07/22/2009 05:57 AM, Hannes Reinecke wrote: Mike Christie wrote: [ .. ] I am not sure if we should be needing this if the target is operating within the RFC (there is one exception but I am not sure if you are hitting it). In 3.2.2.1, I saw this: An iSCSI target MUST be able to handle at least one immediate task management command and one immediate non-task-management iSCSI command per connection at any time. 3.2.2.1 also has: The target MUST silently ignore any non-immediate command outside of this range or non-immediate duplicates within the range. The CmdSN carried by immediate commands may lie outside the ExpCmdSN to MaxCmdSN range. I took this to mean even when the window is closed we can send a nop as immediate. What do you guys think? Totally beside the point. I do not get that. In the original mail you wrote: When sending a PDU we're just increase the CmdSN number without any check for MaxCmdSN. This results in unexpected ping timeouts and even connection stalls. I thought you were saying that we got ping timeouts because we sent a nop out side of the window. How does a data-out or scsi cmnd pdu getting sent out of the window cause a ping timeout? The target should just drop the data-out or scsi cmnd pdu and process the nop, right? We're not sending NOPs outside the CmdSN range, we're sending _data_ PDUs outside the CmdSN range. Just make it a printk in the above patch and start hitting the target hard. You'll see an amazing number of messages ... Do you mean data-out pdus or scsi cmd pdus? You do not check the window for a data-out do you? It does not have a higher cmdsn then the cmd that started it (data-out pdus do not have a cmdsn field and are not starting a new command). I accidentally did check the cmdsn for data-outs and it lead to lots of bug reports. I accidentally added the check here http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=77a23c21aaa723f6b0ffc4a701be8c8e5a32346d and then reverted it here: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=e07264071f7f2b02a2973cb28d9fdf5eb8866cc1 Your old patch and new patch are checking the cmdsn window for data-outs and for commands like nops and tmfs that marked immediate which I do not think is right. There is _nothing_ in the code which checks if the data PDU we're about to send has a CmdSN within the target window. And then we're hitting the quoted text and the target drops the PDU, leading to a nice I/O stall. Which is the I/O stall I'm fighting with since _months_. The initiator will only send 1 TMF as immediate per session at a time and it will only send one nop as a ping marked as immediate at a time. The only exception to use sending more than one non tmf immediate cmd is if the target sends us a nop-in we could have sent two nop-outs marked as immediate (for the nop-out in response to the target's nop-in, 10.18.1 says we have to set the I bit). If we send too many nops marked as immediate we should be getting a reject pdu right? If so then I think we just need the attached patch which adds some code to handle rejected immediate pdus. The patch is made over scsi-rc-fixes and is only compile tested. Are you only seeing this with the one target? Could we confirm with them that they will accept one non tmf immediate command? If I am reading the RFC wrong, then for your patch, we want to move the check to below the check_mgmt label because iscsi_data_xmit can send multiple pdus. You probably just want to move it to iscsi_prep_mgmt_task(). Also I think we want to dequeue a nop as a ping so it does not timeout while the cmd window is closed (the problem would be is if the window was closed and then the connection goes bad - we would not be able to catch that). Yes, you are probably correct in that we'd need to move it into the individual queue loops to be able to transmit as many PDUs as possible. With the original patch we're running into the risk of hitting the same error when enough PDUs are queued. I'll update the patch. + Looks good, From the time queuecommand did the check (iscsi_check_cmdsn_window_closed) a management command came in without checking and stuffed an entry into the task queue. Good catch. No, wrong. The check in queuecommand is by no means relevant to the actual window. We're checking the target window at the time queuecommand is run, but we're _generating_ the CmdSN only much later after we've dequeued the command. The check in queuecommand is for scsi cmd pdus and checks the session-queued_cmdsn against the MaxCmdSn. The queued_cmdsn is just a preallocation of CmdSn values. So if the scsi layer sends us X commands, the initiator checks if the window has X spaces open. If it does, then we add the task to the cmdqueue, and increase queued_cmdsn. This is supposed to prevent the scsi layer from sending us too many commands and them sitting in the
Re: [PATCH] libiscsi: Check for CmdSN window before sending data
On 07/22/2009 12:00 PM, Mike Christie wrote: No, wrong. The check in queuecommand is by no means relevant to the actual window. We're checking the target window at the time queuecommand is run, but we're _generating_ the CmdSN only much later after we've dequeued the command. The check in queuecommand is for scsi cmd pdus and checks the session-queued_cmdsn against the MaxCmdSn. The queued_cmdsn is just a preallocation of CmdSn values. So if the scsi layer sends us X commands, the initiator checks if the window has X spaces open. If it does, then we add the task to the cmdqueue, and increase queued_cmdsn. This is supposed to prevent the scsi layer from sending us too many commands and them sitting in the driver cmdqueue and timing out. The cmdsn is then allocated when we put the cmd on the wire. So the Maybe this is causing some confusion. We do not increment the cmdsn value for immediate commands (how we send a tmf or nop) and for data-out pdus. We only increment it for scsi cmd pdus, so the only place I check for the window having space is in queuecommand. If you are worried about a nop or data out increasing the cmdsn value in one thread and a scsi cmd pdu increasing it in another, then it will not happen. We only this one path to worry about. session-CmdSn should always be lower than the MaxCmdSn value (as long as the MaxCmdSn value does not suddenly decrease on us (it cannot do that can it?)), because the session-queued_cmdsn value is always lower than the MaxCmdSn. --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups open-iscsi group. To post to this group, send email to open-iscsi@googlegroups.com To unsubscribe from this group, send email to open-iscsi+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/open-iscsi -~--~~~~--~~--~--~---
Re: [PATCH] libiscsi: Check for CmdSN window before sending data
On 07/22/2009 12:24 PM, Mike Christie wrote: On 07/22/2009 12:00 PM, Mike Christie wrote: No, wrong. The check in queuecommand is by no means relevant to the actual window. We're checking the target window at the time queuecommand is run, but we're _generating_ the CmdSN only much later after we've dequeued the command. The check in queuecommand is for scsi cmd pdus and checks the session-queued_cmdsn against the MaxCmdSn. The queued_cmdsn is just a preallocation of CmdSn values. So if the scsi layer sends us X commands, the initiator checks if the window has X spaces open. If it does, then we add the task to the cmdqueue, and increase queued_cmdsn. This is supposed to prevent the scsi layer from sending us too many commands and them sitting in the driver cmdqueue and timing out. The cmdsn is then allocated when we put the cmd on the wire. So the Maybe this is causing some confusion. We do not increment the cmdsn value for immediate commands (how we send a tmf or nop) and for data-out pdus. We only increment it for scsi cmd pdus, so the only place I check for the window having space is in queuecommand. If you are worried about a nop or data out increasing the cmdsn value in one thread and a scsi cmd pdu increasing it in another, then it will not happen. We only this one path to worry about. session-CmdSn should always be lower than the MaxCmdSn value (as long as the MaxCmdSn value does not suddenly decrease on us (it cannot do Ah, I guess it does not have a MUST NOT decrease in the RFC. There is this though: - If the PDU MaxCmdSN is greater than the local MaxCmdSN (in Serial Arithmetic Sense), it updates the local MaxCmdSN; otherwise, it is ignored. If the target did send a MaxCmdSN that is smaller than our local then we are not going to update our value. The target would then drop the scsi cmd pdu. The scsi cmd timeout for that cmd will eventually fire and we will go through that. --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups open-iscsi group. To post to this group, send email to open-iscsi@googlegroups.com To unsubscribe from this group, send email to open-iscsi+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/open-iscsi -~--~~~~--~~--~--~---
Re: [PATCH] libiscsi: Check for CmdSN window before sending data
On 07/22/2009 12:00 PM, Mike Christie wrote: No, wrong. The check in queuecommand is by no means relevant to the actual window. We're checking the target window at the time queuecommand is run, but we're _generating_ the CmdSN only much later after we've dequeued the command. The check in queuecommand is for scsi cmd pdus and checks the session-queued_cmdsn against the MaxCmdSn. The queued_cmdsn is just a preallocation of CmdSn values. So if the scsi layer sends us X commands, the initiator checks if the window has X spaces open. If it does, then we add the task to the cmdqueue, and increase queued_cmdsn. This is supposed to prevent the scsi layer from sending us too many commands and them sitting in the driver cmdqueue and timing out. The cmdsn is then allocated when we put the cmd on the wire. So the session-CmdSn should always be lower than the MaxCmdSn value (as long as the MaxCmdSn value does not suddenly decrease on us (it cannot do that can it?)), because the session-queued_cmdsn value is always lower than the MaxCmdSn. Here is an example of what I am seeing when I run the code. - Login done and we are in FFP. iscsi_conn_start() { session-queued_cmdsn = session-cmdsn; } (here session-cmdsn is probably 1 and MaxCmdSn we will say is 4) - The scsi layer then sends a scsi command. queuecommand checks queued_cmdsn(1) MaxCmdSn(4). It is ok so it gets incremnted to queued_cmdsn = 2. The queuecommand does queue_work to wake the xmit thread. - Let's say the scsi layer is really quick. And calls the queuecommand again. queuecommand checks queued_cmdsn(2) MaxCmdSn(4). It is ok so it gets incremnted to queued_cmdsn = 3. - Let's say the scsi layer is really quick. And calls the queuecommand again. queuecommand checks queued_cmdsn(3) MaxCmdSn(4). It is ok so gets incremnted to queued_cmdsn = 4. - Let's say the scsi layer is really quick. And calls the queuecommand again. queuecommand checks queued_cmdsn(4) MaxCmdSn(4). Window is not open so queuecommand requeus command. - Now the xmit thread finally wakes up. It takes a task from the cmdqueue, and preps it. This does hdr-cmdsn = session-cmdsn; So here hdr-cmdsn and session-cmdsn is 1; Then we do session-cmdsn++ so the session-cmdsn is now 2; - we loop in the xmit thread and it takes a task from the cmdqueue, and preps it. This does hdr-cmdsn = session-cmdsn; So here hdr-cmdsn and session-cmdsn is 2; Then we do session-cmdsn++ so the session-cmdsn is now 3; - we loop in the xmit thread and it takes a task from the cmdqueue, and preps it. This does hdr-cmdsn = session-cmdsn; So here hdr-cmdsn and session-cmdsn is 3; Then we do session-cmdsn++ so the session-cmdsn is now 4; - We now return from iscsi_data_xmit because there was only 3 cmds queued and there is nothing left on the cmdqueue list. - Now when the the window opens and iscsi_update_cmdsn will update the session-max_cmdsn. At this point we should call something like scsi_run_queue but we don't. We wait for a scsi cmd to complete and then the scsi_done/scsi_io_completion completion path will run the scsi queues again. And so we start again, but with a higher MaxCmdSn. --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups open-iscsi group. To post to this group, send email to open-iscsi@googlegroups.com To unsubscribe from this group, send email to open-iscsi+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/open-iscsi -~--~~~~--~~--~--~---
Re: [PATCH] libiscsi: Check for CmdSN window before sending data
On 07/22/2009 03:01 PM, Mike Christie wrote: On 07/22/2009 12:00 PM, Mike Christie wrote: No, wrong. The check in queuecommand is by no means relevant to the actual window. We're checking the target window at the time queuecommand is run, but we're _generating_ the CmdSN only much later after we've dequeued the command. The check in queuecommand is for scsi cmd pdus and checks the session-queued_cmdsn against the MaxCmdSn. The queued_cmdsn is just a preallocation of CmdSn values. So if the scsi layer sends us X commands, the initiator checks if the window has X spaces open. If it does, then we add the task to the cmdqueue, and increase queued_cmdsn. This is supposed to prevent the scsi layer from sending us too many commands and them sitting in the driver cmdqueue and timing out. The cmdsn is then allocated when we put the cmd on the wire. So the session-CmdSn should always be lower than the MaxCmdSn value (as long as the MaxCmdSn value does not suddenly decrease on us (it cannot do that can it?)), because the session-queued_cmdsn value is always lower than the MaxCmdSn. Here is an example of what I am seeing when I run the code. - Login done and we are in FFP. iscsi_conn_start() { session-queued_cmdsn = session-cmdsn; } (here session-cmdsn is probably 1 and MaxCmdSn we will say is 4) - The scsi layer then sends a scsi command. queuecommand checks queued_cmdsn(1) MaxCmdSn(4). It is ok so it gets incremnted to queued_cmdsn = 2. The queuecommand does queue_work to wake the xmit thread. - Let's say the scsi layer is really quick. And calls the queuecommand again. queuecommand checks queued_cmdsn(2) MaxCmdSn(4). It is ok so it gets incremnted to queued_cmdsn = 3. - Let's say the scsi layer is really quick. And calls the queuecommand again. queuecommand checks queued_cmdsn(3) MaxCmdSn(4). It is ok so gets incremnted to queued_cmdsn = 4. - Let's say the scsi layer is really quick. And calls the queuecommand again. queuecommand checks queued_cmdsn(4) MaxCmdSn(4). Window is not open so queuecommand requeus command. Oh yeah, I messed up when I took this from my log and tried to write it out nicer below. The window check is for less than or equal to so this would be queued and we would have 4 tasks queued and the 5th would hit the window check. --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups open-iscsi group. To post to this group, send email to open-iscsi@googlegroups.com To unsubscribe from this group, send email to open-iscsi+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/open-iscsi -~--~~~~--~~--~--~---
[PATCH] libiscsi: Check for CmdSN window before sending data
When sending a PDU we're just increase the CmdSN number without any check for MaxCmdSN. This results in unexpected ping timeouts and even connection stalls. So we should make sure to check CmdSN against MaxCmdSN before transferring a PDU, and just retry until MaxCmdSN has been updated. Signed-off-by: Hannes Reinecke h...@suse.de --- drivers/scsi/libiscsi.c | 13 + 1 files changed, 13 insertions(+), 0 deletions(-) diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c index 21ed45f..ffb1338 100644 --- a/drivers/scsi/libiscsi.c +++ b/drivers/scsi/libiscsi.c @@ -1273,6 +1273,19 @@ static int iscsi_data_xmit(struct iscsi_conn *conn) goto again; } + if (conn-session-state == ISCSI_STATE_LOGGED_IN + !iscsi_sna_lte(conn-session-cmdsn, + conn-session-max_cmdsn)) { + /* Window closed, wait for MaxCmdSN update */ + ISCSI_DBG_SESSION(conn-session, + target window closed, + cmd %u max %u\n, + conn-session-cmdsn, + conn-session-max_cmdsn); + spin_unlock_bh(conn-session-lock); + return -ENODATA; + } + /* * process mgmt pdus like nops before commands since we should * only have one nop-out as a ping from us and targets should not -- 1.5.3.2 --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups open-iscsi group. To post to this group, send email to open-iscsi@googlegroups.com To unsubscribe from this group, send email to open-iscsi+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/open-iscsi -~--~~~~--~~--~--~---
Re: [PATCH] libiscsi: Check for CmdSN window before sending data
On 07/21/2009 11:35 AM, Boaz Harrosh wrote: On 07/21/2009 03:33 PM, Hannes Reinecke wrote: When sending a PDU we're just increase the CmdSN number without any check for MaxCmdSN. This results in unexpected ping timeouts and even connection stalls. So we should make sure to check CmdSN against MaxCmdSN before transferring a PDU, and just retry until MaxCmdSN has been updated. Signed-off-by: Hannes Reineckeh...@suse.de --- drivers/scsi/libiscsi.c | 13 + 1 files changed, 13 insertions(+), 0 deletions(-) diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c index 21ed45f..ffb1338 100644 --- a/drivers/scsi/libiscsi.c +++ b/drivers/scsi/libiscsi.c @@ -1273,6 +1273,19 @@ static int iscsi_data_xmit(struct iscsi_conn *conn) goto again; } +if (conn-session-state == ISCSI_STATE_LOGGED_IN +!iscsi_sna_lte(conn-session-cmdsn, + conn-session-max_cmdsn)) { +/* Window closed, wait for MaxCmdSN update */ +ISCSI_DBG_SESSION(conn-session, + target window closed, + cmd %u max %u\n, + conn-session-cmdsn, + conn-session-max_cmdsn); +spin_unlock_bh(conn-session-lock); +return -ENODATA; +} I am not sure if we should be needing this if the target is operating within the RFC (there is one exception but I am not sure if you are hitting it). In 3.2.2.1, I saw this: An iSCSI target MUST be able to handle at least one immediate task management command and one immediate non-task-management iSCSI command per connection at any time. 3.2.2.1 also has: The target MUST silently ignore any non-immediate command outside of this range or non-immediate duplicates within the range. The CmdSN carried by immediate commands may lie outside the ExpCmdSN to MaxCmdSN range. I took this to mean even when the window is closed we can send a nop as immediate. What do you guys think? The initiator will only send 1 TMF as immediate per session at a time and it will only send one nop as a ping marked as immediate at a time. The only exception to use sending more than one non tmf immediate cmd is if the target sends us a nop-in we could have sent two nop-outs marked as immediate (for the nop-out in response to the target's nop-in, 10.18.1 says we have to set the I bit). If we send too many nops marked as immediate we should be getting a reject pdu right? If so then I think we just need the attached patch which adds some code to handle rejected immediate pdus. The patch is made over scsi-rc-fixes and is only compile tested. Are you only seeing this with the one target? Could we confirm with them that they will accept one non tmf immediate command? If I am reading the RFC wrong, then for your patch, we want to move the check to below the check_mgmt label because iscsi_data_xmit can send multiple pdus. You probably just want to move it to iscsi_prep_mgmt_task(). Also I think we want to dequeue a nop as a ping so it does not timeout while the cmd window is closed (the problem would be is if the window was closed and then the connection goes bad - we would not be able to catch that). + Looks good, From the time queuecommand did the check (iscsi_check_cmdsn_window_closed) a management command came in without checking and stuffed an entry into the task queue. Good catch. Just nit picking you could use iscsi_check_cmdsn_window_closed() above just for style optimization. And also you might want to add the below cleanup. One question though. Do we want to return -ENODATA which means we get woken up only at next command submission, or we want goto again to wake up after a while for retry? I'd say the later since it might be the last command in a set and We do the same thing for the goto if rc == -ENODATA and if you return -ENODATA. If the cmd window opens iscsi_update_cmdsn will call queue_work, so the xmit thread is woken up. Or if mem/space opens up in the socket, iscsi_tcp will call queue_work to wake it, or if a new command comes in that will also kick the thread. The only time we would retry immediately is if -EAGAIN is returned from iscsi_data_xmit. The goto again is probably better so it is consistent with the other code. then we get sleepy with queues full. --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups open-iscsi group. To post to this group, send email to open-iscsi@googlegroups.com To unsubscribe from this group, send email to open-iscsi+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/open-iscsi -~--~~~~--~~--~--~--- diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c index 716cc34..7ea8b00 100644 --- a/drivers/scsi/libiscsi.c
Re: [PATCH] libiscsi: Check for CmdSN window before sending data
Mike Christie wrote: If we send too many nops marked as immediate we should be getting a reject pdu right? If so then I think we just need the attached patch which adds some code to handle rejected immediate pdus. The patch is made over scsi-rc-fixes and is only compile tested. I tested the patch and it should fix the problem where the connection was dropped because a nop was rejected and then we dropped the reject. I only tested the case where we sent too many nops as pings (I modified libiscsi to send a nop every second even if there were nops in flight already. I got floods of: connection1:0: pdu (op 0x455 itt 0x0) rejected. Too many immediate commands. but the connect did not get dropped from our side because we did not get a nop in. I did not test the case where the nop-out that was rejected was in response to a nop-in from the target. I could not replicate that one. But we would not have dropped the session in that case, so I do not think you were hitting that. --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups open-iscsi group. To post to this group, send email to open-iscsi@googlegroups.com To unsubscribe from this group, send email to open-iscsi+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/open-iscsi -~--~~~~--~~--~--~---