Re: LUN Reset TMF and R2T
Steven Hayter wrote:
> It does look confusing. I think RFC 5048, Section 4.1.2, "Clarified
> Multi-Task Abort Semantics", gives guidelines as to what should happen.
>
> Every way I read it, the target shouldn't be sending R2Ts for tasks which
> are part of the affected task set (those equal to or exceeding the CmdSN
> of the reset TMF). But I've been wrong in the past.
>
Never mind, found the reason. It's a totally different story, but we've
been the culprit nevertheless.

iscsi_xmit_task() runs in a loop, disregarding any TMF state. So we will
happily continue sending Data-Out PDUs in response to R2Ts even though the
LU Reset has already finished.

Patch to follow.

Cheers,

Hannes
--
Dr. Hannes Reinecke                   zSeries & Storage
h...@suse.de                          +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)
--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups "open-iscsi" group.
To post to this g
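Hannes's finding is that iscsi_xmit_task() keeps looping without looking at TMF state. The C sketch below illustrates the general shape of such a fix: re-check the task's state before every Data-Out PDU instead of deciding once and then looping. It is not the patch Hannes refers to; struct task, the aborted flag, lu_reset_done(), xmit_r2t_data(), send_data_out() and MAX_DATA_OUT are all hypothetical, and the locking a real driver needs between the receive path (which completes the TMF) and the transmit path is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DATA_OUT 8192u  /* hypothetical per-PDU payload limit */

struct task {
    unsigned int lun;
    unsigned int r2t_bytes_left;  /* Data-Out bytes still owed for the current R2T */
    bool         aborted;         /* set once an LU Reset covering this task completes */
};

/* Stand-in for building and queueing one Data-Out PDU of `len` bytes. */
static void send_data_out(struct task *t, unsigned int len)
{
    assert(len <= t->r2t_bytes_left);
    t->r2t_bytes_left -= len;
}

/* What a completed LU Reset does to the task list: mark every task on `lun`. */
static void lu_reset_done(struct task *tasks, size_t n, unsigned int lun)
{
    for (size_t i = 0; i < n; i++)
        if (tasks[i].lun == lun)
            tasks[i].aborted = true;
}

/*
 * Send the Data-Out PDUs owed for an R2T, re-checking the task state before
 * every PDU instead of looping blindly. Returns the number of PDUs sent.
 */
static unsigned int xmit_r2t_data(struct task *t)
{
    unsigned int sent = 0;

    while (t->r2t_bytes_left > 0) {
        if (t->aborted)  /* reset finished: the target no longer expects this data */
            break;
        unsigned int len = t->r2t_bytes_left < MAX_DATA_OUT ? t->r2t_bytes_left
                                                            : MAX_DATA_OUT;
        send_data_out(t, len);
        sent++;
    }
    return sent;
}
```

A task on LUN 6 that owes 20000 bytes goes out as three PDUs (8192 + 8192 + 3616) while no reset has completed; once lu_reset_done() has marked LUN 6, the same loop sends nothing for it, and tasks on other LUNs are unaffected.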
Re: LUN Reset TMF and R2T
On 28/07/2009 06:14 pm, Mike Christie wrote:
> You mean the target sends a tmf response
> that indicates it cleaned up some tasks, then it sends pdus for the
> tasks that should have been affected by the TMF, right? If so I do not
> think targets are allowed to do this. In RFC 3720, section 3.5.1.4, we have:
>
>     After the Task Management response indicates Task Management function
>     completion, the initiator will not receive any additional responses
>     from the affected tasks.
>
> "additional responses" means scsi response pdus and data-in with status,
> right? Does it also mean R2Ts? I thought it did, so we will just drop
> the session when getting all those pdus we thought the target should not
> be sending.
>
> If "additional responses" does not mean R2Ts, then what are we supposed
> to do? Handle them? Silently drop them? I could not find anything in the
> RFC.
>
> The nasty problem with the code and this scenario is that we preallocate
> the tasks and itts. Once iscsi_eh_device_reset returns SUCCESS and
> cleans up the tasks, the scsi layer can start sending us commands. We
> could then allocate a task/itt that was used before and should have been
> cleaned up. The target could then send us pdus for the cleaned up
> task/itt while we are using the task/itt for a new command. Then Kablewly.

It does look confusing. I think RFC 5048, Section 4.1.2, "Clarified
Multi-Task Abort Semantics", gives guidelines as to what should happen.

Every way I read it, the target shouldn't be sending R2Ts for tasks which
are part of the affected task set (those equal to or exceeding the CmdSN
of the reset TMF). But I've been wrong in the past.

Steve
Re: LUN Reset TMF and R2T
Mike Christie wrote:
> The nasty problem with the code and this scenario is that we preallocate
> the tasks and itts. Once iscsi_eh_device_reset returns SUCCESS and
> cleans up the tasks, the scsi layer can start sending us commands. We
> could then allocate a task/itt that was used before and should have been
> cleaned up. The target could then send us pdus for the cleaned up
> task/itt while we are using the task/itt for a new command. Then Kablewly.

I think we can make this safer by separating the itt allocation from the
task allocation. If we just let the itt increase and roll over, then we
should not be getting collisions when the above happens.

So I guess we would just need to decide if the target should be sending
r2ts at this time and what to do about it, if anything.
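Mike's idea of separating ITT allocation from task allocation can be sketched by folding a per-slot generation counter into the ITT, so that a recycled task slot hands out a fresh ITT. The layout below (low 12 bits for the slot index, upper 20 bits for a wrapping generation) and every name in it are assumptions for illustration, not how libiscsi builds its ITTs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ITT_INDEX_BITS 12u
#define ITT_INDEX_MASK ((1u << ITT_INDEX_BITS) - 1u)          /* low 12 bits: slot */
#define ITT_GEN_MASK   ((1u << (32u - ITT_INDEX_BITS)) - 1u)  /* 20-bit generation */
#define POOL_SIZE      16u  /* must stay below 4096 so 0xffffffff is never built */

struct task {
    uint32_t generation;  /* bumped every time the slot is handed out again */
    bool     active;
};

/* ITT that goes on the wire for slot `idx` in its current generation. */
static uint32_t itt_encode(const struct task *pool, uint32_t idx)
{
    assert(idx < POOL_SIZE);
    return (pool[idx].generation << ITT_INDEX_BITS) | idx;
}

/* Reuse slot `idx` for a new command; the generation wraps around silently. */
static uint32_t task_start(struct task *pool, uint32_t idx)
{
    assert(idx < POOL_SIZE);
    pool[idx].generation = (pool[idx].generation + 1u) & ITT_GEN_MASK;
    pool[idx].active = true;
    return itt_encode(pool, idx);
}

/* Map an ITT from a received PDU to its task, or NULL if it is stale/unknown. */
static struct task *itt_lookup(struct task *pool, uint32_t itt)
{
    uint32_t idx = itt & ITT_INDEX_MASK;

    if (idx >= POOL_SIZE)
        return NULL;
    if (!pool[idx].active || itt != itt_encode(pool, idx))
        return NULL;  /* PDU belongs to an earlier user of this slot */
    return &pool[idx];
}
```

With this scheme a late R2T that still carries the old ITT fails itt_lookup() instead of being applied to the new command in the same slot. It narrows the window rather than closing it: after 2^20 reuses of one slot the ITT repeats. Keeping POOL_SIZE below 4096 also guarantees the reserved ITT 0xffffffff is never produced.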
Re: LUN Reset TMF and R2T
On 07/28/2009 06:53 AM, Hannes Reinecke wrote:
> As you can see, we're receiving R2Ts for tasks we've just aborted :-(
>
> Looking closely, I don't _actually_ think that we've received them
> out-of-order (which would be a violation of the RFC). The problem seems
> to be our skb handling (again):
>
> We're reading an skb and calling the handler function once the PDU is
> ready. However, we're _not_ checking if there is more data to be read
> from the socket. So it looks to me as if we're first reading the TMF
> response, aborting all tasks, and then continuing to read PDUs for tasks
> which we just aborted.

We will definitely do this. You mean the target sends a tmf response
that indicates it cleaned up some tasks, then it sends pdus for the
tasks that should have been affected by the TMF, right? If so I do not
think targets are allowed to do this. In RFC 3720, section 3.5.1.4, we have:

    After the Task Management response indicates Task Management function
    completion, the initiator will not receive any additional responses
    from the affected tasks.

"additional responses" means scsi response pdus and data-in with status,
right? Does it also mean R2Ts? I thought it did, so we will just drop
the session when getting all those pdus we thought the target should not
be sending.

If "additional responses" does not mean R2Ts, then what are we supposed
to do? Handle them? Silently drop them? I could not find anything in the
RFC.

The nasty problem with the code and this scenario is that we preallocate
the tasks and itts. Once iscsi_eh_device_reset returns SUCCESS and
cleans up the tasks, the scsi layer can start sending us commands. We
could then allocate a task/itt that was used before and should have been
cleaned up. The target could then send us pdus for the cleaned up
task/itt while we are using the task/itt for a new command. Then Kablewly.
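Mike's open question is what the initiator should do with an R2T whose ITT no longer belongs to a running task. The sketch below only makes that decision explicit; the two outcomes for a stray R2T correspond to the options he lists (drop the session, or silently discard it). The types, the `strict` flag and the linear search are hypothetical simplifications, not libiscsi's lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum task_state  { TASK_FREE, TASK_RUNNING, TASK_COMPLETED };
enum r2t_verdict { R2T_ACCEPT, R2T_IGNORE, R2T_DROP_SESSION };

struct task {
    uint32_t        itt;
    enum task_state state;
};

/*
 * Decide what to do with an incoming R2T. Only a task that is still running
 * may accept it; anything else is a stray PDU for a task the TMF already
 * cleaned up. `strict` selects between treating that as a protocol error
 * (drop the session) and silently discarding the R2T.
 */
static enum r2t_verdict classify_r2t(const struct task *tasks, size_t n,
                                     uint32_t itt, bool strict)
{
    assert(n == 0 || tasks != NULL);
    for (size_t i = 0; i < n; i++)
        if (tasks[i].itt == itt && tasks[i].state == TASK_RUNNING)
            return R2T_ACCEPT;
    return strict ? R2T_DROP_SESSION : R2T_IGNORE;
}
```

Using ITTs from Hannes's log, an R2T for 0x54 while that task is still running is accepted, whereas one for 0x5d after the reset has completed the task yields R2T_DROP_SESSION in strict mode and R2T_IGNORE otherwise.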
LUN Reset TMF and R2T
Hi all,

while running my device-reset testcase I've come across this:

Jul 28 12:46:08 tyne kernel: session1: iscsi_eh_device_reset LU Reset [sc 8800731e9480 lun 6]
Jul 28 12:46:08 tyne kernel: session1: iscsi_exec_task_mgmt_fn tmf set timeout
Jul 28 12:46:08 tyne kernel: session1: mgmtpdu [op 0x2 hdr->itt 0x69 datalen 0]
Jul 28 12:46:08 tyne kernel: connection1:0: mgmtpdu [itt 0x69 task 88007b022800] xmit
Jul 28 12:46:08 tyne kernel: connection1:0: tmf rsp [itt 0x69] response 0 state 1
Jul 28 12:46:08 tyne kernel: session1: iscsi_suspend_tx suspend Tx
Jul 28 12:46:08 tyne kernel: session1: fail_scsi_tasks failing sc 88006fd20380 itt 0x54 state 3
Jul 28 12:46:08 tyne kernel: session1: fail_scsi_task fail task [sc 88006fd20380 lun 6 itt x54] state 3
Jul 28 12:46:08 tyne kernel: session1: fail_scsi_tasks failing sc 88007119b880 itt 0x5d state 3
Jul 28 12:46:08 tyne kernel: session1: fail_scsi_task fail task [sc 88007119b880 lun 6 itt x5d] state 3
Jul 28 12:46:08 tyne kernel: session1: fail_scsi_tasks failing sc 88007116ec80 itt 0x60 state 3
Jul 28 12:46:08 tyne kernel: session1: fail_scsi_task fail task [sc 88007116ec80 lun 6 itt x60] state 3
Jul 28 12:46:08 tyne kernel: session1: fail_scsi_tasks failing sc 880079dd8180 itt 0x61 state 3
Jul 28 12:46:08 tyne kernel: session1: fail_scsi_task fail task [sc 880079dd8180 lun 6 itt x61] state 3
Jul 28 12:46:08 tyne kernel: connection1:0: invalid itt 0x5d in R2T hdr
Jul 28 12:46:08 tyne kernel: session1: iscsi_start_tx resume Tx
Jul 28 12:46:08 tyne kernel: session1: iscsi_eh_device_reset dev reset result = SUCCESS
Jul 28 12:46:08 tyne kernel: connection1:0: invalid itt 0x60 in R2T hdr
Jul 28 12:46:08 tyne kernel: connection1:0: invalid itt 0x61 in R2T hdr

As you can see, we're receiving R2Ts for tasks we've just aborted :-(

Looking closely, I don't _actually_ think that we've received them
out-of-order (which would be a violation of the RFC). The problem seems to
be our skb handling (again):

We're reading an skb and calling the handler function once the PDU is
ready. However, we're _not_ checking if there is more data to be read from
the socket. So it looks to me as if we're first reading the TMF response,
aborting all tasks, and then continuing to read PDUs for tasks which we
just aborted. Bad.

Currently I don't have any good idea how to solve this, so this is just a
heads-up that something is awry here.

Cheers,

Hannes
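Hannes's initial suspicion (later traced to iscsi_xmit_task() instead) was that the receive path handles one PDU and then stops without checking whether more data is already buffered. The sketch below shows the pattern of draining every complete PDU from a receive buffer. It relies on the iSCSI framing rules that the Basic Header Segment is 48 bytes, byte 4 holds TotalAHSLength in 4-byte words, bytes 5 to 7 hold the 24-bit DataSegmentLength, and the data segment is padded to a 4-byte boundary; it assumes no header or data digests were negotiated. drain_pdus(), pdu_len() and the example handler are hypothetical, not libiscsi_tcp functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ISCSI_BHS_LEN 48u  /* Basic Header Segment is always 48 bytes */

/* Total length of the PDU whose BHS starts at `bhs`, assuming no digests. */
static size_t pdu_len(const uint8_t *bhs)
{
    size_t ahs  = (size_t)bhs[4] * 4u;  /* TotalAHSLength, in 4-byte words */
    size_t data = ((size_t)bhs[5] << 16) | ((size_t)bhs[6] << 8) | bhs[7];
    size_t pad  = (4u - (data & 3u)) & 3u;  /* data segment padded to 4 bytes */

    return ISCSI_BHS_LEN + ahs + data + pad;
}

typedef void (*pdu_handler)(const uint8_t *pdu, size_t len, void *ctx);

/*
 * Hand every complete PDU in buf[0..len) to `handle` and return how many
 * bytes were consumed. A trailing partial PDU is left for the next read.
 */
static size_t drain_pdus(const uint8_t *buf, size_t len,
                         pdu_handler handle, void *ctx)
{
    size_t off = 0;

    while (len - off >= ISCSI_BHS_LEN) {
        size_t n = pdu_len(buf + off);
        if (len - off < n)
            break;  /* header seen, but data segment not fully arrived */
        handle(buf + off, n, ctx);
        off += n;
    }
    return off;
}

/* Example handler: record the opcode (low 6 bits of byte 0) of each PDU. */
struct opcode_log {
    uint8_t ops[16];
    size_t  count;
};

static void record_opcode(const uint8_t *pdu, size_t len, void *ctx)
{
    struct opcode_log *rec = ctx;

    assert(len >= ISCSI_BHS_LEN);
    if (rec->count < sizeof rec->ops)
        rec->ops[rec->count++] = pdu[0] & 0x3f;
}
```

Fed a TMF Response, an R2T and a Data-In carrying 3 data bytes (padded to 4), followed by the first 10 bytes of another header, drain_pdus() dispatches the three complete PDUs in order and reports 148 bytes consumed, leaving the partial header for the next read.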