Re: LUN Reset TMF and R2T

Hannes Reinecke Wed, 29 Jul 2009 02:06:31 -0700

Steven Hayter wrote:
> On 28/07/2009 06:14 pm, Mike Christie wrote:
>> On 07/28/2009 06:53 AM, Hannes Reinecke wrote:
>>> Hi all,
>>>
>>> when running my device-reset testcase I've come across this:
>>>
>>> Jul 28 12:46:08 tyne kernel:  session1: iscsi_eh_device_reset LU Reset [sc ffff8800731e9480 lun 6]
>>> Jul 28 12:46:08 tyne kernel:  session1: iscsi_exec_task_mgmt_fn tmf set timeout
>>> Jul 28 12:46:08 tyne kernel:  session1: mgmtpdu [op 0x2 hdr->itt 0x69 datalen 0]
>>> Jul 28 12:46:08 tyne kernel:  connection1:0: mgmtpdu [itt 0x69 task ffff88007b022800] xmit
>>> Jul 28 12:46:08 tyne kernel:  connection1:0: tmf rsp [itt 0x69] response 0 state 1
>>> Jul 28 12:46:08 tyne kernel:  session1: iscsi_suspend_tx suspend Tx
>>> Jul 28 12:46:08 tyne kernel:  session1: fail_scsi_tasks failing sc ffff88006fd20380 itt 0x54 state 3
>>> Jul 28 12:46:08 tyne kernel:  session1: fail_scsi_task fail task [sc ffff88006fd20380 lun 6 itt x54] state 3
>>> Jul 28 12:46:08 tyne kernel:  session1: fail_scsi_tasks failing sc ffff88007119b880 itt 0x5d state 3
>>> Jul 28 12:46:08 tyne kernel:  session1: fail_scsi_task fail task [sc ffff88007119b880 lun 6 itt x5d] state 3
>>> Jul 28 12:46:08 tyne kernel:  session1: fail_scsi_tasks failing sc ffff88007116ec80 itt 0x60 state 3
>>> Jul 28 12:46:08 tyne kernel:  session1: fail_scsi_task fail task [sc ffff88007116ec80 lun 6 itt x60] state 3
>>> Jul 28 12:46:08 tyne kernel:  session1: fail_scsi_tasks failing sc ffff880079dd8180 itt 0x61 state 3
>>> Jul 28 12:46:08 tyne kernel:  session1: fail_scsi_task fail task [sc ffff880079dd8180 lun 6 itt x61] state 3
>>> Jul 28 12:46:08 tyne kernel:  connection1:0: invalid itt 0x5d in R2T hdr
>>> Jul 28 12:46:08 tyne kernel:  session1: iscsi_start_tx resume Tx
>>> Jul 28 12:46:08 tyne kernel:  session1: iscsi_eh_device_reset dev reset result = SUCCESS
>>> Jul 28 12:46:08 tyne kernel:  connection1:0: invalid itt 0x60 in R2T hdr
>>> Jul 28 12:46:08 tyne kernel:  connection1:0: invalid itt 0x61 in R2T hdr
>>>
>>> As you can see, we're receiving R2Ts for tasks we've just aborted :-(
>>>
>>> Looking closely, I don't _actually_ think that we've received them
>>> out-of-order (which would be a violation of the RFC). The problem
>>> seems to be our skb handling (again):
>>>
>>> We're reading an skb and calling the handler function once the PDU
>>> is ready. However, we're _not_ checking whether there is more data
>>> to be read from the socket. So it looks to me as if we're first
>>> reading the TMF response, aborting all tasks, and then continuing to
>>> read PDUs for tasks which we have just aborted.
>> We will definitely do this. You mean the target sends a tmf response
>> that indicates it cleaned up some tasks, then it sends pdus for the
>> tasks that should have been affected by the TMF, right? If so I do not
>> think targets are allowed to do this. In 3.5.1.4 we have:
>>
>>      After the Task Management response indicates Task Management function
>>      completion, the initiator will not receive any additional responses
>>      from the affected tasks.
>>
>> "additional responses" means scsi response pdus and data-in with status,
>> right? Does it also mean R2Ts? I thought it did, so we will just drop
>> the session when getting all those pdus we thought the target should not
>> be sending.
>>
>> If "additional responses" does not mean R2Ts, then what are we supposed
>> to do? Handle them? Silently drop them? I could not find anything in the
>> RFC.
>>
>> The nasty problem with the code and this scenario is that we preallocate
>> the tasks and itts. Once iscsi_eh_device_reset returns SUCCESS and
>> cleans up the tasks, the scsi layer can start sending us commands. We
>> could then allocate a task/itt that was used before and should have been
>> cleaned up. The target could then send us pdus for the cleaned up
>> task/itt while we are using the task/itt for a new command. Then Kablewly.
> 
> It does look confusing, I think RFC 5048, Section 4.1.2. "Clarified 
> Multi-Task Abort Semantics", gives guidelines as to what should happen.
> 
> Every way I read it, the target shouldn't be sending R2Ts for tasks which 
> are part of the affected task set.  (those equal or exceeding the CmdSN 
> of the reset TMF).  But I've been wrong in the past.
> 
Never mind, found the reason.
Totally different story, but we were the culprit nevertheless.


iscsi_xmit_task() runs in a loop, disregarding any TMF state.
So we will happily continue sending the data solicited by R2Ts
even though the LU Reset has already finished.

Patch to follow.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                   zSeries & Storage
h...@suse.de                          +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)
