Re: LUN Reset TMF and R2T

2009-07-29 Hannes Reinecke

Steven Hayter wrote:
> On 28/07/2009 06:14 pm, Mike Christie wrote:
>> On 07/28/2009 06:53 AM, Hannes Reinecke wrote:
>>> Hi all,
>>>
>>> while running my device-reset testcase I've come across this:
>>>
>>> [kernel log snipped]
>>>
>>> As you can see, we're receiving R2Ts for tasks we've just aborted :-(
>>>
>>> Looking closely, I don't _actually_ think we've received them out of
>>> order (which would be a violation of the RFC). The problem seems to be
>>> our skb handling (again):
>>>
>>> We're reading an skb and calling the handler function once the PDU is
>>> ready. However, we're _not_ checking whether there is more data to be
>>> read from the socket. So it looks to me as if we're first reading the
>>> TMF response, aborting all tasks, and then continuing to read PDUs for
>>> tasks which we just aborted.
>> We will definitely do this. You mean the target sends a tmf response
>> that indicates it cleaned up some tasks, then it sends pdus for the
>> tasks that should have been affected by the TMF, right? If so I do not
>> think targets are allowed to do this. In 3.5.1.4 we have:
>>
>>  After the Task Management response indicates Task Management function
>>  completion, the initiator will not receive any additional responses
>>  from the affected tasks.
>>
>> "additional responses" means scsi response pdus and data-in with status,
>> right? Does it also mean R2Ts? I thought it did, so we will just drop
>> the session when getting all those pdus we thought the target should not
>> be sending.
>>
>> If "additional responses" does not mean R2Ts, then what are we supposed
>> to do? Handle them? Silently drop them? I could not find anything in the
>> RFC.
>>
>> The nasty problem with the code and this scenario is that we preallocate
>> the tasks and itts. Once iscsi_eh_device_reset returns SUCCESS and
>> cleans up the tasks, the scsi layer can start sending us commands. We
>> could then allocate a task/itt that was used before and should have been
>> cleaned up. The target could then send us pdus for the cleaned up
>> task/itt while we are using the task/itt for a new command. Then Kablewly.
> 
> It does look confusing. I think RFC 5048, Section 4.1.2, "Clarified
> Multi-Task Abort Semantics", gives guidelines as to what should happen.
> 
> Every way I read it, the target shouldn't be sending R2Ts for tasks which
> are part of the affected task set (those equal to or exceeding the CmdSN
> of the reset TMF). But I've been wrong in the past.
> 
Never mind, I found the reason.
It's a totally different story, but we were the culprit nevertheless.

iscsi_xmit_task() runs in a loop, disregarding any TMF state.
So we will happily continue sending R2T transfers even though
the LU Reset has already finished.
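A minimal sketch of the kind of check this implies, assuming a heavily simplified model: the transmit loop consults the TMF state and each task's state on every iteration instead of draining the queue unconditionally. All names here (`demo_conn`, `demo_task`, `demo_may_xmit`, `demo_xmit_pending`) are hypothetical; this is not the libiscsi code and not the actual patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model; not the libiscsi data structures. */
enum demo_tmf_state { DEMO_TMF_NONE, DEMO_TMF_QUEUED };
enum demo_task_state { DEMO_TASK_RUNNING, DEMO_TASK_ABORTED };

struct demo_task {
	unsigned int lun;
	unsigned int itt;
	enum demo_task_state state;
	bool has_r2t_data;	/* data still owed to the target for an R2T */
	bool data_sent;
};

struct demo_conn {
	enum demo_tmf_state tmf_state;
	unsigned int tmf_lun;	/* LUN targeted by the outstanding LU reset */
};

/* Return true if it is still safe to send this task's data. */
static bool demo_may_xmit(const struct demo_conn *conn,
			  const struct demo_task *task)
{
	if (task->state != DEMO_TASK_RUNNING)
		return false;	/* already failed by the reset handler */
	if (conn->tmf_state == DEMO_TMF_QUEUED && task->lun == conn->tmf_lun)
		return false;	/* an LU reset covering this task is in flight */
	return true;
}

/* Transmit loop that checks TMF/task state for every task instead of
 * blindly sending everything queued. Returns the number of tasks sent. */
static size_t demo_xmit_pending(const struct demo_conn *conn,
				struct demo_task *tasks, size_t ntasks)
{
	size_t sent = 0;

	assert(conn != NULL && (tasks != NULL || ntasks == 0));
	for (size_t i = 0; i < ntasks; i++) {
		struct demo_task *t = &tasks[i];

		if (!t->has_r2t_data || t->data_sent)
			continue;
		if (!demo_may_xmit(conn, t))
			continue;
		t->data_sent = true;	/* stand-in for building a Data-Out PDU */
		sent++;
	}
	return sent;
}
```

Skipping non-running tasks covers the window after the reset handler has failed them; skipping tasks on the reset LUN while the TMF is outstanding covers the window before the response arrives.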

Patch to follow.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries & Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)

--~--~-~--~~~---~--~~
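You received this message because you are subscribed to the Google Groups
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to
open-iscsi+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---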

Re: LUN Reset TMF and R2T

2009-07-28 Steven Hayter

On 28/07/2009 06:14 pm, Mike Christie wrote:
> On 07/28/2009 06:53 AM, Hannes Reinecke wrote:
>> Hi all,
>>
>> while running my device-reset testcase I've come across this:
>>
>> [kernel log snipped]
>>
>> As you can see, we're receiving R2Ts for tasks we've just aborted :-(
>>
>> Looking closely, I don't _actually_ think we've received them out of
>> order (which would be a violation of the RFC). The problem seems to be
>> our skb handling (again):
>>
>> We're reading an skb and calling the handler function once the PDU is
>> ready. However, we're _not_ checking whether there is more data to be
>> read from the socket. So it looks to me as if we're first reading the
>> TMF response, aborting all tasks, and then continuing to read PDUs for
>> tasks which we just aborted.
>
> We will definitely do this. You mean the target sends a tmf response
> that indicates it cleaned up some tasks, then it sends pdus for the
> tasks that should have been affected by the TMF, right? If so I do not
> think targets are allowed to do this. In 3.5.1.4 we have:
>
>  After the Task Management response indicates Task Management function
>  completion, the initiator will not receive any additional responses
>  from the affected tasks.
>
> "additional responses" means scsi response pdus and data-in with status,
> right? Does it also mean R2Ts? I thought it did, so we will just drop
> the session when getting all those pdus we thought the target should not
> be sending.
>
> If "additional responses" does not mean R2Ts, then what are we supposed
> to do? Handle them? Silently drop them? I could not find anything in the
> RFC.
>
> The nasty problem with the code and this scenario is that we preallocate
> the tasks and itts. Once iscsi_eh_device_reset returns SUCCESS and
> cleans up the tasks, the scsi layer can start sending us commands. We
> could then allocate a task/itt that was used before and should have been
> cleaned up. The target could then send us pdus for the cleaned up
> task/itt while we are using the task/itt for a new command. Then Kablewly.

It does look confusing. I think RFC 5048, Section 4.1.2, "Clarified
Multi-Task Abort Semantics", gives guidelines as to what should happen.

Every way I read it, the target shouldn't be sending R2Ts for tasks which
are part of the affected task set (those equal to or exceeding the CmdSN
of the reset TMF). But I've been wrong in the past.
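Whichever side of the TMF's CmdSN the affected task set lies on, the comparison itself has to survive CmdSN wrapping at 2^32: iSCSI compares sequence numbers with serial number arithmetic (RFC 1982, SERIAL_BITS = 32). A minimal C sketch; `sn_lt`, `sn_gt` and `cmdsn_at_or_after` are hypothetical helper names, and the last one simply encodes the reading stated above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Serial number arithmetic with SERIAL_BITS = 32 (RFC 1982). */
#define SN_HALF 0x80000000u

/* True if s1 < s2 in serial number space. Values exactly 2^31 apart are
 * undefined in RFC 1982 and compare as neither less nor greater here. */
static bool sn_lt(uint32_t s1, uint32_t s2)
{
	return (s1 < s2 && s2 - s1 < SN_HALF) ||
	       (s1 > s2 && s1 - s2 > SN_HALF);
}

/* True if s1 > s2 in serial number space. */
static bool sn_gt(uint32_t s1, uint32_t s2)
{
	return (s1 < s2 && s2 - s1 > SN_HALF) ||
	       (s1 > s2 && s1 - s2 < SN_HALF);
}

/* Hypothetical helper: is a task's CmdSN equal to or after the TMF's? */
static bool cmdsn_at_or_after(uint32_t task_cmdsn, uint32_t tmf_cmdsn)
{
	return task_cmdsn == tmf_cmdsn || sn_gt(task_cmdsn, tmf_cmdsn);
}
```

For example, a task with CmdSN 0x10 counts as after a TMF with CmdSN 0xfffffff0 once the counter has wrapped, which a plain `>=` on the raw values would get wrong.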

Steve




Re: LUN Reset TMF and R2T

2009-07-28 Mike Christie

Mike Christie wrote:
> 
> The nasty problem with the code and this scenario is that we preallocate
> the tasks and itts. Once iscsi_eh_device_reset returns SUCCESS and 
> cleans up the tasks, the scsi layer can start sending us commands. We 
> could then allocate a task/itt that was used before and should have been 
> cleaned up. The target could then send us pdus for the cleaned up 
> task/itt while we are using the task/itt for a new command. Then Kablewly.
> 

I think we can make this safer by separating the itt allocation from
the task allocation. If we just let the itt increase and roll over, then
we should not be getting collisions when the above happens.
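A rough C sketch of that idea, assuming a small preallocated slot table and one free-running 32-bit ITT counter that wraps and skips 0xffffffff (a value RFC 3720 reserves). The structure and function names are hypothetical, and lookup is a linear scan only to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_NSLOTS 4
#define DEMO_RESERVED_ITT 0xffffffffu	/* never assigned to a task */

struct demo_slot {
	bool in_use;
	uint32_t itt;
};

struct demo_pool {
	struct demo_slot slot[DEMO_NSLOTS];	/* preallocated task slots */
	uint32_t next_itt;			/* free-running ITT counter */
};

/* Next ITT from the counter; unsigned overflow wraps to 0. */
static uint32_t demo_next_itt(struct demo_pool *p)
{
	uint32_t itt = p->next_itt++;

	if (itt == DEMO_RESERVED_ITT)
		itt = p->next_itt++;
	return itt;
}

/* Take a free slot and tag it with a fresh ITT; returns slot index or -1. */
static int demo_alloc(struct demo_pool *p, uint32_t *itt_out)
{
	for (int i = 0; i < DEMO_NSLOTS; i++) {
		if (!p->slot[i].in_use) {
			p->slot[i].in_use = true;
			p->slot[i].itt = demo_next_itt(p);
			*itt_out = p->slot[i].itt;
			return i;
		}
	}
	return -1;
}

static void demo_free(struct demo_pool *p, int i)
{
	assert(i >= 0 && i < DEMO_NSLOTS && p->slot[i].in_use);
	p->slot[i].in_use = false;
}

/* Map the ITT of a received PDU to a live slot, or -1 if it is stale. */
static int demo_lookup(const struct demo_pool *p, uint32_t itt)
{
	for (int i = 0; i < DEMO_NSLOTS; i++)
		if (p->slot[i].in_use && p->slot[i].itt == itt)
			return i;
	return -1;
}
```

Reusing a freed slot hands out a new ITT, so a late R2T carrying the old ITT no longer matches a live task. Collisions are pushed out until the counter wraps all the way around rather than ruled out entirely.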

So I guess we would just need to decide if the target should be sending
R2Ts at this time and what to do about it, if anything.




Re: LUN Reset TMF and R2T

2009-07-28 Mike Christie

On 07/28/2009 06:53 AM, Hannes Reinecke wrote:
> Hi all,
>
> while running my device-reset testcase I've come across this:
>
> [kernel log snipped]
>
> As you can see, we're receiving R2Ts for tasks we've just aborted :-(
>
> Looking closely, I don't _actually_ think we've received them out of
> order (which would be a violation of the RFC). The problem seems to be
> our skb handling (again):
>
> We're reading an skb and calling the handler function once the PDU is
> ready. However, we're _not_ checking whether there is more data to be
> read from the socket. So it looks to me as if we're first reading the
> TMF response, aborting all tasks, and then continuing to read PDUs for
> tasks which we just aborted.

We will definitely do this. You mean the target sends a tmf response
that indicates it cleaned up some tasks, then it sends pdus for the 
tasks that should have been affected by the TMF, right? If so I do not 
think targets are allowed to do this. In 3.5.1.4 we have:

After the Task Management response indicates Task Management function
completion, the initiator will not receive any additional responses
from the affected tasks.

"additional responses" means scsi response pdus and data-in with status, 
right? Does it also mean R2Ts? I thought it did, so we will just drop 
the session when getting all those pdus we thought the target should not 
be sending.

If "additional responses" does not mean R2Ts, then what are we supposed 
to do? Handle them? Silently drop them? I could not find anything in the 
RFC.
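To make the two options concrete, a hedged C sketch of an R2T receive path that looks the ITT up among live tasks and, when nothing matches, applies one of two hypothetical policies: fail the connection (dropping the session, as described above) or silently discard the PDU. All names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical policies for an R2T whose ITT matches no live task. */
enum demo_stale_policy {
	DEMO_STALE_DROP_SESSION,	/* treat as protocol error, reconnect */
	DEMO_STALE_IGNORE,		/* discard the PDU, keep the session */
};

enum demo_rx_result {
	DEMO_RX_OK,
	DEMO_RX_DROPPED,
	DEMO_RX_CONN_FAILED,
};

/* Simplified lookup over the ITTs of live tasks; returns 1 if found. */
static int demo_itt_is_live(const uint32_t *live, size_t nlive, uint32_t itt)
{
	for (size_t i = 0; i < nlive; i++)
		if (live[i] == itt)
			return 1;
	return 0;
}

static enum demo_rx_result demo_handle_r2t(const uint32_t *live, size_t nlive,
					   uint32_t itt,
					   enum demo_stale_policy policy)
{
	assert(live != NULL || nlive == 0);
	if (demo_itt_is_live(live, nlive, itt))
		return DEMO_RX_OK;	/* normal path: queue Data-Out for the task */
	/* Unknown ITT: the task was already failed, e.g. by an LU reset. */
	return policy == DEMO_STALE_IGNORE ? DEMO_RX_DROPPED : DEMO_RX_CONN_FAILED;
}
```

Silently discarding is only safe if a stale ITT can never match a newly issued command, which is exactly the ITT reuse problem with preallocated tasks.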

The nasty problem with the code and this scenario is that we preallocate
the tasks and itts. Once iscsi_eh_device_reset returns SUCCESS and 
cleans up the tasks, the scsi layer can start sending us commands. We 
could then allocate a task/itt that was used before and should have been 
cleaned up. The target could then send us pdus for the cleaned up 
task/itt while we are using the task/itt for a new command. Then Kablewly.





LUN Reset TMF and R2T

2009-07-28 Hannes Reinecke

Hi all,

while running my device-reset testcase I've come across this:

Jul 28 12:46:08 tyne kernel:  session1: iscsi_eh_device_reset LU Reset [sc 8800731e9480 lun 6]
Jul 28 12:46:08 tyne kernel:  session1: iscsi_exec_task_mgmt_fn tmf set timeout
Jul 28 12:46:08 tyne kernel:  session1: mgmtpdu [op 0x2 hdr->itt 0x69 datalen 0]
Jul 28 12:46:08 tyne kernel:  connection1:0: mgmtpdu [itt 0x69 task 88007b022800] xmit
Jul 28 12:46:08 tyne kernel:  connection1:0: tmf rsp [itt 0x69] response 0 state 1
Jul 28 12:46:08 tyne kernel:  session1: iscsi_suspend_tx suspend Tx
Jul 28 12:46:08 tyne kernel:  session1: fail_scsi_tasks failing sc 88006fd20380 itt 0x54 state 3
Jul 28 12:46:08 tyne kernel:  session1: fail_scsi_task fail task [sc 88006fd20380 lun 6 itt x54] state 3
Jul 28 12:46:08 tyne kernel:  session1: fail_scsi_tasks failing sc 88007119b880 itt 0x5d state 3
Jul 28 12:46:08 tyne kernel:  session1: fail_scsi_task fail task [sc 88007119b880 lun 6 itt x5d] state 3
Jul 28 12:46:08 tyne kernel:  session1: fail_scsi_tasks failing sc 88007116ec80 itt 0x60 state 3
Jul 28 12:46:08 tyne kernel:  session1: fail_scsi_task fail task [sc 88007116ec80 lun 6 itt x60] state 3
Jul 28 12:46:08 tyne kernel:  session1: fail_scsi_tasks failing sc 880079dd8180 itt 0x61 state 3
Jul 28 12:46:08 tyne kernel:  session1: fail_scsi_task fail task [sc 880079dd8180 lun 6 itt x61] state 3
Jul 28 12:46:08 tyne kernel:  connection1:0: invalid itt 0x5d in R2T hdr
Jul 28 12:46:08 tyne kernel:  session1: iscsi_start_tx resume Tx
Jul 28 12:46:08 tyne kernel:  session1: iscsi_eh_device_reset dev reset result = SUCCESS
Jul 28 12:46:08 tyne kernel:  connection1:0: invalid itt 0x60 in R2T hdr
Jul 28 12:46:08 tyne kernel:  connection1:0: invalid itt 0x61 in R2T hdr

As you can see, we're receiving R2Ts for tasks we've just aborted :-(

Looking closely, I don't _actually_ think we've received them out of order
(which would be a violation of the RFC). The problem seems to be our skb
handling (again):

We're reading an skb and calling the handler function once the PDU is ready.
However, we're _not_ checking whether there is more data to be read from the
socket. So it looks to me as if we're first reading the TMF response, aborting
all tasks, and then continuing to read PDUs for tasks which we just aborted.
Bad.
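A toy C model of this hypothesis: PDUs that already sit in one receive buffer are processed strictly in order, the LU reset response fails every task on the LUN at once, and any R2T processed after that finds no live task. The types and names are hypothetical, and this only models the suspected ordering; it is not the open-iscsi receive path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of PDUs already sitting in one receive buffer. */
enum demo_op { DEMO_TMF_RSP, DEMO_R2T };

struct demo_pdu {
	enum demo_op op;
	uint32_t itt;
};

struct demo_task {
	uint32_t itt;
	unsigned int lun;
	bool live;
};

/* Process PDUs strictly in arrival order. On the LU reset response every
 * task on reset_lun is failed at once; later R2Ts for those tasks are then
 * counted as "invalid itt". Returns the number of such R2Ts. */
static size_t demo_process(struct demo_task *tasks, size_t ntasks,
			   unsigned int reset_lun,
			   const struct demo_pdu *pdus, size_t npdus)
{
	size_t invalid = 0;

	assert(tasks != NULL || ntasks == 0);
	for (size_t i = 0; i < npdus; i++) {
		if (pdus[i].op == DEMO_TMF_RSP) {
			for (size_t t = 0; t < ntasks; t++)
				if (tasks[t].lun == reset_lun)
					tasks[t].live = false;
			continue;
		}
		bool found = false;
		for (size_t t = 0; t < ntasks; t++)
			if (tasks[t].live && tasks[t].itt == pdus[i].itt)
				found = true;
		if (!found)
			invalid++;
	}
	return invalid;
}
```

Feeding it the same R2Ts before the TMF response yields no invalid ITTs, which is why the order in which queued PDUs are consumed matters in this scenario.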

Currently I don't have any good idea of how to solve this, so this is just a
heads-up that something is awry here.

Cheers,

Hannes
