Mike Christie wrote:
> Mike Christie wrote:
>> Mike Christie wrote:
>>> Mike Christie wrote:
>>>> Mike Christie wrote:
>>>>> Hannes Reinecke wrote:
>>>>>> Mike Christie wrote:
>>>>>>>> The second patch is the more important one, as it
>>>>>>>> fixes an error during LUN Reset handling in the
>>>>>>>> initiator. When sending a LUN Reset during an
>>>>>>>> ongoing R2T transfer, we're suspending Tx and
>>>>>>>> aborting all _SCSI_ tasks. However, once we're
>>>>>>>> done there we're resuming Tx and the R2T transfer
>>>>>>>> will happily continue. So we should rather be
>>>>>>> This should not be happening. When iscsi_suspend_tx returns the tx 
>>>>>>> thread has stopped so we know there are no users accessing the task 
>>>>>>> (well, there could be if a target is sending a tmf response then a r2t, 
>>>>>>> but if the target is following the rfc there should not be).
>>>>>>>
>>>>>>> So when fail_scsi_tasks calls
>>>>>>>
>>>>>>> fail_scsi_task ->iscsi_complete_task (this will cleanup conn->task if 
>>>>>>> this is the same task) -> __iscsi_put_task
>>>>>>>
>>>>>>> this should be the last put on the task and that should release it 
>>>>>>> calling iscsi_free_task which should call cleanup_task to kill any 
>>>>>>> pending r2t handling and it would remove it from the requeue list.
>>>>>>>
>>>>>>> If we are sending a data-out for a task that has had fail_scsi_task 
>>>>>>> ->iscsi_complete_task -> __iscsi_put_task called for it then we are in 
>>>>>>> bigger trouble because the last put should have been called on it and 
>>>>>>> we 
>>>>>>>   are accessing a bad task.
>>>>>>>
>>>>>> This is the log I'm getting:
>>>>>>
>>>>>>
>>>>>> Jul 29 10:34:48 tyne kernel:  session1: iscsi_eh_device_reset LU Reset 
>>>>>> [sc ffff88007b94d080 lun 6]
>>>>>> Jul 29 10:34:48 tyne kernel:  session1: iscsi_exec_task_mgmt_fn tmf set 
>>>>>> timeout
>>>>>> Jul 29 10:34:48 tyne kernel:  connection1:0: task itt 0x3a lun 6 abort 
>>>>>> transfer
>>>>>> Jul 29 10:34:48 tyne kernel:  session1: mgmtpdu [op 0x2 hdr->itt 0x5d 
>>>>>> datalen 0]
>>>>>> Jul 29 10:34:48 tyne kernel:  connection1:0: mgmtpdu [itt 0x5d task 
>>>>>> ffff88007a01fc00] xmit
>>>>>> Jul 29 10:34:48 tyne kernel:  connection1:0: tmf rsp [itt 0x5d] response 
>>>>>> 0 state 1
>>>>>> Jul 29 10:34:48 tyne kernel:  connection1:0: task itt 0x72 lun 6 abort 
>>>>>> transfer
>>>>>> Jul 29 10:34:48 tyne kernel:  session1: iscsi_suspend_tx suspend Tx
>>>>>> Jul 29 10:34:48 tyne kernel:  session1: iscsi_complete_task task itt 
>>>>>> 0x72 sc ffff88007b5bc580 still active
>>>>>> Jul 29 10:34:48 tyne kernel:  connection1:0: task itt 0x57 lun 6 abort 
>>>>>> transfer
>>>>>> Jul 29 10:34:48 tyne kernel:  connection1:0: task itt 0x59 lun 6 abort 
>>>>>> transfer
>>>>>> Jul 29 10:34:48 tyne kernel:  session1: Tx suspended!
>>>>>>
>>>>>> So we're indeed would have continued the R2T task (itt 0x57 and itt 
>>>>>> 0x59) even though we've
>>>>>> already received a valid TMF response.
>>>>>> So I'm afraid it's us ...
>>>>> Ah, I misunderstood you. I do not think it has to do with the cleanup 
>>>>> still leaving r2ts. I am not sure where you are putting printks, but I 
>>>>> think it is this:
>>>>>
>>>>>          while (!list_empty(&conn->requeue)) {
>>>>>                  if (conn->session->fast_abort && conn->tmf_state != 
>>>>> TMF_INITIAL)
>>>>>                          break;
>>>>>
>>>>> Once the tmf completes, we will start sending data again.
>>>>>
>>>> Ooops. I am too sleepy. Ignore that. I am wrong there.
>>>>
>>> I guess if fast_abort is 0 though, we will hit this problem. And we will 
>>> send data-outs when getting tmf responses as well as when we are sending 
>>> the tmf.
>>
>>
>> I think the problem is wording like in 10.5.1:
>>
>>     For ABORT TASK SET and CLEAR TASK SET, the issuing initiator MUST
>>     continue to respond to all valid target transfer tags (received via
>>     R2T, Text Response, NOP-In, or SCSI Data-In PDUs) related to the
>>     affected task set, even after issuing the task management request.
>>
>> I think in some other doc (probably the one Mathew and Ulrich mentioned) 
>> there is wording about doing similar for abort and lu resets.
>>
>> The things is that I think half of targets want us to respond to r2ts 
>> and half do not. This is where the fast_abort comes from. If set then we 
>> reply to r2ts and if not set we do not. I think once we get a successful 
> 
> Fudge. I am really going to be now. I mean if it is set we do not reply 
> to r2ts. If not set then we reply.
> 
Actually, I think it's a race condition:

drivers/scsi/libiscsi.c:iscsi_eh_device_reset()
        rc = SUCCESS;
        spin_unlock_bh(&session->lock);

        iscsi_suspend_tx(conn);

So the workqueue thread could wedge in after we've
unlocked the session lock and start sending data
even though we're meant to suspend transmitting here.

Will be trying it.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                   zSeries & Storage
h...@suse.de                          +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~----------~----~----~----~------~----~------~--~---

Reply via email to