Mike Christie wrote:
> 
>> The second patch is the more important one, as it
>> fixes an error during LUN Reset handling in the
>> initiator. When sending a LUN Reset during an
>> ongoing R2T transfer, we're suspending Tx and
>> aborting all _SCSI_ tasks. However, once we're
>> done there we're resuming Tx and the R2T transfer
>> will happily continue. So we should rather be
> 
> This should not be happening. When iscsi_suspend_tx returns the tx 
> thread has stopped so we know there are no users accessing the task 
> (well, there could be if a target is sending a tmf response then a r2t, 
> but if the target is following the rfc there should not be).
> 
> So when fail_scsi_tasks calls
> 
> fail_scsi_task ->iscsi_complete_task (this will cleanup conn->task if 
> this is the same task) -> __iscsi_put_task
> 
> this should be the last put on the task and that should release it 
> calling iscsi_free_task which should call cleanup_task to kill any 
> pending r2t handling and it would remove it from the requeue list.
> 
> If we are sending a data-out for a task that has had fail_scsi_task 
> ->iscsi_complete_task -> __iscsi_put_task called for it then we are in 
> bigger trouble because the last put should have been called on it and we 
> are accessing a bad task.
> 
This is the log I'm getting:


Jul 29 10:34:48 tyne kernel:  session1: iscsi_eh_device_reset LU Reset [sc ffff88007b94d080 lun 6]
Jul 29 10:34:48 tyne kernel:  session1: iscsi_exec_task_mgmt_fn tmf set timeout
Jul 29 10:34:48 tyne kernel:  connection1:0: task itt 0x3a lun 6 abort transfer
Jul 29 10:34:48 tyne kernel:  session1: mgmtpdu [op 0x2 hdr->itt 0x5d datalen 0]
Jul 29 10:34:48 tyne kernel:  connection1:0: mgmtpdu [itt 0x5d task ffff88007a01fc00] xmit
Jul 29 10:34:48 tyne kernel:  connection1:0: tmf rsp [itt 0x5d] response 0 state 1
Jul 29 10:34:48 tyne kernel:  connection1:0: task itt 0x72 lun 6 abort transfer
Jul 29 10:34:48 tyne kernel:  session1: iscsi_suspend_tx suspend Tx
Jul 29 10:34:48 tyne kernel:  session1: iscsi_complete_task task itt 0x72 sc ffff88007b5bc580 still active
Jul 29 10:34:48 tyne kernel:  connection1:0: task itt 0x57 lun 6 abort transfer
Jul 29 10:34:48 tyne kernel:  connection1:0: task itt 0x59 lun 6 abort transfer
Jul 29 10:34:48 tyne kernel:  session1: Tx suspended!

So we would indeed have continued the R2T tasks (itt 0x57 and itt 0x59)
even though we had already received a valid TMF response.
So I'm afraid it's us ...

I really do think part of the problem is that we are setting the SUSPEND
bit without holding the session lock. We _check_ it under the lock in
iscsi_xmit(), but we set it without the lock, so the check and the
transmit that follows it are not atomic with respect to the suspend.
That of course opens the door to all sorts of race conditions.
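
To make the locking argument concrete, here is a minimal user-space
sketch. It is not libiscsi code and not the proposed patch: the
model_session struct and the model_* functions are hypothetical
stand-ins, with a pthread mutex playing the role of the session lock and
a bool playing the role of the SUSPEND bit. It only shows the pattern
where the setter takes the same lock as the checker, so that once the
suspend call returns no further transmit can slip through.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the session state; not the kernel structures. */
struct model_session {
    pthread_mutex_t lock;  /* stands in for the session lock */
    bool suspended;        /* stands in for the SUSPEND bit */
    int pdus_sent;         /* counts transmits that were allowed */
};

static void model_session_init(struct model_session *s)
{
    assert(s != NULL);
    pthread_mutex_init(&s->lock, NULL);
    s->suspended = false;
    s->pdus_sent = 0;
}

/* Set the flag under the same lock the transmit path uses for its check. */
static void model_suspend_tx(struct model_session *s)
{
    pthread_mutex_lock(&s->lock);
    s->suspended = true;
    pthread_mutex_unlock(&s->lock);
}

static void model_resume_tx(struct model_session *s)
{
    pthread_mutex_lock(&s->lock);
    s->suspended = false;
    pthread_mutex_unlock(&s->lock);
}

/*
 * Check and "send" inside one critical section. Because the setter also
 * takes the lock, it cannot flip the flag between the check and the send.
 * Returns true if a PDU was sent, false if Tx was suspended.
 */
static bool model_xmit(struct model_session *s)
{
    bool sent = false;

    pthread_mutex_lock(&s->lock);
    if (!s->suspended) {
        s->pdus_sent++;
        sent = true;
    }
    pthread_mutex_unlock(&s->lock);
    return sent;
}
```

If model_suspend_tx() wrote the flag without taking the lock, a
concurrent model_xmit() could pass its check, have the flag set
underneath it, and still send, which is the kind of window I suspect
we are hitting.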

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                   zSeries & Storage
h...@suse.de                          +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)
