Hannes Reinecke wrote:
> Mike Christie wrote:
>> Mike Christie wrote:
>>> Hannes Reinecke wrote:
>>>> Sigh. Why do you have to make it so complicated ...
>>>> My patch was easy and simple originally. And now this :-)
>>>>
>>> This gets really ugly if we do it in libiscsi_tcp. I moved the check to 
>>> libiscsi and I changed the abort task test to check for the rtt since 
>>> that works for data outs. I think the attached patch will do what you 
>>> wanted. It is only compile tested.
>>>
>> Bah. The lun and itt are not set for scsi cmd pdus. This should fix it.
>>
>> For the lu reset and requeue (r2t data-out handling) or scsi cmd case, 
>> the task sc lun is always going to be set.
>>
>> For the abort and requeue or cmd case, we only need to check the itt/rtt 
>> for data outs when doing an abort task (the requeue case), because the 
>> cmd has already been sent (iscsi_eh_abort checks for it on the cmd queue 
>> before sending) so there is no point to check at that point (also the 
>> itt is not set for scsi cmd pdus yet).
>>
>> It might be nicer to move the restrictions check after the prep scsi cmd 
>> pdu call but you need the cmdsn scsi_prep_scsi_cmd_pdu patch I sent the 
>> other day.
>>
> Better. Nearly there. I'm running with fast_abort disabled, and occasionally
> I'm getting this (this is now the HP MSA2012i, so we can't really blame
> NetApp here):
> 
[ .. ]

Bah. I _hate_ SMP.
This 'curious' behaviour is actually a race condition.
Changes to 'tmf_state' are not immediately visible to other threads/CPUs.
Two things I did to remedy this:
- Break out from the iscsi_data_xmit() loop once tmf_state is something other
  than TMF_INITIAL, i.e. effectively single-stepping PDUs during TMF
- Make tmf_state atomic.
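
To illustrate the two changes, here is a minimal userspace sketch, not the
actual patch. It uses C11 atomics as a stand-in for the kernel's atomic_t,
and conn_sketch, data_xmit_sketch() and the pending/sent counters are
hypothetical names made up for this example. Only tmf_state and TMF_INITIAL
come from the discussion above; TMF_QUEUED is used here simply as "any state
other than TMF_INITIAL".

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for the libiscsi TMF states. */
enum tmf_state_sketch { TMF_INITIAL = 0, TMF_QUEUED = 1 };

/* Hypothetical, heavily reduced connection structure. */
struct conn_sketch {
	atomic_int tmf_state;	/* atomic, so updates are seen by other threads */
	int pending;		/* PDUs still waiting to be transmitted */
	int sent;		/* PDUs transmitted so far */
};

/*
 * Assumed shape of the transmit loop: send queued PDUs one by one, but
 * re-read tmf_state after every PDU and leave the loop as soon as a TMF
 * is in progress. Returns the number of PDUs sent by this call.
 */
static int data_xmit_sketch(struct conn_sketch *conn)
{
	int count = 0;

	assert(conn != NULL);

	while (conn->pending > 0) {
		/* "Transmit" one PDU. */
		conn->pending--;
		conn->sent++;
		count++;

		/* Single-step while tmf_state is not TMF_INITIAL. */
		if (atomic_load(&conn->tmf_state) != TMF_INITIAL)
			break;
	}
	return count;
}
```

With tmf_state at TMF_INITIAL the loop drains the queue; once another thread
stores a different state, each call sends at most one PDU. Note that this
only narrows the window: a call that has already started can still send its
first PDU after the TMF has gone out.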

After these changes the race window is much smaller:

Jul 31 14:02:20 tyne kernel:  session1: iscsi_eh_device_reset LU Reset [sc ffff8800744cda80 lun 2]
Jul 31 14:02:20 tyne kernel:  session1: iscsi_exec_task_mgmt_fn tmf set timeout
Jul 31 14:02:20 tyne kernel:  session1: mgmtpdu [op 0x2 hdr->itt 0x3 datalen 0]
Jul 31 14:02:20 tyne kernel:  connection1:0: mgmtpdu [itt 0x3 task ffff88007b91cc00] xmit
Jul 31 14:02:20 tyne kernel:  connection1:0: xmit pdu [op 42 itt 0x3 lun 2 count 0]
Jul 31 14:02:20 tyne kernel:  connection1:0: tmf rsp [itt 0x3] response 0 state 1
Jul 31 14:02:20 tyne kernel:  connection1:0: task [op 1 itt 0x4e lun 2] reset.
Jul 31 14:02:20 tyne kernel:  connection1:0: task [op 1 itt 0x4e lun 2] reset.
Jul 31 14:02:20 tyne kernel:  connection1:0: task [op 1 itt 0x4e lun 2] reset.
Jul 31 14:02:20 tyne kernel:  connection1:0: task [op 1 itt 0x4e lun 2] reset.
Jul 31 14:02:20 tyne kernel:  session1: iscsi_suspend_tx suspend Tx

Note the 'xmit pdu' line: we again managed to send a PDU _after_ the tmf command.
The 'count 0' means it is the first PDU transmitted from iscsi_data_xmit(),
i.e. it was scheduled from another work_queue item.

Continuing with investigation.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                   zSeries & Storage
h...@suse.de                          +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)
