Hannes Reinecke wrote:
> Mike Christie wrote:
>> Mike Christie wrote:
>>> Hannes Reinecke wrote:
>>>> Sigh. Why do you have to make is so complicated ...
>>>> My patch was easy and simple originally. And now this :-)
>>>>
>>> This gets really ugly if we do it in libiscsi_tcp. I moved the check to
>>> libiscsi and I changed the abort task test to check for the rtt since
>>> that works for data outs. I think the attached patch will do what you
>>> wanted. It is only compile tested.
>>>
>> Bah. The lun and itt is not set for scsi cmd pdus. This should fix it.
>>
>> For the lu reset and requeue (r2t data-out handling) or scsi cmd case,
>> the task sc lun is always going to be set.
>>
>> For the abort and requeue or cmd case, we only need to check the itt/rtt
>> for data outs when doing a abort task (the requeue case), because the
>> cmd has already been sent (iscsi_eh_abort checks for it on the cmd queue
>> before sending) so there is no point to check at that point (also the
>> itt is not set for scsi cmd pdus yet).
>>
>> It might be nicer to move the restrictions check after the prep scsi cmd
>> pdu call but you need the cmdsn scsi_prep_scsi_cmd_pdu patch I sent the
>> other day.
>>
> Better. Nearly there. I'm running with fast_abort disabled, and occasionally
> I'm getting this (this is now the HP MSA2012i, so we can't really blame
> NetApp here):
> [ .. ]

Bah. I _hate_ SMP. This 'curious' behaviour is actually a race condition.
Changes to 'tmf_state' are not reflected to other threads/cpus/whatever.
Two things I did to remedy this:
- Break out from the iscsi_data_xmit() loop once tmf_state is something
  other than TMF_INITIAL, ie effectively single-stepping PDUs during TMF
- Make tmf_state atomic.

After these changes the race window is much smaller:

Jul 31 14:02:20 tyne kernel: session1: iscsi_eh_device_reset LU Reset [sc ffff8800744cda80 lun 2]
Jul 31 14:02:20 tyne kernel: session1: iscsi_exec_task_mgmt_fn tmf set timeout
Jul 31 14:02:20 tyne kernel: session1: mgmtpdu [op 0x2 hdr->itt 0x3 datalen 0]
Jul 31 14:02:20 tyne kernel: connection1:0: mgmtpdu [itt 0x3 task ffff88007b91cc00] xmit
Jul 31 14:02:20 tyne kernel: connection1:0: xmit pdu [op 42 itt 0x3 lun 2 count 0]
Jul 31 14:02:20 tyne kernel: connection1:0: tmf rsp [itt 0x3] response 0 state 1
Jul 31 14:02:20 tyne kernel: connection1:0: task [op 1 itt 0x4e lun 2] reset.
Jul 31 14:02:20 tyne kernel: connection1:0: task [op 1 itt 0x4e lun 2] reset.
Jul 31 14:02:20 tyne kernel: connection1:0: task [op 1 itt 0x4e lun 2] reset.
Jul 31 14:02:20 tyne kernel: connection1:0: task [op 1 itt 0x4e lun 2] reset.
Jul 31 14:02:20 tyne kernel: session1: iscsi_suspend_tx suspend Tx

Note the 'xmit pdu' line. It means that we again managed to send a PDU
_after_ the tmf command. And the 'count 0' means it's the first PDU
transmitted from iscsi_data_xmit(), ie scheduled from another work_queue
item.

Continuing with investigation.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                   zSeries & Storage
h...@suse.de                          +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)
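
For illustration only, here is a rough userspace model of the two changes described in this mail (an atomic tmf_state, plus leaving the transmit loop once it is no longer TMF_INITIAL). It is not the actual patch and not libiscsi code: it uses C11 <stdatomic.h> rather than the kernel's atomic API, and struct conn_model, data_xmit_model(), the pending counter and the TMF_QUEUED value are hypothetical stand-ins. The mail does not say where in the loop the check sits; this sketch assumes it runs after each PDU, so that at most one PDU goes out per invocation while a TMF is active.

```c
#include <stdatomic.h>

/* Only the names tmf_state and TMF_INITIAL come from the mail;
 * everything else here is a hypothetical stand-in. */
enum { TMF_INITIAL = 0, TMF_QUEUED = 1 };

struct conn_model {
    atomic_int tmf_state; /* atomic, so a store from another thread is seen here */
    int pending;          /* number of data PDUs waiting to be sent */
};

/* Model of one invocation of the transmit loop.
 * Returns the number of PDUs "sent" in this invocation. */
static int data_xmit_model(struct conn_model *c)
{
    int sent = 0;

    while (c->pending > 0) {
        c->pending--; /* stands in for transmitting one PDU */
        sent++;

        /* Assumed placement of the check: once a TMF is in progress,
         * leave the loop after a single PDU (single-stepping). */
        if (atomic_load(&c->tmf_state) != TMF_INITIAL)
            break;
    }
    return sent;
}
```

With tmf_state at TMF_INITIAL the model drains the whole queue in one call; once another thread stores a different state, each call sends a single PDU and returns. As the log above shows, this only narrows the window: a PDU scheduled from another work item can still go out after the TMF.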