Lost active R2T transfers during reset

Hannes Reinecke Tue, 04 Aug 2009 05:24:50 -0700

Hi Mike,

as you might have seen, I finally found the reason the MSA is dropping
the connection. It seems that it follows this section from the RFC:


   For the LOGICAL UNIT RESET function, the target MUST behave as
   dictated by the Logical Unit Reset function in [SAM2].

where SAM2 says:
  When a logical unit is aborting one or more tasks from a SCSI
  initiator port with the TASK ABORTED status it should complete
  all of those tasks before entering additional tasks from that
  SCSI initiator port into the task set.

So the tasks must be _completed_ at the target. Which can be
interpreted as requiring the target to send an ABORT_TASK_SET
to each outstanding task, so that this section applies:

   For ABORT TASK SET and CLEAR TASK SET, the issuing initiator MUST
   continue to respond to all valid target transfer tags (received via
   R2T, Text Response, NOP-In, or SCSI Data-In PDUs) related to the
   affected task set, even after issuing the task management request.
   The issuing initiator SHOULD however terminate (i.e., by setting the
   F-bit to 1) these response sequences as quickly as possible.  The
   target on its part MUST wait for responses on all affected target
   transfer tags before acting on either of these two task management
   requests.  In case all or part of the response sequence is not
   received (due to digest errors) for a valid TTT, the target MAY treat
   it as a case of within-command error recovery class (see Section
   6.1.4.1 Recovery Within-command) if it is supporting
   ErrorRecoveryLevel >= 1, or alternatively may drop the connection to
   complete the requested task set function.

This is clarified by RFC 5048 section 4.1.2:

The initiator iSCSI layer:
 a. MUST continue to respond to each TTT received for the affected
    tasks

[ .. ]
The target iSCSI layer:
 a. MUST wait for responses on currently valid target-transfer tags
    of the affected tasks from the issuing initiator.

Which is exactly what I've seen with the 'ttt tracking' patch:

Aug  4 13:58:10 tyne kernel:  session2: iscsi_eh_device_reset LU Reset [sc ffff88005cf4ba80 lun 1]
Aug  4 13:58:10 tyne kernel:  session2: iscsi_exec_task_mgmt_fn tmf set timeout
Aug  4 13:58:10 tyne kernel:  session2: iscsi_eh_device_reset dev reset result = SUCCESS
Aug  4 13:58:12 tyne kernel:  session2: iscsi_eh_device_reset LU Reset [sc ffff88005cc12880 lun 2]
Aug  4 13:58:12 tyne kernel:  session2: iscsi_exec_task_mgmt_fn tmf set timeout
Aug  4 13:58:12 tyne kernel:  session2: fail_scsi_task task itt 0xe ttt 0xc5cf6a01 sc ffff8800378c9d80 still active
Aug  4 13:58:12 tyne kernel:  session2: fail_scsi_task task itt 0x15 ttt 0x2590d700 sc ffff88007a5c8980 still active
Aug  4 13:58:12 tyne kernel:  session2: fail_scsi_task task itt 0x18 ttt 0x4926d000 sc ffff880078d8da80 still active
Aug  4 13:58:12 tyne kernel:  session2: fail_scsi_task task itt 0x1f ttt 0x89ac9500 sc ffff88007a5de080 still active
Aug  4 13:58:12 tyne kernel:  session2: fail_scsi_task task itt 0x27 ttt 0x7d0d4201 sc ffff8800378c9680 still active
Aug  4 13:58:12 tyne kernel:  session2: fail_scsi_task task itt 0x28 ttt 0x4e2c1b01 sc ffff8800724cf680 still active

[ .. ]

Aug  4 13:58:41 tyne kernel:  session2: fail_scsi_task task itt 0x34 ttt 0xdccab00 sc ffff880074444680 still active
Aug  4 13:58:41 tyne kernel:  session2: fail_scsi_task task itt 0x53 ttt 0xa3f5bc01 sc ffff88005cc12380 still active
Aug  4 13:58:41 tyne kernel:  session2: fail_scsi_task task itt 0x71 ttt 0x7018c701 sc ffff88006fcf1580 still active
Aug  4 13:58:41 tyne kernel:  session2: fail_scsi_task task itt 0x7c ttt 0x9bc45001 sc ffff880074444280 still active
Aug  4 13:58:41 tyne kernel:  session2: iscsi_eh_device_reset dev reset result = SUCCESS
Aug  4 13:58:42 tyne kernel:  connection2:0: detected conn error (1020)

So I guess the MSA uses a 30-second timeout here :-)
Nasty, that one.
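
The 30 seconds can be read off the timestamps: the LU Reset for lun 2 is
logged at 13:58:12 and the connection error at 13:58:42. A small Python
sketch of that arithmetic follows; the helper names syslog_time and
seconds_between are made up for illustration, and the two strings are log
lines from above with the mail wrapping undone.

```python
from datetime import datetime

# Two lines copied from the log above (rejoined where the mail wrapped them).
LU_RESET = ("Aug  4 13:58:12 tyne kernel:  session2: iscsi_eh_device_reset "
            "LU Reset [sc ffff88005cc12880 lun 2]")
CONN_ERROR = ("Aug  4 13:58:42 tyne kernel:  connection2:0: "
              "detected conn error (1020)")


def syslog_time(line):
    """Parse the leading 'Mon D HH:MM:SS' stamp; syslog carries no year."""
    month, day, clock = line.split()[:3]
    return datetime.strptime(f"{month} {day} {clock}", "%b %d %H:%M:%S")


def seconds_between(first, second):
    """Whole seconds elapsed from the first log line to the second."""
    delta = syslog_time(second) - syslog_time(first)
    return int(delta.total_seconds())


if __name__ == "__main__":
    print(seconds_between(LU_RESET, CONN_ERROR))
```

This only shows that the gap between the reset and the drop is 30 seconds
in this one trace; whether the MSA really runs a fixed timer is still a
guess.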

But how to handle it?
The best course of action here would be to send an ABORT_TASK TMF for each
outstanding task; however, our current code wouldn't allow us to.
I tried to fiddle with introducing a 'TMF_QUIESCE' state during which we're
still servicing the outstanding R2Ts, but so far with very limited success.
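
To make the quiesce idea concrete, here is a toy Python model. It is not
libiscsi code and not the patch mentioned above: the Task class, the
quiesce_and_fail function and the send_final_data_out callback are all
hypothetical names. The only point is the ordering the RFC asks for: every
affected task that still holds a valid TTT gets its response first, and only
then are the tasks failed.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Task:
    itt: int                    # initiator task tag
    lun: int
    ttt: Optional[int] = None   # set while an R2T is still unanswered
    failed: bool = False


def quiesce_and_fail(tasks: List[Task], lun: int,
                     send_final_data_out: Callable[[Task], None]) -> List[int]:
    """Answer every outstanding R2T on `lun`, then fail the tasks on it.

    Returns the TTTs that were answered. `send_final_data_out` stands in
    for whatever would send the closing Data-Out PDU (F-bit set).
    """
    affected = [t for t in tasks if t.lun == lun and not t.failed]

    # Pass 1: quiesce. Respond to each TTT the target is still waiting on.
    answered = []
    for task in affected:
        if task.ttt is not None:
            send_final_data_out(task)
            answered.append(task.ttt)
            task.ttt = None

    # Pass 2: only now complete the tasks towards the SCSI layer.
    for task in affected:
        task.failed = True
    return answered


if __name__ == "__main__":
    # itt/ttt values taken from the log above; the lun 1 task is invented.
    tasks = [Task(0xe, 2, 0xc5cf6a01), Task(0x15, 2, 0x2590d700),
             Task(0x99, 1, 0x1234)]
    done = quiesce_and_fail(tasks, 2, lambda t: None)
    print([hex(ttt) for ttt in done])
```

In the trace above the tasks are failed while their TTTs are still active,
which corresponds to skipping the first pass.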

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                   zSeries & Storage
[email protected]                          +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)
