Hi Mike,
as you might have seen, I finally found the cause of the MSA dropping
the connection. It seems that it follows this section of the iSCSI RFC (RFC 3720):
   For the LOGICAL UNIT RESET function, the target MUST behave as
   dictated by the Logical Unit Reset function in [SAM2].
where SAM2 says:
   When a logical unit is aborting one or more tasks from a SCSI
   initiator port with the TASK ABORTED status it should complete
   all of those tasks before entering additional tasks from that
   SCSI initiator port into the task set.
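A toy Python model of that SAM-2 rule, purely to illustrate the ordering constraint: while tasks from an initiator port are being aborted, new tasks from that port are held back until every aborted task has completed. The class and method names are hypothetical; real targets implement this very differently.

```python
class LogicalUnit:
    """Toy model of the SAM-2 abort ordering rule (not real target code)."""

    def __init__(self):
        self.task_set = {}   # task tag -> initiator port
        self.aborting = {}   # initiator port -> tags still being aborted

    def enter_task(self, port, tag):
        # Hold back new tasks from a port that still has aborted tasks
        # which have not completed yet.
        if self.aborting.get(port):
            return False
        self.task_set[tag] = port
        return True

    def abort_all(self, port):
        # Mark every task from this port as being aborted.
        tags = {t for t, p in self.task_set.items() if p == port}
        self.aborting[port] = tags
        return tags

    def complete(self, tag):
        # The LU finished the task (e.g. with TASK ABORTED status).
        port = self.task_set.pop(tag)
        self.aborting.get(port, set()).discard(tag)
```

Tasks from other initiator ports are unaffected; only the port whose tasks are being aborted has to wait.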
So the tasks must be _completed_ at the target, which can be
interpreted as requiring the target to perform an ABORT TASK SET
on the outstanding tasks, so that the following section applies:
   For ABORT TASK SET and CLEAR TASK SET, the issuing initiator MUST
   continue to respond to all valid target transfer tags (received via
   R2T, Text Response, NOP-In, or SCSI Data-In PDUs) related to the
   affected task set, even after issuing the task management request.
   The issuing initiator SHOULD however terminate (i.e., by setting the
   F-bit to 1) these response sequences as quickly as possible. The
   target on its part MUST wait for responses on all affected target
   transfer tags before acting on either of these two task management
   requests. In case all or part of the response sequence is not
   received (due to digest errors) for a valid TTT, the target MAY treat
   it as a case of within-command error recovery class (see Section
   6.1.4.1 Recovery Within-command) if it is supporting
   ErrorRecoveryLevel >= 1, or alternatively may drop the connection to
   complete the requested task set function.
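To make the target-side rule concrete, here is a minimal Python model, assuming a target that only implements the "drop the connection" alternative (no ErrorRecoveryLevel >= 1 recovery). The class name, the `final` flag standing in for the F-bit, and the timeout value are illustrative assumptions, not taken from any real target.

```python
class TargetTmfWait:
    """Toy model: the target defers ABORT TASK SET until every affected
    TTT has received its final (F-bit) response, or gives up and drops
    the connection once an assumed deadline has passed."""

    def __init__(self, affected_ttts, now, timeout):
        self.pending = set(affected_ttts)
        self.deadline = now + timeout

    def on_data_out(self, ttt, final):
        # Only a response with the F-bit set closes the sequence for this TTT.
        if final:
            self.pending.discard(ttt)

    def poll(self, now):
        if not self.pending:
            return "execute_tmf"
        if now >= self.deadline:
            return "drop_connection"
        return "wait"
```

If the initiator never answers a TTT such as 0xc5cf6a01, the model sits in "wait" and ends in "drop_connection" once the deadline is reached.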
This is clarified by RFC 5048 section 4.1.2:
   The initiator iSCSI layer:
      a. MUST continue to respond to each TTT received for the affected
         tasks
      [ .. ]
   The target iSCSI layer:
      a. MUST wait for responses on currently valid target-transfer tags
         of the affected tasks from the issuing initiator.
This is exactly what I've seen with the 'ttt tracking' patch applied:
Aug 4 13:58:10 tyne kernel: session2: iscsi_eh_device_reset LU Reset [sc ffff88005cf4ba80 lun 1]
Aug 4 13:58:10 tyne kernel: session2: iscsi_exec_task_mgmt_fn tmf set timeout
Aug 4 13:58:10 tyne kernel: session2: iscsi_eh_device_reset dev reset result = SUCCESS
Aug 4 13:58:12 tyne kernel: session2: iscsi_eh_device_reset LU Reset [sc ffff88005cc12880 lun 2]
Aug 4 13:58:12 tyne kernel: session2: iscsi_exec_task_mgmt_fn tmf set timeout
Aug 4 13:58:12 tyne kernel: session2: fail_scsi_task task itt 0xe ttt 0xc5cf6a01 sc ffff8800378c9d80 still active
Aug 4 13:58:12 tyne kernel: session2: fail_scsi_task task itt 0x15 ttt 0x2590d700 sc ffff88007a5c8980 still active
Aug 4 13:58:12 tyne kernel: session2: fail_scsi_task task itt 0x18 ttt 0x4926d000 sc ffff880078d8da80 still active
Aug 4 13:58:12 tyne kernel: session2: fail_scsi_task task itt 0x1f ttt 0x89ac9500 sc ffff88007a5de080 still active
Aug 4 13:58:12 tyne kernel: session2: fail_scsi_task task itt 0x27 ttt 0x7d0d4201 sc ffff8800378c9680 still active
Aug 4 13:58:12 tyne kernel: session2: fail_scsi_task task itt 0x28 ttt 0x4e2c1b01 sc ffff8800724cf680 still active
[ .. ]
Aug 4 13:58:41 tyne kernel: session2: fail_scsi_task task itt 0x34 ttt 0xdccab00 sc ffff880074444680 still active
Aug 4 13:58:41 tyne kernel: session2: fail_scsi_task task itt 0x53 ttt 0xa3f5bc01 sc ffff88005cc12380 still active
Aug 4 13:58:41 tyne kernel: session2: fail_scsi_task task itt 0x71 ttt 0x7018c701 sc ffff88006fcf1580 still active
Aug 4 13:58:41 tyne kernel: session2: fail_scsi_task task itt 0x7c ttt 0x9bc45001 sc ffff880074444280 still active
Aug 4 13:58:41 tyne kernel: session2: iscsi_eh_device_reset dev reset result = SUCCESS
Aug 4 13:58:42 tyne kernel: connection2:0: detected conn error (1020)
The LU reset for lun 2 was issued at 13:58:12 and the connection error
followed at 13:58:42, so I guess the MSA uses a 30-second timeout here :-)
Nasty, that one.
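The 30-second figure is simply the gap between two log timestamps: the LU reset for lun 2 at 13:58:12 and the "detected conn error (1020)" line at 13:58:42. A trivial Python check, assuming both syslog lines are from the same day (syslog timestamps carry no year):

```python
from datetime import datetime

def seconds_between(start, end):
    """Seconds between two same-day syslog times given as HH:MM:SS."""
    fmt = "%H:%M:%S"
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()

# LU reset for lun 2 vs. the connection error
print(seconds_between("13:58:12", "13:58:42"))  # 30.0
```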
But how to handle it?
The best course of action here would be to send an ABORT_TASK TMF for each
outstanding task; however, our current code doesn't allow us to do that.
I tried introducing a 'TMF_QUIESCE' state during which we keep handling the
outstanding R2Ts, but so far with very limited success.
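One way the 'TMF_QUIESCE' idea could look, as a minimal Python state-machine sketch: on an LU reset the session stops queuing new commands, keeps answering the R2Ts of outstanding tasks, and only sends the TMF once no target transfer tag is left active. Apart from the TMF_QUIESCE name, everything here (the other state names, `Session`, `r2t_answered`, `may_queue_command`) is hypothetical and does not reflect the actual open-iscsi code.

```python
from enum import Enum, auto

class TmfState(Enum):
    TMF_INITIAL = auto()   # illustrative name
    TMF_QUIESCE = auto()   # name from this mail; semantics assumed
    TMF_QUEUED = auto()    # illustrative name

class Session:
    """Hypothetical session model, not libiscsi code."""

    def __init__(self):
        self.tmf_state = TmfState.TMF_INITIAL
        self.active_ttts = {}   # itt -> ttt of an R2T not yet fully answered
        self.sent = []          # TMFs handed to the connection

    def may_queue_command(self):
        # No new SCSI commands while a TMF is being prepared or pending.
        return self.tmf_state is TmfState.TMF_INITIAL

    def start_lu_reset(self):
        self.tmf_state = TmfState.TMF_QUIESCE
        self._maybe_send_tmf()

    def r2t_answered(self, itt):
        # Called once the final Data-Out (F-bit set) for this task went out.
        self.active_ttts.pop(itt, None)
        self._maybe_send_tmf()

    def _maybe_send_tmf(self):
        # Only issue the TMF once every outstanding TTT has been answered.
        if self.tmf_state is TmfState.TMF_QUIESCE and not self.active_ttts:
            self.tmf_state = TmfState.TMF_QUEUED
            self.sent.append("LOGICAL UNIT RESET")
```

The design choice is to delay the TMF rather than fail tasks whose TTT is still active, so the target never has to wait on a TTT the initiator has already abandoned.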
Cheers,
Hannes
--
Dr. Hannes Reinecke zSeries & Storage
[email protected] +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)