Hannes Reinecke wrote:
> Hi Mike,
>
> as you might've seen, I finally found the problem for the MSA dropping
> the connection. It seems that it follows this section from the RFC:
>
> For the LOGICAL UNIT RESET function, the target MUST behave as
> dictated by the Logical Unit Reset function in [SAM2].
>
> where SAM2 says:
> When a logical unit is aborting one or more tasks from a SCSI
> initiator port with the TASK ABORTED status it should complete
> all of those tasks before entering additional tasks from that
> SCSI initiator port into the task set.
>
> So the tasks must be _completed_ at the target. Which can be
> interpreted as requiring the target to send an ABORT_TASK_SET
> to each outstanding task, so that this section applies:
>
> For ABORT TASK SET and CLEAR TASK SET, the issuing initiator MUST
> continue to respond to all valid target transfer tags (received via
> R2T, Text Response, NOP-In, or SCSI Data-In PDUs) related to the
> affected task set, even after issuing the task management request.
> The issuing initiator SHOULD however terminate (i.e., by setting the
> F-bit to 1) these response sequences as quickly as possible. The
> target on its part MUST wait for responses on all affected target
> transfer tags before acting on either of these two task management
> requests. In case all or part of the response sequence is not
> received (due to digest errors) for a valid TTT, the target MAY treat
> it as a case of within-command error recovery class (see Section
> 6.1.4.1 Recovery Within-command) if it is supporting
> ErrorRecoveryLevel >= 1, or alternatively may drop the connection to
> complete the requested task set function.
>
> This is clarified by RFC 5048 section 4.1.2:
>
> The initiator iSCSI layer:
> a. MUST continue to respond to each TTT received for the affected
> tasks
Section 4.1.2 and the passage above it from RFC 3720 apply to LU reset
too, right? That is my understanding. The comment about sending an
ABORT_TASK_SET confused me.
>
> [ .. ]
> The target iSCSI layer:
> a. MUST wait for responses on currently valid target-transfer tags
> of the affected tasks from the issuing initiator.
>
> Which is exactly what I've seen with the 'ttt tracking' patch:
>
> Aug 4 13:58:10 tyne kernel: session2: iscsi_eh_device_reset LU Reset [sc
> ffff88005cf4ba80 lun 1]
> Aug 4 13:58:10 tyne kernel: session2: iscsi_exec_task_mgmt_fn tmf set
> timeout
> Aug 4 13:58:10 tyne kernel: session2: iscsi_eh_device_reset dev reset
> result = SUCCESS
> Aug 4 13:58:12 tyne kernel: session2: iscsi_eh_device_reset LU Reset [sc
> ffff88005cc12880 lun 2]
> Aug 4 13:58:12 tyne kernel: session2: iscsi_exec_task_mgmt_fn tmf set
> timeout
> Aug 4 13:58:12 tyne kernel: session2: fail_scsi_task task itt 0xe ttt
> 0xc5cf6a01 sc ffff8800378c9d80 still active
> Aug 4 13:58:12 tyne kernel: session2: fail_scsi_task task itt 0x15 ttt
> 0x2590d700 sc ffff88007a5c8980 still active
> Aug 4 13:58:12 tyne kernel: session2: fail_scsi_task task itt 0x18 ttt
> 0x4926d000 sc ffff880078d8da80 still active
> Aug 4 13:58:12 tyne kernel: session2: fail_scsi_task task itt 0x1f ttt
> 0x89ac9500 sc ffff88007a5de080 still active
> Aug 4 13:58:12 tyne kernel: session2: fail_scsi_task task itt 0x27 ttt
> 0x7d0d4201 sc ffff8800378c9680 still active
> Aug 4 13:58:12 tyne kernel: session2: fail_scsi_task task itt 0x28 ttt
> 0x4e2c1b01 sc ffff8800724cf680 still active
>
I think what is being checked in the ttt tracking patch and what is
mentioned in the RFC are different.

I think we only need to respond to commands like R2T from the target in
order to satisfy the TTT comment. If fast_abort is 0/No, then when we
get an R2T we will send the data for it. This completes the sequence
that the target is waiting for. We might slightly violate the RFC in
that we send all the data for the R2T, whereas the RFC says to terminate
the sequence quickly, so maybe it wanted us to send a Data-Out with the
F bit set but not all the data. I do not know. It probably does not
matter.

Once we send the Data-Outs for all the data that the R2T requested, the
target can send another R2T, send a response for the task (it can
send a SCSI command response PDU indicating an error), or it can respond
to the TMF that was affecting it.
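
To make the "answer the R2T and finish the sequence" step concrete, here is a rough user-space sketch of splitting one R2T into Data-Out PDUs, with the F bit set only on the last one. The struct and the function name answer_r2t are made up for illustration and are not the open-iscsi data structures; max_seg stands in for the negotiated maximum data segment size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one Data-Out PDU (not the kernel struct). */
struct data_out {
	uint32_t ttt;     /* copied from the R2T being answered */
	uint32_t datasn;  /* restarts at 0 for every R2T sequence */
	uint32_t offset;  /* buffer offset of this data segment */
	uint32_t len;     /* data segment length in bytes */
	int final;        /* F bit: 1 only on the last PDU of the sequence */
};

/*
 * Split the range requested by one R2T into Data-Out PDUs of at most
 * max_seg bytes each. Returns the number of PDUs written to out[], or 0
 * if the arguments are unusable or out[] is too small.
 */
size_t answer_r2t(uint32_t ttt, uint32_t r2t_offset, uint32_t r2t_len,
		  uint32_t max_seg, struct data_out *out, size_t out_cap)
{
	size_t n = 0;
	uint32_t sent = 0;

	if (r2t_len == 0 || max_seg == 0 || out == NULL)
		return 0;

	while (sent < r2t_len) {
		uint32_t chunk = r2t_len - sent;

		if (chunk > max_seg)
			chunk = max_seg;
		if (n == out_cap)
			return 0;

		out[n].ttt = ttt;
		out[n].datasn = (uint32_t)n;
		out[n].offset = r2t_offset + sent;
		out[n].len = chunk;
		sent += chunk;
		out[n].final = (sent == r2t_len);
		n++;
	}

	/* The sequence for this TTT ends with the PDU that carries the F bit. */
	assert(out[n - 1].final);
	return n;
}
```

In this model the TTT is "answered" as soon as the final Data-Out is queued, regardless of whether the task later gets a SCSI response.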

Your patch considers the TTT completed when the entire command/task is
completed. So you are waiting for the initiator to get the task's status
in a SCSI command response PDU (for writes). If we do not get status
that the task is completed, then your patch prints an error.

What your patch expects to happen with the current code is for the
LU reset to be sent, then the R2T to be responded to, and then the target
to send a SCSI command response PDU for each task affected by the TMF. I
do not think this is right, because when the target sends the TMF
response, that response applies to all the affected tasks and we do not
need a response for each individual SCSI command.

If you want to see if r2ts are being dropped you can check in
iscsi_tcp_cleanup_task. There is this "pending r2t dropped" message.
Then you would want to add a printk here:

	r2t = tcp_task->r2t;
	if (r2t != NULL) {
		__kfifo_put(tcp_task->r2tpool.queue, (void*)&r2t,
			    sizeof(void*));
		tcp_task->r2t = NULL;
	}
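
Roughly the kind of message I mean, as a stand-alone user-space model rather than the real kernel code. The struct layouts, the field names and cleanup_task_model are invented for this sketch, and fprintf stands in for the printk:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the per-R2T and per-task state. */
struct r2t_model {
	unsigned int ttt;          /* target transfer tag from the R2T */
	unsigned int data_length;  /* bytes the R2T asked for */
	unsigned int sent;         /* bytes already sent as Data-Out */
};

struct tcp_task_model {
	struct r2t_model *r2t;     /* R2T currently being answered, or NULL */
	unsigned int itt;          /* initiator task tag of the task */
};

/*
 * Model of the cleanup step: if an R2T was still being answered when the
 * task is torn down, log it and release it. Returns 1 if an in-progress
 * R2T was dropped, 0 otherwise.
 */
int cleanup_task_model(struct tcp_task_model *task)
{
	struct r2t_model *r2t;

	assert(task != NULL);

	r2t = task->r2t;
	if (r2t == NULL)
		return 0;

	fprintf(stderr,
		"in-progress r2t dropped: itt 0x%x ttt 0x%x sent %u of %u\n",
		task->itt, r2t->ttt, r2t->sent, r2t->data_length);
	task->r2t = NULL;
	return 1;
}
```

If that message shows up for a TTT after the LU reset was sent, then an R2T really was left unanswered, which is different from the task simply not getting a SCSI response.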

Note: if you are running with fast_abort=1/Yes, then we have that
problem I mentioned before where a task can get stuck at the head of the
requeue/cmd list and so tasks after it will not get run, and in that
case R2Ts might not get answered.

If you think we need to send an abort for each task before sending an LU
reset, then let's take it to the ips list, because I cannot even download
SAM2 from T10's website. Is there supposed to be a way to get non-draft
versions for free somehow?