Re: Lost active R2T transfers during reset

2009-08-05 Thread Hannes Reinecke

Mike Christie wrote:
 Hannes Reinecke wrote:
 Hi Mike,

 as you might've seen, I finally found the problem for the MSA dropping
 the connection. It seems that it follows this section from the RFC:

For the LOGICAL UNIT RESET function, the target MUST behave as
dictated by the Logical Unit Reset function in [SAM2].

 where SAM2 says:
   When a logical unit is aborting one or more tasks from a SCSI
   initiator port with the TASK ABORTED status it should complete
   all of those tasks before entering additional tasks from that
   SCSI initiator port into the task set.
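A minimal C sketch of the SAM-2 rule just quoted, assuming a hypothetical per-initiator-port counter inside the logical unit (none of these names come from a real target): new tasks from a port are held back while tasks the LU is aborting for that port are still incomplete. SAM-2 says "should", so a real target may be less strict than this.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-initiator-port bookkeeping inside a logical unit. */
struct lu_port_state {
    int aborting_tasks; /* tasks being terminated with TASK ABORTED status */
};

/* The LU starts aborting n tasks from this port (e.g. on LU reset). */
static void begin_abort(struct lu_port_state *p, int n)
{
    assert(n >= 0);
    p->aborting_tasks += n;
}

/* One of those aborted tasks has now been completed. */
static void abort_completed(struct lu_port_state *p)
{
    assert(p->aborting_tasks > 0);
    p->aborting_tasks--;
}

/* SAM-2: new tasks from this port should enter the task set only after
 * every aborted task from the same port has been completed. */
static bool may_enter_task_set(const struct lu_port_state *p)
{
    return p->aborting_tasks == 0;
}
```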
  
 So the tasks must be _completed_ at the target. This can be
 interpreted as requiring the target to send an ABORT_TASK_SET
 to each outstanding task, so that this section applies:

For ABORT TASK SET and CLEAR TASK SET, the issuing initiator MUST
continue to respond to all valid target transfer tags (received via
R2T, Text Response, NOP-In, or SCSI Data-In PDUs) related to the
affected task set, even after issuing the task management request.
The issuing initiator SHOULD however terminate (i.e., by setting the
F-bit to 1) these response sequences as quickly as possible.  The
target on its part MUST wait for responses on all affected target
transfer tags before acting on either of these two task management
requests.  In case all or part of the response sequence is not
received (due to digest errors) for a valid TTT, the target MAY treat
it as a case of within-command error recovery class (see Section
6.1.4.1 Recovery Within-command) if it is supporting
ErrorRecoveryLevel = 1, or alternatively may drop the connection to
complete the requested task set function.
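A minimal C sketch of the target-side half of this passage, using a hypothetical per-task record (not taken from any real target): the TMF is only acted on once every affected TTT has been answered with a final (F=1) PDU; if part of a response sequence was lost, the target either falls back to within-command recovery when it supports ErrorRecoveryLevel 1 or drops the connection.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical target-side record for one task in the affected task set. */
struct affected_task {
    bool ttt_valid;      /* a TTT (e.g. from an R2T) is outstanding */
    bool final_received; /* initiator ended that sequence with F=1 */
    bool digest_error;   /* part of the response sequence was lost */
};

enum tmf_action {
    TMF_WAIT,               /* responses still pending on some TTT */
    TMF_EXECUTE,            /* all TTTs answered: act on the TMF */
    TMF_RECOVER_WITHIN_CMD, /* lost data, ErrorRecoveryLevel >= 1 */
    TMF_DROP_CONNECTION     /* lost data, no within-command recovery */
};

/* Decide what to do with a pending ABORT TASK SET / CLEAR TASK SET. */
static enum tmf_action tmf_next_action(const struct affected_task *t, int n,
                                       int erl)
{
    bool pending = false;

    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        if (!t[i].ttt_valid || t[i].final_received)
            continue; /* nothing outstanding on this task's TTT */
        if (t[i].digest_error)
            return erl >= 1 ? TMF_RECOVER_WITHIN_CMD : TMF_DROP_CONNECTION;
        pending = true;
    }
    return pending ? TMF_WAIT : TMF_EXECUTE;
}
```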

 This is clarified by RFC 5048 section 4.1.2:

 The initiator iSCSI layer:
  a. MUST continue to respond to each TTT received for the affected
 tasks
 
 
 
 
 4.1.2 and the passage above it from 3720 apply to LU reset too, right?
 That is my understanding. The comment about sending an ABORT_TASK_SET
 confused me.
 
 
 

 [ .. ]
 The target iSCSI layer:
 a. MUST wait for responses on currently valid target-transfer tags
 of the affected tasks from the issuing initiator.

 Which is exactly what I've seen with the 'ttt tracking' patch:

 Aug  4 13:58:10 tyne kernel:  session2: iscsi_eh_device_reset LU Reset [sc 88005cf4ba80 lun 1]
 Aug  4 13:58:10 tyne kernel:  session2: iscsi_exec_task_mgmt_fn tmf set timeout
 Aug  4 13:58:10 tyne kernel:  session2: iscsi_eh_device_reset dev reset result = SUCCESS
 Aug  4 13:58:12 tyne kernel:  session2: iscsi_eh_device_reset LU Reset [sc 88005cc12880 lun 2]
 Aug  4 13:58:12 tyne kernel:  session2: iscsi_exec_task_mgmt_fn tmf set timeout
 Aug  4 13:58:12 tyne kernel:  session2: fail_scsi_task task itt 0xe ttt 0xc5cf6a01 sc 8800378c9d80 still active
 Aug  4 13:58:12 tyne kernel:  session2: fail_scsi_task task itt 0x15 ttt 0x2590d700 sc 88007a5c8980 still active
 Aug  4 13:58:12 tyne kernel:  session2: fail_scsi_task task itt 0x18 ttt 0x4926d000 sc 880078d8da80 still active
 Aug  4 13:58:12 tyne kernel:  session2: fail_scsi_task task itt 0x1f ttt 0x89ac9500 sc 88007a5de080 still active
 Aug  4 13:58:12 tyne kernel:  session2: fail_scsi_task task itt 0x27 ttt 0x7d0d4201 sc 8800378c9680 still active
 Aug  4 13:58:12 tyne kernel:  session2: fail_scsi_task task itt 0x28 ttt 0x4e2c1b01 sc 8800724cf680 still active

 
 
 I think what is being checked in the ttt tracking patch and what is
 mentioned in the RFC are different.
 
Might well be; it's not that I've understood the iscsi stack in all
its subtleties.

 
 I think we only need to respond to PDUs like r2t from the target in
 order to satisfy the ttt comment. If fast_abort is 0/No, then when we
 get an R2T we will send the data for it. This completes the sequence
 that the target is waiting for.

Correct. My point here is that there might still be some Data-out PDUs
stuck in the queue, which will never get sent because we break out of the
loop on the first non-eligible PDU.
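To make the head-of-line problem concrete, a small C model with a hypothetical queue of PDUs (this is not the libiscsi transmit code): the first loop stops at the first PDU that is not eligible, so Data-Out PDUs queued behind it never go out; the second skips it and keeps sending.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical queued PDU. */
struct pdu {
    bool eligible; /* may be transmitted right now */
    bool sent;
};

/* Behaviour described above: stop at the first ineligible PDU, so every
 * PDU queued behind it (including Data-Outs for R2Ts) stays stuck. */
static size_t xmit_break_on_first(struct pdu *q, size_t n)
{
    size_t sent = 0;

    for (size_t i = 0; i < n; i++) {
        if (!q[i].eligible)
            break;
        q[i].sent = true;
        sent++;
    }
    return sent;
}

/* Alternative: skip ineligible PDUs and keep going. */
static size_t xmit_skip_ineligible(struct pdu *q, size_t n)
{
    size_t sent = 0;

    for (size_t i = 0; i < n; i++) {
        if (!q[i].eligible)
            continue;
        q[i].sent = true;
        sent++;
    }
    return sent;
}
```

Skipping is only safe when the held-back PDU has no ordering dependency on what follows it; within one task's Data-Out sequence, order still matters if DataPDUInOrder=Yes was negotiated.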

 We might slightly violate the RFC in
 that we send all the data for the r2t, and the RFC says to terminate the
 sequence quickly, so maybe it wanted us to send a data-out with the F bit
 set but not all the data. I do not know. It probably does not matter.

Don't know either; there is this section in the spec:

   An R2T MAY be answered with one or more SCSI Data-Out PDUs with a
   matching Target Transfer Tag.  If an R2T is answered with a single
   Data-Out PDU, the Buffer Offset in the Data PDU MUST be the same as
   the one specified by the R2T, and the data length of the Data PDU
   MUST be the same as the Desired Data Transfer Length specified in the
   R2T.  If the R2T is answered with a sequence of Data PDUs, the Buffer
   Offset and Length MUST be within the range of those specified by R2T,
   and the last PDU MUST have the F bit set to 1.  If the last PDU
   (marked with the F bit) is received before the Desired Data 
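A minimal C sketch of answering one R2T as the quoted rule requires: the requested range is split into Data-Out PDUs no larger than max_seg (standing in for the target's declared MaxRecvDataSegmentLength), every PDU stays inside the R2T's Buffer Offset and Desired Data Transfer Length, and only the last one carries F=1. The struct is a simplified, hypothetical stand-in for the real PDU header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified Data-Out descriptor: only the fields the rule talks about. */
struct data_out {
    uint32_t ttt;    /* Target Transfer Tag copied from the R2T */
    uint32_t offset; /* Buffer Offset */
    uint32_t length; /* data segment length */
    bool final;      /* F bit */
};

/* Split one R2T into Data-Out PDUs of at most max_seg bytes.
 * Returns the number of PDUs written, or 0 if cap is too small. */
static size_t answer_r2t(uint32_t ttt, uint32_t r2t_offset,
                         uint32_t desired_len, uint32_t max_seg,
                         struct data_out *out, size_t cap)
{
    size_t n = 0;
    uint32_t done = 0;

    assert(max_seg > 0);
    while (done < desired_len) {
        if (n == cap)
            return 0;
        uint32_t len = desired_len - done;
        if (len > max_seg)
            len = max_seg;
        out[n].ttt = ttt;
        out[n].offset = r2t_offset + done; /* stays inside the R2T range */
        out[n].length = len;
        out[n].final = false;
        done += len;
        n++;
    }
    if (n > 0)
        out[n - 1].final = true; /* only the last PDU ends the sequence */
    return n;
}
```

With max_seg = 8192, an R2T for 20000 bytes at offset 8192 becomes three PDUs of 8192, 8192 and 3616 bytes at offsets 8192, 16384 and 24576, the last with F=1.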

Re: Lost active R2T transfers during reset

2009-08-05 Thread Hannes Reinecke

Mike Christie wrote:
 Mike Christie wrote:
 Note: if you are running with fast_abort=1/Yes, then we have that
 problem I mentioned before where a task can get stuck at the head of
 the requeue/cmd list and so tasks after it will not get run, and in
 that case r2ts might not get answered.

 
 Oh yeah, make sure you are not using your cmdsn window patch, because
 your patch will prevent data-outs from being sent in response to r2ts if
 the window is closed.
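For context on why a CmdSN window check should not hold back Data-Outs: in RFC 3720 the window [ExpCmdSN, MaxCmdSN] limits which non-immediate commands may be sent, while Data-Out PDUs carry no CmdSN at all. A minimal C sketch of a send check that only windows SCSI Command PDUs, using RFC 1982 style serial-number comparison; the enum and function names are hypothetical and this is not the patch under discussion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum pdu_kind { PDU_SCSI_CMD, PDU_DATA_OUT };

/* Serial-number "a <= b" on 32-bit counters (RFC 1982 style), so the
 * check keeps working when CmdSN wraps. */
static bool sn_lte(uint32_t a, uint32_t b)
{
    return (int32_t)(b - a) >= 0;
}

/* Hypothetical send check: only non-immediate SCSI Command PDUs are
 * limited to [ExpCmdSN, MaxCmdSN]. Data-Out PDUs carry no CmdSN, so a
 * closed window must not hold back the Data-Outs answering an R2T.
 * Immediate commands are left out of this sketch. */
static bool may_send(enum pdu_kind kind, uint32_t cmdsn,
                     uint32_t exp_cmdsn, uint32_t max_cmdsn)
{
    if (kind == PDU_DATA_OUT)
        return true;
    return sn_lte(exp_cmdsn, cmdsn) && sn_lte(cmdsn, max_cmdsn);
}
```

A window is closed when MaxCmdSN = ExpCmdSN - 1; in that state this check still lets Data-Out through.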

Yes, did so. Nae bother.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries  Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
open-iscsi group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---



Re: Lost active R2T transfers during reset

2009-08-05 Thread Mike Christie

On 08/05/2009 01:52 AM, Hannes Reinecke wrote:
 Aug  5 08:46:16 tyne kernel:  session2: fail_scsi_task task itt 0x5 ttt 0xf2e7e801 sc 880078866180 still active
 Aug  5 08:46:16 tyne kernel:  connection2:0: pending r2t itt 0x5 ttt 0xf2e7e801 dropped
 Aug  5 08:46:16 tyne kernel:  session2: fail_scsi_task task itt 0xf ttt 0x65c42601 sc 880073d5c480 still active
 Aug  5 08:46:16 tyne kernel:  connection2:0: pending r2t itt 0xf ttt 0x65c42601 dropped
 Aug  5 08:46:16 tyne kernel:  session2: iscsi_eh_device_reset dev reset result = SUCCESS

 So my patch wasn't that far off the mark :-)


Yeah :) Something is screwing up. I will do some digging in that code. 
The original r2t code had some weird optimizations that always screw me up.




Re: Lost active R2T transfers during reset

2009-08-04 Thread Mike Christie

Hannes Reinecke wrote:
 Hi Mike,
 
 as you might've seen, I finally found the problem for the MSA dropping
 the connection. It seems that it follows this section from the RFC:
 
For the LOGICAL UNIT RESET function, the target MUST behave as
dictated by the Logical Unit Reset function in [SAM2].
 
 where SAM2 says:
   When a logical unit is aborting one or more tasks from a SCSI
   initiator port with the TASK ABORTED status it should complete
   all of those tasks before entering additional tasks from that
   SCSI initiator port into the task set.
  
 So the tasks must be _completed_ at the target. This can be
 interpreted as requiring the target to send an ABORT_TASK_SET
 to each outstanding task, so that this section applies:
 
For ABORT TASK SET and CLEAR TASK SET, the issuing initiator MUST
continue to respond to all valid target transfer tags (received via
R2T, Text Response, NOP-In, or SCSI Data-In PDUs) related to the
affected task set, even after issuing the task management request.
The issuing initiator SHOULD however terminate (i.e., by setting the
F-bit to 1) these response sequences as quickly as possible.  The
target on its part MUST wait for responses on all affected target
transfer tags before acting on either of these two task management
requests.  In case all or part of the response sequence is not
received (due to digest errors) for a valid TTT, the target MAY treat
it as a case of within-command error recovery class (see Section
6.1.4.1 Recovery Within-command) if it is supporting
ErrorRecoveryLevel = 1, or alternatively may drop the connection to
complete the requested task set function.
 
 This is clarified by RFC 5048 section 4.1.2:
 
 The initiator iSCSI layer:
  a. MUST continue to respond to each TTT received for the affected
 tasks




4.1.2 and the passage above it from 3720 apply to LU reset too, right?
That is my understanding. The comment about sending an ABORT_TASK_SET
confused me.



 
 [ .. ]
 The target iSCSI layer:
 a. MUST wait for responses on currently valid target-transfer tags
 of the affected tasks from the issuing initiator.
 
 Which is exactly what I've seen with the 'ttt tracking' patch:
 
 Aug  4 13:58:10 tyne kernel:  session2: iscsi_eh_device_reset LU Reset [sc 88005cf4ba80 lun 1]
 Aug  4 13:58:10 tyne kernel:  session2: iscsi_exec_task_mgmt_fn tmf set timeout
 Aug  4 13:58:10 tyne kernel:  session2: iscsi_eh_device_reset dev reset result = SUCCESS
 Aug  4 13:58:12 tyne kernel:  session2: iscsi_eh_device_reset LU Reset [sc 88005cc12880 lun 2]
 Aug  4 13:58:12 tyne kernel:  session2: iscsi_exec_task_mgmt_fn tmf set timeout
 Aug  4 13:58:12 tyne kernel:  session2: fail_scsi_task task itt 0xe ttt 0xc5cf6a01 sc 8800378c9d80 still active
 Aug  4 13:58:12 tyne kernel:  session2: fail_scsi_task task itt 0x15 ttt 0x2590d700 sc 88007a5c8980 still active
 Aug  4 13:58:12 tyne kernel:  session2: fail_scsi_task task itt 0x18 ttt 0x4926d000 sc 880078d8da80 still active
 Aug  4 13:58:12 tyne kernel:  session2: fail_scsi_task task itt 0x1f ttt 0x89ac9500 sc 88007a5de080 still active
 Aug  4 13:58:12 tyne kernel:  session2: fail_scsi_task task itt 0x27 ttt 0x7d0d4201 sc 8800378c9680 still active
 Aug  4 13:58:12 tyne kernel:  session2: fail_scsi_task task itt 0x28 ttt 0x4e2c1b01 sc 8800724cf680 still active
 


I think what is being checked in the ttt tracking patch and what is 
mentioned in the RFC are different.


I think we only need to respond to PDUs like r2t from the target in
order to satisfy the ttt comment. If fast_abort is 0/No, then when we
get an R2T we will send the data for it. This completes the sequence
that the target is waiting for. We might slightly violate the RFC in
that we send all the data for the r2t, and the RFC says to terminate the
sequence quickly, so maybe it wanted us to send a data-out with the F bit
set but not all the data. I do not know. It probably does not matter.
Once we send the data-outs for all the data that the r2t requested,
the target can send another r2t, send a response for the task (it can
send a SCSI Response PDU indicating an error), or respond to the TMF
that was affecting it.
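A sketch of the decision described in this paragraph, assuming fast_abort=1/Yes means the initiator stops sending Data-Out for tasks covered by an outstanding TMF (the thread only spells out the 0/No behaviour). The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical initiator-side check run when an R2T arrives.
 * task_under_tmf: an LU reset or abort covering this task is outstanding.
 * fast_abort: assumed to mirror the session's fast_abort setting. */
static bool should_answer_r2t(bool task_under_tmf, bool fast_abort)
{
    if (!task_under_tmf)
        return true; /* ordinary I/O: always answer */
    /* fast_abort=0/No: keep sending Data-Out so the target sees the
     * sequence finish (F=1) before it acts on the TMF.
     * fast_abort=1/Yes (assumed): stop sending data for affected tasks. */
    return !fast_abort;
}
```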


Your patch considers the TTT completed when the entire command/task is
completed. So you are waiting for the initiator to get the task's status
in a SCSI Response PDU (for writes). If we do not get status saying the
task has completed, your patch prints an error.

What your patch is expecting to happen with the current code is for the
lu reset to be sent, then the R2Ts to be answered, and then the target to
send a SCSI Response PDU for each task affected by the TMF. I do not
think this is right, because when the target sends the TMF response,
that response applies to all the affected tasks, and we do not need a
response for each individual scsi command.
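To make the difference between the two readings concrete, a C sketch with a hypothetical per-task record. The first predicate models the patch as described here (a TTT counts as done only once the task's SCSI Response arrives); the second models the RFC reading (the TTT is done once the Data-Out sequence for the R2T has been sent with F=1). Neither is taken from the actual patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical initiator-side state for one write task. */
struct task_state {
    bool r2t_outstanding; /* R2T received, Data-Out sequence not yet sent */
    bool status_received; /* SCSI Response PDU received for the task */
};

/* Reading attributed to the 'ttt tracking' patch: the TTT only counts as
 * finished once the whole task has completed with a SCSI Response. */
static bool ttt_done_patch_model(const struct task_state *t)
{
    return t->status_received;
}

/* Reading described in the reply: the TTT is finished once the Data-Outs
 * for everything the R2T asked for have been sent (last one with F=1).
 * The target may then send another R2T, a SCSI Response, or the TMF
 * response, which covers all affected tasks at once. */
static bool ttt_done_sequence_model(const struct task_state *t)
{
    return !t->r2t_outstanding;
}
```

For a task whose R2T was answered and which is then covered by the LU reset's TMF response, the first model still reports the task as active (the "still active" messages in the log), while the second does not.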

If you want to see if r2ts are being dropped you can