Hannes Reinecke wrote:
> Mike Christie wrote:
> [ .. ]
>> I am not sure if we should be needing this if the target is operating
>> within the RFC (there is one exception but I am not sure if you are
>> hitting it).
>> In, I saw this:
>> An iSCSI target MUST be able to handle at least one immediate task
>> management command and one immediate non-task-management iSCSI command
>> per connection at any time.
>> also has:
>> The target MUST silently ignore any non-immediate command outside of
>> this range or non-immediate duplicates within the range. The CmdSN
>> carried by immediate commands may lie outside the ExpCmdSN to MaxCmdSN
>> range.
>> I took this to mean even when the window is closed we can send a nop as
>> immediate. What do you guys think?
> Totally beside the point.
> We're not sending NOPs outside the CmdSN range, we're sending
> _data_ PDUs outside the CmdSN range. Just make it a printk in
> the above patch and start hitting the target hard.
> You'll see an amazing number of messages ...
> There is _nothing_ in the code which checks if the data PDU
> we're about to send has a CmdSN within the target window.
> And then we're hitting the quoted text and the target drops the
> PDU, leading to a nice I/O stall.
> Which is the I/O stall I've been fighting for _months_.
>> The initiator will only send 1 TMF as immediate per session at a time
>> and it will only send one nop as a ping marked as immediate at a time.
>> The only exception to us sending more than one non tmf immediate cmd is
>> if the target sends us a nop-in we could have sent two nop-outs marked
>> as immediate (for the nop-out in response to the target's nop-in,
>> 10.18.1 says we have to set the I bit).
>> If we send too many nops marked as immediate we should be getting a
>> reject pdu right? If so then I think we just need the attached patch
>> which adds some code to handle rejected immediate pdus. The patch is
>> made over scsi-rc-fixes and is only compile tested.
>> Are you only seeing this with the one target? Could we confirm with them
>> that they will accept one non tmf immediate command?
>> If I am reading the RFC wrong, then for your patch, we want to move the
>> check to below the check_mgmt label because iscsi_data_xmit can send
>> multiple pdus. You probably just want to move it to
>> iscsi_prep_mgmt_task(). Also I think we want to dequeue a nop as a ping
>> so it does not time out while the cmd window is closed (the problem would
>> be if the window was closed and then the connection went bad - we
>> would not be able to catch that).
> Yes, you are probably correct in that we'd need to move it into
> the individual queue loops to be able to transmit as many PDUs as
> possible.
> With the original patch we run the risk of hitting the
> same error when enough PDUs are queued.
> I'll update the patch.

So, this looks far better:

diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
index 21ed45f..8303676 100644
--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -1212,6 +1212,13 @@ static int iscsi_xmit_task(struct iscsi_conn *conn)
        struct iscsi_task *task = conn->task;
        int rc;
+       /* Check for target CmdSN window */
+       if (conn->session->state == ISCSI_STATE_LOGGED_IN &&
+           iscsi_sna_lt(conn->session->max_cmdsn,
+                        conn->session->cmdsn))
+               /* Window closed, wait for CmdSN update */
+               return -EPROTO;
        rc = conn->session->tt->xmit_task(task);
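
For anyone following along, here is a minimal standalone sketch of the window test the patch relies on. It is not the libiscsi code: sna_lt() below is a simplified stand-in for iscsi_sna_lt(), and cmd_window_closed() and target_accepts() are hypothetical helper names introduced only for illustration. It assumes 32-bit serial number arithmetic with wraparound, and the target-side rule is taken from the RFC text quoted above (non-immediate commands outside ExpCmdSN..MaxCmdSN are ignored, immediate ones are exempt).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Serial number comparison on 32-bit values: true if a precedes b,
 * taking wraparound into account. Simplified stand-in, not the
 * kernel's iscsi_sna_lt() implementation.
 */
static bool sna_lt(uint32_t a, uint32_t b)
{
	return a != b && (int32_t)(a - b) < 0;
}

static bool sna_lte(uint32_t a, uint32_t b)
{
	return a == b || sna_lt(a, b);
}

/*
 * Hypothetical helper, initiator side: the window is closed once the
 * next CmdSN to be used has moved past MaxCmdSN.
 */
static bool cmd_window_closed(uint32_t next_cmdsn, uint32_t max_cmdsn)
{
	return sna_lt(max_cmdsn, next_cmdsn);
}

/*
 * Hypothetical helper, target side: a non-immediate command is only
 * accepted inside ExpCmdSN..MaxCmdSN; immediate commands are exempt.
 */
static bool target_accepts(uint32_t cmdsn, bool immediate,
			   uint32_t exp_cmdsn, uint32_t max_cmdsn)
{
	if (immediate)
		return true;
	return sna_lte(exp_cmdsn, cmdsn) && sna_lte(cmdsn, max_cmdsn);
}
```

With these definitions a CmdSN equal to MaxCmdSN still counts as inside the window; only a CmdSN past MaxCmdSN is treated as closed, which matches the direction of the comparison in the patch.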

There is one issue still outstanding:
Even with this patch the iscsi stack feels it necessary to send NOPs:

Jul 22 14:03:40 esk kernel: [  485.274672]  connection2:0: running ctask itt 90 rc -71
Jul 22 14:03:40 esk kernel: [  485.297249]  connection2:0: running ctask itt 8 rc -71
Jul 22 14:03:40 esk kernel: [  547.508673]  connection1:0: running ctask itt 46 rc -71
Jul 22 14:03:41 esk kernel: [  548.155353]  connection1:0: running ctask itt 85 rc -71
Jul 22 14:03:45 esk kernel: [  550.291112]  connection2:0: Sending nopout,cmd 283450 max 283512 exp 283449
Jul 22 14:03:45 esk kernel: [  550.294583]  connection2:0: mgmtpdu [itt xa09 p ffff81007a1ffa40] queued
Jul 22 14:03:45 esk kernel: [  487.595817]  connection2:0: mgmtpdu [op 0x0 hdr->itt 0xa09 datalen 0 cmdsn 283450/283450]
Jul 22 14:03:45 esk kernel: [  487.598659]  connection2:0: mgmtpdu [itt 0xa09 p ffff81007a1ffa40] done
Jul 22 14:03:45 esk kernel: [  539.429005]  connection2:0: mgmtpdu [itt xa09 p ffff81007a1ffa40] finished
Jul 22 14:03:45 esk kernel: [  487.603722]  connection2:0: running ctask itt 82 rc -71
Jul 22 14:03:45 esk kernel: [  487.604826]  connection2:0: running ctask itt 38 rc -71

And a wireshark trace reveals that there is indeed a network hiccup before the
NOP is sent, during which simply _nothing_ happens on the line.
Well, not between the target and the initiator; can't tell about the overall
(By way of clarification: 'running ctask itt ...' means we're here:

        if (conn->task) {
                rc = iscsi_xmit_task(conn);
                if (rc)
->                      goto again;

i.e. a running task has to be retried because the CmdSN window closed.) And as
you can see, by the time the nop is sent the window is wide open again.
Looks to me as if we're missing a scsi_queue_work somewhere.

Maybe someone has better eyes than me ...


Dr. Hannes Reinecke                   zSeries & Storage
h...@suse.de                          +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)
