Mike Christie wrote:
On 08/07/2009 05:27 AM, Hannes Reinecke wrote:
We should be flushing the workqueue before setting the suspend
bit. If we don't a LU Reset might kill commands which are already
in the queue and waiting to be send, causing the target to barf.
What do you mean here? What do you mean by barf? What commands in what
queue?
In the 'requeue' queue. There might be Data-out requests still waiting
to be sent, but we can't as the xmit thread would be short-circuited
as we've set the SUSPEND bit.
Are you talking about the problem where we get a tmf response, and do
not send data-outs? If so this will not always help. The data-out could
be on the wire and the target could still hit the problem. The problem
is not ours. The target should not send a tmf response that indicates it
was successful if there is a r2t that is in progress of being fullfilled.
That is a matter of interpretation. The targets seem to implement the wording
of RFC5048, section 4.1.2:
The target iSCSI layer:
- MUST wait for responses on currently valid target-transfer tags of the
affected tasks from the issuing initiator.
So, a 'wait' clearly indicates some sort of timer on the target side.
However, what should be the response if not all ttt are being received
from the initiator?
From what I've gathered, the MSA target in this case will return a
TMF response even when not all data-outs have been received, and
after a certain time will drop the connection to trigger ERL0.
Basically, all I'm trying to do is to run this test-case:
(bonnie/dt on all devices)
for d in /sys/bus/sd*; do
sg_reset -d /dev/${file##*/}
done
The result of this test run is that in ca 90% sg_reset will
run through, but the other 10% cause a full ERL0.
And I'm trying to determine the reason for this pecularity.
Weird, though, that apparently I'm hitting quite a bunch
of issues here ...
Or...
Are you talking about if we have task 0 and task 1 in progress, then
send a lu reset that should affect those two tasks, but then we get a
task2 in the cmdqueue list, and then we get a tmf response, so we end up
cleaning up task0, task1 and task2? If so, how does the target barf here
(is it a connection drop or some error because the task does not get sent)?
Hmm. Checking ...
No, I won't be sending any other SCSI command PDU to the affected LUN; that's
being blocked by the check_tmf_restrictions() function.
If this is your problem, then your patch would work under normal
conditions, but if you have sndtmo=1 then you could still hit this
problem a little more easily than normal. If the sendpage code returned
EAGAIN (which gets converted to ENOBUFS in your patch) due to the sndtmo
then we will return from
iscsi_data_xmit without completely sending the task or any task behind
it (if there was also a task3 queued).
We could also hit the problem, where if task0 was affected by a TMF then
iscsi_data_xmit returns EACCESS. Then task2 is not going to get sent,
because it is stuck behind task0 on the cmdlist. I think in the patches
[PATCH 1/5] libiscsi: Allow multiple LUN Reset TMF
[PATCH 2/5] Check for TMF state before sending PDU
you need to also modify conn-tmf_state and clear the conn-tmf header
before suspending the xmit thread.
I was thinking about this the other day and I think we can jsut add a
check in fail_scsi_task to check for tasks with a cmdsn less than the lu
reset cmd's cmdsn, right? Maybe we want to fail the affected tasks with
DID_ERROR, but then for the non-affected tasks we could fail with
DID_IMM_RETRY. So then if this was a reset from something like sg_reset
those tasks that were not affected would get another full timeout period
to run in case the lu reset took a long time.
I actually thought about the same lines, and even added some CmdSn printks.
However, the CmdSn of running task is not getting updated, so currently
these are quite pointless. But I can see what I'm getting.
Cheers,
Hannes
--
Dr. Hannes Reinecke zSeries Storage
h...@suse.de +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)
--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups
open-iscsi group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to
open-iscsi+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---