On 08/07/2009 05:27 AM, Hannes Reinecke wrote:
> We should be flushing the workqueue before setting the suspend
> bit. If we don't a LU Reset might kill commands which are already
> in the queue and waiting to be send, causing the target to barf.

What do you mean here? What do you mean by barf? What commands in what 
queue?

Are you talking about the problem where we get a tmf response, and do 
not send data-outs? If so this will not always help. The data-out could 
be on the wire and the target could still hit the problem. The problem 
is not ours. The target should not send a tmf response that indicates it 
was successful if there is a r2t that is in progress of being fullfilled.


Or...

Are you talking about if we have task 0 and task 1 in progress, then 
send a lu reset that should affect those two tasks, but then we get a 
task2 in the cmdqueue list, and then we get a tmf response, so we end up 
cleaning up task0, task1 and task2? If so, how does the target barf here 
(is it a connection drop or some error because the task does not get sent)?

If this is your problem, then your patch would work under normal 
conditions, but if you have sndtmo=1 then you could still hit this 
problem a little more easily than normal. If the sendpage code returned 
EAGAIN (which gets converted to ENOBUFS in your patch) due to the sndtmo 
then we will return from
iscsi_data_xmit without completely sending the task or any task behind 
it (if there was also a task3 queued).

We could also hit the problem, where if task0 was affected by a TMF then 
iscsi_data_xmit returns EACCESS. Then task2 is not going to get sent, 
because it is stuck behind task0 on the cmdlist. I think in the patches
[PATCH 1/5] libiscsi: Allow multiple LUN Reset TMF
[PATCH 2/5] Check for TMF state before sending PDU
you need to also modify conn->tmf_state and clear the conn->tmf header 
before suspending the xmit thread.


I was thinking about this the other day and I think we can jsut add a 
check in fail_scsi_task to check for tasks with a cmdsn less than the lu 
reset cmd's cmdsn, right? Maybe we want to fail the affected tasks with 
DID_ERROR, but then for the non-affected tasks we could fail with 
DID_IMM_RETRY. So then if this was a reset from something like sg_reset 
those tasks that were not affected would get another full timeout period 
to run in case the lu reset took a long time.

>
> Signed-off-by: Hannes Reinecke<h...@suse.de>
> ---
>   drivers/scsi/libiscsi.c |    2 +-
>   1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
> index c58de26..e15ea86 100644
> --- a/drivers/scsi/libiscsi.c
> +++ b/drivers/scsi/libiscsi.c
> @@ -1864,9 +1864,9 @@ void iscsi_suspend_tx(struct iscsi_conn *conn)
>       struct Scsi_Host *shost = conn->session->host;
>       struct iscsi_host *ihost = shost_priv(shost);
>
> -     set_bit(ISCSI_SUSPEND_BIT,&conn->suspend_tx);
>       if (ihost->workq)
>               flush_workqueue(ihost->workq);
> +     set_bit(ISCSI_SUSPEND_BIT,&conn->suspend_tx);
>   }
>   EXPORT_SYMBOL_GPL(iscsi_suspend_tx);
>


--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~----------~----~----~----~------~----~------~--~---

Reply via email to