On 08/07/2009 05:27 AM, Hannes Reinecke wrote: > We should be flushing the workqueue before setting the suspend > bit. If we don't a LU Reset might kill commands which are already > in the queue and waiting to be send, causing the target to barf.
What do you mean here? What do you mean by barf? What commands in what queue? Are you talking about the problem where we get a tmf response, and do not send data-outs? If so this will not always help. The data-out could be on the wire and the target could still hit the problem. The problem is not ours. The target should not send a tmf response that indicates it was successful if there is a r2t that is in progress of being fullfilled. Or... Are you talking about if we have task 0 and task 1 in progress, then send a lu reset that should affect those two tasks, but then we get a task2 in the cmdqueue list, and then we get a tmf response, so we end up cleaning up task0, task1 and task2? If so, how does the target barf here (is it a connection drop or some error because the task does not get sent)? If this is your problem, then your patch would work under normal conditions, but if you have sndtmo=1 then you could still hit this problem a little more easily than normal. If the sendpage code returned EAGAIN (which gets converted to ENOBUFS in your patch) due to the sndtmo then we will return from iscsi_data_xmit without completely sending the task or any task behind it (if there was also a task3 queued). We could also hit the problem, where if task0 was affected by a TMF then iscsi_data_xmit returns EACCESS. Then task2 is not going to get sent, because it is stuck behind task0 on the cmdlist. I think in the patches [PATCH 1/5] libiscsi: Allow multiple LUN Reset TMF [PATCH 2/5] Check for TMF state before sending PDU you need to also modify conn->tmf_state and clear the conn->tmf header before suspending the xmit thread. I was thinking about this the other day and I think we can jsut add a check in fail_scsi_task to check for tasks with a cmdsn less than the lu reset cmd's cmdsn, right? Maybe we want to fail the affected tasks with DID_ERROR, but then for the non-affected tasks we could fail with DID_IMM_RETRY. So then if this was a reset from something like sg_reset those tasks that were not affected would get another full timeout period to run in case the lu reset took a long time. > > Signed-off-by: Hannes Reinecke<h...@suse.de> > --- > drivers/scsi/libiscsi.c | 2 +- > 1 files changed, 1 insertions(+), 1 deletions(-) > > diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c > index c58de26..e15ea86 100644 > --- a/drivers/scsi/libiscsi.c > +++ b/drivers/scsi/libiscsi.c > @@ -1864,9 +1864,9 @@ void iscsi_suspend_tx(struct iscsi_conn *conn) > struct Scsi_Host *shost = conn->session->host; > struct iscsi_host *ihost = shost_priv(shost); > > - set_bit(ISCSI_SUSPEND_BIT,&conn->suspend_tx); > if (ihost->workq) > flush_workqueue(ihost->workq); > + set_bit(ISCSI_SUSPEND_BIT,&conn->suspend_tx); > } > EXPORT_SYMBOL_GPL(iscsi_suspend_tx); > --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "open-iscsi" group. To post to this group, send email to open-iscsi@googlegroups.com To unsubscribe from this group, send email to open-iscsi+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/open-iscsi -~----------~----~----~----~------~----~------~--~---