Vu> scsi_eh_try_stu or scsi_eh_try_tur get timeout, scsi midlayer
    Vu> tried to abort stu or tur command as well. Since we delay to
    Vu> clean in srp_reset_device(), srp's request queue is still not
    Vu> empty. This stu or tur command is freed by scsi midlayer. The
    Vu> next srp_host_reset() will try to clean srp's request queue
    Vu> with "old" request referencing to freed scsi command.

This is where I get confused.  We should be flushing out the command
queue in srp_host_reset(), so the loop in scsi_error.c after resetting
the host:

                rtn = scsi_try_host_reset(scmd);
                if (rtn == SUCCESS) {
                        list_for_each_entry_safe(scmd, next, work_q, eh_entry) {
                                if (!scsi_device_online(scmd->device) ||
                                    (!scsi_eh_try_stu(scmd) && 
!scsi_eh_tur(scmd)) ||
                                    !scsi_eh_tur(scmd))
                                        scsi_eh_finish_cmd(scmd, done_q);
                        }

should not find any commands still queued.

Your previous patch can't be the right fix.  I think there are two
things wrong with the changes below to srp_reset_device():

 - You changed srp_abort to remove the request from SRP's queue, but
   then you look it up and use it again in srp_reset_device, which
   seems risky at best.
 - If srp_reset_device() succeeds, you don't flush all matching
   commands, so this will definitely leave some stale commands in
   SRP's queue.

 static int srp_reset_device(struct scsi_cmnd *scmnd)
 {
        struct srp_target_port *target = host_to_target(scmnd->device->host);
-       struct srp_request *req, *tmp;
+       struct srp_request *req;
+       int ret = SUCCESS;
 
        printk(KERN_ERR "SRP reset_device called\n");
 
-       if (srp_find_req(target, scmnd, &req))
-               return FAILED;
-       if (srp_send_tsk_mgmt(target, req, SRP_TSK_LUN_RESET))
-               return FAILED;
-       if (req->tsk_status)
-               return FAILED;
-
-       spin_lock_irq(target->scsi_host->host_lock);
-
-       list_for_each_entry_safe(req, tmp, &target->req_queue, list)
-               if (req->scmnd->device == scmnd->device) {
-                       req->scmnd->result = DID_RESET << 16;
-                       scmnd->scsi_done(scmnd);
-                       srp_remove_req(target, req);
-               }
-
-       spin_unlock_irq(target->scsi_host->host_lock);
+       if ((srp_find_req(target, scmnd, &req)) ||
+           (srp_send_tsk_mgmt(target, req, SRP_TSK_LUN_RESET)) ||
+           (req->tsk_status))
+               ret = FAILED;
 
-       return SUCCESS;
+       return ret;
 }
_______________________________________________
openib-general mailing list
[email protected]
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

Reply via email to