Roland Dreier wrote:
 > 1st scsi_try_host_reset() --> srp_host_reset() -->
 > srp_reconnect_target() return SUCCESS. Then scsi_eh_try_stu() or
 > scsi_eh_tur() is called right after
 > scsi_eh_try_stu or scsi_eh_tur --> scsi_send_eh_cmnd() -->
 > srp_queuecommand()

But after srp_reconnect_target(), both SRP's and the midlayer's queue
of pending commands should be completely empty, since I put

        list_for_each_entry(req, &target->req_queue, list) {
                req->scmnd->result = DID_RESET << 16;
                req->scmnd->scsi_done(req->scmnd);
                srp_unmap_data(req->scmnd, target, req);
        }

and

        INIT_LIST_HEAD(&target->free_reqs);
        INIT_LIST_HEAD(&target->req_queue);
        for (i = 0; i < SRP_SQ_SIZE; ++i)
                list_add_tail(&target->req_ring[i].list, &target->free_reqs);

in there.  Why doesn't that work to kill all the pending commands?

That works fine and kills all the pending commands. However, right after srp_host_reset() returns, the SCSI error handler sends the STU or TUR command immediately, as part of the error handling flow in scsi_eh_host_reset().

Please re-read scsi_eh_host_reset() and scsi_try_host_reset() in scsi_error.c. Here is the logic:

scsi_eh_host_reset() --> scsi_try_host_reset() --> srp_host_reset(): all pending commands are killed. srp_host_reset() returns SUCCESS, so scsi_try_host_reset() returns SUCCESS.

static int scsi_eh_host_reset(struct list_head *work_q,
                              struct list_head *done_q)
{
...

       rtn = scsi_try_host_reset(scmd);
       if (rtn == SUCCESS) {
               list_for_each_entry_safe(scmd, next, work_q, eh_entry) {
                       if (!scsi_device_online(scmd->device) ||
                           (!scsi_eh_try_stu(scmd) && !scsi_eh_tur(scmd)) ||
                           !scsi_eh_tur(scmd))

...
}

Since rtn == SUCCESS, scsi_eh_host_reset() calls scsi_eh_try_stu() or scsi_eh_tur(), which call scsi_send_eh_cmnd() --> srp_queuecommand(). Now SRP's request queue is not empty anymore.

If the STU or TUR command times out, the SCSI midlayer tries to abort it as well. Since we delay the cleanup in srp_reset_device(), SRP's request queue is still not empty. The STU or TUR command is then freed by the SCSI midlayer. The next srp_host_reset() will try to clean SRP's request queue, which still holds an "old" request referencing the freed SCSI command.
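
To make the sequence concrete, here is a small user-space sketch of it. This is not the driver code: every toy_* name and SQ_SIZE are made up for illustration, and "freeing" a command is modeled with a flag so the example stays well-defined. It only shows how a request queued after the first reset can still be sitting in the queue, pointing at a released command, when the second reset walks the queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SQ_SIZE       4     /* hypothetical stand-in for SRP_SQ_SIZE */
#define TOY_DID_RESET 0x08  /* stand-in for the midlayer's DID_RESET */

/* Toy stand-in for struct scsi_cmnd. */
struct toy_cmnd {
        bool freed;   /* set when the "midlayer" releases the command */
        int  result;
};

/* Toy stand-in for one SRP request slot; scmnd == NULL means free. */
struct toy_req {
        struct toy_cmnd *scmnd;
};

struct toy_target {
        struct toy_req ring[SQ_SIZE];
};

/* Models srp_queuecommand(): park the command in the first free slot. */
int toy_queuecommand(struct toy_target *t, struct toy_cmnd *c)
{
        for (size_t i = 0; i < SQ_SIZE; ++i) {
                if (t->ring[i].scmnd == NULL) {
                        t->ring[i].scmnd = c;
                        return 0;
                }
        }
        return -1;  /* queue full */
}

/*
 * Models the cleanup loop run on host reset: complete every pending
 * request and empty the queue.  Returns how many requests pointed at
 * a command that had already been released, i.e. the references the
 * real driver would dereference after free.
 */
int toy_host_reset(struct toy_target *t)
{
        int stale = 0;

        for (size_t i = 0; i < SQ_SIZE; ++i) {
                struct toy_cmnd *c = t->ring[i].scmnd;

                if (c == NULL)
                        continue;
                if (c->freed)
                        ++stale;  /* real code would touch freed memory */
                else
                        c->result = TOY_DID_RESET << 16;
                t->ring[i].scmnd = NULL;
        }
        return stale;
}

/* Models the midlayer releasing a timed-out STU/TUR command while the
 * driver still holds a request that points at it. */
void toy_midlayer_release(struct toy_cmnd *c)
{
        c->freed = true;
}

/* Replays: reset, STU/TUR queued, timeout + release, second reset. */
int toy_replay(void)
{
        struct toy_target t = {0};
        struct toy_cmnd pending = {0}, tur = {0};

        toy_queuecommand(&t, &pending);
        int first = toy_host_reset(&t);   /* kills the pending command */
        assert(first == 0);               /* nothing stale yet */

        toy_queuecommand(&t, &tur);       /* sent right after the reset */
        toy_midlayer_release(&tur);       /* times out, command released */
        return toy_host_reset(&t);        /* second reset finds it */
}
```

Calling toy_replay() returns the number of stale requests seen by the second reset. In this model the first reset leaves the queue empty, which matches what Roland expects; the stale entry only appears because a new command is queued between the two resets.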

If you still have questions, I can call you, or you can give me a call at (408) 916-0006.

Vu