Roland Dreier wrote:
Vu> When scsi_eh_try_stu() or scsi_eh_tur() times out, the SCSI
Vu> midlayer tries to abort the STU or TUR command as well. Since we
Vu> delay the cleanup in srp_reset_device(), SRP's request queue is
Vu> still not empty. This STU or TUR command is then freed by the SCSI
Vu> midlayer. The next srp_host_reset() tries to clean SRP's request
Vu> queue using an "old" request that still references the freed SCSI
Vu> command.

This is where I get confused. We should be flushing out the command
queue in srp_host_reset(), so the loop in scsi_error.c after resetting
the host:

    rtn = scsi_try_host_reset(scmd);
    if (rtn == SUCCESS) {
        list_for_each_entry_safe(scmd, next, work_q, eh_entry) {
            if (!scsi_device_online(scmd->device) ||
                (!scsi_eh_try_stu(scmd) && !scsi_eh_tur(scmd)) ||
                !scsi_eh_tur(scmd))
                scsi_eh_finish_cmd(scmd, done_q);
        }
    }

should not find any commands still queued.
Have you read scsi_eh_try_stu(scmnd) and scsi_eh_tur(scmnd)? These functions reuse the same scmnd: they rewrite it with a new CDB and call srp_queuecommand(), which takes a new req and puts it in the request queue for that same scmnd, now with a different CDB.
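To make the reuse problem concrete, here is a minimal user-space sketch under simplifying assumptions: `fake_scmnd`, `fake_req`, `fake_queue` and the helper functions are hypothetical stand-ins, not the ib_srp driver's real structures or functions. Each call to the queuecommand-like helper takes a fresh request slot even though the command is the same, so removing only the first matching request leaves a second one still pointing at the command.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct scsi_cmnd. */
struct fake_scmnd {
    unsigned char cdb0;          /* first CDB byte, rewritten on each reuse */
};

/* Hypothetical stand-in for SRP's per-command request. */
struct fake_req {
    struct fake_scmnd *scmnd;    /* command this request belongs to */
    int in_use;
};

#define QUEUE_DEPTH 8

struct fake_queue {
    struct fake_req req[QUEUE_DEPTH];
};

/* Like a queuecommand path: always takes a fresh request slot, even if
 * the same scmnd is already queued with a different CDB. */
static struct fake_req *fake_queuecommand(struct fake_queue *q,
                                          struct fake_scmnd *scmnd,
                                          unsigned char cdb0)
{
    scmnd->cdb0 = cdb0;          /* the error handler rewrote the CDB in place */
    for (size_t i = 0; i < QUEUE_DEPTH; i++) {
        if (!q->req[i].in_use) {
            q->req[i].in_use = 1;
            q->req[i].scmnd = scmnd;
            return &q->req[i];
        }
    }
    assert(!"request queue exhausted");
    return NULL;
}

/* Removes requests that point at scmnd: only the first one if first_only
 * is set, otherwise all of them. Returns how many were removed. */
static size_t fake_remove_scmnd(struct fake_queue *q,
                                const struct fake_scmnd *scmnd, int first_only)
{
    size_t removed = 0;
    for (size_t i = 0; i < QUEUE_DEPTH; i++) {
        if (q->req[i].in_use && q->req[i].scmnd == scmnd) {
            q->req[i].in_use = 0;
            q->req[i].scmnd = NULL;
            removed++;
            if (first_only)
                break;
        }
    }
    return removed;
}

/* Counts requests that still reference scmnd. */
static size_t fake_count_scmnd(const struct fake_queue *q,
                               const struct fake_scmnd *scmnd)
{
    size_t n = 0;
    for (size_t i = 0; i < QUEUE_DEPTH; i++)
        if (q->req[i].in_use && q->req[i].scmnd == scmnd)
            n++;
    return n;
}
```

Queuing the same command twice (once as STU, once as TUR) and then removing only the first match leaves one request behind; if the command is freed at that point, that leftover entry is the dangling "old" request described above. Removing every match (`first_only = 0`) leaves none.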
Your previous patch can't be the right fix. I think there are two things wrong with the changes below to srp_reset_device():

- You changed srp_abort to remove the request from SRP's queue, but then you look it up and use it again in srp_reset_device, which seems risky at best.
I agree. Instead of looking up the same req and reusing its buffer to send the task management packet, I need a new buffer to send this task management request.
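A sketch of that approach, assuming a simplified task management IU loosely modeled on SRP_TSK_MGMT; `tm_iu`, `tm_pool`, `tm_build_abort` and the `TM_*` constants are hypothetical names, not the driver's. The TM request comes from its own small pool and copies only the aborted command's tag and LUN, so the command's req can be removed from the queue without the TM request depending on it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical task management functions (no real SRP codes implied). */
enum tm_func { TM_ABORT_TASK, TM_LUN_RESET };

/* Simplified task management IU, loosely modeled on SRP_TSK_MGMT. */
struct tm_iu {
    uint64_t tag;        /* tag of this TM request itself */
    uint64_t task_tag;   /* tag of the command being aborted */
    uint64_t lun;
    enum tm_func func;
    int in_use;
};

#define TM_POOL_SIZE 2

/* Dedicated pool: TM requests never borrow a command request's buffer. */
struct tm_pool {
    struct tm_iu iu[TM_POOL_SIZE];
    uint64_t next_tag;
};

static struct tm_iu *tm_alloc(struct tm_pool *p)
{
    for (size_t i = 0; i < TM_POOL_SIZE; i++) {
        if (!p->iu[i].in_use) {
            p->iu[i].in_use = 1;
            p->iu[i].tag = p->next_tag++;
            return &p->iu[i];
        }
    }
    return NULL;         /* pool exhausted: caller should fail this EH step */
}

/* Builds an ABORT TASK IU from copied values only, so the command's
 * request can be freed independently of this buffer. */
static struct tm_iu *tm_build_abort(struct tm_pool *p,
                                    uint64_t req_tag, uint64_t lun)
{
    struct tm_iu *iu = tm_alloc(p);
    if (!iu)
        return NULL;
    iu->func = TM_ABORT_TASK;
    iu->task_tag = req_tag;
    iu->lun = lun;
    return iu;
}

static void tm_free(struct tm_iu *iu)
{
    assert(iu->in_use);  /* catch double frees */
    iu->in_use = 0;
}
```

Because the IU holds copies rather than a pointer into the req, srp_abort can drop the req first and the TM request stays valid; the trade-off is a small extra pool that must be sized for the number of outstanding TM requests.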
- If srp_reset_device() succeeds, you don't flush all matching commands, so this will definitely leave some stale commands in SRP's queue.
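A sketch of flushing every matching command once a device (LUN) reset succeeds, assuming each request records its LUN; `fake_scmnd`, `fake_req`, `flush_lun` and `FAKE_RESULT_RESET` are hypothetical, and a real driver would complete commands through the midlayer's done callback rather than setting a flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Arbitrary marker for "completed because of a reset";
 * not the kernel's DID_RESET encoding. */
#define FAKE_RESULT_RESET 1
#define QUEUE_DEPTH 8

/* Hypothetical stand-in for struct scsi_cmnd. */
struct fake_scmnd {
    int result;
    int done;            /* set where a real driver would run the done callback */
};

/* Hypothetical per-command request that remembers its LUN. */
struct fake_req {
    struct fake_scmnd *scmnd;
    uint64_t lun;
    int in_use;
};

/* After a successful LUN reset, finish and release every queued request
 * for that LUN, not just the one that triggered error handling.
 * Returns the number of requests flushed. */
static size_t flush_lun(struct fake_req *queue, size_t depth, uint64_t lun)
{
    size_t flushed = 0;

    for (size_t i = 0; i < depth; i++) {
        struct fake_req *req = &queue[i];

        if (!req->in_use || req->lun != lun)
            continue;
        assert(req->scmnd != NULL);   /* an in-use request must own a command */
        req->scmnd->result = FAKE_RESULT_RESET;
        req->scmnd->done = 1;
        req->scmnd = NULL;            /* drop the reference before the slot is reused */
        req->in_use = 0;
        flushed++;
    }
    return flushed;
}
```

Requests for other LUNs are left alone, since a LUN reset does not affect them; only a host reset would justify flushing the whole queue.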
I already make sure that all commands are flushed in srp_abort.

Vu

_______________________________________________
openib-general mailing list
[email protected]
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
