Hi Roland,
I reported the error in my original email replying to your FMR
patch. On an ia64 system with a PCI-X HCA, I got the async event
IB_EVENT_QP_ACCESS_ERR at the initiator (and a CQE with
IB_COMPLETION_STATUS_REMOTE_ACCESS_ERROR status at my target).
I have not yet captured the IB analyzer trace you suggested.
> So the SCSI midlayer times out commands and tries to abort them. But
> we have no connection so the abort fails. The SCSI command shouldn't
> get freed now (at least if I'm understanding scsi_error.c correctly).
> Then we have no .eh_device_reset_handler so everything should fall
> through to calling our .eh_host_reset_handler without freeing any SCSI
> commands. And then we crash on a use-after-free of a SCSI command.
> So where is that command getting freed on us??
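The fall-through described above (abort fails, no device reset handler, so host reset is called) can be pictured with a small userspace model. This is only a sketch: the names eh_template, escalate, fake_cmd and the EH_* codes are hypothetical stand-ins, not the scsi_error.c code, and the real midlayer has additional steps (for example bus reset) that this model leaves out. A NULL handler pointer means the driver does not provide that handler, so the step is skipped.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the midlayer's SUCCESS/FAILED codes. */
enum eh_result { EH_SUCCESS, EH_FAILED };

/* Bits recording which handlers the escalation actually called. */
enum eh_step {
	STEP_ABORT        = 1 << 0,
	STEP_DEVICE_RESET = 1 << 1,
	STEP_HOST_RESET   = 1 << 2,
};

struct fake_cmd {
	int id;
};

/* Hypothetical host template: a NULL pointer means "handler not provided". */
struct eh_template {
	enum eh_result (*abort)(struct fake_cmd *cmd);
	enum eh_result (*device_reset)(struct fake_cmd *cmd);
	enum eh_result (*host_reset)(struct fake_cmd *cmd);
};

/* Try each provided handler in order, stopping at the first EH_SUCCESS.
 * Returns the set of handlers called; *ok is 1 if one of them succeeded. */
static unsigned escalate(const struct eh_template *t, struct fake_cmd *cmd,
			 int *ok)
{
	/* Escalation order in this model: abort, device reset, host reset. */
	enum eh_result (*const steps[])(struct fake_cmd *) = {
		t->abort, t->device_reset, t->host_reset,
	};
	const unsigned bits[] = { STEP_ABORT, STEP_DEVICE_RESET, STEP_HOST_RESET };
	unsigned tried = 0;

	assert(t != NULL && ok != NULL);
	*ok = 0;
	for (size_t i = 0; i < sizeof(bits) / sizeof(bits[0]); i++) {
		if (!steps[i])
			continue;	/* driver has no handler: skip this step */
		tried |= bits[i];
		if (steps[i](cmd) == EH_SUCCESS) {
			*ok = 1;
			break;
		}
	}
	return tried;
}

/* SRP-like driver with no connection: the abort cannot reach the target. */
static enum eh_result abort_no_connection(struct fake_cmd *cmd)
{
	(void)cmd;
	return EH_FAILED;
}

/* Host reset that reports success. */
static enum eh_result host_reset_ok(struct fake_cmd *cmd)
{
	(void)cmd;
	return EH_SUCCESS;
}
```

With a template of { abort_no_connection, NULL, host_reset_ok }, escalate() calls the abort handler, skips the missing device reset handler, and stops after the host reset, returning STEP_ABORT | STEP_HOST_RESET with *ok set to 1. Nothing in this path frees the command, which is why the crash points at a command freed somewhere else.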
The SCSI command passed to the error handlers (.eh_abort_handler,
.eh_host_reset_handler, ...) is not the same as the SCSI command in
req->scmnd that is used after being freed.
There is a glitch: the SCSI command in req->scmnd has already been
freed by the SCSI midlayer, but the request is still in our pending
request queue.
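To make this failure mode concrete, here is a small userspace model. All names (model_cmd, model_req, model_abort, model_reset, ...) are hypothetical, and it is not the ib_srp.c code nor the exact logic of the patch below. Instead of really freeing memory, the model sets a freed flag so a stale access can be counted safely. If a request stays on the pending queue after its command has been handed back to the midlayer and freed, the reset loop later touches that freed command; dequeuing the request before handing the command back avoids it.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_REQS 4

/* Hypothetical stand-in for struct scsi_cmnd. 'freed' replaces a real
 * free() so that touching a dead command can be detected and counted. */
struct model_cmd {
	int freed;
	int result;
};

/* Hypothetical stand-in for a request on the pending queue. */
struct model_req {
	struct model_cmd *scmnd;
	int queued;	/* 1 while the request is on the pending queue */
};

struct model_target {
	struct model_req req[MODEL_MAX_REQS];
};

static void model_queue_req(struct model_target *t, int idx,
			    struct model_cmd *cmd)
{
	assert(idx >= 0 && idx < MODEL_MAX_REQS && !t->req[idx].queued);
	t->req[idx].scmnd = cmd;
	t->req[idx].queued = 1;
}

/* Stand-in for srp_remove_req(): drop the request from the pending queue. */
static void model_remove_req(struct model_target *t, int idx)
{
	t->req[idx].queued = 0;
	t->req[idx].scmnd = NULL;
}

/* The error handler gives the command back to the midlayer, which then
 * frees it. 'remove' selects whether the request is dequeued first. */
static void model_abort(struct model_target *t, int idx, int remove)
{
	struct model_cmd *cmd = t->req[idx].scmnd;

	assert(t->req[idx].queued && cmd != NULL);
	if (remove)
		model_remove_req(t, idx);
	cmd->freed = 1;	/* the midlayer is now done with the command */
}

/* Reset path: complete everything still queued, like the loop over
 * target->req_queue. Returns how many completions would have touched
 * an already-freed command. */
static int model_reset(struct model_target *t)
{
	int stale = 0;

	for (int i = 0; i < MODEL_MAX_REQS; i++) {
		struct model_req *r = &t->req[i];

		if (!r->queued)
			continue;
		if (r->scmnd->freed)
			stale++;	/* real code: use-after-free here */
		else
			r->scmnd->result = 1;	/* placeholder for DID_RESET << 16 */
		model_remove_req(t, i);
	}
	return stale;
}
```

Queuing one command, aborting it with remove set to 0, and then calling model_reset() reports one stale completion; the same sequence with remove set to 1 reports none.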
With the following patch applied, my ia64 system no longer crashes.
I prepared this patch as a diff against srp revision 6455 with the
srp-params.patch I sent you last week applied.
Please let me know if you want this patch regenerated against the current
srp (revision 6550).
What is the status of srp-params.patch (which introduces tuned parameters)?
Thanks,
Vu
diff -Nau srp/ib_srp.c srp.6455.tuned_params/ib_srp.c
--- srp/ib_srp.c 2006-04-21 14:51:52.000000000 -0700
+++ srp.6455.tuned_params/ib_srp.c 2006-04-21 15:00:28.000000000 -0700
@@ -80,9 +80,6 @@
static void srp_remove_one(struct ib_device *device);
static void srp_completion(struct ib_cq *cq, void *target_ptr);
static int srp_cm_handler(struct ib_cm_id *cm_id, struct ib_cm_event *event);
-static void srp_unmap_data(struct scsi_cmnd *scmnd,
- struct srp_target_port *target,
- struct srp_request *req);
static struct ib_client srp_client = {
.name = "srp",
@@ -462,8 +459,6 @@
; /* nothing */
list_for_each_entry(req, &target->req_queue, list) {
- srp_unmap_data(req->scmnd, target, req);
-
req->scmnd->result = DID_RESET << 16;
req->scmnd->scsi_done(req->scmnd);
}
@@ -1200,24 +1195,15 @@
spin_unlock_irq(target->scsi_host->host_lock);
if (!wait_for_completion_timeout(&req->done,
- msecs_to_jiffies(SRP_ABORT_TIMEOUT_MS))) {
- if (func == SRP_TSK_ABORT_TASK) {
- srp_unmap_data(scmnd, target, req);
- srp_remove_req(target, req, req_index);
- }
+ msecs_to_jiffies(SRP_ABORT_TIMEOUT_MS)))
return FAILED;
- }
spin_lock_irq(target->scsi_host->host_lock);
- if ((req->cmd_done) && (func == SRP_TSK_ABORT_TASK)) {
- srp_unmap_data(scmnd, target, req);
+ if (req->cmd_done) {
srp_remove_req(target, req, req_index);
scmnd->scsi_done(scmnd);
} else if (!req->tsk_status) {
- if (func == SRP_TSK_ABORT_TASK) {
- srp_unmap_data(scmnd, target, req);
- srp_remove_req(target, req, req_index);
- }
+ srp_remove_req(target, req, req_index);
scmnd->result = DID_ABORT << 16;
ret = SUCCESS;
}
_______________________________________________
openib-general mailing list
[email protected]
http://openib.org/mailman/listinfo/openib-general
To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general