Sean,

While debugging some iwarp connection setup problems, I _might_ have stumbled onto a cma bug.

I'm running the kernel cmatose. The server side gets a connect request, but init_node() returns an error because the qp create fails. The cmatose module then rejects the connect request from within that connect request upcall. Concurrently, on the main work thread running run_server(), cmatose calls rdma_destroy_id() on the listening id. The destroy happens before the connect request upcall thread finishes (SMP :). Then, as the upcall thread unwinds the stack and finishes processing in iw_conn_req_handler(), the system Oopses in cma_release_remove() at line 1048 (with the iwarp cma patch).

I think the oops is because the listen_id was already destroyed and iw_conn_req_handler() didn't hold a reference to it, so the cma_release_remove() code is touching freed memory.

I _think_ the solution is to bump the listen_id refcnt at the top of cma_req_handler() and iw_conn_req_handler(), and do a cma_deref_id() on the listen_id at the end of those functions. I added this logic to the iwarp side and it appears to have fixed the problem.

Steve.
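
For illustration only, here is a minimal user-space sketch of the refcount pattern proposed above. It is not the actual patch and not the real cma code: fake_id, fake_id_get(), fake_id_put(), fake_destroy() and fake_req_handler() are made-up names, the refcount is a plain int rather than an atomic, and the concurrent rdma_destroy_id() is simulated by calling fake_destroy() in the middle of the handler.

```c
#include <assert.h>

/* Hypothetical stand-in for a cma id; NOT the real kernel structure. */
struct fake_id {
    int refcount;   /* plain int: this sketch is single-threaded */
    int destroyed;  /* set when the last reference is dropped */
};

static void fake_id_init(struct fake_id *id)
{
    id->refcount = 1;   /* the owner's reference */
    id->destroyed = 0;
}

static void fake_id_get(struct fake_id *id)
{
    id->refcount++;
}

static void fake_id_put(struct fake_id *id)
{
    assert(id->refcount > 0);
    if (--id->refcount == 0)
        id->destroyed = 1;  /* real code would release the memory here */
}

/* Models the destroy path dropping the owner's reference (assumption). */
static void fake_destroy(struct fake_id *id)
{
    fake_id_put(id);
}

/*
 * Models a connect-request handler. Returns 0 if listen_id was still
 * alive when the tail of the handler touched it, -1 if it was already
 * destroyed (the case that would oops in the kernel).
 */
static int fake_req_handler(struct fake_id *listen_id, int hold_ref)
{
    int alive;

    if (hold_ref)
        fake_id_get(listen_id);     /* bump refcnt at the top */

    /* The request is rejected; meanwhile the other thread destroys the id. */
    fake_destroy(listen_id);

    /* Tail of the handler still looks at listen_id. */
    alive = !listen_id->destroyed;

    if (hold_ref)
        fake_id_put(listen_id);     /* deref at the end */

    return alive ? 0 : -1;
}
```

Starting from a freshly initialized id, fake_req_handler(&id, 1) returns 0 because the handler's own reference keeps the id alive until its final put, while fake_req_handler(&id, 0) returns -1, which corresponds to the oops case described above.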
