Because ib_cancel_mad might invoke a callback that acquires the cm_id lock, the lock cannot be held when ib_cancel_mad is invoked.

[snip]
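To make the constraint concrete, here is a minimal user-space Python model, not kernel code. `cm_id_lock`, `send_callback()` and `cancel_mad_sync()` are hypothetical stand-ins for the cm_id lock, the user's send callback, and a cancel routine that runs that callback synchronously in the caller's thread. The sketch uses a non-blocking acquire so it reports the self-deadlock instead of hanging.

```python
import threading

# Hypothetical model, not the real MAD API:
#   cm_id_lock        -> the cm_id lock (non-reentrant)
#   cancel_mad_sync() -> a cancel routine that calls the send callback
#                        directly from the caller's thread
cm_id_lock = threading.Lock()


def send_callback():
    # The callback needs the cm_id lock. A non-blocking attempt lets the
    # demo report the would-be deadlock rather than hang forever.
    if cm_id_lock.acquire(blocking=False):
        cm_id_lock.release()
        return "acquired"
    return "would deadlock"


def cancel_mad_sync():
    # Synchronous cancel: the callback runs before cancel returns.
    return send_callback()


# Canceling without holding the lock works...
result_unlocked = cancel_mad_sync()

# ...but canceling while holding the lock means the callback tries to
# take a lock its own thread already owns.
with cm_id_lock:
    result_locked = cancel_mad_sync()

print(result_unlocked, "/", result_locked)  # acquired / would deadlock
```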
To fix this in the CM, the call to ib_cancel_mad just needs to move inside the cm_id lock. Alternatively, it may be possible to change ib_cancel_mad to cancel MADs based on a second set of criteria. Both of these would require changes to the MAD layer.
Studying the problem more, I believe that this problem exists for both the CM and SA query code. Unless there is an objection, I'll submit a patch that will invoke a user's send callback after a MAD has been canceled from one of the MAD threads, rather than directly from the user's thread. (Similar to how the process local MAD functionality is implemented.)
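A rough sketch of the proposed change, again as a hypothetical Python model rather than the MAD layer's actual interfaces: `cancel_mad()` only queues the canceled request, and a separate worker thread (standing in for a MAD thread) later invokes the send callback with a "canceled" status. Because the callback no longer runs in the canceling thread, the caller can hold `cm_id_lock` around the cancel; the callback simply waits until the lock is released.

```python
import queue
import threading

# Hypothetical model of deferring canceled-send callbacks to a worker
# thread. MadAgent, cancel_mad(), flush() and shutdown() are illustrative
# names only; they are not the ib_mad interfaces.


class MadAgent:
    def __init__(self):
        self._work = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        while True:
            item = self._work.get()
            if item is None:  # shutdown sentinel
                self._work.task_done()
                break
            callback, mad, status = item
            callback(mad, status)  # runs in the worker, not the user's thread
            self._work.task_done()

    def cancel_mad(self, mad, callback):
        # Never calls back directly, so callers may hold their own locks.
        self._work.put((callback, mad, "canceled"))

    def flush(self):
        # Wait until every queued callback has run.
        self._work.join()

    def shutdown(self):
        self._work.put(None)
        self._worker.join()


cm_id_lock = threading.Lock()
completed = []


def send_callback(mad, status):
    # Runs in the worker thread, so it just waits for the caller to drop
    # the lock instead of deadlocking on it.
    with cm_id_lock:
        on_main = threading.current_thread() is threading.main_thread()
        completed.append((mad, status, on_main))


agent = MadAgent()
with cm_id_lock:  # cancel can now be called with the lock held
    agent.cancel_mad("mad-1", send_callback)
agent.flush()  # only after releasing the lock
agent.shutdown()
print(completed)  # [('mad-1', 'canceled', False)]
```

One caveat under this model: the caller must not wait for the deferred callback (here, `agent.flush()`) while still holding the lock, or the deadlock just moves to the worker thread.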
This will allow locking around the cancel routine, which should fix the problem in the CM code. However, I don't think that locking around the cancel routine eliminates the issue in the SA query code, and I don't see a simple fix for that case.
- Sean
_______________________________________________
openib-general mailing list
[email protected]
http://openib.org/mailman/listinfo/openib-general
