Hi Sean,

I agree on the race between the threads, and this is something that I had considered as a separate problem (but now it comes back to haunt me :-).

An easier solution for this problem is to make sure that whoever gets the agent (ib_mad_recv_done_handler) validates the mad_agent before calling us. Basically, find_mad_agent can hold a refcnt on the agent. Is that correct? If so, I can make a patch to handle races on that front.

This code is pretty complicated, so please let me know if I have grossly misstated something (agents and agent_private, and whatnot :-).

Thanks for your feedback,

- KK

On Mon, 1 Nov 2004, Sean Hefty wrote:

> On Mon, 1 Nov 2004 16:38:03 -0800 (PST)
> Krishna Kumar <[EMAIL PROTECTED]> wrote:
>
> > Hi Sean,
> >
> > I think it is reasonable to have current senders racing with
> > unregister. The unregister is waiting for all references to drop to
> > zero before freeing up the resources. It killed the ones waiting for
> > responses(mad_cancel), killed the ones who are executing in callback
> > handlers, and finally after dropping the loader's module refcnt, it
> > waits for the refcnt to drop to zero. These can only be threads which
> > are actively receiving mad packets and those threads in the process of
> > sending mad packets while the unregister was going on (and the ones
> > which fail is the only cause of the problem). Essentially I think the
> > unregister will hang and not free up the resource.
>
> The difference here is that a client is calling into the API at the same
> time that they are trying to unregister. The code, even with this
> change, cannot handle this condition.
>
> For example, if the thread calling ib_unregister_mad_agent executes
> completely before the thread calling ib_post_send_mad runs (or can take
> a reference on the mad_agent), the mad_agent is no longer valid, and the
> structure will have been freed. The thread executing ib_post_send_mad
> can crash the system at this point.
>
> If we want to allow a client to call ib_unregister_mad_agent and
> ib_post_send_mad simultaneously, then ib_post_send_mad would need to
> perform some sort of lookup (likely in some global map) to validate the
> mad_agent.
>
> - Sean

_______________________________________________
openib-general mailing list
[EMAIL PROTECTED]
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
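
To make the two ideas in this thread concrete (a lookup that takes a refcnt on the agent, and an unregister that waits for the refcnt to drop to zero), here is a rough userspace sketch. It is only a model under stated assumptions, not the ib_mad code: struct agent, agent_table, agent_register, agent_find_get, agent_put and agent_unregister are all hypothetical names, a pthread mutex and condition variable stand in for whatever locking the kernel code would use, and agents are looked up by a small integer id rather than by pointer.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_AGENTS 8

/* Hypothetical stand-in for a MAD agent; not the real ib_mad structure. */
struct agent {
    int id;
    int refcnt;   /* protected by table_lock */
};

static struct agent *agent_table[MAX_AGENTS];   /* the "global map" */
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t refs_dropped = PTHREAD_COND_INITIALIZER;

/* Register an agent under id; the table itself holds one reference. */
int agent_register(int id)
{
    struct agent *a;

    if (id < 0 || id >= MAX_AGENTS)
        return -1;
    a = malloc(sizeof(*a));
    if (!a)
        return -1;
    a->id = id;
    a->refcnt = 1;

    pthread_mutex_lock(&table_lock);
    if (agent_table[id]) {           /* slot already in use */
        pthread_mutex_unlock(&table_lock);
        free(a);
        return -1;
    }
    agent_table[id] = a;
    pthread_mutex_unlock(&table_lock);
    return 0;
}

/* Look up an agent and take a reference in one locked step, so a
 * concurrent unregister either sees the reference or has already
 * removed the agent. Returns NULL if the id is not registered. */
struct agent *agent_find_get(int id)
{
    struct agent *a = NULL;

    pthread_mutex_lock(&table_lock);
    if (id >= 0 && id < MAX_AGENTS && agent_table[id]) {
        a = agent_table[id];
        a->refcnt++;
    }
    pthread_mutex_unlock(&table_lock);
    return a;
}

/* Drop a reference taken by agent_find_get(). */
void agent_put(struct agent *a)
{
    pthread_mutex_lock(&table_lock);
    assert(a->refcnt > 0);
    if (--a->refcnt == 0)
        pthread_cond_broadcast(&refs_dropped);
    pthread_mutex_unlock(&table_lock);
}

/* Remove the agent from the table so no new references can be taken,
 * then wait for existing users to finish before freeing it. */
int agent_unregister(int id)
{
    struct agent *a;

    pthread_mutex_lock(&table_lock);
    if (id < 0 || id >= MAX_AGENTS || !agent_table[id]) {
        pthread_mutex_unlock(&table_lock);
        return -1;
    }
    a = agent_table[id];
    agent_table[id] = NULL;
    a->refcnt--;                     /* drop the table's reference */
    while (a->refcnt > 0)
        pthread_cond_wait(&refs_dropped, &table_lock);
    pthread_mutex_unlock(&table_lock);
    free(a);
    return 0;
}
```

In this model a sender would call agent_find_get() with its id before touching the agent and agent_put() when done. A send that starts after unregister gets NULL back instead of a freed pointer, which is the case Sean describes. The cost is a lock and a lookup on every send, and the id-based interface is an assumption of the sketch, not something the current API provides.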
