>Once the cm_id is connected, the provider must post a CLOSE event
>when it is done with the cm_id. That's the model. The IWCM will not
>free the cm_id until the CLOSE upcall happens. Adding an explicit
>alloc_context/dealloc_context in the provider will just push this logic
>down into each provider. IE: The chelsio provider would block the
>dealloc_context call until the LLP connection is fully shut down.

I just think that this approach is susceptible to subtle race conditions
that will be extremely difficult to debug. So far, all of the patches
submitted have had some sort of race. I do not know if there's a race in
the latest submission. I'm just saying that the destruction is complex,
involving a cm_id state, a bit-flag, an event state, and a reference count,
which makes it difficult to verify its correctness.

For example, as soon as the user calls connect(), can they receive a CLOSE
event, even before the connect() call returns? If so, are there any issues
here? Is it possible for the user to call down to the provider after the
provider has generated a CLOSE event, resulting in accessing the wrong
connection or crashing in the provider?

Note that I'm not saying that providers need to block a call until
everything is shut down. A provider only needs to ensure that no callbacks
will occur after dealloc_context() returns. Destroy_listen() should be
providing similar logic, so it ends up being in each provider anyway.

At this point, I'm still trying to understand the operation. When does the
provider allocate a context for the user? My guess is when calling
connect() or listen(). When does the provider deallocate this context? If
it's not always done in response to the user invoking a function, then
we're almost certain to have a race.

- Sean
_______________________________________________
openib-general mailing list
[email protected]
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
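
To make the dealloc_context() guarantee discussed above concrete, here is a
minimal user-space sketch in C with pthreads. It is not taken from the IWCM
or from any provider: struct conn_ctx, ctx_init(), ctx_deliver(),
ctx_dealloc(), and count_event() are all hypothetical names invented for
illustration. The only property it tries to show is that once the dealloc
call returns, no callback is running and none will start, without waiting
for the connection itself to finish shutting down.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical per-connection context; not an IWCM or provider structure. */
struct conn_ctx {
    pthread_mutex_t lock;
    pthread_cond_t  idle;       /* signalled when in_flight drops to 0 */
    int             in_flight;  /* callbacks currently executing */
    bool            dead;       /* set once dealloc has started */
    void          (*handler)(int event, void *arg);
    void           *arg;
};

void ctx_init(struct conn_ctx *ctx, void (*handler)(int, void *), void *arg)
{
    pthread_mutex_init(&ctx->lock, NULL);
    pthread_cond_init(&ctx->idle, NULL);
    ctx->in_flight = 0;
    ctx->dead = false;
    ctx->handler = handler;
    ctx->arg = arg;
}

/* Provider side: try to deliver an event. Returns false if the context
 * has already been deallocated, in which case the handler is not called. */
bool ctx_deliver(struct conn_ctx *ctx, int event)
{
    pthread_mutex_lock(&ctx->lock);
    if (ctx->dead) {
        pthread_mutex_unlock(&ctx->lock);
        return false;
    }
    ctx->in_flight++;
    pthread_mutex_unlock(&ctx->lock);

    ctx->handler(event, ctx->arg);      /* called without the lock held */

    pthread_mutex_lock(&ctx->lock);
    if (--ctx->in_flight == 0)
        pthread_cond_broadcast(&ctx->idle);
    pthread_mutex_unlock(&ctx->lock);
    return true;
}

/* User side: after this returns, no handler is running and none will be
 * started. It waits only for in-flight callbacks, not for the connection
 * to shut down. It must not be called from inside the handler, which
 * would deadlock. The memory must still outlive any provider thread that
 * can call ctx_deliver(); a real provider would pair this with a
 * reference count. */
void ctx_dealloc(struct conn_ctx *ctx)
{
    pthread_mutex_lock(&ctx->lock);
    assert(!ctx->dead);                 /* catch a double dealloc */
    ctx->dead = true;
    while (ctx->in_flight > 0)
        pthread_cond_wait(&ctx->idle, &ctx->lock);
    pthread_mutex_unlock(&ctx->lock);
}

/* Example handler: counts delivered events in the int that arg points to. */
void count_event(int event, void *arg)
{
    (void)event;
    ++*(int *)arg;
}
```

In this sketch the race is settled in one place: an event that arrives
before ctx_dealloc() is delivered, and one that arrives afterwards is
dropped, so the caller never has to reason about a separate state, flag,
and reference count to know whether a late CLOSE can still reach it.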
