The completion manager abstraction can hang during cleanup. The hang occurs when the user calls CompManagerCancel before CompManagerClose and is not waiting for events. In this case, the completion manager thread pulls the cancel request from the IO completion port and queues it with the manager, where it stays because nobody polls for it. When CompManagerClose is called, it calls CompManagerCancel to signal the thread to check the running state. However, the completion manager's event structure is still marked busy from the user's earlier CompManagerCancel call.
The result is that the completion manager thread does not receive the signal to check the running flag and remains asleep. Fix this by using a different completion entry to signal the thread during destruction than that used to cancel a CompManagerPoll event.

This fixes occasional hangs running dapltest with both the rdma_cm and socket cm providers.

Signed-off-by: Sean Hefty <[email protected]>
---
Index: comp_channel.cpp
===================================================================
--- comp_channel.cpp	(revision 2311)
+++ comp_channel.cpp	(working copy)
@@ -102,8 +102,12 @@
 
 void CompManagerClose(COMP_MANAGER *pMgr)
 {
+	COMP_CHANNEL *channel;
+	COMP_ENTRY entry;
+
 	pMgr->Run = FALSE;
-	CompManagerCancel(pMgr);
+	CompEntryInit(NULL, &entry);
+	PostQueuedCompletionStatus(pMgr->CompQueue, 0, (ULONG_PTR) pMgr, &entry.Overlap);
 	WaitForSingleObject(pMgr->Thread, INFINITE);
 
 	CloseHandle(pMgr->Thread);
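
For anyone who wants to reason about the failure without a Windows box, here is a rough single-threaded model of the sequence described above. It is only a sketch and not the winverbs code: Entry, Manager, Post, Cancel, ThreadPullOne, CloseWithSharedEntry and CloseWithOwnEntry are hypothetical names, a std::deque stands in for the IO completion port, and it assumes that posting an entry that is already marked busy is silently skipped.

```cpp
#include <deque>

// Hypothetical stand-in for COMP_ENTRY. Busy stays set from the moment the
// entry is posted until its owner consumes the completion.
struct Entry {
    bool busy = false;
};

// Hypothetical stand-in for COMP_MANAGER.
struct Manager {
    std::deque<Entry*> port;     // models the IO completion port
    std::deque<Entry*> pending;  // events the thread has queued for the user
    Entry event;                 // shared entry used by user cancels
    bool run = true;
};

// Assumption: posting an entry that is already busy is a no-op.
bool Post(Manager& mgr, Entry& entry) {
    if (entry.busy) return false;
    entry.busy = true;
    mgr.port.push_back(&entry);
    return true;
}

// Models CompManagerCancel: posts the manager's shared event entry.
bool Cancel(Manager& mgr) { return Post(mgr, mgr.event); }

// Models one pass of the manager thread: move a completion from the port to
// the pending list. The entry stays busy until the user polls for it.
void ThreadPullOne(Manager& mgr) {
    if (mgr.port.empty()) return;
    mgr.pending.push_back(mgr.port.front());
    mgr.port.pop_front();
}

// Old close path: reuse the shared entry to wake the thread.
// Returns true if a wake-up actually reached the port.
bool CloseWithSharedEntry(Manager& mgr) {
    mgr.run = false;
    return Cancel(mgr);
}

// Patched close path: wake the thread with a dedicated entry.
bool CloseWithOwnEntry(Manager& mgr) {
    Entry wake;
    mgr.run = false;
    bool posted = Post(mgr, wake);
    // Model the thread dequeuing the wake-up before 'wake' goes out of scope.
    // The patch gets the same lifetime guarantee by waiting on the thread.
    if (posted) mgr.port.pop_back();
    return posted;
}
```

In this model, after Cancel followed by ThreadPullOne, CloseWithSharedEntry returns false because the shared entry is still busy, so nothing reaches the port. CloseWithOwnEntry returns true regardless of the shared entry's state, which is the property the patch relies on.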
