[ https://issues.apache.org/jira/browse/ZOOKEEPER-763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12864488#action_12864488 ]
Henry Robinson commented on ZOOKEEPER-763:
------------------------------------------

Kapil - Thanks! Adding that sleep helped me understand what was going on.

pyzoo_close holds the GIL but blocks inside zookeeper_close, waiting for the completion thread to finish. However, if a completion is still inside Python but has been pre-empted by the main thread, which then calls pyzoo_close, the completion can't get the GIL back to finish executing, so the completion thread blocks forever.

The fix is simple: relinquish the GIL during the zookeeper_close call, and then reacquire it straight after. There are even handy macros to do this:

    Py_BEGIN_ALLOW_THREADS
    ret = zookeeper_close(zhandles[zkhid]);
    Py_END_ALLOW_THREADS

This same issue will affect any part of zkpython where a call to the C client is blocked on some work being completed in another Python thread - in practice, I think this means from callbacks. I'll audit the code to see if any other API calls are affected.

Patch to fix this issue is following shortly - Kapil, I'd be very grateful if you could help us by testing it.

cheers,
Henry

> Deadlock on close w/ zkpython / c client
> ----------------------------------------
>
>                 Key: ZOOKEEPER-763
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-763
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: c client, contrib-bindings
>    Affects Versions: 3.3.0
>         Environment: ubuntu 10.04, zookeeper 3.3.0 and trunk
>            Reporter: Kapil Thangavelu
>            Assignee: Mahadev konar
>             Fix For: 3.4.0
>
>         Attachments: deadlock.py, deadlock_v2.py, stack-trace-deadlock.txt
>
>
> deadlocks occur if we attempt to close a handle while there are any
> outstanding async requests (aget, acreate, etc). Normally on close both the
> io thread and the completion thread are terminated and joined,
> however with outstanding async requests, the completion thread won't be in a
> joinable state, and we effectively hang when the main thread does the join.
> afaics ideal behavior would be on close of a handle, to effectively clear out
> any remaining callbacks and let the completion thread terminate.
> i've tried adding some bookkeeping within a python client to guard against
> closing while there is an outstanding async completion request, but it's an
> imperfect solution since even after the python callback is executed there is
> still a window for deadlock before the completion thread finishes the
> callback.
> a simple example to reproduce the deadlock is attached.
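
The deadlock described in the comment, and the effect of releasing the GIL around the blocking call, can be illustrated with a small pure-Python simulation. This is only a sketch under stated assumptions: the `gil` lock is an ordinary `threading.Lock` standing in for the real interpreter lock, `simulate_close` is a hypothetical helper and not part of zkpython, and the join timeout stands in for what would be an indefinite hang in the real client.

```python
import threading

# Stand-in for the interpreter lock. The real GIL is managed by CPython;
# an ordinary lock is used here purely for illustration.
gil = threading.Lock()


def simulate_close(release_gil_during_join, timeout=0.5):
    """Return True if the simulated close completes, False if it would hang.

    Hypothetical helper: models the main thread calling a blocking close
    while a completion callback still needs the lock to finish.
    """

    def completion_callback():
        # The callback was pre-empted "inside Python" and must get the
        # lock back before it can finish.
        with gil:
            pass

    worker = threading.Thread(target=completion_callback, daemon=True)

    gil.acquire()  # main thread holds the lock, as pyzoo_close holds the GIL
    worker.start()

    if release_gil_during_join:
        gil.release()         # analogous to Py_BEGIN_ALLOW_THREADS
        worker.join(timeout)  # analogous to the blocking zookeeper_close call
        gil.acquire()         # analogous to Py_END_ALLOW_THREADS
    else:
        # Joining while still holding the lock: the worker can never finish.
        # The timeout only exists so the demonstration terminates.
        worker.join(timeout)

    finished = not worker.is_alive()
    gil.release()  # lets the worker finish in the hung case, too
    return finished
```

Calling `simulate_close(False)` models the original behaviour and reports that the join did not complete, while `simulate_close(True)` models the proposed fix and completes, because the callback can take the lock while the main thread waits.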