Henry Robinson commented on ZOOKEEPER-763:
Thanks! Adding that sleep helped me understand what was going on.
pyzoo_close holds the GIL but blocks inside zookeeper_close, waiting for the
completion thread to finish. However, if a completion is still executing Python
code when it is pre-empted by the main thread, which then calls pyzoo_close,
the completion can't get the GIL back to finish executing, and the
completions_thread blocks forever. The fix is simple: relinquish the GIL during
the zookeeper_close call and reacquire it straight after. Python's C API even
provides handy macros to do this:

    Py_BEGIN_ALLOW_THREADS
    ret = zookeeper_close(zhandles[zkhid]);
    Py_END_ALLOW_THREADS
The same issue will affect any part of zkpython where a call to the C client
blocks on work being completed in another Python thread; in practice, I think
this means callbacks. I'll audit the code to see whether any other API calls
are affected. A patch to fix this issue will follow shortly. Kapil, I'd be very
grateful if you could help us by testing it.
> Deadlock on close w/ zkpython / c client
> Key: ZOOKEEPER-763
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-763
> Project: Zookeeper
> Issue Type: Bug
> Components: c client, contrib-bindings
> Affects Versions: 3.3.0
> Environment: ubuntu 10.04, zookeeper 3.3.0 and trunk
> Reporter: Kapil Thangavelu
> Assignee: Mahadev konar
> Fix For: 3.4.0
> Attachments: deadlock.py, deadlock_v2.py, stack-trace-deadlock.txt
> Deadlocks occur if we attempt to close a handle while there are any
> outstanding async requests (aget, acreate, etc.). Normally, on close, both the
> IO thread and the completion thread are terminated and joined; however, with
> outstanding async requests the completion thread won't be in a joinable
> state, and we effectively hang when the main thread does the join.
> As far as I can see, the ideal behavior on close of a handle would be to
> clear out any remaining callbacks and let the completion thread terminate.
> I've tried adding some bookkeeping within a Python client to guard against
> closing while there is an outstanding async completion request, but it's an
> imperfect solution, since even after the Python callback is executed there is
> still a window for deadlock before the completion thread finishes the
> A simple example to reproduce the deadlock is attached.
This message is automatically generated by JIRA.