Hi Folks,

I'm constructing an async api on top of the zookeeper python bindings for
twisted. The intent was to make a thin wrapper that would wrap the existing
async api with one that allows for integration with the twisted python event
loop (http://www.twistedmatrix.com) primarily using the async apis.

One issue i'm running into while developing a unit tests, deadlocks occur if
we attempt to close a handle while there are any outstanding async requests
(aget, acreate, etc). Normally on close both the io thread terminates and
the completion thread are terminated and joined, however w\ith outstanding
async requests, the completion thread won't be in a joinable state, and we
effectively hang when the main thread does the join.

I'm curious if this would be considered bug, afaics ideal behavior would be
on close of a handle, to effectively clear out any remaining callbacks and
let the completion thread terminate.

i've tried adding some bookkeeping to the api to guard against closing while
there is an outstanding completion request, but its an imperfect solution do
to the nature of the event loop integration. The problem is that the python
callback invoked by the completion thread in turn schedules a function for
the main thread. In twisted the api for this is implemented by appending the
function to a list attribute on the reactor and then writing a byte to a
pipe to wakeup the main thread. If a thread switch to the main thread occurs
before the completion thread callback returns, the scheduled function runs
and the rest of the application keeps processing, of which the last step for
the unit tests is to close the connection, which results in a deadlock.

i've included some of the client log and gdb stack traces from a deadlock'd
client process.

thanks,

Kapil
This is all against zookeeper 3.3.0

Log Trace From a Deadlock

  access prefixed lines in the log are a wrapper around the zookeeper module 
which prints the name of any zookeeper module attributes that are accessed by 
the client.

access set_debug_level
access LOG_LEVEL_DEBUG
access init
2010-05-01 12:55:50,861:15428(0xb77b86c0):zoo_i...@log_env@658: Client 
environment:zookeeper.version=zookeeper C client 3.3.0
2010-05-01 12:55:50,861:15428(0xb77b86c0):zoo_i...@log_env@662: Client 
environment:host.name=kapil-laptop
2010-05-01 12:55:50,861:15428(0xb77b86c0):zoo_i...@log_env@669: Client 
environment:os.name=Linux
2010-05-01 12:55:50,861:15428(0xb77b86c0):zoo_i...@log_env@670: Client 
environment:os.arch=2.6.32-21-generic
2010-05-01 12:55:50,861:15428(0xb77b86c0):zoo_i...@log_env@671: Client 
environment:os.version=#32-Ubuntu SMP Fri Apr 16 08:10:02 UTC 2010
2010-05-01 12:55:50,861:15428(0xb77b86c0):zoo_i...@log_env@679: Client 
environment:user.name=(null)
2010-05-01 12:55:50,861:15428(0xb77b86c0):zoo_i...@log_env@687: Client 
environment:user.home=/home/kapil
2010-05-01 12:55:50,861:15428(0xb77b86c0):zoo_i...@log_env@699: Client 
environment:user.dir=/home/kapil/projects/txzookeeper/trunk/_trial_temp
2010-05-01 12:55:50,861:15428(0xb77b86c0):zoo_i...@zookeeper_init@727: 
Initiating client connection, host=127.0.0.1:2181 sessionTimeout=30000 
watcher=0xfc2ee0 sessionId=0 sessionPasswd=<null> context=0xa64c898 flags=0
2010-05-01 12:55:50,861:15428(0xb77b86c0):zoo_de...@start_threads@152: starting 
threads...
2010-05-01 12:55:50,861:15428(0xb7464b70):zoo_de...@do_io@279: started IO thread
2010-05-01 12:55:50,862:15428(0xb6c63b70):zoo_de...@do_completion@326: started 
completion thread
2010-05-01 12:55:50,862:15428(0xb7464b70):zoo_i...@check_events@1585: initiated 
connection to server [127.0.0.1:2181]
2010-05-01 12:55:50,913:15428(0xb7464b70):zoo_i...@check_events@1632: session 
establishment complete on server [127.0.0.1:2181], sessionId=0x12853ee006c0005, 
negotiated timeout=30000
2010-05-01 12:55:50,913:15428(0xb7464b70):zoo_de...@check_events@1638: Calling 
a watcher for a ZOO_SESSION_EVENT and the state=ZOO_CONNECTED_STATE
2010-05-01 12:55:50,913:15428(0xb6c63b70):zoo_de...@process_completions@1765: 
Calling a watcher for node [], type = -1 event=ZOO_SESSION_EVENT
access CONNECTED_STATE
access EPHEMERAL
access acreate
2010-05-01 12:55:50,913:15428(0xb77b86c0):zoo_de...@zoo_acreate@2489: Sending 
request xid=0x4bdc24d7 for path [/foobar] to 127.0.0.1:2181
access OK
2010-05-01 12:55:50,922:15428(0xb7464b70):zoo_de...@zookeeper_process@1980: 
Queueing asynchronous response
2010-05-01 12:55:50,922:15428(0xb6c63b70):zoo_de...@process_completions@1817: 
Calling COMPLETION_STRING for xid=0x4bdc24d7 rc=0
access NODEEXISTS
access aget
2010-05-01 12:55:50,922:15428(0xb77b86c0):zoo_de...@zoo_awget@2400: Sending 
request xid=0x4bdc24d8 for path [/foobar] to 127.0.0.1:2181
access OK
2010-05-01 12:55:50,923:15428(0xb7464b70):zoo_de...@zookeeper_process@1980: 
Queueing asynchronous response
2010-05-01 12:55:50,923:15428(0xb6c63b70):zoo_de...@process_completions@1772: 
Calling COMPLETION_DATA for xid=0x4bdc24d8 rc=0
access close
2010-05-01 12:55:50,924:15428(0xb7464b70):zoo_de...@do_io@317: IO thread 
terminated

Gdb Stack Trace

(gdb) bt
#0  0x00311422 in __kernel_vsyscall ()
#1  0x00413b5d in pthread_join () from /lib/tls/i686/cmov/libpthread.so.0
#2  0x00d6518d in adaptor_finish (zh=0x95d9778) at src/mt_adaptor.c:219
#3  0x00d5b0dd in zookeeper_close (zh=0x95d9778) at src/zookeeper.c:2283
#4  0x00674bcb in pyzoo_close (self=0x0, args=(0,)) at src/c/zookeeper.c:975
#5  0x080e0a21 in call_function (f=
    Frame 0x95e198c, for file 
/home/kapil/projects/txzookeeper/trunk/txzookeeper/client.py, line 147, in 
close (self=<ZookeeperClient(handle=0, _async_calls=[], connected=False, 
servers='127.0.0.1:2181') at remote 0x95886cc>), throwflag=0) at 
../Python/ceval.c:3750
#6  PyEval_EvalFrameEx (f=
    Frame 0x95e198c, for file 
/home/kapil/projects/txzookeeper/trunk/txzookeeper/client.py, line 147, in 
close (self=<ZookeeperClient(handle=0, _async_calls=[], connected=False, 
servers='127.0.0.1:2181') at remote 0x95886cc>), throwflag=0) at 
../Python/ceval.c:2412
#7  0x080e1bb0 in fast_function (f=
    Frame 0x95e182c, for file 
/home/kapil/projects/txzookeeper/trunk/txzookeeper/tests/test_client.py, line 
110, in check_node_content (value=('rabbit', {'pzxid': 4533L, 'ctime': 
1272735260049L, 'aversion': 0, 'mzxid': 4533L, 'numChildren': 0, 
'ephemeralOwner': 83408874780753938L, 'version': 0, 'dataLength': 6, 'mtime': 
1272735260049L, 'cversion': 0, 'czxid': 4533L})), throwflag=0) at 
../Python/ceval.c:3836
#8  call_function (f=
    Frame 0x95e182c, for file 
/home/kapil/projects/txzookeeper/trunk/txzookeeper/tests/test_client.py, line 
110, in check_node_content (value=('rabbit', {'pzxid': 4533L, 'ctime': 
1272735260049L, 'aversion': 0, 'mzxid': 4533L, 'numChildren': 0, 
'ephemeralOwner': 83408874780753938L, 'version': 0, 'dataLength': 6, 'mtime': 
1272735260049L, 'cversion': 0, 'czxid': 4533L})), throwflag=0) at 
../Python/ceval.c:3771
#9  PyEval_EvalFrameEx (f=
    Frame 0x95e182c, for file 
/home/kapil/projects/txzookeeper/trunk/txzookeeper/tests/test_client.py, line 
110, in check_node_content (value=('rabbit', {'pzxid': 4533L, 'ctime': 
1272735260049L, 'aversion': 0, 'mzxid': 4533L, 'numChildren': 0, 
'ephemeralOwner': 83408874780753938L, 'version': 0, 'dataLength': 6, 'mtime': 
1272735260049L, 'cversion': 0, 'czxid': 4533L})), throwflag=0) at 
../Python/ceval.c:2412
#10 0x080e2807 in PyEval_EvalCodeEx (co=0x957a530, globals=




So for this i've artifically inserted a sleep call for the main thread before 
invoking zookeeper.close on the handle.


access set_debug_level
access LOG_LEVEL_DEBUG
access init
2010-05-01 16:43:34,238:15990(0xb77ec6c0):zoo_i...@log_env@658: Client 
environment:zookeeper.version=zookeeper C client 3.3.0
2010-05-01 16:43:34,238:15990(0xb77ec6c0):zoo_i...@log_env@662: Client 
environment:host.name=kapil-laptop
2010-05-01 16:43:34,238:15990(0xb77ec6c0):zoo_i...@log_env@669: Client 
environment:os.name=Linux
2010-05-01 16:43:34,238:15990(0xb77ec6c0):zoo_i...@log_env@670: Client 
environment:os.arch=2.6.32-21-generic
2010-05-01 16:43:34,238:15990(0xb77ec6c0):zoo_i...@log_env@671: Client 
environment:os.version=#32-Ubuntu SMP Fri Apr 16 08:10:02 UTC 2010
2010-05-01 16:43:34,238:15990(0xb77ec6c0):zoo_i...@log_env@679: Client 
environment:user.name=(null)
2010-05-01 16:43:34,238:15990(0xb77ec6c0):zoo_i...@log_env@687: Client 
environment:user.home=/home/kapil
2010-05-01 16:43:34,238:15990(0xb77ec6c0):zoo_i...@log_env@699: Client 
environment:user.dir=/home/kapil/projects/txzookeeper/trunk/_trial_temp
2010-05-01 16:43:34,238:15990(0xb77ec6c0):zoo_i...@zookeeper_init@727: 
Initiating client connection, host=127.0.0.1:2181 sessionTimeout=30000 
watcher=0x134ee0 sessionId=0 sessionPasswd=<null> context=0x9b6f898 flags=0
2010-05-01 16:43:34,238:15990(0xb77ec6c0):zoo_de...@start_threads@152: starting 
threads...
2010-05-01 16:43:34,239:15990(0xb6c97b70):zoo_de...@do_completion@326: started 
completion thread
2010-05-01 16:43:34,239:15990(0xb7498b70):zoo_de...@do_io@279: started IO thread
2010-05-01 16:43:34,239:15990(0xb7498b70):zoo_i...@check_events@1585: initiated 
connection to server [127.0.0.1:2181]
2010-05-01 16:43:34,291:15990(0xb7498b70):zoo_i...@check_events@1632: session 
establishment complete on server [127.0.0.1:2181], sessionId=0x12853ee006c0008, 
negotiated timeout=30000
2010-05-01 16:43:34,291:15990(0xb7498b70):zoo_de...@check_events@1638: Calling 
a watcher for a ZOO_SESSION_EVENT and the state=ZOO_CONNECTED_STATE
2010-05-01 16:43:34,291:15990(0xb6c97b70):zoo_de...@process_completions@1765: 
Calling a watcher for node [], type = -1 event=ZOO_SESSION_EVENT
access CONNECTED_STATE
access EPHEMERAL
access acreate
2010-05-01 16:43:34,291:15990(0xb77ec6c0):zoo_de...@zoo_acreate@2489: Sending 
request xid=0x4bdc5a37 for path [/foobar] to 127.0.0.1:2181
access OK
2010-05-01 16:43:34,299:15990(0xb7498b70):zoo_de...@zookeeper_process@1980: 
Queueing asynchronous response
2010-05-01 16:43:34,299:15990(0xb6c97b70):zoo_de...@process_completions@1817: 
Calling COMPLETION_STRING for xid=0x4bdc5a37 rc=0
access NODEEXISTS
access aget
2010-05-01 16:43:34,299:15990(0xb77ec6c0):zoo_de...@zoo_awget@2400: Sending 
request xid=0x4bdc5a38 for path [/foobar] to 127.0.0.1:2181
access OK
2010-05-01 16:43:34,299:15990(0xb7498b70):zoo_de...@zookeeper_process@1980: 
Queueing asynchronous response
2010-05-01 16:43:34,299:15990(0xb6c97b70):zoo_de...@process_completions@1772: 
Calling COMPLETION_DATA for xid=0x4bdc5a38 rc=0
access state
access close
2010-05-01 16:43:34,400:15990(0xb7498b70):zoo_de...@do_io@317: IO thread 
terminated
2010-05-01 16:43:34,400:15990(0xb6c97b70):zoo_de...@do_completion@336: 
completion thread terminated
2010-05-01 16:43:34,400:15990(0xb77ec6c0):zoo_i...@zookeeper_close@2290: 
Closing zookeeper sessionId=0x12853ee006c0008 to [127.0.0.1:2181]
access set_log_stream

Reply via email to