james strachan commented on ZOOKEEPER-63:
So this patch does not attempt to fix the race condition problem, apologies if
I gave that impression :)
What it does do though is act as a workaround, so that if a client is not able
to properly send a disconnect packet to the server for *any reason at all*, such as
* a hung socket (which can be quite common)
* no servers available
* a race condition in the ZK client code of some kind (which we definitely have)
the client application does not hang forever - as it's trying to close and shut
down anyway :). So it's a side benefit that it acts as a band aid until someone
fixes all the possible race conditions and potential socket hangs.
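To make the idea concrete, here is a minimal sketch of a close that gives up
after a timeout. It is not the attached patch and not the ZooKeeper client API:
BoundedClose, sendDisconnect, releaseLocalResources and timeoutMs are all
hypothetical names, and the disconnect is assumed to be something that can be
run on a helper thread.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch only; not the ZooKeeper client or the attached patch. */
class BoundedClose {
    /**
     * Tries to send the disconnect, but waits at most timeoutMs for it.
     * Returns true if the disconnect finished in time, false if we gave up.
     * Local resources are released in both cases.
     */
    static boolean close(Runnable sendDisconnect, Runnable releaseLocalResources, long timeoutMs) {
        // Daemon thread: a write stuck on a dead socket may ignore interrupts,
        // and it must not keep the JVM alive after the caller has moved on.
        ExecutorService helper = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "bounded-close");
            t.setDaemon(true);
            return t;
        });
        Future<?> pending = helper.submit(sendDisconnect);
        try {
            pending.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException | ExecutionException e) {
            // Timed out or failed: the server will expire the session on its own.
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            pending.cancel(true);        // no effect if the task already completed
            helper.shutdownNow();
            releaseLocalResources.run(); // always free sockets and threads locally
        }
    }
}
```

The caller gets a boolean back, so a test can still log that the close was not
clean without blocking on it.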
Let me put it another way. Given that the client is closing, is it really
correct to leave it potentially hanging around forever just because it cannot
be sure if the disconnect packet was received and properly processed by the
server? If the socket is dead before the call to close(), is it really correct
to block until a connection can be re-established, just so it can be properly
closed - when the code will effectively close the hung socket without sending a
disconnect packet anyway :) ?
The server has to detect and time out failed sessions, whether it receives an
explicit disconnect packet or not (as a process could just hang). So do we
really need to be super strict on the client side, forcing clients to block
when trying to shut down if they can't do so cleanly within some time period?
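As a rough illustration of why the server side cannot depend on the disconnect
packet, here is a toy session tracker. It is an assumption-laden sketch, not
ZooKeeper's server code: SessionExpirySketch, touch, disconnect and expire are
made-up names, and the clock is passed in explicitly to keep it deterministic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch only; not ZooKeeper's actual session handling. */
class SessionExpirySketch {
    private final long timeoutMs;
    // session id -> time (ms) at which we last heard from that session
    private final Map<Long, Long> lastHeard = new HashMap<>();

    SessionExpirySketch(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    /** Records any traffic (request or heartbeat) from a session. */
    synchronized void touch(long sessionId, long nowMs) {
        lastHeard.put(sessionId, nowMs);
    }

    /** Explicit disconnect: the fast path, used when the packet arrives. */
    synchronized boolean disconnect(long sessionId) {
        return lastHeard.remove(sessionId) != null;
    }

    /** Removes and returns every session that has been silent for timeoutMs or more. */
    synchronized List<Long> expire(long nowMs) {
        List<Long> expired = new ArrayList<>();
        Iterator<Map.Entry<Long, Long>> it = lastHeard.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Long, Long> entry = it.next();
            if (nowMs - entry.getValue() >= timeoutMs) {
                expired.add(entry.getKey());
                it.remove();
            }
        }
        return expired;
    }
}
```

A session that never sends its disconnect is cleaned up by expire() anyway, only
later, which is the case a hung or killed client process already forces the
server to handle.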
I totally agree that we should fix the race condition though :). I just wanted
a workaround to avoid my ZK test cases hanging forever due to the race condition.
> Race condition in client close() operation
> Key: ZOOKEEPER-63
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-63
> Project: Zookeeper
> Issue Type: Bug
> Components: java client
> Reporter: Patrick Hunt
> Assignee: Benjamin Reed
> Attachments: patch_ZOOKEEPER-63.patch
> There is a race condition in the java close operation on ZooKeeper.java.
> Client is sending a disconnect request to the server. Server will close any
> open connections with the client when it receives this. If the client has not
> yet shut down its subthreads (event/send threads for example) these threads
> may consider the condition an error. We see this a lot in the tests where the
> clients output error logs because they are unaware that a disconnection has
> been requested by the client.
> Ben mentioned: perhaps we just have to change state to closed (on client)
> before sending disconnect request.
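
One way to read Ben's suggestion, as a sketch only: flip the client state to
closed first, so that any thread which then sees the connection drop can tell it
was requested. CloseOrderingSketch, State, onConnectionLost and events are
hypothetical names and do not come from ZooKeeper.java.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical sketch only; names do not come from ZooKeeper.java. */
class CloseOrderingSketch {
    enum State { CONNECTED, CLOSED }

    private final AtomicReference<State> state = new AtomicReference<>(State.CONNECTED);
    private final List<String> events = Collections.synchronizedList(new ArrayList<>());

    /** Marks the client closed BEFORE the disconnect request goes out. */
    void close() {
        if (!state.compareAndSet(State.CONNECTED, State.CLOSED)) {
            return; // already closed; nothing more to send
        }
        events.add("disconnect request sent"); // stand-in for the real network write
    }

    /** Called by a send/event thread when it notices the connection is gone. */
    void onConnectionLost() {
        if (state.get() == State.CLOSED) {
            events.add("connection closed as requested");
        } else {
            events.add("ERROR: unexpected connection loss");
        }
    }

    /** Snapshot of what happened, in order. */
    List<String> events() {
        synchronized (events) {
            return new ArrayList<>(events);
        }
    }
}
```

Because the state change happens before the request is sent, the server's
reaction can never be observed by a subthread while the client still looks
connected.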