[jira] Commented: (ZOOKEEPER-63) Race condition in client close() operation
[ https://issues.apache.org/jira/browse/ZOOKEEPER-63?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12631506#action_12631506 ]

Patrick Hunt commented on ZOOKEEPER-63:
---

Thank you for taking the time to report the issue. Regards.

Race condition in client close() operation
--
Key: ZOOKEEPER-63
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-63
Project: Zookeeper
Issue Type: Bug
Components: java client
Reporter: Patrick Hunt
Assignee: Patrick Hunt
Attachments: client-test-fail.diff, ZOOKEEPER-63.patch

There is a race condition in the Java close operation in ZooKeeper.java. The client sends a disconnect request to the server, and the server closes any open connections with the client when it receives this request. If the client has not yet shut down its subthreads (the event and send threads, for example), these threads may consider the condition an error. We see this a lot in the tests, where the clients output error logs because those threads are unaware that the client itself requested the disconnection.

Ben mentioned: perhaps we just have to change state to closed (on client) before sending disconnect request.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
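Ben's suggestion in the description above can be illustrated with a minimal, single-threaded sketch. It is not the actual ZooKeeper client code: the class `ClosingFlagSketch`, its `closing` flag, and the methods `onConnectionLost()` and `sendDisconnect()` are hypothetical names invented for illustration. The only point is the ordering: the client records that it is closing before it sends the disconnect request, so the code that notices the dropped connection can tell an expected close from a real failure.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the client; NOT the real ZooKeeper classes.
class ClosingFlagSketch {
    // Set before the disconnect request goes out (the ordering Ben suggested).
    private final AtomicBoolean closing = new AtomicBoolean(false);

    // Collected log lines, so the behaviour can be inspected.
    final List<String> log = new ArrayList<>();

    // What a send/event thread would run when the server drops the connection.
    void onConnectionLost() {
        if (closing.get()) {
            log.add("INFO: connection closed after client-requested disconnect");
        } else {
            log.add("ERROR: unexpected connection loss");
        }
    }

    // Mark the client as closing first, then send the disconnect request.
    void close() {
        closing.set(true);
        sendDisconnect();
    }

    // Assumption: simulates a server that closes the connection immediately
    // on receiving the disconnect request.
    private void sendDisconnect() {
        onConnectionLost();
    }

    public static void main(String[] args) {
        ClosingFlagSketch client = new ClosingFlagSketch();
        client.close();
        System.out.println(client.log);
    }
}
```

If `close()` sent the disconnect before setting the flag, a real send thread could observe the dropped connection while `closing` was still false and log it as an error, which matches the symptom described in the issue.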
[jira] Commented: (ZOOKEEPER-63) Race condition in client close() operation
[ https://issues.apache.org/jira/browse/ZOOKEEPER-63?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12616426#action_12616426 ]

james strachan commented on ZOOKEEPER-63:
-

So this patch does not attempt to fix the race condition problem, apologies if I gave that impression :)

What it does do, though, is act as a workaround so that the client application does not hang forever if a client is not able to properly send a disconnect packet to the server for *any reason at all*, such as

* a hung socket (which can be quite common)
* no servers available
* a race condition in the ZK client code of some kind (which we definitely have now)

The client is trying to close and shut down anyway :). So it's a side benefit that it acts as a band-aid until someone fixes all the possible race conditions and potential socket hangs.

Let me put it another way. Given that the client is closing, is it really correct to leave it potentially hanging around forever just because it cannot be sure if the disconnect packet was received and properly processed by the server? If the socket is dead before the call to close(), is it really correct to block until a connection can be re-established, just so it can be properly closed, when the code will effectively close the hung socket without sending a disconnect packet anyway :) ?

The server has to detect and time out failed sessions whether it receives an explicit disconnect packet or not (as a process could just hang). So do we really need to be super strict on the client side, forcing clients to block when trying to shut down if they can't do so cleanly within some time period?

I totally agree that we should fix the race condition though :). I just wanted a workaround to avoid my ZK test cases hanging forever due to the race condition :)

Race condition in client close() operation
--
Key: ZOOKEEPER-63
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-63
Project: Zookeeper
Issue Type: Bug
Components: java client
Reporter: Patrick Hunt
Assignee: Benjamin Reed
Attachments: patch_ZOOKEEPER-63.patch
[jira] Commented: (ZOOKEEPER-63) Race condition in client close() operation
[ https://issues.apache.org/jira/browse/ZOOKEEPER-63?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12616136#action_12616136 ]

james strachan commented on ZOOKEEPER-63:
-

I wonder if I've seen this too - I can reliably get a hung test when trying to close a client (though the server is still up at the point of the hang).

I'm thinking the close() method should not wait() forever on the disconnect packet, just a closeTimeout length - say a few seconds. After all, blocking and forcing a reconnect just to redeliver the disconnect packet seems a bit silly, when the server has to deal with clients which just have their sockets fail anyway :)

Race condition in client close() operation
--
Key: ZOOKEEPER-63
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-63
Project: Zookeeper
Issue Type: Bug
Components: java client
Reporter: Patrick Hunt
Assignee: Benjamin Reed
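The bounded wait proposed in this comment might look roughly like the sketch below. It is not the attached patch and not ZooKeeper's real close() implementation: the class `BoundedCloseSketch`, the `disconnectAcked` latch, and the `closeTimeoutMillis` parameter are assumptions made for illustration, and a `CountDownLatch` is swapped in for the wait() on the disconnect packet to keep the example short. The idea shown is only that close() waits a limited time for the disconnect to be acknowledged and then releases local resources regardless of the outcome.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the client; NOT the real ZooKeeper classes.
class BoundedCloseSketch {
    // Counted down when the (simulated) disconnect packet has been processed.
    private final CountDownLatch disconnectAcked = new CountDownLatch(1);

    // True once local resources (socket, threads) would have been released.
    private volatile boolean resourcesReleased = false;

    // Would be called by the send thread when the disconnect completes.
    void ackDisconnect() {
        disconnectAcked.countDown();
    }

    boolean isReleased() {
        return resourcesReleased;
    }

    // Waits at most closeTimeoutMillis for the acknowledgement, then releases
    // local resources either way. Returns true only if the ack arrived in time.
    boolean close(long closeTimeoutMillis) {
        try {
            return disconnectAcked.await(closeTimeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        } finally {
            resourcesReleased = true; // never leave the caller blocked on shutdown
        }
    }

    public static void main(String[] args) {
        // Simulates a hung socket: nobody ever acknowledges the disconnect.
        BoundedCloseSketch hung = new BoundedCloseSketch();
        System.out.println("acked=" + hung.close(100) + " released=" + hung.isReleased());
    }
}
```

A caller can use the boolean result to log that the session was left for the server to time out, which is the case the comment argues the server has to handle anyway.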