[ https://issues.apache.org/jira/browse/ZOOKEEPER-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13139360#comment-13139360 ]
Patrick Hunt edited comment on ZOOKEEPER-1271 at 10/29/11 4:46 PM:
-------------------------------------------------------------------

The error handling added in ZOOKEEPER-1174 is causing this bug.

{noformat}
try {
    sockKey = sock.register(selector, SelectionKey.OP_CONNECT);
    boolean immediateConnect = sock.connect(addr);
    if (immediateConnect) {
        sendThread.primeConnection();
    }
} catch (IOException e) {
    LOG.error("Unable to open socket to " + addr);
    sock.close();
}
{noformat}

If an exception is thrown inside the try block, the socket is closed but sockKey is left set. As a result the client will not attempt to reconnect to the server (normally it would keep retrying every second or so).

I think the bug here is that the exception should be rethrown; otherwise the 'cleanup' routine in SendThread.run will not be executed.
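To make the failure mode concrete, here is a minimal, self-contained sketch of the behaviour described above. It does not use the real ZooKeeper classes: ReconnectSketch, its run loop, and the rethrow flag are hypothetical stand-ins, and the loop assumes that a new connection is only attempted while sockKey is null and that cleanup resets sockKey. Under those assumptions, swallowing the IOException leaves the client stuck after one attempt, while rethrowing it lets cleanup run and the retries continue.

```java
import java.io.IOException;

/** Hypothetical model of the connect/cleanup interaction; not ZooKeeper code. */
final class ReconnectSketch {
    // Stand-in for the SelectionKey: non-null means "a connect is in flight".
    private Object sockKey;
    private int attempts;
    private final boolean rethrow;

    ReconnectSketch(boolean rethrow) {
        this.rethrow = rethrow;
    }

    /** Mirrors the try/catch from the comment; the connect always fails here. */
    private void connect() throws IOException {
        try {
            sockKey = new Object();                        // like sock.register(...)
            attempts++;
            throw new IOException("connection refused");   // like sock.connect(addr) failing
        } catch (IOException e) {
            // like LOG.error(...) followed by sock.close()
            if (rethrow) {
                throw e;                                   // proposed fix: let the caller clean up
            }
            // original behaviour: exception swallowed, sockKey stays set
        }
    }

    /** Assumed cleanup step: clears the key so the next pass reconnects. */
    private void cleanup() {
        sockKey = null;
    }

    /** Runs a simplified send loop and returns the number of connect attempts. */
    int run(int iterations) {
        for (int i = 0; i < iterations; i++) {
            try {
                if (sockKey == null) {
                    connect();
                }
                // else: wait for a selector event that never arrives
            } catch (IOException e) {
                cleanup();
            }
        }
        return attempts;
    }

    public static void main(String[] args) {
        System.out.println("swallowed: " + new ReconnectSketch(false).run(5)); // prints "swallowed: 1"
        System.out.println("rethrown:  " + new ReconnectSketch(true).run(5));  // prints "rethrown:  5"
    }
}
```

In the swallowed case the single attempt matches the attached log, where the client goes quiet until the session timeout fires about 30 seconds later.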
> testEarlyLeaderAbandonment failing on solaris - clients not retrying connection
> -------------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1271
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1271
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: java client
>    Affects Versions: 3.4.0, 3.5.0
>            Reporter: Patrick Hunt
>            Priority: Blocker
>             Fix For: 3.4.0, 3.5.0
>
>         Attachments: solarisClientFailure.txt.gz
>
>
> See:
> https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper_branch34_solaris/1/testReport/junit/org.apache.zookeeper.server.quorum/QuorumPeerMainTest/testEarlyLeaderAbandonment/
> Notice that the clients attempt to connect before the servers have bound,
> then 30 seconds later, after seemingly no further client activity, we see:
> 2011-10-28 21:40:56,828 [myid:] - INFO
> [main-SendThread(localhost:11227):ClientCnxn$SendThread@1057] - Client
> session timed out, have not heard from server in 30032ms for sessionid 0x0,
> closing socket connection and attempting reconnect
> I believe this is different from ZOOKEEPER-1270 because in the 1270 case it
> seems like the clients are attempting to connect but the servers are not
> accepting (notice the stat commands are being dropped due to no server
> running).