[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13139360#comment-13139360
 ] 

Patrick Hunt edited comment on ZOOKEEPER-1271 at 10/29/11 4:46 PM:
-------------------------------------------------------------------

The error handling added to ZOOKEEPER-1174 is causing this bug.

{noformat}
        try {
            sockKey = sock.register(selector, SelectionKey.OP_CONNECT);
            boolean immediateConnect = sock.connect(addr);            
            if (immediateConnect) {
                sendThread.primeConnection();
            }
        } catch (IOException e) {
            LOG.error("Unable to open socket to " + addr);
            sock.close();
        }
{noformat}

If an exception is thrown inside the try, the socket is closed; however, sockKey 
is left set. As a result the client will not attempt to reconnect to the server 
(normally it would keep retrying every second or so). I think the bug here 
is that the exception should be rethrown; otherwise the 'cleanup' routine in 
SendThread.run will not be executed.
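To make the failure mode concrete, here is a minimal, hypothetical Java model. It is not ZooKeeper's actual ClientCnxnSocketNIO/SendThread code. It assumes that a non-null sockKey is what makes the send loop believe a connection is already in progress, and that the cleanup routine in SendThread.run is what clears it. The names openSocket, connectSwallowing, connectRethrowing, cleanup and sendLoopIteration are illustrative only. With the exception swallowed, the loop makes one attempt and then stalls; with it rethrown, cleanup runs and every iteration retries.

```java
import java.io.IOException;

/**
 * Hypothetical, simplified model of the connect/cleanup interaction described above.
 * None of these names are ZooKeeper's real API; they only mirror the logic.
 */
public class StaleKeyDemo {

    /** Stands in for the NIO SelectionKey; non-null means "a connection is in progress". */
    static Object sockKey;
    static int connectAttempts = 0;

    /** Simulates register() succeeding and then sock.connect(addr) failing (server not bound yet). */
    static void openSocket() throws IOException {
        connectAttempts++;
        sockKey = new Object();               // register() ran before connect()
        throw new IOException("Connection refused");
    }

    /** Mirrors the ZOOKEEPER-1174 handling: log, close the socket, swallow the exception. */
    static void connectSwallowing() {
        try {
            openSocket();
        } catch (IOException e) {
            // The socket would be closed here, but sockKey is NOT cleared.
        }
    }

    /** Proposed behaviour: let the exception propagate so the caller's cleanup runs. */
    static void connectRethrowing() throws IOException {
        openSocket();
    }

    /** Stands in for the cleanup routine in SendThread.run: drops the stale key. */
    static void cleanup() {
        sockKey = null;
    }

    /** One pass of a retry loop: a new connection is started only when no key is set. */
    static void sendLoopIteration(boolean rethrow) {
        if (sockKey != null) {
            return;                           // looks "connecting", so no reconnect is attempted
        }
        if (rethrow) {
            try {
                connectRethrowing();
            } catch (IOException e) {
                cleanup();                    // stale state cleared; next pass retries
            }
        } else {
            connectSwallowing();
        }
    }

    /** Runs the loop for a fixed number of passes and returns how many connects were tried. */
    public static int run(boolean rethrow, int iterations) {
        sockKey = null;
        connectAttempts = 0;
        for (int i = 0; i < iterations; i++) {
            sendLoopIteration(rethrow);
        }
        return connectAttempts;
    }

    public static void main(String[] args) {
        System.out.println("swallowing: " + run(false, 5) + " attempt(s)"); // prints "swallowing: 1 attempt(s)"
        System.out.println("rethrowing: " + run(true, 5) + " attempt(s)");  // prints "rethrowing: 5 attempt(s)"
    }
}
```

Under this model the swallowing variant stays stuck until something else clears the stale state; in the test that would plausibly be the session-timeout path, which is consistent with the roughly 30-second gap in the log quoted in the issue description.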

                
> testEarlyLeaderAbandonment failing on solaris - clients not retrying connection
> -------------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1271
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1271
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: java client
>    Affects Versions: 3.4.0, 3.5.0
>            Reporter: Patrick Hunt
>            Priority: Blocker
>             Fix For: 3.4.0, 3.5.0
>
>         Attachments: solarisClientFailure.txt.gz
>
>
> See:
> https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper_branch34_solaris/1/testReport/junit/org.apache.zookeeper.server.quorum/QuorumPeerMainTest/testEarlyLeaderAbandonment/
> Notice that the clients attempt to connect before the servers have bound; 
> then, 30 seconds later, after seemingly no further client activity, we see:
> 2011-10-28 21:40:56,828 [myid:] - INFO  
> [main-SendThread(localhost:11227):ClientCnxn$SendThread@1057] - Client 
> session timed out, have not heard from server in 30032ms for sessionid 0x0, 
> closing socket connection and attempting reconnect
> I believe this is different from ZOOKEEPER-1270 because in the 1270 case it 
> seems like the clients are attempting to connect but the servers are not 
> accepting (notice that the stat commands are being dropped because no server 
> is running).

--
This message is automatically generated by JIRA.

        
