[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12929681#action_12929681
 ] 

Vishal K commented on ZOOKEEPER-914:
------------------------------------

Hi Flavio,

The documentation is not clear.
SO_TIMEOUT has no effect on blocking channels. Non-blocking channels wait 
for the specified timeout if nothing is available in the buffer; otherwise, 
read() returns whatever bytes are currently available in the buffer. I wrote 
the following test to verify this. Let me know if you know of a way to make 
SO_TIMEOUT work.
 
        QuorumPeer peerLeader = new QuorumPeer(peers, tmpdir[1], tmpdir[1], port[1], 3, 0, 2, 2, 2);
        QuorumCnxManager cnxManager = new QuorumCnxManager(peerLeader);
        QuorumCnxManager.Listener listener = cnxManager.listener;
        SocketChannel channel = SocketChannel.open();
        channel.socket().connect(peers.get(new Long(0)).electionAddr, 5000);
        channel.configureBlocking(false);
        channel.socket().setSoTimeout(1000);
        byte[] msgBytes = new byte[8];
        ByteBuffer msgBuffer = ByteBuffer.wrap(msgBytes);

        /**
         * Don't send any data and call read() and see how long it waits.
         */
        long begin = System.currentTimeMillis();
        channel.read(msgBuffer);
        long end = System.currentTimeMillis();
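
One possible way to bound the wait without relying on SO_TIMEOUT is to keep the channel non-blocking and wait for readability on a Selector with a timeout. The following is only a sketch under assumptions: the class name TimedChannelRead and the helper readWithTimeout are hypothetical, a local loopback server stands in for the election port, and none of this is taken from QuorumCnxManager or from the eventual patch.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class TimedChannelRead {

    /**
     * Hypothetical helper: waits at most timeoutMs (must be > 0) for the
     * non-blocking channel to become readable, then reads once.
     * Returns bytes read, -1 on end of stream, or 0 if nothing arrived in time.
     */
    static int readWithTimeout(SocketChannel channel, ByteBuffer buf, long timeoutMs)
            throws IOException {
        try (Selector selector = Selector.open()) {
            // register() requires the channel to be in non-blocking mode.
            channel.register(selector, SelectionKey.OP_READ);
            if (selector.select(timeoutMs) == 0) {
                return 0; // no data became available before the timeout
            }
            return channel.read(buf);
        }
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            // Loopback server on an ephemeral port; it accepts but never sends.
            server.bind(new InetSocketAddress("127.0.0.1", 0));
            try (SocketChannel channel = SocketChannel.open(server.getLocalAddress());
                 SocketChannel accepted = server.accept()) {
                channel.configureBlocking(false);
                ByteBuffer msgBuffer = ByteBuffer.allocate(8);

                // Don't send any data; call the timed read and see how long it waits.
                long begin = System.currentTimeMillis();
                int n = readWithTimeout(channel, msgBuffer, 1000);
                long end = System.currentTimeMillis();
                System.out.println("read returned " + n + " after " + (end - begin) + " ms");
            }
        }
    }
}
```

Running main prints the value returned by the timed read and the elapsed time, which can be compared against the result of the test above.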

Feel free to close duplicate bugs.

> QuorumCnxManager blocks forever 
> --------------------------------
>
>                 Key: ZOOKEEPER-914
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-914
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: leaderElection
>            Reporter: Vishal K
>            Assignee: Vishal K
>            Priority: Blocker
>             Fix For: 3.3.3, 3.4.0
>
>
> This was a disaster. While testing our application we ran into a scenario 
> where a rebooted follower could not join the cluster. Further debugging 
> showed that the follower could not join because the QuorumCnxManager on the 
> leader was blocked for an indefinite amount of time in receiveConnection():
> "Thread-3" prio=10 tid=0x00007fa920005800 nid=0x11bb runnable [0x00007fa9275ed000]
>    java.lang.Thread.State: RUNNABLE
>     at sun.nio.ch.FileDispatcher.read0(Native Method)
>     at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
>     at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:233)
>     at sun.nio.ch.IOUtil.read(IOUtil.java:206)
>     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236)
>     - locked <0x00007fa93315f988> (a java.lang.Object)
>     at org.apache.zookeeper.server.quorum.QuorumCnxManager.receiveConnection(QuorumCnxManager.java:210)
>     at org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener.run(QuorumCnxManager.java:501)
> I had pointed out this bug along with several other problems in 
> QuorumCnxManager earlier in 
> https://issues.apache.org/jira/browse/ZOOKEEPER-900 and 
> https://issues.apache.org/jira/browse/ZOOKEEPER-822.
> I forgot to patch this one as a part of ZOOKEEPER-822. I am working on a fix 
> and a patch will be out soon. 
> The problem is that QuorumCnxManager is using SocketChannel in blocking mode. 
> It does a read() in receiveConnection() and a write() in initiateConnection().
> Sorry, but this is really bad programming. It also points to a lack of 
> failure tests for QuorumCnxManager.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
