[jira] Updated: (ZOOKEEPER-914) QuorumCnxManager blocks forever

2010-10-28 Thread Flavio Junqueira (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Junqueira updated ZOOKEEPER-914:
---

Component/s: leaderElection
             (was: server)
             (was: quorum)

> QuorumCnxManager blocks forever 
> 
>
> Key: ZOOKEEPER-914
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-914
> Project: Zookeeper
> Issue Type: Bug
> Components: leaderElection
> Reporter: Vishal K
> Assignee: Vishal K
> Priority: Blocker
> Fix For: 3.3.3, 3.4.0
>
>
> This was a disaster. While testing our application we ran into a scenario 
> where a rebooted follower could not join the cluster. Further debugging 
> showed that the follower could not join because the QuorumCnxManager on the 
> leader was blocked for an indefinite amount of time in receiveConnection():
> "Thread-3" prio=10 tid=0x7fa920005800 nid=0x11bb runnable [0x7fa9275ed000]
>    java.lang.Thread.State: RUNNABLE
>         at sun.nio.ch.FileDispatcher.read0(Native Method)
>         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
>         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:233)
>         at sun.nio.ch.IOUtil.read(IOUtil.java:206)
>         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236)
>         - locked <0x7fa93315f988> (a java.lang.Object)
>         at org.apache.zookeeper.server.quorum.QuorumCnxManager.receiveConnection(QuorumCnxManager.java:210)
>         at org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener.run(QuorumCnxManager.java:501)
> I had pointed out this bug along with several other problems in 
> QuorumCnxManager earlier in 
> https://issues.apache.org/jira/browse/ZOOKEEPER-900 and 
> https://issues.apache.org/jira/browse/ZOOKEEPER-822.
> I forgot to patch this one as a part of ZOOKEEPER-822. I am working on a fix 
> and a patch will be out soon. 
> The problem is that QuorumCnxManager is using SocketChannel in blocking mode. 
> It does a read() in receiveConnection() and a write() in initiateConnection(). 
> Neither call has a timeout, so a peer that stalls can block the calling 
> thread indefinitely.
> Sorry, but this is really bad programming. It also points to a lack of 
> failure tests for QuorumCnxManager.
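
For illustration only, here is a minimal sketch of one common way to bound a
handshake read so that a silent peer cannot hold up the accepting thread. It
is not the patch for this issue and does not use QuorumCnxManager code: the
class BoundedHandshakeRead, the helper readPeerId, and the 8-byte id format
are all hypothetical. It also assumes a plain java.net.Socket with
setSoTimeout() rather than a SocketChannel.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.util.OptionalLong;

public class BoundedHandshakeRead {

    // Hypothetical helper: read an 8-byte id from the peer, giving up after
    // timeoutMs instead of blocking forever. Returns empty on timeout.
    static OptionalLong readPeerId(Socket sock, int timeoutMs) throws IOException {
        // SO_TIMEOUT bounds each blocking read on the socket's InputStream.
        sock.setSoTimeout(timeoutMs);
        DataInputStream in = new DataInputStream(sock.getInputStream());
        try {
            return OptionalLong.of(in.readLong());
        } catch (SocketTimeoutException e) {
            // The peer connected but sent nothing in time; the caller can
            // close the socket and go back to accepting other connections.
            return OptionalLong.empty();
        }
    }

    public static void main(String[] args) throws IOException {
        // Simulate the failure: a peer that connects and then stays silent.
        try (ServerSocket listener = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket silentPeer = new Socket(listener.getInetAddress(), listener.getLocalPort());
             Socket accepted = listener.accept()) {
            OptionalLong id = readPeerId(accepted, 200);
            System.out.println(id.isPresent() ? "peer id " + id.getAsLong() : "timed out");
        }
    }
}
```

The same loopback setup (connect, then send nothing) is also the shape of a
simple failure test for this kind of code path.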

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-914) QuorumCnxManager blocks forever

2010-10-27 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt updated ZOOKEEPER-914:
---

  Component/s: server, quorum
Fix Version/s: 3.4.0, 3.3.3

