To be sure, and to help with the diagnosis, I would also replace gw110.iu.xsede.org with the machine's IP address (to rule out DNS caching or some other intermediate layer).

When you first start the cluster, you can check the ensemble with the commands described at http://zookeeper.apache.org/doc/trunk/zookeeperAdmin.html#sc_zkCommands
You will be able to see how many followers you have at startup.
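As a rough illustration of both checks (the ZooKeeper commands above and the telnet-style port test suggested below), here is a minimal Python sketch. The helper names `port_open`, `four_letter_word`, `parse_mode` and `ensemble_modes` are made up for this example, and the addresses in the usage comment are placeholders. It assumes the four-letter commands are enabled on your servers; newer ZooKeeper releases require them to be whitelisted in the server configuration.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Telnet-style check: True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def four_letter_word(host, port=2181, cmd="srvr", timeout=5.0):
    """Send a ZooKeeper four-letter command and return the raw text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mode(reply):
    """Extract the value of the 'Mode:' line (leader/follower/standalone)."""
    for line in reply.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def ensemble_modes(hosts, port=2181):
    """Ask every host for its mode; record unreachable hosts instead of failing."""
    modes = {}
    for host in hosts:
        try:
            modes[host] = parse_mode(four_letter_word(host, port))
        except OSError as exc:
            modes[host] = "unreachable ({})".format(exc)
    return modes


# Example usage (placeholder addresses, replace with your own servers):
# hosts = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]
# print(ensemble_modes(hosts))
# for h in hosts:
#     print(h, port_open(h, 2888))  # 2888 is the port seen in the log below
```

Run it from each of the three machines in turn: a healthy ensemble should report one leader and two followers from every vantage point, and the quorum and election ports from your own configuration should be reachable in every direction, not just the client port.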

On Friday, June 13, 2014, Cameron McKenzie <[email protected]> wrote:

> It's possible that two of the nodes can talk to each other, but the third
> can't. This means that when all three are running you will get a quorum
> because two can connect to each other. Once one of these two is shut down,
> you will not be able to re-form a quorum. I would check with something
> simple like telnet: verify that you can telnet from each host to each of
> the other hosts on the ports you have configured.
>
> On Fri, Jun 13, 2014 at 11:39 PM, Lahiru Gunathilake <[email protected]> wrote:
>
> > Thanks all for the responses, but I still couldn't figure out why it's
> > not working. If I had misconfigured the cluster, it should have given an
> > error in the first place. When I kill the leader it fails, and likewise
> > when I kill a follower and try to start it again it doesn't work either,
> > but the other nodes in the cluster work fine.
> >
> > When I kill the leader I see the following error in one of the followers:
> >
> > 2014-06-13 09:35:37,215 [myid:1] - WARN [QuorumPeer[myid=1]/0.0.0.0:2181:Learner@233] - Unexpected exception, tries=1, connecting to /129.79.247.5:2888
> > java.net.ConnectException: Connection refused
> >     at java.net.PlainSocketImpl.socketConnect(Native Method)
> >     at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
> >     at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
> >     at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
> >     at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:432)
> >     at java.net.Socket.connect(Socket.java:529)
> >     at org.apache.zookeeper.server.quorum.Learner.connectToLeader(Learner.java:225)
> >     at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:71)
> >     at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:786)
> >
> > I can see 129.79.247.5 is the other follower and something is wrong. But
> > what I do not understand is why this does not show up when I start the
> > cluster in the first place, because when I start the cluster initially
> > it finishes the voting process successfully; then one becomes the leader
> > and the rest become followers.
> >
> > Regards
> > Lahiru
> >
> > On Thu, Jun 12, 2014 at 9:56 PM, James A. Robinson <[email protected]> wrote:
> >
> > > On Thu, Jun 12, 2014 at 4:47 PM, Cameron McKenzie <[email protected]> wrote:
> > >
> > > > This is not correct, 3 is a minimum for redundancy. If 1 goes down, the
> > > > other 2 can still form a quorum (as there are more than half of them
> > > > remaining).
> > >
> > > Thank you, it's good to know this -- I must have gotten confused
> > > about the way the quorum logic worked at some point.
> > >
> > > Jim
> >
> > --
> > System Analyst Programmer
> > PTI Lab
> > Indiana University
