On 06/26/2010 06:53 AM, Peeyush Kumar wrote:
I have a 6 node cluster (5 slaves and 1 master). I am trying to
You typically want an odd number of servers, given that ZK works by majority (an even number is fine, but not optimal). So 5 would be great (7 is a bit of overkill). 3 is fine too, but 5 allows you to take 1 server down for "scheduled maintenance" and still survive an unexpected failure w/o impact to service availability.
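To make the majority arithmetic concrete, here is a small sketch (the function names are just illustrative, not anything from ZK itself). It assumes a plain majority quorum, which is what the "Defaulting to majority quorums" log line indicates:

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be down while a majority is still up."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    for n in (3, 4, 5, 6, 7):
        print(f"{n} servers: quorum={quorum_size(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
```

Under that assumption 6 servers tolerate the same 2 failures as 5, which is why the extra even-numbered server buys nothing.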
In your exception I see "DatagramSocket", which is unusual. What ZK version are you running? As Lei suggested, please include your config file so that we can review that as well. (If you are overriding electionAlg, this might be part of the problem. Current versions of ZK servers use TCP for connections by default, which is why this is unusual.)
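If it helps when reviewing the config, something like the following rough sketch could pull out the electionAlg setting and the server.N entries. The helper names and the sample config are hypothetical (the sample only borrows hostnames and ports seen in the log, it is not the actual zoo.cfg), and the parsing assumes the usual key=value layout:

```python
def parse_zoo_cfg(text: str) -> dict:
    """Parse key=value lines, skipping blank lines and '#' comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def summarize(settings: dict) -> dict:
    """Collect the settings relevant to leader election."""
    servers = {k: v for k, v in settings.items() if k.startswith("server.")}
    return {
        # None means electionAlg is not overridden in the file
        "electionAlg": settings.get("electionAlg"),
        "servers": servers,
    }


# Hypothetical sample for illustration only, NOT the poster's real config.
EXAMPLE_CFG = """\
# illustrative values
clientPort=2179
electionAlg=0
server.1=master.cf.net:2180
server.2=slave01.cf.net:2180
"""

if __name__ == "__main__":
    summary = summarize(parse_zoo_cfg(EXAMPLE_CFG))
    if summary["electionAlg"] is None:
        print("electionAlg is not overridden")
    else:
        print("electionAlg is overridden:", summary["electionAlg"])
    for name, address in sorted(summary["servers"].items()):
        print(name, "->", address)
```

As I understand the administrator's guide, electionAlg=0 selects the original UDP-based election and 3 the TCP-based one, so an override to 0 would be consistent with seeing DatagramSocket in the trace. Treat that as something to confirm against the docs for your version.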
Most likely there is either a config problem, or perhaps you have a firewall that's blocking communication between the servers. Try verifying server-to-server connectivity on the ports you've selected.
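One way to check connectivity is a quick script run from each server against the others. This is only a sketch: the helper name is made up, the host/port list is copied from the log above and may not match the real config, and it tests TCP only, so it says nothing about UDP reachability if the UDP-based election is in use:

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


if __name__ == "__main__":
    # Assumed peer list, taken from the "Server address" lines in the log.
    peers = [
        ("master.cf.net", 2180),
        ("slave01.cf.net", 2180),
        ("slave02.cf.net", 2180),
        ("slave03.cf.net", 2180),
        ("slave04.cf.net", 2180),
        ("slave05.cf.net", 2180),
    ]
    for host, port in peers:
        status = "reachable" if can_connect(host, port) else "NOT reachable"
        print(f"{host}:{port} {status}")
```

A peer that is not running ZK yet will also show as not reachable, so start the servers first (or listen on the port with some other tool) before reading too much into the result.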
Patrick
start the ZooKeeper server on the cluster. When I issue this command:

$ java -cp zookeeper.jar:lib/log4j-1.2.15.jar:conf \
    org.apache.zookeeper.server.quorum.QuorumPeerMain zoo.cfg

I get the following error:

2010-06-26 18:09:17,468 - INFO [main:quorumpeercon...@80] - Reading configuration from: conf/zoo.cfg
2010-06-26 18:09:17,483 - INFO [main:quorumpeercon...@232] - Defaulting to majority quorums
2010-06-26 18:09:17,545 - INFO [main:quorumpeerm...@118] - Starting quorum peer
2010-06-26 18:09:17,585 - INFO [QuorumPeer:/0.0.0.0:2179:quorump...@514] - LOOKING
2010-06-26 18:09:17,589 - INFO [QuorumPeer:/0.0.0.0:2179:leaderelect...@154] - Server address: master.cf.net/192.168.1.1:2180
2010-06-26 18:09:17,589 - INFO [QuorumPeer:/0.0.0.0:2179:leaderelect...@154] - Server address: slave01.cf.net/192.168.1.2:2180
2010-06-26 18:09:17,792 - WARN [QuorumPeer:/0.0.0.0:2179:leaderelect...@194] - Ignoring exception while looking for leader
java.net.SocketTimeoutException: Receive timed out
    at java.net.PlainDatagramSocketImpl.receive0(Native Method)
    at java.net.PlainDatagramSocketImpl.receive(PlainDatagramSocketImpl.java:136)
    at java.net.DatagramSocket.receive(DatagramSocket.java:725)
    at org.apache.zookeeper.server.quorum.LeaderElection.lookForLeader(LeaderElection.java:170)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:515)
2010-06-26 18:09:17,794 - INFO [QuorumPeer:/0.0.0.0:2179:leaderelect...@154] - Server address: slave02.cf.net/192.168.1.3:2180
2010-06-26 18:09:17,995 - WARN [QuorumPeer:/0.0.0.0:2179:leaderelect...@194] - Ignoring exception while looking for leader
java.net.SocketTimeoutException: Receive timed out
    at java.net.PlainDatagramSocketImpl.receive0(Native Method)
    at java.net.PlainDatagramSocketImpl.receive(PlainDatagramSocketImpl.java:136)
    at java.net.DatagramSocket.receive(DatagramSocket.java:725)
    at org.apache.zookeeper.server.quorum.LeaderElection.lookForLeader(LeaderElection.java:170)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:515)
2010-06-26 18:09:17,996 - INFO [QuorumPeer:/0.0.0.0:2179:leaderelect...@154] - Server address: slave03.cf.net/192.168.1.4:2180
2010-06-26 18:09:18,197 - WARN [QuorumPeer:/0.0.0.0:2179:leaderelect...@194] - Ignoring exception while looking for leader
java.net.SocketTimeoutException: Receive timed out
    at java.net.PlainDatagramSocketImpl.receive0(Native Method)
    at java.net.PlainDatagramSocketImpl.receive(PlainDatagramSocketImpl.java:136)
    at java.net.DatagramSocket.receive(DatagramSocket.java:725)
    at org.apache.zookeeper.server.quorum.LeaderElection.lookForLeader(LeaderElection.java:170)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:515)
2010-06-26 18:09:18,200 - INFO [QuorumPeer:/0.0.0.0:2179:leaderelect...@154] - Server address: slave04.cf.net/192.168.1.5:2180
2010-06-26 18:09:18,401 - WARN [QuorumPeer:/0.0.0.0:2179:leaderelect...@194] - Ignoring exception while looking for leader
java.net.SocketTimeoutException: Receive timed out
    at java.net.PlainDatagramSocketImpl.receive0(Native Method)
    at java.net.PlainDatagramSocketImpl.receive(PlainDatagramSocketImpl.java:136)
    at java.net.DatagramSocket.receive(DatagramSocket.java:725)
    at org.apache.zookeeper.server.quorum.LeaderElection.lookForLeader(LeaderElection.java:170)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:515)
2010-06-26 18:09:18,402 - INFO [QuorumPeer:/0.0.0.0:2179:leaderelect...@154] - Server address: slave05.cf.net/192.168.1.6:2180
2010-06-26 18:09:18,604 - WARN [QuorumPeer:/0.0.0.0:2179:leaderelect...@194] - Ignoring exception while looking for leader
java.net.SocketTimeoutException: Receive timed out
    at java.net.PlainDatagramSocketImpl.receive0(Native Method)
    at java.net.PlainDatagramSocketImpl.receive(PlainDatagramSocketImpl.java:136)
    at java.net.DatagramSocket.receive(DatagramSocket.java:725)
    at org.apache.zookeeper.server.quorum.LeaderElection.lookForLeader(LeaderElection.java:170)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:515)
2010-06-26 18:09:18,605 - INFO [QuorumPeer:/0.0.0.0:2179:leaderelect...@102] - Election tally:
2010-06-26 18:09:18,606 - INFO [QuorumPeer:/0.0.0.0:2179:leaderelect...@108] - 1 -> 1

This error continues indefinitely. Can anyone please help me with this? Any help is appreciated.

Thanks,
Peeyush