Austin,

Could you try using the new leader election algorithm? You need to set the algorithm type to 3 and you also need to set the election port (TCP) to be used.
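For illustration, the two settings might look like the fragment below. This is only a sketch: the key names ("electionAlg" and the "server.N" lines) are the ones used in the later Apache ZooKeeper 3.x configuration format and are an assumption for 2.2.1, and the hostnames and port numbers are placeholders. The configuration page is the authority for the exact names in your release.

```properties
# Sketch only: key names follow the Apache ZooKeeper 3.x format (assumed for 2.2.1).
# 3 selects the TCP-based leader election.
electionAlg=3

# server.<id>=<host>:<quorum port>:<election port>
# Hostnames and ports are placeholders, not taken from the cluster in this thread.
server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3.example.com:2888:3888
```

Every server in the ensemble would need the same algorithm type, and each election port has to be reachable over TCP from the other servers.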
See http://zookeeper.wiki.sourceforge.net/ZooKeeperConfiguration for more details.

ben

-----Original Message-----
From: Austin Shoemaker [mailto:[EMAIL PROTECTED]
Sent: Tuesday, September 02, 2008 9:57 AM
To: [email protected]
Subject: Leader election stalled

Hi,

We have run into a situation where killing the leader results in followers perpetually trying to reelect that leader.

We have 11 zookeeper (2.2.1 from SF.net) servers and 256 clients connecting at random. We kill the leader and observe the impact, monitoring a script that repeatedly prints the responses to "ruok" and "stat". All servers except the killed leader respond with "imok" and "ZooKeeperServer not running", respectively.

About half of the time, each remaining server gets into a loop of failing to connect to the killed leader and then reelecting the killed leader.

Here is an example log, which is representative of similar logs on the other servers. We additionally logged connectivity during leader election.

If anyone would like complete logs, let me know.

Thanks,
Austin Shoemaker

WARN - [QuorumPeer:[EMAIL PROTECTED] - FOLLOWING
*WARN - [QuorumPeer:[EMAIL PROTECTED] - Following /10.50.65.22:2889*
ERROR - [QuorumPeer:[EMAIL PROTECTED] - FIXMSG java.net.ConnectException: Connection refused
*.... cont'd ....*
ERROR - [QuorumPeer:[EMAIL PROTECTED] - FIXMSG java.lang.Exception: shutdown Follower
    at com.yahoo.zookeeper.server.quorum.Follower.shutdown(Follower.java:364)
    at com.yahoo.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:403)
WARN - [QuorumPeer:[EMAIL PROTECTED] - LOOKING
WARN - [QuorumPeer:[EMAIL PROTECTED] - ----> Sending election packet to /10.50.65.22:2888
WARN - [QuorumPeer:[EMAIL PROTECTED] - ----> Received response from /10.50.65.22:2888
WARN - [QuorumPeer:[EMAIL PROTECTED] - ----> Sending election packet to /10.50.65.21:2888
WARN - [QuorumPeer:[EMAIL PROTECTED] - ----> Received response from /10.50.65.21:2888
WARN - [QuorumPeer:[EMAIL PROTECTED] - ----> Sending election packet to /10.50.65.12:2888
WARN - [QuorumPeer:[EMAIL PROTECTED] - ----> Received response from /10.50.65.12:2888
WARN - [QuorumPeer:[EMAIL PROTECTED] - ----> Sending election packet to /10.50.65.11:2888
WARN - [QuorumPeer:[EMAIL PROTECTED] - ----> Received response from /10.50.65.11:2888
WARN - [QuorumPeer:[EMAIL PROTECTED] - ----> Sending election packet to /10.50.65.12:2890
WARN - [QuorumPeer:[EMAIL PROTECTED] - ----> Received response from /10.50.65.12:2890
WARN - [QuorumPeer:[EMAIL PROTECTED] - ----> Sending election packet to /10.50.65.11:2890
WARN - [QuorumPeer:[EMAIL PROTECTED] - ----> Received response from /10.50.65.11:2890
WARN - [QuorumPeer:[EMAIL PROTECTED] - ----> Sending election packet to /10.50.65.22:2889
*WARN - [QuorumPeer:[EMAIL PROTECTED] - ----> Exception occurred when sending / receiving packet to / from /10.50.65.22:2889
java.net.SocketTimeoutException: Receive timed out*
WARN - [QuorumPeer:[EMAIL PROTECTED] - ----> Sending election packet to /10.50.65.21:2890
WARN - [QuorumPeer:[EMAIL PROTECTED] - ----> Received response from /10.50.65.21:2890
WARN - [QuorumPeer:[EMAIL PROTECTED] - ----> Sending election packet to /10.50.65.21:2889
WARN - [QuorumPeer:[EMAIL PROTECTED] - ----> Received response from /10.50.65.21:2889
WARN - [QuorumPeer:[EMAIL PROTECTED] - ----> Sending election packet to /10.50.65.12:2889
WARN - [QuorumPeer:[EMAIL PROTECTED] - ----> Received response from /10.50.65.12:2889
WARN - [QuorumPeer:[EMAIL PROTECTED] - ----> Sending election packet to /10.50.65.11:2889
WARN - [QuorumPeer:[EMAIL PROTECTED] - ----> Received response from /10.50.65.11:2889
WARN - [QuorumPeer:[EMAIL PROTECTED] - Election tally:
WARN - [QuorumPeer:[EMAIL PROTECTED] - 8 -> 1
WARN - [QuorumPeer:[EMAIL PROTECTED] - 4 -> 1
WARN - [QuorumPeer:[EMAIL PROTECTED] - 7 -> 8
WARN - [QuorumPeer:[EMAIL PROTECTED] - ----> Election complete, result.winner = 7
*WARN - [QuorumPeer:[EMAIL PROTECTED] - ----> Election complete, address = /10.50.65.22:2889
WARN - [QuorumPeer:[EMAIL PROTECTED] - FOLLOWING
WARN - [QuorumPeer:[EMAIL PROTECTED] - Following /10.50.65.22:2889
ERROR - [QuorumPeer:[EMAIL PROTECTED] - FIXMSG java.net.ConnectException: Connection refused*
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
    at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
    at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
    at java.net.Socket.connect(Socket.java:519)
    at com.yahoo.zookeeper.server.quorum.Follower.followLeader(Follower.java:133)
    at com.yahoo.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:399)
