Austin,

Please check: https://issues.apache.org/jira/browse/ZOOKEEPER-140

Thanks,
-Flavio

> -----Original Message-----
> From: Austin Shoemaker [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, September 16, 2008 12:22 PM
> To: zookeeper-user@hadoop.apache.org
> Subject: Re: Leader election stalled
>
> Ben,
>
> Here is a proposed fix for the deadlock issue in QuorumCnxManager.
>
> The protocol starts when an initiator invokes handleConnection(socket_out),
> where socket_out is its outgoing connection to a remote peer, or when an
> incoming connection triggers handleConnection(socket_in) before we have
> initiated a connection to that peer. If we and the peer both initiate
> connections, the two calls to handleConnection proceed on different
> threads in the same peer.
>
> Per-peer instance variables:
>     myVersion = 0
>     myChallenge = genChallenge()
>
> "socket" is the connection to the peer.
>
> boolean handleConnection(socket) throws Exception {
>     done = false
>     wins = false
>
>     while (!done) {
>         // Send the current version and challenge to the peer, then wait
>         // for it to send its current version and challenge.
>         // The read is blocking, but we expect the peer to write since
>         // reads and writes are matched.
>         synchronized (challengeLock) {
>             socket.write(myVersion, myChallenge)
>         }
>         peerVersion, peerChallenge = socket.read()
>
>         synchronized (challengeLock) {
>             // If peer is obsolete, bring it up to date.
>             if (peerVersion < myVersion) {
>                 continue
>             }
>
>             // If we are obsolete, wait to be brought up to date.
>             if (peerVersion > myVersion) {
>                 myVersion = peerVersion
>                 myChallenge = genChallenge()
>                 continue
>             }
>
>             assert(myVersion == peerVersion)
>
>             // Challenges are compared, resulting in win, lose, or retry.
>             if (myChallenge > peerChallenge) {
>                 wins = true
>                 done = true
>             } else if (myChallenge < peerChallenge) {
>                 done = true
>             } else {
>                 ++myVersion
>                 myChallenge = genChallenge()
>             }
>         }
>     }
>
>     // We return true if we won, otherwise we return false. Either we
>     // or the peer will win, not both. If a connection error occurs,
>     // this method will throw an exception.
>     return wins
> }
>
> Do you think it's correct? I wonder if there is a way to simplify this
> protocol.
>
> Austin
>
> On Sep 12, 2008, at 4:51 PM, Austin Shoemaker wrote:
>
> > Ben,
> >
> > I am able to run algorithm 3 successfully sometimes, though
> > frequently the servers deadlock in
> > QuorumCnxManager:initiateConnection on s.read(msgBuffer) when
> > reading the challenge from the peer.
> >
> > Calls to initiateConnection and receiveConnection are synchronized,
> > so only one or the other can be executing at a time. This prevents
> > two connections from opening between the same pair of servers.
> >
> > However, it seems that this leads to deadlock, as in this scenario:
> >
> > A (initiate --> B)
> > B (initiate --> C)
> > C (initiate --> A)
> >
> > initiateConnection can only complete when receiveConnection runs on
> > the remote peer and answers the challenge. If all servers are
> > blocked in initiateConnection, receiveConnection never runs and
> > leader election halts.
> >
> > Looking forward to your thoughts.
> >
> > Thanks,
> >
> > Austin
> >
> > On Sep 2, 2008, at 10:14 AM, Benjamin Reed wrote:
> >
> >> Austin,
> >>
> >> Could you try using the new leader election algorithm? You need to
> >> set the algorithm type to 3, and you also need to set the election
> >> port (TCP) to be used.
> >>
> >> See http://zookeeper.wiki.sourceforge.net/ZooKeeperConfiguration for
> >> more details.
> >>
> >> ben
> >>
> >> -----Original Message-----
> >> From: Austin Shoemaker [mailto:[EMAIL PROTECTED]
> >> Sent: Tuesday, September 02, 2008 9:57 AM
> >> To: zookeeper-user@hadoop.apache.org
> >> Subject: Leader election stalled
> >>
> >> Hi,
> >>
> >> We have run into a situation where killing the leader results in
> >> followers perpetually trying to reelect that leader.
> >>
> >> We have 11 ZooKeeper (2.2.1 from SF.net) servers and 256 clients
> >> connecting at random.
> >> We kill the leader and observe the impact, monitoring a script
> >> that repeatedly prints the responses to "ruok" and "stat". All
> >> servers except the killed leader respond with "imok" and
> >> "ZooKeeperServer not running", respectively.
> >>
> >> About half of the time, each remaining server gets into a loop of
> >> failing to connect to the killed leader and then reelecting the
> >> killed leader.
> >>
> >> Here is an example log, which is representative of similar logs on
> >> the other servers. We additionally logged connectivity during leader
> >> election. If anyone would like complete logs, let me know.
> >>
> >> Thanks,
> >>
> >> Austin Shoemaker
> >>
> >> WARN - [QuorumPeer:[EMAIL PROTECTED] - FOLLOWING
> >> *WARN - [QuorumPeer:[EMAIL PROTECTED] - Following /10.50.65.22:2889*
> >> ERROR - [QuorumPeer:[EMAIL PROTECTED] - FIXMSG
> >> java.net.ConnectException: Connection refused
> >> *.... cont'd ....*
> >>
> >> ERROR - [QuorumPeer:[EMAIL PROTECTED] - FIXMSG
> >> java.lang.Exception: shutdown Follower
> >>     at com.yahoo.zookeeper.server.quorum.Follower.shutdown(Follower.java:364)
> >>     at com.yahoo.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:403)
> >> WARN - [QuorumPeer:[EMAIL PROTECTED] - LOOKING
> >> WARN - [QuorumPeer:[EMAIL PROTECTED] - ----> Sending election packet to /10.50.65.22:2888
> >> WARN - [QuorumPeer:[EMAIL PROTECTED] - ----> Received response from /10.50.65.22:2888
> >> WARN - [QuorumPeer:[EMAIL PROTECTED] - ----> Sending election packet to /10.50.65.21:2888
> >> WARN - [QuorumPeer:[EMAIL PROTECTED] - ----> Received response from /10.50.65.21:2888
> >> WARN - [QuorumPeer:[EMAIL PROTECTED] - ----> Sending election packet to /10.50.65.12:2888
> >> WARN - [QuorumPeer:[EMAIL PROTECTED] - ----> Received response from /10.50.65.12:2888
> >> WARN - [QuorumPeer:[EMAIL PROTECTED] - ----> Sending election packet to /10.50.65.11:2888
> >> WARN - [QuorumPeer:[EMAIL PROTECTED] - ----> Received response from /10.50.65.11:2888
> >> WARN - [QuorumPeer:[EMAIL PROTECTED] - ----> Sending election packet to /10.50.65.12:2890
> >> WARN - [QuorumPeer:[EMAIL PROTECTED] - ----> Received response from /10.50.65.12:2890
> >> WARN - [QuorumPeer:[EMAIL PROTECTED] - ----> Sending election packet to /10.50.65.11:2890
> >> WARN - [QuorumPeer:[EMAIL PROTECTED] - ----> Received response from /10.50.65.11:2890
> >> WARN - [QuorumPeer:[EMAIL PROTECTED] - ----> Sending election packet to /10.50.65.22:2889
> >> *WARN - [QuorumPeer:[EMAIL PROTECTED] - ----> Exception occurred when sending / receiving packet to / from /10.50.65.22:2889
> >> java.net.SocketTimeoutException: Receive timed out*
> >> WARN - [QuorumPeer:[EMAIL PROTECTED] - ----> Sending election packet to /10.50.65.21:2890
> >> WARN - [QuorumPeer:[EMAIL PROTECTED] - ----> Received response from /10.50.65.21:2890
> >> WARN - [QuorumPeer:[EMAIL PROTECTED] - ----> Sending election packet to /10.50.65.21:2889
> >> WARN - [QuorumPeer:[EMAIL PROTECTED] - ----> Received response from /10.50.65.21:2889
> >> WARN - [QuorumPeer:[EMAIL PROTECTED] - ----> Sending election packet to /10.50.65.12:2889
> >> WARN - [QuorumPeer:[EMAIL PROTECTED] - ----> Received response from /10.50.65.12:2889
> >> WARN - [QuorumPeer:[EMAIL PROTECTED] - ----> Sending election packet to /10.50.65.11:2889
> >> WARN - [QuorumPeer:[EMAIL PROTECTED] - ----> Received response from /10.50.65.11:2889
> >> WARN - [QuorumPeer:[EMAIL PROTECTED] - Election tally:
> >> WARN - [QuorumPeer:[EMAIL PROTECTED] - 8 -> 1
> >> WARN - [QuorumPeer:[EMAIL PROTECTED] - 4 -> 1
> >> WARN - [QuorumPeer:[EMAIL PROTECTED] - 7 -> 8
> >> WARN - [QuorumPeer:[EMAIL PROTECTED] - ----> Election complete, result.winner = 7
> >> *WARN - [QuorumPeer:[EMAIL PROTECTED] - ----> Election complete, address = /10.50.65.22:2889
> >> WARN - [QuorumPeer:[EMAIL PROTECTED] - FOLLOWING
> >> WARN - [QuorumPeer:[EMAIL PROTECTED] - Following /10.50.65.22:2889
> >> ERROR - [QuorumPeer:[EMAIL PROTECTED] - FIXMSG
> >> java.net.ConnectException: Connection refused*
> >>     at java.net.PlainSocketImpl.socketConnect(Native Method)
> >>     at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
> >>     at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
> >>     at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
> >>     at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
> >>     at java.net.Socket.connect(Socket.java:519)
> >>     at com.yahoo.zookeeper.server.quorum.Follower.followLeader(Follower.java:133)
> >>     at com.yahoo.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:399)
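For reference, the version/challenge exchange in the handleConnection pseudocode from the September 16 message can be modeled in a few dozen lines of Java. This is a minimal sketch under stated assumptions, not QuorumCnxManager code: the in-memory `Channel` stands in for the peer socket, `genChallenge()` is replaced by a scripted `LongSupplier` so the run is deterministic, and `ChallengeHandshake`, `Channel`, `Peer`, `scripted`, and `runHandshake` are hypothetical names. Each side writes its (version, challenge) pair, blocks on the peer's pair, then catches up, wins, loses, or bumps its version on a tie, following the pseudocode.

```java
import java.util.concurrent.*;
import java.util.function.LongSupplier;

// Hypothetical, self-contained model of the proposed handshake; not ZooKeeper's QuorumCnxManager API.
public class ChallengeHandshake {

    // Stand-in for the peer socket (assumption): one in-memory queue per direction.
    static final class Channel {
        private final BlockingQueue<long[]> in;
        private final BlockingQueue<long[]> out;

        Channel(BlockingQueue<long[]> in, BlockingQueue<long[]> out) {
            this.in = in;
            this.out = out;
        }

        void write(long version, long challenge) throws InterruptedException {
            out.put(new long[] {version, challenge});
        }

        long[] read() throws InterruptedException {
            return in.take(); // blocks until the peer writes, like the socket read
        }
    }

    // Per-peer state from the proposal: myVersion and myChallenge, guarded by challengeLock.
    static final class Peer {
        private final Object challengeLock = new Object();
        private final LongSupplier genChallenge;
        private long myVersion = 0;
        private long myChallenge;

        Peer(LongSupplier genChallenge) {
            this.genChallenge = genChallenge;
            this.myChallenge = genChallenge.getAsLong();
        }

        // Returns true if this side wins, false if the peer wins.
        boolean handleConnection(Channel socket) throws InterruptedException {
            while (true) {
                synchronized (challengeLock) {
                    socket.write(myVersion, myChallenge);
                }
                long[] msg = socket.read();
                long peerVersion = msg[0], peerChallenge = msg[1];
                synchronized (challengeLock) {
                    if (peerVersion < myVersion) {
                        continue; // peer is obsolete: resend our version
                    }
                    if (peerVersion > myVersion) {
                        myVersion = peerVersion; // we are obsolete: adopt its version
                        myChallenge = genChallenge.getAsLong();
                        continue;
                    }
                    if (myChallenge > peerChallenge) return true;
                    if (myChallenge < peerChallenge) return false;
                    ++myVersion; // tie: bump the version and retry with a fresh challenge
                    myChallenge = genChallenge.getAsLong();
                }
            }
        }
    }

    // Scripted challenge source so runs are deterministic (a real genChallenge would be random).
    static LongSupplier scripted(long... values) {
        int[] next = {0};
        return () -> values[next[0]++];
    }

    // Runs both ends of one connection on separate threads and returns {aWins, bWins}.
    static boolean[] runHandshake(long[] aChallenges, long[] bChallenges) throws Exception {
        BlockingQueue<long[]> aToB = new LinkedBlockingQueue<>();
        BlockingQueue<long[]> bToA = new LinkedBlockingQueue<>();
        Peer a = new Peer(scripted(aChallenges));
        Peer b = new Peer(scripted(bChallenges));
        Channel aEnd = new Channel(bToA, aToB);
        Channel bEnd = new Channel(aToB, bToA);
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Callable<Boolean> runA = () -> a.handleConnection(aEnd);
            Callable<Boolean> runB = () -> b.handleConnection(bEnd);
            Future<Boolean> fa = pool.submit(runA);
            Future<Boolean> fb = pool.submit(runB);
            return new boolean[] {fa.get(5, TimeUnit.SECONDS), fb.get(5, TimeUnit.SECONDS)};
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Round 1 ties (5 vs 5), so both sides move to version 1; then 7 beats 3.
        boolean[] r = runHandshake(new long[] {5, 7}, new long[] {5, 3});
        System.out.println("A wins: " + r[0] + ", B wins: " + r[1]);
    }
}
```

Running `main` prints `A wins: true, B wins: false`: the first round ties at version 0, both peers move to version 1 with fresh challenges, and 7 beats 3. The sketch only checks that a single exchange yields exactly one winner; it does not model the case the proposal is really about, where both peers initiate and two connection threads in the same peer share myVersion and myChallenge.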