Ben,
Here is a proposed fix for the deadlock issue in QuorumCnxManager.
The protocol starts by an initiator invoking
handleConnection(socket_out) where socket is a connection to a remote
peer,
or if an incoming connection first triggers
handleConnection(socket_in) before we initiate a connection to the
peer. In the
event that we and the peer both initiate connections, the above calls
to handleConnection will proceed on different threads
in the same peer.
Per-peer instance variables
myVersion = 0
myChallenge = genChallenge()
"socket" is the connection to the peer.
boolean handleConnection(socket) throws Exception {
done = false
wins = false
while (!done) {
// Send the current version and challenge to the peer, then
wait for it to send its current version and challenge.
// The read is blocking though we expect the peer to write
since reads and writes are matched.
synchronized (challengeLock) {
socket.write(myVersion, myChallenge)
}
peerVersion, peerChallenge = socket.read()
synchronized (challengeLock) {
// If peer is obsolete, bring it up to date.
if (peerVersion < myVersion) {
continue;
}
// If we are obsolete, wait to be brought up to date.
if (peerVersion > myVersion) {
myVersion = peerVersion
myChallenge = genChallenge()
continue
}
assert(myVersion == peerVersion)
// Challenges are compared, resulting in win, lose, or
retry.
if (myChallenge > peerChallenge) {
wins = true
done = true
} else if (myChallenge < peerChallenge) {
done = true
} else {
++myVersion
myChallenge = genChallenge()
}
}
}
// We return true if we won, otherwise we return false. Either we
or the peer will win, not both. If a connection error occurs,
// this method will throw an exception.
return wins
}
Do you think it's correct? I wonder if there is a way to simplify this
protocol.
Austin
On Sep 12, 2008, at 4:51 PM, Austin Shoemaker wrote:
Ben,
I am able to run algorithm 3 successfully sometimes, though
frequently the servers deadlock in
QuorumCnxManager:initiateConnection on s.read(msgBuffer) when
reading the challenge from the peer.
Calls to initiateConnection and receiveConnection are synchronized,
so only one or the other can be executing at a time. This prevents
two connections from opening between the same pair of servers.
However, it seems that this leads to deadlock, as in this scenario:
A (initiate --> B)
B (initiate --> C)
C (initiate --> A)
initiateConnection can only complete when receiveConnection runs on
the remote peer and answers the challenge. If all servers are
blocked in initiateConnection, receiveConnection never runs and
leader election halts.
Looking forward to your thoughts.
Thanks,
Austin
On Sep 2, 2008, at 10:14 AM, Benjamin Reed wrote:
Austin,
Could you try using the new leader election algorithm? You need to
set
the algorithm type to 3 and you also need to set the election port
(TCP)
to be used.
See http://zookeeper.wiki.sourceforge.net/ZooKeeperConfiguration for
more details.
ben
-----Original Message-----
From: Austin Shoemaker [mailto:[EMAIL PROTECTED]
Sent: Tuesday, September 02, 2008 9:57 AM
To: zookeeper-user@hadoop.apache.org
Subject: Leader election stalled
Hi,
We have run into a situation where killing the leader results in
followers
perpetually trying to reelect that leader.
We have 11 zookeeper (2.2.1 from SF.net) servers and 256 clients
connecting
at random. We kill the leader and observe the impact, monitoring a
script
that repeatedly prints the responses to "ruok" and "stat". All
servers
except the killed leader respond with "imok" and "ZooKeeperServer not
running", respectively.
About half of the time, each remaining server gets into a loop of
failing to
connect to the killed leader and then reelecting the killed leader.
Here is an example log, which is representative of similar logs on
the
other
servers. We additionally logged connectivity during leader
election. If
anyone would like complete logs, let me know.
Thanks,
Austin Shoemaker
WARN - [QuorumPeer:[EMAIL PROTECTED] - FOLLOWING
*WARN - [QuorumPeer:[EMAIL PROTECTED] - Following /10.50.65.22:2889*
ERROR - [QuorumPeer:[EMAIL PROTECTED] - FIXMSG
java.net.ConnectException: Connection refused
*
.... cont'd ....*
ERROR - [QuorumPeer:[EMAIL PROTECTED] - FIXMSG
java.lang.Exception: shutdown Follower
at
com.yahoo.zookeeper.server.quorum.Follower.shutdown(Follower.java:
364)
at
com.yahoo.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:403)
WARN - [QuorumPeer:[EMAIL PROTECTED] - LOOKING
WARN - [QuorumPeer:[EMAIL PROTECTED] - ----> Sending election
packet
to /
10.50.65.22:2888
WARN - [QuorumPeer:[EMAIL PROTECTED] - ----> Received response
from /
10.50.65.22:2888
WARN - [QuorumPeer:[EMAIL PROTECTED] - ----> Sending election
packet
to /
10.50.65.21:2888
WARN - [QuorumPeer:[EMAIL PROTECTED] - ----> Received response
from /
10.50.65.21:2888
WARN - [QuorumPeer:[EMAIL PROTECTED] - ----> Sending election
packet
to /
10.50.65.12:2888
WARN - [QuorumPeer:[EMAIL PROTECTED] - ----> Received response
from /
10.50.65.12:2888
WARN - [QuorumPeer:[EMAIL PROTECTED] - ----> Sending election
packet
to /
10.50.65.11:2888
WARN - [QuorumPeer:[EMAIL PROTECTED] - ----> Received response
from /
10.50.65.11:2888
WARN - [QuorumPeer:[EMAIL PROTECTED] - ----> Sending election
packet
to /
10.50.65.12:2890
WARN - [QuorumPeer:[EMAIL PROTECTED] - ----> Received response
from /
10.50.65.12:2890
WARN - [QuorumPeer:[EMAIL PROTECTED] - ----> Sending election
packet
to /
10.50.65.11:2890
WARN - [QuorumPeer:[EMAIL PROTECTED] - ----> Received response
from /
10.50.65.11:2890
WARN - [QuorumPeer:[EMAIL PROTECTED] - ----> Sending election
packet
to /
10.50.65.22:2889
*WARN - [QuorumPeer:[EMAIL PROTECTED] - ----> Exception occurred
when
sending / receiving packet to / from /10.50.65.22:2889
java.net.SocketTimeoutException: Receive timed out
*WARN - [QuorumPeer:[EMAIL PROTECTED] - ----> Sending election
packet
to
/10.50.65.21:2890
WARN - [QuorumPeer:[EMAIL PROTECTED] - ----> Received response
from /
10.50.65.21:2890
WARN - [QuorumPeer:[EMAIL PROTECTED] - ----> Sending election
packet
to /
10.50.65.21:2889
WARN - [QuorumPeer:[EMAIL PROTECTED] - ----> Received response
from /
10.50.65.21:2889
WARN - [QuorumPeer:[EMAIL PROTECTED] - ----> Sending election
packet
to /
10.50.65.12:2889
WARN - [QuorumPeer:[EMAIL PROTECTED] - ----> Received response
from /
10.50.65.12:2889
WARN - [QuorumPeer:[EMAIL PROTECTED] - ----> Sending election
packet
to /
10.50.65.11:2889
WARN - [QuorumPeer:[EMAIL PROTECTED] - ----> Received response
from /
10.50.65.11:2889
WARN - [QuorumPeer:[EMAIL PROTECTED] - Election tally:
WARN - [QuorumPeer:[EMAIL PROTECTED] - 8 -> 1
WARN - [QuorumPeer:[EMAIL PROTECTED] - 4 -> 1
WARN - [QuorumPeer:[EMAIL PROTECTED] - 7 -> 8
WARN - [QuorumPeer:[EMAIL PROTECTED] - ----> Election complete,
result.winner = 7
*WARN - [QuorumPeer:[EMAIL PROTECTED] - ----> Election complete,
address
= /10.50.65.22:2889
WARN - [QuorumPeer:[EMAIL PROTECTED] - FOLLOWING
WARN - [QuorumPeer:[EMAIL PROTECTED] - Following /10.50.65.22:2889
ERROR - [QuorumPeer:[EMAIL PROTECTED] - FIXMSG
java.net.ConnectException: Connection refused
* at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
at
java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
at java.net.Socket.connect(Socket.java:519)
at
com
.yahoo.zookeeper.server.quorum.Follower.followLeader(Follower.java:13
3)
at
com.yahoo.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:399)