After some further analysis I think I have found a bug.

In QuorumCnxManager.toSend there is a call to create a connection as follows:
    channel = SocketChannel.open(new InetSocketAddress(addr, port));

Unfortunately, "addr" is the IP address of a remote server while "port" is the 
electionPort of *this* server.
As an example, given this configuration (taken from my zoo.cfg):
  server.1=10.20.9.254:2881
  server.2=10.20.9.9:2882
  server.3=10.20.9.254:2883
Server 3 was observed trying to make a connection to host 10.20.9.9 on port 
2883 and obviously failing.

In tests where all machines use the same electionPort this bug would not 
manifest itself.
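
To make the mismatch concrete, here is a minimal sketch of the lookup I would expect. It is not the actual QuorumCnxManager code: the class and method names are hypothetical, and it assumes (as in my zoo.cfg above) that the port in each server.N entry is that server's election port.

```java
import java.net.InetSocketAddress;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch only; none of these names come from the ZooKeeper source.
class ElectionAddressSketch {

    // Parses "server.<id>=<host>:<port>" lines into a table keyed by server id.
    static Map<Long, InetSocketAddress> parseServers(String... lines) {
        Map<Long, InetSocketAddress> peers = new LinkedHashMap<>();
        for (String line : lines) {
            String[] kv = line.trim().split("=", 2);
            if (kv.length != 2 || !kv[0].startsWith("server.")) {
                continue; // not a server entry
            }
            long id = Long.parseLong(kv[0].substring("server.".length()));
            int colon = kv[1].lastIndexOf(':');
            String host = kv[1].substring(0, colon);
            int port = Integer.parseInt(kv[1].substring(colon + 1));
            // Unresolved addresses keep the sketch free of DNS lookups.
            peers.put(id, InetSocketAddress.createUnresolved(host, port));
        }
        return peers;
    }

    // Reproduces the behaviour I observed: remote host paired with the LOCAL port.
    static InetSocketAddress observedTarget(Map<Long, InetSocketAddress> peers,
                                            long remoteId, long myId) {
        String remoteHost = peers.get(remoteId).getHostString();
        int localPort = peers.get(myId).getPort();
        return InetSocketAddress.createUnresolved(remoteHost, localPort);
    }

    // What I would expect instead: host and port both taken from the remote entry.
    static InetSocketAddress expectedTarget(Map<Long, InetSocketAddress> peers,
                                            long remoteId) {
        return peers.get(remoteId);
    }

    public static void main(String[] args) {
        Map<Long, InetSocketAddress> peers = parseServers(
                "server.1=10.20.9.254:2881",
                "server.2=10.20.9.9:2882",
                "server.3=10.20.9.254:2883");
        InetSocketAddress observed = observedTarget(peers, 2L, 3L);
        InetSocketAddress expected = expectedTarget(peers, 2L);
        // prints "observed: 10.20.9.9:2883"
        System.out.println("observed: " + observed.getHostString() + ":" + observed.getPort());
        // prints "expected: 10.20.9.9:2882"
        System.out.println("expected: " + expected.getHostString() + ":" + expected.getPort());
    }
}
```

A real connection would need a resolved address, e.g. new InetSocketAddress(host, port), before being passed to SocketChannel.open.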

Cheers,
Mark

----- Original Message ----
From: mark harwood <[EMAIL PROTECTED]>
To: zookeeper-user@hadoop.apache.org
Sent: Wednesday, 27 August, 2008 12:11:58
Subject: Migrating from sourceforge 2.2.1 to Apache trunk - QuorumPeers failing 
to find each other

First a quick thanks for releasing this project - very useful.

I've had success working with the sourceforge version (2.2.1) and just tried 
moving to the Apache SVN trunk version and found the servers fail to find each 
other.

My test environment has 3 zookeeper servers all running on the same machine, 
started from the command line in different directories.
I changed my startup batch files to run QuorumPeerMain in place of 
QuorumPeer, wiped the data directories (keeping the "myid" files) and used the 
previous zoo.cfg files (examples below).

#########  Server 1 ##################
tickTime=2000
initLimit=10
syncLimit=5
dataDir=data
clientPort=2181
electionPort=2881
server.1=localhost:2881
server.2=localhost:2882
server.3=localhost:2883

#########  Server 2 ##################
tickTime=2000
initLimit=10
syncLimit=5
dataDir=data
clientPort=2182
electionPort=2882
server.1=localhost:2881
server.2=localhost:2882
server.3=localhost:2883

#########  Server 3 ##################
tickTime=2000
initLimit=10
syncLimit=5
dataDir=data
clientPort=2183
electionPort=2883
server.1=localhost:2881
server.2=localhost:2882
server.3=localhost:2883

When I fire up the servers, they all hang with the following output:

D:\tmp\Zookeeper3Servers\server2>java -cp lib\zookeeper-dev.jar;lib\log4j-1.2.15.jar;conf org.apache.zookeeper.server.quorum.QuorumPeerMain conf/zoo.cfg
INFO  - [QuorumPeer:[EMAIL PROTECTED] - LOOKING
WARN  - [QuorumPeer:[EMAIL PROTECTED] - New election: 0

I tried firing up one of the servers from Eclipse in debug mode and it 
appeared to loop around FastLeaderElection.lookForLeader().

While poking around in the debugger I also noticed that in 
QuorumCnxManager.toSend this test failed:
    if (addr.equals(localIP)) 
...because addr was held as "localhost/127.0.0.1" and localIP was held as my 
10.20.x.x address on the local network.
I tried changing the zoo.cfg files to use the 10.20.x.x address, and this made 
the above "if" statement evaluate to true, but the end result was the same: 
the servers failed to connect.
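
For illustration, here is a small sketch of why that comparison fails and of one way a "is this entry me?" check could be written. It is not taken from the ZooKeeper source: the class and method names are hypothetical, and pairing the address check with the election port is my own assumption, made because all three of my servers share one machine and so cannot be told apart by address alone.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.net.UnknownHostException;

// Hypothetical sketch only; none of these names come from the ZooKeeper source.
class LocalAddressSketch {

    // Builds an InetAddress from a literal IP string (no DNS lookup is needed for literals).
    static InetAddress literal(String ip) {
        try {
            return InetAddress.getByName(ip);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("not a valid IP literal: " + ip, e);
        }
    }

    // True if addr is loopback, the wildcard address, or bound to a local interface.
    static boolean isLocal(InetAddress addr) {
        if (addr.isLoopbackAddress() || addr.isAnyLocalAddress()) {
            return true;
        }
        try {
            return NetworkInterface.getByInetAddress(addr) != null;
        } catch (SocketException e) {
            return false; // could not inspect interfaces; treat as remote
        }
    }

    // Assumption: a peer entry is "me" only if the address is local AND the port matches.
    static boolean isSelf(InetSocketAddress peer, int myElectionPort) {
        return peer.getAddress() != null
                && isLocal(peer.getAddress())
                && peer.getPort() == myElectionPort;
    }

    public static void main(String[] args) {
        InetAddress loopback = literal("127.0.0.1");
        InetAddress lan = literal("10.20.9.254");
        // A plain equals() compares the raw IP bytes, so loopback never equals a LAN address.
        System.out.println("equals: " + loopback.equals(lan)); // prints "equals: false"
        System.out.println("loopback is local: " + isLocal(loopback)); // prints "loopback is local: true"
    }
}
```

With a check like isSelf, server 2 in my setup would treat localhost:2882 as itself but still try to reach localhost:2881 and localhost:2883.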

If it helps, the logging from my sourceforge 2.2.1 run of the above config 
produces the following and works fine:

D:\servers\IeIncrementalIndexingTests\ZookeeperServers\server3>java -cp lib\zookeeper-dev.jar;lib\log4j-1.2.15.jar;conf com.yahoo.zookeeper.server.quorum.QuorumPeer conf/zoo.cfg
WARN  - [QuorumPeer:[EMAIL PROTECTED] - LOOKING
WARN  - [QuorumPeer:[EMAIL PROTECTED] - Election tally:
WARN  - [QuorumPeer:[EMAIL PROTECTED] - 3      -> 1
WARN  - [QuorumPeer:[EMAIL PROTECTED] - 1      -> 1
WARN  - [QuorumPeer:[EMAIL PROTECTED] - 2      -> 1
WARN  - [QuorumPeer:[EMAIL PROTECTED] - Election tally:
WARN  - [QuorumPeer:[EMAIL PROTECTED] - 3      -> 1
WARN  - [QuorumPeer:[EMAIL PROTECTED] - 2      -> 2
WARN  - [QuorumPeer:[EMAIL PROTECTED] - FOLLOWING
WARN  - [QuorumPeer:[EMAIL PROTECTED] - Following localhost/127.0.0.1:2882
WARN  - [QuorumPeer:[EMAIL PROTECTED] - Getting a snapshot from leader
WARN  - [NIOServerCxn.Factory:[EMAIL PROTECTED] - Connected to /127.0.0.1:2375 lastZxid 0
WARN  - [NIOServerCxn.Factory:[EMAIL PROTECTED] - Creating new session 31c03d951fe0000
WARN  - [QuorumPeer:[EMAIL PROTECTED] - Got zxid 100000001 expected 1
WARN  - [SyncThread:[EMAIL PROTECTED] - Elapsed 10717 ms: Logfile padding exceeded time threshold
WARN  - [Thread-0:[EMAIL PROTECTED] - Finished init of 31c03d951fe0000: true

The 2.2.1 run appears to be using a different leader election algorithm.

Any ideas?
Cheers,
Mark

