I think I understand your confusion, Mark. There are actually three ports
used. It's always been this way, but there was a trick we could use to
avoid requiring the third port in the configuration file. Let me go
through the ports and I think it may become clear.

The first port is the "client port". Clients of ZooKeeper connect to
this TCP port.

The second port is the "quorum port" (not the greatest name). The
ZooKeeper servers communicate with each other using this TCP port to
process state changes.

The third port is the "leader election port". ZooKeeper servers use this
port to communicate with each other to elect a leader.

Now a couple of questions need to be answered:

Q. Why are there both a quorum port and a leader election port? Since
both are used for server-to-server communication, wouldn't it be better
to use just one?

A. Yes, it would be better. Eventually, we would like to make it that
way. The difficulty comes from the different communication topologies in
the two cases. In processing state changes we have a star topology: all
servers connect to a leader to send and receive changes. For leader
election we need a full mesh, since we do not have a leader yet and
everyone needs to talk to everyone else. Since the protocols are
different and the topologies are different, it is easier to just write
them as two completely separate pieces of code.
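
To make the difference between the two topologies concrete, here is a
minimal sketch in Python. It is not ZooKeeper code; the server names and
the helper functions star_links and mesh_links are made up for
illustration. It just lists which pairs of servers would need a
connection in each case.

```python
from itertools import combinations


def star_links(servers, leader):
    """Star topology: each non-leader server connects to the leader only."""
    return [(leader, s) for s in servers if s != leader]


def mesh_links(servers):
    """Full mesh: every pair of servers needs to be able to talk."""
    return list(combinations(servers, 2))


# Hypothetical five-server ensemble; "s1" is assumed to be the leader.
servers = ["s1", "s2", "s3", "s4", "s5"]

print(len(star_links(servers, "s1")))  # 4 links: n - 1
print(len(mesh_links(servers)))        # 10 links: n * (n - 1) / 2
```

With n servers the star needs n - 1 links while the mesh needs
n * (n - 1) / 2, which is one way to see why the two pieces of code ended
up with different connection handling.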

Q. Why did I not have to specify the election port in the sourceforge
releases? What is this "trick to avoid specifying the election port in
the config file"?

A. As it turns out, there are a couple of versions of leader election.
The default version on sourceforge was UDP based. Because UDP and TCP
have different port namespaces, we could use the same port number for
both, so we used the quorum port specified in the config file for both
updates and leader election. On Apache we changed the default to a
TCP based leader election. (It's faster and deals with firewalls
better.) When leader election uses TCP, we can't use our trick anymore
and we need another port number for leader election.
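
The port namespace point can be checked outside ZooKeeper with plain
sockets. The following is a small Python sketch, not taken from the
ZooKeeper code base; the function name shared_port_demo and the use of
the loopback address are assumptions for illustration. It binds a TCP
listener and a UDP socket to the same port number, then tries to bind a
second TCP socket to that number, which the operating system is expected
to refuse (probably the same kind of failure that shows up as a
BindException in Java).

```python
import socket


def shared_port_demo(attempts=5):
    """Return (port, udp_bound, second_tcp_rejected) for one loopback port."""
    for _ in range(attempts):
        tcp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        tcp2 = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            tcp.bind(("127.0.0.1", 0))  # let the OS pick a free TCP port
            tcp.listen(1)
            port = tcp.getsockname()[1]
            try:
                # Same number, but in the UDP namespace: no clash with tcp.
                udp.bind(("127.0.0.1", port))
            except OSError:
                continue  # some other process holds this UDP port; retry
            try:
                # Same number in the TCP namespace: already taken by tcp.
                tcp2.bind(("127.0.0.1", port))
                rejected = False
            except OSError:
                rejected = True
            return port, True, rejected
        finally:
            tcp.close()
            udp.close()
            tcp2.close()
    raise RuntimeError("could not find a port free for both TCP and UDP")


port, udp_bound, second_tcp_rejected = shared_port_demo()
print(port, udp_bound, second_tcp_rejected)
```

The UDP bind succeeding next to the TCP listener is the "trick"; the
second TCP bind failing is why a TCP based election needs its own port
number.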

Does this make sense?

Unfortunately the transition to Apache has taken a long time. We
probably will not have a stable release for a couple more weeks. (Unlike
sourceforge we cannot decide a release is ready and push it out that
evening. Apache has a much more involved process.) Future development
will take place on Apache. There is a bug with sync() that we want to
fix on sourceforge and do another release, but I don't expect there will
be any more releases after that on sourceforge.

If you need leader election to run on different ports, until
ZOOKEEPER-127 is fixed you can use the configuration file to set the
leader election algorithm to 0. That was the default on sourceforge.

-----Original Message-----
From: mark harwood [mailto:[EMAIL PROTECTED] 
Sent: Thursday, August 28, 2008 2:55 AM
To: zookeeper-user@hadoop.apache.org
Subject: RE: Migrating from sourceforge 2.2.1 to Apache trunk -
QuorumPeers failing to find each other

>>Please use a port for electionPort different from the one you're using
in the server configuration.

I think I am getting confused with the range of port numbers that must
be defined. I had assumed there were only 2 types - clientPort and
electionPort representing the client-server comms and the server-server
comms respectively as shown in the overview diagram below:

It sounds like there may be another type of port to deal with - is this

I previously added a comment about electionPorts to the Wiki
documentation here (
http://zookeeper.wiki.sourceforge.net/ZooKeeperGettingStarted ) to
clarify my understanding of the config settings. 
While this interpretation works OK in sourceforge 2.2.1 I am now
confused as to the arrangement in Apache 3.0. I spent most of yesterday
debugging and trying different configuration files. Sourceforge version
worked fine but when I flipped the jars to the Apache version (keeping
the same zoo config files) it just wouldn't work, either running on a
single machine or multiples. Sometimes this was because the Apache
version was using the wrong port to try to talk to another machine (when I
configured each server with different election port settings) and
sometimes a single server would get a BindException trying to open the
same ServerSocket twice.

I suspect this may be down to my misunderstanding of the ports now used
and a change since the sourceforge version. Can you cast any more light
on this?

I'd also be keen to get some advice on whether to go with sourceforge
2.2.1 or Apache 3.x for an upcoming deployment to a live system. I
imagine ZK 3 may be a bit of a moving target but is more likely to get
bug-fixed than zk 2.2.1?

Many thanks,
