I think I understand your confusion, Mark. There are actually three ports used. It has always been this way, but there was a trick we could use to avoid requiring the third port in the configuration file. Let me go through the ports and I think it may become clear.
The first port is the "client port". Clients of ZooKeeper connect to this TCP port.

The second port is the "quorum port" (not the greatest name). The ZooKeeper servers communicate with each other using this TCP port to process state changes.

The third port is the "leader election port". ZooKeeper servers use this port to communicate with each other to elect a leader.

Now a couple of questions need to be answered:

Q. Why are there a quorum port and a leader election port? Since both are used for server-to-server communication, wouldn't it be better to use just one?

A. Yes, it would be better. Eventually, we would like to make it that way. The difficulty comes from the different communication topologies in the two cases. In processing state changes we have a star topology: all servers connect to a leader to send and receive changes. For leader election we need a full mesh: since we do not have a leader, everyone needs to talk to everyone else. Since the protocols are different and the topologies are different, it was easiest to just write them as two completely separate pieces of code.

Q. Why did I not have to specify the election port in the sourceforge releases? What is this "trick to avoid specifying the election port in the config file"?

A. As it turns out, there are a couple of versions of leader election. The default version on sourceforge was UDP based. Because UDP and TCP have different port namespaces, we could use the same port number for both, so we used the quorum port specified in the config file for both updates and leader election. On Apache we changed the default to a TCP-based leader election. (It's faster and deals with firewalls better.) When leader election uses TCP, we can't use our trick anymore and we need another port number for leader election.

Does this make sense?

Unfortunately the transition to Apache has taken a long time. We probably will not have a stable release for a couple more weeks.
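As an aside, the port-number trick described above can be seen outside ZooKeeper. The following is a minimal Python sketch, not ZooKeeper code; the function name demo_port_namespaces and the loopback host are made up for illustration. It binds a TCP socket and a UDP socket to the same port number, then attempts a second TCP bind on that number, which is the kind of collision Java reports as a BindException.

```python
import socket

HOST = "127.0.0.1"  # loopback only; chosen for this illustration


def demo_port_namespaces():
    """Return (udp_ok, second_tcp_ok) for one shared port number."""
    # First TCP listener: let the OS pick a free port number.
    tcp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    tcp.bind((HOST, 0))
    tcp.listen(1)
    port = tcp.getsockname()[1]

    udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    tcp2 = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # UDP ports are a separate namespace, so the same number is
        # normally still available for a datagram socket.
        try:
            udp.bind((HOST, port))
            udp_ok = True
        except OSError:
            # Only expected if another process already holds this UDP port.
            udp_ok = False

        # A second TCP socket on the same address and port collides
        # with the listener opened above.
        try:
            tcp2.bind((HOST, port))
            second_tcp_ok = True
        except OSError:
            second_tcp_ok = False
    finally:
        tcp.close()
        udp.close()
        tcp2.close()
    return udp_ok, second_tcp_ok


if __name__ == "__main__":
    print(demo_port_namespaces())
```

On a typical machine the UDP bind should succeed and the second TCP bind should fail, which is why one configured number was enough for UDP-based election but not for TCP-based election.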
(Unlike sourceforge, we cannot decide a release is ready and push it out that evening. Apache has a much more involved process.) Future development will take place on Apache. There is a bug with sync() that we want to fix on sourceforge and do another release, but I don't expect there will be any more releases on sourceforge after that.

If you need leader election to run on different ports, until ZOOKEEPER-127 is fixed you can use the configuration file to set the leader election algorithm to 0. That was the default on sourceforge.

Thanx
ben

-----Original Message-----
From: mark harwood [mailto:[EMAIL PROTECTED]
Sent: Thursday, August 28, 2008 2:55 AM
To: firstname.lastname@example.org
Subject: RE: Migrating from sourceforge 2.2.1 to Apache trunk - QuorumPeers failing to find each other

>>Please use a port for electionPort different from the one you're using in the server configuration.

I think I am getting confused with the range of port numbers that must be defined. I had assumed there were only 2 types, clientPort and electionPort, representing the client-server comms and the server-server comms respectively, as shown in the overview diagram below:

http://wiki.apache.org/hadoop/ZooKeeper/ProjectDescription

It sounds like there may be another type of port to deal with. Is this right?

I previously added a comment about electionPorts to the Wiki documentation here ( http://zookeeper.wiki.sourceforge.net/ZooKeeperGettingStarted ) to clarify my understanding of the config settings. While this interpretation works OK in sourceforge 2.2.1, I am now confused as to the arrangement in Apache 3.0.

I spent most of yesterday debugging and trying different configuration files. The sourceforge version worked fine, but when I flipped the jars to the Apache version (keeping the same zoo config files) it just wouldn't work, either running on a single machine or on multiple machines.
Sometimes this was because the Apache version was using the wrong port to try to talk to another machine (when I configured each server with different election port settings), and sometimes a single server would get a BindException trying to open the same ServerSocket twice. I suspect this may be down to my misunderstanding of the ports now used and a change since the sourceforge version.

Can you cast any more light on this?

I'd also be keen to get some advice on whether to go with sourceforge 2.2.1 or Apache 3.x for an upcoming deployment to a live system. I imagine ZK 3 may be a bit of a moving target, but is it more likely to get bug fixes than ZK 2.2.1?

Many thanks,
Mark