No worries. Kudos to Mahadev for sniffing out the UDP sockets in the netstat; I glossed right over them. ;-)

Lots of good fixes in 3.2.2 vs pre-3.2. Still doesn't explain what Nick was seeing originally though...


Patrick

Jean-Daniel Cryans wrote:
Oh my god! You are right, we run an old dev version of 3.2.0:

zookeeper-r785019-hbase-1329.jar

This was what we shipped HBase trunk with last summer... This quorum
has an uptime of more than 6 months! Well, I guess that explains it. I
thought we had restarted it since then during our HBase upgrades, but it
seems not, so I'm very sorry about this false alert.

So... all I can say is thank you guys for such reliable software!
We'll be upgrading to 3.2.2 really soon.

J-D

On Mon, Jan 25, 2010 at 1:44 PM, Patrick Hunt <ph...@apache.org> wrote:
JD, there's something _very_ unusual in your setup. Are you running
"official" released ZooKeeper code or something else?

Either there is a misconfiguration on the other servers (the configs for the
other servers are exactly the same as 222's, right?), or perhaps some patches
to the ZK codebase that went awry?

See the attached file "zk_ports.txt". This is a summary of the netstat -a
that you sent. Notice in particular that UDP sockets are open for port 2888!
This should not happen in the default ZK configuration case.
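
The same check can be scripted. Below is a minimal Python sketch, assuming
`netstat -anp` output with one socket per line; the function name
`find_udp_sockets` is illustrative, and the sample lines are the ones reported
for 10.10.20.224 in this thread.

```python
def find_udp_sockets(netstat_output, port):
    """Return (local_address, owner) pairs for UDP sockets bound to `port`."""
    hits = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expect at least: proto, recv-q, send-q, local address, foreign address.
        if len(fields) < 5 or not fields[0].startswith("udp"):
            continue
        local = fields[3]
        # The port is whatever follows the last colon (works for ":::2888" too).
        if local.rsplit(":", 1)[-1] == str(port):
            # With -p the last column is PID/program; otherwise report "unknown".
            owner = fields[-1] if "/" in fields[-1] else "unknown"
            hits.append((local, owner))
    return hits


sample = """\
tcp6       0      0 :::2888                 :::*                    LISTEN      31052/java
tcp6       0      0 10.10.20.224:2888       10.10.20.223:42518      ESTABLISHED 31052/java
udp6       0      0 :::2888                 :::*                                31052/java
"""

print(find_udp_sockets(sample, 2888))
```

Any hit for port 2888 means a Java process holds a UDP socket on the quorum
port, which is the symptom described above.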

By default we only use TCP connections between servers (quorum & election).
There is an "electionAlg" option that allows users to turn off the TCP-based
fast leader election and go with a UDP-based one, but I don't see that in the
config you provided for 222. (As I said, assuming you are not setting this
option on the other servers either, correct?)
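
To confirm that none of the servers sets this option, each config file can be
scanned for the key. This is a minimal Python sketch, assuming the usual
`key=value` config format with `#` comments; the function name `election_alg`
and the sample config text are hypothetical, not taken from the configs
discussed here.

```python
def election_alg(cfg_text):
    """Return the electionAlg value from key=value config text, or None if unset."""
    value = None
    for raw in cfg_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and anything that is not a key=value pair.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, val = line.partition("=")
        if key.strip() == "electionAlg":
            value = val.strip()  # a later occurrence overrides an earlier one
    return value


# Hypothetical config text, for illustration only.
example_cfg = """\
tickTime=2000
clientPort=2181
# electionAlg=0
"""

print(election_alg(example_cfg))
```

A result of `None` means the option is not set, so the server should fall back
to the default election algorithm.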


Mahadev and I do remember that there was a bug in the 3.2 branch prior to
3.2 ever being released that caused us to use non-FLE (so UDP based)
election by default, however we fixed that before 3.2.0 ever shipped (it was
a bug in our config processing code) and it was never exposed in an official
release. Perhaps you have picked up some code prior to that?

Patrick

Jean-Daniel Cryans wrote:
According to the log for 222 it can't open a connection to the election port
(3888) for any of the other servers. This seems very unusual. Can you verify
that there's connectivity on that port between 222 and all the other servers?
jdcry...@sv4borg222:~$ telnet sv4borg224 3888
Trying 10.10.20.224...
telnet: Unable to connect to remote host: Connection refused
jdcry...@sv4borg222:~$ telnet sv4borg224 2888
Trying 10.10.20.224...
Connected to sv4borg224.
Escape character is '^]'.
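
The telnet probes above can be repeated for every peer and both ports in one
pass. This is a minimal Python sketch using only the standard library; the
function names `port_open` and `check_peers` are illustrative, and the host
list has to be supplied by whoever runs it.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


def check_peers(hosts, ports=(2888, 3888)):
    """Probe each host on each port; return {(host, port): reachable}."""
    return {(host, port): port_open(host, port) for host in hosts for port in ports}
```

For example, `check_peers(["sv4borg224"])` run from sv4borg222 would be
expected to report 2888 as reachable and 3888 as not, matching the telnet
results above.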

Also, can you re-run the netstat with the -a option? We can see the listen
sockets that way (omitted by netstat by default). It would be great if you
could send the netstat for all 5 servers.
I updated the tar.gz with the 5 netstat -anp outputs.

Thx!

J-D

Thanks,

Patrick

Jean-Daniel Cryans wrote:
Everything is here
http://people.apache.org/~jdcryans/zk_election_bug.tar.gz

The server we are trying to start is sv4borg222 (myid is 2) and we
started it around 10:03:21

Thx!

J-D

tcp6       0      0 10.10.20.221:34865      10.10.20.224:2888       ESTABLISHED 14682/java
udp6       0      0 :::2888                 :::*                                14682/java


tcp6       0      0 :::3888                 :::*                    LISTEN      4092/java
unix  2      [ ]         STREAM     CONNECTED     721588877 7642/java


tcp6       0      0 10.10.20.223:42518      10.10.20.224:2888       ESTABLISHED 2704/java
udp6       0      0 :::2888                 :::*                                2704/java


tcp6       0      0 :::2888                 :::*                    LISTEN      31052/java
tcp6       0      0 10.10.20.224:2888       10.10.20.223:42518      ESTABLISHED 31052/java
tcp6       0      0 10.10.20.224:2888       10.10.20.225:51459      ESTABLISHED 31052/java
tcp6       0      0 10.10.20.224:2888       10.10.20.221:34865      ESTABLISHED 31052/java
udp6       0      0 :::2888                 :::*                                31052/java


tcp6       0      0 10.10.20.225:51459      10.10.20.224:2888       ESTABLISHED 19545/java
udp6       0      0 :::2888                 :::*                                19545/java

