JD, there's something _very_ unusual in your setup. Are you running
"official" released ZooKeeper code or something else?
Either there is a misconfiguration on the other servers (their configs
are exactly the same as 222's, right?), or perhaps some patches to the ZK
codebase went awry?
See the attached file "zk_ports.txt". This is a summary of the netstat
-a output that you sent. Notice in particular that UDP sockets are open on
port 2888! This should not happen with the default ZK configuration.
By default we only use TCP connections between servers (for both quorum
and election traffic). There is an "electionAlg" option that allows users
to turn off the TCP-based fast leader election and use a UDP-based one
instead, but I don't see it in the config you provided for 222 (and, as I
said, I'm assuming you are not setting this option on the other servers
either, correct?).
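One way to spot the stray UDP sockets from raw netstat -anp output is to parse the tcp/udp rows and flag any UDP socket bound to the quorum or election port. The sketch below is a hypothetical helper in Python, not part of ZooKeeper; the names parse_netstat and udp_on_zk_ports are illustrative, and it assumes Linux netstat -anp column order plus the conventional 2888/3888 ports used in this thread.

```python
from typing import Iterable, List, NamedTuple, Optional

# Conventional ZooKeeper server ports, as used in this thread
# (assumption: server.N=host:2888:3888 on every server).
QUORUM_PORT = 2888
ELECTION_PORT = 3888


class Sock(NamedTuple):
    proto: str            # "tcp", "tcp6", "udp" or "udp6"
    local_port: int
    state: Optional[str]  # None for UDP rows, which carry no state
    owner: str            # PID/program column, e.g. "14682/java"


def parse_netstat(lines: Iterable[str]) -> List[Sock]:
    """Parse the tcp/udp rows of `netstat -anp` output; other rows are skipped."""
    socks = []
    for line in lines:
        fields = line.split()
        if len(fields) < 6 or not fields[0].startswith(("tcp", "udp")):
            continue
        proto, local = fields[0], fields[3]
        # The port follows the last colon, which also handles IPv6 ":::2888".
        port = int(local.rsplit(":", 1)[1])
        # TCP rows have a State column between the addresses and PID/program.
        state = fields[5] if proto.startswith("tcp") else None
        socks.append(Sock(proto, port, state, fields[-1]))
    return socks


def udp_on_zk_ports(socks: Iterable[Sock]) -> List[Sock]:
    """UDP sockets on the quorum/election ports; none are expected with TCP FLE."""
    return [s for s in socks
            if s.proto.startswith("udp") and s.local_port in (QUORUM_PORT, ELECTION_PORT)]


if __name__ == "__main__":
    sample = [
        "tcp6 0 0 10.10.20.221:34865 10.10.20.224:2888 ESTABLISHED 14682/java",
        "udp6 0 0 :::2888 :::* 14682/java",
        "tcp6 0 0 :::3888 :::* LISTEN 4092/java",
        "unix 2 [ ] STREAM CONNECTED 721588877 7642/java",
    ]
    for s in udp_on_zk_ports(parse_netstat(sample)):
        print(f"{s.owner}: {s.proto} socket on port {s.local_port}")
```

Fed a few of the netstat rows from this thread, it reports the udp6 socket on port 2888 held by 14682/java and ignores the TCP and unix rows.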
Mahadev and I do remember a bug in the 3.2 branch, before 3.2 was ever
released, that caused us to use non-FLE (i.e. UDP-based) election by
default. However, we fixed it before 3.2.0 shipped (it was a bug in our
config processing code), so it was never exposed in an official release.
Perhaps you picked up some code from before that fix?
Patrick
Jean-Daniel Cryans wrote:
According to the log for 222, it can't open a connection to the election port
(3888) on any of the other servers. This seems very unusual. Can you verify
that there's connectivity on that port between 222 and all the other servers?
jdcry...@sv4borg222:~$ telnet sv4borg224 3888
Trying 10.10.20.224...
telnet: Unable to connect to remote host: Connection refused
jdcry...@sv4borg222:~$ telnet sv4borg224 2888
Trying 10.10.20.224...
Connected to sv4borg224.
Escape character is '^]'.
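The telnet check above can be scripted for every peer and both ports. This is a hypothetical Python sketch (tcp_port_open and check_peers are illustrative names, not ZooKeeper tooling). Keep in mind that only the current leader listens on the quorum port (2888), so a refusal there on a follower is normal, whereas with TCP-based fast leader election every server should accept connections on the election port (3888).

```python
import socket
from typing import Dict, Iterable, Tuple


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds (like telnet)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timeouts and name-resolution errors all land here.
        return False


def check_peers(peers: Iterable[str],
                ports: Tuple[int, ...] = (2888, 3888)) -> Dict[Tuple[str, int], bool]:
    """Probe every (peer, port) pair and report which ones accepted a connection."""
    return {(host, port): tcp_port_open(host, port)
            for host in peers for port in ports}


# Example (hypothetical hostnames, assumed to follow the sv4borgNNN pattern):
#   check_peers(["sv4borg221", "sv4borg223", "sv4borg224", "sv4borg225"])
```

Catching OSError keeps the probe simple: "Connection refused", timeouts, and DNS failures are all reported as unreachable, which is enough to reproduce the telnet result without distinguishing the cause.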
Also, can you re-run the netstat with -a option? We can see the listen
sockets that way (omitted by netstat by default). It would be great if you
could send the netstat for all 5 servers.
I updated the tar.gz with the netstat -anp output from all 5 servers.
Thx!
J-D
Thanks,
Patrick
Jean-Daniel Cryans wrote:
Everything is here
http://people.apache.org/~jdcryans/zk_election_bug.tar.gz
The server we are trying to start is sv4borg222 (myid is 2) and we
started it around 10:03:21
Thx!
J-D
tcp6  0 0 10.10.20.221:34865 10.10.20.224:2888  ESTABLISHED 14682/java
udp6  0 0 :::2888            :::*                           14682/java
tcp6  0 0 :::3888            :::*               LISTEN      4092/java
unix  2 [ ] STREAM CONNECTED 721588877 7642/java
tcp6  0 0 10.10.20.223:42518 10.10.20.224:2888  ESTABLISHED 2704/java
udp6  0 0 :::2888            :::*                           2704/java
tcp6  0 0 :::2888            :::*               LISTEN      31052/java
tcp6  0 0 10.10.20.224:2888  10.10.20.223:42518 ESTABLISHED 31052/java
tcp6  0 0 10.10.20.224:2888  10.10.20.225:51459 ESTABLISHED 31052/java
tcp6  0 0 10.10.20.224:2888  10.10.20.221:34865 ESTABLISHED 31052/java
udp6  0 0 :::2888            :::*                           31052/java
tcp6  0 0 10.10.20.225:51459 10.10.20.224:2888  ESTABLISHED 19545/java
udp6  0 0 :::2888            :::*                           19545/java