According to the log for 222, it can't open a connection to the election port (3888) on any of the other servers. This seems very unusual. Can you verify that there's connectivity on that port between 222 and all the other servers?
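
One way to run that connectivity check from 222 is a short script that attempts a plain TCP connection to the election port on each peer. This is only a sketch: the host names in `PEERS` are placeholders (substitute the entries from your own config), and a successful connect only shows that something is listening and reachable on 3888, not that leader election itself is healthy.

```python
import socket

# Placeholder host names (assumption); replace with the servers in your config.
PEERS = ["somehost1", "somehost2", "somehost3", "somehost4", "somehost5"]
ELECTION_PORT = 3888  # election port reported in the log


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name-resolution failures.
        return False


def check_peers(hosts, port=ELECTION_PORT, timeout=3.0):
    """Map each host to whether its port accepted a connection."""
    return {host: can_connect(host, port, timeout) for host in hosts}


# Example use, run on the server that fails to join:
#   for host, ok in check_peers(PEERS).items():
#       print(host, "reachable" if ok else "NOT reachable")
```

If every peer shows as not reachable from 222 but the same script succeeds from another server, that points at a firewall or routing problem rather than at ZooKeeper.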

Also, can you re-run netstat with the -a option? That way we can see the listen sockets (netstat omits them by default). It would be great if you could send the netstat output for all 5 servers.


Thanks,

Patrick

Jean-Daniel Cryans wrote:
Everything is here http://people.apache.org/~jdcryans/zk_election_bug.tar.gz

The server we are trying to start is sv4borg222 (myid is 2) and we
started it around 10:03:21

Thx!

J-D

On Mon, Jan 25, 2010 at 10:49 AM, Patrick Hunt <ph...@apache.org> wrote:
1) Capture the logs from all 5 servers
2) give the config for the "down" server, and also indicate what its server id
is.
3) if possible it would be interesting to see the netstat information from 2
of the servers - the one that's down and one or more of the others.

Patrick

Jean-Daniel Cryans wrote:
I believe we've just hit the same problem with zk-3.2.1

For some reason a machine that was part of our quorum of 5 servers
crashed. When we try to restart it, it does this (I replaced the
hostname and IP):

2010-01-25 10:25:06,469 WARN org.apache.zookeeper.server.quorum.QuorumCnxManager: Cannot open channel to 1 at election address somehost1/someip1:3888
java.net.ConnectException: Connection refused
       at sun.nio.ch.Net.connect(Native Method)
       at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:507)
       at java.nio.channels.SocketChannel.open(SocketChannel.java:146)
       at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:323)
       at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:356)
       at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:603)
       at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:488)

It has been like that for almost 20 minutes now, trying every other
server in the quorum on different channels. ruok says imok but all
other commands say that ZK server isn't running. I don't believe that
3.2.2 will help unless ZK-547 does more than it seems to.
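
For anyone wanting to repeat the ruok/stat comparison across all servers, here is a rough sketch that sends a four-letter command over a plain socket and returns whatever the server replies. The port 2181 is an assumption (ZooKeeper's default client port); use the clientPort from your own config. The helper name `four_letter_word` is just for illustration.

```python
import socket

CLIENT_PORT = 2181  # assumption: default client port; check clientPort in your config


def four_letter_word(host, cmd, port=CLIENT_PORT, timeout=3.0):
    """Send a four-letter command (e.g. 'ruok', 'stat') and return the reply text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


# Example use against one server (host name is a placeholder):
#   print(four_letter_word("somehost1", "ruok"))
#   print(four_letter_word("somehost1", "stat"))
```

A server that answers ruok but reports that it isn't running for stat matches what we are seeing here: the process is up but has not joined the quorum.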

Anything else I should look at?

Thx!

J-D

On Wed, Jan 13, 2010 at 11:19 AM, Nick Bailey <ni...@mailtrust.com> wrote:
So the solution for us was to just nuke zookeeper and restart everywhere.
We will also be upgrading soon.

To answer your question, yes I believe all the servers were running
normally except for the fact that they were experiencing high CPU usage.
As we began to see some CPU alerts I started restarting some of the
servers.

It was then that we noticed that they were not actually running according
to 'stat'.

I still have the log from one server with a debug level and the rest with
a warn level. If you would like to see any of these and analyze them just
let me know.

Thanks for the help,
Nick Bailey
