It will be good to see the logs; however, I had one additional thought.
The leader (the zk leader) is the one checking for session MOVED. It
keeps track of which server the session is currently attached to and
will throw the moved exception if the session proposes a request through
a server other than the one the leader thinks is the owner.
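
To make the check described above concrete, here is a toy model of it.
This is only a sketch of the idea, not the actual ZooKeeper code: the
class and method names (LeaderSessionTracker, set_owner, check_request,
SessionMovedError) are hypothetical and exist only for illustration.

```python
class SessionMovedError(Exception):
    """Hypothetical stand-in for the session MOVED error."""


class LeaderSessionTracker:
    """Toy model of the leader's view of which server owns each session."""

    def __init__(self):
        # session id -> id of the server the session is attached to
        self._owner = {}

    def set_owner(self, session_id, server_id):
        # Record (or update) the owner when a session attaches to a server.
        self._owner[session_id] = server_id

    def check_request(self, session_id, via_server_id):
        # Reject a request that arrives through a server other than the
        # one the leader currently believes owns the session.
        owner = self._owner.get(session_id)
        if owner is not None and owner != via_server_id:
            raise SessionMovedError(
                f"session {session_id:#x} is owned by server {owner}, "
                f"request came via server {via_server_id}"
            )
        return True
```

In this model, a stale owner entry on the leader would keep producing
the error until the session re-attaches and the entry is updated.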
I'm wondering what would happen if, when you see this again, you restart
the server that the session is attached to (use netstat on the client to
find out which one that is). The client will re-attach to the cluster,
and I'm wondering whether that would fix the problem (rather than
restarting the client as you have been doing).
Not sure if you can try this (production env?) but it would be an
interesting additional data point if you can give it a try.
Patrick Hunt wrote:
Yes. If you search back through older entries in the server log you will
be able to see who the leader is; it will say something like "LEADING" or
"FOLLOWING". This may change over time if leadership within the ZK
cluster changes (say, due to a networking issue), which is why you need
to search back rather than look only at the latest entry. This is why I
stress the logs so much: they really will give us a lot of additional
insight into the issue.
Here's an example from a 5-server ensemble:
ph...@valhalla:~/dev/workspace/zkconf/test5[master]$ egrep LEAD
localhost:2184/zoo.log:2010-03-16 12:50:13,711 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2184:quorump...@632] - LEADING
ph...@valhalla:~/dev/workspace/zkconf/test5[master]$ egrep FOLLOW
localhost:2181/zoo.log:2010-03-16 12:50:13,649 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:quorump...@620] - FOLLOWING
localhost:2182/zoo.log:2010-03-16 12:50:13,933 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2182:quorump...@620] - FOLLOWING
localhost:2183/zoo.log:2010-03-16 12:50:13,901 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2183:quorump...@620] - FOLLOWING
localhost:2185/zoo.log:2010-03-16 12:50:13,661 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2185:quorump...@620] - FOLLOWING
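
If you would rather script the "search back" step than grep by hand,
something like the following might do. It is a rough sketch that assumes
each log entry sits on one line, starts with a timestamp in the format
shown above, and ends with LEADING or FOLLOWING. The helper names
(role_history, role_at) are made up for this example.

```python
import re

# Matches entries such as:
#   2010-03-16 12:50:13,711 - INFO [QuorumPeer:...] - LEADING
ROLE_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) .* - "
    r"(?P<role>LEADING|FOLLOWING)\s*$"
)


def role_history(lines):
    """Return [(timestamp, role), ...] in the order found in the log."""
    history = []
    for line in lines:
        match = ROLE_LINE.match(line)
        if match:
            history.append((match.group("ts"), match.group("role")))
    return history


def role_at(history, timestamp):
    """Return the most recent role at or before timestamp, else None.

    Timestamps in this format sort correctly as plain strings, and the
    history is assumed to be in chronological (log) order.
    """
    current = None
    for when, role in history:
        if when > timestamp:
            break
        current = role
    return current
```

For one server you would call role_history(open("zoo.log")) and then
role_at(history, "<time the MOVED error appeared>") to see what that
server believed its role was when the problem occurred.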
Additionally, if you use the "stat" four-letter word you will see the
current mode of the server, leader or follower (JMX shows this as well).
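
As a rough illustration of the "stat" approach, the sketch below sends
the four-letter word to a server's client port and pulls out the "Mode:"
line from the reply. The function names are hypothetical, the host and
port are whatever your ensemble uses, and depending on the ZooKeeper
version the command may need to be enabled in the server configuration.

```python
import socket


def four_letter_word(host, port, word="stat", timeout=5.0):
    """Send a four-letter word to a ZooKeeper server and return the reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(word.encode("ascii"))
        chunks = []
        while True:
            # The server closes the connection once it has replied.
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mode(stat_output):
    """Extract the value of the 'Mode:' line, or None if it is absent."""
    for line in stat_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None
```

For example, parse_mode(four_letter_word("localhost", 2181)) would be
expected to return "leader" or "follower" for a server in a running
ensemble. Polling each server this way while the problem is happening
gives the current roles without digging through the logs.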
You might also find this useful: http://github.com/phunt/zktop
Łukasz Osipiuk wrote:
On Tue, Mar 16, 2010 at 20:05, Patrick Hunt <ph...@apache.org> wrote:
We'll probably need the ZK server/client logs to hunt this down. Can you
tell if the MOVED happens in some particular scenario, say you are
connected to a follower and move to a leader, or perhaps you are
connected to server A, get disconnected and reconnected to server A? ....
is there some pattern that could help us understand what's causing this?
When I get to the office tomorrow I will try to investigate the logs,
and maybe I will be able to find out what the error scenario is.
But I am not sure whether I will be able to find out what the role of
each node was when the problem occurred.
Does the ZooKeeper server log when a node's state changes between
follower and leader? Or can I make it log that?