With 0.8, we have a situation where a broker gives up (or is stripped of) 
leadership for no apparent reason. The cluster has 3 nodes; in this case, 
broker id=1 stopped leading. This is what I see in server.log at the time it 
stopped leading:

[2013-09-22 14:00:06,141] INFO re-registering broker info in ZK for broker 1 
(kafka.server.KafkaZooKeeper)
[2013-09-22 14:00:06,507] INFO Registered broker 1 at path /brokers/ids/1 with 
address 10.27.63.37:9092. (kafka.utils.ZkUtils$)
[2013-09-22 14:00:06,508] INFO done re-registering broker 
(kafka.server.KafkaZooKeeper)
[2013-09-22 14:00:06,509] INFO Subscribing to /brokers/topics path to watch for 
new topics (kafka.server.KafkaZooKeeper)
[2013-09-22 14:00:06,515] INFO Closing socket connection to /10.27.63.37. 
(kafka.network.Processor)
[2013-09-22 14:00:06,519] INFO conflict in /controller data: 1 stored data: 2 
(kafka.utils.ZkUtils$)
[2013-09-22 14:00:06,526] INFO New leader is 2 
(kafka.server.ZookeeperLeaderElector$LeaderChangeListener)
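(For context: the "conflict in /controller data: 1 stored data: 2" line suggests the controller role moved from broker 1 to broker 2. One way to confirm which broker currently holds the controller role is to read the /controller znode with the zookeeper-shell script that ships with Kafka; zkhost:2181 below is a placeholder for the actual ZooKeeper connect string.)

```shell
# Read the /controller znode; the stored data names the current
# controller's broker id. zkhost:2181 is a placeholder.
bin/zookeeper-shell.sh zkhost:2181 get /controller
```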

The broker process itself stayed up, and I was able to get it back to leading 
by simply running the preferred-replica-election tool. Looking at server.log, 
controller.log, and state-change.log on all 3 brokers, it's unclear what 
triggered this. I thought it might be a problem communicating with ZK, but I 
don't see any such errors. The broker had been running fine for several days 
prior to this. I also looked at the GC logs and don't see any long-running 
garbage collection at that time.
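(For reference, the recovery step was just the stock tool run against the whole cluster, which moves leadership back to each partition's preferred, i.e. first-listed, replica. A sketch, assuming the 0.8 scripts and a placeholder ZooKeeper address:)

```shell
# Trigger a preferred-replica leader election for all partitions.
# zkhost:2181 is a placeholder; to limit the election to specific
# partitions, pass --path-to-json-file with a partition list instead.
bin/kafka-preferred-replica-election.sh --zookeeper zkhost:2181
```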

What else should I be looking for?

Thanks,
Paul
