With Kafka 0.8, we have a situation where a broker is removing itself (or being removed) as a leader for no apparent reason. The cluster has 3 nodes. In this case, broker id=1 stopped leading. This is what I see in server.log at the time it stopped leading:
[2013-09-22 14:00:06,141] INFO re-registering broker info in ZK for broker 1 (kafka.server.KafkaZooKeeper)
[2013-09-22 14:00:06,507] INFO Registered broker 1 at path /brokers/ids/1 with address 10.27.63.37:9092. (kafka.utils.ZkUtils$)
[2013-09-22 14:00:06,508] INFO done re-registering broker (kafka.server.KafkaZooKeeper)
[2013-09-22 14:00:06,509] INFO Subscribing to /brokers/topics path to watch for new topics (kafka.server.KafkaZooKeeper)
[2013-09-22 14:00:06,515] INFO Closing socket connection to /10.27.63.37. (kafka.network.Processor)
[2013-09-22 14:00:06,519] INFO conflict in /controller data: 1 stored data: 2 (kafka.utils.ZkUtils$)
[2013-09-22 14:00:06,526] INFO New leader is 2 (kafka.server.ZookeeperLeaderElector$LeaderChangeListener)

The broker process itself stayed up, and I was able to get it leading again simply by running the preferred-replica-election tool.

Looking at server.log, controller.log and state-change.log on all 3 brokers, it's unclear what triggered this. I thought it might be a problem communicating with ZooKeeper, but I don't see any such errors. The broker had been running fine for several days prior to this. I also looked at the GC logs and don't see any long-running garbage collection at that time.

What else should I be looking for?

Thanks,
Paul
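For reference, the recovery step mentioned above was roughly the following (a sketch based on the stock 0.8 tooling; the ZooKeeper address zk1:2181 is a placeholder for your own ensemble, and this needs a live cluster to run):

```shell
# Check which broker currently holds the controller znode
# (the "conflict in /controller data" line suggests a controller move).
bin/zookeeper-shell.sh zk1:2181 get /controller

# Trigger a preferred replica election for all partitions, so broker 1
# resumes leadership of the partitions where it is the preferred replica.
bin/kafka-preferred-replica-election.sh --zookeeper zk1:2181
```

Without a --path-to-json-file argument, the tool runs the election for every partition in the cluster.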
