The logs show that the broker had to re-register its broker information in ZooKeeper. That means its previous registration was lost, which typically happens when the broker's ZooKeeper session expires. The cause could be a long GC pause on the broker or some issue on the ZooKeeper side. It will help if you send around the log4j log from just before the re-registration.
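If it helps with pulling out that part of the log, here is a rough Python sketch that keeps only the lines from the few minutes leading up to the first re-registration message. It assumes the timestamp layout and the "re-registering broker info in ZK" text seen in the server.log excerpt quoted below; the function names and the 5-minute window are my own choices for illustration, not anything that ships with Kafka.

```python
import re
from datetime import datetime, timedelta

# Timestamp prefix as it appears in the quoted server.log lines,
# e.g. "[2013-09-22 14:00:06,141] INFO ..."
TS_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\]")

# Message text taken from the quoted log excerpt.
MARKER = "re-registering broker info in ZK"


def parse_ts(line):
    """Return the line's timestamp, or None for lines without one."""
    m = TS_RE.match(line)
    if m is None:
        return None
    return datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f")


def lines_before_reregistration(lines, window=timedelta(minutes=5)):
    """Return the lines within `window` before the first re-registration.

    The marker line itself is included. Lines without a timestamp
    (for example stack-trace continuations) follow the decision made
    for the most recent timestamped line.
    """
    lines = list(lines)

    # Locate the first timestamped line that carries the marker text.
    marker_ts = None
    marker_idx = None
    for i, line in enumerate(lines):
        if MARKER in line:
            ts = parse_ts(line)
            if ts is not None:
                marker_ts, marker_idx = ts, i
                break
    if marker_ts is None:
        return []

    selected = []
    keep = False
    for line in lines[: marker_idx + 1]:
        ts = parse_ts(line)
        if ts is not None:
            keep = ts >= marker_ts - window
        if keep:
            selected.append(line)
    return selected
```

To use it, open the broker's server.log, pass the file object to `lines_before_reregistration`, and write the returned lines to a new file. Widen `window` if nothing unusual shows up in the last five minutes.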
Another thing that will help is to send around the output of the state change log merger tool:
https://cwiki.apache.org/confluence/display/KAFKA/Replication+tools#Replicationtools-7.StateChangeLogMergerTool

Thanks,
Neha

On Sep 22, 2013 5:42 PM, "Paul Mackles" <[email protected]> wrote:

> With 0.8, we have a situation where a broker is removing itself (or being
> removed) as a leader for no apparent reason. The cluster has 3 nodes. In
> this case, broker id=1 stopped leading. This is what I see in the
> server.log at the time it stopped leading:
>
> [2013-09-22 14:00:06,141] INFO re-registering broker info in ZK for broker
> 1 (kafka.server.KafkaZooKeeper)
> [2013-09-22 14:00:06,507] INFO Registered broker 1 at path /brokers/ids/1
> with address 10.27.63.37:9092. (kafka.utils.ZkUtils$)
> [2013-09-22 14:00:06,508] INFO done re-registering broker
> (kafka.server.KafkaZooKeeper)
> [2013-09-22 14:00:06,509] INFO Subscribing to /brokers/topics path to
> watch for new topics (kafka.server.KafkaZooKeeper)
> [2013-09-22 14:00:06,515] INFO Closing socket connection to /10.27.63.37.
> (kafka.network.Processor)
> [2013-09-22 14:00:06,519] INFO conflict in /controller data: 1 stored
> data: 2 (kafka.utils.ZkUtils$)
> [2013-09-22 14:00:06,526] INFO New leader is 2
> (kafka.server.ZookeeperLeaderElector$LeaderChangeListener)
>
> The broker process itself stayed up and I was able to get it back to
> leading by simply running the preferred-replica-election tool. Looking at
> server.log, controller.log and state-change.log on all 3 brokers, it's
> unclear what triggered this. I thought it might be a problem communicating
> with ZK but I don't see any such errors. The broker had been running fine
> for several days prior to this. I looked at the gc logs and I don't see any
> long running garbage collection at that time.
>
> What else should I be looking for?
>
> Thanks,
> Paul
