It could be that brokers 1 and 3 can't communicate with broker 2 and the consumer client. You may want to read https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Whycan'tmyconsumers/producersconnecttothebrokers
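One rough way to test the connectivity hypothesis is a plain TCP probe, run from each broker machine and from the consumer machine in turn. The sketch below is only an illustration: the host names and ports are copied from the quoted message, and `can_connect` and `check_endpoints` are hypothetical helper names, not part of Kafka. A successful TCP connection only shows that the port is reachable from the machine running the check; it does not prove that the host name a broker registers in ZooKeeper resolves correctly on the other machines.

```python
import socket

# Broker and ZooKeeper endpoints as listed in the quoted message.
# Adjust these for your own cluster.
ENDPOINTS = [
    ("staging2.mtl.shopmedia.com", 9092),
    ("staging3.mtl.shopmedia.com", 9092),
    ("centos.mtl.shopmedia.com", 9092),
    ("staging2.mtl.shopmedia.com", 2181),
    ("staging3.mtl.shopmedia.com", 2181),
    ("centos.mtl.shopmedia.com", 2181),
]


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def check_endpoints(endpoints, timeout=3.0):
    """Map each "host:port" string to whether it is reachable from here."""
    return {
        f"{host}:{port}": can_connect(host, port, timeout)
        for host, port in endpoints
    }
```

Calling `check_endpoints(ENDPOINTS)` on staging2, staging3 and centos and comparing the three results should show whether any machine fails to reach another.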
Thanks,

Jun

On Thu, Sep 25, 2014 at 1:52 PM, florent valdelievre <florentvaldelie...@gmail.com> wrote:

> Zookeeper version: 3.4.6-1569965, built on 02/20/2014 09:09 GMT
>
> Kafka version: kafka_2.8.0-0.8.1.1
>
> I have the following architecture/configuration
>
> staging2.mtl.shopmedia.com (broker.id=1)
> zookeeper:9092
> kafka:2181
>
> staging3.mtl.shopmedia.com (broker.id=2)
> zookeeper:9092
> kafka:2181
>
> centos.mtl.shopmedia.com (broker.id=3)
> zookeeper:9092
> kafka:2181
>
> Each kafka server has the same configuration except broker.id and log.dirs
>
> broker.id=XXX
> port=9092
> num.network.threads=2
> num.io.threads=8
> socket.send.buffer.bytes=1048576
> socket.receive.buffer.bytes=1048576
> socket.request.max.bytes=104857600
> log.dirs=/home/shopmedia/nfs/logs/XXX/kafka
> num.partitions=1
> log.retention.hours=1
> log.segment.bytes=536870912
> log.retention.check.interval.ms=60000
> log.cleaner.enable=false
> zookeeper.connect=staging2.mtl.shopmedia.com:2181,staging3.mtl.shopmedia.com:2181,centos.mtl.shopmedia.com:2181
> zookeeper.connection.timeout.ms=1000000
> auto.create.topics.enable=true
> default.replication.factor=3
>
> Zookeeper configuration is also the same on all servers:
>
> dataDir=/home/shopmedia/apps/zookeeper/data
> clientPort=2181
> maxClientCnxns=0
>
> I have only 1 topic and 1 partition
>
> I have 3 servers(staging2, staging3 and centos) in case of failover. Each
> partition should be replicated among all kafka brokers ( as replica.factor
> = 3 )
>
> I have created my topic like this:
>
> kafka-topics.sh --create --zookeeper staging2.mtl.shopmedia.com:2181,staging3.mtl.shopmedia.com:2181,centos.mtl.shopmedia.com:2181 --topic hibe-user-server-event --partitions 1 --replication-factor 3
>
> Then I check the topic configuration:
>
> [shopmedia@staging3:~] $kafka-topics.sh --describe --zookeeper staging2.mtl.shopmedia.com:2181,staging3.mtl.shopmedia.com:2181,centos.mtl.shopmedia.com:2181 --topic hibe-user-server-event
>
> Topic:hibe-user-server-event PartitionCount:1 ReplicationFactor:3 Configs:
>
> Topic: hibe-user-server-event Partition: 0 Leader: 2 Replicas: 1,2,3 Isr: 2
>
> According to the describe, my broker leader is 2 (staging3)
>
> QUESTIONS)
>
> 1) Why Isr(In Sync Replica) is only 2 and not 1,2,3? This way, if the
> leader2 crashes, the other broker won't have any data
>
> 2)
>
> I am running a consumers on each machine(staging2, staging3 and centos)
> with the following command:
>
> kafka-console-consumer.sh --zookeeper staging2.mtl.shopmedia.com:2181,staging3.mtl.shopmedia.com:2181,centos.mtl.shopmedia.com:2181 --topic hibe-user-server-event
>
> All my servers are up and running(Zoo + kafka)
>
> I start a producer from staging2:
>
> kafka-console-producer.sh --topic hibe-user-server-event --broker-list=staging2.mtl.shopmedia.com:9092,staging3.mtl.shopmedia.com:9092,centos.mtl.shopmedia.com:9092
>
> All my consumers receive the message properly.
>
> I shutdown 1 and 3(staging2 and centos)
>
> My consumers still receives the message from the producer( good !)
>
> I restart 1 and 3 ( so all servers are running like before)
>
> I shut 2 only(Leader becomes 1, ISR: 1), My consumers don't receive anymore
> message and stdout have the following:
>
> Staging2
>
> [2014-09-25 04:23:57,602] ERROR [ConsumerFetcherThread-console-consumer-4903_staging2.hibe.com-1411630863195-cbe7a1e8-0-1], Error for partition [hibe-user-server-event,0] to broker 1:class kafka.common.UnknownTopicOrPartitionException (kafka.consumer.ConsumerFetcherThread)
>
> Staging3
>
> [2014-09-25 04:23:58,459] ERROR [ConsumerFetcherThread-console-consumer-99699_staging3.hibe.com-1411630877045-98f884fa-0-1], Error for partition [hibe-user-server-event,0] to broker 1:class kafka.common.NotLeaderForPartitionException (kafka.consumer.ConsumerFetcherThread)
>
> Centos
>
> [2014-09-25 04:21:42,393] ERROR [ConsumerFetcherThread-console-consumer-38882_centos.mtl.shopmedia.com-1411630833934-e6ceffde-0-1], Error for partition [hibe-user-server-event,0] to broker 1:class kafka.common.NotLeaderForPartitionException (kafka.consumer.ConsumerFetcherThread)
>
> Conclusion: When I shut the broker leader, my consumers can't catch up ( I
> suspect this is because ISR is not up to date )
>
> Any idea ?
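The `--describe` output quoted above lists Replicas 1,2,3 but Isr 2, which means brokers 1 and 3 are assigned the partition but are not currently in sync with the leader. As a minimal sketch, assuming the output layout shown in that message (other Kafka versions may print it differently), the partition lines can be checked for replicas missing from the ISR. `find_under_replicated` and `parse_ids` are hypothetical helper names, not Kafka tools.

```python
import re

# Matches one partition line of `kafka-topics.sh --describe` output in the
# layout shown in the quoted message. The topic summary line
# ("PartitionCount:...") does not match and is skipped.
PARTITION_LINE = re.compile(
    r"Topic:\s*(?P<topic>\S+)\s+Partition:\s*(?P<partition>\d+)\s+"
    r"Leader:\s*(?P<leader>-?\d+)\s+Replicas:\s*(?P<replicas>[\d,]+)\s+"
    r"Isr:\s*(?P<isr>[\d,]*)"
)


def parse_ids(text):
    """Turn a comma-separated broker list such as "1,2,3" into [1, 2, 3]."""
    return [int(part) for part in text.split(",") if part]


def find_under_replicated(describe_output):
    """Return one dict per partition whose ISR lacks an assigned replica."""
    results = []
    for line in describe_output.splitlines():
        match = PARTITION_LINE.search(line)
        if match is None:
            continue
        replicas = parse_ids(match.group("replicas"))
        isr = parse_ids(match.group("isr"))
        missing = [broker for broker in replicas if broker not in isr]
        if missing:
            results.append({
                "topic": match.group("topic"),
                "partition": int(match.group("partition")),
                "leader": int(match.group("leader")),
                "missing": missing,
            })
    return results
```

Feeding it the two lines of describe output from the message reports partition 0 with brokers 1 and 3 missing from the ISR, which fits the consumers failing once broker 2 is shut down.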