Re: How Producer handles Network Connectivity Issues
Hi Kamal,

To monitor each producer instance, you will need an alternative network monitoring channel (e.g., Flume, or a separate Kafka cluster dedicated just to monitoring producers at scale). Here are the details:

1) Add a custom Log4j appender and intercept all logs from the Kafka producer Java package.
2) Capture the WARN and ERROR logs and write them to disk.
3) Run a Flume agent, or any program that can ship logs from disk, to a remote or central location. In other words, use an alternative system to transport the logs and ingest them into a monitoring system or Elasticsearch.

This assumes you have network and/or physical-layer redundancy for this critical monitoring path.

I hope this helps!

Thanks,
Bhavesh

On Wed, May 27, 2015 at 10:37 AM, Kamal C kamaltar...@gmail.com wrote:

Thanks for the response Ewen!

On Tue, May 26, 2015 at 10:52 PM, Ewen Cheslack-Postava e...@confluent.io wrote:

The leader isn't being switched in this case because the broker hasn't failed: it can still connect to all the other brokers and to ZooKeeper. The only failure is of the link between one client and the broker.

Another way to think of this is to extend the scenario with more producers. If there were 100 other producers and they could all still connect, would you still consider this a failure and expect the leader to change? Since network partitions (or periods of high latency, long GC pauses, etc.) can happen arbitrarily, and clients may be spread far and wide, you can't rely on their connectivity as an indicator of the Kafka broker's health.

Of course, there is also a pretty big practical issue: since the client can't connect to the broker, how would it even report that it has a connectivity issue?

-Ewen

On Mon, May 25, 2015 at 10:05 PM, Kamal C kamaltar...@gmail.com wrote:

Hi,

I have a cluster of 3 Kafka brokers and a remote producer. The producer started to send messages to *SampleTopic*. Then I blocked the network connectivity between the producer and the leader node for the topic *SampleTopic*, while network connectivity within the cluster stayed healthy and the producer could still reach the other two nodes.

*With Script*

sh kafka-topics.sh --zookeeper localhost --describe
Topic: SampleTopic    PartitionCount: 1    ReplicationFactor: 3    Configs:
    Topic: SampleTopic    Partition: 0    Leader: 1    Replicas: 1,2,0    Isr: 1,2,0

The producer tries forever to reach the leader node, throwing a connection-refused exception. I understand that when there is a node failure, the leader gets switched. Why isn't it switching the leader in this scenario?

--
Kamal C

--
Thanks,
Ewen
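Bhavesh's steps 1 and 2 (intercept the producer's log stream, keep only WARN/ERROR, and persist it to disk for a shipper to pick up) can be sketched as follows. This is a minimal illustration of the pattern using only the JDK's `java.util.logging` so it runs without extra jars; with Log4j you would extend an appender class instead, targeting the `org.apache.kafka.clients.producer` logger. The class name and file path here are my own, not from the thread:

```java
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Custom handler: captures WARNING/SEVERE records from the producer's
// logger hierarchy and appends them to a local file, where a shipper
// (Flume, or any tail-and-forward agent) can pick them up.
public class ProducerLogCapture extends Handler {
    private final PrintWriter out;

    public ProducerLogCapture(String path) throws IOException {
        // Append mode, auto-flush on each println so records hit disk promptly.
        this.out = new PrintWriter(new FileWriter(path, true), true);
    }

    @Override
    public void publish(LogRecord record) {
        // Step 2: keep only WARN/ERROR-level events.
        if (record.getLevel().intValue() >= Level.WARNING.intValue()) {
            out.println(record.getLevel() + " " + record.getLoggerName()
                    + " - " + record.getMessage());
        }
    }

    @Override public void flush() { out.flush(); }
    @Override public void close() { out.close(); }

    public static void main(String[] args) throws IOException {
        // Step 1: attach the handler to the producer package's logger.
        Logger producerLog = Logger.getLogger("org.apache.kafka.clients.producer");
        producerLog.addHandler(new ProducerLogCapture("producer-warnings.log"));

        producerLog.info("metadata refreshed");              // dropped by the filter
        producerLog.warning("Connection to node 1 refused"); // captured to disk
    }
}
```

Because the capture file is on local disk, the producer's connectivity problems are recorded even while the producer cannot reach the broker, which is exactly the gap Ewen points out; the side-channel shipper then moves the file to the central monitoring system.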
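On the producer side, the practical consequence of Ewen's point is that the client keeps retrying the current leader until the cluster itself declares the broker dead; the retry and backoff behavior is tunable rather than something that triggers a leader change. As a sketch, the relevant settings in the 0.8.x-era Java producer look like this (the values are illustrative, not recommendations):

```
bootstrap.servers=broker1:9092,broker2:9092,broker3:9092
# How many times to retry a failed send before reporting the error back
retries=3
retry.backoff.ms=500
# How long to wait before re-attempting a connection to an unreachable node
reconnect.backoff.ms=1000
# Upper bound on how long send() blocks waiting for topic metadata
metadata.fetch.timeout.ms=60000
```

Bounding the retries and timeouts lets the send's error callback fire instead of the producer appearing to hang forever, which at least surfaces the connection-refused condition to application code, even though, as discussed above, reporting it anywhere useful still requires a side channel.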