Re: How Producer handles Network Connectivity Issues

2015-05-29 Thread Bhavesh Mistry
Hi Kamal,

In order to monitor each producer instance, you will need an alternative
network monitoring channel (e.g., Flume, or another Kafka cluster dedicated
to monitoring producers at large scale).

Here are the details:

1) Add a custom Log4j appender that intercepts all logs from the Kafka
producer Java package.
2) Capture WARN and ERROR logs and write them to disk.
3) Have a Flume agent, or any program that can ship on-disk logs, forward
them to a remote or central location (basically, use an alternative system
to transport the logs and ingest them into your monitoring system or
Elasticsearch).

This assumes that you have network and/or physical-layer redundancy for this
critical monitoring path. I hope this helps!


Thanks,
Bhavesh

On Wed, May 27, 2015 at 10:37 AM, Kamal C kamaltar...@gmail.com wrote:

 Thanks for the response Ewen!

 On Tue, May 26, 2015 at 10:52 PM, Ewen Cheslack-Postava e...@confluent.io
 
 wrote:

  It's not being switched in this case because the broker hasn't failed. It
  can still connect to all the other brokers and zookeeper. The only
  failure is of the link between a client and the broker.
 
  Another way to think of this is to extend the scenario with more
  producers. If I have 100 other producers and they can all still connect,
  would you still consider this a failure and expect the leader to change?
  Since network partitions (or periods of high latency, or long GC pauses,
  etc.) can happen arbitrarily and clients might be spread far and wide,
  you can't rely on their connectivity as an indicator of the health of the
  Kafka broker.
 
  Of course, there's also a pretty big practical issue: since the client
  can't connect to the broker, how would it even report that it has a
  connectivity issue?
 
  -Ewen
 
  On Mon, May 25, 2015 at 10:05 PM, Kamal C kamaltar...@gmail.com wrote:
 
   Hi,
  
   I have a cluster of 3 Kafka brokers and a remote producer. The producer
   started to send messages to *SampleTopic*. Then I blocked the network
   connectivity between the producer and the leader node for the topic
   *SampleTopic*; network connectivity within the cluster is healthy, and
   the producer is able to reach the other two nodes.
  
   *With Script*
  
   sh kafka-topics.sh --zookeeper localhost --describe
   Topic: SampleTopic  PartitionCount: 1  ReplicationFactor: 3  Configs:
       Topic: SampleTopic  Partition: 0  Leader: 1  Replicas: 1,2,0  Isr: 1,2,0
  
  
   The producer tries forever to reach the leader node, throwing a
   connection-refused exception. I understand that when a node fails the
   leader gets switched. Why isn't the leader switching in this scenario?
  
   --
   Kamal C
  
 
 
 
  --
  Thanks,
  Ewen
 



Re: How Producer handles Network Connectivity Issues

2015-05-27 Thread Kamal C
Thanks for the response Ewen!

On Tue, May 26, 2015 at 10:52 PM, Ewen Cheslack-Postava e...@confluent.io
wrote:

 It's not being switched in this case because the broker hasn't failed. It
 can still connect to all the other brokers and zookeeper. The only failure
 is of the link between a client and the broker.

 Another way to think of this is to extend the scenario with more producers.
 If I have 100 other producers and they can all still connect, would you
 still consider this a failure and expect the leader to change? Since
 network partitions (or periods of high latency, or long GC pauses, etc.) can
 happen arbitrarily and clients might be spread far and wide, you can't rely
 on their connectivity as an indicator of the health of the Kafka broker.

 Of course, there's also a pretty big practical issue: since the client
 can't connect to the broker, how would it even report that it has a
 connectivity issue?

 -Ewen

 On Mon, May 25, 2015 at 10:05 PM, Kamal C kamaltar...@gmail.com wrote:

  Hi,
 
  I have a cluster of 3 Kafka brokers and a remote producer. The producer
  started to send messages to *SampleTopic*. Then I blocked the network
  connectivity between the producer and the leader node for the topic
  *SampleTopic*; network connectivity within the cluster is healthy, and
  the producer is able to reach the other two nodes.
 
  *With Script*
 
  sh kafka-topics.sh --zookeeper localhost --describe
  Topic: SampleTopic  PartitionCount: 1  ReplicationFactor: 3  Configs:
      Topic: SampleTopic  Partition: 0  Leader: 1  Replicas: 1,2,0  Isr: 1,2,0
 
 
  The producer tries forever to reach the leader node, throwing a
  connection-refused exception. I understand that when a node fails the
  leader gets switched. Why isn't the leader switching in this scenario?
 
  --
  Kamal C
 



 --
 Thanks,
 Ewen



Re: How Producer handles Network Connectivity Issues

2015-05-26 Thread Ewen Cheslack-Postava
It's not being switched in this case because the broker hasn't failed. It
can still connect to all the other brokers and zookeeper. The only failure
is of the link between a client and the broker.

Another way to think of this is to extend the scenario with more producers.
If I have 100 other producers and they can all still connect, would you
still consider this a failure and expect the leader to change? Since
network partitions (or periods of high latency, or long GC pauses, etc.) can
happen arbitrarily and clients might be spread far and wide, you can't rely
on their connectivity as an indicator of the health of the Kafka broker.

Of course, there's also a pretty big practical issue: since the client
can't connect to the broker, how would it even report that it has a
connectivity issue?
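One practical consequence of this: since the cluster won't change leaders for
you, it is the application's job to notice the failure, for example by
bounding retries so sends fail instead of blocking indefinitely. A
hypothetical sketch of the relevant producer properties follows; the property
names, broker host names, and values are assumptions and should be checked
against the producer configuration documentation for your client version.

```java
import java.util.Properties;

// Sketch: producer settings that bound retry behavior so connectivity
// problems surface as send errors instead of silent, indefinite retries.
// All names and values here are assumptions, not taken from the thread.
public class BoundedRetryConfig {
    public static Properties build() {
        Properties props = new Properties();
        // List several brokers so metadata can still be fetched when one
        // broker is unreachable (host names are hypothetical).
        props.put("bootstrap.servers", "broker1:9092,broker2:9092,broker3:9092");
        props.put("retries", "3");             // give up after a few attempts
        props.put("retry.backoff.ms", "500");  // pause between attempts
        props.put("acks", "all");              // require in-sync replica acks
        return props;
    }

    public static void main(String[] args) {
        System.out.println(build().getProperty("retries")); // prints 3
    }
}
```

With a bounded configuration like this, the send callback or future receives
the error, and the application can raise it through its own monitoring
channel (as Bhavesh describes above).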

-Ewen

On Mon, May 25, 2015 at 10:05 PM, Kamal C kamaltar...@gmail.com wrote:

 Hi,

 I have a cluster of 3 Kafka brokers and a remote producer. The producer
 started to send messages to *SampleTopic*. Then I blocked the network
 connectivity between the producer and the leader node for the topic
 *SampleTopic*; network connectivity within the cluster is healthy, and
 the producer is able to reach the other two nodes.

 *With Script*

 sh kafka-topics.sh --zookeeper localhost --describe
 Topic: SampleTopic  PartitionCount: 1  ReplicationFactor: 3  Configs:
     Topic: SampleTopic  Partition: 0  Leader: 1  Replicas: 1,2,0  Isr: 1,2,0


 The producer tries forever to reach the leader node, throwing a
 connection-refused exception. I understand that when a node fails the
 leader gets switched. Why isn't the leader switching in this scenario?

 --
 Kamal C




-- 
Thanks,
Ewen