Hi all!

We are running Kafka in a 3-node setup with Kafka and Zookeeper on each node.
The topics have 1 partition and 2 replicas, like this:

Topic:someTopic    PartitionCount:1    ReplicationFactor:2    Configs:retention.ms=600000
    Topic: someTopic    Partition: 0    Leader: 2    Replicas: 2,0    Isr: 2,0

We use the following settings:

Consumer settings:
fetch.min.bytes=1
enable.auto.commit=true
max.partition.fetch.bytes=1073741824

Producer settings:
metadata.fetch.timeout.ms=1000
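
For reference, this is roughly how the settings above could be handed to the
Java clients. It is only a minimal sketch: the class and method names are
hypothetical, it uses nothing but java.util.Properties, and the bootstrap
servers, serializers and group id that a real client needs are left out on
purpose.

```java
import java.util.Properties;

// Hypothetical helper (not from our application): collects only the
// non-default settings listed in this mail. Bootstrap servers,
// (de)serializers and group id are deliberately omitted.
final class ClientSettings {

    // Consumer overrides listed under "Consumer settings".
    static Properties consumerProps() {
        Properties p = new Properties();
        p.setProperty("fetch.min.bytes", "1");
        p.setProperty("enable.auto.commit", "true");
        p.setProperty("max.partition.fetch.bytes", "1073741824"); // 1 GiB
        return p;
    }

    // Producer override listed under "Producer settings".
    static Properties producerProps() {
        Properties p = new Properties();
        p.setProperty("metadata.fetch.timeout.ms", "1000");
        return p;
    }
}
```

The resulting Properties objects would be merged with the connection
settings before being passed to the consumer and producer constructors.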

If we stop Kafka and Zookeeper on one node with 'kill -9', Kafka detects
within seconds that the leader is missing and switches leadership to the
other replica, and consumers continue to receive messages.

If, on the other hand, we bring down the network for the same node with
'ifdown eth0' (which breaks the connection to both Kafka and Zookeeper on
that node), Kafka seems to have problems detecting that the broker is
missing, and it takes up to 2 minutes until any more messages can be
consumed on the affected topics.

The following log can be seen on the consumer:
[2017-05-04 15:44:26,916] WARN Auto offset commit failed for group 
console-consumer-75510: Commit offsets failed with retriable exception. You 
should retry committing offsets. 
(org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)

and on the producer:
May 04 15:44:18: 15:44:18.420 [kafka-producer-network-thread | producer-2] 
ERROR - app Publishing to topic 'someTopic' failed
May 04 15:44:18: org.apache.kafka.common.errors.NetworkException: The server 
disconnected before a response was received.
May 04 15:44:18: 15:44:18.435 [kafka-producer-network-thread | producer-2] 
ERROR - app Publishing to topic 'someTopic' failed
May 04 15:44:18: org.apache.kafka.common.errors.NetworkException: The server 
disconnected before a response was received.
May 04 15:44:18: 15:44:18.440 [kafka-producer-network-thread | producer-2] 
ERROR - app Publishing to topic 'someTopic' failed
May 04 15:44:18: org.apache.kafka.common.errors.NetworkException: The server 
disconnected before a response was received.
May 04 15:44:18: 15:44:18.442 [kafka-producer-network-thread | producer-2] 
ERROR - app Publishing to topic 'someTopic' failed
May 04 15:44:18: org.apache.kafka.common.errors.NetworkException: The server 
disconnected before a response was received.
May 04 15:44:18: 15:44:18.444 [kafka-producer-network-thread | producer-2] 
ERROR - app Publishing to topic 'someTopic' failed
May 04 15:44:18: org.apache.kafka.common.errors.NetworkException: The server 
disconnected before a response was received.
May 04 15:44:18: org.apache.kafka.common.errors.TimeoutException: Batch 
containing 31 record(s) expired due to timeout while requesting metadata from 
brokers for someTopic-0
May 04 15:44:18: 15:44:18.446 [kafka-producer-network-thread | producer-2] 
ERROR - app Publishing to topic 'Heartbeat.Heartbeat' failed
May 04 15:44:18: org.apache.kafka.common.errors.TimeoutException: Batch 
containing 31 record(s) expired due to timeout while requesting metadata from 
brokers for someTopic-0
May 04 15:44:18: 15:44:18.448 [kafka-producer-network-thread | producer-2] 
ERROR - app Publishing to topic 'Heartbeat.Heartbeat' failed
May 04 15:44:18: org.apache.kafka.common.errors.TimeoutException: Batch 
containing 31 record(s) expired due to timeout while requesting metadata from 
brokers for someTopic-0
May 04 15:44:18: 15:44:18.449 [kafka-producer-network-thread | producer-2] 
ERROR - app Publishing to topic 'Heartbeat.Heartbeat' failed
... these errors continue to be printed for a while
