[jira] [Updated] (KAFKA-4460) Consumer stops getting messages when partition leader dies

2016-12-12 Thread Bernhard Bonigl (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernhard Bonigl updated KAFKA-4460:
---
Description: 
I have a setup consisting of two Kafka brokers (0 and 1) using a ZooKeeper instance, a 
Spring Boot application with producers, and a Spring Boot application with 
consumers.

The topic has 5 partitions and a replication factor of 2; both brokers are in 
sync, and the partitions have alternating leaders (although that doesn't seem to matter).
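
For reference, the topic layout is roughly what the Java admin API below would create. This is only an illustration: the AdminClient API shown comes from client versions newer than the 0.10.0.1 we run, and the topic name "events" is a placeholder, not our actual topic.
{code}
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Both brokers are listed so the client can bootstrap while either one is alive.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092,localhost:9093");

        try (AdminClient admin = AdminClient.create(props)) {
            // 5 partitions, replication factor 2, so every partition has a replica on both brokers.
            NewTopic topic = new NewTopic("events", 5, (short) 2);
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}
{code}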

The Spring Boot Kafka configuration is set up as follows:
{code}
kafka.address: localhost:9092,localhost:9093
kafka.numberOfConsumers: 20
{code}
Broker 0 uses port 9092 and Broker 1 uses port 9093.
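
For illustration, this is roughly how the kafka.address value ends up in a plain Java consumer configuration. Our application wires this up through Spring, so this is only a sketch; the group id and deserializers below are placeholders, not values from our code.
{code}
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ConsumerFactory {
    static KafkaConsumer<String, String> newConsumer() {
        Properties props = new Properties();
        // Both brokers from kafka.address, so the client can bootstrap while either one is alive.
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092,localhost:9093");
        // Placeholder group id; all 20 consumers would share one consumer group.
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "event-consumers");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        return new KafkaConsumer<>(props);
    }
}
{code}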



When sending events they are consumed just fine. When Broker 0 is killed, all 
topics get Broker 1 as their leader; however, the consumers stop consuming 
events until Broker 0 is back. This happens nearly every time, and it usually 
takes at most 3 attempts of alternately killing the leading broker to produce 
the error state.

The console log gets spammed by coordinator messages: it looks like the 
coordinator representing Broker 0 is marked as dead but instantly rediscovered 
and used again, many times over, and only at the end is the other broker 
discovered. When the switch works, the log is only minimally spammed and the 
other broker is discovered very quickly.
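
As a rough sketch of the consuming side (not our actual Spring code; the topic name is a placeholder), each consumer effectively sits in a poll loop like the one below, and this is where messages visibly stop arriving once the leading broker is killed.
{code}
import java.util.Collections;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class PollLoop {
    static void run(KafkaConsumer<String, String> consumer) {
        consumer.subscribe(Collections.singletonList("events")); // placeholder topic name

        // Each consumed record is logged here. After the leading broker is killed,
        // poll() keeps returning empty batches even though the remaining broker
        // has become the leader for all partitions.
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(1000L); // poll(long) as in the 0.10.x client
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("consumed partition=%d offset=%d value=%s%n",
                        record.partition(), record.offset(), record.value());
            }
        }
    }
}
{code}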

[This gist|https://gist.github.com/bonii-xx/2f1c122f643019a1525fbe120e9162d8] 
contains the application log from when the problem occurs. The first line is 
a log message of ours indicating a successfully consumed message. After that, Broker 
0 (localhost:9092) is killed - you can see the log spam I was talking about. At 
the end localhost:9093 is discovered, but no further messages are consumed. 
After that I killed the application.



I also discovered [this unresolved Stack Overflow 
question|https://stackoverflow.com/questions/39650993/kafka-consumer-abstractcoordinator-discovered-coordinator-java-client], 
which seems to be the same problem.



> Consumer stops getting messages when partition leader dies
> --
>
> Key: KAFKA-4460
> URL: https://issues.apache.org/jira/browse/KAFKA-4460
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.10.0.1
>Reporter: Bernhard Bonigl
>  Labels: reliability
>

[jira] [Updated] (KAFKA-4460) Consumer stops getting messages when partition leader dies

2016-11-30 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-4460:
---
Labels: reliability  (was: )

> Consumer stops getting messages when partition leader dies
> --
>
> Key: KAFKA-4460
> URL: https://issues.apache.org/jira/browse/KAFKA-4460
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.10.0.1
>Reporter: Bernhard Bonigl
>  Labels: reliability
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)