[jira] [Commented] (KAFKA-5430) new consumers getting data for revoked partitions

2017-08-07 Thread Lior Chaga (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16117882#comment-16117882
 ] 

Lior Chaga commented on KAFKA-5430:
---

Thanks [~jasong35], this certainly seems related. We will upgrade soon.

> new consumers getting data for revoked partitions
> -
>
> Key: KAFKA-5430
> URL: https://issues.apache.org/jira/browse/KAFKA-5430
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, consumer
>Affects Versions: 0.10.2.0
>Reporter: Lior Chaga
> Attachments: consumer-thread.log, consumer-thread.log, 
> kafka_trace.log.gz
>
>
> Due to a bad configuration applied to network components, we experienced 
> communication issues between Kafka brokers (causing under-replication), and 
> producers/consumers were unable to work against Kafka.
> The symptoms on the consumer were many errors of the following form:
> {code}
> 2017-06-04 04:27:35,200 ERROR [Kafka Topics Cosumer requestlegacy.consumer-11_session_parser_02] TaboolaKafkaConsumer [] - Failed committing to kafka topicPartitions [requestlegacy-2,requestlegacy-0,requestlegacy-1]
> org.apache.kafka.clients.consumer.RetriableCommitFailedException: Offset commit failed with a retriable exception. You should retry committing offsets.
> Caused by: org.apache.kafka.common.errors.DisconnectException
> {code}
> So far so good. However, upon network recovery, there were several rebalance 
> operations, which eventually resulted in only one consumer (#14) being 
> assigned all topic partitions (in this case we're talking about a consumer 
> group for which all consumers are running in the same process):
> {code}
> 2017-06-04 04:27:02,168 INFO  [Kafka Topics Cosumer requestlegacy.consumer-14_session_parser_02] ConsumerCoordinator [] - Revoking previously assigned partitions [requestlegacy-8, requestlegacy-9] for group session_parser_02
> 2017-06-04 04:27:04,208 INFO  [Kafka Topics Cosumer requestlegacy.consumer-15_session_parser_02] ConsumerCoordinator [] - Revoking previously assigned partitions [requestlegacy-10, requestlegacy-11] for group session_parser_02
> 2017-06-04 04:27:18,167 INFO  [Kafka Topics Cosumer requestlegacy.consumer-12_session_parser_02] ConsumerCoordinator [] - Revoking previously assigned partitions [requestlegacy-3, requestlegacy-4, requestlegacy-5] for group session_parser_02
> 2017-06-04 04:27:20,232 INFO  [Kafka Topics Cosumer requestlegacy.consumer-11_session_parser_02] ConsumerCoordinator [] - Revoking previously assigned partitions [requestlegacy-2, requestlegacy-0, requestlegacy-1] for group session_parser_02
> 2017-06-04 04:27:20,236 INFO  [Kafka Topics Cosumer requestlegacy.consumer-15_session_parser_02] ConsumerCoordinator [] - Setting newly assigned partitions [requestlegacy-9, requestlegacy-10, requestlegacy-11] for group session_parser_02
> 2017-06-04 04:27:20,237 INFO  [Kafka Topics Cosumer requestlegacy.consumer-12_session_parser_02] ConsumerCoordinator [] - Setting newly assigned partitions [requestlegacy-3, requestlegacy-4, requestlegacy-5] for group session_parser_02
> 2017-06-04 04:27:20,237 INFO  [Kafka Topics Cosumer requestlegacy.consumer-14_session_parser_02] ConsumerCoordinator [] - Setting newly assigned partitions [requestlegacy-6, requestlegacy-7, requestlegacy-8] for group session_parser_02
> 2017-06-04 04:27:20,332 INFO  [Kafka Topics Cosumer requestlegacy.consumer-11_session_parser_02] ConsumerCoordinator [] - Setting newly assigned partitions [requestlegacy-2, requestlegacy-0, requestlegacy-1] for group session_parser_02
> 2017-06-04 04:28:52,368 INFO  [Kafka Topics Cosumer requestlegacy.consumer-13_session_parser_02] ConsumerCoordinator [] - Revoking previously assigned partitions [requestlegacy-6, requestlegacy-7] for group session_parser_02
> 2017-06-04 04:29:15,201 INFO  [Kafka Topics Cosumer requestlegacy.consumer-11_session_parser_02] ConsumerCoordinator [] - Revoking previously assigned partitions [requestlegacy-2, requestlegacy-0, requestlegacy-1] for group session_parser_02
> 2017-06-04 04:30:22,379 INFO  [Kafka Topics Cosumer requestlegacy.consumer-14_session_parser_02] ConsumerCoordinator [] - Revoking previously assigned partitions [requestlegacy-6, requestlegacy-7, requestlegacy-8] for group session_parser_02
> 2017-06-04 04:30:24,431 INFO  [Kafka Topics Cosumer requestlegacy.consumer-15_session_parser_02] ConsumerCoordinator [] - Revoking previously assigned partitions [requestlegacy-9, requestlegacy-10, requestlegacy-11] for group session_parser_02
> 2017-06-04 04:30:38,229 INFO  [Kafka Topics Cosumer requestlegacy.consumer-12_session_parser_02] ConsumerCoordinator [] - Revoking previously assigned partitions [requestlegacy-3, 

[jira] [Commented] (KAFKA-5430) new consumers getting data for revoked partitions

2017-08-07 Thread Jason Gustafson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16116776#comment-16116776
 ] 

Jason Gustafson commented on KAFKA-5430:


[~liorchaga] Thanks for the report. I think this problem may have been fixed in 
KAFKA-5154 (the commit itself is a bit more descriptive: 
https://github.com/apache/kafka/commit/1b16acaaa181ceb214d84e70b8ddc146af9c0c5c).
 Is there any chance you could try again with the 0.11.0.0 client?
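
When comparing client versions, it may help to check the consumer log mechanically for the symptom in the issue title. The following is a minimal sketch, not part of the Kafka client: it replays the ConsumerCoordinator "Revoking previously assigned" and "Setting newly assigned" lines in order and flags any partition that one consumer touches while a different consumer is recorded as its owner. The regular expression assumes each log entry sits on a single line and that thread names follow the `requestlegacy.consumer-<N>_<group>` layout seen in the report; `find_overlaps` and `SAMPLE` are hypothetical names used only for illustration.

```python
import re

# Matches unwrapped ConsumerCoordinator lines as they appear in the report.
# Assumption: the consumer number is the digits after "consumer-" in the thread name.
EVENT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) .*?"
    r"consumer-(?P<consumer>\d+)_\S+\] ConsumerCoordinator \[\] - "
    r"(?P<kind>Revoking previously|Setting newly) assigned partitions "
    r"\[(?P<parts>[^\]]*)\]"
)


def find_overlaps(lines):
    """Return (timestamp, partition, acting_consumer, recorded_owner) tuples.

    A tuple is emitted whenever a consumer revokes or is assigned a partition
    that the replay still records as owned by a different consumer.
    """
    owner = {}      # partition name -> consumer number from the last assignment
    overlaps = []
    for line in lines:
        m = EVENT_RE.match(line)
        if not m:
            continue  # ignore anything that is not a revoke/assign line
        consumer = int(m.group("consumer"))
        parts = [p.strip() for p in m.group("parts").split(",") if p.strip()]
        revoking = m.group("kind").startswith("Revoking")
        for part in parts:
            current = owner.get(part)
            if current is not None and current != consumer:
                overlaps.append((m.group("ts"), part, consumer, current))
            if revoking:
                if current == consumer:
                    del owner[part]
            else:
                owner[part] = consumer
    return overlaps


# Three entries copied from the report, joined onto single lines.
SAMPLE = [
    "2017-06-04 04:27:20,237 INFO  [Kafka Topics Cosumer requestlegacy.consumer-14_session_parser_02] ConsumerCoordinator [] - Setting newly assigned partitions [requestlegacy-6, requestlegacy-7, requestlegacy-8] for group session_parser_02",
    "2017-06-04 04:28:52,368 INFO  [Kafka Topics Cosumer requestlegacy.consumer-13_session_parser_02] ConsumerCoordinator [] - Revoking previously assigned partitions [requestlegacy-6, requestlegacy-7] for group session_parser_02",
    "2017-06-04 04:30:22,379 INFO  [Kafka Topics Cosumer requestlegacy.consumer-14_session_parser_02] ConsumerCoordinator [] - Revoking previously assigned partitions [requestlegacy-6, requestlegacy-7, requestlegacy-8] for group session_parser_02",
]

if __name__ == "__main__":
    for ts, part, consumer, current in find_overlaps(SAMPLE):
        print(f"{ts}: {part} touched by consumer-{consumer} while owned by consumer-{current}")
```

On the three sample lines this flags requestlegacy-6 and requestlegacy-7: consumer-13 revokes them at 04:28:52, after consumer-14 was assigned them at 04:27:20. A late revocation is only a hint of overlapping ownership, so the trace log is still needed to confirm what was actually fetched.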


[jira] [Commented] (KAFKA-5430) new consumers getting data for revoked partitions

2017-06-20 Thread Lior Chaga (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16055418#comment-16055418
 ] 

Lior Chaga commented on KAFKA-5430:
---

Hi,
I gathered some more information regarding this issue.
The cluster is deployed across two adjacent data centers (with a fiber network 
between them).
Brokers 9, 12, 13, 14, 15, 16 are in data center A.
Brokers 5, 6, 7, 8, 10, 11 are in data center B.

There was a firmware upgrade for Arista network devices at DC A.
During the upgrade, there were errors in the broker logs:

{code}
[2017-06-04 04:33:03,836] WARN [ReplicaFetcherThread-0-12], Error in fetch kafka.server.ReplicaFetcherThread$FetchRequest@145b009c (kafka.server.ReplicaFetcherThread)
java.io.IOException: Connection to 12 was disconnected before the response was read
    at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:114)
    at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:112)
    at scala.Option.foreach(Option.scala:236)
    at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:112)
    at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:108)
    at kafka.utils.NetworkClientBlockingOps$.recursivePoll$1(NetworkClientBlockingOps.scala:136)
    at kafka.utils.NetworkClientBlockingOps$.kafka$utils$NetworkClientBlockingOps$$pollContinuously$extension(NetworkClientBlockingOps.scala:142)
    at kafka.utils.NetworkClientBlockingOps$.blockingSendAndReceive$extension(NetworkClientBlockingOps.scala:108)
    at kafka.server.ReplicaFetcherThread.sendRequest(ReplicaFetcherThread.scala:249)
    at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:234)
    at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:42)
    at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:118)
    at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:103)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
{code}

And these:
{code}
[2017-06-04 04:33:13,579] WARN [ReplicaFetcherThread-0-13], Error in fetch kafka.server.ReplicaFetcherThread$FetchRequest@4abe0742 (kafka.server.ReplicaFetcherThread)
java.io.IOException: Connection to broker013:6667 (id: 13 rack: null) failed
    at kafka.utils.NetworkClientBlockingOps$.awaitReady$1(NetworkClientBlockingOps.scala:84)
    at kafka.utils.NetworkClientBlockingOps$.blockingReady$extension(NetworkClientBlockingOps.scala:94)
    at kafka.server.ReplicaFetcherThread.sendRequest(ReplicaFetcherThread.scala:244)
    at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:234)
    at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:42)
    at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:118)
    at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:103)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
{code}

The latter errors seemed to appear only on brokers in DC A, where the upgrade 
was performed (I didn't go through all broker logs). 

Hope this helps in understanding the issue.
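
For going through the remaining broker logs, a small script along these lines could tally the fetch warnings by data center. This is only a sketch under stated assumptions: the broker-to-DC mapping is the one listed above, and the trailing number in `ReplicaFetcherThread-<fetcher>-<broker>` is taken to be the id of the broker being fetched from, which matches the two samples above (thread `-0-12` reports a lost connection to 12, thread `-0-13` a failed connection to broker 13). `count_fetch_errors_by_dc` and `SAMPLE` are hypothetical names.

```python
import re
from collections import Counter

# Broker-to-data-center mapping as listed in this comment.
DC_BY_BROKER = {b: "A" for b in (9, 12, 13, 14, 15, 16)}
DC_BY_BROKER.update({b: "B" for b in (5, 6, 7, 8, 10, 11)})

# Assumption: the last number in the thread name is the fetch target's broker id.
FETCH_WARN_RE = re.compile(
    r"WARN \[ReplicaFetcherThread-\d+-(?P<broker>\d+)\], Error in fetch"
)


def count_fetch_errors_by_dc(lines):
    """Count 'Error in fetch' warnings, grouped by the target broker's DC."""
    counts = Counter()
    for line in lines:
        m = FETCH_WARN_RE.search(line)
        if m:
            dc = DC_BY_BROKER.get(int(m.group("broker")), "unknown")
            counts[dc] += 1
    return counts


# The two warning lines quoted above.
SAMPLE = [
    "[2017-06-04 04:33:03,836] WARN [ReplicaFetcherThread-0-12], Error in fetch "
    "kafka.server.ReplicaFetcherThread$FetchRequest@145b009c (kafka.server.ReplicaFetcherThread)",
    "[2017-06-04 04:33:13,579] WARN [ReplicaFetcherThread-0-13], Error in fetch "
    "kafka.server.ReplicaFetcherThread$FetchRequest@4abe0742 (kafka.server.ReplicaFetcherThread)",
]

if __name__ == "__main__":
    print(dict(count_fetch_errors_by_dc(SAMPLE)))
```

Note that this groups by the broker a fetch was aimed at, not by the broker that wrote the log line. To check which brokers logged each kind of error, it would have to be run once per broker log file.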
