[jira] [Resolved] (KAFKA-10229) Kafka stream dies for no apparent reason, no errors logged on client or server

2020-07-14 Thread Raman Gupta (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raman Gupta resolved KAFKA-10229.
-
Resolution: Invalid

Not an issue with Kafka -- the code run by the stream was blocked.
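For context on that resolution, a minimal, hypothetical sketch (not the reporter's actual code) of how a blocked call inside an otherwise healthy topology produces exactly these symptoms: the stream thread stops polling, the broker eventually removes the member from the group, and no exception is ever thrown, so the uncaught exception handler never fires. The sleep stands in for a hanging write to an external system; topic names are placeholders.

```
import org.apache.kafka.streams.StreamsBuilder;

// Hypothetical sketch only -- Thread.sleep() stands in for a hanging external call.
StreamsBuilder builder = new StreamsBuilder();
builder.<String, String>stream("input-topic")
       .mapValues(value -> {
           try {
               // While this blocks, the StreamThread never returns to poll(), so
               // the member eventually drops out of the consumer group -- yet
               // nothing is thrown, so no exception handler is ever invoked.
               Thread.sleep(Long.MAX_VALUE);
           } catch (InterruptedException e) {
               Thread.currentThread().interrupt();
           }
           return value;
       })
       .to("output-topic");
```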

> Kafka stream dies for no apparent reason, no errors logged on client or server
> --
>
> Key: KAFKA-10229
> URL: https://issues.apache.org/jira/browse/KAFKA-10229
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.4.1
>Reporter: Raman Gupta
>Priority: Major
>
> My broker and clients are 2.4.1. I'm currently running a single broker. I 
> have a Kafka stream with exactly once processing turned on. I also have an 
> uncaught exception handler defined on the client. I have a stream which I 
> noticed was lagging. Upon investigation, I saw that the consumer group was 
> empty.
> On restarting the consumers, the consumer group re-established itself, but 
> after about 8 minutes, the group became empty again. There is nothing logged 
> on the client side about any stream errors, despite the existence of an 
> uncaught exception handler.
> In the broker logs, I see the following about 8 minutes after the clients 
> restart / the stream goes to the RUNNING state:
> ```
> [2020-07-02 17:34:47,033] INFO [GroupCoordinator 0]: Member 
> cis-d7fb64c95-kl9wl-1-630af77f-138e-49d1-b76a-6034801ee359 in group 
> produs-cisFileIndexer-stream has failed, removing it from the group 
> (kafka.coordinator.group.GroupCoordinator)
> [2020-07-02 17:34:47,033] INFO [GroupCoordinator 0]: Preparing to rebalance 
> group produs-cisFileIndexer-stream in state PreparingRebalance with old 
> generation 228 (__consumer_offsets-3) (reason: removing member 
> cis-d7fb64c95-kl9wl-1-630af77f-138e-49d1-b76a-6034801ee359 on heartbeat 
> expiration) (kafka.coordinator.group.GroupCoordinator)
> ```
> So according to this, the consumer heartbeat expired. I don't know why this 
> would be; logging shows that the stream was running and processing messages 
> normally, and then it just stopped processing anything about 4 minutes before 
> it died, with no apparent errors or issues and nothing logged via the 
> uncaught exception handler.
> It doesn't appear to be related to any specific poison-pill-type messages: 
> restarting the stream causes it to reprocess a bunch more messages from the 
> backlog, and then die again approximately 8 minutes later. At the time of the 
> last message consumed by the stream, there are no logs at `INFO` level or 
> above in either the client or the broker, nor any errors whatsoever. The 
> stream consumption simply stops.
> There are two consumers -- even if I limit consumption to only a single 
> consumer, the same thing happens.
> The runtime environment is Kubernetes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-10229) Kafka stream dies for no apparent reason, no errors logged on client or server

2020-07-04 Thread Raman Gupta (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17151456#comment-17151456
 ] 

Raman Gupta commented on KAFKA-10229:
-

[~guozhang] It's the latter case -- the app itself is running fine. I have both 
an uncaught exception handler and a streams state change listener defined. 
When the stream stops consuming, there is no state transition reported for the 
stream, nor any exceptions logged. Everything else in the app continues to run 
just fine, including other Kafka consumers. It's just this single stream that 
stops consuming.

The stream is not stateful -- it's just a simple read, write some data to an 
external system, transform, and write to another topic. I suppose it's possible 
the write to the external system is hanging for some reason. Unfortunately that 
topic is "caught up" now, so I'm not seeing this problem currently. However, 
the next time it happens I'll take a thread dump and we can see what the stream 
threads are doing.
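Since the next step mentioned above is a thread dump, here is a minimal, generic sketch (plain JDK, nothing application-specific) that prints the state and stack of any thread whose name contains "StreamThread"; running something like this inside the affected JVM (or simply using jstack against it) usually shows exactly where a stream thread is blocked.

```
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class StreamThreadDump {
    public static void main(String[] args) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Dump every live thread in this JVM with its full stack trace.
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info.getThreadName().contains("StreamThread")) {
                System.out.printf("%s (%s)%n", info.getThreadName(), info.getThreadState());
                for (StackTraceElement frame : info.getStackTrace()) {
                    System.out.println("    at " + frame);
                }
            }
        }
    }
}
```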

> Kafka stream dies for no apparent reason, no errors logged on client or server
> --
>
> Key: KAFKA-10229
> URL: https://issues.apache.org/jira/browse/KAFKA-10229
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.4.1
>Reporter: Raman Gupta
>Priority: Major
>
> My broker and clients are 2.4.1. I'm currently running a single broker. I 
> have a Kafka stream with exactly once processing turned on. I also have an 
> uncaught exception handler defined on the client. I have a stream which I 
> noticed was lagging. Upon investigation, I saw that the consumer group was 
> empty.
> On restarting the consumers, the consumer group re-established itself, but 
> after about 8 minutes, the group became empty again. There is nothing logged 
> on the client side about any stream errors, despite the existence of an 
> uncaught exception handler.
> In the broker logs, I see the following about 8 minutes after the clients 
> restart / the stream goes to the RUNNING state:
> ```
> [2020-07-02 17:34:47,033] INFO [GroupCoordinator 0]: Member 
> cis-d7fb64c95-kl9wl-1-630af77f-138e-49d1-b76a-6034801ee359 in group 
> produs-cisFileIndexer-stream has failed, removing it from the group 
> (kafka.coordinator.group.GroupCoordinator)
> [2020-07-02 17:34:47,033] INFO [GroupCoordinator 0]: Preparing to rebalance 
> group produs-cisFileIndexer-stream in state PreparingRebalance with old 
> generation 228 (__consumer_offsets-3) (reason: removing member 
> cis-d7fb64c95-kl9wl-1-630af77f-138e-49d1-b76a-6034801ee359 on heartbeat 
> expiration) (kafka.coordinator.group.GroupCoordinator)
> ```
> So according to this, the consumer heartbeat expired. I don't know why this 
> would be; logging shows that the stream was running and processing messages 
> normally, and then it just stopped processing anything about 4 minutes before 
> it died, with no apparent errors or issues and nothing logged via the 
> uncaught exception handler.
> It doesn't appear to be related to any specific poison-pill-type messages: 
> restarting the stream causes it to reprocess a bunch more messages from the 
> backlog, and then die again approximately 8 minutes later. At the time of the 
> last message consumed by the stream, there are no logs at `INFO` level or 
> above in either the client or the broker, nor any errors whatsoever. The 
> stream consumption simply stops.
> There are two consumers -- even if I limit consumption to only a single 
> consumer, the same thing happens.
> The runtime environment is Kubernetes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-10229) Kafka stream dies for no apparent reason, no errors logged on client or server

2020-07-02 Thread Raman Gupta (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raman Gupta updated KAFKA-10229:

Summary: Kafka stream dies for no apparent reason, no errors logged on 
client or server  (was: Kafka stream dies when earlier shut down node leaves 
group, no errors logged on client)

> Kafka stream dies for no apparent reason, no errors logged on client or server
> --
>
> Key: KAFKA-10229
> URL: https://issues.apache.org/jira/browse/KAFKA-10229
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.4.1
>Reporter: Raman Gupta
>Priority: Major
>
> My broker and clients are 2.4.1. I'm currently running a single broker. I 
> have a Kafka stream with exactly once processing turned on. I also have an 
> uncaught exception handler defined on the client. I have a stream which I 
> noticed was lagging. Upon investigation, I saw that the consumer group was 
> empty.
> On restarting the consumers, the consumer group re-established itself, but 
> after about 8 minutes, the group became empty again. There is nothing logged 
> on the client side about any stream errors, despite the existence of an 
> uncaught exception handler.
> In the broker logs, I see the following about 8 minutes after the clients 
> restart / the stream goes to the RUNNING state:
> ```
> [2020-07-02 17:34:47,033] INFO [GroupCoordinator 0]: Member 
> cis-d7fb64c95-kl9wl-1-630af77f-138e-49d1-b76a-6034801ee359 in group 
> produs-cisFileIndexer-stream has failed, removing it from the group 
> (kafka.coordinator.group.GroupCoordinator)
> [2020-07-02 17:34:47,033] INFO [GroupCoordinator 0]: Preparing to rebalance 
> group produs-cisFileIndexer-stream in state PreparingRebalance with old 
> generation 228 (__consumer_offsets-3) (reason: removing member 
> cis-d7fb64c95-kl9wl-1-630af77f-138e-49d1-b76a-6034801ee359 on heartbeat 
> expiration) (kafka.coordinator.group.GroupCoordinator)
> ```
> So according to this, the consumer heartbeat expired. I don't know why this 
> would be; logging shows that the stream was running and processing messages 
> normally, and then it just stopped processing anything about 4 minutes before 
> it died, with no apparent errors or issues and nothing logged via the 
> uncaught exception handler.
> It doesn't appear to be related to any specific poison-pill-type messages: 
> restarting the stream causes it to reprocess a bunch more messages from the 
> backlog, and then die again approximately 8 minutes later. At the time of the 
> last message consumed by the stream, there are no logs at `INFO` level or 
> above in either the client or the broker, nor any errors whatsoever. The 
> stream consumption simply stops.
> There are two consumers -- even if I limit consumption to only a single 
> consumer, the same thing happens.
> The runtime environment is Kubernetes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-10229) Kafka stream dies when earlier shut down node leaves group, no errors logged on client

2020-07-02 Thread Raman Gupta (Jira)
Raman Gupta created KAFKA-10229:
---

 Summary: Kafka stream dies when earlier shut down node leaves 
group, no errors logged on client
 Key: KAFKA-10229
 URL: https://issues.apache.org/jira/browse/KAFKA-10229
 Project: Kafka
  Issue Type: Bug
  Components: streams
Affects Versions: 2.4.1
Reporter: Raman Gupta


My broker and clients are 2.4.1. I'm currently running a single broker. I have 
a Kafka stream with exactly once processing turned on. I also have an uncaught 
exception handler defined on the client. I have a stream which I noticed was 
lagging. Upon investigation, I saw that the consumer group was empty.

On restarting the consumers, the consumer group re-established itself, but 
after about 8 minutes, the group became empty again. There is nothing logged on 
the client side about any stream errors, despite the existence of an uncaught 
exception handler.

In the broker logs, I see the following about 8 minutes after the clients 
restart / the stream goes to the RUNNING state:

```
[2020-07-02 17:34:47,033] INFO [GroupCoordinator 0]: Member 
cis-d7fb64c95-kl9wl-1-630af77f-138e-49d1-b76a-6034801ee359 in group 
produs-cisFileIndexer-stream has failed, removing it from the group 
(kafka.coordinator.group.GroupCoordinator)
[2020-07-02 17:34:47,033] INFO [GroupCoordinator 0]: Preparing to rebalance 
group produs-cisFileIndexer-stream in state PreparingRebalance with old 
generation 228 (__consumer_offsets-3) (reason: removing member 
cis-d7fb64c95-kl9wl-1-630af77f-138e-49d1-b76a-6034801ee359 on heartbeat 
expiration) (kafka.coordinator.group.GroupCoordinator)
```

So according to this, the consumer heartbeat expired. I don't know why this 
would be; logging shows that the stream was running and processing messages 
normally, and then it just stopped processing anything about 4 minutes before 
it died, with no apparent errors or issues and nothing logged via the uncaught 
exception handler.

It doesn't appear to be related to any specific poison-pill-type messages: 
restarting the stream causes it to reprocess a bunch more messages from the 
backlog, and then die again approximately 8 minutes later. At the time of the 
last message consumed by the stream, there are no logs at `INFO` level or above 
in either the client or the broker, nor any errors whatsoever. The stream 
consumption simply stops.

There are two consumers -- even if I limit consumption to only a single 
consumer, the same thing happens.

The runtime environment is Kubernetes.
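For reference, a minimal sketch of the client-side setup described above (exactly-once processing, an uncaught exception handler, and a state change listener) against the 2.4 Streams API. The bootstrap server and topology are placeholders; the application id is taken from the group name in the broker logs. Note that in 2.4 the handler is a plain Thread.UncaughtExceptionHandler, so it only fires if a stream thread actually throws.

```
import java.util.Properties;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "produs-cisFileIndexer-stream");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");   // placeholder
props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE);

StreamsBuilder builder = new StreamsBuilder();
builder.stream("input-topic").to("output-topic");                   // placeholder topology

KafkaStreams streams = new KafkaStreams(builder.build(), props);
// Only invoked when a stream thread dies with an exception.
streams.setUncaughtExceptionHandler((thread, throwable) ->
        System.err.println("Stream thread " + thread.getName() + " died: " + throwable));
// Logs every state transition (RUNNING, REBALANCING, ERROR, ...).
streams.setStateListener((newState, oldState) ->
        System.out.println("Streams state " + oldState + " -> " + newState));
streams.start();
```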



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-10007) Kafka consumer offset reset despite recent group activity

2020-05-27 Thread Raman Gupta (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17117760#comment-17117760
 ] 

Raman Gupta commented on KAFKA-10007:
-

It certainly could be. I'm not sure KAFKA-9543 listed many specific 
conditions -- just some vague race conditions. Is there a way to tell for sure 
whether this is the same problem or not?

> Kafka consumer offset reset despite recent group activity
> -
>
> Key: KAFKA-10007
> URL: https://issues.apache.org/jira/browse/KAFKA-10007
> Project: Kafka
>  Issue Type: Bug
>Reporter: Raman Gupta
>Priority: Major
>
> I was running a Kafka 2.3.0 broker with the default value for 
> `offsets.retention.minutes` (which should be 7 days as of 2.0.0). I deployed a 
> 2.4.1 broker, along with a change in setting `offsets.retention.minutes` to 14 
> days, as I have several low-traffic topics in which exactly-once processing 
> is desired.
> As I understand it, with https://issues.apache.org/jira/browse/KAFKA-4682 and 
> KIP-211, offsets should no longer be expired based on the last commit 
> timestamp, but instead on the last time the group transitioned into an Empty 
> state.
> However, the behavior I saw from Kafka upon broker shutdown was that the 
> offsets were expired for a group when, as far as I can tell, they should not 
> have been. See these logs from during the cluster recycle -- during this time 
> the consumer, configured with the static group membership protocol, is always 
> running:
> {code}
> << offsets.retention.minutes using default value >>
> [2020-05-10 05:37:01,070] <>
> << Starting broker-0 on 2.4.1 with protocol version 2.3, 
> offsets.retention.minutes = 10080 >>
> kafka-0   [2020-05-10 05:37:39,682] INFO starting 
> (kafka.server.KafkaServer)
> kafka-0   [2020-05-10 05:39:42,680] INFO [GroupCoordinator 0]: Loading 
> group metadata for produs-cis-CisFileEventConsumer with generation 27 
> (kafka.coordinator.group.GroupCoordinator)
> << Recycling broker-1 on 2.4.1, protocol version 2.3, 
> offsets.retention.minutes = 10080, looks like the consumer fails because of 
> the broker going down, and kafka-0 reports: >>
> kafka-0   [2020-05-10 05:45:14,121] INFO [GroupCoordinator 0]: Member 
> cis-9c5d994c5-7hpqt-efced5ca-0b81-4720-992d-bdd8612519b3 in group 
> produs-cis-CisFileEventConsumer has failed, removing it from the group 
> (kafka.coordinator.group.GroupCoordinator)
> kafka-0   [2020-05-10 05:45:14,124] INFO [GroupCoordinator 0]: Preparing 
> to rebalance group produs-cis-CisFileEventConsumer in state 
> PreparingRebalance with old generation 27 (__consumer_offsets-17) (reason: 
> removing member cis-9c5d994c5-7hpqt-efced5ca-0b81-4720-992d-bdd8612519b3 on 
> heartbeat expiration) (kafka.coordinator.group.GroupCoordinator)
> kafka-0   [2020-05-10 05:45:19,479] INFO [GroupCoordinator 0]: Member 
> cis-9c5d994c5-sknlk-2b9ed8bf-348c-4a10-97d3-5f2caccce7df in group 
> produs-cis-CisFileEventConsumer has failed, removing it from the group 
> (kafka.coordinator.group.GroupCoordinator)
> kafka-0   [2020-05-10 05:45:19,482] INFO [GroupCoordinator 0]: Group 
> produs-cis-CisFileEventConsumer with generation 28 is now empty 
> (__consumer_offsets-17) (kafka.coordinator.group.GroupCoordinator)
> << and now kafka-1 starts up again, the offsets are expired >>
> kafka-1   [2020-05-10 05:46:11,229] INFO starting 
> (kafka.server.KafkaServer)
> ...
> kafka-0   [2020-05-10 05:47:42,303] INFO [GroupCoordinator 0]: Preparing 
> to rebalance group produs-cis-CisFileEventConsumer in state 
> PreparingRebalance with old generation 28 (__consumer_offsets-17) (reason: 
> Adding new member cis-9c5d994c5-sknlk-1194b4b6-81ae-4a78-89a7-c610cf8c65be 
> with group instanceid Some(cis-9c5d994c5-sknlk)) 
> (kafka.coordinator.group.GroupCoordinator)
> kafka-0   [2020-05-10 05:47:47,611] INFO [GroupMetadataManager 
> brokerId=0] Removed 43 expired offsets in 13 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)
> kafka-0   [2020-05-10 05:48:12,308] INFO [GroupCoordinator 0]: Stabilized 
> group produs-cis-CisFileEventConsumer generation 29 (__consumer_offsets-17) 
> (kafka.coordinator.group.GroupCoordinator)
> kafka-0   [2020-05-10 05:48:12,311] INFO [GroupCoordinator 0]: Assignment 
> received from leader for group produs-cis-CisFileEventConsumer for generation 
> 29 (kafka.coordinator.group.GroupCoordinator)
> {code}
> The group becomes empty at 2020-05-10 05:45:19,482, and then the offsets are 
> expired about two minutes later at 05:47:47,611. I can't see any reason, based 
> on my understanding of how things work, for this to have happened, other than 
> it being a bug of some type?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (KAFKA-10007) Kafka consumer offset reset despite recent group activity

2020-05-26 Thread Raman Gupta (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17117039#comment-17117039
 ] 

Raman Gupta edited comment on KAFKA-10007 at 5/26/20, 8:59 PM:
---

Just happened to me again on a completely different 2.4.1 broker. The cluster 
was recently downscaled from 4 brokers to 1, and today, when a client restarted, 
it had lost its offsets for 3 partitions out of 100. So it seems like just 
shutting down brokers is enough to cause this. This is a really serious issue 
and needs some attention.


was (Author: rocketraman):
Just happened to me again on a completely different 2.4.1 broker. The cluster 
was recently downscaled from 4 brokers to 1 and today when a client restarted, 
it had lost its offsets from 3 partitions out of 100.  This is a really serious 
issue and needs some attention.

> Kafka consumer offset reset despite recent group activity
> -
>
> Key: KAFKA-10007
> URL: https://issues.apache.org/jira/browse/KAFKA-10007
> Project: Kafka
>  Issue Type: Bug
>Reporter: Raman Gupta
>Priority: Major
>
> I was running a Kafka 2.3.0 broker with the default value for 
> `offsets.retention.minutes` (which should be 7 days as of 2.0.0). I deployed a 
> 2.4.1 broker, along with a change in setting `offsets.retention.minutes` to 14 
> days, as I have several low-traffic topics in which exactly-once processing 
> is desired.
> As I understand it, with https://issues.apache.org/jira/browse/KAFKA-4682 and 
> KIP-211, offsets should no longer be expired based on the last commit 
> timestamp, but instead on the last time the group transitioned into an Empty 
> state.
> However, the behavior I saw from Kafka upon broker shutdown was that the 
> offsets were expired for a group when, as far as I can tell, they should not 
> have been. See these logs from during the cluster recycle -- during this time 
> the consumer, configured with the static group membership protocol, is always 
> running:
> {code}
> << offsets.retention.minutes using default value >>
> [2020-05-10 05:37:01,070] <>
> << Starting broker-0 on 2.4.1 with protocol version 2.3, 
> offsets.retention.minutes = 10080 >>
> kafka-0   [2020-05-10 05:37:39,682] INFO starting 
> (kafka.server.KafkaServer)
> kafka-0   [2020-05-10 05:39:42,680] INFO [GroupCoordinator 0]: Loading 
> group metadata for produs-cis-CisFileEventConsumer with generation 27 
> (kafka.coordinator.group.GroupCoordinator)
> << Recycling broker-1 on 2.4.1, protocol version 2.3, 
> offsets.retention.minutes = 10080, looks like the consumer fails because of 
> the broker going down, and kafka-0 reports: >>
> kafka-0   [2020-05-10 05:45:14,121] INFO [GroupCoordinator 0]: Member 
> cis-9c5d994c5-7hpqt-efced5ca-0b81-4720-992d-bdd8612519b3 in group 
> produs-cis-CisFileEventConsumer has failed, removing it from the group 
> (kafka.coordinator.group.GroupCoordinator)
> kafka-0   [2020-05-10 05:45:14,124] INFO [GroupCoordinator 0]: Preparing 
> to rebalance group produs-cis-CisFileEventConsumer in state 
> PreparingRebalance with old generation 27 (__consumer_offsets-17) (reason: 
> removing member cis-9c5d994c5-7hpqt-efced5ca-0b81-4720-992d-bdd8612519b3 on 
> heartbeat expiration) (kafka.coordinator.group.GroupCoordinator)
> kafka-0   [2020-05-10 05:45:19,479] INFO [GroupCoordinator 0]: Member 
> cis-9c5d994c5-sknlk-2b9ed8bf-348c-4a10-97d3-5f2caccce7df in group 
> produs-cis-CisFileEventConsumer has failed, removing it from the group 
> (kafka.coordinator.group.GroupCoordinator)
> kafka-0   [2020-05-10 05:45:19,482] INFO [GroupCoordinator 0]: Group 
> produs-cis-CisFileEventConsumer with generation 28 is now empty 
> (__consumer_offsets-17) (kafka.coordinator.group.GroupCoordinator)
> << and now kafka-1 starts up again, the offsets are expired >>
> kafka-1   [2020-05-10 05:46:11,229] INFO starting 
> (kafka.server.KafkaServer)
> ...
> kafka-0   [2020-05-10 05:47:42,303] INFO [GroupCoordinator 0]: Preparing 
> to rebalance group produs-cis-CisFileEventConsumer in state 
> PreparingRebalance with old generation 28 (__consumer_offsets-17) (reason: 
> Adding new member cis-9c5d994c5-sknlk-1194b4b6-81ae-4a78-89a7-c610cf8c65be 
> with group instanceid Some(cis-9c5d994c5-sknlk)) 
> (kafka.coordinator.group.GroupCoordinator)
> kafka-0   [2020-05-10 05:47:47,611] INFO [GroupMetadataManager 
> brokerId=0] Removed 43 expired offsets in 13 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)
> kafka-0   [2020-05-10 05:48:12,308] INFO [GroupCoordinator 0]: Stabilized 
> group produs-cis-CisFileEventConsumer generation 29 (__consumer_offsets-17) 
> (kafka.coordinator.group.GroupCoordinator)
> kafka-0   [2020-05-10 05:48:12,311] INFO [GroupCoordinator 0]: Assignment 
> received from leader for group 

[jira] [Commented] (KAFKA-10007) Kafka consumer offset reset despite recent group activity

2020-05-26 Thread Raman Gupta (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17117039#comment-17117039
 ] 

Raman Gupta commented on KAFKA-10007:
-

Just happened to me again on a completely different 2.4.1 broker. The cluster 
was recently downscaled from 4 brokers to 1, and today, when a client restarted, 
it had lost its offsets for 3 partitions out of 100. This is a really serious 
issue and needs some attention.

> Kafka consumer offset reset despite recent group activity
> -
>
> Key: KAFKA-10007
> URL: https://issues.apache.org/jira/browse/KAFKA-10007
> Project: Kafka
>  Issue Type: Bug
>Reporter: Raman Gupta
>Priority: Major
>
> I was running a Kafka 2.3.0 broker with the default value for 
> `offsets.retention.minutes` (which should be 7 days as of 2.0.0). I deployed a 
> 2.4.1 broker, along with a change in setting `offsets.retention.minutes` to 14 
> days, as I have several low-traffic topics in which exactly-once processing 
> is desired.
> As I understand it, with https://issues.apache.org/jira/browse/KAFKA-4682 and 
> KIP-211, offsets should no longer be expired based on the last commit 
> timestamp, but instead on the last time the group transitioned into an Empty 
> state.
> However, the behavior I saw from Kafka upon broker shutdown was that the 
> offsets were expired for a group when, as far as I can tell, they should not 
> have been. See these logs from during the cluster recycle -- during this time 
> the consumer, configured with the static group membership protocol, is always 
> running:
> {code}
> << offsets.retention.minutes using default value >>
> [2020-05-10 05:37:01,070] <>
> << Starting broker-0 on 2.4.1 with protocol version 2.3, 
> offsets.retention.minutes = 10080 >>
> kafka-0   [2020-05-10 05:37:39,682] INFO starting 
> (kafka.server.KafkaServer)
> kafka-0   [2020-05-10 05:39:42,680] INFO [GroupCoordinator 0]: Loading 
> group metadata for produs-cis-CisFileEventConsumer with generation 27 
> (kafka.coordinator.group.GroupCoordinator)
> << Recycling broker-1 on 2.4.1, protocol version 2.3, 
> offsets.retention.minutes = 10080, looks like the consumer fails because of 
> the broker going down, and kafka-0 reports: >>
> kafka-0   [2020-05-10 05:45:14,121] INFO [GroupCoordinator 0]: Member 
> cis-9c5d994c5-7hpqt-efced5ca-0b81-4720-992d-bdd8612519b3 in group 
> produs-cis-CisFileEventConsumer has failed, removing it from the group 
> (kafka.coordinator.group.GroupCoordinator)
> kafka-0   [2020-05-10 05:45:14,124] INFO [GroupCoordinator 0]: Preparing 
> to rebalance group produs-cis-CisFileEventConsumer in state 
> PreparingRebalance with old generation 27 (__consumer_offsets-17) (reason: 
> removing member cis-9c5d994c5-7hpqt-efced5ca-0b81-4720-992d-bdd8612519b3 on 
> heartbeat expiration) (kafka.coordinator.group.GroupCoordinator)
> kafka-0   [2020-05-10 05:45:19,479] INFO [GroupCoordinator 0]: Member 
> cis-9c5d994c5-sknlk-2b9ed8bf-348c-4a10-97d3-5f2caccce7df in group 
> produs-cis-CisFileEventConsumer has failed, removing it from the group 
> (kafka.coordinator.group.GroupCoordinator)
> kafka-0   [2020-05-10 05:45:19,482] INFO [GroupCoordinator 0]: Group 
> produs-cis-CisFileEventConsumer with generation 28 is now empty 
> (__consumer_offsets-17) (kafka.coordinator.group.GroupCoordinator)
> << and now kafka-1 starts up again, the offsets are expired >>
> kafka-1   [2020-05-10 05:46:11,229] INFO starting 
> (kafka.server.KafkaServer)
> ...
> kafka-0   [2020-05-10 05:47:42,303] INFO [GroupCoordinator 0]: Preparing 
> to rebalance group produs-cis-CisFileEventConsumer in state 
> PreparingRebalance with old generation 28 (__consumer_offsets-17) (reason: 
> Adding new member cis-9c5d994c5-sknlk-1194b4b6-81ae-4a78-89a7-c610cf8c65be 
> with group instanceid Some(cis-9c5d994c5-sknlk)) 
> (kafka.coordinator.group.GroupCoordinator)
> kafka-0   [2020-05-10 05:47:47,611] INFO [GroupMetadataManager 
> brokerId=0] Removed 43 expired offsets in 13 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)
> kafka-0   [2020-05-10 05:48:12,308] INFO [GroupCoordinator 0]: Stabilized 
> group produs-cis-CisFileEventConsumer generation 29 (__consumer_offsets-17) 
> (kafka.coordinator.group.GroupCoordinator)
> kafka-0   [2020-05-10 05:48:12,311] INFO [GroupCoordinator 0]: Assignment 
> received from leader for group produs-cis-CisFileEventConsumer for generation 
> 29 (kafka.coordinator.group.GroupCoordinator)
> {code}
> The group becomes empty at 2020-05-10 05:45:19,482, and then the offsets are 
> expired about two minutes later at 05:47:47,611. I can't see any reason, based 
> on my understanding of how things work, for this to have happened, other than 
> it being a bug of some type?



--
This message was sent by Atlassian Jira

[jira] [Created] (KAFKA-10007) Kafka consumer offset reset despite recent group activity

2020-05-15 Thread Raman Gupta (Jira)
Raman Gupta created KAFKA-10007:
---

 Summary: Kafka consumer offset reset despite recent group activity
 Key: KAFKA-10007
 URL: https://issues.apache.org/jira/browse/KAFKA-10007
 Project: Kafka
  Issue Type: Bug
Reporter: Raman Gupta


I was running a Kafka 2.3.0 broker with the default value for 
`offsets.retention.minutes` (which should be 7 days as of 2.0.0). I deployed a 
2.4.1 broker, along with a change in setting `offsets.retention.minutes` to 14 
days, as I have several low-traffic topics in which exactly-once processing is 
desired.

As I understand it, with https://issues.apache.org/jira/browse/KAFKA-4682 and 
KIP-211, offsets should no longer be expired based on the last commit 
timestamp, but instead on the last time the group transitioned into an Empty 
state.
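A quick way to observe that transition and the surviving offsets from the client side is the AdminClient; a minimal sketch below (bootstrap server is a placeholder, group name taken from the logs further down) that reports the current group state and committed offsets -- per KIP-211 the retention clock should only start once the state is Empty.

{code}
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.ConsumerGroupDescription;

public class GroupOffsetCheck {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");   // placeholder
        String group = "produs-cis-CisFileEventConsumer";
        try (AdminClient admin = AdminClient.create(props)) {
            ConsumerGroupDescription description = admin
                    .describeConsumerGroups(Collections.singleton(group))
                    .describedGroups().get(group).get();
            // State becomes Empty once the last member has left the group.
            System.out.println("Group state: " + description.state());
            // Committed offsets still stored for the group, per partition.
            admin.listConsumerGroupOffsets(group)
                 .partitionsToOffsetAndMetadata().get()
                 .forEach((tp, oam) -> System.out.println(tp + " -> " + oam.offset()));
        }
    }
}
{code}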

However, the behavior I saw from Kafka upon broker shutdown was that the 
offsets were expired for a group when, as far as I can tell, they should not 
have been. See these logs from during the cluster recycle -- during this time 
the consumer, configured with the static group membership protocol, is always 
running:


{code}
<< offsets.retention.minutes using default value >>

[2020-05-10 05:37:01,070] <>

<< Starting broker-0 on 2.4.1 with protocol version 2.3, 
offsets.retention.minutes = 10080 >>

kafka-0 [2020-05-10 05:37:39,682] INFO starting (kafka.server.KafkaServer)
kafka-0 [2020-05-10 05:39:42,680] INFO [GroupCoordinator 0]: Loading group 
metadata for produs-cis-CisFileEventConsumer with generation 27 
(kafka.coordinator.group.GroupCoordinator)

<< Recycling broker-1 on 2.4.1, protocol version 2.3, offsets.retention.minutes 
= 10080, looks like the consumer fails because of the broker going down, and 
kafka-0 reports: >>

kafka-0 [2020-05-10 05:45:14,121] INFO [GroupCoordinator 0]: Member 
cis-9c5d994c5-7hpqt-efced5ca-0b81-4720-992d-bdd8612519b3 in group 
produs-cis-CisFileEventConsumer has failed, removing it from the group 
(kafka.coordinator.group.GroupCoordinator)
kafka-0 [2020-05-10 05:45:14,124] INFO [GroupCoordinator 0]: Preparing to 
rebalance group produs-cis-CisFileEventConsumer in state PreparingRebalance 
with old generation 27 (__consumer_offsets-17) (reason: removing member 
cis-9c5d994c5-7hpqt-efced5ca-0b81-4720-992d-bdd8612519b3 on heartbeat 
expiration) (kafka.coordinator.group.GroupCoordinator)
kafka-0 [2020-05-10 05:45:19,479] INFO [GroupCoordinator 0]: Member 
cis-9c5d994c5-sknlk-2b9ed8bf-348c-4a10-97d3-5f2caccce7df in group 
produs-cis-CisFileEventConsumer has failed, removing it from the group 
(kafka.coordinator.group.GroupCoordinator)
kafka-0 [2020-05-10 05:45:19,482] INFO [GroupCoordinator 0]: Group 
produs-cis-CisFileEventConsumer with generation 28 is now empty 
(__consumer_offsets-17) (kafka.coordinator.group.GroupCoordinator)

<< and now kafka-1 starts up again, the offsets are expired >>

kafka-1 [2020-05-10 05:46:11,229] INFO starting (kafka.server.KafkaServer)
...
kafka-0 [2020-05-10 05:47:42,303] INFO [GroupCoordinator 0]: Preparing to 
rebalance group produs-cis-CisFileEventConsumer in state PreparingRebalance 
with old generation 28 (__consumer_offsets-17) (reason: Adding new member 
cis-9c5d994c5-sknlk-1194b4b6-81ae-4a78-89a7-c610cf8c65be with group instanceid 
Some(cis-9c5d994c5-sknlk)) (kafka.coordinator.group.GroupCoordinator)
kafka-0 [2020-05-10 05:47:47,611] INFO [GroupMetadataManager brokerId=0] 
Removed 43 expired offsets in 13 milliseconds. 
(kafka.coordinator.group.GroupMetadataManager)
kafka-0 [2020-05-10 05:48:12,308] INFO [GroupCoordinator 0]: Stabilized group 
produs-cis-CisFileEventConsumer generation 29 (__consumer_offsets-17) 
(kafka.coordinator.group.GroupCoordinator)
kafka-0 [2020-05-10 05:48:12,311] INFO [GroupCoordinator 0]: Assignment 
received from leader for group produs-cis-CisFileEventConsumer for generation 
29 (kafka.coordinator.group.GroupCoordinator)
{code}


The group becomes empty at 2020-05-10 05:45:19,482, and then the offsets are 
expired about two minutes later at 05:47:47,611. I can't see any reason, based 
on my understanding of how things work, for this to have happened, other than it 
being a bug of some type?
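For reference, the static group membership mentioned above comes down to giving each consumer instance a stable group.instance.id (visible in the logs as "group instanceid Some(cis-9c5d994c5-sknlk)"); a minimal sketch follows, with the bootstrap server as a placeholder. With static membership a member is only removed on session timeout rather than on a normal shutdown, which is why the session timeout governs how quickly the group goes Empty.

{code}
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");              // placeholder
props.put(ConsumerConfig.GROUP_ID_CONFIG, "produs-cis-CisFileEventConsumer");
// Stable per-instance id enables static membership (KIP-345).
props.put(ConsumerConfig.GROUP_INSTANCE_ID_CONFIG, "cis-9c5d994c5-sknlk");
// A static member is only evicted from the group after this much silence.
props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, "30000");
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
{code}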



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-8803) Stream will not start due to TimeoutException: Timeout expired after 60000milliseconds while awaiting InitProducerId

2020-05-12 Thread Raman Gupta (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17105745#comment-17105745
 ] 

Raman Gupta commented on KAFKA-8803:


[~guozhang] No, you never mentioned that commit above AFAICS.

> Stream will not start due to TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> 
>
> Key: KAFKA-8803
> URL: https://issues.apache.org/jira/browse/KAFKA-8803
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Raman Gupta
>Assignee: Guozhang Wang
>Priority: Major
> Fix For: 2.5.0, 2.3.2, 2.4.2
>
> Attachments: logs-20200311.txt.gz, logs-client-20200311.txt.gz, 
> logs.txt.gz, screenshot-1.png
>
>
> One streams app is consistently failing at startup with the following 
> exception:
> {code}
> 2019-08-14 17:02:29,568 ERROR --- [2ce1b-StreamThread-2] 
> org.apa.kaf.str.pro.int.StreamTask: task [0_36] Timeout 
> exception caught when initializing transactions for task 0_36. This might 
> happen if the broker is slow to respond, if the network connection to the 
> broker was interrupted, or if similar circumstances arise. You can increase 
> producer parameter `max.block.ms` to increase this timeout.
> org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> {code}
> These same brokers are used by many other streams without any issue, 
> including some running in the very same process as the stream which 
> consistently throws this exception.
> *UPDATE 08/16:*
> The very first instance of this error was August 13th 2019, 17:03:36.754, and 
> it happened for 4 different streams. For 3 of these streams, the error only 
> happened once, and then the stream recovered. For the 4th stream, the error 
> has continued to happen, and continues to happen now.
> I looked up the broker logs for this time, and saw that at August 13th 2019, 
> 16:47:43, two of four brokers started reporting messages like this, for 
> multiple partitions:
> [2019-08-13 20:47:43,658] INFO [ReplicaFetcher replicaId=3, leaderId=1, 
> fetcherId=0] Retrying leaderEpoch request for partition xxx-1 as the leader 
> reported an error: UNKNOWN_LEADER_EPOCH (kafka.server.ReplicaFetcherThread)
> The UNKNOWN_LEADER_EPOCH messages continued for some time and then stopped; 
> here is a view of the count of these messages over time:
>  !screenshot-1.png! 
> However, as noted, the stream task timeout error continues to happen.
> I use the static consumer group protocol with Kafka 2.3.0 clients and 2.3.0 
> broker. The broker has a patch for KAFKA-8773.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Reopened] (KAFKA-8803) Stream will not start due to TimeoutException: Timeout expired after 60000milliseconds while awaiting InitProducerId

2020-05-12 Thread Raman Gupta (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raman Gupta reopened KAFKA-8803:


> Stream will not start due to TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> 
>
> Key: KAFKA-8803
> URL: https://issues.apache.org/jira/browse/KAFKA-8803
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Raman Gupta
>Assignee: Guozhang Wang
>Priority: Major
> Fix For: 2.5.0, 2.3.2, 2.4.2
>
> Attachments: logs-20200311.txt.gz, logs-client-20200311.txt.gz, 
> logs.txt.gz, screenshot-1.png
>
>
> One streams app is consistently failing at startup with the following 
> exception:
> {code}
> 2019-08-14 17:02:29,568 ERROR --- [2ce1b-StreamThread-2] 
> org.apa.kaf.str.pro.int.StreamTask: task [0_36] Timeout 
> exception caught when initializing transactions for task 0_36. This might 
> happen if the broker is slow to respond, if the network connection to the 
> broker was interrupted, or if similar circumstances arise. You can increase 
> producer parameter `max.block.ms` to increase this timeout.
> org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> {code}
> These same brokers are used by many other streams without any issue, 
> including some running in the very same process as the stream which 
> consistently throws this exception.
> *UPDATE 08/16:*
> The very first instance of this error was August 13th 2019, 17:03:36.754, and 
> it happened for 4 different streams. For 3 of these streams, the error only 
> happened once, and then the stream recovered. For the 4th stream, the error 
> has continued to happen, and continues to happen now.
> I looked up the broker logs for this time, and saw that at August 13th 2019, 
> 16:47:43, two of four brokers started reporting messages like this, for 
> multiple partitions:
> [2019-08-13 20:47:43,658] INFO [ReplicaFetcher replicaId=3, leaderId=1, 
> fetcherId=0] Retrying leaderEpoch request for partition xxx-1 as the leader 
> reported an error: UNKNOWN_LEADER_EPOCH (kafka.server.ReplicaFetcherThread)
> The UNKNOWN_LEADER_EPOCH messages continued for some time and then stopped; 
> here is a view of the count of these messages over time:
>  !screenshot-1.png! 
> However, as noted, the stream task timeout error continues to happen.
> I use the static consumer group protocol with Kafka 2.3.0 clients and 2.3.0 
> broker. The broker has a patch for KAFKA-8773.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-8803) Stream will not start due to TimeoutException: Timeout expired after 60000milliseconds while awaiting InitProducerId

2020-05-12 Thread Raman Gupta (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17105691#comment-17105691
 ] 

Raman Gupta commented on KAFKA-8803:


[~guozhang] Just ran into this problem again, with 2.4.1 + patch for 
KAFKA-9749. :-(

> Stream will not start due to TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> 
>
> Key: KAFKA-8803
> URL: https://issues.apache.org/jira/browse/KAFKA-8803
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Raman Gupta
>Assignee: Guozhang Wang
>Priority: Major
> Fix For: 2.5.0, 2.3.2, 2.4.2
>
> Attachments: logs-20200311.txt.gz, logs-client-20200311.txt.gz, 
> logs.txt.gz, screenshot-1.png
>
>
> One streams app is consistently failing at startup with the following 
> exception:
> {code}
> 2019-08-14 17:02:29,568 ERROR --- [2ce1b-StreamThread-2] 
> org.apa.kaf.str.pro.int.StreamTask: task [0_36] Timeout 
> exception caught when initializing transactions for task 0_36. This might 
> happen if the broker is slow to respond, if the network connection to the 
> broker was interrupted, or if similar circumstances arise. You can increase 
> producer parameter `max.block.ms` to increase this timeout.
> org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> {code}
> These same brokers are used by many other streams without any issue, 
> including some running in the very same process as the stream which 
> consistently throws this exception.
> *UPDATE 08/16:*
> The very first instance of this error was August 13th 2019, 17:03:36.754, and 
> it happened for 4 different streams. For 3 of these streams, the error only 
> happened once, and then the stream recovered. For the 4th stream, the error 
> has continued to happen, and continues to happen now.
> I looked up the broker logs for this time, and saw that at August 13th 2019, 
> 16:47:43, two of four brokers started reporting messages like this, for 
> multiple partitions:
> [2019-08-13 20:47:43,658] INFO [ReplicaFetcher replicaId=3, leaderId=1, 
> fetcherId=0] Retrying leaderEpoch request for partition xxx-1 as the leader 
> reported an error: UNKNOWN_LEADER_EPOCH (kafka.server.ReplicaFetcherThread)
> The UNKNOWN_LEADER_EPOCH messages continued for some time and then stopped; 
> here is a view of the count of these messages over time:
>  !screenshot-1.png! 
> However, as noted, the stream task timeout error continues to happen.
> I use the static consumer group protocol with Kafka 2.3.0 clients and 2.3.0 
> broker. The broker has a patch for KAFKA-8773.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-8803) Stream will not start due to TimeoutException: Timeout expired after 60000milliseconds while awaiting InitProducerId

2020-04-30 Thread Raman Gupta (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raman Gupta updated KAFKA-8803:
---
Fix Version/s: 2.4.2
   2.3.2

> Stream will not start due to TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> 
>
> Key: KAFKA-8803
> URL: https://issues.apache.org/jira/browse/KAFKA-8803
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Raman Gupta
>Assignee: Guozhang Wang
>Priority: Major
> Fix For: 2.5.0, 2.3.2, 2.4.2
>
> Attachments: logs-20200311.txt.gz, logs-client-20200311.txt.gz, 
> logs.txt.gz, screenshot-1.png
>
>
> One streams app is consistently failing at startup with the following 
> exception:
> {code}
> 2019-08-14 17:02:29,568 ERROR --- [2ce1b-StreamThread-2] 
> org.apa.kaf.str.pro.int.StreamTask: task [0_36] Timeout 
> exception caught when initializing transactions for task 0_36. This might 
> happen if the broker is slow to respond, if the network connection to the 
> broker was interrupted, or if similar circumstances arise. You can increase 
> producer parameter `max.block.ms` to increase this timeout.
> org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> {code}
> These same brokers are used by many other streams without any issue, 
> including some running in the very same process as the stream which 
> consistently throws this exception.
> *UPDATE 08/16:*
> The very first instance of this error was August 13th 2019, 17:03:36.754, and 
> it happened for 4 different streams. For 3 of these streams, the error only 
> happened once, and then the stream recovered. For the 4th stream, the error 
> has continued to happen, and continues to happen now.
> I looked up the broker logs for this time, and saw that at August 13th 2019, 
> 16:47:43, two of four brokers started reporting messages like this, for 
> multiple partitions:
> [2019-08-13 20:47:43,658] INFO [ReplicaFetcher replicaId=3, leaderId=1, 
> fetcherId=0] Retrying leaderEpoch request for partition xxx-1 as the leader 
> reported an error: UNKNOWN_LEADER_EPOCH (kafka.server.ReplicaFetcherThread)
> The UNKNOWN_LEADER_EPOCH messages continued for some time and then stopped; 
> here is a view of the count of these messages over time:
>  !screenshot-1.png! 
> However, as noted, the stream task timeout error continues to happen.
> I use the static consumer group protocol with Kafka 2.3.0 clients and 2.3.0 
> broker. The broker has a patch for KAFKA-8773.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-8803) Stream will not start due to TimeoutException: Timeout expired after 60000milliseconds while awaiting InitProducerId

2020-04-06 Thread Raman Gupta (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17076895#comment-17076895
 ] 

Raman Gupta commented on KAFKA-8803:


We are on 2.4.1 now, locally patched with the fix for KAFKA-9749. So far so 
good.

> Stream will not start due to TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> 
>
> Key: KAFKA-8803
> URL: https://issues.apache.org/jira/browse/KAFKA-8803
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Raman Gupta
>Assignee: Sophie Blee-Goldman
>Priority: Major
> Attachments: logs-20200311.txt.gz, logs-client-20200311.txt.gz, 
> logs.txt.gz, screenshot-1.png
>
>
> One streams app is consistently failing at startup with the following 
> exception:
> {code}
> 2019-08-14 17:02:29,568 ERROR --- [2ce1b-StreamThread-2] 
> org.apa.kaf.str.pro.int.StreamTask: task [0_36] Timeout 
> exception caught when initializing transactions for task 0_36. This might 
> happen if the broker is slow to respond, if the network connection to the 
> broker was interrupted, or if similar circumstances arise. You can increase 
> producer parameter `max.block.ms` to increase this timeout.
> org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> {code}
> These same brokers are used by many other streams without any issue, 
> including some running in the very same process as the stream which 
> consistently throws this exception.
> *UPDATE 08/16:*
> The very first instance of this error was August 13th 2019, 17:03:36.754, and 
> it happened for 4 different streams. For 3 of these streams, the error only 
> happened once, and then the stream recovered. For the 4th stream, the error 
> has continued to happen, and continues to happen now.
> I looked up the broker logs for this time, and saw that at August 13th 2019, 
> 16:47:43, two of four brokers started reporting messages like this, for 
> multiple partitions:
> [2019-08-13 20:47:43,658] INFO [ReplicaFetcher replicaId=3, leaderId=1, 
> fetcherId=0] Retrying leaderEpoch request for partition xxx-1 as the leader 
> reported an error: UNKNOWN_LEADER_EPOCH (kafka.server.ReplicaFetcherThread)
> The UNKNOWN_LEADER_EPOCH messages continued for some time and then stopped; 
> here is a view of the count of these messages over time:
>  !screenshot-1.png! 
> However, as noted, the stream task timeout error continues to happen.
> I use the static consumer group protocol with Kafka 2.3.0 clients and 2.3.0 
> broker. The broker has a patch for KAFKA-8773.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-8803) Stream will not start due to TimeoutException: Timeout expired after 60000milliseconds while awaiting InitProducerId

2020-03-18 Thread Raman Gupta (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17062286#comment-17062286
 ] 

Raman Gupta commented on KAFKA-8803:


[~guozhang] I plan on upgrading to 2.4.1 (from 2.3.0) within the next few days.

Regarding your last post, what is the conclusion you are drawing?

> Stream will not start due to TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> 
>
> Key: KAFKA-8803
> URL: https://issues.apache.org/jira/browse/KAFKA-8803
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Raman Gupta
>Assignee: Sophie Blee-Goldman
>Priority: Major
> Attachments: logs-20200311.txt.gz, logs-client-20200311.txt.gz, 
> logs.txt.gz, screenshot-1.png
>
>
> One streams app is consistently failing at startup with the following 
> exception:
> {code}
> 2019-08-14 17:02:29,568 ERROR --- [2ce1b-StreamThread-2] 
> org.apa.kaf.str.pro.int.StreamTask: task [0_36] Timeout 
> exception caught when initializing transactions for task 0_36. This might 
> happen if the broker is slow to respond, if the network connection to the 
> broker was interrupted, or if similar circumstances arise. You can increase 
> producer parameter `max.block.ms` to increase this timeout.
> org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> {code}
> These same brokers are used by many other streams without any issue, 
> including some running in the very same process as the stream which 
> consistently throws this exception.
> *UPDATE 08/16:*
> The very first instance of this error was August 13th 2019, 17:03:36.754, and 
> it happened for 4 different streams. For 3 of these streams, the error only 
> happened once, and then the stream recovered. For the 4th stream, the error 
> has continued to happen, and continues to happen now.
> I looked up the broker logs for this time, and saw that at August 13th 2019, 
> 16:47:43, two of four brokers started reporting messages like this, for 
> multiple partitions:
> [2019-08-13 20:47:43,658] INFO [ReplicaFetcher replicaId=3, leaderId=1, 
> fetcherId=0] Retrying leaderEpoch request for partition xxx-1 as the leader 
> reported an error: UNKNOWN_LEADER_EPOCH (kafka.server.ReplicaFetcherThread)
> The UNKNOWN_LEADER_EPOCH messages continued for some time and then stopped; 
> here is a view of the count of these messages over time:
>  !screenshot-1.png! 
> However, as noted, the stream task timeout error continues to happen.
> I use the static consumer group protocol with Kafka 2.3.0 clients and 2.3.0 
> broker. The broker has a patch for KAFKA-8773.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (KAFKA-8803) Stream will not start due to TimeoutException: Timeout expired after 60000milliseconds while awaiting InitProducerId

2020-03-11 Thread Raman Gupta (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17057352#comment-17057352
 ] 

Raman Gupta edited comment on KAFKA-8803 at 3/11/20, 8:20 PM:
--

Here are the logs from this morning, bracketing the problem by an hour or so on 
each side, including logs from the broker shutdowns and restarts. The first 
instance of `UNKNOWN_LEADER_EPOCH` is at 14:38:36.

Our client-side streams were failing from 14:59 to 16:40; the first instance of 
the error is at 14:59:41. The error always seems to occur for the first time 
after the client restarts, so it's quite possible it's related to the streams 
shutdown process, although I don't think so, because the UNKNOWN_LEADER_EPOCH 
happened earlier. It seems like whatever the issue was, it occurred on the 
broker side, but as long as the client side didn't restart, things were OK. 
Just a theory, though.

There is no IllegalStateException as seen by [~oleksii.boiko].

 [^logs-20200311.txt.gz]  [^logs-client-20200311.txt.gz] 



> Stream will not start due to TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> 
>
> Key: KAFKA-8803
> URL: https://issues.apache.org/jira/browse/KAFKA-8803
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Raman Gupta
>Assignee: Sophie Blee-Goldman
>Priority: Major
> Attachments: logs-20200311.txt.gz, logs-client-20200311.txt.gz, 
> logs.txt.gz, screenshot-1.png
>
>
> One streams app is consistently failing at startup with the following 
> exception:
> {code}
> 2019-08-14 17:02:29,568 ERROR --- [2ce1b-StreamThread-2] 
> org.apa.kaf.str.pro.int.StreamTask: task [0_36] Timeout 
> exception caught when initializing transactions for task 0_36. This might 
> happen if the broker is slow to respond, if the network connection to the 
> broker was interrupted, or if similar circumstances arise. You can increase 
> producer parameter `max.block.ms` to increase this timeout.
> org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> {code}
> These same brokers are used by many other streams without any issue, 
> including some running in the very same process as the stream which 
> consistently throws this exception.
> *UPDATE 08/16:*
> The very first instance of this error was August 13th 2019, 17:03:36.754, and 
> it happened for 4 different streams. For 3 of these streams, the error only 
> happened once, and then the stream recovered. For the 4th stream, the error 
> has continued to happen, and continues to happen now.
> I looked up the broker logs for this time, and saw that at August 13th 2019, 
> 16:47:43, two of four brokers started reporting messages like this, for 
> multiple partitions:
> [2019-08-13 20:47:43,658] INFO [ReplicaFetcher replicaId=3, leaderId=1, 
> fetcherId=0] Retrying leaderEpoch request for partition xxx-1 as the leader 
> reported an error: UNKNOWN_LEADER_EPOCH (kafka.server.ReplicaFetcherThread)
> The UNKNOWN_LEADER_EPOCH messages continued for some time and then stopped; 
> here is a view of the count of these messages over time:
>  !screenshot-1.png! 
> However, as noted, the stream task timeout error continues to happen.
> I use the static consumer group protocol with Kafka 2.3.0 clients and 2.3.0 
> broker. The broker has a patch for KAFKA-8773.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-8803) Stream will not start due to TimeoutException: Timeout expired after 60000milliseconds while awaiting InitProducerId

2020-03-11 Thread Raman Gupta (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raman Gupta updated KAFKA-8803:
---
Attachment: logs-client-20200311.txt.gz
logs-20200311.txt.gz

> Stream will not start due to TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> 
>
> Key: KAFKA-8803
> URL: https://issues.apache.org/jira/browse/KAFKA-8803
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Raman Gupta
>Assignee: Sophie Blee-Goldman
>Priority: Major
> Attachments: logs-20200311.txt.gz, logs-client-20200311.txt.gz, 
> logs.txt.gz, screenshot-1.png
>
>
> One streams app is consistently failing at startup with the following 
> exception:
> {code}
> 2019-08-14 17:02:29,568 ERROR --- [2ce1b-StreamThread-2] 
> org.apa.kaf.str.pro.int.StreamTask: task [0_36] Timeout 
> exception caught when initializing transactions for task 0_36. This might 
> happen if the broker is slow to respond, if the network connection to the 
> broker was interrupted, or if similar circumstances arise. You can increase 
> producer parameter `max.block.ms` to increase this timeout.
> org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> {code}
> These same brokers are used by many other streams without any issue, 
> including some running in the very same process as the stream which 
> consistently throws this exception.
> *UPDATE 08/16:*
> The very first instance of this error was August 13th 2019, 17:03:36.754, and 
> it happened for 4 different streams. For 3 of these streams, the error only 
> happened once, and then the stream recovered. For the 4th stream, the error 
> has continued to happen, and continues to happen now.
> I looked up the broker logs for this time, and saw that at August 13th 2019, 
> 16:47:43, two of four brokers started reporting messages like this, for 
> multiple partitions:
> [2019-08-13 20:47:43,658] INFO [ReplicaFetcher replicaId=3, leaderId=1, 
> fetcherId=0] Retrying leaderEpoch request for partition xxx-1 as the leader 
> reported an error: UNKNOWN_LEADER_EPOCH (kafka.server.ReplicaFetcherThread)
> The UNKNOWN_LEADER_EPOCH messages continued for some time and then stopped; 
> here is a view of the count of these messages over time:
>  !screenshot-1.png! 
> However, as noted, the stream task timeout error continues to happen.
> I use the static consumer group protocol with Kafka 2.3.0 clients and 2.3.0 
> broker. The broker has a patch for KAFKA-8773.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-8803) Stream will not start due to TimeoutException: Timeout expired after 60000milliseconds while awaiting InitProducerId

2020-03-11 Thread Raman Gupta (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17057281#comment-17057281
 ] 

Raman Gupta commented on KAFKA-8803:


[~guozhang] FYI, we had this happen again today, and this time I did NOT see any 
errors similar to the "The metadata cache for txn partition 22 has already exist 
with epoch 567 and 9 entries while trying to add to it; this should not happen" 
error, so it may or may not be related (probably not).

> Stream will not start due to TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> 
>
> Key: KAFKA-8803
> URL: https://issues.apache.org/jira/browse/KAFKA-8803
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Raman Gupta
>Assignee: Sophie Blee-Goldman
>Priority: Major
> Attachments: logs.txt.gz, screenshot-1.png
>
>
> One streams app is consistently failing at startup with the following 
> exception:
> {code}
> 2019-08-14 17:02:29,568 ERROR --- [2ce1b-StreamThread-2] 
> org.apa.kaf.str.pro.int.StreamTask: task [0_36] Timeout 
> exception caught when initializing transactions for task 0_36. This might 
> happen if the broker is slow to respond, if the network connection to the 
> broker was interrupted, or if similar circumstances arise. You can increase 
> producer parameter `max.block.ms` to increase this timeout.
> org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> {code}
> These same brokers are used by many other streams without any issue, 
> including some running in the very same process as the stream which 
> consistently throws this exception.
> *UPDATE 08/16:*
> The very first instance of this error was August 13th 2019, 17:03:36.754, and 
> it happened for 4 different streams. For 3 of these streams, the error only 
> happened once, and then the stream recovered. For the 4th stream, the error 
> has continued to happen, and continues to happen now.
> I looked up the broker logs for this time, and saw that at August 13th 2019, 
> 16:47:43, two of four brokers started reporting messages like this, for 
> multiple partitions:
> [2019-08-13 20:47:43,658] INFO [ReplicaFetcher replicaId=3, leaderId=1, 
> fetcherId=0] Retrying leaderEpoch request for partition xxx-1 as the leader 
> reported an error: UNKNOWN_LEADER_EPOCH (kafka.server.ReplicaFetcherThread)
> The UNKNOWN_LEADER_EPOCH messages continued for some time and then stopped; 
> here is a view of the count of these messages over time:
>  !screenshot-1.png! 
> However, as noted, the stream task timeout error continues to happen.
> I use the static consumer group protocol with Kafka 2.3.0 clients and 2.3.0 
> broker. The broker has a patch for KAFKA-8773.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-8803) Stream will not start due to TimeoutException: Timeout expired after 60000milliseconds while awaiting InitProducerId

2020-03-11 Thread Raman Gupta (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17057262#comment-17057262
 ] 

Raman Gupta commented on KAFKA-8803:


Thanks [~guozhang]. I'm a little confused by your response. Regarding the 
timeout and the timeout retry, as mentioned previously in this issue, I don't 
think that is going to help. We currently let-it-crash in this situation and 
retry on the next startup, and when this error happens those retries do not 
work until the brokers are restarted. Whatever state the brokers are in is not 
fixed by retries or by longer timeouts.
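
To be concrete, the let-it-crash wiring is roughly the following -- a minimal 
sketch, assuming the process is restarted externally after it exits (in our 
case by Kubernetes); the class and method names are illustrative, not our 
exact code:
{code:java}
import org.apache.kafka.streams.KafkaStreams;

public final class LetItCrash {
    // Treat any uncaught stream-thread exception, or a transition to ERROR,
    // as fatal: halt the JVM so the orchestrator restarts the whole process.
    public static void start(KafkaStreams streams) {
        streams.setUncaughtExceptionHandler((thread, throwable) -> {
            System.err.println("Fatal error on " + thread.getName() + ": " + throwable);
            Runtime.getRuntime().halt(1); // halt() avoids hanging in shutdown hooks
        });
        streams.setStateListener((newState, oldState) -> {
            if (newState == KafkaStreams.State.ERROR) {
                Runtime.getRuntime().halt(1);
            }
        });
        streams.start();
    }
}
{code}
On restart the application just runs into the same InitProducerId timeout 
again, which is why client-side retries alone don't change the outcome here.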

> Stream will not start due to TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> 
>
> Key: KAFKA-8803
> URL: https://issues.apache.org/jira/browse/KAFKA-8803
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Raman Gupta
>Assignee: Sophie Blee-Goldman
>Priority: Major
> Attachments: logs.txt.gz, screenshot-1.png
>
>
> One streams app is consistently failing at startup with the following 
> exception:
> {code}
> 2019-08-14 17:02:29,568 ERROR --- [2ce1b-StreamThread-2] 
> org.apa.kaf.str.pro.int.StreamTask: task [0_36] Timeout 
> exception caught when initializing transactions for task 0_36. This might 
> happen if the broker is slow to respond, if the network connection to the 
> broker was interrupted, or if similar circumstances arise. You can increase 
> producer parameter `max.block.ms` to increase this timeout.
> org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> {code}
> These same brokers are used by many other streams without any issue, 
> including some in the very same processes for the stream which consistently 
> throws this exception.
> *UPDATE 08/16:*
> The very first instance of this error is August 13th 2019, 17:03:36.754 and 
> it happened for 4 different streams. For 3 of these streams, the error only 
> happened once, and then the stream recovered. For the 4th stream, the error 
> has continued to happen, and continues to happen now.
> I looked up the broker logs for this time, and see that at August 13th 2019, 
> 16:47:43, two of four brokers started reporting messages like this, for 
> multiple partitions:
> [2019-08-13 20:47:43,658] INFO [ReplicaFetcher replicaId=3, leaderId=1, 
> fetcherId=0] Retrying leaderEpoch request for partition xxx-1 as the leader 
> reported an error: UNKNOWN_LEADER_EPOCH (kafka.server.ReplicaFetcherThread)
> The UNKNOWN_LEADER_EPOCH messages continued for some time, and then stopped, 
> here is a view of the count of these messages over time:
>  !screenshot-1.png! 
> However, as noted, the stream task timeout error continues to happen.
> I use the static consumer group protocol with Kafka 2.3.0 clients and 2.3.0 
> broker. The broker has a patch for KAFKA-8773.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-8803) Stream will not start due to TimeoutException: Timeout expired after 60000milliseconds while awaiting InitProducerId

2020-03-11 Thread Raman Gupta (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17057191#comment-17057191
 ] 

Raman Gupta commented on KAFKA-8803:


Any progress on this? Still happening regularly for us.

> Stream will not start due to TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> 
>
> Key: KAFKA-8803
> URL: https://issues.apache.org/jira/browse/KAFKA-8803
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Raman Gupta
>Assignee: Sophie Blee-Goldman
>Priority: Major
> Attachments: logs.txt.gz, screenshot-1.png
>
>
> One streams app is consistently failing at startup with the following 
> exception:
> {code}
> 2019-08-14 17:02:29,568 ERROR --- [2ce1b-StreamThread-2] 
> org.apa.kaf.str.pro.int.StreamTask: task [0_36] Timeout 
> exception caught when initializing transactions for task 0_36. This might 
> happen if the broker is slow to respond, if the network connection to the 
> broker was interrupted, or if similar circumstances arise. You can increase 
> producer parameter `max.block.ms` to increase this timeout.
> org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> {code}
> These same brokers are used by many other streams without any issue, 
> including some in the very same processes for the stream which consistently 
> throws this exception.
> *UPDATE 08/16:*
> The very first instance of this error is August 13th 2019, 17:03:36.754 and 
> it happened for 4 different streams. For 3 of these streams, the error only 
> happened once, and then the stream recovered. For the 4th stream, the error 
> has continued to happen, and continues to happen now.
> I looked up the broker logs for this time, and see that at August 13th 2019, 
> 16:47:43, two of four brokers started reporting messages like this, for 
> multiple partitions:
> [2019-08-13 20:47:43,658] INFO [ReplicaFetcher replicaId=3, leaderId=1, 
> fetcherId=0] Retrying leaderEpoch request for partition xxx-1 as the leader 
> reported an error: UNKNOWN_LEADER_EPOCH (kafka.server.ReplicaFetcherThread)
> The UNKNOWN_LEADER_EPOCH messages continued for some time, and then stopped, 
> here is a view of the count of these messages over time:
>  !screenshot-1.png! 
> However, as noted, the stream task timeout error continues to happen.
> I use the static consumer group protocol with Kafka 2.3.0 clients and 2.3.0 
> broker. The broker has a patch for KAFKA-8773.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (KAFKA-8803) Stream will not start due to TimeoutException: Timeout expired after 60000milliseconds while awaiting InitProducerId

2020-01-27 Thread Raman Gupta (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17022192#comment-17022192
 ] 

Raman Gupta edited comment on KAFKA-8803 at 1/27/20 9:52 PM:
-

[~oleksii.boiko]'s last message inspired me to check for 
`IllegalStateException` in our logs. I don't see the same error he does, but I 
do see 12 of these errors on our `kafka-2` broker a few hours before the last 
timeout error we experienced on Jan 15th – these errors always seem to occur on 
stream restart. The `kafka-2` broker is the same broker that was restarted, 
after which the stream recovered.
{code:java}
[2020-01-15 09:27:08,689] ERROR Uncaught exception in scheduled task 
'load-txns-for-partition-__transaction_state-22' (kafka.utils.KafkaScheduler)
 java.lang.IllegalStateException: The metadata cache for txn partition 22 has 
already exist with epoch 567 and 9 entries while trying to add to it; this 
should not happen
 at 
kafka.coordinator.transaction.TransactionStateManager.addLoadedTransactionsToCache(TransactionStateManager.scala:369)
 at 
kafka.coordinator.transaction.TransactionStateManager.$anonfun$loadTransactionsForTxnTopicPartition$3(TransactionStateManager.scala:394)
 at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
 at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:253)
 at kafka.utils.CoreUtils$.inWriteLock(CoreUtils.scala:261)
 at 
kafka.coordinator.transaction.TransactionStateManager.loadTransactions$1(TransactionStateManager.scala:393)
 at 
kafka.coordinator.transaction.TransactionStateManager.$anonfun$loadTransactionsForTxnTopicPartition$7(TransactionStateManager.scala:426)
 at kafka.utils.KafkaScheduler.$anonfun$schedule$2(KafkaScheduler.scala:114)
 at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:65)
 at 
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
 at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
 at 
java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
 at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
 at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
 at java.base/java.lang.Thread.run(Thread.java:834)
{code}


was (Author: rocketraman):
[~oleksii.boiko]'s last message inspired me to check for 
`IllegalStateException` in our logs. I don't see the same error as he, but I do 
see 12 of these errors on our `kafka-2` broker a few hours before the last 
timeout error we experienced on Jan 15th – these errors always seem to occur on 
stream restart. The `kafka-2` broker is the one which is the same broker 
restarted just before the stream recovered.
{code:java}
[2020-01-15 09:27:08,689] ERROR Uncaught exception in scheduled task 
'load-txns-for-partition-__transaction_state-22' (kafka.utils.KafkaScheduler)
 java.lang.IllegalStateException: The metadata cache for txn partition 22 has 
already exist with epoch 567 and 9 entries while trying to add to it; this 
should not happen
 at 
kafka.coordinator.transaction.TransactionStateManager.addLoadedTransactionsToCache(TransactionStateManager.scala:369)
 at 
kafka.coordinator.transaction.TransactionStateManager.$anonfun$loadTransactionsForTxnTopicPartition$3(TransactionStateManager.scala:394)
 at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
 at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:253)
 at kafka.utils.CoreUtils$.inWriteLock(CoreUtils.scala:261)
 at 
kafka.coordinator.transaction.TransactionStateManager.loadTransactions$1(TransactionStateManager.scala:393)
 at 
kafka.coordinator.transaction.TransactionStateManager.$anonfun$loadTransactionsForTxnTopicPartition$7(TransactionStateManager.scala:426)
 at kafka.utils.KafkaScheduler.$anonfun$schedule$2(KafkaScheduler.scala:114)
 at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:65)
 at 
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
 at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
 at 
java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
 at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
 at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
 at java.base/java.lang.Thread.run(Thread.java:834)
{code}

> Stream will not start due to TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> 
>
> Key: KAFKA-8803
> URL: https://issues.apache.org/jira/browse/KAFKA-8803
> Project: Kafka
>  Issue Type: Bug
>  Components: streams

[jira] [Commented] (KAFKA-8803) Stream will not start due to TimeoutException: Timeout expired after 60000milliseconds while awaiting InitProducerId

2020-01-23 Thread Raman Gupta (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17022192#comment-17022192
 ] 

Raman Gupta commented on KAFKA-8803:


[~oleksii.boiko]'s last message inspired me to check for 
`IllegalStateException` in our logs. I don't see the same error he does, but I 
do see 12 of these errors on our `kafka-2` broker a few hours before the last 
timeout error we experienced on Jan 15th – these errors always seem to occur on 
stream restart. The `kafka-2` broker is the same broker that was restarted just 
before the stream recovered.
{code:java}
[2020-01-15 09:27:08,689] ERROR Uncaught exception in scheduled task 
'load-txns-for-partition-__transaction_state-22' (kafka.utils.KafkaScheduler)
 java.lang.IllegalStateException: The metadata cache for txn partition 22 has 
already exist with epoch 567 and 9 entries while trying to add to it; this 
should not happen
 at 
kafka.coordinator.transaction.TransactionStateManager.addLoadedTransactionsToCache(TransactionStateManager.scala:369)
 at 
kafka.coordinator.transaction.TransactionStateManager.$anonfun$loadTransactionsForTxnTopicPartition$3(TransactionStateManager.scala:394)
 at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
 at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:253)
 at kafka.utils.CoreUtils$.inWriteLock(CoreUtils.scala:261)
 at 
kafka.coordinator.transaction.TransactionStateManager.loadTransactions$1(TransactionStateManager.scala:393)
 at 
kafka.coordinator.transaction.TransactionStateManager.$anonfun$loadTransactionsForTxnTopicPartition$7(TransactionStateManager.scala:426)
 at kafka.utils.KafkaScheduler.$anonfun$schedule$2(KafkaScheduler.scala:114)
 at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:65)
 at 
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
 at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
 at 
java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
 at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
 at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
 at java.base/java.lang.Thread.run(Thread.java:834)
{code}

> Stream will not start due to TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> 
>
> Key: KAFKA-8803
> URL: https://issues.apache.org/jira/browse/KAFKA-8803
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Raman Gupta
>Assignee: Boyang Chen
>Priority: Major
> Attachments: logs.txt.gz, screenshot-1.png
>
>
> One streams app is consistently failing at startup with the following 
> exception:
> {code}
> 2019-08-14 17:02:29,568 ERROR --- [2ce1b-StreamThread-2] 
> org.apa.kaf.str.pro.int.StreamTask: task [0_36] Timeout 
> exception caught when initializing transactions for task 0_36. This might 
> happen if the broker is slow to respond, if the network connection to the 
> broker was interrupted, or if similar circumstances arise. You can increase 
> producer parameter `max.block.ms` to increase this timeout.
> org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> {code}
> These same brokers are used by many other streams without any issue, 
> including some in the very same processes for the stream which consistently 
> throws this exception.
> *UPDATE 08/16:*
> The very first instance of this error is August 13th 2019, 17:03:36.754 and 
> it happened for 4 different streams. For 3 of these streams, the error only 
> happened once, and then the stream recovered. For the 4th stream, the error 
> has continued to happen, and continues to happen now.
> I looked up the broker logs for this time, and see that at August 13th 2019, 
> 16:47:43, two of four brokers started reporting messages like this, for 
> multiple partitions:
> [2019-08-13 20:47:43,658] INFO [ReplicaFetcher replicaId=3, leaderId=1, 
> fetcherId=0] Retrying leaderEpoch request for partition xxx-1 as the leader 
> reported an error: UNKNOWN_LEADER_EPOCH (kafka.server.ReplicaFetcherThread)
> The UNKNOWN_LEADER_EPOCH messages continued for some time, and then stopped, 
> here is a view of the count of these messages over time:
>  !screenshot-1.png! 
> However, as noted, the stream task timeout error continues to happen.
> I use the static consumer group protocol with Kafka 2.3.0 clients and 2.3.0 
> broker. The broker has a patch for KAFKA-8773.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-8803) Stream will not start due to TimeoutException: Timeout expired after 60000milliseconds while awaiting InitProducerId

2020-01-17 Thread Raman Gupta (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17018259#comment-17018259
 ] 

Raman Gupta commented on KAFKA-8803:


And it happened again today with the same stream. Offsets were / are not 
expired so never mind my last theory.

> Stream will not start due to TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> 
>
> Key: KAFKA-8803
> URL: https://issues.apache.org/jira/browse/KAFKA-8803
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Raman Gupta
>Assignee: Boyang Chen
>Priority: Major
> Attachments: logs.txt.gz, screenshot-1.png
>
>
> One streams app is consistently failing at startup with the following 
> exception:
> {code}
> 2019-08-14 17:02:29,568 ERROR --- [2ce1b-StreamThread-2] 
> org.apa.kaf.str.pro.int.StreamTask: task [0_36] Timeout 
> exception caught when initializing transactions for task 0_36. This might 
> happen if the broker is slow to respond, if the network connection to the 
> broker was interrupted, or if similar circumstances arise. You can increase 
> producer parameter `max.block.ms` to increase this timeout.
> org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> {code}
> These same brokers are used by many other streams without any issue, 
> including some in the very same processes for the stream which consistently 
> throws this exception.
> *UPDATE 08/16:*
> The very first instance of this error is August 13th 2019, 17:03:36.754 and 
> it happened for 4 different streams. For 3 of these streams, the error only 
> happened once, and then the stream recovered. For the 4th stream, the error 
> has continued to happen, and continues to happen now.
> I looked up the broker logs for this time, and see that at August 13th 2019, 
> 16:47:43, two of four brokers started reporting messages like this, for 
> multiple partitions:
> [2019-08-13 20:47:43,658] INFO [ReplicaFetcher replicaId=3, leaderId=1, 
> fetcherId=0] Retrying leaderEpoch request for partition xxx-1 as the leader 
> reported an error: UNKNOWN_LEADER_EPOCH (kafka.server.ReplicaFetcherThread)
> The UNKNOWN_LEADER_EPOCH messages continued for some time, and then stopped, 
> here is a view of the count of these messages over time:
>  !screenshot-1.png! 
> However, as noted, the stream task timeout error continues to happen.
> I use the static consumer group protocol with Kafka 2.3.0 clients and 2.3.0 
> broker. The broker has a patch for KAFKA-8773.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-8803) Stream will not start due to TimeoutException: Timeout expired after 60000milliseconds while awaiting InitProducerId

2020-01-16 Thread Raman Gupta (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017311#comment-17017311
 ] 

Raman Gupta commented on KAFKA-8803:


Is it possible this is related in the opposite direction? i.e. the offsets 
expired due to `offsets.retention.minutes`, and then because of that this issue 
is triggered for the stream for which offsets were expired? I don't see why 
that would be the case in this situation, because the stream has been running 
continuously, and since the consumer group was never empty for more than 7 
days, I see no reason why the offsets should have been expired. However, I 
throw this out there for consideration.

> Stream will not start due to TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> 
>
> Key: KAFKA-8803
> URL: https://issues.apache.org/jira/browse/KAFKA-8803
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Raman Gupta
>Assignee: Boyang Chen
>Priority: Major
> Attachments: logs.txt.gz, screenshot-1.png
>
>
> One streams app is consistently failing at startup with the following 
> exception:
> {code}
> 2019-08-14 17:02:29,568 ERROR --- [2ce1b-StreamThread-2] 
> org.apa.kaf.str.pro.int.StreamTask: task [0_36] Timeout 
> exception caught when initializing transactions for task 0_36. This might 
> happen if the broker is slow to respond, if the network connection to the 
> broker was interrupted, or if similar circumstances arise. You can increase 
> producer parameter `max.block.ms` to increase this timeout.
> org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> {code}
> These same brokers are used by many other streams without any issue, 
> including some in the very same processes for the stream which consistently 
> throws this exception.
> *UPDATE 08/16:*
> The very first instance of this error is August 13th 2019, 17:03:36.754 and 
> it happened for 4 different streams. For 3 of these streams, the error only 
> happened once, and then the stream recovered. For the 4th stream, the error 
> has continued to happen, and continues to happen now.
> I looked up the broker logs for this time, and see that at August 13th 2019, 
> 16:47:43, two of four brokers started reporting messages like this, for 
> multiple partitions:
> [2019-08-13 20:47:43,658] INFO [ReplicaFetcher replicaId=3, leaderId=1, 
> fetcherId=0] Retrying leaderEpoch request for partition xxx-1 as the leader 
> reported an error: UNKNOWN_LEADER_EPOCH (kafka.server.ReplicaFetcherThread)
> The UNKNOWN_LEADER_EPOCH messages continued for some time, and then stopped, 
> here is a view of the count of these messages over time:
>  !screenshot-1.png! 
> However, as noted, the stream task timeout error continues to happen.
> I use the static consumer group protocol with Kafka 2.3.0 clients and 2.3.0 
> broker. The broker has a patch for KAFKA-8773.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-8803) Stream will not start due to TimeoutException: Timeout expired after 60000milliseconds while awaiting InitProducerId

2020-01-16 Thread Raman Gupta (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017287#comment-17017287
 ] 

Raman Gupta commented on KAFKA-8803:


Very very strange. Following up on my last message, after a client restart, my 
stream is showing a `-` for multiple partition offsets (no offsets committed), 
even though every partition is assigned a consumer. The stream does not read 
any of the messages on that partition until more messages are sent to it. At 
that point it reads those messages and updates the offset. However, I certainly 
have messages in that topic that were *never* processed by my EXACTLY_ONCE 
stream. It all seems related to this issue, and this is very scary -- now I'm 
wondering what else hasn't been processed from Kafka.
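
In case it is useful to others, this is roughly how I check the committed 
offsets -- a minimal sketch using the AdminClient; the bootstrap servers and 
the group id (for a streams app, its application.id) are placeholders, not our 
real values:
{code:java}
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class ShowCommittedOffsets {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092"); // placeholder
        try (AdminClient admin = AdminClient.create(props)) {
            // List the committed offsets for the consumer group backing the stream;
            // "my-stream-app" is a placeholder for the streams application.id.
            Map<TopicPartition, OffsetAndMetadata> offsets = admin
                .listConsumerGroupOffsets("my-stream-app")
                .partitionsToOffsetAndMetadata()
                .get();
            // Partitions the tooling shows as `-` are the ones with no committed
            // offset; they are expected to be absent from (or null in) this map.
            offsets.forEach((tp, om) -> System.out.println(
                tp + " -> " + (om == null ? "no committed offset" : om.offset())));
        }
    }
}
{code}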

> Stream will not start due to TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> 
>
> Key: KAFKA-8803
> URL: https://issues.apache.org/jira/browse/KAFKA-8803
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Raman Gupta
>Assignee: Boyang Chen
>Priority: Major
> Attachments: logs.txt.gz, screenshot-1.png
>
>
> One streams app is consistently failing at startup with the following 
> exception:
> {code}
> 2019-08-14 17:02:29,568 ERROR --- [2ce1b-StreamThread-2] 
> org.apa.kaf.str.pro.int.StreamTask: task [0_36] Timeout 
> exception caught when initializing transactions for task 0_36. This might 
> happen if the broker is slow to respond, if the network connection to the 
> broker was interrupted, or if similar circumstances arise. You can increase 
> producer parameter `max.block.ms` to increase this timeout.
> org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> {code}
> These same brokers are used by many other streams without any issue, 
> including some in the very same processes for the stream which consistently 
> throws this exception.
> *UPDATE 08/16:*
> The very first instance of this error is August 13th 2019, 17:03:36.754 and 
> it happened for 4 different streams. For 3 of these streams, the error only 
> happened once, and then the stream recovered. For the 4th stream, the error 
> has continued to happen, and continues to happen now.
> I looked up the broker logs for this time, and see that at August 13th 2019, 
> 16:47:43, two of four brokers started reporting messages like this, for 
> multiple partitions:
> [2019-08-13 20:47:43,658] INFO [ReplicaFetcher replicaId=3, leaderId=1, 
> fetcherId=0] Retrying leaderEpoch request for partition xxx-1 as the leader 
> reported an error: UNKNOWN_LEADER_EPOCH (kafka.server.ReplicaFetcherThread)
> The UNKNOWN_LEADER_EPOCH messages continued for some time, and then stopped, 
> here is a view of the count of these messages over time:
>  !screenshot-1.png! 
> However, as noted, the stream task timeout error continues to happen.
> I use the static consumer group protocol with Kafka 2.3.0 clients and 2.3.0 
> broker. The broker has a patch for KAFKA-8773.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-8803) Stream will not start due to TimeoutException: Timeout expired after 60000milliseconds while awaiting InitProducerId

2020-01-15 Thread Raman Gupta (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016236#comment-17016236
 ] 

Raman Gupta commented on KAFKA-8803:


It looks like this is not just a dev-ops issue: from what I'm seeing here, 
*the EXACTLY_ONCE guarantee of the stream that failed to start with this error 
is being violated by Kafka*.

After restarting all my brokers I expected the stream to start where it left 
off and process all input messages. However, the stream offset was already 
past a message that had never been processed. This is a huge problem.
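
For reference, these streams enable EOS in the standard way -- a minimal sketch 
of the relevant properties, where the application id and bootstrap servers are 
placeholders rather than our real values:
{code:java}
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-eos-stream");  // placeholder
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");  // placeholder
// The guarantee in question: transactional writes plus read_committed reads.
props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE);
{code}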

> Stream will not start due to TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> 
>
> Key: KAFKA-8803
> URL: https://issues.apache.org/jira/browse/KAFKA-8803
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Raman Gupta
>Assignee: Boyang Chen
>Priority: Major
> Attachments: logs.txt.gz, screenshot-1.png
>
>
> One streams app is consistently failing at startup with the following 
> exception:
> {code}
> 2019-08-14 17:02:29,568 ERROR --- [2ce1b-StreamThread-2] 
> org.apa.kaf.str.pro.int.StreamTask: task [0_36] Timeout 
> exception caught when initializing transactions for task 0_36. This might 
> happen if the broker is slow to respond, if the network connection to the 
> broker was interrupted, or if similar circumstances arise. You can increase 
> producer parameter `max.block.ms` to increase this timeout.
> org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> {code}
> These same brokers are used by many other streams without any issue, 
> including some in the very same processes for the stream which consistently 
> throws this exception.
> *UPDATE 08/16:*
> The very first instance of this error is August 13th 2019, 17:03:36.754 and 
> it happened for 4 different streams. For 3 of these streams, the error only 
> happened once, and then the stream recovered. For the 4th stream, the error 
> has continued to happen, and continues to happen now.
> I looked up the broker logs for this time, and see that at August 13th 2019, 
> 16:47:43, two of four brokers started reporting messages like this, for 
> multiple partitions:
> [2019-08-13 20:47:43,658] INFO [ReplicaFetcher replicaId=3, leaderId=1, 
> fetcherId=0] Retrying leaderEpoch request for partition xxx-1 as the leader 
> reported an error: UNKNOWN_LEADER_EPOCH (kafka.server.ReplicaFetcherThread)
> The UNKNOWN_LEADER_EPOCH messages continued for some time, and then stopped, 
> here is a view of the count of these messages over time:
>  !screenshot-1.png! 
> However, as noted, the stream task timeout error continues to happen.
> I use the static consumer group protocol with Kafka 2.3.0 clients and 2.3.0 
> broker. The broker has a patch for KAFKA-8773.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-8803) Stream will not start due to TimeoutException: Timeout expired after 60000milliseconds while awaiting InitProducerId

2020-01-15 Thread Raman Gupta (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016128#comment-17016128
 ] 

Raman Gupta commented on KAFKA-8803:


This happened to me again today. Recycling 3 of 4 brokers fixed the issue.

> Stream will not start due to TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> 
>
> Key: KAFKA-8803
> URL: https://issues.apache.org/jira/browse/KAFKA-8803
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Raman Gupta
>Assignee: Boyang Chen
>Priority: Major
> Attachments: logs.txt.gz, screenshot-1.png
>
>
> One streams app is consistently failing at startup with the following 
> exception:
> {code}
> 2019-08-14 17:02:29,568 ERROR --- [2ce1b-StreamThread-2] 
> org.apa.kaf.str.pro.int.StreamTask: task [0_36] Timeout 
> exception caught when initializing transactions for task 0_36. This might 
> happen if the broker is slow to respond, if the network connection to the 
> broker was interrupted, or if similar circumstances arise. You can increase 
> producer parameter `max.block.ms` to increase this timeout.
> org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> {code}
> These same brokers are used by many other streams without any issue, 
> including some in the very same processes for the stream which consistently 
> throws this exception.
> *UPDATE 08/16:*
> The very first instance of this error is August 13th 2019, 17:03:36.754 and 
> it happened for 4 different streams. For 3 of these streams, the error only 
> happened once, and then the stream recovered. For the 4th stream, the error 
> has continued to happen, and continues to happen now.
> I looked up the broker logs for this time, and see that at August 13th 2019, 
> 16:47:43, two of four brokers started reporting messages like this, for 
> multiple partitions:
> [2019-08-13 20:47:43,658] INFO [ReplicaFetcher replicaId=3, leaderId=1, 
> fetcherId=0] Retrying leaderEpoch request for partition xxx-1 as the leader 
> reported an error: UNKNOWN_LEADER_EPOCH (kafka.server.ReplicaFetcherThread)
> The UNKNOWN_LEADER_EPOCH messages continued for some time, and then stopped, 
> here is a view of the count of these messages over time:
>  !screenshot-1.png! 
> However, as noted, the stream task timeout error continues to happen.
> I use the static consumer group protocol with Kafka 2.3.0 clients and 2.3.0 
> broker. The broker has a patch for KAFKA-8773.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-8803) Stream will not start due to TimeoutException: Timeout expired after 60000milliseconds while awaiting InitProducerId

2019-12-10 Thread Raman Gupta (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16993219#comment-16993219
 ] 

Raman Gupta commented on KAFKA-8803:


[~guozhang] As per the OP, broker is 2.3.0 (with a patch for KAFKA-8715).

> Stream will not start due to TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> 
>
> Key: KAFKA-8803
> URL: https://issues.apache.org/jira/browse/KAFKA-8803
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Raman Gupta
>Assignee: Boyang Chen
>Priority: Major
> Attachments: logs.txt.gz, screenshot-1.png
>
>
> One streams app is consistently failing at startup with the following 
> exception:
> {code}
> 2019-08-14 17:02:29,568 ERROR --- [2ce1b-StreamThread-2] 
> org.apa.kaf.str.pro.int.StreamTask: task [0_36] Timeout 
> exception caught when initializing transactions for task 0_36. This might 
> happen if the broker is slow to respond, if the network connection to the 
> broker was interrupted, or if similar circumstances arise. You can increase 
> producer parameter `max.block.ms` to increase this timeout.
> org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> {code}
> These same brokers are used by many other streams without any issue, 
> including some in the very same processes for the stream which consistently 
> throws this exception.
> *UPDATE 08/16:*
> The very first instance of this error is August 13th 2019, 17:03:36.754 and 
> it happened for 4 different streams. For 3 of these streams, the error only 
> happened once, and then the stream recovered. For the 4th stream, the error 
> has continued to happen, and continues to happen now.
> I looked up the broker logs for this time, and see that at August 13th 2019, 
> 16:47:43, two of four brokers started reporting messages like this, for 
> multiple partitions:
> [2019-08-13 20:47:43,658] INFO [ReplicaFetcher replicaId=3, leaderId=1, 
> fetcherId=0] Retrying leaderEpoch request for partition xxx-1 as the leader 
> reported an error: UNKNOWN_LEADER_EPOCH (kafka.server.ReplicaFetcherThread)
> The UNKNOWN_LEADER_EPOCH messages continued for some time, and then stopped, 
> here is a view of the count of these messages over time:
>  !screenshot-1.png! 
> However, as noted, the stream task timeout error continues to happen.
> I use the static consumer group protocol with Kafka 2.3.0 clients and 2.3.0 
> broker. The broker has a patch for KAFKA-8773.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-8803) Stream will not start due to TimeoutException: Timeout expired after 60000milliseconds while awaiting InitProducerId

2019-12-09 Thread Raman Gupta (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991711#comment-16991711
 ] 

Raman Gupta commented on KAFKA-8803:


[~guozhang] Yes, the state is indefinite until the brokers are bounced. I 
believe I posted the complete broker logs earlier -- what is missing from them 
that you need?

I've had this issue happen to me three times, but as [~timvanlaer] says, it 
does not happen consistently, nor do I know how to reproduce it.

> Stream will not start due to TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> 
>
> Key: KAFKA-8803
> URL: https://issues.apache.org/jira/browse/KAFKA-8803
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Raman Gupta
>Assignee: Boyang Chen
>Priority: Major
> Attachments: logs.txt.gz, screenshot-1.png
>
>
> One streams app is consistently failing at startup with the following 
> exception:
> {code}
> 2019-08-14 17:02:29,568 ERROR --- [2ce1b-StreamThread-2] 
> org.apa.kaf.str.pro.int.StreamTask: task [0_36] Timeout 
> exception caught when initializing transactions for task 0_36. This might 
> happen if the broker is slow to respond, if the network connection to the 
> broker was interrupted, or if similar circumstances arise. You can increase 
> producer parameter `max.block.ms` to increase this timeout.
> org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> {code}
> These same brokers are used by many other streams without any issue, 
> including some in the very same processes for the stream which consistently 
> throws this exception.
> *UPDATE 08/16:*
> The very first instance of this error is August 13th 2019, 17:03:36.754 and 
> it happened for 4 different streams. For 3 of these streams, the error only 
> happened once, and then the stream recovered. For the 4th stream, the error 
> has continued to happen, and continues to happen now.
> I looked up the broker logs for this time, and see that at August 13th 2019, 
> 16:47:43, two of four brokers started reporting messages like this, for 
> multiple partitions:
> [2019-08-13 20:47:43,658] INFO [ReplicaFetcher replicaId=3, leaderId=1, 
> fetcherId=0] Retrying leaderEpoch request for partition xxx-1 as the leader 
> reported an error: UNKNOWN_LEADER_EPOCH (kafka.server.ReplicaFetcherThread)
> The UNKNOWN_LEADER_EPOCH messages continued for some time, and then stopped, 
> here is a view of the count of these messages over time:
>  !screenshot-1.png! 
> However, as noted, the stream task timeout error continues to happen.
> I use the static consumer group protocol with Kafka 2.3.0 clients and 2.3.0 
> broker. The broker has a patch for KAFKA-8773.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-8803) Stream will not start due to TimeoutException: Timeout expired after 60000milliseconds while awaiting InitProducerId

2019-12-06 Thread Raman Gupta (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16990054#comment-16990054
 ] 

Raman Gupta commented on KAFKA-8803:


[~bchen225242] I see the pull request adds a retry. That is fine and good, but 
I don't think it will solve this problem. I already effectively have retries, 
because my app, upon encountering this error, crashes with a fatal error and is 
restarted automatically, at which point it retries. If you look at the original 
posts above, you will see that some of my streams retried the restart every 
couple of minutes over the course of multiple weeks with no success, so adding 
a retry in the client isn't going to solve this problem.

You have to figure out what the underlying broker issue is that is causing this 
in the first place.
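
To make the failing call concrete: Streams is doing the equivalent of 
`initTransactions()` on a transactional producer, which is what sends the 
InitProducerId request. A minimal, hypothetical probe along these lines (the 
bootstrap servers, transactional id, and class name are placeholders) would 
presumably block for `max.block.ms` and then throw the same TimeoutException 
whenever the brokers are in this state, no matter how often it is retried:
{code:java}
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.errors.TimeoutException;
import org.apache.kafka.common.serialization.StringSerializer;

public class InitProducerIdProbe {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");  // placeholder
        props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "probe-txn-id"); // placeholder
        props.put(ProducerConfig.MAX_BLOCK_MS_CONFIG, 60000);
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // initTransactions() issues the InitProducerId request and blocks for
            // up to max.block.ms waiting for the transaction coordinator.
            producer.initTransactions();
            System.out.println("InitProducerId succeeded");
        } catch (TimeoutException e) {
            System.err.println("InitProducerId timed out: " + e.getMessage());
        }
    }
}
{code}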

> Stream will not start due to TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> 
>
> Key: KAFKA-8803
> URL: https://issues.apache.org/jira/browse/KAFKA-8803
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Raman Gupta
>Assignee: Boyang Chen
>Priority: Major
> Attachments: logs.txt.gz, screenshot-1.png
>
>
> One streams app is consistently failing at startup with the following 
> exception:
> {code}
> 2019-08-14 17:02:29,568 ERROR --- [2ce1b-StreamThread-2] 
> org.apa.kaf.str.pro.int.StreamTask: task [0_36] Timeout 
> exception caught when initializing transactions for task 0_36. This might 
> happen if the broker is slow to respond, if the network connection to the 
> broker was interrupted, or if similar circumstances arise. You can increase 
> producer parameter `max.block.ms` to increase this timeout.
> org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> {code}
> These same brokers are used by many other streams without any issue, 
> including some in the very same processes for the stream which consistently 
> throws this exception.
> *UPDATE 08/16:*
> The very first instance of this error is August 13th 2019, 17:03:36.754 and 
> it happened for 4 different streams. For 3 of these streams, the error only 
> happened once, and then the stream recovered. For the 4th stream, the error 
> has continued to happen, and continues to happen now.
> I looked up the broker logs for this time, and see that at August 13th 2019, 
> 16:47:43, two of four brokers started reporting messages like this, for 
> multiple partitions:
> [2019-08-13 20:47:43,658] INFO [ReplicaFetcher replicaId=3, leaderId=1, 
> fetcherId=0] Retrying leaderEpoch request for partition xxx-1 as the leader 
> reported an error: UNKNOWN_LEADER_EPOCH (kafka.server.ReplicaFetcherThread)
> The UNKNOWN_LEADER_EPOCH messages continued for some time, and then stopped, 
> here is a view of the count of these messages over time:
>  !screenshot-1.png! 
> However, as noted, the stream task timeout error continues to happen.
> I use the static consumer group protocol with Kafka 2.3.0 clients and 2.3.0 
> broker. The broker has a patch for KAFKA-8773.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-8803) Stream will not start due to TimeoutException: Timeout expired after 60000milliseconds while awaiting InitProducerId

2019-11-28 Thread Raman Gupta (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16984248#comment-16984248
 ] 

Raman Gupta commented on KAFKA-8803:


[~bchen225242] IIRC, it seems to happen randomly to specific streams upon 
application startup. Once it happens, the stream usually does not recover on 
its own unless the brokers are restarted, even after restarting the application.

I doubt the issue is broker overload. Note also that most of the streams in my 
system are totally fine -- when the issue (seemingly) randomly happens, it 
happens only to specific streams. If it were a broker overload issue, I would 
expect more streams to be affected, right? Plus these brokers are really not 
doing that much.

That being said, I can still provide the info. When you say "request metrics", 
what metrics are you referring to? The producer metrics?

Right now I don't have any streams with this issue, but I can make a note to 
check specific things the next time it happens.
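
In case it helps the next time this happens, here is a minimal sketch of how 
the client-side metrics could be dumped from the running application -- it 
assumes a KafkaStreams instance named `streams`, and the class and method names 
are illustrative:
{code:java}
import java.util.Map;
import org.apache.kafka.common.Metric;
import org.apache.kafka.common.MetricName;
import org.apache.kafka.streams.KafkaStreams;

public final class MetricsDump {
    // Print every client-side metric (streams, consumer, producer, admin)
    // exposed by the application, keyed by metric group, name, and tags.
    public static void dump(KafkaStreams streams) {
        Map<MetricName, ? extends Metric> metrics = streams.metrics();
        metrics.forEach((name, metric) -> System.out.println(
            name.group() + " / " + name.name() + " " + name.tags()
                + " = " + metric.metricValue()));
    }
}
{code}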

> Stream will not start due to TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> 
>
> Key: KAFKA-8803
> URL: https://issues.apache.org/jira/browse/KAFKA-8803
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Raman Gupta
>Assignee: Boyang Chen
>Priority: Major
> Attachments: logs.txt.gz, screenshot-1.png
>
>
> One streams app is consistently failing at startup with the following 
> exception:
> {code}
> 2019-08-14 17:02:29,568 ERROR --- [2ce1b-StreamThread-2] 
> org.apa.kaf.str.pro.int.StreamTask: task [0_36] Timeout 
> exception caught when initializing transactions for task 0_36. This might 
> happen if the broker is slow to respond, if the network connection to the 
> broker was interrupted, or if similar circumstances arise. You can increase 
> producer parameter `max.block.ms` to increase this timeout.
> org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> {code}
> These same brokers are used by many other streams without any issue, 
> including some in the very same processes for the stream which consistently 
> throws this exception.
> *UPDATE 08/16:*
> The very first instance of this error is August 13th 2019, 17:03:36.754 and 
> it happened for 4 different streams. For 3 of these streams, the error only 
> happened once, and then the stream recovered. For the 4th stream, the error 
> has continued to happen, and continues to happen now.
> I looked up the broker logs for this time, and see that at August 13th 2019, 
> 16:47:43, two of four brokers started reporting messages like this, for 
> multiple partitions:
> [2019-08-13 20:47:43,658] INFO [ReplicaFetcher replicaId=3, leaderId=1, 
> fetcherId=0] Retrying leaderEpoch request for partition xxx-1 as the leader 
> reported an error: UNKNOWN_LEADER_EPOCH (kafka.server.ReplicaFetcherThread)
> The UNKNOWN_LEADER_EPOCH messages continued for some time, and then stopped, 
> here is a view of the count of these messages over time:
>  !screenshot-1.png! 
> However, as noted, the stream task timeout error continues to happen.
> I use the static consumer group protocol with Kafka 2.3.0 clients and 2.3.0 
> broker. The broker has a patch for KAFKA-8773.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-8803) Stream will not start due to TimeoutException: Timeout expired after 60000milliseconds while awaiting InitProducerId

2019-11-27 Thread Raman Gupta (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16983584#comment-16983584
 ] 

Raman Gupta commented on KAFKA-8803:


[~bchen225242] actually only one EOS app is affected at a time (but a seemingly 
random one, not the same one every time). The rest of the EOS and non-EOS apps 
are fine.

> Stream will not start due to TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> 
>
> Key: KAFKA-8803
> URL: https://issues.apache.org/jira/browse/KAFKA-8803
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Raman Gupta
>Assignee: Boyang Chen
>Priority: Major
> Attachments: logs.txt.gz, screenshot-1.png
>
>
> One streams app is consistently failing at startup with the following 
> exception:
> {code}
> 2019-08-14 17:02:29,568 ERROR --- [2ce1b-StreamThread-2] 
> org.apa.kaf.str.pro.int.StreamTask: task [0_36] Timeout 
> exception caught when initializing transactions for task 0_36. This might 
> happen if the broker is slow to respond, if the network connection to the 
> broker was interrupted, or if similar circumstances arise. You can increase 
> producer parameter `max.block.ms` to increase this timeout.
> org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> {code}
> These same brokers are used by many other streams without any issue, 
> including some in the very same processes for the stream which consistently 
> throws this exception.
> *UPDATE 08/16:*
> The very first instance of this error is August 13th 2019, 17:03:36.754 and 
> it happened for 4 different streams. For 3 of these streams, the error only 
> happened once, and then the stream recovered. For the 4th stream, the error 
> has continued to happen, and continues to happen now.
> I looked up the broker logs for this time, and see that at August 13th 2019, 
> 16:47:43, two of four brokers started reporting messages like this, for 
> multiple partitions:
> [2019-08-13 20:47:43,658] INFO [ReplicaFetcher replicaId=3, leaderId=1, 
> fetcherId=0] Retrying leaderEpoch request for partition xxx-1 as the leader 
> reported an error: UNKNOWN_LEADER_EPOCH (kafka.server.ReplicaFetcherThread)
> The UNKNOWN_LEADER_EPOCH messages continued for some time, and then stopped, 
> here is a view of the count of these messages over time:
>  !screenshot-1.png! 
> However, as noted, the stream task timeout error continues to happen.
> I use the static consumer group protocol with Kafka 2.3.0 clients and 2.3.0 
> broker. The broker has a patch for KAFKA-8773.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-8803) Stream will not start due to TimeoutException: Timeout expired after 60000milliseconds while awaiting InitProducerId

2019-11-19 Thread Raman Gupta (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16977577#comment-16977577
 ] 

Raman Gupta commented on KAFKA-8803:


We had this problem again today, and checked on each node restart whether the 
problem was fixed. It went away after restarting the third of four nodes.

> Stream will not start due to TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> 
>
> Key: KAFKA-8803
> URL: https://issues.apache.org/jira/browse/KAFKA-8803
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Raman Gupta
>Priority: Major
> Attachments: logs.txt.gz, screenshot-1.png
>
>
> One streams app is consistently failing at startup with the following 
> exception:
> {code}
> 2019-08-14 17:02:29,568 ERROR --- [2ce1b-StreamThread-2] 
> org.apa.kaf.str.pro.int.StreamTask: task [0_36] Timeout 
> exception caught when initializing transactions for task 0_36. This might 
> happen if the broker is slow to respond, if the network connection to the 
> broker was interrupted, or if similar circumstances arise. You can increase 
> producer parameter `max.block.ms` to increase this timeout.
> org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> {code}
> These same brokers are used by many other streams without any issue, 
> including some in the very same processes for the stream which consistently 
> throws this exception.
> *UPDATE 08/16:*
> The very first instance of this error is August 13th 2019, 17:03:36.754 and 
> it happened for 4 different streams. For 3 of these streams, the error only 
> happened once, and then the stream recovered. For the 4th stream, the error 
> has continued to happen, and continues to happen now.
> I looked up the broker logs for this time, and see that at August 13th 2019, 
> 16:47:43, two of four brokers started reporting messages like this, for 
> multiple partitions:
> [2019-08-13 20:47:43,658] INFO [ReplicaFetcher replicaId=3, leaderId=1, 
> fetcherId=0] Retrying leaderEpoch request for partition xxx-1 as the leader 
> reported an error: UNKNOWN_LEADER_EPOCH (kafka.server.ReplicaFetcherThread)
> The UNKNOWN_LEADER_EPOCH messages continued for some time, and then stopped, 
> here is a view of the count of these messages over time:
>  !screenshot-1.png! 
> However, as noted, the stream task timeout error continues to happen.
> I use the static consumer group protocol with Kafka 2.3.0 clients and 2.3.0 
> broker. The broker has a patch for KAFKA-8773.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-8803) Stream will not start due to TimeoutException: Timeout expired after 60000milliseconds while awaiting InitProducerId

2019-10-21 Thread Raman Gupta (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16956165#comment-16956165
 ] 

Raman Gupta commented on KAFKA-8803:


[~guozhang] I've provided all the logs, including broker side, as attachments 
to this issue. The main thing that seemed irregular was `UNKNOWN_LEADER_EPOCH` 
errors.

> Stream will not start due to TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> 
>
> Key: KAFKA-8803
> URL: https://issues.apache.org/jira/browse/KAFKA-8803
> Project: Kafka
>  Issue Type: Bug
>Reporter: Raman Gupta
>Priority: Major
> Attachments: logs.txt.gz, screenshot-1.png
>
>
> One streams app is consistently failing at startup with the following 
> exception:
> {code}
> 2019-08-14 17:02:29,568 ERROR --- [2ce1b-StreamThread-2] 
> org.apa.kaf.str.pro.int.StreamTask: task [0_36] Timeout 
> exception caught when initializing transactions for task 0_36. This might 
> happen if the broker is slow to respond, if the network connection to the 
> broker was interrupted, or if similar circumstances arise. You can increase 
> producer parameter `max.block.ms` to increase this timeout.
> org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> {code}
> These same brokers are used by many other streams without any issue, 
> including some in the very same processes for the stream which consistently 
> throws this exception.
> *UPDATE 08/16:*
> The very first instance of this error is August 13th 2019, 17:03:36.754 and 
> it happened for 4 different streams. For 3 of these streams, the error only 
> happened once, and then the stream recovered. For the 4th stream, the error 
> has continued to happen, and continues to happen now.
> I looked up the broker logs for this time, and see that at August 13th 2019, 
> 16:47:43, two of four brokers started reporting messages like this, for 
> multiple partitions:
> [2019-08-13 20:47:43,658] INFO [ReplicaFetcher replicaId=3, leaderId=1, 
> fetcherId=0] Retrying leaderEpoch request for partition xxx-1 as the leader 
> reported an error: UNKNOWN_LEADER_EPOCH (kafka.server.ReplicaFetcherThread)
> The UNKNOWN_LEADER_EPOCH messages continued for some time, and then stopped, 
> here is a view of the count of these messages over time:
>  !screenshot-1.png! 
> However, as noted, the stream task timeout error continues to happen.
> I use the static consumer group protocol with Kafka 2.3.0 clients and 2.3.0 
> broker. The broker has a patch for KAFKA-8773.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-8803) Stream will not start due to TimeoutException: Timeout expired after 60000milliseconds while awaiting InitProducerId

2019-10-20 Thread Raman Gupta (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955404#comment-16955404
 ] 

Raman Gupta commented on KAFKA-8803:


I've restarted my entire Kafka cluster, one node at a time, and that seems to 
have "solved" the problem. I have no idea what happened here, but one or more 
Kafka brokers must have been "broken" somehow.

> Stream will not start due to TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> 
>
> Key: KAFKA-8803
> URL: https://issues.apache.org/jira/browse/KAFKA-8803
> Project: Kafka
>  Issue Type: Bug
>Reporter: Raman Gupta
>Priority: Major
> Attachments: logs.txt.gz, screenshot-1.png
>
>
> One streams app is consistently failing at startup with the following 
> exception:
> {code}
> 2019-08-14 17:02:29,568 ERROR --- [2ce1b-StreamThread-2] 
> org.apa.kaf.str.pro.int.StreamTask: task [0_36] Timeout 
> exception caught when initializing transactions for task 0_36. This might 
> happen if the broker is slow to respond, if the network connection to the 
> broker was interrupted, or if similar circumstances arise. You can increase 
> producer parameter `max.block.ms` to increase this timeout.
> org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> {code}
> These same brokers are used by many other streams without any issue, 
> including some in the very same processes for the stream which consistently 
> throws this exception.
> *UPDATE 08/16:*
> The very first instance of this error is August 13th 2019, 17:03:36.754 and 
> it happened for 4 different streams. For 3 of these streams, the error only 
> happened once, and then the stream recovered. For the 4th stream, the error 
> has continued to happen, and continues to happen now.
> I looked up the broker logs for this time, and see that at August 13th 2019, 
> 16:47:43, two of four brokers started reporting messages like this, for 
> multiple partitions:
> [2019-08-13 20:47:43,658] INFO [ReplicaFetcher replicaId=3, leaderId=1, 
> fetcherId=0] Retrying leaderEpoch request for partition xxx-1 as the leader 
> reported an error: UNKNOWN_LEADER_EPOCH (kafka.server.ReplicaFetcherThread)
> The UNKNOWN_LEADER_EPOCH messages continued for some time, and then stopped, 
> here is a view of the count of these messages over time:
>  !screenshot-1.png! 
> However, as noted, the stream task timeout error continues to happen.
> I use the static consumer group protocol with Kafka 2.3.0 clients and 2.3.0 
> broker. The broker has a patch for KAFKA-8773.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (KAFKA-8803) Stream will not start due to TimeoutException: Timeout expired after 60000milliseconds while awaiting InitProducerId

2019-10-19 Thread Raman Gupta (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955393#comment-16955393
 ] 

Raman Gupta edited comment on KAFKA-8803 at 10/20/19 5:34 AM:
--

[~bbejeck] Now the error is happening again, for two different streams than the 
stream which was failing with this error before. Both of streams now 
experiencing issus have also been running just fine until now, and changing 
`max.block.ms` for them. I still get the same error message. After setting:
{code:java}
props.put(StreamsConfig.producerPrefix(ProducerConfig.MAX_BLOCK_MS_CONFIG), 
1200000);{code}
it takes longer for the error to occur but after 20 minutes it still does:
{code:java}
2019-10-19 22:10:52,910 ERROR --- [c892e-StreamThread-1] 
org.apa.kaf.str.pro.int.StreamTask    : task [0_1] Timeout 
exception caught when initializing transactions for task 0_1. This might happen 
if the broker is slow to respond, if the network connection to the broker was 
interrupted, or if similar circumstances arise. You can increase producer 
parameter `max.block.ms` to increase this timeout
org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
1200000milliseconds while awaiting InitProducerId{code}
So at this point, short of waiting 17 days for the stream to finally recover on 
its own as it did before, I don't know how to solve this, other than ditching 
Kafka entirely, which, admittedly, is an idea looking better and better.
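For reference, a minimal sketch of this kind of override (assuming a Streams app with exactly-once enabled; the application id and bootstrap servers below are placeholders, and the 1200000 value simply mirrors the 20-minute timeout observed above):
{code:java}
import java.util.Properties;

import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.streams.StreamsConfig;

public class MaxBlockMsExample {
    public static Properties streamsProps() {
        Properties props = new Properties();
        // Placeholder application id and broker list -- not the real deployment values.
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "example-stream");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // Exactly-once processing is what makes Streams send InitProducerId while
        // initializing transactions for each task at startup.
        props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE);
        // Raise the producer-level max.block.ms (20 minutes here) via the producer prefix.
        props.put(StreamsConfig.producerPrefix(ProducerConfig.MAX_BLOCK_MS_CONFIG), 1200000);
        return props;
    }
}
{code}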

 


was (Author: rocketraman):
[~bbejeck] Now the error is happening again, for two different streams than the 
stream before. Both of these streams have also been running just fine 
until now, and changing `max.block.ms` for the stream has no effect. I still 
get the same error message. After setting:
{code:java}
props.put(StreamsConfig.producerPrefix(ProducerConfig.MAX_BLOCK_MS_CONFIG), 
1200000);{code}
it takes longer for the error to occur but after 20 minutes it still does:
{code:java}
2019-10-19 22:10:52,910 ERROR --- [c892e-StreamThread-1] 
org.apa.kaf.str.pro.int.StreamTask    : task [0_1] Timeout 
exception caught when initializing transactions for task 0_1. This might happen 
if the broker is slow to respond, if the network connection to the broker was 
interrupted, or if similar circumstances arise. You can increase producer 
parameter `max.block.ms` to increase this timeout
org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
1200000milliseconds while awaiting InitProducerId{code}
So at this point, short of waiting 17 days for the stream to finally recover on 
its own as it did before, I don't know how to solve this, other than ditching 
Kafka entirely, which, admittedly, is an idea looking better and better.

 

> Stream will not start due to TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> 
>
> Key: KAFKA-8803
> URL: https://issues.apache.org/jira/browse/KAFKA-8803
> Project: Kafka
>  Issue Type: Bug
>Reporter: Raman Gupta
>Priority: Major
> Attachments: logs.txt.gz, screenshot-1.png
>
>
> One streams app is consistently failing at startup with the following 
> exception:
> {code}
> 2019-08-14 17:02:29,568 ERROR --- [2ce1b-StreamThread-2] 
> org.apa.kaf.str.pro.int.StreamTask: task [0_36] Timeout 
> exception caught when initializing transactions for task 0_36. This might 
> happen if the broker is slow to respond, if the network connection to the 
> broker was interrupted, or if similar circumstances arise. You can increase 
> producer parameter `max.block.ms` to increase this timeout.
> org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> {code}
> These same brokers are used by many other streams without any issue, 
> including some in the very same processes for the stream which consistently 
> throws this exception.
> *UPDATE 08/16:*
> The very first instance of this error is August 13th 2019, 17:03:36.754 and 
> it happened for 4 different streams. For 3 of these streams, the error only 
> happened once, and then the stream recovered. For the 4th stream, the error 
> has continued to happen, and continues to happen now.
> I looked up the broker logs for this time, and see that at August 13th 2019, 
> 16:47:43, two of four brokers started reporting messages like this, for 
> multiple partitions:
> [2019-08-13 20:47:43,658] INFO [ReplicaFetcher replicaId=3, leaderId=1, 
> fetcherId=0] Retrying leaderEpoch request for partition xxx-1 as the leader 
> reported an error: UNKNOWN_LEADER_EPOCH (kafka.server.ReplicaFetcherThread)
> The UNKNOWN_LEADER_EPOCH messages continued for some time, and then stopped, 

[jira] [Commented] (KAFKA-8803) Stream will not start due to TimeoutException: Timeout expired after 60000milliseconds while awaiting InitProducerId

2019-10-19 Thread Raman Gupta (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955393#comment-16955393
 ] 

Raman Gupta commented on KAFKA-8803:


[~bbejeck] Now the error is happening again, for two different streams than the 
stream before. Both of these streams have also been running just fine 
until now, and changing `max.block.ms` for the stream has no effect. I still 
get the same error message. After setting:
{code:java}
props.put(StreamsConfig.producerPrefix(ProducerConfig.MAX_BLOCK_MS_CONFIG), 
1200000);{code}
it takes longer for the error to occur but after 20 minutes it still does:
{code:java}
2019-10-19 22:10:52,910 ERROR --- [c892e-StreamThread-1] 
org.apa.kaf.str.pro.int.StreamTask    : task [0_1] Timeout 
exception caught when initializing transactions for task 0_1. This might happen 
if the broker is slow to respond, if the network connection to the broker was 
interrupted, or if similar circumstances arise. You can increase producer 
parameter `max.block.ms` to increase this timeout
org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
1200000milliseconds while awaiting InitProducerId{code}
So at this point, short of waiting 17 days for the stream to finally recover on 
its own as it did before, I don't know how to solve this, other than ditching 
Kafka entirely, which, admittedly, is an idea looking better and better.

 

> Stream will not start due to TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> 
>
> Key: KAFKA-8803
> URL: https://issues.apache.org/jira/browse/KAFKA-8803
> Project: Kafka
>  Issue Type: Bug
>Reporter: Raman Gupta
>Priority: Major
> Attachments: logs.txt.gz, screenshot-1.png
>
>
> One streams app is consistently failing at startup with the following 
> exception:
> {code}
> 2019-08-14 17:02:29,568 ERROR --- [2ce1b-StreamThread-2] 
> org.apa.kaf.str.pro.int.StreamTask: task [0_36] Timeout 
> exception caught when initializing transactions for task 0_36. This might 
> happen if the broker is slow to respond, if the network connection to the 
> broker was interrupted, or if similar circumstances arise. You can increase 
> producer parameter `max.block.ms` to increase this timeout.
> org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> {code}
> These same brokers are used by many other streams without any issue, 
> including some in the very same processes for the stream which consistently 
> throws this exception.
> *UPDATE 08/16:*
> The very first instance of this error is August 13th 2019, 17:03:36.754 and 
> it happened for 4 different streams. For 3 of these streams, the error only 
> happened once, and then the stream recovered. For the 4th stream, the error 
> has continued to happen, and continues to happen now.
> I looked up the broker logs for this time, and see that at August 13th 2019, 
> 16:47:43, two of four brokers started reporting messages like this, for 
> multiple partitions:
> [2019-08-13 20:47:43,658] INFO [ReplicaFetcher replicaId=3, leaderId=1, 
> fetcherId=0] Retrying leaderEpoch request for partition xxx-1 as the leader 
> reported an error: UNKNOWN_LEADER_EPOCH (kafka.server.ReplicaFetcherThread)
> The UNKNOWN_LEADER_EPOCH messages continued for some time, and then stopped, 
> here is a view of the count of these messages over time:
>  !screenshot-1.png! 
> However, as noted, the stream task timeout error continues to happen.
> I use the static consumer group protocol with Kafka 2.3.0 clients and 2.3.0 
> broker. The broker has a patch for KAFKA-8773.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-8803) Stream will not start due to TimeoutException: Timeout expired after 60000milliseconds while awaiting InitProducerId

2019-10-19 Thread Raman Gupta (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955262#comment-16955262
 ] 

Raman Gupta commented on KAFKA-8803:


And now suddenly I have this same problem again... this is super-frustrating.

> Stream will not start due to TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> 
>
> Key: KAFKA-8803
> URL: https://issues.apache.org/jira/browse/KAFKA-8803
> Project: Kafka
>  Issue Type: Bug
>Reporter: Raman Gupta
>Priority: Major
> Attachments: logs.txt.gz, screenshot-1.png
>
>
> One streams app is consistently failing at startup with the following 
> exception:
> {code}
> 2019-08-14 17:02:29,568 ERROR --- [2ce1b-StreamThread-2] 
> org.apa.kaf.str.pro.int.StreamTask: task [0_36] Timeout 
> exception caught when initializing transactions for task 0_36. This might 
> happen if the broker is slow to respond, if the network connection to the 
> broker was interrupted, or if similar circumstances arise. You can increase 
> producer parameter `max.block.ms` to increase this timeout.
> org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> {code}
> These same brokers are used by many other streams without any issue, 
> including some in the very same processes for the stream which consistently 
> throws this exception.
> *UPDATE 08/16:*
> The very first instance of this error is August 13th 2019, 17:03:36.754 and 
> it happened for 4 different streams. For 3 of these streams, the error only 
> happened once, and then the stream recovered. For the 4th stream, the error 
> has continued to happen, and continues to happen now.
> I looked up the broker logs for this time, and see that at August 13th 2019, 
> 16:47:43, two of four brokers started reporting messages like this, for 
> multiple partitions:
> [2019-08-13 20:47:43,658] INFO [ReplicaFetcher replicaId=3, leaderId=1, 
> fetcherId=0] Retrying leaderEpoch request for partition xxx-1 as the leader 
> reported an error: UNKNOWN_LEADER_EPOCH (kafka.server.ReplicaFetcherThread)
> The UNKNOWN_LEADER_EPOCH messages continued for some time, and then stopped, 
> here is a view of the count of these messages over time:
>  !screenshot-1.png! 
> However, as noted, the stream task timeout error continues to happen.
> I use the static consumer group protocol with Kafka 2.3.0 clients and 2.3.0 
> broker. The broker has a patch for KAFKA-8773.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-8966) Stream state does not transition to RUNNING on client, broker consumer group shows RUNNING

2019-10-01 Thread Raman Gupta (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raman Gupta updated KAFKA-8966:
---
Description: 
I have a Kafka stream that has been running fine until recently. The new 
behavior I see is that the stream state on the client goes from CREATED to 
REBALANCING, but never transitions from REBALANCING to RUNNING.

However, at the same time, if I look at the offsets of the corresponding 
consumer group, the consumer group appears to be consuming from the topic and 
has no lag. And yet, the client never made a state change to RUNNING. This is 
confirmed by calling `streams.close` on the stream and noting the state change 
goes from REBALANCING to PENDING_SHUTDOWN instead of RUNNING to 
PENDING_SHUTDOWN as expected.

I use the state change to enable queries on the stream store -- if the state 
change listener never triggers to the RUNNING state, there is no way to know 
when the client is available for queries.
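
As a rough illustration of the pattern described above (a sketch only, with hypothetical class and field names, not the actual application code), the listener gates store queries on the RUNNING state:
{code:java}
import java.util.concurrent.atomic.AtomicBoolean;

import org.apache.kafka.streams.KafkaStreams;

public class QueryReadinessGate {
    private final AtomicBoolean queryable = new AtomicBoolean(false);

    public void attach(KafkaStreams streams) {
        streams.setStateListener((newState, oldState) -> {
            // Only serve interactive queries once the client has actually reached RUNNING.
            if (newState == KafkaStreams.State.RUNNING) {
                queryable.set(true);
            } else if (newState == KafkaStreams.State.REBALANCING) {
                // Stores may be migrating between instances; hold queries until RUNNING again.
                queryable.set(false);
            }
        });
    }

    public boolean isQueryable() {
        return queryable.get();
    }
}
{code}
If the REBALANCING-to-RUNNING transition never fires, as described here, the gate never opens and the stores are never exposed for queries.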

Yes, I have confirmed it's the correct consumer group. Yes, the consumer group 
has no consumers when I shut down the client stream.

Server logs:

kafka-2 kafka 2019-10-01T16:59:36.348859731Z [2019-10-01 16:59:36,348] INFO 
[GroupCoordinator 2]: Preparing to rebalance group 
arena-rg-uiService-fileStatusStore-stream in state PreparingRebalance with old 
generation 0 (__consumer_offsets-42) (reason: Adding new member 
arena-rg-uiService-fileStatusStore-stream-0a954f60-f8a3-4f13-8d9e-6caa63773dd2-StreamThread-1-consumer-325a6889-659f-48cb-b308-0d626b573944
 with group instanceid None) (kafka.coordinator.group.GroupCoordinator)
kafka-2 kafka 2019-10-01T17:00:06.349171842Z [2019-10-01 17:00:06,348] INFO 
[GroupCoordinator 2]: Stabilized group 
arena-rg-uiService-fileStatusStore-stream generation 1 (__consumer_offsets-42) 
(kafka.coordinator.group.GroupCoordinator)
kafka-2 kafka 2019-10-01T17:00:06.604980028Z [2019-10-01 17:00:06,604] INFO 
[GroupCoordinator 2]: Assignment received from leader for group 
arena-rg-uiService-fileStatusStore-stream for generation 1 
(kafka.coordinator.group.GroupCoordinator)



  was:
I have a Kafka stream that has been running fine until recently. The new 
behavior I see is that the stream state on the client goes from CREATED to 
REBALANCING, but never transitions from REBALANCING to RUNNING.

However, at the same time, if I look at the offsets of the corresponding 
consumer group, the consumer group appears to be consuming from the topic and 
has no lag. And yet, the client never made a state change to RUNNING. This is 
confirmed by calling `streams.close` on the stream and noting the state change 
goes from REBALANCING to PENDING_SHUTDOWN instead of RUNNING to 
PENDING_SHUTDOWN as expected.

I use the state change to enable queries on the stream store -- if the state 
change listener never triggers to the RUNNING state, there is no way to know 
when the client is available for queries.

Yes, I have confirmed it's the correct consumer group. Yes, the consumer group 
has no consumers when I shut down the client stream.


> Stream state does not transition to RUNNING on client, broker consumer group 
> shows RUNNING
> --
>
> Key: KAFKA-8966
> URL: https://issues.apache.org/jira/browse/KAFKA-8966
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.3.0
>Reporter: Raman Gupta
>Priority: Critical
>
> I have a Kafka stream that has been running fine until recently. The new 
> behavior I see is that the stream state on the client goes from CREATED to 
> REBALANCING, but never transitions from REBALANCING to RUNNING.
> However, at the same time, if I look at the offsets of the corresponding 
> consumer group, the consumer group appears to be consuming from the topic and 
> has no lag. And yet, the client never made a state change to RUNNING. This is 
> confirmed by calling `streams.close` on the stream and noting the state 
> change goes from REBALANCING to PENDING_SHUTDOWN instead of RUNNING to 
> PENDING_SHUTDOWN as expected.
> I use the state change to enable queries on the stream store -- if the state 
> change listener never triggers to the RUNNING state, there is no way to know 
> when the client is available for queries.
> Yes, I have confirmed it's the correct consumer group. Yes, the consumer group 
> has no consumers when I shut down the client stream.
> Server logs:
> kafka-2 kafka 2019-10-01T16:59:36.348859731Z [2019-10-01 16:59:36,348] INFO 
> [GroupCoordinator 2]: Preparing to rebalance group 
> arena-rg-uiService-fileStatusStore-stream in state PreparingRebalance with 
> old generation 0 (__consumer_offsets-42) (reason: Adding new member 
> 

[jira] [Created] (KAFKA-8966) Stream state does not transition to RUNNING on client, broker consumer group shows RUNNING

2019-10-01 Thread Raman Gupta (Jira)
Raman Gupta created KAFKA-8966:
--

 Summary: Stream state does not transition to RUNNING on client, 
broker consumer group shows RUNNING
 Key: KAFKA-8966
 URL: https://issues.apache.org/jira/browse/KAFKA-8966
 Project: Kafka
  Issue Type: Bug
  Components: streams
Affects Versions: 2.3.0
Reporter: Raman Gupta


I have a Kafka stream that has been running fine until recently. The new 
behavior I see is that the stream state on the client goes from CREATED to 
REBALANCING, but never transitions from REBALANCING to RUNNING.

However, at the same time, if I look at the offsets of the corresponding 
consumer group, the consumer group appears to be consuming from the topic and 
has no lag. And yet, the client never made a state change to RUNNING. This is 
confirmed by calling `streams.close` on the stream and noting the state change 
goes from REBALANCING to PENDING_SHUTDOWN instead of RUNNING to 
PENDING_SHUTDOWN as expected.

I use the state change to enable queries on the stream store -- if the state 
change listener never triggers to the RUNNING state, there is no way to know 
when the client is available for queries.

Yes, I have confirmed it's the correct consumer group. Yes, the consumer group 
has no consumers when I shut down the client stream.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-8922) Failed to get end offsets for topic partitions of global store

2019-09-18 Thread Raman Gupta (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raman Gupta resolved KAFKA-8922.

Resolution: Invalid

Closing as the error had nothing to do with streams -- just general broker 
unavailability which was reported with a poor error message by the client. 
Still don't know why the brokers were unavailable but, hey, that's Kafka!

> Failed to get end offsets for topic partitions of global store
> --
>
> Key: KAFKA-8922
> URL: https://issues.apache.org/jira/browse/KAFKA-8922
> Project: Kafka
>  Issue Type: Bug
>Reporter: Raman Gupta
>Priority: Major
>
> I have a Kafka stream that fails with this error on startup every time:
> {code}
> org.apache.kafka.streams.errors.StreamsException: Failed to get end offsets 
> for topic partitions of global store test-uiService-dlq-events-table-store 
> after 0 retry attempts. You can increase the number of retries via 
> configuration parameter `retries`.
> at 
> org.apache.kafka.streams.processor.internals.GlobalStateManagerImpl.register(GlobalStateManagerImpl.java:186)
>  ~[kafka-streams-2.3.0.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.AbstractProcessorContext.register(AbstractProcessorContext.java:101)
>  ~[kafka-streams-2.3.0.jar:?]
> at 
> org.apache.kafka.streams.state.internals.RocksDBStore.init(RocksDBStore.java:207)
>  ~[kafka-streams-2.3.0.jar:?]
> at 
> org.apache.kafka.streams.state.internals.KeyValueToTimestampedKeyValueByteStoreAdapter.init(KeyValueToTimestampedKeyValueByteStoreAdapter.java:87)
>  ~[kafka-streams-2.3.0.jar:?]
> at 
> org.apache.kafka.streams.state.internals.WrappedStateStore.init(WrappedStateStore.java:48)
>  ~[kafka-streams-2.3.0.jar:?]
> at 
> org.apache.kafka.streams.state.internals.CachingKeyValueStore.init(CachingKeyValueStore.java:58)
>  ~[kafka-streams-2.3.0.jar:?]
> at 
> org.apache.kafka.streams.state.internals.WrappedStateStore.init(WrappedStateStore.java:48)
>  ~[kafka-streams-2.3.0.jar:?]
> at 
> org.apache.kafka.streams.state.internals.MeteredKeyValueStore.init(MeteredKeyValueStore.java:112)
>  ~[kafka-streams-2.3.0.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.GlobalStateManagerImpl.initialize(GlobalStateManagerImpl.java:123)
>  ~[kafka-streams-2.3.0.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.GlobalStateUpdateTask.initialize(GlobalStateUpdateTask.java:61)
>  ~[kafka-streams-2.3.0.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.GlobalStreamThread$StateConsumer.initialize(GlobalStreamThread.java:229)
>  ~[kafka-streams-2.3.0.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.GlobalStreamThread.initialize(GlobalStreamThread.java:345)
>  ~[kafka-streams-2.3.0.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.GlobalStreamThread.run(GlobalStreamThread.java:270)
>  ~[kafka-streams-2.3.0.jar:?]
> Caused by: org.apache.kafka.common.errors.TimeoutException: Failed to get 
> offsets by times in 30001ms
> {code}
> The stream was working fine and then this started happening.
> The stream now throws this error on every start. I am now going to attempt to 
> reset the stream and delete its local state.
> I hate to say it, but Kafka Streams suck. It's problem after problem.
> UPDATE: Some more info: it appears that the brokers may have gotten into some 
> kind of crazy state, for an unknown reason, and now they are just shrinking 
> and expanding ISRs repeatedly. Trying to figure out the root cause of this 
> craziness.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-8922) Failed to get end offsets for topic partitions of global store

2019-09-18 Thread Raman Gupta (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raman Gupta updated KAFKA-8922:
---
Description: 
I have a Kafka stream that fails with this error on startup every time:

{code}
org.apache.kafka.streams.errors.StreamsException: Failed to get end offsets for 
topic partitions of global store test-uiService-dlq-events-table-store after 0 
retry attempts. You can increase the number of retries via configuration 
parameter `retries`.
at 
org.apache.kafka.streams.processor.internals.GlobalStateManagerImpl.register(GlobalStateManagerImpl.java:186)
 ~[kafka-streams-2.3.0.jar:?]
at 
org.apache.kafka.streams.processor.internals.AbstractProcessorContext.register(AbstractProcessorContext.java:101)
 ~[kafka-streams-2.3.0.jar:?]
at 
org.apache.kafka.streams.state.internals.RocksDBStore.init(RocksDBStore.java:207)
 ~[kafka-streams-2.3.0.jar:?]
at 
org.apache.kafka.streams.state.internals.KeyValueToTimestampedKeyValueByteStoreAdapter.init(KeyValueToTimestampedKeyValueByteStoreAdapter.java:87)
 ~[kafka-streams-2.3.0.jar:?]
at 
org.apache.kafka.streams.state.internals.WrappedStateStore.init(WrappedStateStore.java:48)
 ~[kafka-streams-2.3.0.jar:?]
at 
org.apache.kafka.streams.state.internals.CachingKeyValueStore.init(CachingKeyValueStore.java:58)
 ~[kafka-streams-2.3.0.jar:?]
at 
org.apache.kafka.streams.state.internals.WrappedStateStore.init(WrappedStateStore.java:48)
 ~[kafka-streams-2.3.0.jar:?]
at 
org.apache.kafka.streams.state.internals.MeteredKeyValueStore.init(MeteredKeyValueStore.java:112)
 ~[kafka-streams-2.3.0.jar:?]
at 
org.apache.kafka.streams.processor.internals.GlobalStateManagerImpl.initialize(GlobalStateManagerImpl.java:123)
 ~[kafka-streams-2.3.0.jar:?]
at 
org.apache.kafka.streams.processor.internals.GlobalStateUpdateTask.initialize(GlobalStateUpdateTask.java:61)
 ~[kafka-streams-2.3.0.jar:?]
at 
org.apache.kafka.streams.processor.internals.GlobalStreamThread$StateConsumer.initialize(GlobalStreamThread.java:229)
 ~[kafka-streams-2.3.0.jar:?]
at 
org.apache.kafka.streams.processor.internals.GlobalStreamThread.initialize(GlobalStreamThread.java:345)
 ~[kafka-streams-2.3.0.jar:?]
at 
org.apache.kafka.streams.processor.internals.GlobalStreamThread.run(GlobalStreamThread.java:270)
 ~[kafka-streams-2.3.0.jar:?]
Caused by: org.apache.kafka.common.errors.TimeoutException: Failed to get 
offsets by times in 30001ms
{code}

The stream was working fine and then this started happening.

The stream now throws this error on every start. I am now going to attempt to 
reset the stream and delete its local state.

I hate to say it, but Kafka Streams suck. It's problem after problem.

UPDATE: Some more info: it appears that the brokers may have gotten into some 
kind of crazy state, for an unknown reason, and now they are just shrinking and 
expanding ISRs repeatedly. Trying to figure out the root cause of this 
craziness.
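
A sketch of the configuration hint from the error message above (config names as exposed by the 2.3 StreamsConfig; the values below are illustrative only, not recommendations):
{code:java}
import java.util.Properties;

import org.apache.kafka.streams.StreamsConfig;

public class GlobalStoreRetryConfig {
    public static Properties withRetries(Properties props) {
        // The error reports "after 0 retry attempts", i.e. the effective retries
        // value was 0. Raising it gives the global store more chances to fetch
        // end offsets before the global stream thread gives up.
        props.put(StreamsConfig.RETRIES_CONFIG, 10);
        props.put(StreamsConfig.RETRY_BACKOFF_MS_CONFIG, 1000L);
        return props;
    }
}
{code}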

  was:
I have a Kafka stream that fails with this error on startup every time:

{code}
org.apache.kafka.streams.errors.StreamsException: Failed to get end offsets for 
topic partitions of global store test-uiService-dlq-events-table-store after 0 
retry attempts. You can increase the number of retries via configuration 
parameter `retries`.
at 
org.apache.kafka.streams.processor.internals.GlobalStateManagerImpl.register(GlobalStateManagerImpl.java:186)
 ~[kafka-streams-2.3.0.jar:?]
at 
org.apache.kafka.streams.processor.internals.AbstractProcessorContext.register(AbstractProcessorContext.java:101)
 ~[kafka-streams-2.3.0.jar:?]
at 
org.apache.kafka.streams.state.internals.RocksDBStore.init(RocksDBStore.java:207)
 ~[kafka-streams-2.3.0.jar:?]
at 
org.apache.kafka.streams.state.internals.KeyValueToTimestampedKeyValueByteStoreAdapter.init(KeyValueToTimestampedKeyValueByteStoreAdapter.java:87)
 ~[kafka-streams-2.3.0.jar:?]
at 
org.apache.kafka.streams.state.internals.WrappedStateStore.init(WrappedStateStore.java:48)
 ~[kafka-streams-2.3.0.jar:?]
at 
org.apache.kafka.streams.state.internals.CachingKeyValueStore.init(CachingKeyValueStore.java:58)
 ~[kafka-streams-2.3.0.jar:?]
at 
org.apache.kafka.streams.state.internals.WrappedStateStore.init(WrappedStateStore.java:48)
 ~[kafka-streams-2.3.0.jar:?]
at 
org.apache.kafka.streams.state.internals.MeteredKeyValueStore.init(MeteredKeyValueStore.java:112)
 ~[kafka-streams-2.3.0.jar:?]
at 
org.apache.kafka.streams.processor.internals.GlobalStateManagerImpl.initialize(GlobalStateManagerImpl.java:123)
 ~[kafka-streams-2.3.0.jar:?]
at 
org.apache.kafka.streams.processor.internals.GlobalStateUpdateTask.initialize(GlobalStateUpdateTask.java:61)
 ~[kafka-streams-2.3.0.jar:?]
at 
org.apache.kafka.streams.processor.internals.GlobalStreamThread$StateConsumer.initialize(GlobalStreamThread.java:229)
 ~[kafka-streams-2.3.0.jar:?]
at 

[jira] [Updated] (KAFKA-8922) Failed to get end offsets for topic partitions of global store

2019-09-18 Thread Raman Gupta (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raman Gupta updated KAFKA-8922:
---
Description: 
I have a Kafka stream that fails with this error on startup every time:

{code}
org.apache.kafka.streams.errors.StreamsException: Failed to get end offsets for 
topic partitions of global store test-uiService-dlq-events-table-store after 0 
retry attempts. You can increase the number of retries via configuration 
parameter `retries`.
at 
org.apache.kafka.streams.processor.internals.GlobalStateManagerImpl.register(GlobalStateManagerImpl.java:186)
 ~[kafka-streams-2.3.0.jar:?]
at 
org.apache.kafka.streams.processor.internals.AbstractProcessorContext.register(AbstractProcessorContext.java:101)
 ~[kafka-streams-2.3.0.jar:?]
at 
org.apache.kafka.streams.state.internals.RocksDBStore.init(RocksDBStore.java:207)
 ~[kafka-streams-2.3.0.jar:?]
at 
org.apache.kafka.streams.state.internals.KeyValueToTimestampedKeyValueByteStoreAdapter.init(KeyValueToTimestampedKeyValueByteStoreAdapter.java:87)
 ~[kafka-streams-2.3.0.jar:?]
at 
org.apache.kafka.streams.state.internals.WrappedStateStore.init(WrappedStateStore.java:48)
 ~[kafka-streams-2.3.0.jar:?]
at 
org.apache.kafka.streams.state.internals.CachingKeyValueStore.init(CachingKeyValueStore.java:58)
 ~[kafka-streams-2.3.0.jar:?]
at 
org.apache.kafka.streams.state.internals.WrappedStateStore.init(WrappedStateStore.java:48)
 ~[kafka-streams-2.3.0.jar:?]
at 
org.apache.kafka.streams.state.internals.MeteredKeyValueStore.init(MeteredKeyValueStore.java:112)
 ~[kafka-streams-2.3.0.jar:?]
at 
org.apache.kafka.streams.processor.internals.GlobalStateManagerImpl.initialize(GlobalStateManagerImpl.java:123)
 ~[kafka-streams-2.3.0.jar:?]
at 
org.apache.kafka.streams.processor.internals.GlobalStateUpdateTask.initialize(GlobalStateUpdateTask.java:61)
 ~[kafka-streams-2.3.0.jar:?]
at 
org.apache.kafka.streams.processor.internals.GlobalStreamThread$StateConsumer.initialize(GlobalStreamThread.java:229)
 ~[kafka-streams-2.3.0.jar:?]
at 
org.apache.kafka.streams.processor.internals.GlobalStreamThread.initialize(GlobalStreamThread.java:345)
 ~[kafka-streams-2.3.0.jar:?]
at 
org.apache.kafka.streams.processor.internals.GlobalStreamThread.run(GlobalStreamThread.java:270)
 ~[kafka-streams-2.3.0.jar:?]
Caused by: org.apache.kafka.common.errors.TimeoutException: Failed to get 
offsets by times in 30001ms
{code}

The stream was working fine and then this started happening.

The stream now throws this error on every start. I am now going to attempt to 
reset the stream and delete its local state.

I hate to say it, but Kafka Streams suck. It's problem after problem.

UPDATE: Some more info: it appears that the brokers may have gotten into some 
kind of crazy state due to a librdkafka-based client (NodeJS). I see thousands 
of logs per minute related to rebalancing that consumer.

  was:
I have a Kafka stream that fails with this error on startup every time:

{code}
org.apache.kafka.streams.errors.StreamsException: Failed to get end offsets for 
topic partitions of global store test-uiService-dlq-events-table-store after 0 
retry attempts. You can increase the number of retries via configuration 
parameter `retries`.
at 
org.apache.kafka.streams.processor.internals.GlobalStateManagerImpl.register(GlobalStateManagerImpl.java:186)
 ~[kafka-streams-2.3.0.jar:?]
at 
org.apache.kafka.streams.processor.internals.AbstractProcessorContext.register(AbstractProcessorContext.java:101)
 ~[kafka-streams-2.3.0.jar:?]
at 
org.apache.kafka.streams.state.internals.RocksDBStore.init(RocksDBStore.java:207)
 ~[kafka-streams-2.3.0.jar:?]
at 
org.apache.kafka.streams.state.internals.KeyValueToTimestampedKeyValueByteStoreAdapter.init(KeyValueToTimestampedKeyValueByteStoreAdapter.java:87)
 ~[kafka-streams-2.3.0.jar:?]
at 
org.apache.kafka.streams.state.internals.WrappedStateStore.init(WrappedStateStore.java:48)
 ~[kafka-streams-2.3.0.jar:?]
at 
org.apache.kafka.streams.state.internals.CachingKeyValueStore.init(CachingKeyValueStore.java:58)
 ~[kafka-streams-2.3.0.jar:?]
at 
org.apache.kafka.streams.state.internals.WrappedStateStore.init(WrappedStateStore.java:48)
 ~[kafka-streams-2.3.0.jar:?]
at 
org.apache.kafka.streams.state.internals.MeteredKeyValueStore.init(MeteredKeyValueStore.java:112)
 ~[kafka-streams-2.3.0.jar:?]
at 
org.apache.kafka.streams.processor.internals.GlobalStateManagerImpl.initialize(GlobalStateManagerImpl.java:123)
 ~[kafka-streams-2.3.0.jar:?]
at 
org.apache.kafka.streams.processor.internals.GlobalStateUpdateTask.initialize(GlobalStateUpdateTask.java:61)
 ~[kafka-streams-2.3.0.jar:?]
at 
org.apache.kafka.streams.processor.internals.GlobalStreamThread$StateConsumer.initialize(GlobalStreamThread.java:229)
 ~[kafka-streams-2.3.0.jar:?]
at 

[jira] [Created] (KAFKA-8922) Failed to get end offsets for topic partitions of global store

2019-09-18 Thread Raman Gupta (Jira)
Raman Gupta created KAFKA-8922:
--

 Summary: Failed to get end offsets for topic partitions of global 
store
 Key: KAFKA-8922
 URL: https://issues.apache.org/jira/browse/KAFKA-8922
 Project: Kafka
  Issue Type: Bug
Reporter: Raman Gupta


I have a Kafka stream that fails with this error on startup every time:

{code}
org.apache.kafka.streams.errors.StreamsException: Failed to get end offsets for 
topic partitions of global store test-uiService-dlq-events-table-store after 0 
retry attempts. You can increase the number of retries via configuration 
parameter `retries`.
at 
org.apache.kafka.streams.processor.internals.GlobalStateManagerImpl.register(GlobalStateManagerImpl.java:186)
 ~[kafka-streams-2.3.0.jar:?]
at 
org.apache.kafka.streams.processor.internals.AbstractProcessorContext.register(AbstractProcessorContext.java:101)
 ~[kafka-streams-2.3.0.jar:?]
at 
org.apache.kafka.streams.state.internals.RocksDBStore.init(RocksDBStore.java:207)
 ~[kafka-streams-2.3.0.jar:?]
at 
org.apache.kafka.streams.state.internals.KeyValueToTimestampedKeyValueByteStoreAdapter.init(KeyValueToTimestampedKeyValueByteStoreAdapter.java:87)
 ~[kafka-streams-2.3.0.jar:?]
at 
org.apache.kafka.streams.state.internals.WrappedStateStore.init(WrappedStateStore.java:48)
 ~[kafka-streams-2.3.0.jar:?]
at 
org.apache.kafka.streams.state.internals.CachingKeyValueStore.init(CachingKeyValueStore.java:58)
 ~[kafka-streams-2.3.0.jar:?]
at 
org.apache.kafka.streams.state.internals.WrappedStateStore.init(WrappedStateStore.java:48)
 ~[kafka-streams-2.3.0.jar:?]
at 
org.apache.kafka.streams.state.internals.MeteredKeyValueStore.init(MeteredKeyValueStore.java:112)
 ~[kafka-streams-2.3.0.jar:?]
at 
org.apache.kafka.streams.processor.internals.GlobalStateManagerImpl.initialize(GlobalStateManagerImpl.java:123)
 ~[kafka-streams-2.3.0.jar:?]
at 
org.apache.kafka.streams.processor.internals.GlobalStateUpdateTask.initialize(GlobalStateUpdateTask.java:61)
 ~[kafka-streams-2.3.0.jar:?]
at 
org.apache.kafka.streams.processor.internals.GlobalStreamThread$StateConsumer.initialize(GlobalStreamThread.java:229)
 ~[kafka-streams-2.3.0.jar:?]
at 
org.apache.kafka.streams.processor.internals.GlobalStreamThread.initialize(GlobalStreamThread.java:345)
 ~[kafka-streams-2.3.0.jar:?]
at 
org.apache.kafka.streams.processor.internals.GlobalStreamThread.run(GlobalStreamThread.java:270)
 ~[kafka-streams-2.3.0.jar:?]
Caused by: org.apache.kafka.common.errors.TimeoutException: Failed to get 
offsets by times in 30001ms
{code}

The stream was working fine and then this started happening.

The stream now throws this error on every start. I am now going to attempt to 
reset the stream and delete its local state.

I hate to say it, but Kafka Streams suck. It's problem after problem.
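
For the "delete its local state" step mentioned above, a minimal sketch (assumed usage, not taken from the actual application); a full reset of committed offsets and internal topics would additionally require the separate Streams application reset tool:
{code:java}
import java.util.Properties;

import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.Topology;

public class LocalStateReset {
    public static KafkaStreams startFresh(Topology topology, Properties props) {
        KafkaStreams streams = new KafkaStreams(topology, props);
        // cleanUp() wipes this application id's local state directory and may only
        // be called while the instance is not running (before start() or after close()).
        streams.cleanUp();
        // On start, state stores are rebuilt from their changelog topics.
        streams.start();
        return streams;
    }
}
{code}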



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-8793) StickyTaskAssignor throws java.lang.ArithmeticException

2019-08-30 Thread Raman Gupta (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16920011#comment-16920011
 ] 

Raman Gupta commented on KAFKA-8793:


[~guozhang] Unfortunately my logs have rolled over and I don't have this error 
recently. However, I will note that I applied an earlier version of the patch 
for KAFKA-8715, which used a timestamp rather than a UUID. There was discussion 
on the pull request about this, and it was then changed to a UUID, but I had 
already patched my broker, as the issue was causing me lots of trouble. So given 
your comments, a timestamp collision due to my older patch of KAFKA-8715 is 
probably the cause of this issue. There are 3 Streams instances with 2 threads 
per instance, so two threads connecting to the broker simultaneously could 
easily have produced a timestamp collision. If that is the case, then I think it 
makes sense to close this.

> StickyTaskAssignor throws java.lang.ArithmeticException
> ---
>
> Key: KAFKA-8793
> URL: https://issues.apache.org/jira/browse/KAFKA-8793
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.3.0
>Reporter: Raman Gupta
>Assignee: Guozhang Wang
>Priority: Critical
>
> Occassionally when starting a streams consumer that uses the static consumer 
> group protocol, I get the following error:
> {code:java}
> 2019-08-13 06:06:43,527 ERROR --- [691d2-StreamThread-1] 
> org.apa.kaf.str.pro.int.StreamThread  : stream-thread 
> [prod-cisSegmenter-777489d8-6cc5-48b4-8771-868d873691d2-StreamThread-1] 
> Encountered the following er
> ror during processing:
> EXCEPTION: java.lang.ArithmeticException: / by zero
> at 
> org.apache.kafka.streams.processor.internals.assignment.StickyTaskAssignor.assignActive(StickyTaskAssignor.java:76)
>  ~[kafka-streams-2.3.0.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.assignment.StickyTaskAssignor.assign(StickyTaskAssignor.java:52)
>  ~[kafka-streams-2.3.0.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamsPartitionAssignor.assign(StreamsPartitionAssignor.java:634)
>  ~[kafka-streams-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.performAssignment(ConsumerCoordinator.java:424)
>  ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.onJoinLeader(AbstractCoordinator.java:622)
>  ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.access$1100(AbstractCoordinator.java:107)
>  ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$JoinGroupResponseHandler.handle(AbstractCoordinator.java:544)
>  ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$JoinGroupResponseHandler.handle(AbstractCoordinator.java:527)
>  ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:978)
>  ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:958)
>  ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:204)
>  ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:167)
>  ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:127)
>  ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.fireCompletion(ConsumerNetworkClient.java:578)
>  ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.firePendingCompletedRequests(ConsumerNetworkClient.java:388)
>  ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:294)
>  ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:233)
>  ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:212)
>  ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:415)
>  ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:358)
>  

[jira] [Commented] (KAFKA-8803) Stream will not start due to TimeoutException: Timeout expired after 60000milliseconds while awaiting InitProducerId

2019-08-30 Thread Raman Gupta (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16920004#comment-16920004
 ] 

Raman Gupta commented on KAFKA-8803:


[~bbejeck] If you want to close it go ahead, however, I don't really consider 
any situation in which a stream takes 17 days to recover normal, when using the 
default settings. Furthermore, the documentation for `max.block.ms` does not in 
any way cover this situation. It says:

> These methods can be blocked either because the buffer is full or metadata 
> unavailable.

Neither of these was true in this situation. Furthermore, the error message 
says: "This might happen if the broker is slow to respond, if the network 
connection to the broker was interrupted, or if similar circumstances arise." 
Note that these situations explicitly refer to performance and networking 
problems, and do not mention that the broker state for this particular stream 
could be causing the issue.

Furthermore, I still don't see why the brokers would continue to experience the 
same UNKNOWN_LEADER_EPOCH error over the course of 17 days. Shouldn't the 
brokers recover on their own and the stream successfully reconnect once they 
do? Any situation in which the client is somehow causing this error to continue 
to happen for 17 days is in my opinion a bug (especially given I had even 
turned off this stream for about 6 of these 17 days, and still the brokers did 
not recover during this period).

Given all that, it seems to me there are still lots of unexplained behavior 
here, and it doesn't make sense to me to close the issue.

> Stream will not start due to TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> 
>
> Key: KAFKA-8803
> URL: https://issues.apache.org/jira/browse/KAFKA-8803
> Project: Kafka
>  Issue Type: Bug
>Reporter: Raman Gupta
>Priority: Major
> Attachments: logs.txt.gz, screenshot-1.png
>
>
> One streams app is consistently failing at startup with the following 
> exception:
> {code}
> 2019-08-14 17:02:29,568 ERROR --- [2ce1b-StreamThread-2] 
> org.apa.kaf.str.pro.int.StreamTask: task [0_36] Timeout 
> exception caught when initializing transactions for task 0_36. This might 
> happen if the broker is slow to respond, if the network connection to the 
> broker was interrupted, or if similar circumstances arise. You can increase 
> producer parameter `max.block.ms` to increase this timeout.
> org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> {code}
> These same brokers are used by many other streams without any issue, 
> including some in the very same processes for the stream which consistently 
> throws this exception.
> *UPDATE 08/16:*
> The very first instance of this error is August 13th 2019, 17:03:36.754 and 
> it happened for 4 different streams. For 3 of these streams, the error only 
> happened once, and then the stream recovered. For the 4th stream, the error 
> has continued to happen, and continues to happen now.
> I looked up the broker logs for this time, and see that at August 13th 2019, 
> 16:47:43, two of four brokers started reporting messages like this, for 
> multiple partitions:
> [2019-08-13 20:47:43,658] INFO [ReplicaFetcher replicaId=3, leaderId=1, 
> fetcherId=0] Retrying leaderEpoch request for partition xxx-1 as the leader 
> reported an error: UNKNOWN_LEADER_EPOCH (kafka.server.ReplicaFetcherThread)
> The UNKNOWN_LEADER_EPOCH messages continued for some time, and then stopped, 
> here is a view of the count of these messages over time:
>  !screenshot-1.png! 
> However, as noted, the stream task timeout error continues to happen.
> I use the static consumer group protocol with Kafka 2.3.0 clients and 2.3.0 
> broker. The broker has a patch for KAFKA-8773.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (KAFKA-8803) Stream will not start due to TimeoutException: Timeout expired after 60000milliseconds while awaiting InitProducerId

2019-08-30 Thread Raman Gupta (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16919877#comment-16919877
 ] 

Raman Gupta commented on KAFKA-8803:


[~bbejeck] Hmm, I was just looking over the logs again. I thought the stream 
had recovered on that setting change, but in reality the brokers / stream seem 
to have recovered on their own. The last time the TimeoutException occurred was 
Aug 26th 13:04. I did also disable these streams to avoid a crash-loop in the 
process at that time, but today those streams did run fine without the 
max.block.ms change. Here is the timeline:

Aug 20 16:31 - disabled stream

Aug 26 12:52 - stream enabled, issue still happening

Aug 26 13:04 - disabled stream again

Aug 30 14:18 - stream enabled, issue now resolved (no max.block.ms change)

Let me know if you want me to dig into any logs between any of the above times, 
but I wanted to point this behavior out in case it impacts your working theory.

> Stream will not start due to TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> 
>
> Key: KAFKA-8803
> URL: https://issues.apache.org/jira/browse/KAFKA-8803
> Project: Kafka
>  Issue Type: Bug
>Reporter: Raman Gupta
>Priority: Major
> Attachments: logs.txt.gz, screenshot-1.png
>
>
> One streams app is consistently failing at startup with the following 
> exception:
> {code}
> 2019-08-14 17:02:29,568 ERROR --- [2ce1b-StreamThread-2] 
> org.apa.kaf.str.pro.int.StreamTask: task [0_36] Timeout 
> exception caught when initializing transactions for task 0_36. This might 
> happen if the broker is slow to respond, if the network connection to the 
> broker was interrupted, or if similar circumstances arise. You can increase 
> producer parameter `max.block.ms` to increase this timeout.
> org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> {code}
> These same brokers are used by many other streams without any issue, 
> including some in the very same processes for the stream which consistently 
> throws this exception.
> *UPDATE 08/16:*
> The very first instance of this error is August 13th 2019, 17:03:36.754 and 
> it happened for 4 different streams. For 3 of these streams, the error only 
> happened once, and then the stream recovered. For the 4th stream, the error 
> has continued to happen, and continues to happen now.
> I looked up the broker logs for this time, and see that at August 13th 2019, 
> 16:47:43, two of four brokers started reporting messages like this, for 
> multiple partitions:
> [2019-08-13 20:47:43,658] INFO [ReplicaFetcher replicaId=3, leaderId=1, 
> fetcherId=0] Retrying leaderEpoch request for partition xxx-1 as the leader 
> reported an error: UNKNOWN_LEADER_EPOCH (kafka.server.ReplicaFetcherThread)
> The UNKNOWN_LEADER_EPOCH messages continued for some time, and then stopped, 
> here is a view of the count of these messages over time:
>  !screenshot-1.png! 
> However, as noted, the stream task timeout error continues to happen.
> I use the static consumer group protocol with Kafka 2.3.0 clients and 2.3.0 
> broker. The broker has a patch for KAFKA-8773.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (KAFKA-8803) Stream will not start due to TimeoutException: Timeout expired after 60000milliseconds while awaiting InitProducerId

2019-08-30 Thread Raman Gupta (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16919838#comment-16919838
 ] 

Raman Gupta commented on KAFKA-8803:


Changing the `max.block.ms` configuration did seem to fix the problem. I'm 
still curious to understand more about why this error kept occurring, as per my 
last comment. Also, should the default value of this config be higher?

> Stream will not start due to TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> 
>
> Key: KAFKA-8803
> URL: https://issues.apache.org/jira/browse/KAFKA-8803
> Project: Kafka
>  Issue Type: Bug
>Reporter: Raman Gupta
>Priority: Major
> Attachments: logs.txt.gz, screenshot-1.png
>
>
> One streams app is consistently failing at startup with the following 
> exception:
> {code}
> 2019-08-14 17:02:29,568 ERROR --- [2ce1b-StreamThread-2] 
> org.apa.kaf.str.pro.int.StreamTask: task [0_36] Timeout 
> exception caught when initializing transactions for task 0_36. This might 
> happen if the broker is slow to respond, if the network connection to the 
> broker was interrupted, or if similar circumstances arise. You can increase 
> producer parameter `max.block.ms` to increase this timeout.
> org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> {code}
> These same brokers are used by many other streams without any issue, 
> including some in the very same processes for the stream which consistently 
> throws this exception.
> *UPDATE 08/16:*
> The very first instance of this error is August 13th 2019, 17:03:36.754 and 
> it happened for 4 different streams. For 3 of these streams, the error only 
> happened once, and then the stream recovered. For the 4th stream, the error 
> has continued to happen, and continues to happen now.
> I looked up the broker logs for this time, and see that at August 13th 2019, 
> 16:47:43, two of four brokers started reporting messages like this, for 
> multiple partitions:
> [2019-08-13 20:47:43,658] INFO [ReplicaFetcher replicaId=3, leaderId=1, 
> fetcherId=0] Retrying leaderEpoch request for partition xxx-1 as the leader 
> reported an error: UNKNOWN_LEADER_EPOCH (kafka.server.ReplicaFetcherThread)
> The UNKNOWN_LEADER_EPOCH messages continued for some time, and then stopped, 
> here is a view of the count of these messages over time:
>  !screenshot-1.png! 
> However, as noted, the stream task timeout error continues to happen.
> I use the static consumer group protocol with Kafka 2.3.0 clients and 2.3.0 
> broker. The broker has a patch for KAFKA-8773.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (KAFKA-8803) Stream will not start due to TimeoutException: Timeout expired after 60000milliseconds while awaiting InitProducerId

2019-08-30 Thread Raman Gupta (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16919744#comment-16919744
 ] 

Raman Gupta commented on KAFKA-8803:


[~bbejeck] I'll try that, but one thing seems odd to me: why would the broker 
experience the same UNKNOWN_LEADER_EPOCH error the next time my app restarts? 
Is that somehow caused by the client?

> Stream will not start due to TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> 
>
> Key: KAFKA-8803
> URL: https://issues.apache.org/jira/browse/KAFKA-8803
> Project: Kafka
>  Issue Type: Bug
>Reporter: Raman Gupta
>Priority: Major
> Attachments: logs.txt.gz, screenshot-1.png
>
>
> One streams app is consistently failing at startup with the following 
> exception:
> {code}
> 2019-08-14 17:02:29,568 ERROR --- [2ce1b-StreamThread-2] 
> org.apa.kaf.str.pro.int.StreamTask: task [0_36] Timeout 
> exception caught when initializing transactions for task 0_36. This might 
> happen if the broker is slow to respond, if the network connection to the 
> broker was interrupted, or if similar circumstances arise. You can increase 
> producer parameter `max.block.ms` to increase this timeout.
> org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> {code}
> These same brokers are used by many other streams without any issue, 
> including some in the very same processes for the stream which consistently 
> throws this exception.
> *UPDATE 08/16:*
> The very first instance of this error is August 13th 2019, 17:03:36.754 and 
> it happened for 4 different streams. For 3 of these streams, the error only 
> happened once, and then the stream recovered. For the 4th stream, the error 
> has continued to happen, and continues to happen now.
> I looked up the broker logs for this time, and see that at August 13th 2019, 
> 16:47:43, two of four brokers started reporting messages like this, for 
> multiple partitions:
> [2019-08-13 20:47:43,658] INFO [ReplicaFetcher replicaId=3, leaderId=1, 
> fetcherId=0] Retrying leaderEpoch request for partition xxx-1 as the leader 
> reported an error: UNKNOWN_LEADER_EPOCH (kafka.server.ReplicaFetcherThread)
> The UNKNOWN_LEADER_EPOCH messages continued for some time, and then stopped, 
> here is a view of the count of these messages over time:
>  !screenshot-1.png! 
> However, as noted, the stream task timeout error continues to happen.
> I use the static consumer group protocol with Kafka 2.3.0 clients and 2.3.0 
> broker. The broker has a patch for KAFKA-8773.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (KAFKA-8803) Stream will not start due to TimeoutException: Timeout expired after 60000milliseconds while awaiting InitProducerId

2019-08-29 Thread Raman Gupta (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16918695#comment-16918695
 ] 

Raman Gupta commented on KAFKA-8803:


Thanks [~bbejeck]. Currently the stream is in the same state so if additional 
debugging information is needed, I can probably still get it. However, very 
soon I'll need to reset the environment and move on, as this stream has been 
down a long time.

> Stream will not start due to TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> 
>
> Key: KAFKA-8803
> URL: https://issues.apache.org/jira/browse/KAFKA-8803
> Project: Kafka
>  Issue Type: Bug
>Reporter: Raman Gupta
>Priority: Major
> Attachments: logs.txt.gz, screenshot-1.png
>
>
> One streams app is consistently failing at startup with the following 
> exception:
> {code}
> 2019-08-14 17:02:29,568 ERROR --- [2ce1b-StreamThread-2] 
> org.apa.kaf.str.pro.int.StreamTask: task [0_36] Timeout 
> exception caught when initializing transactions for task 0_36. This might 
> happen if the broker is slow to respond, if the network connection to the 
> broker was interrupted, or if similar circumstances arise. You can increase 
> producer parameter `max.block.ms` to increase this timeout.
> org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> {code}
> These same brokers are used by many other streams without any issue, 
> including some in the very same processes for the stream which consistently 
> throws this exception.
> *UPDATE 08/16:*
> The very first instance of this error is August 13th 2019, 17:03:36.754 and 
> it happened for 4 different streams. For 3 of these streams, the error only 
> happened once, and then the stream recovered. For the 4th stream, the error 
> has continued to happen, and continues to happen now.
> I looked up the broker logs for this time, and see that at August 13th 2019, 
> 16:47:43, two of four brokers started reporting messages like this, for 
> multiple partitions:
> [2019-08-13 20:47:43,658] INFO [ReplicaFetcher replicaId=3, leaderId=1, 
> fetcherId=0] Retrying leaderEpoch request for partition xxx-1 as the leader 
> reported an error: UNKNOWN_LEADER_EPOCH (kafka.server.ReplicaFetcherThread)
> The UNKNOWN_LEADER_EPOCH messages continued for some time, and then stopped, 
> here is a view of the count of these messages over time:
>  !screenshot-1.png! 
> However, as noted, the stream task timeout error continues to happen.
> I use the static consumer group protocol with Kafka 2.3.0 clients and 2.3.0 
> broker. The broker has a patch for KAFKA-8773.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (KAFKA-8803) Stream will not start due to TimeoutException: Timeout expired after 60000milliseconds while awaiting InitProducerId

2019-08-26 Thread Raman Gupta (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915946#comment-16915946
 ] 

Raman Gupta commented on KAFKA-8803:


[~bbejeck] Any updates on this? The stream still won't start.

> Stream will not start due to TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> 
>
> Key: KAFKA-8803
> URL: https://issues.apache.org/jira/browse/KAFKA-8803
> Project: Kafka
>  Issue Type: Bug
>Reporter: Raman Gupta
>Priority: Major
> Attachments: logs.txt.gz, screenshot-1.png
>
>
> One streams app is consistently failing at startup with the following 
> exception:
> {code}
> 2019-08-14 17:02:29,568 ERROR --- [2ce1b-StreamThread-2] 
> org.apa.kaf.str.pro.int.StreamTask: task [0_36] Timeout 
> exception caught when initializing transactions for task 0_36. This might 
> happen if the broker is slow to respond, if the network connection to the 
> broker was interrupted, or if similar circumstances arise. You can increase 
> producer parameter `max.block.ms` to increase this timeout.
> org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> {code}
> These same brokers are used by many other streams without any issue, 
> including some in the very same processes for the stream which consistently 
> throws this exception.
> *UPDATE 08/16:*
> The very first instance of this error is August 13th 2019, 17:03:36.754 and 
> it happened for 4 different streams. For 3 of these streams, the error only 
> happened once, and then the stream recovered. For the 4th stream, the error 
> has continued to happen, and continues to happen now.
> I looked up the broker logs for this time, and see that at August 13th 2019, 
> 16:47:43, two of four brokers started reporting messages like this, for 
> multiple partitions:
> [2019-08-13 20:47:43,658] INFO [ReplicaFetcher replicaId=3, leaderId=1, 
> fetcherId=0] Retrying leaderEpoch request for partition xxx-1 as the leader 
> reported an error: UNKNOWN_LEADER_EPOCH (kafka.server.ReplicaFetcherThread)
> The UNKNOWN_LEADER_EPOCH messages continued for some time, and then stopped, 
> here is a view of the count of these messages over time:
>  !screenshot-1.png! 
> However, as noted, the stream task timeout error continues to happen.
> I use the static consumer group protocol with Kafka 2.3.0 clients and 2.3.0 
> broker. The broker has a patch for KAFKA-8773.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Comment Edited] (KAFKA-8803) Stream will not start due to TimeoutException: Timeout expired after 60000milliseconds while awaiting InitProducerId

2019-08-20 Thread Raman Gupta (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16911420#comment-16911420
 ] 

Raman Gupta edited comment on KAFKA-8803 at 8/20/19 2:43 PM:
-

You're right, I had the incorrect issue reference. I meant issue 
https://issues.apache.org/jira/browse/KAFKA-8715 (patch 
https://github.com/apache/kafka/pull/7116, although note that my patch is 
actually an earlier version of this pull request that used timestamp -- I don't 
believe it's relevant here, though).


was (Author: rocketraman):
You're right, I had the incorrect issue reference. I meant issue 
https://issues.apache.org/jira/browse/KAFKA-8715 (patch 
https://github.com/apache/kafka/pull/7116).

> Stream will not start due to TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> 
>
> Key: KAFKA-8803
> URL: https://issues.apache.org/jira/browse/KAFKA-8803
> Project: Kafka
>  Issue Type: Bug
>Reporter: Raman Gupta
>Priority: Major
> Attachments: logs.txt.gz, screenshot-1.png
>
>
> One streams app is consistently failing at startup with the following 
> exception:
> {code}
> 2019-08-14 17:02:29,568 ERROR --- [2ce1b-StreamThread-2] 
> org.apa.kaf.str.pro.int.StreamTask: task [0_36] Timeout 
> exception caught when initializing transactions for task 0_36. This might 
> happen if the broker is slow to respond, if the network connection to the 
> broker was interrupted, or if similar circumstances arise. You can increase 
> producer parameter `max.block.ms` to increase this timeout.
> org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> {code}
> These same brokers are used by many other streams without any issue, 
> including some in the very same processes for the stream which consistently 
> throws this exception.
> *UPDATE 08/16:*
> The very first instance of this error is August 13th 2019, 17:03:36.754 and 
> it happened for 4 different streams. For 3 of these streams, the error only 
> happened once, and then the stream recovered. For the 4th stream, the error 
> has continued to happen, and continues to happen now.
> I looked up the broker logs for this time, and see that at August 13th 2019, 
> 16:47:43, two of four brokers started reporting messages like this, for 
> multiple partitions:
> [2019-08-13 20:47:43,658] INFO [ReplicaFetcher replicaId=3, leaderId=1, 
> fetcherId=0] Retrying leaderEpoch request for partition xxx-1 as the leader 
> reported an error: UNKNOWN_LEADER_EPOCH (kafka.server.ReplicaFetcherThread)
> The UNKNOWN_LEADER_EPOCH messages continued for some time, and then stopped, 
> here is a view of the count of these messages over time:
>  !screenshot-1.png! 
> However, as noted, the stream task timeout error continues to happen.
> I use the static consumer group protocol with Kafka 2.3.0 clients and 2.3.0 
> broker. The broker has a patch for KAFKA-8773.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (KAFKA-8803) Stream will not start due to TimeoutException: Timeout expired after 60000milliseconds while awaiting InitProducerId

2019-08-20 Thread Raman Gupta (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16911420#comment-16911420
 ] 

Raman Gupta commented on KAFKA-8803:


You're right, I had the incorrect issue reference. I meant issue 
https://issues.apache.org/jira/browse/KAFKA-8715 (patch 
https://github.com/apache/kafka/pull/7116).

> Stream will not start due to TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> 
>
> Key: KAFKA-8803
> URL: https://issues.apache.org/jira/browse/KAFKA-8803
> Project: Kafka
>  Issue Type: Bug
>Reporter: Raman Gupta
>Priority: Major
> Attachments: logs.txt.gz, screenshot-1.png
>
>
> One streams app is consistently failing at startup with the following 
> exception:
> {code}
> 2019-08-14 17:02:29,568 ERROR --- [2ce1b-StreamThread-2] 
> org.apa.kaf.str.pro.int.StreamTask: task [0_36] Timeout 
> exception caught when initializing transactions for task 0_36. This might 
> happen if the broker is slow to respond, if the network connection to the 
> broker was interrupted, or if similar circumstances arise. You can increase 
> producer parameter `max.block.ms` to increase this timeout.
> org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> {code}
> These same brokers are used by many other streams without any issue, 
> including some in the very same processes for the stream which consistently 
> throws this exception.
> *UPDATE 08/16:*
> The very first instance of this error is August 13th 2019, 17:03:36.754 and 
> it happened for 4 different streams. For 3 of these streams, the error only 
> happened once, and then the stream recovered. For the 4th stream, the error 
> has continued to happen, and continues to happen now.
> I looked up the broker logs for this time, and see that at August 13th 2019, 
> 16:47:43, two of four brokers started reporting messages like this, for 
> multiple partitions:
> [2019-08-13 20:47:43,658] INFO [ReplicaFetcher replicaId=3, leaderId=1, 
> fetcherId=0] Retrying leaderEpoch request for partition xxx-1 as the leader 
> reported an error: UNKNOWN_LEADER_EPOCH (kafka.server.ReplicaFetcherThread)
> The UNKNOWN_LEADER_EPOCH messages continued for some time, and then stopped, 
> here is a view of the count of these messages over time:
>  !screenshot-1.png! 
> However, as noted, the stream task timeout error continues to happen.
> I use the static consumer group protocol with Kafka 2.3.0 clients and 2.3.0 
> broker. The broker has a patch for KAFKA-8773.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Comment Edited] (KAFKA-8803) Stream will not start due to TimeoutException: Timeout expired after 60000milliseconds while awaiting InitProducerId

2019-08-20 Thread Raman Gupta (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16911022#comment-16911022
 ] 

Raman Gupta edited comment on KAFKA-8803 at 8/20/19 6:07 AM:
-

[~bbejeck] I have attached the logs of all the broker processes, as well as the 
client application containing this stream, as well as some other streams, from 
the period when this problem first started. The file is tab-delimited, with the 
first column being the Kubernetes pod name (will contain kafka-x for the broker 
logs, or cis-x for the client logs), and the second column being the log 
message. The logs from the different processes are interleaved by timestamp. 
Let me know if that is sufficient.

A few streams experienced the timeout error as you can see from the logs, but 
most of them recovered. The one that has not is "dev-cisSegmenter-stream".


was (Author: rocketraman):
[~bbejeck] I have attached the logs of all the broker processes, as well as the 
client application containing this stream, as well as some other streams, from 
the period when this problem first started. The file is tab-delimited, with the 
first column being the Kubernetes pod name (will contain kafka-x for the broker 
logs, or cis-x for the client logs), and the second column being the log 
message. The streams are interleaved by timestamp. Let me know if that is 
sufficient.

A few streams experienced the timeout error as you can see from the logs, but 
most of them recovered. The one that has not is "dev-cisSegmenter-stream".

> Stream will not start due to TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> 
>
> Key: KAFKA-8803
> URL: https://issues.apache.org/jira/browse/KAFKA-8803
> Project: Kafka
>  Issue Type: Bug
>Reporter: Raman Gupta
>Priority: Major
> Attachments: logs.txt.gz, screenshot-1.png
>
>
> One streams app is consistently failing at startup with the following 
> exception:
> {code}
> 2019-08-14 17:02:29,568 ERROR --- [2ce1b-StreamThread-2] 
> org.apa.kaf.str.pro.int.StreamTask: task [0_36] Timeout 
> exception caught when initializing transactions for task 0_36. This might 
> happen if the broker is slow to respond, if the network connection to the 
> broker was interrupted, or if similar circumstances arise. You can increase 
> producer parameter `max.block.ms` to increase this timeout.
> org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> {code}
> These same brokers are used by many other streams without any issue, 
> including some in the very same processes for the stream which consistently 
> throws this exception.
> *UPDATE 08/16:*
> The very first instance of this error is August 13th 2019, 17:03:36.754 and 
> it happened for 4 different streams. For 3 of these streams, the error only 
> happened once, and then the stream recovered. For the 4th stream, the error 
> has continued to happen, and continues to happen now.
> I looked up the broker logs for this time, and see that at August 13th 2019, 
> 16:47:43, two of four brokers started reporting messages like this, for 
> multiple partitions:
> [2019-08-13 20:47:43,658] INFO [ReplicaFetcher replicaId=3, leaderId=1, 
> fetcherId=0] Retrying leaderEpoch request for partition xxx-1 as the leader 
> reported an error: UNKNOWN_LEADER_EPOCH (kafka.server.ReplicaFetcherThread)
> The UNKNOWN_LEADER_EPOCH messages continued for some time, and then stopped, 
> here is a view of the count of these messages over time:
>  !screenshot-1.png! 
> However, as noted, the stream task timeout error continues to happen.
> I use the static consumer group protocol with Kafka 2.3.0 clients and 2.3.0 
> broker. The broker has a patch for KAFKA-8773.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
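
For reference, a hypothetical two-line excerpt of the tab-delimited format described above, with <TAB> marking the delimiter (the pod names and message tails here are illustrative only, patterned on the kafka-x / cis-x naming and on log lines already quoted in this issue):

{code}
kafka-1<TAB>[2019-08-13 20:47:43,658] INFO [ReplicaFetcher replicaId=3, leaderId=1, fetcherId=0] Retrying leaderEpoch request for partition xxx-1 ...
cis-0<TAB>2019-08-14 17:02:29,568 ERROR --- [2ce1b-StreamThread-2] org.apa.kaf.str.pro.int.StreamTask: task [0_36] Timeout exception caught ...
{code}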


[jira] [Comment Edited] (KAFKA-8803) Stream will not start due to TimeoutException: Timeout expired after 60000milliseconds while awaiting InitProducerId

2019-08-20 Thread Raman Gupta (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16911022#comment-16911022
 ] 

Raman Gupta edited comment on KAFKA-8803 at 8/20/19 6:06 AM:
-

[~bbejeck] I have attached the logs of all the broker processes, as well as the 
client application containing this stream, as well as some other streams, from 
the period when this problem first started. The file is tab-delimited, with the 
first column being the Kubernetes pod name (will contain kafka-x for the broker 
logs, or cis-x for the client logs), and the second column being the log 
message. The streams are interleaved by timestamp. Let me know if that is 
sufficient.

A few streams experienced the timeout error as you can see from the logs, but 
most of them recovered. The one that has not is "dev-cisSegmenter-stream".


was (Author: rocketraman):
[~bbejeck] I have attached the logs of all the broker processes, as well as the 
client application containing this stream, as well as some other streams, from 
the period when this problem first started. The file is tab-delimited, with the 
first column being the Kubernetes pod name (will contain kafka-x for the broker 
logs, or cis-x for the client logs), and the second column being the log 
message. Let me know if that is sufficient.

A few streams experienced the timeout error as you can see from the logs, but 
most of them recovered. The one that has not is "dev-cisSegmenter-stream".

> Stream will not start due to TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> 
>
> Key: KAFKA-8803
> URL: https://issues.apache.org/jira/browse/KAFKA-8803
> Project: Kafka
>  Issue Type: Bug
>Reporter: Raman Gupta
>Priority: Major
> Attachments: logs.txt.gz, screenshot-1.png
>
>
> One streams app is consistently failing at startup with the following 
> exception:
> {code}
> 2019-08-14 17:02:29,568 ERROR --- [2ce1b-StreamThread-2] 
> org.apa.kaf.str.pro.int.StreamTask: task [0_36] Timeout 
> exception caught when initializing transactions for task 0_36. This might 
> happen if the broker is slow to respond, if the network connection to the 
> broker was interrupted, or if similar circumstances arise. You can increase 
> producer parameter `max.block.ms` to increase this timeout.
> org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> {code}
> These same brokers are used by many other streams without any issue, 
> including some in the very same processes for the stream which consistently 
> throws this exception.
> *UPDATE 08/16:*
> The very first instance of this error is August 13th 2019, 17:03:36.754 and 
> it happened for 4 different streams. For 3 of these streams, the error only 
> happened once, and then the stream recovered. For the 4th stream, the error 
> has continued to happen, and continues to happen now.
> I looked up the broker logs for this time, and see that at August 13th 2019, 
> 16:47:43, two of four brokers started reporting messages like this, for 
> multiple partitions:
> [2019-08-13 20:47:43,658] INFO [ReplicaFetcher replicaId=3, leaderId=1, 
> fetcherId=0] Retrying leaderEpoch request for partition xxx-1 as the leader 
> reported an error: UNKNOWN_LEADER_EPOCH (kafka.server.ReplicaFetcherThread)
> The UNKNOWN_LEADER_EPOCH messages continued for some time, and then stopped, 
> here is a view of the count of these messages over time:
>  !screenshot-1.png! 
> However, as noted, the stream task timeout error continues to happen.
> I use the static consumer group protocol with Kafka 2.3.0 clients and 2.3.0 
> broker. The broker has a patch for KAFKA-8773.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Comment Edited] (KAFKA-8803) Stream will not start due to TimeoutException: Timeout expired after 60000milliseconds while awaiting InitProducerId

2019-08-20 Thread Raman Gupta (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16911022#comment-16911022
 ] 

Raman Gupta edited comment on KAFKA-8803 at 8/20/19 6:06 AM:
-

[~bbejeck] I have attached the logs of all the broker processes, as well as the 
client application containing this stream, as well as some other streams, from 
the period when this problem first started. The file is tab-delimited, with the 
first column being the Kubernetes pod name (will contain kafka-x for the broker 
logs, or cis-x for the client logs), and the second column being the log 
message. Let me know if that is sufficient.

A few streams experienced the timeout error as you can see from the logs, but 
most of them recovered. The one that has not is "dev-cisSegmenter-stream".


was (Author: rocketraman):
[~bbejeck] I have attached the logs of all the broker processes, as well as the 
client application containing this stream, as well as some other streams, from 
the period when this problem first started. The file is tab-delimited, with the 
first column being the Kubernetes pod name (will contain kafka-x for the broker 
logs, or cis-x for the client logs), and the second column being the log 
message. Let me know if that is sufficient.


> Stream will not start due to TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> 
>
> Key: KAFKA-8803
> URL: https://issues.apache.org/jira/browse/KAFKA-8803
> Project: Kafka
>  Issue Type: Bug
>Reporter: Raman Gupta
>Priority: Major
> Attachments: logs.txt.gz, screenshot-1.png
>
>
> One streams app is consistently failing at startup with the following 
> exception:
> {code}
> 2019-08-14 17:02:29,568 ERROR --- [2ce1b-StreamThread-2] 
> org.apa.kaf.str.pro.int.StreamTask: task [0_36] Timeout 
> exception caught when initializing transactions for task 0_36. This might 
> happen if the broker is slow to respond, if the network connection to the 
> broker was interrupted, or if similar circumstances arise. You can increase 
> producer parameter `max.block.ms` to increase this timeout.
> org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> {code}
> These same brokers are used by many other streams without any issue, 
> including some in the very same processes for the stream which consistently 
> throws this exception.
> *UPDATE 08/16:*
> The very first instance of this error is August 13th 2019, 17:03:36.754 and 
> it happened for 4 different streams. For 3 of these streams, the error only 
> happened once, and then the stream recovered. For the 4th stream, the error 
> has continued to happen, and continues to happen now.
> I looked up the broker logs for this time, and see that at August 13th 2019, 
> 16:47:43, two of four brokers started reporting messages like this, for 
> multiple partitions:
> [2019-08-13 20:47:43,658] INFO [ReplicaFetcher replicaId=3, leaderId=1, 
> fetcherId=0] Retrying leaderEpoch request for partition xxx-1 as the leader 
> reported an error: UNKNOWN_LEADER_EPOCH (kafka.server.ReplicaFetcherThread)
> The UNKNOWN_LEADER_EPOCH messages continued for some time, and then stopped, 
> here is a view of the count of these messages over time:
>  !screenshot-1.png! 
> However, as noted, the stream task timeout error continues to happen.
> I use the static consumer group protocol with Kafka 2.3.0 clients and 2.3.0 
> broker. The broker has a patch for KAFKA-8773.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (KAFKA-8803) Stream will not start due to TimeoutException: Timeout expired after 60000milliseconds while awaiting InitProducerId

2019-08-20 Thread Raman Gupta (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16911022#comment-16911022
 ] 

Raman Gupta commented on KAFKA-8803:


[~bbejeck] I have attached the logs of all the broker processes, as well as the 
client application containing this stream, as well as some other streams, from 
the period when this problem first started. The file is tab-delimited, with the 
first column being the Kubernetes pod name (will contain kafka-x for the broker 
logs, or cis-x for the client logs), and the second column being the log 
message. Let me know if that is sufficient.


> Stream will not start due to TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> 
>
> Key: KAFKA-8803
> URL: https://issues.apache.org/jira/browse/KAFKA-8803
> Project: Kafka
>  Issue Type: Bug
>Reporter: Raman Gupta
>Priority: Major
> Attachments: logs.txt.gz, screenshot-1.png
>
>
> One streams app is consistently failing at startup with the following 
> exception:
> {code}
> 2019-08-14 17:02:29,568 ERROR --- [2ce1b-StreamThread-2] 
> org.apa.kaf.str.pro.int.StreamTask: task [0_36] Timeout 
> exception caught when initializing transactions for task 0_36. This might 
> happen if the broker is slow to respond, if the network connection to the 
> broker was interrupted, or if similar circumstances arise. You can increase 
> producer parameter `max.block.ms` to increase this timeout.
> org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> {code}
> These same brokers are used by many other streams without any issue, 
> including some in the very same processes for the stream which consistently 
> throws this exception.
> *UPDATE 08/16:*
> The very first instance of this error is August 13th 2019, 17:03:36.754 and 
> it happened for 4 different streams. For 3 of these streams, the error only 
> happened once, and then the stream recovered. For the 4th stream, the error 
> has continued to happen, and continues to happen now.
> I looked up the broker logs for this time, and see that at August 13th 2019, 
> 16:47:43, two of four brokers started reporting messages like this, for 
> multiple partitions:
> [2019-08-13 20:47:43,658] INFO [ReplicaFetcher replicaId=3, leaderId=1, 
> fetcherId=0] Retrying leaderEpoch request for partition xxx-1 as the leader 
> reported an error: UNKNOWN_LEADER_EPOCH (kafka.server.ReplicaFetcherThread)
> The UNKNOWN_LEADER_EPOCH messages continued for some time, and then stopped, 
> here is a view of the count of these messages over time:
>  !screenshot-1.png! 
> However, as noted, the stream task timeout error continues to happen.
> I use the static consumer group protocol with Kafka 2.3.0 clients and 2.3.0 
> broker. The broker has a patch for KAFKA-8773.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Updated] (KAFKA-8803) Stream will not start due to TimeoutException: Timeout expired after 60000milliseconds while awaiting InitProducerId

2019-08-20 Thread Raman Gupta (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raman Gupta updated KAFKA-8803:
---
Attachment: logs.txt.gz

> Stream will not start due to TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> 
>
> Key: KAFKA-8803
> URL: https://issues.apache.org/jira/browse/KAFKA-8803
> Project: Kafka
>  Issue Type: Bug
>Reporter: Raman Gupta
>Priority: Major
> Attachments: logs.txt.gz, screenshot-1.png
>
>
> One streams app is consistently failing at startup with the following 
> exception:
> {code}
> 2019-08-14 17:02:29,568 ERROR --- [2ce1b-StreamThread-2] 
> org.apa.kaf.str.pro.int.StreamTask: task [0_36] Timeout 
> exception caught when initializing transactions for task 0_36. This might 
> happen if the broker is slow to respond, if the network connection to the 
> broker was interrupted, or if similar circumstances arise. You can increase 
> producer parameter `max.block.ms` to increase this timeout.
> org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> {code}
> These same brokers are used by many other streams without any issue, 
> including some in the very same processes for the stream which consistently 
> throws this exception.
> *UPDATE 08/16:*
> The very first instance of this error is August 13th 2019, 17:03:36.754 and 
> it happened for 4 different streams. For 3 of these streams, the error only 
> happened once, and then the stream recovered. For the 4th stream, the error 
> has continued to happen, and continues to happen now.
> I looked up the broker logs for this time, and see that at August 13th 2019, 
> 16:47:43, two of four brokers started reporting messages like this, for 
> multiple partitions:
> [2019-08-13 20:47:43,658] INFO [ReplicaFetcher replicaId=3, leaderId=1, 
> fetcherId=0] Retrying leaderEpoch request for partition xxx-1 as the leader 
> reported an error: UNKNOWN_LEADER_EPOCH (kafka.server.ReplicaFetcherThread)
> The UNKNOWN_LEADER_EPOCH messages continued for some time, and then stopped, 
> here is a view of the count of these messages over time:
>  !screenshot-1.png! 
> However, as noted, the stream task timeout error continues to happen.
> I use the static consumer group protocol with Kafka 2.3.0 clients and 2.3.0 
> broker. The broker has a patch for KAFKA-8773.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Updated] (KAFKA-8803) Stream will not start due to TimeoutException: Timeout expired after 60000milliseconds while awaiting InitProducerId

2019-08-20 Thread Raman Gupta (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raman Gupta updated KAFKA-8803:
---
Description: 
One streams app is consistently failing at startup with the following exception:

{code}
2019-08-14 17:02:29,568 ERROR --- [2ce1b-StreamThread-2] 
org.apa.kaf.str.pro.int.StreamTask: task [0_36] Timeout 
exception caught when initializing transactions for task 0_36. This might 
happen if the broker is slow to respond, if the network connection to the 
broker was interrupted, or if similar circumstances arise. You can increase 
producer parameter `max.block.ms` to increase this timeout.
org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
60000milliseconds while awaiting InitProducerId
{code}

These same brokers are used by many other streams without any issue, including 
some in the very same processes for the stream which consistently throws this 
exception.

*UPDATE 08/16:*

The very first instance of this error is August 13th 2019, 17:03:36.754 and it 
happened for 4 different streams. For 3 of these streams, the error only 
happened once, and then the stream recovered. For the 4th stream, the error has 
continued to happen, and continues to happen now.

I looked up the broker logs for this time, and see that at August 13th 2019, 
16:47:43, two of four brokers started reporting messages like this, for 
multiple partitions:

[2019-08-13 20:47:43,658] INFO [ReplicaFetcher replicaId=3, leaderId=1, 
fetcherId=0] Retrying leaderEpoch request for partition xxx-1 as the leader 
reported an error: UNKNOWN_LEADER_EPOCH (kafka.server.ReplicaFetcherThread)

The UNKNOWN_LEADER_EPOCH messages continued for some time, and then stopped, 
here is a view of the count of these messages over time:

 !screenshot-1.png! 

However, as noted, the stream task timeout error continues to happen.

I use the static consumer group protocol with Kafka 2.3.0 clients and 2.3.0 
broker. The broker has a patch for KAFKA-8773.

  was:
One streams app is consistently failing at startup with the following exception:

{code}
2019-08-14 17:02:29,568 ERROR --- [2ce1b-StreamThread-2] 
org.apa.kaf.str.pro.int.StreamTask: task [0_36] Timeout 
exception caught when initializing transactions for task 0_36. This might 
happen if the broker is slow to respond, if the network connection to the 
broker was interrupted, or if similar circumstances arise. You can increase 
producer parameter `max.block.ms` to increase this timeout.
org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
60000milliseconds while awaiting InitProducerId
{code}

These same brokers are used by many other streams without any issue, including 
some in the very same processes for the stream which consistently throws this 
exception.

*UPDATE 08/16:*

The very first instance of this error is August 13th 2019, 17:03:36.754 and it 
happened for 4 different streams. For 3 of these streams, the error only 
happened once, and then the stream recovered. For the 4th stream, the error has 
continued to happen, and continues to happen now.

I looked up the broker logs for this time, and see that at August 13th 2019, 
16:47:43, two of four brokers started reporting messages like this, for 
multiple partitions:

[2019-08-13 20:47:43,658] INFO [ReplicaFetcher replicaId=3, leaderId=1, 
fetcherId=0] Retrying leaderEpoch request for partition xxx-1 as the leader 
reported an error: UNKNOWN_LEADER_EPOCH (kafka.server.ReplicaFetcherThread)

One of these brokers only reported 2 cases, the other reported many many 
thousands.

The UNKNOWN_LEADER_EPOCH messages continued for some time, and then stopped, 
here is a view of the count of these messages over time:

 !screenshot-1.png! 

However, as noted, the stream task timeout error continues to happen.

I use the static consumer group protocol with Kafka 2.3.0 clients and 2.3.0 
broker. The broker has a patch for KAFKA-8773.


> Stream will not start due to TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> 
>
> Key: KAFKA-8803
> URL: https://issues.apache.org/jira/browse/KAFKA-8803
> Project: Kafka
>  Issue Type: Bug
>Reporter: Raman Gupta
>Priority: Major
> Attachments: logs.txt.gz, screenshot-1.png
>
>
> One streams app is consistently failing at startup with the following 
> exception:
> {code}
> 2019-08-14 17:02:29,568 ERROR --- [2ce1b-StreamThread-2] 
> org.apa.kaf.str.pro.int.StreamTask: task [0_36] Timeout 
> exception caught when initializing transactions for task 0_36. This might 
> happen if the broker is slow to respond, if the network connection to the 
> broker was interrupted, or if similar circumstances 

[jira] [Commented] (KAFKA-8766) Allow a custom offset policy for Kafka Streams applications

2019-08-16 Thread Raman Gupta (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909316#comment-16909316
 ] 

Raman Gupta commented on KAFKA-8766:


See KAFKA-8650 for related discussion.

> Allow a custom offset policy for Kafka Streams applications 
> 
>
> Key: KAFKA-8766
> URL: https://issues.apache.org/jira/browse/KAFKA-8766
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Patrik Kleindl
>Priority: Minor
>
> Currently when starting a new streams application (= new consumer group) you 
> can only choose between starting from the beginning of all topics or only 
> processing newly arriving records.
> To start processing at any give point in the past (e.g. only processing data 
> of the last month) the application has to be started (so the consumer group 
> exists), stopped, the offsets reset and then restarted.
> It would be helpful if this could be passed in with the help of some kind of 
> "offset reset strategy" which could be provided by the user.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
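
Until such an offset reset strategy exists, the workaround described above (create the group, stop it, reset the offsets, restart) can be scripted. The sketch below is one way to do it, assuming a plain KafkaConsumer commits timestamp-based offsets for the stopped group before the application is started; the bootstrap server, group id, topic, and timestamp are placeholders, and for a Streams application the group id is the application.id.

{code:java}
import java.time.Instant;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.stream.Collectors;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class ResetGroupToTimestamp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-streams-app");          // placeholder
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");

        // e.g. "only process data of the last month"
        long startFrom = Instant.parse("2019-07-14T00:00:00Z").toEpochMilli();

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            List<TopicPartition> partitions = consumer.partitionsFor("input-topic").stream()
                    .map(p -> new TopicPartition(p.topic(), p.partition()))
                    .collect(Collectors.toList());
            // Find the earliest offset at or after the chosen timestamp per partition.
            Map<TopicPartition, Long> query = partitions.stream()
                    .collect(Collectors.toMap(tp -> tp, tp -> startFrom));
            Map<TopicPartition, OffsetAndMetadata> newOffsets = new HashMap<>();
            consumer.offsetsForTimes(query).forEach((tp, offsetAndTs) -> {
                if (offsetAndTs != null) {
                    newOffsets.put(tp, new OffsetAndMetadata(offsetAndTs.offset()));
                }
            });
            // Commit on behalf of the group; this only succeeds while the group is empty.
            consumer.commitSync(newOffsets);
        }
    }
}
{code}

The same effect is available from the kafka-consumer-groups tool with --reset-offsets --to-datetime while the group is empty.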


[jira] [Updated] (KAFKA-8803) Stream will not start due to TimeoutException: Timeout expired after 60000milliseconds while awaiting InitProducerId

2019-08-16 Thread Raman Gupta (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raman Gupta updated KAFKA-8803:
---
Description: 
One streams app is consistently failing at startup with the following exception:

{code}
2019-08-14 17:02:29,568 ERROR --- [2ce1b-StreamThread-2] 
org.apa.kaf.str.pro.int.StreamTask: task [0_36] Timeout 
exception caught when initializing transactions for task 0_36. This might 
happen if the broker is slow to respond, if the network connection to the 
broker was interrupted, or if similar circumstances arise. You can increase 
producer parameter `max.block.ms` to increase this timeout.
org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
60000milliseconds while awaiting InitProducerId
{code}

These same brokers are used by many other streams without any issue, including 
some in the very same processes for the stream which consistently throws this 
exception.

*UPDATE 08/16:*

The very first instance of this error is August 13th 2019, 17:03:36.754 and it 
happened for 4 different streams. For 3 of these streams, the error only 
happened once, and then the stream recovered. For the 4th stream, the error has 
continued to happen, and continues to happen now.

I looked up the broker logs for this time, and see that at August 13th 2019, 
16:47:43, two of four brokers started reporting messages like this, for 
multiple partitions:

[2019-08-13 20:47:43,658] INFO [ReplicaFetcher replicaId=3, leaderId=1, 
fetcherId=0] Retrying leaderEpoch request for partition xxx-1 as the leader 
reported an error: UNKNOWN_LEADER_EPOCH (kafka.server.ReplicaFetcherThread)

One of these brokers only reported 2 cases, the other reported many many 
thousands.

The UNKNOWN_LEADER_EPOCH messages continued for some time, and then stopped, 
here is a view of the count of these messages over time:

 !screenshot-1.png! 

However, as noted, the stream task timeout error continues to happen.

I use the static consumer group protocol with Kafka 2.3.0 clients and 2.3.0 
broker. The broker has a patch for KAFKA-8773.

  was:
One streams app is consistently failing at startup with the following exception:

{code}
2019-08-14 17:02:29,568 ERROR --- [2ce1b-StreamThread-2] 
org.apa.kaf.str.pro.int.StreamTask: task [0_36] Timeout 
exception caught when initializing transactions for task 0_36. This might 
happen if the broker is slow to respond, if the network connection to the 
broker was interrupted, or if similar circumstances arise. You can increase 
producer parameter `max.block.ms` to increase this timeout.
org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
60000milliseconds while awaiting InitProducerId
{code}

These same brokers are used by many other streams without any issue, including 
some in the very same processes for the stream which consistently throws this 
exception.

*UPDATE 08/16:*

The very first instance of this error is August 13th 2019, 17:03:36.754 and it 
happened for 4 different streams. For 3 of these streams, the error only 
happened once, and then the stream recovered. For the 4th stream, the error has 
continued to happen, and continues to happen now.

I looked up the broker logs for this time, and see that at August 13th 2019, 
16:47:43, two of four brokers started reporting messages like this, for 
multiple partitions:

[2019-08-13 20:47:43,658] INFO [ReplicaFetcher replicaId=3, leaderId=1, 
fetcherId=0] Retrying leaderEpoch request for partition xxx-1 as the leader 
reported an error: UNKNOWN_LEADER_EPOCH (kafka.server.ReplicaFetcherThread)

One of these brokers only reported 2 cases, the other reported many many 
thousands.

The UNKNOWN_LEADER_EPOCH messages continued for some time, and then stopped, 
here is a view of the count of these messages over time:

 !screenshot-1.png! 



I use the static consumer group protocol with Kafka 2.3.0 clients and 2.3.0 
broker. The broker has a patch for KAFKA-8773.


> Stream will not start due to TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> 
>
> Key: KAFKA-8803
> URL: https://issues.apache.org/jira/browse/KAFKA-8803
> Project: Kafka
>  Issue Type: Bug
>Reporter: Raman Gupta
>Priority: Major
> Attachments: screenshot-1.png
>
>
> One streams app is consistently failing at startup with the following 
> exception:
> {code}
> 2019-08-14 17:02:29,568 ERROR --- [2ce1b-StreamThread-2] 
> org.apa.kaf.str.pro.int.StreamTask: task [0_36] Timeout 
> exception caught when initializing transactions for task 0_36. This might 
> happen if the broker is slow to respond, if the network connection to the 
> broker was interrupted, or if similar circumstances 

[jira] [Updated] (KAFKA-8803) Stream will not start due to TimeoutException: Timeout expired after 60000milliseconds while awaiting InitProducerId

2019-08-16 Thread Raman Gupta (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raman Gupta updated KAFKA-8803:
---
Attachment: screenshot-1.png

> Stream will not start due to TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> 
>
> Key: KAFKA-8803
> URL: https://issues.apache.org/jira/browse/KAFKA-8803
> Project: Kafka
>  Issue Type: Bug
>Reporter: Raman Gupta
>Priority: Major
> Attachments: screenshot-1.png
>
>
> One streams app is consistently failing at startup with the following 
> exception:
> {code}
> 2019-08-14 17:02:29,568 ERROR --- [2ce1b-StreamThread-2] 
> org.apa.kaf.str.pro.int.StreamTask: task [0_36] Timeout 
> exception caught when initializing transactions for task 0_36. This might 
> happen if the broker is slow to respond, if the network connection to the 
> broker was interrupted, or if similar circumstances arise. You can increase 
> producer parameter `max.block.ms` to increase this timeout.
> org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> {code}
> These same brokers are used by many other streams without any issue, 
> including some in the very same processes for the stream which consistently 
> throws this exception.
> I use the static consumer group protocol with Kafka 2.3.0 clients and 2.3.0 
> broker. The broker has a patch for KAFKA-8773.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (KAFKA-8803) Stream will not start due to TimeoutException: Timeout expired after 60000milliseconds while awaiting InitProducerId

2019-08-16 Thread Raman Gupta (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raman Gupta updated KAFKA-8803:
---
Description: 
One streams app is consistently failing at startup with the following exception:

{code}
2019-08-14 17:02:29,568 ERROR --- [2ce1b-StreamThread-2] 
org.apa.kaf.str.pro.int.StreamTask: task [0_36] Timeout 
exception caught when initializing transactions for task 0_36. This might 
happen if the broker is slow to respond, if the network connection to the 
broker was interrupted, or if similar circumstances arise. You can increase 
producer parameter `max.block.ms` to increase this timeout.
org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
60000milliseconds while awaiting InitProducerId
{code}

These same brokers are used by many other streams without any issue, including 
some in the very same processes for the stream which consistently throws this 
exception.

*UPDATE 08/16:*

The very first instance of this error is August 13th 2019, 17:03:36.754 and it 
happened for 4 different streams. For 3 of these streams, the error only 
happened once, and then the stream recovered. For the 4th stream, the error has 
continued to happen, and continues to happen now.

I looked up the broker logs for this time, and see that at August 13th 2019, 
16:47:43, two of four brokers started reporting messages like this, for 
multiple partitions:

[2019-08-13 20:47:43,658] INFO [ReplicaFetcher replicaId=3, leaderId=1, 
fetcherId=0] Retrying leaderEpoch request for partition xxx-1 as the leader 
reported an error: UNKNOWN_LEADER_EPOCH (kafka.server.ReplicaFetcherThread)

One of these brokers only reported 2 cases, the other reported many many 
thousands.

The UNKNOWN_LEADER_EPOCH messages continued for some time, and then stopped, 
here is a view of the count of these messages over time:

 !screenshot-1.png! 



I use the static consumer group protocol with Kafka 2.3.0 clients and 2.3.0 
broker. The broker has a patch for KAFKA-8773.

  was:
One streams app is consistently failing at startup with the following exception:

{code}
2019-08-14 17:02:29,568 ERROR --- [2ce1b-StreamThread-2] 
org.apa.kaf.str.pro.int.StreamTask: task [0_36] Timeout 
exception caught when initializing transactions for task 0_36. This might 
happen if the broker is slow to respond, if the network connection to the 
broker was interrupted, or if similar circumstances arise. You can increase 
producer parameter `max.block.ms` to increase this timeout.
org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
60000milliseconds while awaiting InitProducerId
{code}

These same brokers are used by many other streams without any issue, including 
some in the very same processes for the stream which consistently throws this 
exception.

I use the static consumer group protocol with Kafka 2.3.0 clients and 2.3.0 
broker. The broker has a patch for KAFKA-8773.


> Stream will not start due to TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> 
>
> Key: KAFKA-8803
> URL: https://issues.apache.org/jira/browse/KAFKA-8803
> Project: Kafka
>  Issue Type: Bug
>Reporter: Raman Gupta
>Priority: Major
> Attachments: screenshot-1.png
>
>
> One streams app is consistently failing at startup with the following 
> exception:
> {code}
> 2019-08-14 17:02:29,568 ERROR --- [2ce1b-StreamThread-2] 
> org.apa.kaf.str.pro.int.StreamTask: task [0_36] Timeout 
> exception caught when initializing transactions for task 0_36. This might 
> happen if the broker is slow to respond, if the network connection to the 
> broker was interrupted, or if similar circumstances arise. You can increase 
> producer parameter `max.block.ms` to increase this timeout.
> org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> {code}
> These same brokers are used by many other streams without any issue, 
> including some in the very same processes for the stream which consistently 
> throws this exception.
> *UPDATE 08/16:*
> The very first instance of this error is August 13th 2019, 17:03:36.754 and 
> it happened for 4 different streams. For 3 of these streams, the error only 
> happened once, and then the stream recovered. For the 4th stream, the error 
> has continued to happen, and continues to happen now.
> I looked up the broker logs for this time, and see that at August 13th 2019, 
> 16:47:43, two of four brokers started reporting messages like this, for 
> multiple partitions:
> [2019-08-13 20:47:43,658] INFO [ReplicaFetcher replicaId=3, leaderId=1, 
> fetcherId=0] Retrying leaderEpoch request for partition 

[jira] [Created] (KAFKA-8803) Stream will not start due to TimeoutException: Timeout expired after 60000milliseconds while awaiting InitProducerId

2019-08-14 Thread Raman Gupta (JIRA)
Raman Gupta created KAFKA-8803:
--

 Summary: Stream will not start due to TimeoutException: Timeout 
expired after 60000milliseconds while awaiting InitProducerId
 Key: KAFKA-8803
 URL: https://issues.apache.org/jira/browse/KAFKA-8803
 Project: Kafka
  Issue Type: Bug
Reporter: Raman Gupta


One streams app is consistently failing at startup with the following exception:

{code}
2019-08-14 17:02:29,568 ERROR --- [2ce1b-StreamThread-2] 
org.apa.kaf.str.pro.int.StreamTask: task [0_36] Timeout 
exception caught when initializing transactions for task 0_36. This might 
happen if the broker is slow to respond, if the network connection to the 
broker was interrupted, or if similar circumstances arise. You can increase 
producer parameter `max.block.ms` to increase this timeout.
org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
60000milliseconds while awaiting InitProducerId
{code}

These same brokers are used by many other streams without any issue, including 
some in the very same processes for the stream which consistently throws this 
exception.

I use the static consumer group protocol with Kafka 2.3.0 clients and 2.3.0 
broker. The broker has a patch for KAFKA-8773.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (KAFKA-8797) BufferUnderflowException: Error reading field 'version' from consumer

2019-08-13 Thread Raman Gupta (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raman Gupta updated KAFKA-8797:
---
Affects Version/s: 2.3.0

> BufferUnderflowException: Error reading field 'version' from consumer
> -
>
> Key: KAFKA-8797
> URL: https://issues.apache.org/jira/browse/KAFKA-8797
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 2.3.0
>Reporter: Raman Gupta
>Priority: Major
>
> Occasionally I get these errors from my 2.3.0 consumers, talking to 2.3.0 
> brokers:
> {code}
> 2019-08-08 16:56:47,235 ERROR — 
> red.mic.kaf.AbstractKafkaAutoCommitConsumerService: Exception in polling 
> thread. Will die for safety.
> EXCEPTION: org.apache.kafka.common.protocol.types.SchemaException: Error 
> reading field 'version': java.nio.BufferUnderflowException
> at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:110) 
> ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerProtocol.deserializeAssignment(ConsumerProtocol.java:106)
>  ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:262)
>  ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:424)
>  ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:358)
>  ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:353)
>  ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.updateAssignmentMetadataIfNeeded(KafkaConsumer.java:1251)
>  ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1216) 
> ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1201) 
> ~[kafka-clients-2.3.0.jar:?]
> at 
> com.redock.microservice.kafka.BasicCommitAfterProcessingConsumer.run(BasicCommitAfterProcessingConsumer.kt:51)
>  ~[classes/:?]
> at 
> com.redock.microservice.kafka.AbstractKafkaAutoCommitConsumerService$start$2.invokeSuspend(AbstractKafkaAutoCommitConsumerService.kt:44)
>  [classes/:?]
> ... suppressed 2 lines
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
> at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:264) 
> [?:?]
> at java.util.concurrent.FutureTask.run(FutureTask.java) [?:?]
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
>  [?:?]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>  [?:?]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>  [?:?]
> at java.lang.Thread.run(Thread.java:834) [?:?]
> ]]
> {code}
> It seems to happen randomly in consumer restart situations. I use static 
> consumer groups.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
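
For context, the "static consumer groups" mentioned above refer to static membership (KIP-345, available since 2.3.0): each instance supplies a stable group.instance.id so that a restart within the session timeout does not trigger an immediate rebalance. A minimal sketch of the relevant consumer settings, with placeholder values:

{code:java}
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;

public class StaticMembershipConfig {
    public static Properties consumerProps(String podName) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-consumer-group");       // placeholder
        // Static membership: a stable id per instance (e.g. the Kubernetes pod name)
        // lets a restarted consumer rejoin with the same identity instead of forcing
        // a rebalance, as long as it returns within session.timeout.ms.
        props.put(ConsumerConfig.GROUP_INSTANCE_ID_CONFIG, podName);
        props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, "30000");
        return props;
    }
}
{code}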


[jira] [Updated] (KAFKA-8797) BufferUnderflowException: Error reading field 'version' from consumer

2019-08-13 Thread Raman Gupta (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raman Gupta updated KAFKA-8797:
---
Component/s: consumer

> BufferUnderflowException: Error reading field 'version' from consumer
> -
>
> Key: KAFKA-8797
> URL: https://issues.apache.org/jira/browse/KAFKA-8797
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 2.3.0
>Reporter: Raman Gupta
>Priority: Major
>
> Occasionally I get these errors from my 2.3.0 consumers, talking to 2.3.0 
> brokers:
> {code}
> 2019-08-08 16:56:47,235 ERROR — 
> red.mic.kaf.AbstractKafkaAutoCommitConsumerService: Exception in polling 
> thread. Will die for safety.
> EXCEPTION: org.apache.kafka.common.protocol.types.SchemaException: Error 
> reading field 'version': java.nio.BufferUnderflowException
> at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:110) 
> ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerProtocol.deserializeAssignment(ConsumerProtocol.java:106)
>  ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:262)
>  ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:424)
>  ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:358)
>  ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:353)
>  ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.updateAssignmentMetadataIfNeeded(KafkaConsumer.java:1251)
>  ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1216) 
> ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1201) 
> ~[kafka-clients-2.3.0.jar:?]
> at 
> com.redock.microservice.kafka.BasicCommitAfterProcessingConsumer.run(BasicCommitAfterProcessingConsumer.kt:51)
>  ~[classes/:?]
> at 
> com.redock.microservice.kafka.AbstractKafkaAutoCommitConsumerService$start$2.invokeSuspend(AbstractKafkaAutoCommitConsumerService.kt:44)
>  [classes/:?]
> ... suppressed 2 lines
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
> at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:264) 
> [?:?]
> at java.util.concurrent.FutureTask.run(FutureTask.java) [?:?]
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
>  [?:?]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>  [?:?]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>  [?:?]
> at java.lang.Thread.run(Thread.java:834) [?:?]
> ]]
> {code}
> It seems to happen randomly in consumer restart situations. I use static 
> consumer groups.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (KAFKA-8797) BufferUnderflowException: Error reading field 'version' from consumer

2019-08-13 Thread Raman Gupta (JIRA)
Raman Gupta created KAFKA-8797:
--

 Summary: BufferUnderflowException: Error reading field 'version' 
from consumer
 Key: KAFKA-8797
 URL: https://issues.apache.org/jira/browse/KAFKA-8797
 Project: Kafka
  Issue Type: Bug
Reporter: Raman Gupta


Occasionally I get these errors from my 2.3.0 consumers, talking to 2.3.0 
brokers:

{code}
2019-08-08 16:56:47,235 ERROR — 
red.mic.kaf.AbstractKafkaAutoCommitConsumerService: Exception in polling 
thread. Will die for safety.
EXCEPTION: org.apache.kafka.common.protocol.types.SchemaException: Error 
reading field 'version': java.nio.BufferUnderflowException
at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:110) 
~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.internals.ConsumerProtocol.deserializeAssignment(ConsumerProtocol.java:106)
 ~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:262)
 ~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:424)
 ~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:358)
 ~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:353)
 ~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.KafkaConsumer.updateAssignmentMetadataIfNeeded(KafkaConsumer.java:1251)
 ~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1216) 
~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1201) 
~[kafka-clients-2.3.0.jar:?]
at 
com.redock.microservice.kafka.BasicCommitAfterProcessingConsumer.run(BasicCommitAfterProcessingConsumer.kt:51)
 ~[classes/:?]
at 
com.redock.microservice.kafka.AbstractKafkaAutoCommitConsumerService$start$2.invokeSuspend(AbstractKafkaAutoCommitConsumerService.kt:44)
 [classes/:?]
... suppressed 2 lines
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) 
[?:?]
at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:264) [?:?]
at java.util.concurrent.FutureTask.run(FutureTask.java) [?:?]
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
 [?:?]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) 
[?:?]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) 
[?:?]
at java.lang.Thread.run(Thread.java:834) [?:?]
]]
{code}

It seems to happen randomly in consumer restart situations. I use static 
consumer groups.




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (KAFKA-8789) kafka-console-consumer timeout-ms setting behaves incorrectly with newer client

2019-08-13 Thread Raman Gupta (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raman Gupta updated KAFKA-8789:
---
Summary: kafka-console-consumer timeout-ms setting behaves incorrectly with 
newer client  (was: kafka-console-consumer timeout-ms setting behaves 
incorrectly with older client)

> kafka-console-consumer timeout-ms setting behaves incorrectly with newer 
> client
> ---
>
> Key: KAFKA-8789
> URL: https://issues.apache.org/jira/browse/KAFKA-8789
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.3.0
>Reporter: Raman Gupta
>Assignee: Lee Dongjin
>Priority: Major
>
> I have a topic with about 20,000 events in it, running on a Kafka 2.3.0 
> broker. When I run the following console-consumer command using the older Kafka 
> client included in Confluent 5.0.3:
> bin/kafka-console-consumer \
>   --bootstrap-server $KAFKA \
>   --topic x \
>   --from-beginning --max-messages 1 \
>   --timeout-ms 15000
> I get 1 message as expected.
> However, when running the exact same command using the console consumer 
> included with Confluent 5.3.0, I get 
> org.apache.kafka.common.errors.TimeoutException, and 0 messages processed.
> NOTE: I am using the Confluent distribution of Kafka for the client side 
> tools, specifically Confluent 5.0.3 and Confluent 5.3.0. I can certainly try 
> to replicate with a vanilla Kafka if necessary.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (KAFKA-8793) StickyTaskAssignor throws java.lang.ArithmeticException

2019-08-13 Thread Raman Gupta (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raman Gupta updated KAFKA-8793:
---
Description: 
Occasionally when starting a streams consumer that uses the static consumer 
group protocol, I get the following error:
{code:java}
2019-08-13 06:06:43,527 ERROR --- [691d2-StreamThread-1] 
org.apa.kaf.str.pro.int.StreamThread  : stream-thread 
[prod-cisSegmenter-777489d8-6cc5-48b4-8771-868d873691d2-StreamThread-1] 
Encountered the following error during processing:
EXCEPTION: java.lang.ArithmeticException: / by zero
at 
org.apache.kafka.streams.processor.internals.assignment.StickyTaskAssignor.assignActive(StickyTaskAssignor.java:76)
 ~[kafka-streams-2.3.0.jar:?]
at 
org.apache.kafka.streams.processor.internals.assignment.StickyTaskAssignor.assign(StickyTaskAssignor.java:52)
 ~[kafka-streams-2.3.0.jar:?]
at 
org.apache.kafka.streams.processor.internals.StreamsPartitionAssignor.assign(StreamsPartitionAssignor.java:634)
 ~[kafka-streams-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.performAssignment(ConsumerCoordinator.java:424)
 ~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator.onJoinLeader(AbstractCoordinator.java:622)
 ~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator.access$1100(AbstractCoordinator.java:107)
 ~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator$JoinGroupResponseHandler.handle(AbstractCoordinator.java:544)
 ~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator$JoinGroupResponseHandler.handle(AbstractCoordinator.java:527)
 ~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:978)
 ~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:958)
 ~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:204)
 ~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:167)
 ~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:127)
 ~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.fireCompletion(ConsumerNetworkClient.java:578)
 ~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.firePendingCompletedRequests(ConsumerNetworkClient.java:388)
 ~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:294)
 ~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:233)
 ~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:212)
 ~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:415)
 ~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:358)
 ~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:353)
 ~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.KafkaConsumer.updateAssignmentMetadataIfNeeded(KafkaConsumer.java:1251)
 ~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1216) 
~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1201) 
~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:941)
 ~[kafka-streams-2.3.0.jar:?]
at 
org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:850)
 ~[kafka-streams-2.3.0.jar:?]
at 
org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:805)
 ~[kafka-streams-2.3.0.jar:?]
at 
org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:774)
 [kafka-streams-2.3.0.jar:?]
{code}

It seems to happen after a restart of a process containing a stream. It does 
not happen consistently, but it does happen somewhat regularly.

My Kafka server is 2.3.0, with a patch for KAFKA-8715.

  was:
Occasionally when starting a streams consumer that uses the static consumer 
group 

[jira] [Created] (KAFKA-8793) StickyTaskAssignor throws java.lang.ArithmeticException

2019-08-13 Thread Raman Gupta (JIRA)
Raman Gupta created KAFKA-8793:
--

 Summary: StickyTaskAssignor throws java.lang.ArithmeticException
 Key: KAFKA-8793
 URL: https://issues.apache.org/jira/browse/KAFKA-8793
 Project: Kafka
  Issue Type: Bug
  Components: streams
Affects Versions: 2.3.0
Reporter: Raman Gupta


Occasionally when starting a streams consumer that uses the static consumer 
group protocol, I get the following error:
{code:java}
2019-08-13 06:06:43,527 ERROR --- [691d2-StreamThread-1] 
org.apa.kaf.str.pro.int.StreamThread : stream-thread 
[prod-cisSegmenter-777489d8-6cc5-48b4-8771-868d873691d2-StreamThread-1] 
Encountered the following error during processing:
EXCEPTION: java.lang.ArithmeticException: / by zero
at 
org.apache.kafka.streams.processor.internals.assignment.StickyTaskAssignor.assignActive(StickyTaskAssignor.java:76)
 ~[kafka-streams-2.3.0.jar:?]
at 
org.apache.kafka.streams.processor.internals.assignment.StickyTaskAssignor.assign(StickyTaskAssignor.java:52)
 ~[kafka-streams-2.3.0.jar:?]
at 
org.apache.kafka.streams.processor.internals.StreamsPartitionAssignor.assign(StreamsPartitionAssignor.java:634)
 ~[kafka-streams-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.performAssignment(ConsumerCoordinator.java:424)
 ~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator.onJoinLeader(AbstractCoordinator.java:622)
 ~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator.access$1100(AbstractCoordinator.java:107)
 ~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator$JoinGroupResponseHandler.handle(AbstractCoordinator.java:544)
 ~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator$JoinGroupResponseHandler.handle(AbstractCoordinator.java:527)
 ~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:978)
 ~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:958)
 ~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:204)
 ~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:167)
 ~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:127)
 ~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.fireCompletion(ConsumerNetworkClient.java:578)
 ~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.firePendingCompletedRequests(ConsumerNetworkClient.java:388)
 ~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:294)
 ~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:233)
 ~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:212)
 ~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:415)
 ~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:358)
 ~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:353)
 ~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.KafkaConsumer.updateAssignmentMetadataIfNeeded(KafkaConsumer.java:1251)
 ~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1216) 
~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1201) 
~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:941)
 ~[kafka-streams-2.3.0.jar:?]
at 
org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:850)
 ~[kafka-streams-2.3.0.jar:?]
at 
org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:805)
 ~[kafka-streams-2.3.0.jar:?]
at 
org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:774)
 [kafka-streams-2.3.0.jar:?]
{code}

It seems to happen after a restart of a process containing a stream. It does 
not happen consistently, but it does happen somewhat regularly.

My Kafka server is 2.3.0, with a patch for KAFKA-8715.
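
For context, the "/ by zero" points at an integer division somewhere in the 
active-task assignment. Below is a minimal sketch of how such a division can blow 
up, assuming the divisor is the summed capacity of all participating clients; the 
class and method names are invented for illustration, this is not the actual 
StickyTaskAssignor code.

{code:java}
import java.util.Collection;

final class AssignorSketch {
    // Hypothetical illustration: dividing the active task count by the total
    // client capacity throws ArithmeticException ("/ by zero") whenever the
    // summed capacity is 0, e.g. if no client reports any consumer threads
    // while a rebalance races with a process restart.
    static int tasksPerThread(final int activeTaskCount, final Collection<Integer> clientCapacities) {
        final int totalCapacity = clientCapacities.stream().mapToInt(Integer::intValue).sum();
        return activeTaskCount / totalCapacity; // ArithmeticException when totalCapacity == 0
    }
}
{code}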



--
This message was sent by 

[jira] [Comment Edited] (KAFKA-8789) kafka-console-consumer timeout-ms setting behaves incorrectly with older client

2019-08-12 Thread Raman Gupta (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905510#comment-16905510
 ] 

Raman Gupta edited comment on KAFKA-8789 at 8/12/19 11:11 PM:
--

The problem seems to be related to the `--timeout-ms` parameter.

In every case, the total time for the command to run is pretty much the same:
{code:java}
# Confluent 5.0.3 with no timeout

confluent-5.0.3 $ time kafka-console-consumer <...> --from-beginning 
--max-messages 1
[...]
Processed a total of 1 messages
2.60user 0.22system 0:32.15elapsed 8%CPU (0avgtext+0avgdata 145764maxresident)k
0inputs+0outputs (0major+33989minor)pagefaults 0swaps

# Confluent 5.3.0 with no timeout

confluent-5.3.0 $ time kafka-console-consumer <...> --from-beginning 
--max-messages 1
[...]
Processed a total of 1 messages
3.09user 0.28system 0:32.43elapsed 10%CPU (0avgtext+0avgdata 176440maxresident)k
0inputs+0outputs (0major+40773minor)pagefaults 0swaps

# Confluent 5.0.3 with 15s timeout

confluent-5.0.3 $ time kafka-console-consumer <...> --from-beginning 
--max-messages 1 --timeout-ms 15000
[...]
Processed a total of 1 messages
2.58user 0.24system 0:32.29elapsed 8%CPU (0avgtext+0avgdata 144780maxresident)k
0inputs+0outputs (0major+33562minor)pagefaults 0swaps

# Confluent 5.3.0 with 15s timeout

confluent-5.3.0 $ time kafka-console-consumer <...> --from-beginning 
--max-messages 1 --timeout-ms 15000
[2019-08-12 15:19:51,214] ERROR Error processing message, terminating consumer 
process: (kafka.tools.ConsoleConsumer$:76)
org.apache.kafka.common.errors.TimeoutException
Processed a total of 0 messages
2.09user 0.17system 0:31.47elapsed 7%CPU (0avgtext+0avgdata 149300maxresident)k
0inputs+8outputs (0major+33949minor)pagefaults 0swaps

# Confluent 5.3.0 with 45s timeout works

confluent-5.3.0 $ time kafka-avro-console-consumer <...> --from-beginning 
--max-messages 1 --timeout-ms 45000
Processed a total of 1 messages
3.12user 0.27system 0:32.55elapsed 10%CPU (0avgtext+0avgdata 178252maxresident)k
0inputs+0outputs (0major+41263minor)pagefaults 0swaps
{code}
but newer client versions appear to need a longer timeout to work correctly. Has 
the behavior of the `--timeout-ms` parameter changed in some way?
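
For what it's worth, the timing above is consistent with `--timeout-ms` acting as 
an idle timeout between received messages rather than a bound on total runtime. A 
rough sketch of that semantics follows; it is only an illustration under that 
assumption, not the actual ConsoleConsumer code.

{code:java}
import java.time.Duration;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.common.errors.TimeoutException;

final class IdleTimeoutLoop {
    // If no record arrives within a single timeoutMs window, give up -- so a
    // slower first fetch in a newer client can trip a 15s timeout even though
    // the total elapsed time of the command barely changes.
    static <K, V> int drain(final Consumer<K, V> consumer, final long timeoutMs, final int maxMessages) {
        int received = 0;
        while (received < maxMessages) {
            final ConsumerRecords<K, V> records = consumer.poll(Duration.ofMillis(timeoutMs));
            if (records.isEmpty()) {
                throw new TimeoutException(); // mirrors the "Processed a total of 0 messages" failure
            }
            received += records.count();
        }
        return received;
    }
}
{code}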

As an aside, the only reason I need the timeout here is that this command is 
part of a Unix pipeline, and I need it to exit when there are no more messages 
to read. Unfortunately, there doesn't appear to be any way to do that except to 
set a timeout, so having the smallest timeout possible that will likely give me 
all the messages is desirable.
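
One timeout-free alternative is to capture the end offsets at startup and stop 
once the consumer's position reaches them. The sketch below is an assumed 
workaround using the plain Java consumer API, not something the console consumer 
offers; the topic name "x" is taken from the report and "localhost:9092" is a 
placeholder.

{code:java}
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.UUID;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class DrainTopic {
    public static void main(String[] args) {
        final Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "drain-" + UUID.randomUUID());
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            final List<TopicPartition> partitions = new ArrayList<>();
            for (PartitionInfo p : consumer.partitionsFor("x")) { // topic "x" as in the report
                partitions.add(new TopicPartition(p.topic(), p.partition()));
            }
            consumer.assign(partitions);
            consumer.seekToBeginning(partitions);
            // Snapshot the end offsets once; exit as soon as every partition is caught up.
            final Map<TopicPartition, Long> end = consumer.endOffsets(partitions);
            while (!caughtUp(consumer, end)) {
                for (ConsumerRecord<String, String> r : consumer.poll(Duration.ofSeconds(1))) {
                    System.out.println(r.value());
                }
            }
        }
    }

    private static boolean caughtUp(final KafkaConsumer<?, ?> consumer, final Map<TopicPartition, Long> end) {
        for (Map.Entry<TopicPartition, Long> e : end.entrySet()) {
            if (consumer.position(e.getKey()) < e.getValue()) {
                return false;
            }
        }
        return true;
    }
}
{code}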


was (Author: rocketraman):
I'm reopening this. The problem does not appear to be with the SR tooling at 
all, but rather that the console consumer for the Kafka version included in 
Confluent 5.3.0 is a lot slower than the console consumer in 5.0.3. Using a 
timeout of 15s is consistently enough to read all messages on the topic in 
5.0.3 but has to be at least 60s in 5.3.0, against the same brokers and with 
the same parameters.

Also interesting is that the total time for the command to run is pretty much 
the same:
{code:java}
# Confluent 5.0.3 with no timeout

confluent-5.0.3 $ time kafka-console-consumer <...> --from-beginning 
--max-messages 1
[...]
Processed a total of 1 messages
2.60user 0.22system 0:32.15elapsed 8%CPU (0avgtext+0avgdata 145764maxresident)k
0inputs+0outputs (0major+33989minor)pagefaults 0swaps

# Confluent 5.3.0 with no timeout

confluent-5.3.0 $ time kafka-console-consumer <...> --from-beginning 
--max-messages 1
[...]
Processed a total of 1 messages
3.09user 0.28system 0:32.43elapsed 10%CPU (0avgtext+0avgdata 176440maxresident)k
0inputs+0outputs (0major+40773minor)pagefaults 0swaps

# Confluent 5.0.3 with 15s timeout

confluent-5.0.3 $ time kafka-console-consumer <...> --from-beginning 
--max-messages 1 --timeout-ms 15000
[...]
Processed a total of 1 messages
2.58user 0.24system 0:32.29elapsed 8%CPU (0avgtext+0avgdata 144780maxresident)k
0inputs+0outputs (0major+33562minor)pagefaults 0swaps

# Confluent 5.3.0 with 15s timeout

confluent-5.3.0 $ time kafka-console-consumer <...> --from-beginning 
--max-messages 1 --timeout-ms 15000
[2019-08-12 15:19:51,214] ERROR Error processing message, terminating consumer 
process: (kafka.tools.ConsoleConsumer$:76)
org.apache.kafka.common.errors.TimeoutException
Processed a total of 0 messages
2.09user 0.17system 0:31.47elapsed 7%CPU (0avgtext+0avgdata 149300maxresident)k
0inputs+8outputs (0major+33949minor)pagefaults 0swaps

# Confluent 5.3.0 with 45s timeout works

confluent-5.3.0 $ time kafka-avro-console-consumer <...> --from-beginning 
--max-messages 1 --timeout-ms 45000
Processed a total of 1 messages
3.12user 0.27system 0:32.55elapsed 10%CPU (0avgtext+0avgdata 178252maxresident)k
0inputs+0outputs (0major+41263minor)pagefaults 0swaps
{code}
so perhaps the behavior of the `--timeout-ms` parameter has changed?

As an aside, the only 

[jira] [Issue Comment Deleted] (KAFKA-8789) kafka-console-consumer timeout-ms setting behaves incorrectly with older client

2019-08-12 Thread Raman Gupta (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raman Gupta updated KAFKA-8789:
---
Comment: was deleted

(was: And the same behavior for the regular console consumer:
{code:java}
confluent-5.3.0 $ time bin/kafka-console-consumer <...> --from-beginning 
--max-messages 1 --timeout-ms 15000
[2019-08-12 16:57:04,777] ERROR Error processing message, terminating consumer 
process:  (kafka.tools.ConsoleConsumer$)
org.apache.kafka.common.errors.TimeoutException
Processed a total of 0 messages
1.97user 0.23system 0:31.48elapsed 7%CPU (0avgtext+0avgdata 150260maxresident)k
0inputs+0outputs (0major+34637minor)pagefaults 0swaps{code})

> kafka-console-consumer timeout-ms setting behaves incorrectly with older 
> client
> ---
>
> Key: KAFKA-8789
> URL: https://issues.apache.org/jira/browse/KAFKA-8789
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.3.0
>Reporter: Raman Gupta
>Priority: Major
>
> I have a topic with about 20,000 events in it, running on a Kafka 2.3.0 
> broker. When I run the following tools command using the older Kafka client 
> included in Confluent 5.0.3.
> bin/kafka-console-consumer \ 
>   --bootstrap-server $KAFKA \ 
>   --topic x \ 
>   --from-beginning --max-messages 1 \
>  --timeout-ms 15000
> I get 1 message as expected.
> However, when running the exact same command using the console consumer 
> included with Confluent 5.3.0, I get 
> org.apache.kafka.common.errors.TimeoutException, and 0 messages processed.
> NOTE: I am using the Confluent distribution of Kafka for the client side 
> tools, specifically Confluent 5.0.3 and Confluent 5.3.0. I can certainly try 
> to replicate with a vanilla Kafka if necessary.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Comment Edited] (KAFKA-8789) kafka-console-consumer timeout-ms setting behaves incorrectly with older client

2019-08-12 Thread Raman Gupta (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905510#comment-16905510
 ] 

Raman Gupta edited comment on KAFKA-8789 at 8/12/19 11:09 PM:
--

I'm reopening this. The problem does not appear to be with the SR tooling at 
all, but rather that the console consumer for the Kafka version included in 
Confluent 5.3.0 is a lot slower than the console consumer in 5.0.3. Using a 
timeout of 15s is consistently enough to read all messages on the topic in 
5.0.3 but has to be at least 60s in 5.3.0, against the same brokers and with 
the same parameters.

Also interesting is that the total time for the command to run is pretty much 
the same:
{code:java}
# Confluent 5.0.3 with no timeout

confluent-5.0.3 $ time kafka-console-consumer <...> --from-beginning 
--max-messages 1
[...]
Processed a total of 1 messages
2.60user 0.22system 0:32.15elapsed 8%CPU (0avgtext+0avgdata 145764maxresident)k
0inputs+0outputs (0major+33989minor)pagefaults 0swaps

# Confluent 5.3.0 with no timeout

confluent-5.3.0 $ time kafka-console-consumer <...> --from-beginning 
--max-messages 1
[...]
Processed a total of 1 messages
3.09user 0.28system 0:32.43elapsed 10%CPU (0avgtext+0avgdata 176440maxresident)k
0inputs+0outputs (0major+40773minor)pagefaults 0swaps

# Confluent 5.0.3 with 15s timeout

confluent-5.0.3 $ time kafka-console-consumer <...> --from-beginning 
--max-messages 1 --timeout-ms 15000
[...]
Processed a total of 1 messages
2.58user 0.24system 0:32.29elapsed 8%CPU (0avgtext+0avgdata 144780maxresident)k
0inputs+0outputs (0major+33562minor)pagefaults 0swaps

# Confluent 5.3.0 with 15s timeout

confluent-5.3.0 $ time kafka-console-consumer <...> --from-beginning 
--max-messages 1 --timeout-ms 15000
[2019-08-12 15:19:51,214] ERROR Error processing message, terminating consumer 
process: (kafka.tools.ConsoleConsumer$:76)
org.apache.kafka.common.errors.TimeoutException
Processed a total of 0 messages
2.09user 0.17system 0:31.47elapsed 7%CPU (0avgtext+0avgdata 149300maxresident)k
0inputs+8outputs (0major+33949minor)pagefaults 0swaps

# Confluent 5.3.0 with 45s timeout works

confluent-5.3.0 $ time kafka-avro-console-consumer <...> --from-beginning 
--max-messages 1 --timeout-ms 45000
Processed a total of 1 messages
3.12user 0.27system 0:32.55elapsed 10%CPU (0avgtext+0avgdata 178252maxresident)k
0inputs+0outputs (0major+41263minor)pagefaults 0swaps
{code}
so perhaps the behavior of the `--timeout-ms` parameter has changed?

As an aside, the only reason I need the timeout here is because this command is 
part of a unix pipeline, and I need it to exit when there are no more messages 
to read. Unfortunately, there doesn't appear to be any way to do that except to 
set a timeout, so having the smallest timeout possible that will likely give me 
all the messages is desirable.


was (Author: rocketraman):
I'm reopening this. The problem does not appear to be with the SR tooling at 
all, but rather that the console consumer for the Kafka version included in 
Confluent 5.3.0 is a lot slower than the console consumer in 5.0.3. Using a 
timeout of 15s is consistently enough to read all messages on the topic in 
5.0.3 but has to be at least 60s in 5.3.0, against the same brokers and with 
the same parameters.

Also interesting is that the total time for the command to run is pretty much 
the same:
{code:java}
confluent-5.0.3 $ time kafka-avro-console-consumer <...> --from-beginning 
--max-messages 1
[...]
Processed a total of 1 messages
2.60user 0.22system 0:32.15elapsed 8%CPU (0avgtext+0avgdata 145764maxresident)k
0inputs+0outputs (0major+33989minor)pagefaults 0swaps

confluent-5.3.0 $ time kafka-avro-console-consumer <...> --from-beginning 
--max-messages 1
[...]
Processed a total of 1 messages
3.09user 0.28system 0:32.43elapsed 10%CPU (0avgtext+0avgdata 176440maxresident)k
0inputs+0outputs (0major+40773minor)pagefaults 0swaps

confluent-5.0.3 $ time kafka-avro-console-consumer <...> --from-beginning 
--max-messages 1 --timeout-ms 15000
[...]
Processed a total of 1 messages
2.58user 0.24system 0:32.29elapsed 8%CPU (0avgtext+0avgdata 144780maxresident)k
0inputs+0outputs (0major+33562minor)pagefaults 0swaps

confluent-5.3.0 $ time kafka-avro-console-consumer <...> --from-beginning 
--max-messages 1 --timeout-ms 15000
[2019-08-12 15:19:51,214] ERROR Error processing message, terminating consumer 
process: (kafka.tools.ConsoleConsumer$:76)
org.apache.kafka.common.errors.TimeoutException
[2019-08-12 15:19:51,214] ERROR Error processing message, terminating consumer 
process: (kafka.tools.ConsoleConsumer$:76)
org.apache.kafka.common.errors.TimeoutException
Processed a total of 0 messages
2.09user 0.17system 0:31.47elapsed 7%CPU (0avgtext+0avgdata 149300maxresident)k
0inputs+8outputs (0major+33949minor)pagefaults 0swaps

confluent-5.3.0 $ time kafka-avro-console-consumer <...> --from-beginning 
--max-messages 1 

[jira] [Updated] (KAFKA-8789) kafka-console-consumer timeout-ms setting behaves incorrectly with older client

2019-08-12 Thread Raman Gupta (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raman Gupta updated KAFKA-8789:
---
Description: 
I have a topic with about 20,000 events in it, running on a Kafka 2.3.0 broker. 
When I run the following tools command using the older Kafka client included in 
Confluent 5.0.3:

bin/kafka-console-consumer \ 
  --bootstrap-server $KAFKA \ 
  --topic x \ 
  --from-beginning --max-messages 1 \
 --timeout-ms 15000

I get 1 message as expected.

However, when running the exact same command using the console consumer 
included with Confluent 5.3.0, I get 
org.apache.kafka.common.errors.TimeoutException, and 0 messages processed.

NOTE: I am using the Confluent distribution of Kafka for the client side tools, 
specifically Confluent 5.0.3 and Confluent 5.3.0. I can certainly try to 
replicate with a vanilla Kafka if necessary.

  was:
I have a topic with about 20,000 events in it. When I run the following tools 
command using Kafka 2.

bin/kafka-avro-console-consumer \ 
  --bootstrap-server $KAFKA --property schema.registry.url=$SCHEMAREGISTRY \ 
  --topic $TOPICPREFIX-user-clickstream-events-ui-v2 \ 
  --from-beginning --max-messages 100 \
  --isolation-level read_committed --skip-message-on-error \
  --timeout-ms 15000

I get 100 messages as expected.

However, when running the exact same command using Kafka 2.3.0 I get 
org.apache.kafka.common.errors.TimeoutException, and 0 messages processed.

The version of Kafka on the server is 2.3.0.

NOTE: I am using the Confluent distribution of Kafka for the client side tools, 
specifically Confluent 5.0.3 and Confluent 5.3.0. I can certainly try to 
replicate with a vanilla Kafka if necessary.


> kafka-console-consumer timeout-ms setting behaves incorrectly with older 
> client
> ---
>
> Key: KAFKA-8789
> URL: https://issues.apache.org/jira/browse/KAFKA-8789
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.3.0
>Reporter: Raman Gupta
>Priority: Major
>
> I have a topic with about 20,000 events in it, running on a Kafka 2.3.0 
> broker. When I run the following tools command using the older Kafka client 
> included in Confluent 5.0.3.
> bin/kafka-console-consumer \ 
>   --bootstrap-server $KAFKA \ 
>   --topic x \ 
>   --from-beginning --max-messages 1 \
>  --timeout-ms 15000
> I get 1 message as expected.
> However, when running the exact same command using the console consumer 
> included with Confluent 5.3.0, I get 
> org.apache.kafka.common.errors.TimeoutException, and 0 messages processed.
> NOTE: I am using the Confluent distribution of Kafka for the client side 
> tools, specifically Confluent 5.0.3 and Confluent 5.3.0. I can certainly try 
> to replicate with a vanilla Kafka if necessary.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Issue Comment Deleted] (KAFKA-8789) kafka-console-consumer timeout-ms setting behaves incorrectly with older client

2019-08-12 Thread Raman Gupta (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raman Gupta updated KAFKA-8789:
---
Comment: was deleted

(was: UPDATE: The error message may be coming from the schema registry, which 
would put it outside the purview of the Kafka project. Sorry for the noise.

For future Googlers, I created this issue instead: 
[https://github.com/confluentinc/schema-registry/issues/1185])

> kafka-console-consumer timeout-ms setting behaves incorrectly with older 
> client
> ---
>
> Key: KAFKA-8789
> URL: https://issues.apache.org/jira/browse/KAFKA-8789
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.3.0
>Reporter: Raman Gupta
>Priority: Major
>
> I have a topic with about 20,000 events in it, running on a Kafka 2.3.0 
> broker. When I run the following tools command using the older Kafka client 
> included in Confluent 5.0.3.
> bin/kafka-console-consumer \ 
>   --bootstrap-server $KAFKA \ 
>   --topic x \ 
>   --from-beginning --max-messages 1 \
>  --timeout-ms 15000
> I get 1 message as expected.
> However, when running the exact same command using the console consumer 
> included with Confluent 5.3.0, I get 
> org.apache.kafka.common.errors.TimeoutException, and 0 messages processed.
> NOTE: I am using the Confluent distribution of Kafka for the client side 
> tools, specifically Confluent 5.0.3 and Confluent 5.3.0. I can certainly try 
> to replicate with a vanilla Kafka if necessary.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (KAFKA-8789) kafka-console-consumer timeout-ms setting behaves incorrectly with older client

2019-08-12 Thread Raman Gupta (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raman Gupta updated KAFKA-8789:
---
Summary: kafka-console-consumer timeout-ms setting behaves incorrectly with 
older client  (was: kafka-console-consumer needs bigger timeout-ms setting in 
order to work)

> kafka-console-consumer timeout-ms setting behaves incorrectly with older 
> client
> ---
>
> Key: KAFKA-8789
> URL: https://issues.apache.org/jira/browse/KAFKA-8789
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.3.0
>Reporter: Raman Gupta
>Priority: Major
>
> I have a topic with about 20,000 events in it. When I run the following tools 
> command using Kafka 2.
> bin/kafka-avro-console-consumer \ 
>   --bootstrap-server $KAFKA --property schema.registry.url=$SCHEMAREGISTRY \ 
>   --topic $TOPICPREFIX-user-clickstream-events-ui-v2 \ 
>   --from-beginning --max-messages 100 \
>   --isolation-level read_committed --skip-message-on-error \
>   --timeout-ms 15000
> I get 100 messages as expected.
> However, when running the exact same command using Kafka 2.3.0 I get 
> org.apache.kafka.common.errors.TimeoutException, and 0 messages processed.
> The version of Kafka on the server is 2.3.0.
> NOTE: I am using the Confluent distribution of Kafka for the client side 
> tools, specifically Confluent 5.0.3 and Confluent 5.3.0. I can certainly try 
> to replicate with a vanilla Kafka if necessary.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (KAFKA-8789) kafka-console-consumer needs bigger timeout-ms setting in order to work

2019-08-12 Thread Raman Gupta (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905652#comment-16905652
 ] 

Raman Gupta commented on KAFKA-8789:


And the same behavior for the regular console consumer:
{code:java}
confluent-5.3.0 $ time bin/kafka-console-consumer <...> --from-beginning 
--max-messages 1 --timeout-ms 15000
[2019-08-12 16:57:04,777] ERROR Error processing message, terminating consumer 
process:  (kafka.tools.ConsoleConsumer$)
org.apache.kafka.common.errors.TimeoutException
Processed a total of 0 messages
1.97user 0.23system 0:31.48elapsed 7%CPU (0avgtext+0avgdata 150260maxresident)k
0inputs+0outputs (0major+34637minor)pagefaults 0swaps{code}

> kafka-console-consumer needs bigger timeout-ms setting in order to work
> ---
>
> Key: KAFKA-8789
> URL: https://issues.apache.org/jira/browse/KAFKA-8789
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.3.0
>Reporter: Raman Gupta
>Priority: Major
>
> I have a topic with about 20,000 events in it. When I run the following tools 
> command using Kafka 2.
> bin/kafka-avro-console-consumer \ 
>   --bootstrap-server $KAFKA --property schema.registry.url=$SCHEMAREGISTRY \ 
>   --topic $TOPICPREFIX-user-clickstream-events-ui-v2 \ 
>   --from-beginning --max-messages 100 \
>   --isolation-level read_committed --skip-message-on-error \
>   --timeout-ms 15000
> I get 100 messages as expected.
> However, when running the exact same command using Kafka 2.3.0 I get 
> org.apache.kafka.common.errors.TimeoutException, and 0 messages processed.
> The version of Kafka on the server is 2.3.0.
> NOTE: I am using the Confluent distribution of Kafka for the client side 
> tools, specifically Confluent 5.0.3 and Confluent 5.3.0. I can certainly try 
> to replicate with a vanilla Kafka if necessary.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Comment Edited] (KAFKA-8789) kafka-console-consumer needs bigger timeout-ms setting in order to work

2019-08-12 Thread Raman Gupta (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905510#comment-16905510
 ] 

Raman Gupta edited comment on KAFKA-8789 at 8/12/19 7:49 PM:
-

I'm reopening this. The problem does not appear to be with the SR tooling at 
all, but rather that the console consumer for the Kafka version included in 
Confluent 5.3.0 is a lot slower than the console consumer in 5.0.3. Using a 
timeout of 15s is consistently enough to read all messages on the topic in 
5.0.3 but has to be at least 60s in 5.3.0, against the same brokers and with 
the same parameters.

Also interesting is that the total time for the command to run is pretty much 
the same:
{code:java}
confluent-5.0.3 $ time kafka-avro-console-consumer <...> --from-beginning 
--max-messages 1
[...]
Processed a total of 1 messages
2.60user 0.22system 0:32.15elapsed 8%CPU (0avgtext+0avgdata 145764maxresident)k
0inputs+0outputs (0major+33989minor)pagefaults 0swaps

confluent-5.3.0 $ time kafka-avro-console-consumer <...> --from-beginning 
--max-messages 1
[...]
Processed a total of 1 messages
3.09user 0.28system 0:32.43elapsed 10%CPU (0avgtext+0avgdata 176440maxresident)k
0inputs+0outputs (0major+40773minor)pagefaults 0swaps

confluent-5.0.3 $ time kafka-avro-console-consumer <...> --from-beginning 
--max-messages 1 --timeout-ms 15000
[...]
Processed a total of 1 messages
2.58user 0.24system 0:32.29elapsed 8%CPU (0avgtext+0avgdata 144780maxresident)k
0inputs+0outputs (0major+33562minor)pagefaults 0swaps

confluent-5.3.0 $ time kafka-avro-console-consumer <...> --from-beginning 
--max-messages 1 --timeout-ms 15000
[2019-08-12 15:19:51,214] ERROR Error processing message, terminating consumer 
process: (kafka.tools.ConsoleConsumer$:76)
org.apache.kafka.common.errors.TimeoutException
[2019-08-12 15:19:51,214] ERROR Error processing message, terminating consumer 
process: (kafka.tools.ConsoleConsumer$:76)
org.apache.kafka.common.errors.TimeoutException
Processed a total of 0 messages
2.09user 0.17system 0:31.47elapsed 7%CPU (0avgtext+0avgdata 149300maxresident)k
0inputs+8outputs (0major+33949minor)pagefaults 0swaps

confluent-5.3.0 $ time kafka-avro-console-consumer <...> --from-beginning 
--max-messages 1 --timeout-ms 45000
Processed a total of 1 messages
3.12user 0.27system 0:32.55elapsed 10%CPU (0avgtext+0avgdata 178252maxresident)k
0inputs+0outputs (0major+41263minor)pagefaults 0swaps
{code}
so perhaps the behavior of the `--timeout-ms` parameter has changed?

As an aside, the only reason I need the timeout here is because this command is 
part of a unix pipeline, and I need it to exit when there are no more messages 
to read. Unfortunately, there doesn't appear to be any way to do that except to 
set a timeout.

 


was (Author: rocketraman):
I'm reopening this. The problem does not appear to be with the SR tooling at 
all, but rather that the console consumer for the Kafka version included in 
Confluent 5.3.0 is a lot slower than the console consumer in 5.0.3. Using a 
timeout of 15s is consistently enough to read all messages on the topic in 
5.0.3 but has to be at least 60s in 5.3.0, against the same brokers and with 
the same parameters.

Also interesting is that the total time for the command to run is pretty much 
the same:
{code:java}
confluent-5.0.3 $ time kafka-avro-console-consumer <...> --from-beginning 
--max-messages 1
[...]
Processed a total of 1 messages
2.60user 0.22system 0:32.15elapsed 8%CPU (0avgtext+0avgdata 145764maxresident)k
0inputs+0outputs (0major+33989minor)pagefaults 0swaps

confluent-5.3.0 $ time kafka-avro-console-consumer <...> --from-beginning 
--max-messages 1
[...]
Processed a total of 1 messages
3.09user 0.28system 0:32.43elapsed 10%CPU (0avgtext+0avgdata 176440maxresident)k
0inputs+0outputs (0major+40773minor)pagefaults 0swaps

confluent-5.0.3 $ time kafka-avro-console-consumer <...> --from-beginning 
--max-messages 1 --timeout-ms 15000
[...]
Processed a total of 1 messages
2.58user 0.24system 0:32.29elapsed 8%CPU (0avgtext+0avgdata 144780maxresident)k
0inputs+0outputs (0major+33562minor)pagefaults 0swaps

confluent-5.3.0 $ time kafka-avro-console-consumer <...> --from-beginning 
--max-messages 1 --timeout-ms 15000
[2019-08-12 15:19:51,214] ERROR Error processing message, terminating consumer 
process: (kafka.tools.ConsoleConsumer$:76)
org.apache.kafka.common.errors.TimeoutException
[2019-08-12 15:19:51,214] ERROR Error processing message, terminating consumer 
process: (kafka.tools.ConsoleConsumer$:76)
org.apache.kafka.common.errors.TimeoutException
Processed a total of 0 messages
2.09user 0.17system 0:31.47elapsed 7%CPU (0avgtext+0avgdata 149300maxresident)k
0inputs+8outputs (0major+33949minor)pagefaults 0swaps{code}
so perhaps the behavior of the `--timeout-ms` parameter has changed?

As an aside, the only reason I need the timeout here is because this command is 
part of a unix pipeline, 

[jira] [Updated] (KAFKA-8789) kafka-console-consumer needs bigger timeout-ms setting in order to work

2019-08-12 Thread Raman Gupta (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raman Gupta updated KAFKA-8789:
---
Summary: kafka-console-consumer needs bigger timeout-ms setting in order to 
work  (was: kafka-console-consumer performance regression)

> kafka-console-consumer needs bigger timeout-ms setting in order to work
> ---
>
> Key: KAFKA-8789
> URL: https://issues.apache.org/jira/browse/KAFKA-8789
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.3.0
>Reporter: Raman Gupta
>Priority: Major
>
> I have a topic with about 20,000 events in it. When I run the following tools 
> command using Kafka 2.
> bin/kafka-avro-console-consumer \ 
>   --bootstrap-server $KAFKA --property schema.registry.url=$SCHEMAREGISTRY \ 
>   --topic $TOPICPREFIX-user-clickstream-events-ui-v2 \ 
>   --from-beginning --max-messages 100 \
>   --isolation-level read_committed --skip-message-on-error \
>   --timeout-ms 15000
> I get 100 messages as expected.
> However, when running the exact same command using Kafka 2.3.0 I get 
> org.apache.kafka.common.errors.TimeoutException, and 0 messages processed.
> The version of Kafka on the server is 2.3.0.
> NOTE: I am using the Confluent distribution of Kafka for the client side 
> tools, specifically Confluent 5.0.3 and Confluent 5.3.0. I can certainly try 
> to replicate with a vanilla Kafka if necessary.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Comment Edited] (KAFKA-8789) kafka-console-consumer performance regression

2019-08-12 Thread Raman Gupta (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905510#comment-16905510
 ] 

Raman Gupta edited comment on KAFKA-8789 at 8/12/19 7:23 PM:
-

I'm reopening this. The problem does not appear to be with the SR tooling at 
all, but rather that the console consumer for the Kafka version included in 
Confluent 5.3.0 is a lot slower than the console consumer in 5.0.3. Using a 
timeout of 15s is consistently enough to read all messages on the topic in 
5.0.3 but has to be at least 60s in 5.3.0, against the same brokers and with 
the same parameters.

Also interesting is that the total time for the command to run is pretty much 
the same:
{code:java}
confluent-5.0.3 $ time kafka-avro-console-consumer <...> --from-beginning 
--max-messages 1
[...]
Processed a total of 1 messages
2.60user 0.22system 0:32.15elapsed 8%CPU (0avgtext+0avgdata 145764maxresident)k
0inputs+0outputs (0major+33989minor)pagefaults 0swaps

confluent-5.3.0 $ time kafka-avro-console-consumer <...> --from-beginning 
--max-messages 1
[...]
Processed a total of 1 messages
3.09user 0.28system 0:32.43elapsed 10%CPU (0avgtext+0avgdata 176440maxresident)k
0inputs+0outputs (0major+40773minor)pagefaults 0swaps

confluent-5.0.3 $ time kafka-avro-console-consumer <...> --from-beginning 
--max-messages 1 --timeout-ms 15000
[...]
Processed a total of 1 messages
2.58user 0.24system 0:32.29elapsed 8%CPU (0avgtext+0avgdata 144780maxresident)k
0inputs+0outputs (0major+33562minor)pagefaults 0swaps

confluent-5.3.0 $ time kafka-avro-console-consumer <...> --from-beginning 
--max-messages 1 --timeout-ms 15000
[2019-08-12 15:19:51,214] ERROR Error processing message, terminating consumer 
process: (kafka.tools.ConsoleConsumer$:76)
org.apache.kafka.common.errors.TimeoutException
[2019-08-12 15:19:51,214] ERROR Error processing message, terminating consumer 
process: (kafka.tools.ConsoleConsumer$:76)
org.apache.kafka.common.errors.TimeoutException
Processed a total of 0 messages
2.09user 0.17system 0:31.47elapsed 7%CPU (0avgtext+0avgdata 149300maxresident)k
0inputs+8outputs (0major+33949minor)pagefaults 0swaps{code}
so perhaps the behavior of the `--timeout-ms` parameter has changed?

As an aside, the only reason I need the timeout here is because this command is 
part of a unix pipeline, and I need it to exit when there are no more messages 
to read. Unfortunately, there doesn't appear to be any way to do that except to 
set a timeout.

 


was (Author: rocketraman):
I'm reopening this. The problem does not appear to be with the SR tooling at 
all, but rather that the console consumer for the Kafka version included in 
Confluent 5.3.0 is a lot slower than the console consumer in 5.0.3. Using a 
timeout of 15s is consistently enough to read all messages on the topic in 
5.0.3 but has to be at least 60s in 5.3.0, against the same brokers and with 
the same parameters.

Ok more information here... the problem appears to be that the 5.3.0 client 
needs a much bigger timeout to work consistently i.e. 60s instead of 15s. At 
15s, the 5.0.3 client works consistently, but the 5.3.0 client times out every 
time. The SR is simply never called because the consumer never receives any 
messages within the timeout. So this may be a performance regression with Kafka 
and is not related to the SR, or a change in the behavior of the `--timeout-ms` 
command.

Also interesting is that the total time for the command to run is pretty much 
the same:
{code:java}
confluent-5.0.3 $ time kafka-avro-console-consumer <...> --from-beginning 
--max-messages 1
[...]
Processed a total of 1 messages
2.60user 0.22system 0:32.15elapsed 8%CPU (0avgtext+0avgdata 145764maxresident)k
0inputs+0outputs (0major+33989minor)pagefaults 0swaps

confluent-5.3.0 $ time kafka-avro-console-consumer <...> --from-beginning 
--max-messages 1
[...]
Processed a total of 1 messages
3.09user 0.28system 0:32.43elapsed 10%CPU (0avgtext+0avgdata 176440maxresident)k
0inputs+0outputs (0major+40773minor)pagefaults 0swaps

confluent-5.0.3 $ time kafka-avro-console-consumer <...> --from-beginning 
--max-messages 1 --timeout-ms 15000
[...]
Processed a total of 1 messages
2.58user 0.24system 0:32.29elapsed 8%CPU (0avgtext+0avgdata 144780maxresident)k
0inputs+0outputs (0major+33562minor)pagefaults 0swaps

confluent-5.3.0 $ time kafka-avro-console-consumer <...> --from-beginning 
--max-messages 1 --timeout-ms 15000
[2019-08-12 15:19:51,214] ERROR Error processing message, terminating consumer 
process: (kafka.tools.ConsoleConsumer$:76)
org.apache.kafka.common.errors.TimeoutException
[2019-08-12 15:19:51,214] ERROR Error processing message, terminating consumer 
process: (kafka.tools.ConsoleConsumer$:76)
org.apache.kafka.common.errors.TimeoutException
Processed a total of 0 messages
2.09user 0.17system 0:31.47elapsed 7%CPU (0avgtext+0avgdata 149300maxresident)k
0inputs+8outputs 

[jira] [Comment Edited] (KAFKA-8789) kafka-console-consumer performance regression

2019-08-12 Thread Raman Gupta (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905510#comment-16905510
 ] 

Raman Gupta edited comment on KAFKA-8789 at 8/12/19 7:22 PM:
-

I'm reopening this. The problem does not appear to be with the SR tooling at 
all, but rather that the console consumer for the Kafka version included in 
Confluent 5.3.0 is a lot slower than the console consumer in 5.0.3. Using a 
timeout of 15s is consistently enough to read all messages on the topic in 
5.0.3 but has to be at least 60s in 5.3.0, against the same brokers and with 
the same parameters.

OK, more information here: the problem appears to be that the 5.3.0 client 
needs a much bigger timeout to work consistently, i.e. 60s instead of 15s. At 
15s, the 5.0.3 client works consistently, but the 5.3.0 client times out every 
time. The SR is simply never called because the consumer never receives any 
messages within the timeout. So this may be a performance regression with Kafka 
and is not related to the SR, or a change in the behavior of the `--timeout-ms` 
parameter.

Also interesting is that the total time for the command to run is pretty much 
the same:
{code:java}
confluent-5.0.3 $ time kafka-avro-console-consumer <...> --from-beginning 
--max-messages 1
[...]
Processed a total of 1 messages
2.60user 0.22system 0:32.15elapsed 8%CPU (0avgtext+0avgdata 145764maxresident)k
0inputs+0outputs (0major+33989minor)pagefaults 0swaps

confluent-5.3.0 $ time kafka-avro-console-consumer <...> --from-beginning 
--max-messages 1
[...]
Processed a total of 1 messages
3.09user 0.28system 0:32.43elapsed 10%CPU (0avgtext+0avgdata 176440maxresident)k
0inputs+0outputs (0major+40773minor)pagefaults 0swaps

confluent-5.0.3 $ time kafka-avro-console-consumer <...> --from-beginning 
--max-messages 1 --timeout-ms 15000
[...]
Processed a total of 1 messages
2.58user 0.24system 0:32.29elapsed 8%CPU (0avgtext+0avgdata 144780maxresident)k
0inputs+0outputs (0major+33562minor)pagefaults 0swaps

confluent-5.3.0 $ time kafka-avro-console-consumer <...> --from-beginning 
--max-messages 1 --timeout-ms 15000
[2019-08-12 15:19:51,214] ERROR Error processing message, terminating consumer 
process: (kafka.tools.ConsoleConsumer$:76)
org.apache.kafka.common.errors.TimeoutException
[2019-08-12 15:19:51,214] ERROR Error processing message, terminating consumer 
process: (kafka.tools.ConsoleConsumer$:76)
org.apache.kafka.common.errors.TimeoutException
Processed a total of 0 messages
2.09user 0.17system 0:31.47elapsed 7%CPU (0avgtext+0avgdata 149300maxresident)k
0inputs+8outputs (0major+33949minor)pagefaults 0swaps{code}

so perhaps the behavior of the `--timeout-ms` parameter has changed?

As an aside, the only reason I need the timeout here is because this command is 
part of a unix pipeline, and I need it to exit when there are no more messages 
to read. Unfortunately, there doesn't appear to be any way to do that except to 
set a timeout.

 


was (Author: rocketraman):
I'm reopening this. The problem does not appear to be with the SR tooling at 
all, but rather that the console consumer for the Kafka version included in 
Confluent 5.3.0 is a lot slower than the console consumer in 5.0.3. Using a 
timeout of 15s is consistently enough to read all messages on the topic in 
5.0.3 but has to be at least 60s in 5.3.0, against the same brokers and with 
the same parameters.

Ok more information here... the problem appears to be that the 5.3.0 client 
needs a much bigger timeout to work consistently i.e. 60s instead of 15s. At 
15s, the 5.0.3 client works consistently, but the 5.3.0 client times out every 
time. The SR is simply never called because the consumer never receives any 
messages within the timeout. So this may be a performance regression with Kafka 
and is not related to the SR, or a change in the behavior of the `--timeout-ms` 
command.

Also interesting is that the total time for the command to run is pretty much 
the same:

```
confluent-5.0.3 $ time kafka-avro-console-consumer <...> --from-beginning 
--max-messages 1
[...]
Processed a total of 1 messages
2.60user 0.22system 0:32.15elapsed 8%CPU (0avgtext+0avgdata 145764maxresident)k
0inputs+0outputs (0major+33989minor)pagefaults 0swaps

confluent-5.3.0 $ time kafka-avro-console-consumer <...> --from-beginning 
--max-messages 1
[...]
Processed a total of 1 messages
3.09user 0.28system 0:32.43elapsed 10%CPU (0avgtext+0avgdata 176440maxresident)k
0inputs+0outputs (0major+40773minor)pagefaults 0swaps

confluent-5.0.3 $ time kafka-avro-console-consumer <...> --from-beginning 
--max-messages 1 --timeout-ms 15000
[...]
Processed a total of 1 messages
2.58user 0.24system 0:32.29elapsed 8%CPU (0avgtext+0avgdata 144780maxresident)k
0inputs+0outputs (0major+33562minor)pagefaults 0swaps

confluent-5.3.0 $ time kafka-avro-console-consumer <...> --from-beginning 
--max-messages 1 --timeout-ms 15000

[jira] [Commented] (KAFKA-8789) kafka-console-consumer performance regression

2019-08-12 Thread Raman Gupta (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905510#comment-16905510
 ] 

Raman Gupta commented on KAFKA-8789:


I'm reopening this. The problem does not appear to be with the SR tooling at 
all, but rather that the console consumer for the Kafka version included in 
Confluent 5.3.0 is a lot slower than the console consumer in 5.0.3. Using a 
timeout of 15s is consistently enough to read all messages on the topic in 
5.0.3 but has to be at least 60s in 5.3.0, against the same brokers and with 
the same parameters.

OK, more information here: the problem appears to be that the 5.3.0 client 
needs a much bigger timeout to work consistently, i.e. 60s instead of 15s. At 
15s, the 5.0.3 client works consistently, but the 5.3.0 client times out every 
time. The SR is simply never called because the consumer never receives any 
messages within the timeout. So this may be a performance regression with Kafka 
and is not related to the SR, or a change in the behavior of the `--timeout-ms` 
parameter.

Also interesting is that the total time for the command to run is pretty much 
the same:

```
confluent-5.0.3 $ time kafka-avro-console-consumer <...> --from-beginning 
--max-messages 1
[...]
Processed a total of 1 messages
2.60user 0.22system 0:32.15elapsed 8%CPU (0avgtext+0avgdata 145764maxresident)k
0inputs+0outputs (0major+33989minor)pagefaults 0swaps

confluent-5.3.0 $ time kafka-avro-console-consumer <...> --from-beginning 
--max-messages 1
[...]
Processed a total of 1 messages
3.09user 0.28system 0:32.43elapsed 10%CPU (0avgtext+0avgdata 176440maxresident)k
0inputs+0outputs (0major+40773minor)pagefaults 0swaps

confluent-5.0.3 $ time kafka-avro-console-consumer <...> --from-beginning 
--max-messages 1 --timeout-ms 15000
[...]
Processed a total of 1 messages
2.58user 0.24system 0:32.29elapsed 8%CPU (0avgtext+0avgdata 144780maxresident)k
0inputs+0outputs (0major+33562minor)pagefaults 0swaps

confluent-5.3.0 $ time kafka-avro-console-consumer <...> --from-beginning 
--max-messages 1 --timeout-ms 15000
[2019-08-12 15:19:51,214] ERROR Error processing message, terminating consumer 
process: (kafka.tools.ConsoleConsumer$:76)
org.apache.kafka.common.errors.TimeoutException
[2019-08-12 15:19:51,214] ERROR Error processing message, terminating consumer 
process: (kafka.tools.ConsoleConsumer$:76)
org.apache.kafka.common.errors.TimeoutException
Processed a total of 0 messages
2.09user 0.17system 0:31.47elapsed 7%CPU (0avgtext+0avgdata 149300maxresident)k
0inputs+8outputs (0major+33949minor)pagefaults 0swaps
```

so perhaps the behavior of the `--timeout-ms` parameter has changed?

As an aside, the only reason I need the timeout here is because this command is 
part of a unix pipeline, and I need it to exit when there are no more messages 
to read. Unfortunately, there doesn't appear to be any way to do that except to 
set a timeout.

 

> kafka-console-consumer performance regression
> -
>
> Key: KAFKA-8789
> URL: https://issues.apache.org/jira/browse/KAFKA-8789
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.3.0
>Reporter: Raman Gupta
>Priority: Major
>
> I have a topic with about 20,000 events in it. When I run the following tools 
> command using Kafka 2.
> bin/kafka-avro-console-consumer \ 
>   --bootstrap-server $KAFKA --property schema.registry.url=$SCHEMAREGISTRY \ 
>   --topic $TOPICPREFIX-user-clickstream-events-ui-v2 \ 
>   --from-beginning --max-messages 100 \
>   --isolation-level read_committed --skip-message-on-error \
>   --timeout-ms 15000
> I get 100 messages as expected.
> However, when running the exact same command using Kafka 2.3.0 I get 
> org.apache.kafka.common.errors.TimeoutException, and 0 messages processed.
> The version of Kafka on the server is 2.3.0.
> NOTE: I am using the Confluent distribution of Kafka for the client side 
> tools, specifically Confluent 5.0.3 and Confluent 5.3.0. I can certainly try 
> to replicate with a vanilla Kafka if necessary.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Comment Edited] (KAFKA-8789) kafka-console-consumer performance regression

2019-08-12 Thread Raman Gupta (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905343#comment-16905343
 ] 

Raman Gupta edited comment on KAFKA-8789 at 8/12/19 7:03 PM:
-

UPDATE: The error message may be coming from the schema registry, which would 
put it outside the purview of the Kafka project. Sorry for the noise.

For future Googlers, I created this issue instead: 
[https://github.com/confluentinc/schema-registry/issues/1185]


was (Author: rocketraman):
UPDATE: The error message may be coming from the schema registry, which would 
put it outside the purview of the Kafka project. I think the Confluent distro 
might be override the schema.registyr.url property or something? Sorry for the 
noise.

For future Googlers, I created this issue instead: 
[https://github.com/confluentinc/schema-registry/issues/1185]

> kafka-console-consumer performance regression
> -
>
> Key: KAFKA-8789
> URL: https://issues.apache.org/jira/browse/KAFKA-8789
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.3.0
>Reporter: Raman Gupta
>Priority: Major
>
> I have a topic with about 20,000 events in it. When I run the following tools 
> command using Kafka 2.
> bin/kafka-avro-console-consumer \ 
>   --bootstrap-server $KAFKA --property schema.registry.url=$SCHEMAREGISTRY \ 
>   --topic $TOPICPREFIX-user-clickstream-events-ui-v2 \ 
>   --from-beginning --max-messages 100 \
>   --isolation-level read_committed --skip-message-on-error \
>   --timeout-ms 15000
> I get 100 messages as expected.
> However, when running the exact same command using Kafka 2.3.0 I get 
> org.apache.kafka.common.errors.TimeoutException, and 0 messages processed.
> The version of Kafka on the server is 2.3.0.
> NOTE: I am using the Confluent distribution of Kafka for the client side 
> tools, specifically Confluent 5.0.3 and Confluent 5.3.0. I can certainly try 
> to replicate with a vanilla Kafka if necessary.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (KAFKA-8789) kafka-console-consumer performance regression

2019-08-12 Thread Raman Gupta (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raman Gupta updated KAFKA-8789:
---
Summary: kafka-console-consumer performance regression  (was: 
kafka-avro-console-consumer works with 2.0.x, but not 2.3.x)

> kafka-console-consumer performance regression
> -
>
> Key: KAFKA-8789
> URL: https://issues.apache.org/jira/browse/KAFKA-8789
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.3.0
>Reporter: Raman Gupta
>Priority: Major
>
> I have a topic with about 20,000 events in it. When I run the following tools 
> command using Kafka 2.
> bin/kafka-avro-console-consumer \ 
>   --bootstrap-server $KAFKA --property schema.registry.url=$SCHEMAREGISTRY \ 
>   --topic $TOPICPREFIX-user-clickstream-events-ui-v2 \ 
>   --from-beginning --max-messages 100 \
>   --isolation-level read_committed --skip-message-on-error \
>   --timeout-ms 15000
> I get 100 messages as expected.
> However, when running the exact same command using Kafka 2.3.0 I get 
> org.apache.kafka.common.errors.TimeoutException, and 0 messages processed.
> The version of Kafka on the server is 2.3.0.
> NOTE: I am using the Confluent distribution of Kafka for the client side 
> tools, specifically Confluent 5.0.3 and Confluent 5.3.0. I can certainly try 
> to replicate with a vanilla Kafka if necessary.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Reopened] (KAFKA-8789) kafka-avro-console-consumer works with 2.0.x, but not 2.3.x

2019-08-12 Thread Raman Gupta (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raman Gupta reopened KAFKA-8789:


> kafka-avro-console-consumer works with 2.0.x, but not 2.3.x
> ---
>
> Key: KAFKA-8789
> URL: https://issues.apache.org/jira/browse/KAFKA-8789
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.3.0
>Reporter: Raman Gupta
>Priority: Major
>
> I have a topic with about 20,000 events in it. When I run the following tools 
> command using Kafka 2.
> bin/kafka-avro-console-consumer \ 
>   --bootstrap-server $KAFKA --property schema.registry.url=$SCHEMAREGISTRY \ 
>   --topic $TOPICPREFIX-user-clickstream-events-ui-v2 \ 
>   --from-beginning --max-messages 100 \
>   --isolation-level read_committed --skip-message-on-error \
>   --timeout-ms 15000
> I get 100 messages as expected.
> However, when running the exact same command using Kafka 2.3.0 I get 
> org.apache.kafka.common.errors.TimeoutException, and 0 messages processed.
> The version of Kafka on the server is 2.3.0.
> NOTE: I am using the Confluent distribution of Kafka for the client side 
> tools, specifically Confluent 5.0.3 and Confluent 5.3.0. I can certainly try 
> to replicate with a vanilla Kafka if necessary.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Comment Edited] (KAFKA-8789) kafka-avro-console-consumer works with 2.0.x, but not 2.3.x

2019-08-12 Thread Raman Gupta (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905343#comment-16905343
 ] 

Raman Gupta edited comment on KAFKA-8789 at 8/12/19 5:06 PM:
-

UPDATE: The error message may be coming from the schema registry, which would 
put it outside the purview of the Kafka project. I think the Confluent distro 
might be overriding the schema.registry.url property or something? Sorry for the 
noise.

For future Googlers, I created this issue instead: 
[https://github.com/confluentinc/schema-registry/issues/1185]


was (Author: rocketraman):
UPDATE: The error message may be coming from the schema registry, which would 
put it outside the purview of the Kafka project. I think the Confluent distro 
might be override the schema.registyr.url property or something? Sorry for the 
noise.

> kafka-avro-console-consumer works with 2.0.x, but not 2.3.x
> ---
>
> Key: KAFKA-8789
> URL: https://issues.apache.org/jira/browse/KAFKA-8789
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.3.0
>Reporter: Raman Gupta
>Priority: Major
>
> I have a topic with about 20,000 events in it. When I run the following tools 
> command using Kafka 2.
> bin/kafka-avro-console-consumer \ 
>   --bootstrap-server $KAFKA --property schema.registry.url=$SCHEMAREGISTRY \ 
>   --topic $TOPICPREFIX-user-clickstream-events-ui-v2 \ 
>   --from-beginning --max-messages 100 \
>   --isolation-level read_committed --skip-message-on-error \
>   --timeout-ms 15000
> I get 100 messages as expected.
> However, when running the exact same command using Kafka 2.3.0 I get 
> org.apache.kafka.common.errors.TimeoutException, and 0 messages processed.
> The version of Kafka on the server is 2.3.0.
> NOTE: I am using the Confluent distribution of Kafka for the client side 
> tools, specifically Confluent 5.0.3 and Confluent 5.3.0. I can certainly try 
> to replicate with a vanilla Kafka if necessary.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Comment Edited] (KAFKA-8789) kafka-avro-console-consumer works with 2.0.x, but not 2.3.x

2019-08-12 Thread Raman Gupta (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905343#comment-16905343
 ] 

Raman Gupta edited comment on KAFKA-8789 at 8/12/19 4:16 PM:
-

UPDATE: The error message may be coming from the schema registry, which would 
put it outside the purview of the Kafka project. I think the Confluent distro 
might be overriding the schema.registry.url property or something? Sorry for the 
noise.


was (Author: rocketraman):
UPDATE: The error message may be coming from the schema registry, which would 
put it outside the purview of the Kafka project. I think the Confluent distro 
might be overriding the schema.registry.url or something? Sorry for the noise.

> kafka-avro-console-consumer works with 2.0.x, but not 2.3.x
> ---
>
> Key: KAFKA-8789
> URL: https://issues.apache.org/jira/browse/KAFKA-8789
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.3.0
>Reporter: Raman Gupta
>Priority: Major
>
> I have a topic with about 20,000 events in it. When I run the following tools 
> command using Kafka 2.0.x:
> bin/kafka-avro-console-consumer \ 
>   --bootstrap-server $KAFKA --property schema.registry.url=$SCHEMAREGISTRY \ 
>   --topic $TOPICPREFIX-user-clickstream-events-ui-v2 \ 
>   --from-beginning --max-messages 100 \
>   --isolation-level read_committed --skip-message-on-error \
>   --timeout-ms 15000
> I get 100 messages as expected.
> However, when running the exact same command using Kafka 2.3.0 I get 
> org.apache.kafka.common.errors.TimeoutException, and 0 messages processed.
> The version of Kafka on the server is 2.3.0.
> NOTE: I am using the Confluent distribution of Kafka for the client side 
> tools, specifically Confluent 5.0.3 and Confluent 5.3.0. I can certainly try 
> to replicate with a vanilla Kafka if necessary.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Comment Edited] (KAFKA-8789) kafka-avro-console-consumer works with 2.0.x, but not 2.3.x

2019-08-12 Thread Raman Gupta (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905343#comment-16905343
 ] 

Raman Gupta edited comment on KAFKA-8789 at 8/12/19 4:16 PM:
-

UPDATE: The error message may be coming from the schema registry, which would 
put it outside the purview of the Kafka project. I think the Confluent distro 
might be overriding the schema.registry.url or something? Sorry for the noise.


was (Author: rocketraman):
UPDATE: The error message may be coming from the schema registry, which would 
put it outside the purview of the Kafka project. Sorry for the noise.

> kafka-avro-console-consumer works with 2.0.x, but not 2.3.x
> ---
>
> Key: KAFKA-8789
> URL: https://issues.apache.org/jira/browse/KAFKA-8789
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.3.0
>Reporter: Raman Gupta
>Priority: Major
>
> I have a topic with about 20,000 events in it. When I run the following tools 
> command using Kafka 2.0.x:
> bin/kafka-avro-console-consumer \ 
>   --bootstrap-server $KAFKA --property schema.registry.url=$SCHEMAREGISTRY \ 
>   --topic $TOPICPREFIX-user-clickstream-events-ui-v2 \ 
>   --from-beginning --max-messages 100 \
>   --isolation-level read_committed --skip-message-on-error \
>   --timeout-ms 15000
> I get 100 messages as expected.
> However, when running the exact same command using Kafka 2.3.0 I get 
> org.apache.kafka.common.errors.TimeoutException, and 0 messages processed.
> The version of Kafka on the server is 2.3.0.
> NOTE: I am using the Confluent distribution of Kafka for the client side 
> tools, specifically Confluent 5.0.3 and Confluent 5.3.0. I can certainly try 
> to replicate with a vanilla Kafka if necessary.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Resolved] (KAFKA-8789) kafka-avro-console-consumer works with 2.0.x, but not 2.3.x

2019-08-12 Thread Raman Gupta (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raman Gupta resolved KAFKA-8789.

Resolution: Invalid

UPDATE: The error message may be coming from the schema registry, which would 
put it outside the purview of the Kafka project. Sorry for the noise.

> kafka-avro-console-consumer works with 2.0.x, but not 2.3.x
> ---
>
> Key: KAFKA-8789
> URL: https://issues.apache.org/jira/browse/KAFKA-8789
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.3.0
>Reporter: Raman Gupta
>Priority: Major
>
> I have a topic with about 20,000 events in it. When I run the following tools 
> command using Kafka 2.0.x:
> bin/kafka-avro-console-consumer \ 
>   --bootstrap-server $KAFKA --property schema.registry.url=$SCHEMAREGISTRY \ 
>   --topic $TOPICPREFIX-user-clickstream-events-ui-v2 \ 
>   --from-beginning --max-messages 100 \
>   --isolation-level read_committed --skip-message-on-error \
>   --timeout-ms 15000
> I get 100 messages as expected.
> However, when running the exact same command using Kafka 2.3.0 I get 
> org.apache.kafka.common.errors.TimeoutException, and 0 messages processed.
> The version of Kafka on the server is 2.3.0.
> NOTE: I am using the Confluent distribution of Kafka for the client side 
> tools, specifically Confluent 5.0.3 and Confluent 5.3.0. I can certainly try 
> to replicate with a vanilla Kafka if necessary.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (KAFKA-8789) kafka-avro-console-consumer works with 2.0.x, but not 2.3.x

2019-08-12 Thread Raman Gupta (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raman Gupta updated KAFKA-8789:
---
Description: 
I have a topic with about 20,000 events in it. When I run the following tools 
command using Kafka 2.0.x:

bin/kafka-avro-console-consumer \ 
  --bootstrap-server $KAFKA --property schema.registry.url=$SCHEMAREGISTRY \ 
  --topic $TOPICPREFIX-user-clickstream-events-ui-v2 \ 
  --from-beginning --max-messages 100 \
  --isolation-level read_committed --skip-message-on-error \
  --timeout-ms 15000

I get 100 messages as expected.

However, when running the exact same command using Kafka 2.3.0 I get 
org.apache.kafka.common.errors.TimeoutException, and 0 messages processed.

The version of Kafka on the server is 2.3.0.

NOTE: I am using the Confluent distribution of Kafka for the client side tools, 
specifically Confluent 5.0.3 and Confluent 5.3.0. I can certainly try to 
replicate with a vanilla Kafka if necessary.
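
For anyone trying to narrow this down, here is a minimal Java sketch (the 
bootstrap server, group id, and topic name are placeholders, not my actual 
values) that performs the equivalent read_committed, from-beginning read with 
the plain consumer and no Avro/schema-registry layer; if this returns records 
while the 2.3.x console tool times out, the problem is more likely in the 
tooling or schema-registry configuration than in the broker itself.

{code:java}
import java.time.Duration;
import java.time.Instant;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

public class ReadCommittedCheck {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // placeholder for $KAFKA
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "read-committed-check");      // throwaway group
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");         // like --from-beginning
        props.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");     // like --isolation-level
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class);

        int received = 0;
        Instant deadline = Instant.now().plusMillis(15000);                     // like --timeout-ms 15000
        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("user-clickstream-events-ui-v2")); // placeholder topic
            while (received < 100 && Instant.now().isBefore(deadline)) {        // like --max-messages 100
                ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofMillis(500));
                received += records.count();
            }
        }
        System.out.println("Received " + received + " record(s)");
    }
}
{code}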

  was:
I have a topic with about 20,000 events in it. When I run the following tools 
command using

 

bin/kafka-avro-console-consumer \ 
  --bootstrap-server $KAFKA --property schema.registry.url=$SCHEMAREGISTRY \ 
  --topic $TOPICPREFIX-user-clickstream-events-ui-v2 \ 
  --from-beginning --max-messages 100 \
 --isolation-level read_committed --skip-message-on-error \
 --timeout-ms 15000

I get 100 messages 


> kafka-avro-console-consumer works with 2.0.x, but not 2.3.x
> ---
>
> Key: KAFKA-8789
> URL: https://issues.apache.org/jira/browse/KAFKA-8789
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.3.0
>Reporter: Raman Gupta
>Priority: Major
>
> I have a topic with about 20,000 events in it. When I run the following tools 
> command using Kafka 2.0.x:
> bin/kafka-avro-console-consumer \ 
>   --bootstrap-server $KAFKA --property schema.registry.url=$SCHEMAREGISTRY \ 
>   --topic $TOPICPREFIX-user-clickstream-events-ui-v2 \ 
>   --from-beginning --max-messages 100 \
>   --isolation-level read_committed --skip-message-on-error \
>   --timeout-ms 15000
> I get 100 messages as expected.
> However, when running the exact same command using Kafka 2.3.0 I get 
> org.apache.kafka.common.errors.TimeoutException, and 0 messages processed.
> The version of Kafka on the server is 2.3.0.
> NOTE: I am using the Confluent distribution of Kafka for the client side 
> tools, specifically Confluent 5.0.3 and Confluent 5.3.0. I can certainly try 
> to replicate with a vanilla Kafka if necessary.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (KAFKA-8789) kafka-avro-console-consumer works with confluent 5.0.3, but not 5.3.0

2019-08-12 Thread Raman Gupta (JIRA)
Raman Gupta created KAFKA-8789:
--

 Summary: kafka-avro-console-consumer works with confluent 5.0.3, 
but not 5.3.0
 Key: KAFKA-8789
 URL: https://issues.apache.org/jira/browse/KAFKA-8789
 Project: Kafka
  Issue Type: Bug
  Components: tools
Affects Versions: 2.3.0
Reporter: Raman Gupta


I have a topic with about 20,000 events in it. When I run the following tools 
command using

 

bin/kafka-avro-console-consumer \ 
  --bootstrap-server $KAFKA --property schema.registry.url=$SCHEMAREGISTRY \ 
  --topic $TOPICPREFIX-user-clickstream-events-ui-v2 \ 
  --from-beginning --max-messages 100 \
 --isolation-level read_committed --skip-message-on-error \
 --timeout-ms 15000

I get 100 messages 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (KAFKA-8789) kafka-avro-console-consumer works with 2.0.x, but not 2.3.x

2019-08-12 Thread Raman Gupta (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raman Gupta updated KAFKA-8789:
---
Summary: kafka-avro-console-consumer works with 2.0.x, but not 2.3.x  (was: 
kafka-avro-console-consumer works with confluent 5.0.3, but not 5.3.0)

> kafka-avro-console-consumer works with 2.0.x, but not 2.3.x
> ---
>
> Key: KAFKA-8789
> URL: https://issues.apache.org/jira/browse/KAFKA-8789
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.3.0
>Reporter: Raman Gupta
>Priority: Major
>
> I have a topic with about 20,000 events in it. When I run the following tools 
> command using
>  
> bin/kafka-avro-console-consumer \ 
>   --bootstrap-server $KAFKA --property schema.registry.url=$SCHEMAREGISTRY \ 
>   --topic $TOPICPREFIX-user-clickstream-events-ui-v2 \ 
>   --from-beginning --max-messages 100 \
>  --isolation-level read_committed --skip-message-on-error \
>  --timeout-ms 15000
> I get 100 messages 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (KAFKA-8773) Static membership protocol borks on re-used group id

2019-08-08 Thread Raman Gupta (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raman Gupta updated KAFKA-8773:
---
Description: 
I am using the new static group membership protocol in 2.3.0. I have a 
situation in which an application defines multiple consumers; let's call them:

consumer-1
consumer-2

Each consumer uses the same group id "app-x", as they all belong to the same 
application. With dynamic group membership, this is no problem at all. However, 
with static membership, starting a single instance of this application (so that 
both consumers have the same instance.id) fails consistently with errors like:
{code:java}
2019-08-08 16:56:47,223 ERROR — org.apa.kaf.cli.con.int.AbstractCoordinator : 
[Consumer instanceId=x-1, clientId=consumer-2, groupId=x] Received fatal 
exception: group.instance.id gets fenced
2019-08-08 16:56:47,229 ERROR — org.apa.kaf.cli.con.int.AbstractCoordinator : 
[Consumer instanceId=x-1, clientId=consumer-1, groupId=x] Received fatal 
exception: group.instance.id gets fenced
2019-08-08 16:56:47,234 ERROR 
---red.mic.kaf.AbstractKafkaAutoCommitConsumerService: Exception in polling 
thread. Will die for safety. [[EXCEPTION: 
org.apache.kafka.common.errors.FencedInstanceIdException: The broker rejected 
this static consumer since another consumer with the same group.instance.id has 
registered with a different member.id.
]]
2019-08-08 16:56:47,229 ERROR — 
red.mic.kaf.AbstractKafkaAutoCommitConsumerService: Exception in polling 
thread. Will die for safety. [[EXCEPTION: 
org.apache.kafka.common.errors.FencedInstanceIdException: The broker rejected 
this static consumer since another consumer with the same group.instance.id has 
registered with a different member.id.
]]{code}

and to top it off, I also get this obviously incorrect error:
{code:java}
2019-08-08 16:56:47,235 ERROR — 
red.mic.kaf.AbstractKafkaAutoCommitConsumerService: Exception in polling 
thread. Will die for safety. [[EXCEPTION: 
org.apache.kafka.common.protocol.types.SchemaException: Error reading field 
'version': java.nio.BufferUnderflowException
at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:110) 
~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.internals.ConsumerProtocol.deserializeAssignment(ConsumerProtocol.java:106)
 ~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:262)
 ~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:424)
 ~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:358)
 ~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:353)
 ~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.KafkaConsumer.updateAssignmentMetadataIfNeeded(KafkaConsumer.java:1251)
 ~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1216) 
~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1201) 
~[kafka-clients-2.3.0.jar:?]
at 
com.redock.microservice.kafka.BasicCommitAfterProcessingConsumer.run(BasicCommitAfterProcessingConsumer.kt:51)
 ~[classes/:?]
at 
com.redock.microservice.kafka.AbstractKafkaAutoCommitConsumerService$start$2.invokeSuspend(AbstractKafkaAutoCommitConsumerService.kt:44)
 [classes/:?]
... suppressed 2 lines
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:264) [?:?]
at java.util.concurrent.FutureTask.run(FutureTask.java) [?:?]
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
 [?:?]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) 
[?:?]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) 
[?:?]
at java.lang.Thread.run(Thread.java:834) [?:?]
]]{code}
 

The broker logs contain this error:
{code:java}
ERROR given member.id x-1-1565298855983 is identified as a known static member 
x-1,but not matching the expected member.id x-1-1565298855984 
(kafka.coordinator.group.GroupMetadata){code}
 

It seems like the client-id is not taken into account by the server when 
determining the static group membership?

While the workaround is simple – change the group id of each consumer to 
include the client id – I don't believe this should be necessary.
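
For reference, here is a minimal sketch of that workaround as described above 
(the bootstrap server and the way the per-instance id is obtained are 
placeholders, not my actual code): each consumer gets its own group id that 
embeds its client id, while the group.instance.id remains the stable 
per-application-instance value.

{code:java}
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class StaticMembershipWorkaround {

    // instanceId is a stable per-application-instance value (e.g. derived from the
    // pod name); consumerName is "consumer-1" or "consumer-2".
    static Properties configFor(String instanceId, String consumerName) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // placeholder
        props.put(ConsumerConfig.CLIENT_ID_CONFIG, consumerName);
        // Workaround: include the client id in the group id so the two consumers no
        // longer share the same (group.id, group.instance.id) pair on the broker.
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "app-x-" + consumerName);
        props.put(ConsumerConfig.GROUP_INSTANCE_ID_CONFIG, instanceId);
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        return props;
    }

    public static void main(String[] args) {
        try (KafkaConsumer<String, String> c1 = new KafkaConsumer<>(configFor("x-1", "consumer-1"));
             KafkaConsumer<String, String> c2 = new KafkaConsumer<>(configFor("x-1", "consumer-2"))) {
            // subscribe and poll as usual; each consumer is now the only static member of its group
        }
    }
}
{code}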

  was:
I am using the new static group membership protocol in 2.3.0. I have a 
situation in which an application defines multiple consumers; let's call them:

consumer-1
consumer-2

Each consumer uses the same group id "app-x", as they all belong to the same 

[jira] [Created] (KAFKA-8773) Static membership protocol borks on re-used group id

2019-08-08 Thread Raman Gupta (JIRA)
Raman Gupta created KAFKA-8773:
--

 Summary: Static membership protocol borks on re-used group id
 Key: KAFKA-8773
 URL: https://issues.apache.org/jira/browse/KAFKA-8773
 Project: Kafka
  Issue Type: Bug
Affects Versions: 2.3.0
Reporter: Raman Gupta


I am using the new static group membership protocol in 2.3.0. I have a 
situation in which an application defines multiple consumers; let's call them:

consumer-1
consumer-2

Each consumer uses the same group id "app-x", as they all belong to the same 
application. With dynamic group membership, this is no problem at all. However, 
with static membership, starting a single instance of this application (so that 
both consumers have the same instance.id) fails consistently with errors like:

```
2019-08-08 16:56:47,223 ERROR --- org.apa.kaf.cli.con.int.AbstractCoordinator   
: [Consumer instanceId=x-1, clientId=consumer-2, groupId=x] Received fatal 
exception: group.instance.id gets fenced
2019-08-08 16:56:47,229 ERROR --- org.apa.kaf.cli.con.int.AbstractCoordinator   
: [Consumer instanceId=x-1, clientId=consumer-1, groupId=x] Received fatal 
exception: group.instance.id gets fenced
2019-08-08 16:56:47,234 ERROR 
---red.mic.kaf.AbstractKafkaAutoCommitConsumerService: Exception in polling 
thread. Will die for safety. [[EXCEPTION: 
org.apache.kafka.common.errors.FencedInstanceIdException: The broker rejected 
this static consumer since another consumer with the same group.instance.id has 
registered with a different member.id.
]]
2019-08-08 16:56:47,229 ERROR --- 
red.mic.kaf.AbstractKafkaAutoCommitConsumerService: Exception in polling 
thread. Will die for safety. [[EXCEPTION: 
org.apache.kafka.common.errors.FencedInstanceIdException: The broker rejected 
this static consumer since another consumer with the same group.instance.id has 
registered with a different member.id.
]]
```

and to top it off, I also get this obviously incorrect error:

```
2019-08-08 16:56:47,235 ERROR --- 
red.mic.kaf.AbstractKafkaAutoCommitConsumerService: Exception in polling 
thread. Will die for safety. [[EXCEPTION: 
org.apache.kafka.common.protocol.types.SchemaException: Error reading field 
'version': java.nio.BufferUnderflowException
at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:110) 
~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.internals.ConsumerProtocol.deserializeAssignment(ConsumerProtocol.java:106)
 ~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:262)
 ~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:424)
 ~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:358)
 ~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:353)
 ~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.KafkaConsumer.updateAssignmentMetadataIfNeeded(KafkaConsumer.java:1251)
 ~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1216) 
~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1201) 
~[kafka-clients-2.3.0.jar:?]
at 
com.redock.microservice.kafka.BasicCommitAfterProcessingConsumer.run(BasicCommitAfterProcessingConsumer.kt:51)
 ~[classes/:?]
at 
com.redock.microservice.kafka.AbstractKafkaAutoCommitConsumerService$start$2.invokeSuspend(AbstractKafkaAutoCommitConsumerService.kt:44)
 [classes/:?]
... suppressed 2 lines
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:264) 
[?:?]
at java.util.concurrent.FutureTask.run(FutureTask.java) [?:?]
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
 [?:?]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) 
[?:?]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) 
[?:?]
at java.lang.Thread.run(Thread.java:834) [?:?]
]]
```

The broker logs contain this error:

ERROR given member.id x-1-1565298855983 is identified as a known static member 
x-1,but not matching the expected member.id x-1-1565298855984 
(kafka.coordinator.group.GroupMetadata)

It seems like the client-id is not taken into account by the server when 
determining the static group membership?

While the workaround is simple -- change the group id of each consumer to 
include the client id -- I don't believe this 

[jira] [Comment Edited] (KAFKA-8715) Static consumer cannot join group due to ERROR in broker

2019-07-25 Thread Raman Gupta (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16893117#comment-16893117
 ] 

Raman Gupta edited comment on KAFKA-8715 at 7/25/19 8:45 PM:
-

[~bchen225242] No it isn't happening all the time. Interestingly, I have two 
different instances of the same 2.3.0 client code running against the same 
2.3.0 Kafka broker. The only difference between them is the configured name of 
the group, and the topics being consumed. One of them gets this error, the other 
does not.

In addition, it initially used to work with both consumer groups. It stopped 
working for one group when I restarted the cluster in order to apply an 
unrelated configuration change.


was (Author: rocketraman):
[~bchen225242] No it isn't happening all the time. Interestingly, I have two 
different instances of the same 2.3.0 client code running against the same 
2.3.0 Kafka broker. The only difference between them is the configured name of 
the group, and the topics being consumed. One of them gets this error, the other 
does not.

> Static consumer cannot join group due to ERROR in broker
> 
>
> Key: KAFKA-8715
> URL: https://issues.apache.org/jira/browse/KAFKA-8715
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 2.3.0
>Reporter: Raman Gupta
>Assignee: Boyang Chen
>Priority: Critical
>
> A streams consumer using a static group instance id is unable to join the 
> group due to an invalid group join  -- the consumer gets the error:
> {code}
> ERROR stream-thread 
> [x-stream-4a43d5d4-d38f-4cb0-8741-7a6c685abf15-StreamThread-1] Encountered 
> the following unexpected Kafka exception during processing, this usually 
> indicate Streams internal errors:
> [[EXCEPTION: org.apache.kafka.common.KafkaException: Unexpected error in join 
> group response: The server experienced an unexpected error when processing 
> the request.
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$JoinGroupResponseHandler.handle(AbstractCoordinator.java:599)
>  ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$JoinGroupResponseHandler.handle(AbstractCoordinator.java:527)
>  ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:978)
>  ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:958)
>  ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:204)
>  ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:167)
>  ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:127)
>  ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.fireCompletion(ConsumerNetworkClient.java:578)
>  ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.firePendingCompletedRequests(ConsumerNetworkClient.java:388)
>  ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:294)
>  ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:233)
>  ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224)
>  ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.awaitMetadataUpdate(ConsumerNetworkClient.java:161)
>  ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.ensureFreshMetadata(ConsumerNetworkClient.java:172)
>  ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:346)
>  ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.updateAssignmentMetadataIfNeeded(KafkaConsumer.java:1251)
>  ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1216) 
> ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1201) 
> ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:941)
>  

[jira] [Commented] (KAFKA-8715) Static consumer cannot join group due to ERROR in broker

2019-07-25 Thread Raman Gupta (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16893117#comment-16893117
 ] 

Raman Gupta commented on KAFKA-8715:


[~bchen225242] No it isn't happening all the time. Interestingly, I have two 
different instances of the same 2.3.0 client code running against the same 
2.3.0 Kafka broker. The only difference between them is the configured name of 
the group, and the topics being consumed.

> Static consumer cannot join group due to ERROR in broker
> 
>
> Key: KAFKA-8715
> URL: https://issues.apache.org/jira/browse/KAFKA-8715
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 2.3.0
>Reporter: Raman Gupta
>Assignee: Boyang Chen
>Priority: Critical
>
> A streams consumer using a static group instance id is unable to join the 
> group due to an invalid group join  -- the consumer gets the error:
> {code}
> ERROR stream-thread 
> [x-stream-4a43d5d4-d38f-4cb0-8741-7a6c685abf15-StreamThread-1] Encountered 
> the following unexpected Kafka exception during processing, this usually 
> indicate Streams internal errors:
> [[EXCEPTION: org.apache.kafka.common.KafkaException: Unexpected error in join 
> group response: The server experienced an unexpected error when processing 
> the request.
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$JoinGroupResponseHandler.handle(AbstractCoordinator.java:599)
>  ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$JoinGroupResponseHandler.handle(AbstractCoordinator.java:527)
>  ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:978)
>  ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:958)
>  ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:204)
>  ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:167)
>  ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:127)
>  ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.fireCompletion(ConsumerNetworkClient.java:578)
>  ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.firePendingCompletedRequests(ConsumerNetworkClient.java:388)
>  ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:294)
>  ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:233)
>  ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224)
>  ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.awaitMetadataUpdate(ConsumerNetworkClient.java:161)
>  ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.ensureFreshMetadata(ConsumerNetworkClient.java:172)
>  ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:346)
>  ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.updateAssignmentMetadataIfNeeded(KafkaConsumer.java:1251)
>  ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1216) 
> ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1201) 
> ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:941)
>  ~[kafka-streams-2.3.0.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:846)
>  ~[kafka-streams-2.3.0.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:805)
>  ~[kafka-streams-2.3.0.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:774)
>  [kafka-streams-2.3.0.jar:?]
> ]]
> {code}
> On the broker, I see this error:
> {code}
> [2019-07-25 08:14:11,978] ERROR [KafkaApi-1] Error when handling request: 
> clientId=x-stream-4a43d5d4-d38f-4cb0-8741-7a6c685abf15-StreamThread-1-consumer,
>  

[jira] [Comment Edited] (KAFKA-8715) Static consumer cannot join group due to ERROR in broker

2019-07-25 Thread Raman Gupta (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16893117#comment-16893117
 ] 

Raman Gupta edited comment on KAFKA-8715 at 7/25/19 8:44 PM:
-

[~bchen225242] No it isn't happening all the time. Interestingly, I have two 
different instances of the same 2.3.0 client code running against the same 
2.3.0 Kafka broker. The only difference between them is the configured name of 
the group, and the topics being consumed. One of them gets this error, the other 
does not.


was (Author: rocketraman):
[~bchen225242] No it isn't happening all the time. Interestingly, I have two 
different instances of the same 2.3.0 client code running against the same 
2.3.0 Kafka broker. The only difference between them is the configured name of 
the group, and the topics being consumed.

> Static consumer cannot join group due to ERROR in broker
> 
>
> Key: KAFKA-8715
> URL: https://issues.apache.org/jira/browse/KAFKA-8715
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 2.3.0
>Reporter: Raman Gupta
>Assignee: Boyang Chen
>Priority: Critical
>
> A streams consumer using a static group instance id is unable to join the 
> group due to an invalid group join  -- the consumer gets the error:
> {code}
> ERROR stream-thread 
> [x-stream-4a43d5d4-d38f-4cb0-8741-7a6c685abf15-StreamThread-1] Encountered 
> the following unexpected Kafka exception during processing, this usually 
> indicate Streams internal errors:
> [[EXCEPTION: org.apache.kafka.common.KafkaException: Unexpected error in join 
> group response: The server experienced an unexpected error when processing 
> the request.
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$JoinGroupResponseHandler.handle(AbstractCoordinator.java:599)
>  ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$JoinGroupResponseHandler.handle(AbstractCoordinator.java:527)
>  ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:978)
>  ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:958)
>  ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:204)
>  ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:167)
>  ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:127)
>  ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.fireCompletion(ConsumerNetworkClient.java:578)
>  ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.firePendingCompletedRequests(ConsumerNetworkClient.java:388)
>  ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:294)
>  ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:233)
>  ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224)
>  ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.awaitMetadataUpdate(ConsumerNetworkClient.java:161)
>  ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.ensureFreshMetadata(ConsumerNetworkClient.java:172)
>  ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:346)
>  ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.updateAssignmentMetadataIfNeeded(KafkaConsumer.java:1251)
>  ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1216) 
> ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1201) 
> ~[kafka-clients-2.3.0.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:941)
>  ~[kafka-streams-2.3.0.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:846)
>  ~[kafka-streams-2.3.0.jar:?]
> at 
> 

[jira] [Updated] (KAFKA-8715) Static consumer cannot join group due to ERROR in broker

2019-07-25 Thread Raman Gupta (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raman Gupta updated KAFKA-8715:
---
Description: 
A streams consumer using a static group instance id is unable to join the group 
due to an invalid group join  -- the consumer gets the error:

{code}
ERROR stream-thread 
[x-stream-4a43d5d4-d38f-4cb0-8741-7a6c685abf15-StreamThread-1] Encountered the 
following unexpected Kafka exception during processing, this usually indicate 
Streams internal errors:
[[EXCEPTION: org.apache.kafka.common.KafkaException: Unexpected error in join 
group response: The server experienced an unexpected error when processing the 
request.
at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator$JoinGroupResponseHandler.handle(AbstractCoordinator.java:599)
 ~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator$JoinGroupResponseHandler.handle(AbstractCoordinator.java:527)
 ~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:978)
 ~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:958)
 ~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:204)
 ~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:167)
 ~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:127)
 ~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.fireCompletion(ConsumerNetworkClient.java:578)
 ~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.firePendingCompletedRequests(ConsumerNetworkClient.java:388)
 ~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:294)
 ~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:233)
 ~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224)
 ~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.awaitMetadataUpdate(ConsumerNetworkClient.java:161)
 ~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.ensureFreshMetadata(ConsumerNetworkClient.java:172)
 ~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:346)
 ~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.KafkaConsumer.updateAssignmentMetadataIfNeeded(KafkaConsumer.java:1251)
 ~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1216) 
~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1201) 
~[kafka-clients-2.3.0.jar:?]
at 
org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:941)
 ~[kafka-streams-2.3.0.jar:?]
at 
org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:846)
 ~[kafka-streams-2.3.0.jar:?]
at 
org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:805)
 ~[kafka-streams-2.3.0.jar:?]
at 
org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:774)
 [kafka-streams-2.3.0.jar:?]
]]
{code}

On the broker, I see this error:

{code}
[2019-07-25 08:14:11,978] ERROR [KafkaApi-1] Error when handling request: 
clientId=x-stream-4a43d5d4-d38f-4cb0-8741-7a6c685abf15-StreamThread-1-consumer, 
correlationId=6, api=JOIN_GROUP, 
body={group_id=x-stream,session_timeout_ms=1,rebalance_timeout_ms=30,member_id=,group_instance_id=lcrzf-1,protocol_type=consumer,protocols=[{name=stream,metadata=java.nio.HeapByteBuffer[pos=0
 lim=64 cap=64]}]} (kafka.server.KafkaApis)
java.util.NoSuchElementException: None.get
  at scala.None$.get(Option.scala:366)
  at scala.None$.get(Option.scala:364)
  at 
kafka.coordinator.group.GroupMetadata.generateMemberId(GroupMetadata.scala:368)
  at 
kafka.coordinator.group.GroupCoordinator.$anonfun$doUnknownJoinGroup$1(GroupCoordinator.scala:178)
  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
  at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:253)
  at kafka.coordinator.group.GroupMetadata.inLock(GroupMetadata.scala:209)
  at 
kafka.coordinator.group.GroupCoordinator.doUnknownJoinGroup(GroupCoordinator.scala:169)
  

[jira] [Updated] (KAFKA-8715) Static consumer cannot join group due to ERROR in broker

2019-07-25 Thread Raman Gupta (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raman Gupta updated KAFKA-8715:
---
Description: 
A streams consumer using a static group instance id is unable to join the group 
due to an invalid group join  -- the consumer gets the error:

{code}
ERROR stream-thread 
[x-stream-4a43d5d4-d38f-4cb0-8741-7a6c685abf15-StreamThread-1] Encountered the 
following unexpected Kafka exception during processing, this usually indicate 
Streams internal errors: [[EXCEPTION: org.apache.kafka.common.KafkaException: 
Unexpected error in join group response: The server experienced an unexpected 
error when processing the request.
{code}

On the broker, I see this error:

{code}
[2019-07-25 08:14:11,978] ERROR [KafkaApi-1] Error when handling request: 
clientId=x-stream-4a43d5d4-d38f-4cb0-8741-7a6c685abf15-StreamThread-1-consumer, 
correlationId=6, api=JOIN_GROUP, 
body={group_id=x-stream,session_timeout_ms=1,rebalance_timeout_ms=30,member_id=,group_instance_id=lcrzf-1,protocol_type=consumer,protocols=[{name=stream,metadata=java.nio.HeapByteBuffer[pos=0
 lim=64 cap=64]}]} (kafka.server.KafkaApis)
java.util.NoSuchElementException: None.get
  at scala.None$.get(Option.scala:366)
  at scala.None$.get(Option.scala:364)
  at 
kafka.coordinator.group.GroupMetadata.generateMemberId(GroupMetadata.scala:368)
  at 
kafka.coordinator.group.GroupCoordinator.$anonfun$doUnknownJoinGroup$1(GroupCoordinator.scala:178)
  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
  at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:253)
  at kafka.coordinator.group.GroupMetadata.inLock(GroupMetadata.scala:209)
  at 
kafka.coordinator.group.GroupCoordinator.doUnknownJoinGroup(GroupCoordinator.scala:169)
  at 
kafka.coordinator.group.GroupCoordinator.$anonfun$handleJoinGroup$2(GroupCoordinator.scala:144)
  at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:253)
  at kafka.coordinator.group.GroupMetadata.inLock(GroupMetadata.scala:209)
  at 
kafka.coordinator.group.GroupCoordinator.handleJoinGroup(GroupCoordinator.scala:136)
  at kafka.server.KafkaApis.handleJoinGroupRequest(KafkaApis.scala:1389)
  at kafka.server.KafkaApis.handle(KafkaApis.scala:124)
  at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:69)
  at java.base/java.lang.Thread.run(Thread.java:834)
{code}
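
For context, this is roughly how the static instance id is wired into the 
Streams application. The broker address and topology are placeholders, and 
passing group.instance.id through the consumer prefix is an assumption about 
the intended configuration path, not a statement of the only way to do it.

{code:java}
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

public class XStreamApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "x-stream");            // also used as the group id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // placeholder
        // Static membership: a stable per-instance id (here taken from the pod name,
        // e.g. "lcrzf"), handed to the embedded consumer via the consumer prefix.
        props.put(StreamsConfig.consumerPrefix(ConsumerConfig.GROUP_INSTANCE_ID_CONFIG), "lcrzf");

        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("input-topic").to("output-topic");                      // placeholder topology

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        streams.start();
    }
}
{code}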

  was:
A streams consumer using a static group instance id is unable to join the group 
due to an invalid group join  -- the consumer gets the error:

```
ERROR stream-thread 
[x-stream-4a43d5d4-d38f-4cb0-8741-7a6c685abf15-StreamThread-1] Encountered the 
following unexpected Kafka exception during processing, this usually indicate 
Streams internal errors: [[EXCEPTION: org.apache.kafka.common.KafkaException: 
Unexpected error in join group response: The server experienced an unexpected 
error when processing the request.
```

On the broker, I see this error:

```
[2019-07-25 08:14:11,978] ERROR [KafkaApi-1] Error when handling request: 
clientId=x-stream-4a43d5d4-d38f-4cb0-8741-7a6c685abf15-StreamThread-1-consumer, 
correlationId=6, api=JOIN_GROUP, 
body={group_id=x-stream,session_timeout_ms=1,rebalance_timeout_ms=30,member_id=,group_instance_id=lcrzf-1,protocol_type=consumer,protocols=[{name=stream,metadata=java.nio.HeapByteBuffer[pos=0
 lim=64 cap=64]}]} (kafka.server.KafkaApis)
java.util.NoSuchElementException: None.get
  at scala.None$.get(Option.scala:366)
  at scala.None$.get(Option.scala:364)
  at 
kafka.coordinator.group.GroupMetadata.generateMemberId(GroupMetadata.scala:368)
  at 
kafka.coordinator.group.GroupCoordinator.$anonfun$doUnknownJoinGroup$1(GroupCoordinator.scala:178)
  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
  at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:253)
  at kafka.coordinator.group.GroupMetadata.inLock(GroupMetadata.scala:209)
  at 
kafka.coordinator.group.GroupCoordinator.doUnknownJoinGroup(GroupCoordinator.scala:169)
  at 
kafka.coordinator.group.GroupCoordinator.$anonfun$handleJoinGroup$2(GroupCoordinator.scala:144)
  at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:253)
  at kafka.coordinator.group.GroupMetadata.inLock(GroupMetadata.scala:209)
  at 
kafka.coordinator.group.GroupCoordinator.handleJoinGroup(GroupCoordinator.scala:136)
  at kafka.server.KafkaApis.handleJoinGroupRequest(KafkaApis.scala:1389)
  at kafka.server.KafkaApis.handle(KafkaApis.scala:124)
  at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:69)
  at java.base/java.lang.Thread.run(Thread.java:834)
```


> Static consumer cannot join group due to ERROR in broker
> 
>
> Key: KAFKA-8715
> URL: https://issues.apache.org/jira/browse/KAFKA-8715
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 2.3.0
>Reporter: Raman Gupta
>Priority: Critical
>

[jira] [Created] (KAFKA-8715) Static consumer cannot join group due to ERROR in broker

2019-07-25 Thread Raman Gupta (JIRA)
Raman Gupta created KAFKA-8715:
--

 Summary: Static consumer cannot join group due to ERROR in broker
 Key: KAFKA-8715
 URL: https://issues.apache.org/jira/browse/KAFKA-8715
 Project: Kafka
  Issue Type: Bug
Affects Versions: 2.3.0
Reporter: Raman Gupta


A streams consumer using a static group instance id is unable to join the group 
due to an invalid group join  -- the consumer gets the error:

```
ERROR stream-thread 
[x-stream-4a43d5d4-d38f-4cb0-8741-7a6c685abf15-StreamThread-1] Encountered the 
following unexpected Kafka exception during processing, this usually indicate 
Streams internal errors: [[EXCEPTION: org.apache.kafka.common.KafkaException: 
Unexpected error in join group response: The server experienced an unexpected 
error when processing the request.
```

On the broker, I see this error:

```
[2019-07-25 08:14:11,978] ERROR [KafkaApi-1] Error when handling request: 
clientId=x-stream-4a43d5d4-d38f-4cb0-8741-7a6c685abf15-StreamThread-1-consumer, 
correlationId=6, api=JOIN_GROUP, 
body={group_id=x-stream,session_timeout_ms=1,rebalance_timeout_ms=30,member_id=,group_instance_id=lcrzf-1,protocol_type=consumer,protocols=[{name=stream,metadata=java.nio.HeapByteBuffer[pos=0
 lim=64 cap=64]}]} (kafka.server.KafkaApis)
java.util.NoSuchElementException: None.get
  at scala.None$.get(Option.scala:366)
  at scala.None$.get(Option.scala:364)
  at 
kafka.coordinator.group.GroupMetadata.generateMemberId(GroupMetadata.scala:368)
  at 
kafka.coordinator.group.GroupCoordinator.$anonfun$doUnknownJoinGroup$1(GroupCoordinator.scala:178)
  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
  at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:253)
  at kafka.coordinator.group.GroupMetadata.inLock(GroupMetadata.scala:209)
  at 
kafka.coordinator.group.GroupCoordinator.doUnknownJoinGroup(GroupCoordinator.scala:169)
  at 
kafka.coordinator.group.GroupCoordinator.$anonfun$handleJoinGroup$2(GroupCoordinator.scala:144)
  at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:253)
  at kafka.coordinator.group.GroupMetadata.inLock(GroupMetadata.scala:209)
  at 
kafka.coordinator.group.GroupCoordinator.handleJoinGroup(GroupCoordinator.scala:136)
  at kafka.server.KafkaApis.handleJoinGroupRequest(KafkaApis.scala:1389)
  at kafka.server.KafkaApis.handle(KafkaApis.scala:124)
  at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:69)
  at java.base/java.lang.Thread.run(Thread.java:834)
```



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (KAFKA-7190) Under low traffic conditions purging repartition topics cause WARN statements about UNKNOWN_PRODUCER_ID

2019-07-24 Thread Raman Gupta (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16891992#comment-16891992
 ] 

Raman Gupta commented on KAFKA-7190:


> Did you override message.timestamp.difference.max.ms?

No.

> Under low traffic conditions purging repartition topics cause WARN statements 
> about  UNKNOWN_PRODUCER_ID 
> -
>
> Key: KAFKA-7190
> URL: https://issues.apache.org/jira/browse/KAFKA-7190
> Project: Kafka
>  Issue Type: Improvement
>  Components: core, streams
>Affects Versions: 1.1.0, 1.1.1
>Reporter: Bill Bejeck
>Assignee: Guozhang Wang
>Priority: Major
>
> When a streams application has little traffic, it is possible that consumer 
> purging would delete even the last message sent by a producer (i.e., all the 
> messages sent by this producer have been consumed and committed), and as a 
> result, the broker would delete that producer's ID. The next time this 
> producer tries to send, it will get the UNKNOWN_PRODUCER_ID error code, but in 
> this case the error is retriable: the producer would just get a new producer 
> id and retry, and then it will succeed. 
>  
> Possible fixes could be on the broker side, i.e., delaying the deletion of 
> the producer IDs for a longer period, or on the streams side, developing 
> a more conservative approach to deleting offsets from repartition topics
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

