[jira] [Commented] (KAFKA-4739) KafkaConsumer poll going into an infinite loop
[ https://issues.apache.org/jira/browse/KAFKA-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15898623#comment-15898623 ]

Jin Tianfan commented on KAFKA-4739:

Is this problem solved?

> KafkaConsumer poll going into an infinite loop
> --
>
> Key: KAFKA-4739
> URL: https://issues.apache.org/jira/browse/KAFKA-4739
> Project: Kafka
> Issue Type: Bug
> Components: consumer
> Affects Versions: 0.9.0.1
> Reporter: Vipul Singh
>
> We are seeing an issue with our Kafka consumer where it seems to go into an infinite loop while polling, trying to fetch data from Kafka. We are seeing the heartbeat requests on the broker from the consumer, but nothing else from the Kafka consumer.
> We enabled debug level logging on the consumer, and see these logs:
> https://gist.github.com/neoeahit/757bff7acdea62656f065f4dcb8974b4
> And this just goes on. The way we have been able to replicate this issue is by restarting the process multiple times in succession.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[jira] [Commented] (KAFKA-4739) KafkaConsumer poll going into an infinite loop
[ https://issues.apache.org/jira/browse/KAFKA-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897164#comment-15897164 ]

Jin Tianfan commented on KAFKA-4739:

Unfortunately, I have run into the same problem. Is this problem solved?
[jira] [Commented] (KAFKA-4739) KafkaConsumer poll going into an infinite loop
[ https://issues.apache.org/jira/browse/KAFKA-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861745#comment-15861745 ]

Sagar Sadashiv Patwardhan commented on KAFKA-4739:

[~hachikuji] [~huxi_2b] Thanks for getting back to us. We really appreciate your help! :)

It is interesting to see that between L151 and L178 of https://gist.github.com/neoeahit/3a0a5027bc3499b85cb888918faac2a3 the Kafka client does not send any requests to the broker. We get successful heartbeat responses from the coordinator, but we still time out for some reason after ~40 secs (the default request.timeout.ms). I have not read the client code in detail (maybe I should), but if the heartbeats are successful, then why are we timing out? Also, the request timeout is 500 msec per the documentation, so how is this timeout related to request.timeout.ms (40 secs)?
[jira] [Commented] (KAFKA-4739) KafkaConsumer poll going into an infinite loop
[ https://issues.apache.org/jira/browse/KAFKA-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15860768#comment-15860768 ]

Jason Gustafson commented on KAFKA-4739:

[~neoeahit] Thanks for the update. These logs look much more like what I expect. There doesn't appear to be anything wrong with the consumer, but it would be nice to confirm it.

1. Can you use the console consumer both with and without the {{--new-consumer}} option to consume from the exact same topics?
2. You mentioned that you can reproduce the problem by restarting the process repeatedly. Can you explain this a bit more? Can you get the same behavior using the console consumer?
3. If you can get the consumer into this state again, can you get a thread dump from the broker? There have been a couple of deadlocks fixed since 0.9.0.1, so it would be nice to confirm that we're not hitting one of them.
4. What do you currently do to recover? Since you said restarting the consumers doesn't help, what does? Restarting the brokers?
[jira] [Commented] (KAFKA-4739) KafkaConsumer poll going into an infinite loop
[ https://issues.apache.org/jira/browse/KAFKA-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15860011#comment-15860011 ]

Vipul Singh commented on KAFKA-4739:

Hey [~hachikuji]. We tried to reproduce the issue again. Please note:

1. In our broker config we use *request.timeout.ms* of *31*, and *group.max.session.timeout.ms* of *30*.
2. In our client config, the only things that differ from the defaults are in this gist: https://gist.github.com/neoeahit/c1d4027b975b95267e3cbe506899aef8
3. We tried to grep for our consumer group during the time the disconnection was happening (https://gist.github.com/neoeahit/622515ba391ddf8566bf09af880a6ae0). If you look between 17:55:50,614 and 17:55:51,027, we weren't able to find any requests.
4. The client-side logs from around this time are here: https://gist.github.com/neoeahit/3a0a5027bc3499b85cb888918faac2a3 (Please note we have two brokers, and their IPs are 1.1.1.1 and 1.1.1.6 [the actual IPs have been changed so that the logs can be made public].)

[~huxi_2b] I am puzzled by that 40 seconds myself. We don't set it in the config anywhere, yet we are seeing this in the logs.

One other thing which is a bit puzzling is that the max_wait_time=500 in client requests doesn't seem to be honored. Maybe because the connection is already disconnected?

Please help us with any pointers or troubleshooting steps we can use to figure out this issue. This is causing us a lot of pain, with consumers randomly being blocked.
[jira] [Commented] (KAFKA-4739) KafkaConsumer poll going into an infinite loop
[ https://issues.apache.org/jira/browse/KAFKA-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15855232#comment-15855232 ]

huxi commented on KAFKA-4739:

It seems that FETCH requests time out every 40 seconds, and 40 seconds is the default value of the config `request.timeout.ms` for the new consumer. Could you check whether your settings actually take effect? And why do you set such a large timeout value? Poor network conditions?
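One way to approach the "do your settings take effect" question is to compare the properties handed to the consumer against the defaults and list any keys that would not be recognized. The following is only a minimal sketch under stated assumptions: the class `EffectiveConsumerConfig` and its methods are hypothetical helpers, not part of the Kafka client; the defaults are the 40 second `request.timeout.ms` and the 500 ms fetch wait mentioned in this thread; and the underscore-style key is a purely hypothetical misspelling, not a diagnosis of this ticket.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of kafka-clients.
final class EffectiveConsumerConfig {
    // Only the two settings discussed in this thread; a real check would
    // compare against the full list of consumer config keys.
    static final Map<String, String> DEFAULTS = new LinkedHashMap<>();
    static {
        DEFAULTS.put("request.timeout.ms", "40000"); // default cited in the thread
        DEFAULTS.put("fetch.max.wait.ms", "500");    // the 500 ms fetch wait
    }

    // Value that would apply for a key: the supplied one if present, else the default.
    static String effective(Properties supplied, String key) {
        return supplied.getProperty(key, DEFAULTS.get(key));
    }

    // Supplied keys that match none of the known keys and so would change nothing.
    static Set<String> unrecognized(Properties supplied) {
        Set<String> unknown = new TreeSet<>(supplied.stringPropertyNames());
        unknown.removeAll(DEFAULTS.keySet());
        return unknown;
    }

    public static void main(String[] args) {
        Properties supplied = new Properties();
        // Hypothetical typo: underscores instead of dots.
        supplied.setProperty("request_timeout_ms", "120001");

        for (String key : DEFAULTS.keySet()) {
            System.out.println(key + " = " + effective(supplied, key));
        }
        System.out.println("unrecognized: " + unrecognized(supplied));
        // prints:
        // request.timeout.ms = 40000
        // fetch.max.wait.ms = 500
        // unrecognized: [request_timeout_ms]
    }
}
```

In this sketch the misspelled key is simply ignored, so the 40 second default stays in force even though the application "set" 120001. Comparing the values the client logs at startup against the intended ones would be the more direct check.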
[jira] [Commented] (KAFKA-4739) KafkaConsumer poll going into an infinite loop
[ https://issues.apache.org/jira/browse/KAFKA-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15855098#comment-15855098 ]

Jason Gustafson commented on KAFKA-4739:

[~sagar8192] Unfortunately, there is no such option. Traditionally, Kafka clients attempt to handle broker failures internally. This usually means a metadata refresh and a reconnect, which is exactly what the client appears to be doing here. We normally expect that the assigned partitions are spread across multiple brokers, so a failure fetching from any particular broker should only affect the availability of the partitions it was hosting. This is typically what you want, since a broker failure will cause another broker to take over its partitions. There is little that applications can do in these cases anyway, other than possibly sending an alert. Nevertheless, this behavior is often contested and may change, especially as some of the automatic behavior (such as topic auto-creation) is retired.

One small request: the logs seem to have sanitized broker ids. Can you ensure that they have all been updated consistently?

The puzzling thing is that the requests appear to be timing out on the client after 30s, yet you've enabled 120s in the config. Are you sure the 120s is correct? In which config did you enable "request_timeout_ms = 31" (the broker doesn't have such a config)?

It's also strange that multiple fetches are cancelled after a disconnect. The consumer should only ever have one fetch in-flight for each broker. I don't have a ready explanation for that. Could there be some details left out of the logs? We might get more information if you enable TRACE level logging.
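Since the client offers no retry limit, the alerting mentioned above has to live in the application. A minimal sketch of such a watchdog follows, assuming the application calls it from its own poll loop; `StallWatchdog` and its method names are hypothetical and not part of the Kafka client, and the 60 second threshold is an arbitrary example value.

```java
import java.time.Duration;

// Hypothetical application-side helper, not part of kafka-clients.
final class StallWatchdog {
    private final long maxIdleMs;
    private long lastProgressMs;

    StallWatchdog(Duration maxIdle, long nowMs) {
        this.maxIdleMs = maxIdle.toMillis();
        this.lastProgressMs = nowMs;
    }

    // Call after every poll; only polls that returned records count as progress.
    void recordPoll(int recordCount, long nowMs) {
        if (recordCount > 0) {
            lastProgressMs = nowMs;
        }
    }

    // True once no records have arrived for at least maxIdle.
    boolean isStalled(long nowMs) {
        return nowMs - lastProgressMs >= maxIdleMs;
    }

    public static void main(String[] args) {
        StallWatchdog watchdog = new StallWatchdog(Duration.ofSeconds(60), 0L);
        watchdog.recordPoll(5, 10_000L);                  // records arrived at t=10s
        System.out.println(watchdog.isStalled(30_000L));  // prints false
        watchdog.recordPoll(0, 50_000L);                  // empty poll, no progress
        System.out.println(watchdog.isStalled(70_000L));  // prints true
    }
}
```

In a real loop the application would pass the record count of each poll result and the current clock time, then raise an alert or close the consumer and exit once `isStalled` returns true, so that the group can reassign the partitions. Note that a genuinely idle topic looks the same as a stuck fetch to this check, so the threshold has to be chosen with the expected traffic in mind.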
[jira] [Commented] (KAFKA-4739) KafkaConsumer poll going into an infinite loop
[ https://issues.apache.org/jira/browse/KAFKA-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15854956#comment-15854956 ]

Sagar Sadashiv Patwardhan commented on KAFKA-4739:

[~hachikuji] I am with Vipul Singh. Is there a limit (a timeout or a number of attempts) on how many times the Kafka client tries to reconnect to the broker? We run into situations where a Kafka client acquires multiple partitions but does not read any messages due to this issue. It would be great if the Kafka client could crash (or at least revoke the acquired partitions in case of prolonged network connection issues) after x attempts or some timeout. We run multiple consumers, and the other consumers read the messages from this broker (partition(s)) without any issue.
[jira] [Commented] (KAFKA-4739) KafkaConsumer poll going into an infinite loop
[ https://issues.apache.org/jira/browse/KAFKA-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15854943#comment-15854943 ]

Vipul Singh commented on KAFKA-4739:

Consumer config: https://gist.github.com/neoeahit/c1d4027b975b95267e3cbe506899aef8

We have request_timeout_ms = 31 on the broker side. So the "Cancelled FETCH request" during the reconnect backoff is expected, I guess.

We looked in the broker's server.log and did not find anything for this consumer during the time this issue was happening. We added debug level logging and were able to see the heartbeat requests in kafka-request.log, but nothing else.

The brokers seem to be healthy; we are not seeing this issue with other consumers. The only thing special about this consumer is that it uses KafkaConsumer with these configs. The other consumers use Python to consume.
[jira] [Commented] (KAFKA-4739) KafkaConsumer poll going into an infinite loop
[ https://issues.apache.org/jira/browse/KAFKA-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15854923#comment-15854923 ]

Jason Gustafson commented on KAFKA-4739:

[~wushujames] Sounds a bit different. Not sure yet, but from the logs, this looks more like a problem on the broker. It might be worth opening a separate JIRA for your issue, though we'll probably end up suggesting that you upgrade to 0.10.0.1 and see if the problem persists (especially if you've already upgraded the brokers). From memory, there were a few cases we've fixed where the retry backoff wasn't being observed.

> KafkaConsumer poll going into an infinite loop
> --
>
> Key: KAFKA-4739
> URL: https://issues.apache.org/jira/browse/KAFKA-4739
> Project: Kafka
> Issue Type: Bug
> Components: consumer
> Affects Versions: 0.9.0.1
> Reporter: Vipul Singh
>
> We are seeing an issue with our Kafka consumer where it seems to go into an infinite loop while polling, trying to fetch data from Kafka. We are seeing the heartbeat requests on the broker from the consumer, but nothing else from the Kafka consumer.
> We enabled debug level logging on the consumer, and see these logs:
>
> {quote}
> DEBUG [2017-02-03 17:05:17,971] org.apache.kafka.clients.consumer.internals.ConsumerCoordinator: Committed offset abc for partition topic1-partition0
> DEBUG [2017-02-03 17:05:17,971] org.apache.kafka.clients.consumer.internals.ConsumerCoordinator: Committed offset abc1 for partition topic2-partition0
> DEBUG [2017-02-03 17:05:18,112] org.apache.kafka.clients.consumer.internals.AbstractCoordinator: Received successful heartbeat response.
> DEBUG [2017-02-03 17:05:19,828] org.apache.kafka.clients.consumer.internals.AbstractCoordinator: Received successful heartbeat response.
> DEBUG [2017-02-03 17:05:20,902] org.apache.kafka.clients.consumer.internals.AbstractCoordinator: Received successful heartbeat response.
> DEBUG [2017-02-03 17:05:22,860] org.apache.kafka.clients.consumer.internals.AbstractCoordinator: Received successful heartbeat response.
> DEBUG [2017-02-03 17:05:24,112] org.apache.kafka.clients.consumer.internals.AbstractCoordinator: Received successful heartbeat response.
> DEBUG [2017-02-03 17:05:25,884] org.apache.kafka.clients.consumer.internals.AbstractCoordinator: Received successful heartbeat response.
> DEBUG [2017-02-03 17:05:27,109] org.apache.kafka.clients.consumer.internals.AbstractCoordinator: Received successful heartbeat response.
> DEBUG [2017-02-03 17:05:28,860] org.apache.kafka.clients.consumer.internals.AbstractCoordinator: Received successful heartbeat response.
> DEBUG [2017-02-03 17:05:30,112] org.apache.kafka.clients.consumer.internals.AbstractCoordinator: Received successful heartbeat response.
> DEBUG [2017-02-03 17:05:31,827] org.apache.kafka.clients.consumer.internals.AbstractCoordinator: Received successful heartbeat response.
> DEBUG [2017-02-03 17:05:33,268] org.apache.kafka.clients.consumer.internals.AbstractCoordinator: Received successful heartbeat response.
> DEBUG [2017-02-03 17:05:34,834] org.apache.kafka.clients.consumer.internals.AbstractCoordinator: Received successful heartbeat response.
> DEBUG [2017-02-03 17:05:36,269] org.apache.kafka.clients.consumer.internals.AbstractCoordinator: Received successful heartbeat response.
> DEBUG [2017-02-03 17:05:37,838] org.apache.kafka.clients.consumer.internals.AbstractCoordinator: Received successful heartbeat response.
> DEBUG [2017-02-03 17:05:39,268] org.apache.kafka.clients.consumer.internals.AbstractCoordinator: Received successful heartbeat response.
> DEBUG [2017-02-03 17:05:40,824] org.apache.kafka.clients.consumer.internals.AbstractCoordinator: Received successful heartbeat response.
> DEBUG [2017-02-03 17:05:42,268] org.apache.kafka.clients.consumer.internals.AbstractCoordinator: Received successful heartbeat response.
> DEBUG [2017-02-03 17:05:43,825] org.apache.kafka.clients.consumer.internals.AbstractCoordinator: Received successful heartbeat response.
> DEBUG [2017-02-03 17:05:45,268] org.apache.kafka.clients.consumer.internals.AbstractCoordinator: Received successful heartbeat response.
> DEBUG [2017-02-03 17:05:46,840] org.apache.kafka.clients.consumer.internals.AbstractCoordinator: Received successful heartbeat response.
> DEBUG [2017-02-03 17:05:48,268] org.apache.kafka.clients.consumer.internals.AbstractCoordinator: Received successful heartbeat response.
> DEBUG [2017-02-03 17:05:49,823] org.apache.kafka.clients.consumer.internals.AbstractCoordinator: Received successful heartbeat response.
> DEBUG [2017-02-03 17:05:51,269] org.apache.kafka.clients.consumer.internals.AbstractCoordinator: Received
[jira] [Commented] (KAFKA-4739) KafkaConsumer poll going into an infinite loop
[ https://issues.apache.org/jira/browse/KAFKA-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15854905#comment-15854905 ]

James Cheng commented on KAFKA-4739:

I wonder if we're seeing the same thing as this. Kafka 0.9.0.1 consumer, talking to a 0.10.0.1 broker. Our Kafka cluster went down. While the cluster was down, this consumer wrote this into the logs at a rate of 1600 times per second for an extended period of time:

2016-12-23 04:42:46 [pool-2-thread-1] INFO o.a.k.c.c.i.AbstractCoordinator - Marking the coordinator 2147483643 dead.
[jira] [Commented] (KAFKA-4739) KafkaConsumer poll going into an infinite loop
[ https://issues.apache.org/jira/browse/KAFKA-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15854894#comment-15854894 ]

Jason Gustafson commented on KAFKA-4739:

bq. An observation from our side: it looks like the network client never attempts to reconnect to the broker after getting disconnected. It cancels all the in-flight requests, but never attempts to reconnect.

Hmm... Actually I do see multiple reconnects to "xyz." It seems the request timeout is set to 30 seconds? After connection, that's how long it seems to take for the disconnect to appear. There is a delay of 50ms after disconnecting before reconnecting (this is the reconnect backoff).

Can you also provide your consumer config? (It's a bit easier to process if you attach a document to the ticket instead of appending to the description.) Also, how is the health of your brokers? It would be good to check the logs of "xyz" to see if there are any hints as to why the fetches are timing out.
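Timing questions like the one above (30 s, 40 s or 120 s between events) can be settled by measuring the gaps between matching log lines rather than reading them by eye. The following is a minimal sketch that assumes the `DEBUG [yyyy-MM-dd HH:mm:ss,SSS] ...` line layout quoted in the issue description; the class `LogGaps` and the `marker` parameter are hypothetical names, and the sample lines are the first three heartbeat entries from that description.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical log helper for the line format quoted in this ticket.
final class LogGaps {
    private static final Pattern LINE = Pattern.compile(
            "^DEBUG \\[(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3})\\] (.*)$");
    private static final DateTimeFormatter TS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    // Milliseconds between successive log lines whose message contains the marker.
    static List<Long> gapsMs(List<String> lines, String marker) {
        List<Long> gaps = new ArrayList<>();
        LocalDateTime previous = null;
        for (String line : lines) {
            Matcher m = LINE.matcher(line);
            if (!m.matches() || !m.group(2).contains(marker)) {
                continue; // not a timestamped DEBUG line, or a different event
            }
            LocalDateTime current = LocalDateTime.parse(m.group(1), TS);
            if (previous != null) {
                gaps.add(Duration.between(previous, current).toMillis());
            }
            previous = current;
        }
        return gaps;
    }

    public static void main(String[] args) {
        String prefix = "org.apache.kafka.clients.consumer.internals.AbstractCoordinator: ";
        List<String> lines = Arrays.asList(
                "DEBUG [2017-02-03 17:05:18,112] " + prefix + "Received successful heartbeat response.",
                "DEBUG [2017-02-03 17:05:19,828] " + prefix + "Received successful heartbeat response.",
                "DEBUG [2017-02-03 17:05:20,902] " + prefix + "Received successful heartbeat response.");
        System.out.println(gapsMs(lines, "heartbeat response")); // prints [1716, 1074]
    }
}
```

Run against the full client log with a marker such as "Cancelled FETCH request", the same function would show how far apart the cancellations are, which is the number to compare with the configured timeout.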
[jira] [Commented] (KAFKA-4739) KafkaConsumer poll going into an infinite loop
[ https://issues.apache.org/jira/browse/KAFKA-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15854842#comment-15854842 ]

Vipul Singh commented on KAFKA-4739:

[~hachikuji], I have updated the description of the JIRA with updated logs. To answer your questions:

1. Once the consumer reaches this state, around 10 every second; the new log lines posted above should help establish this.
2. Yes, both are running on 0.9.0.1 at the moment.
3. I have posted logs with timestamps above to help answer this question.
4. I am afraid not at the moment.

An observation from our side: it looks like the network client never attempts to reconnect to the broker after getting disconnected. It cancels all the in-flight requests, but never attempts to reconnect.

A couple of configs we use:

on broker side:
request_timeout_ms = 31

on consumer side:
session.timeout.ms = 12
request.timeout.ms = 120001

Hope this helps!

> KafkaConsumer poll going into an infinite loop
> --
>
> Key: KAFKA-4739
> URL: https://issues.apache.org/jira/browse/KAFKA-4739
> Project: Kafka
> Issue Type: Bug
> Components: consumer
> Affects Versions: 0.9.0.1
> Reporter: Vipul Singh
>
> We are seeing an issue with our Kafka consumer where it seems to go into an infinite loop while polling, trying to fetch data from Kafka. We are seeing the heartbeat requests on the broker from the consumer, but nothing else from the Kafka consumer.
> We enabled debug level logging on the consumer, and see these logs:
> DEBUG org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient: Cancelled FETCH request ClientRequest(metadata info) with correlation id abc due to node xyz being disconnected
> DEBUG org.apache.kafka.clients.consumer.internals.Fetcher: Fetch failed
> !
org.apache.kafka.common.errors.DisconnectException: null > DEBUG org.apache.kafka.clients.NetworkClient: Initiating connection to node > abc at nodename:port > DEBUG org.apache.kafka.clients.consumer.internals.Fetcher: Fetch failed > ! org.apache.kafka.clients.consumer.internals.SendFailedException: null > DEBUG org.apache.kafka.clients.consumer.internals.Fetcher: Fetch failed > ! org.apache.kafka.clients.consumer.internals.SendFailedException: null > DEBUG org.apache.kafka.clients.consumer.internals.Fetcher: Fetch failed > ! org.apache.kafka.clients.consumer.internals.SendFailedException: null > DEBUG org.apache.kafka.clients.consumer.internals.Fetcher: Fetch failed > ! org.apache.kafka.clients.consumer.internals.SendFailedException: null > DEBUG org.apache.kafka.clients.consumer.internals.Fetcher: Fetch failed > ! org.apache.kafka.clients.consumer.internals.SendFailedException: null > DEBUG org.apache.kafka.clients.consumer.internals.Fetcher: Fetch failed > ! org.apache.kafka.clients.consumer.internals.SendFailedException: null > DEBUG org.apache.kafka.clients.consumer.internals.Fetcher: Fetch failed > ! org.apache.kafka.clients.consumer.internals.SendFailedException: null > DEBUG org.apache.kafka.clients.consumer.internals.Fetcher: Fetch failed > ! org.apache.kafka.clients.consumer.internals.SendFailedException: null > DEBUG org.apache.kafka.clients.consumer.internals.Fetcher: Fetch failed > ! org.apache.kafka.clients.consumer.internals.SendFailedException: null > DEBUGorg.apache.kafka.clients.consumer.internals.Fetcher: Fetch failed > ! org.apache.kafka.clients.consumer.internals.SendFailedException: null > DEBUG org.apache.kafka.clients.consumer.internals.Fetcher: Fetch failed > ! 
org.apache.kafka.clients.consumer.internals.SendFailedException: null > DEBUG org.apache.kafka.clients.NetworkClient: Completed connection to node xyz > DEBUG org.apache.kafka.clients.Metadata: Updated cluster metadata version 4 > to Cluster(cluster_info) > DEBUG org.apache.kafka.clients.consumer.internals.AbstractCoordinator: > Received successful heartbeat response. > DEBUG org.apache.kafka.clients.consumer.internals.AbstractCoordinator: > Received successful heartbeat response. > DEBUG org.apache.kafka.clients.consumer.internals.AbstractCoordinator: > Received successful heartbeat response. > DEBUG org.apache.kafka.clients.consumer.internals.AbstractCoordinator: > Received successful heartbeat response. > And this just goes on. The way we have been able to replicate this issue, is > by restarting the process in multiple successions. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (KAFKA-4739) KafkaConsumer poll going into an infinite loop
[ https://issues.apache.org/jira/browse/KAFKA-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15854731#comment-15854731 ]

Jason Gustafson commented on KAFKA-4739:

[~neoeahit] Thanks for the report. Would you mind providing the raw logs? It's useful to see the log timestamps. A couple of additional questions:

1. I noticed a disconnect in there. How often do you see the {{DisconnectException}} in the logs? Any reason the connection would be unstable?
2. Are the brokers on the same version as the client?
3. Hard to say without seeing the additional logs, but the {{SendFailedException}} errors could be benign. After a disconnect, the connection would be "blacked out" for a short time (50ms or so, I think). During that time, we wouldn't be able to send fetches.
4. We've improved the consumer network internals in recent releases. Is upgrading to 0.10 an option?
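Point 3 in Jason Gustafson's comment above describes a short "blackout" after a disconnect during which fetches cannot be sent. The following is a minimal sketch of that idea in plain Java, not Kafka's actual implementation: the class `ReconnectBlackoutSketch`, its nested `Connection` type, and the 5 ms poll interval are hypothetical names and values chosen for illustration. In the Kafka client the corresponding setting is `reconnect.backoff.ms`, which defaults to 50 ms.

```java
// Hypothetical model of the reconnect blackout; none of these types exist in Kafka.
class ReconnectBlackoutSketch {

    /** Tracks when a disconnected node may be contacted again. */
    static final class Connection {
        private final long backoffMs;
        private long blackoutUntilMs = 0L;

        Connection(long backoffMs) {
            this.backoffMs = backoffMs;
        }

        /** Record a disconnect; sends are refused until the backoff elapses. */
        void disconnect(long nowMs) {
            blackoutUntilMs = nowMs + backoffMs;
        }

        /** A send is only allowed once the blackout window has passed. */
        boolean canSend(long nowMs) {
            return nowMs >= blackoutUntilMs;
        }
    }

    /**
     * Simulates a poll loop that tries to send a fetch every pollIntervalMs
     * after a disconnect at time 0, and counts the attempts that fail.
     */
    static int countFailedSends(long backoffMs, long pollIntervalMs, long durationMs) {
        Connection connection = new Connection(backoffMs);
        connection.disconnect(0L);
        int failed = 0;
        for (long now = 0L; now < durationMs; now += pollIntervalMs) {
            if (!connection.canSend(now)) {
                failed++; // analogous to one "Fetch failed" log line
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        // Assumed values: 50 ms backoff, one fetch attempt every 5 ms, 100 ms window.
        int failed = countFailedSends(50L, 5L, 100L);
        System.out.println("failed sends during blackout: " + failed);
    }
}
```

Under these assumed values the loop fails on every attempt before the 50 ms mark and succeeds afterwards, which is one way a short burst of send failures can follow a single disconnect without indicating a further problem. It does not explain a consumer that never resumes fetching.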