[jira] [Commented] (KAFKA-4739) KafkaConsumer poll going into an infinite loop

2017-03-06 Thread Jin Tianfan (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15898623#comment-15898623
 ] 

Jin Tianfan commented on KAFKA-4739:


Has this problem been solved?

> KafkaConsumer poll going into an infinite loop
> ----------------------------------------------
>
>                 Key: KAFKA-4739
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4739
>             Project: Kafka
>          Issue Type: Bug
>          Components: consumer
>    Affects Versions: 0.9.0.1
>            Reporter: Vipul Singh
>
> We are seeing an issue with our Kafka consumer where it seems to go into an 
> infinite loop while polling, trying to fetch data from Kafka. We see the 
> consumer's heartbeat requests on the broker, but nothing else from the 
> consumer.
> We enabled debug-level logging on the consumer and see these logs: 
> https://gist.github.com/neoeahit/757bff7acdea62656f065f4dcb8974b4
> And this just goes on. We have been able to reproduce the issue by 
> restarting the process several times in succession.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-4739) KafkaConsumer poll going into an infinite loop

2017-03-06 Thread Jin Tianfan (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897164#comment-15897164
 ] 

Jin Tianfan commented on KAFKA-4739:


Unfortunately, I ran into the same problem. Has this problem been solved?



[jira] [Commented] (KAFKA-4739) KafkaConsumer poll going into an infinite loop

2017-02-10 Thread Sagar Sadashiv Patwardhan (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861745#comment-15861745
 ] 

Sagar Sadashiv Patwardhan commented on KAFKA-4739:
--

[~hachikuji] [~huxi_2b] Thanks for getting back to us. We really appreciate 
your help! :)

It is interesting to see that between L151 and L178 of 
https://gist.github.com/neoeahit/3a0a5027bc3499b85cb888918faac2a3 the Kafka 
client does not send any requests to the broker. We get successful heartbeat 
responses from the coordinator, but we still time out for some reason after ~40 
secs (the default request.timeout.ms). I have not read the client code in 
detail (maybe I should), but if the heartbeats are successful, then why are we 
timing out? Also, the request timeout is 500 ms per the documentation, so how 
is this timeout related to request.timeout.ms (40 secs)?
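
On how the two numbers relate: the 500 ms figure matches the max_wait_time=500 seen in the fetch requests elsewhere in this thread, which corresponds to the consumer setting fetch.max.wait.ms (how long the broker may hold a fetch open waiting for data), whereas request.timeout.ms is how long the client waits for a response before giving up on the connection. The sketch below is only an illustration of that relationship. The default values are assumed to be those of the 0.9.0.1 Java consumer, the ordering rule is what the Java consumer is understood to enforce at construction time, and `check_consumer_timeouts` is a hypothetical helper, not part of any Kafka client.

```python
# Assumed defaults for the 0.9.0.1 Java consumer; verify against your client version.
DEFAULTS = {
    "request.timeout.ms": 40000,   # client-side wait for any response
    "session.timeout.ms": 30000,   # group membership timeout
    "fetch.max.wait.ms": 500,      # broker-side wait when a fetch has no data yet
}


def check_consumer_timeouts(overrides=None):
    """Return a list of problems with the timeout ordering (hypothetical helper).

    The rule checked here: request.timeout.ms should be greater than both
    session.timeout.ms and fetch.max.wait.ms, otherwise a healthy long-poll
    fetch or a slow group join could be mistaken for a dead connection.
    """
    cfg = dict(DEFAULTS)
    cfg.update(overrides or {})
    problems = []
    for other in ("session.timeout.ms", "fetch.max.wait.ms"):
        if cfg["request.timeout.ms"] <= cfg[other]:
            problems.append(
                "request.timeout.ms (%d) should be greater than %s (%d)"
                % (cfg["request.timeout.ms"], other, cfg[other])
            )
    return problems
```

With the assumed defaults the check passes. A fetch that the broker answers normally should come back within roughly fetch.max.wait.ms, so a fetch that only ends when request.timeout.ms expires suggests that the response never arrived at all.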



[jira] [Commented] (KAFKA-4739) KafkaConsumer poll going into an infinite loop

2017-02-09 Thread Jason Gustafson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15860768#comment-15860768
 ] 

Jason Gustafson commented on KAFKA-4739:


[~neoeahit] Thanks for the update. These logs look much more like what I 
expect. There doesn't appear to be anything wrong with the consumer, but it 
would be nice to confirm it. 

1. Can you use the console consumer both with and without the 
{{--new-consumer}} option to consume from the same exact topics? 
2. You mentioned that you can reproduce the problem by restarting the process 
repeatedly, can you explain this a bit more? Can you get the same behavior 
using the console consumer? 
3. If you can get the consumer into this state again, can you get a thread dump 
from the broker? There have been a couple deadlocks fixed since 0.9.0.1, so it 
would be nice to confirm that we're not hitting one of them.
4. What do you do currently to recover? Since you said restarting the consumers 
doesn't help, what does? Restarting the brokers?



[jira] [Commented] (KAFKA-4739) KafkaConsumer poll going into an infinite loop

2017-02-09 Thread Vipul Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15860011#comment-15860011
 ] 

Vipul Singh commented on KAFKA-4739:


Hey [~hachikuji]. 

We tried to reproduce the issue again.

Please note: 
1. In our broker config we use *request.timeout.ms* of *31*, and 
*group.max.session.timeout.ms* of *30*
2. In our client config, the only thing we have different from the default is 
in this gist: https://gist.github.com/neoeahit/c1d4027b975b95267e3cbe506899aef8
3. We tried to grep for our consumer group during the time the disconnection 
was happening 
(https://gist.github.com/neoeahit/622515ba391ddf8566bf09af880a6ae0). 
Between 17:55:50,614 and 17:55:51,027, we weren't able to find any requests.
4. The client-side logs from around this time are here: 
https://gist.github.com/neoeahit/3a0a5027bc3499b85cb888918faac2a3 (Please note 
that we have two brokers, and their IPs are 1.1.1.1 and 1.1.1.6 [the actual 
IPs have been changed so that the logs can be made public].)

[~huxi_2b] I am puzzled by that 40 seconds myself. We don't set it in the 
config anywhere, yet we are seeing this in the logs.
One other puzzling thing is that the max_wait_time=500 in client requests 
doesn't seem to be honored. Maybe because the connection is already 
disconnected?

Please help us with any pointers or troubleshooting steps we can use to figure 
out this issue. It is causing us a lot of pain, with consumers randomly being 
blocked. 






[jira] [Commented] (KAFKA-4739) KafkaConsumer poll going into an infinite loop

2017-02-06 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15855232#comment-15855232
 ] 

huxi commented on KAFKA-4739:
-

It seems that FETCH requests time out every 40 seconds, and 40 seconds is the 
default value of the `request.timeout.ms` config for the new consumer. Could 
you check whether your settings are actually taking effect? Also, why did you 
set such a large timeout value? Poor network conditions?
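
One way to confirm a regular 40-second cadence is to measure the gaps between consecutive timeout lines in the client's debug log. The following is a rough sketch, assuming each log entry sits on a single line and starts with `DEBUG [yyyy-MM-dd HH:mm:ss,SSS]` as in the timestamped logs in this ticket's description, and that a timed-out fetch shows up as a "Cancelled FETCH request" line. `timeout_intervals` is a hypothetical helper name.

```python
import re
from datetime import datetime

# Matches the timestamp prefix of a debug line, with an optional "> " quote marker.
LINE_RE = re.compile(
    r"^(?:>\s*)?DEBUG \[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\]"
)


def timeout_intervals(lines, marker="Cancelled FETCH request"):
    """Return the gaps, in seconds, between consecutive lines containing marker."""
    stamps = []
    for line in lines:
        if marker not in line:
            continue
        match = LINE_RE.match(line.strip())
        if match:
            stamps.append(
                datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
            )
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]
```

If the gaps cluster around 40 seconds, the client is most likely running with the default `request.timeout.ms` rather than the configured value. Gaps near some other value point at a different timeout.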



[jira] [Commented] (KAFKA-4739) KafkaConsumer poll going into an infinite loop

2017-02-06 Thread Jason Gustafson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15855098#comment-15855098
 ] 

Jason Gustafson commented on KAFKA-4739:


[~sagar8192] Unfortunately, there is no such option. Traditionally, kafka 
clients attempt to handle broker failures internally. This usually means a 
metadata refresh and a reconnect, which is exactly what the client appears to 
be doing here. We normally expect that the assigned partitions are spread 
across multiple brokers, so a failure fetching from any particular broker 
should only affect the availability of the partitions it was hosting. This is 
typically what you want since a broker failure will cause another broker to 
take over its partitions. There is little applications can do in these cases 
anyway other than possibly sending an alert. Nevertheless, this behavior is 
often contested and may change, especially as some of the automatic behavior 
(such as topic auto-creation) is retired.

One small request: the logs seem to have sanitized broker ids. Can you ensure 
that they have all been updated consistently? The puzzling thing is that the 
requests appear to be timing out on the client after 30s, yet you've 
enabled 120s in the config. Are you sure the 120s is correct? In which config 
did you enable "request_timeout_ms = 31" (the broker doesn't have such a 
config)? It's also strange that multiple fetches are cancelled after a 
disconnect. The consumer should only ever have one fetch in-flight for each 
broker. I don't have a ready explanation for that. Could there be some details 
left out of the logs? We might get more information if you enable TRACE level 
logging.



[jira] [Commented] (KAFKA-4739) KafkaConsumer poll going into an infinite loop

2017-02-06 Thread Sagar Sadashiv Patwardhan (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15854956#comment-15854956
 ] 

Sagar Sadashiv Patwardhan commented on KAFKA-4739:
--

[~hachikuji] I am with Vipul Singh. Is there a limit (a timeout or a number of 
attempts) on how many times the Kafka client tries to reconnect to the broker? 
We run into situations where a Kafka client acquires multiple partitions but 
does not read any messages due to this issue. It would be great if the Kafka 
client could crash (or at least revoke the acquired partitions in case of 
prolonged network connection issues) after x attempts or some timeout. We run 
multiple consumers, and the other consumers read messages from this broker 
(partition(s)) without any issue.
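
In the absence of a client-side limit, an application can approximate the requested behavior itself by watching for progress in its own poll loop. The sketch below is one possible shape for that, not a Kafka feature: `StallWatchdog` is a hypothetical class, and the idea that "no records for N seconds" means "stuck" is an assumption that only holds for topics with steady traffic.

```python
import time


class StallWatchdog:
    """Tracks how long it has been since a poll last returned records."""

    def __init__(self, max_idle_seconds, clock=time.monotonic):
        self.max_idle_seconds = max_idle_seconds
        self.clock = clock              # injectable so the logic can be tested
        self.last_progress = clock()

    def record_poll(self, record_count):
        """Call after every poll with the number of records it returned."""
        if record_count > 0:
            self.last_progress = self.clock()

    def stalled(self):
        """True once no records have arrived for longer than the idle limit."""
        return self.clock() - self.last_progress > self.max_idle_seconds
```

In a poll loop, the application would call `record_poll(len(records))` after each poll and, once `stalled()` returns true, close the consumer and exit so that the group rebalances and another member picks up the partitions. On a quiet topic this would trigger falsely, so the idle limit needs to be chosen with the expected traffic in mind.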



[jira] [Commented] (KAFKA-4739) KafkaConsumer poll going into an infinite loop

2017-02-06 Thread Vipul Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15854943#comment-15854943
 ] 

Vipul Singh commented on KAFKA-4739:


consumer config: 
https://gist.github.com/neoeahit/c1d4027b975b95267e3cbe506899aef8

We have request_timeout_ms = 31 on the broker side.
So the "Cancelled FETCH request" during the reconnect backoff is expected, I 
guess.

We looked in the broker's server.log and did not find anything for this 
consumer during the time this issue was happening. 
We added debug level logging and were able to see the heartbeat requests on 
kafka-request.log. But nothing else. The brokers seem to be healthy, we are not 
seeing this issue with other consumers. The only thing special about this 
consumer is that it uses KafkaConsumer with these configs. The other consumers 
use python to consume.





[jira] [Commented] (KAFKA-4739) KafkaConsumer poll going into an infinite loop

2017-02-06 Thread Jason Gustafson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15854923#comment-15854923
 ] 

Jason Gustafson commented on KAFKA-4739:


[~wushujames] Sounds a bit different. Not sure yet, but from the logs, this 
looks more like a problem on the broker. It might be worth opening a separate 
JIRA for your issue, though we'll probably end up suggesting you upgrade to 
0.10.0.1 and see if the problem persists (especially if you've already upgraded 
the brokers). From memory, there were a few cases we've fixed where the retry 
backoff wasn't being observed.

> KafkaConsumer poll going into an infinite loop
> ----------------------------------------------
>
>                 Key: KAFKA-4739
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4739
>             Project: Kafka
>          Issue Type: Bug
>          Components: consumer
>    Affects Versions: 0.9.0.1
>            Reporter: Vipul Singh
>
> We are seeing an issue with our kafka consumer where it seems to go into an 
> infinite loop while polling, trying to fetch data from kafka. We are seeing 
> the heartbeat requests on the broker from the consumer, but nothing else from 
> the kafka consumer.
> We enabled debug level logging on the consumer, and see these logs:
> 
> {quote}
> DEBUG [2017-02-03 17:05:17,971] 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator: Committed 
> offset abc for partition topic1-partition0
> DEBUG [2017-02-03 17:05:17,971] 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator: Committed 
> offset abc1 for partition topic2-partition0
> DEBUG [2017-02-03 17:05:18,112] 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator: Received 
> successful heartbeat response.
> DEBUG [2017-02-03 17:05:19,828] 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator: Received 
> successful heartbeat response.
> DEBUG [2017-02-03 17:05:20,902] 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator: Received 
> successful heartbeat response.
> DEBUG [2017-02-03 17:05:22,860] 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator: Received 
> successful heartbeat response.
> DEBUG [2017-02-03 17:05:24,112] 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator: Received 
> successful heartbeat response.
> DEBUG [2017-02-03 17:05:25,884] 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator: Received 
> successful heartbeat response.
> DEBUG [2017-02-03 17:05:27,109] 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator: Received 
> successful heartbeat response.
> DEBUG [2017-02-03 17:05:28,860] 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator: Received 
> successful heartbeat response.
> DEBUG [2017-02-03 17:05:30,112] 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator: Received 
> successful heartbeat response.
> DEBUG [2017-02-03 17:05:31,827] 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator: Received 
> successful heartbeat response.
> DEBUG [2017-02-03 17:05:33,268] 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator: Received 
> successful heartbeat response.
> DEBUG [2017-02-03 17:05:34,834] 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator: Received 
> successful heartbeat response.
> DEBUG [2017-02-03 17:05:36,269] 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator: Received 
> successful heartbeat response.
> DEBUG [2017-02-03 17:05:37,838] 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator: Received 
> successful heartbeat response.
> DEBUG [2017-02-03 17:05:39,268] 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator: Received 
> successful heartbeat response.
> DEBUG [2017-02-03 17:05:40,824] 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator: Received 
> successful heartbeat response.
> DEBUG [2017-02-03 17:05:42,268] 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator: Received 
> successful heartbeat response.
> DEBUG [2017-02-03 17:05:43,825] 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator: Received 
> successful heartbeat response.
> DEBUG [2017-02-03 17:05:45,268] 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator: Received 
> successful heartbeat response.
> DEBUG [2017-02-03 17:05:46,840] 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator: Received 
> successful heartbeat response.
> DEBUG [2017-02-03 17:05:48,268] 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator: Received 
> successful heartbeat response.
> DEBUG [2017-02-03 17:05:49,823] 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator: Received 
> successful heartbeat response.
> DEBUG [2017-02-03 17:05:51,269] 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator: Received 
> successful heartbeat response.

[jira] [Commented] (KAFKA-4739) KafkaConsumer poll going into an infinite loop

2017-02-06 Thread James Cheng (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15854905#comment-15854905
 ] 

James Cheng commented on KAFKA-4739:


I wonder if we're seeing the same thing as this.

Kafka 0.9.0.1 consumer, talking to 0.10.0.1 broker. Our kafka cluster went 
down. While the cluster was down, this consumer wrote this into the logs at the 
rate of 1600 times per second for an extended period of time:

2016-12-23 04:42:46 [pool-2-thread-1] INFO  o.a.k.c.c.i.AbstractCoordinator - 
Marking the coordinator 2147483643 dead.
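
Two things can be read off a log like this with a few lines of code: the per-second rate of the message, and which broker the coordinator id refers to. The sketch below assumes one log entry per line starting with a `yyyy-MM-dd HH:mm:ss` timestamp, as in the line above. It also assumes, based on how the Java client is understood to work, that the coordinator connection uses a synthetic node id equal to Integer.MAX_VALUE minus the broker id. Both helper names are hypothetical.

```python
from collections import Counter

INT_MAX = 2**31 - 1  # Java's Integer.MAX_VALUE


def coordinator_broker_id(coordinator_node_id):
    """Recover the broker id from a coordinator node id (assumed encoding)."""
    return INT_MAX - coordinator_node_id


def lines_per_second(lines, needle="Marking the coordinator"):
    """Count matching log lines per one-second timestamp bucket."""
    counts = Counter()
    for line in lines:
        if needle in line:
            counts[line[:19]] += 1  # "yyyy-MM-dd HH:mm:ss" is 19 characters
    return counts
```

Under that assumption, coordinator 2147483643 would correspond to broker id 4. A sustained count in the thousands per second for a single bucket would indicate that no backoff is being applied between attempts.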




[jira] [Commented] (KAFKA-4739) KafkaConsumer poll going into an infinite loop

2017-02-06 Thread Jason Gustafson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15854894#comment-15854894
 ] 

Jason Gustafson commented on KAFKA-4739:


bq. An observation from our side: it looks like the network client never 
attempts to reconnect to the broker after getting disconnected. It cancels all 
the in-flight requests, but never attempts to reconnect.

Hmm... Actually I do see multiple reconnects to "xyz." It seems the request 
timeout is set to 30 seconds? After connection, that's how long it seems to 
take for the disconnect to appear. There is a delay of 50ms after disconnecting 
before reconnecting (this is the reconnect backoff).

Can you also provide your consumer config? (It's a bit easier to process if you 
attach a document to the ticket instead of appending to the description.) Also, 
how is the health of your brokers? It would be good to check the logs of "xyz" 
to see if there are any hints as to why the fetches are timing out.


[jira] [Commented] (KAFKA-4739) KafkaConsumer poll going into an infinite loop

2017-02-06 Thread Vipul Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15854842#comment-15854842
 ] 

Vipul Singh commented on KAFKA-4739:


[~hachikuji], I have updated the description of the JIRA with updated logs.
To answer your questions:

1. Once the consumer reaches this state, around 10 times every second; the new 
log lines posted above should help establish this.
2. Yes, both are running on 0.9.0.1 at the moment.
3. I have posted logs with timestamps above to help answer this question.
4. I am afraid not at the moment. 

An observation from our side: it looks like the network client never attempts 
to reconnect to the broker after getting disconnected. It cancels all the 
in-flight requests, but never attempts to reconnect.

A couple of configs we use:

on broker side:
request_timeout_ms = 31

on consumer side:
session.timeout.ms = 12
request.timeout.ms = 120001
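
A quick sanity check of the consumer values above can be sketched as follows. The values are copied exactly as they appear in this message. The rule being checked (that `request.timeout.ms` should be larger than `session.timeout.ms`) is my understanding of what the 0.9 consumer expects and should be treated as an assumption to verify against the documentation for your version; `parse_settings` and `request_timeout_exceeds_session_timeout` are hypothetical helper names.

```python
def parse_settings(text):
    """Parse 'key = value' lines into a dict of integer settings."""
    settings = {}
    for line in text.splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            settings[key.strip()] = int(value.strip())
    return settings


# Consumer values copied exactly as they appear in this message.
consumer = parse_settings("""
session.timeout.ms = 12
request.timeout.ms = 120001
""")


def request_timeout_exceeds_session_timeout(cfg):
    # Assumption: the consumer expects request.timeout.ms > session.timeout.ms.
    return cfg["request.timeout.ms"] > cfg["session.timeout.ms"]


print(request_timeout_exceeds_session_timeout(consumer))
```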

Hope this helps!

> KafkaConsumer poll going into an infinite loop
> --
>
> Key: KAFKA-4739
> URL: https://issues.apache.org/jira/browse/KAFKA-4739
> Project: Kafka
> Issue Type: Bug
> Components: consumer
> Affects Versions: 0.9.0.1
> Reporter: Vipul Singh
>
> We are seeing an issue with our kafka consumer where it seems to go into an 
> infinite loop while polling, trying to fetch data from kafka. We are seeing 
> the heartbeat requests on the broker from the consumer, but nothing else from 
> the kafka consumer.
> We enabled debug level logging on the consumer, and see these logs:
> DEBUG org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient: 
> Cancelled FETCH request ClientRequest(metadata info) with correlation id abc 
> due to node xyz being disconnected
> DEBUG org.apache.kafka.clients.consumer.internals.Fetcher: Fetch failed
> ! org.apache.kafka.common.errors.DisconnectException: null
> DEBUG org.apache.kafka.clients.NetworkClient: Initiating connection to node 
> abc at nodename:port
> DEBUG org.apache.kafka.clients.consumer.internals.Fetcher: Fetch failed
> ! org.apache.kafka.clients.consumer.internals.SendFailedException: null
> DEBUG org.apache.kafka.clients.consumer.internals.Fetcher: Fetch failed
> ! org.apache.kafka.clients.consumer.internals.SendFailedException: null
> DEBUG org.apache.kafka.clients.consumer.internals.Fetcher: Fetch failed
> ! org.apache.kafka.clients.consumer.internals.SendFailedException: null
> DEBUG org.apache.kafka.clients.consumer.internals.Fetcher: Fetch failed
> ! org.apache.kafka.clients.consumer.internals.SendFailedException: null
> DEBUG org.apache.kafka.clients.consumer.internals.Fetcher: Fetch failed
> ! org.apache.kafka.clients.consumer.internals.SendFailedException: null
> DEBUG org.apache.kafka.clients.consumer.internals.Fetcher: Fetch failed
> ! org.apache.kafka.clients.consumer.internals.SendFailedException: null
> DEBUG org.apache.kafka.clients.consumer.internals.Fetcher: Fetch failed
> ! org.apache.kafka.clients.consumer.internals.SendFailedException: null
> DEBUG org.apache.kafka.clients.consumer.internals.Fetcher: Fetch failed
> ! org.apache.kafka.clients.consumer.internals.SendFailedException: null
> DEBUG org.apache.kafka.clients.consumer.internals.Fetcher: Fetch failed
> ! org.apache.kafka.clients.consumer.internals.SendFailedException: null
> DEBUG org.apache.kafka.clients.consumer.internals.Fetcher: Fetch failed
> ! org.apache.kafka.clients.consumer.internals.SendFailedException: null
> DEBUG org.apache.kafka.clients.consumer.internals.Fetcher: Fetch failed
> ! org.apache.kafka.clients.consumer.internals.SendFailedException: null
> DEBUG org.apache.kafka.clients.NetworkClient: Completed connection to node xyz
> DEBUG org.apache.kafka.clients.Metadata: Updated cluster metadata version 4 
> to Cluster(cluster_info)
> DEBUG org.apache.kafka.clients.consumer.internals.AbstractCoordinator: 
> Received successful heartbeat response.
> DEBUG org.apache.kafka.clients.consumer.internals.AbstractCoordinator: 
> Received successful heartbeat response.
> DEBUG org.apache.kafka.clients.consumer.internals.AbstractCoordinator: 
> Received successful heartbeat response.
> DEBUG org.apache.kafka.clients.consumer.internals.AbstractCoordinator: 
> Received successful heartbeat response.
> And this just goes on. The way we have been able to replicate this issue is 
> by restarting the process multiple times in succession.





[jira] [Commented] (KAFKA-4739) KafkaConsumer poll going into an infinite loop

2017-02-06 Thread Jason Gustafson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15854731#comment-15854731
 ] 

Jason Gustafson commented on KAFKA-4739:


[~neoeahit] Thanks for the report. Would you mind providing the raw logs? It's 
useful to see the log timestamps. A couple of additional questions:

1. I noticed a disconnect in there. How often do you see the 
{{DisconnectException}} in the logs? Is there any reason the connection would be 
unstable?
2. Are the brokers on the same version as the client?
3. It's hard to say without seeing the additional logs, but the 
{{SendFailedException}} errors could be benign. After a disconnect, the 
connection would be "blacked out" for a short time (50ms or so I think). During 
that time, we wouldn't be able to send fetches.
4. We've improved the consumer network internals in recent releases. Is 
upgrading to 0.10 an option?
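
To illustrate the "blackout" idea in point 3, here is a toy model, not Kafka's actual implementation. It assumes a fixed backoff window after a disconnect (50 ms, taken from the rough figure above, which I believe corresponds to the `reconnect.backoff.ms` setting) and shows which poll attempts would fail to send during that window. `ConnectionState` and `simulate` are hypothetical names.

```python
class ConnectionState:
    """Toy model of a per-node reconnect backoff; not Kafka's actual implementation."""

    def __init__(self, backoff_ms=50):
        self.backoff_ms = backoff_ms      # assumed blackout length in milliseconds
        self.last_disconnect_ms = None    # None means the node never disconnected

    def disconnected(self, now_ms):
        self.last_disconnect_ms = now_ms

    def can_send(self, now_ms):
        if self.last_disconnect_ms is None:
            return True
        return now_ms - self.last_disconnect_ms >= self.backoff_ms


def simulate(poll_times_ms, disconnect_at_ms, backoff_ms=50):
    """Return the poll times (at or after the disconnect) at which a fetch could not be sent."""
    state = ConnectionState(backoff_ms)
    state.disconnected(disconnect_at_ms)
    return [
        t for t in poll_times_ms
        if t >= disconnect_at_ms and not state.can_send(t)
    ]


# Polling every 10 ms after a disconnect at t=0: the first five attempts fail.
print(simulate(range(0, 100, 10), disconnect_at_ms=0))
```

In this model, a tight poll loop produces a short burst of send failures after each disconnect and then recovers on its own, which is why a handful of such errors right after a disconnect could be benign.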



