[jira] [Commented] (KAFKA-4674) Frequent ISR shrinking and expanding and disconnects among brokers

2017-03-31 Thread varun sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15951127#comment-15951127
 ] 

varun sharma commented on KAFKA-4674:
-

[~mjuarez], We are facing the same issue with kafka 0.10.2.0 as well.
Did you found some fix for this? If yes, please share.
Did you have too many connections from producers? because we are suspecting it 
can be because of this.

> Frequent ISR shrinking and expanding and disconnects among brokers
> --
>
> Key: KAFKA-4674
> URL: https://issues.apache.org/jira/browse/KAFKA-4674
> Project: Kafka
>  Issue Type: Bug
>  Components: controller, core
>Affects Versions: 0.10.0.1
> Environment: OS: Redhat Linux 2.6.32-431.el6.x86_64
> JDK: 1.8.0_45
>Reporter: Kaiming Wan
> Attachments: controller.log.rar, kafkabroker.20170221.log.zip, 
> server.log.2017-01-11-14, zookeeper.out.2017-01-11.log
>
>
> We use a kafka cluster with 3 brokers in production environment. It works 
> well for several month. Recently, we get the UnderReplicatedPartitions>0 
> warning mail. When we check the log, we find that the partition is always 
> experience ISR shrinking and expanding. And the disconnection exception can 
> be found in controller's log.
> We also found some deviant output in zookeeper's log which point to a 
> consumer(using old API depends on zookeeper ) which has stopped its work with 
> many lags.
> Actually, it is not the first time we encounter this problem. When we 
> first met this problem, we also found the same phenomenon and the log output. 
> We solve the problem by deleting the consumer node info in zookeeper. Then 
> everything goes well.
> However, this time, after we deleting the consumer which already have 
> large lag, the frequent ISR shrinking and expanding didn't stop for a very 
> long time(serveral hours). Though, the issue didn't affect our consumer and 
> producer, we think it will make our cluster unstable. So at last, we solve 
> this problem by restart the controller broker.
> And now I wander what cause this problem. I check the source code and 
> only know poll timeout will cause disconnection and ISR shrinking. Is the 
> issue related to zookeeper because it will not hold too many metadata 
> modification and make the replication fetch thread take more time?
> I upload the log file in the attachment.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-4674) Frequent ISR shrinking and expanding and disconnects among brokers

2017-02-22 Thread mjuarez (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15879553#comment-15879553
 ] 

mjuarez commented on KAFKA-4674:


This is happening to us in one our clusters in prod today, across multiple 
topics, and basically on all nodes.  The most impacted topic/partitions seem to 
be the ones that have the highest volume, but every topic that has some 
activity is being impacted.

Unlike [~344277...@qq.com], we are not seeing any exceptions in the logs.  See 
attached broker log file.  Please let me know if you need more log files and/or 
details.

> Frequent ISR shrinking and expanding and disconnects among brokers
> --
>
> Key: KAFKA-4674
> URL: https://issues.apache.org/jira/browse/KAFKA-4674
> Project: Kafka
>  Issue Type: Bug
>  Components: controller, core
>Affects Versions: 0.10.0.1
> Environment: OS: Redhat Linux 2.6.32-431.el6.x86_64
> JDK: 1.8.0_45
>Reporter: Kaiming Wan
> Attachments: controller.log.rar, kafkabroker.20170221.log.zip, 
> server.log.2017-01-11-14, zookeeper.out.2017-01-11.log
>
>
> We use a kafka cluster with 3 brokers in production environment. It works 
> well for several month. Recently, we get the UnderReplicatedPartitions>0 
> warning mail. When we check the log, we find that the partition is always 
> experience ISR shrinking and expanding. And the disconnection exception can 
> be found in controller's log.
> We also found some deviant output in zookeeper's log which point to a 
> consumer(using old API depends on zookeeper ) which has stopped its work with 
> many lags.
> Actually, it is not the first time we encounter this problem. When we 
> first met this problem, we also found the same phenomenon and the log output. 
> We solve the problem by deleting the consumer node info in zookeeper. Then 
> everything goes well.
> However, this time, after we deleting the consumer which already have 
> large lag, the frequent ISR shrinking and expanding didn't stop for a very 
> long time(serveral hours). Though, the issue didn't affect our consumer and 
> producer, we think it will make our cluster unstable. So at last, we solve 
> this problem by restart the controller broker.
> And now I wander what cause this problem. I check the source code and 
> only know poll timeout will cause disconnection and ISR shrinking. Is the 
> issue related to zookeeper because it will not hold too many metadata 
> modification and make the replication fetch thread take more time?
> I upload the log file in the attachment.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-4674) Frequent ISR shrinking and expanding and disconnects among brokers

2017-01-19 Thread Kaiming Wan (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15829542#comment-15829542
 ] 

Kaiming Wan commented on KAFKA-4674:


[~huxi_2b]I have read that issue, but still not sure what cause the problem. 
Can you tell me what cause the problem? Is it the ideal way to solve this issue 
by upgrading to kafka 0.10.1.0?

> Frequent ISR shrinking and expanding and disconnects among brokers
> --
>
> Key: KAFKA-4674
> URL: https://issues.apache.org/jira/browse/KAFKA-4674
> Project: Kafka
>  Issue Type: Bug
>  Components: controller, core
>Affects Versions: 0.10.0.1
> Environment: OS: Redhat Linux 2.6.32-431.el6.x86_64
> JDK: 1.8.0_45
>Reporter: Kaiming Wan
> Attachments: controller.log.rar, server.log.2017-01-11-14, 
> zookeeper.out.2017-01-11.log
>
>
> We use a kafka cluster with 3 brokers in production environment. It works 
> well for several month. Recently, we get the UnderReplicatedPartitions>0 
> warning mail. When we check the log, we find that the partition is always 
> experience ISR shrinking and expanding. And the disconnection exception can 
> be found in controller's log.
> We also found some deviant output in zookeeper's log which point to a 
> consumer(using old API depends on zookeeper ) which has stopped its work with 
> many lags.
> Actually, it is not the first time we encounter this problem. When we 
> first met this problem, we also found the same phenomenon and the log output. 
> We solve the problem by deleting the consumer node info in zookeeper. Then 
> everything goes well.
> However, this time, after we deleting the consumer which already have 
> large lag, the frequent ISR shrinking and expanding didn't stop for a very 
> long time(serveral hours). Though, the issue didn't affect our consumer and 
> producer, we think it will make our cluster unstable. So at last, we solve 
> this problem by restart the controller broker.
> And now I wander what cause this problem. I check the source code and 
> only know poll timeout will cause disconnection and ISR shrinking. Is the 
> issue related to zookeeper because it will not hold too many metadata 
> modification and make the replication fetch thread take more time?
> I upload the log file in the attachment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4674) Frequent ISR shrinking and expanding and disconnects among brokers

2017-01-19 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15829525#comment-15829525
 ] 

huxi commented on KAFKA-4674:
-

Is it a duplicate of 
[KAFKA-3916|https://issues.apache.org/jira/browse/KAFKA-3916]?

> Frequent ISR shrinking and expanding and disconnects among brokers
> --
>
> Key: KAFKA-4674
> URL: https://issues.apache.org/jira/browse/KAFKA-4674
> Project: Kafka
>  Issue Type: Bug
>  Components: controller, core
>Affects Versions: 0.10.0.1
> Environment: OS: Redhat Linux 2.6.32-431.el6.x86_64
> JDK: 1.8.0_45
>Reporter: Kaiming Wan
> Attachments: controller.log.rar, server.log.2017-01-11-14, 
> zookeeper.out.2017-01-11.log
>
>
> We use a kafka cluster with 3 brokers in production environment. It works 
> well for several month. Recently, we get the UnderReplicatedPartitions>0 
> warning mail. When we check the log, we find that the partition is always 
> experience ISR shrinking and expanding. And the disconnection exception can 
> be found in controller's log.
> We also found some deviant output in zookeeper's log which point to a 
> consumer(using old API depends on zookeeper ) which has stopped its work with 
> many lags.
> Actually, it is not the first time we encounter this problem. When we 
> first met this problem, we also found the same phenomenon and the log output. 
> We solve the problem by deleting the consumer node info in zookeeper. Then 
> everything goes well.
> However, this time, after we deleting the consumer which already have 
> large lag, the frequent ISR shrinking and expanding didn't stop for a very 
> long time(serveral hours). Though, the issue didn't affect our consumer and 
> producer, we think it will make our cluster unstable. So at last, we solve 
> this problem by restart the controller broker.
> And now I wander what cause this problem. I check the source code and 
> only know poll timeout will cause disconnection and ISR shrinking. Is the 
> issue related to zookeeper because it will not hold too many metadata 
> modification and make the replication fetch thread take more time?
> I upload the log file in the attachment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)