[jira] [Commented] (KAFKA-3916) Connection from controller to broker disconnects

2016-08-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15432004#comment-15432004
 ] 

ASF GitHub Bot commented on KAFKA-3916:
---------------------------------------

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1734


> Connection from controller to broker disconnects
> -------------------------------------------------
>
>                 Key: KAFKA-3916
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3916
>             Project: Kafka
>          Issue Type: Bug
>          Components: controller
>    Affects Versions: 0.9.0.1
>            Reporter: Dave Powell
>            Assignee: Jason Gustafson
>             Fix For: 0.10.1.0
>
>
> We recently upgraded from 0.8.2.1 to 0.9.0.1. Since then, several times per 
> day, the controllers in our clusters have their connections to all brokers 
> disconnected, and then successfully reconnected a few hundred ms later. Each 
> time this occurs we see a brief spike in our 99th percentile produce and 
> consume times, reaching several hundred ms.
> Here is an example of what we're seeing in the controller.log:
> {code}
> [2016-06-28 14:15:35,416] WARN [Controller-151-to-broker-160-send-thread], 
> Controller 151 epoch 106 fails to send request {…} to broker Node(160, 
> broker.160.hostname, 9092). Reconnecting to broker. 
> (kafka.controller.RequestSendThread)
> java.io.IOException: Connection to 160 was disconnected before the response 
> was read
> at 
> kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:87)
> at 
> kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:84)
> at scala.Option.foreach(Option.scala:236)
> at 
> kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:84)
> at 
> kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:80)
> at 
> kafka.utils.NetworkClientBlockingOps$.recurse$1(NetworkClientBlockingOps.scala:129)
> at 
> kafka.utils.NetworkClientBlockingOps$.kafka$utils$NetworkClientBlockingOps$$pollUntilFound$extension(NetworkClientBlockingOps.scala:139)
> at 
> kafka.utils.NetworkClientBlockingOps$.blockingSendAndReceive$extension(NetworkClientBlockingOps.scala:80)
> at 
> kafka.controller.RequestSendThread.liftedTree1$1(ControllerChannelManager.scala:180)
> at 
> kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:171)
> at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
> ... one each for all brokers (including the controller) ...
>  [2016-06-28 14:15:35,721] INFO [Controller-151-to-broker-160-send-thread], 
> Controller 151 connected to Node(160, broker.160.hostname, 9092) for sending 
> state change requests (kafka.controller.RequestSendThread)
> … one each for all brokers (including the controller) ...
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3916) Connection from controller to broker disconnects

2016-08-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15420154#comment-15420154
 ] 

ASF GitHub Bot commented on KAFKA-3916:
---------------------------------------

GitHub user hachikuji opened a pull request:

https://github.com/apache/kafka/pull/1734

KAFKA-3916: Check for disconnects properly before sending from the 
controller



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hachikuji/kafka KAFKA-3916

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1734.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1734


commit 1cc38623c047128d9c83309c66f37491726c8082
Author: Jason Gustafson 
Date:   2016-08-12T00:40:32Z

KAFKA-3916: Check for disconnects properly before sending from the 
controller
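
For anyone skimming without opening the PR: a rough sketch of the pattern the 
title describes (illustrative only, not the actual diff), using the public 
KafkaClient API that the controller's send thread builds on. The point is to 
notice a dead connection before blocking on a response rather than after:

{code}
import org.apache.kafka.clients.KafkaClient;
import org.apache.kafka.common.Node;

final class SendLoopSketch {
    // Illustrative only: poll until the broker connection is ready, returning
    // false if a disconnect is detected so the caller can reconnect first.
    static boolean awaitReady(KafkaClient client, Node node, long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            long now = System.currentTimeMillis();
            if (client.ready(node, now))
                return true;  // connected and able to accept a send
            if (client.connectionFailed(node))
                return false; // disconnect noticed before the send, not after
            client.poll(100, now); // drive I/O and connection establishment
        }
        return false;
    }
}
{code}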







--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3916) Connection from controller to broker disconnects

2016-08-12 Thread Michal Turek (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15418477#comment-15418477
 ] 

Michal Turek commented on KAFKA-3916:
---------------------------------------

Hi Jason and others,

I have finally found the root cause in our case. I was grepping the controller 
logs for a while, filtering out the distracting stack traces, and noticed that 
all of the warnings from the last month mention the same topic, which is quite 
strange because we have about 50 topics in total. It is the largest one, with 
the highest throughput, but not dramatically larger than the others. The 
timestamps of the warnings also fall in the afternoon or early evening, when 
Kafka receives its highest traffic, so I believe the issue is somehow related 
to performance. The warnings always show leader=2, the ISR alternating between 
[1,2] and [1,2,3], and the affected partitions are always 2, 7 and 11.

While working on a completely different task yesterday, I also noticed by 
coincidence that the consumption speed for events buffered in Kafka had reached 
its limit for this topic, while other topics were fine. Producers were only a 
little slower than consumers. So we are trying to add a few more partitions to 
increase parallelism, both in the consumers and internally in Kafka 
(replication). I guess it will help.
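
For reference, partitions can be added with the stock topic tool; a sketch 
(the ZooKeeper address, topic name and partition count are placeholders):

{noformat}
# Note: adding partitions changes the key-to-partition mapping for keyed messages.
bin/kafka-topics.sh --zookeeper zk1:2181 --alter --topic the-busy-topic --partitions 16
{noformat}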

{noformat}
# egrep 'WARN|IOException' controller.log | less -S

[2016-08-10 19:00:14,502] WARN [Controller-1-to-broker-1-send-thread], Controller 1 epoch 20 fails to send request {controller_id=1,controller_epoch=20,partition_states=[{topic=(always same topic),partition=2,controller_epoch=18,leader=2,leader_epoch=26,isr=[1,2,3],zk_version=145,replicas=[2,3,1]},{topic=(always same topic),partition=11,controller_epoch=18,leader=2,leader_epoch=23,isr=[1,2,3],zk_version=120,replicas=[2,1,3]}],live_brokers=[{id=1,end_points=[{port=9092,host=(host1),security_protocol_type=0}]},{id=2,end_points=[{port=9092,host=(host2),security_protocol_type=0}]},{id=3,end_points=[{port=9092,host=(host3),security_protocol_type=0}]},{id=4,end_points=[{port=9092,host=(host4),security_protocol_type=0}]}]} to broker Node(1, (host1), 9092). Reconnecting to broker. (kafka.controller.RequestSendThread)
java.io.IOException: Connection to 1 was disconnected before the response was read
[2016-08-10 19:00:14,506] WARN [Controller-1-to-broker-4-send-thread], Controller 1 epoch 20 fails to send request {controller_id=1,controller_epoch=20,partition_states=[{topic=(always same topic),partition=2,controller_epoch=18,leader=2,leader_epoch=26,isr=[1,2,3],zk_version=145,replicas=[2,3,1]},{topic=(always same topic),partition=11,controller_epoch=18,leader=2,leader_epoch=23,isr=[1,2,3],zk_version=120,replicas=[2,1,3]}],live_brokers=[{id=1,end_points=[{port=9092,host=(host1),security_protocol_type=0}]},{id=2,end_points=[{port=9092,host=(host2),security_protocol_type=0}]},{id=3,end_points=[{port=9092,host=(host3),security_protocol_type=0}]},{id=4,end_points=[{port=9092,host=(host4),security_protocol_type=0}]}]} to broker Node(4, (host4), 9092). Reconnecting to broker. (kafka.controller.RequestSendThread)
java.io.IOException: Connection to 4 was disconnected before the response was read
[2016-08-10 19:00:14,506] WARN [Controller-1-to-broker-2-send-thread], Controller 1 epoch 20 fails to send request {controller_id=1,controller_epoch=20,partition_states=[{topic=(always same topic),partition=2,controller_epoch=18,leader=2,leader_epoch=26,isr=[1,2,3],zk_version=145,replicas=[2,3,1]},{topic=(always same topic),partition=11,controller_epoch=18,leader=2,leader_epoch=23,isr=[1,2,3],zk_version=120,replicas=[2,1,3]}],live_brokers=[{id=1,end_points=[{port=9092,host=(host1),security_protocol_type=0}]},{id=2,end_points=[{port=9092,host=(host2),security_protocol_type=0}]},{id=3,end_points=[{port=9092,host=(host3),security_protocol_type=0}]},{id=4,end_points=[{port=9092,host=(host4),security_protocol_type=0}]}]} to broker Node(2, (host2), 9092). Reconnecting to broker. (kafka.controller.RequestSendThread)
java.io.IOException: Connection to 2 was disconnected before the response was read
[2016-08-10 19:00:14,502] WARN [Controller-1-to-broker-3-send-thread], Controller 1 epoch 20 fails to send request {controller_id=1,controller_epoch=20,partition_states=[{topic=(always same topic),partition=2,controller_epoch=18,leader=2,leader_epoch=26,isr=[1,2,3],zk_version=145,replicas=[2,3,1]},{topic=(always same topic),partition=11,controller_epoch=18,leader=2,leader_epoch=23,isr=[1,2,3],zk_version=120,replicas=[2,1,3]}],live_brokers=[{id=1,end_points=[{port=9092,host=(host1),security_protocol_type=0}]},{id=2,end_points=[{port=9092,host=(host2),security_protocol_type=0}]},{id=3,end_points=[{port=9092,host=(host3),security_protocol_type=0}]},{id=4,end_points=[{port=9092,host=(host4),security_protocol_type=0}]}]} to broker Node(3, (host3), 9092). Reconnecting to broker. (kafka.controller.RequestSendThread)
java.io.IOException: Connection to 3 was disconnected before the response was read
{noformat}

[jira] [Commented] (KAFKA-3916) Connection from controller to broker disconnects

2016-08-11 Thread Jason Gustafson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15417813#comment-15417813
 ] 

Jason Gustafson commented on KAFKA-3916:


Could we be hitting the idle connection timeout? The default setting for 
{{connections.max.idle.ms}} is 10 minutes, and when the cluster is stable I'm 
not sure there's much traffic from the controller, so the idle connection may 
be proactively closed. Perhaps someone can try increasing it?
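
For anyone who wants to test this hypothesis, the setting can be raised in 
each broker's server.properties; a sketch (the 30-minute value below is only 
an illustration, not a recommendation):

{noformat}
# Default is 600000 (10 minutes); raise it above the controller's quiet periods.
connections.max.idle.ms=1800000
{noformat}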




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3916) Connection from controller to broker disconnects

2016-08-11 Thread Michal Turek (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416682#comment-15416682
 ] 

Michal Turek commented on KAFKA-3916:
---------------------------------------

Production of messages was failing for a few seconds this time. The exceptions 
in controller.log are the same as in the description. Based on our monitoring 
and graphs, multiple ISR shrinks and expands happened between 6:30 pm and 
9:30 pm, then the issue disappeared.

{noformat}
2016-08-10 20:01:40.611 WARN  (application class)  
[kafka-producer-network-thread | (consumer group)]: Write to Kafka failed: 
(topic), org.apache.kafka.common.errors.TimeoutException: Batch Expired
2016-08-10 20:01:40.683 WARN  (application class)  
[kafka-producer-network-thread | (consumer group)]: Write to Kafka failed: 
(topic), org.apache.kafka.common.errors.TimeoutException: Batch Expired
...
2016-08-10 20:01:43.714 WARN  (application class)  
[kafka-producer-network-thread | (consumer group)]: Write to Kafka failed: 
(topic), org.apache.kafka.common.errors.TimeoutException: Batch Expired
2016-08-10 20:01:43.714 WARN  (application class)  
[kafka-producer-network-thread | (consumer group)]: Write to Kafka failed: 
(topic), org.apache.kafka.common.errors.TimeoutException: Batch Expired
2016-08-10 20:01:43.838 WARN  o.a.kafka.clients.producer.internals.Sender   
[kafka-producer-network-thread | (consumer group)]: Got error produce response 
with correlation id 113676841 on topic-partition (topic)-3, retrying (0 
attempts left). Error: NETWORK_EXCEPTION
2016-08-10 20:01:43.840 WARN  o.a.kafka.clients.producer.internals.Sender   
[kafka-producer-network-thread | (consumer group)]: Got error produce response 
with correlation id 113676841 on topic-partition (topic)-0, retrying (0 
attempts left). Error: NETWORK_EXCEPTION
2016-08-10 20:01:43.840 WARN  o.a.kafka.clients.producer.internals.Sender   
[kafka-producer-network-thread | (consumer group)]: Got error produce response 
with correlation id 113676841 on topic-partition (topic)-0, retrying (0 
attempts left). Error: NETWORK_EXCEPTION
2016-08-10 20:01:43.840 WARN  o.a.kafka.clients.producer.internals.Sender   
[kafka-producer-network-thread | (consumer group)]: Got error produce response 
with correlation id 113676841 on topic-partition (topic)-8, retrying (0 
attempts left). Error: NETWORK_EXCEPTION
2016-08-10 20:01:43.841 WARN  o.a.kafka.clients.producer.internals.Sender   
[kafka-producer-network-thread | (consumer group)]: Got error produce response 
with correlation id 113676841 on topic-partition (topic)-4, retrying (0 
attempts left). Error: NETWORK_EXCEPTION
2016-08-10 20:01:43.841 WARN  o.a.kafka.clients.producer.internals.Sender   
[kafka-producer-network-thread | (consumer group)]: Got error produce response 
with correlation id 113676841 on topic-partition (topic)-0, retrying (0 
attempts left). Error: NETWORK_EXCEPTION
2016-08-10 20:01:43.841 WARN  o.a.kafka.clients.producer.internals.Sender   
[kafka-producer-network-thread | (consumer group)]: Got error produce response 
with correlation id 113676841 on topic-partition (topic)-10, retrying (0 
attempts left). Error: NETWORK_EXCEPTION
2016-08-10 20:01:43.842 WARN  o.a.kafka.clients.producer.internals.Sender   
[kafka-producer-network-thread | (consumer group)]: Got error produce response 
with correlation id 113676841 on topic-partition (topic)-0, retrying (0 
attempts left). Error: NETWORK_EXCEPTION
2016-08-10 20:01:43.842 WARN  o.a.kafka.clients.producer.internals.Sender   
[kafka-producer-network-thread | (consumer group)]: Got error produce response 
with correlation id 113676841 on topic-partition (topic)-6, retrying (0 
attempts left). Error: NETWORK_EXCEPTION
2016-08-10 20:01:43.842 WARN  o.a.kafka.clients.producer.internals.Sender   
[kafka-producer-network-thread | (consumer group)]: Got error produce response 
with correlation id 113676841 on topic-partition (topic)-6, retrying (0 
attempts left). Error: NETWORK_EXCEPTION
2016-08-10 20:01:43.919 WARN  (application class)  
[kafka-producer-network-thread | (consumer group)]: Write to Kafka failed: 
(topic), org.apache.kafka.common.errors.TimeoutException: Batch Expired
2016-08-10 20:01:43.920 WARN  (application class)  
[kafka-producer-network-thread | (consumer group)]: Write to Kafka failed: 
(topic), org.apache.kafka.common.errors.TimeoutException: Batch Expired
...
2016-08-10 20:01:43.971 WARN  (application class)  
[kafka-producer-network-thread | (consumer group)]: Write to Kafka failed: 
(topic), org.apache.kafka.common.errors.TimeoutException: Batch Expired
2016-08-10 20:01:43.972 WARN  (application class)  
[kafka-producer-network-thread | (consumer group)]: Write to Kafka failed: 
(topic), org.apache.kafka.common.errors.TimeoutException: Batch Expired
{noformat}
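
The "0 attempts left" in the Sender warnings suggests these producers run with 
retries=0 (the client default in this version), so a single NETWORK_EXCEPTION 
during the controller blip fails the batch outright. A minimal sketch of a 
producer configured to ride out a disconnect of a few hundred ms; the broker 
list, topic and values are placeholders, not a tested recommendation:

{code}
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ResilientProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "host1:9092"); // placeholder
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("retries", "5");            // retry instead of failing on the first NETWORK_EXCEPTION
        props.put("retry.backoff.ms", "200"); // the reconnects above took a few hundred ms
        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        producer.send(new ProducerRecord<>("some-topic", "key", "value"), (metadata, exception) -> {
            if (exception != null)
                System.err.println("Write to Kafka failed: " + exception); // mirrors the WARN above
        });
        producer.close();
    }
}
{code}

Note that retries > 0 can reorder records unless 
{{max.in.flight.requests.per.connection}} is also set to 1.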


[jira] [Commented] (KAFKA-3916) Connection from controller to broker disconnects

2016-08-03 Thread Michal Turek (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15406027#comment-15406027
 ] 

Michal Turek commented on KAFKA-3916:
---------------------------------------

We saw this issue yesterday. I don't know if this helps, but it may be useful 
while debugging.

- Kafka brokers 0.9.0.1 with 0.9.0.1 clients.
- An ISR shrink and an immediate ISR expand are visible in graphs based on the 
brokers' JMX metrics.
- Consumers were unable to commit offsets at that time.

{noformat}
2016-08-02 14:25:29.589 INFO  o.a.k.c.c.internals.ConsumerCoordinator   
[Consumer-7]: Offset commit for group ... failed due to REQUEST_TIMED_OUT, will 
find new coordinator and retry
2016-08-02 14:25:52.560 INFO  o.a.k.c.c.internals.ConsumerCoordinator   
[Consumer-2]: Offset commit for group ... failed due to REQUEST_TIMED_OUT, will 
find new coordinator and retry
2016-08-02 14:25:52.562 INFO  o.a.k.c.c.internals.ConsumerCoordinator   
[Consumer-0]: Offset commit for group ... failed due to REQUEST_TIMED_OUT, will 
find new coordinator and retry
2016-08-02 14:25:52.563 INFO  o.a.k.c.c.internals.ConsumerCoordinator   
[Consumer-5]: Offset commit for group ... failed due to REQUEST_TIMED_OUT, will 
find new coordinator and retry
2016-08-02 14:25:52.570 INFO  o.a.k.c.c.internals.ConsumerCoordinator   
[Consumer-6]: Offset commit for group ... failed due to REQUEST_TIMED_OUT, will 
find new coordinator and retry
2016-08-02 14:25:52.570 INFO  o.a.k.c.c.internals.ConsumerCoordinator   
[Consumer-3]: Offset commit for group ... failed due to REQUEST_TIMED_OUT, will 
find new coordinator and retry
2016-08-02 14:25:52.570 INFO  o.a.k.c.c.internals.ConsumerCoordinator   
[Consumer-4]: Offset commit for group ... failed due to REQUEST_TIMED_OUT, will 
find new coordinator and retry
2016-08-02 14:25:52.572 INFO  o.a.k.c.c.internals.ConsumerCoordinator   
[Consumer-1]: Offset commit for group ... failed due to REQUEST_TIMED_OUT, will 
find new coordinator and retry
2016-08-02 14:25:52.572 INFO  o.a.k.c.c.internals.AbstractCoordinator   
[Consumer-7]: Marking the coordinator 2147483646 dead.
2016-08-02 14:25:52.573 INFO  o.a.k.c.c.internals.AbstractCoordinator   
[Consumer-2]: Marking the coordinator 2147483646 dead.
2016-08-02 14:25:52.573 INFO  o.a.k.c.c.internals.AbstractCoordinator   
[Consumer-0]: Marking the coordinator 2147483646 dead.
2016-08-02 14:25:52.574 INFO  o.a.k.c.c.internals.AbstractCoordinator   
[Consumer-5]: Marking the coordinator 2147483646 dead.
2016-08-02 14:25:52.575 INFO  o.a.k.c.c.internals.AbstractCoordinator   
[Consumer-6]: Marking the coordinator 2147483646 dead.
2016-08-02 14:25:52.575 INFO  o.a.k.c.c.internals.AbstractCoordinator   
[Consumer-3]: Marking the coordinator 2147483646 dead.
2016-08-02 14:25:52.575 INFO  o.a.k.c.c.internals.AbstractCoordinator   
[Consumer-4]: Marking the coordinator 2147483646 dead.
2016-08-02 14:25:52.576 INFO  o.a.k.c.c.internals.AbstractCoordinator   
[Consumer-1]: Marking the coordinator 2147483646 dead.
2016-08-02 14:25:52.576 WARN  o.a.k.c.c.internals.ConsumerCoordinator   
[Consumer-7]: Auto offset commit failed: The request timed out.
2016-08-02 14:25:52.577 WARN  o.a.k.c.c.internals.ConsumerCoordinator   
[Consumer-2]: Auto offset commit failed: The request timed out.
2016-08-02 14:25:52.577 WARN  o.a.k.c.c.internals.ConsumerCoordinator   
[Consumer-0]: Auto offset commit failed: The request timed out.
2016-08-02 14:25:52.577 WARN  o.a.k.c.c.internals.ConsumerCoordinator   
[Consumer-5]: Auto offset commit failed: The request timed out.
2016-08-02 14:25:52.578 WARN  o.a.k.c.c.internals.ConsumerCoordinator   
[Consumer-6]: Auto offset commit failed: The request timed out.
2016-08-02 14:25:52.578 WARN  o.a.k.c.c.internals.ConsumerCoordinator   
[Consumer-3]: Auto offset commit failed: The request timed out.
2016-08-02 14:25:52.578 WARN  o.a.k.c.c.internals.ConsumerCoordinator   
[Consumer-4]: Auto offset commit failed: The request timed out.
2016-08-02 14:25:52.579 WARN  o.a.k.c.c.internals.ConsumerCoordinator   
[Consumer-1]: Auto offset commit failed: The request timed out.
{noformat}
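
Since these are auto-commits, the failure only surfaces as a WARN and the 
consumer simply retries on the next interval. A sketch of committing explicitly 
so the application can react to the timeout itself; servers, group and topic 
are placeholders:

{code}
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ManualCommitConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "host1:9092"); // placeholder
        props.put("group.id", "my-group");            // placeholder
        props.put("enable.auto.commit", "false");     // commit explicitly below
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("some-topic"));
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(1000);
            // ... process records here ...
            try {
                consumer.commitSync(); // rediscovers the coordinator and retries internally
            } catch (Exception e) {
                System.err.println("Offset commit failed: " + e); // visible to the app, not just a WARN
            }
        }
    }
}
{code}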


[jira] [Commented] (KAFKA-3916) Connection from controller to broker disconnects

2016-07-22 Thread Andrey Konyaev (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15389188#comment-15389188
 ] 

Andrey Konyaev commented on KAFKA-3916:
---------------------------------------

I have this problem with version 0.10, too.






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)