[ https://issues.apache.org/jira/browse/KAFKA-4573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ray Chiang updated KAFKA-4573:
------------------------------
    Component/s: producer 

> Producer sporadic timeout
> -------------------------
>
>                 Key: KAFKA-4573
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4573
>             Project: Kafka
>          Issue Type: Bug
>          Components: producer 
>    Affects Versions: 0.9.0.1
>            Reporter: Ankur C
>            Priority: Major
>
> We had a production outage due to sporadic Kafka producer timeouts. About 1 to
> 2% of messages would time out continuously.
> Kafka version - 0.9.0.1
> #Kafka brokers - 5
> #Replication factor for each topic - 3
> #Number of topics - ~30
> #Number of partitions - ~300
> We had been running Kafka 0.9.0.1 on our 5-broker cluster for 1 month without
> any issues. However, on Dec 23rd we saw sporadic Kafka producer timeouts. The
> issue began around 6:51am and continued until we bounced the Kafka brokers.
> 6:51am Under-replication started on a small number of topics
> 6:53am All under-replication had recovered
> 11:00am We restarted all Kafka producer (writer) apps, but this did not resolve
> the sporadic producer timeouts
> 12:01pm We restarted all Kafka brokers, after which the issue was resolved
> Kafka metrics and Kafka logs did not show any major issue. There were no
> offline partitions during the outage, and the number of active controllers was
> exactly 1. The only exception we saw on the brokers was the following one in
> controller.log. It was present on all brokers, 0 to 4.
> java.io.IOException: Connection to 2 was disconnected before the response was read
>     at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:87)
>     at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:84)
>     at scala.Option.foreach(Option.scala:236)
>     at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:84)
>     at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:80)
>     at kafka.utils.NetworkClientBlockingOps$.recurse$1(NetworkClientBlockingOps.scala:129)
>     at kafka.utils.NetworkClientBlockingOps$.kafka$utils$NetworkClientBlockingOps$$pollUntilFound$extension(NetworkClientBlockingOps.scala:139)
>     at kafka.utils.NetworkClientBlockingOps$.blockingSendAndReceive$extension(NetworkClientBlockingOps.scala:80)
>     at kafka.controller.RequestSendThread.liftedTree1$1(ControllerChannelManager.scala:180)
>     at kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:171)
>     at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
> [2016-12-23 06:51:37,384] WARN [Controller-2-to-broker-2-send-thread], Controller 2 epoch 18 fails to send request {controller_id=2,controller_epoch=18,partition_states=[{topic=compliance_pipeline_fast_green,partition=4,controller_epoch=18,leader=4,leader_epoch=53,isr=[2,4],zk_version=111,replicas=[4,1,2]}],live_brokers=[{id=3,end_points=[{port=31161,host=10.126.144.73,security_protocol_type=0}]},{id=4,end_points=[{port=31355,host=10.126.144.233,security_protocol_type=0}]},{id=2,end_points=[{port=31293,host=10.126.144.137,security_protocol_type=0}]},{id=1,end_points=[{port=31824,host=10.126.144.169,security_protocol_type=0}]},{id=0,end_points=[{port=31139,host=10.126.144.201,security_protocol_type=0}]}]} to broker Node(2, 10.126.144.137, 31293). Reconnecting to broker. (kafka.controller.RequestSendThread)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
