Chaitanya GSK created KAFKA-6064:
------------------------------------

             Summary: Cluster hung when the controller tried to delete a bunch of topics
                 Key: KAFKA-6064
                 URL: https://issues.apache.org/jira/browse/KAFKA-6064
             Project: Kafka
          Issue Type: Bug
          Components: controller
    Affects Versions: 0.8.2.1
         Environment: rhel 6, 12 core, 48GB 
            Reporter: Chaitanya GSK


Hi, 

We have been running 0.8.2.1 in our Kafka cluster and hit a full cluster outage when we 
programmatically tried to delete 220 topics: the controller hung and ran out of memory. 
This somehow cascaded into an outage of the whole cluster, and clients could no longer 
push data at their normal rate. As far as I know, the controller shouldn't affect the 
write rate to the other brokers, but in this case it did. A sketch of this kind of 
programmatic deletion is included below, followed by the client-side error we saw.
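For illustration only, a minimal sketch of programmatic bulk deletion on 0.8.2.1 via 
kafka.admin.AdminUtils (the topic names and ZooKeeper connect string below are 
placeholders, not our real values; our actual tooling differs, but the effect is the 
same: each call just writes a deletion marker that the controller then processes):

import org.I0Itec.zkclient.ZkClient
import kafka.admin.AdminUtils
import kafka.utils.ZKStringSerializer

object BulkDeleteTopics {
  def main(args: Array[String]): Unit = {
    // Placeholder ZooKeeper connect string.
    val zkClient = new ZkClient("zk1:2181", 30000, 30000, ZKStringSerializer)
    try {
      // deleteTopic only creates a marker under /admin/delete_topics;
      // the controller picks up the markers and performs the actual deletion.
      (1 to 220).map(i => "topic_" + i).foreach { topic =>
        AdminUtils.deleteTopic(zkClient, topic)
      }
    } finally {
      zkClient.close()
    }
  }
}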

[WARN] Failed to send kafka.producer.async request with correlation id 1613935688 to broker 44 with data for partitions [topic_2,65],[topic_2,167],[topic_3,2],[topic_4,0],[topic_5,30],[topic_2,48],[topic_2,150]
java.io.IOException: Broken pipe
        at sun.nio.ch.FileDispatcherImpl.writev0(Native Method) ~[?:1.8.0_60]
        at sun.nio.ch.SocketDispatcher.writev(SocketDispatcher.java:51) ~[?:1.8.0_60]
        at sun.nio.ch.IOUtil.write(IOUtil.java:148) ~[?:1.8.0_60]
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:504) ~[?:1.8.0_60]
        at java.nio.channels.SocketChannel.write(SocketChannel.java:502) ~[?:1.8.0_60]
        at kafka.network.BoundedByteBufferSend.writeTo(BoundedByteBufferSend.scala:56) ~[stormjar.jar:?]
        at kafka.network.Send$class.writeCompletely(Transmission.scala:75) ~[stormjar.jar:?]
        at kafka.network.BoundedByteBufferSend.writeCompletely(BoundedByteBufferSend.scala:26) ~[stormjar.jar:?]
        at kafka.network.BlockingChannel.send(BlockingChannel.scala:103) ~[stormjar.jar:?]
        at kafka.producer.SyncProducer.liftedTree1$1(SyncProducer.scala:73) ~[stormjar.jar:?]
        at kafka.producer.SyncProducer.kafka$producer$SyncProducer$$doSend(SyncProducer.scala:72) ~[stormjar.jar:?]
        at kafka.producer.SyncProducer$$anonfun$send$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(SyncProducer.scala:103) ~[stormjar.jar:?]
        at kafka.producer.SyncProducer$$anonfun$send$1$$anonfun$apply$mcV$sp$1.apply(SyncProducer.scala:103) ~[stormjar.jar:?]
        at kafka.producer.SyncProducer$$anonfun$send$1$$anonfun$apply$mcV$sp$1.apply(SyncProducer.scala:103) ~[stormjar.jar:?]
        at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33) ~[stormjar.jar:?]
        at kafka.producer.SyncProducer$$anonfun$send$1.apply$mcV$sp(SyncProducer.scala:102) ~[stormjar.jar:?]
        at kafka.producer.SyncProducer$$anonfun$send$1.apply(SyncProducer.scala:102) ~[stormjar.jar:?]
        at kafka.producer.SyncProducer$$anonfun$send$1.apply(SyncProducer.scala:102) ~[stormjar.jar:?]
        at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33) ~[stormjar.jar:?]
        at kafka.producer.SyncProducer.send(SyncProducer.scala:101) ~[stormjar.jar:?]
        at kafka.producer.async.YamasKafkaEventHandler.kafka$producer$async$YamasKafkaEventHandler$$send(YamasKafkaEventHandler.scala:481) [stormjar.jar:?]
        at kafka.producer.async.YamasKafkaEventHandler$$anonfun$dispatchSerializedData$2.apply(YamasKafkaEventHandler.scala:144) [stormjar.jar:?]
        at kafka.producer.async.YamasKafkaEventHandler$$anonfun$dispatchSerializedData$2.apply(YamasKafkaEventHandler.scala:138) [stormjar.jar:?]
        at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772) [stormjar.jar:?]
        at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) [stormjar.jar:?]
        at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) [stormjar.jar:?]
        at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226) [stormjar.jar:?]
        at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39) [stormjar.jar:?]
        at scala.collection.mutable.HashMap.foreach(HashMap.scala:98) [stormjar.jar:?]
        at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771) [stormjar.jar:?]
        at kafka.producer.async.YamasKafkaEventHandler.dispatchSerializedData(YamasKafkaEventHandler.scala:138) [stormjar.jar:?]
        at kafka.producer.async.YamasKafkaEventHandler.handle(YamasKafkaEventHandler.scala:79) [stormjar.jar:?]
        at kafka.producer.async.ProducerSendThread.tryToHandle(ProducerSendThread.scala:105) [stormjar.jar:?]
        at kafka.producer.async.ProducerSendThread$$anonfun$processEvents$3.apply(ProducerSendThread.scala:88) [stormjar.jar:?]
        at kafka.producer.async.ProducerSendThread$$anonfun$processEvents$3.apply(ProducerSendThread.scala:68) [stormjar.jar:?]
        at scala.collection.immutable.Stream.foreach(Stream.scala:547) [stormjar.jar:?]
        at kafka.producer.async.ProducerSendThread.processEvents(ProducerSendThread.scala:67) [stormjar.jar:?]
        at kafka.producer.async.ProducerSendThread.run(ProducerSendThread.scala:45) [stormjar.jar:?]

We tried shifting the controller to a different broker, but that didn't help. We 
ultimately had to clean up the Kafka cluster to stabilize it.
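For context, the usual way to force the controller onto another broker on 0.8.x is to 
delete the ephemeral /controller znode in ZooKeeper so that another broker wins the 
re-election; a minimal sketch (the connect string is a placeholder, and this is not 
necessarily the exact procedure we used):

import org.I0Itec.zkclient.ZkClient
import kafka.utils.ZKStringSerializer

object MoveController {
  def main(args: Array[String]): Unit = {
    // Placeholder ZooKeeper connect string.
    val zkClient = new ZkClient("zk1:2181", 30000, 30000, ZKStringSerializer)
    try {
      // Removing the ephemeral /controller znode triggers a new controller
      // election; whichever broker re-creates the znode first becomes controller.
      zkClient.delete("/controller")
    } finally {
      zkClient.close()
    }
  }
}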

I'm wondering whether this is a known issue. If not, we would appreciate any insights 
from the community into why a hung controller would bring down the whole cluster, and 
why deleting these topics would cause the controller to hang in the first place.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
