[ https://issues.apache.org/jira/browse/KAFKA-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Gustafson resolved KAFKA-6134.
------------------------------------
    Resolution: Fixed

> High memory usage on controller during partition reassignment
> -------------------------------------------------------------
>
>                 Key: KAFKA-6134
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6134
>             Project: Kafka
>          Issue Type: Bug
>          Components: controller
>    Affects Versions: 0.11.0.0, 0.11.0.1
>            Reporter: Jason Gustafson
>            Assignee: Jason Gustafson
>            Priority: Critical
>              Labels: regression
>             Fix For: 1.0.0, 0.11.0.2
>
>         Attachments: Screen Shot 2017-10-26 at 3.05.40 PM.png
>
>
> A couple of users have reported spikes in memory usage when the controller
> performs partition reassignment in 0.11. After investigating, we found that
> the controller event queue held most of the retained memory. In particular,
> we found several thousand {{PartitionReassignment}} objects, each containing
> one fewer partition than the previous one (see the attached image).
> The code shows why this happens. The controller has a watch on the partition
> reassignment path that parses the new data and adds a {{PartitionReassignment}}
> event carrying it to the event queue:
> {code}
>   override def handleDataChange(dataPath: String, data: Any): Unit = {
>     val partitionReassignment = ZkUtils.parsePartitionReassignmentData(data.toString)
>     eventManager.put(controller.PartitionReassignment(partitionReassignment))
>   }
> {code}
> In the {{PartitionReassignment}} event handler, we iterate through all of the
> partitions in the reassignment. After completing reassignment for each
> partition, we remove that partition from the list and write the updated list
> back to ZooKeeper:
> {code}
>     // remove this partition from that list
>     val updatedPartitionsBeingReassigned = partitionsBeingReassigned - topicAndPartition
>     // write the new list to zookeeper
>     zkUtils.updatePartitionReassignmentData(updatedPartitionsBeingReassigned.mapValues(_.newReplicas))
> {code}
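> Each of these writes triggers the handler above, which adds another event to
> the queue. Because the controller processes events one at a time, these
> events pile up while the first handler is still running, and each one holds
> its own copy of the remaining partitions: n, n-1, ..., 1 entries, or
> n(n+1)/2 in total. Memory usage therefore grows as O(n^2), where n is the
> number of partitions being reassigned.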
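
The feedback loop described above can be reproduced with a small, self-contained Scala model. Everything in it is hypothetical: {{ReassignmentFeedbackLoop}}, {{retainedEntries}} and the {{snapshot}} field are illustrative names, not Kafka's controller code. The znode is modeled as an in-memory {{Set[Int]}}, and the watch is assumed to fire once per write. With {{eager = true}}, each watch callback copies the current partition list into the event, as the quoted {{handleDataChange}} does. With {{eager = false}}, the event carries no data and the handler reads the znode when it runs. That second variant is shown only for contrast and is not claimed to be the change that resolved this issue.

{code:scala}
import scala.collection.mutable

// Hypothetical model, not Kafka code: a single-threaded controller event queue
// and a znode holding the partitions still being reassigned.
object ReassignmentFeedbackLoop {

  // Stand-in for the controller's PartitionReassignment event. `snapshot` is
  // Some(data) when the watch copies the znode into the event, None otherwise.
  final case class PartitionReassignment(snapshot: Option[Set[Int]])

  /** Reassigns `n` partitions while handling the first event and returns how
    * many partition entries are held by queued events (including the one
    * being processed) once that handler finishes.
    */
  def retainedEntries(n: Int, eager: Boolean): Int = {
    val queue = mutable.Queue.empty[PartitionReassignment]
    var znode: Set[Int] = (0 until n).toSet

    // Watch callback: fires on every write to the reassignment znode.
    def onDataChange(data: Set[Int]): Unit =
      queue.enqueue(PartitionReassignment(if (eager) Some(data) else None))

    onDataChange(znode) // the initial write that starts the reassignment

    // Handle the first event. It stays referenced (here: at the head of the
    // queue) until its handler returns, and no other event runs meanwhile.
    val current = queue.head
    val partitions = current.snapshot.getOrElse(znode) // lazy: read znode now
    for (tp <- partitions) {
      znode -= tp         // remove this partition from the list
      onDataChange(znode) // write it back; the watch enqueues another event
    }

    // Count the partition entries retained by all queued events.
    queue.iterator.map(_.snapshot.fold(0)(_.size)).sum
  }

  def main(args: Array[String]): Unit =
    for (n <- Seq(10, 100, 1000))
      println(s"n=$n eager=${retainedEntries(n, eager = true)} lazy=${retainedEntries(n, eager = false)}")
}
{code}

In the eager case the model retains 55, 5050 and 500500 entries for n = 10, 100 and 1000, i.e. n(n+1)/2. In the lazy case it retains none: the queue still grows by one event per write, but each event has a fixed size, so memory grows linearly rather than quadratically.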



