[ https://issues.apache.org/jira/browse/YARN-897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13699373#comment-13699373 ]
Djellel Eddine Difallah commented on YARN-897:
----------------------------------------------

We spotted this bug while experimenting with dynamic queue updates. The TreeSet methods .contains() and .remove() failed to find a queue that we knew was in the set, which hinted that the tree was no longer properly sorted.

The attached test is a [simple JUnit test | https://issues.apache.org/jira/secure/attachment/12590676/TestBugParentQueue.java] modeled on the existing capacity scheduler tests. It simulates the scenario that [~curino] describes above and prints the order in which childQueues is left after a couple of container assignments and completions.

I will post a first version of a patch that re-inserts the queue of the recently completed container (and all of its parents) into their respective parents' childQueues.

> CapacityScheduler wrongly sorted queues
> ---------------------------------------
>
> Key: YARN-897
> URL: https://issues.apache.org/jira/browse/YARN-897
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacityscheduler
> Reporter: Djellel Eddine Difallah
> Attachments: TestBugParentQueue.java
>
>
> The childQueues of a ParentQueue are stored in a TreeSet where UsedCapacity
> defines the sort order. This ensures that the queue with the least UsedCapacity
> receives resources next. On container assignment we correctly update the order,
> but we fail to do so on container completion. This corrupts the TreeSet
> structure, and under-capacity queues might starve for resources.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
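
The failure mode described in this issue can be reproduced outside of YARN with a plain java.util.TreeSet. The following is only a minimal sketch under stated assumptions, not the attached test and not the actual patch: the Queue class, the comparator, and the helper methods are hypothetical stand-ins for the real scheduler classes. It shows that changing the sort key of an element while it sits in the set breaks .contains() and .first(), and that removing the element before the change and adding it back afterwards keeps the set consistent.

```java
import java.util.Comparator;
import java.util.TreeSet;

class TreeSetOrderDemo {
    // Hypothetical stand-in for a scheduler queue; not the real CSQueue.
    static final class Queue {
        final String name;
        double usedCapacity; // mutable sort key, as in the issue description

        Queue(String name, double usedCapacity) {
            this.name = name;
            this.usedCapacity = usedCapacity;
        }
    }

    // Assumed ordering: least used capacity first, name as a tie-breaker.
    static final Comparator<Queue> BY_USED_CAPACITY =
        Comparator.comparingDouble((Queue q) -> q.usedCapacity)
                  .thenComparing(q -> q.name);

    static TreeSet<Queue> newChildQueues(Queue... queues) {
        TreeSet<Queue> set = new TreeSet<>(BY_USED_CAPACITY);
        for (Queue q : queues) {
            set.add(q);
        }
        return set;
    }

    // Buggy variant: the key changes while the element is still in the tree,
    // so the element is left at a position that no longer matches its key.
    static void completeContainerInPlace(Queue q, double released) {
        q.usedCapacity -= released;
    }

    // Sketch of the fix: take the element out, change the key, put it back.
    static void completeContainerAndReinsert(TreeSet<Queue> childQueues,
                                             Queue q, double released) {
        childQueues.remove(q);      // must happen before the key changes
        q.usedCapacity -= released;
        childQueues.add(q);         // inserted at its new, correct position
    }

    public static void main(String[] args) {
        Queue a = new Queue("a", 0.1);
        Queue b = new Queue("b", 0.5);
        Queue c = new Queue("c", 0.9);
        TreeSet<Queue> broken = newChildQueues(a, b, c);
        completeContainerInPlace(c, 0.9);         // c is now the least used queue
        System.out.println(broken.contains(c));   // false: lookup takes the wrong branch
        System.out.println(broken.first().name);  // a, although c has less used capacity

        Queue x = new Queue("x", 0.1);
        Queue y = new Queue("y", 0.5);
        Queue z = new Queue("z", 0.9);
        TreeSet<Queue> fixed = newChildQueues(x, y, z);
        completeContainerAndReinsert(fixed, z, 0.9);
        System.out.println(fixed.contains(z));    // true
        System.out.println(fixed.first().name);   // z
    }
}
```

In a queue hierarchy the same remove, update, re-add sequence would presumably have to be repeated for each parent whose used capacity changes, which is what the proposed patch describes.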