[ 
https://issues.apache.org/jira/browse/YARN-897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13699204#comment-13699204
 ] 

Carlo Curino commented on YARN-897:
-----------------------------------

The childQueues of a ParentQueue are stored in a TreeSet where UsedCapacity 
defines the sort order. I believe for the scheduler to work correctly, we must 
maintain this order explicitly. 
When a new container is assigned to an application, the corresponding queue is 
removed and re-added, which maintains the order. When a container completes, 
however, the UsedCapacity of the queue changes, but we don't re-sort the 
childQueues. This means the TreeSet assumptions no longer hold, and we might 
fail to assign containers to this queue. 
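
To make the mechanism concrete, here is a minimal sketch of how a TreeSet goes 
stale when the sort key of a stored element changes in place. DemoQueue, 
BY_USED_CAPACITY and the numbers are made up for illustration; this is not the 
actual ParentQueue code, only the same pattern in isolation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

final class StaleTreeSetDemo {
    /** Hypothetical stand-in for a child queue; not the real YARN queue class. */
    static final class DemoQueue {
        final String name;
        double usedCapacity; // mutable sort key

        DemoQueue(String name, double usedCapacity) {
            this.name = name;
            this.usedCapacity = usedCapacity;
        }
    }

    /** Sorts by used capacity; the name breaks ties so distinct queues never compare equal. */
    static final Comparator<DemoQueue> BY_USED_CAPACITY =
        Comparator.comparingDouble((DemoQueue q) -> q.usedCapacity)
                  .thenComparing(q -> q.name);

    static List<String> names(TreeSet<DemoQueue> queues) {
        List<String> result = new ArrayList<>();
        for (DemoQueue q : queues) {
            result.add(q.name);
        }
        return result;
    }

    /** Changes the key in place: the set keeps its old internal order. */
    static List<String> orderAfterInPlaceUpdate() {
        TreeSet<DemoQueue> queues = new TreeSet<>(BY_USED_CAPACITY);
        DemoQueue x = new DemoQueue("x", 0.2);
        DemoQueue y = new DemoQueue("y", 0.5);
        DemoQueue z = new DemoQueue("z", 0.9);
        queues.add(x);
        queues.add(y);
        queues.add(z);
        z.usedCapacity = 0.1; // e.g. containers completed; the set is not told
        return names(queues);
    }

    /** Removes the element before changing the key, then re-adds it. */
    static List<String> orderAfterReinsert() {
        TreeSet<DemoQueue> queues = new TreeSet<>(BY_USED_CAPACITY);
        DemoQueue x = new DemoQueue("x", 0.2);
        DemoQueue y = new DemoQueue("y", 0.5);
        DemoQueue z = new DemoQueue("z", 0.9);
        queues.add(x);
        queues.add(y);
        queues.add(z);
        queues.remove(z);     // must happen while the old key is still in place
        z.usedCapacity = 0.1;
        queues.add(z);
        return names(queues);
    }

    public static void main(String[] args) {
        System.out.println("in place: " + orderAfterInPlaceUpdate()); // [x, y, z]
        System.out.println("reinsert: " + orderAfterReinsert());      // [z, x, y]
    }
}
```

In the in-place case z stays last even though it now has the smallest key, 
which is the situation a queue ends up in after container completions. Note 
that the remove has to happen before the key changes, otherwise the lookup 
itself can miss the element.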

Example:
Parent queue (root) has four child queues with capacities (A:25%, B:25%, C:25%, 
D:25%). The cluster has 10GB of resources with a minimum allocation of 1GB.
1-      Through some history we assigned 1, 2, 3, and 4 containers respectively 
to the queues (note: container = 1GB). Child-queue status: root.a(0.4), 
root.b(0.8), root.c(1.2), root.d(1.6)
2-      3 containers from D complete. Child-queue status: root.a(0.4), 
root.b(0.8), root.c(1.2), root.d(0.4)
3-      Now if A and B keep receiving and releasing containers without ever 
passing the 1.2 mark of C, D might stay stuck behind C and never receive 
containers. 
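
For reference, the status numbers above can be reproduced with a few lines. 
This is a sketch under one assumption inferred from the figures: UsedCapacity 
is taken to be the resources a queue uses divided by its configured share of 
the cluster (25% of 10GB = 2.5GB). The class and method names are made up for 
the example.

```java
final class UsedCapacityExample {
    static final double CLUSTER_GB = 10.0;  // cluster size from the example
    static final double QUEUE_SHARE = 0.25; // each of A, B, C, D is configured at 25%
    static final double CONTAINER_GB = 1.0; // minimum allocation, one container

    /** Assumed definition: used resources relative to the queue's configured share. */
    static double usedCapacity(int containers) {
        return (containers * CONTAINER_GB) / (QUEUE_SHARE * CLUSTER_GB);
    }

    /** Used capacity for each queue, given the containers each one holds. */
    static double[] usedCapacities(int[] containersPerQueue) {
        double[] result = new double[containersPerQueue.length];
        for (int i = 0; i < containersPerQueue.length; i++) {
            result[i] = usedCapacity(containersPerQueue[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        // Step 1: queues A..D hold 1, 2, 3 and 4 containers.
        System.out.println(java.util.Arrays.toString(usedCapacities(new int[] {1, 2, 3, 4})));
        // Step 2: three containers from D complete, so D holds 1.
        System.out.println(java.util.Arrays.toString(usedCapacities(new int[] {1, 2, 3, 1})));
    }
}
```

After step 2, D is tied with A for the lowest value, so a correctly sorted set 
would place it ahead of B and C rather than last.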

In practice this might not show up often because of reservations (which bypass 
this ordering). If D has reservations pending, it might get at least one 
container, and this will trigger the re-sorting, thus unsticking it. 
Nonetheless this should be addressed. I discussed this briefly with a few folks 
at Hadoop Summit and we seemed to confirm the problem, but we should 
double-check further. 

[~dedcode] will post a small test that triggers the issue, and an idea for a 
patch soon... comments welcome.


                
> CapacityScheduler wrongly sorted queues
> ---------------------------------------
>
>                 Key: YARN-897
>                 URL: https://issues.apache.org/jira/browse/YARN-897
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>            Reporter: Djellel Eddine Difallah
>
> The childQueues of a ParentQueue are stored in a TreeSet where UsedCapacity 
> defines the sort order. This ensures that the queue with the least 
> UsedCapacity receives resources next. On container assignment we correctly 
> update the order, but we fail to do so on container completions. This 
> corrupts the TreeSet structure, and under-capacity queues might starve for 
> resources.

