Ashwin Shankar commented on YARN-2026:

Hi [~sandyr],
bq. We would see that parentA is below its minShare, so we would preempt 
resources on its behalf. 
minShare preemption at the parent-queue level is not yet implemented:
FairScheduler.resToPreempt() is not recursive (YARN-596 doesn't address this).
I created YARN-1961 for this purpose and plan to work on it.
But yes, you are right: once YARN-1961 is in place, we can set minShare and 
minShareTimeout at parentA, which would reclaim resources from parentB.

This solves problem 1 in the description, but what about problem 2?
Consider a parent with many leaf queues under it, say created by the 
NestedUserQueue rule:
 - parentA has 100 user queues under it.
 - The fair share of each user queue is 1% of parentA (assuming weight=1).
 - Say user queue parentA.user1 is taking up 100% of the cluster since it's the 
only active queue.
 - parentA.user2, which was inactive until now, submits a job and needs, say, 20%.
 - parentA.user2 would get only 1% through preemption, and parentA.user1 would 
keep 99%.
  This seems unfair considering the users have equal weight. Eventually, as user1 
releases its containers,
  they would go to user2, but until that happens user1 can hog the cluster.

In our cluster we have about 200 users (so 200 user queues), but only about 
20% (on average) are active at any point in time. The fair share for each user 
becomes really low, (1/200)*parent, and can cause the 'unfairness' described 
in the example above.
This can be solved by dividing the fair share only among active queues.
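
To make the difference concrete, here is a minimal sketch of the two ways of splitting a parent's share, using the 100-user-queue example above. It is a simplified weight-proportional split, not the FairScheduler's actual computation, and the function name `fair_shares` and its parameters are made up for illustration.

```python
def fair_shares(parent_share, weights, active=None):
    """Split parent_share among child queues in proportion to weight.

    If `active` is None, every queue gets a share (static behavior).
    Otherwise only queues in `active` get a share; the rest get 0.
    """
    eligible = {q for q in weights if active is None or q in active}
    total = sum(weights[q] for q in eligible)
    return {
        q: (parent_share * weights[q] / total if q in eligible else 0.0)
        for q in weights
    }


# parentA holds 100% and has 100 user queues, each with weight 1.
weights = {f"user{i}": 1 for i in range(1, 101)}

static = fair_shares(100.0, weights)
active_only = fair_shares(100.0, weights, active={"user1", "user2"})

print(static["user2"])       # 1.0  (1% even though 98 queues are idle)
print(active_only["user2"])  # 50.0 (share split only between user1 and user2)
```

With the static split, user2's fair share stays at 1% no matter how many siblings are idle. With the active-only split, user1 and user2 would each be entitled to 50%.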

How about this: can we have a new property, say 'fairShareForActiveQueues', which 
turns this feature on/off? That way people who need it can use it, and others 
can turn it off and get the usual static fair share behavior.
Thoughts?

> Fair scheduler : Fair share for inactive queues causes unfair allocation in 
> some scenarios
> ------------------------------------------------------------------------------------------
>                 Key: YARN-2026
>                 URL: https://issues.apache.org/jira/browse/YARN-2026
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: scheduler
>            Reporter: Ashwin Shankar
>            Assignee: Ashwin Shankar
>              Labels: scheduler
>         Attachments: YARN-2026-v1.txt
> Problem 1 - While using hierarchical queues in the fair scheduler, there are a 
> few scenarios where we have seen a leaf queue with the least fair share take 
> the majority of the cluster and starve a sibling parent queue that has a 
> greater weight/fair share, and preemption doesn’t kick in to reclaim resources.
> The root cause seems to be that the fair share of a parent queue is distributed 
> to all its children, irrespective of whether a child is an active or an 
> inactive (no apps running) queue. Preemption based on fair share kicks in only 
> if the usage of a queue is less than 50% of its fair share and if it has 
> demands greater than that. When there are many queues under a parent queue 
> (with a high fair share), each child queue’s fair share becomes really low. As 
> a result, when only a few of these child queues have apps running, they reach 
> their *tiny* fair share quickly and preemption doesn’t happen even if other 
> leaf queues (non-sibling) are hogging the cluster.
> This can be solved by dividing the fair share of a parent queue only among its 
> active child queues.
> Here is an example describing the problem and the proposed solution:
> root.lowPriorityQueue is a leaf queue with weight 2
> root.HighPriorityQueue is a parent queue with weight 8
> root.HighPriorityQueue has 10 child leaf queues: 
> root.HighPriorityQueue.childQ(1..10)
> The above config results in root.HighPriorityQueue having an 80% fair share,
> and each of its ten child queues would have an 8% fair share. Preemption would 
> happen only if a child queue is below 4% (0.5*8=4). 
> Let's say at the moment no apps are running in any of the 
> root.HighPriorityQueue.childQ(1..10) and a few apps are running in 
> root.lowPriorityQueue, which is taking up 95% of the cluster.
> Up to this point, the behavior of FS is correct.
> Now, let's say root.HighPriorityQueue.childQ1 gets a big job which requires 30% 
> of the cluster. It would get only the available 5% of the cluster, and 
> preemption wouldn't kick in since it's above 4% (half its fair share). This is 
> bad considering childQ1 is under a high-priority parent queue which has an 
> *80% fair share*.
> Until root.lowPriorityQueue starts relinquishing containers, we would see the 
> following allocation on the scheduler page:
> *root.lowPriorityQueue = 95%*
> *root.HighPriorityQueue.childQ1=5%*
> This can be solved by distributing a parent’s fair share only to active 
> queues.
> So in the example above, since childQ1 is the only active queue
> under root.HighPriorityQueue, it would get all of its parent’s fair share, i.e. 
> 80%.
> This would cause preemption to reclaim the 30% needed by childQ1 from 
> root.lowPriorityQueue after fairSharePreemptionTimeout seconds.
> Problem 2 - Also note that a similar situation can happen between 
> root.HighPriorityQueue.childQ1 and root.HighPriorityQueue.childQ2, if childQ2 
> hogs the cluster. childQ2 can take up 95% of the cluster and childQ1 would be 
> stuck at 5% until childQ2 starts relinquishing containers. We would like each 
> of childQ1 and childQ2 to get half of root.HighPriorityQueue's fair share, i.e. 
> 40%, which would ensure childQ1 gets up to 40% of resources if needed through 
> preemption.
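
The arithmetic in the quoted description can be checked with a small sketch. It encodes only the rule as stated there (usage below 50% of fair share, with unmet demand). The helper name `needs_preemption` is hypothetical, and reading "demands greater than that" as "demand greater than current usage" is an assumption; the real scheduler logic may differ.

```python
def needs_preemption(usage, demand, fair_share, threshold=0.5):
    """Return True if a queue qualifies for fair-share preemption.

    Assumed rule: usage is below threshold * fair_share and the queue
    still wants more than it currently has. All values are percentages
    of the cluster.
    """
    return usage < threshold * fair_share and demand > usage


# childQ1 holds the free 5% of the cluster and wants 30%.
# Static split: root.HighPriorityQueue's 80% is divided among 10 children.
static = needs_preemption(usage=5, demand=30, fair_share=80 / 10)

# Active-only split: childQ1 is the only active child, so it gets all 80%.
active_only = needs_preemption(usage=5, demand=30, fair_share=80 / 1)

print(static, active_only)  # False True
```

Under the static split the threshold is 4%, so childQ1 at 5% never triggers preemption. Under the active-only split the threshold is 40%, so it does.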

This message was sent by Atlassian JIRA
