[
https://issues.apache.org/jira/browse/YARN-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14054629#comment-14054629
]
Sandy Ryza commented on YARN-2026:
----------------------------------
I had a conversation with [~kkambatl] about this, and he convinced me that we
should turn this on in all cases - i.e. modify FairSharePolicy and
DominantResourceFairnessPolicy instead of creating additional policies. Sorry
to vacillate on this.
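In other words, the share computation in those two policies would skip
inactive schedulables. Roughly what I'm picturing, as a standalone sketch
(hypothetical names, not actual patch code; the real change would live in
FairSharePolicy and DominantResourceFairnessPolicy):
{code}
import java.util.Arrays;
import java.util.List;

// Minimal sketch: distribute a parent queue's fair share among active
// children only, in proportion to weight.
public class ActiveFairShareSketch {

  static class Queue {
    final String name;
    final double weight;
    final int numRunnableApps;
    double fairShare;

    Queue(String name, double weight, int numRunnableApps) {
      this.name = name;
      this.weight = weight;
      this.numRunnableApps = numRunnableApps;
    }

    boolean isActive() {
      return numRunnableApps > 0;
    }
  }

  /** Split parentShare across active children in proportion to weight. */
  static void distribute(double parentShare, List<Queue> children) {
    double activeWeight = 0;
    for (Queue q : children) {
      if (q.isActive()) {
        activeWeight += q.weight;
      }
    }
    for (Queue q : children) {
      q.fairShare = (q.isActive() && activeWeight > 0)
          ? parentShare * q.weight / activeWeight
          : 0; // inactive queues get no share until they have apps
    }
  }

  public static void main(String[] args) {
    // From the example in the description: the parent has an 80% share
    // and only childQ1 is active.
    List<Queue> children = Arrays.asList(
        new Queue("childQ1", 1, 1),   // one runnable app
        new Queue("childQ2", 1, 0));  // idle
    distribute(0.80, children);
    for (Queue q : children) {
      System.out.printf("%s: %.0f%%%n", q.name, q.fairShare * 100);
    }
    // Prints childQ1: 80%, childQ2: 0%. With both active, each gets 40%.
  }
}
{code}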
Some additional comments on the code:
{code}
+ return this.getNumRunnableApps() > 0;
{code}
{code}
+ || (sched instanceof FSQueue && ((FSQueue) sched).isActive())) {
{code}
Instead of using instanceof, can we add an isActive method to Schedulable, and
always return true for it in AppSchedulable?
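Something like this is what I have in mind (just a sketch; exact signatures
depend on how Schedulable is declared):
{code}
// In Schedulable (sketch):
/** Whether this schedulable should count when computing fair shares. */
public abstract boolean isActive();

// In AppSchedulable: an app is always active.
@Override
public boolean isActive() {
  return true;
}

// In FSQueue: active iff the queue has runnable apps.
@Override
public boolean isActive() {
  return getNumRunnableApps() > 0;
}
{code}
That way the call site above becomes just sched.isActive(), with no cast.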
{code}
+ out.println(" <queue name=\"childA1\" />");
+ out.println(" <queue name=\"childA2\" />");
+ out.println(" <queue name=\"childA3\" />");
+ out.println(" <queue name=\"childA4\" />");
+ out.println(" <queue name=\"childA5\" />");
+ out.println(" <queue name=\"childA6\" />");
+ out.println(" <queue name=\"childA7\" />");
+ out.println(" <queue name=\"childA8\" />");
{code}
Do we need this many children?
{code}
+ out.println("</queue>");
+
+ out.println("</allocations>");
{code}
Unnecessary blank line.
{code}
+ public void testFairShareActiveOnly_ShareResetsToZeroWhenAppsComplete()
{code}
Take out the underscore.
{code}
+ private void setupCluster(int mem, int vCores) throws IOException {
{code}
Give this method a name that's more descriptive of the kind of configuration
it's setting up.
{code}
+ private void setupCluster(int nodeMem) throws IOException {
{code}
Can this call the setupCluster that takes two arguments?
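i.e. something like this (a sketch; assuming a default of 10 vCores to match
the nodeVcores value in the patch):
{code}
private void setupCluster(int nodeMem) throws IOException {
  // Delegate to the two-argument overload with a default vCore count.
  setupCluster(nodeMem, 10);
}
{code}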
To keep TestFairScheduler from becoming a monstrosity, the tests should go
into a new test file. TestFairSchedulerPreemption is a good example of how to
do this.
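Roughly like this (a sketch; the class name is just a suggestion):
{code}
// New test file, mirroring TestFairSchedulerPreemption's structure:
public class TestFairSchedulerFairShare extends FairSchedulerTestBase {

  @Before
  public void setup() throws IOException {
    conf = createConfiguration();
    // Write the test allocations file and point the conf at it, the way
    // TestFairSchedulerPreemption does.
  }

  @Test
  public void testFairShareResetsToZeroWhenAppsComplete() throws Exception {
    // Body moved over from TestFairScheduler.
  }
}
{code}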
{code}
+ int nodeVcores = 10;
{code}
Nit: "nodeVCores"
> Fair scheduler : Fair share for inactive queues causes unfair allocation in
> some scenarios
> ------------------------------------------------------------------------------------------
>
> Key: YARN-2026
> URL: https://issues.apache.org/jira/browse/YARN-2026
> Project: Hadoop YARN
> Issue Type: Bug
> Components: scheduler
> Reporter: Ashwin Shankar
> Assignee: Ashwin Shankar
> Labels: scheduler
> Attachments: YARN-2026-v1.txt, YARN-2026-v2.txt
>
>
> Problem 1 - When using hierarchical queues in the fair scheduler, there are
> scenarios where a leaf queue with the smallest fair share can take the
> majority of the cluster and starve a sibling parent queue that has a greater
> weight/fair share, and preemption doesn't kick in to reclaim resources.
> The root cause seems to be that a parent queue's fair share is distributed
> to all of its children irrespective of whether each child is active or
> inactive (no apps running). Preemption based on fair share kicks in only if
> a queue's usage is less than 50% of its fair share and its demand is greater
> than that. When there are many queues under a parent queue (with a high fair
> share), each child queue's fair share becomes really low. As a result, when
> only a few of these child queues have apps running, they reach their *tiny*
> fair share quickly, and preemption doesn't happen even if other
> (non-sibling) leaf queues are hogging the cluster.
> This can be solved by dividing a parent queue's fair share only among its
> active child queues.
> Here is an example describing the problem and the proposed solution:
> root.lowPriorityQueue is a leaf queue with weight 2.
> root.HighPriorityQueue is a parent queue with weight 8.
> root.HighPriorityQueue has 10 child leaf queues:
> root.HighPriorityQueue.childQ(1..10)
> With the above config, root.HighPriorityQueue gets an 80% fair share, and
> each of its ten child queues gets an 8% fair share. Preemption would kick in
> for a child queue only if its usage is below 4% (0.5*8=4).
> Let's say that at the moment no apps are running in any of
> root.HighPriorityQueue.childQ(1..10), and a few apps running in
> root.lowPriorityQueue are taking up 95% of the cluster.
> Up to this point, the behavior of FS is correct.
> Now, let's say root.HighPriorityQueue.childQ1 gets a big job that requires
> 30% of the cluster. It would get only the 5% available in the cluster, and
> preemption wouldn't kick in since it is above 4% (half its fair share). This
> is bad considering childQ1 is under a high-priority parent queue that has an
> *80% fair share*.
> Until root.lowPriorityQueue starts relinquishing containers, we would see
> the following allocation on the scheduler page:
> *root.lowPriorityQueue = 95%*
> *root.HighPriorityQueue.childQ1 = 5%*
> This can be solved by distributing a parent's fair share only among its
> active queues.
> In the example above, since childQ1 is the only active queue under
> root.HighPriorityQueue, it would get all of its parent's fair share, i.e.
> 80%.
> This would cause preemption to reclaim the 30% needed by childQ1 from
> root.lowPriorityQueue after fairSharePreemptionTimeout seconds.
> Problem 2 - Also note that a similar situation can happen between
> root.HighPriorityQueue.childQ1 and root.HighPriorityQueue.childQ2 if childQ2
> hogs the cluster. childQ2 can take up 95% of the cluster, and childQ1 would
> be stuck at 5% until childQ2 starts relinquishing containers. We would like
> each of childQ1 and childQ2 to get half of root.HighPriorityQueue's fair
> share, i.e. 40%, which would ensure that childQ1 gets up to 40% of
> resources, if needed, through preemption.
--
This message was sent by Atlassian JIRA
(v6.2#6252)