Arun Suresh updated YARN-3453:
    Attachment: YARN-3453.5.patch

[~peng.zhang], thanks for the review.

bq. Why not changing all usage of calculator in FairScheduler to policy 
The main reason I left that section of the code using 
{{RESOURCE_CALCULATOR}} is that, at that point, we do not have the associated 
queue information. Without it we cannot tell which policy applies, 
and so cannot pick the correct calculator.

After you pointed it out, though, I gave it some further thought and have 
uploaded a new patch, {{YARN-3453.5.patch}}, in which I do not use a 
calculator at all. Instead, I check whether either mem or vcores is > 0.
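
To illustrate the difference, here is a minimal, self-contained sketch. It does not reproduce the patch: {{Res}}, {{anyToPreemptMemoryOnly}} and {{anyToPreempt}} are hypothetical names made up for this example, and the memory-only check is only an approximation of how a memory-based calculator would decide.

```java
// Hypothetical sketch, not the code in YARN-3453.5.patch.
class PreemptCheckSketch {

    /** Stand-in for a resource vector: memory in MB plus virtual cores. */
    static final class Res {
        final long memoryMb;
        final int vcores;

        Res(long memoryMb, int vcores) {
            this.memoryMb = memoryMb;
            this.vcores = vcores;
        }
    }

    /** Assumption: a memory-based calculator effectively looks at memory alone. */
    static boolean anyToPreemptMemoryOnly(Res toPreempt) {
        return toPreempt.memoryMb > 0;
    }

    /** Calculator-free check: something is owed if either dimension is positive. */
    static boolean anyToPreempt(Res toPreempt) {
        return toPreempt.memoryMb > 0 || toPreempt.vcores > 0;
    }

    public static void main(String[] args) {
        Res cpuOnly = new Res(0, 4); // only vcores need to be preempted
        System.out.println(anyToPreemptMemoryOnly(cpuOnly)); // false
        System.out.println(anyToPreempt(cpuOnly));           // true
    }
}
```

The point of the second check is that it needs neither the queue nor its policy, so it can run where that information is not available.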

Do take a look and let me know if it's fine.

But in any case, as [~ashwinshankar77] had pointed out, we need a proper 
re-look at YARN-2154 to improve pre-emption in general.

> Fair Scheduler : Parts of preemption logic uses DefaultResourceCalculator 
> even in DRF mode causing thrashing
> ------------------------------------------------------------------------------------------------------------
>                 Key: YARN-3453
>                 URL: https://issues.apache.org/jira/browse/YARN-3453
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>    Affects Versions: 2.6.0
>            Reporter: Ashwin Shankar
>            Assignee: Arun Suresh
>         Attachments: YARN-3453.1.patch, YARN-3453.2.patch, YARN-3453.3.patch, 
> YARN-3453.4.patch, YARN-3453.5.patch
> There are two places in preemption code flow where DefaultResourceCalculator 
> is used, even in DRF mode.
> This results in more resources being preempted than needed, and 
> those extra preempted containers aren’t even getting to the “starved” queue 
> since scheduling logic is based on DRF's Calculator.
> Following are the two places :
> 1. {code:title=FSLeafQueue.java|borderStyle=solid}
> private boolean isStarved(Resource share)
> {code}
> A queue shouldn’t be marked as “starved” if the dominant resource usage
> is >=  fair/minshare.
> 2. {code:title=FairScheduler.java|borderStyle=solid}
> protected Resource resToPreempt(FSLeafQueue sched, long curTime)
> {code}
> --------------------------------------------------------------
> One more thing that I believe needs to change in DRF mode: during a 
> preemption round, if preempting a few containers satisfies the needs 
> of a resource type, then we should exit that preemption round, since the 
> containers that we just preempted should bring the dominant resource usage to 
> min/fair share.
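
For readers following the issue description above, the starvation condition it describes can be sketched as follows. This is an illustration under stated assumptions, not the {{FSLeafQueue}} implementation: {{Res}}, {{dominantShare}}, {{isStarvedMemoryOnly}} and {{isStarvedDrf}} are hypothetical names, and the cluster size and queue numbers are invented for the example.

```java
// Hypothetical sketch of a dominant-resource starvation check.
class DrfStarvationSketch {

    /** Stand-in for a resource vector: memory in MB plus virtual cores. */
    static final class Res {
        final long memoryMb;
        final int vcores;

        Res(long memoryMb, int vcores) {
            this.memoryMb = memoryMb;
            this.vcores = vcores;
        }
    }

    /** Largest fraction of the cluster that r takes in any single dimension. */
    static double dominantShare(Res r, Res cluster) {
        double memShare = (double) r.memoryMb / cluster.memoryMb;
        double cpuShare = (double) r.vcores / cluster.vcores;
        return Math.max(memShare, cpuShare);
    }

    /** Memory-only view: ignores vcores entirely. */
    static boolean isStarvedMemoryOnly(Res usage, Res share) {
        return usage.memoryMb < share.memoryMb;
    }

    /** DRF view: starved only if the dominant usage is below the dominant share. */
    static boolean isStarvedDrf(Res usage, Res share, Res cluster) {
        return dominantShare(usage, cluster) < dominantShare(share, cluster);
    }

    public static void main(String[] args) {
        Res cluster = new Res(102_400, 100); // made-up cluster: 100 GB, 100 vcores
        Res usage = new Res(10_240, 50);     // CPU-heavy queue: 10 GB, 50 vcores
        Res share = new Res(40_960, 40);     // fair/min share: 40 GB, 40 vcores

        System.out.println(isStarvedMemoryOnly(usage, share));   // true
        System.out.println(isStarvedDrf(usage, share, cluster)); // false
    }
}
```

In this example the memory-only check flags the queue as starved even though its dominant resource (vcores) is already above its share, which is the kind of mismatch that leads to unnecessary preemption.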
