Arun Suresh commented on YARN-3453:


bq. After reviewing the above comments, I am reminded that the case (0 GB, non-zero cores), like (non-zero GB, 0 cores), will also cause more resources to be preempted than needed.
I agree... But I feel that instead of fixing it here, if we can get a comprehensive 
fix as requested by YARN-2154 ([~kasha] and I had an offline discussion 
about how we should actually break out of the preemption loop once incoming 
requests are satisfied), then we won't even hit this case.
Furthermore, this JIRA fixes the {{isStarved()}} method in the Queue 
correctly, so at the very least the {{toPreempt}} resource object would be 
smaller (and thus would implicitly result in fewer preemptions).
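To make the {{isStarved()}} point concrete, here is a rough, self-contained sketch of a DRF-aware starvation check. The class and method names are made up for illustration; this is not the actual {{FSLeafQueue}} code, which works on Hadoop's {{Resource}}/{{DominantResourceCalculator}} types. The idea: a queue counts as starved only if its *dominant* usage share is below the dominant share of its fair/min share allocation.

```java
// Hypothetical, self-contained sketch of a DRF-aware starvation check.
// Real Hadoop code uses Resource and DominantResourceCalculator; the
// names and signatures here are illustrative only.
public class DrfStarvationSketch {

    // Dominant share: the max over resource types of (usage / cluster capacity).
    static double dominantShare(long memUsed, long vcoresUsed,
                                long clusterMem, long clusterVcores) {
        return Math.max((double) memUsed / clusterMem,
                        (double) vcoresUsed / clusterVcores);
    }

    // A queue is starved only when the dominant share of its usage is
    // below the dominant share of its fair/min share.
    static boolean isStarved(long memUsed, long vcoresUsed,
                             long memShare, long vcoresShare,
                             long clusterMem, long clusterVcores) {
        double usage = dominantShare(memUsed, vcoresUsed, clusterMem, clusterVcores);
        double share = dominantShare(memShare, vcoresShare, clusterMem, clusterVcores);
        return usage < share;
    }
}
```

For example, on a 100 GB / 10-vcore cluster, a queue using (20 GB, 1 vcore) against a (20 GB, 2 vcore) share is *not* starved, since its dominant resource (memory, at 20%) already meets the share; a memory-only calculator could disagree and trigger needless preemption on the vcore side.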

I also agree that finding the ratio of demand is definitely useful. But again, let us 
grab all the low-hanging fruit first. I propose we create a separate JIRA for that.
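To illustrate the break-out-of-the-preemption-loop idea from the offline discussion, here is a toy sketch (again, hypothetical names, not the actual {{FairScheduler}} code): once the outstanding deficits of the starved queue are covered, the round stops instead of preempting every candidate container.

```java
import java.util.List;

// Hypothetical sketch of exiting a preemption round early once the
// starved queue's resource deficits are satisfied. The Container type
// and deficit bookkeeping are illustrative, not Hadoop's API.
public class PreemptionRoundSketch {

    static class Container {
        final long mem, vcores;
        Container(long mem, long vcores) { this.mem = mem; this.vcores = vcores; }
    }

    // Returns how many candidate containers would actually be preempted
    // before both the memory and vcore deficits are covered.
    static int containersToPreempt(List<Container> candidates,
                                   long memDeficit, long vcoresDeficit) {
        int count = 0;
        for (Container c : candidates) {
            if (memDeficit <= 0 && vcoresDeficit <= 0) {
                break;  // incoming requests satisfied: stop preempting
            }
            memDeficit -= c.mem;
            vcoresDeficit -= c.vcores;
            count++;
        }
        return count;
    }
}
```

So with three (4 GB, 1 vcore) candidates and a (6 GB, 1 vcore) deficit, only two containers are preempted; the third is spared because the deficit is already covered.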

> Fair Scheduler : Parts of preemption logic uses DefaultResourceCalculator 
> even in DRF mode causing thrashing
> ------------------------------------------------------------------------------------------------------------
>                 Key: YARN-3453
>                 URL: https://issues.apache.org/jira/browse/YARN-3453
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>    Affects Versions: 2.6.0
>            Reporter: Ashwin Shankar
>            Assignee: Arun Suresh
>         Attachments: YARN-3453.1.patch, YARN-3453.2.patch, YARN-3453.3.patch, 
> YARN-3453.4.patch, YARN-3453.5.patch
> There are two places in preemption code flow where DefaultResourceCalculator 
> is used, even in DRF mode.
> This basically results in more resources getting preempted than needed, and 
> those extra preempted containers don't even reach the "starved" queue, 
> since the scheduling logic is based on DRF's Calculator.
> Following are the two places :
> 1. {code:title=FSLeafQueue.java|borderStyle=solid}
> private boolean isStarved(Resource share)
> {code}
> A queue shouldn’t be marked as “starved” if the dominant resource usage
> is >=  fair/minshare.
> 2. {code:title=FairScheduler.java|borderStyle=solid}
> protected Resource resToPreempt(FSLeafQueue sched, long curTime)
> {code}
> --------------------------------------------------------------
> One more thing that I believe needs to change in DRF mode is : during a 
> preemption round,if preempting a few containers results in satisfying needs 
> of a resource type, then we should exit that preemption round, since the 
> containers that we just preempted should bring the dominant resource usage to 
> min/fair share.

This message was sent by Atlassian JIRA
