[ 
https://issues.apache.org/jira/browse/YARN-5774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15652426#comment-15652426
 ] 

Yufei Gu commented on YARN-5774:
--------------------------------

Thanks [~templedf] for the review. 
In CS, the minimum share also serves as the increment share, and it cannot be 
zero, which is guaranteed by the CS sanity check. FIFO doesn't have a sanity 
check, I guess because nobody cares about it. We can definitely add a sanity 
check for the FIFO scheduler in this JIRA or a follow-up JIRA. So that's fine 
for CS, FIFO and FS.
The really tricky part is in the common parts of the scheduler (or RM). People 
who write code in the common parts might not even notice there is an increment 
share config, because CS and FIFO don't have it and only FS does. That is how 
the issue happened in the first place.
This patch lets {{normalize()}} throw a runtime exception if the increment is 
0, which callers don't need to catch or handle. It will fail the RM when it 
happens. The main reason is that we should consider a 0 increment an invalid 
configuration, according to the offline discussion with [~kasha].
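
To illustrate the idea, here is a minimal sketch, not the actual patch code. 
The class {{NormalizeSketch}}, its method names, and the use of plain {{long}} 
megabyte values are assumptions made up for this example. It only shows 
rounding a request up to a multiple of the increment and rejecting a 0 
increment with an unchecked exception.

```java
// Hypothetical, simplified sketch: memory is a plain long (MB) rather than a
// YARN Resource object, and none of these names come from the patch.
final class NormalizeSketch {

    // Round value up to the nearest multiple of step (step must be > 0).
    static long roundUp(long value, long step) {
        return ((value + step - 1) / step) * step;
    }

    // Clamp the request to [minimum, maximum] and align it to the increment.
    static long normalize(long requested, long minimum, long maximum,
                          long increment) {
        if (increment == 0) {
            // Unchecked exception: callers are not forced to catch it, so a
            // misconfiguration surfaces as a hard failure.
            throw new IllegalArgumentException(
                "Increment must not be zero; treat it as invalid configuration");
        }
        long atLeastMin = Math.max(requested, minimum);
        return Math.min(roundUp(atLeastMin, increment), maximum);
    }

    public static void main(String[] args) {
        // A 1500 MB request with a 512 MB increment is rounded up to 1536 MB.
        System.out.println(normalize(1500, 1024, 8192, 512));

        // Passing a 0 minimum as the increment is rejected outright.
        try {
            normalize(1500, 0, 8192, 0);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Failing fast here means a 0 increment shows up as an obvious error instead of 
a request that quietly never gets scheduled.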


> MR Job stuck in ACCEPTED status without any progress in Fair Scheduler if set 
> yarn.scheduler.minimum-allocation-mb to 0.
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-5774
>                 URL: https://issues.apache.org/jira/browse/YARN-5774
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Yufei Gu
>            Assignee: Yufei Gu
>              Labels: oct16-easy
>         Attachments: YARN-5774.001.patch, YARN-5774.002.patch, 
> YARN-5774.003.patch, YARN-5774.004.patch
>
>
> An MR job gets stuck in ACCEPTED status without any progress in Fair 
> Scheduler because there is no resource request for the AM. This happens when 
> you configure {{yarn.scheduler.minimum-allocation-mb}} to zero.
> The problem is in the code used by both Capacity Scheduler and Fair 
> Scheduler. {{scheduler.increment-allocation-mb}} is a concept in FS, but not 
> CS. So when the common code in class RMAppManager normalizes the resource 
> requests, it passes {{yarn.scheduler.minimum-allocation-mb}} as the 
> increment, because CS has no separate increment setting.
> {code}
>      SchedulerUtils.normalizeRequest(amReq, scheduler.getResourceCalculator(),
>           scheduler.getClusterResource(),
>           scheduler.getMinimumResourceCapability(),
>           scheduler.getMaximumResourceCapability(),
>           // incrementResource should be passed here instead:
>           scheduler.getMinimumResourceCapability());
> {code}



