[
https://issues.apache.org/jira/browse/YARN-5774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15651964#comment-15651964
]
Daniel Templeton commented on YARN-5774:
----------------------------------------
bq. So if users misconfigure the increment resource in fair scheduler, a
detailed error message will show up.
That's great for fair scheduler, but what about CS and FIFO? I'm particularly
worried about those because an increment of 0 was not previously treated as
invalid. What's the intent of throwing the exception in {{normalize()}}? You
throw an exception when you want to halt the execution flow and allow some
out-of-sequence remedial code to run. In this case, no one is expecting to see
this exception, so no one will catch it, and there's no action that needs to be
taken in the CS and FIFO cases. The net result is that things will fail in
indirect ways. Since it's not a failure for CS and FIFO, I don't think you
should throw the exception. In the case of FS, as you point out, the only
reason to hit the exception is misuse of the resource calculator.
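
To make the no-exception alternative concrete, here is a minimal sketch of a normalization step that treats a zero increment as "no rounding" instead of throwing. It is only an illustration: the class {{IncrementTolerantNormalizer}} and its methods are hypothetical names and are not taken from the patch or from the YARN resource calculators.

```java
// Hypothetical sketch, not the YARN-5774 patch: all names here are assumptions.
final class IncrementTolerantNormalizer {
    private IncrementTolerantNormalizer() {}

    /** Rounds value up to the next multiple of step; a step <= 0 means "no rounding". */
    static long roundUp(long value, long step) {
        if (step <= 0) {
            // A zero increment is not an error for CS and FIFO, so fall through
            // instead of throwing an exception nobody is prepared to catch.
            return value;
        }
        return ((value + step - 1) / step) * step;
    }

    /** Raises the request to the minimum, rounds up to the increment, caps at the maximum. */
    static long normalizeMemory(long requestedMb, long minMb, long maxMb, long incrementMb) {
        long atLeastMin = Math.max(requestedMb, minMb);
        return Math.min(roundUp(atLeastMin, incrementMb), maxMb);
    }

    public static void main(String[] args) {
        System.out.println(normalizeMemory(1000, 0, 8192, 0));   // zero increment: left unrounded
        System.out.println(normalizeMemory(1000, 0, 8192, 512)); // rounded up to a multiple of 512
    }
}
```

With this shape, a misconfigured increment in FS would still need to be rejected where the configuration is read, which is where a detailed error message is actually useful.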
> MR Job stuck in ACCEPTED status without any progress in Fair Scheduler if set
> yarn.scheduler.minimum-allocation-mb to 0.
> ------------------------------------------------------------------------------------------------------------------------
>
> Key: YARN-5774
> URL: https://issues.apache.org/jira/browse/YARN-5774
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 3.0.0-alpha1
> Reporter: Yufei Gu
> Assignee: Yufei Gu
> Labels: oct16-easy
> Attachments: YARN-5774.001.patch, YARN-5774.002.patch,
> YARN-5774.003.patch, YARN-5774.004.patch
>
>
> An MR job gets stuck in ACCEPTED status without any progress in Fair
> Scheduler because there is no resource request for the AM. This happens when
> you set {{yarn.scheduler.minimum-allocation-mb}} to zero.
> The problem is in code shared by Capacity Scheduler and Fair
> Scheduler. {{scheduler.increment-allocation-mb}} is a concept in FS, but not
> in CS. Because CS has no increment, the common code in class RMAppManager
> passes {{yarn.scheduler.minimum-allocation-mb}} as the increment when it
> normalizes the resource requests.
> {code}
> // The minimum capability is passed twice; the last argument should be
> // the scheduler's incrementResource.
> SchedulerUtils.normalizeRequest(amReq, scheduler.getResourceCalculator(),
>     scheduler.getClusterResource(),
>     scheduler.getMinimumResourceCapability(),
>     scheduler.getMaximumResourceCapability(),
>     scheduler.getMinimumResourceCapability());
> {code}
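
The arithmetic behind the description above can be sketched as follows. This is an illustration under stated assumptions, not the actual YARN code: {{ZeroIncrementDemo}} is a hypothetical class, and the sketch assumes a round-up helper that returns 0 when the step is 0. Under that assumption, passing a minimum of 0 as the increment normalizes any memory request to 0.

```java
// Hypothetical sketch of the failure mode; names and helper behavior are assumptions.
final class ZeroIncrementDemo {
    private ZeroIncrementDemo() {}

    /** ceil(a / b); assumed to return 0 rather than fail when b == 0. */
    static long divideAndCeil(long a, long b) {
        if (b == 0) {
            return 0;
        }
        return (a + (b - 1)) / b;
    }

    /** Rounds a up to the next multiple of b. */
    static long roundUp(long a, long b) {
        return divideAndCeil(a, b) * b;
    }

    /** Raise to min, round up to increment, cap at max. */
    static long normalize(long requestedMb, long minMb, long maxMb, long incrementMb) {
        return Math.min(roundUp(Math.max(requestedMb, minMb), incrementMb), maxMb);
    }

    public static void main(String[] args) {
        long minMb = 0; // yarn.scheduler.minimum-allocation-mb set to zero
        // Minimum reused as the increment: the request collapses to 0 MB.
        System.out.println(normalize(1024, minMb, 8192, minMb));
        // A separate, non-zero increment keeps the request intact.
        System.out.println(normalize(1024, minMb, 8192, 512));
    }
}
```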