[
https://issues.apache.org/jira/browse/YARN-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207251#comment-14207251
]
Karthik Kambatla commented on YARN-2604:
----------------------------------------
Looks mostly good. Just want to confirm: when there are no nodes connected to
the RM, the patch sets the max-allocation to the configured value and not zero.
I think this is good; otherwise, all apps would be rejected immediately after
the RM (re)starts. Actually, I wonder if we should add a config to specify
either (a) a particular number of NMs after which this behavior kicks in or (b)
a minimum/floor value for the configurable maximum (min-max-allocation :P).
[~jlowe] - do you think such a config would be useful?
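
To make the behavior under discussion concrete, here is a minimal sketch, not the patch itself. It assumes memory-only resources expressed as plain MB integers (the real schedulers use Resource objects), and the names MaxAllocationSketch, effectiveMaxMb and floorMb are hypothetical. floorMb stands in for option (b), the "min-max-allocation" floor; option (a) is not modeled.

```java
import java.util.List;

/** Hypothetical illustration only; names and types are assumptions, not YARN APIs. */
final class MaxAllocationSketch {

    /**
     * @param configuredMaxMb the configured max-allocation, in MB
     * @param floorMb         hypothetical floor for the effective maximum (option b)
     * @param nodeMemoriesMb  memory of each NM currently connected to the RM, in MB
     * @return the maximum allocation the scheduler would enforce
     */
    static int effectiveMaxMb(int configuredMaxMb, int floorMb, List<Integer> nodeMemoriesMb) {
        if (nodeMemoriesMb.isEmpty()) {
            // No NMs connected (e.g. right after an RM restart): use the
            // configured value rather than zero, so apps are not rejected.
            return configuredMaxMb;
        }
        int largestNodeMb = 0;
        for (int nodeMb : nodeMemoriesMb) {
            largestNodeMb = Math.max(largestNodeMb, nodeMb);
        }
        // Cap the configured maximum at the largest node.
        int capped = Math.min(configuredMaxMb, largestNodeMb);
        // Apply the floor, but never exceed the configured maximum.
        return Math.max(capped, Math.min(floorMb, configuredMaxMb));
    }
}
```

With a configured maximum of 8192 MB and a floor of 1024 MB, an empty cluster keeps 8192, a cluster whose largest node has 4096 MB is capped at 4096, and a cluster with only a 512 MB node is held at the 1024 MB floor.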
A few comments on the patch itself:
# We should have tests similar to TestFifoScheduler#testMaximumAllocation for
Capacity and FairSchedulers.
# Nit: Rename AbstractYarnScheduler#realMaximumAllocation to
configuredMaximumAllocation? And, in all the schedulers, we should set
configuredMaximumAllocation first and then set maximumAllocation to that. Also,
given both these fields are in AbstractYarnScheduler, I wouldn't refer to them
using {{this.}} in the sub-classes.
# Nit: For locks and unlocks, we follow this convention in YARN. Mind
updating accordingly?
{code}
lock.lock();
try {
  // do your thing
} finally {
  lock.unlock();
}
{code}
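
Putting the last two nits together, a sketch of how the fields and the locking might look. Assumptions: the class name SchedulerSketch and the method updateForLargestNode are hypothetical, values are plain MB integers rather than Resource objects, and a ReentrantReadWriteLock is used only for illustration (the patch may use a different lock).

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical illustration only; not the AbstractYarnScheduler code. */
final class SchedulerSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final int configuredMaximumAllocation;
    private int maximumAllocation;

    SchedulerSketch(int configuredMaxMb) {
        // Set the configured value first, then derive the effective one from it.
        configuredMaximumAllocation = configuredMaxMb;
        maximumAllocation = configuredMaximumAllocation;
    }

    /** Recompute the effective maximum when the largest node changes. */
    void updateForLargestNode(int largestNodeMb) {
        lock.writeLock().lock();
        try {
            maximumAllocation = Math.min(configuredMaximumAllocation, largestNodeMb);
        } finally {
            // Always released, even if the body throws.
            lock.writeLock().unlock();
        }
    }

    int getMaximumAllocation() {
        lock.readLock().lock();
        try {
            return maximumAllocation;
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

Acquiring the lock before the try block means the finally clause only runs once the lock is actually held, so unlock() is never called on a lock that was not acquired.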
> Scheduler should consider max-allocation-* in conjunction with the largest
> node
> -------------------------------------------------------------------------------
>
> Key: YARN-2604
> URL: https://issues.apache.org/jira/browse/YARN-2604
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: scheduler
> Affects Versions: 2.5.1
> Reporter: Karthik Kambatla
> Assignee: Robert Kanter
> Attachments: YARN-2604.patch, YARN-2604.patch, YARN-2604.patch
>
>
> If the scheduler max-allocation-* values are larger than the resources
> available on the largest node in the cluster, an application requesting
> resources between the two values will be accepted by the scheduler but the
> requests will never be satisfied. The app essentially hangs forever.