[
https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207405#comment-14207405
]
Craig Welch commented on YARN-2637:
-----------------------------------
I think the fix is fairly straightforward. There is an "amResource" property
on the SchedulerApplicationAttempt / FiCaSchedulerApp, but it does not appear
to be populated in the CapacityScheduler case. It should be, and the
information is available in the submission / from the resource requests of the
application. Populate this value, then add a Resource property to LeafQueue
which represents the resources used by active application masters: when an
application starts, add its amResource value to the LeafQueue's active
application master resource value, and when an application ends, subtract it.
Before starting an application, compare the sum of the active application
masters' resources plus the new application's AM resource to the resource
represented by the percentage of cluster resource allowed to be used by AMs in
the queue (this can differ by queue), and if it exceeds that value do not
start the application. The existing trickle-down logic based on the minimum
allocation should be removed. There is also logic limiting how many
applications can be running based on explicit configuration, which should be
retained.
{code}
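// Pseudocode: start the app only if both the AM resource limit and the
// configured application-count limit would still hold afterwards.
if ((queue.activeApplicationMasterResourceTotal +
     readyToStartApplication.applicationMasterResource) <=
    queue.portionOfClusterResourceAllowedForApplicationMaster * clusterResource &&
    runningApplications + 1 <= maxAllowedApplications) {
  queue.startTheApp
}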
{code}
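To make the bookkeeping concrete, here is a minimal standalone sketch of the idea. It is not the actual LeafQueue code: the class AmLimitSketch, its fields, and its methods are hypothetical names, and it tracks memory in MB only rather than a full Resource.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a leaf queue; NOT the real LeafQueue API. */
class AmLimitSketch {
    private final long clusterMemoryMb;          // total cluster memory (assumed unit: MB)
    private final double maxAmResourcePercent;   // share of the cluster AMs in this queue may use
    private final int maxActiveApplications;     // explicit application-count limit, retained
    private final Map<String, Long> activeAmMemoryByApp = new HashMap<>();
    private long activeAmMemoryMb = 0;           // running total of active AM memory

    AmLimitSketch(long clusterMemoryMb, double maxAmResourcePercent,
                  int maxActiveApplications) {
        this.clusterMemoryMb = clusterMemoryMb;
        this.maxAmResourcePercent = maxAmResourcePercent;
        this.maxActiveApplications = maxActiveApplications;
    }

    /** Activates the app only if both limits still hold; returns whether it started. */
    boolean tryActivate(String appId, long amMemoryMb) {
        if (activeAmMemoryByApp.containsKey(appId)) {
            return false; // already active
        }
        long amLimitMb = (long) (clusterMemoryMb * maxAmResourcePercent);
        boolean withinAmLimit = activeAmMemoryMb + amMemoryMb <= amLimitMb;
        boolean withinAppLimit = activeAmMemoryByApp.size() + 1 <= maxActiveApplications;
        if (!withinAmLimit || !withinAppLimit) {
            return false;
        }
        activeAmMemoryByApp.put(appId, amMemoryMb);
        activeAmMemoryMb += amMemoryMb;          // add the AM's resource on start
        return true;
    }

    /** Releases the AM's resource when the application ends. */
    void finish(String appId) {
        Long amMemoryMb = activeAmMemoryByApp.remove(appId);
        if (amMemoryMb != null) {
            activeAmMemoryMb -= amMemoryMb;
        }
    }

    long getActiveAmMemoryMb() {
        return activeAmMemoryMb;
    }
}
```

In this sketch the check uses the AM's actual requested size, so a large AM consumes proportionally more of the AM budget instead of counting as a single minimum-allocation slot.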
> maximum-am-resource-percent could be violated when resource of AM is >
> minimumAllocation
> ----------------------------------------------------------------------------------------
>
> Key: YARN-2637
> URL: https://issues.apache.org/jira/browse/YARN-2637
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.6.0
> Reporter: Wangda Tan
> Priority: Critical
>
> Currently, the number of AMs in a leaf queue is calculated in the following way:
> {code}
> max_am_resource = queue_max_capacity * maximum_am_resource_percent
> #max_am_number = max_am_resource / minimum_allocation
> #max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor
> {code}
> When a new application is submitted to the RM, it checks whether an app can
> be activated in the following way:
> {code}
> for (Iterator<FiCaSchedulerApp> i = pendingApplications.iterator();
>      i.hasNext(); ) {
>   FiCaSchedulerApp application = i.next();
>
>   // Check queue limit
>   if (getNumActiveApplications() >= getMaximumActiveApplications()) {
>     break;
>   }
>
>   // Check user limit
>   User user = getUser(application.getUser());
>   if (user.getActiveApplications() <
>       getMaximumActiveApplicationsPerUser()) {
>     user.activateApplication();
>     activeApplications.add(application);
>     i.remove();
>     LOG.info("Application " + application.getApplicationId() +
>         " from user: " + application.getUser() +
>         " activated in queue: " + getQueueName());
>   }
> }
> {code}
> For example, if a queue has capacity = 1G and max_am_resource_percent = 0.2,
> the maximum resource that AMs can use is 200M. Assuming minimum_allocation = 1M,
> the number of AMs that can be launched is 200. If each AM actually uses 5M
> (> minimum_allocation), all 200 apps can still be activated, and their AMs
> will occupy all of the queue's resources (200 * 5M = 1G) instead of only
> max_am_resource_percent of the queue.
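
The arithmetic in the example above can be checked with a small sketch. The class AmCountLimitExample and its method names are hypothetical, and it assumes 1G = 1000M, as the 200M figure in the description implies.

```java
/** Hypothetical helper reproducing the count-based limit from the description. */
class AmCountLimitExample {

    /** max_am_number = (queue capacity * max AM percent) / minimum_allocation. */
    static long maxAmCount(long queueCapacityMb, double maxAmResourcePercent,
                           long minimumAllocationMb) {
        long maxAmResourceMb = (long) (queueCapacityMb * maxAmResourcePercent);
        return maxAmResourceMb / minimumAllocationMb;
    }

    /** Memory actually consumed if every allowed AM is activated at its real size. */
    static long amMemoryIfAllActivated(long amCount, long amSizeMb) {
        return amCount * amSizeMb;
    }
}
```

With a 1000M queue, a 0.2 AM percent, and a 1M minimum allocation, maxAmCount yields 200 AMs. At 5M per AM, amMemoryIfAllActivated yields 1000M, which is the whole queue rather than the intended 200M.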
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)