[ https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16391802#comment-16391802 ]
Eric Payne commented on YARN-4606:
----------------------------------
[[email protected]], thank you for the patch. The overall approach looks
fine, but I have a couple of concerns.
- The way resources are assigned to schedulable applications has changed.
With this patch, in the following use case, resources are not assigned to the
second app when they should be. I have not analyzed the behavior closely enough
to debug the issue, but I want to document it here:
-- Queue1 total resources: 40
-- Queue1 Max Application Master Resources: 2
-- Container sizes are all 1 resource
|*User Name*|*Application ID*|*Used AM Resources*|*Total Used Resources*|*Pending Resources*|
|User1|App1|1|39|20|
|User2|App2|0|0|1 (waiting for AM)|
-- In this scenario, User2 wants to start App2, but User1 is consuming all of
the queue's resources with App1. When App1 releases a resource, however, it is
not given to App2. Instead, the resource is given back to App1, which brings
App1's Pending value down to 19. This is incorrect behavior: Queue1's AM limit
is 2 and only 1 AM resource is in use, so there is room for App2's AM.
- I think the {{TestRMHA}} unit test needs to be updated for this patch. It
currently fails with:
{code:java}
TestRMHA
TestRMHA.testFailoverAndTransitions:219->verifyClusterMetrics:754 Incorrect value for metric activeApplications expected:<1> but was:<0>
TestRMHA.testFailoverClearsRMContext:550->verifyClusterMetrics:754 Incorrect value for metric activeApplications expected:<1> but was:<0>
{code}
- A couple of minor things:
-- IIUC, the value stored in {{activeUsersOfPendingApps}} represents the
number of users that do not have any active applications. Is that correct? If
so, I think it would be clearer if it were called
{{usersWithOnlyPendingApps}}.
-- In {{AbstractUsersManager}} and {{ActiveUsersManager}}, *atleast* should be
*at least*.
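The AM-limit arithmetic behind the first concern can be sketched as follows. This is a minimal, hypothetical illustration, not CapacityScheduler code: the class and method names ({{AmLimitCheck}}, {{canActivate}}) are made up, and it assumes activation only requires the used AM resources plus the new AM request to fit under the queue's AM limit.

```java
// Hypothetical sketch of the AM-limit check from the Queue1 scenario.
// Names are illustrative only; this is not the CapacityScheduler API.
public class AmLimitCheck {

    /**
     * Returns true if starting an AM of size amRequest keeps the queue's
     * total AM usage within amLimit (assumed activation rule).
     */
    static boolean canActivate(int usedAmResources, int amRequest, int amLimit) {
        return usedAmResources + amRequest <= amLimit;
    }

    public static void main(String[] args) {
        int queueAmLimit = 2;   // Queue1 Max Application Master Resources
        int usedAm = 1;         // App1's AM is the only AM running
        int app2AmRequest = 1;  // App2's AM container (all containers are size 1)

        boolean activatable = canActivate(usedAm, app2AmRequest, queueAmLimit);
        // prints: App2 activatable: true
        System.out.println("App2 activatable: " + activatable);
    }
}
```

Under that assumption App2's AM fits (1 + 1 <= 2), which is why the resource released by App1 would be expected to go to App2 rather than back to App1.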
> CapacityScheduler: applications could get starved because computation of
> #activeUsers considers pending apps
> -------------------------------------------------------------------------------------------------------------
>
> Key: YARN-4606
> URL: https://issues.apache.org/jira/browse/YARN-4606
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacity scheduler, capacityscheduler
> Affects Versions: 2.8.0, 2.7.1
> Reporter: Karam Singh
> Assignee: Wangda Tan
> Priority: Critical
> Attachments: YARN-4606.1.poc.patch, YARN-4606.POC.patch
>
>
> Currently, if all applications belonging to the same user in a LeafQueue are
> pending (caused by max-am-percent, etc.), ActiveUsersManager still considers
> that user an active user. This could lead to starvation of active
> applications, for example:
> - App1 (belongs to user1) and App2 (belongs to user2) are active; App3
> (belongs to user3) and App4 (belongs to user4) are pending.
> - ActiveUsersManager returns #active-users=4.
> - However, only two users (user1/user2) are able to allocate new
> resources, so the computed user-limit-resource could be lower than expected.
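The effect described above can be sketched with a simplified user-limit formula. This is a hypothetical illustration, not the actual ActiveUsersManager or LeafQueue code: it assumes the per-user limit is roughly ceil(queue capacity / number of active users) and ignores minimum-user-limit-percent and user-limit-factor. The queue capacity of 40 is an arbitrary illustrative value, and all class and method names are made up ({{usersWithOnlyPendingApps}} echoes the name suggested in the comment above).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: compares counting every user with apps against counting
// only users that have at least one active (schedulable) app.
public class ActiveUserCount {

    private final Map<String, Integer> activeApps = new HashMap<>();
    private final Map<String, Integer> pendingApps = new HashMap<>();

    void addActiveApp(String user) { activeApps.merge(user, 1, Integer::sum); }

    void addPendingApp(String user) { pendingApps.merge(user, 1, Integer::sum); }

    /** Users with any app, active or pending (the behavior reported in this issue). */
    int usersWithAnyApp() {
        Set<String> users = new HashSet<>(activeApps.keySet());
        users.addAll(pendingApps.keySet());
        return users.size();
    }

    /** Users that can actually allocate new resources. */
    int usersWithActiveApps() { return activeApps.size(); }

    /** Users whose apps are all pending. */
    int usersWithOnlyPendingApps() {
        int count = 0;
        for (String user : pendingApps.keySet()) {
            if (!activeApps.containsKey(user)) {
                count++;
            }
        }
        return count;
    }

    /** Simplified user limit: ceil(capacity / users), treating 0 users as 1. */
    static int userLimit(int queueCapacity, int users) {
        int n = Math.max(users, 1);
        return (queueCapacity + n - 1) / n;
    }

    public static void main(String[] args) {
        ActiveUserCount queue = new ActiveUserCount();
        queue.addActiveApp("user1");  // App1
        queue.addActiveApp("user2");  // App2
        queue.addPendingApp("user3"); // App3
        queue.addPendingApp("user4"); // App4

        int capacity = 40; // illustrative value only
        System.out.println(userLimit(capacity, queue.usersWithAnyApp()));     // 10
        System.out.println(userLimit(capacity, queue.usersWithActiveApps())); // 20
    }
}
```

In this simplified model, counting the two pending-only users halves the per-user limit for user1 and user2, even though nobody else can use the remaining share.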
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)