[
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16418782#comment-16418782
]
Manikandan R commented on YARN-4606:
------------------------------------
[~eepayne] Thanks for your detailed explanation. Sorry for the delay.
{quote}In this scenario, User2 wants to start App2 but User1 is consuming all
resources in the queue with App1. When App1 releases a resource, however, it is
not given to App2. The resource is given back to App1, which brings its Pending
value down to 19. This is incorrect behavior since Queue1 has room for 2
AMs.{quote}
I tried to reproduce this behaviour with the current code (without my patch)
and found that, when the cluster is running full, the AM container is allocated
to App2 only after App1 completes.
In my single-node pseudo-distributed setup, the total cluster resources are
8192 MB and 8 vcores, with only one queue (default) at 100% capacity. The max AM
resource is 2048 MB, 2 vcores, as the max AM resource percent is 0.2. I
submitted an app (say App1) through DS (DistributedShell) with num_containers
set to 20. While App1 was running with around 15 pending containers, I
submitted a second app (say App2) with num_containers set to 10. I can see that
the AM container for App2 is allocated only after App1 completes, which is not
in line with your earlier comments. Am I missing anything here?
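For reference, the 2048 MB / 2 vcores figure in the setup above is consistent
with 20% of the cluster resources being rounded up to a multiple of the minimum
allocation. A minimal sketch of that arithmetic, assuming a minimum allocation
of 1024 MB and 1 vcore (not stated above) and using hypothetical helper names
rather than actual scheduler code:

```python
import math


def normalize_up(value, step):
    """Round value up to the nearest multiple of step."""
    return math.ceil(value / step) * step


def max_am_resource(cluster_mb, cluster_vcores, am_percent,
                    min_mb=1024, min_vcores=1):
    """Approximate a queue's max AM resource for a queue at 100% capacity.

    min_mb and min_vcores are assumed minimum-allocation values; they are
    not taken from the setup described above.
    """
    raw_mb = cluster_mb * am_percent          # 8192 * 0.2 = 1638.4
    raw_vcores = cluster_vcores * am_percent  # 8 * 0.2 = 1.6
    return normalize_up(raw_mb, min_mb), normalize_up(raw_vcores, min_vcores)


print(max_am_resource(8192, 8, 0.2))  # (2048, 2)
```

Under these assumptions the queue has room for more than one AM only if each
AM container fits within that 2048 MB / 2 vcores budget.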
{quote}However, I'm not sure of the best way to get the values for a queue's
Used AM Resources and Max AM Resources from this context. Those may be capacity
scheduler-specific values.
{quote}
Yes. But I do see some equivalents available in {{FSQueueMetrics}}.
> CapacityScheduler: applications could get starved because computation of
> #activeUsers considers pending apps
> -------------------------------------------------------------------------------------------------------------
>
> Key: YARN-4606
> URL: https://issues.apache.org/jira/browse/YARN-4606
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacity scheduler, capacityscheduler
> Affects Versions: 2.8.0, 2.7.1
> Reporter: Karam Singh
> Assignee: Wangda Tan
> Priority: Critical
> Attachments: YARN-4606.1.poc.patch, YARN-4606.POC.patch
>
>
> Currently, if all applications belonging to a user in a LeafQueue are pending
> (because of max-am-percent, etc.), ActiveUsersManager still considers that
> user an active user. This could lead to starvation of active applications, for
> example:
> - App1 (belongs to user1) and app2 (belongs to user2) are active; app3 (belongs
> to user3) and app4 (belongs to user4) are pending
> - ActiveUsersManager returns #active-users=4
> - However, only two users (user1/user2) are able to allocate new
> resources, so the computed user-limit-resource could be lower than expected.
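
To illustrate the issue description above, here is a simplified sketch of how
the active-user count feeds into the per-user limit. The formula is only an
approximation of the capacity scheduler's user-limit computation, and the queue
size (100 units) and minimum-user-limit-percent (25) are assumed values chosen
for illustration:

```python
def user_limit(queue_resource, active_users, min_user_limit_percent):
    """Simplified per-user limit: the larger of an equal share among
    active users and the configured minimum percentage of the queue."""
    equal_share = queue_resource / active_users
    guaranteed = queue_resource * min_user_limit_percent / 100
    return max(equal_share, guaranteed)


# Hypothetical queue of 100 resource units, minimum-user-limit-percent = 25.
counted = user_limit(100, active_users=4, min_user_limit_percent=25)
runnable = user_limit(100, active_users=2, min_user_limit_percent=25)

print(counted)   # 25.0: limit when pending-only users are counted as active
print(runnable)  # 50.0: limit when only user1 and user2 are counted
```

In this sketch, counting user3 and user4 halves the limit for the two users
that can actually allocate resources.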