[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

Eric Payne (JIRA) Mon, 02 Apr 2018 08:38:14 -0700

    [ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16422631#comment-16422631
 ]


Eric Payne commented on YARN-4606:
----------------------------------

{quote}AM container is being allocated to App2 only after App1 completion when 
cluster is running full.
{quote}
[[email protected]], in the current implementation (that is, without the 
patch), a few things can affect whether or not App2 will be given the freed 
container:
 - In the running queue, if {{Configured Minimum User Limit Percent}} is set to 
100%, only one user can run in the queue at a time. If this is so, then the 
Capacity Scheduler will only assign new containers to App1 (owned by User1). 
However, if {{Configured Minimum User Limit Percent}} is 50% or less, the 
Capacity Scheduler will assign new containers to App2 (owned by User2) until 
they both have 50% of the queue or one stops asking for new resources.
 - In the running queue, if {{Used Application Master Resources}} equals {{Max 
Application Master Resources}}, the Capacity Scheduler will not assign an AM to 
App2.
 - The same thing happens if {{Num Schedulable Applications}} is equal to {{Max 
Applications}}, but that's probably not the case here.
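
The three checks above can be sketched roughly as follows. This is a simplified illustration, not the Capacity Scheduler's actual code: the class, enum, and method names are hypothetical, the user check approximates the number of concurrent users as 100 / {{Configured Minimum User Limit Percent}}, and the AM check simply compares used against max.

```java
/** Hypothetical sketch of the three checks listed above; not a Hadoop class. */
final class App2AdmissionSketch {

    /** First condition (in the order listed above) that would block App2. */
    enum Blocker { NONE, USER_LIMIT, AM_RESOURCE_LIMIT, MAX_APPLICATIONS }

    /**
     * @param minUserLimitPercent queue's Configured Minimum User Limit Percent (1..100)
     * @param otherRunningUsers   users already running in the queue (User1 in this thread)
     * @param usedAmMb            Used Application Master Resources, in MB
     * @param maxAmMb             Max Application Master Resources, in MB
     * @param numSchedulableApps  Num Schedulable Applications
     * @param maxApps             Max Applications
     */
    static Blocker firstBlocker(int minUserLimitPercent, int otherRunningUsers,
                                long usedAmMb, long maxAmMb,
                                int numSchedulableApps, int maxApps) {
        if (minUserLimitPercent < 1 || minUserLimitPercent > 100) {
            throw new IllegalArgumentException("minUserLimitPercent must be 1..100");
        }
        // Assumption: 100% means one user at a time, 50% means two, and so on.
        int maxConcurrentUsers = 100 / minUserLimitPercent;
        if (otherRunningUsers >= maxConcurrentUsers) {
            return Blocker.USER_LIMIT;
        }
        // Assumption: the AM limit is treated as reached once used >= max.
        if (usedAmMb >= maxAmMb) {
            return Blocker.AM_RESOURCE_LIMIT;
        }
        if (numSchedulableApps >= maxApps) {
            return Blocker.MAX_APPLICATIONS;
        }
        return Blocker.NONE;
    }
}
```

With User1 already running and a 100% limit, {{firstBlocker(100, 1, 2048, 10240, 1, 100)}} reports {{USER_LIMIT}}; the same call with 50 reports {{NONE}}. The resource figures are made-up values for illustration.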

I suspect it may be the first case. Please check to make sure your queue 
configuration is set to allow multiple running users in the queue.
{quote}{quote}However, I'm not sure of the best way to get the values for a 
queue's Used AM Resources and Max AM Resources from this context. Those may be 
capacity scheduler-specific values.
{quote}
Yes. But I do see some equivalents available in FSQueueMetrics.
{quote}
The concern is that even though they may exist in both FSQueueMetrics and 
CSQueueMetrics, they are not accessible at the abstract {{QueueMetrics}} layer 
because each class exposes them through different accessors. It should be 
possible to add a new abstract accessor in {{QueueMetrics}} that both FS and CS 
QueueMetrics implement.
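
A minimal sketch of what such an abstract accessor could look like. All class, field, and method names below are hypothetical stand-ins, not the real Hadoop classes or their existing accessors; the point is only that callers holding the base type can read both values without knowing which scheduler is in use.

```java
/** Hypothetical stand-in for the abstract QueueMetrics layer. */
abstract class QueueMetricsSketch {
    /** Scheduler-neutral accessors; each scheduler maps them to its own metric. */
    abstract long getUsedAmResourceMb();
    abstract long getMaxAmResourceMb();

    /** Usable from scheduler-agnostic code because it relies only on the accessors. */
    boolean isAmLimitReached() {
        return getUsedAmResourceMb() >= getMaxAmResourceMb();
    }
}

/** Hypothetical Capacity Scheduler flavour, backed by its own field names. */
final class CsQueueMetricsSketch extends QueueMetricsSketch {
    private final long csUsedAmMb;
    private final long csAmLimitMb;

    CsQueueMetricsSketch(long csUsedAmMb, long csAmLimitMb) {
        this.csUsedAmMb = csUsedAmMb;
        this.csAmLimitMb = csAmLimitMb;
    }

    @Override long getUsedAmResourceMb() { return csUsedAmMb; }
    @Override long getMaxAmResourceMb() { return csAmLimitMb; }
}

/** Hypothetical Fair Scheduler flavour, backed by differently named fields. */
final class FsQueueMetricsSketch extends QueueMetricsSketch {
    private final long fsAmUsageMb;
    private final long fsMaxAmShareMb;

    FsQueueMetricsSketch(long fsAmUsageMb, long fsMaxAmShareMb) {
        this.fsAmUsageMb = fsAmUsageMb;
        this.fsMaxAmShareMb = fsMaxAmShareMb;
    }

    @Override long getUsedAmResourceMb() { return fsAmUsageMb; }
    @Override long getMaxAmResourceMb() { return fsMaxAmShareMb; }
}
```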

> CapacityScheduler: applications could get starved because computation of 
> #activeUsers considers pending apps 
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-4606
>                 URL: https://issues.apache.org/jira/browse/YARN-4606
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler, capacityscheduler
>    Affects Versions: 2.8.0, 2.7.1
>            Reporter: Karam Singh
>            Assignee: Wangda Tan
>            Priority: Critical
>         Attachments: YARN-4606.1.poc.patch, YARN-4606.POC.patch
>
>
> Currently, if all applications belonging to the same user in a LeafQueue are 
> pending (caused by max-am-percent, etc.), ActiveUsersManager still considers 
> that user an active user. This could lead to starvation of active 
> applications, for example:
> - App1 (belongs to user1) and app2 (belongs to user2) are active; app3 
> (belongs to user3) and app4 (belongs to user4) are pending
> - ActiveUsersManager returns #active-users=4
> - However, only two users (user1/user2) are able to allocate new resources, 
> so the computed user-limit-resource could be lower than expected.
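
To make the arithmetic in the description concrete, here is a deliberately simplified sketch. It assumes the per-user limit is the larger of an equal share among active users and the minimum-user-limit-percent share; the real computation involves more factors. The class and method names, the 100 GB queue size, and the 25% setting are all hypothetical.

```java
/** Hypothetical, simplified model of the per-user limit; not Hadoop code. */
final class UserLimitSketch {

    /**
     * @param queueCapacityMb     queue capacity in MB
     * @param activeUsers         number of users counted as active (must be >= 1)
     * @param minUserLimitPercent guaranteed minimum share per user, in percent
     * @return simplified per-user limit in MB
     */
    static long userLimitMb(long queueCapacityMb, int activeUsers, int minUserLimitPercent) {
        if (activeUsers < 1) {
            throw new IllegalArgumentException("activeUsers must be >= 1");
        }
        long equalShare = queueCapacityMb / activeUsers;
        long guaranteedMinimum = queueCapacityMb * minUserLimitPercent / 100;
        return Math.max(equalShare, guaranteedMinimum);
    }

    public static void main(String[] args) {
        long queueMb = 102_400L; // assumed 100 GB queue
        // Pending-only users counted as active: four users share the queue.
        System.out.println(userLimitMb(queueMb, 4, 25));
        // Only users that can actually allocate are counted: two users.
        System.out.println(userLimitMb(queueMb, 2, 25));
    }
}
```

Under these assumptions, counting all four users yields a 25600 MB limit for each of user1 and user2, while counting only the two users that can allocate yields 51200 MB each, so half the queue would otherwise sit unused.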



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

