[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16384645#comment-16384645
 ] 

Manikandan R commented on YARN-4606:
------------------------------------

[~eepayne] [~sunilg] Thanks for your inputs. Sorry for the delay.

I have attached a POC patch to confirm that it is in line with our discussions. 
Please review the approach. After the feedback, I will make it a robust patch by 
adding tests etc., and it will also need to cover FS and FIFO.

Approach:

1. Introduce activeUsersOfPendingApps in the users manager and increment this 
count as and when apps are accepted.
2. After activating the application, increment activeUsers and decrement 
activeUsersOfPendingApps in {{UsersManager#activateApplication}} from 
{{AppSchedulingInfo#updatePendingResources}}, but only once the app is no longer 
waiting for its AM container.
3. To calculate the max AM limit per user, use activeUsers + 
activeUsersOfPendingApps.
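
The bookkeeping in the three steps above could look roughly like the sketch below. This is only an illustration, not the attached patch: the class {{UserCountSketch}} and all of its method names are hypothetical, and it assumes that a user who has both active and pending apps is counted once, as active, so that the sum in step 3 does not double count.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the counters described above; not the real UsersManager. */
public class UserCountSketch {
    // Per-user number of apps still waiting for an AM container.
    private final Map<String, Integer> pendingApps = new HashMap<>();
    // Per-user number of apps that have been activated.
    private final Map<String, Integer> activeApps = new HashMap<>();

    /** Step 1: an accepted app starts out pending. */
    public void acceptApplication(String user) {
        pendingApps.merge(user, 1, Integer::sum);
    }

    /** Step 2: the app is no longer waiting for its AM container. */
    public void activateApplication(String user) {
        Integer pending = pendingApps.get(user);
        if (pending == null) {
            throw new IllegalStateException("no pending app for " + user);
        }
        if (pending == 1) {
            pendingApps.remove(user);
        } else {
            pendingApps.put(user, pending - 1);
        }
        activeApps.merge(user, 1, Integer::sum);
    }

    /** Users with at least one activated app. */
    public int getActiveUsers() {
        return activeApps.size();
    }

    /** Users whose apps are all still pending (assumption: active users are not counted here). */
    public int getActiveUsersOfPendingApps() {
        int count = 0;
        for (String user : pendingApps.keySet()) {
            if (!activeApps.containsKey(user)) {
                count++;
            }
        }
        return count;
    }

    /** Step 3: user count used for the max AM limit per user. */
    public int getUsersForAmLimit() {
        return getActiveUsers() + getActiveUsersOfPendingApps();
    }
}
```

With four users each submitting one app and only user1 and user2 activated, this sketch reports 2 active users, 2 users of pending apps, and 4 users for the AM limit.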

> CapacityScheduler: applications could get starved because computation of 
> #activeUsers considers pending apps 
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-4606
>                 URL: https://issues.apache.org/jira/browse/YARN-4606
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler, capacityscheduler
>    Affects Versions: 2.8.0, 2.7.1
>            Reporter: Karam Singh
>            Assignee: Wangda Tan
>            Priority: Critical
>         Attachments: YARN-4606.1.poc.patch
>
>
> Currently, if all applications belonging to the same user in a LeafQueue are 
> pending (caused by max-am-percent, etc.), ActiveUsersManager still considers 
> the user to be an active user. This could lead to starvation of active 
> applications, for example:
> - App1 (belongs to user1) and app2 (belongs to user2) are active; app3 
> (belongs to user3) and app4 (belongs to user4) are pending
> - ActiveUsersManager returns #active-users=4
> - However, only two users (user1/user2) are able to allocate new 
> resources. So the computed user-limit-resource could be lower than expected.
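
To put rough numbers on the quoted example: the sketch below assumes a deliberately simplified model in which the per-user share is just the queue capacity divided evenly by the number of counted users. The real CapacityScheduler computation also involves settings such as the minimum user limit percent and the user limit factor, which are ignored here. The class name, method name, and the 100 GB queue size are all hypothetical.

```java
/** Simplified illustration only; not the CapacityScheduler's actual user-limit formula. */
public class UserLimitSketch {
    /** Even split of queue capacity among the counted users (assumed model). */
    public static long perUserShareMb(long queueCapacityMb, int countedUsers) {
        return queueCapacityMb / Math.max(countedUsers, 1);
    }

    public static void main(String[] args) {
        long queueMb = 102400L; // hypothetical 100 GB queue

        // Counting all four users, including the two whose apps are pending.
        System.out.println(perUserShareMb(queueMb, 4)); // 25600

        // Counting only user1 and user2, who can actually allocate.
        System.out.println(perUserShareMb(queueMb, 2)); // 51200
    }
}
```

Under this assumed model, counting the two pending users halves the share available to user1 and user2, which is the "lower than expected" effect the description refers to.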



