Eric Payne commented on YARN-4606:

bq. could you briefly summarize what the current issue is and what solution is being proposed?
[~leftnoteasy], the latest patch ({{YARN-4606.POC.patch}}) changed the behavior 
of the capacity scheduler so that it would never give the second app a container 
for its AM as long as the first app consumed the entire queue and had pending 
requests, even when the queue's used AM resources were below the AM maximum. 
I described it in more detail 

I suggested that one option would be to modify the code as follows, as long as 
there is a way to do it in an abstract way:
        if ( Not Waiting For AM Container
            || (Queue Used AM Resources < Queue Max AM Resources) ) {
          abstractUsersManager.activateApplication(user, applicationId);
        }
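
The condition in the pseudo-code above can be sketched in Java as a standalone predicate. This is a minimal, hypothetical sketch: {{ActivationGateSketch}}, {{shouldActivate}}, {{maybeActivate}}, the plain megabyte values, and the {{activeUsers}} set are illustrative stand-ins, not the actual {{LeafQueue}} or {{AbstractUsersManager}} code, which works with {{Resource}} objects rather than raw numbers.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/**
 * Hypothetical sketch of the suggested gating: a user is only counted as
 * active if its app is no longer waiting for an AM container, or if the
 * queue still has AM headroom. Names are illustrative, not scheduler APIs.
 */
public class ActivationGateSketch {

    // Hypothetical stand-in for the scheduler's active-users bookkeeping.
    static final Set<String> activeUsers = new LinkedHashSet<>();

    /** Mirrors the pseudo-code condition. */
    static boolean shouldActivate(boolean waitingForAMContainer,
                                  long queueUsedAMResourceMB,
                                  long queueMaxAMResourceMB) {
        return !waitingForAMContainer
            || queueUsedAMResourceMB < queueMaxAMResourceMB;
    }

    /** Adds the user to the active set only when the gate allows it. */
    static void maybeActivate(String user, boolean waitingForAMContainer,
                              long usedAMResourceMB, long maxAMResourceMB) {
        if (shouldActivate(waitingForAMContainer, usedAMResourceMB, maxAMResourceMB)) {
            activeUsers.add(user);
        }
    }

    public static void main(String[] args) {
        // Assumed queue AM limit of 4096 MB.
        maybeActivate("user1", false, 4096, 4096); // AM already running: active
        maybeActivate("user2", true, 4096, 4096);  // no AM headroom: not active
        maybeActivate("user3", true, 2048, 4096);  // AM headroom left: active
        System.out.println(activeUsers);
    }
}
```

Running {{main}} prints {{[user1, user3]}}: user2's app cannot start its AM because the queue's AM resources are exhausted, so user2 is not counted toward #active-users.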

I suggested a way to do that, but it seems a little cumbersome.

So then I started wondering whether there was a way to leverage the {{Schedulable 
Apps}} and {{Non-Schedulable Apps}} user info in the 
{{AppSchedulingInfo#updatePendingResources}} code. On closer inspection, 
however, it is too early within {{AppSchedulingInfo#updatePendingResources}} 
to tell whether or not a new app is destined to be schedulable.

So, I think the best suggestion I have is the pseudo-code I posted above.

> CapacityScheduler: applications could get starved because computation of 
> #activeUsers considers pending apps 
> -------------------------------------------------------------------------------------------------------------
>                 Key: YARN-4606
>                 URL: https://issues.apache.org/jira/browse/YARN-4606
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler, capacityscheduler
>    Affects Versions: 2.8.0, 2.7.1
>            Reporter: Karam Singh
>            Assignee: Wangda Tan
>            Priority: Critical
>         Attachments: YARN-4606.1.poc.patch, YARN-4606.POC.patch
> Currently, if all applications belonging to the same user in a LeafQueue are 
> pending (caused by max-am-percent, etc.), ActiveUsersManager still considers 
> the user an active user. This could lead to starvation of active applications, 
> for example:
> - App1 (belongs to user1) and app2 (belongs to user2) are active; app3 
> (belongs to user3) and app4 (belongs to user4) are pending
> - ActiveUsersManager returns #active-users=4
> - However, only two users (user1/user2) are able to allocate new resources, 
> so the computed user-limit-resource could be lower than expected.
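
To illustrate the effect in the example above, here is a simplified, hypothetical model of the per-user limit: the queue's resource split evenly among active users, floored at minimum-user-limit-percent. The real CapacityScheduler computation has more inputs (user-limit-factor, rounding to allocation increments, and so on), and the 100 GB queue and 25% minimum-user-limit-percent are assumed values chosen for illustration.

```java
/**
 * Simplified, hypothetical model of a LeafQueue's per-user limit. It only
 * shows why counting pending users in #active-users shrinks each user's share.
 */
public class UserLimitSketch {

    static long userLimitMB(long queueResourceMB, int activeUsers,
                            int minUserLimitPercent) {
        // Guard against division by zero when no user is active.
        long evenShare = queueResourceMB / Math.max(1, activeUsers);
        long floor = queueResourceMB * minUserLimitPercent / 100;
        return Math.max(evenShare, floor);
    }

    public static void main(String[] args) {
        long queueMB = 100 * 1024;     // assumed 100 GB queue
        int minUserLimitPercent = 25;  // assumed setting, below 100

        // user1..user4 all counted, although only user1/user2 can allocate.
        long withPending = userLimitMB(queueMB, 4, minUserLimitPercent);
        // Only the users whose apps are actually schedulable.
        long schedulableOnly = userLimitMB(queueMB, 2, minUserLimitPercent);

        System.out.println("4 active users: " + withPending + " MB");
        System.out.println("2 active users: " + schedulableOnly + " MB");
    }
}
```

This prints a limit of 25600 MB with four users counted versus 51200 MB with two, so user1 and user2 together could use only half of the queue while app3 and app4 sit pending. With minimum-user-limit-percent at 100 the floor equals the whole queue in this model and the effect disappears, which is why the example assumes a lower value.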

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
