Manikandan R commented on YARN-4606:

Attaching .002 patch for review.

{quote}Does this patch handle the case where one user has multiple pending 
apps? (Since it doesn't store user-to-apps information).{quote}
The .002 patch now handles this case.
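
To illustrate the multiple-pending-apps case, here is a minimal sketch of per-user counting. It is not the code in the patch: the class {{PendingAppTracker}} and all of its method names are hypothetical, and it assumes a user stops being "pending-only" once the last of that user's pending apps is activated or removed.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of YARN: counts pending apps per user so that
// a user with several pending apps is tracked until the last one leaves.
final class PendingAppTracker {
  private final Map<String, Integer> pendingAppsPerUser = new HashMap<>();

  // Called when an app of this user becomes pending.
  void appPending(String user) {
    pendingAppsPerUser.merge(user, 1, Integer::sum);
  }

  // Called when one pending app of this user is activated or removed.
  // The user's entry is dropped only when the count reaches zero.
  void appNoLongerPending(String user) {
    pendingAppsPerUser.computeIfPresent(user, (u, n) -> n > 1 ? n - 1 : null);
  }

  // Number of apps of this user that are still pending.
  int pendingApps(String user) {
    return pendingAppsPerUser.getOrDefault(user, 0);
  }

  // Number of distinct users that still have at least one pending app.
  int usersWithPendingApps() {
    return pendingAppsPerUser.size();
  }
}
```

Storing a count rather than a set of user names is what keeps a user with two pending apps tracked after only one of them has been activated.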
{quote}Should we call this inside 
I think we should remove active user from pending apps once AM container get 
Yes, inside {{SchedulerApplicationAttempt#pullNewlyAllocatedContainers}}, and 
only after the containers have been updated with tokens, because 
{{SchedulerApplicationAttempt#pullNewlyUpdatedContainers}} takes care of only the 
INCREASE, DECREASE, PROMOTE and DEMOTE cases, not the regular allocations.
{quote}Instead of using metrics, it might be better to use 
SchedulerApplicationAttempt#getAppAttemptResourceUsage instead.{quote}
Not required, I think, as explained in the previous comment.
{quote}I am doing an in-depth review, but I would like to address a few things 
first regarding method names and comments. I feel that it is important to be 
accurate in these areas in order to eliminate confusion for those maintaining 
this code.{quote}
Addressed all of the related comments.

In addition to the above changes, we now handle the case where an app stays in 
the ACCEPTED state after all of its AM attempts have failed. We want to 
decrement the count in this case as well, and we do so by signalling the 
scheduler with a new event type.

Also, I am assuming that moving an app from one queue to another doesn't require 
changes, since a move happens only while the app is running. Is that correct?

Thanks [~sunilg] for the suggestions on some of the above steps.

> CapacityScheduler: applications could get starved because computation of 
> #activeUsers considers pending apps 
> -------------------------------------------------------------------------------------------------------------
>                 Key: YARN-4606
>                 URL: https://issues.apache.org/jira/browse/YARN-4606
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler, capacityscheduler
>    Affects Versions: 2.8.0, 2.7.1
>            Reporter: Karam Singh
>            Assignee: Manikandan R
>            Priority: Critical
>         Attachments: YARN-4606.001.patch, YARN-4606.1.poc.patch, 
> YARN-4606.POC.2.patch, YARN-4606.POC.patch
> Currently, if all applications belonging to the same user in a LeafQueue are 
> pending (caused by max-am-percent, etc.), ActiveUsersManager still considers 
> that user an active user. This could lead to starvation of active applications, for 
> example:
> - App1(belongs to user1)/app2(belongs to user2) are active, app3(belongs to 
> user3)/app4(belongs to user4) are pending
> - ActiveUsersManager returns #active-users=4
> - However, only two users (user1/user2) are able to allocate new 
> resources, so the computed user-limit-resource could be lower than expected.
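
The effect described above can be shown with a small arithmetic sketch. It is deliberately simplified and is not the CapacityScheduler's actual user-limit computation: it assumes the per-user limit is an even split of the queue's resource across active users, and the class {{UserLimitSketch}}, the method {{perUserLimit}} and the 100-unit queue are all hypothetical.

```java
// Hypothetical, simplified model: per-user limit = queue resource / active users.
// The real CapacityScheduler formula has more inputs; this only shows the trend.
final class UserLimitSketch {

  static long perUserLimit(long queueResource, int activeUsers) {
    if (activeUsers <= 0) {
      throw new IllegalArgumentException("activeUsers must be positive");
    }
    return queueResource / activeUsers;
  }

  public static void main(String[] args) {
    long queueResource = 100; // assumed queue size in arbitrary units

    // Pending-only users (user3, user4) are counted as active.
    System.out.println(perUserLimit(queueResource, 4)); // prints 25

    // Only users that can actually allocate (user1, user2) are counted.
    System.out.println(perUserLimit(queueResource, 2)); // prints 50
  }
}
```

Under this assumption, counting the two pending-only users halves the limit for user1 and user2 even though user3 and user4 cannot allocate anything.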
