[
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16468872#comment-16468872
]
Manikandan R commented on YARN-4606:
------------------------------------
{quote}1) Does this patch handles the case that one user has multiple pending
apps? (Since it doesn't store user to apps information).{quote}
The patch does not handle this case. Whenever a user submits an app, CS
increments the activeUsersOfPendingApps count as part of accepting the
application, irrespective of whether the app was submitted by the same user or
a different one.
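To make the difference concrete, here is a minimal sketch contrasting the per-app counting described above with a per-user count. Everything in it ({{PendingCountSketch}}, {{countPerApp}}, {{countDistinctUsers}}) is a hypothetical stand-in and not code from the patch or from CS; the per-user variant only illustrates what keeping user-to-apps information could look like.

```java
import java.util.HashMap;
import java.util.Map;

public class PendingCountSketch {

    /** Per-app counting: every accepted pending app bumps the count by one. */
    static int countPerApp(String[] submittingUsers) {
        return submittingUsers.length;
    }

    /**
     * Hypothetical alternative: keep a user -> pending-app-count map and
     * report the number of distinct users that have pending apps.
     */
    static int countDistinctUsers(String[] submittingUsers) {
        Map<String, Integer> pendingAppsByUser = new HashMap<>();
        for (String user : submittingUsers) {
            pendingAppsByUser.merge(user, 1, Integer::sum);
        }
        return pendingAppsByUser.size();
    }

    public static void main(String[] args) {
        // user1 submits two apps, user2 submits one; all three are pending.
        String[] submissions = {"user1", "user1", "user2"};
        System.out.println("per-app count:  " + countPerApp(submissions));
        System.out.println("per-user count: " + countDistinctUsers(submissions));
    }
}
```

With one user holding two pending apps, the two counts diverge, which is the case the review question is about.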
{quote}Should we call this inside
SchedulerApplicationAttempt#pullNewlyUpdatedContainers?
I think we should remove active user from pending apps once AM container get
allocated{quote}
While trying to understand this through real testing, I ran into a situation
where {{SchedulerApplicationAttempt#pullNewlyUpdatedContainers}} always returns
an empty {{updatedContainers}}. I was wondering whether we can instead call
{{abstractUsersManager.decrNumActiveUsersOfPendingApps()}} inside
{{SchedulerApplicationAttempt#pullNewlyAllocatedContainers}}, something like:
{code}
if (!this.isWaitingForAMContainer()
    && !hasActiveUsersOfPendingAppsDecremented.get()) {
  this.queue.getAbstractUsersManager().decrNumActiveUsersOfPendingApps();
  hasActiveUsersOfPendingAppsDecremented.set(true);
}
{code}
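A self-contained sketch of the same decrement-once idea, in case it helps the discussion. The classes {{PendingUsersCounter}} and {{AttemptSketch}} are hypothetical stand-ins for the users manager and the application attempt, not the actual Hadoop classes. The sketch also uses {{compareAndSet}} instead of the separate {{get()}}/{{set(true)}} pair, which is an assumption on my side about how the flag would be used if it stays atomic.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class DecrementOnceSketch {

    /** Hypothetical stand-in for the queue's users manager. */
    static class PendingUsersCounter {
        private final AtomicInteger activeUsersOfPendingApps = new AtomicInteger();

        void incr() { activeUsersOfPendingApps.incrementAndGet(); }
        void decr() { activeUsersOfPendingApps.decrementAndGet(); }
        int get() { return activeUsersOfPendingApps.get(); }
    }

    /** Hypothetical stand-in for an application attempt. */
    static class AttemptSketch {
        private final PendingUsersCounter counter;
        private final AtomicBoolean decremented = new AtomicBoolean(false);
        private volatile boolean waitingForAMContainer = true;

        AttemptSketch(PendingUsersCounter counter) {
            this.counter = counter;
            counter.incr(); // counted as pending when the app is accepted
        }

        void amContainerAllocated() { waitingForAMContainer = false; }

        /** Decrements the pending count at most once, after the AM container exists. */
        void pullNewlyAllocatedContainers() {
            if (!waitingForAMContainer && decremented.compareAndSet(false, true)) {
                counter.decr();
            }
        }
    }

    public static void main(String[] args) {
        PendingUsersCounter counter = new PendingUsersCounter();
        AttemptSketch attempt = new AttemptSketch(counter);
        attempt.pullNewlyAllocatedContainers(); // still waiting for AM: no change
        attempt.amContainerAllocated();
        attempt.pullNewlyAllocatedContainers(); // first pull after AM: decrement
        attempt.pullNewlyAllocatedContainers(); // later pulls: no further change
        System.out.println("pending count: " + counter.get());
    }
}
```

If the method is only ever called while holding the attempt's lock, a plain boolean field would do the same job.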
If we are planning to move the call to {{decrNumActiveUsersOfPendingApps}} from
{{AppSchedulingInfo#updatePendingResources}} to
{{SchedulerApplicationAttempt}}, do we still need the AM usage check against
the max AM limit? I don't think so. We hit the issue of accepting a second app
when we were calling {{decrNumActiveUsersOfPendingApps}} inside
{{abstractUsersManager.activateApplication()}}, and from
{{AppSchedulingInfo#updatePendingResources}} at that. I don't think the check
is required anymore.
{quote}Does hasActiveUsersOfPendingAppsDecremented need to be atomic? What is
the benefit?{quote}
Not required, I guess. I was trying to be too defensive :)
I will address the review points on names and comments once we settle on the
flow.
> CapacityScheduler: applications could get starved because computation of
> #activeUsers considers pending apps
> -------------------------------------------------------------------------------------------------------------
>
> Key: YARN-4606
> URL: https://issues.apache.org/jira/browse/YARN-4606
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacity scheduler, capacityscheduler
> Affects Versions: 2.8.0, 2.7.1
> Reporter: Karam Singh
> Assignee: Manikandan R
> Priority: Critical
> Attachments: YARN-4606.001.patch, YARN-4606.1.poc.patch,
> YARN-4606.POC.2.patch, YARN-4606.POC.patch
>
>
> Currently, if all applications belonging to the same user in a LeafQueue are
> pending (caused by max-am-percent, etc.), ActiveUsersManager still considers
> the user to be an active user. This could lead to starvation of active
> applications, for example:
> - App1 (belongs to user1) and app2 (belongs to user2) are active; app3
> (belongs to user3) and app4 (belongs to user4) are pending
> - ActiveUsersManager returns #active-users=4
> - However, only two users (user1/user2) are able to allocate new
> resources. So the computed user-limit-resource could be lower than expected.
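
The effect in the example above can be illustrated with a small arithmetic sketch. It assumes a deliberately simplified model in which the queue's resource is split equally among active users; the real CapacityScheduler computation also involves settings such as minimum-user-limit-percent and user-limit-factor, which are left out here. The class name, method name, and the 100 GB queue size are hypothetical.

```java
public class UserLimitSketch {

    /**
     * Simplified model (assumption): each active user gets an equal share of
     * the queue's resource. Not the actual CapacityScheduler formula.
     */
    static long equalShareMb(long queueResourceMb, int activeUsers) {
        if (activeUsers <= 0) {
            throw new IllegalArgumentException("activeUsers must be positive");
        }
        return queueResourceMb / activeUsers;
    }

    public static void main(String[] args) {
        long queueResourceMb = 102_400L; // hypothetical 100 GB queue

        // Pending-only users (user3, user4) are counted as active.
        long withPendingUsers = equalShareMb(queueResourceMb, 4);

        // Only users that can actually allocate (user1, user2) are counted.
        long withoutPendingUsers = equalShareMb(queueResourceMb, 2);

        System.out.println("limit with 4 counted users: " + withPendingUsers + " MB");
        System.out.println("limit with 2 counted users: " + withoutPendingUsers + " MB");
    }
}
```

Under this model, counting the two pending-only users halves the limit available to user1 and user2, even though user3 and user4 cannot use their share.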