[
https://issues.apache.org/jira/browse/YARN-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15847719#comment-15847719
]
Wangda Tan commented on YARN-5889:
----------------------------------
Major comments of UsersManager:
1) Remove UM#incResourceUsagePerUser, instead LQ should use
updateUserResourceUsageDuringAllocate to make queue ratio get updated as well.
2) Instead of storing user limit for every user/partition/scheduling-mode, we
should only store it for every partition and scheduling-mode since now we have
an identical UL for all users in active set and total set.
3) Instead of storing local version inside user, we should store it for every
partition/scheduling-mode to make sure we can always get up-to-date UL. And
also, I think we may need two maps to store active-user/all-users local version.
4) Headroom of user can be fixed in a separate patch with lower priority: now
headroom is set to queue-capacity * MULP, and it will slowly increase with more
container allocation.
Minor comments for UsersManager
5) public to private:
updateQueueUsageRatio/getUsers/getLatestVersionOfUsersState
6) Merge implementation of
updateUserResourceUsageDuringAllocate/updateUserResourceUsageDuringRelease.
7) resourceCalculator can be stored in UsersManager, which we can avoid passing
RC in all parameters.
8) getTotalResourceUsagePerUser, instead of using nested {{? .. : ..}}, suggest
to use if..else
9) UM#UsageRatios: not caused by your patch: there're some necessary boxing
operation. Better to update inc/setUsageRatio, all new Float is not required.
10) Following method is too simple to be a separated method:
removeFromUsersSet/removeFromUsersSet
11) Need writeLock: UM#removeUser,
12) isRecomputeNeeded: if you agree with #3, I think the parameter should be:
partition/schedulingMode only. Map<SchedulingMode, Resource> is not required
since if partition=x and scheduling-mode=y exists, the value of resource should
not null, correct?
13) getComputedResourceLimitForAll/ActiveUsers, I think the logic can be
simplified to:
{code}
writelock {
if (isRecomputationNeede(...)) {
// ...
}
}
return map.get(scheduling_mode, partition);
{code}
14) {{userLimitPerSchedulingMode.clear()}} can be removed from
{{reComputeUserLimits}} if you agree with #12.
15) {{Resource resourceByLabel}} can be removed from parameter list of
{{updateUserResourceUsageDuringAllocate}}
16) calculateUserUsageRatio could be removed and all logics can be done inside
setUsageRatio which wrapped by writelock. And rename {{setUsageRatio}} to
{{updateUsageRatio}}
17) rename updateQueueUsageRatio to incQueueUsageRatio
18) (TODO) Please review all usage of getUserAndAddIfAbsent, I think some of
them might not necessary.
> Improve user-limit calculation in capacity scheduler
> ----------------------------------------------------
>
> Key: YARN-5889
> URL: https://issues.apache.org/jira/browse/YARN-5889
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacity scheduler
> Reporter: Sunil G
> Assignee: Sunil G
> Attachments: YARN-5889.0001.patch,
> YARN-5889.0001.suggested.patchnotes, YARN-5889.0002.patch,
> YARN-5889.0003.patch, YARN-5889.0004.patch, YARN-5889.0005.patch,
> YARN-5889.0006.patch, YARN-5889.0007.patch, YARN-5889.v0.patch,
> YARN-5889.v1.patch, YARN-5889.v2.patch
>
>
> Currently user-limit is computed during every heartbeat allocation cycle with
> a write lock. To improve performance, this tickets is focussing on moving
> user-limit calculation out of heartbeat allocation flow.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]