[ 
https://issues.apache.org/jira/browse/YARN-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15822457#comment-15822457
 ] 

Wangda Tan commented on YARN-5889:
----------------------------------

Thanks for update the patch, my comments for ver.3 patch.

1) LQ:
- getUserAndAddIfAbsent could be move to UM
- addUser/removeUser could be private method of UM
- getUsers -> getUsersInfo, and move implemenation to UM
- getComputedActiveUserLimit -> getComputedResourceLimitForActiveUsers (or 
shorter name)
- getComputedUserLimit -> getComputedResourceLimitForAllUsers (mostly for more 
consistency naming to above, and more acurate and less ambiguous)
- Add more description to above two methods.
- Is it better to move recalculateQueueUsageRatio/calculateUserUsageRatio into 
UM, and recalculateQueueUsageRatio can be renamed to "clusterResourceUpdated"
- inc/decUsedResource, keep User reference to avoid hit hashMap twice.
- Not caused by your patch: could you add a desc to UsageRatios?

2) UM:
- UM#userCountToRecalculateUserlimit could be AtomicLong to avoid writelock and 
less frequently wrapped to MIN_VALUE. Rename it to latestVersion.
- rename UM#getUserCountToRecalculateUserlimit -> getLatestVersion and readlock 
is not required. 
- A potential race condition:
{code} 
 ....
    // In allocation thread, cross check whether user became active or not.
    // Such a change needs recomputation for all user-limits.
    if (activeUsersManager.isUserAddedOrRemoved()) {  // 1
      // Clear computed user-limits and ask for recomputation. 
      userLimitNeedsRecompute(); // 2

      // reset the flag so that we can further track for active users.
      activeUsersManager.setUserAddedOrRemoved(false); // 3
    }
 ....
{code} 
If thread#1 at  // 2, and thread #2 calls setUserAddedOrRemoved, the 
{{isUserAddedOrRemoved}} could be overwritten and cannot properly trigger 
another update.
To simplify interface and avoid the race condition, can we change 
{{isUserAddedOrRemoved}} to use an AtomicBoolean, invokes getAndSet(false) to 
do the "flip" operation.
- Renaming {{userCountToRecalculateUserlimit}} to {{localVersion}}?
- Similiarily,
{code}
    if (isRecomputeNeeded(user, userLimitPerSchedulingMode)) {
      // recompute
      userLimitPerSchedulingMode = reComputeUserLimits(rc, userName,
          nodePartition, clusterResource, schedulingMode, false);

      // update user count to cache so that we can avoid recompute if no major
      // changes.
      user.setUserCountToRecalculateUserlimit(
          getUserCountToRecalculateUserlimit());
    }
{code}
To avoid race condition, is it better to save value of "latestVersion" and set 
it to "local" version, pesudo code like:
{code}
   int latestVersion = UM#getLatestVersion
   if (getLocalVersion != latestVersion) {
       // recompute ...
       setLocalVersion(latestVersion);
   }
{code}
- {{getComputedUserLimit}} / {{getComputedActiveUserLimit}}: I'm wondering if 
the UserToPartitionRecord is required, since existing computation doesn't need 
the "user" field.
- Does existing logic handle following scenario? Assume 
userLimitPerSchedulingMode is null and no new changes to user-limit, will NPE 
happen if we get a RESPECT_EXCLUSIVITY first and then get a IGNORE_EXCLUSIVITY?
- Inside {{computeUserLimit}}, it calls 
{{user.setUserResourceLimit(userLimitResource);}}, I think we need to file a 
separate JIRA to make sure REST API / web UI can get expected UL.
- TODO, after you update the patch, need to double check locks of UM.

> Improve user-limit calculation in capacity scheduler
> ----------------------------------------------------
>
>                 Key: YARN-5889
>                 URL: https://issues.apache.org/jira/browse/YARN-5889
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler
>            Reporter: Sunil G
>            Assignee: Sunil G
>         Attachments: YARN-5889.0001.patch, 
> YARN-5889.0001.suggested.patchnotes, YARN-5889.0002.patch, 
> YARN-5889.0003.patch, YARN-5889.v0.patch, YARN-5889.v1.patch, 
> YARN-5889.v2.patch
>
>
> Currently user-limit is computed during every heartbeat allocation cycle with 
> a write lock. To improve performance, this tickets is focussing on moving 
> user-limit calculation out of heartbeat allocation flow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

Reply via email to