[ 
https://issues.apache.org/jira/browse/YARN-8025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16457499#comment-16457499
 ] 

Tao Yang commented on YARN-8025:
--------------------------------

Thanks [~yangjiandan] for discovering this problem. 
This NPE can happen when async scheduling is enabled. The first thread 
(Thread-1) holds the write lock and computes the resource limit while another 
thread (Thread-2) waits. Thread-2 has already read userLimitPerSchedulingMode 
from the cache before the lock block, when it was still null. When Thread-2 
later acquires the write lock, it does not need to recompute the resource limit 
(Thread-1 has already done so and added the result to the cache), so it never 
updates userLimitPerSchedulingMode, which remains null. An NPE is then thrown 
when {{userLimitPerSchedulingMode.get(...)}} is called later.
Reference code:
{code:java}
    Map<SchedulingMode, Resource> userLimitPerSchedulingMode =
        preComputedActiveUserLimit.get(nodePartition);

    try {
      writeLock.lock();
      if (isRecomputeNeeded(schedulingMode, nodePartition, true)) {
        // recompute
        userLimitPerSchedulingMode = reComputeUserLimits(userName,
            nodePartition, clusterResource, schedulingMode, true);

        // update user count to cache so that we can avoid recompute if no major
        // changes.
        setLocalVersionOfUsersState(nodePartition, schedulingMode, true);
      }
    } finally {
      writeLock.unlock();
    }

    Resource userLimitResource = userLimitPerSchedulingMode.get(schedulingMode);
{code}

We can fix this problem by re-reading the cache after the lock block, i.e. by 
calling {{userLimitPerSchedulingMode = 
preComputedActiveUserLimit.get(nodePartition)}} there. 
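
As a rough illustration of the proposed fix, here is a minimal, self-contained sketch. It is not the actual UsersManager code: the class UserLimitCacheSketch, the beforeLock hook, the recomputeAsOtherThread helper, the String/Long types and the value 1024 are all assumptions made up for the example. The hook only simulates Thread-1 finishing its recompute between Thread-2's first cache read and Thread-2 acquiring the write lock.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical, simplified stand-in for UsersManager; not the real Hadoop class.
class UserLimitCacheSketch {
  // nodePartition -> (schedulingMode -> limit); Strings and Longs replace
  // the real SchedulingMode and Resource types for brevity.
  private final Map<String, Map<String, Long>> preComputedActiveUserLimit =
      new ConcurrentHashMap<>();
  private final ReentrantReadWriteLock.WriteLock writeLock =
      new ReentrantReadWriteLock().writeLock();
  private boolean recomputeNeeded = true;

  // Simulated recompute: stores a made-up limit in the cache.
  private void reComputeUserLimits(String nodePartition, String schedulingMode) {
    preComputedActiveUserLimit
        .computeIfAbsent(nodePartition, k -> new ConcurrentHashMap<>())
        .put(schedulingMode, 1024L);
  }

  // Plays the role of Thread-1: recompute under the write lock and clear the flag.
  void recomputeAsOtherThread(String nodePartition, String schedulingMode) {
    writeLock.lock();
    try {
      reComputeUserLimits(nodePartition, schedulingMode);
      recomputeNeeded = false;
    } finally {
      writeLock.unlock();
    }
  }

  // Plays the role of Thread-2. beforeLock is an assumed test hook that runs
  // between the first cache read and the lock acquisition.
  Long getComputedLimit(String nodePartition, String schedulingMode,
      Runnable beforeLock) {
    // First read: may be null if nothing has been cached yet.
    Map<String, Long> userLimitPerSchedulingMode =
        preComputedActiveUserLimit.get(nodePartition);

    beforeLock.run();

    writeLock.lock();
    try {
      if (recomputeNeeded) {
        reComputeUserLimits(nodePartition, schedulingMode);
        recomputeNeeded = false;
      }
    } finally {
      writeLock.unlock();
    }

    // Proposed fix: re-read the cache after the lock block, so a result
    // cached by another thread is picked up instead of the stale null.
    userLimitPerSchedulingMode = preComputedActiveUserLimit.get(nodePartition);
    return userLimitPerSchedulingMode.get(schedulingMode);
  }

  public static void main(String[] args) {
    UserLimitCacheSketch sketch = new UserLimitCacheSketch();
    Long limit = sketch.getComputedLimit("", "MODE_A",
        () -> sketch.recomputeAsOtherThread("", "MODE_A"));
    System.out.println("limit = " + limit);
  }
}
```

Without the re-read after the lock block, the local variable in this sketch would still hold the null from the first read and the final get(...) would throw the NPE described above.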


> UsersManangers#getComputedResourceLimitForActiveUsers throws NPE due to 
> preComputedActiveUserLimit is empty
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-8025
>                 URL: https://issues.apache.org/jira/browse/YARN-8025
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>            Reporter: Jiandan Yang 
>            Priority: Major
>         Attachments: YARN-8025.001.patch
>
>
> UsersManangers#getComputedResourceLimitForActiveUsers throws NPE when I run 
> SLS.
>  No element is ever put into *preComputedActiveUserLimit* in the code.
> {code:java}
> java.lang.NullPointerException
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.UsersManager.getComputedResourceLimitForActiveUsers(UsersManager.java:511)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.getResourceLimitForActiveUsers(LeafQueue.java:1576)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.computeUserLimitAndSetHeadroom(LeafQueue.java:1517)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:1190)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:824)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:630)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateOrReserveNewContainers(CapacityScheduler.java:1834)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateOrReserveNewContainers(CapacityScheduler.java:1802)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersOnMultiNodes(CapacityScheduler.java:1925)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1946)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.scheduleBasedOnNodeLabels(CapacityScheduler.java:732)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$AsyncScheduleThread.run(CapacityScheduler.java:774)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

