[
https://issues.apache.org/jira/browse/YARN-8025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16457499#comment-16457499
]
Tao Yang commented on YARN-8025:
--------------------------------
Thanks [~yangjiandan] for discovering this problem.
This NPE happens when async-scheduling is enabled. The first thread (Thread-1)
holds the write lock and computes the resource limit while another thread
(Thread-2) waits for the lock. Thread-2 has already read
{{preComputedActiveUserLimit}} before Thread-1 populated it, so its local
userLimitPerSchedulingMode is null. When Thread-2 later acquires the write lock,
it does not need to recompute the resource limit (Thread-1 has already done this
and added the result to the cache), so it never updates
userLimitPerSchedulingMode, which remains null. As a result, an NPE is thrown
when {{userLimitPerSchedulingMode.get(...)}} is called afterwards.
Reference code:
{code:java}
Map<SchedulingMode, Resource> userLimitPerSchedulingMode =
    preComputedActiveUserLimit.get(nodePartition);

try {
  writeLock.lock();
  if (isRecomputeNeeded(schedulingMode, nodePartition, true)) {
    // recompute
    userLimitPerSchedulingMode = reComputeUserLimits(userName,
        nodePartition, clusterResource, schedulingMode, true);

    // update user count to cache so that we can avoid recompute if no major
    // changes.
    setLocalVersionOfUsersState(nodePartition, schedulingMode, true);
  }
} finally {
  writeLock.unlock();
}

Resource userLimitResource = userLimitPerSchedulingMode.get(schedulingMode);
{code}
We can fix this problem by calling {{userLimitPerSchedulingMode =
preComputedActiveUserLimit.get(nodePartition)}} again after the lock block, so
that a thread that skipped the recompute picks up the value cached by the other
thread.
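To make the interleaving easier to follow, here is a minimal standalone sketch of the race and of the proposed re-read. It is not the UsersManager code: the class name, the {{afterFirstRead}} hook, the string keys and the value 1024 are all made up for illustration, and the "other thread" is simulated on the same thread so that the ordering is deterministic.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the caching logic; all names are assumptions.
class UserLimitCacheSketch {
  // partition -> (scheduling mode -> limit), like preComputedActiveUserLimit
  private final Map<String, Map<String, Integer>> preComputed =
      new ConcurrentHashMap<>();
  private final ReentrantReadWriteLock.WriteLock writeLock =
      new ReentrantReadWriteLock().writeLock();
  private boolean recomputeNeeded = true; // guarded by writeLock

  // afterFirstRead is an illustration hook: it runs between the unlocked
  // cache read and the lock acquisition, where Thread-1 would sneak in.
  Integer getLimit(String partition, String mode, boolean rereadAfterLock,
      Runnable afterFirstRead) {
    Map<String, Integer> limits = preComputed.get(partition); // may be null
    afterFirstRead.run();

    writeLock.lock();
    try {
      if (recomputeNeeded) {
        limits = recompute(partition, mode);
        recomputeNeeded = false;
      }
    } finally {
      writeLock.unlock();
    }

    if (rereadAfterLock) {
      // Proposed fix: pick up whatever the other thread cached.
      limits = preComputed.get(partition);
    }
    return limits.get(mode); // NPE here if limits is still null
  }

  private Map<String, Integer> recompute(String partition, String mode) {
    Map<String, Integer> limits = new ConcurrentHashMap<>();
    limits.put(mode, 1024); // arbitrary placeholder limit
    preComputed.put(partition, limits);
    return limits;
  }

  // Plays Thread-2, with Thread-1 filling the cache right after
  // Thread-2's first (null) read. Returns true if an NPE was thrown.
  static boolean throwsNpe(boolean rereadAfterLock) {
    UserLimitCacheSketch s = new UserLimitCacheSketch();
    Runnable thread1 = () -> s.getLimit("", "RESPECT", true, () -> { });
    try {
      s.getLimit("", "RESPECT", rereadAfterLock, thread1);
      return false;
    } catch (NullPointerException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println("without re-read: NPE = " + throwsNpe(false));
    System.out.println("with re-read:    NPE = " + throwsNpe(true));
  }
}
```

Running {{main}} should report an NPE only for the variant without the re-read. An alternative would be to move the first cache read inside the lock block; the sketch keeps the read-then-lock shape only so that it stays close to the reference code above.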
> UsersManangers#getComputedResourceLimitForActiveUsers throws NPE due to
> preComputedActiveUserLimit is empty
> -----------------------------------------------------------------------------------------------------------
>
> Key: YARN-8025
> URL: https://issues.apache.org/jira/browse/YARN-8025
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Reporter: Jiandan Yang
> Priority: Major
> Attachments: YARN-8025.001.patch
>
>
> UsersManangers#getComputedResourceLimitForActiveUsers throws NPE when I run
> SLS.
> No element is ever put into *preComputedActiveUserLimit* in the code.
> {code:java}
> java.lang.NullPointerException
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.UsersManager.getComputedResourceLimitForActiveUsers(UsersManager.java:511)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.getResourceLimitForActiveUsers(LeafQueue.java:1576)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.computeUserLimitAndSetHeadroom(LeafQueue.java:1517)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:1190)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:824)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:630)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateOrReserveNewContainers(CapacityScheduler.java:1834)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateOrReserveNewContainers(CapacityScheduler.java:1802)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersOnMultiNodes(CapacityScheduler.java:1925)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1946)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.scheduleBasedOnNodeLabels(CapacityScheduler.java:732)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$AsyncScheduleThread.run(CapacityScheduler.java:774)
> {code}