[ 
https://issues.apache.org/jira/browse/YARN-8193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16446447#comment-16446447
 ] 

Zian Chen commented on YARN-8193:
---------------------------------

Digging into the code logic: when we decide whether a container can be 
assigned to a requesting application under async scheduling, we need to work 
out the number of unique location asks in RegularContainerAllocator#canAssign 
before passing it to RegularContainerAllocator#getLocalityWaitFactor. 
canAssign should return true only after a null check on the current 
application's AppPlacementAllocator, and only if the number of unique location 
asks is equal to one.
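
To make the ordering concrete, here is a minimal sketch of the check described above. It is not the actual patch: CanAssignSketch, PlacementAllocatorSketch, the allocators map and canAssignOffSwitch are hypothetical stand-ins for the real scheduler classes, and the locality-delay computation that would follow is left out.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for AppPlacementAllocator; only the accessor needed here.
interface PlacementAllocatorSketch {
    int getUniqueLocationAsks();
}

class CanAssignSketch {
    // schedulerKey -> allocator. Another thread may remove an entry at any time.
    static final Map<String, PlacementAllocatorSketch> allocators =
        new ConcurrentHashMap<>();

    // Assumed shape of the off-switch branch of canAssign.
    static boolean canAssignOffSwitch(String schedulerKey) {
        int uniqLocationAsks = 0;
        // Read the reference once, then null-check the local copy.
        PlacementAllocatorSketch allocator = allocators.get(schedulerKey);
        if (allocator != null) {
            uniqLocationAsks = allocator.getUniqueLocationAsks();
        }
        // True only when the allocator exists and has exactly one unique ask.
        // Otherwise the caller would continue to the locality-delay logic,
        // which takes the already computed int and so cannot hit a null.
        return uniqLocationAsks == 1;
    }
}
```

The point of the sketch is that the count is resolved to a plain int before anything downstream uses it, so a missing allocator yields 0 instead of a NullPointerException.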

We also need a null check in 
RegularContainerAllocator#preCheckForNodeCandidateSet when getting the 
AppPlacementAllocator, since it can be null when the pending resource count 
is decreased by a different thread.
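
A rough illustration of that second guard, under the assumption that a missing allocator should simply cause the scheduler key to be skipped. PreCheckSketch, its Result enum, the PlacementAllocator interface and acceptsNode are all hypothetical names, not the real YARN API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class PreCheckSketch {
    // Hypothetical outcome of the pre-check.
    enum Result { PROCEED, SKIP_PRIORITY }

    // Hypothetical stand-in for AppPlacementAllocator.
    interface PlacementAllocator {
        boolean acceptsNode(String nodeId);
    }

    // schedulerKey -> allocator. An entry disappears when another thread
    // brings the pending count for that key down to zero.
    static final Map<String, PlacementAllocator> ALLOCATORS =
        new ConcurrentHashMap<>();

    static Result preCheck(String schedulerKey, String nodeId) {
        // Single read into a local: the entry may vanish right after this line,
        // but the local reference stays valid for the rest of the method.
        PlacementAllocator allocator = ALLOCATORS.get(schedulerKey);
        if (allocator == null) {
            // Request already satisfied or removed elsewhere: skip, do not throw.
            return Result.SKIP_PRIORITY;
        }
        return allocator.acceptsNode(nodeId) ? Result.PROCEED : Result.SKIP_PRIORITY;
    }
}
```

Calling ALLOCATORS.remove(key) from a second thread between two preCheck calls changes the result from PROCEED to SKIP_PRIORITY rather than raising an exception.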


> YARN RM hangs abruptly (stops allocating resources) when running successive 
> applications.
> -----------------------------------------------------------------------------------------
>
>                 Key: YARN-8193
>                 URL: https://issues.apache.org/jira/browse/YARN-8193
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>            Reporter: Zian Chen
>            Assignee: Zian Chen
>            Priority: Critical
>
> When running massive queries successively, at some point the RM just hangs 
> and stops allocating resources. At the point the RM hangs, YARN throws a 
> NullPointerException at RegularContainerAllocator.getLocalityWaitFactor.
> There's sufficient space given to yarn.nodemanager.local-dirs (not a node 
> health issue; the RM didn't report any node as unhealthy). There is no fixed 
> trigger for this (query or operation).
> This problem goes away on restarting the ResourceManager. No NM restart is 
> required.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
