[ https://issues.apache.org/jira/browse/YARN-8193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16446447#comment-16446447 ]
Zian Chen commented on YARN-8193:
---------------------------------

Digging into the code logic: when we decide whether a container can be assigned to a requesting application under async scheduling, we should determine the number of unique location asks in RegularContainerAllocator#canAssign before passing it into RegularContainerAllocator#getLocalityWaitFactor. The canAssign result should only be set to true after the null check on the current application's AppPlacementAllocator has passed and the number of unique location asks is equal to one. We also need a null check in RegularContainerAllocator#preCheckForNodeCandidateSet when getting the AppPlacementAllocator, since it can be null when the pending resource is decreased by a different thread.

> YARN RM hangs abruptly (stops allocating resources) when running successive applications.
> -----------------------------------------------------------------------------------------
>
>                 Key: YARN-8193
>                 URL: https://issues.apache.org/jira/browse/YARN-8193
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>            Reporter: Zian Chen
>            Assignee: Zian Chen
>            Priority: Critical
>
> When running massive queries successively, at some point the RM hangs and
> stops allocating resources. At the point where the RM hangs, YARN throws a
> NullPointerException at RegularContainerAllocator.getLocalityWaitFactor.
> Sufficient space is given to yarn.nodemanager.local-dirs (this is not a node
> health issue; the RM didn't report any node as unhealthy). There is no fixed
> trigger for this (no particular query or operation).
> The problem goes away on restarting the ResourceManager. No NM restart is
> required.
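
The null checks described in the comment above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual Hadoop patch: PlacementState, the String scheduler key, addRequest/removeRequest, and the simplified wait-factor and threshold arithmetic are all hypothetical stand-ins. The point it tries to show is that the placement entry is read once into a local variable, checked for null, and only then used to obtain the unique location asks.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the scheduler-side allocator logic; not Hadoop code.
final class NullSafeLocalityCheck {

    // Hypothetical stand-in for the per-request AppPlacementAllocator state.
    static final class PlacementState {
        private final int uniqueLocationAsks;

        PlacementState(int uniqueLocationAsks) {
            this.uniqueLocationAsks = uniqueLocationAsks;
        }

        int getUniqueLocationAsks() {
            return uniqueLocationAsks;
        }
    }

    // Entries may be removed by another thread when pending resources drop.
    private final Map<String, PlacementState> placements = new ConcurrentHashMap<>();

    void addRequest(String schedulerKey, int uniqueLocationAsks) {
        placements.put(schedulerKey, new PlacementState(uniqueLocationAsks));
    }

    void removeRequest(String schedulerKey) {
        placements.remove(schedulerKey);
    }

    // Assumed shape of the wait factor: a fraction of the cluster, capped at 1.
    static float localityWaitFactor(int uniqueLocationAsks, int clusterNodes) {
        if (clusterNodes <= 0) {
            return 1.0f;
        }
        int required = Math.max(uniqueLocationAsks - 1, 0);
        return Math.min((float) required / clusterNodes, 1.0f);
    }

    boolean canAssign(String schedulerKey, long missedOpportunities, int clusterNodes) {
        // Read once into a local so a concurrent removal cannot cause an NPE later.
        PlacementState state = placements.get(schedulerKey);
        if (state == null) {
            return false; // request already satisfied or removed by another thread
        }
        int uniqueLocationAsks = state.getUniqueLocationAsks();
        if (uniqueLocationAsks == 1) {
            return true; // only one location is asked for, so do not delay
        }
        // The count was resolved above, so the wait factor never touches a null.
        float factor = localityWaitFactor(uniqueLocationAsks, clusterNodes);
        return clusterNodes * factor < missedOpportunities; // illustrative threshold
    }
}
```

With this shape, a key whose entry has been removed concurrently simply yields false from canAssign instead of throwing; the same read-once-then-check pattern would apply to the lookup in preCheckForNodeCandidateSet.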