[ https://issues.apache.org/jira/browse/YARN-8193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453031#comment-16453031 ]
Zian Chen commented on YARN-8193:
---------------------------------

Updated the patch to v2 to remove the unused schedulerKey in getLocalityWaitFactor. Does the latest patch look good? [~leftnoteasy]

> YARN RM hangs abruptly (stops allocating resources) when running successive applications.
> -----------------------------------------------------------------------------------------
>
>                 Key: YARN-8193
>                 URL: https://issues.apache.org/jira/browse/YARN-8193
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>            Reporter: Zian Chen
>            Assignee: Zian Chen
>            Priority: Critical
>         Attachments: YARN-8193.001.patch, YARN-8193.002.patch
>
>
> When running a large number of queries in succession, at some point the RM
> hangs and stops allocating resources. At the point where the RM hangs, YARN
> throws a NullPointerException in RegularContainerAllocator.getLocalityWaitFactor.
> Sufficient space is available in yarn.nodemanager.local-dirs, so this is not a
> node health issue (the RM did not report any node as unhealthy). No specific
> query or operation consistently triggers the hang.
> The problem goes away after restarting the ResourceManager. No NM restart is
> required.