[ https://issues.apache.org/jira/browse/YARN-3464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14514305#comment-14514305 ]
Hudson commented on YARN-3464:
------------------------------

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2126 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2126/])
YARN-3464. Race condition in LocalizerRunner kills localizer before localizing all resources. (Zhihai Xu via kasha) (kasha: rev 47279c3228185548ed09c36579b420225e4894f5)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/event/LocalizationEventType.java


> Race condition in LocalizerRunner kills localizer before localizing all resources
> ---------------------------------------------------------------------------------
>
>                 Key: YARN-3464
>                 URL: https://issues.apache.org/jira/browse/YARN-3464
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>            Priority: Critical
>             Fix For: 2.8.0
>
>         Attachments: YARN-3464.000.patch, YARN-3464.001.patch
>
>
> Race condition in LocalizerRunner causes container localization timeout.
> Currently LocalizerRunner will kill the ContainerLocalizer when the pending
> list for LocalizerResourceRequestEvent is empty.
> {code}
>   } else if (pending.isEmpty()) {
>     action = LocalizerAction.DIE;
>   }
> {code}
> If a LocalizerResourceRequestEvent is added after LocalizerRunner kills the
> ContainerLocalizer due to an empty pending list, this
> LocalizerResourceRequestEvent will never be handled.
> Without a ContainerLocalizer, LocalizerRunner#update will never be called.
> The container will stay in the LOCALIZING state until it is killed by the
> AM due to TASK_TIMEOUT.
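
For readers who want to see the interleaving from the issue description concretely, here is a minimal sketch that replays it step by step in a single thread. It is a simplified model under stated assumptions, not the NodeManager code: the Runner class, addResource, strandedAfterRace and the resource names are hypothetical; only the "pending.isEmpty() means DIE" check and the update method name are taken from the description.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the race; not the actual YARN classes.
public class LocalizerRaceSketch {
    enum LocalizerAction { LIVE, DIE }

    // Stand-in for LocalizerRunner: tracks pending requests and whether
    // the localizer process is still around to send heartbeats.
    static class Runner {
        private final List<String> pending = new ArrayList<>();
        private boolean localizerAlive = true;

        // Models the arrival of a new resource request event (assumed name).
        synchronized void addResource(String resource) {
            pending.add(resource);
        }

        // Models one heartbeat: mirrors the quoted check that answers DIE
        // as soon as the pending list is empty.
        synchronized LocalizerAction update() {
            if (!localizerAlive) {
                throw new IllegalStateException("no localizer left to heartbeat");
            }
            if (pending.isEmpty()) {
                localizerAlive = false; // the localizer exits after DIE
                return LocalizerAction.DIE;
            }
            pending.remove(0); // hand one resource to the localizer
            return LocalizerAction.LIVE;
        }

        synchronized boolean isLocalizerAlive() { return localizerAlive; }
        synchronized int pendingCount() { return pending.size(); }
    }

    // Replays the problematic ordering and returns how many requests are
    // left with no localizer to serve them.
    static int strandedAfterRace() {
        Runner runner = new Runner();
        runner.addResource("resource-1");
        runner.update();                  // LIVE: resource-1 is handed out
        runner.update();                  // pending is empty, so DIE
        runner.addResource("resource-2"); // arrives after the DIE decision
        // No further heartbeat will ever arrive, so update() is never called.
        return runner.isLocalizerAlive() ? 0 : runner.pendingCount();
    }

    public static void main(String[] args) {
        System.out.println("stranded requests: " + strandedAfterRace());
    }
}
```

In this model the second request stays in the pending list forever, which corresponds to the container remaining in the LOCALIZING state until the AM times it out. The sketch does not show how the committed patch resolves this.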