[ https://issues.apache.org/jira/browse/YARN-5190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15314562#comment-15314562 ]
Hudson commented on YARN-5190:
------------------------------
SUCCESS: Integrated in Hadoop-trunk-Commit #9906 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9906/])
YARN-5190. Registering/unregistering container metrics in (jianhe: rev 99cc439e29794f8e61bebe03b2a7ca4b6743ec92)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/TestContainerMetrics.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/impl/MetricsSystemImpl.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainerMetrics.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/lib/DefaultMetricsSystem.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainersMonitorImpl.java
> Registering/unregistering container metrics triggered by ContainerEvent and
> ContainersMonitorEvent are conflict which cause uncaught exception in
> ContainerMonitorImpl
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: YARN-5190
> URL: https://issues.apache.org/jira/browse/YARN-5190
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Junping Du
> Assignee: Junping Du
> Priority: Blocker
> Fix For: 2.8.0
>
> Attachments: YARN-5190-v2.patch, YARN-5190.patch
>
>
> The exception stack is as follows:
> {noformat}
> 2016-05-22 01:50:04,554 [Container Monitor] ERROR org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[Container Monitor,5,main] threw an Exception.
> org.apache.hadoop.metrics2.MetricsException: Metrics source ContainerResource_container_1463840817638_14484_01_000010 already exists!
>     at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:135)
>     at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:112)
>     at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:229)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainerMetrics.forContainer(ContainerMetrics.java:212)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainerMetrics.forContainer(ContainerMetrics.java:198)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl$MonitoringThread.run(ContainersMonitorImpl.java:385)
> {noformat}
> After YARN-4906, there are multiple places that get ContainerMetrics for a
> particular container, which can cause a race condition when different threads
> register the same container metrics with DefaultMetricsSystem. Because the
> resulting MetricsException is not handled properly, it can bring down the
> monitoring thread of ContainersMonitorImpl or even the whole NM.
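
For readers unfamiliar with this kind of race, here is a minimal, self-contained sketch of one way to make a "get or register" lookup safe when several threads ask for the same container's metrics. It is not the committed patch and does not use the Hadoop metrics2 API: the class ContainerMetricsCacheSketch, its methods, and the use of IllegalStateException in place of MetricsException are all hypothetical stand-ins chosen for illustration.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in; not the real ContainerMetrics or DefaultMetricsSystem. */
class ContainerMetricsCacheSketch {
  // Stand-in for the metrics system: each source name may be registered only once.
  private final Set<String> registeredSources = ConcurrentHashMap.newKeySet();
  // One metrics object per container id.
  private final ConcurrentMap<String, Object> metricsByContainer = new ConcurrentHashMap<>();
  // Counts successful registrations, so callers can check that no duplicates occurred.
  final AtomicInteger registrations = new AtomicInteger();

  // Mimics the "already exists!" failure from the stack trace with a plain exception.
  private void register(String sourceName) {
    if (!registeredSources.add(sourceName)) {
      throw new IllegalStateException("Metrics source " + sourceName + " already exists!");
    }
    registrations.incrementAndGet();
  }

  /** Returns the metrics object for a container, registering it at most once. */
  Object forContainer(String containerId) {
    // computeIfAbsent runs the mapping function atomically per key, so two
    // threads asking for the same container cannot both call register().
    return metricsByContainer.computeIfAbsent(containerId, id -> {
      register("ContainerResource_" + id);
      return new Object();
    });
  }

  /** Removes the cache entry and the registered source name together. */
  void unregister(String containerId) {
    metricsByContainer.computeIfPresent(containerId, (id, metrics) -> {
      registeredSources.remove("ContainerResource_" + id);
      return null; // returning null removes the cache entry
    });
  }

  /** Starts several threads that look up the same id; returns the number of distinct objects seen. */
  static int concurrentLookups(ContainerMetricsCacheSketch cache, String id, int threads) {
    Set<Object> seen = ConcurrentHashMap.newKeySet();
    CountDownLatch start = new CountDownLatch(1);
    Thread[] workers = new Thread[threads];
    for (int i = 0; i < threads; i++) {
      workers[i] = new Thread(() -> {
        try {
          start.await(); // release all workers at once to provoke contention
          seen.add(cache.forContainer(id));
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });
      workers[i].start();
    }
    start.countDown();
    try {
      for (Thread t : workers) {
        t.join();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for workers", e);
    }
    return seen.size();
  }

  public static void main(String[] args) {
    ContainerMetricsCacheSketch cache = new ContainerMetricsCacheSketch();
    int distinct = concurrentLookups(cache, "container_1463840817638_14484_01_000010", 8);
    System.out.println("distinct metrics objects: " + distinct
        + ", registrations: " + cache.registrations.get());
  }
}
```

In this sketch the duplicate registration is avoided rather than caught. A separate defensive measure, also only a suggestion here, would be to catch the exception inside the monitoring loop so that a single failed lookup cannot terminate the thread.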