[
https://issues.apache.org/jira/browse/YARN-8035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Shane Kumpf reassigned YARN-8035:
---------------------------------
Assignee: Shane Kumpf
> Uncaught exception in ContainersMonitorImpl during relaunch due to the
> process ID changing
> ------------------------------------------------------------------------------------------
>
> Key: YARN-8035
> URL: https://issues.apache.org/jira/browse/YARN-8035
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Shane Kumpf
> Assignee: Shane Kumpf
> Priority: Major
> Attachments: YARN-8035.001.patch
>
>
> When a container is relaunched, the container ID is reused but a new
> process is spawned. For resource monitoring, {{ContainersMonitorImpl}}
> obtains the new PID after the relaunch and initializes process tree
> monitoring. As part of this initialization, a tag called {{ContainerPid}},
> whose value is the PID of the container, is added to the metrics
> associated with the container. If the prior launch failed after its
> process started, the tag already exists with the original PID, so adding
> it again results in the {{MetricsException}} below.
> {code:java}
> 2018-03-16 11:59:02,563 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Uncaught exception in ContainersMonitorImpl while monitoring resource of container_1521201379995_0001_01_000002
> org.apache.hadoop.metrics2.MetricsException: Tag ContainerPid already exists!
>     at org.apache.hadoop.metrics2.lib.MetricsRegistry.checkTagName(MetricsRegistry.java:433)
>     at org.apache.hadoop.metrics2.lib.MetricsRegistry.tag(MetricsRegistry.java:394)
>     at org.apache.hadoop.metrics2.lib.MetricsRegistry.tag(MetricsRegistry.java:400)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainerMetrics.recordProcessId(ContainerMetrics.java:277)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl$MonitoringThread.initializeProcessTrees(ContainersMonitorImpl.java:559)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl$MonitoringThread.run(ContainersMonitorImpl.java:448)
> {code}
> {{MetricsRegistry}} provides a {{tag}} method that allows updating the
> value of an existing tag. Updating the value ensures that the PID associated
> with the container is that of the currently running process, which appears
> to be an appropriate fix. However, it is unclear how this tag might be used
> by other systems. I have not found any usage in Hadoop itself.
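
The failure mode and the proposed fix described above can be illustrated with a minimal, self-contained sketch. It does not use Hadoop: {{TagRegistry}} and {{RelaunchTagDemo}} are hypothetical stand-ins written for this example, and the {{override}} flag is an assumption about how an "update the existing tag" call might look, not the actual {{MetricsRegistry}} signature or the contents of the attached patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a metrics tag registry; NOT the Hadoop class.
final class TagRegistry {
    private final Map<String, String> tags = new LinkedHashMap<>();

    // Adds a tag. If the name is already taken, the value is replaced only
    // when override is true; otherwise the call fails.
    void tag(String name, String value, boolean override) {
        if (!override && tags.containsKey(name)) {
            throw new IllegalStateException("Tag " + name + " already exists!");
        }
        tags.put(name, value);
    }

    String get(String name) {
        return tags.get(name);
    }
}

final class RelaunchTagDemo {
    static final String PID_TAG = "ContainerPid";

    // Records the PID and replaces any value left over from an earlier
    // launch of the same container ID.
    static void recordProcessId(TagRegistry registry, String pid) {
        registry.tag(PID_TAG, pid, true);
    }

    public static void main(String[] args) {
        TagRegistry registry = new TagRegistry();

        // First launch: the tag does not exist yet, so this succeeds.
        registry.tag(PID_TAG, "1234", false);

        // Relaunch without override: same container ID, new PID, tag exists.
        try {
            registry.tag(PID_TAG, "5678", false);
        } catch (IllegalStateException e) {
            System.out.println("without override: " + e.getMessage());
        }

        // Relaunch with override: the tag now holds the current PID.
        recordProcessId(registry, "5678");
        System.out.println("with override: " + registry.get(PID_TAG));
    }
}
```

In this sketch the second launch only succeeds when the existing value is overwritten, which mirrors the trade-off raised in the description: the tag always reflects the running process, but any consumer that assumed the value never changes would see it change after a relaunch.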