Shane Kumpf created YARN-8035:
---------------------------------

             Summary: Uncaught exception in ContainersMonitorImpl during relaunch due to the process ID changing
                 Key: YARN-8035
                 URL: https://issues.apache.org/jira/browse/YARN-8035
             Project: Hadoop YARN
          Issue Type: Bug
            Reporter: Shane Kumpf


When a container is relaunched, the container ID is reused but a new process 
is spawned. For resource monitoring, {{ContainersMonitorImpl}} obtains the new 
PID after the relaunch and initializes process tree monitoring. As part of 
this initialization, a tag called {{ContainerPid}}, whose value is the PID of 
the container, is added to the metrics associated with the container. If the 
prior launch failed after its process started, the tag already exists with the 
original PID, and adding it again results in the {{MetricsException}} below.
{code:java}
2018-03-16 11:59:02,563 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Uncaught exception in ContainersMonitorImpl while monitoring resource of container_1521201379995_0001_01_000002
org.apache.hadoop.metrics2.MetricsException: Tag ContainerPid already exists!
    at org.apache.hadoop.metrics2.lib.MetricsRegistry.checkTagName(MetricsRegistry.java:433)
    at org.apache.hadoop.metrics2.lib.MetricsRegistry.tag(MetricsRegistry.java:394)
    at org.apache.hadoop.metrics2.lib.MetricsRegistry.tag(MetricsRegistry.java:400)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainerMetrics.recordProcessId(ContainerMetrics.java:277)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl$MonitoringThread.initializeProcessTrees(ContainersMonitorImpl.java:559)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl$MonitoringThread.run(ContainersMonitorImpl.java:448)
{code}
{{MetricsRegistry}} provides a {{tag}} method that allows updating the value 
of an existing tag. Updating the value ensures that the PID associated with 
the container is that of the currently running process, which appears to be an 
appropriate fix. However, it's unclear how this tag might be used by other 
systems; I'm not finding any usage in Hadoop itself.
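
To make the proposed change concrete, here is a minimal sketch of the failure and of the override behavior. It does not use Hadoop classes: {{SimpleTagRegistry}} is a hypothetical stand-in that only imitates how a registry rejects a duplicate tag unless an override flag is passed, and the helper names, tag description, exception type, and PID values are all assumptions made for illustration. It is not the actual patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for illustration only; this is NOT Hadoop's MetricsRegistry.
class TagOverrideSketch {

    static class SimpleTagRegistry {
        private final Map<String, String> tags = new LinkedHashMap<>();

        // Strict form: adding a tag that already exists is an error.
        SimpleTagRegistry tag(String name, String description, String value) {
            return tag(name, description, value, false);
        }

        // Override form: when override is true, an existing value is replaced.
        // The description is accepted for shape only and is not stored here.
        SimpleTagRegistry tag(String name, String description, String value,
                              boolean override) {
            if (!override && tags.containsKey(name)) {
                throw new IllegalStateException("Tag " + name + " already exists!");
            }
            tags.put(name, value);
            return this;
        }

        String get(String name) {
            return tags.get(name);
        }
    }

    // Imitates the failing path: the PID tag is added without override.
    static void recordProcessIdStrict(SimpleTagRegistry registry, String pid) {
        registry.tag("ContainerPid", "Process ID of the container", pid);
    }

    // Imitates the proposed fix: the PID tag is updated in place.
    static void recordProcessIdWithOverride(SimpleTagRegistry registry, String pid) {
        registry.tag("ContainerPid", "Process ID of the container", pid, true);
    }

    public static void main(String[] args) {
        SimpleTagRegistry registry = new SimpleTagRegistry();

        // First launch: the tag does not exist yet, so this succeeds.
        recordProcessIdStrict(registry, "4242"); // made-up PID

        // Relaunch with a new PID: the strict form fails, as in the report.
        try {
            recordProcessIdStrict(registry, "4310"); // made-up PID
        } catch (IllegalStateException e) {
            System.out.println("strict relaunch failed: " + e.getMessage());
        }

        // Relaunch with override: the tag now reflects the running process.
        recordProcessIdWithOverride(registry, "4310");
        System.out.println("ContainerPid after override: "
                + registry.get("ContainerPid"));
    }
}
```

The sketch keeps the strict form as the default so that accidental duplicate tags elsewhere would still be reported; only the PID recording path opts in to overwriting.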



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
