[ 
https://issues.apache.org/jira/browse/YARN-8035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shane Kumpf reassigned YARN-8035:
---------------------------------

    Assignee: Shane Kumpf

> Uncaught exception in ContainersMonitorImpl during relaunch due to the 
> process ID changing
> ------------------------------------------------------------------------------------------
>
>                 Key: YARN-8035
>                 URL: https://issues.apache.org/jira/browse/YARN-8035
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Shane Kumpf
>            Assignee: Shane Kumpf
>            Priority: Major
>         Attachments: YARN-8035.001.patch
>
>
> In the case of a container relaunch event, the container ID is reused but a 
> new process is spawned. For resource monitoring, {{ContainersMonitorImpl}} 
> will obtain the new PID after the relaunch and initialize the process tree 
> monitoring. As part of this initialization, a tag called {{ContainerPid}}, 
> whose value is the PID for the container, is populated for the metrics 
> associated with the container. If the prior container failed after its 
> process started, the original PID will already be populated for the 
> container, resulting in the {{MetricsException}} below.
> {code:java}
> 2018-03-16 11:59:02,563 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Uncaught exception in ContainersMonitorImpl while monitoring resource of container_1521201379995_0001_01_000002
> org.apache.hadoop.metrics2.MetricsException: Tag ContainerPid already exists!
> at org.apache.hadoop.metrics2.lib.MetricsRegistry.checkTagName(MetricsRegistry.java:433)
> at org.apache.hadoop.metrics2.lib.MetricsRegistry.tag(MetricsRegistry.java:394)
> at org.apache.hadoop.metrics2.lib.MetricsRegistry.tag(MetricsRegistry.java:400)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainerMetrics.recordProcessId(ContainerMetrics.java:277)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl$MonitoringThread.initializeProcessTrees(ContainersMonitorImpl.java:559)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl$MonitoringThread.run(ContainersMonitorImpl.java:448)
> {code}
> {{MetricsRegistry}} provides a {{tag}} method that allows updating the 
> value of an existing tag. Updating the value ensures that the PID associated 
> with the container is that of the currently running process, which appears to 
> be an appropriate fix. However, it's unclear how this tag might be used by 
> other systems. I'm not finding any usage in Hadoop itself.
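
The failure and the proposed fix can be illustrated with a small standalone sketch. It does not use Hadoop classes: {{TagRegistry}} is a hypothetical stand-in for the tag handling in {{MetricsRegistry}}, the {{override}} flag is an assumed way of modeling the "update an existing tag" behavior described above, and {{IllegalStateException}} stands in for {{MetricsException}}. The two {{recordProcessId*}} helpers are likewise illustrative names, not the contents of the attached patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Standalone model of the ContainerPid tag problem; not Hadoop code.
class ContainerPidTagSketch {

  // Hypothetical stand-in for the tag bookkeeping of a metrics registry.
  static final class TagRegistry {
    private final Map<String, String> tags = new LinkedHashMap<>();

    // Strict variant: a second tag with the same name is rejected.
    TagRegistry tag(String name, String value) {
      return tag(name, value, false);
    }

    // With override == true, an existing tag value is replaced instead.
    TagRegistry tag(String name, String value, boolean override) {
      if (!override && tags.containsKey(name)) {
        // Stand-in for MetricsException in the real registry.
        throw new IllegalStateException("Tag " + name + " already exists!");
      }
      tags.put(name, value);
      return this;
    }

    String get(String name) {
      return tags.get(name);
    }
  }

  // Models the failing behavior: the PID tag is set without override.
  static void recordProcessIdStrict(TagRegistry registry, String pid) {
    registry.tag("ContainerPid", pid);
  }

  // Models the proposed behavior: the PID tag is updated on relaunch.
  static void recordProcessIdOverride(TagRegistry registry, String pid) {
    registry.tag("ContainerPid", pid, true);
  }

  public static void main(String[] args) {
    // The registry belongs to the container ID, so it survives a relaunch.
    TagRegistry registry = new TagRegistry();

    // First launch: the PID is recorded without problems.
    recordProcessIdStrict(registry, "1234");

    // Relaunch with a new PID: the strict call fails as in the report.
    try {
      recordProcessIdStrict(registry, "5678");
    } catch (IllegalStateException e) {
      System.out.println("strict: " + e.getMessage());
    }

    // Relaunch with override: the tag now reflects the running process.
    recordProcessIdOverride(registry, "5678");
    System.out.println("override: ContainerPid=" + registry.get("ContainerPid"));
  }
}
```

In this model the strict call leaves the stale PID in place after throwing, whereas the override call replaces it, which matches the goal of keeping the tag pointed at the currently running process. Whether consumers outside Hadoop depend on the tag value staying fixed is not something the sketch can answer.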



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

