Billie Rinaldi created YARN-9071:
------------------------------------
Summary: NM and service AM don't have updated status for reinitialized containers
Key: YARN-9071
URL: https://issues.apache.org/jira/browse/YARN-9071
Project: Hadoop YARN
Issue Type: Bug
Reporter: Billie Rinaldi
Container resource monitoring is not stopped during the reinitialization
process, which prevents the NM from obtaining updated process tree information
when the container starts running again.

I observed a reinitialized container transition from RUNNING to REINITIALIZING
to REINITIALIZING_AWAITING_KILL to SCHEDULED to RUNNING. Container monitoring
was then started a second time, but because the trackingContainers entry for
the container had already been initialized, ContainersMonitor skipped looking
up the container's new PID and IP.

When the NM later stopped the same container, it did not kill the container,
and the service AM received an unexpected event (stop at reinitializing).

A possible solution would be to stop container monitoring during the
reinitialization process, so that the process tree information is initialized
properly when monitoring is restarted.
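The failure mode and the proposed fix can be illustrated with a deliberately simplified model. This is a hypothetical sketch, not the actual ContainersMonitorImpl code: the classes SimpleMonitor and TrackedContainer, the methods startMonitoring/stopMonitoring/refresh, and the example PIDs are all invented for illustration. It assumes only what the report describes: a second "start monitoring" keeps the existing trackingContainers entry, and the PID is resolved only for entries that are not yet initialized.

```java
import java.util.HashMap;
import java.util.Map;

public class ReinitMonitorSketch {

    // Hypothetical stand-in for per-container tracking info
    // (not the real ContainersMonitorImpl data structure).
    static final class TrackedContainer {
        String pid; // null until the monitor resolves the process ID
    }

    // Hypothetical simplified monitor keyed by container ID.
    static final class SimpleMonitor {
        final Map<String, TrackedContainer> trackingContainers = new HashMap<>();

        void startMonitoring(String containerId) {
            // putIfAbsent: a second start for the same ID keeps the old entry.
            trackingContainers.putIfAbsent(containerId, new TrackedContainer());
        }

        void stopMonitoring(String containerId) {
            trackingContainers.remove(containerId);
        }

        // Resolve the PID only for entries that are not initialized yet,
        // mirroring the "skip if already initialized" behavior in the report.
        void refresh(String containerId, String currentPid) {
            TrackedContainer tc = trackingContainers.get(containerId);
            if (tc != null && tc.pid == null) {
                tc.pid = currentPid;
            }
        }

        String pidOf(String containerId) {
            TrackedContainer tc = trackingContainers.get(containerId);
            return tc == null ? null : tc.pid;
        }
    }

    // Simulates RUNNING -> REINITIALIZING -> ... -> RUNNING for one container
    // and returns the PID the monitor ends up tracking.
    static String reinitialize(boolean stopDuringReinit) {
        SimpleMonitor monitor = new SimpleMonitor();
        String id = "container_1";
        monitor.startMonitoring(id);
        monitor.refresh(id, "1001");    // original process
        if (stopDuringReinit) {
            monitor.stopMonitoring(id); // proposed fix: drop the stale entry
        }
        monitor.startMonitoring(id);    // monitoring started a second time
        monitor.refresh(id, "2002");    // relaunched process
        return monitor.pidOf(id);
    }

    public static void main(String[] args) {
        System.out.println("without fix: " + reinitialize(false)); // stale PID 1001
        System.out.println("with fix:    " + reinitialize(true));  // new PID 2002
    }
}
```

In this model, skipping the stop leaves the monitor pointing at the old process (PID 1001), so any later action based on the tracked PID targets a process that no longer exists. Removing the entry during reinitialization forces a fresh lookup (PID 2002) when monitoring restarts.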
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)