[
https://issues.apache.org/jira/browse/YARN-7239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Miklos Szegedi updated YARN-7239:
---------------------------------
Description:
ContainersLauncher.handle() submits the launch job and only then adds it to
the running collection, so a cleanup that runs in between will not find the job
and will return without signalling it. The order should be reversed in all 3
instances:
{code}
containerLauncher.submit(launch);
running.put(containerId, launch);
{code}
The cleanup code that the above code is racing with:
{code}
ContainerLaunch runningContainer = running.get(containerId);
if (runningContainer == null) {
  // Container not launched. So nothing needs to be done.
  LOG.info("Container " + containerId
      + " not running, nothing to signal.");
  return;
}
...
{code}
was:
ContainersLauncher.handle() submits the launch job and then adds the job into
the collection risking that the cleanup will miss it and return. This should be
in reversed order in all 3 instances:
{code}
containerLauncher.submit(launch);
running.put(containerId, launch);
{code}
The cleanup code the above code is racing with:
{code}
ContainerLaunch runningContainer = running.get(containerId);
if (runningContainer == null) {
  // Container not launched. So nothing needs to be done.
  LOG.info("Container " + containerId
      + " not running, nothing to signal.");
  return;
}
...
{code}
> Possible launch/cleanup race condition in ContainersLauncher
> ------------------------------------------------------------
>
> Key: YARN-7239
> URL: https://issues.apache.org/jira/browse/YARN-7239
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Miklos Szegedi
> Labels: newbie
>
> ContainersLauncher.handle() submits the launch job and only then adds it to
> the running collection, so a cleanup that runs in between will not find the
> job and will return without signalling it. The order should be reversed in
> all 3 instances:
> {code}
> containerLauncher.submit(launch);
> running.put(containerId, launch);
> {code}
> The cleanup code that the above code is racing with:
> {code}
> ContainerLaunch runningContainer = running.get(containerId);
> if (runningContainer == null) {
>   // Container not launched. So nothing needs to be done.
>   LOG.info("Container " + containerId
>       + " not running, nothing to signal.");
>   return;
> }
> ...
> {code}
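
The reordering proposed in the description can be sketched as follows. This is
a minimal, self-contained illustration and not the actual NodeManager code:
LaunchOrderSketch, Launch, and the launch()/cleanup() methods are hypothetical
stand-ins, and only the names running, containerLauncher, and containerId are
borrowed from the snippets above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class LaunchOrderSketch {
  // Hypothetical stand-in for ContainerLaunch; not the real YARN class.
  static final class Launch implements Runnable {
    volatile boolean signalled = false;

    @Override
    public void run() {
      // Container start-up work would happen here.
    }
  }

  private final Map<String, Launch> running = new ConcurrentHashMap<>();

  // Daemon thread so the sketch never keeps the JVM alive (assumption for
  // the example only).
  private final ExecutorService containerLauncher =
      Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r);
        t.setDaemon(true);
        return t;
      });

  // Register first, then submit: any cleanup that runs after put() returns
  // will find the entry, whether or not the job has started yet.
  Launch launch(String containerId) {
    Launch launch = new Launch();
    running.put(containerId, launch);
    containerLauncher.submit(launch);
    return launch;
  }

  // Mirrors the lookup in the cleanup path quoted above.
  boolean cleanup(String containerId) {
    Launch runningContainer = running.get(containerId);
    if (runningContainer == null) {
      // Container not launched. So nothing needs to be done.
      return false;
    }
    runningContainer.signalled = true;
    return true;
  }

  void shutdown() {
    containerLauncher.shutdownNow();
  }

  public static void main(String[] args) {
    LaunchOrderSketch sketch = new LaunchOrderSketch();
    sketch.launch("container_1");
    System.out.println(sketch.cleanup("container_1")); // registered container
    System.out.println(sketch.cleanup("container_2")); // unknown container
    sketch.shutdown();
  }
}
```

With this order, a cleanup can still arrive before the job actually runs, so
the launch job itself has to tolerate being signalled early. If submit() can
reject the task, the entry added by put() would also need to be removed again.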
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)