[
https://issues.apache.org/jira/browse/YARN-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16626455#comment-16626455
]
Chandni Singh edited comment on YARN-7644 at 9/24/18 9:14 PM:
--------------------------------------------------------------
For {{LAUNCH_CONTAINER}}, {{RELAUNCH_CONTAINER}}, {{RECOVER_CONTAINER}}, and
{{RECOVER_PAUSED_CONTAINER}}, the {{ContainersLauncher}} service creates a task
and submits it to the executor so that it runs in a non-blocking way:
{code:java}
containerLauncher.submit(launch);
{code}
However, for {{CLEANUP_CONTAINER}}, {{CLEANUP_CONTAINER_FOR_REINIT}},
{{SIGNAL_CONTAINER}}, {{PAUSE_CONTAINER}}, and {{RESUME_CONTAINER}}, the actions
are performed in a blocking way:
{code:java}
launcher.cleanupContainer();
{code}
With this Jira, I can focus on making the {{CLEANUP_CONTAINER}} and
{{CLEANUP_CONTAINER_FOR_REINIT}} events non-blocking.
It doesn't look like the caller ({{ContainerImpl}}) waits anywhere for
{{cleanupContainer()}} to complete synchronously. The cleanup is triggered by
dispatching {{ContainersLauncherEventType.CLEANUP_CONTAINER}} events.
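To illustrate the difference, here is a rough, self-contained sketch of the two dispatch styles. It is not the actual {{ContainersLauncher}} code: the class {{CleanupDispatchSketch}}, the {{handle}} method, the {{stopMillis}} parameter, and the cached thread pool are all hypothetical stand-ins, and the sleep only simulates a slow cleanup such as waiting on {{docker stop}}.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the event handling in ContainersLauncher.
// None of these names or signatures are taken from the Hadoop source.
class CleanupDispatchSketch {
    enum EventType { LAUNCH_CONTAINER, CLEANUP_CONTAINER, CLEANUP_CONTAINER_FOR_REINIT }

    // Assumed executor; the real service may size or configure its pool differently.
    private final ExecutorService containerLauncher = Executors.newCachedThreadPool();

    // Placeholder for slow cleanup work (simulated with a sleep).
    static void cleanupContainer(long stopMillis) {
        try {
            Thread.sleep(stopMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Handles one event and returns how long the calling (dispatcher) thread
    // was held, in milliseconds.
    long handle(EventType type, long stopMillis, boolean nonBlocking) {
        long start = System.nanoTime();
        switch (type) {
            case CLEANUP_CONTAINER:
            case CLEANUP_CONTAINER_FOR_REINIT:
                if (nonBlocking) {
                    // Hand the cleanup to the executor and return immediately.
                    containerLauncher.submit(() -> cleanupContainer(stopMillis));
                } else {
                    // Run the cleanup inline; later events wait behind it.
                    cleanupContainer(stopMillis);
                }
                break;
            default:
                break;
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    // Waits for any submitted cleanups to finish before returning.
    void shutdown() {
        containerLauncher.shutdown();
        try {
            containerLauncher.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        CleanupDispatchSketch sketch = new CleanupDispatchSketch();
        long blocking = sketch.handle(EventType.CLEANUP_CONTAINER, 200, false);
        long nonBlocking = sketch.handle(EventType.CLEANUP_CONTAINER, 200, true);
        System.out.println("blocking held the dispatcher for ~" + blocking + " ms");
        System.out.println("non-blocking held the dispatcher for ~" + nonBlocking + " ms");
        sketch.shutdown();
    }
}
```

In the blocking branch the dispatcher thread is held for the full cleanup time, so every queued event waits behind it. In the submitted branch the dispatcher returns right away and the cleanup finishes on a pool thread. A real change would also need to decide how failures in the submitted task are reported.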
cc. [~ebadger] [~jlowe]
> NM gets backed up deleting docker containers
> --------------------------------------------
>
> Key: YARN-7644
> URL: https://issues.apache.org/jira/browse/YARN-7644
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: nodemanager
> Reporter: Eric Badger
> Assignee: Chandni Singh
> Priority: Major
> Labels: Docker
>
> We are sending a {{docker stop}} to the docker container with a timeout of 10
> seconds when we shut down a container. If the container does not stop after
> 10 seconds then we force kill it. However, the {{docker stop}} command is a
> blocking call. So in cases where lots of containers don't go down with the
> initial SIGTERM, we have to wait 10+ seconds for the {{docker stop}} to
> return. This ties up the ContainerLaunch handler, so these kill events
> back up. It also appears to be backing up new container launches.