[
https://issues.apache.org/jira/browse/YARN-6495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16576925#comment-16576925
]
Jim Brennan commented on YARN-6495:
-----------------------------------
As part of YARN-8648, I am proposing that we simply remove the code that this
patch fixes. When cgroups are in use, we already pass the {{cgroup-parent}}
argument to docker, which accomplishes what this code was trying to do in a
much more deterministic and reliable way.
My proposal would be to remove this code as part of YARN-8648, but if there is
a preference for doing that in a separate Jira, I can file a new one. Assuming
there is agreement, I think we can close out this Jira.
[~Jaeboo], [~ebadger], do you agree?
> check docker container's exit code when writing to cgroup task files
> --------------------------------------------------------------------
>
> Key: YARN-6495
> URL: https://issues.apache.org/jira/browse/YARN-6495
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: nodemanager
> Reporter: Jaeboo Jeong
> Assignee: Jim Brennan
> Priority: Major
> Labels: Docker
> Attachments: YARN-6495.001.patch, YARN-6495.002.patch
>
>
> If I execute a simple command such as date in a docker container, the
> application fails to complete successfully.
> For example:
> {code}
> $ yarn jar
> $HADOOP_HOME/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.7.1.jar
> -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=hadoop-docker -shell_command "date" -jar
> $HADOOP_HOME/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.7.1.jar
> -num_containers 1 -timeout 3600000
> …
> 17/04/12 00:16:40 INFO distributedshell.Client: Application did finished
> unsuccessfully. YarnState=FINISHED, DSFinalStatus=FAILED. Breaking monitoring
> loop
> 17/04/12 00:16:40 ERROR distributedshell.Client: Application failed to
> complete successfully
> {code}
> The error log is like below.
> {code}
> ...
> Failed to write pid to file
> /cgroup_parent/cpu/hadoop-yarn/container_xxxx/tasks - No such process
> ...
> {code}
> When writing the pid to the cgroup tasks file, container-executor doesn’t
> check the docker container’s status.
> If the container finishes very quickly, the pid can no longer be written to
> the cgroup tasks file, but that is not actually a problem.
> So container-executor needs to check the docker container’s exit code when
> writing the pid to the cgroup tasks file.
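
For illustration only, the behaviour requested above could be sketched roughly
as follows. This is not the container-executor code (which is written in C) and
not either attached patch. The names add_pid_to_cgroup, container_finished and
write_pid are hypothetical, and the sketch assumes the "No such process" error
surfaces as errno ESRCH, as the log excerpt suggests.

```python
import errno
import os


def _write_pid(tasks_path, pid):
    # os.write is unbuffered, so a kernel error such as ESRCH is raised here
    # rather than later when a buffered file object is flushed.
    fd = os.open(tasks_path, os.O_WRONLY)
    try:
        os.write(fd, str(pid).encode("ascii"))
    finally:
        os.close(fd)


def add_pid_to_cgroup(tasks_path, pid, container_finished, write_pid=_write_pid):
    """Try to add pid to a cgroup tasks file.

    container_finished is a hypothetical callable that reports whether the
    docker container has already exited. Returns True if the pid was written,
    and False if the write failed with ESRCH because the container had
    already finished. Any other failure is re-raised.
    """
    try:
        write_pid(tasks_path, pid)
        return True
    except OSError as exc:
        # A missing process is only harmless if the container really is done.
        if exc.errno == errno.ESRCH and container_finished():
            return False
        raise
```

The write step is passed in as a parameter only so that the ESRCH path can be
exercised without a real cgroup hierarchy.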