[
https://issues.apache.org/jira/browse/YARN-8194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16462451#comment-16462451
]
Shane Kumpf edited comment on YARN-8194 at 5/3/18 1:37 PM:
-----------------------------------------------------------
{quote}There is no container relaunch committed to branch-3.1.{quote}
There seems to be some confusion here. The Relaunch feature was added in
Hadoop 2.9 via YARN-3998 and does exist in branch-3.1. This patch fixes an
issue that causes the NM to shut down when someone tries out upgrade, and that
upgrade code has been committed to branch-3.1, so I think the patch is needed
there as well. It looks like the patch applies to branch-3.1 without issue.
> Exception when reinitializing a container using LinuxContainerExecutor
> ----------------------------------------------------------------------
>
> Key: YARN-8194
> URL: https://issues.apache.org/jira/browse/YARN-8194
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Chandni Singh
> Assignee: Chandni Singh
> Priority: Blocker
> Fix For: 3.2.0
>
> Attachments: YARN-8194.001.patch
>
>
> When a component instance is upgraded and the container executor is set to
> {{org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor}}, then
> the following exception is seen in the nodemanager:
> {code}
> Writing to cgroup task files...
> Creating local dirs...
> Can't open
> /tmp/hadoop-yarn/nm-local-dir/usercache/hbase/appcache/application_1524242413029_0001/container_1524242413029_0001_01_000002/launch_container.sh
> for output - File exists
> Getting exit code file...
> Creating script paths...
> Full command array for failed execution:
> [/usr/local/hadoop-3.2.0-SNAPSHOT/bin/container-executor, hbase, hbase, 1,
> application_1524242413029_0001, container_1524242413029_0001_01_000002,
> /tmp/hadoop-yarn/nm-local-dir/usercache/hbase/appcache/application_1524242413029_0001/container_1524242413029_0001_01_000002,
>
> /tmp/hadoop-yarn/nm-local-dir/nmPrivate/application_1524242413029_0001/container_1524242413029_0001_01_000002/launch_container.sh,
>
> /tmp/hadoop-yarn/nm-local-dir/nmPrivate/application_1524242413029_0001/container_1524242413029_0001_01_000002/container_1524242413029_0001_01_000002.tokens,
>
> /tmp/hadoop-yarn/nm-local-dir/nmPrivate/application_1524242413029_0001/container_1524242413029_0001_01_000002/container_1524242413029_0001_01_000002.pid,
> /tmp/hadoop-yarn/nm-local-dir,
> /usr/local/hadoop-3.2.0-SNAPSHOT/logs/userlogs, cgroups=none]
> 2018-04-20 16:50:16,641 WARN
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime:
> Launch container failed. Exception:
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException:
> ExitCodeException exitCode=33: Could not create copy file 3
> /tmp/hadoop-yarn/nm-local-dir/usercache/hbase/appcache/application_1524242413029_0001/container_1524242413029_0001_01_000002/launch_container.sh
> Could not create local files and directories
> at
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:180)
> at
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.launchContainer(DefaultLinuxContainerRuntime.java:118)
> at
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:141)
> at
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:562)
> at
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:477)
> at
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:492)
> at
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:304)
> at
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:101)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: ExitCodeException exitCode=33: Could not create copy file 3
> /tmp/hadoop-yarn/nm-local-dir/usercache/hbase/appcache/application_1524242413029_0001/container_1524242413029_0001_01_000002/launch_container.sh
> Could not create local files and directories
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:1009)
> at org.apache.hadoop.util.Shell.run(Shell.java:902)
> at
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1227)
> at
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:152)
> ... 11 more
> 2018-04-20 16:50:16,642 WARN
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code
> from container container_1524242413029_0001_01_000002 is : 33
> 2018-04-20 16:50:16,642 WARN
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception
> from container-launch with container ID:
> container_1524242413029_0001_01_000002 and exit code: 33
> org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException:
> Launch container failed
> at
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.launchContainer(DefaultLinuxContainerRuntime.java:124)
> at
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:141)
> at
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:562)
> at
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:477)
> at
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:492)
> at
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:304)
> at
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:101)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 2018-04-20 16:50:16,643 INFO
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from
> container-launch.
> 2018-04-20 16:50:16,643 INFO
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id:
> container_1524242413029_0001_01_000002
> 2018-04-20 16:50:16,643 INFO
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 33
> 2018-04-20 16:50:16,643 INFO
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception
> message: Launch container failed
> 2018-04-20 16:50:16,643 INFO
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Shell error
> output: Could not create copy file 3
> /tmp/hadoop-yarn/nm-local-dir/usercache/hbase/appcache/application_1524242413029_0001/container_1524242413029_0001_01_000002/launch_container.sh
> {code}
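The "File exists" line in the log above suggests that the launch script is copied into the container work directory with create-exclusive semantics, so a second launch of the same container ID fails while the first launch's copy is still present. The following is only a minimal sketch of that failure mode, not the actual container-executor code or the attached patch. The function names, the `clean_stale` flag, and the idea of removing the stale copy before relaunch are all assumptions made for illustration; the return value 33 simply mirrors the exit code in the log.

```python
import errno
import os
import shutil
import tempfile


def copy_exclusive(src, dst):
    """Copy src to dst, failing with EEXIST if dst is already present."""
    # O_CREAT | O_EXCL makes the open fail when the destination exists.
    fd = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o700)
    with os.fdopen(fd, "wb") as out, open(src, "rb") as inp:
        shutil.copyfileobj(inp, out)


def launch(work_dir, script_src, clean_stale=False):
    """Hypothetical stand-in for one launch attempt; returns an exit-like code."""
    dst = os.path.join(work_dir, "launch_container.sh")
    if clean_stale and os.path.exists(dst):
        # Assumed remedy: drop the copy left behind by the previous launch.
        os.remove(dst)
    try:
        copy_exclusive(script_src, dst)
    except OSError as e:
        if e.errno == errno.EEXIST:
            return 33  # illustrative: same exit code as in the log
        raise
    return 0


def demo():
    """Launch, relaunch without cleanup, then relaunch with cleanup."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "nmPrivate_launch_container.sh")
        with open(src, "w") as f:
            f.write("#!/bin/bash\n")
        work = os.path.join(tmp, "container_work_dir")
        os.mkdir(work)
        first = launch(work, src)
        second = launch(work, src)
        third = launch(work, src, clean_stale=True)
        return first, second, third
```

Calling `demo()` runs an initial launch, a relaunch that hits the leftover `launch_container.sh`, and a relaunch that removes the leftover file first. Whether the real fix cleans up the old file, overwrites it, or takes another route is not stated in this message.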