[jira] [Commented] (YARN-8274) Docker command error during container relaunch

Eric Badger (JIRA) Fri, 11 May 2018 14:17:23 -0700

    [ https://issues.apache.org/jira/browse/YARN-8274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472679#comment-16472679 ]


Eric Badger commented on YARN-8274:
-----------------------------------

bq. With 3.1.1 code freeze on Saturday, it is easy to make mistakes, and I like 
to get YARN-7654 committed before end of today. YARN-7654 and YARN-8207 are 
probably left uncommitted for too long

I understand that you want to get these patches into 3.1.1, but I don't believe 
we should rush to get features into releases and in the process compromise on 
quality. Rushed patches/reviews lead to bugs like this happening at an elevated 
rate. I'm also not particularly compelled by the argument that YARN-7654 and 
YARN-8207 have been uncommitted for too long. YARN-8207 ended up being a 127 kB 
patch of entirely C code, which is incredibly time-consuming to review, while 
YARN-7654 is now on patch number 23. It's not like these aren't getting 
reviewed; they are just going through the normal process of comprehensive review. 
I think that YARN-8207 getting committed in 2 weeks is a semi-miracle given the 
size, complexity, and possible ramifications of the changes. Reviewing that 
much C code (especially in a setuid binary) across 10 different patches is 
basically a full-time job. [~jlowe] has spent far more hours/days than I 
think should reasonably be expected and is still working in an attempt to get 
these patches into 3.1.1. If anything, he should be commended and thanked for 
his yeoman’s effort here regardless of whether YARN-7654 makes it into 3.1.1.

So, while I understand that deadlines exist and that we should strive to meet 
them, I don't believe that we should rush patches in solely because of a 
deadline. That destabilizes the project and causes more work for everyone. If a 
patch/feature isn't fully ready, we should step back and get it into the next 
release rather than cut time on reviews and possibly miss something. At the end 
of the day, if we are consistently introducing bugs like this, as we have been 
recently, then we are clearly iterating too quickly and need to spend more 
time reviewing each patch instead of rushing it to commit. 

> Docker command error during container relaunch
> ----------------------------------------------
>
>                 Key: YARN-8274
>                 URL: https://issues.apache.org/jira/browse/YARN-8274
>             Project: Hadoop YARN
>          Issue Type: Task
>            Reporter: Billie Rinaldi
>            Assignee: Jason Lowe
>            Priority: Critical
>             Fix For: 3.2.0, 3.1.1
>
>         Attachments: YARN-8274.001.patch, YARN-8274.002.patch
>
>
> I initiated container relaunch with a "sleep 60; exit 1" launch command and 
> saw a "not a docker command" error on relaunch. Haven't figured out why this 
> is happening, but it seems like it has been introduced recently to 
> trunk/branch-3.1. cc [[email protected]] [~ebadger]
> {noformat}
> org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException: Relaunch container failed
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.relaunchContainer(DockerLinuxContainerRuntime.java:954)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.relaunchContainer(DelegatingLinuxContainerRuntime.java:150)
>         at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:562)
>         at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.relaunchContainer(LinuxContainerExecutor.java:486)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.relaunchContainer(ContainerLaunch.java:504)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerRelaunch.call(ContainerRelaunch.java:111)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerRelaunch.call(ContainerRelaunch.java:47)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> 2018-05-09 21:41:46,631 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from container-launch.
> 2018-05-09 21:41:46,631 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: container_1525897486447_0003_01_000002
> 2018-05-09 21:41:46,631 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 7
> 2018-05-09 21:41:46,631 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception message: Relaunch container failed
> 2018-05-09 21:41:46,631 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Shell error output: docker: 'container_1525897486447_0003_01_000002' is not a docker command.
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
