[ 
https://issues.apache.org/jira/browse/YARN-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15374138#comment-15374138
 ] 

Shane Kumpf commented on YARN-4759:
-----------------------------------

The two remaining checkstyle errors occur because the package names are longer 
than 80 characters. Other existing examples have the same issue, so I assume 
these errors can be ignored?

Also, the changes to container-executor are necessary because the exitcode file 
is used in the container reacquisition process. Without these changes, the 
exitcode file is not written as the NM user, so the NM cannot read it during 
recovery. Since the exitcode file lives in nmPrivate, ensuring that the file is 
written as the NM user seems appropriate. 
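
To illustrate why the file's ownership matters, here is a minimal sketch of the 
read step that reacquisition depends on. It is not the NodeManager's actual 
code: the class name ExitCodeFileSketch and the method readExitCode are 
hypothetical, and it uses plain java.nio instead of the commons-io FileUtils 
call that appears in the stack trace. It only shows that a file the NM user 
cannot read leaves the NM without an exit code for the container.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper for illustration only; not part of YARN. */
public final class ExitCodeFileSketch {
  private ExitCodeFileSketch() {
  }

  /**
   * Reads the integer exit code stored in an exit-code file such as
   * "<pid file>.exitcode". The file must be readable by the current
   * process user (the NM user, in the NodeManager's case).
   */
  public static int readExitCode(Path exitCodeFile) throws IOException {
    // A missing file, or one the current user has no read permission on,
    // fails here with a message similar to the one in the trace.
    if (!Files.isReadable(exitCodeFile)) {
      throw new IOException("File '" + exitCodeFile + "' cannot be read");
    }
    String contents = new String(
        Files.readAllBytes(exitCodeFile), StandardCharsets.UTF_8).trim();
    try {
      return Integer.parseInt(contents);
    } catch (NumberFormatException e) {
      // Surface malformed contents as an I/O problem for the caller.
      throw new IOException(
          "File '" + exitCodeFile + "' does not contain an exit code", e);
    }
  }
}
```

With the file written as the NM user, a call such as 
ExitCodeFileSketch.readExitCode(path) returns the recorded exit code. With the 
file owned by another user and not readable by the NM, the same call throws an 
IOException and recovery of that container fails.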

Root privileges are also dropped after issuing the docker-related commands.

Below is the exception thrown during recovery without this change.

{code}
2016-07-12 17:32:59,831 ERROR org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch: Unable to recover container container_1468357024753_0004_01_000002
java.io.IOException: File '/usr/local/src/hadoop_install/hadoop/tmp/yarn/nm-local-dir/nmPrivate/application_1468357024753_0004/container_1468357024753_0004_01_000002/container_1468357024753_0004_01_000002.pid.exitcode' cannot be read
        at org.apache.commons.io.FileUtils.openInputStream(FileUtils.java:296)
        at org.apache.commons.io.FileUtils.readFileToString(FileUtils.java:1711)
        at org.apache.commons.io.FileUtils.readFileToString(FileUtils.java:1748)
        at org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor.reacquireContainer(ContainerExecutor.java:232)
        at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.reacquireContainer(LinuxContainerExecutor.java:479)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:85)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:48)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
{code}

> Revisit signalContainer() for docker containers
> -----------------------------------------------
>
>                 Key: YARN-4759
>                 URL: https://issues.apache.org/jira/browse/YARN-4759
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: yarn
>            Reporter: Sidharta Seethana
>            Assignee: Shane Kumpf
>         Attachments: YARN-4759.001.patch, YARN-4759.002.patch
>
>
> The current signal handling (in the DockerContainerRuntime) needs to be 
> revisited for docker containers. For example, container reacquisition on NM 
> restart might not work, depending on which user the process in the container 
> runs as. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

