Jason Lowe commented on YARN-7999:

Thanks for the logs!  I don't see how this could be a race: the container 
executor is not multithreaded, and it doesn't start running the docker command 
for a container until it has finished creating the directories for the container.

From the logs it looks like we never got around to running a docker command at 
all; rather, the mount security checks within the container executor are 
failing.  The "Creating local dirs..." log implies that the local directories 
(including log directories per my previous comment) are being created, and 
that's just before it tries to construct the docker run command which checks 
the mount permissions.

I don't see an error like "Could not determine real path of mount" or "Could 
not stat path" in the launch logs, so I'm guessing the log directory is 
actually being created.  You could try setting 
yarn.nodemanager.delete.debug-delay-sec to a large enough value to facilitate 
verifying the log directory is actually there.  Since it doesn't complain 
about being unable to stat the mount path before rejecting the mount, I 
suspect it is there.  That leads me to believe it considers the path 
disallowed rather than missing, which implies that either the path is missing 
from the whitelisted paths in the container executor config or something is 
wrong with YARN-7626, which recently went into trunk.
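
To make the distinction above concrete, here is a minimal Python sketch of the 
two failure modes.  It is not the container executor code (which is written in 
C), and the function name, return values, and allowed-roots list are all 
hypothetical; it only illustrates why a missing path and a disallowed path 
would be reported differently.

```python
import os
import tempfile


def classify_mount(path, allowed_roots):
    """Hypothetical helper: classify a requested mount path.

    Returns "missing" if the path does not exist (the case that would
    produce a "Could not determine real path of mount" style error),
    "not_allowed" if it exists but resolves outside every allowed root,
    and "ok" otherwise. This is an illustration, not the real check.
    """
    if not os.path.exists(path):
        return "missing"
    real = os.path.realpath(path)  # resolve symlinks before comparing
    for root in allowed_roots:
        root_real = os.path.realpath(root)
        # Match the root itself or anything strictly beneath it.
        if real == root_real or real.startswith(root_real.rstrip(os.sep) + os.sep):
            return "ok"
    return "not_allowed"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        allowed = os.path.join(base, "allowed")
        os.makedirs(os.path.join(allowed, "logs"))
        os.makedirs(os.path.join(base, "elsewhere"))
        for candidate in ("allowed/logs", "allowed/absent", "elsewhere"):
            print(candidate, classify_mount(os.path.join(base, candidate), [allowed]))
```

In this sketch an existing directory outside the allowed roots falls through 
to "not_allowed" without ever reporting a stat or realpath failure, which is 
the pattern I suspect we are seeing in the launch logs.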

> Docker launch fails when user private filecache directory is missing
> --------------------------------------------------------------------
>                 Key: YARN-7999
>                 URL: https://issues.apache.org/jira/browse/YARN-7999
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 3.1.0
>            Reporter: Eric Yang
>            Assignee: Jason Lowe
>            Priority: Major
>         Attachments: YARN-7999.001.patch, YARN-7999.002.patch, q3.log
> Docker container is failing to launch in trunk.  The root cause is:
> {code}
> [COMPINSTANCE sleeper-1 : container_1520032931921_0001_01_000020]: 
> [2018-03-02 23:26:09.196]Exception from container-launch.
> Container id: container_1520032931921_0001_01_000020
> Exit code: 29
> Exception message: image: hadoop/centos:latest is trusted in hadoop registry.
> Could not determine real path of mount 
> '/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache'
> Could not determine real path of mount 
> '/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache'
> Invalid docker mount 
> '/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache:/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache',
>  realpath=/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache
> Error constructing docker command, docker error code=12, error 
> message='Invalid docker mount'
> Shell output: main : command provided 4
> main : run as user is hbase
> main : requested yarn user is hbase
> Creating script paths...
> Creating local dirs...
> [2018-03-02 23:26:09.240]Diagnostic message from attempt 0 : [2018-03-02 
> 23:26:09.240]
> [2018-03-02 23:26:09.240]Container exited with a non-zero exit code 29.
> [2018-03-02 23:26:39.278]Could not find 
> nmPrivate/application_1520032931921_0001/container_1520032931921_0001_01_000020//container_1520032931921_0001_01_000020.pid
>  in any of the directories
> [COMPONENT sleeper]: Failed 11 times, exceeded the limit - 10. Shutting down 
> now...
> {code}
> The filecache cannot be mounted because it doesn't exist.
