[
https://issues.apache.org/jira/browse/YARN-7999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16397344#comment-16397344
]
Eric Yang edited comment on YARN-7999 at 3/13/18 5:38 PM:
----------------------------------------------------------
[~jlowe] This is a non-entry_point docker image with a clean checkout of
trunk code. There is no shared storage between nodes. I turned off the docker
daemon on one of the nodes and this problem surfaced. However, I think my
earlier analysis is incorrect: the directory problem also exists on nodes that
have a functional docker daemon. It appears to be a race condition between log
directory creation and docker run. Please see the attached log file (q3.log).
> Docker launch fails when user private filecache directory is missing
> --------------------------------------------------------------------
>
> Key: YARN-7999
> URL: https://issues.apache.org/jira/browse/YARN-7999
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 3.1.0
> Reporter: Eric Yang
> Assignee: Jason Lowe
> Priority: Major
> Attachments: YARN-7999.001.patch, YARN-7999.002.patch
>
>
> Docker container is failing to launch in trunk. The root cause is:
> {code}
> [COMPINSTANCE sleeper-1 : container_1520032931921_0001_01_000020]:
> [2018-03-02 23:26:09.196]Exception from container-launch.
> Container id: container_1520032931921_0001_01_000020
> Exit code: 29
> Exception message: image: hadoop/centos:latest is trusted in hadoop registry.
> Could not determine real path of mount
> '/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache'
> Could not determine real path of mount
> '/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache'
> Invalid docker mount
> '/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache:/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache',
> realpath=/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache
> Error constructing docker command, docker error code=12, error
> message='Invalid docker mount'
> Shell output: main : command provided 4
> main : run as user is hbase
> main : requested yarn user is hbase
> Creating script paths...
> Creating local dirs...
> [2018-03-02 23:26:09.240]Diagnostic message from attempt 0 : [2018-03-02
> 23:26:09.240]
> [2018-03-02 23:26:09.240]Container exited with a non-zero exit code 29.
> [2018-03-02 23:26:39.278]Could not find
> nmPrivate/application_1520032931921_0001/container_1520032931921_0001_01_000020//container_1520032931921_0001_01_000020.pid
> in any of the directories
> [COMPONENT sleeper]: Failed 11 times, exceeded the limit - 10. Shutting down
> now...
> {code}
> The filecache cannot be mounted because it doesn't exist.
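
To illustrate the failure mode in the log above: resolving the real path of a
directory that has not been created yet fails, and the same lookup succeeds
once the directory exists. The sketch below reproduces that with plain
java.nio calls. It is only an illustration, not the YARN-7999 patch and not the
NodeManager's actual code; the class name MountPathCheck, its two helper
methods, and the temporary directory layout are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

// Hypothetical helper, for illustration only (not part of Hadoop).
public class MountPathCheck {

    // Returns true if the mount source resolves to a real path,
    // false if the path does not exist yet.
    static boolean canResolve(Path mountSource) throws IOException {
        try {
            mountSource.toRealPath();
            return true;
        } catch (NoSuchFileException e) {
            return false;
        }
    }

    // Creates the directory (and any missing parents) before resolving it,
    // so the lookup cannot fail because the directory is missing.
    static Path ensureAndResolve(Path mountSource) throws IOException {
        Files.createDirectories(mountSource);
        return mountSource.toRealPath();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the nm-local-dir; the real location comes from YARN config.
        Path base = Files.createTempDirectory("nm-local-dir");
        Path filecache = base.resolve("usercache").resolve("hbase").resolve("filecache");

        System.out.println("resolvable before creation: " + canResolve(filecache));
        ensureAndResolve(filecache);
        System.out.println("resolvable after creation: " + canResolve(filecache));
    }
}
```

The point of the sketch is the ordering: whatever creates the user's private
filecache directory has to finish before the mount list is validated,
otherwise the launch fails in the way shown in the log.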