[ 
https://issues.apache.org/jira/browse/YARN-9647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16885995#comment-16885995
 ] 

KWON BYUNGCHANG commented on YARN-9647:
---------------------------------------

[~ebadger] Thanks for your comments.

YARN validates volume mounts in three steps:

step 1. Validate the mountable paths (docker.allowed.ro-mounts, 
docker.allowed.rw-mounts) that the YARN administrator configured in 
/etc/hadoop/conf/container-executor.cfg.
step 2. Validate the mount points configured by the user.
step 3. Verify that each mount point from step 2 falls under a mountable path from step 1.
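
To make the failure mode concrete, here is a rough Python sketch of the three steps as a strict check. It is only an illustration and not the container-executor code (which is C); the names resolve, is_under and validate_mounts are hypothetical, and I assume that "resolving" a path means it must exist on disk.

```python
from pathlib import Path


def resolve(path):
    """Return the real path, or None if it cannot be resolved."""
    try:
        return Path(path).resolve(strict=True)
    except (OSError, RuntimeError):
        return None


def is_under(child, parent):
    """True if child equals parent or lies somewhere below it."""
    return child == parent or parent in child.parents


def validate_mounts(allowed, requested):
    """Hypothetical strict version of steps 1-3."""
    # Step 1: every administrator-configured path must resolve.
    allowed_real = []
    for path in allowed:
        real = resolve(path)
        if real is None:
            raise ValueError(f"Could not determine real path of mount '{path}'")
        allowed_real.append(real)

    # Step 2: every user-requested mount point must resolve.
    requested_real = []
    for path in requested:
        real = resolve(path)
        if real is None:
            raise ValueError(f"Could not determine real path of mount '{path}'")
        requested_real.append(real)

    # Step 3: each requested mount must fall under an allowed path.
    for real in requested_real:
        if not any(is_under(real, base) for base in allowed_real):
            raise ValueError(f"Mount '{real}' is not under a permitted path")
    return requested_real
```

In this sketch, a single unresolvable entry in the allowed list makes step 1 raise, even when none of the requested mounts touch that entry.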

If /data2 is unhealthy, the mount point configuration (step 2) does not contain 
/data2, because the NodeManager already knows that /data2 is unhealthy.
The problem is that /data2 still appears in /etc/hadoop/conf/container-executor.cfg, 
because container-executor.cfg is a static configuration file.
As a result, the Docker launch fails in step 1, because container-executor cannot 
resolve the real path of /data2. 
My patch simply changes step 1 to ignore mountable paths that cannot be resolved.
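
The idea behind that change can be sketched in Python as follows. This is not the patch itself (the real code is C in docker-util.c); normalize_allowed_mounts is a hypothetical name, and I assume an unresolvable path is one that does not exist or cannot be read from disk.

```python
from pathlib import Path


def normalize_allowed_mounts(allowed):
    """Resolve administrator-configured mounts, skipping unresolvable ones.

    Returns (resolved, skipped) so the caller can still log what was ignored.
    """
    resolved, skipped = [], []
    for path in allowed:
        try:
            resolved.append(str(Path(path).resolve(strict=True)))
        except (OSError, RuntimeError):
            # e.g. the disk behind this path has failed: ignore the entry
            # instead of failing the whole launch.
            skipped.append(path)
    return resolved, skipped
```

Steps 2 and 3 stay unchanged, so a user mount under a skipped path is still rejected in step 3 because no permitted path covers it.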

> Docker launch fails when local-dirs or log-dirs is unhealthy.
> -------------------------------------------------------------
>
>                 Key: YARN-9647
>                 URL: https://issues.apache.org/jira/browse/YARN-9647
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 3.1.2
>            Reporter: KWON BYUNGCHANG
>            Priority: Major
>         Attachments: YARN-9647.001.patch, YARN-9647.002.patch
>
>
> My /etc/hadoop/conf/container-executor.cfg:
> {code}
> [docker]
>    docker.allowed.ro-mounts=/data1/hadoop/yarn/local,/data2/hadoop/yarn/local
>    docker.allowed.rw-mounts=/data1/hadoop/yarn/local,/data2/hadoop/yarn/local
> {code}
> If /data2 is unhealthy, the Docker launch fails even though the container can 
> use /data1 as its local-dir and log-dir.
> The error message is below:
> {code}
> [2019-06-25 14:55:26.168]Exception from container-launch. Container id: 
> container_e50_1561100493387_5185_01_000597 Exit code: 29 Exception message: 
> Launch container failed Shell error output: Could not determine real path of 
> mount '/data2/hadoop/yarn/local' Could not determine real path of mount 
> '/data2/hadoop/yarn/local' Unable to find permitted docker mounts on disk 
> Error constructing docker command, docker error code=16, error message='Mount 
> access error' Shell output: main : command provided 4 main : run as user is 
> magnum main : requested yarn user is magnum Creating script paths... Creating 
> local dirs... [2019-06-25 14:55:26.189]Container exited with a non-zero exit 
> code 29. [2019-06-25 14:55:26.192]Container exited with a non-zero exit code 
> 29. 
> {code}
> The root cause is that normalize_mounts() in docker-util.c returns -1 because 
> it cannot resolve the real path of /data2/hadoop/yarn/local (note that the 
> /data2 disk has failed at this point).
> However, the disks behind the NM local dirs and NM log dirs can fail at any time.
> The Docker launch should succeed as long as local dirs and log dirs are still available.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
