[ 
https://issues.apache.org/jira/browse/YARN-8587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16567431#comment-16567431
 ] 

Eric Yang commented on YARN-8587:
---------------------------------

DistributedShell uses the YARN v1 API, which cannot distinguish between the 
container-executor running and the docker container running.  If docker run 
fails because of invalid parameters supplied by distributed shell, it may take 
up to a minute for the container to be marked as failed, because the status 
reaches the AM and RM only on the next heartbeat interval.  The recommendation 
is to update the test case to use yarn container -list [appId] to check the 
container's running status from the RM.  This shortens the wait, but it does 
not completely eliminate possible network delay in the container status report.
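
As a rough illustration of that recommendation, the test could poll the RM 
until the container is reported as RUNNING, rather than inspecting docker 
right after allocation.  This is only a sketch under assumptions: the function 
name wait_for_container_running, the YARN_CMD override, and the timeout and 
interval defaults are hypothetical, and it assumes that the output of yarn 
container -list puts the container ID and its state on the same line.  Which 
ID form the -list option accepts may also depend on the Hadoop version.

```shell
# Command used to query the RM. YARN_CMD is a hypothetical override so the
# loop can be exercised with a stub when no cluster is available.
YARN_CMD="${YARN_CMD:-yarn}"

# wait_for_container_running <list-id> <container-id> [timeout_s] [interval_s]
# Polls "yarn container -list <list-id>" until the row for <container-id>
# contains the word RUNNING. Returns 0 on success, 1 once the timeout is hit.
# Assumption: container ID and state appear on the same output line.
wait_for_container_running() {
  list_id="$1"
  container_id="$2"
  timeout="${3:-90}"    # seconds; default chosen to exceed the ~1 minute delay
  interval="${4:-5}"    # seconds between polls; should be greater than zero
  elapsed=0
  while :; do
    if "$YARN_CMD" container -list "$list_id" 2>/dev/null \
        | grep -F "$container_id" | grep -qw "RUNNING"; then
      return 0
    fi
    # Give up once the approximate elapsed time reaches the timeout.
    [ "$elapsed" -ge "$timeout" ] && return 1
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
}
```

A test would call wait_for_container_running with the ID it passes to yarn 
container -list and the allocated container ID, and run docker inspect only 
after the function returns 0.  Elapsed time is approximated by summing the 
sleep intervals, so the real wait can be somewhat longer than the timeout.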

> Delays are noticed to launch docker container
> ---------------------------------------------
>
>                 Key: YARN-8587
>                 URL: https://issues.apache.org/jira/browse/YARN-8587
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 3.1.1
>            Reporter: Yesha Vora
>            Priority: Major
>              Labels: Docker
>
> Launch dshell application. Wait for application to go in RUNNING state.
> {code:java}
> yarn  jar /xx/hadoop-yarn-applications-distributedshell-*.jar  -shell_command 
> "sleep 300" -num_containers 1 -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker 
> -shell_env YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=httpd:0.1 -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_DELAYED_REMOVAL=true -jar 
> /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-xx.jar
> {code}
> Find out container allocation. Run docker inspect command for docker 
> containers launched by app.
> Sometimes, the container is allocated to NM but docker PID is not up.
> {code:java}
> Command ssh -q -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null 
> xxx "sudo su - -c \"docker ps  -a | grep 
> container_e02_1531189225093_0003_01_000002\" root" failed after 0 retries 
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
