[ 
https://issues.apache.org/jira/browse/YARN-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16917003#comment-16917003
 ] 

Eric Yang commented on YARN-9561:
---------------------------------

[~ebadger] {quote}For your 3rd point, I think it would be better to do this 
check in Java. That way we can catch the failure earlier since all containers 
trying to run runC will fail if overlay is not installed. For the check, I was 
thinking of doing an lsmod on "overlay"{quote}

Do you mean the C side?  The Java side cannot run modprobe or lsmod because 
it does not run with root privileges.
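
For reference, a minimal sketch of the kind of check being discussed. Instead 
of calling lsmod it reads /proc/filesystems, which also lists overlay when it 
is built into the kernel rather than loaded as a module. The has_overlay 
helper and its optional file argument are hypothetical names used only for 
illustration; they are not part of any patch on this issue. Whether the 
calling user can read that file on a given host is an assumption.

```shell
# Hypothetical helper (not from any YARN-9561 patch): returns 0 if "overlay"
# is listed as a registered filesystem, non-zero otherwise.
# The optional argument exists only so the check can be run on sample input.
has_overlay() {
    fs_list="${1:-/proc/filesystems}"
    # Each line is "[nodev]<TAB>name"; the filesystem name is the last field.
    awk '$NF == "overlay" { found = 1 } END { exit !found }' "$fs_list"
}

if has_overlay; then
    echo "overlay is available"
else
    echo "overlay is not available" >&2
fi
```

An lsmod-style variant would look for a line starting with "overlay " in 
/proc/modules, but that would miss a built-in overlay driver.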

It took me several days to restore my cluster to a working state with the 
overlay kernel module installed.  With the latest patch (004), the mapreduce 
pi job fails:
{code}
vars="YARN_CONTAINER_RUNTIME_TYPE=runc,YARN_CONTAINER_RUNTIME_RUNC_IMAGE=local/java-centos:latest"
./bin/hadoop jar \
  share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.0-SNAPSHOT.jar pi \
  -Dmapreduce.map.env="$vars" -Dmapreduce.reduce.env="$vars" 10 100
{code}

Observed log output:
{code}
2019-08-27 11:46:08,377 INFO mapreduce.Job: Task Id : 
attempt_1566930487263_0002_m_000002_2, Status : FAILED
[2019-08-27 11:46:07.131]Exception from container-launch.
Container id: container_1566930487263_0002_01_000030
Exit code: 1
Exception message: Launch container failed

[2019-08-27 11:46:07.133]Container exited with a non-zero exit code 1. Error 
file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Error: Could not find or load main class org.apache.hadoop.mapred.YarnChild


[2019-08-27 11:46:07.134]Container exited with a non-zero exit code 1. Error 
file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Error: Could not find or load main class org.apache.hadoop.mapred.YarnChild
{code}

My base container image contains only centos:latest with the 
java-1.8.0-openjdk rpm installed.  It does not include the Hadoop binaries.  
Do we need implicit mounting of the Hadoop binaries to enable existing 
workloads to run with runc?  If not, what steps can be used to run an example 
app?
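
To confirm that the missing class is simply a matter of the image lacking the 
Hadoop jars, a rough sketch like the following could be run against an 
unpacked image root filesystem. It assumes YarnChild ships in the 
hadoop-mapreduce-client-app jar, as in a stock Hadoop distribution. The helper 
name has_mr_app_jar and the rootfs path in the usage comment are hypothetical.

```shell
# Hypothetical helper: succeeds if a hadoop-mapreduce-client-app jar exists
# anywhere under the given directory (defaults to the current directory).
has_mr_app_jar() {
    root="${1:-.}"
    # grep -q . succeeds only if find printed at least one matching path.
    find "$root" -type f -name 'hadoop-mapreduce-client-app-*.jar' 2>/dev/null \
        | grep -q .
}

# Example usage (the path is a placeholder for an unpacked image rootfs):
#   has_mr_app_jar /path/to/rootfs && echo "MapReduce app jar present"
```

A non-zero result for the image would be consistent with the "Could not find 
or load main class" error above.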

> Add C changes for the new RuncContainerRuntime
> ----------------------------------------------
>
>                 Key: YARN-9561
>                 URL: https://issues.apache.org/jira/browse/YARN-9561
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Eric Badger
>            Assignee: Eric Badger
>            Priority: Major
>         Attachments: YARN-9561.001.patch, YARN-9561.002.patch, 
> YARN-9561.003.patch, YARN-9561.004.patch
>
>
> This JIRA will be used to add the C changes to the container-executor native 
> binary that are necessary for the new RuncContainerRuntime. There should be 
> no changes to existing code paths. 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
