Jack Zhu created YARN-9549:
------------------------------
Summary: Not able to run pyspark in docker driver container on Yarn3
Key: YARN-9549
URL: https://issues.apache.org/jira/browse/YARN-9549
Project: Hadoop YARN
Issue Type: Bug
Components: yarn
Affects Versions: 3.1.2
Environment: Hadoop 3.1.1.3.1.0.0-78
spark version 2.3.2.3.1.0.0-78
Using Scala version 2.11.8, Java HotSpot(TM) 64-Bit Server VM, 1.8.0_211
Server: Docker Engine - Community Version: 18.09.6
Reporter: Jack Zhu
Attachments: Dockerfile, test.py
I followed
[https://hadoop.apache.org/docs/r3.2.0/hadoop-yarn/hadoop-yarn-site/DockerContainers.html]
to build a Spark Docker image to run pyspark. There isn't a good document
describing how to spark-submit a pyspark job to a Hadoop 3 cluster, so I used
the command below to launch my simple Python job:
PYSPARK_DRIVER_PYTHON=/usr/local/bin/python3.5 spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 3 \
  --executor-memory 1g \
  --conf spark.executorEnv.YARN_CONTAINER_RUNTIME_TYPE=docker \
  --conf spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=local/spark:v1.0.8 \
  --conf spark.yarn.AppMasterEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=local/spark:v1.0.8 \
  --conf spark.yarn.AppMasterEnv.YARN_CONTAINER_RUNTIME_TYPE=docker \
  ./test.py
In test.py, I simply collect the hostname from the executors and check
whether the Python job runs in a container or not.
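The attached test.py is the authoritative script; as a rough illustration of the probe it describes, a minimal sketch could look like the following. The /.dockerenv check, the app name, and the 9-task partition count are assumptions, not taken from the attachment.

```python
# Hypothetical sketch (the attached test.py is authoritative): print the
# hostname and a container flag on the driver, then run the same container
# check on each executor task.
import os
import socket


def in_container():
    """Heuristic: Docker creates /.dockerenv at the container's root."""
    return os.path.exists("/.dockerenv")


def probe(_):
    # Runs on an executor; reports whether that task saw a container.
    return in_container()


if __name__ == "__main__":
    try:
        # pyspark is only available when submitted to a real cluster.
        from pyspark.sql import SparkSession
    except ImportError:
        SparkSession = None
    if SparkSession is not None:
        spark = SparkSession.builder.appName("container-probe").getOrCreate()
        # Driver-side check: hostname and container flag.
        print(socket.gethostname())
        print(in_container())
        print("============>going to print all the container names.")
        # Executor-side checks, one per task.
        flags = spark.sparkContext.parallelize(range(9), 9).map(probe).collect()
        print(flags)
        spark.stop()
```

Run under `--deploy-mode cluster`, the first two lines show where the driver landed, and the collected list shows whether each executor task ran inside a container.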
I found that the driver always runs directly on the host, not in the
container. As a result we have to keep the Python version on every
NodeManager consistent with the one in the Docker image, which makes it
meaningless to use Docker to package all the dependencies.
The Spark job itself runs successfully; below is the stdout:
Log Type: stdout
Log Upload Time: Tue May 14 02:07:06 +0000 2019
Log Length: 141
host.test.com
False
============>going to print all the container names.
[True, True, True, True, True, True, True, True, True]
Please see the attached Dockerfile and test.py.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)