[ https://issues.apache.org/jira/browse/YARN-9549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839068#comment-16839068 ]
Jack Zhu commented on YARN-9549:
--------------------------------
Thanks for your reply. I have attached my yarn-site.xml.
> Not able to run pyspark in docker driver container on Yarn3
> -----------------------------------------------------------
>
> Key: YARN-9549
> URL: https://issues.apache.org/jira/browse/YARN-9549
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Affects Versions: 3.1.2
> Environment: Hadoop 3.1.1.3.1.0.0-78
> spark version 2.3.2.3.1.0.0-78
> Using Scala version 2.11.8, Java HotSpot(TM) 64-Bit Server VM, 1.8.0_211
> Server: Docker Engine - Community Version: 18.09.6
> Reporter: Jack Zhu
> Priority: Critical
> Attachments: Dockerfile, test.py, yarn-site.xml
>
>
> I followed
> [https://hadoop.apache.org/docs/r3.2.0/hadoop-yarn/hadoop-yarn-site/DockerContainers.html]
> to build a Spark Docker image for running PySpark. Since there is no good
> documentation describing how to use spark-submit to submit a PySpark job to a
> Hadoop 3 cluster, I used the command below to launch my simple Python job:
> PYSPARK_DRIVER_PYTHON=/usr/local/bin/python3.5 spark-submit --master yarn \
>   --deploy-mode cluster --num-executors 3 --executor-memory 1g \
>   --conf spark.executorEnv.YARN_CONTAINER_RUNTIME_TYPE=docker \
>   --conf spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=local/spark:v1.0.8 \
>   --conf spark.yarn.AppMasterEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=local/spark:v1.0.8 \
>   --conf spark.yarn.AppMasterEnv.YARN_CONTAINER_RUNTIME_TYPE=docker \
>   ./test.py
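The same submission can be assembled programmatically, which makes it easy to confirm that the Docker runtime variables are set identically for the application master (which hosts the driver in cluster mode) and for the executors. This is a minimal sketch, not a fix confirmed on this issue; `build_submit_args` and the constants are hypothetical names, and the `PYSPARK_DRIVER_PYTHON` setting is omitted. One detail may be worth checking: Spark documents the AM prefix as `spark.yarn.appMasterEnv.` (lowercase "a"), and Spark configuration keys are case-sensitive. The sketch uses the documented spelling, whereas the command above uses `spark.yarn.AppMasterEnv.`.

```python
import shlex
from typing import List

# Image tag taken from the report; substitute your own image.
IMAGE = "local/spark:v1.0.8"

# Environment variables that ask the NodeManager to launch the container
# with the Docker runtime.
DOCKER_ENV = {
    "YARN_CONTAINER_RUNTIME_TYPE": "docker",
    "YARN_CONTAINER_RUNTIME_DOCKER_IMAGE": IMAGE,
}

# Spark's documented prefixes for forwarding environment variables to the
# application master (driver in cluster mode) and to the executors.
AM_ENV_PREFIX = "spark.yarn.appMasterEnv."
EXECUTOR_ENV_PREFIX = "spark.executorEnv."


def build_submit_args(app: str, num_executors: int = 3,
                      executor_memory: str = "1g") -> List[str]:
    """Return a spark-submit argv with the Docker env set for AM and executors."""
    args = [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", "cluster",
        "--num-executors", str(num_executors),
        "--executor-memory", executor_memory,
    ]
    # Apply the same variables under both prefixes so driver and executors match.
    for prefix in (AM_ENV_PREFIX, EXECUTOR_ENV_PREFIX):
        for key, value in DOCKER_ENV.items():
            args += ["--conf", f"{prefix}{key}={value}"]
    args.append(app)
    return args


if __name__ == "__main__":
    # Print a shell-quoted command line (shlex.join needs Python 3.8+).
    print(shlex.join(build_submit_args("./test.py")))
```

Running it prints one shell-quoted command line equivalent to the one above, apart from the prefix spelling and the omitted driver Python setting.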
>
> In test.py, the job simply collects the hostname from the executors and checks
> whether the Python code runs in a container or not.
> I found that the driver always runs directly on the host, not in the
> container. As a result, we need to keep the Python version in the Docker image
> consistent with the one on the NodeManager host, which defeats the purpose of
> using Docker to package all the dependencies.
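The original test.py is only attached to the issue, so the following is a hypothetical reconstruction of a probe with the same output shape as the log in the report: the driver's hostname, whether the driver looks containerized, and one result per executor task. The container check is a heuristic, not a YARN or Docker API: it assumes Docker created `/.dockerenv` inside the container, or that `/proc/1/cgroup` mentions `docker` (typical under cgroup v1). The names `in_docker_container`, `probe`, and `main`, and the default of 9 tasks (chosen to match the nine results in the log), are assumptions.

```python
import os
import socket


def in_docker_container(dockerenv_path="/.dockerenv", cgroup_path="/proc/1/cgroup"):
    """Heuristic check for running inside a Docker container.

    Docker normally creates /.dockerenv inside its containers, and under
    cgroup v1 the cgroup paths of PID 1 usually contain "docker".
    """
    if os.path.exists(dockerenv_path):
        return True
    try:
        with open(cgroup_path) as f:
            return "docker" in f.read()
    except OSError:
        # File missing or unreadable: treat as "not in a container".
        return False


def probe(_):
    # Runs on an executor for each element and reports that task's view.
    return in_docker_container()


def main(num_tasks=9):
    try:
        from pyspark.sql import SparkSession
    except ImportError:
        # Lets the helper above be tried locally without PySpark installed.
        print("PySpark not available; local check:", in_docker_container())
        return

    spark = SparkSession.builder.appName("docker-probe").getOrCreate()
    sc = spark.sparkContext
    print(socket.gethostname())   # host where the driver process runs
    print(in_docker_container())  # does the driver look containerized?
    # One partition per element so tasks can spread across executors.
    print(sc.parallelize(range(num_tasks), num_tasks).map(probe).collect())
    spark.stop()


if __name__ == "__main__":
    main()
```

Submitted in cluster mode, a driver that stays on the host would print `False` on its second line while the executor list shows `True` values, which is the pattern the reporter observed.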
>
> The Spark job runs successfully; below is the stdout:
> Log Type: stdout
> Log Upload Time: Tue May 14 02:07:06 +0000 2019
> Log Length: 141
> host.test.com
> False ============>going to print all the container names. [True, True, True,
> True, True, True, True, True, True]
> Please see the attached Dockerfile and test.py.
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)