GitHub user lianhuiwang reopened a pull request:
https://github.com/apache/spark/pull/3976
[SPARK-5173]support python application running on yarn cluster mode
Currently, spark-submit does not support running a Python application in
yarn-cluster mode. This change modifies spark-submit and YARN's
ApplicationMaster (AM) to support it.
By passing a .py file or a primaryResource file to spark-submit, PySpark
can run in yarn-cluster mode.
Example: spark-submit --master yarn-cluster --num-executors 1 --driver-memory
1g --executor-memory 1g xx.py --primaryResource yy.conf
This configuration is the same as for PySpark in yarn-client mode.
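For scripted submissions, the command line above can be assembled
programmatically. The following is only a minimal sketch:
build_submit_command is a hypothetical helper, not part of Spark, and it
assumes that yarn-cluster is the master value that selects cluster mode and
that extra Python dependencies are passed as a comma-separated --py-files
list.

```python
def build_submit_command(app, py_files=(), num_executors=1,
                         driver_memory="1g", executor_memory="1g",
                         master="yarn-cluster"):
    """Return a spark-submit argv list for a Python application.

    Hypothetical helper for illustration only; the defaults mirror the
    example in the pull request description.
    """
    cmd = [
        "spark-submit",
        "--master", master,
        "--num-executors", str(num_executors),
        "--driver-memory", driver_memory,
        "--executor-memory", executor_memory,
    ]
    if py_files:
        # Assumption: dependencies are given as one comma-separated value.
        cmd += ["--py-files", ",".join(py_files)]
    # The primary .py file comes after all spark-submit options.
    cmd.append(app)
    return cmd


if __name__ == "__main__":
    print(" ".join(build_submit_command("xx.py")))
```

The resulting list can be handed to subprocess.run without shell quoting
concerns, since each argument is a separate element.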
First, the local path of the .py file or primaryResource is added to YARN's
dist.files so that it can be distributed to the worker nodes. Then
spark-submit passes --py-files and --primaryResource to yarn.Client and sets
"org.apache.spark.deploy.PythonRunner" as the user class, which can run .py
files on the ApplicationMaster.
yarn.Client in turn passes --py-files and --primaryResource to the
ApplicationMaster.
In the ApplicationMaster, the user class is
org.apache.spark.deploy.PythonRunner and the user arguments are the
primaryResource and --py-files, so PySpark runs on the ApplicationMaster.
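The flow described above can be modeled roughly as follows. This is an
illustrative Python sketch, not the Scala code in this pull request:
plan_yarn_cluster_submit and the dictionary keys are hypothetical names, and
the exact argument layout handed to PythonRunner is an assumption based only
on the description (primary resource first, then the --py-files list).

```python
PYTHON_RUNNER = "org.apache.spark.deploy.PythonRunner"


def plan_yarn_cluster_submit(primary_resource, py_files=()):
    """Model how a Python app is prepared for the ApplicationMaster.

    Hypothetical illustration of the steps in the description:
      1. distribute the primary .py file and its dependencies,
      2. use PythonRunner as the user class,
      3. pass the primary resource and the py-files list as user args.
    """
    if not primary_resource.endswith(".py"):
        raise ValueError("primary resource must be a .py file")

    py_files = list(py_files)
    return {
        # Step 1: files that would be added to YARN's distributed files.
        "dist_files": [primary_resource] + py_files,
        # Step 2: the class the ApplicationMaster runs as the user class.
        "user_class": PYTHON_RUNNER,
        # Step 3 (assumed layout): primary resource, then py-files.
        "user_args": [primary_resource, ",".join(py_files)],
    }


if __name__ == "__main__":
    plan = plan_yarn_cluster_submit("xx.py", ["dep.py"])
    for key, value in plan.items():
        print(key, "=", value)
```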
@JoshRosen @tgravescs @sryza
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/lianhuiwang/spark SPARK-5173
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/3976.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3976
----
commit 9c941bc59527e594ee1d155c00cb8e55d7c40fe8
Author: lianhuiwang <[email protected]>
Date: 2015-01-09T12:58:24Z
support python application running on yarn cluster mode
commit 172eec10b9daaf9ed838e821474d28871ab63462
Author: Wang Lianhui <[email protected]>
Date: 2015-01-09T15:01:52Z
fix a min submit's bug
commit f1f55b6eb4b65499be8e182e857d89a158873234
Author: lianhuiwang <[email protected]>
Date: 2015-01-29T11:13:35Z
when yarn-cluster, all python files can be non-local
commit 905a10610532578c774e58d12b927597330fb9ff
Author: lianhuiwang <[email protected]>
Date: 2015-01-31T03:29:09Z
update with sryza and andrewor 's comments
commit 097a5ec37456bf9d13a952f4108a750b9f9f84d0
Author: lianhuiwang <[email protected]>
Date: 2015-01-31T03:59:06Z
fix line length exceeds 100
commit 5b300648fe53d9de604e8afce7580fddfe6bbaef
Author: lianhuiwang <[email protected]>
Date: 2015-01-31T12:18:22Z
add test
commit d60bc6069cf65637622472ef1cd27153333df53c
Author: lianhuiwang <[email protected]>
Date: 2015-01-31T14:07:03Z
fix test
commit 2adc8f591ddd0f253496c18d32b1910d29e04c8d
Author: lianhuiwang <[email protected]>
Date: 2015-01-31T16:35:01Z
add spark.test.home
commit 47d2fc35e53a8851790607085bc67e94736358d6
Author: lianhuiwang <[email protected]>
Date: 2015-02-01T02:40:25Z
fix test
----