[ https://issues.apache.org/jira/browse/SPARK-9229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Eric Kimbrel updated SPARK-9229:
--------------------------------
    Environment: centos Cloudera 5.4.1 based off Apache Hadoop 2.6.0, using spark 1.5.0 built for hadoop 2.6.0 from github master branch on 7.20.2015  (was: centos)

> pyspark yarn-cluster PYSPARK_PYTHON not set
> --------------------------------------------
>
>                 Key: SPARK-9229
>                 URL: https://issues.apache.org/jira/browse/SPARK-9229
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 1.5.0
>         Environment: centos Cloudera 5.4.1 based off Apache Hadoop 2.6.0, using spark 1.5.0 built for hadoop 2.6.0 from github master branch on 7.20.2015
>            Reporter: Eric Kimbrel
>
> PYSPARK_PYTHON is set in spark-env.sh to use an alternative python installation.
> Use spark-submit to run a pyspark job on YARN with cluster deploy mode. PYSPARK_PYTHON is not set in the cluster environment, and the system default python is used instead of the intended one.
>
> Test code (simple.py):
>
> from pyspark import SparkConf, SparkContext
> import sys, os
>
> conf = SparkConf()
> sc = SparkContext(conf=conf)
> out = [('PYTHON VERSION', str(sys.version))]
> out.extend(zip(os.environ.keys(), os.environ.values()))
> rdd = sc.parallelize(out)
> rdd.coalesce(1).saveAsTextFile("hdfs://namenode/tmp/env")
>
> Submit command:
>
> spark-submit --master yarn --deploy-mode cluster --num-executors 1 simple.py
>
> I've also tried setting PYSPARK_PYTHON on the command line with no effect. It seems like there is no way to specify an alternative python executable in yarn-cluster mode.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
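[Editor's note] A possible workaround, not verified in this report: Spark's documented `spark.yarn.appMasterEnv.[Name]` and `spark.executorEnv.[Name]` configuration properties can inject environment variables into the YARN application master (which hosts the driver in cluster mode) and the executors, respectively. A sketch of the submit command using them; the `/opt/anaconda/bin/python` path is a hypothetical example interpreter location:

```shell
# Hedged sketch: set PYSPARK_PYTHON on both the application master and
# the executors via Spark conf, since spark-env.sh on the client does
# not propagate to YARN containers in cluster deploy mode.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 1 \
  --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=/opt/anaconda/bin/python \
  --conf spark.executorEnv.PYSPARK_PYTHON=/opt/anaconda/bin/python \
  simple.py
```

If this works, the `PYTHON VERSION` record written by simple.py should report the alternative interpreter rather than the system default.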