[ 
https://issues.apache.org/jira/browse/SPARK-9229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Kimbrel updated SPARK-9229:
--------------------------------
    Environment: CentOS, Cloudera 5.4.1 based on Apache Hadoop 2.6.0, using 
Spark 1.5.0 built for Hadoop 2.6.0 from the GitHub master branch on 7.20.2015  
(was: centos )

> pyspark yarn-cluster  PYSPARK_PYTHON not set
> --------------------------------------------
>
>                 Key: SPARK-9229
>                 URL: https://issues.apache.org/jira/browse/SPARK-9229
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 1.5.0
>         Environment: CentOS, Cloudera 5.4.1 based on Apache Hadoop 2.6.0, 
> using Spark 1.5.0 built for Hadoop 2.6.0 from the GitHub master branch on 
> 7.20.2015
>            Reporter: Eric Kimbrel
>
> PYSPARK_PYTHON is set in spark-env.sh to use an alternative Python 
> installation.
> Use spark-submit to run a pyspark job on YARN with cluster deploy mode.
> PYSPARK_PYTHON is not set in the cluster environment, so the system default 
> python is used instead of the intended interpreter.
> test code: (simple.py)
> from pyspark import SparkConf, SparkContext
> import sys,os
> conf = SparkConf()
> sc = SparkContext(conf=conf)
> out = [('PYTHON VERSION',str(sys.version))]
> out.extend( zip( os.environ.keys(),os.environ.values() ) )
> rdd = sc.parallelize(out)
> rdd.coalesce(1).saveAsTextFile("hdfs://namenode/tmp/env")
> submit command:
> spark-submit --master yarn --deploy-mode cluster --num-executors 1 simple.py 
> I've also tried setting PYSPARK_PYTHON on the command line with no effect.
> It seems like there is no way to specify an alternative python executable in 
> yarn-cluster mode.
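
A possible workaround sketch, assuming Spark's standard YARN configuration 
properties (`spark.yarn.appMasterEnv.*` for the application master, which hosts 
the driver in cluster mode, and `spark.executorEnv.*` for executors) are honored 
by this build, is to pass the interpreter path explicitly on the submit command. 
The interpreter path `/opt/python27/bin/python` below is hypothetical:

```shell
# Forward PYSPARK_PYTHON to both the YARN application master and the
# executors; /opt/python27/bin/python is a placeholder interpreter path.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 1 \
  --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=/opt/python27/bin/python \
  --conf spark.executorEnv.PYSPARK_PYTHON=/opt/python27/bin/python \
  simple.py
```

Whether these properties take effect for the PySpark worker processes in this 
particular 1.5.0 snapshot is exactly what the report calls into question.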



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
