According to the documentation, `spark.pyspark.python` configures which Python executable is run on the workers. It seems to be ignored in my simple test case. I'm running a pip-installed PySpark 2.1.1, completely stock; the only customization at this point is my Hadoop configuration directory.
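As I read the docs, setting that property on its own should be enough to pick the worker interpreter, roughly like this minimal sketch (the path here is made up purely for illustration):

import pyspark

session = (
    pyspark.sql.SparkSession.builder
    .config('spark.pyspark.python', '/usr/bin/python3')  # illustrative path only
    .getOrCreate()
)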
In the code below, the `PYSPARK_PYTHON` value is used, so `session` ends up as a working SparkSession. However, it shouldn't be: `spark.pyspark.python` is set to a nonsense value and should take priority. If I take the env variable out, the workers just load python2; the config value doesn't appear to have any effect for me. Any suggestions?

import os
import pprint

import pyspark

ip = '10.30.50.73'

# Point Spark at the cluster's Hadoop/YARN configuration.
conf_dir = os.path.abspath(os.path.join(os.path.dirname(__file__), 'conf', 'cloudera.yarn'))
os.environ['YARN_CONF_DIR'] = conf_dir
os.environ['HADOOP_CONF_DIR'] = conf_dir
os.environ['PYSPARK_PYTHON'] = '/u/pyenv/versions/3.6.1/bin/python3'

config = pyspark.SparkConf(loadDefaults=False)
config.set('spark.driver.host', ip)
config.set('spark.master', 'yarn')
config.set('spark.submit.deployMode', 'client')
config.set('spark.pyspark.python', 'foo/bar')  # nonsense on purpose; should break worker startup

spark_builder = pyspark.sql.SparkSession.builder.config(conf=config)
session = spark_builder.getOrCreate()
context = session.sparkContext

# Dump the effective configuration to confirm spark.pyspark.python made it in.
config_string = pprint.pformat({key: value for key, value in context.getConf().getAll()})
print(config_string)

import IPython
IPython.embed()
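In case it helps pin this down, here is a quick sanity-check sketch (assuming the `session` above comes up; `worker_python` and `interpreters` are just names I made up for it) that asks each executor which interpreter it is actually running. Given the behaviour described above, I'd expect it to report the 3.6.1 path from `PYSPARK_PYTHON` rather than failing on 'foo/bar':

def worker_python(_):
    # Runs on the executors; report the interpreter they were started with.
    import sys
    return sys.executable

# Collect the distinct interpreter paths reported across a few partitions.
interpreters = (
    session.sparkContext
    .parallelize(list(range(4)), 4)
    .map(worker_python)
    .distinct()
    .collect()
)
print(interpreters)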