According to the documentation, `spark.pyspark.python` configures which Python executable is run on the workers. It seems to be ignored in my simple test case. I'm running a pip-installed PySpark 2.1.1, completely stock; the only customization at this point is my Hadoop configuration directory.
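As I read the docs, setting that property on its own should be enough to pick the worker interpreter, roughly like this minimal sketch (the path here is made up purely for illustration):

import pyspark

session = (
    pyspark.sql.SparkSession.builder
    .config('spark.pyspark.python', '/usr/bin/python3')  # illustrative path only
    .getOrCreate()
)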
In the code below, the `PYSPARK_PYTHON` value is used, so `session` ends up as a working SparkSession. However, it shouldn't be: `spark.pyspark.python` is set to a nonsense value and should take priority. If I take the env variable out, the workers just load python2; the config value doesn't appear to have any effect for me. Any suggestions?

import os
import pprint

import pyspark

ip = '10.30.50.73'

# Point Spark at the cluster's Hadoop/YARN configuration.
conf_dir = os.path.abspath(os.path.join(os.path.dirname(__file__), 'conf', 'cloudera.yarn'))
os.environ['YARN_CONF_DIR'] = conf_dir
os.environ['HADOOP_CONF_DIR'] = conf_dir
os.environ['PYSPARK_PYTHON'] = '/u/pyenv/versions/3.6.1/bin/python3'

config = pyspark.SparkConf(loadDefaults=False)
config.set('spark.driver.host', ip)
config.set('spark.master', 'yarn')
config.set('spark.submit.deployMode', 'client')
config.set('spark.pyspark.python', 'foo/bar')  # nonsense on purpose; should break worker startup

spark_builder = pyspark.sql.SparkSession.builder.config(conf=config)
session = spark_builder.getOrCreate()
context = session.sparkContext

# Dump the effective configuration to confirm spark.pyspark.python made it in.
config_string = pprint.pformat({key: value for key, value in context.getConf().getAll()})
print(config_string)

import IPython
IPython.embed()
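In case it helps pin this down, here is a quick sanity-check sketch (assuming the `session` above comes up; `worker_python` and `interpreters` are just names I made up for it) that asks each executor which interpreter it is actually running. Given the behaviour described above, I'd expect it to report the 3.6.1 path from `PYSPARK_PYTHON` rather than failing on 'foo/bar':

def worker_python(_):
    # Runs on the executors; report the interpreter they were started with.
    import sys
    return sys.executable

# Collect the distinct interpreter paths reported across a few partitions.
interpreters = (
    session.sparkContext
    .parallelize(list(range(4)), 4)
    .map(worker_python)
    .distinct()
    .collect()
)
print(interpreters)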