Github user ueshin commented on the issue:
https://github.com/apache/spark/pull/19840
@yaooqinn OK, I see the situation.
In client mode, I think we can't use `spark.yarn.appMasterEnv.XXX`, which is
for cluster mode. So we should use the environment variables `PYSPARK_PYTHON` or
`PYSPARK_DRIVER_PYTHON`, or the corresponding Spark confs `spark.pyspark.python`
and `spark.pyspark.driver.python`.
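For reference, here is a minimal sketch of the precedence I have in mind for the driver in client mode. It is not the actual Spark code path, and the helper name `resolveDriverPython` is made up for illustration:

```scala
import org.apache.spark.SparkConf

object DriverPythonSketch {
  // Illustrative sketch: pick the driver's Python executable in client mode.
  // The Spark confs take precedence over the plain environment variables.
  def resolveDriverPython(conf: SparkConf): String = {
    conf.getOption("spark.pyspark.driver.python")
      .orElse(conf.getOption("spark.pyspark.python"))
      .orElse(sys.env.get("PYSPARK_DRIVER_PYTHON"))
      .orElse(sys.env.get("PYSPARK_PYTHON"))
      .getOrElse("python")
  }
}
```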
In cluster mode, we can use `spark.yarn.appMasterEnv.XXX`, and if
`spark.yarn.appMasterEnv.PYSPARK_PYTHON` or
`spark.yarn.appMasterEnv.PYSPARK_DRIVER_PYTHON` is set, it overwrites the
original environment variable.
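Roughly how I picture that overwrite, as a sketch only (the helper and the plain env map are assumptions, not the actual YARN client code):

```scala
import scala.collection.mutable

import org.apache.spark.SparkConf

object AmEnvSketch {
  private val Prefix = "spark.yarn.appMasterEnv."

  // Illustrative sketch: spark.yarn.appMasterEnv.* entries are applied last,
  // so e.g. spark.yarn.appMasterEnv.PYSPARK_PYTHON overwrites a PYSPARK_PYTHON
  // value inherited from the submitter's environment.
  def buildAmEnv(conf: SparkConf, inherited: Map[String, String]): Map[String, String] = {
    val env = mutable.Map[String, String]() ++= inherited
    conf.getAll.foreach { case (key, value) =>
      if (key.startsWith(Prefix)) env(key.stripPrefix(Prefix)) = value
    }
    env.toMap
  }
}
```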
Btw, `PYSPARK_DRIVER_PYTHON` applies only to the driver, not to executors, so we
should handle only `PYSPARK_PYTHON` in executors, and on the driver
`PYSPARK_DRIVER_PYTHON` takes priority over `PYSPARK_PYTHON`.
Currently we handle only the environment variable but not
`spark.executorEnv.PYSPARK_PYTHON` for executors, so we should handle it in
`api/python/PythonRunner` as you do now, or in
[context.py#L191](https://github.com/yaooqinn/spark/blob/8ff5663fe9a32eae79c8ee6bc310409170a8da64/python/pyspark/context.py#L191).
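On the executor side, something along these lines is what I mean, again only as a sketch (the helper name is hypothetical, and which of the conf or the env var should win is up to this PR):

```scala
import org.apache.spark.SparkConf

object ExecutorPythonSketch {
  // Illustrative sketch: executors only care about PYSPARK_PYTHON, and
  // spark.executorEnv.PYSPARK_PYTHON should also be honored instead of
  // reading only the plain environment variable.
  def resolveExecutorPython(conf: SparkConf): String = {
    conf.getOption("spark.executorEnv.PYSPARK_PYTHON")
      .orElse(sys.env.get("PYSPARK_PYTHON"))
      .getOrElse("python")
  }
}
```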