Github user yaooqinn commented on the issue:
https://github.com/apache/spark/pull/19840
#### Setup: spark-2.2.0-bin-hadoop2.7, running the numpy-dependent example examples/src/main/python/mllib/correlations_example.py
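All the cases below vary the same spark-submit invocation; here is a sketch of the shared skeleton (the exact master URL is an assumption, and each case only changes the bolded keys in its table):

```sh
# Shared skeleton for the cases below (sketch).
# Each case varies the client-side PYSPARK_* exports, --deploy-mode,
# and the spark.yarn.appMasterEnv.* / spark.executorEnv.* suffixes.
export PYSPARK_DRIVER_PYTHON=~/anaconda3/envs/py3/bin/python   # or PYSPARK_PYTHON, or neither

bin/spark-submit \
  --master yarn \
  --deploy-mode client \
  --archives ~/anaconda3/envs/py3.zip \
  --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=py3.zip/py3/bin/python \
  --conf spark.executorEnv.PYSPARK_PYTHON=py3.zip/py3/bin/python \
  examples/src/main/python/mllib/correlations_example.py
```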
### case 1
|key|value|
|---|---|
|**PYSPARK_DRIVER_PYTHON**|~/anaconda3/envs/py3/bin/python|
|deploy-mode|**client**|
|--archives |~/anaconda3/envs/py3.zip|
|spark.yarn.appMasterEnv.**PYSPARK_PYTHON**|py3.zip/py3/bin/python|
|spark.executorEnv.PYSPARK_PYTHON| py3.zip/py3/bin/python |
|failure|Exception: Python in worker has different version 2.7 than that in driver 3.6, PySpark cannot run with different minor versions.Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set.|
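The mismatch is easy to surface with a tiny job that reports the interpreter version on both sides; a minimal diagnostic sketch (the /tmp path and app name are made up):

```sh
# Diagnostic sketch: print the driver's and the workers' python versions
# to see which interpreter each side actually resolved.
cat > /tmp/pyver.py <<'EOF'
from __future__ import print_function
import sys
from pyspark import SparkContext
sc = SparkContext(appName="python-version-check")
print("driver :", sys.version.split()[0])
print("workers:", sc.parallelize([0], 1).map(
    lambda _: __import__("sys").version.split()[0]).collect())
sc.stop()
EOF
bin/spark-submit --master yarn --deploy-mode client /tmp/pyver.py
```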
### case 2
|key|value|
|---|---|
|**PYSPARK_DRIVER_PYTHON**|~/anaconda3/envs/py3/bin/python|
|deploy-mode|**cluster**|
|--archives |~/anaconda3/envs/py3.zip|
|spark.yarn.appMasterEnv.**PYSPARK_PYTHON**|py3.zip/py3/bin/python|
|spark.executorEnv.PYSPARK_PYTHON| py3.zip/py3/bin/python |
|failure|java.io.IOException: Cannot run program "/home/hadoop/anaconda3/envs/py3/bin/python": error=2, No such file or directory at org.apache.spark.**deploy.PythonRunner**$.main(PythonRunner.scala:91)|
### case 3 & 4
|key|value|
|---|---|
|**PYSPARK_DRIVER_PYTHON**|~/anaconda3/envs/py3/bin/python|
|deploy-mode|**cluster (3), client (4)**|
|--archives |~/anaconda3/envs/py3.zip|
|spark.yarn.appMasterEnv.**PYSPARK_DRIVER_PYTHON**|py3.zip/py3/bin/python|
|spark.executorEnv.PYSPARK_DRIVER_PYTHON|py3.zip/py3/bin/python|
|failure|Exception: Python in worker has different version 2.7 than that in driver 3.6, PySpark cannot run with different minor versions.Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set.|
### case 5 & 6
|key|value|
|---|---|
|**PYSPARK_PYTHON**|~/anaconda3/envs/py3/bin/python|
|deploy-mode|**client (5), cluster (6)**|
|--archives |~/anaconda3/envs/py3.zip|
|spark.yarn.appMasterEnv.**PYSPARK_DRIVER_PYTHON**|py3.zip/py3/bin/python|
|spark.executorEnv.PYSPARK_DRIVER_PYTHON|py3.zip/py3/bin/python|
|failure|java.io.IOException: Cannot run program "/home/hadoop/anaconda3/envs/py3/bin/python": error=2, No such file or directory [**executor side PythonRunner**]|
### case 7
|key|value|
|---|---|
|**PYSPARK_PYTHON**|~/anaconda3/envs/py3/bin/python|
|deploy-mode|**cluster**|
|--archives |~/anaconda3/envs/py3.zip|
|spark.yarn.appMasterEnv.**PYSPARK_PYTHON**|py3.zip/py3/bin/python|
|spark.executorEnv.PYSPARK_DRIVER_PYTHON|py3.zip/py3/bin/python|
|**success**|--|
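For reference, the working case 7 spelled out as a full command (a sketch, same assumed paths as above):

```sh
# Case 7, the successful cluster-mode submission (sketch).
# The driver runs inside the YARN AM and picks up PYSPARK_PYTHON from
# spark.yarn.appMasterEnv.*, resolved against the localized py3.zip archive.
export PYSPARK_PYTHON=~/anaconda3/envs/py3/bin/python

bin/spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --archives ~/anaconda3/envs/py3.zip \
  --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=py3.zip/py3/bin/python \
  --conf spark.executorEnv.PYSPARK_DRIVER_PYTHON=py3.zip/py3/bin/python \
  examples/src/main/python/mllib/correlations_example.py
```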
### case 8
|key|value|
|---|---|
|**PYSPARK_PYTHON**|~/anaconda3/envs/py3/bin/python|
|deploy-mode|**client**|
|--archives |~/anaconda3/envs/py3.zip|
|spark.yarn.appMasterEnv.**PYSPARK_PYTHON**|py3.zip/py3/bin/python|
|spark.executorEnv.PYSPARK_DRIVER_PYTHON|py3.zip/py3/bin/python|
|failure|java.io.IOException: Cannot run program "/home/hadoop/anaconda3/envs/py3/bin/python": error=2, No such file or directory [**executor side PythonRunner**]|
### case 9
|key|value|
|---|---|
|not setting ~~PYSPARK_[DRIVER]_PYTHON~~|<empty>|
|deploy-mode|**client**|
|--archives |~/anaconda3/envs/py3.zip|
|spark.yarn.appMasterEnv.**PYSPARK_PYTHON**|py3.zip/py3/bin/python|
|spark.executorEnv.PYSPARK_DRIVER_PYTHON|py3.zip/py3/bin/python|
|failure|ImportError: No module named numpy|
### case 10
|key|value|
|---|---|
|not setting ~~PYSPARK_[DRIVER]_PYTHON~~|<empty>|
|deploy-mode|**cluster**|
|--archives |~/anaconda3/envs/py3.zip|
|spark.yarn.appMasterEnv.**PYSPARK_PYTHON**|py3.zip/py3/bin/python|
|spark.executorEnv.PYSPARK_DRIVER_PYTHON|py3.zip/py3/bin/python|
|**success**| -- |
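Case 10 is the same cluster-mode submission with nothing exported on the client at all (sketch):

```sh
# Case 10, the other successful submission (sketch): no PYSPARK_* exported
# on the client; in cluster mode the AM-side driver gets its interpreter
# from spark.yarn.appMasterEnv.PYSPARK_PYTHON and passes it to the executors.
bin/spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --archives ~/anaconda3/envs/py3.zip \
  --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=py3.zip/py3/bin/python \
  --conf spark.executorEnv.PYSPARK_DRIVER_PYTHON=py3.zip/py3/bin/python \
  examples/src/main/python/mllib/correlations_example.py
```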
### my humble opinions
1. spark.executorEnv.PYSPARK_* has no effect on the executor-side pythonExec, which is determined by the driver (see the client-mode sketch after this list).
2. if PYSPARK_PYTHON is specified, then **spark.yarn.appMasterEnv.** should be suffixed with **PYSPARK_PYTHON**, not ~~PYSPARK_DRIVER_PYTHON~~.
3. specifying PYSPARK_DRIVER_PYTHON fails in all the cases above; this may be because https://github.com/yaooqinn/spark/blob/8ff5663fe9a32eae79c8ee6bc310409170a8da64/python/pyspark/context.py#L191 only deals with PYSPARK_PYTHON.
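If point 1 holds, the client-mode counterpart of case 7 should be to export PYSPARK_PYTHON on the client with the archive-relative path and let the driver propagate it to the executors; a sketch only, not verified in the cases above:

```sh
# Client-mode sketch following from point 1 (untested here): the driver
# propagates PYSPARK_PYTHON to the executors, so point it at the path that
# exists inside the YARN containers, and keep a local interpreter for the
# driver via PYSPARK_DRIVER_PYTHON.
export PYSPARK_DRIVER_PYTHON=~/anaconda3/envs/py3/bin/python
export PYSPARK_PYTHON=py3.zip/py3/bin/python

bin/spark-submit \
  --master yarn \
  --deploy-mode client \
  --archives ~/anaconda3/envs/py3.zip \
  examples/src/main/python/mllib/correlations_example.py
```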