I'd like to give users the ability to switch between Python 2 and Python 3
for their PySpark jobs.
Has anybody been able to set up something like this, so users can switch
between Python 2 and Python 3 PySpark interpreters?
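
Ideally the Python version would be selectable per paragraph, along these
lines (a sketch of the intended usage, with %py3spark being the new
interpreter described below):

%pyspark
import sys
print(sys.version)   # expecting 2.x from /opt/cloudera/parcels/Anaconda

%py3spark
import sys
print(sys.version)   # expecting 3.x from /opt/cloudera/parcels/Anaconda3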

For this experiment, I created a new %py3spark interpreter and assigned it
to the spark interpreter group.

I added the following options there for %py3spark: [1]
/opt/cloudera/parcels/Anaconda3 is our Anaconda Python 3 home, which is
available on all worker nodes and on the Zeppelin server as well.

The default %pyspark interpreter is configured very similarly to [1],
except all paths use "/opt/cloudera/parcels/Anaconda" instead of
"/opt/cloudera/parcels/Anaconda3".
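
For example, the Python 2 counterparts of the first few properties in [1]
would be:

PYSPARK_DRIVER_PYTHON  /opt/cloudera/parcels/Anaconda/bin/python
PYSPARK_PYTHON  /opt/cloudera/parcels/Anaconda/bin/python
spark.pyspark.python  /opt/cloudera/parcels/Anaconda/bin/python

and so on for the remaining entries.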

Nevertheless, zeppelin_ipythonxxx/ipython_server.py
seems to pick up the environment variables from zeppelin-env.sh and not
from the interpreter settings.

The Zeppelin documentation says that interpreter properties with
all-uppercase names are treated as environment variables, so I'd assume
those should override what's set in zeppelin-env.sh, no?
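
For what it's worth, here is a minimal check I can run inside a paragraph
to see which values the interpreter process actually inherited (driver
side only):

%py3spark
import os, sys
print(sys.executable)                     # python binary the driver runs
print(os.environ.get('PYSPARK_PYTHON'))   # env seen by ipython_server.py
print(os.environ.get('PATH'))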

It seems environment variables at the interpreter level are broken -
notice the %pyspark paragraph has "Anaconda3" and not "Anaconda" in PATH
(highlighted in the screenshot below).

[screenshot: output of the %pyspark paragraph, with "Anaconda3"
highlighted in the PATH value]

[1]

LD_LIBRARY_PATH  /opt/cloudera/parcels/Anaconda3/lib
PATH  /usr/java/latest/bin:/opt/cloudera/parcels/Anaconda3/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/rdautkha/bin
PYSPARK_DRIVER_PYTHON  /opt/cloudera/parcels/Anaconda3/bin/python
PYSPARK_PYTHON  /opt/cloudera/parcels/Anaconda3/bin/python
PYTHONHOME  /opt/cloudera/parcels/Anaconda3

spark.executorEnv.LD_LIBRARY_PATH  /opt/cloudera/parcels/Anaconda3/lib
spark.executorEnv.PYSPARK_PYTHON  /opt/cloudera/parcels/Anaconda3/bin/python
spark.pyspark.driver.python  /opt/cloudera/parcels/Anaconda3/bin/python
spark.pyspark.python  /opt/cloudera/parcels/Anaconda3/bin/python
spark.yarn.appMasterEnv.PYSPARK_PYTHON  /opt/cloudera/parcels/Anaconda3/bin/python
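
The spark.executorEnv.* and spark.yarn.appMasterEnv.* entries are meant to
cover the YARN side; a quick sketch to confirm which python the executors
actually launch (sc is the SparkContext Zeppelin provides):

def worker_python(_):
    # runs on an executor; report the python binary it uses
    import sys
    return sys.executable

import sys
print(sys.executable)                                    # driver python
print(sc.parallelize([0]).map(worker_python).collect())  # executor python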

-- 
Ruslan Dautkhanov
