I'd like to give users the ability to switch between Python 2 and Python 3 for their PySpark jobs. Has anyone been able to set up something like this, so users can switch between a python2 and a python3 pyspark interpreter?
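For context, this is the sanity check I run from a notebook paragraph to see which Python the driver and the executors actually picked up. It's a minimal sketch: it uses only the standard library plus the sc SparkContext that Zeppelin injects into spark-group paragraphs, so nothing here is specific to our setup.

    import os
    import sys

    # Driver side: which Python binary this interpreter is running on
    print(sys.executable)
    print(sys.version)

    # Environment as the interpreter process sees it -- these are the
    # variables that zeppelin-env.sh and the interpreter settings compete over
    for name in ("PATH", "PYTHONHOME", "PYSPARK_PYTHON", "PYSPARK_DRIVER_PYTHON"):
        print(name, "=", os.environ.get(name))

    # Executor side: run a trivial job and report the workers' Python
    # (sc is the SparkContext provided by Zeppelin's spark interpreter group)
    print(sc.range(1).map(lambda _: sys.executable).collect())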
For this experiment, I created a new %py3spark interpreter, assigned it to the spark interpreter group, and added the options in [1] below to %py3spark. /opt/cloudera/parcels/Anaconda3 is our Anaconda Python 3 home, available on all worker nodes and on the Zeppelin server too. The default %pyspark interpreter has very similar settings to [1], except that all paths use "/opt/cloudera/parcels/Anaconda" instead of "/opt/cloudera/parcels/Anaconda3".

Nevertheless, zeppelin_ipythonxxx/ipython_server.py seems to pick up the environment variables from zeppelin-env.sh and not from the interpreter settings. The Zeppelin documentation says that properties whose names are all uppercase are treated as environment variables, so I would expect them to override what's in zeppelin-env.sh, no? It looks like environment variables at the interpreter level are broken - notice that the "pyspark" paragraph shows "Anaconda3", not "Anaconda", in PATH (highlighted).

[screenshot: output of the %pyspark paragraph, with Anaconda3 highlighted in PATH]

[1]
LD_LIBRARY_PATH                          /opt/cloudera/parcels/Anaconda3/lib
PATH                                     /usr/java/latest/bin:/opt/cloudera/parcels/Anaconda3/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/rdautkha/bin
PYSPARK_DRIVER_PYTHON                    /opt/cloudera/parcels/Anaconda3/bin/python
PYSPARK_PYTHON                           /opt/cloudera/parcels/Anaconda3/bin/python
PYTHONHOME                               /opt/cloudera/parcels/Anaconda3
spark.executorEnv.LD_LIBRARY_PATH        /opt/cloudera/parcels/Anaconda3/lib
spark.executorEnv.PYSPARK_PYTHON         /opt/cloudera/parcels/Anaconda3/bin/python
spark.pyspark.driver.python              /opt/cloudera/parcels/Anaconda3/bin/python
spark.pyspark.python                     /opt/cloudera/parcels/Anaconda3/bin/python
spark.yarn.appMasterEnv.PYSPARK_PYTHON   /opt/cloudera/parcels/Anaconda3/bin/python

--
Ruslan Dautkhanov