Hi, I find the documentation about the configuration options rather confusing. There are a lot of files, and it is not clear which one to modify for a given setting. For example, spark-env vs spark-defaults.
I am getting an error caused by a Python version mismatch between the driver and the workers:

  File "/root/spark/python/lib/pyspark.zip/pyspark/worker.py", line 64, in main
    ("%d.%d" % sys.version_info[:2], version))
Exception: Python in worker has different version 2.7 than that in driver 3.4, PySpark cannot run with different minor versions
    at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:138)
    at org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:179)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:97)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.api.python.PairwiseRDD.compute(PythonRDD.scala:315)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:70)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

But I have conf/spark-env.sh with:

  PYSPARK_PYTHON=python3.4

I am also unsure about the shebang line at the top of spark-env.sh, since I thought it would cause the env vars to be defined in a subprocess when the file is sourced, so I removed it. Either way, I still have the same problem.

Does anyone have experience using python3? And with python3 in a virtualenv?

Also, as a matter of feedback, I find it rather difficult to deploy and develop apps: although IPython notebook is available, I haven't found a way to include pyspark in my environment (alongside the rest of the virtualenv libraries).

Thanks!
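
P.S. To make the error concrete, here is a minimal sketch of the kind of check the traceback suggests worker.py performs: the driver's "major.minor" string is compared against the worker interpreter's, and a mismatch raises. This is only my reading of the message and the one visible source line; the helper names minor_version and check_worker_version are hypothetical and are not part of PySpark.

```python
import sys


def minor_version(version_info=sys.version_info):
    """Return 'major.minor' (e.g. '3.4'), the form quoted in the error message."""
    return "%d.%d" % tuple(version_info[:2])


def check_worker_version(driver_version, worker_version_info=sys.version_info):
    """Hypothetical helper: raise if the worker's major.minor differs from the driver's.

    driver_version is the string the driver would send (e.g. '3.4');
    worker_version_info stands in for sys.version_info on the worker.
    """
    worker_version = minor_version(worker_version_info)
    if worker_version != driver_version:
        raise Exception(
            "Python in worker has different version %s than that in driver %s, "
            "PySpark cannot run with different minor versions"
            % (worker_version, driver_version)
        )
    return worker_version


if __name__ == "__main__":
    # Driver and worker are the same interpreter here, so the check passes.
    print(check_worker_version(minor_version()))
```

Calling check_worker_version("3.4", (2, 7, 9)) raises with the same wording as my error, which is why I assume the workers are still launching python2.7 and not picking up my PYSPARK_PYTHON setting.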