Hi Ian,

The stack trace looks quite similar to https://issues.apache.org/jira/browse/ZEPPELIN-572, which has been fixed since v0.5.6. This happens when pyspark.zip and py4j-*.zip are not distributed to the YARN worker nodes.
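As a quick sanity check on the Zeppelin host, you can confirm the two archives exist where Spark expects them (this assumes a standard Spark layout under SPARK_HOME; the py4j version suffix varies by Spark release):

    ls $SPARK_HOME/python/lib/pyspark.zip
    ls $SPARK_HOME/python/lib/py4j-*-src.zip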
If you are building from source, can you please double-check that you pulled the latest master? Also, to be sure, can you confirm that spark.yarn.isPython is set to true in the Spark UI (YARN's ApplicationMaster UI) > Environment > Spark Properties?

On Sat, Feb 13, 2016 at 1:04 AM, Ian Maloney <rachmaninovquar...@gmail.com> wrote:

> Hi,
>
> I've been trying unsuccessfully to configure the pyspark interpreter on
> Zeppelin. I can use pyspark from the CLI and can use the Spark interpreter
> from Zeppelin without issue. Here are the lines which aren't commented out
> in my zeppelin-env.sh file:
>
> export MASTER=yarn-client
> export ZEPPELIN_PORT=8090
> export ZEPPELIN_JAVA_OPTS="-Dhdp.version=2.3.2.0-2950 -Dspark.yarn.queue=default"
> export SPARK_HOME=/usr/hdp/current/spark-client/
> export HADOOP_CONF_DIR=/etc/hadoop/conf
> export PYSPARK_PYTHON=/usr/bin/python
> export PYTHONPATH=${SPARK_HOME}/python:${SPARK_HOME}/python/build:$PYTHONPATH
>
> Running a simple pyspark script in the interpreter gives this error:
>
> Py4JJavaError: An error occurred while calling
> z:org.apache.spark.api.python.PythonRDD.runJob.
> : org.apache.spark.SparkException: Job aborted due to stage failure:
> Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in
> stage 1.0 (TID 5, some_yarn_node.networkname): org.apache.spark.SparkException:
> Error from python worker:
>   /usr/bin/python: No module named pyspark
> PYTHONPATH was:
>   /app/hadoop/yarn/local/usercache/my_username/filecache/4121/spark-assembly-1.4.1.2.3.2.0-2950-hadoop2.7.1.2.3.2.0-2950.jar
>
> More details can be found here:
>
> https://community.hortonworks.com/questions/16436/cants-get-pyspark-interpreter-to-work-on-zeppelin.html
>
> Thanks,
>
> Ian
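For anyone who lands on this thread with an older build and can't upgrade right away: one workaround that has been used on HDP clusters is to ship the two archives to the executors explicitly. This is only a sketch, assuming the default HDP 2.3 Spark path and the py4j version bundled with Spark 1.4.x (adjust both to match your install), added to spark-defaults.conf:

    spark.yarn.dist.files         /usr/hdp/current/spark-client/python/lib/pyspark.zip,/usr/hdp/current/spark-client/python/lib/py4j-0.8.2.1-src.zip
    spark.executorEnv.PYTHONPATH  pyspark.zip:py4j-0.8.2.1-src.zip

Files distributed via spark.yarn.dist.files land in each executor container's working directory, which is why the executor-side PYTHONPATH can reference them by bare name.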