Hi Mina,

Thanks for the response. I re-cloned master from GitHub and built it using:

mvn clean package -DskipTests -Pspark-1.4 -Phadoop-2.6 -Pyarn -Ppyspark
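(For concreteness, the deploy steps described in the next paragraph boiled down to roughly the following; the directory names and hostname are placeholders rather than my real ones:)

# copy the locally built tree to a node in the HDP cluster
scp -r incubator-zeppelin/ my_user@hdp-node:/opt/zeppelin

# then, after adding the two config files mentioned below, start the daemon there
/opt/zeppelin/bin/zeppelin-daemon.sh start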
I did that locally, then scp'd the build to a node in a cluster running HDP 2.3 (Spark 1.4.1 and Hadoop 2.7.1). I added the two config files from below and started the Zeppelin daemon. Inspecting the spark.yarn.isPython config in the Spark UI showed it to be "true", yet the pyspark interpreter gives the same error as before. Are there any other configs I should check? I'm beginning to wonder whether it's related to something in Hortonworks' distribution of Spark or YARN.

On Tuesday, February 16, 2016, mina lee <mina...@apache.org> wrote:

> Hi Ian,
>
> The log stack looks quite similar to
> https://issues.apache.org/jira/browse/ZEPPELIN-572, which has been fixed
> since v0.5.6.
> This happens when pyspark.zip and py4j-*.zip are not distributed to the
> YARN worker nodes.
>
> If you are building from source code, can you please double check that
> you pulled the latest master?
> And also, to be sure, can you confirm that you see spark.yarn.isPython
> set to true in the Spark UI (YARN's ApplicationMaster UI) under
> Environment > Spark Properties?
>
> On Sat, Feb 13, 2016 at 1:04 AM, Ian Maloney <rachmaninovquar...@gmail.com> wrote:
>
>> Hi,
>>
>> I've been trying unsuccessfully to configure the pyspark interpreter on
>> Zeppelin. I can use pyspark from the CLI and can use the Spark
>> interpreter from Zeppelin without issue. Here are the lines which aren't
>> commented out in my zeppelin-env.sh file:
>>
>> export MASTER=yarn-client
>> export ZEPPELIN_PORT=8090
>> export ZEPPELIN_JAVA_OPTS="-Dhdp.version=2.3.2.0-2950 -Dspark.yarn.queue=default"
>> export SPARK_HOME=/usr/hdp/current/spark-client/
>> export HADOOP_CONF_DIR=/etc/hadoop/conf
>> export PYSPARK_PYTHON=/usr/bin/python
>> export PYTHONPATH=${SPARK_HOME}/python:${SPARK_HOME}/python/build:$PYTHONPATH
>>
>> Running a simple pyspark script in the interpreter gives this error:
>>
>> Py4JJavaError: An error occurred while calling
>> z:org.apache.spark.api.python.PythonRDD.runJob.
>> : org.apache.spark.SparkException: Job aborted due to stage failure:
>> Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3
>> in stage 1.0 (TID 5, some_yarn_node.networkname):
>> org.apache.spark.SparkException:
>> Error from python worker:
>>   /usr/bin/python: No module named pyspark
>> PYTHONPATH was:
>>   /app/hadoop/yarn/local/usercache/my_username/filecache/4121/spark-assembly-1.4.1.2.3.2.0-2950-hadoop2.7.1.2.3.2.0-2950.jar
>>
>> More details can be found here:
>> https://community.hortonworks.com/questions/16436/cants-get-pyspark-interpreter-to-work-on-zeppelin.html
>>
>> Thanks,
>>
>> Ian
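PS: In case it helps narrow things down, these are the extra checks I'm planning to run on the HDP node. The path just follows from the SPARK_HOME above, and the application id is a placeholder for whatever YARN assigns to Zeppelin's Spark app:

# make sure the archives Spark is supposed to ship to the YARN containers exist at all
ls /usr/hdp/current/spark-client/python/lib/
# expecting to see pyspark.zip and a py4j-*-src.zip here

# after re-running the failing paragraph, pull the container logs to see
# what PYTHONPATH the executors were actually given
yarn logs -applicationId <application_id>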