Hi Mina, I applied your changes and they got the pyspark interpreter working! Thanks so much for your help!
Ian

On Sunday, February 21, 2016, mina lee <mina...@apache.org> wrote:

> Hi Ian, sorry for the late reply.
> I was able to reproduce the same error with Spark 1.4.1 & Hadoop 2.6.0.
> It turned out to be a bug in Zeppelin.
> After some searching, I realized that the `spark.yarn.isPython` property
> was only introduced in Spark 1.5.0. I just made a PR
> (https://github.com/apache/incubator-zeppelin/pull/736) to fix it. It
> would be much appreciated if you could try it and see if it works. Thank
> you for reporting the bug!
>
> Regards,
> Mina
>
> On Thu, Feb 18, 2016 at 2:39 AM, Ian Maloney <rachmaninovquar...@gmail.com> wrote:
>
>> Hi Mina,
>>
>> Thanks for the response. I re-cloned master from GitHub and built using:
>>
>>     mvn clean package -DskipTests -Pspark-1.4 -Phadoop-2.6 -Pyarn -Ppyspark
>>
>> I did that locally, then scp'd the build to a node in a cluster running
>> HDP 2.3 (Spark 1.4.1 & Hadoop 2.7.1).
>>
>> I added the two config files from below and started the Zeppelin daemon.
>> Inspecting the spark.yarn.isPython config in the Spark UI showed it to be
>> "true".
>>
>> The pyspark interpreter gives the same error as before. Are there any
>> other configs I should check? I'm beginning to wonder if it's related to
>> something in Hortonworks' distribution of Spark or YARN.
>>
>> On Tuesday, February 16, 2016, mina lee <mina...@apache.org> wrote:
>>
>>> Hi Ian,
>>>
>>> The log stack looks quite similar to
>>> https://issues.apache.org/jira/browse/ZEPPELIN-572, which has been
>>> fixed since v0.5.6.
>>> This happens when pyspark.zip and py4j-*.zip are not distributed to the
>>> YARN worker nodes.
>>>
>>> If you are building from source code, can you please double-check that
>>> you pulled the latest master?
>>> And also, to be sure, can you confirm that spark.yarn.isPython is set
>>> to true in the Spark UI (YARN's ApplicationMaster UI) > Environment >
>>> Spark Properties?
>>>
>>> On Sat, Feb 13, 2016 at 1:04 AM, Ian Maloney <rachmaninovquar...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I've been trying unsuccessfully to configure the pyspark interpreter
>>>> on Zeppelin. I can use pyspark from the CLI and can use the Spark
>>>> interpreter from Zeppelin without issue. Here are the lines which
>>>> aren't commented out in my zeppelin-env.sh file:
>>>>
>>>>     export MASTER=yarn-client
>>>>     export ZEPPELIN_PORT=8090
>>>>     export ZEPPELIN_JAVA_OPTS="-Dhdp.version=2.3.2.0-2950 -Dspark.yarn.queue=default"
>>>>     export SPARK_HOME=/usr/hdp/current/spark-client/
>>>>     export HADOOP_CONF_DIR=/etc/hadoop/conf
>>>>     export PYSPARK_PYTHON=/usr/bin/python
>>>>     export PYTHONPATH=${SPARK_HOME}/python:${SPARK_HOME}/python/build:$PYTHONPATH
>>>>
>>>> Running a simple pyspark script in the interpreter gives this error:
>>>>
>>>>     Py4JJavaError: An error occurred while calling
>>>>     z:org.apache.spark.api.python.PythonRDD.runJob.
>>>>     : org.apache.spark.SparkException: Job aborted due to stage
>>>>     failure: Task 0 in stage 1.0 failed 4 times, most recent failure:
>>>>     Lost task 0.3 in stage 1.0 (TID 5, some_yarn_node.networkname):
>>>>     org.apache.spark.SparkException:
>>>>     Error from python worker:
>>>>       /usr/bin/python: No module named pyspark
>>>>     PYTHONPATH was:
>>>>       /app/hadoop/yarn/local/usercache/my_username/filecache/4121/spark-assembly-1.4.1.2.3.2.0-2950-hadoop2.7.1.2.3.2.0-2950.jar
>>>>
>>>> More details can be found here:
>>>> https://community.hortonworks.com/questions/16436/cants-get-pyspark-interpreter-to-work-on-zeppelin.html
>>>>
>>>> Thanks,
>>>>
>>>> Ian
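For anyone else who lands on this thread with the same "No module named pyspark" error: before digging into Zeppelin itself, it can help to verify that the Spark distribution under SPARK_HOME actually contains the `python` directory and the versioned py4j zip that the python workers need on their PYTHONPATH. Here is a minimal sketch of such a check — the helper name is my own, not anything from Zeppelin or Spark, and it only assumes the standard Spark layout where py4j ships as `python/lib/py4j-*-src.zip`:

```python
import glob
import os


def pyspark_pythonpath_entries(spark_home):
    """Collect the PYTHONPATH entries a python worker needs:
    Spark's bundled pyspark package (SPARK_HOME/python) plus the
    versioned py4j zip shipped under SPARK_HOME/python/lib.
    Returns (entries, missing) where `missing` lists entries that
    do not exist on disk."""
    entries = [os.path.join(spark_home, "python")]
    entries += sorted(
        glob.glob(os.path.join(spark_home, "python", "lib", "py4j-*.zip"))
    )
    missing = [e for e in entries if not os.path.exists(e)]
    return entries, missing
```

If `missing` is non-empty (or the py4j glob matched nothing), the interpreter process cannot resolve pyspark locally, and the YARN executors will fail the same way regardless of how spark.yarn.isPython is set.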