Hi Mina,

I applied your changes and they got the pyspark interpreter working! Thanks
so much for your help!

Ian

On Sunday, February 21, 2016, mina lee <mina...@apache.org> wrote:

> Hi Ian, sorry for the late reply.
> I was able to reproduce the same error with Spark 1.4.1 & Hadoop
> 2.6.0. It turned out to be a bug in Zeppelin.
> After some digging, I realized that the `spark.yarn.isPython` property was
> only introduced in Spark 1.5.0. I just made a PR (
> https://github.com/apache/incubator-zeppelin/pull/736) to fix it. It would
> be much appreciated if you could try it and see if it works. Thank you for
> reporting the bug!
>
> Regards,
> Mina
>
> On Thu, Feb 18, 2016 at 2:39 AM, Ian Maloney <rachmaninovquar...@gmail.com>
> wrote:
>
>> Hi Mina,
>>
>> Thanks for the response. I re-cloned master from GitHub and built
>> using:
>> mvn clean package -DskipTests -Pspark-1.4 -Phadoop-2.6 -Pyarn -Ppyspark
>>
>> I built it locally, then scp'd the result to a node in a cluster running
>> HDP 2.3 (Spark 1.4.1 & Hadoop 2.7.1).
>>
>> I added the two config files from below and started the Zeppelin daemon.
>> Inspecting the spark.yarn.isPython config in the Spark UI showed it to be
>> "true".
>>
>> The pyspark interpreter gives the same error as before. Are there any
>> other configs I should check? I'm beginning to wonder if it's related to
>> something in Hortonworks' distribution of Spark or YARN.
>>
>>
>>
>> On Tuesday, February 16, 2016, mina lee <mina...@apache.org> wrote:
>>
>>> Hi Ian,
>>>
>>> The log stack looks quite similar to
>>> https://issues.apache.org/jira/browse/ZEPPELIN-572, which has been fixed
>>> since v0.5.6.
>>> This happens when pyspark.zip and py4j-*.zip are not distributed to the
>>> YARN worker nodes.
>>>
>>> If you are building from source, can you please double-check that
>>> you pulled the latest master?
>>> Also, to be sure, can you confirm that you see
>>> spark.yarn.isPython set to true in the Spark UI (YARN's ApplicationMaster
>>> UI) > Environment > Spark Properties?
>>>
>>> On Sat, Feb 13, 2016 at 1:04 AM, Ian Maloney <
>>> rachmaninovquar...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I've been trying unsuccessfully to configure the pyspark interpreter on
>>>> Zeppelin. I can use pyspark from the CLI and can use the Spark interpreter
>>>> from Zeppelin without issue. Here are the lines which aren't commented out
>>>> in my zeppelin-env.sh file:
>>>>
>>>> export MASTER=yarn-client
>>>>
>>>> export ZEPPELIN_PORT=8090
>>>>
>>>> export ZEPPELIN_JAVA_OPTS="-Dhdp.version=2.3.2.0-2950
>>>> -Dspark.yarn.queue=default"
>>>>
>>>> export SPARK_HOME=/usr/hdp/current/spark-client/
>>>>
>>>> export HADOOP_CONF_DIR=/etc/hadoop/conf
>>>>
>>>> export PYSPARK_PYTHON=/usr/bin/python
>>>>
>>>> export
>>>> PYTHONPATH=${SPARK_HOME}/python:${SPARK_HOME}/python/build:$PYTHONPATH
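A quick sanity check on the gateway node is to confirm that the archives Spark ships to YARN Python workers actually exist under SPARK_HOME. This is only a sketch: the helper name is mine, and the path is the assumed HDP layout from the config above.

```python
import glob
import os

def check_pyspark_archives(spark_home):
    """Report whether the archives Spark distributes to YARN Python
    workers (pyspark.zip and the py4j zip) exist under spark_home."""
    lib = os.path.join(spark_home, "python", "lib")
    has_pyspark = os.path.exists(os.path.join(lib, "pyspark.zip"))
    has_py4j = bool(glob.glob(os.path.join(lib, "py4j-*.zip")))
    return has_pyspark and has_py4j

# SPARK_HOME from the config above (assumed HDP layout; adjust per cluster)
print(check_pyspark_archives("/usr/hdp/current/spark-client"))
```

If this reports False, there is nothing for Spark to distribute to the workers in the first place, regardless of spark.yarn.isPython.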
>>>>
>>>> Running a simple pyspark script in the interpreter gives this error:
>>>>
>>>> Py4JJavaError: An error occurred while calling
>>>> z:org.apache.spark.api.python.PythonRDD.runJob.
>>>> : org.apache.spark.SparkException: Job aborted due to stage
>>>> failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost
>>>> task 0.3 in stage 1.0 (TID 5, some_yarn_node.networkname):
>>>> org.apache.spark.SparkException:
>>>> Error from python worker:
>>>>   /usr/bin/python: No module named pyspark
>>>> PYTHONPATH was:
>>>> /app/hadoop/yarn/local/usercache/my_username/filecache/4121/spark-assembly-1.4.1.2.3.2.0-2950-hadoop2.7.1.2.3.2.0-2950.jar
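The PYTHONPATH line in that trace already points at the cause: the worker's path holds only the assembly jar, with no pyspark.zip or py4j entry. A tiny illustration, using the path verbatim from the log above:

```python
# The worker-side PYTHONPATH reported in the error, verbatim from the log.
worker_pythonpath = (
    "/app/hadoop/yarn/local/usercache/my_username/filecache/4121/"
    "spark-assembly-1.4.1.2.3.2.0-2950-hadoop2.7.1.2.3.2.0-2950.jar"
)

# Neither pyspark.zip nor a py4j-*.zip appears among the path entries,
# which lines up with the worker's "No module named pyspark" failure.
entries = worker_pythonpath.split(":")
print(any(e.endswith(".zip") for e in entries))
```

This prints False here; when the fix works, the distributed pyspark and py4j zips should appear on the worker's path alongside the jar.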
>>>>
>>>> More details can be found here:
>>>>
>>>> https://community.hortonworks.com/questions/16436/cants-get-pyspark-interpreter-to-work-on-zeppelin.html
>>>>
>>>> Thanks,
>>>>
>>>> Ian
>>>>
>>>>
>>>
>