Re: Should zeppelin.pyspark.python be used on the worker nodes ?

2017-03-20 Thread Jianfeng (Jeff) Zhang

It is dynamic; you can set the environment variable on the interpreter setting page.
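
For example (a sketch reusing the interpreter-settings value quoted elsewhere
in this thread), adding a property such as

    PYSPARK_PYTHON=/opt/miniconda2/envs/myenv/bin/python

on the Spark interpreter setting page should take effect after restarting just
that interpreter, with no restart of the Zeppelin server process.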


Best Regards,
Jeff Zhang


From: Ruslan Dautkhanov <dautkha...@gmail.com>
Reply-To: "users@zeppelin.apache.org" <users@zeppelin.apache.org>
Date: Tuesday, March 21, 2017 at 3:27 AM
To: users <users@zeppelin.apache.org>
Subject: Re: Should zeppelin.pyspark.python be used on the worker nodes ?

You're right - it will not be dynamic.

You may want to check
https://issues.apache.org/jira/browse/ZEPPELIN-2195
https://github.com/apache/zeppelin/pull/2079
It seems this is fixed in the current snapshot of Zeppelin (committed 3 weeks ago).






--
Ruslan Dautkhanov

On Mon, Mar 20, 2017 at 1:21 PM, William Markito Oliveira 
<william.mark...@gmail.com> wrote:
Thanks for the quick response Ruslan.

But given that it's an environment variable, I can't quickly change that value 
and point to a different python environment without restarting the Zeppelin 
process, can I ? I mean is there a way to set the value for PYSPARK_PYTHON from 
the Interpreter configuration screen ?

Thanks,


On Mon, Mar 20, 2017 at 2:15 PM, Ruslan Dautkhanov 
<dautkha...@gmail.com> wrote:
You can set PYSPARK_PYTHON environment variable for that.

Not sure about zeppelin.pyspark.python. I think it does not work
See comments in https://issues.apache.org/jira/browse/ZEPPELIN-1265


Eventually, i think we can remove zeppelin.pyspark.python and use only 
PYSPARK_PYTHON instead to avoid confusion.


--
Ruslan Dautkhanov

On Mon, Mar 20, 2017 at 12:59 PM, William Markito Oliveira 
<mark...@apache.org> wrote:
I'm trying to use zeppelin.pyspark.python as the variable to set the python 
that Spark worker nodes should use for my job, but it doesn't seem to be 
working.

Am I missing something or this variable does not do that ?

My goal is to change that variable to point to different conda environments.  
These environments are available in all worker nodes since it's on a shared 
location and ideally all nodes then would have access to the same libraries and 
dependencies.

Thanks,

~/William




--
~/William



Re: Should zeppelin.pyspark.python be used on the worker nodes ?

2017-03-20 Thread Ruslan Dautkhanov
> from pyspark.conf import SparkConf
> ImportError: No module named pyspark.conf


William, you probably meant

from pyspark import SparkConf


?
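
For reference, a minimal sketch of the suggested import (assuming pyspark is
importable from the Python configured for the interpreter; the app name is
just a placeholder):

    from pyspark import SparkConf
    conf = SparkConf().setAppName("test")

Both "from pyspark import SparkConf" and "from pyspark.conf import SparkConf"
resolve to the same class once the pyspark package itself can be imported.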


-- 
Ruslan Dautkhanov

On Mon, Mar 20, 2017 at 2:12 PM, William Markito Oliveira <
william.mark...@gmail.com> wrote:

> Ah! Thanks Ruslan! I'm still using 0.7.0 - Let me update to 0.8.0 and
> I'll come back update this thread with the results.
>
> On Mon, Mar 20, 2017 at 3:10 PM, William Markito Oliveira <
> william.mark...@gmail.com> wrote:
>
>> Hi moon, thanks for the tip. Here to summarize my current settings are
>> the following
>>
>> conf/zeppelin-env.sh has only SPARK_HOME setting:
>>
>> export SPARK_HOME=/opt/spark-2.1.0-bin-hadoop2.7/
>>
>> Then on the configuration of the interpreter through the web interface I
>> have:
>>
>> PYSPARK_PYTHON=/opt/miniconda2/envs/myenv/bin/python
>> zeppelin.pyspark.python=python
>>
>> But when I submit from the notebook I'm receiving:  pyspark is not
>> responding
>>
>> And the log file outputs:
>>
>> Traceback (most recent call last): File 
>> "/tmp/zeppelin_pyspark-6480867511995958556.py",
>> line 22, in  from pyspark.conf import SparkConf ImportError: No
>> module named pyspark.conf
>>
>> Any thoughts ?  Thanks a lot!
>>
>> On Mon, Mar 20, 2017 at 2:27 PM, moon soo Lee  wrote:
>>
>>> When property key in interpreter configuration screen matches certain
>>> condition [1], it'll be treated as a environment variable.
>>>
>>> You can remove PYSPARK_PYTHON from conf/zeppelin-env.sh and place it in
>>> interpreter configuration.
>>>
>>> Thanks,
>>> moon
>>>
>>> [1] https://github.com/apache/zeppelin/blob/master/zeppelin-
>>> interpreter/src/main/java/org/apache/zeppelin/interpreter/re
>>> mote/RemoteInterpreter.java#L152
>>>
>>>
>>> On Mon, Mar 20, 2017 at 12:21 PM William Markito Oliveira <
>>> william.mark...@gmail.com> wrote:
>>>
 Thanks for the quick response Ruslan.

 But given that it's an environment variable, I can't quickly change
 that value and point to a different python environment without restarting
 the Zeppelin process, can I ? I mean is there a way to set the value for
 PYSPARK_PYTHON from the Interpreter configuration screen ?

 Thanks,


 On Mon, Mar 20, 2017 at 2:15 PM, Ruslan Dautkhanov <
 dautkha...@gmail.com> wrote:

 You can set PYSPARK_PYTHON environment variable for that.

 Not sure about zeppelin.pyspark.python. I think it does not work
 See comments in https://issues.apache.org/jira/browse/ZEPPELIN-1265

 Eventually, i think we can remove zeppelin.pyspark.python and use only
 PYSPARK_PYTHON instead to avoid confusion.


 --
 Ruslan Dautkhanov

 On Mon, Mar 20, 2017 at 12:59 PM, William Markito Oliveira <
 mark...@apache.org> wrote:

 I'm trying to use zeppelin.pyspark.python as the variable to set the
 python that Spark worker nodes should use for my job, but it doesn't seem
 to be working.

 Am I missing something or this variable does not do that ?

 My goal is to change that variable to point to different conda
 environments.  These environments are available in all worker nodes since
 it's on a shared location and ideally all nodes then would have access to
 the same libraries and dependencies.

 Thanks,

 ~/William





 --
 ~/William

>>>
>>
>>
>> --
>> ~/William
>>
>
>
>
> --
> ~/William
>


Re: Should zeppelin.pyspark.python be used on the worker nodes ?

2017-03-20 Thread William Markito Oliveira
Hi moon, thanks for the tip. To summarize, here are my current settings:

conf/zeppelin-env.sh has only SPARK_HOME setting:

export SPARK_HOME=/opt/spark-2.1.0-bin-hadoop2.7/

Then on the configuration of the interpreter through the web interface I
have:

PYSPARK_PYTHON=/opt/miniconda2/envs/myenv/bin/python
zeppelin.pyspark.python=python

But when I submit from the notebook I'm receiving:  pyspark is not
responding

And the log file outputs:

Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark-6480867511995958556.py", line 22, in <module>
    from pyspark.conf import SparkConf
ImportError: No module named pyspark.conf

Any thoughts? Thanks a lot!
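
Once the interpreter does start, a quick sanity check (a sketch; sc is the
SparkContext that Zeppelin injects into %pyspark paragraphs) to confirm which
Python the driver and the workers actually picked up:

    %pyspark
    import sys
    print(sys.executable)                                       # Python on the driver side
    print(sc.range(1).map(lambda _: sys.executable).collect())  # Python on a worker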

On Mon, Mar 20, 2017 at 2:27 PM, moon soo Lee  wrote:

> When property key in interpreter configuration screen matches certain
> condition [1], it'll be treated as a environment variable.
>
> You can remove PYSPARK_PYTHON from conf/zeppelin-env.sh and place it in
> interpreter configuration.
>
> Thanks,
> moon
>
> [1] https://github.com/apache/zeppelin/blob/master/zeppelin-
> interpreter/src/main/java/org/apache/zeppelin/interpreter/
> remote/RemoteInterpreter.java#L152
>
>
> On Mon, Mar 20, 2017 at 12:21 PM William Markito Oliveira <
> william.mark...@gmail.com> wrote:
>
>> Thanks for the quick response Ruslan.
>>
>> But given that it's an environment variable, I can't quickly change that
>> value and point to a different python environment without restarting the
>> Zeppelin process, can I ? I mean is there a way to set the value for
>> PYSPARK_PYTHON from the Interpreter configuration screen ?
>>
>> Thanks,
>>
>>
>> On Mon, Mar 20, 2017 at 2:15 PM, Ruslan Dautkhanov 
>> wrote:
>>
>> You can set PYSPARK_PYTHON environment variable for that.
>>
>> Not sure about zeppelin.pyspark.python. I think it does not work
>> See comments in https://issues.apache.org/jira/browse/ZEPPELIN-1265
>>
>> Eventually, i think we can remove zeppelin.pyspark.python and use only
>> PYSPARK_PYTHON instead to avoid confusion.
>>
>>
>> --
>> Ruslan Dautkhanov
>>
>> On Mon, Mar 20, 2017 at 12:59 PM, William Markito Oliveira <
>> mark...@apache.org> wrote:
>>
>> I'm trying to use zeppelin.pyspark.python as the variable to set the
>> python that Spark worker nodes should use for my job, but it doesn't seem
>> to be working.
>>
>> Am I missing something or this variable does not do that ?
>>
>> My goal is to change that variable to point to different conda
>> environments.  These environments are available in all worker nodes since
>> it's on a shared location and ideally all nodes then would have access to
>> the same libraries and dependencies.
>>
>> Thanks,
>>
>> ~/William
>>
>>
>>
>>
>>
>> --
>> ~/William
>>
>


-- 
~/William


Re: Should zeppelin.pyspark.python be used on the worker nodes ?

2017-03-20 Thread William Markito Oliveira
Ah! Thanks Ruslan! I'm still using 0.7.0 - let me update to 0.8.0 and I'll
come back and update this thread with the results.

On Mon, Mar 20, 2017 at 3:10 PM, William Markito Oliveira <
william.mark...@gmail.com> wrote:

> Hi moon, thanks for the tip. Here to summarize my current settings are the
> following
>
> conf/zeppelin-env.sh has only SPARK_HOME setting:
>
> export SPARK_HOME=/opt/spark-2.1.0-bin-hadoop2.7/
>
> Then on the configuration of the interpreter through the web interface I
> have:
>
> PYSPARK_PYTHON=/opt/miniconda2/envs/myenv/bin/python
> zeppelin.pyspark.python=python
>
> But when I submit from the notebook I'm receiving:  pyspark is not
> responding
>
> And the log file outputs:
>
> Traceback (most recent call last): File 
> "/tmp/zeppelin_pyspark-6480867511995958556.py",
> line 22, in  from pyspark.conf import SparkConf ImportError: No
> module named pyspark.conf
>
> Any thoughts ?  Thanks a lot!
>
> On Mon, Mar 20, 2017 at 2:27 PM, moon soo Lee  wrote:
>
>> When property key in interpreter configuration screen matches certain
>> condition [1], it'll be treated as a environment variable.
>>
>> You can remove PYSPARK_PYTHON from conf/zeppelin-env.sh and place it in
>> interpreter configuration.
>>
>> Thanks,
>> moon
>>
>> [1] https://github.com/apache/zeppelin/blob/master/zeppelin-
>> interpreter/src/main/java/org/apache/zeppelin/interpreter/re
>> mote/RemoteInterpreter.java#L152
>>
>>
>> On Mon, Mar 20, 2017 at 12:21 PM William Markito Oliveira <
>> william.mark...@gmail.com> wrote:
>>
>>> Thanks for the quick response Ruslan.
>>>
>>> But given that it's an environment variable, I can't quickly change that
>>> value and point to a different python environment without restarting the
>>> Zeppelin process, can I ? I mean is there a way to set the value for
>>> PYSPARK_PYTHON from the Interpreter configuration screen ?
>>>
>>> Thanks,
>>>
>>>
On Mon, Mar 20, 2017 at 2:15 PM, Ruslan Dautkhanov wrote:
>>>
>>> You can set PYSPARK_PYTHON environment variable for that.
>>>
>>> Not sure about zeppelin.pyspark.python. I think it does not work
>>> See comments in https://issues.apache.org/jira/browse/ZEPPELIN-1265
>>>
>>> Eventually, i think we can remove zeppelin.pyspark.python and use only
>>> PYSPARK_PYTHON instead to avoid confusion.
>>>
>>>
>>> --
>>> Ruslan Dautkhanov
>>>
>>> On Mon, Mar 20, 2017 at 12:59 PM, William Markito Oliveira <
>>> mark...@apache.org> wrote:
>>>
>>> I'm trying to use zeppelin.pyspark.python as the variable to set the
>>> python that Spark worker nodes should use for my job, but it doesn't seem
>>> to be working.
>>>
>>> Am I missing something or this variable does not do that ?
>>>
>>> My goal is to change that variable to point to different conda
>>> environments.  These environments are available in all worker nodes since
>>> it's on a shared location and ideally all nodes then would have access to
>>> the same libraries and dependencies.
>>>
>>> Thanks,
>>>
>>> ~/William
>>>
>>>
>>>
>>>
>>>
>>> --
>>> ~/William
>>>
>>
>
>
> --
> ~/William
>



-- 
~/William


Re: Should zeppelin.pyspark.python be used on the worker nodes ?

2017-03-20 Thread Ruslan Dautkhanov
You're right - it will not be dynamic.

You may want to check
https://issues.apache.org/jira/browse/ZEPPELIN-2195
https://github.com/apache/zeppelin/pull/2079
It seems this is fixed in the current snapshot of Zeppelin (committed 3 weeks
ago).






-- 
Ruslan Dautkhanov

On Mon, Mar 20, 2017 at 1:21 PM, William Markito Oliveira <
william.mark...@gmail.com> wrote:

> Thanks for the quick response Ruslan.
>
> But given that it's an environment variable, I can't quickly change that
> value and point to a different python environment without restarting the
> Zeppelin process, can I ? I mean is there a way to set the value for
> PYSPARK_PYTHON from the Interpreter configuration screen ?
>
> Thanks,
>
>
> On Mon, Mar 20, 2017 at 2:15 PM, Ruslan Dautkhanov 
> wrote:
>
>> You can set PYSPARK_PYTHON environment variable for that.
>>
>> Not sure about zeppelin.pyspark.python. I think it does not work
>> See comments in https://issues.apache.org/jira/browse/ZEPPELIN-1265
>>
>> Eventually, i think we can remove zeppelin.pyspark.python and use only
>> PYSPARK_PYTHON instead to avoid confusion.
>>
>>
>> --
>> Ruslan Dautkhanov
>>
>> On Mon, Mar 20, 2017 at 12:59 PM, William Markito Oliveira <
>> mark...@apache.org> wrote:
>>
>>> I'm trying to use zeppelin.pyspark.python as the variable to set the
>>> python that Spark worker nodes should use for my job, but it doesn't seem
>>> to be working.
>>>
>>> Am I missing something or this variable does not do that ?
>>>
>>> My goal is to change that variable to point to different conda
>>> environments.  These environments are available in all worker nodes since
>>> it's on a shared location and ideally all nodes then would have access to
>>> the same libraries and dependencies.
>>>
>>> Thanks,
>>>
>>> ~/William
>>>
>>
>>
>
>
> --
> ~/William
>


Re: Should zeppelin.pyspark.python be used on the worker nodes ?

2017-03-20 Thread moon soo Lee
When a property key in the interpreter configuration screen matches a certain
condition [1], it will be treated as an environment variable.

You can remove PYSPARK_PYTHON from conf/zeppelin-env.sh and place it in
interpreter configuration.

Thanks,
moon

[1]
https://github.com/apache/zeppelin/blob/master/zeppelin-interpreter/src/main/java/org/apache/zeppelin/interpreter/remote/RemoteInterpreter.java#L152
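
Roughly, the check linked above treats property keys that look like
environment variable names (upper-case letters, digits and underscores) as
environment variables for the interpreter process, while other keys are passed
through as ordinary interpreter properties. Using the values from this thread
as a sketch:

    PYSPARK_PYTHON=/opt/miniconda2/envs/myenv/bin/python   (exported into the interpreter's environment)
    zeppelin.pyspark.python=python                         (passed as an interpreter property)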


On Mon, Mar 20, 2017 at 12:21 PM William Markito Oliveira <
william.mark...@gmail.com> wrote:

> Thanks for the quick response Ruslan.
>
> But given that it's an environment variable, I can't quickly change that
> value and point to a different python environment without restarting the
> Zeppelin process, can I ? I mean is there a way to set the value for
> PYSPARK_PYTHON from the Interpreter configuration screen ?
>
> Thanks,
>
>
> On Mon, Mar 20, 2017 at 2:15 PM, Ruslan Dautkhanov 
> wrote:
>
> You can set PYSPARK_PYTHON environment variable for that.
>
> Not sure about zeppelin.pyspark.python. I think it does not work
> See comments in https://issues.apache.org/jira/browse/ZEPPELIN-1265
>
> Eventually, i think we can remove zeppelin.pyspark.python and use only
> PYSPARK_PYTHON instead to avoid confusion.
>
>
> --
> Ruslan Dautkhanov
>
> On Mon, Mar 20, 2017 at 12:59 PM, William Markito Oliveira <
> mark...@apache.org> wrote:
>
> I'm trying to use zeppelin.pyspark.python as the variable to set the
> python that Spark worker nodes should use for my job, but it doesn't seem
> to be working.
>
> Am I missing something or this variable does not do that ?
>
> My goal is to change that variable to point to different conda
> environments.  These environments are available in all worker nodes since
> it's on a shared location and ideally all nodes then would have access to
> the same libraries and dependencies.
>
> Thanks,
>
> ~/William
>
>
>
>
>
> --
> ~/William
>


Re: Should zeppelin.pyspark.python be used on the worker nodes ?

2017-03-20 Thread William Markito Oliveira
Thanks for the quick response Ruslan.

But given that it's an environment variable, I can't quickly change that
value and point to a different Python environment without restarting the
Zeppelin process, can I? I mean, is there a way to set the value for
PYSPARK_PYTHON from the Interpreter configuration screen?

Thanks,


On Mon, Mar 20, 2017 at 2:15 PM, Ruslan Dautkhanov 
wrote:

> You can set PYSPARK_PYTHON environment variable for that.
>
> Not sure about zeppelin.pyspark.python. I think it does not work
> See comments in https://issues.apache.org/jira/browse/ZEPPELIN-1265
>
> Eventually, i think we can remove zeppelin.pyspark.python and use only
> PYSPARK_PYTHON instead to avoid confusion.
>
>
> --
> Ruslan Dautkhanov
>
> On Mon, Mar 20, 2017 at 12:59 PM, William Markito Oliveira <
> mark...@apache.org> wrote:
>
>> I'm trying to use zeppelin.pyspark.python as the variable to set the
>> python that Spark worker nodes should use for my job, but it doesn't seem
>> to be working.
>>
>> Am I missing something or this variable does not do that ?
>>
>> My goal is to change that variable to point to different conda
>> environments.  These environments are available in all worker nodes since
>> it's on a shared location and ideally all nodes then would have access to
>> the same libraries and dependencies.
>>
>> Thanks,
>>
>> ~/William
>>
>
>


-- 
~/William


Re: Should zeppelin.pyspark.python be used on the worker nodes ?

2017-03-20 Thread Ruslan Dautkhanov
You can set the PYSPARK_PYTHON environment variable for that.

Not sure about zeppelin.pyspark.python - I think it does not work;
see comments in https://issues.apache.org/jira/browse/ZEPPELIN-1265

Eventually, I think we can remove zeppelin.pyspark.python and use only
PYSPARK_PYTHON instead, to avoid confusion.
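
For example, a minimal sketch of setting it globally (reusing the conda path
mentioned elsewhere in this thread as an example location) is to add it to
conf/zeppelin-env.sh:

    export PYSPARK_PYTHON=/opt/miniconda2/envs/myenv/bin/python

Note that a value set this way is only read when Zeppelin starts, so changing
it requires restarting the Zeppelin process.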


-- 
Ruslan Dautkhanov

On Mon, Mar 20, 2017 at 12:59 PM, William Markito Oliveira <
mark...@apache.org> wrote:

> I'm trying to use zeppelin.pyspark.python as the variable to set the
> python that Spark worker nodes should use for my job, but it doesn't seem
> to be working.
>
> Am I missing something or this variable does not do that ?
>
> My goal is to change that variable to point to different conda
> environments.  These environments are available in all worker nodes since
> it's on a shared location and ideally all nodes then would have access to
> the same libraries and dependencies.
>
> Thanks,
>
> ~/William
>


Should zeppelin.pyspark.python be used on the worker nodes ?

2017-03-20 Thread William Markito Oliveira
I'm trying to use zeppelin.pyspark.python as the variable to set the Python
that Spark worker nodes should use for my job, but it doesn't seem to be
working.

Am I missing something, or does this variable not do that?

My goal is to change that variable to point to different conda
environments. These environments are available on all worker nodes, since
they're in a shared location, so ideally all nodes would then have access to
the same libraries and dependencies.

Thanks,

~/William