I would actually argue that if the user doesn't have access to the same or a similar interpreter.json file, then notebook file portability is a moot point. For example, if I set up %spark or %jdbc in my environment and create a notebook, that notebook is no more or less portable than if I had used %myspark or %drill (a jdbc interpreter), mainly because if someone opens that notebook without my setup of %spark or %jdbc, they can't run it. If we allowed the user to create an alias for an instance of an interpreter, and that alias information were stored in interpreter.json, then the portability of the notebook would be essentially the same.
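To make that concrete, something along these lines in interpreter.json is roughly what I have in mind. This is a purely hypothetical sketch, not the current schema: the alias names, the "group" field, and the property keys and values are made up for illustration.

  {
    "entspark": {
      "group": "spark",
      "properties": {
        "master": "yarn-client",
        "spark.executor.memory": "4g"
      }
    },
    "drill": {
      "group": "jdbc",
      "properties": {
        "default.driver": "org.apache.drill.jdbc.Driver",
        "default.url": "jdbc:drill:zk=<your-zk-quorum>"
      }
    }
  }

The point is only that the alias-to-settings mapping lives in the same file the notebook already depends on, so importing a note that uses %drill or %entspark is no harder than importing one that uses %jdbc or %spark.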
Said another way:

Static interpreter invocation (%jdbc, %pyspark, %psql): this notebook is 100% dependent on interpreter.json in order to run. %jdbc may point to Drill, %pyspark may point to an authenticated YARN instance (specific to the user/org), %psql may point to an authenticated Postgres instance unique to the org/user. Without interpreter.json, this notebook is not portable.

Aliased interpreter invocation, with the aliases stored in interpreter.json (%drill -> jdbc with Drill settings, %datasciencespark -> pyspark for the data science group, %entdw -> the Postgres server for the enterprise data warehouse): this notebook is still 100% dependent on interpreter.json in order to run. There is no more or less dependence on interpreter.json (if the aliases are stored there) than there is with static interpreter invocation, so portability is not a benefit of the static method, and the aliased method can give analysts a good deal of agility/definition in a multi-dataset, multi-source environment.

My thought is we should allow people to create new interpreters of known types, and on creation of these interpreters store the invocation in interpreter.json. Also, if a new interpreter is registered, it would follow the same interpreter group methodology. Thus if I set up a new %spark as %entspark, the sub interpreters (pyspark, sparksql, etc.) would still be there, would have access to the parent entspark, and could also be renamed. So a sub interpreter can be renamed, and the access it has to the interpreter group is based on the parent-child relationship, not just on the name... Thoughts?

On Fri, Feb 5, 2016 at 2:15 PM, Zhong Wang <wangzhong....@gmail.com> wrote:

> Thanks moon - it is good to know the ideas behind the design. It makes a
> lot more sense to use system-defined identifiers in order to make the
> notebook portable.
>
> Currently, I can name the interpreter in the WebUI, but actually the name
> doesn't help to distinguish between my spark interpreters, which is quite
> confusing to me. I am not sure whether this would be a better way:
> --
> 1. the UI generates the default identifier for the first spark
> interpreter, which is %spark
> 2. when the user creates another spark interpreter, the UI asks the user
> to provide a user-defined identifier
>
> Zhong
>
> On Fri, Feb 5, 2016 at 12:02 AM, moon soo Lee <m...@apache.org> wrote:
>
>> In the initial stage of development, there was discussion about %xxx:
>> whether xxx should be a user-defined interpreter identifier or a static
>> interpreter identifier.
>>
>> We decided to go with the latter, because we wanted to keep the notebook
>> file portable, i.e. let a note.json file imported from another Zeppelin
>> instance run without (or with minimal) modification.
>>
>> If we used a user-defined identifier, running an imported notebook would
>> not be very simple. This is why %xxx is not using a user-defined
>> interpreter identifier at the moment.
>>
>> If you have any other thoughts or ideas, please feel free to share.
>>
>> Thanks,
>> moon
>>
>> On Fri, Feb 5, 2016 at 3:58 PM Zhong Wang <wangzhong....@gmail.com>
>> wrote:
>>
>>> Thanks, Moon! I got it working. The reason it didn't work is that I
>>> tried to use both of the spark interpreters inside one notebook. I think
>>> I can create different notebooks for each interpreter, but it would be
>>> great if we could use "%xxx", where xxx is the user-defined interpreter
>>> identifier, to identify different interpreters for different paragraphs.
>>>
>>> Besides, because currently both of the interpreters are using "spark"
>>> as the identifier, they share the same log file. I am not sure whether
>>> there are other cases where they interfere with each other.
>>>
>>> Thanks,
>>> Zhong
>>>
>>> On Thu, Feb 4, 2016 at 9:04 PM, moon soo Lee <m...@apache.org> wrote:
>>>
>>>> Hi,
>>>>
>>>> Once you create another spark interpreter in the Interpreter menu of
>>>> the GUI, each notebook should be able to select and use it (setting
>>>> icon on the top right corner of each notebook).
>>>>
>>>> If it does not work, could you find the error message in the log file?
>>>>
>>>> Thanks,
>>>> moon
>>>>
>>>> On Fri, Feb 5, 2016 at 11:54 AM Zhong Wang <wangzhong....@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi zeppelin pilots,
>>>>>
>>>>> I am trying to run multiple spark interpreters in the same Zeppelin
>>>>> instance. This is very helpful if the data comes from multiple spark
>>>>> clusters.
>>>>>
>>>>> Another useful use case is to run one instance in cluster mode and
>>>>> another in local mode. This can significantly boost the performance
>>>>> of small-data analysis.
>>>>>
>>>>> Is there any way to run multiple spark interpreters? I tried to
>>>>> create another spark interpreter with a different identifier, which
>>>>> is allowed in the UI, but it doesn't work (shall I file a ticket?).
>>>>>
>>>>> I am now trying to run multiple SparkContexts in the same spark
>>>>> interpreter.
>>>>>
>>>>> Zhong
>>>>>
>>>>>
>>>
>