Moon - I would be curious to hear your thoughts on my email from April 12th, quoted below (a rough sketch of the aliasing idea in interpreter.json form follows the quoted thread).
John

On Tue, Apr 12, 2016 at 7:11 AM, John Omernik <j...@omernik.com> wrote:

> I would actually argue that if the user doesn't have access to the same or
> a similar interpreter.json file, then notebook file portability is a moot
> point. For example, if I set up %spark or %jdbc in my environment and
> create a notebook, that notebook is not any more or less portable than if
> I had %myspark or %drill (a jdbc interpreter). Mainly, because if someone
> tries to open that notebook and they don't have my setup of %spark or of
> %jdbc, they can't run the notebook. If we could allow the user to create
> an alias for an instance of an interpreter, and that alias information was
> stored in interpreter.json, then the portability of the notebook would be
> essentially the same.
>
> Said another way:
>
> Static interpreter invocation (%jdbc, %pyspark, %psql):
> - This notebook is 100% dependent on interpreter.json in order to run.
> %jdbc may point to Drill, %pyspark may point to an authenticated YARN
> instance (specific to the user/org), %psql may point to an authenticated
> Postgres instance unique to the org/user. Without interpreter.json, this
> notebook is not portable.
>
> Aliased interpreter invocation stored in interpreter.json (%drill ->
> jdbc with settings, %datasciencespark -> pyspark for the data science
> group, %entdw -> postgres server, enterprise data warehouse):
> - This notebook is still 100% dependent on the interpreter.json file in
> order to run. There is no more or less dependence on interpreter.json
> (if these aliases are stored there) than there is if Zeppelin uses
> static interpreter invocation, thus portability is not a benefit of the
> static method, and the aliased method can provide a good deal of analyst
> agility/definition in a multi data set/source environment.
>
> My thought is we should allow people to create new interpreters of known
> types, and on creation of these interpreters allow the invocation to be
> stored in interpreter.json. Also, if a new interpreter is registered, it
> would follow the same interpreter group methodology. Thus if I set up a
> new %spark as %entspark, then the sub-interpreters (pyspark, sparksql,
> etc.) would still be there, have access to the parent entspark, and could
> also be renamed, and the access a sub-interpreter has to its interpreter
> group would be based on the parent-child relationship, not just on the
> name...
>
> Thoughts?
>
> On Fri, Feb 5, 2016 at 2:15 PM, Zhong Wang <wangzhong....@gmail.com> wrote:
>
>> Thanks moon - it is good to know the ideas behind the design. It makes a
>> lot more sense to use system-defined identifiers in order to make the
>> notebook portable.
>>
>> Currently, I can name the interpreter in the WebUI, but the name doesn't
>> actually help distinguish between my spark interpreters, which is quite
>> confusing to me. I am not sure whether this would be a better way:
>> --
>> 1. the UI generates the default identifier for the first spark
>> interpreter, which is %spark
>> 2. when the user creates another spark interpreter, the UI asks the user
>> to provide a user-defined identifier
>>
>> Zhong
>>
>> On Fri, Feb 5, 2016 at 12:02 AM, moon soo Lee <m...@apache.org> wrote:
>>
>>> In the initial stage of development, there was discussion about whether
>>> xxx in %xxx should be a user-defined interpreter identifier or a static
>>> interpreter identifier.
>>>
>>> We decided to go with the latter, because we wanted to keep the notebook
>>> file portable, i.e. let a note.json file imported from another Zeppelin
>>> instance run without (or with minimal) modification.
>>>
>>> If we used user-defined identifiers, running an imported notebook would
>>> not be very simple. This is why %xxx does not use a user-defined
>>> interpreter identifier at the moment.
>>>
>>> If you have any other thoughts or ideas, please feel free to share.
>>>
>>> Thanks,
>>> moon
>>>
>>> On Fri, Feb 5, 2016 at 3:58 PM Zhong Wang <wangzhong....@gmail.com> wrote:
>>>
>>>> Thanks, Moon! I got it working. The reason it didn't work before is
>>>> that I tried to use both spark interpreters inside one notebook. I
>>>> think I can create a different notebook for each interpreter, but it
>>>> would be great if we could use "%xxx", where xxx is the user-defined
>>>> interpreter identifier, to select different interpreters for different
>>>> paragraphs.
>>>>
>>>> Besides, because both interpreters currently use "spark" as the
>>>> identifier, they share the same log file. I am not sure whether there
>>>> are other cases where they interfere with each other.
>>>>
>>>> Thanks,
>>>> Zhong
>>>>
>>>> On Thu, Feb 4, 2016 at 9:04 PM, moon soo Lee <m...@apache.org> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Once you create another spark interpreter in the Interpreter menu of
>>>>> the GUI, each notebook should be able to select and use it (setting
>>>>> icon in the top right corner of each notebook).
>>>>>
>>>>> If it does not work, could you look for an error message in the log
>>>>> file?
>>>>>
>>>>> Thanks,
>>>>> moon
>>>>>
>>>>> On Fri, Feb 5, 2016 at 11:54 AM Zhong Wang <wangzhong....@gmail.com> wrote:
>>>>>
>>>>>> Hi zeppelin pilots,
>>>>>>
>>>>>> I am trying to run multiple spark interpreters in the same Zeppelin
>>>>>> instance. This is very helpful if the data comes from multiple spark
>>>>>> clusters.
>>>>>>
>>>>>> Another useful use case is to run one instance in cluster mode and
>>>>>> another in local mode, which significantly boosts the performance of
>>>>>> small data analysis.
>>>>>>
>>>>>> Is there any way to run multiple spark interpreters? I tried to
>>>>>> create another spark interpreter with a different identifier, which
>>>>>> the UI allows, but it doesn't work (shall I file a ticket?).
>>>>>>
>>>>>> I am now trying to run multiple SparkContexts in the same spark
>>>>>> interpreter.
>>>>>>
>>>>>> Zhong
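To make the alias idea from the quoted thread a bit more concrete, here is a rough, simplified sketch of what aliased entries in interpreter.json might look like. This is not the actual interpreter.json schema (which differs between Zeppelin versions and carries more fields per entry); the IDs, URLs, and property values are hypothetical, and the "name" field is standing in for the proposed alias:

    {
      "interpreterSettings": {
        "hypothetical-id-1": {
          "name": "drill",
          "group": "jdbc",
          "properties": {
            "default.driver": "org.apache.drill.jdbc.Driver",
            "default.url": "jdbc:drill:zk=zk.example.com:2181/drill/cluster"
          }
        },
        "hypothetical-id-2": {
          "name": "datasciencespark",
          "group": "spark",
          "properties": {
            "master": "yarn-client"
          }
        }
      }
    }

A paragraph would then start with %drill or %datasciencespark instead of %jdbc or %pyspark, and the resulting notebook would be exactly as portable (or not) as one using the static names, since both depend on the same interpreter.json.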