I have one more question/requirement in this context. Say I have two interpreters (spark1 and spark2) created from the same base interpreter (spark). spark1 connects to a local Spark environment, whereas spark2 connects to a remote standalone Spark cluster. Both of them use the same Hive on a Hadoop cluster.
What I want to do is, using spark1, read a local file and save it to Hive as a table. Then, using spark2, I want to process that data together with other data in Hive. I want to use spark2 because of its bigger infrastructure for computation-intensive processing. Now, I can do this using two different notebooks by specifying spark1 and spark2 respectively as the interpreter in each of them. But I cannot do the same within a single notebook, because both spark1 and spark2 use the same set of tags, like %sql, %dep, etc. Any idea whether this is still doable with some different configuration/workaround? Regards, Sourav On Thu, Sep 24, 2015 at 10:53 PM, tog <guillaume.all...@gmail.com> wrote: > Hi Alex > > Yes, I think the multi-tenancy set-up has raised numerous questions > recently. It might be interesting to dedicate a web page in the docs to your > container approach. > > Thanks > Guillaume > > On Friday, 25 September 2015, Alex <abezzu...@nflabs.com> wrote: > >> Hi, >> >> A Spark context is bound to a Spark interpreter instance, each running in a >> separate process. >> >> All notes that share the same interpreter are sharing the context too >> (among other things). >> >> You can achieve the desired behaviour in a multi-user environment right now, >> i.e. by creating a separate Spark interpreter for each user, in case all >> users share access to the same Zeppelin instance. >> >> Another approach that we use for our customers is to host a separate >> Zeppelin instance in a container, one per user, and have a balancing >> reverse proxy in front of them. >> >> I can share more details on this multi-tenancy setup if enough people >> from the community are interested in it. >> >> Hope this helps! >> >> -- >> Kind regards, >> Alexander >> >> On 25 Sep 2015, at 00:54, Yian Shang <yian.sh...@gmail.com> wrote: >> >> Are there any plans to change this so that there will be a separate Spark >> context per Notebook?
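One possible shape for the workflow Sourav describes, sketched as Zeppelin paragraphs. This assumes the two Spark interpreter instances can be selected by name with %spark1 and %spark2 paragraph prefixes; those prefixes, the paths, and the table names are hypothetical, and whether per-paragraph interpreter selection by name is available depends on the Zeppelin version in use:

```
%spark1
// Read a local file with the local interpreter and persist it to the
// shared Hive metastore (Spark 1.x DataFrame API).
val local = sqlContext.read.json("file:///path/to/local/data.json")
local.write.saveAsTable("staging_table")
```

```
%spark2
// Pick the staged table up on the larger standalone cluster and join it
// with other Hive data for the computation-intensive step.
val staged = sqlContext.table("staging_table")
val joined = staged.join(sqlContext.table("other_hive_table"), "key")
joined.write.saveAsTable("result_table")
```

Because the two interpreters share one Hive metastore, the table acts as the hand-off point between the two contexts; if named interpreter selection is not available, the same hand-off works across two notebooks, as Sourav notes.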
In a multi-user environment, it is hard to deal with >> the accidental overwriting of user variables. >> >> On Thu, Sep 24, 2015 at 7:19 AM, Rick Moritz <rah...@gmail.com> wrote: >> >>> Different instances of Zeppelin (even under the same user) are indeed >>> separate, which is (currently) the only way to get any kind of independence >>> into notebooks. In comparison, spark-notebook spawns one Spark context per >>> notebook, which is a somewhat better design, since concurrent users of the >>> same application aren't overwriting each other's variables accidentally, >>> and each notebook is indeed "repeatable" and "stand-alone", which is a >>> current deficit of Zeppelin, especially in a multi-user environment. >>> So yes, closing one context in one instance of Zeppelin will not >>> interfere with the other Spark context in the other instance of Zeppelin. >>> >>> On Thu, Sep 24, 2015 at 4:02 PM, Hammad <ham...@flexilogix.com> wrote: >>> >>>> Very useful indeed, Rick! >>>> >>>> If I have two Zeppelin instances running as two different users with the >>>> same Spark master, I see them as two different applications in the Spark Web >>>> UI. >>>> >>>> 1. Will they have their own 'context' of execution in this case? If I >>>> understand correctly, this would mean that closing a Spark context in one user's >>>> Zeppelin will have no impact on another user's Zeppelin environment, or is that >>>> not true? >>>> >>>> On Thu, Sep 24, 2015 at 4:47 PM, Rick Moritz <rah...@gmail.com> wrote: >>>> >>>>> 1) >>>>> Zeppelin uses the spark-shell REPL API. Therefore it behaves similarly >>>>> to the Scala shell. >>>>> You do not write applications in the shell, in the technical sense, >>>>> but instead evaluate individual expressions with the goal of interacting >>>>> with a dataset. >>>>> You can (manually) export some of the code that you find useful in >>>>> Zeppelin to applications, for example to provide batch pre-processing.
>>>>> I recommend you look at demos/descriptions of the interactive shell >>>>> functionality to get an idea of what Zeppelin offers over an application. >>>>> Also: you still have to manage most of your imports ;) >>>>> >>>>> 2) >>>>> There are two benefits: >>>>> - You can import and export/share notebooks. This means it makes sense >>>>> to split content. >>>>> - You also reduce the load on the browser by splitting heavy >>>>> visualizations into multiple notebooks. Once you start rendering tens of >>>>> thousands of points, you start reaching the limits of a browser's >>>>> capability. >>>>> >>>>> Hopefully this helps you get started. >>>>> >>>>> On Thu, Sep 24, 2015 at 1:04 PM, Hammad <ham...@flexilogix.com> wrote: >>>>> >>>>>> Hi mates, >>>>>> >>>>>> I was struggling with the anatomy of Zeppelin in the context of Spark and >>>>>> could not find anything that answered the questions I had in mind, as below: >>>>>> >>>>>> 1. Usually a Scala application's structure is: >>>>>> >>>>>> import org.apache.<whatever> >>>>>> >>>>>> object MyApp { >>>>>> def main(args: Array[String]) { >>>>>> //something >>>>>> } >>>>>> } >>>>>> >>>>>> whereas on Zeppelin we only write //something. Does it mean that one >>>>>> Zeppelin daemon is one application? What if I want to write multiple >>>>>> applications on one Zeppelin daemon instance? >>>>>> >>>>>> 2. Related to (1), if the same Spark context is shared across all >>>>>> notebooks, what's the benefit of having multiple notebooks? >>>>>> >>>>>> I would really appreciate it if someone could help me understand the above two. >>>>>> >>>>>> Thanks, >>>>>> Hammad >>>>>> >>>>> >>>>> >>>> >>>> >>>> -- >>>> Flexilogix >>>> Ph: +92 618090374 >>>> Fax: +92 612011810 >>>> http://www.flexilogix.com >>>> i...@flexilogix.com >>>> >>>> Disclaimer: This transmission (including any attachments) may contain >>>> confidential information, privileged material or constitute non-public >>>> information.
Any use of this information by anyone other than the intended >>>> recipient is prohibited. If you have received this transmission in error, >>>> please immediately reply to the sender and delete this information from >>>> your system. >>>> >>> >>> >> > > -- > PGP KeyID: 2048R/EA31CFC9 subkeys.pgp.net >
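To make Rick's REPL point from earlier in the thread concrete: the same logic written as a standalone Spark application versus typed into a Zeppelin paragraph might look like this. This is a sketch against the Spark 1.x API; the app name, paths, and the word-count example itself are made up for illustration:

```scala
// Standalone application: you create and stop the SparkContext yourself.
import org.apache.spark.{SparkConf, SparkContext}

object MyApp {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("MyApp"))
    val counts = sc.textFile("hdfs:///data/input.txt")
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
    counts.saveAsTextFile("hdfs:///data/output")
    sc.stop()
  }
}
```

In a Zeppelin paragraph you would write only the body of main: the interpreter already provides the context as sc, so the same pipeline can be evaluated interactively, e.g. ending in .take(10) to inspect a sample instead of writing the result out.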