Regarding reusing notebooks, I noticed one thing in the past week: if you
use the 'shared' Spark interpreter/context, then all your variables (at
least Scala-based ones) are shared across multiple notes. I think the
solution you are mentioning will give users better control, i.e. they can
run the Spark interpreter in any mode and still control what is shared
across notes.
However, I see some other minor problems (problems that exist in any
shell-based programming, hence not necessarily a problem, but just to share):
- Users have to be careful about using the same variable declaration across
notebooks. E.g. if I do `val abc = bla bla` in one notebook and `val abc =
123` in another, the last one executed will override the previous one.
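
To make the override concrete, here is a minimal sketch. It does not show
how Zeppelin implements shared mode; `SharedContext` and `SharedContextDemo`
are hypothetical names that only model the idea of one binding table used by
every note.

```scala
import scala.collection.mutable

// Hypothetical stand-in for one shared interpreter context: a single
// binding table that every note writes into. Not Zeppelin's actual code.
final class SharedContext {
  private val bindings = mutable.Map.empty[String, Any]

  // Defining a name again replaces whatever an earlier note bound to it.
  def define(name: String, value: Any): Unit = bindings.update(name, value)

  def lookup(name: String): Option[Any] = bindings.get(name)
}

object SharedContextDemo {
  def run(): Option[Any] = {
    val ctx = new SharedContext
    ctx.define("abc", "bla bla") // note A: val abc = bla bla
    ctx.define("abc", 123)       // note B: val abc = 123
    ctx.lookup("abc")            // the note that ran last wins
  }

  def main(args: Array[String]): Unit = println(run())
}
```

A per-note prefix on variable names (or a non-shared interpreter mode) would
avoid the clash, at the cost of losing the sharing.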
Regarding the REST API, I strongly think that a REST API is a better
alternative than a scheduler. Here are a few reasons not to rely on a
scheduler:
1. Oftentimes your model is part of a complex pipeline, i.e. it has to
be part of a complex workflow and has to rely on certain external events
2. Almost every component in your pipeline requires *parameterization*
or some kind of config (static or dynamic). In the case of dynamic
configuration it is easier to call a component (here a Zeppelin notebook) with
3. You can't rely on a standalone scheduler to get triggered at the right
time unless you can make it configurable based on external events. Even
then, I think it's not as reliable as calling via the REST API.
4. With a REST API at hand, users can design their own scheduling however
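
As a rough sketch of what calling a parameterized note over REST could look
like from an external pipeline: the endpoint path and the `params` body
shape below follow the Zeppelin notebook REST documentation as I remember
it, so please verify them against your Zeppelin version. The host, the
`NOTE_ID` / `PARAGRAPH_ID` placeholders, the `runDate` parameter, and the
`RunNoteViaRest` object are all made up for illustration. It needs JDK 11+
for `java.net.http`.

```scala
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

object RunNoteViaRest {
  // Minimal JSON string quoting: handles backslashes and double quotes only.
  private def quote(s: String): String =
    "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""

  // Builds {"params": {"key": "value", ...}}, the assumed body shape for
  // passing dynamic-form values to a paragraph.
  def paramsBody(params: Seq[(String, String)]): String =
    params
      .map { case (k, v) => s"${quote(k)}: ${quote(v)}" }
      .mkString("{\"params\": {", ", ", "}}")

  // Builds (but does not send) a POST to the assumed run-paragraph endpoint.
  def buildRequest(baseUrl: String, noteId: String, paragraphId: String,
                   params: Seq[(String, String)]): HttpRequest =
    HttpRequest
      .newBuilder(URI.create(s"$baseUrl/api/notebook/job/$noteId/$paragraphId"))
      .header("Content-Type", "application/json")
      .POST(HttpRequest.BodyPublishers.ofString(paramsBody(params)))
      .build()

  def main(args: Array[String]): Unit = {
    // Placeholder host, IDs and parameter; replace with real values.
    val request = buildRequest(
      "http://localhost:8080", "NOTE_ID", "PARAGRAPH_ID",
      Seq("runDate" -> "2016-10-15"))
    val response = HttpClient.newHttpClient()
      .send(request, HttpResponse.BodyHandlers.ofString())
    println(s"${response.statusCode()} ${response.body()}")
  }
}
```

Because the trigger is just an HTTP call, whatever external event or
workflow engine drives the pipeline can decide when to fire it and with
which parameters.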
I just developed my first model with a notebook. I will have more thoughts
once I think more about how to deploy it, retrain it from time to time, or
even re-evaluate it, etc.
On Sat, Oct 15, 2016 at 6:14 PM, moon soo Lee <m...@apache.org> wrote:
> Hi Nirav,
> Thanks for sharing your thoughts.
> I think the idea of reusing notebooks makes sense.
> One possible idea for reusing notebooks is to extend the current
> z.run(PARAGRAPH_ID), which works only for paragraphs in the same note,
> to z.run(NOTE_ID) or z.run(PARAGRAPH_ID), which would work for any note or
> paragraph in another note.
> For deploying notebooks in production, there are two approaches. One is to
> improve the REST API for use from external applications, the other is to
> enhance Zeppelin's job scheduler. I think both are valid approaches.
>  https://github.com/apache/zeppelin/blob/branch-0.6/
> On Tue, Sep 27, 2016 at 2:43 AM Nirav Patel <npa...@xactlycorp.com> wrote:
>> Currently I am using Apache Zeppelin alongside my Eclipse-based Scala
>> project. Basically, I use my Scala project to produce various intermediate
>> files, or files I need for analysis, and then use Zeppelin to create
>> different visualizations on top of those files. However, many times I find
>> myself wanting to dig more into the models that I am using. For that I
>> think it's easier to just do the modeling in Zeppelin as well, using Spark
>> MLlib or any other imported library. Is this a proper use case for Zeppelin?
>> If it is, then I think some enhancements should be added to notebooks,
>> e.g. the ability to reuse a notebook (treat it as a class or package) so
>> it can at least be imported into other notebooks. That way we can
>> define common imports, variables, files, objects (filesystem, connection
>> pool), etc.
>> Another thing to consider is how to deploy such notebooks in production,
>> e.g. how to parameterize a Zeppelin notebook and call it via REST or