Since almost everyone agrees on running serially by default, we could implement 
that first. As for the parallel mode, we could leave it for the future, although 
personally I prefer to define a DAG for the note.


Best Regards,
Jeff Zhang


From: Michael Segel <msegel_had...@hotmail.com>
Reply-To: "users@zeppelin.apache.org" <users@zeppelin.apache.org>
Date: Friday, October 6, 2017 at 10:08 PM
To: "users@zeppelin.apache.org" <users@zeppelin.apache.org>
Subject: Re: Implementing run all paragraphs sequentially

Guys…

1) You’re posting this to the user list… Isn’t this a dev question?

2) +1 on the run serial… but doesn’t that already exist with the “run all 
paragraphs” button?

3) -1 on a ‘run all in parallel’ button.  (It’s like putting lipstick on a pig.)

Are you really going to run all of the paragraphs in parallel?  You’re not 
going to have a paragraph that is used to set things up? Import external 
libraries?  Define classes/functions for future paragraphs to use?

IMHO I would much rather see a DAG where each paragraph can set its 
dependency… (this isn’t the right term. I’m trying to think back to how it was 
described in NeXTSTEP Objective-C code.)
Then you could set your parallel button to run in parallel, but if your 
paragraph is dependent on another, it’s blocked from executing until its 
predecessor completes.

But that’s just my $0.02

On Oct 6, 2017, at 2:25 AM, Polyakov Valeriy <v.polja...@tinkoff.ru> wrote:

Thank you all for sharing the problem. Naman Mishra has started the 
implementation of serial run in [1], so I propose to come back to the 
discussion of the next step (both Parallel and Serial run buttons) after [1] is 
resolved.

[1] https://issues.apache.org/jira/browse/ZEPPELIN-2368


Valeriy Polyakov

From: Jeff Zhang [mailto:zjf...@gmail.com]
Sent: Friday, October 06, 2017 10:14 AM
To: users@zeppelin.apache.org
Subject: Re: Implementing run all paragraphs sequentially


+1 for serial run by default.  Let's leave the others for the future.

Mohit Jaggi <mohitja...@gmail.com> wrote on Fri, Oct 6, 2017 at 7:48 AM:
+1 for serial run by default.

Sent from my iPhone

On Oct 5, 2017, at 3:36 PM, moon soo Lee <m...@apache.org> wrote:
I'd like us to also consider simplicity of use.

We can have two different modes, or two different run buttons, for Serial and 
Parallel run. The benefit is the flexibility of choosing between two different 
schedulers, but for users to understand the difference between the two run 
buttons, there must be really good UI treatment.

I see there is high user demand for running a notebook sequentially. And I think 
there are 3 action items in this discussion thread.

1. Change the current "run all" button behavior from Parallel to Serial.
2. Provide both Parallel and Serial run buttons with really good UI treatment.
3. Provide a DAG.

I think 1) does not stop 2) and 3) in the future. 2) also does not stop 3) in 
the future.

So, why don't we try 1) first and keep discussing and polishing the ideas for 
2) and 3)?


Thanks,
moon

On Mon, Oct 2, 2017 at 10:22 AM Michael Segel <msegel_had...@hotmail.com> wrote:
Whoa!
Seems I walked into something.

Herval,

What do you suggest?  A simple switch that runs everything in serial, or 
everything in parallel?
That would be a very bad idea.

I gave you an example of a class of solutions where you don’t want that 
behavior.
E.g., unit testing where you have one setup and then run several unit tests in 
parallel.

If that’s not enough for you… how about if you want to test producer/consumer 
problems?

Or if you want to define classes in one paragraph but then call on them in 
later paragraphs. If everything runs in parallel from the start of time 0, you 
can’t do this.


So, if you want to do it right the first time… you need to establish a way to 
control the dependency of paragraphs. This isn’t rocket science.
And frankly not that complex.

BTW, this is the user list not the dev list…

Just saying…  ;-)


On Oct 2, 2017, at 11:24 AM, Herval Freire <hfre...@twitter.com> wrote:

"nice to have" isn't a very strong requirement. I strongly suggest you really, 
really think about this before you start pounding an overengineered solution to 
a non-issue :-)

h

On Mon, Oct 2, 2017 at 9:12 AM, Michael Segel <msegel_had...@hotmail.com> wrote:
Yes…
You have a bunch of unit tests you can run in parallel where you only need one 
constructor and one cleanup.

I would strongly suggest that you really, really think about this long and hard 
before you start to pound code.
It’s going to be harder to back out and fix than if you take the time to think 
thru the problem and not make a dumb mistake.

On Oct 2, 2017, at 11:02 AM, Herval Freire <hfre...@twitter.com> wrote:

Did anyone request such a case ("running some in parallel and some in 
sequence")? I haven't seen any requests for this in the wild (nor on this 
thread), other than theoretical "what if" - which is totally fine, when it 
doesn't introduce a lot of unnecessary complexity for little to no gain (which 
seems to be the case here)

h

On Mon, Oct 2, 2017 at 8:48 AM, Michael Segel <msegel_had...@hotmail.com> wrote:
Because that simplicity doesn’t work.

You will want to run some things serial and some things in parallel.

Which is why you will need a dependency graph.

On Oct 2, 2017, at 10:40 AM, Herval Freire <hfre...@twitter.com> wrote:

Why do you need rules and graphs and any of that to support running everything 
sequentially or everything in parallel?

3) add a “run mode” to the note. If it’s “sequential”, run the paragraphs one 
at a time, in the order they’re defined. If parallel, run using current scheme 
(as many at the same time as the threadpool permits)

Simpler and covers all cases, imo
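
A minimal sketch of the "run mode" idea above, assuming each paragraph can be 
modeled as a plain Python callable. The names run_note, run_mode and 
max_workers are hypothetical and are not Zeppelin's API; the thread pool only 
stands in for the existing scheduler.

```python
from concurrent.futures import ThreadPoolExecutor


def run_note(paragraphs, run_mode="sequential", max_workers=4):
    """Run every paragraph of a note according to a per-note run mode.

    `paragraphs` is a list of zero-argument callables (an assumption made
    for this sketch). Results are returned in note order in both modes.
    """
    if run_mode == "sequential":
        # One at a time, in the order the paragraphs are defined.
        return [paragraph() for paragraph in paragraphs]
    if run_mode == "parallel":
        # As many at the same time as the thread pool permits.
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            futures = [pool.submit(paragraph) for paragraph in paragraphs]
            return [future.result() for future in futures]
    raise ValueError(f"unknown run mode: {run_mode!r}")


note = [lambda: "load", lambda: "transform", lambda: "report"]
print(run_note(note, run_mode="sequential"))
print(run_note(note, run_mode="parallel"))
```

In this sketch the mode is a property of the note rather than of the button, so 
a scheduled run and a manual run would behave the same way.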

________________________________
From: Polyakov Valeriy <v.polja...@tinkoff.ru>
Sent: Monday, October 2, 2017 8:24:35 AM
To: users@zeppelin.apache.org
Subject: RE: Implementing run all paragraphs sequentially

Let me try to summarize the discussion. Evidently, the current behavior of 
running notes does not meet actual requirements. The most important thing we 
need is the ability to run sequentially. At the same time, however, we want to 
keep the ability to run in parallel. We discussed that the most suitable way to 
build paragraph dependencies is a DAG (directed acyclic graph). Therefore, such 
dependencies should be defined in the note, and the running order should not 
depend on how we launch it (button / scheduler / API). Our objectives, then, are 
to implement a “dependency definition engine” and to use it in a “run engine”. 
What are the options?
1)      Explicit dependency definition.
We could take as a rule that each paragraph waits for ALL previous paragraphs 
to finish executing. Then we add a paragraph option “Wait for …” where we can 
choose the paragraph whose completion we wait for before starting execution. 
When the option is set, we start execution immediately after the selected 
paragraph finishes. This pattern allows us to implement a fully parallel DAG 
running order. What are the disadvantages? They all come down to the same 
thing: the dependency management process is not easy for users to understand 
(and, in my personal view, the functionality is probably redundant). First, we 
would have to use the strange format of paragraph IDs, which in addition is 
hidden. We could come up with visible and handsome paragraph ID aliases, but 
then duplicates would have to be controlled. Second, there are scenarios where 
we must change existing dependencies (e.g. to add a new paragraph between one 
paragraph and its dependent group, you have to change the “Wait for …” option 
for each paragraph in the group).
2)      Implicit dependency definition.

We could take as a rule that each paragraph waits for ALL previous paragraphs 
to finish executing. Then we add a paragraph option “Run in parallel with 
previous”, which allows us to create groups of paragraphs that run in parallel. 
The result is sequential running of paragraph groups: group by group, with the 
paragraphs inside each group running in parallel. This approach is much more 
understandable for users, but the obvious drawback compared with the “Explicit 
definition” is that the dependency graph and the level of parallelism are more 
limited.
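
To make options (1) and (2) concrete, here is a minimal sketch of how each 
could be turned into an execution plan. It is only an illustration under 
assumptions: paragraphs are identified by readable string IDs, the plan is a 
list of "waves" (everything in a wave may run in parallel, waves run one after 
another), and the names waves_from_wait_for and groups_from_flags are 
hypothetical, not part of Zeppelin.

```python
def waves_from_wait_for(order, wait_for):
    """Option 1: explicit "Wait for ..." dependencies.

    `order` lists paragraph IDs in note order. `wait_for` maps a paragraph
    ID to the single paragraph it waits for. A paragraph without an entry
    waits for ALL previous paragraphs, as in the proposed default rule.
    """
    level = {}
    for index, pid in enumerate(order):
        deps = [wait_for[pid]] if pid in wait_for else order[:index]
        for dep in deps:
            # Only earlier paragraphs are allowed, which keeps the graph acyclic.
            if dep not in level:
                raise ValueError(f"{pid!r} waits for unknown or later paragraph {dep!r}")
        level[pid] = 1 + max((level[dep] for dep in deps), default=-1)
    waves = [[] for _ in range(max(level.values(), default=-1) + 1)]
    for pid in order:
        waves[level[pid]].append(pid)
    return waves


def groups_from_flags(paragraphs):
    """Option 2: implicit groups via "Run in parallel with previous".

    `paragraphs` is a list of (paragraph_id, parallel_with_previous) pairs.
    """
    groups = []
    for pid, parallel_with_previous in paragraphs:
        if parallel_with_previous and groups:
            groups[-1].append(pid)  # join the group of the previous paragraph
        else:
            groups.append([pid])    # start a new group that waits for the last one
    return groups


order = ["setup", "test_a", "test_b", "report"]
print(waves_from_wait_for(order, {"test_a": "setup", "test_b": "setup"}))
print(groups_from_flags([("setup", False), ("test_a", False),
                         ("test_b", True), ("report", False)]))
```

Both calls describe the same plan for this note: setup first, then the two 
tests together, then the report. The wave view is a simplification of option 
(1), where a paragraph would really start as soon as its selected paragraph 
finishes rather than when a whole wave completes.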

I am not sure which option, (1) or (2), is the right one to implement at the 
moment. I hope to hear from product visionaries which way to choose, and to get 
approval to start the implementation.
Thank you!



Valeriy Polyakov

From: Michael Segel [mailto:msegel_had...@hotmail.com]
Sent: Saturday, September 30, 2017 4:22 PM
To: users@zeppelin.apache.org
Subject: Re: Implementing run all paragraphs sequentially

Sorry to jump in…

If you want to run paragraphs in parallel, you are going to want to have some 
sort of dependency graph.  Think of a common set up where you need to set up 
common functions and imports. (setup of %spark.dep)

A good example is if your notebook is a bunch of unit tests and you need to 
build the common tear down / set up methods to be used by the other paragraphs.

If you’re going to do that, you’ll need to build out a metadata structure where 
you can set up your dependencies as well as add things like labels beyond the 
ids (which only need to be unique to the given notebook).

Just my $0.02

On Sep 29, 2017, at 1:30 PM, moon soo Lee <m...@apache.org> wrote:

Current behavior is as parallel as possible.
The "Run notebook" button currently submits all paragraphs in a notebook to 
each interpreter's own scheduler (FIFO, Parallel) at once, and each 
interpreter's scheduler then runs its paragraphs.
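
A rough sketch of why this submission scheme gives no ordering across 
interpreters. It assumes a note is a list of (paragraph ID, interpreter) pairs 
and that each interpreter keeps a FIFO queue; the function name 
queue_by_interpreter is hypothetical and the real schedulers are more involved.

```python
from collections import defaultdict


def queue_by_interpreter(paragraphs):
    """Split a note into one queue per interpreter, preserving note order.

    Order is kept inside a queue, but nothing relates one queue to another.
    """
    queues = defaultdict(list)
    for paragraph_id, interpreter in paragraphs:
        queues[interpreter].append(paragraph_id)
    return dict(queues)


note = [("load", "jdbc"), ("transform", "pyspark"), ("export", "jdbc")]
print(queue_by_interpreter(note))
```

Here "transform" is alone in the pyspark queue, so under these assumptions it 
can start before "load" has finished in the jdbc queue.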

I think we can provide a "sequential" run button for easier use, which submits 
one paragraph and waits for it to finish before submitting the next.

And I think a sequential run button doesn't prevent us from having a more 
complex / flexible DAG in the future?

Thanks,
moon

On Fri, Sep 29, 2017 at 10:08 AM Mohit Jaggi <mohitja...@gmail.com> wrote:
What is the current behavior?

On Fri, Sep 29, 2017 at 6:56 AM, Herval Freire <hfre...@twitter.com> wrote:
At least in our case, the notebooks that we need to run sequentially are 
expected to *always* run sequentially - thus it makes more sense to be a note 
option than a per-run mode

H

_____________________________
From: moon soo Lee <m...@apache.org>
Sent: Thursday, September 28, 2017 9:03 PM
Subject: Re: Implementing run all paragraphs sequentially
To: <users@zeppelin.apache.org>
This is going to be really useful!

Curious, why do you prefer 'note option' instead of 'run option'?
Could you compare their pros and cons?

Thanks,
moon

On Thu, Sep 28, 2017 at 8:32 AM Herval Freire <hfre...@twitter.com> wrote:
+1, our internal users at Twitter also often request this

________________________________
From: Belousov Maksim Eduardovich <m.belou...@tinkoff.ru>
Sent: Thursday, September 28, 2017 8:28:58 AM
To: users@zeppelin.apache.org
Subject: Implementing run all paragraphs sequentially

Hello, users!

At the moment our analysts often mix interpreters in their notes.
For example, they prepare data using %jdbc and then use it in %pyspark. 
Besides, they often use scheduling for regular reporting, so they have to do 
something like `time.sleep()` to wait for the data from %jdbc. It doesn't 
guarantee the result and doesn't look cool.

You can find early attempts to implement sequential running of all paragraphs 
in [1].
We are really interested in implementing issue [2] and are ready to work on it.

It seems a good idea to discuss the requirements.
My idea is to introduce a note setting that defines the type of run to use 
(parallel or sequential) and to leave "Run all" as the only button that runs 
all the cells in the note. This makes sequential or parallel running a `note 
option` rather than a `run option`.
The option will be controlled by a nearby button, as shown:

<~WRD000.jpg>



For new notes the default state would be "Run sequential all"; for old ones, 
"Run parallel for interpreters".

We are glad to hear any thoughts.
Thank you.


[1] https://issues.apache.org/jira/browse/ZEPPELIN-1165
[2] https://issues.apache.org/jira/browse/ZEPPELIN-2368



Maksim Belousov
