[GitHub] zeppelin issue #1799: [ZEPPELIN-1165 : WIP] Code-based job workflow

2017-03-03 Thread xiufengliu
Github user xiufengliu commented on the issue:

https://github.com/apache/zeppelin/pull/1799
  
@cloverhearts Is this feature available now? I am really looking forward 
to it. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] zeppelin issue #1799: [ZEPPELIN-1165 : WIP] Code-based job workflow

2017-01-07 Thread cloverhearts
Github user cloverhearts commented on the issue:

https://github.com/apache/zeppelin/pull/1799
  
@rasehorn @zjffdu 
Thank you very much!
I understand the wait function now.
I will try to reorganize the PR based on your feedback.
Thank you for your kind comments.




[GitHub] zeppelin issue #1799: [ZEPPELIN-1165 : WIP] Code-based job workflow

2017-01-04 Thread rasehorn
Github user rasehorn commented on the issue:

https://github.com/apache/zeppelin/pull/1799
  
@cloverhearts 
I think a picture and some pseudocode tell more than a thousand words, so I 
created one.

Also: I'm only talking about the use case of ensuring a certain sequence of 
paragraph executions when runAll is called for the notebook. If you explicitly 
call z.run(paragraphId) within a notebook after runAll() was called, 
you will probably execute those paragraphs twice.

The easiest way to ensure a certain sequence of paragraph execution after 
runAll() was issued is to make the paragraphs wait for the one they depend on 
to finish. 

Let's say we have three paragraphs. 
The first one is necessary to prepare the data and define temporary tables. 
The second and third paragraphs depend on that data, so it does not make sense 
to execute them before paragraph 1 finished.
Since the last two paragraphs are in status "running" and wait in parallel 
for the first paragraph to finish, they will be executed in parallel.

Please see the picture: 
![wait pseudocode](https://cloud.githubusercontent.com/assets/22585000/21642718/903ba9be-d284-11e6-8efb-958adca7861a.jpg)

From my point of view this would be the easiest way for a Zeppelin user to 
ensure a certain sequence of paragraph execution, including control over which 
paragraphs are executed in parallel. 
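
The three-paragraph scenario above can be sketched with plain Scala futures. This is only an illustration of the waiting behaviour, not Zeppelin code: `WaitSketch`, `paragraph` and `runAll` are hypothetical names, and each "paragraph" just records its name in a log.

```scala
import java.util.concurrent.ConcurrentLinkedQueue
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object WaitSketch {
  // Hypothetical stand-in for a paragraph body: it only records that it ran.
  def paragraph(name: String, log: ConcurrentLinkedQueue[String]): String = {
    log.add(name)
    name
  }

  // Returns the order in which the three paragraphs actually ran.
  def runAll(): List[String] = {
    val log = new ConcurrentLinkedQueue[String]()
    // Paragraph 1 prepares the data.
    val p1 = Future(paragraph("p1", log))
    // Paragraphs 2 and 3 each wait for p1, but not for each other,
    // so they may run concurrently once p1 has finished.
    val p2 = p1.map(_ => paragraph("p2", log))
    val p3 = p1.map(_ => paragraph("p3", log))
    Await.result(Future.sequence(List(p2, p3)), 10.seconds)
    log.toArray(new Array[String](0)).toList
  }
}
```

In this sketch `p1` always appears first in the returned list, while the relative order of `p2` and `p3` is not fixed. If `p1` fails, both dependent futures fail as well and never run their bodies.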




[GitHub] zeppelin issue #1799: [ZEPPELIN-1165 : WIP] Code-based job workflow

2017-01-03 Thread rasehorn
Github user rasehorn commented on the issue:

https://github.com/apache/zeppelin/pull/1799
  
As far as I remember from another discussion, the paragraph IDs will change if 
you export/import or copy a notebook (not sure which one applies). If that is 
the case, the workflow will be broken after import. If the user in front of the 
screen is not familiar with the code and logic of the notebook, it might be 
difficult to fix.  

What about a simple "z.wait(ordernumber or paragraphId)" function which 
makes the paragraph wait for the paragraph referenced by the ordernumber or id 
to finish successfully or cancel the paragraph execution in case of an error? 

This way all paragraphs without z.wait will be executed in parallel and 
those calling z.wait will be executed in sequence, after the ones they depend on. 
Additionally, this kind of functionality would not be mixed with the job 
handling on notebook level.
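
A minimal sketch of what such a wait function could look like, assuming the caller can look up a paragraph's status by order number or ID. Nothing here is existing Zeppelin API: `ParagraphWait`, `Status`, `waitFor` and the `statusOf` lookup are hypothetical names used for illustration only.

```scala
object ParagraphWait {
  sealed trait Status
  case object Running extends Status
  case object Finished extends Status
  case object Failed extends Status

  // Polls the (hypothetical) status lookup until the referenced paragraph
  // is no longer running. Returns true if the caller may proceed and
  // false if it should cancel itself because the dependency failed.
  def waitFor(paragraphRef: String,
              statusOf: String => Status,
              pollMillis: Long = 10L): Boolean = {
    var status = statusOf(paragraphRef)
    while (status == Running) {
      Thread.sleep(pollMillis)
      status = statusOf(paragraphRef)
    }
    status == Finished
  }
}
```

A paragraph would call `waitFor` as its first statement and skip its body when the result is `false`. Whether the reference is an order number or a paragraph ID only changes how `statusOf` resolves it.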




[GitHub] zeppelin issue #1799: [ZEPPELIN-1165 : WIP] Code-based job workflow

2017-01-03 Thread cloverhearts
Github user cloverhearts commented on the issue:

https://github.com/apache/zeppelin/pull/1799
  

Yes, apart from the workflow, this feature (getting the paragraph status) is essential.
I want to move getZeppelinJobStatus() into a separate PR and improve the 
workflow based on the feedback gathered here.
Many Zeppelin users also seem to want to work with a DAG-type workflow 
outside of the interpreter.
I will put your opinions on this together and present a new alternative to 
this PR.

We will also separate the functions related to the workflow into other PRs, 
for example getting the paragraph status and deleting paragraph output.

Thank you very much for your opinion.




[GitHub] zeppelin issue #1799: [ZEPPELIN-1165 : WIP] Code-based job workflow

2017-01-03 Thread cloverhearts
Github user cloverhearts commented on the issue:

https://github.com/apache/zeppelin/pull/1799
  
@Leemoonsoo 
Yes, that looks good. I will make the change.




[GitHub] zeppelin issue #1799: [ZEPPELIN-1165 : WIP] Code-based job workflow

2017-01-03 Thread Leemoonsoo
Github user Leemoonsoo commented on the issue:

https://github.com/apache/zeppelin/pull/1799
  
```
z.getZeppelinJobStatus("execute note id", "execute paragraph id").getJobStatus()
```

How about not repeating `Job` and `Status`, and omitting `Zeppelin` (since `z.` 
already represents Zeppelin) in the method name?
For example, something like

```
z.getJob("note id", "paragraph id").getStatus()
```

or just

```
z.getJobStatus("note id", "paragraph id")
```




[GitHub] zeppelin issue #1799: [ZEPPELIN-1165 : WIP] Code-based job workflow

2017-01-03 Thread rasehorn
Github user rasehorn commented on the issue:

https://github.com/apache/zeppelin/pull/1799
  
I'm also a little bit confused about what this PR is really about: the pictures 
above point to paragraph execution order and control, but the discussion also 
points to notebook execution workflows. 
From my point of view, control over paragraph execution within a 
notebook is something different from defining a workflow for notebook execution, 
and mixing different features leads to poor design. 

Often paragraphs within notebooks depend on others and therefore need 
to be executed in a certain order. I feel like this kind of paragraph execution 
control should be handled by the core framework, based on settings for each 
paragraph within the notebook.

Additionally: in some places within the discussion, implementing 
that feature on interpreter level was mentioned. It is not clear to me why the 
notebook workflow definition feature should be reimplemented in different 
interpreters in different ways. The internals of a notebook are of no 
interest when it is executed within a workflow. All that matters is success or 
failure, and a definition at the workflow level of what should happen in case of a 
failure. So from my point of view the notebook workflow feature should also be 
implemented in the core code, independently of the different interpreters 
available.




[GitHub] zeppelin issue #1799: [ZEPPELIN-1165 : WIP] Code-based job workflow

2017-01-03 Thread rasehorn
Github user rasehorn commented on the issue:

https://github.com/apache/zeppelin/pull/1799
  
From my point of view this kind of functionality should be provided by the 
core framework. 
I have not created many notebooks, but what I have always done is create 
one paragraph after the other to separate data preparation from processing and 
visualization. So for the approach I apply it would be sufficient to execute 
the paragraphs in the sequence they are ordered in the notebook, and this should 
be the default behaviour. 
To support control over parallel execution of paragraphs it would be 
sufficient, from my point of view, to have a flag on each paragraph telling whether 
this paragraph can be executed in parallel, so that all subsequent paragraphs 
(by their order within the notebook, not their ID) having this flag set can also 
be executed in parallel. 
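
One possible reading of this flag idea, sketched in Scala: consecutive flagged paragraphs form one stage that may run in parallel, and every unflagged paragraph is a stage of its own. The names `ParallelFlagSketch`, `Paragraph` and `stages` are hypothetical, and the grouping rule is an assumption about what the flag would mean.

```scala
object ParallelFlagSketch {
  // Hypothetical model of a paragraph: its name and its "parallel" flag.
  final case class Paragraph(name: String, parallel: Boolean)

  // Groups paragraphs, in notebook order, into stages that run one after
  // another. Paragraphs inside the same stage may run in parallel.
  def stages(paragraphs: List[Paragraph]): List[List[String]] =
    paragraphs.foldLeft(List.empty[List[Paragraph]]) {
      // Extend the current stage while both it and the new paragraph are flagged.
      case (current :: done, p) if p.parallel && current.head.parallel =>
        (p :: current) :: done
      // Otherwise start a new stage.
      case (acc, p) =>
        List(p) :: acc
    }.reverse.map(_.reverse.map(_.name))
}
```

For a notebook with an unflagged preparation paragraph, two flagged processing paragraphs and an unflagged report paragraph, this yields three stages, with the two processing paragraphs sharing the middle one.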

This is a way of defining the paragraph execution workflow implicitly, 
without the need to program it explicitly.
But again: I'm not a power user. :-)




[GitHub] zeppelin issue #1799: [ZEPPELIN-1165 : WIP] Code-based job workflow

2017-01-02 Thread cloverhearts
Github user cloverhearts commented on the issue:

https://github.com/apache/zeppelin/pull/1799
  
@zjffdu 
Yes you are right.





[GitHub] zeppelin issue #1799: [ZEPPELIN-1165 : WIP] Code-based job workflow

2017-01-02 Thread zjffdu
Github user zjffdu commented on the issue:

https://github.com/apache/zeppelin/pull/1799
  
Thanks @cloverhearts. After reading #1176: this PR is the first phase of 
this feature (implementing a low-level API for workflow), is that correct?




[GitHub] zeppelin issue #1799: [ZEPPELIN-1165 : WIP] Code-based job workflow

2017-01-02 Thread cloverhearts
Github user cloverhearts commented on the issue:

https://github.com/apache/zeppelin/pull/1799
  
@zjffdu 
I agree with you.
But I am a bit cautious about this part.
In fact, we have re-implemented this functionality in a variety of ways, and 
we have already implemented it at the parent framework level (in a former PR).
If I re-implement it according to your suggestion, it will be a form 
that combines my previous PR with the current PR.
I need opinions from many people.

Could committers and Zeppelin users share their opinions on this?




[GitHub] zeppelin issue #1799: [ZEPPELIN-1165 : WIP] Code-based job workflow

2017-01-02 Thread zjffdu
Github user zjffdu commented on the issue:

https://github.com/apache/zeppelin/pull/1799
  
BTW, in the first phase we can provide the high-level framework to allow 
users to call it programmatically, and in the second phase it would be better 
to allow users to do it through drag & drop in the UI. 




[GitHub] zeppelin issue #1799: [ZEPPELIN-1165 : WIP] Code-based job workflow

2017-01-02 Thread zjffdu
Github user zjffdu commented on the issue:

https://github.com/apache/zeppelin/pull/1799
  
@cloverhearts What I mean is that code like the following would be written 
many times by users:
```
if (z.getZeppelinJobStatus("execute note id", "execute paragraph id").getJobStatus().isFinished()) {
  z.run("execute note id", "execute paragraph id")
}
```
It is essentially a code template, so what I suggest is that we 
create a high-level workflow framework which uses these APIs internally. 
Users then just need to specify the dependencies between paragraphs using this 
framework; they don't need to check the job status as in the code above. 
 




[GitHub] zeppelin issue #1799: [ZEPPELIN-1165 : WIP] Code-based job workflow

2017-01-02 Thread zjffdu
Github user zjffdu commented on the issue:

https://github.com/apache/zeppelin/pull/1799
  
@cloverhearts This is very interesting. I have a few questions:
1. Do the dynamic forms here mean more control flow (like if conditions 
and for loops)?
2. In case 2, if the markdown interpreter paragraph does not depend on the 
spark interpreter paragraph, we can execute them in parallel rather than 
sequentially. 
3. I think the most important part of a workflow is defining the DAG 
(the dependencies between paragraphs). Your idea is to run the paragraphs 
programmatically. Would it be more intuitive to just define the DAG (directed 
acyclic graph) and let the framework run the DAG automatically? 
e.g.

```
val flow = new JobFlow(noteId)
val note = z.getNote(noteId)
val p1 = z.getParagraph(pId1)
val p2 = z.getParagraph(pId2)
val p3 = z.getParagraph(pId3)
p3.addDependency(p2) // p3 runs after p2
p2.addDependency(p1) // p2 runs after p1
flow.add(p1).add(p2).add(p3).run()
```
4. Currently we use noteId and paragraphId, but I think these are not 
readable. We'd better use note name and paragraph name. 
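
To illustrate point 3, here is a rough sketch of how a framework could derive a run order from declared dependencies. It is independent of Zeppelin: `DagSketch` and `executionOrder` are hypothetical names, and paragraphs are identified by plain strings (which could just as well be the paragraph names suggested in point 4).

```scala
object DagSketch {
  // `deps` maps each paragraph to the set of paragraphs it depends on.
  // Returns a valid run order, or an error message if no such order exists.
  def executionOrder(deps: Map[String, Set[String]]): Either[String, List[String]] = {
    @scala.annotation.tailrec
    def loop(remaining: Map[String, Set[String]],
             done: List[String]): Either[String, List[String]] =
      if (remaining.isEmpty) Right(done.reverse)
      else {
        val finished = done.toSet
        // Paragraphs whose dependencies have all run already.
        val ready = remaining.collect {
          case (p, ds) if ds.subsetOf(finished) => p
        }.toList.sorted
        if (ready.isEmpty)
          Left("cycle or missing dependency among: " +
            remaining.keys.toList.sorted.mkString(", "))
        else
          loop(remaining -- ready, ready.reverse ::: done)
      }
    loop(deps, Nil)
  }
}
```

Every paragraph that becomes `ready` in the same round has no dependency on the others in that round, so a real scheduler could run each round in parallel instead of flattening it into a list.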





[GitHub] zeppelin issue #1799: [ZEPPELIN-1165 : WIP] Code-based job workflow

2017-01-02 Thread cloverhearts
Github user cloverhearts commented on the issue:

https://github.com/apache/zeppelin/pull/1799
  
I created a new issue on JIRA:
https://issues.apache.org/jira/browse/ZEPPELIN-1886



