Re: [discuss] Zeppelin support workflow

2019-03-18 Thread Xun Liu
Hi, Mei Long

I would be very happy to attend the Zeppelin community meeting.
When is the next meeting? Should I wait for an email notification from the community?

The ticket for the Zeppelin workflow is here:
https://issues.apache.org/jira/browse/ZEPPELIN-4018
Everyone is welcome to take a look.

> On Mar 19, 2019, at 1:04 AM, Mei Long wrote:
> 
> Very cool! @Xun Liu Would you like to talk about it at our next Apache
> Zeppelin community meeting?
> 
> On Sat, Mar 16, 2019 at 1:00 PM Felix Cheung 
> wrote:
> 
>> I like it!
>> 
>> 
>> From: Jongyoul Lee 
>> Sent: Monday, March 11, 2019 9:05:03 PM
>> To: dev
>> Subject: Re: [discuss] Zeppelin support workflow
>> 
>> Thanks for sharing this discussion.
>> 
>> I'm interested in it and will take a look.
>> 
>> On Mon, Mar 11, 2019 at 10:43 AM Xun Liu  wrote:
>> 
>>> Hello, everyone
>>> 
>>> Zeppelin has more than 20 interpreters, so data analysts can use it for
>>> a wide variety of data development work, and much of that work is
>>> interdependent. For example, developing a machine learning algorithm
>>> depends on Spark to preprocess the data, and so on.
>>> 
>>> Zeppelin should have built-in workflow capabilities instead of relying on
>>> external software to schedule its notes, for the following reasons:
>>> 
>>> 1. We have moved from the data processing era to the algorithm era. Once
>>> Zeppelin has its own workflow, it will offer a complete ecosystem covering
>>> both data processing and algorithmic operations.
>>> 2. Zeppelin's powerful interactive processing capabilities help algorithm
>>> engineers work more productively. Zeppelin should give the algorithm
>>> engineer more direct control, instead of handing the algorithm to other
>>> teams (or software) to run the workflow.
>>> 3. Zeppelin knows more about the processing status of the data than
>>> Azkaban or Airflow do, so a built-in workflow will have better
>>> performance, user experience and control.
>>> 
>>> Typical use case
>>> Workflow matters especially in machine learning, because machine learning
>>> tasks generally run for a long time.
>>> A typical example is as follows:
>>> 1) Obtain data from HDFS through Spark;
>>> 2) Clean and convert the data through Spark SQL;
>>> 3) Extract features from the data through Spark;
>>> 4) Write the TensorFlow algorithm through Hadoop Submarine;
>>> 5) Distribute the TensorFlow algorithm as a job to YARN or k8s for batch
>>> processing;
>>> 6) Publish the trained model and provide online prediction services;
>>> 7) Run model prediction with Flink;
>>> 8) Receive incremental data through Flink to incrementally update the
>>> model.
>>> Therefore, Zeppelin especially needs the ability to arrange workflows.
>>> 
>>> I have completed a draft of the Zeppelin workflow system design. Please
>>> review it; you can edit the document directly or leave comments.
>>> 
>>> JIRA: https://issues.apache.org/jira/browse/ZEPPELIN-4018
>>> gdoc:
>>> https://docs.google.com/document/d/1pQjVifOC1knPBuw3LVvby7GyNDXaeBq1ltRg6x4vDxM/edit
>>> 
>>> :-)
>>> 
>>> Xun Liu
>>> 2019-03-11
>> 
>> 
>> 
>> --
>> 이종열, Jongyoul Lee, 李宗烈
>> http://madeng.net
>> 
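
For readers of this thread, the eight-step use case quoted above can be
pictured as a dependency graph of notes. The sketch below is only an
illustration of that idea, not the design in ZEPPELIN-4018 and not a Zeppelin
API: the note names are hypothetical, the dependency edges are one reading of
the listed steps (in particular, that steps 7 and 8 both depend on the
published model), and run_note is a placeholder callback supplied by the
caller.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical note names for the eight steps. Each note maps to the set of
# notes that must finish before it can run (an assumed reading of the thread).
DEPENDENCIES = {
    "load_hdfs": set(),
    "clean_sparksql": {"load_hdfs"},
    "extract_features": {"clean_sparksql"},
    "write_tf_algorithm": {"extract_features"},
    "train_on_yarn_or_k8s": {"write_tf_algorithm"},
    "publish_model": {"train_on_yarn_or_k8s"},
    "predict_flink": {"publish_model"},
    "incremental_update_flink": {"publish_model"},
}


def execution_order(dependencies):
    """Return one valid run order; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(dependencies).static_order())


def run_workflow(dependencies, run_note):
    """Run notes in dependency order and stop at the first failure.

    run_note is a caller-supplied callable taking a note name and returning
    True on success. Returns the list of notes that finished successfully.
    """
    finished = []
    for note in execution_order(dependencies):
        if not run_note(note):
            break
        finished.append(note)
    return finished


order = execution_order(DEPENDENCIES)
```

A scheduler built into the notebook server could apply the same ordering to
real notes; an external tool such as Azkaban or Airflow expresses the same
graph in its own job definitions.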



Re: [discuss] Zeppelin support workflow

2019-03-18 Thread Mei Long
Very cool! @Xun Liu Would you like to talk about it at our next Apache
Zeppelin community meeting?



Re: [discuss] Zeppelin support workflow

2019-03-16 Thread Felix Cheung
I like it!




Re: [discuss] Zeppelin support workflow

2019-03-11 Thread Jongyoul Lee
Thanks for sharing this discussion.

I'm interested in it and will take a look.




-- 
이종열, Jongyoul Lee, 李宗烈
http://madeng.net