Re: [discuss] Zeppelin support workflow
Hi Mei Long,

I am very happy to be able to attend the Zeppelin community meeting. When is the next meeting? Should I wait for the community email notification?

The Zeppelin workflow ticket is here: https://issues.apache.org/jira/browse/ZEPPELIN-4018
Everyone is welcome to follow it and share feedback.

> On Mar 19, 2019, at 1:04 AM, Mei Long wrote:
>
> Very cool! @Xun Liu Would you like to talk about it at our next Apache
> Zeppelin community meeting?
>
> On Sat, Mar 16, 2019 at 1:00 PM Felix Cheung wrote:
>
>> I like it!
>>
>> From: Jongyoul Lee
>> Sent: Monday, March 11, 2019 9:05:03 PM
>> To: dev
>> Subject: Re: [discuss] Zeppelin support workflow
>>
>> Thanks for sharing this kind of discussion.
>>
>> I'm interested in it. I will take a look.
>>
>> On Mon, Mar 11, 2019 at 10:43 AM Xun Liu wrote:
>>
>>> Hello, everyone
>>>
>>> Because Zeppelin has more than 20 interpreters, data analysts can use
>>> it for a wide variety of data development work, and much of that work
>>> is interdependent. For example, developing machine learning algorithms
>>> relies on Spark to preprocess the data, and so on.
>>>
>>> Zeppelin should have built-in workflow capabilities instead of relying
>>> on external software to schedule its notes, for the following reasons:
>>>
>>> 1. Now that we have moved from the data processing era to the algorithm
>>> era, once Zeppelin has its own workflow it will offer a complete
>>> ecosystem covering both data processing and algorithm operations.
>>> 2. Zeppelin's powerful interactive processing capabilities help
>>> algorithm engineers work more productively. Zeppelin should give
>>> algorithm engineers more direct control, instead of having them hand
>>> the algorithm over to other teams (or software) to run the workflow.
>>> 3. Zeppelin knows more about the processing status of the data than
>>> Azkaban and Airflow do, so a built-in workflow will offer better
>>> performance, user experience, and control.
>>>
>>> Typical use case
>>> Machine learning in particular needs this, because machine learning
>>> tasks generally run for a long time. A typical example is as follows:
>>> 1) Obtain data from HDFS through Spark;
>>> 2) Clean and convert the data through Spark SQL;
>>> 3) Extract features from the data through Spark;
>>> 4) Write the TensorFlow algorithm through Hadoop Submarine;
>>> 5) Submit the TensorFlow algorithm as a batch job to YARN or k8s;
>>> 6) Publish the trained model and provide online prediction services;
>>> 7) Run model prediction with Flink;
>>> 8) Receive incremental data through Flink to update the model
>>> incrementally.
>>> Therefore, Zeppelin especially needs the ability to orchestrate
>>> workflows.
>>>
>>> I have completed a draft of the Zeppelin workflow system design. Please
>>> review it; you can modify the document directly or add comments.
>>>
>>> JIRA: https://issues.apache.org/jira/browse/ZEPPELIN-4018
>>> gdoc: https://docs.google.com/document/d/1pQjVifOC1knPBuw3LVvby7GyNDXaeBq1ltRg6x4vDxM/edit
>>>
>>> :-)
>>>
>>> Xun Liu
>>> 2019-03-11
>>
>> --
>> 이종열, Jongyoul Lee, 李宗烈
>> http://madeng.net
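The eight-step use case in the proposal above is, in effect, a dependency graph of notes that must run in order. The following is a minimal Python sketch (3.9+, standard-library `graphlib`) of running such steps in dependency order. The step names and the `run_note` hook are hypothetical placeholders, not Zeppelin note IDs or APIs, and modelling the steps as a strictly linear chain is an assumption based on their numbering.

```python
from graphlib import CycleError, TopologicalSorter  # Python 3.9+

# Hypothetical step names for the eight-step use case. They are
# illustrative labels only, not Zeppelin note IDs or APIs.
# Each key maps a step to the set of steps it depends on; the strictly
# linear chain is an assumption based on the numbered order.
PIPELINE = {
    "load_from_hdfs_spark": set(),
    "clean_with_spark_sql": {"load_from_hdfs_spark"},
    "extract_features_spark": {"clean_with_spark_sql"},
    "write_tensorflow_submarine": {"extract_features_spark"},
    "train_batch_on_yarn_or_k8s": {"write_tensorflow_submarine"},
    "publish_model": {"train_batch_on_yarn_or_k8s"},
    "predict_with_flink": {"publish_model"},
    "incremental_update_flink": {"predict_with_flink"},
}


def run_order(dag):
    """Return one valid execution order; raises CycleError if the graph has a cycle."""
    return list(TopologicalSorter(dag).static_order())


def run_pipeline(dag, run_note):
    """Call run_note(step) for each step only after all its dependencies are done.

    Steps that become ready together (independent branches) come back in the
    same get_ready() batch; this sketch simply runs them one after another.
    """
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises CycleError on a cyclic graph
    finished = []
    while sorter.is_active():
        for step in sorter.get_ready():
            run_note(step)  # hypothetical hook, e.g. trigger a note run
            finished.append(step)
            sorter.done(step)
    return finished
```

For example, `run_pipeline(PIPELINE, print)` prints the eight steps in their numbered order. A real built-in scheduler would also need retries, persisted run state, and parallel execution of independent branches; this sketch only shows dependency ordering and cycle detection.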
Re: [discuss] Zeppelin support workflow
Very cool! @Xun Liu Would you like to talk about it at our next Apache Zeppelin community meeting?
Re: [discuss] Zeppelin support workflow
I like it!
Re: [discuss] Zeppelin support workflow
Thanks for sharing this kind of discussion.

I'm interested in it. I will take a look.

--
이종열, Jongyoul Lee, 李宗烈
http://madeng.net