Re: [DISCUSS] FLIP-39: Flink ML pipeline and ML libs

2019-08-18 Thread Shuiqiang Chen
Hi Robert,

Thank you for the reminder! I have added the wiki page [1] for this FLIP.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-39+Flink+ML+pipeline+and+ML+libs

Robert Metzger  于2019年8月14日周三 下午5:56写道:

> It seems that this FLIP doesn't have a Wiki page yet [1], even though it is
> already partially implemented [2].
> We should try to stick more to the FLIP process to manage the project more
> efficiently.
>
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
> [2] https://issues.apache.org/jira/browse/FLINK-12470
>
> On Mon, Jun 17, 2019 at 12:27 PM Gen Luo  wrote:
>
> > Hi all,
> >
> > In the review of the PR for FLINK-12473, there were a few comments regarding
> > pipeline exportation. We would like to start a follow-up discussion to
> > address some related comments.
> >
> > Currently, the FLIP-39 proposal gives a way for users to persist a pipeline
> > in JSON format. But it does not specify how users can export a pipeline for
> > serving purposes. We summarized some thoughts on this in the following doc.
> >
> >
> >
> https://docs.google.com/document/d/1B84b-1CvOXtwWQ6_tQyiaHwnSeiRqh-V96Or8uHqCp8/edit?usp=sharing
> >
> > After we reach consensus on the pipeline exportation, we will add a
> > corresponding section in FLIP-39.
> >
> >
> > Shaoxuan Wang  于2019年6月5日周三 上午8:47写道:
> >
> > > Stavros,
> > > They share a similar logical concept, but the implementation details are
> > > quite different. It is hard to migrate the interface with different
> > > implementations. The built-in algorithms are a useful legacy that we will
> > > consider migrating to the new API (but still with different
> > > implementations).
> > > BTW, the new API has already been merged via FLINK-12473.
> > >
> > > Thanks,
> > > Shaoxuan
> > >
> > >
> > >
> > > On Mon, Jun 3, 2019 at 6:08 PM Stavros Kontopoulos <
> > > st.kontopou...@gmail.com>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > Some portion of the code could be migrated to the new Table API no?
> > > > I am saying that because the new API design is based on scikit-learn
> > and
> > > > the old one was also inspired by it.
> > > >
> > > > Best,
> > > > Stavros
> > > > On Wed, May 22, 2019 at 1:24 PM Shaoxuan Wang 
> > > wrote:
> > > >
> > > > > Another consensus (from the offline discussion) is that we will
> > > > > delete/deprecate flink-libraries/flink-ml. I have started a survey
> > and
> > > > > discussion [1] in dev/user-ml to collect the feedback. Depending on
> > the
> > > > > replies, we will decide if we shall delete it in Flink 1.9 or
> > > > > deprecate it in the next release after 1.9.
> > > > >
> > > > > [1]
> > > > >
> > > > >
> > > >
> > >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/SURVEY-Usage-of-flink-ml-and-DISCUSS-Delete-flink-ml-td29057.html
> > > > >
> > > > > Regards,
> > > > > Shaoxuan
> > > > >
> > > > >
> > > > > On Tue, May 21, 2019 at 9:22 PM Gen Luo 
> wrote:
> > > > >
> > > > > > Yes, this is our conclusion. I'd like to add only one point:
> > > > > > registering user-defined aggregators is also needed, which is
> > > > > > currently provided by the 'bridge' and will eventually be merged
> > > > > > into the Table API. The same goes for collect().
> > > > > >
> > > > > > I will add a TableEnvironment argument in Estimator.fit() and
> > > > > > Transformer.transform() to get rid of the dependency on
> > > > > > flink-table-planner. This will be committed soon.
> > > > > >
> > > > > > Aljoscha Krettek  于2019年5月21日周二 下午7:31写道:
> > > > > >
> > > > > > > We discussed this in private and came to the conclusion that we
> > > > should
> > > > > > > (for now) have the dependency on flink-table-api-xxx-bridge
> > because
> > > > we
> > > > > > need
> > > > > > > access to the collect() method, which is not yet available in
> the
> > > > Table
> > > > > > > API. Once that is available the code can be refactored but for
> > now
> > > we
> > > > > > want
> > > > > > > to unblock work on this new module.
> > > > > > >
> > > > > > > We also agreed that we don’t need a direct dependency on
> > > > > > > flink-table-planner.
> > > > > > >
> > > > > > > I hope I summarised our discussion correctly.
> > > > > > >
> > > > > > > > On 17. May 2019, at 12:20, Gen Luo 
> > wrote:
> > > > > > > >
> > > > > > > > Thanks for your reply.
> > > > > > > >
> > > > > > > > For the first question, it's not strictly necessary. But I prefer not to
> > > > > > > > have a TableEnvironment argument in Estimator.fit() or
> > > > > > > > Transformer.transform(), which is not part of the machine learning
> > > > > > > > concept and may make our API not as clean and pretty as other systems'.
> > > > > > > > I would like another way, other than introducing flink-table-planner, to
> > > > > > > > do this. If it's impossible or severely opposed, I may make the
> > > > > > > > concession to add the
> > > > > > > > 

Re: [DISCUSS] FLIP-39: Flink ML pipeline and ML libs

2019-08-14 Thread Robert Metzger
It seems that this FLIP doesn't have a Wiki page yet [1], even though it is
already partially implemented [2].
We should try to stick more to the FLIP process to manage the project more
efficiently.


[1]
https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
[2] https://issues.apache.org/jira/browse/FLINK-12470

On Mon, Jun 17, 2019 at 12:27 PM Gen Luo  wrote:

> Hi all,
>
> In the review of the PR for FLINK-12473, there were a few comments regarding
> pipeline exportation. We would like to start a follow-up discussion to
> address some related comments.
>
> Currently, the FLIP-39 proposal gives a way for users to persist a pipeline in
> JSON format. But it does not specify how users can export a pipeline for
> serving purposes. We summarized some thoughts on this in the following doc.
>
>
> https://docs.google.com/document/d/1B84b-1CvOXtwWQ6_tQyiaHwnSeiRqh-V96Or8uHqCp8/edit?usp=sharing
>
> After we reach consensus on the pipeline exportation, we will add a
> corresponding section in FLIP-39.
>
>
> Shaoxuan Wang  于2019年6月5日周三 上午8:47写道:
>
> > Stavros,
> > They share a similar logical concept, but the implementation details are
> > quite different. It is hard to migrate the interface with different
> > implementations. The built-in algorithms are a useful legacy that we will
> > consider migrating to the new API (but still with different
> > implementations).
> > BTW, the new API has already been merged via FLINK-12473.
> >
> > Thanks,
> > Shaoxuan
> >
> >
> >
> > On Mon, Jun 3, 2019 at 6:08 PM Stavros Kontopoulos <
> > st.kontopou...@gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > > Some portion of the code could be migrated to the new Table API no?
> > > I am saying that because the new API design is based on scikit-learn
> and
> > > the old one was also inspired by it.
> > >
> > > Best,
> > > Stavros
> > > On Wed, May 22, 2019 at 1:24 PM Shaoxuan Wang 
> > wrote:
> > >
> > > > Another consensus (from the offline discussion) is that we will
> > > > delete/deprecate flink-libraries/flink-ml. I have started a survey
> and
> > > > discussion [1] in dev/user-ml to collect the feedback. Depending on
> the
> > > > replies, we will decide if we shall delete it in Flink 1.9 or
> > > > deprecate it in the next release after 1.9.
> > > >
> > > > [1]
> > > >
> > > >
> > >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/SURVEY-Usage-of-flink-ml-and-DISCUSS-Delete-flink-ml-td29057.html
> > > >
> > > > Regards,
> > > > Shaoxuan
> > > >
> > > >
> > > > On Tue, May 21, 2019 at 9:22 PM Gen Luo  wrote:
> > > >
> > > > > Yes, this is our conclusion. I'd like to add only one point:
> > > > > registering user-defined aggregators is also needed, which is
> > > > > currently provided by the 'bridge' and will eventually be merged
> > > > > into the Table API. The same goes for collect().
> > > > >
> > > > > I will add a TableEnvironment argument in Estimator.fit() and
> > > > > Transformer.transform() to get rid of the dependency on
> > > > > flink-table-planner. This will be committed soon.
> > > > >
> > > > > Aljoscha Krettek  于2019年5月21日周二 下午7:31写道:
> > > > >
> > > > > > We discussed this in private and came to the conclusion that we
> > > should
> > > > > > (for now) have the dependency on flink-table-api-xxx-bridge
> because
> > > we
> > > > > need
> > > > > > access to the collect() method, which is not yet available in the
> > > Table
> > > > > > API. Once that is available the code can be refactored but for
> now
> > we
> > > > > want
> > > > > > to unblock work on this new module.
> > > > > >
> > > > > > We also agreed that we don’t need a direct dependency on
> > > > > > flink-table-planner.
> > > > > >
> > > > > > I hope I summarised our discussion correctly.
> > > > > >
> > > > > > > On 17. May 2019, at 12:20, Gen Luo 
> wrote:
> > > > > > >
> > > > > > > Thanks for your reply.
> > > > > > >
> > > > > > > For the first question, it's not strictly necessary. But I prefer not to
> > > > > > > have a TableEnvironment argument in Estimator.fit() or
> > > > > > > Transformer.transform(), which is not part of the machine learning
> > > > > > > concept and may make our API not as clean and pretty as other systems'.
> > > > > > > I would like another way, other than introducing flink-table-planner, to
> > > > > > > do this. If it's impossible or severely opposed, I may make the
> > > > > > > concession to add the argument.
> > > > > > >
> > > > > > > Other than that, "flink-table-api-xxx-bridge"s are still needed. A very
> > > > > > > common case is that an algorithm needs to guarantee that it's running
> > > > > > > under a BatchTableEnvironment, which makes it possible to collect the
> > > > > > > result each iteration. A typical algorithm like this is ALS. As of Flink 1.8,
> > > > > > > this can only be achieved by converting the Table to a DataSet and then calling
> > > > > 

Re: [DISCUSS] FLIP-39: Flink ML pipeline and ML libs

2019-06-17 Thread Gen Luo
Hi all,

In the review of the PR for FLINK-12473, there were a few comments regarding
pipeline exportation. We would like to start a follow-up discussion to
address some related comments.

Currently, the FLIP-39 proposal gives a way for users to persist a pipeline in
JSON format. But it does not specify how users can export a pipeline for
serving purposes. We summarized some thoughts on this in the following doc.

https://docs.google.com/document/d/1B84b-1CvOXtwWQ6_tQyiaHwnSeiRqh-V96Or8uHqCp8/edit?usp=sharing

After we reach consensus on the pipeline exportation, we will add a
corresponding section in FLIP-39.
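
To make the discussion concrete, below is a minimal sketch of the persistence path that FLIP-39 already covers, written against the Pipeline/TableEnvironment signatures merged under FLINK-12473 as I understand them (package and method names are assumptions and may differ). The missing piece raised above is an equivalent export step for external serving systems.

import org.apache.flink.ml.api.core.Pipeline;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;

public class PipelinePersistenceSketch {

    // Fit a pipeline on training data and persist the fitted pipeline as JSON,
    // which is the mechanism FLIP-39 currently describes.
    public static String fitAndPersist(TableEnvironment tEnv, Table trainingData, Pipeline pipeline) {
        Pipeline fitted = pipeline.fit(tEnv, trainingData);
        return fitted.toJson();
    }

    // Restore the pipeline from JSON and apply it to new data inside a Flink job.
    // Exporting the same pipeline to an external serving system is the open question.
    public static Table restoreAndApply(TableEnvironment tEnv, Table newData, String json) {
        Pipeline restored = new Pipeline();
        restored.loadJson(json);
        return restored.transform(tEnv, newData);
    }
}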


Shaoxuan Wang  于2019年6月5日周三 上午8:47写道:

> Stavros,
> They share a similar logical concept, but the implementation details are
> quite different. It is hard to migrate the interface with different
> implementations. The built-in algorithms are a useful legacy that we will
> consider migrating to the new API (but still with different implementations).
> BTW, the new API has already been merged via FLINK-12473.
>
> Thanks,
> Shaoxuan
>
>
>
> On Mon, Jun 3, 2019 at 6:08 PM Stavros Kontopoulos <
> st.kontopou...@gmail.com>
> wrote:
>
> > Hi,
> >
> > Some portion of the code could be migrated to the new Table API no?
> > I am saying that because the new API design is based on scikit-learn and
> > the old one was also inspired by it.
> >
> > Best,
> > Stavros
> > On Wed, May 22, 2019 at 1:24 PM Shaoxuan Wang 
> wrote:
> >
> > > Another consensus (from the offline discussion) is that we will
> > > delete/deprecate flink-libraries/flink-ml. I have started a survey and
> > > discussion [1] in dev/user-ml to collect the feedback. Depending on the
> > > replies, we will decide if we shall delete it in Flink 1.9 or
> > > deprecate it in the next release after 1.9.
> > >
> > > [1]
> > >
> > >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/SURVEY-Usage-of-flink-ml-and-DISCUSS-Delete-flink-ml-td29057.html
> > >
> > > Regards,
> > > Shaoxuan
> > >
> > >
> > > On Tue, May 21, 2019 at 9:22 PM Gen Luo  wrote:
> > >
> > > > Yes, this is our conclusion. I'd like to add only one point:
> > > > registering user-defined aggregators is also needed, which is currently
> > > > provided by the 'bridge' and will eventually be merged into the Table API.
> > > > The same goes for collect().
> > > >
> > > > I will add a TableEnvironment argument in Estimator.fit() and
> > > > Transformer.transform() to get rid of the dependency on
> > > > flink-table-planner. This will be committed soon.
> > > >
> > > > Aljoscha Krettek  于2019年5月21日周二 下午7:31写道:
> > > >
> > > > > We discussed this in private and came to the conclusion that we
> > should
> > > > > (for now) have the dependency on flink-table-api-xxx-bridge because
> > we
> > > > need
> > > > > access to the collect() method, which is not yet available in the
> > Table
> > > > > API. Once that is available the code can be refactored but for now
> we
> > > > want
> > > > > to unblock work on this new module.
> > > > >
> > > > > We also agreed that we don’t need a direct dependency on
> > > > > flink-table-planner.
> > > > >
> > > > > I hope I summarised our discussion correctly.
> > > > >
> > > > > > On 17. May 2019, at 12:20, Gen Luo  wrote:
> > > > > >
> > > > > > Thanks for your reply.
> > > > > >
> > > > > > For the first question, it's not strictly necessary. But I prefer not to
> > > > > > have a TableEnvironment argument in Estimator.fit() or
> > > > > > Transformer.transform(), which is not part of the machine learning
> > > > > > concept and may make our API not as clean and pretty as other systems'.
> > > > > > I would like another way, other than introducing flink-table-planner, to
> > > > > > do this. If it's impossible or severely opposed, I may make the
> > > > > > concession to add the argument.
> > > > > >
> > > > > > Other than that, "flink-table-api-xxx-bridge"s are still needed. A very
> > > > > > common case is that an algorithm needs to guarantee that it's running
> > > > > > under a BatchTableEnvironment, which makes it possible to collect the
> > > > > > result each iteration. A typical algorithm like this is ALS. As of Flink 1.8,
> > > > > > this can only be achieved by converting the Table to a DataSet and then
> > > > > > calling DataSet.collect(), which is available in flink-table-api-xxx-bridge.
> > > > > > Besides, registering a UDAGG also depends on it.
> > > > > >
> > > > > > In conclusion, the "planner" can be removed from the dependencies, but
> > > > > > introducing the "bridge"s is inevitable. Whether and how to acquire the
> > > > > > TableEnvironment from a Table can be discussed.
> > > > >
> > > > >
> > > >
> > >
> >
>


Re: [DISCUSS] FLIP-39: Flink ML pipeline and ML libs

2019-06-04 Thread Shaoxuan Wang
Stavros,
They share a similar logical concept, but the implementation details are
quite different. It is hard to migrate the interface with different
implementations. The built-in algorithms are a useful legacy that we will
consider migrating to the new API (but still with different implementations).
BTW, the new API has already been merged via FLINK-12473.

Thanks,
Shaoxuan



On Mon, Jun 3, 2019 at 6:08 PM Stavros Kontopoulos 
wrote:

> Hi,
>
> Some portion of the code could be migrated to the new Table API no?
> I am saying that because the new API design is based on scikit-learn and
> the old one was also inspired by it.
>
> Best,
> Stavros
> On Wed, May 22, 2019 at 1:24 PM Shaoxuan Wang  wrote:
>
> > Another consensus (from the offline discussion) is that we will
> > delete/deprecate flink-libraries/flink-ml. I have started a survey and
> > discussion [1] in dev/user-ml to collect the feedback. Depending on the
> > replies, we will decide if we shall delete it in Flink 1.9 or
> > deprecate it in the next release after 1.9.
> >
> > [1]
> >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/SURVEY-Usage-of-flink-ml-and-DISCUSS-Delete-flink-ml-td29057.html
> >
> > Regards,
> > Shaoxuan
> >
> >
> > On Tue, May 21, 2019 at 9:22 PM Gen Luo  wrote:
> >
> > > Yes, this is our conclusion. I'd like to add only one point:
> > > registering user-defined aggregators is also needed, which is currently
> > > provided by the 'bridge' and will eventually be merged into the Table API.
> > > The same goes for collect().
> > >
> > > I will add a TableEnvironment argument in Estimator.fit() and
> > > Transformer.transform() to get rid of the dependency on
> > > flink-table-planner. This will be committed soon.
> > >
> > > Aljoscha Krettek  于2019年5月21日周二 下午7:31写道:
> > >
> > > > We discussed this in private and came to the conclusion that we
> should
> > > > (for now) have the dependency on flink-table-api-xxx-bridge because
> we
> > > need
> > > > access to the collect() method, which is not yet available in the
> Table
> > > > API. Once that is available the code can be refactored but for now we
> > > want
> > > > to unblock work on this new module.
> > > >
> > > > We also agreed that we don’t need a direct dependency on
> > > > flink-table-planner.
> > > >
> > > > I hope I summarised our discussion correctly.
> > > >
> > > > > On 17. May 2019, at 12:20, Gen Luo  wrote:
> > > > >
> > > > > Thanks for your reply.
> > > > >
> > > > > For the first question, it's not strictly necessary. But I prefer not to
> > > > > have a TableEnvironment argument in Estimator.fit() or
> > > > > Transformer.transform(), which is not part of the machine learning
> > > > > concept and may make our API not as clean and pretty as other systems'.
> > > > > I would like another way, other than introducing flink-table-planner, to
> > > > > do this. If it's impossible or severely opposed, I may make the
> > > > > concession to add the argument.
> > > > >
> > > > > Other than that, "flink-table-api-xxx-bridge"s are still needed. A very
> > > > > common case is that an algorithm needs to guarantee that it's running
> > > > > under a BatchTableEnvironment, which makes it possible to collect the
> > > > > result each iteration. A typical algorithm like this is ALS. As of Flink 1.8,
> > > > > this can only be achieved by converting the Table to a DataSet and then
> > > > > calling DataSet.collect(), which is available in flink-table-api-xxx-bridge.
> > > > > Besides, registering a UDAGG also depends on it.
> > > > >
> > > > > In conclusion, the "planner" can be removed from the dependencies, but
> > > > > introducing the "bridge"s is inevitable. Whether and how to acquire the
> > > > > TableEnvironment from a Table can be discussed.
> > > >
> > > >
> > >
> >
>


Re: [DISCUSS] FLIP-39: Flink ML pipeline and ML libs

2019-06-03 Thread Stavros Kontopoulos
Hi,

Some portion of the code could be migrated to the new Table API no?
I am saying that because the new API design is based on scikit-learn and
the old one was also inspired by it.

Best,
Stavros
On Wed, May 22, 2019 at 1:24 PM Shaoxuan Wang  wrote:

> Another consensus (from the offline discussion) is that we will
> delete/deprecate flink-libraries/flink-ml. I have started a survey and
> discussion [1] in dev/user-ml to collect the feedback. Depending on the
> replies, we will decide if we shall delete it in Flink 1.9 or
> deprecate it in the next release after 1.9.
>
> [1]
>
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/SURVEY-Usage-of-flink-ml-and-DISCUSS-Delete-flink-ml-td29057.html
>
> Regards,
> Shaoxuan
>
>
> On Tue, May 21, 2019 at 9:22 PM Gen Luo  wrote:
>
> > Yes, this is our conclusion. I'd like to add only one point:
> > registering user-defined aggregators is also needed, which is currently
> > provided by the 'bridge' and will eventually be merged into the Table API.
> > The same goes for collect().
> >
> > I will add a TableEnvironment argument in Estimator.fit() and
> > Transformer.transform() to get rid of the dependency on
> > flink-table-planner. This will be committed soon.
> >
> > Aljoscha Krettek  于2019年5月21日周二 下午7:31写道:
> >
> > > We discussed this in private and came to the conclusion that we should
> > > (for now) have the dependency on flink-table-api-xxx-bridge because we
> > need
> > > access to the collect() method, which is not yet available in the Table
> > > API. Once that is available the code can be refactored but for now we
> > want
> > > to unblock work on this new module.
> > >
> > > We also agreed that we don’t need a direct dependency on
> > > flink-table-planner.
> > >
> > > I hope I summarised our discussion correctly.
> > >
> > > > On 17. May 2019, at 12:20, Gen Luo  wrote:
> > > >
> > > > Thanks for your reply.
> > > >
> > > > For the first question, it's not strictly necessary. But I prefer not to
> > > > have a TableEnvironment argument in Estimator.fit() or
> > > > Transformer.transform(), which is not part of the machine learning
> > > > concept and may make our API not as clean and pretty as other systems'.
> > > > I would like another way, other than introducing flink-table-planner, to
> > > > do this. If it's impossible or severely opposed, I may make the
> > > > concession to add the argument.
> > > >
> > > > Other than that, "flink-table-api-xxx-bridge"s are still needed. A very
> > > > common case is that an algorithm needs to guarantee that it's running
> > > > under a BatchTableEnvironment, which makes it possible to collect the
> > > > result each iteration. A typical algorithm like this is ALS. As of Flink 1.8,
> > > > this can only be achieved by converting the Table to a DataSet and then
> > > > calling DataSet.collect(), which is available in flink-table-api-xxx-bridge.
> > > > Besides, registering a UDAGG also depends on it.
> > > >
> > > > In conclusion, the "planner" can be removed from the dependencies, but
> > > > introducing the "bridge"s is inevitable. Whether and how to acquire the
> > > > TableEnvironment from a Table can be discussed.
> > >
> > >
> >
>


Re: [DISCUSS] FLIP-39: Flink ML pipeline and ML libs

2019-05-22 Thread Shaoxuan Wang
Another consensus (from the offline discussion) is that we will
delete/deprecate flink-libraries/flink-ml. I have started a survey and
discussion [1] in dev/user-ml to collect the feedback. Depending on the
replies, we will decide if we shall delete it in Flink 1.9 or
deprecate it in the next release after 1.9.

[1]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/SURVEY-Usage-of-flink-ml-and-DISCUSS-Delete-flink-ml-td29057.html

Regards,
Shaoxuan


On Tue, May 21, 2019 at 9:22 PM Gen Luo  wrote:

> Yes, this is our conclusion. I'd like to add only one point:
> registering user-defined aggregators is also needed, which is currently
> provided by the 'bridge' and will eventually be merged into the Table API.
> The same goes for collect().
>
> I will add a TableEnvironment argument in Estimator.fit() and
> Transformer.transform() to get rid of the dependency on
> flink-table-planner. This will be committed soon.
>
> Aljoscha Krettek  于2019年5月21日周二 下午7:31写道:
>
> > We discussed this in private and came to the conclusion that we should
> > (for now) have the dependency on flink-table-api-xxx-bridge because we
> need
> > access to the collect() method, which is not yet available in the Table
> > API. Once that is available the code can be refactored but for now we
> want
> > to unblock work on this new module.
> >
> > We also agreed that we don’t need a direct dependency on
> > flink-table-planner.
> >
> > I hope I summarised our discussion correctly.
> >
> > > On 17. May 2019, at 12:20, Gen Luo  wrote:
> > >
> > > Thanks for your reply.
> > >
> > > For the first question, it's not strictly necessary. But I prefer not to
> > > have a TableEnvironment argument in Estimator.fit() or
> > > Transformer.transform(), which is not part of the machine learning
> > > concept and may make our API not as clean and pretty as other systems'.
> > > I would like another way, other than introducing flink-table-planner, to
> > > do this. If it's impossible or severely opposed, I may make the
> > > concession to add the argument.
> > >
> > > Other than that, "flink-table-api-xxx-bridge"s are still needed. A very
> > > common case is that an algorithm needs to guarantee that it's running
> > > under a BatchTableEnvironment, which makes it possible to collect the
> > > result each iteration. A typical algorithm like this is ALS. As of Flink 1.8,
> > > this can only be achieved by converting the Table to a DataSet and then
> > > calling DataSet.collect(), which is available in flink-table-api-xxx-bridge.
> > > Besides, registering a UDAGG also depends on it.
> > >
> > > In conclusion, the "planner" can be removed from the dependencies, but
> > > introducing the "bridge"s is inevitable. Whether and how to acquire the
> > > TableEnvironment from a Table can be discussed.
> >
> >
>


Re: [DISCUSS] FLIP-39: Flink ML pipeline and ML libs

2019-05-21 Thread Gen Luo
Yes, this is our conclusion. I'd like to add only one point:
registering user-defined aggregators is also needed, which is currently
provided by the 'bridge' and will eventually be merged into the Table API.
The same goes for collect().

I will add a TableEnvironment argument in Estimator.fit() and
Transformer.transform() to get rid of the dependency on
flink-table-planner. This will be committed soon.
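
For illustration, a simplified sketch of the signatures with the explicit TableEnvironment argument (generics and parameter plumbing elided; this is a sketch of the idea, not the committed code):

import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;

// Passing the TableEnvironment explicitly means the ML library never has to
// recover it from a Table through flink-table-planner internals.
interface Transformer {
    Table transform(TableEnvironment tEnv, Table input);
}

interface Model extends Transformer {
    // a fitted Transformer produced by an Estimator; scoring happens in transform()
}

interface Estimator<M extends Model> {
    M fit(TableEnvironment tEnv, Table trainingData);
}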

Aljoscha Krettek  于2019年5月21日周二 下午7:31写道:

> We discussed this in private and came to the conclusion that we should
> (for now) have the dependency on flink-table-api-xxx-bridge because we need
> access to the collect() method, which is not yet available in the Table
> API. Once that is available the code can be refactored but for now we want
> to unblock work on this new module.
>
> We also agreed that we don’t need a direct dependency on
> flink-table-planner.
>
> I hope I summarised our discussion correctly.
>
> > On 17. May 2019, at 12:20, Gen Luo  wrote:
> >
> > Thanks for your reply.
> >
> > For the first question, it's not strictly necessary. But I prefer not to
> > have a TableEnvironment argument in Estimator.fit() or
> > Transformer.transform(), which is not part of the machine learning
> > concept and may make our API not as clean and pretty as other systems'.
> > I would like another way, other than introducing flink-table-planner, to
> > do this. If it's impossible or severely opposed, I may make the
> > concession to add the argument.
> >
> > Other than that, "flink-table-api-xxx-bridge"s are still needed. A very
> > common case is that an algorithm needs to guarantee that it's running
> > under a BatchTableEnvironment, which makes it possible to collect the
> > result each iteration. A typical algorithm like this is ALS. As of Flink 1.8,
> > this can only be achieved by converting the Table to a DataSet and then
> > calling DataSet.collect(), which is available in flink-table-api-xxx-bridge.
> > Besides, registering a UDAGG also depends on it.
> >
> > In conclusion, the "planner" can be removed from the dependencies, but
> > introducing the "bridge"s is inevitable. Whether and how to acquire the
> > TableEnvironment from a Table can be discussed.
>
>


Re: [DISCUSS] FLIP-39: Flink ML pipeline and ML libs

2019-05-21 Thread Aljoscha Krettek
We discussed this in private and came to the conclusion that we should (for 
now) have the dependency on flink-table-api-xxx-bridge because we need access 
to the collect() method, which is not yet available in the Table API. Once that 
is available the code can be refactored but for now we want to unblock work on 
this new module.

We also agreed that we don’t need a direct dependency on flink-table-planner.

I hope I summarised our discussion correctly.

> On 17. May 2019, at 12:20, Gen Luo  wrote:
> 
> Thanks for your reply.
> 
> For the first question, it's not strictly necessary. But I prefer not to
> have a TableEnvironment argument in Estimator.fit() or
> Transformer.transform(), which is not part of the machine learning concept, and
> may make our API not as clean and pretty as other systems'. I would like
> another way, other than introducing flink-table-planner, to do this. If it's
> impossible or severely opposed, I may make the concession to add the
> argument.
> 
> Other than that, "flink-table-api-xxx-bridge"s are still needed. A very
> common case is that an algorithm needs to guarantee that it's running under
> a BatchTableEnvironment, which makes it possible to collect the result each
> iteration. A typical algorithm like this is ALS. As of Flink 1.8, this can
> only be achieved by converting the Table to a DataSet and then calling
> DataSet.collect(), which is available in flink-table-api-xxx-bridge. Besides,
> registering a UDAGG also depends on it.
> 
> In conclusion, the "planner" can be removed from the dependencies, but
> introducing the "bridge"s is inevitable. Whether and how to acquire the
> TableEnvironment from a Table can be discussed.



Re: [DISCUSS] FLIP-39: Flink ML pipeline and ML libs

2019-05-17 Thread Gen Luo
Thanks for your reply.

For the first question, it's not strictly necessary. But I prefer not to
have a TableEnvironment argument in Estimator.fit() or
Transformer.transform(), which is not part of the machine learning concept, and
may make our API not as clean and pretty as other systems'. I would like
another way, other than introducing flink-table-planner, to do this. If it's
impossible or severely opposed, I may make the concession to add the
argument.

Other than that, "flink-table-api-xxx-bridge"s are still needed. A very
common case is that an algorithm needs to guarantee that it's running under
a BatchTableEnvironment, which makes it possible to collect the result each
iteration. A typical algorithm like this is ALS. As of Flink 1.8, this can
only be achieved by converting the Table to a DataSet and then calling
DataSet.collect(), which is available in flink-table-api-xxx-bridge. Besides,
registering a UDAGG also depends on it.

In conclusion, the "planner" can be removed from the dependencies, but
introducing the "bridge"s is inevitable. Whether and how to acquire the
TableEnvironment from a Table can be discussed.
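
For reference, a small sketch of the bridge usage described above, assuming the Flink 1.8/1.9 java bridge API (BatchTableEnvironment.toDataSet and DataSet.collect); treat it as an illustration of the dependency, not as part of the proposal:

import java.util.List;

import org.apache.flink.api.java.DataSet;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.java.BatchTableEnvironment;
import org.apache.flink.types.Row;

public class CollectViaBridgeSketch {

    // An iterative algorithm such as ALS converts an intermediate Table back to
    // a DataSet and collects it on the client between iterations. Today this is
    // only possible through the DataSet bridge, not the pure Table API.
    public static List<Row> collectIntermediateResult(
            BatchTableEnvironment tEnv, Table intermediateResult) throws Exception {
        DataSet<Row> dataSet = tEnv.toDataSet(intermediateResult, Row.class);
        return dataSet.collect();
    }
}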


Re: [DISCUSS] FLIP-39: Flink ML pipeline and ML libs

2019-05-17 Thread Aljoscha Krettek
Hi,

Why is it necessary to acquire a TableEnvironment from a Table?

I think you even said yourself what we should do: "I believe it's better to
make the api clean and hide the detail of implementation as much as possible."
In my opinion this means we can only depend on the generic Table API module and
not let any planner/runner specifics or DataSet/DataStream API leak out. Letting
them leak would be setting us up for future problems once we want to
deprecate/remove/rework those APIs.

Best,
Aljoscha

> On 17. May 2019, at 09:06, Gen Luo  wrote:
> 
> It's better not to depend on flink-table-planner indeed. It's currently
> needed for three things: registering UDAGGs, determining whether the tableEnv
> is batch or streaming, and converting a Table to a DataSet to collect data.
> Most of these requirements can be fulfilled by flink-table-api-java-bridge and
> flink-table-api-scala-bridge.
> 
> But there is one gap: without the current flink-table-planner, it's
> impossible to acquire the tableEnv from a Table. If so, all interfaces would
> have to require an extra tableEnv argument.
> 
> This does make sense, but personally I don't like it because it has nothing
> to do with the machine learning concept. The flink-ml module is mainly targeted
> at algorithm engineers and scientists, so I believe it's better to make the api
> clean and hide the detail of implementation as much as possible. Hopefully
> there is another way to acquire the tableEnv so the api can stay
> clean.
> 
> Aljoscha Krettek  于2019年5月16日周四 下午8:16写道:
> 
>> Hi,
>> 
>> I had a look at the document mostly from a module structure/dependency
>> structure perspective.
>> 
>> We should make the expected dependency structure explicit in the document.
>> 
>> From the discussion in the doc it seems that the intention is that
>> flink-ml-lib should depend on flink-table-planner (the current, pre-blink
>> Table API planner that has a dependency on the DataSet API and DataStream
>> API). I think we should not have this because it ties the Flink ML
>> implementation to a module that is going to be deprecated. As far as I
>> understood, the intention for this new Flink ML module is to be the next
>> generation approach, based on the Table API. If this is true, we should
>> make sure that this only depends on the Table API and is independent of the
>> underlying planner implementation. Especially if we want this to work with
>> the new Blink-based planner that is currently being added to Flink.
>> 
>> What do you think?
>> 
>> Best,
>> Aljoscha
>> 
>>> On 10. May 2019, at 11:22, Shaoxuan Wang  wrote:
>>> 
>>> Hi everyone,
>>> 
>>> I created umbrella Jira FLINK-12470
>>>  for FLIP39 and
>> added an
>>> "implementation plan" section in the google doc
>>> (https://docs.google.com/document/d/1StObo1DLp8iiy0rbukx8kwAJb0BwDZrQrMWub3DzsEo/edit#heading=h.pggjwvwg8mrx)
>>> .
>>> Need your special attention on the organization of modules/packages of
>>> flink-ml. @Aljosha, @Till, @Rong, @Jincheng, @Becket, and all.
>>> 
>>> We anticipate a quick development growth of Flink ML in the next several
>>> releases. Several components (for instance, pipeline, mllib, model
>> serving,
>>> ml integration test) need to be separated into different submodules.
>>> Therefore, we propose to create a new flink-ml module at the root, and
>> add
>>> sub-modules for ml-pipeline and ml-lib of FLIP39, and potentially we
>>> can also design FLIP23 as another sub-module under this new flink-ml
>>> module (I will raise a discussion in FLIP23 ML thread about this). The
>>> legacy flink-ml module (under flink-libraries) can remain as it is and
>>> await deprecation in the future, or alternatively we can move it under
>>> this new flink-ml module and rename it to flink-dataset-ml. What do you
>>> think?
>>> 
>>> Looking forward to your feedback.
>>> 
>>> Regards,
>>> Shaoxuan
>>> 
>>> 
>>> On Tue, May 7, 2019 at 8:42 AM Rong Rong  wrote:
>>> 
 Thanks for following up promptly and sharing the feedback @shaoxuan.
 
 Yes I share the same view with you on the convergence of these 2 FLIPs
 eventually. I also have some questions regarding the API as well as the
 possible convergence challenges (especially current Co-processor
>> approach
 vs. FLIP-39's table API approach), I will follow up on the discussion
 thread and the PR on FLIP-23 with you and Boris :-)
 
 --
 Rong
 
 On Mon, May 6, 2019 at 3:30 AM Shaoxuan Wang 
>> wrote:
 
> 
> Thanks for the feedback, Rong and Flavio.
> 
> @Rong Rong
>> There's another thread regarding a close to merge FLIP-23
>> implementation
>> [1]. I agree this might still be early stage to talk about
> productionizing
>> and model-serving. But it would be nice to keep the
> design/implementation in
>> mind that: ease of use for 

Re: [DISCUSS] FLIP-39: Flink ML pipeline and ML libs

2019-05-17 Thread Gen Luo
It's better not to depend on flink-table-planner indeed. It's currently
needed for three things: registering UDAGGs, determining whether the tableEnv
is batch or streaming, and converting a Table to a DataSet to collect data.
Most of these requirements can be fulfilled by flink-table-api-java-bridge and
flink-table-api-scala-bridge.

But there is one gap: without the current flink-table-planner, it's
impossible to acquire the tableEnv from a Table. If so, all interfaces would
have to require an extra tableEnv argument.

This does make sense, but personally I don't like it because it has nothing
to do with the machine learning concept. The flink-ml module is mainly targeted
at algorithm engineers and scientists, so I believe it's better to make the api
clean and hide the detail of implementation as much as possible. Hopefully
there is another way to acquire the tableEnv so the api can stay
clean.
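
As an illustration of the first two points, a sketch using only the java bridge module (the instanceof checks and registerFunction overloads follow the Flink 1.8/1.9 bridge API as I understand it; consider the details an assumption):

import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.api.java.BatchTableEnvironment;
import org.apache.flink.table.api.java.StreamTableEnvironment;
import org.apache.flink.table.functions.AggregateFunction;

public class BridgeChecksSketch {

    // Judging whether we run in batch or streaming mode needs the
    // bridge-specific environment types.
    public static boolean isBatch(TableEnvironment tEnv) {
        return tEnv instanceof BatchTableEnvironment;
    }

    // Registering a user-defined aggregate function (UDAGG) also goes through
    // the bridge environments rather than the generic TableEnvironment.
    public static <T, ACC> void registerUdagg(
            TableEnvironment tEnv, String name, AggregateFunction<T, ACC> udagg) {
        if (tEnv instanceof BatchTableEnvironment) {
            ((BatchTableEnvironment) tEnv).registerFunction(name, udagg);
        } else if (tEnv instanceof StreamTableEnvironment) {
            ((StreamTableEnvironment) tEnv).registerFunction(name, udagg);
        } else {
            throw new IllegalArgumentException(
                "Unsupported TableEnvironment: " + tEnv.getClass().getName());
        }
    }
}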

Aljoscha Krettek  于2019年5月16日周四 下午8:16写道:

> Hi,
>
> I had a look at the document mostly from a module structure/dependency
> structure perspective.
>
> We should make the expected dependency structure explicit in the document.
>
> From the discussion in the doc it seems that the intention is that
> flink-ml-lib should depend on flink-table-planner (the current, pre-blink
> Table API planner that has a dependency on the DataSet API and DataStream
> API). I think we should not have this because it ties the Flink ML
> implementation to a module that is going to be deprecated. As far as I
> understood, the intention for this new Flink ML module is to be the next
> generation approach, based on the Table API. If this is true, we should
> make sure that this only depends on the Table API and is independent of the
> underlying planner implementation. Especially if we want this to work with
> the new Blink-based planner that is currently being added to Flink.
>
> What do you think?
>
> Best,
> Aljoscha
>
> > On 10. May 2019, at 11:22, Shaoxuan Wang  wrote:
> >
> > Hi everyone,
> >
> > I created umbrella Jira FLINK-12470
> >  for FLIP39 and
> added an
> > "implementation plan" section in the google doc
> > (https://docs.google.com/document/d/1StObo1DLp8iiy0rbukx8kwAJb0BwDZrQrMWub3DzsEo/edit#heading=h.pggjwvwg8mrx)
> > .
> > Need your special attention on the organization of modules/packages of
> > flink-ml. @Aljosha, @Till, @Rong, @Jincheng, @Becket, and all.
> >
> > We anticipate a quick development growth of Flink ML in the next several
> > releases. Several components (for instance, pipeline, mllib, model
> serving,
> > ml integration test) need to be separated into different submodules.
> > Therefore, we propose to create a new flink-ml module at the root, and
> add
> > sub-modules for ml-pipeline and ml-lib of FLIP39, and potentially we
> > can also design FLIP23 as another sub-module under this new flink-ml
> > module (I will raise a discussion in FLIP23 ML thread about this). The
> > legacy flink-ml module (under flink-libraries) can remain as it is and
> > await deprecation in the future, or alternatively we can move it under
> > this new flink-ml module and rename it to flink-dataset-ml. What do you
> > think?
> >
> > Looking forward to your feedback.
> >
> > Regards,
> > Shaoxuan
> >
> >
> > On Tue, May 7, 2019 at 8:42 AM Rong Rong  wrote:
> >
> >> Thanks for following up promptly and sharing the feedback @shaoxuan.
> >>
> >> Yes I share the same view with you on the convergence of these 2 FLIPs
> >> eventually. I also have some questions regarding the API as well as the
> >> possible convergence challenges (especially current Co-processor
> approach
> >> vs. FLIP-39's table API approach), I will follow up on the discussion
> >> thread and the PR on FLIP-23 with you and Boris :-)
> >>
> >> --
> >> Rong
> >>
> >> On Mon, May 6, 2019 at 3:30 AM Shaoxuan Wang 
> wrote:
> >>
> >>>
> >>> Thanks for the feedback, Rong and Flavio.
> >>>
> >>> @Rong Rong
>  There's another thread regarding a close to merge FLIP-23
> implementation
>  [1]. I agree this might still be early stage to talk about
> >>> productionizing
>  and model-serving. But I would be nice to keep the
> >>> design/implementation in
>  mind that: ease of use for productionizing a ML pipeline is also very
>  important.
>  And if we can leverage the implementation in FLIP-23 in the future,
> >>> (some
>  adjustment might be needed) that would be super helpful.
> >>> You raised a very good point. Actually I have been reviewing FLIP23
> for
> >>> a while (mostly offline to help Boris polish the PR). FMPOV, FLIP23 and
> >>> FLIP39 can be well unified at some point. Model serving in FLIP23 is
> >>> actually a special case of “transformer/model” proposed in FLIP39.
> Boris's
> >>> implementation of model serving can be designed as an abstract class
> on top
> >>> of transformer/model 

Re: [DISCUSS] FLIP-39: Flink ML pipeline and ML libs

2019-05-16 Thread Aljoscha Krettek
Hi,

I had a look at the document mostly from a module structure/dependency 
structure perspective.

We should make the expected dependency structure explicit in the document.

From the discussion in the doc it seems that the intention is that flink-ml-lib 
should depend on flink-table-planner (the current, pre-blink Table API planner 
that has a dependency on the DataSet API and DataStream API). I think we should 
not have this because it ties the Flink ML implementation to a module that is 
going to be deprecated. As far as I understood, the intention for this new 
Flink ML module is to be the next generation approach, based on the Table API. 
If this is true, we should make sure that this only depends on the Table API 
and is independent of the underlying planner implementation. Especially if we 
want this to work with the new Blink-based planner that is currently being 
added to Flink.

What do you think?

Best,
Aljoscha

> On 10. May 2019, at 11:22, Shaoxuan Wang  wrote:
> 
> Hi everyone,
> 
> I created umbrella Jira FLINK-12470
>  for FLIP39 and added an
> "implementation plan" section in the google doc
> (https://docs.google.com/document/d/1StObo1DLp8iiy0rbukx8kwAJb0BwDZrQrMWub3DzsEo/edit#heading=h.pggjwvwg8mrx)
> 
> .
> Need your special attention on the organization of modules/packages of
> flink-ml. @Aljosha, @Till, @Rong, @Jincheng, @Becket, and all.
> 
> We anticipate a quick development growth of Flink ML in the next several
> releases. Several components (for instance, pipeline, mllib, model serving,
> ml integration test) need to be separated into different submodules.
> Therefore, we propose to create a new flink-ml module at the root, and add
> sub-modules for ml-pipeline and ml-lib of FLIP39, and potentially we
> can also design FLIP23 as another sub-module under this new flink-ml
> module (I will raise a discussion in FLIP23 ML thread about this). The
> legacy flink-ml module (under flink-libraries) can remain as it is and
> await deprecation in the future, or alternatively we can move it under
> this new flink-ml module and rename it to flink-dataset-ml. What do you
> think?
> 
> Looking forward to your feedback.
> 
> Regards,
> Shaoxuan
> 
> 
> On Tue, May 7, 2019 at 8:42 AM Rong Rong  wrote:
> 
>> Thanks for following up promptly and sharing the feedback @shaoxuan.
>> 
>> Yes I share the same view with you on the convergence of these 2 FLIPs
>> eventually. I also have some questions regarding the API as well as the
>> possible convergence challenges (especially current Co-processor approach
>> vs. FLIP-39's table API approach), I will follow up on the discussion
>> thread and the PR on FLIP-23 with you and Boris :-)
>> 
>> --
>> Rong
>> 
>> On Mon, May 6, 2019 at 3:30 AM Shaoxuan Wang  wrote:
>> 
>>> 
>>> Thanks for the feedback, Rong and Flavio.
>>> 
>>> @Rong Rong
 There's another thread regarding a close to merge FLIP-23 implementation
 [1]. I agree this might still be early stage to talk about
>>> productionizing
 and model-serving. But I would be nice to keep the
>>> design/implementation in
 mind that: ease of use for productionizing a ML pipeline is also very
 important.
 And if we can leverage the implementation in FLIP-23 in the future,
>>> (some
 adjustment might be needed) that would be super helpful.
>>> Your raised a very good point. Actually I have been reviewing FLIP23 for
>>> a while (mostly offline to help Boris polish the PR). FMPOV, FLIP23 and
>>> FLIP39 can be well unified at some point. Model serving in FLIP23 is
>>> actually a special case of “transformer/model” proposed in FLIP39. Boris's
>>> implementation of model serving can be designed as an abstract class on top
>>> of transformer/model interface, and then can be used by ML users as a
>>> certain ML lib.  I have some other comments WRT FLIP23 x FLIP39, I will
>>> reply to the FLIP23 ML later with more details.
>>> 
>>> @Flavio
 I have read many discussion about Flink ML and none of them take into
 account the ongoing efforts carried out of by the Streamline H2020
>>> project
 [1] on this topic.
 Have you tried to ping them? I think that both projects could benefits
>>> from
 a joined effort on this side..
 [1] https://h2020-streamline-project.eu/objectives/
>>> Thank you for your info. I was not aware of the Streamline H2020 project
>>> before. I just did a quick look at its website and GitHub. IMO these projects
>>> could be very good Flink ecosystem projects and can be built on top of ML
>>> pipeline & ML lib interfaces introduced in FLIP39. I will try to contact
>>> the owners of these projects to understand their plans and blockers of
>>> using Flink (if there is any). In the meantime, if you have the direct
>>> contact of anyone who might be interested in the ML pipeline & ML lib, 

Re: [DISCUSS] FLIP-39: Flink ML pipeline and ML libs

2019-05-10 Thread Shaoxuan Wang
Hi everyone,

I created the umbrella Jira FLINK-12470 for FLIP39 and added an
"implementation plan" section in the google doc
(https://docs.google.com/document/d/1StObo1DLp8iiy0rbukx8kwAJb0BwDZrQrMWub3DzsEo/edit#heading=h.pggjwvwg8mrx).
I need your special attention on the organization of modules/packages of
flink-ml. @Aljosha, @Till, @Rong, @Jincheng, @Becket, and all.

We anticipate a quick development growth of Flink ML in the next several
releases. Several components (for instance, pipeline, mllib, model serving,
ml integration test) need to be separated into different submodules.
Therefore, we propose to create a new flink-ml module at the root, and add
sub-modules for ml-pipeline and ml-lib of FLIP39, and potentially we
can also design FLIP23 as another sub-module under this new flink-ml
module (I will raise a discussion in FLIP23 ML thread about this). The
legacy flink-ml module (under flink-libraries) can remain as it is and
await deprecation in the future, or alternatively we can move it under
this new flink-ml module and rename it to flink-dataset-ml. What do you
think?

Looking forward to your feedback.

Regards,
Shaoxuan


On Tue, May 7, 2019 at 8:42 AM Rong Rong  wrote:

> Thanks for following up promptly and sharing the feedback @shaoxuan.
>
> Yes I share the same view with you on the convergence of these 2 FLIPs
> eventually. I also have some questions regarding the API as well as the
> possible convergence challenges (especially current Co-processor approach
> vs. FLIP-39's table API approach), I will follow up on the discussion
> thread and the PR on FLIP-23 with you and Boris :-)
>
> --
> Rong
>
> On Mon, May 6, 2019 at 3:30 AM Shaoxuan Wang  wrote:
>
>>
>> Thanks for the feedback, Rong and Flavio.
>>
>> @Rong Rong
>> > There's another thread regarding a close to merge FLIP-23 implementation
>> > [1]. I agree this might still be early stage to talk about
>> productionizing
>> > and model-serving. But it would be nice to keep the
>> design/implementation in
>> > mind that: ease of use for productionizing a ML pipeline is also very
>> > important.
>> > And if we can leverage the implementation in FLIP-23 in the future,
>> (some
>> > adjustment might be needed) that would be super helpful.
>> You raised a very good point. Actually I have been reviewing FLIP23 for
>> a while (mostly offline to help Boris polish the PR). FMPOV, FLIP23 and
>> FLIP39 can be well unified at some point. Model serving in FLIP23 is
>> actually a special case of “transformer/model” proposed in FLIP39. Boris's
>> implementation of model serving can be designed as an abstract class on top
>> of transformer/model interface, and then can be used by ML users as a
>> certain ML lib.  I have some other comments WRT FLIP23 x FLIP39, I will
>> reply to the FLIP23 ML later with more details.
>>
>> @Flavio
>> > I have read many discussions about Flink ML and none of them takes into
>> > account the ongoing efforts carried out by the Streamline H2020
>> > project [1] on this topic.
>> > Have you tried to ping them? I think that both projects could benefit
>> > from a joint effort on this side.
>> > [1] https://h2020-streamline-project.eu/objectives/
>> Thank you for your info. I was not aware of the Streamline H2020 project
>> before. I just did a quick look at its website and GitHub. IMO these projects
>> could be very good Flink ecosystem projects and can be built on top of ML
>> pipeline & ML lib interfaces introduced in FLIP39. I will try to contact
>> the owners of these projects to understand their plans and blockers of
>> using Flink (if there is any). In the meantime, if you have the direct
>> contact of anyone who might be interested in the ML pipeline & ML lib, please
>> share it with me.
>>
>> Regards,
>> Shaoxuan
>>
>>
>>
>>
>>
>> On Thu, May 2, 2019 at 3:59 PM Flavio Pompermaier 
>> wrote:
>>
>>> Hi to all,
>>> I have read many discussions about Flink ML and none of them takes into
>>> account the ongoing efforts carried out by the Streamline H2020
>>> project [1] on this topic.
>>> Have you tried to ping them? I think that both projects could benefit
>>> from a joint effort on this side.
>>> [1] https://h2020-streamline-project.eu/objectives/
>>>
>>> Best,
>>> Flavio
>>>
>>> On Thu, May 2, 2019 at 12:18 AM Rong Rong  wrote:
>>>
>>> > Hi Shaoxuan/Weihua,
>>> >
>>> > Thanks for the proposal and driving the effort.
>>> > I also replied to the original discussion thread, and still a +1 on
>>> moving
>>> > towards the scikit-learn model.
>>> > I just left a few comments on the API details and some general
>>> questions.
>>> > Please kindly take a look.
>>> >
>>> > There's another thread regarding a close to merge FLIP-23
>>> implementation
>>> > [1]. I agree this might still be early stage to talk about
>>> productionizing
>>> > and 

Re: [DISCUSS] FLIP-39: Flink ML pipeline and ML libs

2019-05-06 Thread Rong Rong
Thanks for following up promptly and sharing the feedback @shaoxuan.

Yes I share the same view with you on the convergence of these 2 FLIPs
eventually. I also have some questions regarding the API as well as the
possible convergence challenges (especially current Co-processor approach
vs. FLIP-39's table API approach), I will follow up on the discussion
thread and the PR on FLIP-23 with you and Boris :-)

--
Rong

On Mon, May 6, 2019 at 3:30 AM Shaoxuan Wang  wrote:

>
> Thanks for the feedback, Rong and Flavio.
>
> @Rong Rong
> > There's another thread regarding a close to merge FLIP-23 implementation
> > [1]. I agree this might still be early stage to talk about
> productionizing
> > and model-serving. But it would be nice to keep the design/implementation
> in
> > mind that: ease of use for productionizing a ML pipeline is also very
> > important.
> > And if we can leverage the implementation in FLIP-23 in the future, (some
> > adjustment might be needed) that would be super helpful.
> You raised a very good point. Actually I have been reviewing FLIP23 for a
> while (mostly offline to help Boris polish the PR). FMPOV, FLIP23 and
> FLIP39 can be well unified at some point. Model serving in FLIP23 is
> actually a special case of “transformer/model” proposed in FLIP39. Boris's
> implementation of model serving can be designed as an abstract class on top
> of transformer/model interface, and then can be used by ML users as a
> certain ML lib.  I have some other comments WRT FLIP23 x FLIP39, I will
> reply to the FLIP23 ML later with more details.
>
> @Flavio
> > I have read many discussions about Flink ML and none of them takes into
> > account the ongoing efforts carried out by the Streamline H2020
> > project [1] on this topic.
> > Have you tried to ping them? I think that both projects could benefit
> > from a joint effort on this side.
> > [1] https://h2020-streamline-project.eu/objectives/
> Thank you for your info. I was not aware of the Streamline H2020 project
> before. I just did a quick look at its website and GitHub. IMO these projects
> could be very good Flink ecosystem projects and can be built on top of ML
> pipeline & ML lib interfaces introduced in FLIP39. I will try to contact
> the owners of these projects to understand their plans and blockers of
> using Flink (if there is any). In the meantime, if you have the direct
> contact of anyone who might be interested in the ML pipeline & ML lib, please
> share it with me.
>
> Regards,
> Shaoxuan
>
>
>
>
>
> On Thu, May 2, 2019 at 3:59 PM Flavio Pompermaier 
> wrote:
>
>> Hi to all,
>> I have read many discussions about Flink ML and none of them takes into
>> account the ongoing efforts carried out by the Streamline H2020 project
>> [1] on this topic.
>> Have you tried to ping them? I think that both projects could benefit
>> from a joint effort on this side.
>> [1] https://h2020-streamline-project.eu/objectives/
>>
>> Best,
>> Flavio
>>
>> On Thu, May 2, 2019 at 12:18 AM Rong Rong  wrote:
>>
>> > Hi Shaoxuan/Weihua,
>> >
>> > Thanks for the proposal and driving the effort.
>> > I also replied to the original discussion thread, and still a +1 on
>> moving
>> > towards the scikit-learn model.
>> > I just left a few comments on the API details and some general
>> questions.
>> > Please kindly take a look.
>> >
>> > There's another thread regarding a close to merge FLIP-23 implementation
>> > [1]. I agree this might still be early stage to talk about
>> productionizing
>> > and model-serving. But it would be nice to keep the
>> design/implementation in
>> > mind that: ease of use for productionizing a ML pipeline is also very
>> > important.
>> > And if we can leverage the implementation in FLIP-23 in the future,
>> (some
>> > adjustment might be needed) that would be super helpful.
>> >
>> > Best,
>> > Rong
>> >
>> >
>> > [1]
>> >
>> >
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-23-Model-Serving-td20260.html
>> >
>> >
>> > On Tue, Apr 30, 2019 at 1:47 AM Shaoxuan Wang 
>> wrote:
>> >
>> > > Thanks for all the feedback.
>> > >
>> > > @Jincheng Sun
>> > > > I recommend It's better to add a detailed implementation plan to
>> FLIP
>> > and
>> > > google doc.
>> > > Yes, I will add a subsection for implementation plan.
>> > >
>> > > @Chen Qin
>> > > >Just share some of insights from operating SparkML side at scale
>> > > >- map reduce may not best way to iterative sync partitioned workers.
>> > > >- native hardware accelerations is key to adopt rapid changes in ML
>> > > improvements in foreseeable future.
>> > > Thanks for sharing your experience on SparkML. The purpose of this
>> FLIP
>> > is
>> > > mainly to provide the interfaces for ML pipeline and ML lib, and the
>> > > implementations of most standard algorithms. Besides this FLIP, for AI
>> > > computing on Flink, we will continue to contribute the efforts, like
>> the
>> > > enhancement of iterative and the integration of deep learning engines
>> > (such

Re: [DISCUSS] FLIP-39: Flink ML pipeline and ML libs

2019-05-06 Thread Shaoxuan Wang
Thanks for the feedback, Rong and Flavio.

@Rong Rong
> There's another thread regarding a close to merge FLIP-23 implementation
> [1]. I agree this might still be early stage to talk about productionizing
> and model-serving. But it would be nice to keep the design/implementation
in
> mind that: ease of use for productionizing a ML pipeline is also very
> important.
> And if we can leverage the implementation in FLIP-23 in the future, (some
> adjustment might be needed) that would be super helpful.
You raised a very good point. Actually I have been reviewing FLIP23 for a
while (mostly offline to help Boris polish the PR). FMPOV, FLIP23 and
FLIP39 can be well unified at some point. Model serving in FLIP23 is
actually a special case of “transformer/model” proposed in FLIP39. Boris's
implementation of model serving can be designed as an abstract class on top
of transformer/model interface, and then can be used by ML users as a
certain ML lib.  I have some other comments WRT FLIP23 x FLIP39, I will
reply to the FLIP23 ML later with more details.
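
To sketch that idea (purely hypothetical names; only the Model/Transformer contract comes from FLIP-39 as merged under FLINK-12473, and the package may differ):

import org.apache.flink.ml.api.core.Model;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;

// Hypothetical abstract base: the FLIP-23 concern (loading and serving a model
// artifact) layered on top of the FLIP-39 contract (scoring is a Table-to-Table
// transformation).
abstract class ServingModel<M extends ServingModel<M>> implements Model<M> {

    // FLIP-23 side: load the served model artifact (e.g. PMML, TensorFlow, ...).
    protected abstract void loadModelArtifact(String location);

    // FLIP-39 side: score incoming records as a Table transformation.
    @Override
    public abstract Table transform(TableEnvironment tEnv, Table input);
}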

@Flavio
> I have read many discussions about Flink ML and none of them takes into
> account the ongoing efforts carried out by the Streamline H2020 project
> [1] on this topic.
> Have you tried to ping them? I think that both projects could benefit
> from a joint effort on this side.
> [1] https://h2020-streamline-project.eu/objectives/
Thank you for your info. I was not aware of the Streamline H2020 project
before. I just did a quick look at its website and GitHub. IMO these projects
could be very good Flink ecosystem projects and can be built on top of ML
pipeline & ML lib interfaces introduced in FLIP39. I will try to contact
the owners of these projects to understand their plans and blockers of
using Flink (if there is any). In the meantime, if you have the direct
contact of person who might be interested on ML pipeline & ML lib, please
share with me.

Regards,
Shaoxuan





On Thu, May 2, 2019 at 3:59 PM Flavio Pompermaier 
wrote:

> Hi to all,
> I have read many discussions about Flink ML and none of them takes into
> account the ongoing efforts carried out by the Streamline H2020 project
> [1] on this topic.
> Have you tried to ping them? I think that both projects could benefit from
> a joint effort on this side.
> [1] https://h2020-streamline-project.eu/objectives/
>
> Best,
> Flavio
>
> On Thu, May 2, 2019 at 12:18 AM Rong Rong  wrote:
>
> > Hi Shaoxuan/Weihua,
> >
> > Thanks for the proposal and driving the effort.
> > I also replied to the original discussion thread, and still a +1 on
> moving
> > towards the scikit-learn model.
> > I just left a few comments on the API details and some general questions.
> > Please kindly take a look.
> >
> > There's another thread regarding a close-to-merge FLIP-23 implementation
> > [1]. I agree it might still be too early to talk about productionizing
> > and model-serving. But it would be nice to keep in mind during the
> > design/implementation that ease of use for productionizing an ML pipeline
> > is also very important.
> > And if we can leverage the implementation of FLIP-23 in the future (some
> > adjustment might be needed), that would be super helpful.
> >
> > Best,
> > Rong
> >
> >
> > [1]
> >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-23-Model-Serving-td20260.html
> >
> >
> > On Tue, Apr 30, 2019 at 1:47 AM Shaoxuan Wang 
> wrote:
> >
> > > Thanks for all the feedback.
> > >
> > > @Jincheng Sun
> > > > I recommend adding a detailed implementation plan to the FLIP and
> > > > google doc.
> > > Yes, I will add a subsection for the implementation plan.
> > >
> > > @Chen Qin
> > > >Just sharing some insights from operating SparkML at scale
> > > >- map reduce may not be the best way to iteratively sync partitioned workers.
> > > >- native hardware acceleration is key to adopting rapid changes in ML
> > > >improvements in the foreseeable future.
> > > Thanks for sharing your experience with SparkML. The purpose of this FLIP
> > > is mainly to provide the interfaces for the ML pipeline and ML lib, and
> > > the implementations of most standard algorithms. Besides this FLIP, for
> > > AI computing on Flink, we will continue to contribute efforts like the
> > > enhancement of iteration and the integration of deep learning engines
> > > (such as TensorFlow/PyTorch). I have presented part of this work in
> > >
> https://www.ververica.com/resources/flink-forward-san-francisco-2019/when-table-meets-ai-build-flink-ai-ecosystem-on-table-api
> > > I am not sure I have fully understood your comments. Could you please
> > > elaborate with more details and, if possible, provide some suggestions
> > > about what we should work on to address the challenges you have mentioned.
> > >
> > > Regards,
> > > Shaoxuan
> > >
> > > On Mon, Apr 29, 2019 at 11:28 AM Chen Qin  wrote:
> > >
> > > > Just sharing some insights from operating SparkML at scale
> > > 

Re: [DISCUSS] FLIP-39: Flink ML pipeline and ML libs

2019-05-02 Thread Flavio Pompermaier
Hi to all,
I have read many discussions about Flink ML and none of them takes into
account the ongoing efforts carried out by the Streamline H2020 project
[1] on this topic.
Have you tried to ping them? I think that both projects could benefit from
a joint effort on this side.
[1] https://h2020-streamline-project.eu/objectives/

Best,
Flavio

On Thu, May 2, 2019 at 12:18 AM Rong Rong  wrote:

> Hi Shaoxuan/Weihua,
>
> Thanks for the proposal and driving the effort.
> I also replied to the original discussion thread, and still a +1 on moving
> towards the scikit-learn model.
> I just left a few comments on the API details and some general questions.
> Please kindly take a look.
>
> There's another thread regarding a close-to-merge FLIP-23 implementation
> [1]. I agree it might still be too early to talk about productionizing
> and model-serving. But it would be nice to keep in mind during the
> design/implementation that ease of use for productionizing an ML pipeline
> is also very important.
> And if we can leverage the implementation of FLIP-23 in the future (some
> adjustment might be needed), that would be super helpful.
>
> Best,
> Rong
>
>
> [1]
>
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-23-Model-Serving-td20260.html
>
>
> On Tue, Apr 30, 2019 at 1:47 AM Shaoxuan Wang  wrote:
>
> > Thanks for all the feedback.
> >
> > @Jincheng Sun
> > > I recommend adding a detailed implementation plan to the FLIP and
> > > google doc.
> > Yes, I will add a subsection for the implementation plan.
> >
> > @Chen Qin
> > >Just sharing some insights from operating SparkML at scale
> > >- map reduce may not be the best way to iteratively sync partitioned workers.
> > >- native hardware acceleration is key to adopting rapid changes in ML
> > >improvements in the foreseeable future.
> > Thanks for sharing your experience with SparkML. The purpose of this FLIP
> > is mainly to provide the interfaces for the ML pipeline and ML lib, and the
> > implementations of most standard algorithms. Besides this FLIP, for AI
> > computing on Flink, we will continue to contribute efforts like the
> > enhancement of iteration and the integration of deep learning engines
> > (such as TensorFlow/PyTorch). I have presented part of this work in
> >
> https://www.ververica.com/resources/flink-forward-san-francisco-2019/when-table-meets-ai-build-flink-ai-ecosystem-on-table-api
> > I am not sure I have fully understood your comments. Could you please
> > elaborate with more details and, if possible, provide some suggestions
> > about what we should work on to address the challenges you have mentioned.
> >
> > Regards,
> > Shaoxuan
> >
> > On Mon, Apr 29, 2019 at 11:28 AM Chen Qin  wrote:
> >
> > > Just sharing some insights from operating SparkML at scale
> > > - map reduce may not be the best way to iteratively sync partitioned workers.
> > > - native hardware acceleration is key to adopting rapid changes in ML
> > > improvements in the foreseeable future.
> > >
> > > Chen
> > >
> > > On Apr 29, 2019, at 11:02, jincheng sun 
> > wrote:
> > > >
> > > > Hi Shaoxuan,
> > > >
> > > > Thanks for your efforts to enhance the scalability and the ease of use
> > > > of Flink ML and to take it one step further. Thank you for sharing a
> > > > lot of context information.
> > > >
> > > > big +1 for this proposal!
> > > >
> > > > I have only one suggestion: there is not much time until the release of
> > > > flink-1.9, so I recommend adding a detailed implementation plan to the
> > > > FLIP and google doc.
> > > >
> > > > What do you think?
> > > >
> > > > Best,
> > > > Jincheng
> > > >
> > > > Shaoxuan Wang  于2019年4月29日周一 上午10:34写道:
> > > >
> > > >> Hi everyone,
> > > >>
> > > >> Weihua has proposed to rebuild Flink ML pipeline on top of TableAPI
> > > several
> > > >> months ago in this mail thread:
> > > >>
> > > >>
> > > >>
> > >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Embracing-Table-API-in-Flink-ML-td25368.html
> > > >>
> > > >> Luogen, Becket, Xu, Weihua and I have been working on this proposal
> > > >> offline over
> > > >> the past few months. Now we want to share the first phase of the
> > > entire
> > > >> proposal with a FLIP. In this FLIP-39, we want to achieve several
> > things
> > > >> (and hope those can be accomplished and released in Flink-1.9):
> > > >>
> > > >>   -
> > > >>
> > > >>   Provide a new set of ML core interface (on top of Flink TableAPI)
> > > >>   -
> > > >>
> > > >>   Provide a ML pipeline interface (on top of Flink TableAPI)
> > > >>   -
> > > >>
> > > >>   Provide the interfaces for parameter management and pipeline/model
> > > >>   persistence
> > > >>   -
> > > >>
> > > >>   All the above interfaces should facilitate any new ML algorithm.
> We
> > > will
> > > >>   gradually add various standard ML algorithms on top of these new
> > > >> proposed
> > > >>   interfaces to ensure their feasibility and 

Re: [DISCUSS] FLIP-39: Flink ML pipeline and ML libs

2019-05-01 Thread Rong Rong
Hi Shaoxuan/Weihua,

Thanks for the proposal and driving the effort.
I also replied to the original discussion thread, and still a +1 on moving
towards the scikit-learn model.
I just left a few comments on the API details and some general questions.
Please kindly take a look.

There's another thread regarding a close-to-merge FLIP-23 implementation
[1]. I agree it might still be too early to talk about productionizing
and model-serving. But it would be nice to keep in mind during the
design/implementation that ease of use for productionizing an ML pipeline
is also very important.
And if we can leverage the implementation of FLIP-23 in the future (some
adjustment might be needed), that would be super helpful.

Best,
Rong


[1]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-23-Model-Serving-td20260.html


On Tue, Apr 30, 2019 at 1:47 AM Shaoxuan Wang  wrote:

> Thanks for all the feedback.
>
> @Jincheng Sun
> > I recommend adding a detailed implementation plan to the FLIP and
> > google doc.
> Yes, I will add a subsection for the implementation plan.
>
> @Chen Qin
> >Just sharing some insights from operating SparkML at scale
> >- map reduce may not be the best way to iteratively sync partitioned workers.
> >- native hardware acceleration is key to adopting rapid changes in ML
> >improvements in the foreseeable future.
> Thanks for sharing your experience with SparkML. The purpose of this FLIP is
> mainly to provide the interfaces for the ML pipeline and ML lib, and the
> implementations of most standard algorithms. Besides this FLIP, for AI
> computing on Flink, we will continue to contribute efforts like the
> enhancement of iteration and the integration of deep learning engines (such
> as TensorFlow/PyTorch). I have presented part of this work in
>
> https://www.ververica.com/resources/flink-forward-san-francisco-2019/when-table-meets-ai-build-flink-ai-ecosystem-on-table-api
> I am not sure I have fully understood your comments. Could you please
> elaborate with more details and, if possible, provide some suggestions
> about what we should work on to address the challenges you have mentioned.
>
> Regards,
> Shaoxuan
>
> On Mon, Apr 29, 2019 at 11:28 AM Chen Qin  wrote:
>
> > Just sharing some insights from operating SparkML at scale
> > - map reduce may not be the best way to iteratively sync partitioned workers.
> > - native hardware acceleration is key to adopting rapid changes in ML
> > improvements in the foreseeable future.
> >
> > Chen
> >
> > On Apr 29, 2019, at 11:02, jincheng sun 
> wrote:
> > >
> > > Hi Shaoxuan,
> > >
> > > Thanks for your efforts to enhance the scalability and the ease of use
> > > of Flink ML and to take it one step further. Thank you for sharing a lot
> > > of context information.
> > >
> > > big +1 for this proposal!
> > >
> > > I have only one suggestion: there is not much time until the release of
> > > flink-1.9, so I recommend adding a detailed implementation plan to the
> > > FLIP and google doc.
> > >
> > > What do you think?
> > >
> > > Best,
> > > Jincheng
> > >
> > > Shaoxuan Wang  于2019年4月29日周一 上午10:34写道:
> > >
> > >> Hi everyone,
> > >>
> > >> Weihua has proposed to rebuild Flink ML pipeline on top of TableAPI
> > several
> > >> months ago in this mail thread:
> > >>
> > >>
> > >>
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Embracing-Table-API-in-Flink-ML-td25368.html
> > >>
> > >> Luogen, Becket, Xu, Weihua and I have been working on this proposal
> > >> offline over
> > >> the past few months. Now we want to share the first phase of the
> > entire
> > >> proposal with a FLIP. In this FLIP-39, we want to achieve several
> things
> > >> (and hope those can be accomplished and released in Flink-1.9):
> > >>
> > >>   -
> > >>
> > >>   Provide a new set of ML core interface (on top of Flink TableAPI)
> > >>   -
> > >>
> > >>   Provide a ML pipeline interface (on top of Flink TableAPI)
> > >>   -
> > >>
> > >>   Provide the interfaces for parameter management and pipeline/model
> > >>   persistence
> > >>   -
> > >>
> > >>   All the above interfaces should facilitate any new ML algorithm. We
> > will
> > >>   gradually add various standard ML algorithms on top of these new
> > >> proposed
> > >>   interfaces to ensure their feasibility and scalability.
> > >>
> > >>
> > >> Part of this FLIP was presented at Flink Forward 2019 @ San
> > Francisco by
> > >> Xu and me.
> > >>
> > >>
> > >>
> >
> https://sf-2019.flink-forward.org/conference-program#when-table-meets-ai--build-flink-ai-ecosystem-on-table-api
> > >>
> > >>
> > >>
> >
> https://sf-2019.flink-forward.org/conference-program#high-performance-ml-library-based-on-flink
> > >>
> > >> You can find the videos & slides at
> > >> https://www.ververica.com/flink-forward-san-francisco-2019
> > >>
> > >> The design document for FLIP-39 can be found here:
> > >>
> > >>
> > >>
> >
> 

Re: [DISCUSS] FLIP-39: Flink ML pipeline and ML libs

2019-04-30 Thread Shaoxuan Wang
Thanks for all the feedback.

@Jincheng Sun
> I recommend adding a detailed implementation plan to the FLIP and
> google doc.
Yes, I will add a subsection for the implementation plan.

@Chen Qin
>Just sharing some insights from operating SparkML at scale
>- map reduce may not be the best way to iteratively sync partitioned workers.
>- native hardware acceleration is key to adopting rapid changes in ML
>improvements in the foreseeable future.
Thanks for sharing your experience with SparkML. The purpose of this FLIP is
mainly to provide the interfaces for the ML pipeline and ML lib, and the
implementations of most standard algorithms. Besides this FLIP, for AI
computing on Flink, we will continue to contribute efforts like the
enhancement of iteration and the integration of deep learning engines (such
as TensorFlow/PyTorch). I have presented part of this work in
https://www.ververica.com/resources/flink-forward-san-francisco-2019/when-table-meets-ai-build-flink-ai-ecosystem-on-table-api
I am not sure I have fully understood your comments. Could you please
elaborate with more details and, if possible, provide some suggestions
about what we should work on to address the challenges you have mentioned.
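
To give a feeling for what a standard algorithm could look like on top of
such interfaces, here is a rough, self-contained Java sketch. The interface
shapes and the ScalerEstimator/ScalerModel names are hypothetical
simplifications for this mail (not code from the FLIP), and the numeric logic
is deliberately elided; only the Estimator-to-Model structure is shown.

  import org.apache.flink.table.api.Table;
  import org.apache.flink.table.api.TableEnvironment;

  // Simplified stage shapes in the spirit of the proposal (Table in, Table out).
  interface Transformer {
    Table transform(TableEnvironment tEnv, Table input);
  }

  interface Estimator<M extends Transformer> {
    M fit(TableEnvironment tEnv, Table training);
  }

  // Hypothetical lib algorithm: fit() learns per-column statistics and
  // returns a Model, which is itself a Transformer applied to new data.
  final class ScalerEstimator implements Estimator<ScalerModel> {
    private final String[] columns;

    ScalerEstimator(String... columns) {
      this.columns = columns;
    }

    @Override
    public ScalerModel fit(TableEnvironment tEnv, Table training) {
      // A real implementation would aggregate mean/stddev of the selected
      // columns from the training Table; this sketch only keeps the structure.
      return new ScalerModel(columns);
    }
  }

  final class ScalerModel implements Transformer {
    private final String[] columns;

    ScalerModel(String[] columns) {
      this.columns = columns;
    }

    @Override
    public Table transform(TableEnvironment tEnv, Table input) {
      // A real model would rewrite the selected columns via Table API
      // expressions; the sketch returns the input unchanged.
      return input;
    }
  }

A user would then write something like
new ScalerEstimator("f1", "f2").fit(tEnv, training).transform(tEnv, input).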

Regards,
Shaoxuan

On Mon, Apr 29, 2019 at 11:28 AM Chen Qin  wrote:

> Just sharing some insights from operating SparkML at scale
> - map reduce may not be the best way to iteratively sync partitioned workers.
> - native hardware acceleration is key to adopting rapid changes in ML
> improvements in the foreseeable future.
>
> Chen
>
> On Apr 29, 2019, at 11:02, jincheng sun  wrote:
> >
> > Hi Shaoxuan,
> >
> > Thanks for your efforts to enhance the scalability and the ease of use of
> > Flink ML and to take it one step further. Thank you for sharing a lot of
> > context information.
> >
> > big +1 for this proposal!
> >
> > I have only one suggestion: there is not much time until the release of
> > flink-1.9, so I recommend adding a detailed implementation plan to the
> > FLIP and google doc.
> >
> > What do you think?
> >
> > Best,
> > Jincheng
> >
> > Shaoxuan Wang  于2019年4月29日周一 上午10:34写道:
> >
> >> Hi everyone,
> >>
> >> Weihua has proposed to rebuild Flink ML pipeline on top of TableAPI
> several
> >> months ago in this mail thread:
> >>
> >>
> >>
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Embracing-Table-API-in-Flink-ML-td25368.html
> >>
> >> Luogen, Becket, Xu, Weihua and I have been working on this proposal
> >> offline over
> >> the past few months. Now we want to share the first phase of the
> entire
> >> proposal with a FLIP. In this FLIP-39, we want to achieve several things
> >> (and hope those can be accomplished and released in Flink-1.9):
> >>
> >>   -
> >>
> >>   Provide a new set of ML core interface (on top of Flink TableAPI)
> >>   -
> >>
> >>   Provide a ML pipeline interface (on top of Flink TableAPI)
> >>   -
> >>
> >>   Provide the interfaces for parameter management and pipeline/model
> >>   persistence
> >>   -
> >>
> >>   All the above interfaces should facilitate any new ML algorithm. We
> will
> >>   gradually add various standard ML algorithms on top of these new
> >> proposed
> >>   interfaces to ensure their feasibility and scalability.
> >>
> >>
> >> Part of this FLIP was presented at Flink Forward 2019 @ San
> Francisco by
> >> Xu and me.
> >>
> >>
> >>
> https://sf-2019.flink-forward.org/conference-program#when-table-meets-ai--build-flink-ai-ecosystem-on-table-api
> >>
> >>
> >>
> https://sf-2019.flink-forward.org/conference-program#high-performance-ml-library-based-on-flink
> >>
> >> You can find the videos & slides at
> >> https://www.ververica.com/flink-forward-san-francisco-2019
> >>
> >> The design document for FLIP-39 can be found here:
> >>
> >>
> >>
> https://docs.google.com/document/d/1StObo1DLp8iiy0rbukx8kwAJb0BwDZrQrMWub3DzsEo
> >>
> >>
> >> I am looking forward to your feedback.
> >>
> >> Regards,
> >>
> >> Shaoxuan
> >>
>


Re: [DISCUSS] FLIP-39: Flink ML pipeline and ML libs

2019-04-28 Thread Chen Qin
Just sharing some insights from operating SparkML at scale
- map reduce may not be the best way to iteratively sync partitioned workers.
- native hardware acceleration is key to adopting rapid changes in ML
improvements in the foreseeable future.

Chen

On Apr 29, 2019, at 11:02, jincheng sun  wrote:
> 
> Hi Shaoxuan,
> 
> Thanks for your efforts to enhance the scalability and the ease of use of
> Flink ML and to take it one step further. Thank you for sharing a lot of
> context information.
> 
> big +1 for this proposal!
> 
> I have only one suggestion: there is not much time until the release of
> flink-1.9, so I recommend adding a detailed implementation plan to the
> FLIP and google doc.
> 
> What do you think?
> 
> Best,
> Jincheng
> 
> Shaoxuan Wang  于2019年4月29日周一 上午10:34写道:
> 
>> Hi everyone,
>> 
>> Weihua has proposed to rebuild Flink ML pipeline on top of TableAPI several
>> months ago in this mail thread:
>> 
>> 
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Embracing-Table-API-in-Flink-ML-td25368.html
>> 
>> Luogen, Becket, Xu, Weihua and I have been working on this proposal
>> offline over
>> the past few months. Now we want to share the first phase of the entire
>> proposal with a FLIP. In this FLIP-39, we want to achieve several things
>> (and hope those can be accomplished and released in Flink-1.9):
>> 
>>   -
>> 
>>   Provide a new set of ML core interface (on top of Flink TableAPI)
>>   -
>> 
>>   Provide a ML pipeline interface (on top of Flink TableAPI)
>>   -
>> 
>>   Provide the interfaces for parameter management and pipeline/model
>>   persistence
>>   -
>> 
>>   All the above interfaces should facilitate any new ML algorithm. We will
>>   gradually add various standard ML algorithms on top of these new
>> proposed
>>   interfaces to ensure their feasibility and scalability.
>> 
>> 
>> Part of this FLIP was presented at Flink Forward 2019 @ San Francisco by
>> Xu and me.
>> 
>> 
>> https://sf-2019.flink-forward.org/conference-program#when-table-meets-ai--build-flink-ai-ecosystem-on-table-api
>> 
>> 
>> https://sf-2019.flink-forward.org/conference-program#high-performance-ml-library-based-on-flink
>> 
>> You can find the videos & slides at
>> https://www.ververica.com/flink-forward-san-francisco-2019
>> 
>> The design document for FLIP-39 can be found here:
>> 
>> 
>> https://docs.google.com/document/d/1StObo1DLp8iiy0rbukx8kwAJb0BwDZrQrMWub3DzsEo
>> 
>> 
>> I am looking forward to your feedback.
>> 
>> Regards,
>> 
>> Shaoxuan
>> 


Re: [DISCUSS] FLIP-39: Flink ML pipeline and ML libs

2019-04-28 Thread jincheng sun
Hi Shaoxuan,

Thanks for your efforts to enhance the scalability and the ease of use of
Flink ML and to take it one step further. Thank you for sharing a lot of
context information.

big +1 for this proposal!

I have only one suggestion: there is not much time until the release of
flink-1.9, so I recommend adding a detailed implementation plan to the FLIP
and google doc.

What do you think?

Best,
Jincheng

Shaoxuan Wang  于2019年4月29日周一 上午10:34写道:

> Hi everyone,
>
> Weihua has proposed to rebuild Flink ML pipeline on top of TableAPI several
> months ago in this mail thread:
>
>
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Embracing-Table-API-in-Flink-ML-td25368.html
>
> Luogen, Becket, Xu, Weihua and I have been working on this proposal
> offline over
> the past few months. Now we want to share the first phase of the entire
> proposal with a FLIP. In this FLIP-39, we want to achieve several things
> (and hope those can be accomplished and released in Flink-1.9):
>
>-
>
>Provide a new set of ML core interface (on top of Flink TableAPI)
>-
>
>Provide a ML pipeline interface (on top of Flink TableAPI)
>-
>
>    Provide the interfaces for parameter management and pipeline/model
>    persistence
>-
>
>All the above interfaces should facilitate any new ML algorithm. We will
>gradually add various standard ML algorithms on top of these new
> proposed
>interfaces to ensure their feasibility and scalability.
>
>
> Part of this FLIP was presented at Flink Forward 2019 @ San Francisco by
> Xu and me.
>
>
> https://sf-2019.flink-forward.org/conference-program#when-table-meets-ai--build-flink-ai-ecosystem-on-table-api
>
>
> https://sf-2019.flink-forward.org/conference-program#high-performance-ml-library-based-on-flink
>
> You can find the videos & slides at
> https://www.ververica.com/flink-forward-san-francisco-2019
>
> The design document for FLIP-39 can be found here:
>
>
> https://docs.google.com/document/d/1StObo1DLp8iiy0rbukx8kwAJb0BwDZrQrMWub3DzsEo
>
>
> I am looking forward to your feedback.
>
> Regards,
>
> Shaoxuan
>


[DISCUSS] FLIP-39: Flink ML pipeline and ML libs

2019-04-28 Thread Shaoxuan Wang
Hi everyone,

Weihua proposed rebuilding the Flink ML pipeline on top of the Table API
several months ago in this mail thread:

http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Embracing-Table-API-in-Flink-ML-td25368.html

Luogen, Becket, Xu, Weihua and I have been working on this proposal offline over
the past few months. Now we want to share the first phase of the entire
proposal with a FLIP. In this FLIP-39, we want to achieve several things
(and hope those can be accomplished and released in Flink-1.9):

   -

    Provide a new set of ML core interfaces (on top of Flink TableAPI)
   -

    Provide an ML pipeline interface (on top of Flink TableAPI)
   -

    Provide the interfaces for parameter management and pipeline/model
    persistence
   -

   All the above interfaces should facilitate any new ML algorithm. We will
   gradually add various standard ML algorithms on top of these new proposed
   interfaces to ensure their feasibility and scalability.

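To make the shape of these interfaces easier to discuss, here is a minimal,
illustrative Java sketch of scikit-learn style pipeline stages on top of the
Table API. The names and signatures below (PipelineStage, setParam, toJson,
TransformerChain) are simplifications invented for this mail, not the final
API:

  import org.apache.flink.table.api.Table;
  import org.apache.flink.table.api.TableEnvironment;

  import java.util.ArrayList;
  import java.util.List;

  // A stage of a pipeline. Parameters are plain key/value pairs here; the
  // proposal describes a richer parameter-management facility.
  interface PipelineStage {
    PipelineStage setParam(String key, String value);
    String toJson();   // hook for pipeline/model persistence
  }

  // A Transformer maps an input Table to an output Table
  // (e.g. feature scaling, or scoring with a trained model).
  interface Transformer extends PipelineStage {
    Table transform(TableEnvironment tEnv, Table input);
  }

  // An Estimator is fitted on a training Table and produces a Transformer
  // (the model) that can be applied to new data.
  interface Estimator extends PipelineStage {
    Transformer fit(TableEnvironment tEnv, Table training);
  }

  // A very small "pipeline" sketch covering only the already-fitted case:
  // a chain of Transformers applied one after another.
  final class TransformerChain implements Transformer {
    private final List<Transformer> stages = new ArrayList<>();

    TransformerChain append(Transformer stage) {
      stages.add(stage);
      return this;
    }

    @Override
    public Table transform(TableEnvironment tEnv, Table input) {
      Table current = input;
      for (Transformer stage : stages) {
        current = stage.transform(tEnv, current);
      }
      return current;
    }

    @Override
    public TransformerChain setParam(String key, String value) {
      return this;   // this sketch has no parameters
    }

    @Override
    public String toJson() {
      return "{\"stages\":" + stages.size() + "}";   // placeholder only
    }
  }

A full Pipeline would additionally implement fit() by fitting each Estimator
in order and chaining the resulting transformers; that part is omitted here
for brevity.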

Part of this FLIP was presented at Flink Forward 2019 @ San Francisco by
Xu and me.

https://sf-2019.flink-forward.org/conference-program#when-table-meets-ai--build-flink-ai-ecosystem-on-table-api

https://sf-2019.flink-forward.org/conference-program#high-performance-ml-library-based-on-flink

You can find the videos & slides at
https://www.ververica.com/flink-forward-san-francisco-2019

The design document for FLIP-39 can be found here:

https://docs.google.com/document/d/1StObo1DLp8iiy0rbukx8kwAJb0BwDZrQrMWub3DzsEo


I am looking forward to your feedback.

Regards,

Shaoxuan