Hi everyone,
Just FYI, if there are no further suggestions on the FLIP, we plan to start
the voting thread this Friday, 9/10.
Thanks,
Dong
On Fri, Aug 27, 2021 at 10:32 AM Zhipeng Zhang wrote:
Thanks for the post, Dong :)
We welcome everyone to drop us an email on Flink ML. Let's work together to
build machine learning on Flink :)
Dong Lin wrote on Wednesday, Aug 25, 2021, at 8:58 PM:
Hi everyone,
Based on the feedback received in the online/offline discussions in the past
few weeks, we (Zhipeng, Fan, myself, and a few other developers at Alibaba)
have reached agreement on the design to support a DAG of algorithms. We have
merged the ideas from the initial two options into this
Thanks for the comments, Fan. Please see the reply inline.
On Thu, Aug 19, 2021 at 10:25 PM Fan Hong wrote:
Hi, Becket,
Many thanks for your detailed review. I agree that it is easier to involve
more people in the discussion if the fundamental differences are highlighted.
Here are some of my thoughts to help others think about these differences.
(Correct me if any of the technical details are not right.)
Hi, Mingliang and Becket,
Thank you for providing a real-world case of heterogeneous topology in the
training and inference phases; Becket has given you two options to choose
from.
Personally, I think Becket's two options are oversimplified in their
description and may be somewhat misleading.
Sincerely,
Fan Hong
--
From: 青雉 (Qi Mingliang)
Sent: Tuesday, Aug 10, 2021, 11:36
To: dev@flink.apache.org
Subject: Re: [DISCUSS] FLIP-173: Support DAG of algorithms (Flink ML)
Vote for option 2.
It is similar to what we are doing with Tensorflow.
1. Define
Hi Zhipeng,
It looks like there are three different but potentially related things
here.
1. How to describe multiple output of a node in the DAG.
2. How to construct / describe the DAG.
3. Do we need an encapsulation class of a DAG, e.g. the Graph class in
option 1?
It is much easier to discuss
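The first two points above can be sketched with a toy example. This is purely illustrative Java: `Map<String, String>` stands in for Flink's `Table` type, and the `Node` interface and all names are hypothetical, not taken from the FLIP.

```java
import java.util.*;

public class DagSketch {
    // A node maps named input tables to named output tables
    // (plain Strings stand in for Flink Tables).
    interface Node {
        Map<String, String> transform(Map<String, String> inputs);
    }

    // Point 2: the DAG is described by wiring named outputs to named inputs.
    static String runPipeline(String rawTable) {
        Node splitter = in -> Map.of(
            "train", in.get("data") + ":train",
            "test",  in.get("data") + ":test");
        Node scaler = in -> Map.of("scaled", in.get("in") + ":scaled");

        Map<String, String> aOut = splitter.transform(Map.of("data", rawTable));
        // Point 1: the downstream node picks the "train" output by name.
        Map<String, String> bOut = scaler.transform(Map.of("in", aOut.get("train")));
        return bOut.get("scaled");
    }

    public static void main(String[] args) {
        System.out.println(runPipeline("raw")); // raw:train:scaled
    }
}
```

Point 3 (whether the wiring deserves a dedicated `Graph` encapsulation class) is exactly what the rest of the thread debates.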
Hi Timo, Becket,
Thanks for the feedback.
I agree that having named tables can make the code more readable. Whether
there is one output table or multiple output tables, users have to access
an output table by a magic index (for the case where there is only one
output table, we need to use index
Thanks for the feedback, Mingliang.
Dong, I think what Mingliang meant by option-2 is the second way mentioned
in my email, i.e., having a Graph encapsulation. It does not mean option 2
in the FLIP, so he actually meant option 1 of the FLIP. Mingliang can
correct me if I misunderstood.
Hi
Hi everyone,
I'm not deeply involved in the discussion, but I quickly checked out the
proposed interfaces because it seems they use the Table API heavily, and I
would like to leave some feedback here:
I have the feeling that the proposed interfaces are a bit too simplified.
Methods like
Thank you Mingliang for providing the comments.
Currently option-1 proposes Graph/GraphModel/GraphBuilder to build an
Estimator from a graph of Estimator/Transformer, where Estimator could
generate the model (as a Transformer) directly. On the other hand, option-2
proposes AlgoOperator that can
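As a rough, non-authoritative sketch of the two shapes being contrasted: the interface bodies below are deliberately minimal, use `String` in place of `Table`, and are not the signatures proposed in the FLIP.

```java
public class OptionsSketch {
    // Option 1 flavor: Estimator.fit() yields a model, itself a Transformer.
    interface Transformer { String transform(String table); }
    interface Estimator  { Transformer fit(String trainTable); }

    // Option 2 flavor: a single AlgoOperator abstraction for every node.
    interface AlgoOperator { String[] compute(String... tables); }

    static String option1(String trainTable, String testTable) {
        Estimator est = train -> table -> table + ":scoredBy(" + train + ")";
        Transformer model = est.fit(trainTable);      // model generated directly
        return model.transform(testTable);
    }

    static String option2(String testTable) {
        AlgoOperator op = tables -> new String[] { tables[0] + ":out" };
        return op.compute(testTable)[0];
    }

    public static void main(String[] args) {
        System.out.println(option1("trainData", "testData")); // testData:scoredBy(trainData)
        System.out.println(option2("testData"));              // testData:out
    }
}
```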
Vote for option 2.
It is similar to what we are doing with Tensorflow.
1. Define the graph in the training phase
2. Export the model with a different input/output spec for online inference
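The two-step workflow above can be mimicked in a self-contained toy, using plain Java rather than TensorFlow; the "training" here is a made-up averaging rule, purely to show the differing input/output specs between the training graph and the exported serving graph.

```java
import java.util.function.DoubleUnaryOperator;

public class TrainExportSketch {
    // Training phase: the graph consumes (features, labels) and produces a weight.
    static double train(double[] features, double[] labels) {
        double w = 0;
        for (int i = 0; i < features.length; i++) {
            w += labels[i] / features[i];
        }
        return w / features.length;
    }

    // Export phase: the serving graph takes only features; labels are gone.
    static DoubleUnaryOperator export(double weight) {
        return x -> weight * x;
    }

    public static void main(String[] args) {
        double weight = train(new double[]{1, 2}, new double[]{2, 4}); // 2.0
        DoubleUnaryOperator serving = export(weight);
        System.out.println(serving.applyAsDouble(3.0)); // 6.0
    }
}
```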
Thanks,
Mingliang
On Aug 10, 2021, at 9:39 AM, Becket Qin <becket@gmail.com> wrote:
Thanks Mingliang. It is super helpful to get your input. At this point,
there are two ways mentioned in the FLIP to support heterogeneous topology
in training and inference phase.
1. Create two separate DAGs or code for training and inference respectively.
2. An encapsulation API called Graph,
Hi all,
This is Mingliang, a machine learning engineer in the recommendation area.
I see there's a discussion about "heterogeneous topologies in training and
inference." Actually, this is a very common case in recommendation systems,
especially in CTR prediction tasks. For the training task, usually data is
Hi Zhipeng,
Yes, I agree that the key difference between the two options is how they
support MIMO (multiple inputs and multiple outputs).
My main concern for option 2 is potential inconsistent availability of
algorithms in the two sets of API. In order to make an algorithm available
to both sets of API, people have to implement the
Hi Dong,
Sorry for the late reply.
I am a bit confused by this description of the semantic change. By "from
Data -> Data conversion to generic Table -> Table", do you mean "Table !=
Data"?
Yes, I think that Table and Data are not equivalent in this case. It might
depend on what people
Hi Becket,
Thank you for the detailed reply!
My understanding of your comments is that most of option-1 looks good
except its change of the Transformer semantics. Please see my reply inline.
On Tue, Jul 20, 2021 at 11:43 AM Becket Qin wrote:
Hi Becket,
Thanks for the review! I totally agree that it would be easier for people
to discuss if we can list the fundamental difference between these two
proposals. (So I want to make the discussion even shorter)
In my opinion, the fundamental difference between proposal-1 and proposal-2
is
Hi Dong, Zhipeng and Fan,
Thanks for the detailed proposals. It is quite a lot of reading! Given that
we are introducing a lot of stuff here, I find that it might be easier for
people to discuss if we can list the fundamental differences first. From
what I understand, the very fundamental
Hi all,
Zhipeng, Fan (cc'ed) and I are opening this thread to discuss two different
designs to extend Flink ML API to support more use-cases, e.g. expressing a
DAG of preprocessing and training logics. These two designs have been
documented in FLIP-173