Hi all!
Sorry for joining this discussion late (I have already missed some of the
deadlines set in this thread).
*Here are some thoughts about what we can do immediately*
(1) Grow ML community by adding committers with a dedicated. Irrespective
of any direction decision, this is a
, our group is no longer actively developing them.

Thanks,

Soila

From: Theodore Vasiloudis [mailto:theodoros.vasilou...@gmail.com]
Sent: Friday, March 3, 2017 4:11 AM
To: dev@flink.apache.org
Cc: Kavulya, Soila P <soila.p.kavu...@intel.com>
Subject: Re: [DISCUSS] Flink ML roadmap
It seems like a relatively new project,
this approach is that batch algorithms can be applied on streaming data and online algorithms can be supported as well.

One difference from batch learning is transformers that compute the mean over all the data, but the Flink Streaming API is perfect for that. It would be useful for users to have an extensible toolbox of transformers.
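The kind of transformer just mentioned, one that tracks the mean over all the data seen so far, can be maintained incrementally in a single pass. A minimal sketch in plain Scala (no Flink API assumed; `RunningMean` is a hypothetical name, not from the thread):

```scala
// Streaming-style running mean: one state update per element, so the
// "mean over all the data" never needs a second pass over the stream.
final case class RunningMean(count: Long = 0L, mean: Double = 0.0) {
  def update(x: Double): RunningMean = {
    val n = count + 1
    // Incremental update: mean_n = mean_{n-1} + (x - mean_{n-1}) / n
    RunningMean(n, mean + (x - mean) / n)
  }
}

object RunningMeanDemo {
  def main(args: Array[String]): Unit = {
    val state = Seq(1.0, 2.0, 3.0, 4.0).foldLeft(RunningMean())(_ update _)
    println(state.mean) // 2.5
  }
}
```

The same fold shape maps naturally onto a stateful operator over a keyed stream.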
Another difference is the evaluation: the value evolves over time as the model sees more labeled data.

However, transformation and evaluation again work similarly in both online learning and offline learning.
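An evaluation value that evolves as more labeled data arrives is commonly computed prequentially (test-then-train): each labeled example first scores the current model, then updates it. A hedged sketch in Scala, with a deliberately trivial majority-class "model" standing in for a real learner:

```scala
object PrequentialDemo {
  // Trivial online model: predicts the majority label seen so far.
  final class MajorityModel {
    private var ones = 0
    private var zeros = 0
    def predict: Int = if (ones >= zeros) 1 else 0
    def learn(label: Int): Unit = if (label == 1) ones += 1 else zeros += 1
  }

  // Prequential loop: test on each example before training on it, yielding
  // an accuracy estimate that evolves over the stream.
  def prequentialAccuracy(labels: Seq[Int]): Seq[Double] = {
    val model = new MajorityModel
    var correct = 0
    labels.zipWithIndex.map { case (label, i) =>
      if (model.predict == label) correct += 1 // test first ...
      model.learn(label)                       // ... then train
      correct.toDouble / (i + 1)
    }
  }

  def main(args: Array[String]): Unit =
    println(prequentialAccuracy(Seq(1, 1, 0, 1, 1)))
}
```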
I also liked the discussion in [2] and I think that the competition in the batch learning field is hard and there are already a lot of great projects. I think it is true that in most real world problems it is not necessary to update the model immediately, but there are a lot of
well and I would also be willing to contribute to the future development of the Flink ML library.

Best regards,

Philipp

[1]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-ML-roadmap-td16040.html
@Theodore, thanks for taking the lead in the coordination :)
Let's see what we can do, and then decide what should start out as an independent project and what should live strictly inside Flink.
I agree that something experimental like batch ML on streaming would probably benefit more from an independent repo first.
Sure, having a deadline of March 3rd is fine. I can act as coordinator, trying to guide the discussion toward concrete results.
For committers, it's up to their discretion and time whether they want to participate. I don't think it's necessary to have one, but it would be most welcome.
@Katherin I would
Okay, let's just aim for around the end of next week, but we can take
more time to discuss if there's still a lot of ongoing activity. Keep
the topic hot!
Thanks all for the enthusiasm :)
On 2017-02-23 16:17, Stavros Kontopoulos wrote:
@Gabor 3rd March is ok for me. But maybe giving a bit more time to it like
a week may suit more people.
What do you think all?
I will contribute to the doc.
+100 for having a coordinator + committer.
Thank you all for joining the discussion.
Cheers,
Stavros
On Thu, Feb 23, 2017 at 4:48 PM,
Okay, I've created a skeleton of the design doc for choosing a direction:
https://docs.google.com/document/d/1afQbvZBTV15qF3vobVWUjxQc49h3Ud06MIRhahtJ6dw/edit?usp=sharing
Much of the pros/cons have already been discussed here, so I'll try to put all the arguments mentioned in this thread there.
I have already asked some teams for useful cases, but all of them need time to think.
Something will eventually arise during the analysis.
Maybe we can ask Flink's partners for cases? Data Artisans got the results of a customer survey [1]: better ML support is wanted, so we could ask what exactly is
+100 for a design doc.
Could we also set a roadmap after some time-boxed investigation captured in
that document? We need action.
Looking forward to working on this (whatever that might be) ;) Also, is there any data supporting one direction or the other from a customer perspective? It would help
I agree that it's better to go in one direction first, but I think online and offline with the streaming API can go somewhat in parallel later. We could set a short-term goal, concentrate initially on one direction, and showcase that direction (e.g. in a blog post). But first, we should list the
I'm not sure that this is feasible; doing everything at the same time could mean doing nothing.
I'm just afraid that the words "we will work on streaming, not on batching; we have no committer time for this" mean that yes, we started work on FLINK-1730, but nobody will commit this work in the end, as it
Hello all,
@Gabor, we have discussed the idea of using the streaming API to write all of our ML algorithms offline with a couple of people, and I think it might be possible and is generally worth a shot. The approach we would take would be close to Vowpal Wabbit: not exactly "online", but rather
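A middle ground of that kind typically means updating the model once per mini-batch of the stream rather than once per element. A sketch of the idea in plain Scala (linear model with squared loss; all names and constants here are illustrative, not from the thread):

```scala
object MiniBatchSgdDemo {
  // Mini-batch SGD over a stream of (features, label) pairs: the gradient is
  // accumulated over a fixed-size batch, and the model is updated per batch.
  def train(stream: Iterator[(Array[Double], Double)],
            dim: Int, batchSize: Int, lr: Double): Array[Double] = {
    val w = Array.fill(dim)(0.0)
    stream.grouped(batchSize).foreach { batch =>
      // Accumulate the squared-error gradient over the batch.
      val grad = Array.fill(dim)(0.0)
      batch.foreach { case (x, y) =>
        val err = x.zip(w).map { case (xi, wi) => xi * wi }.sum - y
        for (i <- 0 until dim) grad(i) += err * x(i)
      }
      // One model update per batch, not per element.
      for (i <- 0 until dim) w(i) -= lr * grad(i) / batch.size
    }
    w
  }

  def main(args: Array[String]): Unit = {
    // Noise-free data from y = 2 * x: the learned weight should approach 2.
    val data = (1 to 200).iterator.map { i =>
      val x = (i % 10).toDouble / 10.0
      (Array(x), 2.0 * x)
    }
    println(train(data, dim = 1, batchSize = 10, lr = 0.5).toSeq)
  }
}
```

The batch size then becomes the knob between fully online (size 1) and fully batch (size = whole dataset) behaviour.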
d to speed up the design of some features (e.g. side inputs [3]).

I really hope we can define a new roadmap by which we can finally push the topic forward. I will do my best to help with this.

Sincerely,
he Data Set
https://issues.apache.org/jira/browse/FLINK-1730
[2] Only send data to each taskmanager once for broadcasts
https://cwiki.apache.org/confluence/display/FLINK/FLIP-5%3A+Only+send+data+to+each+taskmanager+once+for+broadcasts
[3] Side inputs - Evolving or static Filter/Enriching
https://docs.google.com/document/d/1hIgxi2Zchww_5fWUHLoYiXwSBXjv-M5eOv-MKQYN3m4/edit#
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Add-Side-Input-Broadcast-Set-For-Streaming-API-td11529.html

--
View this message in context: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-ML-roadmap-tp16040p16064.html
Sent from the Apache Flink Mailing List archive. mailing list archive at Nabble.com.
I think Flink ML could be a success. Many use cases out there could benefit from such algorithms, especially online ones.
I agree that examples should be created showing how it could be used.
I was not aware of the project restructuring issues. GPU support is really important nowadays, but it is still not the
Hello guys,
My two cents:
All Flink presentations, articles, etc. present Flink as an engine for ETL and data ingestion; CEP at most.
If you visit http://flink.apache.org/usecases.html, you'll see there are no explicit ML or graph use cases there.
It's also stated that Flink is suitable when "Data
Hello guys,
Maybe we will be able to focus our forces on some end-to-end scenario or showcase for Flink as an engine that also supports ML, and in this way make the roadmap concrete?
This means we can take some real-life/production problem, like fraud detection in some area, and try to solve this problem
Hello all,
thank you for opening this discussion Stavros, note that it's almost
exactly 1 year since I last opened such a topic (linked by Gabor) and the
comments there are still relevant.
I think Gabor described the current state quite well, development in the
libraries is hard without
Hi Stavros,
Thanks for bringing this up.
There have been past [1] and recent [2, 3] discussions about the Flink
libraries, because there are some stalling PRs and overloaded
committers. (Actually, Till is the only committer shepherding both the CEP and ML libraries, and AFAIK he has a ton
(Resending with the appropriate topic)
Hi,
I would like to start a discussion about next steps for Flink ML.
Currently there is a lot of work going on, but it needs a push forward.
Some topics to discuss:
a) How upcoming features should be planned and aligned with Flink
releases.
b) Priorities