Re: [DISCUSS] Flink ML roadmap

2017-03-14 Thread Stephan Ewen
Hi all! Sorry for joining this discussion late (I have already missed some of the deadlines set in this thread). *Here are some thoughts about what we can do immediately* (1) Grow ML community by adding committers with a dedicated. Irrespective of any direction decision, this is a

Re: [DISCUSS] Flink ML roadmap

2017-03-10 Thread Till Rohrmann
, our group is no longer actively developing them. Thanks, Soila From: Theodore Vasiloudis [mailto:theodoros.vasilou...@gmail.com] Sent: Friday, March 3, 2017 4:11 AM To: dev@flink.apache.org Cc: Kavulya, Soila P <soila.p.kavu...@intel.com>

RE: [DISCUSS] Flink ML roadmap

2017-03-08 Thread Kavulya, Soila P
actively developing them. Thanks, Soila From: Theodore Vasiloudis [mailto:theodoros.vasilou...@gmail.com] Sent: Friday, March 3, 2017 4:11 AM To: dev@flink.apache.org Cc: Kavulya, Soila P <soila.p.kavu...@intel.com> Subject: Re: [DISCUSS] Flink ML roadmap It seems like a relatively new project,

Re: [DISCUSS] Flink ML roadmap

2017-03-03 Thread Theodore Vasiloudis
this approach is that batch algorithms can be applied on streaming data as well as online algorithms can be supported. One difference to batch learning are the transformers that

Re: [DISCUSS] Flink ML roadmap

2017-03-03 Thread Stavros Kontopoulos
e the mean over all the data, but the Flink Streaming API is perfect for that. It would be useful for users to have an extensible toolbox of transformers. Another difference is the evaluation o

Re: [DISCUSS] Flink ML roadmap

2017-03-03 Thread Theodore Vasiloudis
value evolves over time when it sees more labeled data. However, the transformation and evaluation works again similar in both online learning and offline learning. I also liked the discussion in [2] and I think that the competit

Re: [DISCUSS] Flink ML roadmap

2017-03-02 Thread Roberto Bentivoglio
also liked the discussion in [2] and I think that the competition in the batch learning field is hard and there are already a lot of great projects. I think it is true that in most real world problems it is not necessary to update the model immediately, but there are a lot of

Re: [DISCUSS] Flink ML roadmap

2017-02-28 Thread Gábor Hermann
well and I would also be willing to contribute to the future development of the Flink ML library. Best regards, Philipp [1] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-ML-roadmap-td16040.html <http://apache-flink-mailing-list-archive.1008284.n3.nabble.

Re: [DISCUSS] Flink ML roadmap

2017-02-27 Thread Philipp Zehnder
to contribute to the future development of the Flink ML library. Best regards, Philipp [1] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-ML-roadmap-td16040.html <http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-ML-roadmap-td16040.h

Re: [DISCUSS] Flink ML roadmap

2017-02-23 Thread Gábor Hermann
@Theodore, thanks for taking the lead in the coordination :) Let's see what we can do, and then decide what should start out as an independent project, or strictly inside Flink. I agree that something experimental like batch ML on streaming would probably benefit more from an independent repo first.

Re: [DISCUSS] Flink ML roadmap

2017-02-23 Thread Theodore Vasiloudis
Sure, a deadline of March 3rd is fine. I can act as coordinator, trying to guide the discussion to concrete results. For committers, it's up to their discretion and time whether they want to participate. I don't think it's necessary to have one, but it would be most welcome. @Katherin I would

Re: [DISCUSS] Flink ML roadmap

2017-02-23 Thread Gábor Hermann
Okay, let's just aim for around the end of next week, but we can take more time to discuss if there's still a lot of ongoing activity. Keep the topic hot! Thanks all for the enthusiasm :) On 2017-02-23 16:17, Stavros Kontopoulos wrote: @Gabor 3rd March is ok for me. But maybe giving a bit

Re: [DISCUSS] Flink ML roadmap

2017-02-23 Thread Stavros Kontopoulos
@Gabor 3rd March is ok for me. But maybe giving it a bit more time, like a week, may suit more people. What do you all think? I will contribute to the doc. +100 for having a coordinator + committer. Thank you all for joining the discussion. Cheers, Stavros On Thu, Feb 23, 2017 at 4:48 PM,

Re: [DISCUSS] Flink ML roadmap

2017-02-23 Thread Gábor Hermann
Okay, I've created a skeleton of the design doc for choosing a direction: https://docs.google.com/document/d/1afQbvZBTV15qF3vobVWUjxQc49h3Ud06MIRhahtJ6dw/edit?usp=sharing Much of the pros/cons have already been discussed here, so I'll try to put there all the arguments mentioned in this thread.

Re: [DISCUSS] Flink ML roadmap

2017-02-23 Thread Katherin Eri
I have already asked some teams for use cases, but all of them need time to think. Something will eventually come out of their analysis. Maybe we can ask partners of Flink for cases? Data Artisans has results from a customer survey [1]: better ML support is wanted, so we could ask what exactly is

Re: [DISCUSS] Flink ML roadmap

2017-02-23 Thread Stavros Kontopoulos
+100 for a design doc. Could we also set a roadmap after some time-boxed investigation captured in that document? We need action. Looking forward to working on this (whatever that might be) ;) Also, is there any data supporting one direction or the other from a customer perspective? It would help

Re: [DISCUSS] Flink ML roadmap

2017-02-23 Thread Gábor Hermann
I agree that it's better to go in one direction first, but I think online and offline with the streaming API can proceed somewhat in parallel later. We could set a short-term goal, concentrate initially on one direction, and showcase that direction (e.g. in a blogpost). But first, we should list the

Re: [DISCUSS] Flink ML roadmap

2017-02-23 Thread Katherin Eri
I'm not sure that this is feasible; doing everything at the same time could mean doing nothing. I'm just afraid that the words "we will work on streaming, not on batch; we have no committer time for this" mean that yes, we started work on FLINK-1730, but nobody will commit this work in the end, as it

Re: [DISCUSS] Flink ML roadmap

2017-02-23 Thread Theodore Vasiloudis
Hello all, @Gabor, we have discussed the idea of using the streaming API to write all of our ML algorithms with a couple of people offline, and I think it might be possible and is generally worth a shot. The approach we would take would be close to Vowpal Wabbit, not exactly "online", but rather

Re: [DISCUSS] Flink ML roadmap

2017-02-21 Thread Katherin Eri
d to speed up the design of some features (e.g. side inputs [3]). I really hope we can define a new roadmap by which we can finally push forward the topic. I will put my best to help in this way. Sincerely,

Re: [DISCUSS] Flink ML roadmap

2017-02-21 Thread Stavros Kontopoulos
he Data Set https://issues.apache.org/jira/browse/FLINK-1730 [2] Only send data to each taskmanager once for broadcasts https://cwiki.apache.org/confluence/display/FLINK/FLIP-5%3A+Only+send+data+to+each+taskmanager+once+for+broadcasts [3] Side inputs -

Re: [DISCUSS] Flink ML roadmap

2017-02-21 Thread Till Rohrmann
i.apache.org/confluence/display/FLINK/FLIP-5%3A+Only+send+data+to+each+taskmanager+once+for+broadcasts [3] Side inputs - Evolving or static Filter/Enriching https://docs.google.com/document/d/1hIgxi2Zchww_5fWUHLoYiXwSBXjv-M5eOv-MKQYN3m4/edit# http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Add-Side-Input-Broadcast-Set-For-Streaming-API-td11529.html -- View this message in context: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-ML-roadmap-tp16040p16064.html Sent from the Apache Flink Mailing List archive. mailing list archive at Nabble.com.

Re: [DISCUSS] Flink ML roadmap

2017-02-21 Thread Theodore Vasiloudis
ogle.com/document/d/1hIgxi2Zchww_5fWUHLoYiXwSBXjv-M5eOv-MKQYN3m4/edit# http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Add-Side-Input-Broadcast-Set-For-Streaming-API-td11529.html -- View this message in context: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-ML-roadmap-tp16040p16064.html Sent from the Apache Flink Mailing List archive. mailing list archive at Nabble.com.

Re: [DISCUSS] Flink ML roadmap

2017-02-20 Thread Andrea Spina
edit# http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Add-Side-Input-Broadcast-Set-For-Streaming-API-td11529.html -- View this message in context: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-ML-roadmap-tp16040p16064.html Sent from the Apache Fli

Re: [DISCUSS] Flink ML roadmap

2017-02-20 Thread Stavros Kontopoulos
I think Flink ML could be a success. Many use cases out there could benefit from such algorithms, especially online ones. I agree examples should be created showing how it could be used. I was not aware of the project re-structuring issues. GPUs are really important nowadays but it is still not the

Re: [DISCUSS] Flink ML roadmap

2017-02-20 Thread Timur Shenkao
Hello guys, My couple of cents. All Flink presentations, articles, etc. articulate that Flink is for ETL, data ingestion. CEP is the maximum. If you visit http://flink.apache.org/usecases.html, you'll see there aren't any explicit ML or graph use cases there. It's also stated that Flink is suitable when "Data

Re: [DISCUSS] Flink ML roadmap

2017-02-20 Thread Katherin Eri
Hello guys, Maybe we could focus our efforts on some end-to-end scenario or showcase for Flink as an engine that also supports ML, and update the roadmap in that way? This means: we can take some real-life production problem, like fraud detection in some area, and try to solve this problem

Re: [DISCUSS] Flink ML roadmap

2017-02-20 Thread Theodore Vasiloudis
Hello all, thank you for opening this discussion Stavros, note that it's almost exactly 1 year since I last opened such a topic (linked by Gabor) and the comments there are still relevant. I think Gabor described the current state quite well, development in the libraries is hard without

Re: [DISCUSS] Flink ML roadmap

2017-02-20 Thread Gábor Hermann
Hi Stavros, Thanks for bringing this up. There have been past [1] and recent [2, 3] discussions about the Flink libraries, because there are some stalling PRs and overloaded committers. (Actually, Till is the only committer shepherding both the CEP and ML libraries, and AFAIK he has a ton

[DISCUSS] Flink ML roadmap

2017-02-20 Thread Stavros Kontopoulos
(Resending with the appropriate topic) Hi, I would like to start a discussion about next steps for Flink ML. Currently there is a lot of work going on, but it needs a push forward. Some topics to discuss: a) How several features should be planned and get aligned with Flink releases. b) Priorities