Re: [DISCUSS] Project build time and possible restructuring

2017-02-22 Thread Gábor Hermann
Hi all, I'm also in favor of splitting, but only in terms of committers. I agree with Theodore that async releases would cause confusion. With time-based releases [1] it should be easy to sync releases. Even if it's possible to add committers to different components, should we do a more

Re: [DISCUSS] Project build time and possible restructuring

2017-02-22 Thread Gábor Hermann
@Stephan: Although I tried to raise some issues about splitting committers, I'm still strongly in favor of some kind of restructuring. We just have to be conscious of the disadvantages. Not splitting the committers could leave the libraries in the same stalling status described by Till.

Re: [DISCUSS] Flink ML roadmap

2017-02-20 Thread Gábor Hermann
Hi Stavros, Thanks for bringing this up. There have been past [1] and recent [2, 3] discussions about the Flink libraries, because there are some stalling PRs and overloaded committers. (Actually, Till is the only committer shepherding both the CEP and ML libraries, and AFAIK he has a ton

Re: [DISCUSS] Flink ML roadmap

2017-02-23 Thread Gábor Hermann
not on batching, we have no committer's time for this, means that yes, we started work on FLINK-1730, but nobody will commit this work in the end, as it already was with this ticket. On 23 Feb 2017 at 14:26, "Gábor Hermann" <m...@gaborhermann.com> wrote: @Theodore: Great t

Re: [DISCUSS] Per-key event time

2017-02-23 Thread Gábor Hermann
Hey all, Let me share some ideas about this. @Paris: The local-only progress tracking indeed seems easier, we do not need to broadcast anything. Implementation-wise it is easier, but performance-wise probably not. If one key can come from multiple sources, there could be a lot more network
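The trade-off sketched above (local-only tracking vs. broadcasting progress when one key can arrive via multiple sources) can be illustrated with a small, hypothetical model. This is plain Python, not Flink API: each key's event time is held back by the slowest channel that can deliver records for it, which is why multi-source keys need progress information from every channel.

```python
from collections import defaultdict

# Illustrative sketch (not Flink API): per-key event-time tracking where a key
# may receive records from multiple source channels. The per-key watermark is
# the minimum over the channel-local watermarks seen for that key.

class PerKeyWatermarkTracker:
    def __init__(self):
        # key -> channel id -> latest watermark reported by that channel
        self._channel_watermarks = defaultdict(dict)

    def update(self, key, channel, watermark):
        """Record a channel-local watermark and return the key's new watermark."""
        self._channel_watermarks[key][channel] = watermark
        return self.current_watermark(key)

    def current_watermark(self, key):
        marks = self._channel_watermarks[key]
        return min(marks.values()) if marks else float("-inf")

tracker = PerKeyWatermarkTracker()
tracker.update("user-1", channel=0, watermark=100)
tracker.update("user-1", channel=1, watermark=50)
assert tracker.current_watermark("user-1") == 50   # held back by the slower channel
tracker.update("user-1", channel=1, watermark=120)
assert tracker.current_watermark("user-1") == 100
```

The sketch makes the network cost visible: every channel that can carry a given key must report progress for that key before its watermark can advance, which is the overhead the local-only approach tries to avoid.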

Re: [DISCUSS] Flink ML roadmap

2017-02-23 Thread Gábor Hermann
. Regards. Theodore On Thu, Feb 23, 2017 at 4:41 PM, Gábor Hermann <m...@gaborhermann.com> wrote: Okay, let's just aim for around the end of next week, but we can take more time to discuss if there's still a lot of ongoing activity. Keep the topic hot! Thanks all for the enthusiasm :) O

Re: [DISCUSS] Flink ML roadmap

2017-02-23 Thread Gábor Hermann
ith committers. @Gabor, could you please start such a shared doc, as you have already proposed several ideas? Thu, 23 Feb 2017, 15:06 Gábor Hermann <m...@gaborhermann.com>: I agree that it's better to go in one direction first, but I think online and offline with streaming API can go somewha

Re: [DISCUSS] Flink ML roadmap

2017-02-23 Thread Gábor Hermann
more time to it, like a week, may suit more people. What do you all think? I will contribute to the doc. +100 for having a coordinator + committer. Thank you all for joining the discussion. Cheers, Stavros On Thu, Feb 23, 2017 at 4:48 PM, Gábor Hermann <m...@gaborhermann.com> wrote: Okay

Re: Using QueryableState inside Flink jobs (and Parameter Server implementation)

2017-02-14 Thread Gábor Hermann
nal service just as you've proposed. [1] https://issues.apache.org/jira/browse/FLINK-5782 Cheers, Gabor On 2017-02-13 04:10, Jinkui Shi wrote: Hi Gábor Hermann, The online parameter server is a good proposal. The PS paper [1] has an early implementation [2], and now it's MXNet [3]. I have some thought

Re: Using QueryableState inside Flink jobs (and Parameter Server implementation)

2017-02-14 Thread Gábor Hermann
iteration approach in a first version. Would that be a feasible starting point for you? – Ufuk On 14 February 2017 at 14:01:21, Gábor Hermann (m...@gaborhermann.com) wrote: Hi Gyula, Jinkui Shi, Thanks for your thoughts! @Gyula: I'll try and explain a bit more detail. The API could be a

Using QueryableState inside Flink jobs (and Parameter Server implementation)

2017-02-10 Thread Gábor Hermann
Hi all, TL;DR: Is it worth implementing a special QueryableState for querying state from another part of a Flink streaming job and aligning it with fault tolerance? I've been thinking about implementing a Parameter Server with/within Flink. A Parameter Server is basically a specialized
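For readers unfamiliar with the pattern, a Parameter Server is essentially a distributed key-value store that training workers pull parameter values from and push incremental updates to. The following is a minimal, hypothetical single-process sketch of that interface; the names and API are illustrative, not from Flink or the proposal.

```python
# Minimal sketch of the parameter-server pattern: workers pull current
# parameter values by key and push additive (gradient-style) updates.
# Hypothetical names; a real PS would be distributed and fault-tolerant.

class ParameterServer:
    def __init__(self):
        self._params = {}

    def pull(self, key, default=0.0):
        """Return the current value of a parameter (workers read this)."""
        return self._params.get(key, default)

    def push(self, key, delta):
        """Apply an additive update, as in asynchronous SGD-style training."""
        self._params[key] = self._params.get(key, 0.0) + delta

ps = ParameterServer()
ps.push("weight:0", 0.5)
ps.push("weight:0", -0.2)
assert abs(ps.pull("weight:0") - 0.3) < 1e-9
```

The QueryableState idea in the thread maps naturally onto this: `pull` corresponds to querying state held by another part of the job, and the open question is how `push`-style updates align with Flink's checkpointing.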

Re: [DISCUSS] Flink ML roadmap

2017-02-28 Thread Gábor Hermann
5:48, Gábor Hermann <m...@gaborhermann.com> wrote: Okay, I've created a skeleton of the design doc for choosing a direction: https://docs.google.com/document/d/1afQbvZBTV15qF3vobVWUjxQc49h3Ud06MIRhahtJ6dw/edit?usp=sharing Much of the pros/cons have already been discussed here, so I'll try

ML contributions

2016-09-12 Thread Gábor Hermann
Hey all, We are planning to contribute some algorithms and improvements to Flink ML at SZTAKI. I have already opened a JIRA for an implicit feedback ALS, but probably more will come soon. We are implementing

Re: Flink ML recommender system API

2016-10-04 Thread Gábor Hermann
Thank you both for your detailed replies. I think we all agree on extending the evaluation framework to handle recommendation models, and choosing the scalable form of ranking, so we'll do it that way. For now we will work upon Theodore's PR. Thanks for giving me the reasons behind the

Re: Flink ML recommender system API

2016-11-10 Thread Gábor Hermann
Hi all, We have managed to fit the ranking recommendation evaluation into the evaluation framework proposed by Theodore (FLINK-2157). There's one main problem that remains: we have two different predictor traits (Predictor, RankingPredictor) without a common superclass, and that might be

Re: Flink ML recommender system API

2016-11-10 Thread Gábor Hermann
re On Thu, Nov 10, 2016 at 12:46 PM, Gábor Hermann <m...@gaborhermann.com> wrote: Hi all, We have managed to fit the ranking recommendation evaluation into the evaluation framework proposed by Theodore (FLINK-2157). There's one main problem that remains: we have two different predictor t

Re: SVMITSuite Testing

2016-10-26 Thread Gábor Hermann
Hi Jesse, Have you tried running the test in an IDE (e.g. IntelliJ IDEA)? AFAIK they have support for ScalaTest. When running a Maven build, it seems to skip integration tests (ones marked with "IT") intentionally. I assume it would take a long time to run those tests. You can run them by

Re: Machine Learning on Flink - Next steps

2017-03-17 Thread Gábor Hermann
Hi Chen, Thanks for the input! :) There is already a project [1] for using TensorFlow models in Flink, and Theodore has suggested to contact the author, Eron Wright for the model serving direction. [1] http://sf.flink-forward.org/kb_sessions/introducing-flink-tensorflow/ Cheers, Gabor On

Re: [DISCUSS] FLIP-17 Side Inputs

2017-03-17 Thread Gábor Hermann
to be on vacation for 1.5 weeks starting next week Wednesday and after that we have Flink Forward. Best, Aljoscha On Thu, Mar 16, 2017, at 23:52, Gábor Hermann wrote: Regarding the CoFlatMap workaround, - For keyed streams, do you suggest that having a per-key buffer stored as keyed state would have a large

Re: Machine Learning on Flink - Next steps

2017-03-20 Thread Gábor Hermann
. I agree about the TensorFlow integration; it seems to be important from what I hear. Should we sign up somewhere for the working groups (gdocs)? I would like to start helping with the model serving feature. Best Regards, Stavros On Fri, Mar 17, 2017 at 10:34 PM, Gábor Hermann <m...@gaborher

Re: Machine Learning on Flink - Next steps

2017-03-16 Thread Gábor Hermann
updated. As a general remark for the discussions on the google doc. I think it would be great if we could at least mirror the discussions happening in the google doc back on the mailing list or ideally conduct the discussions directly on the mailing list. That's at least what the ASF enco

Re: [DISCUSS] FLIP-17 Side Inputs

2017-03-16 Thread Gábor Hermann
Regarding the CoFlatMap workaround, - For keyed streams, do you suggest that having a per-key buffer stored as keyed state would have a large memory overhead? That must be true, although a workaround could be partitioning the data and using a non-keyed stream. Of course that seems hacky, as we
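The CoFlatMap workaround discussed above can be sketched as follows. This is plain Python, not Flink's CoFlatMapFunction API: main-stream records for a key are buffered until the side input for that key arrives, then flushed. The per-key buffers are exactly where the memory-overhead concern comes from.

```python
from collections import defaultdict

# Illustrative sketch (not Flink API) of the CoFlatMap side-input workaround:
# buffer main-stream elements per key until that key's side input is available.

class BufferingCoFlatMap:
    def __init__(self):
        self._side = {}                    # key -> side-input value
        self._buffer = defaultdict(list)   # key -> pending main-stream records

    def flat_map1(self, key, record):
        """Main stream: emit joined records, or buffer until side input arrives."""
        if key in self._side:
            return [(record, self._side[key])]
        self._buffer[key].append(record)
        return []

    def flat_map2(self, key, side_value):
        """Side-input stream: store the value and flush any buffered records."""
        self._side[key] = side_value
        pending = self._buffer.pop(key, [])
        return [(r, side_value) for r in pending]

op = BufferingCoFlatMap()
assert op.flat_map1("k", "a") == []            # side input not yet seen: buffered
assert op.flat_map2("k", "S") == [("a", "S")]  # side input arrives: buffer flushed
assert op.flat_map1("k", "b") == [("b", "S")]  # later records join immediately
```

In a keyed Flink stream the buffer would live in keyed state, so its footprint grows with the number of keys whose side input is still missing, which is the overhead the proposal tries to avoid.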

Re: Flink streaming job with iterations gets stuck waiting for network buffers

2017-04-03 Thread Gábor Hermann
Hi Andrey, As Paris has explained, this is a known issue and there are ongoing efforts to solve it. I can suggest a workaround: limit the number of messages sent into the iteration manually. You can do this with e.g. a Map operator that limits records per second and simply sends what
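The suggested workaround can be sketched as a map-style operator that blocks until enough time has passed between records. This is a plain Python illustration of the idea, not Flink operator code; the class and parameter names are made up, and the clock/sleep functions are injectable so the behavior can be checked in simulated time.

```python
import time

# Sketch of the workaround: cap the rate of records forwarded into the
# iteration by sleeping between emissions (blocking creates back-pressure).

class ThrottlingMap:
    def __init__(self, max_per_second, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self._clock = clock        # injectable for testing
        self._sleep = sleep
        self._last_emit = None

    def map(self, record):
        now = self._clock()
        if self._last_emit is not None:
            wait = self.min_interval - (now - self._last_emit)
            if wait > 0:
                self._sleep(wait)  # block the caller until the next slot
                now = self._clock()
        self._last_emit = now
        return record              # forward the record unchanged

# Simulated-time demo: at most 2 records per second.
sim_time = [0.0]
throttle = ThrottlingMap(2, clock=lambda: sim_time[0],
                         sleep=lambda s: sim_time.__setitem__(0, sim_time[0] + s))
emit_times = []
for r in range(3):
    throttle.map(r)
    emit_times.append(sim_time[0])
assert emit_times == [0.0, 0.5, 1.0]  # records spaced 0.5s apart in simulated time
```

In a real job the operator would sit just before the iteration head, so blocking in `map` naturally back-pressures the upstream operators instead of flooding the feedback channel's network buffers.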

Re: Machine Learning on Flink - Next steps

2017-03-10 Thread Gábor Hermann
Hey all, Sorry for the slightly late response. I'd like to work on - Offline learning with Streaming API - Low-latency prediction serving I would drop the batch API ML because of past experience with lack of support, and online learning because of the lack of use cases. I completely agree with Kate

Re: [DISCUSS] FLIP-17 Side Inputs

2017-03-10 Thread Gábor Hermann
Hi all, Thanks Aljoscha for going forward with the side inputs and for the nice proposal! I'm also in favor of the implementation with N-ary input (3.) for the reasons Ventura explained. I'm strongly against managing side inputs at StreamTask level (2.), as it would create another