Re: [DISCUSS] Support Local Aggregation in Flink

2019-06-12 Thread vino yang
Hi all, Thanks for all the feedback and comments. Based on @Aljoscha Krettek 's suggestion, I have created a new FLIP: "FLIP-44: Support Local Aggregation in Flink" and started a new mail thread for it in the dev mailing list. So any further feedback and discussion can be moved to the new threa

Re: [DISCUSS] Support Local Aggregation in Flink

2019-06-11 Thread vino yang
Hi Aljoscha, I am happy to create a FLIP and have a voting process for this feature. I have already sent a mail to apply for the wiki permissions. Once I get the permission, will start the next step. When it is ready, I will ping you again. Best, Vino Aljoscha Krettek 于2019年6月11日周二 下午10:57写道:

Re: [DISCUSS] Support Local Aggregation in Flink

2019-06-11 Thread Aljoscha Krettek
Hi, I think this proposed change is big enough to warrant a FLIP [1], which should have a voting process as described in that link before the FLIP is accepted. I’m writing this because such a bigger change has the possibility of languishing for a long time due to lack of PMC/committer bandwidth

Re: [DISCUSS] Support Local Aggregation in Flink

2019-06-09 Thread vino yang
Hi all, Thanks for all the feedback and comments. Since the thread of this feature has been presented about one week in the dev mailing list and has got much support from the community, I have created a new JIRA feature issue[1] to track it and I will split subtasks soon. We can move further dis

Re: [DISCUSS] Support Local Aggregation in Flink

2019-06-07 Thread mayo zhang
+ 1 for this feature which is great helpful in product situations and look forward to see it as soon as possible. > 在 2019年6月6日,下午4:58,qianjin Xu 写道: > >> >> hi > > +1 nice work > > best > > forwardxu

Re: Re: [DISCUSS] Support Local Aggregation in Flink

2019-06-06 Thread qianjin Xu
> > hi +1 nice work best forwardxu

Re: Re: [DISCUSS] Support Local Aggregation in Flink

2019-06-06 Thread qianjin Xu
hi +1,nice work best forwardxu boshu Zheng 于2019年6月6日周四 下午4:30写道: > Hi, > > > +1 from my side. Looking forward to this must-have feature :) > > > Best, > boshu > > At 2019-06-05 16:33:13, "vino yang" wrote: > >Hi Aljoscha, > > > >What do you think about this feature and design document? > >

Re:Re: [DISCUSS] Support Local Aggregation in Flink

2019-06-06 Thread boshu Zheng
Hi, +1 from my side. Looking forward to this must-have feature :) Best, boshu At 2019-06-05 16:33:13, "vino yang" wrote: >Hi Aljoscha, > >What do you think about this feature and design document? > >Best, >Vino > >vino yang 于2019年6月5日周三 下午4:18写道: > >> Hi Dian, >> >> I still think your imp

Re: [DISCUSS] Support Local Aggregation in Flink

2019-06-05 Thread vino yang
Hi Aljoscha, What do you think about this feature and design document? Best, Vino vino yang 于2019年6月5日周三 下午4:18写道: > Hi Dian, > > I still think your implementation is similar to the window operator, you > mentioned the scalable trigger mechanism, the window API also can customize > trigger. >

Re: [DISCUSS] Support Local Aggregation in Flink

2019-06-05 Thread vino yang
Hi Dian, I still think your implementation is similar to the window operator, you mentioned the scalable trigger mechanism, the window API also can customize trigger. Moreover, IMO, the design should guarantee a deterministic semantics, I think based on memory availability is a non-deterministic

Re: [DISCUSS] Support Local Aggregation in Flink

2019-06-04 Thread Dian Fu
Hi Vino, Thanks a lot for your reply. > 1) When, Why and How to judge the memory is exhausted? My point here is that the local aggregate operator can buffer the inputs in memory and send out the results AT ANY TIME. i.e. element count or the time interval reached a pre-configured value, the me

Re: [DISCUSS] Support Local Aggregation in Flink

2019-06-04 Thread Biao Liu
Hi Vino, +1 for this feature. It's useful for data skew. And it could also reduce shuffled datum. I have some concerns about the API part. From my side, this feature should be more like an improvement. I'm afraid the proposal is an overkill about the API part. Many other systems support pre-aggre

Re: [DISCUSS] Support Local Aggregation in Flink

2019-06-04 Thread vino yang
Hi Litree, >From an implementation level, the localKeyBy API returns a general KeyedStream, you can call all the APIs which KeyedStream provides, we did not restrict its usage, although we can do this (for example returns a new stream object named LocalKeyedStream). However, to achieve the goal o

Re: [DISCUSS] Support Local Aggregation in Flink

2019-06-04 Thread vino yang
Hi Dian, The different opinion is fine for me, If there is a better solution or there are obvious deficiencies in our design, we are very happy to accept and improve it. I agree with you that customized local aggregate operator is more scalable in the way of the trigger mechanism. However, I have

Re: [DISCUSS] Support Local Aggregation in Flink

2019-06-04 Thread litree
Hi Vino, I have read your design,something I want to know is the usage of these new APIs.It looks like when I use localByKey,i must then use a window operator to return a datastream,and then use keyby and another window operator to get the final result? thanks, Litree On 06/04/2019 17:22,

Re: [DISCUSS] Support Local Aggregation in Flink

2019-06-04 Thread Dian Fu
Hi Vino, It may seem similar to window operator but there are also a few key differences. For example, the local aggregate operator can send out the results at any time and the window operator can only send out the results at the end of window (without early fire). This means that the local aggreg

Re: [DISCUSS] Support Local Aggregation in Flink

2019-06-04 Thread vino yang
Hi Dian, Thanks for your reply. I know what you mean. However, if you think deeply, you will find your implementation need to provide an operator which looks like a window operator. You need to use state and receive aggregation function and specify the trigger time. It looks like a lightweight wi

Re: [DISCUSS] Support Local Aggregation in Flink

2019-06-04 Thread Piotr Nowojski
Hi Vino, > So if users want to use local aggregation, they should call the window API > to build a local window that means users should (or say "can") specify the > window length and other information based on their needs. It sounds ok for me. It would have to be run against some API guys from th

Re: [DISCUSS] Support Local Aggregation in Flink

2019-06-04 Thread Dian Fu
Hi Vino, Thanks a lot for starting this discussion. +1 to this feature as I think it will be very useful. Regarding to using window to buffer the input elements, personally I don't think it's a good solution for the following reasons: 1) As we know that WindowOperator will store the accumulated

Re: [DISCUSS] Support Local Aggregation in Flink

2019-06-03 Thread vino yang
Hi Ken, Thanks for your reply. As I said before, we try to reuse Flink's state concept (fault tolerance and guarantee "Exactly-Once" semantics). So we did not consider cache. In addition, if we use Flink's state, the OOM related issue is not a key problem we need to consider. Best, Vino Ken Kr

Re: [DISCUSS] Support Local Aggregation in Flink

2019-06-03 Thread Ken Krugler
Hi all, Cascading implemented this “map-side reduce” functionality with an LLR cache. That worked well, as then the skewed keys would always be in the cache. The API let you decide the size of the cache, in terms of number of entries. Having a memory limit would have been better for many of our

Re: [DISCUSS] Support Local Aggregation in Flink

2019-06-03 Thread vino yang
Hi Piotr, The localKeyBy API returns an instance of KeyedStream (we just added an inner flag to identify the local mode) which is Flink has provided before. Users can call all the APIs(especially *window* APIs) which KeyedStream provided. So if users want to use local aggregation, they should cal

Re: [DISCUSS] Support Local Aggregation in Flink

2019-06-03 Thread Piotr Nowojski
Hi, +1 for the idea from my side. I’ve even attempted to add similar feature quite some time ago, but didn’t get enough traction [1]. I’ve read through your document and I couldn’t find it mentioning anywhere, when the pre aggregated result should be emitted down the stream? I think that’s one

Re: [DISCUSS] Support Local Aggregation in Flink

2019-06-03 Thread sf lee
Excited and Big +1 for this feature. SHI Xiaogang 于2019年6月3日周一 下午3:37写道: > Nice feature. > Looking forward to having it in Flink. > > Regards, > Xiaogang > > vino yang 于2019年6月3日周一 下午3:31写道: > > > Hi all, > > > > As we mentioned in some conference, such as Flink Forward SF 2019 and > QCon > >

Re: [DISCUSS] Support Local Aggregation in Flink

2019-06-03 Thread SHI Xiaogang
Nice feature. Looking forward to having it in Flink. Regards, Xiaogang vino yang 于2019年6月3日周一 下午3:31写道: > Hi all, > > As we mentioned in some conference, such as Flink Forward SF 2019 and QCon > Beijing 2019, our team has implemented "Local aggregation" in our inner > Flink fork. This feature c

[DISCUSS] Support Local Aggregation in Flink

2019-06-03 Thread vino yang
Hi all, As we mentioned in some conference, such as Flink Forward SF 2019 and QCon Beijing 2019, our team has implemented "Local aggregation" in our inner Flink fork. This feature can effectively alleviate data skew. Currently, keyed streams are widely used to perform aggregating operations (e.g.