Hey, I am wondering if the following code will result in identical but more efficient (parallel):
input.keyBy(assignRandomKey).window(Time.seconds(10) ).reduce(count).timeWindowAll(Time.seconds(10)).reduce(count) Effectively just assigning random keys to do the preaggregation and then do a window on the pre-aggregated values. I wonder if this actually leads to correct results or how does it interplay with the time semantics. Cheers, Gyula Stephan Ewen <[email protected]> ezt írta (időpont: 2016. febr. 26., P, 19:10): > True, at this point it does not pre-aggregate in parallel, that is > actually a feature on the list but not yet added... > > On Fri, Feb 26, 2016 at 7:08 PM, Saiph Kappa <[email protected]> > wrote: > >> That code will not run in parallel right? So, a map-reduce task would >> yield better performance no? >> >> >> >> On Fri, Feb 26, 2016 at 6:06 PM, Stephan Ewen <[email protected]> wrote: >> >>> Then go for: >>> >>> input.timeWindowAll(Time.seconds(10)).fold(0, new >>> FoldFunction<Tuple2<Integer, Integer>, Integer>() { @Override public >>> Integer fold(Integer integer, Tuple2<Integer, Integer> o) throws Exception >>> { return integer + 1; } }); >>> >>> Try to explore the API a bit, most things should be quite intuitive. >>> There are also some docs: >>> https://ci.apache.org/projects/flink/flink-docs-release-0.10/apis/streaming_guide.html#windows-on-unkeyed-data-streams >>> >>> On Fri, Feb 26, 2016 at 4:07 PM, Saiph Kappa <[email protected]> >>> wrote: >>> >>>> Why the ".keyBy"? I don't want to count tuples by Key. I simply want to >>>> count all tuples that are contained in a window. >>>> >>>> On Fri, Feb 26, 2016 at 9:14 AM, Till Rohrmann <[email protected]> >>>> wrote: >>>> >>>>> Hi Saiph, >>>>> >>>>> you can do it the following way: >>>>> >>>>> input.keyBy(0).timeWindow(Time.seconds(10)).fold(0, new >>>>> FoldFunction<Tuple2<Integer, Integer>, Integer>() { >>>>> @Override >>>>> public Integer fold(Integer integer, Tuple2<Integer, Integer> o) >>>>> throws Exception { >>>>> return integer + 1; >>>>> } >>>>> }); >>>>> >>>>> Cheers, >>>>> Till >>>>> >>>>> >>>>> On Thu, Feb 25, 2016 at 7:58 PM, Saiph Kappa <[email protected]> >>>>> wrote: >>>>> >>>>>> Hi, >>>>>> >>>>>> In Flink Stream what's the best way of counting the number of tuples >>>>>> within a window of 10 seconds? Using a map-reduce task? Asking because in >>>>>> spark there is the method rawStream.countByWindow(Seconds(x)). >>>>>> >>>>>> Thanks. >>>>>> >>>>> >>>>> >>>> >>> >> >
