Hey,

I am wondering if the following code will result in identical but more
efficient (parallel):

input.keyBy(assignRandomKey).window(Time.seconds(10)
).reduce(count).timeWindowAll(Time.seconds(10)).reduce(count)

Effectively just assigning random keys to do the preaggregation and then do
a window on the pre-aggregated values. I wonder if this actually leads to
correct results or how does it interplay with the time semantics.

Cheers,
Gyula

Stephan Ewen <[email protected]> ezt írta (időpont: 2016. febr. 26., P,
19:10):

> True, at this point it does not pre-aggregate in parallel, that is
> actually a feature on the list but not yet added...
>
> On Fri, Feb 26, 2016 at 7:08 PM, Saiph Kappa <[email protected]>
> wrote:
>
>> That code will not run in parallel right? So, a map-reduce task would
>> yield better performance no?
>>
>>
>>
>> On Fri, Feb 26, 2016 at 6:06 PM, Stephan Ewen <[email protected]> wrote:
>>
>>> Then go for:
>>>
>>> input.timeWindowAll(Time.seconds(10)).fold(0, new
>>> FoldFunction<Tuple2<Integer, Integer>, Integer>() { @Override public
>>> Integer fold(Integer integer, Tuple2<Integer, Integer> o) throws Exception
>>> { return integer + 1; } });
>>>
>>> Try to explore the API a bit, most things should be quite intuitive.
>>> There are also some docs:
>>> https://ci.apache.org/projects/flink/flink-docs-release-0.10/apis/streaming_guide.html#windows-on-unkeyed-data-streams
>>>
>>> On Fri, Feb 26, 2016 at 4:07 PM, Saiph Kappa <[email protected]>
>>> wrote:
>>>
>>>> Why the ".keyBy"? I don't want to count tuples by Key. I simply want to
>>>> count all tuples that are contained in a window.
>>>>
>>>> On Fri, Feb 26, 2016 at 9:14 AM, Till Rohrmann <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi Saiph,
>>>>>
>>>>> you can do it the following way:
>>>>>
>>>>> input.keyBy(0).timeWindow(Time.seconds(10)).fold(0, new 
>>>>> FoldFunction<Tuple2<Integer, Integer>, Integer>() {
>>>>>     @Override
>>>>>     public Integer fold(Integer integer, Tuple2<Integer, Integer> o) 
>>>>> throws Exception {
>>>>>         return integer + 1;
>>>>>     }
>>>>> });
>>>>>
>>>>> Cheers,
>>>>> Till
>>>>> ​
>>>>>
>>>>> On Thu, Feb 25, 2016 at 7:58 PM, Saiph Kappa <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> In Flink Stream what's the best way of counting the number of tuples
>>>>>> within a window of 10 seconds? Using a map-reduce task? Asking because in
>>>>>> spark there is the method rawStream.countByWindow(Seconds(x)).
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>

Reply via email to