> On Jan 9, 2019, at 3:10 PM, CPC <acha...@gmail.com> wrote:
> 
> Hi Ken,
> 
> By regular time-based windows, do you mean keyed windows?

Correct. Without doing a keyBy() you would have a parallelism of 1.
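To make that concrete, here is a toy simulation in plain Python (not Flink code). The SHA-1-modulo partitioning below is a stand-in assumption for Flink's actual key-group assignment, which works differently in detail, but it shows the difference between everything landing on one windowAll() instance versus keyed slices spread across 4 instances:

```python
import hashlib

def operator_index(key, parallelism):
    # Stable hash partitioning; a stand-in for Flink's key-group assignment.
    h = int.from_bytes(hashlib.sha1(key.encode()).digest()[:4], "big")
    return h % parallelism

values = [f"user-{i}" for i in range(1000)]

# Without keyBy(): every element passes through the one windowAll() instance.
without_key = {0: list(values)}

# With keyBy(): each of 4 window operator instances sees a disjoint slice
# of the unique values, so the per-window work is spread out.
with_key = {}
for v in values:
    with_key.setdefault(operator_index(v, 4), []).append(v)

print(len(without_key), "instance without keyBy,", len(with_key), "with keyBy")
```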

I think you want to key on whatever field you’re counting unique values of, so 
that each window operator instance gets a disjoint slice of the unique values.
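The reason the sliced sketches can be recombined is that HyperLogLog registers merge by element-wise max, so merging per-slice sketches gives exactly the sketch you’d get over the whole stream. A minimal, simplified HLL in plain Python to illustrate the merge property (this is a sketch, not a production implementation and not the Flink or stream-lib API; round-robin routing below stands in for keyed routing):

```python
import hashlib
import math

class HLL:
    """Simplified HyperLogLog; illustrative only."""

    def __init__(self, p=10):
        self.p = p                # 2**p registers
        self.m = 1 << p
        self.regs = [0] * self.m

    def add(self, item):
        h = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)                      # top p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)         # remaining 64-p bits
        rank = (64 - self.p) - rest.bit_length() + 1  # leading-zero run + 1
        self.regs[idx] = max(self.regs[idx], rank)

    def merge(self, other):
        # Element-wise max: this is why partial sketches combine losslessly.
        self.regs = [max(a, b) for a, b in zip(self.regs, other.regs)]

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.regs)
        zeros = self.regs.count(0)
        if raw <= 2.5 * self.m and zeros:             # small-range correction
            return self.m * math.log(self.m / zeros)
        return raw

# Simulate 4 parallel slices, each building a partial sketch...
values = [f"user-{i}" for i in range(1000)]
partials = [HLL() for _ in range(4)]
for i, v in enumerate(values):
    partials[i % 4].add(v)

# ...then merge the 4 partials in the final parallelism-1 operator.
merged = HLL()
for s in partials:
    merged.merge(s)

# A single sketch over the whole stream, for comparison.
direct = HLL()
for v in values:
    direct.add(v)
```

Because the registers of the merged sketch are identical to the registers of the single sketch, the two-phase version loses no accuracy versus sending all raw data to one instance.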

— Ken

> On Wed, Jan 9, 2019, 10:22 PM Ken Krugler <kkrugler_li...@transpac.com> wrote:
> Hi there,
> 
> You should be able to use a regular time-based window(), and emit the 
> HyperLogLog binary data as your result, which would then get merged in your 
> custom function (which you set a parallelism of 1 on).
> 
> Note that if you are generating unique counts per non-overlapping time 
> window, you’ll need to keep N HLL structures in each operator.
> 
> — Ken
> 
> 
>> On Jan 9, 2019, at 10:26 AM, CPC <acha...@gmail.com> wrote:
>> 
>> Hi Stefan,
>> 
>> Could I use the "Reinterpreting a pre-partitioned data stream as keyed stream" 
>> feature for this?
>> 
>> On Wed, 9 Jan 2019 at 17:50, Stefan Richter <s.rich...@da-platform.com> wrote:
>> Hi,
>> 
>> I think your expectation about windowAll is wrong; from the method 
>> documentation: “Note: This operation is inherently non-parallel since all 
>> elements have to pass through the same operator instance.” I also cannot 
>> think of a way in which the windowing API would support your use case 
>> without a shuffle. You could probably build the functionality by hand, 
>> but I guess this is not quite what you want.
>> 
>> Best,
>> Stefan
>> 
>> > On 9. Jan 2019, at 13:43, CPC <acha...@gmail.com> wrote:
>> > 
>> > Hi all,
>> > 
>> > In our implementation, we are consuming from Kafka and calculating distinct 
>> > counts with HyperLogLog. We are using the windowAll function with a custom 
>> > AggregateFunction, but the Flink runtime behaves somewhat unexpectedly. Our 
>> > sources run with parallelism 4, and I expected the add function to run 
>> > after the sources, so that partial results are calculated there; at the 
>> > end of the window I expected it to send 4 HLL objects to a single operator 
>> > to be merged there (the merge function). Instead, it sends all the data to 
>> > a single instance and calls the add function there.
>> > 
>> > Is there any way to make Flink behave like this? I mean, calculate partial 
>> > results after consuming from Kafka with the parallelism of the sources, 
>> > without shuffling (so that part of the calculation runs in parallel), and 
>> > merge those partial results with a merge function?
>> > 
>> > Thank you in advance...
>> 
> 

--------------------------
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
Custom big data solutions & training
Flink, Solr, Hadoop, Cascading & Cassandra
