Hi there,

You should be able to use a regular time-based window(), and emit the 
HyperLogLog binary data as your result, which then would get merged in your 
custom function (which you set a parallelism of 1 on).

Note that if you are generating unique counts per non-overlapping time window, 
you’ll need to keep N HLL structures in each operator.

— Ken


> On Jan 9, 2019, at 10:26 AM, CPC <acha...@gmail.com> wrote:
> 
> Hi Stefan,
> 
> Could i use "Reinterpreting a pre-partitioned data stream as keyed stream" 
> feature for this? 
> 
> On Wed, 9 Jan 2019 at 17:50, Stefan Richter <s.rich...@da-platform.com 
> <mailto:s.rich...@da-platform.com>> wrote:
> Hi,
> 
> I think your expectation about windowAll is wrong, from the method 
> documentation: “Note: This operation is inherently non-parallel since all 
> elements have to pass through the same operator instance” and I also cannot 
> think of a way in which the windowing API would support your use case without 
> a shuffle. You could probably build the functionality by hand through, but I 
> guess this is not quite what you want.
> 
> Best,
> Stefan
> 
> > On 9. Jan 2019, at 13:43, CPC <acha...@gmail.com 
> > <mailto:acha...@gmail.com>> wrote:
> > 
> > Hi all,
> > 
> > In our implementation,we are consuming from kafka and calculating distinct 
> > with hyperloglog. We are using windowAll function with a custom 
> > AggregateFunction but flink runtime shows a little bit unexpected behavior 
> > at runtime. Our sources running with parallelism 4 and i expect add 
> > function to run after source calculate partial results and at the end of 
> > the window i expect it to send 4 hll object to single operator to merge 
> > there(merge function). Instead, it sends all data to single instance and 
> > call add function there. 
> > 
> > Is here any way to make flink behave like this? I mean calculate partial 
> > results after consuming from kafka with paralelism of sources without 
> > shuffling(so some part of the calculation can be calculated in parallel) 
> > and merge those partial results with a merge function?
> > 
> > Thank you in advance...
> 

--------------------------
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
Custom big data solutions & training
Flink, Solr, Hadoop, Cascading & Cassandra

Reply via email to