Re: Merging N parallel/partitioned WindowedStreams together, one-to-one, into a global window stream

2016-10-07 Thread Fabian Hueske
If you are using time windows, you can access the TimeWindow parameter of
the WindowFunction.apply() method.
The TimeWindow contains the start and end timestamp of a window (as Long)
which can act as keys.

If you are using count windows, I think you have to use a counter as you
described.


2016-10-07 1:06 GMT+02:00 AJ Heller :

> Thank you Fabian, I think that solves it. I'll need to rig up some tests
> to verify, but it looks good.
>
> I used a RichMapFunction to assign ids incrementally to windows (mapping
> STREAM_OBJECT to Tuple2 using a private long value in
> the mapper that increments on every map call). It works, but by any chance
> is there a more succinct way to do it?
>
> On Thu, Oct 6, 2016 at 1:50 PM, Fabian Hueske  wrote:
>
>> Maybe this can be done by assigning the same window id to each of the N
>> local windows, and do a
>>
>> .keyBy(windowId)
>> .countWindow(N)
>>
>> This should create a new global window for each window id and collect all
>> N windows.
>>
>> Best, Fabian
>>
>> 2016-10-06 22:39 GMT+02:00 AJ Heller :
>>
>>> The goal is:
>>>  * to split data, random-uniformly, across N nodes,
>>>  * window the data identically on each node,
>>>  * transform the windows locally on each node, and
>>>  * merge the N parallel windows into a global window stream, such that
>>> one window from each parallel process is merged into a "global window"
>>> aggregate
>>>
>>> I've achieved all but the last bullet point, merging one window from
>>> each partition into a globally-aggregated window output stream.
>>>
>>> To be clear, a rolling reduce won't work because it would aggregate over
>>> all previous windows in all partitioned streams, and I only need to
>>> aggregate over one window from each partition at a time.
>>>
>>> Similarly for a fold.
>>>
>>> The closest I have found is ParallelMerge for ConnectedStreams, but I
>>> have not found a way to apply it to this problem. Can flink achieve this?
>>> If so, I'd greatly appreciate a point in the right direction.
>>>
>>> Cheers,
>>> -aj
>>>
>>
>>
>


Re: Merging N parallel/partitioned WindowedStreams together, one-to-one, into a global window stream

2016-10-06 Thread AJ Heller
Thank you Fabian, I think that solves it. I'll need to rig up some tests to
verify, but it looks good.

I used a RichMapFunction to assign ids incrementally to windows (mapping
STREAM_OBJECT to Tuple2 using a private long value in
the mapper that increments on every map call). It works, but by any chance
is there a more succinct way to do it?

On Thu, Oct 6, 2016 at 1:50 PM, Fabian Hueske  wrote:

> Maybe this can be done by assigning the same window id to each of the N
> local windows, and do a
>
> .keyBy(windowId)
> .countWindow(N)
>
> This should create a new global window for each window id and collect all
> N windows.
>
> Best, Fabian
>
> 2016-10-06 22:39 GMT+02:00 AJ Heller :
>
>> The goal is:
>>  * to split data, random-uniformly, across N nodes,
>>  * window the data identically on each node,
>>  * transform the windows locally on each node, and
>>  * merge the N parallel windows into a global window stream, such that
>> one window from each parallel process is merged into a "global window"
>> aggregate
>>
>> I've achieved all but the last bullet point, merging one window from each
>> partition into a globally-aggregated window output stream.
>>
>> To be clear, a rolling reduce won't work because it would aggregate over
>> all previous windows in all partitioned streams, and I only need to
>> aggregate over one window from each partition at a time.
>>
>> Similarly for a fold.
>>
>> The closest I have found is ParallelMerge for ConnectedStreams, but I
>> have not found a way to apply it to this problem. Can flink achieve this?
>> If so, I'd greatly appreciate a point in the right direction.
>>
>> Cheers,
>> -aj
>>
>
>


Re: Merging N parallel/partitioned WindowedStreams together, one-to-one, into a global window stream

2016-10-06 Thread Fabian Hueske
Maybe this can be done by assigning the same window id to each of the N
local windows, and do a

.keyBy(windowId)
.countWindow(N)

This should create a new global window for each window id and collect all N
windows.

Best, Fabian

2016-10-06 22:39 GMT+02:00 AJ Heller :

> The goal is:
>  * to split data, random-uniformly, across N nodes,
>  * window the data identically on each node,
>  * transform the windows locally on each node, and
>  * merge the N parallel windows into a global window stream, such that one
> window from each parallel process is merged into a "global window" aggregate
>
> I've achieved all but the last bullet point, merging one window from each
> partition into a globally-aggregated window output stream.
>
> To be clear, a rolling reduce won't work because it would aggregate over
> all previous windows in all partitioned streams, and I only need to
> aggregate over one window from each partition at a time.
>
> Similarly for a fold.
>
> The closest I have found is ParallelMerge for ConnectedStreams, but I have
> not found a way to apply it to this problem. Can flink achieve this? If so,
> I'd greatly appreciate a point in the right direction.
>
> Cheers,
> -aj
>