Re: Merging N parallel/partitioned WindowedStreams together, one-to-one, into a global window stream
If you are using time windows, you can access the TimeWindow parameter of the WindowFunction.apply() method. The TimeWindow contains the start and end timestamp of a window (as Long) which can act as keys. If you are using count windows, I think you have to use a counter as you described. 2016-10-07 1:06 GMT+02:00 AJ Heller: > Thank you Fabian, I think that solves it. I'll need to rig up some tests > to verify, but it looks good. > > I used a RichMapFunction to assign ids incrementally to windows (mapping > STREAM_OBJECT to Tuple2 using a private long value in > the mapper that increments on every map call). It works, but by any chance > is there a more succinct way to do it? > > On Thu, Oct 6, 2016 at 1:50 PM, Fabian Hueske wrote: > >> Maybe this can be done by assigning the same window id to each of the N >> local windows, and do a >> >> .keyBy(windowId) >> .countWindow(N) >> >> This should create a new global window for each window id and collect all >> N windows. >> >> Best, Fabian >> >> 2016-10-06 22:39 GMT+02:00 AJ Heller : >> >>> The goal is: >>> * to split data, random-uniformly, across N nodes, >>> * window the data identically on each node, >>> * transform the windows locally on each node, and >>> * merge the N parallel windows into a global window stream, such that >>> one window from each parallel process is merged into a "global window" >>> aggregate >>> >>> I've achieved all but the last bullet point, merging one window from >>> each partition into a globally-aggregated window output stream. >>> >>> To be clear, a rolling reduce won't work because it would aggregate over >>> all previous windows in all partitioned streams, and I only need to >>> aggregate over one window from each partition at a time. >>> >>> Similarly for a fold. >>> >>> The closest I have found is ParallelMerge for ConnectedStreams, but I >>> have not found a way to apply it to this problem. Can flink achieve this? >>> If so, I'd greatly appreciate a point in the right direction. >>> >>> Cheers, >>> -aj >>> >> >> >
Re: Merging N parallel/partitioned WindowedStreams together, one-to-one, into a global window stream
Thank you Fabian, I think that solves it. I'll need to rig up some tests to verify, but it looks good. I used a RichMapFunction to assign ids incrementally to windows (mapping STREAM_OBJECT to Tuple2using a private long value in the mapper that increments on every map call). It works, but by any chance is there a more succinct way to do it? On Thu, Oct 6, 2016 at 1:50 PM, Fabian Hueske wrote: > Maybe this can be done by assigning the same window id to each of the N > local windows, and do a > > .keyBy(windowId) > .countWindow(N) > > This should create a new global window for each window id and collect all > N windows. > > Best, Fabian > > 2016-10-06 22:39 GMT+02:00 AJ Heller : > >> The goal is: >> * to split data, random-uniformly, across N nodes, >> * window the data identically on each node, >> * transform the windows locally on each node, and >> * merge the N parallel windows into a global window stream, such that >> one window from each parallel process is merged into a "global window" >> aggregate >> >> I've achieved all but the last bullet point, merging one window from each >> partition into a globally-aggregated window output stream. >> >> To be clear, a rolling reduce won't work because it would aggregate over >> all previous windows in all partitioned streams, and I only need to >> aggregate over one window from each partition at a time. >> >> Similarly for a fold. >> >> The closest I have found is ParallelMerge for ConnectedStreams, but I >> have not found a way to apply it to this problem. Can flink achieve this? >> If so, I'd greatly appreciate a point in the right direction. >> >> Cheers, >> -aj >> > >
Re: Merging N parallel/partitioned WindowedStreams together, one-to-one, into a global window stream
Maybe this can be done by assigning the same window id to each of the N local windows, and do a .keyBy(windowId) .countWindow(N) This should create a new global window for each window id and collect all N windows. Best, Fabian 2016-10-06 22:39 GMT+02:00 AJ Heller: > The goal is: > * to split data, random-uniformly, across N nodes, > * window the data identically on each node, > * transform the windows locally on each node, and > * merge the N parallel windows into a global window stream, such that one > window from each parallel process is merged into a "global window" aggregate > > I've achieved all but the last bullet point, merging one window from each > partition into a globally-aggregated window output stream. > > To be clear, a rolling reduce won't work because it would aggregate over > all previous windows in all partitioned streams, and I only need to > aggregate over one window from each partition at a time. > > Similarly for a fold. > > The closest I have found is ParallelMerge for ConnectedStreams, but I have > not found a way to apply it to this problem. Can flink achieve this? If so, > I'd greatly appreciate a point in the right direction. > > Cheers, > -aj >