Thank you Fabian, I think that solves it. I'll need to rig up some tests to verify, but it looks good.
I used a RichMapFunction to assign ids incrementally to windows (mapping STREAM_OBJECT to Tuple2<Long, STREAM_OBJECT> using a private long value in the mapper that increments on every map call). It works, but by any chance is there a more succinct way to do it?

On Thu, Oct 6, 2016 at 1:50 PM, Fabian Hueske <fhue...@gmail.com> wrote:
> Maybe this can be done by assigning the same window id to each of the N
> local windows, and do a
>
> .keyBy(windowId)
> .countWindow(N)
>
> This should create a new global window for each window id and collect all
> N windows.
>
> Best, Fabian
>
> 2016-10-06 22:39 GMT+02:00 AJ Heller <a...@drfloob.com>:
>
>> The goal is:
>> * to split data, random-uniformly, across N nodes,
>> * window the data identically on each node,
>> * transform the windows locally on each node, and
>> * merge the N parallel windows into a global window stream, such that
>> one window from each parallel process is merged into a "global window"
>> aggregate
>>
>> I've achieved all but the last bullet point, merging one window from each
>> partition into a globally-aggregated window output stream.
>>
>> To be clear, a rolling reduce won't work because it would aggregate over
>> all previous windows in all partitioned streams, and I only need to
>> aggregate over one window from each partition at a time.
>>
>> Similarly for a fold.
>>
>> The closest I have found is ParallelMerge for ConnectedStreams, but I
>> have not found a way to apply it to this problem. Can flink achieve this?
>> If so, I'd greatly appreciate a point in the right direction.
>>
>> Cheers,
>> -aj
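For the archive, here is a minimal sketch of the pattern discussed above: a per-partition mapper with a private long counter tags each local window result with an id, and grouping by that id collects one record per partition. This is plain Java so it runs standalone; in real Flink code the `IdAssigner` class would be a `RichMapFunction<STREAM_OBJECT, Tuple2<Long, STREAM_OBJECT>>` and the `HashMap` grouping would be `keyBy(windowId).countWindow(N)`. All class and method names here are illustrative, not Flink API.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WindowIdSketch {

    // Stand-in for the RichMapFunction: a private counter that increments
    // on every map() call, pairing each local window result with an id.
    static class IdAssigner<T> {
        private long nextId = 0;

        SimpleEntry<Long, T> map(T value) {
            return new SimpleEntry<>(nextId++, value);
        }
    }

    // Simulates N=2 parallel partitions each tagging its windows, then the
    // keyBy(windowId) step grouping records by id (a HashMap stands in for
    // Flink's keyed stream). countWindow(N) would fire once an id has N records.
    static Map<Long, List<String>> run() {
        int numWindows = 3;
        int numPartitions = 2;
        List<SimpleEntry<Long, String>> tagged = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) {
            // One assigner per parallel instance, so window k gets id k
            // on every partition.
            IdAssigner<String> assigner = new IdAssigner<>();
            for (int w = 0; w < numWindows; w++) {
                tagged.add(assigner.map("p" + p + "-w" + w));
            }
        }
        Map<Long, List<String>> byId = new HashMap<>();
        for (SimpleEntry<Long, String> e : tagged) {
            byId.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }
        return byId; // each id maps to exactly numPartitions entries
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The key point the sketch demonstrates: because every parallel mapper instance starts its counter at 0, the k-th window of each partition receives the same id k, so grouping by id brings together exactly one window from each partition.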