Hi Stephen,

First of all, yes, all windows computing and emitting their results at the same time can cause pressure on the downstream system.
There are a few ways to achieve this:

* Use a custom window assigner. A window assigner decides into which window a record is assigned. This is the approach you suggested.
* Use a regular window and add an operator that buffers the window results and releases them with a randomized delay.
* Use a ProcessFunction, which allows you to control the timing of computations yourself.

A few months ago, there was a similar discussion on the dev mailing list [1] (I didn't read the thread) started by Rong (in CC). Maybe he can share some ideas / experiences as well.

Cheers,
Fabian

[1] https://lists.apache.org/thread.html/0d1e41302b89378f88693bf4fdb52c23d4b240160b5a10c163d9c46c@%3Cdev.flink.apache.org%3E

On Sun, 10 Feb 2019 at 21:03, Stephen Connolly <stephen.alan.conno...@gmail.com> wrote:

> Looking into the code in TumblingEventTimeWindows:
>
>     @Override
>     public Collection<TimeWindow> assignWindows(Object element, long timestamp, WindowAssignerContext context) {
>         if (timestamp > Long.MIN_VALUE) {
>             // Long.MIN_VALUE is currently assigned when no timestamp is present
>             long start = TimeWindow.getWindowStartWithOffset(timestamp, offset, size);
>             return Collections.singletonList(new TimeWindow(start, start + size));
>         } else {
>             throw new RuntimeException("Record has Long.MIN_VALUE timestamp (= no timestamp marker). " +
>                 "Is the time characteristic set to 'ProcessingTime', or did you forget to call " +
>                 "'DataStream.assignTimestampsAndWatermarks(...)'?");
>         }
>     }
>
> So I think I can just write my own where the offset is derived from hashing the element using my hash function.
>
> Good plan or bad plan?
>
>
> On Sun, 10 Feb 2019 at 19:55, Stephen Connolly <stephen.alan.conno...@gmail.com> wrote:
>
>> I would like to process a stream of data from different customers, producing output say once every 15 minutes. The results will then be loaded into another system for storage and querying.
>>
>> I have been using TumblingEventTimeWindows in my prototype, but I am concerned that all the windows will start and stop at the same time and cause batch-load effects on the back-end data store.
>>
>> What I think I would like is for the windows to have a different start offset for each key (using a hash function that I would supply).
>>
>> Thus, deterministically, key "ca:fe:ba:be" would always start based on an initial offset of 00:07 UTC, while say key "de:ad:be:ef" would always start based on an initial offset of say 00:02 UTC.
>>
>> Is this possible? Or do I just have to find some way of queuing up my writes using back-pressure?
>>
>> Thanks in advance
>>
>> -stephenc
>>
>> P.S. I can trade assistance with Flink for assistance with Maven or Jenkins if my questions are too wearisome!
>
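For what it's worth, the window-start math behind the custom-assigner idea can be sketched without any Flink dependency. This is only an illustration, not Flink code: the class and method names below are made up, but `windowStart` mirrors the formula used by Flink's `TimeWindow.getWindowStartWithOffset`, and `offsetFor` stands in for whatever hash function Stephen would supply to derive a deterministic per-key offset in `[0, size)`:

```java
// Sketch: per-key tumbling-window alignment via a hashed offset.
// All names here are illustrative, not part of the Flink API.
public class KeyedWindowOffsetSketch {

    // Deterministic offset in [0, size) derived from the key.
    // Math.floorMod keeps the result non-negative even if hashCode() is negative.
    static long offsetFor(String key, long size) {
        return Math.floorMod((long) key.hashCode(), size);
    }

    // Start of the tumbling window containing `timestamp`, aligned to `offset`.
    // Equivalent to Flink's TimeWindow.getWindowStartWithOffset for these inputs.
    static long windowStart(long timestamp, long offset, long size) {
        return timestamp - Math.floorMod(timestamp - offset, size);
    }

    public static void main(String[] args) {
        long size = 15 * 60 * 1000L; // 15-minute tumbling windows
        long ts = 1_549_843_200_000L; // some event timestamp (ms)

        // Different keys get different, but stable, window boundaries.
        long offA = offsetFor("ca:fe:ba:be", size);
        long offB = offsetFor("de:ad:be:ef", size);
        System.out.println("ca:fe:ba:be -> offset=" + offA + " start=" + windowStart(ts, offA, size));
        System.out.println("de:ad:be:ef -> offset=" + offB + " start=" + windowStart(ts, offB, size));
    }
}
```

In an actual job, the same two computations would live inside a custom `WindowAssigner`'s `assignWindows`, replacing the fixed `offset` field with the value hashed from the element's key, so each key's windows fire at a different point in the 15-minute cycle.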