Thank you Robert, this is really helpful context and confirmation.

> You mention having an "aggressive" watermark--could you clarify what you mean by this?

I'm using KafkaIO, and I customize the watermark to follow my message event timestamps as they arrive (instead of having it lag behind the stream to allow for even slightly out-of-order events). That is what I mean by "aggressive". I also have a custom window that closes precisely on certain event markers in the stream, and I need these windows to trigger as fast as possible. (I was hoping for sub-second here, but I suspected that was not the focus of runners, as you're saying, so I think I can live without it.)

> Certainly all-in-process runners would likely take the cake here

Besides DirectRunner, which as I understand it is primarily for testing, what other in-process runners are there?

On Fri, Dec 13, 2019 at 1:45 PM Robert Bradshaw <[email protected]> wrote:

> In general, sub-second latencies are difficult because one must wait
> for the watermark to catch up before actually firing. This would
> require the oldest item in flight across all machines to be almost
> exactly the same timestamp as the newest. Furthermore most sources
> cannot provide sub-second watermarks. You mention having an
> "aggressive" watermark--could you clarify what you mean by this? There
> are also (generally small) latencies involved with persisting state
> and then firing timers, and windowing is built on the same mechanisms.
>
> I am not aware of latency benchmarks for various runners--in my
> experience most people are interested in high throughput at
> O(second-minute) latency. There's nothing in the Beam model that
> prevents sub-second latencies, but I don't think this has been pushed
> very far on most runners. Gathering such data would certainly be
> interesting. (Certainly all-in-process runners would likely take the
> cake here, but it'd be interesting to see how it degrades as one adds
> more machines.)
>
> On Fri, Dec 13, 2019 at 10:58 AM Aaron Dixon <[email protected]> wrote:
> >
> > I've been building pipelines and benchmarking Beam jobs in Dataflow.
> >
> > Without windowing, latencies look pretty good (reliably sub-second) from
> > ingest to sink.
> >
> > Once I introduce windowing, even with an aggressive watermark, it seems it
> > takes at least a second (often multiple seconds) to see a window fire.
> >
> > The same appears true for setting event-time timers in the State API;
> > there seems to be a delay between setting a timer at the current event
> > time and the timer callback firing.
> >
> > Are there any good/canonical latency benchmarks/reports for different
> > runners?
> >
> > My next move may be to evaluate Flink in terms of these latencies, but
> > I'm curious whether I should be trying to get sub-second latencies out
> > of Beam (windowing) at all.
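
P.S. To make "aggressive" concrete, here is a minimal sketch in plain Java. It is a model of the idea, not Beam or KafkaIO code: the names `WatermarkSketch`, `MaxTimestampWatermark`, and `canFire`, and the timestamps and the 2000 ms lag, are all made up for illustration. In a real pipeline I believe the equivalent logic would sit in a custom KafkaIO `TimestampPolicy`, which is not shown here. The sketch only compares a watermark that sits exactly at the newest event timestamp with one that lags behind it, and checks when a window ending at a given timestamp is allowed to fire.

```java
import java.util.List;

// Illustrative model only; none of these types are Beam API.
public class WatermarkSketch {

    /** Watermark held at the newest event timestamp seen, minus a fixed lag. */
    static final class MaxTimestampWatermark {
        private final long lagMillis;          // 0 = "aggressive", >0 = lagging
        private long maxSeen = Long.MIN_VALUE; // no events observed yet

        MaxTimestampWatermark(long lagMillis) {
            this.lagMillis = lagMillis;
        }

        /** Record an event timestamp; the watermark never moves backwards. */
        void observe(long eventTimestampMillis) {
            maxSeen = Math.max(maxSeen, eventTimestampMillis);
        }

        /** Current watermark, or Long.MIN_VALUE if nothing has been observed. */
        long current() {
            return maxSeen == Long.MIN_VALUE ? Long.MIN_VALUE : maxSeen - lagMillis;
        }
    }

    /** A window may fire once the watermark has reached the window's end. */
    static boolean canFire(long watermarkMillis, long windowEndMillis) {
        return watermarkMillis >= windowEndMillis;
    }

    public static void main(String[] args) {
        List<Long> eventTimestamps = List.of(1_000L, 1_200L, 1_500L);
        long windowEnd = 1_500L; // hypothetical window closing on the last event

        MaxTimestampWatermark aggressive = new MaxTimestampWatermark(0L);
        MaxTimestampWatermark lagging = new MaxTimestampWatermark(2_000L);

        for (long ts : eventTimestamps) {
            aggressive.observe(ts);
            lagging.observe(ts);
        }

        System.out.println("aggressive can fire: " + canFire(aggressive.current(), windowEnd));
        System.out.println("lagging can fire:    " + canFire(lagging.current(), windowEnd));
    }
}
```

With zero lag the window is eligible to fire as soon as the closing event is seen, which is the behavior I'm after; the trade-off is that any event arriving with an older timestamp is already behind the watermark.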
