In general, sub-second latencies are difficult because a window cannot fire until the watermark has caught up to the end of that window. Getting there in under a second would require the oldest item in flight across all machines to have almost exactly the same timestamp as the newest. Furthermore, most sources cannot provide sub-second watermarks. You mention having an "aggressive" watermark--could you clarify what you mean by this? There are also (generally small) latencies involved in persisting state and then firing timers, and windowing is built on the same mechanisms.
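
To make the watermark point concrete, here is a toy model in plain Python. It is not Beam or Dataflow code, and every name in it (`Worker`, `global_watermark`, `window_can_fire`) is made up for illustration. It assumes the simplest possible rule: the pipeline watermark is the minimum over all workers of their source watermark and their oldest unfinished element, and a window fires once the watermark reaches its end.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Worker:
    # Event timestamps (in seconds) of elements this worker has not finished yet.
    in_flight: List[float]
    # Hypothetical per-worker source watermark: no older data is expected to arrive.
    source_watermark: float


def global_watermark(workers: List[Worker]) -> float:
    """Watermark is held back by the oldest pending element on any worker."""
    return min(min([w.source_watermark] + w.in_flight) for w in workers)


def window_can_fire(window_end: float, workers: List[Worker]) -> bool:
    """A window is complete only once the watermark reaches its end."""
    return global_watermark(workers) >= window_end


# Two workers: one is nearly caught up, the other still holds an element from t=10.2.
workers = [
    Worker(in_flight=[10.95, 10.99], source_watermark=11.0),
    Worker(in_flight=[10.2], source_watermark=11.0),
]

newest = max(ts for w in workers for ts in w.in_flight)
lag = newest - global_watermark(workers)  # spread between newest and oldest in-flight data

print("watermark:", global_watermark(workers))
print("window ending at 10.0 can fire:", window_can_fire(10.0, workers))
print("window ending at 11.0 can fire:", window_can_fire(11.0, workers))
```

In this sketch the single straggler at t=10.2 holds the window ending at 11.0 open, even though every other element and both source watermarks are within a few hundredths of a second of the window end. A real runner adds its own costs on top of this (watermark propagation, state persistence, timer delivery), which the model ignores.
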
I am not aware of latency benchmarks for various runners--in my experience most people are interested in high throughput at O(second-minute) latency. There's nothing in the Beam model that prevents sub-second latencies, but I don't think this has been pushed very far on most runners. Gathering such data would certainly be interesting. (Certainly all-in-process runners would likely take the cake here, but it'd be interesting to see how it degrades as one adds more machines.)

On Fri, Dec 13, 2019 at 10:58 AM Aaron Dixon <[email protected]> wrote:
>
> I've been building pipelines and benchmarking Beam jobs in Dataflow.
>
> Without windowing, latencies look pretty good (reliably sub-second) from
> ingest to sink.
>
> Once I introduce windowing even with aggressive watermark it seems it takes
> at least a second (often multiple seconds) to see a window fire.
>
> Same appears true for setting events timers in the State API; there seems a
> delay between setting a timer at current event time and the timer callback
> fires.
>
> Are there any good/canonical latency benchmarks/reports for different runners?
>
> My next move may be to evaluate Flink in terms of these latencies, but
> curious if I should be trying to get sub-second latencies out of Beam
> (windowing) at all?
>
