Hmm, no. I need the triggers on the window for two reasons:

  1. Say I get ~6GB of data for each of those hourly windows. If I let the
window fire only once the watermark passes the end of the window, I would
need to hold that entire 6GB in memory. If I instead let early firings
happen often, and have the stateful DoFn emit output whenever it has
buffered 100MB of data, the memory requirement comes down significantly.

  2. The window size could be larger than an hour, maybe a day. Early
firings would let 100MB pieces of the data be written and picked up by
downstream systems with lower latency, instead of waiting for everything
to arrive.
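
To make the first point concrete, here is a rough plain-Python sketch of
the buffering logic I have in mind for the stateful DoFn. It is not Beam
code: batch_by_bytes and max_batch_bytes are made-up names for
illustration, and in a real pipeline the buffer would presumably live in
Beam state, with the final flush driven by a timer or the window closing.

```python
from typing import Iterable, Iterator, List


def batch_by_bytes(
    elements: Iterable[bytes], max_batch_bytes: int
) -> Iterator[List[bytes]]:
    """Group elements into batches, flushing once a byte threshold is hit.

    Hypothetical helper: it only illustrates the size-based flush rule.
    """
    buffer: List[bytes] = []
    buffered_bytes = 0
    for element in elements:
        buffer.append(element)
        buffered_bytes += len(element)
        # Flush as soon as the buffered payload reaches the threshold,
        # so at most about one batch is held at a time.
        if buffered_bytes >= max_batch_bytes:
            yield buffer
            buffer = []
            buffered_bytes = 0
    # Emit whatever is left, analogous to a final flush at window close.
    if buffer:
        yield buffer


if __name__ == "__main__":
    # Seven 40-byte records with a 100-byte threshold (stand-ins for the
    # 100MB case) are grouped by accumulated size, not by count.
    records = [b"x" * 40 for _ in range(7)]
    for batch in batch_by_bytes(records, max_batch_bytes=100):
        print(len(batch), sum(len(r) for r in batch))
```

The point is that the amount held at any moment is bounded by roughly one
batch (100MB in my case) rather than by the whole window.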

AFAIK, GroupIntoBatches doesn't support batching by size (in bytes).
