Hi all,

I'm running into a scenario where I feel that Dataflow Overrides
(specifically BatchStatefulParDoOverrides.GbkBeforeStatefulParDo) are
unnecessarily causing a batch pipeline to "pause" throughput, since a GBK
must process all the data in a window before it can emit any output.

Is it strictly required that GbkBeforeStatefulParDo run before every
stateful DoFn? If not, what failure modes is GbkBeforeStatefulParDo trying
to protect against, and how can it be bypassed or disabled while still using
DataflowRunner?

Thanks,
Evan
