Hi, I am reading a bounded collection from BigQuery (BQ).
I have to use a stateful and timely operation.

a) I am invoking the job in batch mode. The Dataflow runner adds a step, "BatchStatefulParDoOverrides.GbkBeforeStatefulParDo", which contains a partitionBy. This partitionBy waits for all the data to arrive and becomes a bottleneck. From its documentation, it seems its purpose is to be added when there are no windows. I tried adding windows and triggering them before the stateful step, but everything still arrives at this partitionBy step and waits until all the data is there. Is there a way to write the code (with windows, for example) or to give Dataflow a hint so that it does not add this step?

b) I don't want to run this job in streaming mode. When I run it in streaming mode, the Dataflow runner does not add this step, but the BQ read becomes a bottleneck.

So either I have to work out how to read from BQ faster when the job runs in streaming mode, or how to bypass this partitionBy from "BatchStatefulParDoOverrides.GbkBeforeStatefulParDo" when the job runs in batch mode.

Thanks,
Aniruddh