Hi Kenn Thanks for your guidance, I understand that batch mode waits for previous stage. But the real issue in this particular case is not only this.
Dataflow runner adds a step automatically "BatchStatefulParDoOverrides.GbkBeforeStatefulParDo" which not only waits for previous stage but it waits for a very very very long time. Is there a way to give hint to Dataflow runner not to add this step, as in my case I functionally do not require this step. Thanks for your suggestion, will create another thread to understand BQ options Thanks Aniruddh On 2020/04/23 03:51:31, Kenneth Knowles <k...@apache.org> wrote: > The definition of batch mode for Dataflow is this: completely compute the > result of one stage of computation before starting the next stage. There is > no way around this. It does not have to do with using state and timers. > > If you are working with state & timers & triggers, and you are hoping for > output before the pipeline is completely terminated, then you most likely > want streaming mode. Perhaps it is best to investigate the BQ read > performance issue. > > Kenn > > On Wed, Apr 22, 2020 at 4:04 PM Aniruddh Sharma <asharma...@gmail.com> > wrote: > > > Hi > > > > I am reading a bounded collection from BQ. > > > > I have to use a Stateful & Timely operation. > > > > a) I am invoking job in batch mode. Dataflow runner adds a step > > "BatchStatefulParDoOverrides.GbkBeforeStatefulParDo" which has partitionBy. > > This partitionBy waits for all the data to come and becomes a bottleneck. > > when I read about its documentation it seems its objective it to be added > > when there are no windows. > > > > I tried added windows and triggering them before stateful step, but > > everything comes to this partitionBy step and waits till all data is here. > > > > Is there a way to write code in some way (like window etc) or give > > Dataflow a hint not to add this step in. > > > > b) I dont want to call this job in streaming mode, When I call in > > streaming mode, this Dataflow runner does not add this step, but in > > Streaming BQ read becomes a bottleneck. > > > > So either I have to solve how I read BQ faster if I call job in Streaming > > mode or How I bypass this partitionBy from > > "BatchStatefulParDoOverrides.GbkBeforeStatefulParDo" if I invoke job in > > batch mode ? > > > > Thanks > > Aniruddh > > > > > > > > >