The definition of batch mode for Dataflow is this: completely compute the
result of one stage of computation before starting the next stage. There is
no way around this, and it is not specific to using state and timers.

If you are working with state, timers, and triggers, and you are hoping for
output before the pipeline has completely finished, then you most likely
want streaming mode. In that case it is probably best to investigate the BQ
read performance issue.
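
To make that concrete, here is a rough, untested sketch of the kind of
pipeline I mean: a stateful DoFn with a processing-time timer, run with
streaming enabled so the timer firings produce output before the job
finishes. The table name, key/value types, flush interval, and class names
are placeholders rather than anything from your job, and DIRECT_READ is just
one avenue to look at for the BQ read performance.

  import org.apache.beam.sdk.Pipeline;
  import org.apache.beam.sdk.coders.VarLongCoder;
  import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
  import org.apache.beam.sdk.options.PipelineOptionsFactory;
  import org.apache.beam.sdk.options.StreamingOptions;
  import org.apache.beam.sdk.state.StateSpec;
  import org.apache.beam.sdk.state.StateSpecs;
  import org.apache.beam.sdk.state.TimeDomain;
  import org.apache.beam.sdk.state.Timer;
  import org.apache.beam.sdk.state.TimerSpec;
  import org.apache.beam.sdk.state.TimerSpecs;
  import org.apache.beam.sdk.state.ValueState;
  import org.apache.beam.sdk.transforms.DoFn;
  import org.apache.beam.sdk.values.KV;
  import org.joda.time.Duration;

  public class StreamingStatefulSketch {

    // Stateful DoFn with a processing-time timer: accumulates a count per key
    // and emits the running total when the timer fires, well before the
    // pipeline has drained.
    static class CountPerKeyFn extends DoFn<KV<String, Long>, Long> {

      @StateId("count")
      private final StateSpec<ValueState<Long>> countSpec =
          StateSpecs.value(VarLongCoder.of());

      @TimerId("flush")
      private final TimerSpec flushSpec = TimerSpecs.timer(TimeDomain.PROCESSING_TIME);

      @ProcessElement
      public void process(
          ProcessContext c,
          @StateId("count") ValueState<Long> count,
          @TimerId("flush") Timer flush) {
        Long current = count.read();
        count.write((current == null ? 0L : current) + c.element().getValue());
        // Fire one minute after the latest element for this key (placeholder interval).
        flush.offset(Duration.standardMinutes(1)).setRelative();
      }

      @OnTimer("flush")
      public void onFlush(OnTimerContext c, @StateId("count") ValueState<Long> count) {
        Long current = count.read();
        if (current != null) {
          c.output(current);
          count.clear();
        }
      }
    }

    public static void main(String[] args) {
      StreamingOptions options =
          PipelineOptionsFactory.fromArgs(args).withValidation().as(StreamingOptions.class);
      // Streaming mode is what lets the timer firings produce output before the
      // whole pipeline has finished; in batch mode the stateful step still runs
      // only after the preceding grouping step has seen all of its input.
      options.setStreaming(true);

      Pipeline p = Pipeline.create(options);
      p.apply("ReadFromBQ",
              BigQueryIO.readTableRows()
                  .from("my-project:my_dataset.my_table") // placeholder table
                  // The BigQuery Storage Read API is one thing to try for the
                  // read-performance question; check whether it helps your job.
                  .withMethod(BigQueryIO.TypedRead.Method.DIRECT_READ));
      // ... map TableRow to KV<String, Long>, then apply ParDo.of(new CountPerKeyFn()) ...
      p.run();
    }
  }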

Kenn

On Wed, Apr 22, 2020 at 4:04 PM Aniruddh Sharma <asharma...@gmail.com>
wrote:

> Hi
>
> I am reading a bounded collection from BQ.
>
> I have to use a Stateful & Timely operation.
>
> a) I am invoking the job in batch mode. The Dataflow runner adds a step
> "BatchStatefulParDoOverrides.GbkBeforeStatefulParDo" which has a partitionBy.
> This partitionBy waits for all the data to arrive and becomes a bottleneck.
> When I read its documentation, it seems it is meant to be added when there
> are no windows.
>
> I tried adding windows and triggering them before the stateful step, but
> everything still flows into this partitionBy step and waits until all the
> data has arrived.
>
> Is there a way to write the code differently (with windows etc.), or to
> give Dataflow a hint, so that it does not add this step?
>
> b) I don't want to call this job in streaming mode. When I call it in
> streaming mode, the Dataflow runner does not add this step, but in
> streaming the BQ read becomes a bottleneck.
>
> So I either have to figure out how to read from BQ faster if I call the job
> in streaming mode, or how to bypass the partitionBy from
> "BatchStatefulParDoOverrides.GbkBeforeStatefulParDo" if I invoke the job in
> batch mode?
>
> Thanks
> Aniruddh
>
