Hi,

I am reading a bounded collection from BQ. 

I have to use a Stateful & Timely operation. 

a) I am invoking the job in batch mode. The Dataflow runner adds a step,
"BatchStatefulParDoOverrides.GbkBeforeStatefulParDo", which contains a
partitionBy. This partitionBy waits for all the data to arrive and becomes a
bottleneck. From its documentation, it seems it is meant to be added when
there are no windows.

I tried adding windows and triggering them before the stateful step, but
everything still arrives at this partitionBy step and waits until all the data
is there.

Is there a way to write the code (for example, with windowing) or to give
Dataflow a hint so that it does not add this step?

b) I don't want to run this job in streaming mode. When I run it in streaming
mode, the Dataflow runner does not add this step, but the BQ read becomes a
bottleneck.

So either I have to work out how to read from BQ faster when the job runs in
streaming mode, or how to bypass this partitionBy from
"BatchStatefulParDoOverrides.GbkBeforeStatefulParDo" when the job runs in
batch mode.

Thanks
Aniruddh


