Hi Eugene!

I had gone through that link before sending my email here. It does a decent job 
explaining when to use which method and what kind of optimisations we are 
looking at, but it didn’t really answer the question I had, i.e. how to control 
the number of elements of a PCollection in a bundle. Kenneth made it clear that 
this is not under user control, but now I am interested to know how the runner 
decides it.
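
To make the question concrete, here is roughly the batching pattern I have in 
mind, written as a plain-Python sketch rather than real Beam code. The class 
BatchingFn, the helper run_bundle and the max_batch_size parameter are 
hypothetical names of mine; only the start_bundle / process / finish_bundle 
lifecycle is meant to mirror the DoFn methods. Since the bundle size is not 
under user control, the sketch caps the batch itself and flushes whatever is 
left when the bundle finishes.

```python
from typing import Callable, List


class BatchingFn:
    """Plain-Python stand-in for a DoFn (hypothetical, not the Beam API)."""

    def __init__(self, flush: Callable[[List[str]], None], max_batch_size: int = 3):
        self._flush = flush                    # called once per completed batch
        self._max_batch_size = max_batch_size  # user-chosen cap (assumption)
        self._buffer: List[str] = []

    def start_bundle(self) -> None:
        # A new bundle starts with an empty buffer.
        self._buffer = []

    def process(self, element: str) -> None:
        # Buffer the element and flush as soon as the cap is reached.
        self._buffer.append(element)
        if len(self._buffer) >= self._max_batch_size:
            self._emit()

    def finish_bundle(self) -> None:
        # The bundle may end at any size, so flush the remainder.
        if self._buffer:
            self._emit()

    def _emit(self) -> None:
        self._flush(list(self._buffer))
        self._buffer = []


def run_bundle(fn: BatchingFn, elements: List[str]) -> None:
    """Mimics how a runner would drive one bundle through the lifecycle."""
    fn.start_bundle()
    for element in elements:
        fn.process(element)
    fn.finish_bundle()


batches: List[List[str]] = []
fn = BatchingFn(batches.append, max_batch_size=3)
run_bundle(fn, ["a", "b", "c", "d"])  # a bundle of four elements
run_bundle(fn, ["e"])                 # a bundle of one element
print(batches)
```

With this pattern the batch size is bounded above by max_batch_size, but a 
small bundle still produces a small batch, which is why I would like to 
understand how the runner chooses bundle boundaries.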

> On May 21, 2018, at 7:55 PM, Eugene Kirpichov <[email protected]> wrote:
> 
> Hi Abdul,
> Please see 
> https://stackoverflow.com/questions/45985753/what-is-the-difference-between-dofn-setup-and-dofn-startbundle
> - let me know if it answers your question sufficiently.
> 
> On Mon, May 21, 2018 at 7:04 PM Abdul Qadeer <[email protected]> wrote:
> Hi!
> 
> I was trying to understand the behavior of StartBundle and FinishBundle 
> w.r.t. DoFns.
> I have an unbounded data source and I am trying to leverage bundling to 
> achieve batching.
> From the docs of ParDo:
> 
> "when a ParDo transform is executed, the elements of the input PCollection 
> are first divided up into some number of "bundles"."
> 
> I would like to know if bundling is possible for unbounded data in the first 
> place. If it is then how do I control the bundle size i.e. number of elements 
> of a given PCollection in that bundle?
