Thanks Kenneth! I sort of feel the notions of bundles and windows are a bit confusing in Beam.
For example, here is what the Beam Programming Guide says: "When performing an operation that groups elements in an unbounded PCollection, Beam requires a concept called *windowing* to divide a continuously updating data set into logical windows of finite size. Beam processes each window as a bundle, and processing continues as the data set is generated."

So I would assume "bundles" and "windows" are terms that can be used almost interchangeably. Do you know if there are any good posts or documentation about bundles?

Cheers,
Derek

On Wed, Oct 18, 2017 at 6:59 AM, Kenneth Knowles <k...@google.com> wrote:

> Bundles are decidedly not windows, so let's keep the two terms separate.
> It sounds like you are asking about bundles.
>
> The bundle size is a performance tuning parameter and is chosen
> arbitrarily and dynamically by a runner. The runner chooses based on its
> best effort to amortize @StartBundle/@FinishBundle operations across
> multiple @ProcessElement/@OnTimer calls. Your code must yield correct
> results for any bundling: you should be implementing per-element logic,
> where @StartBundle/@FinishBundle are implementation details.
>
> Kenn
>
> On Tue, Oct 17, 2017 at 5:37 PM, Derek Hao Hu <phoenixin...@gmail.com>
> wrote:
>
>> Hi,
>>
>> Is there any more detailed explanation of how Beam chooses the window
>> size (bundle size) in streaming mode? It seems there is no clear answer in
>> the [Beam Programming Guide](https://beam.apache.org/documentation/programming-guide/)
>> and I can't find how PubsubIO implements this windowing strategy either. :(
>>
>> Could someone kindly provide some pointers? Thanks!
>> --
>> Derek Hao Hu
>>
>> Software Engineer | Snapchat
>> Snap Inc.
>>
>

--
Derek Hao Hu

Software Engineer | Snapchat
Snap Inc.
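To make Kenn's distinction concrete, here is a toy sketch in plain Java. It is NOT the Beam API: `BundleDemo`, `FixedWindowCounter`, and `run` are hypothetical names, and the "runner" is a simple loop that only mimics the @StartBundle/@ProcessElement/@FinishBundle lifecycle. Fixed 1-second windows keyed by event timestamp are an assumption chosen for illustration. Each element's window is derived from its own timestamp (logical grouping), while the bundle size only decides when buffered results get flushed (physical batching), so the per-window counts come out identical for every bundle size.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model, NOT the Beam API: it only mimics the
// @StartBundle / @ProcessElement / @FinishBundle lifecycle.
public class BundleDemo {

    // Hypothetical DoFn-like class. Window assignment is per element
    // (fixed windows by event timestamp); bundles only control when the
    // buffered results are flushed to the sink.
    static class FixedWindowCounter {
        private final long windowSizeMillis;
        private final Map<Long, Integer> sink; // window start -> element count
        private List<Long> pendingWindowStarts;

        FixedWindowCounter(long windowSizeMillis, Map<Long, Integer> sink) {
            this.windowSizeMillis = windowSizeMillis;
            this.sink = sink;
        }

        void startBundle() {
            pendingWindowStarts = new ArrayList<>();
        }

        void processElement(long timestampMillis) {
            // The window depends only on the element's timestamp, never on the bundle.
            long windowStart = timestampMillis - Math.floorMod(timestampMillis, windowSizeMillis);
            pendingWindowStarts.add(windowStart);
        }

        void finishBundle() {
            // Flush once per bundle, amortizing the (pretend) expensive sink write.
            for (long windowStart : pendingWindowStarts) {
                sink.merge(windowStart, 1, Integer::sum);
            }
            pendingWindowStarts = null;
        }
    }

    // Simulated runner: splits the input into bundles of the given size (must be > 0).
    public static Map<Long, Integer> run(List<Long> timestamps, int bundleSize, long windowSizeMillis) {
        Map<Long, Integer> sink = new TreeMap<>();
        FixedWindowCounter fn = new FixedWindowCounter(windowSizeMillis, sink);
        for (int start = 0; start < timestamps.size(); start += bundleSize) {
            fn.startBundle();
            int end = Math.min(start + bundleSize, timestamps.size());
            for (long ts : timestamps.subList(start, end)) {
                fn.processElement(ts);
            }
            fn.finishBundle();
        }
        return sink;
    }

    public static void main(String[] args) {
        List<Long> timestamps = List.of(0L, 400L, 999L, 1000L, 1500L, 2100L, 2999L);
        for (int bundleSize : new int[] {1, 3, 100}) {
            // Each line prints {0=3, 1000=2, 2000=2}
            System.out.println("bundle size " + bundleSize + ": " + run(timestamps, bundleSize, 1000L));
        }
    }
}
```

A real runner may also retry or split bundles, which this loop ignores; the point is only that correct per-element logic makes the output independent of how elements happen to be grouped into bundles.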