Thanks Kenneth! I sort of feel the notions of bundles and windows are a bit
confusing in Beam.

For example, here is what the Beam Programming Guide says:

"When performing an operation that groups elements in an unbounded
PCollection, Beam requires a concept called *windowing* to divide a
continuously updating data set into logical windows of finite size. Beam
processes each window as a bundle, and processing continues as the data set
is generated."

So then I would assume "bundles" and "windows" are terms that can be used
almost interchangeably.

Do you know of any good posts or documentation about bundles?

Cheers,

Derek

On Wed, Oct 18, 2017 at 6:59 AM, Kenneth Knowles <k...@google.com> wrote:

> Bundles are decidedly not windows, so let's keep the two terms separate.
> It sounds like you are asking about bundles.
>
> The bundle size is a performance tuning parameter and is chosen
> arbitrarily and dynamically by a runner. The runner chooses
> based on its best effort to amortize @StartBundle/@FinishBundle operations
> across multiple @ProcessElement/@OnTimer calls. Your code must yield
> correct results for any bundling - you should be implementing
> per-element logic, where @StartBundle/@FinishBundle are implementation
> details.
>
> Kenn
>
> On Tue, Oct 17, 2017 at 5:37 PM, Derek Hao Hu <phoenixin...@gmail.com>
> wrote:
>
>> Hi,
>>
>> Is there any more detailed explanation on how Beam chooses the window
>> size (bundle size) in streaming mode? It seems there is no clear answer in
>> the [Beam Programming Guide](https://beam.apache.org/documentation/programming-guide/)
>> and I can't find how PubsubIO implements this windowing strategy
>> either. :(
>>
>> Could someone kindly provide some pointers? Thanks!
>> --
>> Derek Hao Hu
>>
>> Software Engineer | Snapchat
>> Snap Inc.
>>
>
>
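
To make Kenn's point about bundling concrete, here is a minimal sketch in
plain Java. It does not use the Beam SDK: `BundleSketch`, `UppercaseFn`, and
`run` are hypothetical names, and `run` only stands in for a runner that picks
a bundle size. The three methods loosely mirror @StartBundle, @ProcessElement,
and @FinishBundle. The assumption being illustrated is that per-element logic
gives the same output no matter how the input is cut into bundles.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java simulation; this is NOT the Beam API. All names are hypothetical.
public class BundleSketch {

    /** Stand-in for a DoFn: per-element logic plus bundle-level setup and flush. */
    static class UppercaseFn {
        private List<String> buffer;                          // per-bundle scratch state
        private final List<String> output = new ArrayList<>();

        void startBundle() {                                  // plays the role of @StartBundle
            buffer = new ArrayList<>();
        }

        void processElement(String element) {                 // plays the role of @ProcessElement
            buffer.add(element.toUpperCase());
        }

        void finishBundle() {                                 // plays the role of @FinishBundle
            output.addAll(buffer);
            buffer = null;
        }
    }

    /** Stand-in for a runner: cuts the input into bundles of whatever size it is given. */
    static List<String> run(List<String> input, int bundleSize) {
        UppercaseFn fn = new UppercaseFn();
        for (int i = 0; i < input.size(); i += bundleSize) {
            int end = Math.min(i + bundleSize, input.size());
            fn.startBundle();
            for (String element : input.subList(i, end)) {
                fn.processElement(element);
            }
            fn.finishBundle();
        }
        return fn.output;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a", "b", "c", "d", "e");
        // The result is the same whether the "runner" picks bundles of 1 or 5 elements.
        System.out.println(run(input, 1).equals(run(input, 5))); // prints "true"
        System.out.println(run(input, 2));                       // prints "[A, B, C, D, E]"
    }
}
```

Nothing in this sketch involves windows: the bundle size changes only how
often the setup and flush steps run, not which elements are grouped together
in the result.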


-- 
Derek Hao Hu

Software Engineer | Snapchat
Snap Inc.
