We had a team try to use the "slowly updating global window side inputs"
pattern (on Dataflow) to refresh some metadata in their pipeline every
minute. Surprisingly, they intermittently ran into errors saying that the
side input PCollection contained more than one element. [1]

My theory on why this breaks is as follows. Can someone check my logic?

GenerateSequence operates on processing time (although this might not
actually matter), so if evaluation of the source is delayed for whatever
reason, the source may emit multiple elements at once in a single bundle.
For example, if I configure the source to generate an element every 10
seconds and evaluation of the source is delayed for 30 seconds, I'd get a
bundle with 3 elements in it (or so it seems). All elements are then
windowed into the global window, so they all end up in the same window.
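
To make the arithmetic in that example concrete, here is a toy model in
plain Java with no Beam dependency. It is only a sketch of the behavior I
suspect, not how GenerateSequence is actually implemented; the class and
method names are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model: NOT Beam code and NOT GenerateSequence's real logic.
final class CatchUpSourceSketch {

    // Models a periodic source that, after being delayed, emits every
    // element that became due during the delay in one bundle.
    static List<Long> bundleAfterDelay(long firstIndex, long periodSeconds, long delaySeconds) {
        // Assumption: one element becomes due per full period that elapsed.
        long due = delaySeconds / periodSeconds;
        List<Long> bundle = new ArrayList<>();
        for (long i = 0; i < due; i++) {
            bundle.add(firstIndex + i);
        }
        return bundle;
    }

    public static void main(String[] args) {
        // One element every 10 seconds, evaluation delayed by 30 seconds.
        System.out.println(bundleAfterDelay(0, 10, 30)); // prints [0, 1, 2]
    }
}
```

Under this assumption, any delay of at least two periods yields a bundle
with more than one element.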

If a bundle with 3 elements enters the
AfterProcessingTime.pastFirstElementInPane() state machine, all 3 elements
will be emitted in that pane.  This will then propagate downstream and
break on the singleton view combiner.
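
The failure I have in mind can be sketched in plain Java as follows. This
is a toy stand-in for the singleton view, not Beam's actual
implementation; the class name, method name, and error message are
hypothetical.

```java
import java.util.List;

// Hypothetical toy model: NOT Beam's singleton view implementation.
final class SingletonPaneSketch {

    // Accepts a pane only if it holds exactly one element, mimicking the
    // "accessed as a singleton despite having more than one input" check.
    static <T> T asSingleton(List<T> pane) {
        if (pane.size() != 1) {
            throw new IllegalArgumentException(
                "expected exactly one element in pane, got " + pane.size());
        }
        return pane.get(0);
    }

    public static void main(String[] args) {
        // The normal case: the trigger fires with one element in the pane.
        System.out.println(asSingleton(List.of(0L)));

        // The suspected case: a 3-element bundle lands in a single pane.
        try {
            asSingleton(List.of(0L, 1L, 2L));
        } catch (IllegalArgumentException e) {
            System.out.println("failed: " + e.getMessage());
        }
    }
}
```

If the theory holds, the pipeline only fails when a pane happens to
contain more than one element, which would match the intermittent errors.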

Is my thought process here correct?  Is the example here just buggy?

[1] "pcollection view being accessed as a singleton despite having more
than one input."
