I also had issues using this pattern. In most cases, it works fine, but the duplicate error showed up after 4000 or so triggers using a 30-second timer. I've tried to apply aggregation before View Singleton to enforce a single element, but that didn't solve the issue.
Setting the timer to 5-minutes seemed to alleviate (or delay) the problem but I need to do more testing. On Tue, Feb 22, 2022 at 2:22 PM Reuven Lax <[email protected]> wrote: > elementCountAtLeast only guarantees a lower bound on the elements in a > pane. No upper bound is guaranteed. > > On Tue, Feb 22, 2022 at 11:16 AM Steve Niemitz <[email protected]> > wrote: > >> Does "Repeatedly.forever(AfterPane.elementCountAtLeast(1)" solve this? >> At least in my tests it seems like this correctly only emits a single >> element per pane, but I'm not sure how much of a guarantee there actually >> is that there will never be more than N elements in a pane when >> elementCountAtLeast(N) is set. >> >> On Tue, Feb 22, 2022 at 2:06 PM Luke Cwik <[email protected]> wrote: >> >>> I'm not certain that Latest would work since the processing time trigger >>> would still cause multiple firings to occur each producing the "latest" at >>> that point in time. All these firings would effectively be output to the >>> PCollection that the view is over. The PCollection would effectively be a >>> concatenation of all these firings. >>> >>> >>> >>> On Tue, Feb 22, 2022 at 10:57 AM Pavel Solomin <[email protected]> >>> wrote: >>> >>>> I also did not succeed in making this pattern work some time ago. In >>>> the link below there's my mail thread with code example - do you have a >>>> similar use-case? >>>> >>>> https://lists.apache.org/thread/9l74o4vqbtfgc5vkj9qq0xofffmtxswc >>>> >>>> Will keep watching this thread for insights. >>>> >>>> Best Regards, >>>> Pavel Solomin >>>> >>>> Tel: +351 962 950 692 <+351%20962%20950%20692> | Skype: pavel_solomin >>>> | Linkedin <https://www.linkedin.com/in/pavelsolomin> >>>> >>>> >>>> >>>> >>>> >>>> On Tue, 22 Feb 2022 at 18:46, Steve Niemitz <[email protected]> >>>> wrote: >>>> >>>>> We had a team try to use the "slowly updating global window side >>>>> inputs" pattern (on dataflow) to update some metadata in their pipeline >>>>> every minute, but surprisingly ran into errors that the side input >>>>> PCollection contained more than one element, [1] although this only >>>>> manifested intermittently. >>>>> >>>>> My theory on why this breaks is as follows, can someone check my logic? >>>>> >>>>> Given that GenerateSequence operates on processing time, (although >>>>> this might not actually matter) it's possible that if processing the >>>>> source >>>>> is delayed for whatever reason, the source may emit multiple elements at >>>>> once in a single bundle. For example, if I configure the source to >>>>> generate an element every 10 seconds, and the evaluation of the source is >>>>> delayed for 30 seconds, I'd get a bundle with 3 elements in it. (or so it >>>>> seems) All elements are then windowed into the global window, so they all >>>>> end up in the same window. >>>>> >>>>> If a bundle with 3 elements enters >>>>> the AfterProcessingTime.pastFirstElementInPane() state machine, all 3 >>>>> elements will be emitted in that pane. This will then propagate down and >>>>> break on the singleton view combiner. >>>>> >>>>> Is my thought process here correct? Is the example here just buggy? >>>>> >>>>> [1] "pcollection view being accessed as a singleton despite having >>>>> more than one input." >>>>> >>>>
