+dev <d...@beam.apache.org> and some people I talk about joins with

Interesting! It is a lot to take in and fully grok the code, so calling in reinforcements...

Generally, I think there's agreement that for a lot of real use cases, you have to roll your own join using the lower-level Beam primitives. So I think it would be great to get some of these other approaches to joins into Beam, perhaps as an extension of the Java SDK or even in the core (since schema joins are in the core). In particular:

 - "join in fixed window with repeater" sounds similar (but not identical) to work by Mikhail
 - "join in global window with cache" sounds similar (but not identical) to work and discussions w/ Reza and Tyson

I want to be clear that I am *not* saying there's any duplication. I'm guessing these all fit into a collection of different ways to accomplish joins, and if everything comes to fruition we will have the great opportunity to document how a user should choose between them.

Kenn

On Fri, May 1, 2020 at 7:56 AM Marcin Kuthan <marcin.kut...@gmail.com> wrote:

> Hi,
>
> This is my first post here, but I've been reading the group for a while,
> so thank you for sharing the knowledge!
>
> I've been using Beam/Scio on Dataflow for about a year, mostly for stream
> processing from unbounded sources like PubSub. During my daily work I found
> that the built-in windowing is very generic and provides rich
> watermark/late-event semantics, but there are a few very annoying
> limitations, e.g.:
> - both sides of the join must be defined within compatible windows
> - for fixed windows, elements close to window boundaries (but in different
>   windows) won't be joined
> - for sliding windows there is a huge overhead if the duration is much
>   longer than the offset
>
> I would like to ask you to review a few "join/windowing patterns" with
> custom stateful ParDos, not as generic as the Beam built-ins but perhaps
> better crafted for more specific needs. I published the code with tests;
> feel free to comment via GitHub issues or on the mailing list. Event time
> processing with watermarks is so demanding that I'm almost sure I have
> overlooked many important corner cases.
>
> https://github.com/mkuthan/beam-examples
>
> If you think the examples are useful, I'll be glad to write a blog post
> with more details :)
>
> Marcin
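
For readers following the thread, here is a rough illustration of the fixed-window and sliding-window limitations Marcin lists. It is a minimal plain-Python sketch of the window-assignment arithmetic only, not Beam or Scio code. The function names, the integer-second timestamps, and the zero window offset are assumptions made for the example.

```python
# Plain-Python sketch of window assignment arithmetic (hypothetical helpers,
# not Beam API). Timestamps and durations are integer seconds, windows are
# half-open intervals [start, end), and window offset is assumed to be zero.

def fixed_window(ts, size):
    """Return the single fixed window that contains timestamp ts."""
    start = ts - ts % size
    return (start, start + size)


def sliding_windows(ts, size, period):
    """Return every sliding window that contains timestamp ts."""
    last_start = ts - ts % period
    # Walk backwards one period at a time while the window still covers ts.
    return [(s, s + size) for s in range(last_start, ts - size, -period)]


# Two events only 2 seconds apart land in different 60-second fixed windows,
# so a per-window join would never see them together.
left = fixed_window(59, 60)    # (0, 60)
right = fixed_window(61, 60)   # (60, 120)
print(left == right)           # False

# With a 60-second duration and a 10-second period, each element is
# assigned to size / period = 6 windows, so it is processed 6 times.
print(len(sliding_windows(65, 60, 10)))  # 6
```

The second result is where the overhead comes from: each element is copied into roughly duration / period windows, so the cost grows as the duration gets long relative to the period (the "offset" in the message above).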