Hi,

This is my first post here, but I have been reading the group for a while,
so thank you for sharing the knowledge!

I've been using Beam/Scio on Dataflow for about a year, mostly for stream
processing from unbounded sources like PubSub. In my daily work I have found
that the built-in windowing is very generic and provides rich watermark/late
event semantics, but there are a few very annoying limitations, e.g.:
- both sides of the join must be defined within compatible windows
- for fixed windows, elements close to window boundaries (but in different
windows) won't be joined
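- for sliding windows there is a huge overhead if the window duration is
much longer than the period (the slide interval), because every element is
assigned to roughly duration/period windows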
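
To make the last two points concrete, here is a minimal plain-Python sketch
of the window assignment arithmetic. It does not use Beam at all: the
function names and the numbers are made up for illustration, and it assumes
timestamps in whole seconds, no window offset, and a slide interval called
"period".

```python
# Plain-Python sketch of fixed and sliding window assignment.
# Assumptions: integer timestamps in seconds, zero window offset,
# non-negative timestamps. Names and values are hypothetical.

def fixed_window(ts, size):
    """Return the [start, end) fixed window that contains ts."""
    start = ts - ts % size
    return (start, start + size)

def sliding_windows(ts, size, period):
    """Return every [start, end) sliding window that contains ts."""
    last_start = ts - ts % period
    # Walk backwards one period at a time while the window still covers ts.
    starts = range(last_start, ts - size, -period)
    return [(start, start + size) for start in starts]

# Two events one second apart, on opposite sides of a 10-minute boundary:
# they land in different fixed windows, so a windowed join never pairs them.
left_event, right_event = 599, 600
print(fixed_window(left_event, 600))
print(fixed_window(right_event, 600))

# One event in 1-hour windows sliding every 10 seconds: the element is
# replicated into size / period windows.
print(len(sliding_windows(3600, 3600, 10)))
```

With these numbers the two events fall into windows (0, 600) and
(600, 1200), and the single event in the sliding case belongs to 360
windows, which is where the overhead comes from.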

I would like to ask you to review a few "join/windowing patterns" built on
custom stateful ParDos, not as generic as the Beam built-ins but perhaps
better tailored to specific needs. I have published the code with tests;
feel free to comment via GitHub issues or on the mailing list. Event-time
processing with watermarks is so demanding that I am almost sure I have
overlooked many important corner cases.
https://github.com/mkuthan/beam-examples

If you think the examples are useful, I'll be glad to write a
blog post with more details :)

Marcin
