Hi Robert,
Thanks for the response.

As you've discovered, fully custom merging window fns are not yet
> supported portably, though this is on our todo list.
>
> https://issues.apache.org/jira/browse/BEAM-5601
>

Thanks for linking me to that.  I've watched it and voted for it, and maybe
I'll even take a peak at what it would take to implement if it appears to
be the best way forward for us.

Note that it's tricky to get the exact behavior you need as data may
> come in out of order. Consider, for example, three events of
> increasing timestamp e1, e2, e3 where e2 is the "end" event. It could
> happen that e1 and e3 get merged before e2 is seen, and there's no
> "unmerge" capability (the values may already be combined via an
> upstream combiner). How do you handle this?
>

This is something I've been wondering about myself.  I read the excellent
book Streaming Systems and it seems that the preferred way to solve this is
using event timestamps and watermarks, but that raised a few questions for
me.

First, just to clarify, you're correct that your example scenario could
occur _in processing time_ but our system can guarantee that it does not
happen in event time. i.e. The e2 "end" event will always have an event
timestamp later than e1 and e3.

So, can I use event time with watermarks to solve this problem?

IIUC, python does not have support for late arriving data, so that seems
like a pretty big issue.  If it did, would that be the preferred way of
solving this problem, or is that not enough?  If late data is indeed not
currently supported, then the critique of my custom WindowFn would apply to
Sessions in general would it not?

Is handling of late data in python something that's slated for an upcoming
release?

In the meantime, you could try getting this behavior with StatefulDoFns.
>

Is this fully supported by python now?  I've read some conflicting
information on the subject.

thanks again for the feedback!

(btw, I'm still experimenting with the static type checking issue:
https://github.com/python/mypy/issues/6933)

-chad

Reply via email to