Re: Java SplittableDoFn Watermark API

2020-03-13 Thread Luke Cwik
Using the interfaces defined in pr/10992[1], I started migrating from ProcessContext#updateWatermark to the WatermarkEstimators with pr/11126[2]. The PR is very WIP but it does include the necessary changes to the Watch transform and also the UnboundedSource SDF wrapper to be able to report

Re: Java SplittableDoFn Watermark API

2020-03-09 Thread Luke Cwik
The current set of watermark estimators in Apache Beam for UnboundedSource are: SQS - tracks the timestamp of the last unacked message (does not report monotonically increasing watermarks and assumes that the system will make sure to lower bound what is being reported) AMP - tracks timestamp of

Re: Java SplittableDoFn Watermark API

2020-03-09 Thread Ismaël Mejía
Thanks for the explanation on Watch + FileIO it is really clear. Extra question related to WatermarkEstimator, is it supposed to be called in pipelines at the same exact moments that getWatermark is today for Unbounded sources? (slightly unrelated) There is an open JIRA for an issue related to

Re: Java SplittableDoFn Watermark API

2020-03-04 Thread Luke Cwik
On Wed, Mar 4, 2020 at 7:36 AM Ismaël Mejía wrote: > > I think we should move to a world where *all* runners become portable > runners. > > The at doesn't mean they all need to user docker images, or even GRPC, > but I > > don't think having classical-only or classical-excluded features is >

Re: Java SplittableDoFn Watermark API

2020-03-04 Thread Luke Cwik
On Wed, Mar 4, 2020 at 7:37 AM Ismaël Mejía wrote: > > Bounded SDFs are allowed to have a method signature which has void as the > > return type OR a ProcessContinuation. Unbounded SDFs must use a > > ProcessContinuation as the return type. The "void" return case improves > ease > > of use

Re: Java SplittableDoFn Watermark API

2020-03-04 Thread Ismaël Mejía
> I think we should move to a world where *all* runners become portable runners. > The at doesn't mean they all need to user docker images, or even GRPC, but I > don't think having classical-only or classical-excluded features is where we > want to be long-term. Robert I agree 100% with you, I

Re: Java SplittableDoFn Watermark API

2020-03-04 Thread Ismaël Mejía
> Bounded SDFs are allowed to have a method signature which has void as the > return type OR a ProcessContinuation. Unbounded SDFs must use a > ProcessContinuation as the return type. The "void" return case improves ease > of use since it is likely to be the common case for bounded SDFs. Luke,

Re: Java SplittableDoFn Watermark API

2020-03-03 Thread Luke Cwik
On Tue, Mar 3, 2020 at 9:11 AM Ismaël Mejía wrote: > > the unification of bounded/unbounded within SplittableDoFn has always > been a goal. > > I am glad to know that my intuition is correct and that this was > envisioned, the > idea of checkpoints for bounded inputs sounds super really useful.

Re: Java SplittableDoFn Watermark API

2020-03-03 Thread Robert Bradshaw
On Tue, Mar 3, 2020 at 9:11 AM Ismaël Mejía wrote: > > > the unification of bounded/unbounded within SplittableDoFn has always been > > a goal. > > I am glad to know that my intuition is correct and that this was envisioned, > the > idea of checkpoints for bounded inputs sounds super really

Re: Java SplittableDoFn Watermark API

2020-03-03 Thread Ismaël Mejía
> the unification of bounded/unbounded within SplittableDoFn has always been a > goal. I am glad to know that my intuition is correct and that this was envisioned, the idea of checkpoints for bounded inputs sounds super really useful. Eager to try that on practice. An explicit example (with a

Re: Java SplittableDoFn Watermark API

2020-03-02 Thread Robert Bradshaw
I don't have a strong preference for using a provider/having a set of tightly coupled methods in Java, other than that we be consistent (and we already use the methods style for restrictions). On Mon, Mar 2, 2020 at 3:32 PM Luke Cwik wrote: > > Jan, there are some parts of Apache Beam the

Re: Java SplittableDoFn Watermark API

2020-03-02 Thread Luke Cwik
Jan, there are some parts of Apache Beam the watermarks package will likely rely on (@Experimental annotation, javadoc links) but fundamentally should not rely on core and someone could create a separate package for this. Ismael, the unification of bounded/unbounded within SplittableDoFn has

Re: Java SplittableDoFn Watermark API

2020-02-28 Thread Ismaël Mejía
I just realized that the HBaseIO example is not a good one because we can already have Watch like behavior as we do for Partition discovery in HCatalogIO. Still I am interested on your views on bounded/unbounded unification. Interesting question2: How this will annotations connect with the Watch

Re: Java SplittableDoFn Watermark API

2020-02-28 Thread Ismaël Mejía
Really interesting! Implementing correctly the watermark has been a common struggle for IO authors, to the point that some IOs still have issues around that. So +1 for this, in particular if we can get to reuse common patterns. I was not aware of Boyuan's work around this, really nice. One aspect

Re: Java SplittableDoFn Watermark API

2020-02-27 Thread Kenneth Knowles
Great idea. Are any of the methods optional or useful on their own? It seems like maybe not? So then a single annotation to return an object that returns all the methods might be more clear. Per Boyuan's work - WatermarkEstimatorProvider? Kenn On Thu, Feb 27, 2020 at 2:43 PM Luke Cwik wrote:

Java SplittableDoFn Watermark API

2020-02-27 Thread Luke Cwik
See this doc[1] and blog[2] for some context about SplittableDoFns. To support watermark reporting within the Java SDK for SplittableDoFns, we need a way to have SDF authors to report watermark estimates over the element and restriction pair that they are processing. For UnboundedSources, it was