Re: Processing time watermarks in KinesisIO

2023-10-27 Thread Alexey Romanenko
Ahh, ok, I see. Yes, it looks like a bug. So, I'd propose to deprecate the old "processing time" watermark policy, which we can remove later, and create a new fixed one. PS: It’s recommended to use "org.apache.beam.sdk.io.aws2.kinesis.KinesisIO" instead of deprecated

Re: Streaming update compatibility

2023-10-27 Thread Robert Burke
On Fri, Oct 27, 2023, 9:09 AM Robert Bradshaw via dev wrote:
> On Fri, Oct 27, 2023 at 7:50 AM Kellen Dye via dev wrote:
> > > Auto is hard, because it would involve querying the runner before pipeline construction, and we may not even know what the runner is at this point

Re: Streaming update compatibility

2023-10-27 Thread Robert Bradshaw via dev
On Fri, Oct 27, 2023 at 7:50 AM Kellen Dye via dev wrote:
> > Auto is hard, because it would involve querying the runner before pipeline construction, and we may not even know what the runner is at this point
> At the point where pipeline construction will start, you should have

Re: Processing time watermarks in KinesisIO

2023-10-27 Thread Jan Lukavský
No, I'm referring to this policy [1], which has unexpected data loss issues that are hard to avoid on the user-code side. The problem is that assigning timestamps to elements and assigning watermarks are completely decoupled and unrelated, which I'd say is a bug. Jan [1]

Re: Streaming update compatibility

2023-10-27 Thread Kellen Dye via dev
In Spotify's case we deploy streaming jobs via CI and would ideally verify compatibility as part of the build process before submitting to Dataflow. Perhaps decoupled from the _running_ pipeline if we had a cache of previous pipeline versions. Currently the user experience is poor because any

Re: Streaming update compatibility

2023-10-27 Thread Robert Burke
You raise a very good point: https://github.com/apache/beam/blob/master/model/job-management/src/main/proto/org/apache/beam/model/job_management/v1/beam_job_api.proto#L54 The job management API does allow for the pipeline proto to be returned. So one could get the live job, so the SDK could

Re: Processing time watermarks in KinesisIO

2023-10-27 Thread Alexey Romanenko
Why not just create a custom watermark policy for that? Or do you mean to make it the default policy?
Alexey
> On 27 Oct 2023, at 10:25, Jan Lukavský wrote:
> Hi,
> while discussing [1] we found out that the issue is actually caused by processing time watermarks in

Re: Streaming update compatibility

2023-10-27 Thread Kellen Dye via dev
> Auto is hard, because it would involve querying the runner before pipeline construction, and we may not even know what the runner is at this point
At the point where pipeline construction will start, you should have access to the pipeline arguments and be able to determine the runner. What

Beam High Priority Issue Report (46)

2023-10-27 Thread beamactions
This is your daily summary of Beam's current high priority issues that may need attention. See https://beam.apache.org/contribute/issue-priorities for the meaning and expectations around issue priorities. Unassigned P1 Issues: https://github.com/apache/beam/issues/29099 [Bug]: FnAPI Java

Processing time watermarks in KinesisIO

2023-10-27 Thread Jan Lukavský
Hi, while discussing [1] we found out that the issue is actually caused by processing time watermarks in KinesisIO. Enabling this policy outputs watermarks based on the current processing time, _but event timestamps are derived from the ingestion timestamp_. This can cause unbounded