Ahh, ok, I see.
Yes, it looks like a bug. So, I'd propose to deprecate the old "processing
time” watermark policy, which we can remove later, and create a new fixed one.
PS: It’s recommended to use "org.apache.beam.sdk.io.aws2.kinesis.KinesisIO”
instead of deprecated
On Fri, Oct 27, 2023, 9:09 AM Robert Bradshaw via dev
wrote:
> On Fri, Oct 27, 2023 at 7:50 AM Kellen Dye via dev
> wrote:
> >
> > > Auto is hard, because it would involve
> > > querying the runner before pipeline construction, and we may not even
> > > know what the runner is at this point
> >
On Fri, Oct 27, 2023 at 7:50 AM Kellen Dye via dev wrote:
>
> > Auto is hard, because it would involve
> > querying the runner before pipeline construction, and we may not even
> > know what the runner is at this point
>
> At the point where pipeline construction will start, you should have
No, I'm referring to this [1] policy which has unexpected (and hardly
avoidable on the user-code side) data loss issues. The problem is that
assigning timestamps to elements and watermarks is completely decoupled
and unrelated, which I'd say is a bug.
Jan
[1]
In Spotify's case we deploy streaming jobs via CI and would ideally verify
compatibility as part of the build process before submitting to dataflow.
Perhaps decoupled from the _running_ pipeline if we had a cache of previous
pipeline versions.
Currently the user experience is poor because any
You raise a very good point:
https://github.com/apache/beam/blob/master/model/job-management/src/main/proto/org/apache/beam/model/job_management/v1/beam_job_api.proto#L54
The job management API does allow for the pipeline proto to be returned. So
one could get the live job, so the SDK could
Why not just to create a custom watermark policy for that? Or you mean to make
it as a default policy?
—
Alexey
> On 27 Oct 2023, at 10:25, Jan Lukavský wrote:
>
>
> Hi,
>
> when discussing about [1] we found out, that the issue is actually caused by
> processing time watermarks in
> Auto is hard, because it would involve
> querying the runner before pipeline construction, and we may not even
> know what the runner is at this point
At the point where pipeline construction will start, you should have access
to the pipeline arguments and be able to determine the runner. What
This is your daily summary of Beam's current high priority issues that may need
attention.
See https://beam.apache.org/contribute/issue-priorities for the meaning and
expectations around issue priorities.
Unassigned P1 Issues:
https://github.com/apache/beam/issues/29099 [Bug]: FnAPI Java
Hi,
when discussing about [1] we found out, that the issue is actually
caused by processing time watermarks in KinesisIO. Enabling this
watermark outputs watermarks based on current processing time, _but
event timestamps are derived from ingestion timestamp_. This can cause
unbounded
10 matches
Mail list logo