Re: Watermarks

2023-10-02 Thread Perez
As per this link, it says that it only supports value_only for now as I am using pyflink. Does it mean I can't extract the timestamp appended by Kafka with pyflink as of now? https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/connectors/datastream/kafka/#deserializer or does it mean

Re: Watermarks

2023-10-02 Thread Perez
Hi Liu and Jinfeng, I am trying to implement KafkaDeserializationSchema for Pyflink but am unable to get any examples. Can you share some links or references using which I can understand and try to implement myself? Perez sid.

Re: Watermarks

2023-09-13 Thread Perez
Cool thanks for the clarification. Sid. On Mon, Sep 11, 2023 at 9:22 AM liu ron wrote: > Hi, Sid > > For the second question, I think it is not needed. > > Best, > Ron > > Feng Jin 于2023年9月9日周六 21:19写道: > >> hi Sid >> >> >> 1. You can customize KafkaDeserializationSchema[1], in the

Re: Watermarks

2023-09-10 Thread liu ron
Hi, Sid For the second question, I think it is not needed. Best, Ron Feng Jin 于2023年9月9日周六 21:19写道: > hi Sid > > > 1. You can customize KafkaDeserializationSchema[1], in the `deserialize` > method, you can obtain the Kafka event time. > > 2. I don't think it's necessary to explicitly mention

Re: Watermarks

2023-09-09 Thread Feng Jin
hi Sid 1. You can customize KafkaDeserializationSchema[1], in the `deserialize` method, you can obtain the Kafka event time. 2. I don't think it's necessary to explicitly mention the watermark strategy. [1].

Watermarks

2023-09-09 Thread Sid
Hello experts, My source is Kafka and I am trying to generate records for which I have FlinkKafkaConsumer class. Now my first question is how to consume an event timestamp for the records generated. I know for a fact that for CLI, there is one property called *print.timestamp=true* which gives

Re: Watermarks lagging behind events that generate them

2023-03-16 Thread David Anderson
Watermarks are not included in checkpoints or savepoints. See [1] for some head-scratchingly-complicated info about restarts, watermarks, and unaligned checkpoints. [1] https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/ops/state/checkpointing_under_backpressure/#interplay

Re: Watermarks lagging behind events that generate them

2023-03-15 Thread Shammon FY
Hi Alexis Currently I think checkpoint and savepoint will not save watermarks. I think how to deal with watermarks at checkpoint/savepoint is a good question, we can discuss this in dev mail list Best, Shammon FY On Wed, Mar 15, 2023 at 4:22 PM Alexis Sarda-Espinosa < sarda.espin...@gmail.

Re: Watermarks lagging behind events that generate them

2023-03-15 Thread Alexis Sarda-Espinosa
t; >> Hi David, thanks for the answer. One follow-up question: will the >> watermark be reset to Long.MIN_VALUE every time I restart a job with >> savepoint? >> >> Am Di., 14. März 2023 um 05:08 Uhr schrieb David Anderson < >> dander...@apache.org>: >&g

Re: Watermarks lagging behind events that generate them

2023-03-14 Thread Shammon FY
pin...@gmail.com> wrote: > Hi David, thanks for the answer. One follow-up question: will the > watermark be reset to Long.MIN_VALUE every time I restart a job with > savepoint? > > Am Di., 14. März 2023 um 05:08 Uhr schrieb David Anderson < > dander...@apache.org>:

Re: Watermarks lagging behind events that generate them

2023-03-14 Thread Alexis Sarda-Espinosa
Hi David, thanks for the answer. One follow-up question: will the watermark be reset to Long.MIN_VALUE every time I restart a job with savepoint? Am Di., 14. März 2023 um 05:08 Uhr schrieb David Anderson < dander...@apache.org>: > Watermarks always follow the corresponding event(s). I'm

Re: Watermarks lagging behind events that generate them

2023-03-13 Thread David Anderson
Watermarks always follow the corresponding event(s). I'm not sure why they were designed that way, but that is how they are implemented. Windows maintain this contract by emitting all of their results before forwarding the watermark that triggered the results. David On Mon, Mar 13, 2023 at 5:28

Re: Watermarks lagging behind events that generate them

2023-03-13 Thread Shammon FY
Hi Alexis Do you use both event-time watermark generator and TimerService for processing time in your job? Maybe you can try using event-time watermark first. Best, Shammon.FY On Sat, Mar 11, 2023 at 7:47 AM Alexis Sarda-Espinosa < sarda.espin...@gmail.com> wrote: > Hello, > > I recently ran

Watermarks lagging behind events that generate them

2023-03-10 Thread Alexis Sarda-Espinosa
Hello, I recently ran into a weird issue with a streaming job in Flink 1.16.1. One of my functions (KeyedProcessFunction) has been using processing time timers. I now want to execute the same job based on a historical data dump, so I had to adjust the logic to use event time timers in that case

Re: Non-temporal watermarks

2023-02-03 Thread David Anderson
DataStream time windows and Flink SQL make assumptions about the timestamps and watermarks being milliseconds since the epoch. But the underlying machinery does not. So if you limit yourself to process functions (for example), then nothing will assign any semantics to the time values. David

Re: Non-temporal watermarks

2023-02-02 Thread James Sandys-Lumsdaine
watermark is reached and that required all sources to have reached at least that point in the recovery). Once we have reached the startup datetime watermark the system seamlessly flips into live processing mode. The watermarks still trigger my timers but now we are processing the last ~1 minute

Re: Non-temporal watermarks

2023-02-02 Thread Gen Luo
> concept of watermarks is more abstract, so I'll leave implementation > details aside. > > Speaking generally, yes, there is a set of requirements that must be met > in order to be able to generate a system that uses watermarks. > > The primary question is what are watermarks used

Re: Non-temporal watermarks

2023-02-02 Thread Jan Lukavský
Hi, I will not speak about details related to Flink specifically, the concept of watermarks is more abstract, so I'll leave implementation details aside. Speaking generally, yes, there is a set of requirements that must be met in order to be able to generate a system that uses watermarks

Non-temporal watermarks

2023-02-01 Thread Yaroslav Tkachenko
Hey everyone, I'm wondering if anyone has done any experiments trying to use non-temporal watermarks? For example, a dataset may contain some kind of virtual timestamp / version field that behaves just like a regular timestamp (monotonically increasing, etc.), but has a different scale / range

RE: Processing watermarks in a broadcast connected stream

2023-01-31 Thread Schwalbe Matthias
) number of DataStream keyed/broadcast/plain and also to tap into the meta-stream of watermark events. Each Input is set up separately and can implement separate handlers for the events/watermarks/etc. However, it is an operator implementation, you e.g. need to manually set up timer manager

Processing watermarks in a broadcast connected stream

2023-01-30 Thread Sajjad Rizvi
Hi, I am trying to process watermarks in a BroadcastConnectedStream. However, I am not able to find any direct way to handle watermark events, similar to what we have in processWatermark1 in a KeyedCoProcessOperator. Following are further details. In the context of the example given

Re: Can't use nested attributes as watermarks in Table

2022-12-17 Thread Theodor Wübker
be interested (since obviously it would help my thesis). Maybe you can tell how much effort it would be? I imagine it would need support in the place where the watermarks are registered (the one I sent) and in the place they are actually used (which I have not checked yet at all). -Theo > On

Re: Can't use nested attributes as watermarks in Table

2022-12-16 Thread Martijn Visser
Hi Theo, The most logical reason is that nested attributes were added later than watermarks were :) I agree that it's something that would be worthwhile to improve. If you can and want to make a contribution on this, that would be great. Best regards, Martijn On Wed, Dec 14, 2022 at 9:24 AM

Re: Can't use nested attributes as watermarks in Table

2022-12-14 Thread Theodor Wübker
Actually, this behaviour is documented <https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/create/#create-table> (See the Watermarks section, where it is stated that the column must be a “top-level” column). So I suppose, there is a reason. Nevertheless it is

Can't use nested attributes as watermarks in Table

2022-12-13 Thread Theodor Wübker
Hey everyone, I have encountered a problem with my Table API Program. I am trying to use a nested attribute as a watermark. The structure of my schema is a row, which itself has 3 rows as attributes and they again have some attributes, especially the Timestamp that I want to use as a

Re: Overwriting watermarks in DataStream

2022-08-22 Thread Peter Schrott
Hi David, Thanks a lot for clarification. Best, Peter > On 21. Aug 2022, at 18:36, David Anderson wrote: > > If you have two watermark strategies in your job, the downstream > TimestampsAndWatermarksOperator will absorb incoming watermarks and not > forward

Re: Overwriting watermarks in DataStream

2022-08-21 Thread David Anderson
If you have two watermark strategies in your job, the downstream TimestampsAndWatermarksOperator will absorb incoming watermarks and not forward them downstream, but it will have no effect upstream. The only exception to this is that watermarks equal to Long.MAX_VALUE are forwarded downstream

Overwriting watermarks in DataStream

2022-08-18 Thread Peter Schrott
Hi there, While still struggling with events and watermarks out of order after sorting with a buffer process function (compare [1]) I tired to solve the issue by assigning a new watermark after the mentioned sorting function. The Flink docs [2] are not very certain about the impact

Eventtimes and watermarks not in sync after sorting stream by eventide

2022-08-17 Thread Peter Schrott
son of the subsequent event times, the event time of the current event is compared with the current watermark (context.timerService().currentWatermark()). Result: Watermarks and event timestamps are NOT ascending. On some events the event timestamp is lower than the current watermark. I somehow susp

Re: [DataStream API] Watermarks not closing TumblingEventTimeWindow with Reduce function

2022-04-29 Thread r pp
gt; .withTimestampAssigner(new >>>> SerializableTimestampAssigner[StarscreamEventCounter_V1] { >>>> override def extractTimestamp(element: StarscreamEventCounter_V1, >>>> recordTimestamp: Long): Long = >>>> element.envelopeTimestamp >>>&g

Re: [DataStream API] Watermarks not closing TumblingEventTimeWindow with Reduce function

2022-04-04 Thread Ryan van Huuksloot
f(2, ChronoUnit.HOURS)) >>> .withTimestampAssigner(new >>> SerializableTimestampAssigner[StarscreamEventCounter_V1] { >>> override def extractTimestamp(element: StarscreamEventCounter_V1, >>> recordTimestamp: Long): Long = >>> element.env

Re: [DataStream API] Watermarks not closing TumblingEventTimeWindow with Reduce function

2022-04-04 Thread Arvid Heise
.withTimestampAssigner(new >> SerializableTimestampAssigner[StarscreamEventCounter_V1] { >> override def extractTimestamp(element: StarscreamEventCounter_V1, >> recordTimestamp: Long): Long = >> element.envelopeTimestamp >> }) >> >> The Watermarks are correct

Re: [DataStream API] Watermarks not closing TumblingEventTimeWindow with Reduce function

2022-04-01 Thread r pp
f(2, ChronoUnit.HOURS)) > .withTimestampAssigner(new > SerializableTimestampAssigner[StarscreamEventCounter_V1] { > override def extractTimestamp(element: StarscreamEventCounter_V1, > recordTimestamp: Long): Long = > element.envelopeTimestamp > }) > > The Watermarks ar

[DataStream API] Watermarks not closing TumblingEventTimeWindow with Reduce function

2022-03-31 Thread Ryan van Huuksloot
extractTimestamp(element: StarscreamEventCounter_V1, recordTimestamp: Long): Long = element.envelopeTimestamp }) The Watermarks are correctly getting assigned. However, when a reduce function is used the window never terminates because the `ctx.getCurrentWatermark()` returns the default

Re: Watermarks event time vs processing time

2022-03-29 Thread HG
); out.collect(((ObjectNode) originalEvent).toString()); } } } Op di 29 mrt. 2022 om 15:23 schreef Schwalbe Matthias < matthias.schwa...@viseca.ch>: > Hello Hans-Peter, > > > > I’m a little confused which version of your code you are tes

RE: Watermarks event time vs processing time

2022-03-29 Thread Schwalbe Matthias
Hello Hans-Peter, I’m a little confused which version of your code you are testing against: * ProcessingTimeSessionWindows or EventTimeSessionWindows? * did you keep the withIdleness() ?? As said before: * for ProcessingTimeSessionWindows, watermarks play no role * if you keep

Re: Watermarks event time vs processing time

2022-03-29 Thread HG
ater on >2. So far you used a session window to determine the point in time >when to emit the stored/enriched/sorted events >3. Watermarks are generated with bounded out of orderness >4. You use session windows with a specific gap >5. In your experiment you ever onl

RE: Interval join operator is not forwarding watermarks correctly

2022-03-17 Thread Alexis Sarda-Espinosa
Hi Dawid, Thanks for the update, I also managed to work around it by adding another watermark assignment operator between the join and the window. I’ll have to see if it’s possible to assign watermarks at the source, but even if it is, I worry that the different partitions created by all my

Re: Interval join operator is not forwarding watermarks correctly

2022-03-17 Thread Dawid Wysakowicz
Hi Alexis, I tried looking into your example. First of all, so far, I've spent only a limited time looking at the WatermarkGenerator, and I have not thoroughly understood how it works. I'd discourage assigning watermarks anywhere in the middle of your pipeline. This is considered

RE: Watermarks event time vs processing time

2022-03-16 Thread Schwalbe Matthias
ing of the watermarks on single operators / per subtask useful: Look for subtasks that don’t have watermarks, or too low watermarks for a specific session window to trigger. Thias From: HG Sent: Mittwoch, 16. März 2022 16:41 To: Schwalbe Matthias Cc: user Subject: Re: Watermarks event time

Re: Watermarks event time vs processing time

2022-03-16 Thread HG
ific enough (please confirm or correct in your answer): > >1. You store incoming events in state per transaction_id to be >sorted/aggregated(min/max time) by event time later on >2. So far you used a session window to determine the point in time >when to emit the stored/en

RE: Watermarks event time vs processing time

2022-03-16 Thread Schwalbe Matthias
/aggregated(min/max time) by event time later on 2. So far you used a session window to determine the point in time when to emit the stored/enriched/sorted events 3. Watermarks are generated with bounded out of orderness 4. You use session windows with a specific gap 5. In your experiment you

Watermarks event time vs processing time

2022-03-16 Thread HG
but somehow the processing did not run as expected. When I pushed 1000 events eventually 800 or so would appear at the output. This was resolved by switching to ProcessingTimeSessionWindows . My thought was then that I could remove the watermarkstrategies with watermarks with timestamp assigners (handling

Re: Interval join operator is not forwarding watermarks correctly

2022-03-15 Thread Alexis Sarda-Espinosa
For completeness, this still happens with Flink 1.14.4 Regards, Alexis. From: Alexis Sarda-Espinosa Sent: Friday, March 11, 2022 12:21 AM To: user@flink.apache.org Cc: pnowoj...@apache.org Subject: Re: Interval join operator is not forwarding watermarks

Re: Interval join operator is not forwarding watermarks correctly

2022-03-10 Thread Alexis Sarda-Espinosa
] https://github.com/asardaes/flink-interval-join-test Regards, Alexis. From: Alexis Sarda-Espinosa Sent: Thursday, March 10, 2022 7:47 PM To: user@flink.apache.org Cc: pnowoj...@apache.org Subject: RE: Interval join operator is not forwarding watermarks

RE: Interval join operator is not forwarding watermarks correctly

2022-03-10 Thread Alexis Sarda-Espinosa
: Interval join operator is not forwarding watermarks correctly Hello, I'm in the process of updating from Flink 1.11.3 to 1.14.3, and it seems the interval join in my pipeline is no longer working. More specifically, I have a sliding window after the interval join, and the window isn't firing. After

Interval join operator is not forwarding watermarks correctly

2022-03-10 Thread Alexis Sarda-Espinosa
.connect(stream2) .keyBy(selector1, selector2) .transform("Interval Join", TypeInformation.of(Pojo.class), joinOperator); --- Some more information in case it's relevant: - stream2 is obtained from a side output. - both stream1 and stream2 have watermarks assigned by custom str

Re: table api watermarks, timestamps, outoforderness and head aches

2022-02-14 Thread Francesco Guardiani
en you should set >> an idleness which after that, a watermark is produced. >> >> Idleness is >> >> On Fri, Feb 11, 2022 at 2:53 PM HG wrote: >> >>> Hi, >>> >>> I am getting a headache when thinking about watermarks and timestamp

Re: table api watermarks, timestamps, outoforderness and head aches

2022-02-14 Thread HG
couple of days nothing is produced), then you should set an > idleness which after that, a watermark is produced. > > Idleness is > > On Fri, Feb 11, 2022 at 2:53 PM HG wrote: > >> Hi, >> >> I am getting a headache when thinking about watermarks and tim

Re: table api watermarks, timestamps, outoforderness and head aches

2022-02-14 Thread Francesco Guardiani
uple of days nothing is produced), then you should set an idleness which after that, a watermark is produced. Idleness is On Fri, Feb 11, 2022 at 2:53 PM HG wrote: > Hi, > > I am getting a headache when thinking about watermarks and timestamps. > My application reads events from

table api watermarks, timestamps, outoforderness and head aches

2022-02-11 Thread HG
Hi, I am getting a headache when thinking about watermarks and timestamps. My application reads events from Kafka (they are in json format) as a Datastream Events can be keyed by a transactionId and have a event timestamp (handlingTime) All events belonging to a single transactionId will arrive

Re: Flink 1.11 loses track of event timestamps and watermarks after process function

2021-10-18 Thread Arvid Heise
processFunction will just emit watermarks from upstream as they come. No function/operator in Flink is a black hole w.r.t. watermarks. It's just important to remember that watermark after a network shuffle is always the min of all inputs (ignoring idle inputs). So if any connected upstream part

Re: Flink 1.11 loses track of event timestamps and watermarks after process function

2021-10-15 Thread Ahmad Alkilani
Thanks again Arvid, I am getting closer to the culprit as I've found some interesting scenarios. Still no exact answer yet. We are indeed also using .withIdleness to mitigate slow/issues with partitions. I did have a few clarifying questions though w.r.t watermarks if you don't mind. *Watermark

Re: Flink 1.11 loses track of event timestamps and watermarks after process function

2021-10-14 Thread Arvid Heise
upstream. A usual suspect when not seeing good watermarks is that the custom watermark assigner is not working as expected. But you mentioned that with a no-op, the process function is actually showing the watermark and that leaves me completely puzzled. I would dump down your example even more

Re: Flink 1.11 loses track of event timestamps and watermarks after process function

2021-10-12 Thread Ahmad Alkilani
progress. So now: Kafka -> Flink Kafka source -> flatMap (map & filter) -> assignTimestampsAndWaterMarks -> map Function -> *process function (print watermarks) *-> Key By -> Keyed Process Function -> *process function (print watermarks)* -> Simple Sink I am seeing similar

Re: Flink 1.11 loses track of event timestamps and watermarks after process function

2021-10-12 Thread Arvid Heise
you please simply remove AsyncIO+Sink from your job and check for print statements? On Tue, Oct 12, 2021 at 3:23 AM Ahmad Alkilani wrote: > Flink 1.11 > I have a simple Flink application that reads from Kafka, uses event > timestamps, assigns timestamps and watermarks and then key'

Flink 1.11 loses track of event timestamps and watermarks after process function

2021-10-11 Thread Ahmad Alkilani
Flink 1.11 I have a simple Flink application that reads from Kafka, uses event timestamps, assigns timestamps and watermarks and then key's by a field and uses a KeyedProcessFunciton. The keyed process function outputs events from with the `processElement` method using `out.collect`. No timers

Re: Empty Kafka topics and watermarks

2021-10-11 Thread Piotr Nowojski
r Nowojski > *Sent:* 08 October 2021 13:17 > *To:* James Sandys-Lumsdaine > *Cc:* user@flink.apache.org > *Subject:* Re: Empty Kafka topics and watermarks > > Hi James, > > I believe you have encountered a bug that we have already fixed [1]. The > small problem is t

Re: Empty Kafka topics and watermarks

2021-10-11 Thread James Sandys-Lumsdaine
Kafka topics and watermarks Hi James, I believe you have encountered a bug that we have already fixed [1]. The small problem is that in order to fix this bug, we had to change some `@PublicEvolving` interfaces and thus we were not able to backport this fix to earlier minor releases

Re: Empty Kafka topics and watermarks

2021-10-08 Thread Piotr Nowojski
Hi James, I believe you have encountered a bug that we have already fixed [1]. The small problem is that in order to fix this bug, we had to change some `@PublicEvolving` interfaces and thus we were not able to backport this fix to earlier minor releases. As such, this is only fixed starting from

Empty Kafka topics and watermarks

2021-10-08 Thread James Sandys-Lumsdaine
Hi everyone, I'm putting together a Flink workflow that needs to merge historic data from a custom JDBC source with a Kafka flow (for the realtime data). I have successfully written the custom JDBC source that emits a watermark for the last event time after all the DB data has been emitted but

Re: stream processing savepoints and watermarks question

2021-09-24 Thread Marco Villalobos
ned today. >> When we tried to shutdown a job with a savepoint, the watermarks became >> equal to 2^63 - 1. >> >> This caused timers to fire indefinitely and crash downstream systems with >> overloaded untrue data. >> >> We are using event time processing with Kafka

RE: stream processing savepoints and watermarks question

2021-09-24 Thread Schwalbe Matthias
eak the infinite iteration over timers … I believe the behavior exhibited by flink is intentional and no defect! What do you think? Thias From: JING ZHANG Sent: Freitag, 24. September 2021 12:25 To: Guowei Ma Cc: Marco Villalobos ; user Subject: Re: stream processing savepoints and watermarks quest

Re: stream processing savepoints and watermarks question

2021-09-24 Thread JING ZHANG
> could only be triggered when there is a watermark (except the "quiesce > phase"). > I think it could not advance any watermarks after MAX_WATERMARK is > received. > > Best, > Guowei > > > On Fri, Sep 24, 2021 at 4:31 PM JING ZHANG wrote: > >> Hi Guowei, &

Re: stream processing savepoints and watermarks question

2021-09-24 Thread Guowei Ma
Hi, JING Thanks for the case. But I am not sure this would happen. As far as I know the event timer could only be triggered when there is a watermark (except the "quiesce phase"). I think it could not advance any watermarks after MAX_WATERMARK is received. Best, Guowei On Fri, Se

Re: stream processing savepoints and watermarks question

2021-09-24 Thread JING ZHANG
ps://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/cli/#stopping-a-job-gracefully-creating-a-final-savepoint >> >> Best, >> JING ZHANG >> >> Marco Villalobos 于2021年9月24日周五 下午12:54写道: >> >>> Something strange happened today. >>> W

Re: stream processing savepoints and watermarks question

2021-09-24 Thread Guowei Ma
ned today. >> When we tried to shutdown a job with a savepoint, the watermarks became >> equal to 2^63 - 1. >> >> This caused timers to fire indefinitely and crash downstream systems with >> overloaded untrue data. >> >> We are using event time processing with Kafka as o

Re: stream processing savepoints and watermarks question

2021-09-24 Thread JING ZHANG
-a-final-savepoint Best, JING ZHANG Marco Villalobos 于2021年9月24日周五 下午12:54写道: > Something strange happened today. > When we tried to shutdown a job with a savepoint, the watermarks became > equal to 2^63 - 1. > > This caused timers to fire indefinitely and crash downstream systems wi

stream processing savepoints and watermarks question

2021-09-23 Thread Marco Villalobos
Something strange happened today. When we tried to shutdown a job with a savepoint, the watermarks became equal to 2^63 - 1. This caused timers to fire indefinitely and crash downstream systems with overloaded untrue data. We are using event time processing with Kafka as our source. It seems

Re: Theory question on process_continously processing mode and watermarks

2021-08-19 Thread Arvid Heise
I think what you are seeing is that the files have records with similar timestamps. That means after reading file1 your watermarks are already progressed to the end of your time range. When Flink picks up file2, all records are considered late records and no windows fire anymore. See [1

Re: Theory question on process_continously processing mode and watermarks

2021-08-19 Thread Caizhi Weng
; > Using FileProcessingMode.*PROCESS_CONTINUOUSLY* > > Into a streaming job that uses tumbling Windows and watermarks causes my > streaming process to stop ad the reading files phase. > > Meanwhile if i delete my declarations of Windows and watermark the program > works as expected. &g

Theory question on process_continously processing mode and watermarks

2021-08-19 Thread Fra
Hello, during my personal development of a Flink streaming Platform i found something that perplexes me.Using FileProcessingMode.PROCESS_CONTINUOUSLYInto a streaming job that uses tumbling Windows and watermarks causes my streaming process to stop ad the reading files phase.Meanwhile if i delete

Re: Watermarks in Event Time Temporal Join

2021-04-28 Thread Leonard Xu
Thanks for your example, Maciej I can explain more about the design. > Let's have events. > S1, id1, v1, 1 > S2, id1, v2, 1 > > Nothing is happening as none of the streams have reached the watermark. > Now let's add > S2, id2, v2, 101 > This should trigger join for id1 because we have all the

Re: Watermarks in Event Time Temporal Join

2021-04-28 Thread Maciej Bryński
Hi Leonard, Let's assume we have two streams. S1 - id, value1, ts1 with watermark = ts1 - 1 S2 - id, value2, ts2 with watermark = ts2 - 1 Then we have following interval join SELECT id, value1, value2, ts1 FROM S1 JOIN S2 ON S1.id = S2.id and ts1 between ts2 - 1 and ts2 Let's have events.

Re: Watermarks in Event Time Temporal Join

2021-04-27 Thread Leonard Xu
Hello, Maciej > I agree the watermark should pass on versioned table side, because > this is the only way to know which version of record should be used. > But if we mimics behaviour of interval join then main stream watermark > could be skipped. IIRC, rowtime interval join requires the watermark

Re: Watermarks in Event Time Temporal Join

2021-04-26 Thread Maciej Bryński
sage is late or early. If we only > use the watermark on versioned table side, we have no means to determine > whether the event in the main stream is ready to emit. > > Best, > Shengkai > > maverick 于2021年4月26日周一 上午2:31写道: >> >> Hi, >> I'm curious why Event

Re: Watermarks in Event Time Temporal Join

2021-04-25 Thread Shengkai Fang
us why Event Time Temporal Join needs watermarks from both sides > to > perform join. > > Shouldn't watermark on versioned table side be enough to perform join ? > > > > > > -- > Sent from: > http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/ >

Watermarks in Event Time Temporal Join

2021-04-25 Thread maverick
Hi, I'm curious why Event Time Temporal Join needs watermarks from both sides to perform join. Shouldn't watermark on versioned table side be enough to perform join ? -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Flink 1.11 FlinkKafkaConsumer not propagating watermarks

2021-04-21 Thread Arvid Heise
ssors using Flink 1.12, and tried to get them working on Amazon > EMR. However Amazon EMR only supports Flink 1.11.2 at the moment. When I > went to downgrade, I found, inexplicably, that watermarks were no longer > propagating. > > There is only one partition on the topic, and p

Re: proper way to manage watermarks with messages combining multiple timestamps

2021-04-19 Thread Arvid Heise
Hi Mathieu, The easiest way is to already emit several inputs on the source level. If you use DeserializationSchema, try to use the method with the collector. The watermarks should then be generated as if you would only receive one element at a time. On Sun, Apr 18, 2021 at 11:08 AM Mathieu D

Re: proper way to manage watermarks with messages combining multiple timestamps

2021-04-18 Thread Mathieu D
aggregation m2 in the 9:00-10:00 aggregation What's the proper way to set the watermarks in such a case ? Thanks for your insights ! Mathieu Le sam. 17 avr. 2021 à 07:05, Lasse Nedergaard < lassenedergaardfl...@gmail.com> a écrit : > Hi > > One thing to remember is that Flinks wate

proper way to manage watermarks with messages combining multiple timestamps

2021-04-16 Thread Mathieu D
Hello, I'm totally new to Flink, and I'd like to make sure I understand things properly around watermarks. We're processing messages from iot devices. Those messages have a timestamp, and we have a first phase of processing based on this timestamp. So far so good. These messages actually "

Flink 1.11 FlinkKafkaConsumer not propagating watermarks

2021-04-14 Thread Edward Bingham
Hi everyone, I'm seeing some strange behavior from FlinkKafkaConsumer. I wrote up some Flink processors using Flink 1.12, and tried to get them working on Amazon EMR. However Amazon EMR only supports Flink 1.11.2 at the moment. When I went to downgrade, I found, inexplicably, that watermarks were

Re: how to propagate watermarks across multiple jobs

2021-03-04 Thread yidan zhao
Thank you. Yuan Mei 于2021年3月4日周四 下午11:10写道: > Hey Yidan, > > KafkaShuffle is initially motivated to support shuffle data > materialization on Kafka, and started with a limited version supporting > hash-partition only. Watermark is maintained and forwarded as part of > shuffle data. So you are

Re: how to propagate watermarks across multiple jobs

2021-03-04 Thread Yuan Mei
Hey Yidan, KafkaShuffle is initially motivated to support shuffle data materialization on Kafka, and started with a limited version supporting hash-partition only. Watermark is maintained and forwarded as part of shuffle data. So you are right, watermark storing/forwarding logic has nothing to do

Re: how to propagate watermarks across multiple jobs

2021-03-04 Thread yidan zhao
And do you know when kafka consumer/producer will be re implemented according to the new source/sink api? I am thinking whether I should adjust the code for now, since I need to re adjust the code when it is reconstructed to the new source/sink api. yidan zhao 于2021年3月4日周四 下午4:44写道: > I

Re: how to propagate watermarks across multiple jobs

2021-03-04 Thread yidan zhao
I uploaded a picture to describe that. https://ftp.bmp.ovh/imgs/2021/03/2068f2e22045e696.png >

Re: how to propagate watermarks across multiple jobs

2021-03-04 Thread yidan zhao
One more question, If I only need watermark's logic, not keyedStream, why not provide methods such as writeDataStream and readDataStream. It uses the similar methods for kafka producer sink records and broadcast watermark to partitions and then kafka consumers read it and regenerate the watermark.

Re: how to propagate watermarks across multiple jobs

2021-03-03 Thread Piotr Nowojski
Great :) Just one more note. Currently FlinkKafkaShuffle has a critical bug [1] that probably will prevent you from using it directly. I hope it will be fixed in some next release. In the meantime you can just inspire your solution with the source code. Best, Piotrek [1]

Re: how to propagate watermarks across multiple jobs

2021-03-03 Thread yidan zhao
Yes, you are right and thank you. I take a brief look at what FlinkKafkaShuffle is doing, it seems what I need and I will have a try. >

Re: how to propagate watermarks across multiple jobs

2021-03-03 Thread Piotr Nowojski
Hi, Can not you write the watermark as a special event to the "mid-topic"? In the "new job2" you would parse this event and use it to assign watermark before `xxxWindow2`? I believe this is what FlinkKafkaShuffle is doing [1], you could look at its code for inspiration. Piotrek [1]

how to propagate watermarks across multiple jobs

2021-03-01 Thread yidan zhao
I have a job which includes about 50+ tasks. I want to split it to multiple jobs, and the data is transferred through Kafka, but how about watermark? Is anyone have do something similar and solved this problem? Here I give an example: The original job: kafkaStream1(src-topic) => xxxProcess =>

Re: Watermarks on map operator

2021-02-05 Thread David Anderson
Basically the only thing that Watermarks do is to trigger event time timers. Event time timers are used explicitly in KeyedProcessFunctions, but are also used internally by time windows, CEP (to sort the event stream), in various time-based join operations, and within the Table/SQL API. If you

Re: Watermarks on map operator

2021-02-04 Thread Kezhu Wang
> it is not clear to me if watermarks are also used by map/flatmpat operators or just by window operators. Watermarks are most liked only used by timing segmented aggregation operator to trigger result materialization. In streaming, this “timing segmentation” is usually called “windowing”,

Watermarks on map operator

2021-02-04 Thread Antonis Papaioannou
Hi, reading through the documentation regarding waterrmarks, it is not clear to me if watermarks are also used by map/flatmpat operators or just by window operators. My application reads from a kafka topic (with multiple partitions) and extracts assigns timestamp on each tuple based on some

Re: Question about Watermarks within a KeyedProcessFunction

2020-06-27 Thread David Anderson
With an AscendingTimestampExtractor, watermarks are not created for every event, and as your job starts up, some events will be processed before the first watermark is generated. The impossible value you see is an initial value that's in place until the first real watermark is available

Question about Watermarks within a KeyedProcessFunction

2020-06-26 Thread Marco Villalobos
My source is a Kafka topic. I am using Event Time. I assign the event time with an AscendingTimestampExtractor I noticed when debugging that in the KeyedProcessFunction that after my highest known event time of: 2020-06-23T00:46:30.000Z the processElement method had a watermark with an

Re: Interaction of watermarks and windows

2020-06-21 Thread Benchao Li
ling window operator has no > effect on the watermarks of the resulting append stream - the watermarks of > the input stream are propagated as-is. > > This seems to be a documented behavior > https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/windows.html#interaction

Re: Interaction of watermarks and windows

2020-06-21 Thread Jark Wu
ling window operator has no > effect on the watermarks of the resulting append stream - the watermarks of > the input stream are propagated as-is. > > This seems to be a documented behavior > https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/windows.html#interaction-of-wate

Interaction of watermarks and windows

2020-06-21 Thread Sergii Mikhtoniuk
that the tumbling window operator has no effect on the watermarks of the resulting append stream - the watermarks of the input stream are propagated as-is. This seems to be a documented behavior https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/windows.html#interaction

  1   2   3   >