Re: Weird Flink Kafka source watermark behavior

2022-04-13 Thread Jin Yi
sue happens even if I use an idle watermark. > >>> > >>> > >>> > >>> You would expect to see glitches with watermarking when you enable > idleness. > >>> > >>> Idleness sort of trades watermark correctness for reduces latency

Re: Weird Flink Kafka source watermark behavior

2022-04-13 Thread Qingsheng Ren
f trades watermark correctness for reduces latency when >>> processing timers (much simplified). >>> >>> With idleness enabled you have no guaranties whatsoever as to the quality >>> of watermarks (which might be ok in some cases). >>> >>> BTW

Re: Weird Flink Kafka source watermark behavior

2022-04-13 Thread Qingsheng Ren
> > enabling idleness would break everything. > > > > > > > > Oversight put aside things should work the way you implemented it. > > > > > > > > One thing I could imagine to be a cause is > > > > • that over time the k

Re: Weird Flink Kafka source watermark behavior

2022-04-08 Thread Martijn Visser
;> - The same issue happens even if I use an idle watermark. >>> > >>> > >>> > >>> > You would expect to see glitches with watermarking when you enable >>> idleness. >>> > >>> > Idleness sort o

Re: Weird Flink Kafka source watermark behavior

2022-04-08 Thread Jin Yi
cy when >> processing timers (much simplified). >> > >> > With idleness enabled you have no guaranties whatsoever as to the >> quality of watermarks (which might be ok in some cases). >> > >> > BTW we dominantly use a mix of fast and slow sources (that only update >> once a da

Re: Weird Flink Kafka source watermark behavior

2022-04-08 Thread Jin Yi
urces (that only update > once a day) which hand-pimped watermarking and late event processing, and > enabling idleness would break everything. > > > > > > > > Oversight put aside things should work the way you implemented it. > > > > > > > > One thing I c

Re: Weird Flink Kafka source watermark behavior

2022-04-08 Thread Qingsheng Ren
consumer subtasks which would probably stress correct recalculation of > watermarks. Hence #partition == number subtask might reduce the problem > • can you enable logging of partition-consumer assignment, to see if > that is the cause of the problem > • also involuntary r

Re: Weird Flink Kafka source watermark behavior

2022-03-18 Thread Dongwon Kim
ntary restarts of the job can cause havoc as this resets >watermarking > > > > I’ll be off next week, unable to take part in the active discussion … > > > > Sincere greetings > > > > Thias > > > > > > > > > > *From:* Dan Hill > *

RE: Weird Flink Kafka source watermark behavior

2022-03-18 Thread Schwalbe Matthias
Oops mistyped your name, Dan From: Schwalbe Matthias Sent: Freitag, 18. März 2022 09:02 To: 'Dan Hill' ; Dongwon Kim Cc: user Subject: RE: Weird Flink Kafka source watermark behavior Hi San, Dongwon, I share the opinion that when per-partition watermarking is enabled, you shoul

RE: Weird Flink Kafka source watermark behavior

2022-03-18 Thread Schwalbe Matthias
ongwon Kim Cc: user Subject: Re: Weird Flink Kafka source watermark behavior ⚠EXTERNAL MESSAGE – CAUTION: Think Before You Click ⚠ I'll try forcing # source tasks = # partitions tomorrow. Thank you, Dongwon, for all of your help! On Fri, Mar 18, 2022 at 12:20 AM Dongwon Kim mail

Re: Weird Flink Kafka source watermark behavior

2022-03-18 Thread Dan Hill
I'll try forcing # source tasks = # partitions tomorrow. Thank you, Dongwon, for all of your help! On Fri, Mar 18, 2022 at 12:20 AM Dongwon Kim wrote: > I believe your job with per-partition watermarking should be working okay > even in a backfill scenario. > > BTW, is the problem still observe

Re: Weird Flink Kafka source watermark behavior

2022-03-18 Thread Dongwon Kim
I believe your job with per-partition watermarking should be working okay even in a backfill scenario. BTW, is the problem still observed even with # sour tasks = # partitions? For committers: Is there a way to confirm that per-partition watermarking is used in TM log? On Fri, Mar 18, 2022 at 4:

Re: Weird Flink Kafka source watermark behavior

2022-03-18 Thread Dan Hill
I hit this using event processing and no idleness detection. The same issue happens if I enable idleness. My code matches the code example for per-partition watermarking

Re: Weird Flink Kafka source watermark behavior

2022-03-18 Thread Dongwon Kim
Hi Dan, I'm quite confused as you already use per-partition watermarking. What I meant in the reply is - If you don't use per-partition watermarking, # tasks < # partitions can cause the problem for backfill jobs. - If you don't use per-partition watermarking, # tasks = # partitions is going to b

Re: Weird Flink Kafka source watermark behavior

2022-03-17 Thread Dan Hill
Thanks Dongwon! Wow. Yes, I'm using per-partition watermarking [1]. Yes, my # source tasks < # kafka partitions. This should be called out in the docs or the bug should be fixed. On Thu, Mar 17, 2022 at 10:54 PM Dongwon Kim wrote: > Hi Dan, > > Do you use the per-partition watermarking expla

Re: Weird Flink Kafka source watermark behavior

2022-03-17 Thread Dongwon Kim
Hi Dan, Do you use the per-partition watermarking explained in [1]? I've also experienced a similar problem when running backfill jobs specifically when # source tasks < # kafka partitions. - When # source tasks = # kafka partitions, the backfill job works as expected. - When # source tasks < # ka

Re: Weird Flink Kafka source watermark behavior

2022-03-17 Thread Dan Hill
I'm following the example from this section: https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/event-time/generating_watermarks/#watermark-strategies-and-the-kafka-connector On Thu, Mar 17, 2022 at 10:26 PM Dan Hill wrote: > Other points > - I'm using the kafka timestamp a

Re: Weird Flink Kafka source watermark behavior

2022-03-17 Thread Dan Hill
Other points - I'm using the kafka timestamp as event time. - The same issue happens even if I use an idle watermark. On Thu, Mar 17, 2022 at 10:17 PM Dan Hill wrote: > There are 12 Kafka partitions (to keep the structure similar to other low > traffic environments). > > On Thu, Mar 17, 2022 at

Re: Weird Flink Kafka source watermark behavior

2022-03-17 Thread Dan Hill
There are 12 Kafka partitions (to keep the structure similar to other low traffic environments). On Thu, Mar 17, 2022 at 10:13 PM Dan Hill wrote: > Hi. > > I'm running a backfill from a kafka topic with very few records spread > across a few days. I'm seeing a case where the records coming from

Weird Flink Kafka source watermark behavior

2022-03-17 Thread Dan Hill
Hi. I'm running a backfill from a kafka topic with very few records spread across a few days. I'm seeing a case where the records coming from a kafka source have a watermark that's more recent (by hours) than the event time. I haven't seen this before when running this. This violates what I'd as