Hi Samuel,

Maybe related to this https://issues.apache.org/jira/browse/FLINK-28975? See
also:
-
https://stackoverflow.com/questions/72654182/flink-interval-join-datastream-with-kafkasource-drops-all-records

I left a similar comment in your SO post.

Regards,

Salva

On Mon, Nov 7, 2022 at 7:27 AM Samuel Chase <[email protected]> wrote:

> Hello,
>
> At work we are using Flink to store timers and notify us when they are
> triggered. It's been working great over several versions over the
> years. Flink 1.5 -> Flink 1.9 -> Flink 1.15.2.
>
> A few months ago we upgraded from Flink 1.9 to Flink 1.15.2. In the
> process we had to upgrade all the Flink API code in our job to use the
> new APIs.
>
> Our job code has a Kafka Source and a Kafka Sink. For our Source, we
> are currently using `WatermarkStrategy.noWatermarks()`. It has been
> running fine ever since we upgraded, but in the last few weeks we have
> faced two outages.
>
> Configuration:
>
> 2 JobManager nodes
> 5 TaskManager nodes (4 slots each)
> Parallelism: 16
> Source topic: 30 partitions
> Using `setStartingOffsets(OffsetsInitializer.latest())` while
> initializing the source.
>
> Outage #1
>
> Our monitoring system alerted us that lag is building up on one
> partition (out of 30). We did not know of anything we could to do
> jumpstart consumption on that partition other than by forcing a
> reassignment. When the TaskManager service on the node to which the
> partition was assigned was restarted, the lag reduced soon after.
>
> Outage #2
>
> Something similar happened again, but this time, lag was building up
> on 9 (out of 30) partitions. Once again, we restarted the TaskManager
> services on all the nodes, and it started consuming once again.
>
> We asked a question on SO,
> https://stackoverflow.com/q/74272277/2165719 and was directed to ask
> on the mailing list as well.
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-182%3A+Support+watermark+alignment+of+FLIP-27+Sources
>
> In another post, https://stackoverflow.com/a/70101290/2165719 there is
> a suggestion to use `WatermarkStrategy.withIdleness(...)`. Could this
> help us?
>
> Any help/guidance here would be much appreciated.
>
> Thanks,
>

Reply via email to