Hi Samuel,

I'm glad to hear that! Let us know how you finally solve the problem.
Personally, I'd upgrade to 1.15.3.

Salva

On Mon, Nov 7, 2022 at 9:42 AM Samuel Chase <[email protected]> wrote:

> Hi Salva,
>
> Thanks for the pointers. They were helpful in gaining a better
> understanding of what happened.
>
> In both situations, these outages occurred at a time of the lowest
> traffic in a day. Due to business-logic reasons, we are using a
> partition key which may not result in even distribution across all
> partitions. It seems conceivable to me that during times of low
> traffic some partitions may not see any events for some time.
>
> Now, with no watermarking strategy, I believe we are running into the
> problem described in
>
> https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/dev/datastream/event-time/generating_watermarks/#dealing-with-idle-sources
> where the watermark cannot move forward. The docs recommend using a
> watermark strategy with `withIdleness`, which detects idle inputs so
> that they do not hold the watermark back.
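
The effect described above can be shown with a toy sketch (this is an
illustration of the min-combining behavior, not Flink's actual internals):

```java
// Toy illustration (not Flink internals): the downstream watermark is the
// minimum over per-partition watermarks, so one stalled partition pins the
// whole thing unless it is flagged idle and excluded from the minimum.
public class IdlePartitionDemo {
    static long combined(long[] watermarks, boolean[] idle) {
        long min = Long.MAX_VALUE;
        boolean anyActive = false;
        for (int i = 0; i < watermarks.length; i++) {
            if (!idle[i]) {
                min = Math.min(min, watermarks[i]);
                anyActive = true;
            }
        }
        // With every partition idle there is nothing to combine.
        return anyActive ? min : Long.MIN_VALUE;
    }

    public static void main(String[] args) {
        long[] wm = {1000L, 1005L, 10L};        // partition 2 saw no recent events
        boolean[] idle = {false, false, false};
        System.out.println(combined(wm, idle)); // held back by partition 2
        idle[2] = true;                         // what withIdleness would do
        System.out.println(combined(wm, idle)); // watermark advances
    }
}
```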
>
> However, https://issues.apache.org/jira/browse/FLINK-28975 suggests
> there is an issue with `withIdleness`. Am I right in understanding
> that we cannot safely use `withIdleness` in the watermark strategy
> for now? Does this affect me? I am not doing any joins on the
> streams; I just need messages to be sent to the sink without lag
> building up.
>
> Plan of action:
>
> A. Use Watermarking strategy `withIdleness` on 1.15.2 (which is
> affected by FLINK-28975).
>
> OR
>
> B. Upgrade to 1.15.3 (FLINK-28975 is fixed here) with a Watermarking
> strategy `withIdleness`.
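
For option B, the change on the source side would be a small configuration
sketch along these lines (assuming Flink 1.15.3; `MyEvent` and its
`getEventTime()` getter are made-up placeholders, and the durations are
arbitrary examples, not recommendations):

```java
import java.time.Duration;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;

// Sketch: replace WatermarkStrategy.noWatermarks() with a strategy that
// bounds out-of-orderness and marks a partition idle after one minute
// without events, so quiet partitions stop holding the watermark back.
WatermarkStrategy<MyEvent> strategy =
    WatermarkStrategy
        .<MyEvent>forBoundedOutOfOrderness(Duration.ofSeconds(5))
        .withTimestampAssigner((event, ts) -> event.getEventTime()) // hypothetical getter
        .withIdleness(Duration.ofMinutes(1));

// env.fromSource(kafkaSource, strategy, "kafka-source");
```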
>
> On Mon, Nov 7, 2022 at 1:05 PM Salva Alcántara <[email protected]>
> wrote:
> >
> > Hi Samuel,
> >
> > Maybe related to this https://issues.apache.org/jira/browse/FLINK-28975?
> See also:
> > -
> https://stackoverflow.com/questions/72654182/flink-interval-join-datastream-with-kafkasource-drops-all-records
> >
> > I left a similar comment in your SO post.
> >
> > Regards,
> >
> > Salva
> >
> > On Mon, Nov 7, 2022 at 7:27 AM Samuel Chase <[email protected]>
> wrote:
> >>
> >> Hello,
> >>
> >> At work we are using Flink to store timers and notify us when they
> >> are triggered. It has worked well across several versions over the
> >> years: Flink 1.5 -> Flink 1.9 -> Flink 1.15.2.
> >>
> >> A few months ago we upgraded from Flink 1.9 to Flink 1.15.2. In the
> >> process we had to upgrade all the Flink API code in our job to use the
> >> new APIs.
> >>
> >> Our job code has a Kafka Source and a Kafka Sink. For our Source, we
> >> are currently using `WatermarkStrategy.noWatermarks()`. It has been
> >> running fine ever since we upgraded, but in the last few weeks we have
> >> faced two outages.
> >>
> >> Configuration:
> >>
> >> 2 JobManager nodes
> >> 5 TaskManager nodes (4 slots each)
> >> Parallelism: 16
> >> Source topic: 30 partitions
> >> Using `setStartingOffsets(OffsetsInitializer.latest())` while
> >> initializing the source.
> >>
> >> Outage #1
> >>
> >> Our monitoring system alerted us that lag was building up on one
> >> partition (out of 30). We did not know of anything we could do to
> >> jumpstart consumption on that partition other than forcing a
> >> reassignment. Once we restarted the TaskManager service on the node
> >> to which the partition was assigned, the lag soon dropped.
> >>
> >> Outage #2
> >>
> >> Something similar happened again, but this time lag was building up
> >> on 9 (out of 30) partitions. We restarted the TaskManager services
> >> on all the nodes, and consumption resumed.
> >>
> >> We asked a question on SO,
> >> https://stackoverflow.com/q/74272277/2165719, and were directed to
> >> ask on the mailing list as well.
> >>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-182%3A+Support+watermark+alignment+of+FLIP-27+Sources
> >>
> >> In another post, https://stackoverflow.com/a/70101290/2165719, there
> >> is a suggestion to use `WatermarkStrategy.withIdleness(...)`. Could
> >> this help us?
> >>
> >> Any help/guidance here would be much appreciated.
> >>
> >> Thanks,
>
