Hello,
Just a shot in the dark here, but could it be related to
https://issues.apache.org/jira/browse/FLINK-32241 ?
Such failures can cause many exceptions, but I think the ones you've
included aren't pointing to the root cause, so I'm not sure if that issue
applies to you.
Regards,
Alexis.
On
Hi Yanfei,
We were never able to restore from a checkpoint; we ended up restoring from
a savepoint as a fallback. Would those logs suggest we failed to take a
checkpoint before the job manager restarted? Our observability monitors
showed no failed checkpoints.
Here is an exception that occurred
Hey Jacqlyn,
According to the stack trace, there seems to be a problem when
the checkpoint is triggered. Does this problem occur after the restore?
Could you share some logs related to the restore?
Best,
Yanfei
Jacqlyn Bender via user wrote on Fri, Sep 8, 2023 at 05:11:
>
> Hey folks,
>
>
> We
Hey folks,
We experienced a pipeline failure where our job manager restarted and we
were for some reason unable to restore from our last successful checkpoint.
Checkpoints had completed regularly every 10 minutes up to this failure,
with 0 failed checkpoints logged. We are using Flink version 1.17.1.