Hi Robin,
This is a very good observation, and it may even be unintended behavior.
Maybe Arvid in CC is more familiar with the checkpointing?
Regards,
Timo
On 02.04.20 15:37, Robin Cassan wrote:
Hi all,
I am wondering if there is a way to make a Flink job fail (not cancel
it) when one or several checkpoints have failed due to being expired
(taking longer than the timeout)?
I am using Flink 1.9.2 and have set
`setTolerableCheckpointFailureNumber(1)`, which doesn't do the trick.
Looking into the CheckpointFailureManager.java class, it looks like this
only works when the checkpoint failure reason is
`CHECKPOINT_DECLINED`, but the number of failures isn't incremented on
`CHECKPOINT_EXPIRED`.
Am I missing something?
Thanks!
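
For anyone following the thread, here is a minimal toy model of the behavior described above. It is not Flink's actual `CheckpointFailureManager` code: the class `ToyFailureCounter`, its methods, and the configurable set of counted reasons are all hypothetical and exist only for illustration. It assumes the job should fail once the number of consecutive counted failures exceeds the tolerable number, and that only failure reasons in the counted set increment the counter.

```java
// Toy model only: NOT Flink's CheckpointFailureManager.
// All class and method names here are hypothetical.
import java.util.EnumSet;

class ToyFailureCounter {
    // The two failure reasons mentioned in this thread.
    enum Reason { CHECKPOINT_DECLINED, CHECKPOINT_EXPIRED }

    private final int tolerableFailures;
    private final EnumSet<Reason> countedReasons;
    private int continuousFailures = 0;

    ToyFailureCounter(int tolerableFailures, EnumSet<Reason> countedReasons) {
        this.tolerableFailures = tolerableFailures;
        this.countedReasons = EnumSet.copyOf(countedReasons);
    }

    // Records one failed checkpoint. Returns true when the number of
    // consecutive counted failures exceeds the tolerable number,
    // i.e. when this toy model would fail the job.
    boolean onCheckpointFailure(Reason reason) {
        if (countedReasons.contains(reason)) {
            continuousFailures++;
        }
        return continuousFailures > tolerableFailures;
    }

    // A successful checkpoint resets the consecutive-failure count.
    void onCheckpointSuccess() {
        continuousFailures = 0;
    }

    public static void main(String[] args) {
        // Counts only declined checkpoints, as described in the thread.
        ToyFailureCounter counter =
                new ToyFailureCounter(1, EnumSet.of(Reason.CHECKPOINT_DECLINED));
        boolean shouldFail = false;
        for (int i = 0; i < 5; i++) {
            shouldFail |= counter.onCheckpointFailure(Reason.CHECKPOINT_EXPIRED);
        }
        System.out.println("Job fails after 5 expired checkpoints: " + shouldFail);
    }
}
```

In this sketch, a counter that only counts `CHECKPOINT_DECLINED` never reaches the threshold, no matter how many checkpoints expire. Adding `CHECKPOINT_EXPIRED` to the counted set makes the second consecutive expiry trip a threshold of 1.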