Hi all,

We run a Flink application on Kubernetes in Application Mode, using Kafka
with exactly-once semantics and high availability.

We are looking into a specific failure scenario: a Flink job whose
checkpoint timeout (execution.checkpointing.timeout) is too short, so that
at some point during the job's execution checkpoints begin to fail.
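
For reference, the setting in question would look roughly like this in the
Flink configuration. The values are illustrative placeholders, not our
actual settings:

```yaml
# Illustrative values only (assumed for this example).
execution.checkpointing.interval: 1min
execution.checkpointing.timeout: 2min   # too short for this job, so checkpoints expire
```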

Is there a way to update the checkpoint timeout
(execution.checkpointing.timeout) of this job in place, i.e. without
creating a new job or restoring from an old savepoint/checkpoint?

Note: one idea would be to take a savepoint and then restore from it with
the new configuration. However, this is not possible here: if checkpoints
are timing out, savepoints time out as well, so no savepoint can be taken.
Are there any other ways to handle this situation?

We want to ensure exactly-once semantics are respected.

Thanks in advance!
