Hi all,

We run a Flink application on Kubernetes in Application Mode, using Kafka with exactly-once semantics and high availability.
We are looking into a specific failure scenario: a Flink job is configured with too short a checkpoint timeout (execution.checkpointing.timeout), and at some point during its execution checkpoints begin to fail.

Is there a way to update the checkpoint timeout of this job in place, i.e. without creating a new job or restoring from an old savepoint/checkpoint?

Note: one idea is to take a savepoint and then restore from it with the new configuration. However, this is not possible here: if checkpoints are timing out, savepoints time out as well, so a savepoint cannot be taken.

Are there any other ways to handle this situation? We want to ensure exactly-once semantics are respected.

Thanks in advance!
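
P.S. For reference, the savepoint-and-restore idea mentioned above would amount to restarting the application with configuration roughly along these lines. This is only a sketch: the timeout value and the savepoint path are placeholders, not our actual settings.

```yaml
# Flink configuration fragment (illustrative values only)

# Raised checkpoint timeout; 30min is a placeholder value
execution.checkpointing.timeout: 30min

# Savepoint to restore from on startup; this path is hypothetical
execution.savepoint.path: s3://example-bucket/savepoints/savepoint-example
```

As noted, this only helps if a savepoint can actually be completed, which is exactly what fails in our scenario.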
