Hi Kevin,

As far as I know, there is no way to change this configuration (execution.checkpointing.timeout) without restarting the job. Could you consider increasing the default value instead?
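
For example, a minimal sketch of a larger timeout in the Flink configuration file could look like the following. The 30 min value is only an illustrative assumption and should be sized to how long your checkpoints actually take. The setting is picked up when the job is submitted or restarted, not while it is running.

```yaml
# Flink configuration file (e.g. flink-conf.yaml); applied at job submission/restart only.
# Maximum time a checkpoint may take before it is declared failed and discarded.
# Flink's default is 10 min; 30 min here is an illustrative value, not a recommendation.
execution.checkpointing.timeout: 30 min
```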

Best,
Guowei

On Wed, Nov 3, 2021 at 5:15 AM Kevin Lam <[email protected]> wrote:

> Hi all,
>
> We run a Flink application on Kubernetes in Application Mode using Kafka
> with exactly-once semantics and high availability.
>
> We are looking into a specific failure scenario: a Flink job that has too
> short a checkpoint timeout (execution.checkpointing.timeout) and, at some
> point during the job's execution, checkpoints begin to fail.
>
> Is there a way to update the checkpoint timeout
> (execution.checkpointing.timeout) of this job in place, i.e. without
> creating a new job or restoring from an old savepoint/checkpoint? Note:
> one idea may be to take a savepoint and then restore from that savepoint
> with the new configuration. However, this is not possible because if
> checkpoints are timing out, so are savepoints, and thus savepoints cannot
> be taken. Are there any other ways to handle this situation?
>
> We want to ensure exactly-once semantics are respected.
>
> Thanks in advance!
>
