Hi all,

Is there any way to prevent a Flink job from restarting, or to overwrite the
checkpoint metadata, when a checkpoint with the same name already exists?
I get the following exception and my job restarts. I have been looking for a
solution for a long time but haven't found anything useful yet, other than
cleaning up manually.

2023-02-27 10:00:50,360 WARN org.apache.flink.runtime.checkpoint.CheckpointFailureManager [] - Failed to trigger or complete checkpoint 1 for job 000000006e6b13320000000000000000. (0 consecutive failed attempts so far)
org.apache.flink.runtime.checkpoint.CheckpointException: Failure to finalize checkpoint.
    at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.finalizeCheckpoint(CheckpointCoordinator.java:1375) ~[event_executor-1.0-SNAPSHOT.jar:?]
    at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.completePendingCheckpoint(CheckpointCoordinator.java:1265) ~[event_executor-1.0-SNAPSHOT.jar:?]
    at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.receiveAcknowledgeMessage(CheckpointCoordinator.java:1157) ~[event_executor-1.0-SNAPSHOT.jar:?]
    at org.apache.flink.runtime.scheduler.ExecutionGraphHandler.lambda$acknowledgeCheckpoint$1(ExecutionGraphHandler.java:89) ~[event_executor-1.0-SNAPSHOT.jar:?]
    at org.apache.flink.runtime.scheduler.ExecutionGraphHandler.lambda$processCheckpointCoordinatorMessage$3(ExecutionGraphHandler.java:119) ~[event_executor-1.0-SNAPSHOT.jar:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
    at java.lang.Thread.run(Thread.java:834) [?:?]
Caused by: java.io.IOException: Target file file:/opt/flink/pm/checkpoint/000000006e6b13320000000000000000/chk-1/_metadata already exists.
    at org.apache.flink.runtime.state.filesystem.FsCheckpointMetadataOutputStream.getOutputStreamWrapper(FsCheckpointMetadataOutputStream.java:168) ~[event_executor-1.0-SNAPSHOT.jar:?]
    at org.apache.flink.runtime.state.filesystem.FsCheckpointMetadataOutputStream.<init>(FsCheckpointMetadataOutputStream.java:64) ~[event_executor-1.0-SNAPSHOT.jar:?]
    at org.apache.flink.runtime.state.filesystem.FsCheckpointStorageLocation.createMetadataOutputStream(FsCheckpointStorageLocation.java:109) ~[event_executor-1.0-SNAPSHOT.jar:?]
    at org.apache.flink.runtime.checkpoint.PendingCheckpoint.finalizeCheckpoint(PendingCheckpoint.java:332) ~[event_executor-1.0-SNAPSHOT.jar:?]
    at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.finalizeCheckpoint(CheckpointCoordinator.java:1361) ~[event_executor-1.0-SNAPSHOT.jar:?]
    ... 7 more

2023-02-27 10:00:50,374 WARN org.apache.flink.runtime.jobmaster.JobMaster [] - Error while processing AcknowledgeCheckpoint message
org.apache.flink.runtime.checkpoint.CheckpointException: Could not finalize the pending checkpoint 1. Failure reason: Failure to finalize checkpoint.
    at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.finalizeCheckpoint(CheckpointCoordinator.java:1381) ~[event_executor-1.0-SNAPSHOT.jar:?]
    at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.completePendingCheckpoint(CheckpointCoordinator.java:1265) ~[event_executor-1.0-SNAPSHOT.jar:?]
    at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.receiveAcknowledgeMessage(CheckpointCoordinator.java:1157) ~[event_executor-1.0-SNAPSHOT.jar:?]
    at org.apache.flink.runtime.scheduler.ExecutionGraphHandler.lambda$acknowledgeCheckpoint$1(ExecutionGraphHandler.java:89) ~[event_executor-1.0-SNAPSHOT.jar:?]
    at org.apache.flink.runtime.scheduler.ExecutionGraphHandler.lambda$processCheckpointCoordinatorMessage$3(ExecutionGraphHandler.java:119) ~[event_executor-1.0-SNAPSHOT.jar:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
    at java.lang.Thread.run(Thread.java:834) [?:?]
Caused by: java.io.IOException: Target file file:/opt/flink/pm/checkpoint/000000006e6b13320000000000000000/chk-1/_metadata already exists.
    at org.apache.flink.runtime.state.filesystem.FsCheckpointMetadataOutputStream.getOutputStreamWrapper(FsCheckpointMetadataOutputStream.java:168) ~[event_executor-1.0-SNAPSHOT.jar:?]

Please let me know if anyone knows how to resolve this issue.

Thanks and Regards

Amenreet Singh Sodhi
