Hi Amenreet,

As Hangxiang said, you should use a new checkpoint dir if the new job has the same jobId as the old one. Otherwise, do not use a fixed jobId, so that the checkpoint dirs do not conflict.

Best,
Hang

Hangxiang Yu <master...@gmail.com> wrote on Wed, May 10, 2023 at 10:35:

> Hi,
> I guess you used a fixed JOB_ID and configured the same checkpoint dir as
> before?
> And you may also have started the job without the previous state?
> The new job cannot know anything about the previous checkpoints, which is
> why it fails when it tries to generate a new checkpoint.
> I'd suggest using a different JOB_ID for different jobs, or setting a
> different checkpoint dir for a new job.
>
> On Tue, May 9, 2023 at 9:38 PM amenreet sodhi <amenso...@gmail.com> wrote:
>
>> Hi all,
>>
>> Is there any way to prevent the restart of a Flink job, or to override the
>> checkpoint metadata, if for some reason a checkpoint with the same name
>> already exists? I get the following exception and my job restarts. I have
>> been trying to find a solution for a very long time but haven't found
>> anything useful yet, other than manual cleanup.
>>
>> 2023-02-27 10:00:50,360 WARN org.apache.flink.runtime.checkpoint.CheckpointFailureManager [] - Failed to trigger or complete checkpoint 1 for job 000000006e6b13320000000000000000. (0 consecutive failed attempts so far)
>> org.apache.flink.runtime.checkpoint.CheckpointException: Failure to finalize checkpoint.
>>     at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.finalizeCheckpoint(CheckpointCoordinator.java:1375) ~[event_executor-1.0-SNAPSHOT.jar:?]
>>     at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.completePendingCheckpoint(CheckpointCoordinator.java:1265) ~[event_executor-1.0-SNAPSHOT.jar:?]
>>     at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.receiveAcknowledgeMessage(CheckpointCoordinator.java:1157) ~[event_executor-1.0-SNAPSHOT.jar:?]
>>     at org.apache.flink.runtime.scheduler.ExecutionGraphHandler.lambda$acknowledgeCheckpoint$1(ExecutionGraphHandler.java:89) ~[event_executor-1.0-SNAPSHOT.jar:?]
>>     at org.apache.flink.runtime.scheduler.ExecutionGraphHandler.lambda$processCheckpointCoordinatorMessage$3(ExecutionGraphHandler.java:119) ~[event_executor-1.0-SNAPSHOT.jar:?]
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
>>     at java.lang.Thread.run(Thread.java:834) [?:?]
>> Caused by: java.io.IOException: Target file file:/opt/flink/pm/checkpoint/000000006e6b13320000000000000000/chk-1/_metadata already exists.
>>     at org.apache.flink.runtime.state.filesystem.FsCheckpointMetadataOutputStream.getOutputStreamWrapper(FsCheckpointMetadataOutputStream.java:168) ~[event_executor-1.0-SNAPSHOT.jar:?]
>>     at org.apache.flink.runtime.state.filesystem.FsCheckpointMetadataOutputStream.<init>(FsCheckpointMetadataOutputStream.java:64) ~[event_executor-1.0-SNAPSHOT.jar:?]
>>     at org.apache.flink.runtime.state.filesystem.FsCheckpointStorageLocation.createMetadataOutputStream(FsCheckpointStorageLocation.java:109) ~[event_executor-1.0-SNAPSHOT.jar:?]
>>     at org.apache.flink.runtime.checkpoint.PendingCheckpoint.finalizeCheckpoint(PendingCheckpoint.java:332) ~[event_executor-1.0-SNAPSHOT.jar:?]
>>     at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.finalizeCheckpoint(CheckpointCoordinator.java:1361) ~[event_executor-1.0-SNAPSHOT.jar:?]
>>     ... 7 more
>>
>> 2023-02-27 10:00:50,374 WARN org.apache.flink.runtime.jobmaster.JobMaster [] - Error while processing AcknowledgeCheckpoint message
>> org.apache.flink.runtime.checkpoint.CheckpointException: Could not finalize the pending checkpoint 1. Failure reason: Failure to finalize checkpoint.
>>     at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.finalizeCheckpoint(CheckpointCoordinator.java:1381) ~[event_executor-1.0-SNAPSHOT.jar:?]
>>     at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.completePendingCheckpoint(CheckpointCoordinator.java:1265) ~[event_executor-1.0-SNAPSHOT.jar:?]
>>     at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.receiveAcknowledgeMessage(CheckpointCoordinator.java:1157) ~[event_executor-1.0-SNAPSHOT.jar:?]
>>     at org.apache.flink.runtime.scheduler.ExecutionGraphHandler.lambda$acknowledgeCheckpoint$1(ExecutionGraphHandler.java:89) ~[event_executor-1.0-SNAPSHOT.jar:?]
>>     at org.apache.flink.runtime.scheduler.ExecutionGraphHandler.lambda$processCheckpointCoordinatorMessage$3(ExecutionGraphHandler.java:119) ~[event_executor-1.0-SNAPSHOT.jar:?]
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
>>     at java.lang.Thread.run(Thread.java:834) [?:?]
>> Caused by: java.io.IOException: Target file file:/opt/flink/pm/checkpoint/000000006e6b13320000000000000000/chk-1/_metadata already exists.
>>     at org.apache.flink.runtime.state.filesystem.FsCheckpointMetadataOutputStream.getOutputStreamWrapper(FsCheckpointMetadataOutputStream.java:168) ~[event_executor-1.0-SNAPSHOT.jar:?]
>>
>> Please let me know if anyone knows how to resolve this issue.
>>
>> Thanks and Regards
>>
>> Amenreet Singh Sodhi
>
> --
> Best,
> Hangxiang.
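
One way to follow the advice in this thread (a different checkpoint dir for each new job that reuses a jobId) is to derive the directory from a per-run identifier. The following is only a minimal sketch, not Flink API: the class `CheckpointDirs`, its methods, and the job name `event-executor` are hypothetical names made up for illustration, and it assumes the resulting path is then supplied to the job as the `state.checkpoints.dir` setting.

```java
import java.util.UUID;

// Hypothetical helper, not part of Flink: builds a checkpoint base directory
// that differs for every run, so chk-N/_metadata from an older run with the
// same jobId is never in the way.
final class CheckpointDirs {

    // Joins baseDir, jobName and runId into "<baseDir>/<jobName>-<runId>".
    // Trailing slashes on baseDir are dropped so the result has no "//".
    public static String uniqueCheckpointDir(String baseDir, String jobName, String runId) {
        String base = baseDir.replaceAll("/+$", "");
        return base + "/" + jobName + "-" + runId;
    }

    // A random identifier for one run of the job.
    public static String newRunId() {
        return UUID.randomUUID().toString();
    }

    public static void main(String[] args) {
        // Base path taken from the exception in this thread; the job name is an assumption.
        String dir = uniqueCheckpointDir("file:/opt/flink/pm/checkpoint", "event-executor", newRunId());
        // Assumption: this value is passed to the job as state.checkpoints.dir.
        System.out.println("state.checkpoints.dir: " + dir);
    }
}
```

Note that a fresh directory per run also means the new job starts without the earlier checkpoints unless it is explicitly restored from one, which matches the situation described above.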