811d3b279c8b26ed99ff0883b7630242 is the operator ID.
If I'm not mistaken, running the job graph generation (e.g. the main
method) at DEBUG log level will log all the generated IDs. This should
help you map this ID to your code.
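
To illustrate why the restore fails, here is a rough sketch of the matching
step Robert describes below. It is a toy model in plain Java, not Flink's
actual code: the class name RestoreMatchSketch, both method names and the
placeholder ID "aaaa" are made up for this example. The idea is only that
checkpoint state is keyed by operator ID, and any ID missing from the new
job graph is rejected unless non-restored state is explicitly allowed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of restore-time state matching; hypothetical, not Flink's implementation. */
public class RestoreMatchSketch {

    /** Returns the operator IDs found in the checkpoint that the new job graph does not contain. */
    public static List<String> unmatchedOperatorIds(
            Map<String, String> checkpointState, Set<String> jobGraphOperatorIds) {
        List<String> unmatched = new ArrayList<>();
        for (String operatorId : checkpointState.keySet()) {
            if (!jobGraphOperatorIds.contains(operatorId)) {
                unmatched.add(operatorId);
            }
        }
        return unmatched;
    }

    /** Throws if some state cannot be mapped, unless the caller opted in to skipping it. */
    public static void validate(
            Map<String, String> checkpointState,
            Set<String> jobGraphOperatorIds,
            boolean allowNonRestoredState) {
        List<String> unmatched = unmatchedOperatorIds(checkpointState, jobGraphOperatorIds);
        if (!unmatched.isEmpty() && !allowNonRestoredState) {
            throw new IllegalStateException(
                    "Cannot map state for operator " + unmatched.get(0) + " to the new program");
        }
    }

    public static void main(String[] args) {
        // State written by the old job graph, keyed by operator ID (values are placeholders).
        Map<String, String> checkpointState = Map.of(
                "811d3b279c8b26ed99ff0883b7630242", "state-a",
                "aaaa", "state-b");
        // Operator IDs generated for the new job graph; the first ID above is missing.
        Set<String> jobGraphOperatorIds = Set.of("aaaa");

        try {
            validate(checkpointState, jobGraphOperatorIds, false);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
        // With the opt-in flag, the unmatched state is skipped instead of failing the restore.
        validate(checkpointState, jobGraphOperatorIds, true);
    }
}
```

In a real job, setting an explicit uid(...) on each stateful operator should
keep these IDs stable across job graph changes, which is what the docs
linked below recommend.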

On Wed, Dec 8, 2021 at 7:52 AM Dan Hill <quietgol...@gmail.com> wrote:

> Nothing changed (as far as I know).  It's the same binary and the same
> args.  It's Flink v1.12.3.  I'm going to switch away from auto-gen uids and
> see if that helps.  The job randomly started failing to checkpoint.  I
> cancelled the job and started it from the last successful checkpoint.
>
> I'm confused why `811d3b279c8b26ed99ff0883b7630242` is used and not the
> auto-generated uid.  That seems like a bug.
>
> On Tue, Dec 7, 2021 at 10:40 PM Robert Metzger <metrob...@gmail.com>
> wrote:
>
>> Hi Dan,
>>
>> When restoring a savepoint/checkpoint, Flink matches the state to the
>> operators based on the operator ID (uid). The exception says that
>> there is some state that doesn't match any operator. So from Flink's
>> perspective, the operator is gone.
>> Here is more information:
>> https://nightlies.apache.org/flink/flink-docs-master/docs/ops/state/savepoints/#assigning-operator-ids
>>
>>
>> Somehow something must have changed in your job: Did you change the Flink
>> version?
>>
>> Hope this helps!
>>
>> On Wed, Dec 8, 2021 at 5:49 AM Dan Hill <quietgol...@gmail.com> wrote:
>>
>>> I'm restoring the job with the same binary and same flags/args.
>>>
>>> On Tue, Dec 7, 2021 at 8:48 PM Dan Hill <quietgol...@gmail.com> wrote:
>>>
>>>> Hi.  I noticed this warning has "operator
>>>> 811d3b279c8b26ed99ff0883b7630242" in it.  I assume this should be an
>>>> operator uid or name.  It looks like something else.  What is it?  Is
>>>> something corrupted?
>>>>
>>>>
>>>> org.apache.flink.runtime.client.JobInitializationException: Could not instantiate JobManager.
>>>>    at org.apache.flink.runtime.dispatcher.Dispatcher.lambda$createJobManagerRunner$5(Dispatcher.java:494)
>>>>    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
>>>>    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>>>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>>>    at java.lang.Thread.run(Thread.java:748)
>>>> Caused by: java.lang.IllegalStateException: Failed to rollback to checkpoint/savepoint s3a://my-flink-state/checkpoints/ce9b90eafde97ca4629c13936c34426f/chk-626. Cannot map checkpoint/savepoint state for operator 811d3b279c8b26ed99ff0883b7630242 to the new program, because the operator is not available in the new program. If you want to allow to skip this, you can set the --allowNonRestoredState option on the CLI.
>>>>    at org.apache.flink.runtime.checkpoint.Checkpoints.throwNonRestoredStateException(Checkpoints.java:226)
>>>>    at org.apache.flink.runtime.checkpoint.Checkpoints.loadAndValidateCheckpoint(Checkpoints.java:190)
>>>>    at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.restoreSavepoint(CheckpointCoordinator.java:1627)
>>>>    at org.apache.flink.runtime.scheduler.SchedulerBase.tryRestoreExecutionGraphFromSavepoint(SchedulerBase.java:362)
>>>>    at org.apache.flink.runtime.scheduler.SchedulerBase.createAndRestoreExecutionGraph(SchedulerBase.java:292)
>>>>    at org.apache.flink.runtime.scheduler.SchedulerBase.<init>(SchedulerBase.java:249)
>>>>    at org.apache.flink.runtime.scheduler.DefaultScheduler.<init>(DefaultScheduler.java:133)
>>>>    at org.apache.flink.runtime.scheduler.DefaultSchedulerFactory.createInstance(DefaultSchedulerFactory.java:111)
>>>>    at org.apache.flink.runtime.jobmaster.JobMaster.createScheduler(JobMaster.java:345)
>>>>    at org.apache.flink.runtime.jobmaster.JobMaster.<init>(JobMaster.java:330)
>>>>    at org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.createJobMasterService(DefaultJobMasterServiceFactory.java:95)
>>>>    at org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.createJobMasterService(DefaultJobMasterServiceFactory.java:39)
>>>>    at org.apache.flink.runtime.jobmaster.JobManagerRunnerImpl.<init>(JobManagerRunnerImpl.java:162)
>>>>    at org.apache.flink.runtime.dispatcher.DefaultJobManagerRunnerFactory.createJobManagerRunner(DefaultJobManagerRunnerFactory.java:86)
>>>>    at org.apache.flink.runtime.dispatcher.Dispatcher.lambda$createJobManagerRunner$5(Dispatcher.java:478)
>>>>    ... 4 more
>>>>
>>>>
