[ 
https://issues.apache.org/jira/browse/FLINK-32552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fabio Wanner resolved FLINK-32552.
----------------------------------
    Release Note: Not a bug in the Flink Kubernetes Operator.
      Resolution: Not A Bug

> Mixed up Flink session job deployments
> --------------------------------------
>
>                 Key: FLINK-32552
>                 URL: https://issues.apache.org/jira/browse/FLINK-32552
>             Project: Flink
>          Issue Type: Bug
>          Components: Kubernetes Operator
>            Reporter: Fabio Wanner
>            Priority: Major
>
> *Context*
> In our end-to-end tests we regularly deploy all of our Flink session jobs to 
> a staging environment. Some of the jobs are bundled together in one Helm 
> chart and are therefore deployed at the same time. There are around 40 
> individual Flink jobs, all running on the same Flink session cluster. Each 
> e2e test run gets its own session cluster. The problems described below 
> occur rarely (in roughly 1 of 50 runs).
> *Problem*
> On rare occasions the operator seems to "mix up" the deployments. This can 
> be seen in the Flink cluster logs: multiple 
> {{Received JobGraph submission '<JOB NAME>' (<JOB_ID>)}} entries are logged 
> for jobs with different names but the same job ID. This results in errors 
> such as {{DuplicateJobSubmissionException}} or {{ClassNotFoundException}}.
> It is also visible in the FlinkSessionJob resource: status.jobStatus.jobName 
> does not match the expected name of the job being deployed (the job name 
> is passed to the application as an argument).
> So far we were unable to reliably reproduce the error.
> *Details*
> The following lines show the status of 3 jobs from the viewpoint of the 
> Flink cluster dashboard and the FlinkSessionJob resource:
>  
> *aletsch_smc_e5730831db8092adb12f5189c4c895ef3a268615*
> Apache Flink Dashboard:
>  * State: Restarting
>  * ID: a7d36f3881f943a00000000000000002
>  * Exceptions: Cannot load user class: 
> aelps.pipelines.aletsch.smc.SMCUrlMapper
> FlinkSessionJob Resource:
>  * State: RUNNING
>  * jobId: a1221c743367497b0000000000000002
>  * uid: a1221c74-3367-497b-ad2f-8793ab23919d
>  
> *aletsch_mat_e5730831db8092adb12f5189c4c895ef3a268615*
> Apache Flink Dashboard:
>  * State: -
>  * ID: -
> FlinkSessionJob Resource:
>  * State: UPGRADING
>  * jobId: -
>  * uid: a7d36f38-81f9-43a0-898f-19b950430e9d
> Flink K8s Operator:
>  * Exceptions: DuplicateJobSubmissionException: Job has already been 
> submitted.
>  
> *aletsch_wp_wafer_e5730831db8092adb12f5189c4c895ef3a268615*
> Apache Flink Dashboard:
>  * State: Running
>  * ID: e692c2dfaa18441c0000000000000002
>  * Exceptions: -
> FlinkSessionJob Resource:
>  * State: RUNNING
>  * jobId: e692c2dfaa18441c0000000000000002
>  * uid: e692c2df-aa18-441c-a352-88aefa9a3017
> As we can see, the *aletsch_smc* job is reported as RUNNING by its 
> FlinkSessionJob resource, but in the cluster it is crash-looping under a 
> job ID that matches the uid of the *aletsch_mat* resource, while 
> *aletsch_mat* itself is not running at all. The following logs also show 
> some suspicious entries: there are several {{Received JobGraph submission}} 
> entries from different jobs with the same job ID.
>  
> *Logs*
> The logs are filtered by the 3 jobIds from above.
>  
> JobID: a7d36f3881f943a00000000000000002
> {code:bash}
> Flink Cluster
>     ...
>     2023-07-06 10:23:50,552 INFO  
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Job 
> aletsch_smc_e5730831db8092adb12f5189c4c895ef3a268615 
> (a7d36f3881f943a00000000000000002) switched from state RUNNING to RESTARTING.
>     2023-07-06 10:23:50           file: 
> '/tmp/tm_10.0.11.159:6122-e9fadc/blobStorage/job_a7d36f3881f943a00000000000000002/blob_p-40c7a30adef8868254191d2cf2dbc4cb7ab46f0d-8a02a0583d91c5e8e6c94f378aa444c2'
>  (valid JAR)
>     2023-07-06 10:23:50,522 INFO  
> org.apache.flink.runtime.resourcemanager.slotmanager.DeclarativeSlotManager 
> [] - Received resource requirements from job 
> a7d36f3881f943a00000000000000002: 
> [ResourceRequirement{resourceProfile=ResourceProfile{UNKNOWN}, 
> numberOfRequiredSlots=4}]
>     2023-07-06 10:23:50,522 INFO  
> org.apache.flink.runtime.resourcemanager.slotmanager.DeclarativeSlotManager 
> [] - Received resource requirements from job 
> a7d36f3881f943a00000000000000002: 
> [ResourceRequirement{resourceProfile=ResourceProfile{UNKNOWN}, 
> numberOfRequiredSlots=3}]
>     2023-07-06 10:23:50,522 INFO  
> org.apache.flink.runtime.resourcemanager.slotmanager.DeclarativeSlotManager 
> [] - Received resource requirements from job 
> a7d36f3881f943a00000000000000002: 
> [ResourceRequirement{resourceProfile=ResourceProfile{UNKNOWN}, 
> numberOfRequiredSlots=2}]
>     2023-07-06 10:23:50,522 INFO  
> org.apache.flink.runtime.resourcemanager.slotmanager.DeclarativeSlotManager 
> [] - Received resource requirements from job 
> a7d36f3881f943a00000000000000002: 
> [ResourceRequirement{resourceProfile=ResourceProfile{UNKNOWN}, 
> numberOfRequiredSlots=1}]
>     2023-07-06 10:23:50,512 INFO  
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Job 
> aletsch_smc_e5730831db8092adb12f5189c4c895ef3a268615 
> (a7d36f3881f943a00000000000000002) switched from state RESTARTING to RUNNING.
>     2023-07-06 10:23:48,979 INFO  
> org.apache.flink.runtime.resourcemanager.slotmanager.DeclarativeSlotManager 
> [] - Clearing resource requirements of job a7d36f3881f943a00000000000000002
>     2023-07-06 10:23:48,853 INFO  
> org.apache.flink.runtime.resourcemanager.slotmanager.DeclarativeSlotManager 
> [] - Received resource requirements from job 
> a7d36f3881f943a00000000000000002: 
> [ResourceRequirement{resourceProfile=ResourceProfile{UNKNOWN}, 
> numberOfRequiredSlots=1}]
>     2023-07-06 10:23:48,853 INFO  
> org.apache.flink.runtime.resourcemanager.slotmanager.DeclarativeSlotManager 
> [] - Received resource requirements from job 
> a7d36f3881f943a00000000000000002: 
> [ResourceRequirement{resourceProfile=ResourceProfile{UNKNOWN}, 
> numberOfRequiredSlots=2}]
>     2023-07-06 10:23:48,853 INFO  
> org.apache.flink.runtime.resourcemanager.slotmanager.DeclarativeSlotManager 
> [] - Received resource requirements from job 
> a7d36f3881f943a00000000000000002: 
> [ResourceRequirement{resourceProfile=ResourceProfile{UNKNOWN}, 
> numberOfRequiredSlots=3}]
>     2023-07-06 10:23:48           file: 
> '/tmp/tm_10.0.11.159:6122-e9fadc/blobStorage/job_a7d36f3881f943a00000000000000002/blob_p-40c7a30adef8868254191d2cf2dbc4cb7ab46f0d-8a02a0583d91c5e8e6c94f378aa444c2'
>  (valid JAR)
>     2023-07-06 10:23:48,661 INFO  
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Job 
> aletsch_smc_e5730831db8092adb12f5189c4c895ef3a268615 
> (a7d36f3881f943a00000000000000002) switched from state RUNNING to RESTARTING.
>     2023-07-06 10:23:48,583 INFO  
> org.apache.flink.runtime.resourcemanager.slotmanager.DeclarativeSlotManager 
> [] - Received resource requirements from job 
> a7d36f3881f943a00000000000000002: 
> [ResourceRequirement{resourceProfile=ResourceProfile{UNKNOWN}, 
> numberOfRequiredSlots=4}]
>     2023-07-06 10:23:48,583 INFO  
> org.apache.flink.runtime.resourcemanager.slotmanager.DeclarativeSlotManager 
> [] - Received resource requirements from job 
> a7d36f3881f943a00000000000000002: 
> [ResourceRequirement{resourceProfile=ResourceProfile{UNKNOWN}, 
> numberOfRequiredSlots=3}]
>     2023-07-06 10:23:48,583 INFO  
> org.apache.flink.runtime.resourcemanager.slotmanager.DeclarativeSlotManager 
> [] - Received resource requirements from job 
> a7d36f3881f943a00000000000000002: 
> [ResourceRequirement{resourceProfile=ResourceProfile{UNKNOWN}, 
> numberOfRequiredSlots=2}]
>     2023-07-06 10:23:48,582 INFO  
> org.apache.flink.runtime.resourcemanager.slotmanager.DeclarativeSlotManager 
> [] - Received resource requirements from job 
> a7d36f3881f943a00000000000000002: 
> [ResourceRequirement{resourceProfile=ResourceProfile{UNKNOWN}, 
> numberOfRequiredSlots=1}]
>     2023-07-06 10:23:48,573 INFO  
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Job 
> aletsch_smc_e5730831db8092adb12f5189c4c895ef3a268615 
> (a7d36f3881f943a00000000000000002) switched from state RESTARTING to RUNNING.
>     2023-07-06 10:23:47,562 INFO  
> org.apache.flink.runtime.dispatcher.StandaloneDispatcher     [] - Received 
> JobGraph submission 'aletsch_mat_e5730831db8092adb12f5189c4c895ef3a268615' 
> (a7d36f3881f943a00000000000000002).
>     2023-07-06 10:23:47,518 INFO  
> org.apache.flink.runtime.resourcemanager.slotmanager.DeclarativeSlotManager 
> [] - Clearing resource requirements of job a7d36f3881f943a00000000000000002
>     2023-07-06 10:23:47,517 INFO  
> org.apache.flink.runtime.resourcemanager.slotmanager.DeclarativeSlotManager 
> [] - Received resource requirements from job 
> a7d36f3881f943a00000000000000002: 
> [ResourceRequirement{resourceProfile=ResourceProfile{UNKNOWN}, 
> numberOfRequiredSlots=1}]
>     2023-07-06 10:23:47,517 INFO  
> org.apache.flink.runtime.resourcemanager.slotmanager.DeclarativeSlotManager 
> [] - Received resource requirements from job 
> a7d36f3881f943a00000000000000002: 
> [ResourceRequirement{resourceProfile=ResourceProfile{UNKNOWN}, 
> numberOfRequiredSlots=2}]
>     2023-07-06 10:23:47,516 INFO  
> org.apache.flink.runtime.resourcemanager.slotmanager.DeclarativeSlotManager 
> [] - Received resource requirements from job 
> a7d36f3881f943a00000000000000002: 
> [ResourceRequirement{resourceProfile=ResourceProfile{UNKNOWN}, 
> numberOfRequiredSlots=3}]
>     2023-07-06 10:23:47,463 INFO  
> org.apache.flink.client.deployment.application.executors.EmbeddedExecutor [] 
> - Submitting Job with JobId=a7d36f3881f943a00000000000000002.
>     2023-07-06 10:23:47,463 INFO  
> org.apache.flink.client.deployment.application.executors.EmbeddedExecutor [] 
> - Job a7d36f3881f943a00000000000000002 is submitted.
>     2023-07-06 10:23:47,104 INFO  
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Job 
> aletsch_smc_e5730831db8092adb12f5189c4c895ef3a268615 
> (a7d36f3881f943a00000000000000002) switched from state RUNNING to RESTARTING.
>     2023-07-06 10:23:46,804 INFO  
> org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Offer 
> reserved slots to the leader of job a7d36f3881f943a00000000000000002.
>     2023-07-06 10:23:46,804 INFO  
> org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Establish 
> JobManager connection for job a7d36f3881f943a00000000000000002.
>     2023-07-06 10:23:46,799 INFO  
> org.apache.flink.runtime.taskexecutor.DefaultJobLeaderService [] - Successful 
> registration at job manager 
> akka.tcp://flink@10.0.11.158:6123/user/rpc/jobmanager_2 for job 
> a7d36f3881f943a00000000000000002.
>     2023-07-06 10:23:46,577 INFO  
> org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Receive 
> slot request 221b24b50413805c9e35d7620b8a00b8 for job 
> a7d36f3881f943a00000000000000002 from resource manager with leader id 
> aaa9331f70b07a195b5f09d57d1b40c5.
>     2023-07-06 10:23:46,577 INFO  
> org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Receive 
> slot request 49d3c8cd1080bd38c0144c3d3cc597cd for job 
> a7d36f3881f943a00000000000000002 from resource manager with leader id 
> aaa9331f70b07a195b5f09d57d1b40c5.
>     2023-07-06 10:23:46,577 INFO  
> org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Receive 
> slot request 819f34cc8957066478fb4b3549367d24 for job 
> a7d36f3881f943a00000000000000002 from resource manager with leader id 
> aaa9331f70b07a195b5f09d57d1b40c5.
>     2023-07-06 10:23:46,574 INFO  
> org.apache.flink.runtime.taskexecutor.DefaultJobLeaderService [] - Add job 
> a7d36f3881f943a00000000000000002 for job leader monitoring.
>     2023-07-06 10:23:46,570 INFO  
> org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Receive 
> slot request 36802a7de1487f3fb1b6a3b509bd5e20 for job 
> a7d36f3881f943a00000000000000002 from resource manager with leader id 
> aaa9331f70b07a195b5f09d57d1b40c5.
>     2023-07-06 10:23:46,560 INFO  
> org.apache.flink.runtime.resourcemanager.slotmanager.DeclarativeSlotManager 
> [] - Received resource requirements from job 
> a7d36f3881f943a00000000000000002: 
> [ResourceRequirement{resourceProfile=ResourceProfile{UNKNOWN}, 
> numberOfRequiredSlots=4}]
>     2023-07-06 10:23:46,556 INFO  
> org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - 
> Registered job manager 
> aaa9331f70b07a195b5f09d57d1b4...@akka.tcp://flink@10.0.11.158:6123/user/rpc/jobmanager_2
>  for job a7d36f3881f943a00000000000000002.
>     2023-07-06 10:23:46,528 INFO  
> org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - 
> Registering job manager 
> aaa9331f70b07a195b5f09d57d1b4...@akka.tcp://flink@10.0.11.158:6123/user/rpc/jobmanager_2
>  for job a7d36f3881f943a00000000000000002.
>     2023-07-06 10:23:46,480 INFO  
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Job 
> aletsch_smc_e5730831db8092adb12f5189c4c895ef3a268615 
> (a7d36f3881f943a00000000000000002) switched from state CREATED to RUNNING.
>     2023-07-06 10:23:46,476 INFO  
> org.apache.flink.runtime.jobmaster.JobMaster                 [] - Starting 
> execution of job 'aletsch_smc_e5730831db8092adb12f5189c4c895ef3a268615' 
> (a7d36f3881f943a00000000000000002) under job master id 
> aaa9331f70b07a195b5f09d57d1b40c5.
>     2023-07-06 10:23:46,466 INFO  
> org.apache.flink.runtime.jobmaster.JobMaster                 [] - Using 
> failover strategy 
> org.apache.flink.runtime.executiongraph.failover.flip1.RestartPipelinedRegionFailoverStrategy@62877000
>  for aletsch_smc_e5730831db8092adb12f5189c4c895ef3a268615 
> (a7d36f3881f943a00000000000000002).
>     2023-07-06 10:23:46,079 INFO  
> org.apache.flink.runtime.jobmaster.JobMaster                 [] - Running 
> initialization on master for job 
> aletsch_smc_e5730831db8092adb12f5189c4c895ef3a268615 
> (a7d36f3881f943a00000000000000002).
>     2023-07-06 10:23:46,059 INFO  
> org.apache.flink.runtime.checkpoint.DefaultCompletedCheckpointStoreUtils [] - 
> Found 0 checkpoints in 
> KubernetesStateHandleStore{configMapName='flink-cluster-aelps-staging-e5730831-a7d36f3881f943a00000000000000002-config-map'}.
>     2023-07-06 10:23:46,051 INFO  
> org.apache.flink.runtime.checkpoint.DefaultCompletedCheckpointStoreUtils [] - 
> Recovering checkpoints from 
> KubernetesStateHandleStore{configMapName='flink-cluster-aelps-staging-e5730831-a7d36f3881f943a00000000000000002-config-map'}.
>     2023-07-06 10:23:46,006 INFO  
> org.apache.flink.runtime.jobmaster.JobMaster                 [] - Using 
> restart back off time strategy 
> ExponentialDelayRestartBackoffTimeStrategy(initialBackoffMS=1000, 
> maxBackoffMS=300000, backoffMultiplier=2.0, resetBackoffThresholdMS=3600000, 
> jitterFactor=0.5, currentBackoffMS=1000, lastFailureTimestamp=0) for 
> aletsch_smc_e5730831db8092adb12f5189c4c895ef3a268615 
> (a7d36f3881f943a00000000000000002).
>     2023-07-06 10:23:45,987 INFO  
> org.apache.flink.runtime.jobmaster.JobMaster                 [] - 
> Initializing job 'aletsch_smc_e5730831db8092adb12f5189c4c895ef3a268615' 
> (a7d36f3881f943a00000000000000002).
>     2023-07-06 10:23:45,966 INFO  
> org.apache.flink.runtime.dispatcher.StandaloneDispatcher     [] - Received 
> JobGraph submission 
> 'aletsch_wp_wafer_e5730831db8092adb12f5189c4c895ef3a268615' 
> (a7d36f3881f943a00000000000000002).
>     2023-07-06 10:23:45,965 INFO  
> org.apache.flink.runtime.dispatcher.StandaloneDispatcher     [] - Received 
> JobGraph submission 'aletsch_mat_e5730831db8092adb12f5189c4c895ef3a268615' 
> (a7d36f3881f943a00000000000000002).
>     2023-07-06 10:23:45,915 INFO  
> org.apache.flink.runtime.jobmanager.DefaultJobGraphStore     [] - Added 
> JobGraph(jobId: a7d36f3881f943a00000000000000002) to 
> KubernetesStateHandleStore{configMapName='flink-cluster-aelps-staging-e5730831-cluster-config-map'}.
>     2023-07-06 10:23:45,859 INFO  
> org.apache.flink.runtime.dispatcher.StandaloneDispatcher     [] - Submitting 
> job 'aletsch_smc_e5730831db8092adb12f5189c4c895ef3a268615' 
> (a7d36f3881f943a00000000000000002).
>     2023-07-06 10:23:45,857 INFO  
> org.apache.flink.runtime.dispatcher.StandaloneDispatcher     [] - Received 
> JobGraph submission 'aletsch_smc_e5730831db8092adb12f5189c4c895ef3a268615' 
> (a7d36f3881f943a00000000000000002).
>     2023-07-06 10:23:45,705 INFO  
> org.apache.flink.client.deployment.application.executors.EmbeddedExecutor [] 
> - Submitting Job with JobId=a7d36f3881f943a00000000000000002.
>     2023-07-06 10:23:45,705 INFO  
> org.apache.flink.client.deployment.application.executors.EmbeddedExecutor [] 
> - Job a7d36f3881f943a00000000000000002 is submitted.
>     2023-07-06 10:23:45,705 INFO  
> org.apache.flink.client.deployment.application.executors.EmbeddedExecutor [] 
> - Submitting Job with JobId=a7d36f3881f943a00000000000000002.
>     2023-07-06 10:23:45,705 INFO  
> org.apache.flink.client.deployment.application.executors.EmbeddedExecutor [] 
> - Job a7d36f3881f943a00000000000000002 is submitted.
>     2023-07-06 10:23:45,705 INFO  
> org.apache.flink.client.deployment.application.executors.EmbeddedExecutor [] 
> - Submitting Job with JobId=a7d36f3881f943a00000000000000002.
>     2023-07-06 10:23:45,705 INFO  
> org.apache.flink.client.deployment.application.executors.EmbeddedExecutor [] 
> - Job a7d36f3881f943a00000000000000002 is submitted.
>     Flink Operator
>     2023-07-06 10:26:25,792 o.a.f.k.o.s.AbstractFlinkService [INFO 
> ][aelps-staging/aletsch-mat-staging-e5730831] Submitting job: 
> a7d36f3881f943a00000000000000002 to session cluster.
>     2023-07-06 10:25:05,163 o.a.f.k.o.s.AbstractFlinkService [INFO 
> ][aelps-staging/aletsch-mat-staging-e5730831] Submitting job: 
> a7d36f3881f943a00000000000000002 to session cluster.
>     2023-07-06 10:24:24,553 o.a.f.k.o.s.AbstractFlinkService [INFO 
> ][aelps-staging/aletsch-mat-staging-e5730831] Submitting job: 
> a7d36f3881f943a00000000000000002 to session cluster.
>     2023-07-06 10:24:03,850 o.a.f.k.o.s.AbstractFlinkService [INFO 
> ][aelps-staging/aletsch-mat-staging-e5730831] Submitting job: 
> a7d36f3881f943a00000000000000002 to session cluster.
>     2023-07-06 10:23:53,094 o.a.f.k.o.s.AbstractFlinkService [INFO 
> ][aelps-staging/aletsch-mat-staging-e5730831] Submitting job: 
> a7d36f3881f943a00000000000000002 to session cluster.
>     2023-07-06 10:23:47,346 o.a.f.k.o.s.AbstractFlinkService [INFO 
> ][aelps-staging/aletsch-mat-staging-e5730831] Submitting job: 
> a7d36f3881f943a00000000000000002 to session cluster.
>     2023-07-06 10:23:45,372 o.a.f.k.o.s.AbstractFlinkService [INFO 
> ][aelps-staging/aletsch-mat-staging-e5730831] Submitting job: 
> a7d36f3881f943a00000000000000002 to session cluster.
> {code}
>  
> JobID: a1221c743367497b0000000000000002
> {code:bash}
> Flink Cluster
>     2023-07-06 11:23:48,062 INFO  
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - Completed 
> checkpoint 1 for job a1221c743367497b0000000000000002 (48548 bytes, 
> checkpointDuration=107 ms, finalizationTime=33 ms).
>     2023-07-06 11:23:47,937 INFO  
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - Triggering 
> checkpoint 1 (type=CheckpointType{name='Checkpoint', 
> sharingFilesStrategy=FORWARD_BACKWARD}) @ 1688635427922 for job 
> a1221c743367497b0000000000000002.
>     2023-07-06 10:23:48,567 INFO  
> org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Offer 
> reserved slots to the leader of job a1221c743367497b0000000000000002.
>     2023-07-06 10:23:48,567 INFO  
> org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Establish 
> JobManager connection for job a1221c743367497b0000000000000002.
>     2023-07-06 10:23:48,567 INFO  
> org.apache.flink.runtime.taskexecutor.DefaultJobLeaderService [] - Successful 
> registration at job manager 
> akka.tcp://flink@10.0.11.158:6123/user/rpc/jobmanager_7 for job 
> a1221c743367497b0000000000000002.
>     2023-07-06 10:23:48,009 INFO  
> org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Receive 
> slot request cae6932e2409d5fece3f6b4636e3c71a for job 
> a1221c743367497b0000000000000002 from resource manager with leader id 
> aaa9331f70b07a195b5f09d57d1b40c5.
>     2023-07-06 10:23:48,003 INFO  
> org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Receive 
> slot request 8a57f3ecff07d300aebb33f6b3545aed for job 
> a1221c743367497b0000000000000002 from resource manager with leader id 
> aaa9331f70b07a195b5f09d57d1b40c5.
>     2023-07-06 10:23:48,003 INFO  
> org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Receive 
> slot request 7a4a0cfd16eec4a1cb043cce5f989db0 for job 
> a1221c743367497b0000000000000002 from resource manager with leader id 
> aaa9331f70b07a195b5f09d57d1b40c5.
>     2023-07-06 10:23:48,002 INFO  
> org.apache.flink.runtime.taskexecutor.DefaultJobLeaderService [] - Add job 
> a1221c743367497b0000000000000002 for job leader monitoring.
>     2023-07-06 10:23:48,002 INFO  
> org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Receive 
> slot request 92cbc64513fa703e4acf28bbb3088a58 for job 
> a1221c743367497b0000000000000002 from resource manager with leader id 
> aaa9331f70b07a195b5f09d57d1b40c5.
>     2023-07-06 10:23:48,999 INFO  
> org.apache.flink.runtime.resourcemanager.slotmanager.DeclarativeSlotManager 
> [] - Received resource requirements from job 
> a1221c743367497b0000000000000002: 
> [ResourceRequirement{resourceProfile=ResourceProfile{UNKNOWN}, 
> numberOfRequiredSlots=4}]
>     2023-07-06 10:23:47,998 INFO  
> org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - 
> Registered job manager 
> aaa9331f70b07a195b5f09d57d1b4...@akka.tcp://flink@10.0.11.158:6123/user/rpc/jobmanager_7
>  for job a1221c743367497b0000000000000002.
>     2023-07-06 10:23:47,953 INFO  
> org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - 
> Registering job manager 
> aaa9331f70b07a195b5f09d57d1b4...@akka.tcp://flink@10.0.11.158:6123/user/rpc/jobmanager_7
>  for job a1221c743367497b0000000000000002.
>     2023-07-06 10:23:47,922 INFO  
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Job 
> aletsch_smc_e5730831db8092adb12f5189c4c895ef3a268615 
> (a1221c743367497b0000000000000002) switched from state CREATED to RUNNING.
>     2023-07-06 10:23:47,887 INFO  
> org.apache.flink.runtime.jobmaster.JobMaster                 [] - Starting 
> execution of job 'aletsch_smc_e5730831db8092adb12f5189c4c895ef3a268615' 
> (a1221c743367497b0000000000000002) under job master id 
> aaa9331f70b07a195b5f09d57d1b40c5.
>     2023-07-06 10:23:47,887 INFO  
> org.apache.flink.runtime.jobmaster.JobMaster                 [] - Using 
> failover strategy 
> org.apache.flink.runtime.executiongraph.failover.flip1.RestartPipelinedRegionFailoverStrategy@2222ba4d
>  for aletsch_smc_e5730831db8092adb12f5189c4c895ef3a268615 
> (a1221c743367497b0000000000000002).
>     2023-07-06 10:23:47,880 INFO  
> org.apache.flink.runtime.jobmaster.JobMaster                 [] - Running 
> initialization on master for job 
> aletsch_smc_e5730831db8092adb12f5189c4c895ef3a268615 
> (a1221c743367497b0000000000000002).
>     2023-07-06 10:23:47,872 INFO  
> org.apache.flink.runtime.checkpoint.DefaultCompletedCheckpointStoreUtils [] - 
> Found 0 checkpoints in 
> KubernetesStateHandleStore{configMapName='flink-cluster-aelps-staging-e5730831-a1221c743367497b0000000000000002-config-map'}.
>     2023-07-06 10:23:47,867 INFO  
> org.apache.flink.runtime.checkpoint.DefaultCompletedCheckpointStoreUtils [] - 
> Recovering checkpoints from 
> KubernetesStateHandleStore{configMapName='flink-cluster-aelps-staging-e5730831-a1221c743367497b0000000000000002-config-map'}.
>     2023-07-06 10:23:47,832 INFO  
> org.apache.flink.runtime.jobmaster.JobMaster                 [] - Using 
> restart back off time strategy 
> ExponentialDelayRestartBackoffTimeStrategy(initialBackoffMS=1000, 
> maxBackoffMS=300000, backoffMultiplier=2.0, resetBackoffThresholdMS=3600000, 
> jitterFactor=0.5, currentBackoffMS=1000, lastFailureTimestamp=0) for 
> aletsch_smc_e5730831db8092adb12f5189c4c895ef3a268615 
> (a1221c743367497b0000000000000002).
>     2023-07-06 10:23:47,832 INFO  
> org.apache.flink.runtime.jobmaster.JobMaster                 [] - 
> Initializing job 'aletsch_smc_e5730831db8092adb12f5189c4c895ef3a268615' 
> (a1221c743367497b0000000000000002).
>     2023-07-06 10:23:47,820 INFO  
> org.apache.flink.runtime.jobmanager.DefaultJobGraphStore     [] - Added 
> JobGraph(jobId: a1221c743367497b0000000000000002) to 
> KubernetesStateHandleStore{configMapName='flink-cluster-aelps-staging-e5730831-cluster-config-map'}.
>     2023-07-06 10:23:47,780 INFO  
> org.apache.flink.runtime.dispatcher.StandaloneDispatcher     [] - Submitting 
> job 'aletsch_smc_e5730831db8092adb12f5189c4c895ef3a268615' 
> (a1221c743367497b0000000000000002).
>     2023-07-06 10:23:47,776 INFO  
> org.apache.flink.runtime.dispatcher.StandaloneDispatcher     [] - Received 
> JobGraph submission 'aletsch_smc_e5730831db8092adb12f5189c4c895ef3a268615' 
> (a1221c743367497b0000000000000002).
>     2023-07-06 10:23:47,668 INFO  
> org.apache.flink.client.deployment.application.executors.EmbeddedExecutor [] 
> - Submitting Job with JobId=a1221c743367497b0000000000000002.
>     2023-07-06 10:23:47,668 INFO  
> org.apache.flink.client.deployment.application.executors.EmbeddedExecutor [] 
> - Job a1221c743367497b0000000000000002 is submitted.
>     Flink Operator
>     2023-07-06 10:23:48,007 o.a.f.k.o.s.AbstractFlinkService [INFO 
> ][aelps-staging/aletsch-smc-staging-e5730831] Submitted job: 
> a1221c743367497b0000000000000002 to session cluster.
>     2023-07-06 10:23:47,505 o.a.f.k.o.s.AbstractFlinkService [INFO 
> ][aelps-staging/aletsch-smc-staging-e5730831] Submitting job: 
> a1221c743367497b0000000000000002 to session cluster.
>     2023-07-06 10:23:45,416 o.a.f.k.o.s.AbstractFlinkService [INFO 
> ][aelps-staging/aletsch-smc-staging-e5730831] Submitting job: 
> a1221c743367497b0000000000000002 to session cluster.
> {code}
> JobID: e692c2dfaa18441c0000000000000002
> {code:bash}
> Flink Cluster
>     2023-07-06 11:23:48,004 INFO  
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - Completed 
> checkpoint 1 for job e692c2dfaa18441c0000000000000002 (8194 bytes, 
> checkpointDuration=125 ms, finalizationTime=28 ms).
>     2023-07-06 11:23:47,867 INFO  
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - Triggering 
> checkpoint 1 (type=CheckpointType{name='Checkpoint', 
> sharingFilesStrategy=FORWARD_BACKWARD}) @ 1688635427851 for job 
> e692c2dfaa18441c0000000000000002.
>     2023-07-06 10:23:48,568 INFO  
> org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Offer 
> reserved slots to the leader of job e692c2dfaa18441c0000000000000002.
>     2023-07-06 10:23:48,568 INFO  
> org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Establish 
> JobManager connection for job e692c2dfaa18441c0000000000000002.
>     2023-07-06 10:23:48,568 INFO  
> org.apache.flink.runtime.taskexecutor.DefaultJobLeaderService [] - Successful 
> registration at job manager 
> akka.tcp://flink@10.0.11.158:6123/user/rpc/jobmanager_6 for job 
> e692c2dfaa18441c0000000000000002.
>     2023-07-06 10:23:48,002 INFO  
> org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Receive 
> slot request 5e5a0e55fac280bf31abf29a20bce684 for job 
> e692c2dfaa18441c0000000000000002 from resource manager with leader id 
> aaa9331f70b07a195b5f09d57d1b40c5.
>     2023-07-06 10:23:48,002 INFO  
> org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Receive 
> slot request 1cdbce54f4376a1df86430f97dab6858 for job 
> e692c2dfaa18441c0000000000000002 from resource manager with leader id 
> aaa9331f70b07a195b5f09d57d1b40c5.
>     2023-07-06 10:23:48,002 INFO  
> org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Receive 
> slot request 352db7288d0e4d1775d5f52dd14c769d for job 
> e692c2dfaa18441c0000000000000002 from resource manager with leader id 
> aaa9331f70b07a195b5f09d57d1b40c5.
>     2023-07-06 10:23:48,001 INFO  
> org.apache.flink.runtime.taskexecutor.DefaultJobLeaderService [] - Add job 
> e692c2dfaa18441c0000000000000002 for job leader monitoring.
>     2023-07-06 10:23:48,000 INFO  
> org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Receive 
> slot request bffed3e4a4c8573049a4119bd7e15f19 for job 
> e692c2dfaa18441c0000000000000002 from resource manager with leader id 
> aaa9331f70b07a195b5f09d57d1b40c5.
>     2023-07-06 10:23:48,998 INFO  
> org.apache.flink.runtime.resourcemanager.slotmanager.DeclarativeSlotManager 
> [] - Received resource requirements from job 
> e692c2dfaa18441c0000000000000002: 
> [ResourceRequirement{resourceProfile=ResourceProfile{UNKNOWN}, 
> numberOfRequiredSlots=4}]
>     2023-07-06 10:23:47,998 INFO  
> org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - 
> Registered job manager 
> aaa9331f70b07a195b5f09d57d1b4...@akka.tcp://flink@10.0.11.158:6123/user/rpc/jobmanager_6
>  for job e692c2dfaa18441c0000000000000002.
>     2023-07-06 10:23:47,953 INFO  
> org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - 
> Registering job manager 
> aaa9331f70b07a195b5f09d57d1b4...@akka.tcp://flink@10.0.11.158:6123/user/rpc/jobmanager_6
>  for job e692c2dfaa18441c0000000000000002.
>     2023-07-06 10:23:47,851 INFO  
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Job 
> aletsch_wp_wafer_e5730831db8092adb12f5189c4c895ef3a268615 
> (e692c2dfaa18441c0000000000000002) switched from state CREATED to RUNNING.
>     2023-07-06 10:23:47,845 INFO  
> org.apache.flink.runtime.jobmaster.JobMaster                 [] - Starting 
> execution of job 'aletsch_wp_wafer_e5730831db8092adb12f5189c4c895ef3a268615' 
> (e692c2dfaa18441c0000000000000002) under job master id 
> aaa9331f70b07a195b5f09d57d1b40c5.
>     2023-07-06 10:23:47,844 INFO  
> org.apache.flink.runtime.jobmaster.JobMaster                 [] - Using 
> failover strategy 
> org.apache.flink.runtime.executiongraph.failover.flip1.RestartPipelinedRegionFailoverStrategy@7eeab246
>  for aletsch_wp_wafer_e5730831db8092adb12f5189c4c895ef3a268615 
> (e692c2dfaa18441c0000000000000002).
>     2023-07-06 10:23:47,834 INFO  
> org.apache.flink.runtime.jobmaster.JobMaster                 [] - Running 
> initialization on master for job 
> aletsch_wp_wafer_e5730831db8092adb12f5189c4c895ef3a268615 
> (e692c2dfaa18441c0000000000000002).
>     2023-07-06 10:23:47,825 INFO  
> org.apache.flink.runtime.checkpoint.DefaultCompletedCheckpointStoreUtils [] - 
> Found 0 checkpoints in 
> KubernetesStateHandleStore{configMapName='flink-cluster-aelps-staging-e5730831-e692c2dfaa18441c0000000000000002-config-map'}.
>     2023-07-06 10:23:47,813 INFO  
> org.apache.flink.runtime.checkpoint.DefaultCompletedCheckpointStoreUtils [] - 
> Recovering checkpoints from 
> KubernetesStateHandleStore{configMapName='flink-cluster-aelps-staging-e5730831-e692c2dfaa18441c0000000000000002-config-map'}.
>     2023-07-06 10:23:47,782 INFO  
> org.apache.flink.runtime.jobmaster.JobMaster                 [] - Using 
> restart back off time strategy 
> ExponentialDelayRestartBackoffTimeStrategy(initialBackoffMS=1000, 
> maxBackoffMS=300000, backoffMultiplier=2.0, resetBackoffThresholdMS=3600000, 
> jitterFactor=0.5, currentBackoffMS=1000, lastFailureTimestamp=0) for 
> aletsch_wp_wafer_e5730831db8092adb12f5189c4c895ef3a268615 
> (e692c2dfaa18441c0000000000000002).
>     2023-07-06 10:23:47,781 INFO  
> org.apache.flink.runtime.jobmaster.JobMaster                 [] - 
> Initializing job 'aletsch_wp_wafer_e5730831db8092adb12f5189c4c895ef3a268615' 
> (e692c2dfaa18441c0000000000000002).
>     2023-07-06 10:23:47,774 INFO  
> org.apache.flink.runtime.jobmanager.DefaultJobGraphStore     [] - Added 
> JobGraph(jobId: e692c2dfaa18441c0000000000000002) to 
> KubernetesStateHandleStore{configMapName='flink-cluster-aelps-staging-e5730831-cluster-config-map'}.
>     2023-07-06 10:23:47,703 INFO  
> org.apache.flink.runtime.dispatcher.StandaloneDispatcher     [] - Submitting 
> job 'aletsch_wp_wafer_e5730831db8092adb12f5189c4c895ef3a268615' 
> (e692c2dfaa18441c0000000000000002).
>     2023-07-06 10:23:47,702 INFO  
> org.apache.flink.runtime.dispatcher.StandaloneDispatcher     [] - Received 
> JobGraph submission 
> 'aletsch_wp_wafer_e5730831db8092adb12f5189c4c895ef3a268615' 
> (e692c2dfaa18441c0000000000000002).
>     2023-07-06 10:23:47,650 INFO  
> org.apache.flink.client.deployment.application.executors.EmbeddedExecutor [] 
> - Submitting Job with JobId=e692c2dfaa18441c0000000000000002.
>     2023-07-06 10:23:47,650 INFO  
> org.apache.flink.client.deployment.application.executors.EmbeddedExecutor [] 
> - Job e692c2dfaa18441c0000000000000002 is submitted.
>     Flink Operator
>     2023-07-06 10:23:47,973 o.a.f.k.o.s.AbstractFlinkService [INFO 
> ][aelps-staging/aletsch-wp-wafer-staging-e5730831] Submitted job: 
> e692c2dfaa18441c0000000000000002 to session cluster.
>     2023-07-06 10:23:47,505 o.a.f.k.o.s.AbstractFlinkService [INFO 
> ][aelps-staging/aletsch-wp-wafer-staging-e5730831] Submitting job: 
> e692c2dfaa18441c0000000000000002 to session cluster.
>     2023-07-06 10:23:45,374 o.a.f.k.o.s.AbstractFlinkService [INFO 
> ][aelps-staging/aletsch-wp-wafer-staging-e5730831] Submitting job: 
> e692c2dfaa18441c0000000000000002 to session cluster.
> {code}
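
The uid/jobId pairs listed under Details suggest a simple relationship: each
reported jobId equals the first 16 hex digits of the resource uid followed by
0000000000000002. The sketch below only encodes that observed pattern. The
function names and the fixed suffix value are assumptions made for
illustration, inferred from the three pairs in this report and not taken from
the operator's source. It can be used to check which FlinkSessionJob resource
a job ID seen in the dashboard would belong to.

```python
import uuid
from typing import Optional

# uid values copied from the FlinkSessionJob resources listed under Details.
RESOURCE_UIDS = {
    "aletsch_smc": "a1221c74-3367-497b-ad2f-8793ab23919d",
    "aletsch_mat": "a7d36f38-81f9-43a0-898f-19b950430e9d",
    "aletsch_wp_wafer": "e692c2df-aa18-441c-a352-88aefa9a3017",
}


def expected_job_id(resource_uid: str, suffix: int = 2) -> str:
    """Upper 64 bits of the uid plus a 64-bit suffix, each as 16 hex digits.

    The suffix value 2 is an assumption read off the jobIds in the report.
    """
    upper_bits = uuid.UUID(resource_uid).int >> 64
    return f"{upper_bits:016x}{suffix:016x}"


def owner_of(job_id: str, suffix: int = 2) -> Optional[str]:
    """Return the resource whose uid matches the given job ID, if any."""
    for name, uid in RESOURCE_UIDS.items():
        if expected_job_id(uid, suffix) == job_id:
            return name
    return None


if __name__ == "__main__":
    # Job ID shown in the dashboard for the crash-looping aletsch_smc job.
    print(owner_of("a7d36f3881f943a00000000000000002"))
```

Under this assumption, the job ID that the dashboard shows for the
crash-looping aletsch_smc job maps back to the aletsch_mat resource, which is
consistent with the mismatch described above.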


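The symptom described under Problem (several "Received JobGraph submission"
entries with different job names but the same job ID) can be searched for
mechanically. The following is a minimal sketch, assuming the cluster log is
available as unwrapped plain text. The function name and the regular
expression are illustrative, and the sample lines are excerpted from the logs
in this report.

```python
import re
from collections import defaultdict

# Matches: Received JobGraph submission '<job name>' (<32 hex digit job ID>)
SUBMISSION = re.compile(
    r"Received\s+JobGraph\s+submission\s+'([^']+)'\s+\(([0-9a-f]{32})\)"
)


def conflicting_submissions(log_text: str) -> dict:
    """Map each job ID that was submitted under more than one job name
    to the sorted list of those names."""
    names_by_id = defaultdict(set)
    for name, job_id in SUBMISSION.findall(log_text):
        names_by_id[job_id].add(name)
    return {
        job_id: sorted(names)
        for job_id, names in names_by_id.items()
        if len(names) > 1
    }


SAMPLE_LOG = """
2023-07-06 10:23:45,857 INFO org.apache.flink.runtime.dispatcher.StandaloneDispatcher [] - Received JobGraph submission 'aletsch_smc_e5730831db8092adb12f5189c4c895ef3a268615' (a7d36f3881f943a00000000000000002).
2023-07-06 10:23:45,965 INFO org.apache.flink.runtime.dispatcher.StandaloneDispatcher [] - Received JobGraph submission 'aletsch_mat_e5730831db8092adb12f5189c4c895ef3a268615' (a7d36f3881f943a00000000000000002).
2023-07-06 10:23:45,966 INFO org.apache.flink.runtime.dispatcher.StandaloneDispatcher [] - Received JobGraph submission 'aletsch_wp_wafer_e5730831db8092adb12f5189c4c895ef3a268615' (a7d36f3881f943a00000000000000002).
2023-07-06 10:23:47,702 INFO org.apache.flink.runtime.dispatcher.StandaloneDispatcher [] - Received JobGraph submission 'aletsch_wp_wafer_e5730831db8092adb12f5189c4c895ef3a268615' (e692c2dfaa18441c0000000000000002).
"""

if __name__ == "__main__":
    for job_id, names in conflicting_submissions(SAMPLE_LOG).items():
        print(job_id, names)
```

On the excerpt, only job ID a7d36f3881f943a00000000000000002 is flagged, with
the three job names aletsch_mat, aletsch_smc and aletsch_wp_wafer. The job ID
e692c2dfaa18441c0000000000000002 appears under a single name and is not
reported.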

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
