Hi all. We run into problems when we try to create a group of Flink Session Jobs. The Operator first hits some timeouts, and most jobs go into the RECONCILING state. They do eventually reach the RUNNING state, but when we inspect the Flink UI, we can see that some of the jobs are duplicates. The K8s Operator mixed up
So, for instance, we would see:

% kubectl get FlinkSessionJobs
NAME                                    JOB STATUS   LIFECYCLE STATE
cache-cdc-dynamic-ims-config-entities   RUNNING      STABLE
cache-cdc-equipment                     RUNNING      STABLE
cache-cdc-floc                          RUNNING      STABLE
cache-cdc-maintenance-entities          RUNNING      STABLE
cache-cdc-notification-entities         RUNNING      STABLE
cache-cdc-schedule                      RUNNING      STABLE
cache-cdc-static-entities               RUNNING      STABLE
cache-cdc-task-list-entities            RUNNING      STABLE
cache-cdc-work-order-entities           RUNNING      STABLE

But listing the jobs shows duplicates.

# flink list
------------------ Running/Restarting Jobs -------------------
10.09.2025 09:36:26 : d0f18a972db81a49763dfddc59eff21d : CACHE - CDC Static Entities (RUNNING)
10.09.2025 09:36:28 : b045747e9f83a5dfb99f306838d23346 : CACHE - CDC Equipment (RUNNING)
10.09.2025 09:36:42 : 57874e342a9386ad5c022d15da947508 : CACHE - CDC Notification entities (RUNNING)
10.09.2025 09:36:49 : 8dce7b2ffb701b619938ab4989f18aba : CACHE - CDC Maintenance entities (RUNNING)
10.09.2025 09:36:51 : bac0c76a4c48e207b065576f80750e8d : CACHE - CDC FLOC (RUNNING)
10.09.2025 09:37:21 : 4e065fa31fcd640c18b8d2f9d832a9ea : CACHE - CDC Maintenance entities (RUNNING)
10.09.2025 09:37:30 : 3ff775f7bda5b83d26b43ecfaddbf030 : CACHE - CDC Schedule (RUNNING)
10.09.2025 09:38:02 : cc7c25a0ed3c1dbf4ea28c9b31ae328d : CACHE - CDC Schedule (RUNNING)
10.09.2025 09:38:42 : 7ea9c0ff50499003e669e27d9d15767f : CACHE - CDC Work Order entities (RUNNING)
10.09.2025 09:39:07 : 119ef8358eebfbe7ff0fa7223ab2e22d : CACHE - CDC Work Order entities (RUNNING)
10.09.2025 09:39:17 : c00161bf832557f7afac66cd8d32b3ac : CACHE - CDC Work Order entities (RUNNING)
--------------------------------------------------------------
No scheduled jobs.

So, the job whose name is in bold is the correct one (the K8s resource and the Flink job match); the other two jobs are attached to the wrong K8s resources.
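In case it helps anyone reproduce the check, here is a minimal Python sketch for spotting the duplicates in a `flink list` dump. It assumes the line format is exactly the one in the listing above (start time, 32-character job ID, job name, status in parentheses); the function name `find_duplicate_jobs` and the embedded sample are only illustrative, not part of Flink or the Operator.

```python
import re
from collections import defaultdict

# Matches one job line of `flink list` output, assuming the format shown above:
# "10.09.2025 09:37:30 : <32 hex chars> : <job name> (<STATUS>)"
JOB_LINE = re.compile(
    r"^\s*(?P<started>\d{2}\.\d{2}\.\d{4} \d{2}:\d{2}:\d{2})"
    r" : (?P<job_id>[0-9a-f]{32})"
    r" : (?P<name>.+?) \((?P<status>[A-Z_]+)\)\s*$"
)


def find_duplicate_jobs(listing: str) -> dict[str, list[str]]:
    """Return {job name: [job IDs]} for every name that appears more than once."""
    ids_by_name = defaultdict(list)
    for line in listing.splitlines():
        match = JOB_LINE.match(line)
        if match:  # separator and "No scheduled jobs." lines are skipped
            ids_by_name[match.group("name")].append(match.group("job_id"))
    return {name: ids for name, ids in ids_by_name.items() if len(ids) > 1}


# Three lines copied from the listing above, used as sample input.
SAMPLE = """\
10.09.2025 09:36:51 : bac0c76a4c48e207b065576f80750e8d : CACHE - CDC FLOC (RUNNING)
10.09.2025 09:37:30 : 3ff775f7bda5b83d26b43ecfaddbf030 : CACHE - CDC Schedule (RUNNING)
10.09.2025 09:38:02 : cc7c25a0ed3c1dbf4ea28c9b31ae328d : CACHE - CDC Schedule (RUNNING)
"""

for name, ids in find_duplicate_jobs(SAMPLE).items():
    print(f"{name}: {len(ids)} jobs -> {', '.join(ids)}")
```

This only groups jobs by display name, so it tells us which names are duplicated but not which of the job IDs is the one the matching K8s resource should own.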
So, for instance, FlinkSessionJob/cache-cdc-schedule references the correct parameters in its “spec” section, but its “status” section shows another job entirely. There is some sort of mix-up.

What can we do to at least mitigate this? Creating the K8s resources one by one seems to work. Can we set the operator config option “kubernetes.operator.reconcile.parallelism” to 1 to force the jobs to launch one at a time?

Nikola.
