Hi all,

I hope this message finds you well. I'm currently running PyFlink on Kubernetes with the following setup:

* Flink Operator is deployed in namespace-A (version 1.20.1)
* FlinkDeployment is created in namespace-B
* FlinkSessionJob is used to run various jobs, with upgradeMode set to last-state
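For reference, here is a simplified sketch of one such FlinkSessionJob. The names, paths, and jar location are illustrative placeholders, not my exact manifest:

apiVersion: flink.apache.org/v1beta1
kind: FlinkSessionJob
metadata:
  name: my-pyflink-job              # placeholder name
  namespace: namespace-B
spec:
  deploymentName: session-cluster   # the FlinkDeployment in namespace-B
  job:
    # PyFlink jobs are submitted through the PythonDriver entry class
    # shipped in the flink-python jar of the Flink distribution
    jarURI: local:///opt/flink/opt/flink-python-1.20.1.jar
    entryClass: org.apache.flink.client.python.PythonDriver
    args: ["-py", "/mnt/scripts/my_job.py"]   # script uploaded to the PVC
    parallelism: 2
    upgradeMode: last-state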
To update a PyFlink script, I upload the new version to a folder on a PVC and then delete the existing job to trigger a sync via ArgoCD. However, I'm encountering several issues during this process:

1. The job with the same name continues to run, and multiple new jobs are created unexpectedly.
2. I tried changing restartNonce to a random value to force a restart, but the new script doesn't seem to be picked up; it appears the job isn't actually restarted.
3. I also attempted to cancel the job using flink cancel, but new jobs keep spawning and the previous job ID is never removed.

It seems that Flink is not properly cleaning up the previous job or reloading the updated script as expected.

Questions:

* What is the correct way to restart a FlinkSessionJob so that the updated PyFlink script is used?
* Is there a recommended approach to managing last-state upgrade mode that avoids duplicate or stuck jobs?
* Do I need to handle job IDs or state cleanup manually in this scenario?

Any insights or recommendations would be greatly appreciated.

Best regards,
Pachara Aryuyuen (Kas)
Cloud Platform Engineer
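P.S. In case it helps, the restartNonce attempt (issue 2 above) was a simple spec change along these lines; the value itself is arbitrary, only the fact that it changes matters:

spec:
  restartNonce: 20250101   # hypothetical value, bumped on every sync

and the manual cancellation (issue 3) was a plain "flink cancel <job-id>" run against the session cluster.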