I am using the Flink Kubernetes Operator with Beam + Python. I also struggled with the autoscaler and ended up going with a static configuration, using parallelism as the way to get some scaling.
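
A minimal sketch of what such a static setup could look like in a FlinkDeployment spec, assuming the autoscaler is switched off through `job.autoscaler.enabled` and scaling is done by editing `spec.job.parallelism` by hand. All values below are illustrative placeholders, not settings taken from a real deployment:

```yaml
# Hypothetical fragment of a FlinkDeployment; every value is a placeholder.
spec:
  flinkConfiguration:
    job.autoscaler.enabled: "false"      # keep the autoscaler out of the picture
    taskmanager.numberOfTaskSlots: "2"   # assumed slot count per TaskManager
  taskManager:
    resource:
      cpu: 2                             # fixed CPU per TaskManager
      memory: "4096m"                    # fixed total memory per TaskManager
  job:
    parallelism: 8                       # change this value manually to scale
```

With fixed TaskManager resources, raising `parallelism` makes the operator request more TaskManagers rather than resize the existing pods.
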
Regards,
Pritam

On Thu, Dec 18, 2025, 9:07 AM Sebastian YEPES <[email protected]> wrote:

> Hello All,
>
> Is anyone in the community currently using the Kubernetes operator? It
> would be really helpful to get some insights or assistance with this issue.
>
> If this isn’t the best place for communication or support, could someone
> kindly point me to where I can get help with the operator?
>
> Regards,
> Seb
>
> On Thu, Nov 13, 2025 at 11:42 AM Sebastian YEPES <[email protected]> wrote:
>
>> Hello,
>>
>> I’ve recently started using the Flink Kubernetes Operator with the
>> autoscaler feature and have encountered some OOMKilled issues. From my
>> investigation, it appears that the operator automatically calculates and
>> adjusts memory settings based on the initial configuration and current
>> traffic. While this mechanism works in principle, and I can see the
>> deployments being adjusted automatically as data is processed, I’ve
>> noticed that the autoscaler tends to set the CPU and memory resource
>> limits for Kubernetes pods too low, which results in the pods being killed
>> due to resource overconsumption.
>>
>> The limits are being set almost equal to the total configured memory,
>> without including any additional buffer to provide some leeway.
>>
>> I’ve tried to override or manually set the resource limits for the
>> TaskManager, but these changes don’t seem to take effect.
>>
>> From the perspective of the CRD definition, this configuration is
>> permitted, but it doesn’t appear to be functioning as expected:
>> https://github.com/apache/flink-kubernetes-operator/blob/release-1.13/helm/flink-kubernetes-operator/crds/flinkdeployments.flink.apache.org-v1.yml#L865-L890
>>
>> See attachment for the full example
>>
>>> podTemplate:
>>>   spec:
>>>     containers:
>>>       - name: flink-main-container
>>>         # TODO: Investigate not working
>>>         resources:
>>>           limits:
>>>             cpu: 3.5
>>>             memory: "12Gi"
>>
>> *I have a couple of questions:*
>> - Is this a known issue with the Flink Operator, or could it be a
>>   configuration problem on my end?
>> - Is there currently a way to explicitly define Kubernetes resource
>>   limits for the flink-main-container?
>>
>> *Environment details:*
>> Used FlinkDeployment CRD: See attachment with all the settings
>> (FlinkDeployment-Example.yaml)
>> Flink 2.1.1
>> Flink Operator: 1.13
>> Kubernetes: 1.33
>> Python: 3.12
>>
>> Any insights or suggestions would be greatly appreciated.
>>
>> Thank you!
>> Sebastian YEPES
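
On the question of adding headroom between the configured memory and the pod limit, one option that may be worth testing is Flink's native Kubernetes limit factors, which set the container limit to the requested resource multiplied by a factor. The sketch below assumes these options are honored when set through `flinkConfiguration` and that the autoscaler does not override them; neither point is confirmed in this thread, and the factor values are arbitrary examples:

```yaml
# Hypothetical fragment; the factors are example values, not recommendations.
spec:
  flinkConfiguration:
    # limit = request * factor; Flink's default of 1.0 makes limit equal request
    kubernetes.taskmanager.memory.limit-factor: "1.2"
    kubernetes.taskmanager.cpu.limit-factor: "1.5"
```

If this works as assumed, a TaskManager requesting 10 GiB would get a 12 GiB memory limit, which gives the kind of buffer the original message says is missing.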
