Re: Questions Regarding Flink K8s Operator Autoscaler and Resource Limits

Pritam Dodeja Thu, 18 Dec 2025 08:57:58 -0800

I am using the Flink operator with Beam + Python. I also struggled with
the autoscaler, and ended up going with a static configuration and using
parallelism as a way to get some scaling.


Regards,

Pritam

On Thu, Dec 18, 2025, 9:07 AM Sebastian YEPES <[email protected]> wrote:

> Hello All,
>
> Is anyone in the community currently using the Kubernetes operator? It
> would be really helpful to get some insights or assistance with this issue.
>
> If this isn’t the best place for communication or support, could someone
> kindly point me to where I can get help with the operator?
>
> Regards,
> Seb
>
> On Thu, Nov 13, 2025 at 11:42 AM Sebastian YEPES <[email protected]> wrote:
>
>> Hello,
>>
>> I’ve recently started using the Flink Kubernetes Operator with the
>> autoscaler feature and have encountered some OOMKilled issues. From my
>> investigation, it appears that the operator automatically calculates and
>> adjusts memory settings based on the initial configuration and current
>> traffic. This mechanism works in principle, and I can see the
>> deployments being adjusted automatically as data is processed.
>> However, I’ve noticed that the autoscaler tends to set the CPU and memory
>> resource limits for Kubernetes pods too low, which results in the pods
>> being killed due to resource overconsumption.
>>
>> The limits are being set almost equal to the total configured memory,
>> without including any additional buffer to provide some leeway.
>>
>> I’ve tried to override or manually set the resource limits for the
>> TaskManager, but these changes don’t seem to take effect.
>> From the perspective of the CRD definition, this configuration is
>> permitted, but it doesn’t appear to be functioning as expected:
>> https://github.com/apache/flink-kubernetes-operator/blob/release-1.13/helm/flink-kubernetes-operator/crds/flinkdeployments.flink.apache.org-v1.yml#L865-L890
>>
>> See attachment for the full example
>>
>>>   podTemplate:
>>>     spec:
>>>       containers:
>>>         - name: flink-main-container
>>>           # TODO: Investigate not working
>>>           resources:
>>>             limits:
>>>               cpu: 3.5
>>>               memory: "12Gi"
>>
>>
>>
>> *I have a couple of questions:*
>> - Is this a known issue with the Flink Operator, or could it be a
>> configuration problem on my end?
>> - Is there currently a way to explicitly define Kubernetes resource
>> limits for the flink-main-container?
>>
>> *Environment details:*
>>  Used FlinkDeployment CRD: See attachment with all the settings (
>> FlinkDeployment-Example.yaml)
>>  Flink 2.1.1
>>  Flink Operator: 1.13
>>  Kubernetes: 1.33
>>  Python: 3.12
>>
>>
>> Any insights or suggestions would be greatly appreciated.
>>
>> Thank you!
>>  Sebastian YEPES
>>
>>
