[ https://issues.apache.org/jira/browse/FLINK-31924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gyula Fora closed FLINK-31924.
------------------------------
    Resolution: Not A Bug

Based on offline discussion the problem seems to be related to the job itself not the autoscaler logic

> [Flink operator] Flink Autoscale - Limit the max number of scale ups
> --------------------------------------------------------------------
>
>                 Key: FLINK-31924
>                 URL: https://issues.apache.org/jira/browse/FLINK-31924
>             Project: Flink
>          Issue Type: Bug
>          Components: Autoscaler, Kubernetes Operator
>    Affects Versions: kubernetes-operator-1.4.0
>            Reporter: Sriram Ganesh
>            Priority: Critical
>
> Found that Autoscale keeps happening even after reaching max-parallelism.
> Flink version: 1.17
> Source: Kafka
> Configuration:
>
> {code:java}
> flinkConfiguration:
>   kubernetes.operator.job.autoscaler.enabled: "true"
>   kubernetes.operator.job.autoscaler.scaling.sources.enabled: "true"
>   kubernetes.operator.job.autoscaler.target.utilization: "0.6"
>   kubernetes.operator.job.autoscaler.target.utilization.boundary: "0.2"
>   kubernetes.operator.job.autoscaler.stabilization.interval: "1m"
>   kubernetes.operator.job.autoscaler.metrics.window: "3m"{code}
> Logs:
> {code:java}
> 2023-04-24 12:29:10,738 o.a.f.k.o.c.FlinkDeploymentController [INFO ][my-namespace/my-pod] Starting reconciliation
> 2023-04-24 12:29:10,740 o.a.f.k.o.s.FlinkResourceContextFactory [INFO ][my-namespace/my-pod] Getting service for my-job
> 2023-04-24 12:29:10,740 o.a.f.k.o.o.JobStatusObserver [INFO ][my-namespace/my-pod] Observing job status
> 2023-04-24 12:29:10,765 o.a.f.k.o.o.JobStatusObserver [INFO ][my-namespace/my-pod] Job status changed from CREATED to RUNNING
> 2023-04-24 12:29:10,870 o.a.f.k.o.l.AuditUtils [INFO ][my-namespace/my-pod] >>> Event | Info | JOBSTATUSCHANGED | Job status changed from CREATED to RUNNING
> 2023-04-24 12:29:10,938 o.a.f.k.o.l.AuditUtils [INFO ][my-namespace/my-pod] >>> Status | Info | STABLE | The resource deployment is considered to be stable and won’t be rolled back
> 2023-04-24 12:29:10,986 o.a.f.k.o.a.ScalingMetricCollector [INFO ][my-namespace/my-pod] Skipping metric collection during stabilization period until 2023-04-24T12:30:10.765Z
> 2023-04-24 12:29:10,986 o.a.f.k.o.r.d.AbstractFlinkResourceReconciler [INFO ][my-namespace/my-pod] Resource fully reconciled, nothing to do...
> 2023-04-24 12:29:10,986 o.a.f.k.o.c.FlinkDeploymentController [INFO ][my-namespace/my-pod] End of reconciliation
> 2023-04-24 12:29:25,991 o.a.f.k.o.c.FlinkDeploymentController [INFO ][my-namespace/my-pod] Starting reconciliation
> 2023-04-24 12:29:25,992 o.a.f.k.o.s.FlinkResourceContextFactory [INFO ][my-namespace/my-pod] Getting service for my-job
> 2023-04-24 12:29:25,992 o.a.f.k.o.o.JobStatusObserver [INFO ][my-namespace/my-pod] Observing job status
> 2023-04-24 12:29:26,005 o.a.f.k.o.o.JobStatusObserver [INFO ][my-namespace/my-pod] Job status (RUNNING) unchanged
> 2023-04-24 12:29:26,053 o.a.f.k.o.a.ScalingMetricCollector [INFO ][my-namespace/my-pod] Skipping metric collection during stabilization period until 2023-04-24T12:30:10.765Z
> 2023-04-24 12:29:26,054 o.a.f.k.o.r.d.AbstractFlinkResourceReconciler [INFO ][my-namespace/my-pod] Resource fully reconciled, nothing to do...
> 2023-04-24 12:29:26,054 o.a.f.k.o.c.FlinkDeploymentController [INFO ][my-namespace/my-pod] End of reconciliation
> 2023-04-24 12:29:41,059 o.a.f.k.o.c.FlinkDeploymentController [INFO ][my-namespace/my-pod] Starting reconciliation
> 2023-04-24 12:29:41,060 o.a.f.k.o.s.FlinkResourceContextFactory [INFO ][my-namespace/my-pod] Getting service for my-job
> 2023-04-24 12:29:41,061 o.a.f.k.o.o.JobStatusObserver [INFO ][my-namespace/my-pod] Observing job status
> 2023-04-24 12:29:41,075 o.a.f.k.o.o.JobStatusObserver [INFO ][my-namespace/my-pod] Job status (RUNNING) unchanged
> 2023-04-24 12:29:41,116 o.a.f.k.o.a.ScalingMetricCollector [INFO ][my-namespace/my-pod] Skipping metric collection during stabilization period until 2023-04-24T12:30:10.765Z
> 2023-04-24 12:29:41,116 o.a.f.k.o.r.d.AbstractFlinkResourceReconciler [INFO ][my-namespace/my-pod] Resource fully reconciled, nothing to do...
> 2023-04-24 12:29:41,116 o.a.f.k.o.c.FlinkDeploymentController [INFO ][my-namespace/my-pod] End of reconciliation
> 2023-04-24 12:29:56,121 o.a.f.k.o.c.FlinkDeploymentController [INFO ][my-namespace/my-pod] Starting reconciliation
> 2023-04-24 12:29:56,122 o.a.f.k.o.s.FlinkResourceContextFactory [INFO ][my-namespace/my-pod] Getting service for my-job
> 2023-04-24 12:29:56,122 o.a.f.k.o.o.JobStatusObserver [INFO ][my-namespace/my-pod] Observing job status
> 2023-04-24 12:29:56,134 o.a.f.k.o.o.JobStatusObserver [INFO ][my-namespace/my-pod] Job status (RUNNING) unchanged
> 2023-04-24 12:29:56,178 o.a.f.k.o.a.ScalingMetricCollector [INFO ][my-namespace/my-pod] Skipping metric collection during stabilization period until 2023-04-24T12:30:10.765Z
> 2023-04-24 12:29:56,179 o.a.f.k.o.r.d.AbstractFlinkResourceReconciler [INFO ][my-namespace/my-pod] Resource fully reconciled, nothing to do...
> 2023-04-24 12:29:56,179 o.a.f.k.o.c.FlinkDeploymentController [INFO ][my-namespace/my-pod] End of reconciliation
> 2023-04-24 12:30:11,183 o.a.f.k.o.c.FlinkDeploymentController [INFO ][my-namespace/my-pod] Starting reconciliation
> 2023-04-24 12:30:11,184 o.a.f.k.o.s.FlinkResourceContextFactory [INFO ][my-namespace/my-pod] Getting service for my-job
> 2023-04-24 12:30:11,184 o.a.f.k.o.o.JobStatusObserver [INFO ][my-namespace/my-pod] Observing job status
> 2023-04-24 12:30:11,193 o.a.f.k.o.o.JobStatusObserver [INFO ][my-namespace/my-pod] Job status (RUNNING) unchanged
> 2023-04-24 12:30:11,367 o.a.f.k.o.a.m.ScalingMetrics [ERROR][my-namespace/my-pod] Cannot compute source target data rate without numRecordsInPerSecond and pendingRecords (lag) metric for e5a72f353fc1e6bbf3bd96a41384998c.
> 2023-04-24 12:30:11,370 o.a.f.k.o.a.ScalingMetricCollector [INFO ][my-namespace/my-pod] Waiting until 2023-04-24T12:33:10.765Z so the initial metric window is full before starting scaling
> 2023-04-24 12:30:11,370 o.a.f.k.o.r.d.AbstractFlinkResourceReconciler [INFO ][my-namespace/my-pod] Resource fully reconciled, nothing to do...
> 2023-04-24 12:30:11,370 o.a.f.k.o.c.FlinkDeploymentController [INFO ][my-namespace/my-pod] End of reconciliation
> 2023-04-24 12:30:26,374 o.a.f.k.o.c.FlinkDeploymentController [INFO ][my-namespace/my-pod] Starting reconciliation
> 2023-04-24 12:30:26,375 o.a.f.k.o.s.FlinkResourceContextFactory [INFO ][my-namespace/my-pod] Getting service for my-job
> 2023-04-24 12:30:26,376 o.a.f.k.o.o.JobStatusObserver [INFO ][my-namespace/my-pod] Observing job status
> 2023-04-24 12:30:26,385 o.a.f.k.o.o.JobStatusObserver [INFO ][my-namespace/my-pod] Job status (RUNNING) unchanged
> 2023-04-24 12:30:26,542 o.a.f.k.o.a.m.ScalingMetrics [ERROR][my-namespace/my-pod] Cannot compute source target data rate without numRecordsInPerSecond and pendingRecords (lag) metric for e5a72f353fc1e6bbf3bd96a41384998c.
> 2023-04-24 12:30:26,543 o.a.f.k.o.a.ScalingMetricCollector [INFO ][my-namespace/my-pod] Waiting until 2023-04-24T12:33:10.765Z so the initial metric window is full before starting scaling
> 2023-04-24 12:30:26,543 o.a.f.k.o.r.d.AbstractFlinkResourceReconciler [INFO ][my-namespace/my-pod] Resource fully reconciled, nothing to do...
> 2023-04-24 12:30:26,544 o.a.f.k.o.c.FlinkDeploymentController [INFO ][my-namespace/my-pod] End of reconciliation{code}
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)