[ 
https://issues.apache.org/jira/browse/FLINK-31924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gyula Fora closed FLINK-31924.
------------------------------
    Resolution: Not A Bug

Based on offline discussion, the problem seems to be related to the job itself, 
not the autoscaler logic.

> [Flink operator] Flink Autoscale - Limit the max number of scale ups
> --------------------------------------------------------------------
>
>                 Key: FLINK-31924
>                 URL: https://issues.apache.org/jira/browse/FLINK-31924
>             Project: Flink
>          Issue Type: Bug
>          Components: Autoscaler, Kubernetes Operator
>    Affects Versions: kubernetes-operator-1.4.0
>            Reporter: Sriram Ganesh
>            Priority: Critical
>
> Found that autoscaling keeps happening even after the job reaches max-parallelism.
> Flink version: 1.17
> Source: Kafka
> Configuration:
>  
> {code:java}
> flinkConfiguration:
>     kubernetes.operator.job.autoscaler.enabled: "true"
>     kubernetes.operator.job.autoscaler.scaling.sources.enabled: "true"
>     kubernetes.operator.job.autoscaler.target.utilization: "0.6"
>     kubernetes.operator.job.autoscaler.target.utilization.boundary: "0.2"
>     kubernetes.operator.job.autoscaler.stabilization.interval: "1m"
>     kubernetes.operator.job.autoscaler.metrics.window: "3m"{code}
> Logs:
> {code:java}
> 2023-04-24 12:29:10,738 o.a.f.k.o.c.FlinkDeploymentController [INFO ][my-namespace/my-pod] Starting reconciliation
> 2023-04-24 12:29:10,740 o.a.f.k.o.s.FlinkResourceContextFactory [INFO ][my-namespace/my-pod] Getting service for my-job
> 2023-04-24 12:29:10,740 o.a.f.k.o.o.JobStatusObserver [INFO ][my-namespace/my-pod] Observing job status
> 2023-04-24 12:29:10,765 o.a.f.k.o.o.JobStatusObserver [INFO ][my-namespace/my-pod] Job status changed from CREATED to RUNNING
> 2023-04-24 12:29:10,870 o.a.f.k.o.l.AuditUtils [INFO ][my-namespace/my-pod] >>> Event  | Info    | JOBSTATUSCHANGED | Job status changed from CREATED to RUNNING
> 2023-04-24 12:29:10,938 o.a.f.k.o.l.AuditUtils [INFO ][my-namespace/my-pod] >>> Status | Info    | STABLE          | The resource deployment is considered to be stable and won’t be rolled back
> 2023-04-24 12:29:10,986 o.a.f.k.o.a.ScalingMetricCollector [INFO ][my-namespace/my-pod] Skipping metric collection during stabilization period until 2023-04-24T12:30:10.765Z
> 2023-04-24 12:29:10,986 o.a.f.k.o.r.d.AbstractFlinkResourceReconciler [INFO ][my-namespace/my-pod] Resource fully reconciled, nothing to do...
> 2023-04-24 12:29:10,986 o.a.f.k.o.c.FlinkDeploymentController [INFO ][my-namespace/my-pod] End of reconciliation
> 2023-04-24 12:29:25,991 o.a.f.k.o.c.FlinkDeploymentController [INFO ][my-namespace/my-pod] Starting reconciliation
> 2023-04-24 12:29:25,992 o.a.f.k.o.s.FlinkResourceContextFactory [INFO ][my-namespace/my-pod] Getting service for my-job
> 2023-04-24 12:29:25,992 o.a.f.k.o.o.JobStatusObserver [INFO ][my-namespace/my-pod] Observing job status
> 2023-04-24 12:29:26,005 o.a.f.k.o.o.JobStatusObserver [INFO ][my-namespace/my-pod] Job status (RUNNING) unchanged
> 2023-04-24 12:29:26,053 o.a.f.k.o.a.ScalingMetricCollector [INFO ][my-namespace/my-pod] Skipping metric collection during stabilization period until 2023-04-24T12:30:10.765Z
> 2023-04-24 12:29:26,054 o.a.f.k.o.r.d.AbstractFlinkResourceReconciler [INFO ][my-namespace/my-pod] Resource fully reconciled, nothing to do...
> 2023-04-24 12:29:26,054 o.a.f.k.o.c.FlinkDeploymentController [INFO ][my-namespace/my-pod] End of reconciliation
> 2023-04-24 12:29:41,059 o.a.f.k.o.c.FlinkDeploymentController [INFO ][my-namespace/my-pod] Starting reconciliation
> 2023-04-24 12:29:41,060 o.a.f.k.o.s.FlinkResourceContextFactory [INFO ][my-namespace/my-pod] Getting service for my-job
> 2023-04-24 12:29:41,061 o.a.f.k.o.o.JobStatusObserver [INFO ][my-namespace/my-pod] Observing job status
> 2023-04-24 12:29:41,075 o.a.f.k.o.o.JobStatusObserver [INFO ][my-namespace/my-pod] Job status (RUNNING) unchanged
> 2023-04-24 12:29:41,116 o.a.f.k.o.a.ScalingMetricCollector [INFO ][my-namespace/my-pod] Skipping metric collection during stabilization period until 2023-04-24T12:30:10.765Z
> 2023-04-24 12:29:41,116 o.a.f.k.o.r.d.AbstractFlinkResourceReconciler [INFO ][my-namespace/my-pod] Resource fully reconciled, nothing to do...
> 2023-04-24 12:29:41,116 o.a.f.k.o.c.FlinkDeploymentController [INFO ][my-namespace/my-pod] End of reconciliation
> 2023-04-24 12:29:56,121 o.a.f.k.o.c.FlinkDeploymentController [INFO ][my-namespace/my-pod] Starting reconciliation
> 2023-04-24 12:29:56,122 o.a.f.k.o.s.FlinkResourceContextFactory [INFO ][my-namespace/my-pod] Getting service for my-job
> 2023-04-24 12:29:56,122 o.a.f.k.o.o.JobStatusObserver [INFO ][my-namespace/my-pod] Observing job status
> 2023-04-24 12:29:56,134 o.a.f.k.o.o.JobStatusObserver [INFO ][my-namespace/my-pod] Job status (RUNNING) unchanged
> 2023-04-24 12:29:56,178 o.a.f.k.o.a.ScalingMetricCollector [INFO ][my-namespace/my-pod] Skipping metric collection during stabilization period until 2023-04-24T12:30:10.765Z
> 2023-04-24 12:29:56,179 o.a.f.k.o.r.d.AbstractFlinkResourceReconciler [INFO ][my-namespace/my-pod] Resource fully reconciled, nothing to do...
> 2023-04-24 12:29:56,179 o.a.f.k.o.c.FlinkDeploymentController [INFO ][my-namespace/my-pod] End of reconciliation
> 2023-04-24 12:30:11,183 o.a.f.k.o.c.FlinkDeploymentController [INFO ][my-namespace/my-pod] Starting reconciliation
> 2023-04-24 12:30:11,184 o.a.f.k.o.s.FlinkResourceContextFactory [INFO ][my-namespace/my-pod] Getting service for my-job
> 2023-04-24 12:30:11,184 o.a.f.k.o.o.JobStatusObserver [INFO ][my-namespace/my-pod] Observing job status
> 2023-04-24 12:30:11,193 o.a.f.k.o.o.JobStatusObserver [INFO ][my-namespace/my-pod] Job status (RUNNING) unchanged
> 2023-04-24 12:30:11,367 o.a.f.k.o.a.m.ScalingMetrics [ERROR][my-namespace/my-pod] Cannot compute source target data rate without numRecordsInPerSecond and pendingRecords (lag) metric for e5a72f353fc1e6bbf3bd96a41384998c.
> 2023-04-24 12:30:11,370 o.a.f.k.o.a.ScalingMetricCollector [INFO ][my-namespace/my-pod] Waiting until 2023-04-24T12:33:10.765Z so the initial metric window is full before starting scaling
> 2023-04-24 12:30:11,370 o.a.f.k.o.r.d.AbstractFlinkResourceReconciler [INFO ][my-namespace/my-pod] Resource fully reconciled, nothing to do...
> 2023-04-24 12:30:11,370 o.a.f.k.o.c.FlinkDeploymentController [INFO ][my-namespace/my-pod] End of reconciliation
> 2023-04-24 12:30:26,374 o.a.f.k.o.c.FlinkDeploymentController [INFO ][my-namespace/my-pod] Starting reconciliation
> 2023-04-24 12:30:26,375 o.a.f.k.o.s.FlinkResourceContextFactory [INFO ][my-namespace/my-pod] Getting service for my-job
> 2023-04-24 12:30:26,376 o.a.f.k.o.o.JobStatusObserver [INFO ][my-namespace/my-pod] Observing job status
> 2023-04-24 12:30:26,385 o.a.f.k.o.o.JobStatusObserver [INFO ][my-namespace/my-pod] Job status (RUNNING) unchanged
> 2023-04-24 12:30:26,542 o.a.f.k.o.a.m.ScalingMetrics [ERROR][my-namespace/my-pod] Cannot compute source target data rate without numRecordsInPerSecond and pendingRecords (lag) metric for e5a72f353fc1e6bbf3bd96a41384998c.
> 2023-04-24 12:30:26,543 o.a.f.k.o.a.ScalingMetricCollector [INFO ][my-namespace/my-pod] Waiting until 2023-04-24T12:33:10.765Z so the initial metric window is full before starting scaling
> 2023-04-24 12:30:26,543 o.a.f.k.o.r.d.AbstractFlinkResourceReconciler [INFO ][my-namespace/my-pod] Resource fully reconciled, nothing to do...
> 2023-04-24 12:30:26,544 o.a.f.k.o.c.FlinkDeploymentController [INFO ][my-namespace/my-pod] End of reconciliation
> {code}
>  
>  
>  
>  
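
For readers following the quoted log: the two "until" timestamps appear to line up with the configured stabilization.interval (1m) and metrics.window (3m). The sketch below only redoes that arithmetic. It assumes both intervals are counted from the 12:29:10,765 "CREATED to RUNNING" log entry, which is inferred from the log and not confirmed against the operator source. All variable and function names are illustrative.

```python
from datetime import datetime, timedelta

# Timestamp of the "Job status changed from CREATED to RUNNING" log entry.
# Assumption: the operator anchors both waits on this moment.
job_running_at = datetime(2023, 4, 24, 12, 29, 10, 765000)

# Values copied from the flinkConfiguration in the issue.
stabilization_interval = timedelta(minutes=1)  # ...autoscaler.stabilization.interval
metrics_window = timedelta(minutes=3)          # ...autoscaler.metrics.window


def iso_millis(ts: datetime) -> str:
    """Format a timestamp the way it appears in the log messages."""
    return ts.isoformat(timespec="milliseconds") + "Z"


# Metric collection is skipped until the stabilization period ends.
stabilization_end = job_running_at + stabilization_interval

# Scaling waits until one full metric window has been collected after that.
first_full_window = stabilization_end + metrics_window

print(iso_millis(stabilization_end))   # compare with "stabilization period until ..."
print(iso_millis(first_full_window))   # compare with "Waiting until ..."
```

Under that assumption the computed values match the timestamps in the log, so the excerpt ends before the autoscaler has evaluated any scaling decision.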



--
This message was sent by Atlassian Jira
(v8.20.10#820010)