[jira] [Commented] (FLINK-31924) [Flink operator] Flink Autoscale - Limit the max number of scale ups

2023-04-25 Thread Sriram Ganesh (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17716186#comment-17716186
 ] 

Sriram Ganesh commented on FLINK-31924:
---

The issue remains the same. I tried with the main branch.

> [Flink operator] Flink Autoscale - Limit the max number of scale ups
> 
>
> Key: FLINK-31924
> URL: https://issues.apache.org/jira/browse/FLINK-31924
> Project: Flink
> Issue Type: Bug
> Components: Autoscaler, Kubernetes Operator
> Affects Versions: kubernetes-operator-1.4.0
> Reporter: Sriram Ganesh
> Priority: Critical
>
> Found that Autoscale keeps happening even after reaching max-parallelism.
> Flink version: 1.17
> Source: Kafka
> Configuration:
>  
> {code:java}
> flinkConfiguration:
>     kubernetes.operator.job.autoscaler.enabled: "true"
>     kubernetes.operator.job.autoscaler.scaling.sources.enabled: "true"
>     kubernetes.operator.job.autoscaler.target.utilization: "0.6"
>     kubernetes.operator.job.autoscaler.target.utilization.boundary: "0.2"
>     kubernetes.operator.job.autoscaler.stabilization.interval: "1m"
>     kubernetes.operator.job.autoscaler.metrics.window: "3m"
> {code}
> Logs:
> {code:java}
> 2023-04-24 12:29:10,738 o.a.f.k.o.c.FlinkDeploymentController [INFO ][my-namespace/my-pod] Starting reconciliation
> 2023-04-24 12:29:10,740 o.a.f.k.o.s.FlinkResourceContextFactory [INFO ][my-namespace/my-pod] Getting service for my-job
> 2023-04-24 12:29:10,740 o.a.f.k.o.o.JobStatusObserver [INFO ][my-namespace/my-pod] Observing job status
> 2023-04-24 12:29:10,765 o.a.f.k.o.o.JobStatusObserver [INFO ][my-namespace/my-pod] Job status changed from CREATED to RUNNING
> 2023-04-24 12:29:10,870 o.a.f.k.o.l.AuditUtils [INFO ][my-namespace/my-pod] >>> Event | Info | JOBSTATUSCHANGED | Job status changed from CREATED to RUNNING
> 2023-04-24 12:29:10,938 o.a.f.k.o.l.AuditUtils [INFO ][my-namespace/my-pod] >>> Status | Info | STABLE | The resource deployment is considered to be stable and won’t be rolled back
> 2023-04-24 12:29:10,986 o.a.f.k.o.a.ScalingMetricCollector [INFO ][my-namespace/my-pod] Skipping metric collection during stabilization period until 2023-04-24T12:30:10.765Z
> 2023-04-24 12:29:10,986 o.a.f.k.o.r.d.AbstractFlinkResourceReconciler [INFO ][my-namespace/my-pod] Resource fully reconciled, nothing to do...
> 2023-04-24 12:29:10,986 o.a.f.k.o.c.FlinkDeploymentController [INFO ][my-namespace/my-pod] End of reconciliation
> 2023-04-24 12:29:25,991 o.a.f.k.o.c.FlinkDeploymentController [INFO ][my-namespace/my-pod] Starting reconciliation
> 2023-04-24 12:29:25,992 o.a.f.k.o.s.FlinkResourceContextFactory [INFO ][my-namespace/my-pod] Getting service for my-job
> 2023-04-24 12:29:25,992 o.a.f.k.o.o.JobStatusObserver [INFO ][my-namespace/my-pod] Observing job status
> 2023-04-24 12:29:26,005 o.a.f.k.o.o.JobStatusObserver [INFO ][my-namespace/my-pod] Job status (RUNNING) unchanged
> 2023-04-24 12:29:26,053 o.a.f.k.o.a.ScalingMetricCollector [INFO ][my-namespace/my-pod] Skipping metric collection during stabilization period until 2023-04-24T12:30:10.765Z
> 2023-04-24 12:29:26,054 o.a.f.k.o.r.d.AbstractFlinkResourceReconciler [INFO ][my-namespace/my-pod] Resource fully reconciled, nothing to do...
> 2023-04-24 12:29:26,054 o.a.f.k.o.c.FlinkDeploymentController [INFO ][my-namespace/my-pod] End of reconciliation
> 2023-04-24 12:29:41,059 o.a.f.k.o.c.FlinkDeploymentController [INFO ][my-namespace/my-pod] Starting reconciliation
> 2023-04-24 12:29:41,060 o.a.f.k.o.s.FlinkResourceContextFactory [INFO ][my-namespace/my-pod] Getting service for my-job
> 2023-04-24 12:29:41,061 o.a.f.k.o.o.JobStatusObserver [INFO ][my-namespace/my-pod] Observing job status
> 2023-04-24 12:29:41,075 o.a.f.k.o.o.JobStatusObserver [INFO ][my-namespace/my-pod] Job status (RUNNING) unchanged
> 2023-04-24 12:29:41,116 o.a.f.k.o.a.ScalingMetricCollector [INFO ][my-namespace/my-pod] Skipping metric collection during stabilization period until 2023-04-24T12:30:10.765Z
> 2023-04-24 12:29:41,116 o.a.f.k.o.r.d.AbstractFlinkResourceReconciler [INFO ][my-namespace/my-pod] Resource fully reconciled, nothing to do...
> 2023-04-24 12:29:41,116 o.a.f.k.o.c.FlinkDeploymentController [INFO ][my-namespace/my-pod] End of reconciliation
> 2023-04-24 12:29:56,121 o.a.f.k.o.c.FlinkDeploymentController [INFO ][my-namespace/my-pod] Starting reconciliation
> 2023-04-24 12:29:56,122 o.a.f.k.o.s.FlinkResourceContextFactory [INFO ][my-namespace/my-pod] Getting service for my-job
> 2023-04-24 12:29:56,122 o.a.f.k.o.o.JobStatusObserver [INFO ][my-namespace/my-pod] Observing job status
> 2023-04-24 12:29:56,134 o.a.f.k.o.o.JobStatusObserver [INFO ][my-namespace/my-pod] Job status (RUNNING) unchanged
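
The stabilization deadline reported in the quoted log is consistent with the
quoted configuration: the job turned RUNNING at 12:29:10,765 and
stabilization.interval is "1m", which gives 12:30:10.765. The Python sketch
below reproduces that arithmetic and checks that every reconciliation in the
excerpt falls before the deadline, so skipped metric collection is expected
there. The helper names parse_duration and stabilization_end are hypothetical
and not part of the operator, and the parser only handles simple values such
as "1m" or "3m".

```python
from datetime import datetime, timedelta

# Hypothetical helper: supports only "<integer><s|m|h>" values like "1m".
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours"}


def parse_duration(text: str) -> timedelta:
    value, unit = int(text[:-1]), text[-1]
    return timedelta(**{_UNITS[unit]: value})


def stabilization_end(running_since: datetime, interval: str) -> datetime:
    """Deadline until which metric collection would be skipped (assumed rule)."""
    return running_since + parse_duration(interval)


# Timestamps copied from the quoted log.
running_since = datetime(2023, 4, 24, 12, 29, 10, 765000)
reconciliations = [
    datetime(2023, 4, 24, 12, 29, 10, 738000),
    datetime(2023, 4, 24, 12, 29, 25, 991000),
    datetime(2023, 4, 24, 12, 29, 41, 59000),
    datetime(2023, 4, 24, 12, 29, 56, 121000),
]

end = stabilization_end(running_since, "1m")
print(end.isoformat(timespec="milliseconds"))

# True means the reconciliation started inside the stabilization period.
in_stabilization = [ts < end for ts in reconciliations]
print(in_stabilization)
```

Under these assumptions the excerpt ends before the autoscaler would start
collecting metrics, so it cannot show any scaling decision by itself.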

[jira] [Commented] (FLINK-31924) [Flink operator] Flink Autoscale - Limit the max number of scale ups

2023-04-24 Thread Maximilian Michels (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17715824#comment-17715824
 ] 

Maximilian Michels commented on FLINK-31924:


Could you clarify what the issue is here? The logs don't indicate a problem.
The autoscaler will continue to run even after reaching the max parallelism.
The max parallelism is per vertex. There may be other vertices which still get
scaled.
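
To illustrate the per-vertex point, here is a minimal Python sketch. It is not
the operator's code: the function plan_scaling, the vertex names, and all
numbers are hypothetical, and the rule used (a desired parallelism clamped to
each vertex's own max parallelism) is a deliberate simplification. It shows how
one vertex can sit at its cap while another is still rescaled, so autoscaler
activity continues.

```python
from dataclasses import dataclass


@dataclass
class Vertex:
    name: str
    parallelism: int
    max_parallelism: int


def plan_scaling(vertices, desired):
    """Return {name: new_parallelism} for vertices whose parallelism changes."""
    plan = {}
    for v in vertices:
        # Clamp the desired value to this vertex's own limits.
        target = max(1, min(desired[v.name], v.max_parallelism))
        if target != v.parallelism:
            plan[v.name] = target
    return plan


# Hypothetical job: the source is already at its cap, the map vertex is not.
vertices = [Vertex("source", 4, 4), Vertex("map", 2, 8)]
desired = {"source": 6, "map": 5}

print(plan_scaling(vertices, desired))  # {'map': 5}
```

In this sketch the capped source produces no change, yet the job as a whole is
still rescaled because of the map vertex.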


[jira] [Commented] (FLINK-31924) [Flink operator] Flink Autoscale - Limit the max number of scale ups

2023-04-24 Thread Sriram Ganesh (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17715816#comment-17715816
 ] 

Sriram Ganesh commented on FLINK-31924:
---

Sure. Let me try and come back again by tomorrow.


[jira] [Commented] (FLINK-31924) [Flink operator] Flink Autoscale - Limit the max number of scale ups

2023-04-24 Thread Gyula Fora (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17715779#comment-17715779
 ] 

Gyula Fora commented on FLINK-31924:


[~sriramgr] can you please try the autoscaler from the latest main branch and 
see whether you can reproduce the problem?


[jira] [Commented] (FLINK-31924) [Flink operator] Flink Autoscale - Limit the max number of scale ups

2023-04-24 Thread Gyula Fora (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17715777#comment-17715777
 ] 

Gyula Fora commented on FLINK-31924:


cc [~mxm] 


[jira] [Commented] (FLINK-31924) [Flink operator] Flink Autoscale - Limit the max number of scale ups

2023-04-24 Thread Sriram Ganesh (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17715765#comment-17715765
 ] 

Sriram Ganesh commented on FLINK-31924:
---

[~gyfora] - Please kindly check. 

> [Flink operator] Flink Autoscale - Limit the max number of scale ups
> 
>
> Key: FLINK-31924
> URL: https://issues.apache.org/jira/browse/FLINK-31924
> Project: Flink
> Issue Type: Improvement
> Affects Versions: kubernetes-operator-1.4.0
> Reporter: Sriram Ganesh
> Priority: Critical
>