Hi Salva,

I'm currently out of bandwidth. Looking forward to your proposal!
Best,
Zhanghao Chen

________________________________
From: Salva Alcántara <salcantara...@gmail.com>
Sent: Saturday, August 16, 2025 3:23
To: user <user@flink.apache.org>
Subject: Re: Autoscaling Global Scaling Factor (???)

Yeah, I was thinking of enforcing that restriction myself by taking the max too. Anyway, since the change is simple enough, I think it makes sense to offer that (global scale factor) option, especially for coarse-grained resource management. We could create a ticket for that; what do you think, Chen? Maybe you could just push your changes there? Otherwise I could send a proposal myself.

BTW, a friend of mine (señor!) bumped Flink to 1.20 and he reported better task distribution. He will post an update on this soon...

Regards,
Salva

On Fri, Aug 15, 2025 at 1:04 PM Zhanghao Chen <zhanghao.c...@outlook.com> wrote:

Not quite possible based on the current version. We run an internal version of the Autoscaler in our production env. One major difference is that we let the whole pipeline (except the source/sink) have the same parallelism to avoid uneven task distribution. The change is relatively simple: just run the algorithm per vertex and take the max of them.

Best,
Zhanghao Chen

________________________________
From: Salva Alcántara <salcantara...@gmail.com>
Sent: Thursday, August 14, 2025 12:24
To: user <user@flink.apache.org>
Subject: Re: Autoscaling Global Scaling Factor (???)

That was on my agenda already. Will try and let you know how it goes. Regarding my questions, do you think it's possible to achieve any of those points to make the autoscaler work as when you simply add/remove replicas by hand?

Thanks, Chen!

Salva

On Thu, Aug 14, 2025 at 2:58 AM Zhanghao Chen <zhanghao.c...@outlook.com> wrote:

Hi, you may upgrade Flink to 1.19.3, 1.20.2, or 2.0.1+.
There's a known issue that the Autoscaler may not minimize the number of TMs during downscaling with the adaptive scheduler [1].

[1] https://issues.apache.org/jira/browse/FLINK-33977

Best,
Zhanghao Chen

________________________________
From: Salva Alcántara <salcantara...@gmail.com>
Sent: Wednesday, August 13, 2025 20:56
To: user <user@flink.apache.org>
Subject: RE: Autoscaling Global Scaling Factor (???)

BTW, I'm running Flink 1.18.1 on top of operator 1.12.1 with the following autoscaler settings:

```
job.autoscaler.enabled: "true"
job.autoscaler.scaling.enabled: "true"
job.autoscaler.scale-down.enabled: "true"
job.autoscaler.vertex.max-parallelism: "8"
job.autoscaler.vertex.min-parallelism: "1"
jobmanager.scheduler: adaptive
job.autoscaler.metrics.window: 15m
job.autoscaler.metrics.busy-time.aggregator: MAX
job.autoscaler.backlog-processing.lag-threshold: 2m
job.autoscaler.scaling.effectiveness.detection.enabled: "true"
job.autoscaler.scaling.effectiveness.threshold: "0.3"
job.autoscaler.scaling.event.interval: 10m
job.autoscaler.stabilization.interval: 5m
job.autoscaler.scale-up.max-factor: "100000.0"
job.autoscaler.scaling.key-group.partitions.adjust.mode: "EVENLY_SPREAD"
job.autoscaler.scale-down.interval: 30m
job.autoscaler.scale-down.max-factor: "0.5"
job.autoscaler.memory.tuning.scale-down-compensation.enabled: "true"
job.autoscaler.catch-up.duration: 5m
job.autoscaler.restart.time: 15m
job.autoscaler.restart.time-tracking.enabled: "true"
job.autoscaler.utilization.target: "0.8"
```

Regards,
Salva
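For illustration, the per-vertex-max tweak Zhanghao describes earlier in the thread could be sketched roughly like this. This is a hypothetical standalone example, not the actual Autoscaler API: the class, method, and parameter names are made up, and the real change would hook into the Autoscaler's internal scaling algorithm instead.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class UniformParallelism {

    /**
     * Hypothetical sketch: given the target parallelism the scaling
     * algorithm computed for each vertex, force every vertex except
     * sources/sinks to the max of those targets, so the whole pipeline
     * runs at a single parallelism and tasks spread evenly across TMs.
     */
    public static Map<String, Integer> uniformize(
            Map<String, Integer> perVertexTargets, Set<String> sourceSinkIds) {
        // Max target over the non-source/sink vertices (default 1 if none).
        int max = perVertexTargets.entrySet().stream()
                .filter(e -> !sourceSinkIds.contains(e.getKey()))
                .mapToInt(Map.Entry::getValue)
                .max()
                .orElse(1);
        // Copy the input and overwrite everything but sources/sinks.
        Map<String, Integer> uniform = new HashMap<>(perVertexTargets);
        uniform.replaceAll((id, p) -> sourceSinkIds.contains(id) ? p : max);
        return uniform;
    }
}
```

A global scaling factor option could be layered on the same idea: compute one factor for the whole job and apply it to every vertex, rather than letting each vertex scale independently.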