Not quite possible with the current version. We run an internal version of the Autoscaler in our production environment. One major difference is that we let the whole pipeline (except the sources/sinks) have the same parallelism to avoid uneven task distribution. The change is relatively simple: just run the algorithm per vertex and take the max of the results.
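Conceptually it looks something like the sketch below (this is only an illustration of the idea, not our internal code; the class/method names and the proportional-scaling formula are made up for the example):

```java
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class UniformParallelismSketch {

    /**
     * Per-vertex target parallelism, computed the usual way
     * (roughly: currentParallelism * busyRatio / targetUtilization).
     */
    static Map<String, Integer> perVertexTargets(
            Map<String, Integer> currentParallelism,
            Map<String, Double> busyRatio,
            double targetUtilization) {
        return currentParallelism.entrySet().stream()
                .collect(Collectors.toMap(
                        Map.Entry::getKey,
                        e -> (int) Math.ceil(
                                e.getValue() * busyRatio.get(e.getKey()) / targetUtilization)));
    }

    /**
     * The tweak: every vertex except sources/sinks gets the same parallelism,
     * namely the max of their individually computed targets.
     */
    static int uniformTarget(Map<String, Integer> perVertex, Set<String> sourcesAndSinks) {
        return perVertex.entrySet().stream()
                .filter(e -> !sourcesAndSinks.contains(e.getKey()))
                .mapToInt(Map.Entry::getValue)
                .max()
                .orElse(1);
    }
}
```

Sources and sinks are excluded and keep their individually computed targets.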
Best,
Zhanghao Chen

________________________________
From: Salva Alcántara <salcantara...@gmail.com>
Sent: Thursday, August 14, 2025 12:24
To: user <user@flink.apache.org>
Subject: Re: Autoscaling Global Scaling Factor (???)

That was on my agenda already. Will try and let you know how it goes.

Regarding my questions, do you think it's possible to achieve any of those points, so that the autoscaler behaves as it does when you simply add/remove replicas by hand?

Thanks Chen!

Salva

On Thu, Aug 14, 2025 at 2:58 AM Zhanghao Chen <zhanghao.c...@outlook.com> wrote:

Hi, you may upgrade Flink to 1.19.3, 1.20.2, or 2.0.1+. There is a known issue where the Autoscaler may not minimize the number of TMs during downscaling with the adaptive scheduler [1].

[1] https://issues.apache.org/jira/browse/FLINK-33977

Best,
Zhanghao Chen

________________________________
From: Salva Alcántara <salcantara...@gmail.com>
Sent: Wednesday, August 13, 2025 20:56
To: user <user@flink.apache.org>
Subject: RE: Autoscaling Global Scaling Factor (???)

BTW, I'm running Flink 1.18.1 on top of operator 1.12.1 with the following autoscaler settings:

```
job.autoscaler.enabled: "true"
job.autoscaler.scaling.enabled: "true"
job.autoscaler.scale-down.enabled: "true"
job.autoscaler.vertex.max-parallelism: "8"
job.autoscaler.vertex.min-parallelism: "1"
jobmanager.scheduler: adaptive
job.autoscaler.metrics.window: 15m
job.autoscaler.metrics.busy-time.aggregator: MAX
job.autoscaler.backlog-processing.lag-threshold: 2m
job.autoscaler.scaling.effectiveness.detection.enabled: "true"
job.autoscaler.scaling.effectiveness.threshold: "0.3"
job.autoscaler.scaling.event.interval: 10m
job.autoscaler.stabilization.interval: 5m
job.autoscaler.scale-up.max-factor: "100000.0"
job.autoscaler.scaling.key-group.partitions.adjust.mode: "EVENLY_SPREAD"
job.autoscaler.scale-down.interval: 30m
job.autoscaler.scale-down.max-factor: "0.5"
job.autoscaler.memory.tuning.scale-down-compensation.enabled: "true"
job.autoscaler.catch-up.duration: 5m
job.autoscaler.restart.time: 15m
job.autoscaler.restart.time-tracking.enabled: "true"
job.autoscaler.utilization.target: "0.8"
```

Regards,

Salva