Not quite possible with the current version. We run an internal version of
the Autoscaler in our production environment. One major difference is that we let the whole
pipeline (except the sources/sinks) have the same parallelism to avoid uneven
task distribution. The change is relatively simple: just run the scaling algorithm per
vertex, then take the max of the results.
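To illustrate, here is a minimal sketch of that "max over per-vertex recommendations" step. All names here (`VertexRecommendation`, `uniformParallelism`) are hypothetical and not part of the actual Flink Autoscaler code; it only shows the shape of the change described above:

```java
import java.util.List;

public class UniformParallelismSketch {

    // Hypothetical holder for the output of the per-vertex scaling algorithm.
    record VertexRecommendation(String vertexId, int recommendedParallelism, boolean isSourceOrSink) {}

    // Run the algorithm per vertex elsewhere, then unify every non-source/sink
    // vertex to the maximum recommended parallelism.
    static int uniformParallelism(List<VertexRecommendation> recs) {
        return recs.stream()
                .filter(r -> !r.isSourceOrSink())
                .mapToInt(VertexRecommendation::recommendedParallelism)
                .max()
                .orElse(1); // fall back to parallelism 1 if nothing qualifies
    }

    public static void main(String[] args) {
        List<VertexRecommendation> recs = List.of(
                new VertexRecommendation("source", 2, true),
                new VertexRecommendation("map", 4, false),
                new VertexRecommendation("window", 6, false),
                new VertexRecommendation("sink", 2, true));
        // Sources/sinks keep their own parallelism; everything else gets 6 here.
        System.out.println(uniformParallelism(recs));
    }
}
```

Sources and sinks are excluded because their parallelism is often constrained externally (e.g. by partition counts), so only the intermediate vertices are unified.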

Best,
Zhanghao Chen
________________________________
From: Salva Alcántara <salcantara...@gmail.com>
Sent: Thursday, August 14, 2025 12:24
To: user <user@flink.apache.org>
Subject: Re: Autoscaling Global Scaling Factor (???)

That was on my agenda already. Will try and let you know how it goes.

Regarding my questions, do you think it's possible to achieve any of those
points, so that the autoscaler behaves the same as when you simply add/remove
replicas by hand?

Thanks Chen!

Salva

On Thu, Aug 14, 2025 at 2:58 AM Zhanghao Chen 
<zhanghao.c...@outlook.com<mailto:zhanghao.c...@outlook.com>> wrote:
Hi, you may upgrade Flink to 1.19.3, 1.20.2, or 2.0.1+. There's a known issue
where the Autoscaler may not minimize the number of TMs during downscaling with
the adaptive scheduler [1].

[1] https://issues.apache.org/jira/browse/FLINK-33977

Best,
Zhanghao Chen

________________________________
From: Salva Alcántara <salcantara...@gmail.com<mailto:salcantara...@gmail.com>>
Sent: Wednesday, August 13, 2025 20:56
To: user <user@flink.apache.org<mailto:user@flink.apache.org>>
Subject: RE: Autoscaling Global Scaling Factor (???)

BTW, I'm running Flink 1.18.1 on top of operator 1.12.1 with the following
autoscaler settings:

```
      job.autoscaler.enabled: "true"
      job.autoscaler.scaling.enabled: "true"
      job.autoscaler.scale-down.enabled: "true"
      job.autoscaler.vertex.max-parallelism: "8"
      job.autoscaler.vertex.min-parallelism: "1"
      jobmanager.scheduler: adaptive
      job.autoscaler.metrics.window: 15m
      job.autoscaler.metrics.busy-time.aggregator: MAX
      job.autoscaler.backlog-processing.lag-threshold: 2m
      job.autoscaler.scaling.effectiveness.detection.enabled: "true"
      job.autoscaler.scaling.effectiveness.threshold: "0.3"
      job.autoscaler.scaling.event.interval: 10m
      job.autoscaler.stabilization.interval: 5m
      job.autoscaler.scale-up.max-factor: "100000.0"
      job.autoscaler.scaling.key-group.partitions.adjust.mode: "EVENLY_SPREAD"
      job.autoscaler.scale-down.interval: 30m
      job.autoscaler.scale-down.max-factor: "0.5"
      job.autoscaler.memory.tuning.scale-down-compensation.enabled: "true"
      job.autoscaler.catch-up.duration: 5m
      job.autoscaler.restart.time: 15m
      job.autoscaler.restart.time-tracking.enabled: "true"
      job.autoscaler.utilization.target: "0.8"
```

Regards,

Salva
