Gyula Fora created FLINK-33764:
----------------------------------

Summary: Incorporate GC / Heap metrics in autoscaler decisions
Key: FLINK-33764
URL: https://issues.apache.org/jira/browse/FLINK-33764
Project: Flink
Issue Type: New Feature
Components: Autoscaler, Kubernetes Operator
Reporter: Gyula Fora
Assignee: Gyula Fora
The autoscaler currently does not use any GC or heap metrics in its scaling decisions. While the long-term goal may be to support vertical scaling (increasing TaskManager sizes), this is currently out of scope for the autoscaler.

However, it is very important to detect cases where the throughput of individual vertices, or of the entire pipeline, is critically affected by long GC pauses. In these cases the current autoscaler logic would wrongly assume a low true processing rate and scale the pipeline's parallelism up too far, ramping up costs and causing further issues.

Using the improved GC metrics introduced in https://issues.apache.org/jira/browse/FLINK-33318, we should measure GC pauses and simply block scaling decisions when the pipeline spends too much time garbage collecting, notifying the user that memory needs to be increased.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
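The proposed check (measure GC time, block scaling above a limit, tell the user to add memory) can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the Flink Kubernetes Operator's implementation: the class `GcPressureGuard`, its methods, and the 30% threshold are all assumptions. It only assumes a cumulative GC-time counter in milliseconds, similar in spirit to the JVM's `GarbageCollectorMXBean.getCollectionTime()`; the exact metric names from FLINK-33318 are not assumed here.

```java
import java.util.Optional;

/**
 * Hypothetical sketch of a GC-pressure guard for autoscaler decisions.
 * Names and the threshold are illustrative assumptions only.
 */
public final class GcPressureGuard {

    // Assumed limit: maximum fraction of wall-clock time allowed in GC.
    private final double maxGcTimeRatio;

    public GcPressureGuard(double maxGcTimeRatio) {
        if (maxGcTimeRatio <= 0.0 || maxGcTimeRatio > 1.0) {
            throw new IllegalArgumentException("maxGcTimeRatio must be in (0, 1]");
        }
        this.maxGcTimeRatio = maxGcTimeRatio;
    }

    /**
     * Fraction of an observation window spent in GC, computed from two samples
     * of a cumulative GC time counter (milliseconds), clamped to [0, 1].
     */
    static double gcTimeRatio(long gcTimeStartMs, long gcTimeEndMs, long windowMs) {
        if (windowMs <= 0) {
            throw new IllegalArgumentException("windowMs must be positive");
        }
        long gcDeltaMs = Math.max(0L, gcTimeEndMs - gcTimeStartMs);
        return Math.min(1.0, (double) gcDeltaMs / windowMs);
    }

    /**
     * Returns a user-facing message if scaling should be blocked because of
     * GC pressure, or an empty Optional if scaling may proceed.
     */
    public Optional<String> checkBlockScaling(long gcTimeStartMs, long gcTimeEndMs, long windowMs) {
        double ratio = gcTimeRatio(gcTimeStartMs, gcTimeEndMs, windowMs);
        if (ratio > maxGcTimeRatio) {
            return Optional.of(String.format(
                    "Scaling blocked: %.0f%% of time spent in GC (limit %.0f%%). "
                            + "Consider increasing TaskManager memory.",
                    ratio * 100, maxGcTimeRatio * 100));
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        GcPressureGuard guard = new GcPressureGuard(0.3);
        // 24 s of GC in a 60 s window is 40% GC time, above the 30% limit.
        System.out.println(guard.checkBlockScaling(10_000, 34_000, 60_000).isPresent()); // true
        // 3 s of GC in a 60 s window is 5% GC time, so scaling may proceed.
        System.out.println(guard.checkBlockScaling(10_000, 13_000, 60_000).isPresent()); // false
    }
}
```

In a real pipeline the ratio would presumably be computed per TaskManager and the worst one compared against the limit, since a single GC-bound TaskManager can distort the processing rate of the vertices it hosts. Returning a message rather than a boolean keeps the "notify the user" part of the proposal next to the blocking decision.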