Gyula Fora created FLINK-33764:
----------------------------------

             Summary: Incorporate GC / Heap metrics in autoscaler decisions
                 Key: FLINK-33764
                 URL: https://issues.apache.org/jira/browse/FLINK-33764
             Project: Flink
          Issue Type: New Feature
          Components: Autoscaler, Kubernetes Operator
            Reporter: Gyula Fora
            Assignee: Gyula Fora


The autoscaler currently does not use any GC or heap metrics when making 
scaling decisions. 

While the long-term goal may be to support vertical scaling (increasing TM 
sizes), this is currently out of scope for the autoscaler.

However, it is very important to detect cases where the throughput of certain 
vertices or of the entire pipeline is critically affected by long GC pauses. In 
these cases the current autoscaler logic would wrongly assume a low true 
processing rate and scale the pipeline up too far, increasing costs and causing 
further issues.

Using the improved GC metrics introduced in 
https://issues.apache.org/jira/browse/FLINK-33318, we should measure the GC 
pauses and simply block scaling decisions if the pipeline spends too much time 
garbage collecting, and notify the user that they need to increase memory.
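
A minimal sketch of the proposed check, for illustration only. It assumes the 
GC metric arrives as samples of milliseconds spent in GC per second of wall 
time. The names ScalingGate, check_gc_pressure and max_gc_fraction, and the 
0.3 default threshold, are hypothetical and not part of the autoscaler.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ScalingGate:
    """Result of the GC check: whether scaling may proceed, plus a user-facing message."""
    allowed: bool
    message: str


def check_gc_pressure(
    gc_ms_per_second: Sequence[float],
    max_gc_fraction: float = 0.3,  # hypothetical threshold, not an autoscaler default
) -> ScalingGate:
    """Block scaling when the average share of time spent in GC exceeds the threshold.

    gc_ms_per_second: samples of milliseconds spent in GC per second of wall time
    (assumed input shape; the real metric plumbing is not shown here).
    """
    if not gc_ms_per_second:
        # Without samples there is no evidence of GC pressure, so do not block.
        return ScalingGate(True, "No GC samples available; scaling not blocked.")

    # Convert the average ms-per-second value into a fraction of wall time.
    avg_fraction = sum(gc_ms_per_second) / len(gc_ms_per_second) / 1000.0

    if avg_fraction > max_gc_fraction:
        return ScalingGate(
            False,
            f"Scaling blocked: {avg_fraction:.0%} of time spent in GC "
            f"(limit {max_gc_fraction:.0%}). Consider increasing memory.",
        )
    return ScalingGate(True, "GC time within limit; scaling allowed.")
```

For example, check_gc_pressure([450, 500, 550]) averages 500 ms of GC per 
second, which exceeds the assumed 30% limit, so scaling is blocked and the 
message asks the user to increase memory.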



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
