Hi, I got stuck using Prometheus and Pushgateway to collect Flink metrics. Here is my reporter configuration:
metrics.reporter.promgateway.class: org.apache.flink.metrics.prometheus.PrometheusPushGatewayReporter
metrics.reporter.promgateway.host: localhost
metrics.reporter.promgateway.port: 9091
metrics.reporter.promgateway.jobName: myJob
metrics.reporter.promgateway.randomJobNameSuffix: true
metrics.reporter.promgateway.deleteOnShutdown: true

And the version information:

Flink 1.9.1
Prometheus 2.18
Pushgateway 1.2 & 0.9 (I have already tried both)

I found that when the Flink cluster restarts, metrics show up under a new jobName with a new random suffix, but the metrics pushed under the old jobName are still there, with values that no longer update. Since Prometheus keeps scraping the Pushgateway periodically, I end up with a bunch of time series whose values never change. It looks like this:

# HELP flink_jobmanager_Status_JVM_CPU_Load Load (scope: jobmanager_Status_JVM_CPU)
# TYPE flink_jobmanager_Status_JVM_CPU_Load gauge
flink_jobmanager_Status_JVM_CPU_Load{host="localhost",instance="",job="myJobae71620b106e8c2fdf86cb5c65fd6414"} 0
flink_jobmanager_Status_JVM_CPU_Load{host="localhost",instance="",job="myJobe50caa3be194aeb2ff71a64bced17cea"} 0.0006602344673593189
# HELP flink_jobmanager_Status_JVM_CPU_Time Time (scope: jobmanager_Status_JVM_CPU)
# TYPE flink_jobmanager_Status_JVM_CPU_Time gauge
flink_jobmanager_Status_JVM_CPU_Time{host="localhost",instance="",job="myJobae71620b106e8c2fdf86cb5c65fd6414"} 4.54512e+09
flink_jobmanager_Status_JVM_CPU_Time{host="localhost",instance="",job="myJobe50caa3be194aeb2ff71a64bced17cea"} 8.24809e+09
# HELP flink_jobmanager_Status_JVM_ClassLoader_ClassesLoaded ClassesLoaded (scope: jobmanager_Status_JVM_ClassLoader)
# TYPE flink_jobmanager_Status_JVM_ClassLoader_ClassesLoaded gauge
flink_jobmanager_Status_JVM_ClassLoader_ClassesLoaded{host="localhost",instance="",job="myJobae71620b106e8c2fdf86cb5c65fd6414"} 5984
flink_jobmanager_Status_JVM_ClassLoader_ClassesLoaded{host="localhost",instance="",job="myJobe50caa3be194aeb2ff71a64bced17cea"} 6014
# HELP flink_jobmanager_Status_JVM_ClassLoader_ClassesUnloaded ClassesUnloaded (scope: jobmanager_Status_JVM_ClassLoader)
# TYPE flink_jobmanager_Status_JVM_ClassLoader_ClassesUnloaded gauge
flink_jobmanager_Status_JVM_ClassLoader_ClassesUnloaded{host="localhost",instance="",job="myJobae71620b106e8c2fdf86cb5c65fd6414"} 0
flink_jobmanager_Status_JVM_ClassLoader_ClassesUnloaded{host="localhost",instance="",job="myJobe50caa3be194aeb2ff71a64bced17cea"} 0

PS: This cluster has one JobManager.

My understanding is that with metrics.reporter.promgateway.deleteOnShutdown set to true, the old metrics should be deleted from the Pushgateway when the cluster shuts down, but somehow that didn't happen. Is my understanding of these configuration options correct? Is there any way to delete stale metrics from the Pushgateway? Thanks!
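To make that last question concrete, here is a minimal sketch of deleting one stale group by hand with the Prometheus Java simpleclient (the same client library the Flink reporter appears to use internally). The class name DeleteStaleGroup is mine, and the job name is the stale one from the dump above; treat this as an assumed manual workaround, not something Flink provides:

import io.prometheus.client.exporter.PushGateway;

public class DeleteStaleGroup {
    public static void main(String[] args) throws Exception {
        // Address of the Pushgateway from the reporter config above.
        PushGateway gateway = new PushGateway("localhost:9091");
        // Deletes every metric that was pushed under this grouping key;
        // the job name is the stale, randomly suffixed one left behind
        // by the cluster instance that ran before the restart.
        gateway.delete("myJobae71620b106e8c2fdf86cb5c65fd6414");
    }
}

The same thing should be possible without code, since the Pushgateway exposes this as an HTTP endpoint: an HTTP DELETE to http://localhost:9091/metrics/job/<jobName> removes that group.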