The metrics I see in Prometheus look like this:

# HELP flink_jobmanager_job_lastCheckpointRestoreTimestamp lastCheckpointRestoreTimestamp (scope: jobmanager_job)
# TYPE flink_jobmanager_job_lastCheckpointRestoreTimestamp gauge
flink_jobmanager_job_lastCheckpointRestoreTimestamp{job_id="58483036154d7f72ad1bbf10eb86bc2e",host="localhost",job_name="frauddetection",} -1.0
# HELP flink_jobmanager_job_numberOfFailedCheckpoints numberOfFailedCheckpoints (scope: jobmanager_job)
# TYPE flink_jobmanager_job_numberOfFailedCheckpoints gauge
flink_jobmanager_job_numberOfFailedCheckpoints{job_id="58483036154d7f72ad1bbf10eb86bc2e",host="localhost",job_name="frauddetection",} 0.0
# HELP flink_jobmanager_Status_JVM_Memory_Heap_Max Max (scope: jobmanager_Status_JVM_Memory_Heap)
# TYPE flink_jobmanager_Status_JVM_Memory_Heap_Max gauge
flink_jobmanager_Status_JVM_Memory_Heap_Max{host="localhost",} 1.029177344E9
# HELP flink_jobmanager_Status_JVM_GarbageCollector_PS_MarkSweep_Count Count (scope: jobmanager_Status_JVM_GarbageCollector_PS_MarkSweep)
# TYPE flink_jobmanager_Status_JVM_GarbageCollector_PS_MarkSweep_Count gauge
flink_jobmanager_Status_JVM_GarbageCollector_PS_MarkSweep_Count{host="localhost",} 2.0
# HELP flink_jobmanager_Status_JVM_CPU_Time Time (scope: jobmanager_Status_JVM_CPU)
# TYPE flink_jobmanager_Status_JVM_CPU_Time gauge
flink_jobmanager_Status_JVM_CPU_Time{host="localhost",} 8.42E9
# HELP flink_jobmanager_Status_JVM_Memory_Direct_TotalCapacity TotalCapacity (scope: jobmanager_Status_JVM_Memory_Direct)
# TYPE flink_jobmanager_Status_JVM_Memory_Direct_TotalCapacity gauge
flink_jobmanager_Status_JVM_Memory_Direct_TotalCapacity{host="localhost",} 604064.0
# HELP flink_jobmanager_job_fullRestarts fullRestarts (scope: jobmanager_job)
# TYPE flink_jobmanager_job_fullRestarts gauge
flink_jobmanager_job_fullRestarts{job_id="58483036154d7f72ad1bbf10eb86bc2e",host="localhost",job_name="frauddetection",} 0.0
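
Every sample listed above carries the flink_jobmanager prefix. As a rough way to check which Flink processes a scrape actually covers, the sketch below counts samples per scope prefix. The helper name `metric_scopes` and the trimmed `sample` text are illustrative only. The expectation that operator-level (custom) metrics would show up under a flink_taskmanager prefix is an assumption based on Flink's default scope formats, not something confirmed in this thread.

```python
from collections import Counter


def metric_scopes(exposition_text):
    """Count sample lines per scope prefix, e.g. 'flink_jobmanager'.

    Hypothetical helper: it only splits metric names on '_' and keeps
    the first two parts; it is not a full exposition-format parser.
    """
    scopes = Counter()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and '# HELP' / '# TYPE' comment lines.
        if not line or line.startswith("#"):
            continue
        # The metric name ends at the first '{' (labels) or space (value).
        name = line.split("{", 1)[0].split(" ", 1)[0]
        parts = name.split("_")
        scope = "_".join(parts[:2]) if len(parts) >= 2 else name
        scopes[scope] += 1
    return scopes


# Two samples copied from the output above, trimmed for brevity.
sample = """\
# HELP flink_jobmanager_job_fullRestarts fullRestarts (scope: jobmanager_job)
# TYPE flink_jobmanager_job_fullRestarts gauge
flink_jobmanager_job_fullRestarts{job_id="58483036154d7f72ad1bbf10eb86bc2e",host="localhost",job_name="frauddetection",} 0.0
flink_jobmanager_Status_JVM_CPU_Time{host="localhost",} 8.42E9
"""

scopes = metric_scopes(sample)
print(dict(scopes))
```

If only the flink_jobmanager scope appears, it may be worth checking whether the TaskManager's reporter endpoint is being scraped as well, since that is where operator metrics would presumably be exposed.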

On Mon, Jul 6, 2020 at 9:51 PM Chesnay Schepler <ches...@apache.org> wrote:

> You've said elsewhere that you do see some metrics in Prometheus; which
> are those?
>
> Why are you configuring the host for the Prometheus reporter? This option
> is only for the PrometheusPushGatewayReporter.
>
> On 06/07/2020 18:01, Manish G wrote:
>
> Hi,
>
> So I have the following in flink-conf.yml:
> //////////////////////////////////////////////////////
> metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
> metrics.reporter.prom.host: 127.0.0.1
> metrics.reporter.prom.port: 9999
> metrics.reporter.slf4j.class: org.apache.flink.metrics.slf4j.Slf4jReporter
> metrics.reporter.slf4j.interval: 30 SECONDS
> //////////////////////////////////////////////////////
>
> While I can see the custom metrics in the TaskManager logs, the Prometheus
> dashboard doesn't show them.
>
> With regards
>
> On Mon, Jul 6, 2020 at 8:55 PM Chesnay Schepler <ches...@apache.org>
> wrote:
>
>> You have explicitly configured a reporter list, resulting in the slf4j
>> reporter being ignored:
>>
>> 2020-07-06 13:48:22,191 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: metrics.reporters, prom
>> 2020-07-06 13:48:23,203 INFO org.apache.flink.runtime.metrics.ReporterSetup - Excluding reporter slf4j, not configured in reporter list (prom).
>>
>> Note that nowadays metrics.reporters is no longer required; the set of
>> reporters is automatically determined based on configured properties; the
>> only use-case is disabling a reporter without having to remove the entire
>> configuration.
>> I'd suggest just removing the option, trying again, and reporting back.
>>
>> On 06/07/2020 16:35, Chesnay Schepler wrote:
>>
>> Please enable debug logging and search for warnings from the metric
>> groups/registry/reporter.
>>
>> If you cannot find anything suspicious, you can also send the full log to
>> me directly.
>>
>> On 06/07/2020 16:29, Manish G wrote:
>>
>> Job is an infinite streaming one, so it keeps going. Flink configuration
>> is as:
>>
>> metrics.reporter.slf4j.class: org.apache.flink.metrics.slf4j.Slf4jReporter
>> metrics.reporter.slf4j.interval: 30 SECONDS
>>
>> On Mon, Jul 6, 2020 at 7:57 PM Chesnay Schepler <ches...@apache.org>
>> wrote:
>>
>>> How long did the job run for, and what is the configured interval?
>>>
>>> On 06/07/2020 15:51, Manish G wrote:
>>>
>>> Hi,
>>>
>>> Thanks for this.
>>>
>>> I did the configuration as mentioned at the link (changes in
>>> flink-conf.yml, copying the jar in lib directory), and registered the Meter
>>> with metrics group and invoked markEvent() method in the target code. But I
>>> don't see any related logs.
>>> I am doing this all on my local computer.
>>>
>>> Anything else I need to do?
>>>
>>> With regards
>>> Manish
>>>
>>> On Mon, Jul 6, 2020 at 5:24 PM Chesnay Schepler <ches...@apache.org>
>>> wrote:
>>>
>>>> Have you looked at the SLF4J reporter?
>>>>
>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html#slf4j-orgapacheflinkmetricsslf4jslf4jreporter
>>>>
>>>> On 06/07/2020 13:49, Manish G wrote:
>>>> > Hi,
>>>> >
>>>> > Is it possible to log Flink metrics in application logs apart from
>>>> > publishing it to Prometheus?
>>>> >
>>>> > With regards