Re: Logging Flink metrics

Manish G Mon, 06 Jul 2020 09:36:20 -0700

The metrics I see in Prometheus look like this:

# HELP flink_jobmanager_job_lastCheckpointRestoreTimestamp lastCheckpointRestoreTimestamp (scope: jobmanager_job)
# TYPE flink_jobmanager_job_lastCheckpointRestoreTimestamp gauge
flink_jobmanager_job_lastCheckpointRestoreTimestamp{job_id="58483036154d7f72ad1bbf10eb86bc2e",host="localhost",job_name="frauddetection",} -1.0
# HELP flink_jobmanager_job_numberOfFailedCheckpoints numberOfFailedCheckpoints (scope: jobmanager_job)
# TYPE flink_jobmanager_job_numberOfFailedCheckpoints gauge
flink_jobmanager_job_numberOfFailedCheckpoints{job_id="58483036154d7f72ad1bbf10eb86bc2e",host="localhost",job_name="frauddetection",} 0.0
# HELP flink_jobmanager_Status_JVM_Memory_Heap_Max Max (scope: jobmanager_Status_JVM_Memory_Heap)
# TYPE flink_jobmanager_Status_JVM_Memory_Heap_Max gauge
flink_jobmanager_Status_JVM_Memory_Heap_Max{host="localhost",} 1.029177344E9
# HELP flink_jobmanager_Status_JVM_GarbageCollector_PS_MarkSweep_Count Count (scope: jobmanager_Status_JVM_GarbageCollector_PS_MarkSweep)
# TYPE flink_jobmanager_Status_JVM_GarbageCollector_PS_MarkSweep_Count gauge
flink_jobmanager_Status_JVM_GarbageCollector_PS_MarkSweep_Count{host="localhost",} 2.0
# HELP flink_jobmanager_Status_JVM_CPU_Time Time (scope: jobmanager_Status_JVM_CPU)
# TYPE flink_jobmanager_Status_JVM_CPU_Time gauge
flink_jobmanager_Status_JVM_CPU_Time{host="localhost",} 8.42E9
# HELP flink_jobmanager_Status_JVM_Memory_Direct_TotalCapacity TotalCapacity (scope: jobmanager_Status_JVM_Memory_Direct)
# TYPE flink_jobmanager_Status_JVM_Memory_Direct_TotalCapacity gauge
flink_jobmanager_Status_JVM_Memory_Direct_TotalCapacity{host="localhost",} 604064.0
# HELP flink_jobmanager_job_fullRestarts fullRestarts (scope: jobmanager_job)
# TYPE flink_jobmanager_job_fullRestarts gauge
flink_jobmanager_job_fullRestarts{job_id="58483036154d7f72ad1bbf10eb86bc2e",host="localhost",job_name="frauddetection",} 0.0
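
A quick way to check output like the above for anything beyond JobManager metrics is to group the metric names by their leading scope. The following is only a minimal sketch: the helper name `metric_names_by_scope` is hypothetical, the sample text is a shortened copy of the output above, and the expectation that operator-level (custom) metrics would appear under names starting with `flink_taskmanager` is an assumption about the default scope naming, not something shown in this output.

```python
import re
from collections import defaultdict

# Prometheus metric names must match this pattern; anything else
# (bare values, wrapped HELP text) is ignored.
METRIC_NAME = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")

def metric_names_by_scope(exposition_text):
    """Group metric names by their first two '_'-separated components."""
    groups = defaultdict(set)
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The name ends at the first "{" (labels) or space (value).
        name = line.split("{", 1)[0].split(" ", 1)[0]
        parts = name.split("_")
        if not METRIC_NAME.fullmatch(name) or len(parts) < 2:
            continue  # e.g. a value left on its own line by mail wrapping
        groups["_".join(parts[:2])].add(name)
    return {scope: sorted(names) for scope, names in groups.items()}

# Shortened sample taken from the output above.
sample = """\
# TYPE flink_jobmanager_Status_JVM_CPU_Time gauge
flink_jobmanager_Status_JVM_CPU_Time{host="localhost",} 8.42E9
# TYPE flink_jobmanager_job_fullRestarts gauge
flink_jobmanager_job_fullRestarts{job_id="58483036154d7f72ad1bbf10eb86bc2e",host="localhost",job_name="frauddetection",}
0.0
"""

scopes = metric_names_by_scope(sample)
print(sorted(scopes))  # ['flink_jobmanager']
# Assumption: custom operator metrics would show up under a flink_taskmanager scope.
print(any(s.startswith("flink_taskmanager") for s in scopes))  # False
```

Run against the full scrape, this shows at a glance that only `flink_jobmanager` names are present.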




On Mon, Jul 6, 2020 at 9:51 PM Chesnay Schepler <ches...@apache.org> wrote:

> You've said elsewhere that you do see some metrics in prometheus, which
> are those?
>
> Why are you configuring the host for the prometheus reporter? This option
> is only for the PrometheusPushGatewayReporter.
>
> On 06/07/2020 18:01, Manish G wrote:
>
> Hi,
>
> So I have the following in flink-conf.yaml:
> //////////////////////////////////////////////////////
> metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
> metrics.reporter.prom.host: 127.0.0.1
> metrics.reporter.prom.port: 9999
> metrics.reporter.slf4j.class: org.apache.flink.metrics.slf4j.Slf4jReporter
> metrics.reporter.slf4j.interval: 30 SECONDS
> //////////////////////////////////////////////////////
>
> While I can see the custom metrics in the TaskManager logs, the Prometheus
> dashboard doesn't show them.
>
> With regards
>
> On Mon, Jul 6, 2020 at 8:55 PM Chesnay Schepler <ches...@apache.org>
> wrote:
>
>> You have explicitly configured a reporter list, resulting in the slf4j
>> reporter being ignored:
>>
>> 2020-07-06 13:48:22,191 INFO org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: metrics.reporters, prom
>> 2020-07-06 13:48:23,203 INFO org.apache.flink.runtime.metrics.ReporterSetup                - Excluding reporter slf4j, not configured in reporter list (prom).
>>
>> Note that nowadays metrics.reporters is no longer required; the set of
>> reporters is automatically determined based on configured properties; the
>> only use-case is disabling a reporter without having to remove the entire
>> configuration.
>> I'd suggest just removing the option, trying again, and reporting back.
>>
>> On 06/07/2020 16:35, Chesnay Schepler wrote:
>>
>> Please enable debug logging and search for warnings from the metric
>> groups/registry/reporter.
>>
>> If you cannot find anything suspicious, you can also send the full log to
>> me directly.
>>
>> On 06/07/2020 16:29, Manish G wrote:
>>
>> The job is an infinite streaming one, so it keeps running. The Flink
>> configuration is:
>>
>> metrics.reporter.slf4j.class: org.apache.flink.metrics.slf4j.Slf4jReporter
>> metrics.reporter.slf4j.interval: 30 SECONDS
>>
>>
>>
>> On Mon, Jul 6, 2020 at 7:57 PM Chesnay Schepler <ches...@apache.org>
>> wrote:
>>
>>> How long did the job run for, and what is the configured interval?
>>>
>>>
>>> On 06/07/2020 15:51, Manish G wrote:
>>>
>>> Hi,
>>>
>>> Thanks for this.
>>>
>>> I did the configuration as described at the link (changes in
>>> flink-conf.yaml, copying the jar into the lib directory), registered the
>>> Meter with the metric group, and invoked the markEvent() method in the
>>> target code. But I don't see any related logs.
>>> I am doing this all on my local computer.
>>>
>>> Anything else I need to do?
>>>
>>> With regards
>>> Manish
>>>
>>> On Mon, Jul 6, 2020 at 5:24 PM Chesnay Schepler <ches...@apache.org>
>>> wrote:
>>>
>>>> Have you looked at the SLF4J reporter?
>>>>
>>>>
>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html#slf4j-orgapacheflinkmetricsslf4jslf4jreporter
>>>>
>>>> On 06/07/2020 13:49, Manish G wrote:
>>>> > Hi,
>>>> >
>>>> > Is it possible to log Flink metrics in the application logs in addition
>>>> > to publishing them to Prometheus?
>>>> >
>>>> > With regards
>>>>
>>>>
>>>>
>>>
>>
>>
>
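
The reporter-selection rule described in the thread (an explicit `metrics.reporters` list excludes every reporter not named in it, while without the list every reporter that has configured properties is used) can be illustrated with a small sketch. This is not Flink's implementation: `active_reporters` is a hypothetical helper, and the `key: value` parsing is deliberately simplified.

```python
import re

# Matches keys such as "metrics.reporter.prom.class" and captures "prom".
REPORTER_KEY = re.compile(r"^metrics\.reporter\.([^.]+)\.")

def active_reporters(conf_text):
    """Return the reporter names that would be kept under the rule quoted in the thread."""
    configured = []  # reporters with at least one metrics.reporter.<name>.* property
    include = None   # explicit metrics.reporters list, if one is present
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        if key == "metrics.reporters":
            include = [name.strip() for name in value.split(",") if name.strip()]
            continue
        match = REPORTER_KEY.match(key)
        if match and match.group(1) not in configured:
            configured.append(match.group(1))
    if include is None:
        return configured
    # With an explicit list, reporters not named in it are excluded.
    return [name for name in configured if name in include]

conf = """\
metrics.reporters: prom
metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
metrics.reporter.prom.port: 9999
metrics.reporter.slf4j.class: org.apache.flink.metrics.slf4j.Slf4jReporter
metrics.reporter.slf4j.interval: 30 SECONDS
"""

print(active_reporters(conf))  # ['prom']

# Same configuration with the metrics.reporters line removed.
without_list = "\n".join(conf.splitlines()[1:])
print(active_reporters(without_list))  # ['prom', 'slf4j']
```

The first call mirrors the logged "Excluding reporter slf4j" case; the second shows the effect of removing the option, as suggested in the thread.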
