Re: Logging Flink metrics

Chesnay Schepler Mon, 06 Jul 2020 10:14:08 -0700

Are you running Flink in WSL by chance?

On 06/07/2020 19:06, Manish G wrote:

In flink-conf.yaml:
metrics.reporter.prom.port: 9250-9260

This is based on the information provided here <https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html#prometheus-orgapacheflinkmetricsprometheusprometheusreporter>:

port - (optional) the port the Prometheus exporter listens on, defaults to 9249 <https://github.com/prometheus/prometheus/wiki/Default-port-allocations>. In order to be able to run several instances of the reporter on one host (e.g. when one TaskManager is colocated with the JobManager) it is advisable to use a port range like 9250-9260.

As I am running Flink locally, both the JobManager and the TaskManager are colocated.


In prometheus.yml:
  - job_name: 'flinkprometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9250', 'localhost:9251']
    metrics_path: /

This is the whole configuration I have done, based on several tutorials and blogs available online.
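
A quick way to see which ports in the 9250-9260 range are actually serving metrics, and whether any of them exposes TaskManager metrics, is a small script along these lines. This is only a sketch: the helper names count_scopes and probe are made up for illustration, the /metrics path is an assumption about what the reporter serves, and it relies on metric names starting with flink_<scope>_ as in the output quoted further down.

```python
import re
from collections import Counter
from urllib.request import urlopen

# Matches the start of a sample line such as
# flink_jobmanager_Status_JVM_CPU_Time{host="localhost",} 8.42E9
# and captures the top-level scope ("jobmanager", "taskmanager", ...).
_SAMPLE = re.compile(r"^flink_([A-Za-z]+)_")


def count_scopes(exposition_text):
    """Count sample lines per top-level Flink scope in Prometheus text output."""
    counts = Counter()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        match = _SAMPLE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def probe(host="localhost", ports=range(9250, 9261), timeout=2.0):
    """Fetch every port in the range and return {port: Counter of scopes}.

    The "/metrics" path is an assumption; adjust it if your reporter
    serves the metrics elsewhere.
    """
    found = {}
    for port in ports:
        try:
            with urlopen(f"http://{host}:{port}/metrics", timeout=timeout) as resp:
                found[port] = count_scopes(resp.read().decode("utf-8"))
        except OSError:
            continue  # nothing reachable on this port (or the request failed)
    return found
```

Calling probe() and printing the result shows, per port, how many samples belong to each scope. If no port reports a taskmanager scope, the TaskManager's reporter is either not running or listening somewhere Prometheus is not scraping.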


On Mon, Jul 6, 2020 at 10:20 PM Chesnay Schepler <[email protected]> wrote:


    These are all JobManager metrics; have you configured prometheus
    to also scrape the task manager processes?

    On 06/07/2020 18:35, Manish G wrote:

    The metrics I see in Prometheus look like this:
    # HELP flink_jobmanager_job_lastCheckpointRestoreTimestamp lastCheckpointRestoreTimestamp (scope: jobmanager_job)
    # TYPE flink_jobmanager_job_lastCheckpointRestoreTimestamp gauge
    flink_jobmanager_job_lastCheckpointRestoreTimestamp{job_id="58483036154d7f72ad1bbf10eb86bc2e",host="localhost",job_name="frauddetection",} -1.0
    # HELP flink_jobmanager_job_numberOfFailedCheckpoints numberOfFailedCheckpoints (scope: jobmanager_job)
    # TYPE flink_jobmanager_job_numberOfFailedCheckpoints gauge
    flink_jobmanager_job_numberOfFailedCheckpoints{job_id="58483036154d7f72ad1bbf10eb86bc2e",host="localhost",job_name="frauddetection",} 0.0
    # HELP flink_jobmanager_Status_JVM_Memory_Heap_Max Max (scope: jobmanager_Status_JVM_Memory_Heap)
    # TYPE flink_jobmanager_Status_JVM_Memory_Heap_Max gauge
    flink_jobmanager_Status_JVM_Memory_Heap_Max{host="localhost",} 1.029177344E9
    # HELP flink_jobmanager_Status_JVM_GarbageCollector_PS_MarkSweep_Count Count (scope: jobmanager_Status_JVM_GarbageCollector_PS_MarkSweep)
    # TYPE flink_jobmanager_Status_JVM_GarbageCollector_PS_MarkSweep_Count gauge
    flink_jobmanager_Status_JVM_GarbageCollector_PS_MarkSweep_Count{host="localhost",} 2.0
    # HELP flink_jobmanager_Status_JVM_CPU_Time Time (scope: jobmanager_Status_JVM_CPU)
    # TYPE flink_jobmanager_Status_JVM_CPU_Time gauge
    flink_jobmanager_Status_JVM_CPU_Time{host="localhost",} 8.42E9
    # HELP flink_jobmanager_Status_JVM_Memory_Direct_TotalCapacity TotalCapacity (scope: jobmanager_Status_JVM_Memory_Direct)
    # TYPE flink_jobmanager_Status_JVM_Memory_Direct_TotalCapacity gauge
    flink_jobmanager_Status_JVM_Memory_Direct_TotalCapacity{host="localhost",} 604064.0
    # HELP flink_jobmanager_job_fullRestarts fullRestarts (scope: jobmanager_job)
    # TYPE flink_jobmanager_job_fullRestarts gauge
    flink_jobmanager_job_fullRestarts{job_id="58483036154d7f72ad1bbf10eb86bc2e",host="localhost",job_name="frauddetection",} 0.0



    On Mon, Jul 6, 2020 at 9:51 PM Chesnay Schepler
    <[email protected]> wrote:

        You've said elsewhere that you do see some metrics in
        Prometheus; which ones are those?

        Why are you configuring the host for the prometheus reporter?
        This option is only for the PrometheusPushGatewayReporter.
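
        To illustrate the difference, here is a rough flink-conf.yaml
        sketch of the two reporters side by side. The reporter name
        "promgateway" and the PushGateway host/port values are
        placeholders chosen for illustration, not taken from your setup.

        ```yaml
        # Pull-based reporter: Prometheus scrapes it, so only a port is set.
        metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
        metrics.reporter.prom.port: 9999

        # Push-based reporter: host/port point at the PushGateway.
        # "promgateway", "localhost" and 9091 are placeholder values.
        metrics.reporter.promgateway.class: org.apache.flink.metrics.prometheus.PrometheusPushGatewayReporter
        metrics.reporter.promgateway.host: localhost
        metrics.reporter.promgateway.port: 9091
        ```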

        On 06/07/2020 18:01, Manish G wrote:

        Hi,

        So I have the following in flink-conf.yaml:
        //////////////////////////////////////////////////////
        metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
        metrics.reporter.prom.host: 127.0.0.1
        metrics.reporter.prom.port: 9999
        metrics.reporter.slf4j.class: org.apache.flink.metrics.slf4j.Slf4jReporter
        metrics.reporter.slf4j.interval: 30 SECONDS
        //////////////////////////////////////////////////////

        While I can see the custom metrics in the TaskManager logs, the
        Prometheus dashboard doesn't show them.

        With regards

        On Mon, Jul 6, 2020 at 8:55 PM Chesnay Schepler
        <[email protected]> wrote:

            You have explicitly configured a reporter list,
            resulting in the slf4j reporter being ignored:

            2020-07-06 13:48:22,191 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: metrics.reporters, prom
            2020-07-06 13:48:23,203 INFO org.apache.flink.runtime.metrics.ReporterSetup - Excluding reporter slf4j, not configured in reporter list (prom).

            Note that nowadays metrics.reporters is no longer
            required; the set of reporters is determined
            automatically from the configured properties. The only
            remaining use-case is disabling a reporter without
            having to remove its entire configuration.
            I'd suggest just removing the option, trying again, and
            reporting back.
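
            As a rough sketch, assuming the reporter settings you
            have posted in this thread, the relevant part of the
            configuration would then look something like this (the
            commented-out line is the one to drop):

            ```yaml
            # metrics.reporters: prom   <- removed; reporters are inferred from the keys below
            metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
            metrics.reporter.prom.port: 9999
            metrics.reporter.slf4j.class: org.apache.flink.metrics.slf4j.Slf4jReporter
            metrics.reporter.slf4j.interval: 30 SECONDS
            ```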

            On 06/07/2020 16:35, Chesnay Schepler wrote:

            Please enable debug logging and search for warnings
            from the metric groups/registry/reporter.

            If you cannot find anything suspicious, you can also
            send the full log to me directly.

            On 06/07/2020 16:29, Manish G wrote:

            The job is an infinite streaming one, so it keeps running.
            The Flink configuration is:

            metrics.reporter.slf4j.class: org.apache.flink.metrics.slf4j.Slf4jReporter
            metrics.reporter.slf4j.interval: 30 SECONDS



            On Mon, Jul 6, 2020 at 7:57 PM Chesnay Schepler
            <[email protected]> wrote:

                How long did the job run for, and what is the
                configured interval?


                On 06/07/2020 15:51, Manish G wrote:

                Hi,

                Thanks for this.

                I did the configuration as described at the link
                (changes in flink-conf.yaml, copying the jar into
                the lib directory), registered the Meter with the
                metric group, and invoked the markEvent() method in
                the target code. But I don't see any related logs.
                I am doing all of this on my local computer.

                Anything else I need to do?

                With regards
                Manish

                On Mon, Jul 6, 2020 at 5:24 PM Chesnay Schepler
                <[email protected]> wrote:

                    Have you looked at the SLF4J reporter?

                    https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html#slf4j-orgapacheflinkmetricsslf4jslf4jreporter

                    On 06/07/2020 13:49, Manish G wrote:
                    > Hi,
                    >
                    > Is it possible to log Flink metrics in application logs apart from
                    > publishing them to Prometheus?
                    >
                    > With regards
