Are you running Flink in WSL by chance?
On 06/07/2020 19:06, Manish G wrote:
In flink-conf.yaml:
metrics.reporter.prom.port: 9250-9260
This is based on the information provided here:
<https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html#prometheus-orgapacheflinkmetricsprometheusprometheusreporter>
port - (optional) the port the Prometheus exporter listens on,
defaults to 9249
<https://github.com/prometheus/prometheus/wiki/Default-port-allocations>.
In order to be able to run several instances of the reporter on one
host (e.g. when one TaskManager is colocated with the JobManager) it
is advisable to use a port range like 9250-9260.
As I am running Flink locally, the JobManager and TaskManager are
colocated.
In prometheus.yml:
- job_name: 'flinkprometheus'
  scrape_interval: 5s
  static_configs:
    - targets: ['localhost:9250', 'localhost:9251']
  metrics_path: /
This is all the configuration I have done, based on several tutorials
and blogs available online.
On Mon, Jul 6, 2020 at 10:20 PM Chesnay Schepler <[email protected]> wrote:
These are all JobManager metrics; have you configured Prometheus
to also scrape the TaskManager processes?
On 06/07/2020 18:35, Manish G wrote:
The metrics I see in Prometheus look like this:
# HELP flink_jobmanager_job_lastCheckpointRestoreTimestamp lastCheckpointRestoreTimestamp (scope: jobmanager_job)
# TYPE flink_jobmanager_job_lastCheckpointRestoreTimestamp gauge
flink_jobmanager_job_lastCheckpointRestoreTimestamp{job_id="58483036154d7f72ad1bbf10eb86bc2e",host="localhost",job_name="frauddetection",} -1.0
# HELP flink_jobmanager_job_numberOfFailedCheckpoints numberOfFailedCheckpoints (scope: jobmanager_job)
# TYPE flink_jobmanager_job_numberOfFailedCheckpoints gauge
flink_jobmanager_job_numberOfFailedCheckpoints{job_id="58483036154d7f72ad1bbf10eb86bc2e",host="localhost",job_name="frauddetection",} 0.0
# HELP flink_jobmanager_Status_JVM_Memory_Heap_Max Max (scope: jobmanager_Status_JVM_Memory_Heap)
# TYPE flink_jobmanager_Status_JVM_Memory_Heap_Max gauge
flink_jobmanager_Status_JVM_Memory_Heap_Max{host="localhost",} 1.029177344E9
# HELP flink_jobmanager_Status_JVM_GarbageCollector_PS_MarkSweep_Count Count (scope: jobmanager_Status_JVM_GarbageCollector_PS_MarkSweep)
# TYPE flink_jobmanager_Status_JVM_GarbageCollector_PS_MarkSweep_Count gauge
flink_jobmanager_Status_JVM_GarbageCollector_PS_MarkSweep_Count{host="localhost",} 2.0
# HELP flink_jobmanager_Status_JVM_CPU_Time Time (scope: jobmanager_Status_JVM_CPU)
# TYPE flink_jobmanager_Status_JVM_CPU_Time gauge
flink_jobmanager_Status_JVM_CPU_Time{host="localhost",} 8.42E9
# HELP flink_jobmanager_Status_JVM_Memory_Direct_TotalCapacity TotalCapacity (scope: jobmanager_Status_JVM_Memory_Direct)
# TYPE flink_jobmanager_Status_JVM_Memory_Direct_TotalCapacity gauge
flink_jobmanager_Status_JVM_Memory_Direct_TotalCapacity{host="localhost",} 604064.0
# HELP flink_jobmanager_job_fullRestarts fullRestarts (scope: jobmanager_job)
# TYPE flink_jobmanager_job_fullRestarts gauge
flink_jobmanager_job_fullRestarts{job_id="58483036154d7f72ad1bbf10eb86bc2e",host="localhost",job_name="frauddetection",} 0.0
On Mon, Jul 6, 2020 at 9:51 PM Chesnay Schepler <[email protected]> wrote:
You've said elsewhere that you do see some metrics in
Prometheus; which ones are those?
Why are you configuring the host for the prometheus reporter?
This option is only for the PrometheusPushGatewayReporter.
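For comparison, here is roughly what a PrometheusPushGatewayReporter setup looks like, which is the case where a host setting does matter: Flink pushes to a Pushgateway at that address instead of Prometheus pulling from a port Flink opens. This is a sketch modeled on the example in the Flink 1.10 metrics documentation; the reporter name promgateway, the gateway address localhost:9091 and the job name myJob are placeholder assumptions. For the plain PrometheusReporter used in this thread, only class and port are needed.

```yaml
# Push-based reporter: Flink pushes metrics to a Prometheus Pushgateway.
# "promgateway" is an arbitrary reporter name (assumption).
metrics.reporter.promgateway.class: org.apache.flink.metrics.prometheus.PrometheusPushGatewayReporter
# Address of the Pushgateway (assumed example values), not of Flink itself.
metrics.reporter.promgateway.host: localhost
metrics.reporter.promgateway.port: 9091
# Job name under which metrics are pushed (placeholder value).
metrics.reporter.promgateway.jobName: myJob
# Append a random suffix to the job name, useful when several Flink
# processes push under the same name.
metrics.reporter.promgateway.randomJobNameSuffix: true
# Keep the pushed metrics in the gateway when Flink shuts down.
metrics.reporter.promgateway.deleteOnShutdown: false
```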
On 06/07/2020 18:01, Manish G wrote:
Hi,
So I have the following in flink-conf.yaml:
//////////////////////////////////////////////////////
metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
metrics.reporter.prom.host: 127.0.0.1
metrics.reporter.prom.port: 9999
metrics.reporter.slf4j.class: org.apache.flink.metrics.slf4j.Slf4jReporter
metrics.reporter.slf4j.interval: 30 SECONDS
//////////////////////////////////////////////////////
While I can see the custom metrics in the TaskManager logs, the
Prometheus dashboard doesn't show them.
With regards
On Mon, Jul 6, 2020 at 8:55 PM Chesnay Schepler <[email protected]> wrote:
You have explicitly configured a reporter list,
resulting in the slf4j reporter being ignored:
2020-07-06 13:48:22,191 INFO  org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: metrics.reporters, prom
2020-07-06 13:48:23,203 INFO  org.apache.flink.runtime.metrics.ReporterSetup - Excluding reporter slf4j, not configured in reporter list (prom).
Note that metrics.reporters is no longer required nowadays;
the set of reporters is determined automatically from the
configured properties. Its only remaining use case is
disabling a reporter without having to remove its entire
configuration.
I'd suggest just removing the option, trying again, and
reporting back.
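Applied to the configuration quoted in this thread, a flink-conf.yaml that keeps both reporters active might look like the sketch below. It simply leaves out metrics.reporters (and the prom.host option, which the PrometheusReporter does not use), so every reporter with a configured class is picked up. The values are the ones from the thread; it assumes the flink-metrics-prometheus and flink-metrics-slf4j jars have been copied into lib/.

```yaml
# Pull-based Prometheus reporter. Each Flink process (JobManager, TaskManager)
# binds a free port from this range, so colocated processes don't clash.
metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
metrics.reporter.prom.port: 9250-9260

# SLF4J reporter: logs all metrics via SLF4J every 30 seconds, i.e. into the
# JobManager/TaskManager log files with the default logging setup.
metrics.reporter.slf4j.class: org.apache.flink.metrics.slf4j.Slf4jReporter
metrics.reporter.slf4j.interval: 30 SECONDS

# No "metrics.reporters" entry: without it, every reporter configured above is active.
# Setting e.g. "metrics.reporters: prom" would exclude the slf4j reporter again.
```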
On 06/07/2020 16:35, Chesnay Schepler wrote:
Please enable debug logging and search for warnings
from the metric groups/registry/reporter.
If you cannot find anything suspicious, you can also
send the full log to me directly.
On 06/07/2020 16:29, Manish G wrote:
The job is an infinite streaming one, so it keeps running.
The Flink configuration is:
metrics.reporter.slf4j.class: org.apache.flink.metrics.slf4j.Slf4jReporter
metrics.reporter.slf4j.interval: 30 SECONDS
On Mon, Jul 6, 2020 at 7:57 PM Chesnay Schepler <[email protected]> wrote:
How long did the job run for, and what is the
configured interval?
On 06/07/2020 15:51, Manish G wrote:
Hi,
Thanks for this.
I did the configuration as described at the
link (changes in flink-conf.yaml, copying the jar
into the lib directory), registered the Meter with the
metric group, and invoked the markEvent() method in
the target code. But I don't see any related logs.
I am doing all of this on my local computer.
Is there anything else I need to do?
With regards
Manish
On Mon, Jul 6, 2020 at 5:24 PM Chesnay Schepler <[email protected]>
wrote:
Have you looked at the SLF4J reporter?
https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html#slf4j-orgapacheflinkmetricsslf4jslf4jreporter
On 06/07/2020 13:49, Manish G wrote:
> Hi,
>
> Is it possible to log Flink metrics in application logs apart from
> publishing them to Prometheus?
>
> With regards