Re: Missing metrics when using metric reporter on high parallelism

Chesnay Schepler Tue, 11 Aug 2020 08:26:11 -0700

IIRC this can be caused by the Carbon MAX_CREATES_PER_MINUTE setting.


I would deem it unlikely that the reporter thread is busy for 30 seconds.
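
To illustrate why a create limit could matter more at high parallelism, here is a rough back-of-the-envelope sketch. It assumes that Carbon creates at most MAX_CREATES_PER_MINUTE new metric files per minute and that each parallel subtask reports its own set of series. The function name, the 30 metrics per subtask, and the limit of 50 are hypothetical placeholders, not values taken from this job; check your carbon.conf for the real limit.

```python
import math


def minutes_to_create_all(parallelism: int,
                          metrics_per_subtask: int,
                          max_creates_per_minute: int) -> int:
    """Estimate how many minutes Carbon needs before every new series exists.

    Assumption: each subtask reports `metrics_per_subtask` distinct series and
    Carbon creates at most `max_creates_per_minute` new series per minute.
    """
    total_series = parallelism * metrics_per_subtask
    return math.ceil(total_series / max_creates_per_minute)


# Hypothetical numbers: 30 series per subtask, limit of 50 creates per minute.
print(minutes_to_create_all(20, 30, 50))   # 12
print(minutes_to_create_all(200, 30, 50))  # 120
```

Under these assumptions a job with parallelism 200 would take far longer than one with parallelism 20 before all of its series exist on the Graphite side, which could look like missing metrics in the meantime.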

On 11/08/2020 16:57, Nikola Hrusov wrote:

Hello,
I am doing some tests with Flink 1.11.1 and I have noticed something strange/wrong going on with the exported metrics.
My configuration looks like this:
metrics.reporter.graphite.class: org.apache.flink.metrics.graphite.GraphiteReporterFactory
metrics.reporter.graphite.host: graphite
metrics.reporter.graphite.port: 8080
metrics.reporter.graphite.protocol: tcp
metrics.reporter.graphite.interval: 10 SECONDS

which should send metrics to Graphite every 10 seconds.
That works with low parallelism (e.g. <= 20): we get all metrics, all the time, every 10 seconds.

However, when I scale my job to a parallelism of 200 or more, the metrics are not sent every 10 seconds. Sometimes they are missing for up to 3 reporting cycles.

I have had a brief look at the code here:
https://github.com/apache/flink/blob/release-1.11.1/flink-runtime/src/main/java/org/apache/flink/runtime/metrics/MetricRegistryImpl.java#L107-L144
and it looks like the reporting runs on a separate thread. My first guess was that it is doing too much work on that one thread.

I have tried lowering the reporting interval from 10 SECONDS to 6-7 SECONDS, but metrics are still missing in that case. It happens even for simple jobs such as "source -> map -> sink" when they run with higher parallelism.

What can I do to debug this further or make it work? Has anyone come across this before?
Regards,
Nikola Hrusov
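
One way to narrow down where the datapoints go missing is to capture what the reporter actually sends and check it for gaps. The sketch below is a minimal example that assumes the captured lines follow the Graphite plaintext format ("path value timestamp"). The helper name find_gaps, the tolerance factor, and the metric paths in the sample are hypothetical and only for illustration.

```python
from collections import defaultdict


def find_gaps(lines, interval_s=10, tolerance=1.5):
    """Return {metric_path: [(prev_ts, next_ts), ...]} for every pair of
    consecutive datapoints that are more than tolerance * interval_s apart."""
    by_path = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip lines that are not "path value timestamp"
        path, _value, ts = parts
        by_path[path].append(int(float(ts)))

    gaps = {}
    for path, stamps in by_path.items():
        stamps.sort()
        missing = [(a, b) for a, b in zip(stamps, stamps[1:])
                   if b - a > tolerance * interval_s]
        if missing:
            gaps[path] = missing
    return gaps


# Hypothetical capture: task 0 skips two reporting cycles, task 1 does not.
sample = [
    "job.task.0.numRecordsIn 5 1597157800",
    "job.task.0.numRecordsIn 9 1597157810",
    "job.task.0.numRecordsIn 20 1597157840",
    "job.task.1.numRecordsIn 4 1597157800",
    "job.task.1.numRecordsIn 8 1597157810",
]
print(find_gaps(sample))
```

If the captured stream shows no gaps while Graphite still does, the datapoints are probably being dropped on the receiving side rather than by the reporter.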
