Re: Prometheus metrics does not work in 1.15.0 taskmanager

2022-05-05 Thread Peter Schrott
Hi ChangZhuo, By low level process function you mean the Data Streams API, right? How to disable the Kafka metrics when creating a Kafka source / sink is described here: https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/connectors/datastream/kafka/#additional-properties The same pr

Re: Prometheus metrics does not work in 1.15.0 taskmanager

2022-05-04 Thread Mason Chen
Nice work Peter! Looking forward to the fix. @ChangZhou Kafka metrics are emitted from the source and the process function would be a different operator. For the datastream API, you can set `KafkaSourceOptions.REGISTER_KAFKA_CONSUMER_METRICS.key()` as `false` in your consumer properties. Best, Ma

Re: Prometheus metrics does not work in 1.15.0 taskmanager

2022-05-04 Thread 陳昌倬
On Wed, May 04, 2022 at 01:53:01PM +0200, Chesnay Schepler wrote: > Disabling the kafka metrics _should_ work. Is there anyway to disable Kafka metrics when using low level process function? -- ChangZhuo Chen (陳昌倬) czchen@{czchen,debian}.org http://czchen.info/ Key fingerprint = BA04 346D C2E1

Re: Prometheus metrics does not work in 1.15.0 taskmanager

2022-05-04 Thread Chesnay Schepler
Ah that's unfortunate. Yeah the feature freeze was quite a bit earlier than I remembered :( On 04/05/2022 15:31, Peter Schrott wrote: Hi Chesnay, Thanks again for the hints. Unfortunately the metrics filtering feature is not part of 1.15.0. It seems to be part of 1.16.0: https://issues.apac

Re: Prometheus metrics does not work in 1.15.0 taskmanager

2022-05-04 Thread Peter Schrott
Hi Chesnay, Thanks again for the hints. Unfortunately the metrics filtering feature is not part of 1.15.0. It seems to be part of 1.16.0: https://issues.apache.org/jira/browse/FLINK-21585 I was already wondering why I could not find the feature in the docs you linked. > Disabling the kafka metri

Re: Prometheus metrics does not work in 1.15.0 taskmanager

2022-05-04 Thread Chesnay Schepler
Disabling the kafka metrics _should_ work. Alternatively you could use the new generic feature to filter metrics: https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/deployment/metric_reporters/#filter-excludes metrics.reporter..filter.excludes: *KafkaProducer*;*KafkaConsumer* Th

Re: Prometheus metrics does not work in 1.15.0 taskmanager

2022-05-04 Thread Peter Schrott
Allright! Thanks! I tried to dig a bit deeper and see if there is any workaround for that problem. I tried to switch off reporting the Kafka metrics, but I was not quite successful. I am using the table api Kafka connector. Do you have any suggestions on how to overcome this? Could you also prov

Re: Prometheus metrics does not work in 1.15.0 taskmanager

2022-05-04 Thread Chesnay Schepler
https://issues.apache.org/jira/browse/FLINK-27487 On 04/05/2022 13:22, Chesnay Schepler wrote: Yes, that looks like a new bug in 1.15. The migration to the new non-deprecated Kafka API in the KafkaMetricMutableWrapper was done incorrectly. This should affect every job that uses the new kafka

Re: Prometheus metrics does not work in 1.15.0 taskmanager

2022-05-04 Thread Chesnay Schepler
Yes, that looks like a new bug in 1.15. The migration to the new non-deprecated Kafka API in the KafkaMetricMutableWrapper was done incorrectly. This should affect every job that uses the new kafka connector. Thank you for debugging the issue! I will create a ticket. On 04/05/2022 12:24, Pet

Re: Prometheus metrics does not work in 1.15.0 taskmanager

2022-05-04 Thread Peter Schrott
As the stracktrace says, class cast exception occurs here: https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-kafka/src/main/java/org/apache/flink/streaming/connectors/kafka/internals/metrics/KafkaMetricMutableWrapper.java#L37 I found the following metrics to be affected

Re: Prometheus metrics does not work in 1.15.0 taskmanager

2022-05-04 Thread Peter Schrott
Sorry for the spamming! Just after jumping into the debug-session I noticed that there are indeed exceptions thrown when fetching the metrics on port 9200: 13657 INFO [ScalaTest-run] com.sun.net.httpserver - HttpServer created http 0.0.0.0/0.0.0.0:9200 13658 INFO [ScalaTest-run] com.sun.net.ht

Re: Prometheus metrics does not work in 1.15.0 taskmanager

2022-05-04 Thread Peter Schrott
Hi Chesnay, Thanks for that support! Just for compilation: Running the "Problem-Job" locally as test in Intellij (as Chesney suggested above) reproduces the described problem: ➜ ~ curl localhost:9200curl: (52) Empty reply from server Doing the same with other jobs metrics are available on local

Re: Prometheus metrics does not work in 1.15.0 taskmanager

2022-05-03 Thread Chesnay Schepler
> I noticed that my config of the PrometheusReporter is different here. I have: `metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter`. I will investigate if this is a problem. That's not a problem. > Which trace logs are interesting? The logging config I provi

Re: Prometheus metrics does not work in 1.15.0 taskmanager

2022-05-03 Thread Peter Schrott
Hi Chesnay, Thanks for the code snipped. Which trace logs are interesting? Of " org.apache.flink.metrics.prometheus.PrometheusReporter"? I could also add this logger settings in the environment where the problem is present. Other than that, I am not sure how to reproduce this issue in a local set

Re: Prometheus metrics does not work in 1.15.0 taskmanager

2022-05-03 Thread Chesnay Schepler
You'd help me out greatly if you could provide me with a sample job that runs into the issue. So far I wasn't able to reproduce the issue, but it should be clear that there is some given 3 separate reports, although it is strange that so far it was only reported for Prometheus. If one of you is

Re: Prometheus metrics does not work in 1.15.0 taskmanager

2022-05-03 Thread 陳昌倬
On Tue, May 03, 2022 at 10:32:03AM +0200, Peter Schrott wrote: > Hi! > > I also discovered problems with the PrometheusReporter on Flink 1.15.0, > coming from 1.14.4. I already consulted the mailing list: > https://lists.apache.org/thread/m8ohrfkrq1tqgq7lowr9p226z3yc0fgc > I have not found the und

Re: Prometheus metrics does not work in 1.15.0 taskmanager

2022-05-03 Thread 陳昌倬
On Tue, May 03, 2022 at 10:28:18AM +0200, Chesnay Schepler wrote: > Is there any warning in the logs containing "Error while handling metric"? No, we don't find any "Error while handling metric" -- ChangZhuo Chen (陳昌倬) czchen@{czchen,debian}.org http://czchen.info/ Key fingerprint = BA04 346D C

Re: Prometheus metrics does not work in 1.15.0 taskmanager

2022-05-03 Thread Peter Schrott
Hi! I also discovered problems with the PrometheusReporter on Flink 1.15.0, coming from 1.14.4. I already consulted the mailing list: https://lists.apache.org/thread/m8ohrfkrq1tqgq7lowr9p226z3yc0fgc I have not found the underlying problem or a solution to it. Actually, after re-checking, I see th

Re: Prometheus metrics does not work in 1.15.0 taskmanager

2022-05-03 Thread Chesnay Schepler
Is there any warning in the logs containing "Error while handling metric"? On 03/05/2022 10:18, ChangZhuo Chen (陳昌倬) wrote: On Tue, May 03, 2022 at 01:00:42AM -0700, Mason Chen wrote: Hi ChangZhou, The warning log indicates that the metric was previously defined and so the runtime is handling

Re: Prometheus metrics does not work in 1.15.0 taskmanager

2022-05-03 Thread 陳昌倬
On Tue, May 03, 2022 at 01:00:42AM -0700, Mason Chen wrote: > Hi ChangZhou, > > The warning log indicates that the metric was previously defined and so the > runtime is handling the "duplicate" metric by ignoring it. This is > typically a benign message unless you rely on this metric. Is it possib

Re: Prometheus metrics does not work in 1.15.0 taskmanager

2022-05-03 Thread Mason Chen
Hi ChangZhou, The warning log indicates that the metric was previously defined and so the runtime is handling the "duplicate" metric by ignoring it. This is typically a benign message unless you rely on this metric. Is it possible that you are using the same task name for different tasks? It would