Dear Biao Geng,
 
Thank you very much. With the help of your demo and the YAML configuration, I was able to successfully set up monitoring for my Apache Flink jobs.
 
Thanks again for your time and help.
 
Best regards,
Oliver
 
 
Sent: Sunday, 19 May 2024 at 17:42
From: "Biao Geng" <biaoge...@gmail.com>
To: "Oliver Schmied" <uncharted...@gmx.at>
Cc: user@flink.apache.org
Subject: Re: Advice Needed: Setting Up Prometheus and Grafana Monitoring for Apache Flink on Kubernetes
Hi Oliver,
 
I believe you are almost there. One thing that could be improved: in your job YAML, instead of using:
    kubernetes.operator.metrics.reporter.prommetrics.reporters: prom
    kubernetes.operator.metrics.reporter.prommetrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory
    kubernetes.operator.metrics.reporter.prom.port: 9249-9250
you should use:
    metrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory
    metrics.reporter.prom.port: "9249"
 
Configs with the prefix `kubernetes.operator` are for the Flink Kubernetes operator itself (you may use them if you want to collect the metrics of the operator). The job configuration does not need them.
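Applied to the manifest from the original question, the corrected `flinkConfiguration` block would look roughly like this. This is a sketch: only the reporter keys change, and the other FlinkDeployment fields (`image`, `jobManager`, `taskManager`, and so on) are assumed to stay as they were.

```yaml
spec:
  flinkConfiguration:
    taskmanager.numberOfTaskSlots: "2"
    # Job-level reporter: no "kubernetes.operator." prefix, so it applies
    # to the JobManager and TaskManager processes of this Flink cluster
    metrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory
    # Quoted, like the other values in flinkConfiguration
    metrics.reporter.prom.port: "9249"
```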
 
I created a detailed demo of using Prometheus to monitor jobs started by the Flink Kubernetes operator. Maybe it can be helpful.
 
Best,
Biao Geng
 
On Sunday, 19 May 2024 at 04:21, Oliver Schmied <uncharted...@gmx.at> wrote:

Dear Apache Flink Community,

I am currently trying to monitor an Apache Flink cluster deployed on Kubernetes using Prometheus and Grafana. Despite following the official guide (https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-release-1.8/docs/operations/metrics-logging/) on how to set up Prometheus, I have not been able to get Flink-specific metrics to appear in Prometheus. I am reaching out for your assistance, as I have tried many things but nothing has worked.

 

# My setup:

* Kubernetes

* Flink 1.18 deployed as a FlinkDeployment

with this manifest:

```yaml
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  namespace: default
  name: flink-cluster
spec:
  image: flink:1.18
  flinkVersion: v1_18
  flinkConfiguration:
    taskmanager.numberOfTaskSlots: "2"
    #Added
    kubernetes.operator.metrics.reporter.prommetrics.reporters: prom
    kubernetes.operator.metrics.reporter.prommetrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory
    kubernetes.operator.metrics.reporter.prom.port: 9249-9250
  serviceAccount: flink
  jobManager:
    resource:
      memory: "1048m"
      cpu: 1
  taskManager:
    resource:
      memory: "1048m"
      cpu: 1

```

* Prometheus operator installed via:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus prometheus-community/kube-prometheus-stack

 
* Deployed a pod-monitor.yaml:
```
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: flink-kubernetes-operator
  labels:
    release: prometheus
spec:
  selector:
    matchLabels:
      app: flink-cluster
  podMetricsEndpoints:
      - port: metrics
 
```
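The PodMonitor above scrapes a container port *named* `metrics`, so the Flink pods have to expose a port with that name. One way to do this, sketched here as an assumption rather than taken from this thread, is a pod template on the FlinkDeployment that adds port 9249 (the reporter port suggested in the reply) to the main container. `flink-main-container` is the container name the operator expects in pod templates; the port name `metrics` is chosen only to match the PodMonitor. This also assumes the JobManager and TaskManager pods carry the `app: flink-cluster` label that the PodMonitor selects on.

```yaml
spec:
  podTemplate:
    apiVersion: v1
    kind: Pod
    spec:
      containers:
        # Must be named flink-main-container so the operator merges
        # these settings into the JobManager/TaskManager container
        - name: flink-main-container
          ports:
            # Name referenced by podMetricsEndpoints[].port in the PodMonitor
            - name: metrics
              containerPort: 9249
              protocol: TCP
```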
 
# The problem
 
* I can access Prometheus fine, and judging by the logs of the pod monitor it seems to collect Flink-specific metrics, but I cannot find these metrics in Prometheus.
* Did I set up Prometheus correctly in my Flink deployment manifest?
* I also added the following lines to my values.yaml file, but apart from that I changed nothing:
```
metrics:
  port: 9999
```
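For comparison, the `kubernetes.operator.*` reporter keys belong in the operator's own configuration, where they report metrics about the operator rather than about the jobs. A sketch, assuming the operator was installed from its Helm chart and that port 9999 is meant to match the `metrics.port` value above; these entries would go in the operator's values.yaml:

```yaml
metrics:
  port: 9999
defaultConfiguration:
  create: true
  # Append these overrides to the operator's built-in default configuration
  append: true
  flink-conf.yaml: |+
    # Prometheus reporter for the operator's own metrics (not the jobs')
    kubernetes.operator.metrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory
    kubernetes.operator.metrics.reporter.prom.port: 9999
```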
 
# My questions
 
* Can anyone see the mistake in my deployment?
* Or does anyone have a better idea on how to monitor my flink deployment?
 
 
I would be very grateful for your answers. Thank you very much.
 
Best regards,
Oliver
 
 
 
 

 
