On 22.04.21 20:20, Matthias Rampke wrote:
Your best starting point is the rules page of the Prometheus UI (:9090/rules). It will show the error. You can also evaluate the rule expression yourself, using the UI, or maybe using PromLens to help debug expression issues.

/MR

:9090/rules shows these 2 errors, both "found duplicate series for the match group".

I think we may have a problem with the federation config.

alert: PrometheusRemoteWriteBehind
expr: (max_over_time(prometheus_remote_storage_highest_timestamp_in_seconds[5m]) - on(job, instance) group_right() max_over_time(prometheus_remote_storage_queue_highest_sent_timestamp_seconds[5m])) > 120
for: 15m
labels:
  severity: critical
annotations:
  description: Prometheus {{$labels.namespace}}/{{$labels.pod}} remote write is {{ printf "%.1f" $value }}s behind for {{ $labels.remote_name}}:{{ $labels.url }}.
  summary: Prometheus remote write is behind.


found duplicate series for the match group {instance="prometheus.slash-dir-poc-in.kuber.example.org:9090", job="federate"} on the left hand-side of the operation: [{cluster="poc", endpoint="web", exported_instance="x.x.x.x:9090", exported_job="prometheus-k8s", instance="prometheus.slash-dir-poc-in.kuber.example.org:9090", job="federate", namespace="monitoring", pod="prometheus-k8s-1", prometheus="monitoring/k8s", prometheus_replica="prometheus-k8s-0", service="prometheus-k8s", team="MY-TEAM-NAME"}, {cluster="poc", endpoint="web", exported_instance="x.x.x.x:9090", exported_job="prometheus-k8s", instance="prometheus.slash-dir-poc-in.kuber.example.org:9090", job="federate", namespace="monitoring", pod="prometheus-k8s-0", prometheus="monitoring/k8s", prometheus_replica="prometheus-k8s-0", service="prometheus-k8s", team="MY-TEAM-NAME"}];many-to-many matching not allowed: matching labels must be unique on one side
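
The error output suggests why the match is not unique: the federate job (config further down) has honor_labels: false, so every federated series carries job="federate" and the federation target as instance, while the original values end up in exported_job and exported_instance. A possible variant of the rule that matches on those labels instead is sketched below. It assumes the alert is evaluated on the federating Prometheus and that exported_instance really differs between the two pods (it is masked as x.x.x.x above); the group name is made up.

```yaml
groups:
  - name: remote-write-federated   # hypothetical group name
    rules:
      - alert: PrometheusRemoteWriteBehind
        # Match on the cluster label plus the original job/instance, which
        # federation with honor_labels: false renamed to exported_*.
        expr: |
          (
            max_over_time(prometheus_remote_storage_highest_timestamp_in_seconds[5m])
            - on(cluster, exported_job, exported_instance) group_right()
            max_over_time(prometheus_remote_storage_queue_highest_sent_timestamp_seconds[5m])
          ) > 120
        for: 15m
        labels:
          severity: critical
```

If the federation endpoint can answer from either replica, prometheus_replica may have to be added to the on() list as well; that is a guess, not something the error output confirms.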


and

record: node:node_num_cpu:sum
expr: count by(cluster, node) (sum by(node, cpu) (node_cpu_seconds_total{job="node-exporter"} * on(namespace, pod) group_left(node) node_namespace_pod:kube_pod_info:))


found duplicate series for the match group {namespace="monitoring", pod="prometheus-k8s-0"} on the right hand-side of the operation: [{__name__="node_namespace_pod:kube_pod_info:", cluster="preprod", instance="prometheus.ep-preprod-in.kuber.example.org:9090", job="federate", namespace="monitoring", node="4516e9ed-4917-4792-ad49-2158775dc07e", pod="prometheus-k8s-0", prometheus="monitoring/k8s", prometheus_replica="prometheus-k8s-1", team="MY-TEAM-NAME"}, {__name__="node_namespace_pod:kube_pod_info:", cluster="poc", instance="prometheus.slash-dir-poc-in.kuber.example.org:9090", job="federate", namespace="monitoring", node="602efe91-2eb5-466f-9350-c4c6ce35119a", pod="prometheus-k8s-0", prometheus="monitoring/k8s", prometheus_replica="prometheus-k8s-0", team="MY-TEAM-NAME"}];many-to-many matching not allowed: matching labels must be unique on one side
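
In this second error the two colliding series come from different clusters (preprod and poc): both have a pod prometheus-k8s-0 in namespace monitoring, so on(namespace, pod) cannot be unique once several clusters are in one Prometheus. A sketch of the same recording rule with cluster added to the matching and aggregation labels follows. The group name is hypothetical, and the job selector is left as in the original; with honor_labels: false it may need to be exported_job instead.

```yaml
groups:
  - name: node-federated   # hypothetical group name
    rules:
      - record: node:node_num_cpu:sum
        # cluster is added to on() and to the inner sum so that pods with
        # the same namespace/pod name in different clusters stay apart.
        expr: |
          count by (cluster, node) (
            sum by (cluster, node, cpu) (
              node_cpu_seconds_total{job="node-exporter"}
              * on(cluster, namespace, pod) group_left(node)
              node_namespace_pod:kube_pod_info:
            )
          )
```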

Also, this alert fires:

name: PrometheusOutOfOrderTimestamps
expr: rate(prometheus_target_scrapes_sample_out_of_order_total[5m]) > 0

We may have a problem with federation.

We have an external Prometheus which federates from the Prometheus servers of 4 k8s clusters.

Scrape config:

  - job_name: federate
    scrape_interval: 15s
    scrape_timeout: 15s
    honor_labels: false
    metrics_path: /federate
    scheme: https
    tls_config:
      insecure_skip_verify: true
    params:
        match[]:
          - '{__name__=~".+"}'
    file_sd_configs:
      - files:
          - k8s.yml
    relabel_configs:
      - source_labels:
          - __address__
        regex: (.*)
        replacement: ${1}:9090
        target_label: __address__

k8s.yml:
                

- labels:
    cluster: poc
    team: MY-TEAM-NAME
  targets:
    - prometheus.slash-dir-poc-in.kuber.example.org
- labels:
    cluster: devtest
    team: MY-TEAM-NAME
  targets:
    - prometheus.slash-dir-devtest-in.kuber.example.org
- labels:
    cluster: preprod
    team: MY-TEAM-NAME
  targets:
    - prometheus.ep-preprod-in.kuber.example.org
- labels:
    cluster: prod
    team: MY-TEAM-NAME
  targets:
    - prometheus.ep-prod-in.kuber.example.org
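
For comparison, the federation example in the Prometheus documentation sets honor_labels: true, which keeps the job and instance labels of the federated series instead of renaming them to exported_job and exported_instance. A sketch of the same job with only that setting changed (everything else is copied from the config above, with the scrape_configs key added so it is a complete fragment):

```yaml
scrape_configs:
  - job_name: federate
    scrape_interval: 15s
    scrape_timeout: 15s
    # Keep the labels exposed by the source Prometheus; the target labels
    # from k8s.yml (cluster, team) are only attached where the federated
    # series do not already carry a label of that name.
    honor_labels: true
    metrics_path: /federate
    scheme: https
    tls_config:
      insecure_skip_verify: true
    params:
      'match[]':
        - '{__name__=~".+"}'
    file_sd_configs:
      - files:
          - k8s.yml
    relabel_configs:
      - source_labels: [__address__]
        regex: (.*)
        replacement: ${1}:9090
        target_label: __address__
```

Whether this alone makes on(job, instance) unique again depends on the labels the cluster Prometheus servers expose, so it would need checking on the rules page.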

kind regards
Evelyn
