It seems like you are federating through an ingress or load balancer that balances over multiple Prometheus server replicas. Either federate from each replica separately, or make sure you consistently get responses from only one.
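One way to do that is to point the federation job at a fixed replica (a per-replica pod or Service) rather than the balanced ingress, and to drop the replica-identifying label so the two HA copies don't show up as duplicate series downstream. A rough sketch, assuming kube-prometheus-style replica pods and the `prometheus_replica` external label — the addresses here are hypothetical, adjust to your setup:

```yaml
- job_name: federate
  honor_labels: true            # keep job/instance labels as seen by the federated server
  metrics_path: /federate
  params:
    match[]:
      - '{__name__=~".+"}'
  static_configs:
    - targets:
        # hypothetical per-replica address: one fixed replica,
        # not an ingress balancing across prometheus-k8s-0 and -1
        - prometheus-k8s-0.monitoring.svc:9090
  metric_relabel_configs:
    # drop the replica-identifying external label so series from
    # the HA pair line up instead of duplicating
    - action: labeldrop
      regex: prometheus_replica
```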
As an alternative to the global federation, consider Thanos; it scales further and handles this situation out of the box.

/MR

On Fri, Apr 23, 2021, 06:08 'Evelyn Pereira Souza' via Prometheus Users <[email protected]> wrote:

> On 22.04.21 20:20, Matthias Rampke wrote:
> > Your best starting point is the rules page of the Prometheus UI
> > (:9090/rules). It will show the error. You can also evaluate the rule
> > expression yourself, using the UI, or maybe using PromLens to help debug
> > expression issues.
> >
> > /MR
>
> :9090/rules shows these 2 errors - "found duplicate series for the match group".
> I think we may have a problem with the federation config.
>
> alert: PrometheusRemoteWriteBehind
> expr: (max_over_time(prometheus_remote_storage_highest_timestamp_in_seconds[5m])
>       - on(job, instance) group_right()
>       max_over_time(prometheus_remote_storage_queue_highest_sent_timestamp_seconds[5m])) > 120
> for: 15m
> labels:
>   severity: critical
> annotations:
>   description: Prometheus {{$labels.namespace}}/{{$labels.pod}} remote
>     write is {{ printf "%.1f" $value }}s behind for {{ $labels.remote_name }}:{{ $labels.url }}.
>   summary: Prometheus remote write is behind.
> found duplicate series for the match group
> {instance="prometheus.slash-dir-poc-in.kuber.example.org:9090", job="federate"}
> on the left hand-side of the operation:
> [{cluster="poc", endpoint="web", exported_instance="x.x.x.x:9090",
>   exported_job="prometheus-k8s",
>   instance="prometheus.slash-dir-poc-in.kuber.example.org:9090",
>   job="federate", namespace="monitoring", pod="prometheus-k8s-1",
>   prometheus="monitoring/k8s", prometheus_replica="prometheus-k8s-0",
>   service="prometheus-k8s", team="MY-TEAM-NAME"},
>  {cluster="poc", endpoint="web", exported_instance="x.x.x.x:9090",
>   exported_job="prometheus-k8s",
>   instance="prometheus.slash-dir-poc-in.kuber.example.org:9090",
>   job="federate", namespace="monitoring", pod="prometheus-k8s-0",
>   prometheus="monitoring/k8s", prometheus_replica="prometheus-k8s-0",
>   service="prometheus-k8s", team="MY-TEAM-NAME"}];
> many-to-many matching not allowed: matching labels must be unique on one side
>
> and
>
> record: node:node_num_cpu:sum
> expr: count by(cluster, node) (sum by(node, cpu)
>       (node_cpu_seconds_total{job="node-exporter"} * on(namespace, pod)
>       group_left(node) node_namespace_pod:kube_pod_info:))
>
> found duplicate series for the match group {namespace="monitoring",
> pod="prometheus-k8s-0"} on the right hand-side of the operation:
> [{__name__="node_namespace_pod:kube_pod_info:", cluster="preprod",
>   instance="prometheus.ep-preprod-in.kuber.example.org:9090",
>   job="federate", namespace="monitoring",
>   node="4516e9ed-4917-4792-ad49-2158775dc07e", pod="prometheus-k8s-0",
>   prometheus="monitoring/k8s", prometheus_replica="prometheus-k8s-1",
>   team="MY-TEAM-NAME"},
>  {__name__="node_namespace_pod:kube_pod_info:", cluster="poc",
>   instance="prometheus.slash-dir-poc-in.kuber.example.org:9090",
>   job="federate", namespace="monitoring",
>   node="602efe91-2eb5-466f-9350-c4c6ce35119a", pod="prometheus-k8s-0",
>   prometheus="monitoring/k8s", prometheus_replica="prometheus-k8s-0",
>   team="MY-TEAM-NAME"}];
> many-to-many matching not allowed: matching labels must be unique on one side
>
> This alert also fires:
>
> name: PrometheusOutOfOrderTimestamps
> expr: rate(prometheus_target_scrapes_sample_out_of_order_total[5m]) > 0
>
> We may have a problem with federation: we have an external Prometheus
> which federates from 4x k8s cluster Prometheus.
>
> Config:
>
> - job_name: federate
>   scrape_interval: 15s
>   scrape_timeout: 15s
>   honor_labels: false
>   metrics_path: /federate
>   scheme: https
>   tls_config:
>     insecure_skip_verify: true
>   params:
>     match[]:
>       - '{__name__=~".+"}'
>   file_sd_configs:
>     - files:
>         - k8s.yml
>   relabel_configs:
>     - source_labels:
>         - __address__
>       regex: (.*)
>       replacement: ${1}:9090
>       target_label: __address__
>
> k8s.yml:
>
> - labels:
>     cluster: poc
>     team: MY-TEAM-NAME
>   targets:
>     - prometheus.slash-dir-poc-in.kuber.example.org
> - labels:
>     cluster: devtest
>     team: MY-TEAM-NAME
>   targets:
>     - prometheus.slash-dir-devtest-in.kuber.example.org
> - labels:
>     cluster: preprod
>     team: MY-TEAM-NAME
>   targets:
>     - prometheus.ep-preprod-in.kuber.example.org
> - labels:
>     cluster: prod
>     team: MY-TEAM-NAME
>   targets:
>     - prometheus.ep-prod-in.kuber.example.org
>
> kind regards
> Evelyn
>
> --
> You received this message because you are subscribed to the Google Groups
> "Prometheus Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/prometheus-users/8eb1d476-d2ce-9c99-1dfa-392b390c096c%40disroot.org.

--
To view this discussion on the web visit
https://groups.google.com/d/msgid/prometheus-users/CAMV%3D_gZD_2z2Y1Kx9LSUj-8hY259F%3DLYOyVP5xpv4N%3DKD3CFNg%40mail.gmail.com.

