> Is there any way to keep the original external labels in the final alert,
> or are they lost in the expression evaluation, and then replaced by the
> labels from the Prometheus that evaluates and sends the alert?

I guess that is probably what is happening. Your best bet is to either
change the names of the external labels of the central (utils) Prometheus,
or to rewrite the labels of the child clusters at scrape time using
metric_relabel_configs on the federation job. Something like this:

    metric_relabel_configs:
    - {source_labels: [cluster], target_label: child_cluster, action: replace}

On Fri, Jun 12, 2020 at 8:34 PM dgarciad <[email protected]> wrote:

> Hello.
>
> First of all, English is not my native language, so excuse me if I cannot
> explain myself well enough.
>
> I am facing the following situation:
>
> - Several Prometheus servers deployed in clusters dedicated to development
>   environments (dev, itg, pre, pro), federated against a central one
>   (utils). All of them in HA configuration.
> - Each of the Prometheus servers has external labels configured (dev, itg,
>   pre, pro, utils), for example:
>
>       externalLabels:
>         cluster: dev-gke-cluster
>         environment: dev
>
> - Only 1 Alertmanager deployed alongside the central Prometheus, in HA
>   configuration.
> - honor_labels is enabled for the federated targets.
>
> Prometheus was deployed with the prometheus-operator Helm chart.
>
> The problem is that, with some of the prometheus-operator default alert
> rules, I am not able to tell where an alert comes from, because the
> external labels get overwritten.
> For example, with the KubePodNotReady alert:
>
> In the Prometheus alerts tab:
>
> Annotations
>
> message
> Pod demo-apps-devops-back/fwk-springboot-service-example-969897fd4-6c6gd
> has been in a non-ready state for longer than 15 minutes.
> runbook_url
> https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubepodnotready
>
> alertname=KubePodNotReady namespace=elastic pod=filebeat-filebeat-m8rgl
> severity=critical FIRING 2020-06-12T06:03:39.737453697Z 1e+00
>
> In Alertmanager:
>
> 06:18:09, 2020-06-12 (UTC) Info
> Source
> <http://prometheus.gcp.mercadona.com/graph?g0.expr=sum+by%28namespace%2C+pod%29+%28max+by%28namespace%2C+pod%29+%28kube_pod_status_phase%7Bjob%3D%22kube-state-metrics%22%2Cnamespace%3D~%22.%2A%22%2Cphase%3D~%22Pending%7CUnknown%22%7D%29+%2A+on%28namespace%2C+pod%29+group_left%28owner_kind%29+max+by%28namespace%2C+pod%2C+owner_kind%29+%28kube_pod_owner%7Bowner_kind%21%3D%22Job%22%7D%29%29+%3E+0&g0.tab=1>
> Silence
> <https://alertmanager.gcp.mercadona.com/#/silences/new?filter=%7Balertname%3D%22KubePodNotReady%22%2C%20cloud%3D%22gcp%22%2C%20cluster%3D%22mdona-cloud-utils-gke-cluster%22%2C%20environment%3D%22utils%22%2C%20namespace%3D%22demo-apps-devops-back%22%2C%20pod%3D%22fwk-springboot-service-example-969897fd4-6c6gd%22%2C%20prometheus%3D%22prometheus%2Fprometheus-prometheus-oper-prometheus%22%2C%20region%3D%22europe-west%22%2C%20severity%3D%22critical%22%7D>
>
> cloud="gcp"
> cluster="utils-gke-cluster"
> environment="utils"
> namespace="demo-apps-devops-back"
> pod="fwk-springboot-service-example-969897fd4-6c6gd"
> prometheus="prometheus/prometheus-prometheus-oper-prometheus"
> region="europe-west"
> severity="critical"
>
> This alert refers to a pod and namespace that do not exist in the "utils"
> environment, but in the "dev" one, even though we defined the environment
> external label. All the labels here belong to the "utils" Prometheus,
> where all the metrics are gathered and from where the alerts are
> generated.
> We have found that this happens whenever the alert rule expression has
> any type of aggregation, as in the example:
>
>     sum by(namespace, pod) (
>       max by(namespace, pod) (
>         kube_pod_status_phase{job="kube-state-metrics",namespace=~".*",phase=~"Pending|Unknown"}
>       )
>       * on(namespace, pod) group_left(owner_kind)
>       max by(namespace, pod, owner_kind) (kube_pod_owner{owner_kind!="Job"})
>     ) > 0
>
> <https://prometheus.gcp.mercadona.com/new/graph?g0.expr=sum%20by(namespace%2C%20pod)%20(max%20by(namespace%2C%20pod)%20(kube_pod_status_phase%7Bjob%3D%22kube-state-metrics%22%2Cnamespace%3D~%22.*%22%2Cphase%3D~%22Pending%7CUnknown%22%7D)%20*%20on(namespace%2C%20pod)%20group_left(owner_kind)%20max%20by(namespace%2C%20pod%2C%20owner_kind)%20(kube_pod_owner%7Bowner_kind!%3D%22Job%22%7D))%20%3E%200&g0.tab=1&g0.stacked=0&g0.range_input=1h>
>
> This is a problem because there are namespaces with the same name in
> different clusters; and in other cases there is no way to be sure of the
> pod's location except by looking for it manually.
>
> Is there any way to keep the original external labels in the final alert,
> or are they lost in the expression evaluation and then replaced by the
> labels from the Prometheus that evaluates and sends the alert?
>
> Thanks in advance for your assistance.
>
> --
> You received this message because you are subscribed to the Google Groups
> "Prometheus Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/prometheus-users/d98ca7a5-114d-41b7-b14d-482654322552o%40googlegroups.com

--
You received this message because you are subscribed to the Google Groups
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/CAHo%3DpzBW0zPxPHyPWcPuyv6pM2EfHTxPU1%3DQtpvh6db0UKuTKQ%40mail.gmail.com.
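Spelled out in full, the relabel approach suggested in the reply might look like the following scrape job on the central (utils) Prometheus. This is only a sketch: the job name, target address, match[] selector, and the extra labeldrop rule are assumptions for illustration, not taken from the thread.

```yaml
scrape_configs:
  # Hypothetical federation job on the central (utils) Prometheus;
  # names and addresses are illustrative.
  - job_name: 'federate-dev'
    honor_labels: true
    metrics_path: /federate
    params:
      'match[]': ['{job=~".+"}']
    static_configs:
      - targets: ['dev-prometheus:9090']
    metric_relabel_configs:
      # Copy the child's "cluster" label into "child_cluster" so it no
      # longer collides with the central Prometheus's external label.
      - source_labels: [cluster]
        target_label: child_cluster
        action: replace
      # Optionally drop the original label afterwards (an assumption;
      # the reply only mentions renaming).
      - regex: cluster
        action: labeldrop
```

The same pair of rules would apply to the environment label, and to each child-cluster federation job.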
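The quoted message observes that the child labels are lost whenever the rule expression aggregates. As a purely illustrative sketch (this workaround is not proposed in the thread, and the "* on(namespace, pod) ..." join half of the original rule is omitted for brevity), keeping the federated cluster and environment labels in the grouping clause would preserve them, since Prometheus only attaches an external label to an outgoing alert when a label of that name is not already present:

```promql
# Before: only namespace and pod survive the aggregation, so the alert
# later receives the evaluating (utils) Prometheus's external labels.
sum by(namespace, pod) (
  kube_pod_status_phase{job="kube-state-metrics",phase=~"Pending|Unknown"}
) > 0

# After: the child's cluster/environment labels (kept on the federated
# series by honor_labels) survive evaluation and are not overridden.
sum by(cluster, environment, namespace, pod) (
  kube_pod_status_phase{job="kube-state-metrics",phase=~"Pending|Unknown"}
) > 0
```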

