Hello. First of all, English is not my native language, so excuse me if I cannot explain myself well enough.
I am facing the following situation:

- Several Prometheus servers deployed in clusters dedicated to development environments (dev, itg, pre, pro), federated against a central one (utils). All of them in HA configuration.
- Each Prometheus has external labels configured (dev, itg, pre, pro, utils), for example:

      externalLabels:
        cluster: dev-gke-cluster
        environment: dev

- Only 1 Alertmanager, deployed alongside the central Prometheus, in HA configuration.
- honor_labels is enabled for the federated targets.

Prometheus was deployed with the prometheus-operator Helm chart.

The problem is that with some of the prometheus-operator default alert rules I cannot tell where an alert comes from, because the external labels get overwritten. For example, with the KubePodNotReady alert:

In the Prometheus alerts tab:

    Annotations:
      message: Pod demo-apps-devops-back/fwk-springboot-service-example-969897fd4-6c6gd has been in a non-ready state for longer than 15 minutes.
      runbook_url: https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubepodnotready

    alertname=KubePodNotReady namespace=elastic pod=filebeat-filebeat-m8rgl severity=critical
    FIRING since 2020-06-12T06:03:39.737453697Z, value 1e+00

In Alertmanager (06:18:09, 2020-06-12 UTC):

    Source: <http://prometheus.gcp.mercadona.com/graph?g0.expr=sum+by%28namespace%2C+pod%29+%28max+by%28namespace%2C+pod%29+%28kube_pod_status_phase%7Bjob%3D%22kube-state-metrics%22%2Cnamespace%3D~%22.%2A%22%2Cphase%3D~%22Pending%7CUnknown%22%7D%29+%2A+on%28namespace%2C+pod%29+group_left%28owner_kind%29+max+by%28namespace%2C+pod%2C+owner_kind%29+%28kube_pod_owner%7Bowner_kind%21%3D%22Job%22%7D%29%29+%3E+0&g0.tab=1>
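For context, the federation job on the central "utils" Prometheus looks roughly like this (a sketch only; the job name, match selector, and target addresses are placeholders, not our exact values):

```yaml
# Sketch of a federation scrape job on the central Prometheus.
# honor_labels: true preserves the labels already present on the
# federated series, including the external labels (cluster,
# environment) that each downstream Prometheus attached.
scrape_configs:
  - job_name: federate                     # assumed name
    honor_labels: true
    metrics_path: /federate
    params:
      'match[]':
        - '{job="kube-state-metrics"}'     # assumed selector
    static_configs:
      - targets:
          - prometheus-dev.example:9090    # assumed addresses
          - prometheus-itg.example:9090
```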
    Silence: <https://alertmanager.gcp.mercadona.com/#/silences/new?filter=%7Balertname%3D%22KubePodNotReady%22%2C%20cloud%3D%22gcp%22%2C%20cluster%3D%22mdona-cloud-utils-gke-cluster%22%2C%20environment%3D%22utils%22%2C%20namespace%3D%22demo-apps-devops-back%22%2C%20pod%3D%22fwk-springboot-service-example-969897fd4-6c6gd%22%2C%20prometheus%3D%22prometheus%2Fprometheus-prometheus-oper-prometheus%22%2C%20region%3D%22europe-west%22%2C%20severity%3D%22critical%22%7D>

    cloud="gcp"
    cluster="utils-gke-cluster"
    environment="utils"
    namespace="demo-apps-devops-back"
    pod="fwk-springboot-service-example-969897fd4-6c6gd"
    prometheus="prometheus/prometheus-prometheus-oper-prometheus"
    region="europe-west"
    severity="critical"

This alert refers to a pod and namespace that do not exist in the "utils" environment but in the "dev" one, even though we defined the environment external label. All the labels here belong to the "utils" Prometheus, where all the metrics are gathered and from where the alerts are generated.
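If my understanding is correct, the aggregation is what loses them: sum by(namespace, pod) keeps only those two labels, and when the alert fires the evaluating ("utils") Prometheus attaches its own external labels to any label not already present. Purely to illustrate the mechanism (these are simplified, hypothetical expressions, not the shipped rule):

```promql
# Shipped style: only namespace and pod survive the aggregation, so
# the federated cluster/environment labels are dropped before the
# alert is created, and "utils" fills them back in.
sum by(namespace, pod) (
  kube_pod_status_phase{phase=~"Pending|Unknown"}
) > 0

# Same aggregation but also grouped by the external labels, so they
# survive evaluation and are not replaced by the local ones.
sum by(cluster, environment, namespace, pod) (
  kube_pod_status_phase{phase=~"Pending|Unknown"}
) > 0
```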
We have found that this happens whenever the alert rule expression contains any kind of aggregation, as in this example:

    sum by(namespace, pod) (
      max by(namespace, pod) (
        kube_pod_status_phase{job="kube-state-metrics",namespace=~".*",phase=~"Pending|Unknown"}
      )
      * on(namespace, pod) group_left(owner_kind)
      max by(namespace, pod, owner_kind) (
        kube_pod_owner{owner_kind!="Job"}
      )
    ) > 0

<https://prometheus.gcp.mercadona.com/new/graph?g0.expr=sum%20by(namespace%2C%20pod)%20(max%20by(namespace%2C%20pod)%20(kube_pod_status_phase%7Bjob%3D%22kube-state-metrics%22%2Cnamespace%3D~%22.*%22%2Cphase%3D~%22Pending%7CUnknown%22%7D)%20*%20on(namespace%2C%20pod)%20group_left(owner_kind)%20max%20by(namespace%2C%20pod%2C%20owner_kind)%20(kube_pod_owner%7Bowner_kind!%3D%22Job%22%7D))%20%3E%200&g0.tab=1&g0.stacked=0&g0.range_input=1h>

This is a problem because there are namespaces with the same name in different clusters, and in other cases there is no way to be sure of the pod's location except by looking for it manually.

Is there any way to keep the original external labels in the final alert? Or are they lost during expression evaluation and then replaced by the labels of the Prometheus that evaluates and sends the alert?

Thanks in advance for your assistance.

--
You received this message because you are subscribed to the Google Groups "Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/d98ca7a5-114d-41b7-b14d-482654322552o%40googlegroups.com.

