Thanks Brian.
>
> Can you give some more specific examples? What metric are you joining
> with - perhaps node_uname_info?
>
- alert: HighCpuLoadCrit
  expr: >
    (node_load15 > (2 * count without (cpu, mode) (node_cpu_seconds_total{mode="system"})))
    * on(instance) group_left(nodename) node_uname_info
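
For context: node_uname_info is an info-style metric whose value is always 1, so multiplying by it only copies labels onto the left-hand side. A sample of it looks roughly like this (the instance and nodename values here are made up):

  node_uname_info{instance="10.0.0.5:9100", job="node", nodename="web-01", sysname="Linux", release="5.15.0-generic", machine="x86_64"}  1

After the * on(instance) group_left(nodename) join, every firing series carries that nodename label.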
> Note that the "up" metric will still exist (with a value of 0) when a
> scrape fails - this means:
> (a) you can join on it, and
>
The "up" metric will exist, but if the node exporter itself is down, it won't
be exposing node_uname_info at that time, right? So I won't get the
"nodename" label from node_uname_info.
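
One possible workaround (just a sketch on my side; the job name "node" and the 1h lookback window are assumptions) is to join against a range query, so the nodename label from the last successful scrape is still available while the exporter is down:

  - alert: NodeExporterDown
    expr: >
      (up{job="node"} == 0)
      * on(instance) group_left(nodename)
      max_over_time(node_uname_info[1h])
    for: 5m

The product is 0 (0 * 1), but the rule still fires because the expression returns a sample, and the alert keeps the nodename label for up to an hour after the exporter stops responding.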
> (b) you can alert on this condition, i.e. scrape failed / node_exporter is
> down. This is a different condition than "blackbox_exporter says
> host/service is down, but node_exporter is still being scraped". Hence the
> alerting rule for (up == 0) can be written to avoid the join. There is
> actually a benefit here: you'll only get one alert when the host goes down,
> instead of lots.
I am using up == 0 on its own and also as an inhibition rule, but (up == 0)
by itself won't give me the hostname. My main aim is to get the hostname for
every alert. But when the server is actually down, the node exporter will
also be down, so again I won't get the nodename label.

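To make the inhibition part concrete, the setup I mean looks roughly like this (job name, severity values and the 5m delay are illustrative, not my exact config):

  # Alerting rule: a single alert per instance whose scrape fails.
  - alert: InstanceDown
    expr: up{job="node"} == 0
    for: 5m
    labels:
      severity: critical

  # Alertmanager: while InstanceDown fires for an instance, suppress
  # lower-severity alerts for that same instance.
  inhibit_rules:
    - source_match:
        alertname: InstanceDown
      target_match:
        severity: warning
      equal: ['instance']
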
Please correct me if I am wrong anywhere.