I've recently started monitoring a large fleet of hardware devices using a combination of blackbox, snmp, node, and json exporters. I started out using the *up* metric, but I noticed that with blackbox ping, *up* is *always* 1 even when the device is offline, because *up* only reflects whether Prometheus could scrape the blackbox exporter itself, not whether the probe succeeded. So I plan to switch to *probe_success* instead. But I'm unsure what this implies when it is mixed with other exporters. For example, json-exporter does not expose a *probe_success* metric; instead, it returns *up*=0 when the target times out.
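To make the per-exporter rule concrete, here is a rough Python sketch (not tied to any Prometheus client library) of the mapping I'm considering: use *probe_success* for blackbox targets and *up* for everything else. The `Sample` type, the `target_health` function, and the job names `blackbox` and `json` are all hypothetical; in practice the samples would come from a Prometheus query.

```python
from dataclasses import dataclass

# Hypothetical: names of the scrape jobs that use blackbox_exporter.
BLACKBOX_JOBS = {"blackbox"}


@dataclass(frozen=True)
class Sample:
    metric: str    # metric name, e.g. "up" or "probe_success"
    job: str       # value of the "job" label
    instance: str  # value of the "instance" label
    value: float   # sample value


def target_health(samples: list[Sample]) -> dict[tuple[str, str], bool]:
    """Return one healthy/unhealthy flag per (job, instance).

    Blackbox targets use probe_success, because their `up` only says
    whether the exporter itself answered the scrape. All other targets
    use `up`, which drops to 0 when the target cannot be scraped.
    """
    health: dict[tuple[str, str], bool] = {}
    for s in samples:
        wanted = "probe_success" if s.job in BLACKBOX_JOBS else "up"
        if s.metric == wanted:
            health[(s.job, s.instance)] = s.value == 1.0
    return health


if __name__ == "__main__":
    samples = [
        Sample("up", "blackbox", "10.0.0.1", 1.0),             # exporter reachable
        Sample("probe_success", "blackbox", "10.0.0.1", 0.0),  # ping failed
        Sample("up", "json", "http://dev1/api", 0.0),          # REST target timed out
    ]
    print(target_health(samples))
```

The point of the sketch is that there is no single metric to alert on: each exporter needs its own "is it healthy" rule, and only after that mapping does the fleet look uniform.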
My goal is to build a Grafana dashboard and alerts that monitor a mix of blackbox and other exporters. For context, when certain devices crash, they remain pingable but report their failed state via a REST API, so I'm pointing the json-exporter at an HTTP endpoint on the device. I'm struggling to come up with a unified way of monitoring all these different device types.