Hey there.

What I'm trying to do is basically replace Icinga with Prometheus (or,
well, not really replace it, but move its job into Prometheus, which I
need anyway for other purposes).

So I'll have e.g. some metric that shows me the RAID status on
instances, and I want to get an alert when an HDD is broken.


I guess it's obvious that it could turn out badly if I don't get an
alert just because the metric data isn't there (for whatever reason).


In Icinga, this would have been simple:
The system knows about every host and every service it needs to check.
If there's no result anymore (like RAID OK or FAILED), e.g. because
the RAID CLI tool isn't installed, the check's status would at least
go to UNKNOWN.



How is this (or how can this be) handled in Prometheus?


I mean, I can of course check e.g.
   expr: up == 0
in an alerting rule.
But AFAIU this only tells me whether any scrape targets couldn't be
scraped (in the last run, i.e. within the last scrape interval), right?
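For reference, the complete rule I mean would look roughly like this (a minimal sketch; the group name, alert name, "for" duration and severity label are arbitrary choices of mine):

```yaml
groups:
  - name: scrape-health
    rules:
      # Fires for every target whose scrapes have kept failing for at least 5 minutes.
      - alert: TargetDown
        expr: up == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "{{ $labels.instance }} (job {{ $labels.job }}) cannot be scraped"
```

"promtool check rules <file>" should validate the syntax before the file is loaded.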

If each of my important checks had its own exporter, e.g. one exporter
just for the RAID status, then (AFAIU) this would already work and
notify me for sure, even if there's no result at all.
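One caveat, as far as I understand it: up only exists for targets Prometheus currently knows about, so if a dedicated RAID exporter drops out of service discovery altogether, "up == 0" never fires. A sketch that also covers that case, assuming a hypothetical job name raid_exporter:

```yaml
groups:
  - name: raid-exporter
    rules:
      # The dedicated exporter is known to Prometheus but cannot be scraped.
      - alert: RaidExporterDown
        expr: up{job="raid_exporter"} == 0
        for: 5m
      # No target of this job exists at all (e.g. it was removed from service discovery).
      # absent() then returns a single series {job="raid_exporter"} with value 1.
      - alert: RaidExporterMissing
        expr: absent(up{job="raid_exporter"})
        for: 5m
```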

But what if it's part of some larger exporter, e.g. the mdadm data in
the node exporter?

up wouldn't become 0 just because node_md_disks is missing from the
scraped metrics.
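absent() also works on an ordinary metric, but a plain absent(node_md_disks) would only fire once the metric is missing on every host. Per host it would have to be something like the following sketch, where the instance name raid-host-1:9100 is hypothetical; absent() copies the equality matchers (job, instance) into the labels of the alert:

```yaml
groups:
  - name: mdadm-presence
    rules:
      # Fires if this one host stops exposing node_md_disks, even though the
      # scrape itself (and therefore up) still succeeds.
      - alert: MdadmMetricsMissing
        expr: absent(node_md_disks{job="node", instance="raid-host-1:9100"})
        for: 15m
```

That means one rule per host, which obviously doesn't scale well.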


Even if I'd say it's the exporter's duty to make sure there is a result
even when it fails to read the status... what if some tool is needed
just to determine whether collecting that metric makes sense at all?
That would be typical for most hardware RAID controllers: you need the
respective RAID tool just to see whether any RAIDs are present.
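The node exporter does expose some self-monitoring metrics that seem relevant, as far as I know: node_scrape_collector_success{collector="..."} is 0 when a collector returned an error, node_textfile_scrape_error is 1 when a textfile-collector file couldn't be read, and node_textfile_mtime_seconds shows when each such file was last written. A sketch, assuming a hypothetical cron job that runs the RAID tool and writes its output to a textfile-collector file named megacli.prom at least hourly (whether a missing /proc/mdstat counts as a collector failure may depend on the node exporter version):

```yaml
groups:
  - name: collector-health
    rules:
      # The mdadm collector reported an error.
      - alert: MdadmCollectorFailed
        expr: node_scrape_collector_success{collector="mdadm"} == 0
        for: 15m
      # The node exporter could not open or read a textfile-collector file.
      - alert: TextfileCollectorError
        expr: node_textfile_scrape_error == 1
        for: 15m
      # The hypothetical megacli.prom file hasn't been rewritten for over an hour,
      # e.g. because the cron job or the RAID tool itself stopped working.
      - alert: MegacliTextfileStale
        expr: time() - node_textfile_mtime_seconds{file=~".*megacli.prom"} > 3600
        for: 15m
```

The last rule can't catch a file that was never written in the first place; that case still needs a presence check.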


So in principle I'd like a simple way to check, for a certain group of
hosts, whether a certain time series is present, so that I can set up
e.g. an alert that fires if any node with, say, a MegaCLI-based RAID
lacks megacli_some_metric.
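Roughly what I have in mind, as a sketch: it assumes a hypothetical target label raid="megacli" attached to the relevant node-exporter targets in the scrape config (e.g. via "labels:" under static_configs), and megacli_some_metric is just the placeholder name from above. "unless on(instance)" keeps only those up series for which no megacli_some_metric series with the same instance label exists:

```yaml
groups:
  - name: raid-presence
    rules:
      # Hosts marked as MegaCLI RAID hosts that are up (a down host is already
      # caught by an up == 0 alert) but expose no megacli_some_metric series.
      # Comparison binds tighter than unless, so "== 1" applies to up only.
      - alert: RaidMetricMissing
        expr: up{job="node", raid="megacli"} == 1 unless on(instance) megacli_some_metric
        for: 15m
        annotations:
          summary: "{{ $labels.instance }} should report MegaCLI RAID status but does not"
```

The appeal of this approach is that the list of hosts expected to have the metric lives in service discovery, somewhat like Icinga's host/service definitions, instead of in one rule per host.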

Or is there some other/better way this is done in practice?


Thanks,
Chris.

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/fbfd8ca1c671830a3ce428a54a60aebed2ea596e.camel%40gmail.com.