We need to generate an alert, using Prometheus snmp_exporter metrics, when
less than 80% of the nodes on our active BIG-IP F5 load balancer are up. I
think we have the percentage of up hosts, but am not sure how to ensure that
we only alert on the active F5 load balancer node. In the snmp_exporter
metrics, each F5 node has a distinct value of the instance label.
Here are the two metrics in question:
- Host up: ltmPoolMemberMonitorState = 4
- F5 node active: sysCmFailoverStatusId = 4
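Before combining the two metrics, it can help to check what the active-node filter returns on its own. A minimal sketch, assuming snmp_exporter exposes sysCmFailoverStatusId as a single series per F5 instance and that 4 means active, as stated above:

```promql
# A comparison without the "bool" modifier acts as a filter: only series
# whose value equals 4 are kept, so this should return one series whose
# instance label names the currently active F5.
sysCmFailoverStatusId == 4
```

If this returns no series, or more than one, the failover state needs attention before any match on instance can work.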
The query below counts the distinct ltmPoolMemberNodeName values containing
"prod" that are up, and divides that by the total number of distinct
ltmPoolMemberNodeName values containing "prod". We then appended
or on() vector(0) so that the query returns 0 when all hosts are down (i.e.
ltmPoolMemberMonitorState is not 4 for any of them); without it, the
numerator is empty and the query returns no result at all. See below:
    count(count by (ltmPoolMemberNodeName) (ltmPoolMemberMonitorState{ltmPoolMemberNodeName=~".*prod.*"} == 4))
      /
    count(count by (ltmPoolMemberNodeName) (ltmPoolMemberMonitorState{ltmPoolMemberNodeName=~".*prod.*"}))
      or on() vector(0)
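The query above aggregates across every scraped instance, so the standby F5's view of the pool members is merged with the active one's. A sketch that keeps instance in the grouping instead, producing one ratio per F5 with the same distinct-node counting:

```promql
# Count distinct up prod nodes per F5 instance, divided by the count of
# all distinct prod nodes on that same instance. Both sides carry only
# the instance label, so the division matches one-to-one on it.
count by (instance) (
  count by (instance, ltmPoolMemberNodeName) (
    ltmPoolMemberMonitorState{ltmPoolMemberNodeName=~".*prod.*"} == 4
  )
)
/
count by (instance) (
  count by (instance, ltmPoolMemberNodeName) (
    ltmPoolMemberMonitorState{ltmPoolMemberNodeName=~".*prod.*"}
  )
)
```

An instance whose prod members are all down has no numerator series, so it drops out of this result rather than showing 0.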
Now we need to ensure that the calculation only uses metrics from the active
F5 node instance (i.e. the instance for which sysCmFailoverStatusId equals
4). I tried with (instance) and on (instance) to match the metrics on the
same F5 node instance label, but haven't had any luck. Any recommendations
would be greatly appreciated.
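One common way to express this in PromQL is the set operator and with on(instance): it keeps a left-hand series only if some right-hand series has the same instance label, and it preserves the left-hand value. Unlike an arithmetic match such as /, it needs no group_left, because set operators match many-to-many. A sketch of the alert expression, assuming both metrics come from the same snmp_exporter target and therefore carry identical instance labels:

```promql
# Filter both numerator and denominator to the active F5 (the instance
# whose sysCmFailoverStatusId is 4), then count distinct prod node names.
# The outer parentheses matter: "<" binds more tightly than "or", so
# without them the threshold would apply only to vector(0).
(
  count(
    count by (ltmPoolMemberNodeName) (
      (ltmPoolMemberMonitorState{ltmPoolMemberNodeName=~".*prod.*"} == 4)
        and on(instance) (sysCmFailoverStatusId == 4)
    )
  )
  /
  count(
    count by (ltmPoolMemberNodeName) (
      ltmPoolMemberMonitorState{ltmPoolMemberNodeName=~".*prod.*"}
        and on(instance) (sysCmFailoverStatusId == 4)
    )
  )
  or on() vector(0)
) < 0.8
```

One side effect of the vector(0) fallback is that this expression also fires when no instance reports itself as active (for example during a failover, or if sysCmFailoverStatusId fails to scrape), because the filtered counts are then empty as well. A separate alert on absent(sysCmFailoverStatusId == 4) is one way to tell those two situations apart.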