We use a Prometheus alert (and node-exporter) to check whether we are
running out of memory on a node.
Issue: In many cases I get an alert with a $value that is below the
threshold value in the expression. The rule is:
> alert: GettingOutOfMemory
> expr: max(sum by(instance) ((((node_memory_MemTotal_bytes)
>   - (node_memory_MemFree_bytes + node_memory_Buffers_bytes + node_memory_Cached_bytes))
>   / (node_memory_MemTotal_bytes)) * 100)) >= 90
> for: 5m
> labels:
>   severity: warning
> annotations:
>   description: Docker Swarm node {{ $labels.instance }} memory usage is at {{ humanize $value }}%.
>   summary: Memory is getting low for Swarm node '{{ $labels.node_name }}'
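To make clear what I expect the expression to compute, here is a minimal Python sketch of the same arithmetic. The instance names and byte values are made up for illustration and are not taken from our nodes; the sketch only mirrors the formula, not how Prometheus evaluates the rule or fills in $value for notifications. Note that the outer max() has no by clause, so as far as I understand it returns a single value without the instance label.

```python
# Sketch of the arithmetic in the alert expression, using hypothetical samples.
GIB = 2**30

# Hypothetical node-exporter readings in bytes, keyed by instance (assumed names).
samples = {
    "node-a:9100": {"MemTotal": 8 * GIB, "MemFree": GIB // 4,
                    "Buffers": GIB // 8, "Cached": GIB // 8},
    "node-b:9100": {"MemTotal": 16 * GIB, "MemFree": 4 * GIB,
                    "Buffers": 1 * GIB, "Cached": 1 * GIB},
}


def used_percent(m):
    """(MemTotal - (MemFree + Buffers + Cached)) / MemTotal * 100."""
    reclaimable = m["MemFree"] + m["Buffers"] + m["Cached"]
    return (m["MemTotal"] - reclaimable) / m["MemTotal"] * 100


# Equivalent of the inner "sum by(instance) (...)": one value per instance.
per_instance = {inst: used_percent(m) for inst, m in samples.items()}

# Equivalent of the outer "max(...)": a single value, instance label dropped.
overall = max(per_instance.values())

# Equivalent of the ">= 90" comparison in the rule.
fires = overall >= 90

print(per_instance, overall, fires)
```

With these made-up numbers, the rule condition would be true because of node-a alone, while node-b sits far below the threshold.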
I get messages saying that memory usage is at e.g. 63%, so that is the
value of $value. This is clearly below the 90% threshold.
Why do I get this alert even though the $value is below the threshold?
How can I fix this Prometheus alert rule so that I only get alerts
when the $value is at or above the threshold?