[prometheus-users] Worked example for "using-time-series-as-alert-thresholds" blog posts?

William Hargrove Thu, 13 Jan 2022 12:30:44 -0800

Is anyone able to share some worked examples based on Brian's blog post 
(https://www.robustperception.io/using-time-series-as-alert-thresholds), 
specifically for setting different disk space thresholds and 
alerting on them?


I'm struggling to apply this article - perhaps I am finding the concept a 
little abstract.

I have tried to construct the example shown below, but it doesn't 
work: the recording rules fail with the error "vector contains metrics 
with the same labelset after applying rule labels".

I have three example systems. One should be monitored by the "default" 
threshold in the alert rule definition (instance: pi4-1.home:9100); the 
other two should have their thresholds set via recording rules (instances: 
pi4-2.home:9100 and pi4-3.home:9100). I want to set thresholds per 
instance. Am I correct in thinking that I need a rule per instance, as I am 
setting the override on the instance label?

Alert:

    - alert: HostOutOfDiskSpace
      expr: |
        # Alert on per instance thresholds, with a default
        (node_filesystem_avail_bytes{mountpoint="/"} * 100) / node_filesystem_size_bytes{mountpoint="/"}
        < on (instance) group_left()
        (
            node_filesystem_threshold
          or on(instance)
            count by (instance)(node_filesystem_avail_bytes{mountpoint="/"} * 100) / node_filesystem_size_bytes{mountpoint="/"} * 0 + 70
        )
      for: 5s
      labels:
        severity: critical
        notification: slack
      annotations:
        summary: "{{ $labels.alertname }} on {{ $labels.instance }}"
        description: "Disk is almost full {{ humanize $value }}% on {{ $labels.mountpoint }}"

Recording rules:

groups:
- name: example
  rules:
  - record: node_filesystem_threshold
    expr: (node_filesystem_avail_bytes{mountpoint="/"} * 100) / node_filesystem_size_bytes{mountpoint="/"} < 90
    labels:
      instance: pi4-2.home:9100


  - record: node_filesystem_threshold
    expr: (node_filesystem_avail_bytes{mountpoint="/"} * 100) / node_filesystem_size_bytes{mountpoint="/"} < 90
    labels:
      instance: pi4-3.home:9100

This gives the error below:

[image: recording_rule_capture.PNG]
If anyone is able to help me build out or correct this example, I would be 
most grateful.
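
My current guess is that each rule's expression returns one series per 
instance, and that overriding the instance label then leaves several series 
with identical labels, which would explain the error. Below is a minimal 
sketch of what I think the recording rules may need to look like instead, 
assuming each threshold is meant to be a constant rather than the disk usage 
expression itself. The values 90 and 80 are placeholders I picked for 
illustration, and I have not confirmed that this matches the blog post.

groups:
- name: example
  rules:
  # Assumed constant threshold (percent available) for pi4-2; 90 is a placeholder.
  - record: node_filesystem_threshold
    expr: vector(90)
    labels:
      instance: pi4-2.home:9100

  # Assumed constant threshold for pi4-3; 80 is a placeholder.
  - record: node_filesystem_threshold
    expr: vector(80)
    labels:
      instance: pi4-3.home:9100

If that is right, each rule would produce exactly one series carrying only 
the instance label, so the label override could no longer create duplicates.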

Thanks.
