GitHub user summer-abacus added a comment to the discussion: Prometheus 
Exporter - VR Healthcheck results

It was easier to implement than expected. Here is what I did to get the metrics 
into Prometheus:

- Installed the Prometheus node exporter on the CloudStack management host
- Configured the textfile collector 
(<https://github.com/prometheus/node_exporter#textfile-collector>)
- Wrote a small script to generate the metrics
- Scheduled the script to run via a cron job
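
For the textfile collector step, node exporter has to be told which directory to scan for `*.prom` files, which is done with its `--collector.textfile.directory` flag. A minimal sketch, assuming the drop directory used in the cron entry in this post; the binary path and the helper name `node_exporter_command` are hypothetical, and the helper only prints the command line rather than starting the exporter:

```bash
#!/bin/bash
# Directory node exporter scans for *.prom files; this is the path the
# cron entry in this post writes to.
TEXTFILE_DIR="${TEXTFILE_DIR:-/opt/node_exporter/drop_zone}"

# Hypothetical install location; adjust to wherever the binary lives.
NODE_EXPORTER_BIN="${NODE_EXPORTER_BIN:-/usr/local/bin/node_exporter}"

# Print the command line instead of running it, so the sketch is safe to try.
node_exporter_command() {
  printf '%s --collector.textfile.directory=%s\n' "$NODE_EXPORTER_BIN" "$TEXTFILE_DIR"
}

node_exporter_command
```

In practice the printed flag would go into whatever starts node exporter on the host (for example a systemd unit).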

Script:

```bash
#!/bin/bash

/usr/bin/cmk set output json

JSON=$(/usr/bin/cmk list routers listall=true filter=name,hostname,healthchecksfailed)

COUNT=$(echo "$JSON" | jq '.count')

echo "# HELP router_count Total number of routers"
echo "# TYPE router_count gauge"
echo "router_count $COUNT"
echo ""
echo "# HELP router_healthchecks_failed Indicates if health checks failed for a router (0 = false, 1 = true)"
echo "# TYPE router_healthchecks_failed gauge"

echo "$JSON" | jq -c '.router[]' | while read -r router; do
  HEALTHCHECKS_FAILED=$(echo "$router" | jq '.healthchecksfailed')
  HOSTNAME=$(echo "$router" | jq -r '.hostname')
  NAME=$(echo "$router" | jq -r '.name')

  # Convert healthchecksfailed to 0 or 1
  if [ "$HEALTHCHECKS_FAILED" = "false" ]; then
    HEALTHCHECKS_FAILED=0
  else
    HEALTHCHECKS_FAILED=1
  fi

  echo "router_healthchecks_failed{hostname=\"$HOSTNAME\", name=\"$NAME\"} $HEALTHCHECKS_FAILED"
done
```
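
The loop in the script starts three `jq` processes per router. As a possible simplification, the same sample lines could be produced by a single `jq` call. This is only a sketch: it assumes the JSON shape the script already relies on (`.router[]` with `name`, `hostname` and `healthchecksfailed`), the function name `emit_router_metrics` is hypothetical, and, like the original, it treats anything other than `false` as a failure.

```bash
#!/bin/bash
# Hypothetical helper: reads the `cmk list routers` JSON on stdin and prints
# one router_healthchecks_failed sample line per router.
emit_router_metrics() {
  jq -r '
    .router[]?
    | "router_healthchecks_failed{hostname=\"\(.hostname)\",name=\"\(.name)\"} "
      + (if .healthchecksfailed == false then "0" else "1" end)
  '
}

# Usage inside the script, replacing the while loop:
#   echo "$JSON" | emit_router_metrics
```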

Cronjob:

```bash
* * * * * /opt/scripts/router_health_to_prom.sh > /opt/node_exporter/drop_zone/router.prom
```
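
One caveat with redirecting straight into the `.prom` file: node exporter may scrape it while it is still empty or half written. The textfile collector documentation suggests writing to a temporary file and renaming it into place. A minimal sketch of such a wrapper, where the function name `write_atomically` is hypothetical and the paths are the ones from the cron entry above:

```bash
#!/bin/bash
# Hypothetical wrapper: run a command, capture its stdout in a temp file
# next to the target, then rename it into place in one step.
write_atomically() {
  local target="$1"
  shift
  local tmp
  # The temp name ends in random characters, not ".prom", so the collector
  # ignores it until it is renamed.
  tmp=$(mktemp "${target}.XXXXXX") || return 1
  if "$@" > "$tmp"; then
    chmod 644 "$tmp"        # mktemp creates the file with mode 0600
    mv -f "$tmp" "$target"  # rename within one directory is atomic
  else
    rm -f "$tmp"            # keep the previous metrics if the command failed
    return 1
  fi
}

# Usage:
#   write_atomically /opt/node_exporter/drop_zone/router.prom \
#     /opt/scripts/router_health_to_prom.sh
```

If the script fails, the previous file stays in place, so a failed run does not make the metrics disappear.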

In the Prometheus metrics output, it looks like this:

```
# HELP router_count Total number of routers
# TYPE router_count gauge
router_count 5
# HELP router_healthchecks_failed Indicates if health checks failed for a router (0 = false, 1 = true)
# TYPE router_healthchecks_failed gauge
router_healthchecks_failed{hostname="srv1-213",name="r-1063-VM"} 0
router_healthchecks_failed{hostname="srv1-213",name="r-981-VM"} 0
router_healthchecks_failed{hostname="srv1-214",name="r-1335-VM"} 0
router_healthchecks_failed{hostname="srv1-214",name="r-8-VM"} 0
router_healthchecks_failed{hostname="srv1-214",name="r-982-VM"} 0
```

This allows me to use Alertmanager to trigger notifications via Teams, mail, etc.


GitHub link: 
https://github.com/apache/cloudstack/discussions/12624#discussioncomment-15765807
