Hi,

pint (https://github.com/cloudflare/pint) is a small tool we use at
Cloudflare to help manage our ever-growing collection of Prometheus
recording and alerting rules.
The main motivation for it was to help with pull requests that add or edit
rule files, where we often need to check:
* how many time series a new recording rule would add
* how many times a new alert would have fired, based on historical metrics
* whether all time series used in a rule are present in our Prometheus
  instances (we have a non-trivial topology)

On top of that we have some simple conventions. For example, each alert
should have a set of well-known labels and annotations, such as severity,
a link to a Grafana dashboard and a link to a runbook. But even those
conventions, while simple in themselves, only apply to "production" alerts,
not to "test" alerts that are present in the config but not yet paging anyone.
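
To illustrate the kind of convention check described above, here is a
minimal Python sketch. It is not pint's implementation or configuration.
The required names (severity, dashboard, runbook) and the stage="test"
label used to tell test alerts apart are hypothetical assumptions, and
rules are assumed to have already been parsed from YAML into plain dicts.

```python
from typing import Dict, List

# Hypothetical conventions for illustration only; not pint's actual config.
REQUIRED_LABELS = ["severity"]
REQUIRED_ANNOTATIONS = ["dashboard", "runbook"]


def is_production(rule: Dict) -> bool:
    # Assumption: test alerts carry a hypothetical stage="test" label.
    return rule.get("labels", {}).get("stage") != "test"


def check_alert(rule: Dict) -> List[str]:
    """Return convention violations for one rule; an empty list means it passes."""
    if "alert" not in rule or not is_production(rule):
        # Recording rules and test alerts are exempt from these conventions.
        return []
    labels = rule.get("labels", {})
    annotations = rule.get("annotations", {})
    problems = []
    for name in REQUIRED_LABELS:
        if name not in labels:
            problems.append(f"{rule['alert']}: missing label {name!r}")
    for name in REQUIRED_ANNOTATIONS:
        if name not in annotations:
            problems.append(f"{rule['alert']}: missing annotation {name!r}")
    return problems


if __name__ == "__main__":
    rules = [
        {"record": "job:up:sum", "expr": "sum by (job) (up)"},
        {"alert": "NewAlert", "expr": "up == 0", "labels": {"stage": "test"}},
        {"alert": "JobDown", "expr": "up == 0", "labels": {"severity": "page"},
         "annotations": {"runbook": "https://runbooks.example.com/JobDown"}},
    ]
    for rule in rules:
        for problem in check_alert(rule):
            print(problem)
```

Only the production "JobDown" alert is reported here (missing its dashboard
annotation). The recording rule and the test alert are skipped, because the
conventions do not apply to them.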

While the code is fairly new, it has been used internally for a while with
good results, so I hope it will be useful to others too.

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/895de3f8-35c0-4fee-9807-9225eb1aa330n%40googlegroups.com.