Hi,

https://github.com/cloudflare/pint is a small tool we use at Cloudflare to better manage our ever-growing collection of recording and alerting rules. The main motivation for it was to help with pull requests that add or edit rule files, where we often need to check:

* how many time series a new recording rule would add
* how many times a new alert would trigger, based on historical metrics
* whether all time series used in a rule are present in our Prometheus instances (we have a non-trivial topology)

That's on top of the simple conventions we have: for example, each alert should have a set of well-known labels and annotations, like severity or a link to a Grafana dashboard and a runbook. But even those conventions, while simple in themselves, only apply to "production" alerts, not to "test" alerts that are present in the config but not yet paging anyone.

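To make the convention check concrete, here is a rough Python sketch of the idea. It is not pint's code or configuration format: the annotation names (`dashboard`, `runbook`), the `env: test` marker for non-paging alerts, and the `missing_fields` helper are all assumptions made up for illustration.

```python
# Hypothetical convention check; this is NOT pint's API or config format.
REQUIRED_LABELS = {"severity"}
# Assumed annotation names for the Grafana dashboard link and the runbook.
REQUIRED_ANNOTATIONS = {"dashboard", "runbook"}


def missing_fields(rule):
    """Return the required labels/annotations missing from one alerting rule."""
    labels = rule.get("labels", {})
    # Assumption: "test" alerts carry an `env: test` label and are exempt.
    if labels.get("env") == "test":
        return []
    annotations = rule.get("annotations", {})
    missing = [f"label:{name}" for name in REQUIRED_LABELS - set(labels)]
    missing += [
        f"annotation:{name}" for name in REQUIRED_ANNOTATIONS - set(annotations)
    ]
    return sorted(missing)


# Two made-up rules: a production alert missing its runbook, and a test alert.
prod_rule = {
    "alert": "HighErrorRate",
    "expr": "rate(http_errors_total[5m]) > 1",
    "labels": {"severity": "page"},
    "annotations": {"dashboard": "https://grafana.example.com/d/abc"},
}
test_rule = {
    "alert": "ExperimentalAlert",
    "expr": "up == 0",
    "labels": {"env": "test"},
}

for rule in (prod_rule, test_rule):
    print(rule["alert"], missing_fields(rule))
```

The point of the sketch is only that the "simple" rule needs a scoping condition: the same check has to be skipped for alerts that are not yet paging anyone.
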
While the code is fairly fresh, it's been used internally for a while with good results, so I hope this will be useful for others.

