Team,

I discussed this idea briefly at the last dev summit, and we agreed I'd 
start a thread on it here. I'm proposing a mechanism to evaluate rules 
concurrently, under certain conditions, to improve rule reliability.

Rule groups execute concurrently, but the rules within a group execute 
sequentially; this is because rules can use the output of a preceding rule 
as their input. However, if there is no detectable relationship between 
rules then there is no reason to run them sequentially.
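
To make "no detectable relationship" concrete, here is a rough sketch in Go. 
It is an illustration only, not the actual implementation: the Rule type, 
identRe and independent() are hypothetical names, and it matches 
identifier-like tokens instead of parsing PromQL, so it errs on the side of 
reporting a relationship. It also ignores cases such as alerting rules 
feeding the ALERTS series or selectors without a metric name.

```go
package main

import (
	"fmt"
	"regexp"
)

// Rule is a hypothetical, simplified stand-in for a recording or alerting rule.
type Rule struct {
	Record string // series name written by a recording rule; empty for alerts
	Expr   string // PromQL expression
}

// identRe matches tokens that look like metric names. It also matches
// function names and keywords, which only makes the check more conservative.
var identRe = regexp.MustCompile(`[a-zA-Z_:][a-zA-Z0-9_:]*`)

// independent reports, per rule, whether the rule neither reads the output
// of another rule in the group nor has its own output read by another rule.
func independent(rules []Rule) []bool {
	// Map each recorded series name to the rules that produce it.
	producers := make(map[string][]int)
	for i, r := range rules {
		if r.Record != "" {
			producers[r.Record] = append(producers[r.Record], i)
		}
	}

	linked := make([]bool, len(rules))
	for i, r := range rules {
		for _, tok := range identRe.FindAllString(r.Expr, -1) {
			for _, j := range producers[tok] {
				if j != i {
					linked[i] = true // rule i consumes rule j's output
					linked[j] = true // rule j has a dependent
				}
			}
		}
	}

	out := make([]bool, len(rules))
	for i := range rules {
		out[i] = !linked[i]
	}
	return out
}

func main() {
	rules := []Rule{
		{Record: "job:http_requests:rate5m", Expr: "sum by (job) (rate(http_requests_total[5m]))"},
		{Expr: "job:http_requests:rate5m > 100"},
		{Record: "node:cpu:avg", Expr: "avg(node_cpu_seconds_total)"},
	}
	fmt.Println(independent(rules)) // [false false true]
}
```

Only the third rule is safe to run out of order here; the first two must 
keep their sequential ordering.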

A missed group iteration occurs when the cumulative time to evaluate all 
rules in a group exceeds the interval defined for that group. When this 
occurs, the next iteration is skipped: alert expressions are not evaluated 
and recording rules produce no series for it. This can be a serious 
reliability problem.

*By evaluating rules concurrently, the likelihood of missed group 
iterations is reduced.*

Of course, the trade-off here is more concurrent load on the query engine. 
This can be mitigated by bounding the concurrency with a global weighted 
semaphore. The feature would be opt-in, and the concurrency limit would be 
configurable.

Here is my implementation:
https://github.com/prometheus/prometheus/pull/12946

The feature is hidden behind a feature flag, but I would argue that we can 
drop the flag and simply set --rules.max-concurrent-evals=0 as the default, 
which is functionally equivalent to not having any concurrency at all (the 
current behaviour); a double opt-in feels unnecessary.

As an aside, this feature will be quite useful for Grafana Loki (of which 
I'm a maintainer). We vendor in the Prometheus rule engine for our rule 
evaluation, and we have a mode now where rules can be evaluated in a 
distributed fashion. Our rules run sequentially, but they don't need to 
(since our rules cannot have interdependencies), and being able to run a 
certain number of rules concurrently would massively improve our rule 
evaluation reliability.

Thanks!
