Are you saying there are 3 million time series for the http_server_requests_seconds_bucket metric, or in total for the server?
Looking at your query, the uri label looks very problematic. If it is the URI requested by an external Internet user, its cardinality is effectively unbounded, because users can make up arbitrary values. That could completely break your server. If you do want some indication of the page requested, I'd suggest processing the raw value: remove as much as you can (query parameters, perhaps the final piece of the path) and ideally match the result against an allow list (with "other" for anything that is rejected).

On 30 November 2020 14:34:26 GMT, Julian Maicher <[email protected]> wrote:
> Hi,
>
> we have a set of high-cardinality metrics and are currently designing
> recording rules, primarily to improve dashboard performance.
> At a certain threshold, we observe group evaluation times exceeding the
> interval, thus leading to iteration misses [1].
> In these cases, we can also see that the next iteration starts at the end
> of the last evaluation plus the interval. So the iteration is not
> really skipped but rather delayed (the schedule has a lag).
>
> What is the impact of this? Do we need to worry about iteration misses?
>
> To be more concrete, here is one of our rule groups:
>
> groups:
> - name: http_server_requests_seconds_bucket
>   rules:
>   - record: app_method_uri_status_le:http_server_requests_seconds_bucket:rate1m
>     expr: sum by(app, method, uri, status, le) (rate(http_server_requests_seconds_bucket[1m]))
>   - record: app_le:http_server_requests_seconds_bucket:rate1m
>     expr: sum by(app, le) (app_method_uri_status_le:http_server_requests_seconds_bucket:rate1m)
>
> The scrape interval is set to 15s, the evaluation interval to 30s.
> With ~3 million time series [2], we see evaluation times of ~1m.
>
> [1] We use "prometheus_rule_group_iterations_missed_total" to monitor
> iteration misses.
> [2] We have a little test tool to simulate load on Prometheus before
> rolling this out. We're trying to find the limits of a single Prometheus
> instance before scaling horizontally (federation) or reaching for, e.g.,
> Thanos or Cortex.
>
> --
> You received this message because you are subscribed to the Google
> Groups "Prometheus Users" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/prometheus-users/33cb8c8e-7d88-4f27-818f-2ddf0a4bab94n%40googlegroups.com.
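
For what it's worth, the uri clean-up suggested in the reply above could look roughly like the sketch below. It is only an illustration under assumptions: the function name normalize_uri and the ALLOWED_PREFIXES entries are made up for this example and do not come from the thread, and the right place to apply it depends on how your application sets the label.

```python
from urllib.parse import urlsplit

# Hypothetical allow list of route prefixes (an assumption for this sketch).
ALLOWED_PREFIXES = ("/api/orders", "/api/users", "/health")


def normalize_uri(raw_uri, allowed=ALLOWED_PREFIXES):
    """Map a raw request URI to a bounded set of label values."""
    # Keep only the path: this drops the query string and any fragment.
    path = urlsplit(raw_uri).path
    # Treat "/health/" and "/health" as the same route.
    path = path.rstrip("/") or "/"
    for prefix in allowed:
        # Match the prefix itself or anything below it, e.g. "/api/orders/123".
        if path == prefix or path.startswith(prefix + "/"):
            return prefix
    # Everything not on the allow list collapses into a single value.
    return "other"
```

With this, the uri label can take at most len(ALLOWED_PREFIXES) + 1 distinct values, no matter what clients request.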

