Also, all the problems are in the DBs (the backend Prometheus instances), not the front end.

On Wednesday, April 5, 2023 at 4:50:39 PM UTC-4 Johny wrote:
> Prometheus version is 2.39.1
>
> There are many users and some legacy clients that add friction to changing queries across the board. During ingestion, we can make use of relabeling to drop labels automatically.
>
> I am fairly certain this is the root cause of the performance degradation in the system, as we're able to reproduce the problem in a load test: simulating queries with and without the concerning label filter, the latter perform much better, with no memory problems.
>
> On Wednesday, April 5, 2023 at 3:50:08 PM UTC-4 Brian Candler wrote:
>
>> I wonder if the filtering algorithm is really as simplistic as the Timescale blog implies ("for every label/value pair, first find *every* possible series which matches; then take the intersection of the results")? I don't know, I'll leave others to answer that. If it had some internal stats so that it could start with the labels which match the fewest number of series, I'd expect it to do that; and the TSDB stats in the web interface suggest that it does.
>>
>> I ask again: what version(s) of Prometheus are you running?
>>
>> Are you experiencing this with all Prometheus components, i.e. a Prometheus front end talking to Prometheus back ends with remote_read?
>>
>> I think the ideal thing would be to narrow this down to a reproducible test case: either a particular pattern of remote_read queries which is performing badly at the backend, or a particular query sent to the front end which is being sent to the backend in a suboptimal way (e.g. not including all possible label filters at once).
>>
>> You said "for now we need a workaround". Is it not sufficient simply to remove {global_label="constant-value"} from your queries? After all, you're already thinking about removing this label at ingestion time, and if you do that, you won't be able to filter on it anyway.
>>
>> On Wednesday, 5 April 2023 at 18:50:02 UTC+1 Johny wrote:
>>
>>> The time-series count per metric for a few selected metrics is close to 2 million today. For scalability, we shard the data onto a few Prometheus instances and use remote read from a front-end Prometheus to fetch data from the storage units.
>>>
>>> The series are fetched from time-series blocks by taking an intersection of series (or postings) across all label filters in the query. First, the index postings are scanned for each label filter; the second step finds the matching series with an implicit AND operator. From my understanding, the low-cardinality label present in all series will cause a large portion of the index to be loaded into memory (during the first step). We've also observed memory spikes during query processing when the system gets a steady stream of queries. Without this filter, memory usage is lower and the query returns much faster.
>>>
>>> https://www.timescale.com/blog/how-prometheus-querying-works-and-why-you-should-care/#:~:text=Prometheus%20Storage%3A%20Indexing%20Strategies,-Let's%20now%20look&text=The%20postings%20index%20represents%20the,%3D%E2%80%9D%3A9090%E2%80%9D%7D%20
>>>
>>> So, I believe that if we exclude the constant label at ingestion, we won't have this problem in the long term. Excluding this filter somewhere in the front end will help mitigate this problem.
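For reference, the relabeling approach mentioned above might look roughly like this on each backend (a sketch only: the job name and target are placeholders, and it assumes the label really is named global_label; metric_relabel_configs with action: labeldrop removes matching label names before samples are stored):

    # prometheus.yml on each backend shard (job name and target are placeholders)
    scrape_configs:
      - job_name: "example-job"
        static_configs:
          - targets: ["app-host:9100"]
        metric_relabel_configs:
          # Drop the constant-value label before ingestion so it never
          # reaches the TSDB index.
          - action: labeldrop
            regex: global_label

Note that relabeling applies at scrape time, so it only affects newly ingested samples; existing blocks keep the label until they age out.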
>>>
>>> On Wednesday, April 5, 2023 at 1:13:42 PM UTC-4 Brian Candler wrote:
>>>
>>>> Also: how many timeseries are you working with, in terms of the "my_series" that you are querying, and globally on the whole system?
>>>>
>>>> On Wednesday, 5 April 2023 at 18:12:11 UTC+1 Brian Candler wrote:
>>>>
>>>>> Adding a constant label to every timeseries should have almost zero impact on memory usage.
>>>>>
>>>>> Can you clarify what you're saying, and how you've come to your diagnosis? What version of Prometheus are you running? When you say "backends" in the plural, how have you set this up?
>>>>>
>>>>> At one point you seem to be saying it's something to do with ingestion, but then you seem to be saying it's something to do with queries ("Without this filter, the queries run reasonably well"). Can you give specific examples of filters which show the difference in behaviour?
>>>>>
>>>>> Again: the queries
>>>>>     my_series{global_label="constant-value", l1="..", l2=".."}
>>>>>     my_series{l1="..", l2=".."}
>>>>> should perform almost identically, as they will select the same subset of timeseries.
>>>>>
>>>>> On Wednesday, 5 April 2023 at 17:42:33 UTC+1 Johny wrote:
>>>>>
>>>>>> There is a performance-related issue we're facing in Prometheus, coming from a label with a constant value across all (thousands of) time series. The label filter in a query causes a large quantity of metadata to be loaded into memory, overwhelming the Prometheus backends. Without this filter, the queries run reasonably well. We are planning to exclude this label at ingestion in the future, but for now we need a workaround.
>>>>>>
>>>>>>     my_series{global_label="constant-value", l1="..", l2=".."}
>>>>>>
>>>>>> Is there a mechanism to automatically exclude global_label in the query configuration: the remote_read subsection, or elsewhere?
>>>>>>
>>>>>> thanks,
>>>>>> Johny
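For context, the front-end / backend split described in this thread is the usual remote_read fan-out; a minimal sketch of the front-end config (the shard URLs are placeholders):

    # Front-end prometheus.yml (shard hostnames are placeholders)
    remote_read:
      - url: "http://prom-shard-1:9090/api/v1/read"
        read_recent: true
      - url: "http://prom-shard-2:9090/api/v1/read"
        read_recent: true

As far as I know, there is no remote_read option that strips a matcher from outgoing queries; required_matchers only restricts which queries are forwarded to a given endpoint. So the label would still have to be removed either in the client queries themselves or at ingestion via relabeling.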