Re: [prometheus-users] Re: remove a label filter for all PromQL queries

2023-04-06 Thread Brian Candler
Many thanks for the clarification.

Setting aside sharding: what if the label selection for timeseries were to 
start by processing the label/value pair that returns the lowest number of 
timeseries, and then work up progressively through those that match larger 
numbers?

I'm thinking about the sort of optimisation that a SQL database does when 
it has indexes on foo and bar, and you SELECT ... WHERE foo=X and bar=Y.  
If it knows from index stats that there are many fewer rows with foo=X than 
bar=Y, then it will start with the rows matching foo and then check bar for 
those rows against the other index (or vice versa, of course).

Fundamentally this depends on whether you could determine, cheaply, at 
least an order-of-magnitude estimate of how many items there are in the 
inverted index for a given label/value pair.  It's also complicated by 
having !=, =~ and !~ for label matchers (but I would be inclined to treat 
those as "likely to match many series" and therefore do those after = 
matching).
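
To illustrate what I mean, here's a rough sketch in Python (purely 
illustrative, with invented posting lists; certainly not Prometheus's 
actual code):

  def select_series(postings, matchers):
      """Intersect posting lists, starting from the smallest.

      postings: dict mapping (label, value) -> set of series IDs.
      matchers: list of (label, value) equality matchers.
      """
      # Cheap cardinality estimate: the length of each posting list,
      # much like a SQL planner consulting its index statistics.
      ordered = sorted(matchers, key=lambda m: len(postings.get(m, ())))
      result = None
      for m in ordered:
          ids = postings.get(m, set())
          result = set(ids) if result is None else result & ids
          if not result:        # intersection already empty: stop early
              return set()
      return result if result is not None else set()

  postings = {
      ("__name__", "my_series"): {1, 2, 3, 4},
      ("global_label", "constant-value"): {1, 2, 3, 4, 5, 6, 7, 8},
      ("l1", "a"): {2, 3},
  }
  print(select_series(postings, [("global_label", "constant-value"),
                                 ("__name__", "my_series"),
                                 ("l1", "a")]))
  # -> {2, 3}, having started from the two-element posting list

Starting from the smallest posting list keeps every intermediate 
intersection small, which is exactly the property the SQL planner analogy 
relies on.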

On Thursday, 6 April 2023 at 17:19:26 UTC+1 Ben Kochie wrote:

> Sorry, I had to catch up on this thread.
>
> The description is correct. The label inverted index works as described. 
> It's one of the downsides of an inverted index that allows arbitrary 
> combinations of labels to be filtered on.
>
> Each metric has an internal identifier, and the label index points to all 
> metrics that contain that label/value pair. This includes the __name__ 
> index.
>
> Right now, this inverted index is not sharded. IMO it would be useful and 
> a good performance improvement to shard the index by metric name, since 
> you almost always use the __name__ value as the first entry point into a 
> lookup.
>
> On Wed, Apr 5, 2023 at 9:50 PM Brian Candler  wrote:
>
>> I wonder if the filtering algorithm is really as simplistic as the 
>> Timescale blog implies ("for every label/value pair, first find *every* 
>> possible series which matches; then take the intersection of the 
>> results")?  I don't know, I'll leave others to answer that.  If it had some 
>> internal stats so that it could start with the labels which match the 
>> fewest series, I'd expect it to do that; and the TSDB stats in the web 
>> interface suggest that it does.
>>
>> I ask again: what version(s) of Prometheus are you running?
>>
>> Are you experiencing this with all prometheus components, i.e. a 
>> prometheus front-end talking to prometheus back-ends with remote_read?
>>
>> I think the ideal thing would be to narrow this down to a reproducible 
>> test case: either a particular pattern of remote_read queries which is 
>> performing badly at the backend, or a particular query sent to the 
>> front-end which is being sent to the backend in a suboptimal way (e.g. not 
>> including all possible label filters at once).
>>
>> You said "for now we need a workaround".  Is it not sufficient simply to 
>> remove {global_label="constant-value"} from your queries? After all, 
>> you're already thinking about removing this label at ingestion time, and if 
>> you do that, you won't be able to filter on it anyway.
>>
>> On Wednesday, 5 April 2023 at 18:50:02 UTC+1 Johny wrote:
>>
>>> The count of time series per metric for a few selected metrics is close 
>>> to 2 million today. For scalability, we shard the data onto a few 
>>> Prometheus instances and use remote read from a front-end Prometheus to 
>>> fetch data from the storage units.
>>>
>>> The series are fetched from time series blocks by taking an intersection 
>>> of series (or postings) across all label filters in the query. First, 
>>> the index postings are scanned for each label filter; the second step 
>>> finds matching series with an implicit AND operator. From my 
>>> understanding, the low-cardinality label present in all series will 
>>> cause a large portion of the index to be loaded into memory (during the 
>>> first step). We've also observed memory spikes during query processing 
>>> when the system gets a steady dose of queries. Without including this 
>>> filter, memory usage is lower and queries return much faster.
>>>
>>>
>>> https://www.timescale.com/blog/how-prometheus-querying-works-and-why-you-should-care/#:~:text=Prometheus%20Storage%3A%20Indexing%20Strategies,-Let's%20now%20look=The%20postings%20index%20represents%20the,%3D%E2%80%9D%3A9090%E2%80%9D%7D%20
>>> .
>>>   
>>> So, I believe if we exclude the const label in ingestion, we won't have 
>>> this problem in the long term. Excluding this filter somewhere in the front 
>>> end will help mitigate this problem.
>>>
>>>
>>>
>>> On Wednesday, April 5, 2023 at 1:13:42 PM UTC-4 Brian Candler wrote:
>>>
 Also: how many timeseries are you working with, in terms of the 
 "my_series" that you are querying, and globally on the whole system?

 On Wednesday, 5 April 2023 at 18:12:11 UTC+1 Brian Candler wrote:

> Adding a constant label to every timeseries should have almost zero 
> impact on memory usage.
>
> Can you clarify what you're saying, and how you've come to your diagnosis?

Re: [prometheus-users] Re: remove a label filter for all PromQL queries

2023-04-06 Thread Ben Kochie
Sorry, I had to catch up on this thread.

The description is correct. The label inverted index works as described.
It's one of the downsides of an inverted index that allows arbitrary
combinations of labels to be filtered on.

Each metric has an internal identifier, and the label index points to all
metrics that contain that label/value pair. This includes the __name__
index.

Right now, this inverted index is not sharded. IMO it would be useful and a
good performance improvement to shard the index by metric name, since you
almost always use the __name__ value as the first entry point into a
lookup.
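
As a toy illustration of what I mean by sharding (hypothetical Python, not
the real code):

  # Purely hypothetical name-sharded layout: one small inverted index
  # per metric name instead of a single global one over all series.
  sharded_postings = {
      "my_series": {
          ("global_label", "constant-value"): {1, 2, 3, 4},
          ("l1", "a"): {2, 3},
      },
      "other_series": {
          ("l1", "a"): {9},
      },
  }

  def lookup(name, matchers):
      """Intersect posting lists within a single metric's shard."""
      index = sharded_postings.get(name, {})
      result = None
      for m in matchers:
          ids = index.get(m, set())
          result = set(ids) if result is None else result & ids
      return result if result is not None else set()

  print(lookup("my_series", [("global_label", "constant-value"),
                             ("l1", "a")]))
  # -> {2, 3}; the postings of other_series are never touched

A constant label shared by every metric would then only cost you within the
shard you're already querying, rather than across the whole index.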

On Wed, Apr 5, 2023 at 9:50 PM Brian Candler  wrote:

> I wonder if the filtering algorithm is really as simplistic as the
> Timescale blog implies ("for every label/value pair, first find *every*
> possible series which matches; then take the intersection of the
> results")?  I don't know, I'll leave others to answer that.  If it had some
> internal stats so that it could start with the labels which match the
> fewest series, I'd expect it to do that; and the TSDB stats in the web
> interface suggest that it does.
>
> I ask again: what version(s) of Prometheus are you running?
>
> Are you experiencing this with all prometheus components, i.e. a
> prometheus front-end talking to prometheus back-ends with remote_read?
>
> I think the ideal thing would be to narrow this down to a reproducible
> test case: either a particular pattern of remote_read queries which is
> performing badly at the backend, or a particular query sent to the
> front-end which is being sent to the backend in a suboptimal way (e.g. not
> including all possible label filters at once).
>
> You said "for now we need a workaround".  Is it not sufficient simply to
> remove {global_label="constant-value"} from your queries? After all,
> you're already thinking about removing this label at ingestion time, and if
> you do that, you won't be able to filter on it anyway.
>
> On Wednesday, 5 April 2023 at 18:50:02 UTC+1 Johny wrote:
>
>> The count of time series per metric for a few selected metrics is close
>> to 2 million today. For scalability, we shard the data onto a few
>> Prometheus instances and use remote read from a front-end Prometheus to
>> fetch data from the storage units.
>>
>> The series are fetched from time series blocks by taking an intersection
>> of series (or postings) across all label filters in the query. First, the
>> index postings are scanned for each label filter; the second step finds
>> matching series with an implicit AND operator. From my understanding, the
>> low-cardinality label present in all series will cause a large portion of
>> the index to be loaded into memory (during the first step). We've also
>> observed memory spikes during query processing when the system gets a
>> steady dose of queries. Without including this filter, memory usage is
>> lower and queries return much faster.
>>
>>
>> https://www.timescale.com/blog/how-prometheus-querying-works-and-why-you-should-care/#:~:text=Prometheus%20Storage%3A%20Indexing%20Strategies,-Let's%20now%20look=The%20postings%20index%20represents%20the,%3D%E2%80%9D%3A9090%E2%80%9D%7D%20
>> .
>>
>> So, I believe if we exclude the const label in ingestion, we won't have
>> this problem in the long term. Excluding this filter somewhere in the front
>> end will help mitigate this problem.
>>
>>
>>
>> On Wednesday, April 5, 2023 at 1:13:42 PM UTC-4 Brian Candler wrote:
>>
>>> Also: how many timeseries are you working with, in terms of the
>>> "my_series" that you are querying, and globally on the whole system?
>>>
>>> On Wednesday, 5 April 2023 at 18:12:11 UTC+1 Brian Candler wrote:
>>>
 Adding a constant label to every timeseries should have almost zero
 impact on memory usage.

 Can you clarify what you're saying, and how you've come to your
 diagnosis? What version of prometheus are you running? When you say
 "backends" in the plural, how have you set this up?

 At one point you seem to be saying it's something to do with ingestion,
 but then you seem to be saying it's something to do with queries (*"Without
 this filter, the queries run reasonably well"*). Can you give specific
 examples of filters which show the difference in behaviour?

 Again: the queries
   my_series{global_label="constant-value",  l1="..", l2=".."}
   my_series{l1="..", l2=".."}
 should perform almost identically, as they will select the same subset
 of timeseries.

 On Wednesday, 5 April 2023 at 17:42:33 UTC+1 Johny wrote:

> There is a performance-related issue we're facing in Prometheus, coming
> from a label with a constant value across all (thousands of) time series.
> The label filter in a query causes a large quantity of metadata to be
> loaded into memory, overwhelming Prometheus backends. Without this filter,
> the queries run reasonably well. We are planning to exclude this label at
> ingestion in future, but for now we need a workaround.

[prometheus-users] Re: remove a label filter for all PromQL queries

2023-04-06 Thread Brian Candler
On Wednesday, 5 April 2023 at 21:50:39 UTC+1 Johny wrote:

During ingestion, we can make use of relabeling to drop labels 
automatically.


Sure. But doesn't that imply that you will have to modify all queries 
*not* to filter on the (now missing) label? In which case, why not just 
modify the queries now?



[prometheus-users] Re: remove a label filter for all PromQL queries

2023-04-05 Thread Johny
Also, all the problems are in the DBs (the backend Prometheus instances), not the front end.

On Wednesday, April 5, 2023 at 4:50:39 PM UTC-4 Johny wrote:

> Prometheus version is 2.39.1
>
> There are many users and some legacy clients that add friction to changing 
> queries across the board. 
> During ingestion, we can make use of relabeling to drop labels 
> automatically.
>
> I am fairly certain this is the root cause of the performance degradation 
> in the system, as we're able to reproduce the problem in a load test --- 
> simulating queries with and without the label filter in question, the 
> latter performing much better and with no memory problems.
>
>
>
> On Wednesday, April 5, 2023 at 3:50:08 PM UTC-4 Brian Candler wrote:
>
>> I wonder if the filtering algorithm is really as simplistic as the 
>> Timescale blog implies ("for every label/value pair, first find *every* 
>> possible series which matches; then take the intersection of the 
>> results")?  I don't know, I'll leave others to answer that.  If it had some 
>> internal stats so that it could start with the labels which match the 
>> fewest series, I'd expect it to do that; and the TSDB stats in the web 
>> interface suggest that it does.
>>
>> I ask again: what version(s) of Prometheus are you running?
>>
>> Are you experiencing this with all prometheus components, i.e. a 
>> prometheus front-end talking to prometheus back-ends with remote_read?
>>
>> I think the ideal thing would be to narrow this down to a reproducible 
>> test case: either a particular pattern of remote_read queries which is 
>> performing badly at the backend, or a particular query sent to the 
>> front-end which is being sent to the backend in a suboptimal way (e.g. not 
>> including all possible label filters at once).
>>
>> You said "for now we need a workaround".  Is it not sufficient simply to 
>> remove {global_label="constant-value"} from your queries? After all, 
>> you're already thinking about removing this label at ingestion time, and if 
>> you do that, you won't be able to filter on it anyway.
>>
>> On Wednesday, 5 April 2023 at 18:50:02 UTC+1 Johny wrote:
>>
>>> The count of time series per metric for a few selected metrics is close 
>>> to 2 million today. For scalability, we shard the data onto a few 
>>> Prometheus instances and use remote read from a front-end Prometheus to 
>>> fetch data from the storage units.
>>>
>>> The series are fetched from time series blocks by taking an intersection 
>>> of series (or postings) across all label filters in the query. First, 
>>> the index postings are scanned for each label filter; the second step 
>>> finds matching series with an implicit AND operator. From my 
>>> understanding, the low-cardinality label present in all series will 
>>> cause a large portion of the index to be loaded into memory (during the 
>>> first step). We've also observed memory spikes during query processing 
>>> when the system gets a steady dose of queries. Without including this 
>>> filter, memory usage is lower and queries return much faster.
>>>
>>>
>>> https://www.timescale.com/blog/how-prometheus-querying-works-and-why-you-should-care/#:~:text=Prometheus%20Storage%3A%20Indexing%20Strategies,-Let's%20now%20look=The%20postings%20index%20represents%20the,%3D%E2%80%9D%3A9090%E2%80%9D%7D%20
>>> .
>>>   
>>> So, I believe if we exclude the const label in ingestion, we won't have 
>>> this problem in the long term. Excluding this filter somewhere in the front 
>>> end will help mitigate this problem.
>>>
>>>
>>>
>>> On Wednesday, April 5, 2023 at 1:13:42 PM UTC-4 Brian Candler wrote:
>>>
 Also: how many timeseries are you working with, in terms of the 
 "my_series" that you are querying, and globally on the whole system?

 On Wednesday, 5 April 2023 at 18:12:11 UTC+1 Brian Candler wrote:

> Adding a constant label to every timeseries should have almost zero 
> impact on memory usage.
>
> Can you clarify what you're saying, and how you've come to your 
> diagnosis? What version of prometheus are you running? When you say 
> "backends" in the plural, how have you set this up?
>
> At one point you seem to be saying it's something to do with 
> ingestion, but then you seem to be saying it's something to do with 
> queries 
> (*"Without this filter, the queries run reasonably well"*). Can you 
> give specific examples of filters which show the difference in behaviour?
>
> Again: the queries
>   my_series{global_label="constant-value",  l1="..", l2=".."}
>   my_series{l1="..", l2=".."}
> should perform almost identically, as they will select the same subset 
> of timeseries.
>
> On Wednesday, 5 April 2023 at 17:42:33 UTC+1 Johny wrote:
>
>> There is a performance-related issue we're facing in Prometheus, coming 
>> from a label with a constant value across all (thousands of) time series. 
>> The label filter in a query causes a large quantity of metadata to be 
>> loaded into memory, overwhelming Prometheus backends.
[prometheus-users] Re: remove a label filter for all PromQL queries

2023-04-05 Thread Johny
Prometheus version is 2.39.1

There are many users and some legacy clients that add friction to changing 
queries across the board. 
During ingestion, we can make use of relabeling to drop labels 
automatically.
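
For example, a minimal sketch of such a rule, using the standard 
metric_relabel_configs (job name and target are placeholders):

  scrape_configs:
    - job_name: example            # placeholder job
      static_configs:
        - targets: ["host:9100"]   # placeholder target
      metric_relabel_configs:
        # Drop the constant-value label from every scraped series
        # before it is written to the TSDB.
        - action: labeldrop
          regex: global_label

If data reaches the storage instances via remote_write rather than 
scraping, the same rule can instead go under write_relabel_configs in the 
sender's remote_write section.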

I am fairly certain this is the root cause of the performance degradation 
in the system, as we're able to reproduce the problem in a load test --- 
simulating queries with and without the label filter in question, the 
latter performing much better and with no memory problems.



On Wednesday, April 5, 2023 at 3:50:08 PM UTC-4 Brian Candler wrote:

> I wonder if the filtering algorithm is really as simplistic as the 
> Timescale blog implies ("for every label/value pair, first find *every* 
> possible series which matches; then take the intersection of the 
> results")?  I don't know, I'll leave others to answer that.  If it had some 
> internal stats so that it could start with the labels which match the 
> fewest series, I'd expect it to do that; and the TSDB stats in the web 
> interface suggest that it does.
>
> I ask again: what version(s) of Prometheus are you running?
>
> Are you experiencing this with all prometheus components, i.e. a 
> prometheus front-end talking to prometheus back-ends with remote_read?
>
> I think the ideal thing would be to narrow this down to a reproducible 
> test case: either a particular pattern of remote_read queries which is 
> performing badly at the backend, or a particular query sent to the 
> front-end which is being sent to the backend in a suboptimal way (e.g. not 
> including all possible label filters at once).
>
> You said "for now we need a workaround".  Is it not sufficient simply to 
> remove {global_label="constant-value"} from your queries? After all, 
> you're already thinking about removing this label at ingestion time, and if 
> you do that, you won't be able to filter on it anyway.
>
> On Wednesday, 5 April 2023 at 18:50:02 UTC+1 Johny wrote:
>
>> The count of time series per metric for a few selected metrics is close 
>> to 2 million today. For scalability, we shard the data onto a few 
>> Prometheus instances and use remote read from a front-end Prometheus to 
>> fetch data from the storage units.
>>
>> The series are fetched from time series blocks by taking an intersection 
>> of series (or postings) across all label filters in the query. First, the 
>> index postings are scanned for each label filter; the second step finds 
>> matching series with an implicit AND operator. From my understanding, the 
>> low-cardinality label present in all series will cause a large portion of 
>> the index to be loaded into memory (during the first step). We've also 
>> observed memory spikes during query processing when the system gets a 
>> steady dose of queries. Without including this filter, memory usage is 
>> lower and queries return much faster.
>>
>>
>> https://www.timescale.com/blog/how-prometheus-querying-works-and-why-you-should-care/#:~:text=Prometheus%20Storage%3A%20Indexing%20Strategies,-Let's%20now%20look=The%20postings%20index%20represents%20the,%3D%E2%80%9D%3A9090%E2%80%9D%7D%20
>> .
>>   
>> So, I believe if we exclude the const label in ingestion, we won't have 
>> this problem in the long term. Excluding this filter somewhere in the front 
>> end will help mitigate this problem.
>>
>>
>>
>> On Wednesday, April 5, 2023 at 1:13:42 PM UTC-4 Brian Candler wrote:
>>
>>> Also: how many timeseries are you working with, in terms of the 
>>> "my_series" that you are querying, and globally on the whole system?
>>>
>>> On Wednesday, 5 April 2023 at 18:12:11 UTC+1 Brian Candler wrote:
>>>
 Adding a constant label to every timeseries should have almost zero 
 impact on memory usage.

 Can you clarify what you're saying, and how you've come to your 
 diagnosis? What version of prometheus are you running? When you say 
 "backends" in the plural, how have you set this up?

 At one point you seem to be saying it's something to do with ingestion, 
 but then you seem to be saying it's something to do with queries 
 (*"Without 
 this filter, the queries run reasonably well"*). Can you give specific 
 examples of filters which show the difference in behaviour?

 Again: the queries
   my_series{global_label="constant-value",  l1="..", l2=".."}
   my_series{l1="..", l2=".."}
 should perform almost identically, as they will select the same subset 
 of timeseries.

 On Wednesday, 5 April 2023 at 17:42:33 UTC+1 Johny wrote:

> There is a performance-related issue we're facing in Prometheus, coming 
> from a label with a constant value across all (thousands of) time series. 
> The label filter in a query causes a large quantity of metadata to be 
> loaded into memory, overwhelming Prometheus backends. Without this filter, 
> the queries run reasonably well. We are planning to exclude this label at 
> ingestion in future, but for now we need a workaround.
>
> 

[prometheus-users] Re: remove a label filter for all PromQL queries

2023-04-05 Thread Brian Candler
I wonder if the filtering algorithm is really as simplistic as the 
Timescale blog implies ("for every label/value pair, first find *every* 
possible series which matches; then take the intersection of the 
results")?  I don't know, I'll leave others to answer that.  If it had some 
internal stats so that it could start with the labels which match the 
fewest series, I'd expect it to do that; and the TSDB stats in the web 
interface suggest that it does.

I ask again: what version(s) of Prometheus are you running?

Are you experiencing this with all prometheus components, i.e. a prometheus 
front-end talking to prometheus back-ends with remote_read?
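
For concreteness, I mean something along these lines (shard URLs are 
placeholders):

  # Hypothetical front-end prometheus.yml: fan queries out to the
  # sharded storage back-ends over the remote read protocol.
  remote_read:
    - url: http://prom-shard-0:9090/api/v1/read
      read_recent: true    # also read the backend's most recent data
    - url: http://prom-shard-1:9090/api/v1/read
      read_recent: true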

I think the ideal thing would be to narrow this down to a reproducible test 
case: either a particular pattern of remote_read queries which is 
performing badly at the backend, or a particular query sent to the 
front-end which is being sent to the backend in a suboptimal way (e.g. not 
including all possible label filters at once).

You said "for now we need a workaround".  Is it not sufficient simply to 
remove {global_label="constant-value"} from your queries? After all, 
you're already thinking about removing this label at ingestion time, and if 
you do that, you won't be able to filter on it anyway.

On Wednesday, 5 April 2023 at 18:50:02 UTC+1 Johny wrote:

> The count of time series per metric for a few selected metrics is close 
> to 2 million today. For scalability, we shard the data onto a few 
> Prometheus instances and use remote read from a front-end Prometheus to 
> fetch data from the storage units.
>
> The series are fetched from time series blocks by taking an intersection 
> of series (or postings) across all label filters in the query. First, the 
> index postings are scanned for each label filter; the second step finds 
> matching series with an implicit AND operator. From my understanding, the 
> low-cardinality label present in all series will cause a large portion of 
> the index to be loaded into memory (during the first step). We've also 
> observed memory spikes during query processing when the system gets a 
> steady dose of queries. Without including this filter, memory usage is 
> lower and queries return much faster.
>
>
> https://www.timescale.com/blog/how-prometheus-querying-works-and-why-you-should-care/#:~:text=Prometheus%20Storage%3A%20Indexing%20Strategies,-Let's%20now%20look=The%20postings%20index%20represents%20the,%3D%E2%80%9D%3A9090%E2%80%9D%7D%20
> .
>   
> So, I believe if we exclude the const label in ingestion, we won't have 
> this problem in the long term. Excluding this filter somewhere in the front 
> end will help mitigate this problem.
>
>
>
> On Wednesday, April 5, 2023 at 1:13:42 PM UTC-4 Brian Candler wrote:
>
>> Also: how many timeseries are you working with, in terms of the 
>> "my_series" that you are querying, and globally on the whole system?
>>
>> On Wednesday, 5 April 2023 at 18:12:11 UTC+1 Brian Candler wrote:
>>
>>> Adding a constant label to every timeseries should have almost zero 
>>> impact on memory usage.
>>>
>>> Can you clarify what you're saying, and how you've come to your 
>>> diagnosis? What version of prometheus are you running? When you say 
>>> "backends" in the plural, how have you set this up?
>>>
>>> At one point you seem to be saying it's something to do with ingestion, 
>>> but then you seem to be saying it's something to do with queries (*"Without 
>>> this filter, the queries run reasonably well"*). Can you give specific 
>>> examples of filters which show the difference in behaviour?
>>>
>>> Again: the queries
>>>   my_series{global_label="constant-value",  l1="..", l2=".."}
>>>   my_series{l1="..", l2=".."}
>>> should perform almost identically, as they will select the same subset 
>>> of timeseries.
>>>
>>> On Wednesday, 5 April 2023 at 17:42:33 UTC+1 Johny wrote:
>>>
 There is a performance-related issue we're facing in Prometheus, coming 
 from a label with a constant value across all (thousands of) time series. 
 The label filter in a query causes a large quantity of metadata to be 
 loaded into memory, overwhelming Prometheus backends. Without this filter, 
 the queries run reasonably well. We are planning to exclude this label at 
 ingestion in future, but for now we need a workaround.

 my_series{global_label="constant-value", l1="..", l2=".."}

 Is there a mechanism to automatically exclude global_label in query 
 configuration: remote_read subsection, or elsewhere?

 thanks,
 Johny







[prometheus-users] Re: remove a label filter for all PromQL queries

2023-04-05 Thread Johny
The count of time series per metric for a few selected metrics is close to 
2 million today. For scalability, we shard the data onto a few Prometheus 
instances and use remote read from a front-end Prometheus to fetch data 
from the storage units.

The series are fetched from time series blocks by taking an intersection 
of series (or postings) across all label filters in the query. First, the 
index postings are scanned for each label filter; the second step finds 
matching series with an implicit AND operator. From my understanding, the 
low-cardinality label present in all series will cause a large portion of 
the index to be loaded into memory (during the first step). We've also 
observed memory spikes during query processing when the system gets a 
steady dose of queries. Without including this filter, memory usage is 
lower and queries return much faster.

https://www.timescale.com/blog/how-prometheus-querying-works-and-why-you-should-care/#:~:text=Prometheus%20Storage%3A%20Indexing%20Strategies,-Let's%20now%20look=The%20postings%20index%20represents%20the,%3D%E2%80%9D%3A9090%E2%80%9D%7D%20.
  
So, I believe if we exclude the const label in ingestion, we won't have 
this problem in the long term. Excluding this filter somewhere in the front 
end will help mitigate this problem.



On Wednesday, April 5, 2023 at 1:13:42 PM UTC-4 Brian Candler wrote:

> Also: how many timeseries are you working with, in terms of the 
> "my_series" that you are querying, and globally on the whole system?
>
> On Wednesday, 5 April 2023 at 18:12:11 UTC+1 Brian Candler wrote:
>
>> Adding a constant label to every timeseries should have almost zero 
>> impact on memory usage.
>>
>> Can you clarify what you're saying, and how you've come to your 
>> diagnosis? What version of prometheus are you running? When you say 
>> "backends" in the plural, how have you set this up?
>>
>> At one point you seem to be saying it's something to do with ingestion, 
>> but then you seem to be saying it's something to do with queries (*"Without 
>> this filter, the queries run reasonably well"*). Can you give specific 
>> examples of filters which show the difference in behaviour?
>>
>> Again: the queries
>>   my_series{global_label="constant-value",  l1="..", l2=".."}
>>   my_series{l1="..", l2=".."}
>> should perform almost identically, as they will select the same subset of 
>> timeseries.
>>
>> On Wednesday, 5 April 2023 at 17:42:33 UTC+1 Johny wrote:
>>
>>> There is a performance-related issue we're facing in Prometheus, coming 
>>> from a label with a constant value across all (thousands of) time 
>>> series. The label filter in a query causes a large quantity of metadata 
>>> to be loaded into memory, overwhelming Prometheus backends. Without this 
>>> filter, the queries run reasonably well. We are planning to exclude this 
>>> label at ingestion in future, but for now we need a workaround.
>>>
>>> my_series{global_label="constant-value", l1="..", l2=".."}
>>>
>>> Is there a mechanism to automatically exclude global_label in query 
>>> configuration: remote_read subsection, or elsewhere?
>>>
>>> thanks,
>>> Johny
>>>
>>>
>>>
>>>



[prometheus-users] Re: remove a label filter for all PromQL queries

2023-04-05 Thread Brian Candler
Also: how many timeseries are you working with, in terms of the "my_series" 
that you are querying, and globally on the whole system?

On Wednesday, 5 April 2023 at 18:12:11 UTC+1 Brian Candler wrote:

> Adding a constant label to every timeseries should have almost zero impact 
> on memory usage.
>
> Can you clarify what you're saying, and how you've come to your diagnosis? 
> What version of prometheus are you running? When you say "backends" in the 
> plural, how have you set this up?
>
> At one point you seem to be saying it's something to do with ingestion, 
> but then you seem to be saying it's something to do with queries (*"Without 
> this filter, the queries run reasonably well"*). Can you give specific 
> examples of filters which show the difference in behaviour?
>
> Again: the queries
>   my_series{global_label="constant-value",  l1="..", l2=".."}
>   my_series{l1="..", l2=".."}
> should perform almost identically, as they will select the same subset of 
> timeseries.
>
> On Wednesday, 5 April 2023 at 17:42:33 UTC+1 Johny wrote:
>
>> There is a performance-related issue we're facing in Prometheus, coming 
>> from a label with a constant value across all (thousands of) time series. 
>> The label filter in a query causes a large quantity of metadata to be 
>> loaded into memory, overwhelming Prometheus backends. Without this 
>> filter, the queries run reasonably well. We are planning to exclude this 
>> label at ingestion in future, but for now we need a workaround.
>>
>> my_series{global_label="constant-value", l1="..", l2=".."}
>>
>> Is there a mechanism to automatically exclude global_label in query 
>> configuration: remote_read subsection, or elsewhere?
>>
>> thanks,
>> Johny
>>
>>
>>
>>



[prometheus-users] Re: remove a label filter for all PromQL queries

2023-04-05 Thread Brian Candler
Adding a constant label to every timeseries should have almost zero impact 
on memory usage.

Can you clarify what you're saying, and how you've come to your diagnosis? 
What version of prometheus are you running? When you say "backends" in the 
plural, how have you set this up?

At one point you seem to be saying it's something to do with ingestion, but 
then you seem to be saying it's something to do with queries (*"Without 
this filter, the queries run reasonably well"*). Can you give specific 
examples of filters which show the difference in behaviour?

Again: the queries
  my_series{global_label="constant-value",  l1="..", l2=".."}
  my_series{l1="..", l2=".."}
should perform almost identically, as they will select the same subset of 
timeseries.

On Wednesday, 5 April 2023 at 17:42:33 UTC+1 Johny wrote:

> There is a performance-related issue we're facing in Prometheus, coming 
> from a label with a constant value across all (thousands of) time series. 
> The label filter in a query causes a large quantity of metadata to be 
> loaded into memory, overwhelming Prometheus backends. Without this filter, 
> the queries run reasonably well. We are planning to exclude this label at 
> ingestion in future, but for now we need a workaround.
>
> my_series{global_label="constant-value", l1="..", l2=".."}
>
> Is there a mechanism to automatically exclude global_label in query 
> configuration: remote_read subsection, or elsewhere?
>
> thanks,
> Johny
>
>
>
>
