Thank you for your help. Your reply is very clear, and I think I know how to 
do it now. I will try using a recording rule to do the aggregation.

Thank you again!

On Tuesday, 3 March 2020 19:11:51 UTC+8, Brian Candler wrote:
>
> On Tuesday, 3 March 2020 09:49:08 UTC, luv - wrote: 
>>
>>     I use Prometheus to scrape our machines. There are about 500 service 
>> machines, and each one exposes about 3500 samples.
>>
>
> 500 machines each generating 3500 different metrics is 1,750,000 
> timeseries.
>  
>
>> When I use the federate interface, I find that Prometheus adds instance and 
>> job labels to the data from each machine, so there are 500 * 3500 = 
>> 1,750,000 series. This leads to very high memory usage for queries and 
>> writes.
>>
>
> There are indeed 1,750,000 timeseries.  The labels themselves don't use up 
> any space, except in the timeseries index.  A rule of thumb is that about 2 
> million timeseries is the point where you start thinking about splitting up 
> scrapes between multiple servers.
>  
>
>> I don't need the per-machine data. Is there any way to remove the instance 
>> label and aggregate the data before it is written?
>>
>> For example:
>> http_requests_total{methed="GET", code="200", instance="ip1:port", job="job1"} 100
>> http_requests_total{methed="GET", code="200", instance="ip2:port", job="job1"} 50
>>
>> merged into:
>> http_requests_total{methed="GET", code="200", instance="", job="job1"} 150
>>
> You can't aggregate before writing (unless you write your own exporter 
> which does this).  Or you could use statsd_exporter, and have all the 
> targets push their counter updates to this.
>
> You can use a recording rule to generate the aggregate - and then when you 
> scrape the /federate endpoint pass a match[] query so that only the 
> aggregate timeseries is returned.
>
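
A minimal, untested sketch of the recording-rule approach described above. The
group name, the recorded series names, and the source-prometheus:9090 address
are placeholders assumed for illustration; none of them come from this thread.
On the Prometheus server that scrapes the 500 machines, a rule file might look
something like this:

```yaml
groups:
  - name: http_aggregates            # hypothetical group name
    rules:
      # Sum the raw counters across machines, dropping only "instance".
      - record: job:http_requests_total:sum
        expr: sum without (instance) (http_requests_total)
      # Taking rate() per instance before summing copes better with
      # counter resets when a single machine restarts.
      - record: job:http_requests:rate5m
        expr: sum without (instance) (rate(http_requests_total[5m]))
```

The server that federates from it could then request only the recorded series
through match[]:

```yaml
scrape_configs:
  - job_name: federate
    honor_labels: true               # keep the job label from the source
    metrics_path: /federate
    params:
      'match[]':
        - '{__name__=~"job:.*"}'     # only the aggregated series
    static_configs:
      - targets:
          - 'source-prometheus:9090' # placeholder address
```

The job: prefix follows the usual level:metric:operations naming convention
for recording rules, which is what lets a single match[] selector pick up only
the aggregates.
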
> Note that if you simply stripped the labels, you would get conflicting 
> data.  For example, at one scrape instant you might have:
>
> http_requests_total{methed="GET", code="200"} 100
> http_requests_total{methed="GET", code="200"} 50
>
> Is the value of the counter at this point in time 100 or 50?  Answer: it's 
> neither (it should be 150).  And if you look at the metric over time, it 
> would bounce up and down as it flips between different counter values.
>

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/1abdd9a5-44a4-4cdf-a6e5-13543acab010%40googlegroups.com.
