Thank you for your help. Your reply is very clear, and I think I know how to do
it now. I will try using a recording rule to optimize this.
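
As a rough sketch of what I plan to try (not tested yet): a rule file on the
Prometheus server that scrapes the machines. The group name and the recorded
series names are placeholders I made up, following the level:metric:operations
naming convention.

```yaml
# rules.yml on the scraping Prometheus (file name is an assumption)
groups:
  - name: http_aggregates            # hypothetical group name
    rules:
      # Sum the counter across all machines, dropping only the instance label.
      # With the two example series (100 and 50) this yields 150.
      - record: job:http_requests_total:sum
        expr: sum without (instance) (http_requests_total)
      # Alternative: take the rate per instance first and then sum, so that a
      # counter reset on one machine does not show up as a drop in the total.
      - record: job:http_requests:rate5m
        expr: sum without (instance) (rate(http_requests_total[5m]))
```

On the federating server, the scrape job would then pass a match[] selector so
that only the aggregated series are pulled. The target address below is a
placeholder.

```yaml
scrape_configs:
  - job_name: federate
    honor_labels: true               # keep the job label from the source server
    metrics_path: /federate
    params:
      'match[]':
        - '{__name__=~"job:.*"}'     # only series produced by the rules above
    static_configs:
      - targets:
          - 'source-prometheus:9090' # placeholder address
```

If I understand correctly, this only reduces what the federating server
stores. The scraping server still holds all 1,750,000 series.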
Thank you again!
On Tuesday, 3 March 2020 at 19:11:51 UTC+8, Brian Candler wrote:
>
> On Tuesday, 3 March 2020 09:49:08 UTC, luv - wrote:
>>
>> I use Prometheus to scrape our machines. There are about 500 service
>> machines, and each machine generates about 3,500 samples.
>>
>
> 500 machines each generating 3500 different metrics is 1,750,000
> timeseries.
>
>
>> When I use the federate interface, I find that Prometheus adds instance and
>> job labels to the data from each machine, so there are 500 * 3500 =
>> 1,750,000 series. This results in very high memory usage for queries and
>> writes.
>>
>
> There are indeed 1,750,000 timeseries. The labels themselves don't use up
> any space, except in the timeseries index. A rule of thumb is that about 2
> million timeseries is the point where you start thinking about splitting up
> scrapes between multiple servers.
>
>
>> I don't care about per-machine data. Is there any way to remove the
>> instance label and aggregate the data before it is written?
>>
>> For example:
>> http_requests_total{method="GET", code="200", instance="ip1:port", job="job1"} 100
>> http_requests_total{method="GET", code="200", instance="ip2:port", job="job1"} 50
>>
>> merged into:
>> http_requests_total{method="GET", code="200", instance="", job="job1"} 150
>>
>>
>>
>>
> You can't aggregate before writing (unless you write your own exporter
> which does this). Or you could use statsd_exporter, and have all the
> targets push their counter updates to this.
>
> You can use a recording rule to generate the aggregate - and then when you
> scrape the /federate endpoint pass a match[] query so that only the
> aggregate timeseries is returned.
>
> Note that if you simply stripped the labels, you would get conflicting
> data. For example, at one scrape instant you might have:
>
> http_requests_total{method="GET", code="200"} 100
> http_requests_total{method="GET", code="200"} 50
>
> Is the value of the counter at this point in time 100 or 50? Answer: it's
> neither (it should be 150). And if you look at the metric over time, it
> would bounce up and down as it flips between different counter values.
>