Re: [prometheus-users] Re: Efficient way to query non-active time series's last value

Ben Kochie Tue, 01 Sep 2020 13:39:51 -0700

Prometheus attempts to use gzip HTTP compression by default, but, as you
say, your exporter is local.
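A rough way to see how much gzip shrinks a payload like yours is to compress a captured scrape body yourself. This is a minimal sketch: the metric name and the synthetic payload are hypothetical stand-ins for the output of your own endpoint.

```python
import gzip


def gzip_ratio(payload: bytes) -> float:
    """Return compressed size divided by raw size (lower is better)."""
    if not payload:
        return 1.0
    return len(gzip.compress(payload)) / len(payload)


# Synthetic exposition text: 10k series of one hypothetical metric that
# differ only in a label value, similar in shape to a large scrape.
sample = "".join(
    f'device_state{{device="dev{i}"}} 1\n' for i in range(10_000)
).encode()

print(f"raw={len(sample)} bytes, gzip ratio={gzip_ratio(sample):.3f}")
```

Compression only reduces the bytes on the wire; Prometheus still has to parse every sample, so it does not address the number of samples per scrape.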


Your 400k samples per scrape is pretty far out of bounds for a normal
setup. Prometheus scales by performing many small scrapes in parallel.
Typically I recommend 50k samples per scrape as an absolute maximum,
and more than 10k samples per scrape is "large but OK".
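To check a target against these rules of thumb without Prometheus, you can count the sample lines in one scrape body. A minimal sketch, assuming the classic text exposition format; note that a plain `wc -l` would also count the `# HELP` and `# TYPE` lines, so it overstates the sample count.

```python
def count_samples(exposition: str) -> int:
    """Count sample lines in Prometheus text exposition format.

    Blank lines and lines starting with '#' (HELP/TYPE comments) are not
    samples; every other line is exactly one sample.
    """
    return sum(
        1
        for line in exposition.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    )


def classify(samples: int) -> str:
    """Bucket a per-scrape sample count using the 10k / 50k rules of thumb."""
    if samples > 50_000:
        return "over the recommended maximum"
    if samples > 10_000:
        return "large but OK"
    return "normal"


text = "# HELP up x\n# TYPE up gauge\nup 1\n\nfoo{a=\"b\"} 2\n"
print(count_samples(text), classify(count_samples(text)))  # 2 normal
```

For a live target, pass it the decoded body, e.g. `count_samples(urllib.request.urlopen(url).read().decode())`; the result should roughly match `scrape_samples_scraped`.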

It sounds like you've either got some metrics with excessive cardinality,
or you're pre-aggregating data for Prometheus. Both go against best
practices and will lead you into trouble in the long term.
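To see which metrics drive the cardinality of a scrape, you can group its series by metric name. This is a sketch assuming the classic text format; histogram and summary suffixes (`_bucket`, `_sum`, `_count`) are counted as separate names.

```python
from collections import Counter


def series_per_metric(exposition: str) -> Counter:
    """Count how many series each metric name contributes to one scrape."""
    counts: Counter = Counter()
    for raw in exposition.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The name ends at '{' (start of labels) or at the first space
        # (start of the value), whichever comes first.
        cuts = [i for i in (line.find("{"), line.find(" ")) if i != -1]
        counts[line[: min(cuts)] if cuts else line] += 1
    return counts


scrape = 'a{x="1"} 1\na{x="2"} 1\nb 3\n# TYPE a gauge\n'
print(series_per_metric(scrape).most_common(10))  # [('a', 2), ('b', 1)]
```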

Without more understanding of what you're really doing, it's hard to say.
But it's definitely not how Prometheus is designed to be used and you're
suggesting workarounds for problems you shouldn't have in the first place.

On Tue, Sep 1, 2020, 17:37 Peter S <[email protected]> wrote:

> We measured with `curl <metrics_endpoint> | wc`. Also,
> `scrape_samples_scraped` reports that 400k metrics are exported and scraped.
>
> TSDB is fine. Sorry, I wasn't clear. Network traffic has become the
> bottleneck. Even though the exporter and Prometheus are colocated on the
> same machine, scrapes have begun timing out more and more often. Next,
> we're thinking of increasing the scrape interval (15s) to buy us some time.
>
> What we really want, in an ideal world, is that only state changes are
> exported and scraped, and that there is an efficient way to query the last
> reported states, so that all this network traffic (and storage, although
> that's not an issue for us) can be saved and the system becomes much more scalable.
>
> Thanks,
>
> Peter
>
> On Tuesday, September 1, 2020 at 5:01:47 AM UTC-4 [email protected] wrote:
>
>> On Tuesday, 1 September 2020 01:55:15 UTC+1, Peter S wrote:
>>>
>>> Thanks. Unfortunately, exporting and scraping the same values have
>>> become costly for us. We have metrics endpoints of 50MB+, and scrapes have
>>> begun to time out more and more often.
>>>
>>>
>> Sorry, can you explain what you mean by "metrics endpoints of 50MB+" ?
>> Where are you measuring 50MB exactly?
>>
>> If you have 50 million timeseries, that's huge.  But I don't think that's
>> what you mean.
>>
>> If you are returning 50MB of prometheus line-format data in a single
>> scrape, that's quite a lot, but it will compress to very little in the TSDB
>> if the values are not changing.
>>
>> What's important to prometheus is not the volume of the scrape, but the
>> number of active timeseries.  Timeseries are active if they're in the head,
>> which means a sample has been seen in the last ~2 hours.  Leaving gaps in
>> the timeseries, when the gaps are less than 2 hours, is not going to save
>> you any TSDB resources at all, but will cause you problems with staleness
>> at query time.
>>
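A toy model (not Prometheus's actual code) of why exporting only state changes hurts at query time: when a series that was present in the previous scrape is missing from the current one, Prometheus records a staleness marker, and an instant query only looks back 5 minutes by default (`--query.lookback-delta`). The function name and the sample timestamps are hypothetical.

```python
LOOKBACK_S = 300  # Prometheus default --query.lookback-delta is 5m


def instant_value(samples, t):
    """Toy model of what an instant query at time t sees for one series.

    samples: (timestamp_s, value) pairs sorted by timestamp, where a value
    of None stands for a staleness marker.
    """
    latest = None
    for ts, value in samples:
        if ts > t:
            break
        latest = (ts, value)
    if latest is None:
        return None
    ts, value = latest
    if value is None or t - ts > LOOKBACK_S:
        return None  # stale, or older than the lookback window
    return value


# Exporter emits a state at t=0, then omits it because it did not change:
# the next scrape (t=15) finds it missing and records a staleness marker.
only_changes = [(0, 1.0), (15, None)]
print(instant_value(only_changes, 10))  # 1.0
print(instant_value(only_changes, 20))  # None
```

In other words, an unchanged series disappears from instant queries one scrape interval after the exporter stops reporting it.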
>> What are you trying to optimise: the volume of TSDB storage, or the
>> volume of network traffic?  If it's network traffic then you might be
>> better off having a local prometheus server right next to where the data is
>> collected.  You can either query it directly, or via promxy, or use
>> something like Thanos.  In each case, the only traffic will be the query
>> request/response.
>>
>> You could also use remote_write to forward data to a central server such
>> as VictoriaMetrics, although I have not measured how the volume of
>> remote_write traffic compares with the volume of prometheus line protocol
>> traffic.
>>
>> Another option to consider would be to use statsd_exporter or possibly
>> pushgateway, and have those local to your prometheus server.  The remote
>> metrics updates would be done via statsd or pushgateway updates, and when
>> they don't change, prometheus just scrapes the same value.
>>
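A minimal sketch of the statsd route, assuming a statsd_exporter listening on its default UDP port 9125; the host and metric name below are placeholders. The remote side sends a gauge only when a state changes, and statsd_exporter keeps exposing the last value to Prometheus between updates (unless a TTL is configured in its mapping).

```python
import socket


def statsd_gauge(name: str, value: float) -> bytes:
    """Encode one gauge update in the plain statsd line format."""
    return f"{name}:{value}|g".encode()


def send_gauge(name: str, value: float,
               host: str = "127.0.0.1", port: int = 9125) -> None:
    """Send a gauge update over UDP; call this only when the state changes."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(statsd_gauge(name, value), (host, port))


print(statsd_gauge("device_state", 1))  # b'device_state:1|g'
```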
>> Finally, it would be pretty easy to write a proxy which is tailored to
>> your requirements: incoming scrape performs outbound scrape, merges the
>> results into a cache, and then returns the whole cache contents.
>>
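The merge step of such a proxy could look roughly like this. It is a sketch with hypothetical names (`ScrapeCache`, `series_key`) and a placeholder upstream URL, and it assumes the upstream lines carry no explicit timestamps. Serving the result over HTTP (e.g. with `http.server`) is left out.

```python
import urllib.request


def series_key(line: str) -> str:
    """Series identity: the metric name plus its label set, if any."""
    if "{" in line:
        return line[: line.rindex("}") + 1]
    return line.split(" ", 1)[0]


class ScrapeCache:
    """Merge each upstream scrape into a cache and serve the whole cache."""

    def __init__(self, upstream_url: str):
        self.upstream_url = upstream_url
        self.cache: dict = {}  # series key -> latest sample line

    def merge(self, exposition: str) -> str:
        for raw in exposition.splitlines():
            line = raw.strip()
            if line and not line.startswith("#"):
                self.cache[series_key(line)] = line
        # Sort by (metric name, series) so all series of one metric stay
        # grouped together, as the text format expects.
        ordered = sorted(self.cache, key=lambda k: (k.split("{", 1)[0], k))
        return "".join(self.cache[k] + "\n" for k in ordered)

    def scrape(self) -> str:
        """Fetch the upstream endpoint, merge it, return the full cache."""
        with urllib.request.urlopen(self.upstream_url, timeout=10) as resp:
            return self.merge(resp.read().decode("utf-8"))
```

Caveats of this sketch: it drops HELP/TYPE lines, so Prometheus sees the series as untyped, and cached series never expire. It also saves traffic only if the upstream side sends just the changed series; the hop from the proxy to Prometheus still carries everything.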
> --
> You received this message because you are subscribed to the Google Groups
> "Prometheus Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/prometheus-users/096dac17-76eb-4e79-8e39-cf4e60b55bbcn%40googlegroups.com.
>

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/CABbyFmogWqqTKjackOY4crOLExBLkhdDNytLV2Zv5Dt5_rZgVQ%40mail.gmail.com.
