On Tuesday, 13 October 2020 19:39:08 UTC+1, Kelsey Cummings wrote:
>
> On 10/13/2020 12:42 AM, Brian Candler wrote: 
> > The minimum useful scrape time for Prometheus is 2 minutes (120 
> > seconds). 
> Ah, interesting.  The 300 second to staleness is not configurable? 


There is a global command-line flag, "--query.lookback-delta", which 
controls that staleness window; the default is 5m.
 

>
> Polling the devices faster is not practical. All of the samples will be
> coming out of an existing poller - in some cases it takes ~10 minutes
> for the poller to complete taking the stats off a single device (limited
> by the device management plane's pitiful CPU).  We could write our own
> cache layer that would allow prometheus to poll, oversampling if
> necessary, if that's architecturally better than using the push gateway.
>
>
For devices which are slow or expensive to poll, I'd suggest polling them 
in a separate process (e.g. a cron job at 30-minute intervals, or whatever 
suits), writing the data somewhere, and then scraping the written data.  
node_exporter's textfile collector is very convenient for this.
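
To make that concrete, here is a minimal sketch of the cron-driven side. 
The metric name, label, and poll logic are invented for illustration; in 
production, point TEXTFILE_DIR at the directory you pass to node_exporter 
via --collector.textfile.directory.

```shell
#!/bin/sh
# Sketch only: metric name, label, and directory are hypothetical.
# Defaults to a temp dir so the script runs anywhere for demonstration.
TEXTFILE_DIR="${TEXTFILE_DIR:-$(mktemp -d)}"

# Poll the device here (snmpwalk, expect script, vendor API, ...),
# then format the results in the Prometheus text exposition format.
TMP="$TEXTFILE_DIR/device_stats.prom.$$"
{
  echo '# HELP device_fan_speed_rpm Fan speed reported by the device.'
  echo '# TYPE device_fan_speed_rpm gauge'
  echo 'device_fan_speed_rpm{device="switch1"} 4200'
} > "$TMP"

# Rename atomically so node_exporter never scrapes a half-written file.
mv "$TMP" "$TEXTFILE_DIR/device_stats.prom"
```

The write-then-rename step matters: node_exporter may scrape at any 
moment, and mv within a filesystem is atomic, so a scrape never sees a 
partial file.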

Amongst other things, it means you can scrape from multiple Prometheus 
servers for HA without increasing the load on your target systems.  The 
textfile collector also exposes a metric with the file's modification 
timestamp, so you can easily alert if the file hasn't been updated within 
a certain amount of time.
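
For reference, that metric is node_textfile_mtime_seconds.  An alerting 
rule along these lines would catch a stalled poller (the rule name, 
threshold, and filename are placeholders):

```yaml
groups:
  - name: textfile-freshness
    rules:
      - alert: DeviceStatsStale
        # Fires if device_stats.prom hasn't been rewritten in over an hour.
        expr: time() - node_textfile_mtime_seconds{file="device_stats.prom"} > 3600
        for: 15m
        annotations:
          summary: "Textfile {{ $labels.file }} on {{ $labels.instance }} is stale"
```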

 

>
> Any hints on what server specs would be required for an instance that
> big?  Projecting costs and scale-out is always a bit tricky when
> you've never run an application before.
>
>
I'll leave others to comment, not having built one that big myself.  There 
is this calculator for RAM though:
https://www.robustperception.io/how-much-ram-does-prometheus-2-x-need-for-cardinality-and-ingestion

You could also look at Prometheus -> (remote write) -> VictoriaMetrics.  
This has good support for backup and restore of long-term data, and would 
let you keep your Prometheus retention short (just enough for alerting 
history, really).  Blog:
https://medium.com/@valyala/speeding-up-backups-for-big-time-series-databases-533c1a927883
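
On the Prometheus side that setup is just a remote_write entry in 
prometheus.yml; something like this, with the VictoriaMetrics address as 
a placeholder (single-node VictoriaMetrics listens on port 8428 by 
default):

```yaml
# prometheus.yml fragment (sketch; adjust the URL to your deployment)
remote_write:
  - url: http://victoriametrics.example.com:8428/api/v1/write
```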

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/7059e6bf-af84-4337-bb6a-dea0c600d5e3o%40googlegroups.com.

Reply via email to