On Friday, 24 April 2020 06:57:08 UTC+1, Srinivasa praveen wrote:
>
> Thanks for the response, Stuart. The reason for keeping the scrape
> interval so long is that, on receiving a scrape request from Prometheus, my
> exporter runs around 10 queries against the database and exposes the results
> as 10 metrics; completing all the queries takes around 15 minutes, and the
> Prometheus scrape was timing out. So to increase the scrape_timeout I had to
> increase the scrape_interval as well.
>
I think a better option is to run your slow queries from cron every 30
minutes and write the results into a metrics file which is picked up by the
node_exporter textfile collector.
This means you can scrape it as often as you like, including from multiple
Prometheus servers for HA.
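As a rough sketch (the schedule, paths and metric names below are only
placeholders, not taken from your setup):

# /etc/cron.d/sql_metrics -- hypothetical entry, runs the collection every 30 minutes
*/30 * * * * root /usr/local/bin/collect_sql_metrics.sh

# node_exporter must be started with the textfile collector pointing at the
# directory the script writes to, e.g.
#   node_exporter --collector.textfile.directory=/var/lib/node_exporter

# The script just writes Prometheus text exposition format, e.g.
# /var/lib/node_exporter/sqlmetrics.prom containing lines like:
#   # HELP myapp_slow_query_result Result of one of the slow SQL queries
#   # TYPE myapp_slow_query_result gauge
#   myapp_slow_query_result{query="query1"} 12345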
Also, the textfile collector exposes a metric with the file's modification
timestamp, so you can alert if the file stops being updated for any reason,
which is useful for spotting cron jobs that are persistently failing.
- name: Hourly
  interval: 1h
  rules:
  - alert: StaleTextFile
    expr: time() - node_textfile_mtime_seconds > 7200
    for: 2h
    labels:
      severity: warning
    annotations:
      summary: "textfile-collector file has not been updated for more than 2 hours"
I also suggest moving the metrics file into place only once your slow queries
have completed successfully:

(
...
) >/var/lib/node_exporter/sqlmetrics.prom.new && \
  mv /var/lib/node_exporter/sqlmetrics.prom.new /var/lib/node_exporter/sqlmetrics.prom
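The mv is a rename within the same filesystem, so it is atomic: node_exporter
never sees a half-written file, and if the queries fail the old file is simply
left in place, so node_textfile_mtime_seconds stops advancing and the alert
above will eventually fire.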