On 28.01.25 09:29, rohit ahuja wrote:
>
> Question 1 - what should be my deletion policy to delete stale metrics from
> pushgateway?
> should it be after my batches are complete for the day? batches runs for
> 2-3 hours
Ideally never. I would set up your batches so that each day's run
produces the same set of metrics. Then you have a fixed set of metrics
that lives on the PGW forever and is overwritten each day.

> Question 2 - I want to define an email alert if any of batch fails.
> although spring batch provide this metric but it is not working. So i
> defined a Counter that is available to me in prometheus like this
> --> app_job_status_total{status="FAILED"} 1. Problem is it always gives me
> the same value. Using functions increase() or rate() does not help as well.
> as the value of metric once set is not changing over the evaluated
> interval. Please advice

The Pushgateway is not a distributed counter.

If you have separate metrics for each of your daily batch jobs, you
could just have a gauge that is 0 or 1 depending on success or failure,
and have an alert watching all of those.

If you cannot avoid the "distributed counter" use case, you could try a
statsd setup and funnel the statsd metrics into Prometheus via the
statsd exporter. Or you could try out the prom-aggregation-gateway,
https://github.com/zapier/prom-aggregation-gateway

See also
https://github.com/prometheus/pushgateway?tab=readme-ov-file#non-goals

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

-- 
You received this message because you are subscribed to the Google Groups "Prometheus Developers" group.
To view this discussion visit https://groups.google.com/d/msgid/prometheus-developers/Z5kg1qdPMjz%2BqbWS%40mail.rabenste.in.
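
For illustration, the 0-or-1 gauge suggested above could look roughly like the sketch below. It is only a sketch under assumptions that are not in the thread: the metric name batch_job_last_run_failed, the job name daily-import, and the address http://localhost:9091 are all hypothetical. It uses plain HTTP against the Pushgateway's push endpoint rather than any particular client library. A PUT replaces every metric in the same group, so each daily run overwrites the previous day's value and nothing needs to be deleted.

```python
import urllib.request
from urllib.parse import quote

# Assumption: a Pushgateway reachable at this address (hypothetical).
PUSHGATEWAY = "http://localhost:9091"


def grouping_url(base, job):
    """Build the push URL for one job: <base>/metrics/job/<job>.

    Simplification: job names containing "/" are not handled here.
    """
    return "/".join([base.rstrip("/"), "metrics", "job", quote(job, safe="")])


def render_status(failed):
    """Render a single 0/1 gauge in the Prometheus text exposition format.

    The metric name is hypothetical; 1 means the last run failed.
    """
    return (
        "# TYPE batch_job_last_run_failed gauge\n"
        f"batch_job_last_run_failed {1 if failed else 0}\n"
    )


def push_status(job, failed, base=PUSHGATEWAY):
    """PUT the gauge for this job.

    PUT replaces all metrics pushed under the same job, so the same
    fixed set of metrics is overwritten on every run.
    """
    request = urllib.request.Request(
        grouping_url(base, job),
        data=render_status(failed).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


# Example call at the end of a batch run (needs a running Pushgateway):
# push_status("daily-import", failed=True)
```

With something like this in place, an alert expression such as batch_job_last_run_failed == 1 would fire for any job whose last run failed, and would stay firing until a later successful run pushes 0 for that job.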