Re: [prometheus-users] fading out sample resolution for samples from longer ago possible?

2023-02-27 Thread Christoph Anton Mitterer
Hi Stuart, Julien and Ben, Hope you don't mind that I answer all three replies in one... don't wanna spam the list ;-) On Tue, 2023-02-21 at 07:31 +, Stuart Clark wrote: > Prometheus itself cannot do downsampling, but other related projects > such as Cortex & Thanos have such features.

Re: [prometheus-users] fading out sample resolution for samples from longer ago possible?

2023-03-01 Thread Christoph Anton Mitterer
Hey Brian On Tue, 2023-02-28 at 00:27 -0800, Brian Candler wrote: > > I can offer a couple more options: > > (1) Use two servers with federation. > - server 1 does the scraping and keeps the detailed data for 2 weeks > - server 2 scrapes server 1 at lower interval, using the federation >

Re: [prometheus-users] fading out sample resolution for samples from longer ago possible?

2023-03-01 Thread Christoph Anton Mitterer
On Tue, 2023-02-28 at 10:25 +0100, Ben Kochie wrote: > > Debian release cycles are too slow for the pace of Prometheus > development. It's rather simple to pull the version from Debian unstable, if one needs to, and that seems pretty current. > You'd be better off running Prometheus using

[prometheus-users] fading out sample resolution for samples from longer ago possible?

2023-02-20 Thread Christoph Anton Mitterer
Hey. I wondered whether one can do with Prometheus something similar to what is possible with systems using RRD (e.g. Ganglia). Depending on the kind of metrics, like for those from the node exporter, one may want a very high sample resolution (and thus short scraping interval) for like the last

[prometheus-users] restrict (respectively silence) alert rules to/for certain instances

2023-04-24 Thread Christoph Anton Mitterer
Hey. I have some troubles understanding how to do things right™ with respect to alerting. In principle I'd like to do two things: a) have certain alert rules run only for certain instances (though that may in practise actually be less needed, when only the respective nodes would generate

[prometheus-users] Re: restrict (respectively silence) alert rules to/for certain instances

2023-04-27 Thread Christoph Anton Mitterer
On Wednesday, April 26, 2023 at 9:14:35 AM UTC+2 Brian Candler wrote: > I guess with (2) you also meant having a route which is then permanently muted? I'd use a route with a null receiver (i.e. a receiver which has no _configs under it) Ah, interesting. It wasn't even clear to me from the

[prometheus-users] Re: how to make sure a metric is to be checked is "there"

2023-04-27 Thread Christoph Anton Mitterer
Hey again. On Wednesday, April 26, 2023 at 9:35:32 AM UTC+2 Brian Candler wrote: > expr: up{job="myjob"} == 1 unless my_metric Beware with that, that it will only work if the labels on both 'up' and 'my_metric' match exactly. If they don't, then you can either use on(...) to specify the set
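The label-matching caveat for `unless` can be illustrated with a small Python sketch. It is only a toy model of the set semantics, not Prometheus code: the function name `unless`, the dict-based series representation (metric name omitted) and the sample labels such as `controller` are assumptions made up for illustration.

```python
def unless(left, right, on=None):
    """Toy model of PromQL's `unless`: keep the left-hand series that have
    no right-hand series with a matching label set.

    left, right: lists of label dicts (metric name left out).
    on: optional list of label names; if given, only these labels are
        compared, mimicking `unless on(...)`.
    """
    def signature(labels):
        if on is None:
            return frozenset(labels.items())
        # A missing label is treated like an empty value.
        return frozenset((name, labels.get(name, "")) for name in on)

    right_signatures = {signature(labels) for labels in right}
    return [labels for labels in left if signature(labels) not in right_signatures]


# Hypothetical series: `my_metric` carries an extra "controller" label.
up_series = [{"job": "myjob", "instance": "host1:9100"}]
my_metric_series = [{"job": "myjob", "instance": "host1:9100", "controller": "c0"}]

# Label sets differ, so nothing is filtered out and the alert would fire
# even though my_metric exists.
print(unless(up_series, my_metric_series))

# Matching only on job and instance removes the series, as intended.
print(unless(up_series, my_metric_series, on=["job", "instance"]))
```

In this model the first call keeps the `up` series because the extra label prevents a match, which is the situation the quoted warning describes.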

[prometheus-users] Re: restrict (respectively silence) alert rules to/for certain instances

2023-04-25 Thread Christoph Anton Mitterer
Hey Brian On Tuesday, April 25, 2023 at 9:59:12 AM UTC+2 Brian Candler wrote: So really I'd divide the possibilities 3 ways: a. Prevent the alert being generated from prometheus in the first place, by writing the expr in such a way that it filters out conditions that you don't want to alert

[prometheus-users] Re: how to make sure a metric is to be checked is "there"

2023-04-25 Thread Christoph Anton Mitterer
On Tuesday, April 25, 2023 at 9:32:25 AM UTC+2 Brian Candler wrote: I think you would have basically the same problem with Icinga unless you have configured Icinga with a list of RAID controllers which should be present on a given device, or a list of drives which should be present in a

[prometheus-users] how to make sure a metric is to be checked is "there"

2023-04-24 Thread Christoph Anton Mitterer
Hey there. What I'm trying to do is basically replace Icinga with Prometheus (or well not really replacing, but integrating it into the latter, which I anyway need for other purposes). So I'll have e.g. some metric that shows me the RAID status on instances, and I want to get an alert, when a

[prometheus-users] collect non-metrics data

2023-02-11 Thread Christoph Anton Mitterer
Hey. I wondered whether the following is possible with Prometheus. I basically think about possibly phasing out Icinga and do any alerting in Prometheus. For checks that are clearly metrics based (like load or free disk space) this seems rather easy. But what about any checks that are not

Re: [prometheus-users] collect non-metrics data

2023-02-13 Thread Christoph Anton Mitterer
Hey Ben. On Saturday, February 11, 2023 at 11:18:44 AM UTC+1 Ben Kochie wrote: You combine this with an "info" metric that tells you about the rest of the device. Ah,... and I assume that one could just also export these info metrics alongside e.g. node_md_state? Thanks :-) Chris. -- You

[prometheus-users] Re: better way to get notified about (true) single scrape failures?

2023-05-09 Thread Christoph Anton Mitterer
Hey Brian. On Tuesday, May 9, 2023 at 9:55:22 AM UTC+2 Brian Candler wrote: That's tricky to get exactly right. You could try something like this (untested):

    expr: min_over_time(up[5m]) == 0 unless max_over_time(up[5m]) == 0
    for: 5m

- min_over_time will be 0 if any single scrape
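As a rough illustration of what the suggested expression selects, here is a minimal Python sketch. It assumes the `up` samples of one target inside the 5m window are available as a plain list of 0/1 values; the function name `single_scrape_failure` is hypothetical, and the `for: 5m` clause is not modelled.

```python
def single_scrape_failure(up_window):
    """Mimic `min_over_time(up[5m]) == 0 unless max_over_time(up[5m]) == 0`
    for a single series.

    up_window: list of 0/1 `up` samples that fall inside the range window.
    Returns True if at least one scrape failed, but not all of them.
    """
    if not up_window:
        # No samples in the window: the range selector yields nothing.
        return False
    some_scrape_failed = min(up_window) == 0
    all_scrapes_failed = max(up_window) == 0
    return some_scrape_failed and not all_scrapes_failed


print(single_scrape_failure([1, 1, 0, 1]))  # one failed scrape among good ones
print(single_scrape_failure([0, 0, 0, 0]))  # target down for the whole window
print(single_scrape_failure([1, 1, 1, 1]))  # no failures at all
```

The second case is the one a separate target-down alert would cover, which is why the `unless` clause excludes it here.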

[prometheus-users] better way to get notified about (true) single scrape failures?

2023-05-08 Thread Christoph Anton Mitterer
Hey. I have an alert rule like this:

    groups:
      - name: alerts_general
        rules:
          - alert: general_target-down
            expr: 'up == 0'
            for: 5m

which is intended to notify about a target instance (respectively a specific exporter on that) being down. There are also routes in

[prometheus-users] Re: better way to get notified about (true) single scrape failures?

2023-05-12 Thread Christoph Anton Mitterer
Hey Brian On Wednesday, May 10, 2023 at 9:03:36 AM UTC+2 Brian Candler wrote: It depends on the exact semantics of "for". e.g. take a simple case of 1 minute rule evaluation interval. If you apply "for: 1m" then I guess that means the alert must be firing for two successive evaluations

[prometheus-users] Re: better way to get notified about (true) single scrape failures?

2024-03-17 Thread Christoph Anton Mitterer
Hey there. I eventually got back to this and I'm still fighting this problem. As a reminder, my goal was: - if e.g. scrapes fail for 1m, a target-down alert shall fire (similar to how Icinga would put the host into down state, after pings failed for a number of seconds) - but even if a single

Re: [prometheus-users] Re: better way to get notified about (true) single scrape failures?

2024-03-17 Thread Christoph Anton Mitterer
Hey Chris. On Sun, 2024-03-17 at 22:40 -0400, Chris Siebenmann wrote: > > One thing you can look into here for detecting and counting failed > scrapes is resets(). This works perfectly well when applied to a > gauge Though it is documented as to be only used with counters... :-/ > that is 1

Re: [prometheus-users] Re: better way to get notified about (true) single scrape failures?

2024-03-21 Thread Christoph Anton Mitterer
I've been looking into possible alternatives, based on the ideas given here. I) First one completely different approach might be: - alert: target-down expr: 'max_over_time( up[1m0s] ) == 0' for: 0s and: ( - alert: single-scrape-failure expr: 'min_over_time( up[2m0s] ) == 0' for: 1m or - alert:

[prometheus-users] query for time series misses samples (that should be there), but not when offset is used

2024-03-22 Thread Christoph Anton Mitterer
Hey. I noticed a somewhat unexpected behaviour, perhaps someone can explain why this happens. - on a Prometheus instance, with a scrape interval of 10s - doing the following queries via curl from the same node where Prometheus runs (so there cannot be any different system times or so Looking

Re: [prometheus-users] query for time series misses samples (that should be there), but not when offset is used

2024-04-05 Thread Christoph Anton Mitterer
Hey. On Friday, April 5, 2024 at 7:10:29 AM UTC+2 Ben Kochie wrote: If the jitter is > 0.002, the real value is stored. Interesting... though I guess bad for my solution in the other thread, where I make the assumption that it's guaranteed that samples are always exactly on point with the

Re: [prometheus-users] Re: better way to get notified about (true) single scrape failures?

2024-04-05 Thread Christoph Anton Mitterer
Hey Chris. On Thursday, April 4, 2024 at 8:41:02 PM UTC+2 Chris Siebenmann wrote: > - The evaluation interval is sufficiently less than the scrape > interval, so that it's guaranteed that none of the `up`-samples are > being missed. I assume you were referring to the above specific point?

[prometheus-users] what to do about flapping alerts?

2024-04-05 Thread Christoph Anton Mitterer
Hey. I have some simple alerts like: - alert: node_upgrades_non-security_apt expr: 'sum by (instance,job) ( apt_upgrades_pending{origin!~"(?i)^.*-security(?:\\PL.*)?$"} )' - alert: node_upgrades_security_apt expr: 'sum by (instance,job) (

[prometheus-users] Re: what to do about flapping alerts?

2024-04-08 Thread Christoph Anton Mitterer
Hey Brian. On Saturday, April 6, 2024 at 9:33:27 AM UTC+2 Brian Candler wrote: > but AFAIU that would simply affect all alerts, i.e. it wouldn't just keep firing, when the scraping failed, but also when it actually goes back to an ok state, right? It affects all alerts individually, and I

[prometheus-users] Re: what to do about flapping alerts?

2024-04-08 Thread Christoph Anton Mitterer
On Monday, April 8, 2024 at 11:05:41 PM UTC+2 Brian Candler wrote: On Monday 8 April 2024 at 20:57:34 UTC+1 Christoph Anton Mitterer wrote: But for Prometheus, with keep_firing_for, it will be like the same alert. If the alerts have the exact same set of labels (e.g. the alert

Re: [prometheus-users] Re: better way to get notified about (true) single scrape failures?

2024-04-04 Thread Christoph Anton Mitterer
hanges in my config git below. Thanks to everyone for helping me with that :-) Best wishes, Chris. (needs a mono-spaced font to work out nicely) TL/DR: - commit f31f3c656cae4aeb79ce4bfd1782a624784c1c43 Author: Christoph Anton Mitterer Date: Mon

Re: [prometheus-users] query for time series misses samples (that should be there), but not when offset is used

2024-04-04 Thread Christoph Anton Mitterer
Hey Chris, Brian. Thanks for your replies/confirmations. On Sunday, March 24, 2024 at 8:16:14 AM UTC+1 Ben Kochie wrote: Yup, this is correct. Prometheus sets the timestamp of the sample at the start of the scrape. But since it's an ACID compliant database, the data is not queryable until