Hey. I'm having some trouble understanding how to do things right™ with respect to alerting.
In principle I'd like to do two things:

a) Have certain alert rules run only for certain instances. (In practice this may be less necessary if only the respective nodes generate the respective metrics; I'm not sure yet whether that will be the case.)

b) Silence certain (or all) alerts for a given set of instances. For example, these may be nodes where I'm not an admin who can take action on an incident, but just view the time series graphs to see what's going on.

As an example I'll take an alert that fires when the root fs has >=85% usage:

    groups:
      - name: node_alerts
        rules:
          - alert: node_free_fs_space
            expr: 100 - ((node_filesystem_avail_bytes{mountpoint="/",fstype!="rootfs"} * 100) / node_filesystem_size_bytes{mountpoint="/",fstype!="rootfs"}) >= 85

With respect to (a):
I could of course add yet another matcher like:

    instance=~"someRegexThatDescribesMyInstances"

to each time series selector, but when that regex gets more complex, everything becomes quite unreadable, and it's easy to forget one place (assuming one has many alerts) when the regex changes.

Is there something like host groups? A central place where I could define the list of hosts (or a regex for them), and then just use the name of that definition in the actual alert rules?

With respect to (b):
Similarly to the above: if I had various instances for which I never wanted to see any alerts, I could of course add a regex to all my alerts. But it seems quite ugly to clutter up all the rules with a potentially long list/regex of things I don't want to see anyway.

Another idea I had was to do the filtering/silencing in the alertmanager config at route level: add an "ignore" route that matches via regex on all the instances I'd like to silence (and has mute_time_intervals set to an interval covering 24/7), placed before any other routes match. But AFAIU this would only suppress the notification (e.g. mail), and the alert would still show up as firing in the alertmanager web pages etc.

Not sure whether anything can be done better by adding labels at some stage:
- Doing external_labels: in the prometheus config doesn't seem to help here (only static values?).
- Same for labels: in <static_config> in the prometheus config.
- Setting some "noalerts" label via <relabel_config> in the prometheus config would also set that in the DB, right? I'd rather not have that.
- Maybe using:

      alerting:
        alert_relabel_configs:
          - <relabel_config>

  would work? Like matching hostnames on instance and replacing with e.g. "yes" in some "noalerts" target label? And then somehow using that in the alert rules... But that also sounds a bit ugly, TBH.

So... what's the proper way to do this? :-)

Thanks,
Chris.

btw: Is there any difference between:

1)
    alerting:
      alert_relabel_configs:
        - <relabel_config>

and

2) the relabel_configs: in <alertmanager_config>
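
P.S. To make the alert_relabel_configs idea a bit more concrete, here is a rough, untested sketch of what I have in mind for the prometheus config. The hostnames, the port and the "noalerts" label name are placeholders I made up; as far as I understand, the regex has to match the whole instance value and the action defaults to replace:

```yaml
alerting:
  alert_relabel_configs:
    # Hypothetical example: tag alerts coming from instances I don't administer.
    # nodeA/nodeB, example.org and port 9100 are made-up placeholders.
    - source_labels: [instance]
      regex: '(nodeA|nodeB)\.example\.org:9100'
      target_label: noalerts   # made-up label name
      replacement: 'yes'
```

Since this relabeling would be applied to alerts on their way to the alertmanager, I assume the label would not end up in the DB, but for the same reason it probably would not be usable inside the rule expressions themselves.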