Re: [prometheus-users] query for time series misses samples (that should be there), but not when offset is used

2024-04-05 Thread Chris Siebenmann
> Also, what would happen if e.g. there was a first scrape, which gets delayed > 0.002 s ... and before that first scrape arrives, there's yet another (later) scrape which has no jitter and is on time?
> Are they going to be properly ordered?
As far as I know, this can't happen. The

Re: [prometheus-users] Re: better way to get notified about (true) single scrape failures?

2024-04-04 Thread Chris Siebenmann
> The assumptions I've made are basically three:
> - Prometheus does that "faking" of sample times, and thus these are always on point with exactly the scrape interval between each. This in turn should mean that if I have e.g. a scrape interval of 10s, and I do up[20s], then

Re: [prometheus-users] Inhibit resolved messages from inhibited alerts

2024-04-02 Thread Chris Siebenmann
> It seems that resolved messages are still thrown/received when an inhibited alert is resolved. Is there any way to squelch these as well? Or is this pretty much as intended.
Inhibitions normally stop notifications about resolved alerts. However, I suspect that you may be running into a

Re: [prometheus-users] Assistance Needed with Prometheus and Alertmanager Configuration

2024-03-29 Thread Chris Siebenmann
> I am encountering challenges with configuring Prometheus and Alertmanager for my application's alarm system. Below are the configurations I am currently using:
>
> *prometheus.yml:*
>
> Scrape Interval: 1h
This scrape interval is far too high. Although it's not well documented, you can't

Re: [prometheus-users] query for time series misses samples (that should be there), but not when offset is used

2024-03-23 Thread Chris Siebenmann
> I noticed a somewhat unexpected behaviour, perhaps someone can explain why this happens.
>
> - on a Prometheus instance, with a scrape interval of 10s
> - doing the following queries via curl from the same node where Prometheus runs (so there cannot be any different system times or so

Re: [prometheus-users] blackbox_exporter 0.24.0 and smokeping_prober 0.7.1 - DNS cache "nscd" not working

2024-03-21 Thread Chris Siebenmann
> Having a quick look at the binary, it seems that the netgo build tag was applied:
>
> $ strings blackbox_exporter-0.24.0.linux-amd64/blackbox_exporter | egrep '\-tags.*net.*'
> build -tags=netgo
> build -tags=netgo
As a side note: if you have the Go toolchain available, you can use 'go

[prometheus-users] Sending inhibited or muted Alertmanager alerts when the mute expires?

2024-03-18 Thread Chris Siebenmann
Alertmanager has supported inhibiting alert notifications for a long time, and somewhat more recently it's added support for muting them during specific time ranges (or only sending them in specific time ranges). As far as I know, the behaviour (for both inhibited alerts and muted alerts) is that

Re: [prometheus-users] Re: better way to get notified about (true) single scrape failures?

2024-03-17 Thread Chris Siebenmann
> As a reminder, my goal was:
> - if e.g. scrapes fail for 1m, a target-down alert shall fire (similar to how Icinga would put the host into down state, after pings failed or a number of seconds)
> - but even if a single scrape fails (which alone wouldn't trigger the above alert) I'd

Re: [prometheus-users] Re: Metrics from PUSH Consumer - Relabeled Metrics? Check "Up" state?

2024-02-26 Thread Chris Siebenmann
> Will I run into issues with "staleness" if there aren't any metrics anymore for more than 5 minutes?
> Or perhaps can I use this "staleness" indicator in some way?
Perhaps this is a use for absent() or absent_over_time(), if you know specific metrics that should always be present from the

Re: [prometheus-users] Optimal solution for storing 3 years of data from 300 hosts in prometheus server

2024-02-20 Thread Chris Siebenmann
> I am planning to store 3 years of data from 300 servers in a single Prometheus server. The data will primarily consist of default exporter metrics and the server has 500G memory and 80 cores.
We currently scrape metrics from 908 different sources (from 'count(up)'), 153 of which are the

Re: [prometheus-users] blackbox_exporter - how to simplify my configuration

2024-02-19 Thread Chris Siebenmann
> In our DataCenter we have different security zones. In each zone I want to place a blackbox_exporter. The goal is that each blackbox_exporter monitors the same destinations, e.g. same DNS Server or same webserver. All exporters are controlled by one single prometheus server.
>
> If someone

Re: [prometheus-users] Binary operations between range vectors and scalars

2024-01-29 Thread Chris Siebenmann
> I don't know if this has been proposed before, so I'd like to raise it here before taking it to github or prometheus-developers.
>
> There are cases where binary operators could act between range vectors and scalars, but this is not currently allowed (except by using subqueries,

Re: [prometheus-users] delta/increase on a counter return wrong value

2024-01-19 Thread Chris Siebenmann
> I understand that there can be some interpolation at the boundaries, but the value is not changing around the boundaries, it only changes in the middle of the time range. Scraping is done every 15s and the value of the metric is constant for more than 1 minute before and after the boundaries.

Re: [prometheus-users] delta/increase on a counter return wrong value

2024-01-18 Thread Chris Siebenmann
> I have a counter and I want to count the number of occurrences over a duration (let's say 15m). I'm using delta() or increase but I'm not getting the result I'm expecting.
>
> value @t0: 30242494
> value @t0+15m: 30609457
> calculated diff: 366963
> round(max_over_time(metric[15m])) -

Re: [prometheus-users] Guidance on Prometheus Alerting for Shutdown Instances

2023-12-06 Thread Chris Siebenmann
> I'm working with a metric like CPU usage, where instance identifiers are submitted as labels. To ensure instances are running as expected, I've defined an alert based on this metric. The alert triggers when the aggregation value (in my case, the increase) over a time window falls below

Re: [prometheus-users] probe_success VS up

2023-11-28 Thread Chris Siebenmann
> Blackbox exporter does have a /metrics endpoint, but this is only for metrics internal to the operation of blackbox_exporter itself (e.g. memory stats, software version). You don't need to scrape this, but it gives you a little bit of extra info about how your exporter is performing.

Re: [prometheus-users] probe_success VS up

2023-11-27 Thread Chris Siebenmann
> I've recently started monitoring a large fleet of hardware devices using a combination of blackbox, snmp, node, and json exporters. I started out using the *up* metric, but I noticed when using blackbox ping, *up* is *always* 1 even when the device is offline. So I plan to switch to

Re: [prometheus-users] Rancher alertmanager Jason payload editable or not

2023-11-17 Thread Chris Siebenmann
> Hi All, we have a Rancher-based setup. We are receiving the webhook payload from Alertmanager, but alarms are not being generated at the webhook due to a format issue.
>
> For example we need to remove \
> Here we need *severity=warning*, instead of *severity=\"warning\"}",*
> *Can we update the

Re: [prometheus-users] Trying to find an aggregation proxy/gateway

2023-09-08 Thread Chris Siebenmann
> This is a question that I have asked in the past in other channels, but I never found a solution, so I am trying again in case somebody knows of some tool or hack, I cannot be the only one with this need!
>
> I have a NodeJS application that I have developed for which I really need to

Re: [prometheus-users] Time between FIRING and RESOLVED.

2021-05-20 Thread Chris Siebenmann
> Does anyone know how to set the time between firing and resolved alert?
>
> I thought it is resolve_timeout after "global" line in alertmanager.yml, but it is not working. Is there something that needs to be configured additionally?
The resolve_timeout setting in Alertmanager is in practice

Re: [prometheus-users] using pushgateway metric with other metrics returns empty result

2021-04-20 Thread Chris Siebenmann
> Hi All,
>
> I've started looking into using pushgateway metrics.
> I tried a variation of the curl example found in the pushgateway git readme:
>
> cat < http://my-prometheus-host:9091/metrics/job/some_job/instance/some_instance
> # TYPE another_metric gauge
> # HELP another_metric Just an

Re: [prometheus-users] SunOS 5.10 architecture: sparc Access monitoring

2021-04-20 Thread Chris Siebenmann
> Machine parameters that need to be connected to monitoring
>
> Release: 5.10
> Kernel architecture: sun4u
> Application architecture: sparc
> Hardware provider: Sun_Microsystems
> Domain:
> Kernel version: SunOS 5.10 Generic_144488-17
>
> Does

Re: [prometheus-users] Timestamp of Duration Metric of a Periodic Task

2021-03-18 Thread Chris Siebenmann
> Prometheus' answer is to construct time series of the number of runs, and cumulative run time, starting at some arbitrary point in time (together these are a summary). By looking at the change in these numbers over time, we can calculate the duty cycle (what fraction of time is spent

Re: [prometheus-users] Alerts history

2021-03-15 Thread Chris Siebenmann
> Is there any way to get the alerts history like currently I am able to see the alerts which are trigger and those are able to see in grafana dashboard by using grafana plugins and now I want to see past alerts also like 1 days before or n no of days before alerts how could it possibly.

Re: [prometheus-users] NFS IO Stats.

2020-11-30 Thread Chris Siebenmann
> Hi. I have enabled the mountstats, nfs and nfsd collectors for the NFS side metrics but there is a plethora of metrics without any proper documentation. Can somebody help with which metrics would give me the reads and writes completed on a particular mount? Like we have

Re: [prometheus-users] How to query a pushgateway for metrics via Python?

2020-11-27 Thread Chris Siebenmann
> I have a case where I probably have to push say 200 metrics from Python to a pushgateway->prometheus->grafana system - it rocks. However the 200 metrics might later be reduced to 170 metrics, where at that point I need to query the pushgateway for the obtained metrics - and delete the 30

Re: [prometheus-users] PromQL query to find the duration of each firing alert

2020-11-27 Thread Chris Siebenmann
> I am looking for a solution to calculate the total duration of each firing alert since it started firing. Following is the query I tried, but I see the value for all the firing alerts is 86400
>
> (avg_over_time(customer_ALERTS{alertstate="firing",severity="critical"}[24h])) *24 * 3600

Re: [prometheus-users] Alerts resolved upon prometheus crash

2020-03-09 Thread Chris Siebenmann
These days alerts time out faster than this, and the timeout is controlled by Prometheus instead of by Alertmanager. If you look at an active alert in Alertmanager, you'll see an 'endsat' value (or a similar-sounding label) that's a couple of minutes into the future. Prometheus sets that in

Re: [prometheus-users] Alertmanager spams messages on slack

2020-02-18 Thread Chris Siebenmann
> I am using alertmanager to post alerts on slack. Here is the configuration of my alert:
>
> expr:
> for: 60m
>
> Here are the settings on my alertmanager:
>
> global:
>   resolve_timeout: 5m
> route:
>   group_by: ['alertname', 'cluster']
>   group_interval: 5m
>   group_wait: 30s

Re: [prometheus-users] Re: sometimes I just received a resolved email but not firing email

2020-02-18 Thread Chris Siebenmann
> If I understand correctly, prometheus doesn't send any "resolved" message to alertmanager: it just stops sending alerts. Alertmanager treats the lack of alert as meaning "resolved".
>
> Therefore, if you receive the "resolved" message, then this proves that alertmanager must have received