> Also, what would happen if e.g. there was a first scrape, which gets
> delayed > 0.002 s ... and before that first scrape arrives, there's yet
> another (later) scrape which has no jitter and is on time?
> Are they going to be properly ordered?
As far as I know, this can't happen. The
> The assumptions I've made are basically three:
> - Prometheus does that "faking" of sample times, and thus these are
> always on point with exactly the scrape interval between each.
> This in turn should mean, that if I have e.g. a scrape interval of
> 10s, and I do up[20s], then
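On the boundary question: in Prometheus 2.x a range selector covers the closed window [t - range, t], so both edges are inclusive (3.0 later changed this to a left-open interval). A minimal sketch of what that means for the quoted 10s-interval, up[20s] case, assuming perfectly aligned sample timestamps:

```python
def samples_in_closed_range(range_s, interval_s, offset_s):
    # Prometheus 2.x range selectors cover the closed window
    # [t - range_s, t]. offset_s is the distance from the query
    # timestamp t back to the most recent sample.
    count = 0
    ts = -offset_s
    while ts >= -range_s:
        count += 1
        ts -= interval_s
    return count

# With a 10s interval, up[20s] holds 3 samples when the query
# timestamp lands exactly on a scrape (both edges inclusive), but
# only 2 samples otherwise.
assert samples_in_closed_range(20, 10, 0) == 3
assert samples_in_closed_range(20, 10, 5) == 2
```

This edge-inclusiveness is the classic source of "one more sample than expected" surprises with aligned timestamps.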
> It seems that resolved messages are still thrown/received when an inhibited
> alert is resolved. Is there any way to squelch these as well? Or is this
> pretty much as intended?
Inhibitions normally stop notifications about resolved alerts. However,
I suspect that you may be running into a
> I am encountering challenges with configuring Prometheus and Alertmanager
> for my application's alarm system. Below are the configurations I am
> currently using:
>
> *prometheus.yml:*
>
> Scrape Interval: 1h
This scrape interval is far too long. Although it's not well documented,
you can't
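The usual reason an hourly interval breaks things is Prometheus's lookback window: an instant query only considers samples from roughly the last 5 minutes by default (the --query.lookback-delta setting), so a series scraped once an hour is invisible for most of that hour. A minimal sketch of that rule:

```python
def sample_visible(age_s, lookback_s=300):
    # An instant query only "sees" a series whose latest sample is
    # no older than the lookback delta (5 minutes by default).
    return age_s <= lookback_s

assert sample_visible(60)          # scraped a minute ago: visible
assert not sample_visible(1800)    # half-way through a 1h interval: gone
```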
> I noticed a somewhat unexpected behaviour, perhaps someone can explain why
> this happens.
>
> - on a Prometheus instance, with a scrape interval of 10s
> - doing the following queries via curl from the same node where Prometheus
> runs (so there cannot be any different system times or so)
>
>
> Having a quick look at the binary, it seems, that netgo build tag was
> applied:
>
> $ strings blackbox_exporter-0.24.0.linux-amd64/blackbox_exporter | egrep
> '\-tags.*net.*'
> build -tags=netgo
> build -tags=netgo
As a side note: if you have the Go toolchain available, you can use 'go
Alertmanager has supported inhibiting alert notifications for a long
time, and somewhat more recently it's added support for muting them
during specific time ranges (or only sending them in specific time
ranges). As far as I know, the behaviour (for both inhibited alerts and
muted alerts) is that
> As a reminder, my goal was:
> - if e.g. scrapes fail for 1m, a target-down alert shall fire (similar to
> how Icinga would put the host into down state, after pings failed for a
> number of seconds)
> - but even if a single scrape fails (which alone wouldn't trigger the above
> alert) I'd
> Will I run into issues with "staleness" if there aren't any metrics anymore
> for more than 5 minutes?
> Or perhaps can I use this "staleness" indicator in some way?
Perhaps this is a use for absent() or absent_over_time(), if you know
specific metrics that should always be present from the
> I am planning to store 3 years of data from 300 servers in a single
> prometheus server. The data will primarily consist of default exporter
> metrics and the server has 500G memory and 80 cores.
We currently scrape metrics from 908 different sources (from
'count(up)'), 153 of which are the
> In our DataCenter we have different security zones. In each zone I
> want to place a blackbox_exporter. The goal is that each
> blackbox_exporter monitors the same destinations, e.g. the same DNS server
> or same webserver. All exporters are controlled by one single
> prometheus server.
>
> If someone
> I don't know if this has been proposed before, so I'd like to raise it here
> before taking it to github or prometheus-developers.
>
> There are cases where binary operators could act between range vectors and
> scalars, but this is not currently allowed today (except by using
> subqueries,
> I understand that there can be some interpolation at the boundaries, but
> the value is not changing around the boundaries, it only changes in the
> middle of the time range. Scraping is done every 15s and the value of the
> metric is constant more than 1 minute before and after the boundaries.
> I have a counter and I want to count the number of occurrences over a
> duration (let's say 15m). I'm using delta() or increase() but I'm not getting
> the result I'm expecting.
>
> value @t0: 30242494
> value @t0+15m: 30609457
> calculated diff: 366963
> round(max_over_time(metric[15m])) -
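One common source of that mismatch is extrapolation: increase() does not simply subtract the first sample from the last; it scales the raw difference up to the full window, because the first and last samples rarely sit exactly on the window edges. A simplified model, using the counter values from the quoted mail as hypothetical samples (the real implementation also handles counter resets and caps the extrapolation):

```python
def raw_delta(samples):
    # samples: list of (timestamp_s, value) pairs, oldest first.
    return samples[-1][1] - samples[0][1]

def extrapolated_increase(samples, window_s):
    # Simplified model of PromQL increase(): scale the raw
    # difference over the sampled interval up to the full window.
    (t0, _), (tn, _) = samples[0], samples[-1]
    return raw_delta(samples) * window_s / (tn - t0)

# With 15s scrapes, the first and last samples typically sit some
# fraction of an interval inside a 15m (900s) window; here 7.5s.
samples = [(7.5, 30242494.0), (892.5, 30609457.0)]
assert raw_delta(samples) == 366963.0
assert round(extrapolated_increase(samples, 900)) == 373183
```

So a hand-calculated diff of 366963 and an increase() result a few thousand higher are both "correct"; they just answer slightly different questions.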
> I'm working with a metric like CPU usage, where instance identifiers
> are submitted as labels. To ensure instances are running as expected,
> I've defined an alert based on this metric. The alert triggers when
> the aggregation value (in my case, the increase) over a time window
> falls below
> Blackbox exporter does have a /metrics endpoint, but this is only for
> metrics internal to the operation of blackbox_exporter itself (e.g.
> memory stats, software version). You don't need to scrape this, but it
> gives you a little bit of extra info about how your exporter is
> performing.
> I've recently started monitoring a large fleet of hardware devices
> using a combination of blackbox, snmp, node, and json exporters. I
> started out using the *up* metric, but I noticed when using blackbox
> ping, *up* is *always* 1 even when the device is offline. So I plan to
> switch to
> Hi All, we have a Rancher-based setup. We are receiving the webhook
> payload from alertmanager, but alarms are not being generated at the
> webhook due to a format issue.
>
> For example we need to remove \
> Here we need
> *severity=warning*, instead of *severity=\"warning\"}",*
> *Can we update the
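The backslashes in severity=\"warning\" look like ordinary JSON string escaping rather than something Alertmanager adds: the webhook body is a JSON document, and the receiver should parse it instead of stripping characters. A minimal sketch with a hypothetical fragment of such a payload:

```python
import json

# Hypothetical fragment of an Alertmanager webhook body.
payload = '{"alerts": [{"labels": {"severity": "warning"}}]}'

labels = json.loads(payload)["alerts"][0]["labels"]
# Once parsed, the value carries no quotes or backslashes.
assert labels["severity"] == "warning"
```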
> This is a question that I have asked in the past in other channels, but
> I never found a solution, so I am trying again in case somebody knows of
> some tool or hack, I cannot be the only one with this need!
>
> I have a NodeJS application that I have developed for which I really
> need to
> Does anyone know how to set the time between firing and resolved
> alert?
>
> I thought it is resolve_timeout after the "global" line in
> alertmanager.yml, but it is not working. Is there something that needs
> to be configured additionally?
The resolve_timeout setting in Alertmanager is in practice
> Hi All,
>
> I've started looking into using pushgateway metrics.
> I tried a variation of the curl example found in the pushgateway git
> readme:
>
> cat <<EOF | curl --data-binary @- http://my-prometheus-host:9091/metrics/job/some_job/instance/some_instance
> # TYPE another_metric gauge
> # HELP another_metric Just an
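For reference, the README example pipes exposition-format text into curl with --data-binary. The same payload can be built (and pushed) from Python with only the standard library; the host, job, and metric names below are the ones from the quoted mail and purely illustrative:

```python
import urllib.request

def exposition(name, help_text, mtype, value):
    # Prometheus text exposition format; the body must end in a
    # newline or the Pushgateway rejects it.
    return (f"# HELP {name} {help_text}\n"
            f"# TYPE {name} {mtype}\n"
            f"{name} {value}\n").encode()

body = exposition("another_metric", "Just an example", "gauge", 2398.283)

# With a reachable Pushgateway you would then send it, e.g.:
# req = urllib.request.Request(
#     "http://my-prometheus-host:9091/metrics/job/some_job/instance/some_instance",
#     data=body, method="POST")
# urllib.request.urlopen(req)
```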
> Machine parameters that need to be connected to monitoring
>
> Release: 5.10
> Kernel architecture: sun4u
> Application architecture: sparc
> Hardware provider: Sun_Microsystems
> Domain:
> Kernel version: SunOS 5.10 Generic_144488-17
>
> Does
> Prometheus' answer is to construct time series of the number of runs,
> and cumulative run time, starting at some arbitrary point in time
> (together these are a summary). By looking at the change in these
> numbers over time, we can calculate the duty cycle (what fraction of
> time is spent
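The calculation described in that passage can be sketched directly: take the cumulative run-time counter at two points in time and divide its change by the change in wall-clock time:

```python
def duty_cycle(t1, busy1, t2, busy2):
    # busy1/busy2: cumulative run-time counter readings (seconds)
    # taken at wall-clock times t1 and t2.
    return (busy2 - busy1) / (t2 - t1)

# 25 s of run time accumulated over a 100 s window: 25% duty cycle.
assert duty_cycle(0.0, 0.0, 100.0, 25.0) == 0.25
```

In PromQL terms this is what a rate() over the cumulative run-time counter computes for you at every step.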
> Is there any way to get the alerts history? Currently I am able to see
> the alerts which are triggered, and those are visible in a grafana
> dashboard by using grafana plugins. Now I want to also see past alerts,
> e.g. alerts from 1 day before or n days before. How could that be
> done?
> Hi. I have enabled the mountstats, nfs and nfsd collectors for the NFS
> side metrics but there is a plethora of metrics without any proper
> documentation. Can somebody help with which metrics would give me
> the reads and writes completed on a particular mount? Like we have
>
> I have a case where I probably have to push say 200 metrics from
> Python to a pushgateway->prometheus->grafana system - it rocks.
> However, the 200 metrics might later be reduced to 170 metrics, where
> at that point I need to query the pushgateway for the obtained metrics
> - and delete the 30
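Worth knowing for this workflow: the Pushgateway manages metrics per grouping key, not individually. A PUT to a grouping-key URL replaces every metric in that group, so re-pushing the 170 current metrics with PUT drops the stale 30 automatically; a DELETE to the same URL removes the whole group. A sketch with a hypothetical host and job name:

```python
import urllib.request

# Hypothetical Pushgateway grouping-key URL.
url = "http://pushgateway.example.org:9091/metrics/job/some_job"

# DELETE removes *all* metrics under that grouping key; individual
# metrics inside a group cannot be deleted one by one.
req = urllib.request.Request(url, method="DELETE")
assert req.get_method() == "DELETE"
# urllib.request.urlopen(req)  # only with a reachable Pushgateway
```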
> I am looking for a solution to calculate the total duration of each
> firing alert since it started firing. Following is the query I tried,
> but I see the value for all the firing alerts is 86400
>
> (avg_over_time(customer_ALERTS{alertstate="firing",severity="critical"}[24h]))
> *24 * 3600
These days alerts time out faster than this, and the timeout is
controlled by Prometheus instead of by Alertmanager. If you look
at an active alert in Alertmanager, you'll see an 'endsAt' value
(or a similarly named field) that's a couple of minutes into the
future. Prometheus sets that in
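On the constant 86400: the likely reason is that the ALERTS series only has samples while the alert is firing (it goes stale once the alert resolves), and every one of those samples has the value 1. So avg_over_time over the range averages only 1-valued samples and always yields 1, and multiplying by 24 * 3600 always gives 86400 however long the alert actually fired. A minimal model of that arithmetic:

```python
def avg_over_time_times_window(samples, window_s):
    # avg_over_time only averages the samples that exist in the
    # window; ALERTS samples are always 1 while the alert fires.
    return sum(samples) / len(samples) * window_s

# Whether the alert fired for 3 minutes or 23 hours, the existing
# samples are all 1, so the result is the full window: 86400.
assert avg_over_time_times_window([1.0] * 12, 24 * 3600) == 86400.0
```

Summing the series instead (e.g. sum_over_time scaled by the scrape interval) counts only the samples that actually exist, which is closer to "seconds spent firing".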
> I am using alertmanager to post alerts on slack. Here is the configuration
> of my alert:
>
> expr:
> for: 60m
>
> Here are the settings on my alertmanager:
>
> global:
> resolve_timeout: 5m
> route:
> group_by: ['alertname', 'cluster']
> group_interval: 5m
> group_wait: 30s
>
> If I understand correctly, prometheus doesn't send any "resolved"
> message to alertmanager: it just stops sending alerts. Alertmanager
> treats the lack of alert as meaning "resolved".
>
> Therefore, if you receive the "resolved" message, then this proves
> that alertmanager must have received