On Saturday, 7 November 2020 13:35:47 UTC, Yagyansh S. Kumar wrote:
>
>> Try looking at scrape_duration_seconds{job="Ping-All-Servers"}. Maybe
>> it's borderline to the scrape interval.
>
> That's interesting. Here are the top 20 scrape_duration_seconds maxed
> for last 1 hour by instance. Close to 5 seconds. Can this lead to some
> issue?
>
Possibly. Maybe the scrape timeout handling has changed slightly between
those versions of Prometheus. I would in any case be concerned about the
scrape duration being so close to the scrape interval, although failed
scrapes should still show as "up == 0".

However, I note that the scrape.yml you posted shows the Ping-All-Servers
job with a scrape interval of 10s, not 5s.
I also notice your module config has:

  icmp_prober:
    prober: icmp
    timeout: 30s
    icmp:
      preferred_ip_protocol: ip4

I *think* the timeout is clipped to just under the scrape interval, so it
should work, but I'd be inclined to set it lower anyway (say 3s); if you
don't get a reply within 3s, you're unlikely to get one.
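
For what it's worth, here is a sketch of the module with the shorter timeout. Only the timeout value differs from what you posted, and 3s is just my suggestion, not a tested figure. I'm assuming the module sits under the usual top-level "modules:" key of the blackbox_exporter config.

```yaml
modules:
  icmp_prober:
    prober: icmp
    timeout: 3s   # was 30s; suggested value, tune to your network
    icmp:
      preferred_ip_protocol: ip4
```

With a 10s scrape interval this leaves plenty of headroom, so the probe's own timeout decides when it gives up rather than the scrape deadline.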

Since this test only sends a single ping per scrape, I would *expect* it to
fail from time to time, and hence the alert to go into "pending" state and
only fire if the failure persists until the "for: 1m" has run its course.
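
To illustrate, here is a minimal sketch of the kind of rule I have in mind. I haven't seen your actual alerting rule, so the group name, alert name and expression are hypothetical; only the job name and the "for: 1m" come from this thread. probe_success is the metric blackbox_exporter exposes for each probe.

```yaml
groups:
  - name: ping-alerts                # hypothetical group name
    rules:
      - alert: PingFailed            # hypothetical alert name
        # Assumed expression; your real rule may differ.
        expr: probe_success{job="Ping-All-Servers"} == 0
        # The alert stays "pending" until the expression has been
        # true continuously for this long, then it fires.
        for: 1m
```

With a rule like this, one lost ping makes the expression true at the next rule evaluation and the alert shows as pending. If the following probe succeeds, it drops back to inactive without ever firing.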