Try looking at scrape_duration_seconds{job="Ping-All-Servers"}. Maybe it's
close to the scrape interval (or the scrape timeout).
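For example, something like this would show the worst case over the last
hour (the 1h window is just my guess at a useful range, adjust as needed):

```
max_over_time(scrape_duration_seconds{job="Ping-All-Servers"}[1h])
```

If that value sits close to your configured scrape_timeout, the scrapes are
probably timing out intermittently.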
What does min_over_time(up{job="Ping-All-Servers"}[5m]) show? In other
words, is it the scrape to BBE which is failing, or the BBE probe? (I think
the latter).
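Running these two side by side should separate the cases. I'm assuming the
job exposes the usual probe_success metric from blackbox_exporter, and that
a 5m window covers several of your scrapes:

```
# 0 here means Prometheus failed to scrape BBE itself at least once
min_over_time(up{job="Ping-All-Servers"}[5m])

# 0 here means the scrape worked but BBE reported a failed probe
min_over_time(probe_success{job="Ping-All-Servers"}[5m])
```

If up stays at 1 while probe_success drops to 0, the problem is the probe,
not the scrape.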
Do the two Prometheus servers reach BBE over different network paths?
It still bothers me that BBE is logging panics. Something weird is going
on in your BBE. Could even be a hardware problem.
I think you should paste your entire scrape config and BBE config, in case
something else jumps out.