Using blackbox_exporter to monitor TCP ports (e.g. for SSH) and ICMP (ping) 
works fine.

"but black box exporter detect the recover behavior after about 5mins"

blackbox_exporter only performs a single test each time you scrape it.  It 
does not do any recovery detection by itself.  The problem is therefore most 
likely with your Prometheus scrape config or your Alertmanager config.
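Because every scrape is an independent test, one way to take Prometheus and 
Alertmanager out of the picture is to poll the exporter yourself while you 
stop and restart sshd.  Below is a minimal sketch, assuming the exporter 
listens on 127.0.0.1:9115 and that your blackbox.yml defines a TCP module 
called "tcp_connect"; the function names are made up for illustration.  If 
probe_success goes back to 1 within a few seconds of sshd coming back, the 
remaining delay is being added somewhere in Prometheus or Alertmanager.

```shell
#!/bin/sh
# Sketch: poll blackbox_exporter directly, bypassing Prometheus and
# Alertmanager, to see exactly when a probe starts succeeding again.
# Assumptions: exporter on 127.0.0.1:9115; module names come from YOUR blackbox.yml.
EXPORTER=${EXPORTER:-http://127.0.0.1:9115}

# Read exporter output on stdin and print the value of probe_success.
# HELP/TYPE lines start with "#", so only the sample line matches.
probe_success_value() {
  awk '$1 == "probe_success" { print $2 }'
}

# Probe every INTERVAL seconds (default 5) until interrupted with Ctrl-C.
# Usage: watch_probe MODULE TARGET [INTERVAL]   (hypothetical helper)
watch_probe() {
  module=$1
  target=$2
  interval=${3:-5}
  while :; do
    result=$(curl -s "$EXPORTER/probe?module=$module&target=$target" | probe_success_value)
    # Print "unknown" if the exporter could not be reached at all.
    echo "$(date -u +%H:%M:%S) probe_success=${result:-unknown}"
    sleep "$interval"
  done
}
```

For example, run "watch_probe tcp_connect myhost.example.com:22" (hypothetical 
target) in one terminal, stop sshd in another, and compare the timestamps 
with when your alert fires and clears.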

If you're having a problem, you'll need to be more specific:
* show your blackbox_exporter config, your prometheus scrape config which 
scrapes it, your alerting rules, and your alertmanager config (if using 
alertmanager)
* describe more clearly the behaviour you're seeing, and what you expected 
to see.  (For example, are you waiting for a "recovery" E-mail from 
alertmanager?)

"And after the IP table is recovered, the alert for Ping can be cleared 
after about 20mins, but SSH is still there."

Either SSH is working and reachable, or it is not.  You can check the 
results of blackbox_exporter tests by hand using curl, and also get 
additional debugging information, like this:

curl -g 'http://127.0.0.1:9115/probe?module=xxx&target=yyyy&debug=true'

Here is an example:

# curl -g 'http://localhost:9115/probe?module=icmp&target=1.2.3.4&debug=true'
Logs for the probe:
ts=2022-04-20T12:25:11.587855449Z caller=main.go:320 module=icmp target=1.2.3.4 level=info msg="Beginning probe" probe=icmp timeout_seconds=3
ts=2022-04-20T12:25:11.588014456Z caller=icmp.go:91 module=icmp target=1.2.3.4 level=info msg="Resolving target address" ip_protocol=ip6
ts=2022-04-20T12:25:11.588065658Z caller=icmp.go:91 module=icmp target=1.2.3.4 level=info msg="Resolving target address" ip_protocol=ip4
ts=2022-04-20T12:25:11.588098688Z caller=icmp.go:91 module=icmp target=1.2.3.4 level=info msg="Resolved target address" ip=1.2.3.4
ts=2022-04-20T12:25:11.588133368Z caller=main.go:130 module=icmp target=1.2.3.4 level=info msg="Creating socket"
ts=2022-04-20T12:25:11.588188673Z caller=main.go:130 module=icmp target=1.2.3.4 level=debug msg="Unable to do unprivileged listen on socket, will attempt privileged" err="socket: permission denied"
ts=2022-04-20T12:25:11.58829848Z caller=main.go:130 module=icmp target=1.2.3.4 level=info msg="Creating ICMP packet" seq=24581 id=190
ts=2022-04-20T12:25:11.588348917Z caller=main.go:130 module=icmp target=1.2.3.4 level=info msg="Writing out packet"
ts=2022-04-20T12:25:11.588470176Z caller=main.go:130 module=icmp target=1.2.3.4 level=info msg="Waiting for reply packets"
ts=2022-04-20T12:25:14.588761946Z caller=main.go:130 module=icmp target=1.2.3.4 level=debug msg="Cannot get TTL from the received packet. 'probe_icmp_reply_hop_limit' will be missing."
ts=2022-04-20T12:25:14.588979317Z caller=main.go:130 module=icmp target=1.2.3.4 level=warn msg="Timeout reading from socket" err="read ip 0.0.0.0: raw-read ip4 0.0.0.0: i/o timeout"
ts=2022-04-20T12:25:14.589247538Z caller=main.go:320 module=icmp target=1.2.3.4 level=error msg="Probe failed" duration_seconds=3.001307309



Metrics that would have been returned:
# HELP probe_dns_lookup_time_seconds Returns the time taken for probe dns lookup in seconds
# TYPE probe_dns_lookup_time_seconds gauge
probe_dns_lookup_time_seconds 0.000116077
# HELP probe_duration_seconds Returns how long the probe took to complete in seconds
# TYPE probe_duration_seconds gauge
probe_duration_seconds 3.001307309
# HELP probe_icmp_duration_seconds Duration of icmp request by phase
# TYPE probe_icmp_duration_seconds gauge
probe_icmp_duration_seconds{phase="resolve"} 0.000116077
probe_icmp_duration_seconds{phase="rtt"} 0
probe_icmp_duration_seconds{phase="setup"} 0.000212886
# HELP probe_ip_addr_hash Specifies the hash of IP address. It's useful to detect if the IP address changes.
# TYPE probe_ip_addr_hash gauge
probe_ip_addr_hash 3.268949123e+09
# HELP probe_ip_protocol Specifies whether probe ip protocol is IP4 or IP6
# TYPE probe_ip_protocol gauge
probe_ip_protocol 4
# HELP probe_success Displays whether or not the probe was a success
# TYPE probe_success gauge
probe_success 0



Module configuration:
prober: icmp
timeout: 3s
http:
    ip_protocol_fallback: true
    follow_redirects: true
tcp:
    ip_protocol_fallback: true
icmp:
    ip_protocol_fallback: true
dns:
    ip_protocol_fallback: true


Look at "probe_success" for the overall result (1 means the probe succeeded, 
0 means it failed).
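With debug=true the output is long.  Here is a small sketch that keeps only 
the warn/error log lines and the final probe_success value, assuming the same 
127.0.0.1:9115 address; the helper names are hypothetical.

```shell
#!/bin/sh
# Sketch: condense blackbox_exporter's debug=true output to the lines that
# usually explain a failure. Assumes the exporter listens on 127.0.0.1:9115.
EXPORTER=${EXPORTER:-http://127.0.0.1:9115}

# Read debug output on stdin; print warn/error log lines and the
# probe_success sample (its "# HELP" line does not start with probe_success).
probe_debug_summary() {
  grep -E 'level=(warn|error)|^probe_success '
}

# Usage: debug_probe MODULE TARGET   (hypothetical helper)
debug_probe() {
  curl -gs "$EXPORTER/probe?module=$1&target=$2&debug=true" | probe_debug_summary
}
```

Applied to the ICMP example above, this would keep the "Timeout reading from 
socket" warning, the "Probe failed" error and "probe_success 0".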

You can also use the PromQL browser in the Prometheus web interface: enter 
"probe_success" as the query and look at the graph tab. You'll see the 
history of your blackbox exporter probes.
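The same history is available from the Prometheus HTTP API if you would 
rather script it.  A minimal sketch, assuming Prometheus listens on 
localhost:9090; the 30m default window and the function names are arbitrary 
choices for illustration.

```shell
#!/bin/sh
# Sketch: fetch raw probe_success samples from the Prometheus HTTP API.
# Assumes Prometheus listens on localhost:9090.
PROMETHEUS=${PROMETHEUS:-http://localhost:9090}

# Build the PromQL range selector; RANGE defaults to 30m.
probe_history_expr() {
  echo "probe_success[${1:-30m}]"
}

# Usage: probe_history [RANGE]   e.g. probe_history 1h   (hypothetical helper)
# An instant query on a range selector returns every stored sample as JSON.
# -G plus --data-urlencode puts the URL-encoded query in the GET query string.
probe_history() {
  curl -sG "$PROMETHEUS/api/v1/query" \
    --data-urlencode "query=$(probe_history_expr "$1")"
}
```

Each series in the result has its own labels, so you can see exactly when 
each target switched between 0 and 1.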

On Wednesday, 20 April 2022 at 12:37:17 UTC+1 [email protected] wrote:

> Hi guys,
>
> We are using black box exporter to monitor ssh and ping.
>
> For ssh, (we monitor the port 22) if we stop sshd service, actually the 
> service will be auto-recovered, but black box exporter detect the recover 
> behavior after about 5mins.
>
> For ping, we use icmp module to monitor system ping, we deleted the IP 
> tables, then Prometheus triggered 2 alerts, one is SSH is failed, the other 
> is Ping is failed. And after the IP table is recovered, the alert for Ping 
> can be cleared after about 20mins, but SSH is still there.
>
> So it is a good approach to use blackbox exporter to monitor SSH and PING?
>

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/49ff724b-5ed5-414c-b5f8-4f7353b1fc57n%40googlegroups.com.
