Monitoring TCP ports (e.g. for SSH) and ICMP (ping) with blackbox_exporter works fine.

"but black box exporter detect the recover behavior after about 5mins"

blackbox_exporter only performs a single test each time you scrape it. It does not by itself do any recovery detection. The problem is therefore most likely with your Prometheus scrape config or your Alertmanager config.

If you're having a problem, you'll need to be more specific:

* show your blackbox_exporter config, your Prometheus scrape config which scrapes it, your alerting rules, and your Alertmanager config (if using Alertmanager)
* describe more clearly the behaviour you're seeing, and what you expected to see. (For example, are you waiting for a "recovery" email from Alertmanager?)

"And after the IP table is recovered, the alert for Ping can be cleared after about 20mins, but SSH is still there."

Either SSH is working and reachable, or it is not. You can check the results of blackbox_exporter tests by hand using curl, and also get additional debugging information, like this:

curl -g 'http://127.0.0.1:9115/probe?module=xxx&target=yyyy&debug=true'

Here is an example:

# curl -g 'http://localhost:9115/probe?module=icmp&target=1.2.3.4&debug=true'
Logs for the probe:
ts=2022-04-20T12:25:11.587855449Z caller=main.go:320 module=icmp target=1.2.3.4 level=info msg="Beginning probe" probe=icmp timeout_seconds=3
ts=2022-04-20T12:25:11.588014456Z caller=icmp.go:91 module=icmp target=1.2.3.4 level=info msg="Resolving target address" ip_protocol=ip6
ts=2022-04-20T12:25:11.588065658Z caller=icmp.go:91 module=icmp target=1.2.3.4 level=info msg="Resolving target address" ip_protocol=ip4
ts=2022-04-20T12:25:11.588098688Z caller=icmp.go:91 module=icmp target=1.2.3.4 level=info msg="Resolved target address" ip=1.2.3.4
ts=2022-04-20T12:25:11.588133368Z caller=main.go:130 module=icmp target=1.2.3.4 level=info msg="Creating socket"
ts=2022-04-20T12:25:11.588188673Z caller=main.go:130 module=icmp target=1.2.3.4 level=debug msg="Unable to do unprivileged listen on socket, will attempt privileged" err="socket: permission denied"
ts=2022-04-20T12:25:11.58829848Z caller=main.go:130 module=icmp target=1.2.3.4 level=info msg="Creating ICMP packet" seq=24581 id=190
ts=2022-04-20T12:25:11.588348917Z caller=main.go:130 module=icmp target=1.2.3.4 level=info msg="Writing out packet"
ts=2022-04-20T12:25:11.588470176Z caller=main.go:130 module=icmp target=1.2.3.4 level=info msg="Waiting for reply packets"
ts=2022-04-20T12:25:14.588761946Z caller=main.go:130 module=icmp target=1.2.3.4 level=debug msg="Cannot get TTL from the received packet. 'probe_icmp_reply_hop_limit' will be missing."
ts=2022-04-20T12:25:14.588979317Z caller=main.go:130 module=icmp target=1.2.3.4 level=warn msg="Timeout reading from socket" err="read ip 0.0.0.0: raw-read ip4 0.0.0.0: i/o timeout"
ts=2022-04-20T12:25:14.589247538Z caller=main.go:320 module=icmp target=1.2.3.4 level=error msg="Probe failed" duration_seconds=3.001307309

Metrics that would have been returned:
# HELP probe_dns_lookup_time_seconds Returns the time taken for probe dns lookup in seconds
# TYPE probe_dns_lookup_time_seconds gauge
probe_dns_lookup_time_seconds 0.000116077
# HELP probe_duration_seconds Returns how long the probe took to complete in seconds
# TYPE probe_duration_seconds gauge
probe_duration_seconds 3.001307309
# HELP probe_icmp_duration_seconds Duration of icmp request by phase
# TYPE probe_icmp_duration_seconds gauge
probe_icmp_duration_seconds{phase="resolve"} 0.000116077
probe_icmp_duration_seconds{phase="rtt"} 0
probe_icmp_duration_seconds{phase="setup"} 0.000212886
# HELP probe_ip_addr_hash Specifies the hash of IP address. It's useful to detect if the IP address changes.
# TYPE probe_ip_addr_hash gauge
probe_ip_addr_hash 3.268949123e+09
# HELP probe_ip_protocol Specifies whether probe ip protocol is IP4 or IP6
# TYPE probe_ip_protocol gauge
probe_ip_protocol 4
# HELP probe_success Displays whether or not the probe was a success
# TYPE probe_success gauge
probe_success 0

Module configuration:
prober: icmp
timeout: 3s
http:
  ip_protocol_fallback: true
  follow_redirects: true
tcp:
  ip_protocol_fallback: true
icmp:
  ip_protocol_fallback: true
dns:
  ip_protocol_fallback: true

Look at "probe_success" for the overall result: 1 means the probe succeeded, 0 (as in this example) means it failed.

You can also use the expression browser in the Prometheus web interface: enter "probe_success" as the query and look at the graph tab. You'll see the history of your blackbox_exporter probes.

On Wednesday, 20 April 2022 at 12:37:17 UTC+1 [email protected] wrote:

> Hi guys,
>
> We are using black box exporter to monitor ssh and ping.
>
> For ssh, (we monitor the port 22) if we stop sshd service, actually the
> service will be auto-recovered, but black box exporter detect the recover
> behavior after about 5mins.
>
> For ping, we use icmp module to monitor system ping, we deleted the IP
> tables, then Prometheus triggered 2 alerts, one is SSH is failed, the other
> is Ping is failed. And after the IP table is recovered, the alert for Ping
> can be cleared after about 20mins, but SSH is still there.
>
> So it is a good approach to use blackbox exporter to monitor SSH and PING?
>

--
You received this message because you are subscribed to the Google Groups "Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/49ff724b-5ed5-414c-b5f8-4f7353b1fc57n%40googlegroups.com.
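
As a follow-up to the point above that blackbox_exporter only probes when it is scraped: how quickly a failure or recovery shows up is decided on the Prometheus side. Here is a minimal sketch of a scrape config and an alerting rule where those timings live. This is not your config, which I haven't seen. The target address 192.0.2.10, the job names, the rule file name, the alert name and all durations are made up for illustration, and it assumes the exporter listens on 127.0.0.1:9115 with the "tcp_connect" and "icmp" modules from the example blackbox.yml.

```yaml
# prometheus.yml (sketch; addresses, job names and intervals are hypothetical)
global:
  scrape_interval: 30s       # each scrape triggers exactly one probe per target
  evaluation_interval: 30s   # how often alerting rules are evaluated

rule_files:
  - blackbox_rules.yml       # hypothetical file name, shown in the next block

scrape_configs:
  - job_name: blackbox_ssh
    metrics_path: /probe
    params:
      module: [tcp_connect]  # plain TCP connect to port 22
    static_configs:
      - targets:
          - 192.0.2.10:22    # hypothetical host:port to probe
    relabel_configs:
      # Pass the listed target as ?target=..., keep it as the instance label,
      # and send the actual scrape to blackbox_exporter itself.
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 127.0.0.1:9115

  - job_name: blackbox_icmp
    metrics_path: /probe
    params:
      module: [icmp]
    static_configs:
      - targets:
          - 192.0.2.10       # hypothetical host to ping
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 127.0.0.1:9115
```

```yaml
# blackbox_rules.yml (sketch; alert name, duration and labels are hypothetical)
groups:
  - name: blackbox
    rules:
      - alert: ProbeFailed
        expr: probe_success == 0
        for: 1m              # must fail continuously this long before firing
        labels:
          severity: warning
        annotations:
          summary: "Probe of {{ $labels.instance }} ({{ $labels.job }}) failed"
```

With settings like these, the alert would fire roughly one to two minutes after the first failed probe and would stop firing at the first rule evaluation that sees probe_success back at 1. Anything much longer than that, such as the 5 or 20 minutes you mention, would more likely come from a long scrape_interval, a long "for:" value, or Alertmanager settings such as group_wait, group_interval and repeat_interval. Also note that a receiver only sends recovery notifications if send_resolved is enabled for it.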

