I am getting "Empty Query Results" at the moment. I will check again when I notice the problem.
Thanks for your input!
Paras.

On Mon, Sep 19, 2022 at 4:03 PM Brian Candler <b.cand...@pobox.com> wrote:

> Are you collecting prometheus' own metrics? Something like this:
>
>   - job_name: prometheus
>     scrape_interval: 1m
>     static_configs:
>       - targets: ['localhost:9090']
>
> If you are, then there are various metrics you should check, including:
> prometheus_rule_evaluations_total
> prometheus_rule_evaluation_failures_total
> prometheus_rule_group_iterations_total
> prometheus_rule_group_iterations_missed_total
>
> For the rule / rule group in question, check which of these are
> incrementing during the problem period. If the 'failures' or 'missed' are
> incrementing, that points to a problem. Similarly if the
> 'evaluations_total' or 'iterations_total' *isn't* incrementing.
>
> Also, have a look at error output from prometheus while the problem is
> occurring:
> journalctl -fu prometheus
>
> On Monday, 19 September 2022 at 21:53:46 UTC+1 pradha...@gmail.com wrote:
>
>> Correct. Restarting prometheus does fix it.
>>
>> On Mon, Sep 19, 2022 at 3:44 PM Brian Candler <b.ca...@pobox.com> wrote:
>>
>>> "Restarting prometheus, alertmanager and blackbox-exporter fixes the
>>> issue"
>>>
>>> Which one of these fixes the issue? From what you've said, I am
>>> guessing that restarting only prometheus would do it - since you're saying
>>> you see no alerts in the Prometheus UI, not even in "pending" state.
>>>
>>> On Monday, 19 September 2022 at 21:39:11 UTC+1 pradha...@gmail.com
>>> wrote:
>>>
>>>> Prometheus: 2.38.0
>>>> Alertmanager: 0.24.0
>>>> Blackbox: 0.22.0
>>>>
>>>> probe_success{job="blackbox_icmp-server"} returns 0. I see it.
>>>>
>>>> Thanks
>>>> Paras.
>>>>
>>>> On Mon, Sep 19, 2022 at 3:32 PM Brian Candler <b.ca...@pobox.com>
>>>> wrote:
>>>>
>>>>> Prometheus version? Alertmanager version?
>>>>>
>>>>> What if you enter the query
>>>>> probe_success{job="blackbox_icmp-server"} == 0
>>>>> in the prometheus web interface (PromQL browser) while the problem is
>>>>> happening? Does it show any results?
>>>>>
>>>>> On Monday, 19 September 2022 at 19:21:29 UTC+1 pradha...@gmail.com
>>>>> wrote:
>>>>>
>>>>>> Hello Julius
>>>>>>
>>>>>> * The rule is something like this:
>>>>>>
>>>>>> - name: ServerDown
>>>>>>   rules:
>>>>>>   - alert: Server-InstanceDown
>>>>>>     expr: probe_success{job="blackbox_icmp-server"} == 0
>>>>>>     for: 1m
>>>>>>
>>>>>> * When alerting is not working, the hosts are down for hours until I
>>>>>> restart prometheus and the blackbox exporters. After restarting,
>>>>>> everything is normal.
>>>>>>
>>>>>> * The underlying metrics (probe_success) go to 0 when a host is down,
>>>>>> but the alerts don't change to Pending/Firing.
>>>>>>
>>>>>> Thanks
>>>>>> Paras.
>>>>>>
>>>>>> On Mon, Sep 19, 2022 at 2:35 AM Julius Volz <juliu...@promlabs.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Paras,
>>>>>>>
>>>>>>> Could you share more information about your setup:
>>>>>>>
>>>>>>> * What's the alerting rule that isn't working as intended?
>>>>>>> * For how long were the hosts down without getting alerted on?
>>>>>>> * What did the underlying metrics (e.g. "up" for the exporter's own
>>>>>>> scrape health and "probe_success" for the backend probe health)
>>>>>>> collected by the Blackbox Exporter look like at the time when the
>>>>>>> alert should have been firing, but didn't?
>>>>>>>
>>>>>>> One possibility is that your Blackbox exporter itself couldn't be
>>>>>>> scraped anymore, in which case its "up" metric would be 0 and the
>>>>>>> "probe_success" metric would be absent (and thus any alerts based on
>>>>>>> that metric would never fire).
>>>>>>>
>>>>>>> Regards,
>>>>>>> Julius
>>>>>>>
>>>>>>> On Thu, Sep 15, 2022 at 6:33 PM Paras pradhan <pradha...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> We use prometheus, alertmanager and blackbox-exporter to check
>>>>>>>> whether hosts respond to icmp. Host counts are 1K+. We noticed that
>>>>>>>> sometimes, at random, the alerts are not generated (prometheus
>>>>>>>> dashboard --> alerts) when the hosts/targets are actually down.
>>>>>>>> Restarting prometheus, alertmanager and blackbox-exporter fixes the
>>>>>>>> issue. I don't see anything that stands out in the logs. How do I
>>>>>>>> troubleshoot this, and is there anything like cached data in
>>>>>>>> prometheus that needs to be cleared?
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Paras.
>>>>>>>>
>>>>>>>> --
>>>>>>>> You received this message because you are subscribed to the Google
>>>>>>>> Groups "Prometheus Users" group.
>>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>>> send an email to prometheus-use...@googlegroups.com.
>>>>>>>> To view this discussion on the web visit
>>>>>>>> https://groups.google.com/d/msgid/prometheus-users/6bfb92dc-2a18-44d9-8fda-d6f84efba0e7n%40googlegroups.com
>>>>>>>> <https://groups.google.com/d/msgid/prometheus-users/6bfb92dc-2a18-44d9-8fda-d6f84efba0e7n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>>>> .
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Julius Volz
>>>>>>> PromLabs - promlabs.com
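The checks suggested in this thread (watching the rule evaluation counters, and watching for a missing "probe_success" series) could also be written as alerting rules. The following is only a rough sketch, not a tested configuration: the group name, the alert names, the 5m windows and the "for" durations are all assumptions made for illustration, and it assumes the "prometheus" self-scrape job shown earlier in the thread is in place so that the prometheus_rule_* metrics are actually collected.

```yaml
groups:
  - name: AlertingSelfCheck            # hypothetical group name
    rules:
      # Rule evaluations are failing (metric named in the thread).
      - alert: RuleEvaluationFailures  # hypothetical alert name
        expr: increase(prometheus_rule_evaluation_failures_total[5m]) > 0
        for: 5m                        # assumed duration
      # Rule group iterations are being skipped (metric named in the thread).
      - alert: RuleGroupIterationsMissed
        expr: increase(prometheus_rule_group_iterations_missed_total[5m]) > 0
        for: 5m
      # The iteration counter has stopped incrementing.
      - alert: RuleGroupNotIterating
        expr: increase(prometheus_rule_group_iterations_total[5m]) == 0
        for: 5m
      # probe_success is missing entirely for the job, in which case
      # "probe_success == 0" can never match and the original alert stays silent.
      - alert: BlackboxProbeMetricAbsent
        expr: absent(probe_success{job="blackbox_icmp-server"})
        for: 5m
```

Note that these rules are evaluated by the same Prometheus server, so if rule evaluation has stalled completely they may not fire either. In that case the expressions can still be entered by hand in the web interface while the problem is happening.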