Just be aware that you can end up with very noisy data. Something that looks like a failure could easily be a transient issue - a failed scrape, etc.
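One way to cut down on that noise (a sketch, not something from this thread) is to alert only when every sample in a short window shows the target down, so a single failed scrape can't fire the alert by itself:

    - alert: InstanceDown
      # max_over_time is 0 only if up was 0 for every scrape
      # in the last 30s - one transient failed scrape won't trigger it
      expr: max_over_time(up[30s]) == 0
      labels:
        severity: page

The window length is a trade-off: a longer window means fewer false positives but slower detection.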
On 22 June 2020 20:46:28 BST, "Sébastien Dionne" <[email protected]> wrote:
>thanks
>
>in my case, the alerts will be sent to our healthManager, which updates
>the state of our application in the database. No human interaction.
>
>I thought of using a script with a liveness probe, and the script could
>send a POST to our healthManager... but in the end it's the same thing,
>because the liveness probe will run every 5 seconds or so. So I prefer
>to use the metrics that Prometheus will scrape anyway.
>
>On Monday, June 22, 2020 at 3:42:14 PM UTC-4, Stuart Clark wrote:
>>
>> While it is definitely possible to have very low scrape intervals and
>> very sensitive alerts, that often results in poor outcomes.
>>
>> The reality is that reaction times to alerts are generally fairly
>> long - an alert outside of office hours could easily take 30 minutes
>> or longer to respond to. I'd suggest being very careful about such
>> short "for" intervals. You can very easily end up with a lot of false
>> positives, with alerts which fire then resolve, fire then resolve.
>>
>> But technically you can have scrape intervals of a second or less,
>> and "for"s of a few seconds.
>>
>> On 22 June 2020 20:08:08 BST, "Sébastien Dionne" <[email protected]> wrote:
>>>
>>> I want to use Prometheus + Alertmanager as a health manager. I want
>>> to know what is the lowest value I can use for scraping metrics (I
>>> hope I can have a config for particular rules) and send alerts as
>>> soon as there are alerts. I need almost real time. Is that possible
>>> with Prometheus + Alertmanager?
>>>
>>> I have a sample config that works now, but is it possible to have 1s
>>> or something, so that Prometheus sends the alert as soon as the
>>> metric is read?
>>>
>>> serverFiles:
>>>   alerts:
>>>     groups:
>>>       - name: Instances
>>>         rules:
>>>           - alert: InstanceDown
>>>             expr: up == 0
>>>             for: 10s
>>>             labels:
>>>               severity: page
>>>             annotations:
>>>               description: '{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 10 seconds.'
>>>               summary: 'Instance {{ $labels.instance }} down'
>>>
>>> alertmanagerFiles:
>>>   alertmanager.yml:
>>>     route:
>>>       receiver: default-receiver
>>>       group_wait: 5s
>>>       group_interval: 10s
>>>
>>>     receivers:
>>>       - name: default-receiver
>>>         webhook_configs:
>>>           - url: "https://webhook.site/815a0b0b-f40c-4fc2-984d-e29cb9606840"
>>>
>>
>> --
>> Sent from my Android device with K-9 Mail. Please excuse my brevity.
>
> --
> You received this message because you are subscribed to the Google
> Groups "Prometheus Users" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/prometheus-users/f1787055-a91f-491b-8eaf-0a8fec9aca00o%40googlegroups.com.
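For reference (not part of the original thread): Prometheus does support per-job scrape intervals, which is what the "config for particular rules" question is getting at. A sketch of a prometheus.yml, with hypothetical job and target names:

    global:
      scrape_interval: 15s          # default for most jobs
      evaluation_interval: 15s

    scrape_configs:
      - job_name: health-critical   # hypothetical fast-path job
        scrape_interval: 1s         # overrides the global default
        static_configs:
          - targets: ['app:8080']   # hypothetical target

Similarly, a rule group can set its own evaluation interval, so only the latency-sensitive rules are evaluated every second:

    groups:
      - name: Instances
        interval: 1s                # evaluate this group every second
        rules:
          - alert: InstanceDown
            expr: up == 0
            for: 10s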

