I don't think your use case is something that AM or Prometheus is looking to solve.
The way I see it:

- Prometheus has metrics and alarm patterns.
- It triggers an alarm and sends it to AM.
- AM receives the alarm and does some basic routing based on labels.
- Once the Prometheus pattern becomes false, a recovery is sent.

That's pretty much it. There is no concept of escalation, end-to-end service
recovery, or service mapping inside AM or Prometheus.

In theory, you could have a "fake" alarm, where you send some JSON to AM with
specific flags, and that triggers a specific route to send the SMS/email to
the appropriate recipients. But I don't think it's really part of the core
purpose of AM. It's one of the values of a service like PagerDuty. But that
still relies on Prometheus metric --> Prometheus alert triggered --> AM alert
received --> AM sends the alert somewhere.

Just my 2 cents :)
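
To make that "fake alarm" idea a bit more concrete, here's a rough sketch of
what such a POST to AM could look like. This assumes Alertmanager's v2 API on
localhost:9093, and every label and annotation value below is made up -- and
again, I wouldn't build an escalation workflow on this:

    # Sketch only -- a "fake" alarm pushed straight at Alertmanager.
    # Assumes Alertmanager's v2 API at localhost:9093; all label and
    # annotation values are made up. Remember AM treats the alert as
    # resolved once endsAt passes and nothing re-sends it.
    import datetime
    import requests

    ALERTMANAGER_URL = "http://localhost:9093/api/v2/alerts"

    def send_fake_alert():
        now = datetime.datetime.now(datetime.timezone.utc)
        alert = {
            # Routing in AM is done purely on these labels.
            "labels": {
                "alertname": "ManualEscalation",
                "severity": "page",
                "team": "product-engineering",
            },
            "annotations": {
                "summary": "Issue escalated by L2 support",
            },
            "startsAt": now.isoformat(),
            "endsAt": (now + datetime.timedelta(hours=1)).isoformat(),
        }
        # The v2 endpoint takes a JSON *array* of alerts.
        response = requests.post(ALERTMANAGER_URL, json=[alert], timeout=5)
        response.raise_for_status()

    if __name__ == "__main__":
        send_fake_alert()
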
On Wed, Dec 16, 2020 at 11:21 AM Al <[email protected]> wrote:

> Thanks for the quick response Stuart. One of our specific use cases
> (although there will be more over time) would be something where a first
> or second level support team escalates an issue they can't solve to the
> engineers responsible for the product. In this case, there would be no
> metric, as this is an event that could happen at any time for which we
> don't really want a metric. Triggering the alert via Alertmanager seemed
> a logical choice as it already handles the logic of the routing to the
> necessary destinations (email, webhook, VictorOps, etc.). All the user
> would have to do is run the amtool command with the necessary labels and
> wouldn't have to worry about any other specifics.
>
> Based on your explanation, I now understand Alertmanager can't really be
> used that way. Could you show me where in the AM source code it will
> close an alert unless it is continuously notified by Prometheus? I'd like
> to know for my own personal knowledge.
>
> Now having considered these facts, do you have any suggestions based on
> this example? Is this just something we should handle separately with
> another custom application? If that's the case, it's a bit discouraging,
> as that means we now have to handle the logic of alert routing in more
> than one location.
>
> Al
>
> On Monday, December 14, 2020 at 12:52:53 PM UTC-5 Stuart Clark wrote:
>
>> On 2020-12-14 17:05, Al wrote:
>> > Hi
>> >
>> > I realize alert conditions in a Prometheus ecosystem should be
>> > triggered from a Prometheus instance itself, although there is the
>> > "amtool alert add" command that can be used to manually trigger an
>> > alert. Is this something which is commonly used in production
>> > use-cases? I can see a benefit to using this command as I could still
>> > allow users to trigger alerts in a standardized way, but without
>> > having to have specific pre-defined alerting conditions. There may
>> > also be situations where there is no metric collected but only an
>> > alert to be triggered when a specific event occurs.
>> >
>> > From my understanding, when Prometheus fires an alert, it will send
>> > the payload to all instances of Alertmanager within the cluster and
>> > then they will handle which instance will actually route the alert to
>> > the final destination (e.g. VictorOps, email, webhook, etc.). If this
>> > is in fact correct, does this mean that amtool should also send the
>> > alert to all Alertmanager instances within the cluster?
>> >
>> > I appreciate any clarification you can provide me with.
>>
>> That command is only intended for testing. Alerts aren't a one-off API
>> call from Prometheus to Alertmanager. Instead, Prometheus will repeatedly
>> call every single Alertmanager periodically until the alert is cleared.
>> If Alertmanager stops receiving these updates, it will mark the alert as
>> resolved.
>>
>> Alerts in the Prometheus world are triggered based on the evaluation of
>> alerting rules, which themselves are queries that interrogate metrics.
>> Therefore every alert would be based on some sort of source metric
>> (there are a few exceptions, such as having an alert which always fires
>> to check the alerting pipeline, for example).
>>
>> For one of the example use cases you gave, you said an alert should be
>> triggered if an event happens. Prometheus itself isn't an event system,
>> but you can create metrics from events. So in that case you'd have a
>> metric that could be a counter of the number of events that have
>> happened. Then your alert would fire when that value increases (for
>> example).
>>
>> Are you able to give some more information on what use cases you are
>> trying to handle?
>>
>> --
>> Stuart Clark
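
And for what it's worth, a rough sketch of the "metric from events" approach
Stuart describes above, in case it helps. The metric name, labels, and port
are invented, and it assumes the official Python client with Prometheus
scraping the process:

    # Sketch only -- turning the "escalation event" into a counter metric,
    # along the lines Stuart describes. Metric name, labels, and port are
    # hypothetical; assumes the prometheus_client library and that
    # Prometheus scrapes this process.
    import time
    from prometheus_client import Counter, start_http_server

    ESCALATIONS = Counter(
        "support_escalations_total",
        "Issues escalated by the support team to engineering",
        ["team", "product"],
    )

    if __name__ == "__main__":
        start_http_server(8000)  # metrics served on http://<host>:8000/metrics
        # Call this wherever the event actually happens (ticket webhook, CLI, ...):
        ESCALATIONS.labels(team="l2-support", product="billing").inc()
        while True:
            time.sleep(60)

    # A matching alerting rule in Prometheus could then be something like:
    #
    #     expr: increase(support_escalations_total[15m]) > 0
    #
    # so the alert fires when the counter goes up, and Alertmanager routes it
    # on the team/product labels as usual.
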

