[prometheus-users] Use Case Assessment

Rishabh Arora Mon, 17 Oct 2022 00:01:46 -0700

Hello!

I'm currently implementing Prometheus along with
Alertmanager as our de facto solution for node health monitoring. We have a
Kubernetes, Kafka and MQTT setup, and for monitoring our infrastructure
Prometheus is an obvious fit.

We also have an application / business case where I'm wondering whether
Prometheus may be a reasonable solution. Our application needs to meet
certain SLAs. When those SLAs are not being met, alerts need to fire.
For example, consider the following case, which closely resembles our
real business case:

An *Order* schema in our system has a *payment* field which can be one of
['COMPLETED','FAILED','PENDING']. In our HA real-time system, we need to
fire alerts for Orders which are in a PENDING state. Our *Orders*
collection will potentially hold millions of rows. An order also has a
*paymentEngine* field, which represents the entity responsible for
processing the payment for the order.

Now, with Prometheus, the total count of PENDING Orders would be a
simple metric, but we are also interested in the Order IDs. For
instance, is there a way I could capture the PENDING order IDs in the
"metadata" or "payload" of the metric (I'm not sure of the right term)?
Downstream in Alertmanager, I'd also like to group by *paymentEngine* so
I could potentially inhibit alerts for an unstable engine.
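
To make the "simple metric" part concrete, here is a rough sketch of the
count I have in mind, broken down by *paymentEngine* and rendered in the
Prometheus text exposition format. The metric name orders_pending, the
sample rows and the helper names are made up for illustration, and the
order IDs are not part of this sketch:

```python
from collections import Counter

# Hypothetical sample rows; field names follow the Order schema described above.
ORDERS = [
    {"id": "o-1", "payment": "PENDING", "paymentEngine": "engineA"},
    {"id": "o-2", "payment": "COMPLETED", "paymentEngine": "engineA"},
    {"id": "o-3", "payment": "PENDING", "paymentEngine": "engineB"},
    {"id": "o-4", "payment": "PENDING", "paymentEngine": "engineA"},
]


def pending_by_engine(orders):
    """Count PENDING orders per paymentEngine."""
    return Counter(
        o["paymentEngine"] for o in orders if o["payment"] == "PENDING"
    )


def escape_label_value(value):
    """Escape backslash, double quote and newline in a label value."""
    return (
        value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    )


def render_metrics(orders, name="orders_pending"):
    """Render one gauge sample per paymentEngine as exposition text."""
    lines = [
        f"# HELP {name} Number of orders with payment=PENDING.",
        f"# TYPE {name} gauge",
    ]
    for engine, count in sorted(pending_by_engine(orders).items()):
        lines.append(
            f'{name}{{paymentEngine="{escape_label_value(engine)}"}} {count}'
        )
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_metrics(ORDERS), end="")
```

With the *paymentEngine* label on each sample, an alert built on this
metric would carry that label, which is what I would hope to group on.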

Can anyone please help me out? Apologies in advance for my naivety :)

Best,

