Unfortunately this won't work for us, for several reasons:

* Our environment is highly dynamic; we take servers in and out of production frequently.
* We would then need an aggregator like thanos-query if we want to see our data in one place.
* We also want to avoid, as much as we can, time series that keep appearing and disappearing.
On Fri, Feb 19, 2021 at 10:48 PM Stuart Clark <[email protected]> wrote:

> On 19/02/2021 19:43, Badreddin Aboubakr wrote:
> >
> > Hello,
> >
> > We use Prometheus to monitor our infrastructure (hypervisors, gateways,
> > storage servers, etc.). Scrape targets are sourced from a Postgres
> > database, which contains additional information about the “in production”
> > state of the target. In the beginning we had a metadata metric which
> > indicated the state of the server as an `enum` metric.
> >
> > By joining the state metric in each alerting rule and then dropping the
> > alerts that have a specific state, we were able to suppress unneeded
> > alerts.
> >
> > With the growing number of alerting rules and states, joining on these
> > metrics in every alerting rule became so expensive that we wrote some
> > recording rules which keep evaluating the enum metric and produce an enum
> > metric with lower cardinality: production (where alerts shall pass to
> > their receivers) and everything else (which will be dropped at the
> > Alertmanager step).
> >
> > So again we join on these metrics and drop the alerts which are
> > non-production.
> >
> > This is not going to scale either, but it was a temporary solution while
> > our alerting rules keep growing.
> >
> > So we discussed some solutions:
> >
> > * We can set silences and remove them on state change using the
> >   Alertmanager API. This approach is too dynamic, however (I don’t know
> >   if the Alertmanager API was designed for this purpose; maybe it is).
> >   Will that scale with the number of silences and hosts?
> >
> > * We can develop a kind of proxy, deployed between Prometheus and
> >   Alertmanager, which drops alerts for hosts in a non-production state.
> >   This approach is dangerous: if the proxy fails, no alerts will reach
> >   Alertmanager.
> >
> > * We can put the proxy on the notification path instead. This makes it a
> >   bit more complicated, as the proxy has to understand receivers, etc.
> >
> > PS: We still want to scrape and monitor the servers which are not in the
> > production state.
> >
> > We would be really thankful for any suggestions or ideas.
>
> Couldn't you run two sets of Prometheus servers to monitor the production
> infrastructure separately from the non-production? Then just don't have
> alerting rules on, or Alertmanagers connected to, the non-production
> servers.
>
> --
> Stuart Clark

--
Badreddin Aboubakr
GAPS (IONOS Cloud)

