This seems like an interesting approach. If possible, could you please give some more insight into it?
On Friday, July 3, 2020 at 10:56:01 AM UTC+5:30 [email protected] wrote:

> The other proper way is to dynamically generate alerts where you hardcode
> the thresholds based on labels.
> Like using a combo of yaml/jinja to store the thresholds in a
> maintainable format and have one command to regenerate everything.
> Every time you want to change a value you just regenerate the alerts.
>
> On Thu, Jul 2, 2020 at 8:38 PM Yagyansh S. Kumar <[email protected]>
> wrote:
>
>> Also, currently, I have only tried a single way to give a custom
>> threshold, i.e. based on the component name. For example, all the
>> targets under Comp-A have a threshold of 99.9 and all the targets under
>> Comp-B have a threshold of 95.
>> But now, I have to give a common custom threshold, let's say 98, to 5
>> different targets, all of which belong to 5 different components. All
>> 5 components have more than 1 target, but I want the custom threshold to
>> be applied to only a single target from each component.
>>
>> On Fri, Jul 3, 2020 at 12:02 AM Yagyansh S. Kumar <[email protected]>
>> wrote:
>>
>>> Hi Christian,
>>>
>>> Actually, I want to know if there is any better way to define the
>>> threshold for my 5 new servers that belong to 5 different components. Is
>>> writing 5 different recording rules with the same name, and different
>>> instance and component labels, the only way to proceed here? Won't that
>>> be a little too dirty to maintain? What if it was 20 servers, all
>>> belonging to a different component?
>>>
>>> On Tue, Jun 30, 2020 at 11:43 AM Christian Hoffmann <
>>> [email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> On 6/24/20 8:09 PM, [email protected] wrote:
>>>> > Hi. Currently I am using a custom threshold for my Memory alerts.
>>>> > I have 2 main labels for every node exporter target: cluster and
>>>> > component.
>>>> > My custom threshold till now has been based on the component, as I
>>>> > had to define that particular custom threshold for all the servers
>>>> > of the component. But now, I have 5 instances, all from different
>>>> > components, and I have to set the threshold as 97. How do I approach
>>>> > this?
>>>> >
>>>> > My typical node exporter job.
>>>> >   - job_name: 'node_exporter_JOB-A'
>>>> >     static_configs:
>>>> >     - targets: [ 'x.x.x.x:9100' , 'x.x.x.x:9100']
>>>> >       labels:
>>>> >         cluster: 'Cluster-A'
>>>> >         env: 'PROD'
>>>> >         component: 'Comp-A'
>>>> >     scrape_interval: 10s
>>>> >
>>>> > Recording rule for custom thresholds.
>>>> >   - record: abcd_critical
>>>> >     expr: 99.9
>>>> >     labels:
>>>> >       component: 'Comp-A'
>>>> >
>>>> >   - record: xyz_critical
>>>> >     expr: 95
>>>> >     labels:
>>>> >       node: 'Comp-B'
>>>> >
>>>> > The expression for Memory Alert.
>>>> > ((node_memory_MemTotal_bytes - node_memory_MemFree_bytes -
>>>> > node_memory_Cached_bytes) / node_memory_MemTotal_bytes * 100) *
>>>> > on(instance) group_left(nodename) node_uname_info > on(component)
>>>> > group_left() (*abcd_critical* or *xyz_critical* or on(node) count by
>>>> > (component)((node_memory_MemTotal_bytes - node_memory_MemFree_bytes -
>>>> > node_memory_Cached_bytes) / node_memory_MemTotal_bytes * 100) * 0 + 90)
>>>> >
>>>> > Now, I have 5 servers with different components. How to include that
>>>> > in the most optimized manner?
>>>>
>>>> This looks almost like the pattern described here:
>>>> https://www.robustperception.io/using-time-series-as-alert-thresholds
>>>>
>>>> It looks like you already tried to integrate the two different ways to
>>>> specify thresholds, right? Is there any specific problem with it?
>>>>
>>>> Sadly, this pattern quickly becomes complex, especially if nested (like
>>>> you would need to do) and if combined with an already longer query (like
>>>> in your case).
>>>>
>>>> I can only suggest to try to move some of the complexity out of the
>>>> query (e.g. by moving the memory calculation to a recording rule
>>>> instead).
>>>>
>>>> You can also split the rule into multiple rules (with the same name).
>>>> You will just have to ensure that they only ever fire for a subset of
>>>> your instances (e.g. the first variant would only fire for
>>>> component-based thresholds, the second only for instance-based
>>>> thresholds).
>>>>
>>>> Hope this helps.
>>>>
>>>> Kind regards,
>>>> Christian

--
You received this message because you are subscribed to the Google Groups "Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/a0526d89-d0f9-4f39-b319-741323e65885n%40googlegroups.com.
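
For anyone following the thread, here is a rough sketch of what the "generate the alerts from a threshold table" idea could look like. It is not the exact setup anyone in the thread uses: it swaps yaml/jinja for plain Python with only the standard library, and the metric name `instance:node_memory_used:percent` (an assumed recording rule holding the memory percentage), the alert name, the severity label and the instance addresses are all made up for illustration. It emits one alert rule per instance override, one per component, and one default rule, all with the same alert name, and excludes the overridden instances from the broader rules so each target is covered by exactly one threshold. The output is JSON, which YAML parsers generally accept, so it should be loadable as a rule file, but please verify that against your own Prometheus version.

```python
import json

# Hypothetical names and values, for illustration only.
METRIC = "instance:node_memory_used:percent"  # assumed recording rule
DEFAULT = 90
BY_COMPONENT = {"Comp-A": 99.9, "Comp-B": 95}
BY_INSTANCE = {"10.0.0.1:9100": 97, "10.0.0.2:9100": 97}


def selector(matchers):
    """Render (label, operator, value) triples as a PromQL selector."""
    inner = ", ".join(f'{label}{op}"{value}"' for label, op, value in matchers)
    return METRIC + "{" + inner + "}"


def build_rules():
    """Build one alert rule per threshold, all sharing the same alert name."""
    # Instances with their own threshold are excluded from every broader rule.
    not_overridden = [("instance", "!=", inst) for inst in sorted(BY_INSTANCE)]
    exprs = []
    for inst, limit in sorted(BY_INSTANCE.items()):
        exprs.append(selector([("instance", "=", inst)]) + f" > {limit}")
    for comp, limit in sorted(BY_COMPONENT.items()):
        matchers = [("component", "=", comp)] + not_overridden
        exprs.append(selector(matchers) + f" > {limit}")
    # Everything without a component or instance threshold gets the default.
    others = [("component", "!=", comp) for comp in sorted(BY_COMPONENT)]
    exprs.append(selector(others + not_overridden) + f" > {DEFAULT}")
    return [
        {"alert": "HighMemoryUsage", "expr": expr, "labels": {"severity": "critical"}}
        for expr in exprs
    ]


def render():
    """Serialize the rule group as JSON."""
    group = {"name": "memory", "rules": build_rules()}
    return json.dumps({"groups": [group]}, indent=2)


if __name__ == "__main__":
    print(render())
```

Changing a threshold then means editing the tables at the top and re-running the script, so going from 5 special-cased servers to 20 is just 15 more entries in `BY_INSTANCE`.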

