Hi Christian,

Actually, I want to know whether there is a better way to define the threshold for my 5 new servers, which belong to 5 different components. Is writing 5 different recording rules with the same name but different instance and component labels the only way to proceed here? Won't that be a little too dirty to maintain? What if it were 20 servers, each belonging to a different component?
On Tue, Jun 30, 2020 at 11:43 AM Christian Hoffmann <[email protected]> wrote:

> Hi,
>
> On 6/24/20 8:09 PM, [email protected] wrote:
> > Hi. Currently I am using a custom threshold for my Memory alerts.
> > I have 2 main labels for every node exporter target: cluster and
> > component.
> > My custom threshold till now has been based on the component, as I had to
> > define that particular custom threshold for all the servers of the
> > component. But now I have 5 instances, all from different components,
> > and I have to set the threshold to 97. How do I approach this?
> >
> > My typical node exporter job.
> > - job_name: 'node_exporter_JOB-A'
> >   static_configs:
> >   - targets: [ 'x.x.x.x:9100' , 'x.x.x.x:9100']
> >     labels:
> >       cluster: 'Cluster-A'
> >       env: 'PROD'
> >       component: 'Comp-A'
> >   scrape_interval: 10s
> >
> > Recording rule for custom thresholds.
> > - record: abcd_critical
> >   expr: 99.9
> >   labels:
> >     component: 'Comp-A'
> >
> > - record: xyz_critical
> >   expr: 95
> >   labels:
> >     node: 'Comp-B'
> >
> > The expression for Memory Alert.
> > ((node_memory_MemTotal_bytes - node_memory_MemFree_bytes -
> > node_memory_Cached_bytes) / node_memory_MemTotal_bytes * 100) *
> > on(instance) group_left(nodename) node_uname_info > on(component)
> > group_left() (*abcd_critical* or *xyz_critical* or on(node) count by
> > (component)((node_memory_MemTotal_bytes - node_memory_MemFree_bytes -
> > node_memory_Cached_bytes) / node_memory_MemTotal_bytes * 100) * 0 + 90)
> >
> > Now, I have 5 servers with different components. How to include that in
> > the most optimized manner?
>
> This looks almost like the pattern described here:
> https://www.robustperception.io/using-time-series-as-alert-thresholds
>
> It looks like you already tried to integrate the two different ways to
> specify thresholds, right? Is there any specific problem with it?
>
> Sadly, this pattern quickly becomes complex, especially if nested (like
> you would need to do) and if combined with an already longer query (like
> in your case).
>
> I can only suggest trying to move some of the complexity out of the
> query (e.g. by moving the memory calculation to a recording rule instead).
>
> You can also split the rule into multiple rules (with the same name).
> You will just have to ensure that they only ever fire for a subset of
> your instances (e.g. the first variant would only fire for
> component-based thresholds, the second only for instance-based
> thresholds).
>
> Hope this helps.
>
> Kind regards,
> Christian
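
For illustration, the two suggestions quoted above (moving the memory calculation into a recording rule, and splitting the alert into several rules with the same name) could be sketched roughly as follows. This is only a sketch under assumptions, not a tested configuration: the rule names `instance:node_memory_used:percent`, `instance:memory_critical:threshold`, `component:memory_critical:threshold` and the alert name `MemoryUsageCritical` are made up for this example, and `x.x.x.x:9100` is the placeholder target from the original post. The `node_uname_info` join is left out to keep the expressions short.

```yaml
groups:
  - name: memory_thresholds            # hypothetical group name
    rules:
      # Memory usage in percent, computed once so the alert
      # expressions below stay short.
      - record: instance:node_memory_used:percent
        expr: |
          (node_memory_MemTotal_bytes - node_memory_MemFree_bytes - node_memory_Cached_bytes)
            / node_memory_MemTotal_bytes * 100

      # Component-wide threshold (value taken from the original post).
      - record: component:memory_critical:threshold
        expr: 99.9
        labels:
          component: 'Comp-A'

      # Per-instance threshold: one entry per special server, all
      # sharing the same metric name. The instance value is a placeholder.
      - record: instance:memory_critical:threshold
        expr: 97
        labels:
          instance: 'x.x.x.x:9100'

  - name: memory_alerts                # hypothetical group name
    rules:
      # Variant 1: only instances that have their own threshold.
      - alert: MemoryUsageCritical
        expr: |
          instance:node_memory_used:percent
            > on(instance) group_left()
          instance:memory_critical:threshold

      # Variant 2: all other instances, using the component threshold
      # where one exists and a default of 90 otherwise.
      - alert: MemoryUsageCritical
        expr: |
          (
            instance:node_memory_used:percent
              unless on(instance) instance:memory_critical:threshold
          )
            > on(component) group_left()
          (
            component:memory_critical:threshold
              or
            count by (component) (instance:node_memory_used:percent) * 0 + 90
          )
```

The `unless on(instance)` clause in the second variant is what keeps the two rules from firing for the same server: any instance with its own threshold series is removed before the component comparison. With this layout, adding a 6th or 20th special server means adding one more short `instance:memory_critical:threshold` entry, while the alert expressions stay unchanged.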

