Hi,

On 6/24/20 8:09 PM, [email protected] wrote:
> Hi. Currently I am using a custom threshold in case of my Memory alerts.
> I have 2 main labels for my every node exporter target - cluster and
> component.
> My custom threshold till now has been based on the component as I had to
> define that particular custom threshold for all the servers of the
> component. But now, I have 5 instances, all from different components
> and I have to set the threshold as 97. How do approach this?
> 
> My typical node exporter job.
>   - job_name: 'node_exporter_JOB-A'
>     static_configs:
>     - targets: [ 'x.x.x.x:9100' , 'x.x.x.x:9100']
>       labels:
>         cluster: 'Cluster-A'
>         env: 'PROD'
>         component: 'Comp-A'
>     scrape_interval: 10s
> 
> Recording rule for custom thresholds.
>   - record: abcd_critical
>     expr: 99.9
>     labels:
>       component: 'Comp-A'
> 
>   - record: xyz_critical
>     expr: 95
>     labels:
>       node: 'Comp-B'
> 
> The expression for Memory Alert.
> ((node_memory_MemTotal_bytes - node_memory_MemFree_bytes -
> node_memory_Cached_bytes) / node_memory_MemTotal_bytes * 100) *
> on(instance) group_left(nodename) node_uname_info > on(component)
> group_left() (*abcd_critical* or *xyz_critical* or on(node) count by
> (component)((node_memory_MemTotal_bytes - node_memory_MemFree_bytes -
> node_memory_Cached_bytes) / node_memory_MemTotal_bytes * 100) * 0 + 90)
> 
> Now, I have 5 servers with different components. How to include that in
> the most optimized manner?

This looks almost like the pattern described here:
https://www.robustperception.io/using-time-series-as-alert-thresholds

It looks like you already tried to integrate the two different ways to
specify thresholds, right? Is there any specific problem with it?

Sadly, this pattern quickly becomes complex, especially if nested (like
you would need to do) and if combined with an already longer query (like
in your case).

I can only suggest moving some of the complexity out of the query
(e.g. by moving the memory calculation into a recording rule).
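
As a rough sketch of that idea, the memory calculation from your alert
expression could live in a recording rule like the one below. The group
name and the record name are made up for illustration; only the
expression itself is taken from your query.

    groups:
      - name: memory_usage    # hypothetical group name
        rules:
          # Percentage of memory in use per target, computed once.
          - record: instance:node_memory_used:percent    # hypothetical name
            expr: |
              (node_memory_MemTotal_bytes - node_memory_MemFree_bytes
                - node_memory_Cached_bytes)
              / node_memory_MemTotal_bytes * 100

The alert expression can then refer to the recorded series in both
places where it currently repeats the calculation.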

You can also split the rule into multiple rules (with the same name).
You will just have to ensure that they only ever fire for a subset of
your instances (e.g. the first variant would only fire for
component-based thresholds, the second only for instance-based
thresholds).
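
An untested sketch of such a split could look like the following. It
assumes that the memory usage percentage is available as a recorded
series (called instance:node_memory_used:percent here, a made-up name),
that every component threshold series carries a component label, and
that the per-instance thresholds are recorded with an instance label.
The alert name and the instance_memory_critical name are hypothetical
as well, and the join with node_uname_info is left out for brevity.

    rules:
      # One threshold rule per special instance (five in your case).
      - record: instance_memory_critical    # hypothetical name
        expr: 97
        labels:
          instance: 'x.x.x.x:9100'

      # Variant 1: only instances that have their own threshold.
      # group_left() keeps the labels of the left-hand side.
      - alert: HighMemoryUsage    # hypothetical name
        expr: |
          instance:node_memory_used:percent
            > on(instance) group_left()
          instance_memory_critical

      # Variant 2: all remaining instances, using the component
      # thresholds with 90 as the fallback.
      - alert: HighMemoryUsage
        expr: |
          (
            instance:node_memory_used:percent
              unless on(instance) instance_memory_critical
          )
            > on(component) group_left()
          (
            abcd_critical
              or xyz_critical
              or on(component)
                count by (component) (instance:node_memory_used:percent) * 0 + 90
          )

The "unless on(instance)" part is what keeps the two variants from
firing for the same instance.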

Hope this helps.

Kind regards,
Christian
