Hi Christian,

Actually, I want to know if there is any better way to define the
threshold for my 5 new servers that belong to 5 different components. Is
writing 5 different recording rules with the same name and different
instance and component labels the only way to proceed here? Won't that be a
little too dirty to maintain? What if it were 20 servers, each belonging to a
different component?
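
To make the question concrete, here is a minimal sketch of what I mean. The
rule name `instance_memory_critical`, the group name, and the component values
are placeholders made up for illustration, and the addresses are masked as in
my job config:

```yaml
groups:
  - name: memory_thresholds            # hypothetical group name
    rules:
      # One rule per server: same record name, different labels.
      # Each real instance value must be unique so the series do not collide.
      - record: instance_memory_critical   # hypothetical rule name
        expr: 97
        labels:
          instance: 'x.x.x.x:9100'
          component: 'Comp-A'

      - record: instance_memory_critical
        expr: 97
        labels:
          instance: 'x.x.x.x:9100'
          component: 'Comp-B'

      # ...and three more blocks like this for the remaining servers.
```

Every additional server would need one more block like this, which is what
worries me.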

On Tue, Jun 30, 2020 at 11:43 AM Christian Hoffmann <
[email protected]> wrote:

> Hi,
>
> On 6/24/20 8:09 PM, [email protected] wrote:
> > Hi. Currently I am using a custom threshold for my Memory alerts.
> > I have 2 main labels for every node exporter target: cluster and
> > component.
> > My custom threshold has so far been based on the component, as I had to
> > define that particular custom threshold for all the servers of the
> > component. But now I have 5 instances, all from different components,
> > and I have to set the threshold to 97. How do I approach this?
> >
> > My typical node exporter job.
> >   - job_name: 'node_exporter_JOB-A'
> >     static_configs:
> >     - targets: [ 'x.x.x.x:9100' , 'x.x.x.x:9100']
> >       labels:
> >         cluster: 'Cluster-A'
> >         env: 'PROD'
> >         component: 'Comp-A'
> >     scrape_interval: 10s
> >
> > Recording rule for custom thresholds.
> >   - record: abcd_critical
> >     expr: 99.9
> >     labels:
> >       component: 'Comp-A'
> >
> >   - record: xyz_critical
> >     expr: 95
> >     labels:
> >       node: 'Comp-B'
> >
> > The expression for Memory Alert.
> > ((node_memory_MemTotal_bytes - node_memory_MemFree_bytes -
> > node_memory_Cached_bytes) / node_memory_MemTotal_bytes * 100) *
> > on(instance) group_left(nodename) node_uname_info > on(component)
> > group_left() (abcd_critical or xyz_critical or on(node) count by
> > (component)((node_memory_MemTotal_bytes - node_memory_MemFree_bytes -
> > node_memory_Cached_bytes) / node_memory_MemTotal_bytes * 100) * 0 + 90)
> >
> > Now, I have 5 servers with different components. How do I include them
> > in the most efficient manner?
>
> This looks almost like the pattern described here:
> https://www.robustperception.io/using-time-series-as-alert-thresholds
>
> It looks like you already tried to integrate the two different ways to
> specify thresholds, right? Is there any specific problem with it?
>
> Sadly, this pattern quickly becomes complex, especially if nested (like
> you would need to do) and if combined with an already longer query (like
> in your case).
>
> I can only suggest trying to move some of the complexity out of the
> query (e.g. by moving the memory calculation to a recording rule instead).
>
> You can also split the rule into multiple rules (with the same name).
> You will just have to ensure that they only ever fire for a subset of
> your instances (e.g. the first variant would only fire for
> component-based thresholds, the second only for instance-based
> thresholds).
>
> Hope this helps.
>
> Kind regards,
> Christian
>
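
If I understand the two suggestions above correctly, they could be sketched
roughly as below. This is only a sketch under assumptions: the names
`instance:node_memory_used:percent`, `component_memory_critical`,
`instance_memory_critical`, and `HighMemoryUsage` are hypothetical, the two
threshold series are assumed to exist as recording rules, and the fallback
default of 90 from my original expression is left out for brevity.

```yaml
groups:
  - name: memory_usage                 # hypothetical group name
    rules:
      # Move the memory calculation out of the alert expression.
      - record: instance:node_memory_used:percent   # hypothetical name
        expr: |
          (node_memory_MemTotal_bytes - node_memory_MemFree_bytes - node_memory_Cached_bytes)
            / node_memory_MemTotal_bytes * 100

      # Variant 1: component-based thresholds, skipping instances that
      # have their own instance-based threshold.
      - alert: HighMemoryUsage
        expr: |
          (instance:node_memory_used:percent unless on(instance) instance_memory_critical)
            > on(component) group_left() component_memory_critical

      # Variant 2: instance-based thresholds only.
      - alert: HighMemoryUsage
        expr: |
          instance:node_memory_used:percent
            > on(instance) group_left() instance_memory_critical
```

The `unless on(instance)` clause keeps the first variant from firing for
instances that have their own threshold, so each instance is covered by only
one of the two rules.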
