If I understand what you're doing, I wouldn't have 60000 static recording 
rules, I would just create a text file like this:

monitored_interface_info{instance="ORDER12345678",ifDescr="Fa0"} 1
monitored_interface_info{instance="ORDER12345678",ifDescr="Fa1"} 1
... etc

Then I would either stick this on a webserver and scrape it, or drop it 
into a file for node_exporter's textfile collector to pick up. Either way 
would need "honor_labels: true" on the scrape config to preserve the 
'instance' label (or you can put it in a different label and then use 
metric relabelling to move it).
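
For the webserver variant, the scrape job could look roughly like this (the 
job name, host and path are just placeholders for illustration):

scrape_configs:
  - job_name: monitored_interfaces
    honor_labels: true   # keep the 'instance' label from the scraped file
    metrics_path: /monitored_interfaces.prom
    static_configs:
      - targets: ['inventory.example.com:8080']

For the textfile collector variant, honor_labels would go on your 
node_exporter scrape job instead.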

This also solves your problem about changes. There's no need to HUP 
Prometheus; the data will simply update on the next scrape.

On Wednesday, 11 October 2023 at 14:22:04 UTC+1 Sebastiaan van Doesselaar 
wrote:

> Thank you very much for the pointers. I'd considered that a recording rule 
> might work during my Google adventures, but as you mention, this works 
> very well indeed.
>
> I had to slightly modify your query to:  (ifOperStatus != 1) * on 
> (instance,ifName) monitored_interface_info
>
> Then it actually gives me the perfect result. It probably still needs some 
> fine-tuning, but that's fine.
>
> To get back to your second suggestion: this unfortunately is not an option 
> for us, as we're not always in full control of what we monitor. If we 
> were, that would indeed be the easier and better solution.
>
> Two questions left:
>
>    - Any recommended/supported way of loading the rules dynamically? I 
>    saw you'd need a SIGHUP to reload them, so I could script it easily. 
>    Preferably I'd use something (natively) supported though, like the 
>    http_sd_config setup we use to do service discovery.
>    - What will the impact on performance be if we have, say, 60000 
>    recording rules like this (20000 instances, each with 2-5 monitored 
>    interfaces)? I imagine it'll either be peanuts for Prometheus, or 
>    heavier than I expect.
>
>
> On Wednesday, October 11, 2023 at 10:15:26 AM UTC+2 Ben Kochie wrote:
>
>> For alerting on monitored interfaces I might suggest a different approach 
>> than trying to apply them at discovery time. The discovery phase is able to 
>> apply labels to the whole target device easily, but it's not really going 
>> to work well to annotate individual metrics.
>>
>> What I would suggest is you populate a series of recording rules that 
>> define which interfaces should be alerted on. Then you can use a join at 
>> alert query time. This is also how you can set different alerting 
>> thresholds for things dynamically.
>>
>> For example, if you have rules like this:
>>
>> groups:
>> - name: monitored interfaces
>>   interval: 1m
>>   rules:
>>     - record: monitored_interface_info
>>       expr: vector(1)
>>       labels:
>>         instance: ORDER12345678
>>         ifDescr: Fa0
>>     - record: monitored_interface_info
>>       expr: vector(1)
>>       labels:
>>         instance: ORDER12345678
>>         ifDescr: Fa1
>>
>> Then your alert would look like this:
>>
>> - name: alerts
>>   rules:
>>     - alert: InterfaceDown
>>       expr: ifOperStatus == 0 * on (instance,ifDescr) monitored_interface_info
>>       for: 5m
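>>
>> The same join also covers the "different alerting thresholds" part: give 
>> the rule a value other than 1 and compare against it. A rough sketch (the 
>> name interface_error_threshold is just made up for illustration):
>>
>>     - record: interface_error_threshold
>>       expr: vector(100)
>>       labels:
>>         instance: ORDER12345678
>>         ifDescr: Fa0
>>
>>     - alert: InterfaceErrorsHigh
>>       expr: rate(ifInErrors[5m]) > on (instance,ifDescr) group_left interface_error_threshold
>>       for: 5m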
>>
>> You can use the Nautobot database to generate the rules file.
>>
>> Another approach would be to populate the monitored interface information 
>> in your devices. If you can tag the interface descriptions/aliases with a 
>> structured format, you can use metric_relabel_configs to create a 
>> monitored_interface label.
>>
>> So if your interface description is say Fa0;true, you can do something 
>> like this:
>> metric_relabel_configs:
>> - source_labels: [ifDescr]
>>   regex: '.+;(.+)'
>>   target_label: monitored_interface
>> - source_labels: [ifDescr]
>>   regex: '(.+);.+'
>>   target_label: ifDescr
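>>
>> That would turn ifDescr="Fa0;true" into ifDescr="Fa0" plus 
>> monitored_interface="true", so the alert can then simply filter on the 
>> new label, something like:
>>
>>     - alert: InterfaceDown
>>       expr: ifOperStatus{monitored_interface="true"} != 1
>>       for: 5m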
>>
>> On Wed, Oct 11, 2023 at 9:05 AM 'Sebastiaan van Doesselaar' via 
>> Prometheus Users <promethe...@googlegroups.com> wrote:
>>
>>> Hi all,
>>>
>>> I've got a PoC running using Nautobot (source of truth, scraped via SD), 
>>> Prometheus and snmp_exporter. It's scaling very well, much better than we 
>>> anticipated, and we're very eager to put the finishing touches on the PoC. 
>>>
>>> I want to set a bunch of interfaces to either "monitored" or not. Based 
>>> on that, I will generate a label. For example: 
>>> "__meta_nautobot_monitored_interfaces": "Fa0,Fa1". 
>>>
>>> A full return from Nautobot might be something like:
>>>
>>> [
>>>   {
>>>     "targets": [ "ORDER12345678" ],
>>>     "labels": {
>>>       "__meta_nautobot_status": "Active",
>>>       "__meta_nautobot_model": "Device",
>>>       "__meta_nautobot_name": "ORDER12345678",
>>>       "__meta_nautobot_id": "c301aebf-e92d-4f72-8f2b-5768144c42f4",
>>>       "__meta_nautobot_primary_ip": "xxx",
>>>       "__meta_nautobot_primary_ip4": "xxx",
>>>       "__meta_nautobot_monitored_interfaces": "Fa0,Fa1",
>>>       "__meta_nautobot_role": "CPE",
>>>       "__meta_nautobot_role_slug": "cpe",
>>>       "__meta_nautobot_device_type": "ASR9006",
>>>       "__meta_nautobot_device_type_slug": "asr9006",
>>>       "__meta_nautobot_site": "Place",
>>>       "__meta_nautobot_site_slug": "place"
>>>     }
>>>   }
>>> ]
>>>
>>>
>>> My question is two-fold:
>>>   - I'd like to drop all unrelated interfaces. I don't know how to do 
>>> that using relabeling with this comma-separated string. I'm open to 
>>> presenting the data in a different way, since I made the Prometheus service 
>>> discovery plugin myself (based on the NetBox one), but I haven't thought of 
>>> a better way.
>>>
>>>   - I only want to alert on the monitored interfaces. I mean, if we fix 
>>> the above, this is a non-issue, but if that's not possible, I'd like to at 
>>> least only alert on the monitored interfaces.
>>>
>>> This is going to be run on ~20,000 devices with differing 
>>> configurations, models, vendors, etc. It needs to be very dynamic, all 
>>> sourced from Nautobot, and refreshed whenever changes are made there. 
>>> Using snmp.yml and a generator for this doesn't seem feasible unless I 
>>> created a new module for every possible configuration, which doesn't seem 
>>> ideal.
>>>
>>> If anyone has any suggestions on how to accomplish this, I'd very much 
>>> appreciate it. 
>>>
>>
