Hi all, I'm trying to work on adding a bunch of monitoring alerts for our 
VPNs (very topical, huh?) using Prometheus. We have a number of VPNs that 
we're monitoring; monitoring via Prometheus is new (available data < 1 
week) and some of the VPNs are newer (available data < 2 days). The VPNs 
are broadly similar in terms of SNMP, but the interface naming is a bit 
different.

Today's aim is to look at a certain set of interfaces and alert when the 
amount of traffic is unusually high. As I'm still coming up to speed with 
Prometheus and alerting, I want to ensure I'm following best practice and 
working towards some reusable patterns I can include in my team's internal 
training.

I can select ifOutOctets for the various interfaces of interest; 
that's not a problem:

ifOutOctets{job="cisco_asa_vpn",ifName=~".*(VPN|MAN).*",vpn=~"vpn.*"}

That returns 12 series, although the following may be a little 
tidier:

sum(rate(ifOutOctets{job="cisco_asa_vpn",ifName=~".*(VPN-INSIDE|MAN).*",vpn=~"vpn.*"}[2m])) by(vpn,ifName)

The result of that looks like this as an instant query:

{ifName="INT-MAN",vpn="vpn.example.com"} 53710.8
{ifName="INT-MAN",vpn="vpn2.example.com"} 371.9938001033316
{ifName="INT-VPN-INSIDE-344",vpn="vpn.example.com"} 1334581.4166666667
{ifName="INT-VPN-INSIDE-344",vpn="vpn2.example.com"} 45.7325711238146
{ifName="DMZ-VPN-INSIDE",vpn="vpn5.example.com"} 1554450.8833333333
{ifName="DMZ-VPN-INSIDE",vpn="vpn6.example.com"} 5290491.866666668
{ifName="INT-MAN",vpn="vpn5.example.com"} 93529.56666666667
{ifName="INT-MAN",vpn="vpn6.example.com"} 107974.35000000003

By 'unusually high', I'm thinking above the 95th percentile of the 
preceding 7 days (well, initially perhaps 2 days). So, trying to get the 
95th percentile for the last two days:

quantile(0.95, rate(ifOutOctets{job="cisco_asa_vpn",ifName=~".*(VPN-INSIDE|MAN).*",vpn=~"vpn.*"}[2d])) by(vpn,ifName)

However, this seems quite wrong: the graph looks the same with the 5th 
percentile as it does with the 95th, which is clearly not useful, even 
though the data (when rated over a 2m period) is quite variable.

What should the query be to give me a single value for each series 
{vpn,ifName} that would give the 95th percentile based on the past N days?
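
A possible explanation and direction, offered as an untested sketch rather 
than a confirmed answer: rate(...[2d]) yields a single averaged value per 
series, and quantile(...) by(vpn,ifName) aggregates across series at one 
point in time, so a group containing one series returns that same value for 
any quantile. Taking the quantile over time instead would need 
quantile_over_time with a subquery, which assumes Prometheus 2.7 or later; 
the 1m resolution step below is an arbitrary choice.

```promql
# 95th percentile, over the past 2 days, of the 2m rate:
# one value per {vpn, ifName}.
# [2d:1m] is a subquery: evaluate the inner expression every 1m across 2d.
quantile_over_time(
  0.95,
  (
    sum by (vpn, ifName) (
      rate(ifOutOctets{job="cisco_asa_vpn",ifName=~".*(VPN-INSIDE|MAN).*",vpn=~"vpn.*"}[2m])
    )
  )[2d:1m]
)
```

Changing [2d:1m] to [7d:1m] would cover the full week once enough data 
exists, and a coarser step would reduce the query cost.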

Thanks,
Cameron
