[prometheus-users] Re: Filter metric between range hour and minutes

Alen Cappelletti Sun, 26 Jun 2022 14:29:47 -0700

Hi Brian,
thank you very much for the code snippet.
It was just what I needed. I was trying to translate it in my head from 
SQL to PromQL, but something was not right. Thanks again, you have been 
very helpful.
You're right that it's "pretty ugly", but the IT department 
informed me that outside that time period there may be maintenance 
procedures that will inevitably trigger the alert, so it's OK.
I looked in Grafana, and you can silence alerts there, but not on a 
recurring schedule, so as I told you I have to handle it in the query.
But it doesn't bother me.


I'd like to ask you another question, about Alertmanager; if you prefer I 
can open another thread. I have been working on a Docker stack app for 
about 8 months, and only now that I am nearing the end am I turning my 
attention to alerts. Initially I used Alertmanager, but Grafana has very 
similar alert management, which I would say is even more advanced in some 
respects. I have read a dozen articles and posts on the web, but it is 
still not clear to me when it is preferable to use Alertmanager rather than 
Grafana.

From what I understand, Alertmanager is a single hub for managing alerts 
coming from multiple Prometheus instances, possibly on other networks, but 
maybe that's just my opinion as someone who is not an expert.

Thanks again and have a nice day.
ALEN
On Sunday, 26 June 2022 at 09:49:59 UTC+2 Brian Candler wrote:

> I see; so this is just to work around the limited functionality of Grafana 
> alerting.
>
> Then I guess you can just modify the rule you already have, to use (hour() 
> + minute()/60).
>
> e.g. I tested this briefly:
> (node_filesystem_avail_bytes < 10000000) and on () (hour() + minute()/60) >= 6.5 < 19
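>
> For reference, a minimal sketch of how the same time window could be 
> attached to the database check quoted at the bottom of this thread, 
> written as a Prometheus rule file. It assumes the rule is evaluated by 
> Prometheus itself rather than by Grafana; the group name and alert name 
> are made up for illustration. hour() and minute() work in UTC, so 6.5 and 
> 19 correspond to 08:30 and 21:00 only while the local offset is UTC+2.
>
> ```yaml
> groups:
>   - name: business-hours-example        # hypothetical group name
>     rules:
>       - alert: DatabaseNotOnline        # hypothetical alert name
>         # Left side: series exist only while a database is not ONLINE.
>         # Right side: one series exists only while 06:30 <= UTC time < 19:00.
>         # "and on ()" keeps the left side only while the right side is non-empty.
>         # The fixed UTC bounds do not follow daylight saving time.
>         expr: |
>           count by (exported_instance, counter_instance) (
>             database_status{job="aaaa", exported_instance="myserver", status!="ONLINE"}
>           )
>           and on () ((hour() + minute() / 60) >= 6.5 < 19)
> ```
>
> No "or vector(0)" is needed here: an empty result simply means the alert 
> is not firing.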
>
> But it's pretty ugly.  For a long-running problem, the alert will be 
> "resolved" at 19:00 and then re-activate at 06:30 the next day.
>
> If you have a lot of this to do, then you could find out if Grafana can be 
> plugged into an external system like OpsGenie or PagerDuty (I have no idea 
> if it can; there is a separate discussion group for Grafana).  Or consider 
> moving to Alertmanager.
>
> On Sunday, 26 June 2022 at 00:10:40 UTC+1 [email protected] wrote:
>
>> Hi Brian, and thank you very much for your detailed answer... which I 
>> have read very carefully several times.
>>
>> Maybe I forgot a detail in my question, that is: I'm using Grafana!
>> Your points about muting notifications are clear to me and absolutely 
>> correct. Unfortunately, in Grafana muting is not tied to the alerts 
>> themselves but to the contact points where the recipients of the 
>> messages are defined. 
>>
>> So, to keep things simple, in this particular case it would be easier to 
>> fix it directly in the PromQL expression.
>> I would simply like to know how to include the 30 minutes, so that the 
>> window starts at 8:30 AM instead of 8:00 AM. I don't know whether PromQL 
>> has a syntax for this.
>>
>> Thanks again and have a nice day.
>> ALEN
>>
>> On Saturday, 25 June 2022 at 11:21:51 UTC+2 Brian Candler wrote:
>>
>>> Firstly, given that you have put "or vector(0)", I think you may 
>>> misunderstand how alerting works in Prometheus.
>>>
>>> PromQL expressions return vectors - a set of 0 or more values. In an 
>>> alerting expression, the alert is treated as firing if the vector is 
>>> non-empty - i.e. it contains 1 or more values, regardless of what those 
>>> values actually are.  Therefore, the expression vector(0) gives an alert 
>>> which fires all of the time, which isn't very useful.
>>>
>>> Next, PromQL comparison operators are filters, not booleans.  Suppose 
>>> you have the following metrics in your database:
>>>
>>> node_disk_space{instance="a"} 100
>>> node_disk_space{instance="b"} 200
>>> node_disk_space{instance="c"} 300
>>>
>>> The PromQL expression "node_disk_space > 150" returns a vector of 2 
>>> values:
>>>
>>> node_disk_space{instance="b"} 200
>>> node_disk_space{instance="c"} 300
>>>
>>> That is, the expression "node_disk_space" returns a vector of all 
>>> metrics with that metric name, and "node_disk_space > 150" filters it down 
>>> to just those metrics whose value is over 150.  It does not return a "true" 
>>> or "false" value (or values).
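>>>
>>> One way to check this filtering behaviour is a promtool unit test. A 
>>> minimal sketch, assuming the file is saved as filter_test.yml (a made-up 
>>> name) and run with "promtool test rules filter_test.yml"; the series are 
>>> the three example metrics above, held constant over two samples:
>>>
>>> ```yaml
>>> tests:
>>>   - interval: 1m
>>>     input_series:
>>>       - series: 'node_disk_space{instance="a"}'
>>>         values: '100 100'
>>>       - series: 'node_disk_space{instance="b"}'
>>>         values: '200 200'
>>>       - series: 'node_disk_space{instance="c"}'
>>>         values: '300 300'
>>>     promql_expr_test:
>>>       - expr: node_disk_space > 150
>>>         eval_time: 1m
>>>         exp_samples:
>>>           # instance="a" is filtered out; the remaining series keep
>>>           # their original values rather than becoming true/false.
>>>           - labels: 'node_disk_space{instance="b"}'
>>>             value: 200
>>>           - labels: 'node_disk_space{instance="c"}'
>>>             value: 300
>>> ```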
>>>
>>> Similarly, "and/or/unless" don't work like booleans either.  The 
>>> expression "node_disk_space > 150 or vector(0)" will return the following:
>>>
>>> node_disk_space{instance="b"} 200
>>> node_disk_space{instance="c"} 300
>>> {} 0
>>>
>>> In this case you get a vector of 3 values.  The explanation of how "or" 
>>> works is here:
>>>
>>> https://prometheus.io/docs/prometheus/latest/querying/operators/#logical-set-binary-operators
>>> It's another vector operator, which matches the label sets of the LHS 
>>> and RHS.
>>>
>>> Now, let me go back to your original problem about time periods.  I 
>>> think you're approaching this the wrong way.
>>>
>>> I believe the business rule amounts to this:  "I only want to receive 
>>> alerts on this condition if the time falls between 8:30am and 9pm".  It's 
>>> not that the problem doesn't happen outside business hours; it's that the 
>>> problem isn't important enough to send a notification outside of business 
>>> hours.
>>>
>>> Therefore, the right way to handle this is with time periods within 
>>> alertmanager, to control when the alerts are sent - not within the PromQL 
>>> expression which determines whether there is a problem or not.
>>>
>>> The way you do this is with time intervals in alertmanager routing 
>>> trees. See:
>>> https://prometheus.io/docs/alerting/latest/configuration/#route
>>> https://prometheus.io/docs/alerting/latest/configuration/#time_interval
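>>>
>>> A minimal sketch of such a routing tree, assuming an Alertmanager 
>>> release that supports the top-level "time_intervals" section and 
>>> "active_time_intervals" on routes (0.24 or later). The receiver names, 
>>> the interval name and the alert name are hypothetical. The "location" 
>>> field needs a release with time zone support; without it the times are 
>>> interpreted as UTC and would have to be shifted by hand.
>>>
>>> ```yaml
>>> route:
>>>   receiver: default-receiver            # hypothetical catch-all receiver
>>>   routes:
>>>     - matchers:
>>>         - 'alertname = "DatabaseNotOnline"'   # hypothetical alert name
>>>       receiver: dba-team                # hypothetical receiver
>>>       # Notifications on this route are sent only inside the named
>>>       # interval; outside it the route is muted.
>>>       active_time_intervals:
>>>         - business-hours
>>>
>>> time_intervals:
>>>   - name: business-hours                # hypothetical interval name
>>>     time_intervals:
>>>       - times:
>>>           - start_time: "08:30"
>>>             end_time: "21:00"
>>>         location: "Europe/Rome"         # assumes time zone support
>>>
>>> receivers:
>>>   - name: default-receiver              # notification settings omitted
>>>   - name: dba-team                      # notification settings omitted
>>> ```
>>>
>>> The alerting rule itself stays free of any time logic, so the alert is 
>>> still visible as firing outside business hours; only the notification 
>>> is held back.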
>>>
>>> Not only is this far easier to implement than attempting to do it in 
>>> PromQL, it's also more flexible - for example you can have the same alert 
>>> (from the same PromQL alerting rule) sent to different groups depending on 
>>> the time of day.
>>>
>>> Note that you can add labels to your alert in the alerting rule to 
>>> categorise the alert, and you can match on those labels in your alert 
>>> routing tree.  This gives you further flexibility to categorise your alerts 
>>> in whatever way is useful to you.
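>>>
>>> As an illustration, a minimal sketch of a label added in a rule and 
>>> matched in the routing tree. The two fragments belong to two different 
>>> files (a Prometheus rule file and alertmanager.yml); the label "team", 
>>> its value, the alert name and the receiver names are all hypothetical.
>>>
>>> ```yaml
>>> # Fragment of a Prometheus rule file
>>> groups:
>>>   - name: database                      # hypothetical group name
>>>     rules:
>>>       - alert: DatabaseNotOnline        # hypothetical alert name
>>>         expr: |
>>>           count by (exported_instance, counter_instance) (
>>>             database_status{job="aaaa", status!="ONLINE"}
>>>           )
>>>         labels:
>>>           team: dba                     # extra label attached to the alert
>>> ---
>>> # Fragment of alertmanager.yml
>>> route:
>>>   receiver: default-receiver            # used when no child route matches
>>>   routes:
>>>     - matchers:
>>>         - 'team = "dba"'                # matches the label set in the rule
>>>       receiver: dba-team
>>> receivers:
>>>   - name: default-receiver              # notification settings omitted
>>>   - name: dba-team                      # notification settings omitted
>>> ```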
>>>
>>> On Friday, 24 June 2022 at 23:20:42 UTC+1 [email protected] wrote:
>>>
>>>> Hi,
>>>> I'm trying to write this simple expression for Prometheus, 
>>>> but I don't understand how to include minutes as well as a valid 
>>>> range of hours.
>>>>
>>>> The alert should fire only between *08:30 AM and 09:00 PM*.
>>>>
>>>> In the expression below the hours are in UTC (Italy, where I am, is 
>>>> currently at UTC+2).
>>>>
>>>> (count by (exported_instance, counter_instance) 
>>>> (database_status{job="aaaa", exported_instance="myserver", 
>>>> status!="ONLINE"})
>>>> and on() hour() >= 6 <= 19
>>>> *......... miss minute .......*
>>>> ) or vector(0)
>>>>
>>>> Thanks Alen
>>>>
>>>
