I found the following in a blog post via Google:
If you want to count the time spent in down state, this becomes more
complicated because you have to detect the switch from 1 to 0 which count
for 1min and the subsequent down state until the first switch back from 0
to 1.
It could be something along the lines of:
(max_over_time(up[60s]) == bool 0) * ((up offset 61s == bool 1) *
count(up[60s]) OR vector(1)) ---> query
But the above query gave me the following error:
bad_data: 1:73: parse error: expected type instant vector in aggregation
expression, got range vector
What am I missing here? How can I find the instances that have been
completely in the down state for the last X days?
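My guess, assuming position 1:73 in the error points at the argument of
count(): count() is an aggregation operator and only accepts an instant
vector, while up[60s] is a range vector. The function that counts samples
in a range vector is count_over_time(). A minimal sketch of the difference
(not a fix for the whole blog query):

```promql
# Aggregation operator: takes an instant vector, returns the number of series
count(up)

# Range-vector function: takes a range vector, returns the number of
# samples per series within the last 60s
count_over_time(up[60s])
```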
Thanks & regards,
Bharath Kumar.
On Wednesday, 17 August 2022 at 19:42:26 UTC+5:30 Brian Candler wrote:
> If you want servers that have been down for 30 days, then I thought it
> should be obvious you need max_over_time(up[30d]) == 0 ... but perhaps it
> isn't as obvious as I thought.
>
> Let me break that query down into parts:
>
> up[30d] : returns a *range vector* containing all data points for the
> timeseries with metric name "up" from T - 30 days to T (where T is the
> evaluation time, i.e. the point on the X axis)
>
> By "timeseries" I mean a distinct combination of metric name and labels, e.g.
> up{instance="foo"}
> up{instance="bar"}
> are two different timeseries. They happen to share the same metric name
> ("up") but they are recording an independent sequence of measurements.
>
> Think of the range vector as a two-dimensional grid: there are N different
> timeseries, each with M data points over that period. The data collected
> and stored in the TSDB might look like this:
>
> up{instance="foo"} v1 . . . v2 . . . v3 . . .
> up{instance="bar"} . . v4 . . . v5 . . . v6 .
> -------------------------> time
>
> Then:
> max_over_time(...) : for each timeseries in the range vector, picks the
> highest value. This returns an *instant vector*, i.e. a single value for
> every timeseries, which is the maximum of each.
>
> up{instance="foo"} v3
> up{instance="bar"} v5
>
> Each of those values is the maximum value of the timeseries, over the 30
> day period.
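>
> As a minimal sketch, the two steps can be written as separate
> expressions (the 30d window is just the example period used here):
>
> ```promql
> # One value per timeseries: its maximum over the last 30 days
> max_over_time(up[30d])
>
> # Keep only the timeseries whose maximum is 0, i.e. every sample in the
> # window was 0
> max_over_time(up[30d]) == 0
> ```
>
> By contrast, min_over_time(up[30d]) == 0 would match any timeseries that
> had at least one failed scrape in the window.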
>
> Now, you've chosen to draw a graph of this expression, but it's important
> to realise that the graph itself doesn't need to be over 30 days. When you
> draw a graph of an expression, it will sweep across the evaluation time,
> evaluating the expression repeatedly at different instants in time over the
> given period.
>
> Let's say, for example, you set the graph range to be 1 week, but you are
> graphing max_over_time(up[30d]) == 0
>
> What will you get? This will be a series of points. Let's imagine the
> graph only had one point per day. Considering the position of each point on
> the time axis:
> Aug 17: shows if the server has been down from (Aug 17 - 30 days) to (Aug
> 17)
> Aug 16: shows if the server has been down from (Aug 16 - 30 days) to (Aug
> 16)
> ...
> Aug 10: shows if the server has been down from (Aug 10 - 30 days) to (Aug
> 10)
>
> In fact, for your purposes (asking, has the server been down for the *last
> 30 days*?) you don't need to draw a graph at all! In which case, if you
> turn on the "Instant" switch in Grafana it will only ask Prometheus to
> evaluate the expression for the current instant, which makes the query much
> faster and cheaper.
>
> This is then an ideal query to use in a dashboard, where you just want to
> show a list of servers that have been down for the last 30 days. You don't
> care, for example, if 2 days ago they were down for the 30 days before that
> point, do you? Because that's basically what a graph of that expression
> will tell you: at each point in time, whether it was down for the previous
> 30 days.
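>
> To make the "each point looks back 30 days" idea concrete, here is a
> sketch using the offset modifier (the 2d offset is an arbitrary value
> chosen for illustration):
>
> ```promql
> # Evaluated now: down for the whole of the last 30 days
> max_over_time(up[30d]) == 0
>
> # Evaluated now: down for the whole 30 days ending 2 days ago, which is
> # what the graph point from 2 days ago shows
> max_over_time(up[30d] offset 2d) == 0
> ```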
>
> On Wednesday, 17 August 2022 at 14:09:42 UTC+1 [email protected]
> wrote:
>
>> [image: up.PNG]
>> This is the query I am using. The graph above covers 30 days, and the
>> server has only been down since the last day. I want the servers that
>> have been down for the whole 30 days.
>> On Wednesday, 17 August 2022 at 12:55:48 UTC+5:30 Brian Candler wrote:
>>
>>> Extraordinary claims require extraordinary evidence.
>>>
>>> I don't believe there's a bug in prometheus: I believe there's a bug in
>>> how you are using it. But unless you show the data, there's no way to
>>> demonstrate this.
>>>
>>> On Wednesday, 17 August 2022 at 04:36:43 UTC+1 [email protected]
>>> wrote:
>>>
>>>>
>>>> Yes. I only want the servers that have been down for the whole two
>>>> days: the value should be zero (0) throughout the last 'X' days.
>>>>
>>>> But max_over_time is also showing me servers that were down for even
>>>> one minute during the last 'X' days.
>>>>
>>>> Thanks & regards,
>>>> Bharath kumar.
>>>> On Tuesday, 16 August 2022 at 20:27:30 UTC+5:30 Stuart Clark wrote:
>>>>
>>>>> On 2022-08-16 15:08, BHARATH KUMAR wrote:
>>>>> > hello,
>>>>> >
>>>>> > max_over_time(up[2d]) == 0 is giving me the info like ...for the
>>>>> > last two days if the server goes down for 1 minute also it was
>>>>> > displaying in the graph which I don't want. I want the information
>>>>> > that for the last "X" days it should be completely in an
>>>>> > unreachable state.
>>>>> >
>>>>>
>>>>> So you are only wanting it if every single scrape failed over the
>>>>> past 2 days?
>>>>>
>>>>> Try sum() instead of max_over_time().
>>>>>
>>>>> --
>>>>> Stuart Clark
>>>>>
>>>>
--
To view this discussion on the web visit
https://groups.google.com/d/msgid/prometheus-users/47873d21-7f4c-42ac-9753-0651d8f26640n%40googlegroups.com.