Hello Brian,

Sorry for the late response. I already have plenty of dashboards in Grafana for various parts of our infrastructure. Alerts and thresholds work well, and having an actual value helps us find the source of our problems, as you say. However, the particular dashboard I'm crafting is aimed at executives and other partners who demand an availability counter for our infrastructure as a whole. So for this particular dashboard, the question is not "is something broken and why is it broken?" but just "is everything working and if not, what broke and when?". I should have made that clearer, sorry.
The few queries you gave me actually helped a lot! I had never used the bool modifier in my queries until you mentioned it. I now use home-made recording rules for the various parts of the infrastructure, mainly built from min/max/max_over_time/bool and a few conditions. I get a nice load of 0s and 1s everywhere, and it is now very easy to get a global availability percentage over a period of time. The state timeline panel in Grafana is also very useful. Thanks for your help Brian :)

On Thursday 18 November 2021 at 09:57:37 UTC+1, Brian Candler wrote:
> You're probably looking at it the wrong way, and I expect you should
> configure Grafana to visualise correctly the response you have.
>
> You can display or not display something in Grafana based on
> presence/absence of any value. However usually it's more useful to *see* the
> actual failing value, because an indication of just "not healthy" doesn't
> give you any clue to help debug the problem. One thing you can do in
> Grafana is to set thresholds and colours: e.g. display green if the value
> is between 0 and 5, amber if 5 to 10, red if 10 or higher. That's often
> much more useful (except for users with colour blindness who may need
> additional cues).
>
> However, you *can* also frig the queries in PromQL if required. Since
> you don't give the actual queries, I can only talk in general terms.
>
> foo < 1
> # gives you some value for foo, if it's less than 1, and no value if foo >= 1.
>
> (foo < 1) * 0
> # will always give you a value of 0 if foo < 1, or no value if foo >= 1
>
> foo < bool 1
> # will always give you a value: 1 if foo < 1, 0 if foo >= 1
>
>
> For example, I might have a cluster where one of the servers can fail
> and still display an available service (and a result of 1 for my query),
> but having 2 failed servers would get me a result of "0" for my query.
>
> I would be inclined to make a query to count "number of failed servers", and
> set a display threshold on this.
> Then the dashboard won't say "too many failed servers!", it will say "2 failed servers!"
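
As an illustration of the approach discussed in this thread, recording rules that combine a failed-server count with the bool modifier might look roughly like the sketch below. The job label "web", the group name, the rule names and the two-server threshold are all hypothetical and chosen only for the example; they are not taken from any real configuration mentioned here.

```yaml
groups:
  - name: availability_example   # hypothetical group name
    rules:
      # Number of failed servers: "== bool 0" yields 1 for each target
      # that is down and 0 for each target that is up, so the sum is the
      # count of failed servers (0 when everything is up).
      - record: cluster:failed_servers:count
        expr: sum(up{job="web"} == bool 0)

      # 1 while fewer than 2 servers are down, 0 otherwise. Thanks to
      # "bool" the rule always produces a sample instead of dropping the
      # series when the comparison is false.
      - record: cluster:available:bool
        expr: cluster:failed_servers:count < bool 2
```

With a 0/1 series like this, an expression such as avg_over_time(cluster:available:bool[30d]) * 100 would give the percentage of samples over the last 30 days in which the service counted as available, and the failed-server count stays available separately for a panel with display thresholds.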

