My 2 cents:
* We try to structure our stats in an Errors->Saturation->Utilization order,
which is consistent with the USE methodology[1] and Google's Four Golden Signals[2].
In the case of unbound, that is:
  * Errors: Servfails, availability measured by blackbox tests
  * Saturation: queue sizes, queue drops, blackbox test latency
  * Utilization: number of queries w/ breakdowns by querytype/answercode, cache hit rate
* We also graph some basic internal stats, like memory usage, CPU usage,
restart rate, etc.
* Breakdowns and drill-downs are very useful for reducing MTTR.
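
To make the grouping above concrete, here is a rough sketch (not our production code) that parses the key=value text printed by "unbound-control stats" and sorts a few counters into the three buckets. The function names and the sample numbers are made up for illustration, and I am assuming extended-statistics is enabled so that the num.answer.rcode.* counters are present.

```python
from typing import Dict


def parse_stats(text: str) -> Dict[str, float]:
    """Parse 'key=value' lines as printed by `unbound-control stats`."""
    stats: Dict[str, float] = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if not sep:
            continue  # skip anything that is not key=value
        try:
            stats[key] = float(value)
        except ValueError:
            continue  # skip non-numeric values
    return stats


def summarize(stats: Dict[str, float]) -> Dict[str, Dict[str, float]]:
    """Group a handful of counters as Errors / Saturation / Utilization."""
    queries = stats.get("total.num.queries", 0.0)
    hits = stats.get("total.num.cachehits", 0.0)
    servfail = stats.get("num.answer.rcode.SERVFAIL", 0.0)
    return {
        "errors": {
            "servfail_ratio": servfail / queries if queries else 0.0,
        },
        "saturation": {
            "requestlist_avg": stats.get("total.requestlist.avg", 0.0),
            "requestlist_exceeded": stats.get("total.requestlist.exceeded", 0.0),
            "requestlist_overwritten": stats.get("total.requestlist.overwritten", 0.0),
        },
        "utilization": {
            "queries": queries,
            "cache_hit_rate": hits / queries if queries else 0.0,
        },
    }


# Made-up sample in the format that unbound-control prints.
SAMPLE = """\
total.num.queries=1000
total.num.cachehits=900
total.num.cachemiss=100
total.requestlist.avg=1.5
total.requestlist.exceeded=0
total.requestlist.overwritten=2
num.answer.rcode.SERVFAIL=10
"""

summary = summarize(parse_stats(SAMPLE))
```

Blackbox availability and latency are not covered here, since they come from external probes rather than from unbound's own counters.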

[1] http://www.brendangregg.com/usemethod.html
[2] https://landing.google.com/sre/book.html
> On Nov 23, 2016, at 11:55 AM, John Todd via Unbound-users 
> <[email protected]> wrote:
> 
> On 23 Nov 2016, at 0:49, Jaap Akkerhuis via Unbound-users wrote:
> 
>> Alexander via Unbound-users writes:
>> 
>>>     Hi everyone, can you help me monitor unbound dns with cacti?
>>> I tried to set up unbound and cacti, but the graphs are empty. I
>>> installed Dmitriy Demidov's package.
>> 
>> I once set up cacti to do this, but I'm not really happy with that.
>> 
>>>     Can you tell me about other tools for monitoring dns queues? Any tips
>>> for monitoring DNS?
>> 
>> I really prefer using munin. See the user contributed directory.
> 
> [snip]
> 
> I know it’s not a direct answer to the top part of the original question, but 
> perhaps it does answer the second part about monitoring queues.  We’ve 
> recently created an exporter for Unbound resolver for importation into 
> Prometheus, which seems to work quite well. We then use Grafana to extract 
> and visualize information from Prometheus. Building charts once you get the 
> hang of the query language is quite easy, and allows on-the-fly regeneration 
> of data visualization and complex comparisons/aggregations if you have 
> multiple servers, locations, or services. Here is an example chart that took 
> about 30 seconds to build.  There are also components for Prometheus 
> and/or Grafana that can generate alerts from metrics rather than just 
> visualizing them, but that is perhaps outside the scope of this thread. 
> There are a number of tools for importing other system-level data into 
> Prometheus, and it may be a good idea to investigate those to complement 
> or replace your existing monitoring systems if they do what you need. 
> Prometheus is not trivial to learn - the query language is mostly unlike 
> SQL, and there are quite a few ways to fail silently with what seem to be 
> legitimate queries, but if you know the ground truth of one system you can 
> iteratively try drawing graphs until you figure out the right way to do it.
> 
> If there is interest, we can try to work on getting the exporter we wrote into 
> a condition where it could be provided in the contrib directory. It uses the 
> “push gateway” method, which is not ideal but does work well enough. (Note: 
> “Prometheus Unbound” is also a lyrical drama by Percy Bysshe Shelley, which makes 
> keyword searching for prior work on this a bit difficult, so apologies if 
> someone has already done this project.  :-)
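
For anyone curious what a push-gateway style exporter roughly involves, here is a minimal sketch. It is not the exporter described above. It assumes metric names are derived by prefixing "unbound_" and replacing dots with underscores (which would match the unbound_num_query_type_A name used in the queries in this mail), and the gateway host, job and instance names are hypothetical. It only builds the request; sending it is left as a comment.

```python
import re
import urllib.request
from typing import Dict


def to_exposition(stats: Dict[str, float], prefix: str = "unbound_") -> bytes:
    """Render stats as Prometheus text exposition lines ('name value')."""
    lines = []
    for key in sorted(stats):
        # Assumed naming scheme: num.query.type.A -> unbound_num_query_type_A
        name = prefix + re.sub(r"[^a-zA-Z0-9_]", "_", key)
        lines.append(f"{name} {stats[key]}")
    return ("\n".join(lines) + "\n").encode("utf-8")


def build_push_request(gateway: str, job: str, instance: str,
                       body: bytes) -> urllib.request.Request:
    """Build (but do not send) a PUT to the Pushgateway grouping URL."""
    url = f"{gateway.rstrip('/')}/metrics/job/{job}/instance/{instance}"
    return urllib.request.Request(url, data=body, method="PUT")


payload = to_exposition({"num.query.type.A": 42.0})
request = build_push_request(
    "http://pushgateway.example:9091",  # hypothetical gateway address
    "unbound",                          # hypothetical job name
    "ns1",                              # hypothetical instance name
    payload,
)
# To actually push: urllib.request.urlopen(request)
```

In practice the stats dict would come from parsing "unbound-control stats" output on each resolver.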
> 
> 
> Prometheus overview:
> 
> To give an example of how a graph is built, this is the simplest query that I 
> performed to get the “A” QTYPE line of the chart. I just cut/pasted this into 
> a number of other queries in the same graph to create the other lines, 
> replacing “A” with “AAAA”, “MX”, etc.  The “irate” function computes each 
> server's per-second rate of change from the two most recent samples inside 
> a 1-minute window, and the “sum” operator then aggregates those rates across 
> all of the Unbound servers I am running (I have many).
> 
> sum(irate(unbound_num_query_type_A[1m]))
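
As a rough illustration of what that query computes (a simplified model, not Prometheus code): irate takes the last two samples of a counter inside the window and divides the increase by the elapsed seconds, and sum adds up the per-server results. The server names and sample values below are made up.

```python
from typing import Dict, List, Optional, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, counter value)


def irate(samples: List[Sample]) -> Optional[float]:
    """Per-second rate from the last two samples in the window."""
    if len(samples) < 2:
        return None  # not enough data in the window
    (t_prev, v_prev), (t_last, v_last) = samples[-2], samples[-1]
    elapsed = t_last - t_prev
    if elapsed <= 0:
        return None
    # A counter that went down is treated as a reset (restart),
    # so the increase is counted from zero.
    increase = v_last - v_prev if v_last >= v_prev else v_last
    return increase / elapsed


def sum_irate(series: Dict[str, List[Sample]]) -> float:
    """Sum the instant rates of all series that have enough samples."""
    rates = [irate(samples) for samples in series.values()]
    return sum(r for r in rates if r is not None)


# Made-up samples for two hypothetical servers within a 1-minute window.
window = {
    "ns1": [(0.0, 100.0), (15.0, 130.0), (30.0, 190.0)],
    "ns2": [(0.0, 50.0), (30.0, 20.0)],  # counter reset after a restart
}
total_qps = sum_irate(window)
```

Because only the last two samples are used, the result reacts quickly but is spikier than an average over the whole window.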
> 
> I then specified that this is a stacked chart, percentage-measured, with 60% as 
> the lower bound.  I could command-click any of the labels shown and they 
> would disappear from the graph and it would be re-drawn without that 
> statistic instantly. Alternately, I could click on just one of the labels and 
> only that graph line would be shown, re-drawing instantly.
> 
> A more complex query, limiting to systems that are tagged with “prod” (vs. 
> “dev”) and limiting to specific POPs is shown below.
> The “env” and “loc” tags are made up by us, and the contents of those tags 
> are set on the remote server before the metrics are collected.  This allows 
> arbitrary tagging of each metric so that it is possible to filter (think of 
> it as a modified “SELECT WHERE” statement.)  The $POP variable (again an 
> arbitrary name chosen by us) is consumed by Grafana using a concept called 
> “templates”, which puts a pull-down list at the top of the graph page 
> containing all of the POPs we have.  I can then select one 
> OR MORE POPs and the system will automatically aggregate all the data across 
> all those metrics and display it. I could put other filters in here that 
> would be parsed at the moment the graph is drawn.
> 
> sum(irate(unbound_num_query_type_A{env="prod",loc=~"$POP"}[1m]))
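
A small sketch of how that label filter behaves, under two assumptions: that a multi-value Grafana selection is expanded into a regex alternation such as (ams|lon), and that the =~ matcher is anchored to the whole label value. The POP names are hypothetical, and Python's re module stands in for the regex engine Prometheus uses, which is fine for simple alternations like this.

```python
import re
from typing import Dict, List


def expand_multi_value(selected: List[str]) -> str:
    """Turn the selected POPs into a regex alternation, e.g. (ams|lon)."""
    return "(" + "|".join(re.escape(value) for value in selected) + ")"


def select(series: List[Dict[str, str]], env: str,
           loc_regex: str) -> List[Dict[str, str]]:
    """Keep series where env matches exactly and loc fully matches the regex."""
    pattern = re.compile(loc_regex)
    return [
        labels for labels in series
        if labels.get("env") == env
        and pattern.fullmatch(labels.get("loc", "")) is not None
    ]


# Hypothetical label sets attached to the same metric on different servers.
all_series = [
    {"env": "prod", "loc": "ams"},
    {"env": "dev", "loc": "lon"},
    {"env": "prod", "loc": "sjc"},
    {"env": "prod", "loc": "amsterdam"},
]
pop_regex = expand_multi_value(["ams", "lon"])
matched = select(all_series, "prod", pop_regex)
```

Note that "amsterdam" is not selected: with an anchored match, a value that merely starts with "ams" does not count.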
> 
> In summary: Once you start putting your monitoring data into a TSDB or 
> TSDB-ish system like Prometheus (or InfluxDB, or OpenTSDB) and creating 
> visualizations with Grafana, you will wonder how you possibly survived 
> without it.  Even just using the most basic features is a huge win over older 
> systems, in my opinion, and moving up into the automation methods and 
> alerting methods as you get more experience is another win. If you’re looking 
> for a short intro to Prometheus, see the following presentation from 
> Monitorama 2015 by Jamie Wilkinson.
> 
> Video: https://vimeo.com/131581353
> Slides: 
> https://docs.google.com/presentation/d/1X1rKozAUuF2MVc1YXElFWq9wkcWv3Axdldl8LOH9Vik/edit#slide=id.ga150a40c0_0_193
> 
> If you’re looking for an introduction to Grafana, there are many - Google 
> will be a better guide than I.
> 
> JT
> 
> <Screen Shot 2016-11-23 at 10.28.43 AM.png>
