Petr Menšík via Unbound-users wrote:
> No. I think that was question on some conference, DNS-OARC perhaps.
>
> The proposal was what if bind9, unbound, knot-resolver and pdns-recursor
> could create the same format for their statistics. So prometheus could have
> only one statistics parser code. It might be exported to different path in
> filesystem and that should be enough. Only path and content should be
> different for different services. Format should ideally stay compatible.
> Then it would require less code as glue between statistics dashboards used
> and the DNS service itself.
>
> I think such common format would be great. I would prefer something json
> based. I can describe only bind9 and unbound statistics. Their format is
> very different, although quite a lot numbers could be similar.
>
> This is main statistics refactoring issue at bind9
>
> https://gitlab.isc.org/isc-projects/bind9/-/issues/38
>
> I am not sure where exactly did they talk about requirements for a new
> format, sorry. I think it was mentioned after some talk at some OARC
> recording, but do not remember which one.
I don't think BIND, Unbound, Knot Resolver, and PowerDNS Recursor should generate identical statistics. It would be nice if they exposed a de facto standard like the Prometheus/OpenMetrics text format on an HTTP endpoint, so that their metrics can be scraped and ingested by modern observability stacks. Currently, ingesting Unbound's metrics into a Prometheus-compatible stack requires deploying a third-party daemon [0] alongside Unbound to translate the bespoke "UBCT1" protocol (the protocol that unbound-control speaks to the Unbound daemon).

There are some metrics that count the number of times certain kinds of packets occur (queries/responses by QTYPE/OPCODE/RCODE, by transport, etc.) where you can arguably find some commonality between different DNS server implementations, because they are just counting objective, externally observable events. If you restrict your visibility to these externally observable properties of DNS transactions, then it might indeed be possible to share glue code and "statistics dashboards" between different DNS server implementations. But this is a fairly basic level of visibility.

DNS server implementations are going to have diverse internal architectures and implementation details, and some level of visibility (or "observability") into the health of those internals is highly desirable. For instance, I care very much about why Unbound might have dropped a query from a client. A single "number of client queries that were dropped" metric that aggregates every cause together is not very useful or actionable. (All it tells me is that the query reached the server, so I can exclude external possibilities like socket receive buffer overruns from the causes.) You need more fine-grained metrics that let you track down which mechanism(s) resulted in the query drops.
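For concreteness, here is a minimal sketch (Python, standard library only) of what a Prometheus-style /metrics endpoint looks like. The metric names, labels, and counts below are invented for illustration; they are not actual Unbound or BIND metrics, and a real server would of course serve its own live counters.

```python
# Minimal sketch of a Prometheus text-format /metrics endpoint using only
# the Python standard library. All metric names and values are hypothetical.
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical counters a resolver might track internally.
COUNTERS = {
    ("dns_queries_total", 'qtype="A"'): 1234,
    ("dns_queries_total", 'qtype="AAAA"'): 567,
    ("dns_responses_total", 'rcode="NOERROR"'): 1700,
    ("dns_responses_total", 'rcode="SERVFAIL"'): 42,
}

def render_exposition(counters):
    """Render counters in the Prometheus text exposition format."""
    lines = []
    seen = set()
    for (name, labels), value in sorted(counters.items()):
        if name not in seen:
            # Emit the TYPE comment once per metric family.
            lines.append(f"# TYPE {name} counter")
            seen.add(name)
        lines.append(f"{name}{{{labels}}} {value}")
    return "\n".join(lines) + "\n"

class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_exposition(COUNTERS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 9100), MetricsHandler).serve_forever()
```

The point is how little machinery this takes compared to inventing a bespoke format: any Prometheus-compatible scraper can consume this output directly, with no translation daemon in between.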
So Unbound has been getting more fine-grained metrics like [1, 2] that help explain exactly which implementation-specific mechanisms are resulting in query drops. It would be unreasonable to expect every implementation to have the same metrics as in [1, 2], because these are implementation-specific details that vary with each implementation's approach to solving various problems. It would also be unreasonable to take the union of all such implementation-specific metrics, add them to a single common format, and have implementations omit the ones that aren't relevant to them. (Or, even worse, have different implementations use the same metric names to mean totally different things, or sort-of-similar but not really the same things.)

So my recommendations are basically:

1) Don't innovate on the exposition format. DNS servers exist in a universe with many other kinds of servers that have had to deal with broadly similar issues; this is not a greenfield. Prometheus/OpenTelemetry already exists. If you come up with a bespoke XML, JSON, protobuf, etc. format, it will have to be converted to something else in order to import it into modern observability stacks, so just generate that format directly. (If you disagree, then by all means design an additional layer of internal abstraction and a pluggable module interface so you can support a multitude of different metrics exposition formats/transports.)

2) Every vendor should come up with their own naming scheme, organization, and definitions for the implementation-specific "health" metrics that fit their own software most naturally.

3) There may be some value in regularizing, across server implementations, the definitions of metrics that count externally visible properties of DNS transactions (the QTYPEs and RCODEs, etc.). But this kind of effort should be narrowly scoped to exclude the implementation-specific "health" metrics.
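As a sketch of what per-cause drop metrics could look like in the Prometheus text format: the "reason" label values below are hypothetical examples, not Unbound's actual metric or counter names.

```python
# Sketch of exposing implementation-specific drop causes as labeled counters
# rather than one aggregate number. All names and values are hypothetical.
drop_counters = {
    "mesh_state_limit": 3,     # hypothetical internal workload limit
    "wait_limit_exceeded": 17, # hypothetical per-client wait limit
    "malformed_query": 2,
}

def render_drops(counters):
    """Render drop causes as one labeled Prometheus counter family."""
    lines = ["# TYPE dns_queries_dropped_total counter"]
    for reason, value in sorted(counters.items()):
        lines.append(f'dns_queries_dropped_total{{reason="{reason}"}} {value}')
    return "\n".join(lines) + "\n"
```

A dashboard that only wants the aggregate can still compute it with a PromQL query like sum(dns_queries_dropped_total), while the per-reason breakdown remains available when you actually need to track down why queries are being dropped.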
[0]: https://github.com/letsencrypt/unbound_exporter
[1]: https://github.com/NLnetLabs/unbound/pull/1159
[2]: https://github.com/NLnetLabs/unbound/pull/1374

-- 
Robert Edmonds
[email protected]
