Re: [DISCUSS] Prometheus endpoint in CouchDB 4.x

Will Holley Thu, 24 Sep 2020 00:41:29 -0700

It's a good point about querying every node for cluster-wide data such as
_active_tasks (assuming a direct translation of the current endpoint),
though I think it's better to lean on the monitoring tool to aggregate data
across nodes rather than deduplicate. Possibly the simplest thing for a v1
would be to only expose the existing node-level metrics (_stats, _system)?
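
As a rough illustration of the duplication concern (a sketch only: the node
names and task ids are made up, and it assumes a directly translated
_active_tasks returns the same cluster-wide list from every node):

```python
# Hypothetical scrape results: every node returns the same cluster-wide list.
CLUSTER_WIDE = {
    "node1": ["repl-a", "repl-b", "index-c"],
    "node2": ["repl-a", "repl-b", "index-c"],
    "node3": ["repl-a", "repl-b", "index-c"],
}

# Hypothetical alternative: each node reports only the tasks it runs itself.
NODE_LOCAL = {
    "node1": ["repl-a"],
    "node2": ["repl-b", "index-c"],
    "node3": [],
}


def naive_total(scrapes):
    """Sum the per-node counts, as a plain sum over scraped nodes would."""
    return sum(len(tasks) for tasks in scrapes.values())


def deduplicated_total(scrapes):
    """Count distinct task ids across all scrapes."""
    return len({task for tasks in scrapes.values() for task in tasks})


print(naive_total(CLUSTER_WIDE))         # every task counted once per node
print(deduplicated_total(CLUSTER_WIDE))  # needs an explicit dedup step
print(naive_total(NODE_LOCAL))           # a plain sum is already correct
```

With node-local data the monitoring tool can simply sum across nodes; with
cluster-wide data somebody has to deduplicate first.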


Having also written an OpenMetrics exporter for CouchDB, one problem I
found is that the existing stats/system responses don't map cleanly to
OpenMetrics idioms. Some examples:

 * OpenMetrics uses a flat naming structure whereas _stats uses a
hierarchical structure, and automatically flattening the names using a
convention leads to nonsensical / duplicated names.
 * Some groupings of counters in _stats (e.g. the httpd_request_methods
group) should be represented as a single counter with multiple label values
(method="PUT|POST|GET|DELETE|COPY|HEAD"). For others (e.g. the couchdb
group containing auth_cache_hits, database_reads, etc), each field
represents a different counter.
 * Histograms can't be translated to OpenMetrics format purely from the
stats response - you need to understand the CouchDB config/what the
boundaries of the histogram buckets are.
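
To make the first two points concrete, here is a minimal sketch. The sample
dict is hand-written and only loosely shaped like a _stats response, and the
LABEL_GROUPS table is a hypothetical hand-maintained mapping, not something
CouchDB provides:

```python
# Hand-written sample, loosely shaped like a _stats response (not real output).
SAMPLE_STATS = {
    "couchdb": {
        "httpd_request_methods": {
            "GET": {"value": 120, "type": "counter"},
            "PUT": {"value": 7, "type": "counter"},
        },
        "auth_cache_hits": {"value": 42, "type": "counter"},
        "database_reads": {"value": 310, "type": "counter"},
    }
}

# Hypothetical mapping: groups whose children are label values of one counter.
LABEL_GROUPS = {("couchdb", "httpd_request_methods"): "method"}


def naive_flatten(stats, path=()):
    """Join the hierarchy with underscores; every leaf becomes its own metric."""
    names = []
    for key, child in stats.items():
        if isinstance(child, dict) and "value" in child:
            names.append("_".join(path + (key,)))
        else:
            names.extend(naive_flatten(child, path + (key,)))
    return names


def to_exposition(stats, label_groups):
    """Emit counter samples, folding the listed groups into one labelled counter."""
    lines = []

    def walk(node, path):
        for key, child in node.items():
            if "value" in child:
                value = child["value"]
                if path in label_groups:
                    # The key is a label value of the group's single counter.
                    name = "_".join(path) + "_total"
                    label = label_groups[path]
                    lines.append(f'{name}{{{label}="{key}"}} {value}')
                else:
                    # The key names a counter of its own.
                    name = "_".join(path + (key,)) + "_total"
                    lines.append(f"{name} {value}")
            else:
                walk(child, path + (key,))

    walk(stats, ())
    return lines


print("\n".join(naive_flatten(SAMPLE_STATS)))
print("\n".join(to_exposition(SAMPLE_STATS, LABEL_GROUPS)))
```

The naive version yields one metric per HTTP method, whereas the second
version needs per-group knowledge that the stats response itself does not
carry.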

Even if it ends up being a plugin/sidecar, I think it would be useful to
have a reference implementation that could be used to inform these
decisions and make it simpler to share assets which build on the
OpenMetrics output (e.g. Grafana dashboards, PromQL alert definitions).




On Wed, 23 Sep 2020 at 22:10, Tobias Gesellchen <gesel...@gmail.com> wrote:

> Hi,
>
> chiming in as maintainer of the already mentioned
> https://github.com/gesellix/couchdb-prometheus-exporter.
>
> My impression would be to consolidate the existing endpoints first (maybe
> at /_metrics, because /_info sounds too informal), which would make
> high-frequency scrapes more efficient. The current approach of the
> CouchDB-Prometheus-Exporter doesn’t feel right, because every node * every
> stats/system/active_tasks endpoint needs to be queried on each scrape. That
> endpoint should certainly be able to provide JSON format by default, which
> would already help a lot to improve the existing Prometheus exporter.
>
> Content negotiation via Accept header would be nice to respond with the
> Prometheus specific format. I wouldn’t prefer the workaround with the
> request parameter, though. Without too much knowledge about CouchDB
> internals: I’d suggest yet another endpoint /_prometheus which would
> provide text/plain (prometheus formatted) content by default. That endpoint
> could internally “delegate” to the /_metrics endpoint.
>
> While I might be biased, and knowing that other frameworks/tools already
> provide Prometheus stats out of the box, I personally tend to keep things
> separated. From an operational perspective it would be great to _not_ have
> to co-locate CouchDB with a sidecar-exporter, but on the other hand it would
> also be great if I could perform upgrades or configuration separately.
>
> Best
> Tobias
>
>
>
> > On 23. Sep 2020, at 22:43, Robert Samuel Newson <rnew...@apache.org> wrote:
> >
> > Hi,
> >
> > I don't see why this can't be a new endpoint (emitting the normal
> > Prometheus format) that CouchDB administrators can choose to enable (and
> > leave it disabled by default, returning a 404).
> >
> > I agree with the general view that content type negotiation doesn't
> > really work well in practice, and I don't much like the suggested ?accept=
> > hack.
> >
> > I am old and world-weary and have seen these sorts of things come and go
> > many times. Prometheus seems a fine option for now, and perhaps for a
> > while, but it feels like a plugin, not core, to me.
> >
> > B.
> >
> >> On 23 Sep 2020, at 17:25, Richard Ellis <ricel...@uk.ibm.com> wrote:
> >>
> >>> so we should absolutely make this info available in JSON
> >>
> >> This sounds like a good idea to me
> >>
> >>> we could fall back to a ?accept=prometheus option
> >>
> >> I'm opposed to adding endpoints that supply different content-type
> >> responses via non-standard means. The CouchDB API has some examples of
> >> this through history and it can make using those endpoints with standard
> >> tooling somewhat painful.
> >>
> >> A bit of quick searching seems to suggest that the format has its own
> >> project https://openmetrics.io/ - and this declares its text
> >> representation linking back to
> >> https://prometheus.io/docs/instrumenting/exposition_formats/#text-based-format
> >> which declares a Content-Type of "text/plain; version=0.0.4" - so
> >> defaulting to that, but following Joan's suggestion and switching to JSON
> >> for a supplied Accept: application/json in the standard way seems like a
> >> good choice to me.
> >>
> >> Rich
> >>
> >>
> >>
> >> From:   Jan Lehnardt <j...@apache.org>
> >> To:     dev@couchdb.apache.org
> >> Cc:     "Gesellchen, Tobias" <tobias.gesellc...@europace.de>
> >> Date:   23/09/2020 16:42
> >> Subject:        [EXTERNAL] Re: [DISCUSS] Prometheus endpoint in CouchDB
> >> 4.x
> >>
> >>
> >>
> >> Hi all,
> >>
> >> a few things to consider:
> >>
> >> 1. The idea of unifying our “get runtime info about CouchDB” endpoints
> >> into one is solid, as it is always weird to make sure you know which info
> >> you get where. We see this specifically in support engagements, where it
> >> is always awkward to ask for the results of multiple endpoints.
> >>
> >> 2. This directly leads to the question about what the endpoint should be
> >> called. I feel if it is a new endpoint, we should give it a new name.
> >> _info maybe, but feel free to bike shed away.
> >>
> >> 3. Next the question about per-node and per-cluster info/metrics/activity
> >> on the endpoint. It might be convenient to be able to ask any one node
> >> about what is going on in the entire cluster, rather than any one node,
> >> but some stats only make sense in the context of a single node. Maybe the
> >> result includes everything separated by node somehow.
> >>
> >> 4. Then the format: if this wasn’t about Prometheus and its custom format,
> >> we wouldn’t discuss any of this and just use JSON. Since we *do* want to
> >> target Prometheus with this, we have to talk about the format. Any of the
> >> above is useful for non-Prometheus consumers, so we should absolutely make
> >> this info available in JSON. And we can *also* send it in the Prometheus
> >> format. The “correct” HTTP-way of doing this would be to use the Accept
> >> header on the new endpoint, as Joan points out, but that’s often not an
> >> option, so we could fall back to a ?accept=prometheus option. This would
> >> also leave us open to add more formats in the future, as new standards
> >> arise.
> >>
> >> 5. That leads us to whether we want to do this. Every five or so years,
> >> new standards for these types of systems arise, and sometimes it is worth
> >> incorporating them (like we finally do with the SystemD compatible log
> >> formatter) and sometimes it is not and folks write tools to convert from
> >> our HTTP/JSON standard to whatever they need
> >> (https://github.com/gesellix/couchdb-prometheus-exporter).
> >>
> >> 6. We could also just bundle this exporter (although it is written in Go,
> >> which we currently don’t have as a dependency).
> >>
> >> * * *
> >>
> >> Personally, I think the Prometheus format is widely enough used to warrant
> >> inclusion, as long as we do it tastefully. I think a new endpoint with an
> >> additional ?accept= or similar URL-level override for the format would be
> >> a pragmatic, if not entirely *neat* approach. If we can build this all in
> >> Erlang, the better, if we wanna shortcut dev time and bundle the Go
> >> project, I might be more hesitant. On the per-node-or-per-cluster
> >> question, I don’t know enough about the Prometheus format and whether it
> >> allows us to send the equivalent of {nodes: { “node1”: {…}, “node2”: {…},
> >> “node3”: {…} }}, or whether it demands per-node output, in which case
> >> _active_tasks might get a bit awkward.
> >>
> >> Best
> >> Jan
> >>
> >> —
> >> Professional Support for Apache CouchDB:
> >> https://neighbourhood.ie/couchdb-support/
> >>
> >>
> >> 24/7 Observation for your CouchDB Instances:
> >> https://opservatory.app
> >>
> >>
> >>> On 22. Sep 2020, at 14:55, jiangph <jiangpeng...@hotmail.com> wrote:
> >>>
> >>> Hey all,
> >>>
> >>> We would like to add a Prometheus metrics endpoint for CouchDB and
> >>> wanted to see if the community would be interested in us contributing this
> >>> to CouchDB 4.x.
> >>>
> >>> Prometheus is a CNCF open-source project and the Prometheus metrics
> >>> endpoint format is supported by many monitoring tools. Its data model is
> >>> based around having a metric name which then contains a label name and a
> >>> label value:
> >>>
> >>> <metric name>{<label name>=<label value>, ...}
> >>>
> >>> And it supports the Counter, Gauge, Histogram, and Summary metric types.
> >>>
> >>> The idea for the new Prometheus endpoint, /_metrics, would be that the
> >>> endpoint is a consolidation of the _stats [1], _system [2], and
> >>> _active_tasks [3] endpoints.
> >>>
> >>> For _stats and _system, the conversion from JSON to Prometheus-based
> >>> format seems to be straightforward.
> >>>
> >>> JSON format:
> >>> {
> >>> "value": {
> >>> "min": 0,
> >>> "max": 0,
> >>> "arithmetic_mean": 0,
> >>> "geometric_mean": 0,
> >>> "harmonic_mean": 0,
> >>> "median": 0,
> >>> "variance": 0,
> >>> "standard_deviation": 0,
> >>> ...
> >>> "percentile": [
> >>> [
> >>>  50,
> >>>  0
> >>> ],
> >>> [
> >>>  75,
> >>>  0
> >>> ],
> >>> [
> >>>  90,
> >>>  0
> >>> ],
> >>> [
> >>>  95,
> >>>  0
> >>> ],
> >>> [
> >>>  99,
> >>>  0
> >>> ],
> >>> [
> >>>  999,
> >>>  0
> >>> ]
> >>> ],
> >>> "histogram": [
> >>> [
> >>>  0,
> >>>  0
> >>> ]
> >>> ]
> >>> }
> >>> }
> >>>
> >>> Prometheus-based format:
> >>>
> >>> couchdb_stats{value="min"} 0
> >>> couchdb_stats{value="max"} 0
> >>> couchdb_stats{value="percentile50"} 0
> >>> couchdb_stats{value="percentile75"} 0
> >>> couchdb_stats{value="percentile95"} 0
> >>>
> >>> For _active_tasks, the change will be a bit more complicated, and some
> >>> fields will be added to labels and tags.
> >>>
> >>> JSON format:
> >>>
> >>> {
> >>>  "checkpointed_source_seq": 68585,
> >>>  "continuous": false,
> >>>  "doc_id": null,
> >>>  "doc_write_failures": 0,
> >>>  "docs_read": 4524,
> >>>  "docs_written": 4524,
> >>>  "missing_revisions_found": 4524,
> >>>  "pid": "<0.1538.5>",
> >>>  "progress": 44,
> >>>  "replication_id": "9bc1727d74d49d9e157e260bb8bbd1d5",
> >>>  "revisions_checked": 4524,
> >>>  "source": "mailbox",
> >>>  "source_seq": 154419,
> >>>  "started_on": 1376116644,
> >>>  "target": "http://mailsrv:5984/mailbox",
> >>>  "type": "replication",
> >>>  "updated_on": 1376116651
> >>> }
> >>>
> >>> Prometheus-based format would look something like:
> >>>
> >>> couchdb_active_task{type="replication", source="mailbox", target="http://mailsrv:5984/mailbox", docs_count="docs_read"} 4524
> >>> couchdb_active_task{type="replication", source="mailbox", target="http://mailsrv:5984/mailbox", docs_count="docs_written"} 4524
> >>> couchdb_active_task{type="replication", source="mailbox", target="http://mailsrv:5984/mailbox", docs_count="missing_revisions_found"} 4524
> >>>
> >>>
> >>> Best regards,
> >>> Garren Smith
> >>> Peng Hui Jiang
> >>>
> >>> [1] https://docs.couchdb.org/en/latest/api/server/common.html#node-node-name-stats
> >>> [2] https://docs.couchdb.org/en/latest/api/server/common.html#node-node-name-system
> >>> [3] https://docs.couchdb.org/en/latest/api/server/common.html#active-tasks
> >
>
>
