On 16 Mar 20:39, Philip Potter wrote:
> Since this was a "dear all" on prometheus-users, I guess I can throw my
> opinion in.
>
> In my view, the one thing I most desire that is currently missing from the
> Prometheus docs is a rationale / design principles document. Something
> that explains *why* Prometheus is the way that it is.

Some questions are answered by https://github.com/prometheus/docs/pull/1054/files

>
> The sorts of things this document could help with:
>
> - what problems is Prometheus designed to solve? what is it *not*
>   designed to solve?
>   - it does metrics for operational monitoring - not for audit, not
>     100% accuracy
>   - a prometheus server should have no external dependencies, so that
>     it can run even when other things are breaking
>     - something something failure domains
>   - pull model
>     - the prometheus server is in charge of scheduling scrapes; the
>       clients don't set cadence
> - some example implications of design choices - features / patterns
>   of prometheus and how they tie back to the above
>   - why do counters monotonically increase? why does the rate() function
>     do what it does and how does it compare with other metrics tools?
>   - why does prometheus use local disk for storage? why is there no
>     replication option? if i'm used to setting up replication/backups
>     on all my stateful services, what should I do with Prometheus?
>     (I have seen someone decide that remote_write to Postgres/TimescaleDB
>     was absolutely required because we know how to operate Postgres
>     already)
>   - why does the increase() function return fractional values?
>   - why do we prefer small single-purpose exporters?
>
> I kind of feel I know the answers to these questions, but it has been
> picked up slowly through a long process of osmosis, and it makes it hard to
> upskill others at my workplace. I have become the local Prometheus expert
> because I've been playing with it long enough to have picked up this oral
> tradition; but I'd like to see it written down somewhere official so that I
> can point others at it. It's also very slow to pick this information up
> because you often learn it through a disjointed series of "why..?"
> questions (the second list above) and have to infer the underlying design
> principles (the first list above) piece by piece.
>
> There are bits of this kind of information dotted around the docs: the
> bottom of the overview page
> <https://prometheus.io/docs/introduction/overview/> has some very high level
> principles, but it doesn't feel complete; the "Best Practices" section has
> some of this information, but it varies massively in scope and level of
> detail (from "how do I design a console or dashboard" which is super
> high-level and not even specific to Prometheus, to "how should I instrument
> inner loops in my program" which is pretty low-level and specific to a
> particular problem; neither is really at the level I'm talking about here).
>
> The thing is, having used Prometheus for 2 years now it's clear that
> Prometheus and its community does have a very strong sense of shared design
> principles, and this is a good thing; but the lack of a place to point
> beginners at to explain these design principles has been a barrier for me
> teaching Prometheus to others at my workplace. I would really like to see
> a single page or set of pages to capture these ideas.
>
> Phil

--
 (o-    Julien Pivotto
 //\    Open-Source Consultant
 V_/_   Inuits - https://www.inuits.eu

