On 25/02/2021 13:40, Saurabh Vartak wrote:
> Hi Stuart,
>
> Thanks for the prompt response and all the guidance to date.
>
> The setup we are looking for is one where the user of the Grafana portal
> does not need access to any other piece of infrastructure
> (including the other Kubernetes clusters which are scraped for metrics).
> So our plan is to have all the Kubernetes clusters push
> their metrics to a centralized Prometheus and have Grafana
> sitting on top of only that centralized Prometheus server.
>
> I was able to set up the Prometheus server-to-server communication
> using Prometheus federation, as you suggested. However, I
> am still reading up on which metrics I might miss if I use
> federation. In all, I have the following three questions:
>
> 1. Are all the metrics forwarded using Prometheus federation, or
>    only a few?
> 2. Do the metrics that are forwarded using Prometheus federation
>    get stored in the TSDB of the destination Prometheus server?
> 3. What would be the best way to back up the centralized
>    Prometheus server? Do we need to use an external system like
>    Thanos, or are disk backups of the centralized Prometheus
>    server enough?

Trying to bring all data into a single central server isn't recommended
- resource requirements can quickly get very high as the number of time
series would likely be huge.
For your use case it sounds like a solution such as Cortex or Thanos
would be a good fit.
Instead of running a central Prometheus server, each local Prometheus
sends its data to an object store (e.g. an S3 bucket). That store is then
presented in a Prometheus-compatible way to allow queries from Grafana.
With federation, one method is to produce aggregate metrics within each
Prometheus using recording rules (e.g. sum a metric to remove the
instance or pod labels). Only those aggregates are then selected for
federation, possibly at a lower scrape frequency than the source server
uses. That way you have the full-resolution metrics in the local servers,
which can be used for per-pod queries, and aggregate metrics in the
central system, which can be used for "global" dashboards (services that
span clusters, or views of different geographic regions).
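
As a rough sketch of that approach (the metric name http_requests_total,
the rule name, the host names and the intervals are all made up for
illustration, so adjust them to your own setup), a recording rule on each
local Prometheus could look something like this:

```yaml
# Rule file loaded by each local Prometheus (via rule_files in prometheus.yml).
groups:
  - name: federation_aggregates
    interval: 1m
    rules:
      # Aggregate away the per-instance and per-pod labels.
      # http_requests_total is a hypothetical metric name.
      - record: cluster:http_requests:rate5m
        expr: sum without (instance, pod) (rate(http_requests_total[5m]))
```

The central Prometheus would then scrape the /federate endpoint of each
local server, selecting only the aggregated series:

```yaml
# Fragment of prometheus.yml on the central server.
scrape_configs:
  - job_name: federate
    scrape_interval: 60s      # can be slower than the local scrape interval
    honor_labels: true        # keep the labels exposed by the source servers
    metrics_path: /federate
    params:
      'match[]':
        # Only series whose name starts with "cluster:" are pulled.
        - '{__name__=~"cluster:.*"}'
    static_configs:
      - targets:
          # Hypothetical addresses of the local Prometheus servers.
          - prometheus-cluster-a.example.com:9090
          - prometheus-cluster-b.example.com:9090
```

Only series that match one of the match[] selectors are federated; nothing
is forwarded by default. Whatever is federated is scraped like any other
target, so it is stored in the TSDB of the central server.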
With that setup you could either run Grafana locally to each Prometheus
(which has the advantage of allowing dashboards to be viewed even if the
network or central server is broken), run a single central Grafana, or
use a combination of both. A central Grafana could query the central
Prometheus server and also be configured with additional Prometheus data
sources for each of the local servers, allowing both aggregated and
specific queries.
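
For the central Grafana, a data source provisioning file along these lines
is one way to set that up. This is only a sketch: the data source names
and URLs are placeholders, not anything from your environment.

```yaml
# Hypothetical Grafana provisioning file, e.g. provisioning/datasources/prometheus.yaml
apiVersion: 1
datasources:
  # Aggregated, cross-cluster metrics.
  - name: Prometheus (central)
    type: prometheus
    access: proxy
    url: http://prometheus-central.example.com:9090
    isDefault: true
  # Full-resolution metrics for one cluster; repeat per local server.
  - name: Prometheus (cluster-a)
    type: prometheus
    access: proxy
    url: http://prometheus-cluster-a.example.com:9090
```

With access set to proxy, the queries are made by the Grafana server
rather than by the user's browser, so dashboard users do not need network
access to the Prometheus servers themselves.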
--
Stuart Clark
--
You received this message because you are subscribed to the Google Groups
"Prometheus Users" group.
To view this discussion on the web visit
https://groups.google.com/d/msgid/prometheus-users/6adb1757-ea20-c4c0-206b-d8680d6ffa42%40Jahingo.com.