Thanks, Brian, for the guidance.

1) We are using Thanos Querier to consolidate the metrics across the
shards.

2) We are using Thanos Store for long-term storage, which more or less
serves our needs.


The only concern I have is that with shards we always have to consider
the "single point of failure" problem and how to address it technically.

For example, if any shard goes down we lose "x" minutes of metrics, and
because of that we can't achieve the fault tolerance of the so-called
four nines (99.99% availability).
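
To put a number on the four-nines target, here is a rough sketch of the
downtime it allows. The 30-day month and 365-day year are my own
simplifying assumptions, and the function name downtime_budget_minutes
is made up for illustration.

```python
def downtime_budget_minutes(availability: float, period_days: float) -> float:
    """Return the minutes of downtime allowed in a period.

    availability is a fraction (0.9999 for four nines); period_days is the
    length of the measurement window in days (an assumption of this sketch).
    """
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    total_minutes = period_days * 24 * 60
    return (1.0 - availability) * total_minutes


if __name__ == "__main__":
    # Hypothetical windows, chosen only to make the budget concrete.
    for label, days in (("30-day month", 30), ("365-day year", 365)):
        budget = downtime_budget_minutes(0.9999, days)
        print(f"{label}: {budget:.2f} minutes")
```

With these assumptions the budget is roughly 4.32 minutes per month or
52.56 minutes per year, so even a short gap on one shard can use up the
whole allowance for the metrics that shard holds.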

Any thoughts along these lines on how we can become more resilient?

On Sat, 5 Sep, 2020, 12:26 am Brian Candler, <[email protected]> wrote:

> Guidelines I have seen:
> - no more than 2m metrics per prometheus server
> - aim for <10k metrics per scrape target if possible
>
> If you're scaling really large then you might want to look at Thanos,
> Cortex etc.
>
> --
> You received this message because you are subscribed to the Google Groups
> "Prometheus Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/prometheus-users/aa7ad804-9e13-4476-ab7b-9ecc546cd1f8o%40googlegroups.com
> <https://groups.google.com/d/msgid/prometheus-users/aa7ad804-9e13-4476-ab7b-9ecc546cd1f8o%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>

