Thanks, Brian, for the guidance.

1) We are using Thanos Querier to consolidate the metrics across the shards.
2) We are using Thanos Store for long-term storage, which is serving our needs.

The only concern I have is that with shards, each shard is a single point of failure, and I am not sure how we technically address that. If any shard goes down, we lose "x" minutes of metrics, which means we can't achieve four nines (99.99%) availability.

Any thoughts along these lines on how we can become more resilient?

On Sat, 5 Sep, 2020, 12:26 am Brian Candler, <[email protected]> wrote:
> Guidelines I have seen:
> - no more than 2m metrics per prometheus server
> - aim for <10k metrics per scrape target if possible
>
> If you're scaling really large then you might want to look at Thanos,
> Cortex etc.
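
P.S. To put a number on the four nines target mentioned above, here is a rough Python sketch of the downtime budget it implies. The function name `downtime_budget_minutes` and the 30-day window are my own assumptions for illustration; nothing here comes from Prometheus or Thanos.

```python
def downtime_budget_minutes(availability: float, window_days: float = 30.0) -> float:
    """Return the minutes of unavailability allowed in the window.

    availability is a fraction, e.g. 0.9999 for four nines.
    window_days is a hypothetical measurement window (30 days assumed here).
    """
    window_minutes = window_days * 24 * 60
    return window_minutes * (1.0 - availability)


if __name__ == "__main__":
    # Four nines over an assumed 30-day window.
    print(f"{downtime_budget_minutes(0.9999):.2f} minutes per 30 days")  # 4.32
    # Same target over an assumed 365-day year.
    print(f"{downtime_budget_minutes(0.9999, 365):.2f} minutes per year")  # 52.56
```

So, under these assumptions, a single shard losing more than about 4 minutes of metrics in a month would already use up the whole budget, unless something else covers the gap.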

