Re: [prometheus-users] Prometheus HA strategies

Harald Koch Thu, 27 Feb 2020 07:52:19 -0800

On Thu, Feb 27, 2020, at 07:36, Riyan Shaik wrote:
> 
> I believe the downside or a corner case with the prometheus instances behind 
> LB is, if one of the instances goes down, then you may have gaps in your 
> graphs and you can't backfill (not supported by prom) the data once that 
> instance comes back up.

We're using Prometheus for three things: gathering statistics on long-term
usage changes (e.g. to know when a cluster needs to be scaled out), short-term
analysis of performance (e.g. why the heck did our message rate drop last
night? Oh right, the SAN disk latency went up by a factor of 10), and alerting
(e.g. messages to the Labs aren't flowing).

In all cases, short drop-outs in statistics gathering simply haven't been a
problem, other than that I've had to smooth a few alerts to prevent them from
resolving and refiring on a missed scrape (e.g. changing "up != 1" to
"avg_over_time(up[1m]) < 0.9").
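
To illustrate the smoothing, here is a rough Python sketch that simulates both
conditions over a flapping "up" series. It is not Prometheus code: the 5s
scrape interval, the sample data, and the helper names are all made up for the
example, and the window handling is a simplification of how a range selector
picks samples.

```python
from statistics import mean

SCRAPE_INTERVAL = 5   # seconds; hypothetical, chosen for the example
WINDOW = 60           # seconds; stands in for the [1m] range


def avg_over_time(samples, now, window):
    """Mean of the sample values whose timestamps fall in (now - window, now]."""
    in_window = [v for t, v in samples if now - window < t <= now]
    return mean(in_window) if in_window else None


def count_firings(states):
    """Count how often the condition goes from not-firing to firing."""
    firings, prev = 0, False
    for firing in states:
        if firing and not prev:
            firings += 1
        prev = firing
    return firings


# Made-up "up" series: healthy, then two failed scrapes, one good one,
# two more failed scrapes, then healthy again.
up_values = [1] * 12 + [0, 0, 1, 0, 0] + [1] * 12
samples = [(i * SCRAPE_INTERVAL, v) for i, v in enumerate(up_values)]

# Condition 1: up != 1, evaluated at every scrape.
raw = [v != 1 for _, v in samples]

# Condition 2: avg_over_time(up[1m]) < 0.9, evaluated at every scrape.
smoothed = [avg_over_time(samples, t, WINDOW) < 0.9 for t, _ in samples]

print("raw firings:     ", count_firings(raw))
print("smoothed firings:", count_firings(smoothed))
```

With this particular series the raw condition fires twice (it resolves on the
one good scrape in the middle), while the smoothed condition fires once and
stays firing until the failed scrapes have aged out of the window. The price
is that the smoothed alert both fires and resolves later than the raw one.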

In short, I'm happy trading off occasional gaps in my data for the simplicity
of Prometheus.

> Out of curiosity, I'd like to know why you decided to go with VM over
> Thanos? I would really like to hear your PoV and experiences with VM.

My personal experience was that setting up VictoriaMetrics as an aggregator for
use with Grafana was incredibly simple, while setting up Thanos was
just-less-simple-enough that I never succeeded. (It's not that Thanos is really
that difficult, but I work on a small, understaffed team and there are never
enough spare minutes :).

--
Harald
[email protected]

