Re: [prometheus-users] Prometheus HA strategies

Stuart Clark Thu, 27 Feb 2020 05:53:07 -0800

Prometheus isn't a clustered system by design, so it expects to have complete 
control of the data files. If another process starts changing the files it 
would quickly result in data corruption.


The big benefit of totally separate Prometheus servers without shared storage 
is simplicity. As the purpose of the platform is to record and alert on 
metrics, something that is less likely to fail itself is very valuable: a 
cluster failure that stops alerts and dashboards could be disastrous.

If you are worried about gaps from a simple load-balancer-based solution, look 
at Promxy or the other parts of Thanos. They query both instances and 
deduplicate, filling in any gaps.
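
To illustrate the idea, here is a minimal Python sketch of "query both and 
deduplicate". It is not how Promxy or Thanos actually implement it; the 
function name merge_replicas, the sample data and the bucketing by scrape 
interval are all assumptions made up for this example.

```python
from typing import Dict, List, Tuple

Sample = Tuple[int, float]  # (unix timestamp in seconds, value)


def merge_replicas(primary: List[Sample], secondary: List[Sample],
                   interval: int = 15) -> List[Sample]:
    """Merge the samples of one series as seen by two replicas.

    Timestamps are bucketed by the scrape interval so that samples the
    two replicas took a few seconds apart count as the same point.
    The primary's sample wins; the secondary only fills empty buckets.
    """
    merged: Dict[int, Sample] = {}
    for ts, value in secondary:
        merged[ts // interval] = (ts, value)
    for ts, value in primary:  # overwrite: the primary takes precedence
        merged[ts // interval] = (ts, value)
    return [merged[bucket] for bucket in sorted(merged)]


# Hypothetical data: replica A missed the scrapes around t=30 and t=45
# (say it was restarting), replica B scraped ~2s later each time.
a = [(0, 1.0), (15, 2.0), (60, 5.0)]
b = [(2, 1.0), (17, 2.0), (32, 3.0), (47, 4.0), (62, 5.0)]
print(merge_replicas(a, b))
# [(0, 1.0), (15, 2.0), (32, 3.0), (47, 4.0), (60, 5.0)]
```

Fixed buckets are the crude part: two samples that straddle a bucket boundary 
would both be kept, so treat this as a picture of the concept only.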

On 27 February 2020 12:36:56 GMT, Riyan Shaik <[email protected]> wrote:
>Thanks to both of you for your replies.
>
>> As has been discussed elsewhere, two Prometheus instances cannot share the
>> same data store. I'd also add that using NFS introduces extra, unnecessary
>> failure modes.
>
>
>You're right about adding another moving piece like NFS into the
>architecture. Is there any documentation as to why Prometheus instances
>can't share the same TSDB / datastore?
>
>> If the two instances are both scraping the same targets you don't need a
>> global view. I just point my Grafana instance at the load balancer sitting
>> in front of my parallel Prometheus instances; I've never noticed any
>> display glitches caused by the load balancer switching between instances.
>> The reality is that while their datasets may not be identical, they'll be
>> "close enough".
>
>
>I believe the downside, or corner case, of putting the Prometheus instances
>behind an LB is that if one of the instances goes down, you may have gaps
>in your graphs, and you can't backfill the data (not supported by
>Prometheus) once that instance comes back up.
>
>
>> You would only need deduplication at all if you were sending the data on to
>> another system (Prometheus federation, Thanos, Victoriametrics, etc.). I'm
>> currently experimenting with sending data from all of my instances (DEV,
>> paired TEST, paired PROD) to a single Victoriametrics server. I haven't yet
>> played with de-duplication though.
>
>
>In fact, I'm running the Thanos sidecar with my Prometheus instances to
>back up the metrics data. I haven't looked at the other components of
>Thanos just yet; it's a bit intimidating to start with, and the Thanos
>documentation doesn't cover bare-metal setups. Out of curiosity, I'd like
>to know why you decided to go with VM over Thanos. I would really like to
>hear your PoV and experiences with VM.
>
>Thanks.
>
>On Thursday, 27 February 2020 02:45:19 UTC+5:30, Harald Koch wrote:
>>
>>
>> On Wed, Feb 26, 2020, at 14:01, Riyan Shaik wrote:
>>
>> a) With an HA pair, the Prometheus data will be local to each of the
>> Prometheus instances. Is it a good idea to have these two Prometheus
>> instances write to some sort of network-mounted filesystem like NFS /
>> GlusterFS, so that the data is identical for both instances? Has anyone
>> tried this?
>>
>>
>> As has been discussed elsewhere, two Prometheus instances cannot share the
>> same data store. I'd also add that using NFS introduces extra, unnecessary
>> failure modes.
>>
>> b) With both members of the HA pair scraping the same targets, how do I
>> build a global view of these local Prometheus instances? AFAIK federation,
>> with another Prometheus instance scraping the HA pair, is the only way.
>> Is that right?
>>
>>
>> If the two instances are both scraping the same targets you don't need a
>> global view. I just point my Grafana instance at the load balancer sitting
>> in front of my parallel Prometheus instances; I've never noticed any
>> display glitches caused by the load balancer switching between instances.
>> The reality is that while their datasets may not be identical, they'll be
>> "close enough".
>>
>>
>> c) When the two instances of the HA pair scrape the same targets, are the
>> metric values identical or slightly different due to time offsets between
>> the scrapes? What happens if the scrape interval is 15 sec for Prometheus A
>> and 16 sec for Prometheus B: do I still need to dedup, since the values
>> will be different? What's the right strategy to dedup the metrics?
>>
>>
>> You would only need deduplication at all if you were sending the data on
>> to another system (Prometheus federation, Thanos, Victoriametrics, etc.).
>> I'm currently experimenting with sending data from all of my instances
>> (DEV, paired TEST, paired PROD) to a single Victoriametrics server. I
>> haven't yet played with de-duplication though.
>>
>> -- 
>> Harald
>> [email protected]
>>
>>
>
>-- 
>You received this message because you are subscribed to the Google
>Groups "Prometheus Users" group.
>To unsubscribe from this group and stop receiving emails from it, send
>an email to [email protected].
>To view this discussion on the web visit
>https://groups.google.com/d/msgid/prometheus-users/60abb8ed-6bf5-406f-a00e-9d43ea14af5a%40googlegroups.com.

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/BEEAB960-D17C-4D52-B877-2DBB1E09A460%40Jahingo.com.
