[prometheus-users] How to solve "Two hour in-memory prometheus data during upgrade/failover"

Shaam Dinesh Mon, 30 Mar 2020 08:40:20 -0700

Hi Team,

*Current setup:*


I am running Prometheus together with Thanos for extended storage, so that 
we can retain 2 months of data.


   1. 2 instances of Prometheus (replica 0/replica 1), both with 
   thanos-sidecar enabled
   2. thanos-sidecar writes data to a GCS bucket for long-term 
   storage
   3. Thanos querier connected to both instances and reading data via the 
   store gateway

*Challenge/Issue in current setup:*

Despite having long-term storage, we could still lose the most recent 
2 hours of data (storage.tsdb.max-block-duration=2h). 
How do we handle the cases below?


   - Taking a backup/snapshot of the older instance during an upgrade is an 
   option (but it cannot be seamless and will not provide fault tolerance 
   or post-facto recovery)
   - 2 HA instances of Prometheus handle DR scenarios, but in the extreme 
   case where both instances fail, all of the latest 2 hours of data is lost
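
For reference, the backup/snapshot option in the first bullet could be sketched roughly as below. This is only a minimal sketch, assuming the admin API is enabled on the instance (--web.enable-admin-api) and using the TSDB snapshot endpoint; the helper names (snapshot_url, parse_snapshot_name, take_snapshot) and the localhost address in the usage note are hypothetical and not part of our actual setup.

```python
import json
import urllib.parse
import urllib.request


def snapshot_url(base_url: str, skip_head: bool = False) -> str:
    """Build the TSDB snapshot endpoint URL for a Prometheus server."""
    # skip_head=false asks Prometheus to include the in-memory head block,
    # i.e. the recent data that is not yet in a persisted block.
    query = urllib.parse.urlencode({"skip_head": str(skip_head).lower()})
    return f"{base_url.rstrip('/')}/api/v1/admin/tsdb/snapshot?{query}"


def parse_snapshot_name(body: bytes) -> str:
    """Extract the snapshot directory name from the API response body."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(f"snapshot failed: {payload}")
    return payload["data"]["name"]


def take_snapshot(base_url: str, timeout: float = 30.0) -> str:
    """POST to the admin API and return the name of the created snapshot."""
    req = urllib.request.Request(snapshot_url(base_url), method="POST")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_snapshot_name(resp.read())
```

Calling take_snapshot("http://localhost:9090") before stopping the old instance would return a snapshot name; the snapshot itself is written under the snapshots/ directory inside the Prometheus data directory. It is still a manual, point-in-time step, which is why it does not feel seamless.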

*Question:*

Is there an efficient mechanism built into Prometheus to preserve the latest 
2 hours of data across restarts/crash recovery/upgrades? What measures/guidelines 
should we follow to minimize data loss with the least disruption?

