Hi Team,

*Current setup:*
I am running Prometheus together with Thanos for extended storage, so that we can retain 2 months of data.
1. 2 instances of Prometheus (replica 0 / replica 1), both with thanos-sidecar enabled
2. thanos-sidecar writes data to a GCS bucket for long-term storage
3. Thanos querier is connected to both instances and reads older data via the store gateway

*Challenge/Issue in current setup:*
Despite having long-term storage, we could still lose the most recent 2 hours of data (storage.tsdb.max-block-duration=2h), because the sidecar only uploads completed blocks.
- How do we handle the below cases?
  - Taking a backup/snapshot of the older instance during an upgrade is an option, but it is not seamless and does not provide fault tolerance or help after the fact.
  - 2 HA instances of Prometheus handle DR scenarios, but in the extreme case where both instances fail, all of the last 2 hours of data is lost.

*Question:*
Is there an efficient mechanism built into Prometheus to preserve the latest 2 hours of data across restarts, crash recovery, and upgrades? What measures/guidelines should we follow to minimize data loss with as little disruption as possible?

--
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/ff63e213-c9fe-44d3-ace6-354ac7d855f5%40googlegroups.com.
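
For reference, the setup described above corresponds roughly to the following container arguments for one replica. This is only a sketch and not the exact configuration: the Kubernetes-style layout, all paths, the port, the bucket name, and the min-block-duration value are assumptions; only the 2h max-block-duration and the GCS target come from the description.

```yaml
# Sketch only: one Prometheus replica with its Thanos sidecar.
# Paths, port, and file names below are placeholders (assumptions).
containers:
  - name: prometheus
    args:
      - --config.file=/etc/prometheus/prometheus.yml
      - --storage.tsdb.path=/prometheus
      - --storage.tsdb.min-block-duration=2h   # assumed equal to max
      - --storage.tsdb.max-block-duration=2h   # value stated in the post
  - name: thanos-sidecar
    args:
      - sidecar
      - --tsdb.path=/prometheus                # same volume as Prometheus
      - --prometheus.url=http://localhost:9090
      - --objstore.config-file=/etc/thanos/gcs.yml
---
# /etc/thanos/gcs.yml (bucket name is a placeholder)
type: GCS
config:
  bucket: example-thanos-bucket
```

With this layout, anything newer than the last completed 2h block exists only on the local TSDB path of each replica, which is the window the question is about.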

