I have a similar case. I would like to use remote write to collect metrics from multiple namespaces/clusters; however, federation seems much more reliable to me. A federation endpoint is just another scrape target: in case of a network failure (or any other failure) I will get an alert that the federation endpoint is down. With remote write I risk staying blind. I see no clear mechanism to be sure I'm getting the metrics =/
What are the possible solutions in this case?

On Wednesday, 20 July 2022 at 18:16:29 UTC+2 [email protected] wrote:

> @Stuart: I agree with most of the ideas you say :-) I see remote-write as
> the most appropriate metrics forwarding for my deployment use case.
> Using federation is not good in terms of interface
> standardization, HA of monitoring stack, and feature support.
> For the above case, I have functions and a dedicated set
> of engineers who own such workload to query individual instances, and the
> global instance is used as centralized monitoring.
> I was looking at this
> <https://github.com/prometheus/prometheus/issues/5666> closed bug, raised
> on Prometheus in the 2019 Summer. To my understanding, there are
> performance issues with remote-write but most of them are resolved and the
> community sees remote-write to perform better when compared to the
> federation. Am I thinking correctly?
> Could you clarify the performance comparison between
> remote-write and federation?
>
> /Teja
>
> On Tuesday, July 19, 2022 at 5:02:11 PM UTC+2 Stuart Clark wrote:
>
>> On 19/07/2022 13:24, tejaswini vadlamudi wrote:
>> > @Ben: Makes a point, but getting Thanos or Cortex into the picture
>> > could be a way forward after some time. For now, do you think it is
>> > good enough to use remote-write instead of federation? From a
>> > performance and resource consumption POV, do you see remote-write as
>> > the way-forward?
>> >
>> With remote write you could use agent mode, so you don't have to have
>> local storage other than for the destination instance.
>>
>> However again it depends what you are trying to achieve and why you have
>> suggested having four instances. Are you wanting to query all four
>> instances or only the "global" one? Are you wanting to copy all data to
>> the "global" instance or only some metrics? Every data point, or only at
>> a lower frequency?
>>
>> If you are intending to copy all data (both metrics & data points) that
>> leans towards remote write as federation works differently. But in that
>> case there doesn't seem to be any advantage in having the extra three
>> instances at all (unless you are intending on doing local querying,
>> alerting or recording rules) - so I'd just have a single instance that
>> scrapes all namespaces.
>>
>> Alternatively if you are needing to have separate instances with local
>> storage/querying then I'd probably not look to copy all the data to the
>> "global" instance (which just doubles storage and memory usage) and
>> either use remote write for a much smaller subset of metrics, federation
>> with a slower scrape rate/reduced set of metrics, or as Ben suggested
>> something like Thanos (other options exist as well) to do away with the
>> fourth instance entirely and distribute the queries to the individual
>> instances instead.
>>
>> Maybe if you could explain a bit about what the design is hoping to
>> achieve it would help us advise better?
>>
>> --
>> Stuart Clark
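
On the "staying blind" concern with remote write raised at the top of this message: one possible approach is a dead-man's-switch check on the receiving side, which flags any expected sender whose newest sample is missing or too old. The sketch below is only an illustration under stated assumptions, not an established Prometheus mechanism. It assumes each sending Prometheus attaches a `cluster` external label (hypothetical name), that the list of expected clusters is known in advance, and that the input is the `data.result` array returned by the Prometheus HTTP API for an instant query such as `max by (cluster) (timestamp(up))`. The function name and the threshold are also hypothetical.

```python
from typing import Dict, Iterable, List


def find_silent_sources(
    result: List[dict],
    expected: Iterable[str],
    now: float,
    max_age_seconds: float = 120.0,
    label: str = "cluster",  # assumed external label set by each sender
) -> Dict[str, str]:
    """Return {source: "missing" | "stale"} for senders that look silent.

    `result` is the `data.result` list of an instant-vector response, where
    each element looks like {"metric": {...}, "value": [eval_ts, "sample"]}.
    For a query built on timestamp(), the sample value is itself the Unix
    time of the newest ingested sample.
    """
    newest: Dict[str, float] = {}
    for sample in result:
        name = sample.get("metric", {}).get(label)
        if name is None:
            continue  # series without the label cannot be attributed
        ts = float(sample["value"][1])
        newest[name] = max(ts, newest.get(name, ts))

    problems: Dict[str, str] = {}
    for name in expected:
        if name not in newest:
            # Series drop out of instant queries once they go stale, so a
            # sender that stopped writing a while ago shows up as missing.
            problems[name] = "missing"
        elif now - newest[name] > max_age_seconds:
            problems[name] = "stale"
    return problems
```

The explicit `expected` list is the important design choice here: once a sender has been silent for longer than the query lookback window, its series simply disappear from the result, so age alone cannot reveal it. A small job could call this with `time.time()` as `now` and alert on any non-empty result, which gives roughly the same "endpoint is down" signal that a federation scrape failure provides.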

