Thank you very much Stuart :) Regarding implementing "split your list of targets across multiple servers": in our environment the multiple jobs currently share the same ConfigMap. So in order to split the target list, should I create a separate ConfigMap for each Prometheus instance? I'm not sure if that is the correct approach.
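To make the question concrete, here is a rough sketch of what I have in mind (all names and targets below are made up, not our real config): one ConfigMap per Prometheus instance, each holding a different slice of the target list.

```yaml
# configmap-prometheus-0.yaml (hypothetical example)
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config-0
data:
  prometheus.yml: |
    scrape_configs:
      - job_name: 'app'
        static_configs:
          - targets: ['app-1:9100', 'app-2:9100']   # first half of the targets
---
# configmap-prometheus-1.yaml (hypothetical example)
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config-1
data:
  prometheus.yml: |
    scrape_configs:
      - job_name: 'app'
        static_configs:
          - targets: ['app-3:9100', 'app-4:9100']   # second half of the targets
```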
On Wednesday, June 9, 2021 at 2:30:30 PM UTC+8 Stuart Clark wrote:
> On 09/06/2021 07:16, nina guo wrote:
> > Thank you very much.
> > May I ask if there is a way to make multiple Prometheus instances scrape different targets?
> > Comparing the two solutions, scraping the same targets vs scraping different targets, which is better?
> >
> > On Monday, June 7, 2021 at 6:08:47 PM UTC+8 Stuart Clark wrote:
> >> When doing autoscaling (not specifically with Prometheus, but anything) you need to ensure
> >> that you don't have too many changes happening at once, otherwise you might start rejecting
> >> requests (if all instances are restarting at the same time).
> >>
> >> This would generally be done via things like pod disruption budgets. For a pair of Prometheus
> >> servers I'd not want more than one change at once. For other systems I might go as far as
> >> N-1 changes at once.
>
> Yes, sharding is a standard solution when wanting to scale Prometheus performance.
>
> The two options are for different use cases and work together. A single Prometheus server can
> handle a certain number of targets/queries based on both the number of metrics being scraped
> and the CPU/memory assigned to that server. Above that level you would look to split your list
> of targets across multiple servers. It might also make sense to do that splitting for
> organisational reasons - different servers split by product, service, location, etc., managed
> by different teams, for example. So you might have a server in location X and two servers in
> location Y (product A and product B).
>
> You might have additional, more central servers for global alerts, using federation or a system
> such as Thanos, not to combine all metrics together (which would be a single point of failure
> and require massive resources) but to allow for a consolidated view.
>
> Alongside this you would use pairs of Prometheus servers for HA, so that if a single server
> isn't operating (failure, maintenance, etc.) you don't lose metrics. You might run a system
> such as promxy or Thanos in front of each pair to handle deduplication. So in the example of
> 3 groups of Prometheus servers (X, AY & BY) they would actually be HA pairs, so 6 servers in
> total. If using a system such as Kubernetes you'd need to ensure that any changes are limited
> (e.g. via pod disruption budgets) so the second pod isn't stopped/replaced while the first is
> out of action.
>
> --
> Stuart Clark
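For reference, my understanding of the pod disruption budget point is something like the following minimal sketch (the name and the app=prometheus label are placeholders for our setup, not anything from Stuart's reply): it limits voluntary disruptions (e.g. node drains or evictions) so that at most one Prometheus pod of the HA pair is unavailable at a time.

```yaml
# Hypothetical PodDisruptionBudget for an HA pair of Prometheus pods,
# assuming the pods carry the label app=prometheus.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: prometheus-pdb
spec:
  maxUnavailable: 1          # never allow more than one Prometheus replica to be disrupted at once
  selector:
    matchLabels:
      app: prometheus
```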

