On 18/12/2020 14:50, Rohit Ramkumar wrote:
Hi,

I'm running a service in Cloud Run (https://cloud.google.com/run) and wondering what the best practice is here for setting up Prometheus. Specifically, I'm wondering how to handle the case when there are multiple container instances running behind a single Cloud Run API endpoint.

If there is only one container instance ever, then this is easy. I can simply deploy the Prometheus server along with my application server and expose it. Clients can hit the Cloud Run endpoint and get the metrics. However, if there is more than one container instance (during autoscaling, for example), how will this work? Wouldn't a client request for metrics get sent to an arbitrary backend? Is using a push gateway the best practice in this case?

I'll start by saying that I'm not all that familiar with Google Cloud, as we mostly use AWS, but in terms of good practice for Prometheus the answer is always to access the underlying instances/pods/containers directly, not via a load balancer. I'd normally use one of the Service Discovery (SD) mechanisms to find them (e.g. Kubernetes SD for pods or EC2 SD for EC2 instances). Hopefully you can do something similar with GCE SD (https://prometheus.io/docs/prometheus/latest/configuration/configuration/#gce_sd_config).
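As a rough sketch of what that looks like in prometheus.yml (project, zone and port here are placeholders, and I can't say whether Cloud Run containers are actually visible to GCE SD — this is the shape of the config, not a verified recipe):

```yaml
scrape_configs:
  - job_name: 'gce-instances'
    gce_sd_configs:
      - project: my-gcp-project   # placeholder project ID
        zone: us-central1-a       # placeholder zone
        port: 9090                # port your metrics endpoint listens on
    relabel_configs:
      # Use the instance name rather than the raw address as the instance label
      - source_labels: [__meta_gce_instance_name]
        target_label: instance
```

The point is that each discovered instance becomes its own scrape target, so you see per-instance metrics instead of whatever the load balancer happens to route you to.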

If it isn't possible to connect to such instances (for example, Lambdas in AWS), I would then look at connecting the cloud's native metrics system to Prometheus. For AWS I'd look at using the CloudWatch Exporter. It looks like there is a Stackdriver Exporter, which I think would be the equivalent for Google Cloud?
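If you go the exporter route, Prometheus just scrapes the exporter like any other target. A sketch, where the hostname is a placeholder (I believe 9255 is the stackdriver-exporter's default port, but check its docs):

```yaml
scrape_configs:
  - job_name: 'stackdriver'
    # The exporter translates Google Cloud (Stackdriver) metrics into the
    # Prometheus exposition format on its own /metrics endpoint.
    static_configs:
      - targets: ['stackdriver-exporter.example.com:9255']  # placeholder host
```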

The Push Gateway isn't designed for, and is a very poor fit for, this sort of use case. The Push Gateway is really for short-lived processes that can't be scraped directly because of the limited time they exist (cron jobs, for example). Equally, it works best when there is only a single instance (or a fixed number of parallel instances) of that short-lived process; for a cron job you'd expect only a single run every configured period.

When you push metrics to the Push Gateway you replace the previous set for that grouping, so multiple instances (or jobs) each need their own grouping key. If the number of instances is dynamic, you end up with metrics that still exist in the Push Gateway for instances that no longer exist in reality. People then engineer something to keep the Push Gateway "tidy", but what you end up with is complex and probably not that reliable.
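For completeness, in the cases where a Push Gateway is genuinely appropriate, Prometheus scrapes it like any other target, with honor_labels set so the job/instance labels pushed by the batch job survive the scrape instead of being overwritten. A sketch with a placeholder hostname (9091 is the Push Gateway's default port):

```yaml
scrape_configs:
  - job_name: 'pushgateway'
    honor_labels: true   # keep the pushed job/instance labels
    static_configs:
      - targets: ['pushgateway.example.com:9091']  # placeholder host
```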

So, in short, the Push Gateway is unlikely to be useful at all for your use case. Instead, try to connect to the instances directly (behind the load balancer), and if that isn't possible, look at integrating with the Google metrics system.

--
You received this message because you are subscribed to the Google Groups "Prometheus Users" group.
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/48a7ebcb-22b2-ea49-c02f-8d18199e10c0%40Jahingo.com.