On 18/12/2020 14:50, Rohit Ramkumar wrote:
Hi,

I'm running a service in Cloud Run (https://cloud.google.com/run) and wondering what the best practice is here for setting up Prometheus. Specifically, I'm wondering how to handle the case when there are multiple container instances running behind a single Cloud Run API endpoint.

If there is only one container instance ever, then this is easy. I can simply deploy the Prometheus server along with my application server and expose it. Clients can hit the Cloud Run endpoint and get the metrics. However, if there is more than one container instance (during autoscaling, for example), how will this work? Wouldn't a client request for metrics get sent to an arbitrary backend? Is using a push gateway the best practice in this case?

I'll start by saying that I'm not all that familiar with Google Cloud, as we mostly use AWS, but in terms of good practice for Prometheus the answer is always to access the underlying instances/pods/containers directly, not via a load balancer. I'd normally use one of the Service Discovery (SD) mechanisms to find them (e.g. Kubernetes SD for pods or EC2 SD for EC2 instances). Hopefully you can do something similar with GCE SD (https://prometheus.io/docs/prometheus/latest/configuration/configuration/#gce_sd_config).
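As a rough sketch of what that looks like in prometheus.yml (project, zone and port here are placeholders, and I can't say whether Cloud Run containers are actually visible to GCE SD — this is the shape of the config, not a verified recipe):

```yaml
scrape_configs:
  - job_name: 'gce-instances'
    gce_sd_configs:
      - project: my-gcp-project   # placeholder project ID
        zone: us-central1-a       # placeholder zone
        port: 9090                # port your metrics endpoint listens on
    relabel_configs:
      # Use the instance name rather than the raw address as the instance label
      - source_labels: [__meta_gce_instance_name]
        target_label: instance
```

The point is that each discovered instance becomes its own scrape target, so you see per-instance metrics instead of whatever the load balancer happens to route you to.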

If it isn't possible to connect to such instances (for example, Lambdas in AWS), I would then look at connecting the cloud's native metrics system to Prometheus. For AWS I'd look at using the CloudWatch Exporter. It looks like there is a Stackdriver Exporter, which I think would be the equivalent for Google Cloud?
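If you go the exporter route, Prometheus just scrapes the exporter like any other target. A sketch, where the hostname is a placeholder (I believe 9255 is the stackdriver-exporter's default port, but check its docs):

```yaml
scrape_configs:
  - job_name: 'stackdriver'
    # The exporter translates Google Cloud (Stackdriver) metrics into the
    # Prometheus exposition format on its own /metrics endpoint.
    static_configs:
      - targets: ['stackdriver-exporter.example.com:9255']  # placeholder host
```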

The Push Gateway isn't designed for, and is a very poor fit for, this sort of use case. The Push Gateway is really for short-lived processes that can't be scraped directly because of the limited time they exist (cron jobs, for example). Equally, it works best when there is only a single instance (or a fixed number of parallel instances) of that short-lived process; for a cron job you'd expect only a single run every configured period.

When you push metrics to the Push Gateway you replace the previous set for that grouping, so multiple instances (or jobs) each need their own grouping key. If the number of instances is dynamic, you end up with metrics that still exist in the Push Gateway for instances that no longer exist in reality. People then engineer something to keep the Push Gateway "tidy", but what you end up with is complex and probably not that reliable.
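For completeness, in the cases where a Push Gateway is genuinely appropriate, Prometheus scrapes it like any other target, with honor_labels set so the job/instance labels pushed by the batch job survive the scrape instead of being overwritten. A sketch with a placeholder hostname (9091 is the Push Gateway's default port):

```yaml
scrape_configs:
  - job_name: 'pushgateway'
    honor_labels: true   # keep the pushed job/instance labels
    static_configs:
      - targets: ['pushgateway.example.com:9091']  # placeholder host
```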

So, in short, the Push Gateway is unlikely to be useful at all for your use case. Instead, try to connect to the instances directly (behind the load balancer), and if that isn't possible, look at integrating with the Google metrics system.

--
You received this message because you are subscribed to the Google Groups "Prometheus Users" group.
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/48a7ebcb-22b2-ea49-c02f-8d18199e10c0%40Jahingo.com.