I always knew Stackdriver existed but didn't think to incorporate it here. Thanks! This is helpful.
On Fri, Dec 18, 2020 at 11:28 AM Stuart Clark <[email protected]> wrote:

> On 18/12/2020 14:50, Rohit Ramkumar wrote:
> > Hi,
> >
> > I'm running a service in Cloud Run (https://cloud.google.com/run) and
> > wondering what the best practice is here for setting up Prometheus.
> > Specifically, I'm wondering how to handle the case where there are
> > multiple container instances running behind a single Cloud Run API
> > endpoint.
> >
> > If there is only ever one container instance, then this is easy: I can
> > simply deploy the Prometheus server along with my application server
> > and expose it. Clients can hit the Cloud Run endpoint and get the
> > metrics. However, if there is more than one container instance (during
> > autoscaling, for example), how will this work? Wouldn't a client
> > request for metrics get sent to an arbitrary backend? Is using a push
> > gateway the best practice in this case?
>
> I'll start by saying that I'm not all that familiar with Google Cloud,
> as we mostly use AWS, but in terms of good practice for Prometheus the
> answer is always to access the underlying instances/pods/containers
> directly and not via a load balancer. I'd normally use one of the
> Service Discovery (SD) mechanisms to find those (e.g. Kubernetes SD for
> pods or EC2 SD for EC2 instances). Hopefully you can do something
> similar with the GCE SD
> (https://prometheus.io/docs/prometheus/latest/configuration/configuration/#gce_sd_config).
>
> If it isn't possible to connect to such instances (for example, for
> Lambdas in AWS), I would then look to connect the cloud's native
> metrics system to Prometheus. For AWS I'd look at using the CloudWatch
> Exporter. It looks like there is a Stackdriver Exporter, which I think
> would be the equivalent for GCE?
>
> The Push Gateway isn't designed for, and is a very poor fit for, these
> sorts of use cases.
> The Push Gateway is really for short-lived processes that can't be
> directly scraped due to the limited time they exist (for example, cron
> jobs). Equally, it works best when there is only a single instance (or
> a fixed number of parallel instances) of that short-lived process
> (e.g. for a cron job you'd expect only a single run every configured
> period). When you send metrics to the Push Gateway you replace the
> previous set, so for multiple instances (or jobs) you'd use different
> grouping keys. If the number of instances is dynamic, you'd end up
> with metrics still sitting in the Push Gateway for instances that no
> longer exist in reality. People then engineer something which tries to
> keep the Push Gateway "tidy", but you end up with something that is
> complex and probably not that reliable.
>
> So in short, the Push Gateway is unlikely to be useful at all for your
> use case. Instead, try to connect to the instances directly (behind
> the load balancer) and, if that isn't possible, look at integrating
> with the Google metrics system.

--
You received this message because you are subscribed to the Google Groups "Prometheus Users" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/CANuTqDKv8Gv7gcswsL4QW5F%3DgV%2BwDdvstFWUA5tf2aQyEp4eSQ%40mail.gmail.com.
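For anyone finding this thread later: a minimal sketch of what the suggested GCE SD approach might look like in prometheus.yml. The project, zone, and port values are placeholders, and this assumes plain GCE instances (as noted above, Cloud Run containers may not be discoverable this way at all):

```yaml
scrape_configs:
  - job_name: "my-app"            # placeholder job name
    gce_sd_configs:
      - project: "my-gcp-project" # placeholder GCP project ID
        zone: "us-central1-a"     # placeholder zone
        port: 9090                # port the app serves /metrics on
    relabel_configs:
      # Use the discovered instance name as the instance label
      # instead of the raw IP:port target address.
      - source_labels: [__meta_gce_instance_name]
        target_label: instance
```

With this, Prometheus discovers and scrapes each instance directly, so autoscaled instances each get their own time series rather than being hidden behind the load balancer.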
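To make the replacement semantics concrete, here is a toy, stdlib-only Python model of how a push gateway stores metrics per grouping key. This is purely an illustration of the behaviour described above, not the real Pushgateway API; all names are made up:

```python
# Toy model of Pushgateway storage: each push REPLACES the entire
# metric set stored under its grouping key (e.g. job + instance).
gateway = {}  # (job, instance) -> {metric name: value}

def push(job, instance, metrics):
    # A push replaces everything previously stored for this key;
    # it never merges with or expires earlier data.
    gateway[(job, instance)] = dict(metrics)

# Two autoscaled instances each push their own metrics.
push("myapp", "instance-1", {"requests_total": 10})
push("myapp", "instance-2", {"requests_total": 7})

# instance-2 is later scaled away, but only instance-1 keeps pushing.
push("myapp", "instance-1", {"requests_total": 25})

# instance-2's last-pushed metrics linger: Prometheus would keep
# scraping this stale data until someone deletes the group by hand.
print(sorted(gateway))
print(gateway[("myapp", "instance-2")])
```

This is exactly the "tidiness" problem: nothing ever removes the `instance-2` group, so dynamic instance counts leave stale series behind.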

