We have a Consul cluster of 3 members and about 1k services. consul_exporter has been using significantly more CPU than before and is also logging this:
level=error ts=2020-06-16T23:56:46.593Z caller=consul_exporter.go:400 msg="Failed to query service health" err="Get \"http://consul.service:8500/v1/health/service/[service name]?stale=\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"

It is running as a Docker container in Nomad. I bumped the CPU resource from the default to 900 MHz and raised consul.timeout to 2s. This has improved things, but we still receive this error sporadically.

I haven't had a chance to dig through the entire source yet, but I am also wondering why consul_exporter has so many open connections to the same 3 Consul servers:

$ netstat | grep :8500 | wc -l
13653

Why would the connections remain open, and if they do remain, why are they not reused? I suspect we may be hitting the issue below, but I am hoping for further clarification:

https://github.com/prometheus/consul_exporter/issues/102

Thanks!
Dennis

