We have a Consul cluster of 3 members and about 1k services. 
consul_exporter has been using significantly more CPU and is also logging 
this:

level=error ts=2020-06-16T23:56:46.593Z caller=consul_exporter.go:400 
msg="Failed to query service health" err="Get 
\"http://consul.service:8500/v1/health/service/[service name]?stale=\": 
context deadline exceeded (Client.Timeout exceeded while awaiting headers)"

It is running as a Docker container in Nomad. I bumped the CPU resource 
from the default to 900 MHz and also raised consul.timeout to 2s. This has 
improved things, but we still sporadically receive this error. I haven't 
had a chance to dig through the entire source yet, but I'm also wondering 
why consul_exporter has so many open connections to the same 3 Consul servers:

$ netstat | grep :8500 | wc -l

13653
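
The raw count doesn't say what state those sockets are in. Sockets in 
TIME_WAIT would point to connections being opened and closed for every 
request, while ESTABLISHED ones would point to connections being held open. 
Below is a rough sketch of a helper that tallies the states. It assumes 
Linux `netstat -tan` style output (state in the sixth column); the function 
name and the sample rows are made up for illustration and are not real 
output from our hosts.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// tallyStates counts TCP states for rows of `netstat -tan`-style output
// whose local or foreign address ends with the given port suffix
// (for example ":8500"). Hypothetical helper, for illustration only.
func tallyStates(netstatOutput, port string) map[string]int {
	counts := make(map[string]int)
	scanner := bufio.NewScanner(strings.NewReader(netstatOutput))
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		// Assumed column layout: Proto Recv-Q Send-Q Local Foreign State
		if len(fields) < 6 || !strings.HasPrefix(fields[0], "tcp") {
			continue
		}
		if strings.HasSuffix(fields[3], port) || strings.HasSuffix(fields[4], port) {
			counts[fields[5]]++
		}
	}
	return counts
}

func main() {
	// Made-up sample rows, not captured from a real system.
	sample := `tcp 0 0 10.0.0.5:40000 10.0.0.1:8500 TIME_WAIT
tcp 0 0 10.0.0.5:40001 10.0.0.1:8500 ESTABLISHED
tcp 0 0 10.0.0.5:40002 10.0.0.2:8500 TIME_WAIT
tcp 0 0 10.0.0.5:40003 10.0.0.9:443 ESTABLISHED`
	for state, n := range tallyStates(sample, ":8500") {
		fmt.Printf("%s %d\n", state, n)
	}
}
```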

Why would the connections remain open, and if they do remain, why aren't 
they reused? I suspect we may be hitting this issue, but I'm hoping for 
further clarification:

https://github.com/prometheus/consul_exporter/issues/102
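
For reference, this is the reuse behaviour I would expect from a Go HTTP 
client. It is only a sketch against a throwaway local test server, not 
consul_exporter's actual code: the function name, the "example" service 
path, and the transport settings are my own assumptions. It counts how many 
TCP connections the server sees when one shared client makes several 
sequential requests, with keep-alives on and off.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
	"time"
)

// countNewConns starts a local test server, issues n sequential GET requests
// through one shared client, and returns how many TCP connections the server
// accepted. Hypothetical helper, for illustration only.
func countNewConns(n int, keepAlive bool) int64 {
	var opened int64
	srv := httptest.NewUnstartedServer(http.HandlerFunc(
		func(w http.ResponseWriter, r *http.Request) {
			fmt.Fprint(w, "[]")
		}))
	// Count every newly accepted connection on the server side.
	srv.Config.ConnState = func(c net.Conn, s http.ConnState) {
		if s == http.StateNew {
			atomic.AddInt64(&opened, 1)
		}
	}
	srv.Start()
	defer srv.Close()

	// Assumed settings; the exporter's real transport may differ.
	transport := &http.Transport{
		MaxIdleConnsPerHost: 10,
		IdleConnTimeout:     30 * time.Second,
		DisableKeepAlives:   !keepAlive,
	}
	defer transport.CloseIdleConnections()
	client := &http.Client{Transport: transport, Timeout: 2 * time.Second}

	for i := 0; i < n; i++ {
		resp, err := client.Get(srv.URL + "/v1/health/service/example?stale=")
		if err != nil {
			continue
		}
		// Read the body to EOF and close it so the connection can go back
		// to the idle pool and be reused by the next request.
		io.Copy(io.Discard, resp.Body)
		resp.Body.Close()
	}
	return atomic.LoadInt64(&opened)
}

func main() {
	fmt.Println("keep-alive on: ", countNewConns(5, true))
	fmt.Println("keep-alive off:", countNewConns(5, false))
}
```

If the exporter behaves like the keep-alive case, I would not expect the 
connection count to grow far beyond the number of concurrent requests.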

Thanks!

Dennis

