Hi! I have just upgraded from Prometheus 2.3.2 to Prometheus 2.17.1, with no change other than replacing the binary, and the scraping of our Kubernetes clusters started to fail. We host the Prometheus server on a dedicated EC2 machine and reach the K8s API over the internal network. This had been working fine for several months until this upgrade.
These errors started appearing in the logs:

level=error ts=2020-04-15T08:35:09.842Z caller=klog.go:94 component=k8s_client_runtime func=ErrorDepth msg="/app/discovery/kubernetes/kubernetes.go:407: Failed to list *v1.Service: Get http://internal-api-develop-k8s-local-meks2l-1292322695.us-east-1.elb.amazonaws.com/api/v1/services?limit=500&resourceVersion=0: dial tcp 172.16.67.74:80: connect: connection timed out"
level=error ts=2020-04-15T08:35:09.842Z caller=klog.go:94 component=k8s_client_runtime func=ErrorDepth msg="/app/discovery/kubernetes/kubernetes.go:362: Failed to list *v1.Service: Get http://internal-api-develop-k8s-local-meks2l-1292322695.us-east-1.elb.amazonaws.com/api/v1/services?limit=500&resourceVersion=0: dial tcp 172.16.67.74:80: connect: connection timed out"
level=error ts=2020-04-15T08:35:09.842Z caller=klog.go:94 component=k8s_client_runtime func=ErrorDepth msg="/app/discovery/kubernetes/kubernetes.go:385: Failed to list *v1.Pod: Get http://internal-api-develop-k8s-local-meks2l-1292322695.us-east-1.elb.amazonaws.com/api/v1/pods?limit=500&resourceVersion=0: dial tcp 172.16.67.74:80: connect: connection timed out"
level=error ts=2020-04-15T08:35:09.842Z caller=klog.go:94 component=k8s_client_runtime func=ErrorDepth msg="/app/discovery/kubernetes/kubernetes.go:361: Failed to list *v1.Endpoints: Get http://internal-api-develop-k8s-local-meks2l-1292322695.us-east-1.elb.amazonaws.com/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 172.16.67.74:80: connect: connection timed out"
level=error ts=2020-04-15T08:35:09.842Z caller=klog.go:94 component=k8s_client_runtime func=ErrorDepth msg="/app/discovery/kubernetes/kubernetes.go:363: Failed to list *v1.Pod: Get http://internal-api-develop-k8s-local-meks2l-1292322695.us-east-1.elb.amazonaws.com/api/v1/pods?limit=500&resourceVersion=0: dial tcp 172.16.67.74:80: connect: connection timed out"
level=error ts=2020-04-15T08:35:09.846Z caller=klog.go:94 component=k8s_client_runtime func=ErrorDepth msg="/app/discovery/kubernetes/kubernetes.go:449: Failed to list *v1.Node: Get http://internal-api-develop-k8s-local-meks2l-1292322695.us-east-1.elb.amazonaws.com/api/v1/nodes?limit=500&resourceVersion=0: dial tcp 172.16.67.74:80: connect: connection timed out"

It seems to me that this is similar to the issue described in https://github.com/prometheus/prometheus/issues/5108, but in the discovery phase rather than at scrape time. Plain-HTTP access to the API server, which is what is being attempted here, has never been allowed in our setup, and it was not causing issues in the past. A sample of our configuration is at the end of this message, followed by a sketch of the workaround I am considering.

Any ideas or insights would be much appreciated.

Regards,
Miguel

prometheus.yml:

[...]
- job_name: 'develop-kubelet'
  metrics_path: '/metrics'
  scheme: https
  tls_config:
    ca_file: /etc/prometheus/certs/develop.k8s.local.ca.crt
    cert_file: /etc/prometheus/certs/develop.k8s.local.crt
    key_file: /etc/prometheus/certs/develop.k8s.local.key
  kubernetes_sd_configs:
    - api_server: internal-api-develop-k8s-local-meks2l-1292322695.us-east-1.elb.amazonaws.com
      role: node
      tls_config:
        ca_file: /etc/prometheus/certs/develop.k8s.local.ca.crt
        cert_file: /etc/prometheus/certs/develop.k8s.local.crt
        key_file: /etc/prometheus/certs/develop.k8s.local.key
  relabel_configs:
    - action: labelmap
      regex: __meta_kubernetes_node_label_(.+)
    - target_label: __address__
      replacement: https://internal-api-develop-k8s-local-meks2l-1292322695.us-east-1.elb.amazonaws.com
    - source_labels: [__meta_kubernetes_node_name]
      regex: (.+)
      target_label: __metrics_path__
      replacement: /api/v1/nodes/${1}/proxy/metrics/
    - source_labels: [kubernetes_io_hostname]
      target_label: node

- job_name: 'develop-container'
  metrics_path: '/metrics'
  scheme: https
  tls_config:
    ca_file: /etc/prometheus/certs/develop.k8s.local.ca.crt
    cert_file: /etc/prometheus/certs/develop.k8s.local.crt
    key_file: /etc/prometheus/certs/develop.k8s.local.key
  kubernetes_sd_configs:
    - api_server: internal-api-develop-k8s-local-meks2l-1292322695.us-east-1.elb.amazonaws.com
      role: node
      tls_config:
        ca_file: /etc/prometheus/certs/develop.k8s.local.ca.crt
        cert_file: /etc/prometheus/certs/develop.k8s.local.crt
        key_file: /etc/prometheus/certs/develop.k8s.local.key
  relabel_configs:
    - action: labelmap
      regex: __meta_kubernetes_node_label_(.+)
    - target_label: __address__
      replacement: https://internal-api-develop-k8s-local-meks2l-1292322695.us-east-1.elb.amazonaws.com
    - source_labels: [__meta_kubernetes_node_name]
      regex: (.+)
      target_label: __metrics_path__
      replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
    - source_labels: [__meta_kubernetes_namespace]
      target_label: namespace
    - source_labels: [__meta_kubernetes_node_name]
      target_label: node
    - source_labels: [kubernetes_io_hostname]
      target_label: node

- job_name: 'develop-endpoint'
  metrics_path: '/metrics'
  scheme: https
  tls_config:
    ca_file: /etc/prometheus/certs/develop.k8s.local.ca.crt
    cert_file: /etc/prometheus/certs/develop.k8s.local.crt
    key_file: /etc/prometheus/certs/develop.k8s.local.key
  kubernetes_sd_configs:
    - api_server: internal-api-develop-k8s-local-meks2l-1292322695.us-east-1.elb.amazonaws.com
      role: endpoints
      tls_config:
        ca_file: /etc/prometheus/certs/develop.k8s.local.ca.crt
        cert_file: /etc/prometheus/certs/develop.k8s.local.crt
        key_file: /etc/prometheus/certs/develop.k8s.local.key
  relabel_configs:
    - target_label: __address__
      replacement: https://internal-api-develop-k8s-local-meks2l-1292322695.us-east-1.elb.amazonaws.com
    - source_labels:
        - __meta_kubernetes_namespace
        - __meta_kubernetes_service_name
        - __meta_kubernetes_endpoint_port_name
      separator: ;
      regex: default;kubernetes;https
      replacement: $1
      action: keep

- job_name: 'develop-pod'
  metrics_path: '/metrics'
  scheme: https
  tls_config:
    ca_file: /etc/prometheus/certs/develop.k8s.local.ca.crt
    cert_file: /etc/prometheus/certs/develop.k8s.local.crt
    key_file: /etc/prometheus/certs/develop.k8s.local.key
  kubernetes_sd_configs:
    - api_server: internal-api-develop-k8s-local-meks2l-1292322695.us-east-1.elb.amazonaws.com
      role: pod
      tls_config:
        ca_file: /etc/prometheus/certs/develop.k8s.local.ca.crt
        cert_file: /etc/prometheus/certs/develop.k8s.local.crt
        key_file: /etc/prometheus/certs/develop.k8s.local.key
  relabel_configs:
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
      action: keep
      regex: true
    - target_label: __address__
      replacement: https://internal-api-develop-k8s-local-meks2l-1292322695.us-east-1.elb.amazonaws.com
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scheme]
      regex: ^$
      replacement: http
      target_label: __meta_kubernetes_pod_annotation_prometheus_io_scheme
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
      regex: (.+)
      replacement: ${1}
      target_label: __metrics_path__
    - source_labels:
        - __meta_kubernetes_namespace
        - __meta_kubernetes_pod_annotation_prometheus_io_scheme
        - __meta_kubernetes_pod_name
        - __meta_kubernetes_pod_annotation_prometheus_io_port
        - __metrics_path__
      regex: (.+);(.+);(.+);(.+);(.+)
      action: replace
      target_label: __metrics_path__
      replacement: /api/v1/namespaces/${1}/pods/${2}:${3}:${4}/proxy${5}
    - action: labelmap
      regex: __meta_kubernetes_pod_label_(.+)
    - source_labels: [__meta_kubernetes_namespace]
      target_label: namespace
    - source_labels: [__meta_kubernetes_pod_node_name]
      target_label: node
    - source_labels: [__meta_kubernetes_pod_name]
      target_label: service
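One detail that stands out to me: the api_server values above are given without a scheme, and the discovery client is dialing port 80, so my working assumption is that 2.17.1 defaults to plain HTTP when the scheme is omitted, while 2.3.2 apparently did not. Here is a minimal sketch of the variant I intend to try, identical to the node SD block above except that the scheme is made explicit (this is an assumption on my part, not a confirmed fix):

kubernetes_sd_configs:
  # Assumption: spelling out https:// here should stop the client from
  # falling back to port 80; everything else is unchanged from the config above.
  - api_server: https://internal-api-develop-k8s-local-meks2l-1292322695.us-east-1.elb.amazonaws.com
    role: node
    tls_config:
      ca_file: /etc/prometheus/certs/develop.k8s.local.ca.crt
      cert_file: /etc/prometheus/certs/develop.k8s.local.crt
      key_file: /etc/prometheus/certs/develop.k8s.local.key

If the omitted scheme is not the cause, I would expect the same timeouts with this variant too, since nothing else in the network path has changed.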

