Just to close the loop here: the issue turned out to be that HTTP/2 is disabled in Prometheus. Details are in https://github.com/prometheus/prometheus/issues/9068
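For anyone who lands here with similar symptoms, here is a minimal Python sketch for checking which ALPN protocol a TLS endpoint agrees to, depending on what the client offers. It is only an illustration and not how Prometheus does it: the function names `build_client_context` and `negotiated_alpn` are made up for this example, and the idea that the sidecar resets clients that offer no ALPN is my assumption based on the openssl behavior quoted below.

```python
import socket
import ssl


def build_client_context(alpn_protocols, ca_file=None, cert_file=None, key_file=None):
    """Build a TLS client context that offers the given ALPN protocols.

    Hypothetical helper for illustration only. Verification is turned off,
    roughly like `curl --insecure` or `insecure_skip_verify: true`.
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # check_hostname must be disabled before verify_mode can be CERT_NONE.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    if ca_file:
        ctx.load_verify_locations(cafile=ca_file)
    if cert_file and key_file:
        # Client certificate for mTLS.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    if alpn_protocols:
        # An empty list means "offer no ALPN extension at all".
        ctx.set_alpn_protocols(list(alpn_protocols))
    return ctx


def negotiated_alpn(host, port, ctx, timeout=5.0):
    """Connect, finish the TLS handshake, and return the agreed ALPN protocol.

    Returns None if the server selected no protocol. A server that rejects
    the offer will typically surface as an ssl.SSLError or a connection reset.
    """
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with ctx.wrap_socket(raw) as tls:
            return tls.selected_alpn_protocol()


# Example use (needs network access and the mounted Istio certificates):
#
#   certs = dict(ca_file="/etc/istio-certs/root-cert.pem",
#                cert_file="/etc/istio-certs/cert-chain.pem",
#                key_file="/etc/istio-certs/key.pem")
#   for offer in (["h2", "http/1.1"], ["http/1.1"], ["istio"], []):
#       ctx = build_client_context(offer, **certs)
#       print(offer, negotiated_alpn("10.244.3.10", 9102, ctx))
```

Comparing the result for an empty offer against the others should show whether the missing ALPN extension is what triggers the reset in your environment.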
On Wednesday, July 7, 2021 at 2:07:02 PM UTC-7 Travis Illig wrote:

> I can create an Ubuntu container and verify connectivity to the container
> metrics endpoint with both curl and openssl:
>
> curl https://10.244.3.10:9102/metrics --cacert /etc/istio-certs/root-cert.pem --cert /etc/istio-certs/cert-chain.pem --key /etc/istio-certs/key.pem --insecure
>
> openssl s_client -connect 10.244.3.10:9102 -cert /etc/istio-certs/cert-chain.pem -key /etc/istio-certs/key.pem -CAfile /etc/istio-certs/root-cert.pem -alpn "istio"
>
> The curl call seems to correctly auto-negotiate the TLS 1.3 comms. The
> openssl call requires the -alpn "istio" flag to negotiate the protocol at
> the application layer or it will fail to connect.
>
> *The results of my testing (shown below) make me think it's something in
> Prometheus or the Go stack causing the problem.* I don't think it's an OS
> configuration issue in the container or anything like that. However, I'm
> not sure how to debug the Prometheus/Go side of things.
>
> A more verbose log from curl shows it will default to HTTP/2 (which I
> recall seeing is disabled in Prometheus at the moment).
>
> root@sleep-5f98748557-s4wh5:/# curl https://10.244.3.10:9102/metrics --cacert /etc/istio-certs/root-cert.pem --cert /etc/istio-certs/cert-chain.pem --key /etc/istio-certs/key.pem --insecure -v
> * Trying 10.244.3.10:9102...
> * TCP_NODELAY set
> * Connected to 10.244.3.10 (10.244.3.10) port 9102 (#0)
> * ALPN, offering h2
> * ALPN, offering http/1.1
> * successfully set certificate verify locations:
> * CAfile: /etc/istio-certs/root-cert.pem
>   CApath: /etc/ssl/certs
> * TLSv1.3 (OUT), TLS handshake, Client hello (1):
> * TLSv1.3 (IN), TLS handshake, Server hello (2):
> * TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
> * TLSv1.3 (IN), TLS handshake, Request CERT (13):
> * TLSv1.3 (IN), TLS handshake, Certificate (11):
> * TLSv1.3 (IN), TLS handshake, CERT verify (15):
> * TLSv1.3 (IN), TLS handshake, Finished (20):
> * TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
> * TLSv1.3 (OUT), TLS handshake, Certificate (11):
> * TLSv1.3 (OUT), TLS handshake, CERT verify (15):
> * TLSv1.3 (OUT), TLS handshake, Finished (20):
> * SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
> * ALPN, server accepted to use h2
> * Server certificate:
> * subject: [NONE]
> * start date: Jul 7 20:21:33 2021 GMT
> * expire date: Jul 8 20:21:33 2021 GMT
> * issuer: O=cluster.local
> * SSL certificate verify ok.
> * Using HTTP2, server supports multi-use
> * Connection state changed (HTTP/2 confirmed)
> * Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
> * Using Stream ID: 1 (easy handle 0x564d80d81e10)
> > GET /metrics HTTP/2
> > Host: 10.244.3.10:9102
> > user-agent: curl/7.68.0
> > accept: */*
> >
> * TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
> * TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
> * old SSL session ID is stale, removing
> * Connection state changed (MAX_CONCURRENT_STREAMS == 2147483647)!
> < HTTP/2 200
>
> I can add --http1.1 to force HTTP/1.1 and it'll still work:
>
> root@sleep-5f98748557-s4wh5:/# curl https://10.244.3.10:9102/metrics --cacert /etc/istio-certs/root-cert.pem --cert /etc/istio-certs/cert-chain.pem --key /etc/istio-certs/key.pem --insecure -v --http1.1
> * Trying 10.244.3.10:9102...
> * TCP_NODELAY set
> * Connected to 10.244.3.10 (10.244.3.10) port 9102 (#0)
> * ALPN, offering http/1.1
> * successfully set certificate verify locations:
> * CAfile: /etc/istio-certs/root-cert.pem
>   CApath: /etc/ssl/certs
> * TLSv1.3 (OUT), TLS handshake, Client hello (1):
> * TLSv1.3 (IN), TLS handshake, Server hello (2):
> * TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
> * TLSv1.3 (IN), TLS handshake, Request CERT (13):
> * TLSv1.3 (IN), TLS handshake, Certificate (11):
> * TLSv1.3 (IN), TLS handshake, CERT verify (15):
> * TLSv1.3 (IN), TLS handshake, Finished (20):
> * TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
> * TLSv1.3 (OUT), TLS handshake, Certificate (11):
> * TLSv1.3 (OUT), TLS handshake, CERT verify (15):
> * TLSv1.3 (OUT), TLS handshake, Finished (20):
> * SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
> * ALPN, server accepted to use http/1.1
> * Server certificate:
> * subject: [NONE]
> * start date: Jul 7 20:21:33 2021 GMT
> * expire date: Jul 8 20:21:33 2021 GMT
> * issuer: O=cluster.local
> * SSL certificate verify ok.
> > GET /metrics HTTP/1.1
> > Host: 10.244.3.10:9102
> > User-Agent: curl/7.68.0
> > Accept: */*
> >
> * TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
> * TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
> * old SSL session ID is stale, removing
> * Mark bundle as not supporting multiuse
> < HTTP/1.1 200 OK
>
> Since that works it makes me wonder if there's something wrong with the
> ALPN handling in the way HTTP/2 is disabled at the moment, like maybe it's
> not negotiating right? I have no idea, I'm mostly grasping at straws.
>
> On Tuesday, July 6, 2021 at 1:24:13 PM UTC-7 Travis Illig wrote:
>
>> It's not the certificate handling. I tried setting GODEBUG as indicated
>> in the docs and that didn't fix anything. I'm starting to wonder if it's an
>> HTTP/2 issue or something similar but I'm not sure how to determine if
>> that's the problem.
>>
>> The error message in Prometheus debug logs isn't super helpful, it just
>> seems to indicate a protocol problem.
>>
>> level=debug ts=2021-07-06T20:00:50.996Z caller=scrape.go:1091 component="scrape manager" scrape_pool=kubernetes-pods-istio-secure target=https://10.244.3.10:9102/metrics msg="Scrape failed" err="Get \"https://10.244.3.10:9102/metrics\": read tcp 10.244.4.85:51794->10.244.3.10:9102: read: connection reset by peer"
>>
>> On Tuesday, July 6, 2021 at 12:01:08 PM UTC-7 Travis Illig wrote:
>>
>>> I've verified:
>>>
>>> - v2.20.1 is the last version where the mTLS scraping works.
>>> - It doesn't matter which Docker registry you pull from (Docker Hub
>>>   or quay.io - I've sometimes seen different "versions" of containers
>>>   based on registry).
>>>
>>> Looking at the release notes for v2.21.0
>>> <https://github.com/prometheus/prometheus/releases/tag/v2.21.0> it
>>> appears there's a new version of Go used for compilation which includes
>>> some changes on how certificates are handled
>>> <https://golang.org/doc/go1.15#commonname>. Unclear if this is what I'm
>>> hitting, but it seems worth looking into.
>>>
>>> On Tuesday, July 6, 2021 at 11:02:56 AM UTC-7 Travis Illig wrote:
>>>
>>>> I'm deploying Prometheus using the Helm chart
>>>> <https://github.com/prometheus-community/helm-charts/tree/main/charts/prometheus>
>>>> and I have it configured to scrape Istio mTLS-secured pods using the
>>>> TLS settings specified
>>>> <https://istio.io/latest/docs/ops/integrations/prometheus/#tls-settings>
>>>> by the Istio team to do so. Basically what this amounts to is:
>>>>
>>>> - Add the Istio sidecar to the Prometheus instance but disable all
>>>>   traffic proxying - you just want to get the certificates from it.
>>>> - Mount the certificates into the Prometheus container.
>>>> - Set up your scrape configuration to use the certificates when
>>>>   scraping Istio-enabled pods.
>>>>
>>>> The YAML for the scrape configuration looks like this:
>>>>
>>>> - job_name: "kubernetes-pods-istio-secure"
>>>>   scheme: https
>>>>   tls_config:
>>>>     ca_file: /etc/istio-certs/root-cert.pem
>>>>     cert_file: /etc/istio-certs/cert-chain.pem
>>>>     key_file: /etc/istio-certs/key.pem
>>>>     insecure_skip_verify: true
>>>>
>>>> *This totally works using Prometheus v2.20.1* packaged as
>>>> `prom/prometheus` from Docker Hub.
>>>>
>>>> *This fails on Prometheus v2.28.0* packaged as
>>>> `quay.io/prometheus/prometheus`.
>>>> Instead of getting a successful scrape, I get "connection reset by peer."
>>>> I've validated the files are there and properly mounted; they have the
>>>> expected contents; and there are no Prometheus log messages to indicate
>>>> anything is amiss.
>>>>
>>>> I've been rolling back slowly to see where it starts working again.
>>>> I've tried v2.26.0 and it still fails. I thought I'd drop a note in here
>>>> to see if anyone knows what's up.

--
You received this message because you are subscribed to the Google Groups "Prometheus Users" group.
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/29b298b6-a241-4ba7-9111-4bdc31b21547n%40googlegroups.com.

