Just to close the loop here: the issue ended up being that HTTP/2 is 
disabled in Prometheus.
https://github.com/prometheus/prometheus/issues/9068
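
For anyone landing here later, here is a small Go sketch of the ALPN behavior 
that the curl and openssl tests in the quoted message point at. It is not 
Prometheus code. It assumes (my reading, worth checking against the issue 
above) that a Go client with HTTP/2 switched off may end up sending no ALPN 
list at all, which would look like the openssl call without -alpn. The 
`probeALPN` helper is a made-up name, and the local `httptest` server is only 
a stand-in that records what the client offered; it does not reset 
connections the way the sidecar did.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// probeALPN (hypothetical helper) starts a throwaway local TLS server,
// connects to it offering clientProtos via ALPN, and reports what the
// server saw in the ClientHello and which protocol was negotiated.
func probeALPN(clientProtos []string) (offered []string, negotiated string, err error) {
	seen := make(chan []string, 1)

	srv := httptest.NewUnstartedServer(http.NotFoundHandler())
	srv.TLS = &tls.Config{
		// Record the ALPN list from the ClientHello. Returning a nil
		// config tells crypto/tls to keep using the server's original one.
		GetConfigForClient: func(hello *tls.ClientHelloInfo) (*tls.Config, error) {
			seen <- hello.SupportedProtos
			return nil, nil
		},
	}
	srv.StartTLS() // httptest defaults the server's ALPN list to http/1.1
	defer srv.Close()

	conn, err := tls.Dial("tcp", srv.Listener.Addr().String(), &tls.Config{
		InsecureSkipVerify: true, // local test certificate only
		NextProtos:         clientProtos,
	})
	if err != nil {
		return nil, "", err
	}
	defer conn.Close()

	return <-seen, conn.ConnectionState().NegotiatedProtocol, nil
}

func main() {
	// First with no ALPN list (like openssl without -alpn), then with
	// http/1.1 offered (like curl --http1.1).
	for _, protos := range [][]string{nil, {"http/1.1"}} {
		offered, negotiated, err := probeALPN(protos)
		if err != nil {
			fmt.Println("handshake failed:", err)
			continue
		}
		fmt.Printf("client offered %q, server saw %q, negotiated %q\n",
			protos, offered, negotiated)
	}
}
```

With a nil list the server should see no ALPN protocols and nothing gets 
negotiated; with `http/1.1` offered, the handshake mirrors the 
curl --http1.1 run in the quoted message.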

On Wednesday, July 7, 2021 at 2:07:02 PM UTC-7 Travis Illig wrote:

> I can create an Ubuntu container and verify connectivity to the container 
> metrics endpoint with both curl and openssl:
>
> curl https://10.244.3.10:9102/metrics --cacert 
> /etc/istio-certs/root-cert.pem --cert /etc/istio-certs/cert-chain.pem --key 
> /etc/istio-certs/key.pem --insecure
>
> openssl s_client -connect 10.244.3.10:9102 -cert 
> /etc/istio-certs/cert-chain.pem -key /etc/istio-certs/key.pem -CAfile 
> /etc/istio-certs/root-cert.pem -alpn "istio"
>
> The curl call seems to correctly auto-negotiate the TLS 1.3 comms. The 
> openssl call requires the -alpn "istio" flag to negotiate the protocol at 
> the application layer or it will fail to connect.
>
> *The results of my testing (shown below) make me think it's something in 
> Prometheus or the Go stack causing the problem.* I don't think it's an OS 
> configuration issue in the container or anything like that. However, I'm 
> not sure how to debug the Prometheus/Go side of things.
>
> A more verbose log from curl shows it will default to HTTP/2 (which I 
> recall seeing is disabled in Prometheus at the moment).
>
> root@sleep-5f98748557-s4wh5:/# curl https://10.244.3.10:9102/metrics 
> --cacert /etc/istio-certs/root-cert.pem --cert 
> /etc/istio-certs/cert-chain.pem --key /etc/istio-certs/key.pem --insecure -v
> *   Trying 10.244.3.10:9102...
> * TCP_NODELAY set
> * Connected to 10.244.3.10 (10.244.3.10) port 9102 (#0)
> * ALPN, offering h2
> * ALPN, offering http/1.1
> * successfully set certificate verify locations:
> *   CAfile: /etc/istio-certs/root-cert.pem
>   CApath: /etc/ssl/certs
> * TLSv1.3 (OUT), TLS handshake, Client hello (1):
> * TLSv1.3 (IN), TLS handshake, Server hello (2):
> * TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
> * TLSv1.3 (IN), TLS handshake, Request CERT (13):
> * TLSv1.3 (IN), TLS handshake, Certificate (11):
> * TLSv1.3 (IN), TLS handshake, CERT verify (15):
> * TLSv1.3 (IN), TLS handshake, Finished (20):
> * TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
> * TLSv1.3 (OUT), TLS handshake, Certificate (11):
> * TLSv1.3 (OUT), TLS handshake, CERT verify (15):
> * TLSv1.3 (OUT), TLS handshake, Finished (20):
> * SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
> * ALPN, server accepted to use h2
> * Server certificate:
> *  subject: [NONE]
> *  start date: Jul  7 20:21:33 2021 GMT
> *  expire date: Jul  8 20:21:33 2021 GMT
> *  issuer: O=cluster.local
> *  SSL certificate verify ok.
> * Using HTTP2, server supports multi-use
> * Connection state changed (HTTP/2 confirmed)
> * Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
> * Using Stream ID: 1 (easy handle 0x564d80d81e10)
> > GET /metrics HTTP/2
> > Host: 10.244.3.10:9102
> > user-agent: curl/7.68.0
> > accept: */*
> >
> * TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
> * TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
> * old SSL session ID is stale, removing
> * Connection state changed (MAX_CONCURRENT_STREAMS == 2147483647)!
> < HTTP/2 200
>
> I can add --http1.1 to force HTTP/1.1 and it'll still work:
>
> root@sleep-5f98748557-s4wh5:/# curl https://10.244.3.10:9102/metrics 
> --cacert /etc/istio-certs/root-cert.pem --cert 
> /etc/istio-certs/cert-chain.pem --key /etc/istio-certs/key.pem --insecure 
> -v --http1.1
> *   Trying 10.244.3.10:9102...
> * TCP_NODELAY set
> * Connected to 10.244.3.10 (10.244.3.10) port 9102 (#0)
> * ALPN, offering http/1.1
> * successfully set certificate verify locations:
> *   CAfile: /etc/istio-certs/root-cert.pem
>   CApath: /etc/ssl/certs
> * TLSv1.3 (OUT), TLS handshake, Client hello (1):
> * TLSv1.3 (IN), TLS handshake, Server hello (2):
> * TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
> * TLSv1.3 (IN), TLS handshake, Request CERT (13):
> * TLSv1.3 (IN), TLS handshake, Certificate (11):
> * TLSv1.3 (IN), TLS handshake, CERT verify (15):
> * TLSv1.3 (IN), TLS handshake, Finished (20):
> * TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
> * TLSv1.3 (OUT), TLS handshake, Certificate (11):
> * TLSv1.3 (OUT), TLS handshake, CERT verify (15):
> * TLSv1.3 (OUT), TLS handshake, Finished (20):
> * SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
> * ALPN, server accepted to use http/1.1
> * Server certificate:
> *  subject: [NONE]
> *  start date: Jul  7 20:21:33 2021 GMT
> *  expire date: Jul  8 20:21:33 2021 GMT
> *  issuer: O=cluster.local
> *  SSL certificate verify ok.
> > GET /metrics HTTP/1.1
> > Host: 10.244.3.10:9102
> > User-Agent: curl/7.68.0
> > Accept: */*
> >
> * TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
> * TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
> * old SSL session ID is stale, removing
> * Mark bundle as not supporting multiuse
> < HTTP/1.1 200 OK
>
> Since that works it makes me wonder if there's something wrong with the 
> ALPN handling in the way HTTP/2 is disabled at the moment, like maybe it's 
> not negotiating right? I have no idea, I'm mostly grasping at straws.
>
> On Tuesday, July 6, 2021 at 1:24:13 PM UTC-7 Travis Illig wrote:
>
>> It's not the certificate handling. I tried setting GODEBUG as indicated 
>> in the docs and that didn't fix anything. I'm starting to wonder if it's an 
>> HTTP/2 issue or something similar but I'm not sure how to determine if 
>> that's the problem.
>>
>> The error message in Prometheus debug logs isn't super helpful, it just 
>> seems to indicate a protocol problem.
>>
>> level=debug ts=2021-07-06T20:00:50.996Z caller=scrape.go:1091 component="scrape manager" scrape_pool=kubernetes-pods-istio-secure target=https://10.244.3.10:9102/metrics msg="Scrape failed" err="Get \"https://10.244.3.10:9102/metrics\": read tcp 10.244.4.85:51794->10.244.3.10:9102: read: connection reset by peer"
>>
>> On Tuesday, July 6, 2021 at 12:01:08 PM UTC-7 Travis Illig wrote:
>>
>>> I've verified:
>>>
>>>    - v2.20.1 is the last version where the mTLS scraping works.
>>>    - It doesn't matter which Docker registry you pull from (Docker Hub 
>>>    or quay.io - I've sometimes seen different "versions" of containers 
>>>    based on registry).
>>>
>>> Looking at the release notes for v2.21.0 
>>> <https://github.com/prometheus/prometheus/releases/tag/v2.21.0> it 
>>> appears there's a new version of Go used for compilation which includes 
>>> some changes on how certificates are handled 
>>> <https://golang.org/doc/go1.15#commonname>. Unclear if this is what I'm 
>>> hitting, but it seems worth looking into.
>>>
>>> On Tuesday, July 6, 2021 at 11:02:56 AM UTC-7 Travis Illig wrote:
>>>
>>>> I'm deploying Prometheus using the Helm chart 
>>>> <https://github.com/prometheus-community/helm-charts/tree/main/charts/prometheus> 
>>>> and I have it configured to scrape Istio mTLS-secured pods using the 
>>>> TLS settings specified 
>>>> <https://istio.io/latest/docs/ops/integrations/prometheus/#tls-settings> 
>>>> by the Istio team to do so. Basically what this amounts to is:
>>>>
>>>>    - Add the Istio sidecar to the Prometheus instance but disable all 
>>>>    traffic proxying - you just want to get the certificates from it.
>>>>    - Mount the certificates into the Prometheus container.
>>>>    - Set up your scrape configuration to use the certificates when 
>>>>    scraping Istio-enabled pods.
>>>>
>>>> The YAML for the scrape configuration looks like this:
>>>>
>>>> - job_name: "kubernetes-pods-istio-secure"
>>>>   scheme: https
>>>>   tls_config:
>>>>     ca_file: /etc/istio-certs/root-cert.pem
>>>>     cert_file: /etc/istio-certs/cert-chain.pem
>>>>     key_file: /etc/istio-certs/key.pem
>>>>     insecure_skip_verify: true
>>>>
>>>> *This totally works using Prometheus v2.20.1* packaged as 
>>>> `prom/prometheus` from Docker Hub.
>>>>
>>>> *This fails on Prometheus v2.28.0* packaged as 
>>>> `quay.io/prometheus/prometheus`. 
>>>> Instead of getting a successful scrape, I get "connection reset by peer." 
>>>> I've validated the files are there and properly mounted; they have the 
>>>> expected contents; and there are no Prometheus log messages to indicate 
>>>> anything is amiss.
>>>>
>>>> I've been rolling back slowly to see where it starts working again. 
>>>> I've tried v2.26.0 and it still fails. I thought I'd drop a note in here 
>>>> to see if anyone knows what's up.
>>>>
>>>
