Re: [dnsdist] DoH issues after 1.8.3 -> 1.9.0 upgrade

2024-03-17 Thread Otto Moerbeek via dnsdist
On Sun, Mar 17, 2024 at 06:41:13PM +0100, Christoph via dnsdist wrote:

> Hi,
> 
> in February we upgraded our test DoH/DoT server from 1.8.3 to 1.9.0
> and did not notice any problems, so we upgraded our production server
> from 1.8.3 to 1.9.0 yesterday.
> 
> Immediately after the upgrade, our monitoring reported our DoH service as
> unavailable (HTTP 400), but we were unable to reproduce this using Firefox.
> 
> A closer look confirmed that there is an issue: we see about 50% fewer
> DoH requests in our Grafana graphs of DoH request rates.
> 
> The request rates per HTTP method suggest that we lose almost all GET
> requests and also a significant fraction of POST DoH requests.
> 
> sum by (method) 
> (irate(dnsdist_frontend_doh_http_method_queries{job="$job"}[$__rate_interval]))
> 
> After looking at the TLS versions graph I noticed a clear correlation
> but then I realized that all our DoH requests are TLS version 1.3
> because we set minTLSVersion='tls1.3' - so this might be irrelevant.
> 
> irate(dnsdist_frontend_tlsqueries{job="$job"}[$__rate_interval])
> 
> 2024-03-16 20:57:59 dnsdist upgraded: 1.8.3_1 -> 1.9.0
> 2024-03-16 20:59 monitoring says DoH is down (HTTP 400 - Bad Request)
> monitoring requests this: 
> https://doh.applied-privacy.net/query?dns=l1sBAAABA3d3dw1rbm90LXJlc29sdmVyAmN6AAAcAAE
> Mar 17 02:40:45 bender-dpriv1 kernel: pid 77544 (dnsdist), jid 0, uid 208:
> exited on signal 11 -> also interesting but likely unrelated?
> 
> Today we downgraded to 1.8.3, and everything went back to normal.
> 
> Is anyone else observing similar issues on dnsdist 1.9.0?
> 
> DoT does not appear to be affected.
> 
> best regards,
> Christoph
> 
> OS: FreeBSD 13.2
> dnsdist installed via pkg
> 
> our dnsdist config:
> 
> newServer({address="109.70.100.136", maxInFlight=1000, sockets=32,
> name="clamps"})
> newServer({address="109.70.100.140", maxInFlight=1000, sockets=32,
> name="roberto"})
> --newServer({address="109.70.100.133", sockets=4, name="titanius-dpriv1"})
> setServerPolicy(leastOutstanding)
> 
> addTLSLocal("0.0.0.0",
> "/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.crt",
> "/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.key", 
> {ciphers='ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES128-GCM-SHA256',
> minTLSVersion='tls1.2', tcpFastOpenQueueSize=1000, maxInFlight=1000 })
> addTLSLocal("[::]",
> "/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.crt",
> "/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.key", 
> {ciphers='ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES128-GCM-SHA256',
> minTLSVersion='tls1.2', tcpFastOpenQueueSize=1000, maxInFlight=1000 })
> 
> addDOHLocal("0.0.0.0:444",
> "/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.crt",
> "/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.key",
> "/query", {minTLSVersion='tls1.3', serverTokens='doh',
> tcpFastOpenQueueSize=1000, tcpListenQueueSize=4096 })
> addDOHLocal("[::]:444",
> "/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.crt",
> "/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.key",
> "/query", {minTLSVersion='tls1.3', serverTokens='doh',
> tcpFastOpenQueueSize=1000, tcpListenQueueSize=4096 })
> 
> setACL({'0.0.0.0/0', '::/0'})
> controlSocket('127.0.0.1:5199')
> setConsoleACL('127.0.0.1/8')
> 
> setKey()
> 
> pc = newPacketCache(5, {maxTTL=86400, minTTL=3, temporaryFailureTTL=60,
> staleTTL=60, dontAge=false})
> getPool(""):setCache(pc)
> 
> webserver("127.0.0.1:8083")
> setWebserverConfig({...})
> setVerboseHealthChecks(true)
> addAction(QTypeRule(65535), RCodeAction(DNSRCode.NOTIMP))


This might be related: https://github.com/PowerDNS/pdns/issues/13850,
not backported yet

-Otto

___
dnsdist mailing list
dnsdist@mailman.powerdns.com
https://mailman.powerdns.com/mailman/listinfo/dnsdist


[dnsdist] DoH issues after 1.8.3 -> 1.9.0 upgrade

2024-03-17 Thread Christoph via dnsdist

Hi,

in February we upgraded our test DoH/DoT server from 1.8.3 to 1.9.0
and did not notice any problems, so we upgraded our production server
from 1.8.3 to 1.9.0 yesterday.

Immediately after the upgrade, our monitoring reported our DoH service as
unavailable (HTTP 400), but we were unable to reproduce this using Firefox.


A closer look confirmed that there is an issue: we see about 50% fewer
DoH requests in our Grafana graphs of DoH request rates.

The request rates per HTTP method suggest that we lose almost all GET
requests and also a significant fraction of POST DoH requests.


sum by (method) 
(irate(dnsdist_frontend_doh_http_method_queries{job="$job"}[$__rate_interval]))


After looking at the TLS versions graph I noticed a clear correlation
but then I realized that all our DoH requests are TLS version 1.3
because we set minTLSVersion='tls1.3' - so this might be irrelevant.

irate(dnsdist_frontend_tlsqueries{job="$job"}[$__rate_interval])

2024-03-16 20:57:59 dnsdist upgraded: 1.8.3_1 -> 1.9.0
2024-03-16 20:59 monitoring says DoH is down (HTTP 400 - Bad Request)
monitoring requests this: 
https://doh.applied-privacy.net/query?dns=l1sBAAABA3d3dw1rbm90LXJlc29sdmVyAmN6AAAcAAE
Mar 17 02:40:45 bender-dpriv1 kernel: pid 77544 (dnsdist), jid 0, uid 208:
exited on signal 11 -> also interesting but likely unrelated?
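
For anyone who wants to inspect the monitoring probe offline: the dns parameter of a DoH GET request is the DNS wire-format query, base64url-encoded without padding (RFC 8484). Below is a minimal sketch in Python that decodes the value from the URL above. The helper name decode_doh_param is made up for illustration and is not part of dnsdist or of our monitoring.

```python
import base64


def decode_doh_param(value: str) -> bytes:
    """Decode the base64url `dns` parameter of a DoH GET request (RFC 8484)."""
    # RFC 8484 strips the trailing '=' padding, so restore it before decoding.
    padded = value + "=" * (-len(value) % 4)
    return base64.urlsafe_b64decode(padded)


# Value copied from the monitoring URL quoted above.
PROBE = "l1sBAAABA3d3dw1rbm90LXJlc29sdmVyAmN6AAAcAAE"

wire = decode_doh_param(PROBE)
print(len(wire), "bytes")
print(wire.hex(" "))  # space-separated hex dump of the raw query
```

Comparing the dump against the 12-byte DNS header layout from RFC 1035 (ID, flags, then four 16-bit counts) is a quick way to check that the probe itself is well-formed before looking at the server side.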


Today we downgraded to 1.8.3, and everything went back to normal.

Is anyone else observing similar issues on dnsdist 1.9.0?

DoT does not appear to be affected.

best regards,
Christoph

OS: FreeBSD 13.2
dnsdist installed via pkg

our dnsdist config:

newServer({address="109.70.100.136", maxInFlight=1000, sockets=32, 
name="clamps"})
newServer({address="109.70.100.140", maxInFlight=1000, sockets=32, 
name="roberto"})

--newServer({address="109.70.100.133", sockets=4, name="titanius-dpriv1"})
setServerPolicy(leastOutstanding)

addTLSLocal("0.0.0.0", 
"/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.crt", 
"/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.key", 
{ciphers='ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES128-GCM-SHA256', 
minTLSVersion='tls1.2', tcpFastOpenQueueSize=1000, maxInFlight=1000 })
addTLSLocal("[::]", 
"/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.crt", 
"/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.key", 
{ciphers='ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES128-GCM-SHA256', 
minTLSVersion='tls1.2', tcpFastOpenQueueSize=1000, maxInFlight=1000 })


addDOHLocal("0.0.0.0:444", 
"/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.crt", 
"/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.key", 
"/query", {minTLSVersion='tls1.3', serverTokens='doh', 
tcpFastOpenQueueSize=1000, tcpListenQueueSize=4096 })
addDOHLocal("[::]:444", 
"/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.crt", 
"/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.key", 
"/query", {minTLSVersion='tls1.3', serverTokens='doh', 
tcpFastOpenQueueSize=1000, tcpListenQueueSize=4096 })


setACL({'0.0.0.0/0', '::/0'})
controlSocket('127.0.0.1:5199')
setConsoleACL('127.0.0.1/8')

setKey()

pc = newPacketCache(5, {maxTTL=86400, minTTL=3, 
temporaryFailureTTL=60, staleTTL=60, dontAge=false})

getPool(""):setCache(pc)

webserver("127.0.0.1:8083")
setWebserverConfig({...})
setVerboseHealthChecks(true)
addAction(QTypeRule(65535), RCodeAction(DNSRCode.NOTIMP))


