Hi
On 05/10/2023 10:41, Aleš Rygl via dnsdist wrote:
     Thanks for your response. After some deep documentation reading and config tweaking, I am nearly back to the previous CPU load, except for latency, which is still higher (1.3 ms -> 2.3 ms). I suspect the latency is now computed differently (I noticed a new set of latency counters for TLS, TCP, etc.). The key configuration parameter is setMaxTCPClientThreads(); changing anything else (cache shards, number of listeners, etc.) has nearly no impact. We had 256 with 1.7.4; now it is 16. Going higher means a rapid increase in CPU load, while going below 16 means dropped TCP connections in showTCPStats(), where Queued hits Max Queued. Extreme values like 1024 kill the CPU. We have a physical server with 16 physical cores; the OS sees 32.

OK, this is clearly unexpected. Since 1.4.0 you should not need more TCP worker threads than the number of cores, as a single worker can handle a lot (easily thousands) of TCP connections, but a larger value should not kill the CPU either, so I'm wondering if we are busy-looping somewhere. I have not been able to reproduce it so far, so I would be really interested in seeing the perf output if you can get it.
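
For context, a minimal dnsdist (Lua) configuration sketch of the tuning discussed above; the addresses and the cache size are made-up examples, and the thread count simply mirrors the 16 physical cores mentioned above:

    -- dnsdist.conf (Lua); listener and backend addresses are placeholders
    setLocal("192.0.2.1:53")
    newServer({address = "192.0.2.10:53"})

    -- Since 1.4.0 a single TCP worker multiplexes many connections,
    -- so keep this close to the number of physical cores (16 here)
    setMaxTCPClientThreads(16)

    -- Packet cache; changing the shard count had almost no effect on CPU load
    pc = newPacketCache(1000000, {numberOfShards = 8})
    getPool(""):setCache(pc)

From the dnsdist console, showTCPStats() shows whether Queued is getting close to Max Queued.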

Update: after some testing I can say that dnsdist 1.7.4 on Bookworm has the same issue as 1.8.1. The cause is apparently this OpenSSL issue: https://github.com/openssl/openssl/issues/17064. There is a safe workaround: lowering setMaxTCPClientThreads(). Watch out for TCP queueing - check it with showTCPStats(). Improving TLS performance with a STEK file can help as well.
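
As a rough sketch of the TLS part of that workaround (the listener address and file paths are placeholders, and I am assuming the STEK is loaded through the ticketKeyFile option of addTLSLocal()):

    -- DoT listener using a pre-generated session ticket encryption key (STEK);
    -- address, certificate, key and STEK paths below are placeholders
    addTLSLocal("192.0.2.1:853", "/etc/dnsdist/server.crt", "/etc/dnsdist/server.key",
                {ticketKeyFile = "/etc/dnsdist/stek.key"})

    -- Keep the TCP/TLS worker count low rather than raising it
    setMaxTCPClientThreads(16)

Tickets let clients resume TLS sessions without a full handshake, which cuts the CPU spent on TLS.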

I'd like to thank Remi for his excellent support.

Ales



_______________________________________________
dnsdist mailing list
dnsdist@mailman.powerdns.com
https://mailman.powerdns.com/mailman/listinfo/dnsdist
