Re: [dnsdist] dnsdist 1.7.4 Debian Bullseye vs 1.8.4 Bullseye

2023-10-09 Thread Aleš Rygl via dnsdist

Hi

> On 05/10/2023 10:41, Aleš Rygl via dnsdist wrote:
>> Thanks for your response. After some deep documentation reading and
>> config tweaking, I am nearly back to the previous values regarding CPU
>> load, apart from latency, which is still higher (1.3 ms -> 2.3 ms). I
>> suspect the latency is now computed differently (I noticed a new set of
>> latency counters for TLS, TCP, etc.). The key configuration parameter
>> is setMaxTCPClientThreads(). Changing anything else (cache shards,
>> number of listeners, etc.) has nearly no impact. We had 256 with
>> 1.7.4; now it is 16. Going higher means a rapid increase in CPU load,
>> while going below 16 means dropping TCP connections: in showTCPStats(),
>> Queued hits Max Queued. Insane values like 1024 kill the CPU. We have a
>> physical server with 16 physical cores; the OS sees 32 cores.
>
> OK, this is clearly unexpected. I mean, since 1.4.0 you should not need
> more TCP worker threads than the number of cores, since a single worker
> can handle a lot (easily thousands) of TCP connections, but having a
> larger value should not kill the CPU, so I'm wondering if we are
> busy-looping somewhere. I have not been able to reproduce that so far,
> so I would be really interested in seeing the perf output if you can
> get it.


Update: after some testing I can say that dnsdist 1.7.4 on Bookworm has
the same issue as 1.8.1. The cause is apparently this OpenSSL issue:
https://github.com/openssl/openssl/issues/17064. There is a safe
workaround: lowering setMaxTCPClientThreads(). Watch out for TCP
queueing using showTCPStats(). Improving TLS performance with a STEK
(session ticket encryption key) file can help as well.
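A minimal dnsdist configuration sketch of the workaround above, assuming a DoT listener using the OpenSSL provider. The listen address and the certificate, key and ticket-key file paths are hypothetical placeholders, and 16 threads is simply the value that worked on this 16-core server, not a general recommendation. `ticketKeyFile` is the addTLSLocal() option for loading session ticket encryption keys from a file; the 80-byte key size noted in the comments is an assumption to check against the dnsdist documentation for your version.

```lua
-- Sketch only: the address and paths below are hypothetical placeholders.

-- Keep the number of TCP worker threads close to the number of physical
-- cores (16 on the server discussed in this thread) rather than e.g. 256.
setMaxTCPClientThreads(16)

-- DoT listener using OpenSSL, loading the session ticket encryption key
-- (STEK) from a file instead of relying on a randomly generated one.
-- Assumption: with the OpenSSL provider each key is 80 random bytes, e.g.
--   dd if=/dev/urandom of=/run/dnsdist/ticket.keys bs=80 count=1
-- Keeping that file off persistent storage helps preserve forward secrecy.
addTLSLocal("192.0.2.1:853", "/etc/dnsdist/cert.pem", "/etc/dnsdist/key.pem", {
  provider = "openssl",
  ticketKeyFile = "/run/dnsdist/ticket.keys",
})

-- After a restart, watch TCP queueing from the console (dnsdist -c):
--   > showTCPStats()
-- In this thread, "Queued" reaching "Max Queued" coincided with dropped
-- connections, i.e. the thread count was too low.
```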


I'd like to thank Remi for his excellent support.

Ales



___
dnsdist mailing list
dnsdist@mailman.powerdns.com
https://mailman.powerdns.com/mailman/listinfo/dnsdist


Re: [dnsdist] dnsdist 1.7.4 Debian Bullseye vs 1.8.4 Bullseye

2023-10-05 Thread Remi Gacogne via dnsdist

Hi!

On 05/10/2023 10:41, Aleš Rygl via dnsdist wrote:
> Thanks for your response. After some deep documentation reading and
> config tweaking, I am nearly back to the previous values regarding CPU
> load, apart from latency, which is still higher (1.3 ms -> 2.3 ms). I
> suspect the latency is now computed differently (I noticed a new set of
> latency counters for TLS, TCP, etc.). The key configuration parameter
> is setMaxTCPClientThreads(). Changing anything else (cache shards,
> number of listeners, etc.) has nearly no impact. We had 256 with 1.7.4;
> now it is 16. Going higher means a rapid increase in CPU load, while
> going below 16 means dropping TCP connections: in showTCPStats(),
> Queued hits Max Queued. Insane values like 1024 kill the CPU. We have a
> physical server with 16 physical cores; the OS sees 32 cores.


OK, this is clearly unexpected. I mean, since 1.4.0 you should not need
more TCP worker threads than the number of cores, since a single worker
can handle a lot (easily thousands) of TCP connections, but having a
larger value should not kill the CPU, so I'm wondering if we are
busy-looping somewhere. I have not been able to reproduce that so far,
so I would be really interested in seeing the perf output if you can
get it.


Best regards,
--
Remi Gacogne
PowerDNS.COM BV - https://www.powerdns.com/





Re: [dnsdist] dnsdist 1.7.4 Debian Bullseye vs 1.8.4 Bullseye

2023-10-05 Thread Aleš Rygl via dnsdist

Hi Remi,

On 02. 10. 23 13:53, Remi Gacogne via dnsdist wrote:

> Hi Ales,
>
> On 25/09/2023 16:09, Aleš Rygl via dnsdist wrote:
>> I would like to kindly ask for help or advice. I have just upgraded
>> one of our dnsdist instances from 1.7.4 to 1.8.4 together with an OS
>> upgrade (Debian 11.7 to 12.1). Everything works fine, and no issues
>> were observed apart from some deprecated config references. What is a
>> big surprise to me is the CPU usage. The newer version consumes nearly
>> twice as much CPU in userspace: I am at nearly 80% CPU with 16
>> physical cores (it was about 40%). We have a lot of TLS (DoT) sessions
>> (30k) and 60k qps in total (30k via DoT). The latency measured by
>> dnsdist also went up. We are collecting all the metrics dnsdist
>> produces via Graphite, so I can check the counters to find out what
>> could be wrong.
>
> Wow, that's awful. It's the first time I've heard of such a regression,
> and I really would like to understand what is going on.
> 1/ Are you using our packages, compiling yourself, or perhaps using
> the Debian ones?
> 2/ Do you think it would be possible for you to try downgrading the
> instance to 1.7.4 on Debian 12.1? It might help us pinpoint whether
> the issue is related to a system change (I have seen people complain
> about the performance of OpenSSL 3.0.x compared to 1.1.1x, for example).
> 3/ Would you mind sharing your configuration?
> 4/ And finally, do you think it would be possible for you to collect a
> perf trace on this instance? It would require installing linux-perf,
> if possible the debug symbols for dnsdist (dnsdist-dbgsym), then
> running
>   perf record --call-graph dwarf -p <pid of the dnsdist process> -o <output file>
> for a few dozen seconds to collect a trace, stopping it with Ctrl+C,
> and finally getting a report with
>   perf report -i <output file> --stdio
> It should tell us where the CPU usage is going.
>
> Best regards,

Thanks for your response. After some deep documentation reading and
config tweaking, I am nearly back to the previous values regarding CPU
load, apart from latency, which is still higher (1.3 ms -> 2.3 ms). I
suspect the latency is now computed differently (I noticed a new set of
latency counters for TLS, TCP, etc.). The key configuration parameter
is setMaxTCPClientThreads(). Changing anything else (cache shards,
number of listeners, etc.) has nearly no impact. We had 256 with 1.7.4;
now it is 16. Going higher means a rapid increase in CPU load, while
going below 16 means dropping TCP connections: in showTCPStats(),
Queued hits Max Queued. Insane values like 1024 kill the CPU. We have a
physical server with 16 physical cores; the OS sees 32 cores.
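The change described above boils down to a single line in the dnsdist configuration. A sketch, assuming a host with 16 physical cores as in this thread; the right value for another machine will differ and is best validated by watching showTCPStats() under real load.

```lua
-- Value used with 1.7.4 on Bullseye. With 1.8.1 on Bookworm it drove
-- CPU usage up sharply in the setup discussed here:
-- setMaxTCPClientThreads(256)

-- Value that brought CPU load back near the previous level on a host
-- with 16 physical cores (32 logical). Lower values led to TCP queueing
-- and dropped connections in that same setup.
setMaxTCPClientThreads(16)
```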


Back to your questions:

1/ From your repos.
2/ Yes, I could try it. The thing is that 1.7.4 for Bullseye crashes on
Bookworm with TLS enabled, and there are no packages of 1.7.4 for
Bookworm in your repo.

3/ Sure, I will do so.
4/ No problem.

Best regards

Ales







Re: [dnsdist] dnsdist 1.7.4 Debian Bullseye vs 1.8.4 Bullseye

2023-10-02 Thread Remi Gacogne via dnsdist

Hi Ales,

On 25/09/2023 16:09, Aleš Rygl via dnsdist wrote:
> I would like to kindly ask for help or advice. I have just upgraded
> one of our dnsdist instances from 1.7.4 to 1.8.4 together with an OS
> upgrade (Debian 11.7 to 12.1). Everything works fine, and no issues
> were observed apart from some deprecated config references. What is a
> big surprise to me is the CPU usage. The newer version consumes nearly
> twice as much CPU in userspace: I am at nearly 80% CPU with 16
> physical cores (it was about 40%). We have a lot of TLS (DoT) sessions
> (30k) and 60k qps in total (30k via DoT). The latency measured by
> dnsdist also went up. We are collecting all the metrics dnsdist
> produces via Graphite, so I can check the counters to find out what
> could be wrong.


Wow, that's awful. It's the first time I've heard of such a regression,
and I really would like to understand what is going on.
1/ Are you using our packages, compiling yourself, or perhaps using the
Debian ones?
2/ Do you think it would be possible for you to try downgrading the
instance to 1.7.4 on Debian 12.1? It might help us pinpoint whether
the issue is related to a system change (I have seen people complain
about the performance of OpenSSL 3.0.x compared to 1.1.1x, for example).
3/ Would you mind sharing your configuration?
4/ And finally, do you think it would be possible for you to collect a
perf trace on this instance? It would require installing linux-perf, if
possible the debug symbols for dnsdist (dnsdist-dbgsym), then running
  perf record --call-graph dwarf -p <pid of the dnsdist process> -o <output file>
for a few dozen seconds to collect a trace, stopping it with Ctrl+C,
and finally getting a report with
  perf report -i <output file> --stdio
It should tell us where the CPU usage is going.


Best regards,
--
Remi Gacogne
PowerDNS.COM BV - https://www.powerdns.com/




Re: [dnsdist] dnsdist 1.7.4 Debian Bullseye vs 1.8.4 Bullseye

2023-09-25 Thread Aleš Rygl via dnsdist
Ah, I am sorry, the subject should be "1.7.4 Debian Bullseye vs 1.8.1
Bookworm". I am running 1.8.1 on Bookworm...

Ales

On 25. 09. 23 16:01, Aleš Rygl via dnsdist wrote:

> Hello,
>
> I would like to kindly ask for help or advice. I have just upgraded
> one of our dnsdist instances from 1.7.4 to 1.8.4 together with an OS
> upgrade (Debian 11.7 to 12.1). Everything works fine, and no issues
> were observed apart from some deprecated config references. What is a
> big surprise to me is the CPU usage. The newer version consumes nearly
> twice as much CPU in userspace: I am at nearly 80% CPU with 16
> physical cores (it was about 40%). We have a lot of TLS (DoT) sessions
> (30k) and 60k qps in total (30k via DoT). The latency measured by
> dnsdist also went up. We are collecting all the metrics dnsdist
> produces via Graphite, so I can check the counters to find out what
> could be wrong.
>
> Thanks in advance
>
> With best regards
>
> Ales



