Re: [dnsdist] grepq() output columns

2023-10-02 Thread Remi Gacogne via dnsdist

On 02/10/2023 19:17, Christoph via dnsdist wrote:
>> I don't think we have a way to log only these, unfortunately :-/ If
>> you have the dnsdist console set up, you can use grepq('1000ms') to
>> look at all queries that took more than 1 second, which is usually
>> indicative of a problem, or even grepq('2000ms'), as dnsdist records
>> timeouts with a very high response time.
>
> Thanks for this suggestion.
>
> out of ~200 lines from the grepq('3000ms') output 184 lines end with
> ... T.O RD No Error. 0 answers
>
> examples:
> aPPLE.CoM. A T.O   RD    No Error. 0 answers
> fACeboOK.COm.     T.O   RD    No Error. 0 answers
>
> does "T.O" in the Lat. column stand for timeout?


Yes, it means that dnsdist believes it did not get a response from the 
backend in time.
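
To put a number on how many entries in a saved grepq() dump are timeouts, a small script along these lines could help. It is only a sketch: it assumes the console output was copied into a string or file by hand, and it treats any line carrying the whitespace-separated token "T.O" as a timeout without relying on the exact column layout. The function name count_timeouts is made up for this illustration; the two sample lines are the ones quoted in this thread.

```python
from typing import Tuple

# Two sample lines copied from the grepq('3000ms') output quoted in this thread.
SAMPLE = """\
aPPLE.CoM. A T.O   RD    No Error. 0 answers
fACeboOK.COm.     T.O   RD    No Error. 0 answers
"""


def count_timeouts(grepq_output: str) -> Tuple[int, int]:
    """Return (timed_out, total) over the non-empty lines of grepq output.

    Assumption: a timed-out entry carries the literal token "T.O" as a
    whitespace-separated field. Every non-empty line counts towards the
    total, so any header line in the input is counted as well.
    """
    lines = [ln for ln in grepq_output.splitlines() if ln.strip()]
    timed_out = sum(1 for ln in lines if "T.O" in ln.split())
    return timed_out, len(lines)


if __name__ == "__main__":
    timed_out, total = count_timeouts(SAMPLE)
    print(f"{timed_out} of {total} entries timed out")
```

Matching on the token rather than on a fixed column position keeps the sketch working even when the qtype column is missing, as in the second sample line.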


Best regards,
--
Remi Gacogne
PowerDNS.COM BV - https://www.powerdns.com/

___
dnsdist mailing list
dnsdist@mailman.powerdns.com
https://mailman.powerdns.com/mailman/listinfo/dnsdist


Re: [dnsdist] grepq() output columns

2023-10-02 Thread Christoph via dnsdist

Hi Remi,

> I don't think we have a way to log only these, unfortunately :-/ If you
> have the dnsdist console set up, you can use grepq('1000ms') to look at
> all queries that took more than 1 second, which is usually indicative of
> a problem, or even grepq('2000ms'), as dnsdist records timeouts with a
> very high response time.


Thanks for this suggestion.

out of ~200 lines from the grepq('3000ms') output 184 lines end with
... T.O RD No Error. 0 answers

examples:
aPPLE.CoM. A T.O   RD    No Error. 0 answers
fACeboOK.COm.     T.O   RD    No Error. 0 answers

does "T.O" in the Lat. column stand for timeout?

best regards,
Christoph


Re: [dnsdist] backend drops metrics for TCP

2023-10-02 Thread Remi Gacogne via dnsdist

Hi Christoph,

On 13/09/2023 07:30, Christoph via dnsdist wrote:
> I've switched back to using UDP.
> Is there an easy way to log queries that time out (2s) - and not log any
> others? To investigate some examples further?


I don't think we have a way to log only these, unfortunately :-/ If you 
have the dnsdist console set up, you can use grepq('1000ms') to look at 
all queries that took more than 1 second, which is usually indicative of 
a problem, or even grepq('2000ms'), as dnsdist records timeouts with a 
very high response time.


Best regards,
--
Remi Gacogne
PowerDNS.COM BV - https://www.powerdns.com/





Re: [dnsdist] dnsdist latency bucket metric still broken in 1.8.0?

2023-10-02 Thread Remi Gacogne via dnsdist

Hi!

On 03/09/2023 11:08, Christoph via dnsdist wrote:
> latency-doh-avg100 contains only a single avg value compared to
> latency-bucket.
> Was there a specific reason for not having a latency-bucket for DoH/DoT
> queries as well?


I do not recall whether this was an explicit decision, but my guess is 
that we were not expecting as much scrutiny over the DoT/DoH latency as 
with the UDP one. I am very willing to add latency-bucket for DoT, DoH 
and the upcoming DoQ, so I have put the issue you opened into the 1.9 
milestone.
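
As a rough illustration of why a bucketed metric is more informative than a single average, here is a minimal sketch. The bucket bounds and both latency samples are invented for the example and are not dnsdist's actual buckets or real measurements; the function names are hypothetical as well.

```python
from bisect import bisect_left
from typing import Dict, List

# Hypothetical upper bounds in milliseconds; not taken from dnsdist itself.
BOUNDS_MS = [1, 10, 50, 100, 1000]

# Two made-up samples that share the same average latency.
FAST_WITH_OUTLIER = [2] * 9 + [1982]  # nine fast answers, one very slow
UNIFORM = [200] * 10                  # every answer moderately slow


def average(latencies_ms: List[float]) -> float:
    """Arithmetic mean, or 0.0 for an empty sample."""
    return sum(latencies_ms) / len(latencies_ms) if latencies_ms else 0.0


def bucketize(latencies_ms: List[float]) -> Dict[str, int]:
    """Count samples per bucket; the last bucket catches everything above the top bound."""
    labels = [f"<={b}ms" for b in BOUNDS_MS] + [f">{BOUNDS_MS[-1]}ms"]
    counts = dict.fromkeys(labels, 0)
    for value in latencies_ms:
        # bisect_left finds the first bound that is >= value.
        counts[labels[bisect_left(BOUNDS_MS, value)]] += 1
    return counts


if __name__ == "__main__":
    for name, sample in (("outlier", FAST_WITH_OUTLIER), ("uniform", UNIFORM)):
        print(name, average(sample), bucketize(sample))
```

Both samples average 200 ms, yet the buckets show that one is a mostly fast service with a single slow query while the other is uniformly slow, which is the kind of distinction an average alone cannot make.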


Thanks!
--
Remi Gacogne
PowerDNS.COM BV - https://www.powerdns.com/





Re: [dnsdist] dnsdist 1.7.4 Debian Bullseye vs 1.8.4 Bullseye

2023-10-02 Thread Remi Gacogne via dnsdist

Hi Ales,

On 25/09/2023 16:09, Aleš Rygl via dnsdist wrote:
> I would like to kindly ask for help or advice. I have just upgraded
> one of our dnsdist instances from 1.7.4 to 1.8.4 together with an OS
> upgrade (Debian 11.7 to 12.1). Everything works fine, no issues
> observed apart from some deprecated config references. What is a big
> surprise to me is CPU usage. The newer version has nearly two times
> higher CPU consumption in userspace. I am nearly at 80% CPU with 16
> physical cores (was about 40%). We have a lot of TLS (DoT) sessions
> (30k) and 60kqps in total (30k via DoT) here. The latency measured by
> dnsdist went up as well. We are collecting all the metrics dnsdist
> produces via graphite, so I can check the counters for what could be
> wrong.


Wow, that's awful. It's the first time I hear about such a regression, 
and I really would like to understand what is going on.
1/ Are you using our packages, compiling yourself, or perhaps using the 
Debian ones?
2/ Do you think it would be possible for you to try downgrading the
instance to 1.7.4 on Debian 12.1? It might help us pinpoint whether
the issue is related to a system change (I have seen people complain
about the performance of OpenSSL 3.0.x compared to 1.1.1x, for example).

3/ Would you mind sharing your configuration?
4/ And finally, do you think it would be possible for you to collect a
perf trace on this instance? It would require installing linux-perf and,
if possible, the debug symbols for dnsdist (dnsdist-dbgsym), then running
'perf record --call-graph dwarf -p <PID> -o <output file>' for a few
dozen seconds to collect a trace, stopping it with Ctrl+C and finally
getting a report with "perf report -i <output file> --stdio". It should
tell us where the CPU usage is going.
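
For anyone who wants to script the two perf invocations described above, a small helper could assemble the command lines. This is only a sketch: the PID and the trace path used below are placeholders chosen for illustration, the helper names are hypothetical, and the script only prints the commands rather than running perf.

```python
import shlex
from typing import List


def perf_record_cmd(pid: int, trace_path: str) -> List[str]:
    """Build the 'perf record' command that samples one process with DWARF call graphs."""
    return ["perf", "record", "--call-graph", "dwarf",
            "-p", str(pid), "-o", trace_path]


def perf_report_cmd(trace_path: str) -> List[str]:
    """Build the 'perf report' command that prints a text report from a saved trace."""
    return ["perf", "report", "-i", trace_path, "--stdio"]


if __name__ == "__main__":
    pid = 12345                            # placeholder: PID of the dnsdist process
    trace_path = "/tmp/dnsdist-perf.data"  # placeholder: where the trace is written
    # Print shell-ready command lines; run the first for a few dozen seconds,
    # stop it with Ctrl+C, then run the second.
    print(shlex.join(perf_record_cmd(pid, trace_path)))
    print(shlex.join(perf_report_cmd(trace_path)))
```

Building the commands as argument lists avoids quoting mistakes if the trace path contains spaces, and the same lists can be passed to subprocess.run() unchanged.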


Best regards,
--
Remi Gacogne
PowerDNS.COM BV - https://www.powerdns.com/

