Re: [dnsdist] grepq() output columns
On 02/10/2023 19:17, Christoph via dnsdist wrote:
>> I don't think we have a way to log only these, unfortunately :-/
>> If you have the dnsdist console set up, you can use grepq('1000ms') to look at all queries that took more than 1 second, which is usually indicative of a problem, or even grepq('2000ms'), as dnsdist records timeouts with a very high response time.
>
> Thanks for this suggestion.
> Out of ~200 lines of grepq('3000ms') output, 184 end with:
> ... T.O RD No Error. 0 answers
>
> Examples:
> aPPLE.CoM. A T.O RD No Error. 0 answers
> fACeboOK.COm. T.O RD No Error. 0 answers
>
> Does "T.O" in the Lat. column stand for timeout?

Yes, it means that dnsdist believes it did not get a response from the backend in time.

Best regards,
--
Remi Gacogne
PowerDNS.COM BV - https://www.powerdns.com/
___
dnsdist mailing list
dnsdist@mailman.powerdns.com
https://mailman.powerdns.com/mailman/listinfo/dnsdist
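Since "T.O" in the Lat. column marks a timeout, a tally like "184 of ~200 lines" can be reproduced from grepq() output saved out of the console. The sketch below is a hypothetical helper, not part of dnsdist; it assumes whitespace-separated columns and that a timed-out query carries the standalone token "T.O". The third sample line, with a numeric latency, is made up for illustration.

```python
# Hypothetical excerpt of grepq() output saved from the dnsdist console.
# The first two lines are copied from the thread; the third, with a
# numeric latency, is made up for illustration.
SAMPLE = """\
aPPLE.CoM. A T.O RD No Error. 0 answers
fACeboOK.COm. T.O RD No Error. 0 answers
example.org. A 12.3 RD No Error. 1 answers
"""


def count_timeouts(output: str) -> tuple[int, int]:
    """Return (timeouts, total) over the non-empty lines of `output`.

    Assumes whitespace-separated columns and that a timed-out query
    carries the token "T.O" in its latency (Lat.) column.
    """
    lines = [line for line in output.splitlines() if line.strip()]
    timeouts = sum(1 for line in lines if "T.O" in line.split())
    return timeouts, len(lines)


if __name__ == "__main__":
    t, n = count_timeouts(SAMPLE)
    print(f"{t} of {n} queries timed out")  # prints "2 of 3 queries timed out"
```

Matching on a whole token rather than a substring avoids counting a query name that merely happens to contain "T.O".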
Re: [dnsdist] grepq() output columns
Hi Remi,

> I don't think we have a way to log only these, unfortunately :-/
> If you have the dnsdist console set up, you can use grepq('1000ms') to look at all queries that took more than 1 second, which is usually indicative of a problem, or even grepq('2000ms'), as dnsdist records timeouts with a very high response time.

Thanks for this suggestion.
Out of ~200 lines of grepq('3000ms') output, 184 end with:
... T.O RD No Error. 0 answers

Examples:
aPPLE.CoM. A T.O RD No Error. 0 answers
fACeboOK.COm. T.O RD No Error. 0 answers

Does "T.O" in the Lat. column stand for timeout?

Best regards,
Christoph
Re: [dnsdist] backend drops metrics for TCP
Hi Christoph,

On 13/09/2023 07:30, Christoph via dnsdist wrote:
> I've switched back to using UDP.
> Is there an easy way to log queries that time out (2s), and not log any others, so I can investigate some examples further?

I don't think we have a way to log only these, unfortunately :-/

If you have the dnsdist console set up, you can use grepq('1000ms') to look at all queries that took more than 1 second, which is usually indicative of a problem, or even grepq('2000ms'), as dnsdist records timeouts with a very high response time.

Best regards,
--
Remi Gacogne
PowerDNS.COM BV - https://www.powerdns.com/
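The reason a latency threshold also surfaces timeouts can be modelled offline. The sketch below is not dnsdist's grepq() implementation; it works on hypothetical (name, latency) records, with None standing for a timeout, and treats a timeout as infinitely slow so that any threshold selects it, loosely mirroring the point that dnsdist records timeouts with a very high response time.

```python
import math
from typing import Optional

# Hypothetical record: (query name, latency in ms, or None for a timeout).
Record = tuple[str, Optional[float]]


def slow_queries(records: list[Record], threshold_ms: float) -> list[str]:
    """Return the names of queries that took more than threshold_ms.

    Timeouts (None) are treated as infinitely slow, so every threshold
    selects them, much like a very high recorded response time would.
    """
    selected = []
    for name, latency in records:
        effective = math.inf if latency is None else latency
        if effective > threshold_ms:
            selected.append(name)
    return selected


if __name__ == "__main__":
    records = [("a.example.", 3.2), ("b.example.", 1500.0), ("c.example.", None)]
    print(slow_queries(records, 1000))  # ['b.example.', 'c.example.']
    print(slow_queries(records, 2000))  # ['c.example.']
```

Raising the threshold above the slowest genuine answer leaves only the timeouts, which is the effect of moving from grepq('1000ms') to grepq('2000ms').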
Re: [dnsdist] dnsdist latency bucket metric still broken in 1.8.0?
Hi!

On 03/09/2023 11:08, Christoph via dnsdist wrote:
> latency-doh-avg100 contains only a single average value, compared to latency-bucket.
> Was there a specific reason for not having a latency-bucket for DoH/DoT queries as well?

I do not recall whether this was an explicit decision, but my guess is that we were not expecting as much scrutiny of DoT/DoH latency as of UDP latency. I am very willing to add latency-bucket for DoT, DoH and the upcoming DoQ, so I have put the issue you opened into the 1.9 milestone.

Thanks!
--
Remi Gacogne
PowerDNS.COM BV - https://www.powerdns.com/
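Why a bucketed latency metric is more useful than a single averaged value can be shown with made-up numbers. In this sketch the bucket bounds are an assumption, loosely modelled on dnsdist's latency0-1, latency1-10, latency10-50, latency50-100, latency100-1000 and latency-slow counters, and a plain arithmetic mean stands in for the averaged metric; neither is dnsdist's exact computation.

```python
# Assumed upper bounds in ms, loosely modelled on dnsdist's latency0-1 ...
# latency100-1000 counters; anything slower lands in "slow".
BOUNDS_MS = [1, 10, 50, 100, 1000]
LABELS = ["0-1", "1-10", "10-50", "50-100", "100-1000", "slow"]


def mean_latency(latencies_ms: list[float]) -> float:
    """Single summary value, standing in for an averaged latency metric."""
    return sum(latencies_ms) / len(latencies_ms) if latencies_ms else 0.0


def buckets(latencies_ms: list[float]) -> dict[str, int]:
    """Count how many latencies fall into each bucket."""
    counts = dict.fromkeys(LABELS, 0)
    for lat in latencies_ms:
        for bound, label in zip(BOUNDS_MS, LABELS):
            if lat < bound:
                counts[label] += 1
                break
        else:  # no bound matched: slower than the last bound
            counts["slow"] += 1
    return counts


if __name__ == "__main__":
    # 95 fast answers and 5 near-timeouts (hypothetical data).
    lat = [0.5] * 95 + [2000.0] * 5
    print(f"mean: {mean_latency(lat):.1f} ms")
    print(buckets(lat))
```

The mean of about 100 ms describes none of the actual queries, whereas the buckets show 95 sub-millisecond answers and 5 very slow ones.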
Re: [dnsdist] dnsdist 1.7.4 Debian Bullseye vs 1.8.4 Bullseye
Hi Ales,

On 25/09/2023 16:09, Aleš Rygl via dnsdist wrote:
> I would like to kindly ask for help or advice. I have just upgraded one of our dnsdist instances from 1.7.4 to 1.8.4, together with an OS upgrade (Debian 11.7 to 12.1). Everything works fine, with no issues observed apart from some deprecated configuration references.
> What is a big surprise to me is the CPU usage. The newer version has nearly twice the CPU consumption in userspace. I am at nearly 80% CPU with 16 physical cores (it was about 40%). We have a lot of TLS (DoT) sessions (30k) and 60k qps in total (30k qps via DoT) here. The latency measured by dnsdist went up as well.
> We are collecting all the metrics dnsdist produces via Graphite, so I can check counters to see what could be wrong.

Wow, that's awful. This is the first time I have heard of such a regression, and I really would like to understand what is going on.

1/ Are you using our packages, compiling yourself, or perhaps using the Debian ones?
2/ Do you think it would be possible for you to try downgrading the instance to 1.7.4 on Debian 12.1? It might help us pinpoint whether the issue is related to a system change (I have seen people complain about the performance of OpenSSL 3.0.x compared to 1.1.1x, for example).
3/ Would you mind sharing your configuration?
4/ And finally, do you think it would be possible for you to collect a perf trace on this instance? It would require installing linux-perf and, if possible, the debug symbols for dnsdist (dnsdist-dbgsym), then running 'perf record --call-graph dwarf -p -o ' for a few dozen seconds to collect a trace, stopping it with Ctrl+C, and finally getting a report with "perf report -i --stdio". It should tell us where the CPU usage is going.

Best regards,
--
Remi Gacogne
PowerDNS.COM BV - https://www.powerdns.com/
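The regression reported above can be expressed as CPU time per query, which is easier to compare across versions than a utilisation percentage. This is a back-of-the-envelope sketch using the approximate figures from the report (16 cores, about 40% before and nearly 80% after, 60k qps). It assumes the load is spread evenly and ignores that DoT and UDP queries have very different costs.

```python
def cpu_us_per_query(cores: int, utilisation: float, qps: float) -> float:
    """CPU time spent per query, in microseconds.

    cores * utilisation is the number of CPU-seconds consumed per
    wall-clock second; dividing by queries per second gives CPU-seconds
    per query, converted here to microseconds.
    """
    return cores * utilisation / qps * 1_000_000


if __name__ == "__main__":
    before = cpu_us_per_query(16, 0.40, 60_000)  # 1.7.4 on Debian 11.7
    after = cpu_us_per_query(16, 0.80, 60_000)   # 1.8.4 on Debian 12.1
    print(f"{before:.0f} µs -> {after:.0f} µs per query")  # prints "107 µs -> 213 µs per query"
```

With the query rate unchanged, doubling utilisation doubles the per-query cost, so a perf trace would need to account for roughly 100 extra CPU microseconds per query.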