Re: [dnsdist] dnsdist tuning for high qps on nxdomain ddos

2024-05-06 Thread Klaus Darilion via dnsdist
Hi Jasper!

Not that I can help you that much with dnsdist, but I want to share some things 
we have done….

I found some measurements from 2022 on a VM with 8 vCPUs.
Dnsdist with PowerDns/postgresql Backend and random queries: 20k qps
Dnsdist (with PowerDns/postgresql Backend) and hot dnsdist cache: 150k qps
Knot and random queries: 575k qps

So I think, if you do not need the dnsdist features, you might be better off using a 
faster nameserver for all your zones on the public-facing name servers. 250K 
zones is doable with Knot and Co. We still use PowerDNS for zone provisioning 
(API) and we still use dnsdist+PowerDNS as public-facing nameservers. But for 
customers who are under random subdomain attacks 24x7 we use Knot as the public-facing 
nameserver (which gets its zones via AXFR from a local PowerDNS). Of course this 
is more management overhead, but it solved our random subdomain attack problems.

You might be interested in my talk at DNS-OARC [1].

It was quite some work until it was running smoothly, but we now serve several 
million zones from Knot. Some things are not that easy any more and checking if 
all zones are in sync is cumbersome [2]. You might also consider, like we do, 
using 2 setups, one with dnsdist+powerdns for “normal” zones and using only 
Knot (or NSD/Bind) for “exposed” zones.

But on the other hand: If you manage to tune dnsdist please let us know 




From: dnsdist On Behalf Of Jasper Aikema via dnsdist
Sent: Monday, 6 May 2024 16:02
Subject: Re: [dnsdist] dnsdist tuning for high qps on nxdomain ddos

> 200k QPS is fairly low based on what you describe. Would you mind
> sharing the whole configuration (redacting passwords and keys, of
> course), and telling us a bit more about the hardware dnsdist is running on?

The server is a virtual server (Ubuntu 22.04) on our VMware platform with 16 GB 
of memory and 8 cores (Intel Xeon 4214R @ 2.4 GHz). I have pasted the new config 
at the bottom of this message.

> 6 times the amount of cores is probably not a good idea. I usually
> advise to make it so that the number of threads is roughly equivalent to
> the number of cores that are dedicated to dnsdist, so in your case the
> number of addLocal + the number of newServer + the number of TCP workers
> should ideally match the number of cores you have. If you need to
> overcommit the cores a bit that's fine, but keep it to something like
> twice the number of cores you have, not 10 times.
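
A minimal sketch of that sizing rule for the 8-core VM described above. The addresses (documentation ranges), the server names and the 4 + 3 split are illustrative assumptions, not a tested recommendation; TCP workers are left at their defaults here:

```lua
-- Hypothetical sizing for 8 cores: 4 listeners + 3 backend entries = 7 threads,
-- leaving roughly one core for the TCP workers (kept at their defaults).
-- All addresses below are placeholders.

-- Open the same listening address several times; with reusePort=true each
-- entry gets its own socket and its own listener thread.
setLocal("192.0.2.1:53", { reusePort=true })
addLocal("192.0.2.1:53", { reusePort=true })
addLocal("192.0.2.1:53", { reusePort=true })
addLocal("192.0.2.1:53", { reusePort=true })

-- Declare the single backend several times to get several threads
-- handling its responses.
newServer({address="198.51.100.10:53", name="pdns_1"})
newServer({address="198.51.100.10:53", name="pdns_2"})
newServer({address="198.51.100.10:53", name="pdns_3"})
```

If udp-in-errors persist with a layout like this, the split between listeners and backend entries is the first thing to vary before overcommitting the cores.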

> I'm pretty sure this does not make sense, I would first go with the
> default until you see TCP/DoT connections are not processed correctly.

I did overcommit / try to tune, because I was getting a high number of 
udp-in-errors and also a high number of Drops in showServers().
If those issues are gone, I agree there should be no reason to overcommit.

> When you say it doesn't work for NXDomain, I'm assuming you mean it
> doesn't solve the problem of random sub-domains attacks, not that a
> NXDomain is not properly cached/accounted?

Yes, that is indeed what I meant. The responses are getting cached, but that is 
exactly why NXDOMAIN attacks work: they request a lot of random 
sub-domains, so caching doesn't help to make it more responsive.

> I expect lowering the number of threads will reduce the context switches
> a lot. If you are still not getting good QPS numbers, I would suggest
> checking if disabling the rules help, to figure out the bottleneck. You
> might also want to take a look with "perf top -p "
> during the high load to see where the CPU time is spent.

I have updated the config and lowered the threads. But now I get a high number 
of udp-in-errors. The perf top command gives:

Samples: 80K of event 'cpu-clock:pppH', 4000 Hz, Event count (approx.): 15028605853 lost: 0/0 drop: 0/0
Overhead  Shared Object   Symbol
   4.78%  [kernel]        [k] __lock_text_start
   2.29%  [kernel]        [k] 
   2.29%  [kernel]        [k] 
   1.86%  [nf_conntrack]  [k] 
   1.81%  [kernel]        [k] __fget_files
   1.42%  [kernel]        [k] _raw_spin_lock
   1.39%  [vmxnet3]       [k] 
   1.34%  [kernel]        [k] 
   1.32%  [nf_tables]     [k] nft_do_chain
   1.23%                  [.] cfree
   1.08%  [kernel]        [k] 

Re: [dnsdist] [EXT] AW: Suggestions for rules to block abusive traffic

2024-01-09 Thread Klaus Darilion via dnsdist
Hi Remi!

Thanks for the details. 

> > Blocking all queries to the attacked domain prevents collateral
> damage, but causes a DoS to the attacked domain and makes the customer
> of the attacked domain unhappy.
> I fully agree, and we are working on having smarter mitigations in
> dnsdist to only drops/truncate/route to a different pool queries that
> are very likely to be part of a PRSD/enumeration attack.

Do you already have ideas how to implement that? I have thought a lot about an 
algorithm to block only "bad" queries but have not found a method yet.

For authoritative nameservers, meanwhile I think it would be better to just 
load the attacked zone completely into dnsdist or pdns-cache (or something 
similar to aggressive caching). Because I think just answering (mostly 
NXDOMAIN) may be faster than deciding if a query is bad or good.


dnsdist mailing list

Re: [dnsdist] Suggestions for rules to block abusive traffic

2024-01-08 Thread Klaus Darilion via dnsdist
> -Original Message-
> From: dnsdist On Behalf Of
> Remi Gacogne via dnsdist
> Sent: Monday, 8 January 2024 17:51
> To:
> Subject: Re: [dnsdist] Suggestions for rules to block abusive traffic
> Hi Dan,
> On 08/01/2024 17:28, Dan McCombs via dnsdist wrote:
> >   In our case we are affected as we use Pdns + DB backend as backend.
> >
> > Yep, that's exactly our case as well - our legacy Pdns + mysql backends
> > don't handle this very well. Longer term we intend to move away from
> > that, but finding some improvements in the meantime for handling these
> > floods would be helpful. I'll let you know if we come up with anything
> > interesting!
> This is unfortunately a common issue indeed these days. It is possible
> to use dnsdist to detect and mitigate these attacks to a certain extent,
> using the StatNode API along with DynBlockRulesGroup:setSuffixMatchRule
> [1] or the FFI equivalent for better performance. It requires writing a
> bit of Lua code and some tuning on top of dnsdist, but all the building
> blocks are there already. We have implemented this for several customers
> and they are happy with the results.
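
A rough sketch of what such a StatNode visitor can look like. This is not Remi's actual implementation; the thresholds, the 10-second window, the 60-second duration and the choice of truncation as the action are all placeholder assumptions that would need tuning:

```lua
-- Sketch only: thresholds, window and action are placeholder assumptions.
local dbr = dynBlockRulesGroup()

-- Called for the nodes (labels) seen in recent traffic.
-- `self` holds the stats of this node alone, `children` the stats of the
-- node plus everything below it.
local function visitor(node, self, children)
  -- Flag a suffix when most of the traffic below it ends in NXDOMAIN.
  if children.queries > 1000 and (children.nxdomains / children.queries) > 0.9 then
    return true
  end
  return false
end

-- Look at the last 10 seconds; act on flagged suffixes for 60 seconds.
dbr:setSuffixMatchRule(10, "possible random sub-domain attack", 60, DNSAction.Truncate, visitor)

-- dnsdist calls maintenance() periodically; apply the rules from there.
function maintenance()
  dbr:apply()
end
```

As written, the action applies to every query under a flagged suffix, legitimate names included, so the collateral-damage concern raised in this thread still applies to a visitor this simple.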

Hi Remi!
How does this work in detail? Does your implementation block only the queries 
for or also "normal" queries like or MX? Or do you explicitly allow common subdomains before blocking 
everything else?

Blocking all queries to the attacked domain prevents collateral damage, but 
causes a DoS to the attacked domain and makes the customer of the attacked 
domain unhappy.


Re: [dnsdist] Suggestions for rules to block abusive traffic

2024-01-08 Thread Klaus Darilion via dnsdist

From: Dan McCombs 
Sent: Monday, 8 January 2024 17:28
To: Klaus Darilion 
Subject: Re: [dnsdist] Suggestions for rules to block abusive traffic

Hi Klaus!

 In our case we are affected as we use Pdns + DB backend as backend.

Yep, that's exactly our case as well - our legacy Pdns + mysql backends don't 
handle this very well. Longer term we intend to move away from that, but 
finding some improvements in the meantime for handling these floods would be 
helpful. I'll let you know if we come up with anything interesting!

If you use PDNS make sure to use at least version 4.5 and use
 (this saves plenty of DB queries). Further, the DB server must have enough RAM 
to have the database in RAM (i.e. in the linux file buffers).

Further you might be interested in and if you plan to use 
another name server. Another very fresh option would be PDNS + lmdb backend and for replication.

For dnsdist, others on this list probably know more than I do.


Re: [dnsdist] Suggestions for rules to block abusive traffic

2024-01-08 Thread Klaus Darilion via dnsdist
Hi Dan!

This is a known issue and we have not found a simple solution in dnsdist. And 
obviously it is only a problem if the backend is slow. In our case we are 
affected as we use Pdns + DB backend as backend.

  1.  Use a fast name server as additional backend (we used NSD) and 
dynamically provision targeted zones (and all subzones) on the faster backend 
and redirect the zone to the fast backend (dnsdist rule). Our detection is 
based on the “dsc” statistics collector.
  2.  Use a fast nameserver instead of dnsdist + slow backend (we use Knot for 
customers that are constantly under attack)
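
The redirect in method 1 could look roughly like the sketch below. The addresses, the server names, the pool name "fast" and the zone name are all placeholders, and the detection and zone provisioning steps are outside of it:

```lua
-- Placeholder addresses and names; the pool name "fast" is arbitrary.
newServer({address="198.51.100.10:53", name="pdns"})              -- default pool: PowerDNS + DB backend
newServer({address="198.51.100.20:53", name="nsd", pool="fast"})  -- fast backend holding the attacked zones

-- A plain string rule matches the name and everything below it, so all
-- queries for the attacked zone are sent to the fast pool.
addAction("attacked.example.", PoolAction("fast"))
```

Since rules can also be added at runtime through the dnsdist console, a detection script could issue the same addAction() call once the zone has been provisioned on the fast backend.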

These two methods helped us, but of course add additional operations work to 
implement and operate it.

If you find a simple dnsdist based solution to filter these random queries I 
would be interested too ;-)


From: dnsdist On Behalf Of Dan McCombs via dnsdist
Sent: Friday, 29 December 2023 20:11
Subject: [dnsdist] Suggestions for rules to block abusive traffic

Hi all,

I'm wondering if anyone has suggestions of reasonable ways to handle this type 
of abusive traffic with dnsdist.

We've had on and off attacks recently targeting legitimate domains delegated to 
our authoritative service flooding queries for random subdomains of varying 
length and characters/words. i.e.,,, where is a different domain we're authoritative for 
each attack.

The dnsdist nodes can handle the traffic, but breaking cache and going through 
to our backends is having more of an impact.

We have thousands of domains, so it doesn't seem reasonable to apply individual 
rate limits to them all, but if there is a straightforward way to do something 
like that I'd be happy to hear it. The source addresses are well known public 
resolvers that we shouldn't rate limit either.

I'm wondering if there's any way to detect and apply a rule dynamically to 
respond to queries for one of these domains without affecting the source IP 
address entirely, and not require us to manually add a rule for each domain as 
it occurs.

Any ideas would be appreciated.

Take care,



Dan McCombs
Senior Engineer I - DNS

Re: [dnsdist] Backend Questions

2022-11-02 Thread Klaus Darilion via dnsdist

> > Shouldn't newServer(...healthCheckMode='UP') also work? In my case it
> > does not work.
> > I have set healthCheckMode='UP' but:
> > showServers shows the status as "up" whereas after setUp() the status is "UP".
> > And it still does health checks and the status goes "down" if the backend is down.
> >
> > What is wrong? The docs? Am I misinterpreting the docs? Bug?
> Which version are you running? healthCheckMode will only be available
> in 1.8.0, which is not released yet. See

Oh I missed that comment about changes in 1.8.
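
For reference, a small sketch of the two variants discussed here. The address and server name are placeholders, and the exact spelling of the healthCheckMode value is an assumption that should be checked against the 1.8 documentation:

```lua
-- Placeholder address and name.
-- Before 1.8.0: create the server, then force it into the "UP" state.
local backend = newServer({address="198.51.100.10:53", name="pdns_1"})
backend:setUp()   -- showServers() should now report "UP" instead of "up"

-- From 1.8.0 on, the same intent can be expressed at creation time
-- (value spelling assumed, not verified here):
-- newServer({address="198.51.100.10:53", name="pdns_1", healthCheckMode="up"})
```

On versions before 1.8.0 the healthCheckMode key has no effect, which matches the behaviour described above: the status stays "up" and health checks keep running.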


Re: [dnsdist] Backend Questions

2022-11-02 Thread Klaus Darilion via dnsdist
(resent to the list)

Hi Remi!

> On 07/10/2022 10:53, Klaus Darilion via dnsdist wrote:
> > We use dnsdist with a single backend server (PDNS). So if this backend
> > is overloaded, dnsdist will detect the backend as DOWN. Hence, the only
> > server for this backend pool is down. How will dnsdist behave if all
> > servers for a backend pool are down? Will it stop sending queries to the
> > backend, or will it still send queries to the DOWN server as there is no
> > UP server available?
> All of the built-in load balancing policies will stop forwarding queries
> when all the servers in the selected pool are down. It would be possible
> to write a custom load-balancing in Lua that does not do that, of
> course, but I don't think that's what you want in that case.
> > So it may be useful to disable healthchecks completely. How can this be
> > done?
> You can force a server in the "up" state using the 'setUp()' method, see
> [1].

Shouldn't newServer(...healthCheckMode='UP') also work? In my case it does not 
work. I have set healthCheckMode='UP' but:
showServers shows the status as "up" whereas after setUp() the status is "UP". And 
it still does health checks and the status goes "down" if the backend is down.

What is wrong? The docs? Am I misinterpreting the docs? Bug?


[dnsdist] Backend Questions

2022-10-07 Thread Klaus Darilion via dnsdist

We use dnsdist with a single backend server (PDNS). So if this backend is 
overloaded, dnsdist will detect the backend as DOWN. Hence, the only server for 
this backend pool is down. How will dnsdist behave if all servers for a backend 
pool are down? Will it stop sending queries to the backend, or will it still 
send queries to the DOWN server as there is no UP server available?

So it may be useful to disable healthchecks completely. How can this be done?

My current config is a few years old, tested with dnsdist 1.3. Back then, 
dnsdist was faster when I added the listen port multiple times, and also added 
the single backend server multiple times, to have more receiver threads. For 
example:

-- Open the same socket multiple times. This allows better load distribution
-- over all cores. Note: first setLocal(), then addLocal()!
setLocal("", { reusePort=true, tcpFastOpenSize=100 })
addLocal("", { reusePort=true, tcpFastOpenSize=100 })

-- Define the backend server pools. Define them multiple times to have multiple
-- receiver threads handling the responses from the backend.
newServer{address='',name='pdns_1'}   -- this is the PowerDNS server
newServer{address='',name='pdns_2'}   -- this is the PowerDNS server

Is it still (dnsdist 1.6/1.7) useful/necessary to add listenSockets and 
Backendserver multiple times to improve performance?


Klaus Darilion, Head of Operations GmbH, Jakob-Haringer-Straße 8/V
5020 Salzburg, Austria

Re: [dnsdist] dnsdist[29321]: Marking downstream IP:53 as 'down'

2022-03-24 Thread Klaus Darilion via dnsdist
Indeed that might be a problem. We use (ferm syntax):
table raw {
    chain PREROUTING {
        # We want NOTRACK for incoming DNS queries and the corresponding
        # outgoing responses. Outgoing DNS queries should still be tracked
        # so that the corresponding response is allowed in.
        proto (udp tcp) dport 53 NOTRACK;
    }
    chain OUTPUT {
        proto (udp tcp) sport 53 NOTRACK;
    }
}
Same for IPv4 and IPv6 in our case.


From: dnsdist On Behalf Of Rasto Rickardt via dnsdist
Sent: Thursday, 24 March 2022 11:36
Subject: Re: [dnsdist] dnsdist[29321]: Marking downstream IP:53 as 'down'

Hello Rais,
I noticed that you are increasing nf_conntrack_max. I am not sure how the 
backend servers are connected,
but I suggest not using connection tracking/NAT at all. You can use, for 
example, a dedicated interface for backend
management and another one to connect to dnsdist.
On 24/03/2022 11:11, Rais Ahmed via dnsdist wrote:

Thanks for the guidance...!

We are testing multiple scenarios, with/without kernel tuning. We observed 
UDP packet errors on both backend servers (not a single UDP error on the dnsdist 
LB server).

Tested with resperf 15K QPS
resperf -s -R -d queryfile-example-10million-201202 -C 100 -c 300 
-r 0 -m 15000 -q 20

Backend 1: (without Kernel tuning):
netstat -su
InType3: 2229
InType8: 6
InType11: 194
OutType0: 6
OutType3: 762
1634847 packets received
843 packets to unknown port received.
193891 packet receive errors
1859642 packets sent
193891 receive buffer errors
0 send buffer errors
InOctets: 580762744
OutOctets: 237368675
InNoECTPkts: 1995692
InECT0Pkts: 27

Backend 2: (with Kernel Tuning):
netstat -su
InType3: 19177
InType8: 5802
InType11: 2645
OutType0: 5802
OutType3: 5122
10798358 packets received
6846 packets to unknown port received.
4815377 packet receive errors
11949871 packets sent
4815377 receive buffer errors
0 send buffer errors
InNoRoutes: 11
InOctets: 3312682950
OutOctets: 1741771756
InNoECTPkts: 16355120
InECT1Pkts: 72
InECT0Pkts: 92
InCEPkts: 4

Kernel Tuning configured in /etc/rc.local

ethtool -L eth0 combined 16
echo 52428800 > /proc/sys/net/netfilter/nf_conntrack_max
sysctl -w net.core.rmem_max=33554432
sysctl -w net.core.wmem_max=33554432
sysctl -w net.core.rmem_default=16777216
sysctl -w net.core.wmem_default=16777216
sysctl -w net.core.netdev_max_backlog=65536
sysctl -w net.core.somaxconn=1024
ulimit -n 16000

Network config/specs are the same on all three servers. Are we doing something wrong?


-Original Message-
From: Klaus Darilion 
Sent: Thursday, March 24, 2022 12:38 PM
To: Rais Ahmed;
Subject: AW: [dnsdist] dnsdist[29321]: Marking downstream IP:53 as 'down'

Have you tested how many qps your backend is capable of handling? First test your 
backend performance to know how many qps a single backend can handle. I guess 
500k qps might be difficult to achieve with BIND. If you need more performance, 
switch the backend to NSD or Knot.


-Original Message-
From: dnsdist On Behalf Of 
Rais Ahmed via dnsdist
Sent: Wednesday, 23 March 2022 22:02
Subject: [dnsdist] dnsdist[29321]: Marking downstream IP:53 as 'down'

Thanks for reply...!

We have configured setMaxUDPOutstanding(65535) and still we are seeing 
backend down, logs are showing frequently as below.

Timeout while waiting for the health check response from backend
Timeout while waiting for the health check response from backend

Please have a look at below dnsdist configuration and help us to find 
misconfiguration (16 Listeners & 8+8 backends added as per vCPUs 
(2 Socket x 8 Cores):


 Listen addresses
addLocal('', { reusePort=true }) 
addLocal('', { reusePort=true }) 
addLocal('', { reusePort=true }) 
addLocal('', { reusePort=true }) 
addLocal('', { reusePort=true }) 
addLocal('', { reusePort=true }) 
addLocal('', { reusePort=true }) 
addLocal('', { reusePort=true }) 
addLocal('', { reusePort=true }) 
addLocal('', { reusePort=true }) 
addLocal('', { reusePort=true }) 
addLocal('', { reusePort=true }) 
addLocal('', { reusePort=true }) 
addLocal('', { reusePort=true }) 
addLocal('', { reusePort=true }) 
addLocal('', { reusePort=true })

 Back-end server

Re: [dnsdist] dnsdist[29321]: Marking downstream IP:53 as 'down'

2022-03-24 Thread Klaus Darilion via dnsdist
Have you tested how many qps your backend is capable of handling? First test your 
backend performance to know how many qps a single backend can handle. I guess 
500k qps might be difficult to achieve with BIND. If you need more performance, 
switch the backend to NSD or Knot.


> -Original Message-
> From: dnsdist On Behalf Of
> Rais Ahmed via dnsdist
> Sent: Wednesday, 23 March 2022 22:02
> To:
> Subject: [dnsdist] dnsdist[29321]: Marking downstream IP:53 as 'down'
> Hi,
> Thanks for reply...!
> We have configured setMaxUDPOutstanding(65535) and still we are seeing
> backend down, logs are showing frequently as below.
> Timeout while waiting for the health check response from backend
> Timeout while waiting for the health check response from backend
> Please have a look at below dnsdist configuration and help us to find
> misconfiguration (16 Listeners & 8+8 backends added as per vCPUs available
> (2 Socket x 8 Cores):
> controlSocket('')
> setKey("")
>  Listen addresses
> addLocal('', { reusePort=true })
> addLocal('', { reusePort=true })
> addLocal('', { reusePort=true })
> addLocal('', { reusePort=true })
> addLocal('', { reusePort=true })
> addLocal('', { reusePort=true })
> addLocal('', { reusePort=true })
> addLocal('', { reusePort=true })
> addLocal('', { reusePort=true })
> addLocal('', { reusePort=true })
> addLocal('', { reusePort=true })
> addLocal('', { reusePort=true })
> addLocal('', { reusePort=true })
> addLocal('', { reusePort=true })
> addLocal('', { reusePort=true })
> addLocal('', { reusePort=true })
>  Back-end server
> newServer({address='', maxCheckFailures=3, checkInterval=5,
> weight=4, qps=4, order=1})
> newServer({address='', maxCheckFailures=3, checkInterval=5,
> weight=4, qps=4, order=2})
> newServer({address='', maxCheckFailures=3, checkInterval=5,
> weight=4, qps=4, order=3})
> newServer({address='', maxCheckFailures=3, checkInterval=5,
> weight=4, qps=4, order=4})
> newServer({address='', maxCheckFailures=3, checkInterval=5,
> weight=4, qps=4, order=5})
> newServer({address='', maxCheckFailures=3, checkInterval=5,
> weight=4, qps=4, order=6})
> newServer({address='', maxCheckFailures=3, checkInterval=5,
> weight=4, qps=4, order=7})
> newServer({address='', maxCheckFailures=3, checkInterval=5,
> weight=4, qps=4, order=8})
> newServer({address='', maxCheckFailures=3, checkInterval=5,
> weight=4, qps=4, order=9})
> newServer({address='', maxCheckFailures=3, checkInterval=5,
> weight=4, qps=4, order=10})
> newServer({address='', maxCheckFailures=3, checkInterval=5,
> weight=4, qps=4, order=11})
> newServer({address='', maxCheckFailures=3, checkInterval=5,
> weight=4, qps=4, order=12})
> newServer({address='', maxCheckFailures=3, checkInterval=5,
> weight=4, qps=4, order=13})
> newServer({address='', maxCheckFailures=3, checkInterval=5,
> weight=4, qps=4, order=14})
> newServer({address='', maxCheckFailures=3, checkInterval=5,
> weight=4, qps=4, order=15})
> newServer({address='', maxCheckFailures=3, checkInterval=5,
> weight=4, qps=4, order=16})
> setMaxUDPOutstanding(65535)
>  Server Load Balancing Policy
> setServerPolicy(leastOutstanding)
>  Web-server
> webserver('')
> setWebserverConfig({acl='', password='Secret'})
>  Customers Policy
> customerACLs={''}
> setACL(customerACLs)
> pc = newPacketCache(30, {maxTTL=86400, minTTL=0,
> temporaryFailureTTL=60, staleTTL=60, dontAge=false})
> getPool(""):setCache(pc)
> setVerboseHealthChecks(true)
> Server specs are as below:
> dnsdist LB server specs: 16 vCPUs, 16 GB RAM, Virtio NIC (10G) with 16
> multiqueues.
> Backend bind9 server specs: 16 vCPUs, 16 GB RAM, Virtio NIC (10G) with 16
> multiqueues.
> We are trying to handle 500K qps (we will increase hardware specs if required)
> or, with the above specs, at least 100K qps.
> Regards,
> Rais
> -Original Message-
> From: dnsdist  On Behalf Of
> Sent: Wednesday, March 23, 2022 5:00 PM
> To:
> Subject: dnsdist Digest, Vol 79, Issue 3

[dnsdist] XDP/eBPF blocking (was dnsdist 1.7.0 released)

2022-01-17 Thread Klaus Darilion via dnsdist
> Pierre Grié from Nameshield contributed an XDP program to reply to
> blocked UDP queries with a truncated response directly from the kernel,
> in a similar way to what we were already doing using eBPF socket
> filters. This version adds support for eBPF pinned maps, allowing
> dnsdist to populate the maps using our dynamic blocking mechanism, and
> letting the external XDP program do the actual blocking or response.

How does this work in detail? If is on these lists (filtering or 
truncate response), will it block also (and other subdomains) 
or only exactly the name on the list?
