Re: [dnsdist] dnsdist tuning for high qps on nxdomain ddos

2024-05-06 Thread Klaus Darilion via dnsdist
Hi Jasper!

Not that I can help you much with dnsdist itself, but I want to share some 
things we have done.

I found some measurements from 2022 on a VM with 8 vCPUs:
- dnsdist with PowerDNS/PostgreSQL backend, random queries: 20k qps
- dnsdist (with PowerDNS/PostgreSQL backend), hot dnsdist cache: 150k qps
- Knot, random queries: 575k qps

So I think, if you do not need the dnsdist features, you might be better off 
using a faster nameserver for all your zones on the public-facing name servers. 
250K zones is doable with Knot and Co. We still use PowerDNS for zone 
provisioning (API) and we still use dnsdist+PowerDNS as public-facing 
nameservers. But for customers who have random subdomain attacks 24x7 we use 
Knot as the public-facing nameserver (which gets its zones via AXFR from a 
local PowerDNS). Of course this is more management overhead, but it solved our 
random subdomain attack problems.

You might be interested in my talk at DNS-OARC [1].

It was quite some work until it was running smoothly, but we now serve several 
million zones from Knot. Some things are not that easy any more, and checking 
whether all zones are in sync is cumbersome [2]. You might also consider, like 
we do, using 2 setups: one with dnsdist+PowerDNS for "normal" zones and one 
using only Knot (or NSD/BIND) for "exposed" zones.

But on the other hand: If you manage to tune dnsdist please let us know 😉


Regards
Klaus

[1] https://indico.dns-oarc.net/event/47/contributions/1008/
https://www.youtube.com/watch?v=8UnM7_uGDv0

[2] https://indico.dns-oarc.net/event/47/contributions/1017/
https://www.youtube.com/watch?v=jgtODGv7X4Y


From: dnsdist, on behalf of Jasper Aikema via dnsdist
Sent: Monday, 6 May 2024 16:02
To: dnsdist@mailman.powerdns.com
Subject: Re: [dnsdist] dnsdist tuning for high qps on nxdomain ddos

> 200k QPS is fairly low based on what you describe. Would you mind
> sharing the whole configuration (redacting passwords and keys, of
> course), and telling us a bit more about the hardware dnsdist is running on?

The server is a virtual server (Ubuntu 22.04) on our vmware platform with 16GB 
of memory and 8 cores (Intel Xeon 4214R @2.4Ghz). I have pasted the new config 
at the bottom of this message.

> 6 times the amount of cores is probably not a good idea. I usually
> advise to make it so that the number of threads is roughly equivalent to
> the number of cores that are dedicated to dnsdist, so in your case the
> number of addLocal + the number of newServer + the number of TCP workers
> should ideally match the number of cores you have. If you need to
> overcommit the cores a bit that's fine, but keep it to something like
> twice the number of cores you have, not 10 times.

> I'm pretty sure this does not make sense, I would first go with the
> default until you see TCP/DoT connections are not processed correctly.

I did overcommit / try to tune, because I was getting a high number of 
udp-in-errors and also a high number of Drops in showServers().
If those issues are gone, I agree there should be no reason to overcommit.

> When you say it doesn't work for NXDomain, I'm assuming you mean it
> doesn't solve the problem of random sub-domains attacks, not that a
> NXDomain is not properly cached/accounted?

Yes, that is indeed what I meant. The responses are getting cached, but that 
is exactly why NXDOMAIN attacks work: they request a lot of random 
sub-domains, so caching doesn't help make the server more responsive.

> I expect lowering the number of threads will reduce the context switches
> a lot. If you are still not getting good QPS numbers, I would suggest
> checking if disabling the rules help, to figure out the bottleneck. You
> might also want to take a look with "perf top -p "
> during the high load to see where the CPU time is spent.

I have updated the config and lowered the threads. But now I get a high number 
of udp-in-errors. The perf top command gives:

Samples: 80K of event 'cpu-clock:pppH', 4000 Hz, Event count (approx.): 
15028605853 lost: 0/0 drop: 0/0
Overhead  Shared Object   Symbol
   4.78%  [kernel]        [k] __lock_text_start
   2.29%  [kernel]        [k] copy_user_generic_unrolled
   2.29%  [kernel]        [k] copy_from_kernel_nofault
   1.86%  [nf_conntrack]  [k] __nf_conntrack_find_get
   1.81%  [kernel]        [k] __fget_files
   1.42%  [kernel]        [k] _raw_spin_lock
   1.39%  [vmxnet3]       [k] vmxnet3_poll_rx_only
   1.34%  [kernel]        [k] finish_task_switch.isra.0
   1.32%  [nf_tables]     [k] nft_do_chain
   1.23%  libc.so.6       [.] cfree
   1.08%  [kernel]        [k] __siphash_unaligned
   1.07%

Re: [dnsdist] dnsdist tuning for high qps on nxdomain ddos

2024-05-06 Thread Jasper Aikema via dnsdist
> 200k QPS is fairly low based on what you describe. Would you mind
> sharing the whole configuration (redacting passwords and keys, of
> course), and telling us a bit more about the hardware dnsdist is running
> on?

The server is a virtual server (Ubuntu 22.04) on our vmware platform with
16GB of memory and 8 cores (Intel Xeon 4214R @2.4Ghz). I have pasted the
new config at the bottom of this message.

> 6 times the amount of cores is probably not a good idea. I usually
> advise to make it so that the number of threads is roughly equivalent to
> the number of cores that are dedicated to dnsdist, so in your case the
> number of addLocal + the number of newServer + the number of TCP workers
> should ideally match the number of cores you have. If you need to
> overcommit the cores a bit that's fine, but keep it to something like
> twice the number of cores you have, not 10 times.

> I'm pretty sure this does not make sense, I would first go with the
> default until you see TCP/DoT connections are not processed correctly.

I did overcommit / try to tune, because I was getting a high number of
udp-in-errors and also a high number of Drops in showServers().
If those issues are gone, I agree there should be no reason to overcommit.

> When you say it doesn't work for NXDomain, I'm assuming you mean it
> doesn't solve the problem of random sub-domains attacks, not that a
> NXDomain is not properly cached/accounted?

Yes, that is indeed what I meant. The responses are getting cached, but
that is exactly why NXDOMAIN attacks work: they request a lot of random
sub-domains, so caching doesn't help make the server more responsive.
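
For reference, the two "default" protections discussed here usually look
something like the sketch below. The original rules were not posted, so the
cache size, the TTL and the 50 qps limit are assumptions for illustration
only; the pool name "all" is taken from the configuration further down.

```lua
-- Packet cache attached to the "all" pool (size and TTL are assumed values).
-- Each fresh random label (e.g. x7f3k.<zone>) produces a new cache key,
-- so these queries miss the cache and still reach the backend.
local pc = newPacketCache(100000, {maxTTL=86400})
getPool("all"):setCache(pc)

-- Per-client rate limit: drop queries from a source IP exceeding 50 qps
-- (assumed threshold). As noted in this thread, the attack arrives via
-- public resolvers, so a per-IP limit does not catch it.
addAction(MaxQPSIPRule(50), DropAction())
```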

> I expect lowering the number of threads will reduce the context switches
> a lot. If you are still not getting good QPS numbers, I would suggest
> checking if disabling the rules help, to figure out the bottleneck. You
> might also want to take a look with "perf top -p "
> during the high load to see where the CPU time is spent.

I have updated the config and lowered the threads. But now I get a high
number of udp-in-errors. The perf top command gives:

Samples: 80K of event 'cpu-clock:pppH', 4000 Hz, Event count (approx.):
15028605853 lost: 0/0 drop: 0/0

Overhead  Shared Object   Symbol
   4.78%  [kernel]        [k] __lock_text_start
   2.29%  [kernel]        [k] copy_user_generic_unrolled
   2.29%  [kernel]        [k] copy_from_kernel_nofault
   1.86%  [nf_conntrack]  [k] __nf_conntrack_find_get
   1.81%  [kernel]        [k] __fget_files
   1.42%  [kernel]        [k] _raw_spin_lock
   1.39%  [vmxnet3]       [k] vmxnet3_poll_rx_only
   1.34%  [kernel]        [k] finish_task_switch.isra.0
   1.32%  [nf_tables]     [k] nft_do_chain
   1.23%  libc.so.6       [.] cfree
   1.08%  [kernel]        [k] __siphash_unaligned
   1.07%  [kernel]        [k] syscall_enter_from_user_mode
   1.05%  [kernel]        [k] memcg_slab_free_hook
   1.00%  [kernel]        [k] memset_orig



We have the following configuration:

setACL({'0.0.0.0/0', '::/0'})
controlSocket("127.0.0.1:5900")
setKey("")
webserver("127.0.0.1:8083")
setWebserverConfig({password=hashPassword("")})
addLocal(":53",{reusePort=true,tcpFastOpenQueueSize=100})
addLocal(":53",{reusePort=true,tcpFastOpenQueueSize=100})
newServer({address="127.0.0.1:54", pool="all"})
newServer({address="127.0.0.1:54", pool="all"})
newServer({address=":53", pool="abuse", tcpFastOpen=true,
           maxCheckFailures=5, sockets=16})
newServer({address=":53", pool="abuse", tcpFastOpen=true,
           maxCheckFailures=5, sockets=16})
addAction(OrRule({OpcodeRule(DNSOpcode.Notify),
                  OpcodeRule(DNSOpcode.Update),
                  QTypeRule(DNSQType.AXFR),
                  QTypeRule(DNSQType.IXFR)}),
          RCodeAction(DNSRCode.REFUSED))
addAction(AllRule(), PoolAction("all"))

We have removed the caching and the per-IP qps limit, because we are
running the attack from 4 servers.

Thanks in advance for any help you can give me.

On Mon, 6 May 2024 at 10:41, Remi Gacogne via dnsdist <
dnsdist@mailman.powerdns.com> wrote:

> Hi!
>
> On 03/05/2024 22:20, Jasper Aikema via dnsdist wrote:
> > Currently we are stuck at a max of +/- 200k qps for nxdomain requests
> > and want to be able to serve +/- 300k qps per server.
>
> 200k QPS is fairly low based on what you describe. Would you mind
> sharing the whole configuration (redacting passwords and keys, of
> course), and telling us a bit more about the hardware dnsdist is running
> on?
>
> > We have done the following:
> > - added multiple (6x the amount of cores) addLocal listeners for IPv4
> > and IPv6, w

Re: [dnsdist] PowerDNS DNSdist 1.9.3 released

2024-05-06 Thread Stephane Bortzmeyer via dnsdist
On Mon, May 06, 2024 at 12:54:05PM +0200,
 Marco Davids (SIDN) via dnsdist  wrote 
 a message of 1346 lines which said:

> Is there a specific reason why the Alpine package (in Alpine Edge) of
> DNSdist comes without DoQ support?

Because the QUIC library quiche does not seem packaged?

https://pkgs.alpinelinux.org/packages?name=quiche&branch=edge&repo=&arch=&maintainer=

It is a common problem with QUIC, there are very few "official"
packages (not just on Alpine).
___
dnsdist mailing list
dnsdist@mailman.powerdns.com
https://mailman.powerdns.com/mailman/listinfo/dnsdist


Re: [dnsdist] PowerDNS DNSdist 1.9.3 released

2024-05-06 Thread Marco Davids (SIDN) via dnsdist

Hi,

Is there a specific reason why the Alpine package (in Alpine Edge) of 
DNSdist comes without DoQ support?



On 05-04-2024 at 13:55, Remi Gacogne via dnsdist wrote:

Less than an hour after the release of PowerDNS DNSdist 1.9.2 today, we 
received reports of DNSdist crashing in some setups. This 1.9.3 release 
fixes the issue that was introduced in 1.9.2, for now by reverting the 
related change.


--
𝓜𝓪𝓻𝓬𝓸 𝓓𝓪𝓿𝓲𝓭𝓼
Research Engineer

SIDN | Meander 501 | 6825 MD | Postbus 5022 | 6802 EA | ARNHEM
T +31 (0)26 352 55 00 | www.sidnlabs.nl | Twitter: @marcodavids
https://mastodon.social/@marcodavids | Matrix: @marco:sidnlabs.nl
Nostr: 11ed01ff277d94705c2931867b8d900d8bacce6f27aaf7440ce98bb50e02fb34




Re: [dnsdist] dnsdist tuning for high qps on nxdomain ddos

2024-05-06 Thread Remi Gacogne via dnsdist

Hi!

On 03/05/2024 22:20, Jasper Aikema via dnsdist wrote:
> Currently we are stuck at a max of +/- 200k qps for nxdomain requests
> and want to be able to serve +/- 300k qps per server.


200k QPS is fairly low based on what you describe. Would you mind 
sharing the whole configuration (redacting passwords and keys, of 
course), and telling us a bit more about the hardware dnsdist is running on?



> We have done the following:
> - added multiple (6x the amount of cores) addLocal listeners for IPv4
>   and IPv6, with the options reusePort=true and tcpFastOpenQueueSize=100
> - added multiple (2x the amount of cores) newServer to the backend, with
>   the options tcpFastOpen=true and sockets=(2x the amount of cores)

6 times the amount of cores is probably not a good idea. I usually 
advise to make it so that the number of threads is roughly equivalent to 
the number of cores that are dedicated to dnsdist, so in your case the 
number of addLocal + the number of newServer + the number of TCP workers 
should ideally match the number of cores you have. If you need to 
overcommit the cores a bit that's fine, but keep it to something like 
twice the number of cores you have, not 10 times.
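
A minimal sketch of such a thread budget for the 8-core VM described in this
thread. The 4/2 split between listeners and backend entries and the listen
address are assumptions for illustration, not a recommendation from the list;
the backend address and pool name are taken from the posted configuration.

```lua
-- Hypothetical budget for 8 cores: 4 listeners + 2 backend entries = 6
-- threads, which leaves room for the TCP workers (kept at the default).

-- 4 listeners on the same port, sharing it via reusePort
for _ = 1, 4 do
  addLocal("0.0.0.0:53", {reusePort=true})
end

-- 2 entries for the local authoritative server on port 54
for _ = 1, 2 do
  newServer({address="127.0.0.1:54", pool="all"})
end

-- No setMaxTCPClientThreads() call here: start with the default and only
-- raise it if TCP/DoT connections are not processed correctly.
```

If the cores have to be overcommitted, the same loops can be scaled up, as
long as the total stays around twice the number of cores.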



> - setMaxTCPClientThreads(1000)

I'm pretty sure this does not make sense, I would first go with the 
default until you see TCP/DoT connections are not processed correctly.


> And the defaults like caching requests (which doesn't work for nxdomain)
> and limiting the amount of qps per ip (which also doesn't work for
> nxdomain attacks because they use public resolvers).

When you say it doesn't work for NXDomain, I'm assuming you mean it 
doesn't solve the problem of random sub-domains attacks, not that a 
NXDomain is not properly cached/accounted?

> When we simulate a nxdomain attack (with 200k qps and 500MBit of
> traffic), we get a high load on the dnsdist server (50% CPU for dnsdist
> and a lot of interrupts and context switches).


I expect lowering the number of threads will reduce the context switches 
a lot. If you are still not getting good QPS numbers, I would suggest 
checking if disabling the rules help, to figure out the bottleneck. You 
might also want to take a look with "perf top -p " 
during the high load to see where the CPU time is spent.



> So the questions from me to you are:
> - how much qps are you able to push through dnsdist using a powerdns or
>   bind backend


It really depends on the hardware you have and the rules you are 
enabling, but it's quite common to see people pushing 400k+ QPS on a 
single DNSdist without a lot of fine tuning, and a fair amount of 
remaining head-room.


> - have I overlooked some tuning parameters, e.g. more kernel parameters
>   or some dnsdist parameters


I shared a few parameters a while ago: [1].

> - what is the best method of sending packets for a domain to a separate
>   backend? Right now we use 'addAction("", PoolAction("abuse"))', but is
>   this the least CPU intensive one? Are there better methods?


It's the best method and should be really cheap.
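
As a sketch of that routing: the domain is redacted in the thread, so
"attacked.example." and "other-victim.example." below are placeholders. The
second form, using a suffix-match node, is an alternative that may be handier
when several domains go to the same pool; it is not something the original
poster said they use.

```lua
-- A plain domain string as the selector matches the domain and everything
-- below it, and sends those queries to the "abuse" pool.
addAction("attacked.example.", PoolAction("abuse"))

-- Alternative: collect several domains in one suffix-match node.
local abused = newSuffixMatchNode()
abused:add(newDNSName("attacked.example."))
abused:add(newDNSName("other-victim.example."))
addAction(SuffixMatchNodeRule(abused), PoolAction("abuse"))

-- Everything else goes to the default pool, as in the posted configuration.
addAction(AllRule(), PoolAction("all"))
```

Rules are evaluated in order, so the domain-specific rules have to come
before the AllRule() catch-all.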

> I have seen eBPF socket filtering, but as far as I have seen that is
> for dropping unwanted packets.


Correct. You could look into enabling AF_XDP / XSK [2] but I would 
recommend checking that you really cannot get the performance you want 
with normal processing first, as AF_XDP has some rough edges.
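
For completeness, a rough sketch of the eBPF socket filtering mentioned
above, which only drops packets. It assumes a dnsdist build with eBPF
support; the map sizes, the address and the name are placeholders, not values
from this thread.

```lua
-- Create a filter and attach it to every bind (map sizes are assumptions).
local bpf = newBPFFilter({ipv4MaxItems=1024, ipv6MaxItems=1024, qnamesMaxItems=1024})
setDefaultBPFFilter(bpf)

-- Drop everything from one source address (documentation-range placeholder).
bpf:block(newCA("192.0.2.66"))

-- Drop queries for one specific name, any qtype (255); placeholder name.
bpf:blockQName(newDNSName("attacked.example."), 255)
```

Since blockQName() takes a specific name, this does not obviously help
against random sub-domains, and it cannot route traffic to another pool.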


[1]: https://mailman.powerdns.com/pipermail/dnsdist/2023-January/001271.html
[2]: https://dnsdist.org/advanced/xsk.html

Best regards,
--
Remi Gacogne
PowerDNS B.V

