Re: Request for review of performance advice

2020-07-29 Thread Niall O'Reilly
On 9 Jul 2020, at 21:25, Havard Eidnes via bind-users wrote:

> 2e#1) Make sure your UDP socket *receive* buffers are big enough.
>   If on BSD, monitor for "dropped due to full socket buffers"
>   count in "netstat -s" output, and tune accordingly.  Note that
>   this may be a symptom of mis-tuning of other parts of BIND,
>   causing excessive CPU usage, which may contribute to this
>   problem.

I'm seeing some instances of "dropped due to no socket" on my FreeBSD
systems where my resolvers run.
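
In case it helps, this is roughly how I'm watching the counter (a minimal
sketch, assuming FreeBSD's netstat; the exact counter wording may vary by
version):

    # show UDP statistics and pick out the drop counters
    netstat -s -p udp | egrep 'dropped due to (no socket|full socket buffers)'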

I'm wondering

- whether and how I can address this with tuning, and also
- whether I'm wandering out of scope for this list.

Thanks in anticipation and/or apologies.
Niall


Re: Request for review of performance advice

2020-07-10 Thread Timothe Litt
These suggestions - like most performance articles - are oriented toward
achieving the highest performance with large configurations.  E.g. "How
big can/should you go to support big loads?"

That's useful for many users.  But there are also many people who run
smaller operations, where the goal is to provide adequate (or even
exceptional) performance with a minimum footprint. When BIND is one of
many services, overall performance can be improved by minimizing BIND's
resource requirements.  This is also true in embedded applications,
where footprint matters.

So a discussion about how to optimize for the smaller cases - what do you
trade off? what knobs can one turn down, and how far? - would be a
useful part of, or complement to, the proposed article.  E.g. "How small
can/should you go when your loads are smaller?"

FWIW, a wizard - even just a spreadsheet - that encapsulates known
performance results might also be useful.  E.g. Given a processor,
number/size of zones, query rate, & type, produce a memory size, disk &
network I/O rates, and starting configuration parameters... Obviously,
this could become arbitrarily complicated, but a simple spreadsheet with
configuration (hardware & software) and performance data that's
searchable would give people a good starting point.  Especially if it's
real-world. (It can be challenging to map artificial
"performance"/stress tests done in a development/verification
environment to the real world...)  While full automation can be fun,
it's amazing how much one can get out of a spreadsheet with an autofilter. 
(For the next level, pivot tables and/or charts...)

Timothe Litt
ACM Distinguished Engineer
--
This communication may not represent the ACM or my employer's views,
if any, on the matters discussed. 

On 07-Jul-20 21:57, Victoria Risk wrote:
> A while ago we created a KB article with tips on how to improve your
> performance with our Kea dhcp server. The tips were fairly obvious to
> our developers and this was pretty successful. We would like to do
> something similar for BIND, provide a dozen or so tips for how to
> maximize your throughput with BIND. However, as usual, everything is
> more complicated with BIND.
>
> [big snip]

Re: Request for review of performance advice

2020-07-09 Thread Havard Eidnes via bind-users
> OS settings and the system environment
...
> 2e) Make sure your socket send buffers are big enough. (not
> sure if this is obsolete advice, do we need to tell people how
> to tell if their buffers are causing delays?)

2e#1) Make sure your UDP socket *receive* buffers are big enough.
  If on BSD, monitor for "dropped due to full socket buffers"
  count in "netstat -s" output, and tune accordingly.  Note that
  this may be a symptom of mis-tuning of other parts of BIND,
  causing excessive CPU usage, which may contribute to this
  problem.
  
BTW, Unbound has configuration options ("so-rcvbuf" / "so-sndbuf")
to tune these for just the name server; when I earlier looked for
something similar in BIND I could not find a corresponding option,
so I had to do system-wide tuning via sysctl, which isn't ideal, but
it solved the problem in my case.
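
For the record, the system-wide tuning was along these lines (a sketch
for FreeBSD; the values are illustrative, not recommendations):

    # inspect the default UDP receive buffer and the global cap
    sysctl net.inet.udp.recvspace kern.ipc.maxsockbuf

    # raise them (persist the settings in /etc/sysctl.conf)
    sysctl net.inet.udp.recvspace=1048576
    sysctl kern.ipc.maxsockbuf=16777216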

> named Features
> 3a) Minimize logging. Query logging is expensive (can cost you
> 20% or more of your throughput) so don't do it unless you
> are using the logs for something. Logging with dnstap is
> lower impact, but still fairly expensive.  Don't run in
> debug mode unless necessary.

3a#1) Do not configure BIND with --enable-querytrace.  It most
  probably doesn't do what you might think it does, and is a
  major drag on performance.
  
See above under the new "2e#1" for a possible symptom...

> 4b) Set an appropriate MTU for your network. Ensure that your
> network infrastructure supports EDNS and large UDP responses up
> to 4096.  Ensure that your network infrastructure allows transit
> for and reassembly of fragmented UDP packets (these will be
> large query responses if you are DNSSEC signing)

Well, isn't the major goal of DNS Flag Day 2020 to eliminate
fragmentation for various reasons (some of them security-related),
and doesn't it recommend setting the EDNS buffer size to 1232 instead
of leaving it at BIND's present default of 4096?
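
If one follows the Flag Day advice, the named.conf change is roughly
this (a sketch; 1232 is the Flag Day recommendation, not BIND's default):

    options {
        // advertise a smaller EDNS buffer so responses avoid fragmentation
        edns-udp-size 1232;
        // cap the size of UDP responses we send
        max-udp-size 1232;
    };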

Best regards,

- Håvard


Re: Request for review of performance advice

2020-07-08 Thread Chuck Aurora

On 2020-07-07 20:57, Victoria Risk wrote:

> A while ago we created a KB article with tips on how to improve your
> performance with our Kea dhcp server. The tips were fairly obvious to
> our developers and this was pretty successful. We would like to do
> something similar for BIND, provide a dozen or so tips for how to
> maximize your throughput with BIND. However, as usual, everything is
> more complicated with BIND.
>
> [big snip]
>
> Any further suggestions, corrections or warnings are very welcome.


Vicky, I'd suggest splitting these performance tips into two separate
articles: authoritative and recursive.  Lumping both together is going
to create more confusion.



Re: Request for review of performance advice

2020-07-08 Thread John Thurston


On 7/7/2020 5:57 PM, Victoria Risk wrote:
> A while ago we created a KB article with tips on how to improve your
> performance with our Kea dhcp server. The tips were fairly obvious to
> our developers and this was pretty successful. We would like to do
> something similar for BIND, provide a dozen or so tips for how to
> maximize your throughput with BIND. However, as usual, everything is
> more complicated with BIND.


This is an excellent idea.

If it comes to fruition, I ask that there be some guidance offered on when 
such optimizations are useful. I've seen places where such a guide-sheet 
was followed even though the guidelines were suited to a business with 10X 
or 100X the traffic the customer actually sees.


That is, a configuration which benefits an organization seeing 100,000 
qps may be excessively complex (or brittle) for one seeing 100 qps.


--
Do things because you should, not just because you can.

John Thurston    907-465-8591
john.thurs...@alaska.gov
Department of Administration
State of Alaska




Re: Request for review of performance advice

2020-07-07 Thread Browne, Stuart via bind-users
Just one quick one before I run off to lunch, with regard to section 2:

- Try to avoid crossing NUMA boundaries. At high throughput, the context 
switching and far memory calls kill performance.
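
For example, on Linux something like this pins named to one node (a
sketch; the binary path and node number are assumptions, adjust for
your hardware):

    # run named on node 0's CPUs, allocating memory from node 0 only
    numactl --cpunodebind=0 --membind=0 /usr/sbin/named -u named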

Stuart

From: bind-users  on behalf of Victoria Risk 

Date: Wednesday, 8 July 2020 at 11:58
To: bind-users 
Subject: Request for review of performance advice

[big snip]
Request for review of performance advice

2020-07-07 Thread Victoria Risk
A while ago we created a KB article with tips on how to improve your 
performance with our Kea dhcp server. The tips were fairly obvious to our 
developers and this was pretty successful. We would like to do something 
similar for BIND, provide a dozen or so tips for how to maximize your 
throughput with BIND. However, as usual, everything is more complicated with 
BIND.

Can those of you who care about performance, who have worked to improve your 
performance, share some of your suggestions that have the most impact?  Please 
also comment if you think any of these ideas below are stupid or dangerous. I 
have combined advice for resolvers and for authoritative servers, I hope it is 
clear which is which...

The ideas we have fall into four general categories:

System design
1a) Use a load balancer to specialize your resolvers and maximize your cache 
hit ratio.  A load balancer is traditionally designed to spread the traffic out 
evenly among a pool of servers, but it can also be used to concentrate related 
queries on one server to make its cache as hot as possible. For example, if all 
queries for domains in .info are sent to one server in a pool, there is a 
better chance that an answer will be in the cache there.

1b) If you have a large authoritative system with many servers, consider 
dedicating some machines to propagate transfers. These machines, called 
transfer servers, would not answer client queries, but just send notifies and 
process IXFR requests.
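
A rough named.conf sketch of such a transfer server (the "secondaries"
ACL name and its address are invented for illustration):

    acl secondaries { 192.0.2.0/24; };    // addresses of your secondaries

    options {
        allow-query { none; };            // answer no client queries
        allow-transfer { secondaries; };  // serve only zone transfers
        notify yes;                       // send NOTIFYs on zone changes
    };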

1c) Deploy ghost secondaries.  If you store copies of authoritative zones on 
resolvers (resolvers as undelegated secondaries), you can avoid querying those 
authoritative zones. The most obvious uses of this would be mirroring the root 
zone locally or mirroring your own authoritative zones on your resolver.
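
For the root-zone case, the configuration can be as small as this
(assuming BIND 9.14 or later, where the mirror zone type is available;
built-in defaults supply the root zone's primaries):

    zone "." {
        type mirror;    // validated, locally served copy of the root zone
    };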

We have other system design ideas that we suspect would help, but we are not 
sure, so I will wait to see if anyone suggests them.

OS settings and the system environment
2a) Run on bare metal if possible, not on virtual machines or in the cloud. 
(any idea how much difference this makes? the only reference we can cite is 
pretty out of date - 
https://indico.dns-oarc.net/event/19/contributions/234/attachments/217/411/DNS_perf_OARC_Apr_14.pdf )

2b) Consider using with-tuning-large. (https://kb.isc.org/docs/aa-01314) This 
is a compile-time option, so not something you can switch on and off during 
production. 
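
For reference, it is selected when building, along these lines (a
sketch; check ./configure --help on your version for the exact spelling):

    ./configure --with-tuning=large
    make && make install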

2c) Consider which R/W lock choice you want to use - 
https://kb.isc.org/docs/choosing-a-read-write-lock-implementation-to-use-with-named
For the highest tested query rates (> 100,000 queries per second), pthreads 
read-write locks with hyper-threading enabled seem to be the best-performing 
choice by far.

2d) Pay attention to your choice of NIC cards. We have found wide variations in 
their performance. (Can anyone suggest what specifically to look for?)

2e) Make sure your socket send buffers are big enough. (not sure if this is 
obsolete advice, do we need to tell people how to tell if their buffers are 
causing delays?)
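
On Linux, one way to check whether buffers are a problem and to raise
the caps (a sketch; the values are illustrative, not recommendations):

    # per-protocol counters; look for send/receive buffer errors
    netstat -su

    # raise the system-wide maximums (persist in /etc/sysctl.conf)
    sysctl -w net.core.wmem_max=8388608
    sysctl -w net.core.rmem_max=8388608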

2f) When the number of CPUs is very large (32 or more), the increase in UDP 
listeners may not provide any performance improvement and might actually reduce 
throughput slightly due to the overhead of the additional structures and tasks. 
We suggest trying different values of -U to find the optimal one for your 
production environment.
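
For example (the listener count here is purely illustrative):

    # start named with 8 UDP listeners per interface
    named -U 8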


named Features
3a) Minimize logging. Query logging is expensive (can cost you 20% or more of 
your throughput) so don’t do it unless you are using the logs for something. 
Logging with dnstap is lower impact, but still fairly expensive.  Don’t run in 
debug mode unless necessary. 
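
Query logging can be toggled at runtime, which makes it easy to measure
its cost on your own traffic:

    rndc querylog off   # stop query logging
    rndc querylog on    # turn it back on when the logs are needed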

3b) Use the named.conf option minimal-responses yes; to reduce the amount of 
work that named needs to do to assemble the query response, as well as the 
amount of outbound traffic.
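
In named.conf:

    options {
        minimal-responses yes;   // omit optional authority/additional records
    };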

3c) Disable synth-from-dnssec. While this seemed like a good idea, it turns 
out not to improve performance in practice.

3d) Tune your zone transfers. (https://kb.isc.org/docs/aa-00726)
When tuning the behavior of the primary, there are several factors that you can 
control:

- The rate of notifications of changes to secondary servers (serial-query-rate 
and notify-delay)

- Limits on concurrent zone transfers (transfers-out, tcp-clients, 
tcp-listen-queue, reserved-sockets)

- Efficiency/management options (max-transfer-time-out, max-transfer-idle-out, 
transfer-format)

The most important options to focus on are transfers-out, serial-query-rate, 
tcp-clients and tcp-listen-queue.
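
A starting-point sketch (the values shown are illustrative, not
recommendations):

    options {
        transfers-out 20;       // concurrent outbound transfers, all zones
        serial-query-rate 20;   // rate limit for NOTIFY/SOA refresh queries
        tcp-clients 150;        // simultaneous TCP client connections
        tcp-listen-queue 10;    // TCP accept-queue depth
    };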

4e) If you use RPZ, consider using qname-wait-recurse. We have had issues with 
RPZ transfers impacting query performance in resolvers. In general, more 
smaller RPZ zones
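
In named.conf the option attaches to the response-policy statement (the
zone name here is hypothetical, and the zone itself must also be defined):

    options {
        response-policy {
            zone "rpz.example";
        } qname-wait-recurse no;  // apply QNAME rules without waiting on recursion
    };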