Hello Stuart,

A few simple ideas for your tests:

- have you inspected the per-thread CPU usage? Are some of the threads overloaded?
- have you tried pulling statistics from the BIND server via its XML or JSON interface? It may give you another view of the errors. (See the sketch after this list.)
- I may have missed the connection count you use for testing - can you post it? Also, how many entries do you have in your database? Can you share your named.conf (without any compromising entries)?
- what is your network environment? How many switches/routers sit between your simulator and the BIND server host?
- is BIND the only process running on the tested server?
- what CPUs is the BIND server running on?
- is numad running, and when trying taskset, did you select CPUs on the same processor? What does numastat show during the test?
- how many UDP sockets are in use during your test?
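For the first two points and the socket count, here is a minimal sketch of what I mean. It assumes BIND 9.10 or later, that the process is called named, and that 127.0.0.1 port 8053 is free on the test host - the address and port are only placeholders, adjust them to your environment:

  # per-thread CPU usage of named during a run
  top -H -p $(pidof named)
  pidstat -t -p $(pidof named) 1

  # at the top level of named.conf: expose the statistics channel
  statistics-channels {
      inet 127.0.0.1 port 8053 allow { 127.0.0.1; };
  };

  # then pull the counters while the test is running
  curl http://127.0.0.1:8053/json/v1     # JSON (BIND 9.10+)
  curl http://127.0.0.1:8053/xml/v3      # XML

  # number of UDP sockets named has open
  ss -u -a -n -p | grep -c named

The sockstat section of that statistics output should include UDP send/receive error counters, which may line up with the errors you are seeing in the named log.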
Curious for the responses.

Lukas

Browne, Stuart <stuart.bro...@neustar.biz> writes:

> Cheers Matthew.
>
> 1) Not seeing that error, seeing this one instead:
>
> 01-Jun-2017 01:46:27.952 client: warning: client 192.168.0.23#38125
> (x41fe848-f3d1-4eec-967e-039d075ee864.perf1000): error sending response:
> would block
>
> Only seeing a few of them per run (out of ~70 million requests).
>
> Whilst I can see where this is raised in the BIND code (lib/isc/unix/socket.c
> in doio_send), I don't understand the underlying reason for it being set
> (errno == EWOULDBLOCK || errno == EAGAIN).
>
> I've not bumped wmem/rmem up as much as the link suggests (only to 16MB, not
> 40MB), but no real difference after the tweaks. I did another run with
> stupidly large core.{rmem,wmem}_{max,default} (64MB); this actually degraded
> performance a bit, so over-tuning isn't good either. Need to figure out a
> good balance here.
>
> I'd love to figure out what the math here should be. 'X number of
> simultaneous connections multiplied by Y socket memory size = rmem' or some
> such.
>
> 2) I am still seeing some UDP receive errors and receive buffer errors; about
> 1.3% of received packets.
>
> From a 'netstat' point of view, I see:
>
> Active Internet connections (servers and established)
> Proto  Recv-Q  Send-Q  Local Address       Foreign Address   State
> udp    382976  17664   192.168.1.21:53     0.0.0.0:*
>
> The numbers in the receive queue stay in the 200-300k range whilst the
> send queue floats around the 20-40k range. wmem is already bumped.
>
> 3) Huh, didn't know about this one. Bumped up the backlog; small increase in
> throughput for my tests. Still need to figure out how to read softnet_stat.
> More google-fu in my future.
>
> After a reboot and the wmem/rmem/backlog increases, there are no longer any
> non-zero values in the 2nd column.
>
> 4) Yes, max_dgram_qlen is already set to 512.
>
> 5) Oo! new tool! :)
>
> --
> ...
> 11 drops at location 0xffffffff815df171
> 854 drops at location 0xffffffff815e1c64
> 12 drops at location 0xffffffff815df171
> 822 drops at location 0xffffffff815e1c64
> ...
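On the two open questions quoted above - the buffer-sizing "math" and how to read softnet_stat - a rough sketch. The sizing rule is only an assumption on my side, not something from the kernel or BIND documentation: the kernel charges each queued datagram against rmem/wmem at its skb "truesize" (payload plus metadata, typically a couple of KB for a DNS-sized packet), so the receive buffer needs roughly "peak number of unprocessed queries x a few KB"; the send-side "would block" warnings usually mean the same condition on the socket send buffer. As for the counters:

  # /proc/net/softnet_stat: one row per CPU, hex fields; field 1 = packets
  # processed, field 2 = packets dropped because the per-CPU backlog
  # (net.core.netdev_max_backlog) was full. With GNU awk:
  awk '{ printf "cpu%d dropped=%d\n", NR-1, strtonum("0x" $2) }' /proc/net/softnet_stat

  # dropwatch can resolve the raw addresses to kernel symbols:
  dropwatch -l kas
  # then type 'start' at the dropwatch> prompt; drop sites are reported by
  # symbol name, which shows which queue is actually overflowing.

A worked example of the sizing assumption: 10,000 queries queued at ~2 KB truesize each is on the order of 20 MB, so 16 MB could still overflow at your request rate, while very large buffers just let the queues (and latency) grow, which may be the degradation you saw at 64 MB.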