Re: BIND 9.16.25 "file descriptor exceeds limit" messages
On 01. 02. 22 15:43, Anand Buddhdev wrote:
> Hi Petr,
>
> On 01/02/2022 15:33, Petr Špaček wrote:
>> As you correctly noticed, the log message "adjusted limit on open files from 4096 to 1048576" already shows that BIND adjusted the OS-level file descriptor limit.
>>
>> The only way out is what Tony wrote in another thread: Add "-S " parameter to bump the built-in limit of 21000 FDs. This is BIND's limit as opposed to the OS limit, so systemd-level settings cannot raise it.
>
> Thanks. I will try this out. The option does come with a warning though.
>
>> ... or migrate to 9.18.0 which does not have this built-in limit anymore.
>
> I have packages ready. But I don't feel comfortable deploying this version in production. When 9.16 came out, it was branded as "stable" but it took several updates before it actually worked reliably for us. Version 9.18 has a lot of new code, and I am sure several things will be glitchy, so I will wait a while and see how it develops before considering it for any production servers here.

That's understandable. We can only hope that not everyone will delay upgrading :-)

On a more serious note, we have significantly expanded load testing with UDP traffic during the 9.17 development cycle, so hopefully, 9.18.0 has fewer rough edges than 9.16.0 had. I apologize for that bad experience. Since then, we have learned our lesson and have been working on test improvements.

--
Petr Špaček

--
Visit https://lists.isc.org/mailman/listinfo/bind-users to unsubscribe from this list

ISC funds the development of this software with paid support subscriptions. Contact us at https://www.isc.org/contact/ for more information.

bind-users mailing list
bind-users@lists.isc.org
https://lists.isc.org/mailman/listinfo/bind-users
Re: BIND 9.16.25 "file descriptor exceeds limit" messages
Hi Petr,

On 01/02/2022 15:33, Petr Špaček wrote:
> As you correctly noticed, the log message "adjusted limit on open files from 4096 to 1048576" already shows that BIND adjusted the OS-level file descriptor limit.
>
> The only way out is what Tony wrote in another thread: Add "-S " parameter to bump the built-in limit of 21000 FDs. This is BIND's limit as opposed to the OS limit, so systemd-level settings cannot raise it.

Thanks. I will try this out. The option does come with a warning though.

> ... or migrate to 9.18.0 which does not have this built-in limit anymore.

I have packages ready. But I don't feel comfortable deploying this version in production. When 9.16 came out, it was branded as "stable" but it took several updates before it actually worked reliably for us. Version 9.18 has a lot of new code, and I am sure several things will be glitchy, so I will wait a while and see how it develops before considering it for any production servers here.

Regards,
Anand
Re: BIND 9.16.25 "file descriptor exceeds limit" messages
On 01. 02. 22 13:30, Anand Buddhdev wrote:
> Hi Ondrej,
>
> Do you recommend setting LimitNOFILE=1048576 in the systemd unit file for BIND?

I'm not Ondrej, but let me try: No, that would be redundant.

As you correctly noticed, the log message "adjusted limit on open files from 4096 to 1048576" already shows that BIND adjusted the OS-level file descriptor limit.

The only way out is what Tony wrote in another thread: Add "-S " parameter to bump the built-in limit of 21000 FDs. This is BIND's limit as opposed to the OS limit, so systemd-level settings cannot raise it.

... or migrate to 9.18.0 which does not have this built-in limit anymore.

> On 28/01/2022 15:03, Anand Buddhdev wrote:
>> Hi Ondrej,
>>
>> It is 1024. I see named logging this:
>>
>> adjusted limit on open files from 4096 to 1048576
>>
>> I thought there was no need to set LimitNOFILE=1048576 in the systemd unit file. Am I mistaken?

--
Petr Špaček @ Internet Systems Consortium
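To make the distinction between the two limits concrete, here is a minimal Python sketch. It models the number of sockets named 9.16 can use as the smaller of two values: the OS soft limit on open files (what `ulimit -n` reports and what systemd's `LimitNOFILE=` sets) and BIND's own built-in cap (21000 by default, per the logs in this thread). The min() model and the function names are illustrative assumptions, not BIND's actual code.

```python
import resource

# Built-in cap logged by BIND 9.16 in this thread ("using up to 21000 sockets").
BIND_916_DEFAULT_MAXSOCKS = 21000


def os_soft_nofile_limit() -> int:
    """Return this process's soft RLIMIT_NOFILE (the value `ulimit -n` shows)."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft


def effective_socket_cap(os_limit: int,
                         builtin_cap: int = BIND_916_DEFAULT_MAXSOCKS) -> int:
    """Simplified model (assumption): the usable socket count is bounded by
    whichever is smaller, the OS limit or the application's own cap."""
    if os_limit == resource.RLIM_INFINITY:
        return builtin_cap
    return min(os_limit, builtin_cap)


if __name__ == "__main__":
    # named already raised the OS limit to 1048576, yet the cap stays at 21000.
    print(effective_socket_cap(1048576))
    print(effective_socket_cap(os_soft_nofile_limit()))
```

Raising `LimitNOFILE` only changes the first argument; once the OS limit is already 1048576, the result stays at 21000, which is why the remaining options in this thread are `-S` or 9.18.0.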
Re: BIND 9.16.25 "file descriptor exceeds limit" messages
Hi Ondrej,

Do you recommend setting LimitNOFILE=1048576 in the systemd unit file for BIND?

Regards,
Anand

On 28/01/2022 15:03, Anand Buddhdev wrote:
> Hi Ondrej,
>
> It is 1024. I see named logging this:
>
> adjusted limit on open files from 4096 to 1048576
>
> I thought there was no need to set LimitNOFILE=1048576 in the systemd unit file. Am I mistaken?
Re: BIND 9.16.25 "file descriptor exceeds limit" messages
On 28. 01. 22 16:28, Tony Finch wrote:
> Anand Buddhdev wrote:
>> The server has many IP addresses. In named.conf, there are 129 IPv6 addresses in the "listen-on-v6" option and 128 IPv4 addresses in the "listen-on" option. The server begins running, but then repeatedly emits this log:
>>
>> general: error: socket: file descriptor exceeds limit (46474/21000)
>
> Hmm, (128+129)*88*2 == 45232, (2 == UDP + TCP) so the big number looks plausible.
>
> The 21000 limit comes from a hardcoded value for ISC_SOCKET_MAXSOCKETS.
>
> You can adjust -U (number of listeners) on the command line to avoid hitting the fixed MAXSOCKETS limit, and leave -n (max sockets) unset, at its default.
>
> You can also set ISC_SOCKET_MAXSOCKETS at build time, if you can work out how to wrangle the build system :-)

Or go for 9.18.0 which does not have this limit anymore.

--
Petr Špaček @ Internet Systems Consortium
Re: BIND 9.16.25 "file descriptor exceeds limit" messages
Anand Buddhdev wrote:
>
> The server has many IP addresses. In named.conf, there are 129 IPv6 addresses in the "listen-on-v6" option and 128 IPv4 addresses in the "listen-on" option.
> The server begins running, but then repeatedly emits this log:
>
> general: error: socket: file descriptor exceeds limit (46474/21000)

Hmm, (128+129)*88*2 == 45232, (2 == UDP + TCP) so the big number looks plausible.

The 21000 limit comes from a hardcoded value for ISC_SOCKET_MAXSOCKETS.

You can adjust -U (number of listeners) on the command line to avoid hitting the fixed MAXSOCKETS limit, and leave -n (max sockets) unset, at its default.

You can also set ISC_SOCKET_MAXSOCKETS at build time, if you can work out how to wrangle the build system :-)

Tony.
--
f.anthony.n.finch  https://dotat.at/
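Tony's arithmetic can be written out as a short Python sketch. It counts listener sockets only, so it is an approximation and will not match the logged descriptor number (46474) exactly; the `-U 16` row reuses the value Anand reported, purely as an illustration.

```python
# Default built-in socket cap in BIND 9.16 (ISC_SOCKET_MAXSOCKETS), as logged
# in this thread ("using up to 21000 sockets").
MAXSOCKETS = 21000


def estimated_listener_sockets(ipv4_addrs: int, ipv6_addrs: int,
                               listeners_per_interface: int,
                               protocols: int = 2) -> int:
    """Tony Finch's estimate: addresses x listeners per interface x protocols
    (2 = UDP + TCP)."""
    return (ipv4_addrs + ipv6_addrs) * listeners_per_interface * protocols


if __name__ == "__main__":
    for listeners in (88, 16):  # 88 = value logged on this host; 16 = "-U 16"
        need = estimated_listener_sockets(128, 129, listeners)
        verdict = "exceeds" if need > MAXSOCKETS else "fits under"
        print(f"-U {listeners}: ~{need} sockets, {verdict} {MAXSOCKETS}")
```

Run as a script, this prints `-U 88: ~45232 sockets, exceeds 21000` and `-U 16: ~8224 sockets, fits under 21000`, which is consistent with the errors disappearing when Anand started named with `-U 16`.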
Re: BIND 9.16.25 "file descriptor exceeds limit" messages
Hi Ondrej,

It is 1024. I see named logging this:

adjusted limit on open files from 4096 to 1048576

I thought there was no need to set LimitNOFILE=1048576 in the systemd unit file. Am I mistaken?

Regards,
Anand

On 28/01/2022 14:47, Ondřej Surý wrote:
> Hi Anand,
>
> what is your open files limit before starting the server? (ulimit -n)
Re: BIND 9.16.25 "file descriptor exceeds limit" messages
Hi Anand,

what is your open files limit before starting the server? (ulimit -n)

Ondrej
--
Ondřej Surý (He/Him)
ond...@isc.org

My working hours and your working hours may be different. Please do not feel obligated to reply outside your normal working hours.

> On 28. 1. 2022, at 14:33, Anand Buddhdev wrote:
>
> I just tried to start BIND 9.16.25 on a server with 88 vCPUs, running CentOS 7. Systemd is used to start BIND, and it emits the following:
>
> general: notice: starting BIND 9.16.25 (Extended Support Version)
> general: notice: running on Linux x86_64 3.10.0-1160.24.1.el7.x86_64 #1 SMP Thu Apr 8 19:51:47 UTC 2021
> general: notice: built with '--build=x86_64-redhat-linux-gnu' '--host=x86_64-redhat-linux-gnu' '--program-prefix=' '--disable-dependency-tracking' '--prefix=/usr' '--exec-prefix=/usr' '--bindir=/usr/bin' '--sbindir=/usr/sbin' '--sysconfdir=/etc' '--datadir=/usr/share' '--includedir=/usr/include' '--libdir=/usr/lib64' '--libexecdir=/usr/libexec' '--localstatedir=/var' '--sharedstatedir=/var/lib' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--sysconfdir=/etc/named' '--disable-static' '--with-libtool' '--with-pic' '--without-python' 'build_alias=x86_64-redhat-linux-gnu' 'host_alias=x86_64-redhat-linux-gnu' 'CFLAGS=-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic' 'LDFLAGS=-Wl,-z,relro ' 'PKG_CONFIG_PATH=:/usr/lib64/pkgconfig:/usr/share/pkgconfig'
> general: notice: running as: named -f -L /var/log/named/named.log -u named
> general: notice: compiled by GCC 4.8.5 20150623 (Red Hat 4.8.5-44)
> general: notice: compiled with OpenSSL version: OpenSSL 1.0.2k-fips 26 Jan 2017
> general: notice: linked to OpenSSL version: OpenSSL 1.0.2k-fips 26 Jan 2017
> general: notice: compiled with zlib version: 1.2.7
> general: notice: linked to zlib version: 1.2.7
> general: notice: adjusted limit on open files from 4096 to 1048576
> general: info: found 88 CPUs, using 88 worker threads
> general: info: using 88 UDP listeners per interface
> general: info: using up to 21000 sockets
> network: info: listening on IPv4 interface lo, 127.0.0.1#53
> ...
> network: info: listening on IPv6 interface lo, ::1#53
> ...
> general: info: sizing zone task pool based on 5486 zones
> ...
> general: notice: command channel listening on 127.0.0.1#953
> general: info: configuring command channel from '/etc/named/rndc.key'
> general: error: socket: file descriptor exceeds limit (46474/21000)
> general: notice: couldn't add command channel ::1#953: not enough free resources
> ...
>
> The server has many IP addresses. In named.conf, there are 129 IPv6 addresses in the "listen-on-v6" option and 128 IPv4 addresses in the "listen-on" option. The server begins running, but then repeatedly emits this log:
>
> general: error: socket: file descriptor exceeds limit (46474/21000)
>
> If I start named with "-n 8 -U 16", then I don't see these messages. Does ISC have any guidance on running BIND on systems with lots of processors, and how to tune the values of "-n" and "-U"? The values I'm using now (8 and 16 respectively) were determined by trial and error for a system with 32 vCPUs.
>
> Regards,
> Anand
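Ondrej's question matters because the limit a login shell reports with `ulimit -n` is not necessarily the one a service inherits (in this thread the shell shows 1024 while named logs a starting value of 4096). On Linux, the limit actually applied to a running process can be read from `/proc/<pid>/limits`. A minimal Python sketch, assuming a Linux procfs; the helper name is hypothetical and you supply the PID of named yourself:

```python
from pathlib import Path
from typing import Optional, Tuple


def max_open_files(pid: str = "self") -> Optional[Tuple[str, str]]:
    """Return the (soft, hard) "Max open files" values for a process, read
    from /proc/<pid>/limits (Linux only), or None if unavailable.
    Values are strings because the kernel may print "unlimited"."""
    path = Path("/proc") / pid / "limits"
    try:
        text = path.read_text()
    except OSError:
        return None
    prefix = "Max open files"
    for line in text.splitlines():
        if line.startswith(prefix):
            # Remaining columns: soft limit, hard limit, units ("files").
            soft, hard = line[len(prefix):].split()[:2]
            return soft, hard
    return None


if __name__ == "__main__":
    print("this process:", max_open_files())
    # For the daemon, pass its PID as a string instead of "self".
```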
BIND 9.16.25 "file descriptor exceeds limit" messages
I just tried to start BIND 9.16.25 on a server with 88 vCPUs, running CentOS 7. Systemd is used to start BIND, and it emits the following:

general: notice: starting BIND 9.16.25 (Extended Support Version)
general: notice: running on Linux x86_64 3.10.0-1160.24.1.el7.x86_64 #1 SMP Thu Apr 8 19:51:47 UTC 2021
general: notice: built with '--build=x86_64-redhat-linux-gnu' '--host=x86_64-redhat-linux-gnu' '--program-prefix=' '--disable-dependency-tracking' '--prefix=/usr' '--exec-prefix=/usr' '--bindir=/usr/bin' '--sbindir=/usr/sbin' '--sysconfdir=/etc' '--datadir=/usr/share' '--includedir=/usr/include' '--libdir=/usr/lib64' '--libexecdir=/usr/libexec' '--localstatedir=/var' '--sharedstatedir=/var/lib' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--sysconfdir=/etc/named' '--disable-static' '--with-libtool' '--with-pic' '--without-python' 'build_alias=x86_64-redhat-linux-gnu' 'host_alias=x86_64-redhat-linux-gnu' 'CFLAGS=-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic' 'LDFLAGS=-Wl,-z,relro ' 'PKG_CONFIG_PATH=:/usr/lib64/pkgconfig:/usr/share/pkgconfig'
general: notice: running as: named -f -L /var/log/named/named.log -u named
general: notice: compiled by GCC 4.8.5 20150623 (Red Hat 4.8.5-44)
general: notice: compiled with OpenSSL version: OpenSSL 1.0.2k-fips 26 Jan 2017
general: notice: linked to OpenSSL version: OpenSSL 1.0.2k-fips 26 Jan 2017
general: notice: compiled with zlib version: 1.2.7
general: notice: linked to zlib version: 1.2.7
general: notice: adjusted limit on open files from 4096 to 1048576
general: info: found 88 CPUs, using 88 worker threads
general: info: using 88 UDP listeners per interface
general: info: using up to 21000 sockets
network: info: listening on IPv4 interface lo, 127.0.0.1#53
...
network: info: listening on IPv6 interface lo, ::1#53
...
general: info: sizing zone task pool based on 5486 zones
...
general: notice: command channel listening on 127.0.0.1#953
general: info: configuring command channel from '/etc/named/rndc.key'
general: error: socket: file descriptor exceeds limit (46474/21000)
general: notice: couldn't add command channel ::1#953: not enough free resources
...

The server has many IP addresses. In named.conf, there are 129 IPv6 addresses in the "listen-on-v6" option and 128 IPv4 addresses in the "listen-on" option. The server begins running, but then repeatedly emits this log:

general: error: socket: file descriptor exceeds limit (46474/21000)

If I start named with "-n 8 -U 16", then I don't see these messages. Does ISC have any guidance on running BIND on systems with lots of processors, and how to tune the values of "-n" and "-U"? The values I'm using now (8 and 16 respectively) were determined by trial and error for a system with 32 vCPUs.

Regards,
Anand
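No ISC guidance on `-n`/`-U` is given in this thread, but Tony Finch's estimate in it (addresses × listeners × 2 for UDP and TCP) can be turned around to find the largest `-U` that stays under the 21000 cap. A minimal Python sketch; the function name is hypothetical, the 1000-socket `reserve` for everything else named opens (control channel, outgoing queries, transfers) is an arbitrary headroom figure rather than a BIND constant, and the sketch says nothing about `-n`.

```python
def max_listeners_per_interface(addresses: int, socket_cap: int = 21000,
                                protocols: int = 2, reserve: int = 1000) -> int:
    """Largest -U value for which addresses x listeners x protocols, plus a
    hypothetical reserve for other sockets, still fits in socket_cap."""
    budget = socket_cap - reserve
    per_listener = addresses * protocols
    if budget <= 0 or per_listener <= 0:
        return 0
    return budget // per_listener


if __name__ == "__main__":
    addrs = 128 + 129  # "listen-on" + "listen-on-v6" addresses in this post
    print(max_listeners_per_interface(addrs))  # 38
```

The value of 16 found by trial and error sits well inside that bound; whether a larger `-U` actually improves throughput is a separate question the arithmetic cannot answer.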
Re: file descriptor exceeds limit
On 6/19/15, 4:07 PM, "bind-users-boun...@lists.isc.org on behalf of /dev/rob0" <bind-users-boun...@lists.isc.org on behalf of r...@gmx.co.uk> wrote:

>On Fri, Jun 19, 2015 at 02:55:23PM -0500, I wrote:
>> On Thu, Jun 18, 2015 at 11:11:16PM +, Mike Hoskins (michoski) wrote:
>> snip
>> Note that connection tracking can be a problem upstream as well, for the same reasons as described in the article. I would still turn off conntrack for UDP DNS upstream, unless you're using DNAT (yuck.)
>
>Oh ... hahaha ... I missed the @cisco.com, so I don't suppose you're using Linux on your upstream routers. :) The same idea applies regardless of implementation, of course.

Quite alright... In past lives yes, and perhaps even internally at times (more often OpenBSD and pf)...though I won't admit that. ;-D

Regardless, all input is welcome. I'll check out the KB article. I have sat for hours with the network team making sure their gear isn't touching my DNS packets in any perverted ways, but it's always good to triple check.

Thanks!
Re: file descriptor exceeds limit
On Fri, Jun 19, 2015 at 02:55:23PM -0500, I wrote:
> On Thu, Jun 18, 2015 at 11:11:16PM +, Mike Hoskins (michoski) wrote:
> snip
> Note that connection tracking can be a problem upstream as well, for the same reasons as described in the article. I would still turn off conntrack for UDP DNS upstream, unless you're using DNAT (yuck.)

Oh ... hahaha ... I missed the @cisco.com, so I don't suppose you're using Linux on your upstream routers. :) The same idea applies regardless of implementation, of course.

--
http://rob0.nodns4.us/
Offlist GMX mail is seen only if /dev/rob0 is in the Subject:
Re: file descriptor exceeds limit
On 19.06.2015 at 18:44, Mike Hoskins (michoski) wrote:
> I suppose the only way to avoid any intermediate firewalls would be to place everything you run on a LAN segment hanging directly off your router/Internet drop with host based firewalls

well, if the router is from Cisco and has NAT enabled, there are DNS ALGs breaking zone transfers in several ways

been there, done that, until we forced the ISP to never ever ship a default Cisco device to us
Re: file descriptor exceeds limit
On Thu, Jun 18, 2015 at 11:11:16PM +, Mike Hoskins (michoski) wrote:
> On 6/18/15, 7:09 PM, Stuart Browne <stuart.bro...@bomboratech.com.au> wrote:
>> Just wondering. You mention you're using RHEL6; are you also getting messages in 'dmesg' about connection tracking tables being full? You may need some 'NOTRACK' rules in your iptables.
>
> Just following along, for the record...
>
> On our side, iptables is completely disabled. We do that sort of thing upstream on dedicated firewalls.

There is a Knowledge Base article about this:
https://kb.isc.org/article/AA-01183/

Note that connection tracking can be a problem upstream as well, for the same reasons as described in the article. I would still turn off conntrack for UDP DNS upstream, unless you're using DNAT (yuck.)

> Just now getting time to reply to Cathy...more detail on that there.

--
http://rob0.nodns4.us/
Offlist GMX mail is seen only if /dev/rob0 is in the Subject:
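For the conntrack question, one way to see how close a Linux host is to a full connection tracking table is to compare the kernel's current entry count with its maximum. A minimal Python sketch, assuming the `nf_conntrack` module's sysctl files under `/proc/sys/net/netfilter/` (older kernels may expose different paths); when the table does fill, the kernel typically logs "nf_conntrack: table full, dropping packet" in `dmesg`. The helper name is hypothetical.

```python
from pathlib import Path
from typing import Optional

# Linux netfilter sysctl files (present only while nf_conntrack is loaded).
COUNT_PATH = Path("/proc/sys/net/netfilter/nf_conntrack_count")
MAX_PATH = Path("/proc/sys/net/netfilter/nf_conntrack_max")


def conntrack_utilization(count_path: Path = COUNT_PATH,
                          max_path: Path = MAX_PATH) -> Optional[float]:
    """Return the fraction of the conntrack table in use, or None if the
    counters are unavailable (e.g. conntrack not loaded, as on a host
    where iptables is disabled)."""
    try:
        count = int(count_path.read_text().strip())
        maximum = int(max_path.read_text().strip())
    except (OSError, ValueError):
        return None
    if maximum <= 0:
        return None
    return count / maximum


if __name__ == "__main__":
    usage = conntrack_utilization()
    if usage is None:
        print("conntrack counters not available")
    else:
        print(f"conntrack table {usage:.1%} full")
```

This only covers the local host; as noted above, upstream firewalls track connections too and need to be checked with their own tools.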
Re: file descriptor exceeds limit
On 6/19/15, 1:16 PM, "bind-users-boun...@lists.isc.org on behalf of Reindl Harald" <bind-users-boun...@lists.isc.org on behalf of h.rei...@thelounge.net> wrote:

>On 19.06.2015 at 18:44, Mike Hoskins (michoski) wrote:
>> I suppose the only way to avoid any intermediate firewalls would be to place everything you run on a LAN segment hanging directly off your router/Internet drop with host based firewalls
>
>well, if the router is from Cisco and has NAT enabled, there are DNS ALGs breaking zone transfers in several ways
>
>been there, done that, until we forced the ISP to never ever ship a default Cisco device to us

Over the years I've learned that trusting defaults is rarely sane, regardless of vendor. Having been involved in many discussions related to this sort of thing...I've sadly also learned that, much like BCP38, things which seem simple to fix from the outside often aren't. :-)
Re: file descriptor exceeds limit
On 6/18/15, 7:09 PM, Stuart Browne <stuart.bro...@bomboratech.com.au> wrote:

>Just wondering. You mention you're using RHEL6; are you also getting messages in 'dmesg' about connection tracking tables being full? You may need some 'NOTRACK' rules in your iptables.

Just following along, for the record...

On our side, iptables is completely disabled. We do that sort of thing upstream on dedicated firewalls.

Just now getting time to reply to Cathy...more detail on that there.
Re: file descriptor exceeds limit
Inline...

On 6/18/15, 9:22 AM, Cathy Almond <cat...@isc.org> wrote:

>On 18/06/2015 12:00, Matus UHLAR - fantomas wrote:
>> On 17.06.15 22:39, Shawn Zhou wrote:
>>> BIND on my resolvers reaches the max open file limit and I am getting lots of SERVFAILs
>>> http://pastebin.com/SxRsHLff
>>> After I increased the max-socks (-S 8192) to 8192, I no longer saw the file limit error in the log; however, I am still getting many SERVFAILs.
>>
>> no other errors?
>>
>>> Our resolvers were doing about 15k queries per second when this was happening and those were legit traffic. I am aware that I am setting recursive clients to a very high number. Those resolvers are running on 12-core CPU and 24G RAM hardware. CPU utilization was at about 20% and plenty of RAM left. I am wondering if I've reached the limit of BIND for the amount of recursive queries it can serve. Any other tunings I should try?
>>
>> maybe changing number of recursive-clients, max-clients-per-query.
>>
>> Does EDNS work for you? EDNS problems often result in an increased number of TCP queries, which slows down resolution ...
>>
>>> By the way, the resolvers are running RHEL 6.x.
>>
>> precise BIND version would help a bit more... seems RH6.6 contains 9.8.2 but that may be different for older RH6 versions.
>
>Unless you're running a build with --with-tuning=large (for which there are a number of caveats around the capacity of the machine etc..), then you don't really want to have a backlog of recursive clients that exceeds 3000-3500. If you're getting that many in your backlog, then as already highlighted to you, there is Something Wrong going on.

We're running --with-tuning=large, but I think we are OK (128GB RAM, 32 cores). If there are other caveats to be aware of, please share.

For years I kept recursive clients conservatively set (based on some of your docs, and community comments).
I finally raised it much higher just to see what would happen (after having to repeatedly explain why blindly increasing that number wasn't a good thing), and it had no effect one way or another. Still got the servfails.

We are in a somewhat unique situation, because we have batch type jobs generating rules/etc which often purposefully crawl the bad parts of the 'Net and in turn generate DNS requests for things which legitimately return servfail. However, we were getting increasingly consistent complaints from users about seeing servfails where they weren't expected. The biggest thing which helped for us was increasing DISC_SOCKET_MAXEVENTS. We're still digging to see if the remaining servfail reports are genuinely something we can tune around, or a symptom of the use case.

>You're probably running into other resource limits that will be what are causing the SERVFAIL responses you're still seeing despite increasing the maximum number of sockets that named can use. I would tune down the limit to 3000 and allow named to drop the oldest outstanding client queries when new ones need to be processed.

I'm going to crank this back down in our environments.

>There is another logging category you can use (query-errors) that can tell you more, but it's probably not worth it in this instance.
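Cathy's suggestion (cap the recursive backlog at around 3000 and let named drop the oldest outstanding queries when new ones arrive) can be pictured with a toy FIFO model in Python. This illustrates the policy as described in the message, not BIND's implementation; in named itself the cap is set with the `recursive-clients` option, and the class name here is hypothetical.

```python
from collections import deque


class RecursionBacklog:
    """Toy model (not BIND code) of a capped backlog of outstanding
    recursive client queries: when the cap is reached, the oldest pending
    query is dropped to make room for the newest one."""

    def __init__(self, limit: int = 3000):
        self.limit = limit
        self.pending = deque()
        self.dropped = []

    def add(self, query_id: str) -> None:
        if len(self.pending) >= self.limit:
            self.dropped.append(self.pending.popleft())  # discard the oldest
        self.pending.append(query_id)

    def complete(self, query_id: str) -> None:
        self.pending.remove(query_id)  # answered queries leave the backlog


if __name__ == "__main__":
    backlog = RecursionBacklog(limit=3)
    for q in ["q1", "q2", "q3", "q4"]:
        backlog.add(q)
    print(list(backlog.pending), backlog.dropped)  # ['q2', 'q3', 'q4'] ['q1']
```

The design point is that a query stuck waiting on an unresponsive authoritative server is the one sacrificed, rather than a fresh query that may well succeed.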
>And I have another suggestion for what might be causing your backlog (apart from problems in the network path between your servers and the Internet authoritative servers), for which we have some soon-to-be-released new mitigation features (in 9.10.3):
>
>https://kb.isc.org/article/AA-01178
>
>(this will be updated to reflect the features we will actually include in the upcoming release - but they're essentially going to be fetches-per-server and fetches-per-zone along with improved logging/stats for both of those)
>
>There's going to be a webinar about both the problem and the mitigations on July 8th:
>https://www.facebook.com/events/100311766979499/
>http://goo.gl/Z8idQf

Looking forward to this. We've been sticking to 9.9.x (currently running 9.9.7) as an ESV release, but maybe 9.10 makes sense. Not sure how the community feels about that?

For the record I've spent a lot of time with our network team looking at firewall logs, getting packet traces, etc. and not found any smoking guns. We have a perhaps not so unique setup where the caches are in a DMZ, so clients talk through a firewall, and the DNS servers talk through a firewall. I've identified and fixed a number of issues along the way...enumerating here in case it helps anyone else.

The internal firewall was oversubscribed, and at peak times would reset connections, causing clients to retry, which quickly wound up recursive clients. Replaced those firewalls, and that specific behavior got a lot better.

The external firewall was sharing a PAT for all caches, which eventually exhausted 65k ports. Can't drop these directly on the 'Net for security reasons, but now have 1-to-1 NAT per cache and haven't seen this exact behavior since.

We do still routinely see that at least some of these also don't resolve manually from other
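The PAT exhaustion described above can be estimated with Little's law: the number of NAT mappings in use is roughly the rate of new outbound flows times how long each mapping is held. A Python sketch under stated assumptions: each upstream query uses its own randomized source port, the firewall keeps every UDP mapping for a fixed timeout, the usable range per public IP is ports 1024-65535, and the example figures (4 caches, 1000 outbound queries/s each, 30 s timeout) are hypothetical.

```python
# Usable source ports per public IP on a PAT device (assumption: 1024-65535).
PORTS_PER_PUBLIC_IP = 65535 - 1024 + 1


def nat_ports_in_use(outbound_qps: float, mapping_timeout_s: float) -> float:
    """Little's law: concurrent mappings ~= arrival rate x holding time.
    Assumes one fresh source port per upstream query and a fixed UDP
    mapping timeout on the firewall."""
    return outbound_qps * mapping_timeout_s


if __name__ == "__main__":
    # Hypothetical figures for illustration only.
    caches, qps_per_cache, timeout = 4, 1000.0, 30.0
    shared = nat_ports_in_use(caches * qps_per_cache, timeout)
    per_cache = nat_ports_in_use(qps_per_cache, timeout)
    print(f"shared PAT: ~{shared:.0f} mappings vs {PORTS_PER_PUBLIC_IP} ports")
    print(f"1-to-1 NAT: ~{per_cache:.0f} mappings per public IP")
```

With these made-up numbers a single shared PAT address would need about 120000 mappings, roughly twice what one IP can offer, while each cache behind its own 1-to-1 NAT needs about 30000; shortening the firewall's UDP timeout helps for the same reason.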
Re: file descriptor exceeds limit
On 17.06.15 22:39, Shawn Zhou wrote:
>BIND on my resolvers reaches the max open file limit and I am getting lots of SERVFAILs
>http://pastebin.com/SxRsHLff
>After I increased the max-socks (-S 8192) to 8192, I no longer saw the file limit error in the log; however, I am still getting many SERVFAILs.

no other errors?

>Our resolvers were doing about 15k queries per second when this was happening and those were legit traffic. I am aware that I am setting recursive clients to a very high number. Those resolvers are running on 12-core CPU and 24G RAM hardware. CPU utilization was at about 20% and plenty of RAM left. I am wondering if I've reached the limit of BIND for the amount of recursive queries it can serve. Any other tunings I should try?

maybe changing number of recursive-clients, max-clients-per-query.

Does EDNS work for you? EDNS problems often result in an increased number of TCP queries, which slows down resolution ...

>By the way, the resolvers are running RHEL 6.x.

precise BIND version would help a bit more... seems RH6.6 contains 9.8.2 but that may be different for older RH6 versions.

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
file descriptor exceeds limit
Hello,

BIND on my resolvers reaches the max open file limit and I am getting lots of SERVFAILs:
http://pastebin.com/SxRsHLff

After I increased the max-socks (-S 8192) to 8192, I no longer saw the file limit error in the log; however, I am still getting many SERVFAILs.

Our resolvers were doing about 15k queries per second when this was happening, and those were legit traffic. I am aware that I am setting recursive clients to a very high number. Those resolvers are running on 12-core CPU and 24G RAM hardware. CPU utilization was at about 20%, with plenty of RAM left. I am wondering if I've reached the limit of BIND for the amount of recursive queries it can serve. Any other tunings I should try?

By the way, the resolvers are running RHEL 6.x.

Thanks,
Shawn
Re: Bind 9.7.0-P1 socket: file descriptor exceeds limit / assertion failure
At Thu, 29 Apr 2010 14:53:44 -0700,
Dale Kiefling <dale.kiefl...@cbs.com> wrote:

> We have a Bind 9.7.0-P1 instance that is throwing the following errors:
>
> 21-Apr-2010 16:59:00.173 general: error: socket: file descriptor exceeds limit (1024/1024)

The fact that the FD limit is 1024 suggests your named uses select instead of epoll. As far as I know Linux kernel 2.6 should support epoll, so your named may have been built with --disable-epoll. What's the result of named -V?

> $ uname -a
> Linux ha1.example.com 2.6.18-128.1.10.el5PAE #1 SMP Thu May 7 11:14:31 EDT 2009 i686 athlon i386 GNU/Linux

For a busy recursive server that could consume more than 1024 open sockets, select won't work well anyway. Even if you increase the FD limit, it's quite likely that the server hits other scalability issues. So, if your named was built with --disable-epoll, I'd suggest you rebuild it with epoll enabled (which should be the default on your Linux system) and try again.

In any case, the assertion failure should be a bug, but right now I have no idea about how it happened.

---
JINMEI, Tatuya
Internet Systems Consortium, Inc.
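Jinmei's diagnosis rests on a property of select(): it can only watch descriptors numbered below FD_SETSIZE, which is 1024 with glibc, whereas epoll has no such ceiling. A minimal Linux-only Python sketch of epoll readiness notification on a UDP socket, using Python's `select.epoll` wrapper around the same kernel interface (the helper name is hypothetical):

```python
import select
import socket


def epoll_readable_count(timeout: float = 1.0) -> int:
    """Register a UDP socket with epoll, send it one datagram over loopback,
    and return how many registered descriptors epoll reports as readable.
    Linux only: select.epoll does not exist on other platforms."""
    if not hasattr(select, "epoll"):
        raise RuntimeError("select.epoll is only available on Linux")
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    ep = select.epoll()
    try:
        receiver.bind(("127.0.0.1", 0))  # ephemeral loopback port
        ep.register(receiver.fileno(), select.EPOLLIN)
        sender.sendto(b"ping", receiver.getsockname())
        events = ep.poll(timeout)  # list of (fd, event_mask) pairs
        return sum(1 for _fd, mask in events if mask & select.EPOLLIN)
    finally:
        ep.close()
        sender.close()
        receiver.close()


if __name__ == "__main__":
    print(epoll_readable_count())
```

Because epoll keeps its interest list in the kernel rather than in a fixed-size bitmap passed on every call, descriptor numbers above 1023 are not a problem, which is why an epoll-enabled build is not stuck at 1024/1024.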
Re: Bind 9.7.0-P1 socket: file descriptor exceeds limit / assertion failure
Dale: Sorry I emailed you directly. I'm sending my response to the group.

Dale: The limits.conf file will only set the high and low limit when you log in. Once you log out, the open file limit will go back to its default value. Read the man page for limits.conf. The issue below has caught all of us.

*Excerpt is below*

In general, individual limits have priority over group limits, so if you impose no limits for admin group, but one of the members in this group have a limits line, the user will have its limits set according to this line.

Also, please note that all limit settings are set per login. They are not global, nor are they permanent; existing only for the duration of the session.

On Fri, Apr 30, 2010 at 7:32 PM, Dale Kiefling <dale.kiefl...@cbs.com> wrote:
> Hey Ezra,
>
> Thanks for the reply. ulimit -Hn and ulimit -Sn report 8192. Wasn't sure if limits.conf would help or not.
>
> Dale
>
> On Apr 30, 2010, at 4:18 PM, Ezra Taylor wrote:
>> Dale: The limits.conf file is not going to solve your problem. Read the man page for initscript and inittab.
>>
>> On Thu, Apr 29, 2010 at 5:53 PM, Dale Kiefling <dale.kiefl...@cbs.com> wrote:
>>> We have a Bind 9.7.0-P1 instance that is throwing the following errors:
>>>
>>> 21-Apr-2010 16:59:00.173 general: error: socket: file descriptor exceeds limit (1024/1024)
>>> 21-Apr-2010 17:00:00.122 general: error: socket: file descriptor exceeds limit (1024/1024)
>>> 21-Apr-2010 17:00:00.123 general: error: socket: file descriptor exceeds limit (1024/1024)
>>>
>>> When we try to increase the socket value we are seeing assertion failures. Restarted named with the option -S 8192:
>>>
>>> Apr 26 19:20:54 ha1 named[3891]: socket.c:2781: INSIST(!sock->pending_recv) failed, back trace
>>> Apr 26 19:20:54 ha1 named[3891]: #0 0x806525b in ??
>>> Apr 26 19:20:54 ha1 named[3891]: #1 0x7b4b57 in ??
>>> Apr 26 19:20:54 ha1 named[3891]: #2 0x7dfc03 in ??
>>> Apr 26 19:20:54 ha1 named[3891]: #3 0x7e16f9 in ??
>>> Apr 26 19:20:54 ha1 named[3891]: #4 0x7e1979 in ??
>>> Apr 26 19:20:54 ha1 named[3891]: #5 0x7e1be7 in ??
Apr 26 19:20:54 ha1 named[3891]: #6 0x61a49b in ?? Apr 26 19:20:54 ha1 named[3891]: #7 0x6fd42e in ?? Apr 26 19:20:54 ha1 named[3891]: exiting (due to assertion failure) Any advice given the info provided below? Let me know if I can provide more info. Dale $ dig +short version.bind chaos txt 9.7.0-P1 $ uname -a Linux ha1.example.com 2.6.18-128.1.10.el5PAE #1 SMP Thu May 7 11:14:31 EDT 2009 i686 athlon i386 GNU/Linux $ cat /etc/redhat-release CentOS release 5.3 (Final) $ cat /etc/security/limits.conf * hardnofile 8192 * softnofile 8192 ntp - memlock 32768 cat named.conf ... options { directory /var/opt/named; pid-file /etc/named.pid; notify yes; also-notify { }; recursion yes; allow-query { any; }; //edns-udp-size 512; }; ... unlimit -a reports: open files (-n) 8192 recent rndc stats: +++ Statistics Dump +++ (1271794427) ++ Incoming Requests ++ 108267159 QUERY 313 NOTIFY ++ Incoming Queries ++ 91731351 A 314215 NS 10840 SOA 2704323 PTR 4367570 MX 81 TXT 325 X25 9135705 1072 SRV 6 IXFR 1453 AXFR 218 ANY ++ Outgoing Queries ++ [View: default] 3077427 A 5991 NS 2113 SOA 44931 PTR 7552045 MX 53 TXT 41 X25 3218008 426 SRV 18 ANY [View: _bind] [View: _meta] ++ Name Server Statistics ++ 108267472 IPv4 requests received 3342 requests with EDNS(0) received 5600 TCP requests received 108051102 responses sent 4972 truncated responses sent 3342 responses with EDNS(0) sent 98180939 queries resulted in successful answer 101089523 queries resulted in authoritative answer 5075782 queries resulted in non authoritative answer 7 queries resulted in referral answer 3987640 queries resulted in nxrrset 1885481 queries resulted in SERVFAIL 3996719 queries resulted in NXDOMAIN 5660199 queries caused recursion 207266 duplicate queries received 7610 queries dropped 1456 requested transfers completed ++ Zone Maintenance Statistics ++ 9833 IPv4 notifies sent 301 IPv4 notifies received 268 notifies rejected 315214 IPv4 SOA queries sent 6 IPv4 AXFR requested 23 IPv4 IXFR requested
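Because pam_limits reads limits.conf only when a login session is opened, the `ulimit` values in an interactive shell do not show what a daemon started at boot actually received. On Linux, the "Max open files" row of /proc/<pid>/limits shows a running process's real soft and hard limits. A minimal Python sketch, assuming a Linux /proc; proc(5) documents this file as available since kernel 2.6.24, so the 2.6.18-based kernel in this thread may lack it unless the vendor backported it. The function names are hypothetical.

```python
import re
from typing import Optional, Tuple

# Matches the "Max open files" row of /proc/<pid>/limits, for example:
# "Max open files            1024                 4096                 files"
_MAX_OPEN_FILES = re.compile(r"^Max open files\s+(\S+)\s+(\S+)", re.MULTILINE)


def parse_max_open_files(limits_text: str) -> Optional[Tuple[str, str]]:
    """Return (soft, hard) from /proc/<pid>/limits text, or None if absent.

    Values stay strings because the kernel may print "unlimited".
    """
    m = _MAX_OPEN_FILES.search(limits_text)
    return (m.group(1), m.group(2)) if m else None


def max_open_files(pid: int) -> Optional[Tuple[str, str]]:
    """Read the open-files limits of a running process (Linux only)."""
    with open(f"/proc/{pid}/limits") as f:
        return parse_max_open_files(f.read())
```

With the pid-file from Dale's named.conf, `max_open_files(int(open("/etc/named.pid").read()))` would report what named itself is running with, which can differ from what `ulimit -n` prints in a login shell.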
Re: Bind 9.7.0-P1 socket: file descriptor exceeds limit / assertion failure
Dale: The limits.conf file is not going to solve your problem. Read the man page for initscript and inittab.

On Thu, Apr 29, 2010 at 5:53 PM, Dale Kiefling dale.kiefl...@cbs.com wrote:
We have a Bind 9.7.0-P1 instance that is throwing the following errors: [...]
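Ezra's pointer to initscript(5) rests on a general rule: a process inherits its resource limits from its parent, and the limits survive exec(). Raising the open-files limit in whatever starts named (an initscript, the rc script, or a wrapper) therefore reaches named, whereas limits.conf only affects processes started from a PAM login session. A minimal Python sketch of such a wrapper, using the standard `resource` module; the function names are hypothetical, and a shell `ulimit -n` in the start-up script is the more usual way to achieve the same thing.

```python
import os
import resource
from typing import List, Tuple


def raise_nofile_soft_limit(wanted: int) -> Tuple[int, int]:
    """Raise this process's soft RLIMIT_NOFILE toward `wanted`.

    An unprivileged process may raise its soft limit only up to the hard
    limit, so the request is capped there. The limit is never lowered.
    Returns the resulting (soft, hard) pair.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft, hard  # already unlimited; nothing to raise
    target = wanted if hard == resource.RLIM_INFINITY else min(wanted, hard)
    if target > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def exec_with_limit(wanted: int, argv: List[str]) -> None:
    """Raise the open-files limit, then replace this process with argv.

    Resource limits are preserved across exec, so the program in argv
    starts with the raised soft limit.
    """
    raise_nofile_soft_limit(wanted)
    os.execvp(argv[0], argv)
```

For example, `exec_with_limit(8192, ["/usr/sbin/named", "-u", "named"])` would replace the wrapper with named; the path and arguments are placeholders for whatever the local start-up script uses.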
Re: Bind 9.7.0-P1 socket: file descriptor exceeds limit / assertion failure
Hey Ezra,

Thanks for the reply. ulimit -Hn and ulimit -Sn report 8192. Wasn't sure if limits.conf would help or not.

Dale

On Apr 30, 2010, at 4:18 PM, Ezra Taylor wrote:
Dale: The limits.conf file is not going to solve your problem. Read the man page for initscript and inittab.
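Dale's shell reports 8192 for both limits, while the errors in his original report read "(1024/1024)", which suggests the cap being hit is not the login-shell limit. A small Python sketch that extracts both numbers from such log lines and compares the second one with `ulimit -n`; it assumes the message format shown in this thread, and reading the second number as the limit named was enforcing is an assumption based on the message wording, not something established here. The function names are hypothetical.

```python
import re
from typing import List, Tuple

# Format taken from the log excerpt in this thread:
#   "... general: error: socket: file descriptor exceeds limit (1024/1024)"
_FD_LIMIT = re.compile(r"file descriptor exceeds limit \((\d+)/(\d+)\)")


def fd_limit_hits(log_lines: List[str]) -> List[Tuple[int, int]]:
    """Return the (first, second) number pair from each matching line."""
    hits = []
    for line in log_lines:
        m = _FD_LIMIT.search(line)
        if m:
            hits.append((int(m.group(1)), int(m.group(2))))
    return hits


def limit_below_ulimit(log_lines: List[str], ulimit_n: int) -> bool:
    """True if lines matched and every reported limit is below ulimit -n."""
    hits = fd_limit_hits(log_lines)
    return bool(hits) and all(limit < ulimit_n for _, limit in hits)
```

Feeding it the three log lines from the original report together with Dale's `ulimit -n` value of 8192 would return True, i.e. the limit in the log sits well below the shell's limit.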
Bind 9.7.0-P1 socket: file descriptor exceeds limit / assertion failure
We have a Bind 9.7.0-P1 instance that is throwing the following errors:

21-Apr-2010 16:59:00.173 general: error: socket: file descriptor exceeds limit (1024/1024)
21-Apr-2010 17:00:00.122 general: error: socket: file descriptor exceeds limit (1024/1024)
21-Apr-2010 17:00:00.123 general: error: socket: file descriptor exceeds limit (1024/1024)

When we try to increase the socket limit, we see assertion failures. We restarted named with the option -S 8192:

Apr 26 19:20:54 ha1 named[3891]: socket.c:2781: INSIST(!sock->pending_recv) failed, back trace
Apr 26 19:20:54 ha1 named[3891]: #0 0x806525b in ??
Apr 26 19:20:54 ha1 named[3891]: #1 0x7b4b57 in ??
Apr 26 19:20:54 ha1 named[3891]: #2 0x7dfc03 in ??
Apr 26 19:20:54 ha1 named[3891]: #3 0x7e16f9 in ??
Apr 26 19:20:54 ha1 named[3891]: #4 0x7e1979 in ??
Apr 26 19:20:54 ha1 named[3891]: #5 0x7e1be7 in ??
Apr 26 19:20:54 ha1 named[3891]: #6 0x61a49b in ??
Apr 26 19:20:54 ha1 named[3891]: #7 0x6fd42e in ??
Apr 26 19:20:54 ha1 named[3891]: exiting (due to assertion failure)

Any advice given the info provided below? Let me know if I can provide more info.

Dale

$ dig +short version.bind chaos txt
"9.7.0-P1"
$ uname -a
Linux ha1.example.com 2.6.18-128.1.10.el5PAE #1 SMP Thu May 7 11:14:31 EDT 2009 i686 athlon i386 GNU/Linux
$ cat /etc/redhat-release
CentOS release 5.3 (Final)
$ cat /etc/security/limits.conf
* hard nofile 8192
* soft nofile 8192
ntp - memlock 32768
$ cat named.conf
...
options {
    directory "/var/opt/named";
    pid-file "/etc/named.pid";
    notify yes;
    also-notify { };
    recursion yes;
    allow-query { any; };
    //edns-udp-size 512;
};
...
ulimit -a reports:

open files (-n) 8192

Recent rndc stats:

+++ Statistics Dump +++ (1271794427)
++ Incoming Requests ++
108267159 QUERY
313 NOTIFY
++ Incoming Queries ++
91731351 A
314215 NS
10840 SOA
2704323 PTR
4367570 MX
81 TXT
325 X25
9135705
1072 SRV
6 IXFR
1453 AXFR
218 ANY
++ Outgoing Queries ++
[View: default]
3077427 A
5991 NS
2113 SOA
44931 PTR
7552045 MX
53 TXT
41 X25
3218008
426 SRV
18 ANY
[View: _bind]
[View: _meta]
++ Name Server Statistics ++
108267472 IPv4 requests received
3342 requests with EDNS(0) received
5600 TCP requests received
108051102 responses sent
4972 truncated responses sent
3342 responses with EDNS(0) sent
98180939 queries resulted in successful answer
101089523 queries resulted in authoritative answer
5075782 queries resulted in non authoritative answer
7 queries resulted in referral answer
3987640 queries resulted in nxrrset
1885481 queries resulted in SERVFAIL
3996719 queries resulted in NXDOMAIN
5660199 queries caused recursion
207266 duplicate queries received
7610 queries dropped
1456 requested transfers completed
++ Zone Maintenance Statistics ++
9833 IPv4 notifies sent
301 IPv4 notifies received
268 notifies rejected
315214 IPv4 SOA queries sent
6 IPv4 AXFR requested
23 IPv4 IXFR requested
29 transfer requests succeeded
++ Resolver Statistics ++
[Common]
570 mismatch responses received
151245 failures in opening query sockets
[View: default]
13714283 IPv4 queries sent
186770 IPv6 queries sent
10815900 IPv4 responses received
31 IPv6 responses received
123548 NXDOMAIN received
955379 SERVFAIL received
33013 FORMERR received
806336 other errors received
382773 EDNS(0) query failures
442 truncated responses received
751147 lame delegations received
4759160 query retries
3103740 query timeouts
546721 IPv4 NS address fetches
1168510 IPv6 NS address fetches
80562 IPv4 NS address fetch failed
1158909 IPv6 NS address fetch failed
1527841 queries with RTT < 10ms
4509306 queries with RTT 10-100ms
3619163 queries with RTT 100-500ms
518078 queries with RTT 500-800ms
493598 queries with RTT 800-1600ms
147945 queries with RTT > 1600ms
[View: _bind]
[View: _meta]
++ Cache DB RRsets ++
[View: default
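Among these counters, "failures in opening query sockets" (151245, against 13714283 IPv4 plus 186770 IPv6 queries sent, roughly 1.1%) is the one that plausibly reflects named running out of descriptors. A minimal Python sketch that collects one "++ Section ++" block from a line-per-counter rndc stats dump and computes that ratio; it assumes the statistics layout shown in this thread, and the function names are hypothetical.

```python
import re
from typing import Dict

# One counter per line, e.g. "151245 failures in opening query sockets".
_COUNTER = re.compile(r"^\s*(\d+)\s+(.+?)\s*$")


def parse_stats_section(text: str, section: str) -> Dict[str, int]:
    """Collect counters from one "++ Section ++" block of an rndc stats dump.

    Counters with the same description in different views are summed.
    Lines such as "[View: default]" do not match and are skipped.
    """
    counters: Dict[str, int] = {}
    inside = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("++") and stripped.endswith("++"):
            inside = stripped.strip("+ ") == section
            continue
        if not inside:
            continue
        m = _COUNTER.match(line)
        if m:
            key = m.group(2)
            counters[key] = counters.get(key, 0) + int(m.group(1))
    return counters


def socket_failure_ratio(text: str) -> float:
    """Ratio of query-socket open failures to resolver queries sent."""
    res = parse_stats_section(text, "Resolver Statistics")
    sent = res.get("IPv4 queries sent", 0) + res.get("IPv6 queries sent", 0)
    failures = res.get("failures in opening query sockets", 0)
    return failures / sent if sent else 0.0
```

Tracking this ratio across several dumps (before and after a limit change) is one way to tell whether a new descriptor limit actually made a difference.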