Bug#518129: bind9 hangs: NXDOMAIN for recursive requests but serves authoritative zones
Version: 1:9.5.1.dfsg.P1-2

Hi,

Just suffered the same problem! It sounds pretty nasty if you run a busy nameserver or just set a low cache size to restrict memory usage. I had max-cache-size 1m; which I think triggers the problem sooner.

My best guess is that the cache becomes exhausted after several hours/days of running; old entries are purged from the cache, but unfortunately this includes the root hints. Is that a bug or misconfiguration on my part? It causes recursive queries to fail, although answers are still given from authoritative zones.

My configuration is a little complicated: split-horizon with internal/external views, but only the internal view allows recursion and that's where I had problems.

Relevant global options:

    options {
        // ...
        max-cache-size 1m;
        recursive-clients 256;
    };

Internal view options:

    view internal {
        match-clients { 192.168.0.0/16; 127.0.0.1/16; };
        recursion yes;
        notify no;

        // prime the server with knowledge of the root servers
        zone . {
            type hint;
            file /etc/bind/db.root;
        };

        // ...
    };

My root hints file was the 2008020400-serial that shipped with the Debian package, but I'll be updating that now.

My workaround will be to set max-cache-size unlimited; for the time being.

Regards,
-- 
Steven Chamberlain
ste...@pyro.eu.org
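For reference, the workaround mentioned above could be sketched as the following named.conf fragment. It assumes the rest of the global options block stays exactly as posted and only the max-cache-size line changes. This is the poster's stopgap rather than a confirmed fix, and it removes the memory cap that the 1m setting was meant to enforce.

```
options {
    // ...
    // Workaround from the report: lift the cache size cap.
    // Previously: max-cache-size 1m;
    max-cache-size unlimited;
    recursive-clients 256;
};
```

After changing the file, named has to reload its configuration (or be restarted) for the new limit to apply.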
Bug#518129: bind9 hangs: NXDOMAIN for recursive requests but serves authoritative zones
Package: bind9
Version: 1:9.5.1.dfsg.P1-1
Severity: important

Since we upgraded our bind9 name servers from Etch to Lenny we are experiencing occasional hangs. While all requests for authoritative zones are still answered correctly we can't seem to get replies for recursive queries. All we get is NXDOMAIN until we init.d/restart the bind process. Every tenth or so request is answered properly but the next request fails again with NXDOMAIN. So the successful response from the root servers doesn't seem to get served from the internal cache either.

In that situation our log file fills up with:

Mar 4 07:21:45 pns named[7077]: general: checkhints: unable to get root NS rrset from cache: not found
Mar 4 07:21:56 pns named[7077]: general: checkhints: unable to get root NS rrset from cache: not found
Mar 4 07:22:07 pns named[7077]: general: checkhints: unable to get root NS rrset from cache: not found
Mar 4 07:22:57 pns named[7077]: general: checkhints: unable to get root NS rrset from cache: not found
Mar 4 07:22:58 pns named[7077]: general: checkhints: unable to get root NS rrset from cache: not found
Mar 4 07:22:59 pns named[7077]: general: checkhints: unable to get root NS rrset from cache: not found
Mar 4 07:23:03 pns named[7077]: general: checkhints: unable to get root NS rrset from cache: not found
Mar 4 07:23:09 pns named[7077]: general: checkhints: unable to get root NS rrset from cache: not found
Mar 4 07:23:32 pns named[7077]: general: checkhints: unable to get root NS rrset from cache: not found
Mar 4 07:23:36 pns named[7077]: general: checkhints: unable to get root NS rrset from cache: not found
Mar 4 07:23:37 pns named[7077]: general: checkhints: unable to get root NS rrset from cache: not found
Mar 4 07:23:43 pns named[7077]: general: checkhints: unable to get root NS rrset from cache: not found
Mar 4 07:26:23 pns named[7077]: general: checkhints: unable to get root NS rrset from cache: not found
Mar 4 07:26:34 pns named[7077]: general: checkhints: unable to get root NS rrset from cache: not found
Mar 4 07:31:35 pns named[7077]: general: checkhints: unable to get root NS rrset from cache: not found
Mar 4 07:32:33 pns named[7077]: general: checkhints: unable to get root NS rrset from cache: not found
Mar 4 07:32:35 pns named[7077]: general: checkhints: unable to get root NS rrset from cache: not found
Mar 4 07:32:45 pns named[7077]: general: checkhints: unable to get root NS rrset from cache: not found
Mar 4 07:33:47 pns named[7077]: general: checkhints: unable to get root NS rrset from cache: not found
Mar 4 07:33:51 pns named[7077]: general: checkhints: unable to get root NS rrset from cache: not found
Mar 4 07:33:54 pns named[7077]: general: checkhints: unable to get root NS rrset from cache: not found
Mar 4 07:33:54 pns named[7077]: general: checkhints: unable to get root NS rrset from cache: not found
Mar 4 07:33:56 pns named[7077]: general: checkhints: unable to get root NS rrset from cache: not found
Mar 4 07:33:56 pns named[7077]: general: checkhints: unable to get root NS rrset from cache: not found
Mar 4 07:33:58 pns named[7077]: general: checkhints: unable to get root NS rrset from cache: not found
Mar 4 07:33:58 pns named[7077]: general: checkhints: unable to get root NS rrset from cache: not found
Mar 4 07:33:58 pns named[7077]: general: checkhints: unable to get root NS rrset from cache: not found
Mar 4 07:34:00 pns named[7077]: general: checkhints: unable to get root NS rrset from cache: not found
Mar 4 07:34:03 pns named[7077]: general: checkhints: unable to get root NS rrset from cache: not found
Mar 4 07:34:12 pns named[7077]: general: checkhints: unable to get root NS rrset from cache: not found
Mar 4 07:34:12 pns named[7077]: general: checkhints: unable to get root NS rrset from cache: not found
Mar 4 07:34:13 pns named[7077]: general: checkhints: unable to get root NS rrset from cache: not found
Mar 4 07:34:13 pns named[7077]: general: checkhints: unable to get root NS rrset from cache: not found
Mar 4 07:34:13 pns named[7077]: general: checkhints: unable to get root NS rrset from cache: not found

We traced (tshark) what's happening on the network and it seems like bind9 isn't even sending out requests to the internet if we send it a recursive query from inside/LAN. Instead it instantly replies with NXDOMAIN.

This situation is happening every few days and requires a bind restart or else our clients can't run recursive queries any more (which apparently isn't making them happy).

Our name server serves nearly 500 authoritative zones and is used as a forwarder for the internal/LAN clients. rndc status shows:

==
version: 9.5.1-P1
number of zones: 511
debug level: 0
xfers running: 0
xfers deferred: 0
soa queries in progress: 0
query logging is OFF
recursive clients: 6/0/1000
tcp clients: 0/100
server is up and running
==

Our