DNS Redundancy
The normal procedure on internet-connected systems is to set the resolv.conf file to include at least 2 domain name servers. Example:

    nameserver 139.78.100.1
    nameserver 139.78.200.1

Last night, I had to take down our primary DNS for maintenance, and lots of FreeBSD and Linux systems began having trouble of various kinds. While I expected the FreeBSD system I was on to hang for a couple of seconds and then start using the second DNS, it basically froze, while some Linux boxes also began exhibiting similar behavior. I finally manually changed the resolv.conf on the system I was using to force the slave DNS to be first in the list, and that helped, but losing the primary DNS was not the slight slowdown one might expect. It was a full-blown outage.

Are we missing some other configuration directive for Unix systems that would make the systems use the redundancy a little more gracefully than what happened? Otherwise, why have it if somebody has to manually intervene? The only thing we should have lost was dynamic updates. The outage lasted for 25 minutes or so and didn't resolve until the primary came back on line.

This is my week for asking novice questions, but I don't get to see what happens when the master goes away all that often, and what I saw wasn't pretty.

Martin McCormick WB5AGZ  Stillwater, OK
Systems Engineer, OSU Information Technology Department, Telecommunications Services Group

___
bind-users mailing list
bind-users@lists.isc.org
https://lists.isc.org/mailman/listinfo/bind-users
Re: DNS Redundancy
On Thu, Oct 21, 2010 at 06:32:09AM -0500, Martin McCormick mar...@dc.cis.okstate.edu wrote:

> Example:
> nameserver 139.78.100.1
> nameserver 139.78.200.1

I always add:

    timeout:1

because the default timeout is 5 seconds, much too long to allow for a smooth fallback. Other options could be interesting, such as rotate. See resolv.conf(5).

Unlike the failure of an authoritative name server, the failure of a resolver is not really transparent for the Unix stub resolver, as you have discovered. You may consider solutions using redundancy at layer 3, such as VRRP or CARP.
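Putting these suggestions together, a resolv.conf along these lines (glibc/BSD stub-resolver syntax, addresses from the thread; see resolv.conf(5)) gives a much quicker fallback:

```conf
# Wait 1 second (not the default 5) before trying the next server,
# and rotate queries round-robin across the listed servers.
options timeout:1 rotate
nameserver 139.78.100.1
nameserver 139.78.200.1
```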
Re: DNS Redundancy
On 21 Oct 2010, at 12:32, Martin McCormick wrote:

> Last night, I had to take down our primary DNS for maintenance and lots
> of FreeBSD and Linux systems began having trouble of various kinds. [...]
> It was a full-blown outage.

It's a good idea to keep your authoritative name service (for announcing DNS records for your part of the DNS) separate from your resolver name service (for mediating name service to the clients on your network). /etc/resolv.conf (or equivalent on other platforms) specifies where the client should look for resolver service. The addresses in there should preferably not be those of the master or slave server for your DNS zone(s).

Without more detail, it's difficult to say exactly what chain of cause and effect led to your full-blown outage. It's well to bear in mind that the typical (Unix-like) client will always step through the nameserver addresses in the order in which they appear in /etc/resolv.conf. If you're planning to take one of them down for maintenance and wish to avoid client-side delays, you need either to configure the clients in advance (for example, by using DHCP) with a different /etc/resolv.conf, or to instantiate the first address in the list on the second server. There is no one true way.
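That second option, instantiating the primary's address on the secondary for the duration of the maintenance, amounts to an address alias. A sketch (the interface names em0/eth0 are assumptions; the addresses are the ones from the thread, and named on the secondary must also be configured to listen on the added address):

```
# FreeBSD: bring up the primary resolver's address on the secondary
ifconfig em0 alias 139.78.100.1 netmask 255.255.255.255
# Linux equivalent:
ip addr add 139.78.100.1/32 dev eth0

# ...and remove it again once the primary is back:
ifconfig em0 -alias 139.78.100.1        # FreeBSD
ip addr del 139.78.100.1/32 dev eth0    # Linux
```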
On the other hand, dedicated resolver servers (at least those running BIND named) keep track of the state of the authoritative servers for the names for which they are processing queries, and automagically ignore any that are unreachable. This allows my customers (for example) to be spared delay when you take one of your authoritative servers down.

Best regards,
Niall O'Reilly
Re: DNS Redundancy
On 21/10/10 12:50, Stephane Bortzmeyer wrote:

> Unlike the failure of an authoritative name server, the failure of a
> resolver is not really transparent for the Unix stub resolver, as you
> have discovered. You may consider solutions using redundancy at
> layer 3, such as VRRP or CARP.

Yeah, we've observed this. Our primary and secondary DNS IPs are actually virtual IPs; one is via a layer-4 load balancer, the other via an eBGP-injected route (for diversity) pointing at 4 real resolvers. You can alleviate it with nscd on the clients, but that has its own problems.
Re: DNS Redundancy
Stephane Bortzmeyer writes:

> I always add:
>
> timeout:1
>
> because the default timeout is 5 seconds, much too long to allow for a
> smooth fallback. Other options could be interesting, such as rotate.
> See resolv.conf(5).

Nearly off-topic, but how does one specify such options via DHCP?
DNS Redundancy, Round 2
A slightly different but allied question: we are seeing a situation where (Red Hat or CentOS) servers with 2 nameservers in their resolv.conf files nearly hang in name resolution with 2 nameservers listed, but run quickly if one of the nameservers is deleted from the resolv.conf. Both the referenced nameservers are on the same internal subnet: 10.5.0.2, 10.5.0.3. The two internal nameservers are running AIX V5.3 and BIND 9.2.1.

I haven't delved into this yet, but I'd welcome suggestions on where I should be looking.

--
One must think like a hero to behave like a merely decent human being. - May Sarton
Stewart Dean, Unix System Admin, Bard College, New York 12504
sd...@bard.edu  voice: 845-758-7475, fax: 845-758-7035
Re: DNS Redundancy
On Thu, Oct 21, 2010 at 02:27:52PM +0100, lheck...@users.sourceforge.net wrote:

> Nearly off-topic, but how does one specify such options via dhcp?

It depends on the DHCP client you use. With pump, you can use --noresolvconf. For the ISC client, see man dhclient.
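For the ISC dhclient, a sketch of one common arrangement (file paths vary by distribution; see dhclient.conf(5)):

```conf
# /etc/dhcp/dhclient.conf
# Force a fixed resolver list regardless of what the DHCP server offers:
supersede domain-name-servers 139.78.100.1, 139.78.200.1;
# Or keep the server's list but put a preferred resolver first:
# prepend domain-name-servers 139.78.200.1;
```

Resolver options such as timeout:1 or rotate are not standard DHCP options, so they have to be re-applied on the client side, for example from a dhclient exit hook that appends an "options timeout:1 rotate" line whenever resolv.conf is rewritten.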
Re: DNS Redundancy
We have been very successful using any-casting, whereby multiple, equivalently-configured DNS servers are placed throughout the network, all providing DNS service on the same virtual addresses, and these virtual addresses are host-routed (i.e. routed with a slash-32 netmask). The keys to this working well are:

1. Host routes are dynamically asserted or withdrawn based on the health of the DNS service on each server.
2. Packet flow paths are stable across the network (for TCP-based queries).
3. Publish two any-cast resolver addresses.

I have seen people run dynamic routing protocols on the servers (e.g. RIPv2 or OSPF) combined with cron-driven health-check scripts that control the dynamic routing of the virtual address. We have also used load balancers to handle the server health monitoring and the dynamic routing -- only because the load balancers happened to be convenient -- I would not use a load balancer otherwise. But I prefer the Cisco IP SLA idea to both monitor the server health and control the host routes (although I have not tested this). The stable-path requirement is easy with Cisco CEF as long as you do not use per-packet load sharing.

It is actually counter-productive to have two resolvers configured with this architecture, but to circumvent human nature, we publish two. There is absolutely no functional difference between the two, and there is no redundancy value for the second one -- they are both hosted on each and every one of the any-cast servers. The only reason for the second resolver is to deter people from making up their own second resolver -- people expect two resolvers, and if you give them only one, they will go ahead and put something in as the second resolver -- even if you tell them not to. This is a very important aspect of having the architecture succeed in our environment.

--
Gordon A. Lang

----- Original Message -----
From: Martin McCormick mar...@dc.cis.okstate.edu
Sent: Thursday, October 21, 2010 7:32 AM
Subject: DNS Redundancy
[original message quoted in full; trimmed]
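The cron-driven health check mentioned above reduces to a small decision routine. The sketch below is illustrative only: in a real deployment the probe would be a dig query against the local named, and "announce"/"withdraw" would start or stop the routing daemon (or otherwise assert/withdraw the /32 host route); the probe name and mechanism here are assumptions.

```shell
#!/bin/sh
# Decide what to do with the anycast /32 service route, given the result
# of a resolver health probe ($1 = probe exit status, 0 = named answered).
route_action() {
    if [ "$1" -eq 0 ]; then
        echo "announce"    # keep (or restore) the host route
    else
        echo "withdraw"    # pull the route so clients reach another node
    fi
}

# In a real cron job the probe would look something like:
#   dig +time=2 +tries=1 @127.0.0.1 healthcheck.example.com A >/dev/null
#   route_action $?
# Here we just exercise both branches:
route_action 0
route_action 1
```

Run directly, this prints "announce" and then "withdraw".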
Re: DNS Redundancy
On 10/21/10 08:26, Gordon A. Lang wrote:

> It is actually counter-productive to have two resolvers configured with
> this architecture, but to circumvent human nature, we publish two. [...]
> This is a very important aspect of having the architecture succeed in
> our environment.

I mentioned this in another thread (perhaps on another list!), but there are reasons you might want to have two separate redundant anycast clouds and configure two servers in client stub resolvers.

Background: we have been doing anycast within our OSPF IGP since 1999 for DNS. Initially, we announced all resolver addresses from one set of anycast servers, and each server advertised all configured addresses (we had 4 back then for historical reasons). On very rare occasions, we would have a weird error where a system would be unable to fork new processes (such as the cron script to verify the health of the server), or the kernel would get into a weird bogged-down state where named would effectively stop working but the system wouldn't get taken out of routing. (That one turned out to be a kernel bug.) Clients within the anycast catchment of such a server would be stuck talking over and over to the same broken server.

We now have two separate sets of anycast servers so that the resolvers can still fail over to a different set of servers as a last resort. Having the stub resolver's own failover mechanism in place provides an extra layer of protection, provided you have separate anycast clouds. This is now considered a best practice.
See slide 38 of Woody's presentation here:
http://www.pch.net/resources/papers/ipv4-anycast/ipv4-anycast.pdf

michael