Re: [dns-operations] DNS load-balancing/failover using an ASR 9xxx (few questions)
* Anand Buddhdev ana...@ripe.net [2014-08-15 09:39]:
>> BGP sessions between the ASR 9 and each DNS server in the cluster, ExaBGP running on them announcing their loopback/service /32 + /128 address(es). Health check scripts on each service to probe for service ability, retract the announcement upon failure.
>
> We are doing this exact same thing on many RIPE NCC DNS servers, and it works very well. The other advantage of BGP is that as soon as you withdraw the announcement, the router stops sending traffic to the server. With OSPF, you have timeouts of several seconds before traffic stops arriving at a dead server.

Can you share a bit more about the setup/tools RIPE uses for this? I want to do something like this but I'm still looking for good recommendations for heartbeat/keepalive tools that watch the DNS daemon and update ExaBGP accordingly.

Regards
Sebastian

--
GPG Key: 0x93A0B9CE (F4F6 B1A3 866B 26E9 450A 9D82 58A2 D94A 93A0 B9CE)
'Are you Death?' ... IT'S THE SCYTHE, ISN'T IT? PEOPLE ALWAYS NOTICE THE SCYTHE.
-- Terry Pratchett, The Fifth Elephant
___
dns-operations mailing list
dns-operations@lists.dns-oarc.net
https://lists.dns-oarc.net/mailman/listinfo/dns-operations
dns-jobs mailing list
https://lists.dns-oarc.net/mailman/listinfo/dns-jobs
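For readers wondering what such a watcher can look like: the following is a minimal sketch, not the RIPE NCC setup (which the thread does not describe). It assumes ExaBGP starts the script as a helper process and reads text API commands from the script's standard output. The prefix 192.0.2.53/32, the TCP connect check on 127.0.0.1:53, the 2-second interval and the `--run` switch are all illustrative assumptions.

```python
#!/usr/bin/env python3
"""Sketch of an ExaBGP helper: announce a service prefix only while a check passes."""
import socket
import sys
import time

SERVICE_PREFIX = "192.0.2.53/32"  # assumption: documentation address, not a real service IP
INTERVAL = 2.0                    # assumption: seconds between checks


def dns_port_open(host="127.0.0.1", port=53, timeout=1.0):
    """Weak check: True if the daemon accepts a TCP connection on port 53.

    This only shows that something is listening; a real DNS query is a stronger test.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def route_command(announced, healthy, prefix=SERVICE_PREFIX):
    """Return the ExaBGP text-API line for a state change, or None if nothing changes."""
    if healthy and not announced:
        return f"announce route {prefix} next-hop self"
    if announced and not healthy:
        return f"withdraw route {prefix} next-hop self"
    return None


def main():
    announced = False
    while True:
        command = route_command(announced, dns_port_open())
        if command is not None:
            # ExaBGP reads one command per line from this process's stdout.
            sys.stdout.write(command + "\n")
            sys.stdout.flush()
            announced = command.startswith("announce")
        time.sleep(INTERVAL)


# Hypothetical switch so the loop only starts when explicitly requested.
if __name__ == "__main__" and "--run" in sys.argv:
    main()
```

The script stays silent while the state is unchanged, so the router only sees an update when the service actually goes up or down.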
Re: [dns-operations] DNS load-balancing/failover using an ASR 9xxx (few questions)
On 15/08/2014 00:00, Nat Morris wrote:
> BGP sessions between the ASR 9 and each DNS server in the cluster, ExaBGP running on them announcing their loopback/service /32 + /128 address(es). Health check scripts on each service to probe for service ability, retract the announcement upon failure.

We are doing this exact same thing on many RIPE NCC DNS servers, and it works very well. The other advantage of BGP is that as soon as you withdraw the announcement, the router stops sending traffic to the server. With OSPF, you have timeouts of several seconds before traffic stops arriving at a dead server.

Regards,
Anand
Re: [dns-operations] DNS load-balancing/failover using an ASR 9xxx (few questions)
We do the same with Quagga or BIRD on Linux, running an OSPF daemon, for geo-redundancy and proximity-based load sharing of customer access to recursive BIND resolvers. Skipping the tedious specifics of our case, we have the primary/secondary DNS IPs announced as loopbacks by the system.

We don't have any specific monitoring to bring OSPF down on the server, since we have lots of them (4 per POP) and dedicated scripted and human monitoring 24x7, so if a server has an issue the customer barely notices it before a human acts and brings down the affected server.

We also had a power surge in a POP that took it entirely offline on the DNS side (the network was on DC power, while the problem affected AC power only, for some racks), and 30 seconds later the service was up again using the DNS servers of another POP. Very effective given the giant failure we had.

About the timeouts: you don't need to wait if you bring down the loopbacks instead of the OSPF daemon. After the loopbacks go down, the OSPF daemon advertises that it no longer has those IPs, and the upstream routers load-share only across the remaining servers. Then you can shut the daemon down.

I considered using the probe, but found that it was overkill in our case, since a simple transient hang in the network (STP issue, mismatched cabling) could have brought down an entire POP over a minor event. We preferred human monitoring instead, since a 24x7 service was already there for network alarms and could easily correlate with other causes or a real server issue.

We haven't had a single software failure in more than 7 years across four different installations (RHEL 3, CentOS 4, 5, 6) in a very complex environment driven by efficiency and legal constraints (we have upstream DNS servers that apply legally mandated DNS poisoning, and a shared cache for all the anycast DNS servers).

Ciao, A.

On 15 Aug 2014, at 09:46, Anand Buddhdev ana...@ripe.net wrote:
> On 15/08/2014 00:00, Nat Morris wrote:
>> BGP sessions between the ASR 9 and each DNS server in the cluster, ExaBGP running on them announcing their loopback/service /32 + /128 address(es). Health check scripts on each service to probe for service ability, retract the announcement upon failure.
>
> We are doing this exact same thing on many RIPE NCC DNS servers, and it works very well. The other advantage of BGP is that as soon as you withdraw the announcement, the router stops sending traffic to the server. With OSPF, you have timeouts of several seconds before traffic stops arriving at a dead server.
>
> Regards,
> Anand
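The worry above, that a short network transient could make an automated probe pull a whole POP out of service, is commonly softened with rise/fall thresholds: the state flips only after several consecutive probe results disagree with it. The sketch below illustrates that idea only; it is not something the poster describes using, and the class name and the thresholds (2 successes to come up, 3 failures to go down) are assumptions. It does not replace the cross-alarm correlation a human operator can do.

```python
class DampedHealth:
    """Flip state only after `fall` consecutive failures or `rise` consecutive successes."""

    def __init__(self, rise=2, fall=3, initially_up=True):
        self.rise = rise
        self.fall = fall
        self.up = initially_up
        self._streak = 0  # consecutive probe results that disagree with the current state

    def update(self, probe_ok):
        """Feed one probe result and return the (possibly changed) up/down state."""
        if probe_ok == self.up:
            # Result agrees with the current state: forget any earlier disagreement.
            self._streak = 0
        else:
            self._streak += 1
            threshold = self.rise if probe_ok else self.fall
            if self._streak >= threshold:
                self.up = probe_ok
                self._streak = 0
        return self.up


# Two isolated failures are ignored; three in a row take the server out.
health = DampedHealth(rise=2, fall=3)
for result in [False, False, True, False, False, False, True, True]:
    print(result, "->", "up" if health.update(result) else "down")
```

With a probe interval of a few seconds, the fall threshold sets how long a hang must last before the announcement is withdrawn, which trades detection speed against false positives.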
Re: [dns-operations] DNS load-balancing/failover using an ASR 9xxx (few questions)
I forgot to mention that you should disable proxy ARP for connected interfaces on Linux; otherwise you'll trigger a bug in the ASR code (confirmed on 4.2.x) that loops packets back and forth on the default route instead of load-sharing them to the DNS servers. If anybody is interested, I can provide the exact sysctl to work around the issue.

On 15 Aug 2014, at 11:38, Costantino Andrea (Con) andrea.costant...@h3g.it wrote:
> We do the same with Quagga or BIRD on Linux, running an OSPF daemon, for geo-redundancy and proximity-based load sharing of customer access to recursive BIND resolvers. Skipping the tedious specifics of our case, we have the primary/secondary DNS IPs announced as loopbacks by the system.
>
> We don't have any specific monitoring to bring OSPF down on the server, since we have lots of them (4 per POP) and dedicated scripted and human monitoring 24x7, so if a server has an issue the customer barely notices it before a human acts and brings down the affected server.
>
> We also had a power surge in a POP that took it entirely offline on the DNS side (the network was on DC power, while the problem affected AC power only, for some racks), and 30 seconds later the service was up again using the DNS servers of another POP. Very effective given the giant failure we had.
>
> About the timeouts: you don't need to wait if you bring down the loopbacks instead of the OSPF daemon. After the loopbacks go down, the OSPF daemon advertises that it no longer has those IPs, and the upstream routers load-share only across the remaining servers. Then you can shut the daemon down.
>
> I considered using the probe, but found that it was overkill in our case, since a simple transient hang in the network (STP issue, mismatched cabling) could have brought down an entire POP over a minor event. We preferred human monitoring instead, since a 24x7 service was already there for network alarms and could easily correlate with other causes or a real server issue.
>
> We haven't had a single software failure in more than 7 years across four different installations (RHEL 3, CentOS 4, 5, 6) in a very complex environment driven by efficiency and legal constraints (we have upstream DNS servers that apply legally mandated DNS poisoning, and a shared cache for all the anycast DNS servers).
>
> Ciao, A.
Re: [dns-operations] DNS load-balancing/failover using an ASR 9xxx (few questions)
On Fri, Aug 15, 2014 at 09:22:02AM +0200, Anand Buddhdev wrote:
> On 15/08/2014 00:00, Nat Morris wrote:
>> BGP sessions between the ASR 9 and each DNS server in the cluster, ExaBGP running on them announcing their loopback/service /32 + /128 address(es). Health check scripts on each service to probe for service ability, retract the announcement upon failure.
>
> We are doing this exact same thing on many RIPE NCC DNS servers, and it works very well. The other advantage of BGP is that as soon as you withdraw the announcement, the router stops sending traffic to the server. With OSPF, you have timeouts of several seconds before traffic stops arriving at a dead server.

You can tweak OSPF timers like hello and dead interval in order to increase the responsiveness of the health check.

Cheers,
--
Marcelo Gardini
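A rough worked example of what tuning those timers buys. A router restarts a neighbor's inactivity timer on every hello it receives, so a server that dies silently is declared down between (dead interval minus hello interval) and (dead interval) seconds later, depending on how recently it sent its last hello. The values below (10 s/40 s, which are common defaults on broadcast networks, and an illustrative 1 s/4 s) are assumptions rather than figures from the thread, and the result ignores any SPF scheduling delay that follows.

```python
def ospf_detection_window(hello_interval, dead_interval):
    """Return (best, worst) seconds until a silently dead neighbor is declared down."""
    if dead_interval <= hello_interval:
        raise ValueError("dead interval should exceed the hello interval")
    # Best case: the neighbor died just before its next hello was due.
    # Worst case: it died right after sending a hello.
    return dead_interval - hello_interval, dead_interval


for hello, dead in [(10, 40), (1, 4)]:
    best, worst = ospf_detection_window(hello, dead)
    print(f"hello={hello}s dead={dead}s -> neighbor declared down after {best}-{worst}s")
```

This applies only when the server vanishes without warning; withdrawing the loopback while the daemon is still running lets OSPF advertise the change without waiting for the dead interval.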
Re: [dns-operations] DNS load-balancing/failover using an ASR 9xxx (few questions)
On Thu, Aug 14, 2014 at 6:00 PM, Nat Morris n...@nuqe.net wrote:
> On 14 August 2014 18:48, Jake Zack jake.z...@cira.ca wrote:
>> In the ASR 9xxx series with IOS XR, the “ipsla” that it has available doesn’t seem to do either TCP connections or UDP DNS queries. It seems my only real option is to monitor for ICMP reachability and nothing else.
>>
>> Anyone have a better solution? I’ve considered throwing a wrapper around BIND doing OSPF updates and such…but it seems unideal.

What seems unideal about it? It is a well-known and well-understood technique, and relies only on open and tested core features. I'd suggest doing BGP instead of OSPF, but much of that is personal preference...

> BGP sessions between the ASR 9 and each DNS server in the cluster, ExaBGP running on them announcing their loopback/service /32 + /128 address(es).

Yup, this also only uses well-known, well-understood systems - with anything like the Cisco solution you end up with vendor lock-in - and are subject to their whims (like what Jake described). ipsla is not part of their core features and so changes over releases / platforms. I'm sure they'd be happy to sell you an ACE though :-)

> Health check scripts on each service to probe for service ability, retract the announcement upon failure.

--
I don't think the execution is relevant when it was obviously a bad idea in the first place. This is like putting rabid weasels in your pants, and later expressing regret at having chosen those particular rabid weasels and that pair of pants.
---maf
Re: [dns-operations] DNS load-balancing/failover using an ASR 9xxx (few questions)
On Thu, 2014-08-14 at 17:48 +, Jake Zack wrote:
> Anyone doing this?
>
> Previously I’d been using Cisco 3945’s and 3845’s running standard IOS…thus using Cisco IP SLA + track to do DNS queries of each server and add/remove them from the cluster.
>
> In the ASR 9xxx series with IOS XR, the “ipsla” that it has available doesn’t seem to do either TCP connections or UDP DNS queries. It seems my only real option is to monitor for ICMP reachability and nothing else.
>
> Anyone have a better solution? I’ve considered throwing a wrapper around BIND doing OSPF updates and such…but it seems unideal.
>
> -Jake
> DNS Administrator – CIRA (.CA TLD)

We are using a couple of small clusters of Linux servers (Scientific Linux, a whitebox RHEL distribution) for recursive resolvers. They consist of 2 load balancers using a CMAN/Pacemaker cluster. The load balancing is done with the Linux kernel's IP Virtual Server (IPVS) feature. The resolver IPs are VIPs managed by the cluster, and the load balancers are set up to replicate their connection tables to each other to add seamless failover capabilities.

Also in the mix, I run keepalived on the load balancers. Keepalived manages the IPVS configuration in conjunction with health checks for each of the back-end nodes. If a back-end node stops responding, the IPVS configuration is altered to remove that node from the cluster. Note that keepalived also implements a VRRP routing daemon for failover between a set of routers. (We don't use VRRP in our setup.)

There are 4 back-end servers running just BIND as caching name servers, with a few of our main authoritative zones as slaves. The load balancers have all of the back-end servers in their configurations, but we normally have only 2 back-end nodes servicing one of the resolver VIPs. The other two are set to weight 0. I can alter the weights in the load balancers to bring back-end nodes in and out of service and to move them between resolver VIPs.

I've clocked a resolver cluster (1 load balancer, 2 back-end nodes, and named caches flushed) north of 11,000 queries per second before queries started to fail. I've been using a similar setup (minus keepalived) for well over 7 years without any major issues. The resolver clusters have been running for about 3 years without any major issues.

--
Stephen L Johnson stephen.john...@arkansas.gov
Unix Systems Administrator / DNS Hostmaster
Department of Information Systems
State of Arkansas
501-682-4339
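The weight changes described above can be pictured as `ipvsadm` edit operations. The helper below is a hypothetical sketch that only builds the command lines (it does not run them); the addresses are placeholders, IPv4 is assumed, and the direct-routing flag `-g` is an assumption because the post does not say which forwarding method is in use. In IPVS a weight of 0 quiesces a real server: it gets no new connections while existing ones continue. Where keepalived owns the IPVS configuration, manual changes may be overwritten when it reloads, so the weight would normally be changed in the keepalived configuration instead.

```python
def weight_commands(vip, backend, weight, port=53, forwarding="-g"):
    """Build ipvsadm commands that set one real server's weight for UDP and TCP DNS.

    forwarding: packet-forwarding flag passed through unchanged
                (-g direct routing, -i tunnel, -m masquerade); "-g" is an assumption.
    """
    if weight < 0:
        raise ValueError("weight must be zero or positive")
    commands = []
    for proto_flag in ("-u", "-t"):  # DNS is served over both UDP and TCP
        commands.append([
            "ipvsadm", "-e", proto_flag, f"{vip}:{port}",
            "-r", f"{backend}:{port}", forwarding, "-w", str(weight),
        ])
    return commands


# Drain a back-end node (weight 0), then print the commands for review.
for cmd in weight_commands("192.0.2.53", "10.0.0.11", 0):
    print(" ".join(cmd))
```

Printing the commands instead of executing them keeps the sketch safe to run anywhere; an operator could pass each list to `subprocess.run` on a load balancer.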
[dns-operations] DNS load-balancing/failover using an ASR 9xxx (few questions)
Anyone doing this?

Previously I'd been using Cisco 3945's and 3845's running standard IOS...thus using Cisco IP SLA + track to do DNS queries of each server and add/remove them from the cluster.

In the ASR 9xxx series with IOS XR, the ipsla that it has available doesn't seem to do either TCP connections or UDP DNS queries. It seems my only real option is to monitor for ICMP reachability and nothing else.

Anyone have a better solution? I've considered throwing a wrapper around BIND doing OSPF updates and such...but it seems unideal.

-Jake
DNS Administrator - CIRA (.CA TLD)
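The gap described above is the difference between "the host answers ping" and "the name server answers queries". As an illustration of the latter, here is a minimal UDP DNS probe using only the Python standard library. It is a sketch under assumptions: the query name example.com, the SOA query type, the RD flag and the 1-second timeout are placeholders, and it checks only that a response with a matching ID and RCODE 0 (NOERROR) comes back.

```python
import random
import socket
import struct


def build_query(name, qtype=6, query_id=None):
    """Build a minimal DNS query packet (default QTYPE 6 = SOA, class IN)."""
    if query_id is None:
        query_id = random.randint(0, 0xFFFF)
    # Header: ID, flags (only RD set), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = [label.encode("ascii") for label in name.rstrip(".").split(".") if label]
    qname = b"".join(bytes([len(label)]) + label for label in labels) + b"\x00"
    return query_id, header + qname + struct.pack("!HH", qtype, 1)


def response_ok(packet, query_id):
    """True if the packet is a response to our query with RCODE 0 (NOERROR)."""
    if len(packet) < 12:
        return False
    resp_id, flags = struct.unpack("!HH", packet[:4])
    is_response = bool(flags & 0x8000)  # QR bit
    rcode = flags & 0x000F
    return resp_id == query_id and is_response and rcode == 0


def probe(server, name="example.com", port=53, timeout=1.0):
    """Send one query over UDP and report whether a sane answer came back."""
    query_id, packet = build_query(name)
    family = socket.AF_INET6 if ":" in server else socket.AF_INET
    with socket.socket(family, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(packet, (server, port))
            reply, _ = sock.recvfrom(4096)
        except OSError:  # includes timeouts and unreachable-port errors
            return False
    return response_ok(reply, query_id)
```

On an authoritative server the probe name would typically be a zone the server is expected to serve, so that SERVFAIL or REFUSED counts as a failure.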
Re: [dns-operations] DNS load-balancing/failover using an ASR 9xxx (few questions)
On 14 August 2014 18:48, Jake Zack jake.z...@cira.ca wrote:
> In the ASR 9xxx series with IOS XR, the “ipsla” that it has available doesn’t seem to do either TCP connections or UDP DNS queries. It seems my only real option is to monitor for ICMP reachability and nothing else.
>
> Anyone have a better solution? I’ve considered throwing a wrapper around BIND doing OSPF updates and such…but it seems unideal.

BGP sessions between the ASR 9 and each DNS server in the cluster, ExaBGP running on them announcing their loopback/service /32 + /128 address(es).

Health check scripts on each service to probe for service ability, retract the announcement upon failure.

--
Nat
https://nat.ms
+44 7531 750292
Re: [dns-operations] DNS load-balancing/failover using an ASR 9xxx (few questions)
BFD and qualified next hops if you don't want to do the whole-hog BGP solution that Nat proposed.
Re: [dns-operations] DNS load-balancing/failover using an ASR 9xxx (few questions)
On Fri, Aug 15, 2014 at 01:47:38AM +0100, Alex Howells wrote:
> BFD and qualified next hops if you don't want to do the whole-hog BGP solution that Nat proposed.

What are you using for BFD on the DNS server side? Do you do a DNS health check and stop the DNS server side BFD daemon if it fails?

Thanks,
Adi
Re: [dns-operations] DNS load-balancing/failover using an ASR 9xxx (few questions)
On 15 August 2014 02:05, R.P. Aditya adi...@grot.org wrote:
> On Fri, Aug 15, 2014 at 01:47:38AM +0100, Alex Howells wrote:
>> BFD and qualified next hops if you don't want to do the whole-hog BGP solution that Nat proposed.
>
> what are you using for bfd on the DNS server side?

https://github.com/dyninc/OpenBFDD