Re: [Pdns-dev] PDNS Recursor functionality request re:SERVFAIL outages of today

John Todd Sat, 22 Oct 2016 09:52:18 -0700



On 21 Oct 2016, at 16:53, John Todd wrote:

As most of you know by now, today’s DynDNS outage due to a DDoS attack caused fairly widespread outages across a large number of domains. Authoritative nameservers seem to be a particularly interesting target for attackers, as they are often smaller in scope (IP address range, transit capacity of authoritative nameserver networks) than a full-service offering from a provider of multiple other services such as HTTP. There may be some reasonable ways to respond to outages like this, implementable in DNS recursors, which at a minimum will result in failures that are less “bad” than having no replies at all.
I’d like to propose an extension to PowerDNS Recursor for (partially) mitigating events like today’s, where major authoritative nameservers were put out of commission. This might be a particularly foolish or error-prone method; it only took me a few minutes to think up. But I’d at least like to hear a discussion of why this isn’t a good idea. The objection “But this might end up giving out the wrong answer!” is valid, but I view a wrong answer as better than no answer. What would a domain operator USUALLY want? They’d want to get the inbound connection, rather than having users completely offline. This seems particularly valuable for TLDs and other low-churn zones which may come under attack for various political reasons but which contain a significant number of NS records.
Having done plenty of OSS work, I’m sure the next comment will be “patches welcome.” ;-) I would be happy to pay someone a small amount to write this, but I have little budget, high hopes, and no coders on staff at this level yet; otherwise I would do just that.
PowerDNS Recursor proposed feature extensions:

servfail-ttl-override
* Integer
* Default: 180
The recursor keeps all records for this many seconds after their TTL expires. Once the authoritative-provided TTL has expired, the query is looked up in the normal way. If that lookup fails with a SERVFAIL, the TTL timer on the “old” record is reset to zero and the “old” record is returned as the response. If an authoritative server is marked as “down” due to repeated SERVFAIL responses (see packetcache-servfail-ttl), the “old” record is handed back immediately without a new query attempt, and the TTL timer is reset to zero. This keeps the answer perpetually valid for as long as queries keep arriving within the servfail-ttl-override interval and the authoritative server keeps resulting in SERVFAIL. (packetcache-servfail-ttl runs on a rotating timer and will retry every X seconds, so a single query gets delayed during each retry cycle; other queries are answered immediately with the “old” answer.) An NXDOMAIN response from an authoritative server immediately clears “old” records from memory.

This timer method is useful when authoritative nameservers are being DDoS’ed and cannot provide responses, on the principle that some answer is better than no answer. If a domain operator wishes to stop traffic to their site, replying with NXDOMAIN negates this behavior. Only an unreachable nameserver will cause this cache to be used as a last resort, and a timer caps how long these old records are kept. Setting this value low means that heavily trafficked websites will typically still get an answer even if their authoritative nameservers are unreachable due to attack or network disconnection, but less frequently queried domains may be evicted from the cache, leading to query failures. Setting this value high may lead to unexpected results for infrequently used domains which serve dynamic results.
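To make the lookup flow above concrete, here is a rough Python sketch of it. It is not PowerDNS code: the StaleCache class, the ServFail/NXDomain exceptions and the resolve callback are hypothetical stand-ins, and the sketch deliberately leaves out the “down” server tracking tied to packetcache-servfail-ttl.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


class ServFail(Exception):
    """Hypothetical: upstream resolution failed (authoritative servers unreachable)."""


class NXDomain(Exception):
    """Hypothetical: an authoritative server answered NXDOMAIN."""


@dataclass
class Entry:
    answer: str
    ttl: float        # authoritative-provided TTL, in seconds
    stored_at: float  # when the TTL timer was last started (or reset to zero)


class StaleCache:
    """Sketch of the proposed servfail-ttl-override behaviour (not PowerDNS code)."""

    def __init__(self, resolve: Callable[[str], Tuple[str, float]],
                 servfail_ttl_override: float = 180,
                 clock: Callable[[], float] = time.monotonic):
        self.resolve = resolve  # returns (answer, ttl) or raises ServFail/NXDomain
        self.override = servfail_ttl_override
        self.clock = clock
        self.entries: Dict[str, Entry] = {}

    def lookup(self, name: str) -> str:
        now = self.clock()
        entry: Optional[Entry] = self.entries.get(name)
        if entry is not None:
            age = now - entry.stored_at
            if age < entry.ttl:
                return entry.answer              # still within its TTL
            if age >= entry.ttl + self.override:
                del self.entries[name]           # past the grace window: forget it
                entry = None
        try:
            answer, ttl = self.resolve(name)     # TTL expired: normal lookup
        except NXDomain:
            self.entries.pop(name, None)         # NXDOMAIN clears the old record
            raise
        except ServFail:
            if entry is None:
                raise                            # nothing old to fall back on
            entry.stored_at = now                # reset the TTL timer to zero
            return entry.answer                  # hand back the "old" answer
        self.entries[name] = Entry(answer, ttl, now)
        return answer
```

The clock is injectable only so the timer behaviour can be exercised without waiting in real time.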
servfail-ttl-override-domain-exceptions
* Domains, comma separated

List of domains for which the servfail-ttl-override method is never used

servfail-ttl-override-server-exceptions
* IP addresses, comma separated

List of authoritative servers for which the servfail-ttl-override method is never used
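A small sketch of how the two exception lists might be checked before falling back to an old record. Two assumptions go beyond the proposal: a listed domain is taken to cover its subdomains too, and server addresses are compared as parsed IP addresses so that equivalent IPv6 spellings match. override_allowed and its helpers are hypothetical names.

```python
import ipaddress


def parse_list(value: str) -> list:
    """Split a comma-separated setting value, ignoring blanks and whitespace."""
    return [item.strip() for item in value.split(",") if item.strip()]


def is_subdomain(qname: str, domain: str) -> bool:
    """Assumption: an exception for example.com also covers www.example.com."""
    q = qname.rstrip(".").lower()
    d = domain.rstrip(".").lower()
    return q == d or q.endswith("." + d)


def override_allowed(qname: str, server_ip: str,
                     domain_exceptions: str, server_exceptions: str) -> bool:
    """Return False if either hypothetical exception list rules out stale answers."""
    if any(is_subdomain(qname, d) for d in parse_list(domain_exceptions)):
        return False
    ip = ipaddress.ip_address(server_ip)
    if any(ip == ipaddress.ip_address(s) for s in parse_list(server_exceptions)):
        return False
    return True
```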
JT

After some thought in the shower this morning, I think I need to update my original proposal. Instead of the refreshed timer being the TTL of the original record, the new TTL should be set to packetcache-servfail-ttl. This means that a refreshed record will only stay in the cache as long as the authoritative server is unreachable.
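A minimal sketch of the amended refresh step, assuming a cache entry that holds the answer, its TTL, and the time its TTL timer was started. StaleEntry, refresh_on_servfail and is_fresh are hypothetical names; packetcache_servfail_ttl stands for the value of the packetcache-servfail-ttl setting.

```python
from dataclasses import dataclass


@dataclass
class StaleEntry:
    answer: str
    ttl: float        # seconds the entry counts as fresh after stored_at
    stored_at: float  # when the TTL timer was last started


def refresh_on_servfail(entry: StaleEntry, now: float,
                        packetcache_servfail_ttl: float) -> StaleEntry:
    # Amended proposal: restart the timer, but keep the old answer fresh only
    # for packetcache-servfail-ttl seconds instead of the record's original TTL,
    # so it is re-checked as soon as the authoritative server is retried.
    return StaleEntry(entry.answer, packetcache_servfail_ttl, now)


def is_fresh(entry: StaleEntry, now: float) -> bool:
    return now - entry.stored_at < entry.ttl
```

For example, a record with a 3600-second TTL refreshed at time 5000 with packetcache_servfail_ttl=60 stays fresh until 5060, rather than until 8600 under the original wording.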


JT

_______________________________________________
Pdns-dev mailing list
Pdns-dev@mailman.powerdns.com
https://mailman.powerdns.com/mailman/listinfo/pdns-dev
