Re: recursive-clients : recommended value for a high traffic recursive nameserver
Hello,

It may or may not be relevant, but this sounds similar to a problem we had to solve a few months ago.

Try the following query analysis: monitor the number of recursive queries in progress at a given moment, and when it exceeds a certain threshold, send "rndc recursing" to BIND and have a look at the queries.

Basically, we found out that there is an ongoing attack originating from China with the following structure: a number of bogus domains are registered, like "345qp.com.cn", then the target nameservers are listed as authoritative for them, and vast botnets of infected home routers/modems are told to send bogus queries for those domains.

Your resolvers will start having the problems you describe when the admin of the attacked authoritative servers realizes what is going on and stops responding to queries for these domains. That means your resolvers have to wait for the timeout of each and every one of these bogus queries, and in the meantime each pending query ties up memory and processing time. It adds up rather quickly, potentially overwhelming your hardware (basically, it is a huge abnormal peak contrasting with normal operation).

The solution we chose is to identify these bogus queries (they vastly outnumber legitimate queries) and to "blacklist" the given bogus domain for a period of time, in the sense that we no longer perform a recursive query for the client but immediately and authoritatively answer NXDOMAIN. While this is a deviation from correct behavior, it conserves the resources of the resolver, because an immediate authoritative answer takes a fraction of the time, memory and CPU.

False positives are of course possible, but with some degree of monitoring and with whitelisting of domains where a false positive would be a problem (like google.com, yahoo.com, etc.), they can be rather rare.

Hope this helps, don't hesitate to ask me for details. I think it may be relevant to your situation.

-- 
Best Regards,
Daniel Ryšlink
System Administrator
Dial Telecom a. s.
Křižíkova 36a/237
186 00 Praha 3, Czech Republic
Tel.: +420.226204627
daniel.rysl...@dialtelecom.cz
www.dialtelecom.cz
Dial Telecom, a.s.
Simply connect

On 11/24/2014 12:37 PM, Niall O'Reilly wrote:
> At Sun, 23 Nov 2014 21:00:15 -0800 (PST), blrmaani wrote:
>> Our nameservers take up to 10KQPS (mostly NOERROR type most of the time).
>>
>> Twice or thrice a week, I have seen up to 10% of the queries are
>> SERVFAIL and we have started exceeding the default value of 2000 for
>> recursive-clients settings in BIND 9.9.x.
>>
>> Is there a recommended value for recursive-clients option assuming
>> huge number of SERVFAIL queries once every 2 to 3 days?
>>
>> I'm not convinced to increase it to some arbitrary huge number
>> 20,000 or 200,000.
>>
>> I am looking for answer like - if your peak SERVFAIL queries are
>> 2000/second, then your recursive-clients value should be N.
>
> I wouldn't expect that such an answer could make sense.
>
> Exhaustion of the active recursive-clients list and the generation
> of responses marked SERVFAIL are most likely different symptoms of
> the same problem. I think you'll need to identify this problem and
> then determine what action to take.
>
> Your resolver seems to be dealing with queries which are unanswerable
> and which are arriving in a quantity sufficient to fill the
> recursive-clients list. This may be due to rogue clients,
> misconfigured authoritative servers, network problems, or some
> combination of these. Your logs will help identify which.
>
> I hope this helps.
> Niall O'Reilly

___
Please visit https://lists.isc.org/mailman/listinfo/bind-users to unsubscribe from this list

bind-users mailing list
bind-users@lists.isc.org
https://lists.isc.org/mailman/listinfo/bind-users
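As a rough illustration of the query analysis Daniel describes in this message (dump the pending recursive queries, count them per domain, and flag domains that dominate the list unless they are whitelisted), a minimal Python sketch follows. It is not the tooling used at Dial Telecom, which the thread does not show. The dump line shape matched by QUERY_RE, the function names, the threshold, and the fixed "last three labels" grouping are all assumptions made for illustration; check the regular expression against the file your BIND version actually writes, and note that a production version would need a proper public-suffix lookup instead of a fixed label count.

```python
import re
from collections import Counter

# ASSUMPTION: each pending query in the dump appears on a line that contains
# the query as 'name/type/class' in single quotes, for example:
#   ; client 192.0.2.10#53124: id 4711 'abc.345qp.com.cn/A/IN' requesttime 12
# The thread does not show the file, so verify this against your own dump.
QUERY_RE = re.compile(r"'([^'/\s]+)/[A-Za-z0-9]+/[A-Za-z0-9]+'")


def parent_domain(qname, labels=3):
    """Return the last `labels` labels of a query name, lowercased."""
    parts = qname.rstrip(".").lower().split(".")
    return ".".join(parts[-labels:])


def is_whitelisted(domain, whitelist):
    """True if `domain` equals a whitelisted name or is a subdomain of one."""
    return any(domain == w or domain.endswith("." + w) for w in whitelist)


def find_suspects(dump_lines, threshold, whitelist=(), labels=3):
    """Count pending queries per parent domain.

    Returns {domain: count} for domains with at least `threshold` pending
    queries that are not covered by the whitelist.
    """
    allowed = [w.rstrip(".").lower() for w in whitelist]
    counts = Counter()
    for line in dump_lines:
        match = QUERY_RE.search(line)
        if match:
            counts[parent_domain(match.group(1), labels)] += 1
    return {
        domain: n
        for domain, n in counts.items()
        if n >= threshold and not is_whitelisted(domain, allowed)
    }
```

The returned dictionary is only a candidate list for temporary blocking; how the NXDOMAIN answers are then served is not described in the thread and is left out here.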
Re: recursive-clients : recommended value for a high traffic recursive nameserver
At Sun, 23 Nov 2014 21:00:15 -0800 (PST), blrmaani wrote:
> Our nameservers take up to 10KQPS (mostly NOERROR type most of the time).
>
> Twice or thrice a week, I have seen up to 10% of the queries are
> SERVFAIL and we have started exceeding the default value of 2000 for
> recursive-clients settings in BIND 9.9.x.
>
> Is there a recommended value for recursive-clients option assuming
> huge number of SERVFAIL queries once every 2 to 3 days?
>
> I'm not convinced to increase it to some arbitrary huge number
> 20,000 or 200,000.
>
> I am looking for answer like - if your peak SERVFAIL queries are
> 2000/second, then your recursive-clients value should be N.

I wouldn't expect that such an answer could make sense.

Exhaustion of the active recursive-clients list and the generation of responses marked SERVFAIL are most likely different symptoms of the same problem. I think you'll need to identify this problem and then determine what action to take.

Your resolver seems to be dealing with queries which are unanswerable and which are arriving in a quantity sufficient to fill the recursive-clients list. This may be due to rogue clients, misconfigured authoritative servers, network problems, or some combination of these. Your logs will help identify which.

I hope this helps.

Niall O'Reilly
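One way to see why a single recommended N is hard to state is a back-of-envelope estimate using Little's law (average number of items in a system = arrival rate × average time each item spends in it). The sketch below is only an illustration of that arithmetic, not a sizing rule from BIND or from anyone in this thread. The rate of 1000 unanswerable queries per second is taken from the figures in the question (10% of 10K QPS); the hold times are hypothetical, since the thread does not say how long each unanswerable query occupies a recursive-clients slot.

```python
def estimated_pending_clients(unanswerable_qps, hold_seconds):
    """Little's law: average number pending = arrival rate * time held."""
    return unanswerable_qps * hold_seconds


if __name__ == "__main__":
    # 10% of 10K QPS, as reported in the question.
    unanswerable_qps = 1000
    # HYPOTHETICAL hold times in seconds; the real value depends on the
    # resolver's retry and timeout behaviour.
    for hold_seconds in (1, 2, 5, 10):
        pending = estimated_pending_clients(unanswerable_qps, hold_seconds)
        print(f"{hold_seconds:>2} s held -> about {pending} pending clients")
```

Under these assumptions the estimate scales directly with how long each query stays pending, so any fixed limit is overrun once the unanswerable traffic or the hold time grows. That is consistent with treating the full recursive-clients list as a symptom to investigate and not only as a number to raise.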
recursive-clients : recommended value for a high traffic recursive nameserver
Our nameservers take up to 10K QPS (mostly NOERROR most of the time).

Twice or thrice a week, I have seen up to 10% of the queries return SERVFAIL, and we have started exceeding the default value of 2000 for the recursive-clients setting in BIND 9.9.x.

Is there a recommended value for the recursive-clients option, assuming a huge number of SERVFAIL queries once every 2 to 3 days?

I'm not convinced that I should increase it to some arbitrarily huge number like 20,000 or 200,000.

I am looking for an answer like: if your peak SERVFAIL queries are 2000/second, then your recursive-clients value should be N.

Please help! Thanks in advance,
Blr