Re: Notify storms
I agree - we are removing half the masters in a couple of weeks to help things. Slaves only talk to masters; there are no "slaves of slaves", as we refer to them. Our architecture goal has been uniformity among the configurations, and this is part of the price we pay for that. At a functional level, this will persist, just not with as much traffic.

As a result of our use of CVS, and our inability to control notifies, whenever we push out big updates the first master takes all the traffic while the others sit there unused. We've experimented with breaking up the notifies by pushing updates in chunks to various masters, but that really breaks things, both process-wise and logically.

What we really need is some way to say "there is an update, and any one of these servers has an acceptable version of it" instead of "hey, there's an update", "hey, there's an update", "hey, there's an update", with each slave going to each of the masters.

While it seems trivial operationally to handle these loads, and we're not concerned about network bandwidth, a single master can only manage so many transactions at once. I doubt we're even in the top 50% of deployments, zone count wise, so I am confident that our number of zones isn't an issue. But I suspect that 80+ slaves are a little out there. With 80 slaves and 1800 zones, each master sends out 144000 notifies (for major changes/a master reload), which triggers 144000 SOA queries back to the master very quickly. That is bound to cause delays.

One option we've considered is making our MASTERS and NS records point to an anycast IP/load balancer so that there can be multiple masters answering for the same notify. Another option would be to stop all notifies altogether, then figure out a way to manually trigger notifies (generating notifies via a perl script/something clever) so we can control where the notifies come from.

When all the EU DNS servers get notifies first from an NA master, they grab the data from there, so being able to control notifies would be nice sometimes.

Thankfully we're mid-rearchitecture, and this will (hopefully) be torn out soon, but until it is we need to make sure that our users can manage their changes in a reasonable manner. A for loop doing rndc retransfer for changed zones, which seems to bypass all the congestion, is a short term fix until we can figure out how to make things a little smoother.

Apologies for the wall of text - this is a frequent discussion with very little in the way of conclusion around here :)

Todd.

On Wed, Jan 20, 2010 at 10:33 PM, Joseph S D Yao j...@tux.org wrote:
> On Wed, Jan 20, 2010 at 03:52:33PM -0500, Todd wrote:
>> serial-query-rate
>> While this appears to be helping in the lab, it's still taking between 2 and 3 minutes for each slave to even finish receiving the NOTIFYs from the master. They then start hitting the master(s) with SOA queries, which seems to take a really long time.
> Your NOTIFY tree sounds like it's many-to-many. Maybe you should be using a sparser tree.
> --
> /*\ ** ** Joe Yao j...@tux.org - Joseph S. D. Yao ** \*/

___
bind-users mailing list
bind-users@lists.isc.org
https://lists.isc.org/mailman/listinfo/bind-users
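The arithmetic in the message above, and the short-term rndc retransfer loop it mentions, can be sketched roughly as follows. This is only an illustration, not the poster's script: the helper names and the example zone names are hypothetical, and the commands are built as argument lists rather than executed.

```python
def notify_count(slaves: int, zones: int) -> int:
    """NOTIFY messages one master sends when every zone changes at once."""
    return slaves * zones


def retransfer_commands(changed_zones):
    """Build one `rndc retransfer <zone>` command per changed zone.

    The commands are returned as argument lists; on a slave they could be
    passed to subprocess.run() one at a time (an assumption about how the
    loop would be driven, not something stated in the message).
    """
    return [["rndc", "retransfer", zone] for zone in changed_zones]


if __name__ == "__main__":
    # Figures from the message: 80 slaves and 1800 zones.
    print(notify_count(80, 1800))
    # Hypothetical zone names, for illustration only.
    for cmd in retransfer_commands(["example.com", "example.net"]):
        print(" ".join(cmd))
```

Each NOTIFY is expected to trigger an SOA query back to the master, so the same product also gives the number of SOA queries the message describes.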
named 9.6.1 Filling wtmp
We have BIND 9.6.1-P3 running on several AIX 5.3 servers. On one of them, named is filling /var/adm/wtmp with numerous entries like the following.

user pts/1 pts/1 7 1327240 1264089183 host-NN.domain Thu Jan 21 10:53:03 EST 2010
named 8 2572472 1264089217Thu Jan 21 10:53:37 EST 2010
named 8 2572472 1264089217Thu Jan 21 10:53:37 EST 2010
named 8 2572472 1264089277Thu Jan 21 10:54:37 EST 2010
named 8 2572472 1264089277Thu Jan 21 10:54:37 EST 2010
named 8 2572472 1264089337Thu Jan 21 10:55:37 EST 2010
named 8 2572472 1264089337Thu Jan 21 10:55:37 EST 2010
named 8 2572472 1264089337Thu Jan 21 10:55:37 EST 2010
named 8 2572472 1264089397Thu Jan 21 10:56:37 EST 2010
named 8 2572472 1264089397Thu Jan 21 10:56:37 EST 2010
named 8 2572472 1264089397Thu Jan 21 10:56:37 EST 2010
named 8 2572472 1264089457Thu Jan 21 10:57:37 EST 2010
named 8 2572472 1264089457Thu Jan 21 10:57:37 EST 2010
named 8 2572472 1264089457Thu Jan 21 10:57:37 EST 2010
named 8 2572472 1264089517Thu Jan 21 10:58:37 EST 2010
named 8 2572472 1264089517Thu Jan 21 10:58:37 EST 2010
named 8 2572472 1264089517Thu Jan 21 10:58:37 EST 2010
named 8 2572472 1264089577Thu Jan 21 10:59:37 EST 2010
named 8 2572472 1264089577Thu Jan 21 10:59:37 EST 2010
named 8 2572472 1264089577Thu Jan 21 10:59:37 EST 2010
named 8 2572472 1264089637Thu Jan 21 11:00:37 EST 2010
named 8 2572472 1264089637Thu Jan 21 11:00:37 EST 2010
named 8 2572472 1264089637Thu Jan 21 11:00:37 EST 2010
named 8 2572472 1264089697Thu Jan 21 11:01:37 EST 2010
named 8 2572472 1264089697Thu Jan 21 11:01:37 EST 2010
named 8 2572472 1264089697Thu Jan 21 11:01:37 EST 2010
named 8 2572472 1264089757Thu Jan 21 11:02:37 EST 2010
named 8 2572472 1264089757Thu Jan 21 11:02:37 EST 2010
named 8 2572472 1264089757Thu Jan 21 11:02:37 EST 2010
named 8 2572472 1264089817Thu Jan 21 11:03:37 EST 2010
named 8 2572472 1264089817Thu Jan 21 11:03:37 EST 2010
named 8 2572472 1264089817Thu Jan 21 11:03:37 EST 2010
named 8 2572472 1264089877Thu Jan 21 11:04:37 EST 2010
named 8 2572472 1264089877Thu Jan 21 11:04:37 EST 2010
named 8 2572472 1264089937Thu Jan 21 11:05:37 EST 2010
named 8 2572472 1264089937Thu Jan 21 11:05:37 EST 2010

What is going on? How do we correct this issue?
Re: Strange CNAME issue
Thanks for your response. I didn't know about the +trace option in dig. After some more searching, I believe you are correct about the long responses being related. The responses that fail all seem to exceed 512 bytes. Why this would happen in multiple locations is a mystery, but perhaps our firewalls are configured similarly. I'll look into the firewall settings on my end, but since there may be other devices out there configured similarly, I'll need to try to reduce my CNAMEs to fit into a 512-byte response or switch them to A records.

-seren

On Jan 20, 2010, at 1:48 AM, Niall O'Reilly wrote:
> seren wrote:
>> Hi, I've run into some strange issues with BIND and CNAMES.
> The examples you show indicate strange issues only with whatever name server code is running on your localhost. Nothing in your examples actually identifies this as BIND.
>> We're using BIND9 (on Ubuntu) internally and have our external DNS hosted by NetworkSolutions. Occasionally I'll be able to create a CNAME in NetworkSolutions that BIND is unable to resolve. Using dig I notice it's doing a query for an A record,
> This is the record type used by dig in default of a specific type on the command line.
>> and in most cases this works even if the entry is a CNAME. In the cases where it fails, I see either a timeout error or a SERVFAIL.
> Your local instance of named is respectively either not responding, or reporting an error. Have you looked in your logs for more information? Have you tried 'dig +trace'?
>> If I then do a dig query specifying a CNAME, I get a quick successful result and subsequent queries to BIND succeed, until the record expires from the cache. The records that fail don't seem to have anything in common besides them all being CNAMES and longer names seeming to fail more. Both BIND9 and two windows-based DNS servers fail with the exact same records, however Google (8.8.8.8) and several other public DNS services resolve them fine.
> I think you need to ask what's different between (on the one hand) your BIND9 and windows-based name servers and (on the other) name servers which you tell us work: if not in the configuration, then in the environment. Are all of your failing name servers behind the same firewall? If so, does the firewall allow DNS queries and responses over TCP as well as UDP? Does the firewall perhaps break long responses? I ask because I've noticed some truncation and fallback to TCP when I use 'dig +trace' to query for one of the names you've mentioned as failing.
>
> Best regards,
> Niall O'Reilly
> University College Dublin IT Services
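One way to check the 512-byte observation is to read the size that dig reports in its ";; MSG SIZE  rcvd: N" line. The sketch below only parses text that has already been captured from dig; the helper names are hypothetical, and the sample string in the usage section is a made-up line in dig's format, not output from the names discussed in this thread.

```python
import re

# The classic limit for a DNS response over UDP when EDNS is not in use.
CLASSIC_UDP_LIMIT = 512


def response_size(dig_output: str):
    """Return the byte count from dig's 'MSG SIZE  rcvd:' line, or None."""
    match = re.search(r"MSG SIZE\s+rcvd:\s*(\d+)", dig_output)
    return int(match.group(1)) if match else None


def exceeds_classic_limit(dig_output: str) -> bool:
    """True when the reported response is larger than 512 bytes."""
    size = response_size(dig_output)
    return size is not None and size > CLASSIC_UDP_LIMIT


if __name__ == "__main__":
    sample = ";; MSG SIZE  rcvd: 612\n"  # fabricated sample line
    print(response_size(sample), exceeds_classic_limit(sample))
```

A response flagged this way is a candidate for the firewall checks suggested above (UDP payloads over 512 bytes, and DNS over TCP).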
Added new master zone, copy .hosts does not replicate properly
We run two BIND servers in a master/master configuration in two geographically separate datacenters. This is done because a master/slave configuration has certain limitations if the master goes down (the slave cannot be easily modified).

The setup we have works great -- we make changes to A or MX records on MASTER-A, and a script that runs every 5 minutes copies these files to MASTER-B: /var/named/chroot/var/named/*.hosts and *.rev

I then created a new master zone on MASTER-A, call it test.com, which creates a new file test.com.hosts on MASTER-A. This file is replicated (copied) to MASTER-B, but even though the script reloads BIND on MASTER-B, the new zone does not work there. I think there might be a file that has a list of all the hosts, or is there some other file I also need to copy over? I checked the directory structures and cannot find it.

So my setup has been working great for modifying existing zones, adding and removing records. But when I add a new zone, it apparently does not work. So I think I am missing an important file that lists all the zones BIND uses? Which BIND file do I need to copy to properly replicate MASTER-A to MASTER-B?

I hope this is something very simple ..
Re: Added new master zone, copy .hosts does not replicate properly
On 1/21/10 3:40 PM, Ryan S wrote:
> So my setup has been working great modifying existing zones adding and removing records. But when I add a new zone, it apparently does not work. So I think I am missing an important file that lists all the zones BIND uses? What BIND file am I needing to copy to properly replicate MASTER-A to MASTER-B? I hope this is something very simple ..

named.conf, perhaps?
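The point of the reply is that named only serves zones declared in named.conf, so a copied zone file with no matching zone statement is never loaded. A rough way to spot that situation is sketched below. It assumes, as in the question, that each zone file is named <zone>.hosts; the helper names are hypothetical, and the regular expression is a simplification that ignores /* ... */ block comments and include directives.

```python
import re
from pathlib import Path

# Matches lines such as:  zone "test.com" IN {
# Lines that start with // or # do not match, so commented-out zones are skipped.
ZONE_RE = re.compile(r'^\s*zone\s+"([^"]+)"', re.MULTILINE)


def configured_zones(named_conf_text: str) -> set:
    """Zone names declared in the given named.conf text."""
    return set(ZONE_RE.findall(named_conf_text))


def unconfigured_zone_files(named_conf_text: str, zone_dir, suffix=".hosts"):
    """Zone files in zone_dir that have no zone statement in named.conf."""
    zones = configured_zones(named_conf_text)
    return sorted(
        path.name
        for path in Path(zone_dir).glob(f"*{suffix}")
        if path.name[: -len(suffix)] not in zones
    )
```

Run on MASTER-B against its own named.conf and zone directory, a non-empty result would suggest that the zone statement for the new zone was never copied over.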
Re: named 9.6.1 Filling wtmp
In message 6b845b73-065f-4e8b-afa5-408ecdbe7...@govnet.state.vt.us, David Kreindler writes:
> We have BIND 9.6.1-P3 running on several AIX 5.3 servers. On one of them, named is filling /var/adm/wtmp with numerous entries like the following.

This is not named (the program). It may be su or some other process that is logging changes in uid. Or it could be someone logging in as the user named.

Mark

> user pts/1 pts/1 7 1327240 1264089183 host-NN.domain Thu Jan 21 10:53:03 EST 2010
> named 8 2572472 1264089217Thu Jan 21 10:53:37 EST 2010
> named 8 2572472 1264089217Thu Jan 21 10:53:37 EST 2010
> named 8 2572472 1264089277Thu Jan 21 10:54:37 EST 2010

--
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742 INTERNET: ma...@isc.org
Re: Fatal Error in resolver.c
Thank you very much for your bug report. For your information, you can also submit bugs to our bind9-bugs AT isc.org email address. Your issue is now being tracked as ticket # 20923.
BIND 9.6 - IXFR to slaves not working after manual zone edit.
Hi folks,

According to DNS and BIND (the book) and other sources, BIND from 9.3 on is able to calculate differences for a zone file that is manually edited by using the following methodology:

    rndc freeze zone
    perform-updates-on-the-zone-file
    rndc thaw zone

http://books.google.com/books?id=HggtWI1ShvMCpg=PT263dq=%22this+means+you+can+now+%28or+again%29+edit+zone+datafiles+manually.%22cd=1#v=onepageq=f=false

The setting ixfr-from-differences yes; appears in options { } on both the master and slave servers.

I have created a script that does what is specified. The perform-updates-on-the-zone-file part of the process reads a list of IPs from a proprietary database and recreates the zone file. The result is the same zone file with a new serial number and some changes in the IPs that are included (A records for a blocking list). The rest of the file appears unchanged. This is as if the file had been edited by hand to change a few of the A records and update the serial number with no other changes.

I am performing this operation on an RHEL 5 server using an RPM from FC that includes BIND 9.6.1. BIND (named) is running chrooted. The zone file is located in /var/named/chroot/var/named/dynamic so that named can write to it and create the .jnl file.

The design of the system is for this server to calculate the differences and then provide the changes to slave servers via IXFR for delivery to end user systems etc.

The problem: With each update the entire zone is being retransmitted via AXFR. Since this is on the order of 2 - 5 million A records where only a handful will have changed between updates (about 10 minutes apart) -- AXFR is _NOT_ desired.

Additional symptoms: BIND appears to be unable to see the .jnl file after it is successfully created. It complains at each update that the .jnl file does not exist and then recreates it. However, it is clear that the .jnl file _does_ exist and has the correct permissions for named to see it.

Log says:

    journal file /var/named/dynamic/blacklist.example.com.zone.jnl does not exist, creating it

However ls says:

    [r...@server etc]# ls -l /var/named/chroot/var/named/dynamic
    total 56704
    -rwxrwx--- 1 named named 57852592 Jan 21 20:11 blacklist.example.com.zone
    -rw-r--r-- 1 named named 141770 Jan 21 20:12 blacklist.example.com.zone.jnl

Additional info: The master and slave servers are using the same version of BIND in the same configuration. The slave servers are working correctly as a resolver farm and have been for quite some time. Once transferred, queries to the zone work as expected. The problem is strictly how the transfers are done.

As an experiment I tried using nsupdate to change a single record at a time in the zone. Each time I changed a single record, the slaves were notified and performed a full download of the zone again via AXFR.

Anyone have any idea what's going on?

Why doesn't IXFR work as specified?
Why does BIND complain that the .jnl file for the zone does not exist when it is clearly visible?
Why does a single record change via nsupdate also cause a full zone transfer to slaves?

Thanks in advance!

_M
Re: Strange CNAME issue
In message a9981203-ca2a-4ba2-b95b-08d992178...@mellmo.com, seren writes:
> Thanks for your response. I didn't know about the +trace option in dig. After some more searching, I believe you are correct about the long responses being related. The responses that fail all seem to exceed 512-bytes. Why this would happen in multiple locations is a mystery but perhaps our firewalls are configured similarly. I'll look into the firewall settings on my end, but since there may be other devices out there configured similarly I'll need to try and reduce my CNAMES to fit into a 512-byte response or switch them to A records.
>
> -seren

Some firewall vendors / operators think that all UDP DNS responses are at most 512 bytes of payload. This has not been the case officially for over a decade now with EDNS, and was never the case in practice, as there have always been servers that sent larger responses for as long as I've been working with DNS, ~20 years now.

Some firewall vendors / operators think that TCP DNS is only used for AXFR. This has *never* been the case.

One or both of these may be the problem.

Mark
--
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742 INTERNET: ma...@isc.org
Re: BIND 9.6 - IXFR to slaves not working after manual zone edit.
First of all, decide where you want the delta to be computed. On the master or on the slaves? Currently you are asking all servers to compute the delta.

Secondly, if you are not using dynamic update to modify the master zone, you don't need to freeze the master zone, as named is NOT writing to it. Just replace and reload will work.

Thirdly, rndc freeze removes the journal.

Fourthly, ixfr-from-differences forces the slave to make AXFR requests.

Lastly, dynamic zones and ixfr-from-differences are currently mutually exclusive. This may or may not change in future releases.

Mark
--
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742 INTERNET: ma...@isc.org
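The "replace and reload" approach from the reply above could look roughly like the sketch below, for a zone that is not being changed by dynamic update. It is an illustration under assumptions, not a tested procedure: the helper names, the file mode and any zone name passed in are hypothetical, and the rndc command is only built, not run.

```python
import os
import tempfile


def replace_zone_file(zone_path: str, new_contents: str, mode: int = 0o640) -> None:
    """Write the regenerated zone next to the old file, then swap it in.

    os.replace() renames within one directory, so named never sees a
    half-written zone file.
    """
    directory = os.path.dirname(os.path.abspath(zone_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".zone-")
    with os.fdopen(fd, "w") as tmp:
        tmp.write(new_contents)
    # mkstemp creates an owner-only file; the mode here is an assumption
    # about what named needs in order to read the zone.
    os.chmod(tmp_path, mode)
    os.replace(tmp_path, zone_path)


def reload_command(zone: str):
    """Argument list for `rndc reload <zone>`, to be run after the swap."""
    return ["rndc", "reload", zone]
```

With this flow there is no rndc freeze or thaw step, which matters here because, as noted above, rndc freeze removes the journal.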