Re: BIND slave stops updating from master after 1-3 days
That's certainly disconcerting (and diverges from the behavior we continue to see with BIND 9.3). Is there any reason these updates would work without issue immediately after a restart but stop working at some point later? As you can see in the logs I provided in my initial post (relevant lines copied below), it does work as I described after a restart, for an as-yet-undetermined amount of time:

29-Jul-2013 10:43:34.879 notify: info: client 10.0.4.1#42576: received notify for zone 'example.com'
29-Jul-2013 10:43:34.890 general: info: zone example.com/IN: serial number (2011061500) received from master 10.0.1.1#53 < ours (2013022611)
29-Jul-2013 10:43:34.900 general: info: zone example.com/IN: refresh: non-authoritative answer from master 10.0.2.1#53 (source 10.10.10.1#0)
29-Jul-2013 10:43:34.904 general: info: zone example.com/IN: refresh: non-authoritative answer from master 10.0.3.1#53 (source 10.10.10.1#0)
29-Jul-2013 10:43:34.915 general: info: zone example.com/IN: Transfer started.
29-Jul-2013 10:43:34.916 xfer-in: info: transfer of 'example.com/IN' from 10.0.4.1#53: connected using 10.10.10.1#44081
29-Jul-2013 10:43:34.919 general: info: zone example.com/IN: transferred serial 2013072910
29-Jul-2013 10:43:34.919 xfer-in: info: transfer of 'example.com/IN' from 10.0.4.1#53: Transfer completed: 1 messages, 23 records, 719 bytes, 0.002 secs (359500 bytes/sec)
29-Jul-2013 10:43:35.379 notify: info: client 10.0.4.1#43038: received notify for zone 'example.com'
29-Jul-2013 10:43:35.380 general: info: zone example.com/IN: notify from 10.0.4.1#43038: zone is up to date

On Tue, Jul 30, 2013 at 6:06 PM, Steven Carr wrote:

> On 30 July 2013 22:52, Brandon Whaley wrote:
>
>> Once every few minutes the reload occurs on the master, which sends the
>> notify to our slave servers, who should check serials on all the masters
>> and transfer from the latest.
>
> I think this is your problem. From what I understand BIND does not do
> this. It will contact the last server that it received an update from and
> check the serial, if it's greater then it will update, but it certainly
> won't chase around each master server looking to see if one of them has a
> higher version.
>
> I think you need to fix the way you have implemented the masters, BIND
> doesn't support multi-master DNS which is what you are trying to implement.
> If you need this functionality then Microsoft does (to a point, there still
> is effectively a master but as it's distributed through LDAP it handles
> multiple updates in the background using a timestamp of the update as the
> decider) but then IMHO it's just not BIND.
>
> Steve

--
Best Regards,
Brandon W.
Tier 3 System Administrator
InMotion Hosting Inc.
888-321-4678
757-416-6575 (Int'l)
NEW: 24x7 EMAIL and PHONE Technical Support

Did you know? We'll Build, Update and Promote Your Site for You!
Visit www.inmotionhosting.com/webdesign

Answers to commonly asked questions, as well as other useful tools, can be found at http://support.inmotionhosting.com

How am I doing? Please feel free to email my manager at manager_feedb...@inmotion.net

___
Please visit https://lists.isc.org/mailman/listinfo/bind-users to unsubscribe from this list

bind-users mailing list
bind-users@lists.isc.org
https://lists.isc.org/mailman/listinfo/bind-users
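The "< ours" log line above is the result of a serial comparison. As a rough illustration only (this is not BIND's source code), DNS serials are compared with the wrap-around arithmetic defined in RFC 1982. The sketch below uses the three serials that appear in the logs; the function name `serial_gt` and the constants are made up for this example.

```python
"""Sketch: compare DNS zone serials with RFC 1982 serial number arithmetic."""

SERIAL_BITS = 32
MODULUS = 1 << SERIAL_BITS           # serials live in a 32-bit space
HALF_RANGE = 1 << (SERIAL_BITS - 1)  # differences beyond this wrap around


def serial_gt(a: int, b: int) -> bool:
    """Return True if serial `a` is newer than serial `b` (RFC 1982).

    The result is False when the serials are equal, and also in the
    undefined case where they are exactly half the number space apart.
    """
    diff = (a - b) % MODULUS
    return 0 < diff < HALF_RANGE


# Serials taken from the log excerpt in this thread.
OURS = 2013022611
OFFERED = (("10.0.1.1", 2011061500), ("10.0.4.1", 2013072910))

if __name__ == "__main__":
    for master, serial in OFFERED:
        verdict = "newer, worth a transfer" if serial_gt(serial, OURS) else "not newer"
        print(f"{master}: serial {serial} vs ours {OURS}: {verdict}")
```

With these numbers the stale copy on 10.0.1.1 is rejected and only the copy on 10.0.4.1 counts as newer, which matches what the slave logs right after a restart.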
Re: BIND slave stops updating from master after 1-3 days
The logs do seem to only check the first 1-2 servers in the forwarders section when the problem is occurring. If named is restarted on this slave, it will check all the servers, as expected when it receives a notify.

We have our zones distributed among 5 masters to speed updates. The software we use that allows customers to update their zones does a direct update of the zone file on the master server to which it is connected, then queues a reload of named. Once every few minutes the reload occurs on the master, which sends the notify to our slave servers, who should check serials on all the masters and transfer from the latest. We do this because the number of zones we have was causing the reload on the masters to take too long and delaying updates to the slaves. This system was working without issue until this slave was updated to BIND 9.8, and the slave that is still on BIND 9.3 continues to function normally.

One thing that just occurred to me is that in every case I've seen thus far, an earlier master (say 10.0.1.1) had a zone with an old serial for the domain in question (which has a newer serial on say 10.0.5.1). This occurs because we tend to move users between servers that sync to different masters. Could this be related to the problem?

On Tue, Jul 30, 2013 at 5:01 PM, Steven Carr wrote:

> On 30 July 2013 21:38, Brandon Whaley wrote:
>
>> zone "example.com" {
>>     type slave;
>>     file "/var/named/slaves/example.com.db";
>>     masters { 10.0.1.1; 10.0.2.1; 10.0.3.1; 10.0.4.1; 10.0.5.1; };
>> };
>
> So given what I mentioned before I would envisage BIND contacting 10.0.1.1
> and then failing to 10.0.2.1 but ignoring the rest. Is this what you are
> seeing in the logs? Or is the slave not attempting to contact any of the
> servers?
>
> Just out of curiosity, what is the reason for having 5 masters? are these
> multi-master? or are they effectively slaves that also allow zone transfers?
>
> Steve

--
Best Regards,
Brandon W.
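To check which masters the slave actually touched during a refresh, the named.log lines can be scanned mechanically. The following is a minimal sketch, assuming the log format shown in this thread; the function names and the two sample lists are invented for illustration, and the regular expressions only cover the message shapes that appear in the posted logs.

```python
"""Sketch: list which configured masters appear in a slice of named.log."""
import re

IPV4 = r"(\d{1,3}(?:\.\d{1,3}){3})"
# "... received from master 10.0.1.1#53 ..." / "... retry limit for master 10.0.1.1#53 ..."
MASTER_RE = re.compile(r"(?:from|for) master " + IPV4 + r"#\d+")
# "transfer of 'example.com/IN' from 10.0.4.1#53: ..."
XFER_RE = re.compile(r"transfer of '[^']+' from " + IPV4 + r"#\d+")

CONFIGURED = ["10.0.1.1", "10.0.2.1", "10.0.3.1", "10.0.4.1", "10.0.5.1"]


def masters_contacted(lines):
    """Return master IPs in first-seen order (incoming notify lines are ignored)."""
    seen = []
    for line in lines:
        for pattern in (MASTER_RE, XFER_RE):
            match = pattern.search(line)
            if match and match.group(1) not in seen:
                seen.append(match.group(1))
    return seen


def masters_skipped(lines, configured=CONFIGURED):
    """Return configured masters that never show up in the given lines."""
    contacted = set(masters_contacted(lines))
    return [ip for ip in configured if ip not in contacted]


# Abbreviated log lines copied from the thread (timestamps dropped).
BEFORE_RESTART = [
    "notify: info: client 10.0.4.1#37224: received notify for zone 'example.com'",
    "general: info: zone example.com/IN: refresh: retry limit for master 10.0.1.1#53 exceeded (source 10.10.10.1#0)",
    "xfer-in: info: transfer of 'example.com/IN' from 10.0.1.1#53: connected using 10.10.10.1#55992",
]
AFTER_RESTART = [
    "general: info: zone example.com/IN: serial number (2011061500) received from master 10.0.1.1#53 < ours (2013022611)",
    "general: info: zone example.com/IN: refresh: non-authoritative answer from master 10.0.2.1#53 (source 10.10.10.1#0)",
    "general: info: zone example.com/IN: refresh: non-authoritative answer from master 10.0.3.1#53 (source 10.10.10.1#0)",
    "xfer-in: info: transfer of 'example.com/IN' from 10.0.4.1#53: connected using 10.10.10.1#44081",
]

if __name__ == "__main__":
    print("before restart, skipped:", masters_skipped(BEFORE_RESTART))
    print("after restart, skipped: ", masters_skipped(AFTER_RESTART))
```

Run against a real log, the same functions could be fed the output of the "tail -f named.log | grep example.com" filter mentioned in the original post, one refresh cycle at a time.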
Re: BIND slave stops updating from master after 1-3 days
Hey Lawrence, this is the zone entry as seen on the 10.10.10.1 slave. The 10.0.x.1 IPs are the addresses of the masters.

On Tue, Jul 30, 2013 at 4:43 PM, Lawrence K. Chen, P.Eng. wrote:

> I think that's what you asked for. In case I misunderstood, here's a zone
> entry from the slave's named.conf (this immediately follows the options
> block in my first email:
>
> zone "example.com" {
>     type slave;
>     file "/var/named/slaves/example.com.db";
>     masters { 10.0.1.1; 10.0.2.1; 10.0.3.1; 10.0.4.1; 10.0.5.1; };
> };
>
> Should probably have the 10.10.10.1 master here, rather than the slave
> nameservers that are configured not to allow transfers.
>
> L

--
Best Regards,
Brandon W.
Re: BIND slave stops updating from master after 1-3 days
Hey Steve, thanks for the reply. Here's the top of one of the masters' named.conf files (they're all the same, with the exception of which zones are on them):

controls {
    inet 127.0.0.1 allow { localhost; } keys { "rndc-key"; };
};

include "/etc/rndc.key";

logging {
    channel simple_log {
        file "/var/log/ramlog/named.log" versions 3 size 65m;
        severity debug 0;
        print-time yes;
        print-severity yes;
        print-category yes;
    };
    category default {
        simple_log;
    };
};

zone "." {
    type hint;
    file "/var/named/named.ca";
};

options {
    statistics-file "/var/named/data/named_stats.txt";
    directory "/var/named";
    recursion no;
    transfers-out 1;
    notify explicit;
    also-notify { 10.10.10.1; 10.10.10.2; };
    allow-transfer { 10.10.10.1; 10.10.10.2; };
    files 4096;
};

zone "0.0.127.in-addr.arpa" {
    type master;
    file "/var/named/named.local";
};

zone "example.com" {
    type master;
    file "/var/named/example.com.db";
};

I think that's what you asked for. In case I misunderstood, here's a zone entry from the slave's named.conf (this immediately follows the options block in my first email):

zone "example.com" {
    type slave;
    file "/var/named/slaves/example.com.db";
    masters { 10.0.1.1; 10.0.2.1; 10.0.3.1; 10.0.4.1; 10.0.5.1; };
};

On Tue, Jul 30, 2013 at 4:10 PM, Steven Carr wrote:

> On 30 July 2013 20:31, Brandon Whaley wrote:
>
>> Sorry for the bump here, but through extensive troubleshooting I've
>> identified a trend in this. It appears that zones hosted on the
>> lower-numbered masters are still updating without issue. This leads me to
>> believe that something is causing BIND to "forget" the later cluster
>> servers, as the logs show that it doesn't even try to query them for zone
>> updates. Is this known behavior? Perhaps a network failure causes a
>> master to be marked "bad" in newer versions of BIND? Restarting named on
>> the slave continues to correct the problem, so for now I'm (unfortunately)
>> restarting named frequently on this slave.
>
> Can you post a snippet of one of your secondary zone config stanzas so we
> can see how you have the slave zone configured.
>
> From previous posts to the list I think it was identified that BIND will
> contact the first master server listed and failover to the second master if
> the first wasn't contactable, but then it would ignore any additional
> masters.
>
> Would be good to get some clarification on this from ISC, I've tried to
> trace my way through the source code and can't identify where BIND decides
> which master to update from, all I can find is the code where it goes to
> cleanup if the server isn't contactable (bind-9.9.3-P2/lib/dns/zone.c
> ln:13647), but can't see where it would then choose another one and try
> again.
>
> Steve

--
Best Regards,
Brandon W.
Re: BIND slave stops updating from master after 1-3 days
Sorry for the bump here, but through extensive troubleshooting I've identified a trend in this. It appears that zones hosted on the lower-numbered masters are still updating without issue. This leads me to believe that something is causing BIND to "forget" the later cluster servers, as the logs show that it doesn't even try to query them for zone updates. Is this known behavior? Perhaps a network failure causes a master to be marked "bad" in newer versions of BIND? Restarting named on the slave continues to correct the problem, so for now I'm (unfortunately) restarting named frequently on this slave.

On Mon, Jul 29, 2013 at 6:41 PM, Brandon Whaley <brand...@inmotionhosting.com> wrote:

> Hi all, I've recently upgraded from a CentOS5 install of BIND 9
> (bind-9.3.6-20.P1.el5_8.6) to a CentOS6 install
> (bind-9.8.2-0.17.rc1.el6_4.4.x86_64) for one of my two nameservers. The
> config I'm using is nearly identical (added rate limiting only) and the
> server that has not yet been updated is still having no problems. The
> upgraded server will stop receiving updates for zones after 1-3 days of
> completely normal operation. Restarting named (but not reloading) corrects
> the problem for another 1-3 days.
>
> I have logs showing what happens before and after a restart of the
> service. I've changed the IPs in these logs and the config file which
> follows. In them, 10.0.x.1 represents a master server and 10.10.10.1 is
> the slave having issues. These logs are basically "tail -f named.log |
> grep example.com" where example.com is the test domain I'm using (it
> happens with all domains). I update the zone file's serial on 10.0.4.1,
> then do an rndc reload there, these are the logs that come up:
>
> Before named restart
> ====================
> 29-Jul-2013 10:17:29.567 notify: info: client 10.0.4.1#37224: received notify for zone 'example.com'
> 29-Jul-2013 10:17:30.069 notify: info: client 10.0.4.1#32206: received notify for zone 'example.com'
> 29-Jul-2013 10:17:30.069 general: info: zone example.com/IN: notify from 10.0.4.1#32206: refresh in progress, refresh check queued
> 29-Jul-2013 10:18:59.568 general: info: zone example.com/IN: refresh: retry limit for master 10.0.1.1#53 exceeded (source 10.10.10.1#0)
> 29-Jul-2013 10:18:59.568 general: info: zone example.com/IN: Transfer started.
> 29-Jul-2013 10:18:59.569 xfer-in: info: transfer of 'example.com/IN' from 10.0.1.1#53: connected using 10.10.10.1#55992
> 29-Jul-2013 10:18:59.570 xfer-in: info: transfer of 'example.com/IN' from 10.0.1.1#53: Transfer completed: 0 messages, 1 records, 0 bytes, 0.001 secs (0 bytes/sec)
>
> After named restart
> ===================
> 29-Jul-2013 10:43:34.879 notify: info: client 10.0.4.1#42576: received notify for zone 'example.com'
> 29-Jul-2013 10:43:34.890 general: info: zone example.com/IN: serial number (2011061500) received from master 10.0.1.1#53 < ours (2013022611)
> 29-Jul-2013 10:43:34.900 general: info: zone example.com/IN: refresh: non-authoritative answer from master 10.0.2.1#53 (source 10.10.10.1#0)
> 29-Jul-2013 10:43:34.904 general: info: zone example.com/IN: refresh: non-authoritative answer from master 10.0.3.1#53 (source 10.10.10.1#0)
> 29-Jul-2013 10:43:34.915 general: info: zone example.com/IN: Transfer started.
> 29-Jul-2013 10:43:34.916 xfer-in: info: transfer of 'example.com/IN' from 10.0.4.1#53: connected using 10.10.10.1#44081
> 29-Jul-2013 10:43:34.919 general: info: zone example.com/IN: transferred serial 2013072910
> 29-Jul-2013 10:43:34.919 xfer-in: info: transfer of 'example.com/IN' from 10.0.4.1#53: Transfer completed: 1 messages, 23 records, 719 bytes, 0.002 secs (359500 bytes/sec)
> 29-Jul-2013 10:43:35.379 notify: info: client 10.0.4.1#43038: received notify for zone 'example.com'
> 29-Jul-2013 10:43:35.380 general: info: zone example.com/IN: notify from 10.0.4.1#43038: zone is up to date
>
> In case it's necessary, here is the config for the slave where these logs
> were produced:
>
> controls {
>     inet 127.0.0.1 port 953
>         allow { 127.0.0.1; } keys { "rndc-key"; };
> };
>
> include "/etc/rndc.key";
>
> acl "notifytrusted" { 10.0.1.1; 10.0.2.1; 10.0.3.1; 10.0.4.1; 10.0.5.1; };
>
> logging {
>     channel simple_log {
>         file "/var/log/named.log" versions 3 size 65m;
>         #severity warning;
>         severity debug 0;
>         print-time yes;
>         print-severity yes;
>         print-category yes;
>     };
>     category default {
>         simple_log;
>     };
> };
>
> zone "." {
>     type hint;
>
BIND slave stops updating from master after 1-3 days
Hi all, I've recently upgraded from a CentOS5 install of BIND 9 (bind-9.3.6-20.P1.el5_8.6) to a CentOS6 install (bind-9.8.2-0.17.rc1.el6_4.4.x86_64) for one of my two nameservers. The config I'm using is nearly identical (added rate limiting only) and the server that has not yet been updated is still having no problems. The upgraded server will stop receiving updates for zones after 1-3 days of completely normal operation. Restarting named (but not reloading) corrects the problem for another 1-3 days.

I have logs showing what happens before and after a restart of the service. I've changed the IPs in these logs and the config file which follows. In them, 10.0.x.1 represents a master server and 10.10.10.1 is the slave having issues. These logs are basically "tail -f named.log | grep example.com" where example.com is the test domain I'm using (it happens with all domains). I update the zone file's serial on 10.0.4.1, then do an rndc reload there, these are the logs that come up:

Before named restart
====================
29-Jul-2013 10:17:29.567 notify: info: client 10.0.4.1#37224: received notify for zone 'example.com'
29-Jul-2013 10:17:30.069 notify: info: client 10.0.4.1#32206: received notify for zone 'example.com'
29-Jul-2013 10:17:30.069 general: info: zone example.com/IN: notify from 10.0.4.1#32206: refresh in progress, refresh check queued
29-Jul-2013 10:18:59.568 general: info: zone example.com/IN: refresh: retry limit for master 10.0.1.1#53 exceeded (source 10.10.10.1#0)
29-Jul-2013 10:18:59.568 general: info: zone example.com/IN: Transfer started.
29-Jul-2013 10:18:59.569 xfer-in: info: transfer of 'example.com/IN' from 10.0.1.1#53: connected using 10.10.10.1#55992
29-Jul-2013 10:18:59.570 xfer-in: info: transfer of 'example.com/IN' from 10.0.1.1#53: Transfer completed: 0 messages, 1 records, 0 bytes, 0.001 secs (0 bytes/sec)

After named restart
===================
29-Jul-2013 10:43:34.879 notify: info: client 10.0.4.1#42576: received notify for zone 'example.com'
29-Jul-2013 10:43:34.890 general: info: zone example.com/IN: serial number (2011061500) received from master 10.0.1.1#53 < ours (2013022611)
29-Jul-2013 10:43:34.900 general: info: zone example.com/IN: refresh: non-authoritative answer from master 10.0.2.1#53 (source 10.10.10.1#0)
29-Jul-2013 10:43:34.904 general: info: zone example.com/IN: refresh: non-authoritative answer from master 10.0.3.1#53 (source 10.10.10.1#0)
29-Jul-2013 10:43:34.915 general: info: zone example.com/IN: Transfer started.
29-Jul-2013 10:43:34.916 xfer-in: info: transfer of 'example.com/IN' from 10.0.4.1#53: connected using 10.10.10.1#44081
29-Jul-2013 10:43:34.919 general: info: zone example.com/IN: transferred serial 2013072910
29-Jul-2013 10:43:34.919 xfer-in: info: transfer of 'example.com/IN' from 10.0.4.1#53: Transfer completed: 1 messages, 23 records, 719 bytes, 0.002 secs (359500 bytes/sec)
29-Jul-2013 10:43:35.379 notify: info: client 10.0.4.1#43038: received notify for zone 'example.com'
29-Jul-2013 10:43:35.380 general: info: zone example.com/IN: notify from 10.0.4.1#43038: zone is up to date

In case it's necessary, here is the config for the slave where these logs were produced:

controls {
    inet 127.0.0.1 port 953
        allow { 127.0.0.1; } keys { "rndc-key"; };
};

include "/etc/rndc.key";

acl "notifytrusted" { 10.0.1.1; 10.0.2.1; 10.0.3.1; 10.0.4.1; 10.0.5.1; };

logging {
    channel simple_log {
        file "/var/log/named.log" versions 3 size 65m;
        #severity warning;
        severity debug 0;
        print-time yes;
        print-severity yes;
        print-category yes;
    };
    category default {
        simple_log;
    };
};

zone "." {
    type hint;
    file "/var/named/named.ca";
};

options {
    statistics-file "/var/named/data/named_stats.txt";
    directory "/var/named";
    forwarders { 10.0.1.1; 10.0.2.1; 10.0.3.1; 10.0.4.1; 10.0.5.1; };
    forward only;
    transfers-in 5;
    transfers-per-ns 5;
    serial-query-rate 1;
    transfer-source 10.10.10.1;
    use-alt-transfer-source no;
    rate-limit {
        responses-per-second 200;
        window 5;
    };
    allow-transfer { none; };
    notify no;
    allow-notify { notifytrusted; };
};

After this is just zone definitions. Has anyone else seen this problem?

--
Best Regards,
Brandon W.
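The clearest difference between the two log excerpts is the "Transfer completed" line: before the restart the transfer from 10.0.1.1 reports 0 messages and 0 bytes, while afterwards the transfer from 10.0.4.1 carries 23 records. A small scanner can flag such no-data transfers so the moment the problem starts is easier to spot. This is only a sketch under the assumption that the xfer-in lines keep the shape shown above; `Transfer`, `parse_transfer` and `moved_no_data` are hypothetical names, and the script does not try to say why BIND logged a no-data transfer.

```python
"""Sketch: flag xfer-in 'Transfer completed' lines that moved no data."""
import re
from typing import NamedTuple, Optional

XFER_DONE_RE = re.compile(
    r"transfer of '(?P<zone>[^']+)' from (?P<master>[\d.]+)#\d+: "
    r"Transfer completed: (?P<messages>\d+) messages, "
    r"(?P<records>\d+) records, (?P<size>\d+) bytes"
)


class Transfer(NamedTuple):
    zone: str
    master: str
    messages: int
    records: int
    size: int  # bytes reported by named


def parse_transfer(line: str) -> Optional[Transfer]:
    """Parse one log line; return None if it is not a completed transfer."""
    match = XFER_DONE_RE.search(line)
    if match is None:
        return None
    return Transfer(
        zone=match["zone"],
        master=match["master"],
        messages=int(match["messages"]),
        records=int(match["records"]),
        size=int(match["size"]),
    )


def moved_no_data(transfer: Transfer) -> bool:
    """True when the transfer reported zero messages and zero bytes."""
    return transfer.messages == 0 and transfer.size == 0


# The two completed-transfer lines from the original post.
SAMPLE = [
    "29-Jul-2013 10:18:59.570 xfer-in: info: transfer of 'example.com/IN' from 10.0.1.1#53: "
    "Transfer completed: 0 messages, 1 records, 0 bytes, 0.001 secs (0 bytes/sec)",
    "29-Jul-2013 10:43:34.919 xfer-in: info: transfer of 'example.com/IN' from 10.0.4.1#53: "
    "Transfer completed: 1 messages, 23 records, 719 bytes, 0.002 secs (359500 bytes/sec)",
]

if __name__ == "__main__":
    for line in SAMPLE:
        transfer = parse_transfer(line)
        if transfer is not None and moved_no_data(transfer):
            print(f"no data moved: {transfer.zone} from {transfer.master}")
```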