Re: Stalling slave transfers
On 15/05/13 15:58, Tony Finch wrote:
> Tom Sommer wrote:
>>
>> That works fine, but I think I figured out the problem, it was due to
>> the server having acquired a 2nd (autodiscovered) IPv6 address, and it
>> was using that as transfer source. It would be very helpful if the
>> logfile said the actual source IP, and not just 0.0.0.0#53 or ::#0. That
>> would help debugging a lot.
>
> I have found that if you have multiple master addresses listed for a slave
> zone, named will not fall back to trying later addresses if the first one
> fails.
>
> Tony.
>

The speed of fall-back through the masters list may depend on whether or
not you set "try-tcp-refresh no;" in named.conf. Another contributing
factor is whether the failure mode is immediate (ICMP error or connection
failure) or has to time out from named's perspective.
___
Please visit https://lists.isc.org/mailman/listinfo/bind-users to unsubscribe from this list

bind-users mailing list
bind-users@lists.isc.org
https://lists.isc.org/mailman/listinfo/bind-users
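For reference, a minimal named.conf sketch of a slave zone that lists more than one master and disables the TCP retry mentioned above. The zone name and 1.2.3.4 are the placeholders used elsewhere in this thread; the second master address (192.0.2.2) and the file path are hypothetical and only there to make the fragment complete.

```
// Sketch only: 192.0.2.2 and the file path are assumed values.
zone "example.com" {
    type slave;
    file "slaves/example.com.db";      // hypothetical path
    masters { 1.2.3.4; 192.0.2.2; };   // more than one master listed
    try-tcp-refresh no;                // do not retry a failed UDP refresh query over TCP
};
```

try-tcp-refresh can also be set in the options or view block. Whether it shortens the fall-back noticeably will depend on the failure mode described above.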
Re: Stalling slave transfers
Tom Sommer wrote:
>
> That works fine, but I think I figured out the problem, it was due to
> the server having acquired a 2nd (autodiscovered) IPv6 address, and it
> was using that as transfer source. It would be very helpful if the
> logfile said the actual source IP, and not just 0.0.0.0#53 or ::#0. That
> would help debugging a lot.

I have found that if you have multiple master addresses listed for a slave
zone, named will not fall back to trying later addresses if the first one
fails.

Tony.
-- 
f.anthony.n.finch http://dotat.at/
Forties, Cromarty: East, veering southeast, 4 or 5, occasionally 6 at first.
Rough, becoming slight or moderate. Showers, rain at first. Moderate or good,
occasionally poor at first.
Re: Stalling slave transfers
On 5/9/13 2:19 PM, Luther, Dan wrote:
> Tom,
>
> What happens when you "dig +tcp example.com @1.2.3.4"? Specifically I'm
> wondering here if the slave you're having problems with is blocking TCP port
> 53. Such a configuration would allow you to query the master server, but not
> transfer to/from it.

That works fine, but I think I figured out the problem, it was due to the
server having acquired a 2nd (autodiscovered) IPv6 address, and it was using
that as transfer source. It would be very helpful if the logfile said the
actual source IP, and not just 0.0.0.0#53 or ::#0. That would help debugging
a lot.

I'm down to only seeing the error "retry limit for master" and "refresh:
failure trying master" on IPv6 now, and only occasionally.

It also appears the master is sending two notifies for each zone, to each
slave, one on IPv4 and one on IPv6?

// Tom
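One way to stop named from choosing an autoconfigured address is to pin the source address used for refresh queries and inbound transfers. This is a minimal sketch rather than a confirmed fix for the setup in this thread; both addresses are hypothetical documentation addresses and stand in for the slave's stable, manually assigned ones.

```
options {
    // Assumed addresses: replace with the slave's fixed IPv4/IPv6 addresses.
    transfer-source    192.0.2.10;
    transfer-source-v6 2001:db8::10;
};
```

The master's transfer ACLs would need to permit whichever addresses are chosen. dig's -b option sends a query from a specific source address, which gives a quick manual check of whether the master answers that address.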
RE: Stalling slave transfers
Tom,

What happens when you "dig +tcp example.com @1.2.3.4"? Specifically I'm
wondering here if the slave you're having problems with is blocking TCP port
53. Such a configuration would allow you to query the master server, but not
transfer to/from it.

Dan Luther
Operations Engineer
Systems Operation Engineering
Level 3 Communications
One Technology Center, Tulsa OK 74103
e: dan.lut...@level3.com

-----Original Message-----
From: bind-users-bounces+dan.luther=level3@lists.isc.org [mailto:bind-users-bounces+dan.luther=level3@lists.isc.org] On Behalf Of Tom Sommer
Sent: Wednesday, May 08, 2013 1:16 PM
To: Cathy Almond
Cc: bind-users@lists.isc.org
Subject: Re: Stalling slave transfers

Thank you, great resource. I think I solved it with raising
serial-query-rate, it's just odd that it's not required on the other two
servers.

Another issue has arisen now though, the logfile is filled with lots of

named[5596]: zone example.com/IN: refresh: failure trying master 1.2.3.4#53 (source 0.0.0.0#0): operation canceled

But if I do a "dig example.com @1.2.3.4" it's working just fine. Same server
as with the previous issue.

Any thoughts? Thank you.

// Tom
Re: Stalling slave transfers
On 5/9/13 11:36 AM, Cathy Almond wrote:
> I don't think you solved the problem - I think you moved it (or made it
> happen faster...)
>
> The refresh errors indicate that the master isn't responding to your
> slave for some reason. That's what you'll need to investigate.
>
> I would suggest auditing the differences between this slave and the
> others in their named configurations as well as their configured IP
> interfaces and routing tables. A pair of network packet traces (slave
> and the non-responding auth server) might also point you in the right
> direction.

Right, but when I perform a "dig" from the server OS, the transfer and
network-communication work fine - so there are no signs as to why named
can't connect to the master, but the OS can.

I'll do some more digging.

Thanks.
Re: Stalling slave transfers
On 08/05/13 19:15, Tom Sommer wrote:
> Thank you, great resource. I think I solved it with raising
> serial-query-rate, it's just odd that it's not required on the other
> two servers.
>
> Another issue has arisen now though, the logfile is filled with lots of
> named[5596]: zone example.com/IN: refresh: failure trying master
> 1.2.3.4#53 (source 0.0.0.0#0): operation canceled
>
> But if I do a "dig example.com @1.2.3.4" it's working just fine. Same
> server as with the previous issue.
>
> Any thoughts? Thank you.
>
> // Tom

I don't think you solved the problem - I think you moved it (or made it
happen faster...)

The refresh errors indicate that the master isn't responding to your slave
for some reason. That's what you'll need to investigate.

I would suggest auditing the differences between this slave and the others
in their named configurations as well as their configured IP interfaces and
routing tables. A pair of network packet traces (slave and the
non-responding auth server) might also point you in the right direction.

Cathy
Re: Stalling slave transfers
On 5/8/13 8:15 PM, Tom Sommer wrote:
> Another issue has arisen now though, the logfile is filled with lots of
> named[5596]: zone example.com/IN: refresh: failure trying master
> 1.2.3.4#53 (source 0.0.0.0#0): operation canceled

and

named[5596]: zone example.com/IN: refresh: retry limit for master 1.2.3.4#53 exceeded (source 0.0.0.0#0)

// Tom
Re: Stalling slave transfers
On 5/8/13 12:25 PM, Cathy Almond wrote:
> Have a look at this KB article (you'll need to register to view - but
> registration is open to all):
>
> https://kb.isc.org/article/AA-00726/30/Tuning-your-BIND-configuration-effectively-for-zone-transfers-particularly-with-many-frequently-updated-zones.html
>
> Also - and this isn't covered in that article (yet) - if you're using
> views, then use-alt-transfer-source defaults to 'yes'. You might want
> to set it explicitly to 'no' or to define alt-transfer-source and/or
> alt-transfer-source-v6.

Thank you, great resource. I think I solved it with raising
serial-query-rate, it's just odd that it's not required on the other two
servers.

Another issue has arisen now though, the logfile is filled with lots of

named[5596]: zone example.com/IN: refresh: failure trying master 1.2.3.4#53 (source 0.0.0.0#0): operation canceled

But if I do a "dig example.com @1.2.3.4" it's working just fine. Same server
as with the previous issue.

Any thoughts? Thank you.

// Tom
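For readers looking for the setting behind the refresh-rate change described above: in BIND the rate at which a slave sends SOA refresh queries is governed by serial-query-rate in the options block. A minimal sketch follows; the value is purely illustrative and is not a recommendation from anyone in this thread.

```
options {
    // Illustrative value only: raises how many SOA refresh queries
    // named may send per second.
    serial-query-rate 50;
};
```

Raising it changes how quickly queued refresh checks are worked through. It does not address a master that fails to answer, which is the failure the log lines above describe.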
Re: Stalling slave transfers
On 08/05/13 08:26, Tom Sommer wrote:
> Hi,
>
> I have a problem with one of 3 slave servers, all set up the exact same
> way, with the exact same bind version and configuration.
>
> One slave has a problem transferring zones from the master.
>
> The logfiles are flooded with "received notify for zone" .. "refresh in
> progress, refresh check queued" lines and "rndc status" returns a
> constant high number of "soa queries in progress".
> After a few hours the zones are transferred, so the connection to the
> master is working, but there is a major delay. I tried resetting the
> slave and transferring ALL slave zones again, which worked fine
> instantly. The problem still appeared again after a few hours though.
>
> The master has three network-paths, one on external IP, one on internal
> IP and one on IPv6. All 3 paths work fine, because the transfers happen
> after an hour or so.
>
> There are no hints in the master's log.
> The other two slaves are running perfectly, no errors or delays
> whatsoever.
>
> Bind version 9.9.2-P2 (recently upgraded to).
>
> Any hints would be appreciated, as I feel like I've exhausted most options.
>
> Thank you.

Have a look at this KB article (you'll need to register to view - but
registration is open to all):

https://kb.isc.org/article/AA-00726/30/Tuning-your-BIND-configuration-effectively-for-zone-transfers-particularly-with-many-frequently-updated-zones.html

Also - and this isn't covered in that article (yet) - if you're using
views, then use-alt-transfer-source defaults to 'yes'. You might want to
set it explicitly to 'no' or to define alt-transfer-source and/or
alt-transfer-source-v6.
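The two alternatives suggested above might look roughly like this in named.conf for the BIND 9.9 series discussed here. It is a sketch under assumptions: the addresses are hypothetical documentation addresses, and setting the option explicitly simply avoids depending on whatever the default is in a given configuration.

```
options {
    // Alternative 1: never retry a transfer from an alternate source address.
    use-alt-transfer-source no;

    // Alternative 2 (instead of 1): name the alternate sources explicitly.
    // Both addresses below are assumed placeholders.
    // alt-transfer-source    192.0.2.11;
    // alt-transfer-source-v6 2001:db8::11;
};
```

The same statements can also be placed inside an individual view or zone if only some zones need the behaviour.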