Re: Stalling slave transfers

2013-05-17 Thread Cathy Almond
On 15/05/13 15:58, Tony Finch wrote:
> Tom Sommer <m...@tomsommer.dk> wrote:
>
> > That works fine, but I think I figured out the problem, it was due to
> > the server having acquired a 2nd (autodiscovered) IPv6 address, and it
> > was using that as transfer source. It would be very helpful if the
> > logfile said the actual source IP, and not just 0.0.0.0#53 or ::#0. That
> > would help debugging a lot.
>
> I have found that if you have multiple master addresses listed for a slave
> zone, named will not fall back to trying later addresses if the first one
> fails.
>
> Tony.
 
The speed of fall-back through the masters list may depend on whether or
not you set "try-tcp-refresh no;" in named.conf.

Another contributing factor is whether the failure mode is immediate
(ICMP error or connection failure) or has to time out from named's
perspective.
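
To see which of the two failure modes applies to a given master, one can
time a plain TCP connection attempt to port 53. The sketch below is not part
of BIND; it uses only the Python standard library, and the names
classify_failure and probe_master, as well as the 5-second timeout, are
assumptions made for illustration. A refusal or ICMP error returns almost at
once, whereas a silently dropped packet only fails when the timeout expires.

```python
import socket
import time


def classify_failure(exc):
    """Map a connection exception to 'timeout' or 'immediate'."""
    # socket.timeout is an alias of TimeoutError on Python 3.10+.
    if isinstance(exc, (socket.timeout, TimeoutError)):
        return "timeout"
    # ConnectionRefusedError, "no route to host", "network unreachable", etc.
    return "immediate"


def probe_master(address, port=53, timeout=5.0):
    """Try a TCP connect and return (outcome, seconds_elapsed)."""
    start = time.monotonic()
    try:
        with socket.create_connection((address, port), timeout=timeout):
            outcome = "connected"
    except OSError as exc:
        outcome = classify_failure(exc)
    return outcome, time.monotonic() - start
```

Running probe_master("1.2.3.4") against each address in the masters list
gives a rough idea of how long named may have to wait before it can move on
to the next one.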


___
Please visit https://lists.isc.org/mailman/listinfo/bind-users to unsubscribe 
from this list

bind-users mailing list
bind-users@lists.isc.org
https://lists.isc.org/mailman/listinfo/bind-users


Re: Stalling slave transfers

2013-05-15 Thread Tony Finch
Tom Sommer <m...@tomsommer.dk> wrote:

> That works fine, but I think I figured out the problem, it was due to
> the server having acquired a 2nd (autodiscovered) IPv6 address, and it
> was using that as transfer source. It would be very helpful if the
> logfile said the actual source IP, and not just 0.0.0.0#53 or ::#0. That
> would help debugging a lot.

I have found that if you have multiple master addresses listed for a slave
zone, named will not fall back to trying later addresses if the first one
fails.

Tony.
-- 
f.anthony.n.finch  d...@dotat.at  http://dotat.at/
Forties, Cromarty: East, veering southeast, 4 or 5, occasionally 6 at first.
Rough, becoming slight or moderate. Showers, rain at first. Moderate or good,
occasionally poor at first.


Re: Stalling slave transfers

2013-05-14 Thread Tom Sommer


On 5/9/13 2:19 PM, Luther, Dan wrote: 

> Tom,
>
> What happens when you dig +tcp example.com @1.2.3.4? Specifically I'm
> wondering here if the slave you're having problems with is blocking TCP port
> 53. Such a configuration would allow you to query the master server, but not
> transfer to/from it.

That works fine, but I think I figured out the problem, it was due to
the server having acquired a 2nd (autodiscovered) IPv6 address, and it
was using that as transfer source. It would be very helpful if the
logfile said the actual source IP, and not just 0.0.0.0#53 or ::#0. That
would help debugging a lot. 
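
Since the log only shows the wildcard source, one way to check each local
address by hand is to attempt a TCP connection to the master while binding
to a specific source address, much as "dig -b" does. This is a minimal
sketch using Python's standard library; connect_from is a hypothetical
helper name, and the addresses in the usage note are placeholders from the
documentation ranges, not values from this thread.

```python
import socket


def connect_from(source_ip, master_ip, port=53, timeout=5.0):
    """Open a TCP connection to master_ip:port bound to source_ip.

    Returns the (address, port) pair the OS actually used locally, or
    raises OSError if the bind or the connect fails.
    """
    # Source port 0 lets the OS pick an ephemeral port.
    with socket.create_connection(
        (master_ip, port), timeout=timeout, source_address=(source_ip, 0)
    ) as conn:
        return conn.getsockname()[:2]
```

Calling connect_from("2001:db8::10", "2001:db8::1") once per address
configured on the slave shows which sources the master accepts. In named
itself, the transfer-source and transfer-source-v6 options pin the address
used for transfers.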

I'm down to only seeing the errors "retry limit for master" and "refresh:
failure trying master" on IPv6 now, and only occasionally.

It also appears the master is sending two notifies for each zone, to
each slave, one on IPv4 and one on IPv6? 

// Tom

Re: Stalling slave transfers

2013-05-09 Thread Cathy Almond
On 08/05/13 19:15, Tom Sommer wrote:
 
> On 5/8/13 12:25 PM, Cathy Almond wrote:
> > On 08/05/13 08:26, Tom Sommer wrote:
> > > Hi,
> > >
> > > I have a problem with one of 3 slave servers, all set up the exact same
> > > way, with the exact same bind version and configuration.
> > >
> > > One slave has a problem transferring zones from the master.
> > >
> > > The logfiles are flooded with "received notify for zone .. refresh in
> > > progress, refresh check queued" lines and "rndc status" returns a
> > > constant high number of "soa queries in progress".
> > > After a few hours the zones are transferred, so the connection to the
> > > master is working, but there is a major delay. I tried resetting the
> > > slave and transferring ALL slave zones again, which worked fine
> > > instantly. The problem still appeared again after a few hours though.
> > >
> > > The master has three network-paths, one on external IP, one on internal
> > > IP and one on IPv6. All 3 paths work fine, because the transfers happen
> > > after an hour or so.
> > >
> > > There are no hints in the master's log.
> > > The other two slaves are running perfectly, no errors or delays
> > > whatsoever.
> > >
> > > BIND version 9.9.2-P2 (recently upgraded to).
> > >
> > > Any hints would be appreciated, as I feel like I've exhausted most
> > > options.
> > >
> > > Thank you.
> > Have a look at this KB article (you'll need to register to view - but
> > registration is open to all):
> >
> > https://kb.isc.org/article/AA-00726/30/Tuning-your-BIND-configuration-effectively-for-zone-transfers-particularly-with-many-frequently-updated-zones.html
> >
> > Also - and this isn't covered in that article (yet) - if you're using
> > views, then use-alt-transfer-source defaults to 'yes'.  You might want
> > to set it explicitly to 'no' or to define alt-transfer-source
> > and/or alt-transfer-source-v6.
>
> Thank you, great resource. I think I solved it by raising
> serial-query-rate, it's just odd that it's not required on the other
> two servers.
>
> Another issue has arisen now though, the logfile is filled with lots of
> named[5596]: zone example.com/IN: refresh: failure trying master
> 1.2.3.4#53 (source 0.0.0.0#0): operation canceled
>
> But if I do a dig example.com @1.2.3.4 it's working just fine. Same
> server as with the previous issue.
>
> Any thoughts? Thank you.
>
> // Tom

I don't think you solved the problem - I think you moved it (or made it
happen faster...)

The refresh errors indicate that the master isn't responding to your
slave for some reason.  That's what you'll need to investigate.  I would
suggest auditing the differences between this slave and the others in
their named configurations as well as their configured IP interfaces and
routing tables.

A pair of network packet traces (slave and the non-responding auth
server) might also point you in the right direction.

Cathy


Re: Stalling slave transfers

2013-05-09 Thread Tom Sommer


On 5/9/13 11:36 AM, Cathy Almond wrote:

> I don't think you solved the problem - I think you moved it (or made it
> happen faster...)
>
> The refresh errors indicate that the master isn't responding to your
> slave for some reason.  That's what you'll need to investigate.  I would
> suggest auditing the differences between this slave and the others in
> their named configurations as well as their configured IP interfaces and
> routing tables.
>
> A pair of network packet traces (slave and the non-responding auth
> server) might also point you in the right direction.

Right, but when I perform a dig from the server OS, the transfer and 
network-communication work fine - so there are no signs as to why named 
can't connect to the master, but the OS can.


I'll do some more digging.

Thanks.


RE: Stalling slave transfers

2013-05-09 Thread Luther, Dan
Tom, 

What happens when you dig +tcp example.com @1.2.3.4? Specifically I'm 
wondering here if the slave you're having problems with is blocking TCP port 
53. Such a configuration would allow you to query the master server, but not 
transfer to/from it.
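
The same check can be scripted. The sketch below builds a minimal SOA query
by hand and sends it over TCP with the two-byte length prefix that DNS over
TCP requires, using only the Python standard library. The helper names
(encode_name, build_query, recv_exact, tcp_soa_rcode) and the fixed query ID
are assumptions for illustration. A successful return only shows that TCP
port 53 is reachable from this host, not that a zone transfer is permitted.

```python
import socket
import struct

QTYPE_SOA = 6
QCLASS_IN = 1


def encode_name(name):
    """Encode a domain name as length-prefixed DNS labels."""
    labels = [part for part in name.rstrip(".").split(".") if part]
    encoded = b"".join(
        bytes([len(part)]) + part.encode("ascii") for part in labels
    )
    return encoded + b"\x00"


def build_query(name, query_id=0x1234):
    """Build a DNS query message asking for the SOA record of name."""
    # Header: ID, flags (zero: standard query), QDCOUNT=1, other counts 0.
    header = struct.pack("!HHHHHH", query_id, 0, 1, 0, 0, 0)
    question = encode_name(name) + struct.pack("!HH", QTYPE_SOA, QCLASS_IN)
    return header + question


def recv_exact(conn, count):
    """Read exactly count bytes or raise ConnectionError."""
    data = b""
    while len(data) < count:
        chunk = conn.recv(count - len(data))
        if not chunk:
            raise ConnectionError("connection closed early")
        data += chunk
    return data


def tcp_soa_rcode(master_ip, zone, port=53, timeout=5.0):
    """Send an SOA query over TCP and return the response RCODE."""
    query = build_query(zone)
    with socket.create_connection((master_ip, port), timeout=timeout) as conn:
        # DNS over TCP prefixes each message with its length (2 bytes).
        conn.sendall(struct.pack("!H", len(query)) + query)
        (length,) = struct.unpack("!H", recv_exact(conn, 2))
        response = recv_exact(conn, length)
    flags = struct.unpack("!H", response[2:4])[0]
    return flags & 0x000F  # low four bits of the flags word hold the RCODE
```

For example, tcp_soa_rcode("1.2.3.4", "example.com") returning 0 (NOERROR)
suggests TCP queries get through, while a timeout or refusal points at
filtering on the path.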

Dan Luther
Operations Engineer
Systems Operation Engineering 
Level 3 Communications
One Technology Center, Tulsa OK 74103
e: dan.lut...@level3.com


-----Original Message-----
From: bind-users-bounces+dan.luther=level3@lists.isc.org 
[mailto:bind-users-bounces+dan.luther=level3@lists.isc.org] On Behalf Of 
Tom Sommer
Sent: Wednesday, May 08, 2013 1:16 PM
To: Cathy Almond
Cc: bind-users@lists.isc.org
Subject: Re: Stalling slave transfers


On 5/8/13 12:25 PM, Cathy Almond wrote:
> On 08/05/13 08:26, Tom Sommer wrote:
> > Hi,
> >
> > I have a problem with one of 3 slave servers, all set up the exact
> > same way, with the exact same bind version and configuration.
> >
> > One slave has a problem transferring zones from the master.
> >
> > The logfiles are flooded with "received notify for zone .. refresh
> > in progress, refresh check queued" lines and "rndc status" returns a
> > constant high number of "soa queries in progress".
> > After a few hours the zones are transferred, so the connection to the
> > master is working, but there is a major delay. I tried resetting the
> > slave and transferring ALL slave zones again, which worked fine
> > instantly. The problem still appeared again after a few hours though.
> >
> > The master has three network-paths, one on external IP, one on
> > internal IP and one on IPv6. All 3 paths work fine, because the
> > transfers happen after an hour or so.
> >
> > There are no hints in the master's log.
> > The other two slaves are running perfectly, no errors or delays
> > whatsoever.
> >
> > BIND version 9.9.2-P2 (recently upgraded to).
> >
> > Any hints would be appreciated, as I feel like I've exhausted most options.
> >
> > Thank you.
> Have a look at this KB article (you'll need to register to view - but
> registration is open to all):
>
> https://kb.isc.org/article/AA-00726/30/Tuning-your-BIND-configuration-effectively-for-zone-transfers-particularly-with-many-frequently-updated-zones.html
>
> Also - and this isn't covered in that article (yet) - if you're using
> views, then use-alt-transfer-source defaults to 'yes'.  You might want
> to set it explicitly to 'no' or to define alt-transfer-source and/or
> alt-transfer-source-v6.

Thank you, great resource. I think I solved it by raising serial-query-rate,
it's just odd that it's not required on the other two servers.

Another issue has arisen now though, the logfile is filled with lots of
named[5596]: zone example.com/IN: refresh: failure trying master
1.2.3.4#53 (source 0.0.0.0#0): operation canceled

But if I do a dig example.com @1.2.3.4 it's working just fine. Same server as 
with the previous issue.

Any thoughts? Thank you.

// Tom


Stalling slave transfers

2013-05-08 Thread Tom Sommer

Hi,

I have a problem with one of 3 slave servers, all set up the exact same
way, with the exact same bind version and configuration.

One slave has a problem transferring zones from the master.

The logfiles are flooded with "received notify for zone .. refresh in
progress, refresh check queued" lines and "rndc status" returns a
constant high number of "soa queries in progress".
After a few hours the zones are transferred, so the connection to the
master is working, but there is a major delay. I tried resetting the
slave and transferring ALL slave zones again, which worked fine
instantly. The problem still appeared again after a few hours though.

The master has three network-paths, one on external IP, one on internal
IP and one on IPv6. All 3 paths work fine, because the transfers happen
after an hour or so.

There are no hints in the master's log.
The other two slaves are running perfectly, no errors or delays
whatsoever.

BIND version 9.9.2-P2 (recently upgraded to).

Any hints would be appreciated, as I feel like I've exhausted most
options.


Thank you.
--
Tom Sommer


Re: Stalling slave transfers

2013-05-08 Thread Tom Sommer


On 5/8/13 12:25 PM, Cathy Almond wrote:

> On 08/05/13 08:26, Tom Sommer wrote:
>
> > Hi,
> >
> > I have a problem with one of 3 slave servers, all set up the exact same
> > way, with the exact same bind version and configuration.
> >
> > One slave has a problem transferring zones from the master.
> >
> > The logfiles are flooded with "received notify for zone .. refresh in
> > progress, refresh check queued" lines and "rndc status" returns a
> > constant high number of "soa queries in progress".
> > After a few hours the zones are transferred, so the connection to the
> > master is working, but there is a major delay. I tried resetting the
> > slave and transferring ALL slave zones again, which worked fine
> > instantly. The problem still appeared again after a few hours though.
> >
> > The master has three network-paths, one on external IP, one on internal
> > IP and one on IPv6. All 3 paths work fine, because the transfers happen
> > after an hour or so.
> >
> > There are no hints in the master's log.
> > The other two slaves are running perfectly, no errors or delays
> > whatsoever.
> >
> > BIND version 9.9.2-P2 (recently upgraded to).
> >
> > Any hints would be appreciated, as I feel like I've exhausted most options.
> >
> > Thank you.
>
> Have a look at this KB article (you'll need to register to view - but
> registration is open to all):
>
> https://kb.isc.org/article/AA-00726/30/Tuning-your-BIND-configuration-effectively-for-zone-transfers-particularly-with-many-frequently-updated-zones.html
>
> Also - and this isn't covered in that article (yet) - if you're using
> views, then use-alt-transfer-source defaults to 'yes'.  You might want
> to set it explicitly to 'no' or to define alt-transfer-source
> and/or alt-transfer-source-v6.

Thank you, great resource. I think I solved it by raising
serial-query-rate, it's just odd that it's not required on the other
two servers.


Another issue has arisen now though, the logfile is filled with lots of
named[5596]: zone example.com/IN: refresh: failure trying master 
1.2.3.4#53 (source 0.0.0.0#0): operation canceled


But if I do a dig example.com @1.2.3.4 it's working just fine. Same 
server as with the previous issue.


Any thoughts? Thank you.

// Tom


Re: Stalling slave transfers

2013-05-08 Thread Tom Sommer


On 5/8/13 8:15 PM, Tom Sommer wrote:

> Another issue has arisen now though, the logfile is filled with lots of
> named[5596]: zone example.com/IN: refresh: failure trying master
> 1.2.3.4#53 (source 0.0.0.0#0): operation canceled



and

named[5596]: zone example.com/IN: refresh: retry limit for master 
1.2.3.4#53 exceeded (source 0.0.0.0#0)
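
When the log is flooded with these two messages, it can help to tally them
per master and source address. The following is a rough sketch in Python,
assuming each message occupies a single line in the real logfile (they are
wrapped here only by the mail client) and that the format matches the two
samples above; the names REFRESH_RE, parse_refresh_line and summarize are
made up for this example.

```python
import re
from collections import Counter

# Matches the two refresh messages quoted in this thread.
REFRESH_RE = re.compile(
    r"zone (?P<zone>\S+)/(?P<cls>\w+): refresh: "
    r"(?:failure trying master (?P<m1>\S+) \(source (?P<s1>\S+)\): "
    r"(?P<reason>.+)"
    r"|retry limit for master (?P<m2>\S+) exceeded \(source (?P<s2>\S+)\))"
)


def parse_refresh_line(line):
    """Return a dict describing a refresh failure, or None if no match."""
    match = REFRESH_RE.search(line)
    if match is None:
        return None
    return {
        "zone": match["zone"],
        "master": match["m1"] or match["m2"],
        "source": match["s1"] or match["s2"],
        "reason": match["reason"] or "retry limit exceeded",
    }


def summarize(lines):
    """Count refresh failures per (master, source, reason)."""
    counts = Counter()
    for line in lines:
        entry = parse_refresh_line(line)
        if entry is not None:
            counts[(entry["master"], entry["source"], entry["reason"])] += 1
    return counts
```

Feeding the open logfile to summarize() shows at a glance whether the
failures cluster on one master address or one address family.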


// Tom