Re: Stalling slave transfers

2013-05-14 Thread Tom Sommer


On 5/9/13 2:19 PM, Luther, Dan wrote: 

> Tom,
>
> What happens when you dig +tcp example.com @1.2.3.4? Specifically I'm
> wondering here if the slave you're having problems with is blocking TCP port
> 53. Such a configuration would allow you to query the master server, but not
> transfer to/from it.

That works fine, but I think I have figured out the problem: the server
had acquired a second (autodiscovered) IPv6 address, and named was using
that as the transfer source. It would be very helpful if the logfile
showed the actual source IP, and not just 0.0.0.0#53 or ::#0. That would
make debugging a lot easier.
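
Since the log only shows the wildcard address, one way to see which local address the operating system would pick toward the master is to ask the kernel directly. Below is a minimal Python sketch of that idea. The function name source_address_for is made up for illustration, and the result reflects only the OS default choice; it may differ from what named uses if transfer-source or transfer-source-v6 is set in its configuration.

```python
import socket


def source_address_for(master, port=53):
    """Return the local address the kernel would use to reach master.

    Connecting a UDP socket sends no packets; it only makes the kernel
    do its route lookup and source address selection.
    """
    # Resolve the address literal (or hostname) to learn the address family.
    family, socktype, proto, _, sockaddr = socket.getaddrinfo(
        master, port, type=socket.SOCK_DGRAM
    )[0]
    with socket.socket(family, socktype, proto) as sock:
        sock.connect(sockaddr)
        # getsockname() now reports the source address chosen by the kernel.
        return sock.getsockname()[0]
```

Calling it once with the master's IPv4 address and once with its IPv6 address, on the affected slave, would show whether an unexpected (for example autodiscovered) address is being selected on either path.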

I'm now down to seeing only the "retry limit for master" and "refresh:
failure trying master" errors, on IPv6 only, and only occasionally.

It also appears the master is sending two notifies for each zone, to
each slave, one on IPv4 and one on IPv6? 

// Tom
___
Please visit https://lists.isc.org/mailman/listinfo/bind-users to unsubscribe 
from this list

bind-users mailing list
bind-users@lists.isc.org
https://lists.isc.org/mailman/listinfo/bind-users

Re: Stalling slave transfers

2013-05-09 Thread Tom Sommer


On 5/9/13 11:36 AM, Cathy Almond wrote:

> I don't think you solved the problem - I think you moved it (or made it
> happen faster...)
>
> The refresh errors indicate that the master isn't responding to your
> slave for some reason.  That's what you'll need to investigate.  I would
> suggest auditing the differences between this slave and the others in
> their named configurations as well as their configured IP interfaces and
> routing tables.
>
> A pair of network packet traces (slave and the non-responding auth
> server) might also point you in the right direction.

Right, but when I run dig from the server itself, both the transfer and
general network communication work fine, so there is no sign of why
named can't connect to the master when the OS can.


I'll do some more digging.

Thanks.


Stalling slave transfers

2013-05-08 Thread Tom Sommer

Hi,

I have a problem with one of 3 slave servers, all set up the exact same
way, with the exact same BIND version and configuration.


One slave has a problem transferring zones from the master.

The logfiles are flooded with "received notify for zone .. refresh in
progress, refresh check queued" lines, and rndc status reports a
constantly high number of "soa queries in progress".
After a few hours the zones are transferred, so the connection to the
master is working, but there is a major delay. I tried resetting the
slave and transferring ALL slave zones again, which worked fine
instantly. The problem still appeared again after a few hours, though.


The master has three network-paths, one on external IP, one on internal 
IP and one on IPv6. All 3 paths work fine, because the transfers happen 
after an hour or so.


There are no hints in the master's log.
The other two slaves are running perfectly, with no errors or delays
whatsoever.


BIND version 9.9.2-P2 (recently upgraded).

Any hints would be appreciated, as I feel like I've exhausted most 
options.


Thank you.
--
Tom Sommer


Re: Stalling slave transfers

2013-05-08 Thread Tom Sommer


On 5/8/13 12:25 PM, Cathy Almond wrote:

> On 08/05/13 08:26, Tom Sommer wrote:
>
>> Hi,
>>
>> I have a problem with one of 3 slave servers, all set up the exact same
>> way, with the exact same BIND version and configuration.
>>
>> One slave has a problem transferring zones from the master.
>>
>> The logfiles are flooded with "received notify for zone .. refresh in
>> progress, refresh check queued" lines, and rndc status reports a
>> constantly high number of "soa queries in progress".
>> After a few hours the zones are transferred, so the connection to the
>> master is working, but there is a major delay. I tried resetting the
>> slave and transferring ALL slave zones again, which worked fine
>> instantly. The problem still appeared again after a few hours, though.
>>
>> The master has three network-paths, one on external IP, one on internal
>> IP and one on IPv6. All 3 paths work fine, because the transfers happen
>> after an hour or so.
>>
>> There are no hints in the master's log.
>> The other two slaves are running perfectly, with no errors or delays
>> whatsoever.
>>
>> BIND version 9.9.2-P2 (recently upgraded).
>>
>> Any hints would be appreciated, as I feel like I've exhausted most options.
>>
>> Thank you.
>
> Have a look at this KB article (you'll need to register to view - but
> registration is open to all):
>
> https://kb.isc.org/article/AA-00726/30/Tuning-your-BIND-configuration-effectively-for-zone-transfers-particularly-with-many-frequently-updated-zones.html
>
> Also - and this isn't covered in that article (yet) - if you're using
> views, then use-alt-transfer-source defaults to 'yes'.  You might want
> to set it explicitly to 'no' or to define alt-transfer-source
> and/or alt-transfer-source-v6.

Thank you, great resource. I think I solved it by raising
serial-query-rate; it's just odd that it's not required on the other
two servers.


Another issue has arisen now, though. The logfile is filled with lots of
lines like:

named[5596]: zone example.com/IN: refresh: failure trying master 1.2.3.4#53 (source 0.0.0.0#0): operation canceled


But if I run dig example.com @1.2.3.4, it works just fine. This is the
same server as with the previous issue.
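
To check whether failures like the one quoted above cluster on a particular master address or source, the refresh errors can be tallied straight from the log. Below is a rough Python sketch, assuming log lines shaped like the ones reported in this thread. The function name tally_refresh_errors is hypothetical, and the pattern would need adjusting if your log format differs.

```python
import re
from collections import Counter

# Matches refresh errors of the two forms reported in this thread, e.g.
#   zone example.com/IN: refresh: failure trying master 1.2.3.4#53 (source 0.0.0.0#0): operation canceled
#   zone example.com/IN: refresh: retry limit for master 1.2.3.4#53 exceeded (source 0.0.0.0#0)
REFRESH_RE = re.compile(
    r"zone (?P<zone>\S+): refresh: "
    r"(?:failure trying|retry limit for) master (?P<master>\S+)"
    r".*?\(source (?P<source>[^)]+)\)"
)


def tally_refresh_errors(lines):
    """Count refresh errors per (master, source) pair."""
    counts = Counter()
    for line in lines:
        match = REFRESH_RE.search(line)
        if match:
            counts[(match["master"], match["source"])] += 1
    return counts
```

Passing it the lines of the named log (any iterable of strings, such as an open file) gives a count per master and source pair, which makes it easier to see whether only one path, for instance IPv6, is affected.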


Any thoughts? Thank you.

// Tom


Re: Stalling slave transfers

2013-05-08 Thread Tom Sommer


On 5/8/13 8:15 PM, Tom Sommer wrote:

> Another issue has arisen now, though. The logfile is filled with lots of
> lines like:
>
> named[5596]: zone example.com/IN: refresh: failure trying master 1.2.3.4#53 (source 0.0.0.0#0): operation canceled

and

named[5596]: zone example.com/IN: refresh: retry limit for master 1.2.3.4#53 exceeded (source 0.0.0.0#0)


// Tom