Re: Stalling slave transfers

2013-05-17 Thread Cathy Almond
On 15/05/13 15:58, Tony Finch wrote:
> Tom Sommer  wrote:
>>
>> That works fine, but I think I figured out the problem, it was due to
>> the server having acquired a 2nd (autodiscovered) IPv6 address, and it
>> was using that as transfer source. It would be very helpful if the
>> logfile said the actual source IP, and not just 0.0.0.0#53 or ::#0. That
>> would help debugging a lot.
> 
> I have found that if you have multiple master addresses listed for a slave
> zone, named will not fall back to trying later addresses if the first one
> fails.
> 
> Tony.
> 
The speed of fall-back through the masters list may depend on whether or
not you set "try-tcp-refresh no;" in named.conf.

Another contributing factor is whether the failure mode is immediate
(ICMP error or connection failure) or has to time out from named's
perspective.
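
As a rough illustration of the setting mentioned above, a slave zone with more than one master and TCP refresh disabled might look like the fragment below. This is a sketch, not a recommendation: the file path and the second master address (192.0.2.53) are placeholders that do not come from this thread, and whether "try-tcp-refresh no;" suits a given setup depends on how the masters fail.

```
zone "example.com" {
    type slave;
    file "slaves/example.com.db";        // hypothetical path
    masters { 1.2.3.4; 192.0.2.53; };    // second address is a placeholder
    try-tcp-refresh no;                  // do not retry the refresh query over TCP when UDP fails
};
```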


___
Please visit https://lists.isc.org/mailman/listinfo/bind-users to unsubscribe 
from this list

bind-users mailing list
bind-users@lists.isc.org
https://lists.isc.org/mailman/listinfo/bind-users


Re: Stalling slave transfers

2013-05-15 Thread Tony Finch
Tom Sommer  wrote:
>
> That works fine, but I think I figured out the problem, it was due to
> the server having acquired a 2nd (autodiscovered) IPv6 address, and it
> was using that as transfer source. It would be very helpful if the
> logfile said the actual source IP, and not just 0.0.0.0#53 or ::#0. That
> would help debugging a lot.

I have found that if you have multiple master addresses listed for a slave
zone, named will not fall back to trying later addresses if the first one
fails.

Tony.
-- 
f.anthony.n.finch  http://dotat.at/
Forties, Cromarty: East, veering southeast, 4 or 5, occasionally 6 at first.
Rough, becoming slight or moderate. Showers, rain at first. Moderate or good,
occasionally poor at first.


Re: Stalling slave transfers

2013-05-14 Thread Tom Sommer


On 5/9/13 2:19 PM, Luther, Dan wrote: 

> Tom, 
> 
> What happens when you "dig +tcp example.com @1.2.3.4"? Specifically I'm 
> wondering here if the slave you're having problems with is blocking TCP port 
> 53. Such a configuration would allow you to query the master server, but not 
> transfer to/from it.

That works fine, but I think I figured out the problem, it was due to
the server having acquired a 2nd (autodiscovered) IPv6 address, and it
was using that as transfer source. It would be very helpful if the
logfile said the actual source IP, and not just 0.0.0.0#53 or ::#0. That
would help debugging a lot. 

I'm down to only seeing the error "retry limit for master" and "refresh:
failure trying master" on IPv6 now, and only occasionally. 

It also appears the master is sending two notifies for each zone, to
each slave, one on IPv4 and one on IPv6? 
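
For anyone hitting the same thing, one way to stop named from picking an autoconfigured address is to pin the source addresses explicitly. The fragment below is only a sketch: both addresses are documentation placeholders (not the ones on my servers) and should be replaced with the addresses the master actually allows transfers from.

```
options {
    // Placeholder addresses; use the ones the master's ACLs expect.
    transfer-source    192.0.2.10;      // IPv4 source for refresh queries and inbound transfers
    transfer-source-v6 2001:db8::10;    // IPv6 source for refresh queries and inbound transfers
};
```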

// Tom

RE: Stalling slave transfers

2013-05-09 Thread Luther, Dan
Tom, 

What happens when you "dig +tcp example.com @1.2.3.4"? Specifically I'm 
wondering here if the slave you're having problems with is blocking TCP port 
53. Such a configuration would allow you to query the master server, but not 
transfer to/from it.

Dan Luther
Operations Engineer
Systems Operation Engineering 
Level 3 Communications
One Technology Center, Tulsa OK 74103
e: dan.lut...@level3.com


-----Original Message-----
From: bind-users-bounces+dan.luther=level3@lists.isc.org 
[mailto:bind-users-bounces+dan.luther=level3@lists.isc.org] On Behalf Of 
Tom Sommer
Sent: Wednesday, May 08, 2013 1:16 PM
To: Cathy Almond
Cc: bind-users@lists.isc.org
Subject: Re: Stalling slave transfers


On 5/8/13 12:25 PM, Cathy Almond wrote:
> On 08/05/13 08:26, Tom Sommer wrote:
>> Hi,
>>
>> I have a problem with one of 3 slave servers, all set up the exact 
>> same way, with the exact same bind version and configuration.
>>
>> One slave has a problem transferring zones from the master.
>>
>> The logfiles are flooded with "received notify for zone" .. "refresh
>> in progress, refresh check queued" lines and "rndc status" returns a
>> constant high number of "soa queries in progress".
>> After a few hours the zones are transferred, so the connection to the
>> master is working, but there is a major delay. I tried resetting the
>> slave and transferring ALL slave zones again, which worked fine
>> instantly. The problem still appeared again after a few hours though.
>>
>> The master has three network-paths, one on external IP, one on
>> internal IP and one on IPv6. All 3 paths work fine, because the
>> transfers happen after an hour or so.
>>
>> There are no hints in the master's log.
>> The other two slaves are running perfectly, no errors or delays
>> whatsoever.
>>
>> Bind version 9.9.2-P2 (recently upgraded to).
>>
>> Any hints would be appreciated, as I feel like I've exhausted most options.
>>
>> Thank you.
> Have a look at this KB article (you'll need to register to view - but 
> registration is open to all):
>
> https://kb.isc.org/article/AA-00726/30/Tuning-your-BIND-configuration-effectively-for-zone-transfers-particularly-with-many-frequently-updated-zones.html
>
> Also - and this isn't covered in that article (yet) - if you're using 
> views, then use-alt-transfer-source defaults to 'yes'.  You might want 
> to set it explicitly to 'no' or to define alt-transfer-source and/or 
> alt-transfer-source-v6.
>
Thank you, great resource. I think I solved it with raising serial-query-limit, 
it's just odd that it's not required on the other two servers.

Another issue has arisen now though, the logfile is filled with lots of
named[5596]: zone example.com/IN: refresh: failure trying master
1.2.3.4#53 (source 0.0.0.0#0): operation canceled

But if I do a "dig example.com @1.2.3.4" it's working just fine. Same server as 
with the previous issue.

Any thoughts? Thank you.

// Tom


Re: Stalling slave transfers

2013-05-09 Thread Tom Sommer


On 5/9/13 11:36 AM, Cathy Almond wrote:

> I don't think you solved the problem - I think you moved it (or made it
> happen faster...)
>
> The refresh errors indicate that the master isn't responding to your
> slave for some reason.  That's what you'll need to investigate.  I would
> suggest auditing the differences between this slave and the others in
> their named configurations as well as their configured IP interfaces and
> routing tables.
>
> A pair of network packet traces (slave and the non-responding auth
> server) might also point you in the right direction.

Right, but when I perform a "dig" from the server OS, the transfer and 
network-communication work fine - so there are no signs as to why named 
can't connect to the master, but the OS can.


I'll do some more digging.

Thanks.


Re: Stalling slave transfers

2013-05-09 Thread Cathy Almond
On 08/05/13 19:15, Tom Sommer wrote:
> 
> On 5/8/13 12:25 PM, Cathy Almond wrote:
>> On 08/05/13 08:26, Tom Sommer wrote:
>>> Hi,
>>>
>>> I have a problem with one of 3 slave servers, all set up the exact same
>>> way, with the exact same bind version and configuration.
>>>
>>> One slave has a problem transferring zones from the master.
>>>
>>> The logfiles are flooded with "received notify for zone" .. "refresh in
>>> progress, refresh check queued" lines and "rndc status" returns a
>>> constant high number of "soa queries in progress".
>>> After a few hours the zones are transferred, so the connection to the
>>> master is working, but there is a major delay. I tried resetting the
>>> slave and transferring ALL slave zones again, which worked fine
>>> instantly. The problem still appeared again after a few hours though.
>>>
>>> The master has three network-paths, one on external IP, one on internal
>>> IP and one on IPv6. All 3 paths work fine, because the transfers happen
>>> after an hour or so.
>>>
>>> There are no hints in the master's log.
>>> The other two slaves are running perfectly, no errors or delays
>>> whatsoever.
>>>
>>> Bind version 9.9.2-P2 (recently upgraded to).
>>>
>>> Any hints would be appreciated, as I feel like I've exhausted most
>>> options.
>>>
>>> Thank you.
>> Have a look at this KB article (you'll need to register to view - but
>> registration is open to all):
>>
>> https://kb.isc.org/article/AA-00726/30/Tuning-your-BIND-configuration-effectively-for-zone-transfers-particularly-with-many-frequently-updated-zones.html
>>
>>
>> Also - and this isn't covered in that article (yet) - if you're using
>> views, then use-alt-transfer-source defaults to 'yes'.  You might want
>> to set it explicitly to 'no' or to define alt-transfer-source
>> and/or alt-transfer-source-v6.
>>
> Thank you, great resource. I think I solved it with raising
> serial-query-limit, it's just odd that it's not required on the other
> two servers.
> 
> Another issue has arisen now though, the logfile is filled with lots of
> named[5596]: zone example.com/IN: refresh: failure trying master
> 1.2.3.4#53 (source 0.0.0.0#0): operation canceled
> 
> But if I do a "dig example.com @1.2.3.4" it's working just fine. Same
> server as with the previous issue.
> 
> Any thoughts? Thank you.
> 
> // Tom

I don't think you solved the problem - I think you moved it (or made it
happen faster...)

The refresh errors indicate that the master isn't responding to your
slave for some reason.  That's what you'll need to investigate.  I would
suggest auditing the differences between this slave and the others in
their named configurations as well as their configured IP interfaces and
routing tables.

A pair of network packet traces (slave and the non-responding auth
server) might also point you in the right direction.

Cathy


Re: Stalling slave transfers

2013-05-08 Thread Tom Sommer


On 5/8/13 8:15 PM, Tom Sommer wrote:

> Another issue has arisen now though, the logfile is filled with lots of
> named[5596]: zone example.com/IN: refresh: failure trying master
> 1.2.3.4#53 (source 0.0.0.0#0): operation canceled



and

named[5596]: zone example.com/IN: refresh: retry limit for master 
1.2.3.4#53 exceeded (source 0.0.0.0#0)


// Tom


Re: Stalling slave transfers

2013-05-08 Thread Tom Sommer


On 5/8/13 12:25 PM, Cathy Almond wrote:

> On 08/05/13 08:26, Tom Sommer wrote:
>> Hi,
>>
>> I have a problem with one of 3 slave servers, all set up the exact same
>> way, with the exact same bind version and configuration.
>>
>> One slave has a problem transferring zones from the master.
>>
>> The logfiles are flooded with "received notify for zone" .. "refresh in
>> progress, refresh check queued" lines and "rndc status" returns a
>> constant high number of "soa queries in progress".
>> After a few hours the zones are transferred, so the connection to the
>> master is working, but there is a major delay. I tried resetting the
>> slave and transferring ALL slave zones again, which worked fine
>> instantly. The problem still appeared again after a few hours though.
>>
>> The master has three network-paths, one on external IP, one on internal
>> IP and one on IPv6. All 3 paths work fine, because the transfers happen
>> after an hour or so.
>>
>> There are no hints in the master's log.
>> The other two slaves are running perfectly, no errors or delays
>> whatsoever.
>>
>> Bind version 9.9.2-P2 (recently upgraded to).
>>
>> Any hints would be appreciated, as I feel like I've exhausted most options.
>>
>> Thank you.
>
> Have a look at this KB article (you'll need to register to view - but
> registration is open to all):
>
> https://kb.isc.org/article/AA-00726/30/Tuning-your-BIND-configuration-effectively-for-zone-transfers-particularly-with-many-frequently-updated-zones.html
>
> Also - and this isn't covered in that article (yet) - if you're using
> views, then use-alt-transfer-source defaults to 'yes'.  You might want
> to set it explicitly to 'no' or to define alt-transfer-source
> and/or alt-transfer-source-v6.

Thank you, great resource. I think I solved it with raising 
serial-query-limit, it's just odd that it's not required on the other 
two servers.
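
For reference, the change was along the lines of the fragment below. Treat it as a sketch under an assumption: the named.conf option that limits refresh (SOA) queries is, as far as I know, spelled serial-query-rate, and the value shown is purely illustrative rather than the one I ended up with.

```
options {
    // Assumed option name: serial-query-rate caps refresh (SOA) queries sent per second.
    serial-query-rate 50;    // illustrative value only
};
```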


Another issue has arisen now though, the logfile is filled with lots of
named[5596]: zone example.com/IN: refresh: failure trying master 
1.2.3.4#53 (source 0.0.0.0#0): operation canceled


But if I do a "dig example.com @1.2.3.4" it's working just fine. Same 
server as with the previous issue.


Any thoughts? Thank you.

// Tom


Re: Stalling slave transfers

2013-05-08 Thread Cathy Almond
On 08/05/13 08:26, Tom Sommer wrote:
> Hi,
> 
> I have a problem with one of 3 slave servers, all set up the exact same
> way, with the exact same bind version and configuration.
> 
> One slave has a problem transferring zones from the master.
> 
> The logfiles are flooded with "received notify for zone" .. "refresh in
> progress, refresh check queued" lines and "rndc status" returns a
> constant high number of "soa queries in progress".
> After a few hours the zones are transferred, so the connection to the
> master is working, but there is a major delay. I tried resetting the
> slave and transferring ALL slave zones again, which worked fine
> instantly. The problem still appeared again after a few hours though.
> 
> The master has three network-paths, one on external IP, one on internal
> IP and one on IPv6. All 3 paths work fine, because the transfers happen
> after an hour or so.
> 
> There are no hints in the master's log.
> The other two slaves are running perfectly, no errors or delays
> whatsoever.
> 
> Bind version 9.9.2-P2 (recently upgraded to).
> 
> Any hints would be appreciated, as I feel like I've exhausted most options.
> 
> Thank you.

Have a look at this KB article (you'll need to register to view - but
registration is open to all):

https://kb.isc.org/article/AA-00726/30/Tuning-your-BIND-configuration-effectively-for-zone-transfers-particularly-with-many-frequently-updated-zones.html

Also - and this isn't covered in that article (yet) - if you're using
views, then use-alt-transfer-source defaults to 'yes'.  You might want
to set it explicitly to 'no' or to define alt-transfer-source
and/or alt-transfer-source-v6.
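
A minimal sketch of the two alternatives, assuming the settings go in the global options block (they can also be set per view or per zone). The addresses are documentation placeholders, not values taken from your configuration:

```
options {
    // Alternative 1: never fall back to an alternate source address.
    use-alt-transfer-source no;

    // Alternative 2: leave fallback enabled but name the alternate sources explicitly.
    // alt-transfer-source    192.0.2.11;
    // alt-transfer-source-v6 2001:db8::11;
};
```

Setting the option explicitly means the behaviour no longer depends on the default that applies to your configuration.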