Re: Notify storms

2010-01-21 Thread Todd
I agree - we are removing 1/2 the masters in a couple weeks to help
things.  Slaves only talk to masters, there are no slaves of slaves
as we refer to them.  Our architecture goal has been uniformity among
the configurations, and this is part of the price we pay for that.

At a functional level, this will persist, just not with as much
traffic.  As a result of our use of CVS, and our inability to control
notifies, whenever we push out big updates, the first master takes all
the traffic, while the others sit there unused.  We've experimented
with breaking up the notifies by pushing updates in chunks to various
masters, but that really breaks things, both process-wise and
logically.

What we really need is some way to say "there is an update, any one of
these servers has an acceptable version of the update" instead of
"hey, there's an update" "hey, there's an update" "hey, there's an
update" and having each slave go to each of the masters.  While it
seems trivial operationally to handle these loads, and we're not
concerned about network bandwidth, a single master can only manage so
many transactions at once.  I doubt we're even in the top 50% of
deployments, zone count wise, so I am confident that our number of
zones isn't an issue.  But I suspect that 80+ slaves are a little out
there.  With 80 slaves and 1800 zones, each master sends out 144000
notifies (for major changes/a master reload), which triggers 144000
SOA queries back to the master very quickly.  That is bound to cause
delays.
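
For reference, the arithmetic behind those figures can be sketched as below. This is only a rough model: it assumes every master notifies every slave for every zone, and that each NOTIFY triggers exactly one SOA query back. The function name is illustrative, not something from our tooling.

```python
def notify_load(slaves: int, zones: int) -> dict:
    """Estimate per-master traffic after a full reload.

    Assumes one NOTIFY per (slave, zone) pair and one SOA query
    back to the master for each NOTIFY received.
    """
    notifies = slaves * zones
    soa_queries = notifies  # each NOTIFY prompts an SOA refresh check
    return {"notifies": notifies, "soa_queries": soa_queries}


load = notify_load(slaves=80, zones=1800)
print(load["notifies"], load["soa_queries"])
```

Halving the number of masters does not change the per-master figure in this model; it only reduces how many masters each slave has to query.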

One option we've considered is making our MASTERS and NS records point
to an anycast IP/load balancer so that there can be multiple masters
answering for the same notify.  Another option would be to stop all
notifies altogether, then figure out a way to manually trigger
notifies (generating notifies via a Perl script or something clever) so we
can control where the notifies come from.  When all the EU DNS servers
get notifies first from an NA master, they grab the data from there,
so being able to control notifies would be nice sometimes.

Thankfully we're mid-rearchitecture, and this will (hopefully) be torn
out soon, but until it is we need to make sure that our users can
manage their changes in a reasonable manner.  A for loop doing rndc
retransfer for changed zones, which seems to bypass all the
congestion, is a short term fix until we can figure out how to make
things a little smoother.
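
A minimal sketch of that loop, assuming it runs on a slave where `rndc` is already configured. The zone names and the `dry_run` switch are made up for illustration; the only real command used is `rndc retransfer <zone>`.

```python
import subprocess


def retransfer_commands(zones, rndc="rndc"):
    """Build one `rndc retransfer <zone>` command per changed zone."""
    return [[rndc, "retransfer", zone] for zone in zones]


def retransfer(zones, dry_run=True):
    """Run the commands on a slave; with dry_run, only print them."""
    for cmd in retransfer_commands(zones):
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if rndc reports a failure
            subprocess.run(cmd, check=True)


# Hypothetical zone names, for illustration only.
retransfer(["example.com", "example.net"])
```

Running it with `dry_run=False` issues the retransfers one at a time, which also serializes the load on the master.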

Apologies for the wall of text - this is a frequent discussion with
very little in the way of conclusion around here :)

Todd.




On Wed, Jan 20, 2010 at 10:33 PM, Joseph S D Yao j...@tux.org wrote:
 On Wed, Jan 20, 2010 at 03:52:33PM -0500, Todd wrote:
  serial-query-rate

 While this appears to be helping in the lab, it's still taking between
 2 and 3 minutes for each slave to even finish receiving the NOTIFYs
 from the master.  They then start hitting the master(s) with SOA
 queries which seems to take a really long time.


 Your NOTIFY tree sounds like it's many-to-many.  Maybe you should be
 using a sparser tree.


 --
 /*\
 **
 ** Joe Yao                              j...@tux.org - Joseph S. D. Yao
 **
 \*/

___
bind-users mailing list
bind-users@lists.isc.org
https://lists.isc.org/mailman/listinfo/bind-users


named 9.6.1 Filling wtmp

2010-01-21 Thread David Kreindler
We have BIND 9.6.1-P3 running on several AIX 5.3 servers. On one of them, named 
is filling /var/adm/wtmp with numerous entries like the following.

user pts/1 pts/1 7 1327240   1264089183 host-NN.domain Thu Jan 21 10:53:03 EST 2010
 named   8 2572472   1264089217  Thu Jan 21 10:53:37 EST 2010
 named   8 2572472   1264089217  Thu Jan 21 10:53:37 EST 2010
 named   8 2572472   1264089277  Thu Jan 21 10:54:37 EST 2010
 named   8 2572472   1264089277  Thu Jan 21 10:54:37 EST 2010
 named   8 2572472   1264089337  Thu Jan 21 10:55:37 EST 2010
 named   8 2572472   1264089337  Thu Jan 21 10:55:37 EST 2010
 named   8 2572472   1264089337  Thu Jan 21 10:55:37 EST 2010
 named   8 2572472   1264089397  Thu Jan 21 10:56:37 EST 2010
 named   8 2572472   1264089397  Thu Jan 21 10:56:37 EST 2010
 named   8 2572472   1264089397  Thu Jan 21 10:56:37 EST 2010
 named   8 2572472   1264089457  Thu Jan 21 10:57:37 EST 2010
 named   8 2572472   1264089457  Thu Jan 21 10:57:37 EST 2010
 named   8 2572472   1264089457  Thu Jan 21 10:57:37 EST 2010
 named   8 2572472   1264089517  Thu Jan 21 10:58:37 EST 2010
 named   8 2572472   1264089517  Thu Jan 21 10:58:37 EST 2010
 named   8 2572472   1264089517  Thu Jan 21 10:58:37 EST 2010
 named   8 2572472   1264089577  Thu Jan 21 10:59:37 EST 2010
 named   8 2572472   1264089577  Thu Jan 21 10:59:37 EST 2010
 named   8 2572472   1264089577  Thu Jan 21 10:59:37 EST 2010
 named   8 2572472   1264089637  Thu Jan 21 11:00:37 EST 2010
 named   8 2572472   1264089637  Thu Jan 21 11:00:37 EST 2010
 named   8 2572472   1264089637  Thu Jan 21 11:00:37 EST 2010
 named   8 2572472   1264089697  Thu Jan 21 11:01:37 EST 2010
 named   8 2572472   1264089697  Thu Jan 21 11:01:37 EST 2010
 named   8 2572472   1264089697  Thu Jan 21 11:01:37 EST 2010
 named   8 2572472   1264089757  Thu Jan 21 11:02:37 EST 2010
 named   8 2572472   1264089757  Thu Jan 21 11:02:37 EST 2010
 named   8 2572472   1264089757  Thu Jan 21 11:02:37 EST 2010
 named   8 2572472   1264089817  Thu Jan 21 11:03:37 EST 2010
 named   8 2572472   1264089817  Thu Jan 21 11:03:37 EST 2010
 named   8 2572472   1264089817  Thu Jan 21 11:03:37 EST 2010
 named   8 2572472   1264089877  Thu Jan 21 11:04:37 EST 2010
 named   8 2572472   1264089877  Thu Jan 21 11:04:37 EST 2010
 named   8 2572472   1264089937  Thu Jan 21 11:05:37 EST 2010
 named   8 2572472   1264089937  Thu Jan 21 11:05:37 EST 2010

What is going on? How do we correct this issue?



Re: Strange CNAME issue

2010-01-21 Thread seren
Thanks for your response. I didn't know about the +trace option in dig. After 
some more searching, I believe you are correct about the long responses being 
related. The responses that fail all seem to exceed 512 bytes. Why this would 
happen in multiple locations is a mystery, but perhaps our firewalls are 
configured similarly. I'll look into the firewall settings on my end, but since 
there may be other devices out there configured similarly, I'll need to try to 
reduce my CNAMEs to fit into a 512-byte response or switch them to A records.

 -seren

On Jan 20, 2010, at 1:48 AM, Niall O'Reilly wrote:

 seren wrote:
 Hi, I've run into some strange issues with BIND and CNAMES.
 
   The examples you show indicate strange issues only with
   whatever name server code is running on your localhost.
   Nothing in your examples actually identifies this as BIND.
 
 We're using BIND9 (on Ubuntu)
 internally and have our external DNS hosted by NetworkSolutions. 
 Occasionally I'll be able
 to create a CNAME in NetworkSolutions that BIND is unable to resolve.
 Using dig I notice it's doing a query for an A record,
 
   This is the record type used by dig when no specific
   type is given on the command line.
 
 and in most cases this works even
 if the entry is a CNAME. In the cases where it fails, I see either a timeout 
 error or a
 SERVFAIL.
 
   Your local instance of named is respectively either not
   responding, or reporting an error.
 
   Have you looked in your logs for more information?
   Have you tried 'dig +trace'?
 
 If I then do a dig query specifying a CNAME, I get a quick successful result
 and subsequent queries to BIND succeed, until the record expires from the 
 cache.
 The records that fail don't seem to have anything in common besides them all 
 being
 CNAMES and longer names seeming to fail more. Both BIND9 and two 
 windows-based DNS
 servers fail with the exact same records, however Google (8.8.8.8) and 
 several other
 public DNS services resolve them fine.
 
   I think you need to ask what's different between (on the one
   hand) your BIND9 and windows-based name servers and (on the
   other) name servers which you tell us work: if not in the
   configuration, then in the environment.
 
   Are all of your failing name servers behind the same firewall?
   If so, does the firewall allow DNS queries and responses over
   TCP as well as UDP?  Does the firewall perhaps break long
   responses?  I ask because I've noticed some truncation
   and fallback to TCP when I use 'dig +trace' to query for one of
   the names you've mentioned as failing.
 
 
   Best regards,
 
   Niall O'Reilly
   University College Dublin IT Services
 



Added new master zone, copy .hosts does not replicate properly

2010-01-21 Thread Ryan S

We run two BIND servers in a master/master configuration in two geographically 
separate datacenters.  This is done because a master/slave configuration has 
certain limitations if the master goes down (the slave cannot be easily modified).

 

The setup we have works great -- we make changes to A or MX records in 
MASTER-A and there is a script that runs every 5 minutes that copies these 
files to MASTER-B:

 

/var/named/chroot/var/named/*.hosts and *.rev

 

So I created a new master zone on MASTER-A, call it test.com, which creates a 
new file, test.com.hosts, on MASTER-A.

 

This file is replicated (copied) to MASTER-B, but even though the script reloads 
BIND on MASTER-B, the new zone does not work.  I think there might be a file that 
has a list of all the hosts, or is there some other file I also need to copy over?  
I checked the directory structures and cannot find it.

 

So my setup has been working great for modifying existing zones, adding and 
removing records.  But when I add a new zone, it apparently does not work.  So I 
think I am missing an important file that lists all the zones BIND uses.  Which 
BIND file do I need to copy to properly replicate MASTER-A to MASTER-B?  I hope 
this is something very simple.
  

Re: Added new master zone, copy .hosts does not replicate properly

2010-01-21 Thread Michael Sinatra

On 1/21/10 3:40 PM, Ryan S wrote:


So my setup has been working great modifying existing zones adding
and removing records.  But when I add a new zone, it apparently does
not work.  So I think I am missing an important file that lists all
the zones BIND uses?   What BIND file am I needing to copy to
properly replicate MASTER-A to MASTER-B?  I hope this is something
very simple ..


named.conf, perhaps?
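
A quick way to confirm that, sketched under some assumptions: each master's named.conf declares zones with plain `zone "name" { ... };` statements, with no `include` files or views involved. The configuration text below is invented for illustration, and the function names are hypothetical.

```python
import re

# Matches the opening of a zone statement, e.g.  zone "test.com" {
ZONE_RE = re.compile(r'^\s*zone\s+"([^"]+)"', re.MULTILINE)


def declared_zones(named_conf_text):
    """Return the set of zone names declared in named.conf text."""
    return set(ZONE_RE.findall(named_conf_text))


def missing_zones(conf_a, conf_b):
    """Zones declared on MASTER-A but absent from MASTER-B."""
    return sorted(declared_zones(conf_a) - declared_zones(conf_b))


# Hypothetical configuration snippets for the two masters.
master_a = '''
zone "example.org" { type master; file "example.org.hosts"; };
zone "test.com" { type master; file "test.com.hosts"; };
'''
master_b = '''
zone "example.org" { type master; file "example.org.hosts"; };
'''
print(missing_zones(master_a, master_b))
```

Any zone this reports needs its `zone` statement added to named.conf on MASTER-B; copying the .hosts file alone does not make named load it.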


Re: named 9.6.1 Filling wtmp

2010-01-21 Thread Mark Andrews

In message 6b845b73-065f-4e8b-afa5-408ecdbe7...@govnet.state.vt.us, David Kreindler writes:
 We have BIND 9.6.1-P3 running on several AIX 5.3 servers. On one of them, named
 is filling /var/adm/wtmp with numerous entries like the following.

This is not named (the program).  It may be su or some other process that
is logging changes in uid.  Or it could be someone logging in as the user
named.

Mark
 
 user pts/1 pts/1 7 1327240   1264089183 host-NN.domain Thu Jan 21 10:53:03 EST 2010
  named   8 2572472   1264089217  Thu Jan 21 10:53:37 EST 2010
  named   8 2572472   1264089217  Thu Jan 21 10:53:37 EST 2010
  named   8 2572472   1264089277  Thu Jan 21 10:54:37 EST 2010
-- 
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742 INTERNET: ma...@isc.org


Re: Fatal Error in resolver.c

2010-01-21 Thread Jeremy C. Reed
Thank you very much for your bug report. For your
information, you can also submit bugs to our bind9-bugs
AT isc.org email address.  Your issue is now being
tracked as ticket # 20923.


BIND 9.6 - IXFR to slaves not working after manual zone edit.

2010-01-21 Thread Pete McNeil

Hi folks,

According to DNS and BIND (the book) and other sources, BIND from 9.3 on 
is able to calculate differences for a zone file that is manually edited 
by using the following methodology:


rndc freeze zone
perform-updates-on-the-zone-file
rndc thaw zone

http://books.google.com/books?id=HggtWI1ShvMC&pg=PT263&dq=%22this+means+you+can+now+%28or+again%29+edit+zone+datafiles+manually.%22&cd=1#v=onepage&q=&f=false

The setting "ixfr-from-differences yes;" appears in options { } on 
all servers, both the master and the slaves.


I have created a script that does what is specified. The 
perform-updates-on-the-zone-file part of the process reads a list of IPs 
from a proprietary database and recreates the zone file. The result is 
the same zone file with a new serial number and some changes in the IPs 
that are included (A records for a blocking list). The rest of the file 
appears unchanged. This is as if the file had been edited by hand to 
change a few of the A records and update the serial number with no other 
changes.


I am performing this operation on an RHEL 5 server using an RPM from 
FC that includes BIND 9.6.1. BIND (named) is running chrooted.


The zone file is located in /var/named/chroot/var/named/dynamic so that 
named can write to it and create the .jnl file.


The design of the system is for this server to calculate the differences 
and then provide the changes to slave servers via IXFR for delivery to 
end user systems etc.


The problem:

With each update the entire zone is being retransmitted via AXFR. Since 
this is on the order of 2 - 5 million A records where only a handful 
will have changed between updates (about 10 minutes apart) -- AXFR is 
_NOT_ desired.


Additional symptoms:

BIND appears to be unable to see the .jnl file after it is successfully 
created. It complains at each update that the .jnl file does not exist 
and then recreates it. However it is clear that the .jnl file _does_ 
exist and has the correct permissions for named to see it.



Log says:

journal file /var/named/dynamic/blacklist.example.com.zone.jnl does not exist, creating it


However ls says:

[r...@server etc]# ls -l /var/named/chroot/var/named/dynamic
total 56704
-rwxrwx--- 1 named named 57852592 Jan 21 20:11 blacklist.example.com.zone
-rw-r--r-- 1 named named   141770 Jan 21 20:12 blacklist.example.com.zone.jnl


Additional info:

The master and slave servers are using the same version of BIND in the 
same configuration. The slave servers are working correctly as a 
resolver farm and have been for quite some time. Once transferred 
queries to the zone work as expected. The problem is strictly how the 
transfers are done.


As an experiment I tried using nsupdate to change a single record at a 
time in the zone. Each time I changed a single record the slaves were 
notified and performed a full download of the zone again via AXFR.


Anyone have any idea what's going on?
Why doesn't IXFR work as specified?
Why does BIND complain that the .jnl file for the zone does not exist 
when it is clearly visible?
Why does a single record change via nsupdate also cause a full zone 
transfer to slaves?


Thanks in advance!

_M



Re: Strange CNAME issue

2010-01-21 Thread Mark Andrews

In message a9981203-ca2a-4ba2-b95b-08d992178...@mellmo.com, seren writes:
 
 Thanks for your response. I didn't know about the +trace option in dig.
 After some more searching, I believe you are correct about the long
 responses being related. The responses that fail all seem to exceed
 512 bytes. Why this would happen in multiple locations is a mystery, but
 perhaps our firewalls are configured similarly. I'll look into the
 firewall settings on my end, but since there may be other devices out
 there configured similarly, I'll need to try to reduce my CNAMEs to fit
 into a 512-byte response or switch them to A records.
 
  -seren

Some firewall vendors / operators think that all UDP DNS responses
are <= 512 bytes of payload.  This has not been the case officially
for over a decade now with EDNS, and was never the case in practice, as
there have always been servers that sent larger responses for as long
as I've been working with DNS, ~20 years now.

Some firewall vendors / operators think that TCP DNS is only used
for AXFR.  This has *never* been the case.

One or both of these may be the problem.
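
To see which case applies to a captured response, something like the sketch below could be used. It only inspects the size of a raw DNS message and the TC (truncated) bit in its header; the sample bytes are synthetic, and capturing real responses (for example with tcpdump) is left out. The names are illustrative.

```python
import struct

CLASSIC_UDP_LIMIT = 512  # bytes; the pre-EDNS maximum UDP DNS payload
TC_FLAG = 0x0200         # "truncated" bit in the DNS header flags word


def describe_response(message: bytes) -> dict:
    """Report size and TC bit for a raw DNS message."""
    if len(message) < 12:
        raise ValueError("DNS header is 12 bytes; message too short")
    # Header: ID, flags, QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT (16 bits each)
    _id, flags, *_counts = struct.unpack("!6H", message[:12])
    return {
        "size": len(message),
        "over_512": len(message) > CLASSIC_UDP_LIMIT,
        "truncated": bool(flags & TC_FLAG),
    }


# Synthetic header: a response (QR=1) with TC set, and no records.
sample = struct.pack("!6H", 0x1234, 0x8200, 0, 0, 0, 0)
print(describe_response(sample))
```

A UDP response over 512 bytes that never arrives points at the first problem; a truncated response followed by a failed TCP retry points at the second.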

Mark

-- 
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742 INTERNET: ma...@isc.org


Re: BIND 9.6 - IXFR to slaves not working after manual zone edit.

2010-01-21 Thread Mark Andrews

First of all, decide where you want the delta to be computed.  On
the master or on the slaves?  Currently you are asking all servers
to compute the delta.

Secondly, if you are not using dynamic update to modify the master
zone, you don't need to freeze the master zone, as named is NOT writing
to it.  Simply replacing the file and reloading will work.

Thirdly rndc freeze removes the journal.

Fourthly ixfr-from-differences forces the slave to make AXFR requests.

Lastly, dynamic zones and ixfr-from-differences are currently
mutually exclusive.  This may or may not change in future releases.
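
The "replace and reload" step from the second point could be scripted roughly as follows. This is a sketch, not a tested procedure: the function names, the file mode, and the example zone name are assumptions, and the script is assumed to run as a user whose files named can read. The only real command involved is `rndc reload <zone>`.

```python
import os
import tempfile


def replace_zone_file(path, new_contents, mode=0o644):
    """Atomically replace a zone file so named never reads a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".zone-")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(new_contents)
        os.chmod(tmp_path, mode)    # mkstemp creates the file as 0600
        os.replace(tmp_path, path)  # atomic rename within one filesystem
    except BaseException:
        os.unlink(tmp_path)
        raise


def reload_command(zone, rndc="rndc"):
    """Command to make named re-read a single zone from disk."""
    return [rndc, "reload", zone]


# Hypothetical zone name; pass the command to subprocess.run to apply it.
print(" ".join(reload_command("blacklist.example.com")))
```

Writing to a temporary file in the same directory and renaming it keeps the swap atomic, so a reload never sees a half-written zone.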

Mark

-- 
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742 INTERNET: ma...@isc.org