Re: BIND 9.7 Serial Number Decrease Problem

2011-06-17 Thread John Wobus

Barry Finkel wrote:

> I ran a test this morning on one of the Solaris 10 slave servers.
> A query to the server showed serial numbers:
>
>  _tcp   1238
>  _udp   842
>
> Both of these match the zone on the MS Windows DNS Server.
> I checked the zone files on the slave server:
>
>  _tcp   1239
>  _udp   843
>
> Both of these are increased by one from what BIND returns in
> response to a query.
>
> The two zones have NO .jnl files.
>
> I did
>
>  ./rndc stop
>  Wait for the exiting message.
>  /etc/init.d/named.anl start;tail -f /var/adm/messages
>
> Once BIND started, the serial numbers were INCREASED, as I
> expected they would be, given the lack of .jnl files.
>
> And a few minutes later BIND complained about the serial
> number on the master being less than that on the slave
> for both zones.  I consider this a bug in BIND 9.
> What further diagnostics do I need to get?
>
> I have another Solaris 10 slave on which, I assume, I can
> duplicate this.  And from past experience, in one day, after
> the zone has expired and been refreshed, I will be in the same
> state on this slave.


Do bind slave instances EVER make up or increment serial
numbers?  This just seems like such an unlikely bug for
bind to start exhibiting.  Could it be that
the supposed slave instance is accepting dynamic updates?

I'd be tracing/tracking SOA files on the master, and communications
between the dns instances very closely before I'd even
give such a potential bug much thought. Perhaps there are
bind functions that I'm not aware of and I'm wrong.

John

___
Please visit https://lists.isc.org/mailman/listinfo/bind-users to unsubscribe 
from this list

bind-users mailing list
bind-users@lists.isc.org
https://lists.isc.org/mailman/listinfo/bind-users


Re: BIND 9.7 Serial Number Decrease Problem

2011-06-10 Thread Barry Finkel

I ran a test this morning on one of the Solaris 10 slave servers.
A query to the server showed serial numbers:

 _tcp   1238
 _udp   842

Both of these match the zone on the MS Windows DNS Server.
I checked the zone files on the slave server:

 _tcp   1239
 _udp   843

Both of these are increased by one from what BIND returns in
response to a query.

The two zones have NO .jnl files.

I did

 ./rndc stop
 Wait for the exiting message.
 /etc/init.d/named.anl start;tail -f /var/adm/messages

Once BIND started, the serial numbers were INCREASED, as I
expected they would be, given the lack of .jnl files.

And a few minutes later BIND complained about the serial
number on the master being less than that on the slave
for both zones.  I consider this a bug in BIND 9.
What further diagnostics do I need to get?

I have another Solaris 10 slave on which, I assume, I can
duplicate this.  And from past experience, in one day, after
the zone has expired and been refreshed, I will be in the same
state on this slave.
-
--
Barry S. Finkel
Computing and Information Systems Division
Argonne National Laboratory  Phone:+1 (630) 252-7277
9700 South Cass Avenue   Facsimile:+1 (630) 252-4601
Building 240, Room 5.B.8 Internet: bsfin...@anl.gov
Argonne, IL   60439-4828 IBMMAIL:  I1004994


Re: BIND 9.7 Serial Number Decrease Problem

2011-06-07 Thread Phil Mayers

On 06/06/2011 08:01 PM, Barry Finkel wrote:


> Phil Mayers suggested a corrupt .jnl file; I am not sure.
> How do I debug this?


Given what Mark has said, I think it's unlikely; I didn't realise bind 
wrote a new journal and did a rename() which is atomic on every POSIX 
system that you're likely to be using.


So, ignore what I said!
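Phil's point, that a file which is written out completely and then rename()d into place can never be observed half-written, is the generic write-then-rename pattern. The Python sketch below illustrates only that generic pattern under the assumption of a POSIX filesystem; it is not BIND's journal code, and `atomic_write` and the file name are hypothetical. `os.replace` performs the atomic rename step.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace `path` with `data` so readers never see a partial file.

    Hypothetical helper illustrating write-temp-then-rename; not from BIND.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must be on the same filesystem as the target,
    # otherwise the rename cannot be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # get the new bytes onto disk first
        os.replace(tmp_path, path)  # atomic rename over the old file
    except BaseException:
        # On failure the old file is untouched; remove the temporary copy.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "example.jnl")  # hypothetical file name
        atomic_write(target, b"old contents\n")
        atomic_write(target, b"new contents\n")
        with open(target, "rb") as f:
            print(f.read())
```

If the process is killed (even with kill -9) before `os.replace`, the target still holds the complete old contents; afterwards it holds the complete new contents.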


Re: BIND 9.7 Serial Number Decrease Problem

2011-06-07 Thread Barry Finkel

In my last posting I was confused as to the .jnl file.  I have
about 44 AD slave files on my BIND servers, and 40 .jnl files.
The two zones in question do not have .jnl files.  As I do not
look at .jnl files much, I had forgotten about the tool to
list them.

I now have this situation on one Solaris 10 slave; the problem
probably also exists on the other Sol 10 slave and the two
Ubuntu hardy slaves:

The _tcp zone on the master MS DNS Server:

 1238 600 86400 3600

The _tcp zone on the BIND 9.7.3-P1 Solaris 10 server disk:

 1239   ; serial
 900    ; refresh (15 minutes)
 600    ; retry (10 minutes)
 86400  ; expire (1 day)
 3600   ; minimum (1 hour)

The _udp zone on the master MS DNS Server:

 842 900 600 86400 3600

The _udp zone on the BIND 9.7.3-P1 Solaris 10 server disk:

 843    ; serial
 900    ; refresh (15 minutes)
 600    ; retry (10 minutes)
 86400  ; expire (1 day)
 3600   ; minimum (1 hour)
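Comparing the on-disk serial with what a query returns is easy to automate. A minimal sketch, assuming the SOA numbers appear in the standard order (serial, refresh, retry, expire, minimum) as in the listings above; `parse_soa_numbers` is a hypothetical helper that only understands this fragment layout, not full master-file syntax.

```python
import re
from typing import Dict

SOA_FIELDS = ("serial", "refresh", "retry", "expire", "minimum")


def parse_soa_numbers(fragment: str) -> Dict[str, int]:
    """Extract the five numeric SOA fields from a zone-file fragment.

    Hypothetical helper: handles numbers optionally followed by a
    ';' comment, one or more per line; not a general zone-file parser.
    """
    numbers = []
    for line in fragment.splitlines():
        code = line.split(";", 1)[0]  # drop the comment part
        numbers.extend(int(tok) for tok in re.findall(r"\b\d+\b", code))
    if len(numbers) != len(SOA_FIELDS):
        raise ValueError(f"expected 5 SOA numbers, found {len(numbers)}")
    return dict(zip(SOA_FIELDS, numbers))


on_disk = """
 1239   ; serial
 900    ; refresh (15 minutes)
 600    ; retry (10 minutes)
 86400  ; expire (1 day)
 3600   ; minimum (1 hour)
"""

soa = parse_soa_numbers(on_disk)
answered_serial = 1238  # what a query to the slave returned, per the post
print(soa["serial"], "on disk vs", answered_serial, "in memory")
```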

Note that the zone serial number for both zones on the master is
one LESS than the serial number on the slave.  The last messages
in /var/adm/messages are

 _tcp:
 Jun  4 07:46:57 serial number (1238) received from master ... < ours (1239)

 Jun  4 07:47:35 zone ... expired
 Jun  4 07:47:35 zone ... transfer started
 Jun  4 07:47:35 zone ... transferred serial 1238
 Jun  4 07:47:35 zone ... Transfer completed: ...

 _udp:
 Jun  4 07:39:22 serial number (842) received from master ... < ours (843)

 Jun  4 07:42:22 zone ... expired
 Jun  4 07:42:22 zone ... transfer started
 Jun  4 07:42:22 zone ... transferred serial 842
 Jun  4 07:42:22 zone ... Transfer completed
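The "received from master ... < ours" messages come from the slave comparing its SOA serial with the master's. DNS serials are compared using RFC 1982 serial-number arithmetic (32-bit, wrapping), and a slave only transfers when the master's serial is newer, so 1238 against 1239 means "keep ours". A small Python sketch of that comparison; `serial_lt` and `should_transfer` are hypothetical names, and BIND's own implementation is not reproduced here.

```python
SERIAL_BITS = 32
MODULUS = 1 << SERIAL_BITS
HALF = 1 << (SERIAL_BITS - 1)


def serial_lt(s1: int, s2: int) -> bool:
    """True if s1 precedes s2 in RFC 1982 serial-number arithmetic.

    Pairs exactly 2**31 apart are undefined in RFC 1982; this sketch
    treats them as "not less than".
    """
    diff = (s2 - s1) % MODULUS
    return diff != 0 and diff < HALF


def should_transfer(master_serial: int, our_serial: int) -> bool:
    """A slave only wants a new copy when the master's serial is newer."""
    return serial_lt(our_serial, master_serial)


for master, ours in [(1238, 1239), (842, 843), (1239, 1238)]:
    action = "transfer" if should_transfer(master, ours) else "keep ours"
    print(f"master {master} vs ours {ours}: {action}")
```

Because of the wraparound rule, a master that really needs to go backwards has to step forward through the serial space (RFC 1982) rather than simply decrement, which is why a decrease on the Windows side leaves the slaves stuck until the zone expires.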

There was a zone serial number mismatch, each zone expired three days
ago, and new zones were transferred from the master.  But the zone
files on disk still have the higher serial numbers.  There are no .jnl
files on the disk.  A dig on the server for the zone serial numbers
shows the lower numbers, so BIND has those correct serial numbers.  I
assume that if I stopped BIND (rndc stop) and restarted it, then I
would again see the serial number mismatches.  I can try this during
the day, as this server is not heavily used.  Is there any debugging I
need to run?  Thanks.
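The expire/re-transfer cycle in these logs can be modelled roughly. This is a simplified sketch, not BIND's scheduler: it assumes the expire clock counts from the last successful refresh, that a check which finds a lower master serial does not count as successful (consistent with the "< ours" message shortly before each "expired" above), and it ignores the retry interval. `SlaveZone` and `simulate` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SlaveZone:
    serial: int             # serial the slave is currently serving
    last_good_refresh: int  # time (seconds) of the last successful refresh


def simulate(zone: SlaveZone, master_serial: int, refresh: int,
             expire: int, horizon: int) -> List[Tuple[int, str]]:
    """Replay periodic refresh checks and return the notable events.

    Simplified model (assumption): checks every `refresh` seconds; a
    lower master serial is refused and does not restart the expire
    clock; after `expire` seconds without a good check the zone expires
    and a full transfer takes the master's serial. Plain integer
    comparison is used, which is fine for serials far from 2**32.
    """
    events: List[Tuple[int, str]] = []
    for now in range(refresh, horizon + 1, refresh):
        if now - zone.last_good_refresh >= expire:
            zone.serial = master_serial
            zone.last_good_refresh = now
            events.append((now, f"expired; transferred serial {master_serial}"))
        elif master_serial < zone.serial:
            events.append((now, f"master {master_serial} < ours {zone.serial}"))
        else:
            if master_serial > zone.serial:
                events.append((now, f"transferred serial {master_serial}"))
                zone.serial = master_serial
            zone.last_good_refresh = now
    return events


# After a restart the slave reloads 1239 from disk; the master serves 1238.
# Timer values are the SOA refresh (900) and expire (86400) shown above.
zone = SlaveZone(serial=1239, last_good_refresh=0)
events = simulate(zone, master_serial=1238, refresh=900, expire=86400,
                  horizon=86400)
print(len(events), "events; last:", events[-1])
```

Under these assumptions the slave refuses 1238 for a full day and only converges when the zone expires, matching the "in one day" pattern described in the thread.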

--
--
Barry S. Finkel


Re: BIND 9.7 Serial Number Decrease Problem

2011-06-07 Thread Daniel McDonald
On 6/7/11 7:51 AM, Barry Finkel bsfin...@anl.gov wrote:

> There was a zone serial number mismatch, each zone expired three days
> ago, and new zones were transferred from the master.  But the zone
> files on disk still have the higher serial numbers.  There are no .jnl
> files on the disk.  A dig on the server for the zone serial numbers
> shows the lower numbers, so BIND has those correct serial numbers.

If you have multiple masters for which this server is a slave, then check
the serial number on all of the masters.  I think you will find that one of
them is higher than the other...
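Dan's check can be scripted once the SOA serial from each master has been collected (for example with a dig SOA query against each server). In this sketch the server names dc1/dc2 and their serials are hypothetical; the point is that a slave able to refresh from either master may follow the higher serial and will then see the other master as behind it.

```python
from typing import Dict, List


def masters_behind(slave_serial: int, master_serials: Dict[str, int]) -> List[str]:
    """Names of masters whose SOA serial is lower than the slave's.

    Plain integer comparison; adequate for serials far from the 2**32
    wraparound, such as the ones in this thread.
    """
    return sorted(name for name, serial in master_serials.items()
                  if serial < slave_serial)


# Hypothetical domain controllers, each keeping its own SOA serial.
serials = {"dc1": 1238, "dc2": 1239}
newest = max(serials.values())  # a slave polling both can end up here
print(newest, masters_behind(newest, serials))
```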



> I assume that if I stopped BIND (rndc stop) and restarted it, then I
> would again see the serial number mismatches.  I can try this during
> the day, as this server is not heavily used.  Is there any debugging I
> need to run?  Thanks.

-- 
Daniel J McDonald, CCIE # 2495, CISSP # 78281



Re: BIND 9.7 Serial Number Decrease Problem

2011-06-07 Thread Phil Mayers

On 07/06/11 13:51, Barry Finkel wrote:

> In my last posting I was confused as to the .jnl file.  I have
> about 44 AD slave files on my BIND servers, and 40 .jnl files.
> The two zones in question do not have .jnl files. As I do not
> look at .jnl files much, I had forgotten about the tool to
> list them.
>
> I now have this situation on one Solaris 10 slave; the problem
> probably also exists on the other Sol 10 slave and the two
> Ubuntu hardy slaves:
>
> The _tcp zone on the master MS DNS Server:
>
> 1238 600 86400 3600
>
> The _tcp zone on the BIND 9.7.3-P1 Solaris 10 server disk:
>
> 1239 ; serial


As Dan McDonald mentioned - the AD integrated DNS zones do not 
maintain a stable serial number, and in fact return a per-AD-controller 
SOA statement.


Are you sure that isn't the cause of your problem?


RE: BIND 9.7 Serial Number Decrease Problem

2011-06-07 Thread Barry Finkel

McDonald, Dan dan.mcdon...@austinenergy.com replied to my
posting:


> I think your root problem is trying to deal with active directory
> integrated zones.  We stopped using them entirely when we found that
> each domain controller maintains an individual SOA record with its own
> serial number.  The serial numbers rapidly (and purposely) fall out of
> sync, but active directory doesn't care as they use a different
> replication method.
>
> The only way that we could successfully interact from bind was to set up
> a forward-only zone and try to cache the results.  When we found that
> Active directory under windows 2000 was unable to maintain proper
> synchronization, we switched to bind for all zones and haven't looked
> back.



If you check the list archives (back to the days when there was
bind-users and bind9-users), you will find my postings dealing
with MS article 282826.  MS details the problem with zone
serial numbers, and that is why we run the DNS Server on only
ONE Domain Controller (and have since the beginning of AD in
Windows 2000).  When we run the DNS Server on a second DC
(because the Windows admins want to), I tell BIND that there is
ONE master server.  I do not care what the zone serial number is
on the other DC DNS Server, unless we have to switch masters.
The only times I have switched is when the master DC is being
upgraded, and I switch to another DC as the master.
We have NO machines configured (as far as I know) to use the
DNS Servers on the DC as primary DNS servers; all machines
are configured to use the BIND slaves.

In the early days of AD, there were serial number decreases in
the MS code.  I had an open trouble ticket for a long time before
the MS DNS development team found the problem.  I have not had a
serial number decrease on the MS side for a long time except,
occasionally, when patches are being applied to the DC, the
serial number on one or more zones will decrease during the
patch run, but after the DC is rebooted, the serial number
goes back to a non-decrease normal.

--
--
Barry S. Finkel


Re: BIND 9.7 Serial Number Decrease Problem

2011-06-06 Thread Barry Finkel


Phil Mayers suggested a corrupt .jnl file; I am not sure.
How do I debug this?
I have the following situation now:

 1) The master (on an MS DNS Server) has serial 1238.

 2) The zone file on a Solaris 10 

Re: BIND 9.7 Serial Number Decrease Problem

2011-06-06 Thread Tony Finch
Barry Finkel bsfin...@anl.gov wrote:

> I am not sure how to decode the .jnl file; I have not looked at the code
> in detail.

Try the named-journalprint program. You can also try named-compilezone -j
which applies the journal to the master file.
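For zones that do have journals, Tony's two suggestions can be wrapped in a small helper. This Python sketch only builds and, if the tools are installed, runs the commands; the zone name and file names are hypothetical, it assumes the journal sits next to the zone file with BIND's default .jnl suffix, and it relies only on the `-j` and `-o` options of named-compilezone. Barry's two problem zones had no .jnl files, so this applies to his other AD zones.

```python
import shutil
import subprocess
from typing import List


def journalprint_cmd(journal_path: str) -> List[str]:
    """Command that dumps a journal (.jnl) file in readable form."""
    return ["named-journalprint", journal_path]


def compilezone_cmd(zone: str, zone_file: str, output: str) -> List[str]:
    """Command that loads zone_file, applies its journal (-j) and
    writes the merged zone to `output` (-o)."""
    return ["named-compilezone", "-j", "-o", output, zone, zone_file]


def run_if_available(cmd: List[str]) -> None:
    """Run cmd if the tool is on PATH, otherwise just say so."""
    if shutil.which(cmd[0]) is None:
        print("not installed:", " ".join(cmd))
        return
    result = subprocess.run(cmd, capture_output=True, text=True)
    print(result.stdout or result.stderr)


if __name__ == "__main__":
    # Hypothetical zone name and file names; substitute the real ones.
    run_if_available(journalprint_cmd("example-tcp.db.jnl"))
    run_if_available(compilezone_cmd("_tcp.example.com", "example-tcp.db",
                                     "example-tcp.merged.db"))
```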

Tony.
-- 
f.anthony.n.finch  d...@dotat.at  http://dotat.at/
Rockall, Malin, Hebrides: Cyclonic, becoming north, 5 to 7, occasionally gale
8 in Rockall. Moderate or rough, occasionally very rough in Rockall. Rain or
squally showers. Moderate or good, occasionally poor.


RE: BIND 9.7 Serial Number Decrease Problem

2011-06-06 Thread McDonald, Dan
 -----Original Message-----
 From: bind-users-bounces+dan.mcdonald=austinenergy@lists.isc.org [mailto:bind-users-bounces+dan.mcdonald=austinenergy@lists.isc.org] On Behalf Of Tony Finch
 Sent: Monday, June 06, 2011 2:43 PM
 To: Barry Finkel
 Cc: bind-users@lists.isc.org
 Subject: Re: BIND 9.7 Serial Number Decrease Problem

I think your root problem is trying to deal with active directory
integrated zones.  We stopped using them entirely when we found that
each domain controller maintains an individual SOA record with its own
serial number.  The serial numbers rapidly (and purposely) fall out of
sync, but active directory doesn't care as they use a different
replication method.

The only way that we could successfully interact from bind was to set up
a forward-only zone and try to cache the results.  When we found that
Active directory under windows 2000 was unable to maintain proper
synchronization, we switched to bind for all zones and haven't looked
back.


__
Daniel J McDonald, CCIE # 2495, CISSP # 78281
Austin Energy



Re: BIND 9.7 Serial Number Decrease Problem

2011-06-05 Thread Mark Andrews

In message 4de9045c.2050...@anl.gov, Barry Finkel writes:
> I have a problem with BIND 9.7.x on Ubuntu.
> I have two servers that are running 9.7.3.
> They slave 332 zones, and they also master 213,750
> malware/spyware zones that we have defined to reroute these
> domains to a local machine.
>
> When I was upgrading the BIND to 9.7.3-P1 yesterday, an
>
>    ./rndc stop
>
> command ran over 8 minutes, and named did not stop.
> A kill command did not work; I had to resort to a
> kill -9 command.  What was BIND doing?  Gracefully
> closing all of the zones?

Most probably.  rndc stop ensures that masterfiles are up-to-date
before exiting.  rndc halt does not try to flush master files
before exiting.

There could also have been a reference leak causing named to not
stop.

> BIND 9.7.3-P1 came up fine, but there are two things that concern me:
>
> 1) After BIND began responding to queries, it was using
>    100% of the CPU for about three minutes.  I am not sure what
>    BIND was doing.  This is not major because BIND was handling
>    customer queries, and after the three minutes the CPU usage
>    dropped to a normal 1%.
>
> 2) Two zones reported serial number decreases.  This is bad.
>
> I did some research on the two zones - both Microsoft
> Active Directory zones (one _tcp and one _udp) that are mastered
> on a Windows Domain Controller and slaved on my BIND boxes.
> I have around 44 AD zones I slave, and only these two reported
> problems - on my two internal Ubuntu slaves and my two Solaris 10
> slaves.  The two Solaris 10 slaves do not run the spyware zones,
> so I had no problem with ./rndc stop.  I therefore am not sure
> that the serial number problems are due to the kill -9.

They shouldn't be.  The handling of master files and journals is
designed to allow the power to be pulled at any time, provided the
filesystem supports atomic replacement of files.


Re: BIND 9.7 Serial Number Decrease Problem

2011-06-04 Thread Phil Mayers

On 06/03/2011 04:57 PM, Barry Finkel wrote:

> I have a problem with BIND 9.7.x on Ubuntu.
> I have two servers that are running 9.7.3.
> They slave 332 zones, and they also master 213,750
> malware/spyware zones that we have defined to reroute these
> domains to a local machine.


That's a hell of a lot of zones.

Have you investigated RPZ in the newer versions of bind?


> I have no idea why BIND would remember the increased 1239
> serial number, when the serial number for the zone has been constant
> at 1238 since Mar 04. I have to assume that between Mar 04 and
> Jun 03 BIND would have written the zone to disk, either in the
> base zone file or a .jnl file.



Perhaps the .jnl file was corrupted when you -9ed it?


BIND 9.7 Serial Number Decrease Problem

2011-06-03 Thread Barry Finkel

I have a problem with BIND 9.7.x on Ubuntu.
I have two servers that are running 9.7.3.
They slave 332 zones, and they also master 213,750
malware/spyware zones that we have defined to reroute these
domains to a local machine.

When I was upgrading the BIND to 9.7.3-P1 yesterday, an

 ./rndc stop

command ran over 8 minutes, and named did not stop.
A kill command did not work; I had to resort to a
kill -9 command.  What was BIND doing?  Gracefully
closing all of the zones?  BIND 9.7.3-P1 came up fine, but
there are two things that concern me:

1) After BIND began responding to queries, it was using
   100% of the CPU for about three minutes.  I am not sure what
   BIND was doing.  This is not major because BIND was handling
   customer queries, and after the three minutes the CPU usage
   dropped to a normal 1%.

2) Two zones reported serial number decreases.  This is bad.

I did some research on the two zones - both Microsoft
Active Directory zones (one _tcp and one _udp) that are mastered
on a Windows Domain Controller and slaved on my BIND boxes.
I have around 44 AD zones I slave, and only these two reported
problems - on my two internal Ubuntu slaves and my two Solaris 10
slaves.  The two Solaris 10 slaves do not run the spyware zones,
so I had no problem with ./rndc stop.  I therefore am not sure
that the serial number problems are due to the kill -9.

I looked at the serial number issue on these two zones in detail;
I capture the serial numbers on all the AD zones each morning at
6:10.  Here is information for the _tcp zone:

 Date        Zone  Mast Slav Slav
 20 Oct 2010 _tcp. 1233 1233 1233
 21 Oct 2010 _tcp. 1239 1239 1239 The master incremented the serial.
 ...
 09 Nov 2010 _tcp. 1239 1239 1239
 10 Nov 2010 _tcp. 1238 1239 1239 Master decreased due to MS patch
 11 Nov 2010 _tcp. 1238 1238 1238
 ...
 03 Dec 2010 _tcp. 1238 1238 1238
 04 Dec 2010 _tcp. 1238 1238 1239 ??
 05 Dec 2010 _tcp. 1238 1239 1238 ??
 06 Dec 2010 _tcp. 1238 1238 1238
 ...
 09 Dec 2010 _tcp. 1238 1238 1238
 10 Dec 2010 _tcp. 1238 1238 1239 ??
 11 Dec 2010 _tcp. 1238 1239 1238 ??
 12 Dec 2010 _tcp. 1238 1238 1238
 ...
 05 Jan 2011 _tcp. 1238 1238 1238
 06 Jan 2011 _tcp. 1238 1239 1239 ??
 07 Jan 2011 _tcp. 1238 1238 1238
 ...
 02 Mar 2011 _tcp. 1238 1238 1238 Upgrade 9.7.2-P3 to 9.7.3
 03 Mar 2011 _tcp. 1238 1239 1239
 04 Mar 2011 _tcp. 1238 1238 1238
 ...
 16 Apr 2011 _tcp. 1238 1238 1238
 17 Apr 2011 _tcp. 1238 1238 1238 1238 1238 Two Sol10 slaves added.
 ...
 02 Jun 2011 _tcp. 1238 1238 1238 1238 1238 Upgrade 9.7.3 to 9.7.3-P1
 03 Jun 2011 _tcp. 1238 1239 1239 1239 1239
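The morning report lends itself to a mechanical check. A minimal sketch, using rows copied from the table above and assuming each row is date, zone, master serial, then one serial per slave, optionally followed by a note; `mismatches` is a hypothetical helper that flags rows where any slave differs from the master.

```python
from typing import List, Tuple

REPORT = """\
Date        Zone  Mast Slav Slav
20 Oct 2010 _tcp. 1233 1233 1233
21 Oct 2010 _tcp. 1239 1239 1239 The master incremented the serial.
10 Nov 2010 _tcp. 1238 1239 1239 Master decreased due to MS patch
04 Dec 2010 _tcp. 1238 1238 1239 ??
05 Dec 2010 _tcp. 1238 1239 1238 ??
02 Jun 2011 _tcp. 1238 1238 1238 1238 1238 Upgrade 9.7.3 to 9.7.3-P1
03 Jun 2011 _tcp. 1238 1239 1239 1239 1239
"""


def mismatches(report: str) -> List[Tuple[str, int, List[int]]]:
    """Rows where any slave's serial differs from the master's.

    Assumed row layout: day month year zone master slave [slave ...]
    [note]; the header and "..." lines are skipped.
    """
    flagged = []
    for line in report.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue  # header, "...", or blank line
        date = " ".join(fields[:3])
        serials = []
        for tok in fields[4:]:
            if not tok.isdigit():
                break  # the rest of the line is a free-text note
            serials.append(int(tok))
        if len(serials) < 2:
            continue
        master, slaves = serials[0], serials[1:]
        if any(s != master for s in slaves):
            flagged.append((date, master, slaves))
    return flagged


for date, master, slaves in mismatches(REPORT):
    print(date, "master", master, "slaves", slaves)
```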

Both Ubuntu slaves have been up for 149 days (reboot around Jan 15).
The zone serial was 1239 until a MS patch run on the Domain
Controller decreased the serial by one on the evening of Nov 9.
I did nothing to correct the problem; I waited for the two zones
to expire, and then new zones were transferred from the Windows
master server.  The serial number was 1238 on the master and
slaves.  On a few days, the serial on the slaves increased
by one, and I am not sure what happened on those days.

On Mar 02 I upgraded BIND from 9.7.2-P3 to 9.7.3, and the
serial numbers on the two upgraded BIND slaves reverted to the
higher 1239 serial.  Again, I did no fixup, and on Mar 04
the serials were the same at the lower value.  I think that the
serial number decrease was temporary during the patch run.
On Apr 17 I added the two Solaris 10 slaves to my morning report, and
all five serials were constant at 1238 until I upgraded BIND Tuesday (on
the Solaris 10 boxes) and yesterday (on the Ubuntu boxes).  Immediately
after the upgrade BIND reported the serial number problem on these two
zones.  The other AD zones have had no serial number problems.

I have no idea why BIND would remember the increased 1239
serial number, when the serial number for the zone has been constant
at 1238 since Mar 04.  I have to assume that between Mar 04 and
Jun 03 BIND would have written the zone to disk, either in the
base zone file or a .jnl file.

--
--
Barry S. Finkel