Re: High recursive client counts

2014-03-28 Thread Jason Brandt
Our public DNS servers are on site as well.  I use forwarders (as opposed
to slaves) from our resolvers to our public DNS servers for our internal
domains, and the resolvers still responded for internal domains, even when
the recursive count was high and external domains weren't responding.


On Thu, Mar 27, 2014 at 5:26 PM, Mark Andrews ma...@isc.org wrote:


 In message 53349e66.8050...@ksu.edu, Lawrence K. Chen, P.Eng. writes:
 
 
  On 03/26/14 04:02, Sam Wilson wrote:
   In article mailman.2530.1395774135.20661.bind-us...@lists.isc.org,
Jason Brandt jbra...@fsmail.bradley.edu wrote:
  
   For now, I've disabled DNS inspection on our firewall, as it is an ancient
   Cisco firewall services module, and that seems to have stabilized things,
   but it's only been 30 minutes or so.  Until I get a few days in, I'll keep
   researching.

   We used to run DNS inspection on our FWSMs.  We didn't notice any issues
   with DNS resolution per se, but we did find that turning it off dropped
   the FWSM CPU from ~70% to less than 30%.  We're not aware of any issues
   that using DNS inspection might have caused.

   Sam


  I had to get our DNS servers exempted from our Procera, as it was interfering
  with DNSSEC.  The security analyst said it considered some of the large
  encrypted UDP packets to be P2P.

  So, every few days (less during busy times), a recursive caching query server
  would stop answering, and restarting it would make it work again.  It got
  to the point where I had our monitoring system restart bind as needed.

  Eventually, my manager asked about all the strange notifications, and he then
  pushed it up to the CISO to get the analyst to make the change to stop
  interfering with DNS.

  They had done a test a few months earlier, and said we didn't complain then.
  I went back through the logs, and found that it had been interfering
  then...but the weekend test wasn't enough to cause any servers to stop
  responding.

  I didn't think to see what the client counts were.  Though another time when
  the Procera had stopped passing any traffic, the counts did get really high
  before they stopped working.

  Need to work on figuring out how to have it resolve local domains when
  the Internet connection is down.

 Slaving the local zones is the simplest solution.

  --
  Who: Lawrence K. Chen, P.Eng. - W0LKC - Sr. Unix Systems Administrator
  For: Enterprise Server Technologies (EST) --  SafeZone Ally
  ___
  Please visit https://lists.isc.org/mailman/listinfo/bind-users to
 unsubscribe from this list
 
  bind-users mailing list
  bind-users@lists.isc.org
  https://lists.isc.org/mailman/listinfo/bind-users
 --
 Mark Andrews, ISC
 1 Seymour St., Dundas Valley, NSW 2117, Australia
 PHONE: +61 2 9871 4742 INTERNET: ma...@isc.org




-- 
Jason K. Brandt
Systems Administrator
Bradley University
(309) 677-2958

Re: High recursive client counts

2014-03-27 Thread Eliezer Croitoru

Are you using logs on the BIND machine(s)?

Eliezer

On 03/25/2014 04:31 PM, Jason Brandt wrote:

We recently migrated to BIND for our internal resolvers, and since the
migration, we are experiencing periods of high recursive client counts,
which will at times cause the BIND server to quit responding.  As a
workaround, I've been able to point the BIND server to a forwarder,
bypassing the root hints, to restore stability, but this morning even
with the forwarder, our count spiked.

We are using Ubuntu 12.04 LTS, BIND version 9.8.1-P1.  The server is
configured strictly as a resolver, and is not authoritative for any domains.

We have approximately 15-20k client devices on campus.  Our average
recursive client count is between 10 and 50.  When the spikes occur,
counts will get upwards of 3-4k (this morning: recursive clients:
2358/9900/1).

What are possible causes of high recursive client count?  What can be
done to prevent this or tune around it?  Obviously raising the max
clients doesn't solve the problem, and the forwarder seemed to help, but
apparently is still susceptible to the issue.

Any suggestions would be greatly appreciated.

--
Jason K. Brandt
Systems Administrator





Re: High recursive client counts

2014-03-27 Thread Lawrence K. Chen, P.Eng.


On 03/26/14 04:02, Sam Wilson wrote:
 In article mailman.2530.1395774135.20661.bind-us...@lists.isc.org,
  Jason Brandt jbra...@fsmail.bradley.edu wrote:
 
 For now, I've disabled DNS inspection on our firewall, as it is an ancient
 Cisco firewall services module, and that seems to have stabilized things,
 but it's only been 30 minutes or so.  Until I get a few days in, I'll keep
 researching.
 
 We used to run DNS inspection on our FWSMs.  We didn't notice any issues 
 with DNS resolution per se, but we did find that turning it off dropped 
 the FWSM CPU from ~70% to less than 30%.  We're not aware of any issues 
 that using DNS inspection might have caused.
 
 Sam
 

I had to get our DNS servers exempted from our Procera, as it was interfering
with DNSSEC.  The security analyst said it considered some of the large
encrypted UDP packets to be P2P.

So, every few days (less during busy times), a recursive caching query server
would stop answering, and restarting it would make it work again.  It got
to the point where I had our monitoring system restart bind as needed.

Eventually, my manager asked about all the strange notifications, and he then
pushed it up to the CISO to get the analyst to make the change to stop
interfering with DNS.

They had done a test a few months earlier, and said we didn't complain then.
I went back through the logs, and found that it had been interfering
then...but the weekend test wasn't enough to cause any servers to stop 
responding.

I didn't think to see what the client counts were.  Though another time when
the Procera had stopped passing any traffic, the counts did get really high
before they stopped working.

Need to work on figuring out how to have it resolve local domains when
Internet connection is down.

-- 
Who: Lawrence K. Chen, P.Eng. - W0LKC - Sr. Unix Systems Administrator
For: Enterprise Server Technologies (EST) --  SafeZone Ally


Re: High recursive client counts

2014-03-26 Thread Sam Wilson
In article mailman.2530.1395774135.20661.bind-us...@lists.isc.org,
 Jason Brandt jbra...@fsmail.bradley.edu wrote:

 For now, I've disabled DNS inspection on our firewall, as it is an ancient
 Cisco firewall services module, and that seems to have stabilized things,
 but it's only been 30 minutes or so.  Until I get a few days in, I'll keep
 researching.

We used to run DNS inspection on our FWSMs.  We didn't notice any issues 
with DNS resolution per se, but we did find that turning it off dropped 
the FWSM CPU from ~70% to less than 30%.  We're not aware of any issues 
that using DNS inspection might have caused.

Sam

-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.


Re: High recursive client counts

2014-03-26 Thread Jason Brandt
The code on our FWSMs isn't the latest release, so that could be part of
the issue, but it's been about 16 hours now since I shut it off, and so far
so good.  I would say though with the other load on our firewalls, it's
highly possible that they were being overloaded.  Unfortunately our MRTG
isn't set up to track firewall CPU, so I can't say for sure.

Thanks,
Jason


On Wed, Mar 26, 2014 at 4:02 AM, Sam Wilson sam.wil...@ed.ac.uk wrote:

 In article mailman.2530.1395774135.20661.bind-us...@lists.isc.org,
  Jason Brandt jbra...@fsmail.bradley.edu wrote:

  For now, I've disabled DNS inspection on our firewall, as it is an ancient
  Cisco firewall services module, and that seems to have stabilized things,
  but it's only been 30 minutes or so.  Until I get a few days in, I'll keep
  researching.

 We used to run DNS inspection on our FWSMs.  We didn't notice any issues
 with DNS resolution per se, but we did find that turning it off dropped
 the FWSM CPU from ~70% to less than 30%.  We're not aware of any issues
 that using DNS inspection might have caused.

 Sam

 --
 The University of Edinburgh is a charitable body, registered in
 Scotland, with registration number SC005336.




-- 
Jason K. Brandt
Systems Administrator

Re: High recursive client counts

2014-03-26 Thread Jason Brandt
I had it set as:
policy-map global_policy
 class inspection_default
  inspect dns maximum-length 4096

Which is what Cisco recommends.  EDNS tests worked fine, but the BIND
servers would still get backed up.
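
For context on why the length limit matters here: a resolver that speaks EDNS0 advertises the UDP payload size it can accept in an OPT pseudo-record (RFC 6891), so replies well above the classic 512 bytes become legitimate. The following is only a sketch in Python of what such a query looks like on the wire, built by hand with the standard library. The function names, the query ID and the example name are made up for illustration and are not taken from anyone's setup in this thread.

```python
import struct


def encode_name(name):
    """Encode a domain name as DNS wire-format labels."""
    out = b""
    for label in name.rstrip(".").split("."):
        if label:
            raw = label.encode("ascii")
            out += struct.pack("B", len(raw)) + raw
    return out + b"\x00"  # root label terminates the name


def build_edns_query(name, qtype=1, payload_size=4096, query_id=0x1234):
    """Build a DNS query that carries an EDNS0 OPT record."""
    # Header: ID, flags (RD set), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=1
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 1)
    # Question: name, QTYPE, QCLASS=IN
    question = encode_name(name) + struct.pack("!HH", qtype, 1)
    # OPT RR: root name, TYPE=41, CLASS carries the UDP payload size,
    # TTL (extended RCODE/version/flags) = 0, RDLENGTH = 0
    opt = b"\x00" + struct.pack("!HHIH", 41, payload_size, 0, 0)
    return header + question + opt


if __name__ == "__main__":
    packet = build_edns_query("example.com")
    print(len(packet), packet.hex())
```

Sending the resulting bytes to a resolver over UDP and checking whether a large reply comes back intact is one way to test a firewall path, although a tool such as dig does the same job with less effort.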


On Wed, Mar 26, 2014 at 7:35 AM, Thom, Paul E paul.t...@ssc-spc.gc.ca wrote:

  Do you have the FWSM DNS inspection configured to support EDNS?  I'm not
 sure if I have seen ASA / PIX code causing that problem when EDNS support
 was not configured on the firewalls, but it's something to look at.





 From: bind-users-bounces+paul.thom=dfo-mpo.gc...@lists.isc.org
 [mailto:bind-users-bounces+paul.thom=dfo-mpo.gc...@lists.isc.org] On Behalf Of Jason Brandt
 Sent: March-26-14 9:09 AM
 To: Sam Wilson
 Cc: comp-protocols-dns-b...@isc.org
 Subject: Re: High recursive client counts



 The code on our FWSMs isn't the latest release, so that could be part of
 the issue, but it's been about 16 hours now since I shut it off, and so far
 so good.  I would say though with the other load on our firewalls, it's
 highly possible that they were being overloaded.  Unfortunately our MRTG
 isn't setup to track firewall CPU, so I can't say for sure.



 Thanks,

 Jason

 --

   Jason K. Brandt

 Systems Administrator






-- 
Jason K. Brandt
Systems Administrator

Re: Re: High recursive client counts

2014-03-26 Thread Jason Brandt
We don't do any NAT at the firewall level, they're all public IPs.

Thanks,
Jason


On Wed, Mar 26, 2014 at 7:51 AM, Timothe Litt l...@acm.org wrote:

 DNS inspection doesn't do anything useful; bind does enough validity
 checking.  UDP inspection suffices to let return packets thru.

 Another thing to beware of is NAT - if you do static NAT translation for
 your nameservers, be sure to specify no-payload (e.g.
   ip nat inside source static tcp/udp 10.0.0.1 53 16.123.213.11 53
 extendable no-payload )

 Otherwise, the router will try to be 'helpful' by modifying the payload -
 which  breaks quite a few things, and not necessarily in obvious ways.

 Timothe Litt
 ACM Distinguished Engineer
 --
 This communication may not represent the ACM or my employer's views,
 if any, on the matters discussed.


 On 26-Mar-14 05:02, Sam Wilson wrote:

 In article mailman.2530.1395774135.20661.bind-us...@lists.isc.org,
   Jason Brandt jbra...@fsmail.bradley.edu wrote:

  For now, I've disabled DNS inspection on our firewall, as it is an ancient
  Cisco firewall services module, and that seems to have stabilized things,
  but it's only been 30 minutes or so.  Until I get a few days in, I'll keep
  researching.

 We used to run DNS inspection on our FWSMs.  We didn't notice any issues
 with DNS resolution per se, but we did find that turning it off dropped
 the FWSM CPU from ~70% to less than 30%.  We're not aware of any issues
 that using DNS inspection might have caused.

 Sam








-- 
Jason K. Brandt
Systems Administrator

Re: High recursive client counts

2014-03-26 Thread Sam Wilson
In article mailman.2540.1395835774.20661.bind-us...@lists.isc.org,
 Jason Brandt jbra...@fsmail.bradley.edu wrote:

 The code on our FWSMs isn't the latest release, so that could be part of
 the issue, but it's been about 16 hours now since I shut it off, and so far
 so good.  I would say though with the other load on our firewalls, it's
 highly possible that they were being overloaded.  Unfortunately our MRTG
 isn't setup to track firewall CPU, so I can't say for sure.

Logging into your FWSM and doing 'show cpu usage' when things are going 
badly might be an option, but if you've got MRTG monitoring the 6500 
that the FWSM is in you could also have a look at the traffic on the 
virtual ethernets that connect to the FWSM.  Whilst they don't show up 
on 'show int' they and a 6 Gbps portchannel are visible to SNMP and in 
'show firewall module X traffic' (or 'show firewall switch X module Y 
traffic' in a VSS setup).
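
As a rough illustration of logging that CPU figure over time, here is a small Python sketch that extracts the three percentages from captured 'show cpu usage' output. The line format is inferred from the regular expression in Cory Cartwright's Perl script elsewhere in this thread, and the sample line and function name are made up, so treat both as assumptions to check against your own FWSM.

```python
import re

# Expected shape (assumed from the Perl script in this thread):
#   CPU utilization for 5 seconds = 71%; 1 minute: 68%; 5 minutes: 65%
CPU_RE = re.compile(r"5 seconds = (\d+)%; 1 minute: (\d+)%; 5 minutes: (\d+)%")


def parse_cpu_usage(output):
    """Return the three CPU averages as a dict, or None if no line matches."""
    match = CPU_RE.search(output)
    if match is None:
        return None
    five_sec, one_min, five_min = (int(value) for value in match.groups())
    return {"fiveSec": five_sec, "oneMin": one_min, "fiveMin": five_min}


if __name__ == "__main__":
    # Hypothetical sample, not output captured from a real device.
    sample = "CPU utilization for 5 seconds = 71%; 1 minute: 68%; 5 minutes: 65%\n"
    print(parse_cpu_usage(sample))
```

The dictionary keys mirror the names used in the Perl script so the two can be compared side by side.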

Sam

-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.


RE: High recursive client counts

2014-03-26 Thread CARTWRIGHT, CORY C
,
    -version => 'snmpv1',
    -port    => 162
);

if (!defined($session)) {
    printf("ERROR: %s.\n", $error);
    exit 1;
}

my $svSvcName = '1.3.6.1.4.1.77.1.2.3.1.1';
my $message = "FWSM CPU TOO HIGH $cpu%";
my @oids = ($svSvcName, OCTET_STRING, $message);
#my @oids;
my $result = $session->trap(
    -agentaddr    => $monitor,
    -varbindlist  => \@oids
    #-varbindlist  => [$svSvcName, OCTET_STRING, $message]
);

if (!defined($result)) {
    printf("ERROR: %s.\n", $session->error);
    $session->close;

    exit 1;

}

$session->close;
print "Sent Trap \"$message\" to $host\n";
} #end foreach
} #end sub

-Original Message-
From: bind-users-bounces+cc3283=att@lists.isc.org 
[mailto:bind-users-bounces+cc3283=att@lists.isc.org] On Behalf Of Sam Wilson
Sent: Wednesday, March 26, 2014 1:29 PM
To: comp-protocols-dns-b...@isc.org
Subject: Re: High recursive client counts

In article mailman.2540.1395835774.20661.bind-us...@lists.isc.org,
 Jason Brandt jbra...@fsmail.bradley.edu wrote:

 The code on our FWSMs isn't the latest release, so that could be part of
 the issue, but it's been about 16 hours now since I shut it off, and so far
 so good.  I would say though with the other load on our firewalls, it's
 highly possible that they were being overloaded.  Unfortunately our MRTG
 isn't setup to track firewall CPU, so I can't say for sure.

Logging into your FWSM and doing 'show cpu usage' when things are going 
badly might be an option, but if you've got MRTG monitoring the 6500 
that the FWSM is in you could also have a look at the traffic on the 
virtual ethernets that connect to the FWSM.  Whilst they don't show up 
on 'show int' they and a 6 Gbps portchannel are visible to SNMP and in 
'show firewall module X traffic' (or 'show firewall switch X module Y 
traffic' in a VSS setup).

Sam

-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.


Re: High recursive client counts

2014-03-26 Thread Jason Brandt
Thanks guys.  I appreciate the input.  I don't want to derail the list much
though, as this is supposed to be more BIND than Cisco :)

At this point my BIND installation seems to be stable, so we'll call it
case closed.  We do plan on replacing our firewalls in the near future, so
hopefully we won't need to put much more effort into it.  But again
appreciate all the help and suggestions, it definitely pushed me in the
right direction for finding the problem.

Jason


On Wed, Mar 26, 2014 at 12:56 PM, CARTWRIGHT, CORY C cc3...@att.com wrote:

 Here is a script I wrote to log and send traps.  I'm sure you'll have to
 make a lot of changes but hopefully it can help you get started monitoring
 the FWSM.  You can use this as a template to expand upon.

 #!/usr/bin/perl

 use strict;
 use Expect;
 use Net::Telnet;
 use Data::Dumper;
 use POSIX qw(tzset);
 use Data::Dumper;
 use lib qw( /usr/local/rrdtool-1.2.13/lib/perl );
 use RRDs;
 use File::Copy;
 use Net::SNMP qw(:asn1);

 ##  quick fix for gathering codec data
 ## not very robust !!!
 ## author: Cory Cartwright corycartwri...@sbcglobal.net
 ##
 ## grab cisco FWSM cpu information for RRD graphing and SNMP trap generation
 ##

 $ENV{TZ} = 'EDT';
 POSIX::tzset();

 my $createRRD = shift || 'false';

 my $host = "MY6500|7600 host";
 my $user = "router username";
 my $pass = "router passwd";
 my $fwUser = "FWSM username";
 my $fwPasswd = "FWSM password";
 my $comunity = "FWSM comunity string";
 my $monitor = 'trap monitor IP';   # source that set and sent the trap
 my @trapCatchers = qw(array of trap catchers);

 my $filename = "/var/voip/fwsm_logger.txt"; #dump file
 my $DBfile = '/var/voip/codecDump.csv';

 my $trapThreshold = '60'; #'60'; #five sec thresh  send trap%
 my $procThreshold = '30'; #'30' ; #threshhold before we capture sh proc

 my %meas_hash = (   'fiveSec' => 'fiveSec',
                     'oneMin'  => 'oneMin',
                     'fiveMin' => 'fiveMin',
                 );
 my $rrd = '/usr/voip/bin/fwcpuRRD.rrd';

 if (! -e $rrd) { $createRRD = 'true'; }

 my $hashRef = doExec();

 if($hashRef->{'fiveSec'} >= $trapThreshold) {
     #send trap
     print "Sending trap\n";
     sendTrap($hashRef->{'fiveSec'});
 }

 createRRD($rrd,\%meas_hash) if($createRRD eq 'true');
 updateRRD($rrd,\%meas_hash,$hashRef);
 print "struct\n" . Dumper(%meas_hash);
 print "data\n" . Dumper($hashRef);
 copy($rrd,"/var/www/voipdata/fwcpuRRD.rrd");

 sub doExec {

     my $exp = new Expect;
     #$exp->log_stdout(1);
     $exp->log_file($filename);

     my $command = "ssh -l $fwUser $host";

     $exp->spawn($command) or die "Could not spawn $command $!";

     my $string = qr/passwd/;
     my $return = $exp->expect(3, $string);

     $exp->send("$pass\n");

     $return = $exp->expect(3, '7604-nh1');
     $exp->send("session slot 3 pro 1\n");

     $return = $exp->expect(3, /Password:/);
     $exp->send("x1c2v3\n");

     $return = $exp->expect(3, 'sipsfw');
     $exp->send("enable\n");
     $return = $exp->expect(3, $string);
     $exp->send("$fwPasswd\n");

     $return = $exp->expect(3, 'sipsfw#');
     $exp->send("sh cpu\n");
     $exp->expect(2);
     my $cpu = $exp->before();
     $cpu = $exp->before();
     my %cpu = ();
     if($cpu =~ /\d\sseconds\s=\s(\d+)\%\;\s\d\sminute\:\s(\d+)\%\;\s\d\sminutes\:\s(\d+)\%/g)
     {
         $cpu{'fiveSec'} = $1;
         $cpu{'oneMin'} = $2;
         $cpu{'fiveMin'} = $3;
         print Dumper(%cpu);
     }
     if($cpu{'fiveSec'} >= $procThreshold) {
         my $timestamp = "\nBEGIN: TIME: " . time . " !! " .
             localtime(time) . "\n### CPU 5 sec " . $cpu{'fiveSec'} . "\n";
         $exp->print_log_file($timestamp);
         $exp->send("no pager\n");
         $exp->send("sh proc\n");
         $exp->send("sh conn\n");
         $exp->send("sh resource usage\n");
         $exp->expect(3,'sipsfw#');
     }
     $exp->send("exit\n"); #exit enable
     $exp->expect(1);
     $exp->send("exit\n"); #exit fw
     $exp->expect(1);
     $exp->send("exit\n"); #exit switch
     $exp->expect(1);
     $exp->print_log_file("\nEND\n");
     $exp->soft_close();

     return(\%cpu);
 } #end doExec


 sub updateRRD {
     my ($rrd,$meas_hashRef,$dataHashRef) = @_;
     my $epoc = time;
     my $data_string = '';
     foreach my $cust (sort keys %$meas_hashRef) {
         my $data = $$dataHashRef{$$meas_hashRef{$cust}} || 0;
         print "Cust $cust: $data \n";
         $data_string = $data_string . "$data:";
     }

     $data_string =~ s/:$//g;
     print "rrdtool update $rrd $epoc:$data_string\n";
     RRDs::updatev($rrd, $epoc . ":" . $data_string);
     if (my $ERROR = RRDs::error) {
         warn "$0: unable to update $rrd : $ERROR";
     }
 } #end sub

 sub createRRD {
     my $starttime = time;
     my $step = (5 * 60);
     my ($rrd,$meas_hashRef) = @_;
     print Dumper($meas_hashRef);
     print "In createRRD: ($starttime,$rrd,$step,$meas_hashRef)\n";
     my $DS_string = "$rrd --start $starttime --step $step ";
     foreach(sort keys %{$meas_hashRef}) {
         print "Key: $_\n";
         $DS_string = $DS_string . "DS:$_:GAUGE:$step:U:U ";
 

Re: High recursive client counts

2014-03-26 Thread Scott Bertilson
This got me to take a look at rndc recursing on one of our servers.

It is disappointing that queries for the same FQDN/type/class from the same
client (different source port and query ID though) are handled individually
rather than being merged somehow.  Is this because of the ID or the source
port, both, or something else?
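
To put a number on how often this happens, one option is to count repeated (client address, name/type/class) pairs in the dump written by rndc recursing. The Python sketch below does only that counting and says nothing about why named handles the queries separately. The line layout shown in the comment is an assumption about the dump format, and the function name is invented for this example.

```python
import re
from collections import Counter

# Assumed line shape in the dump written by `rndc recursing`:
#   ; client 10.0.0.5#53124: id 4711 'www.example.com/A/IN' requesttime 12
LINE_RE = re.compile(r"client\s+([0-9A-Fa-f.:]+)#\d+.*?'([^']+)'")


def duplicate_queries(dump_text):
    """Map (client address, 'name/type/class') to its count, keeping repeats only."""
    counts = Counter()
    for line in dump_text.splitlines():
        match = LINE_RE.search(line)
        if match:
            # Source port and query ID are deliberately ignored.
            counts[(match.group(1), match.group(2).lower())] += 1
    return {key: n for key, n in counts.items() if n > 1}


if __name__ == "__main__":
    # Hypothetical dump lines for illustration only.
    dump = (
        "; client 10.0.0.5#53124: id 1 'www.example.com/A/IN' requesttime 3\n"
        "; client 10.0.0.5#53125: id 2 'www.example.com/A/IN' requesttime 3\n"
        "; client 10.0.0.6#40000: id 3 'www.example.com/A/IN' requesttime 4\n"
    )
    print(duplicate_queries(dump))
```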


On Wed, Mar 26, 2014 at 2:05 PM, Jason Brandt jbra...@fsmail.bradley.edu wrote:

 Thanks guys.  I appreciate the input.  I don't want to derail the list
 much though, as this is supposed to be more BIND than Cisco :)

 At this point my BIND installation seems to be stable, so we'll call it
 case closed.  We do plan on replacing our firewalls in the near future, so
 hopefully we won't need to put much more effort into it.  But again
 appreciate all the help and suggestions, it definitely pushed me in the
 right direction for finding the problem.

 Jason



High recursive client counts

2014-03-25 Thread Jason Brandt
We recently migrated to BIND for our internal resolvers, and since the
migration, we are experiencing periods of high recursive client counts,
which will at times cause the BIND server to quit responding.  As a
workaround, I've been able to point the BIND server to a forwarder,
bypassing the root hints, to restore stability, but this morning even with
the forwarder, our count spiked.

We are using Ubuntu 12.04 LTS, BIND version 9.8.1-P1.  The server is
configured strictly as a resolver, and is not authoritative for any domains.

We have approximately 15-20k client devices on campus.  Our average
recursive client count is between 10 and 50.  When the spikes occur, counts
will get upwards of 3-4k (this morning: recursive clients:
2358/9900/1).
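
For anyone wanting to track this counter automatically: rndc status prints the recursive clients figure as three slash-separated numbers, namely the current count, the soft limit and the hard maximum. A minimal Python sketch for turning that line into numbers follows. The sample line and the 80% alert ratio are hypothetical values chosen for illustration, not figures from this server.

```python
import re

# Matches the "recursive clients" line printed by `rndc status`.
STATUS_RE = re.compile(r"recursive clients:\s*(\d+)/(\d+)/(\d+)")


def parse_recursive_clients(status_text):
    """Return (current, soft_limit, hard_limit), or None if the line is absent."""
    match = STATUS_RE.search(status_text)
    if match is None:
        return None
    current, soft, hard = (int(group) for group in match.groups())
    return current, soft, hard


def is_backing_up(current, soft, ratio=0.8):
    """True once usage reaches `ratio` of the soft limit (ratio is a made-up default)."""
    return soft > 0 and current >= ratio * soft


if __name__ == "__main__":
    sample = "recursive clients: 42/900/1000\n"  # hypothetical sample output
    print(parse_recursive_clients(sample))
```

Feeding the parsed current value into an existing grapher such as MRTG would show whether spikes line up with other events on the network.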

What are possible causes of high recursive client count?  What can be done
to prevent this or tune around it?  Obviously raising the max clients
doesn't solve the problem, and the forwarder seemed to help, but apparently
is still susceptible to the issue.

Any suggestions would be greatly appreciated.

-- 
Jason K. Brandt
Systems Administrator

Re: High recursive client counts

2014-03-25 Thread Mike Hoskins (michoski)
Hi Jason,

I've experienced similar things in the past on 9.8.  Since then we've
moved to the latest 9.9, but don't think this is at all version specific
(that said, you could obviously try upgrading).  I don't have an exact
solution for you, but some ideas of things to check and personal
experiences which might help you.

Are the servers in question VM or bare metal?  Several years back we made
a big push to virtualize everything, and after migrating recursive DNS it
worked great for a while...as sites grew we hit a tipping point where
VM-based resolvers seemed to introduce additional query latency.  These
servers were running far below BIND's capabilities, not taxing virtual
resources, optimized per all available BIND/OS/virtualization knobs, and
using enterprise (read: not just the latest free bits slapped together and
expected to work) network, server and hypervisor tech.  I spent several
months trying to improve the situation and find a real root cause, but on
a whim I set up an identical cluster on bare metal...no more problems.  I
didn't have time to dig further, so we avoid virtualization on busy
resolvers (for now at least).

As your client count has grown...are there any bottlenecks on your network
that might be unaccounted for?  Beyond bandwidth I'm thinking of things
like resource constrained firewalls (are the resolvers in a DMZ?) which
could cause queries to be dropped/timed out/retried, etc?  I've seen
issues where overworked NetOps teams got behind in capacity
planning/upgrades and as clients/#DMZs grew firewalls couldn't keep up and
created all sorts of issues not related to BIND itself.

When the recursive client count backs up, you know more queries than usual
are taking longer than expected to get answers...if this is not related to
BIND itself, your servers, or the network...a bit of spelunking is in
order.  Capture some packets with tcpdump, and take a look at rndc
recursing output.  Take a look at the queries causing delays, dig them
manually from various locations, and try to find a common theme.  If there
is no common theme to the query destinations, then look even closer at
your network.  :-)
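
One way to look for that common theme is to tally the pending queries by parent domain. The Python sketch below reads text from the dump that rndc recursing writes (named.recursing in the server's working directory by default) and counts entries per domain. The line layout in the comment is an assumption about that file's format, and grouping on the last two labels is a deliberately naive choice that will mis-group names under suffixes such as co.uk.

```python
import re
from collections import Counter

# Assumed line shape in the dump written by `rndc recursing`:
#   ; client 10.0.0.5#53124: id 4711 'www.example.com/A/IN' requesttime 12
QUERY_RE = re.compile(r"'([^'/]+)/([A-Za-z0-9-]+)/([A-Za-z0-9-]+)'")


def parent_domain(qname, labels=2):
    """Naive grouping key: the last `labels` labels of the query name."""
    parts = qname.rstrip(".").lower().split(".")
    return ".".join(parts[-labels:])


def tally_domains(dump_text):
    """Count pending recursive queries per parent domain."""
    counts = Counter()
    for line in dump_text.splitlines():
        match = QUERY_RE.search(line)
        if match:
            counts[parent_domain(match.group(1))] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical dump lines for illustration only.
    dump = (
        "; client 10.0.0.5#53124: id 1 'www.example.com/A/IN' requesttime 3\n"
        "; client 10.0.0.6#40000: id 2 'mail.example.com/MX/IN' requesttime 4\n"
        "; client 10.0.0.7#40001: id 3 'a.b.example.net/A/IN' requesttime 9\n"
    )
    for domain, count in tally_domains(dump).most_common():
        print(count, domain)
```

If one domain dominates the tally, its authoritative servers are the first thing to dig manually. If the tally is flat, that points back at the network path.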

hth

-Original Message-
From: Jason Brandt jbra...@fsmail.bradley.edu
Date: Tuesday, March 25, 2014 at 10:31 AM
To: bind-users@lists.isc.org bind-users@lists.isc.org
Subject: High recursive client counts

We recently migrated to BIND for our internal resolvers, and since the
migration, we are experiencing periods of high recursive client counts,
which will at times cause the BIND server to quit responding.  As a
workaround, I've been able to point the BIND server to a forwarder,
bypassing the root hints, to restore stability, but this morning even
with the forwarder, our count spiked.

We are using Ubuntu 12.04 LTS, BIND version 9.8.1-P1.  The server is
configured strictly as a resolver, and is not authoritative for any
domains.

We have approximately 15-20k client devices on campus.  Our average
recursive client count is between 10 and 50.  When the spikes occur,
counts will get upwards of 3-4k (this morning: recursive clients:
2358/9900/1).

What are possible causes of high recursive client count?  What can be
done to prevent this or tune around it?  Obviously raising the max
clients doesn't solve the problem, and the forwarder seemed to help, but
apparently is still susceptible to the issue.

Any suggestions would be greatly appreciated.


-- 
Jason K. Brandt
Systems Administrator







Re: High recursive client counts

2014-03-25 Thread Jason Brandt
Mike,

  I appreciate your insight here.  We are indeed on virtual systems, using
enterprise grade hardware as well.  I will be doing more investigation
today, to see if I can duplicate the behavior, which I have been able to do
recently.

Your VM vs Physical point is the thing that got me head scratching.  As I
stated, this is a new system, replacing our old resolvers; however, even
though I've had 2 different types of software doing resolution on our old
servers, they were actual physical machines.  Load in VMWare monitoring
shows what you'd normally expect, that the system isn't being taxed
heavily, network usage is fairly low.  To us, it seems like an application
configuration issue.  I could definitely see it being a VM issue of some
sort too though, with the strange way it's behaving.

I'll keep digging and debugging, to see if I can come up with more detail
and correlate results to try and come up with a common theme/cause.

Thank you for your help.


On Tue, Mar 25, 2014 at 10:52 AM, Mike Hoskins (michoski) 
micho...@cisco.com wrote:

 Hi Jason,

 I've experienced similar things in the past on 9.8.  Since then we've
 moved to the latest 9.9, but don't think this is at all version specific
 (that said, you could obviously try upgrading).  I don't have an exact
 solution for you, but some ideas of things to check and personal
 experiences which might help you.

 Are the servers in question VM or bare metal?  Several years back we made
 a big push to virtualize everything, and after migrating recursive DNS it
 worked great for awhile...as sites grew we hit a tipping point where
 VM-based resolvers seemed to introduce additional query latency.  These
 servers were running far below BIND's capabilities, not taxing virtual
 resources, optimized per all available BIND/OS/virtualization knobs, and
 using enterprise (read: not just the latest free bits slapped together and
 expected to work) network, server and hypervisor tech.  I spent several
 months trying to improve the situation and find a real root cause, but on
 a whim I set up an identical cluster on bare metal...no more problems.  I
 didn't have time to dig further, so we avoid virtualization on busy
 resolvers (for now at least).

 As your client count has grown...are there any bottlenecks on your network
 that might be unaccounted for?  Beyond bandwidth I'm thinking of things
 like resource constrained firewalls (are the resolvers in a DMZ?) which
 could cause queries to be dropped/timed out/retried, etc?  I've seen
 issues where overworked NetOps teams got behind in capacity
 planning/upgrades and as clients/#DMZs grew firewalls couldn't keep up and
 created all sorts of issues not related to BIND itself.

 When the recursive client count backs up, you know more queries than usual
 are taking longer than expected to get answers...if this is not related to
 BIND itself, your servers, or the network...a bit of spelunking is in
 order.  Capture some packets with tcpdump, and take a look at rndc
 recursing output.  Take a look at the queries causing delays, dig them
 manually from various locations, and try to find a common theme.  If there
 is no common theme to the query destinations, then look even closer at
 your network.  :-)

 hth

-- 
Jason K. Brandt
Systems Administrator

Re: High recursive client counts

2014-03-25 Thread Jason Brandt
Cathy,
  Thank you for your comments.  I will continue to investigate; it helps to
have avenues to look down, though.

As far as build version, we are aware that we aren't at the current stable
release.  However, we've tried to stick to the distro release as much as
possible, to help streamline patching.  But if this continues to be an
issue, it's something we will definitely consider.

The thing that's strange to me is that we can mostly alleviate the
symptoms by using a forwarder.  Currently I'm using an internal Windows
2003 server in the same subnet, on the same switch, to forward through;
however, I was previously using 8.8.8.8, and it was behaving well too.  The
problem seems worst when simply using the root hints.

The rndc recursing output doesn't seem to be much help.  The queries are all over,
including google, adobe, amazon, microsoft, etc, as a combination of
A//PTR/TXT records, from a variety of different clients on different
subnets and in different firewall zones.   At a glance, I don't see any
correlation.
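
When eyeballing the dump shows nothing, counting can help. The following is a rough sketch that tallies the outstanding queries in an rndc recursing dump by client, by record type, and by the last two labels of the query name. The line shape in the comment is an assumption about what the dump looks like, so the pattern will likely need adjusting against a real file; the function name and the two-label grouping are hypothetical, and the grouping is crude (it will mishandle names under suffixes such as co.uk).

```python
import re
from collections import Counter

# Assumed shape of one entry in the dump, for illustration only:
#   ; client 10.1.2.3#53211: id 4242 'www.example.com/A/IN' requesttime 12
_ENTRY = re.compile(
    r"client\s+([0-9A-Fa-f.:]+)#\d+.*?'([^'/]+)/([A-Z0-9]+)/[A-Z0-9]+'"
)


def summarize(dump_text, top=5):
    """Count pending queries per client, per record type, and per domain."""
    clients, qtypes, zones = Counter(), Counter(), Counter()
    for line in dump_text.splitlines():
        match = _ENTRY.search(line)
        if match is None:
            continue  # header lines, blanks, anything unrecognized
        client, qname, qtype = match.groups()
        clients[client] += 1
        qtypes[qtype] += 1
        labels = qname.rstrip(".").lower().split(".")
        zones[".".join(labels[-2:])] += 1
    return {
        "clients": clients.most_common(top),
        "qtypes": qtypes.most_common(top),
        "zones": zones.most_common(top),
    }


# Made-up sample lines in the assumed format.
sample_dump = """\
; client 10.1.2.3#53211: id 1 'www.google.com/A/IN' requesttime 12
; client 10.1.9.9#40000: id 2 'mail.google.com/AAAA/IN' requesttime 12
; client 10.1.2.3#53212: id 3 'www.adobe.com/A/IN' requesttime 13
"""
result = summarize(sample_dump)
print(result["zones"])
```

If no client, type, or domain stands out across several dumps taken during a spike, that points away from a single bad query source and back toward the path between the resolver and the Internet.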

Again, I'll keep investigating, and appreciate all the input!

Jason


On Tue, Mar 25, 2014 at 12:34 PM, Cathy Almond cat...@isc.org wrote:

 Packet tracing and/or looking at rndc recursing is good - then you'll
 see which client queries are waiting for answers from authoritative
 servers.

 Depending on what you've upgraded from, this might be a problem with
 whether or not your infrastructure can handle EDNS0 and large packet
 sizes.  Newer versions of BIND set the DO bit by default on the iterative
 queries, so perhaps some servers are sending back larger responses than
 you were receiving before.  It's worth checking that your network
 infrastructure can handle both EDNS0 and large UDP packet sizes (and DNS
 queries via TCP of course too).  See
 https://www.dns-oarc.net/oarc/services/replysizetest

 I should also comment that the distro BIND 9.8 that you're using isn't
 the current ISC version, so you're missing out on recent fixes - you
 might be better off with a self-build of 9.8.7-W1 or 9.8.5-W1:
 http://www.isc.org/downloads/

 These also might be helpful:

 https://kb.isc.org/article/AA-00771/46/Which-version-of-BIND-do-I-want-to-download-and-install.html

 https://kb.isc.org/article/AA-00768/46/Getting-started-with-BIND-how-to-build-and-run-named-with-a-basic-recursive-configuration.html

 HTH

 Cathy
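
To make the EDNS0 point above concrete, here is a small sketch that builds the wire format of a query carrying an OPT record, advertising a 4096-byte UDP payload with the DO bit set. This is the kind of query that invites the larger responses Cathy describes. The helper names, the fixed query ID, and the 4096 default are illustrative assumptions; the query name rs.dns-oarc.net is assumed to be the test name used by the reply-size service linked above.

```python
import struct


def encode_name(name):
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in filter(None, name.rstrip(".").split(".")):
        raw = label.encode("ascii")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_edns_query(name, qtype=16, udp_size=4096, do_bit=True, query_id=0x1234):
    """Build a DNS query (default type TXT) with one EDNS0 OPT record."""
    # Header: ID, flags (RD set), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=1.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 1)
    question = encode_name(name) + struct.pack("!HH", qtype, 1)  # class IN
    # OPT record: root name, type 41, CLASS carries the UDP payload size,
    # TTL carries extended RCODE, version and flags (DO is 0x8000), no RDATA.
    flags = 0x8000 if do_bit else 0
    opt = b"\x00" + struct.pack("!HHBBHH", 41, udp_size, 0, 0, flags, 0)
    return header + question + opt


packet = build_edns_query("rs.dns-oarc.net")
print(len(packet), packet.hex())
```

Sending these bytes over UDP port 53 through each firewall zone, and checking whether a reply larger than 512 bytes makes it back, is one way to see where large responses are being dropped.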





-- 
Jason K. Brandt
Systems Administrator

Re: High recursive client counts

2014-03-25 Thread Jason Brandt
Mark,
  That's a very good question, and something we had thought of as a
possibility as well.  I hadn't seen any good information in relation to
entropy, so I'll check into your link.  We had noticed that on other things
as well, due to the virtual environment, but nothing that caused
performance issues.

I'm not sure how BIND uses random numbers, but I know they are a requirement.
Perhaps someone else knows?  From what I saw, they seemed to be used
primarily for signing zones.

For now, I've disabled DNS inspection on our firewall, as it is an ancient
Cisco firewall services module, and that seems to have stabilized things,
but it's only been 30 minutes or so.  Until I get a few days in, I'll keep
researching.

Again, thanks all.  Your input and help are greatly appreciated.


On Tue, Mar 25, 2014 at 1:31 PM, Mark Elkins m...@posix.co.za wrote:

 This might be a dumb answer but as the machine is part of a virtual
 server, perhaps you have simply run out of entropy? I know it's a
 resolver... but isn't BIND perhaps using entropy to randomly talk on
 different ports to get answers?

 What about installing the 'haveged' package,
 www.irisa.fr/caps/projects/hipsor

 I don't see this doing any harm.

 I've personally found that not doing this on Virtual machines just makes
 them 'choke up'.
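
A quick way to test the entropy theory before installing anything is to read the kernel's own counter. The sketch below assumes a Linux guest, where the available entropy is exposed at /proc/sys/kernel/random/entropy_avail; the function names and the threshold of 200 are arbitrary illustrative choices, not values recommended by BIND or the kernel.

```python
from pathlib import Path

# Linux-only location of the kernel's entropy estimate.
ENTROPY_FILE = Path("/proc/sys/kernel/random/entropy_avail")


def read_entropy(path=ENTROPY_FILE):
    """Return the available entropy as an int, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def entropy_is_low(value, threshold=200):
    """Hypothetical rule of thumb: treat anything under the threshold as low."""
    return value is not None and value < threshold


if __name__ == "__main__":
    available = read_entropy()
    if available is None:
        print("entropy counter not available on this system")
    else:
        state = "low" if entropy_is_low(available) else "ok"
        print(f"entropy_avail={available} ({state})")
```

Sampling this during a recursive-client spike and during a quiet period would show whether the two are related at all.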

 --
   .  . ___. .__  Posix Systems - (South) Africa
  /| /|   / /__   m...@posix.co.za  -  Mark J Elkins, Cisco CCIE
 / |/ |ARK \_/ /__ LKINS  Tel: +27 12 807 0590  Cell: +27 82 601 0496




-- 
Jason K. Brandt
Systems Administrator