Re: Need to improve named performance

2012-11-18 Thread nudge
Sorry for arriving late and making points that might go without saying
but...

On Mon, Nov 12, 2012, at 05:23 PM, Ed LaFrance wrote:
 Hello Alan -
 
 Of course you are right, my bad.
 
 Here's the entirety of my named.conf - there's nothing pertaining to 
 logging in here, so I guess that means that 'log everything' is the 
 default. I would only want to log critical named errors, so if anyone 
 has syntax they have my gratitude:
 
 options {
  directory "/var";
  auth-nxdomain no;
  pid-file "/var/run/named/named.pid";
  allow-recursion {
  localnets;
  };
 
  allow-transfer {
  none;
  };
 };
 
 key "rndc-key" {
  algorithm hmac-md5;
  secret "CeMgS23y0oWE20nyv0x40Q==";


I hope you've changed this key now that it's public ;)

Otherwise: you said the rndc command was giving you permission errors; I
get similar errors if I forget to run rndc with sudo.
___
Please visit https://lists.isc.org/mailman/listinfo/bind-users to unsubscribe 
from this list

bind-users mailing list
bind-users@lists.isc.org
https://lists.isc.org/mailman/listinfo/bind-users


RE: Need to improve named performance

2012-11-13 Thread Jack Tavares
One issue that *may* be impacting you (and another reason to upgrade)
is the size of the receive buffer within named, which was bumped up in
9.5 or 9.6, IIRC.

--
Jack Tavares



Re: Need to improve named performance

2012-11-12 Thread G.W. Haywood

Hi there,

On Mon, 12 Nov 2012, Ed LaFrance wrote:


... No idea on ip_conntrack. How do I check and if so, what setting
should I try and how do I do it?


Look for something like

/proc/sys/net/netfilter/ip_conntrack_tcp_timeout_established

and cat it to the terminal.  It will just be a number (it's in seconds)
and it's probably 432000 at the moment.  As root, you can change it, for
example to one hour, with the command

/bin/echo 3600 > /proc/sys/net/netfilter/ip_conntrack_tcp_timeout_established

If it's to persist across a reboot you'll need to put the command in a
startup script such as rc.local or find out where the default settings
are in your system and tweak it there.

'Something like' means that the name of the (virtual) file has changed
over the years and it might now be nf_conntrack_tcp_timeout_established
on your system.

Search the Web for this setting - it's a very specific term - and
you'll find that there are many other ways to tinker with TCP/IP. :)
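A minimal Python sketch of the check above. The three candidate paths are an assumption (the file name and its directory vary by kernel, which is why the code probes them in order); it reads the timeout in seconds and can write a new value, which on a real system needs root and does not persist across reboots.

```python
import os

# Candidate locations relative to /proc/sys. This list is an assumption:
# the name changed from ip_conntrack_* to nf_conntrack_* over the years,
# and older kernels keep it under net/ipv4/netfilter instead.
CANDIDATES = [
    "net/netfilter/nf_conntrack_tcp_timeout_established",
    "net/netfilter/ip_conntrack_tcp_timeout_established",
    "net/ipv4/netfilter/ip_conntrack_tcp_timeout_established",
]


def find_timeout_file(root="/proc/sys"):
    """Return the first candidate file that exists under root, or None."""
    for rel in CANDIDATES:
        path = os.path.join(root, rel)
        if os.path.isfile(path):
            return path
    return None


def read_timeout(path):
    """The file holds a single integer: the timeout in seconds."""
    with open(path) as f:
        return int(f.read().strip())


def write_timeout(path, seconds):
    """Same effect as `echo SECONDS > PATH`; needs root on a real /proc."""
    with open(path, "w") as f:
        f.write(f"{int(seconds)}\n")
```

Usage: `p = find_timeout_file()` then `read_timeout(p)` shows the current value (432000 s is five days); `write_timeout(p, 3600)`, run as root, applies the one-hour setting until the next reboot.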

--

73,
Ged.


Re: Need to improve named performance

2012-11-12 Thread Ed LaFrance

Hello Florian -

You are my hero and new best friend. I stopped syslog:

[root@ns1 lisinc]# /sbin/service syslog stop
Shutting down kernel logger:   [  OK  ]
Shutting down system logger:   [  OK  ]

...and all the problems cleared up instantly, so you called it correctly.

I had noticed in /var/log/messages that basically every query was being 
logged:


Nov 12 06:23:54 ns1 named[8349]: client 64.12.139.83#37778: query: 219.161.72.64.in-addr.arpa IN ANY -E
Nov 12 06:23:54 ns1 named[8349]: client 208.69.32.21#17245: query: 129.160.72.64.in-addr.arpa IN PTR -
Nov 12 06:23:54 ns1 named[8349]: client 64.12.139.81#31273: query: 211.21.140.204.in-addr.arpa IN PTR -E
Nov 12 06:23:54 ns1 named[8349]: client 74.125.18.212#62466: query: 217.94.119.199.in-addr.arpa IN PTR -


I've been corresponding with several people on this issue but no one had 
questioned that when I pointed it out.


I really don't need this kind of logging in the messages log. I can turn 
on query logging in the named.conf if I need more detail on named. I 
think the simplest thing would just be to have an exclusion in the 
syslog config for named. I confess some general ignorance, so perhaps 
you know the directive for that?


Thanks again!

Ed

--
(800) 362-7579 ext 1

+---+
+ ColocationDedicated Servers   IPv4  IPv6 Transit +
+---+
Connex Internet Services, Inc. direct: (916) 265-1568
11230 Gold Express Dr #310-313fax: (916) 880-5663
Gold River, CA 95670http://connexinternet.com
+---+


Re: Need to improve named performance

2012-11-12 Thread Phil Mayers

On 12/11/12 15:23, Ed LaFrance wrote:


I really don't need this kind of logging in the messages log. I can turn
on query logging in the named.conf if I need more detail on named. I
think the simplest thing would just be to have an exclusion in the
syslog config for named. I confess some general ignorance, so perhaps


Don't do that. Instead, configure named to not syslog if you don't want 
it to. Maybe log to files from within named, which is quicker.



you know the directive for that?


As per the ARM:

http://www.isc.org/files/arm94_0.html#id2574861

...the defaults are:


Only one logging statement is used to define as many channels and 
categories as are wanted. If there is no logging statement, the logging 
configuration will be:


logging {
 category default { default_syslog; default_debug; };
 category unmatched { null; };
};


You can easily change this so that queries aren't logged to syslog. For 
example:


logging {
 channel query_log { file "logs/query.log" versions 4 size 10m; };
 category queries { query_log; };
 category default { default_syslog; default_debug; };
 category unmatched { null; };
};

I would recommend tuning this further, as other log categories can 
generate a lot of output too. In fact, unless you need to, I would not 
use syslog for named at *all*, e.g.:


logging {
 channel query_log { file "logs/query.log" versions 4 size 10m; };
 channel named_log { file "logs/named.log" versions 4 size 10m; };
 category queries { query_log; };
 category default { named_log; };
};


Re: Need to improve named performance

2012-11-12 Thread Alan Clegg

On Nov 12, 2012, at 10:23 AM, Ed LaFrance e...@connexinternet.com wrote:

 I've been corresponding with several people on this issue but no one had 
 questioned that when I pointed it out.

I don't think I'd seen the logging stanza, but yes, logging to syslog is a bad 
thing, and logging queries to syslog is even worse.  Having had someone pick 
this out of an strace output is indeed awesome.

 I really don't need this kind of logging in the messages log. I can turn on 
 query logging in the named.conf if I need more detail on named. I think the 
 simplest thing would just be to have an exclusion in the syslog config for 
 named. I confess some general ignorance, so perhaps you know the directive 
 for that?

To reduce the load on named in general, just turn off query logging in 
named.conf; or you can leave the stanza in and put "querylog no;" in your 
options stanza so that query logging is not started when named starts (I'm not 
sure which version introduced the querylog option, so you may need to test this).

AlanC
-- 
Alan Clegg | +1-919-355-8851 | a...@clegg.com







Re: Need to improve named performance

2012-11-12 Thread Ed LaFrance

Hello Alan -

Currently I'm not using query logging, it's not in my options at all. 
Are you saying that named logging by syslog into /var/log/messages is 
controlled by named.conf? Seems unlikely, I'd think it would be a 
function of syslog.conf. I'm trying to learn more about it but I'm 
swamped this am, just thought I'd post here to see if anyone knows a 
quick way to exclude named from the syslog completely.


Ed



Re: Need to improve named performance

2012-11-12 Thread Eliezer Croitoru

On 11/12/2012 5:58 PM, Ed LaFrance wrote:

Hello Alan -

Currently I'm not using query logging, it's not in my options at all.
Are you saying that named logging by syslog into /var/log/messages is
controlled by named.conf? Seems unlikely, I'd think it would be a
function of syslog.conf. I'm trying to learn more about it but I'm
swamped this am, just thought I'd post here to see if anyone knows a
quick way to exclude named from the syslog completely.

Ed
It's not about excluding named in syslog; it's about deciding whether
bind should send its logs to syslogd at all.


Regards,
Eliezer

--
Eliezer Croitoru
https://www1.ngtech.co.il
IT consulting for Nonprofit organizations
eliezer at ngtech.co.il


Re: Need to improve named performance

2012-11-12 Thread Alan Clegg

On Nov 12, 2012, at 10:58 AM, Ed LaFrance e...@connexinternet.com wrote:

 Currently I'm not using query logging, it's not in my options at all. Are you 
 saying that named logging by syslog into /var/log/messages is controlled by 
 named.conf? Seems unlikely, I'd think it would be a function of syslog.conf. 
 I'm trying to learn more about it but I'm swamped this am, just thought I'd 
 post here to see if anyone knows a quick way to exclude named from the syslog 
 completely.

Logging queries to syslog is not on by default (in ISC distributed BIND), so 
something is doing it.

Send us your logging stanza...

(And yes, I'm absolutely sure that logging queries to syslog is handled by 
named.conf)

AlanC
-- 
Alan Clegg | +1-919-355-8851 | a...@clegg.com







Re: Need to improve named performance

2012-11-12 Thread Barry Margolin
In article mailman.637.1352735940.11945.bind-us...@lists.isc.org,
 Ed LaFrance e...@connexinternet.com wrote:

 Hello Alan -
 
 Currently I'm not using query logging, it's not in my options at all. 
 Are you saying that named logging by syslog into /var/log/messages is 
 controlled by named.conf? Seems unlikely, I'd think it would be a 
 function of syslog.conf. I'm trying to learn more about it but I'm 
 swamped this am, just thought I'd post here to see if anyone knows a 
 quick way to exclude named from the syslog completely.

syslog.conf tells syslogd what to do when it receives the log messages. 
It doesn't control the applications that send log messages in the first 
place; that's controlled by the application's own configuration.  named 
doesn't log queries unless you tell it to.

-- 
Barry Margolin
Arlington, MA


Re: Need to improve named performance

2012-11-12 Thread David Forrest

On Mon, 12 Nov 2012, Ed LaFrance wrote:


Hello Alan -

Of course you are right, my bad.

Here's the entirety of my named.conf - there's nothing pertaining to logging 
in here, so I guess that means that 'log everything' is the default. I would 
only want to log critical named errors, so if anyone has syntax they have my 
gratitude:


No, you just get the defaults as described in the ARM (6.2.10):

Only one logging statement is used to define as many channels and 
categories as are wanted. If there is no logging statement, the logging 
configuration will be:
logging {
  category default { default_syslog; default_debug; };
  category unmatched { null; };
};

The rest of 6.2.10 shows the syntax, including the ability to roll the 
logs much as (r)syslogd.conf does for the messages syslog receives.  None 
of my named logs go to syslog, because I have a logging statement of my 
own choosing.


Dave
--
David Forrest 
St. Louis, Missouri



Re: Need to improve named performance

2012-11-12 Thread Ed LaFrance
The developer of some software we use has come up with this and it 
appears to work:


logging {
    channel error_log {
        file "/var/log/bind.log" versions 3 size 5m;
        severity error;
        print-time yes;
        print-severity yes;
        print-category yes;
    };
    category default {
        error_log;
    };
};



Re: Need to improve named performance

2012-11-12 Thread Jeremy C. Reed
On Mon, 12 Nov 2012, Ed LaFrance wrote:

 Currently I'm not using query logging, it's not in my options at all.

I think rndc querylog was used to enable it (even if there is no 
corresponding logging configuration). You can use it again to toggle it 
off.  rndc status will show whether query logging is on or off.

I think in an earlier message you said rndc didn't work for you, but 
your named.conf does have some configuration for it, so maybe you need 
to use a different rndc (maybe installed multiple times?) or point to 
the correct configuration.
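A small Python sketch of that check: run `rndc status` and look for its "query logging is ON/OFF" line. The binary name `rndc`, and whether it needs sudo or a different key file, are assumptions about the local setup; the parsing function can be exercised without a running named.

```python
import subprocess


def query_logging_state(status_text):
    """Return True/False from a 'query logging is ON|OFF' line, or None."""
    for line in status_text.splitlines():
        line = line.strip().lower()
        if line.startswith("query logging is "):
            return line.split()[-1] == "on"
    return None


def rndc_status(rndc="rndc"):
    """Run `rndc status` and return its stdout; raises if rndc fails."""
    result = subprocess.run(
        [rndc, "status"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Usage: `query_logging_state(rndc_status())` returns True if query logging is on, in which case `rndc querylog` toggles it off.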


Re: Need to improve named performance

2012-11-11 Thread G.W. Haywood

Hi there,

On Sun, 11 Nov 2012, Ed LaFrance wrote:


Running BIND 9.3.6-P1-RedHat-9.3.6-16.P1.el5 ...


Somebody already said upgrade.  Generally that's the first thing to do
in a case like this (before asking on mailing lists:).


The issue is that named is not keeping up with rdns requests. The
nameserver is only doing rdns, and it's the only public process on the
server (no webhosting, monitoring, etc).

When I check the router above this server I'll see 200 - 500 legitimate
connections to this server at any given time. ...


I'm not convinced that BIND is the problem.  What does 'top' tell you?

Are you running netfilter/iptables on the box?  Might be ip_conntrack.
I once had an issue with a lot of dropped TCP connections, each of
which was hanging around for five days (the default).  They filled the
connection tracking table.  The default is too long, ridiculously so.
After I reduced it to something more reasonable the problem went away.
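To test the connection-tracking theory, one can compare the table's current entry count with its maximum. This Python sketch assumes the usual /proc/sys file names for both the newer nf_conntrack and older ip_conntrack layouts; on an affected box the kernel log typically also shows "ip_conntrack: table full, dropping packet" (or the nf_conntrack equivalent).

```python
import os

# (count, max) file pairs relative to /proc/sys. These names are an
# assumption; they differ between the ip_conntrack and nf_conntrack modules.
PAIRS = [
    ("net/netfilter/nf_conntrack_count", "net/netfilter/nf_conntrack_max"),
    ("net/ipv4/netfilter/ip_conntrack_count",
     "net/ipv4/netfilter/ip_conntrack_max"),
]


def _read_int(path):
    with open(path) as f:
        return int(f.read().strip())


def conntrack_usage(root="/proc/sys"):
    """Return (count, maximum, fraction) for the first pair found, or None."""
    for count_rel, max_rel in PAIRS:
        count_path = os.path.join(root, count_rel)
        max_path = os.path.join(root, max_rel)
        if os.path.isfile(count_path) and os.path.isfile(max_path):
            count, maximum = _read_int(count_path), _read_int(max_path)
            return count, maximum, count / maximum if maximum else 0.0
    return None
```

A fraction close to 1.0 would support the theory; the remedy described here is a shorter established-TCP timeout so stale entries expire sooner.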

--

73,
Ged.


Re: Need to improve named performance

2012-11-11 Thread Kevin Darcy

On 11/10/2012 1:39 PM, Ed LaFrance wrote:

Hello all -

First post to this list, hope I'm in the right place.

Running BIND 9.3.6-P1-RedHat-9.3.6-16.P1.el5 on a quadcore xeon server 
(3Ghz) with 2GB RAM. Named is being used only for rDNS queries against 
our address space.


The issue is that named is not keeping up with rdns requests. The 
nameserver is only doing rdns, and it's the only public process on the 
server (no webhosting, monitoring, etc).


When I check the router above this server I'll see 200 - 500 
legitimate connections to this server at any given time. This is 
what's happening: named is not keeping up with the requests, so the 
network receive queue fills up - I can see this with netstat:


netstat -tulpn | grep :53
Proto Recv-Q Send-Q Local Address          Foreign Address   PID/Program name
...
udp   110048      0 xxx.xxx.xxx.xxx:53     0.0.0.0:*         3918/named
udp   110048      0 xxx.xxx.xxx.xxx:53     0.0.0.0:*         3918/named

(two different IPs are on this machine to handle rDNS requests)

Once the queue gets near the max value set by sysctl, udp packets 
start to drop - this can also be seen in netstat:


 netstat -su
...
Udp:
5157567 packets received
9761 packets to unknown port received.
1164232 packet receive errors
5157554 packets sent

The errors apparently correspond to drops; they only increase when the 
queue is full.


Of course by this point dns queries are timing out. I've tried 
increasing the queue size with sysctl using this command:


sysctl -w net.core.rmem_max=1048576 net.core.rmem_default=10485

then restarting named; that did eliminate the drops, but the queue 
grows gigantic and I get pretty much 100% dns lookup timeouts at that 
point.


The server loading is about 2.0 - busy, but not overwhelmed; I can run 
a shell or even a gui session on it with ease, so it's by no means 
maxed out. Here's the first slice of top output:


top - 09:13:38 up 18:40,  1 user,  load average: 2.09, 2.05, 2.00
Tasks: 175 total,   1 running, 174 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.2%us,  0.2%sy,  0.0%ni, 74.8%id, 24.7%wa,  0.0%hi,  0.2%si,  0.0%st

Mem:   2074984k total,  1743584k used,   331400k free,   166588k buffers
Swap:  4128760k total,   28k used,  4128732k free,  1270032k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 4509 named     24   0 71004 4580 2036 S  1.3  0.2   0:46.74 named
 6877 root      15   0  2428 1064  788 R  0.7  0.1   0:00.04 top
  467 root      10  -5     0    0    0 D  0.3  0.0   2:59.13 kjournald
 2460 root      18   0  1816  584  484 D  0.3  0.0   3:30.35 syslogd
    1 root      15   0  2160  644  556 S  0.0  0.0   0:01.08 init

The bottom line is: I need to improve named performance. Tcpdump only 
shows about 20 requests per second on average, I would estimate. This 
should be handled easily, but instead it's gagging on it and the 
requests are stacking up. If you have any ideas, I welcome your input. 
Here's named.conf, it's pretty basic for the global config, the data 
for each zone is stored separately elsewhere:


options {
    directory "/var";
    auth-nxdomain no;
    pid-file "/var/run/named/named.pid";
    allow-recursion {
        localnets;
    };

    allow-transfer {
        none;
    };
};

key "rndc-key" {
    algorithm hmac-md5;
    secret "xx";
};

controls {
    inet 127.0.0.1 port 953
        allow { 127.0.0.1; } keys { "rndc-key"; };
};

zone "." {
    type hint;
    file "named.root";
};

zone "0.0.127.IN-ADDR.ARPA" {
    type master;
    file "localhost.rev";
};


I wouldn't expect a nameserver process on Linux, hosting only a few 
reverse zones and doing nothing else, to be 71 megabytes in size; I just 
checked one of ours, serving *all* of our internal zone data, forward 
and reverse authoritative, plus some cached data for a significant 
number of zones delegated to business partners, and it's less than 100 
Mb in size.


Verify from your query logs, or by dumping cache, that it's *only* doing 
what it is supposed to do, and no more. If you've got a bunch of data in 
your cache, or a bunch of queries, that's unrelated to serving your 
reverse DNS, then that's probably the root cause of your problem. 
Consider turning off recursion, or severely limiting it, in order to 
enforce that the nameserver is only serving its intended purpose. 2GB of 
memory is a little lean for a nameserver serving a *generic* 
Internet-name-lookup role...
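Checking the query logs can be scripted. This Python sketch reads syslog lines in the BIND 9.3-era `client IP#port: query: NAME CLASS TYPE` format seen in this thread (the regex also tolerates the `(name)` field later versions add, which is an assumption), counts query types and the busiest second, and lists names outside the zones you expect to serve. The zone `72.64.in-addr.arpa` used in the usage note is hypothetical.

```python
import re
from collections import Counter

# Matches e.g. "client 64.12.139.83#37778: query: 219.161.72.64.in-addr.arpa IN ANY -E"
QUERY_RE = re.compile(
    r"client (?:@\S+ )?(?P<client>[0-9A-Fa-f.:]+)#(?P<port>\d+)"
    r"(?: \([^)]*\))?: query: (?P<name>\S+) (?P<qclass>\S+) (?P<qtype>\S+)"
)


def in_zones(name, zones):
    """True if name equals, or sits below, one of the given zones."""
    name = name.lower().rstrip(".")
    for zone in zones:
        zone = zone.lower().rstrip(".")
        if name == zone or name.endswith("." + zone):
            return True
    return False


def summarize(lines, our_zones):
    """Return (qtype counts, foreign names, peak queries in one second)."""
    qtypes, foreign, per_second = Counter(), Counter(), Counter()
    for line in lines:
        m = QUERY_RE.search(line)
        if not m:
            continue  # not a query-log line
        qtypes[m.group("qtype")] += 1
        # Assumes a syslog timestamp prefix such as "Nov 12 06:23:54".
        per_second[line[:15]] += 1
        if not in_zones(m.group("name"), our_zones):
            foreign[m.group("name")] += 1
    peak = max(per_second.values(), default=0)
    return qtypes, foreign, peak
```

Usage: `summarize(open("/var/log/messages"), ["72.64.in-addr.arpa"])`. Many foreign names, or types other than PTR dominating, would point to the server doing more than it is supposed to.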


I guess another possibility is that you've gone crazy with your reverse 
zones (e.g. using $GENERATE willy-nilly), and thus are using up way more 
memory than you really need, to serve your reverse-resolution needs.


- Kevin



Re: Need to improve named performance

2012-11-11 Thread Florian Weimer
* Ed LaFrance:

 Running BIND 9.3.6-P1-RedHat-9.3.6-16.P1.el5 on a quadcore xeon server
 (3Ghz) with 2GB RAM. Named is being used only for rDNS queries against
 our address space.

You should really upgrade to the latest version on that branch (likely
bind-9.3.6-20.P1.el5_8.5).

 The bottom line is: I need to improve named performance. Tcpdump only
 shows about 20 requests per second on average, I would estimate. This
 should be handled easily, but instead it's gagging on it and the
 requests are stacking up.

Something is stalling the named process.  Try to run strace -T -f -p
4509 (4509 is the PID for the named process) and see where named
spends its time.  The top output you quoted suggests that the process
is not spinning in user space.
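To see where named spends its time without reading the trace by eye, the output of `strace -T -f -p PID 2> trace.txt` can be summed per system call. This Python sketch assumes strace's usual `-T` format, where each call's duration is printed in angle brackets at the end of the line (e.g. `<0.015232>`) and split calls appear as `<unfinished ...>` / `<... name resumed>` pairs, with only the resumed half carrying a duration.

```python
import re
from collections import defaultdict

# Optional "[pid  8351] " (or a bare "8351 " prefix when written with -o),
# then either "name(" or "<... name resumed>".
CALL_RE = re.compile(
    r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?"
    r"(?:<\.\.\. (?P<resumed>\w+) resumed>|(?P<call>\w+)\()"
)
# strace -T appends the time spent in the call, e.g. "<0.015232>".
TIME_RE = re.compile(r"<(?P<secs>\d+\.\d+)>\s*$")


def time_by_syscall(lines):
    """Return [(syscall, total_seconds, calls)] sorted by total time."""
    totals = defaultdict(float)
    calls = defaultdict(int)
    for line in lines:
        call = CALL_RE.match(line)
        took = TIME_RE.search(line)
        if not (call and took):
            continue  # "<unfinished ...>" halves carry no duration
        name = call.group("resumed") or call.group("call")
        totals[name] += float(took.group("secs"))
        calls[name] += 1
    return sorted(
        ((name, totals[name], calls[name]) for name in totals),
        key=lambda row: row[1],
        reverse=True,
    )
```

Long futex() times mostly mean threads waiting on each other; the interesting call is whatever the lock holder does just before waking them.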


Re: Need to improve named performance

2012-11-11 Thread Ed LaFrance

Hello -

Thanks for chiming in. Named is PID 8349 in my case. Here's a snippet of 
the output from strace:


[pid  8351] time( unfinished ...
[pid  8352] ... sendmsg resumed ) = 56 0.000104
[pid  8352] recvmsg(515, {msg_name(16)={sa_family=AF_INET, 
sin_port=htons(38385), sin_addr=inet_addr(205.188.158.143)}, 
msg_iov(1)=[{Q\0\0\0\1\0\0\0\0\0\1\003157\003161\00272\00264\7in-ad..., 
4096}], msg_controllen=20, {cmsg_len=20, cmsg_level=SOL_SOCKET, 
cmsg_type=0x1d /* SCM_??? */, ...}, msg_flags=0}, 0) = 55 0.31

[pid  8351] ... time resumed NULL)= 1352668045 0.000353
[pid  8352] futex(0x9b6aecc, FUTEX_WAIT_PRIVATE, 2, NULL unfinished ...
[pid  8351] stat64(/etc/localtime, {st_mode=S_IFREG|0644, 
st_size=2819, ...}) = 0 0.000109
[pid  8351] stat64(/etc/localtime, {st_mode=S_IFREG|0644, 
st_size=2819, ...}) = 0 0.86
[pid  8351] stat64(/etc/localtime, {st_mode=S_IFREG|0644, 
st_size=2819, ...}) = 0 0.84
[pid  8351] send(3, 30Nov 11 13:07:25 named[8349]:..., 107, 
MSG_NOSIGNAL) = 107 0.015232

[pid  8351] futex(0x9b6aecc, FUTEX_WAKE_PRIVATE, 1 unfinished ...
[pid  8353] ... futex resumed )   = 0 0.052813
[pid  8351] ... futex resumed )   = 1 0.000125
[pid  8353] time(NULL)  = 1352668045 0.20
[pid  8353] stat64(/etc/localtime, {st_mode=S_IFREG|0644, 
st_size=2819, ...}) = 0 0.25
[pid  8353] stat64(/etc/localtime, {st_mode=S_IFREG|0644, 
st_size=2819, ...}) = 0 0.22
[pid  8351] sendmsg(513, {msg_name(16)={sa_family=AF_INET, 
sin_port=htons(38162), sin_addr=inet_addr(205.188.158.207)}, 
msg_iov(1)=[{@%\204\0\0\1\0\1\0\2\0\1\003249\00221\003140\003204\7in-a..., 
138}], msg_controllen=0, msg_flags=0}, 0 unfinished ...

[pid  8353] stat64(/etc/localtime,  unfinished ...
[pid  8351] ... sendmsg resumed ) = 138 0.48
[pid  8353] ... stat64 resumed {st_mode=S_IFREG|0644, st_size=2819, 
...}) = 0 0.41

[pid  8351] recvmsg(513,  unfinished ...
[pid  8353] send(3, 30Nov 11 13:07:25 named[8349]:..., 103, 
MSG_NOSIGNAL unfinished ...
[pid  8351] ... recvmsg resumed {msg_name(16)={sa_family=AF_INET, 
sin_port=htons(53507), sin_addr=inet_addr(205.188.158.206)}, 
msg_iov(1)=[{\244\273\0\0\0\1\0\0\0\0\0\1\003246\003161\00272\00264\7in-ad..., 
4096}], msg_controllen=20, {cmsg_len=20, cmsg_level=SOL_SOCKET, 
cmsg_type=0x1d /* SCM_??? */, ...}, msg_flags=0}, 0) = 55 0.86

[pid  8351] futex(0x9b6aecc, FUTEX_WAIT_PRIVATE, 2, NULL unfinished ...
[pid  8353] ... send resumed )= 103 0.015034
[pid  8353] futex(0x9b6aecc, FUTEX_WAKE_PRIVATE, 1) = 1 0.25
[pid  8350] ... futex resumed )   = 0 0.051772
[pid  8350] time( unfinished ...
[pid  8353] sendmsg(513, {msg_name(16)={sa_family=AF_INET, 
sin_port=htons(60702), sin_addr=inet_addr(64.12.139.17)}, 
msg_iov(1)=[{\343F\204\0\0\1\0\1\0\2\0\1\003251\003160\00272\00264\7in-ad..., 
151}], msg_controllen=0, msg_flags=0}, 0 unfinished ...

[pid  8350] ... time resumed NULL)= 1352668045 0.000210
[pid  8353] ... sendmsg resumed ) = 151 0.84
[pid  8350] stat64(/etc/localtime,  unfinished ...
[pid  8353] recvmsg(513,  unfinished ...
[pid  8350] ... stat64 resumed {st_mode=S_IFREG|0644, st_size=2819, 
...}) = 0 0.85
[pid  8353] ... recvmsg resumed {msg_name(16)={sa_family=AF_INET, 
sin_port=htons(3794), sin_addr=inet_addr(64.12.139.19)}, 
msg_iov(1)=[{|\354\0\0\0\1\0\0\0\0\0\1\00230\003160\00272\00264\7in-add..., 
4096}], msg_controllen=20, {cmsg_len=20, cmsg_level=SOL_SOCKET, 
cmsg_type=0x1d /* SCM_??? */, ...}, msg_flags=0}, 0) = 54 0.000150

[pid  8350] stat64(/etc/localtime,  unfinished ...
[pid  8353] futex(0x9b6aecc, FUTEX_WAIT_PRIVATE, 2, NULL unfinished ...
[pid  8350] ... stat64 resumed {st_mode=S_IFREG|0644, st_size=2819, 
...}) = 0 0.76
[pid  8350] stat64(/etc/localtime, {st_mode=S_IFREG|0644, 
st_size=2819, ...}) = 0 0.29
[pid  8350] send(3, 30Nov 11 13:07:25 named[8349]:..., 102, 
MSG_NOSIGNAL unfinished ...




Re: Need to improve named performance

2012-11-11 Thread Florian Weimer
* Ed LaFrance:

 Thanks for chiming in. Named is PID 8349 in my case. Here's a snippet
 of the output from strace:

 [pid  8351] send(3, 30Nov 11 13:07:25 named[8349]:..., 107,
 MSG_NOSIGNAL) = 107 0.015232

 [pid  8353] send(3, 30Nov 11 13:07:25 named[8349]:..., 103,

 [pid  8353] ... send resumed )= 103 0.015034

This looks like syslog logging is the culprit: each syslog message
takes 15ms to complete.
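A back-of-the-envelope version of that point, as a Python sketch. It assumes log writes are serialized: the repeated waits on the same futex address in the trace suggest, but do not prove, a single lock around logging. Under that assumption a 15ms send() caps named at about 1/0.015, roughly 66 logged messages per second, however many threads it has. The messages-per-query figure is an assumption you would need to measure.

```python
def max_rate(seconds_per_message, messages_per_query=1.0):
    """Queries/second a single serialized log writer can keep up with."""
    return 1.0 / (seconds_per_message * messages_per_query)


def backlog_growth(arrival_qps, seconds_per_message, messages_per_query=1.0):
    """Queries/second added to the receive queue (0 if named keeps up)."""
    ceiling = max_rate(seconds_per_message, messages_per_query)
    return max(0.0, arrival_qps - ceiling)
```

With one message per query, an estimated 20 queries per second fits under that ceiling. So something else would have to add load: a higher real query rate, several messages per query, or other programs keeping the single-threaded syslogd busy. The arithmetic alone cannot tell which.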

There could be several causes: syslogd is logging synchronously to
disk (doing an fsync after each message), something else in the system
is producing an extremely large number of messages (syslogd is
single-threaded), or there is a request loop where writing out the
syslog message for each reverse DNS request itself requires a reverse
DNS lookup.
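For the first cause: the classic sysklogd (an inference from the "Shutting down kernel logger / system logger" output elsewhere in the thread) syncs a log file after every message unless its path in /etc/syslog.conf is prefixed with "-". named's default_syslog channel logs to the daemon facility at severity info, which a typical `*.info ... /var/log/messages` rule catches. This Python sketch lists file actions written without the prefix. It skips comments but does not handle backslash continuation lines.

```python
def synced_file_actions(conf_text):
    """Return (selector, path) pairs for file actions without a '-' prefix.

    In sysklogd, '/var/log/x' is synced after every message, while
    '-/var/log/x' is not. Comments and blank lines are skipped.
    """
    synced = []
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(None, 1)
        if len(fields) != 2:
            continue
        selector, action = fields[0], fields[1].strip()
        if action.startswith("/"):  # plain path: synchronous writes
            synced.append((selector, action))
    return synced
```

Adding the "-" only removes the per-message sync; it does not reduce the number of messages named sends.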

You should also check if named is expected to log this many messages
in the first place.  You can pass -s 200 to strace to see more of
the logging message, so this should help to identify what's going on.

I don't think this has got anything to do with the particular BIND
version you use.


Need to improve named performance

2012-11-10 Thread Ed LaFrance

Hello all -

First post to this list, hope I'm in the right place.

Running BIND 9.3.6-P1-RedHat-9.3.6-16.P1.el5 on a quad-core Xeon server 
(3 GHz) with 2 GB RAM. Named is used only for rDNS queries against 
our address space.


The issue is that named is not keeping up with rDNS requests. The 
nameserver is only doing rDNS, and it's the only public process on the 
server (no web hosting, monitoring, etc.).


When I check the router above this server I'll see 200 - 500 legitimate 
connections to this server at any given time. This is what's happening: 
named is not keeping up with the requests, so the network receive queue 
fills up - I can see this with netstat:


netstat -tulpn | grep :53
Proto Recv-Q Send-Q Local Address          Foreign Address        PID/Program name
...
udp   110048      0 xxx.xxx.xxx.xxx:53     0.0.0.0:*              3918/named
udp   110048      0 xxx.xxx.xxx.xxx:53     0.0.0.0:*              3918/named

(there are two different IPs on this machine to handle rDNS requests)
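The Recv-Q figure netstat shows for a UDP socket is the kernel's per-socket receive queue, which Linux also exposes in /proc/net/udp as the hexadecimal rx_queue half of the tx_queue:rx_queue column. The sketch below is a hypothetical helper that parses one such line. It assumes IPv4 and a little-endian host (such as x86), where the kernel prints the local address as a raw 32-bit value in host byte order. The sample line used to exercise it is constructed for illustration, with rx_queue set to 0x1ADE0 (110048 bytes, the figure shown above); it is not output from this server.

```python
# Hypothetical helper for illustration; assumes IPv4 on a little-endian Linux host.
import socket
import struct


def parse_proc_udp_line(line):
    """Parse one data line of /proc/net/udp into (ip, port, tx_queue, rx_queue)."""
    fields = line.split()
    ip_hex, port_hex = fields[1].split(":")   # local_address, e.g. "0100007F:0035"
    tx_hex, rx_hex = fields[4].split(":")     # tx_queue:rx_queue, in bytes
    # The address is printed as a host-order integer; on little-endian hosts the
    # bytes must be packed back little-endian to recover dotted-quad form.
    ip = socket.inet_ntoa(struct.pack("<I", int(ip_hex, 16)))
    return ip, int(port_hex, 16), int(tx_hex, 16), int(rx_hex, 16)


def dns_receive_queues(path="/proc/net/udp"):
    """Yield (ip, rx_queue_bytes) for UDP sockets bound to port 53."""
    with open(path) as handle:
        next(handle)  # skip the header line
        for line in handle:
            ip, port, _tx, rx = parse_proc_udp_line(line)
            if port == 53:
                yield ip, rx
```

Sampling `dns_receive_queues()` every few seconds would show whether the queues on the two nameserver addresses ever drain.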

Once the queue gets near the max value set by sysctl, UDP packets start 
to drop; this can also be seen in netstat:


netstat -su
...
Udp:
    5157567 packets received
    9761 packets to unknown port received.
    1164232 packet receive errors
    5157554 packets sent

The errors apparently correspond to drops; they only increase when the 
queue is full.
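Treating "packet receive errors" as drops, as described here, the counters give a rough loss rate: errors divided by errors plus successfully received datagrams, which comes to about 18% for the figures above. The parser below is a hypothetical sketch that assumes netstat's usual layout (a section header such as "Udp:" followed by "<count> <description>" lines); the sample text is copied from the counters above.

```python
# Hypothetical helper for illustration; parses the text output of `netstat -su`.
import re


def parse_udp_counters(text):
    """Collect '<count> <description>' lines from the Udp: section."""
    counters = {}
    in_udp = False
    for line in text.splitlines():
        stripped = line.strip()
        if re.fullmatch(r"[A-Za-z]+:", stripped):  # section header, e.g. "Udp:"
            in_udp = stripped == "Udp:"
            continue
        if in_udp:
            match = re.match(r"(\d+) (.+)", stripped)
            if match:
                counters[match.group(2)] = int(match.group(1))
    return counters


def drop_fraction(counters):
    """Share of arriving datagrams counted as receive errors (treated as drops)."""
    errors = counters.get("packet receive errors", 0)
    received = counters.get("packets received", 0)
    total = errors + received
    return errors / total if total else 0.0


SAMPLE = """Udp:
    5157567 packets received
    9761 packets to unknown port received.
    1164232 packet receive errors
    5157554 packets sent
"""

counters = parse_udp_counters(SAMPLE)
print(f"receive errors: {drop_fraction(counters):.1%} of arriving datagrams")
```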


Of course, by this point DNS queries are timing out. I've tried 
increasing the queue size with sysctl using this command:


sysctl -w net.core.rmem_max=1048576 net.core.rmem_default=10485

then restarting named. That did eliminate the drops, but the queue grows 
gigantic and I get pretty much 100% DNS lookup timeouts at that point.
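Some background on why a larger buffer did not help: on Linux, net.core.rmem_default is the receive buffer a socket gets by default, and net.core.rmem_max caps what a program may request with setsockopt(SO_RCVBUF). Per the socket(7) man page, the kernel doubles the requested value to allow for bookkeeping overhead, and getsockopt() reports the doubled figure. A bigger buffer lets more queries wait, but it does not make named read them any faster, so if named is stalled the extra queued queries most likely just age past the clients' timeouts. The sketch below only inspects these values; the helper names are hypothetical and it is not a tuning recommendation.

```python
# Hypothetical helpers for illustration; the /proc/sys reads are Linux-only.
import socket
from typing import Optional


def read_sysctl(name: str) -> Optional[int]:
    """Read an integer sysctl via /proc/sys; None if unavailable."""
    path = "/proc/sys/" + name.replace(".", "/")
    try:
        with open(path) as handle:
            return int(handle.read().split()[0])
    except (OSError, ValueError, IndexError):
        return None


def effective_rcvbuf(requested: Optional[int] = None) -> int:
    """Return SO_RCVBUF as reported by the kernel for a fresh UDP socket."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        if requested is not None:
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


print("rmem_default:", read_sysctl("net.core.rmem_default"))
print("rmem_max:", read_sysctl("net.core.rmem_max"))
print("default SO_RCVBUF:", effective_rcvbuf())
print("after requesting 1 MiB:", effective_rcvbuf(1 << 20))
```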


The server load is about 2.0: busy, but not overwhelmed. I can run a 
shell or even a GUI session on it with ease, so it's by no means maxed 
out. Here's the first slice of top output:


top - 09:13:38 up 18:40,  1 user,  load average: 2.09, 2.05, 2.00
Tasks: 175 total,   1 running, 174 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.2%us,  0.2%sy,  0.0%ni, 74.8%id, 24.7%wa,  0.0%hi,  0.2%si,  0.0%st
Mem:   2074984k total,  1743584k used,   331400k free,   166588k buffers
Swap:  4128760k total,       28k used,  4128732k free,  1270032k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 4509 named     24   0 71004 4580 2036 S  1.3  0.2   0:46.74 named
 6877 root      15   0  2428 1064  788 R  0.7  0.1   0:00.04 top
  467 root      10  -5     0    0    0 D  0.3  0.0   2:59.13 kjournald
 2460 root      18   0  1816  584  484 D  0.3  0.0   3:30.35 syslogd
    1 root      15   0  2160  644  556 S  0.0  0.0   0:01.08 init
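In this top output, syslogd and kjournald are both in state D (uninterruptible sleep, typically waiting on disk I/O), and 24.7% of CPU time is iowait. One way to watch which processes keep landing in that state is to sample the state field of /proc/<pid>/stat. The helper below is a hypothetical, Linux-only sketch; it takes the command name from between the first "(" and the last ")" because the name itself may contain spaces or parentheses.

```python
# Hypothetical helper for illustration; Linux-only (/proc/<pid>/stat).
import os


def parse_stat(data):
    """Return (pid, command, state) from the contents of /proc/<pid>/stat."""
    pid = int(data.split(" ", 1)[0])
    start, end = data.index("("), data.rindex(")")
    command = data[start + 1:end]
    state = data[end + 2:].split()[0]  # state is the first field after ") "
    return pid, command, state


def processes_in_state(wanted="D"):
    """List (pid, command) for processes currently in the given state."""
    found = []
    if not os.path.isdir("/proc"):
        return found
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as handle:
                pid, command, state = parse_stat(handle.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited while being read, or unexpected format
        if state == wanted:
            found.append((pid, command))
    return found


print(processes_in_state("D"))
```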

The bottom line is: I need to improve named performance. Tcpdump only 
shows about 20 requests per second on average, I would estimate. This 
should be handled easily, but instead it's gagging on it and the 
requests are stacking up. If you have any ideas, I welcome your input. 
Here's named.conf. The global config is pretty basic; the data for 
each zone is stored separately elsewhere:


options {
    directory "/var";
    auth-nxdomain no;
    pid-file "/var/run/named/named.pid";
    allow-recursion {
        localnets;
    };

    allow-transfer {
        none;
    };
};

key "rndc-key" {
    algorithm hmac-md5;
    secret "xx";
};

controls {
    inet 127.0.0.1 port 953
        allow { 127.0.0.1; } keys { "rndc-key"; };
};

zone "." {
    type hint;
    file "named.root";
};

zone "0.0.127.IN-ADDR.ARPA" {
    type master;
    file "localhost.rev";
};

Thanks!
Ed
--
(800) 362-7579 ext 1

+---+
+ Colocation  Dedicated Servers  IPv4  IPv6 Transit +
+---+
Connex Internet Services, Inc.      direct: (916) 265-1568
11230 Gold Express Dr #310-313      fax: (916) 880-5663
Gold River, CA 95670                http://connexinternet.com
+---+


Re: Need to improve named performance

2012-11-10 Thread Alan Clegg

On Nov 10, 2012, at 1:39 PM, Ed LaFrance e...@connexinternet.com wrote:

 Running BIND 9.3.6-P1-RedHat-9.3.6-16.P1.el5

Before everyone else says it... upgrade.

AlanC
-- 
Alan Clegg | +1-919-355-8851 | a...@clegg.com








Re: Need to improve named performance

2012-11-10 Thread Ed LaFrance

Hello Alan -

I will do an upgrade as soon as I get a chance; I'm a bit tied up right now. 
But in any case, since I posted this I've done some query logging for a 
bit and found that I'm getting an average of about 60 queries per second. 
All the DNS queries are coming in via UDP, and the connections I mentioned 
are likewise UDP. As I mentioned before, netstat shows the UDP Recv-Q 
filling up on the two IPs on that server that are taking the requests.


There's a basic firewall setup on the server, only ports I need are open:

Chain INPUT (policy ACCEPT)
target               prot opt source       destination
RH-Firewall-1-INPUT  all  --  0.0.0.0/0    0.0.0.0/0

Chain FORWARD (policy ACCEPT)
target               prot opt source       destination
RH-Firewall-1-INPUT  all  --  0.0.0.0/0    0.0.0.0/0

Chain OUTPUT (policy ACCEPT)
target               prot opt source       destination

Chain RH-Firewall-1-INPUT (2 references)
target     prot opt source       destination
ACCEPT     all  --  0.0.0.0/0    0.0.0.0/0
ACCEPT     icmp --  0.0.0.0/0    0.0.0.0/0     icmp type 255
ACCEPT     esp  --  0.0.0.0/0    0.0.0.0/0
ACCEPT     ah   --  0.0.0.0/0    0.0.0.0/0
ACCEPT     udp  --  0.0.0.0/0    224.0.0.251   udp dpt:5353
ACCEPT     udp  --  0.0.0.0/0    0.0.0.0/0     udp dpt:631
ACCEPT     tcp  --  0.0.0.0/0    0.0.0.0/0     tcp dpt:631
ACCEPT     all  --  0.0.0.0/0    0.0.0.0/0     state RELATED,ESTABLISHED
ACCEPT     tcp  --  0.0.0.0/0    0.0.0.0/0     state NEW tcp dpt:10022
ACCEPT     udp  --  0.0.0.0/0    0.0.0.0/0     state NEW udp dpt:53
ACCEPT     tcp  --  0.0.0.0/0    0.0.0.0/0     state NEW tcp dpt:53
ACCEPT     tcp  --  0.0.0.0/0    0.0.0.0/0     state NEW tcp dpt:5900
ACCEPT     tcp  --  0.0.0.0/0    0.0.0.0/0     state NEW tcp dpt:5901
ACCEPT     tcp  --  0.0.0.0/0    0.0.0.0/0     state NEW tcp dpt:8550
REJECT     all  --  0.0.0.0/0    0.0.0.0/0     reject-with icmp-host-prohibited


As far as recursing goes:

/usr/sbin/rndc recursing
rndc: 'recursing' failed: permission denied

Any ideas are welcome

Ed


On 11/10/2012 3:46 PM, Alan Clegg wrote:


On Nov 10, 2012, at 1:39 PM, Ed LaFrance e...@connexinternet.com
wrote:


When I check the router above this server I'll see 200 - 500
legitimate connections to this server at any given time.


Having sent my snarky update e-mail, I now ask... you say later in
the mail that you are doing about 20 queries per second (which I
agree should be handled by any hardware with more oomph than a
Z-80).

I'm curious as to what these 200-500 legitimate connections are.
Are they TCP?  If so, are you seeing lots of TCP connections hanging
around?  Do you have some firewall in the midst of this that might be
messing around with TCP connections?

If you do a rndc recursing, what do you get?

If you are only doing 20-30 transactions per second, the stats on the
UDP counts would have taken a long time to get there... something
doesn't add up.

AlanC
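Alan's point can be checked against the figures already posted, under two assumptions the thread does not confirm: that the netstat -su counters count from boot, and that they were captured close to the top snapshot showing "up 18:40" (18 hours 40 minutes). On that basis the average is roughly 77 datagrams per second delivered, or about 94 per second including receive errors, and at 20 to 30 queries per second the received count alone would have taken roughly two to three days to accumulate.

```python
# Worked arithmetic using figures quoted in this thread; assumes the counters
# are since boot and were read close to the "up 18:40" top snapshot.


def uptime_seconds(hours: int, minutes: int) -> int:
    return hours * 3600 + minutes * 60


UPTIME = uptime_seconds(18, 40)  # "up 18:40" in the top header
RECEIVED = 5157567               # "packets received" from netstat -su
ERRORS = 1164232                 # "packet receive errors" from netstat -su

print(f"delivered:    {RECEIVED / UPTIME:.1f} datagrams/s")
print(f"incl. errors: {(RECEIVED + ERRORS) / UPTIME:.1f} datagrams/s")
for qps in (20, 30):
    hours = RECEIVED / qps / 3600
    print(f"at {qps} q/s, {RECEIVED} datagrams take {hours:.1f} h")
```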



Re: Need to improve named performance

2012-11-10 Thread Ed LaFrance

Hello Alan -

It's also worth noting that, since I have more IPs on the box than the 
ones that are designated as nameservers, and since I have dns listening 
on all addresses, I can query named using one of the non-nameserver IPs 
- and it works fine! For instance:


nslookup x.x.x.29 y.y.y.114
Server:   y.y.y.114
Address:  y.y.y.114#53

29.x.x.x.in-addr.arpa  name = foo.bar.net.

The problem is that the UDP receive queue is flooded for the two IPs 
that correspond to the two nameservers on this box. I.e.


ns2.mydomain.net = y.y.y.115

nslookup x.x.x.29 y.y.y.115
;; connection timed out; no servers could be reached

but since y.y.y.112/29 is on this box, you can query:

nslookup x.x.x.29 y.y.y.116
Server:   y.y.y.116
Address:  y.y.y.116#53

29.x.x.x.in-addr.arpa  name = foo-bar.net.

[cololine@ns3 ~]$ nslookup x.x.x.29 y.y.y.117
Server:   y.y.y.117
Address:  y.y.y.117#53

29.x.x.x.in-addr.arpa  name = foo.bar.net.

...etc.
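The same per-address comparison can be scripted so that response times, not just success or failure, become visible. The sketch below builds a minimal PTR query by hand in the RFC 1035 message format using only the Python standard library; build_ptr_query and time_ptr_query are hypothetical helpers, not part of any DNS tool, and they send a single query with no retries.

```python
# Hypothetical helpers for illustration; one UDP PTR query, no retries.
import ipaddress
import os
import socket
import struct
import time
from typing import Optional


def build_ptr_query(address: str, query_id: int) -> bytes:
    """Build a DNS query for the PTR record of an IP address (RFC 1035 layout)."""
    name = ipaddress.ip_address(address).reverse_pointer  # e.g. 29.2.0.192.in-addr.arpa
    # Header: ID, flags (all zero: standard query, recursion not requested),
    # QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id, 0, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii") for label in name.split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 12, 1)  # QTYPE=PTR, QCLASS=IN


def time_ptr_query(server: str, address: str, timeout: float = 2.0) -> Optional[float]:
    """Send one PTR query over UDP; return seconds until a matching reply, or None."""
    query = build_ptr_query(address, int.from_bytes(os.urandom(2), "big"))
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.connect((server, 53))  # only accept datagrams from this server and port
        start = time.monotonic()
        deadline = start + timeout
        sock.send(query)
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                return None
            sock.settimeout(remaining)
            try:
                reply = sock.recv(4096)
            except OSError:  # timeout, ICMP port unreachable, etc.
                return None
            if reply[:2] == query[:2]:  # matching query ID
                return time.monotonic() - start
```

For example, `time_ptr_query("192.0.2.115", "198.51.100.29")` and `time_ptr_query("192.0.2.116", "198.51.100.29")`, with documentation addresses standing in for the masked y.y.y.115, y.y.y.116 and x.x.x.29. Based on the nslookup results above, the first would be expected to return None and the second a small delay.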

What I need, hope for, want, is someone to tell me how to fix up named 
and/or UDP on this box so it can keep up with requests that are 
happening on the nameserver IPs, as clearly the server can do what it 
needs to do if I can get past this brokenness.


Thanks,

Ed
