Re: Need to improve named performance
Sorry for arriving late and making points that might go without saying but... On Mon, Nov 12, 2012, at 05:23 PM, Ed LaFrance wrote: > Hello Alan - > > Of course you are right, my bad. > > Here's the entirety of my named.conf - there's nothing pertaining to > logging in here, so I guess that means that 'log everything' is the > default. I would only want to log critical named errors, so if anyone > has syntax they have my gratitude: > > options { > directory "/var"; > auth-nxdomain no; > pid-file "/var/run/named/named.pid"; > allow-recursion { > localnets; > }; > > allow-transfer { > "none"; > }; > }; > > key "rndc-key" { > algorithm hmac-md5; > secret "CeMgS23y0oWE20nyv0x40Q=="; I hope you've changed this key now that it's public ;) Otherwise, you said the rndc command was giving you permission errors, I get similar if I forget to sudo rndc ___ Please visit https://lists.isc.org/mailman/listinfo/bind-users to unsubscribe from this list bind-users mailing list bind-users@lists.isc.org https://lists.isc.org/mailman/listinfo/bind-users
RE: Need to improve named performance
One issue that *may* be impacting you (and another reason to upgrade) is that the size of the receive buffer within named was bumped up in 9.5 or 9.6, IIRC.

-- Jack Tavares

From: bind-users-bounces+j.tavares=f5@lists.isc.org [bind-users-bounces+j.tavares=f5@lists.isc.org] on behalf of Florian Weimer [f...@deneb.enyo.de]
Sent: Sunday, November 11, 2012 13:46
To: Ed LaFrance
Cc: bind-users@lists.isc.org
Subject: Re: Need to improve named performance

* Ed LaFrance:

> Running BIND 9.3.6-P1-RedHat-9.3.6-16.P1.el5 on a quadcore xeon server
> (3Ghz) with 2GB RAM. Named is being used only for rDNS queries against
> our address space.

You should really upgrade to the latest version on that branch (likely bind-9.3.6-20.P1.el5_8.5).

> The bottom line is: I need to improve named performance. Tcpdump only
> shows about 20 requests per second on average, I would estimate. This
> should be handled easily, but instead it's gagging on it and the
> requests are stacking up.

Something is stalling the named process. Try to run "strace -T -f -p 4509" (4509 is the PID for the named process) and see where named spends its time. The top output you quoted suggests that the process is not spinning in user space.
Re: Need to improve named performance
On Mon, 12 Nov 2012, Ed LaFrance wrote:

> Currently I'm not using query logging, it's not in my options at all.

I think "rndc querylog" was used to enable it (even if there is no corresponding logging configuration). You can run it again to toggle it off. "rndc status" will show whether query logging is on or off.

I think in an earlier message you said rndc didn't work for you, but your named.conf does have some configuration for it, so maybe you need to use a different rndc (maybe it is installed multiple times?) or point it to the correct configuration.
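A minimal sketch of that check, assuming rndc is on the PATH, is run with enough privilege to reach the running named, and that "rndc status" prints a line of the form "query logging is ON" or "query logging is OFF". The helper names querylog_state and querylog_off are made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: read `rndc status` output on stdin and print the
# query logging state (the text after "query logging is ").
querylog_state() {
    sed -n 's/^query logging is //p'
}

# Hypothetical helper: turn query logging off only if it is currently on.
# `rndc querylog` is a toggle, so the state is checked first.
querylog_off() {
    if [ "$(rndc status | querylog_state)" = "ON" ]; then
        rndc querylog
    fi
}
```

Running querylog_off as root (or through sudo) should then leave query logging disabled regardless of its previous state. Because the state is read before toggling, running it twice does not switch logging back on.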
Re: Need to improve named performance
The developer of some software we use has come up with this and it appears to work:

    logging {
        channel error_log {
            file "/var/log/bind.log" versions 3 size 5m;
            severity error;
            print-time yes;
            print-severity yes;
            print-category yes;
        };
        category default {
            error_log;
        };
    };

On 11/12/2012 8:49 AM, David Forrest wrote:
> On Mon, 12 Nov 2012, Ed LaFrance wrote:
> > Hello Alan -
> >
> > Of course you are right, my bad.
> >
> > Here's the entirety of my named.conf - there's nothing pertaining to logging in here, so I guess that means that 'log everything' is the default. I would only want to log critical named errors, so if anyone has syntax they have my gratitude:
>
> No, you just get the defaults as described in the ARM 6.2.10:
>
> "Only one logging statement is used to define as many channels and categories as are wanted. If there is no logging statement, the logging configuration will be:
>
>     logging {
>         category default { default_syslog; default_debug; };
>         category unmatched { null; };
>     };"
>
> The rest of 6.2.10 shows the syntax and provides the ability to "roll" the logs much as (r)syslogd.conf does for those that syslog gets. None of my named logs go to syslog as I do have a logging statement of my choices.
>
> Dave

--
(800) 362-7579 ext 1
Colocation    Dedicated Servers    IPv4 & IPv6 Transit
Connex Internet Services, Inc.    direct: (916) 265-1568
11230 Gold Express Dr #310-313    fax: (916) 880-5663
Gold River, CA 95670              http://connexinternet.com
Re: Need to improve named performance
On Mon, 12 Nov 2012, Ed LaFrance wrote:

> Hello Alan -
>
> Of course you are right, my bad.
>
> Here's the entirety of my named.conf - there's nothing pertaining to logging in here, so I guess that means that 'log everything' is the default. I would only want to log critical named errors, so if anyone has syntax they have my gratitude:

No, you just get the defaults as described in the ARM 6.2.10:

"Only one logging statement is used to define as many channels and categories as are wanted. If there is no logging statement, the logging configuration will be:

    logging {
        category default { default_syslog; default_debug; };
        category unmatched { null; };
    };"

The rest of 6.2.10 shows the syntax and provides the ability to "roll" the logs much as (r)syslogd.conf does for those that syslog gets. None of my named logs go to syslog, as I do have a logging statement with my own choices.

Dave

--
David Forrest
St. Louis, Missouri
Re: Need to improve named performance
Hello Alan -

Of course you are right, my bad.

Here's the entirety of my named.conf - there's nothing pertaining to logging in here, so I guess that means that 'log everything' is the default. I would only want to log critical named errors, so if anyone has syntax they have my gratitude:

    options {
        directory "/var";
        auth-nxdomain no;
        pid-file "/var/run/named/named.pid";
        allow-recursion {
            localnets;
        };

        allow-transfer {
            "none";
        };
    };

    key "rndc-key" {
        algorithm hmac-md5;
        secret "CeMgS23y0oWE20nyv0x40Q==";
    };

    controls {
        inet 127.0.0.1 port 953
        allow { 127.0.0.1; } keys { "rndc-key"; };
    };

    zone "." {
        type hint;
        file "named.root";
    };

    zone "0.0.127.IN-ADDR.ARPA" {
        type master;
        file "localhost.rev";
    };

    include "/etc/dnsmanager.include";

... dnsmanager.include contains nothing but the zone definitions.

Ed

On 11/12/2012 8:09 AM, Alan Clegg wrote:
> On Nov 12, 2012, at 10:58 AM, Ed LaFrance wrote:
> > Currently I'm not using query logging, it's not in my options at all. Are you saying that named logging by syslog into /var/log/messages is controlled by named.conf? Seems unlikely, I'd think it would be a function of syslog.conf. I'm trying to learn more about it but I'm swamped this am, just thought I'd post here to see if anyone knows a quick way to exclude named from the syslog completely.
>
> Logging queries to syslog is not on by default (in ISC distributed BIND), so something is doing it. Send us your logging stanza...
>
> (And yes, I'm absolutely sure that logging queries to syslog is handled by named.conf)
>
> AlanC

--
(800) 362-7579 ext 1
Colocation    Dedicated Servers    IPv4 & IPv6 Transit
Connex Internet Services, Inc.    direct: (916) 265-1568
11230 Gold Express Dr #310-313    fax: (916) 880-5663
Gold River, CA 95670              http://connexinternet.com
Re: Need to improve named performance
Ed LaFrance wrote:

> Hello Alan -
>
> Currently I'm not using query logging, it's not in my options at all.
> Are you saying that named logging by syslog into /var/log/messages is
> controlled by named.conf? Seems unlikely, I'd think it would be a
> function of syslog.conf. I'm trying to learn more about it but I'm
> swamped this am, just thought I'd post here to see if anyone knows a
> quick way to exclude named from the syslog completely.

syslog.conf tells syslogd what to do when it receives the log messages. It doesn't control the applications that send log messages in the first place; that's controlled by each application's own configuration. named doesn't log queries unless you tell it to.

--
Barry Margolin
Arlington, MA
Re: Need to improve named performance
On Nov 12, 2012, at 10:58 AM, Ed LaFrance wrote:

> Currently I'm not using query logging, it's not in my options at all. Are you
> saying that named logging by syslog into /var/log/messages is controlled by
> named.conf? Seems unlikely, I'd think it would be a function of syslog.conf.
> I'm trying to learn more about it but I'm swamped this am, just thought I'd
> post here to see if anyone knows a quick way to exclude named from the syslog
> completely.

Logging queries to syslog is not on by default (in ISC distributed BIND), so something is doing it. Send us your logging stanza...

(And yes, I'm absolutely sure that logging queries to syslog is handled by named.conf)

AlanC

--
Alan Clegg | +1-919-355-8851 | a...@clegg.com
Re: Need to improve named performance
On 11/12/2012 5:58 PM, Ed LaFrance wrote:

> Hello Alan -
>
> Currently I'm not using query logging, it's not in my options at all. Are you saying that named logging by syslog into /var/log/messages is controlled by named.conf? Seems unlikely, I'd think it would be a function of syslog.conf. I'm trying to learn more about it but I'm swamped this am, just thought I'd post here to see if anyone knows a quick way to exclude named from the syslog completely.
>
> Ed

It's not about excluding named from syslog, but about deciding whether to send the logs from BIND to syslogd in the first place.

Regards,
Eliezer

--
Eliezer Croitoru
https://www1.ngtech.co.il
IT consulting for Nonprofit organizations
eliezer ngtech.co.il
Re: Need to improve named performance
Hello Alan -

Currently I'm not using query logging, it's not in my options at all. Are you saying that named logging by syslog into /var/log/messages is controlled by named.conf? Seems unlikely, I'd think it would be a function of syslog.conf. I'm trying to learn more about it but I'm swamped this am, just thought I'd post here to see if anyone knows a quick way to exclude named from the syslog completely.

Ed

On 11/12/2012 7:34 AM, Alan Clegg wrote:
> On Nov 12, 2012, at 10:23 AM, Ed LaFrance wrote:
> > I've been corresponding with several people on this issue but no one had questioned that when I pointed it out.
>
> I don't think I'd seen the logging stanza, but yes, logging to syslog is a bad thing, and logging queries to syslog is even worse. Having had someone pick this out of an strace output is indeed awesome.
>
> > I really don't need this kind of logging in the messages log. I can turn on query logging in the named.conf if I need more detail on named. I think the simplest thing would just be to have an exclusion in the syslog config for named. I confess some general ignorance, so perhaps you know the directive for that?
>
> To reduce the load on named in general, just turn off query logging in the named.conf, or you can leave the stanza in and put a "querylog no;" in your options stanza so that it is not started when named starts (I'm not sure what version introduced the querylog option, so you may need to test this).
>
> AlanC

--
(800) 362-7579 ext 1
Colocation    Dedicated Servers    IPv4 & IPv6 Transit
Connex Internet Services, Inc.    direct: (916) 265-1568
11230 Gold Express Dr #310-313    fax: (916) 880-5663
Gold River, CA 95670              http://connexinternet.com
Re: Need to improve named performance
On Nov 12, 2012, at 10:23 AM, Ed LaFrance wrote:

> I've been corresponding with several people on this issue but no one had
> questioned that when I pointed it out.

I don't think I'd seen the logging stanza, but yes, logging to syslog is a bad thing, and logging queries to syslog is even worse. Having had someone pick this out of an strace output is indeed awesome.

> I really don't need this kind of logging in the messages log. I can turn on
> query logging in the named.conf if I need more detail on named. I think the
> simplest thing would just be to have an exclusion in the syslog config for
> named. I confess some general ignorance, so perhaps you know the directive
> for that?

To reduce the load on named in general, just turn off query logging in the named.conf, or you can leave the stanza in and put a "querylog no;" in your options stanza so that it is not started when named starts (I'm not sure what version introduced the querylog option, so you may need to test this).

AlanC

--
Alan Clegg | +1-919-355-8851 | a...@clegg.com
Re: Need to improve named performance
On 12/11/12 15:23, Ed LaFrance wrote:

> I really don't need this kind of logging in the messages log. I can turn on query logging in the named.conf if I need more detail on named. I think the simplest thing would just be to have an exclusion in the syslog config for named. I confess some general ignorance, so perhaps

Don't do that. Instead, configure named not to use syslog if you don't want it to. Maybe log to files from within named, which is quicker.

> you know the directive for that?

As per the ARM:

http://www.isc.org/files/arm94_0.html#id2574861

...the defaults are:

"""
Only one logging statement is used to define as many channels and categories as are wanted. If there is no logging statement, the logging configuration will be:

    logging {
        category default { default_syslog; default_debug; };
        category unmatched { null; };
    };
"""

You can easily change this so that queries aren't logged to syslog. For example:

    logging {
        channel query_log {
            file "logs/query.log" versions 4 size 10m;
        };
        category queries { query_log; };
        category default { default_syslog; default_debug; };
        category unmatched { null; };
    };

I would recommend tuning this further, as other log categories can generate a lot of output too. In fact, unless you need to, I would not use syslog for named at *all*, e.g.

    logging {
        channel query_log {
            file "logs/query.log" versions 4 size 10m;
        };
        channel named_log {
            file "logs/named.log" versions 4 size 10m;
        };
        category queries { query_log; };
        category default { named_log; };
    };
Re: Need to improve named performance
Hello Florian -

You are my hero and new best friend. I stopped syslog:

    [root@ns1 lisinc]# /sbin/service syslog stop
    Shutting down kernel logger: [ OK ]
    Shutting down system logger: [ OK ]

...and all the problems cleared up instantly, so you called it correctly.

I had noticed in /var/log/messages that basically every query was being logged:

    Nov 12 06:23:54 ns1 named[8349]: client 64.12.139.83#37778: query: 219.161.72.64.in-addr.arpa IN ANY -E
    Nov 12 06:23:54 ns1 named[8349]: client 208.69.32.21#17245: query: 129.160.72.64.in-addr.arpa IN PTR -
    Nov 12 06:23:54 ns1 named[8349]: client 64.12.139.81#31273: query: 211.21.140.204.in-addr.arpa IN PTR -E
    Nov 12 06:23:54 ns1 named[8349]: client 74.125.18.212#62466: query: 217.94.119.199.in-addr.arpa IN PTR -

I've been corresponding with several people on this issue but no one had questioned that when I pointed it out.

I really don't need this kind of logging in the messages log. I can turn on query logging in the named.conf if I need more detail on named. I think the simplest thing would just be to have an exclusion in the syslog config for named. I confess some general ignorance, so perhaps you know the directive for that?

Thanks again!

Ed

On 11/11/2012 10:56 PM, Florian Weimer wrote:
> * Ed LaFrance:
> > Thanks for chiming in. Named is PID 8349 in my case. Here's a snippet of the output from strace:
> >
> > [pid 8351] send(3, "<30>Nov 11 13:07:25 named[8349]:"..., 107, MSG_NOSIGNAL) = 107 <0.015232>
> > [pid 8353] send(3, "<30>Nov 11 13:07:25 named[8349]:"..., 103,
> > [pid 8353] <... send resumed> )= 103 <0.015034>
>
> This looks like syslog logging is the culprit: each syslog message takes 15ms to complete. There could be several causes: syslogd is logging synchronously to disk (doing an fsync after each message), something else in the system is producing an extremely large number of messages (syslogd is single-threaded), or there is a request loop where writing out the syslog message for each reverse DNS request itself requires a reverse DNS lookup.
>
> You should also check if named is expected to log this many messages in the first place. You can pass "-s 200" to strace to see more of the logging message, so this should help to identify what's going on.
>
> I don't think this has got anything to do with the particular BIND version you use.

--
(800) 362-7579 ext 1
Colocation    Dedicated Servers    IPv4 & IPv6 Transit
Connex Internet Services, Inc.    direct: (916) 265-1568
11230 Gold Express Dr #310-313    fax: (916) 880-5663
Gold River, CA 95670              http://connexinternet.com
Re: Need to improve named performance
Hi there,

On Mon, 12 Nov 2012, Ed LaFrance wrote:

> ...
> No idea on ip_conntrack. How do I check and if so, what setting should I try and how do I do it?

Look for something like

    /proc/sys/net/netfilter/ip_conntrack_tcp_timeout_established

and cat it to the terminal. It will just be a number (it's in seconds) and it's probably 432000 at the moment. You (root) can change it, for example to one hour, by the command

    /bin/echo 3600 > /proc/sys/net/netfilter/ip_conntrack_tcp_timeout_established

If it's to persist across a reboot you'll need to put the command in a startup script such as rc.local, or find out where the default settings are in your system and tweak it there.

"something like" means that the name of the (virtual) file has changed over the years and it might now be

    nf_conntrack_tcp_timeout_established

on your system. Search the Web for this setting - it's a very specific term - and you'll find that there are many other ways to tinker with TCP/IP. :)

--
73,
Ged.
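Since the file name differs between kernels, the lookup can be sketched as a small shell helper. This is only an illustration: the function name conntrack_timeout_file and its optional directory argument are made up, and it assumes the setting lives under /proc/sys/net/netfilter under one of the two names mentioned above.

```shell
#!/bin/sh
# Hypothetical helper: print the path of the conntrack "established" timeout
# file, trying the older ip_conntrack name first and then nf_conntrack.
# The optional first argument overrides the directory that is searched.
conntrack_timeout_file() {
    base=${1:-/proc/sys/net/netfilter}
    for name in ip_conntrack_tcp_timeout_established \
                nf_conntrack_tcp_timeout_established; do
        if [ -e "$base/$name" ]; then
            printf '%s\n' "$base/$name"
            return 0
        fi
    done
    return 1   # neither file exists (conntrack is probably not loaded)
}

# Example use, as root:
#   f=$(conntrack_timeout_file) && cat "$f"        # show the current value
#   f=$(conntrack_timeout_file) && echo 3600 > "$f" # set it to one hour
```

If the helper finds nothing, connection tracking is probably not loaded on the box, in which case this setting is not the cause.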
Re: Need to improve named performance
* Ed LaFrance:

> Thanks for chiming in. Named is PID 8349 in my case. Here's a snippet
> of the output from strace:

> [pid 8351] send(3, "<30>Nov 11 13:07:25 named[8349]:"..., 107,
> MSG_NOSIGNAL) = 107 <0.015232>

> [pid 8353] send(3, "<30>Nov 11 13:07:25 named[8349]:"..., 103,
> [pid 8353] <... send resumed> )= 103 <0.015034>

This looks like syslog logging is the culprit: each syslog message takes 15ms to complete. There could be several causes: syslogd is logging synchronously to disk (doing an fsync after each message), something else in the system is producing an extremely large number of messages (syslogd is single-threaded), or there is a request loop where writing out the syslog message for each reverse DNS request itself requires a reverse DNS lookup.

You should also check if named is expected to log this many messages in the first place. You can pass "-s 200" to strace to see more of the logging message, so this should help to identify what's going on.

I don't think this has got anything to do with the particular BIND version you use.
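To see at a glance which system call eats the time, the "strace -T" output can be totalled per call. The following is a rough sketch, not a polished tool: the function name summarize_strace is made up, and it assumes each completed call ends with the "<seconds>" field that -T appends, optionally prefixed by "[pid N]" (terminal output) or a bare PID (output written with -o).

```shell
#!/bin/sh
# Hypothetical helper: read `strace -T -f` output on stdin and print one line
# per system call: name, number of completed calls, total seconds.
# Lines without a trailing <seconds> field (unfinished calls) are skipped.
summarize_strace() {
    awk '
    match($0, /<[0-9]+\.[0-9]+>$/) {
        t = substr($0, RSTART + 1, RLENGTH - 2)      # elapsed time in seconds
        line = $0
        sub(/^(\[pid +[0-9]+\]|[0-9]+) +/, "", line) # drop the PID prefix
        if (line ~ /^<\.\.\. /) {                    # "<... send resumed>"
            sub(/^<\.\.\. /, "", line)
            sub(/ .*/, "", line)
        } else {                                     # "send(3, ..."
            sub(/\(.*/, "", line)
        }
        total[line] += t
        count[line]++
    }
    END {
        for (s in total) printf "%s %d %.6f\n", s, count[s], total[s]
    }' | sort -k3,3nr
}

# Example use:
#   strace -T -f -p 8349 -o trace.txt    # interrupt after a few seconds
#   summarize_strace < trace.txt | head
```

On the snippet quoted above, the two send() calls to the syslog socket would account for about 30ms between them, far more than the stat64() calls around them.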
Re: Need to improve named performance
Hello - Thanks for chiming in. Named is PID 8349 in my case. Here's a snippet of the output from strace: [pid 8351] time( [pid 8352] <... sendmsg resumed> ) = 56 <0.000104> [pid 8352] recvmsg(515, {msg_name(16)={sa_family=AF_INET, sin_port=htons(38385), sin_addr=inet_addr("205.188.158.143")}, msg_iov(1)=[{"Q&\0\0\0\1\0\0\0\0\0\1\003157\003161\00272\00264\7in-ad"..., 4096}], msg_controllen=20, {cmsg_len=20, cmsg_level=SOL_SOCKET, cmsg_type=0x1d /* SCM_??? */, ...}, msg_flags=0}, 0) = 55 <0.31> [pid 8351] <... time resumed> NULL)= 1352668045 <0.000353> [pid 8352] futex(0x9b6aecc, FUTEX_WAIT_PRIVATE, 2, NULL [pid 8351] stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0 <0.000109> [pid 8351] stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0 <0.86> [pid 8351] stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0 <0.84> [pid 8351] send(3, "<30>Nov 11 13:07:25 named[8349]:"..., 107, MSG_NOSIGNAL) = 107 <0.015232> [pid 8351] futex(0x9b6aecc, FUTEX_WAKE_PRIVATE, 1 [pid 8353] <... futex resumed> ) = 0 <0.052813> [pid 8351] <... futex resumed> ) = 1 <0.000125> [pid 8353] time(NULL) = 1352668045 <0.20> [pid 8353] stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0 <0.25> [pid 8353] stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0 <0.22> [pid 8351] sendmsg(513, {msg_name(16)={sa_family=AF_INET, sin_port=htons(38162), sin_addr=inet_addr("205.188.158.207")}, msg_iov(1)=[{"@%\204\0\0\1\0\1\0\2\0\1\003249\00221\003140\003204\7in-a"..., 138}], msg_controllen=0, msg_flags=0}, 0 [pid 8353] stat64("/etc/localtime", [pid 8351] <... sendmsg resumed> ) = 138 <0.48> [pid 8353] <... stat64 resumed> {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0 <0.41> [pid 8351] recvmsg(513, [pid 8353] send(3, "<30>Nov 11 13:07:25 named[8349]:"..., 103, MSG_NOSIGNAL [pid 8351] <... 
recvmsg resumed> {msg_name(16)={sa_family=AF_INET, sin_port=htons(53507), sin_addr=inet_addr("205.188.158.206")}, msg_iov(1)=[{"\244\273\0\0\0\1\0\0\0\0\0\1\003246\003161\00272\00264\7in-ad"..., 4096}], msg_controllen=20, {cmsg_len=20, cmsg_level=SOL_SOCKET, cmsg_type=0x1d /* SCM_??? */, ...}, msg_flags=0}, 0) = 55 <0.86> [pid 8351] futex(0x9b6aecc, FUTEX_WAIT_PRIVATE, 2, NULL [pid 8353] <... send resumed> )= 103 <0.015034> [pid 8353] futex(0x9b6aecc, FUTEX_WAKE_PRIVATE, 1) = 1 <0.25> [pid 8350] <... futex resumed> ) = 0 <0.051772> [pid 8350] time( [pid 8353] sendmsg(513, {msg_name(16)={sa_family=AF_INET, sin_port=htons(60702), sin_addr=inet_addr("64.12.139.17")}, msg_iov(1)=[{"\343F\204\0\0\1\0\1\0\2\0\1\003251\003160\00272\00264\7in-ad"..., 151}], msg_controllen=0, msg_flags=0}, 0 [pid 8350] <... time resumed> NULL)= 1352668045 <0.000210> [pid 8353] <... sendmsg resumed> ) = 151 <0.84> [pid 8350] stat64("/etc/localtime", [pid 8353] recvmsg(513, [pid 8350] <... stat64 resumed> {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0 <0.85> [pid 8353] <... recvmsg resumed> {msg_name(16)={sa_family=AF_INET, sin_port=htons(3794), sin_addr=inet_addr("64.12.139.19")}, msg_iov(1)=[{"|\354\0\0\0\1\0\0\0\0\0\1\00230\003160\00272\00264\7in-add"..., 4096}], msg_controllen=20, {cmsg_len=20, cmsg_level=SOL_SOCKET, cmsg_type=0x1d /* SCM_??? */, ...}, msg_flags=0}, 0) = 54 <0.000150> [pid 8350] stat64("/etc/localtime", [pid 8353] futex(0x9b6aecc, FUTEX_WAIT_PRIVATE, 2, NULL [pid 8350] <... stat64 resumed> {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0 <0.76> [pid 8350] stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0 <0.29> [pid 8350] send(3, "<30>Nov 11 13:07:25 named[8349]:"..., 102, MSG_NOSIGNAL On 11/11/2012 1:46 PM, Florian Weimer wrote: * Ed LaFrance: Running BIND 9.3.6-P1-RedHat-9.3.6-16.P1.el5 on a quadcore xeon server (3Ghz) with 2GB RAM. Named is being used only for rDNS queries against our address space. 
You should really upgrade to the latest version on that branch (likely bind-9.3.6-20.P1.el5_8.5). The bottom line is: I need to improve named performance. Tcpdump only shows about 20 requests per second on average, I would estimate. This should be handled easily, but instead it's gagging on it and the requests are stacking up. Something is stalling the named process. Try to run "strace -T -f -p 4509" (4509 is the PID for the named process) and see where named spends its time. The top output you quoted suggests that the process is not spinning in user space. -- (800) 362-7579 ext 1 +---+ + ColocationDedicated Servers IPv4 & IPv6 Transit + +---+ Connex Internet Services, Inc. direct: (916) 265-1568 11230 Gold Express Dr #310-313fax: (916) 880-5663 Gold River, CA 95670http://connexinternet.com +-
Re: Need to improve named performance
* Ed LaFrance:

> Running BIND 9.3.6-P1-RedHat-9.3.6-16.P1.el5 on a quadcore xeon server
> (3Ghz) with 2GB RAM. Named is being used only for rDNS queries against
> our address space.

You should really upgrade to the latest version on that branch (likely bind-9.3.6-20.P1.el5_8.5).

> The bottom line is: I need to improve named performance. Tcpdump only
> shows about 20 requests per second on average, I would estimate. This
> should be handled easily, but instead it's gagging on it and the
> requests are stacking up.

Something is stalling the named process. Try to run "strace -T -f -p 4509" (4509 is the PID for the named process) and see where named spends its time. The top output you quoted suggests that the process is not spinning in user space.
Re: Need to improve named performance
On 11/10/2012 1:39 PM, Ed LaFrance wrote:

> Hello all -
>
> First post to this list, hope I'm on the right place.
>
> Running BIND 9.3.6-P1-RedHat-9.3.6-16.P1.el5 on a quadcore xeon server (3Ghz) with 2GB RAM. Named is being used only for rDNS queries against our address space.
>
> The issue is that named is not keeping up with rdns requests. The nameserver is only doing rdns, and it's the only public process on the server (no webhosting, monitoring, etc). When I check the router above this server I'll see 200 - 500 legitimate connections to this server at any given time.
>
> This is what's happening: named is not keeping up with the requests, so the network receive queue fills up - I can see this with netstat:
>
>     netstat -tulpn | grep :53
>     Proto Recv-Q Send-Q Local Address Foreign Address PID/Program name
>     ...
>     udp 110048 0 xxx.xxx.xxx.xxx:53 0.0.0.0:* 3918/named
>     udp 110048 0 xxx.xxx.xxx.xxx:53 0.0.0.0:* 3918/named
>
> (two different IPs are on this machine to handle rDNS requests)
>
> Once the queue gets near the max value set by sysctl, udp packets start to drop - this can also be seen in netstat:
>
>     netstat -su
>     ...
>     Udp:
>         5157567 packets received
>         9761 packets to unknown port received.
>         1164232 packet receive errors
>         5157554 packets sent
>
> The errors apparently correspond to drops; they only increase when the queue is full. Of course by this point dns queries are timing out.
>
> I've tried increasing the queue size with sysctl using this command:
>
>     sysctl -w net.core.rmem_max=1048576 net.core.rmem_default=10485
>
> then restarting named; that did eliminate the drops, but the queue grows gigantic and I get pretty much 100% dns lookup timeouts at that point.
>
> The server loading is about 2.0 - busy, but not overwhelmed. I can run a shell or even a gui session on it with ease so it's by no means maxed out. Here's the first slice of top output:
>
>     top - 09:13:38 up 18:40, 1 user, load average: 2.09, 2.05, 2.00
>     Tasks: 175 total, 1 running, 174 sleeping, 0 stopped, 0 zombie
>     Cpu(s): 0.2%us, 0.2%sy, 0.0%ni, 74.8%id, 24.7%wa, 0.0%hi, 0.2%si, 0.0%st
>     Mem: 2074984k total, 1743584k used, 331400k free, 166588k buffers
>     Swap: 4128760k total, 28k used, 4128732k free, 1270032k cached
>
>      PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM   TIME+ COMMAND
>     4509 named 24   0 71004 4580 2036 S  1.3  0.2 0:46.74 named
>     6877 root  15   0  2428 1064  788 R  0.7  0.1 0:00.04 top
>      467 root  10  -5     0    0    0 D  0.3  0.0 2:59.13 kjournald
>     2460 root  18   0  1816  584  484 D  0.3  0.0 3:30.35 syslogd
>        1 root  15   0  2160  644  556 S  0.0  0.0 0:01.08 init
>
> The bottom line is: I need to improve named performance. Tcpdump only shows about 20 requests per second on average, I would estimate. This should be handled easily, but instead it's gagging on it and the requests are stacking up.
>
> If you have any ideas, I welcome your input. Here's named.conf, it's pretty basic for the global config, the data for each zone is stored separately elsewhere:
>
>     options {
>         directory "/var";
>         auth-nxdomain no;
>         pid-file "/var/run/named/named.pid";
>         allow-recursion {
>             localnets;
>         };
>
>         allow-transfer {
>             "none";
>         };
>     };
>
>     key "rndc-key" {
>         algorithm hmac-md5;
>         secret "xx";
>     };
>
>     controls {
>         inet 127.0.0.1 port 953
>         allow { 127.0.0.1; } keys { "rndc-key"; };
>     };
>
>     zone "." {
>         type hint;
>         file "named.root";
>     };
>
>     zone "0.0.127.IN-ADDR.ARPA" {
>         type master;
>         file "localhost.rev";
>     };

I wouldn't expect a nameserver process on Linux, hosting only a few reverse zones and doing nothing else, to be 71 megabytes in size; I just checked one of ours, serving *all* of our internal zone data, forward and reverse authoritative, plus some cached data for a significant number of zones delegated to business partners, and it's less than 100 MB in size.

Verify from your query logs, or by dumping cache, that it's *only* doing what it is supposed to do, and no more. If you've got a bunch of data in your cache, or a bunch of queries, that's unrelated to serving your reverse DNS, then that's probably the root cause of your problem. Consider turning off recursion, or severely limiting it, in order to enforce that the nameserver is only serving its intended purpose. 2 GB of memory is a little lean for a nameserver serving a *generic* Internet-name-lookup role...

I guess another possibility is that you've gone crazy with your reverse zones (e.g. using $GENERATE willy-nilly), and thus are using up way more memory than you really need, to serve your reverse-resolution needs.

- Kevin
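For watching whether the receive queue on the DNS sockets is draining, the netstat check from the original post can be reduced to the two interesting columns. This is a sketch under assumptions: the helper name udp53_recvq is made up, and it relies on the usual "netstat -uln" column order (Proto, Recv-Q, Send-Q, Local Address, ...).

```shell
#!/bin/sh
# Hypothetical helper: read `netstat -uln` style output on stdin and print
# "local-address recv-q" for every UDP socket bound to port 53.
udp53_recvq() {
    awk '$1 ~ /^udp/ && $4 ~ /:53$/ { print $4, $2 }'
}

# Example use (repeat every few seconds and compare the numbers):
#   netstat -uln | udp53_recvq
```

A Recv-Q value that keeps growing or sits near the socket buffer limit means named is not reading requests as fast as they arrive; a value near zero means the backlog has cleared.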
Re: Need to improve named performance
Hi there,

On Sun, 11 Nov 2012, Ed LaFrance wrote:

> Running BIND 9.3.6-P1-RedHat-9.3.6-16.P1.el5 ...

Somebody already said upgrade. Generally that's the first thing to do in a case like this (before asking on mailing lists :).

> The issue is that named is not keeping up with rdns requests. The nameserver is only doing rdns, and it's the only public process on the server (no webhosting, monitoring, etc). When I check the router above this server I'll see 200 - 500 legitimate connections to this server at any given time.
> ...

I'm not convinced that BIND is the problem. What does 'top' tell you?

Are you running netfilter/iptables on the box? Might be ip_conntrack. I once had an issue with a lot of dropped TCP connections, each of which was hanging around for five days (the default). They filled the connection tracking table. The default is too long, ridiculously so. After I reduced it to something more reasonable the problem went away.

--
73,
Ged.
Re: Need to improve named performance
Hello Alan -

It's also worth noting that, since I have more IPs on the box than the ones that are designated as nameservers, and since I have dns listening on all addresses, I can query named using one of the non-nameserver IPs - and it works fine! For instance:

nslookup x.x.x.29 y.y.y.114
Server:         y.y.y.114
Address:        y.y.y.114#53

29.x.x.x.in-addr.arpa   name = foo.bar.net.

The problem is that the UDP receive queue is flooded for the two IPs that correspond to the two nameservers on this box. I.e. ns2.mydomain.net = y.y.y.115

nslookup x.x.x.29 y.y.y.115
;; connection timed out; no servers could be reached

but since y.y.y.112/29 is on this box, you can query:

nslookup x.x.x.29 y.y.y.116
Server:         y.y.y.116
Address:        y.y.y.116#53

29.x.x.x.in-addr.arpa   name = foo-bar.net.

[cololine@ns3 ~]$ nslookup x.x.x.29 y.y.y.117
Server:         y.y.y.117
Address:        y.y.y.117#53

29.x.x.x.in-addr.arpa   name = foo.bar.net.

...etc.

What I need, hope for, want, is someone to tell me how to fix up named and/or UDP on this box so it can keep up with requests that are happening on the nameserver IPs, as clearly the server can do what it needs to do if I can get past this brokenness.

Thanks,
Ed

On 11/10/2012 3:46 PM, Alan Clegg wrote:
> On Nov 10, 2012, at 1:39 PM, Ed LaFrance wrote:
>
>> When I check the router above this server I'll see 200 - 500 legitimate connections to this server at any given time.
>
> Having sent my snarky "update" e-mail, I now ask... you say later in the mail that you are doing about 20 queries per second (which I agree should be handled by any hardware with more oomph than a Z-80).
>
> I'm curious as to what these "200-500 legitimate connections" are. Are they TCP? If so, are you seeing lots of TCP connections hanging around? Do you have some firewall in the midst of this that might be messing around with TCP connections?
>
> If you do a "rndc recursing", what do you get?
>
> If you are only doing 20-30 transactions per second, the stats on the UDP counts would have taken a long time to get there... something doesn't add up.
>
> AlanC

--
(800) 362-7579 ext 1
+--------------------------------------------------------+
+ Colocation    Dedicated Servers    IPv4 & IPv6 Transit +
+--------------------------------------------------------+
Connex Internet Services, Inc.      direct: (916) 265-1568
11230 Gold Express Dr #310-313      fax: (916) 880-5663
Gold River, CA 95670                http://connexinternet.com
+--------------------------------------------------------+
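The per-address receive-queue backlog described in this message can also be read directly from /proc/net/udp on Linux, which should show the same figure netstat reports as Recv-Q. The following is a rough Python 3 sketch, not a definitive tool: it assumes a little-endian host (such as x86) for decoding the hex address, the helper names are invented for illustration, and the sample rows (including the address 192.168.0.115) are made up, not taken from the server in question.

```python
import socket
import struct


def parse_udp_line(line: str):
    """Parse one data row of /proc/net/udp into (ip, port, rx_queue_bytes)."""
    fields = line.split()
    addr_hex, port_hex = fields[1].split(":")
    # The IPv4 address is printed as a 32-bit hex value; on little-endian
    # hosts, repacking it little-endian restores dotted-quad byte order.
    ip = socket.inet_ntoa(struct.pack("<I", int(addr_hex, 16)))
    port = int(port_hex, 16)
    _tx_hex, rx_hex = fields[4].split(":")  # column is tx_queue:rx_queue
    return ip, port, int(rx_hex, 16)


def busy_sockets(text: str, port: int = 53):
    """Return (ip, rx_queue) pairs for sockets on `port` with bytes queued."""
    rows = []
    for line in text.splitlines()[1:]:  # skip the header row
        if not line.strip():
            continue
        ip, p, rxq = parse_udp_line(line)
        if p == port and rxq > 0:
            rows.append((ip, rxq))
    return rows


# Made-up sample in the /proc/net/udp layout (trailing columns omitted).
SAMPLE = """\
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 0100007F:0035 00000000:0000 07 00000000:00000000 00:00000000 00000000    25        0 1111
   1: 7300A8C0:0035 00000000:0000 07 00000000:0001B000 00:00000000 00000000    25        0 2222
"""

if __name__ == "__main__":
    for ip, rxq in busy_sockets(SAMPLE):
        print(f"{ip}:53 has {rxq} bytes waiting in its receive queue")
```

On a live system, `busy_sockets(open("/proc/net/udp").read())` would list only the addresses whose queue is non-empty, which makes it easy to confirm that just the two nameserver IPs are backing up.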
Re: Need to improve named performance
Hello Alan -

I will do an upgrade as soon as I get a chance - a bit tied up right now.

But in any case, since I posted this I've done some query logging for a bit and find that I'm getting an average of about 60 queries per second. All the dns queries are coming in via udp - the connections I mentioned are likewise udp. As I mentioned before, netstat shows the udp Recv-Q filling up on the two IPs on that server that are taking the requests.

There's a basic firewall setup on the server, only ports I need are open:

Chain INPUT (policy ACCEPT)
target     prot opt source               destination
RH-Firewall-1-INPUT  all  --  0.0.0.0/0            0.0.0.0/0

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination
RH-Firewall-1-INPUT  all  --  0.0.0.0/0            0.0.0.0/0

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

Chain RH-Firewall-1-INPUT (2 references)
target     prot opt source               destination
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0
ACCEPT     icmp --  0.0.0.0/0            0.0.0.0/0           icmp type 255
ACCEPT     esp  --  0.0.0.0/0            0.0.0.0/0
ACCEPT     ah   --  0.0.0.0/0            0.0.0.0/0
ACCEPT     udp  --  0.0.0.0/0            224.0.0.251         udp dpt:5353
ACCEPT     udp  --  0.0.0.0/0            0.0.0.0/0           udp dpt:631
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           tcp dpt:631
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0           state RELATED,ESTABLISHED
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           state NEW tcp dpt:10022
ACCEPT     udp  --  0.0.0.0/0            0.0.0.0/0           state NEW udp dpt:53
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           state NEW tcp dpt:53
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           state NEW tcp dpt:5900
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           state NEW tcp dpt:5901
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           state NEW tcp dpt:8550
REJECT     all  --  0.0.0.0/0            0.0.0.0/0           reject-with icmp-host-prohibited

As far as recursing goes:

/usr/sbin/rndc recursing
rndc: 'recursing' failed: permission denied

Any ideas are welcome

Ed

On 11/10/2012 3:46 PM, Alan Clegg wrote:
> On Nov 10, 2012, at 1:39 PM, Ed LaFrance wrote:
>
>> When I check the router above this server I'll see 200 - 500 legitimate connections to this server at any given time.
>
> Having sent my snarky "update" e-mail, I now ask... you say later in the mail that you are doing about 20 queries per second (which I agree should be handled by any hardware with more oomph than a Z-80).
>
> I'm curious as to what these "200-500 legitimate connections" are. Are they TCP? If so, are you seeing lots of TCP connections hanging around? Do you have some firewall in the midst of this that might be messing around with TCP connections?
>
> If you do a "rndc recursing", what do you get?
>
> If you are only doing 20-30 transactions per second, the stats on the UDP counts would have taken a long time to get there... something doesn't add up.
>
> AlanC

--
(800) 362-7579 ext 1
+--------------------------------------------------------+
+ Colocation    Dedicated Servers    IPv4 & IPv6 Transit +
+--------------------------------------------------------+
Connex Internet Services, Inc.      direct: (916) 265-1568
11230 Gold Express Dr #310-313      fax: (916) 880-5663
Gold River, CA 95670                http://connexinternet.com
+--------------------------------------------------------+
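A queries-per-second figure like the one quoted in this message can be estimated from a query log by counting matching lines per timestamp. The following is a minimal Python 3 sketch under stated assumptions: it assumes each log line starts with a 15-character syslog-style timestamp such as "Nov 12 09:13:38" and that query lines contain the text " query: ". Both the log layout and the sample lines (hostname, client addresses, names) are hypothetical, and should be checked against what the server actually writes.

```python
from collections import Counter


def per_second_counts(lines, marker=" query: "):
    """Count log lines containing `marker`, grouped by timestamp prefix."""
    counts = Counter()
    for line in lines:
        if marker in line:
            # Assumption: the first 15 characters are "Mon DD HH:MM:SS".
            counts[line[:15]] += 1
    return counts


def average_rate(counts):
    """Average queries per second over the seconds that logged a query."""
    return sum(counts.values()) / len(counts) if counts else 0.0


# Hypothetical sample lines; the real log format may differ.
SAMPLE_LOG = [
    "Nov 12 09:13:38 ns3 named[4509]: client 192.0.2.10#32768: query: 29.2.0.192.in-addr.arpa IN PTR",
    "Nov 12 09:13:38 ns3 named[4509]: client 192.0.2.11#32769: query: 30.2.0.192.in-addr.arpa IN PTR",
    "Nov 12 09:13:38 ns3 named[4509]: client 192.0.2.12#32770: query: 31.2.0.192.in-addr.arpa IN PTR",
    "Nov 12 09:13:39 ns3 named[4509]: client 192.0.2.10#32771: query: 32.2.0.192.in-addr.arpa IN PTR",
    "Nov 12 09:13:39 ns3 syslogd: unrelated message",
]

if __name__ == "__main__":
    counts = per_second_counts(SAMPLE_LOG)
    print(f"peak: {max(counts.values())} q/s, average: {average_rate(counts):.1f} q/s")
```

Note that the average here only covers seconds in which at least one query was logged, so it overstates the true rate if the log has idle seconds; the peak value is often the more useful number when a receive queue is overflowing.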
Re: Need to improve named performance
On Nov 10, 2012, at 1:39 PM, Ed LaFrance wrote:

> When I check the router above this server I'll see 200 - 500 legitimate
> connections to this server at any given time.

Having sent my snarky "update" e-mail, I now ask... you say later in the mail that you are doing about 20 queries per second (which I agree should be handled by any hardware with more oomph than a Z-80).

I'm curious as to what these "200-500 legitimate connections" are. Are they TCP? If so, are you seeing lots of TCP connections hanging around? Do you have some firewall in the midst of this that might be messing around with TCP connections?

If you do a "rndc recursing", what do you get?

If you are only doing 20-30 transactions per second, the stats on the UDP counts would have taken a long time to get there... something doesn't add up.

AlanC
--
Alan Clegg | +1-919-355-8851 | a...@clegg.com
Re: Need to improve named performance
On Nov 10, 2012, at 1:39 PM, Ed LaFrance wrote:

> Running BIND 9.3.6-P1-RedHat-9.3.6-16.P1.el5

Before everyone else says it... upgrade.

AlanC
--
Alan Clegg | +1-919-355-8851 | a...@clegg.com