Re: How does BIND 9 scale with multithreading?

2010-09-29 Thread Fabien Seisen
2010/9/29 Eivind Olsen eiv...@aminor.no

 Does anyone know if there are any benchmarks out in the public, which
 could give some insight into how well BIND 9 scales with multithreading?
 I've tried looking on this list, and googling, but haven't found anything
 yet.

 To be a bit more specific - I'm not sure what a good option for server
 hardware would be for a recursive DNS server. On one hand, the Sun (ok,
 Oracle) Niagara/Coolthreads architecture seems to work nicely enough, but
 maybe I'd be better off with some generic Intel/AMD based solution with
 fewer threads/cores but higher GHz per thread?


I did some tests, and Niagara (T1000 / T5240) performs badly (response time
and query rate) compared to Intel/AMD.

Some numbers at 75% CPU:

Server                        Response time  Throughput
T1000, 6 cores / 24 threads   ~10 ms         600 queries/second
2-core AMD 1210, 1.8 GHz      ~0.6 ms        7000 queries/second
8-core Intel E5410, 2.33 GHz  ~0.6 ms        7 queries/second

-- 
Fabien
___
bind-users mailing list
bind-users@lists.isc.org
https://lists.isc.org/mailman/listinfo/bind-users

Re: Bind hang out when named reach to 5-600 Mb

2010-07-12 Thread Fabien Seisen
2010/7/8 khanh rua duonghoahoc_k4...@yahoo.com:
 Hi,

 I install bind as a cache server on Solaris 10, Sun Sparc T5140. It has
 problem, bind always hang out when named reach to 5-600 Mb ('prstat' check).
 I have several servers and all have this problem even when i install bind in
 zone or try with a 64bit version.  T5140's a powerful server but bind can't
 make use of its power. I'm newb with bind an so i have just try some other
 way but useless. What should i do to track this problem ?

Is this specific to the T5140? Which server type did you use before?

Some time ago, I ran some simple benchmarks (dnsperf / queryperf) on a
T1000 and a T5240, and the results were bad.

My numbers (BIND caching server):
SUN X2100 can serve 7000 queries/s with 0.6-1 ms response time
SUN T1000 can serve 600 queries/s with 10-15 ms response time (above
600 queries/s, response time jumps over 100 ms)

You should run some benchmarks (and make heavy use of rndc stats) before
choosing a new architecture.
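
One way to make use of rndc stats during such a benchmark is to take two
dumps a known interval apart and turn the cumulative request counter into an
average query rate. The following is only a minimal sketch: the counter label
is the one that appears in named.stats output, the snapshot values are made
up, and it assumes the stats file is appended to on every dump (hence "last
occurrence wins").

```python
import re

# Counter label as it appears in named.stats output.
COUNTER = "IPv4 requests received"


def read_counter(stats_text, label=COUNTER):
    """Return the last value of `label` in a named.stats dump, or None."""
    value = None
    for line in stats_text.splitlines():
        m = re.match(r"\s*(\d+)\s+(.*\S)\s*$", line)
        if m and m.group(2) == label:
            value = int(m.group(1))  # keep the most recent dump's value
    return value


def queries_per_second(before, after, interval_s):
    """Average rate between two samples of a cumulative counter."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    delta = after - before
    if delta < 0:
        # Counter went backwards: named was probably restarted in between.
        return None
    return delta / interval_s


# Hypothetical snapshots taken 60 seconds apart (values are made up).
snap1 = "++ Name Server Statistics ++\n   1000000 IPv4 requests received\n"
snap2 = "++ Name Server Statistics ++\n   1108000 IPv4 requests received\n"

rate = queries_per_second(read_counter(snap1), read_counter(snap2), 60)
print(rate)
```

Comparing this rate against the measured response time at each load level
gives the kind of queries/s versus latency figures quoted above.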

-- 
Fabien

bind 9.6.2 / solaris 10 intel / gcc 4 / compilation warning

2010-03-23 Thread Fabien Seisen
Hi,

When compiling BIND, I saw some warnings.

my build box:
- Solaris 10 U8 i386
- gcc (GCC) 4.3.4 (from BlastWave.org)

./configure --without-openssl --prefix=/opt/bind-9.6.2 --sysconfdir=/etc
 --localstatedir=/var --disable-ipv6 --enable-threads

gcc  -I/opt/compil/bind-9.6.2 -I./include -I./../pthreads/include
-I../include -I./../include -I./..  -D_REENTRANT  -D_XPG4_2
-D__EXTENSIONS__ -g -O2 -I/usr/include/libxml2
-W -Wall -Wmissing-prototypes -Wcast-qual -Wwrite-strings
-Wformat -Wpointer-arith -fno-strict-aliasing  -c net.c
net.c:109: warning: braces around scalar initializer
net.c:109: warning: (near initialization for
'once_ipv6pktinfo.__pthread_once_pad[0]')
net.c:109: warning: excess elements in scalar initializer
net.c:109: warning: (near initialization for
'once_ipv6pktinfo.__pthread_once_pad[0]')
net.c:109: warning: excess elements in scalar initializer
net.c:109: warning: (near initialization for
'once_ipv6pktinfo.__pthread_once_pad[0]')
net.c:109: warning: excess elements in scalar initializer
net.c:109: warning: (near initialization for
'once_ipv6pktinfo.__pthread_once_pad[0]')

net.c:113: warning: braces around scalar initializer
net.c:113: warning: (near initialization for 'once.__pthread_once_pad[0]')
net.c:113: warning: excess elements in scalar initializer
net.c:113: warning: (near initialization for 'once.__pthread_once_pad[0]')
net.c:113: warning: excess elements in scalar initializer
net.c:113: warning: (near initialization for 'once.__pthread_once_pad[0]')
net.c:113: warning: excess elements in scalar initializer
net.c:113: warning: (near initialization for 'once.__pthread_once_pad[0]')

net.c:370: warning: 'initialize_ipv6pktinfo' defined but not used

The same warnings appear in these files:
net.c:113
strerror.c:51
hash.c:103
lib.c:42
mem.c:114
random.c:39
result.c:110
lib.c:57
result.c:55
acl.c:481
db.c:68
dlz.c:83
lib.c:44
name.c:198
result.c:188
dst_lib.c:46
dst_result.c:58
statschannel.c:76
lwresd.c:73
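
A list like the one above can be produced from a saved build log rather than
by hand. This is a rough sketch only: it assumes the gcc 4.x message shape
shown in this mail ("file:line: warning: message"), skips the "(near
initialization ...)" continuation notes, and uses a shortened copy of the
net.c output as sample input.

```python
import re
from collections import Counter

# Matches lines such as "net.c:109: warning: braces around scalar initializer".
WARNING_RE = re.compile(r"^(?P<file>[^:\s]+):(?P<line>\d+): warning: (?P<msg>.*)$")


def summarize_warnings(log_text):
    """Count warnings per (file, line, message), ignoring continuation notes."""
    counts = Counter()
    for raw in log_text.splitlines():
        m = WARNING_RE.match(raw.strip())
        if not m:
            continue
        if m.group("msg").startswith("(near initialization"):
            continue  # explanatory note attached to the previous warning
        counts[(m.group("file"), int(m.group("line")), m.group("msg"))] += 1
    return counts


# Shortened sample taken from the compiler output quoted in this mail.
sample = """\
net.c:109: warning: braces around scalar initializer
net.c:109: warning: (near initialization for
'once_ipv6pktinfo.__pthread_once_pad[0]')
net.c:109: warning: excess elements in scalar initializer
net.c:109: warning: excess elements in scalar initializer
net.c:370: warning: 'initialize_ipv6pktinfo' defined but not used
"""

summary = summarize_warnings(sample)
for (fname, lineno, msg), n in sorted(summary.items()):
    print(f"{fname}:{lineno}: {msg} (x{n})")
```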
-- 
Fabien


Re: bind 9.6.2 with threads hangs

2010-03-22 Thread Fabien Seisen
2010/3/19 Chris Thompson c...@cam.ac.uk

 On Mar 19 2010, David Ford wrote:

  BIND has long had issues with threading since it started supporting
 threaded operation.  I recommend you simply recompile without thread
 support.

 I retry compiling with thread support about twice a year and as of late
 last year, BIND still hung soon after restart with threading enabled.


 Experiences seem to differ widely in this respect. We've been running BIND
 threaded for many years now, on Solaris platforms (currently 9.6.2 under
 Solaris 10_x86), without encountering this sort of problem.


How many queries does your named answer?


 To the OP: do you specify max_cache_size? If not, what does the memory
 consumption of BIND look like when it gets into the non-functional state?


yes, max-cache-size 512M but named process takes ~900MB


-- 
Fabien

Re: bind 9.6.2 with threads hangs

2010-03-22 Thread Fabien Seisen
2010/3/22 Cathy Almond cat...@isc.org

 Fabien Seisen wrote:

 yes, max-cache-size 512M but named process takes ~900MB

 The extra memory is for keeping track of recursive clients (i.e.
 in-progress client queries).


ok

 This doesn't sound like a hugely loaded server,


Exactly. In my own tests (with real-life queries), the server can handle
~7 queries/s with a response time of ~1ms at 70% CPU and no
packet loss.

 else it's somewhat throttled (not particularly large cache and probably
 default limit on recursive clients).  What kind of query rates do you
 have?  Do you get any logging that suggests resource problems?  If so,
 you might need to increase some of the limits.


We have a pool of several more or less identical servers with a
load-balancer in front.

On average, each server gets 1800 queries/s and 4000 at peak.

The problem occurs every few weeks and never on all servers at a time.

Recursive clients config is not modified (rndc status: recursive clients:
188/2900/3000) and we have
- on avg: 200 recursive clients
- at peak 600
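
To watch this figure over time, the "recursive clients" line of rndc status
can be parsed and compared against its limits. The sketch below is an
illustration only; it assumes the three fields are current / soft limit /
hard limit, and the helper name is made up.

```python
import re


def parse_recursive_clients(status_text):
    """Extract the recursive-clients triple from `rndc status` output.

    Assumption: the fields mean current / soft limit / hard limit.
    Returns None when the line is not present.
    """
    m = re.search(r"recursive clients:\s*(\d+)/(\d+)/(\d+)", status_text)
    if not m:
        return None
    current, soft, hard = (int(g) for g in m.groups())
    return {
        "current": current,
        "soft": soft,
        "hard": hard,
        # Fraction of the soft limit in use; None if the limit is zero.
        "soft_used": current / soft if soft else None,
    }


# The line reported in this mail.
info = parse_recursive_clients("recursive clients: 188/2900/3000")
print(info)
```

With the numbers above, even the observed peak of 600 recursive clients stays
well below the soft limit of 2900.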

 It's intriguing that you're seeing the same issues on two bind versions
 and two OS (and that other people's experience is different from yours)


only Solaris 10
- Solaris 10 U6 with bind 9.5.1-P3 with threads compiled with SUNSpro 12
- Solaris 10 U6 with bind 9.6.2  with threads compiled with gcc


 - it suggests to me that it's specific to your configuration or client
 base/queries or your environment.


We get real-life queries from customers (evil?).

A simple rndc flush revives named.

Perhaps a badly formatted packet freezes named or creates a cache deadlock.

Can something go wrong in the cache?

I am not fluent with core files, but I have got one in my pocket.

 For troubleshooting I'd start by looking at the logging output - if
 you've got any categories going to null, un-suppress them temporarily;
 and add query-errors (see 9.6.2 ARM).  Then perhaps do some sampling of
 network traffic (perhaps there's a UDP message size/fragmentation issue)
  to see what's happening (or not).


All categories go to non-null channels, and we do not use any 9.6.2-specific
configuration. I did not notice any weird log messages (besides the regular
"shutting down due to TCP receive error: 202.96.209.6#53: connection reset").

here is our log config:
category client { client.log; };
category config { config.log; default_syslog; };
category database { database.log; default_syslog; };
category default { default.log; default_syslog; };
category delegation-only { delegation-only.log; };
category dispatch { dispatch.log; };
category general { default.log; };
category lame-servers { lamers.log; };
category network { network.log; };
category notify { notify.log; default_syslog; };
category queries { queries.log; };
category resolver { resolver.log; };
category security { security; };
category unmatched { unmatched.log; };
category update { update.log; };
category xfer-in { xfer-in.log; default_syslog; };
category xfer-out { xfer-out.log; default_syslog; };

-- 
Fabien

bind 9.6.2 with threads hangs

2010-03-19 Thread Fabien Seisen
Hi,

We have several recursive caching BIND servers and are experiencing weird
things when named is compiled with threads:

In 4 steps:

1) everything goes ok

2) for ~1h named begins to answer more slowly (0.5ms to 100ms), with these symptoms:
  - load increases on the server (from 0.3 to 4)
  - number of recursive queries increases (+500%)
  - number of recursive slots in use increases (from 200 to 600)
  - cache hit rate decreases (from 9X% to
  - number of cache entries drops from 2M to 0

3) named answers no queries
  - no recursive queries
  - 0 entries in cache
  - rndc stats/status works

4) We flush the named cache (rndc flush) and everything goes ok

We run rndc stats every minute to collect statistics.

Hardware:
 - intel or amd with a total of 4 or 8 cores
 - solaris 10
 - bind 9.6.2 with threads (gcc) or bind 9.5.1-P3 with threads (SUNWspro)

Any clue?

Some numbers from named.stats:

++ Name Server Statistics ++
   437118882 IPv4 requests received
++ Zone Maintenance Statistics ++
++ Resolver Statistics ++
   120096973 IPv4 queries sent
    29784114 queries with RTT < 10ms
    49289542 queries with RTT 10-100ms
    33448291 queries with RTT 100-500ms
      277957 queries with RTT 500-800ms
      105059 queries with RTT 800-1600ms
       31079 queries with RTT > 1600ms

[View: _bind]
++ Socket I/O Statistics ++
   120075062 UDP/IPv4 sockets opened
       35059 TCP/IPv4 sockets opened
   120074870 UDP/IPv4 sockets closed
       42651 TCP/IPv4 sockets closed
       13116 UDP/IPv4 socket bind failures
        5513 TCP/IPv4 socket connect failures
   120061921 UDP/IPv4 connections established
        6901 TCP/IPv4 connections established
        7599 TCP/IPv4 connections accepted
      276089 UDP/IPv4 recv errors
         315 TCP/IPv4 recv errors
++ Cache DB RRsets ++
[View: mire]
[View: abonnes]
      885677 A
      751488 NS
      171869 CNAME
      144655 PTR
      312051 MX
       41667 RRSIG
       38816 NSEC
      130572 NXDOMAIN
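
The resolver RTT counters in this dump are easier to read as shares of the
total. The sketch below copies the six counters from the "Resolver
Statistics" section; the first and last buckets are assumed to be "< 10ms"
and "> 1600ms", and the helper name is only illustrative.

```python
# Counters copied from the "Resolver Statistics" section of this mail.
# Assumption: the first and last buckets are "< 10ms" and "> 1600ms".
rtt_buckets = [
    ("< 10ms", 29784114),
    ("10-100ms", 49289542),
    ("100-500ms", 33448291),
    ("500-800ms", 277957),
    ("800-1600ms", 105059),
    ("> 1600ms", 31079),
]


def rtt_shares(buckets):
    """Return (label, fraction of all bucketed queries) for each RTT bucket."""
    total = sum(count for _, count in buckets)
    if total == 0:
        return []
    return [(label, count / total) for label, count in buckets]


shares = rtt_shares(rtt_buckets)
for label, share in shares:
    print(f"{label:>11}  {share:7.2%}")
```

Running the same calculation on dumps taken before and during an incident
would show whether upstream round-trip times shift when named slows down.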

-- 
Fabien