Re: Bind named 9.7.2-P2 segfault and core dump when in debug mode

2010-09-29 Thread Mark Andrews

In message <62426.10.0.66.17.1285784847.squir...@interact.purplecow.org>, Denni
s Clarke writes:
> 
> I am trying to track down a bit of strange behavior. Not sure if anyone
> else sees this.
> 
> I tend to run named in the foreground and in debug level 2 for a while
> after I compile it. If all looks good then I can run it as a service
> daemon in the usual way.
> 
> This means I run it like so :
> 
> bash-3.00# /opt/csw/sbin/named -4 -c /etc/opt/csw/named.conf \
> > -d 2 -f -g -n 1 -p 53 -u named
> 
> Note the -d 2 there.
> 
> 29-Sep-2010 17:31:43.715 starting BIND 9.7.2-P2 -4 -c
> /etc/opt/csw/named.conf -d 2 -f -g -n 1 -p 53 -u named
> .
> .
> .
> 
> Everything seems to be fine until I saw this after 20 minutes or so :
> 
> 29-Sep-2010 17:40:35.964 error (unexpected RCODE REFUSED) resolving
> '243.136.240.111.dun.dnsrbl.net/A/IN': 66.11.124.26#53
> 29-Sep-2010 17:40:35.965 client 66.225.151.243#45979: query failed
> (SERVFAIL) for 243.136.240.111.dun.dnsrbl.net/IN/A at query.c:4650
> 
> 
>At this point it is just hung.
> 
> 
> So I started it up again in the exact same way and then went to a client
> machine and issued this query via dig :
> 
> $ /opt/csw/bin/dig +qr @ns1.blastwave.org 243.136.240.111.dun.dnsrbl.net
> 
> ; <<>> DiG 9.7.2-P2 <<>> +qr @ns1.blastwave.org
> 243.136.240.111.dun.dnsrbl.net
> ; (1 server found)
> ;; global options: +cmd
> ;; Sending:
> ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 430
> ;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
> 
> ;; QUESTION SECTION:
> ;243.136.240.111.dun.dnsrbl.net.IN  A
> 
> ;; Got answer:
> ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 430
> ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
> 
> ;; QUESTION SECTION:
> ;243.136.240.111.dun.dnsrbl.net.IN  A
> 
> ;; Query time: 235 msec
> ;; SERVER: 66.225.151.252#53(66.225.151.252)
> ;; WHEN: Wed Sep 29 17:47:25 2010
> ;; MSG SIZE  rcvd: 48
> 
> So that response looks okay but
> 
> QUERY, status: SERVFAIL, id: 430
> 
> also the named process was hung again :
> 
> 29-Sep-2010 17:47:24.850 createfetch: 243.136.240.111.dun.dnsrbl.net A
> 29-Sep-2010 17:47:24.986 error (unexpected RCODE REFUSED) resolving
> '243.136.240.111.dun.dnsrbl.net/A/IN': 66.11.124.26#53
> 29-Sep-2010 17:47:25.081 lame server resolving
> '243.136.240.111.dun.dnsrbl.net' (in 'dnsrbl.net'?): 66.11.124.30#53
> 29-Sep-2010 17:47:25.082 client 66.225.151.227#24722: query failed
> (SERVFAIL) for 243.136.240.111.dun.dnsrbl.net/IN/A at query.c:4650
> Killed
> 
> So I figured I would run this in debug level 4 :
> 
> bash-3.00# /opt/csw/sbin/named -4 -c /etc/opt/csw/named.conf -d 4 -f -g -n
> 1 -p 53 -u named
> 29-Sep-2010 17:48:56.400 starting BIND 9.7.2-P2 -4 -c
> /etc/opt/csw/named.conf -d 4 -f -g -n 1 -p 53 -u named
> .
> .
> .
> 29-Sep-2010 17:48:56.455 loading configuration from '/etc/opt/csw/named.conf'
> 29-Sep-2010 17:48:56.470 reading built-in trusted keys from file
> '/etc/opt/csw/bind.keys'
> Segmentation Fault (core dumped)
> 
> bash-3.00# file core
> core: ELF 32-bit MSB core file SPARC Version 1, from 'named'
> 
> wow .. that is really not good.
> 
> What failed ?
> 
> bash-3.00# mdb /opt/csw/sbin/sparcv8/named core
> Loading modules: [ libc.so.1 ld.so.1 ]
> > ::status
> debugging core file of named (32-bit) from callistoz
> file: /opt/csw/sbin/sparcv8/named
> initial argv:
> /opt/csw/sbin/named -4 -c /etc/opt/csw/named.conf -d 3 -f -g -n 1 -p 53 -u
> name
> threading model: multi-threaded
> status: process terminated by SIGSEGV (Segmentation Fault)
> 
> > $c
> libc.so.1`strlen+0x18(cf73c, 2000, a7cb0, fe94fc00, cf720, fe8c0d60)
> libisc.so.62`isc_log_doit+0x794(cf700, fee27ab0, c4f44, 3, 0, 0)
> libisc.so.62`isc_log_write+0x60(cf700, fee27ab0, c4f44, 3, a7cb0, a7cd8)
> set_limit+0x210(a7cb0, a7cd8, a7cd8, 9, , fffd)
> set_limits+0x64(fe94fdf8, d89e0, ff0719c0, fe94fe14, a8028, d89e0)
> load_configuration+0x7d4(ffbffe22, dc758, 1, 0, ea758, 5fc28)
> run_server+0x420(ea758, e89c8, fe935840, 0, fe7e0200, fe8c21f0)
> libisc.so.62`dispatch+0x7d8(d6758, 0, 0, 0, fe7e0200, 1)
> libisc.so.62`run+0x14(d6758, fe95, 0, 0, fedde7f0, 0)
> libc.so.1`_lwp_start(0, 0, 0, 0, 0, 0)
> 
> > ::stack
> libc.so.1`strlen+0x18(cf73c, 2000, a7cb0, fe94fc00, cf720, fe8c0d60)
> libisc.so.62`isc_log_doit+0x794(cf700, fee27ab0, c4f44, 3, 0, 0)
> libisc.so.62`isc_log_write+0x60(cf700, fee27ab0, c4f44, 3, a7cb0, a7cd8)
> set_limit+0x210(a7cb0, a7cd8, a7cd8, 9, , fffd)
> set_limits+0x64(fe94fdf8, d89e0, ff0719c0, fe94fe14, a8028, d89e0)
> load_configuration+0x7d4(ffbffe22, dc758, 1, 0, ea758, 5fc28)
> run_server+0x420(ea758, e89c8, fe935840, 0, fe7e0200, fe8c21f0)
> libisc.so.62`dispatch+0x7d8(d6758, 0, 0, 0, fe7e0200, 1)
> libisc.so.62`run+0x14(d6758, fe95, 0, 0, fedde7f0, 0)
> libc.so.1`_lwp_start(0, 0, 0, 0, 0, 0)
> 
> bash-3.00# pstack core
> core 'core' of 1647:/opt/csw/sbin/named -4 -c /etc/opt/csw/named.conf
> -d 3 -f -g -n 1 -p 5

Bind named 9.7.2-P2 segfault and core dump when in debug mode

2010-09-29 Thread Dennis Clarke

I am trying to track down a bit of strange behavior. Not sure if anyone
else sees this.

I tend to run named in the foreground and in debug level 2 for a while
after I compile it. If all looks good then I can run it as a service
daemon in the usual way.

This means I run it like so :

bash-3.00# /opt/csw/sbin/named -4 -c /etc/opt/csw/named.conf \
> -d 2 -f -g -n 1 -p 53 -u named

Note the -d 2 there.

29-Sep-2010 17:31:43.715 starting BIND 9.7.2-P2 -4 -c
/etc/opt/csw/named.conf -d 2 -f -g -n 1 -p 53 -u named
.
.
.

Everything seems to be fine until I saw this after 20 minutes or so :

29-Sep-2010 17:40:35.964 error (unexpected RCODE REFUSED) resolving
'243.136.240.111.dun.dnsrbl.net/A/IN': 66.11.124.26#53
29-Sep-2010 17:40:35.965 client 66.225.151.243#45979: query failed
(SERVFAIL) for 243.136.240.111.dun.dnsrbl.net/IN/A at query.c:4650


   At this point it is just hung.


So I started it up again in the exact same way and then went to a client
machine and issued this query via dig :

$ /opt/csw/bin/dig +qr @ns1.blastwave.org 243.136.240.111.dun.dnsrbl.net

; <<>> DiG 9.7.2-P2 <<>> +qr @ns1.blastwave.org
243.136.240.111.dun.dnsrbl.net
; (1 server found)
;; global options: +cmd
;; Sending:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 430
;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;243.136.240.111.dun.dnsrbl.net.IN  A

;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 430
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;243.136.240.111.dun.dnsrbl.net.IN  A

;; Query time: 235 msec
;; SERVER: 66.225.151.252#53(66.225.151.252)
;; WHEN: Wed Sep 29 17:47:25 2010
;; MSG SIZE  rcvd: 48

So that response looks okay but

QUERY, status: SERVFAIL, id: 430

also the named process was hung again :

29-Sep-2010 17:47:24.850 createfetch: 243.136.240.111.dun.dnsrbl.net A
29-Sep-2010 17:47:24.986 error (unexpected RCODE REFUSED) resolving
'243.136.240.111.dun.dnsrbl.net/A/IN': 66.11.124.26#53
29-Sep-2010 17:47:25.081 lame server resolving
'243.136.240.111.dun.dnsrbl.net' (in 'dnsrbl.net'?): 66.11.124.30#53
29-Sep-2010 17:47:25.082 client 66.225.151.227#24722: query failed
(SERVFAIL) for 243.136.240.111.dun.dnsrbl.net/IN/A at query.c:4650
Killed

So I figured I would run this in debug level 4 :

bash-3.00# /opt/csw/sbin/named -4 -c /etc/opt/csw/named.conf -d 4 -f -g -n
1 -p 53 -u named
29-Sep-2010 17:48:56.400 starting BIND 9.7.2-P2 -4 -c
/etc/opt/csw/named.conf -d 4 -f -g -n 1 -p 53 -u named
.
.
.
29-Sep-2010 17:48:56.455 loading configuration from '/etc/opt/csw/named.conf'
29-Sep-2010 17:48:56.470 reading built-in trusted keys from file
'/etc/opt/csw/bind.keys'
Segmentation Fault (core dumped)

bash-3.00# file core
core: ELF 32-bit MSB core file SPARC Version 1, from 'named'

wow .. that is really not good.

What failed ?

bash-3.00# mdb /opt/csw/sbin/sparcv8/named core
Loading modules: [ libc.so.1 ld.so.1 ]
> ::status
debugging core file of named (32-bit) from callistoz
file: /opt/csw/sbin/sparcv8/named
initial argv:
/opt/csw/sbin/named -4 -c /etc/opt/csw/named.conf -d 3 -f -g -n 1 -p 53 -u
name
threading model: multi-threaded
status: process terminated by SIGSEGV (Segmentation Fault)

> $c
libc.so.1`strlen+0x18(cf73c, 2000, a7cb0, fe94fc00, cf720, fe8c0d60)
libisc.so.62`isc_log_doit+0x794(cf700, fee27ab0, c4f44, 3, 0, 0)
libisc.so.62`isc_log_write+0x60(cf700, fee27ab0, c4f44, 3, a7cb0, a7cd8)
set_limit+0x210(a7cb0, a7cd8, a7cd8, 9, , fffd)
set_limits+0x64(fe94fdf8, d89e0, ff0719c0, fe94fe14, a8028, d89e0)
load_configuration+0x7d4(ffbffe22, dc758, 1, 0, ea758, 5fc28)
run_server+0x420(ea758, e89c8, fe935840, 0, fe7e0200, fe8c21f0)
libisc.so.62`dispatch+0x7d8(d6758, 0, 0, 0, fe7e0200, 1)
libisc.so.62`run+0x14(d6758, fe95, 0, 0, fedde7f0, 0)
libc.so.1`_lwp_start(0, 0, 0, 0, 0, 0)

> ::stack
libc.so.1`strlen+0x18(cf73c, 2000, a7cb0, fe94fc00, cf720, fe8c0d60)
libisc.so.62`isc_log_doit+0x794(cf700, fee27ab0, c4f44, 3, 0, 0)
libisc.so.62`isc_log_write+0x60(cf700, fee27ab0, c4f44, 3, a7cb0, a7cd8)
set_limit+0x210(a7cb0, a7cd8, a7cd8, 9, , fffd)
set_limits+0x64(fe94fdf8, d89e0, ff0719c0, fe94fe14, a8028, d89e0)
load_configuration+0x7d4(ffbffe22, dc758, 1, 0, ea758, 5fc28)
run_server+0x420(ea758, e89c8, fe935840, 0, fe7e0200, fe8c21f0)
libisc.so.62`dispatch+0x7d8(d6758, 0, 0, 0, fe7e0200, 1)
libisc.so.62`run+0x14(d6758, fe95, 0, 0, fedde7f0, 0)
libc.so.1`_lwp_start(0, 0, 0, 0, 0, 0)

bash-3.00# pstack core
core 'core' of 1647:/opt/csw/sbin/named -4 -c /etc/opt/csw/named.conf
-d 3 -f -g -n 1 -p 5
-  lwp# 1 / thread# 1  
 fe8cbc3c ___sigtimedwait (ffbffbb0, 0, 0, fefe2a00, 0, 1) + 8
 fe8b4158 __posix_sigwait (ffbffbb0, ffbffb2c, 0, 0, fefe2a00, 1) + 18
 fede4aac isc__app_ctxrun (fee27fb8, c5170, c4f34, fffe, a5574, a5728)
+ 45c
 fede4bd8 isc__app_run (c9be0, a56fc, 0, 6e616d65, 80808080, 1010101) + 28
 000458dc main