Re: [nsd-users] NSD 4.2.0 intermittent segfaults @ libssl ?

2019-06-12 Thread A. Schulze



Am 12.06.19 um 18:46 schrieb Stuart Henderson:
> On 2019/06/12 18:38, A. Schulze wrote:
>>
>>
>> Am 12.06.19 um 08:16 schrieb Wouter Wijngaards:
>>> Otherwise, perhaps a stack trace of the issue could be useful
>>> to have, or a way to reproduce it.
>>
>> I'm unable to produce core files.
>>
>> - ulimit -c unlimited is set
>> - cat /proc/sys/kernel/core_pattern: /core/corefile
>>
>> /core exist outside and inside chroot and is writeable by nsd user
>> nsd started chroot and without chroot
>> nsd crash but: no corefile :-/
>>
>> Any more magic required?
> 
> Try running nsd in the foreground directly under gdb?

OK here some result:

I use to start nsd as "nsd -d -c /path/to/nsd.conf

this start 3 processes
33786 ?Ss 0:00  \_ sshd: root@pts/1
33798 pts/1Ss 0:00  |   \_ -bash
30910 pts/1S+ 0:00  |   \_ /usr/sbin/nsd -c /chroot/etc/nsd/nsd.conf
30911 pts/1S+ 0:00  |   \_ /usr/sbin/nsd -c 
/chroot/etc/nsd/nsd.conf
32456 pts/1S+ 0:00  |   \_ /usr/sbin/nsd -c 
/chroot/etc/nsd/nsd.conf

# gdb /usr/sbin/nsd
GNU gdb (Debian 7.12-6) 7.12.0.20161007-git
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
.
Find the GDB manual and other documentation resources online at:
.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/sbin/nsd...(no debugging symbols found)...done.
(gdb) attach 32456
Attaching to program: /usr/sbin/nsd, process 32456
Reading symbols from /lib/x86_64-linux-gnu/libssl.so.1.1...(no debugging 
symbols found)...done.
Reading symbols from /lib/x86_64-linux-gnu/libprotobuf-c.so.1...(no debugging 
symbols found)...done.
Reading symbols from /lib/x86_64-linux-gnu/libfstrm.so.0...(no debugging 
symbols found)...done.
Reading symbols from /lib/x86_64-linux-gnu/libcrypto.so.1.1...(no debugging 
symbols found)...done.
Reading symbols from /lib/x86_64-linux-gnu/libev.so.4...(no debugging symbols 
found)...done.
Reading symbols from /lib/x86_64-linux-gnu/libc.so.6...(no debugging symbols 
found)...done.
Reading symbols from /lib/x86_64-linux-gnu/libdl.so.2...(no debugging symbols 
found)...done.
Reading symbols from /lib/x86_64-linux-gnu/libpthread.so.0...(no debugging 
symbols found)...done.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Reading symbols from /lib/x86_64-linux-gnu/libm.so.6...(no debugging symbols 
found)...done.
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols 
found)...done.
Reading symbols from /lib/x86_64-linux-gnu/libnss_compat.so.2...(no debugging 
symbols found)...done.
Reading symbols from /lib/x86_64-linux-gnu/libnsl.so.1...(no debugging symbols 
found)...done.
Reading symbols from /lib/x86_64-linux-gnu/libnss_nis.so.2...(no debugging 
symbols found)...done.
Reading symbols from /lib/x86_64-linux-gnu/libnss_files.so.2...(no debugging 
symbols found)...done.
0x7ffbed5602e3 in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) continue
Continuing.

{ now I start "nsd_control force_transfer" @ the SLAVE-Server ... 

Program received signal SIGSEGV, Segmentation fault.
0x7ffbee331785 in SSL_shutdown () from /lib/x86_64-linux-gnu/libssl.so.1.1
(gdb) bt
#0  0x7ffbee331785 in SSL_shutdown () from 
/lib/x86_64-linux-gnu/libssl.so.1.1
#1  0x55d50cb2ce0e in ?? ()
#2  0x55d50cb2dfd8 in ?? ()
#3  0x7ffbed81a413 in ev_invoke_pending () from 
/lib/x86_64-linux-gnu/libev.so.4
#4  0x7ffbed81da06 in ev_run () from /lib/x86_64-linux-gnu/libev.so.4
#5  0x7ffbed81ffd9 in event_base_loop () from 
/lib/x86_64-linux-gnu/libev.so.4
#6  0x55d50cb30bdd in ?? ()
#7  0x55d50cb30fcb in ?? ()
#8  0x55d50cb3244c in ?? ()
#9  0x55d50cae3ff7 in ?? ()
#10 0x7ffbed4972e1 in __libc_start_main () from 
/lib/x86_64-linux-gnu/libc.so.6
#11 0x55d50cae45fa in ?? ()
(gdb) quit
A debugging session is active.

Inferior 1 [process 32456] will be detached.

Quit anyway? (y or n) y
Detaching from program: /usr/sbin/nsd, process 32456

Hope, this helps ...

Andreas
___
nsd-users mailing list
nsd-users@NLnetLabs.nl
https://open.nlnetlabs.nl/mailman/listinfo/nsd-users


Re: [nsd-users] NSD 4.2.0 intermittent segfaults @ libssl ?

2019-06-12 Thread Stuart Henderson
On 2019/06/12 18:38, A. Schulze wrote:
> 
> 
> Am 12.06.19 um 08:16 schrieb Wouter Wijngaards:
> > Otherwise, perhaps a stack trace of the issue could be useful
> > to have, or a way to reproduce it.
> 
> I'm unable to produce core files.
> 
> - ulimit -c unlimited is set
> - cat /proc/sys/kernel/core_pattern: /core/corefile
> 
> /core exist outside and inside chroot and is writeable by nsd user
> nsd started chroot and without chroot
> nsd crash but: no corefile :-/
> 
> Any more magic required?

Try running nsd in the foreground directly under gdb?
___
nsd-users mailing list
nsd-users@NLnetLabs.nl
https://open.nlnetlabs.nl/mailman/listinfo/nsd-users


Re: [nsd-users] NSD 4.2.0 intermittent segfaults @ libssl ?

2019-06-12 Thread A. Schulze



Am 12.06.19 um 08:16 schrieb Wouter Wijngaards:
> Otherwise, perhaps a stack trace of the issue could be useful
> to have, or a way to reproduce it.

I'm unable to produce core files.

- ulimit -c unlimited is set
- cat /proc/sys/kernel/core_pattern: /core/corefile

/core exist outside and inside chroot and is writeable by nsd user
nsd started chroot and without chroot
nsd crash but: no corefile :-/

Any more magic required?

Andreas
___
nsd-users mailing list
nsd-users@NLnetLabs.nl
https://open.nlnetlabs.nl/mailman/listinfo/nsd-users


Re: [nsd-users] NSD 4.2.0 intermittent segfaults @ libssl ?

2019-06-12 Thread A. Schulze


Am 12.06.19 um 08:16 schrieb Wouter Wijngaards:
> Hi,
> 
> On 11/06/2019 22:22, PGNet Dev wrote:
>> On 6/11/19 11:59 AM, A. Schulze wrote:
>>> Am 11.06.19 um 19:54 schrieb A. Schulze:
   - I don't use TLS for AXFR
>>> if that matters: I *do* use TSIG-Keys to secure the AXFR
>>
>> Here, all the slave AXFRs are configured NOKEY.
> 
> So it is not SSL.  Then it could be TCP Fast Open (server-side fast
> open, from the logs I see this is the AXFR server).  Or perhaps the file
> descriptor events could be mangled, in the event handling.  This patch
> may help?  Otherwise, perhaps a stack trace of the issue could be useful
> to have, or a way to reproduce it.
> 
> https://github.com/NLnetLabs/nsd/commit/75c7e1a2fab265e504e8aa2f93dff195dc5b8394

I applied similiar changes to 4.2.0 but this make things worse. 
https://andreasschulze.de/tmp/axfr_segfault_fix.patch

Jun 12 17:48:57 dnsserver02 kernel: [2448848.071188] show_signal_msg: 7871 
callbacks suppressed
Jun 12 17:48:57 dnsserver02 kernel: [2448848.071191] nsd[59588]: segfault at 4 
ip 7f09e0d92ba3 sp 7ffe460a0978 error 6 in 
libev.so.4.0.0[7f09e0d89000+d000]
Jun 12 17:48:57 dnsserver02 kernel: [2448848.071684] nsd[59589]: segfault at 4 
ip 7f09e0d92ba3 sp 7ffe460a0978 error 6 in 
libev.so.4.0.0[7f09e0d89000+d000]
Jun 12 17:48:57 dnsserver02 kernel: [2448848.072179] nsd[59590]: segfault at 4 
ip 7f09e0d92ba3 sp 7ffe460a0978 error 6 in 
libev.so.4.0.0[7f09e0d89000+d000]
Jun 12 17:48:57 dnsserver02 kernel: [2448848.072672] nsd[59591]: segfault at 4 
ip 7f09e0d92ba3 sp 7ffe460a0978 error 6 in 
libev.so.4.0.0[7f09e0d89000+d000]
Jun 12 17:48:57 dnsserver02 kernel: [2448848.073452] nsd[59592]: segfault at 4 
ip 7f09e0d92ba3 sp 7ffe460a0978 error 6 in 
libev.so.4.0.0[7f09e0d89000+d000]
Jun 12 17:48:57 dnsserver02 kernel: [2448848.073972] nsd[59593]: segfault at 4 
ip 7f09e0d92ba3 sp 7ffe460a0978 error 6 in 
libev.so.4.0.0[7f09e0d89000+d000]
Jun 12 17:48:57 dnsserver02 kernel: [2448848.075735] nsd[59594]: segfault at 4 
ip 7f09e0d92ba3 sp 7ffe460a0978 error 6 in 
libev.so.4.0.0[7f09e0d89000+d000]
Jun 12 17:48:57 dnsserver02 kernel: [2448848.076253] nsd[59595]: segfault at 4 
ip 7f09e0d92ba3 sp 7ffe460a0978 error 6 in 
libev.so.4.0.0[7f09e0d89000+d000]
Jun 12 17:48:57 dnsserver02 kernel: [2448848.077077] nsd[59596]: segfault at 4 
ip 7f09e0d92ba3 sp 7ffe460a0978 error 6 in 
libev.so.4.0.0[7f09e0d89000+d000]
Jun 12 17:48:57 dnsserver02 kernel: [2448848.077576] nsd[59597]: segfault at 4 
ip 7f09e0d92ba3 sp 7ffe460a0978 error 6 in 
libev.so.4.0.0[7f09e0d89000+d000]


Andreas
___
nsd-users mailing list
nsd-users@NLnetLabs.nl
https://open.nlnetlabs.nl/mailman/listinfo/nsd-users


Re: [nsd-users] NSD 4.2.0 intermittent segfaults @ libssl ?

2019-06-11 Thread Wouter Wijngaards
Hi,

On 11/06/2019 22:22, PGNet Dev wrote:
> On 6/11/19 11:59 AM, A. Schulze wrote:
>> Am 11.06.19 um 19:54 schrieb A. Schulze:
>>>   - I don't use TLS for AXFR
>> if that matters: I *do* use TSIG-Keys to secure the AXFR
>
> Here, all the slave AXFRs are configured NOKEY.

So it is not SSL.  Then it could be TCP Fast Open (server-side fast
open, from the logs I see this is the AXFR server).  Or perhaps the file
descriptor events could be mangled, in the event handling.  This patch
may help?  Otherwise, perhaps a stack trace of the issue could be useful
to have, or a way to reproduce it.

https://github.com/NLnetLabs/nsd/commit/75c7e1a2fab265e504e8aa2f93dff195dc5b8394

Best regards, Wouter


> ___
> nsd-users mailing list
> nsd-users@NLnetLabs.nl
> https://open.nlnetlabs.nl/mailman/listinfo/nsd-users
___
nsd-users mailing list
nsd-users@NLnetLabs.nl
https://open.nlnetlabs.nl/mailman/listinfo/nsd-users


Re: [nsd-users] NSD 4.2.0 intermittent segfaults @ libssl ?

2019-06-11 Thread PGNet Dev

On 6/11/19 11:59 AM, A. Schulze wrote:

Am 11.06.19 um 19:54 schrieb A. Schulze:

  - I don't use TLS for AXFR

if that matters: I *do* use TSIG-Keys to secure the AXFR


Here, all the slave AXFRs are configured NOKEY.
___
nsd-users mailing list
nsd-users@NLnetLabs.nl
https://open.nlnetlabs.nl/mailman/listinfo/nsd-users


Re: [nsd-users] NSD 4.2.0 intermittent segfaults @ libssl ?

2019-06-11 Thread A. Schulze



Am 11.06.19 um 19:54 schrieb A. Schulze:
>  - I don't use TLS for AXFR
if that matters: I *do* use TSIG-Keys to secure the AXFR

Andreas
___
nsd-users mailing list
nsd-users@NLnetLabs.nl
https://open.nlnetlabs.nl/mailman/listinfo/nsd-users


Re: [nsd-users] NSD 4.2.0 intermittent segfaults @ libssl ?

2019-06-11 Thread A. Schulze


Am 11.06.19 um 18:02 schrieb PGNet Dev:
> I just bumped NSD
> 
>   nsd -v
>   NSD version 4.2.0
> 
> on a linux64 VM.
> 
> On axfrs, I'm seeing segfaults in libssl
> 
>   Jun 11 08:53:24 ns03 nsd[12296]: axfr for example1.com. from 
> 109.74.194.10
>   Jun 11 08:53:24 ns03 kernel: [35762.840704] nsd[12296]: segfault at 
> 560244acb618 ip 7fefedb81406 sp 7ffe6c552ac0 error 7 in 
> libssl.so.1.1[7fefedb43000+86000]
>   Jun 11 08:53:24 ns03 nsd[12246]: server 12296 died unexpectedly, 
> restarting
>   Jun 11 08:53:24 ns03 nsd[12225]: [2019-06-11 08:53:24.960] nsd[12246]: 
> warning: server 12296 died unexpectedly, restarting
>   Jun 11 08:53:25 ns03 nsd[12246]: process 12296 terminated with status 
> 139
>   Jun 11 08:53:25 ns03 nsd[12225]: [2019-06-11 08:53:25.030] nsd[12246]: 
> warning: process 12296 terminated with status 139
>   Jun 11 08:53:25 ns03 nsd[12297]: axfr for example2.com. from 
> 207.192.70.10
>   Jun 11 08:53:25 ns03 nsd[12225]: [2019-06-11 08:53:25.557] nsd[12297]: 
> info: axfr for example2.com. from 207.192.70.10
>   Jun 11 08:53:25 ns03 nsd[12225]: [2019-06-11 08:53:25.557] nsd[12297]: 
> info: axfr for example3.com. from 207.192.70.10
>   Jun 11 08:53:25 ns03 nsd[12297]: axfr for example3.com. from 
> 207.192.70.10
>   Jun 11 08:53:25 ns03 nsd[12297]: axfr for example4.com. from 
> 207.192.70.10
>   Jun 11 08:53:25 ns03 nsd[12225]: [2019-06-11 08:53:25.565] nsd[12297]: 
> info: axfr for example4.com. from 207.192.70.10
>   Jun 11 08:53:25 ns03 kernel: [35763.583172] nsd[12297]: segfault at 
> 560244acb618 ip 7fefedb81406 sp 7ffe6c552ac0 error 7 in 
> libssl.so.1.1[7fefedb43000+86000]
> 
> Note, *NOT* on every axfr; some seem to work.
> 
> Just starting to troubleshoot here ...
> 
> Any obvious issues that are already known that might cause this?

Hello "PGNet Dev"

now, as you ask:

I see similar messages before I updated from 4.1.27 to 4.2.0
And now, as you mentioned that issue, I also found the segfault message :-)

From what I see something bad must happen *after* AXFR is completed.

 - I see no warning/error on the consumer side
 - I don't use TLS for AXFR
 - it happened also on 4.1.27
 - I can't reproduce in an lab environment
 - none of my users asked me that they miss something

BTW: There is a draft ¹) "Message Digest for DNS Zones" to prove a transferred 
zone was received complete

I've added ²) the ldns-zone-digest tool in my ldns instance and can create and 
verify zone files.
Unfortunately not in this particilar installation :-/
Would be helpful if nsd could check such ZONEMD if available

Andreas


¹) https://tools.ietf.org/html/draft-wessels-dns-zone-digest-06
²) https://open.nlnetlabs.nl/pipermail/ldns-users/2018-November/000934.html
___
nsd-users mailing list
nsd-users@NLnetLabs.nl
https://open.nlnetlabs.nl/mailman/listinfo/nsd-users


Re: [nsd-users] NSD 4.2.0 intermittent segfaults @ libssl ?

2019-06-11 Thread PGNet Dev

On 6/11/19 9:02 AM, PGNet Dev wrote:

I just bumped NSD

nsd -v
NSD version 4.2.0

on a linux64 VM.

On axfrs, I'm seeing segfaults in libssl



fwiw, dropback from NSD_4_2_0_REL -> clean build of NSD_4_1_27_REL, with 
no other config changes, cures the issue ... no segfaults on axfr.



___
nsd-users mailing list
nsd-users@NLnetLabs.nl
https://open.nlnetlabs.nl/mailman/listinfo/nsd-users