Re: new BIND in 10.0_RC5/sparc dies w/Bus error
On 2024-03-05 1:13 am, matthew green wrote:

> ah. the problem is that struct isc_nmhandle grew a pointer member,
> adding 4 bytes to the struct size, and it uses C99 [] variable array
> for the final member, which is later assigned to other pointers, and
> this memory was now only 4-byte aligned. this hack patch works to
> stop named crashing for me, but i'll let christos figure out what the
> right general solution here is.
>
> .mrg.
>
> Index: lib/isc/netmgr/netmgr-int.h
> ===
> RCS file: /cvsroot/src/external/mpl/bind/dist/lib/isc/netmgr/netmgr-int.h,v
> retrieving revision 1.8.2.1
> diff -p -u -r1.8.2.1 netmgr-int.h
> --- lib/isc/netmgr/netmgr-int.h 25 Feb 2024 15:47:24 - 1.8.2.1
> +++ lib/isc/netmgr/netmgr-int.h 5 Mar 2024 06:12:50 -
> @@ -276,7 +276,7 @@ struct isc_nmhandle {
>  	LINK(isc_nmhandle_t) active_link;
>  #endif
>  	void *opaque;
> -	char extra[];
> +	char extra[] __attribute__((__aligned__(8)));
>  };
>
>  typedef enum isc__netievent_type {

Perhaps:

	union {
		void *p;
		long double d;
		long long lld;
		intmax_t im;
	} extra[];

Or simpler:

	struct {
		void *p;
	} extra[];

Does the second form work?

christos
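To compare the two suggested forms without a sparc at hand, here is a minimal sketch. The struct names and the `a`, `b`, `opaque` fields are hypothetical stand-ins, not the real isc_nmhandle layout, and `uint32_t` is used in place of a 4-byte sparc pointer so the numbers come out the same on a 64-bit build host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Element type from the first suggestion: the union takes the
 * strictest alignment of any of its members. */
union max_align_elem {
	void *p;
	long double d;
	long long lld;
	intmax_t im;
};

/* Hypothetical header: three 4-byte fields, then the flexible array. */
struct handle_union {
	uint32_t a, b, opaque;
	union max_align_elem extra[];
};

/* Second suggestion, with uint32_t standing in for a 4-byte pointer. */
struct handle_ptr {
	uint32_t a, b, opaque;
	struct { uint32_t p; } extra[];
};

/* Offset at which the trailing storage begins in each layout. */
static inline size_t
union_extra_offset(void)
{
	return offsetof(struct handle_union, extra);
}

static inline size_t
ptr_extra_offset(void)
{
	return offsetof(struct handle_ptr, extra);
}
```

In this sketch the union form pads `extra` out to the alignment of its strictest member, while the pointer-only form leaves it at offset 12. If pointers really are 4 bytes with 4-byte alignment on the target, as on 32-bit sparc, the second form would probably not be enough on its own; that is an inference from the layout above, not something tested on the real struct.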
Re: new BIND in 10.0_RC5/sparc dies w/Bus error
On 2024-03-05 1:13 am, matthew green wrote:

> ah. the problem is that struct isc_nmhandle grew a pointer member,
> adding 4 bytes to the struct size, and it uses C99 [] variable array
> for the final member, which is later assigned to other pointers, and
> this memory was now only 4-byte aligned. this hack patch works to
> stop named crashing for me, but i'll let christos figure out what the
> right general solution here is.
>
> .mrg.
>
> Index: lib/isc/netmgr/netmgr-int.h
> ===
> RCS file: /cvsroot/src/external/mpl/bind/dist/lib/isc/netmgr/netmgr-int.h,v
> retrieving revision 1.8.2.1
> diff -p -u -r1.8.2.1 netmgr-int.h
> --- lib/isc/netmgr/netmgr-int.h 25 Feb 2024 15:47:24 - 1.8.2.1
> +++ lib/isc/netmgr/netmgr-int.h 5 Mar 2024 06:12:50 -
> @@ -276,7 +276,7 @@ struct isc_nmhandle {
>  	LINK(isc_nmhandle_t) active_link;
>  #endif
>  	void *opaque;
> -	char extra[];
> +	char extra[] __attribute__((__aligned__(8)));
>  };
>
>  typedef enum isc__netievent_type {

Does the following work, and is it more palatable?

	union {
		void *p;
		long double d;
		long long lld;
		intmax_t im;
	} extra[];

or just:

--
christos
re: new BIND in 10.0_RC5/sparc dies w/Bus error
On Tue, 5 Mar 2024, John D. Baker wrote:

> Thanks for the rapid analysis and workaround. I've applied it to my
> netbsd-10 tree, rebuilt sparc and am updating now.

It seems to be working now. Thanks again.

--
|/"\ John D. Baker, KN5UKS               NetBSD     Darwin/MacOS X
|\ / jdbaker[snail]consolidated[flyspeck]net  OpenBSD  FreeBSD
| X  No HTML/proprietary data in email.   BSD just sits there and works!
|/ \ GPGkeyID: D703 4A7E 479F 63F8 D3F4 BD99 9572 8F23 E4AD 1645
re: new BIND in 10.0_RC5/sparc dies w/Bus error
On Tue, 5 Mar 2024, matthew green wrote:

> ah. the problem is that struct isc_nmhandle grew a pointer member,
> adding 4 bytes to the struct size, and it uses C99 [] variable array
> for the final member, which is later assigned to other pointers, and
> this memory was now only 4-byte aligned. this hack patch works to
> stop named crashing for me, but i'll let christos figure out what the
> right general solution here is.
>
> .mrg.
>
> Index: lib/isc/netmgr/netmgr-int.h
> [diff]

Thanks for the rapid analysis and workaround. I've applied it to my
netbsd-10 tree, rebuilt sparc and am updating now.

In the interim, pointing the mailserver's 'resolv.conf' at my backup
nameserver instead of itself and restarting sendmail has allowed mail
to start working again. The backup nameserver is amd64, so it should
not have a problem with the changes in the new BIND when I get a
chance to update it to 10.0_RC5.

(I had long before set "kern.defcorename=/var/tmp/cores/%n.core",
but there was nothing there. I forget whether the subdirectory "cores"
is created automatically or not, but it still isn't present on the
system.)

--
|/"\ John D. Baker, KN5UKS               NetBSD     Darwin/MacOS X
|\ / jdbaker[snail]consolidated[flyspeck]net  OpenBSD  FreeBSD
| X  No HTML/proprietary data in email.   BSD just sits there and works!
|/ \ GPGkeyID: D703 4A7E 479F 63F8 D3F4 BD99 9572 8F23 E4AD 1645
re: new BIND in 10.0_RC5/sparc dies w/Bus error
ah. the problem is that struct isc_nmhandle grew a pointer member,
adding 4 bytes to the struct size, and it uses C99 [] variable array
for the final member, which is later assigned to other pointers, and
this memory was now only 4-byte aligned. this hack patch works to
stop named crashing for me, but i'll let christos figure out what the
right general solution here is.

.mrg.

Index: lib/isc/netmgr/netmgr-int.h
===
RCS file: /cvsroot/src/external/mpl/bind/dist/lib/isc/netmgr/netmgr-int.h,v
retrieving revision 1.8.2.1
diff -p -u -r1.8.2.1 netmgr-int.h
--- lib/isc/netmgr/netmgr-int.h 25 Feb 2024 15:47:24 - 1.8.2.1
+++ lib/isc/netmgr/netmgr-int.h 5 Mar 2024 06:12:50 -
@@ -276,7 +276,7 @@ struct isc_nmhandle {
 	LINK(isc_nmhandle_t) active_link;
 #endif
 	void *opaque;
-	char extra[];
+	char extra[] __attribute__((__aligned__(8)));
 };

 typedef enum isc__netievent_type {
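The layout effect described above can be reproduced on any host with a small sketch. The structs below are hypothetical stand-ins rather than the real isc_nmhandle: three `uint32_t` fields play the role of 4-byte sparc pointers, so that the header is 12 bytes long regardless of the build machine. The `__aligned__` attribute is the GCC/Clang extension used in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Before the patch: the flexible array starts right after the header,
 * at whatever offset the preceding 4-byte members leave it. */
struct handle_plain {
	uint32_t a, b, opaque;	/* hypothetical 4-byte fields */
	char extra[];
};

/* After the patch: the attribute forces the array (and therefore the
 * whole struct) onto an 8-byte boundary, inserting padding if needed. */
struct handle_aligned {
	uint32_t a, b, opaque;	/* hypothetical 4-byte fields */
	char extra[] __attribute__((__aligned__(8)));
};

/* Offset of the trailing storage in each layout. */
static inline size_t
plain_extra_offset(void)
{
	return offsetof(struct handle_plain, extra);
}

static inline size_t
aligned_extra_offset(void)
{
	return offsetof(struct handle_aligned, extra);
}
```

With a 12-byte header the plain form places `extra` at offset 12, which is only 4-byte aligned, while the attributed form moves it to offset 16. The attribute only fixes the offset within the struct; the allocation itself still has to start on an 8-byte boundary, which malloc-style allocators normally guarantee.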
re: new BIND in 10.0_RC5/sparc dies w/Bus error
this appears to be a badly aligned structure issue. i can reproduce
it by doing "anita interact" with any recent sparc .iso, editing the
named.conf to start, starting named, and doing 'dig ns netbsd.org'
would trigger the crash. the stack trace is:

(gdb) bt
#0  ns__client_request (handle=0xeb02d008, eresult=ISC_R_SUCCESS, region=, arg=)
    at /usr/10/src/external/mpl/bind/lib/libns/../../dist/lib/ns/client.c:1825
#1  0xedb0dc80 in isc__nm_async_readcb (worker=0x0, ev0=0xeccf7ad4)
    at /usr/10/src/external/mpl/bind/lib/libisc/../../dist/lib/isc/netmgr/netmgr.c:2914
#2  0xedb0dde0 in isc__nm_readcb (sock=0xecfe8808, uvreq=0xeb0b6008, eresult=ISC_R_SUCCESS)
    at /usr/10/src/external/mpl/bind/lib/libisc/../../dist/lib/isc/netmgr/netmgr.c:2887
#3  0xedb1183c in udp_recv_cb (handle=, nrecv=53, buf=0xeccf7c54, addr=, flags=0)
    at /usr/10/src/external/mpl/bind/lib/libisc/../../dist/lib/isc/netmgr/udp.c:653
#4  0xedb3aec8 in uv__udp_recvmsg (handle=0xecfe89f8)
    at /usr/10/src/external/mit/libuv/lib/../dist/src/unix/udp.c:303
#5  uv__udp_io (loop=, w=0xecfe8a38, revents=1)
    at /usr/10/src/external/mit/libuv/lib/../dist/src/unix/udp.c:178
#6  0xedb3a034 in uv__io_poll (loop=0xecf62810, timeout=)
    at /usr/10/src/external/mit/libuv/lib/../dist/src/unix/kqueue.c:390
#7  0xedb431a0 in uv_run (loop=0xecf62810, mode=UV_RUN_DEFAULT)
    at /usr/10/src/external/mit/libuv/lib/../dist/src/unix/core.c:406
#8  0xedb106ec in nm_thread (worker0=0xecf62808)
    at /usr/10/src/external/mpl/bind/lib/libisc/../../dist/lib/isc/netmgr/netmgr.c:704
#9  0xedb20f44 in isc__trampoline_run (arg=0xecf36be0)
    at /usr/10/src/external/mpl/bind/lib/libisc/../../dist/lib/isc/trampoline.c:192
#10 0xed9ecda8 in pthread__create_tramp (cookie=0xecf7b000)
    at /usr/10/src/lib/libpthread/pthread.c:595

and the problem is that in ns__client_request(), we end up with:

(gdb) p client
$17 = (ns_client_t *) 0xeb02d144

but the alignment requirement for this structure is 8-bytes as it
has 64-bit members.
the fault actually occurs when reading two 4-byte members in one
instruction:

1825            env = client->manager->aclenv;
1826            if (client->sctx->blackholeacl != NULL &&
   0x00036e70 <+408>:   ldd  [ %l6 + 0x10 ], %g2

"sctx" and "manager" are at offsets 0x10 and 0x14 and can both be
read with a single ldd (64-bit load) but this requires correct
alignment.

i didn't track down how this client value is allocated, it's all via
some opaque handle thing in the libraries, but this is a bug in the
new bind not allocating structures properly aligned.

.mrg.
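The arithmetic behind the fault can be checked with a small helper. This is only a sketch: `is_aligned` is a hypothetical name, and the assumption that %l6 holds the `client` pointer at the time of the ldd is inferred from the offsets quoted above, not confirmed from the disassembly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if addr is a multiple of align; align must be a power of two. */
static inline bool
is_aligned(uintptr_t addr, uintptr_t align)
{
	return (addr & (align - 1)) == 0;
}

/* Address the 64-bit load would touch: base pointer plus member offset. */
static inline uintptr_t
load_address(uintptr_t base, uintptr_t offset)
{
	return base + offset;
}
```

For the values in the gdb session, `client` is 0xeb02d144, which is a multiple of 4 but not of 8, so `load_address(0xeb02d144, 0x10)` gives 0xeb02d154 and the 8-byte check fails. The `handle` value 0xeb02d008 from frame #0, by contrast, is 8-byte aligned.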
re: new BIND in 10.0_RC5/sparc dies w/Bus error
actually, i found a core file in /var/chroot/named/etc/namedb/named.core. my build is missing debug info so i don't have a good idea what. .mrg.
re: new BIND in 10.0_RC5/sparc dies w/Bus error
> Unfortunately there was no core dump.

this is almost certainly because /var/chroot/named is not writeable
by user named, which is on purpose. you can set the corefile path
for this process after it starts using sysctl proc.$pid.corename.
i think setting to "/var/tmp/%n.core" should allow it to write to
/var/tmp in the chroot.

.mrg.
Re: new BIND in 10.0_RC5/sparc dies w/Bus error
John D. Baker wrote:

> Sure enough, 'named' wasn't running. Checking the logs found
> that some options in "named.conf" were no-longer available.
>[...]
> Has anyone else seen anything similar with the new BIND in -current
> and since pulled up to 10.0_RC5?

I noticed the same, both for RC5 and -current. It's "dnssec-enable yes;"
in line 13 of the distributed /etc/named.conf (in the "options"
section). This option was already deprecated, and the most recent BIND
update has finally removed it. (It still worked with the 10.0_RC4
"named".)

In /var/log/messages I could see the initial named startup, the
complaint about the option, and the service stop for this reason.

Deleting the line and a "/etc/rc.d/named restart" (or just "start")
worked nicely for me.

Martin Neitzel