Re: new BIND in 10.0_RC5/sparc dies w/Bus error

2024-03-05 Thread Christos Zoulas

On 2024-03-05 1:13 am, matthew green wrote:

ah.  the problem is that struct isc_nmhandle grew a pointer member,
adding 4 bytes to the struct size, and it uses C99 [] variable array
for the final member, which is later assigned to other pointers, and
this memory was now only 4-byte aligned.  this hack patch works to
stop named crashing for me, but i'll let christos figure out what the
right general solution here is.


.mrg.


Index: lib/isc/netmgr/netmgr-int.h
===
RCS file: /cvsroot/src/external/mpl/bind/dist/lib/isc/netmgr/netmgr-int.h,v
retrieving revision 1.8.2.1
diff -p -u -r1.8.2.1 netmgr-int.h
--- lib/isc/netmgr/netmgr-int.h 25 Feb 2024 15:47:24 -  1.8.2.1
+++ lib/isc/netmgr/netmgr-int.h 5 Mar 2024 06:12:50 -
@@ -276,7 +276,7 @@ struct isc_nmhandle {
    LINK(isc_nmhandle_t) active_link;
 #endif
    void *opaque;
-   char extra[];
+   char extra[] __attribute__((__aligned__(8)));
 };

 typedef enum isc__netievent_type {


Perhaps:
union {
    void *p;
    long double d;
    long long lld;
    intmax_t im;
} extra[];

Or simpler:

struct {
    void *p;
} extra[];

Does the second form work?

christos
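
One way to reason about the two forms above is to compare the alignment the compiler assigns to each element type. The sketch below is only illustrative: the type names extra_wide and extra_ptr and the two helper functions are made up for this example and do not appear in BIND. On an ILP32 target such as sparc, a struct holding only a pointer would be expected to inherit the pointer's 4-byte alignment, while the union takes the alignment of its strictest member.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* First suggestion: a union of the widest scalar types (hypothetical name). */
union extra_wide {
	void *p;
	long double d;
	long long lld;
	intmax_t im;
};

/* Second suggestion: a struct holding only a pointer (hypothetical name). */
struct extra_ptr {
	void *p;
};

/* Alignment the compiler gives the union: that of its strictest member. */
static size_t
wide_align(void)
{
	return _Alignof(union extra_wide);
}

/* Alignment of the pointer-only struct: same as a bare pointer. */
static size_t
ptr_align(void)
{
	return _Alignof(struct extra_ptr);
}
```

If ptr_align() returns 4 on the target while the data stored in extra[] needs 8, the second form would presumably not be enough there; on LP64 targets both helpers are likely to return at least 8, so the difference would not show up.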


Re: new BIND in 10.0_RC5/sparc dies w/Bus error

2024-03-05 Thread Christos Zoulas

On 2024-03-05 1:13 am, matthew green wrote:

ah.  the problem is that struct isc_nmhandle grew a pointer member,
adding 4 bytes to the struct size, and it uses C99 [] variable array
for the final member, which is later assigned to other pointers, and
this memory was now only 4-byte aligned.  this hack patch works to
stop named crashing for me, but i'll let christos figure out what the
right general solution here is.


.mrg.


Index: lib/isc/netmgr/netmgr-int.h
===
RCS file: /cvsroot/src/external/mpl/bind/dist/lib/isc/netmgr/netmgr-int.h,v
retrieving revision 1.8.2.1
diff -p -u -r1.8.2.1 netmgr-int.h
--- lib/isc/netmgr/netmgr-int.h 25 Feb 2024 15:47:24 -  1.8.2.1
+++ lib/isc/netmgr/netmgr-int.h 5 Mar 2024 06:12:50 -
@@ -276,7 +276,7 @@ struct isc_nmhandle {
    LINK(isc_nmhandle_t) active_link;
 #endif
    void *opaque;
-   char extra[];
+   char extra[] __attribute__((__aligned__(8)));
 };

 typedef enum isc__netievent_type {


Does the following work, and is it more palatable?

union {
    void *p;
    long double d;
    long long lld;
    intmax_t im;
} extra[];

or just:

--
christos


re: new BIND in 10.0_RC5/sparc dies w/Bus error

2024-03-05 Thread John D. Baker
On Tue, 5 Mar 2024, John D. Baker wrote:

> Thanks for the rapid analysis and workaround.  I've applied it to my
> netbsd-10 tree, rebuilt sparc and am updating now.

It seems to be working now.  Thanks again.

-- 
|/"\ John D. Baker, KN5UKS   NetBSD Darwin/MacOS X
|\ / jdbaker[snail]consolidated[flyspeck]net  OpenBSDFreeBSD
| X  No HTML/proprietary data in email.   BSD just sits there and works!
|/ \ GPGkeyID:  D703 4A7E 479F 63F8 D3F4  BD99 9572 8F23 E4AD 1645


re: new BIND in 10.0_RC5/sparc dies w/Bus error

2024-03-05 Thread John D. Baker
On Tue, 5 Mar 2024, matthew green wrote:

> ah.  the problem is that struct isc_nmhandle grew a pointer member,
> adding 4 bytes to the struct size, and it uses C99 [] variable array
> for the final member, which is later assigned to other pointers, and
> this memory was now only 4-byte aligned.  this hack patch works to
> stop named crashing for me, but i'll let christos figure out what the
> right general solution here is.
> 
> .mrg.
> 
> Index: lib/isc/netmgr/netmgr-int.h
> [diff]

Thanks for the rapid analysis and workaround.  I've applied it to my
netbsd-10 tree, rebuilt sparc and am updating now.

In the interim, pointing the mailserver's 'resolv.conf' at my backup
nameserver instead of itself and restarting sendmail has allowed mail
to start working again.

The backup nameserver is amd64 so should not have a problem with the
changes in the new BIND when I get a chance to update it to 10.0_RC5.

(I had long before set the "kern.defcorename=/var/tmp/cores/%n.core",
but there was nothing there.  I forget if the subdirectory "cores" will
be automatically created or not, but it still isn't present on the
system.)



re: new BIND in 10.0_RC5/sparc dies w/Bus error

2024-03-04 Thread matthew green
ah.  the problem is that struct isc_nmhandle grew a pointer member,
adding 4 bytes to the struct size, and it uses C99 [] variable array
for the final member, which is later assigned to other pointers, and
this memory was now only 4-byte aligned.  this hack patch works to
stop named crashing for me, but i'll let christos figure out what the
right general solution here is.


.mrg.


Index: lib/isc/netmgr/netmgr-int.h
===
RCS file: /cvsroot/src/external/mpl/bind/dist/lib/isc/netmgr/netmgr-int.h,v
retrieving revision 1.8.2.1
diff -p -u -r1.8.2.1 netmgr-int.h
--- lib/isc/netmgr/netmgr-int.h 25 Feb 2024 15:47:24 -  1.8.2.1
+++ lib/isc/netmgr/netmgr-int.h 5 Mar 2024 06:12:50 -
@@ -276,7 +276,7 @@ struct isc_nmhandle {
    LINK(isc_nmhandle_t) active_link;
 #endif
    void *opaque;
-   char extra[];
+   char extra[] __attribute__((__aligned__(8)));
 };
 
 typedef enum isc__netievent_type {
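
The effect described above can be reproduced on any machine with a small stand-in struct. This is a sketch under stated assumptions, not the real isc_nmhandle layout: hdr_plain and hdr_aligned are hypothetical types whose five 4-byte fields simply model a header whose size is 4 modulo 8, and the aligned attribute is the same GCC/Clang extension used in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header: 20 bytes of fields, then a flexible array member. */
struct hdr_plain {
	uint32_t words[5];
	char extra[];		/* char has alignment 1, so no padding is added */
};

/* Same header, with the flexible array member forced to 8-byte alignment. */
struct hdr_aligned {
	uint32_t words[5];
	char extra[] __attribute__((__aligned__(8)));
};

/* Offset of extra[] without the attribute. */
static size_t
plain_offset(void)
{
	return offsetof(struct hdr_plain, extra);
}

/* Offset of extra[] with the attribute: padded up to a multiple of 8. */
static size_t
aligned_offset(void)
{
	return offsetof(struct hdr_aligned, extra);
}
```

Here plain_offset() is 20 and aligned_offset() is 24, so an object placed at extra in the first struct is only 4-byte aligned even when the allocation itself starts on an 8-byte boundary.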


re: new BIND in 10.0_RC5/sparc dies w/Bus error

2024-03-04 Thread matthew green
this appears to be a badly aligned structure issue.  i can reproduce
it by doing "anita interact" with any recent sparc .iso, editing the
named.conf to start, starting named, and doing 'dig ns netbsd.org'
would trigger the crash.

the stack trace is:

(gdb) bt
#0  ns__client_request (handle=0xeb02d008, eresult=ISC_R_SUCCESS, region=<optimized out>, arg=<optimized out>)
    at /usr/10/src/external/mpl/bind/lib/libns/../../dist/lib/ns/client.c:1825
#1  0xedb0dc80 in isc__nm_async_readcb (worker=0x0, ev0=0xeccf7ad4)
    at /usr/10/src/external/mpl/bind/lib/libisc/../../dist/lib/isc/netmgr/netmgr.c:2914
#2  0xedb0dde0 in isc__nm_readcb (sock=0xecfe8808, uvreq=0xeb0b6008, eresult=ISC_R_SUCCESS)
    at /usr/10/src/external/mpl/bind/lib/libisc/../../dist/lib/isc/netmgr/netmgr.c:2887
#3  0xedb1183c in udp_recv_cb (handle=<optimized out>, nrecv=53, buf=0xeccf7c54, addr=<optimized out>, flags=0)
    at /usr/10/src/external/mpl/bind/lib/libisc/../../dist/lib/isc/netmgr/udp.c:653
#4  0xedb3aec8 in uv__udp_recvmsg (handle=0xecfe89f8)
    at /usr/10/src/external/mit/libuv/lib/../dist/src/unix/udp.c:303
#5  uv__udp_io (loop=<optimized out>, w=0xecfe8a38, revents=1)
    at /usr/10/src/external/mit/libuv/lib/../dist/src/unix/udp.c:178
#6  0xedb3a034 in uv__io_poll (loop=0xecf62810, timeout=<optimized out>)
    at /usr/10/src/external/mit/libuv/lib/../dist/src/unix/kqueue.c:390
#7  0xedb431a0 in uv_run (loop=0xecf62810, mode=UV_RUN_DEFAULT)
    at /usr/10/src/external/mit/libuv/lib/../dist/src/unix/core.c:406
#8  0xedb106ec in nm_thread (worker0=0xecf62808)
    at /usr/10/src/external/mpl/bind/lib/libisc/../../dist/lib/isc/netmgr/netmgr.c:704
#9  0xedb20f44 in isc__trampoline_run (arg=0xecf36be0)
    at /usr/10/src/external/mpl/bind/lib/libisc/../../dist/lib/isc/trampoline.c:192
#10 0xed9ecda8 in pthread__create_tramp (cookie=0xecf7b000)
    at /usr/10/src/lib/libpthread/pthread.c:595

and the problem is that in ns__client_request(), we end up with:

(gdb) p client
$17 = (ns_client_t *) 0xeb02d144

but the alignment requirement for this structure is 8-bytes as it has
64-bit members.  the fault actually occurs when reading two 4-byte
members in one instruction:

1825        env = client->manager->aclenv;
1826        if (client->sctx->blackholeacl != NULL &&
   0x00036e70 <+408>: ldd  [ %l6 + 0x10 ], %g2

"sctx" and "manager" are at offsets 0x10 and 0x14 and can both be
read with a single ldd (64-bit load) but this requires correct
alignment.

i didn't track down how this client value is allocated, it's all
via some opaque handle thing in the libraries, but this is a bug
in the new bind not allocating structures properly aligned.


.mrg.
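
The arithmetic behind the analysis above can be checked with a few lines of C. This is a minimal sketch: is_aligned() and client_offset() are hypothetical helpers written for this note, and the two addresses are copied from the gdb output. That the client object sits at a fixed distance inside the handle's allocation is an inference from those two values, not something the trace states directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if addr is a multiple of align; align must be a power of two. */
static bool
is_aligned(uintptr_t addr, size_t align)
{
	return (addr & (align - 1)) == 0;
}

/* Byte distance between the handle and the client pointer seen in gdb. */
static uintptr_t
client_offset(void)
{
	const uintptr_t handle = 0xeb02d008u;	/* handle= in frame #0 */
	const uintptr_t client = 0xeb02d144u;	/* result of "p client" */

	return client - handle;
}
```

The handle address is 8-byte aligned, but the client pointer is 0x13c bytes further on, which is 4 modulo 8. That leaves client on a 4-byte boundary only, so an ldd at offset 0x10 from it would be expected to fault on sparc.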


re: new BIND in 10.0_RC5/sparc dies w/Bus error

2024-03-04 Thread matthew green
actually, i found a core file in /var/chroot/named/etc/namedb/named.core.

my build is missing debug info so i don't have a good idea of what went wrong.


.mrg.


re: new BIND in 10.0_RC5/sparc dies w/Bus error

2024-03-04 Thread matthew green
> Unfortunately there was no core dump.

this is almost certainly because /var/chroot/named is not writeable
by user named, which is on purpose.

you can set the corefile path for this process after it starts using
sysctl proc.$pid.corename.  i think setting to "/var/tmp/%n.core"
should allow it to write to /var/tmp in the chroot.


.mrg.


Re: new BIND in 10.0_RC5/sparc dies w/Bus error

2024-03-04 Thread neitzel
John D. Baker wrote:
> Sure enough, 'named' wasn't running.  Checking the logs found
> that some options in "named.conf" were no-longer available.
>[...]
> Has anyone else seen anything similar with the new BIND in -current
> and since pulled up to 10.0_RC5?

I noticed the same, both for RC5 and -current.  It's "dnssec-enable
yes;" in line 13 of the distributed /etc/named.conf (in the "options"
section).  This option was already deprecated before but has been
finally put to rest in the most recent BIND update.  (It still worked
with the 10.0_RC4 "named".)

In /var/log/messages I could see the initial named startup, the
complaint about the option, and the service stop for this reason.

Deleting the line and a "/etc/rc.d/named restart" (or just "start")
worked nicely for me.

Martin Neitzel