Re: /usr/bin/ftp crash on -current (10.00.4) aarch64
Clean build resolved the problem.

Thanks,

Chavdar

On Thu, 18 May 2023 at 15:10, Chavdar Ivanov wrote:
>
> On Thu, 18 May 2023 at 15:06, Brad Spencer wrote:
> >
> > Chavdar Ivanov writes:
> >
> > > On Thu, 18 May 2023 at 11:31, Robert Swindells wrote:
> > >>
> > >> Chavdar Ivanov wrote:
> > >> > The weird and suspicious thing is that /usr/bin/ftp is linked to both
> > >> > existing libcrypto.so versions:
> > >> >
> > >> > ldd /usr/bin/ftp
> > >> > /usr/bin/ftp:
> > >> >     -ledit.3 => /usr/lib/libedit.so.3
> > >> >     -lterminfo.2 => /usr/lib/libterminfo.so.2
> > >> >     -lc.12 => /usr/lib/libc.so.12
> > >> >     -lssl.15 => /usr/lib/libssl.so.15
> > >> >     -lcrypto.14 => /usr/lib/libcrypto.so.14
> > >> >     -lcrypt.1 => /lib/libcrypt.so.1
> > >> >     -lcrypto.15 => /usr/lib/libcrypto.so.15
> > >>
> > >> I'm guessing you did an update build not a clean one. Also, did you use
> > >> the -j flag to build in parallel?
> > >
> > > I indeed usually do update builds; in this case however, I had deleted
> > > the entire obj (it is on a zfs, removing takes a lot of time, so
> > > subsequently I replaced it with a dedicated zfs so that I can just
> > > destroy and recreate it...). I also 'make cleandir' in [x]src in
> > > advance; that usually is enough.
> >
> > Snapshot the zfs fileset when it is empty and rollback when you want to
> > make it empty again. Very quick; works best when obj is its own
> > fileset.
>
> It didn't come to my mind... I was doing destroy, create, set
> mountpoint and chown, so the above saves me three commands...
>
> > >>
> > >> Do a clean build and everything should work.
> > >
> > > That's next.
> >
> > --
> > Brad Spencer - b...@anduin.eldar.org - KC8VKS - http://anduin.eldar.org
>
> --

--
Re: /usr/bin/ftp crash on -current (10.00.4) aarch64
On Thu, 18 May 2023 at 15:06, Brad Spencer wrote:
>
> Chavdar Ivanov writes:
>
> > On Thu, 18 May 2023 at 11:31, Robert Swindells wrote:
> >>
> >> Chavdar Ivanov wrote:
> >> > The weird and suspicious thing is that /usr/bin/ftp is linked to both
> >> > existing libcrypto.so versions:
> >> >
> >> > ldd /usr/bin/ftp
> >> > /usr/bin/ftp:
> >> >     -ledit.3 => /usr/lib/libedit.so.3
> >> >     -lterminfo.2 => /usr/lib/libterminfo.so.2
> >> >     -lc.12 => /usr/lib/libc.so.12
> >> >     -lssl.15 => /usr/lib/libssl.so.15
> >> >     -lcrypto.14 => /usr/lib/libcrypto.so.14
> >> >     -lcrypt.1 => /lib/libcrypt.so.1
> >> >     -lcrypto.15 => /usr/lib/libcrypto.so.15
> >>
> >> I'm guessing you did an update build not a clean one. Also, did you use
> >> the -j flag to build in parallel?
> >
> > I indeed usually do update builds; in this case however, I had deleted
> > the entire obj (it is on a zfs, removing takes a lot of time, so
> > subsequently I replaced it with a dedicated zfs so that I can just
> > destroy and recreate it...). I also 'make cleandir' in [x]src in
> > advance; that usually is enough.
>
> Snapshot the zfs fileset when it is empty and rollback when you want to
> make it empty again. Very quick; works best when obj is its own
> fileset.

It didn't come to my mind... I was doing destroy, create, set
mountpoint and chown, so the above saves me three commands...

> >>
> >> Do a clean build and everything should work.
> >
> > That's next.
>
> --
> Brad Spencer - b...@anduin.eldar.org - KC8VKS - http://anduin.eldar.org

--
Re: /usr/bin/ftp crash on -current (10.00.4) aarch64
Chavdar Ivanov writes:

> On Thu, 18 May 2023 at 11:31, Robert Swindells wrote:
>>
>> Chavdar Ivanov wrote:
>> > The weird and suspicious thing is that /usr/bin/ftp is linked to both
>> > existing libcrypto.so versions:
>> >
>> > ldd /usr/bin/ftp
>> > /usr/bin/ftp:
>> >     -ledit.3 => /usr/lib/libedit.so.3
>> >     -lterminfo.2 => /usr/lib/libterminfo.so.2
>> >     -lc.12 => /usr/lib/libc.so.12
>> >     -lssl.15 => /usr/lib/libssl.so.15
>> >     -lcrypto.14 => /usr/lib/libcrypto.so.14
>> >     -lcrypt.1 => /lib/libcrypt.so.1
>> >     -lcrypto.15 => /usr/lib/libcrypto.so.15
>>
>> I'm guessing you did an update build not a clean one. Also, did you use
>> the -j flag to build in parallel?
>
> I indeed usually do update builds; in this case however, I had deleted
> the entire obj (it is on a zfs, removing takes a lot of time, so
> subsequently I replaced it with a dedicated zfs so that I can just
> destroy and recreate it...). I also 'make cleandir' in [x]src in
> advance; that usually is enough.

Snapshot the zfs fileset when it is empty and rollback when you want to
make it empty again. Very quick; works best when obj is its own
fileset.

>>
>> Do a clean build and everything should work.
>
> That's next.

--
Brad Spencer - b...@anduin.eldar.org - KC8VKS - http://anduin.eldar.org
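The snapshot-and-rollback tip above can be sketched as a small helper. This is only an illustration: the dataset name `tank/obj` and the snapshot name `empty` are hypothetical, and the helper defaults to a dry run that prints the commands instead of executing them.

```python
import subprocess


def snapshot_cmd(dataset, snap="empty"):
    """Command that records the (still empty) obj dataset as a snapshot."""
    return ["zfs", "snapshot", f"{dataset}@{snap}"]


def rollback_cmd(dataset, snap="empty"):
    """Command that discards everything written since the snapshot."""
    return ["zfs", "rollback", f"{dataset}@{snap}"]


def run(cmd, dry_run=True):
    """Print the command, and execute it only when dry_run is False."""
    print(" ".join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    dataset = "tank/obj"  # hypothetical dataset name, not taken from the thread
    run(snapshot_cmd(dataset))  # once, right after creating the empty dataset
    run(rollback_cmd(dataset))  # before each clean build
```

One rollback replaces the destroy, create, set mountpoint and chown sequence, because the dataset itself and its properties are left in place. If later snapshots of the same dataset exist, `zfs rollback` needs `-r` to discard them first.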
Re: /usr/bin/ftp crash on -current (10.00.4) aarch64
On Thu, 18 May 2023 at 13:16, RVP wrote:
>
> On Thu, 18 May 2023, Chavdar Ivanov wrote:
>
> > Yes indeed, with SIGILL passed I get:
> >
> > Program received signal SIGSEGV, Segmentation fault.
> > 0xf03114c97890 in EC_GROUP_order_bits () from /usr/lib/libcrypto.so.14
> > (gdb) bt
> > #0 0xf03114c97890 in EC_GROUP_order_bits () from /usr/lib/libcrypto.so.14
> > #1 0xf031154898a4 in engine_unlocked_init () from /usr/lib/libcrypto.so.15
> > #2 0xf03115489ab0 in ENGINE_init () from /usr/lib/libcrypto.so.15
> > #3 0xf031153d11f0 in ?? () from /usr/lib/libcrypto.so.15
> > #4 0xf03115694c30 in ssl_setup_sig_algs () from /usr/lib/libssl.so.15
> > #5 0xf031156a85c4 in SSL_CTX_new_ex () from /usr/lib/libssl.so.15
> > #6 0x0f1be6d8 in fetch_start_ssl ()
> > #7 0x0f1b0dfc in fetch_url ()
> > #8 0x0f1b3128 in auto_fetch ()
> > #9 0x0f1bf944 in main ()
>
> You can see the cause right in that stack trace:
>
> EC_GROUP_order_bits is from libcrypto.so.14, but,
> engine_unlocked_init etc., are from libcrypto.so.15
>
> This is our old friend: library interpositioning and it happens due to
> this:
>
> $ readelf -d /mnt/usr/bin/ftp | f NEEDED
>  0x0001 NEEDED Shared library: [libedit.so.3]
>  0x0001 NEEDED Shared library: [libterminfo.so.2]
>  0x0001 NEEDED Shared library: [libssl.so.14]
>  0x0001 NEEDED Shared library: [libcrypto.so.14]
>  0x0001 NEEDED Shared library: [libc.so.12]
> $ readelf -d /mnt/usr/lib/libssl.so.14 | f NEEDED
>  0x0001 NEEDED Shared library: [libcrypto.so.14]
>  0x0001 NEEDED Shared library: [libc.so.12]
>
> So, my ftp binary explicitly needs `libcrypto.so.14', and `libssl' also has
> _the same version_ as a dependency. But, in your case, the ftp binary will
> show `libcrypto.so.15', but libssl will need `libcrypto.so.14'. I.e. the
> compiler linked in the newer version explicitly (`cc ... -lcrypto') and the
> other one was brought in implicitly via libssl.

Indeed:

# readelf -d /usr/bin/ftp | grep NEEDED
 0x0001 (NEEDED) Shared library: [libedit.so.3]
 0x0001 (NEEDED) Shared library: [libterminfo.so.2]
 0x0001 (NEEDED) Shared library: [libssl.so.15]
 0x0001 (NEEDED) Shared library: [libcrypto.so.15]
 0x0001 (NEEDED) Shared library: [libc.so.12]
# readelf -d /usr/lib/libssl.so.15.0 | grep NEEDED
 0x0001 (NEEDED) Shared library: [libcrypto.so.14]
 0x0001 (NEEDED) Shared library: [libc.so.12]

Rebuilding clean now.

Thanks,

Chavdar

>
> -RVP

--
Re: /usr/bin/ftp crash on -current (10.00.4) aarch64
On Thu, 18 May 2023, Chavdar Ivanov wrote:

> Yes indeed, with SIGILL passed I get:
>
> Program received signal SIGSEGV, Segmentation fault.
> 0xf03114c97890 in EC_GROUP_order_bits () from /usr/lib/libcrypto.so.14
> (gdb) bt
> #0 0xf03114c97890 in EC_GROUP_order_bits () from /usr/lib/libcrypto.so.14
> #1 0xf031154898a4 in engine_unlocked_init () from /usr/lib/libcrypto.so.15
> #2 0xf03115489ab0 in ENGINE_init () from /usr/lib/libcrypto.so.15
> #3 0xf031153d11f0 in ?? () from /usr/lib/libcrypto.so.15
> #4 0xf03115694c30 in ssl_setup_sig_algs () from /usr/lib/libssl.so.15
> #5 0xf031156a85c4 in SSL_CTX_new_ex () from /usr/lib/libssl.so.15
> #6 0x0f1be6d8 in fetch_start_ssl ()
> #7 0x0f1b0dfc in fetch_url ()
> #8 0x0f1b3128 in auto_fetch ()
> #9 0x0f1bf944 in main ()

You can see the cause right in that stack trace:

EC_GROUP_order_bits is from libcrypto.so.14, but,
engine_unlocked_init etc., are from libcrypto.so.15

This is our old friend: library interpositioning and it happens due to
this:

$ readelf -d /mnt/usr/bin/ftp | f NEEDED
 0x0001 NEEDED Shared library: [libedit.so.3]
 0x0001 NEEDED Shared library: [libterminfo.so.2]
 0x0001 NEEDED Shared library: [libssl.so.14]
 0x0001 NEEDED Shared library: [libcrypto.so.14]
 0x0001 NEEDED Shared library: [libc.so.12]
$ readelf -d /mnt/usr/lib/libssl.so.14 | f NEEDED
 0x0001 NEEDED Shared library: [libcrypto.so.14]
 0x0001 NEEDED Shared library: [libc.so.12]

So, my ftp binary explicitly needs `libcrypto.so.14', and `libssl' also has
_the same version_ as a dependency. But, in your case, the ftp binary will
show `libcrypto.so.15', but libssl will need `libcrypto.so.14'. I.e. the
compiler linked in the newer version explicitly (`cc ... -lcrypto') and the
other one was brought in implicitly via libssl.

-RVP
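The reasoning above (one libcrypto version requested directly, another pulled in through libssl) can be checked mechanically. The following is a minimal sketch, assuming the NEEDED entries have already been collected into a table; the table holds only the entries reported in this thread for the broken aarch64 install, and the function names are illustrative, not part of any existing tool.

```python
# NEEDED entries reported in this thread for the broken aarch64 install.
# Dependencies of the other libraries were not posted, so they are omitted.
NEEDED = {
    "ftp": ["libedit.so.3", "libterminfo.so.2", "libssl.so.15",
            "libcrypto.so.15", "libc.so.12"],
    "libssl.so.15": ["libcrypto.so.14", "libc.so.12"],
}


def closure(root, needed):
    """Collect every soname reachable from root via NEEDED entries (breadth-first)."""
    seen = []
    queue = list(needed.get(root, []))
    while queue:
        soname = queue.pop(0)
        if soname not in seen:
            seen.append(soname)
            queue.extend(needed.get(soname, []))
    return seen


def conflicts(sonames):
    """Return {base name: sorted versions} for libraries requested at several versions."""
    by_base = {}
    for soname in sonames:
        base, _, version = soname.partition(".so.")
        by_base.setdefault(base, set()).add(version)
    return {base: sorted(vers) for base, vers in by_base.items() if len(vers) > 1}


if __name__ == "__main__":
    print(conflicts(closure("ftp", NEEDED)))
```

For the table above this reports libcrypto at both 14 and 15, which is the mismatch visible in the backtrace. With the consistent entries from the working install (ftp and libssl.so.14 both needing libcrypto.so.14) it reports nothing.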
Re: /usr/bin/ftp crash on -current (10.00.4) aarch64
On Thu, 18 May 2023 at 11:31, Robert Swindells wrote:
>
> Chavdar Ivanov wrote:
> > The weird and suspicious thing is that /usr/bin/ftp is linked to both
> > existing libcrypto.so versions:
> >
> > ldd /usr/bin/ftp
> > /usr/bin/ftp:
> >     -ledit.3 => /usr/lib/libedit.so.3
> >     -lterminfo.2 => /usr/lib/libterminfo.so.2
> >     -lc.12 => /usr/lib/libc.so.12
> >     -lssl.15 => /usr/lib/libssl.so.15
> >     -lcrypto.14 => /usr/lib/libcrypto.so.14
> >     -lcrypt.1 => /lib/libcrypt.so.1
> >     -lcrypto.15 => /usr/lib/libcrypto.so.15
>
> I'm guessing you did an update build not a clean one. Also, did you use
> the -j flag to build in parallel?

I indeed usually do update builds; in this case however, I had deleted
the entire obj (it is on a zfs, removing takes a lot of time, so
subsequently I replaced it with a dedicated zfs so that I can just
destroy and recreate it...). I also 'make cleandir' in [x]src in
advance; that usually is enough.

>
> Do a clean build and everything should work.

That's next.

--
Re: /usr/bin/ftp crash on -current (10.00.4) aarch64
On Thu, 18 May 2023 at 11:33, RVP wrote:
>
> On Thu, 18 May 2023, Chavdar Ivanov wrote:
>
> > This turned out to be /usr/bin/ftp crashing:
> >
> > # /usr/bin/ftp -o node-v20.2.0.tar.xz 'https://nodejs.org/dist/v20.2.0/node-v20.2.0.tar.xz'
> > Trying 104.20.23.46:443 ...
> > [1] 7100 segmentation fault /usr/bin/ftp -o node-v20.2.0.tar.xz
> >
> > If I run it under gdb, I get:
> >
> > (gdb) run -o node-v20.2.0.tar.xz 'https://nodejs.org/dist/v20.2.0/node-v20.2.0.tar.xz'
> > Starting program: /usr/bin/ftp -o node-v20.2.0.tar.xz 'https://nodejs.org/dist/v20.2.0/node-v20.2.0.tar.xz'
> >
> > Program received signal SIGILL, Illegal instruction.
> > 0xf7db5d54be70 in _armv8_sha512_probe () from /usr/lib/libcrypto.so.14
> > (gdb) bt
> > #0 0xf7db5d54be70 in _armv8_sha512_probe () from /usr/lib/libcrypto.so.14
> > #1 0xf7db5d54c23c in OPENSSL_cpuid_setup () from /usr/lib/libcrypto.so.14
> > #2 0xef643398 in _rtld_call_init_function () from /usr/libexec/ld.elf_so
> > #3 0xef6436a4 in _rtld_call_init_functions () from /usr/libexec/ld.elf_so
> > #4 0xef643f74 in _rtld () from /usr/libexec/ld.elf_so
> > #5 0xef640b10 in _rtld_start () from /usr/libexec/ld.elf_so
> > Backtrace stopped: previous frame identical to this frame (corrupt stack?)
>
> You should ignore SIGILL when it's in libcrypto on some archs, e.g. ARM,
> PPC & Sparc. On x86 systems, libcrypto uses the CPUID instruction to
> determine which optimized assembly routines can be used for speedup. On
> ARM etc., it installs a SIGILL handler and just runs test instructions. The
> handler being called means _those_ instructions are not available.
>
> So, on ARM, you have to tell gdb to pass through SIGILL to the program:
>
> ```
> (gdb) handle SIGILL nostop noprint pass
> ```
>
> > The weird and suspicious thing is that /usr/bin/ftp is linked to both
> > existing libcrypto.so versions:
> >
> > ldd /usr/bin/ftp
> > /usr/bin/ftp:
> >     -ledit.3 => /usr/lib/libedit.so.3
> >     -lterminfo.2 => /usr/lib/libterminfo.so.2
> >     -lc.12 => /usr/lib/libc.so.12
> >     -lssl.15 => /usr/lib/libssl.so.15
> >     -lcrypto.14 => /usr/lib/libcrypto.so.14
> >     -lcrypt.1 => /lib/libcrypt.so.1
> >     -lcrypto.15 => /usr/lib/libcrypto.so.15
>
> I would say this is the real reason for the crash (SIGSEGV).

Yes indeed, with SIGILL passed I get:

Program received signal SIGSEGV, Segmentation fault.
0xf03114c97890 in EC_GROUP_order_bits () from /usr/lib/libcrypto.so.14
(gdb) bt
#0 0xf03114c97890 in EC_GROUP_order_bits () from /usr/lib/libcrypto.so.14
#1 0xf031154898a4 in engine_unlocked_init () from /usr/lib/libcrypto.so.15
#2 0xf03115489ab0 in ENGINE_init () from /usr/lib/libcrypto.so.15
#3 0xf031153d11f0 in ?? () from /usr/lib/libcrypto.so.15
#4 0xf03115694c30 in ssl_setup_sig_algs () from /usr/lib/libssl.so.15
#5 0xf031156a85c4 in SSL_CTX_new_ex () from /usr/lib/libssl.so.15
#6 0x0f1be6d8 in fetch_start_ssl ()
#7 0x0f1b0dfc in fetch_url ()
#8 0x0f1b3128 in auto_fetch ()
#9 0x0f1bf944 in main ()

>
> -RVP

--
Re: /usr/bin/ftp crash on -current (10.00.4) aarch64
On Thu, 18 May 2023, Chavdar Ivanov wrote:

> This turned out to be /usr/bin/ftp crashing:
>
> # /usr/bin/ftp -o node-v20.2.0.tar.xz 'https://nodejs.org/dist/v20.2.0/node-v20.2.0.tar.xz'
> Trying 104.20.23.46:443 ...
> [1] 7100 segmentation fault /usr/bin/ftp -o node-v20.2.0.tar.xz
>
> If I run it under gdb, I get:
>
> (gdb) run -o node-v20.2.0.tar.xz 'https://nodejs.org/dist/v20.2.0/node-v20.2.0.tar.xz'
> Starting program: /usr/bin/ftp -o node-v20.2.0.tar.xz 'https://nodejs.org/dist/v20.2.0/node-v20.2.0.tar.xz'
>
> Program received signal SIGILL, Illegal instruction.
> 0xf7db5d54be70 in _armv8_sha512_probe () from /usr/lib/libcrypto.so.14
> (gdb) bt
> #0 0xf7db5d54be70 in _armv8_sha512_probe () from /usr/lib/libcrypto.so.14
> #1 0xf7db5d54c23c in OPENSSL_cpuid_setup () from /usr/lib/libcrypto.so.14
> #2 0xef643398 in _rtld_call_init_function () from /usr/libexec/ld.elf_so
> #3 0xef6436a4 in _rtld_call_init_functions () from /usr/libexec/ld.elf_so
> #4 0xef643f74 in _rtld () from /usr/libexec/ld.elf_so
> #5 0xef640b10 in _rtld_start () from /usr/libexec/ld.elf_so
> Backtrace stopped: previous frame identical to this frame (corrupt stack?)

You should ignore SIGILL when it's in libcrypto on some archs, e.g. ARM,
PPC & Sparc. On x86 systems, libcrypto uses the CPUID instruction to
determine which optimized assembly routines can be used for speedup. On
ARM etc., it installs a SIGILL handler and just runs test instructions. The
handler being called means _those_ instructions are not available.

So, on ARM, you have to tell gdb to pass through SIGILL to the program:

```
(gdb) handle SIGILL nostop noprint pass
```

> The weird and suspicious thing is that /usr/bin/ftp is linked to both
> existing libcrypto.so versions:
>
> ldd /usr/bin/ftp
> /usr/bin/ftp:
>     -ledit.3 => /usr/lib/libedit.so.3
>     -lterminfo.2 => /usr/lib/libterminfo.so.2
>     -lc.12 => /usr/lib/libc.so.12
>     -lssl.15 => /usr/lib/libssl.so.15
>     -lcrypto.14 => /usr/lib/libcrypto.so.14
>     -lcrypt.1 => /lib/libcrypt.so.1
>     -lcrypto.15 => /usr/lib/libcrypto.so.15

I would say this is the real reason for the crash (SIGSEGV).

-RVP
Re: /usr/bin/ftp crash on -current (10.00.4) aarch64
Chavdar Ivanov wrote:
> The weird and suspicious thing is that /usr/bin/ftp is linked to both
> existing libcrypto.so versions:
>
> ldd /usr/bin/ftp
> /usr/bin/ftp:
>     -ledit.3 => /usr/lib/libedit.so.3
>     -lterminfo.2 => /usr/lib/libterminfo.so.2
>     -lc.12 => /usr/lib/libc.so.12
>     -lssl.15 => /usr/lib/libssl.so.15
>     -lcrypto.14 => /usr/lib/libcrypto.so.14
>     -lcrypt.1 => /lib/libcrypt.so.1
>     -lcrypto.15 => /usr/lib/libcrypto.so.15

I'm guessing you did an update build not a clean one. Also, did you use
the -j flag to build in parallel?

Do a clean build and everything should work.
/usr/bin/ftp crash on -current (10.00.4) aarch64
Hi,

After having upgraded my aarch64 host to

(NetBSD narvi 10.99.4 NetBSD 10.99.4 (GENERIC64) #0: Sun May 14 19:13:18 BST 2023 sysbu...@ymir.lorien.lan:/dumps/sysbuild/evbarm64/obj/home/sysbuild/src/sys/arch/evbarm/compile/GENERIC64 evbarm)

I found out I can no longer fetch some packages:

...
cd /usr/pkgsrc/lang/nodejs
➜ nodejs make fetch
=> Bootstrap dependency digest>=20211023: found digest-20220214
=> Fetching node-v20.2.0.tar.xz
=> Total size: 41778040 bytes
Trying 104.20.22.46:443 ...
[1] Segmentation fault (cd ${fetchdir}; if ${TEST} -n "${resume}"; th...
fetch: Unable to fetch expected file node-v20.2.0.tar.xz
Trying 151.101.61.6:80 ...
...

This turned out to be /usr/bin/ftp crashing:

# /usr/bin/ftp -o node-v20.2.0.tar.xz 'https://nodejs.org/dist/v20.2.0/node-v20.2.0.tar.xz'
Trying 104.20.23.46:443 ...
[1] 7100 segmentation fault /usr/bin/ftp -o node-v20.2.0.tar.xz

If I run it under gdb, I get:

(gdb) run -o node-v20.2.0.tar.xz 'https://nodejs.org/dist/v20.2.0/node-v20.2.0.tar.xz'
Starting program: /usr/bin/ftp -o node-v20.2.0.tar.xz 'https://nodejs.org/dist/v20.2.0/node-v20.2.0.tar.xz'

Program received signal SIGILL, Illegal instruction.
0xf7db5d54be70 in _armv8_sha512_probe () from /usr/lib/libcrypto.so.14
(gdb) bt
#0 0xf7db5d54be70 in _armv8_sha512_probe () from /usr/lib/libcrypto.so.14
#1 0xf7db5d54c23c in OPENSSL_cpuid_setup () from /usr/lib/libcrypto.so.14
#2 0xef643398 in _rtld_call_init_function () from /usr/libexec/ld.elf_so
#3 0xef6436a4 in _rtld_call_init_functions () from /usr/libexec/ld.elf_so
#4 0xef643f74 in _rtld () from /usr/libexec/ld.elf_so
#5 0xef640b10 in _rtld_start () from /usr/libexec/ld.elf_so
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

The weird and suspicious thing is that /usr/bin/ftp is linked to both
existing libcrypto.so versions:

ldd /usr/bin/ftp
/usr/bin/ftp:
    -ledit.3 => /usr/lib/libedit.so.3
    -lterminfo.2 => /usr/lib/libterminfo.so.2
    -lc.12 => /usr/lib/libc.so.12
    -lssl.15 => /usr/lib/libssl.so.15
    -lcrypto.14 => /usr/lib/libcrypto.so.14
    -lcrypt.1 => /lib/libcrypt.so.1
    -lcrypto.15 => /usr/lib/libcrypto.so.15

whereas on amd64, built a few hours earlier, I get:

# ldd =ftp
/usr/bin/ftp:
    -ledit.3 => /usr/lib/libedit.so.3
    -lterminfo.2 => /usr/lib/libterminfo.so.2
    -lc.12 => /usr/lib/libc.so.12
    -lssl.15 => /usr/lib/libssl.so.15
    -lcrypto.15 => /usr/lib/libcrypto.so.15
    -lcrypt.1 => /lib/libcrypt.so.1

I will obviously rebuild the aarch64 system just in case, but thought it
worth mentioning.

Chavdar

--
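Spotting a doubled library in `ldd` output by eye is easy to miss, so the check can be scripted. This is a rough sketch that assumes the NetBSD-style `-lNAME.MAJOR => PATH` lines shown in this message; the sample input is the aarch64 listing copied from above, and the function name is made up for the example.

```python
import re
from collections import defaultdict

# Sample input: the aarch64 `ldd /usr/bin/ftp` output quoted in this message.
LDD_OUTPUT = """\
/usr/bin/ftp:
    -ledit.3 => /usr/lib/libedit.so.3
    -lterminfo.2 => /usr/lib/libterminfo.so.2
    -lc.12 => /usr/lib/libc.so.12
    -lssl.15 => /usr/lib/libssl.so.15
    -lcrypto.14 => /usr/lib/libcrypto.so.14
    -lcrypt.1 => /lib/libcrypt.so.1
    -lcrypto.15 => /usr/lib/libcrypto.so.15
"""

# Matches lines such as "-lcrypto.14 => /usr/lib/libcrypto.so.14".
LDD_LINE = re.compile(r"^\s*-l(?P<name>\S+)\.(?P<major>\d+)\s+=>\s+(?P<path>\S+)")


def duplicate_libraries(ldd_text):
    """Return {library name: sorted major versions} for libraries listed more than once."""
    versions = defaultdict(set)
    for line in ldd_text.splitlines():
        match = LDD_LINE.match(line)
        if match:
            versions[match.group("name")].add(int(match.group("major")))
    return {name: sorted(majors) for name, majors in versions.items() if len(majors) > 1}


if __name__ == "__main__":
    print(duplicate_libraries(LDD_OUTPUT))
```

Note that `crypt` and `crypto` are different libraries and are kept apart, so only the two libcrypto majors are flagged. The amd64 listing, which has a single libcrypto entry, yields an empty result.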