Re: Multiple issues with current (kldload failures, missing CTF stuff, pty issues, ...)
On 2024-03-29 18:21, Alexander Leidinger wrote:
> On 2024-03-29 18:13, Mark Johnston wrote:
>> On Fri, Mar 29, 2024 at 04:52:55PM +0100, Alexander Leidinger wrote:
>>> Hi,
>>>
>>> sources from 2024-03-11 work. Sources from 2024-03-25 and today don't
>>> work (see below for the issue). As the monthly stabilisation pass
>>> didn't find obvious issues, it is something related to my setup:
>>>  - not a generic kernel
>>>  - very modular kernel (as much as possible as a module)
>>>  - bind_now (a build without it fails too, tested with clean /usr/obj)
>>>  - ccache (a build without it fails too, tested with clean /usr/obj)
>>>  - kernel retpoline (build without it in progress)
>>>  - userland retpoline (build without it in progress)
>>>  - kernel build with WITH_CTF / DDB_CTF (next one to test if it isn't
>>>    retpoline)
>>>  - -fno-builtin
>>>  - CPUFLAGS=native (except for stuff in /usr/src/sys/boot)
>>>  - malloc production
>>>  - COPTFLAGS= -O2 -pipe
>>>
>>> The issue is that kernel modules load OK from the loader, but once
>>> the kernel starts init, any module fails to load (e.g. via
>>> autodetection of hardware or rc.conf kld_list) with the message that
>>> the kernel and module versions are out of sync, and the module
>>> refuses to load.
>>
>> What is the exact revision you're running? There were some unrelated
>> changes to the kernel linker around the same time.
>
> The working src is from 2024-03-11-094351 (GMT+0100). The failing src
> was fetched after Gleb's stabilization week message (and today's src
> before the sound stuff still fails).
>
> Retpoline wasn't the cause, next test is the CTF stuff in the kernel...

A rather obscure problem was causing this. The "last" boot environment
(BE) had canmount set to "on" instead of "noauto". No idea how this
happened, but it resulted in the "last" BE being mounted by
"zfs mount -a" on top of the current BE. This means that all modules
loaded after the zfs rc script had run were old kernel modules, and the
error message about the kernel version mismatch was correct.

I found the issue while bisecting the tree: suddenly the error message
went away, but a new issue of missing dev entries popped up (/dev was
mounted correctly on the booting dataset, but the last BE was mounted on
top of it and /dev went empty...).

It looks to me like bectl was doing this (from "zpool history")...

  2024-03-11.14:16:31 zpool set bootfs=rpool/ROOT/2024-03-11-094351 rpool
  2024-03-11.14:16:31 zfs set canmount=noauto rpool/ROOT/2024-01-18-092730
  2024-03-11.14:16:31 zfs set canmount=noauto rpool/ROOT/2024-02-10-144617
  2024-03-11.14:16:32 zfs set canmount=noauto rpool/ROOT/2024-02-11-212006
  2024-03-11.14:16:32 zfs set canmount=noauto rpool/ROOT/2024-02-16-082836
  2024-03-11.14:16:32 zfs set canmount=noauto rpool/ROOT/2024-02-24-140211
  2024-03-11.14:16:32 zfs set canmount=noauto rpool/ROOT/2024-02-24-140211_ok
  2024-03-11.14:16:33 zfs set canmount=on rpool/ROOT/2024-03-11-094351
  2024-03-11.14:16:33 zfs promote rpool/ROOT/2024-03-11-094351
  2024-03-11.14:17:03 zfs destroy -r rpool/ROOT/2024-02-24-140211_ok

I surely didn't do the "zfs set canmount=..." for those by hand.

Bye,
Alexander.

-- 
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF
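For anyone who wants to check their own pool for the same situation, here is a minimal sketch (not a bectl or ZFS tool) that replays "zpool history" lines and reports boot environments whose last recorded canmount value is not "noauto". The function names, the regular expressions and the placeholder active BE name "rpool/ROOT/hypothetical-new-be" are assumptions made for illustration; only the history lines are taken from the mail above. It sees only explicit "zfs set canmount=" entries, so properties set at creation time or inherited are missed, and "zfs get -r canmount rpool/ROOT" remains the authoritative way to see the current values.

```python
import re

# History excerpt quoted from the mail above.
HISTORY = """\
2024-03-11.14:16:31 zpool set bootfs=rpool/ROOT/2024-03-11-094351 rpool
2024-03-11.14:16:31 zfs set canmount=noauto rpool/ROOT/2024-01-18-092730
2024-03-11.14:16:31 zfs set canmount=noauto rpool/ROOT/2024-02-10-144617
2024-03-11.14:16:32 zfs set canmount=noauto rpool/ROOT/2024-02-11-212006
2024-03-11.14:16:32 zfs set canmount=noauto rpool/ROOT/2024-02-16-082836
2024-03-11.14:16:32 zfs set canmount=noauto rpool/ROOT/2024-02-24-140211
2024-03-11.14:16:32 zfs set canmount=noauto rpool/ROOT/2024-02-24-140211_ok
2024-03-11.14:16:33 zfs set canmount=on rpool/ROOT/2024-03-11-094351
2024-03-11.14:16:33 zfs promote rpool/ROOT/2024-03-11-094351
2024-03-11.14:17:03 zfs destroy -r rpool/ROOT/2024-02-24-140211_ok
"""

# "<timestamp> zfs set canmount=<value> <dataset>"
CANMOUNT_RE = re.compile(
    r"^\S+\s+zfs set canmount=(?P<value>\S+)\s+(?P<dataset>\S+)\s*$"
)
# "<timestamp> zfs destroy [-flags] <dataset>"
DESTROY_RE = re.compile(
    r"^\S+\s+zfs destroy(?:\s+-\S+)*\s+(?P<dataset>\S+)\s*$"
)


def last_canmount(history_lines):
    """Return {dataset: value} for the last 'zfs set canmount=' per dataset.

    A dataset named in a later 'zfs destroy' is dropped. Children removed
    by 'destroy -r' are not tracked (simplification of this sketch).
    """
    state = {}
    for line in history_lines:
        line = line.strip()
        match = CANMOUNT_RE.match(line)
        if match:
            state[match.group("dataset")] = match.group("value")
            continue
        match = DESTROY_RE.match(line)
        if match:
            state.pop(match.group("dataset"), None)
    return state


def suspicious_boot_envs(state, be_root, active_be):
    """List BEs under be_root, other than active_be, not set to 'noauto'."""
    prefix = be_root.rstrip("/") + "/"
    return sorted(
        dataset
        for dataset, value in state.items()
        if dataset.startswith(prefix)
        and dataset != active_be
        and value != "noauto"
    )


if __name__ == "__main__":
    state = last_canmount(HISTORY.splitlines())
    # The name of the newer, failing BE is not given in the thread;
    # this one is a hypothetical placeholder.
    for dataset in suspicious_boot_envs(
        state, "rpool/ROOT", "rpool/ROOT/hypothetical-new-be"
    ):
        print(dataset, "canmount =", state[dataset])
```

Run against the excerpt with a newer BE as the active one, the only dataset flagged is rpool/ROOT/2024-03-11-094351, which matches the "last" BE that ended up mounted over the running system.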
Re: Multiple issues with current (kldload failures, missing CTF stuff, pty issues, ...)
On 3/29/24 16:52, Alexander Leidinger wrote:
> Hi,
>
> sources from 2024-03-11 work. Sources from 2024-03-25 and today don't
> work (see below for the issue). As the monthly stabilisation pass
> didn't find obvious issues, it is something related to my setup:
>  - not a generic kernel
>  - very modular kernel (as much as possible as a module)
>  - bind_now (a build without it fails too, tested with clean /usr/obj)
>  - ccache (a build without it fails too, tested with clean /usr/obj)
>  - kernel retpoline (build without it in progress)
>  - userland retpoline (build without it in progress)
>  - kernel build with WITH_CTF / DDB_CTF (next one to test if it isn't
>    retpoline)
>  - -fno-builtin
>  - CPUFLAGS=native (except for stuff in /usr/src/sys/boot)
>  - malloc production
>  - COPTFLAGS= -O2 -pipe
>
> The issue is that kernel modules load OK from the loader, but once the
> kernel starts init, any module fails to load (e.g. via autodetection
> of hardware or rc.conf kld_list) with the message that the kernel and
> module versions are out of sync, and the module refuses to load.
>
> I tried the workaround of loading the modules from the loader, which
> works, but then I can't log in remotely as ssh fails to allocate a
> pty. By loading modules via the loader, I can see messages about
> missing CTF info when the nvidia modules (from ports = not yet rebuilt
> = in /boot/modules/...ko instead of /boot/kernel/...ko) try to get
> initialised... and it looks like they are failing to get initialised
> because of this missing CTF stuff (I'm back to the previous boot env
> to be able to log in remotely and send mails, I don't have a copy of
> the failure message at hand).
>
> I assume the missing CTF stuff is due to the CTF based pretty printing
> (https://cgit.freebsd.org/src/commit/?id=c21bc6f3c2425de74141bfee07b609bf65b5a6b3).
> Is this supposed to fail to load modules which are compiled without
> CTF data? Shouldn't this work gracefully (e.g. spit out a warning that
> pretty printing is not available for module X and have the module
> working)?

This is indeed how it works: those messages are emitted by the CTF
loading routines in 'kern/kern_ctf.c' as warnings and do not affect the
rest of the module loading process.

However, I completely agree that they are cryptic and spammy; I'll try
to do something about that.

Bojan
Re: Multiple issues with current (kldload failures, missing CTF stuff, pty issues, ...)
On 2024-03-29 18:13, Mark Johnston wrote:
> On Fri, Mar 29, 2024 at 04:52:55PM +0100, Alexander Leidinger wrote:
>> Hi,
>>
>> sources from 2024-03-11 work. Sources from 2024-03-25 and today don't
>> work (see below for the issue). As the monthly stabilisation pass
>> didn't find obvious issues, it is something related to my setup:
>>  - not a generic kernel
>>  - very modular kernel (as much as possible as a module)
>>  - bind_now (a build without it fails too, tested with clean /usr/obj)
>>  - ccache (a build without it fails too, tested with clean /usr/obj)
>>  - kernel retpoline (build without it in progress)
>>  - userland retpoline (build without it in progress)
>>  - kernel build with WITH_CTF / DDB_CTF (next one to test if it isn't
>>    retpoline)
>>  - -fno-builtin
>>  - CPUFLAGS=native (except for stuff in /usr/src/sys/boot)
>>  - malloc production
>>  - COPTFLAGS= -O2 -pipe
>>
>> The issue is that kernel modules load OK from the loader, but once
>> the kernel starts init, any module fails to load (e.g. via
>> autodetection of hardware or rc.conf kld_list) with the message that
>> the kernel and module versions are out of sync, and the module
>> refuses to load.
>
> What is the exact revision you're running? There were some unrelated
> changes to the kernel linker around the same time.

The working src is from 2024-03-11-094351 (GMT+0100). The failing src
was fetched after Gleb's stabilization week message (and today's src
before the sound stuff still fails).

Retpoline wasn't the cause, next test is the CTF stuff in the kernel...

>> I tried the workaround of loading the modules from the loader, which
>> works, but then I can't log in remotely as ssh fails to allocate a
>> pty. By loading modules via the loader, I can see messages about
>> missing CTF info when the nvidia modules (from ports = not yet
>> rebuilt = in /boot/modules/...ko instead of /boot/kernel/...ko) try
>> to get initialised... and it looks like they are failing to get
>> initialised because of this missing CTF stuff (I'm back to the
>> previous boot env to be able to log in remotely and send mails, I
>> don't have a copy of the failure message at hand).
>>
>> I assume the missing CTF stuff is due to the CTF based pretty printing
>> (https://cgit.freebsd.org/src/commit/?id=c21bc6f3c2425de74141bfee07b609bf65b5a6b3).
>> Is this supposed to fail to load modules which are compiled without
>> CTF data? Shouldn't this work gracefully (e.g. spit out a warning
>> that pretty printing is not available for module X and have the
>> module working)?
>
> From my reading of linker_ctf_load_file(), this is exactly how it
> already works.

Great that it works this way. I still suggest printing a message that
explains what the warning about missing stuff means.

Bye,
Alexander.

-- 
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF
Re: Multiple issues with current (kldload failures, missing CTF stuff, pty issues, ...)
On Fri, Mar 29, 2024 at 04:52:55PM +0100, Alexander Leidinger wrote:
> Hi,
>
> sources from 2024-03-11 work. Sources from 2024-03-25 and today don't
> work (see below for the issue). As the monthly stabilisation pass
> didn't find obvious issues, it is something related to my setup:
>  - not a generic kernel
>  - very modular kernel (as much as possible as a module)
>  - bind_now (a build without it fails too, tested with clean /usr/obj)
>  - ccache (a build without it fails too, tested with clean /usr/obj)
>  - kernel retpoline (build without it in progress)
>  - userland retpoline (build without it in progress)
>  - kernel build with WITH_CTF / DDB_CTF (next one to test if it isn't
>    retpoline)
>  - -fno-builtin
>  - CPUFLAGS=native (except for stuff in /usr/src/sys/boot)
>  - malloc production
>  - COPTFLAGS= -O2 -pipe
>
> The issue is that kernel modules load OK from the loader, but once the
> kernel starts init, any module fails to load (e.g. via autodetection
> of hardware or rc.conf kld_list) with the message that the kernel and
> module versions are out of sync, and the module refuses to load.

What is the exact revision you're running? There were some unrelated
changes to the kernel linker around the same time.

> I tried the workaround of loading the modules from the loader, which
> works, but then I can't log in remotely as ssh fails to allocate a
> pty. By loading modules via the loader, I can see messages about
> missing CTF info when the nvidia modules (from ports = not yet rebuilt
> = in /boot/modules/...ko instead of /boot/kernel/...ko) try to get
> initialised... and it looks like they are failing to get initialised
> because of this missing CTF stuff (I'm back to the previous boot env
> to be able to log in remotely and send mails, I don't have a copy of
> the failure message at hand).
>
> I assume the missing CTF stuff is due to the CTF based pretty printing
> (https://cgit.freebsd.org/src/commit/?id=c21bc6f3c2425de74141bfee07b609bf65b5a6b3).
> Is this supposed to fail to load modules which are compiled without
> CTF data? Shouldn't this work gracefully (e.g. spit out a warning that
> pretty printing is not available for module X and have the module
> working)?

From my reading of linker_ctf_load_file(), this is exactly how it
already works.

> Next steps:
>  - try a world without retpoline (bind_now and ccache active)
>  - try a kernel without CTF (bind_now, ccache, retpoline active)
>  - try a world without bind_now, retpoline, CTF, CPUFLAGS, COPTFLAGS
>
> If anyone has an idea how to debug this in some other way...
>
> Bye,
> Alexander.
>
> --
> http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
> http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF
Multiple issues with current (kldload failures, missing CTF stuff, pty issues, ...)
Hi,

sources from 2024-03-11 work. Sources from 2024-03-25 and today don't
work (see below for the issue). As the monthly stabilisation pass didn't
find obvious issues, it is something related to my setup:
 - not a generic kernel
 - very modular kernel (as much as possible as a module)
 - bind_now (a build without it fails too, tested with clean /usr/obj)
 - ccache (a build without it fails too, tested with clean /usr/obj)
 - kernel retpoline (build without it in progress)
 - userland retpoline (build without it in progress)
 - kernel build with WITH_CTF / DDB_CTF (next one to test if it isn't
   retpoline)
 - -fno-builtin
 - CPUFLAGS=native (except for stuff in /usr/src/sys/boot)
 - malloc production
 - COPTFLAGS= -O2 -pipe

The issue is that kernel modules load OK from the loader, but once the
kernel starts init, any module fails to load (e.g. via autodetection of
hardware or rc.conf kld_list) with the message that the kernel and
module versions are out of sync, and the module refuses to load.

I tried the workaround of loading the modules from the loader, which
works, but then I can't log in remotely as ssh fails to allocate a pty.
By loading modules via the loader, I can see messages about missing CTF
info when the nvidia modules (from ports = not yet rebuilt = in
/boot/modules/...ko instead of /boot/kernel/...ko) try to get
initialised... and it looks like they are failing to get initialised
because of this missing CTF stuff (I'm back to the previous boot env to
be able to log in remotely and send mails, I don't have a copy of the
failure message at hand).

I assume the missing CTF stuff is due to the CTF based pretty printing
(https://cgit.freebsd.org/src/commit/?id=c21bc6f3c2425de74141bfee07b609bf65b5a6b3).
Is this supposed to fail to load modules which are compiled without CTF
data? Shouldn't this work gracefully (e.g. spit out a warning that
pretty printing is not available for module X and have the module
working)?

Next steps:
 - try a world without retpoline (bind_now and ccache active)
 - try a kernel without CTF (bind_now, ccache, retpoline active)
 - try a world without bind_now, retpoline, CTF, CPUFLAGS, COPTFLAGS

If anyone has an idea how to debug this in some other way...

Bye,
Alexander.

-- 
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF