Bug#1052069: ways to proceed?
On 12/11/2023 15.21, Adam Majer wrote:
> On 2023-11-10 10:51, Andreas Beckmann wrote:
>> The module should continue to work on 6.1
>> The module should continue to work on 6.5 booted with ibt=off
>> The module should fail to load with an error message describing the issue on 6.5 with ibt enabled, but without a kernel BUG.
>>
>> Did you find some time to test the package patched with my IBT related changes?
>
> I've managed to add the patch today and recompile. Complete success and works as expected. It no longer crashes when ibt=on.
>
> Initially I've had an idea to automatically set this kernel command line boot values when the module is installed, but having this informative

Theoretically I could have patched the module to disable IBT at runtime while loading the incompatible module ... but I doubt that silently* lowering the system security level is a good idea ... that should rather be admin's choice. Also, this behavioral change wouldn't exist in any other distro.

*) silently includes kprint("LOWERING SECURITY LEVEL BY DISABLING IBT TO LOAD PROPRIETARY LEGACY KERNEL MODULE")

> message is actually better.
>
> One nit-pick is that the kernel message is not clear as to what module it's talking about.
>
> kernel: NVRM: This module is incompatible with IBT. Try booting with ibt=off.
>
> A better message could be,
>
> kernel: NVRM: This Nvidia driver is incompatible with IBT. Try booting with ibt=off.

Thanks for the feedback, I'll update the error message and apply this patch to all driver series up to 470.

(This does not seem to be fixed in 470.223.02, and maybe upstream will never provide an IBT-enabled build for the old driver series. The percentage of people running a modern cpu with a legacy gpu is probably very low.)

Andreas
Bug#1052069: ways to proceed?
On 2023-11-10 10:51, Andreas Beckmann wrote:
> The module should continue to work on 6.1
> The module should continue to work on 6.5 booted with ibt=off
> The module should fail to load with an error message describing the issue on 6.5 with ibt enabled, but without a kernel BUG.
>
> Did you find some time to test the package patched with my IBT related changes?

I've managed to add the patch today and recompile. Complete success and works as expected. It no longer crashes when ibt=on.

Initially I've had an idea to automatically set this kernel command line boot values when the module is installed, but having this informative message is actually better.

One nit-pick is that the kernel message is not clear as to what module it's talking about.

kernel: NVRM: This module is incompatible with IBT. Try booting with ibt=off.

A better message could be,

kernel: NVRM: This Nvidia driver is incompatible with IBT. Try booting with ibt=off.

Thanks!

- Adam
Bug#1052069: ways to proceed?
Hi Adam,

On 04/11/2023 01.24, Andreas Beckmann wrote:
> Or you could simply install the -dkms package from https://people.debian.org/~anbe/1052069/
>
> The module should continue to work on 6.1
> The module should continue to work on 6.5 booted with ibt=off
> The module should fail to load with an error message describing the issue on 6.5 with ibt enabled, but without a kernel BUG.

Did you find some time to test the package patched with my IBT related changes?

Thanks

Andreas
Bug#1052069: ways to proceed?
On 03/11/2023 22.39, Adam Majer wrote:
> You need to specify ibt=off to kernel at boot time for the older nvidia modules to work. Since the kernel has this protection enabled by default, it will have to be disabled until such time as nvidia bothers to update/recompile these older drivers like they did the recent ones.

Thanks for your test.

Since that protection can be disabled with ibt=off, it should be possible to test for the status (not available/enabled/disabled) at module load time and fail the load process with an informative error message instead of calling into the incompatible code from the blob.

The EoL driver series (e.g. anything predating the 470 series) will not see any further updates from NVIDIA. And people still really want to use these old drivers for ancient hardware.

...

Maybe that was really simple. Could you try the attached patch? (apply it to the source in /usr/src/nvidia-tesla-470-* and rebuild the dkms module)

Or you could simply install the -dkms package from https://people.debian.org/~anbe/1052069/

The module should continue to work on 6.1
The module should continue to work on 6.5 booted with ibt=off
The module should fail to load with an error message describing the issue on 6.5 with ibt enabled, but without a kernel BUG.

Andreas

From 17b722086d1cbc19ec6ea811a334ff9d90ad90e6 Mon Sep 17 00:00:00 2001
From: Andreas Beckmann
Date: Sat, 4 Nov 2023 00:44:56 +0100
Subject: [PATCH] refuse to load module if IBT is enabled

---
 nvidia-modeset/nvidia-modeset-linux.c | 7 +++
 nvidia/nv.c                           | 7 +++
 2 files changed, 14 insertions(+)

diff --git a/nvidia-modeset/nvidia-modeset-linux.c b/nvidia-modeset/nvidia-modeset-linux.c
index 04a8ac4..95e668e 100644
--- a/nvidia-modeset/nvidia-modeset-linux.c
+++ b/nvidia-modeset/nvidia-modeset-linux.c
@@ -1651,6 +1651,13 @@ static int __init nvkms_init(void)
 {
     int ret;
 
+#ifdef CONFIG_X86_KERNEL_IBT
+    if (cpu_feature_enabled(X86_FEATURE_IBT)) {
+        printk(KERN_ERR NVKMS_LOG_PREFIX "This module is incompatible with IBT. Try booting with ibt=off.");
+        return -EINVAL;
+    }
+#endif
+
     atomic_set(&nvkms_alloc_called_count, 0);
 
     ret = nvkms_alloc_rm();
diff --git a/nvidia/nv.c b/nvidia/nv.c
index 42778da..a005d69 100644
--- a/nvidia/nv.c
+++ b/nvidia/nv.c
@@ -739,6 +739,13 @@ int __init nvidia_init_module(void)
     nvidia_stack_t *sp = NULL;
     NvU32 allow_no_gpu_init = 0;
 
+#ifdef CONFIG_X86_KERNEL_IBT
+    if (cpu_feature_enabled(X86_FEATURE_IBT)) {
+        printk(KERN_ERR "NVRM: This module is incompatible with IBT. Try booting with ibt=off.");
+        return -EINVAL;
+    }
+#endif
+
     nv_memdbg_init();
 
     rc = nv_procfs_init();
-- 
2.20.1
Bug#1052069: ways to proceed?
reopen 1052069
retitle -1 kernel oops on module load due to IBT=ON in recent kernels
thanks

On 2023-11-03 12:42, Andreas Beckmann wrote:
> Hi Adam,
>
> On 31/10/2023 22.06, Adam Majer wrote:
>> So, what's the way to proceed here? Can we add the boot parameter when the legacy kernel module is to be loaded on newer Intel processors?
>
> I probably made a mistake while backporting some non-trivial changes for supporting recent kernels. With the availability of 470.223.02 I could verify my backport against the upstream version, drop a lot of unneeded bits and fix some discrepancies...

So, I've now tested it against 6.5.0-3-amd64 and the problem remains. It's not related to any of your "mistakes" :-)

You need to specify ibt=off to kernel at boot time for the older nvidia modules to work. Since the kernel has this protection enabled by default, it will have to be disabled until such time as nvidia bothers to update/recompile these older drivers like they did the recent ones.

- Adam
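For anyone who wants to confirm from user space whether the running kernel was actually booted with ibt=off, here is a rough sketch in C. It only looks for the literal token "ibt=off" on the kernel command line (read from /proc/cmdline). It does not tell you whether the CPU supports IBT at all or whether the kernel was built with CONFIG_X86_KERNEL_IBT. The function names cmdline_has_ibt_off and running_kernel_has_ibt_off are made up for this illustration and are not part of any driver or package.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: returns true if the whitespace-separated token
 * "ibt=off" appears anywhere in the given kernel command line string.
 */
bool cmdline_has_ibt_off(const char *cmdline)
{
    static const char want[] = "ibt=off";
    const size_t want_len = sizeof(want) - 1;
    const char *p = cmdline;

    while (*p != '\0') {
        /* Skip separators, then measure the next token. */
        p += strspn(p, " \t\n");
        size_t len = strcspn(p, " \t\n");

        /* Exact token match only, so "noibt=off" or "ibt=offx" do not count. */
        if (len == want_len && strncmp(p, want, want_len) == 0)
            return true;
        p += len;
    }
    return false;
}

/*
 * Hypothetical helper: reads /proc/cmdline of the running kernel.
 * Returns 1 if ibt=off was passed, 0 if not, -1 if the file is unreadable.
 */
int running_kernel_has_ibt_off(void)
{
    char buf[4096];
    FILE *f = fopen("/proc/cmdline", "r");

    if (f == NULL)
        return -1;

    size_t n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';

    return cmdline_has_ibt_off(buf) ? 1 : 0;
}
```

Calling running_kernel_has_ibt_off() from a small main() and printing the result is enough to see whether the boot parameter took effect after editing the bootloader configuration.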
Bug#1052069: ways to proceed?
On 11/3/23 12:42, Andreas Beckmann wrote:
> I probably made a mistake while backporting some non-trivial changes for supporting recent kernels. With the availability of 470.223.02 I could verify my backport against the upstream version, drop a lot of unneeded bits and fix some discrepancies...

Hi Andreas!

I will verify if this fixes this issue, though I will still bet that the ibt=on in the new kernels is the cause.

Apparently in newer kernels, Indirect Branch Tracking [1] support was enabled by default [2] in the kernel, but older nvidia drivers do not have this support and end up with undefined behaviour instead. [3] On newer drivers they added support [4]

This breaks 11th generation and newer intel CPUs with older nvidia cards. Maybe not so common :-)

> Please test the new driver version on a recent kernel once it gets available on your mirror in a few hours.

Will test in few hours.

- Adam

[1] - https://lwn.net/Articles/889475/
[2] - https://www.phoronix.com/news/Linux-IBT-By-Default-Tip
[3] - https://forum.garudalinux.org/t/nvidia-driver-crashes-my-computer-at-start-up/21157
[4] - https://www.nvidia.de/download/driverResults.aspx/200489/us
Bug#1052069: ways to proceed?
Hi Adam,

On 31/10/2023 22.06, Adam Majer wrote:
> So, what's the way to proceed here? Can we add the boot parameter when the legacy kernel module is to be loaded on newer Intel processors?

I probably made a mistake while backporting some non-trivial changes for supporting recent kernels. With the availability of 470.223.02 I could verify my backport against the upstream version, drop a lot of unneeded bits and fix some discrepancies...

I've just uploaded 470.199.02-3 with these corrections (and support for Linux 6.6). This is not yet the new upstream release as I want to verify that these changes really fix the issue, as the same probably buggy backports have been applied to all older EoL driver series and need to be fixed there too.

Please test the new driver version on a recent kernel once it gets available on your mirror in a few hours.

Andreas
Bug#1052069: ways to proceed?
Hi, So, what's the way to proceed here? Can we add the boot parameter when the legacy kernel module is to be loaded on newer Intel processors? - Adam