Bug#1052069: ways to proceed?

2023-11-12 Thread Andreas Beckmann

On 12/11/2023 15.21, Adam Majer wrote:

On 2023-11-10 10:51, Andreas Beckmann wrote:

The module should continue to work on 6.1
The module should continue to work on 6.5 booted with ibt=off
The module should fail to load with an error message describing the 
issue on 6.5 with ibt enabled, but without a kernel BUG.


Did you find some time to test the package patched with my IBT related 
changes?


I've managed to add the patch today and recompile. Complete success and 
works as expected. It no longer crashes when ibt=on.


Initially I had the idea of automatically setting this kernel
command-line boot value when the module is installed, but having this informative


Theoretically I could have patched the module to disable IBT at runtime
while loading the incompatible module ... but I doubt that silently*
lowering the system security level is a good idea ... that should rather
be the admin's choice. Also, this behavioral change wouldn't exist in any
other distro.


*) silently includes printk("LOWERING SECURITY LEVEL BY DISABLING IBT TO
LOAD PROPRIETARY LEGACY KERNEL MODULE")


message is actually better. One nit-pick is that the kernel message is 
not clear as to what module it's talking about.


kernel: NVRM: This module is incompatible with IBT. Try booting with 
ibt=off.


A better message could be,

kernel: NVRM: This Nvidia driver is incompatible with IBT. Try booting 
with ibt=off.


Thanks for the feedback, I'll update the error message and apply this
patch to all driver series up to 470. (This does not seem to be fixed in
470.223.02, and maybe upstream will never provide an IBT-enabled build
for the old driver series. The percentage of people running a modern CPU
with a legacy GPU is probably very low.)


Andreas



Bug#1052069: ways to proceed?

2023-11-12 Thread Adam Majer

On 2023-11-10 10:51, Andreas Beckmann wrote:

The module should continue to work on 6.1
The module should continue to work on 6.5 booted with ibt=off
The module should fail to load with an error message describing the 
issue on 6.5 with ibt enabled, but without a kernel BUG.


Did you find some time to test the package patched with my IBT related 
changes?


I've managed to add the patch today and recompile. Complete success and 
works as expected. It no longer crashes when ibt=on.


Initially I had the idea of automatically setting this kernel
command-line boot value when the module is installed, but having this
informative message is actually better. One nit-pick is that the kernel
message is not clear as to which module it's talking about.


kernel: NVRM: This module is incompatible with IBT. Try booting with 
ibt=off.


A better message could be,

kernel: NVRM: This Nvidia driver is incompatible with IBT. Try booting 
with ibt=off.


Thanks!
- Adam



Bug#1052069: ways to proceed?

2023-11-10 Thread Andreas Beckmann

Hi Adam,

On 04/11/2023 01.24, Andreas Beckmann wrote:

Or you could simply install the -dkms package from
https://people.debian.org/~anbe/1052069/

The module should continue to work on 6.1
The module should continue to work on 6.5 booted with ibt=off
The module should fail to load with an error message describing the 
issue on 6.5 with ibt enabled, but without a kernel BUG.


Did you find some time to test the package patched with my IBT related 
changes?


Thanks

Andreas



Bug#1052069: ways to proceed?

2023-11-03 Thread Andreas Beckmann

On 03/11/2023 22.39, Adam Majer wrote:
You need to pass ibt=off to the kernel at boot time for the older nvidia
modules to work. Since the kernel has this protection enabled by
default, it will have to be disabled until such time as nvidia bothers
to update/recompile these older drivers like they did the recent ones.


Thanks for your test.

Since that protection can be disabled with ibt=off, it should be 
possible to test for the status (not available/enabled/disabled) at 
module load time and fail the load process with an informative error 
message instead of calling into the incompatible code from the blob.
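
The three states mentioned above can be modelled in plain userspace C. This is only a rough sketch of the decision logic, not kernel code: the enum, both function names and the three boolean inputs are made up for illustration, and in a real module the status would have to come from the kernel itself.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical three-state view of IBT as seen at module load time. */
enum ibt_status {
    IBT_NOT_AVAILABLE, /* kernel built without IBT, or CPU lacks it */
    IBT_DISABLED,      /* available, but switched off with ibt=off  */
    IBT_ENABLED        /* available and active                      */
};

/* Hypothetical helper: derive the status from three assumed inputs. */
enum ibt_status ibt_classify(bool kernel_has_ibt, bool cpu_has_ibt,
                             bool booted_with_ibt_off)
{
    if (!kernel_has_ibt || !cpu_has_ibt)
        return IBT_NOT_AVAILABLE;
    return booted_with_ibt_off ? IBT_DISABLED : IBT_ENABLED;
}

/*
 * Hypothetical helper: result an IBT-incompatible module should return
 * from its init function. Only the "enabled" state refuses the load.
 */
int legacy_module_init_result(enum ibt_status status)
{
    assert(status == IBT_NOT_AVAILABLE || status == IBT_DISABLED ||
           status == IBT_ENABLED);
    return status == IBT_ENABLED ? -EINVAL : 0;
}
```

Only the enabled state maps to an error, so loading stays possible on older kernels and on systems booted with ibt=off.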


The EoL driver series (e.g. anything predating the 470 series) will not 
see any further updates from NVIDIA. And people still really want to use 
these old drivers for ancient hardware.


...

Maybe that was really simple. Could you try the attached patch?
(apply it to the source in /usr/src/nvidia-tesla-470-* and rebuild the 
dkms module)

Or you could simply install the -dkms package from
https://people.debian.org/~anbe/1052069/

The module should continue to work on 6.1
The module should continue to work on 6.5 booted with ibt=off
The module should fail to load with an error message describing the 
issue on 6.5 with ibt enabled, but without a kernel BUG.



Andreas

From 17b722086d1cbc19ec6ea811a334ff9d90ad90e6 Mon Sep 17 00:00:00 2001
From: Andreas Beckmann 
Date: Sat, 4 Nov 2023 00:44:56 +0100
Subject: [PATCH] refuse to load module if IBT is enabled

---
 nvidia-modeset/nvidia-modeset-linux.c | 7 +++
 nvidia/nv.c   | 7 +++
 2 files changed, 14 insertions(+)

diff --git a/nvidia-modeset/nvidia-modeset-linux.c b/nvidia-modeset/nvidia-modeset-linux.c
index 04a8ac4..95e668e 100644
--- a/nvidia-modeset/nvidia-modeset-linux.c
+++ b/nvidia-modeset/nvidia-modeset-linux.c
@@ -1651,6 +1651,13 @@ static int __init nvkms_init(void)
 {
 int ret;
 
+#ifdef CONFIG_X86_KERNEL_IBT
+if (cpu_feature_enabled(X86_FEATURE_IBT)) {
+printk(KERN_ERR NVKMS_LOG_PREFIX "This module is incompatible with IBT. Try booting with ibt=off.");
+return -EINVAL;
+}
+#endif
+
 atomic_set(&nvkms_alloc_called_count, 0);
 
 ret = nvkms_alloc_rm();
diff --git a/nvidia/nv.c b/nvidia/nv.c
index 42778da..a005d69 100644
--- a/nvidia/nv.c
+++ b/nvidia/nv.c
@@ -739,6 +739,13 @@ int __init nvidia_init_module(void)
 nvidia_stack_t *sp = NULL;
 NvU32 allow_no_gpu_init = 0;
 
+#ifdef CONFIG_X86_KERNEL_IBT
+if (cpu_feature_enabled(X86_FEATURE_IBT)) {
+printk(KERN_ERR "NVRM: This module is incompatible with IBT. Try booting with ibt=off.");
+return -EINVAL;
+}
+#endif
+
 nv_memdbg_init();
 
 rc = nv_procfs_init();
-- 
2.20.1



Bug#1052069: ways to proceed?

2023-11-03 Thread Adam Majer

reopen 1052069
retitle -1 kernel oops on module load due to IBT=ON in recent kernels
thanks

On 2023-11-03 12:42, Andreas Beckmann wrote:

Hi Adam,

On 31/10/2023 22.06, Adam Majer wrote:
So, what's the way to proceed here? Can we add the boot parameter when 
the legacy kernel module is to be loaded on newer Intel processors?


I probably made a mistake while backporting some non-trivial changes for 
supporting recent kernels. With the availability of 470.223.02 I could 
verify my backport against the upstream version, drop a lot of unneeded 
bits and fix some discrepancies...


So, I've now tested it against 6.5.0-3-amd64 and the problem remains. 
It's not related to any of your "mistakes" :-)


You need to pass ibt=off to the kernel at boot time for the older nvidia
modules to work. Since the kernel has this protection enabled by
default, it will have to be disabled until such time as nvidia bothers
to update/recompile these older drivers like they did the recent ones.
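
To check whether a running system was actually booted that way, something like the following could be applied to the contents of /proc/cmdline. It is a minimal sketch: the function name is made up, and it only looks for the literal ibt=off token without trying to mimic how the kernel itself parses its parameters.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: return true if the whitespace-separated kernel
 * command line contains the exact parameter "ibt=off".
 */
bool cmdline_has_ibt_off(const char *cmdline)
{
    static const char token[] = "ibt=off";
    const size_t token_len = sizeof(token) - 1;
    const char *p = cmdline;

    assert(cmdline != NULL);

    while (*p != '\0') {
        /* Skip the separators in front of the next parameter. */
        p += strspn(p, " \t\n");

        /* Length of the parameter up to the next separator. */
        size_t len = strcspn(p, " \t\n");

        if (len == token_len && strncmp(p, token, token_len) == 0)
            return true;
        p += len;
    }
    return false;
}
```

Comparing whole tokens rather than using a substring search avoids false matches on parameters that merely contain the same characters.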


- Adam



Bug#1052069: ways to proceed?

2023-11-03 Thread Adam Majer

On 11/3/23 12:42, Andreas Beckmann wrote:
I probably made a mistake while backporting some non-trivial changes for 
supporting recent kernels. With the availability of 470.223.02 I could 
verify my backport against the upstream version, drop a lot of unneeded 
bits and fix some discrepancies...


Hi Andreas!

I will verify whether this fixes the issue, though I would still bet
that IBT being enabled in the new kernels is the cause. Apparently
Indirect Branch Tracking [1] support was enabled by default [2] in newer
kernels, but older nvidia drivers do not have this support and end up
with undefined behaviour instead [3]. Newer drivers added support [4].


This breaks 11th generation and newer Intel CPUs with older nvidia
cards. Maybe not so common :-)
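
A quick way to see whether a given machine is in that group might be to look for the IBT flag in /proc/cpuinfo. The sketch below assumes the flag is listed there as "ibt" (I have not verified that for every kernel version), and the helper name is made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: scan a /proc/cpuinfo-style stream for a "flags"
 * line that lists the given flag as a whole word. Lines longer than the
 * buffer are not handled specially in this sketch.
 */
bool cpuinfo_has_flag(FILE *in, const char *flag)
{
    char line[8192];

    assert(in != NULL && flag != NULL);

    while (fgets(line, sizeof(line), in) != NULL) {
        /* Only lines that start with "flags" are of interest. */
        if (strncmp(line, "flags", 5) != 0)
            continue;

        char *list = strchr(line, ':');
        if (list == NULL)
            continue;

        /* Walk the space-separated flag names after the colon. */
        for (char *tok = strtok(list + 1, " \t\n"); tok != NULL;
             tok = strtok(NULL, " \t\n")) {
            if (strcmp(tok, flag) == 0)
                return true;
        }
    }
    return false;
}
```

It would be called with a stream opened on /proc/cpuinfo and the assumed flag name "ibt".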



Please test the new driver version on a recent kernel once it becomes
available on your mirror in a few hours.


Will test in a few hours.

- Adam

[1] - https://lwn.net/Articles/889475/
[2] - https://www.phoronix.com/news/Linux-IBT-By-Default-Tip
[3] - https://forum.garudalinux.org/t/nvidia-driver-crashes-my-computer-at-start-up/21157
[4] - https://www.nvidia.de/download/driverResults.aspx/200489/us



Bug#1052069: ways to proceed?

2023-11-03 Thread Andreas Beckmann

Hi Adam,

On 31/10/2023 22.06, Adam Majer wrote:
So, what's the way to proceed here? Can we add the boot parameter when 
the legacy kernel module is to be loaded on newer Intel processors?


I probably made a mistake while backporting some non-trivial changes for 
supporting recent kernels. With the availability of 470.223.02 I could 
verify my backport against the upstream version, drop a lot of unneeded 
bits and fix some discrepancies...


I've just uploaded 470.199.02-3 with these corrections (and support for 
Linux 6.6). This is not yet the new upstream release as I want to verify 
that these changes really fix the issue, as the same probably buggy 
backports have been applied to all older EoL driver series and need to 
be fixed there too.


Please test the new driver version on a recent kernel once it becomes
available on your mirror in a few hours.



Andreas



Bug#1052069: ways to proceed?

2023-10-31 Thread Adam Majer

Hi,

So, what's the way to proceed here? Can we add the boot parameter when 
the legacy kernel module is to be loaded on newer Intel processors?


- Adam