Re: [gentoo-user] Invalid opcode after kernel update
On Monday, 18 September 2023, 20:52:27 CEST, Fernando Rodriguez wrote:
> On 9/18/23 11:04, Fernando Rodriguez wrote:
> > On 9/17/23 18:03, Alan Mackenzie wrote:
> > I will try to run it on gdb to find out which instruction is triggering
> > the fault.
> >
> > Thanks,
> > Fernando
>
> The crash is happening on AVX2 instructions. My CPU is Intel(R) Core(TM)
> i7-8809G CPU @ 3.10GHz and it's supposed to have AVX2 but I don't see it
> listed on /proc/cpuinfo. I can't reboot into the old kernel right now
> but I suspect that when I do it will be there because I kind of remember
> seeing it there. Any clues?

It is Intel DOWNFALL, also called GDS (Gather Data Sampling).
Maybe you want to read:
https://www.phoronix.com/review/downfall

Regards,
Peter
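For anyone hitting the same thing: kernels that carry the GDS patches report the state they settled on in a sysfs file, which is quicker to check than reading the whole article. The sketch below is not from the thread. It assumes the file /sys/devices/system/cpu/vulnerabilities/gather_data_sampling exists on your kernel, and the helper names `classify_gds` and `read_gds_status` are made up for illustration. Matching is done on substrings because the exact wording of the status line may differ between kernel versions.

```python
from pathlib import Path

# Assumed location; kernels without the GDS patches do not have this file.
GDS_SYSFS = Path("/sys/devices/system/cpu/vulnerabilities/gather_data_sampling")


def classify_gds(status: str) -> str:
    """Map the kernel's GDS status line to a short verdict.

    Hypothetical helper, not a kernel API. Substring matching is used on
    purpose so that small wording changes do not break the check.
    """
    text = status.strip().lower()
    if "avx disabled" in text:
        return "avx-disabled"   # mitigation forced because microcode is too old
    if text.startswith("mitigation"):
        return "mitigated"      # microcode-based mitigation is active
    if text.startswith("not affected"):
        return "not-affected"
    if text.startswith("vulnerable"):
        return "vulnerable"
    return "unknown"


def read_gds_status(path: Path = GDS_SYSFS) -> str:
    """Return the raw status line, or an empty string if the file is missing."""
    try:
        return path.read_text().strip()
    except OSError:
        return ""


if __name__ == "__main__":
    raw = read_gds_status()
    print(f"raw status: {raw!r}")
    print(f"verdict:    {classify_gds(raw)}")
```

A verdict of "avx-disabled" would match the symptom described in this thread: AVX and AVX2 vanish from /proc/cpuinfo and binaries that use them die with an invalid opcode.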
Re: [gentoo-user] Invalid opcode after kernel update
On 9/18/23 14:52, Fernando Rodriguez wrote:
> On 9/18/23 11:04, Fernando Rodriguez wrote:
> > On 9/17/23 18:03, Alan Mackenzie wrote:
> > I will try to run it on gdb to find out which instruction is triggering
> > the fault.
> >
> > Thanks,
> > Fernando
>
> The crash is happening on AVX2 instructions. My CPU is Intel(R) Core(TM)
> i7-8809G CPU @ 3.10GHz and it's supposed to have AVX2 but I don't see it
> listed on /proc/cpuinfo. I can't reboot into the old kernel right now
> but I suspect that when I do it will be there because I kind of remember
> seeing it there. Any clues?

Found this in my journal:

"GDS: Microcode update needed! Disabling AVX as mitigation."

So I guess it's a microcode issue. I'm using dracut with --early-microcode,
I have CONFIG_MICROCODE_INTEL set, and I have the latest (as of Friday)
intel-microcode. I don't have initramfs enabled for intel-microcode, but I
never did and it was working. Will try it when I get back, gotta run now.

Any more ideas?

--
Fernando Rodriguez
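One way to tell whether the early microcode load is actually taking effect is to compare the revision the CPU reports before and after changing the initramfs setup. The sketch below is only an illustration and not something from the thread. It assumes an x86 kernel whose /proc/cpuinfo has a "microcode" field per logical CPU, and `microcode_revisions` is a hypothetical helper name. It cannot tell you whether a given revision contains the GDS fix; that has to be checked against Intel's release notes.

```python
def microcode_revisions(cpuinfo_text: str) -> set:
    """Collect the distinct microcode revisions found in /proc/cpuinfo text.

    Each logical CPU contributes a line such as "microcode : 0xf0".
    More than one value in the result would mean the cores disagree.
    """
    revisions = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "microcode":
            # The value is hexadecimal with a 0x prefix.
            revisions.add(int(value.strip(), 16))
    return revisions


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            found = microcode_revisions(handle.read())
    except OSError:
        found = set()
    print("microcode revisions:", sorted(hex(r) for r in found))
```

If the revision is the same with and without the updated initramfs, the new microcode is probably not being loaded at early boot, which would be consistent with the kernel falling back to disabling AVX.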
Re: [gentoo-user] Invalid opcode after kernel update
On 9/18/23 11:04, Fernando Rodriguez wrote:
> On 9/17/23 18:03, Alan Mackenzie wrote:
> I will try to run it on gdb to find out which instruction is triggering
> the fault.
>
> Thanks,
> Fernando

The crash is happening on AVX2 instructions. My CPU is Intel(R) Core(TM)
i7-8809G CPU @ 3.10GHz and it's supposed to have AVX2 but I don't see it
listed on /proc/cpuinfo. I can't reboot into the old kernel right now
but I suspect that when I do it will be there because I kind of remember
seeing it there. Any clues?

--
Fernando Rodriguez
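Checking /proc/cpuinfo by eye is error-prone because the flags line is very long, so here is a small sketch that does the comparison for you. It is not part of the original message. The helper names `cpu_flags` and `missing_flags` are hypothetical, and the default list of wanted flags (avx, avx2) is simply the pair relevant to this thread.

```python
def cpu_flags(cpuinfo_text: str) -> set:
    """Return the feature flags of the first CPU listed in /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Exact match on "flags" so that lines like "vmx flags" are skipped.
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_flags(cpuinfo_text: str, wanted=("avx", "avx2")) -> list:
    """List the wanted flags that the kernel does not advertise."""
    flags = cpu_flags(cpuinfo_text)
    return [flag for flag in wanted if flag not in flags]


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            text = handle.read()
    except OSError:
        text = ""
    print("missing:", missing_flags(text) or "none")
```

Running it once under the old kernel and once under the new one would show directly whether the flags disappeared with the kernel update.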
Re: [gentoo-user] Invalid opcode after kernel update
On 9/17/23 18:03, Alan Mackenzie wrote:
> Hello, Fernando.
>
> On Sun, Sep 17, 2023 at 17:49:22 -0400, Fernando Rodriguez wrote:
> > A few months ago after updating my kernel I started getting an invalid
> > opcode error during boot on the init process on my initramfs which I did
> > rebuilt. Switching to the old kernel and initramfs fixed the problem so
> > I kept that kernel for a few months for lack of time.
>
> > Today I rebuilt the whole system using `emerge -e @world` and after that
> > I'm able to boot the new kernel but now some pre-compiled packages (and
> > some that emerge -e missed because the ebuild was masked) crash with
> > illegal opcode. In the case of chrome it's not crashing but it only
> > renders garbage for webpages.
>
> > Does anyone have a clue what is happening? It's like the instruction set
> > changed after the kernel update (or was it the microcode?)
>
> Could it be that you've got a sporadic RAM failure? Running the standard
> RAM test (the one you boot into, I've forgotten its name) for many hours
> might pin down the problem.

I ran the test to be sure but it's not sporadic. It happens all the time
with the same pre-built binaries. My last working kernel was 5.15.122, if I
boot from that kernel everything works.

Before the update everything was built with -march=native and before the
'emerge -e' I switched to -mtune=generic but I don't think it was the flags
that messed it up but the act of rebuilding because after rebuilding the
whole system I'm still having issues with pre-compiled binaries and those
should be generic builds.

Strangely the same binaries that crash on the host system run fine on a VM
using hw virtualization.

I will try to run it on gdb to find out which instruction is triggering the
fault.

Thanks,
Fernando
Re: [gentoo-user] Invalid opcode after kernel update
Hello, Fernando.

On Sun, Sep 17, 2023 at 17:49:22 -0400, Fernando Rodriguez wrote:
> A few months ago after updating my kernel I started getting an invalid
> opcode error during boot on the init process on my initramfs which I did
> rebuilt. Switching to the old kernel and initramfs fixed the problem so
> I kept that kernel for a few months for lack of time.

> Today I rebuilt the whole system using `emerge -e @world` and after that
> I'm able to boot the new kernel but now some pre-compiled packages (and
> some that emerge -e missed because the ebuild was masked) crash with
> illegal opcode. In the case of chrome it's not crashing but it only
> renders garbage for webpages.

> Does anyone have a clue what is happening? It's like the instruction set
> changed after the kernel update (or was it the microcode?)

Could it be that you've got a sporadic RAM failure? Running the standard
RAM test (the one you boot into, I've forgotten its name) for many hours
might pin down the problem.

> Thanks,

> --
> Fernando Rodriguez

--
Alan Mackenzie (Nuremberg, Germany).