Re: [gentoo-user] Invalid opcode after kernel update

2023-09-18 Thread Peter Böhm
On Monday, 18 September 2023, 20:52:27 CEST, Fernando Rodriguez wrote:
> On 9/18/23 11:04, Fernando Rodriguez wrote:
> > On 9/17/23 18:03, Alan Mackenzie wrote:
> > I will try to run it on gdb to find out which instruction is triggering
> > the fault.
> >
> > Thanks,
> > Fernando
>
> The crash is happening on AVX2 instructions. My CPU is Intel(R) Core(TM)
> i7-8809G CPU @ 3.10GHz and it's supposed to have AVX2 but I don't see it
> listed on /proc/cpuinfo. I can't reboot into the old kernel right now
> but I suspect that when I do it will be there because I kind of remember
>   seeing it there. Any clues?

This is Intel Downfall, also known as GDS (Gather Data Sampling).

You may want to read: https://www.phoronix.com/review/downfall
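
To see what the running kernel reports for GDS, a minimal sketch follows. It assumes a kernel that carries the GDS patches and therefore exposes the gather_data_sampling file in sysfs; describe_gds and read_gds_status are hypothetical helper names, not anything from this thread.

```python
from pathlib import Path

# sysfs file exposed by kernels that carry the GDS patches;
# it is absent on older kernels (assumption: standard sysfs mount).
GDS_STATUS = Path("/sys/devices/system/cpu/vulnerabilities/gather_data_sampling")


def read_gds_status(path=GDS_STATUS):
    """Return the kernel's GDS status string, or None if it cannot be read."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def describe_gds(status):
    """Turn the status string into a short hint (hypothetical helper)."""
    if status is None:
        return "kernel does not report GDS (file missing)"
    if "AVX disabled" in status:
        return "AVX is disabled as a fallback; updated microcode is needed"
    if status.startswith("Mitigation"):
        return "mitigated; AVX should still be available"
    if status.startswith("Not affected"):
        return "CPU not affected"
    return "not mitigated: " + status


if __name__ == "__main__":
    print(describe_gds(read_gds_status()))
```

If the hint says AVX is disabled, that would match binaries built with AVX2 dying with an invalid opcode.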

Regards,
 Peter





Re: [gentoo-user] Invalid opcode after kernel update

2023-09-18 Thread Fernando Rodriguez

On 9/18/23 14:52, Fernando Rodriguez wrote:
> On 9/18/23 11:04, Fernando Rodriguez wrote:
> > On 9/17/23 18:03, Alan Mackenzie wrote:
> > I will try to run it on gdb to find out which instruction is
> > triggering the fault.
> >
> > Thanks,
> > Fernando
>
> The crash is happening on AVX2 instructions. My CPU is Intel(R) Core(TM)
> i7-8809G CPU @ 3.10GHz and it's supposed to have AVX2 but I don't see it
> listed on /proc/cpuinfo. I can't reboot into the old kernel right now
> but I suspect that when I do it will be there because I kind of remember
> seeing it there. Any clues?

Found this in my journal: "GDS: Microcode update needed! Disabling AVX
as mitigation." So I guess it's a microcode issue. I'm using dracut with
--early-microcode, I have CONFIG_MICROCODE_INTEL set, and I have the
latest (as of Friday) intel-microcode. I don't have the initramfs USE
flag enabled for intel-microcode, but I never did and it was working. I
will try it when I get back; gotta run now. Any more ideas?
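
A rough sketch for comparing the loaded microcode revision between boots and for pulling the GDS lines out of a saved kernel log. It assumes an x86 /proc/cpuinfo with a "microcode" field, and that the log text was saved to a file beforehand; the function names and the boot.log filename are made up for illustration.

```python
def read_text_or_empty(path):
    """Read a text file, returning an empty string if it cannot be read."""
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return ""


def microcode_revisions(cpuinfo_text):
    """Collect the distinct 'microcode' values reported across all CPUs."""
    revisions = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "microcode":
            revisions.add(value.strip())
    return sorted(revisions)


def gds_log_lines(log_text):
    """Return the log lines that mention GDS, such as the warning quoted above."""
    return [line for line in log_text.splitlines() if "GDS:" in line]


if __name__ == "__main__":
    print("microcode:", microcode_revisions(read_text_or_empty("/proc/cpuinfo")))
    # "boot.log" is a hypothetical file holding saved kernel messages.
    for line in gds_log_lines(read_text_or_empty("boot.log")):
        print(line)
```

If the revision is the same on the old and new kernel, the early microcode update is probably not being applied on either.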


--

Fernando Rodriguez




Re: [gentoo-user] Invalid opcode after kernel update

2023-09-18 Thread Fernando Rodriguez

On 9/18/23 11:04, Fernando Rodriguez wrote:
> On 9/17/23 18:03, Alan Mackenzie wrote:
> I will try to run it on gdb to find out which instruction is triggering
> the fault.
>
> Thanks,
> Fernando

The crash is happening on AVX2 instructions. My CPU is Intel(R) Core(TM)
i7-8809G CPU @ 3.10GHz and it's supposed to have AVX2 but I don't see it
listed on /proc/cpuinfo. I can't reboot into the old kernel right now
but I suspect that when I do it will be there because I kind of remember
seeing it there. Any clues?
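
For comparing the two kernels, a small sketch that lists which AVX flags are missing from /proc/cpuinfo. It assumes the x86 layout, where the feature list sits on a line keyed "flags"; cpu_flags and missing_flags are illustrative names.

```python
def cpu_flags(cpuinfo_text):
    """Return the flag set from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_flags(cpuinfo_text, wanted=("avx", "avx2")):
    """List the wanted flags that the kernel does not advertise."""
    flags = cpu_flags(cpuinfo_text)
    return [flag for flag in wanted if flag not in flags]


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = ""  # not on Linux, or /proc is not mounted
    print("missing:", missing_flags(text) or "none")
```

Running it once on each kernel shows whether the flags disappear only on the new one.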


--

Fernando Rodriguez




Re: [gentoo-user] Invalid opcode after kernel update

2023-09-18 Thread Fernando Rodriguez

On 9/17/23 18:03, Alan Mackenzie wrote:
> Hello, Fernando.
>
> On Sun, Sep 17, 2023 at 17:49:22 -0400, Fernando Rodriguez wrote:
> > A few months ago after updating my kernel I started getting an invalid
> > opcode error during boot on the init process on my initramfs which I did
> > rebuild. Switching to the old kernel and initramfs fixed the problem so
> > I kept that kernel for a few months for lack of time.
> >
> > Today I rebuilt the whole system using `emerge -e @world` and after that
> > I'm able to boot the new kernel but now some pre-compiled packages (and
> > some that emerge -e missed because the ebuild was masked) crash with
> > illegal opcode. In the case of chrome it's not crashing but it only
> > renders garbage for webpages.
> >
> > Does anyone have a clue what is happening? It's like the instruction set
> > changed after the kernel update (or was it the microcode?)
>
> Could it be that you've got a sporadic RAM failure?  Running the
> standard RAM test (the one you boot into, I've forgotten its name) for
> many hours might pin down the problem.

I ran the test to be sure, but the problem is not sporadic. It happens
every time with the same pre-built binaries. My last working kernel was
5.15.122; if I boot from that kernel everything works.

Before the update everything was built with -march=native, and before
the 'emerge -e' I switched to -mtune=generic. I don't think the flags
are to blame, but rather the act of rebuilding, because after rebuilding
the whole system I'm still having issues with pre-compiled binaries, and
those should be generic builds. Strangely, the same binaries that crash
on the host system run fine in a VM using hardware virtualization.

I will try to run it under gdb to find out which instruction is
triggering the fault.


Thanks,
Fernando




Re: [gentoo-user] Invalid opcode after kernel update

2023-09-17 Thread Alan Mackenzie
Hello, Fernando.

On Sun, Sep 17, 2023 at 17:49:22 -0400, Fernando Rodriguez wrote:
> A few months ago after updating my kernel I started getting an invalid 
> opcode error during boot on the init process on my initramfs which I did 
> rebuild. Switching to the old kernel and initramfs fixed the problem so 
> I kept that kernel for a few months for lack of time.

> Today I rebuilt the whole system using `emerge -e @world` and after that 
> I'm able to boot the new kernel but now some pre-compiled packages (and 
> some that emerge -e missed because the ebuild was masked) crash with 
> illegal opcode. In the case of chrome it's not crashing but it only 
> renders garbage for webpages.

> Does anyone have a clue what is happening? It's like the instruction set 
> changed after the kernel update (or was it the microcode?)

Could it be that you've got a sporadic RAM failure?  Running the
standard RAM test (the one you boot into, I've forgotten its name) for
many hours might pin down the problem.

> Thanks,

> -- 

> Fernando Rodriguez

-- 
Alan Mackenzie (Nuremberg, Germany).