Bug#1019855: Fwd: libc6: immediately crashes with SIGILL on 4th gen Intel Core CPUs (seems related to AVX2 instructions), bricking the whole system

2022-10-04 Thread Aurelien Jarno
Hi,

On 2022-10-04 08:51, Aurelien Jarno wrote:
> Hi
> 
> On 2022-09-25 13:43, Aurelien Jarno wrote:
> > > Running a quick diff against old procinfo reveals that "flags" has the
> > > following new entries now:
> > > 
> > > tsc_deadline_timer ssbd ibrs ibpb stibp bmi1 bmi2 md_clear flush_l1d
> > > 
> > > > it looks like that the BMI2
> > > > instructions support has been added in a microcode update
> > > 
> > > As such it does appear that indeed this is the case.
> > 
> > Thanks for the confirmation, it seems that the microcode update is also
> > useful for security reasons in order to mitigate the speculative
> > execution side channel issues (the famous spectre/meltdown).
> > 
> > Nevertheless the AVX2 code should not use BMI2 instructions if they are
> > not available.
> 
> This has been fixed upstream and in the sid package. Next step is to get
> it fixed for stable.

A test version is available here for anyone who is able to try it:

https://people.debian.org/~aurel32/glibc/

Regards
Aurelien

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net




Bug#1019855: Fwd: libc6: immediately crashes with SIGILL on 4th gen Intel Core CPUs (seems related to AVX2 instructions), bricking the whole system

2022-10-04 Thread Aurelien Jarno
Hi,

On 2022-10-04 19:44, debian-bug-rep...@p0358.net wrote:
> > Is there an easy way to unbrick a system affected by the issue? such as
> > a kernel-line option or a configuration file in /etc? I don't see how I
> > can set a GLIBC_TUNABLES environment variable for the whole system.
> 
> I was trying during my testing to set such option globally somehow, but
> failed, though maybe some method for this exists. As it stands I only see
> two possibilities of unbricking a system, both assuming you can access the
> partition externally from some bootable system:
> 
> 1. Downgrade the affected libc6 package to a version before the one causing
> issues (either chroot and dpkg, or just extract and physically replace the
> files), after booting apt-mark hold libc6 to prevent faulty update from
> being installed until the issue is fixed
> 
> 2. Or install intel-microcode package, assuming the microcode update adds
> the missing instructions in particular case, basically coincidentally fixing
> this issue (the updated CPU microcode is loaded on every bootup)

Please note that the microcode update might also be done through a
BIOS/firmware update if available.

Regards
Aurelien

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net



Bug#1019855: Fwd: libc6: immediately crashes with SIGILL on 4th gen Intel Core CPUs (seems related to AVX2 instructions), bricking the whole system

2022-10-04 Thread Samuel Thibault
Hello,

Is there an easy way to unbrick a system affected by the issue? such as
a kernel-line option or a configuration file in /etc? I don't see how I
can set a GLIBC_TUNABLES environment variable for the whole system.

Samuel



Bug#1019855: Fwd: libc6: immediately crashes with SIGILL on 4th gen Intel Core CPUs (seems related to AVX2 instructions), bricking the whole system

2022-10-04 Thread debian-bug-report

> Is there an easy way to unbrick a system affected by the issue? such as
> a kernel-line option or a configuration file in /etc? I don't see how I
> can set a GLIBC_TUNABLES environment variable for the whole system.


During my testing I tried to set such an option globally, but 
failed, though maybe some method for this exists. As it stands I see only 
two possibilities for unbricking a system, both assuming you can 
access the partition externally from some bootable system:


1. Downgrade the affected libc6 package to a version before the one 
causing issues (either chroot and dpkg, or just extract and physically 
replace the files). After booting, run apt-mark hold libc6 to prevent the 
faulty update from being installed until the issue is fixed.


2. Or install the intel-microcode package, assuming the microcode update 
adds the missing instructions in your particular case, which 
coincidentally fixes this issue (the updated CPU microcode is loaded on 
every boot).




Bug#1019855: Fwd: libc6: immediately crashes with SIGILL on 4th gen Intel Core CPUs (seems related to AVX2 instructions), bricking the whole system

2022-10-04 Thread Aurelien Jarno
Hi

On 2022-09-25 13:43, Aurelien Jarno wrote:
> > Running a quick diff against old procinfo reveals that "flags" has the
> > following new entries now:
> > 
> > tsc_deadline_timer ssbd ibrs ibpb stibp bmi1 bmi2 md_clear flush_l1d
> > 
> > > it looks like that the BMI2
> > > instructions support has been added in a microcode update
> > 
> > As such it does appear that indeed this is the case.
> 
> Thanks for the confirmation, it seems that the microcode update is also
> useful for security reasons in order to mitigate the speculative
> execution side channel issues (the famous spectre/meltdown).
> 
> Nevertheless the AVX2 code should not use BMI2 instructions if they are
> not available.

This has been fixed upstream and in the sid package. Next step is to get
it fixed for stable.

Regards
Aurelien

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net



Bug#1019855: Fwd: libc6: immediately crashes with SIGILL on 4th gen Intel Core CPUs (seems related to AVX2 instructions), bricking the whole system

2022-09-25 Thread Aurelien Jarno
On 2022-09-25 12:02, debian-bug-rep...@p0358.net wrote:
> > Now that we understood the bug, I actually find strange that the
> > microcode update is fixing this, it looks like that the BMI2
> > instructions support has been added in a microcode update. Would it be
> > possible to give the output of /proc/cpuinfo with and without the
> > microcode update applied?
> 
> The /proc/cpuinfo without microcode update is already attached somewhere
> above in the bug report, the new one after update is as follows:
> 
> processor   : 0
> vendor_id   : GenuineIntel
> cpu family  : 6
> model   : 60
> model name  : Intel(R) Core(TM) i3-4000M CPU @ 2.40GHz
> stepping: 3
> microcode   : 0x28
> cpu MHz : 2400.000
> cache size  : 3072 KB
> physical id : 0
> siblings: 4
> core id : 0
> cpu cores   : 2
> apicid  : 0
> initial apicid  : 0
> fpu : yes
> fpu_exception   : yes
> cpuid level : 13
> wp  : yes
> flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
> cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx
> pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology
> nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2
> ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 movbe popcnt
> tsc_deadline_timer xsave avx f16c rdrand lahf_lm abm cpuid_fault epb
> invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept
> vpid ept_ad fsgsbase tsc_adjust bmi1 smep bmi2 erms invpcid xsaveopt dtherm
> arat pln pts md_clear flush_l1d
> vmx flags   : vnmi preemption_timer invvpid ept_x_only ept_ad ept_1gb
> flexpriority tsc_offset vtpr mtf vapic ept vpid unrestricted_guest ple
> bugs: cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf
> mds swapgs itlb_multihit srbds
> bogomips: 4788.76
> clflush size: 64
> cache_alignment : 64
> address sizes   : 39 bits physical, 48 bits virtual
> power management:
> 
> Please note that "avx2" is once again missing due to the kernel masking flag
> from before that I once again forgot to remove before rebooting, and sorry
> for confusion it might cause -- that flag would normally be there.
> 
> Running a quick diff against old procinfo reveals that "flags" has the
> following new entries now:
> 
> tsc_deadline_timer ssbd ibrs ibpb stibp bmi1 bmi2 md_clear flush_l1d
> 
> > it looks like that the BMI2
> > instructions support has been added in a microcode update
> 
> As such it does appear that indeed this is the case.

Thanks for the confirmation, it seems that the microcode update is also
useful for security reasons in order to mitigate the speculative
execution side channel issues (the famous spectre/meltdown).

Nevertheless the AVX2 code should not use BMI2 instructions if they are
not available.

Regards
Aurelien

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net



Bug#1019855: Fwd: libc6: immediately crashes with SIGILL on 4th gen Intel Core CPUs (seems related to AVX2 instructions), bricking the whole system

2022-09-25 Thread debian-bug-report

> Now that we understood the bug, I actually find strange that the
> microcode update is fixing this, it looks like that the BMI2
> instructions support has been added in a microcode update. Would it be
> possible to give the output of /proc/cpuinfo with and without the
> microcode update applied?


The /proc/cpuinfo without microcode update is already attached somewhere 
above in the bug report, the new one after update is as follows:


processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model   : 60
model name  : Intel(R) Core(TM) i3-4000M CPU @ 2.40GHz
stepping: 3
microcode   : 0x28
cpu MHz : 2400.000
cache size  : 3072 KB
physical id : 0
siblings: 4
core id : 0
cpu cores   : 2
apicid  : 0
initial apicid  : 0
fpu : yes
fpu_exception   : yes
cpuid level : 13
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge 
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe 
syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good 
nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor 
ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 
movbe popcnt tsc_deadline_timer xsave avx f16c rdrand lahf_lm abm 
cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi 
flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 smep bmi2 erms 
invpcid xsaveopt dtherm arat pln pts md_clear flush_l1d
vmx flags   : vnmi preemption_timer invvpid ept_x_only ept_ad 
ept_1gb flexpriority tsc_offset vtpr mtf vapic ept vpid 
unrestricted_guest ple
bugs: cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass 
l1tf mds swapgs itlb_multihit srbds

bogomips: 4788.76
clflush size: 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:

Please note that "avx2" is once again missing because of the kernel masking 
flag from before, which I again forgot to remove before rebooting. Sorry 
for any confusion it might cause; that flag would normally be there.


Running a quick diff against old procinfo reveals that "flags" has the 
following new entries now:


tsc_deadline_timer ssbd ibrs ibpb stibp bmi1 bmi2 md_clear flush_l1d

> it looks like that the BMI2
> instructions support has been added in a microcode update

As such it does appear that indeed this is the case.
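
The flag comparison described above can be reproduced with a few lines of Python. This is only a sketch: the helper names cpu_flags and new_flags are made up for illustration, and the two sample strings are heavily abbreviated versions of the dumps in this thread, not the full flag lists.

```python
def cpu_flags(cpuinfo_text):
    """Return the flags of the first processor entry, in listed order."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Match only the plain "flags" key, not "vmx flags".
        if sep and key.strip() == "flags":
            return value.split()
    return []


def new_flags(before_text, after_text):
    """Flags present after the update but absent before, keeping order."""
    before = set(cpu_flags(before_text))
    return [flag for flag in cpu_flags(after_text) if flag not in before]


# Abbreviated samples: only a handful of flags from each dump are kept.
before = "microcode : 0x12\nflags : fpu popcnt xsave avx smep erms pts\n"
after = (
    "microcode : 0x28\n"
    "flags : fpu popcnt tsc_deadline_timer xsave avx ssbd bmi1 smep bmi2 "
    "erms pts md_clear\n"
)

print(" ".join(new_flags(before, after)))
```

On a live system the two inputs would be saved copies of /proc/cpuinfo taken before and after installing the microcode package.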



Bug#1019855: Fwd: libc6: immediately crashes with SIGILL on 4th gen Intel Core CPUs (seems related to AVX2 instructions), bricking the whole system

2022-09-25 Thread Aurelien Jarno
On 2022-09-25 00:35, debian-bug-rep...@p0358.net wrote:
> Hello, sorry for delayed response, I've managed to collect and analyze a few
> coredump files with valid symbols (I installed libc6-dbg and dpkg-dev, and
> pointed gdb at Debian's debuginfod server, also used apt-get source to get
> the sources for libc6).

Thanks a lot for your work. With more data, it's way easier to
understand the issue. 

> It seems there are at least 3-4 distinct places it crashes at, two places at
> memchr-avx2.S, one at strlen-avx2.S, and potentially one at
> syscall-template.S, although that last one may be just some kind of kill
> signal redirect.

The failing places in memchr-avx2.S and strlen-avx2.S point to BMI2
(bit manipulation) instructions that have been introduced into the AVX2
code, which should not have happened. The syscall-template.S one is likely
code that catches the signal to display a message and then re-emits it.
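
To illustrate the point, here is a toy model in Python of what goes wrong when a variant is selected on AVX2 alone. It is not glibc's actual selector code: the function names, the feature-set representation and the "memchr_sse2" fallback name are all assumptions made for this sketch.

```python
def pick_memchr_buggy(features):
    """Model of the reported behaviour: only AVX2 is checked."""
    return "memchr_avx2" if "avx2" in features else "memchr_sse2"


def pick_memchr_checked(features):
    """Select the AVX2 variant only when BMI2 is advertised as well."""
    if {"avx2", "bmi2"} <= features:
        return "memchr_avx2"   # the BMI2 instructions it contains exist
    return "memchr_sse2"       # hypothetical baseline fallback


# Feature sets loosely modelled on the two /proc/cpuinfo dumps in this bug.
old_microcode = {"sse2", "avx", "avx2"}                  # no bmi1/bmi2
new_microcode = {"sse2", "avx", "avx2", "bmi1", "bmi2"}

for name, features in (("old", old_microcode), ("new", new_microcode)):
    print(name, pick_memchr_buggy(features), pick_memchr_checked(features))
```

With the old feature set the unchecked selector still returns the AVX2 variant, which is the situation that ends in SIGILL; the checked one falls back.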

> It does seem in case of this SIGILL there's no additional stack trace, also
> the path containing ".." seems to cause the source code resolution to fail,
> but still the debug symbols seem to show the file source and line, so it
> should hopefully help see what exactly fails.
> 
> I'm yet to try rebooting with microcode package installed though (I'll soon
> check it and update on whether it helps, but even if it does, one without
> bootable system first won't get a chance to install it; I'm a bit curious
> how these changes did trigger this, given all these years it didn't happen
> to occur before)
 
I agree with you that this should be fixed without a microcode update, I
am going to report that issue upstream and we'll get the fix in the
Debian package.

Now that we understand the bug, I actually find it strange that the
microcode update fixes this. It looks like support for the BMI2
instructions has been added in a microcode update. Would it be
possible to give the output of /proc/cpuinfo with and without the
microcode update applied?

Regards
Aurelien

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net



Bug#1019855: Fwd: libc6: immediately crashes with SIGILL on 4th gen Intel Core CPUs (seems related to AVX2 instructions), bricking the whole system

2022-09-24 Thread debian-bug-report
I can confirm updating the microcode by installing the intel-microcode 
package and rebooting does indeed mitigate this issue. An LXC container 
that was previously bricked due to update now starts and seems to behave 
fully normally.


[0.00] microcode: microcode updated early to revision 0x28, date = 2019-11-12


However, the microcode update needs to be loaded on every boot (unless, 
presumably, I updated the UEFI firmware). So while it technically solves 
my problem on this installation, the concern remains for people with the 
same family of processors and outdated microcode: they will run into 
this issue and have no idea why any Linux system no longer boots. (Is 
there even an easy way to load updated microcode while installing 
Debian? I would bet its ISO does not include it due to non-free 
constraints.)
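
For anyone wanting to check which microcode revision a machine is actually running, the "microcode" field of /proc/cpuinfo can be read as below. This is a small sketch; the function name is made up, and the two revisions in the sample (0x12 and 0x28) are simply the values observed on this one machine, not a known threshold for the bug.

```python
def microcode_revisions(cpuinfo_text):
    """Return the distinct microcode revisions listed in a cpuinfo dump."""
    revisions = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "microcode":
            revisions.add(int(value.strip(), 16))  # accepts the 0x prefix
    return sorted(revisions)


# Abbreviated samples with only the relevant field per processor.
before = "processor : 0\nmicrocode : 0x12\n\nprocessor : 1\nmicrocode : 0x12\n"
after = "processor : 0\nmicrocode : 0x28\n\nprocessor : 1\nmicrocode : 0x28\n"

print([hex(rev) for rev in microcode_revisions(before)])
print([hex(rev) for rev in microcode_revisions(after)])
```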




Bug#1019855: Fwd: libc6: immediately crashes with SIGILL on 4th gen Intel Core CPUs (seems related to AVX2 instructions), bricking the whole system

2022-09-24 Thread debian-bug-report
Hello, sorry for delayed response, I've managed to collect and analyze a 
few coredump files with valid symbols (I installed libc6-dbg and 
dpkg-dev, and pointed gdb at Debian's debuginfod server, also used 
apt-get source to get the sources for libc6).


It seems there are at least 3-4 distinct places it crashes at, two 
places at memchr-avx2.S, one at strlen-avx2.S, and potentially one at 
syscall-template.S, although that last one may be just some kind of kill 
signal redirect.


Pasting all below:

Core was generated by `apt'.
Program terminated with signal SIGILL, Illegal instruction.
#0  __memchr_avx2 () at ../sysdeps/x86_64/multiarch/memchr-avx2.S:400
Download failed: Invalid argument.  Continuing without source file 
./string/../sysdeps/x86_64/multiarch/memchr-avx2.S.
400 ../sysdeps/x86_64/multiarch/memchr-avx2.S: No such file or 
directory.

(gdb)

###

Core was generated by `dpkg'.
Program terminated with signal SIGILL, Illegal instruction.
#0  __strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:514
Download failed: Invalid argument.  Continuing without source file 
./string/../sysdeps/x86_64/multiarch/strlen-avx2.S.
514 ../sysdeps/x86_64/multiarch/strlen-avx2.S: No such file or 
directory.

(gdb)

###

Core was generated by `/usr/bin/perl /usr/sbin/adduser'.
Program terminated with signal SIGILL, Illegal instruction.
#0  __memchr_avx2 () at ../sysdeps/x86_64/multiarch/memchr-avx2.S:135
Download failed: Invalid argument.  Continuing without source file 
./string/../sysdeps/x86_64/multiarch/memchr-avx2.S.
135 ../sysdeps/x86_64/multiarch/memchr-avx2.S: No such file or 
directory.

(gdb)

###

Core was generated by `useradd'.
Program terminated with signal SIGILL, Illegal instruction.
#0  __memchr_avx2 () at ../sysdeps/x86_64/multiarch/memchr-avx2.S:135
Download failed: Invalid argument.  Continuing without source file 
./string/../sysdeps/x86_64/multiarch/memchr-avx2.S.
135 ../sysdeps/x86_64/multiarch/memchr-avx2.S: No such file or 
directory.

(gdb)

###

Core was generated by `passwd'.
Program terminated with signal SIGILL, Illegal instruction.
#0  __strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:514
Download failed: Invalid argument.  Continuing without source file 
./string/../sysdeps/x86_64/multiarch/strlen-avx2.S.
514 ../sysdeps/x86_64/multiarch/strlen-avx2.S: No such file or 
directory.

(gdb)

###

Core was generated by `bash'.
Program terminated with signal SIGILL, Illegal instruction.
#0  0x7f2006faf087 in kill () at ../sysdeps/unix/syscall-template.S:120
Download failed: Invalid argument.  Continuing without source file 
./signal/../sysdeps/unix/syscall-template.S.

120 ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb)

###

Core was generated by `su'.
Program terminated with signal SIGILL, Illegal instruction.
#0  __memchr_avx2 () at ../sysdeps/x86_64/multiarch/memchr-avx2.S:135
Download failed: Invalid argument.  Continuing without source file 
./string/../sysdeps/x86_64/multiarch/memchr-avx2.S.
135 ../sysdeps/x86_64/multiarch/memchr-avx2.S: No such file or 
directory.

(gdb)

###

It does seem in case of this SIGILL there's no additional stack trace, 
also the path containing ".." seems to cause the source code resolution 
to fail, but still the debug symbols seem to show the file source and 
line, so it should hopefully help see what exactly fails.
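
The gdb output above can be grouped by crash site with a short Python script, which makes the distinct locations easier to see. This is a sketch only: the regular expressions are written against the exact output format pasted in this message and the function name is made up.

```python
import re
from collections import defaultdict

CORE_RE = re.compile(r"^Core was generated by `(?P<cmd>[^']+)'")
FRAME_RE = re.compile(
    r"^#0\s+(?:0x[0-9a-f]+ in )?(?P<func>\w+) \(\) at (?P<path>\S+):(?P<line>\d+)"
)


def crash_sites(gdb_text):
    """Map (function, source file, line) to the commands that crashed there."""
    sites = defaultdict(list)
    cmd = None
    for line in gdb_text.splitlines():
        core = CORE_RE.match(line)
        if core:
            cmd = core["cmd"]
            continue
        frame = FRAME_RE.match(line)
        if frame and cmd is not None:
            source = frame["path"].rsplit("/", 1)[-1]
            sites[(frame["func"], source, int(frame["line"]))].append(cmd)
            cmd = None
    return dict(sites)


# A few of the entries pasted above, shortened to the relevant lines.
sample = """\
Core was generated by `apt'.
#0  __memchr_avx2 () at ../sysdeps/x86_64/multiarch/memchr-avx2.S:400
Core was generated by `dpkg'.
#0  __strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:514
Core was generated by `passwd'.
#0  __strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:514
Core was generated by `bash'.
#0  0x7f2006faf087 in kill () at ../sysdeps/unix/syscall-template.S:120
"""

for site, commands in crash_sites(sample).items():
    print(site, commands)
```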


I'm yet to try rebooting with microcode package installed though (I'll 
soon check it and update on whether it helps, but even if it does, one 
without bootable system first won't get a chance to install it; I'm a 
bit curious how these changes did trigger this, given all these years it 
didn't happen to occur before)




Bug#1019855: Fwd: libc6: immediately crashes with SIGILL on 4th gen Intel Core CPUs (seems related to AVX2 instructions), bricking the whole system

2022-09-24 Thread Stephen Kitt
Hi Aurelien,

On Tue, Sep 20, 2022 at 11:20:26PM +0200, Aurelien Jarno wrote:
> Have you been able to progress on that? Do you need some help for a
> specific step?

For what it’s worth, I’ve upgraded libc6 on my Haswell system (Xeon
E3-1245v3) and everything seems to be working fine.

Regards,

Stephen




Bug#1019855: Fwd: libc6: immediately crashes with SIGILL on 4th gen Intel Core CPUs (seems related to AVX2 instructions), bricking the whole system

2022-09-20 Thread Aurelien Jarno
Hi,

Have you been able to progress on that? Do you need some help for a
specific step?

Regards
Aurelien

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net



Bug#1019855: Fwd: libc6: immediately crashes with SIGILL on 4th gen Intel Core CPUs (seems related to AVX2 instructions), bricking the whole system

2022-09-15 Thread Aurelien Jarno
Hi,

On 2022-09-15 20:59, debian-bug-rep...@p0358.net wrote:
> > The first thing would be to provide the output of /proc/cpuinfo
> 
> Pasting below (please **NOTE** that "avx2" would normally be there, but is
> currently missing due to this kernel option `clearcpuid=293` with which I
> booted the PC now -- I can **100%** confirm "avx2" was there before, but
> don't want to reboot for now to remove this kernel flag):
> 
> # cat /proc/cpuinfo
> processor   : 0
> vendor_id   : GenuineIntel
> cpu family  : 6
> model   : 60
> model name  : Intel(R) Core(TM) i3-4000M CPU @ 2.40GHz
> stepping: 3
> microcode   : 0x12
> cpu MHz : 2394.664
> cache size  : 3072 KB
> physical id : 0
> siblings: 4
> core id : 0
> cpu cores   : 2
> apicid  : 0
> initial apicid  : 0
> fpu : yes
> fpu_exception   : yes
> cpuid level : 13
> wp  : yes
> flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
> cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx
> pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology
> nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2
> ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 movbe popcnt xsave avx f16c
> rdrand lahf_lm abm cpuid_fault epb invpcid_single pti tpr_shadow vnmi
> flexpriority ept vpid ept_ad fsgsbase tsc_adjust smep erms invpcid xsaveopt
> dtherm arat pln pts
> vmx flags   : vnmi preemption_timer invvpid ept_x_only ept_ad ept_1gb
> flexpriority tsc_offset vtpr mtf vapic ept vpid unrestricted_guest ple
> bugs: cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf
> mds swapgs itlb_multihit srbds
> bogomips: 4789.10
> clflush size: 64
> cache_alignment : 64
> address sizes   : 39 bits physical, 48 bits virtual
> power management:

Thanks.

> > If you believe the issue is due to AVX2, clearcpuid won't help, as it
> > just clears the corresponding flags from the kernel's point of view, while
> > the cpuid instruction continues to behave the same. The way to
> > disable that feature at the glibc level is to set the GLIBC_TUNABLES
> > environment variable to "glibc.cpu.hwcaps=-AVX2_Usable".
> 
> This works! Indeed, the clearcpuid flag on its own did nothing, as you
> mentioned. This workaround is great to know for the time being.

Great, that's narrowing down the problem.

> > Same from there due to ASLR. It seems to fail in at least two different
> > locations. Do you have some extra lines around, sometimes the kernel
> > dump the addresses around the instruction pointer?
> 
> Generally these lines all followed similar pattern, and there was nothing
> printed below or after, just this single line per crash. I will paste a few
> more below. Isn't the "+15a000" the relative offset in libc .so though? It

The +15a000 is the size of the libc.so.6 mapping in the virtual memory.

> does seem like an oddly round number, but I loaded the library in IDA
> disassembler and the instructions at this offset do seem to be related to
> AVX2 (linking screenshot which I also pasted on the linked GitHub issue)
> (the highlighted instruction in gray seems to be the one at this
> aforementioned offset):
> https://user-images.githubusercontent.com/5182588/190256853-29ae80aa-0089-4da2-a430-990e2693d15c.png
> 
> If my above hypothesis is correct, then I looked at the mother function in
> x-refs and it does seem to be defined in rtld_global_ro table, and its name
> is "__strncmp_avx2". Was something changed in this function between the
> updates?
> 
> Pasting more kernel lines:
> kernel: [852124.361775] traps: dhclient[1583381] trap invalid opcode
> ip:7fe19118051d sp:7ffee6e36238 error:0 in libc-2.31.so[7fe191044000+15a000]
> kernel: [852124.468314] traps: nft[1583398] trap invalid opcode
> ip:7fe3418fe51d sp:7fff11342df8 error:0 in libc-2.31.so[7fe3417c2000+15a000]
> kernel: [852124.572700] traps: systemd-shutdow[1377424] trap invalid opcode
> ip:7fde88b724ad sp:7ffc13767028 error:0 in libc-2.31.so[7fde88a3a000+15a000]
> kernel: [  270.477024] traps: bun[2055] trap invalid opcode ip:2e363f4
> sp:7ffe2320d640 error:0 in bun[2a6f000+2ce2000]
> kernel: [  279.884807] traps: systemd[2115] trap invalid opcode
> ip:7faf645ec4ad sp:7ffe12e06c48 error:0 in libc-2.31.so[7faf644b4000+15a000]
> kernel: [  299.637575] traps: bun[2296] trap invalid opcode ip:2e363f4
> sp:7ffd0c0bc9c0 error:0 in bun[2a6f000+2ce2000]
> kernel: [  331.036417] traps: bash[2462] trap invalid opcode ip:7ff42840051d
> sp:7ffd34ad7278 error:0 in libc-2.31.so[7ff4282c4000+15a000]
> kernel: [  357.184428] traps: bash[2652] trap invalid opcode ip:7f717873751d
> sp:7fffd34c8848 error:0 in libc-2.31.so[7f71785fb000+15a000]
> kernel: [  645.517556] traps: bash[3508] trap invalid opcode ip:7f4b6ee8851d
> sp:7ffd74beb6e8 error:0 in libc-2.31.so[7f4b6ed4c000+15a000]
> kernel: [  876.760209] traps: bash[4225] 

Bug#1019855: Fwd: libc6: immediately crashes with SIGILL on 4th gen Intel Core CPUs (seems related to AVX2 instructions), bricking the whole system

2022-09-15 Thread debian-bug-report

> The first thing would be to provide the output of /proc/cpuinfo

Pasting below (please **NOTE** that "avx2" would normally be there, but 
is currently missing due to this kernel option `clearcpuid=293` with 
which I booted the PC now -- I can **100%** confirm "avx2" was there 
before, but don't want to reboot for now to remove this kernel flag):


# cat /proc/cpuinfo
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model   : 60
model name  : Intel(R) Core(TM) i3-4000M CPU @ 2.40GHz
stepping: 3
microcode   : 0x12
cpu MHz : 2394.664
cache size  : 3072 KB
physical id : 0
siblings: 4
core id : 0
cpu cores   : 2
apicid  : 0
initial apicid  : 0
fpu : yes
fpu_exception   : yes
cpuid level : 13
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge 
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe 
syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good 
nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor 
ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 
movbe popcnt xsave avx f16c rdrand lahf_lm abm cpuid_fault epb 
invpcid_single pti tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase 
tsc_adjust smep erms invpcid xsaveopt dtherm arat pln pts
vmx flags   : vnmi preemption_timer invvpid ept_x_only ept_ad 
ept_1gb flexpriority tsc_offset vtpr mtf vapic ept vpid 
unrestricted_guest ple
bugs: cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass 
l1tf mds swapgs itlb_multihit srbds

bogomips: 4789.10
clflush size: 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:

processor   : 1
vendor_id   : GenuineIntel
cpu family  : 6
model   : 60
model name  : Intel(R) Core(TM) i3-4000M CPU @ 2.40GHz
stepping: 3
microcode   : 0x12
cpu MHz : 2400.000
cache size  : 3072 KB
physical id : 0
siblings: 4
core id : 0
cpu cores   : 2
apicid  : 1
initial apicid  : 1
fpu : yes
fpu_exception   : yes
cpuid level : 13
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge 
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe 
syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good 
nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor 
ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 
movbe popcnt xsave avx f16c rdrand lahf_lm abm cpuid_fault epb 
invpcid_single pti tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase 
tsc_adjust smep erms invpcid xsaveopt dtherm arat pln pts
vmx flags   : vnmi preemption_timer invvpid ept_x_only ept_ad 
ept_1gb flexpriority tsc_offset vtpr mtf vapic ept vpid 
unrestricted_guest ple
bugs: cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass 
l1tf mds swapgs itlb_multihit srbds

bogomips: 4789.10
clflush size: 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:

processor   : 2
vendor_id   : GenuineIntel
cpu family  : 6
model   : 60
model name  : Intel(R) Core(TM) i3-4000M CPU @ 2.40GHz
stepping: 3
microcode   : 0x12
cpu MHz : 2400.000
cache size  : 3072 KB
physical id : 0
siblings: 4
core id : 1
cpu cores   : 2
apicid  : 2
initial apicid  : 2
fpu : yes
fpu_exception   : yes
cpuid level : 13
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge 
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe 
syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good 
nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor 
ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 
movbe popcnt xsave avx f16c rdrand lahf_lm abm cpuid_fault epb 
invpcid_single pti tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase 
tsc_adjust smep erms invpcid xsaveopt dtherm arat pln pts
vmx flags   : vnmi preemption_timer invvpid ept_x_only ept_ad 
ept_1gb flexpriority tsc_offset vtpr mtf vapic ept vpid 
unrestricted_guest ple
bugs: cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass 
l1tf mds swapgs itlb_multihit srbds

bogomips: 4789.10
clflush size: 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:

processor   : 3
vendor_id   : GenuineIntel
cpu family  : 6
model   : 60
model name  : Intel(R) Core(TM) i3-4000M CPU @ 2.40GHz
stepping: 3
microcode   : 0x12
cpu MHz : 2400.000
cache size  : 3072 KB
physical id : 0
siblings: 4
core id : 1
cpu cores   : 2
apicid  : 3
initial apicid  : 3
fpu : yes
fpu_exception   : yes
cpuid 

Bug#1019855: Fwd: libc6: immediately crashes with SIGILL on 4th gen Intel Core CPUs (seems related to AVX2 instructions), bricking the whole system

2022-09-15 Thread Aurelien Jarno
Hi,

On 2022-09-15 01:37, debian-bug-rep...@p0358.net wrote:
> Package: libc6
> Version: 2.31-13+deb11u4
> Severity: critical
> 
> Dear Maintainer,
> 
> After an upgrade to version +deb11u4 on my system running Haswell
> (4th gen Intel Core) CPU, most of the programs including bash or dpkg
> are immediately crashing with SIGILL. The problem seems to be caused/
> related to AVX2 and changes made to some functions utilizing this
> instruction set. I don't know much about Debian bug reporting, so forgive me
> any mistakes I've made.
> The issue is on both host, LXC and Docker.
> I have described more on this link:
> https://github.com/debuerreotype/docker-debian-artifacts/issues/175
> where I also linked my coredump from example program and described stuff
> more thoroughly.

First of all, sorry about the issue, it should not have slipped into a
stable release. Unfortunately I am not able to reproduce the issue. I
have tried on 3rd gen and 5th gen Intel Core CPUs, but failed to
reproduce it. Therefore I will need your help to understand the issue.

The first thing would be to provide the output of /proc/cpuinfo

> Coredump link directly just in case: 
> https://github.com/debuerreotype/docker-debian-artifacts/files/9569748/core.bash.10.2663c40e671041e6b40c882a70b83c3f.1480736.166318582400.zip

Unfortunately I am not able to use this core dump to get the instruction
that triggers the SIGILL, even after installing the debug symbol packages.


> Also log lines from kernel:
> kernel: [834669.721253] traps: dpkg[1455373] trap invalid opcode
> ip:7fa39701951d sp:7ffc4ad26e58 error:0 in libc-2.31.so[7fa396edd000+15a000]
> kernel: [834669.732958] traps: dpkg[1455374] trap invalid opcode
> ip:7f529ca9551d sp:7fffb6f0a238 error:0 in libc-2.31.so[7f529c959000+15a000]
> kernel: [834669.840128] traps: dpkg[1455375] trap invalid opcode
> ip:7f1874cc951d sp:7fffc2c2f5d8 error:0 in libc-2.31.so[7f1874b8d000+15a000]
> kernel: [834669.907918] traps: dpkg[1455378] trap invalid opcode
> ip:7f3b4f8d851d sp:7fff3ec970f8 error:0 in libc-2.31.so[7f3b4f79c000+15a000]
> kernel: [834712.152139] traps: passwd[1455693] trap invalid opcode
> ip:7fefee4b52b7 sp:7cb506b8 error:0 in libc-2.31.so[7fefee37d000+15a000]

The same goes for these lines, due to ASLR. It seems to fail in at least
two different locations. Do you have some extra lines around these?
Sometimes the kernel dumps the addresses around the instruction pointer.
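
Even with ASLR, subtracting the mapping start from the instruction pointer gives an offset that can be compared across processes. The Python sketch below does this for the trap lines quoted above, joined onto one line each. The regular expression and function name are made up for this example, and the result is only the offset inside the mapping the kernel reports; turning it into a file offset or symbol would also need that mapping's file offset, which these lines do not contain.

```python
import re

TRAP_RE = re.compile(
    r"traps: (?P<comm>[^\[\s]+)\[(?P<pid>\d+)\] trap invalid opcode\s+"
    r"ip:(?P<ip>[0-9a-f]+) sp:(?P<sp>[0-9a-f]+) error:\d+ "
    r"in (?P<image>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def trap_offset(line):
    """Return (command, image, ip - mapping start) or None if no match."""
    match = TRAP_RE.search(line)
    if match is None:
        return None
    offset = int(match["ip"], 16) - int(match["base"], 16)
    return match["comm"], match["image"], offset


lines = [
    "kernel: [834669.721253] traps: dpkg[1455373] trap invalid opcode "
    "ip:7fa39701951d sp:7ffc4ad26e58 error:0 in libc-2.31.so[7fa396edd000+15a000]",
    "kernel: [834669.732958] traps: dpkg[1455374] trap invalid opcode "
    "ip:7f529ca9551d sp:7fffb6f0a238 error:0 in libc-2.31.so[7f529c959000+15a000]",
    "kernel: [834712.152139] traps: passwd[1455693] trap invalid opcode "
    "ip:7fefee4b52b7 sp:7cb506b8 error:0 in libc-2.31.so[7fefee37d000+15a000]",
]

for line in lines:
    comm, image, offset = trap_offset(line)
    print(comm, image, hex(offset))
```

The two dpkg crashes land on the same offset despite different absolute addresses, while passwd fails at a different one, which matches the "at least two different locations" reading.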

> Not sure what exactly might be causing the issue, but if these changes
> aren't pulled, potentially anyone with this or similar CPU as me will
> upgrade and end up with bricked system.

The changes that are in this stable release have been (or at least were
supposed to, given the bug you reported) in testing/sid for a few
months. Are you able to do a test with debian sid, for instance in
docker?

> I will proceed to try using `clearcpuid=293` kernel flag myself, but
> consider how many distros depend on Debian, live CDs etc, with people unable
> to figure out why their system became useless, unable to trace the source,
> and blaming it just on Linux...

If you believe the issue is due to AVX2, clearcpuid won't help, as it
just clears the corresponding flags from the kernel's point of view, while
the cpuid instruction continues to behave the same. The way to disable
that feature at the glibc level is to set the GLIBC_TUNABLES environment
variable to "glibc.cpu.hwcaps=-AVX2_Usable".
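
As a rough illustration of the per-process approach, the sketch below builds an environment with that tunable set and runs a single command under it. The helper names (`env_with_tunable`, `run_without_avx2`) are hypothetical, and appending to an already set GLIBC_TUNABLES with a colon assumes the usual colon-separated list syntax. It only affects the command it launches, not the whole system.

```python
import os
import subprocess

# Tunable string quoted from the message above.
TUNABLE = "glibc.cpu.hwcaps=-AVX2_Usable"


def env_with_tunable(base_env, tunable=TUNABLE):
    """Return a copy of base_env with the tunable added to GLIBC_TUNABLES."""
    env = dict(base_env)
    existing = env.get("GLIBC_TUNABLES")
    # Assumption: multiple tunables are joined with ':'.
    env["GLIBC_TUNABLES"] = f"{existing}:{tunable}" if existing else tunable
    return env


def run_without_avx2(argv):
    """Run argv with the AVX2 code paths disabled at the glibc level."""
    return subprocess.run(argv, env=env_with_tunable(os.environ), check=False)


# Example (not executed here):
#   run_without_avx2(["dpkg", "--version"])
```

Whether the command then survives depends on whether the failing code path is actually gated by that hwcap.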
 
Regards
Aurelien

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net



Bug#1019855: Fwd: libc6: immediately crashes with SIGILL on 4th gen Intel Core CPUs (seems related to AVX2 instructions), bricking the whole system

2022-09-14 Thread debian-bug-report

Package: libc6
Version: 2.31-13+deb11u4
Severity: critical

Dear Maintainer,

After an upgrade to version +deb11u4 on my system running a Haswell
(4th gen Intel Core) CPU, most programs, including bash and dpkg,
immediately crash with SIGILL. The problem seems to be caused by or
related to AVX2 and changes made to some functions using this
instruction set. I don't know much about Debian bug reporting, so
please forgive any mistakes I've made.

The issue occurs on the host, in LXC and in Docker.
I have described more at this link:
https://github.com/debuerreotype/docker-debian-artifacts/issues/175
where I also linked my coredump from an example program and described
things more thoroughly.


Coredump link directly just in case: 
https://github.com/debuerreotype/docker-debian-artifacts/files/9569748/core.bash.10.2663c40e671041e6b40c882a70b83c3f.1480736.166318582400.zip


Also log lines from kernel:
kernel: [834669.721253] traps: dpkg[1455373] trap invalid opcode 
ip:7fa39701951d sp:7ffc4ad26e58 error:0 in libc-2.31.so[7fa396edd000+15a000]
kernel: [834669.732958] traps: dpkg[1455374] trap invalid opcode 
ip:7f529ca9551d sp:7fffb6f0a238 error:0 in libc-2.31.so[7f529c959000+15a000]
kernel: [834669.840128] traps: dpkg[1455375] trap invalid opcode 
ip:7f1874cc951d sp:7fffc2c2f5d8 error:0 in libc-2.31.so[7f1874b8d000+15a000]
kernel: [834669.907918] traps: dpkg[1455378] trap invalid opcode 
ip:7f3b4f8d851d sp:7fff3ec970f8 error:0 in libc-2.31.so[7f3b4f79c000+15a000]
kernel: [834712.152139] traps: passwd[1455693] trap invalid opcode 
ip:7fefee4b52b7 sp:7cb506b8 error:0 in libc-2.31.so[7fefee37d000+15a000]


Not sure what exactly might be causing the issue, but if these changes 
aren't pulled, potentially anyone with this or similar CPU as me will 
upgrade and end up with bricked system.
I will proceed to try using `clearcpuid=293` kernel flag myself, but 
consider how many distros depend on Debian, live CDs etc, with people 
unable to figure out why their system became useless, unable to trace 
the source, and blaming it just on Linux...


I'm filing this bug report from my host system, which I downgraded to
the previous libc6 version.


   * What led up to the situation? apt upgrade...
   * What exactly did you do (or not do) that was effective (or
     ineffective)? downgrade to +deb11u3
   * What was the outcome of this action? everything works on the older
     version

   * What outcome did you expect instead?


-- System Information:
Debian Release: 11.4
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable-security'), (500, 'stable')

Architecture: amd64 (x86_64)

Kernel: Linux 5.15.39-1-pve (SMP w/4 CPU threads)
Kernel taint flags: TAINT_PROPRIETARY_MODULE, TAINT_OOT_MODULE
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE not set

Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages libc6 depends on:
ii  libcrypt1  1:4.4.18-4
ii  libgcc-s1  10.2.1-6

Versions of packages libc6 recommends:
ii  libidn2-0   2.3.0-5
pn  libnss-nis  
pn  libnss-nisplus  

Versions of packages libc6 suggests:
ii  debconf [debconf-2.0]  1.5.77
pn  glibc-doc  
ii  libc-l10n  2.31-13+deb11u3
ii  locales  2.31-13+deb11u3

-- debconf information:
  glibc/disable-screensaver:
  glibc/restart-services:
  glibc/kernel-not-supported:
  glibc/kernel-too-old:
  libraries/restart-without-asking: false
  glibc/restart-failed:
  glibc/upgrade: true