Bug#1019855: Fwd: libc6: immediately crashes with SIGILL on 4th gen Intel Core CPUs (seems related to AVX2 instructions), bricking the whole system

2022-10-04 Thread debian-bug-report

> Is there an easy way to unbrick a system affected by the issue, such as
> a kernel command-line option or a configuration file in /etc? I don't
> see how I can set a GLIBC_TUNABLES environment variable for the whole
> system.


During my testing I tried to set such an option globally but failed, 
though some method for this may exist. As it stands, I see only two 
ways of unbricking a system, both assuming you can access the partition 
externally from another bootable system:


1. Downgrade the affected libc6 package to a version before the one 
causing issues (either chroot and use dpkg, or extract the package and 
replace the files manually). After booting, run apt-mark hold libc6 to 
prevent the faulty update from being installed until the issue is fixed.


2. Or install the intel-microcode package, assuming the microcode update 
adds the missing instructions in your particular case, which 
coincidentally fixes this issue (the updated CPU microcode is loaded on 
every boot).





2022-09-25 Thread debian-bug-report

> Now that we understand the bug, I actually find it strange that the
> microcode update fixes this; it looks like BMI2 instruction support
> has been added in a microcode update. Would it be possible to give
> the output of /proc/cpuinfo with and without the microcode update
> applied?


The /proc/cpuinfo output without the microcode update is already 
attached earlier in the bug report; the new one, after the update, is as 
follows:


processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model   : 60
model name  : Intel(R) Core(TM) i3-4000M CPU @ 2.40GHz
stepping: 3
microcode   : 0x28
cpu MHz : 2400.000
cache size  : 3072 KB
physical id : 0
siblings: 4
core id : 0
cpu cores   : 2
apicid  : 0
initial apicid  : 0
fpu : yes
fpu_exception   : yes
cpuid level : 13
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge 
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe 
syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good 
nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor 
ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 
movbe popcnt tsc_deadline_timer xsave avx f16c rdrand lahf_lm abm 
cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi 
flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 smep bmi2 erms 
invpcid xsaveopt dtherm arat pln pts md_clear flush_l1d
vmx flags   : vnmi preemption_timer invvpid ept_x_only ept_ad 
ept_1gb flexpriority tsc_offset vtpr mtf vapic ept vpid 
unrestricted_guest ple
bugs: cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass 
l1tf mds swapgs itlb_multihit srbds

bogomips: 4788.76
clflush size: 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:

Please note that "avx2" is once again missing because I forgot to remove 
the `clearcpuid=293` kernel option from before when rebooting. Sorry for 
any confusion this might cause: the "avx2" flag would normally be there.


Running a quick diff against the old /proc/cpuinfo output shows that 
"flags" now has the following new entries:


tsc_deadline_timer ssbd ibrs ibpb stibp bmi1 bmi2 md_clear flush_l1d

> it looks like BMI2 instruction support
> has been added in a microcode update

So this does indeed appear to be the case.
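
For anyone who wants to repeat the flag comparison, here is a minimal sketch in Python. It is not the command that was actually run for this report; the names parse_flags and added_flags are made up for illustration, and it assumes the "flags" line is on a single line, as it is in the real /proc/cpuinfo (the copies pasted in this thread are wrapped by the mail client).

```python
def parse_flags(cpuinfo_text):
    """Return the set of CPU flags from the first 'flags' line of a cpuinfo dump."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Match only the plain "flags" key, not e.g. "vmx flags".
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def added_flags(old_text, new_text):
    """Return the flags present in new_text but not in old_text, sorted."""
    return sorted(parse_flags(new_text) - parse_flags(old_text))


if __name__ == "__main__":
    # Shortened sample input, not the full dumps from this report.
    old = "flags\t\t: fpu avx smep erms"
    new = "flags\t\t: fpu avx bmi1 smep bmi2 erms"
    print(" ".join(added_flags(old, new)))
```

Using sets means the order of the flags does not matter, only which ones appear.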




2022-09-24 Thread debian-bug-report
I can confirm that updating the microcode by installing the 
intel-microcode package and rebooting does indeed mitigate this issue. 
An LXC container that was previously bricked by the update now starts 
and seems to behave normally.


[0.00] microcode: microcode updated early to revision 0x28, date = 2019-11-12


But the microcode update has to be loaded on every boot (unless, 
presumably, I update the UEFI firmware). So while it technically solves 
my problem on this installation, the concern remains that people with 
the same family of processors and outdated microcode will run into this 
issue and have no idea why Linux no longer boots. (Is there even an easy 
way to load updated microcode while installing Debian? I would bet its 
ISO does not include it due to non-free constraints.)
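
To check from a rescue or live system whether a machine matches the pattern seen here, something like the following could be used. This is only a sketch under an assumption drawn from this thread: that the trigger is a CPU which advertises "avx2" but not "bmi2". The function names are hypothetical, and a match does not prove the system is affected.

```python
def avx2_without_bmi2(flags):
    """True if AVX2 is advertised but BMI2 is not (the suspected trigger)."""
    flags = set(flags)
    return "avx2" in flags and "bmi2" not in flags


def read_flags(path="/proc/cpuinfo"):
    """Return the flag list of the first CPU listed in a cpuinfo file."""
    with open(path) as f:
        for line in f:
            key, sep, value = line.partition(":")
            if sep and key.strip() == "flags":
                return value.split()
    return []


if __name__ == "__main__":
    if avx2_without_bmi2(read_flags()):
        print("avx2 without bmi2: matches the pattern from this report")
    else:
        print("does not match the pattern from this report")
```

Note that a kernel option such as `clearcpuid=293` hides "avx2" from /proc/cpuinfo, so the check has to be run without it.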





2022-09-24 Thread debian-bug-report
Hello, sorry for the delayed response. I've managed to collect and 
analyze a few coredump files with valid symbols (I installed libc6-dbg 
and dpkg-dev, pointed gdb at Debian's debuginfod server, and used 
apt-get source to get the sources for libc6).


There appear to be three or four distinct crash sites: two in 
memchr-avx2.S, one in strlen-avx2.S, and possibly one in 
syscall-template.S, although that last one may just be some kind of kill 
signal redirect.


Pasting all below:

Core was generated by `apt'.
Program terminated with signal SIGILL, Illegal instruction.
#0  __memchr_avx2 () at ../sysdeps/x86_64/multiarch/memchr-avx2.S:400
Download failed: Invalid argument.  Continuing without source file ./string/../sysdeps/x86_64/multiarch/memchr-avx2.S.
400 ../sysdeps/x86_64/multiarch/memchr-avx2.S: No such file or directory.
(gdb)

###

Core was generated by `dpkg'.
Program terminated with signal SIGILL, Illegal instruction.
#0  __strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:514
Download failed: Invalid argument.  Continuing without source file ./string/../sysdeps/x86_64/multiarch/strlen-avx2.S.
514 ../sysdeps/x86_64/multiarch/strlen-avx2.S: No such file or directory.
(gdb)

###

Core was generated by `/usr/bin/perl /usr/sbin/adduser'.
Program terminated with signal SIGILL, Illegal instruction.
#0  __memchr_avx2 () at ../sysdeps/x86_64/multiarch/memchr-avx2.S:135
Download failed: Invalid argument.  Continuing without source file ./string/../sysdeps/x86_64/multiarch/memchr-avx2.S.
135 ../sysdeps/x86_64/multiarch/memchr-avx2.S: No such file or directory.
(gdb)

###

Core was generated by `useradd'.
Program terminated with signal SIGILL, Illegal instruction.
#0  __memchr_avx2 () at ../sysdeps/x86_64/multiarch/memchr-avx2.S:135
Download failed: Invalid argument.  Continuing without source file ./string/../sysdeps/x86_64/multiarch/memchr-avx2.S.
135 ../sysdeps/x86_64/multiarch/memchr-avx2.S: No such file or directory.
(gdb)

###

Core was generated by `passwd'.
Program terminated with signal SIGILL, Illegal instruction.
#0  __strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:514
Download failed: Invalid argument.  Continuing without source file ./string/../sysdeps/x86_64/multiarch/strlen-avx2.S.
514 ../sysdeps/x86_64/multiarch/strlen-avx2.S: No such file or directory.
(gdb)

###

Core was generated by `bash'.
Program terminated with signal SIGILL, Illegal instruction.
#0  0x7f2006faf087 in kill () at ../sysdeps/unix/syscall-template.S:120
Download failed: Invalid argument.  Continuing without source file ./signal/../sysdeps/unix/syscall-template.S.
120 ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb)

###

Core was generated by `su'.
Program terminated with signal SIGILL, Illegal instruction.
#0  __memchr_avx2 () at ../sysdeps/x86_64/multiarch/memchr-avx2.S:135
Download failed: Invalid argument.  Continuing without source file ./string/../sysdeps/x86_64/multiarch/memchr-avx2.S.
135 ../sysdeps/x86_64/multiarch/memchr-avx2.S: No such file or directory.
(gdb)

###

For this SIGILL there seems to be no additional stack trace. The path 
containing ".." also seems to make the source code resolution fail, but 
the debug symbols still show the source file and line, which should 
hopefully help identify what exactly fails.
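
The distinct crash sites can be tallied from the gdb output with a short script. This is a rough sketch, not part of the original analysis: the name crash_sites and the regular expression are my own, and it assumes each "#0" frame is on a single line in the form shown above.

```python
import re
from collections import Counter

# Matches e.g. "#0  __memchr_avx2 () at ../path/memchr-avx2.S:400"
# and "#0  0x7f2006faf087 in kill () at ../path/syscall-template.S:120".
FRAME_RE = re.compile(r"^#0\s+(?:0x[0-9a-f]+ in )?(\w+) \(\) at (\S+):(\d+)$")


def crash_sites(gdb_text):
    """Count (function, source file name, line) triples of the innermost frames."""
    sites = Counter()
    for line in gdb_text.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            func, path, lineno = m.groups()
            sites[(func, path.rsplit("/", 1)[-1], int(lineno))] += 1
    return sites


if __name__ == "__main__":
    sample = """\
#0  __memchr_avx2 () at ../sysdeps/x86_64/multiarch/memchr-avx2.S:135
#0  __memchr_avx2 () at ../sysdeps/x86_64/multiarch/memchr-avx2.S:135
#0  0x7f2006faf087 in kill () at ../sysdeps/unix/syscall-template.S:120
"""
    for (func, src, lineno), count in crash_sites(sample).items():
        print(f"{count}x {func} at {src}:{lineno}")
```

Grouping by function, file and line makes it easy to see that several different programs die at the same instruction.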


I have yet to try rebooting with the microcode package installed. I'll 
check soon and report whether it helps, but even if it does, someone 
without a bootable system won't get a chance to install it. I'm also a 
bit curious how these changes triggered this, given that it never 
occurred in all these years.





2022-09-15 Thread debian-bug-report

> The first thing would be to provide the output of /proc/cpuinfo

Pasting below (please **NOTE** that "avx2" would normally be there, but 
is currently missing because I booted the PC with the kernel option 
`clearcpuid=293` -- I can **100%** confirm "avx2" was there before, but 
don't want to reboot for now to remove this kernel option):


# cat /proc/cpuinfo
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model   : 60
model name  : Intel(R) Core(TM) i3-4000M CPU @ 2.40GHz
stepping: 3
microcode   : 0x12
cpu MHz : 2394.664
cache size  : 3072 KB
physical id : 0
siblings: 4
core id : 0
cpu cores   : 2
apicid  : 0
initial apicid  : 0
fpu : yes
fpu_exception   : yes
cpuid level : 13
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge 
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe 
syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good 
nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor 
ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 
movbe popcnt xsave avx f16c rdrand lahf_lm abm cpuid_fault epb 
invpcid_single pti tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase 
tsc_adjust smep erms invpcid xsaveopt dtherm arat pln pts
vmx flags   : vnmi preemption_timer invvpid ept_x_only ept_ad 
ept_1gb flexpriority tsc_offset vtpr mtf vapic ept vpid 
unrestricted_guest ple
bugs: cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass 
l1tf mds swapgs itlb_multihit srbds

bogomips: 4789.10
clflush size: 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:

processor   : 1
vendor_id   : GenuineIntel
cpu family  : 6
model   : 60
model name  : Intel(R) Core(TM) i3-4000M CPU @ 2.40GHz
stepping: 3
microcode   : 0x12
cpu MHz : 2400.000
cache size  : 3072 KB
physical id : 0
siblings: 4
core id : 0
cpu cores   : 2
apicid  : 1
initial apicid  : 1
fpu : yes
fpu_exception   : yes
cpuid level : 13
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge 
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe 
syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good 
nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor 
ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 
movbe popcnt xsave avx f16c rdrand lahf_lm abm cpuid_fault epb 
invpcid_single pti tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase 
tsc_adjust smep erms invpcid xsaveopt dtherm arat pln pts
vmx flags   : vnmi preemption_timer invvpid ept_x_only ept_ad 
ept_1gb flexpriority tsc_offset vtpr mtf vapic ept vpid 
unrestricted_guest ple
bugs: cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass 
l1tf mds swapgs itlb_multihit srbds

bogomips: 4789.10
clflush size: 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:

processor   : 2
vendor_id   : GenuineIntel
cpu family  : 6
model   : 60
model name  : Intel(R) Core(TM) i3-4000M CPU @ 2.40GHz
stepping: 3
microcode   : 0x12
cpu MHz : 2400.000
cache size  : 3072 KB
physical id : 0
siblings: 4
core id : 1
cpu cores   : 2
apicid  : 2
initial apicid  : 2
fpu : yes
fpu_exception   : yes
cpuid level : 13
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge 
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe 
syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good 
nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor 
ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 
movbe popcnt xsave avx f16c rdrand lahf_lm abm cpuid_fault epb 
invpcid_single pti tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase 
tsc_adjust smep erms invpcid xsaveopt dtherm arat pln pts
vmx flags   : vnmi preemption_timer invvpid ept_x_only ept_ad 
ept_1gb flexpriority tsc_offset vtpr mtf vapic ept vpid 
unrestricted_guest ple
bugs: cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass 
l1tf mds swapgs itlb_multihit srbds

bogomips: 4789.10
clflush size: 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:

processor   : 3
vendor_id   : GenuineIntel
cpu family  : 6
model   : 60
model name  : Intel(R) Core(TM) i3-4000M CPU @ 2.40GHz
stepping: 3
microcode   : 0x12
cpu MHz : 2400.000
cache size  : 3072 KB
physical id : 0
siblings: 4
core id : 1
cpu cores   : 2
apicid  : 3
initial apicid  : 3
fpu : yes
fpu_exception   : yes
cpuid 


2022-09-14 Thread debian-bug-report

Package: libc6
Version: 2.31-13+deb11u4
Severity: critical

Dear Maintainer,

After an upgrade to version +deb11u4 on my system running a Haswell
(4th gen Intel Core) CPU, most programs, including bash and dpkg,
immediately crash with SIGILL. The problem seems to be caused by, or
related to, AVX2 and changes made to some functions that use this 
instruction set. I don't know much about Debian bug reporting, so 
please forgive any mistakes I've made.

The issue occurs on the host, in LXC, and in Docker.
I have described it in more detail at this link:
https://github.com/debuerreotype/docker-debian-artifacts/issues/175
where I also linked a coredump from an example program.


Coredump link directly just in case: 
https://github.com/debuerreotype/docker-debian-artifacts/files/9569748/core.bash.10.2663c40e671041e6b40c882a70b83c3f.1480736.166318582400.zip


Also log lines from kernel:
kernel: [834669.721253] traps: dpkg[1455373] trap invalid opcode ip:7fa39701951d sp:7ffc4ad26e58 error:0 in libc-2.31.so[7fa396edd000+15a000]
kernel: [834669.732958] traps: dpkg[1455374] trap invalid opcode ip:7f529ca9551d sp:7fffb6f0a238 error:0 in libc-2.31.so[7f529c959000+15a000]
kernel: [834669.840128] traps: dpkg[1455375] trap invalid opcode ip:7f1874cc951d sp:7fffc2c2f5d8 error:0 in libc-2.31.so[7f1874b8d000+15a000]
kernel: [834669.907918] traps: dpkg[1455378] trap invalid opcode ip:7f3b4f8d851d sp:7fff3ec970f8 error:0 in libc-2.31.so[7f3b4f79c000+15a000]
kernel: [834712.152139] traps: passwd[1455693] trap invalid opcode ip:7fefee4b52b7 sp:7cb506b8 error:0 in libc-2.31.so[7fefee37d000+15a000]
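
Because of address space layout randomization the raw ip values differ between processes, so it helps to subtract the mapping base printed in brackets. The sketch below does that; the name trap_offset and the regular expression are my own, and it assumes each log entry is on one line. The result is relative to the start of the mapping named in the log, which is not necessarily the start of the library file.

```python
import re

TRAP_RE = re.compile(
    r"traps: (\S+)\[(\d+)\] trap invalid opcode "
    r"ip:([0-9a-f]+) sp:[0-9a-f]+ error:\d+ in (\S+)\[([0-9a-f]+)\+([0-9a-f]+)\]"
)


def trap_offset(log_line):
    """Return (program, library, ip - mapping base) or None if the line does not match."""
    m = TRAP_RE.search(log_line)
    if m is None:
        return None
    prog, _pid, ip, lib, base, _size = m.groups()
    return prog, lib, int(ip, 16) - int(base, 16)


if __name__ == "__main__":
    line = (
        "kernel: [834669.721253] traps: dpkg[1455373] trap invalid opcode "
        "ip:7fa39701951d sp:7ffc4ad26e58 error:0 in libc-2.31.so[7fa396edd000+15a000]"
    )
    prog, lib, off = trap_offset(line)
    print(f"{prog}: {lib}+{off:#x}")
```

Applied to the lines above, all four dpkg traps land at the same offset, 0x13c51d, and the passwd trap at 0x1382b7.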


I'm not sure what exactly is causing the issue, but if these changes 
aren't pulled, potentially anyone with this or a similar CPU will 
upgrade and end up with a bricked system.
I will try the `clearcpuid=293` kernel option myself, but consider how 
many distros, live CDs etc. depend on Debian, and how many people will 
be unable to figure out why their system became useless, unable to 
trace the source, and will just blame it on Linux...
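
For reference, the number passed to `clearcpuid=` can be decoded as follows. This sketch relies on my understanding of the kernel's feature numbering (word * 32 + bit, with word 9 holding the EBX bits of CPUID leaf 7, subleaf 0); the function names and the small lookup table are made up for illustration and cover only the three flags relevant here.

```python
def decode_clearcpuid(n):
    """Split a kernel CPU feature number into (word, bit)."""
    return divmod(n, 32)


# Assumed: word 9 of the kernel's feature table = CPUID leaf 7, subleaf 0, EBX.
# Only the bits relevant to this report are listed.
LEAF7_EBX_BITS = {3: "bmi1", 5: "avx2", 8: "bmi2"}


def describe(n):
    """Return the flag name for a feature number, or 'unknown' if not in the table."""
    word, bit = decode_clearcpuid(n)
    if word == 9:
        return LEAF7_EBX_BITS.get(bit, "unknown")
    return "unknown"


if __name__ == "__main__":
    word, bit = decode_clearcpuid(293)
    print(f"293 -> word {word}, bit {bit}: {describe(293)}")
```

So 293 is word 9, bit 5, which is why booting with this option removes "avx2" from /proc/cpuinfo.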


I'm filing this bug report from my host system, which I have downgraded 
to the previous libc6 version.


   * What led up to the situation? apt upgrade...
   * What exactly did you do (or not do) that was effective (or
 ineffective)? downgrade to +deb11u3
   * What was the outcome of this action? everything works on the older 
version

   * What outcome did you expect instead?


-- System Information:
Debian Release: 11.4
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable-security'), (500, 
'stable')

Architecture: amd64 (x86_64)

Kernel: Linux 5.15.39-1-pve (SMP w/4 CPU threads)
Kernel taint flags: TAINT_PROPRIETARY_MODULE, TAINT_OOT_MODULE
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE 
not set

Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages libc6 depends on:
ii  libcrypt1  1:4.4.18-4
ii  libgcc-s1  10.2.1-6

Versions of packages libc6 recommends:
ii  libidn2-0   2.3.0-5
pn  libnss-nis  
pn  libnss-nisplus  

Versions of packages libc6 suggests:
ii  debconf [debconf-2.0]  1.5.77
pn  glibc-doc  
ii  libc-l10n  2.31-13+deb11u3
ii  locales  2.31-13+deb11u3

-- debconf information:
  glibc/disable-screensaver:
  glibc/restart-services:
  glibc/kernel-not-supported:
  glibc/kernel-too-old:
  libraries/restart-without-asking: false
  glibc/restart-failed:
  glibc/upgrade: true