Bug#1036644: linux-image-6.1.0-9-amd64: System crashes. Netconsole reports CPUs not responding to MCE broadcast
On Wed, 2023-08-09 at 13:35 +0200, kolafl...@kolahilft.de wrote:
> I was able to resolve the issue with my ASRock J4105-ITX board (Celeron
> J4105) on Debian-12. I just had to
> apt install intel-microcode
> and reboot. After 4 weeks of running the system 23/7 I had no further
> crashes.
>
> Without that intel-microcode package
> /sys/devices/system/cpu/cpu*/microcode/version
> is 0x26 (38 decimal).
> And with intel-microcode-3.20230512.1 the version is 0x3e (decimal 62).
>
> I'm not sure why intel-microcode was not installed by default. All my
> other Debian computers have that package installed automatically.
[...]

We didn't use to install intel-microcode by default because it's
non-free. Starting with Debian 12, non-free firmware and microcode are
installed on systems where they are useful, but upgrades from older
versions won't change this.

But I'm fairly sure what you've found is a different issue from what
Olivier originally reported.

Ben.

--
Ben Hutchings
[W]e found...that it wasn't as easy to get programs right as we had
thought. I realized that a large part of my life from then on was going
to be spent in finding mistakes in my own programs.
 - Maurice Wilkes, 1949
Bug#1036644: linux-image-6.1.0-9-amd64: System crashes. Netconsole reports CPUs not responding to MCE broadcast
I was able to resolve the issue with my ASRock J4105-ITX board (Celeron J4105) on Debian-12. I just had to
  apt install intel-microcode
and reboot. After 4 weeks of running the system 23/7 I had no further crashes.

Without that intel-microcode package
  /sys/devices/system/cpu/cpu*/microcode/version
is 0x26 (38 decimal).
And with intel-microcode-3.20230512.1 the version is 0x3e (decimal 62).

I'm not sure why intel-microcode was not installed by default. All my other Debian computers have that package installed automatically.

I had some very rare crashes with Debian-11. So Debian-12 maybe just worsened the issue and it already existed before updating to Debian-12.

In theory a BIOS update might also do the job. I run BIOS 1.40 and there is a 1.60 version from 2021. But 1.60 does not list a microcode update.
https://www.asrock.com/mb/Intel/J4105-ITX/#BIOS
(haven't updated to BIOS 1.60 yet)

Kind regards,
kolAflash
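As an illustration of the version check described above, here is a minimal Python sketch that reads the per-CPU microcode revision from sysfs and prints it in both hex and decimal. The sysfs path is the one quoted in the message; the function names are hypothetical, and the sketch assumes the file contains a hex string such as `0x3e`.

```python
import glob


def parse_microcode_version(text):
    """Convert a sysfs microcode version string such as '0x3e' to an int."""
    return int(text.strip(), 16)


def read_microcode_versions(
    pattern="/sys/devices/system/cpu/cpu*/microcode/version",
):
    """Return {path: version} for every CPU exposing a microcode version.

    The glob also matches directories such as cpufreq or cpuidle, but they
    have no microcode/version file, so they simply produce no entries.
    """
    versions = {}
    for path in sorted(glob.glob(pattern)):
        with open(path) as handle:
            versions[path] = parse_microcode_version(handle.read())
    return versions


# Usage (prints nothing on systems without this sysfs file):
#   for path, version in read_microcode_versions().items():
#       print(f"{path}: {version:#x} ({version} decimal)")
```

Running the usage lines before and after installing intel-microcode (and rebooting) should show whether the revision actually changed.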
Bug#1036644: linux-image-6.1.0-9-amd64: System crashes. Netconsole reports CPUs not responding to MCE broadcast
On Mon, Jun 12, 2023 at 12:49:25PM +0200, Olivier Berger wrote:
> Hi.
>
> I can confirm the reproduction of the same kind of crash, this time without
> wifi activated.
>
> It seems to occur whenever I'm away from the machine for a while, probably
> linked to screen saving condition.
>

For the record, the video card is reported as "00:02.0 VGA compatible controller: Intel Corporation TigerLake-LP GT2 [Iris Xe Graphics] (rev 01)", offering an HDMI port (on an HP ProBook laptop), to which I connect a screen via an HDMI to DisplayPort adapter.

I suspect some kind of weird corner case linked to that adapter, which is the "HDMI to DisplayPort Adapter - 4K Ready" from Cable Matters (https://www.cablematters.com/pc-825-139-hdmi-to-displayport-adapter-4k-ready.aspx )

Just my 2 more cents,
--
Olivier BERGER
https://www-public.imtbs-tsp.eu/~berger_o/ - OpenPGP 2048R/0xF9EAE3A65819D7E8
Ingenieur Recherche - Dept INF
Institut Mines-Telecom, Telecom SudParis, Evry (France)
Bug#1036644: linux-image-6.1.0-9-amd64: System crashes. Netconsole reports CPUs not responding to MCE broadcast
Hi.

I can confirm the reproduction of the same kind of crash, this time without wifi activated.

It seems to occur whenever I'm away from the machine for a while, probably linked to screen saving condition.

Hope this helps,

On Fri, Jun 09, 2023 at 08:58:52AM +0200, Olivier Berger wrote:
>
> As a followup, I've been able to get another crash, this time when netconsole
> was on, and got a bunch of traces, in the attached logs.
>
> Hope this helps identify the culprit... probably i915/drm ?
>
> The title of the bug report should be changed, but I'm not sure how best to
> retitle.
>
--
Olivier BERGER
https://www-public.imtbs-tsp.eu/~berger_o/ - OpenPGP 2048R/0xF9EAE3A65819D7E8
Ingenieur Recherche - Dept INF
Institut Mines-Telecom, Telecom SudParis, Evry (France)

[ 1192.922330] netpoll: netconsole: local port
[ 1192.922341] netpoll: netconsole: local IPv4 address 192.168.1.32
[ 1192.922345] netpoll: netconsole: interface 'enp2s0'
[ 1192.922347] netpoll: netconsole: remote port
[ 1192.922350] netpoll: netconsole: remote IPv4 address 192.168.1.25
[ 1192.922352] netpoll: netconsole: remote ethernet address 38:2c:4a:b1:63:94
[ 1192.922461] printk: console [netcon0] enabled
[ 1192.922468] netconsole: network logging started
[ 1793.154776] mce: CPU#1: Unexpected int18 (Machine Check)
[ 1793.154809] mce: CPU#5: Unexpected int18 (Machine Check)
[ 1794.400586] ------------[ cut here ]------------
[ 1794.400600] DPLL 0 assertion failure (expected on, current off)
[ 1794.400763] WARNING: CPU: 1 PID: 1163 at drivers/gpu/drm/i915/display/intel_dpll_mgr.c:191 assert_shared_dpll+0x10a/0x120 [i915]
[ 1794.400977] Modules linked in: netconsole xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo xt_addrtype nft_compat nf_tables nfnetlink br_netfilter bridge stp llc vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) ctr ccm rfcomm snd_seq_dummy snd_hrtimer snd_seq cmac algif_hash algif_skcipher af_alg qrtr cpufreq_ondemand cpufreq_conservative cpufreq_powersave overlay squashfs cpufreq_userspace bnep binfmt_misc snd_ctl_led snd_soc_skl_hda_dsp snd_soc_intel_hda_dsp_common snd_soc_hdac_hdmi snd_sof_probes snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_soc_dmic snd_sof_pci_intel_tgl snd_sof_intel_hda_common soundwire_intel soundwire_generic_allocation soundwire_cadence snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp iwlmvm snd_sof snd_sof_utils snd_soc_hdac_hda x86_pkg_temp_thermal snd_hda_ext_core intel_powerclamp snd_soc_acpi_intel_match coretemp mac80211 snd_soc_acpi snd_soc_core
[ 1794.401079] mei_hdcp snd_compress soundwire_bus intel_rapl_msr btusb libarc4 btrtl pmt_telemetry pmt_class btbcm btintel btmtk bluetooth kvm_intel snd_hda_intel snd_intel_dspcfg iwlwifi kvm uvcvideo jitterentropy_rng snd_intel_sdw_acpi irqbypass cfg80211 snd_usb_audio videobuf2_vmalloc snd_hda_codec videobuf2_memops drbg snd_usbmidi_lib videobuf2_v4l2 ansi_cprng snd_hda_core hp_wmi processor_thermal_device_pci_legacy rapl snd_rawmidi processor_thermal_device videobuf2_common nls_ascii platform_profile ecdh_generic snd_hwdep iTCO_wdt snd_seq_device processor_thermal_rfim intel_cstate ucsi_acpi intel_uncore snd_pcm videodev snd_timer typec_ucsi pcspkr nls_cp437 processor_thermal_mbox processor_thermal_rapl intel_pmc_bxt vfat mei_me snd roles intel_rapl_common iTCO_vendor_support fat wmi_bmof ee1004 mc int3403_thermal watchdog soundcore ecc mei rfkill intel_vsec typec joydev igen6_edac intel_soc_dts_iosf int340x_thermal_zone ac intel_hid int3400_thermal intel_pmc_core acpi_thermal_rel
[ 1794.401185] sparse_keymap acpi_pad hid_multitouch evdev serio_raw nfsd auth_rpcgss msr parport_pc nfs_acl ppdev lockd lp grace parport fuse loop dm_mod efi_pstore configfs sunrpc ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 hid_logitech_hidpp hid_logitech_dj usbhid btrfs blake2b_generic zstd_compress efivarfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod i915 nvme drm_buddy crc32_pclmul i2c_algo_bit crc32c_intel nvme_core drm_display_helper t10_pi hid_generic xhci_pci ghash_clmulni_intel xhci_hcd cec crc64_rocksoft_generic rc_core rtsx_pci_sdmmc crc64_rocksoft crc_t10dif ttm crct10dif_generic mmc_core usbcore i2c_i801 drm_kms_helper r8169 intel_lpss_pci intel_lpss i2c_hid_acpi realtek i2c_hid crct10dif_pclmul crc64 mdio_devres aesni_intel drm crypto_simd cryptd libphy rtsx_pci i2c_smbus crct10dif_common idma64 vmd usb_common battery hid video wmi button sha512_ssse3
[ 1794.401312] sha512_generic
[ 1794.401328] CPU: 1 PID: 1163 Comm: Xorg Tainted: G OE 6.1.0-9-amd64 #1 Debian 6.1.27-1
[ 1794.401339] Hardware name: HP HP ProBook 450 G8 Notebook PC/87E1, BIOS T70 Ver. 01.13.01 03/30/2023
[ 1794.401346] RIP: 0010:assert_shared_dpll+0x10a/0x120 [i915]
[ 1794.401579] Code: ed 48
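To relate the log above to the "away from the machine for a while" observation, a short Python sketch can parse the bracketed timestamps and measure the gap between two events. The regular expression, the function names and the marker strings are illustrative choices, not part of any kernel tooling; the sample lines are copied from the log above.

```python
import re

# Matches "[ 1793.154776] message" as printed by the kernel with timestamps on.
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s?(.*)$")


def parse_line(line):
    """Return (seconds_since_boot, message), or None for non-log lines."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    return float(match.group(1)), match.group(2)


def seconds_between(lines, start_marker, end_marker):
    """Seconds from the first line containing start_marker to the first
    later line containing end_marker; None if either is missing."""
    start = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        timestamp, message = parsed
        if start is None:
            if start_marker in message:
                start = timestamp
        elif end_marker in message:
            return timestamp - start
    return None


SAMPLE = [
    "[ 1192.922468] netconsole: network logging started",
    "[ 1793.154776] mce: CPU#1: Unexpected int18 (Machine Check)",
    "[ 1793.154809] mce: CPU#5: Unexpected int18 (Machine Check)",
]

gap = seconds_between(SAMPLE, "network logging started", "Unexpected int18")
print(f"first machine check {gap:.1f} s after netconsole started")
```

On this sample the gap is roughly ten minutes, which would be consistent with an idle or screen-blanking timeout, though the log alone does not prove that link.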
Bug#1036644: linux-image-6.1.0-9-amd64: System crashes. Netconsole reports CPUs not responding to MCE broadcast
Hi.

As a followup, I've been able to get another crash, this time when netconsole was on, and got a bunch of traces, in the attached logs.

Hope this helps identify the culprit... probably i915/drm ?

The title of the bug report should be changed, but I'm not sure how best to retitle.

Best regards,

On Wed, May 24, 2023 at 01:35:31PM +0200, Olivier Berger wrote:
> The i915 hint is interesting.
>
> Salvatore Bonaccorso writes:
>
> >
> > Would you be able to bisect the changes between 6.1.20 and 6.1.27 to
> > identify the culprit, though not instantly triggerable? Maybe
> > focusing around the i915 changes, I stumbled over a2b6e99d8a62
> > ("drm/i915: Disable DC states for all commits") which was backported
> > to 6.1.23.
> >
--
Olivier BERGER
https://www-public.imtbs-tsp.eu/~berger_o/ - OpenPGP 2048R/0xF9EAE3A65819D7E8
Ingenieur Recherche - Dept INF
Institut Mines-Telecom, Telecom SudParis, Evry (France)

[ 118.158855] netpoll: netconsole: local port
[ 118.158865] netpoll: netconsole: local IPv4 address 192.168.0.35
[ 118.158870] netpoll: netconsole: interface 'wlp0s20f3'
[ 118.158872] netpoll: netconsole: remote port
[ 118.158874] netpoll: netconsole: remote IPv4 address 192.168.0.47
[ 118.158877] netpoll: netconsole: remote ethernet address 38:2c:4a:b1:63:94
[ 118.159010] ------------[ cut here ]------------
[ 118.159012] WARNING: CPU: 3 PID: 3290 at net/mac80211/tx.c:3723 ieee80211_tx_dequeue+0xcb3/0xd30 [mac80211]
[ 118.159102] Modules linked in: netconsole(+) xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo xt_addrtype nft_compat nf_tables nfnetlink br_netfilter bridge stp llc vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) ctr ccm rfcomm snd_seq_dummy snd_hrtimer snd_seq cmac algif_hash algif_skcipher af_alg squashfs cpufreq_ondemand qrtr cpufreq_conservative cpufreq_powersave overlay bnep cpufreq_userspace hid_logitech_hidpp binfmt_misc nls_ascii nls_cp437 vfat fat snd_ctl_led snd_soc_skl_hda_dsp snd_soc_intel_hda_dsp_common snd_soc_hdac_hdmi snd_sof_probes snd_hda_codec_hdmi hid_logitech_dj snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_soc_dmic snd_sof_pci_intel_tgl iwlmvm snd_sof_intel_hda_common btusb btrtl btbcm btintel soundwire_intel btmtk soundwire_generic_allocation mac80211 soundwire_cadence snd_sof_intel_hda snd_sof_pci bluetooth snd_sof_xtensa_dsp snd_sof snd_usb_audio snd_sof_utils
[ 118.159163] snd_soc_hdac_hda libarc4 snd_hda_ext_core snd_soc_acpi_intel_match snd_usbmidi_lib snd_soc_acpi snd_rawmidi x86_pkg_temp_thermal intel_powerclamp usbhid snd_seq_device snd_soc_core iwlwifi coretemp snd_compress jitterentropy_rng soundwire_bus joydev drbg snd_hda_intel mei_hdcp kvm_intel snd_intel_dspcfg snd_intel_sdw_acpi pmt_telemetry snd_hda_codec intel_rapl_msr pmt_class ansi_cprng uvcvideo cfg80211 kvm snd_hda_core hp_wmi videobuf2_vmalloc snd_hwdep platform_profile irqbypass ecdh_generic videobuf2_memops videobuf2_v4l2 snd_pcm processor_thermal_device_pci_legacy processor_thermal_device rapl processor_thermal_rfim videobuf2_common processor_thermal_mbox snd_timer iTCO_wdt intel_cstate processor_thermal_rapl ucsi_acpi intel_uncore videodev typec_ucsi intel_pmc_bxt snd iTCO_vendor_support roles mei_me intel_rapl_common mc pcspkr ecc wmi_bmof ee1004 watchdog soundcore mei rfkill intel_vsec igen6_edac typec intel_soc_dts_iosf int3403_thermal int340x_thermal_zone
[ 118.159218] int3400_thermal acpi_thermal_rel intel_hid sparse_keymap intel_pmc_core acpi_pad ac hid_multitouch serio_raw evdev nfsd msr parport_pc auth_rpcgss ppdev nfs_acl lockd lp grace parport fuse loop dm_mod efi_pstore configfs sunrpc ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 btrfs blake2b_generic zstd_compress efivarfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod i915 drm_buddy i2c_algo_bit crc32_pclmul drm_display_helper nvme crc32c_intel nvme_core cec hid_generic rc_core rtsx_pci_sdmmc t10_pi ghash_clmulni_intel mmc_core ttm i2c_hid_acpi crc64_rocksoft_generic crc64_rocksoft drm_kms_helper r8169 crc_t10dif realtek xhci_pci mdio_devres crct10dif_generic i2c_hid aesni_intel xhci_hcd intel_lpss_pci crct10dif_pclmul crypto_simd i2c_i801 intel_lpss crc64 cryptd drm usbcore i2c_smbus libphy rtsx_pci crct10dif_common idma64 usb_common vmd hid battery video wmi button
[ 118.159289] sha512_ssse3 sha512_generic
[ 118.159292] CPU: 3 PID: 3290 Comm: modprobe Tainted: G OE 6.1.0-9-amd64 #1 Debian 6.1.27-1
[ 118.159298] Hardware name: HP HP ProBook 450 G8 Notebook PC/87E1, BIOS T70 Ver. 01.13.01 03/30/2023
[ 118.159300] RIP: 0010:ieee80211_tx_dequeue+0xcb3/0xd30 [mac80211]
[ 118.159374] Code: ff ff 01 ce 48 89 ef 29 d6 e8 09 ab 35 d5 48 85 c0 0f 84 23 f8 ff ff 0f b7 85 b8 00 00 00 48 03 85 c8 00 00 00 e9 f5 f7 ff ff <0f> 0b e9 ab f3 ff ff
Bug#1036644: linux-image-6.1.0-9-amd64: System crashes. Netconsole reports CPUs not responding to MCE broadcast
Hi.

I'm afraid this would be much beyond my capacity, sorry.

The i915 hint is interesting.

Thanks.

Best regards,

Salvatore Bonaccorso writes:

> Control: tags -1 + moreinfo
>
> Hi Olivier,
>
>
> Would you be able to bisect the changes between 6.1.20 and 6.1.27 to
> identify the culprit, though not instantly triggerable? Maybe
> focusing around the i915 changes, I stumbled over a2b6e99d8a62
> ("drm/i915: Disable DC states for all commits") which was backported
> to 6.1.23.
>
> Regards,
> Salvatore
>
--
Olivier BERGER
https://www-public.imtbs-tsp.eu/~berger_o/ - OpenPGP 2048R/0xF9EAE3A65819D7E8
Ingenieur Recherche - Dept INF
Institut Mines-Telecom, Telecom SudParis, Evry (France)
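For readers unfamiliar with bisection, the idea behind the request above can be sketched in Python as a binary search over the stable point releases between the known-good 6.1.20 and the known-bad 6.1.27. This is only an illustration of the search strategy, not a replacement for `git bisect`; the function name is hypothetical, and the `is_bad` predicate below is a made-up stand-in for "boot this kernel and wait to see whether it crashes". Nobody in the thread confirmed which release introduced the problem.

```python
def first_bad(candidates, is_bad):
    """Binary-search an ordered list for the first bad entry.

    Assumes every entry before the first bad one is good, every entry
    after it is bad, and the last entry is known to be bad.
    Returns (first_bad_entry, number_of_tests_run).
    """
    low, high = 0, len(candidates) - 1
    tests = 0
    while low < high:
        mid = (low + high) // 2
        tests += 1
        if is_bad(candidates[mid]):
            high = mid          # first bad entry is at mid or earlier
        else:
            low = mid + 1       # first bad entry is after mid
    return candidates[low], tests


# 6.1.20 is reported good, 6.1.27 bad, so the candidates are 6.1.21..6.1.27.
RELEASES = [f"6.1.{n}" for n in range(21, 28)]


def hypothetical_is_bad(version):
    # ASSUMPTION for illustration only: pretend the regression arrived in
    # 6.1.23, the release the i915 commit mentioned above was backported to.
    return int(version.split(".")[2]) >= 23


culprit, tests = first_bad(RELEASES, hypothetical_is_bad)
print(f"first bad release: {culprit} after {tests} tests")
```

Because the crash is not instantly triggerable, each test may mean running a kernel for hours, which is why halving the candidate range at every step matters.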
Bug#1036644: linux-image-6.1.0-9-amd64: System crashes. Netconsole reports CPUs not responding to MCE broadcast
Control: tags -1 + moreinfo

Hi Olivier,

On Tue, May 23, 2023 at 06:49:00PM +0200, Olivier Berger wrote:
> Package: src:linux
> Version: 6.1.27-1
> Severity: normal
>
> Hi.
>
> I'm experiencing crashes (computer reset or completely shutting down) without
> many details available on why. It used to work fine with 6.1.0-7 but has had
> problems with the 2 later updates of the testing kernel.
>
> I've managed to get a log of the kernel panic with netconsole (otherwise
> wouldn't get any hints whatsoever in logs on disks after restarting), below.
>
> I guess this is nasty as being close to the freeze. I've had the issue for a
> few days now, but only managed to test a netconsole remote log today.
>
> It seems to me that the crash mainly happens when I'm away from the laptop for
> several minutes, so maybe related to some kind of energy saving stuff...
>
> Hope this provides enough details to help.
>
> [ 394.735702] netpoll: netconsole: local port
> [ 394.735711] netpoll: netconsole: local IPv4 address 192.168.0.23
> [ 394.735715] netpoll: netconsole: interface 'enp2s0'
> [ 394.735717] netpoll: netconsole: remote port
> [ 394.735719] netpoll: netconsole: remote IPv4 address 192.168.0.47
> [ 394.735722] netpoll: netconsole: remote ethernet address 38:2c:4a:b1:63:94
> [ 394.735819] printk: console [netcon0] enabled
> [ 394.735825] netconsole: network logging started
> [ 463.655009] usb 3-6: new high-speed USB device number 8 using xhci_hcd
> [ 463.659448] systemd-journald[428]: Sent WATCHDOG=1 notification.
> [ 463.943099] usb 3-6: New USB device found, idVendor=1307, idProduct=0190, bcdDevice= 1.00
> [ 463.943133] usb 3-6: New USB device strings: Mfr=1, Product=2, SerialNumber=3
> [ 463.943144] usb 3-6: Product: USB Mass Storage Device
> [ 463.943153] usb 3-6: Manufacturer: USBest Technology
> [ 463.943160] usb 3-6: SerialNumber: 00027F
> [ 463.974560] systemd-journald[428]: Successfully sent stream file descriptor to service manager.
> [ 463.974717] systemd-journald[428]: Successfully sent stream file descriptor to service manager.
> [ 463.987184] SCSI subsystem initialized
> [ 463.990687] usb-storage 3-6:1.0: USB Mass Storage device detected
> [ 463.990771] scsi host0: usb-storage 3-6:1.0
> [ 463.990859] usbcore: registered new interface driver usb-storage
> [ 463.992482] usbcore: registered new interface driver uas
> [ 464.995952] scsi 0:0:0:0: Direct-Access Ut190USB2FlashStorage 0.00 PQ: 0 ANSI: 2
> [ 464.996613] scsi 0:0:0:1: Direct-Access Ut190SD0StorageDevice 0.00 PQ: 0 ANSI: 2
> [ 465.008300] scsi 0:0:0:0: Attached scsi generic sg0 type 0
> [ 465.008343] scsi 0:0:0:1: Attached scsi generic sg1 type 0
> [ 465.014353] sd 0:0:0:0: [sda] 7897088 512-byte logical blocks: (4.04 GB/3.77 GiB)
> [ 465.014619] sd 0:0:0:1: [sdb] Media removed, stopped polling
> [ 465.014756] sd 0:0:0:0: [sda] Write Protect is off
> [ 465.014764] sd 0:0:0:0: [sda] Mode Sense: 00 00 00 00
> [ 465.014804] sd 0:0:0:1: [sdb] Attached SCSI removable disk
> [ 465.014951] sd 0:0:0:0: [sda] Asking for cache data failed
> [ 465.014957] sd 0:0:0:0: [sda] Assuming drive cache: write through
> [ 465.284600] GPT:Primary header thinks Alt. header is not at the end of the disk.
> [ 465.284627] GPT:2590719 != 7897087
> [ 465.284634] GPT:Alternate GPT header not at the end of the disk.
> [ 465.284640] GPT:2590719 != 7897087
> [ 465.284645] GPT: Use GNU Parted to correct GPT errors.
> [ 465.284659] sda: sda1
> [ 465.285144] sd 0:0:0:0: [sda] Attached SCSI removable disk
> [ 474.111368] systemd-journald[428]: Successfully sent stream file descriptor to service manager.
> [ 497.264500] sda: detected capacity change from 7897088 to 0
> [ 502.045711] usb 3-6: USB disconnect, device number 8
> [ 519.695345] systemd-journald[428]: Successfully sent stream file descriptor to service manager.
> [ 535.857315] EXT4-fs (dm-0): recovery complete
> [ 535.858056] EXT4-fs (dm-0): mounted filesystem with ordered data mode. Quota mode: none.
> [ 543.576681] systemd-journald[428]: Sent WATCHDOG=1 notification.
> [ 551.263395] systemd-journald[428]: Successfully sent stream file descriptor to service manager.
> [ 634.375963] systemd-journald[428]: Sent WATCHDOG=1 notification.
> [ 725.578095] systemd-journald[428]: Sent WATCHDOG=1 notification.
> [ 845.577721] systemd-journald[428]: Sent WATCHDOG=1 notification.
> [ 871.117193] systemd-journald[428]: Successfully sent stream file descriptor to service manager.
> [ 905.577391] systemd-journald[428]: Sent WATCHDOG=1 notification.
> [ 905.620289] systemd-journald[428]: Successfully sent stream file descriptor to service manager.
> [ 905.623541] systemd-journald[428]: Successfully sent stream file descriptor to service manager.
> [ 995.577111] systemd-journald[428]: Sent WATCHDOG=1 notification.
> [ 1085.576193] systemd-journald[428]: Sent WATCHDOG=1 notification.
> [
Bug#1036644: linux-image-6.1.0-9-amd64: System crashes. Netconsole reports CPUs not responding to MCE broadcast
Hi.

Diederik de Haas writes:

>
> The stack traces should be useful for someone who understands those (which
> isn't me), but I did notice several other items:
>
> - [ 465.284645] GPT: Use GNU Parted to correct GPT errors
>   That happened after you plugged in a USB drive?
>   I would follow that advice, but it would be useful to get that USB drive
>   'out of the equation'.
>   Does the issue also occur when that USB drive isn't used?
>   The kernel seems to assign both sda and sdb before settling on sda(1)?
>   Not sure what to make of that, but it doesn't look good
>

I guess the USB drive has nothing to do with the issue, AFAIU.

Actually, I just wanted to be sure that netconsole was indeed capturing kernel events, as suggested by a howto on remote debugging of kernel panics with netconsole.

And FYI, this is a USB key that embeds an SD card reader, hence the 2 drives that pop up... as for GPT, dunno, maybe a formatting mistake.

In any case, the laptop has crashed in the past even when no such USB key was plugged in.

> - [ 535.857315] EXT4-fs (dm-0): recovery complete
>   I can understand a FS recovery when you're dealing with a freeze/crash,
>   but I find the timing a 'bit' unusual. After 9.5 minutes, I doubt it's the
>   primary/boot drive (and we had the USB drive before that), so where
>   is that coming from?
>

That's a LUKS partition that I mounted after a while, for secrets stored on the hard drive in a dedicated partition. As the laptop crashed in the previous session with the partition mounted, that explains the FS recovery at mount time. Nothing strange here either.

> - [ 543.576681] systemd-journald[428]: Sent WATCHDOG=1 notification
>   I'm not really sure what that means, but afaik a watchdog is used to
>   (automatically) reboot the machine if the system hangs.
>   So seeing that message numerous times, is worrisome. And it looks like it
>   doesn't do its actual job?
>

I booted with 'debug ignore_loglevel' as kernel arguments... maybe that explains the occurrence of such logs... dunno exactly if this is worrisome.

> - BIOS T70 Ver. 01.13.01 03/30/2023
>   Can you check whether there is a newer BIOS version available?
>   I believe 'NMI' is BIOS related, so it may have an effect.

I just updated the HP BIOS to the latest available version in the last day, but crashes were occurring before that too... maybe related, but nothing more can be updated for the moment, at least from what the Windows HP Support Assistant shows.

Thanks for your help.

Best regards,
--
Olivier BERGER
https://www-public.imtbs-tsp.eu/~berger_o/ - OpenPGP 2048R/0xF9EAE3A65819D7E8
Ingenieur Recherche - Dept INF
Institut Mines-Telecom, Telecom SudParis, Evry (France)
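Since the reply above relies on netconsole to capture kernel messages remotely, here is a minimal Python sketch of a receiver for the remote machine. netconsole sends plain UDP datagrams; port 6666 below is netconsole's documented default remote port and is an assumption here, because the logs in this thread do not show which port was used. The function names are hypothetical.

```python
import socket


def open_listener(host="0.0.0.0", port=6666):
    """Bind a UDP socket on which netconsole datagrams can be received.

    port=6666 is an ASSUMPTION (netconsole's default remote port); use
    whatever port the sending machine was configured with.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive_datagrams(sock, count, bufsize=4096):
    """Block until `count` datagrams arrive and return them as text."""
    messages = []
    for _ in range(count):
        data, _sender = sock.recvfrom(bufsize)
        messages.append(data.decode("utf-8", errors="replace"))
    return messages


# Usage on the receiving machine (runs until interrupted):
#   listener = open_listener()
#   while True:
#       for text in receive_datagrams(listener, 1):
#           print(text, end="", flush=True)
```

Redirecting that output to a file on the receiving machine keeps the last messages before a crash, which is what made the traces in this report available at all.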
Bug#1036644: linux-image-6.1.0-9-amd64: System crashes. Netconsole reports CPUs not responding to MCE broadcast
Control: found -1 6.1.25-1
Control: retitle -1 Kernel panic - not syncing: Timeout: Not all CPUs entered broadcast exception handler

On Tuesday, 23 May 2023 18:49:00 CEST Olivier Berger wrote:
> It used to work fine with 6.1.0-7 but has had problems with the 2 later
> updates of the testing kernel.

The stack traces should be useful for someone who understands those (which isn't me), but I did notice several other items:

- [ 465.284645] GPT: Use GNU Parted to correct GPT errors
  That happened after you plugged in a USB drive?
  I would follow that advice, but it would be useful to get that USB drive 'out of the equation'.
  Does the issue also occur when that USB drive isn't used?
  The kernel seems to assign both sda and sdb before settling on sda(1)?
  Not sure what to make of that, but it doesn't look good

- [ 535.857315] EXT4-fs (dm-0): recovery complete
  I can understand a FS recovery when you're dealing with a freeze/crash, but I find the timing a 'bit' unusual. After 9.5 minutes, I doubt it's the primary/boot drive (and we had the USB drive before that), so where is that coming from?

- [ 543.576681] systemd-journald[428]: Sent WATCHDOG=1 notification
  I'm not really sure what that means, but afaik a watchdog is used to (automatically) reboot the machine if the system hangs.
  So seeing that message numerous times is worrisome. And it looks like it doesn't do its actual job?

- BIOS T70 Ver. 01.13.01 03/30/2023
  Can you check whether there is a newer BIOS version available?
  I believe 'NMI' is BIOS related, so it may have an effect.
Bug#1036644: linux-image-6.1.0-9-amd64: System crashes. Netconsole reports CPUs not responding to MCE broadcast
Hi.

Just to provide a few more hints that may be useful: the latest version that works fine is linux-image-6.1.0-7-amd64 (6.1.20-2).

Sorry about the lack of clarity in the initial report.

On Tue, May 23, 2023 at 06:49:00PM +0200, Olivier Berger wrote:
>
> I'm experiencing crashes (computer reset or completely shutting down) without
> many details available on why. It used to work fine with 6.1.0-7 but has had
> problems with the 2 later updates of the testing kernel.
>
> I've managed to get a log of the kernel panic with netconsole (otherwise
> wouldn't get any hints whatsoever in logs on disks after restarting), below.
>
> I guess this is nasty as being close to the freeze. I've had the issue for a
> few days now, but only managed to test a netconsole remote log today.
>
> It seems to me that the crash mainly happens when I'm away from the laptop for
> several minutes, so maybe related to some kind of energy saving stuff...
>
> Hope this provides enough details to help.
>
--
Olivier BERGER
https://www-public.imtbs-tsp.eu/~berger_o/ - OpenPGP 2048R/0xF9EAE3A65819D7E8
Ingenieur Recherche - Dept INF
Institut Mines-Telecom, Telecom SudParis, Evry (France)
Bug#1036644: linux-image-6.1.0-9-amd64: System crashes. Netconsole reports CPUs not responding to MCE broadcast
Package: src:linux
Version: 6.1.27-1
Severity: normal

Hi.

I'm experiencing crashes (computer reset or completely shutting down) without many details available on why. It used to work fine with 6.1.0-7 but has had problems with the 2 later updates of the testing kernel.

I've managed to get a log of the kernel panic with netconsole (otherwise wouldn't get any hints whatsoever in logs on disks after restarting), below.

I guess this is nasty as being close to the freeze. I've had the issue for a few days now, but only managed to test a netconsole remote log today.

It seems to me that the crash mainly happens when I'm away from the laptop for several minutes, so maybe related to some kind of energy saving stuff...

Hope this provides enough details to help.

[ 394.735702] netpoll: netconsole: local port
[ 394.735711] netpoll: netconsole: local IPv4 address 192.168.0.23
[ 394.735715] netpoll: netconsole: interface 'enp2s0'
[ 394.735717] netpoll: netconsole: remote port
[ 394.735719] netpoll: netconsole: remote IPv4 address 192.168.0.47
[ 394.735722] netpoll: netconsole: remote ethernet address 38:2c:4a:b1:63:94
[ 394.735819] printk: console [netcon0] enabled
[ 394.735825] netconsole: network logging started
[ 463.655009] usb 3-6: new high-speed USB device number 8 using xhci_hcd
[ 463.659448] systemd-journald[428]: Sent WATCHDOG=1 notification.
[ 463.943099] usb 3-6: New USB device found, idVendor=1307, idProduct=0190, bcdDevice= 1.00
[ 463.943133] usb 3-6: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 463.943144] usb 3-6: Product: USB Mass Storage Device
[ 463.943153] usb 3-6: Manufacturer: USBest Technology
[ 463.943160] usb 3-6: SerialNumber: 00027F
[ 463.974560] systemd-journald[428]: Successfully sent stream file descriptor to service manager.
[ 463.974717] systemd-journald[428]: Successfully sent stream file descriptor to service manager.
[ 463.987184] SCSI subsystem initialized
[ 463.990687] usb-storage 3-6:1.0: USB Mass Storage device detected
[ 463.990771] scsi host0: usb-storage 3-6:1.0
[ 463.990859] usbcore: registered new interface driver usb-storage
[ 463.992482] usbcore: registered new interface driver uas
[ 464.995952] scsi 0:0:0:0: Direct-Access Ut190USB2FlashStorage 0.00 PQ: 0 ANSI: 2
[ 464.996613] scsi 0:0:0:1: Direct-Access Ut190SD0StorageDevice 0.00 PQ: 0 ANSI: 2
[ 465.008300] scsi 0:0:0:0: Attached scsi generic sg0 type 0
[ 465.008343] scsi 0:0:0:1: Attached scsi generic sg1 type 0
[ 465.014353] sd 0:0:0:0: [sda] 7897088 512-byte logical blocks: (4.04 GB/3.77 GiB)
[ 465.014619] sd 0:0:0:1: [sdb] Media removed, stopped polling
[ 465.014756] sd 0:0:0:0: [sda] Write Protect is off
[ 465.014764] sd 0:0:0:0: [sda] Mode Sense: 00 00 00 00
[ 465.014804] sd 0:0:0:1: [sdb] Attached SCSI removable disk
[ 465.014951] sd 0:0:0:0: [sda] Asking for cache data failed
[ 465.014957] sd 0:0:0:0: [sda] Assuming drive cache: write through
[ 465.284600] GPT:Primary header thinks Alt. header is not at the end of the disk.
[ 465.284627] GPT:2590719 != 7897087
[ 465.284634] GPT:Alternate GPT header not at the end of the disk.
[ 465.284640] GPT:2590719 != 7897087
[ 465.284645] GPT: Use GNU Parted to correct GPT errors.
[ 465.284659] sda: sda1
[ 465.285144] sd 0:0:0:0: [sda] Attached SCSI removable disk
[ 474.111368] systemd-journald[428]: Successfully sent stream file descriptor to service manager.
[ 497.264500] sda: detected capacity change from 7897088 to 0
[ 502.045711] usb 3-6: USB disconnect, device number 8
[ 519.695345] systemd-journald[428]: Successfully sent stream file descriptor to service manager.
[ 535.857315] EXT4-fs (dm-0): recovery complete
[ 535.858056] EXT4-fs (dm-0): mounted filesystem with ordered data mode. Quota mode: none.
[ 543.576681] systemd-journald[428]: Sent WATCHDOG=1 notification.
[ 551.263395] systemd-journald[428]: Successfully sent stream file descriptor to service manager.
[ 634.375963] systemd-journald[428]: Sent WATCHDOG=1 notification.
[ 725.578095] systemd-journald[428]: Sent WATCHDOG=1 notification.
[ 845.577721] systemd-journald[428]: Sent WATCHDOG=1 notification.
[ 871.117193] systemd-journald[428]: Successfully sent stream file descriptor to service manager.
[ 905.577391] systemd-journald[428]: Sent WATCHDOG=1 notification.
[ 905.620289] systemd-journald[428]: Successfully sent stream file descriptor to service manager.
[ 905.623541] systemd-journald[428]: Successfully sent stream file descriptor to service manager.
[ 995.577111] systemd-journald[428]: Sent WATCHDOG=1 notification.
[ 1085.576193] systemd-journald[428]: Sent WATCHDOG=1 notification.
[ 1205.575316] systemd-journald[428]: Sent WATCHDOG=1 notification.
[ 1265.574866] systemd-journald[428]: Sent WATCHDOG=1 notification.
[ 1305.267119] mce: CPUs not responding to MCE broadcast (may include false positives): 0-1,3-5,7
[ 1305.267121] mce: CPUs not responding to MCE broadcast (may include
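The `mce` line in the log above reports the non-responding CPUs as a compact range list (`0-1,3-5,7`). A small Python sketch can expand that list and show which CPUs did respond. The function names are hypothetical, and the total of 8 logical CPUs is an assumption inferred from the highest CPU number in the list, not something the log states.

```python
def expand_cpu_list(text):
    """Expand a kernel-style CPU list such as '0-1,3-5,7' into integers."""
    cpus = []
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def responding_cpus(not_responding, cpu_count):
    """Return the CPUs in range(cpu_count) that are absent from the list."""
    silent = set(expand_cpu_list(not_responding))
    return [cpu for cpu in range(cpu_count) if cpu not in silent]


# ASSUMPTION: 8 logical CPUs, inferred from the list reaching CPU 7.
print(expand_cpu_list("0-1,3-5,7"))
print(responding_cpus("0-1,3-5,7", cpu_count=8))
```

As the kernel message itself warns, the list "may include false positives", so the result is a hint about which CPUs entered the handler rather than a definitive diagnosis.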