[Kernel-packages] [Bug 2032176] Re: Crashing with CPU soft lock on GA kernel 5.15.0.79.76 and HWE kernel 5.19.0-46.47-22.04.1
This bug is awaiting verification that the linux-mtk/5.15.0-1030.34 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy-linux-mtk' to 'verification-done-jammy-linux-mtk'. If the problem still exists, change the tag 'verification-needed-jammy-linux-mtk' to 'verification-failed-jammy-linux-mtk'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

** Tags added: kernel-spammed-jammy-linux-mtk-v2 verification-needed-jammy-linux-mtk

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2032176

Title: Crashing with CPU soft lock on GA kernel 5.15.0.79.76 and HWE kernel 5.19.0-46.47-22.04.1

Status in linux package in Ubuntu: Fix Released
Status in linux source package in Jammy: Fix Released

Bug description:

Impact: We had reports of VM setups which would show intermittent crashes and then lock up completely. This could be reproduced with large memory setups. The problem seems to be that fixes to performance regressions caused more problems in 5.15 kernels, and the full fixes are too intrusive to be backported.

Fix: The following patch was recently sent to the upstream stable mailing list and looks to be making its way into linux-5.15.y. This changes the default value of kvm.tdp_mmu to off (if anyone is willing to take the risks, this can be changed back in config).

Regression potential: VM hosts with many large-memory tenants might see the performance impact that the TDP MMU approach tried to solve. Hosts that did not see other problems might turn this on again.
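When verifying a -proposed kernel, it may help to confirm which TDP MMU setting the running host actually uses. The following is only a minimal sketch: it assumes the kvm module is loaded and exposes its parameter under the usual sysfs location /sys/module/kvm/parameters/tdp_mmu. The helper name tdp_mmu_enabled is made up for illustration.

```python
from pathlib import Path

# Usual sysfs location for parameters of the loaded kvm module (assumption:
# the kvm module is loaded and this kernel exposes the tdp_mmu parameter).
TDP_MMU_PARAM = Path("/sys/module/kvm/parameters/tdp_mmu")


def tdp_mmu_enabled(param_path: Path = TDP_MMU_PARAM):
    """Return True or False for the TDP MMU setting, or None if unreadable."""
    try:
        raw = param_path.read_text().strip()
    except OSError:
        # kvm not loaded, or the parameter is not exposed on this kernel.
        return None
    # Boolean module parameters normally read back as "Y" or "N".
    if raw in ("Y", "1"):
        return True
    if raw in ("N", "0"):
        return False
    return None
```

On a kernel carrying the fix, tdp_mmu_enabled() would be expected to return False unless the administrator has overridden the default, for example with kvm.tdp_mmu=1 on the kernel command line or an "options kvm tdp_mmu=1" line in a modprobe.d file.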
Testcase: Large openstack instance (64GB memory, AMD CPU (using SVM)) with a large second level guest (32GB memory). Repeatedly starting and stopping the 2nd level guest.

--- original description ---

The crash occurred on a juju machine, and the juju agent was lost. The juju machine is on an openstack instance provisioned by juju.

The openstack console log indicates that it is related to spin_lock and KVM MMU:

[418200.348830] ? _raw_spin_lock+0x22/0x30
[418200.349588] _raw_write_lock+0x20/0x30
[418200.350196] kvm_tdp_mmu_map+0x2b1/0x490 [kvm]
[418200.351014] kvm_mmu_notifier_invalidate_range_start+0x1ad/0x300 [kvm]
[418200.351796] direct_page_fault+0x206/0x310 [kvm]
[418200.352667] __mmu_notifier_invalidate_range_start+0x91/0x1b0
[418200.353624] kvm_tdp_page_fault+0x72/0x90 [kvm]
[418200.354496] try_to_migrate_one+0x691/0x730
[418200.355436] kvm_mmu_page_fault+0x73/0x1c0 [kvm]

openstack console log: https://pastebin.canonical.com/p/spmH8r3crQ/
syslog: https://pastebin.canonical.com/p/wFPsFD8G9n/

The syslog was rotated after the crash occurred, so the syslog at the time of the initial crash was lost.

Another juju machine with the 5.15.0.79.76 kernel seems to have the same issue. We previously had a similar issue with 5.15.0-73.
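To check other console logs for the same signature, the call-trace frames quoted above can be picked apart mechanically. This is a rough sketch only: the regular expression is written against the frame format seen in this report (timestamp, optional "?", symbol+offset/size, optional module in brackets), and the helper names parse_frames and touches_tdp_mmu are invented for illustration.

```python
import re

# Matches frames such as "[418200.350196] kvm_tdp_mmu_map+0x2b1/0x490 [kvm]".
# A leading "? " (unreliable frame) and a trailing "[module]" are optional.
FRAME_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+(?:\?\s+)?"
    r"(?P<func>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_frames(text):
    """Return (timestamp, function, module-or-None) for each frame found."""
    return [(float(m["ts"]), m["func"], m["module"])
            for m in FRAME_RE.finditer(text)]


def touches_tdp_mmu(text):
    """True if any frame in the log text is a kvm_tdp_mmu_* function."""
    return any(func.startswith("kvm_tdp_mmu_")
               for _, func, _ in parse_frames(text))
```

Because the pattern does not depend on line breaks, it works both on a console log with one frame per line and on a log that has been flattened into a single line.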
The juju machine crashed with raw_spin_lock and kvm mmu in the logs as well:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2026229

ProblemType: Bug
DistroRelease: Ubuntu 22.04
Package: linux-image-5.19.0-46-generic 5.19.0-46.47~22.04.1
ProcVersionSignature: Ubuntu 5.19.0-46.47~22.04.1-generic 5.19.17
Uname: Linux 5.19.0-46-generic x86_64
NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
ApportVersion: 2.20.11-0ubuntu82.5
Architecture: amd64
CasperMD5CheckResult: unknown
CloudArchitecture: x86_64
CloudID: openstack
CloudName: openstack
CloudPlatform: openstack
CloudSubPlatform: metadata (http://169.254.169.254)
Date: Mon Aug 21 08:59:46 2023
Ec2AMI: ami-0c61
Ec2AMIManifest: FIXME
Ec2AvailabilityZone: availability-zone-1
Ec2InstanceType: builder-cpu4-ram72-disk20
Ec2Kernel: unavailable
Ec2Ramdisk: unavailable
ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 LANG=C.UTF-8
 SHELL=/bin/bash
SourcePackage: linux-signed-hwe-5.19
UpgradeStatus: No upgrade log present (probably fresh install)
---
ProblemType: Bug
AlsaDevices:
 total 0
 crw-rw 1 root audio 116, 1 Aug 23 03:23 seq
 crw-rw 1 root audio 116, 33 Aug 23 03:23 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.20.11-0ubuntu82.5
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CRDA: N/A
CasperMD5CheckResult: unknown
CloudArchitecture: x86_64
CloudID: openstack
CloudName: openstack
CloudPlatform: openstack
CloudSubPlatform: metadata
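The test case in the bug description (repeatedly starting and stopping the second-level guest) could be scripted roughly as follows when verifying a kernel. This is a sketch under stated assumptions, not the procedure used in the report: it assumes the nested guest is a libvirt domain controlled with virsh, and the function name cycle_guest, the domain name and the settle time are hypothetical.

```python
import subprocess
import time


def cycle_guest(domain, iterations, settle_seconds=30.0,
                run=subprocess.run, sleep=time.sleep):
    """Start and forcibly stop a libvirt domain `iterations` times.

    Returns the number of completed start/stop cycles. `run` and `sleep`
    are injectable so the loop can be exercised without libvirt.
    """
    completed = 0
    for _ in range(iterations):
        # "virsh start" boots the defined domain; check=True raises on failure.
        run(["virsh", "start", domain], check=True)
        # Give the guest some time to boot and touch its memory.
        sleep(settle_seconds)
        # "virsh destroy" is an immediate, ungraceful power-off.
        run(["virsh", "destroy", domain], check=True)
        completed += 1
    return completed


# Example (hypothetical domain name, not taken from the bug report):
# cycle_guest("l2-guest", iterations=100)
```

If the first-level instance locks up during the loop, the openstack console log of that instance is the place to look for the soft lockup trace.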
[Kernel-packages] [Bug 2032176] Re: Crashing with CPU soft lock on GA kernel 5.15.0.79.76 and HWE kernel 5.19.0-46.47-22.04.1
This bug is awaiting verification that the linux-xilinx-zynqmp/5.15.0-1025.29 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy-linux-xilinx-zynqmp' to 'verification-done-jammy-linux-xilinx-zynqmp'. If the problem still exists, change the tag 'verification-needed-jammy-linux-xilinx-zynqmp' to 'verification-failed-jammy-linux-xilinx-zynqmp'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

** Tags added: kernel-spammed-jammy-linux-xilinx-zynqmp-v2 verification-needed-jammy-linux-xilinx-zynqmp

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2032176

Title: Crashing with CPU soft lock on GA kernel 5.15.0.79.76 and HWE kernel 5.19.0-46.47-22.04.1

Status in linux package in Ubuntu: Fix Released
Status in linux source package in Jammy: Fix Released

Bug description:

Impact: We had reports of VM setups which would show intermittent crashes and then lock up completely. This could be reproduced with large memory setups. The problem seems to be that fixes to performance regressions caused more problems in 5.15 kernels, and the full fixes are too intrusive to be backported.

Fix: The following patch was recently sent to the upstream stable mailing list and looks to be making its way into linux-5.15.y. This changes the default value of kvm.tdp_mmu to off (if anyone is willing to take the risks, this can be changed back in config).

Regression potential: VM hosts with many large-memory tenants might see the performance impact that the TDP MMU approach tried to solve. Hosts that did not see other problems might turn this on again.
[Kernel-packages] [Bug 2032176] Re: Crashing with CPU soft lock on GA kernel 5.15.0.79.76 and HWE kernel 5.19.0-46.47-22.04.1
This bug is awaiting verification that the linux-nvidia-tegra-5.15/5.15.0-1018.18~20.04.1 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal-linux-nvidia-tegra-5.15' to 'verification-done-focal-linux-nvidia-tegra-5.15'. If the problem still exists, change the tag 'verification-needed-focal-linux-nvidia-tegra-5.15' to 'verification-failed-focal-linux-nvidia-tegra-5.15'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

** Tags added: kernel-spammed-focal-linux-nvidia-tegra-5.15-v2 verification-needed-focal-linux-nvidia-tegra-5.15

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2032176

Title: Crashing with CPU soft lock on GA kernel 5.15.0.79.76 and HWE kernel 5.19.0-46.47-22.04.1

Status in linux package in Ubuntu: Fix Released
Status in linux source package in Jammy: Fix Released

Bug description:

Impact: We had reports of VM setups which would show intermittent crashes and then lock up completely. This could be reproduced with large memory setups. The problem seems to be that fixes to performance regressions caused more problems in 5.15 kernels, and the full fixes are too intrusive to be backported.

Fix: The following patch was recently sent to the upstream stable mailing list and looks to be making its way into linux-5.15.y. This changes the default value of kvm.tdp_mmu to off (if anyone is willing to take the risks, this can be changed back in config).

Regression potential: VM hosts with many large-memory tenants might see the performance impact that the TDP MMU approach tried to solve.
[Kernel-packages] [Bug 2032176] Re: Crashing with CPU soft lock on GA kernel 5.15.0.79.76 and HWE kernel 5.19.0-46.47-22.04.1
This bug is awaiting verification that the linux-nvidia-tegra-igx/5.15.0-1005.5 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy-linux-nvidia-tegra-igx' to 'verification-done-jammy-linux-nvidia-tegra-igx'. If the problem still exists, change the tag 'verification-needed-jammy-linux-nvidia-tegra-igx' to 'verification-failed-jammy-linux-nvidia-tegra-igx'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

** Tags added: kernel-spammed-jammy-linux-nvidia-tegra-igx-v2 verification-needed-jammy-linux-nvidia-tegra-igx

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2032176

Title: Crashing with CPU soft lock on GA kernel 5.15.0.79.76 and HWE kernel 5.19.0-46.47-22.04.1

Status in linux package in Ubuntu: Fix Released
Status in linux source package in Jammy: Fix Released

Bug description:

Impact: We had reports of VM setups which would show intermittent crashes and then lock up completely. This could be reproduced with large memory setups. The problem seems to be that fixes to performance regressions caused more problems in 5.15 kernels, and the full fixes are too intrusive to be backported.

Fix: The following patch was recently sent to the upstream stable mailing list and looks to be making its way into linux-5.15.y. This changes the default value of kvm.tdp_mmu to off (if anyone is willing to take the risks, this can be changed back in config).

Regression potential: VM hosts with many large-memory tenants might see the performance impact that the TDP MMU approach tried to solve. Hosts that did not see other problems might turn this on again.
[Kernel-packages] [Bug 2032176] Re: Crashing with CPU soft lock on GA kernel 5.15.0.79.76 and HWE kernel 5.19.0-46.47-22.04.1
This bug is awaiting verification that the linux-bluefield/5.15.0-1027.29 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy-linux-bluefield' to 'verification-done-jammy-linux-bluefield'. If the problem still exists, change the tag 'verification-needed-jammy-linux-bluefield' to 'verification-failed-jammy-linux-bluefield'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

** Tags added: kernel-spammed-jammy-linux-bluefield-v2 verification-needed-jammy-linux-bluefield

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2032176

Title: Crashing with CPU soft lock on GA kernel 5.15.0.79.76 and HWE kernel 5.19.0-46.47-22.04.1

Status in linux package in Ubuntu: Fix Released
Status in linux source package in Jammy: Fix Released

Bug description:

Impact: We had reports of VM setups which would show intermittent crashes and then lock up completely. This could be reproduced with large memory setups. The problem seems to be that fixes to performance regressions caused more problems in 5.15 kernels, and the full fixes are too intrusive to be backported.

Fix: The following patch was recently sent to the upstream stable mailing list and looks to be making its way into linux-5.15.y. This changes the default value of kvm.tdp_mmu to off (if anyone is willing to take the risks, this can be changed back in config).

Regression potential: VM hosts with many large-memory tenants might see the performance impact that the TDP MMU approach tried to solve. Hosts that did not see other problems might turn this on again.
[Kernel-packages] [Bug 2032176] Re: Crashing with CPU soft lock on GA kernel 5.15.0.79.76 and HWE kernel 5.19.0-46.47-22.04.1
This bug is awaiting verification that the linux-raspi/5.15.0-1040.43 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy-linux-raspi' to 'verification-done-jammy-linux-raspi'. If the problem still exists, change the tag 'verification-needed-jammy-linux-raspi' to 'verification-failed-jammy-linux-raspi'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

** Tags added: kernel-spammed-jammy-linux-raspi-v2 verification-needed-jammy-linux-raspi

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2032176

Title: Crashing with CPU soft lock on GA kernel 5.15.0.79.76 and HWE kernel 5.19.0-46.47-22.04.1

Status in linux package in Ubuntu: Fix Released
Status in linux source package in Jammy: Fix Released

Bug description:

Impact: We had reports of VM setups which would show intermittent crashes and then lock up completely. This could be reproduced with large memory setups. The problem seems to be that fixes to performance regressions caused more problems in 5.15 kernels, and the full fixes are too intrusive to be backported.

Fix: The following patch was recently sent to the upstream stable mailing list and looks to be making its way into linux-5.15.y. This changes the default value of kvm.tdp_mmu to off (if anyone is willing to take the risks, this can be changed back in config).

Regression potential: VM hosts with many large-memory tenants might see the performance impact that the TDP MMU approach tried to solve. Hosts that did not see other problems might turn this on again.
[Kernel-packages] [Bug 2032176] Re: Crashing with CPU soft lock on GA kernel 5.15.0.79.76 and HWE kernel 5.19.0-46.47-22.04.1
This bug is awaiting verification that the linux-nvidia-tegra/5.15.0-1018.18 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy-linux-nvidia-tegra' to 'verification-done-jammy-linux-nvidia-tegra'. If the problem still exists, change the tag 'verification-needed-jammy-linux-nvidia-tegra' to 'verification-failed-jammy-linux-nvidia-tegra'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

** Tags added: kernel-spammed-jammy-linux-nvidia-tegra-v2 verification-needed-jammy-linux-nvidia-tegra

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2032176

Title: Crashing with CPU soft lock on GA kernel 5.15.0.79.76 and HWE kernel 5.19.0-46.47-22.04.1

Status in linux package in Ubuntu: Fix Released
Status in linux source package in Jammy: Fix Released

Bug description:

Impact: We had reports of VM setups which would show intermittent crashes and then lock up completely. This could be reproduced with large memory setups. The problem seems to be that fixes to performance regressions caused more problems in 5.15 kernels, and the full fixes are too intrusive to be backported.

Fix: The following patch was recently sent to the upstream stable mailing list and looks to be making its way into linux-5.15.y. This changes the default value of kvm.tdp_mmu to off (if anyone is willing to take the risks, this can be changed back in config).

Regression potential: VM hosts with many large-memory tenants might see the performance impact that the TDP MMU approach tried to solve. Hosts that did not see other problems might turn this on again.
Testcase: Large openstack instance (64GB memory, AMD CPU (using SVM)) with a large second level guest (32GB memory). Repeatedly starting and stopping the 2nd level guest. --- original description --- The crash occurred on a juju machine, and the juju agent was lost. The juju machine is on an openstack instance provision by juju. The openstack console log indicts the it is related to spin_lock and KVM MMU: [418200.348830] ? _raw_spin_lock+0x22/0x30 [418200.349588] _raw_write_lock+0x20/0x30 [418200.350196] kvm_tdp_mmu_map+0x2b1/0x490 [kvm] [418200.351014] kvm_mmu_notifier_invalidate_range_start+0x1ad/0x300 [kvm] [418200.351796] direct_page_fault+0x206/0x310 [kvm] [418200.352667] __mmu_notifier_invalidate_range_start+0x91/0x1b0 [418200.353624] kvm_tdp_page_fault+0x72/0x90 [kvm] [418200.354496] try_to_migrate_one+0x691/0x730 [418200.355436] kvm_mmu_page_fault+0x73/0x1c0 [kvm] openstack console log: https://pastebin.canonical.com/p/spmH8r3crQ/ syslog: https://pastebin.canonical.com/p/wFPsFD8G9n/ The syslog was rotated after the crash occurred, so the syslog at the time of the initial crash was lost. Other juju machine with 5.15.0.79.76 kernel seems to have the same issues. We previously have a similar issue with 5.15.0-73. 
The juju machine crashed with raw_spin_lock and kvm mmu in the logs as well: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2026229 ProblemType: Bug DistroRelease: Ubuntu 22.04 Package: linux-image-5.19.0-46-generic 5.19.0-46.47~22.04.1 ProcVersionSignature: Ubuntu 5.19.0-46.47~22.04.1-generic 5.19.17 Uname: Linux 5.19.0-46-generic x86_64 NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair ApportVersion: 2.20.11-0ubuntu82.5 Architecture: amd64 CasperMD5CheckResult: unknown CloudArchitecture: x86_64 CloudID: openstack CloudName: openstack CloudPlatform: openstack CloudSubPlatform: metadata (http://169.254.169.254) Date: Mon Aug 21 08:59:46 2023 Ec2AMI: ami-0c61 Ec2AMIManifest: FIXME Ec2AvailabilityZone: availability-zone-1 Ec2InstanceType: builder-cpu4-ram72-disk20 Ec2Kernel: unavailable Ec2Ramdisk: unavailable ProcEnviron: TERM=xterm-256color PATH=(custom, no user) LANG=C.UTF-8 SHELL=/bin/bash SourcePackage: linux-signed-hwe-5.19 UpgradeStatus: No upgrade log present (probably fresh install) --- ProblemType: Bug AlsaDevices: total 0 crw-rw 1 root audio 116, 1 Aug 23 03:23 seq crw-rw 1 root audio 116, 33 Aug 23 03:23 timer AplayDevices: Error: [Errno 2] No such file or directory: 'aplay' ApportVersion: 2.20.11-0ubuntu82.5 Architecture: amd64 ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord' AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1: CRDA: N/A CasperMD5CheckResult: unknown CloudArchitecture: x86_64 CloudID: openstack CloudName: openstack
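The fix described above changes the default of kvm.tdp_mmu to off. One way a tester might confirm which setting a running host actually uses is to read the module parameter from sysfs. The following is a minimal sketch, assuming the kvm module is loaded and exposes the parameter at /sys/module/kvm/parameters/tdp_mmu; the function name tdp_mmu_enabled is made up for illustration and is not part of any Ubuntu tooling.

```python
from pathlib import Path
from typing import Optional

# Assumed sysfs location of the parameter; only present while kvm is loaded.
TDP_MMU_PARAM = Path("/sys/module/kvm/parameters/tdp_mmu")


def tdp_mmu_enabled(param_path: Path = TDP_MMU_PARAM) -> Optional[bool]:
    """Return True/False for the parameter value, or None if it is unreadable.

    Boolean module parameters are normally reported as "Y" or "N";
    "1" and "0" are accepted as well to be tolerant.
    """
    try:
        raw = param_path.read_text().strip()
    except OSError:
        # Module not loaded, parameter missing, or no permission to read it.
        return None
    if raw in ("Y", "y", "1"):
        return True
    if raw in ("N", "n", "0"):
        return False
    return None


if __name__ == "__main__":
    state = tdp_mmu_enabled()
    if state is None:
        print("kvm.tdp_mmu: not available (is the kvm module loaded?)")
    else:
        print("kvm.tdp_mmu:", "on" if state else "off")
```

On a kernel carrying the fix, the script would be expected to report "off" unless the administrator has overridden the default. Hosts that want the old behaviour back would typically set the parameter when the module loads (for example kvm.tdp_mmu=Y on the kernel command line); treat that as an assumption to check against the kernel documentation for the installed version.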
[Kernel-packages] [Bug 2032176] Re: Crashing with CPU soft lock on GA kernel 5.15.0.79.76 and HWE kernel 5.19.0-46.47-22.04.1
This bug is awaiting verification that the linux-aws/5.15.0-1048.53 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy-linux-aws' to 'verification-done-jammy-linux-aws'. If the problem still exists, change the tag 'verification-needed-jammy-linux-aws' to 'verification-failed-jammy-linux-aws'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

** Tags added: kernel-spammed-jammy-linux-aws-v2 verification-needed-jammy-linux-aws

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Jammy:
  Fix Released
[Kernel-packages] [Bug 2032176] Re: Crashing with CPU soft lock on GA kernel 5.15.0.79.76 and HWE kernel 5.19.0-46.47-22.04.1
This bug is awaiting verification that the linux-azure/5.15.0-1050.57 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy-linux-azure' to 'verification-done-jammy-linux-azure'. If the problem still exists, change the tag 'verification-needed-jammy-linux-azure' to 'verification-failed-jammy-linux-azure'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

** Tags added: kernel-spammed-jammy-linux-azure-v2 verification-needed-jammy-linux-azure

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Jammy:
  Fix Released
[Kernel-packages] [Bug 2032176] Re: Crashing with CPU soft lock on GA kernel 5.15.0.79.76 and HWE kernel 5.19.0-46.47-22.04.1
This bug was fixed in the package linux - 5.15.0-86.96

---
linux (5.15.0-86.96) jammy; urgency=medium

  * jammy/linux: 5.15.0-86.96 -proposed tracker (LP: #2036575)

  * 5.15.0-85 live migration regression (LP: #2036675)
    - Revert "KVM: x86: Always enable legacy FP/SSE in allowed user XFEATURES"
    - Revert "x86/kvm/fpu: Limit guest user_xfeatures to supported bits of XCR0"

  * Regression for ubuntu_bpf test build on Jammy 5.15.0-85.95 (LP: #2035181)
    - selftests/bpf: fix static assert compilation issue for test_cls_*.c

  * `refcount_t: underflow; use-after-free.` on hidon w/ 5.15.0-85-generic (LP: #2034447)
    - crypto: rsa-pkcs1pad - Use helper to set reqsize

linux (5.15.0-85.95) jammy; urgency=medium

  * jammy/linux: 5.15.0-85.95 -proposed tracker (LP: #2033821)

  * Please enable Renesas RZ platform serial installer (LP: #2022361)
    - [Config] enable hihope RZ/G2M serial console
    - [Config] Mark sh-sci as built-in

  * Request backport of xen timekeeping performance improvements (LP: #2033122)
    - x86/xen/time: prefer tsc as clocksource when it is invariant

  * kdump doesn't work with UEFI secure boot and kernel lockdown enabled on ARM64 (LP: #2033007)
    - [Config]: Enable CONFIG_KEXEC_IMAGE_VERIFY_SIG
    - kexec, KEYS: make the code in bzImage64_verify_sig generic
    - arm64: kexec_file: use more system keyrings to verify kernel image signature

  * ubuntu_kernel_selftests:net:vrf-xfrm-tests.sh: 8 failed test cases on jammy/fips (LP: #2019880)
    - selftests: net: vrf-xfrm-tests: change authentication and encryption algos

  * ubuntu_kernel_selftests:net:tls: 88 failed test cases on jammy/fips (LP: #2019868)
    - selftests/harness: allow tests to be skipped during setup
    - selftests: net: tls: check if FIPS mode is enabled

  * A general-proteciton exception during guest migration to unsupported PKRU machine (LP: 2032164, reverted)
    - x86/kvm/fpu: Limit guest user_xfeatures to supported bits of XCR0
    - KVM: x86: Always enable legacy FP/SSE in allowed user XFEATURES

  * CVE-2023-4569
    - netfilter: nf_tables: deactivate catchall elements in next generation

  * CVE-2023-20569
    - x86/cpu, kvm: Add support for CPUID_8021_EAX
    - x86/srso: Add a Speculative RAS Overflow mitigation
    - x86/srso: Add IBPB_BRTYPE support
    - x86/srso: Add SRSO_NO support
    - x86/srso: Add IBPB
    - x86/srso: Add IBPB on VMEXIT
    - x86/srso: Fix return thunks in generated code
    - x86/srso: Tie SBPB bit setting to microcode patch detection
    - x86: fix backwards merge of GDS/SRSO bit
    - x86/srso: Fix build breakage with the LLVM linker
    - x86/cpu: Fix __x86_return_thunk symbol type
    - x86/cpu: Fix up srso_safe_ret() and __x86_return_thunk()
    - x86/alternative: Make custom return thunk unconditional
    - objtool: Add frame-pointer-specific function ignore
    - x86/ibt: Add ANNOTATE_NOENDBR
    - x86/cpu: Clean up SRSO return thunk mess
    - x86/cpu: Rename original retbleed methods
    - x86/cpu: Rename srso_(.*)_alias to srso_alias_\1
    - x86/cpu: Cleanup the untrain mess
    - x86/srso: Explain the untraining sequences a bit more
    - x86/static_call: Fix __static_call_fixup()
    - x86/retpoline: Don't clobber RFLAGS during srso_safe_ret()
    - x86/srso: Disable the mitigation on unaffected configurations
    - x86/retpoline,kprobes: Fix position of thunk sections with CONFIG_LTO_CLANG
    - objtool/x86: Fixup frame-pointer vs rethunk
    - x86/srso: Correct the mitigation status when SMT is disabled
    - objtool/x86: Fix SRSO mess
    - Ubuntu: [Config]: enable Speculative Return Stack Overflow mitigation

  * Fix unreliable ethernet cable detection on I219 NIC (LP: #2028122)
    - e1000e: Use PME poll to circumvent unreliable ACPI wake

  * Need to get fine-grained control for FAN(TFN) Participant. (LP: #2031333)
    - ACPI: fan: Separate file for attributes creation
    - ACPI: fan: Optimize struct acpi_fan_fif
    - ACPI: fan: Properly handle fine grain control
    - ACPI: fan: Add additional attributes for fine grain control

  * [SRU][Ubuntu 22.04.1] Unable to interpret the frequency values in cpuinfo_min_freq and cpuino_max_freq sysfs files. (LP: #2030924)
    - cpufreq: intel_pstate: Fix scaling for hybrid-capable

  * CVE-2023-40283
    - Bluetooth: L2CAP: Fix use-after-free in l2cap_sock_ready_cb

  * CVE-2023-20588
    - x86/bugs: Increase the x86 bugs vector size to two u32s
    - x86/CPU/AMD: Do not leak quotient data after a division by 0
    - x86/CPU/AMD: Fix the DIV(0) initial fix attempt

  * CVE-2023-4194
    - net: tun_chr_open(): set sk_uid from current_fsuid()
    - net: tap_open(): set sk_uid from current_fsuid()

  * CVE-2023-4155
    - KVM: SEV: Refactor out sev_es_state struct
    - KVM: SEV: Fall back to vmalloc for SEV-ES scratch area if necessary
    - KVM: SVM: Do not terminate SEV-ES guests on GHCB validation failure
    - KVM: SVM: Exit to userspace on ENOMEM/EFAULT GHCB
[Kernel-packages] [Bug 2032176] Re: Crashing with CPU soft lock on GA kernel 5.15.0.79.76 and HWE kernel 5.19.0-46.47-22.04.1
Hi, is there a timeline on when this patch will reach the general availability kernel?

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Jammy:
  Fix Committed
[Kernel-packages] [Bug 2032176] Re: Crashing with CPU soft lock on GA kernel 5.15.0.79.76 and HWE kernel 5.19.0-46.47-22.04.1
No crashes observed with the proposed kernel. Changed the tag to 'verification-done-jammy-linux'.

** Tags removed: verification-needed-jammy-linux
** Tags added: verification-done-jammy-linux

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Jammy:
  Fix Committed
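For anyone repeating this verification, the test case in the bug description (repeatedly starting and stopping the second-level guest) can be automated. The following is a rough sketch only: it assumes the nested guest is a libvirt domain controlled with virsh from inside the first-level instance, and the domain name, cycle count, and settle time are hypothetical values, not taken from the original report.

```python
import subprocess
import time
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]


def run_command(argv: Sequence[str]) -> int:
    """Run a command and return its exit status without raising on failure."""
    return subprocess.run(list(argv), check=False).returncode


def cycle_guest(
    domain: str,
    cycles: int,
    settle_seconds: float = 30.0,
    run: Runner = run_command,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Start and force-stop `domain` up to `cycles` times.

    Returns the number of completed start/stop cycles. The loop stops
    early if either virsh command reports a failure.
    """
    completed = 0
    for _ in range(cycles):
        if run(["virsh", "start", domain]) != 0:
            break
        # Give the guest time to boot and touch its memory.
        sleep(settle_seconds)
        # "virsh destroy" stops the guest immediately (it does not delete it).
        if run(["virsh", "destroy", domain]) != 0:
            break
        completed += 1
    return completed


if __name__ == "__main__":
    # "nested-guest" is a placeholder domain name.
    done = cycle_guest("nested-guest", cycles=50)
    print(f"completed {done} start/stop cycles")
```

A run that completes every cycle without soft-lockup messages on the instance console is consistent with the fix working, though it is not proof; the original crashes were intermittent.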
[Kernel-packages] [Bug 2032176] Re: Crashing with CPU soft lock on GA kernel 5.15.0.79.76 and HWE kernel 5.19.0-46.47-22.04.1
Currently testing the linux/5.15.0-85.95 kernel.

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Jammy:
  Fix Committed
[Kernel-packages] [Bug 2032176] Re: Crashing with CPU soft lock on GA kernel 5.15.0.79.76 and HWE kernel 5.19.0-46.47-22.04.1
This bug is awaiting verification that the linux/5.15.0-85.95 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy-linux' to 'verification-done-jammy-linux'. If the problem still exists, change the tag 'verification-needed-jammy-linux' to 'verification-failed-jammy-linux'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

** Tags added: kernel-spammed-jammy-linux-v2 verification-needed-jammy-linux

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2032176

Title:
  Crashing with CPU soft lock on GA kernel 5.15.0.79.76 and HWE kernel 5.19.0-46.47-22.04.1

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Jammy:
  Fix Committed

Bug description:
  Impact:
  We had reports of VM setups which would show intermediate crashes and after that locking up completely. This could be reproduced with large memory setups.
  The problem seems to be that fixes to performance regressions caused more problems in 5.15 kernels and the full fixes are too intrusive to be backported.

  Fix:
  The following patch was recently sent to the upstream stable mailing list and looks to be making its way into linux-5.15.y. This changes the default value of kvm.tdp_mmu to off (if anyone is willing to take the risks, this can be changed back in config).

  Regression potential:
  VM hosts with many large-memory tenants might see a performance impact which the TDP MMU approach tried to solve. If those did not see other problems they might turn this on again.

  Testcase:
  Large openstack instance (64GB memory, AMD CPU (using SVM)) with a large second level guest (32GB memory). Repeatedly starting and stopping the 2nd level guest.

  --- original description ---
  The crash occurred on a juju machine, and the juju agent was lost.
  The juju machine is on an openstack instance provisioned by juju.
  The openstack console log indicates that it is related to spin_lock and KVM MMU:

  [418200.348830] ? _raw_spin_lock+0x22/0x30
  [418200.349588] _raw_write_lock+0x20/0x30
  [418200.350196] kvm_tdp_mmu_map+0x2b1/0x490 [kvm]
  [418200.351014] kvm_mmu_notifier_invalidate_range_start+0x1ad/0x300 [kvm]
  [418200.351796] direct_page_fault+0x206/0x310 [kvm]
  [418200.352667] __mmu_notifier_invalidate_range_start+0x91/0x1b0
  [418200.353624] kvm_tdp_page_fault+0x72/0x90 [kvm]
  [418200.354496] try_to_migrate_one+0x691/0x730
  [418200.355436] kvm_mmu_page_fault+0x73/0x1c0 [kvm]

  openstack console log: https://pastebin.canonical.com/p/spmH8r3crQ/
  syslog: https://pastebin.canonical.com/p/wFPsFD8G9n/

  The syslog was rotated after the crash occurred, so the syslog at the time of the initial crash was lost.

  Other juju machines with the 5.15.0.79.76 kernel seem to have the same issue.

  We previously had a similar issue with 5.15.0-73. The juju machine crashed with raw_spin_lock and kvm mmu in the logs as well: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2026229

  ProblemType: Bug
  DistroRelease: Ubuntu 22.04
  Package: linux-image-5.19.0-46-generic 5.19.0-46.47~22.04.1
  ProcVersionSignature: Ubuntu 5.19.0-46.47~22.04.1-generic 5.19.17
  Uname: Linux 5.19.0-46-generic x86_64
  NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
  ApportVersion: 2.20.11-0ubuntu82.5
  Architecture: amd64
  CasperMD5CheckResult: unknown
  CloudArchitecture: x86_64
  CloudID: openstack
  CloudName: openstack
  CloudPlatform: openstack
  CloudSubPlatform: metadata (http://169.254.169.254)
  Date: Mon Aug 21 08:59:46 2023
  Ec2AMI: ami-0c61
  Ec2AMIManifest: FIXME
  Ec2AvailabilityZone: availability-zone-1
  Ec2InstanceType: builder-cpu4-ram72-disk20
  Ec2Kernel: unavailable
  Ec2Ramdisk: unavailable
  ProcEnviron:
   TERM=xterm-256color
   PATH=(custom, no user)
   LANG=C.UTF-8
   SHELL=/bin/bash
  SourcePackage: linux-signed-hwe-5.19
  UpgradeStatus: No upgrade log present (probably fresh install)
  ---
  ProblemType: Bug
  AlsaDevices:
   total 0
   crw-rw 1 root audio 116, 1 Aug 23 03:23 seq
   crw-rw 1 root audio 116, 33 Aug 23 03:23 timer
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
  ApportVersion: 2.20.11-0ubuntu82.5
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  CRDA: N/A
  CasperMD5CheckResult: unknown
  CloudArchitecture: x86_64
  CloudID: openstack
  CloudName: openstack
  CloudPlatform: openstack
  CloudSubPlatform: metadata (http://169.254.169.254)
  DistroRelease: Ubuntu
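For anyone triaging similar console logs, the trace quoted in the description can be checked mechanically. The following is only a rough sketch: the regular expression and the function names `trace_symbols` and `looks_like_tdp_mmu_lockup` are made up for illustration, and the heuristic (a lock primitive plus a `kvm_tdp_mmu_*` frame somewhere in the log) is an assumption, not an official signature for this bug.

```python
import re

# Matches dmesg-style frames such as
#   [418200.350196] kvm_tdp_mmu_map+0x2b1/0x490 [kvm]
# The optional "? " marks frames the unwinder was unsure about.
FRAME_RE = re.compile(
    r"\[\s*\d+\.\d+\]\s+(?:\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def trace_symbols(log_text):
    """Return the symbol names of all stack frames found in log_text, in order."""
    return [m.group(1) for m in FRAME_RE.finditer(log_text)]


def looks_like_tdp_mmu_lockup(log_text):
    """Heuristic (assumption): a lock primitive and a TDP MMU frame both appear."""
    symbols = trace_symbols(log_text)
    has_lock = any(
        s.startswith(("_raw_spin_lock", "_raw_write_lock")) for s in symbols
    )
    has_tdp_mmu = any(s.startswith("kvm_tdp_mmu_") for s in symbols)
    return has_lock and has_tdp_mmu


if __name__ == "__main__":
    sample = (
        "[418200.348830] ? _raw_spin_lock+0x22/0x30\n"
        "[418200.349588] _raw_write_lock+0x20/0x30\n"
        "[418200.350196] kvm_tdp_mmu_map+0x2b1/0x490 [kvm]\n"
    )
    print(trace_symbols(sample))
    print(looks_like_tdp_mmu_lockup(sample))
```

A match only suggests that a log is worth comparing against this bug; it does not confirm the same root cause.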
[Kernel-packages] [Bug 2032176] Re: Crashing with CPU soft lock on GA kernel 5.15.0.79.76 and HWE kernel 5.19.0-46.47-22.04.1
Right, Fix Committed means the fix is applied to git and will be included in the 2023.09.04 SRU cycle. The fix for 5.15 is to change the default of tdp_mmu to off (as you did for testing). Changing the deployments the way you did is the workaround in the meantime.

The parent Ubuntu state refers to the current development release (Mantic right now), which should be fixed. That leaves Lunar (6.2), which should at least contain enough improvements to leave this enabled by default. I read Thadeu's comment as saying he was not able to reproduce the problem with 6.2 (it sounded to me like this was without changing the value there, but it is a bit ambiguous).
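To see whether a host is already running with the workaround, the current value of the parameter can be read back from sysfs. The sketch below assumes the parameter is exposed at `/sys/module/kvm/parameters/tdp_mmu`, as is usual for a `kvm` module parameter; the helper names and the idea of generating a modprobe.d line are illustrative assumptions and not part of the Ubuntu fix.

```python
from pathlib import Path

# Assumed sysfs location of the kvm module parameter discussed in this bug.
TDP_MMU_PARAM = Path("/sys/module/kvm/parameters/tdp_mmu")


def tdp_mmu_enabled(param_path=TDP_MMU_PARAM):
    """Return True or False for the current setting, or None if it cannot be read."""
    try:
        value = Path(param_path).read_text().strip()
    except OSError:
        return None  # kvm module not loaded, or parameter not exposed
    if value in ("Y", "y", "1"):
        return True
    if value in ("N", "n", "0"):
        return False
    return None  # unexpected content, do not guess


def modprobe_option(enabled):
    """Render a modprobe.d line that pins kvm.tdp_mmu to the given value."""
    return "options kvm tdp_mmu={}".format("Y" if enabled else "N")


if __name__ == "__main__":
    state = tdp_mmu_enabled()
    print("kvm.tdp_mmu:", {True: "on", False: "off", None: "unknown"}[state])
    # Line to place in a file under /etc/modprobe.d/ (file name is up to you).
    print("workaround:", modprobe_option(False))
```

A line like the one printed only takes effect once the kvm module is reloaded or the host is rebooted, so running guests need to be stopped or migrated first.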
[Kernel-packages] [Bug 2032176] Re: Crashing with CPU soft lock on GA kernel 5.15.0.79.76 and HWE kernel 5.19.0-46.47-22.04.1
We are still seeing this issue on 5.15.0-82-generic for 22.04 (Jammy).

Since the (Ubuntu Jammy) task is at Fix Committed and not Fix Released, I assume this is expected, right? And does the Fix Released status on (Ubuntu) mean the bug is not present on other Ubuntu versions?
[Kernel-packages] [Bug 2032176] Re: Crashing with CPU soft lock on GA kernel 5.15.0.79.76 and HWE kernel 5.19.0-46.47-22.04.1
** Changed in: linux (Ubuntu Jammy)
       Status: In Progress => Fix Committed
[Kernel-packages] [Bug 2032176] Re: Crashing with CPU soft lock on GA kernel 5.15.0.79.76 and HWE kernel 5.19.0-46.47-22.04.1
** Changed in: linux (Ubuntu)
       Status: Confirmed => Fix Released

** Changed in: linux (Ubuntu Jammy)
       Status: Triaged => In Progress

** Changed in: linux (Ubuntu Jammy)
     Assignee: (unassigned) => Stefan Bader (smb)

** Description changed:

+ Impact:
+ We had reports of VM setups which would show intermediate crashes and after that locking up completely. This could be reproduced with large memory setups.
+ The problem seems to be that fixes to performance regressions caused more problems in 5.15 kernels and the full fixes are too intrusive to be backported.
+
+ Fix:
+ The following patch was recently sent to the upstream stable mailing list and looks to be making its way into linux-5.15.y. This changes the default value of kvm.tdp_mmu to off (if anyone is willing to take the risks, this can be changed back in config).
+
+ Regression potential:
+ VM hosts with many large-memory tenants might see a performance impact which the TDP MMU approach tried to solve. If those did not see other problems they might turn this on again.
+
+ Testcase:
+ Large openstack instance (64GB memory, AMD CPU (using SVM)) with a large second level guest (32GB memory). Repeatedly starting and stopping the 2nd level guest.
+
+
+ --- original description ---
  The crash occurred on a juju machine, and the juju agent was lost.
[Kernel-packages] [Bug 2032176] Re: Crashing with CPU soft lock on GA kernel 5.15.0.79.76 and HWE kernel 5.19.0-46.47-22.04.1
Confirming that the instances with tdp_mmu=0 do not seem to crash. I had 5 instances running for 4 days.

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2032176

Title:
  Crashing with CPU soft lock on GA kernel 5.15.0.79.76 and HWE kernel 5.19.0-46.47-22.04.1

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Jammy:
  Triaged

Bug description:
  The crash occurred on a juju machine, and the juju agent was lost.
  The juju machine is on an openstack instance provisioned by juju.
  The openstack console log indicates that it is related to spin_lock and KVM MMU:

  [418200.348830] ? _raw_spin_lock+0x22/0x30
  [418200.349588] _raw_write_lock+0x20/0x30
  [418200.350196] kvm_tdp_mmu_map+0x2b1/0x490 [kvm]
  [418200.351014] kvm_mmu_notifier_invalidate_range_start+0x1ad/0x300 [kvm]
  [418200.351796] direct_page_fault+0x206/0x310 [kvm]
  [418200.352667] __mmu_notifier_invalidate_range_start+0x91/0x1b0
  [418200.353624] kvm_tdp_page_fault+0x72/0x90 [kvm]
  [418200.354496] try_to_migrate_one+0x691/0x730
  [418200.355436] kvm_mmu_page_fault+0x73/0x1c0 [kvm]

  openstack console log: https://pastebin.canonical.com/p/spmH8r3crQ/
  syslog: https://pastebin.canonical.com/p/wFPsFD8G9n/

  The syslog was rotated after the crash occurred, so the syslog at the time of the initial crash was lost.

  Other juju machines with the 5.15.0.79.76 kernel seem to have the same issue.

  We previously had a similar issue with 5.15.0-73. The juju machine crashed with raw_spin_lock and kvm mmu in the logs as well: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2026229

  ProblemType: Bug
  DistroRelease: Ubuntu 22.04
  Package: linux-image-5.19.0-46-generic 5.19.0-46.47~22.04.1
  ProcVersionSignature: Ubuntu 5.19.0-46.47~22.04.1-generic 5.19.17
  Uname: Linux 5.19.0-46-generic x86_64
  NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
  ApportVersion: 2.20.11-0ubuntu82.5
  Architecture: amd64
  CasperMD5CheckResult: unknown
  CloudArchitecture: x86_64
  CloudID: openstack
  CloudName: openstack
  CloudPlatform: openstack
  CloudSubPlatform: metadata (http://169.254.169.254)
  Date: Mon Aug 21 08:59:46 2023
  Ec2AMI: ami-0c61
  Ec2AMIManifest: FIXME
  Ec2AvailabilityZone: availability-zone-1
  Ec2InstanceType: builder-cpu4-ram72-disk20
  Ec2Kernel: unavailable
  Ec2Ramdisk: unavailable
  ProcEnviron:
   TERM=xterm-256color
   PATH=(custom, no user)
   LANG=C.UTF-8
   SHELL=/bin/bash
  SourcePackage: linux-signed-hwe-5.19
  UpgradeStatus: No upgrade log present (probably fresh install)
  ---
  ProblemType: Bug
  AlsaDevices:
   total 0
   crw-rw 1 root audio 116, 1 Aug 23 03:23 seq
   crw-rw 1 root audio 116, 33 Aug 23 03:23 timer
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
  ApportVersion: 2.20.11-0ubuntu82.5
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  CRDA: N/A
  CasperMD5CheckResult: unknown
  CloudArchitecture: x86_64
  CloudID: openstack
  CloudName: openstack
  CloudPlatform: openstack
  CloudSubPlatform: metadata (http://169.254.169.254)
  DistroRelease: Ubuntu 22.04
  Ec2AMI: ami-0fbb
  Ec2AMIManifest: FIXME
  Ec2AvailabilityZone: availability-zone-2
  Ec2InstanceType: builder-cpu2-ram44-disk20
  Ec2Kernel: unavailable
  Ec2Ramdisk: unavailable
  IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
  Lsusb: Bus 001 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
  Lsusb-t:
   /: Bus 01.Port 1: Dev 1, Class=root_hub, Driver=uhci_hcd/2p, 12M
  MachineType: OpenStack Foundation OpenStack Nova
  NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
  Package: linux (not installed)
  PciMultimedia:
  ProcEnviron:
   TERM=xterm-256color
   PATH=(custom, no user)
   LANG=C.UTF-8
   SHELL=/bin/bash
  ProcFB: 0 qxldrmfb
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.15.0-83-generic root=UUID=a6de04b8-3631-4ce4-bb96-48076f4a56bf ro console=tty1 console=ttyS0
  ProcVersionSignature: Ubuntu 5.15.0-83.92-generic 5.15.116
  RelatedPackageVersions:
   linux-restricted-modules-5.15.0-83-generic N/A
   linux-backports-modules-5.15.0-83-generic N/A
   linux-firmware 20220329.git681281e4-0ubuntu3.17
  RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
  Tags: jammy ec2-images
  Uname: Linux 5.15.0-83-generic x86_64
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups: N/A
  _MarkForUpload: True
  dmi.bios.date: 04/01/2014
  dmi.bios.release: 0.0
  dmi.bios.vendor: SeaBIOS
  dmi.bios.version: 1.13.0-1ubuntu1.1
  dmi.chassis.type: 1
  dmi.chassis.vendor: QEMU
  dmi.chassis.version: pc-i440fx-4.2
  dmi.modalias: