[Kernel-packages] [Bug 2032176] Re: Crashing with CPU soft lock on GA kernel 5.15.0.79.76 and HWE kernel 5.19.0-46.47-22.04.1

2024-02-29 Thread Ubuntu Kernel Bot
This bug is awaiting verification that the linux-mtk/5.15.0-1030.34
kernel in -proposed solves the problem. Please test the kernel and
update this bug with the results. If the problem is solved, change the
tag 'verification-needed-jammy-linux-mtk' to
'verification-done-jammy-linux-mtk'. If the problem still exists, change
the tag 'verification-needed-jammy-linux-mtk' to
'verification-failed-jammy-linux-mtk'.


If verification is not done by 5 working days from today, this fix will
be dropped from the source code, and this bug will be closed.


See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on
how to enable and use -proposed. Thank you!


** Tags added: kernel-spammed-jammy-linux-mtk-v2 
verification-needed-jammy-linux-mtk

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2032176

Title:
  Crashing with CPU soft lock on GA kernel 5.15.0.79.76 and HWE kernel
  5.19.0-46.47-22.04.1

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Jammy:
  Fix Released

Bug description:
  Impact:
  We had reports of VM setups which would show intermittent crashes and after
that lock up completely. This could be reproduced with large memory setups.
  The problem seems to be that fixes to performance regressions caused more
problems in 5.15 kernels and the full fixes are too intrusive to be backported.

  Fix:
  The following patch was recently sent to the upstream stable mailing list and 
looks to be making its way into linux-5.15.y. This changes the default value of 
kvm.tdp_mmu to off (if anyone is willing to take the risks, this can be changed 
back in config).
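
A minimal sketch of checking the setting on a running host and building the line needed to opt back in; it is not taken from the patch. It assumes the parameter is exposed at /sys/module/kvm/parameters/tdp_mmu and reads as Y or N, as kernel boolean module parameters normally do. The helper names and the modprobe.d file name in the comment are hypothetical.

```python
from pathlib import Path

# Assumed sysfs location of the kvm module parameter discussed above.
TDP_MMU_PARAM = Path("/sys/module/kvm/parameters/tdp_mmu")


def parse_bool_param(text: str) -> bool:
    """Interpret a kernel boolean module parameter value (Y/N, y/n, 1/0)."""
    value = text.strip()
    if value in ("Y", "y", "1"):
        return True
    if value in ("N", "n", "0"):
        return False
    raise ValueError(f"unexpected boolean parameter value: {value!r}")


def modprobe_option(enabled: bool) -> str:
    """Build a modprobe.d line that sets kvm.tdp_mmu when the module loads."""
    return f"options kvm tdp_mmu={'Y' if enabled else 'N'}"


def current_setting(path: Path = TDP_MMU_PARAM):
    """Return True/False for the running kernel, or None if it cannot be read."""
    try:
        return parse_bool_param(path.read_text())
    except OSError:
        return None


if __name__ == "__main__":
    print("kvm.tdp_mmu currently:", current_setting())
    # To opt back in, a line like this could go into a file such as
    # /etc/modprobe.d/kvm-tdp-mmu.conf (hypothetical name); it takes
    # effect the next time the kvm module is loaded.
    print(modprobe_option(True))
```

The same value can also be given on the kernel command line as kvm.tdp_mmu=Y. Whether to do so is the risk trade-off described under "Regression potential".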

  Regression potential:
  VM hosts with many large memory tenants might see a performance impact which
the TDP MMU approach tried to solve. Hosts that did not see other problems
might turn this on again.

  Testcase:
  Large openstack instance (64GB memory, AMD CPU (using SVM)) with a large 
second level guest (32GB memory). Repeatedly starting and stopping the 2nd 
level guest.
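
The start/stop loop could be scripted roughly as below. This is a sketch under assumptions the report does not state: the second-level guest is managed by libvirt and driven through `virsh`, and a forced stop (`virsh destroy`) is an acceptable stand-in for "stopping". The function name and its parameters are hypothetical.

```python
import subprocess
import time


def cycle_guest(domain, iterations, run=subprocess.run, pause=0.0):
    """Start and force-stop a libvirt guest repeatedly; return completed cycles."""
    completed = 0
    for _ in range(iterations):
        # `virsh start` boots an already defined domain.
        run(["virsh", "start", domain], check=True)
        # Give the guest some time to fault in its memory before stopping it.
        time.sleep(pause)
        # `virsh destroy` powers the guest off; the definition is kept.
        run(["virsh", "destroy", domain], check=True)
        completed += 1
    return completed
```

Inside the first-level instance this might be called as `cycle_guest("l2-guest", iterations=100, pause=60.0)`, where the domain name, iteration count and pause are placeholders. The `run` argument exists only so the loop can be exercised without a hypervisor.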

  
  --- original description ---
  The crash occurred on a juju machine, and the juju agent was lost.
  The juju machine is on an openstack instance provisioned by juju.

  The openstack console log indicates that it is related to spin_lock and KVM MMU:
  [418200.348830]  ? _raw_spin_lock+0x22/0x30
  [418200.349588]  _raw_write_lock+0x20/0x30
  [418200.350196]  kvm_tdp_mmu_map+0x2b1/0x490 [kvm]
  [418200.351014]  kvm_mmu_notifier_invalidate_range_start+0x1ad/0x300 [kvm]
  [418200.351796]  direct_page_fault+0x206/0x310 [kvm]
  [418200.352667]  __mmu_notifier_invalidate_range_start+0x91/0x1b0
  [418200.353624]  kvm_tdp_page_fault+0x72/0x90 [kvm]
  [418200.354496]  try_to_migrate_one+0x691/0x730
  [418200.355436]  kvm_mmu_page_fault+0x73/0x1c0 [kvm]
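
For triaging further console logs, the frames above can be picked apart mechanically. The following is a small sketch, assuming trace lines keep the `[timestamp]  symbol+0xoff/0xlen [module]` layout shown here; the function names are hypothetical. A leading `?` marks a frame the kernel's unwinder could not verify.

```python
import re

# Matches e.g. "[418200.350196]  kvm_tdp_mmu_map+0x2b1/0x490 [kvm]"
FRAME_RE = re.compile(
    r"\[\s*(?P<time>\d+\.\d+)\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(?P<module>\w+)\])?"
)


def parse_frames(log_text):
    """Return (symbol, module, reliable) for each call-trace line found."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append(
                (match["symbol"], match["module"], match["unreliable"] is None)
            )
    return frames


def kvm_mmu_frames(frames):
    """Keep only symbols from the kvm module whose name mentions the MMU."""
    return [sym for sym, mod, _ in frames if mod == "kvm" and "mmu" in sym]
```

Feeding it the lines quoted above would single out frames such as kvm_tdp_mmu_map, which is what ties this crash to the TDP MMU code path.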

  openstack console log: https://pastebin.canonical.com/p/spmH8r3crQ/

  syslog: https://pastebin.canonical.com/p/wFPsFD8G9n/
  The syslog was rotated after the crash occurred, so the syslog at the time of 
the initial crash was lost.

  Other juju machines with the 5.15.0.79.76 kernel seem to have the same
  issue.

  We previously had a similar issue with 5.15.0-73. The juju machine
  crashed with raw_spin_lock and kvm mmu in the logs as well:
  https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2026229

  ProblemType: Bug
  DistroRelease: Ubuntu 22.04
  Package: linux-image-5.19.0-46-generic 5.19.0-46.47~22.04.1
  ProcVersionSignature: Ubuntu 5.19.0-46.47~22.04.1-generic 5.19.17
  Uname: Linux 5.19.0-46-generic x86_64
  NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
  ApportVersion: 2.20.11-0ubuntu82.5
  Architecture: amd64
  CasperMD5CheckResult: unknown
  CloudArchitecture: x86_64
  CloudID: openstack
  CloudName: openstack
  CloudPlatform: openstack
  CloudSubPlatform: metadata (http://169.254.169.254)
  Date: Mon Aug 21 08:59:46 2023
  Ec2AMI: ami-0c61
  Ec2AMIManifest: FIXME
  Ec2AvailabilityZone: availability-zone-1
  Ec2InstanceType: builder-cpu4-ram72-disk20
  Ec2Kernel: unavailable
  Ec2Ramdisk: unavailable
  ProcEnviron:
   TERM=xterm-256color
   PATH=(custom, no user)
   LANG=C.UTF-8
   SHELL=/bin/bash
  SourcePackage: linux-signed-hwe-5.19
  UpgradeStatus: No upgrade log present (probably fresh install)
  ---
  ProblemType: Bug
  AlsaDevices:
   total 0
   crw-rw 1 root audio 116,  1 Aug 23 03:23 seq
   crw-rw 1 root audio 116, 33 Aug 23 03:23 timer
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
  ApportVersion: 2.20.11-0ubuntu82.5
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', 
'/dev/snd/timer'] failed with exit code 1:
  CRDA: N/A
  CasperMD5CheckResult: unknown
  CloudArchitecture: x86_64
  CloudID: openstack
  CloudName: openstack
  CloudPlatform: openstack
  CloudSubPlatform: metadata 

[Kernel-packages] [Bug 2032176] Re: Crashing with CPU soft lock on GA kernel 5.15.0.79.76 and HWE kernel 5.19.0-46.47-22.04.1

2023-11-09 Thread Ubuntu Kernel Bot
This bug is awaiting verification that the
linux-xilinx-zynqmp/5.15.0-1025.29 kernel in -proposed solves the
problem. Please test the kernel and update this bug with the results. If
the problem is solved, change the tag
'verification-needed-jammy-linux-xilinx-zynqmp' to
'verification-done-jammy-linux-xilinx-zynqmp'. If the problem still
exists, change the tag 'verification-needed-jammy-linux-xilinx-zynqmp'
to 'verification-failed-jammy-linux-xilinx-zynqmp'.


If verification is not done by 5 working days from today, this fix will
be dropped from the source code, and this bug will be closed.


See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on
how to enable and use -proposed. Thank you!


** Tags added: kernel-spammed-jammy-linux-xilinx-zynqmp-v2 
verification-needed-jammy-linux-xilinx-zynqmp

[Kernel-packages] [Bug 2032176] Re: Crashing with CPU soft lock on GA kernel 5.15.0.79.76 and HWE kernel 5.19.0-46.47-22.04.1

2023-10-23 Thread Ubuntu Kernel Bot
This bug is awaiting verification that the
linux-nvidia-tegra-5.15/5.15.0-1018.18~20.04.1 kernel in -proposed
solves the problem. Please test the kernel and update this bug with the
results. If the problem is solved, change the tag
'verification-needed-focal-linux-nvidia-tegra-5.15' to
'verification-done-focal-linux-nvidia-tegra-5.15'. If the problem still
exists, change the tag
'verification-needed-focal-linux-nvidia-tegra-5.15' to
'verification-failed-focal-linux-nvidia-tegra-5.15'.


If verification is not done by 5 working days from today, this fix will
be dropped from the source code, and this bug will be closed.


See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on
how to enable and use -proposed. Thank you!


** Tags added: kernel-spammed-focal-linux-nvidia-tegra-5.15-v2 
verification-needed-focal-linux-nvidia-tegra-5.15

[Kernel-packages] [Bug 2032176] Re: Crashing with CPU soft lock on GA kernel 5.15.0.79.76 and HWE kernel 5.19.0-46.47-22.04.1

2023-10-18 Thread Ubuntu Kernel Bot
This bug is awaiting verification that the
linux-nvidia-tegra-igx/5.15.0-1005.5 kernel in -proposed solves the
problem. Please test the kernel and update this bug with the results. If
the problem is solved, change the tag
'verification-needed-jammy-linux-nvidia-tegra-igx' to
'verification-done-jammy-linux-nvidia-tegra-igx'. If the problem still
exists, change the tag
'verification-needed-jammy-linux-nvidia-tegra-igx' to
'verification-failed-jammy-linux-nvidia-tegra-igx'.


If verification is not done by 5 working days from today, this fix will
be dropped from the source code, and this bug will be closed.


See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on
how to enable and use -proposed. Thank you!


** Tags added: kernel-spammed-jammy-linux-nvidia-tegra-igx-v2 
verification-needed-jammy-linux-nvidia-tegra-igx

[Kernel-packages] [Bug 2032176] Re: Crashing with CPU soft lock on GA kernel 5.15.0.79.76 and HWE kernel 5.19.0-46.47-22.04.1

2023-10-09 Thread Ubuntu Kernel Bot
This bug is awaiting verification that the
linux-bluefield/5.15.0-1027.29 kernel in -proposed solves the problem.
Please test the kernel and update this bug with the results. If the
problem is solved, change the tag
'verification-needed-jammy-linux-bluefield' to
'verification-done-jammy-linux-bluefield'. If the problem still exists,
change the tag 'verification-needed-jammy-linux-bluefield' to
'verification-failed-jammy-linux-bluefield'.


If verification is not done by 5 working days from today, this fix will
be dropped from the source code, and this bug will be closed.


See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on
how to enable and use -proposed. Thank you!


** Tags added: kernel-spammed-jammy-linux-bluefield-v2 
verification-needed-jammy-linux-bluefield

[Kernel-packages] [Bug 2032176] Re: Crashing with CPU soft lock on GA kernel 5.15.0.79.76 and HWE kernel 5.19.0-46.47-22.04.1

2023-10-09 Thread Ubuntu Kernel Bot
This bug is awaiting verification that the linux-raspi/5.15.0-1040.43
kernel in -proposed solves the problem. Please test the kernel and
update this bug with the results. If the problem is solved, change the
tag 'verification-needed-jammy-linux-raspi' to
'verification-done-jammy-linux-raspi'. If the problem still exists,
change the tag 'verification-needed-jammy-linux-raspi' to
'verification-failed-jammy-linux-raspi'.


If verification is not done by 5 working days from today, this fix will
be dropped from the source code, and this bug will be closed.


See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on
how to enable and use -proposed. Thank you!


** Tags added: kernel-spammed-jammy-linux-raspi-v2 
verification-needed-jammy-linux-raspi

[Kernel-packages] [Bug 2032176] Re: Crashing with CPU soft lock on GA kernel 5.15.0.79.76 and HWE kernel 5.19.0-46.47-22.04.1

2023-10-08 Thread Ubuntu Kernel Bot
This bug is awaiting verification that the
linux-nvidia-tegra/5.15.0-1018.18 kernel in -proposed solves the
problem. Please test the kernel and update this bug with the results. If
the problem is solved, change the tag
'verification-needed-jammy-linux-nvidia-tegra' to
'verification-done-jammy-linux-nvidia-tegra'. If the problem still
exists, change the tag 'verification-needed-jammy-linux-nvidia-tegra' to
'verification-failed-jammy-linux-nvidia-tegra'.


If verification is not done by 5 working days from today, this fix will
be dropped from the source code, and this bug will be closed.


See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on
how to enable and use -proposed. Thank you!


** Tags added: kernel-spammed-jammy-linux-nvidia-tegra-v2 
verification-needed-jammy-linux-nvidia-tegra

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2032176

Title:
  Crashing with CPU soft lock on GA kernel 5.15.0.79.76 and HWE kernel
  5.19.0-46.47-22.04.1

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Jammy:
  Fix Released

Bug description:
  Impact:
  We had reports of VM setups which would show intermittent crashes and then 
lock up completely. This could be reproduced with large-memory setups.
  The problem seems to be that fixes for performance regressions caused more 
problems in 5.15 kernels, and the full fixes are too intrusive to be backported.

  Fix:
  The following patch was recently sent to the upstream stable mailing list and 
looks to be making its way into linux-5.15.y. This changes the default value of 
kvm.tdp_mmu to off (if anyone is willing to take the risks, this can be changed 
back in config).
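
  For anyone checking a host, the current setting can be read back from the
  kvm module's parameters. The following is a minimal Python sketch, assuming
  the knob is exposed at /sys/module/kvm/parameters/tdp_mmu; the helper names
  are illustrative and not part of any tool mentioned in this bug.

```python
from pathlib import Path

# Assumption: the kvm module publishes its parameters under this directory.
KVM_PARAM_DIR = Path("/sys/module/kvm/parameters")


def parse_bool_param(raw):
    """Map a module-parameter string ('Y'/'N'/'1'/'0') to a bool."""
    value = raw.strip().upper()
    if value in ("Y", "1"):
        return True
    if value in ("N", "0"):
        return False
    raise ValueError(f"unexpected parameter value: {raw!r}")


def tdp_mmu_enabled(param_dir=KVM_PARAM_DIR):
    """Return True/False for kvm.tdp_mmu, or None if the file is absent."""
    path = Path(param_dir) / "tdp_mmu"
    try:
        return parse_bool_param(path.read_text())
    except FileNotFoundError:
        return None


if __name__ == "__main__":
    labels = {True: "enabled", False: "disabled", None: "not exposed"}
    print("kvm.tdp_mmu is", labels[tdp_mmu_enabled()])
```

  Turning the option back on would presumably go through the usual
  module-parameter mechanisms (for example kvm.tdp_mmu=Y on the kernel command
  line); that detail is an assumption and is not spelled out in this bug.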

  Regression potential:
  VM hosts with many large-memory tenants might see a performance impact which 
the TDP MMU approach tried to solve. If those hosts did not see other problems, 
they might turn this on again.

  Testcase:
  Large openstack instance (64GB memory, AMD CPU (using SVM)) with a large 
second level guest (32GB memory). Repeatedly starting and stopping the 2nd 
level guest.

  
  --- original description ---
  The crash occurred on a juju machine, and the juju agent was lost.
  The juju machine is on an openstack instance provisioned by juju.

  The openstack console log indicates that it is related to spin_lock and KVM MMU:
  [418200.348830]  ? _raw_spin_lock+0x22/0x30
  [418200.349588]  _raw_write_lock+0x20/0x30
  [418200.350196]  kvm_tdp_mmu_map+0x2b1/0x490 [kvm]
  [418200.351014]  kvm_mmu_notifier_invalidate_range_start+0x1ad/0x300 [kvm]
  [418200.351796]  direct_page_fault+0x206/0x310 [kvm]
  [418200.352667]  __mmu_notifier_invalidate_range_start+0x91/0x1b0
  [418200.353624]  kvm_tdp_page_fault+0x72/0x90 [kvm]
  [418200.354496]  try_to_migrate_one+0x691/0x730
  [418200.355436]  kvm_mmu_page_fault+0x73/0x1c0 [kvm]
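
  To triage console logs like the one above, the frames can be split into
  symbol and module so that the KVM MMU entries stand out. This is a small
  Python sketch, assuming the frame layout seen in this excerpt (timestamp,
  optional "?" marker, symbol+offset/size, optional [module]); the regular
  expression and function name are illustrative only.

```python
import re

# Matches frames such as "[418200.350196]  kvm_tdp_mmu_map+0x2b1/0x490 [kvm]".
FRAME_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+"
    r"(?P<unreliable>\?\s+)?"
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_frames(text):
    """Return one dict per stack frame found in the given log text."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append({
                "ts": float(match["ts"]),
                # A leading "?" marks a frame the unwinder is unsure about.
                "unreliable": match["unreliable"] is not None,
                "symbol": match["symbol"],
                "module": match["module"],
            })
    return frames


# The frames quoted in this bug report.
TRACE = """\
[418200.348830]  ? _raw_spin_lock+0x22/0x30
[418200.349588]  _raw_write_lock+0x20/0x30
[418200.350196]  kvm_tdp_mmu_map+0x2b1/0x490 [kvm]
[418200.351014]  kvm_mmu_notifier_invalidate_range_start+0x1ad/0x300 [kvm]
[418200.351796]  direct_page_fault+0x206/0x310 [kvm]
[418200.352667]  __mmu_notifier_invalidate_range_start+0x91/0x1b0
[418200.353624]  kvm_tdp_page_fault+0x72/0x90 [kvm]
[418200.354496]  try_to_migrate_one+0x691/0x730
[418200.355436]  kvm_mmu_page_fault+0x73/0x1c0 [kvm]
"""

frames = parse_frames(TRACE)
kvm_frames = [f["symbol"] for f in frames if f["module"] == "kvm"]

if __name__ == "__main__":
    print(f"{len(kvm_frames)} of {len(frames)} frames are in the kvm module")
```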

  openstack console log: https://pastebin.canonical.com/p/spmH8r3crQ/

  syslog: https://pastebin.canonical.com/p/wFPsFD8G9n/
  The syslog was rotated after the crash occurred, so the syslog at the time of 
the initial crash was lost.

  Other juju machines with the 5.15.0.79.76 kernel seem to have the same
  issues.

  We previously had a similar issue with 5.15.0-73. The juju machine
  crashed with raw_spin_lock and kvm mmu in the logs as well:
  https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2026229

  ProblemType: Bug
  DistroRelease: Ubuntu 22.04
  Package: linux-image-5.19.0-46-generic 5.19.0-46.47~22.04.1
  ProcVersionSignature: Ubuntu 5.19.0-46.47~22.04.1-generic 5.19.17
  Uname: Linux 5.19.0-46-generic x86_64
  NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
  ApportVersion: 2.20.11-0ubuntu82.5
  Architecture: amd64
  CasperMD5CheckResult: unknown
  CloudArchitecture: x86_64
  CloudID: openstack
  CloudName: openstack
  CloudPlatform: openstack
  CloudSubPlatform: metadata (http://169.254.169.254)
  Date: Mon Aug 21 08:59:46 2023
  Ec2AMI: ami-0c61
  Ec2AMIManifest: FIXME
  Ec2AvailabilityZone: availability-zone-1
  Ec2InstanceType: builder-cpu4-ram72-disk20
  Ec2Kernel: unavailable
  Ec2Ramdisk: unavailable
  ProcEnviron:
   TERM=xterm-256color
   PATH=(custom, no user)
   LANG=C.UTF-8
   SHELL=/bin/bash
  SourcePackage: linux-signed-hwe-5.19
  UpgradeStatus: No upgrade log present (probably fresh install)
  ---
  ProblemType: Bug
  AlsaDevices:
   total 0
   crw-rw 1 root audio 116,  1 Aug 23 03:23 seq
   crw-rw 1 root audio 116, 33 Aug 23 03:23 timer
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
  ApportVersion: 2.20.11-0ubuntu82.5
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', 
'/dev/snd/timer'] failed with exit code 1:
  CRDA: N/A
  CasperMD5CheckResult: unknown
  CloudArchitecture: x86_64
  CloudID: openstack
  CloudName: openstack
  

[Kernel-packages] [Bug 2032176] Re: Crashing with CPU soft lock on GA kernel 5.15.0.79.76 and HWE kernel 5.19.0-46.47-22.04.1

2023-10-05 Thread Ubuntu Kernel Bot
This bug is awaiting verification that the linux-aws/5.15.0-1048.53
kernel in -proposed solves the problem. Please test the kernel and
update this bug with the results. If the problem is solved, change the
tag 'verification-needed-jammy-linux-aws' to 'verification-done-jammy-
linux-aws'. If the problem still exists, change the tag 'verification-
needed-jammy-linux-aws' to 'verification-failed-jammy-linux-aws'.


If verification is not done by 5 working days from today, this fix will
be dropped from the source code, and this bug will be closed.


See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on
how to enable and use -proposed. Thank you!


** Tags added: kernel-spammed-jammy-linux-aws-v2 
verification-needed-jammy-linux-aws

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2032176

Title:
  Crashing with CPU soft lock on GA kernel 5.15.0.79.76 and HWE kernel
  5.19.0-46.47-22.04.1

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Jammy:
  Fix Released


[Kernel-packages] [Bug 2032176] Re: Crashing with CPU soft lock on GA kernel 5.15.0.79.76 and HWE kernel 5.19.0-46.47-22.04.1

2023-10-05 Thread Ubuntu Kernel Bot
This bug is awaiting verification that the linux-azure/5.15.0-1050.57
kernel in -proposed solves the problem. Please test the kernel and
update this bug with the results. If the problem is solved, change the
tag 'verification-needed-jammy-linux-azure' to 'verification-done-jammy-
linux-azure'. If the problem still exists, change the tag 'verification-
needed-jammy-linux-azure' to 'verification-failed-jammy-linux-azure'.


If verification is not done by 5 working days from today, this fix will
be dropped from the source code, and this bug will be closed.


See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on
how to enable and use -proposed. Thank you!


** Tags added: kernel-spammed-jammy-linux-azure-v2 
verification-needed-jammy-linux-azure

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2032176

Title:
  Crashing with CPU soft lock on GA kernel 5.15.0.79.76 and HWE kernel
  5.19.0-46.47-22.04.1

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Jammy:
  Fix Released


[Kernel-packages] [Bug 2032176] Re: Crashing with CPU soft lock on GA kernel 5.15.0.79.76 and HWE kernel 5.19.0-46.47-22.04.1

2023-10-03 Thread Launchpad Bug Tracker
This bug was fixed in the package linux - 5.15.0-86.96
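
A host's installed linux package version can be compared against the release
named above. The following is a rough Python sketch, assuming versions of the
form 5.15.0-ABI.UPLOAD within the same 5.15.0 series; it is not a full Debian
version comparison, and the function names are illustrative. Derivative
kernels (linux-aws, linux-azure and so on) and the HWE series carry their own
version numbers and need their own reference value.

```python
import re

# The release this notification names for the jammy "linux" package.
FIXED_IN = "5.15.0-86.96"


def version_key(version):
    """Split a version such as '5.15.0-86.96' into a tuple of integers."""
    return tuple(int(part) for part in re.findall(r"\d+", version))


def at_least(installed, fixed=FIXED_IN):
    """Return True if 'installed' is the same series and not older than 'fixed'."""
    inst, fix = version_key(installed), version_key(fixed)
    if inst[:3] != fix[:3]:
        # Different upstream base, for example the 5.19 HWE kernel.
        raise ValueError("different kernel series; versions are not comparable")
    return inst >= fix


if __name__ == "__main__":
    for candidate in ("5.15.0-79.76", "5.15.0-86.96", "5.15.0-88.98"):
        print(candidate, at_least(candidate))
```

Note that builds from -proposed, such as the 5.15.0-85.95 kernel tested
earlier in this thread, predate this release number, so a False result here
does not by itself prove that the change is missing.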

---
linux (5.15.0-86.96) jammy; urgency=medium

  * jammy/linux: 5.15.0-86.96 -proposed tracker (LP: #2036575)

  * 5.15.0-85 live migration regression (LP: #2036675)
    - Revert "KVM: x86: Always enable legacy FP/SSE in allowed user XFEATURES"
    - Revert "x86/kvm/fpu: Limit guest user_xfeatures to supported bits of XCR0"

  * Regression for ubuntu_bpf test build on Jammy 5.15.0-85.95 (LP: #2035181)
    - selftests/bpf: fix static assert compilation issue for test_cls_*.c

  * `refcount_t: underflow; use-after-free.` on hidon w/ 5.15.0-85-generic
    (LP: #2034447)
    - crypto: rsa-pkcs1pad - Use helper to set reqsize

linux (5.15.0-85.95) jammy; urgency=medium

  * jammy/linux: 5.15.0-85.95 -proposed tracker (LP: #2033821)

  * Please enable Renesas RZ platform serial installer (LP: #2022361)
    - [Config] enable hihope RZ/G2M serial console
    - [Config] Mark sh-sci as built-in

  * Request backport of xen timekeeping performance improvements (LP: #2033122)
    - x86/xen/time: prefer tsc as clocksource when it is invariant

  * kdump doesn't work with UEFI secure boot and kernel lockdown enabled on
    ARM64 (LP: #2033007)
    - [Config]: Enable CONFIG_KEXEC_IMAGE_VERIFY_SIG
    - kexec, KEYS: make the code in bzImage64_verify_sig generic
    - arm64: kexec_file: use more system keyrings to verify kernel image
      signature

  * ubuntu_kernel_selftests:net:vrf-xfrm-tests.sh: 8 failed test cases on
    jammy/fips (LP: #2019880)
    - selftests: net: vrf-xfrm-tests: change authentication and encryption algos

  * ubuntu_kernel_selftests:net:tls: 88 failed test cases on jammy/fips
    (LP: #2019868)
    - selftests/harness: allow tests to be skipped during setup
    - selftests: net: tls: check if FIPS mode is enabled

  * A general-proteciton exception during guest migration to unsupported PKRU
    machine (LP: 2032164, reverted)
    - x86/kvm/fpu: Limit guest user_xfeatures to supported bits of XCR0
    - KVM: x86: Always enable legacy FP/SSE in allowed user XFEATURES

  * CVE-2023-4569
    - netfilter: nf_tables: deactivate catchall elements in next generation

  * CVE-2023-20569
    - x86/cpu, kvm: Add support for CPUID_8021_EAX
    - x86/srso: Add a Speculative RAS Overflow mitigation
    - x86/srso: Add IBPB_BRTYPE support
    - x86/srso: Add SRSO_NO support
    - x86/srso: Add IBPB
    - x86/srso: Add IBPB on VMEXIT
    - x86/srso: Fix return thunks in generated code
    - x86/srso: Tie SBPB bit setting to microcode patch detection
    - x86: fix backwards merge of GDS/SRSO bit
    - x86/srso: Fix build breakage with the LLVM linker
    - x86/cpu: Fix __x86_return_thunk symbol type
    - x86/cpu: Fix up srso_safe_ret() and __x86_return_thunk()
    - x86/alternative: Make custom return thunk unconditional
    - objtool: Add frame-pointer-specific function ignore
    - x86/ibt: Add ANNOTATE_NOENDBR
    - x86/cpu: Clean up SRSO return thunk mess
    - x86/cpu: Rename original retbleed methods
    - x86/cpu: Rename srso_(.*)_alias to srso_alias_\1
    - x86/cpu: Cleanup the untrain mess
    - x86/srso: Explain the untraining sequences a bit more
    - x86/static_call: Fix __static_call_fixup()
    - x86/retpoline: Don't clobber RFLAGS during srso_safe_ret()
    - x86/srso: Disable the mitigation on unaffected configurations
    - x86/retpoline,kprobes: Fix position of thunk sections with
      CONFIG_LTO_CLANG
    - objtool/x86: Fixup frame-pointer vs rethunk
    - x86/srso: Correct the mitigation status when SMT is disabled
    - objtool/x86: Fix SRSO mess
    - Ubuntu: [Config]: enable Speculative Return Stack Overflow mitigation

  * Fix unreliable ethernet cable detection on I219 NIC (LP: #2028122)
    - e1000e: Use PME poll to circumvent unreliable ACPI wake

  * Need to get fine-grained control for FAN(TFN) Participant. (LP: #2031333)
    - ACPI: fan: Separate file for attributes creation
    - ACPI: fan: Optimize struct acpi_fan_fif
    - ACPI: fan: Properly handle fine grain control
    - ACPI: fan: Add additional attributes for fine grain control

  * [SRU][Ubuntu 22.04.1] Unable to interpret the frequency values in
    cpuinfo_min_freq and cpuino_max_freq sysfs files. (LP: #2030924)
    - cpufreq: intel_pstate: Fix scaling for hybrid-capable

  * CVE-2023-40283
    - Bluetooth: L2CAP: Fix use-after-free in l2cap_sock_ready_cb

  * CVE-2023-20588
    - x86/bugs: Increase the x86 bugs vector size to two u32s
    - x86/CPU/AMD: Do not leak quotient data after a division by 0
    - x86/CPU/AMD: Fix the DIV(0) initial fix attempt

  * CVE-2023-4194
    - net: tun_chr_open(): set sk_uid from current_fsuid()
    - net: tap_open(): set sk_uid from current_fsuid()

  * CVE-2023-4155
    - KVM: SEV: Refactor out sev_es_state struct
    - KVM: SEV: Fall back to vmalloc for SEV-ES scratch area if necessary
    - KVM: SVM: Do not terminate SEV-ES guests on GHCB validation failure
    - KVM: SVM: Exit to userspace on ENOMEM/EFAULT GHCB 

[Kernel-packages] [Bug 2032176] Re: Crashing with CPU soft lock on GA kernel 5.15.0.79.76 and HWE kernel 5.19.0-46.47-22.04.1

2023-09-19 Thread Andrew Liaw
Hi, is there a timeline on when this patch will reach the general
availability kernel?

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2032176

Title:
  Crashing with CPU soft lock on GA kernel 5.15.0.79.76 and HWE kernel
  5.19.0-46.47-22.04.1

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Jammy:
  Fix Committed

Bug description:
  Impact:
  We had reports of VM setups which would show intermittent crashes and then 
lock up completely. This could be reproduced with large-memory setups.
  The problem seems to be that fixes for performance regressions caused more 
problems in 5.15 kernels, and the full fixes are too intrusive to be backported.

  Fix:
  The following patch was recently sent to the upstream stable mailing list and 
looks to be making its way into linux-5.15.y. This changes the default value of 
kvm.tdp_mmu to off (if anyone is willing to take the risks, this can be changed 
back in config).

  Regression potential:
  VM hosts with many large-memory tenants might see a performance impact which 
the TDP MMU approach tried to solve. If those hosts did not see other problems, 
they might turn this on again.

  Testcase:
  Large openstack instance (64GB memory, AMD CPU (using SVM)) with a large 
second level guest (32GB memory). Repeatedly starting and stopping the 2nd 
level guest.

  
  --- original description ---
  The crash occurred on a juju machine, and the juju agent was lost.
  The juju machine is on an openstack instance provisioned by juju.

  The openstack console log indicates that it is related to spin_lock and KVM MMU:
  [418200.348830]  ? _raw_spin_lock+0x22/0x30
  [418200.349588]  _raw_write_lock+0x20/0x30
  [418200.350196]  kvm_tdp_mmu_map+0x2b1/0x490 [kvm]
  [418200.351014]  kvm_mmu_notifier_invalidate_range_start+0x1ad/0x300 [kvm]
  [418200.351796]  direct_page_fault+0x206/0x310 [kvm]
  [418200.352667]  __mmu_notifier_invalidate_range_start+0x91/0x1b0
  [418200.353624]  kvm_tdp_page_fault+0x72/0x90 [kvm]
  [418200.354496]  try_to_migrate_one+0x691/0x730
  [418200.355436]  kvm_mmu_page_fault+0x73/0x1c0 [kvm]

  openstack console log: https://pastebin.canonical.com/p/spmH8r3crQ/

  syslog: https://pastebin.canonical.com/p/wFPsFD8G9n/
  The syslog was rotated after the crash occurred, so the syslog at the time of 
the initial crash was lost.

  Other juju machines with the 5.15.0.79.76 kernel seem to have the same
  issues.

  We previously had a similar issue with 5.15.0-73. The juju machine
  crashed with raw_spin_lock and kvm mmu in the logs as well:
  https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2026229

  ProblemType: Bug
  DistroRelease: Ubuntu 22.04
  Package: linux-image-5.19.0-46-generic 5.19.0-46.47~22.04.1
  ProcVersionSignature: Ubuntu 5.19.0-46.47~22.04.1-generic 5.19.17
  Uname: Linux 5.19.0-46-generic x86_64
  NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
  ApportVersion: 2.20.11-0ubuntu82.5
  Architecture: amd64
  CasperMD5CheckResult: unknown
  CloudArchitecture: x86_64
  CloudID: openstack
  CloudName: openstack
  CloudPlatform: openstack
  CloudSubPlatform: metadata (http://169.254.169.254)
  Date: Mon Aug 21 08:59:46 2023
  Ec2AMI: ami-0c61
  Ec2AMIManifest: FIXME
  Ec2AvailabilityZone: availability-zone-1
  Ec2InstanceType: builder-cpu4-ram72-disk20
  Ec2Kernel: unavailable
  Ec2Ramdisk: unavailable
  ProcEnviron:
   TERM=xterm-256color
   PATH=(custom, no user)
   LANG=C.UTF-8
   SHELL=/bin/bash
  SourcePackage: linux-signed-hwe-5.19
  UpgradeStatus: No upgrade log present (probably fresh install)
  ---
  ProblemType: Bug
  AlsaDevices:
   total 0
   crw-rw 1 root audio 116,  1 Aug 23 03:23 seq
   crw-rw 1 root audio 116, 33 Aug 23 03:23 timer
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
  ApportVersion: 2.20.11-0ubuntu82.5
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', 
'/dev/snd/timer'] failed with exit code 1:
  CRDA: N/A
  CasperMD5CheckResult: unknown
  CloudArchitecture: x86_64
  CloudID: openstack
  CloudName: openstack
  CloudPlatform: openstack
  CloudSubPlatform: metadata (http://169.254.169.254)
  DistroRelease: Ubuntu 22.04
  Ec2AMI: ami-0fbb
  Ec2AMIManifest: FIXME
  Ec2AvailabilityZone: availability-zone-2
  Ec2InstanceType: builder-cpu2-ram44-disk20
  Ec2Kernel: unavailable
  Ec2Ramdisk: unavailable
  IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
  Lsusb: Bus 001 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
  Lsusb-t: /:  Bus 01.Port 1: Dev 1, Class=root_hub, Driver=uhci_hcd/2p, 12M
  MachineType: OpenStack Foundation OpenStack Nova
  NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
  Package: linux (not installed)
  PciMultimedia:

  ProcEnviron:
   TERM=xterm-256color
   PATH=(custom, no 

[Kernel-packages] [Bug 2032176] Re: Crashing with CPU soft lock on GA kernel 5.15.0.79.76 and HWE kernel 5.19.0-46.47-22.04.1

2023-09-10 Thread Andrew Liaw
No crashes observed with the proposed kernel. Changed the tag to
'verification-done-jammy-linux'.

** Tags removed: verification-needed-jammy-linux
** Tags added: verification-done-jammy-linux

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2032176

Title:
  Crashing with CPU soft lock on GA kernel 5.15.0.79.76 and HWE kernel
  5.19.0-46.47-22.04.1

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Jammy:
  Fix Committed


[Kernel-packages] [Bug 2032176] Re: Crashing with CPU soft lock on GA kernel 5.15.0.79.76 and HWE kernel 5.19.0-46.47-22.04.1

2023-09-07 Thread Andrew Liaw
Currently testing the linux/5.15.0-85.95 kernel.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2032176

Title:
  Crashing with CPU soft lock on GA kernel 5.15.0.79.76 and HWE kernel
  5.19.0-46.47-22.04.1

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Jammy:
  Fix Committed

Bug description:
  Impact:
  We had reports of VM setups which would show intermittent crashes and then 
lock up completely. This could be reproduced with large-memory setups.
  The problem seems to be that fixes for performance regressions caused more 
problems in 5.15 kernels, and the full fixes are too intrusive to be backported.

  Fix:
  The following patch was recently sent to the upstream stable mailing list and 
looks to be making its way into linux-5.15.y. This changes the default value of 
kvm.tdp_mmu to off (if anyone is willing to take the risks, this can be changed 
back in config).

  Regression potential:
  VM hosts with many large-memory tenants might see a performance impact which 
the TDP MMU approach tried to solve. If those hosts did not see other problems, 
they might turn this on again.

  Testcase:
  Large openstack instance (64GB memory, AMD CPU (using SVM)) with a large 
second level guest (32GB memory). Repeatedly starting and stopping the 2nd 
level guest.

  
  --- original description ---
  The crash occurred on a juju machine, and the juju agent was lost.
  The juju machine is on an openstack instance provisioned by juju.

  The openstack console log indicates that it is related to spin_lock and KVM MMU:
  [418200.348830]  ? _raw_spin_lock+0x22/0x30
  [418200.349588]  _raw_write_lock+0x20/0x30
  [418200.350196]  kvm_tdp_mmu_map+0x2b1/0x490 [kvm]
  [418200.351014]  kvm_mmu_notifier_invalidate_range_start+0x1ad/0x300 [kvm]
  [418200.351796]  direct_page_fault+0x206/0x310 [kvm]
  [418200.352667]  __mmu_notifier_invalidate_range_start+0x91/0x1b0
  [418200.353624]  kvm_tdp_page_fault+0x72/0x90 [kvm]
  [418200.354496]  try_to_migrate_one+0x691/0x730
  [418200.355436]  kvm_mmu_page_fault+0x73/0x1c0 [kvm]

  openstack console log: https://pastebin.canonical.com/p/spmH8r3crQ/

  syslog: https://pastebin.canonical.com/p/wFPsFD8G9n/
  The syslog was rotated after the crash occurred, so the syslog at the time of 
the initial crash was lost.

  Other juju machines with the 5.15.0.79.76 kernel seem to have the same
  issues.

  We previously had a similar issue with 5.15.0-73. The juju machine
  crashed with raw_spin_lock and kvm mmu in the logs as well:
  https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2026229

  ProblemType: Bug
  DistroRelease: Ubuntu 22.04
  Package: linux-image-5.19.0-46-generic 5.19.0-46.47~22.04.1
  ProcVersionSignature: Ubuntu 5.19.0-46.47~22.04.1-generic 5.19.17
  Uname: Linux 5.19.0-46-generic x86_64
  NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
  ApportVersion: 2.20.11-0ubuntu82.5
  Architecture: amd64
  CasperMD5CheckResult: unknown
  CloudArchitecture: x86_64
  CloudID: openstack
  CloudName: openstack
  CloudPlatform: openstack
  CloudSubPlatform: metadata (http://169.254.169.254)
  Date: Mon Aug 21 08:59:46 2023
  Ec2AMI: ami-0c61
  Ec2AMIManifest: FIXME
  Ec2AvailabilityZone: availability-zone-1
  Ec2InstanceType: builder-cpu4-ram72-disk20
  Ec2Kernel: unavailable
  Ec2Ramdisk: unavailable
  ProcEnviron:
   TERM=xterm-256color
   PATH=(custom, no user)
   LANG=C.UTF-8
   SHELL=/bin/bash
  SourcePackage: linux-signed-hwe-5.19
  UpgradeStatus: No upgrade log present (probably fresh install)
  ---
  ProblemType: Bug
  AlsaDevices:
   total 0
   crw-rw 1 root audio 116,  1 Aug 23 03:23 seq
   crw-rw 1 root audio 116, 33 Aug 23 03:23 timer
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
  ApportVersion: 2.20.11-0ubuntu82.5
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', 
'/dev/snd/timer'] failed with exit code 1:
  CRDA: N/A
  CasperMD5CheckResult: unknown
  CloudArchitecture: x86_64
  CloudID: openstack
  CloudName: openstack
  CloudPlatform: openstack
  CloudSubPlatform: metadata (http://169.254.169.254)
  DistroRelease: Ubuntu 22.04
  Ec2AMI: ami-0fbb
  Ec2AMIManifest: FIXME
  Ec2AvailabilityZone: availability-zone-2
  Ec2InstanceType: builder-cpu2-ram44-disk20
  Ec2Kernel: unavailable
  Ec2Ramdisk: unavailable
  IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
  Lsusb: Bus 001 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
  Lsusb-t: /:  Bus 01.Port 1: Dev 1, Class=root_hub, Driver=uhci_hcd/2p, 12M
  MachineType: OpenStack Foundation OpenStack Nova
  NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
  Package: linux (not installed)
  PciMultimedia:

  ProcEnviron:
   TERM=xterm-256color
   PATH=(custom, no user)
   LANG=C.UTF-8
   

[Kernel-packages] [Bug 2032176] Re: Crashing with CPU soft lock on GA kernel 5.15.0.79.76 and HWE kernel 5.19.0-46.47-22.04.1

2023-09-06 Thread Ubuntu Kernel Bot
This bug is awaiting verification that the linux/5.15.0-85.95 kernel in
-proposed solves the problem. Please test the kernel and update this bug
with the results. If the problem is solved, change the tag
'verification-needed-jammy-linux' to 'verification-done-jammy-linux'. If
the problem still exists, change the tag
'verification-needed-jammy-linux' to 'verification-failed-jammy-linux'.

If verification is not done by 5 working days from today, this fix will
be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how
to enable and use -proposed. Thank you!
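Before reporting a verification result it helps to confirm that the test machine really booted the kernel under test. The sketch below compares a release string (as printed by uname -r) against the 5.15.0-85 ABI named in this message. Treating ABI 85 as the first one carrying the change is an assumption drawn only from this notification, and the function names are illustrative.

```python
import platform
import re
from typing import Optional, Tuple

RELEASE_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)-(\d+)")


def parse_release(release: str) -> Tuple[int, int, int, int]:
    """Split '5.15.0-85-generic' or '5.15.0-85.95' into (5, 15, 0, 85)."""
    m = RELEASE_RE.match(release)
    if not m:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch, abi = (int(part) for part in m.groups())
    return (major, minor, patch, abi)


def has_proposed_fix(release: str, fixed: str = "5.15.0-85") -> Optional[bool]:
    """Compare ABI numbers within the same upstream series.

    Returns None when the running kernel is from a different series,
    because ABI numbers are not comparable across series.
    """
    current = parse_release(release)
    target = parse_release(fixed)
    if current[:3] != target[:3]:
        return None
    return current[3] >= target[3]


if __name__ == "__main__":
    running = platform.release()
    try:
        print(running, "->", has_proposed_fix(running))
    except ValueError as err:
        print(err)
```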


** Tags added: kernel-spammed-jammy-linux-v2 verification-needed-jammy-linux

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2032176

Title:
  Crashing with CPU soft lock on GA kernel 5.15.0.79.76 and HWE kernel
  5.19.0-46.47-22.04.1

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Jammy:
  Fix Committed


[Kernel-packages] [Bug 2032176] Re: Crashing with CPU soft lock on GA kernel 5.15.0.79.76 and HWE kernel 5.19.0-46.47-22.04.1

2023-09-01 Thread Stefan Bader
Right, fix-committed means it is applied to git and will be included in
the 2023.09.04 SRU cycle. The fix for 5.15 is to change the default of
tdp_mmu to off (like you did for testing). Changing the deployments like
you did would be the work-around in the meantime.
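For deployments that need the work-around before the SRU lands, the setting can be made persistent through a modprobe.d options line or passed on the kernel command line as kvm.tdp_mmu=0. The sketch below only renders those two strings. Where the file is written (for example a hypothetical /etc/modprobe.d/kvm-tdp-mmu.conf) and how kvm gets reloaded are left to the deployment tooling.

```python
from typing import Mapping, Union

ParamValue = Union[int, str]


def render_modprobe_options(module: str, params: Mapping[str, ParamValue]) -> str:
    """Render one modprobe.d 'options' line for the given module."""
    rendered = " ".join(f"{name}={value}" for name, value in params.items())
    return f"options {module} {rendered}\n"


def kernel_cmdline_arg(module: str, name: str, value: ParamValue) -> str:
    """Render the boot-time form of a module parameter (module.param=value)."""
    return f"{module}.{name}={value}"


if __name__ == "__main__":
    # Content for a modprobe.d drop-in; the file name is up to the operator.
    print(render_modprobe_options("kvm", {"tdp_mmu": 0}), end="")
    # Equivalent kernel command line argument.
    print(kernel_cmdline_arg("kvm", "tdp_mmu", 0))
```

Either form takes effect the next time the kvm module is loaded, so running guests keep the old setting until the module is reloaded or the host is rebooted.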

The parent Ubuntu state refers to current development (Mantic right
now). This should be fixed. This would leave Lunar (6.2). That should at
least contain improvements to leave this enabled by default. And I read
Thadeu's comment as saying he was not able to reproduce with 6.2 (to me it
sounded like without changing the value there, but it is a bit
ambiguous).

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2032176

Title:
  Crashing with CPU soft lock on GA kernel 5.15.0.79.76 and HWE kernel
  5.19.0-46.47-22.04.1

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Jammy:
  Fix Committed


[Kernel-packages] [Bug 2032176] Re: Crashing with CPU soft lock on GA kernel 5.15.0.79.76 and HWE kernel 5.19.0-46.47-22.04.1

2023-08-31 Thread Andrew Liaw
We are still seeing this issue on 5.15.0-82-generic for 22.04 (Jammy).

Since (Ubuntu Jammy) is on Fix Committed and not Fix Released, I
would assume this is normal, right? And the Fix Released status on
(Ubuntu) means the bug is not present on other Ubuntu versions?

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2032176

Title:
  Crashing with CPU soft lock on GA kernel 5.15.0.79.76 and HWE kernel
  5.19.0-46.47-22.04.1

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Jammy:
  Fix Committed


[Kernel-packages] [Bug 2032176] Re: Crashing with CPU soft lock on GA kernel 5.15.0.79.76 and HWE kernel 5.19.0-46.47-22.04.1

2023-08-31 Thread Stefan Bader
** Changed in: linux (Ubuntu Jammy)
   Status: In Progress => Fix Committed

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2032176

Title:
  Crashing with CPU soft lock on GA kernel 5.15.0.79.76 and HWE kernel
  5.19.0-46.47-22.04.1

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Jammy:
  Fix Committed


[Kernel-packages] [Bug 2032176] Re: Crashing with CPU soft lock on GA kernel 5.15.0.79.76 and HWE kernel 5.19.0-46.47-22.04.1

2023-08-30 Thread Stefan Bader
** Changed in: linux (Ubuntu)
   Status: Confirmed => Fix Released

** Changed in: linux (Ubuntu Jammy)
   Status: Triaged => In Progress

** Changed in: linux (Ubuntu Jammy)
 Assignee: (unassigned) => Stefan Bader (smb)

** Description changed:

+ Impact:
+ We had reports of VM setups which would show intermittent crashes and after 
that lock up completely. This could be reproduced with large memory setups.
+ The problem seems to be that fixes to performance regressions caused more 
problems in 5.15 kernels and the full fixes are too intrusive to be backported.
+ 
+ Fix:
+ The following patch was recently sent to the upstream stable mailing list and 
looks to be making its way into linux-5.15.y. This changes the default value of 
kvm.tdp_mmu to off (if anyone is willing to take the risks, this can be changed 
back in config).
+ 
+ Regression potential:
+ VM hosts with many large memory tenants might see a performance impact which 
the TDP MMU approach tried to solve. If those hosts did not see other problems, they 
might turn this on again.
+ 
+ Testcase:
+ Large openstack instance (64GB memory, AMD CPU (using SVM)) with a large 
second level guest (32GB memory). Repeatedly starting and stopping the 2nd 
level guest.
+ 
+ 
+ --- original description ---
  The crash occurred on a juju machine, and the juju agent was lost.
  The juju machine is on an openstack instance provisioned by juju.
  
  The openstack console log indicates that it is related to spin_lock and KVM MMU:
  [418200.348830]  ? _raw_spin_lock+0x22/0x30
  [418200.349588]  _raw_write_lock+0x20/0x30
  [418200.350196]  kvm_tdp_mmu_map+0x2b1/0x490 [kvm]
  [418200.351014]  kvm_mmu_notifier_invalidate_range_start+0x1ad/0x300 [kvm]
  [418200.351796]  direct_page_fault+0x206/0x310 [kvm]
  [418200.352667]  __mmu_notifier_invalidate_range_start+0x91/0x1b0
  [418200.353624]  kvm_tdp_page_fault+0x72/0x90 [kvm]
  [418200.354496]  try_to_migrate_one+0x691/0x730
  [418200.355436]  kvm_mmu_page_fault+0x73/0x1c0 [kvm]
  
  openstack console log: https://pastebin.canonical.com/p/spmH8r3crQ/
  
  syslog: https://pastebin.canonical.com/p/wFPsFD8G9n/
  The syslog was rotated after the crash occurred, so the syslog at the time of 
the initial crash was lost.
  
  Other juju machines with the 5.15.0.79.76 kernel seem to have the same
  issues.
  
  We previously had a similar issue with 5.15.0-73. The juju machine
  crashed with raw_spin_lock and kvm mmu in the logs as well:
  https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2026229
  
  ProblemType: Bug
  DistroRelease: Ubuntu 22.04
  Package: linux-image-5.19.0-46-generic 5.19.0-46.47~22.04.1
  ProcVersionSignature: Ubuntu 5.19.0-46.47~22.04.1-generic 5.19.17
  Uname: Linux 5.19.0-46-generic x86_64
  NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
  ApportVersion: 2.20.11-0ubuntu82.5
  Architecture: amd64
  CasperMD5CheckResult: unknown
  CloudArchitecture: x86_64
  CloudID: openstack
  CloudName: openstack
  CloudPlatform: openstack
  CloudSubPlatform: metadata (http://169.254.169.254)
  Date: Mon Aug 21 08:59:46 2023
  Ec2AMI: ami-0c61
  Ec2AMIManifest: FIXME
  Ec2AvailabilityZone: availability-zone-1
  Ec2InstanceType: builder-cpu4-ram72-disk20
  Ec2Kernel: unavailable
  Ec2Ramdisk: unavailable
  ProcEnviron:
   TERM=xterm-256color
   PATH=(custom, no user)
   LANG=C.UTF-8
   SHELL=/bin/bash
  SourcePackage: linux-signed-hwe-5.19
  UpgradeStatus: No upgrade log present (probably fresh install)
- --- 
+ ---
  ProblemType: Bug
  AlsaDevices:
-  total 0
-  crw-rw 1 root audio 116,  1 Aug 23 03:23 seq
-  crw-rw 1 root audio 116, 33 Aug 23 03:23 timer
+  total 0
+  crw-rw 1 root audio 116,  1 Aug 23 03:23 seq
+  crw-rw 1 root audio 116, 33 Aug 23 03:23 timer
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
  ApportVersion: 2.20.11-0ubuntu82.5
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', 
'/dev/snd/timer'] failed with exit code 1:
  CRDA: N/A
  CasperMD5CheckResult: unknown
  CloudArchitecture: x86_64
  CloudID: openstack
  CloudName: openstack
  CloudPlatform: openstack
  CloudSubPlatform: metadata (http://169.254.169.254)
  DistroRelease: Ubuntu 22.04
  Ec2AMI: ami-0fbb
  Ec2AMIManifest: FIXME
  Ec2AvailabilityZone: availability-zone-2
  Ec2InstanceType: builder-cpu2-ram44-disk20
  Ec2Kernel: unavailable
  Ec2Ramdisk: unavailable
  IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
  Lsusb: Bus 001 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
  Lsusb-t: /:  Bus 01.Port 1: Dev 1, Class=root_hub, Driver=uhci_hcd/2p, 12M
  MachineType: OpenStack Foundation OpenStack Nova
  NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
  Package: linux (not installed)
  PciMultimedia:
-  
+ 
  ProcEnviron:
-  TERM=xterm-256color
-  PATH=(custom, no user)
-  LANG=C.UTF-8
-  SHELL=/bin/bash
+  TERM=xterm-256color
+  

[Kernel-packages] [Bug 2032176] Re: Crashing with CPU soft lock on GA kernel 5.15.0.79.76 and HWE kernel 5.19.0-46.47-22.04.1

2023-08-30 Thread Andrew Liaw
Confirming that the instances with tdp_mmu=0 do not seem to crash. I had 5
instances running for 4 days.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2032176

Title:
  Crashing with CPU soft lock on GA kernel 5.15.0.79.76 and HWE kernel
  5.19.0-46.47-22.04.1

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Jammy:
  Triaged

Bug description:
  The crash occurred on a juju machine, and the juju agent was lost.
  The juju machine is on an openstack instance provisioned by juju.

  The openstack console log indicates that it is related to spin_lock and KVM MMU:
  [418200.348830]  ? _raw_spin_lock+0x22/0x30
  [418200.349588]  _raw_write_lock+0x20/0x30
  [418200.350196]  kvm_tdp_mmu_map+0x2b1/0x490 [kvm]
  [418200.351014]  kvm_mmu_notifier_invalidate_range_start+0x1ad/0x300 [kvm]
  [418200.351796]  direct_page_fault+0x206/0x310 [kvm]
  [418200.352667]  __mmu_notifier_invalidate_range_start+0x91/0x1b0
  [418200.353624]  kvm_tdp_page_fault+0x72/0x90 [kvm]
  [418200.354496]  try_to_migrate_one+0x691/0x730
  [418200.355436]  kvm_mmu_page_fault+0x73/0x1c0 [kvm]

  openstack console log: https://pastebin.canonical.com/p/spmH8r3crQ/

  syslog: https://pastebin.canonical.com/p/wFPsFD8G9n/
  The syslog was rotated after the crash occurred, so the syslog at the time of 
the initial crash was lost.

  Other juju machines with the 5.15.0.79.76 kernel seem to have the same
  issues.

  We previously had a similar issue with 5.15.0-73. The juju machine
  crashed with raw_spin_lock and kvm mmu in the logs as well:
  https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2026229

  ProblemType: Bug
  DistroRelease: Ubuntu 22.04
  Package: linux-image-5.19.0-46-generic 5.19.0-46.47~22.04.1
  ProcVersionSignature: Ubuntu 5.19.0-46.47~22.04.1-generic 5.19.17
  Uname: Linux 5.19.0-46-generic x86_64
  NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
  ApportVersion: 2.20.11-0ubuntu82.5
  Architecture: amd64
  CasperMD5CheckResult: unknown
  CloudArchitecture: x86_64
  CloudID: openstack
  CloudName: openstack
  CloudPlatform: openstack
  CloudSubPlatform: metadata (http://169.254.169.254)
  Date: Mon Aug 21 08:59:46 2023
  Ec2AMI: ami-0c61
  Ec2AMIManifest: FIXME
  Ec2AvailabilityZone: availability-zone-1
  Ec2InstanceType: builder-cpu4-ram72-disk20
  Ec2Kernel: unavailable
  Ec2Ramdisk: unavailable
  ProcEnviron:
   TERM=xterm-256color
   PATH=(custom, no user)
   LANG=C.UTF-8
   SHELL=/bin/bash
  SourcePackage: linux-signed-hwe-5.19
  UpgradeStatus: No upgrade log present (probably fresh install)
  --- 
  ProblemType: Bug
  AlsaDevices:
   total 0
   crw-rw 1 root audio 116,  1 Aug 23 03:23 seq
   crw-rw 1 root audio 116, 33 Aug 23 03:23 timer
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
  ApportVersion: 2.20.11-0ubuntu82.5
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', 
'/dev/snd/timer'] failed with exit code 1:
  CRDA: N/A
  CasperMD5CheckResult: unknown
  CloudArchitecture: x86_64
  CloudID: openstack
  CloudName: openstack
  CloudPlatform: openstack
  CloudSubPlatform: metadata (http://169.254.169.254)
  DistroRelease: Ubuntu 22.04
  Ec2AMI: ami-0fbb
  Ec2AMIManifest: FIXME
  Ec2AvailabilityZone: availability-zone-2
  Ec2InstanceType: builder-cpu2-ram44-disk20
  Ec2Kernel: unavailable
  Ec2Ramdisk: unavailable
  IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
  Lsusb: Bus 001 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
  Lsusb-t: /:  Bus 01.Port 1: Dev 1, Class=root_hub, Driver=uhci_hcd/2p, 12M
  MachineType: OpenStack Foundation OpenStack Nova
  NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
  Package: linux (not installed)
  PciMultimedia:
   
  ProcEnviron:
   TERM=xterm-256color
   PATH=(custom, no user)
   LANG=C.UTF-8
   SHELL=/bin/bash
  ProcFB: 0 qxldrmfb
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.15.0-83-generic 
root=UUID=a6de04b8-3631-4ce4-bb96-48076f4a56bf ro console=tty1 console=ttyS0
  ProcVersionSignature: Ubuntu 5.15.0-83.92-generic 5.15.116
  RelatedPackageVersions:
   linux-restricted-modules-5.15.0-83-generic N/A
   linux-backports-modules-5.15.0-83-generic  N/A
   linux-firmware 20220329.git681281e4-0ubuntu3.17
  RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
  Tags:  jammy ec2-images
  Uname: Linux 5.15.0-83-generic x86_64
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups: N/A
  _MarkForUpload: True
  dmi.bios.date: 04/01/2014
  dmi.bios.release: 0.0
  dmi.bios.vendor: SeaBIOS
  dmi.bios.version: 1.13.0-1ubuntu1.1
  dmi.chassis.type: 1
  dmi.chassis.vendor: QEMU
  dmi.chassis.version: pc-i440fx-4.2
  dmi.modalias: