Cornered this to zswap and not an issue with mm or I/O. Figured out
that 3 hours of soak testing on each bisect step is the only reliable way
to do a bisect. Bisected between 4.20 and 5.0, finally cornered the
issue, and hence found the commits required to fix this.
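
For a rough idea of what a soak-tested bisect costs, here is a small
sketch of the arithmetic. The 3 hours per step is the figure from the
comment above; the commit count in the usage line is a made-up
placeholder, not the real number of commits between 4.20 and 5.0, and
the step count is only the usual binary-search estimate (real kernel
history is not linear, so git may need a step more or less).

```python
def bisect_steps(num_commits: int) -> int:
    """Approximate worst-case number of kernels to build and soak test
    when bisecting over num_commits candidate commits."""
    if num_commits < 1:
        raise ValueError("need at least one candidate commit")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, without floats.
    return (num_commits - 1).bit_length()


def soak_hours(num_commits: int, hours_per_step: float = 3.0) -> float:
    """Total soak time, assuming every bisect step needs a full soak run."""
    return bisect_steps(num_commits) * hours_per_step


# Hypothetical range of 1000 commits: 10 steps, so 30 hours of soak time.
print(bisect_steps(1000), soak_hours(1000))
```

Build and reboot time comes on top of this, so the soak time is a
lower bound on the wall-clock cost.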
** Description changed:
+ == SRU Justification ==
+
+ When using zram (as installed and configured with the zram-config package)
+ systems can lock up after about a week of use. This occurs because of
+ a hang in a lock in zram.
+
+ == Test Case ==
+
+ Run stress-ng --brk 0 --stack 0 in a Bionic amd64 server VM with 1GB of
+ memory, 16 CPU threads and zram-config installed. Without the fix the
+ kernel will hang in a spinlock after 1-2 hours of run time. With the fix,
+ the hang does not occur. Testing shows that with the fix, 5 x 16 CPU hours
+ of stress testing with stress-ng works fine without the lockup occurring.
+
+ == The fix ==
+
+ Upstream commit c4d6c4cc7bfd ("zram: correct flag name of ZRAM_ACCESS") as
+ a prerequisite followed by a minor context wiggle backport of the fix with
+ commit 3c9959e02547 ("zram: fix lockdep warning of free block handling").
+
+ == Regression Potential ==
+
+ This touches the zram locking, so the core zram driver is affected. However,
+ the fixes are backports from 5.0, so they have had a fair amount of
+ testing in later kernels.
+
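The reproducer from the Test Case section could be wrapped roughly as
follows. This is only a sketch: the stress-ng options are the ones given
in the test case, but the 5h timeout is an assumption (chosen to match
the 5 x 16 CPU hours mentioned there), and the helper names and the
/proc/swaps check are illustrative rather than part of the original
test procedure.

```python
import subprocess
from typing import List


def zram_swap_devices(proc_swaps_text: str) -> List[str]:
    """Return the swap devices listed in /proc/swaps text that live on zram."""
    devices = []
    for line in proc_swaps_text.splitlines()[1:]:  # first line is the header
        fields = line.split()
        if fields and fields[0].startswith("/dev/zram"):
            devices.append(fields[0])
    return devices


def stress_command(timeout: str = "5h") -> List[str]:
    """Build the stress-ng command line from the test case.

    --brk 0 --stack 0 come from the test case; the timeout is an assumption.
    """
    return ["stress-ng", "--brk", "0", "--stack", "0", "--timeout", timeout]


def main() -> None:
    # Refuse to run if no zram swap is active, since the test would then
    # not exercise the zram locking at all.
    with open("/proc/swaps") as f:
        if not zram_swap_devices(f.read()):
            raise SystemExit("no zram swap active; is zram-config installed?")
    subprocess.run(stress_command(), check=False)
```

Call main() inside the test VM. On an unfixed kernel the expectation
from the test case is a hang after 1-2 hours, so the script never
returns; on a fixed kernel stress-ng should exit once the timeout
expires.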
+
My main server has been running into hard lockups about once a week ever
since I switched to the 4.15 Ubuntu 18.04 kernel.
When this happens, nothing is printed to the console, it's effectively
stuck showing a login prompt. The system is running with panic=1 on the
cmdline but isn't rebooting so the kernel isn't even processing this as
a kernel panic.
-
- As this felt like a potential hardware issue, I had my hosting provider give
me a completely different system, different motherboard, different CPU,
different RAM and different storage, I installed that system on 18.04 and moved
my data over, a week later, I hit the issue again.
+ As this felt like a potential hardware issue, I had my hosting provider
+ give me a completely different system, different motherboard, different
+ CPU, different RAM and different storage, I installed that system on
+ 18.04 and moved my data over, a week later, I hit the issue again.
We've since also had a LXD user reporting similar symptoms here also on
varying hardware:
- https://github.com/lxc/lxd/issues/5197
+ https://github.com/lxc/lxd/issues/5197
-
- My system doesn't have a lot of memory pressure with about 50% of free memory:
+ My system doesn't have a lot of memory pressure with about 50% of free
+ memory:
root@vorash:~# free -m
-              total        used        free      shared  buff/cache   available
+              total        used        free      shared  buff/cache   available
Mem:           31819       17574         402         513       13842       13292
Swap:          15909        2687       13222
I will now try to increase console logging as much as possible on the
system in the hopes that next time it hangs we can get a better idea of
what happened but I'm not too hopeful given the complete silence on the
console when this occurs.
System is currently on:
- Linux vorash 4.15.0-36-generic #39-Ubuntu SMP Mon Sep 24 16:19:09 UTC 2018
x86_64 x86_64 x86_64 GNU/Linux
+ Linux vorash 4.15.0-36-generic #39-Ubuntu SMP Mon Sep 24 16:19:09 UTC 2018
x86_64 x86_64 x86_64 GNU/Linux
But I've seen this since the GA kernel on 4.15 so it's not a recent
regression.
- ---
+ ---
ProblemType: Bug
AlsaDevices:
- total 0
- crw-rw---- 1 root audio 116, 1 Oct 23 16:12 seq
- crw-rw---- 1 root audio 116, 33 Oct 23 16:12 timer
+ total 0
+ crw-rw---- 1 root audio 116, 1 Oct 23 16:12 seq
+ crw-rw---- 1 root audio 116, 33 Oct 23 16:12 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
ApportVersion: 2.20.9-0ubuntu7.4
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord':
'arecord'
AudioDevicesInUse:
- Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with
exit code 1: Cannot stat file /proc/22822/fd/10: Permission denied
- Cannot stat file /proc/22831/fd/10: Permission denied
+ Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with
exit code 1: Cannot stat file /proc/22822/fd/10: Permission denied
+ Cannot stat file /proc/22831/fd/10: Permission denied
DistroRelease: Ubuntu 18.04
HibernationDevice:
- RESUME=none
- CRYPTSETUP=n
+ RESUME=none
+ CRYPTSETUP=n
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig'
Lsusb:
- Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
- Bus 001 Device 002: ID 046b:ff10 American Megatrends, Inc. Virtual Keyboard
and Mouse
- Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
+ Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
+ Bus 001 Device 002: ID 046b:ff10 American Megatrends, Inc. Virtual Keyboard
and Mouse
+ Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
MachineType: Intel Corporation S1200SP
NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
Package: linux (not installed)
PciMultimedia:
-
+
ProcEnviron:
- TERM=xterm
- PATH=(custom, no user)
- XDG_RUNTIME_DIR=<set>
- LANG=en_US.UTF-8
- SHELL=/bin/bash
+ TERM=xterm
+ PATH=(custom, no user)
+ XDG_RUNTIME_DIR=<set>
+ LANG=en_US.UTF-8
+ SHELL=/bin/bash
ProcFB: 0 mgadrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.15.0-38-generic
root=UUID=575c878a-0be6-4806-9c83-28f67aedea65 ro biosdevname=0 net.ifnames=0
panic=1 verbose console=tty0 console=ttyS0,115200n8
ProcVersionSignature: Ubuntu 4.15.0-38.41-generic 4.15.18
RelatedPackageVersions:
- linux-restricted-modules-4.15.0-38-generic N/A
- linux-backports-modules-4.15.0-38-generic N/A
- linux-firmware 1.173.1
+ linux-restricted-modules-4.15.0-38-generic N/A
+ linux-backports-modules-4.15.0-38-generic N/A
+ linux-firmware 1.173.1
RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill'
Tags: bionic
Uname: Linux 4.15.0-38-generic x86_64
UnreportableReason: This report is about a package that is not installed.
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups:
-
+
_MarkForUpload: False
dmi.bios.date: 01/25/2018
dmi.bios.vendor: Intel Corporation
dmi.bios.version: S1200SP.86B.03.01.1029.012520180838
dmi.board.asset.tag: Base Board Asset Tag
dmi.board.name: S1200SP
dmi.board.vendor: Intel Corporation
dmi.board.version: H57532-271
dmi.chassis.asset.tag: ....................
dmi.chassis.type: 23
dmi.chassis.vendor: ...............................
dmi.chassis.version: ..................
dmi.modalias:
dmi:bvnIntelCorporation:bvrS1200SP.86B.03.01.1029.012520180838:bd01/25/2018:svnIntelCorporation:pnS1200SP:pvr....................:rvnIntelCorporation:rnS1200SP:rvrH57532-271:cvn...............................:ct23:cvr..................:
dmi.product.family: Family
dmi.product.name: S1200SP
dmi.product.version: ....................
dmi.sys.vendor: Intel Corporation
--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1799497
Title:
4.15 kernel hard lockup about once a week
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1799497/+subscriptions
--
ubuntu-bugs mailing list
[email protected]
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs