Re: [Kernel-packages] [Bug 1799497] Re: 4.15 kernel hard lockup about once a week
It sounds like what I was getting.

On Thu, Jan 16, 2020 at 11:05 PM Colin Ian King <1799...@bugs.launchpad.net> wrote:
> After quite a bit of experimentation I found that I can reproduce the bug
> if I have zram *and* also swap on the filesystem enabled while exercising
> the brk stressors and aiol (to cause lots of I/O). Eventually the system
> grinds to a halt, we lose interactivity and we eventually get lockups as
> follows:
>
> [ 2012.040006] watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [stress-ng-brk:1632]
> [ 2012.040922] Modules linked in: zram(E) kvm_intel(E) kvm(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) pcbc(E) aesni_intel(E) aes_x86_64(E) crypto_simd(E) glue_helper(E) cryptd(E) psmouse(E) input_leds(E) floppy(E) virtio_scsi(E) serio_raw(E) i2c_piix4(E) mac_hid(E) pata_acpi(E) qemu_fw_cfg(E) 9pnet_virtio(E) 9p(E) 9pnet(E) fscache(E)
> [ 2012.044655] CPU: 2 PID: 1632 Comm: stress-ng-brk Tainted: G EL 4.15.18 #1
> [ 2012.045581] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
> [ 2012.046555] RIP: 0010:__raw_callee_save___pv_queued_spin_unlock+0x10/0x17
> [ 2012.047340] RSP: 0018:b73382083718 EFLAGS: 0246 ORIG_RAX: ff11
> [ 2012.048238] RAX: 0001 RBX: RCX: 0002
> [ 2012.049078] RDX: RSI: 9d327c2f6918 RDI: a3269978
> [ 2012.049909] RBP: b73382083720 R08: 9d327c2f6918 R09: 9d327c0a5328
> [ 2012.050746] R10: 9d327c1e2310 R11: 9d327c1e2328 R12: 9d327c2f6800
> [ 2012.051574] R13: 9d327c1e2328 R14: 9d327c1e2310 R15: 9d327c1e2200
> [ 2012.052436] FS: 7f89f2ccd740() GS:9d327f28() knlGS:
> [ 2012.053382] CS: 0010 DS: ES: CR0: 80050033
> [ 2012.054058] CR2: 7f1350a8dd90 CR3: 311a4004 CR4: 00160ee0
> [ 2012.054889] Call Trace:
> [ 2012.055192] get_swap_pages+0x193/0x360
> [ 2012.055652] get_swap_page+0x13f/0x1e0
> [ 2012.056123] add_to_swap+0x14/0x70
> [ 2012.056530] shrink_page_list+0x81d/0xbc0
> [ 2012.057013] shrink_inactive_list+0x242/0x590
> [ 2012.057523] shrink_node_memcg+0x364/0x770
> [ 2012.058012] shrink_node+0xf7/0x300
> [ 2012.058432] ? shrink_node+0xf7/0x300
> [ 2012.058863] do_try_to_free_pages+0xc9/0x330
> [ 2012.059368] try_to_free_pages+0xee/0x1b0
> [ 2012.059842] __alloc_pages_slowpath+0x3fc/0xe00
> [ 2012.060424] __alloc_pages_nodemask+0x29a/0x2c0
> [ 2012.060963] alloc_pages_vma+0x88/0x1f0
> [ 2012.061414] __handle_mm_fault+0x8b7/0x12e0
> [ 2012.061909] handle_mm_fault+0xb1/0x210
> [ 2012.062375] __do_page_fault+0x281/0x4b0
> [ 2012.062848] do_page_fault+0x2e/0xe0
> [ 2012.063274] ? async_page_fault+0x2f/0x50
> [ 2012.063751] do_async_page_fault+0x51/0x80
> [ 2012.064262] async_page_fault+0x45/0x50
> [ 2012.064719] RIP: 0033:0x55ec1997bd0a
> [ 2012.065147] RSP: 002b:7ffeacd21600 EFLAGS: 00010246
> [ 2012.065754] RAX: 55ec28601000 RBX: 0005 RCX: 7f89f2de956b
> [ 2012.066580] RDX: 55ec28601000 RSI: 7ffeacd216d0 RDI: 55ec28602000
> [ 2012.067410] RBP: 7ffeacd216c0 R08: R09: 7f89f3d0c2f0
> [ 2012.068290] R10: R11: 0246 R12:
> [ 2012.069129] R13: 0002 R14: 0001 R15: 7ffeacd216d0
> [ 2012.069965] Code: 50 41 51 41 52 41 53 e8 3b 05 00 00 41 5b 41 5a 41 59 41 58 5f 5e 5a 59 5d c3 90 55 48 89 e5 52 b8 01 00 00 00 31 d2 f0 0f b0 17 <3c> 01 75 03 5a 5d c3 56 0f b6 f0 e8 bc ff ff ff 5e 5a 5d c3 0f
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1799497
>
> Title:
>   4.15 kernel hard lockup about once a week
>
> Status in linux package in Ubuntu:
>   Incomplete
> Status in zram-config package in Ubuntu:
>   Incomplete
> Status in linux source package in Bionic:
>   Confirmed
> Status in zram-config source package in Bionic:
>   Confirmed
>
> Bug description:
>   My main server has been running into hard lockups about once a week
>   ever since I switched to the 4.15 Ubuntu 18.04 kernel.
>
>   When this happens, nothing is printed to the console, it's effectively
>   stuck showing a login prompt. The system is running with panic=1 on
>   the cmdline but isn't rebooting so the kernel isn't even processing
>   this as a kernel panic.
>
>   As this felt like a potential hardware issue, I had my hosting provider
>   give me a completely different system, different motherboard, different
>   CPU, different RAM and different storage, I installed that system on 18.04
>   and moved my data over, a week later, I hit the issue again.
>
>   We've since also had a LXD user reporting similar symptoms here also on
>   varying hardware:
>   https://github.com/lxc/lxd/issues/5197
>
>   My system doesn't have a lot of memory pressure with about 50% of free
>   memory:
>
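Colin's reproduction depends on zram swap and a second, non-zram swap area being active at the same time. The following is a minimal sketch, not part of the original thread, that checks whether a host is in that configuration by reading `/proc/swaps`. It assumes that "swap on the filesystem" means any active swap area that is not a `/dev/zram*` device, and the helper names are hypothetical. It only detects the swap layout; it cannot detect the lockup itself.

```python
"""Report whether zram swap and some other swap area are active together.

Hypothetical helper, not from the bug thread: it only checks for the swap
layout that reproduced LP: #1799497; it cannot detect the lockup itself.
"""
import os
from typing import List, Tuple


def parse_proc_swaps(text: str) -> List[Tuple[str, str]]:
    """Return (filename, type) pairs from /proc/swaps-style text."""
    entries: List[Tuple[str, str]] = []
    for line in text.splitlines()[1:]:  # skip the "Filename Type ..." header
        fields = line.split()
        if len(fields) >= 2:
            entries.append((fields[0], fields[1]))
    return entries


def is_zram(filename: str) -> bool:
    """Assumption: zram swap devices are named /dev/zram<N>."""
    return filename.startswith("/dev/zram")


def mixes_zram_and_other_swap(text: str) -> bool:
    """True if at least one zram device and one non-zram swap area are active."""
    names = [name for name, _ in parse_proc_swaps(text)]
    has_zram = any(is_zram(n) for n in names)
    has_other = any(not is_zram(n) for n in names)
    return has_zram and has_other


if __name__ == "__main__" and os.path.exists("/proc/swaps"):
    with open("/proc/swaps") as swaps:
        print("zram + other swap active:", mixes_zram_and_other_swap(swaps.read()))
```

Run as a script on the host, it reads `/proc/swaps` directly and prints a single boolean; the parsing functions can also be fed captured `/proc/swaps` output from another machine.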
Re: [Kernel-packages] [Bug 1799497] Re: 4.15 kernel hard lockup about once a week
Hi. I had to remove the zram config from my production servers long ago, and I haven't had the issue since. I was using LXD containers heavily on those hosts, with different kinds of workloads, but I don't have any other setup at the moment.

On Fri, Jan 10, 2020 at 12:11 AM Colin Ian King <1799...@bugs.launchpad.net> wrote:
> Can reproduce this with stress-ng exercising high memory pressure scenario
> using:
>   stress-ng --brk 0 -v --aiol 0
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1799497
>
> Title:
>   4.15 kernel hard lockup about once a week
>
> Status in linux package in Ubuntu:
>   Incomplete
> Status in zram-config package in Ubuntu:
>   Incomplete
> Status in linux source package in Bionic:
>   Confirmed
> Status in zram-config source package in Bionic:
>   Confirmed
>
> Bug description:
>   My main server has been running into hard lockups about once a week
>   ever since I switched to the 4.15 Ubuntu 18.04 kernel.
>
>   When this happens, nothing is printed to the console, it's effectively
>   stuck showing a login prompt. The system is running with panic=1 on
>   the cmdline but isn't rebooting so the kernel isn't even processing
>   this as a kernel panic.
>
>   As this felt like a potential hardware issue, I had my hosting provider
>   give me a completely different system, different motherboard, different
>   CPU, different RAM and different storage, I installed that system on 18.04
>   and moved my data over, a week later, I hit the issue again.
>
>   We've since also had a LXD user reporting similar symptoms here also on
>   varying hardware:
>   https://github.com/lxc/lxd/issues/5197
>
>   My system doesn't have a lot of memory pressure with about 50% of free
>   memory:
>
>   root@vorash:~# free -m
>                 total        used        free      shared  buff/cache   available
>   Mem:          31819       17574         402         513       13842       13292
>   Swap:         15909        2687       13222
>
>   I will now try to increase console logging as much as possible on the
>   system in the hopes that next time it hangs we can get a better idea
>   of what happened but I'm not too hopeful given the complete silence on
>   the console when this occurs.
>
>   System is currently on:
>   Linux vorash 4.15.0-36-generic #39-Ubuntu SMP Mon Sep 24 16:19:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
>
>   But I've seen this since the GA kernel on 4.15 so it's not a recent
>   regression.
>   ---
>   ProblemType: Bug
>   AlsaDevices:
>    total 0
>    crw-rw 1 root audio 116, 1 Oct 23 16:12 seq
>    crw-rw 1 root audio 116, 33 Oct 23 16:12 timer
>   AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
>   ApportVersion: 2.20.9-0ubuntu7.4
>   Architecture: amd64
>   ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord'
>   AudioDevicesInUse:
>    Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1: Cannot stat file /proc/22822/fd/10: Permission denied
>    Cannot stat file /proc/22831/fd/10: Permission denied
>   DistroRelease: Ubuntu 18.04
>   HibernationDevice:
>    RESUME=none
>    CRYPTSETUP=n
>   IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig'
>   Lsusb:
>    Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
>    Bus 001 Device 002: ID 046b:ff10 American Megatrends, Inc. Virtual Keyboard and Mouse
>    Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
>   MachineType: Intel Corporation S1200SP
>   NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
>   Package: linux (not installed)
>   PciMultimedia:
>
>   ProcEnviron:
>    TERM=xterm
>    PATH=(custom, no user)
>    XDG_RUNTIME_DIR=
>    LANG=en_US.UTF-8
>    SHELL=/bin/bash
>   ProcFB: 0 mgadrmfb
>   ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.15.0-38-generic root=UUID=575c878a-0be6-4806-9c83-28f67aedea65 ro biosdevname=0 net.ifnames=0 panic=1 verbose console=tty0 console=ttyS0,115200n8
>   ProcVersionSignature: Ubuntu 4.15.0-38.41-generic 4.15.18
>   RelatedPackageVersions:
>    linux-restricted-modules-4.15.0-38-generic N/A
>    linux-backports-modules-4.15.0-38-generic N/A
>    linux-firmware 1.173.1
>   RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill'
>   Tags: bionic
>   Uname: Linux 4.15.0-38-generic x86_64
>   UnreportableReason: This report is about a package that is not installed.
>   UpgradeStatus: No upgrade log present (probably fresh install)
>   UserGroups:
>
>   _MarkForUpload: False
>   dmi.bios.date: 01/25/2018
>   dmi.bios.vendor: Intel Corporation
>   dmi.bios.version: S1200SP.86B.03.01.1029.012520180838
>   dmi.board.asset.tag: Base Board Asset Tag
>   dmi.board.name: S1200SP
>   dmi.board.vendor: Intel Corporation
>   dmi.board.version: H57532-271
>   dmi.chassis.asset.tag:
>   dmi.chassis.type: