Public bug reported:

Hi!

Over the last 6 months we've upgraded our hypervisor compute hosts from the Ubuntu Bionic 4.15.* kernel to the Ubuntu Bionic HWE 5.4 kernel. This month we noticed that several nodes failed due to bugs in cgroups. The trace was different almost every time, but it all revolves around cgroups: either null pointer failures, or a panic caught by the BUG_ON() macro. It looked as if some cgroup no longer existed but something still tried to access it, causing a kernel panic. Please find the logs attached.

3 of 4 cases happened after a VM shutdown. We tried to spawn lots of VMs, load them, and shut them down, but didn't manage to reproduce the behavior. Actually, every case is somewhat different: the kernel patch versions vary (5.4.0-42 to 5.4.0-66), and so does the uptime (from 1 day to about half a year). There are also lots of hosts with several months of uptime and no issues. Also, on 4.15 we never saw this behavior at all.

That's quite disturbing, as I don't want dozens of VMs to crash (due to a host outage) at random times for some vague reason...

I didn't manage to find any related bugs on the bug tracker, so I'm creating this one. I wonder if anybody in the community has come across something like this. Could somebody advise how to debug further, or where else to report or look for a similar case?

** Affects: linux-hwe-5.4 (Ubuntu)
     Importance: Undecided
         Status: New

** Tags: cgroups

** Attachment added: "crash-030321.log"
   https://bugs.launchpad.net/bugs/1921355/+attachment/5480836/+files/crash-030321.log

-- 
https://bugs.launchpad.net/bugs/1921355

Title:
  cgroups related kernel panics