Public bug reported:

Hi!

Over the last 6 months we've upgraded our hypervisor compute hosts
from the Ubuntu Bionic 4.15.* kernel to the Ubuntu Bionic HWE 5.4
kernel.

This month we noticed that several nodes failed due to bugs in cgroups.
The trace was different almost every time, but it always revolves around
cgroups: either a NULL pointer dereference or a panic triggered by the
BUG_ON() macro. It looks like some cgroup no longer existed but something
still tried to access it, causing a kernel panic.
Please find the logs attached.

3 of the 4 cases happened after a VM shutdown. We tried to spawn lots of VMs,
load them, and shut them down, but didn't manage to reproduce the behavior.
Every case is somewhat different: kernel patch versions range from 5.4.0-42
to 5.4.0-66, and uptimes vary from 1 day to about half a year. There are also
lots of hosts with several months of uptime and no issues. On 4.15 we never
saw this behavior at all.
That's quite disturbing, as I don't want dozens of VMs to crash (due to a host
outage) at random times for some unclear reason.
I didn't manage to find any related bugs on the bug tracker, so I'm creating
this one.

I wonder if anybody in the community has come across something like this.
Could somebody advise how to debug this further, or where else to report it
or look for a similar case?

** Affects: linux-hwe-5.4 (Ubuntu)
     Importance: Undecided
         Status: New


** Tags: cgroups

** Attachment added: "crash-030321.log"
   https://bugs.launchpad.net/bugs/1921355/+attachment/5480836/+files/crash-030321.log

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1921355

Title:
  cgroups related kernel panics

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-hwe-5.4/+bug/1921355/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
