Public bug reported:

Shutting down a Libvirt VM with vEGM enabled triggers a kernel NULL
pointer dereference:

[  616.355597] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[  616.601430] pc : __rb_erase_color+0xc4/0x2a8
[  616.605878] lr : interval_tree_remove+0x184/0x2e8
[  616.696464]  interval_tree_remove+0x184/0x2e8
[  616.701185]  unregister_pfn_address_space+0x4c/0xc0
[  616.706439]  nvgrace_egm_release+0x98/0xd8 [nvgrace_egm]

When booting and shutting down a raw qemu VM with vEGM, we can observe
an open() syscall on the EGM device on boot, a subsequent mmap(), and
then a close() on VM shutdown.

With libvirt + qemu, there is an additional close() on VM shutdown.
The __rb_parent_color field of the egm_region->pfn_address_space.node
struct isn't cleared after the first unregister, so when the second
close() happens, unregister_pfn_address_space() is called again on the
same struct. The interval_tree_remove() code checks __rb_parent_color,
assumes the node is still in the tree (because the field is non-zero),
and then tries to traverse the tree using the stale parent pointer,
resulting in a NULL pointer dereference.
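
To illustrate the failure mode, here is a minimal userspace sketch of
one possible guard: make removal idempotent by tracking whether the
node is still linked and clearing its pointers on the first removal.
This is not the nvgrace_egm code and not necessarily the fix that
should be applied. The struct and function names (node, registry,
registry_remove, simulate_closes) are hypothetical, and a doubly linked
list stands in for the kernel interval tree.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for an intrusive container node (not the kernel rb_node). */
struct node {
    struct node *prev, *next;
    int linked;                 /* hypothetical "still in the container" marker */
};

struct registry {
    struct node head;           /* sentinel of a circular doubly linked list */
    int count;                  /* number of registered nodes */
};

static void registry_init(struct registry *r)
{
    r->head.prev = r->head.next = &r->head;
    r->head.linked = 1;
    r->count = 0;
}

static void node_init(struct node *n)
{
    n->prev = n->next = NULL;
    n->linked = 0;
}

static void registry_add(struct registry *r, struct node *n)
{
    n->next = r->head.next;
    n->prev = &r->head;
    r->head.next->prev = n;
    r->head.next = n;
    n->linked = 1;
    r->count++;
}

/* Returns 1 if the node was removed, 0 if it was already unlinked. */
static int registry_remove(struct registry *r, struct node *n)
{
    if (!n->linked)
        return 0;               /* second close(): nothing left to unlink */
    n->prev->next = n->next;
    n->next->prev = n->prev;
    node_init(n);               /* clear stale pointers so a repeat call is harmless */
    r->count--;
    return 1;
}

/* Registers one region, then runs `closes` release calls against it. */
int simulate_closes(int closes)
{
    struct registry r;
    struct node region;
    int removed = 0;

    registry_init(&r);
    node_init(&region);
    registry_add(&r, &region);
    for (int i = 0; i < closes; i++)
        removed += registry_remove(&r, &region);
    assert(closes == 0 || r.count == 0);
    return removed;
}
```

With this guard, simulate_closes(2) performs only one real removal; the
second call returns early instead of following stale pointers. In the
kernel, a comparable effect might be achieved by clearing the node
after removal or by tracking registration state in the driver, but
which approach fits nvgrace_egm is left open here.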

** Affects: linux-nvidia (Ubuntu)
     Importance: Undecided
         Status: New

** Affects: linux-nvidia-6.14 (Ubuntu)
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2131582

Title:
  NULL pointer dereference during vEGM Libvirt VM lifecycle

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-nvidia/+bug/2131582/+subscriptions


-- 
ubuntu-bugs mailing list
[email protected]
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs