** Also affects: linux-azure (Ubuntu Groovy)
Importance: Undecided
Status: New
** Also affects: linux-azure (Ubuntu Focal)
Importance: Undecided
Status: New
** Changed in: linux-azure (Ubuntu Focal)
Status: New => In Progress
** Changed in: linux-azure (Ubuntu Groovy)
Status: New => Fix Committed
** Changed in: linux-azure (Ubuntu Groovy)
Status: Fix Committed => In Progress
** Description changed:
+ [Impact]
+
Failure messages are logged after resume from hibernation in an NV6 (GPU
passthrough size) VM in Azure:
[ 1432.153730] hv_pci 47505500-0001-0000-3130-444531334632: hv_irq_unmask() failed: 0x5
[ 1432.167910] hv_pci 47505500-0001-0000-3130-444531334632: hv_irq_unmask() failed: 0x5
This happens with the latest stable release of the linux-azure kernel
(5.4.0-1023.23) and with the latest mainline Linux kernel.
- How reproducible:
+ [Test Case]
+
+ How reproducible:
100%
Steps to Reproduce:
1. Start a Standard_NV6 VM in Azure and enable hibernation properly (please
refer to
https://bugs.launchpad.net/ubuntu/+source/linux-azure/+bug/1880032/comments/14 )
E.g. here I create a Generation-1 Ubuntu 20.04 Standard NV6_Promo (6
vcpus, 56 GiB memory) VM in East US 2.
2. Make sure the in-kernel open-source nouveau driver is loaded, or
blacklist the nouveau driver and install the official Nvidia GPU driver
(please follow
https://docs.microsoft.com/en-us/azure/virtual-machines/linux/n-series-driver-setup :
"Install GRID drivers on NV or NVv3-series VMs" -- the most important
step is to run "./NVIDIA-Linux-x86_64-grid.run".)
3. Trigger hibernation from the serial console
# systemctl hibernate
4. After hibernation finishes, start the VM and check dmesg
# dmesg|grep fail
Actual results:
[ 1432.153730] hv_pci 47505500-0001-0000-3130-444531334632: hv_irq_unmask() failed: 0x5
[ 1432.167910] hv_pci 47505500-0001-0000-3130-444531334632: hv_irq_unmask() failed: 0x5
And /proc/interrupts shows that the GPU interrupts are no longer
happening.
Expected results:
No failure messages in the log, and GPU interrupts should still be delivered after hibernation.
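
The dmesg check in step 4 can be narrowed to the specific message from this report. The following is only a minimal sketch: count_unmask_failures is a hypothetical helper name (not an existing tool), and it assumes the kernel log prints each message on a single line, as in "hv_irq_unmask() failed: 0x5".

```shell
#!/bin/sh
# Hypothetical helper (not part of any package): count kernel log lines
# that contain the "hv_irq_unmask() failed" message.
# Reads log text on stdin and prints the number of matching lines.
count_unmask_failures() {
    # -F: match the string literally; -c: print the count of matching lines.
    # grep exits non-zero when nothing matches, so "|| true" keeps the
    # helper usable under "set -e" while still printing 0.
    grep -cF 'hv_irq_unmask() failed' || true
}

# Example usage after resume (may require root to read the kernel log):
#   dmesg | count_unmask_failures
```

A result of 0 after resume would match the expected results above; any other value matches the actual results.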
+ [Regression Potential]
+
+ The fix touches the pci-hyperv driver and could affect the Hyper-V guest
+ drivers. However, the change is focused on the execution path used for
+ hibernation, which is still not officially supported.
+
+ [Other info]
BUG FIX:
I made a fix here: https://lkml.org/lkml/2020/9/4/1268.
Without the patch, we see the error "hv_pci
47505500-0001-0000-3130-444531334632: hv_irq_unmask() failed: 0x5"
during hibernation when the VM has the Nvidia GPU driver loaded, and
after hibernation the GPU driver can no longer receive any MSI/MSI-X
interrupts when we check /proc/interrupts.
With the patch, we should no longer see the error, and the GPU driver
should still receive interrupts after hibernation.
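
One way to check whether the GPU driver still receives interrupts is to sum its per-CPU counters in /proc/interrupts and compare two readings taken a few seconds apart. This is a rough sketch only: sum_irq_counts is a hypothetical helper name, and the pattern to match (for example "nvidia" or "nouveau") is an assumption that depends on which driver is loaded.

```shell
#!/bin/sh
# Hypothetical helper: sum the per-CPU interrupt counts of every line in
# /proc/interrupts-style text (read from stdin) that contains the string
# given as $1. Prints the total as a single number.
sum_irq_counts() {
    awk -v pat="$1" '
        index($0, pat) {
            # Field 1 is the IRQ label ("24:"); the per-CPU counters follow
            # and end at the first non-numeric field (the chip name).
            for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++)
                total += $i
        }
        END { print total + 0 }
    '
}

# Example usage (the pattern "nvidia" is an assumption):
#   before=$(sum_irq_counts nvidia < /proc/interrupts)
#   sleep 5
#   after=$(sum_irq_counts nvidia < /proc/interrupts)
#   echo "GPU interrupts in 5s: $((after - before))"
```

If the difference stays at 0 while the GPU is in use, that would be consistent with the behavior described without the patch.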
--
https://bugs.launchpad.net/bugs/1894893
Title:
[linux-azure][hibernation] GPU device no longer working after resume
from hibernation in NV6 VM size