Hi,

This question came up while I was investigating a Libvirt bug [1], where a user removes VFs from the host while Libvirt domains are using them, leaving Libvirt in an inconsistent state. I'm trying to alleviate the effects of this in Libvirt (see [2] if curious), but QEMU is throwing some messages in the terminal that, although they appear to be benign, might be a symptom of something that should be fixed.

In a Power 9 server running a Mellanox MT28800 SR-IOV network card I have the following IOMMU settings, where the physical card is at Group 0 and all the VFs are allocated from Group 12 onwards:

IOMMU Group 0 0000:01:00.0 Infiniband controller [0207]: Mellanox Technologies MT28800 Family [ConnectX-5 Ex] [15b3:1019]
(...)
IOMMU Group 12 0000:01:00.2 Infiniband controller [0207]: Mellanox Technologies MT27800 Family [ConnectX-5 Virtual Function] [15b3:1018]
IOMMU Group 13 0000:01:00.3 Infiniband controller [0207]: Mellanox Technologies MT27800 Family [ConnectX-5 Virtual Function] [15b3:1018]
(...)

Creating a guest with the Group 12 VF and trying to remove the VF from the host via

echo 0 > /sys/bus/pci/devices/0000\:01\:00.0/sriov_numvfs

makes the guest remove the VF card, but throws a warning/error message in the QEMU log:

"qemu-system-ppc64: vfio: Cannot reset device 0000:01:00.2, depends on group 0 which is not owned."

I found this message confusing because the VF was occupying IOMMU Group 12, but the message claims that the reset wasn't possible because Group 0 wasn't owned by the process.

Digging a bit, the hotunplug is fired up via the poweroff state of the card triggering pSeries internals, which then reach spapr_pci_unplug() in hw/ppc/spapr_pci.c. The body of the function reads:

-------
    /* some version guests do not wait for completion of a device
     * cleanup (generally done asynchronously by the kernel) before
     * signaling to QEMU that the device is safe, but instead sleep
     * for some 'safe' period of time. unfortunately on a busy host
     * this sleep isn't guaranteed to be long enough, resulting in
     * bad things like IRQ lines being left asserted during final
     * device removal. to deal with this we call reset just prior
     * to finalizing the device, which will put the device back into
     * an 'idle' state, as the device cleanup code expects.
     */
    pci_device_reset(PCI_DEVICE(plugged_dev));
-------

My first question is right at this point: do we need a PCI reset for a VF removal? I am not sure whether asserted IRQ lines are a concern for a device that the kernel is going to remove anyway.

Going further, to the origin of the warning message, we get to vfio_pci_hot_reset() in hw/vfio/pci.c. The VFIO_DEVICE_GET_PCI_HOT_RESET_INFO ioctl() returns all VFs of the card, including the physical function, in the vfio_pci_hot_reset_info struct. Then, further down, where it verifies whether all IOMMU groups required for the reset belong to the process, it fails to reset the VF because QEMU does not have Group 0 ownership:

-------
    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO, info);
    if (ret) {
        ret = -errno;
        error_report("vfio: hot reset info failed: %m");
        goto out_single;
    }
(...)
        QLIST_FOREACH(group, &vfio_group_list, next) {
            if (group->groupid == devices[i].group_id) {
                break;
            }
        }

        if (!group) {
            if (!vdev->has_pm_reset) {
                error_report("vfio: Cannot reset device %s, "
                             "depends on group %d which is not owned.",
                             vdev->vbasedev.name, devices[i].group_id);
            }
            ret = -EPERM;
            goto out;
        }
-------

This message is not clear to me because I know that the VF was in Group 12, but apparently the code demands ownership of all IOMMU groups related to the card before allowing the reset.

The second question: is this intended? If not, then someone is behaving badly (perhaps the card driver, mlx5_core) and reporting wrong info to that VFIO ioctl(). If this reset behavior is intended, then I might insert code in spapr_pci_unplug() to skip resetting the VF in this particular case to avoid the error message (assuming that we really can live without a reset in this case).

Thanks,

DHB

[1] https://gitlab.com/libvirt/libvirt/-/issues/72
[2] https://www.redhat.com/archives/libvir-list/2021-January/msg00028.html
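
For reference, the ownership check described above can be modelled outside QEMU with a small standalone sketch. It is not QEMU code: `struct dep_device`, `group_is_owned()` and `first_unowned_group()` are hypothetical names made up for illustration, and the order of the device list is an assumption. It only shows why a VF in Group 12 ends up being reported against Group 0 when the dependency list also contains the physical function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one entry of the hot reset dependency list. */
struct dep_device {
    const char *name;   /* PCI address, e.g. "0000:01:00.2" */
    int group_id;       /* IOMMU group the device belongs to */
};

/* Returns 1 if group_id appears in owned[], 0 otherwise. */
int group_is_owned(int group_id, const int *owned, size_t n_owned)
{
    for (size_t i = 0; i < n_owned; i++) {
        if (owned[i] == group_id) {
            return 1;
        }
    }
    return 0;
}

/*
 * Walks the dependency list and returns the first IOMMU group that the
 * process does not own, or -1 if every dependent group is owned.
 * A reset would only be allowed in the -1 case.
 */
int first_unowned_group(const struct dep_device *devs, size_t n_devs,
                        const int *owned, size_t n_owned)
{
    assert(devs != NULL || n_devs == 0);

    for (size_t i = 0; i < n_devs; i++) {
        if (!group_is_owned(devs[i].group_id, owned, n_owned)) {
            return devs[i].group_id;
        }
    }
    return -1;
}
```

With a dependency list of 0000:01:00.0 (Group 0), 0000:01:00.2 (Group 12) and 0000:01:00.3 (Group 13), and a process that owns only Group 12, first_unowned_group() returns 0, which matches the group named in the QEMU message.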