On 8/21/20 4:43 PM, Alex Williamson wrote:

When a device is added to a live group there's a risk that it will be
auto-probed by a host driver.  If that occurs, isolation of the
group has been violated and vfio code will BUG_ON to halt the system.
The warning is effectively just a notification that we're in a risky
situation where the wrong driver binding could escalate the issue.

There is a ToDo in the code at that point to prevent driver probing,
but ISTR at that time we may not have had a good way to do that.  I'm
not sure if we do now either.  We have the driver_override field for
the device that we could write into, but at this point we're looking at
a generic device, we don't even know that it's a PCI device.  We could
determine that, but even then it's not clear that the kernel should set
the policy to define that it should be bound to the vfio-pci driver,
potentially versus other vfio drivers that could legitimately manage
the device safely.  If we write a random string to the driver_override
field we could prevent automatic binding to any driver, but then we put
a barrier to making use of the device, which seems like it has support
issues as well.  I'm not sure what the best approach is... that's why
we currently generate a warning and hope it doesn't happen.

Interesting, it definitely seems like there's no easy generic solution
then.

In my use case, the devices that will be hotplugged have a known
vendor+product ID and are already registered with the vfio-pci driver
via /sys/bus/pci/drivers/vfio-pci/new_id. In this case, it
should be safe to write into driver_override, since the user has
already explicitly stated that they wish to use the vfio-pci driver,
right?
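
For concreteness, here is a minimal userspace sketch of the write I have
in mind. It is only an illustration, not the in-kernel change discussed
above: the helper name set_driver_override and the sysfs_root parameter
are my own, and it assumes the standard
/sys/bus/pci/devices/<BDF>/driver_override attribute and root privileges.
Done from userspace after the hotplug event, it may well be too late to
influence the initial auto-probe.

```python
from pathlib import Path


def set_driver_override(bdf: str, driver: str = "vfio-pci",
                        sysfs_root: str = "/sys") -> Path:
    """Write `driver` into the driver_override attribute of one PCI device.

    `bdf` is the PCI address as it appears in sysfs, e.g. "0000:00:04.0".
    `sysfs_root` is a hypothetical knob that exists only so the function
    can be exercised against a fake directory tree.
    """
    dev_dir = Path(sysfs_root) / "bus" / "pci" / "devices" / bdf
    if not dev_dir.is_dir():
        raise FileNotFoundError(f"no such PCI device: {bdf}")
    attr = dev_dir / "driver_override"
    # Once set, the kernel only lets a driver with this exact name bind.
    attr.write_text(driver + "\n")
    return attr


# Example (as root, device address is illustrative):
#   set_driver_override("0000:00:04.0")
```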

On a truly bare metal platform, I don't think this should ever occur in
practice without manually removing and re-scanning devices.  We'd
expect PCIe hotplug to occur on the slot level with isolation to the
downstream port providing that slot.  Without that isolation, or the
increasingly unlikely chance of encountering this with conventional PCI
hotplug, we'd probably hand wave the system as inappropriate for the
task.  Here I think you have a bare metal hypervisor exposing portions
of devices to the "host" in unusual ways that can trigger this and are
expected to be supported.

Sorry, I should have been more clear. I'm encountering the warning when
hotplugging virtual PCI devices (ivshmem) to the guest which accesses
them from its userspace with VFIO - there's no physical PCI device being
passed through.

I think the QEMU pseries/sPAPR platform differs from conventional x86_64
platforms like Q35 in how it handles hotplug. Specifically, all devices
on a given spapr-pci-host-bridge end up in the same IOMMU group, even
for hotplugged slots. This is why it'd be nice to have a solution to
allow VFIO to handle this gracefully, but it certainly doesn't seem
as straightforward as I'd hoped.
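
In case it helps anyone reproduce this, the grouping can be checked from
inside the guest with something like the sketch below. It assumes the
usual sysfs layout (an iommu_group symlink under each PCI device, and a
devices directory under each group); the function name and the
sysfs_root parameter are mine.

```python
from pathlib import Path
from typing import List, Tuple


def iommu_group_members(bdf: str, sysfs_root: str = "/sys") -> Tuple[str, List[str]]:
    """Return (group number, sorted device names) for the group containing `bdf`.

    `sysfs_root` is a hypothetical parameter used only for testing
    against a fake tree.
    """
    link = Path(sysfs_root) / "bus" / "pci" / "devices" / bdf / "iommu_group"
    # The symlink points at <sysfs>/kernel/iommu_groups/<N>.
    group_dir = link.resolve(strict=True)
    members = sorted(p.name for p in (group_dir / "devices").iterdir())
    return group_dir.name, members


# Example (device address is illustrative):
#   group, devices = iommu_group_members("0000:00:04.0")
#   print(group, devices)
```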

Sorry, I don't have a good proposal to
resolve how we should handle group composition changing while the group
is in use... thus we currently just whine about it with a warning.

Thank you for sharing your thoughts. It sounds like in my case where
the device is known to not be registered with any other kernel drivers,
this warning should be fine to ignore in practice, though it'd definitely
be nice to have a way to suppress it in known-safe situations.
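
As a rough sanity check for the "known-safe" case, one could look at
which driver each device in the group is currently bound to. This is a
sketch under my own assumptions: the helper names and the sysfs_root
parameter are hypothetical, and the `allowed` tuple is a placeholder,
not the list the vfio code actually uses. Unbound devices are not
flagged here, although they are exactly the ones that could still be
auto-probed later.

```python
import os
from pathlib import Path
from typing import Dict, Iterable, Optional


def bound_driver(bdf: str, sysfs_root: str = "/sys") -> Optional[str]:
    """Return the name of the driver bound to `bdf`, or None if unbound."""
    link = Path(sysfs_root) / "bus" / "pci" / "devices" / bdf / "driver"
    if not link.is_symlink():
        return None
    # The symlink target ends in the driver name, e.g. .../drivers/vfio-pci
    return os.path.basename(os.readlink(link))


def unexpected_bindings(devices: Iterable[str],
                        allowed: Iterable[str] = ("vfio-pci",),
                        sysfs_root: str = "/sys") -> Dict[str, str]:
    """Map each device bound to a driver outside `allowed` to that driver."""
    allowed_set = set(allowed)
    result = {}
    for bdf in devices:
        drv = bound_driver(bdf, sysfs_root)
        if drv is not None and drv not in allowed_set:
            result[bdf] = drv
    return result
```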

Thanks,

Alex


_______________________________________________
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users
