On Thu, Jan 21, 2021 at 04:09:09PM +, John Levon wrote:
>
> Hi, please take a look. For context, this addition is against the current
> https://github.com/nutanix/libvfio-user/blob/master/docs/vfio-user.rst
> specification.
>
> kvm@ readers: Stefan suggested this may be of interest from a VFIO point of
> view, in case there is any potential cross-over in defining how to wire up
> these
> fds.
After talking to Alex Williamson about including these APIs in the
kernel VFIO ioctls it became clear to me that there is an architectural
difference between classic vfio-pci devices and vfio-user devices:
The current model with kernel VFIO is that the VMM sets up ioeventfds
using ioctl(VFIO_DEVICE_IOEVENTFD). The VMM has device-specific
knowledge that allows it to add ioeventfds for specific hardware
registers.
In vfio-user the VMM has no knowledge of the specific device. Instead
the device tells the VMM how to configure ioeventfds and ioregionfds.
There are a few reasons for this:
1. It is not practical to add knowledge of every vfio-user device
implementation into the VMM. A cleaner solution is for the device to
advertise its desired configuration to the VMM instead.
2. The device implementation may prefer a specific ioeventfd or
ioregionfd configuration to fit its multi-threading architecture. A
multi-threaded (multi-queue) device implementation may want
ioeventfds and ioregionfds to be configured so that the right thread
receives certain MMIO/PIO accesses.
3. The device implementation may want to specify ioregionfd's user_data
value. This is the application-specific value that is passed along
with an MMIO/PIO access.
I think mdev fits in somewhere between these two models. I can imagine
complex mdev devices wanting to control the ioeventfd and ioregionfd
configuration just like vfio-user. But in simple cases it doesn't
matter.
I suggest including John's ioeventfd/ioregionfd work into the kernel
VFIO interface so mdev devices can take advantage of it too.
> Note that this is a new message instead of adding a new region capability to
> VFIO_USER_DEVICE_GET_REGION_INFO: with a new capability, there's no way for
> the
> server to know if ioeventfd/ioregionfd is actually used/requested by the
> client
> (the client would just have to close those fds if it didn't want to use FDs).
> An
> explicit new call is much clearer for this.
>
> The ioregionfd feature itself is at proposal stage, so there's a good chance
> of
> further changes depending on that.
>
> I also have these pending issues so far:
>
> 1) I'm not familiar with CCW bus, so don't know if
> KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY flag makes sense or is supportable in
> vfio-user context
>
> 2) if ioregionfd subsumes all ioeventfd use cases, we can remove the
> distinction
> from this API, and the client caller can translate to ioregionfd or ioeventfd
> as
> needed
>
> 3) is it neccessary for the client to indicate support (e.g. lacking
> ioregionfd
> support) ?
>
> 4) (ioregionfd issue) region_id is 4 bytes, which seems a little awkward from
> the server side: this has to encode both the region ID as well as the offset
> of
> the sub-region within that region. Can this be 64 bits wide?
The user_data field in Elena's ioregionfd patches is 64 bits. Does this
solve the problem?
>
> thanks
> john
>
> (NB: I edited the diff so the new sections are more readable.)
>
> diff --git a/docs/vfio-user.rst b/docs/vfio-user.rst
> index e3adc7a..e7274a2 100644
> --- a/docs/vfio-user.rst
> +++ b/docs/vfio-user.rst
> @@ -161,6 +161,17 @@ in the region info reply of a device-specific region.
> Such regions are reflected
> in ``struct vfio_device_info.num_regions``. Thus, for PCI devices this value
> can
> be equal to, or higher than, VFIO_PCI_NUM_REGIONS.
>
> +Region I/O via file descriptors
> +---
> +
> +For unmapped regions, region I/O from the client is done via
> +VFIO_USER_REGION_READ/WRITE. As an optimization, ioeventfds or ioregionfds
> may
> +be configured for sub-regions of some regions. A client may request
> information
> +on these sub-regions via VFIO_USER_DEVICE_GET_REGION_IO_FDS; by configuring
> the
> +returned file descriptors as ioeventfds or ioregionfds, the server can be
> +directly notified of I/O (for example, by KVM) without taking a trip through
> the
> +client.
> +
> Interrupts
> ^^
> The client uses VFIO_USER_DEVICE_GET_IRQ_INFO messages to query the server
> for
> @@ -293,37 +304,39 @@ Commands
> The following table lists the VFIO message command IDs, and whether the
> message command is sent from the client or the server.
>
> -+--+-+---+
> -| Name | Command | Request Direction |
> -+==+=+===+
> -| VFIO_USER_VERSION| 1 | client -> server |
>