On Fri, May 5, 2023 at 8:49 PM Parav Pandit <[email protected]> wrote:
>
>
>
> > From: Jason Wang <[email protected]>
> > Sent: Thursday, May 4, 2023 11:27 PM
> >
> > So the "single stack" is kind of misleading, you need a dedicated virtio
> > mediation layer which has different code path than a simpler vfio-pci which
> > is
> > completely duplicated with vDPA subsystem.
> Huh. No. It is not duplicated.
> Vfio-pci provides the framework for extensions, rather than just doing simple
> vfio-pci.
I'm not sure how to define "simple" here; do you mean mdev?
> I am not debating here vdpa vs non vdpa yet again.
>
> > And you lose all the advantages of
> > vDPA in this way. The device should not be designed for a single type of
> > software stack; it needs to leave the decision to the hypervisor/cloud
> > vendors.
> >
> It is left to the hypervisor/cloud user to decide to use vdpa or vfio or
> something else.
>
> >
> > > virtio device type (net/blk) and be future compatible with a
> > > single vfio stack using SR-IOV or other scalable device
> > > virtualization technology to map PCI devices to the guest VM.
> > > (as transitional or otherwise)
> > >
> > > Motivation/Background:
> > > ----------------------
> > > The existing virtio transitional PCI device is missing support for PCI
> > > SR-IOV based devices. Currently it does not work beyond the PCI PF, or,
> > > in reality, as a software-emulated device. It has the system-level
> > > limitations cited below:
> > >
> > > [a] PCIe spec citation:
> > > VFs do not support I/O Space and thus VF BARs shall not indicate I/O
> > > Space.
> > >
> > > [b] cpu arch citation:
> > > Intel 64 and IA-32 Architectures Software Developer’s Manual:
> > > The processor’s I/O address space is separate and distinct from the
> > > physical-memory address space. The I/O address space consists of 64K
> > > individually addressable 8-bit I/O ports, numbered 0 through FFFFH.
> > >
> > > [c] PCIe spec citation:
> > > If a bridge implements an I/O address range,...I/O address range will
> > > be aligned to a 4 KB boundary.
> > >
> > > The above use case requirements can be solved by the PCI PF group owner
> > > enabling access to its group member PCI VFs' legacy registers using
> > > an admin virtqueue of the group owner PCI PF.
> > >
> > > Software usage example:
> > > -----------------------
> > > The most common way to use the device and map it to the guest VM is via
> > > the vfio driver framework in the Linux kernel.
> > >
> > > +----------------------+
> > > |pci_dev_id = 0x100X |
> > > +---------------|pci_rev_id = 0x0 |-----+
> > > |vfio device |BAR0 = I/O region | |
> > > | |Other attributes | |
> > > | +----------------------+ |
> > > | |
> > > + +--------------+ +-----------------+ |
> > > | |I/O BAR to AQ | | Other vfio | |
> > > | |rd/wr mapper | | functionalities | |
> > > | +--------------+ +-----------------+ |
> > > | |
> > > +------+-------------------------+-----------+
> > > | |
> >
> >
> > So the mapper here is actually the control path mediation layer which
> > duplicates vDPA.
> >
> Yet again, no. It implements a PCI-level abstraction.
> It does not touch the whole QEMU layer and does not at all get involved in
> the virtio device flow of understanding device reset, device config space,
> cvq, feature bits and more.
I think you miss the fact that, with the help of the general vdpa device,
QEMU can choose not to understand any of what you mention here.
Vhost-vDPA provides a much simpler device abstraction than vfio-pci.
If a cloud vendor wants a tiny/thin hypervisor layer, it can be done
through vDPA for sure.
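To make the comparison concrete, below is a rough sketch of what I understand
the "I/O BAR to AQ rd/wr mapper" boils down to on the hypervisor side. This is
only an illustration: the structure and function names are made up, and the
two pf_aq_* helpers are stubs standing in for "submit a command on the owner
PF's admin virtqueue".

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Opaque handle to the owner PF's admin virtqueue; hypothetical. */
struct pf_admin_queue;

/* Stub: real code would build a legacy register write command and
 * submit it on the PF admin virtqueue, then wait for completion. */
static int pf_aq_legacy_reg_write(struct pf_admin_queue *aq, uint16_t vf_id,
                                  uint8_t offset, const void *buf, uint8_t len)
{
        (void)aq; (void)vf_id; (void)offset; (void)buf; (void)len;
        return 0;
}

/* Stub: real code would fetch the register contents from the device
 * through the PF admin virtqueue. */
static int pf_aq_legacy_reg_read(struct pf_admin_queue *aq, uint16_t vf_id,
                                 uint8_t offset, void *buf, uint8_t len)
{
        (void)aq; (void)vf_id; (void)offset;
        memset(buf, 0, len);
        return 0;
}

/* Trap handler for guest accesses to the emulated legacy I/O BAR of a
 * VF: every port read/write the guest issues is turned into an admin
 * command on the owner PF. This is the control path mediation being
 * discussed here. */
static int legacy_io_bar_access(struct pf_admin_queue *aq, uint16_t vf_id,
                                uint8_t offset, void *buf, uint8_t len,
                                bool is_write)
{
        if (is_write)
                return pf_aq_legacy_reg_write(aq, vf_id, offset, buf, len);
        return pf_aq_legacy_reg_read(aq, vf_id, offset, buf, len);
}

Every register the guest touches this way goes through the hypervisor, which
is why I keep calling it a mediation layer rather than plain passthrough.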
> All of these were discussed in v0; let's not repeat.
>
> >
> > > +----+------------+ +----+------------+
> > > | +-----+ | | PCI VF device A |
> > > | | AQ |-------------+---->+-------------+ |
> > > | +-----+ | | | | legacy regs | |
> > > | PCI PF device | | | +-------------+ |
> > > +-----------------+ | +-----------------+
> > > |
> > > | +----+------------+
> > > | | PCI VF device N |
> > > +---->+-------------+ |
> > > | | legacy regs | |
> > > | +-------------+ |
> > > +-----------------+
> > >
> > > 2. Virtio pci driver to bind to the listed device id and
> > > use it as native device in the host.
> >
> >
> > How can this be done now?
> >
> Currently a PCI VF binds to the virtio driver and, without any vdpa layering,
> virtio net/blk etc. devices are created on top of the virtio PCI VF device.
> Not sure I understood your question.
I meant the current virtio-pci driver can use what you propose here.
>
> > > +\begin{lstlisting}
> > > +struct virtio_admin_cmd_lreg_wr_data {
> > > + u8 offset; /* Starting byte offset of the register(s) to write */
> > > + u8 size; /* Number of bytes to write into the register. */
> > > + u8 register[];
> > > +};
> > > +\end{lstlisting}
> >
> >
> > So this actually implements a transport; I wonder if it would be better
> > (and simpler) to do it on top of the transport vq proposal:
> >
> > https://lists.oasis-open.org/archives/virtio-comment/202208/msg00003.html
> >
> I also wonder why TVQ cannot use AQ.
It can for sure, but whether a single virtqueue type should be used for both
administration and transport is still questionable.
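Just to make the "partial transport" point concrete: a guest write to a legacy
configuration register would end up wrapped roughly like this. A sketch only,
assuming the structure quoted above (the data array is shown as registers[]
since "register" is a C keyword); the opcode value and the submit helper are
made up for illustration.

#include <stdint.h>
#include <string.h>

/* Layout as proposed in the patch quoted above. */
struct virtio_admin_cmd_lreg_wr_data {
        uint8_t offset;      /* starting byte offset of the register(s) */
        uint8_t size;        /* number of bytes to write */
        uint8_t registers[]; /* register data to write */
};

/* Hypothetical opcode and submit stub, for illustration only; real code
 * would place the buffer on the owner PF's admin virtqueue with the VF
 * identified as the group member. */
#define VIRTIO_ADMIN_CMD_LREG_WR 0x1

static int submit_admin_cmd(uint16_t vf_id, uint16_t opcode,
                            const void *data, size_t len)
{
        (void)vf_id; (void)opcode; (void)data; (void)len;
        return 0;
}

/* Forward a 2-byte guest write at legacy config offset @offset of VF
 * @vf_id through the owner PF. */
static int forward_legacy_wr16(uint16_t vf_id, uint8_t offset, uint16_t val)
{
        uint8_t buf[sizeof(struct virtio_admin_cmd_lreg_wr_data) + sizeof(val)];
        struct virtio_admin_cmd_lreg_wr_data *wr =
                (struct virtio_admin_cmd_lreg_wr_data *)buf;

        wr->offset = offset;
        wr->size = sizeof(val);
        memcpy(wr->registers, &val, sizeof(val));

        return submit_admin_cmd(vf_id, VIRTIO_ADMIN_CMD_LREG_WR,
                                wr, sizeof(*wr) + sizeof(val));
}

This is exactly the kind of per-register encapsulation a transport virtqueue
would also define, which is why I see the two overlapping.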
>
> > Then it aligns with SIOV natively.
> >
> SIOV is not a well-defined spec; whenever it is defined, it can use AQ or TVQ.
>
> We also discussed that hypervisor mediation of the control path is not
> desired in some use cases, hence I will leave that discussion to the future
> when SIOV arrives.
We need to plan ahead. We don't want to end up with a redundant
design. For example, this proposal is actually a partial transport
implementation; the transport virtqueue can do much better in this case.
Thanks