This is the updated version of SR-IOV support for the NVMe device.

Changes since v1:
- Dropped the "pcie: Set default and supported MaxReadReq to 512" patch.
  The original author agrees it may no longer be needed for recent
  kernels.
- Dropped the "pcie: Add callback preceding SR-IOV VFs update" patch.
  A customized pc->config_write callback is used instead.
- Split the "hw/nvme: Calculate BAR attributes in a function" patch into
  cleanup and update parts.
- Reworked the configuration options related to SR-IOV.
- Fixed nvme_update_msixcap_ts() for platforms without MSI-X support.
- Updated handling of SUBSYS_SLOT_RSVD values in nvme_subsys_ctrl().
- Updated error codes returned from the Virtualization Management
  command (DNR is set).
- Fixed mismatched typedef/enum names.
- A few other minor tweaks and improvements.
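For readers unfamiliar with the config_write approach mentioned in the
changelog, the idea can be sketched roughly as below. This is a
standalone illustration, not the code from the series: MockDev and all
mock_* functions are hypothetical stand-ins (QEMU's PCIDevice and
pci_default_write_config() are not used), and only the register offset
and bit value match PCI_SRIOV_CTRL and PCI_SRIOV_CTRL_VFE from
pci_regs.h. The hook inspects a pending write to the SR-IOV Control
register before config space is updated, so the device can react while
the VFs still exist.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same values as PCI_SRIOV_CTRL / PCI_SRIOV_CTRL_VFE in pci_regs.h. */
#define SRIOV_CTRL      0x08    /* SR-IOV Control register offset */
#define SRIOV_CTRL_VFE  0x01    /* VF Enable bit */

/* Hypothetical stand-in for the device; this is NOT QEMU's PCIDevice. */
typedef struct MockDev {
    uint8_t  config[4096];   /* emulated PCIe config space */
    uint16_t sriov_cap;      /* offset of the SR-IOV extended capability */
    int      vfs_offlined;   /* how many times the pre-write hook reacted */
} MockDev;

/* True if the write [addr, addr + len) touches the byte at 'reg'. */
static bool range_covers(uint32_t addr, int len, uint32_t reg)
{
    return addr <= reg && reg < addr + (uint32_t)len;
}

/*
 * Runs BEFORE config space is updated. If the guest is about to clear
 * VF Enable, this is the point where a device could take its secondary
 * controllers offline. Here we only count the event.
 */
static void mock_sriov_pre_write(MockDev *d, uint32_t addr, uint32_t val,
                                 int len)
{
    uint32_t ctrl = d->sriov_cap + SRIOV_CTRL;

    if (!range_covers(addr, len, ctrl)) {
        return;
    }

    bool enabled_now = d->config[ctrl] & SRIOV_CTRL_VFE;
    uint8_t new_byte = (val >> (8 * (ctrl - addr))) & 0xff;
    bool enabled_next = new_byte & SRIOV_CTRL_VFE;

    if (enabled_now && !enabled_next) {
        d->vfs_offlined++;
    }
}

/* Simplified default handler: stores the little-endian value as-is. */
static void mock_default_write(MockDev *d, uint32_t addr, uint32_t val,
                               int len)
{
    for (int i = 0; i < len; i++) {
        d->config[addr + i] = (val >> (8 * i)) & 0xff;
    }
}

/* The customized callback: device-specific hook first, then default. */
static void mock_config_write(MockDev *d, uint32_t addr, uint32_t val,
                              int len)
{
    mock_sriov_pre_write(d, addr, val, len);
    mock_default_write(d, addr, val, len);
}
```

The ordering is the point of the sketch: because the hook runs ahead of
the default write, no dedicated "before VFs update" callback in the PCIe
core is required.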

List of known gaps and nice-to-haves:

1) Interaction of secondary controllers with namespaces does not fully
   follow the spec

The limitation: a VF has to be visible on the PCI bus first, and only
then can a namespace be attached to it.

The problem is that the mapping between a controller and its attached
namespaces is stored in the NvmeCtrl object, and an unrealized VF doesn't
have this object allocated. There are multiple ways to address the issue,
but none of those we've considered so far is elegant. The fact that the
namespace-related code area is busy (pending patches, some rework?)
doesn't help either.

2) VFs report and support the same features as the parent PF

For security reasons, the user of a VF should not be allowed to interact
with other VFs. Essentially, VFs should not announce or support
Namespace Management, Namespace Attachment, the corresponding Identify
commands, and possibly other features as well.

3) PMR and CMB must be disabled when SR-IOV is active

The main concern is whether PMR/CMB should be implemented at all for a
device that supports SR-IOV. If the answer is yes, then another question
arises: how should the feature behave? Simulation of different devices
may require different implementations.

It's too early to get into such details, so we've decided to disallow
both features altogether if SR-IOV is enabled.

4) The Private Resources Mode

The NVMe Spec defines Flexible Resources as an optional feature. VFs can
alternatively support a fixed number of queues/interrupts.

This SR-IOV implementation supports Flexible Resources with the
Virtualization Management command, and a fallback to Private Resources
is not available. Support for such a fallback can be implemented later
if there's demand.

5) Support for Namespace Management command

A device with virtualization enhancements must support the Namespace
Management command.
The command is not strictly necessary to use SR-IOV, but for a more
complete set of features you may want to cherry-pick this patch:
https://lists.gnu.org/archive/html/qemu-devel/2021-08/msg03107.html
together with this fix:
https://lists.gnu.org/archive/html/qemu-devel/2021-10/msg06734.html

Knut Omang (2):
  pcie: Add support for Single Root I/O Virtualization (SR/IOV)
  pcie: Add some SR/IOV API documentation in docs/pcie_sriov.txt

Lukasz Maniak (4):
  hw/nvme: Add support for SR-IOV
  hw/nvme: Add support for Primary Controller Capabilities
  hw/nvme: Add support for Secondary Controller List
  docs: Add documentation for SR-IOV and Virtualization Enhancements

Łukasz Gieryk (9):
  pcie: Add helpers to the SR/IOV API
  pcie: Add 1.2 version token for the Power Management Capability
  hw/nvme: Implement the Function Level Reset
  hw/nvme: Make max_ioqpairs and msix_qsize configurable in runtime
  hw/nvme: Remove reg_size variable and update BAR0 size calculation
  hw/nvme: Calculate BAR attributes in a function
  hw/nvme: Initialize capability structures for primary/secondary
    controllers
  hw/nvme: Add support for the Virtualization Management command
  hw/nvme: Update the initalization place for the AER queue

 docs/pcie_sriov.txt          | 115 +++++++
 docs/system/devices/nvme.rst |  35 ++
 hw/nvme/ctrl.c               | 634 ++++++++++++++++++++++++++++++++---
 hw/nvme/ns.c                 |   2 +-
 hw/nvme/nvme.h               |  51 ++-
 hw/nvme/subsys.c             |  74 +++-
 hw/nvme/trace-events         |   6 +
 hw/pci/meson.build           |   1 +
 hw/pci/pci.c                 |  97 ++++--
 hw/pci/pcie.c                |   5 +
 hw/pci/pcie_sriov.c          | 301 +++++++++++++++++
 hw/pci/trace-events          |   5 +
 include/block/nvme.h         |  65 ++++
 include/hw/pci/pci.h         |  12 +-
 include/hw/pci/pci_ids.h     |   1 +
 include/hw/pci/pci_regs.h    |   1 +
 include/hw/pci/pcie.h        |   6 +
 include/hw/pci/pcie_sriov.h  |  75 +++++
 include/qemu/typedefs.h      |   2 +
 19 files changed, 1407 insertions(+), 81 deletions(-)
 create mode 100644 docs/pcie_sriov.txt
 create mode 100644 hw/pci/pcie_sriov.c
 create mode 100644 include/hw/pci/pcie_sriov.h

-- 
2.25.1