On 2023/11/28 17:47, Yui Washizu wrote:
On 2023/11/18 21:10, Akihiko Odaki wrote:
Hi,
We are planning to add PCIe SR-IOV support to the virtio-net driver
for Windows ("NetKVM")[1], and we want a SR-IOV feature for virtio-net
emulation code in QEMU to test it. I expect there are other people
interested in such a feature, considering that people are using igb[2]
to test SR-IOV support in VMs.
Washizu Yui has already proposed an RFC patch to add an SR-IOV feature
to virtio-net emulation[3][4], but it's preliminary and has no
configurability for VFs.
Now I'm proposing to add SR-IOV support to virtio-net with full
configurability for VFs by following the implementation of virtio-net
failover[5]. I'm planning to write the patches myself, but I know there
are people interested in such patches, so I'd like to share the idea
beforehand.
The idea:
The problem when implementing configurability for VFs is that SR-IOV
VFs can be realized and unrealized at runtime with a request from the
guest. So a naive implementation cannot deal with a command line like
the following:
-device virtio-net-pci,addr=0x0.0x0,sriov=on
-device virtio-net-pci,addr=0x0.0x1
-device virtio-net-pci,addr=0x0.0x2
This will realize the virtio-net functions in 0x0.0x1 and 0x0.0x2 when
the guest starts instead of when the guest requests to enable VFs.
However, reviewing the virtio-net emulation code, I realized the
virtio-net failover also "hides" devices when the guest starts. The
following command line hides hostdev0 when the guest starts, and adds
it when the guest requests the VIRTIO_NET_F_STANDBY feature:
-device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:6f:55:cc, \
bus=root2,failover=on
-device vfio-pci,host=5e:00.2,id=hostdev0,bus=root1,failover_pair_id=net1
So it should also be possible to do something similar: "hide" VFs and
realize/unrealize them when the guest requests.
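For reference, failover implements this hiding with a DeviceListener
whose hide_device callback claims the paired device before it is
realized (failover_hide_primary_device() in hw/net/virtio-net.c). The
following is only a minimal sketch of that mechanism; the callback
signature and the option check are simplified and from memory, so take
it as an illustration rather than the actual failover code:

#include "qemu/osdep.h"
#include "hw/qdev-core.h"
#include "qapi/qmp/qdict.h"

/*
 * Sketch: claim (hide) any -device whose options carry a
 * failover_pair_id, so it is not realized at startup. The real code
 * also checks that the id matches this virtio-net device and whether
 * VIRTIO_NET_F_STANDBY has already been negotiated.
 */
static bool sketch_hide_primary_device(DeviceListener *listener,
                                       const QDict *device_opts,
                                       bool from_json, Error **errp)
{
    return qdict_get_try_str(device_opts, "failover_pair_id") != NULL;
}

static DeviceListener sketch_listener = {
    .hide_device = sketch_hide_primary_device,
};

/* Registered from the standby device's realize path. */
static void sketch_register_listener(void)
{
    device_listener_register(&sketch_listener);
}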
There are two things I hate about this idea when contrasting it with
the conventional multifunction feature[6], though. One is that the PF
must be added before the VFs; a similar limitation is imposed for
failover. The other is that it will be specific to virtio-net. I was
considering implementing a "generic" SR-IOV feature that would work on
various devices, but I realized that would need lots of configuration
validation. We may eventually want it, but it's probably better to
avoid such a big leap as the first step.
Please tell me if you have questions or suggestions.
Hi, Odaki-san
Hi,
The idea appears to be practical and convenient.
I have some things I want to confirm.
I understand that your idea can make the devices for VFs,
created by the qdev_new or qdev_realize functions, invisible to the guest OS.
Is my understanding correct?
Yes, the guest will request to enable VFs with the standard SR-IOV
capability, and the virtio-net implementation will use appropriate
QEMU-internal APIs to create and realize VFs accordingly.
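To illustrate what I mean by QEMU-internal APIs, the PF would construct
and realize a VF roughly as sketched below when the guest enables
SR-IOV. This is only a sketch: the helper name and the way the netdev
backend id reaches this point are assumptions, not the final design;
qdev_new(), qdev_prop_set_string() and qdev_realize_and_unref() are the
existing qdev APIs.

#include "qemu/osdep.h"
#include "hw/qdev-core.h"
#include "hw/qdev-properties.h"
#include "qapi/error.h"

/*
 * Sketch: create and realize one VF on the PF's bus when the guest
 * enables SR-IOV. How the backend (netdev) name is stored until this
 * point is exactly the configurability problem discussed here.
 */
static DeviceState *sketch_realize_vf(BusState *bus, const char *netdev_id,
                                      Error **errp)
{
    DeviceState *vf = qdev_new("virtio-net-pci");

    qdev_prop_set_string(vf, "netdev", netdev_id);
    if (!qdev_realize_and_unref(vf, bus, errp)) {
        return NULL;
    }
    return vf;
}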
And, if your idea is realized,
will it be possible to specify the backend device for the virtio-net-pci
device?
Yes, you can specify netdev like conventional virtio-net devices.
Could you provide insights into the next steps
beyond the implementation details?
When do you expect your implementation to be merged into QEMU?
Do you have a timeline for this plan?
Moreover, is there any way
we can collaborate on the implementation you're planning?
I intend to upstream my implementation. The flexibility of this design
will make the SR-IOV support useful for many people, which makes it
suitable for upstreaming, and I expect the implementation to be clean
enough as well. I'll submit it to the mailing list when I finish the
implementation, so I'd like you to test and review it then.
By the way, I have started the implementation and realized it may be
better to change the design, so I present the design changes below:
First, I intend to change the CLI. The interface in my last proposal
expects that there is only one PF on a bus, marked with the "sriov"
property. However, the specification allows multiple PFs on a bus, so
it's better to design the CLI to allow multiple PFs, though I'm not
going to implement such a feature at first.
The new CLI will instead add an "sriov-pf" property to VFs, which
designates the PF paired with them. Below is an example of a command
line conforming to the new interface:
-device virtio-net-pci,addr=0x0.0x3,netdev=tap3,sriov-pf=pf1
-device virtio-net-pci,addr=0x0.0x2,netdev=tap2,id=pf1
-device virtio-net-pci,addr=0x0.0x1,netdev=tap1,sriov-pf=pf0
-device virtio-net-pci,addr=0x0.0x0,netdev=tap0,id=pf0
Another design change is *not* to use the "device hiding" API of
failover. This is because fully-realized devices are useful when
validating the configuration. In particular, VFs must have a consistent
BAR configuration, and that can only be validated after they are
realized.
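For example, a consistency check over realized VFs could look like the
sketch below. It inspects PCIDevice::io_regions, which is only populated
during realize; the function name is hypothetical and the header names
are as of recent QEMU:

#include "qemu/osdep.h"
#include "hw/pci/pci.h"
#include "hw/pci/pci_device.h"
#include "qapi/error.h"

/* Sketch: check that every prototype VF registered identical BARs. */
static bool sketch_vf_bars_consistent(PCIDevice **vfs, int num_vfs,
                                      Error **errp)
{
    int i, bar;

    for (i = 1; i < num_vfs; i++) {
        for (bar = 0; bar < PCI_NUM_REGIONS; bar++) {
            PCIIORegion *a = &vfs[0]->io_regions[bar];
            PCIIORegion *b = &vfs[i]->io_regions[bar];

            if (a->size != b->size || a->type != b->type) {
                error_setg(errp, "SR-IOV VFs have inconsistent BAR %d", bar);
                return false;
            }
        }
    }
    return true;
}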
So I'm now considering having "prototype VFs" realized before the PF
gets realized. Prototype VFs will be fully realized, but
virtio_write_config() and virtio_read_config() will do nothing for those
VFs, which effectively disables them. It is similar to how functions are
disabled until function 0 gets plugged for a conventional multifunction
device (cf. pci_host_config_write_common() and
pci_host_config_read_common()).
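Concretely, I imagine gating the config accessors roughly like this. The
vf_prototype flag is hypothetical, and the "..." parts stand for the
existing virtio-pci handling; returning all-ones for reads mirrors what
pci_host_config_read_common() does for an absent function:

#include "qemu/osdep.h"
#include "hw/virtio/virtio-pci.h"

static void virtio_write_config(PCIDevice *pci_dev, uint32_t address,
                                uint32_t val, int len)
{
    VirtIOPCIProxy *proxy = VIRTIO_PCI(pci_dev);

    if (proxy->vf_prototype) {
        /* Prototype VFs are effectively disabled: ignore config writes. */
        return;
    }

    /* ... existing virtio-pci config write handling ... */
}

static uint32_t virtio_read_config(PCIDevice *pci_dev, uint32_t address,
                                   int len)
{
    VirtIOPCIProxy *proxy = VIRTIO_PCI(pci_dev);

    if (proxy->vf_prototype) {
        /* Reads behave as if the function were absent. */
        return ~0x0;
    }

    /* ... existing handling, e.g. the virtio_pci_cfg capability ... */
    return pci_default_read_config(pci_dev, address, len);
}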
When the PF gets realized, it will validate the configuration by
inspecting the prototype VFs. If the configuration looks valid, the PF
backs up their DeviceState::opts and unplugs them. The PF will later use
the backed-up device options to realize the VFs when the guest requests
them.
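In code, I expect this to boil down to something like the sketch below.
Whether holding a reference to DeviceState::opts and replaying it
through qdev_device_add_from_qdict() is really sufficient is exactly
what I need to verify while implementing, so treat the helper names and
the overall flow as assumptions:

#include "qemu/osdep.h"
#include "hw/qdev-core.h"
#include "monitor/qdev.h"
#include "qapi/qmp/qdict.h"
#include "qapi/error.h"

/*
 * Sketch: keep the -device options of a prototype VF, unplug it, and
 * later replay the options to realize the VF when the guest enables
 * SR-IOV.
 */
static QDict *sketch_absorb_vf(DeviceState *vf)
{
    QDict *opts = qobject_ref(vf->opts);

    /* Unrealize and free the prototype VF; the options outlive it. */
    object_unparent(OBJECT(vf));
    return opts;
}

static DeviceState *sketch_replay_vf(QDict *opts, Error **errp)
{
    /* Re-create the VF from the saved -device options. */
    return qdev_device_add_from_qdict(opts, false, errp);
}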
This design change forces VFs to be specified before the PF on the
command line. It is similar to how conventional multifunction requires
function 0 to be realized after the other functions.
I may make other design changes as the implementation progresses, but
the above is the current design I have in mind.
Regards,
Akihiko Odaki