On 2023/11/28 17:47, Yui Washizu wrote:

On 2023/11/18 21:10, Akihiko Odaki wrote:
Hi,

We are planning to add PCIe SR-IOV support to the virtio-net driver for Windows ("NetKVM")[1], and we want an SR-IOV feature in the virtio-net emulation code in QEMU to test it. I expect there are other people interested in such a feature, considering that people are using igb[2] to test SR-IOV support in VMs.

Washizu Yui has already proposed an RFC patch to add an SR-IOV feature to the virtio-net emulation[3][4], but it is preliminary and has no configurability for VFs.

Now I'm proposing to add SR-IOV support to virtio-net with full configurability for VFs by following the implementation of virtio-net failover[5]. I'm planning to write the patches myself, but I know there are people interested in such patches, so I'd like to share the idea beforehand.

The idea:

The problem when implementing configurability for VFs is that SR-IOV VFs can be realized and unrealized at runtime with a request from the guest. So a naive implementation cannot deal with a command line like the following:
-device virtio-net-pci,addr=0x0.0x0,sriov=on
-device virtio-net-pci,addr=0x0.0x1
-device virtio-net-pci,addr=0x0.0x2

This will realize the virtio-net functions at 0x0.0x1 and 0x0.0x2 when the guest starts, instead of when the guest requests to enable VFs.

However, reviewing the virtio-net emulation code, I realized that virtio-net failover also "hides" devices when the guest starts. The following command line hides hostdev0 when the guest starts and adds it when the guest requests the VIRTIO_NET_F_STANDBY feature:

-device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:6f:55:cc, \
  bus=root2,failover=on
-device vfio-pci,host=5e:00.2,id=hostdev0,bus=root1,failover_pair_id=net1

So it should also be possible to do something similar: "hide" the VFs and realize/unrealize them when the guest requests.
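For context, the hiding works through a qdev DeviceListener hook. The sketch below only paraphrases the idea; it is not the actual failover_hide_device() from hw/net/virtio-net.c, and the callback signature may differ between QEMU versions:

/* Paraphrased sketch of a failover-style hiding hook. Returning true
 * makes qdev skip realizing the device for now; the device can then be
 * plugged later, e.g. once the guest acks VIRTIO_NET_F_STANDBY. */
static bool example_hide_device(DeviceListener *listener,
                                const QDict *device_opts, bool from_json,
                                Error **errp)
{
    /* Hide only the device paired with our standby virtio-net instance. */
    const char *pair_id = qdict_get_try_str(device_opts, "failover_pair_id");

    return pair_id && g_str_equal(pair_id, "net1");
}

/* Registration, typically from the standby device's realize:
 *     listener.hide_device = example_hide_device;
 *     device_listener_register(&listener);
 */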

There are two things I hate about this idea when contrasting it with the conventional multifunction feature[6], though. One is that the PF must be added before the VFs; a similar limitation is imposed for failover.

Another is that it will be specific to virtio-net. I was considering implementing a "generic" SR-IOV feature that would work on various devices, but I realized that would need lots of configuration validation. We may eventually want it, but it's probably better to avoid such a big leap as the first step.

Please tell me if you have questions or suggestions.



Hi, Odaki-san

Hi,


The idea appears to be practical and convenient.

I have some things I want to confirm.
I understand your idea can make the devices for VFs,
created by the qdev_new or qdev_realize functions, invisible to the guest OS.
Is my understanding correct?

Yes, the guest will request to enable VFs with the standard SR-IOV capability, and the virtio-net implementation will use appropriate QEMU-internal APIs to create and realize VFs accordingly.
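To make that concrete, here is a rough sketch of what creating one VF at runtime could look like. sriov_create_vf() and its parameters are assumptions; pci_new(), qdev_prop_set_string() and pci_realize_and_unref() are existing qdev/PCI helpers:

/* Hypothetical helper, called when the guest sets VF Enable in the PF's
 * SR-IOV capability. devfn and netdev_id would come from the VF options
 * recorded when the PF was realized. */
static PCIDevice *sriov_create_vf(PCIBus *bus, int devfn,
                                  const char *netdev_id, Error **errp)
{
    PCIDevice *vf = pci_new(devfn, "virtio-net-pci");

    /* Attach the backend given on the command line for this VF. */
    qdev_prop_set_string(DEVICE(vf), "netdev", netdev_id);

    /* Realize the function so the guest can enumerate it. */
    if (!pci_realize_and_unref(vf, bus, errp)) {
        return NULL;
    }

    return vf;
}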

And, if your idea is realized,
will it be possible to specify the backend device for the virtio-net-pci device?

Yes, you can specify netdev like conventional virtio-net devices.


Could you provide insights into the next steps
beyond the implementation details?
When do you expect your implementation
to be merged into QEMU?
Do you have a timeline for this plan?
Moreover, is there any way
we can collaborate on the implementation you're planning?

I intend to upstream my implementation. The flexibility of this design will make the SR-IOV support useful to many people and make it suitable for upstreaming. I also expect the implementation will be clean enough for upstreaming. I'll submit it to the mailing list when I finish the implementation, so I'd like you to test and review it then.

By the way, I started the implementation and realized it may be better to change the design, so I present the design changes below:

First, I intend to change the CLI. The interface in my last proposal expects that there is only one PF on a bus and that it is marked with the "sriov" property. However, the specification allows a bus to have multiple PFs, so it is better to design the CLI to allow multiple PFs, though I'm not going to implement such a feature at first.

The new CLI will instead add an "sriov-pf" property to the VFs, which designates the PF paired with them. Below is an example of a command line conforming to the new interface:

-device virtio-net-pci,addr=0x0.0x3,netdev=tap3,sriov-pf=pf1
-device virtio-net-pci,addr=0x0.0x2,netdev=tap2,id=pf1
-device virtio-net-pci,addr=0x0.0x1,netdev=tap1,sriov-pf=pf0
-device virtio-net-pci,addr=0x0.0x0,netdev=tap0,id=pf0

Another design change is *not* to use the "device hiding" API of failover. This is because fully-realized devices are useful when validating the configuration. In particular, VFs must have a consistent BAR configuration, and that can only be validated after they are realized.

So I'm now considering having "prototype VFs" realized before the PF gets realized. Prototype VFs will be fully realized, but virtio_write_config() and virtio_read_config() will do nothing for those VFs, which effectively disables them. This is similar to how functions are disabled until function 0 is plugged for a conventional multifunction device (c.f. pci_host_config_write_common() and pci_host_config_read_common()).
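As a sketch of that gating (not actual code from hw/virtio/virtio-pci.c), the early return could look like the following, where virtio_pci_is_disabled_vf() is a hypothetical predicate meaning "this function is a prototype VF the PF has not enabled yet"; virtio_read_config() would get the same check:

static void virtio_write_config(PCIDevice *pci_dev, uint32_t address,
                                uint32_t val, int len)
{
    if (virtio_pci_is_disabled_vf(pci_dev)) {
        /* Behave like an absent function, as pci_host_config_write_common()
         * does for functions 1-7 before function 0 is plugged. */
        return;
    }

    /* ... existing virtio-pci config-space write handling ... */
}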

When the PF gets realized, it will validate the configuration by inspecting the prototype VFs. If the configuration looks valid, the PF backs up each VF's DeviceState::opts and unplugs them. The PF will later use the backed-up device options to realize the VFs when the guest requests them.
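A rough sketch of that step follows. sriov_pf_absorb_vf() and the saved_vf_opts field are hypothetical; qdict_clone_shallow(), object_unparent() and DeviceState::opts are existing qdev pieces:

/* Hypothetical step in the PF's realize: after a prototype VF has been
 * validated, keep a copy of its -device options and unplug it so the
 * guest does not see it until SR-IOV is enabled. */
static void sriov_pf_absorb_vf(VirtIONetPCIState *pf, PCIDevice *vf)
{
    DeviceState *dev = DEVICE(vf);

    /* DeviceState::opts is the QDict built from the -device arguments;
     * clone it so it outlives the unplug below. */
    QDict *saved = qdict_clone_shallow(dev->opts);

    g_ptr_array_add(pf->saved_vf_opts, saved); /* saved_vf_opts: assumed field */

    /* Unplug the prototype (this also unrealizes it); it is recreated
     * from 'saved' once the guest sets VF Enable. */
    object_unparent(OBJECT(dev));
}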

This design change forces the VFs to be created before the PF on the command line. This is similar to how conventional multifunction requires function 0 to be realized after the other functions.

I may make other design changes as the implementation progresses, but the above is the current design I have in mind.

Regards,
Akihiko Odaki
