On 20.10.21 15:48, Daniel P. Berrangé wrote:
> On Wed, Oct 20, 2021 at 03:44:08PM +0200, David Hildenbrand wrote:
>> On 18.08.21 21:42, Peter Xu wrote:
>>> This is a long-pending issue that we haven't fixed. The issue is that
>>> in QEMU we have an implicit device ordering requirement when realizing,
>>> otherwise some of the devices may not work properly.
>>>
>>> The initial requirement comes from when vfio-pci started to work with
>>> vIOMMUs. To make sure vfio-pci gets the correct DMA address space, the
>>> vIOMMU device needs to be created before vfio-pci, otherwise vfio-pci
>>> will stop working when the guest enables the vIOMMU and the device at
>>> the same time.
>>>
>>> AFAIU Libvirt should have code that guarantees that. QEMU cmdline users
>>> need to pay attention or things will stop working at some point.
>>>
>>> Recently there's a growing and similar requirement for vDPA. It's not a
>>> hard requirement so far, but vDPA has patches that try to work around
>>> this issue.
>>>
>>> This patchset allows us to realize the devices in an order where, e.g.,
>>> platform devices (bus devices, IOMMU, etc.) are created first, then the
>>> rest of the normal devices. It's done simply by ordering the
>>> QemuOptsList of "device" entries before realization. The priority so
>>> far comes from migration priorities, which may look a little odd, but
>>> that's really about the same problem and we can clean that part up in
>>> the future.
>>>
>>> Libvirt can still keep its ordering, for sure, so old QEMU binaries
>>> will keep working; however, that won't be needed for new QEMUs after
>>> this patchset, so with the new binary we should be able to specify
>>> '-device' on the QEMU cmdline in any order we wish.
>>>
>>> Logically this should also work for vDPA, and the workaround code can
>>> be replaced with a more straightforward approach.
>>>
>>> Please review, thanks.
>>
>> Hi Peter, looks like I have another use case:
>>
>> vhost devices can heavily restrict the number of available memslots:
>> e.g., upstream KVM ~64k, vhost-user usually 32 (!). With virtio-mem
>> intending to make use of multiple memslots [1] and auto-detecting how
>> many to use based on the currently available memslots when plugging and
>> realizing the virtio-mem device, this implies that realizing vhost
>> devices (especially vhost-user devices) after virtio-mem devices can
>> similarly result in issues: when trying to realize the vhost device
>> with restricted memslots, QEMU will bail out.
>>
>> So similarly, we'd want to realize any vhost-* device before any
>> virtio-mem device.
>
> Ordering virtio-mem vs. vhost-* devices doesn't feel like a good
> solution to this problem. E.g. if you start a guest with several
> vhost-* devices and virtio-mem then auto-decides to use all/most of the
> remaining memslots, we've now surely broken the ability to hotplug more
> vhost-* devices at runtime by not leaving memslots for them.
You can hotplug vhost-* devices as you want; they don't "consume"
memslots, they can only restrict the total number of memslots if they
support fewer. We have this situation today already: coldplug/hotplug
more than 32 DIMMs to a VM, then hotplug a vhost-user device that's based
on libvhost-user or Rust's vhost-user-backend. The hotplug will fail.
Nothing is really different with virtio-mem, except that you can
configure how many memslots it should actually use if you care about the
above situation.

>
> I think virtio-mem configuration needs to be stable in its memslot
> usage regardless of how many other types of devices are present,
> and not auto-adjust how many it consumes.

There is a parameter to limit the number of memslots a virtio-mem device
can use ("max-memslots") to handle such corner-case environments as you
describe:

  Set to 1:   use exactly one memslot ("old behavior").
  Set to 0:   auto-detect.
  Set to > 1: auto-detect and cap at the given value.

99.999% of all users don't care about hotplug of memslot-limiting vhost
devices and will happily use "0". The remainder can be handled via
realization priority. Nothing to confuse ordinary users with IMHO.
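
As a rough illustration (hypothetical syntax: the "max-memslots" property
and its exact semantics come from the proposed virtio-mem multi-memslot
series [1] and may still change), a typical user would just let
virtio-mem auto-detect:

  qemu-system-x86_64 ... \
      -m 4G,maxmem=68G \
      -object memory-backend-ram,id=mem0,size=64G \
      -device virtio-mem-pci,id=vmem0,memdev=mem0,requested-size=0,max-memslots=0

while someone who cares about later vhost-* hotplug could cap it, e.g.,
with max-memslots=1.

-- 
Thanks,

David / dhildenb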