On Mon, Aug 23, 2021 at 03:18:51PM -0400, Peter Xu wrote: > On Mon, Aug 23, 2021 at 02:49:12PM -0400, Eduardo Habkost wrote: > > On Wed, Aug 18, 2021 at 03:43:18PM -0400, Peter Xu wrote: > > > QEMU creates -device objects in order as specified by the user's cmdline. > > > However that ordering may not be the ideal order. For example, some > > > platform > > > devices (vIOMMUs) may want to be created earlier than most of the rest > > > devices (e.g., vfio-pci, virtio). > > > > > > This patch orders the QemuOptsList of '-device's so they'll be sorted > > > first > > > before kicking off the device realizations. This will allow the device > > > realization code to be able to use APIs like > > > pci_device_iommu_address_space() > > > correctly, because those functions rely on the platfrom devices being > > > realized. > > > > > > Now we rely on vmsd->priority which is defined as MigrationPriority to > > > provide > > > the ordering, as either VM init and migration completes will need such an > > > ordering. In the future we can move that priority information out of > > > vmsd. > > > > > > Signed-off-by: Peter Xu <pet...@redhat.com> > > > > Can we be 100% sure that changing the ordering of every single > > device being created won't affect guest ABI? (I don't think we can) > > That's a good question, however I doubt whether there's any real-world guest > ABI for that. As a developer, I normally specify cmdline parameter in an > adhoc > way, so that I assume most parameters are not sensitive to ordering and I can > tune the ordering as wish. I'm not sure whether that's common for qemu users, > I would expect so, but I may have missed something that I'm not aware of.
To give just one example: $ (echo 'info pci';echo quit;) | qemu-system-x86_64 -device virtio-net-pci -device e1000e -monitor stdio | tail -n 20 Bus 0, device 4, function 0: Ethernet controller: PCI device 1af4:1000 PCI subsystem 1af4:0001 IRQ 0, pin A BAR0: I/O at 0xffffffffffffffff [0x001e]. BAR1: 32 bit memory at 0xffffffffffffffff [0x00000ffe]. BAR4: 64 bit prefetchable memory at 0xffffffffffffffff [0x00003ffe]. BAR6: 32 bit memory at 0xffffffffffffffff [0x0003fffe]. id "" Bus 0, device 5, function 0: Ethernet controller: PCI device 8086:10d3 PCI subsystem 8086:0000 IRQ 0, pin A BAR0: 32 bit memory at 0xffffffffffffffff [0x0001fffe]. BAR1: 32 bit memory at 0xffffffffffffffff [0x0001fffe]. BAR2: I/O at 0xffffffffffffffff [0x001e]. BAR3: 32 bit memory at 0xffffffffffffffff [0x00003ffe]. BAR6: 32 bit memory at 0xffffffffffffffff [0x0003fffe]. id "" (qemu) quit $ (echo 'info pci';echo quit;) | qemu-system-x86_64 -device e1000e -device virtio-net-pci -monitor stdio | tail -n 20 Bus 0, device 4, function 0: Ethernet controller: PCI device 8086:10d3 PCI subsystem 8086:0000 IRQ 0, pin A BAR0: 32 bit memory at 0xffffffffffffffff [0x0001fffe]. BAR1: 32 bit memory at 0xffffffffffffffff [0x0001fffe]. BAR2: I/O at 0xffffffffffffffff [0x001e]. BAR3: 32 bit memory at 0xffffffffffffffff [0x00003ffe]. BAR6: 32 bit memory at 0xffffffffffffffff [0x0003fffe]. id "" Bus 0, device 5, function 0: Ethernet controller: PCI device 1af4:1000 PCI subsystem 1af4:0001 IRQ 0, pin A BAR0: I/O at 0xffffffffffffffff [0x001e]. BAR1: 32 bit memory at 0xffffffffffffffff [0x00000ffe]. BAR4: 64 bit prefetchable memory at 0xffffffffffffffff [0x00003ffe]. BAR6: 32 bit memory at 0xffffffffffffffff [0x0003fffe]. id "" (qemu) quit If the order of the -device arguments changes, the devices are assigned to different PCI slots. > > Per my knowledge the only "guest ABI" change is e.g. when we specify > "vfio-pci" > to be before "intel-iommu": it'll be constantly broken before this patchset, > while after this series it'll be working. It's just that I don't think those > "guest ABI" is necessary to be kept, and that's exactly what I want to fix > with > the patchset.. If the only ordering changes caused by this patch were intentional and affected only configurations that are known to be broken (like vfio-pci vs intel-iommu), I would agree. However, if we are reordering every single -device option in an unspecified way (like qsort() does when elements compare as equal), we are probably breaking guest ABI and creating a completely different machine (like in the PCI example above). > > > > > How many device types in QEMU have non-default vmsd priority? > > Not so much; here's the list of priorities and the devices using it: > > |--------------------+---------| > | priority | devices | > |--------------------+---------| > | MIG_PRI_IOMMU | 3 | > | MIG_PRI_PCI_BUS | 7 | > | MIG_PRI_VIRTIO_MEM | 1 | > | MIG_PRI_GICV3_ITS | 1 | > | MIG_PRI_GICV3 | 1 | > |--------------------+---------| > > All the rest devices are using the default (0) priority. > > > > > Can we at least ensure devices with the same priority won't be > > reordered, just to be safe? (qsort() doesn't guarantee that) > > > > If very few device types have non-default vmsd priority and > > devices with the same priority aren't reordered, the risk of > > compatibility breakage would be much smaller. > > I'm also wondering whether it's a good thing to break some guest ABI due to > this change, if possible. > > Let's imagine something breaks after applied, then the only reason should be > that qsort() changed the order of some same-priority devices and it's not the > same as user specified any more. Then, does it also means there's yet another > ordering requirement that we didn't even notice? Does the PCI slot assignment example above demonstrate a ordering requirement? I don't think it does. It just demonstrates that different orderings are expected to give different results, but both orderings are valid. > > I doubt whether that'll even happen (or I think there'll be report already, as > in qemu man page there's no requirement on parameter ordering). In all cases, > instead of "keeping the same priority devices in the same order as the user > has > specified", IMHO we should make the broken devices to have different > priorities > so the ordering will be guaranteed by qemu internal, rather than how user > specified it. > > From that pov, maybe this patchset would be great if it can be accepted and > applied in early stage of a release? So we can figure out what's missing and > fix them within the same release. However again I still doubt whether there's > any user that will break in a bad way. > > Thanks, > > -- > Peter Xu > -- Eduardo