On Mon, May 16, 2022 at 02:22:00PM -0600, Alex Williamson wrote: > On Mon, 16 May 2022 13:22:14 +0200 > Juan Quintela <quint...@redhat.com> wrote: > > > Avihai Horon <avih...@nvidia.com> wrote: > > > Currently, if IOMMU of a VFIO container doesn't support dirty page > > > tracking, migration is blocked completely. This is because a DMA-able > > > VFIO device can dirty RAM pages without updating QEMU about it, thus > > > breaking the migration. > > > > > > However, this doesn't mean that migration can't be done at all. If > > > migration pre-copy phase is skipped, the VFIO device doesn't have a > > > chance to dirty RAM pages that have been migrated already, thus > > > eliminating the problem previously mentioned. > > > > > > Hence, in such case allow migration but skip pre-copy phase. > > > > > > Signed-off-by: Avihai Horon <avih...@nvidia.com> > > > > I don't know (TM). > > Several issues: > > - Patch is ugly as hell (ok, that depends on taste) > > - It changes migration_iteration_run() instead of directly > > migration_thread. > > - There is already another case where we skip the sending of RAM > > (localhost migration with shared memory) > > > > In migration/ram.c: > > > > static int ram_find_and_save_block(RAMState *rs, bool last_stage) > > { > > PageSearchStatus pss; > > int pages = 0; > > bool again, found; > > > > /* No dirty page as there is zero RAM */ > > if (!ram_bytes_total()) { > > return pages; > > } > > > > This is the other place where we _don't_ send any RAM at all. > > > > I don't have a great idea about how to make things clear at a higher > > level, I have to think about this. > > It seems like if we have devices dictating what type of migrations can > be performed then there probably needs to be a switch to restrict use of > such devices just as we have the -only-migratable switch now to prevent > attaching devices that don't support migration. I'd guess that we need > the switch to opt-in to allowing such devices to maintain > compatibility. There's probably a whole pile of qapi things missing to > expose this to management tools as well. Thanks,
This is really intended to be a NOP from where things are now, as if you use mlx5 live migration without a patch like this then it causes a botched pre-copy since everything just ends up permanently dirty. If it makes more sense we could abort the pre-copy too - in the end there will be dirty tracking so I don't know if I'd invest in a big adventure to fully define non-dirty tracking migration. Jason