Peter Xu <pet...@redhat.com> writes:

> There are a few things off in that logic; rewrite it.  While at it, add
> rich comments to explain each of the decisions.
>
> Since this is a very sensitive path for migration, below is the list of
> changes with their reasoning.
>
>   (1) Exact pending size is only needed for precopy, not postcopy
>
>       Fundamentally, it's because the "exact" version only does one
>       more deep sync to fetch the pending results, while in postcopy's
>       case it will never sync anything more than the estimate, as the
>       VM on the source is already stopped (see the sketch below).
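>
>       A minimal sketch of the intended flow, as a fragment of the
>       iteration loop (it reuses the existing
>       qemu_savevm_state_pending_estimate()/_exact() helpers; the
>       surrounding variable names are illustrative, not necessarily
>       the patch's):
>
>         uint64_t must_precopy, can_postcopy, pending_size;
>
>         /* Fast path: the estimate is cheap and always available. */
>         qemu_savevm_state_pending_estimate(&must_precopy, &can_postcopy);
>         pending_size = must_precopy + can_postcopy;
>
>         if (!in_postcopy && pending_size < s->threshold_size) {
>             /*
>              * Precopy only: do the one extra (expensive) deep sync
>              * to get an exact figure before committing to switchover.
>              */
>             qemu_savevm_state_pending_exact(&must_precopy, &can_postcopy);
>             pending_size = must_precopy + can_postcopy;
>         }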
>
>   (2) Do _not_ rely on threshold_size anymore to decide whether postcopy
>       should complete
>
>       threshold_size was calculated from the expected downtime and
>       bandwidth only during precopy, as an efficient way to decide when
>       to switch over.  It is not sensible to rely on threshold_size in
>       postcopy.
>
>       For precopy, once switchover is decided the migration will
>       complete soon.  That is not true for postcopy.  Logically
>       speaking, postcopy should only complete the migration once all
>       pending data is flushed.
>
>       Here it used to work because save_complete() used to implicitly
>       contain a save_live_iterate() pass whenever there was pending
>       data.
>
>       Even if that looks benign, having RAM still to be migrated in
>       postcopy's save_complete() has other bad side effects:
>
>       (a) Since save_complete() handlers need to run one at a time,
>       moving RAM there means nothing else can be moved in parallel
>       (unlike the round-robin iteration over the vmstate handlers that
>       we do in the ITERABLE phase).  Not an immediate concern, but it
>       may stop working in the future when there is more than one
>       iterable (e.g. vfio postcopy).
>
>       (b) Postcopy recovery, unfortunately, only works during the
>       ITERABLE phase.  IOW, if the source QEMU moves RAM during
>       postcopy's save_complete() and the network fails, both QEMUs
>       will crash; OTOH, a failure during iteration is still
>       recoverable.  This change should therefore further shrink the
>       window in which QEMU can split-brain and crash in extreme cases.
>
>       With the ram_save_complete() tracepoints enabled, we see this
>       before the patch:
>
>       1267959@1748381938.294066:ram_save_complete dirty=9627, done=0
>       1267959@1748381938.308884:ram_save_complete dirty=0, done=1
>
>       It means that in this migration 9627 dirty pages were still
>       migrated in the complete() stage of the postcopy phase.
>
>       After this change, all the postcopy RAM should be migrated in
>       the iterable phase rather than in save_complete() (see the
>       sketch after the trace lines below):
>
>       1267959@1748381938.294066:ram_save_complete dirty=0, done=0
>       1267959@1748381938.308884:ram_save_complete dirty=0, done=1
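>
>       A minimal sketch of the resulting completion check (the
>       complete_ready name is illustrative, not necessarily what the
>       patch uses):
>
>         bool complete_ready;
>
>         if (in_postcopy) {
>             /*
>              * Postcopy may only complete once every piece of pending
>              * data has been flushed, so that save_complete() never
>              * has to move RAM by itself.
>              */
>             complete_ready = (pending_size == 0);
>         } else {
>             /*
>              * Precopy can switch over as soon as the remainder fits
>              * the downtime/bandwidth budget encoded in threshold_size.
>              */
>             complete_ready = (pending_size <= s->threshold_size);
>         }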
>
>   (3) Adjust when to decide to switch to postcopy
>
>       This shouldn't be super important; the movement makes sure there
>       is only one in_postcopy check, so we are clear about what we do
>       in the two completely different use cases (precopy vs. postcopy).
>
>   (4) Trivial touch-up on the threshold_size comparison
>
>       Which changes:
>
>       "(!pending_size || pending_size < s->threshold_size)"
>
>       into:
>
>       "(pending_size <= s->threshold_size)"
>
>       The two are almost equivalent: zero is always <= threshold_size
>       for unsigned values, so the !pending_size test was redundant,
>       and the new form merely also allows switchover when pending_size
>       equals threshold_size exactly.
>
> Reviewed-by: Juraj Marcin <jmar...@redhat.com>
> Signed-off-by: Peter Xu <pet...@redhat.com>

Reviewed-by: Fabiano Rosas <faro...@suse.de>
