On 2025-09-22 11:51, Peter Xu wrote:
> On Mon, Sep 22, 2025 at 02:58:38PM +0200, Juraj Marcin wrote:
> > Hi Fabiano,
> > 
> > On 2025-09-19 13:46, Fabiano Rosas wrote:
> > > Juraj Marcin <[email protected]> writes:
> > > 
> > > Hi Juraj,
> > > 
> > > Good patch, nice use of migrate_has_failed()
> > 
> > Thanks!
> > 
> > > 
> > > > From: Juraj Marcin <[email protected]>
> > > >
> > > > Currently, there are two functions that are responsible for cleanup of
> > > > the incoming migration state. With successful precopy, it's the main
> > > > thread and with successful postcopy it's the listen thread. However, if
> > > > postcopy fails during the device load, both functions will try to do
> > > > the cleanup. Moreover, when the exit-on-error parameter was added, it was
> > > > applied only to precopy.
> > > >
> > > 
> > > Someone could be relying on postcopy always exiting on error while
> > > explicitly setting exit-on-error=false for precopy, and this patch would
> > > change the behavior incompatibly. Is this an issue? I'm willing to
> > > ignore it, but you guys know more about postcopy.
> > 
> > Good question. When going through the older patches where the postcopy
> > listen thread and later exit-on-error were implemented, it looked more
> > like an oversight than an intentional omission. However, to avoid
> > breaking any potential users of the current behavior, we could add
> > another option, "exit-on-postcopy-error", that would allow exiting if
> > postcopy failed unrecoverably. I've already talked about such an option
> > with @jdenemar and he agreed with it.
> 
> The idea for postcopy ram is, it should never fail.. as failing should
> never be better than a pause.  Block dirty bitmap might be different,
> though, when enabled separately.
> 
> For postcopy-ram, qemu_loadvm_state_main() will in reality only receive RAM
> updates. It'll almost always trigger the postcopy_pause_incoming() path
> when anything fails.
> 
> For pure block-dirty-bitmap-only styled postcopy: for this exit-on-error, I
> also don't think we should really "exit on errors", even if the flag is
> set.  IIUC, it's not fatal to the VM if that failed, as described in:

I agree, however, this patch doesn't add any new cases in which the
destination QEMU would exit. If there is an error in block dirty bitmap
loading, it is only reported to the console; the listen thread then
continues waiting for main_thread_load_event, marks the migration as
COMPLETED, and does the cleanup, same as before. [1] I can add a comment
similar to the "prevent further exit" one that was there before.

However, if there is another error, on which postcopy cannot pause (for
example, a failure in the main thread loading the device state before
the machine started), the migration status changes to FAILED and the
thread jumps right to the cleanup, which then checks exit-on-error and
optionally exits QEMU; before this patch, it would always exit in such
a case [2]:

[1]: https://gitlab.com/qemu-project/qemu/-/blob/ab8008b231e758e03c87c1c483c03afdd9c02e19/migration/savevm.c#L2120
[2]: https://gitlab.com/qemu-project/qemu/-/blob/ab8008b231e758e03c87c1c483c03afdd9c02e19/migration/savevm.c#L2150

> 
> commit ee64722514fabcad2430982ade86180208f5be4f
> Author: Vladimir Sementsov-Ogievskiy <[email protected]>
> Date:   Mon Jul 27 22:42:32 2020 +0300
> 
>     migration/savevm: don't worry if bitmap migration postcopy failed
> 
>     ...
> 
>     And anyway, bitmaps postcopy is not prepared to be somehow recovered.
>     The original idea instead is that if bitmaps postcopy failed, we just
>     lose some bitmaps, which is not critical. So, on failure we just need
>     to remove unfinished bitmaps and guest should continue execution on
>     destination.
> 
> Hence, exit here might be an overkill.. need block developers to double
> check, though..
> 

/* snip */

> > > > diff --git a/migration/savevm.c b/migration/savevm.c
> > > > index fabbeb296a..d7eb416d48 100644
> > > > --- a/migration/savevm.c
> > > > +++ b/migration/savevm.c
> > > > @@ -2069,6 +2069,11 @@ static int loadvm_postcopy_ram_handle_discard(MigrationIncomingState *mis,
> > > >      return 0;
> > > >  }
> > > >  
> > > > +static void postcopy_ram_listen_thread_bh(void *opaque)
> > > > +{
> > > > +    migration_incoming_finish();
> > > > +}
> > > > +
> > > >  /*
> > > >   * Triggered by a postcopy_listen command; this thread takes over reading
> > > >   * the input stream, leaving the main thread free to carry on loading the rest
> > > > @@ -2122,52 +2127,31 @@ static void *postcopy_ram_listen_thread(void *opaque)
> > > >                           "bitmaps may be lost, and present migrated dirty "
> > > >                           "bitmaps are correctly migrated and valid.",
> > > >                           __func__, load_res);
> > > > -            load_res = 0; /* prevent further exit() */
> > > >          } else {
> > > >              error_report("%s: loadvm failed: %d", __func__, load_res);
> > > >              migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
> > > >                                             MIGRATION_STATUS_FAILED);
> > > > +            goto out;
> > > >          }
> > > >      }
> > > > -    if (load_res >= 0) {
> > > > -        /*
> > > > -         * This looks good, but it's possible that the device loading in the
> > > > -         * main thread hasn't finished yet, and so we might not be in 'RUN'
> > > > -         * state yet; wait for the end of the main thread.
> > > > -         */
> > > > -        qemu_event_wait(&mis->main_thread_load_event);
> > > > -    }
> > > > -    postcopy_ram_incoming_cleanup(mis);
> > > > -
> > > > -    if (load_res < 0) {
> > > > -        /*
> > > > -         * If something went wrong then we have a bad state so exit;
> > > > -         * depending how far we got it might be possible at this point
> > > > -         * to leave the guest running and fire MCEs for pages that never
> > > > -         * arrived as a desperate recovery step.
> > > > -         */
> > > > -        rcu_unregister_thread();
> > > > -        exit(EXIT_FAILURE);
> > > > -    }
> > > > +    /*
> > > > +     * This looks good, but it's possible that the device loading in the
> > > > +     * main thread hasn't finished yet, and so we might not be in 'RUN'
> > > > +     * state yet; wait for the end of the main thread.
> > > > +     */
> > > > +    qemu_event_wait(&mis->main_thread_load_event);
> 
> PS: I didn't notice this change, looks like this may be better to be a
> separate patch when moving out of the if.  Meanwhile, I don't think we set
> it right either, in qemu_loadvm_state():
> 
>     qemu_event_set(&mis->main_thread_load_event);
> 
> The problem is e.g. load_snapshot / qmp_xen_load_devices_state also set
> that event, even if there'll be no one to consume it.. not a huge deal, but
> maybe while moving it out of the if, we can also cleanup the set() side
> too, by moving the set() upper into process_incoming_migration_co().

While I have moved it out of the condition, it is still waited for only
on the success path. If there is an error that would previously make the
condition false, the execution now jumps directly to the cleanup section
(the out label), skipping both this wait and setting the migration state
to COMPLETED (it is set to FAILED before the jump). But I can still look
into moving the set() up.

> 
> > > >  
> > > >      migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
> > > >                                     MIGRATION_STATUS_COMPLETED);
> > > > -    /*
> > > > -     * If everything has worked fine, then the main thread has waited
> > > > -     * for us to start, and we're the last use of the mis.
> > > > -     * (If something broke then qemu will have to exit anyway since it's
> > > > -     * got a bad migration state).
> > > > -     */
> > > > -    bql_lock();
> > > > -    migration_incoming_state_destroy();
> > > > -    bql_unlock();
> > > >  
> > > > +out:
> > > >      rcu_unregister_thread();
> > > > -    mis->have_listen_thread = false;
> > > >      postcopy_state_set(POSTCOPY_INCOMING_END);
> > > >  
> > > >      object_unref(OBJECT(migr));
> > > >  
> > > > +    migration_bh_schedule(postcopy_ram_listen_thread_bh, NULL);
> > > 
> > > Better to schedule before the object_unref to ensure there's always
> > > someone holding a reference?
> > 
> > True, I'll move it.
> 
> Good point.  Though I'm not sure moving it upper would help, because it'll
> be the BH that references the MigrationState*..  So maybe we could unref at
> the end of postcopy_ram_listen_thread_bh().  If so, we should add a comment
> on ref() / unref() saying how they're paired.
> 
> > 
> > > 
> > > > +
> > > >      return NULL;
> > > >  }
> > > >  
> > > > @@ -2217,7 +2201,7 @@ static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
> > > >      mis->have_listen_thread = true;
> > > >      postcopy_thread_create(mis, &mis->listen_thread,
> > > >                             MIGRATION_THREAD_DST_LISTEN,
> > > > -                           postcopy_ram_listen_thread, QEMU_THREAD_DETACHED);
> > > > +                           postcopy_ram_listen_thread, QEMU_THREAD_JOINABLE);
> > > >      trace_loadvm_postcopy_handle_listen("return");
> > > >  
> > > >      return 0;
> > > 
> > 
> 
> -- 
> Peter Xu
> 

