RE: [PATCH v4 07/10] migration: split migration_incoming_co

2023-05-04 Thread Zhang, Chen


> -----Original Message-----
> From: Vladimir Sementsov-Ogievskiy 
> Sent: Thursday, May 4, 2023 6:52 AM
> To: Peter Xu 
> Cc: qemu-devel@nongnu.org; lukasstra...@web.de; quint...@redhat.com;
> Zhang, Chen ; Zhang, Hailiang
> ; Leonardo Bras 
> Subject: Re: [PATCH v4 07/10] migration: split migration_incoming_co
> 
> On 02.05.23 23:48, Peter Xu wrote:
> > On Fri, Apr 28, 2023 at 10:49:25PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> >> Originally, migration_incoming_co was introduced by
> >> 25d0c16f625feb3b6
> >> "migration: Switch to COLO process after finishing loadvm"
> >> to be able to enter from COLO code to one specific yield point, added
> >> by 25d0c16f625feb3b6.
> >>
> >> Later in 923709896b1b0
> >>   "migration: poll the cm event for destination qemu"
> >> we reused this variable to wake the migration incoming coroutine from
> >> RDMA code.
> >>
> >> That was a doubtful idea. Entering coroutines is a very fragile thing:
> >> you should be absolutely sure which yield point you are going to enter.
> >>
> >> I don't know how safe it is to enter during qemu_loadvm_state(),
> >> which I think is what RDMA wants to do. But for sure RDMA shouldn't
> >> enter the special COLO-related yield-point. Likewise, COLO code
> >> doesn't want to enter during qemu_loadvm_state(); it wants to enter
> >> its own specific yield-point.
> >>
> >> Also, when in 8e48ac95865ac97d
> >>   "COLO: Add block replication into colo process" we added the
> >> bdrv_invalidate_cache_all() call (now called bdrv_activate_all()), it
> >> became possible to enter the migration incoming coroutine during that
> >> call, which is wrong too.
> >>
> >> So, let's make these things separate and disjoint: loadvm_co for
> >> RDMA, non-NULL during qemu_loadvm_state(), and colo_incoming_co for
> >> COLO, non-NULL only around the specific yield.
> >>
> >> Signed-off-by: Vladimir Sementsov-Ogievskiy
> >> 
> >> ---
> >>   migration/colo.c      | 4 ++--
> >>   migration/migration.c | 8 ++++++--
> >>   migration/migration.h | 9 ++++++++-
> >>   3 files changed, 16 insertions(+), 5 deletions(-)
> >
> > The idea looks right to me, but I really know almost nothing about
> > coroutines or rdma+colo..
> >
> > Is the other ref in rdma.c (rdma_cm_poll_handler()) still missing?
> >
> 
> Oops right.. I was building with rdma disabled. Will fix.
> 
> Thanks a lot for reviewing!
> 

Yes, I know some people and companies are trying to enable COLO with RDMA.
But on my side, I haven't tried it yet.

Thanks
Chen

> --
> Best regards,
> Vladimir



Re: [PATCH v4 07/10] migration: split migration_incoming_co

2023-05-03 Thread Vladimir Sementsov-Ogievskiy

On 02.05.23 23:48, Peter Xu wrote:
> On Fri, Apr 28, 2023 at 10:49:25PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>> Originally, migration_incoming_co was introduced by
>> 25d0c16f625feb3b6
>> "migration: Switch to COLO process after finishing loadvm"
>> to be able to enter from COLO code to one specific yield point, added
>> by 25d0c16f625feb3b6.
>>
>> Later in 923709896b1b0
>>   "migration: poll the cm event for destination qemu"
>> we reused this variable to wake the migration incoming coroutine from
>> RDMA code.
>>
>> That was a doubtful idea. Entering coroutines is a very fragile thing:
>> you should be absolutely sure which yield point you are going to enter.
>>
>> I don't know how safe it is to enter during qemu_loadvm_state(),
>> which I think is what RDMA wants to do. But for sure RDMA shouldn't
>> enter the special COLO-related yield-point. Likewise, COLO code doesn't
>> want to enter during qemu_loadvm_state(); it wants to enter its own
>> specific yield-point.
>>
>> Also, when in 8e48ac95865ac97d
>>   "COLO: Add block replication into colo process" we added the
>> bdrv_invalidate_cache_all() call (now called bdrv_activate_all()), it
>> became possible to enter the migration incoming coroutine during
>> that call, which is wrong too.
>>
>> So, let's make these things separate and disjoint: loadvm_co for RDMA,
>> non-NULL during qemu_loadvm_state(), and colo_incoming_co for COLO,
>> non-NULL only around the specific yield.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>> ---
>>   migration/colo.c      | 4 ++--
>>   migration/migration.c | 8 ++++++--
>>   migration/migration.h | 9 ++++++++-
>>   3 files changed, 16 insertions(+), 5 deletions(-)
>
> The idea looks right to me, but I really know almost nothing about
> coroutines or rdma+colo..
>
> Is the other ref in rdma.c (rdma_cm_poll_handler()) still missing?



Oops right.. I was building with rdma disabled. Will fix.

Thanks a lot for reviewing!

--
Best regards,
Vladimir




Re: [PATCH v4 07/10] migration: split migration_incoming_co

2023-05-02 Thread Peter Xu
On Fri, Apr 28, 2023 at 10:49:25PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> Originally, migration_incoming_co was introduced by
> 25d0c16f625feb3b6
>"migration: Switch to COLO process after finishing loadvm"
> to be able to enter from COLO code to one specific yield point, added
> by 25d0c16f625feb3b6.
> 
> Later in 923709896b1b0
>  "migration: poll the cm event for destination qemu"
> we reused this variable to wake the migration incoming coroutine from
> RDMA code.
> 
> That was a doubtful idea. Entering coroutines is a very fragile thing:
> you should be absolutely sure which yield point you are going to enter.
> 
> I don't know how safe it is to enter during qemu_loadvm_state(),
> which I think is what RDMA wants to do. But for sure RDMA shouldn't
> enter the special COLO-related yield-point. Likewise, COLO code doesn't
> want to enter during qemu_loadvm_state(); it wants to enter its own
> specific yield-point.
> 
> Also, when in 8e48ac95865ac97d
>  "COLO: Add block replication into colo process" we added the
> bdrv_invalidate_cache_all() call (now called bdrv_activate_all()), it
> became possible to enter the migration incoming coroutine during
> that call, which is wrong too.
> 
> So, let's make these things separate and disjoint: loadvm_co for RDMA,
> non-NULL during qemu_loadvm_state(), and colo_incoming_co for COLO,
> non-NULL only around the specific yield.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  migration/colo.c      | 4 ++--
>  migration/migration.c | 8 ++++++--
>  migration/migration.h | 9 ++++++++-
>  3 files changed, 16 insertions(+), 5 deletions(-)

The idea looks right to me, but I really know almost nothing about
coroutines or rdma+colo..

Is the other ref in rdma.c (rdma_cm_poll_handler()) still missing?

-- 
Peter Xu
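
For context, here is a minimal sketch of the kind of update the question above points at: an RDMA-side wake-up that goes through the new loadvm_co field instead of the removed migration_incoming_co. Everything in it is a hypothetical stand-in (Coroutine, IncomingState, coroutine_enter(), cm_poll_wake()), not QEMU's real definitions, and the actual body of rdma_cm_poll_handler() is not shown anywhere in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for QEMU's Coroutine; it only counts entries. */
typedef struct Coroutine {
    int entered;
} Coroutine;

/* Hypothetical subset of MigrationIncomingState after the split. */
typedef struct IncomingState {
    Coroutine *loadvm_co;        /* non-NULL only during loadvm */
    Coroutine *colo_incoming_co; /* non-NULL only around the COLO yield */
} IncomingState;

/* Stand-in for qemu_coroutine_enter(): records that the coroutine ran. */
static void coroutine_enter(Coroutine *co)
{
    assert(co != NULL);
    co->entered++;
}

/*
 * Sketch of an RDMA-side wake-up. It looks only at loadvm_co, so it can
 * never land on the COLO-specific yield point. Returns 1 if it woke the
 * coroutine and 0 if there was nothing to wake.
 */
static int cm_poll_wake(IncomingState *mis)
{
    if (mis->loadvm_co) {
        coroutine_enter(mis->loadvm_co);
        return 1;
    }
    return 0;
}
```

With this shape, a wake-up that arrives while only colo_incoming_co is set is simply a no-op, which is the separation the patch is after.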




[PATCH v4 07/10] migration: split migration_incoming_co

2023-04-28 Thread Vladimir Sementsov-Ogievskiy
Originally, migration_incoming_co was introduced by
25d0c16f625feb3b6
   "migration: Switch to COLO process after finishing loadvm"
to be able to enter from COLO code to one specific yield point, added
by 25d0c16f625feb3b6.

Later in 923709896b1b0
 "migration: poll the cm event for destination qemu"
we reused this variable to wake the migration incoming coroutine from
RDMA code.

That was a doubtful idea. Entering coroutines is a very fragile thing:
you should be absolutely sure which yield point you are going to enter.

I don't know how safe it is to enter during qemu_loadvm_state(),
which I think is what RDMA wants to do. But for sure RDMA shouldn't
enter the special COLO-related yield-point. Likewise, COLO code doesn't
want to enter during qemu_loadvm_state(); it wants to enter its own
specific yield-point.

Also, when in 8e48ac95865ac97d
 "COLO: Add block replication into colo process" we added the
bdrv_invalidate_cache_all() call (now called bdrv_activate_all()), it
became possible to enter the migration incoming coroutine during
that call, which is wrong too.

So, let's make these things separate and disjoint: loadvm_co for RDMA,
non-NULL during qemu_loadvm_state(), and colo_incoming_co for COLO,
non-NULL only around the specific yield.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 migration/colo.c      | 4 ++--
 migration/migration.c | 8 ++++++--
 migration/migration.h | 9 ++++++++-
 3 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index 6c7c313956..a688ac553a 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -145,8 +145,8 @@ static void secondary_vm_do_failover(void)
     qemu_sem_post(&mis->colo_incoming_sem);
 
     /* For Secondary VM, jump to incoming co */
-    if (mis->migration_incoming_co) {
-        qemu_coroutine_enter(mis->migration_incoming_co);
+    if (mis->colo_incoming_co) {
+        qemu_coroutine_enter(mis->colo_incoming_co);
     }
 }
 
diff --git a/migration/migration.c b/migration/migration.c
index 8db0892317..23b2d187de 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -505,12 +505,14 @@ process_incoming_migration_co(void *opaque)
     Error *local_err = NULL;
 
     assert(mis->from_src_file);
-    mis->migration_incoming_co = qemu_coroutine_self();
     mis->largest_page_size = qemu_ram_pagesize_largest();
     postcopy_state_set(POSTCOPY_INCOMING_NONE);
     migrate_set_state(&mis->state, MIGRATION_STATUS_NONE,
                       MIGRATION_STATUS_ACTIVE);
+
+    mis->loadvm_co = qemu_coroutine_self();
     ret = qemu_loadvm_state(mis->from_src_file);
+    mis->loadvm_co = NULL;
 
     ps = postcopy_state_get();
     trace_process_incoming_migration_co_end(ret, ps);
@@ -551,7 +553,10 @@ process_incoming_migration_co(void *opaque)
 
         qemu_thread_create(&colo_incoming_thread, "COLO incoming",
              colo_process_incoming_thread, mis, QEMU_THREAD_JOINABLE);
+
+        mis->colo_incoming_co = qemu_coroutine_self();
         qemu_coroutine_yield();
+        mis->colo_incoming_co = NULL;
 
         qemu_mutex_unlock_iothread();
         /* Wait checkpoint incoming thread exit before free resource */
@@ -563,7 +568,6 @@ process_incoming_migration_co(void *opaque)
 
     mis->bh = qemu_bh_new(process_incoming_migration_bh, mis);
     qemu_bh_schedule(mis->bh);
-    mis->migration_incoming_co = NULL;
     return;
 fail:
     local_err = NULL;
diff --git a/migration/migration.h b/migration/migration.h
index 7721c7658b..48a46123a0 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -162,8 +162,15 @@ struct MigrationIncomingState {
 
     int state;
 
+    /*
+     * The incoming migration coroutine, non-NULL during qemu_loadvm_state().
+     * Used to wake the migration incoming coroutine from rdma code. How safe
+     * that is remains an open question.
+     */
+    Coroutine *loadvm_co;
+
     /* The coroutine we should enter (back) after failover */
-    Coroutine *migration_incoming_co;
+    Coroutine *colo_incoming_co;
     QemuSemaphore colo_incoming_sem;
 
     /*
-- 
2.34.1
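
The lifetime rule this patch describes (loadvm_co non-NULL only during qemu_loadvm_state(), colo_incoming_co non-NULL only around the COLO yield) can be sketched in isolation. The snippet below is a simplified model under stated assumptions, not QEMU code: Coroutine, IncomingState, observe() and incoming_phases() are hypothetical names, and the real loadvm and yield steps are replaced by plain observations of which pointer is set.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for QEMU's Coroutine type. */
typedef struct Coroutine {
    int id;
} Coroutine;

/* Hypothetical subset of MigrationIncomingState after the split. */
typedef struct IncomingState {
    Coroutine *loadvm_co;
    Coroutine *colo_incoming_co;
} IncomingState;

/* Bit flags describing which pointers were set when a phase was observed. */
enum { SEEN_LOADVM = 1, SEEN_COLO = 2 };

static int observe(const IncomingState *mis)
{
    return (mis->loadvm_co ? SEEN_LOADVM : 0) |
           (mis->colo_incoming_co ? SEEN_COLO : 0);
}

/*
 * Walks through the phases of the incoming path in order and records in
 * seen[0..3] which pointers were set before loadvm, during loadvm, at the
 * COLO yield, and afterwards.
 */
static void incoming_phases(IncomingState *mis, Coroutine *self, int seen[4])
{
    assert(self != NULL);

    seen[0] = observe(mis);          /* setup, before loadvm */

    mis->loadvm_co = self;
    seen[1] = observe(mis);          /* stands in for qemu_loadvm_state() */
    mis->loadvm_co = NULL;

    mis->colo_incoming_co = self;
    seen[2] = observe(mis);          /* stands in for qemu_coroutine_yield() */
    mis->colo_incoming_co = NULL;

    seen[3] = observe(mis);          /* cleanup, after the COLO yield */
}
```

In this model the two windows never overlap, so a waker that checks one pointer cannot enter the coroutine at the other pointer's yield point.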