Re: [PATCH v2 1/2] migration: rework migrate_set_error() to migrate_report_err()

2024-04-25 Thread Vladimir Sementsov-Ogievskiy

On 25.04.24 00:52, Peter Xu wrote:

On Wed, Apr 24, 2024 at 10:50:14PM +0300, Vladimir Sementsov-Ogievskiy wrote:

On 24.04.24 22:40, Peter Xu wrote:

On Wed, Apr 24, 2024 at 08:42:44PM +0300, Vladimir Sementsov-Ogievskiy wrote:

1. Most of callers want to free the error after call. Let's help them.

2. Some callers log the error, some not. We also have places where we
 log the stored error. Let's instead simply log every migration
 error.

3. Some callers have to retrieve current migration state only to pass
 it to migrate_set_error(). In the new helper let's get the state
 automatically.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
   migration/migration.c| 48 
   migration/migration.h|  2 +-
   migration/multifd.c  | 18 ++-
   migration/postcopy-ram.c |  3 +--
   migration/savevm.c   | 16 +-
   5 files changed, 28 insertions(+), 59 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 696762bc64..806b7b080b 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -285,7 +285,7 @@ void migration_bh_schedule(QEMUBHFunc *cb, void *opaque)
   void migration_cancel(const Error *error)
   {
   if (error) {
-migrate_set_error(current_migration, error);
+migrate_report_err(error_copy(error));
   }
   if (migrate_dirty_limit()) {
   qmp_cancel_vcpu_dirty_limit(false, -1, NULL);
@@ -779,13 +779,6 @@ process_incoming_migration_co(void *opaque)
   }
   if (ret < 0) {
-MigrationState *s = migrate_get_current();
-
-if (migrate_has_error(s)) {
-WITH_QEMU_LOCK_GUARD(>error_mutex) {
-error_report_err(s->error);
-}
-}
   error_report("load of migration failed: %s", strerror(-ret));
   goto fail;
   }
@@ -1402,10 +1395,6 @@ static void migrate_fd_cleanup(MigrationState *s)
 MIGRATION_STATUS_CANCELLED);
   }
-if (s->error) {
-/* It is used on info migrate.  We can't free it */
-error_report_err(error_copy(s->error));
-}
   type = migration_has_failed(s) ? MIG_EVENT_PRECOPY_FAILED :
MIG_EVENT_PRECOPY_DONE;
   migration_call_notifiers(s, type, NULL);
@@ -1418,12 +1407,14 @@ static void migrate_fd_cleanup_bh(void *opaque)
   migrate_fd_cleanup(opaque);
   }
-void migrate_set_error(MigrationState *s, const Error *error)
+void migrate_report_err(Error *error)
   {
+MigrationState *s = migrate_get_current();


Avoid passing in *s looks ok.


   QEMU_LOCK_GUARD(>error_mutex);
   if (!s->error) {
   s->error = error_copy(error);


I think I get your point, but then looks like this error_copy() should be
removed but forgotten?

I remember I had an attempt to do similarly (but only transfer the
ownership):

https://lore.kernel.org/qemu-devel/20230829214235.69309-3-pet...@redhat.com/

However I gave up later and I forgot why.  One reason could be that I hit a
use-after-free, then found that well indeed leaking an Error is much better
than debugging a UAF.


Hmm, yes I saw a leaked Error somewhere, and this patch should fix it. But may 
be I missed introducing this use-after-free again)


Ah do you mean you fixed a bug?  If so please use a standalone patch for
that and we'll need to copy stable.



Hmm, this hunk:

@@ -1287,7 +1283,7 @@ static void multifd_recv_terminate_threads(Error *err)
 
 if (err) {

 MigrationState *s = migrate_get_current();
-migrate_set_error(s, err);
+migrate_report_err(error_copy(err));
 if (s->state == MIGRATION_STATUS_SETUP ||
 s->state == MIGRATION_STATUS_ACTIVE) {
 migrate_set_state(>state, s->state,

No, seems I don't fix bug, but exactly introduce use-after-free :). Will fix.


I did notice that in this series on patch looks like does more than one
thing.  It'll be helpful too for reviewers when patch can be split
properly.


Agree, OK







So maybe we simply keep it like before?  If you like such change, let's
just be extremely caucious.


   }
+error_report_err(error);


The ideal case to me is we don't trigger an error_report() all over the
place.  We're slightly going backward from that regard, IMHO..

Ideally I think we shouldn't dump anything to stderr, but user should
always query from qmp.



Sounds reasonable to me. I'm just unsure, if keep it like before, how
should I correctly update logging to stderr in
process_incoming_migration_co(). Could we just drop error reporting to
stderr? Or should it be kept as is for the case when exit_on_error is
true?


I'm not sure I get the question, but I'll comment in patch 2 very soon, so
we can move the discussion there.

Thanks,



--
Best regards,
Vladimir




Re: [PATCH v2 1/2] migration: rework migrate_set_error() to migrate_report_err()

2024-04-24 Thread Peter Xu
On Wed, Apr 24, 2024 at 10:50:14PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 24.04.24 22:40, Peter Xu wrote:
> > On Wed, Apr 24, 2024 at 08:42:44PM +0300, Vladimir Sementsov-Ogievskiy 
> > wrote:
> > > 1. Most of callers want to free the error after call. Let's help them.
> > > 
> > > 2. Some callers log the error, some not. We also have places where we
> > > log the stored error. Let's instead simply log every migration
> > > error.
> > > 
> > > 3. Some callers have to retrieve current migration state only to pass
> > > it to migrate_set_error(). In the new helper let's get the state
> > > automatically.
> > > 
> > > Signed-off-by: Vladimir Sementsov-Ogievskiy 
> > > ---
> > >   migration/migration.c| 48 
> > >   migration/migration.h|  2 +-
> > >   migration/multifd.c  | 18 ++-
> > >   migration/postcopy-ram.c |  3 +--
> > >   migration/savevm.c   | 16 +-
> > >   5 files changed, 28 insertions(+), 59 deletions(-)
> > > 
> > > diff --git a/migration/migration.c b/migration/migration.c
> > > index 696762bc64..806b7b080b 100644
> > > --- a/migration/migration.c
> > > +++ b/migration/migration.c
> > > @@ -285,7 +285,7 @@ void migration_bh_schedule(QEMUBHFunc *cb, void 
> > > *opaque)
> > >   void migration_cancel(const Error *error)
> > >   {
> > >   if (error) {
> > > -migrate_set_error(current_migration, error);
> > > +migrate_report_err(error_copy(error));
> > >   }
> > >   if (migrate_dirty_limit()) {
> > >   qmp_cancel_vcpu_dirty_limit(false, -1, NULL);
> > > @@ -779,13 +779,6 @@ process_incoming_migration_co(void *opaque)
> > >   }
> > >   if (ret < 0) {
> > > -MigrationState *s = migrate_get_current();
> > > -
> > > -if (migrate_has_error(s)) {
> > > -WITH_QEMU_LOCK_GUARD(>error_mutex) {
> > > -error_report_err(s->error);
> > > -}
> > > -}
> > >   error_report("load of migration failed: %s", strerror(-ret));
> > >   goto fail;
> > >   }
> > > @@ -1402,10 +1395,6 @@ static void migrate_fd_cleanup(MigrationState *s)
> > > MIGRATION_STATUS_CANCELLED);
> > >   }
> > > -if (s->error) {
> > > -/* It is used on info migrate.  We can't free it */
> > > -error_report_err(error_copy(s->error));
> > > -}
> > >   type = migration_has_failed(s) ? MIG_EVENT_PRECOPY_FAILED :
> > >MIG_EVENT_PRECOPY_DONE;
> > >   migration_call_notifiers(s, type, NULL);
> > > @@ -1418,12 +1407,14 @@ static void migrate_fd_cleanup_bh(void *opaque)
> > >   migrate_fd_cleanup(opaque);
> > >   }
> > > -void migrate_set_error(MigrationState *s, const Error *error)
> > > +void migrate_report_err(Error *error)
> > >   {
> > > +MigrationState *s = migrate_get_current();
> > 
> > Avoid passing in *s looks ok.
> > 
> > >   QEMU_LOCK_GUARD(>error_mutex);
> > >   if (!s->error) {
> > >   s->error = error_copy(error);
> > 
> > I think I get your point, but then looks like this error_copy() should be
> > removed but forgotten?
> > 
> > I remember I had an attempt to do similarly (but only transfer the
> > ownership):
> > 
> > https://lore.kernel.org/qemu-devel/20230829214235.69309-3-pet...@redhat.com/
> > 
> > However I gave up later and I forgot why.  One reason could be that I hit a
> > use-after-free, then found that well indeed leaking an Error is much better
> > than debugging a UAF.
> 
> Hmm, yes I saw a leaked Error somewhere, and this patch should fix it. But 
> may be I missed introducing this use-after-free again)

Ah do you mean you fixed a bug?  If so please use a standalone patch for
that and we'll need to copy stable.

I did notice that in this series on patch looks like does more than one
thing.  It'll be helpful too for reviewers when patch can be split
properly.

> 
> > 
> > So maybe we simply keep it like before?  If you like such change, let's
> > just be extremely caucious.
> > 
> > >   }
> > > +error_report_err(error);
> > 
> > The ideal case to me is we don't trigger an error_report() all over the
> > place.  We're slightly going backward from that regard, IMHO..
> > 
> > Ideally I think we shouldn't dump anything to stderr, but user should
> > always query from qmp.
> > 
> 
> Sounds reasonable to me. I'm just unsure, if keep it like before, how
> should I correctly update logging to stderr in
> process_incoming_migration_co(). Could we just drop error reporting to
> stderr? Or should it be kept as is for the case when exit_on_error is
> true?

I'm not sure I get the question, but I'll comment in patch 2 very soon, so
we can move the discussion there.

Thanks,

-- 
Peter Xu




Re: [PATCH v2 1/2] migration: rework migrate_set_error() to migrate_report_err()

2024-04-24 Thread Vladimir Sementsov-Ogievskiy

On 24.04.24 22:40, Peter Xu wrote:

On Wed, Apr 24, 2024 at 08:42:44PM +0300, Vladimir Sementsov-Ogievskiy wrote:

1. Most of callers want to free the error after call. Let's help them.

2. Some callers log the error, some not. We also have places where we
log the stored error. Let's instead simply log every migration
error.

3. Some callers have to retrieve current migration state only to pass
it to migrate_set_error(). In the new helper let's get the state
automatically.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  migration/migration.c| 48 
  migration/migration.h|  2 +-
  migration/multifd.c  | 18 ++-
  migration/postcopy-ram.c |  3 +--
  migration/savevm.c   | 16 +-
  5 files changed, 28 insertions(+), 59 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 696762bc64..806b7b080b 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -285,7 +285,7 @@ void migration_bh_schedule(QEMUBHFunc *cb, void *opaque)
  void migration_cancel(const Error *error)
  {
  if (error) {
-migrate_set_error(current_migration, error);
+migrate_report_err(error_copy(error));
  }
  if (migrate_dirty_limit()) {
  qmp_cancel_vcpu_dirty_limit(false, -1, NULL);
@@ -779,13 +779,6 @@ process_incoming_migration_co(void *opaque)
  }
  
  if (ret < 0) {

-MigrationState *s = migrate_get_current();
-
-if (migrate_has_error(s)) {
-WITH_QEMU_LOCK_GUARD(>error_mutex) {
-error_report_err(s->error);
-}
-}
  error_report("load of migration failed: %s", strerror(-ret));
  goto fail;
  }
@@ -1402,10 +1395,6 @@ static void migrate_fd_cleanup(MigrationState *s)
MIGRATION_STATUS_CANCELLED);
  }
  
-if (s->error) {

-/* It is used on info migrate.  We can't free it */
-error_report_err(error_copy(s->error));
-}
  type = migration_has_failed(s) ? MIG_EVENT_PRECOPY_FAILED :
   MIG_EVENT_PRECOPY_DONE;
  migration_call_notifiers(s, type, NULL);
@@ -1418,12 +1407,14 @@ static void migrate_fd_cleanup_bh(void *opaque)
  migrate_fd_cleanup(opaque);
  }
  
-void migrate_set_error(MigrationState *s, const Error *error)

+void migrate_report_err(Error *error)
  {
+MigrationState *s = migrate_get_current();


Avoid passing in *s looks ok.


  QEMU_LOCK_GUARD(>error_mutex);
  if (!s->error) {
  s->error = error_copy(error);


I think I get your point, but then looks like this error_copy() should be
removed but forgotten?

I remember I had an attempt to do similarly (but only transfer the
ownership):

https://lore.kernel.org/qemu-devel/20230829214235.69309-3-pet...@redhat.com/

However I gave up later and I forgot why.  One reason could be that I hit a
use-after-free, then found that well indeed leaking an Error is much better
than debugging a UAF.


Hmm, yes I saw a leaked Error somewhere, and this patch should fix it. But may 
be I missed introducing this use-after-free again)



So maybe we simply keep it like before?  If you like such change, let's
just be extremely caucious.


  }
+error_report_err(error);


The ideal case to me is we don't trigger an error_report() all over the
place.  We're slightly going backward from that regard, IMHO..

Ideally I think we shouldn't dump anything to stderr, but user should
always query from qmp.



Sounds reasonable to me. I'm just unsure, if keep it like before, how should I 
correctly update logging to stderr in process_incoming_migration_co(). Could we 
just drop error reporting to stderr? Or should it be kept as is for the case 
when exit_on_error is true?

--
Best regards,
Vladimir




Re: [PATCH v2 1/2] migration: rework migrate_set_error() to migrate_report_err()

2024-04-24 Thread Peter Xu
On Wed, Apr 24, 2024 at 08:42:44PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> 1. Most of callers want to free the error after call. Let's help them.
> 
> 2. Some callers log the error, some not. We also have places where we
>log the stored error. Let's instead simply log every migration
>error.
> 
> 3. Some callers have to retrieve current migration state only to pass
>it to migrate_set_error(). In the new helper let's get the state
>automatically.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  migration/migration.c| 48 
>  migration/migration.h|  2 +-
>  migration/multifd.c  | 18 ++-
>  migration/postcopy-ram.c |  3 +--
>  migration/savevm.c   | 16 +-
>  5 files changed, 28 insertions(+), 59 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 696762bc64..806b7b080b 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -285,7 +285,7 @@ void migration_bh_schedule(QEMUBHFunc *cb, void *opaque)
>  void migration_cancel(const Error *error)
>  {
>  if (error) {
> -migrate_set_error(current_migration, error);
> +migrate_report_err(error_copy(error));
>  }
>  if (migrate_dirty_limit()) {
>  qmp_cancel_vcpu_dirty_limit(false, -1, NULL);
> @@ -779,13 +779,6 @@ process_incoming_migration_co(void *opaque)
>  }
>  
>  if (ret < 0) {
> -MigrationState *s = migrate_get_current();
> -
> -if (migrate_has_error(s)) {
> -WITH_QEMU_LOCK_GUARD(>error_mutex) {
> -error_report_err(s->error);
> -}
> -}
>  error_report("load of migration failed: %s", strerror(-ret));
>  goto fail;
>  }
> @@ -1402,10 +1395,6 @@ static void migrate_fd_cleanup(MigrationState *s)
>MIGRATION_STATUS_CANCELLED);
>  }
>  
> -if (s->error) {
> -/* It is used on info migrate.  We can't free it */
> -error_report_err(error_copy(s->error));
> -}
>  type = migration_has_failed(s) ? MIG_EVENT_PRECOPY_FAILED :
>   MIG_EVENT_PRECOPY_DONE;
>  migration_call_notifiers(s, type, NULL);
> @@ -1418,12 +1407,14 @@ static void migrate_fd_cleanup_bh(void *opaque)
>  migrate_fd_cleanup(opaque);
>  }
>  
> -void migrate_set_error(MigrationState *s, const Error *error)
> +void migrate_report_err(Error *error)
>  {
> +MigrationState *s = migrate_get_current();

Avoid passing in *s looks ok.

>  QEMU_LOCK_GUARD(>error_mutex);
>  if (!s->error) {
>  s->error = error_copy(error);

I think I get your point, but then looks like this error_copy() should be
removed but forgotten?

I remember I had an attempt to do similarly (but only transfer the
ownership):

https://lore.kernel.org/qemu-devel/20230829214235.69309-3-pet...@redhat.com/

However I gave up later and I forgot why.  One reason could be that I hit a
use-after-free, then found that well indeed leaking an Error is much better
than debugging a UAF.

So maybe we simply keep it like before?  If you like such change, let's
just be extremely caucious.

>  }
> +error_report_err(error);

The ideal case to me is we don't trigger an error_report() all over the
place.  We're slightly going backward from that regard, IMHO..

Ideally I think we shouldn't dump anything to stderr, but user should
always query from qmp.

Thanks,

-- 
Peter Xu