Peter Xu <pet...@redhat.com> writes:

> On Wed, Oct 23, 2024 at 04:32:01PM -0300, Fabiano Rosas wrote:
>> Peter Xu <pet...@redhat.com> writes:
>> 
>> > This is a follow up of below patch from Avihai as a replacement:
>> >
>> > https://lore.kernel.org/qemu-devel/20241020130108.27148-3-avih...@nvidia.com/
>> >
>> > This is v2 of the series, and it became a more generic rework on how we do
>> > migration object refcounts, so I skipped a changelog because most of this
>> > is new things.
>> >
>> > To put it simple, now I introduced another pointer to migration object, and
>> > here's a simple explanation for both after all change applied (copy-paste
>> > from one of the patch):
>> >
>> > /*
>> >  * We have two pointers for the global migration objects.  Both of them are
>> >  * initialized early during QEMU starts, but they have different 
>> > lifecycles.
>> 
>> The very next person that needs to access one of those in migration code
>> will get this wrong.
>
> Migration code should never access global_migration except during init /
> shutdown (or a renamed version of it), that's the plan.. so no one should
> use it within migration/.

Right, but consider that it's easy enough for someone to look for a
global object to write in the code, find the wrong one and just use
it. It would then be up to reviewers to catch the mistake.

Look at this:

  bool migration_is_idle(void)
  {
      MigrationState *s = current_migration;
      ...
  }
  
  bool migration_is_active(void)
  {
      MigrationState *s = global_migration;
      ...
  }

Of course we'll get this wrong at some point.

Also, if in the future someone decides to call migration_is_idle() from
outside migration/, we'd be not only adding the burden to change the
variable, but also the functional change at shutdown time.

If we're to carry on with this idea, we'll need to play with some
headers to properly isolate these two usages. Something like:

migration.h:
  bool migration_is_active_internal(MigrationState *s);
  void set_global_obj(MigrationState *obj);

migration.c:
  bool migration_is_active_internal(MigrationState *s)
  {
  }

  void migration_object_init(void)
  {
      set_global_object(MIGRATION_OBJ(object_new(TYPE_MIGRATION)));
  }

exports.h:
  #include migration.h
  bool migration_is_active(void);

exports.c:
  static MigrationState *global_migration;
  
  void set_global_obj(MigrationState *obj)
  {
      global_migration = obj;
  }
  
  bool migration_is_active(void)
  {
      return migration_is_active_internal(global_migration);
  }

That way, the internal code never calls the exported functions with
global, always with current.

>> 
>> >  *
>> >  * - current_migration
>> >  *
>> >  *   This variable reflects the whole lifecycle of the migration object
>> >  *   (which each QEMU can only have one).  It is valid until the migration
>> >  *   object is destroyed.
>> >  *
>> >  *   This is the object that internal migration so far use.  For example,
>> >  *   internal helper migrate_get_current() references it.
>> >  *
>> >  *   When all migration code can always pass over a MigrationState* around,
>> >  *   this variable can logically be dropped.  But we're not yet there.
>> 
>> Won't dropping it just bring us back to the situation before this patch?
>> I'd say we need the opposite, to stop using migrate_get_current()
>> everywhere in the migration code and instead expose the
>> current_migration via an internal header.
>
> I meant dropping all the global access to current_migration within
> migration/ dir.
>
> Consider all functions has the 1st parameter as MigrationState*, for
> example.  Then we don't need that for internal use, while global_migration
> is still needed for external use, but only for external use.
>
>> 
>> >  *
>> >  * - global_migration
>> 
>> Both are global, we can't prefix one of them with global_.
>
> I can rename it to migration_export, but the question is whether the name
> is the issue, or you'd think having two global variables pointing to
> migration object is the issue / concern.
>
> So I think fundaementally we indeed only need one global var there, if we
> can cleanup the migration/ everywhere to always take the pointer from the
> caller, then migration thread will take 1 refcount and that guarantees it's
> available for migration/.

We can discuss when we get to that point, but that might make the code
too cumbersome. Having to change signatures all the time to
include/remove MigrationState.

>
>> 
>> >  *
>> >  *   This is valid only until the migration object is still valid to the
>> >  *   outside-migration world (until migration_shutdown()).
>> >  *
>> >  *   This should normally be always set, cleared or accessed by the main
>> >  *   thread only, rather than the migration thread.
>> >  *
>> >  *   All the exported functions (in include/migration) should reference the
>> >  *   exported migration object only to avoid race conditions, as
>> >  *   current_migration can be freed concurrently by migration thread when
>> >  *   the migration thread holds the last refcount.
>> >  */
>> 
>> Have you thought of locking the migration object instead?
>
> Yes, logically we could use RCU if we want, then take BQL for example only
> if update.  but I worry that's an overkill: we'll need too many rcu read
> lock all over the places..
>
>> 
>> Having two global pointers to the same object with one having slightly
>> different lifecycle than the other will certainly lead to misuse.
>
> That's indeed tricky, it's just that this is so far the easiest change I
> can think of.
>
>> 
>> I worry about mixing the usage of both globals due to some migration
>> code that needs to call these exported functions or external code
>> reaching into migration/ through some indirect path. Check global and
>> try to use current sort of scenarios (and vice-versa).
>
> Yeh I get your concern.  I actually tried to observe such usages (for now,
> especially when migration/ uses the misc.h exported functions) and they're
> all safe.  I should have mentioned that.

I'm afraid that's not enough, code changes after all. It's not possible
to carry these requirements to the future, we must account for the gaps
now.

>
> For the other way round, I don't yet expect that to happen: the plan is
> anything outside must call a function in include/migration/* and that must
> only use global_migration.
>
>> 
>> Similarly, what about when a lingering reference still exists, but
>> global_migration is already clear? Any migration code looking at
>> global_migration would do the wrong thing.
>
> That's how it works: migration thread will take one refcount and that'll
> happen when migration is running but when VM shutdowns itself.  I checked
> that all the migration code should be fine when using the exported
> functions even if they should reference current_migration.
>
> So logically if with such design, indeed internal migration/ code shouldn't
> reference global_migration at all just to be always safe.
>
> Any idea in your mind that can make this easier?  I'm definitely open to
> suggestions.
>
>> 
>> >
>> > It allows all misc.h exported helpers to be used for the whole VM
>> > lifecycle, so as to never crash QEMU with freed migration objects.
>> 
>> Isn't there a race with the last reference being dropped at
>> migration_shutdown() and global_migration still being set?
>
> It needs to be protected by BQL in this case, so anyone using exported
> functions need to take BQL first.

Even code under migration/ ? All the various threads? The BQL is a
heavyweight lock. We shouldn't be adding more instances of it unless it
protects against something from the main loop/other subsystems.

>
>> 
>> >
>> > Thanks,
>> >
>> > Peter Xu (4):
>> >   migration: Unexport dirty_bitmap_mig_init() in misc.h
>> >   migration: Reset current_migration properly
>> >   migration: Add global_migration
>> >   migration: Make all helpers in misc.h safe to use without migration
>> >
>> >  include/migration/misc.h | 29 ++++++++----
>> >  migration/migration.h    |  4 ++
>> >  migration/migration.c    | 99 +++++++++++++++++++++++++++++++++++-----
>> >  3 files changed, 113 insertions(+), 19 deletions(-)
>> 

Reply via email to