On Wed, Oct 30, 2024 at 01:51:54PM -0400, Peter Xu wrote:
> On Wed, Oct 30, 2024 at 04:13:57PM +0000, Daniel P. Berrangé wrote:
> > On Wed, Oct 30, 2024 at 09:13:13AM -0400, Peter Xu wrote:
> > > On Wed, Oct 30, 2024 at 09:48:07AM +0000, Daniel P. Berrangé wrote:
> > > > On Tue, Oct 29, 2024 at 05:16:00PM -0400, Peter Xu wrote:
> > > > > v1: 
> > > > > https://lore.kernel.org/r/20241024165627.1372621-1-pet...@redhat.com
> > > > 
> > > > > Meanwhile, migration has a long standing issue on current_migration
> > > > > pointer, where it can point to freed data after the migration object 
> > > > > is
> > > > > finalized.  It is debatable that the pointer can be cleared after the 
> > > > > main
> > > > > thread (1) join() the migration thread first, then (2) release the 
> > > > > last
> > > > > refcount for the migration object and clear the pointer.  However 
> > > > > there's
> > > > > still major challenges [1].  With singleton, we could have a slightly 
> > > > > but
> > > > > hopefully working workaround to clear the pointer during finalize().
> > > > 
> > > > I'm still not entirely convinced that this singleton proposal is
> > > > fixing the migration problem correctly.
> > > > 
> > > > Based on discussions in v1, IIUC, the situation is that we have
> > > > migration_shutdown() being called from qemu_cleanup(). The former
> > > > will call object_unref(current_migration), but there may still
> > > > be background migration threads running that access 'current_migration',
> > > > and thus a potential use-after-free.
> > > 
> > > migration thread is fine, it takes a refcount at the entry.
> > > 
> > > And btw, taking it at the entry is racy, we've just fixed it, see (in my
> > > next migration pull):
> > > 
> > > https://lore.kernel.org/qemu-devel/20241024213056.1395400-2-pet...@redhat.com/
> > 
> > Yep, acquiring the refcount immediately before thread-create
> > is what I meant.
> > 
> > > The access reported was, IIUC, outside migration code, but after both
> > > main/migration threads released the refcount, hence after finalize().  It
> > > could be a random migration_is_running() call very late in device code, 
> > > for
> > > example.
> > 
> > 
> > 
> > > 
> > > > 
> > > > Based on what the 7th patch here does, the key difference is that
> > > > the finalize() method for MigrationState will set 'current_migration'
> > > > to NULL after free'ing it.
> > > 
> > > Yes.  But this show case series isn't complete.  We need a migration-side
> > > lock finally to make it safe to access.  For that, see:
> > > 
> > > https://lore.kernel.org/qemu-devel/20241024213056.1395400-9-pet...@redhat.com/
> > > 
> > > > 
> > > > I don't believe that is safe.
> > > 
> > > I hope after the other series applied it will be 100% safe, even though I
> > > agree it's tricky.  But hopefully QOM is very clean, the trickly part is
> > > still within migration, and it should be less tricky than migration
> > > implement a refcount on top of Object..
> > 
> > Ok, so with the other series applied, this does look safe, but
> > it also doesn't seem to really have any dependancy on the
> > single interface code.  Patch 7 here looks sufficient, in combo
> > with the other 2 series to avoid the use-after-free flaws.
> 
> Patch 7, when applied without patch 6 and prior, will crash in
> device-introspect-test, trying to create yet another migration object when
> processing the "device-list-properties" QMP command.  And it turns out
> that's also not the only way QEMU can crash by that.
> 
> Fundamentally it's because patch 7 has global operations within
> init()/finalize() to fix the migration dangling pointer, hence it must not
> be instanciated more than once.

That's a result from moving the "assert()" into the constructor.
The assert(!current_migration) can be kept in migration_object_init,
the constructor could conditionally set current_migration only if it
is NULL, and the finalizer could conditionally clear current_migration
only if it matches the current object. There's no conceptual dependancy
on having a singleton interface in the patch.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


Reply via email to