On Wed, Oct 30, 2024 at 04:13:57PM +0000, Daniel P. Berrangé wrote:
> On Wed, Oct 30, 2024 at 09:13:13AM -0400, Peter Xu wrote:
> > On Wed, Oct 30, 2024 at 09:48:07AM +0000, Daniel P. Berrangé wrote:
> > > On Tue, Oct 29, 2024 at 05:16:00PM -0400, Peter Xu wrote:
> > > > v1: https://lore.kernel.org/r/20241024165627.1372621-1-pet...@redhat.com
> > > 
> > > > Meanwhile, migration has a long standing issue on current_migration
> > > > pointer, where it can point to freed data after the migration object is
> > > > finalized.  It is debatable that the pointer can be cleared after the 
> > > > main
> > > > thread (1) join() the migration thread first, then (2) release the last
> > > > refcount for the migration object and clear the pointer.  However 
> > > > there's
> > > > still major challenges [1].  With singleton, we could have a slightly 
> > > > but
> > > > hopefully working workaround to clear the pointer during finalize().
> > > 
> > > I'm still not entirely convinced that this singleton proposal is
> > > fixing the migration problem correctly.
> > > 
> > > Based on discussions in v1, IIUC, the situation is that we have
> > > migration_shutdown() being called from qemu_cleanup(). The former
> > > will call object_unref(current_migration), but there may still
> > > be background migration threads running that access 'current_migration',
> > > and thus a potential use-after-free.
> > 
> > migration thread is fine, it takes a refcount at the entry.
> > 
> > And btw, taking it at the entry is racy, we've just fixed it, see (in my
> > next migration pull):
> > 
> > https://lore.kernel.org/qemu-devel/20241024213056.1395400-2-pet...@redhat.com/
> 
> Yep, acquiring the refcount immediately before thread-create
> is what I meant.
> 
> > The access reported was, IIUC, outside migration code, but after both
> > main/migration threads released the refcount, hence after finalize().  It
> > could be a random migration_is_running() call very late in device code, for
> > example.
> 
> 
> 
> > 
> > > 
> > > Based on what the 7th patch here does, the key difference is that
> > > the finalize() method for MigrationState will set 'current_migration'
> > > to NULL after free'ing it.
> > 
> > Yes.  But this show case series isn't complete.  We need a migration-side
> > lock finally to make it safe to access.  For that, see:
> > 
> > https://lore.kernel.org/qemu-devel/20241024213056.1395400-9-pet...@redhat.com/
> > 
> > > 
> > > I don't believe that is safe.
> > 
> > I hope after the other series applied it will be 100% safe, even though I
> > agree it's tricky.  But hopefully QOM is very clean, the trickly part is
> > still within migration, and it should be less tricky than migration
> > implement a refcount on top of Object..
> 
> Ok, so with the other series applied, this does look safe, but
> it also doesn't seem to really have any dependancy on the
> single interface code.  Patch 7 here looks sufficient, in combo
> with the other 2 series to avoid the use-after-free flaws.

Patch 7, when applied without patch 6 and prior, will crash in
device-introspect-test, trying to create yet another migration object when
processing the "device-list-properties" QMP command.  And it turns out
that's also not the only way QEMU can crash by that.

Fundamentally it's because patch 7 has global operations within
init()/finalize() to fix the migration dangling pointer, hence it must not
be instanciated more than once.

It's also probably because I always think singleton can be useful in
general to QEMU's device model where can be special devices all over the
places that I'm not aware of.  I didn't work on a lot of QEMU devices, but
with that limited experience I still stumbled upon two devices (if taking
migration object as one..) that might benefit from it.

That leads to this whole series, which is also the cleanest so far I can
think of to solve the immediate migration UAF.

Thanks,

> 
> > I think the question is whether things like migration_is_running() is
> > allowed to be used anywhere, even after migration_shutdown().  My answer
> > is, it should be ok to be used anywhere, and we don't necessarilly need to
> > limit that.  In that case the caller doesn't need to take a refcount
> > because it's an immediate query.  It can simply check its existance with
> > the lock (after my patch 8 of the other series applied, which depends on
> > this qom series).
> 
> Agree, and from a practical POV, I think it would be impossible to
> require a ref count be held from other non-migration threads, so the
> locking around 'current_migration' looks like the only practical
> option.
> 
> With regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
> 

-- 
Peter Xu


Reply via email to