On Wed, Oct 30, 2024 at 04:13:57PM +0000, Daniel P. Berrangé wrote: > On Wed, Oct 30, 2024 at 09:13:13AM -0400, Peter Xu wrote: > > On Wed, Oct 30, 2024 at 09:48:07AM +0000, Daniel P. Berrangé wrote: > > > On Tue, Oct 29, 2024 at 05:16:00PM -0400, Peter Xu wrote: > > > > v1: https://lore.kernel.org/r/20241024165627.1372621-1-pet...@redhat.com > > > > > > > Meanwhile, migration has a long standing issue on current_migration > > > > pointer, where it can point to freed data after the migration object is > > > > finalized. It is debatable that the pointer can be cleared after the > > > > main > > > > thread (1) join() the migration thread first, then (2) release the last > > > > refcount for the migration object and clear the pointer. However > > > > there's > > > > still major challenges [1]. With singleton, we could have a slightly > > > > but > > > > hopefully working workaround to clear the pointer during finalize(). > > > > > > I'm still not entirely convinced that this singleton proposal is > > > fixing the migration problem correctly. > > > > > > Based on discussions in v1, IIUC, the situation is that we have > > > migration_shutdown() being called from qemu_cleanup(). The former > > > will call object_unref(current_migration), but there may still > > > be background migration threads running that access 'current_migration', > > > and thus a potential use-after-free. > > > > migration thread is fine, it takes a refcount at the entry. > > > > And btw, taking it at the entry is racy, we've just fixed it, see (in my > > next migration pull): > > > > https://lore.kernel.org/qemu-devel/20241024213056.1395400-2-pet...@redhat.com/ > > Yep, acquiring the refcount immediately before thread-create > is what I meant. > > > The access reported was, IIUC, outside migration code, but after both > > main/migration threads released the refcount, hence after finalize(). It > > could be a random migration_is_running() call very late in device code, for > > example. > > > > > > > > > > > Based on what the 7th patch here does, the key difference is that > > > the finalize() method for MigrationState will set 'current_migration' > > > to NULL after free'ing it. > > > > Yes. But this show case series isn't complete. We need a migration-side > > lock finally to make it safe to access. For that, see: > > > > https://lore.kernel.org/qemu-devel/20241024213056.1395400-9-pet...@redhat.com/ > > > > > > > > I don't believe that is safe. > > > > I hope after the other series applied it will be 100% safe, even though I > > agree it's tricky. But hopefully QOM is very clean, the trickly part is > > still within migration, and it should be less tricky than migration > > implement a refcount on top of Object.. > > Ok, so with the other series applied, this does look safe, but > it also doesn't seem to really have any dependancy on the > single interface code. Patch 7 here looks sufficient, in combo > with the other 2 series to avoid the use-after-free flaws.
Patch 7, when applied without patch 6 and prior, will crash in device-introspect-test, trying to create yet another migration object when processing the "device-list-properties" QMP command. And it turns out that's also not the only way QEMU can crash by that. Fundamentally it's because patch 7 has global operations within init()/finalize() to fix the migration dangling pointer, hence it must not be instanciated more than once. It's also probably because I always think singleton can be useful in general to QEMU's device model where can be special devices all over the places that I'm not aware of. I didn't work on a lot of QEMU devices, but with that limited experience I still stumbled upon two devices (if taking migration object as one..) that might benefit from it. That leads to this whole series, which is also the cleanest so far I can think of to solve the immediate migration UAF. Thanks, > > > I think the question is whether things like migration_is_running() is > > allowed to be used anywhere, even after migration_shutdown(). My answer > > is, it should be ok to be used anywhere, and we don't necessarilly need to > > limit that. In that case the caller doesn't need to take a refcount > > because it's an immediate query. It can simply check its existance with > > the lock (after my patch 8 of the other series applied, which depends on > > this qom series). > > Agree, and from a practical POV, I think it would be impossible to > require a ref count be held from other non-migration threads, so the > locking around 'current_migration' looks like the only practical > option. > > With regards, > Daniel > -- > |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| > |: https://libvirt.org -o- https://fstop138.berrange.com :| > |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :| > -- Peter Xu