On Wed, Oct 30, 2024 at 01:51:54PM -0400, Peter Xu wrote: > On Wed, Oct 30, 2024 at 04:13:57PM +0000, Daniel P. Berrangé wrote: > > On Wed, Oct 30, 2024 at 09:13:13AM -0400, Peter Xu wrote: > > > On Wed, Oct 30, 2024 at 09:48:07AM +0000, Daniel P. Berrangé wrote: > > > > On Tue, Oct 29, 2024 at 05:16:00PM -0400, Peter Xu wrote: > > > > > v1: > > > > > https://lore.kernel.org/r/20241024165627.1372621-1-pet...@redhat.com > > > > > > > > > Meanwhile, migration has a long standing issue on current_migration > > > > > pointer, where it can point to freed data after the migration object > > > > > is > > > > > finalized. It is debatable that the pointer can be cleared after the > > > > > main > > > > > thread (1) join() the migration thread first, then (2) release the > > > > > last > > > > > refcount for the migration object and clear the pointer. However > > > > > there's > > > > > still major challenges [1]. With singleton, we could have a slightly > > > > > but > > > > > hopefully working workaround to clear the pointer during finalize(). > > > > > > > > I'm still not entirely convinced that this singleton proposal is > > > > fixing the migration problem correctly. > > > > > > > > Based on discussions in v1, IIUC, the situation is that we have > > > > migration_shutdown() being called from qemu_cleanup(). The former > > > > will call object_unref(current_migration), but there may still > > > > be background migration threads running that access 'current_migration', > > > > and thus a potential use-after-free. > > > > > > migration thread is fine, it takes a refcount at the entry. > > > > > > And btw, taking it at the entry is racy, we've just fixed it, see (in my > > > next migration pull): > > > > > > https://lore.kernel.org/qemu-devel/20241024213056.1395400-2-pet...@redhat.com/ > > > > Yep, acquiring the refcount immediately before thread-create > > is what I meant. > > > > > The access reported was, IIUC, outside migration code, but after both > > > main/migration threads released the refcount, hence after finalize(). It > > > could be a random migration_is_running() call very late in device code, > > > for > > > example. > > > > > > > > > > > > > > > > > Based on what the 7th patch here does, the key difference is that > > > > the finalize() method for MigrationState will set 'current_migration' > > > > to NULL after free'ing it. > > > > > > Yes. But this show case series isn't complete. We need a migration-side > > > lock finally to make it safe to access. For that, see: > > > > > > https://lore.kernel.org/qemu-devel/20241024213056.1395400-9-pet...@redhat.com/ > > > > > > > > > > > I don't believe that is safe. > > > > > > I hope after the other series applied it will be 100% safe, even though I > > > agree it's tricky. But hopefully QOM is very clean, the trickly part is > > > still within migration, and it should be less tricky than migration > > > implement a refcount on top of Object.. > > > > Ok, so with the other series applied, this does look safe, but > > it also doesn't seem to really have any dependancy on the > > single interface code. Patch 7 here looks sufficient, in combo > > with the other 2 series to avoid the use-after-free flaws. > > Patch 7, when applied without patch 6 and prior, will crash in > device-introspect-test, trying to create yet another migration object when > processing the "device-list-properties" QMP command. And it turns out > that's also not the only way QEMU can crash by that. > > Fundamentally it's because patch 7 has global operations within > init()/finalize() to fix the migration dangling pointer, hence it must not > be instanciated more than once.
That's a result from moving the "assert()" into the constructor. The assert(!current_migration) can be kept in migration_object_init, the constructor could conditionally set current_migration only if it is NULL, and the finalizer could conditionally clear current_migration only if it matches the current object. There's no conceptual dependancy on having a singleton interface in the patch. With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|