On Wed, Oct 30, 2024 at 09:48:07AM +0000, Daniel P. Berrangé wrote: > On Tue, Oct 29, 2024 at 05:16:00PM -0400, Peter Xu wrote: > > v1: https://lore.kernel.org/r/20241024165627.1372621-1-pet...@redhat.com > > > Meanwhile, migration has a long standing issue on current_migration > > pointer, where it can point to freed data after the migration object is > > finalized. It is debatable that the pointer can be cleared after the main > > thread (1) join() the migration thread first, then (2) release the last > > refcount for the migration object and clear the pointer. However there's > > still major challenges [1]. With singleton, we could have a slightly but > > hopefully working workaround to clear the pointer during finalize(). > > I'm still not entirely convinced that this singleton proposal is > fixing the migration problem correctly. > > Based on discussions in v1, IIUC, the situation is that we have > migration_shutdown() being called from qemu_cleanup(). The former > will call object_unref(current_migration), but there may still > be background migration threads running that access 'current_migration', > and thus a potential use-after-free.
migration thread is fine, it takes a refcount at the entry. And btw, taking it at the entry is racy, we've just fixed it, see (in my next migration pull): https://lore.kernel.org/qemu-devel/20241024213056.1395400-2-pet...@redhat.com/ The access reported was, IIUC, outside migration code, but after both main/migration threads released the refcount, hence after finalize(). It could be a random migration_is_running() call very late in device code, for example. > > Based on what the 7th patch here does, the key difference is that > the finalize() method for MigrationState will set 'current_migration' > to NULL after free'ing it. Yes. But this show case series isn't complete. We need a migration-side lock finally to make it safe to access. For that, see: https://lore.kernel.org/qemu-devel/20241024213056.1395400-9-pet...@redhat.com/ > > I don't believe that is safe. I hope after the other series applied it will be 100% safe, even though I agree it's tricky. But hopefully QOM is very clean, the trickly part is still within migration, and it should be less tricky than migration implement a refcount on top of Object.. > > Back to the current code, if there is a use-after-free today, that > implies that the background threads are *not* holding their own > reference on 'current_migration', allowing the object to be free'd > while they're still using it. If they held their own reference then > the object_unref in migration_shutdown would not have any use after > free risk. > > The new code is not changing the ref counting done by any threads. > Therefore if there's a use-after-free in existing code, AFAICT, the > same use-after-free *must* still exist in the current code. > > The 7th patch only fixes the use-after-free, *if and only if* the > background thread tries to access 'current_migration', /after/ > finalize as completed. The use-after-free in this case, has been > turned into a NULL pointer reference. > > A background thread could be accessing the 'current_migration' pointer > *concurrently* with the finalize method executing though. In this case > we still have a use after free problem, only the time window in which > it exists has been narrowed a little. > > Shouldn't the problem with migration be solved by every migration thread > holding a reference on current_migration, that the thread releases when > it exits, such that MigrationState is only finalized once every thread > has exited ? That would not require any join() synchronization point. I think the question is whether things like migration_is_running() is allowed to be used anywhere, even after migration_shutdown(). My answer is, it should be ok to be used anywhere, and we don't necessarilly need to limit that. In that case the caller doesn't need to take a refcount because it's an immediate query. It can simply check its existance with the lock (after my patch 8 of the other series applied, which depends on this qom series). Thanks, -- Peter Xu