On Tue, 1 Nov 2022 at 14:17, Peter Maydell <peter.mayd...@linaro.org> wrote: > > Hi; I'm trying to find out what the UI layer's threading and > locking strategy is, at least as far as it applies to display > device models.
Ping! :-) I'm still looking for information about this, and about what threads call_rcu() callbacks might be run on... thanks -- PMM > Specifically: > * is the device's GraphicHwOps::gfx_update method always called > from one specific thread, or might it be called from any thread? > * is that method called with any locks guaranteed held? (eg the > iothread lock) > * is the caller of the gfx_update method OK if an implementation > of the method drops the iothread lock temporarily while it is > executing? (my guess would be "no") > * for a gfx_update_async = true device, what are the requirements > on calling graphic_hw_update_done()? Does the caller need to hold > any particular lock? Does the call need to be done from any > particular thread? > > The background to this is that I'm looking again at the race > condition involving the memory_region_snapshot_and_clear_dirty() > function, as described here: > > https://lore.kernel.org/qemu-devel/CAFEAcA9odnPo2LPip295Uztri7JfoVnQbkJ=wn+k8dqneb_...@mail.gmail.com/T/#u > > Having worked through what is going on, as far as I can see: > (1) in order to be sure that we have the right data to match > the snapshotted dirty bitmap state, we must wait for all TCG > vCPUs to leave their current TB > (2) a vCPU might block waiting for the iothread lock mid-TB > (3) therefore we cannot wait for the TCG vCPUs without dropping > the iothread lock one way or another > (4) but none of the callers expect that and various things break > > My tentative idea for a fix is a bit of an upheaval: > * have the display devices set gfx_update_async = true > * instead of doing everything synchronously in their gfx_update > method, they do the initial setup and call an 'async' version > of memory_region_snapshot_and_clear_dirty() > * that async version of the function will do what it does today, > but without trying to wait for TCG vCPUs > * instead the caller arranges (via call_rcu(), probably) a > callback that will happen once all the TCG CPUs have finished > executing their current TB > * that callback does the actual copy-from-guest-ram-to-display > and then calls graphic_hw_update_done() > > This seems like an awful pain in the neck but I couldn't see > anything better :-( > > Paolo: what (if any) guarantee does call_rcu() make about > which thread the callback function gets executed on, and what > locks are/are not held when it's called? > > (I haven't looked at the migration code's use of > memory_global_after_dirty_log_sync() but I suspect it's > similarly broken.) > > thanks > -- PMM