On 5/9/25 5:47 AM, Alejandro Vallejo wrote:
>>>>>> 2. It can grab the *current* location of the pages and register an
>>>>>> MMU notifier. This works for GPU memory and file-backed memory.
>>>>>> However, when the invalidate_range function of this callback is
>>>>>> called, the driver *must* stop all further accesses to the pages.
>>>>>>
>>>>>> The invalidate_range callback is not allowed to block for a long
>>>>>> period of time. My understanding is that things like dirty page
>>>>>> writeback are blocked while the callback is in progress. My
>>>>>> understanding is also that the callback is not allowed to fail.
>>>>>> I believe it can return a retryable error but I don’t think that
>>>>>> it is allowed to keep failing forever.
>>>>>>
>>>>>> Linux’s grant table driver actually had a bug in this area, which
>>>>>> led to deadlocks. I fixed that a while back.
>>>>>>
>>>>>> KVM implements the second option: it maps pages into the stage-2
>>>>>> page tables (or shadow page tables, if that is chosen) and unmaps
>>>>>> them when the invalidate_range callback is called.
>
> I'm still lost as to what is where, who initiates what and what the end
> goal is. Is this about using userspace memory in dom0, and THEN sharing
> that with guests for as long as it's live? And make enough magic so the
> guests don't notice the transitionary period in which there may not be
> any memory?
>
> Or is this about using domU memory for the driver living in dom0?
>
> Or is this about something else entirely?
>
> For my own education. Is the following sequence diagram remotely accurate?
>
>   dom0                               domU
>    |                                  |
>    |---+                              |
>    |   | use gfn3 in the driver       |
>    |   | (mapped on user thread)      |
>    |<--+                              |
>    |                                  |
>    |  map mfn(gfn3) in domU BAR       |
>    |--------------------------------->|
>    |                              +---|
>    |              happily use BAR |   |
>    |                              +-->|
>    |---+                              |
>    |   | mmu notifier for gfn3        |
>    |   | (invalidate_range)           |
>    |<--+                              |
>    |                                  |
>    |  unmap mfn(gfn3)                 |
>    |--------------------------------->| <--- Plus some means to making guest
>    |---+                          +---|      vCPUs pause on access.
>    |   | reclaim gfn3    block on |   |
>    |<--+                 access   |   |
>    |                              |   |
>    |---+                          |   |
>    |   | use gfn7 in the driver   |   |
>    |   | (mapped on user thread)  |   |
>    |<--+                          |   |
>    |                              |   |
>    |  map mfn(gfn7) in domU BAR   |   |
>    |------------------------------+-->| <--- Unpause blocked domU vCPUs
>    |                                  |

I believe this is accurate, yes.

>>>> - The switch from “emulated MMIO” to “MMIO or real RAM” needs to
>>>> be atomic from the guest’s perspective.
>>>
>>> Updates of p2m PTEs are always atomic.
>> That’s good.
>
> Updates to a single PTE are atomic, sure. But mapping/unmapping sizes
> not congruent with a whole superpage size (e.g. 256 KiB, more than a
> page, less than a superpage) wouldn't be, as far as the guest is
> concerned.
>
> But if my understanding above is correct maybe it doesn't matter? It
> only needs to be atomic wrt the hypercall that requests it, so that the
> gfn is never reused while the guest p2m still holds that mfn.

I believe you are correct. The only requirement is that the guest
behaves correctly if its page faults race against what is happening
in the backend domain.

-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
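
For readers following along, the sequence confirmed above (map, invalidate, block on access, remap, unpause) can be modelled as a tiny state machine. This is only a toy sketch for illustration: `ToyBarSlot`, `Access`, `replay` and the frame numbers `0x1003` / `0x1007` are all hypothetical names and values, not Xen or Linux interfaces, and it assumes a single BAR slot that holds at most one frame at a time.

```python
from enum import Enum, auto


class Access(Enum):
    OK = auto()       # the access hit a valid mapping
    BLOCKED = auto()  # no mapping: the vCPU pauses until the backend remaps


class ToyBarSlot:
    """Hypothetical model of one guest BAR slot backed by a dom0 frame."""

    def __init__(self):
        self.mfn = None          # frame currently mapped, or None
        self.paused_vcpus = []   # vCPUs that faulted while unmapped

    def map(self, mfn):
        """Install a frame and return the vCPUs that can now resume."""
        self.mfn = mfn
        woken, self.paused_vcpus = self.paused_vcpus, []
        return woken

    def invalidate(self):
        """Stand-in for the invalidate_range step: drop the mapping
        and hand the old frame back so the backend may reclaim it."""
        old, self.mfn = self.mfn, None
        return old

    def access(self, vcpu):
        """A guest access: succeeds if mapped, otherwise parks the vCPU."""
        if self.mfn is not None:
            return Access.OK
        if vcpu not in self.paused_vcpus:
            self.paused_vcpus.append(vcpu)
        return Access.BLOCKED


def replay():
    """Replay the diagram: map gfn3's frame, invalidate, remap gfn7's frame."""
    slot = ToyBarSlot()
    trace = []
    slot.map(0x1003)                   # map mfn(gfn3) in domU BAR
    trace.append(slot.access(vcpu=0))  # guest happily uses the BAR
    reclaimed = slot.invalidate()      # mmu notifier fires, unmap
    trace.append(slot.access(vcpu=0))  # guest access now blocks
    woken = slot.map(0x1007)           # map mfn(gfn7), unpause vCPUs
    trace.append(slot.access(vcpu=0))
    return trace, reclaimed, woken
```

In this model the old frame is only returned to the caller after the slot has dropped it, which mirrors the requirement that the gfn is never reused while the guest p2m still holds that mfn. The racing page fault is the `BLOCKED` case: the guest just has to tolerate being parked until the next `map`.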