On 5/9/25 5:47 AM, Alejandro Vallejo wrote:
>>>>>> 2. It can grab the *current* location of the pages and register an
>>>>>>    MMU notifier.  This works for GPU memory and file-backed memory.
>>>>>>    However, when the invalidate_range callback of this notifier is
>>>>>>    called, the driver *must* stop all further accesses to the pages.
>>>>>>
>>>>>>    The invalidate_range callback is not allowed to block for a long
>>>>>>    period of time.  My understanding is that things like dirty page
>>>>>>    writeback are blocked while the callback is in progress.  My
>>>>>>    understanding is also that the callback is not allowed to fail.
>>>>>>    I believe it can return a retryable error but I don’t think that
>>>>>>    it is allowed to keep failing forever.
>>>>>>
>>>>>>    Linux’s grant table driver actually had a bug in this area, which
>>>>>>    led to deadlocks.  I fixed that a while back.
>>>>>>
>>>>>> KVM implements the second option: it maps pages into the stage-2
>>>>>> page tables (or shadow page tables, if that is chosen) and unmaps
>>>>>> them when the invalidate_range callback is called.
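For anyone not familiar with the Linux API, here is a rough (untested) sketch
of what option 2 looks like, using invalidate_range_start; the backend_* names
are made up and only stand in for whatever the real driver would do:

#include <linux/mmu_notifier.h>
#include <linux/mm.h>

struct backend_ctx {
        struct mmu_notifier mn;
        /* ... driver state tracking which guest frames use which pages ... */
};

static int backend_invalidate_range_start(struct mmu_notifier *mn,
                                          const struct mmu_notifier_range *range)
{
        struct backend_ctx *ctx = container_of(mn, struct backend_ctx, mn);

        /*
         * All further access to pages in [range->start, range->end) has to
         * be stopped before returning.  Sleeping is only permitted when
         * mmu_notifier_range_blockable(range) is true, and returning -EAGAIN
         * is only tolerated in the non-blockable case.
         */
        backend_unmap_user_range(ctx, range->start, range->end); /* hypothetical */
        return 0;
}

static const struct mmu_notifier_ops backend_mn_ops = {
        .invalidate_range_start = backend_invalidate_range_start,
};

/* Called once per tracked mm, e.g. from the driver's mmap/ioctl path. */
static int backend_track_mm(struct backend_ctx *ctx, struct mm_struct *mm)
{
        ctx->mn.ops = &backend_mn_ops;
        return mmu_notifier_register(&ctx->mn, mm);
}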
> 
> I'm still lost as to what is where, who initiates what and what the end
> goal is. Is this about using userspace memory in dom0, and THEN sharing
> that with guests for as long as it's live? And making enough magic so the
> guests don't notice the transitional period in which there may not be
> any memory?
> 
> Or is this about using domU memory for the driver living in dom0?
> 
> Or is this about something else entirely?
> 
> For my own education. Is the following sequence diagram remotely accurate?
> 
> dom0                              domU
>  |                                  |
>  |---+                              |
>  |   | use gfn3 in the driver       |
>  |   | (mapped on user thread)      |
>  |<--+                              |
>  |                                  |
>  |  map mfn(gfn3) in domU BAR       |
>  |--------------------------------->|
>  |                              +---|
>  |              happily use BAR |   |
>  |                              +-->|
>  |---+                              |
>  |   | mmu notifier for gfn3        |
>  |   | (invalidate_range)           |
>  |<--+                              |
>  |                                  |
>  |  unmap mfn(gfn3)                 |
>  |--------------------------------->| <--- Plus some means of making guest
>  |---+                          +---|      vCPUs pause on access.
>  |   | reclaim gfn3    block on |   |
>  |<--+                 access   |   |
>  |                              |   |
>  |---+                          |   |
>  |   | use gfn7 in the driver   |   |
>  |   | (mapped on user thread)  |   |
>  |<--+                          |   |
>  |                              |   |
>  |  map mfn(gfn7) in domU BAR   |   |
>  |------------------------------+-->| <--- Unpause blocked domU vCPUs
>  |                                  |

I believe this is accurate, yes.
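
For concreteness, the dom0 side of that sequence in rough pseudocode (every
helper here is hypothetical and exists only to mirror the diagram):

/* MMU notifier fires for the user page currently backing gfn3. */
static void on_invalidate(struct backend_ctx *ctx, unsigned long gfn)
{
        /*
         * Pull mfn(gfn3) out of the domU p2m.  From this point on, guest
         * accesses to that BAR page fault and the vCPU pauses instead of
         * completing the access.
         */
        unmap_from_domu_bar(ctx, gfn);

        /* The old page is now safe for dom0 to reclaim or reuse. */
}

/* Later, once the driver has a replacement page (gfn7 in the diagram). */
static void on_new_backing_page(struct backend_ctx *ctx, unsigned long gfn)
{
        map_into_domu_bar(ctx, gfn);    /* map mfn(gfn7) into the BAR slot */
        unpause_blocked_vcpus(ctx);     /* let stalled guest accesses finish */
}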

>>>> - The switch from “emulated MMIO” to “MMIO or real RAM” needs to
>>>>   be atomic from the guest’s perspective.
>>>
>>> Updates of p2m PTEs are always atomic.
>> That’s good.
> 
> Updates to a single PTE are atomic, sure. But mapping/unmapping sizes
> not congruent with a whole superpage size (e.g. 256 KiB, more than a
> page, less than a superpage) wouldn't be, as far as the guest is
> concerned.
> 
> But if my understanding above is correct, maybe it doesn't matter? It
> only needs to be atomic wrt the hypercall that requests it, so that the
> gfn is never reused while the guest p2m still holds that mfn.

I believe you are correct.  The only requirement is that the guest behaves
correctly if its page faults race against what is happening in the backend
domain.
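
To make the granularity point concrete: a 256 KiB mapping is 64 independent
4 KiB p2m entry updates, so a racing vCPU can see the range half-populated.
Illustration only, not Xen code (pte_for() is a made-up helper):

#include <stdint.h>

#define PAGE_SIZE_4K    4096UL
#define REGION_SIZE     (256UL * 1024)          /* 64 pages */

static void map_region(uint64_t *p2m, unsigned long gfn,
                       const unsigned long *mfn)
{
        for (unsigned long i = 0; i < REGION_SIZE / PAGE_SIZE_4K; i++)
                /*
                 * Each store is atomic, so the guest sees either the old or
                 * the new entry for any given page -- but it can also see
                 * the first k entries updated and the rest not yet.
                 */
                __atomic_store_n(&p2m[gfn + i], pte_for(mfn[i]),
                                 __ATOMIC_RELEASE);
}

As you say, that is fine as long as the guest just faults and waits, and the
gfn is never reused while the guest p2m still holds the old mfn.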
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
