On Fri, 9 May 2025, Teddy Astie wrote:
> Le 09/05/2025 à 23:13, Stefano Stabellini a écrit :
> > On Fri, 9 May 2025, Roger Pau Monné wrote:
> >> On Thu, May 08, 2025 at 04:25:28PM -0700, Stefano Stabellini wrote:
> >>> On Thu, 8 May 2025, Roger Pau Monné wrote:
> >>>> On Wed, May 07, 2025 at 04:02:11PM -0700, Stefano Stabellini wrote:
> >>>>> On Tue, 6 May 2025, Roger Pau Monné wrote:
> >>>>>> On Mon, May 05, 2025 at 11:11:10AM -0700, Stefano Stabellini wrote:
> >>>>>>> On Mon, 5 May 2025, Roger Pau Monné wrote:
> >>>>>>>> On Mon, May 05, 2025 at 12:40:18PM +0200, Marek Marczykowski-Górecki wrote:
> >>>>>>>>> On Mon, Apr 28, 2025 at 01:00:01PM -0700, Stefano Stabellini wrote:
> >>>>>>>>>> On Mon, 28 Apr 2025, Jan Beulich wrote:
> >>>>>>>>>>> On 25.04.2025 22:19, Stefano Stabellini wrote:
> >>>>>>>>>>>> From: Xenia Ragiadakou <xenia.ragiada...@amd.com>
> >>>>>>>>>>>>
> >>>>>>>>>>>> Dom0 PVH might need XENMEM_exchange when passing contiguous memory
> >>>>>>>>>>>> addresses to firmware or co-processors not behind an IOMMU.
> >>>>>>>>>>>
> >>>>>>>>>>> I definitely don't understand the firmware part: It's subject to the
> >>>>>>>>>>> same transparent P2M translations as the rest of the VM; it's just
> >>>>>>>>>>> another piece of software running there.
> >>>>>>>>>>>
> >>>>>>>>>>> "Co-processors not behind an IOMMU" is also interesting; a more
> >>>>>>>>>>> concrete scenario might be nice, yet I realize you may be limited in
> >>>>>>>>>>> what you're allowed to say.
> >>>>>>>>>>
> >>>>>>>>>> Sure. On AMD x86 platforms there is a co-processor called PSP running
> >>>>>>>>>> TEE firmware. The PSP is not behind an IOMMU. Dom0 needs occasionally to
> >>>>>>>>>> pass addresses to it.  See drivers/tee/amdtee/ and
> >>>>>>>>>> include/linux/psp-tee.h in Linux.
> >>>>>>>>>
> >>>>>>>>> We had (have?) similar issue with amdgpu (for integrated graphics) - it
> >>>>>>>>> uses PSP for loading its firmware. With PV dom0 there is a workaround as
> >>>>>>>>> dom0 kinda knows MFN. I haven't tried PVH dom0 on such system yet, but I
> >>>>>>>>> expect troubles (BTW, hw1 aka zen2 gitlab runner has amdgpu, and it's
> >>>>>>>>> the one I used for debugging this issue).
> >>>>>>>>
> >>>>>>>> That's ugly, and problematic when used in conjunction with AMD-SEV.
> >>>>>>>>
> >>>>>>>> I wonder if Xen could emulate/mediate some parts of the PSP for dom0
> >>>>>>>> to use, while allowing Xen to be the sole owner of the device.  Having
> >>>>>>>> both Xen and dom0 use it (for different purposes) seems like asking
> >>>>>>>> for trouble.  But I also have no idea how complex the PSP interface
> >>>>>>>> is, neither whether it would be feasible to emulate the
> >>>>>>>> interfaces/registers needed for firmware loading.
> >>>>>>>
> >>>>>>> Let me take a step back from the PSP for a moment. I am not opposed to a
> >>>>>>> PSP mediator in Xen, but I want to emphasize that the issue is more
> >>>>>>> general and extends well beyond the PSP.
> >>>>>>>
> >>>>>>> In my years working in embedded systems, I have consistently seen cases
> >>>>>>> where Dom0 needs to communicate with something that does not go through
> >>>>>>> the IOMMU. This could be due to special firmware on a co-processor, a
> >>>>>>> hardware erratum that prevents proper IOMMU usage, or a high-bandwidth
> >>>>>>> device that technically supports the IOMMU but performs poorly unless
> >>>>>>> the IOMMU is disabled. All of these are real-world examples that I have
> >>>>>>> seen personally.
> >>>>>>
> >>>>>> I wouldn't be surprised, classic PV dom0 avoided those issues because
> >>>>>> it was dealing directly with host addresses (mfns), but that's not the
> >>>>>> case with PVH dom0.
> >>>>>
> >>>>> Yeah
> >>>>>
> >>>>>
> >>>>>>> In my opinion, we definitely need a solution like this patch for Dom0
> >>>>>>> PVH to function correctly in all scenarios.
> >>>>>>
> >>>>>> I'm not opposed to having such interface available for PVH hardware
> >>>>>> domains.  I find it ugly, but I don't see much other way to deal with
> >>>>>> those kind of "devices".  Xen mediating accesses for each one of them
> >>>>>> is unlikely to be doable.
> >>>>>>
> >>>>>> How do you hook this exchange interface into Linux to differentiate
> >>>>>> which drivers need to use mfns when interacting with the hardware?
> >>>>>
> >>>>> In the specific case we have at hands the driver is in Linux userspace
> >>>>> and is specially-written for our use case. It is not generic, so we
> >>>>> don't have this problem. But your question is valid.
> >>>>
> >>>> Oh, so you then have some kind of ioctl interface that does the memory
> >>>> exchange and bouncing inside of the kernel on behalf of the user-space
> >>>> side I would think?
> >>>
> >>> I am not sure... Xenia might know more than me here.
> >>
> >> One further question I have regarding this approach: have you
> >> considered just populating an empty p2m space with contiguous physical
> >> memory instead of exchanging an existing area?  That would increase
> >> dom0 memory usage, but would prevent super page shattering in the p2m.
> >> You could use a dom0_mem=X,max:X+Y command line option, where Y
> >> would be your extra room for swiotlb-xen bouncing usage.
> >>
> >> XENMEM_increase_reservation documentation notes such hypercall already
> >> returns the base MFN of the allocated page (see comment in
> >> xen_memory_reservation struct declaration).
> >
> > XENMEM_exchange is the way it has been implemented traditionally in
> > Linux swiotlb-xen (it has been years). But your idea is good.
> >
> > Another, more drastic, idea would be to attempt to map Dom0 PVH memory
> > 1:1 at domain creation time like we do on ARM. If not all of it, as much
> > as possible. That would resolve the problem very efficiently. We could
> > communicate to Dom0 PVH the range that is 1:1 in one of the initial data
> > structures, and that would be the end of it.
> >
> 
> Could that be done by introducing a "fake" reserved region in advance
> (IVMD?)? Such regions are usually mapped 1:1 into the domain, in addition
> to being coherent on the IOMMU side (so it doesn't break in case the PSP
> becomes IOMMU-aware).

It doesn't have to be an "official" reserved-memory region (in the sense
of Documentation/devicetree/bindings/reserved-memory/) or exposed via
IVMD.

The memory that ends up mapped 1:1 in Dom0 PVH will be good memory which
could be used for all the regular stuff. If it is not a tiny amount, we
could let Linux (or other OS) do as they please with it.

It is only important for swiotlb-xen to know which one is the 1:1 range
so that it can manage bouncing (or not) over it.
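
To make that concrete, here is a rough sketch of the check swiotlb-xen
would need. It is only an illustration: struct identity_range and
needs_bounce() are hypothetical names, not existing Linux or Xen
interfaces, and it assumes Xen reports a single 1:1 range as a start
address and a size.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical description of the 1:1 (guest address == machine address)
 * range reported to Dom0 PVH. Name and layout are assumptions.
 */
struct identity_range {
    uint64_t start; /* first guest physical address mapped 1:1 */
    uint64_t size;  /* length in bytes; 0 means there is no 1:1 range */
};

/*
 * Return true if the buffer [addr, addr + len) has to be bounced, i.e. it
 * is not fully contained in the 1:1 range. The comparisons are written so
 * that addr + len is never computed and cannot overflow.
 */
static bool needs_bounce(const struct identity_range *r,
                         uint64_t addr, uint64_t len)
{
    uint64_t off;

    if (len == 0)
        return false;            /* nothing to transfer */
    if (r->size == 0 || addr < r->start)
        return true;             /* no 1:1 range, or buffer starts below it */

    off = addr - r->start;
    if (off >= r->size)
        return true;             /* buffer starts past the end of the range */

    return len > r->size - off;  /* buffer runs past the end of the range */
}
```

A buffer that is entirely inside the range can be handed to the device
as-is; anything else would go through the usual bounce path.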

If the 1:1 region is tiny, possibly due to memory allocation constraints
or other reasons, then yes, it makes sense to mark it as reserved
memory as you suggested, which we would do with a /reserved-memory node
if this were a device tree system.
