On 09/05/2025 at 23:13, Stefano Stabellini wrote:
> On Fri, 9 May 2025, Roger Pau Monné wrote:
>> On Thu, May 08, 2025 at 04:25:28PM -0700, Stefano Stabellini wrote:
>>> On Thu, 8 May 2025, Roger Pau Monné wrote:
>>>> On Wed, May 07, 2025 at 04:02:11PM -0700, Stefano Stabellini wrote:
>>>>> On Tue, 6 May 2025, Roger Pau Monné wrote:
>>>>>> On Mon, May 05, 2025 at 11:11:10AM -0700, Stefano Stabellini wrote:
>>>>>>> On Mon, 5 May 2025, Roger Pau Monné wrote:
>>>>>>>> On Mon, May 05, 2025 at 12:40:18PM +0200, Marek Marczykowski-Górecki wrote:
>>>>>>>>> On Mon, Apr 28, 2025 at 01:00:01PM -0700, Stefano Stabellini wrote:
>>>>>>>>>> On Mon, 28 Apr 2025, Jan Beulich wrote:
>>>>>>>>>>> On 25.04.2025 22:19, Stefano Stabellini wrote:
>>>>>>>>>>>> From: Xenia Ragiadakou <xenia.ragiada...@amd.com>
>>>>>>>>>>>>
>>>>>>>>>>>> Dom0 PVH might need XENMEM_exchange when passing contiguous memory
>>>>>>>>>>>> addresses to firmware or co-processors not behind an IOMMU.
>>>>>>>>>>>
>>>>>>>>>>> I definitely don't understand the firmware part: It's subject to the
>>>>>>>>>>> same transparent P2M translations as the rest of the VM; it's just
>>>>>>>>>>> another piece of software running there.
>>>>>>>>>>>
>>>>>>>>>>> "Co-processors not behind an IOMMU" is also interesting; a more
>>>>>>>>>>> concrete scenario might be nice, yet I realize you may be limited in
>>>>>>>>>>> what you're allowed to say.
>>>>>>>>>>
>>>>>>>>>> Sure. On AMD x86 platforms there is a co-processor called PSP running
>>>>>>>>>> TEE firmware. The PSP is not behind an IOMMU. Dom0 occasionally needs
>>>>>>>>>> to pass addresses to it. See drivers/tee/amdtee/ and
>>>>>>>>>> include/linux/psp-tee.h in Linux.
>>>>>>>>>
>>>>>>>>> We had (have?) a similar issue with amdgpu (for integrated graphics) -
>>>>>>>>> it uses the PSP for loading its firmware. With PV dom0 there is a
>>>>>>>>> workaround, as dom0 kinda knows the MFNs. I haven't tried PVH dom0 on
>>>>>>>>> such a system yet, but I expect trouble (BTW, hw1 aka the zen2 gitlab
>>>>>>>>> runner has an amdgpu, and it's the one I used for debugging this issue).
>>>>>>>>
>>>>>>>> That's ugly, and problematic when used in conjunction with AMD-SEV.
>>>>>>>>
>>>>>>>> I wonder if Xen could emulate/mediate some parts of the PSP for dom0
>>>>>>>> to use, while allowing Xen to be the sole owner of the device.  Having
>>>>>>>> both Xen and dom0 use it (for different purposes) seems like asking
>>>>>>>> for trouble.  But I also have no idea how complex the PSP interface
>>>>>>>> is, neither whether it would be feasible to emulate the
>>>>>>>> interfaces/registers needed for firmware loading.
>>>>>>>
>>>>>>> Let me take a step back from the PSP for a moment. I am not opposed to a
>>>>>>> PSP mediator in Xen, but I want to emphasize that the issue is more
>>>>>>> general and extends well beyond the PSP.
>>>>>>>
>>>>>>> In my years working in embedded systems, I have consistently seen cases
>>>>>>> where Dom0 needs to communicate with something that does not go through
>>>>>>> the IOMMU. This could be due to special firmware on a co-processor, a
>>>>>>> hardware erratum that prevents proper IOMMU usage, or a high-bandwidth
>>>>>>> device that technically supports the IOMMU but performs poorly unless
>>>>>>> the IOMMU is disabled. All of these are real-world examples that I have
>>>>>>> seen personally.
>>>>>>
>>>>>> I wouldn't be surprised, classic PV dom0 avoided those issues because
>>>>>> it was dealing directly with host addresses (mfns), but that's not the
>>>>>> case with PVH dom0.
>>>>>
>>>>> Yeah
>>>>>
>>>>>
>>>>>>> In my opinion, we definitely need a solution like this patch for Dom0
>>>>>>> PVH to function correctly in all scenarios.
>>>>>>
>>>>>> I'm not opposed to having such an interface available for PVH hardware
>>>>>> domains.  I find it ugly, but I don't see much other way to deal with
>>>>>> that kind of "device".  Xen mediating accesses for each one of them
>>>>>> is unlikely to be doable.
>>>>>>
>>>>>> How do you hook this exchange interface into Linux to differentiate
>>>>>> which drivers need to use mfns when interacting with the hardware?
>>>>>
>>>>> In the specific case we have at hand, the driver is in Linux userspace
>>>>> and is specially written for our use case. It is not generic, so we
>>>>> don't have this problem. But your question is valid.
>>>>
>>>> Oh, so you then have some kind of ioctl interface that does the memory
>>>> exchange and bouncing inside the kernel on behalf of the user-space
>>>> side, I would think?
>>>
>>> I am not sure... Xenia might know more than me here.
>>
>> One further question I have regarding this approach: have you
>> considered just populating an empty p2m space with contiguous physical
>> memory instead of exchanging an existing area?  That would increase
>> dom0 memory usage, but would prevent super page shattering in the p2m.
>> You could use a dom0_mem=X,max:X+Y command line option, where Y
>> would be your extra room for swiotlb-xen bouncing usage.
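
To make that concrete, such a command line could look like the fragment
below (the sizes are purely illustrative, not a recommendation: X = 4 GiB
of populated dom0 memory plus Y = 256 MiB of headroom for the bounce pool):

```
dom0_mem=4096M,max:4352M
```
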
>>
>> The XENMEM_increase_reservation documentation notes that the hypercall
>> already returns the base MFN of the allocated page (see the comment in
>> the xen_memory_reservation struct declaration).
>
> XENMEM_exchange is the way this has traditionally been implemented in
> Linux swiotlb-xen (and has been for years). But your idea is good.
>
> Another, more drastic, idea would be to attempt to map Dom0 PVH memory
> 1:1 at domain creation time like we do on ARM. If not all of it, as much
> as possible. That would resolve the problem very efficiently. We could
> communicate to Dom0 PVH the range that is 1:1 in one of the initial data
> structures, and that would be the end of it.
>

Could that be done by introducing a "fake" reserved region in advance
(IVMD?)? Such regions are usually mapped 1:1 into the domain, in addition
to being coherent on the IOMMU side (so it wouldn't break if the PSP ever
becomes IOMMU-aware).

Teddy



XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech
