On Thu Apr 24, 2025 at 7:22 PM BST, Ariadne Conill wrote:
> Hi,
>
>> On Apr 24, 2025, at 6:48 AM, Alejandro Vallejo <agarc...@amd.com> wrote:
>> 
>> On Thu Apr 24, 2025 at 1:45 PM BST, Alejandro Vallejo wrote:
>>> Xen nowadays crashes under some Hyper-V configurations when
>>> paddr_bits>36. At the 44bit boundary we reach an edge case in which the
>>> end of the guest physical address space is not representable using 32bit
>>> MFNs. Furthermore, it's an act of faith that the tail of the physical
>>> address space has no reserved regions already.
>>> 
>>> This commit uses the first unused MFN rather than the last, thus
>>> ensuring the hypercall page placement is more resilient against such
>>> corner cases.
>>> 
>>> While at this, add an extra BUG_ON() to explicitly test for the
>>> hypercall page being correctly set, and mark hcall_page_ready as
>>> __ro_after_init.
>>> 
>>> Fixes: 620fc734f854("x86/hyperv: setup hypercall page")
>>> Signed-off-by: Alejandro Vallejo <agarc...@amd.com>
>> 
>> After a side discussion, this seems on the unsafe side of things due to
>> potential collision with MMIO. I'll resend (though not today) with the
>> page overlapping a RAM page instead. Possibly the last page of actual
>> RAM.
>
> We have been working on bringing Xen up on Azure over at Edera, and
> have encountered this problem.  Our solution to this problem was to
> change Xen to handle the hypercall trampoline page in the same way as
> Linux: dynamically allocating a page from the heap and then marking it
> as executable.
>
> This approach should avoid the issues with MMIO and page overlaps.

Yes, that's what I meant by overlapping RAM. Overlaying the hypercall
page on top of existing RAM rather than trying to find a suitable hole.

> Would it be more interesting to start with our patch instead?

If you have it ready to go, for sure. My ability to test any of this is
fairly limited. I suspect the VM is just not getting 48 bits worth of
guest-physical address space, and so making any hypercall turns into an
EPT violation.

I couldn't run the tests that would definitely prove it though

>From the little I saw of the dmesg going forward, I suspect there's more
required (at least in time handling) to enable support in gen2
insteances.

Cheers,
Alejandro


Reply via email to