On Tue Apr 29, 2025 at 9:15 AM BST, Roger Pau Monné wrote:
> On Mon, Apr 28, 2025 at 12:50:55PM +0100, Alejandro Vallejo wrote:
>> On Mon Apr 28, 2025 at 12:07 PM BST, Andrew Cooper wrote:
>> > On 28/04/2025 11:55 am, Alejandro Vallejo wrote:
>> >> On Mon Apr 28, 2025 at 10:41 AM BST, Roger Pau Monné wrote:
>> >>> On Fri, Apr 25, 2025 at 04:43:31PM -0700, Ariadne Conill wrote:
>> >>>> Previously Xen placed the hypercall page at the highest possible MFN,
>> >>>> but this caused problems on systems with more than 36 bits
>> >>>> of physical address space.
>> >>>>
>> >>>> In general, it also seems unreliable to assume that the highest possible
>> >>>> MFN is not already reserved for some other purpose.
>> >>>>
>> >>>> Changes from v1:
>> >>>> - Continue to use fixmap infrastructure
>> >>>> - Use panic in Hyper-V setup() function instead of returning -ENOMEM
>> >>>>   on hypercall page allocation failure
>> >>>>
>> >>>> Fixes: 620fc734f854 ("x86/hyperv: setup hypercall page")
>> >>>> Cc: Alejandro Vallejo <agarc...@amd.com>
>> >>>> Cc: Alexander M. Merritt <alexan...@edera.dev>
>> >>>> Signed-off-by: Ariadne Conill <ariadne@ariadne.space>
>> >>>> ---
>> >>>>  xen/arch/x86/guest/hyperv/hyperv.c      | 17 +++++++----------
>> >>>>  xen/arch/x86/include/asm/guest/hyperv.h |  3 ---
>> >>>>  2 files changed, 7 insertions(+), 13 deletions(-)
>> >>>>
>> >>>> diff --git a/xen/arch/x86/guest/hyperv/hyperv.c b/xen/arch/x86/guest/hyperv/hyperv.c
>> >>>> index 6989af38f1..0305374a06 100644
>> >>>> --- a/xen/arch/x86/guest/hyperv/hyperv.c
>> >>>> +++ b/xen/arch/x86/guest/hyperv/hyperv.c
>> >>>> @@ -98,7 +98,13 @@ static void __init setup_hypercall_page(void)
>> >>>>      rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
>> >>>>      if ( !hypercall_msr.enable )
>> >>>>      {
>> >>>> -        mfn = HV_HCALL_MFN;
>> >>>> +        void *hcall_page = alloc_xenheap_page();
>> >>>> +        if ( !hcall_page )
>> >>>> +            panic("Hyper-V: Failed to allocate hypercall trampoline page");
>> >>>> +
>> >>>> +        printk("Hyper-V: Allocated hypercall page @ %p.\n", hcall_page);
>> >>> This likely wants to be a dprintk, and possibly also print the
>> >>> physical address of the used page?  And no period at the end of the
>> >>> sentence IMO.
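
(Something like this, I'd guess; untested, and using virt_to_maddr() for
the physical address:)

    dprintk(XENLOG_INFO, "Hyper-V: hypercall page %p (maddr %"PRIpaddr")\n",
            hcall_page, virt_to_maddr(hcall_page));
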
>> >>>
>> >>> I think Xen might have used the last page in the physical address
>> >>> range to prevent Hyper-V from possibly shattering a superpage in the
>> >>> second stage translation page-tables if normal RAM was used?
>> >>>
>> >>> However, I don't know whether Hyper-V will shatter super-pages if a
>> >>> sub-page of one is used to contain the hypercall page (I don't think it
>> >>> should?)
>> >> I think it's quite unlikely.
>> >
>> > It will shatter superpages.
>> >
>> > The overlay is not part of guest memory, and will hide whatever is
>> > behind it while it is mapped, which will force a 4k PTE in EPT/NPT.
>> 
>> That's an implementation detail. They could very well copy the trampoline
>> into guest memory when the overlaid page is backed by RAM (saving the
>> previous contents elsewhere) and restore them when disabling the
>> trampoline. It's a trivial optimisation that would prevent shattering
>> while remaining fully compliant with the TLFS.
>
> It's an implementation detail relevant from a guest perspective, as it
> impacts guest performance.  IOW: we care about the specific (current)
> implementation, as it's meaningful to how the guest-side should be
> implemented.

Indeed, I didn't mean otherwise.

My reply had more to do with the assertiveness of "it will shatter". The
reality is that we can't know, short of microbenching a read into the
trampoline after a TLB flush. And even then it may very well be
version-dependent (where older versions did and newer ones don't, for
instance).
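
FWIW, the sort of thing I had in mind (a rough, hypothetical sketch; the
helper name is made up, and the latency delta between a 4k and a 2M
second-stage mapping would be small and noisy at best):

    /*
     * Hypothetical probe: time a read through the trampoline after
     * flushing the local TLB, and compare against a read from a page
     * known to be 2M-mapped.  Far from conclusive.
     */
    static uint64_t __init time_trampoline_read(const volatile uint8_t *p)
    {
        uint64_t t0, t1;

        flush_tlb_local();      /* drop first-stage TLB entries */
        t0 = rdtsc_ordered();
        (void)*p;               /* refill via a full combined walk */
        t1 = rdtsc_ordered();

        return t1 - t0;
    }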

>
>> The actual physical location of the trampoline is fully undefined. It
>> is defined to be an overlay; but that's a specification, not an
>> implementation.
>> 
>> >
>> > Thinking about it, a better position would be adjacent to the APIC MMIO
>> > window, so at 0xfee01000.  The APIC MMIO window is forced to be a 4k
>> > mapping too, and the rest of the 2M window is normally empty.
>> >
>> 
>> Sounds like an assumption waiting to be broken. Just like the last page
>> of guest-physical was.
>
> As a compromise - could we try to allocate from < 4GB first, and
> resort to high memory if that doesn't work?  That would at least limit
> shattering (if it happens at all) to the low 4GB, which is quite likely fragmented
> already:
>
> hcall_page = alloc_xenheap_pages(0, MEMF_bits(32));
> if ( !hcall_page )
>     hcall_page = alloc_xenheap_page();
> if ( !hcall_page )
>     panic(...);
>
> That will need a comment to describe what's going on.

Sounds far more resilient to me. I could get behind that, sure.
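
Roughly like this, I'd imagine (untested sketch, with a stab at the
comment Roger asked for):

    void *hcall_page;

    /*
     * Prefer a page below 4GB for the hypercall trampoline.  If
     * Hyper-V shatters the superpage backing the overlay GFN, the
     * damage is then confined to the low 4GB, which is quite likely
     * fragmented into 4k mappings already.  Fall back to any address
     * if low memory is exhausted.
     */
    hcall_page = alloc_xenheap_pages(0, MEMF_bits(32));
    if ( !hcall_page )
        hcall_page = alloc_xenheap_page();
    if ( !hcall_page )
        panic("Hyper-V: Failed to allocate hypercall trampoline page");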

Ariadne, thoughts?

Cheers,
Alejandro
