On Fri, Nov 05, 2021 at 10:11:48PM +0000, Sean Christopherson wrote:
> On Fri, Oct 08, 2021, Kuppuswamy Sathyanarayanan wrote:
> > From: "Kirill A. Shutemov" <[email protected]>
> > 
> > Just like MKTME, TDX reassigns bits of the physical address for
> > metadata.  MKTME used several bits for an encryption KeyID. TDX
> > uses a single bit in guests to communicate whether a physical page
> > should be protected by TDX as private memory (bit set to 0) or
> > unprotected and shared with the VMM (bit set to 1).
> > 
> > Add a helper, tdx_shared_mask() to generate the mask.  The processor
> > enumerates its physical address width to include the shared bit, which
> > means it gets included in __PHYSICAL_MASK by default.
> 
> This is incorrect.  The shared bit _may_ be a legal PA bit, but AIUI it's not 
> a
> hard requirement.

Good point, will fix.

> > Remove the shared mask from 'physical_mask' since any bits in
> > tdx_shared_mask() are not used for physical addresses in page table
> > entries.
> 
> ...
> 
> > @@ -94,6 +100,9 @@ static void tdx_get_info(void)
> >  
> >     td_info.gpa_width = out.rcx & GENMASK(5, 0);
> >     td_info.attributes = out.rdx;
> > +
> > +   /* Exclude Shared bit from the __PHYSICAL_MASK */
> > +   physical_mask &= ~tdx_shared_mask();
> 
> This is insufficient, though it's not really the fault of this patch, the 
> specs
> themselves botch this whole thing.
> 
> The TDX Module spec explicitly states that GPAs above GPAW are considered 
> reserved.
> 
>     10.11.1. GPAW-Relate EPT Violations
>     GPA bits higher than the SHARED bit are considered reserved and must be 0.
>     Address translation with any of the reserved bits set to 1 cause a #PF 
> with
>     PFEC (Page Fault Error Code) RSVD bit set.
> 
> But this is contradicted by the architectural extensions spec, which states 
> that
> a GPA that satisfies MAXPA >= GPA > GPAW "can" cause an EPT violation, not 
> #PF.
> Note, this section also appears to have a bug, as it states that GPA bit 47 is
> both the SHARED bit and reserved.  I assume that blurb is intended to clarify
> that bit 47 _would_ be reserved if it weren't the SHARED bit, but because it's
> the shared bit it's ok to access.
> 
>     1.4.2
>     Guest Physical Address Translation
>     If the CPU's maximum physical-address width (MAXPA) is 52 and the guest 
> physical
>     address width is configured to be 48, accesses with GPA bits 51:48 not 
> all being
>     0 can cause an EPT-violation, where such EPT-violations are not mutated 
> to #VE,
>     even if the “EPT-violations #VE” execution control is 1.
> 
>     If the CPU's physical-address width (MAXPA) is less than 48 and the 
> SHARED bit
>     is configured to be in bit position 47, GPA bit 47 would be reserved, and 
> GPA
>        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^      
>               
>     bits 46:MAXPA would be reserved. On such CPUs, setting bits 51:48 or bits
>     46:MAXPA in any paging structure can cause a reserved bit page fault on 
> access.
> 
> The Module spec also calls out that the effective GPA is not to be confused 
> with
> MAXPA, which combined with the above blurb about MAXPA < GPAW, suggests that 
> MAXPA
> is enumerated separately by design so that the guest doesn't incorrectly think
> 46:MAXPA are usable.  But that is problematic for the case where MAXPA > GPAW.
> 
>     The effective GPA width (in bits) for this TD (do not confuse with MAXPA).
>     SHARED bit is at GPA bit GPAW-1.
> 
> I can't find the exact reference, but the TDX module always passes through 
> host's
> MAXPHYADDR.  As it pertains to this patch, just doing
> 
>       physical_mask &= ~tdx_shared_mask()
> 
> means that a guest running with GPAW=0 and MAXPHYADDR>48 will have a 
> discontiguous
> physical_mask, and could access "reserved" memory.  If the VMM defines legal 
> memory
> with bits [MAXPHYADDR:48]!=0, explosions may ensue.  That's arguably a VMM 
> bug, but
> given that the VMM is untrusted I think the guest should be paranoid when 
> handling
> the SHARED bit.  I also don't know that the kernel will play nice with a 
> discontiguous
> mask.

I expect it to be buggy.

> Specs aside, unless Intel makes a hardware change to treat GPAW as 
> guest.MAXPHYADDR,
> or the TDX Module emulates on EPT violations to inject #PF(RSVD) when 
> appropriate,
> this mess isn't going to be truly fixed from the guest perspective.
> 
> So, IMO all bits >= GPAW should be cleared, and the kernel should warn and/or
> refuse to boot if the host has defined legal memory in that range.

Right. But only >= GPAW-1 as shared bit is the MSB within GPAW:

        physical_mask &= GENMASK_ULL(td_info.gpa_width - 2, 0);

'2' here smells bad, but well...

Given that physical_mask is now contiguous we can truncate anything from
e820 that cannot be addressed with adjusted __PHYSICAL_MASK:

iff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index bc0657f0deed..16d57a8769e8 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -833,6 +833,9 @@ static unsigned long __init e820_end_pfn(unsigned long 
limit_pfn, enum e820_type
        unsigned long last_pfn = 0;
        unsigned long max_arch_pfn = MAX_ARCH_PFN;

+       if (max_arch_pfn > PHYS_PFN(__PHYSICAL_MASK + 1))
+               max_arch_pfn = PHYS_PFN(__PHYSICAL_MASK + 1);
+
        for (i = 0; i < e820_table->nr_entries; i++) {
                struct e820_entry *entry = &e820_table->entries[i];
                unsigned long start_pfn;

Does it look reasonable?

> FWIW, from a VMM perspective, I'm pretty sure the only sane approach is to 
> force
> GPAW=1, a.k.a. SHARED bit == 51, if host.MAXPHYADDR>=49.  But on the guest 
> side,
> I think we should be paranoid.

-- 
 Kirill A. Shutemov
_______________________________________________
Virtualization mailing list
[email protected]
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Reply via email to