On 19/07/18 11:49, Jan Beulich wrote:
> Checking the low 5 bits of CR3 is not the job of vmx_load_pdptrs().
> Instead it should #GP upon bad PDPTE values, rather than causing a VM
> entry failure.
>
> Signed-off-by: Jan Beulich <jbeul...@suse.com>
>
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -1361,20 +1361,18 @@ static void vmx_set_interrupt_shadow(str
>      __vmwrite(GUEST_INTERRUPTIBILITY_INFO, intr_shadow);
>  }
>  
> -static void vmx_load_pdptrs(struct vcpu *v)
> +static bool vmx_load_pdptrs(struct vcpu *v)
>  {
>      unsigned long cr3 = v->arch.hvm_vcpu.guest_cr[3];
> -    uint64_t *guest_pdptes;
> +    uint64_t *guest_pdptes, valid_mask;
>      struct page_info *page;
>      p2m_type_t p2mt;
>      char *p;
> +    unsigned int i;
>  
>      /* EPT needs to load PDPTRS into VMCS for PAE. */
>      if ( !hvm_pae_enabled(v) || (v->arch.hvm_vcpu.guest_efer & EFER_LMA) )
> -        return;
> -
> -    if ( (cr3 & 0x1fUL) && !hvm_pcid_enabled(v) )
> -        goto crash;
> +        return true;
>  
>      page = get_page_from_gfn(v->domain, cr3 >> PAGE_SHIFT, &p2mt, P2M_UNSHARE);

cr3 needs truncating to 32 bits before doing this lookup.  The upper
bits of cr3 can remain set after the guest transitions away from long
mode, which will cause this emulation to do the wrong thing.
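
I.e. something like this ahead of the lookup (untested sketch; PAE
paging only consumes the low 32 bits of CR3):

    /* Sketch: PAE paging ignores the upper bits of CR3. */
    cr3 &= 0xffffffff;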

>      if ( !page )
> @@ -1385,34 +1383,47 @@ static void vmx_load_pdptrs(struct vcpu
>          gdprintk(XENLOG_ERR,
>                   "Bad cr3 on load pdptrs gfn %lx type %d\n",
>                   cr3 >> PAGE_SHIFT, (int) p2mt);
> -        goto crash;
> +        domain_crash(v->domain);
> +        return false;
>      }
>  
>      p = __map_domain_page(page);
>  
> -    guest_pdptes = (uint64_t *)(p + (cr3 & ~PAGE_MASK));
> +    guest_pdptes = (uint64_t *)(p + (cr3 & 0xfe0));

You can drop p, and guest_pdptes can lose its prefix.
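
i.e. (untested sketch, leaning on the usual GNU void pointer
arithmetic):

    uint64_t *pdptes = __map_domain_page(page) + (cr3 & 0xfe0);
    ...
    unmap_domain_page(pdptes);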

>  
> -    /*
> -     * We do not check the PDPTRs for validity. The CPU will do this during
> -     * vm entry, and we can handle the failure there and crash the guest.
> -     * The only thing we could do better here is #GP instead.
> -     */
> +    valid_mask = ((1ULL << v->domain->arch.cpuid->extd.maxphysaddr) - 1) &
> +                 (PAGE_MASK | _PAGE_AVAIL | _PAGE_PRESENT);

How did you come across this list?  The only valid bits are Present, PWT
and PCD, while the upper nibble of control bits is documented as ignored
rather than reserved.
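
I'd expect something more like this (sketch, assuming _PAGE_AVAIL
covers the documented-ignored bits 11:9):

    valid_mask = ((1ULL << v->domain->arch.cpuid->extd.maxphysaddr) - 1) &
                 (PAGE_MASK | _PAGE_AVAIL | _PAGE_PCD | _PAGE_PWT |
                  _PAGE_PRESENT);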

> +    for ( i = 0; i < 4; ++i )
> +        if ( (guest_pdptes[i] & _PAGE_PRESENT) &&
> +             (guest_pdptes[i] & ~valid_mask) )
> +        {
> +            if ( v == current )
> +                hvm_inject_hw_exception(TRAP_gp_fault, 0);

The function is void for the same reason that this isn't correct: we
are in the hvm_update_* path rather than the set_* path, which is past
the point of being able to unwind (and is why I haven't yet got around
to fixing this function).

The only way I can see of fixing this is plumbing everything into a new
paging_set_cr3() callback, which can return X86EMUL_EXCEPTION and fail
the hvm_set_cr3() call.
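
Very roughly (hypothetical shape; the name and plumbing here are
invented purely for illustration):

    /*
     * Hypothetical hook: validate a new CR3 value (including the PAE
     * PDPTEs) before hvm_set_cr3() commits any state.
     */
    int rc = paging_set_cr3(v, value);

    if ( rc == X86EMUL_EXCEPTION )
    {
        hvm_inject_hw_exception(TRAP_gp_fault, 0);
        return X86EMUL_EXCEPTION;
    }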

~Andrew
