Re: [PATCH 2/2] KVM: nVMX: fix nEPT handling of guest page table accesses

2017-05-12 Thread Paolo Bonzini
On 12/05/2017 09:38, Xiao Guangrong wrote:
> CC Kevin as I am not sure if Intel is aware of this issue; it
> breaks other hypervisors, e.g. Xen, as well.

It's actually more complicated.

When EPT A/D bits are disabled, reads of the page tables behave as
described in the manual; writes have both bit 0 and bit 1 set, while the
manual suggests only bit 1 is set.

Peter and David convinced me that it's a hypervisor bug, and I'm not
surprised that Xen has the same issue.  You have to disable EPT A/D bits
for shadow EPT page tables when the L1 hypervisor is not using them.
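As a rough illustration of that last point, here is a minimal standalone sketch, not KVM code. The helper name shadow_eptp() and the hw_has_ept_ad parameter are hypothetical; the only value taken from the real headers is VMX_EPT_AD_ENABLE_BIT (bit 6 of the EPTP). It assumes the shadow EPTP should carry the A/D enable bit only when L1's own EPTP sets it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 6 of the EPTP enables EPT accessed/dirty flags. */
#define VMX_EPT_AD_ENABLE_BIT (1ull << 6)

/*
 * Hypothetical helper: derive the EPTP that L0 loads for the shadow EPT
 * tables used while running L2. A/D bits are enabled only if the hardware
 * supports them AND the L1 hypervisor enabled them in its own EPTP.
 */
static uint64_t shadow_eptp(uint64_t eptp, uint64_t l1_eptp, bool hw_has_ept_ad)
{
    /* Start from "A/D disabled", whatever L0 uses for its own guests. */
    eptp &= ~VMX_EPT_AD_ENABLE_BIT;

    if (hw_has_ept_ad && (l1_eptp & VMX_EPT_AD_ENABLE_BIT))
        eptp |= VMX_EPT_AD_ENABLE_BIT;

    return eptp;
}
```

With this shape, an L1 that leaves A/D disabled never sees exit qualifications produced under A/D-enabled semantics.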

Paolo


Re: [PATCH 2/2] KVM: nVMX: fix nEPT handling of guest page table accesses

2017-05-12 Thread Xiao Guangrong


CC Kevin as I am not sure if Intel is aware of this issue; it
breaks other hypervisors, e.g. Xen, as well.

On 05/11/2017 07:23 PM, Paolo Bonzini wrote:

The new ept_access_test_paddr_read_only_ad_disabled testcase
caused an infinite stream of EPT violations because KVM did not
find anything bad in the page tables and kept re-executing the
faulting instruction.

This is because the exit qualification said we were reading from
the page tables, but actually the cause of the EPT violation
was writing the A/D bits.  This happened even with eptad=0, quite
surprisingly.

Thus, always treat guest page table accesses as read+write operations,
even if the exit qualification says otherwise.  This fixes the
testcase.

Signed-off-by: Paolo Bonzini 
---
  arch/x86/kvm/vmx.c | 36 +++-
  1 file changed, 23 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index c6f4ad44aa95..c868cbdad29a 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -6209,17 +6209,19 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
u32 error_code;
  
exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
+   gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
+   trace_kvm_page_fault(gpa, exit_qualification);
  
-   if (is_guest_mode(vcpu)
-   && !(exit_qualification & EPT_VIOLATION_GVA_TRANSLATED)) {
-   /*
-* Fix up exit_qualification according to whether guest
-* page table accesses are reads or writes.
-*/
-   u64 eptp = nested_ept_get_cr3(vcpu);
-   if (!(eptp & VMX_EPT_AD_ENABLE_BIT))
-   exit_qualification &= ~EPT_VIOLATION_ACC_WRITE;
-   }
+   /*
+* All guest page table accesses are potential writes to A/D bits,
+* but EPT microcode only reports them as such when EPT A/D is
+* enabled.  Tracing ept_access_test_paddr_read_only_ad_disabled (from
+* kvm-unit-tests) with eptad=0 and eptad=1 shows that the processor
+* does not change its behavior when EPTP enables A/D bits; the only
+* difference is in the exit qualification.  So fix this up here.
+*/
+   if (!(exit_qualification & EPT_VIOLATION_GVA_TRANSLATED))
+   exit_qualification |= EPT_VIOLATION_ACC_WRITE;
  
/*
 * EPT violation happened while executing iret from NMI,
@@ -6231,9 +6233,6 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
(exit_qualification & INTR_INFO_UNBLOCK_NMI))
vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO, 
GUEST_INTR_STATE_NMI);
  
-   gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
-   trace_kvm_page_fault(gpa, exit_qualification);
-
/* Is it a read fault? */
error_code = (exit_qualification & EPT_VIOLATION_ACC_READ)
 ? PFERR_USER_MASK : 0;
@@ -6250,6 +6249,17 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
  ? PFERR_PRESENT_MASK : 0;
  
vcpu->arch.gpa_available = true;
+
+   if (is_guest_mode(vcpu)
+   && !(exit_qualification & EPT_VIOLATION_GVA_TRANSLATED)) {
+   /*
+* Now fix up exit_qualification according to what the
+* L1 hypervisor expects to see.
+*/
+   u64 eptp = nested_ept_get_cr3(vcpu);
+   if (!(eptp & VMX_EPT_AD_ENABLE_BIT))
+   exit_qualification &= ~EPT_VIOLATION_ACC_WRITE;
+   }


I am not sure if this is really needed; it (PFEC.W = 0 if A/D bits need to be set on
the page structures) is not what we expect.

Maybe it is better to always report the right behavior? Especially since Intel may fix its
microcode, as it hurts the newest CPUs as well.

Thanks!


[PATCH 2/2] KVM: nVMX: fix nEPT handling of guest page table accesses

2017-05-11 Thread Paolo Bonzini
The new ept_access_test_paddr_read_only_ad_disabled testcase
caused an infinite stream of EPT violations because KVM did not
find anything bad in the page tables and kept re-executing the
faulting instruction.

This is because the exit qualification said we were reading from
the page tables, but actually the cause of the EPT violation
was writing the A/D bits.  This happened even with eptad=0, quite
surprisingly.

Thus, always treat guest page table accesses as read+write operations,
even if the exit qualification says otherwise.  This fixes the
testcase.

Signed-off-by: Paolo Bonzini 
---
 arch/x86/kvm/vmx.c | 36 +++-
 1 file changed, 23 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index c6f4ad44aa95..c868cbdad29a 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -6209,17 +6209,19 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
u32 error_code;
 
exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
+   gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
+   trace_kvm_page_fault(gpa, exit_qualification);
 
-   if (is_guest_mode(vcpu)
-   && !(exit_qualification & EPT_VIOLATION_GVA_TRANSLATED)) {
-   /*
-* Fix up exit_qualification according to whether guest
-* page table accesses are reads or writes.
-*/
-   u64 eptp = nested_ept_get_cr3(vcpu);
-   if (!(eptp & VMX_EPT_AD_ENABLE_BIT))
-   exit_qualification &= ~EPT_VIOLATION_ACC_WRITE;
-   }
+   /*
+* All guest page table accesses are potential writes to A/D bits,
+* but EPT microcode only reports them as such when EPT A/D is
+* enabled.  Tracing ept_access_test_paddr_read_only_ad_disabled (from
+* kvm-unit-tests) with eptad=0 and eptad=1 shows that the processor
+* does not change its behavior when EPTP enables A/D bits; the only
+* difference is in the exit qualification.  So fix this up here.
+*/
+   if (!(exit_qualification & EPT_VIOLATION_GVA_TRANSLATED))
+   exit_qualification |= EPT_VIOLATION_ACC_WRITE;
 
/*
 * EPT violation happened while executing iret from NMI,
@@ -6231,9 +6233,6 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
(exit_qualification & INTR_INFO_UNBLOCK_NMI))
vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO, 
GUEST_INTR_STATE_NMI);
 
-   gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
-   trace_kvm_page_fault(gpa, exit_qualification);
-
/* Is it a read fault? */
error_code = (exit_qualification & EPT_VIOLATION_ACC_READ)
 ? PFERR_USER_MASK : 0;
@@ -6250,6 +6249,17 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
  ? PFERR_PRESENT_MASK : 0;
 
vcpu->arch.gpa_available = true;
+
+   if (is_guest_mode(vcpu)
+   && !(exit_qualification & EPT_VIOLATION_GVA_TRANSLATED)) {
+   /*
+* Now fix up exit_qualification according to what the
+* L1 hypervisor expects to see.
+*/
+   u64 eptp = nested_ept_get_cr3(vcpu);
+   if (!(eptp & VMX_EPT_AD_ENABLE_BIT))
+   exit_qualification &= ~EPT_VIOLATION_ACC_WRITE;
+   }
vcpu->arch.exit_qualification = exit_qualification;
 
return kvm_mmu_page_fault(vcpu, gpa, error_code, NULL, 0);
-- 
1.8.3.1
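
For readers who want to follow the two fix-ups outside the kernel, the logic of the patch can be modelled as a small standalone sketch. The helper names qual_for_l0() and qual_for_l1() are hypothetical and do not exist in KVM; the bit definitions mirror the names used in the patch (exit qualification bits 0, 1 and 8, and EPTP bit 6). This is only an approximation of what handle_ept_violation() does with the exit qualification, under those assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/* EPT violation exit qualification bits, named as in the patch. */
#define EPT_VIOLATION_ACC_READ       (1ull << 0)
#define EPT_VIOLATION_ACC_WRITE      (1ull << 1)
#define EPT_VIOLATION_GVA_TRANSLATED (1ull << 8)

/* Bit 6 of the EPTP enables EPT accessed/dirty flags. */
#define VMX_EPT_AD_ENABLE_BIT        (1ull << 6)

/*
 * Hypothetical helper: the qualification L0's MMU should act on.
 * If the access was to a guest paging-structure entry (GVA_TRANSLATED
 * clear), treat it as read+write, since it may update A/D bits.
 */
static uint64_t qual_for_l0(uint64_t qual)
{
    if (!(qual & EPT_VIOLATION_GVA_TRANSLATED))
        qual |= EPT_VIOLATION_ACC_WRITE;
    return qual;
}

/*
 * Hypothetical helper: the qualification to reflect to L1.
 * If L1 runs with EPT A/D disabled, hide the write bit again for
 * paging-structure accesses, since that is what L1 expects to see.
 */
static uint64_t qual_for_l1(uint64_t qual, bool guest_mode, uint64_t l1_eptp)
{
    if (guest_mode &&
        !(qual & EPT_VIOLATION_GVA_TRANSLATED) &&
        !(l1_eptp & VMX_EPT_AD_ENABLE_BIT))
        qual &= ~EPT_VIOLATION_ACC_WRITE;
    return qual;
}
```

In the patch, the page fault error code is computed between the two steps, so L0's own MMU sees the read+write view while only the value saved in vcpu->arch.exit_qualification is adjusted for L1.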


