Re: [PATCH 2/2] KVM: nVMX: fix nEPT handling of guest page table accesses
On 12/05/2017 09:38, Xiao Guangrong wrote:
> CC Kevin as I am not sure if Intel is aware of this issue, it
> breaks other hypervisors, e.g. Xen, as well.

It's actually more complicated. When EPT A/D bits are disabled, reads of
the page tables behave as described in the manual; writes have both bit 0
and bit 1 of the exit qualification set, while the manual suggests only
bit 1 is set.

Peter and David convinced me that it's a hypervisor bug, and I'm not
surprised that Xen has the same issue. You have to disable EPT A/D bits
for shadow EPT page tables when the L1 hypervisor is not using them.

Paolo
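To make the bit numbers above concrete, here is a minimal sketch of how the relevant exit qualification bits could be decoded. The macro names follow the ones used in the patch, but their values are restated here so the snippet builds outside the kernel tree, and both helper functions are hypothetical names introduced only for illustration. Like the patch, the sketch tests bit 8 alone to recognize page table accesses.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Bits of the EPT-violation exit qualification. Names follow the patch;
 * values are restated here (assumption of this sketch) so that it
 * compiles on its own.
 */
#define EPT_VIOLATION_ACC_READ        (1ULL << 0)  /* bit 0: data read */
#define EPT_VIOLATION_ACC_WRITE       (1ULL << 1)  /* bit 1: data write */
#define EPT_VIOLATION_GVA_TRANSLATED  (1ULL << 8)  /* bit 8: access to a translated GVA */

/* Hypothetical helper: true if the access came from the guest page walk. */
static bool is_guest_page_table_access(uint64_t exit_qualification)
{
	return !(exit_qualification & EPT_VIOLATION_GVA_TRANSLATED);
}

/*
 * Hypothetical helper: true if the raw value matches the pattern reported
 * in this thread for page table writes with EPT A/D disabled, i.e. a page
 * table access with both bit 0 and bit 1 set.
 */
static bool looks_like_page_table_write(uint64_t exit_qualification)
{
	const uint64_t rw = EPT_VIOLATION_ACC_READ | EPT_VIOLATION_ACC_WRITE;

	return is_guest_page_table_access(exit_qualification) &&
	       (exit_qualification & rw) == rw;
}
```

A raw value of 0x1 is therefore a page table read, and 0x3 is the "read and write" pattern that the manual would lead one to expect as 0x2.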
Re: [PATCH 2/2] KVM: nVMX: fix nEPT handling of guest page table accesses
CC Kevin as I am not sure if Intel is aware of this issue, it
breaks other hypervisors, e.g. Xen, as well.

On 05/11/2017 07:23 PM, Paolo Bonzini wrote:
> The new ept_access_test_paddr_read_only_ad_disabled testcase
> caused an infinite stream of EPT violations because KVM did not
> find anything bad in the page tables and kept re-executing the
> faulting instruction.
>
> This is because the exit qualification said we were reading from
> the page tables, but actually the cause of the EPT violation was
> writing the A/D bits. This happened even with eptad=0, quite
> surprisingly.
>
> Thus, always treat guest page table accesses as read+write operations,
> even if the exit qualification says otherwise. This fixes the
> testcase.
>
> Signed-off-by: Paolo Bonzini
> ---
>  arch/x86/kvm/vmx.c | 36 +++-
>  1 file changed, 23 insertions(+), 13 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index c6f4ad44aa95..c868cbdad29a 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -6209,17 +6209,19 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
>  	u32 error_code;
>
>  	exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
> +	gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
> +	trace_kvm_page_fault(gpa, exit_qualification);
>
> -	if (is_guest_mode(vcpu)
> -	    && !(exit_qualification & EPT_VIOLATION_GVA_TRANSLATED)) {
> -		/*
> -		 * Fix up exit_qualification according to whether guest
> -		 * page table accesses are reads or writes.
> -		 */
> -		u64 eptp = nested_ept_get_cr3(vcpu);
> -		if (!(eptp & VMX_EPT_AD_ENABLE_BIT))
> -			exit_qualification &= ~EPT_VIOLATION_ACC_WRITE;
> -	}
> +	/*
> +	 * All guest page table accesses are potential writes to A/D bits.
> +	 * but EPT microcode only reports them as such when EPT A/D is
> +	 * enabled.  Tracing ept_access_test_paddr_read_only_ad_disabled (from
> +	 * kvm-unit-tests) with eptad=0 and eptad=1 shows that the processor
> +	 * does not change its behavior when EPTP enables A/D bits; the only
> +	 * difference is in the exit qualification.  So fix this up here.
> +	 */
> +	if (!(exit_qualification & EPT_VIOLATION_GVA_TRANSLATED))
> +		exit_qualification |= EPT_VIOLATION_ACC_WRITE;
>
>  	/*
>  	 * EPT violation happened while executing iret from NMI,
> @@ -6231,9 +6233,6 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
>  	    (exit_qualification & INTR_INFO_UNBLOCK_NMI))
>  		vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO, GUEST_INTR_STATE_NMI);
>
> -	gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
> -	trace_kvm_page_fault(gpa, exit_qualification);
> -
>  	/* Is it a read fault? */
>  	error_code = (exit_qualification & EPT_VIOLATION_ACC_READ)
>  		     ? PFERR_USER_MASK : 0;
> @@ -6250,6 +6249,17 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
>  		      ? PFERR_PRESENT_MASK : 0;
>
>  	vcpu->arch.gpa_available = true;
> +
> +	if (is_guest_mode(vcpu)
> +	    && !(exit_qualification & EPT_VIOLATION_GVA_TRANSLATED)) {
> +		/*
> +		 * Now fix up exit_qualification according to what the
> +		 * L1 hypervisor expects to see.
> +		 */
> +		u64 eptp = nested_ept_get_cr3(vcpu);
> +		if (!(eptp & VMX_EPT_AD_ENABLE_BIT))
> +			exit_qualification &= ~EPT_VIOLATION_ACC_WRITE;
> +	}

I am not sure if this is really needed; it (PFEC.W = 0 if A/D bits need
to be set on page structures) is not what we expect. Maybe always
reporting the right behavior is better? Especially since Intel may fix
its microcode, as it hurts the newest CPUs as well.

Thanks!
[PATCH 2/2] KVM: nVMX: fix nEPT handling of guest page table accesses
The new ept_access_test_paddr_read_only_ad_disabled testcase
caused an infinite stream of EPT violations because KVM did not
find anything bad in the page tables and kept re-executing the
faulting instruction.

This is because the exit qualification said we were reading from
the page tables, but actually the cause of the EPT violation was
writing the A/D bits. This happened even with eptad=0, quite
surprisingly.

Thus, always treat guest page table accesses as read+write operations,
even if the exit qualification says otherwise. This fixes the
testcase.

Signed-off-by: Paolo Bonzini
---
 arch/x86/kvm/vmx.c | 36 +++-
 1 file changed, 23 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index c6f4ad44aa95..c868cbdad29a 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -6209,17 +6209,19 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
 	u32 error_code;

 	exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
+	gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
+	trace_kvm_page_fault(gpa, exit_qualification);

-	if (is_guest_mode(vcpu)
-	    && !(exit_qualification & EPT_VIOLATION_GVA_TRANSLATED)) {
-		/*
-		 * Fix up exit_qualification according to whether guest
-		 * page table accesses are reads or writes.
-		 */
-		u64 eptp = nested_ept_get_cr3(vcpu);
-		if (!(eptp & VMX_EPT_AD_ENABLE_BIT))
-			exit_qualification &= ~EPT_VIOLATION_ACC_WRITE;
-	}
+	/*
+	 * All guest page table accesses are potential writes to A/D bits.
+	 * but EPT microcode only reports them as such when EPT A/D is
+	 * enabled.  Tracing ept_access_test_paddr_read_only_ad_disabled (from
+	 * kvm-unit-tests) with eptad=0 and eptad=1 shows that the processor
+	 * does not change its behavior when EPTP enables A/D bits; the only
+	 * difference is in the exit qualification.  So fix this up here.
+	 */
+	if (!(exit_qualification & EPT_VIOLATION_GVA_TRANSLATED))
+		exit_qualification |= EPT_VIOLATION_ACC_WRITE;

 	/*
 	 * EPT violation happened while executing iret from NMI,
@@ -6231,9 +6233,6 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
 	    (exit_qualification & INTR_INFO_UNBLOCK_NMI))
 		vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO, GUEST_INTR_STATE_NMI);

-	gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
-	trace_kvm_page_fault(gpa, exit_qualification);
-
 	/* Is it a read fault? */
 	error_code = (exit_qualification & EPT_VIOLATION_ACC_READ)
 		     ? PFERR_USER_MASK : 0;
@@ -6250,6 +6249,17 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
 		      ? PFERR_PRESENT_MASK : 0;

 	vcpu->arch.gpa_available = true;
+
+	if (is_guest_mode(vcpu)
+	    && !(exit_qualification & EPT_VIOLATION_GVA_TRANSLATED)) {
+		/*
+		 * Now fix up exit_qualification according to what the
+		 * L1 hypervisor expects to see.
+		 */
+		u64 eptp = nested_ept_get_cr3(vcpu);
+		if (!(eptp & VMX_EPT_AD_ENABLE_BIT))
+			exit_qualification &= ~EPT_VIOLATION_ACC_WRITE;
+	}
 	vcpu->arch.exit_qualification = exit_qualification;

 	return kvm_mmu_page_fault(vcpu, gpa, error_code, NULL, 0);
-- 
1.8.3.1
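
The two fix-ups in the patch above can be modeled outside the kernel as a pure function, which may make the ordering easier to follow. This is only a sketch under stated assumptions: the macro values are restated here rather than taken from kernel headers, and `struct ept_fixup`, `fixup_exit_qualification()`, `guest_mode` and `l1_eptp` are hypothetical names standing in for `is_guest_mode(vcpu)` and `nested_ept_get_cr3(vcpu)`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Names follow the patch; values restated so the sketch builds standalone. */
#define EPT_VIOLATION_ACC_WRITE       (1ULL << 1)  /* exit qualification bit 1 */
#define EPT_VIOLATION_GVA_TRANSLATED  (1ULL << 8)  /* exit qualification bit 8 */
#define VMX_EPT_AD_ENABLE_BIT         (1ULL << 6)  /* EPTP bit 6 */

/* Hypothetical result type: the two views of the exit qualification. */
struct ept_fixup {
	uint64_t host_view; /* what KVM derives its page-fault error code from */
	uint64_t l1_view;   /* what is stored for the L1 hypervisor to see */
};

static struct ept_fixup fixup_exit_qualification(uint64_t raw, bool guest_mode,
						 uint64_t l1_eptp)
{
	struct ept_fixup out;
	bool page_table_access = !(raw & EPT_VIOLATION_GVA_TRANSLATED);

	/* Step 1: every guest page table access is treated as a write. */
	out.host_view = raw;
	if (page_table_access)
		out.host_view |= EPT_VIOLATION_ACC_WRITE;

	/*
	 * Step 2: if L1 runs a nested guest without EPT A/D bits, clear the
	 * write bit again in the value that L1 will observe.
	 */
	out.l1_view = out.host_view;
	if (guest_mode && page_table_access &&
	    !(l1_eptp & VMX_EPT_AD_ENABLE_BIT))
		out.l1_view &= ~EPT_VIOLATION_ACC_WRITE;

	return out;
}
```

Note that step 2 clears the write bit unconditionally for page table accesses, as the patch does, so L1 sees bit 1 clear even when the raw value had it set. That is the behavior the reply in this thread questions.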