Re: [Xen-devel] Ping: [PATCH] VMX: sync CPU state upon vCPU destruction

2017-11-21 Thread Sergey Dyasli
On Tue, 2017-11-21 at 08:29 -0700, Jan Beulich wrote:
> > > > On 21.11.17 at 15:07,  wrote:
> > 
> > On 21/11/17 13:22, Jan Beulich wrote:
> > > > > > On 09.11.17 at 15:49,  wrote:
> > > > 
> > > > See the code comment being added for why we need this.
> > > > 
> > > > Reported-by: Igor Druzhinin 
> > > > Signed-off-by: Jan Beulich 
> > > 
> > > I realize we aren't settled yet on where to put the sync call. The
> > > discussion appears to have stalled, though. Just to recap,
> > > alternatives to the placement below are
> > > - at the top of complete_domain_destroy(), being the specific
> > >   RCU callback exhibiting the problem (others are unlikely to
> > >   touch guest state)
> > > - in rcu_do_batch(), paralleling the similar call from
> > >   do_tasklet_work()
> > 
> > rcu_do_batch() sounds better to me. As I said before I think that the
> > problem is general for the hypervisor (not for VMX only) and might
> > appear in other places as well.
> 
> The question here is: In what other cases do we expect an RCU
> callback to possibly touch guest state? I think the common use is
> to merely free some memory in a delayed fashion.
> 
> > Those choices that you outlined appear to be different in terms of whether
> > we solve the general problem and probably have some minor performance
> > impact, or we solve the ad-hoc problem but make the system more
> > entangled. Here I'm more inclined to the first choice, because in this
> > particular scenario the performance impact should be negligible.
> 
> For the problem at hand there's no question about a
> performance effect. The question is whether doing this for _other_
> RCU callbacks would introduce a performance drop in certain cases.

So what are the performance implications of my original suggestion of
removing the !v->is_running check from vmx_ctxt_switch_from()?
From what I can see (sketched below):

1. Another field in struct vcpu will be checked instead (vmcs_pa).
2. Additionally, this_cpu(current_vmcs) will be loaded, which shouldn't
   be terrible, given how heavy a context switch already is.
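
Roughly, the unconditional path would then look like this (just a sketch
to show where the extra cost goes, written from memory rather than taken
from a tested patch; vmx_load_vmcs() stands in for whatever the actual
reload helper does):

    static void vmx_ctxt_switch_from(struct vcpu *v)
    {
        /* ... save state as today ... */

        /* No "if ( !v->is_running )" guard any more. */
        vmx_vmcs_reload(v);
    }

    void vmx_vmcs_reload(struct vcpu *v)
    {
        /* The extra work in the common case: one field comparison. */
        if ( v->arch.hvm_vmx.vmcs_pa == this_cpu(current_vmcs) )
            return;

        /* VMPTRLD is executed only when the VMCS isn't loaded yet. */
        vmx_load_vmcs(v);
    }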

-- 
Thanks,
Sergey


Re: [Xen-devel] [PATCH] VMX: sync CPU state upon vCPU destruction

2017-11-10 Thread Sergey Dyasli
On Thu, 2017-11-09 at 07:49 -0700, Jan Beulich wrote:
> See the code comment being added for why we need this.
> 
> Reported-by: Igor Druzhinin 
> Signed-off-by: Jan Beulich 
> 
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -479,7 +479,13 @@ static void vmx_vcpu_destroy(struct vcpu
>   * we should disable PML manually here. Note that vmx_vcpu_destroy is called
>   * prior to vmx_domain_destroy so we need to disable PML for each vcpu
>   * separately here.
> + *
> + * Before doing that though, flush all state for the vCPU previously having
> + * run on the current CPU, so that this flushing of state won't happen from
> + * the TLB flush IPI handler behind the back of a vmx_vmcs_enter() /
> + * vmx_vmcs_exit() section.
>   */
> +sync_local_execstate();
>  vmx_vcpu_disable_pml(v);
>  vmx_destroy_vmcs(v);
>  passive_domain_destroy(v);

This patch fixes only one particular issue and not the general problem.
What if the VMCS is cleared at another place, possibly in some future code?

The original intent of vmx_vmcs_reload() is correct: it lazily loads
the vmcs when it's needed. It's just that the logic which checks
v->is_running inside vmx_ctxt_switch_from() is flawed: v might be
"running" on another pCPU.

IMHO there are 2 possible solutions (sketched below):

1. Add an additional pCPU check into vmx_ctxt_switch_from().
2. Drop the v->is_running check inside vmx_ctxt_switch_from(), making
   vmx_vmcs_reload() unconditional.
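
To make that concrete, the relevant hunk of vmx_ctxt_switch_from() could
change in one of these two ways (illustrative sketches only, not tested;
the v->processor/smp_processor_id() comparison is my guess at what the
"additional pCPU check" would look like):

    /* Option 1: also reload when v is marked running, but on another pCPU. */
    if ( !v->is_running || v->processor != smp_processor_id() )
        vmx_vmcs_reload(v);

    /* Option 2: always call it and rely on vmx_vmcs_reload() returning
     * early when this vCPU's VMCS is already the loaded one. */
    vmx_vmcs_reload(v);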

Thanks,
Sergey


Re: [Xen-devel] [PATCH v2 1/2] VMX: fix VMCS race on context-switch paths

2017-11-09 Thread Sergey Dyasli
On Thu, 2017-11-09 at 03:17 -0700, Jan Beulich wrote:
> > > > On 09.11.17 at 10:54,  wrote:
> > 
> > On Tue, 2017-11-07 at 14:24 +, Igor Druzhinin wrote:
> > > Perhaps I should improve my diagram:
> > > 
> > > pCPU1: vCPUx of domain X -> migrate to pCPU2 -> switch to idle
> > > context
> > > -> RCU callbacks -> vcpu_destroy(vCPUy of domain Y) ->
> > > vmx_vcpu_disable_pml() -> vmx_vmcs_clear() (VMCS is trashed at this
> > > point on pCPU1)
> > > 
> > > pCPU2: context switch into vCPUx -> vCPUx.is_running = 1 -> TLB flush
> > > from context switch to clean TLB on pCPU1
> > > 
> > 
> > Sorry, there must be something I'm missing (or misunderstanding).
> > 
> > What is this code that checks is_running and triggers the TLB flush?
> 
> I don't see where Igor said is_running is being checked around a
> TLB flush. The TLB flush itself is what happens first thing in
> context_switch() (and it's really using the TLB flush interface to
> mainly effect the state flush, with the TLB flush being an implied
> side effect; I've already got a series of further patches to make
> this less implicit).
> 
> > But, more important, how come you are context switching to something
> > that has is_running == 1 ? That should not be possible.
> 
> That's not what Igor's diagram says - it's indicating the fact that
> is_running is being set to 1 in the process of context switching
> into vCPUx.

Jan, Dario,

Igor was referring to the following situation:


pCPU1                                   pCPU2
=====                                   =====
current == vCPU1
context_switch(next == idle)
!! __context_switch() is skipped
vcpu_migrate(vCPU1)
RCU callbacks
vmx_vcpu_destroy()
vmx_vcpu_disable_pml()
current_vmcs = 0

                                        schedule(next == vCPU1)
                                        vCPU1->is_running = 1;
                                        context_switch(next == vCPU1)
                                        flush_tlb_mask(&dirty_mask);

                  <--- IPI

__sync_local_execstate()
__context_switch(prev == vCPU1)
vmx_ctxt_switch_from(vCPU1)
vCPU1->is_running == 1
!! vmx_vmcs_reload() is skipped


I hope that this better illustrates the root cause.

-- 
Thanks,
Sergey


[Xen-devel] [PATCH v1] x86/vvmx: don't enable vmcs shadowing for nested guests

2017-10-23 Thread Sergey Dyasli
Running "./xtf_runner vvmx" in L1 Xen under L0 Xen produces the
following result on H/W with VMCS shadowing:

Test: vmxon
Failure in test_vmxon_in_root_cpl0()
  Expected 0x8200000f: VMfailValid(15) VMXON_IN_ROOT
   Got 0x82004400: VMfailValid(17408) 
Test result: FAILURE

This happens because the SDM allows VM entries with the VMCS shadowing
VM-execution control enabled while the VMCS link pointer is ~0ull. But the
results of a nested VMREAD are undefined in such cases.

Fix this by not copying the value of the VMCS shadowing control from vmcs01
to vmcs02.

Signed-off-by: Sergey Dyasli <sergey.dya...@citrix.com>
---
 xen/arch/x86/hvm/vmx/vvmx.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index dde02c076b..013d049f8a 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -633,6 +633,7 @@ void nvmx_update_secondary_exec_control(struct vcpu *v,
 SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY;
 
 host_cntrl &= ~apicv_bit;
+host_cntrl &= ~SECONDARY_EXEC_ENABLE_VMCS_SHADOWING;
 shadow_cntrl = get_vvmcs(v, SECONDARY_VM_EXEC_CONTROL);
 
 /* No vAPIC-v support, so it shouldn't be set in vmcs12. */
-- 
2.11.0




[Xen-devel] [PATCH v4 7/7] x86/msr: handle VMX MSRs with guest_rd/wrmsr()

2017-10-18 Thread Sergey Dyasli
Now that each domain has a correct view of VMX MSRs in its per-domain
MSR policy, it's possible to handle a guest's RDMSR/WRMSR with the new
handlers. Do that and remove the old nvmx_msr_read_intercept() and
associated bits.

There is no functional change to what a guest sees in VMX MSRs.
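
For reference, the shape of the new read path in msr.c (the corresponding
hunk isn't visible in the quoted diff below) is roughly the following;
this is only a sketch of the idea discussed during review, with
availability taken from the CPUID policy and the value taken from the
policy's raw array:

    int guest_rdmsr(const struct vcpu *v, uint32_t msr, uint64_t *val)
    {
        const struct domain *d = v->domain;
        const struct msr_domain_policy *dp = d->arch.msr;

        switch ( msr )
        {
        case MSR_IA32_VMX_BASIC ... MSR_IA32_VMX_VMCS_ENUM:
            if ( !nestedhvm_enabled(d) || !d->arch.cpuid->basic.vmx )
                goto gp_fault;
            *val = dp->vmx.raw[msr - MSR_IA32_VMX_BASIC];
            break;

        /* ... similar blocks for PROCBASED_CTLS2, EPT_VPID_CAP,
         * TRUE_*_CTLS and VMFUNC, each with its own availability check ... */

        default:
            return X86EMUL_UNHANDLEABLE; /* defer to the existing handlers */
        }

        return X86EMUL_OKAY;

     gp_fault:
        return X86EMUL_EXCEPTION;
    }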

Signed-off-by: Sergey Dyasli <sergey.dya...@citrix.com>
---
 xen/arch/x86/hvm/vmx/vmx.c |   6 --
 xen/arch/x86/hvm/vmx/vvmx.c| 178 -
 xen/arch/x86/msr.c |  37 
 xen/include/asm-x86/hvm/vmx/vvmx.h |   2 -
 4 files changed, 37 insertions(+), 186 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index c2148701ee..1a1cb98069 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2906,10 +2906,6 @@ static int vmx_msr_read_intercept(unsigned int msr, 
uint64_t *msr_content)
 if ( nestedhvm_enabled(curr->domain) )
 *msr_content |= IA32_FEATURE_CONTROL_ENABLE_VMXON_OUTSIDE_SMX;
 break;
-case MSR_IA32_VMX_BASIC...MSR_IA32_VMX_VMFUNC:
-if ( !nvmx_msr_read_intercept(msr, msr_content) )
-goto gp_fault;
-break;
 case MSR_IA32_MISC_ENABLE:
 rdmsrl(MSR_IA32_MISC_ENABLE, *msr_content);
 /* Debug Trace Store is not supported. */
@@ -3133,8 +3129,6 @@ static int vmx_msr_write_intercept(unsigned int msr, 
uint64_t msr_content)
 break;
 }
 case MSR_IA32_FEATURE_CONTROL:
-case MSR_IA32_VMX_BASIC ... MSR_IA32_VMX_VMFUNC:
-/* None of these MSRs are writeable. */
 goto gp_fault;
 
 case MSR_P6_PERFCTR(0)...MSR_P6_PERFCTR(7):
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index 4d9ffc490c..b0474ad310 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -1976,184 +1976,6 @@ int nvmx_handle_invvpid(struct cpu_user_regs *regs)
 return X86EMUL_OKAY;
 }
 
-#define __emul_value(enable1, default1) \
-((enable1 | default1) << 32 | (default1))
-
-#define gen_vmx_msr(enable1, default1, host_value) \
-(((__emul_value(enable1, default1) & host_value) & (~0ul << 32)) | \
-((uint32_t)(__emul_value(enable1, default1) | host_value)))
-
-/*
- * Capability reporting
- */
-int nvmx_msr_read_intercept(unsigned int msr, u64 *msr_content)
-{
-struct vcpu *v = current;
-struct domain *d = v->domain;
-u64 data = 0, host_data = 0;
-int r = 1;
-
-/* VMX capablity MSRs are available only when guest supports VMX. */
-if ( !nestedhvm_enabled(d) || !d->arch.cpuid->basic.vmx )
-return 0;
-
-/*
- * These MSRs are only available when flags in other MSRs are set.
- * These prerequisites are listed in the Intel 64 and IA-32
- * Architectures Software Developer’s Manual, Vol 3, Appendix A.
- */
-switch ( msr )
-{
-case MSR_IA32_VMX_PROCBASED_CTLS2:
-if ( !cpu_has_vmx_secondary_exec_control )
-return 0;
-break;
-
-case MSR_IA32_VMX_EPT_VPID_CAP:
-if ( !(cpu_has_vmx_ept || cpu_has_vmx_vpid) )
-return 0;
-break;
-
-case MSR_IA32_VMX_TRUE_PINBASED_CTLS:
-case MSR_IA32_VMX_TRUE_PROCBASED_CTLS:
-case MSR_IA32_VMX_TRUE_EXIT_CTLS:
-case MSR_IA32_VMX_TRUE_ENTRY_CTLS:
-if ( !(vmx_basic_msr & VMX_BASIC_DEFAULT1_ZERO) )
-return 0;
-break;
-
-case MSR_IA32_VMX_VMFUNC:
-if ( !cpu_has_vmx_vmfunc )
-return 0;
-break;
-}
-
-rdmsrl(msr, host_data);
-
-/*
- * Remove unsupport features from n1 guest capability MSR
- */
-switch (msr) {
-case MSR_IA32_VMX_BASIC:
-{
-const struct vmcs_struct *vmcs =
-map_domain_page(_mfn(PFN_DOWN(v->arch.hvm_vmx.vmcs_pa)));
-
-data = (host_data & (~0ul << 32)) |
-   (vmcs->vmcs_revision_id & 0x7fff);
-unmap_domain_page(vmcs);
-break;
-}
-case MSR_IA32_VMX_PINBASED_CTLS:
-case MSR_IA32_VMX_TRUE_PINBASED_CTLS:
-/* 1-settings */
-data = PIN_BASED_EXT_INTR_MASK |
-   PIN_BASED_NMI_EXITING |
-   PIN_BASED_PREEMPT_TIMER;
-data = gen_vmx_msr(data, VMX_PINBASED_CTLS_DEFAULT1, host_data);
-break;
-case MSR_IA32_VMX_PROCBASED_CTLS:
-case MSR_IA32_VMX_TRUE_PROCBASED_CTLS:
-{
-u32 default1_bits = VMX_PROCBASED_CTLS_DEFAULT1;
-/* 1-settings */
-data = CPU_BASED_HLT_EXITING |
-   CPU_BASED_VIRTUAL_INTR_PENDING |
-   CPU_BASED_CR8_LOAD_EXITING |
-   CPU_BASED_CR8_STORE_EXITING |
-   CPU_BASED_INVLPG_EXITING |
-   CPU_BASED_CR3_LOAD_EXITING |
-   CPU_BASED_CR3_STORE_EXITING |
-   CPU_BASED_MONITOR_EXITING |
-   CPU_BASED_MWAIT_EXITING |
-   CPU_BASED_MOV_DR_EXITING |
-   CPU_BASED_AC

[Xen-devel] [PATCH v4 6/7] x86/msr: update domain policy on CPUID policy changes

2017-10-18 Thread Sergey Dyasli
Availability of some MSRs depends on certain CPUID bits. Add a function
recalculate_domain_msr_policy() which updates the availability of per-domain
MSRs based on the current domain's CPUID policy. This function is called
when the CPUID policy is changed by the toolstack.

Add recalculate_domain_vmx_msr_policy() which changes the availability of
VMX MSRs based on the domain's nested virt settings. Unavailable MSRs are
zeroed, which allows checking availability bits in them directly, without
preliminary checks (e.g. cpuid->basic.vmx, activate_secondary_controls,
enable_ept).
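
As a usage illustration of that property (the caller and helper here are
hypothetical, purely to show why the zeroing helps):

    /*
     * recalculate_domain_vmx_msr_policy() zeroes every VMX MSR for domains
     * without nested VMX, so a feature bit can be tested directly ...
     */
    if ( d->arch.msr->vmx_procbased_ctls2.allowed_1.enable_ept )
        setup_nested_ept(d);    /* hypothetical helper */

    /*
     * ... instead of the previous chain of prerequisite checks:
     * nestedhvm_enabled(d) && d->arch.cpuid->basic.vmx &&
     * activate_secondary_controls && enable_ept.
     */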

Signed-off-by: Sergey Dyasli <sergey.dya...@citrix.com>
---
 xen/arch/x86/domctl.c |  1 +
 xen/arch/x86/msr.c| 55 +++
 xen/include/asm-x86/msr.h |  3 +++
 3 files changed, 59 insertions(+)

diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index 80b4df9ec9..334c67d261 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -124,6 +124,7 @@ static int update_domain_cpuid_info(struct domain *d,
 }
 
 recalculate_cpuid_policy(d);
+recalculate_domain_msr_policy(d);
 
 switch ( ctl->input[0] )
 {
diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
index ff270befbb..9ea7447de3 100644
--- a/xen/arch/x86/msr.c
+++ b/xen/arch/x86/msr.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct msr_domain_policy __read_mostly raw_msr_domain_policy,
 __read_mostly    host_msr_domain_policy,
@@ -257,6 +258,59 @@ void __init init_guest_msr_policy(void)
 calculate_pv_max_policy();
 }
 
+static void recalculate_domain_vmx_msr_policy(struct domain *d)
+{
+struct msr_domain_policy *dp = d->arch.msr;
+
+if ( !nestedhvm_enabled(d) || !d->arch.cpuid->basic.vmx )
+{
+memset(dp->vmx.raw, 0, sizeof(dp->vmx.raw));
+dp->vmx_procbased_ctls2.raw = 0;
+dp->vmx_ept_vpid_cap.raw = 0;
+memset(dp->vmx_true_ctls.raw, 0, sizeof(dp->vmx_true_ctls.raw));
+dp->vmx_vmfunc.raw = 0;
+}
+else
+{
+memcpy(dp->vmx.raw, hvm_max_msr_domain_policy.vmx.raw,
+   sizeof(dp->vmx.raw));
+/* Get allowed CR4 bits from CPUID policy */
+dp->vmx.cr4_fixed1.raw = hvm_cr4_guest_valid_bits(d, false);
+
+if ( dp->vmx.procbased_ctls.allowed_1.activate_secondary_controls )
+{
+dp->vmx_procbased_ctls2.raw =
+hvm_max_msr_domain_policy.vmx_procbased_ctls2.raw;
+
+if ( dp->vmx_procbased_ctls2.allowed_1.enable_ept ||
+ dp->vmx_procbased_ctls2.allowed_1.enable_vpid )
+dp->vmx_ept_vpid_cap.raw =
+hvm_max_msr_domain_policy.vmx_ept_vpid_cap.raw;
+else
+dp->vmx_ept_vpid_cap.raw = 0;
+}
+else
+{
+dp->vmx_procbased_ctls2.raw = 0;
+dp->vmx_ept_vpid_cap.raw = 0;
+}
+
+if ( dp->vmx.basic.default1_zero )
+memcpy(dp->vmx_true_ctls.raw,
+   hvm_max_msr_domain_policy.vmx_true_ctls.raw,
+   sizeof(dp->vmx_true_ctls.raw));
+else
+memset(dp->vmx_true_ctls.raw, 0, sizeof(dp->vmx_true_ctls.raw));
+
+dp->vmx_vmfunc.raw = 0;
+}
+}
+
+void recalculate_domain_msr_policy(struct domain *d)
+{
+recalculate_domain_vmx_msr_policy(d);
+}
+
 int init_domain_msr_policy(struct domain *d)
 {
 struct msr_domain_policy *dp;
@@ -277,6 +331,7 @@ int init_domain_msr_policy(struct domain *d)
 }
 
 d->arch.msr = dp;
+recalculate_domain_msr_policy(d);
 
 return 0;
 }
diff --git a/xen/include/asm-x86/msr.h b/xen/include/asm-x86/msr.h
index 15551f..f19e113612 100644
--- a/xen/include/asm-x86/msr.h
+++ b/xen/include/asm-x86/msr.h
@@ -608,6 +608,9 @@ int init_vcpu_msr_policy(struct vcpu *v);
 int guest_rdmsr(const struct vcpu *v, uint32_t msr, uint64_t *val);
 int guest_wrmsr(struct vcpu *v, uint32_t msr, uint64_t val);
 
+/* Update availability of per-domain MSRs based on CPUID policy */
+void recalculate_domain_msr_policy(struct domain *d);
+
 #endif /* !__ASSEMBLY__ */
 
 #endif /* __ASM_MSR_H */
-- 
2.11.0




[Xen-devel] [PATCH v4 5/7] x86/cpuid: update signature of hvm_cr4_guest_valid_bits()

2017-10-18 Thread Sergey Dyasli
With the new cpuid infrastructure there is a domain-wide struct cpuid
policy and there is no need to pass a separate struct vcpu * into
hvm_cr4_guest_valid_bits() anymore. Make the function accept struct
domain * instead and update callers.

Signed-off-by: Sergey Dyasli <sergey.dya...@citrix.com>
---
 xen/arch/x86/hvm/domain.c   | 3 ++-
 xen/arch/x86/hvm/hvm.c  | 7 +++
 xen/arch/x86/hvm/svm/svmdebug.c | 4 ++--
 xen/arch/x86/hvm/vmx/vvmx.c | 2 +-
 xen/include/asm-x86/hvm/hvm.h   | 2 +-
 5 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/xen/arch/x86/hvm/domain.c b/xen/arch/x86/hvm/domain.c
index 60474649de..ce15ce0470 100644
--- a/xen/arch/x86/hvm/domain.c
+++ b/xen/arch/x86/hvm/domain.c
@@ -111,6 +111,7 @@ static int check_segment(struct segment_register *reg, enum 
x86_segment seg)
 /* Called by VCPUOP_initialise for HVM guests. */
 int arch_set_info_hvm_guest(struct vcpu *v, const vcpu_hvm_context_t *ctx)
 {
+const struct domain *d = v->domain;
 struct cpu_user_regs *uregs = &v->arch.user_regs;
 struct segment_register cs, ds, ss, es, tr;
 const char *errstr;
@@ -272,7 +273,7 @@ int arch_set_info_hvm_guest(struct vcpu *v, const 
vcpu_hvm_context_t *ctx)
 if ( v->arch.hvm_vcpu.guest_efer & EFER_LME )
 v->arch.hvm_vcpu.guest_efer |= EFER_LMA;
 
-if ( v->arch.hvm_vcpu.guest_cr[4] & ~hvm_cr4_guest_valid_bits(v, 0) )
+if ( v->arch.hvm_vcpu.guest_cr[4] & ~hvm_cr4_guest_valid_bits(d, false) )
 {
 gprintk(XENLOG_ERR, "Bad CR4 value: %#016lx\n",
 v->arch.hvm_vcpu.guest_cr[4]);
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 205b4cb685..1784c32c7e 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -928,9 +928,8 @@ const char *hvm_efer_valid(const struct vcpu *v, uint64_t 
value,
 X86_CR0_CD | X86_CR0_PG)))
 
 /* These bits in CR4 can be set by the guest. */
-unsigned long hvm_cr4_guest_valid_bits(const struct vcpu *v, bool restore)
+unsigned long hvm_cr4_guest_valid_bits(const struct domain *d, bool restore)
 {
-const struct domain *d = v->domain;
 const struct cpuid_policy *p;
 bool mce, vmxe;
 
@@ -997,7 +996,7 @@ static int hvm_load_cpu_ctxt(struct domain *d, 
hvm_domain_context_t *h)
 return -EINVAL;
 }
 
-if ( ctxt.cr4 & ~hvm_cr4_guest_valid_bits(v, 1) )
+if ( ctxt.cr4 & ~hvm_cr4_guest_valid_bits(d, true) )
 {
 printk(XENLOG_G_ERR "HVM%d restore: bad CR4 %#" PRIx64 "\n",
d->domain_id, ctxt.cr4);
@@ -2308,7 +2307,7 @@ int hvm_set_cr4(unsigned long value, bool_t may_defer)
 struct vcpu *v = current;
 unsigned long old_cr;
 
-if ( value & ~hvm_cr4_guest_valid_bits(v, 0) )
+if ( value & ~hvm_cr4_guest_valid_bits(v->domain, false) )
 {
 HVM_DBG_LOG(DBG_LEVEL_1,
 "Guest attempts to set reserved bit in CR4: %lx",
diff --git a/xen/arch/x86/hvm/svm/svmdebug.c b/xen/arch/x86/hvm/svm/svmdebug.c
index 89ef2db932..e25e3e0423 100644
--- a/xen/arch/x86/hvm/svm/svmdebug.c
+++ b/xen/arch/x86/hvm/svm/svmdebug.c
@@ -119,9 +119,9 @@ bool svm_vmcb_isvalid(const char *from, const struct 
vmcb_struct *vmcb,
(cr3 >> v->domain->arch.cpuid->extd.maxphysaddr))) )
 PRINTF("CR3: MBZ bits are set (%#"PRIx64")\n", cr3);
 
-if ( cr4 & ~hvm_cr4_guest_valid_bits(v, false) )
+if ( cr4 & ~hvm_cr4_guest_valid_bits(v->domain, false) )
 PRINTF("CR4: invalid bits are set (%#"PRIx64", valid: %#"PRIx64")\n",
-   cr4, hvm_cr4_guest_valid_bits(v, false));
+   cr4, hvm_cr4_guest_valid_bits(v->domain, false));
 
 if ( vmcb_get_dr6(vmcb) >> 32 )
 PRINTF("DR6: bits [63:32] are not zero (%#"PRIx64")\n",
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index dde02c076b..4d9ffc490c 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -2136,7 +2136,7 @@ int nvmx_msr_read_intercept(unsigned int msr, u64 
*msr_content)
 data = X86_CR4_VMXE;
 break;
 case MSR_IA32_VMX_CR4_FIXED1:
-data = hvm_cr4_guest_valid_bits(v, 0);
+data = hvm_cr4_guest_valid_bits(d, false);
 break;
 case MSR_IA32_VMX_MISC:
 /* Do not support CR3-target feature now */
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index b687e03dce..47a5f7916d 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -612,7 +612,7 @@ static inline bool altp2m_vcpu_emulate_ve(struct vcpu *v)
 /* Check CR4/EFER values */
 const char *hvm_efer_valid(const struct vcpu *v, uint64_t value,
signed int cr0_pg);
-unsigned long hvm_cr4_guest_valid_bits(const struct vcpu *v, bool restore);
+unsigned long hvm_cr4_guest_v

[Xen-devel] [PATCH v4 1/7] x86/msr: add Raw and Host domain policies

2017-10-18 Thread Sergey Dyasli
Raw policy contains the actual values from H/W MSRs. The PLATFORM_INFO MSR
needs to be read again because probe_intel_cpuid_faulting() records
the presence of X86_FEATURE_CPUID_FAULTING but not the presence of the MSR
itself (if CPUID faulting is not available).

Host policy might have certain features disabled if Xen decides not
to use them. For now, make Host policy equal to Raw policy.

Signed-off-by: Sergey Dyasli <sergey.dya...@citrix.com>
---
 xen/arch/x86/msr.c | 26 +-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
index baba44f43d..9737ed706e 100644
--- a/xen/arch/x86/msr.c
+++ b/xen/arch/x86/msr.c
@@ -24,12 +24,34 @@
 #include 
 #include 
 
-struct msr_domain_policy __read_mostly hvm_max_msr_domain_policy,
+struct msr_domain_policy __read_mostly raw_msr_domain_policy,
 __read_mostly    host_msr_domain_policy,
+ __read_mostly hvm_max_msr_domain_policy,
  __read_mostly  pv_max_msr_domain_policy;
 
 struct msr_vcpu_policy __read_mostly hvm_max_msr_vcpu_policy,
__read_mostly  pv_max_msr_vcpu_policy;
 
+static void __init calculate_raw_policy(void)
+{
+struct msr_domain_policy *dp = &raw_msr_domain_policy;
+uint64_t val;
+
+if ( rdmsr_safe(MSR_INTEL_PLATFORM_INFO, val) == 0 )
+{
+dp->plaform_info.available = true;
+if ( val & MSR_PLATFORM_INFO_CPUID_FAULTING )
+dp->plaform_info.cpuid_faulting = true;
+}
+}
+
+static void __init calculate_host_policy(void)
+{
+struct msr_domain_policy *dp = &host_msr_domain_policy;
+
+*dp = raw_msr_domain_policy;
+}
+
 static void __init calculate_hvm_max_policy(void)
 {
struct msr_domain_policy *dp = &hvm_max_msr_domain_policy;
@@ -67,6 +89,8 @@ static void __init calculate_pv_max_policy(void)
 
 void __init init_guest_msr_policy(void)
 {
+calculate_raw_policy();
+calculate_host_policy();
 calculate_hvm_max_policy();
 calculate_pv_max_policy();
 }
-- 
2.11.0




[Xen-devel] [PATCH v4 4/7] x86/msr: add VMX MSRs into HVM_max domain policy

2017-10-18 Thread Sergey Dyasli
Currently, when nested virt is enabled, the set of L1 VMX features
is fixed and calculated by nvmx_msr_read_intercept() as an intersection
between the full set of Xen's supported L1 VMX features, the set of
actual H/W features and, for MSR_IA32_VMX_EPT_VPID_CAP, the set of
features that Xen uses.

Add calculate_hvm_max_vmx_policy() which will save the end result of
nvmx_msr_read_intercept() on current H/W into HVM_max domain policy.
There will be no functional change to what L1 sees in VMX MSRs. But the
actual use of HVM_max domain policy will happen later, when VMX MSRs
are handled by guest_rd/wrmsr().

Signed-off-by: Sergey Dyasli <sergey.dya...@citrix.com>
---
 xen/arch/x86/msr.c | 129 +
 1 file changed, 129 insertions(+)

diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
index 7ac0fceb49..ff270befbb 100644
--- a/xen/arch/x86/msr.c
+++ b/xen/arch/x86/msr.c
@@ -85,6 +85,133 @@ static void __init calculate_host_policy(void)
 *dp = raw_msr_domain_policy;
 }
 
+#define vmx_host_allowed_cpy(dp, msr, field)\
+do {\
+dp->msr.allowed_1.field =   \
+host_msr_domain_policy.msr.allowed_1.field; \
+dp->msr.allowed_0.field =   \
+host_msr_domain_policy.msr.allowed_0.field; \
+} while (0)
+
+#define vmx_host_allowed_cpyb(dp, block, msr, field)\
+do {\
+dp->block.msr.allowed_1.field = \
+host_msr_domain_policy.block.msr.allowed_1.field;   \
+dp->block.msr.allowed_0.field = \
+host_msr_domain_policy.block.msr.allowed_0.field;   \
+} while (0)
+
+static void __init calculate_hvm_max_vmx_policy(struct msr_domain_policy *dp)
+{
+if ( !cpu_has_vmx )
+return;
+
+dp->vmx.basic.raw = host_msr_domain_policy.vmx.basic.raw;
+
+dp->vmx.pinbased_ctls.raw = ((uint64_t) VMX_PINBASED_CTLS_DEFAULT1 << 32) |
+VMX_PINBASED_CTLS_DEFAULT1;
+vmx_host_allowed_cpyb(dp, vmx, pinbased_ctls, ext_intr_exiting);
+vmx_host_allowed_cpyb(dp, vmx, pinbased_ctls, nmi_exiting);
+vmx_host_allowed_cpyb(dp, vmx, pinbased_ctls, preempt_timer);
+
+dp->vmx.procbased_ctls.raw =
+((uint64_t) VMX_PROCBASED_CTLS_DEFAULT1 << 32) |
+VMX_PROCBASED_CTLS_DEFAULT1;
+vmx_host_allowed_cpyb(dp, vmx, procbased_ctls, virtual_intr_pending);
+vmx_host_allowed_cpyb(dp, vmx, procbased_ctls, use_tsc_offseting);
+vmx_host_allowed_cpyb(dp, vmx, procbased_ctls, hlt_exiting);
+vmx_host_allowed_cpyb(dp, vmx, procbased_ctls, invlpg_exiting);
+vmx_host_allowed_cpyb(dp, vmx, procbased_ctls, mwait_exiting);
+vmx_host_allowed_cpyb(dp, vmx, procbased_ctls, rdpmc_exiting);
+vmx_host_allowed_cpyb(dp, vmx, procbased_ctls, rdtsc_exiting);
+vmx_host_allowed_cpyb(dp, vmx, procbased_ctls, cr8_load_exiting);
+vmx_host_allowed_cpyb(dp, vmx, procbased_ctls, cr8_store_exiting);
+vmx_host_allowed_cpyb(dp, vmx, procbased_ctls, tpr_shadow);
+vmx_host_allowed_cpyb(dp, vmx, procbased_ctls, virtual_nmi_pending);
+vmx_host_allowed_cpyb(dp, vmx, procbased_ctls, mov_dr_exiting);
+vmx_host_allowed_cpyb(dp, vmx, procbased_ctls, uncond_io_exiting);
+vmx_host_allowed_cpyb(dp, vmx, procbased_ctls, activate_io_bitmap);
+vmx_host_allowed_cpyb(dp, vmx, procbased_ctls, monitor_trap_flag);
+vmx_host_allowed_cpyb(dp, vmx, procbased_ctls, activate_msr_bitmap);
+vmx_host_allowed_cpyb(dp, vmx, procbased_ctls, monitor_exiting);
+vmx_host_allowed_cpyb(dp, vmx, procbased_ctls, pause_exiting);
+vmx_host_allowed_cpyb(dp, vmx, procbased_ctls, 
activate_secondary_controls);
+
+dp->vmx.exit_ctls.raw = ((uint64_t) VMX_EXIT_CTLS_DEFAULT1 << 32) |
+VMX_EXIT_CTLS_DEFAULT1;
+vmx_host_allowed_cpyb(dp, vmx, exit_ctls, ia32e_mode);
+vmx_host_allowed_cpyb(dp, vmx, exit_ctls, load_perf_global_ctrl);
+vmx_host_allowed_cpyb(dp, vmx, exit_ctls, ack_intr_on_exit);
+vmx_host_allowed_cpyb(dp, vmx, exit_ctls, save_guest_pat);
+vmx_host_allowed_cpyb(dp, vmx, exit_ctls, load_host_pat);
+vmx_host_allowed_cpyb(dp, vmx, exit_ctls, save_guest_efer);
+vmx_host_allowed_cpyb(dp, vmx, exit_ctls, load_host_efer);
+vmx_host_allowed_cpyb(dp, vmx, exit_ctls, save_preempt_timer);
+
+dp->vmx.entry_ctls.raw = ((uint64_t) VMX_ENTRY_CTLS_DEFAULT1 << 32) |
+ VMX_ENTRY_CTLS_DEFAULT1;
+vmx_host_allowed_cpyb(dp, vmx, entry_ctls, ia32e_mode);
+vmx_host_allowed_cpyb(dp, vmx, entry_ctls, load_perf_glo

[Xen-devel] [PATCH v4 3/7] x86/msr: read VMX MSRs values into Raw policy

2017-10-18 Thread Sergey Dyasli
Add calculate_raw_vmx_policy() which fills Raw policy with H/W values
of VMX MSRs. Host policy will contain a copy of these values.

Signed-off-by: Sergey Dyasli <sergey.dya...@citrix.com>
---
 xen/arch/x86/msr.c | 33 +
 1 file changed, 33 insertions(+)

diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
index 9dc3de8ce1..7ac0fceb49 100644
--- a/xen/arch/x86/msr.c
+++ b/xen/arch/x86/msr.c
@@ -32,6 +32,37 @@ struct msr_domain_policy __read_mostly 
raw_msr_domain_policy,
 struct msr_vcpu_policy __read_mostly hvm_max_msr_vcpu_policy,
__read_mostly  pv_max_msr_vcpu_policy;
 
+static void __init calculate_raw_vmx_policy(struct msr_domain_policy *dp)
+{
+unsigned int i;
+
+if ( !cpu_has_vmx )
+return;
+
+for ( i = MSR_IA32_VMX_BASIC; i <= MSR_IA32_VMX_VMCS_ENUM; i++ )
+rdmsrl(i, dp->vmx.raw[i - MSR_IA32_VMX_BASIC]);
+
+if ( dp->vmx.procbased_ctls.allowed_1.activate_secondary_controls )
+{
+rdmsrl(MSR_IA32_VMX_PROCBASED_CTLS2, dp->vmx_procbased_ctls2.raw);
+
+if ( dp->vmx_procbased_ctls2.allowed_1.enable_ept ||
+ dp->vmx_procbased_ctls2.allowed_1.enable_vpid )
+rdmsrl(MSR_IA32_VMX_EPT_VPID_CAP, dp->vmx_ept_vpid_cap.raw);
+}
+
+if ( dp->vmx.basic.default1_zero )
+{
+for ( i = MSR_IA32_VMX_TRUE_PINBASED_CTLS;
+  i <= MSR_IA32_VMX_TRUE_ENTRY_CTLS; i++ )
+rdmsrl(i,
+   dp->vmx_true_ctls.raw[i - MSR_IA32_VMX_TRUE_PINBASED_CTLS]);
+}
+
+if ( dp->vmx_procbased_ctls2.allowed_1.enable_vm_functions )
+rdmsrl(MSR_IA32_VMX_VMFUNC, dp->vmx_vmfunc.raw);
+}
+
 static void __init calculate_raw_policy(void)
 {
struct msr_domain_policy *dp = &raw_msr_domain_policy;
@@ -43,6 +74,8 @@ static void __init calculate_raw_policy(void)
 if ( val & MSR_PLATFORM_INFO_CPUID_FAULTING )
 dp->plaform_info.cpuid_faulting = true;
 }
+
+calculate_raw_vmx_policy(dp);
 }
 
 static void __init calculate_host_policy(void)
-- 
2.11.0




[Xen-devel] [PATCH v4 0/7] VMX MSRs policy for Nested Virt: part 1

2017-10-18 Thread Sergey Dyasli
The end goal of having a VMX MSR policy is to be able to manage
L1 VMX features. This patch series is the first part of this work.
There is no functional change to what L1 sees in VMX MSRs at this
point. But each domain will have a policy object which makes it possible
to sensibly query what VMX features the domain has. This will unblock
some other nested virtualization work items.

Currently, when nested virt is enabled, the set of L1 VMX features
is fixed and calculated by nvmx_msr_read_intercept() as an intersection
between the full set of Xen's supported L1 VMX features, the set of
actual H/W features and, for MSR_IA32_VMX_EPT_VPID_CAP, the set of
features that Xen uses.

The above makes the L1 VMX feature set inconsistent across different H/W,
and there is no way to control which features are available to L1.
The overall set of issues has much in common with CPUID policy.

Part 1 adds VMX MSRs into struct msr_domain_policy and initializes them
during domain creation based on CPUID policy. In the future it should be
possible to independently configure values of VMX MSRs for each domain.

v3 --> v4:
- VMX MSRs are now separated into 5 logical blocks
- Per MSR availability flags are dropped
- Separate patch for hvm_cr4_guest_valid_bits() is added

v2 --> v3:
- Rebase on top of Generic MSR Policy
- Each VMX MSR now has its own availability flag
- VMX MSRs are now completely defined during domain creation
  (all CPUID policy changes are taken into account)

Sergey Dyasli (7):
  x86/msr: add Raw and Host domain policies
  x86/msr: add VMX MSRs into struct msr_domain_policy
  x86/msr: read VMX MSRs values into Raw policy
  x86/msr: add VMX MSRs into HVM_max domain policy
  x86/cpuid: update signature of hvm_cr4_guest_valid_bits()
  x86/msr: update domain policy on CPUID policy changes
  x86/msr: handle VMX MSRs with guest_rd/wrmsr()

 xen/arch/x86/domctl.c  |   1 +
 xen/arch/x86/hvm/domain.c  |   3 +-
 xen/arch/x86/hvm/hvm.c |   7 +-
 xen/arch/x86/hvm/svm/svmdebug.c|   4 +-
 xen/arch/x86/hvm/vmx/vmx.c |   6 -
 xen/arch/x86/hvm/vmx/vvmx.c| 178 --
 xen/arch/x86/msr.c | 343 -
 xen/include/asm-x86/hvm/hvm.h  |   2 +-
 xen/include/asm-x86/hvm/vmx/vvmx.h |   2 -
 xen/include/asm-x86/msr.h  | 376 +
 10 files changed, 727 insertions(+), 195 deletions(-)

-- 
2.11.0




[Xen-devel] [PATCH v4 2/7] x86/msr: add VMX MSRs into struct msr_domain_policy

2017-10-18 Thread Sergey Dyasli
New definitions provide a convenient way of accessing the contents of
VMX MSRs. They are separated into 5 logical blocks:

1. vmx: [VMX_BASIC, VMX_VMCS_ENUM]
2. VMX_PROCBASED_CTLS2
3. VMX_EPT_VPID_CAP
4. vmx_true_ctls: [VMX_TRUE_PINBASED_CTLS, VMX_TRUE_ENTRY_CTLS]
5. VMX_VMFUNC

Every bit value is accessible by its name, and bit names match Xen's
existing definitions as closely as possible. There is a "raw" 64-bit field
for each MSR, as well as "raw" arrays for the vmx and vmx_true_ctls blocks.
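
For example, with these definitions the same information can be reached
either by bit name or through the raw view (illustrative fragment only):

    const struct msr_domain_policy *dp = d->arch.msr;

    /* By name, via the bitfield layout ... */
    bool intr_exiting = dp->vmx.pinbased_ctls.allowed_1.ext_intr_exiting;

    /* ... or via the raw views ("raw" per MSR, plus a raw[] array
     * covering the whole [VMX_BASIC, VMX_VMCS_ENUM] block). */
    uint64_t basic = dp->vmx.basic.raw;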

Signed-off-by: Sergey Dyasli <sergey.dya...@citrix.com>
---
 xen/arch/x86/msr.c|  63 
 xen/include/asm-x86/msr.h | 373 ++
 2 files changed, 436 insertions(+)

diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
index 9737ed706e..9dc3de8ce1 100644
--- a/xen/arch/x86/msr.c
+++ b/xen/arch/x86/msr.c
@@ -216,6 +216,69 @@ int guest_wrmsr(struct vcpu *v, uint32_t msr, uint64_t val)
 return X86EMUL_EXCEPTION;
 }
 
+static void __init __maybe_unused build_assertions(void)
+{
+struct msr_domain_policy dp;
+
+BUILD_BUG_ON(sizeof(dp.vmx.basic) !=
+ sizeof(dp.vmx.basic.raw));
+BUILD_BUG_ON(sizeof(dp.vmx.pinbased_ctls) !=
+ sizeof(dp.vmx.pinbased_ctls.raw));
+BUILD_BUG_ON(sizeof(dp.vmx.procbased_ctls) !=
+ sizeof(dp.vmx.procbased_ctls.raw));
+BUILD_BUG_ON(sizeof(dp.vmx.exit_ctls) !=
+ sizeof(dp.vmx.exit_ctls.raw));
+BUILD_BUG_ON(sizeof(dp.vmx.entry_ctls) !=
+ sizeof(dp.vmx.entry_ctls.raw));
+BUILD_BUG_ON(sizeof(dp.vmx.misc) !=
+ sizeof(dp.vmx.misc.raw));
+BUILD_BUG_ON(sizeof(dp.vmx.cr0_fixed0) !=
+ sizeof(dp.vmx.cr0_fixed0.raw));
+BUILD_BUG_ON(sizeof(dp.vmx.cr0_fixed1) !=
+ sizeof(dp.vmx.cr0_fixed1.raw));
+BUILD_BUG_ON(sizeof(dp.vmx.cr4_fixed0) !=
+ sizeof(dp.vmx.cr4_fixed0.raw));
+BUILD_BUG_ON(sizeof(dp.vmx.cr4_fixed1) !=
+ sizeof(dp.vmx.cr4_fixed1.raw));
+BUILD_BUG_ON(sizeof(dp.vmx.vmcs_enum) !=
+ sizeof(dp.vmx.vmcs_enum.raw));
+BUILD_BUG_ON(sizeof(dp.vmx.raw) !=
+ sizeof(dp.vmx.basic) +
+ sizeof(dp.vmx.pinbased_ctls) +
+ sizeof(dp.vmx.procbased_ctls) +
+ sizeof(dp.vmx.exit_ctls) +
+ sizeof(dp.vmx.entry_ctls) +
+ sizeof(dp.vmx.misc) +
+ sizeof(dp.vmx.cr0_fixed0) +
+ sizeof(dp.vmx.cr0_fixed1) +
+ sizeof(dp.vmx.cr4_fixed0) +
+ sizeof(dp.vmx.cr4_fixed1) +
+ sizeof(dp.vmx.vmcs_enum));
+
+BUILD_BUG_ON(sizeof(dp.vmx_procbased_ctls2) !=
+ sizeof(dp.vmx_procbased_ctls2.raw));
+
+BUILD_BUG_ON(sizeof(dp.vmx_ept_vpid_cap) !=
+ sizeof(dp.vmx_ept_vpid_cap.raw));
+
+BUILD_BUG_ON(sizeof(dp.vmx_true_ctls.pinbased) !=
+ sizeof(dp.vmx_true_ctls.pinbased.raw));
+BUILD_BUG_ON(sizeof(dp.vmx_true_ctls.procbased) !=
+ sizeof(dp.vmx_true_ctls.procbased.raw));
+BUILD_BUG_ON(sizeof(dp.vmx_true_ctls.exit) !=
+ sizeof(dp.vmx_true_ctls.exit.raw));
+BUILD_BUG_ON(sizeof(dp.vmx_true_ctls.entry) !=
+ sizeof(dp.vmx_true_ctls.entry.raw));
+BUILD_BUG_ON(sizeof(dp.vmx_true_ctls.raw) !=
+ sizeof(dp.vmx_true_ctls.pinbased) +
+ sizeof(dp.vmx_true_ctls.procbased) +
+ sizeof(dp.vmx_true_ctls.exit) +
+ sizeof(dp.vmx_true_ctls.entry));
+
+BUILD_BUG_ON(sizeof(dp.vmx_vmfunc) !=
+ sizeof(dp.vmx_vmfunc.raw));
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/include/asm-x86/msr.h b/xen/include/asm-x86/msr.h
index 751fa25a36..15551f 100644
--- a/xen/include/asm-x86/msr.h
+++ b/xen/include/asm-x86/msr.h
@@ -202,6 +202,171 @@ void write_efer(u64 val);
 
 DECLARE_PER_CPU(u32, ler_msr);
 
+union vmx_pin_based_exec_control_bits {
+uint32_t raw;
+struct {
+bool ext_intr_exiting:1;
+uint32_t :2;  /* 1:2 reserved */
+bool  nmi_exiting:1;
+uint32_t :1;  /* 4 reserved */
+bool virtual_nmis:1;
+boolpreempt_timer:1;
+bool posted_interrupt:1;
+uint32_t :24; /* 8:31 reserved */
+};
+};
+
+union vmx_cpu_based_exec_control_bits {
+uint32_t raw;
+struct {
+uint32_t:2;  /* 0:1 reserved */
+boolvirtual_intr_pending:1;
+bool   use_tsc_offseting:1;
+uint32_t:3;  /* 4:6 reserved */
+bool hlt_exiting:1;
+uint32_t:1;  /* 8 reserved */
+bool  invlpg_exiting:1;
+bool   mwait_exiting:1;
+bool

Re: [Xen-devel] [PATCH v3 2/6] x86/msr: add VMX MSRs into struct msr_domain_policy

2017-10-18 Thread Sergey Dyasli
On Mon, 2017-10-16 at 15:01 +0100, Andrew Cooper wrote:
> On 16/10/17 08:42, Sergey Dyasli wrote:
> > +
> > +secondary_available =
> > +dp->vmx_procbased_ctls.u.allowed_1.activate_secondary_controls;
> > +
> > +switch (msr)
> > +{
> > +case MSR_IA32_VMX_BASIC ... MSR_IA32_VMX_VMCS_ENUM:
> > +return true;
> > +
> > +case MSR_IA32_VMX_PROCBASED_CTLS2:
> > +return secondary_available;
> > +
> > +case MSR_IA32_VMX_EPT_VPID_CAP:
> > +return ( secondary_available &&
> > + (dp->vmx_procbased_ctls2.u.allowed_1.enable_ept ||
> > +  dp->vmx_procbased_ctls2.u.allowed_1.enable_vpid) );
> 
> This check can be made more efficient in two ways.  First, use a bitwise
> rather than logical or, which allows both _ept and _vpid to be tested
> with a single instruction, rather than a conditional branch.

But it's the compiler's job to optimize conditions like that.
I'm getting the following asm:

if ( dp->vmx_procbased_ctls2.allowed_1.enable_ept ||
82d08027bc3d:   48 c1 e8 20     shr    $0x20,%rax
82d08027bc41:   a8 22           test   $0x22,%al
82d08027bc43:   74 0d           je     82d08027bc52 <recalculate_domain_vmx_msr_policy+0x196>

And "test   $0x22" is exactly the test for "enable_ept || enable_vpid"
with a single instruction.
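
A standalone illustration of the same effect (a toy, not Xen code; ept and
vpid sit at bits 1 and 5 of the allowed-1 word, which is where the 0x22
mask comes from):

    #include <stdbool.h>

    struct ctls2_allowed_1 {
        unsigned int             :1;   /* bit 0 */
        unsigned int enable_ept  :1;   /* bit 1 */
        unsigned int             :3;
        unsigned int enable_vpid :1;   /* bit 5 */
        unsigned int             :26;
    };

    bool ept_or_vpid(const struct ctls2_allowed_1 *c)
    {
        /* GCC at -O2 typically emits a single "test $0x22" for this. */
        return c->enable_ept || c->enable_vpid;
    }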

-- 
Thanks,
Sergey


Re: [Xen-devel] [PATCH v3 6/6] x86/msr: handle VMX MSRs with guest_rd/wrmsr()

2017-10-16 Thread Sergey Dyasli
On Fri, 2017-10-13 at 16:38 +0100, Andrew Cooper wrote:
> On 13/10/17 13:35, Sergey Dyasli wrote:
> > diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
> > index a22e3dfaf2..2527fdd1d1 100644
> > --- a/xen/arch/x86/msr.c
> > +++ b/xen/arch/x86/msr.c
> > @@ -426,6 +426,13 @@ int init_vcpu_msr_policy(struct vcpu *v)
> >  return 0;
> >  }
> >  
> > +#define vmx_guest_rdmsr(dp, name, msr) \
> > +case name: \
> > +if ( !dp->msr.available )  \
> > +goto gp_fault; \
> > +*val = dp->msr.u.raw;  \
> > +break;
> 
> Eww :(
> 
> For blocks of MSRs, it would be far better to go with the same structure
> as the cpuid policy.  Something like:
> 
> struct {
>     union {
>         uint64_t raw[NR_VMX_MSRS];
>         struct {
>             struct {
>                 ...
>             } basic;
>             struct {
>                 ...
>             } pinbased_ctls;
>         };
>     };
> } vmx;
> 
> This way, the guest_rdmsr() will be far more efficient.
> 
> case MSR_IA32_VMX_BASIC ... xxx:
>     if ( !cpuid->basic.vmx )
>         goto gp_fault;
>     *val = dp->vmx.raw[msr - MSR_IA32_VMX_BASIC];
>     break;
> 
> It would probably be worth splitting into a couple of different blocks
> based on the different availability checks.

I can understand an argument about removing the available flags and getting
a smaller msr policy struct, but I fail to see how a large number of case
statements will make guest_rdmsr() inefficient. I expect a switch
statement to have O(log(N)) complexity, which means it doesn't really
matter how many case statements there are.

Do you have some other performance concerns?
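
For what it's worth, a toy example (nothing Xen-specific) of the pattern in
question: for reasonably dense case values, compilers usually lower such a
switch to a bounds check plus jump table, or to a balanced compare tree when
the values are sparse, so lookup cost is O(1) or O(log(N)) rather than a
linear chain of branches:

    /* 0x480 .. 0x48a is the MSR_IA32_VMX_BASIC .. VMCS_ENUM range. */
    int classify(unsigned int msr)
    {
        switch ( msr )
        {
        case 0x480: return 1;
        case 0x481: return 2;
        case 0x482: return 3;
        /* ... */
        case 0x48a: return 11;
        default:    return 0;
        }
    }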

-- 
Thanks,
Sergey


Re: [Xen-devel] [PATCH v3 5/6] x86/msr: update domain policy on CPUID policy changes

2017-10-16 Thread Sergey Dyasli
On Fri, 2017-10-13 at 16:25 +0100, Andrew Cooper wrote:
> On 13/10/17 13:35, Sergey Dyasli wrote:
> > diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> > index 205b4cb685..7e6b15f8d7 100644
> > --- a/xen/arch/x86/hvm/hvm.c
> > +++ b/xen/arch/x86/hvm/hvm.c
> > @@ -928,9 +928,8 @@ const char *hvm_efer_valid(const struct vcpu *v, 
> > uint64_t value,
> >  X86_CR0_CD | X86_CR0_PG)))
> >  
> >  /* These bits in CR4 can be set by the guest. */
> > -unsigned long hvm_cr4_guest_valid_bits(const struct vcpu *v, bool restore)
> > +unsigned long hvm_cr4_domain_valid_bits(const struct domain *d, bool 
> > restore)
> >  {
> > -const struct domain *d = v->domain;
> >  const struct cpuid_policy *p;
> >  bool mce, vmxe;
> >  
> > @@ -963,6 +962,11 @@ unsigned long hvm_cr4_guest_valid_bits(const struct 
> > vcpu *v, bool restore)
> >  (p->feat.pku  ? X86_CR4_PKE   : 0));
> >  }
> >  
> > +unsigned long hvm_cr4_guest_valid_bits(const struct vcpu *v, bool restore)
> 
> I'd split this change out into a separate patch and change the existing
> guest valid bits to taking a domain *.
> 
> It needed to take vcpu in the past because of the old cpuid
> infrastructure, but it doesn't need to any more because of the
> domain-wide struct cpuid policy.

That was one of the possibilities, so I really needed a maintainer's opinion
on this. Thanks for providing one!

-- 
Thanks,
Sergey


Re: [Xen-devel] [PATCH v3 2/6] x86/msr: add VMX MSRs into struct msr_domain_policy

2017-10-16 Thread Sergey Dyasli
On Fri, 2017-10-13 at 16:16 +0100, Andrew Cooper wrote:
> On 13/10/17 13:35, Sergey Dyasli wrote:
> > @@ -210,6 +375,255 @@ struct msr_domain_policy
> >  bool available; /* This MSR is non-architectural */
> >  bool cpuid_faulting;
> >  } plaform_info;
> > +
> > +/* 0x0480  MSR_IA32_VMX_BASIC */
> > +struct {
> > +bool available;
> 
> We don't need available bits for any of these MSRs.  Their availability
> is cpuid->basic.vmx, and we don't want (let alone need) to duplicate
> information like this.

Andrew,

What do you think about the following way of checking the availability?

diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
index 2527fdd1d1..828f1bb503 100644
--- a/xen/arch/x86/msr.c
+++ b/xen/arch/x86/msr.c
@@ -33,6 +33,43 @@ struct msr_domain_policy __read_mostly 
raw_msr_domain_policy,
 struct msr_vcpu_policy __read_mostly hvm_max_msr_vcpu_policy,
__read_mostly  pv_max_msr_vcpu_policy;
 
+bool msr_vmx_available(const struct domain *d, uint32_t msr)
+{
+const struct msr_domain_policy *dp = d->arch.msr;
+bool secondary_available;
+
+if ( !nestedhvm_enabled(d) || !d->arch.cpuid->basic.vmx )
+return false;
+
+secondary_available =
+dp->vmx_procbased_ctls.u.allowed_1.activate_secondary_controls;
+
+switch (msr)
+{
+case MSR_IA32_VMX_BASIC ... MSR_IA32_VMX_VMCS_ENUM:
+return true;
+
+case MSR_IA32_VMX_PROCBASED_CTLS2:
+return secondary_available;
+
+case MSR_IA32_VMX_EPT_VPID_CAP:
+return ( secondary_available &&
+ (dp->vmx_procbased_ctls2.u.allowed_1.enable_ept ||
+  dp->vmx_procbased_ctls2.u.allowed_1.enable_vpid) );
+
+case MSR_IA32_VMX_TRUE_PINBASED_CTLS ... MSR_IA32_VMX_TRUE_ENTRY_CTLS:
+return dp->vmx_basic.u.default1_zero;
+
+case MSR_IA32_VMX_VMFUNC:
+return ( secondary_available &&
+ dp->vmx_procbased_ctls2.u.allowed_1.enable_vm_functions );
+
+default: break;
+}
+
+return false;
+}
+
 static void __init calculate_raw_vmx_policy(struct msr_domain_policy *dp)
 {
 if ( !cpu_has_vmx )

-- 
Thanks,
Sergey


[Xen-devel] [PATCH v3 3/6] x86/msr: read VMX MSRs values into Raw policy

2017-10-13 Thread Sergey Dyasli
Add calculate_raw_vmx_policy() which fills Raw policy with H/W values
of VMX MSRs. Host policy will contain a copy of these values.

Signed-off-by: Sergey Dyasli <sergey.dya...@citrix.com>
---
 xen/arch/x86/msr.c | 77 ++
 1 file changed, 77 insertions(+)

diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
index 24029a2ac1..955aba0849 100644
--- a/xen/arch/x86/msr.c
+++ b/xen/arch/x86/msr.c
@@ -32,6 +32,81 @@ struct msr_domain_policy __read_mostly 
raw_msr_domain_policy,
 struct msr_vcpu_policy __read_mostly hvm_max_msr_vcpu_policy,
__read_mostly  pv_max_msr_vcpu_policy;
 
+static void __init calculate_raw_vmx_policy(struct msr_domain_policy *dp)
+{
+if ( !cpu_has_vmx )
+return;
+
+dp->vmx_basic.available = true;
+rdmsrl(MSR_IA32_VMX_BASIC, dp->vmx_basic.u.raw);
+
+dp->vmx_pinbased_ctls.available = true;
+rdmsrl(MSR_IA32_VMX_PINBASED_CTLS, dp->vmx_pinbased_ctls.u.raw);
+
+dp->vmx_procbased_ctls.available = true;
+rdmsrl(MSR_IA32_VMX_PROCBASED_CTLS, dp->vmx_procbased_ctls.u.raw);
+
+dp->vmx_exit_ctls.available = true;
+rdmsrl(MSR_IA32_VMX_EXIT_CTLS, dp->vmx_exit_ctls.u.raw);
+
+dp->vmx_entry_ctls.available = true;
+rdmsrl(MSR_IA32_VMX_ENTRY_CTLS, dp->vmx_entry_ctls.u.raw);
+
+dp->vmx_misc.available = true;
+rdmsrl(MSR_IA32_VMX_MISC, dp->vmx_misc.u.raw);
+
+dp->vmx_cr0_fixed0.available = true;
+rdmsrl(MSR_IA32_VMX_CR0_FIXED0, dp->vmx_cr0_fixed0.u.raw);
+
+dp->vmx_cr0_fixed1.available = true;
+rdmsrl(MSR_IA32_VMX_CR0_FIXED1, dp->vmx_cr0_fixed1.u.raw);
+
+dp->vmx_cr4_fixed0.available = true;
+rdmsrl(MSR_IA32_VMX_CR4_FIXED0, dp->vmx_cr4_fixed0.u.raw);
+
+dp->vmx_cr4_fixed1.available = true;
+rdmsrl(MSR_IA32_VMX_CR4_FIXED1, dp->vmx_cr4_fixed1.u.raw);
+
+dp->vmx_vmcs_enum.available = true;
+rdmsrl(MSR_IA32_VMX_VMCS_ENUM, dp->vmx_vmcs_enum.u.raw);
+
+if ( dp->vmx_procbased_ctls.u.allowed_1.activate_secondary_controls )
+{
+dp->vmx_procbased_ctls2.available = true;
+rdmsrl(MSR_IA32_VMX_PROCBASED_CTLS2, dp->vmx_procbased_ctls2.u.raw);
+
+if ( dp->vmx_procbased_ctls2.u.allowed_1.enable_ept ||
+ dp->vmx_procbased_ctls2.u.allowed_1.enable_vpid )
+{
+dp->vmx_ept_vpid_cap.available = true;
+rdmsrl(MSR_IA32_VMX_EPT_VPID_CAP, dp->vmx_ept_vpid_cap.u.raw);
+}
+}
+
+if ( dp->vmx_basic.u.default1_zero )
+{
+dp->vmx_true_pinbased_ctls.available = true;
+rdmsrl(MSR_IA32_VMX_TRUE_PINBASED_CTLS,
+   dp->vmx_true_pinbased_ctls.u.raw);
+
+dp->vmx_true_procbased_ctls.available = true;
+rdmsrl(MSR_IA32_VMX_TRUE_PROCBASED_CTLS,
+   dp->vmx_true_procbased_ctls.u.raw);
+
+dp->vmx_true_exit_ctls.available = true;
+rdmsrl(MSR_IA32_VMX_TRUE_EXIT_CTLS, dp->vmx_true_exit_ctls.u.raw);
+
+dp->vmx_true_entry_ctls.available = true;
+rdmsrl(MSR_IA32_VMX_TRUE_ENTRY_CTLS, dp->vmx_true_entry_ctls.u.raw);
+}
+
+if ( dp->vmx_procbased_ctls2.u.allowed_1.enable_vm_functions )
+{
+dp->vmx_vmfunc.available = true;
+rdmsrl(MSR_IA32_VMX_VMFUNC, dp->vmx_vmfunc.u.raw);
+}
+}
+
 static void __init calculate_raw_policy(void)
 {
struct msr_domain_policy *dp = &raw_msr_domain_policy;
@@ -43,6 +118,8 @@ static void __init calculate_raw_policy(void)
 if ( val & MSR_PLATFORM_INFO_CPUID_FAULTING )
 dp->plaform_info.cpuid_faulting = true;
 }
+
+calculate_raw_vmx_policy(dp);
 }
 
 static void __init calculate_host_policy(void)
-- 
2.11.0




[Xen-devel] [PATCH v3 4/6] x86/msr: add VMX MSRs into HVM_max domain policy

2017-10-13 Thread Sergey Dyasli
Currently, when nested virt is enabled, the set of L1 VMX features
is fixed and calculated by nvmx_msr_read_intercept() as an intersection
between the full set of Xen's supported L1 VMX features, the set of
actual H/W features and, for MSR_IA32_VMX_EPT_VPID_CAP, the set of
features that Xen uses.

Add calculate_hvm_max_vmx_policy() which will save the end result of
nvmx_msr_read_intercept() on current H/W into HVM_max domain policy.
There will be no functional change to what L1 sees in VMX MSRs. But the
actual use of HVM_max domain policy will happen later, when VMX MSRs
are handled by guest_rd/wrmsr().

Signed-off-by: Sergey Dyasli <sergey.dya...@citrix.com>
---
 xen/arch/x86/msr.c | 140 +
 1 file changed, 140 insertions(+)

diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
index 955aba0849..388f19e50d 100644
--- a/xen/arch/x86/msr.c
+++ b/xen/arch/x86/msr.c
@@ -129,6 +129,144 @@ static void __init calculate_host_policy(void)
 *dp = raw_msr_domain_policy;
 }
 
+#define vmx_host_allowed_cpy(dp, msr, field) \
+do { \
+dp->msr.u.allowed_1.field =  \
+host_msr_domain_policy.msr.u.allowed_1.field;\
+dp->msr.u.allowed_0.field =  \
+host_msr_domain_policy.msr.u.allowed_0.field;\
+} while (0)
+
+static void __init calculate_hvm_max_vmx_policy(struct msr_domain_policy *dp)
+{
+if ( !cpu_has_vmx )
+return;
+
+dp->vmx_basic.available = true;
+dp->vmx_basic.u.raw = host_msr_domain_policy.vmx_basic.u.raw;
+
+dp->vmx_pinbased_ctls.available = true;
+dp->vmx_pinbased_ctls.u.raw =
+((uint64_t) VMX_PINBASED_CTLS_DEFAULT1 << 32) |
+VMX_PINBASED_CTLS_DEFAULT1;
+vmx_host_allowed_cpy(dp, vmx_pinbased_ctls, ext_intr_exiting);
+vmx_host_allowed_cpy(dp, vmx_pinbased_ctls, nmi_exiting);
+vmx_host_allowed_cpy(dp, vmx_pinbased_ctls, preempt_timer);
+
+dp->vmx_procbased_ctls.available = true;
+dp->vmx_procbased_ctls.u.raw =
+((uint64_t) VMX_PROCBASED_CTLS_DEFAULT1 << 32) |
+VMX_PROCBASED_CTLS_DEFAULT1;
+vmx_host_allowed_cpy(dp, vmx_procbased_ctls, virtual_intr_pending);
+vmx_host_allowed_cpy(dp, vmx_procbased_ctls, use_tsc_offseting);
+vmx_host_allowed_cpy(dp, vmx_procbased_ctls, hlt_exiting);
+vmx_host_allowed_cpy(dp, vmx_procbased_ctls, invlpg_exiting);
+vmx_host_allowed_cpy(dp, vmx_procbased_ctls, mwait_exiting);
+vmx_host_allowed_cpy(dp, vmx_procbased_ctls, rdpmc_exiting);
+vmx_host_allowed_cpy(dp, vmx_procbased_ctls, rdtsc_exiting);
+vmx_host_allowed_cpy(dp, vmx_procbased_ctls, cr8_load_exiting);
+vmx_host_allowed_cpy(dp, vmx_procbased_ctls, cr8_store_exiting);
+vmx_host_allowed_cpy(dp, vmx_procbased_ctls, tpr_shadow);
+vmx_host_allowed_cpy(dp, vmx_procbased_ctls, virtual_nmi_pending);
+vmx_host_allowed_cpy(dp, vmx_procbased_ctls, mov_dr_exiting);
+vmx_host_allowed_cpy(dp, vmx_procbased_ctls, uncond_io_exiting);
+vmx_host_allowed_cpy(dp, vmx_procbased_ctls, activate_io_bitmap);
+vmx_host_allowed_cpy(dp, vmx_procbased_ctls, monitor_trap_flag);
+vmx_host_allowed_cpy(dp, vmx_procbased_ctls, activate_msr_bitmap);
+vmx_host_allowed_cpy(dp, vmx_procbased_ctls, monitor_exiting);
+vmx_host_allowed_cpy(dp, vmx_procbased_ctls, pause_exiting);
+vmx_host_allowed_cpy(dp, vmx_procbased_ctls, activate_secondary_controls);
+
+dp->vmx_exit_ctls.available = true;
+dp->vmx_exit_ctls.u.raw =
+((uint64_t) VMX_EXIT_CTLS_DEFAULT1 << 32) |
+VMX_EXIT_CTLS_DEFAULT1;
+vmx_host_allowed_cpy(dp, vmx_exit_ctls, ia32e_mode);
+vmx_host_allowed_cpy(dp, vmx_exit_ctls, load_perf_global_ctrl);
+vmx_host_allowed_cpy(dp, vmx_exit_ctls, ack_intr_on_exit);
+vmx_host_allowed_cpy(dp, vmx_exit_ctls, save_guest_pat);
+vmx_host_allowed_cpy(dp, vmx_exit_ctls, load_host_pat);
+vmx_host_allowed_cpy(dp, vmx_exit_ctls, save_guest_efer);
+vmx_host_allowed_cpy(dp, vmx_exit_ctls, load_host_efer);
+vmx_host_allowed_cpy(dp, vmx_exit_ctls, save_preempt_timer);
+
+dp->vmx_entry_ctls.available = true;
+dp->vmx_entry_ctls.u.raw =
+((uint64_t) VMX_ENTRY_CTLS_DEFAULT1 << 32) |
+VMX_ENTRY_CTLS_DEFAULT1;
+vmx_host_allowed_cpy(dp, vmx_entry_ctls, ia32e_mode);
+vmx_host_allowed_cpy(dp, vmx_entry_ctls, load_perf_global_ctrl);
+vmx_host_allowed_cpy(dp, vmx_entry_ctls, load_guest_pat);
+vmx_host_allowed_cpy(dp, vmx_entry_ctls, load_guest_efer);
+
+dp->vmx_misc.available = true;
+dp->vmx_misc.u.raw = host_msr_domain_policy.vmx_misc.u.raw;
+/* Do not support CR3-target feature now */
+dp->vmx_misc.u.cr3_target = false;
+
+dp->vmx_cr0_fixed0.available = true;
+/* PG, PE bits must be 1 in VMX operation */
+dp->vm

[Xen-devel] [PATCH v3 0/6] VMX MSRs policy for Nested Virt: part 1

2017-10-13 Thread Sergey Dyasli
The end goal of having a VMX MSR policy is to be able to manage
L1 VMX features. This patch series is the first part of this work.
There is no functional change to what L1 sees in VMX MSRs at this
point. But each domain will have a policy object which makes it possible
to sensibly query what VMX features the domain has. This will unblock
some other nested virtualization work items.

Currently, when nested virt is enabled, the set of L1 VMX features
is fixed and calculated by nvmx_msr_read_intercept() as an intersection
between the full set of Xen's supported L1 VMX features, the set of
actual H/W features and, for MSR_IA32_VMX_EPT_VPID_CAP, the set of
features that Xen uses.

The above makes the L1 VMX feature set inconsistent across different H/W,
and there is no way to control which features are available to L1.
The overall set of issues has much in common with CPUID policy.

Part 1 adds VMX MSRs into struct msr_domain_policy and initializes them
during domain creation based on CPUID policy. In the future it should be
possible to independently configure values of VMX MSRs for each domain.

v2 --> v3:
- Rebase on top of Generic MSR Policy
- Each VMX MSR now has its own availability flag
- VMX MSRs are now completely defined during domain creation
  (all CPUID policy changes are taken into account)

Sergey Dyasli (6):
  x86/msr: add Raw and Host domain policies
  x86/msr: add VMX MSRs into struct msr_domain_policy
  x86/msr: read VMX MSRs values into Raw policy
  x86/msr: add VMX MSRs into HVM_max domain policy
  x86/msr: update domain policy on CPUID policy changes
  x86/msr: handle VMX MSRs with guest_rd/wrmsr()

 xen/arch/x86/domctl.c  |   1 +
 xen/arch/x86/hvm/hvm.c |   8 +-
 xen/arch/x86/hvm/vmx/vmx.c |   6 -
 xen/arch/x86/hvm/vmx/vvmx.c| 178 
 xen/arch/x86/msr.c | 387 +-
 xen/include/asm-x86/hvm/hvm.h  |   1 +
 xen/include/asm-x86/hvm/vmx/vvmx.h |   2 -
 xen/include/asm-x86/msr.h  | 417 +
 8 files changed, 811 insertions(+), 189 deletions(-)

-- 
2.11.0




[Xen-devel] [PATCH v3 6/6] x86/msr: handle VMX MSRs with guest_rd/wrmsr()

2017-10-13 Thread Sergey Dyasli
Now that each domain has a correct view of VMX MSRs in its per-domain
MSR policy, it's possible to handle a guest's RDMSR/WRMSR with the new
handlers. Do that and remove the old nvmx_msr_read_intercept() and
associated bits.

There is no functional change to what a guest sees in VMX MSRs.

Signed-off-by: Sergey Dyasli <sergey.dya...@citrix.com>
---
 xen/arch/x86/hvm/vmx/vmx.c |   6 --
 xen/arch/x86/hvm/vmx/vvmx.c| 178 -
 xen/arch/x86/msr.c |  34 +++
 xen/include/asm-x86/hvm/vmx/vvmx.h |   2 -
 4 files changed, 34 insertions(+), 186 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index c2148701ee..1a1cb98069 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2906,10 +2906,6 @@ static int vmx_msr_read_intercept(unsigned int msr, 
uint64_t *msr_content)
 if ( nestedhvm_enabled(curr->domain) )
 *msr_content |= IA32_FEATURE_CONTROL_ENABLE_VMXON_OUTSIDE_SMX;
 break;
-case MSR_IA32_VMX_BASIC...MSR_IA32_VMX_VMFUNC:
-if ( !nvmx_msr_read_intercept(msr, msr_content) )
-goto gp_fault;
-break;
 case MSR_IA32_MISC_ENABLE:
 rdmsrl(MSR_IA32_MISC_ENABLE, *msr_content);
 /* Debug Trace Store is not supported. */
@@ -3133,8 +3129,6 @@ static int vmx_msr_write_intercept(unsigned int msr, 
uint64_t msr_content)
 break;
 }
 case MSR_IA32_FEATURE_CONTROL:
-case MSR_IA32_VMX_BASIC ... MSR_IA32_VMX_VMFUNC:
-/* None of these MSRs are writeable. */
 goto gp_fault;
 
 case MSR_P6_PERFCTR(0)...MSR_P6_PERFCTR(7):
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index dde02c076b..b0474ad310 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -1976,184 +1976,6 @@ int nvmx_handle_invvpid(struct cpu_user_regs *regs)
 return X86EMUL_OKAY;
 }
 
-#define __emul_value(enable1, default1) \
-((enable1 | default1) << 32 | (default1))
-
-#define gen_vmx_msr(enable1, default1, host_value) \
-(((__emul_value(enable1, default1) & host_value) & (~0ul << 32)) | \
-((uint32_t)(__emul_value(enable1, default1) | host_value)))
-
-/*
- * Capability reporting
- */
-int nvmx_msr_read_intercept(unsigned int msr, u64 *msr_content)
-{
-struct vcpu *v = current;
-struct domain *d = v->domain;
-u64 data = 0, host_data = 0;
-int r = 1;
-
-/* VMX capablity MSRs are available only when guest supports VMX. */
-if ( !nestedhvm_enabled(d) || !d->arch.cpuid->basic.vmx )
-return 0;
-
-/*
- * These MSRs are only available when flags in other MSRs are set.
- * These prerequisites are listed in the Intel 64 and IA-32
- * Architectures Software Developer’s Manual, Vol 3, Appendix A.
- */
-switch ( msr )
-{
-case MSR_IA32_VMX_PROCBASED_CTLS2:
-if ( !cpu_has_vmx_secondary_exec_control )
-return 0;
-break;
-
-case MSR_IA32_VMX_EPT_VPID_CAP:
-if ( !(cpu_has_vmx_ept || cpu_has_vmx_vpid) )
-return 0;
-break;
-
-case MSR_IA32_VMX_TRUE_PINBASED_CTLS:
-case MSR_IA32_VMX_TRUE_PROCBASED_CTLS:
-case MSR_IA32_VMX_TRUE_EXIT_CTLS:
-case MSR_IA32_VMX_TRUE_ENTRY_CTLS:
-if ( !(vmx_basic_msr & VMX_BASIC_DEFAULT1_ZERO) )
-return 0;
-break;
-
-case MSR_IA32_VMX_VMFUNC:
-if ( !cpu_has_vmx_vmfunc )
-return 0;
-break;
-}
-
-rdmsrl(msr, host_data);
-
-/*
- * Remove unsupport features from n1 guest capability MSR
- */
-switch (msr) {
-case MSR_IA32_VMX_BASIC:
-{
-const struct vmcs_struct *vmcs =
-map_domain_page(_mfn(PFN_DOWN(v->arch.hvm_vmx.vmcs_pa)));
-
-data = (host_data & (~0ul << 32)) |
-   (vmcs->vmcs_revision_id & 0x7fff);
-unmap_domain_page(vmcs);
-break;
-}
-case MSR_IA32_VMX_PINBASED_CTLS:
-case MSR_IA32_VMX_TRUE_PINBASED_CTLS:
-/* 1-settings */
-data = PIN_BASED_EXT_INTR_MASK |
-   PIN_BASED_NMI_EXITING |
-   PIN_BASED_PREEMPT_TIMER;
-data = gen_vmx_msr(data, VMX_PINBASED_CTLS_DEFAULT1, host_data);
-break;
-case MSR_IA32_VMX_PROCBASED_CTLS:
-case MSR_IA32_VMX_TRUE_PROCBASED_CTLS:
-{
-u32 default1_bits = VMX_PROCBASED_CTLS_DEFAULT1;
-/* 1-settings */
-data = CPU_BASED_HLT_EXITING |
-   CPU_BASED_VIRTUAL_INTR_PENDING |
-   CPU_BASED_CR8_LOAD_EXITING |
-   CPU_BASED_CR8_STORE_EXITING |
-   CPU_BASED_INVLPG_EXITING |
-   CPU_BASED_CR3_LOAD_EXITING |
-   CPU_BASED_CR3_STORE_EXITING |
-   CPU_BASED_MONITOR_EXITING |
-   CPU_BASED_MWAIT_EXITING |
-   CPU_BASED_MOV_DR_EXITING |
-   CPU_BASED_AC

[Xen-devel] [PATCH v3 1/6] x86/msr: add Raw and Host domain policies

2017-10-13 Thread Sergey Dyasli
Raw policy contains the actual values from H/W MSRs. The PLATFORM_INFO MSR
needs to be read again because probe_intel_cpuid_faulting() records
the presence of X86_FEATURE_CPUID_FAULTING but not the presence of the MSR
itself (if CPUID faulting is not available).

Host policy might have certain features disabled if Xen decides not
to use them. For now, make Host policy equal to Raw policy.

Signed-off-by: Sergey Dyasli <sergey.dya...@citrix.com>
---
 xen/arch/x86/msr.c | 26 +-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
index baba44f43d..9737ed706e 100644
--- a/xen/arch/x86/msr.c
+++ b/xen/arch/x86/msr.c
@@ -24,12 +24,34 @@
 #include 
 #include 
 
-struct msr_domain_policy __read_mostly hvm_max_msr_domain_policy,
+struct msr_domain_policy __read_mostly raw_msr_domain_policy,
+ __read_mostly host_msr_domain_policy,
+ __read_mostly hvm_max_msr_domain_policy,
  __read_mostly  pv_max_msr_domain_policy;
 
 struct msr_vcpu_policy __read_mostly hvm_max_msr_vcpu_policy,
__read_mostly  pv_max_msr_vcpu_policy;
 
+static void __init calculate_raw_policy(void)
+{
+struct msr_domain_policy *dp = &raw_msr_domain_policy;
+uint64_t val;
+
+if ( rdmsr_safe(MSR_INTEL_PLATFORM_INFO, val) == 0 )
+{
+dp->plaform_info.available = true;
+if ( val & MSR_PLATFORM_INFO_CPUID_FAULTING )
+dp->plaform_info.cpuid_faulting = true;
+}
+}
+
+static void __init calculate_host_policy(void)
+{
+struct msr_domain_policy *dp = &host_msr_domain_policy;
+
+*dp = raw_msr_domain_policy;
+}
+
 static void __init calculate_hvm_max_policy(void)
 {
 struct msr_domain_policy *dp = &hvm_max_msr_domain_policy;
@@ -67,6 +89,8 @@ static void __init calculate_pv_max_policy(void)
 
 void __init init_guest_msr_policy(void)
 {
+calculate_raw_policy();
+calculate_host_policy();
 calculate_hvm_max_policy();
 calculate_pv_max_policy();
 }
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v3 2/6] x86/msr: add VMX MSRs into struct msr_domain_policy

2017-10-13 Thread Sergey Dyasli
New definitions provide a convenient way of accessing contents of
VMX MSRs: every bit value is accessible by its name and there is a
"raw" 64-bit msr value. Bit names match existing Xen's definitions
as close as possible.
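
As a purely illustrative sketch (not part of the patch), the intended usage
is to test control bits by name instead of open-coded masks; the allowed_1
member is assumed here to follow the same layout that a later patch in this
series uses for the procbased controls:

    /* Hypothetical helper, for illustration only. */
    static bool nested_preempt_timer_allowed(const struct msr_domain_policy *dp)
    {
        /* "Allowed 1-settings" live in the upper half of the control MSR;
         * with the new unions they can be read by bit name. */
        return dp->vmx_pinbased_ctls.u.allowed_1.preempt_timer;
    }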

Signed-off-by: Sergey Dyasli <sergey.dya...@citrix.com>
---
 xen/arch/x86/msr.c|  42 +
 xen/include/asm-x86/msr.h | 414 ++
 2 files changed, 456 insertions(+)

diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
index 9737ed706e..24029a2ac1 100644
--- a/xen/arch/x86/msr.c
+++ b/xen/arch/x86/msr.c
@@ -216,6 +216,48 @@ int guest_wrmsr(struct vcpu *v, uint32_t msr, uint64_t val)
 return X86EMUL_EXCEPTION;
 }
 
+static void __init __maybe_unused build_assertions(void)
+{
+struct msr_domain_policy p;
+
+BUILD_BUG_ON(sizeof(p.vmx_basic.u) !=
+ sizeof(p.vmx_basic.u.raw));
+BUILD_BUG_ON(sizeof(p.vmx_pinbased_ctls.u) !=
+ sizeof(p.vmx_pinbased_ctls.u.raw));
+BUILD_BUG_ON(sizeof(p.vmx_procbased_ctls.u) !=
+ sizeof(p.vmx_procbased_ctls.u.raw));
+BUILD_BUG_ON(sizeof(p.vmx_exit_ctls.u) !=
+ sizeof(p.vmx_exit_ctls.u.raw));
+BUILD_BUG_ON(sizeof(p.vmx_entry_ctls.u) !=
+ sizeof(p.vmx_entry_ctls.u.raw));
+BUILD_BUG_ON(sizeof(p.vmx_misc.u) !=
+ sizeof(p.vmx_misc.u.raw));
+BUILD_BUG_ON(sizeof(p.vmx_cr0_fixed0.u) !=
+ sizeof(p.vmx_cr0_fixed0.u.raw));
+BUILD_BUG_ON(sizeof(p.vmx_cr0_fixed1.u) !=
+ sizeof(p.vmx_cr0_fixed1.u.raw));
+BUILD_BUG_ON(sizeof(p.vmx_cr4_fixed0.u) !=
+ sizeof(p.vmx_cr4_fixed0.u.raw));
+BUILD_BUG_ON(sizeof(p.vmx_cr4_fixed1.u) !=
+ sizeof(p.vmx_cr4_fixed1.u.raw));
+BUILD_BUG_ON(sizeof(p.vmx_vmcs_enum.u) !=
+ sizeof(p.vmx_vmcs_enum.u.raw));
+BUILD_BUG_ON(sizeof(p.vmx_procbased_ctls2.u) !=
+ sizeof(p.vmx_procbased_ctls2.u.raw));
+BUILD_BUG_ON(sizeof(p.vmx_ept_vpid_cap.u) !=
+ sizeof(p.vmx_ept_vpid_cap.u.raw));
+BUILD_BUG_ON(sizeof(p.vmx_true_pinbased_ctls.u) !=
+ sizeof(p.vmx_true_pinbased_ctls.u.raw));
+BUILD_BUG_ON(sizeof(p.vmx_true_procbased_ctls.u) !=
+ sizeof(p.vmx_true_procbased_ctls.u.raw));
+BUILD_BUG_ON(sizeof(p.vmx_true_exit_ctls.u) !=
+ sizeof(p.vmx_true_exit_ctls.u.raw));
+BUILD_BUG_ON(sizeof(p.vmx_true_entry_ctls.u) !=
+ sizeof(p.vmx_true_entry_ctls.u.raw));
+BUILD_BUG_ON(sizeof(p.vmx_vmfunc.u) !=
+ sizeof(p.vmx_vmfunc.u.raw));
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/include/asm-x86/msr.h b/xen/include/asm-x86/msr.h
index 751fa25a36..fc99612cca 100644
--- a/xen/include/asm-x86/msr.h
+++ b/xen/include/asm-x86/msr.h
@@ -202,6 +202,171 @@ void write_efer(u64 val);
 
 DECLARE_PER_CPU(u32, ler_msr);
 
+union vmx_pin_based_exec_control_bits {
+uint32_t raw;
+struct {
+bool ext_intr_exiting:1;
+uint32_t :2;  /* 1:2 reserved */
+bool  nmi_exiting:1;
+uint32_t :1;  /* 4 reserved */
+bool virtual_nmis:1;
+boolpreempt_timer:1;
+bool posted_interrupt:1;
+uint32_t :24; /* 8:31 reserved */
+};
+};
+
+union vmx_cpu_based_exec_control_bits {
+uint32_t raw;
+struct {
+uint32_t:2;  /* 0:1 reserved */
+boolvirtual_intr_pending:1;
+bool   use_tsc_offseting:1;
+uint32_t:3;  /* 4:6 reserved */
+bool hlt_exiting:1;
+uint32_t:1;  /* 8 reserved */
+bool  invlpg_exiting:1;
+bool   mwait_exiting:1;
+bool   rdpmc_exiting:1;
+bool   rdtsc_exiting:1;
+uint32_t:2;  /* 13:14 reserved */
+boolcr3_load_exiting:1;
+bool   cr3_store_exiting:1;
+uint32_t:2;  /* 17:18 reserved */
+boolcr8_load_exiting:1;
+bool   cr8_store_exiting:1;
+bool  tpr_shadow:1;
+bool virtual_nmi_pending:1;
+bool  mov_dr_exiting:1;
+bool   uncond_io_exiting:1;
+bool  activate_io_bitmap:1;
+uint32_t:1;  /* 26 reserved */
+bool   monitor_trap_flag:1;
+bool activate_msr_bitmap:1;
+bool monitor_exiting:1;
+bool   pause_exiting:1;
+bool activate_secondary_controls:1;
+};
+};
+
+union vmx_vmexit_control_bits {
+uint32_t raw;
+struct {
+uint32_t:2;  /* 0:1 reserved */
+bool   save_debug_cntrls:1;
+uint32_t:6;  /* 3:8 reserved 

[Xen-devel] [PATCH v3 5/6] x86/msr: update domain policy on CPUID policy changes

2017-10-13 Thread Sergey Dyasli
Availability of some MSRs depends on certain CPUID bits. Add function
recalculate_domain_msr_policy() which updates availability of per-domain
MSRs based on the domain's current CPUID policy. This function is called
when the CPUID policy is changed by the toolstack.

Add recalculate_domain_vmx_msr_policy() which changes availability of
VMX MSRs based on domain's nested virt settings.

Introduce hvm_cr4_domain_valid_bits() which accepts struct domain *
instead of struct vcpu *.

Signed-off-by: Sergey Dyasli <sergey.dya...@citrix.com>
---
 xen/arch/x86/domctl.c |  1 +
 xen/arch/x86/hvm/hvm.c|  8 +++--
 xen/arch/x86/msr.c| 70 ++-
 xen/include/asm-x86/hvm/hvm.h |  1 +
 xen/include/asm-x86/msr.h |  3 ++
 5 files changed, 80 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index 80b4df9ec9..334c67d261 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -124,6 +124,7 @@ static int update_domain_cpuid_info(struct domain *d,
 }
 
 recalculate_cpuid_policy(d);
+recalculate_domain_msr_policy(d);
 
 switch ( ctl->input[0] )
 {
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 205b4cb685..7e6b15f8d7 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -928,9 +928,8 @@ const char *hvm_efer_valid(const struct vcpu *v, uint64_t 
value,
 X86_CR0_CD | X86_CR0_PG)))
 
 /* These bits in CR4 can be set by the guest. */
-unsigned long hvm_cr4_guest_valid_bits(const struct vcpu *v, bool restore)
+unsigned long hvm_cr4_domain_valid_bits(const struct domain *d, bool restore)
 {
-const struct domain *d = v->domain;
 const struct cpuid_policy *p;
 bool mce, vmxe;
 
@@ -963,6 +962,11 @@ unsigned long hvm_cr4_guest_valid_bits(const struct vcpu 
*v, bool restore)
 (p->feat.pku  ? X86_CR4_PKE   : 0));
 }
 
+unsigned long hvm_cr4_guest_valid_bits(const struct vcpu *v, bool restore)
+{
+return hvm_cr4_domain_valid_bits(v->domain, restore);
+}
+
 static int hvm_load_cpu_ctxt(struct domain *d, hvm_domain_context_t *h)
 {
 int vcpuid;
diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
index 388f19e50d..a22e3dfaf2 100644
--- a/xen/arch/x86/msr.c
+++ b/xen/arch/x86/msr.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct msr_domain_policy __read_mostly raw_msr_domain_policy,
 __read_mostly host_msr_domain_policy,
@@ -220,7 +221,7 @@ static void __init calculate_hvm_max_vmx_policy(struct 
msr_domain_policy *dp)
 dp->vmx_cr4_fixed1.available = true;
 /*
  * Allowed CR4 bits will be updated during domain creation by
- * hvm_cr4_guest_valid_bits()
+ * hvm_cr4_domain_valid_bits()
  */
 dp->vmx_cr4_fixed1.u.raw = host_msr_domain_policy.vmx_cr4_fixed1.u.raw;
 
@@ -312,6 +313,72 @@ void __init init_guest_msr_policy(void)
 calculate_pv_max_policy();
 }
 
+static void recalculate_domain_vmx_msr_policy(struct domain *d)
+{
+struct msr_domain_policy *dp = d->arch.msr;
+
+if ( !nestedhvm_enabled(d) || !d->arch.cpuid->basic.vmx )
+{
+dp->vmx_basic.available = false;
+dp->vmx_pinbased_ctls.available = false;
+dp->vmx_procbased_ctls.available = false;
+dp->vmx_exit_ctls.available = false;
+dp->vmx_entry_ctls.available = false;
+dp->vmx_misc.available = false;
+dp->vmx_cr0_fixed0.available = false;
+dp->vmx_cr0_fixed1.available = false;
+dp->vmx_cr4_fixed0.available = false;
+dp->vmx_cr4_fixed1.available = false;
+dp->vmx_vmcs_enum.available = false;
+dp->vmx_procbased_ctls2.available = false;
+dp->vmx_ept_vpid_cap.available = false;
+dp->vmx_true_pinbased_ctls.available = false;
+dp->vmx_true_procbased_ctls.available = false;
+dp->vmx_true_exit_ctls.available = false;
+dp->vmx_true_entry_ctls.available = false;
+}
+else
+{
+dp->vmx_basic.available = true;
+dp->vmx_pinbased_ctls.available = true;
+dp->vmx_procbased_ctls.available = true;
+dp->vmx_exit_ctls.available = true;
+dp->vmx_entry_ctls.available = true;
+dp->vmx_misc.available = true;
+dp->vmx_cr0_fixed0.available = true;
+dp->vmx_cr0_fixed1.available = true;
+dp->vmx_cr4_fixed0.available = true;
+dp->vmx_cr4_fixed1.available = true;
+/* Get allowed CR4 bits from CPUID policy */
+dp->vmx_cr4_fixed1.u.raw = hvm_cr4_domain_valid_bits(d, false);
+dp->vmx_vmcs_enum.available = true;
+
+if ( dp->vmx_procbased_ctls.u.allowed_1.activate_secondary_controls )
+{
+dp->vmx_procbased_ctls2.available = true;
+
+if ( dp->vmx_procbased_ctls2.u.allowed_1.enable_

Re: [Xen-devel] [PATCH v3 5/9] x86/vvmx: make updating shadow EPTP value more efficient

2017-10-05 Thread Sergey Dyasli
On Thu, 2017-10-05 at 03:27 -0600, Jan Beulich wrote:
> > > > On 05.10.17 at 10:18,  wrote:
> > 
> > --- a/xen/arch/x86/hvm/vmx/entry.S
> > +++ b/xen/arch/x86/hvm/vmx/entry.S
> > @@ -80,7 +80,7 @@ UNLIKELY_END(realmode)
> >  mov  %rsp,%rdi
> >  call vmx_vmenter_helper
> >  cmp  $0,%eax
> > -jne .Lvmx_vmentry_restart
> > +je .Lvmx_vmentry_restart
> 
> If you make the function return bool, the cmp above also needs
> changing (and then preferably to "test %al, %al", in which case
> it would then also better be "jz" instead of "je").

Here's the updated delta:

diff --git a/xen/arch/x86/hvm/vmx/entry.S b/xen/arch/x86/hvm/vmx/entry.S
index 9fb8f89220..47cd674260 100644
--- a/xen/arch/x86/hvm/vmx/entry.S
+++ b/xen/arch/x86/hvm/vmx/entry.S
@@ -79,8 +79,8 @@ UNLIKELY_END(realmode)
 
 mov  %rsp,%rdi
 call vmx_vmenter_helper
-cmp  $0,%eax
-jne .Lvmx_vmentry_restart
+test %al, %al
+jz .Lvmx_vmentry_restart
 mov  VCPU_hvm_guest_cr2(%rbx),%rax
 
 pop  %r15
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index c9a4111267..a5c2bd71cd 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -4197,7 +4197,8 @@ static void lbr_fixup(void)
 bdw_erratum_bdf14_fixup();
 }
 
-int vmx_vmenter_helper(const struct cpu_user_regs *regs)
+/* Returns false if the vmentry has to be restarted */
+bool vmx_vmenter_helper(const struct cpu_user_regs *regs)
 {
 struct vcpu *curr = current;
 u32 new_asid, old_asid;
@@ -4206,7 +4207,7 @@ int vmx_vmenter_helper(const struct cpu_user_regs *regs)
 
 /* Shadow EPTP can't be updated here because irqs are disabled */
     if ( nestedhvm_vcpu_in_guestmode(curr) && vcpu_nestedhvm(curr).stale_np2m )
-        return 1;
+        return false;
 
 if ( curr->domain->arch.hvm_domain.pi_ops.do_resume )
 curr->domain->arch.hvm_domain.pi_ops.do_resume(curr);
@@ -4269,7 +4270,7 @@ int vmx_vmenter_helper(const struct cpu_user_regs *regs)
 __vmwrite(GUEST_RSP,regs->rsp);
 __vmwrite(GUEST_RFLAGS, regs->rflags | X86_EFLAGS_MBS);
 
-return 0;
+return true;
 }
 
 /*

-- 
Thanks,
Sergey
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 5/9] x86/vvmx: make updating shadow EPTP value more efficient

2017-10-05 Thread Sergey Dyasli
On Wed, 2017-10-04 at 15:55 +0100, Andrew Cooper wrote:
> > >  
> > > -void vmx_vmenter_helper(const struct cpu_user_regs *regs)
> > > +int vmx_vmenter_helper(const struct cpu_user_regs *regs)
> > 
> > ...Andy, did you want a comment here explaining what the return value is
> > supposed to mean? (And/or changing this to a bool?)
> 
> Definitely a comment please (especially as it is logically inverted from
> what I would have expected originally).
> 
> Bool depending on whether it actually has boolean properties or not
> (which will depend on how the comment ends up looking).
> 
> ~Andrew

Andrew,

Are you happy with the following fixup?

diff --git a/xen/arch/x86/hvm/vmx/entry.S b/xen/arch/x86/hvm/vmx/entry.S
index 9fb8f89220..24265ebc08 100644
--- a/xen/arch/x86/hvm/vmx/entry.S
+++ b/xen/arch/x86/hvm/vmx/entry.S
@@ -80,7 +80,7 @@ UNLIKELY_END(realmode)
 mov  %rsp,%rdi
 call vmx_vmenter_helper
 cmp  $0,%eax
-jne .Lvmx_vmentry_restart
+je .Lvmx_vmentry_restart
 mov  VCPU_hvm_guest_cr2(%rbx),%rax
 
 pop  %r15
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index c9a4111267..d9b35202f9 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -4197,7 +4197,8 @@ static void lbr_fixup(void)
 bdw_erratum_bdf14_fixup();
 }
 
-int vmx_vmenter_helper(const struct cpu_user_regs *regs)
+/* Return false if the vmentry has to be restarted */
+bool vmx_vmenter_helper(const struct cpu_user_regs *regs)
 {
 struct vcpu *curr = current;
 u32 new_asid, old_asid;
@@ -4206,7 +4207,7 @@ int vmx_vmenter_helper(const struct cpu_user_regs *regs)
 
 /* Shadow EPTP can't be updated here because irqs are disabled */
     if ( nestedhvm_vcpu_in_guestmode(curr) && vcpu_nestedhvm(curr).stale_np2m )
-        return 1;
+        return false;
 
 if ( curr->domain->arch.hvm_domain.pi_ops.do_resume )
 curr->domain->arch.hvm_domain.pi_ops.do_resume(curr);
@@ -4269,7 +4270,7 @@ int vmx_vmenter_helper(const struct cpu_user_regs *regs)
 __vmwrite(GUEST_RSP,regs->rsp);
 __vmwrite(GUEST_RFLAGS, regs->rflags | X86_EFLAGS_MBS);
 
-return 0;
+return true;
 }
 
 /*

-- 
Thanks,
Sergey
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v3 8/9] x86/np2m: refactor p2m_get_nestedp2m_locked()

2017-10-03 Thread Sergey Dyasli
Remove some code duplication.

Suggested-by: George Dunlap <george.dun...@citrix.com>
Signed-off-by: Sergey Dyasli <sergey.dya...@citrix.com>
Reviewed-by: George Dunlap <george.dun...@citrix.com>
---
 xen/arch/x86/mm/p2m.c | 25 ++---
 1 file changed, 10 insertions(+), 15 deletions(-)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 90bf382a49..6c937c9e17 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1829,6 +1829,7 @@ p2m_get_nestedp2m_locked(struct vcpu *v)
 struct p2m_domain *p2m;
 uint64_t np2m_base = nhvm_vcpu_p2m_base(v);
 unsigned int i;
+bool needs_flush = true;
 
 /* Mask out low bits; this avoids collisions with P2M_BASE_EADDR */
 np2m_base &= ~(0xfffull);
@@ -1845,14 +1846,10 @@ p2m_get_nestedp2m_locked(struct vcpu *v)
 if ( p2m->np2m_base == np2m_base )
 {
 /* Check if np2m was flushed just before the lock */
-if ( nv->np2m_generation != p2m->np2m_generation )
-nvcpu_flush(v);
+if ( nv->np2m_generation == p2m->np2m_generation )
+needs_flush = false;
 /* np2m is up-to-date */
-p2m->np2m_base = np2m_base;
-assign_np2m(v, p2m);
-nestedp2m_unlock(d);
-
-return p2m;
+goto found;
 }
 else if ( p2m->np2m_base != P2M_BASE_EADDR )
 {
@@ -1867,15 +1864,10 @@ p2m_get_nestedp2m_locked(struct vcpu *v)
 {
 p2m = d->arch.nested_p2m[i];
 p2m_lock(p2m);
+
 if ( p2m->np2m_base == np2m_base )
-{
-nvcpu_flush(v);
-p2m->np2m_base = np2m_base;
-assign_np2m(v, p2m);
-nestedp2m_unlock(d);
+goto found;
 
-return p2m;
-}
 p2m_unlock(p2m);
 }
 
@@ -1884,8 +1876,11 @@ p2m_get_nestedp2m_locked(struct vcpu *v)
 p2m = p2m_getlru_nestedp2m(d, NULL);
 p2m_flush_table(p2m);
 p2m_lock(p2m);
+
+ found:
+if ( needs_flush )
+nvcpu_flush(v);
 p2m->np2m_base = np2m_base;
-nvcpu_flush(v);
 assign_np2m(v, p2m);
 nestedp2m_unlock(d);
 
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v3 7/9] x86/np2m: implement sharing of np2m between vCPUs

2017-10-03 Thread Sergey Dyasli
At the moment, nested p2ms are not shared between vcpus even if they
share the same base pointer.

Modify p2m_get_nestedp2m() to allow sharing a np2m between multiple
vcpus with the same np2m_base (L1 np2m_base value in VMCx12).

If the current np2m doesn't match the current base pointer, first look
for another nested p2m in the same domain with the same base pointer,
before reclaiming one from the LRU.

Signed-off-by: Sergey Dyasli <sergey.dya...@citrix.com>
Signed-off-by: George Dunlap <george.dun...@citrix.com>
---
 xen/arch/x86/hvm/vmx/vvmx.c |  1 +
 xen/arch/x86/mm/p2m.c   | 26 ++
 2 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index 198ca72f2a..dde02c076b 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -1201,6 +1201,7 @@ static void virtual_vmentry(struct cpu_user_regs *regs)
 
 /* Setup virtual ETP for L2 guest*/
 if ( nestedhvm_paging_mode_hap(v) )
+/* This will setup the initial np2m for the nested vCPU */
 __vmwrite(EPT_POINTER, get_shadow_eptp(v));
 else
 __vmwrite(EPT_POINTER, get_host_eptp(v));
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 3c62292165..90bf382a49 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1828,6 +1828,7 @@ p2m_get_nestedp2m_locked(struct vcpu *v)
 struct domain *d = v->domain;
 struct p2m_domain *p2m;
 uint64_t np2m_base = nhvm_vcpu_p2m_base(v);
+unsigned int i;
 
 /* Mask out low bits; this avoids collisions with P2M_BASE_EADDR */
 np2m_base &= ~(0xfffull);
@@ -1841,19 +1842,19 @@ p2m_get_nestedp2m_locked(struct vcpu *v)
 if ( p2m ) 
 {
 p2m_lock(p2m);
-if ( p2m->np2m_base == np2m_base || p2m->np2m_base == P2M_BASE_EADDR )
+if ( p2m->np2m_base == np2m_base )
 {
 /* Check if np2m was flushed just before the lock */
-if ( p2m->np2m_base == P2M_BASE_EADDR ||
- nv->np2m_generation != p2m->np2m_generation )
+if ( nv->np2m_generation != p2m->np2m_generation )
 nvcpu_flush(v);
+/* np2m is up-to-date */
 p2m->np2m_base = np2m_base;
 assign_np2m(v, p2m);
 nestedp2m_unlock(d);
 
 return p2m;
 }
-else
+else if ( p2m->np2m_base != P2M_BASE_EADDR )
 {
 /* vCPU is switching from some other valid np2m */
 cpumask_clear_cpu(v->processor, p2m->dirty_cpumask);
@@ -1861,6 +1862,23 @@ p2m_get_nestedp2m_locked(struct vcpu *v)
 p2m_unlock(p2m);
 }
 
+/* Share a np2m if possible */
+for ( i = 0; i < MAX_NESTEDP2M; i++ )
+{
+p2m = d->arch.nested_p2m[i];
+p2m_lock(p2m);
+if ( p2m->np2m_base == np2m_base )
+{
+nvcpu_flush(v);
+p2m->np2m_base = np2m_base;
+assign_np2m(v, p2m);
+nestedp2m_unlock(d);
+
+return p2m;
+}
+p2m_unlock(p2m);
+}
+
 /* All p2m's are or were in use. Take the least recent used one,
  * flush it and reuse. */
 p2m = p2m_getlru_nestedp2m(d, NULL);
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v3 6/9] x86/np2m: send flush IPIs only when a vcpu is actively using an np2m

2017-10-03 Thread Sergey Dyasli
Flush IPIs are sent to all cpus in an np2m's dirty_cpumask when
updated.  This mask however is far too broad.  A pcpu's bit is set in
the cpumask when a vcpu runs on that pcpu, but is only cleared when a
flush happens.  This means that the IPI includes the current pcpu of
vcpus that are not currently running, and also includes any pcpu that
has ever had a vcpu use this p2m since the last flush (which in turn
will cause spurious invalidations if a different vcpu is using an np2m).

Avoid these IPIs by keeping closer track of where an np2m is being used,
and when a vcpu needs to be flushed:

- On schedule-out, clear v->processor in p2m->dirty_cpumask
- Add a 'generation' counter to the p2m and nestedvcpu structs to
  detect changes that would require re-loads on re-entry
- On schedule-in or p2m change:
  - Set v->processor in p2m->dirty_cpumask
  - flush the vcpu's nested p2m pointer (and update nv->generation) if
the generation changed
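
A rough sketch of the schedule hook described above (illustrative only; the
real np2m_schedule() added by this patch differs in its exact checks and in
how the current vCPU and np2m are looked up):

    void np2m_schedule(int dir)
    {
        struct vcpu *curr = current;
        struct nestedvcpu *nv = &vcpu_nestedhvm(curr);
        struct p2m_domain *p2m = nv->nv_p2m;

        if ( !nestedhvm_vcpu_in_guestmode(curr) || !p2m )
            return;

        p2m_lock(p2m);
        if ( dir == NP2M_SCHEDLE_OUT )
            /* Descheduled vCPUs no longer need flush IPIs for this np2m */
            cpumask_clear_cpu(curr->processor, p2m->dirty_cpumask);
        else /* NP2M_SCHEDLE_IN */
        {
            /* The np2m was flushed while this vCPU was descheduled */
            if ( nv->np2m_generation != p2m->np2m_generation )
                nvcpu_flush(curr);
            cpumask_set_cpu(curr->processor, p2m->dirty_cpumask);
        }
        p2m_unlock(p2m);
    }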

Signed-off-by: Sergey Dyasli <sergey.dya...@citrix.com>
Signed-off-by: George Dunlap <george.dun...@citrix.com>
---
v2 --> v3:
- current pointer is now calculated only once in np2m_schedule()
- Replaced "shadow p2m" with "np2m" for consistency in commit message
---
 xen/arch/x86/domain.c  |  2 ++
 xen/arch/x86/hvm/nestedhvm.c   |  1 +
 xen/arch/x86/hvm/vmx/vvmx.c|  3 +++
 xen/arch/x86/mm/p2m.c  | 56 +-
 xen/include/asm-x86/hvm/vcpu.h |  1 +
 xen/include/asm-x86/p2m.h  |  6 +
 6 files changed, 68 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 466a1a2fac..35ea0d2418 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1668,6 +1668,7 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
 {
 _update_runstate_area(prev);
 vpmu_switch_from(prev);
+np2m_schedule(NP2M_SCHEDLE_OUT);
 }
 
 if ( is_hvm_domain(prevd) && !list_empty(&prev->arch.hvm_vcpu.tm_list) )
@@ -1716,6 +1717,7 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
 
 /* Must be done with interrupts enabled */
 vpmu_switch_to(next);
+np2m_schedule(NP2M_SCHEDLE_IN);
 }
 
 /* Ensure that the vcpu has an up-to-date time base. */
diff --git a/xen/arch/x86/hvm/nestedhvm.c b/xen/arch/x86/hvm/nestedhvm.c
index 74a464d162..ab50b2ab98 100644
--- a/xen/arch/x86/hvm/nestedhvm.c
+++ b/xen/arch/x86/hvm/nestedhvm.c
@@ -57,6 +57,7 @@ nestedhvm_vcpu_reset(struct vcpu *v)
 nv->nv_flushp2m = 0;
 nv->nv_p2m = NULL;
 nv->stale_np2m = false;
+nv->np2m_generation = 0;
 
 hvm_asid_flush_vcpu_asid(&nv->nv_n2asid);
 
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index 3f596dc698..198ca72f2a 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -1367,6 +1367,9 @@ static void virtual_vmexit(struct cpu_user_regs *regs)
  !(v->arch.hvm_vcpu.guest_efer & EFER_LMA) )
 shadow_to_vvmcs_bulk(v, ARRAY_SIZE(gpdpte_fields), gpdpte_fields);
 
+/* This will clear current pCPU bit in p2m->dirty_cpumask */
+np2m_schedule(NP2M_SCHEDLE_OUT);
+
 vmx_vmcs_switch(v->arch.hvm_vmx.vmcs_pa, nvcpu->nv_n1vmcx_pa);
 
 nestedhvm_vcpu_exit_guestmode(v);
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index fd48a3b9db..3c62292165 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -73,6 +73,7 @@ static int p2m_initialise(struct domain *d, struct p2m_domain 
*p2m)
 p2m->p2m_class = p2m_host;
 
 p2m->np2m_base = P2M_BASE_EADDR;
+p2m->np2m_generation = 0;
 
 for ( i = 0; i < ARRAY_SIZE(p2m->pod.mrp.list); ++i )
 p2m->pod.mrp.list[i] = gfn_x(INVALID_GFN);
@@ -1735,6 +1736,7 @@ p2m_flush_table_locked(struct p2m_domain *p2m)
 
 /* This is no longer a valid nested p2m for any address space */
 p2m->np2m_base = P2M_BASE_EADDR;
+p2m->np2m_generation++;
 
 /* Make sure nobody else is using this p2m table */
 nestedhvm_vmcx_flushtlb(p2m);
@@ -1809,6 +1811,7 @@ static void assign_np2m(struct vcpu *v, struct p2m_domain 
*p2m)
 
 nv->nv_flushp2m = 0;
 nv->nv_p2m = p2m;
+nv->np2m_generation = p2m->np2m_generation;
 cpumask_set_cpu(v->processor, p2m->dirty_cpumask);
 }
 
@@ -1840,7 +1843,9 @@ p2m_get_nestedp2m_locked(struct vcpu *v)
 p2m_lock(p2m);
 if ( p2m->np2m_base == np2m_base || p2m->np2m_base == P2M_BASE_EADDR )
 {
-if ( p2m->np2m_base == P2M_BASE_EADDR )
+/* Check if np2m was flushed just before the lock */
+if ( p2m->np2m_base == P2M_BASE_EADDR ||
+ nv->np2m_generation != p2m->np2m_generation )
 nvcpu_flush(v);
 p2m->np2m_base = np2m_base;
 assign_np2m(v, p2m);
@@ -1848,6 +1853,11 @@ p2m_get_nestedp2m_locked(struct vcpu *v)
 

[Xen-devel] [PATCH v3 3/9] x86/np2m: remove np2m_base from p2m_get_nestedp2m()

2017-10-03 Thread Sergey Dyasli
Remove np2m_base parameter as it should always match the value of
np2m_base in VMCx12.

Signed-off-by: Sergey Dyasli <sergey.dya...@citrix.com>
Reviewed-by: George Dunlap <george.dun...@citrix.com>
---
 xen/arch/x86/hvm/svm/nestedsvm.c | 6 +-
 xen/arch/x86/hvm/vmx/vvmx.c  | 3 +--
 xen/arch/x86/mm/hap/nested_hap.c | 2 +-
 xen/arch/x86/mm/p2m.c| 8 
 xen/include/asm-x86/p2m.h| 5 ++---
 5 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/xen/arch/x86/hvm/svm/nestedsvm.c b/xen/arch/x86/hvm/svm/nestedsvm.c
index 66a1777298..1de896e456 100644
--- a/xen/arch/x86/hvm/svm/nestedsvm.c
+++ b/xen/arch/x86/hvm/svm/nestedsvm.c
@@ -412,7 +412,11 @@ static void nestedsvm_vmcb_set_nestedp2m(struct vcpu *v,
 ASSERT(v != NULL);
 ASSERT(vvmcb != NULL);
 ASSERT(n2vmcb != NULL);
-p2m = p2m_get_nestedp2m(v, vvmcb->_h_cr3);
+
+/* This will allow nsvm_vcpu_hostcr3() to return correct np2m_base */
+vcpu_nestedsvm(v).ns_vmcb_hostcr3 = vvmcb->_h_cr3;
+
+p2m = p2m_get_nestedp2m(v);
 n2vmcb->_h_cr3 = pagetable_get_paddr(p2m_get_pagetable(p2m));
 }
 
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index d333aa6d78..2f468e6ced 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -1109,8 +1109,7 @@ static void load_shadow_guest_state(struct vcpu *v)
 
 uint64_t get_shadow_eptp(struct vcpu *v)
 {
-uint64_t np2m_base = nvmx_vcpu_eptp_base(v);
-struct p2m_domain *p2m = p2m_get_nestedp2m(v, np2m_base);
+struct p2m_domain *p2m = p2m_get_nestedp2m(v);
 struct ept_data *ept = &p2m->ept;
 
 ept->mfn = pagetable_get_pfn(p2m_get_pagetable(p2m));
diff --git a/xen/arch/x86/mm/hap/nested_hap.c b/xen/arch/x86/mm/hap/nested_hap.c
index 162afed46b..ed137fa784 100644
--- a/xen/arch/x86/mm/hap/nested_hap.c
+++ b/xen/arch/x86/mm/hap/nested_hap.c
@@ -212,7 +212,7 @@ nestedhvm_hap_nested_page_fault(struct vcpu *v, paddr_t 
*L2_gpa,
 uint8_t p2ma_21 = p2m_access_rwx;
 
 p2m = p2m_get_hostp2m(d); /* L0 p2m */
-nested_p2m = p2m_get_nestedp2m(v, nhvm_vcpu_p2m_base(v));
+nested_p2m = p2m_get_nestedp2m(v);
 
 /* walk the L1 P2M table */
 rv = nestedhap_walk_L1_p2m(v, *L2_gpa, _gpa, _order_21, _21,
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index b7588b2ec1..d3e602de22 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1813,11 +1813,12 @@ static void assign_np2m(struct vcpu *v, struct 
p2m_domain *p2m)
 }
 
 struct p2m_domain *
-p2m_get_nestedp2m(struct vcpu *v, uint64_t np2m_base)
+p2m_get_nestedp2m(struct vcpu *v)
 {
 struct nestedvcpu *nv = &vcpu_nestedhvm(v);
 struct domain *d = v->domain;
 struct p2m_domain *p2m;
+uint64_t np2m_base = nhvm_vcpu_p2m_base(v);
 
 /* Mask out low bits; this avoids collisions with P2M_BASE_EADDR */
 np2m_base &= ~(0xfffull);
@@ -1865,7 +1866,7 @@ p2m_get_p2m(struct vcpu *v)
 if (!nestedhvm_is_n2(v))
 return p2m_get_hostp2m(v->domain);
 
-return p2m_get_nestedp2m(v, nhvm_vcpu_p2m_base(v));
+return p2m_get_nestedp2m(v);
 }
 
 unsigned long paging_gva_to_gfn(struct vcpu *v,
@@ -1880,13 +1881,12 @@ unsigned long paging_gva_to_gfn(struct vcpu *v,
 unsigned long l2_gfn, l1_gfn;
 struct p2m_domain *p2m;
 const struct paging_mode *mode;
-uint64_t np2m_base = nhvm_vcpu_p2m_base(v);
 uint8_t l1_p2ma;
 unsigned int l1_page_order;
 int rv;
 
 /* translate l2 guest va into l2 guest gfn */
-p2m = p2m_get_nestedp2m(v, np2m_base);
+p2m = p2m_get_nestedp2m(v);
 mode = paging_get_nestedmode(v);
 l2_gfn = mode->gva_to_gfn(v, p2m, va, pfec);
 
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index ce50e37f46..798295ec12 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -360,10 +360,9 @@ struct p2m_domain {
 #define p2m_get_hostp2m(d)  ((d)->arch.p2m)
 
 /*
- * Assigns an np2m with the specified np2m_base to the specified vCPU
- * and returns that np2m.
+ * Updates vCPU's np2m to match its np2m_base in VMCx12 and returns that np2m.
  */
-struct p2m_domain *p2m_get_nestedp2m(struct vcpu *v, uint64_t np2m_base);
+struct p2m_domain *p2m_get_nestedp2m(struct vcpu *v);
 
 /* If vcpu is in host mode then behaviour matches p2m_get_hostp2m().
  * If vcpu is in guest mode then behaviour matches p2m_get_nestedp2m().
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v3 0/9] Nested p2m: allow sharing between vCPUs

2017-10-03 Thread Sergey Dyasli
Nested p2m (shadow EPT) is an object that stores memory address
translations from L2 GPA directly to L0 HPA. This is achieved by
combining together L1 EPT with L0 EPT during L2 EPT violations.
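
Concretely, the translation chain being combined is:

    L2 GPA --[L1 EPT, provided by the L1 guest]--> L1 GPA
    L1 GPA --[L0 host p2m]-----------------------> L0 HPA

and the np2m caches the composed L2 GPA -> L0 HPA entries, filled in lazily
on L2 EPT violations.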

In the usual case, L1 uses the same EPTP value in VMCS12 for all vCPUs
of an L2 guest. Unfortunately, in Xen's current implementation, each
vCPU has its own np2m object which cannot be shared with other vCPUs.
This leads to the following issues if a nested guest has SMP:

1. There will be multiple np2m objects (1 per nested vCPU) with
   the same np2m_base (L1 EPTP value in VMCS12).

2. Same EPT violations will be processed independently by each vCPU

3. Since MAX_NESTEDP2M is defined as 10, if a domain has more than
   10 nested vCPUs, performance will be extremely degraded due to
   constant np2m LRU list thrashing and np2m flushing.

This patch series makes it possible to share one np2m object between
different vCPUs that have the same np2m_base. Sharing of np2m objects
improves scalability of a domain from 10 nested vCPUs to 10 nested
guests (with an arbitrary number of vCPUs per guest).
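
For reference, a condensed sketch of the lookup order that results from
patches 7 and 8 (locking and error handling omitted; see the patches for
the real p2m_get_nestedp2m_locked()):

    struct p2m_domain *p2m_get_nestedp2m_locked(struct vcpu *v)
    {
        struct nestedvcpu *nv = &vcpu_nestedhvm(v);
        struct domain *d = v->domain;
        struct p2m_domain *p2m = nv->nv_p2m;
        uint64_t np2m_base = nhvm_vcpu_p2m_base(v) & ~0xfffull;
        unsigned int i;
        bool needs_flush = true;

        /* 1. Fast path: the np2m cached in the vCPU still matches. */
        if ( p2m && p2m->np2m_base == np2m_base )
        {
            if ( nv->np2m_generation == p2m->np2m_generation )
                needs_flush = false;
            goto found;
        }

        /* 2. New with this series: reuse an np2m already set up by
         *    another vCPU with the same np2m_base. */
        for ( i = 0; i < MAX_NESTEDP2M; i++ )
        {
            p2m = d->arch.nested_p2m[i];
            if ( p2m->np2m_base == np2m_base )
                goto found;
        }

        /* 3. Fall back to reclaiming the least recently used np2m. */
        p2m = p2m_getlru_nestedp2m(d, NULL);
        p2m_flush_table(p2m);

     found:
        if ( needs_flush )
            nvcpu_flush(v);
        p2m->np2m_base = np2m_base;
        assign_np2m(v, p2m);
        return p2m;
    }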

v2 --> v3:
- "VMCX" is replaced with "VMCx" in comments and commit messages
- current pointer is now calculated only once in nvmx_eptp_update()
  and np2m_schedule()
- moved p2m_unlock() out of nestedhap_fix_p2m() for balanced lock/unlock
- Updated commit message in patch #2
- Replaced "shadow p2m" with "np2m" for consistency in commit message
  of patch #6

v1 --> v2 (by George):
- Fixed a race with stale_np2m and vmwrite
- Squashed 14 patches down to 9
- Updated commit messages

RFC --> v1:
- Some commit messages are updated based on George's comments
- Replaced VMX's terminology in common code with HVM's one
- Patch "x86/vvmx: add stale_eptp flag" is split into
  "x86/np2m: add stale_np2m flag" and
  "x86/vvmx: restart nested vmentry in case of stale_np2m"
- Added "x86/np2m: refactor p2m_get_nestedp2m_locked()" patch
- I've done some light nested SVM testing and fixed 1 regression
  (see patch #4)

Sergey Dyasli (9):
  x86/np2m: refactor p2m_get_nestedp2m()
  x86/np2m: flush all np2m objects on nested INVEPT
  x86/np2m: remove np2m_base from p2m_get_nestedp2m()
  x86/np2m: simplify nestedhvm_hap_nested_page_fault()
  x86/vvmx: make updating shadow EPTP value more efficient
  x86/np2m: send flush IPIs only when a vcpu is actively using an np2m
  x86/np2m: implement sharing of np2m between vCPUs
  x86/np2m: refactor p2m_get_nestedp2m_locked()
  x86/np2m: add break to np2m_flush_eptp()

 xen/arch/x86/domain.c|   2 +
 xen/arch/x86/hvm/nestedhvm.c |   3 +
 xen/arch/x86/hvm/svm/nestedsvm.c |   6 +-
 xen/arch/x86/hvm/vmx/entry.S |   6 ++
 xen/arch/x86/hvm/vmx/vmx.c   |  14 ++--
 xen/arch/x86/hvm/vmx/vvmx.c  |  36 ++--
 xen/arch/x86/mm/hap/nested_hap.c |  34 
 xen/arch/x86/mm/p2m.c| 175 ---
 xen/include/asm-x86/hvm/vcpu.h   |   2 +
 xen/include/asm-x86/p2m.h|  17 +++-
 10 files changed, 223 insertions(+), 72 deletions(-)

-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v3 5/9] x86/vvmx: make updating shadow EPTP value more efficient

2017-10-03 Thread Sergey Dyasli
At the moment, the shadow EPTP value is written unconditionally in
ept_handle_violation().

Instead, write the value on vmentry to the guest; but only write it if
the value needs updating.

To detect this, add a flag to the nestedvcpu struct, stale_np2m, to
indicate when such an action is necessary.  Set it when the nested p2m
changes or when the np2m is flushed by an IPI, and clear it when we
write the new value.

An IPI invalidating the p2m may happen between nvmx_switch_guest() and
vmx_vmenter, and we can't perform the vmwrite with interrupts disabled;
so check the flag just before entering the guest and restart the
vmentry if it's set.
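
The resulting ordering, summarised as a comment-style sketch (the real logic
is split across vvmx.c, vmx.c and entry.S in this patch):

    /*
     * nvmx_switch_guest()                    -- interrupts enabled
     *     nvmx_eptp_update():
     *         stale_np2m = false;               clear the flag first ...
     *         __vmwrite(EPT_POINTER, eptp);     ... then write, so an IPI
     *                                           arriving in between merely
     *                                           re-sets stale_np2m
     *
     * vmx_vmenter_helper()                   -- interrupts disabled
     *     if ( in guest mode && stale_np2m )
     *         restart the vmentry;              entry.S re-enables IRQs and
     *                                           retries, picking up the new
     *                                           shadow EPTP
     */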

Signed-off-by: Sergey Dyasli <sergey.dya...@citrix.com>
Signed-off-by: George Dunlap <george.dun...@citrix.com>
---
v2 --> v3:
- current pointer is now calculated only once in nvmx_eptp_update()
---
 xen/arch/x86/hvm/nestedhvm.c   |  2 ++
 xen/arch/x86/hvm/vmx/entry.S   |  6 ++
 xen/arch/x86/hvm/vmx/vmx.c | 14 +++---
 xen/arch/x86/hvm/vmx/vvmx.c| 22 ++
 xen/arch/x86/mm/p2m.c  | 10 --
 xen/include/asm-x86/hvm/vcpu.h |  1 +
 6 files changed, 46 insertions(+), 9 deletions(-)

diff --git a/xen/arch/x86/hvm/nestedhvm.c b/xen/arch/x86/hvm/nestedhvm.c
index f2f7469d86..74a464d162 100644
--- a/xen/arch/x86/hvm/nestedhvm.c
+++ b/xen/arch/x86/hvm/nestedhvm.c
@@ -56,6 +56,7 @@ nestedhvm_vcpu_reset(struct vcpu *v)
 nv->nv_vvmcxaddr = INVALID_PADDR;
 nv->nv_flushp2m = 0;
 nv->nv_p2m = NULL;
+nv->stale_np2m = false;
 
 hvm_asid_flush_vcpu_asid(&nv->nv_n2asid);
 
@@ -107,6 +108,7 @@ nestedhvm_flushtlb_ipi(void *info)
  */
 hvm_asid_flush_core();
 vcpu_nestedhvm(v).nv_p2m = NULL;
+vcpu_nestedhvm(v).stale_np2m = true;
 }
 
 void
diff --git a/xen/arch/x86/hvm/vmx/entry.S b/xen/arch/x86/hvm/vmx/entry.S
index 53eedc6363..9fb8f89220 100644
--- a/xen/arch/x86/hvm/vmx/entry.S
+++ b/xen/arch/x86/hvm/vmx/entry.S
@@ -79,6 +79,8 @@ UNLIKELY_END(realmode)
 
 mov  %rsp,%rdi
 call vmx_vmenter_helper
+cmp  $0,%eax
+jne .Lvmx_vmentry_restart
 mov  VCPU_hvm_guest_cr2(%rbx),%rax
 
 pop  %r15
@@ -117,6 +119,10 @@ ENTRY(vmx_asm_do_vmentry)
 GET_CURRENT(bx)
 jmp  .Lvmx_do_vmentry
 
+.Lvmx_vmentry_restart:
+sti
+jmp  .Lvmx_do_vmentry
+
 .Lvmx_goto_emulator:
 sti
 mov  %rsp,%rdi
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 9cfa9b6965..c9a4111267 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -3249,12 +3249,6 @@ static void ept_handle_violation(ept_qual_t q, paddr_t 
gpa)
 case 0: // Unhandled L1 EPT violation
 break;
 case 1: // This violation is handled completly
-/*Current nested EPT maybe flushed by other vcpus, so need
- * to re-set its shadow EPTP pointer.
- */
-if ( nestedhvm_vcpu_in_guestmode(current) &&
-nestedhvm_paging_mode_hap(current ) )
-__vmwrite(EPT_POINTER, get_shadow_eptp(current));
 return;
 case -1:// This vioaltion should be injected to L1 VMM
 vcpu_nestedhvm(current).nv_vmexit_pending = 1;
@@ -4203,13 +4197,17 @@ static void lbr_fixup(void)
 bdw_erratum_bdf14_fixup();
 }
 
-void vmx_vmenter_helper(const struct cpu_user_regs *regs)
+int vmx_vmenter_helper(const struct cpu_user_regs *regs)
 {
 struct vcpu *curr = current;
 u32 new_asid, old_asid;
 struct hvm_vcpu_asid *p_asid;
 bool_t need_flush;
 
+/* Shadow EPTP can't be updated here because irqs are disabled */
+    if ( nestedhvm_vcpu_in_guestmode(curr) && vcpu_nestedhvm(curr).stale_np2m )
+        return 1;
+
 if ( curr->domain->arch.hvm_domain.pi_ops.do_resume )
 curr->domain->arch.hvm_domain.pi_ops.do_resume(curr);
 
@@ -4270,6 +4268,8 @@ void vmx_vmenter_helper(const struct cpu_user_regs *regs)
 __vmwrite(GUEST_RIP,regs->rip);
 __vmwrite(GUEST_RSP,regs->rsp);
 __vmwrite(GUEST_RFLAGS, regs->rflags | X86_EFLAGS_MBS);
+
+return 0;
 }
 
 /*
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index 2f468e6ced..3f596dc698 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -1405,12 +1405,34 @@ static void virtual_vmexit(struct cpu_user_regs *regs)
 vmsucceed(regs);
 }
 
+static void nvmx_eptp_update(void)
+{
+struct vcpu *curr = current;
+
+if ( !nestedhvm_vcpu_in_guestmode(curr) ||
+  vcpu_nestedhvm(curr).nv_vmexit_pending ||
+ !vcpu_nestedhvm(curr).stale_np2m ||
+ !nestedhvm_paging_mode_hap(curr) )
+return;
+
+/*
+ * Interrupts are enabled here, so we need to clear stale_np2m
+ * before we do the vmwrite.  If we do it in the other order, an
+ * and IPI comes in changing the shadow eptp after the vmwrite,
+ * we'll complete the vmenter with a stale eptp value.

[Xen-devel] [PATCH v3 1/9] x86/np2m: refactor p2m_get_nestedp2m()

2017-10-03 Thread Sergey Dyasli
1. Add a helper function assign_np2m()
2. Remove useless volatile
3. Update function's comment in the header
4. Minor style fixes ('\n' and d)

Signed-off-by: Sergey Dyasli <sergey.dya...@citrix.com>
Reviewed-by: George Dunlap <george.dun...@citrix.com>
---
 xen/arch/x86/mm/p2m.c | 31 ++-
 xen/include/asm-x86/p2m.h |  6 +++---
 2 files changed, 21 insertions(+), 16 deletions(-)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 0b479105b9..27b90eb815 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1776,14 +1776,24 @@ p2m_flush_nestedp2m(struct domain *d)
 p2m_flush_table(d->arch.nested_p2m[i]);
 }
 
+static void assign_np2m(struct vcpu *v, struct p2m_domain *p2m)
+{
+struct nestedvcpu *nv = &vcpu_nestedhvm(v);
+struct domain *d = v->domain;
+
+/* Bring this np2m to the top of the LRU list */
+p2m_getlru_nestedp2m(d, p2m);
+
+nv->nv_flushp2m = 0;
+nv->nv_p2m = p2m;
+cpumask_set_cpu(v->processor, p2m->dirty_cpumask);
+}
+
 struct p2m_domain *
 p2m_get_nestedp2m(struct vcpu *v, uint64_t np2m_base)
 {
-/* Use volatile to prevent gcc to cache nv->nv_p2m in a cpu register as
- * this may change within the loop by an other (v)cpu.
- */
-volatile struct nestedvcpu *nv = &vcpu_nestedhvm(v);
-struct domain *d;
+struct nestedvcpu *nv = &vcpu_nestedhvm(v);
+struct domain *d = v->domain;
 struct p2m_domain *p2m;
 
 /* Mask out low bits; this avoids collisions with P2M_BASE_EADDR */
@@ -1793,7 +1803,6 @@ p2m_get_nestedp2m(struct vcpu *v, uint64_t np2m_base)
 nv->nv_p2m = NULL;
 }
 
-d = v->domain;
 nestedp2m_lock(d);
 p2m = nv->nv_p2m;
 if ( p2m ) 
@@ -1801,15 +1810,13 @@ p2m_get_nestedp2m(struct vcpu *v, uint64_t np2m_base)
 p2m_lock(p2m);
 if ( p2m->np2m_base == np2m_base || p2m->np2m_base == P2M_BASE_EADDR )
 {
-nv->nv_flushp2m = 0;
-p2m_getlru_nestedp2m(d, p2m);
-nv->nv_p2m = p2m;
 if ( p2m->np2m_base == P2M_BASE_EADDR )
 hvm_asid_flush_vcpu(v);
 p2m->np2m_base = np2m_base;
-cpumask_set_cpu(v->processor, p2m->dirty_cpumask);
+assign_np2m(v, p2m);
 p2m_unlock(p2m);
 nestedp2m_unlock(d);
+
 return p2m;
 }
 p2m_unlock(p2m);
@@ -1820,11 +1827,9 @@ p2m_get_nestedp2m(struct vcpu *v, uint64_t np2m_base)
 p2m = p2m_getlru_nestedp2m(d, NULL);
 p2m_flush_table(p2m);
 p2m_lock(p2m);
-nv->nv_p2m = p2m;
 p2m->np2m_base = np2m_base;
-nv->nv_flushp2m = 0;
 hvm_asid_flush_vcpu(v);
-cpumask_set_cpu(v->processor, p2m->dirty_cpumask);
+assign_np2m(v, p2m);
 p2m_unlock(p2m);
 nestedp2m_unlock(d);
 
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index 8f3409b400..338317a782 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -359,9 +359,9 @@ struct p2m_domain {
 /* get host p2m table */
 #define p2m_get_hostp2m(d)  ((d)->arch.p2m)
 
-/* Get p2m table (re)usable for specified np2m base.
- * Automatically destroys and re-initializes a p2m if none found.
- * If np2m_base == 0 then v->arch.hvm_vcpu.guest_cr[3] is used.
+/*
+ * Assigns an np2m with the specified np2m_base to the specified vCPU
+ * and returns that np2m.
  */
 struct p2m_domain *p2m_get_nestedp2m(struct vcpu *v, uint64_t np2m_base);
 
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v3 9/9] x86/np2m: add break to np2m_flush_eptp()

2017-10-03 Thread Sergey Dyasli
Now that np2m sharing is implemented, there can be only one np2m object
with the same np2m_base. Break from loop if the required np2m was found
during np2m_flush_eptp().

Signed-off-by: Sergey Dyasli <sergey.dya...@citrix.com>
Reviewed-by: George Dunlap <george.dun...@citrix.com>
---
 xen/arch/x86/mm/p2m.c | 4 
 xen/include/asm-x86/p2m.h | 2 +-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 6c937c9e17..d36eee7ae0 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1795,7 +1795,11 @@ void np2m_flush_base(struct vcpu *v, unsigned long 
np2m_base)
 p2m = d->arch.nested_p2m[i];
 p2m_lock(p2m);
 if ( p2m->np2m_base == np2m_base )
+{
 p2m_flush_table_locked(p2m);
+p2m_unlock(p2m);
+break;
+}
 p2m_unlock(p2m);
 }
 nestedp2m_unlock(d);
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index 182463b247..a26070957f 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -779,7 +779,7 @@ int p2m_pt_handle_deferred_changes(uint64_t gpa);
 void p2m_flush(struct vcpu *v, struct p2m_domain *p2m);
 /* Flushes all nested p2m tables */
 void p2m_flush_nestedp2m(struct domain *d);
-/* Flushes all np2m objects with the specified np2m_base */
+/* Flushes the np2m specified by np2m_base (if it exists) */
 void np2m_flush_base(struct vcpu *v, unsigned long np2m_base);
 
 void nestedp2m_write_p2m_entry(struct p2m_domain *p2m, unsigned long gfn,
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v3 2/9] x86/np2m: flush all np2m objects on nested INVEPT

2017-10-03 Thread Sergey Dyasli
At the moment, nvmx_handle_invept() updates the current np2m just to
flush it.  Instead introduce a function, np2m_flush_base(), which will
look up the np2m base pointer and call p2m_flush_table() instead.

Unfortunately, since we don't know which p2m a given vcpu is using, we
must flush all p2ms that share that base pointer.

Convert p2m_flush_table() into p2m_flush_table_locked() in order not
to release the p2m_lock after np2m_base check.

Signed-off-by: Sergey Dyasli <sergey.dya...@citrix.com>
Signed-off-by: George Dunlap <george.dun...@citrix.com>
---
v2 --> v3:
- Commit message update
---
 xen/arch/x86/hvm/vmx/vvmx.c |  7 +--
 xen/arch/x86/mm/p2m.c   | 35 +--
 xen/include/asm-x86/p2m.h   |  2 ++
 3 files changed, 32 insertions(+), 12 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index cd0ee0a307..d333aa6d78 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -1910,12 +1910,7 @@ int nvmx_handle_invept(struct cpu_user_regs *regs)
 {
 case INVEPT_SINGLE_CONTEXT:
 {
-struct p2m_domain *p2m = p2m_get_nestedp2m(current, eptp);
-if ( p2m )
-{
-p2m_flush(current, p2m);
-ept_sync_domain(p2m);
-}
+np2m_flush_base(current, eptp);
 break;
 }
 case INVEPT_ALL_CONTEXT:
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 27b90eb815..b7588b2ec1 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1711,15 +1711,14 @@ p2m_getlru_nestedp2m(struct domain *d, struct 
p2m_domain *p2m)
 return p2m;
 }
 
-/* Reset this p2m table to be empty */
 static void
-p2m_flush_table(struct p2m_domain *p2m)
+p2m_flush_table_locked(struct p2m_domain *p2m)
 {
 struct page_info *top, *pg;
 struct domain *d = p2m->domain;
 mfn_t mfn;
 
-p2m_lock(p2m);
+ASSERT(p2m_locked_by_me(p2m));
 
 /*
  * "Host" p2m tables can have shared entries  that need a bit more care
@@ -1732,10 +1731,7 @@ p2m_flush_table(struct p2m_domain *p2m)
 
 /* No need to flush if it's already empty */
 if ( p2m_is_nestedp2m(p2m) && p2m->np2m_base == P2M_BASE_EADDR )
-{
-p2m_unlock(p2m);
 return;
-}
 
 /* This is no longer a valid nested p2m for any address space */
 p2m->np2m_base = P2M_BASE_EADDR;
@@ -1755,7 +1751,14 @@ p2m_flush_table(struct p2m_domain *p2m)
 d->arch.paging.free_page(d, pg);
 }
 page_list_add(top, &p2m->pages);
+}
 
+/* Reset this p2m table to be empty */
+static void
+p2m_flush_table(struct p2m_domain *p2m)
+{
+p2m_lock(p2m);
+p2m_flush_table_locked(p2m);
 p2m_unlock(p2m);
 }
 
@@ -1776,6 +1779,26 @@ p2m_flush_nestedp2m(struct domain *d)
 p2m_flush_table(d->arch.nested_p2m[i]);
 }
 
+void np2m_flush_base(struct vcpu *v, unsigned long np2m_base)
+{
+struct domain *d = v->domain;
+struct p2m_domain *p2m;
+unsigned int i;
+
+np2m_base &= ~(0xfffull);
+
+nestedp2m_lock(d);
+for ( i = 0; i < MAX_NESTEDP2M; i++ )
+{
+p2m = d->arch.nested_p2m[i];
+p2m_lock(p2m);
+if ( p2m->np2m_base == np2m_base )
+p2m_flush_table_locked(p2m);
+p2m_unlock(p2m);
+}
+nestedp2m_unlock(d);
+}
+
 static void assign_np2m(struct vcpu *v, struct p2m_domain *p2m)
 {
 struct nestedvcpu *nv = &vcpu_nestedhvm(v);
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index 338317a782..ce50e37f46 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -772,6 +772,8 @@ int p2m_pt_handle_deferred_changes(uint64_t gpa);
 void p2m_flush(struct vcpu *v, struct p2m_domain *p2m);
 /* Flushes all nested p2m tables */
 void p2m_flush_nestedp2m(struct domain *d);
+/* Flushes all np2m objects with the specified np2m_base */
+void np2m_flush_base(struct vcpu *v, unsigned long np2m_base);
 
 void nestedp2m_write_p2m_entry(struct p2m_domain *p2m, unsigned long gfn,
 l1_pgentry_t *p, l1_pgentry_t new, unsigned int level);
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v3 4/9] x86/np2m: simplify nestedhvm_hap_nested_page_fault()

2017-10-03 Thread Sergey Dyasli
There is a possibility for nested_p2m to become stale between
nestedhvm_hap_nested_page_fault() and nestedhap_fix_p2m().  At the moment
this is handled by detecting such a race inside nestedhap_fix_p2m() and
special-casing it.

Instead, introduce p2m_get_nestedp2m_locked(), which will return a
still-locked p2m.  This allows us to call nestedhap_fix_p2m() with the
lock held and remove the code detecting the special-case.

Signed-off-by: Sergey Dyasli <sergey.dya...@citrix.com>
Signed-off-by: George Dunlap <george.dun...@citrix.com>
---
v2 --> v3:
- Moved p2m_unlock() out of nestedhap_fix_p2m() for balanced lock/unlock
---
 xen/arch/x86/mm/hap/nested_hap.c | 34 ++
 xen/arch/x86/mm/p2m.c| 12 +---
 xen/include/asm-x86/p2m.h|  2 ++
 3 files changed, 25 insertions(+), 23 deletions(-)

diff --git a/xen/arch/x86/mm/hap/nested_hap.c b/xen/arch/x86/mm/hap/nested_hap.c
index ed137fa784..d7277cccdc 100644
--- a/xen/arch/x86/mm/hap/nested_hap.c
+++ b/xen/arch/x86/mm/hap/nested_hap.c
@@ -101,30 +101,23 @@ nestedhap_fix_p2m(struct vcpu *v, struct p2m_domain *p2m,
   unsigned int page_order, p2m_type_t p2mt, p2m_access_t p2ma)
 {
 int rc = 0;
+unsigned long gfn, mask;
+mfn_t mfn;
+
 ASSERT(p2m);
 ASSERT(p2m->set_entry);
+ASSERT(p2m_locked_by_me(p2m));
 
-p2m_lock(p2m);
-
-/* If this p2m table has been flushed or recycled under our feet, 
- * leave it alone.  We'll pick up the right one as we try to 
- * vmenter the guest. */
-if ( p2m->np2m_base == nhvm_vcpu_p2m_base(v) )
-{
-unsigned long gfn, mask;
-mfn_t mfn;
+/*
+ * If this is a superpage mapping, round down both addresses to
+ * the start of the superpage.
+ */
+mask = ~((1UL << page_order) - 1);
 
-/* If this is a superpage mapping, round down both addresses
- * to the start of the superpage. */
-mask = ~((1UL << page_order) - 1);
-
-gfn = (L2_gpa >> PAGE_SHIFT) & mask;
-mfn = _mfn((L0_gpa >> PAGE_SHIFT) & mask);
-
-rc = p2m_set_entry(p2m, gfn, mfn, page_order, p2mt, p2ma);
-}
+gfn = (L2_gpa >> PAGE_SHIFT) & mask;
+mfn = _mfn((L0_gpa >> PAGE_SHIFT) & mask);
 
-p2m_unlock(p2m);
+rc = p2m_set_entry(p2m, gfn, mfn, page_order, p2mt, p2ma);
 
 if ( rc )
 {
@@ -212,7 +205,6 @@ nestedhvm_hap_nested_page_fault(struct vcpu *v, paddr_t 
*L2_gpa,
 uint8_t p2ma_21 = p2m_access_rwx;
 
 p2m = p2m_get_hostp2m(d); /* L0 p2m */
-nested_p2m = p2m_get_nestedp2m(v);
 
 /* walk the L1 P2M table */
 rv = nestedhap_walk_L1_p2m(v, *L2_gpa, _gpa, _order_21, _21,
@@ -278,8 +270,10 @@ nestedhvm_hap_nested_page_fault(struct vcpu *v, paddr_t 
*L2_gpa,
 p2ma_10 &= (p2m_access_t)p2ma_21;
 
 /* fix p2m_get_pagetable(nested_p2m) */
+nested_p2m = p2m_get_nestedp2m_locked(v);
 nestedhap_fix_p2m(v, nested_p2m, *L2_gpa, L0_gpa, page_order_20,
 p2mt_10, p2ma_10);
+p2m_unlock(nested_p2m);
 
 return NESTEDHVM_PAGEFAULT_DONE;
 }
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index d3e602de22..aa3182dec6 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1813,7 +1813,7 @@ static void assign_np2m(struct vcpu *v, struct p2m_domain 
*p2m)
 }
 
 struct p2m_domain *
-p2m_get_nestedp2m(struct vcpu *v)
+p2m_get_nestedp2m_locked(struct vcpu *v)
 {
 struct nestedvcpu *nv = &vcpu_nestedhvm(v);
 struct domain *d = v->domain;
@@ -1838,7 +1838,6 @@ p2m_get_nestedp2m(struct vcpu *v)
 hvm_asid_flush_vcpu(v);
 p2m->np2m_base = np2m_base;
 assign_np2m(v, p2m);
-p2m_unlock(p2m);
 nestedp2m_unlock(d);
 
 return p2m;
@@ -1854,12 +1853,19 @@ p2m_get_nestedp2m(struct vcpu *v)
 p2m->np2m_base = np2m_base;
 hvm_asid_flush_vcpu(v);
 assign_np2m(v, p2m);
-p2m_unlock(p2m);
 nestedp2m_unlock(d);
 
 return p2m;
 }
 
+struct p2m_domain *p2m_get_nestedp2m(struct vcpu *v)
+{
+struct p2m_domain *p2m = p2m_get_nestedp2m_locked(v);
+p2m_unlock(p2m);
+
+return p2m;
+}
+
 struct p2m_domain *
 p2m_get_p2m(struct vcpu *v)
 {
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index 798295ec12..9a757792ee 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -363,6 +363,8 @@ struct p2m_domain {
  * Updates vCPU's np2m to match its np2m_base in VMCx12 and returns that np2m.
  */
 struct p2m_domain *p2m_get_nestedp2m(struct vcpu *v);
+/* Similar to the above except that returned p2m is still write-locked */
+struct p2m_domain *p2m_get_nestedp2m_locked(struct vcpu *v);
 
 /* If vcpu is in host mode then behaviour matches p2m_get_hostp2m().
  * If vcpu is in guest mode then behaviour matches p2m_get_nestedp2m().
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 2/9] x86/np2m: Have invept flush all np2m entries with the same base pointer

2017-10-02 Thread Sergey Dyasli
On Mon, 2017-10-02 at 11:07 +0100, George Dunlap wrote:
> On 10/02/2017 10:40 AM, George Dunlap wrote:
> > On 10/02/2017 10:37 AM, Sergey Dyasli wrote:
> > > On Fri, 2017-09-29 at 16:01 +0100, George Dunlap wrote:
> > > > nvmx_handle_invept() updates current's np2m just to flush it.  This is
> > > > not only wasteful, but ineffective: if several L2 vcpus share the same
> > > > np2m base pointer, they all need to be flushed (not only the current
> > > > one).
> > > 
> > > I don't follow this completely. L1 will use INVEPT on each vCPU that
> > > shares the same np2m pointer. The main idea here was not to update
> > > current's np2m just to flush it.
> > 
> > Hmm, yes the INVEPT thing is true.  But if that's the case, why do we
> > need np2m_flush_base() to loop over the whole list and flush all np2ms
> > with the same pointer?
> 
> Oh, nevermind -- you don't know which np2m is being used by this vcpu,
> so you have to flush all of the np2ms that match that base pointer.
> 
> What about this changelog:
> 
> ---
> x86/np2m: Flush p2m rather than switching on nested invept

It's not entirely clear what "switching" means here. But I fail to
think of any other good alternatives for the patch's subject.

> 
> At the moment, nvmx_handle_invept() updates the current np2m just to
> flush it.  Instead introduce a function, np2m_flush_base(), which will
> look up the np2m base pointer and call p2m_flush_table() instead.
> 
> Unfortunately, since we don't know which p2m a given vcpu is using, we
> must flush all p2ms that share that base pointer.

My reasoning was the same:

INVEPT from L1 happens outside of L02 vCPU's context and currently it's
impossible (because of scheduling) to detect the exact np2m object that
needs to be flushed.

> 
> Convert p2m_flush_table() into p2m_flush_table_locked() in order not
> to release the p2m_lock after np2m_base check.
> 
> Signed-off-by: Sergey Dyasli <sergey.dya...@citrix.com>
> Signed-off-by: George Dunlap <george.dun...@citrix.com>
-- 
Thanks,
Sergey
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 5/9] x86/vvmx: Make updating shadow EPTP value more efficient

2017-10-02 Thread Sergey Dyasli
On Fri, 2017-09-29 at 16:56 +0100, Andrew Cooper wrote:
> On 29/09/17 16:01, George Dunlap wrote:
> > @@ -4203,13 +4197,17 @@ static void lbr_fixup(void)
> >  bdw_erratum_bdf14_fixup();
> >  }
> >  
> > -void vmx_vmenter_helper(const struct cpu_user_regs *regs)
> > +int vmx_vmenter_helper(const struct cpu_user_regs *regs)
> 
> What are the semantics of this call?  The result looks boolean, and
> indicates that the vmentry should be aborted?

Currently vmx_vmenter_helper() returns !0 if the vmentry must be
restarted.

> 
> >  {
> >  struct vcpu *curr = current;
> >  u32 new_asid, old_asid;
> >  struct hvm_vcpu_asid *p_asid;
> >  bool_t need_flush;
> >  
> > +/* Shadow EPTP can't be updated here because irqs are disabled */
> > + if ( nestedhvm_vcpu_in_guestmode(curr) && 
> > vcpu_nestedhvm(curr).stale_np2m )
> > + return 1;
> > +
> >  if ( curr->domain->arch.hvm_domain.pi_ops.do_resume )
> >  curr->domain->arch.hvm_domain.pi_ops.do_resume(curr);
> >  
> > @@ -4270,6 +4268,8 @@ void vmx_vmenter_helper(const struct cpu_user_regs 
> > *regs)
> >  __vmwrite(GUEST_RIP,regs->rip);
> >  __vmwrite(GUEST_RSP,regs->rsp);
> >  __vmwrite(GUEST_RFLAGS, regs->rflags | X86_EFLAGS_MBS);
> > +
> > +return 0;
> >  }
> >  
> >  /*
> > diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
> > index 2f468e6ced..48e37158af 100644
> > --- a/xen/arch/x86/hvm/vmx/vvmx.c
> > +++ b/xen/arch/x86/hvm/vmx/vvmx.c
> > @@ -1405,12 +1405,32 @@ static void virtual_vmexit(struct cpu_user_regs 
> > *regs)
> >  vmsucceed(regs);
> >  }
> >  
> > +static void nvmx_eptp_update(void)
> > +{
> 
> struct vcpu *curr = current; will most likely half the compiled size of
> this function.

Yes, passing a struct vcpu *v to nvmx_eptp_update() removes all
the additional:

mov%rsp,%rax
or $0x7fff,%rax

I wasn't aware of such behavior and will correct the usage of current
for all patches in v3.

> 
> > +if ( !nestedhvm_vcpu_in_guestmode(current) ||
> > +  vcpu_nestedhvm(current).nv_vmexit_pending ||
> > + !vcpu_nestedhvm(current).stale_np2m ||
> > + !nestedhvm_paging_mode_hap(current) )
> > +return;
> > +
> > +/*
> > + * Interrupts are enabled here, so we need to clear stale_np2m
> > + * before we do the vmwrite.  If we do it in the other order, an
> > + * and IPI comes in changing the shadow eptp after the vmwrite,
> > + * we'll complete the vmenter with a stale eptp value.
> > + */
> > +vcpu_nestedhvm(current).stale_np2m = false;
> > +__vmwrite(EPT_POINTER, get_shadow_eptp(current));
> > +}
> > +
> >  void nvmx_switch_guest(void)
> >  {
> >  struct vcpu *v = current;
> >  struct nestedvcpu *nvcpu = _nestedhvm(v);
> >  struct cpu_user_regs *regs = guest_cpu_user_regs();
> >  
> > +nvmx_eptp_update();
> > +
> >  /*
> >   * A pending IO emulation may still be not finished. In this case, no
> >   * virtual vmswitch is allowed. Or else, the following IO emulation 
> > will
> > diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-x86/hvm/vcpu.h
> > index 6c54773f1c..5cfa4b4aa4 100644
> > --- a/xen/include/asm-x86/hvm/vcpu.h
> > +++ b/xen/include/asm-x86/hvm/vcpu.h
> > @@ -115,6 +115,7 @@ struct nestedvcpu {
> >  
> >  bool_t nv_flushp2m; /* True, when p2m table must be flushed */
> >  struct p2m_domain *nv_p2m; /* used p2m table for this vcpu */
> > +bool stale_np2m; /* True when p2m_base in VMCX02 is no longer valid */
> 
> VMCx02 ? which helps distinguish the two parts of semantic information
> encoded there, and to avoid looking like we've gained a third acronym.

I like this suggestion. Will update comments and commit messages for all
patches in v3.

-- 
Thanks,
Sergey
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 4/9] x86/np2m: Simplify nestedhvm_hap_nested_page_fault

2017-10-02 Thread Sergey Dyasli
One comment below.

On Fri, 2017-09-29 at 16:01 +0100, George Dunlap wrote:
> There is a possibility for nested_p2m to become stale between
> nestedhvm_hap_nested_page_fault() and nestedhap_fix_p2m().  At the moment
> this is handled by detecting such a race inside nestedhap_fix_p2m() and
> special-casing it.
> 
> Instead, introduce p2m_get_nestedp2m_locked(), which will return a
> still-locked p2m.  This allows us to call nestedhap_fix_p2m() with the
> lock held and remove the code detecting the special-case.
> 
> Signed-off-by: Sergey Dyasli <sergey.dya...@citrix.com>
> Signed-off-by: George Dunlap <george.dun...@citrix.com>
> ---
> v2:
> - Merged patch 9 and 10 ("x86/np2m: add p2m_get_nestedp2m_locked()"
>  and "x86/np2m: improve nestedhvm_hap_nested_page_fault()")
> - Updated commit message
> - Fix comment style in nestedhap_fix_p2m()
> 
> CC: Andrew Cooper <andrew.coop...@citrix.com>
> CC: Jan Beulich <jbeul...@suse.com>
> CC: Jun Nakajima <jun.nakaj...@intel.com>
> CC: Kevin Tian <kevin.t...@intel.com>
> ---
>  xen/arch/x86/mm/hap/nested_hap.c | 31 +--
>  xen/arch/x86/mm/p2m.c| 12 +---
>  xen/include/asm-x86/p2m.h|  2 ++
>  3 files changed, 24 insertions(+), 21 deletions(-)
> 
> diff --git a/xen/arch/x86/mm/hap/nested_hap.c 
> b/xen/arch/x86/mm/hap/nested_hap.c
> index ed137fa784..844b32f702 100644
> --- a/xen/arch/x86/mm/hap/nested_hap.c
> +++ b/xen/arch/x86/mm/hap/nested_hap.c
> @@ -101,28 +101,23 @@ nestedhap_fix_p2m(struct vcpu *v, struct p2m_domain 
> *p2m,
>unsigned int page_order, p2m_type_t p2mt, p2m_access_t 
> p2ma)
>  {
>  int rc = 0;
> +unsigned long gfn, mask;
> +mfn_t mfn;
> +
>  ASSERT(p2m);
>  ASSERT(p2m->set_entry);
> +ASSERT(p2m_locked_by_me(p2m));
>  
> -p2m_lock(p2m);
> -
> -/* If this p2m table has been flushed or recycled under our feet, 
> - * leave it alone.  We'll pick up the right one as we try to 
> - * vmenter the guest. */
> -if ( p2m->np2m_base == nhvm_vcpu_p2m_base(v) )
> -{
> -unsigned long gfn, mask;
> -mfn_t mfn;
> -
> -/* If this is a superpage mapping, round down both addresses
> - * to the start of the superpage. */
> -mask = ~((1UL << page_order) - 1);
> +/* 
> + * If this is a superpage mapping, round down both addresses to
> + * the start of the superpage.
> + */
> +mask = ~((1UL << page_order) - 1);
>  
> -gfn = (L2_gpa >> PAGE_SHIFT) & mask;
> -mfn = _mfn((L0_gpa >> PAGE_SHIFT) & mask);
> +gfn = (L2_gpa >> PAGE_SHIFT) & mask;
> +mfn = _mfn((L0_gpa >> PAGE_SHIFT) & mask);
>  
> -rc = p2m_set_entry(p2m, gfn, mfn, page_order, p2mt, p2ma);
> -}
> +rc = p2m_set_entry(p2m, gfn, mfn, page_order, p2mt, p2ma);
>  
>  p2m_unlock(p2m);

I have the following fixup: move p2m_unlock() out of nestedhap_fix_p2m()
for balanced lock/unlock.
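
Roughly (a sketch of the fixup, not the exact incremental diff), the unlock
moves out of nestedhap_fix_p2m() and pairs with p2m_get_nestedp2m_locked()
in the caller:

    /* Caller side in nestedhvm_hap_nested_page_fault(), sketch only. */
    nested_p2m = p2m_get_nestedp2m_locked(v);           /* takes the lock  */
    nestedhap_fix_p2m(v, nested_p2m, *L2_gpa, L0_gpa,
                      page_order_20, p2mt_10, p2ma_10); /* keeps it locked */
    p2m_unlock(nested_p2m);                             /* caller unlocks  */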

>  
> @@ -212,7 +207,6 @@ nestedhvm_hap_nested_page_fault(struct vcpu *v, paddr_t 
> *L2_gpa,
>  uint8_t p2ma_21 = p2m_access_rwx;
>  
>  p2m = p2m_get_hostp2m(d); /* L0 p2m */
> -nested_p2m = p2m_get_nestedp2m(v);
>  
>  /* walk the L1 P2M table */
> >  rv = nestedhap_walk_L1_p2m(v, *L2_gpa, &L1_gpa, &page_order_21, &p2ma_21,
> @@ -278,6 +272,7 @@ nestedhvm_hap_nested_page_fault(struct vcpu *v, paddr_t 
> *L2_gpa,
>  p2ma_10 &= (p2m_access_t)p2ma_21;
>  
>  /* fix p2m_get_pagetable(nested_p2m) */
> +nested_p2m = p2m_get_nestedp2m_locked(v);
>  nestedhap_fix_p2m(v, nested_p2m, *L2_gpa, L0_gpa, page_order_20,
>  p2mt_10, p2ma_10);
>  
> diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
> index d3e602de22..aa3182dec6 100644
> --- a/xen/arch/x86/mm/p2m.c
> +++ b/xen/arch/x86/mm/p2m.c
> @@ -1813,7 +1813,7 @@ static void assign_np2m(struct vcpu *v, struct 
> p2m_domain *p2m)
>  }
>  
>  struct p2m_domain *
> -p2m_get_nestedp2m(struct vcpu *v)
> +p2m_get_nestedp2m_locked(struct vcpu *v)
>  {
> >  struct nestedvcpu *nv = &vcpu_nestedhvm(v);
>  struct domain *d = v->domain;
> @@ -1838,7 +1838,6 @@ p2m_get_nestedp2m(struct vcpu *v)
>  hvm_asid_flush_vcpu(v);
>  p2m->np2m_base = np2m_base;
>  assign_np2m(v, p2m);
> -p2m_unlock(p2m);
>  nestedp2m_unlock(d);
>  
>  return p2m;
> @@ -1854,12 +1853,19 @@ p2m_get_nestedp2m(struct vcpu *v)
>  p2m->np2m_base = np2m_base;
>  hvm_asid_flush_vcpu(v);
>  assign_np2m(v, p2m);

Re: [Xen-devel] [PATCH 2/9] x86/np2m: Have invept flush all np2m entries with the same base pointer

2017-10-02 Thread Sergey Dyasli
On Fri, 2017-09-29 at 16:01 +0100, George Dunlap wrote:
> nvmx_handle_invept() updates current's np2m just to flush it.  This is
> not only wasteful, but ineffective: if several L2 vcpus share the same
> np2m base pointer, they all need to be flushed (not only the current
> one).

I don't follow this completely. L1 will use INVEPT on each vCPU that
shares the same np2m pointer. The main idea here was not to update
current's np2m just to flush it.

> 
> Introduce a new function, np2m_flush_base() which will flush all
> shadow p2m's that match a given base pointer.
> 
> Convert p2m_flush_table() into p2m_flush_table_locked() in order not
> to release the p2m_lock after np2m_base check.
> 
> Signed-off-by: Sergey Dyasli <sergey.dya...@citrix.com>
> Signed-off-by: George Dunlap <george.dun...@citrix.com>
> ---
> Changes since v1:
> - Combine patches 2 and 3 ("x86/np2m: add np2m_flush_base()" and
> "x86/vvmx: use np2m_flush_base() for INVEPT_SINGLE_CONTEXT")
> - Reword commit text
> 
-- 
Thanks,
Sergey


Re: [Xen-devel] [PATCH v1 07/14] x86/vvmx: restart nested vmentry in case of stale_np2m

2017-09-29 Thread Sergey Dyasli
On Fri, 2017-09-29 at 11:53 +0100, George Dunlap wrote:
> On 09/04/2017 09:14 AM, Sergey Dyasli wrote:
> > If an IPI flushes vCPU's np2m object just before nested vmentry, there
> > will be a stale shadow EPTP value in VMCS02. Allow vmentry to be
> > restarted in such cases and add nvmx_eptp_update() to perform an update.
> > 
> > Signed-off-by: Sergey Dyasli <sergey.dya...@citrix.com>
> > ---
> >  xen/arch/x86/hvm/vmx/entry.S |  6 ++
> >  xen/arch/x86/hvm/vmx/vmx.c   |  8 +++-
> >  xen/arch/x86/hvm/vmx/vvmx.c  | 14 ++
> >  3 files changed, 27 insertions(+), 1 deletion(-)
> > 
> > diff --git a/xen/arch/x86/hvm/vmx/entry.S b/xen/arch/x86/hvm/vmx/entry.S
> > index 53eedc6363..9fb8f89220 100644
> > --- a/xen/arch/x86/hvm/vmx/entry.S
> > +++ b/xen/arch/x86/hvm/vmx/entry.S
> > @@ -79,6 +79,8 @@ UNLIKELY_END(realmode)
> >  
> >  mov  %rsp,%rdi
> >  call vmx_vmenter_helper
> > +cmp  $0,%eax
> > +jne .Lvmx_vmentry_restart
> >  mov  VCPU_hvm_guest_cr2(%rbx),%rax
> >  
> >  pop  %r15
> > @@ -117,6 +119,10 @@ ENTRY(vmx_asm_do_vmentry)
> >  GET_CURRENT(bx)
> >  jmp  .Lvmx_do_vmentry
> >  
> > +.Lvmx_vmentry_restart:
> > +sti
> > +jmp  .Lvmx_do_vmentry
> > +
> >  .Lvmx_goto_emulator:
> >  sti
> >  mov  %rsp,%rdi
> > diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> > index f6da119c9f..06509590b7 100644
> > --- a/xen/arch/x86/hvm/vmx/vmx.c
> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> > @@ -4223,13 +4223,17 @@ static void lbr_fixup(void)
> >  bdw_erratum_bdf14_fixup();
> >  }
> >  
> > -void vmx_vmenter_helper(const struct cpu_user_regs *regs)
> > +int vmx_vmenter_helper(const struct cpu_user_regs *regs)
> >  {
> >  struct vcpu *curr = current;
> >  u32 new_asid, old_asid;
> >  struct hvm_vcpu_asid *p_asid;
> >  bool_t need_flush;
> >  
> > +/* Shadow EPTP can't be updated here because irqs are disabled */
> > + if ( nestedhvm_vcpu_in_guestmode(curr) && 
> > vcpu_nestedhvm(curr).stale_np2m )
> > + return 1;
> > +
> >  if ( curr->domain->arch.hvm_domain.pi_ops.do_resume )
> >  curr->domain->arch.hvm_domain.pi_ops.do_resume(curr);
> >  
> > @@ -4290,6 +4294,8 @@ void vmx_vmenter_helper(const struct cpu_user_regs 
> > *regs)
> >  __vmwrite(GUEST_RIP,regs->rip);
> >  __vmwrite(GUEST_RSP,regs->rsp);
> >  __vmwrite(GUEST_RFLAGS, regs->rflags | X86_EFLAGS_MBS);
> > +
> > +return 0;
> >  }
> >  
> >  /*
> > diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
> > index ea2da14489..26ce349c76 100644
> > --- a/xen/arch/x86/hvm/vmx/vvmx.c
> > +++ b/xen/arch/x86/hvm/vmx/vvmx.c
> > @@ -1405,12 +1405,26 @@ static void virtual_vmexit(struct cpu_user_regs 
> > *regs)
> >  vmsucceed(regs);
> >  }
> >  
> > +static void nvmx_eptp_update(void)
> > +{
> > +if ( !nestedhvm_vcpu_in_guestmode(current) ||
> > +  vcpu_nestedhvm(current).nv_vmexit_pending ||
> > + !vcpu_nestedhvm(current).stale_np2m ||
> > + !nestedhvm_paging_mode_hap(current) )
> > +return;
> > +
> > +__vmwrite(EPT_POINTER, get_shadow_eptp(current));
> > +vcpu_nestedhvm(current).stale_np2m = false;
> 
> Hmm, so interrupts are enabled here.  What happens if a flush IPI occurs
> between these two lines of code?  Won't we do the vmenter with a stale np2m?
> 
> It seems like we should clear stale_np2m first.  If an IPI occurs then,
> we'll end up re-executing the vmenter unnecessarily, but it's better to
> do that than to not re-execute it when we need to.

Good catch! Clearing of stale_np2m must indeed happen before updating
a shadow EPTP.
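
I.e. the fixed ordering will be (sketch):

    /*
     * Clear the flag first.  If a flush IPI arrives after the vmwrite, it
     * sets stale_np2m again and the vmentry is restarted, instead of
     * completing with a stale shadow EPTP.
     */
    vcpu_nestedhvm(current).stale_np2m = false;
    __vmwrite(EPT_POINTER, get_shadow_eptp(current));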

-- 
Thanks,
Sergey


Re: [Xen-devel] [PATCH] x86: avoid #GP for PV guest MSR accesses

2017-09-22 Thread Sergey Dyasli
On Fri, 2017-09-22 at 03:06 -0600, Jan Beulich wrote:
> Halfway recent Linux kernels probe MISC_FEATURES_ENABLES on all CPUs,
> leading to ugly recovered #GP fault messages with debug builds on older
> systems. We can do better, so introduce synthetic feature flags for
> both this and PLATFORM_INFO to avoid the rdmsr_safe() altogether.
> 
> The rdmsr_safe() uses for MISC_ENABLE are left in place as benign - it
> exists for all 64-bit capable Intel CPUs (see e.g. early_init_intel()).
> 
> Signed-off-by: Jan Beulich 

The intent of this patch (and the related "VMX: PLATFORM_INFO MSR is
r/o") somewhat intersects with my series "Generic MSR policy:
infrastructure + cpuid_faulting". IMHO it's better to fix MSR-related
issues within the scope of the MSR policy work.

Also, I have one question below.

> 
> --- a/xen/arch/x86/cpu/intel.c
> +++ b/xen/arch/x86/cpu/intel.c
> @@ -21,10 +21,19 @@ static bool __init probe_intel_cpuid_fau
>  {
>   uint64_t x;
>  
> - if (rdmsr_safe(MSR_INTEL_PLATFORM_INFO, x) ||
> - !(x & MSR_PLATFORM_INFO_CPUID_FAULTING))
> + if (rdmsr_safe(MSR_INTEL_PLATFORM_INFO, x))
>   return 0;
>  
> + setup_force_cpu_cap(X86_FEATURE_MSR_PLATFORM_INFO);
> +
> + if (!(x & MSR_PLATFORM_INFO_CPUID_FAULTING)) {
> + if (!rdmsr_safe(MSR_INTEL_MISC_FEATURES_ENABLES, x))
> + setup_force_cpu_cap(X86_FEATURE_MSR_MISC_FEATURES);
> + return 0;
> + }
> +
> + setup_force_cpu_cap(X86_FEATURE_MSR_MISC_FEATURES);
> +
>   expected_levelling_cap |= LCAP_faulting;
>   levelling_caps |=  LCAP_faulting;
>   setup_force_cpu_cap(X86_FEATURE_CPUID_FAULTING);
> --- a/xen/arch/x86/pv/emul-priv-op.c
> +++ b/xen/arch/x86/pv/emul-priv-op.c
> @@ -941,8 +941,7 @@ static int read_msr(unsigned int reg, ui
>  return X86EMUL_OKAY;
>  
>  case MSR_INTEL_PLATFORM_INFO:
> -if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ||
> - rdmsr_safe(MSR_INTEL_PLATFORM_INFO, *val) )
> +if ( !boot_cpu_has(X86_FEATURE_MSR_PLATFORM_INFO) )
>  break;
>  *val = 0;
>  if ( this_cpu(cpuid_faulting_enabled) )
> @@ -950,8 +949,7 @@ static int read_msr(unsigned int reg, ui
>  return X86EMUL_OKAY;
>  
>  case MSR_INTEL_MISC_FEATURES_ENABLES:
> -if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ||
> - rdmsr_safe(MSR_INTEL_MISC_FEATURES_ENABLES, *val) )
> +if ( !boot_cpu_has(X86_FEATURE_MSR_MISC_FEATURES) )
>  break;
>  *val = 0;
>  if ( curr->arch.cpuid_faulting )
> @@ -1147,15 +1145,13 @@ static int write_msr(unsigned int reg, u
>  return X86EMUL_OKAY;
>  
>  case MSR_INTEL_PLATFORM_INFO:
> -if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ||
> - val || rdmsr_safe(MSR_INTEL_PLATFORM_INFO, val) )
> +if ( !boot_cpu_has(X86_FEATURE_MSR_PLATFORM_INFO) || val )
>  break;
>  return X86EMUL_OKAY;

Why shouldn't writes to MSR_INTEL_PLATFORM_INFO produce #GP faults for
PV guests?

>  
>  case MSR_INTEL_MISC_FEATURES_ENABLES:
> -if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ||
> - (val & ~MSR_MISC_FEATURES_CPUID_FAULTING) ||
> - rdmsr_safe(MSR_INTEL_MISC_FEATURES_ENABLES, temp) )
> +if ( !boot_cpu_has(X86_FEATURE_MSR_MISC_FEATURES) ||
> + (val & ~MSR_MISC_FEATURES_CPUID_FAULTING) )
>  break;
>  if ( (val & MSR_MISC_FEATURES_CPUID_FAULTING) &&
>   !this_cpu(cpuid_faulting_enabled) )
> --- a/xen/include/asm-x86/cpufeatures.h
> +++ b/xen/include/asm-x86/cpufeatures.h
> @@ -22,3 +22,5 @@ XEN_CPUFEATURE(APERFMPERF,  (FSCAPIN
>  XEN_CPUFEATURE(MFENCE_RDTSC,(FSCAPINTS+0)*32+ 9) /* MFENCE synchronizes 
> RDTSC */
>  XEN_CPUFEATURE(XEN_SMEP,(FSCAPINTS+0)*32+10) /* SMEP gets used by 
> Xen itself */
>  XEN_CPUFEATURE(XEN_SMAP,(FSCAPINTS+0)*32+11) /* SMAP gets used by 
> Xen itself */
> +XEN_CPUFEATURE(MSR_PLATFORM_INFO, (FSCAPINTS+0)*32+12) /* PLATFORM_INFO MSR 
> present */
> +XEN_CPUFEATURE(MSR_MISC_FEATURES, (FSCAPINTS+0)*32+13) /* 
> MISC_FEATURES_ENABLES MSR present */
> 
> 
> 
> 
-- 
Thanks,
Sergey


[Xen-devel] [PATCH v1] x86/vvmx: add hvm_intsrc_vector support to nvmx_intr_intercept()

2017-09-13 Thread Sergey Dyasli
Under the following circumstances:

1. L1 doesn't enable PAUSE exiting or PAUSE-loop exiting controls
2. L2 executes PAUSE in a loop with RFLAGS.IE == 0

L1's PV IPI sent through an event channel will never reach the target
L1 vCPU which runs L2, because nvmx_intr_intercept() doesn't know about
hvm_intsrc_vector. This leads to an infinite L2 loop without nested
vmexits and can cause L1 to hang.

The issue is easily reproduced with Qemu/KVM on CentOS-7-1611 as L1
and an L2 guest with SMP.

Fix nvmx_intr_intercept() by injecting the hvm_intsrc_vector irq into L1,
which will cause a nested vmexit.

Signed-off-by: Sergey Dyasli <sergey.dya...@citrix.com>
---
 xen/arch/x86/hvm/vmx/intr.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/intr.c b/xen/arch/x86/hvm/vmx/intr.c
index e1d0190ca9..4c0f1c8f71 100644
--- a/xen/arch/x86/hvm/vmx/intr.c
+++ b/xen/arch/x86/hvm/vmx/intr.c
@@ -188,13 +188,13 @@ static int nvmx_intr_intercept(struct vcpu *v, struct 
hvm_intack intack)
 
 if ( nestedhvm_vcpu_in_guestmode(v) )
 {
+ctrl = get_vvmcs(v, PIN_BASED_VM_EXEC_CONTROL);
+if ( !(ctrl & PIN_BASED_EXT_INTR_MASK) )
+return 0;
+
 if ( intack.source == hvm_intsrc_pic ||
  intack.source == hvm_intsrc_lapic )
 {
-ctrl = get_vvmcs(v, PIN_BASED_VM_EXEC_CONTROL);
-if ( !(ctrl & PIN_BASED_EXT_INTR_MASK) )
-return 0;
-
 vmx_inject_extint(intack.vector, intack.source);
 
 ctrl = get_vvmcs(v, VM_EXIT_CONTROLS);
@@ -213,6 +213,11 @@ static int nvmx_intr_intercept(struct vcpu *v, struct 
hvm_intack intack)
 
 return 1;
 }
+else if ( intack.source == hvm_intsrc_vector )
+{
+vmx_inject_extint(intack.vector, intack.source);
+return 1;
+}
 }
 
 return 0;
-- 
2.11.0




Re: [Xen-devel] [PATCH v1 0/5] Generic MSR policy: infrastructure + cpuid_faulting

2017-09-11 Thread Sergey Dyasli
Ping?

On Wed, 2017-08-30 at 11:34 +0100, Sergey Dyasli wrote:
> Currently there are the following issues with handling guest's RD/WRMSR
> in Xen:
> 
> 1. There is no way to configure which MSRs a guest can and can't access.
>And if there is no MSR handler in Xen for a particular MSR then
>the default behavior is just horrible:
> 
> RDMSR: rdmsr_safe(msr, *msr_content) /* returns a H/W value */
> WRMSR: return X86EMUL_OKAY;  /* write is silently discarded */
> 
> 2. There are too many handlers. Example for RDMSR:
> priv_op_read_msr()
> hvm_msr_read_intercept()
> vmce_rdmsr()
> svm_msr_read_intercept()
> vmx_msr_read_intercept()
> nvmx_msr_read_intercept()
> rdmsr_viridian_regs()
> ...
> 
> This series tries to address the above issues in the following way.
> 2 types of MSR policy objects are introduced:
> 
> 1. Per-Domain policy (struct msr_domain_policy) -- for shared MSRs
> 2. Per-vCPU policy (struct msr_vcpu_policy) -- for unique MSRs
> 
> Each domain and each vCPU inside a domain will now have an associated
> MSR policy object. Contents of these structures are defined during
> domain creation. For now, it's just a copy of either HVM_MAX or PV_MAX
> policy, depending on a guest's type. But in the future it should be
> possible to control the availability and values in guest's MSRs from
> a toolstack. However, any MSR manipulations must be done together with
> CPUID ones.
> 
> Once policy objects are in place, it becomes easy to introduce unified
> guest's RDMSR and WRMSR handlers. They work directly with MSR policy
> objects since all the state of guest's MSRs is contained there.
> 
> Main idea of having MSR policy is to define a set and contents of MSRs
> that a guest sees. All other MSRs should be inaccessible (access would
> generate a GP fault). And this MSR information should also be sent in
> the migration stream.
> 
> Since it's impossible to convert all MSRs to use the new infrastructure
> right away, this series starts with 2 MSRs responsible for CPUID
> faulting:
> 
> 1. MSR_INTEL_PLATFORM_INFO
> 2. MSR_INTEL_MISC_FEATURES_ENABLES
> 
> My previous VMX MSR policy patch set will be rebased on top of this
> generic MSR infrastructure after it's merged.
> 
> Sergey Dyasli (5):
>   x86/msr: introduce struct msr_domain_policy
>   x86/msr: introduce struct msr_vcpu_policy
>   x86: replace arch_vcpu::cpuid_faulting with msr_vcpu_policy
>   x86/msr: introduce guest_rdmsr()
>   x86/msr: introduce guest_wrmsr()
> 
>  xen/arch/x86/Makefile  |   1 +
>  xen/arch/x86/cpu/intel.c   |   3 +-
>  xen/arch/x86/domain.c  |  24 -
>  xen/arch/x86/hvm/hvm.c |  18 +++-
>  xen/arch/x86/hvm/vmx/vmx.c |  31 ---
>  xen/arch/x86/msr.c | 203 
> +
>  xen/arch/x86/pv/emul-inv-op.c  |   4 +-
>  xen/arch/x86/pv/emul-priv-op.c |  43 ++---
>  xen/arch/x86/setup.c   |   1 +
>  xen/include/asm-x86/domain.h   |   8 +-
>  xen/include/asm-x86/msr.h  |  33 +++
>  11 files changed, 292 insertions(+), 77 deletions(-)
>  create mode 100644 xen/arch/x86/msr.c
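
To recap the intended read path in code form (a rough sketch using the
field names from patch 1, not necessarily the exact code in the series):

    /* Sketch: unknown or unavailable MSRs fault instead of reading H/W. */
    int guest_rdmsr(const struct vcpu *v, uint32_t msr, uint64_t *val)
    {
        const struct msr_domain_policy *dp = v->domain->arch.msr;

        switch ( msr )
        {
        case MSR_INTEL_PLATFORM_INFO:
            if ( !dp->plaform_info.available )
                break;
            *val = dp->plaform_info.cpuid_faulting
                   ? MSR_PLATFORM_INFO_CPUID_FAULTING : 0;
            return X86EMUL_OKAY;
        }

        return X86EMUL_EXCEPTION; /* i.e. inject #GP */
    }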


[Xen-devel] [PATCH v1 14/14] x86/vvmx: remove EPTP write from ept_handle_violation()

2017-09-04 Thread Sergey Dyasli
There is now no need to update the shadow EPTP after handling an L2 EPT
violation, since all EPTP updates are handled by nvmx_eptp_update().

Signed-off-by: Sergey Dyasli <sergey.dya...@citrix.com>
---
 xen/arch/x86/hvm/vmx/vmx.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 06509590b7..6a2553cc58 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -3269,12 +3269,6 @@ static void ept_handle_violation(ept_qual_t q, paddr_t 
gpa)
 case 0: // Unhandled L1 EPT violation
 break;
 case 1: // This violation is handled completly
-/*Current nested EPT maybe flushed by other vcpus, so need
- * to re-set its shadow EPTP pointer.
- */
-if ( nestedhvm_vcpu_in_guestmode(current) &&
-nestedhvm_paging_mode_hap(current ) )
-__vmwrite(EPT_POINTER, get_shadow_eptp(current));
 return;
 case -1:// This vioaltion should be injected to L1 VMM
 vcpu_nestedhvm(current).nv_vmexit_pending = 1;
-- 
2.11.0




[Xen-devel] [PATCH v1 13/14] x86/np2m: add break to np2m_flush_eptp()

2017-09-04 Thread Sergey Dyasli
Now that np2m sharing is implemented, there can be only one np2m object
with the same np2m_base. Break from the loop if the required np2m was
found during np2m_flush_base().

Signed-off-by: Sergey Dyasli <sergey.dya...@citrix.com>
---
 xen/arch/x86/mm/p2m.c | 4 
 xen/include/asm-x86/p2m.h | 2 +-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index f783f25fa8..f11355b0d1 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1792,7 +1792,11 @@ void np2m_flush_base(struct vcpu *v, unsigned long 
np2m_base)
 p2m = d->arch.nested_p2m[i];
 p2m_lock(p2m);
 if ( p2m->np2m_base == np2m_base )
+{
 p2m_flush_table_locked(p2m);
+p2m_unlock(p2m);
+break;
+}
 p2m_unlock(p2m);
 }
 nestedp2m_unlock(d);
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index 790635ec0b..a17e589c07 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -786,7 +786,7 @@ int p2m_pt_handle_deferred_changes(uint64_t gpa);
 void p2m_flush(struct vcpu *v, struct p2m_domain *p2m);
 /* Flushes all nested p2m tables */
 void p2m_flush_nestedp2m(struct domain *d);
-/* Flushes all np2m objects with the specified np2m_base */
+/* Flushes the np2m specified by np2m_base (if it exists) */
 void np2m_flush_base(struct vcpu *v, unsigned long np2m_base);
 
 void nestedp2m_write_p2m_entry(struct p2m_domain *p2m, unsigned long gfn,
-- 
2.11.0




[Xen-devel] [PATCH v1 11/14] x86/np2m: implement sharing of np2m between vCPUs

2017-09-04 Thread Sergey Dyasli
Modify p2m_get_nestedp2m() to allow sharing an np2m between multiple
vcpus with the same np2m_base (L1 np2m_base value in VMCX12).

np2m_schedule() callbacks are added to context_switch(), and a pseudo
schedule-out is performed during vvmx's virtual_vmexit().

Signed-off-by: Sergey Dyasli <sergey.dya...@citrix.com>
---
 xen/arch/x86/domain.c   |  2 ++
 xen/arch/x86/hvm/vmx/vvmx.c |  4 
 xen/arch/x86/mm/p2m.c   | 29 +++--
 3 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index dbddc536d3..c8c26dad4e 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1647,6 +1647,7 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
 {
 _update_runstate_area(prev);
 vpmu_switch_from(prev);
+np2m_schedule(NP2M_SCHEDLE_OUT);
 }
 
if ( is_hvm_domain(prevd) && !list_empty(&prev->arch.hvm_vcpu.tm_list) )
@@ -1695,6 +1696,7 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
 
 /* Must be done with interrupts enabled */
 vpmu_switch_to(next);
+np2m_schedule(NP2M_SCHEDLE_IN);
 }
 
 /* Ensure that the vcpu has an up-to-date time base. */
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index 26ce349c76..49733af62b 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -1201,6 +1201,7 @@ static void virtual_vmentry(struct cpu_user_regs *regs)
 
 /* Setup virtual ETP for L2 guest*/
 if ( nestedhvm_paging_mode_hap(v) )
+/* This will setup the initial np2m for the nested vCPU */
 __vmwrite(EPT_POINTER, get_shadow_eptp(v));
 else
 __vmwrite(EPT_POINTER, get_host_eptp(v));
@@ -1367,6 +1368,9 @@ static void virtual_vmexit(struct cpu_user_regs *regs)
  !(v->arch.hvm_vcpu.guest_efer & EFER_LMA) )
 shadow_to_vvmcs_bulk(v, ARRAY_SIZE(gpdpte_fields), gpdpte_fields);
 
+/* This will clear current pCPU bit in p2m->dirty_cpumask */
+np2m_schedule(NP2M_SCHEDLE_OUT);
+
 vmx_vmcs_switch(v->arch.hvm_vmx.vmcs_pa, nvcpu->nv_n1vmcx_pa);
 
 nestedhvm_vcpu_exit_guestmode(v);
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 15dedef33b..d6a474fa20 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1825,6 +1825,7 @@ p2m_get_nestedp2m_locked(struct vcpu *v)
 struct domain *d = v->domain;
 struct p2m_domain *p2m;
 uint64_t np2m_base = nhvm_vcpu_p2m_base(v);
+unsigned int i;
 
 /* Mask out low bits; this avoids collisions with P2M_BASE_EADDR */
 np2m_base &= ~(0xfffull);
@@ -1838,10 +1839,34 @@ p2m_get_nestedp2m_locked(struct vcpu *v)
 if ( p2m ) 
 {
 p2m_lock(p2m);
-if ( p2m->np2m_base == np2m_base || p2m->np2m_base == P2M_BASE_EADDR )
+if ( p2m->np2m_base == np2m_base )
 {
-if ( p2m->np2m_base == P2M_BASE_EADDR )
+/* Check if np2m was flushed just before the lock */
+if ( nv->np2m_generation != p2m->np2m_generation )
 nvcpu_flush(v);
+/* np2m is up-to-date */
+p2m->np2m_base = np2m_base;
+assign_np2m(v, p2m);
+nestedp2m_unlock(d);
+
+return p2m;
+}
+else if ( p2m->np2m_base != P2M_BASE_EADDR )
+{
+/* vCPU is switching from some other valid np2m */
+cpumask_clear_cpu(v->processor, p2m->dirty_cpumask);
+}
+p2m_unlock(p2m);
+}
+
+/* Share a np2m if possible */
+for ( i = 0; i < MAX_NESTEDP2M; i++ )
+{
+p2m = d->arch.nested_p2m[i];
+p2m_lock(p2m);
+if ( p2m->np2m_base == np2m_base )
+{
+nvcpu_flush(v);
 p2m->np2m_base = np2m_base;
 assign_np2m(v, p2m);
 nestedp2m_unlock(d);
-- 
2.11.0




[Xen-devel] [PATCH v1 12/14] x86/np2m: refactor p2m_get_nestedp2m_locked()

2017-09-04 Thread Sergey Dyasli
Remove some code duplication.

Suggested-by: George Dunlap <george.dun...@citrix.com>
Signed-off-by: Sergey Dyasli <sergey.dya...@citrix.com>
---
 xen/arch/x86/mm/p2m.c | 25 ++---
 1 file changed, 10 insertions(+), 15 deletions(-)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index d6a474fa20..f783f25fa8 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1826,6 +1826,7 @@ p2m_get_nestedp2m_locked(struct vcpu *v)
 struct p2m_domain *p2m;
 uint64_t np2m_base = nhvm_vcpu_p2m_base(v);
 unsigned int i;
+bool needs_flush = true;
 
 /* Mask out low bits; this avoids collisions with P2M_BASE_EADDR */
 np2m_base &= ~(0xfffull);
@@ -1842,14 +1843,10 @@ p2m_get_nestedp2m_locked(struct vcpu *v)
 if ( p2m->np2m_base == np2m_base )
 {
 /* Check if np2m was flushed just before the lock */
-if ( nv->np2m_generation != p2m->np2m_generation )
-nvcpu_flush(v);
+if ( nv->np2m_generation == p2m->np2m_generation )
+needs_flush = false;
 /* np2m is up-to-date */
-p2m->np2m_base = np2m_base;
-assign_np2m(v, p2m);
-nestedp2m_unlock(d);
-
-return p2m;
+goto found;
 }
 else if ( p2m->np2m_base != P2M_BASE_EADDR )
 {
@@ -1864,15 +1861,10 @@ p2m_get_nestedp2m_locked(struct vcpu *v)
 {
 p2m = d->arch.nested_p2m[i];
 p2m_lock(p2m);
+
 if ( p2m->np2m_base == np2m_base )
-{
-nvcpu_flush(v);
-p2m->np2m_base = np2m_base;
-assign_np2m(v, p2m);
-nestedp2m_unlock(d);
+goto found;
 
-return p2m;
-}
 p2m_unlock(p2m);
 }
 
@@ -1881,8 +1873,11 @@ p2m_get_nestedp2m_locked(struct vcpu *v)
 p2m = p2m_getlru_nestedp2m(d, NULL);
 p2m_flush_table(p2m);
 p2m_lock(p2m);
+
+ found:
+if ( needs_flush )
+nvcpu_flush(v);
 p2m->np2m_base = np2m_base;
-nvcpu_flush(v);
 assign_np2m(v, p2m);
 nestedp2m_unlock(d);
 
-- 
2.11.0




[Xen-devel] [PATCH v1 10/14] x86/np2m: improve nestedhvm_hap_nested_page_fault()

2017-09-04 Thread Sergey Dyasli
There is a possibility for nested_p2m to become stale between
nestedhvm_hap_nested_page_fault() and nestedhap_fix_p2m(). Simply
use p2m_get_nestedp2m_locked() to guarantee that the correct np2m is used.

Signed-off-by: Sergey Dyasli <sergey.dya...@citrix.com>
---
 xen/arch/x86/mm/hap/nested_hap.c | 29 +++--
 1 file changed, 11 insertions(+), 18 deletions(-)

diff --git a/xen/arch/x86/mm/hap/nested_hap.c b/xen/arch/x86/mm/hap/nested_hap.c
index ed137fa784..96afe632b5 100644
--- a/xen/arch/x86/mm/hap/nested_hap.c
+++ b/xen/arch/x86/mm/hap/nested_hap.c
@@ -101,28 +101,21 @@ nestedhap_fix_p2m(struct vcpu *v, struct p2m_domain *p2m,
   unsigned int page_order, p2m_type_t p2mt, p2m_access_t p2ma)
 {
 int rc = 0;
+unsigned long gfn, mask;
+mfn_t mfn;
+
 ASSERT(p2m);
 ASSERT(p2m->set_entry);
+ASSERT(p2m_locked_by_me(p2m));
 
-p2m_lock(p2m);
-
-/* If this p2m table has been flushed or recycled under our feet, 
- * leave it alone.  We'll pick up the right one as we try to 
- * vmenter the guest. */
-if ( p2m->np2m_base == nhvm_vcpu_p2m_base(v) )
-{
-unsigned long gfn, mask;
-mfn_t mfn;
-
-/* If this is a superpage mapping, round down both addresses
- * to the start of the superpage. */
-mask = ~((1UL << page_order) - 1);
+/* If this is a superpage mapping, round down both addresses
+ * to the start of the superpage. */
+mask = ~((1UL << page_order) - 1);
 
-gfn = (L2_gpa >> PAGE_SHIFT) & mask;
-mfn = _mfn((L0_gpa >> PAGE_SHIFT) & mask);
+gfn = (L2_gpa >> PAGE_SHIFT) & mask;
+mfn = _mfn((L0_gpa >> PAGE_SHIFT) & mask);
 
-rc = p2m_set_entry(p2m, gfn, mfn, page_order, p2mt, p2ma);
-}
+rc = p2m_set_entry(p2m, gfn, mfn, page_order, p2mt, p2ma);
 
 p2m_unlock(p2m);
 
@@ -212,7 +205,6 @@ nestedhvm_hap_nested_page_fault(struct vcpu *v, paddr_t 
*L2_gpa,
 uint8_t p2ma_21 = p2m_access_rwx;
 
 p2m = p2m_get_hostp2m(d); /* L0 p2m */
-nested_p2m = p2m_get_nestedp2m(v);
 
 /* walk the L1 P2M table */
rv = nestedhap_walk_L1_p2m(v, *L2_gpa, &L1_gpa, &page_order_21, &p2ma_21,
@@ -278,6 +270,7 @@ nestedhvm_hap_nested_page_fault(struct vcpu *v, paddr_t 
*L2_gpa,
 p2ma_10 &= (p2m_access_t)p2ma_21;
 
 /* fix p2m_get_pagetable(nested_p2m) */
+nested_p2m = p2m_get_nestedp2m_locked(v);
 nestedhap_fix_p2m(v, nested_p2m, *L2_gpa, L0_gpa, page_order_20,
 p2mt_10, p2ma_10);
 
-- 
2.11.0




[Xen-devel] [PATCH v1 03/14] x86/vvmx: use np2m_flush_base() for INVEPT_SINGLE_CONTEXT

2017-09-04 Thread Sergey Dyasli
nvmx_handle_invept() updates current's np2m just to flush it. Instead,
use the new np2m_flush_base() directly for this purpose.

Signed-off-by: Sergey Dyasli <sergey.dya...@citrix.com>
---
 xen/arch/x86/hvm/vmx/vvmx.c | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index e2361a1394..3c5f560aec 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -1910,12 +1910,7 @@ int nvmx_handle_invept(struct cpu_user_regs *regs)
 {
 case INVEPT_SINGLE_CONTEXT:
 {
-struct p2m_domain *p2m = p2m_get_nestedp2m(current, eptp);
-if ( p2m )
-{
-p2m_flush(current, p2m);
-ept_sync_domain(p2m);
-}
+np2m_flush_base(current, eptp);
 break;
 }
 case INVEPT_ALL_CONTEXT:
-- 
2.11.0




[Xen-devel] [PATCH v1 00/14] Nested p2m: allow sharing between vCPUs

2017-09-04 Thread Sergey Dyasli
Nested p2m (shadow EPT) is an object that stores memory address
translations from L2 GPA directly to L0 HPA. This is achieved by
combining together L1 EPT with L0 EPT during L2 EPT violations.

In the usual case, L1 uses the same EPTP value in VMCS12 for all vCPUs
of an L2 guest. But unfortunately, in the current Xen implementation, each
vCPU has its own np2m object which cannot be shared with other vCPUs.
This leads to the following issues if a nested guest has SMP:

1. There will be multiple np2m objects (1 per nested vCPU) with
   the same np2m_base (L1 EPTP value in VMCS12).

2. Same EPT violations will be processed independently by each vCPU.

3. Since MAX_NESTEDP2M is defined as 10, if a domain has more than
   10 nested vCPUs, performance will be extremely degraded due to
   constant np2m LRU list thrashing and np2m flushing.

This patch series makes it possible to share one np2m object between
different vCPUs that have the same np2m_base. Sharing of np2m objects
improves scalability of a domain from 10 nested vCPUs to 10 nested
guests (with an arbitrary number of vCPUs per guest).
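
The core of the change, as a sketch (the real logic lives in
p2m_get_nestedp2m_locked(), patch 11): before recycling an np2m via the
LRU list, look for an existing one with a matching np2m_base and reuse it:

    /* Sketch only; see patch 11 for the actual implementation. */
    static struct p2m_domain *find_shared_np2m(struct domain *d,
                                               uint64_t np2m_base)
    {
        unsigned int i;

        for ( i = 0; i < MAX_NESTEDP2M; i++ )
        {
            struct p2m_domain *p2m = d->arch.nested_p2m[i];

            if ( p2m->np2m_base == np2m_base )
                return p2m;   /* share the existing np2m */
        }

        return NULL;          /* caller recycles one via the LRU list */
    }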

RFC --> v1:
- Some commit messages are updated based on George's comments
- Replaced VMX's terminology in common code with HVM's one
- Patch "x86/vvmx: add stale_eptp flag" is split into
  "x86/np2m: add stale_np2m flag" and
  "x86/vvmx: restart nested vmentry in case of stale_np2m"
- Added "x86/np2m: refactor p2m_get_nestedp2m_locked()" patch
- I've done some light nested SVM testing and fixed 1 regression
  (see patch #4)

Sergey Dyasli (14):
  x86/np2m: refactor p2m_get_nestedp2m()
  x86/np2m: add np2m_flush_base()
  x86/vvmx: use np2m_flush_base() for INVEPT_SINGLE_CONTEXT
  x86/np2m: remove np2m_base from p2m_get_nestedp2m()
  x86/np2m: add np2m_generation
  x86/np2m: add stale_np2m flag
  x86/vvmx: restart nested vmentry in case of stale_np2m
  x86/np2m: add np2m_schedule()
  x86/np2m: add p2m_get_nestedp2m_locked()
  x86/np2m: improve nestedhvm_hap_nested_page_fault()
  x86/np2m: implement sharing of np2m between vCPUs
  x86/np2m: refactor p2m_get_nestedp2m_locked()
  x86/np2m: add break to np2m_flush_eptp()
  x86/vvmx: remove EPTP write from ept_handle_violation()

 xen/arch/x86/domain.c|   2 +
 xen/arch/x86/hvm/nestedhvm.c |   3 +
 xen/arch/x86/hvm/svm/nestedsvm.c |   6 +-
 xen/arch/x86/hvm/vmx/entry.S |   6 ++
 xen/arch/x86/hvm/vmx/vmx.c   |  14 ++--
 xen/arch/x86/hvm/vmx/vvmx.c  |  28 +--
 xen/arch/x86/mm/hap/nested_hap.c |  29 +++
 xen/arch/x86/mm/p2m.c| 174 ---
 xen/include/asm-x86/hvm/vcpu.h   |   2 +
 xen/include/asm-x86/p2m.h|  17 +++-
 10 files changed, 211 insertions(+), 70 deletions(-)

-- 
2.11.0




[Xen-devel] [PATCH v1 02/14] x86/np2m: add np2m_flush_base()

2017-09-04 Thread Sergey Dyasli
The new function finds all np2m objects with the specified np2m_base
and flushes them.

Convert p2m_flush_table() into p2m_flush_table_locked() in order not to
release the p2m_lock after np2m_base check.

Signed-off-by: Sergey Dyasli <sergey.dya...@citrix.com>
---
RFC --> v1:
- p2m_unlock(p2m) is moved from p2m_flush_table_locked() to
  p2m_flush_table() for balanced lock/unlock
- np2m_flush_eptp() is renamed to np2m_flush_base()

 xen/arch/x86/mm/p2m.c | 35 +--
 xen/include/asm-x86/p2m.h |  2 ++
 2 files changed, 31 insertions(+), 6 deletions(-)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index b8c8bba421..94a42400ad 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1708,15 +1708,14 @@ p2m_getlru_nestedp2m(struct domain *d, struct 
p2m_domain *p2m)
 return p2m;
 }
 
-/* Reset this p2m table to be empty */
 static void
-p2m_flush_table(struct p2m_domain *p2m)
+p2m_flush_table_locked(struct p2m_domain *p2m)
 {
 struct page_info *top, *pg;
 struct domain *d = p2m->domain;
 mfn_t mfn;
 
-p2m_lock(p2m);
+ASSERT(p2m_locked_by_me(p2m));
 
 /*
  * "Host" p2m tables can have shared entries  that need a bit more care
@@ -1729,10 +1728,7 @@ p2m_flush_table(struct p2m_domain *p2m)
 
 /* No need to flush if it's already empty */
 if ( p2m_is_nestedp2m(p2m) && p2m->np2m_base == P2M_BASE_EADDR )
-{
-p2m_unlock(p2m);
 return;
-}
 
 /* This is no longer a valid nested p2m for any address space */
 p2m->np2m_base = P2M_BASE_EADDR;
@@ -1752,7 +1748,14 @@ p2m_flush_table(struct p2m_domain *p2m)
 d->arch.paging.free_page(d, pg);
 }
page_list_add(top, &p2m->pages);
+}
 
+/* Reset this p2m table to be empty */
+static void
+p2m_flush_table(struct p2m_domain *p2m)
+{
+p2m_lock(p2m);
+p2m_flush_table_locked(p2m);
 p2m_unlock(p2m);
 }
 
@@ -1773,6 +1776,26 @@ p2m_flush_nestedp2m(struct domain *d)
 p2m_flush_table(d->arch.nested_p2m[i]);
 }
 
+void np2m_flush_base(struct vcpu *v, unsigned long np2m_base)
+{
+struct domain *d = v->domain;
+struct p2m_domain *p2m;
+unsigned int i;
+
+np2m_base &= ~(0xfffull);
+
+nestedp2m_lock(d);
+for ( i = 0; i < MAX_NESTEDP2M; i++ )
+{
+p2m = d->arch.nested_p2m[i];
+p2m_lock(p2m);
+if ( p2m->np2m_base == np2m_base )
+p2m_flush_table_locked(p2m);
+p2m_unlock(p2m);
+}
+nestedp2m_unlock(d);
+}
+
 static void assign_np2m(struct vcpu *v, struct p2m_domain *p2m)
 {
struct nestedvcpu *nv = &vcpu_nestedhvm(v);
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index 9086bb35dc..cfb00591cd 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -779,6 +779,8 @@ int p2m_pt_handle_deferred_changes(uint64_t gpa);
 void p2m_flush(struct vcpu *v, struct p2m_domain *p2m);
 /* Flushes all nested p2m tables */
 void p2m_flush_nestedp2m(struct domain *d);
+/* Flushes all np2m objects with the specified np2m_base */
+void np2m_flush_base(struct vcpu *v, unsigned long np2m_base);
 
 void nestedp2m_write_p2m_entry(struct p2m_domain *p2m, unsigned long gfn,
 l1_pgentry_t *p, l1_pgentry_t new, unsigned int level);
-- 
2.11.0




[Xen-devel] [PATCH v1 04/14] x86/np2m: remove np2m_base from p2m_get_nestedp2m()

2017-09-04 Thread Sergey Dyasli
Remove the np2m_base parameter as it should always match the value of
np2m_base in VMCX12.

Signed-off-by: Sergey Dyasli <sergey.dya...@citrix.com>
---
RFC --> v1:
- Nested SVM: added early update of ns_vmcb_hostcr3

 xen/arch/x86/hvm/svm/nestedsvm.c | 6 +-
 xen/arch/x86/hvm/vmx/vvmx.c  | 3 +--
 xen/arch/x86/mm/hap/nested_hap.c | 2 +-
 xen/arch/x86/mm/p2m.c| 8 
 xen/include/asm-x86/p2m.h| 5 ++---
 5 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/xen/arch/x86/hvm/svm/nestedsvm.c b/xen/arch/x86/hvm/svm/nestedsvm.c
index 8fd9c23a02..629d5ea497 100644
--- a/xen/arch/x86/hvm/svm/nestedsvm.c
+++ b/xen/arch/x86/hvm/svm/nestedsvm.c
@@ -411,7 +411,11 @@ static void nestedsvm_vmcb_set_nestedp2m(struct vcpu *v,
 ASSERT(v != NULL);
 ASSERT(vvmcb != NULL);
 ASSERT(n2vmcb != NULL);
-p2m = p2m_get_nestedp2m(v, vvmcb->_h_cr3);
+
+/* This will allow nsvm_vcpu_hostcr3() to return correct np2m_base */
+vcpu_nestedsvm(v).ns_vmcb_hostcr3 = vvmcb->_h_cr3;
+
+p2m = p2m_get_nestedp2m(v);
 n2vmcb->_h_cr3 = pagetable_get_paddr(p2m_get_pagetable(p2m));
 }
 
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index 3c5f560aec..ea2da14489 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -1109,8 +1109,7 @@ static void load_shadow_guest_state(struct vcpu *v)
 
 uint64_t get_shadow_eptp(struct vcpu *v)
 {
-uint64_t np2m_base = nvmx_vcpu_eptp_base(v);
-struct p2m_domain *p2m = p2m_get_nestedp2m(v, np2m_base);
+struct p2m_domain *p2m = p2m_get_nestedp2m(v);
struct ept_data *ept = &p2m->ept;
 
 ept->mfn = pagetable_get_pfn(p2m_get_pagetable(p2m));
diff --git a/xen/arch/x86/mm/hap/nested_hap.c b/xen/arch/x86/mm/hap/nested_hap.c
index 162afed46b..ed137fa784 100644
--- a/xen/arch/x86/mm/hap/nested_hap.c
+++ b/xen/arch/x86/mm/hap/nested_hap.c
@@ -212,7 +212,7 @@ nestedhvm_hap_nested_page_fault(struct vcpu *v, paddr_t 
*L2_gpa,
 uint8_t p2ma_21 = p2m_access_rwx;
 
 p2m = p2m_get_hostp2m(d); /* L0 p2m */
-nested_p2m = p2m_get_nestedp2m(v, nhvm_vcpu_p2m_base(v));
+nested_p2m = p2m_get_nestedp2m(v);
 
 /* walk the L1 P2M table */
 rv = nestedhap_walk_L1_p2m(v, *L2_gpa, _gpa, _order_21, _21,
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 94a42400ad..b735950349 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1810,11 +1810,12 @@ static void assign_np2m(struct vcpu *v, struct 
p2m_domain *p2m)
 }
 
 struct p2m_domain *
-p2m_get_nestedp2m(struct vcpu *v, uint64_t np2m_base)
+p2m_get_nestedp2m(struct vcpu *v)
 {
struct nestedvcpu *nv = &vcpu_nestedhvm(v);
 struct domain *d = v->domain;
 struct p2m_domain *p2m;
+uint64_t np2m_base = nhvm_vcpu_p2m_base(v);
 
 /* Mask out low bits; this avoids collisions with P2M_BASE_EADDR */
 np2m_base &= ~(0xfffull);
@@ -1862,7 +1863,7 @@ p2m_get_p2m(struct vcpu *v)
 if (!nestedhvm_is_n2(v))
 return p2m_get_hostp2m(v->domain);
 
-return p2m_get_nestedp2m(v, nhvm_vcpu_p2m_base(v));
+return p2m_get_nestedp2m(v);
 }
 
 unsigned long paging_gva_to_gfn(struct vcpu *v,
@@ -1877,13 +1878,12 @@ unsigned long paging_gva_to_gfn(struct vcpu *v,
 unsigned long l2_gfn, l1_gfn;
 struct p2m_domain *p2m;
 const struct paging_mode *mode;
-uint64_t np2m_base = nhvm_vcpu_p2m_base(v);
 uint8_t l1_p2ma;
 unsigned int l1_page_order;
 int rv;
 
 /* translate l2 guest va into l2 guest gfn */
-p2m = p2m_get_nestedp2m(v, np2m_base);
+p2m = p2m_get_nestedp2m(v);
 mode = paging_get_nestedmode(v);
 l2_gfn = mode->gva_to_gfn(v, p2m, va, pfec);
 
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index cfb00591cd..1d17fd5f97 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -360,10 +360,9 @@ struct p2m_domain {
 #define p2m_get_hostp2m(d)  ((d)->arch.p2m)
 
 /*
- * Assigns an np2m with the specified np2m_base to the specified vCPU
- * and returns that np2m.
+ * Updates vCPU's n2pm to match its np2m_base in VMCX12 and returns that np2m.
  */
-struct p2m_domain *p2m_get_nestedp2m(struct vcpu *v, uint64_t np2m_base);
+struct p2m_domain *p2m_get_nestedp2m(struct vcpu *v);
 
 /* If vcpu is in host mode then behaviour matches p2m_get_hostp2m().
  * If vcpu is in guest mode then behaviour matches p2m_get_nestedp2m().
-- 
2.11.0




[Xen-devel] [PATCH v1 09/14] x86/np2m: add p2m_get_nestedp2m_locked()

2017-09-04 Thread Sergey Dyasli
The new function returns a still write-locked np2m.

Signed-off-by: Sergey Dyasli <sergey.dya...@citrix.com>
---
 xen/arch/x86/mm/p2m.c | 12 +---
 xen/include/asm-x86/p2m.h |  2 ++
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index e5d2fed361..15dedef33b 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1819,7 +1819,7 @@ static void nvcpu_flush(struct vcpu *v)
 }
 
 struct p2m_domain *
-p2m_get_nestedp2m(struct vcpu *v)
+p2m_get_nestedp2m_locked(struct vcpu *v)
 {
struct nestedvcpu *nv = &vcpu_nestedhvm(v);
 struct domain *d = v->domain;
@@ -1844,7 +1844,6 @@ p2m_get_nestedp2m(struct vcpu *v)
 nvcpu_flush(v);
 p2m->np2m_base = np2m_base;
 assign_np2m(v, p2m);
-p2m_unlock(p2m);
 nestedp2m_unlock(d);
 
 return p2m;
@@ -1860,12 +1859,19 @@ p2m_get_nestedp2m(struct vcpu *v)
 p2m->np2m_base = np2m_base;
 nvcpu_flush(v);
 assign_np2m(v, p2m);
-p2m_unlock(p2m);
 nestedp2m_unlock(d);
 
 return p2m;
 }
 
+struct p2m_domain *p2m_get_nestedp2m(struct vcpu *v)
+{
+struct p2m_domain *p2m = p2m_get_nestedp2m_locked(v);
+p2m_unlock(p2m);
+
+return p2m;
+}
+
 struct p2m_domain *
 p2m_get_p2m(struct vcpu *v)
 {
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index f873dc4fd9..790635ec0b 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -364,6 +364,8 @@ struct p2m_domain {
  * Updates vCPU's n2pm to match its np2m_base in VMCX12 and returns that np2m.
  */
 struct p2m_domain *p2m_get_nestedp2m(struct vcpu *v);
+/* Similar to the above except that returned p2m is still write-locked */
+struct p2m_domain *p2m_get_nestedp2m_locked(struct vcpu *v);
 
 /* If vcpu is in host mode then behaviour matches p2m_get_hostp2m().
  * If vcpu is in guest mode then behaviour matches p2m_get_nestedp2m().
-- 
2.11.0




[Xen-devel] [PATCH v1 06/14] x86/np2m: add stale_np2m flag

2017-09-04 Thread Sergey Dyasli
The new element will indicate if an update of the shadow p2m_base is needed
prior to vmentry. An update is required if a nested vcpu gets a new np2m
or if its np2m was flushed by an IPI.

Add nvcpu_flush() helper function.

Signed-off-by: Sergey Dyasli <sergey.dya...@citrix.com>
---
 xen/arch/x86/hvm/nestedhvm.c   |  2 ++
 xen/arch/x86/mm/p2m.c  | 10 --
 xen/include/asm-x86/hvm/vcpu.h |  1 +
 3 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/hvm/nestedhvm.c b/xen/arch/x86/hvm/nestedhvm.c
index 32b8acca6a..5b012568c4 100644
--- a/xen/arch/x86/hvm/nestedhvm.c
+++ b/xen/arch/x86/hvm/nestedhvm.c
@@ -57,6 +57,7 @@ nestedhvm_vcpu_reset(struct vcpu *v)
 nv->nv_flushp2m = 0;
 nv->nv_p2m = NULL;
 nv->np2m_generation = 0;
+nv->stale_np2m = false;
 
hvm_asid_flush_vcpu_asid(&nv->nv_n2asid);
 
@@ -108,6 +109,7 @@ nestedhvm_flushtlb_ipi(void *info)
  */
 hvm_asid_flush_core();
 vcpu_nestedhvm(v).nv_p2m = NULL;
+vcpu_nestedhvm(v).stale_np2m = true;
 }
 
 void
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 2999b858e4..053df0c9aa 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1812,6 +1812,12 @@ static void assign_np2m(struct vcpu *v, struct 
p2m_domain *p2m)
 cpumask_set_cpu(v->processor, p2m->dirty_cpumask);
 }
 
+static void nvcpu_flush(struct vcpu *v)
+{
+hvm_asid_flush_vcpu(v);
+vcpu_nestedhvm(v).stale_np2m = true;
+}
+
 struct p2m_domain *
 p2m_get_nestedp2m(struct vcpu *v)
 {
@@ -1835,7 +1841,7 @@ p2m_get_nestedp2m(struct vcpu *v)
 if ( p2m->np2m_base == np2m_base || p2m->np2m_base == P2M_BASE_EADDR )
 {
 if ( p2m->np2m_base == P2M_BASE_EADDR )
-hvm_asid_flush_vcpu(v);
+nvcpu_flush(v);
 p2m->np2m_base = np2m_base;
 assign_np2m(v, p2m);
 p2m_unlock(p2m);
@@ -1852,7 +1858,7 @@ p2m_get_nestedp2m(struct vcpu *v)
 p2m_flush_table(p2m);
 p2m_lock(p2m);
 p2m->np2m_base = np2m_base;
-hvm_asid_flush_vcpu(v);
+nvcpu_flush(v);
 assign_np2m(v, p2m);
 p2m_unlock(p2m);
 nestedp2m_unlock(d);
diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-x86/hvm/vcpu.h
index 91651581db..16af97942f 100644
--- a/xen/include/asm-x86/hvm/vcpu.h
+++ b/xen/include/asm-x86/hvm/vcpu.h
@@ -116,6 +116,7 @@ struct nestedvcpu {
 bool_t nv_flushp2m; /* True, when p2m table must be flushed */
 struct p2m_domain *nv_p2m; /* used p2m table for this vcpu */
 uint64_t np2m_generation;
+bool stale_np2m; /* True when p2m_base in VMCX02 is no longer valid */
 
 struct hvm_vcpu_asid nv_n2asid;
 
-- 
2.11.0




[Xen-devel] [PATCH v1 05/14] x86/np2m: add np2m_generation

2017-09-04 Thread Sergey Dyasli
Add np2m_generation element to both p2m_domain and nestedvcpu.

np2m's generation will be incremented each time the np2m is flushed.
This will allow detecting whether a nested vcpu has a stale np2m.

Signed-off-by: Sergey Dyasli <sergey.dya...@citrix.com>
---
 xen/arch/x86/hvm/nestedhvm.c   | 1 +
 xen/arch/x86/mm/p2m.c  | 3 +++
 xen/include/asm-x86/hvm/vcpu.h | 1 +
 xen/include/asm-x86/p2m.h  | 1 +
 4 files changed, 6 insertions(+)

diff --git a/xen/arch/x86/hvm/nestedhvm.c b/xen/arch/x86/hvm/nestedhvm.c
index f2f7469d86..32b8acca6a 100644
--- a/xen/arch/x86/hvm/nestedhvm.c
+++ b/xen/arch/x86/hvm/nestedhvm.c
@@ -56,6 +56,7 @@ nestedhvm_vcpu_reset(struct vcpu *v)
 nv->nv_vvmcxaddr = INVALID_PADDR;
 nv->nv_flushp2m = 0;
 nv->nv_p2m = NULL;
+nv->np2m_generation = 0;
 
hvm_asid_flush_vcpu_asid(&nv->nv_n2asid);
 
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index b735950349..2999b858e4 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -73,6 +73,7 @@ static int p2m_initialise(struct domain *d, struct p2m_domain 
*p2m)
 p2m->p2m_class = p2m_host;
 
 p2m->np2m_base = P2M_BASE_EADDR;
+p2m->np2m_generation = 0;
 
 for ( i = 0; i < ARRAY_SIZE(p2m->pod.mrp.list); ++i )
 p2m->pod.mrp.list[i] = gfn_x(INVALID_GFN);
@@ -1732,6 +1733,7 @@ p2m_flush_table_locked(struct p2m_domain *p2m)
 
 /* This is no longer a valid nested p2m for any address space */
 p2m->np2m_base = P2M_BASE_EADDR;
+p2m->np2m_generation++;
 
 /* Make sure nobody else is using this p2m table */
 nestedhvm_vmcx_flushtlb(p2m);
@@ -1806,6 +1808,7 @@ static void assign_np2m(struct vcpu *v, struct p2m_domain 
*p2m)
 
 nv->nv_flushp2m = 0;
 nv->nv_p2m = p2m;
+nv->np2m_generation = p2m->np2m_generation;
 cpumask_set_cpu(v->processor, p2m->dirty_cpumask);
 }
 
diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-x86/hvm/vcpu.h
index 6c54773f1c..91651581db 100644
--- a/xen/include/asm-x86/hvm/vcpu.h
+++ b/xen/include/asm-x86/hvm/vcpu.h
@@ -115,6 +115,7 @@ struct nestedvcpu {
 
 bool_t nv_flushp2m; /* True, when p2m table must be flushed */
 struct p2m_domain *nv_p2m; /* used p2m table for this vcpu */
+uint64_t np2m_generation;
 
 struct hvm_vcpu_asid nv_n2asid;
 
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index 1d17fd5f97..1a7002cbcd 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -209,6 +209,7 @@ struct p2m_domain {
  * to set it to any other value. */
 #define P2M_BASE_EADDR (~0ULL)
 uint64_t   np2m_base;
+uint64_t   np2m_generation;
 
 /* Nested p2ms: linked list of n2pms allocated to this domain. 
  * The host p2m hasolds the head of the list and the np2ms are 
-- 
2.11.0




[Xen-devel] [PATCH v1 07/14] x86/vvmx: restart nested vmentry in case of stale_np2m

2017-09-04 Thread Sergey Dyasli
If an IPI flushes a vCPU's np2m object just before a nested vmentry, there
will be a stale shadow EPTP value in VMCS02. Allow the vmentry to be
restarted in such cases and add nvmx_eptp_update() to perform an update.

Signed-off-by: Sergey Dyasli <sergey.dya...@citrix.com>
---
 xen/arch/x86/hvm/vmx/entry.S |  6 ++
 xen/arch/x86/hvm/vmx/vmx.c   |  8 +++-
 xen/arch/x86/hvm/vmx/vvmx.c  | 14 ++
 3 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/hvm/vmx/entry.S b/xen/arch/x86/hvm/vmx/entry.S
index 53eedc6363..9fb8f89220 100644
--- a/xen/arch/x86/hvm/vmx/entry.S
+++ b/xen/arch/x86/hvm/vmx/entry.S
@@ -79,6 +79,8 @@ UNLIKELY_END(realmode)
 
 mov  %rsp,%rdi
 call vmx_vmenter_helper
+cmp  $0,%eax
+jne .Lvmx_vmentry_restart
 mov  VCPU_hvm_guest_cr2(%rbx),%rax
 
 pop  %r15
@@ -117,6 +119,10 @@ ENTRY(vmx_asm_do_vmentry)
 GET_CURRENT(bx)
 jmp  .Lvmx_do_vmentry
 
+.Lvmx_vmentry_restart:
+sti
+jmp  .Lvmx_do_vmentry
+
 .Lvmx_goto_emulator:
 sti
 mov  %rsp,%rdi
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index f6da119c9f..06509590b7 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -4223,13 +4223,17 @@ static void lbr_fixup(void)
 bdw_erratum_bdf14_fixup();
 }
 
-void vmx_vmenter_helper(const struct cpu_user_regs *regs)
+int vmx_vmenter_helper(const struct cpu_user_regs *regs)
 {
 struct vcpu *curr = current;
 u32 new_asid, old_asid;
 struct hvm_vcpu_asid *p_asid;
 bool_t need_flush;
 
+/* Shadow EPTP can't be updated here because irqs are disabled */
+ if ( nestedhvm_vcpu_in_guestmode(curr) && vcpu_nestedhvm(curr).stale_np2m 
)
+ return 1;
+
 if ( curr->domain->arch.hvm_domain.pi_ops.do_resume )
 curr->domain->arch.hvm_domain.pi_ops.do_resume(curr);
 
@@ -4290,6 +4294,8 @@ void vmx_vmenter_helper(const struct cpu_user_regs *regs)
 __vmwrite(GUEST_RIP,regs->rip);
 __vmwrite(GUEST_RSP,regs->rsp);
 __vmwrite(GUEST_RFLAGS, regs->rflags | X86_EFLAGS_MBS);
+
+return 0;
 }
 
 /*
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index ea2da14489..26ce349c76 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -1405,12 +1405,26 @@ static void virtual_vmexit(struct cpu_user_regs *regs)
 vmsucceed(regs);
 }
 
+static void nvmx_eptp_update(void)
+{
+if ( !nestedhvm_vcpu_in_guestmode(current) ||
+  vcpu_nestedhvm(current).nv_vmexit_pending ||
+ !vcpu_nestedhvm(current).stale_np2m ||
+ !nestedhvm_paging_mode_hap(current) )
+return;
+
+__vmwrite(EPT_POINTER, get_shadow_eptp(current));
+vcpu_nestedhvm(current).stale_np2m = false;
+}
+
 void nvmx_switch_guest(void)
 {
 struct vcpu *v = current;
struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v);
 struct cpu_user_regs *regs = guest_cpu_user_regs();
 
+nvmx_eptp_update();
+
 /*
  * A pending IO emulation may still be not finished. In this case, no
  * virtual vmswitch is allowed. Or else, the following IO emulation will
-- 
2.11.0




[Xen-devel] [PATCH v1 01/14] x86/np2m: refactor p2m_get_nestedp2m()

2017-09-04 Thread Sergey Dyasli
1. Add a helper function assign_np2m()
2. Remove useless volatile
3. Update function's comment in the header
4. Minor style fixes ('\n' and d)

Signed-off-by: Sergey Dyasli <sergey.dya...@citrix.com>
---
 xen/arch/x86/mm/p2m.c | 31 ++-
 xen/include/asm-x86/p2m.h |  6 +++---
 2 files changed, 21 insertions(+), 16 deletions(-)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index e8a57d118c..b8c8bba421 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1773,14 +1773,24 @@ p2m_flush_nestedp2m(struct domain *d)
 p2m_flush_table(d->arch.nested_p2m[i]);
 }
 
+static void assign_np2m(struct vcpu *v, struct p2m_domain *p2m)
+{
+struct nestedvcpu *nv = &vcpu_nestedhvm(v);
+struct domain *d = v->domain;
+
+/* Bring this np2m to the top of the LRU list */
+p2m_getlru_nestedp2m(d, p2m);
+
+nv->nv_flushp2m = 0;
+nv->nv_p2m = p2m;
+cpumask_set_cpu(v->processor, p2m->dirty_cpumask);
+}
+
 struct p2m_domain *
 p2m_get_nestedp2m(struct vcpu *v, uint64_t np2m_base)
 {
-/* Use volatile to prevent gcc to cache nv->nv_p2m in a cpu register as
- * this may change within the loop by an other (v)cpu.
- */
-volatile struct nestedvcpu *nv = &vcpu_nestedhvm(v);
-struct domain *d;
+struct nestedvcpu *nv = &vcpu_nestedhvm(v);
+struct domain *d = v->domain;
 struct p2m_domain *p2m;
 
 /* Mask out low bits; this avoids collisions with P2M_BASE_EADDR */
@@ -1790,7 +1800,6 @@ p2m_get_nestedp2m(struct vcpu *v, uint64_t np2m_base)
 nv->nv_p2m = NULL;
 }
 
-d = v->domain;
 nestedp2m_lock(d);
 p2m = nv->nv_p2m;
 if ( p2m ) 
@@ -1798,15 +1807,13 @@ p2m_get_nestedp2m(struct vcpu *v, uint64_t np2m_base)
 p2m_lock(p2m);
 if ( p2m->np2m_base == np2m_base || p2m->np2m_base == P2M_BASE_EADDR )
 {
-nv->nv_flushp2m = 0;
-p2m_getlru_nestedp2m(d, p2m);
-nv->nv_p2m = p2m;
 if ( p2m->np2m_base == P2M_BASE_EADDR )
 hvm_asid_flush_vcpu(v);
 p2m->np2m_base = np2m_base;
-cpumask_set_cpu(v->processor, p2m->dirty_cpumask);
+assign_np2m(v, p2m);
 p2m_unlock(p2m);
 nestedp2m_unlock(d);
+
 return p2m;
 }
 p2m_unlock(p2m);
@@ -1817,11 +1824,9 @@ p2m_get_nestedp2m(struct vcpu *v, uint64_t np2m_base)
 p2m = p2m_getlru_nestedp2m(d, NULL);
 p2m_flush_table(p2m);
 p2m_lock(p2m);
-nv->nv_p2m = p2m;
 p2m->np2m_base = np2m_base;
-nv->nv_flushp2m = 0;
 hvm_asid_flush_vcpu(v);
-cpumask_set_cpu(v->processor, p2m->dirty_cpumask);
+assign_np2m(v, p2m);
 p2m_unlock(p2m);
 nestedp2m_unlock(d);
 
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index 6395e8fd1d..9086bb35dc 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -359,9 +359,9 @@ struct p2m_domain {
 /* get host p2m table */
 #define p2m_get_hostp2m(d)  ((d)->arch.p2m)
 
-/* Get p2m table (re)usable for specified np2m base.
- * Automatically destroys and re-initializes a p2m if none found.
- * If np2m_base == 0 then v->arch.hvm_vcpu.guest_cr[3] is used.
+/*
+ * Assigns an np2m with the specified np2m_base to the specified vCPU
+ * and returns that np2m.
  */
 struct p2m_domain *p2m_get_nestedp2m(struct vcpu *v, uint64_t np2m_base);
 
-- 
2.11.0




[Xen-devel] [PATCH v1 08/14] x86/np2m: add np2m_schedule()

2017-09-04 Thread Sergey Dyasli
np2m maintenance is required for a nested vcpu during scheduling:

1. On schedule-out: clear pCPU's bit in p2m->dirty_cpumask
to prevent useless IPIs.

2. On schedule-in: check if np2m is up to date and wasn't flushed.

Signed-off-by: Sergey Dyasli <sergey.dya...@citrix.com>
---
RFC --> v1:
- np2m_schedule() now accepts NP2M_SCHEDLE_IN/OUT

 xen/arch/x86/mm/p2m.c | 43 +++
 xen/include/asm-x86/p2m.h |  5 +
 2 files changed, 48 insertions(+)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 053df0c9aa..e5d2fed361 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1875,6 +1875,49 @@ p2m_get_p2m(struct vcpu *v)
 return p2m_get_nestedp2m(v);
 }
 
+void np2m_schedule(int dir)
+{
+struct nestedvcpu *nv = &vcpu_nestedhvm(current);
+struct p2m_domain *p2m;
+
+ASSERT(dir == NP2M_SCHEDLE_IN || dir == NP2M_SCHEDLE_OUT);
+
+if ( !nestedhvm_enabled(current->domain) ||
+ !nestedhvm_vcpu_in_guestmode(current) ||
+ !nestedhvm_paging_mode_hap(current) )
+return;
+
+p2m = nv->nv_p2m;
+if ( p2m )
+{
+bool np2m_valid;
+
+p2m_lock(p2m);
+np2m_valid = p2m->np2m_base == nhvm_vcpu_p2m_base(current) &&
+ nv->np2m_generation == p2m->np2m_generation;
+if ( dir == NP2M_SCHEDLE_OUT && np2m_valid )
+{
+/*
+ * The np2m is up to date but this vCPU will no longer use it,
+ * which means there are no reasons to send a flush IPI.
+ */
+cpumask_clear_cpu(current->processor, p2m->dirty_cpumask);
+}
+else if ( dir == NP2M_SCHEDLE_IN )
+{
+if ( !np2m_valid )
+{
+/* This vCPU's np2m was flushed while it was not runnable */
+hvm_asid_flush_core();
+vcpu_nestedhvm(current).nv_p2m = NULL;
+}
+else
+cpumask_set_cpu(current->processor, p2m->dirty_cpumask);
+}
+p2m_unlock(p2m);
+}
+}
+
 unsigned long paging_gva_to_gfn(struct vcpu *v,
 unsigned long va,
 uint32_t *pfec)
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index 1a7002cbcd..f873dc4fd9 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -370,6 +370,11 @@ struct p2m_domain *p2m_get_nestedp2m(struct vcpu *v);
  */
 struct p2m_domain *p2m_get_p2m(struct vcpu *v);
 
+#define NP2M_SCHEDLE_IN  0
+#define NP2M_SCHEDLE_OUT 1
+
+void np2m_schedule(int dir);
+
 static inline bool_t p2m_is_hostp2m(const struct p2m_domain *p2m)
 {
 return p2m->p2m_class == p2m_host;
-- 
2.11.0




[Xen-devel] [PATCH v1 1/5] x86/msr: introduce struct msr_domain_policy

2017-08-30 Thread Sergey Dyasli
The new structure contains information about the guest's MSRs that are
shared between all of a domain's vCPUs. It starts with only 1 MSR:

MSR_INTEL_PLATFORM_INFO

which currently has only 1 usable bit: cpuid_faulting.

Add 2 global policy objects, hvm_max and pv_max, that are initialised during
boot. It's always possible to emulate CPUID faulting for HVM guests,
while for PV guests the H/W support is required.

Add init_domain_msr_policy() which sets initial MSR policy during
domain creation with a special case for Dom0.

Signed-off-by: Sergey Dyasli <sergey.dya...@citrix.com>
---
 xen/arch/x86/Makefile|  1 +
 xen/arch/x86/domain.c|  6 +++
 xen/arch/x86/msr.c   | 95 
 xen/arch/x86/setup.c |  1 +
 xen/include/asm-x86/domain.h |  3 +-
 xen/include/asm-x86/msr.h| 13 ++
 6 files changed, 118 insertions(+), 1 deletion(-)
 create mode 100644 xen/arch/x86/msr.c

diff --git a/xen/arch/x86/Makefile b/xen/arch/x86/Makefile
index 93ead6e5dd..d5d58a205e 100644
--- a/xen/arch/x86/Makefile
+++ b/xen/arch/x86/Makefile
@@ -35,6 +35,7 @@ obj-y += i8259.o
 obj-y += io_apic.o
 obj-$(CONFIG_LIVEPATCH) += alternative.o livepatch.o
 obj-y += msi.o
+obj-y += msr.o
 obj-y += ioport_emulate.o
 obj-y += irq.o
 obj-$(CONFIG_KEXEC) += machine_kexec.o
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index baaf8151d2..620666b33a 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -425,6 +425,7 @@ int arch_domain_create(struct domain *d, unsigned int 
domcr_flags,
 {
 d->arch.emulation_flags = 0;
 d->arch.cpuid = ZERO_BLOCK_PTR; /* Catch stray misuses. */
+d->arch.msr = ZERO_BLOCK_PTR;
 }
 else
 {
@@ -470,6 +471,9 @@ int arch_domain_create(struct domain *d, unsigned int 
domcr_flags,
 if ( (rc = init_domain_cpuid_policy(d)) )
 goto fail;
 
+if ( (rc = init_domain_msr_policy(d)) )
+goto fail;
+
 d->arch.ioport_caps = 
 rangeset_new(d, "I/O Ports", RANGESETF_prettyprint_hex);
 rc = -ENOMEM;
@@ -540,6 +544,7 @@ int arch_domain_create(struct domain *d, unsigned int 
domcr_flags,
 cleanup_domain_irq_mapping(d);
 free_xenheap_page(d->shared_info);
 xfree(d->arch.cpuid);
+xfree(d->arch.msr);
 if ( paging_initialised )
 paging_final_teardown(d);
 free_perdomain_mappings(d);
@@ -554,6 +559,7 @@ void arch_domain_destroy(struct domain *d)
 
 xfree(d->arch.e820);
 xfree(d->arch.cpuid);
+xfree(d->arch.msr);
 
 free_domain_pirqs(d);
 if ( !is_idle_domain(d) )
diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
new file mode 100644
index 00..eac50ec897
--- /dev/null
+++ b/xen/arch/x86/msr.c
@@ -0,0 +1,95 @@
+/**
+ * arch/x86/msr.c
+ *
+ * Policy objects for Model-Specific Registers.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; If not, see <http://www.gnu.org/licenses/>.
+ *
+ * Copyright (c) 2017 Citrix Systems Ltd.
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+struct msr_domain_policy __read_mostly hvm_max_msr_domain_policy,
+ __read_mostly  pv_max_msr_domain_policy;
+
+static void __init calculate_hvm_max_policy(void)
+{
+struct msr_domain_policy *dp = &hvm_max_msr_domain_policy;
+
+if ( !hvm_enabled )
+return;
+
+/* 0x00ce  MSR_INTEL_PLATFORM_INFO */
+if ( boot_cpu_data.x86_vendor == X86_VENDOR_INTEL )
+{
+dp->plaform_info.available = true;
+dp->plaform_info.cpuid_faulting = true;
+}
+}
+
+static void __init calculate_pv_max_policy(void)
+{
+struct msr_domain_policy *dp = &pv_max_msr_domain_policy;
+
+/* 0x00ce  MSR_INTEL_PLATFORM_INFO */
+if ( cpu_has_cpuid_faulting )
+{
+dp->plaform_info.available = true;
+dp->plaform_info.cpuid_faulting = true;
+}
+}
+
+void __init init_guest_msr_policy(void)
+{
+calculate_hvm_max_policy();
+calculate_pv_max_policy();
+}
+
+int init_domain_msr_policy(struct domain *d)
+{
+struct msr_domain_policy *dp;
+
+dp = xmalloc(struct msr_domain_policy);
+
+if ( !dp )
+return -ENOMEM;
+
+*dp = is_pv_domain(d) ? pv_max_msr_domain_policy :
+hvm_max_msr_domain_policy;
+
+/* See comment

[Xen-devel] [PATCH v1 3/5] x86: replace arch_vcpu::cpuid_faulting with msr_vcpu_policy

2017-08-30 Thread Sergey Dyasli
Since each vCPU now has struct msr_vcpu_policy, use cpuid_faulting bit
from there in current logic and remove arch_vcpu::cpuid_faulting.

Signed-off-by: Sergey Dyasli <sergey.dya...@citrix.com>
---
 xen/arch/x86/cpu/intel.c   |  3 ++-
 xen/arch/x86/hvm/hvm.c |  4 +++-
 xen/arch/x86/hvm/vmx/vmx.c | 10 ++
 xen/arch/x86/pv/emul-inv-op.c  |  4 +++-
 xen/arch/x86/pv/emul-priv-op.c |  5 +++--
 xen/include/asm-x86/domain.h   |  3 ---
 6 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/xen/arch/x86/cpu/intel.c b/xen/arch/x86/cpu/intel.c
index 2e20327569..487eb06148 100644
--- a/xen/arch/x86/cpu/intel.c
+++ b/xen/arch/x86/cpu/intel.c
@@ -156,6 +156,7 @@ static void intel_ctxt_switch_levelling(const struct vcpu 
*next)
	struct cpuidmasks *these_masks = &this_cpu(cpuidmasks);
const struct domain *nextd = next ? next->domain : NULL;
const struct cpuidmasks *masks;
+   const struct msr_vcpu_policy *vp = next->arch.msr;
 
if (cpu_has_cpuid_faulting) {
/*
@@ -176,7 +177,7 @@ static void intel_ctxt_switch_levelling(const struct vcpu 
*next)
 */
set_cpuid_faulting(nextd && !is_control_domain(nextd) &&
   (is_pv_domain(nextd) ||
-   next->arch.cpuid_faulting));
+   vp->misc_features_enables.cpuid_faulting));
return;
}
 
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 6cb903def5..2ad07d52bc 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -3286,7 +3286,9 @@ unsigned long copy_from_user_hvm(void *to, const void 
*from, unsigned len)
 
 bool hvm_check_cpuid_faulting(struct vcpu *v)
 {
-if ( !v->arch.cpuid_faulting )
+const struct msr_vcpu_policy *vp = v->arch.msr;
+
+if ( !vp->misc_features_enables.cpuid_faulting )
 return false;
 
 return hvm_get_cpl(v) > 0;
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 67fc85b201..155fba9017 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2902,7 +2902,7 @@ static int vmx_msr_read_intercept(unsigned int msr, 
uint64_t *msr_content)
 
 case MSR_INTEL_MISC_FEATURES_ENABLES:
 *msr_content = 0;
-if ( current->arch.cpuid_faulting )
+if ( current->arch.msr->misc_features_enables.cpuid_faulting )
 *msr_content |= MSR_MISC_FEATURES_CPUID_FAULTING;
 break;
 
@@ -3134,15 +3134,17 @@ static int vmx_msr_write_intercept(unsigned int msr, 
uint64_t msr_content)
 
 case MSR_INTEL_MISC_FEATURES_ENABLES:
 {
-bool old_cpuid_faulting = v->arch.cpuid_faulting;
+struct msr_vcpu_policy *vp = v->arch.msr;
+bool old_cpuid_faulting = vp->misc_features_enables.cpuid_faulting;
 
 if ( msr_content & ~MSR_MISC_FEATURES_CPUID_FAULTING )
 goto gp_fault;
 
-v->arch.cpuid_faulting = msr_content & 
MSR_MISC_FEATURES_CPUID_FAULTING;
+vp->misc_features_enables.cpuid_faulting =
+msr_content & MSR_MISC_FEATURES_CPUID_FAULTING;
 
 if ( cpu_has_cpuid_faulting &&
- (old_cpuid_faulting ^ v->arch.cpuid_faulting) )
+ (old_cpuid_faulting ^ vp->misc_features_enables.cpuid_faulting) )
 ctxt_switch_levelling(v);
 break;
 }
diff --git a/xen/arch/x86/pv/emul-inv-op.c b/xen/arch/x86/pv/emul-inv-op.c
index 415d294c53..f8944170d5 100644
--- a/xen/arch/x86/pv/emul-inv-op.c
+++ b/xen/arch/x86/pv/emul-inv-op.c
@@ -66,6 +66,7 @@ static int emulate_forced_invalid_op(struct cpu_user_regs 
*regs)
 char sig[5], instr[2];
 unsigned long eip, rc;
 struct cpuid_leaf res;
+const struct msr_vcpu_policy *vp = current->arch.msr;
 
 eip = regs->rip;
 
@@ -89,7 +90,8 @@ static int emulate_forced_invalid_op(struct cpu_user_regs 
*regs)
 return 0;
 
 /* If cpuid faulting is enabled and CPL>0 inject a #GP in place of #UD. */
-if ( current->arch.cpuid_faulting && !guest_kernel_mode(current, regs) )
+if ( vp->misc_features_enables.cpuid_faulting &&
+ !guest_kernel_mode(current, regs) )
 {
 regs->rip = eip;
 pv_inject_hw_exception(TRAP_gp_fault, regs->error_code);
diff --git a/xen/arch/x86/pv/emul-priv-op.c b/xen/arch/x86/pv/emul-priv-op.c
index d50f51944f..66cda538fc 100644
--- a/xen/arch/x86/pv/emul-priv-op.c
+++ b/xen/arch/x86/pv/emul-priv-op.c
@@ -948,7 +948,7 @@ static int priv_op_read_msr(unsigned int reg, uint64_t *val,
  rdmsr_safe(MSR_INTEL_MISC_FEATURES_ENABLES, *val) )
 break;
 *val = 0;
-if ( curr->arch.cpuid_faulting )
+if ( curr->arch.msr->misc_features_enables.cpuid_faulting )
 *val |= MSR_MISC_FEATURES_CPUID_FAULTING;
 return X86EMUL_OKA

[Xen-devel] [PATCH v1 5/5] x86/msr: introduce guest_wrmsr()

2017-08-30 Thread Sergey Dyasli
The new function is responsible for handling WRMSR from both HVM and PV
guests. Currently it handles only 2 MSRs:

MSR_INTEL_PLATFORM_INFO
MSR_INTEL_MISC_FEATURES_ENABLES

It has a different behaviour compared to the old MSR handlers: if an
MSR is handled by guest_wrmsr() then WRMSR will either succeed (if
a guest is allowed to access it and provided a correct value based on
its MSR policy) or produce a #GP fault. A guest will never see
a successful WRMSR of an MSR unknown to this function.

guest_wrmsr() unifies and replaces the handling code from
vmx_msr_write_intercept() and priv_op_write_msr().
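
The calling convention can be summarised with a short sketch (this is an
illustration only -- example_wrmsr_path() and legacy_wrmsr_handler() are
hypothetical names, mirroring the hvm_msr_write_intercept() hunk below):

int example_wrmsr_path(struct vcpu *v, uint32_t msr, uint64_t val)
{
    int ret = guest_wrmsr(v, msr, val);

    /* X86EMUL_OKAY or X86EMUL_EXCEPTION: the MSR belongs to the new
     * unified handler and has been fully dealt with. */
    if ( ret != X86EMUL_UNHANDLEABLE )
        return ret;

    /* Everything else still goes through the existing HVM/PV code. */
    return legacy_wrmsr_handler(v, msr, val);
}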

Signed-off-by: Sergey Dyasli <sergey.dya...@citrix.com>
---
 xen/arch/x86/hvm/hvm.c |  7 ++-
 xen/arch/x86/hvm/vmx/vmx.c | 23 --
 xen/arch/x86/msr.c | 44 ++
 xen/arch/x86/pv/emul-priv-op.c | 22 -
 xen/include/asm-x86/msr.h  |  1 +
 5 files changed, 55 insertions(+), 42 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index ec7205ee32..524f9a37c0 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -3465,7 +3465,7 @@ int hvm_msr_write_intercept(unsigned int msr, uint64_t 
msr_content,
 {
 struct vcpu *v = current;
 struct domain *d = v->domain;
-int ret = X86EMUL_OKAY;
+int ret;
 
 HVMTRACE_3D(MSR_WRITE, msr,
(uint32_t)msr_content, (uint32_t)(msr_content >> 32));
@@ -3483,6 +3483,11 @@ int hvm_msr_write_intercept(unsigned int msr, uint64_t 
msr_content,
 return X86EMUL_OKAY;
 }
 
+if ( (ret = guest_wrmsr(v, msr, msr_content)) != X86EMUL_UNHANDLEABLE )
+return ret;
+else
+ret = X86EMUL_OKAY;
+
 switch ( msr )
 {
 unsigned int index;
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index ac34383658..cea2a1ae55 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -3116,29 +3116,6 @@ static int vmx_msr_write_intercept(unsigned int msr, 
uint64_t msr_content)
 goto gp_fault;
 break;
 
-case MSR_INTEL_PLATFORM_INFO:
-if ( msr_content ||
- rdmsr_safe(MSR_INTEL_PLATFORM_INFO, msr_content) )
-goto gp_fault;
-break;
-
-case MSR_INTEL_MISC_FEATURES_ENABLES:
-{
-struct msr_vcpu_policy *vp = v->arch.msr;
-bool old_cpuid_faulting = vp->misc_features_enables.cpuid_faulting;
-
-if ( msr_content & ~MSR_MISC_FEATURES_CPUID_FAULTING )
-goto gp_fault;
-
-vp->misc_features_enables.cpuid_faulting =
-msr_content & MSR_MISC_FEATURES_CPUID_FAULTING;
-
-if ( cpu_has_cpuid_faulting &&
- (old_cpuid_faulting ^ vp->misc_features_enables.cpuid_faulting) )
-ctxt_switch_levelling(v);
-break;
-}
-
 default:
 if ( passive_domain_do_wrmsr(msr, msr_content) )
 return X86EMUL_OKAY;
diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
index a822a132ad..9202a4a476 100644
--- a/xen/arch/x86/msr.c
+++ b/xen/arch/x86/msr.c
@@ -148,6 +148,50 @@ int guest_rdmsr(const struct vcpu *v, uint32_t msr, 
uint64_t *val)
 return X86EMUL_EXCEPTION;
 }
 
+int guest_wrmsr(struct vcpu *v, uint32_t msr, uint64_t val)
+{
+struct domain *d = v->domain;
+struct msr_domain_policy *dp = d->arch.msr;
+struct msr_vcpu_policy *vp = v->arch.msr;
+
+switch ( msr )
+{
+case MSR_INTEL_PLATFORM_INFO:
+goto gp_fault;
+
+case MSR_INTEL_MISC_FEATURES_ENABLES:
+{
+uint64_t rsvd = ~0ull;
+bool old_cpuid_faulting = vp->misc_features_enables.cpuid_faulting;
+
+if ( !vp->misc_features_enables.available )
+goto gp_fault;
+
+if ( dp->plaform_info.cpuid_faulting )
+rsvd &= ~MSR_MISC_FEATURES_CPUID_FAULTING;
+
+if ( val & rsvd )
+goto gp_fault;
+
+vp->misc_features_enables.cpuid_faulting =
+val & MSR_MISC_FEATURES_CPUID_FAULTING;
+
+if ( is_hvm_domain(d) && cpu_has_cpuid_faulting &&
+ (old_cpuid_faulting ^ vp->misc_features_enables.cpuid_faulting) )
+ctxt_switch_levelling(v);
+break;
+}
+
+default:
+return X86EMUL_UNHANDLEABLE;
+}
+
+return X86EMUL_OKAY;
+
+ gp_fault:
+return X86EMUL_EXCEPTION;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/pv/emul-priv-op.c b/xen/arch/x86/pv/emul-priv-op.c
index d563214fc4..d32af7d45d 100644
--- a/xen/arch/x86/pv/emul-priv-op.c
+++ b/xen/arch/x86/pv/emul-priv-op.c
@@ -983,6 +983,10 @@ static int priv_op_write_msr(unsigned int reg, uint64_t 
val,
 struct vcpu *curr = current;
 const struct domain *currd = curr->domain;
 bool vpmu_msr = false;
+int ret;
+
+if ( (ret = guest_wrmsr(curr, reg, val)) != X86EMUL_UNHANDLEABLE )
+return 

[Xen-devel] [PATCH v1 2/5] x86/msr: introduce struct msr_vcpu_policy

2017-08-30 Thread Sergey Dyasli
The new structure contains information about guest MSRs that are
unique to each vCPU. It starts with only 1 MSR:

MSR_INTEL_MISC_FEATURES_ENABLES

Which currently has only 1 usable bit: cpuid_faulting.

Add 2 global policy objects: hvm_max and pv_max that are inited during
boot up. Availability of MSR_INTEL_MISC_FEATURES_ENABLES depends on
availability of MSR_INTEL_PLATFORM_INFO.

Add init_vcpu_msr_policy() which sets initial MSR policy for every vCPU
during domain creation with a special case for Dom0.

Signed-off-by: Sergey Dyasli <sergey.dya...@citrix.com>
---
 xen/arch/x86/domain.c| 18 --
 xen/arch/x86/msr.c   | 33 +
 xen/include/asm-x86/domain.h |  2 ++
 xen/include/asm-x86/msr.h| 11 +++
 4 files changed, 62 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 620666b33a..1667d2ad57 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -344,13 +344,27 @@ int vcpu_initialise(struct vcpu *v)
 /* Idle domain */
 v->arch.cr3 = __pa(idle_pg_table);
 rc = 0;
+v->arch.msr = ZERO_BLOCK_PTR; /* Catch stray misuses */
 }
 
 if ( rc )
-vcpu_destroy_fpu(v);
-else if ( !is_idle_domain(v->domain) )
+goto fail;
+
+if ( !is_idle_domain(v->domain) )
+{
 vpmu_initialise(v);
 
+if ( (rc = init_vcpu_msr_policy(v)) )
+goto fail;
+}
+
+return rc;
+
+ fail:
+vcpu_destroy_fpu(v);
+xfree(v->arch.msr);
+v->arch.msr = NULL;
+
 return rc;
 }
 
diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
index eac50ec897..b5ad97d3c8 100644
--- a/xen/arch/x86/msr.c
+++ b/xen/arch/x86/msr.c
@@ -27,9 +27,13 @@
 struct msr_domain_policy __read_mostly hvm_max_msr_domain_policy,
  __read_mostly  pv_max_msr_domain_policy;
 
+struct msr_vcpu_policy __read_mostly hvm_max_msr_vcpu_policy,
+   __read_mostly  pv_max_msr_vcpu_policy;
+
 static void __init calculate_hvm_max_policy(void)
 {
 struct msr_domain_policy *dp = &hvm_max_msr_domain_policy;
+struct msr_vcpu_policy *vp = &hvm_max_msr_vcpu_policy;
 
 if ( !hvm_enabled )
 return;
@@ -40,11 +44,15 @@ static void __init calculate_hvm_max_policy(void)
 dp->plaform_info.available = true;
 dp->plaform_info.cpuid_faulting = true;
 }
+
+/* 0x0140  MSR_INTEL_MISC_FEATURES_ENABLES */
+vp->misc_features_enables.available = dp->plaform_info.available;
 }
 
 static void __init calculate_pv_max_policy(void)
 {
 struct msr_domain_policy *dp = &pv_max_msr_domain_policy;
+struct msr_vcpu_policy *vp = &pv_max_msr_vcpu_policy;
 
 /* 0x00ce  MSR_INTEL_PLATFORM_INFO */
 if ( cpu_has_cpuid_faulting )
@@ -52,6 +60,9 @@ static void __init calculate_pv_max_policy(void)
 dp->plaform_info.available = true;
 dp->plaform_info.cpuid_faulting = true;
 }
+
+/* 0x0140  MSR_INTEL_MISC_FEATURES_ENABLES */
+vp->misc_features_enables.available = dp->plaform_info.available;
 }
 
 void __init init_guest_msr_policy(void)
@@ -84,6 +95,28 @@ int init_domain_msr_policy(struct domain *d)
 return 0;
 }
 
+int init_vcpu_msr_policy(struct vcpu *v)
+{
+struct domain *d = v->domain;
+struct msr_vcpu_policy *vp;
+
+vp = xmalloc(struct msr_vcpu_policy);
+
+if ( !vp )
+return -ENOMEM;
+
+*vp = is_pv_domain(d) ? pv_max_msr_vcpu_policy :
+hvm_max_msr_vcpu_policy;
+
+/* See comment in intel_ctxt_switch_levelling() */
+if ( is_control_domain(d) )
+vp->misc_features_enables.available = false;
+
+v->arch.msr = vp;
+
+return 0;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index f08ede3a05..866a03b508 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -575,6 +575,8 @@ struct arch_vcpu
 
 struct arch_vm_event *vm_event;
 
+struct msr_vcpu_policy *msr;
+
 struct {
 bool next_interrupt_enabled;
 } monitor;
diff --git a/xen/include/asm-x86/msr.h b/xen/include/asm-x86/msr.h
index 5cf7be1821..7c8395b9b3 100644
--- a/xen/include/asm-x86/msr.h
+++ b/xen/include/asm-x86/msr.h
@@ -212,8 +212,19 @@ struct msr_domain_policy
 } plaform_info;
 };
 
+/* MSR policy object for per-vCPU MSRs */
+struct msr_vcpu_policy
+{
+/* 0x0140  MSR_INTEL_MISC_FEATURES_ENABLES */
+struct {
+bool available; /* This MSR is non-architectural */
+bool cpuid_faulting;
+} misc_features_enables;
+};
+
 void init_guest_msr_policy(void);
 int init_domain_msr_policy(struct domain *d);
+int init_vcpu_msr_policy(struct vcpu *v);
 
 #endif /* !__ASSEMBLY__ */
 
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v1 4/5] x86/msr: introduce guest_rdmsr()

2017-08-30 Thread Sergey Dyasli
The new function is responsible for handling RDMSR from both HVM and PV
guests. Currently it handles only 2 MSRs:

MSR_INTEL_PLATFORM_INFO
MSR_INTEL_MISC_FEATURES_ENABLES

It has a different behaviour compared to the old MSR handlers: if an
MSR is handled by guest_rdmsr() then RDMSR will either succeed (if
a guest is allowed to access it based on its MSR policy) or produce
a #GP fault. A guest will never see the H/W value of an MSR unknown
to this function.

guest_rdmsr() unifies and replaces the handling code from
vmx_msr_read_intercept() and priv_op_read_msr().
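
In other words, the value a guest reads is constructed purely from its
policy objects rather than read from hardware. A rough, hypothetical
helper to illustrate (read_platform_info() is not part of the patch; the
real logic is in the guest_rdmsr() hunk below):

static uint64_t read_platform_info(const struct msr_domain_policy *dp)
{
    uint64_t val = 0;

    /* The only bit ever reported is CPUID faulting, taken from policy. */
    if ( dp->plaform_info.cpuid_faulting )
        val |= MSR_PLATFORM_INFO_CPUID_FAULTING;

    return val;
}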

Signed-off-by: Sergey Dyasli <sergey.dya...@citrix.com>
---
 xen/arch/x86/hvm/hvm.c |  7 ++-
 xen/arch/x86/hvm/vmx/vmx.c | 10 --
 xen/arch/x86/msr.c | 31 +++
 xen/arch/x86/pv/emul-priv-op.c | 22 --
 xen/include/asm-x86/msr.h  |  8 
 5 files changed, 49 insertions(+), 29 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 2ad07d52bc..ec7205ee32 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -3334,11 +3334,16 @@ int hvm_msr_read_intercept(unsigned int msr, uint64_t 
*msr_content)
 struct vcpu *v = current;
 struct domain *d = v->domain;
 uint64_t *var_range_base, *fixed_range_base;
-int ret = X86EMUL_OKAY;
+int ret;
 
 var_range_base = (uint64_t *)v->arch.hvm_vcpu.mtrr.var_ranges;
 fixed_range_base = (uint64_t *)v->arch.hvm_vcpu.mtrr.fixed_ranges;
 
+if ( (ret = guest_rdmsr(v, msr, msr_content)) != X86EMUL_UNHANDLEABLE )
+return ret;
+else
+ret = X86EMUL_OKAY;
+
 switch ( msr )
 {
 unsigned int index;
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 155fba9017..ac34383658 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2896,16 +2896,6 @@ static int vmx_msr_read_intercept(unsigned int msr, 
uint64_t *msr_content)
 goto gp_fault;
 break;
 
-case MSR_INTEL_PLATFORM_INFO:
-*msr_content = MSR_PLATFORM_INFO_CPUID_FAULTING;
-break;
-
-case MSR_INTEL_MISC_FEATURES_ENABLES:
-*msr_content = 0;
-if ( current->arch.msr->misc_features_enables.cpuid_faulting )
-*msr_content |= MSR_MISC_FEATURES_CPUID_FAULTING;
-break;
-
 default:
 if ( passive_domain_do_rdmsr(msr, msr_content) )
 goto done;
diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
index b5ad97d3c8..a822a132ad 100644
--- a/xen/arch/x86/msr.c
+++ b/xen/arch/x86/msr.c
@@ -117,6 +117,37 @@ int init_vcpu_msr_policy(struct vcpu *v)
 return 0;
 }
 
+int guest_rdmsr(const struct vcpu *v, uint32_t msr, uint64_t *val)
+{
+const struct msr_domain_policy *dp = v->domain->arch.msr;
+const struct msr_vcpu_policy *vp = v->arch.msr;
+
+switch ( msr )
+{
+case MSR_INTEL_PLATFORM_INFO:
+if ( !dp->plaform_info.available )
+goto gp_fault;
+*val = (uint64_t) dp->plaform_info.cpuid_faulting <<
+   _MSR_PLATFORM_INFO_CPUID_FAULTING;
+break;
+
+case MSR_INTEL_MISC_FEATURES_ENABLES:
+if ( !vp->misc_features_enables.available )
+goto gp_fault;
+*val = (uint64_t) vp->misc_features_enables.cpuid_faulting <<
+   _MSR_MISC_FEATURES_CPUID_FAULTING;
+break;
+
+default:
+return X86EMUL_UNHANDLEABLE;
+}
+
+return X86EMUL_OKAY;
+
+ gp_fault:
+return X86EMUL_EXCEPTION;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/pv/emul-priv-op.c b/xen/arch/x86/pv/emul-priv-op.c
index 66cda538fc..d563214fc4 100644
--- a/xen/arch/x86/pv/emul-priv-op.c
+++ b/xen/arch/x86/pv/emul-priv-op.c
@@ -834,6 +834,10 @@ static int priv_op_read_msr(unsigned int reg, uint64_t 
*val,
 const struct vcpu *curr = current;
 const struct domain *currd = curr->domain;
 bool vpmu_msr = false;
+int ret;
+
+if ( (ret = guest_rdmsr(curr, reg, val)) != X86EMUL_UNHANDLEABLE )
+return ret;
 
 switch ( reg )
 {
@@ -934,24 +938,6 @@ static int priv_op_read_msr(unsigned int reg, uint64_t 
*val,
 *val = 0;
 return X86EMUL_OKAY;
 
-case MSR_INTEL_PLATFORM_INFO:
-if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ||
- rdmsr_safe(MSR_INTEL_PLATFORM_INFO, *val) )
-break;
-*val = 0;
-if ( this_cpu(cpuid_faulting_enabled) )
-*val |= MSR_PLATFORM_INFO_CPUID_FAULTING;
-return X86EMUL_OKAY;
-
-case MSR_INTEL_MISC_FEATURES_ENABLES:
-if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ||
- rdmsr_safe(MSR_INTEL_MISC_FEATURES_ENABLES, *val) )
-break;
-*val = 0;
-if ( curr->arch.msr->misc_features_enables.cpuid_faulting )
-*val |= MSR_MISC_FEATURES_CPUID_FAULTING;
-return X86EMUL_OKAY;

[Xen-devel] [PATCH v1 0/5] Generic MSR policy: infrastructure + cpuid_faulting

2017-08-30 Thread Sergey Dyasli
Currently there are the following issues with handling guest's RD/WRMSR
in Xen:

1. There is no way to configure which MSRs a guest can and can't access.
   And if there is no MSR handler in Xen for a particular MSR then
   the default behavior is just horrible:

RDMSR: rdmsr_safe(msr, *msr_content) /* returns a H/W value */
WRMSR: return X86EMUL_OKAY;  /* write is silently discarded */

2. There are too many handlers. Example for RDMSR:
priv_op_read_msr()
hvm_msr_read_intercept()
vmce_rdmsr()
svm_msr_read_intercept()
vmx_msr_read_intercept()
nvmx_msr_read_intercept()
rdmsr_viridian_regs()
...

This series tries to address the above issues in the following way.
2 types of MSR policy objects are introduced:

1. Per-Domain policy (struct msr_domain_policy) -- for shared MSRs
2. Per-vCPU policy (struct msr_vcpu_policy) -- for unique MSRs

Each domain and each vCPU inside a domain will now have an associated
MSR policy object. Contents of these structures are defined during
domain creation. For now, it's just a copy of either HVM_MAX or PV_MAX
policy, depending on a guest's type. But in the future it should be
possible to control the availability and values in guest's MSRs from
a toolstack. However, any MSR manipulations must be done together with
CPUID ones.
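
Concretely, after this series the association looks roughly like this
(a summary sketch of what patches 1 and 2 add, not new code):

struct arch_domain {
    /* ... */
    struct msr_domain_policy *msr;  /* shared by all vCPUs of a domain */
};

struct arch_vcpu {
    /* ... */
    struct msr_vcpu_policy *msr;    /* unique to each vCPU */
};

Both objects are allocated and filled at creation time from the
per-guest-type maximums: init_domain_msr_policy() copies hvm_max or
pv_max into d->arch.msr, and init_vcpu_msr_policy() does the same for
v->arch.msr.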

Once policy objects are in place, it becomes easy to introduce unified
guest's RDMSR and WRMSR handlers. They work directly with MSR policy
objects since all the state of guest's MSRs is contained there.

The main idea of having an MSR policy is to define the set and contents
of the MSRs that a guest sees. All other MSRs should be inaccessible
(accesses would generate a #GP fault). This MSR information should also
be sent in the migration stream.

Since it's impossible to convert all MSRs to use the new infrastructure
right away, this series starts with 2 MSRs responsible for CPUID
faulting:

1. MSR_INTEL_PLATFORM_INFO
2. MSR_INTEL_MISC_FEATURES_ENABLES

My previous VMX MSR policy patch set will be rebased on top of this
generic MSR infrastructure after it's merged.

Sergey Dyasli (5):
  x86/msr: introduce struct msr_domain_policy
  x86/msr: introduce struct msr_vcpu_policy
  x86: replace arch_vcpu::cpuid_faulting with msr_vcpu_policy
  x86/msr: introduce guest_rdmsr()
  x86/msr: introduce guest_wrmsr()

 xen/arch/x86/Makefile  |   1 +
 xen/arch/x86/cpu/intel.c   |   3 +-
 xen/arch/x86/domain.c  |  24 -
 xen/arch/x86/hvm/hvm.c |  18 +++-
 xen/arch/x86/hvm/vmx/vmx.c |  31 ---
 xen/arch/x86/msr.c | 203 +
 xen/arch/x86/pv/emul-inv-op.c  |   4 +-
 xen/arch/x86/pv/emul-priv-op.c |  43 ++---
 xen/arch/x86/setup.c   |   1 +
 xen/include/asm-x86/domain.h   |   8 +-
 xen/include/asm-x86/msr.h  |  33 +++
 11 files changed, 292 insertions(+), 77 deletions(-)
 create mode 100644 xen/arch/x86/msr.c

-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH RFC 00/12] Nested p2m: allow sharing between vCPUs

2017-08-29 Thread Sergey Dyasli
On Mon, 2017-08-28 at 18:03 +0100, George Dunlap wrote:
> On 07/18/2017 11:34 AM, Sergey Dyasli wrote:
> > Nested p2m (shadow EPT) is an object that stores memory address
> > translations from L2 GPA directly to L0 HPA. This is achieved by
> > combining together L1 EPT tables with L0 EPT during L2 EPT violations.
> > 
> > In the usual case, L1 uses the same EPTP value in VMCS12 for all vCPUs
> > of a L2 guest. But unfortunately, in current Xen's implementation, each
> > vCPU has its own n2pm object which cannot be shared with other vCPUs.
> > This leads to the following issues if a nested guest has SMP:
> > 
> > 1. There will be multiple np2m objects (1 per nested vCPU) with
> >the same np2m_base (L1 EPTP value in VMCS12).
> > 
> > 2. Same EPT violations will be processed independently by each vCPU.
> > 
> > 3. Since MAX_NESTEDP2M is defined as 10, if a domain has more than
> >10 nested vCPUs, performance will be extremely degraded due to
> >constant np2m LRU list thrashing and np2m flushing.
> > 
> > This patch series makes it possible to share one np2m object between
> > different vCPUs that have the same np2m_base. Sharing of np2m objects
> > improves scalability of a domain from 10 nested vCPUs to 10 nested
> > guests (with arbitrary number of vCPUs per guest).
> 
> On the whole this looks like a decent approach.
> 
> Were you planning on re-sending it with the RFC removed, or would you
> like me to do a detailed review of this series as it is?

Thanks for review! My current plan is to re-send the series as v1 after
addressing your and Christoph's comments. Let's wait for that before
detailed review :)

Oh and there is a possibility I may do some AMD testing.

-- 
Thanks,
Sergey
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH RFC 02/12] x86/np2m: add np2m_flush_eptp()

2017-08-03 Thread Sergey Dyasli
On Tue, 2017-08-01 at 09:55 +0200, Egger, Christoph wrote:
> On 18.07.17 12:34, Sergey Dyasli wrote:
> > The new function finds all np2m objects with the specified eptp and
> > flushes them. p2m_flush_table_locked() is added in order not to release
> > the p2m lock after np2m_base check.
> > 
> > Signed-off-by: Sergey Dyasli <sergey.dya...@citrix.com>
> > ---
> >  xen/arch/x86/mm/p2m.c | 34 +++---
> >  xen/include/asm-x86/p2m.h |  2 ++
> >  2 files changed, 33 insertions(+), 3 deletions(-)
> > 
> > diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
> > index b8c8bba421..bc330d8f52 100644
> > --- a/xen/arch/x86/mm/p2m.c
> > +++ b/xen/arch/x86/mm/p2m.c
> > @@ -1708,15 +1708,14 @@ p2m_getlru_nestedp2m(struct domain *d, struct 
> > p2m_domain *p2m)
> >  return p2m;
> >  }
> >  
> > -/* Reset this p2m table to be empty */
> >  static void
> > -p2m_flush_table(struct p2m_domain *p2m)
> > +p2m_flush_table_locked(struct p2m_domain *p2m)
> >  {
> >  struct page_info *top, *pg;
> >  struct domain *d = p2m->domain;
> >  mfn_t mfn;
> >  
> > -p2m_lock(p2m);
> > +ASSERT(p2m_locked_by_me(p2m));
> >  
> >  /*
> >   * "Host" p2m tables can have shared entries  that need a bit more 
> > care
> > @@ -1756,6 +1755,14 @@ p2m_flush_table(struct p2m_domain *p2m)
> >  p2m_unlock(p2m);
> >  }
> >  
> > +/* Reset this p2m table to be empty */
> > +static void
> > +p2m_flush_table(struct p2m_domain *p2m)
> > +{
> > +p2m_lock(p2m);
> > +p2m_flush_table_locked(p2m);
> > +}
> > +
> >  void
> >  p2m_flush(struct vcpu *v, struct p2m_domain *p2m)
> >  {
> > @@ -1773,6 +1780,27 @@ p2m_flush_nestedp2m(struct domain *d)
> >  p2m_flush_table(d->arch.nested_p2m[i]);
> >  }
> >  
> > +void np2m_flush_eptp(struct vcpu *v, unsigned long eptp)
> > +{
> > +struct domain *d = v->domain;
> > +struct p2m_domain *p2m;
> > +unsigned int i;
> > +
> > +eptp &= ~(0xfffull);
> > +
> > +nestedp2m_lock(d);
> > +for ( i = 0; i < MAX_NESTEDP2M; i++ )
> > +{
> > +p2m = d->arch.nested_p2m[i];
> > +p2m_lock(p2m);
> > +if ( p2m->np2m_base == eptp )
> > +p2m_flush_table_locked(p2m);
> > +else
> > +p2m_unlock(p2m);
> > +}
> > +nestedp2m_unlock(d);
> > +}
> > +
> 
> What exactly is eptp specific in this function ?

Yes, good point. I seem to be too focused on Intel. The correct parameter
name should be np2m_base, of course.

> >  static void assign_np2m(struct vcpu *v, struct p2m_domain *p2m)
> >  {
> >  struct nestedvcpu *nv = _nestedhvm(v);
> > diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
> > index 9086bb35dc..0e3387 100644
> > --- a/xen/include/asm-x86/p2m.h
> > +++ b/xen/include/asm-x86/p2m.h
> > @@ -779,6 +779,8 @@ int p2m_pt_handle_deferred_changes(uint64_t gpa);
> >  void p2m_flush(struct vcpu *v, struct p2m_domain *p2m);
> >  /* Flushes all nested p2m tables */
> >  void p2m_flush_nestedp2m(struct domain *d);
> > +/* Flushes all np2m objects with the specified eptp */
> > +void np2m_flush_eptp(struct vcpu *v, unsigned long eptp);
> >  
> >  void nestedp2m_write_p2m_entry(struct p2m_domain *p2m, unsigned long gfn,
> >  l1_pgentry_t *p, l1_pgentry_t new, unsigned int level);
> > 
> 
> Amazon Development Center Germany GmbH
> Berlin - Dresden - Aachen
> main office: Krausenstr. 38, 10117 Berlin
> Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger
> Ust-ID: DE289237879
> Eingetragen am Amtsgericht Charlottenburg HRB 149173 B
-- 
Thanks,
Sergey
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 5/6] x86/vvmx: Fix handing of the MSR_BITMAP field with VMCS shadowing

2017-07-26 Thread Sergey Dyasli
On Wed, 2017-07-19 at 12:57 +0100, Andrew Cooper wrote:
> Currently, the following sequence of actions:
> 
>  * VMPTRLD (creates a mapping, likely pointing at gfn 0 for an empty vmcs)
>  * VMWRITE CPU_BASED_VM_EXEC_CONTROL (completed by hardware)
>  * VMWRITE MSR_BITMAP (completed by hardware)
>  * VMLAUNCH
> 
> results in an L2 guest running with ACTIVATE_MSR_BITMAP set, but Xen using a
> stale mapping (likely gfn 0) when reading the interception bitmap.  The
> MSR_BITMAP field needs unconditionally intercepting even with VMCS shadowing,
> so Xen's mapping of the bitmap can be updated.
> 
> Signed-off-by: Andrew Cooper <andrew.coop...@citrix.com>

Reviewed-by: Sergey Dyasli <sergey.dya...@citrix.com>

-- 
Thanks,
Sergey
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 4/5] x86/vvmx: add vvmx_max_msr_policy

2017-07-24 Thread Sergey Dyasli
Currently, when nested virt is enabled, the set of L1 VMX features
is fixed and calculated by nvmx_msr_read_intercept() as an intersection
between the full set of Xen's supported L1 VMX features, the set of
actual H/W features and, for MSR_IA32_VMX_EPT_VPID_CAP, the set of
features that Xen uses.

Add vvmx_max_msr_policy object which represents the end result of
nvmx_msr_read_intercept() on current H/W. Most of the code is moved
from nvmx_msr_read_intercept() to calculate_vvmx_max_policy() which is
called only once during the startup.

There is no functional change to what L1 sees in VMX MSRs.
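
For readers unfamiliar with the existing __emul_value()/gen_vmx_msr()
macros that calculate_vvmx_max_policy() keeps using: each control MSR
reports its allowed-1 settings in the high 32 bits and its default-1
(must-be-1) settings in the low 32 bits. A commented restatement of the
macro, as a sketch only (gen_vmx_msr_sketch() is not part of the patch):

static uint64_t gen_vmx_msr_sketch(uint32_t enable1, uint32_t default1,
                                   uint64_t host_value)
{
    /* What Xen is prepared to emulate: allowed-1 bits in the high word,
     * default-1 bits in the low word. */
    uint64_t emul = ((uint64_t)(enable1 | default1) << 32) | default1;

    /* Advertise an allowed-1 bit only if both Xen and the host allow it;
     * report a bit as default-1 if either Xen or the host requires it. */
    return ((emul & host_value) & (~0ul << 32)) |
           (uint32_t)(emul | host_value);
}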

Signed-off-by: Sergey Dyasli <sergey.dya...@citrix.com>
---
v1 --> v2:
- Renamed hvm_max_vmx_msr_policy to vvmx_max_msr_policy and made it
  static
- calculate_hvm_max_policy() is renamed to calculate_vvmx_max_policy()
- Declaration of calculate_vvmx_max_policy() is removed from vmcs.c
  and added to vvmx.h
- Removed comment "XXX: vmcs_revision_id for nested virt"
- nvmx_msr_read_intercept() now uses const struct vmx_msr_policy *
- Shortened "msr = *msr &" to "*msr &="
- Removed usage of "data" as an intermediate variable for 4 MSRs
- Replaced magic constant for disabling MSR_IA32_VMX_VMFUNC with
  gen_vmx_msr_mask()
- get_vmx_msr_ptr() and get_vmx_msr_val() helpers are used instead of
  accessing MSR array directly

 xen/arch/x86/hvm/vmx/vmcs.c|   1 +
 xen/arch/x86/hvm/vmx/vvmx.c| 284 +
 xen/include/asm-x86/hvm/vmx/vvmx.h |   2 +
 3 files changed, 134 insertions(+), 153 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index bd36b6e12a..67077cc41a 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -493,6 +493,7 @@ static int vmx_init_vmcs_config(void)
 vmx_virt_exception = !!(_vmx_secondary_exec_control &
SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS);
 vmx_display_features();
+calculate_vvmx_max_policy();
 
 /* IA-32 SDM Vol 3B: VMCS size is never greater than 4kB. */
 if ( raw_vmx_msr_policy.basic.vmcs_region_size > PAGE_SIZE )
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index 2c8cf637a8..e71728f356 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -1941,6 +1941,8 @@ int nvmx_handle_invvpid(struct cpu_user_regs *regs)
 return X86EMUL_OKAY;
 }
 
+static struct vmx_msr_policy __read_mostly vvmx_max_msr_policy;
+
 #define __emul_value(enable1, default1) \
 ((enable1 | default1) << 32 | (default1))
 
@@ -1948,6 +1950,128 @@ int nvmx_handle_invvpid(struct cpu_user_regs *regs)
 (((__emul_value(enable1, default1) & host_value) & (~0ul << 32)) | \
 ((uint32_t)(__emul_value(enable1, default1) | host_value)))
 
+void __init calculate_vvmx_max_policy(void)
+{
+struct vmx_msr_policy *p = &vvmx_max_msr_policy;
+uint64_t data, *msr;
+u32 default1_bits;
+
+*p = raw_vmx_msr_policy;
+
+/* Pinbased controls 1-settings */
+data = PIN_BASED_EXT_INTR_MASK |
+   PIN_BASED_NMI_EXITING |
+   PIN_BASED_PREEMPT_TIMER;
+
+msr = get_vmx_msr_ptr(p, MSR_IA32_VMX_PINBASED_CTLS);
+*msr = gen_vmx_msr(data, VMX_PINBASED_CTLS_DEFAULT1, *msr);
+msr = get_vmx_msr_ptr(p, MSR_IA32_VMX_TRUE_PINBASED_CTLS);
+*msr = gen_vmx_msr(data, VMX_PINBASED_CTLS_DEFAULT1, *msr);
+
+/* Procbased controls 1-settings */
+default1_bits = VMX_PROCBASED_CTLS_DEFAULT1;
+data = CPU_BASED_HLT_EXITING |
+   CPU_BASED_VIRTUAL_INTR_PENDING |
+   CPU_BASED_CR8_LOAD_EXITING |
+   CPU_BASED_CR8_STORE_EXITING |
+   CPU_BASED_INVLPG_EXITING |
+   CPU_BASED_CR3_LOAD_EXITING |
+   CPU_BASED_CR3_STORE_EXITING |
+   CPU_BASED_MONITOR_EXITING |
+   CPU_BASED_MWAIT_EXITING |
+   CPU_BASED_MOV_DR_EXITING |
+   CPU_BASED_ACTIVATE_IO_BITMAP |
+   CPU_BASED_USE_TSC_OFFSETING |
+   CPU_BASED_UNCOND_IO_EXITING |
+   CPU_BASED_RDTSC_EXITING |
+   CPU_BASED_MONITOR_TRAP_FLAG |
+   CPU_BASED_VIRTUAL_NMI_PENDING |
+   CPU_BASED_ACTIVATE_MSR_BITMAP |
+   CPU_BASED_PAUSE_EXITING |
+   CPU_BASED_RDPMC_EXITING |
+   CPU_BASED_TPR_SHADOW |
+   CPU_BASED_ACTIVATE_SECONDARY_CONTROLS;
+
+msr = get_vmx_msr_ptr(p, MSR_IA32_VMX_PROCBASED_CTLS);
+*msr = gen_vmx_msr(data, default1_bits, *msr);
+
+default1_bits &= ~(CPU_BASED_CR3_LOAD_EXITING |
+   CPU_BASED_CR3_STORE_EXITING |
+   CPU_BASED_INVLPG_EXITING);
+
+msr = get_vmx_msr_ptr(p, MSR_IA32_VMX_TRUE_PROCBASED_CTLS);
+*msr = gen_vmx_msr(data, default1_bits, *msr);
+
+/* Procbased-2 controls 1-settings */
+data = SECONDARY_EXEC_DESCRIPTOR_TABLE_EXITING |
+   SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES 

[Xen-devel] [PATCH v2 1/5] x86/vmx: add struct vmx_msr_policy

2017-07-24 Thread Sergey Dyasli
This structure provides a convenient way of accessing contents of
VMX MSRs: every bit value is accessible by its name. Bit names match
Xen's existing definitions as closely as possible. The structure also
contains the bitmap of available MSRs since not all of them may be
available on a particular H/W.

A set of helper functions is introduced to provide a simple way of
interacting with the new structure.
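
A usage sketch (example_policy_use() is a hypothetical caller; the helper
and field names come from this patch and its later users in the series):

void example_policy_use(const struct vmx_msr_policy *p)
{
    /* Query MSR presence via the 'available' bitmap... */
    if ( vmx_msr_available(p, MSR_IA32_VMX_PROCBASED_CTLS2) )
    {
        /* ...read the raw 64-bit value... */
        uint64_t raw = get_vmx_msr_val(p, MSR_IA32_VMX_PROCBASED_CTLS2);

        /* ...or test individual bits by name, e.g.
         * p->procbased_ctls2.allowed_1.enable_ept or p->misc.vmwrite_all. */
        (void)raw;
    }
}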

Signed-off-by: Sergey Dyasli <sergey.dya...@citrix.com>
---
v1 --> v2:
- Replaced MSR indices with MSR names in struct vmx_msr_policy's comments
- Named "always zero bit" 31 of basic msr as mbz
- Added placeholder bits into union vmfunc
- Added structures cr0_bits and cr4_bits
- Added MSR_IA32_VMX_LAST define to use instead of MSR_IA32_VMX_VMFUNC
- vmx_msr_available() now uses pointer to const struct vmx_msr_policy
- build_assertions() now uses local struct vmx_msr_policy
- Added BUILD_BUG_ON to check that width of vmx_msr_policy::available
  bitmap is enough for all existing VMX MSRs
- Helpers get_vmx_msr_val(), get_vmx_msr_ptr() and gen_vmx_msr_mask()
  are added

 xen/arch/x86/hvm/vmx/vmcs.c|  78 
 xen/include/asm-x86/hvm/vmx/vmcs.h | 380 +
 xen/include/asm-x86/msr-index.h|   1 +
 3 files changed, 459 insertions(+)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 8103b20d29..33715748f0 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -144,6 +144,40 @@ static void __init vmx_display_features(void)
 printk(" - none\n");
 }
 
+bool vmx_msr_available(const struct vmx_msr_policy *p, uint32_t msr)
+{
+if ( msr < MSR_IA32_VMX_BASIC || msr > MSR_IA32_VMX_LAST )
+return 0;
+
+return p->available & (1u << (msr - MSR_IA32_VMX_BASIC));
+}
+
+uint64_t get_vmx_msr_val(const struct vmx_msr_policy *p, uint32_t msr)
+{
+if ( !vmx_msr_available(p, msr))
+return 0;
+
+return p->msr[msr - MSR_IA32_VMX_BASIC];
+}
+
+uint64_t *get_vmx_msr_ptr(struct vmx_msr_policy *p, uint32_t msr)
+{
+if ( !vmx_msr_available(p, msr))
+return NULL;
+
+return &p->msr[msr - MSR_IA32_VMX_BASIC];
+}
+
+uint32_t gen_vmx_msr_mask(uint32_t start_msr, uint32_t end_msr)
+{
+if ( start_msr < MSR_IA32_VMX_BASIC || start_msr > MSR_IA32_VMX_LAST ||
+ end_msr < MSR_IA32_VMX_BASIC || end_msr > MSR_IA32_VMX_LAST )
+return 0;
+
+return ((1u << (end_msr - start_msr + 1)) - 1) <<
+   (start_msr - MSR_IA32_VMX_BASIC);
+}
+
 static u32 adjust_vmx_controls(
 const char *name, u32 ctl_min, u32 ctl_opt, u32 msr, bool_t *mismatch)
 {
@@ -1956,6 +1990,50 @@ void __init setup_vmcs_dump(void)
 register_keyhandler('v', vmcs_dump, "dump VT-x VMCSs", 1);
 }
 
+static void __init __maybe_unused build_assertions(void)
+{
+struct vmx_msr_policy policy;
+
+BUILD_BUG_ON(sizeof(policy.basic) !=
+ sizeof(policy.basic.raw));
+BUILD_BUG_ON(sizeof(policy.pinbased_ctls) !=
+ sizeof(policy.pinbased_ctls.raw));
+BUILD_BUG_ON(sizeof(policy.procbased_ctls) !=
+ sizeof(policy.procbased_ctls.raw));
+BUILD_BUG_ON(sizeof(policy.exit_ctls) !=
+ sizeof(policy.exit_ctls.raw));
+BUILD_BUG_ON(sizeof(policy.entry_ctls) !=
+ sizeof(policy.entry_ctls.raw));
+BUILD_BUG_ON(sizeof(policy.misc) !=
+ sizeof(policy.misc.raw));
+BUILD_BUG_ON(sizeof(policy.cr0_fixed_0) !=
+ sizeof(policy.cr0_fixed_0.raw));
+BUILD_BUG_ON(sizeof(policy.cr0_fixed_1) !=
+ sizeof(policy.cr0_fixed_1.raw));
+BUILD_BUG_ON(sizeof(policy.cr4_fixed_0) !=
+ sizeof(policy.cr4_fixed_0.raw));
+BUILD_BUG_ON(sizeof(policy.cr4_fixed_1) !=
+ sizeof(policy.cr4_fixed_1.raw));
+BUILD_BUG_ON(sizeof(policy.vmcs_enum) !=
+ sizeof(policy.vmcs_enum.raw));
+BUILD_BUG_ON(sizeof(policy.procbased_ctls2) !=
+ sizeof(policy.procbased_ctls2.raw));
+BUILD_BUG_ON(sizeof(policy.ept_vpid_cap) !=
+ sizeof(policy.ept_vpid_cap.raw));
+BUILD_BUG_ON(sizeof(policy.true_pinbased_ctls) !=
+ sizeof(policy.true_pinbased_ctls.raw));
+BUILD_BUG_ON(sizeof(policy.true_procbased_ctls) !=
+ sizeof(policy.true_procbased_ctls.raw));
+BUILD_BUG_ON(sizeof(policy.true_exit_ctls) !=
+ sizeof(policy.true_exit_ctls.raw));
+BUILD_BUG_ON(sizeof(policy.true_entry_ctls) !=
+ sizeof(policy.true_entry_ctls.raw));
+BUILD_BUG_ON(sizeof(policy.vmfunc) !=
+ sizeof(policy.vmfunc.raw));
+
+BUILD_BUG_ON(MSR_IA32_VMX_LAST - MSR_IA32_VMX_BASIC + 1 >
+ sizeof(policy.available) * 8);
+}
 
 /*
  * Local variables:
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h 
b/xen/include/asm-x86/hvm/vmx/vmcs.h
index e3cdfdf576..c6ff3fe0

[Xen-devel] [PATCH v2 5/5] x86/vvmx: add per domain vmx msr policy

2017-07-24 Thread Sergey Dyasli
Having a policy per domain allows to sensibly query what VMX features
the domain has, which unblocks some other nested virt work items.

For now, make policy for each domain equal to vvmx_max_msr_policy.
In the future it should be possible to independently configure
the policy for each domain.

Signed-off-by: Sergey Dyasli <sergey.dya...@citrix.com>
Reviewed-by: Jan Beulich <jbeul...@suse.com>
---
v1 --> v2:
- nvmx_msr_read_intercept() now uses const struct vmx_msr_policy *
  (starting from patch #4)
- Added Reviewed-by: Jan Beulich <jbeul...@suse.com>

 xen/arch/x86/domain.c  |  6 ++
 xen/arch/x86/hvm/vmx/vvmx.c| 14 +-
 xen/include/asm-x86/domain.h   |  2 ++
 xen/include/asm-x86/hvm/vmx/vvmx.h |  1 +
 4 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index dd8bf1302f..e72f17c593 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -425,6 +425,7 @@ int arch_domain_create(struct domain *d, unsigned int 
domcr_flags,
 {
 d->arch.emulation_flags = 0;
 d->arch.cpuid = ZERO_BLOCK_PTR; /* Catch stray misuses. */
+d->arch.vmx_msr = ZERO_BLOCK_PTR;
 }
 else
 {
@@ -470,6 +471,9 @@ int arch_domain_create(struct domain *d, unsigned int 
domcr_flags,
 if ( (rc = init_domain_cpuid_policy(d)) )
 goto fail;
 
+if ( (rc = init_domain_vmx_msr_policy(d)) )
+goto fail;
+
 d->arch.ioport_caps = 
 rangeset_new(d, "I/O Ports", RANGESETF_prettyprint_hex);
 rc = -ENOMEM;
@@ -541,6 +545,7 @@ int arch_domain_create(struct domain *d, unsigned int 
domcr_flags,
 cleanup_domain_irq_mapping(d);
 free_xenheap_page(d->shared_info);
 xfree(d->arch.cpuid);
+xfree(d->arch.vmx_msr);
 if ( paging_initialised )
 paging_final_teardown(d);
 free_perdomain_mappings(d);
@@ -555,6 +560,7 @@ void arch_domain_destroy(struct domain *d)
 
 xfree(d->arch.e820);
 xfree(d->arch.cpuid);
+xfree(d->arch.vmx_msr);
 
 free_domain_pirqs(d);
 if ( !is_idle_domain(d) )
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index e71728f356..9a19e7a7c0 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -2072,6 +2072,18 @@ void __init calculate_vvmx_max_policy(void)
   MSR_IA32_VMX_VMFUNC);
 }
 
+int init_domain_vmx_msr_policy(struct domain *d)
+{
+d->arch.vmx_msr = xmalloc(struct vmx_msr_policy);
+
+if ( !d->arch.vmx_msr )
+return -ENOMEM;
+
+*d->arch.vmx_msr = vvmx_max_msr_policy;
+
+return 0;
+}
+
 /*
  * Capability reporting
  */
@@ -2079,7 +2091,7 @@ int nvmx_msr_read_intercept(unsigned int msr, u64 
*msr_content)
 {
 struct vcpu *v = current;
 struct domain *d = v->domain;
-const struct vmx_msr_policy *p = &vvmx_max_msr_policy;
+const struct vmx_msr_policy *p = d->arch.vmx_msr;
 int r = 1;
 
 /* VMX capablity MSRs are available only when guest supports VMX. */
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index c10522b7f5..430188c1fa 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -359,6 +359,8 @@ struct arch_domain
 /* CPUID Policy. */
 struct cpuid_policy *cpuid;
 
+struct vmx_msr_policy *vmx_msr;
+
 struct PITState vpit;
 
 /* TSC management (emulation, pv, scaling, stats) */
diff --git a/xen/include/asm-x86/hvm/vmx/vvmx.h 
b/xen/include/asm-x86/hvm/vmx/vvmx.h
index 150124f3a3..0f5e44ae94 100644
--- a/xen/include/asm-x86/hvm/vmx/vvmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vvmx.h
@@ -246,5 +246,6 @@ int nvmx_cpu_up_prepare(unsigned int cpu);
 void nvmx_cpu_dead(unsigned int cpu);
 
 void calculate_vvmx_max_policy(void);
+int init_domain_vmx_msr_policy(struct domain *d);
 #endif /* __ASM_X86_HVM_VVMX_H__ */
 
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 0/5] VMX MSRs policy for Nested Virt: part 1

2017-07-24 Thread Sergey Dyasli
The end goal of having VMX MSRs policy is to be able to manage
L1 VMX features. This patch series is the first part of this work.
There is no functional change to what L1 sees in VMX MSRs at this
point. But each domain will have a policy object which allows to
sensibly query what VMX features the domain has. This will unblock
some other nested virtualization work items.

Currently, when nested virt is enabled, the set of L1 VMX features
is fixed and calculated by nvmx_msr_read_intercept() as an intersection
between the full set of Xen's supported L1 VMX features, the set of
actual H/W features and, for MSR_IA32_VMX_EPT_VPID_CAP, the set of
features that Xen uses.

The above makes L1 VMX feature set inconsistent between different H/W
and there is no ability to control what features are available to L1.
The overall set of issues has much in common with CPUID policy.

Part 1 introduces struct vmx_msr_policy and the following instances:

* Raw policy (raw_vmx_msr_policy) -- the actual contents of H/W VMX MSRs
* VVMX max policy (vvmx_max_msr_policy) -- the end result of
   nvmx_msr_read_intercept() on current H/W
* Per-domain policy (d->arch.vmx_msr) -- the copy of VVMX max policy
 (for now)

In the future it should be possible to independently configure the VMX
policy for each domain using some new domctl.

There is no "Host policy" object because Xen already has a set of
variables (vmx_pin_based_exec_control and others) which represent
the set of VMX features that Xen uses. There are features that Xen
doesn't use (e.g. CPU_BASED_PAUSE_EXITING) but they are available to L1.
This makes it not worthwhile to introduce a "Host policy" at this stage.

v1 --> v2:
- Rebased to the latest master
- hvm_max_vmx_msr_policy is renamed to vvmx_max_msr_policy
- Dropped the debug patch
- Other changes are available on a per-patch basis

Sergey Dyasli (5):
  x86/vmx: add struct vmx_msr_policy
  x86/vmx: add raw_vmx_msr_policy
  x86/vmx: refactor vmx_init_vmcs_config()
  x86/vvmx: add vvmx_max_msr_policy
  x86/vvmx: add per domain vmx msr policy

 xen/arch/x86/domain.c  |   6 +
 xen/arch/x86/hvm/vmx/vmcs.c| 269 +-
 xen/arch/x86/hvm/vmx/vmx.c |   2 +
 xen/arch/x86/hvm/vmx/vvmx.c| 296 ++--
 xen/include/asm-x86/domain.h   |   2 +
 xen/include/asm-x86/hvm/vmx/vmcs.h | 383 +
 xen/include/asm-x86/hvm/vmx/vvmx.h |   3 +
 xen/include/asm-x86/msr-index.h|   1 +
 8 files changed, 722 insertions(+), 240 deletions(-)

-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 3/5] x86/vmx: refactor vmx_init_vmcs_config()

2017-07-24 Thread Sergey Dyasli
1. Remove RDMSRs of VMX MSRs since all values are already available in
   raw_vmx_msr_policy.
2. Replace bit operations involving VMX bitmasks with accessing VMX
   features by name and using vmx_msr_available() where appropriate.

Signed-off-by: Sergey Dyasli <sergey.dya...@citrix.com>
---
v1 --> v2:
- get_vmx_msr_val() is used instead of accessing policy's msr array
  directly

 xen/arch/x86/hvm/vmx/vmcs.c | 56 +
 1 file changed, 26 insertions(+), 30 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 8070ed21c8..bd36b6e12a 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -257,7 +257,8 @@ static u32 adjust_vmx_controls(
 {
 u32 vmx_msr_low, vmx_msr_high, ctl = ctl_min | ctl_opt;
 
-rdmsr(msr, vmx_msr_low, vmx_msr_high);
+vmx_msr_low = get_vmx_msr_val(&raw_vmx_msr_policy, msr);
+vmx_msr_high = get_vmx_msr_val(&raw_vmx_msr_policy, msr) >> 32;
 
 ctl &= vmx_msr_high; /* bit == 0 in high word ==> must be zero */
 ctl |= vmx_msr_low;  /* bit == 1 in low word  ==> must be one  */
@@ -275,19 +276,16 @@ static u32 adjust_vmx_controls(
 
 static int vmx_init_vmcs_config(void)
 {
-u32 vmx_basic_msr_low, vmx_basic_msr_high, min, opt;
+u32 min, opt;
 u32 _vmx_pin_based_exec_control;
 u32 _vmx_cpu_based_exec_control;
 u32 _vmx_secondary_exec_control = 0;
 u64 _vmx_ept_vpid_cap = 0;
-u64 _vmx_misc_cap = 0;
 u32 _vmx_vmexit_control;
 u32 _vmx_vmentry_control;
 u64 _vmx_vmfunc = 0;
 bool_t mismatch = 0;
 
-rdmsr(MSR_IA32_VMX_BASIC, vmx_basic_msr_low, vmx_basic_msr_high);
-
 min = (PIN_BASED_EXT_INTR_MASK |
PIN_BASED_NMI_EXITING);
 opt = (PIN_BASED_VIRTUAL_NMIS |
@@ -321,7 +319,7 @@ static int vmx_init_vmcs_config(void)
 _vmx_cpu_based_exec_control &=
 ~(CPU_BASED_CR8_LOAD_EXITING | CPU_BASED_CR8_STORE_EXITING);
 
-if ( _vmx_cpu_based_exec_control & CPU_BASED_ACTIVATE_SECONDARY_CONTROLS )
+if ( vmx_msr_available(&raw_vmx_msr_policy, MSR_IA32_VMX_PROCBASED_CTLS2) )
 {
 min = 0;
 opt = (SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES |
@@ -335,8 +333,7 @@ static int vmx_init_vmcs_config(void)
SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS |
SECONDARY_EXEC_XSAVES |
SECONDARY_EXEC_TSC_SCALING);
-rdmsrl(MSR_IA32_VMX_MISC, _vmx_misc_cap);
-if ( _vmx_misc_cap & VMX_MISC_VMWRITE_ALL )
+if ( raw_vmx_msr_policy.misc.vmwrite_all )
 opt |= SECONDARY_EXEC_ENABLE_VMCS_SHADOWING;
 if ( opt_vpid_enabled )
 opt |= SECONDARY_EXEC_ENABLE_VPID;
@@ -361,10 +358,9 @@ static int vmx_init_vmcs_config(void)
 }
 
 /* The IA32_VMX_EPT_VPID_CAP MSR exists only when EPT or VPID available */
-if ( _vmx_secondary_exec_control & (SECONDARY_EXEC_ENABLE_EPT |
-SECONDARY_EXEC_ENABLE_VPID) )
+if ( vmx_msr_available(&raw_vmx_msr_policy, MSR_IA32_VMX_EPT_VPID_CAP) )
 {
-rdmsrl(MSR_IA32_VMX_EPT_VPID_CAP, _vmx_ept_vpid_cap);
+_vmx_ept_vpid_cap = raw_vmx_msr_policy.ept_vpid_cap.raw;
 
 if ( !opt_ept_ad )
 _vmx_ept_vpid_cap &= ~VMX_EPT_AD_BIT;
@@ -409,10 +405,14 @@ static int vmx_init_vmcs_config(void)
  * To use EPT we expect to be able to clear certain intercepts.
  * We check VMX_BASIC_MSR[55] to correctly handle default controls.
  */
-uint32_t must_be_one, must_be_zero, msr = MSR_IA32_VMX_PROCBASED_CTLS;
-if ( vmx_basic_msr_high & (VMX_BASIC_DEFAULT1_ZERO >> 32) )
-msr = MSR_IA32_VMX_TRUE_PROCBASED_CTLS;
-rdmsr(msr, must_be_one, must_be_zero);
+uint32_t must_be_one = raw_vmx_msr_policy.procbased_ctls.allowed_0.raw;
+uint32_t must_be_zero = 
raw_vmx_msr_policy.procbased_ctls.allowed_1.raw;
+if ( vmx_msr_available(&raw_vmx_msr_policy,
+   MSR_IA32_VMX_TRUE_PROCBASED_CTLS) )
+{
+must_be_one = raw_vmx_msr_policy.true_procbased_ctls.allowed_0.raw;
+must_be_zero = 
raw_vmx_msr_policy.true_procbased_ctls.allowed_1.raw;
+}
 if ( must_be_one & (CPU_BASED_INVLPG_EXITING |
 CPU_BASED_CR3_LOAD_EXITING |
 CPU_BASED_CR3_STORE_EXITING) )
@@ -453,9 +453,9 @@ static int vmx_init_vmcs_config(void)
 _vmx_pin_based_exec_control  &= ~ PIN_BASED_POSTED_INTERRUPT;
 
 /* The IA32_VMX_VMFUNC MSR exists only when VMFUNC is available */
-if ( _vmx_secondary_exec_control & SECONDARY_EXEC_ENABLE_VM_FUNCTIONS )
+if ( vmx_msr_available(&raw_vmx_msr_policy, MSR_IA32_VMX_VMFUNC) )
 {
-rdmsrl(MSR_IA32_VMX_VMFUNC, _vmx_vmfunc);
+_vmx_vmfunc = raw_vmx_msr_policy.vmfunc.raw;
 
 /*
  * VMFUNC leaf 0 (EPTP switching) must be su

[Xen-devel] [PATCH v2 2/5] x86/vmx: add raw_vmx_msr_policy

2017-07-24 Thread Sergey Dyasli
Add calculate_vmx_raw_policy() which fills the raw_vmx_msr_policy
object (the actual contents of H/W VMX MSRs) on the boot CPU. On
secondary CPUs, this function checks that contents of VMX MSRs match
the boot CPU's contents.

Remove lesser version of same-contents-check from vmx_init_vmcs_config().

Signed-off-by: Sergey Dyasli <sergey.dya...@citrix.com>
---
v1 --> v2:
- calculate_raw_policy() is renamed to calculate_vmx_raw_policy()
  to avoid clash with the same-name function in cpuid.c
- Declaration of calculate_vmx_raw_policy() is removed from vmx.c
  and added to vmcs.h
- msr variable is now unsigned in calculate_vmx_raw_policy()
- "\n" moved to the same line as the printk format string
- Replaced magic constants for available bitmap with gen_vmx_msr_mask()
- get_vmx_msr_ptr() and get_vmx_msr_val() helpers are used instead of
  accessing MSR array directly

 xen/arch/x86/hvm/vmx/vmcs.c| 134 +
 xen/arch/x86/hvm/vmx/vmx.c |   2 +
 xen/include/asm-x86/hvm/vmx/vmcs.h |   3 +
 3 files changed, 82 insertions(+), 57 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 33715748f0..8070ed21c8 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -144,6 +144,8 @@ static void __init vmx_display_features(void)
 printk(" - none\n");
 }
 
+struct vmx_msr_policy __read_mostly raw_vmx_msr_policy;
+
 bool vmx_msr_available(const struct vmx_msr_policy *p, uint32_t msr)
 {
 if ( msr < MSR_IA32_VMX_BASIC || msr > MSR_IA32_VMX_LAST )
@@ -178,6 +180,78 @@ uint32_t gen_vmx_msr_mask(uint32_t start_msr, uint32_t 
end_msr)
(start_msr - MSR_IA32_VMX_BASIC);
 }
 
+int calculate_vmx_raw_policy(bool bsp)
+{
+struct vmx_msr_policy policy;
+struct vmx_msr_policy *p = &policy;
+unsigned int msr;
+
+/* Raw policy is filled only on boot CPU */
+if ( bsp )
+p = &raw_vmx_msr_policy;
+else
+memset(&policy, 0, sizeof(policy));
+
+p->available = gen_vmx_msr_mask(MSR_IA32_VMX_BASIC, 
MSR_IA32_VMX_VMCS_ENUM);
+for ( msr = MSR_IA32_VMX_BASIC; msr <= MSR_IA32_VMX_VMCS_ENUM; msr++ )
+rdmsrl(msr, *get_vmx_msr_ptr(p, msr));
+
+if ( p->basic.default1_zero )
+{
+p->available |= gen_vmx_msr_mask(MSR_IA32_VMX_TRUE_PINBASED_CTLS,
+ MSR_IA32_VMX_TRUE_ENTRY_CTLS);
+for ( msr = MSR_IA32_VMX_TRUE_PINBASED_CTLS;
+  msr <= MSR_IA32_VMX_TRUE_ENTRY_CTLS; msr++ )
+rdmsrl(msr, *get_vmx_msr_ptr(p, msr));
+}
+
+if ( p->procbased_ctls.allowed_1.activate_secondary_controls )
+{
+p->available |= gen_vmx_msr_mask(MSR_IA32_VMX_PROCBASED_CTLS2,
+ MSR_IA32_VMX_PROCBASED_CTLS2);
+msr = MSR_IA32_VMX_PROCBASED_CTLS2;
+rdmsrl(msr, *get_vmx_msr_ptr(p, msr));
+
+if ( p->procbased_ctls2.allowed_1.enable_ept ||
+ p->procbased_ctls2.allowed_1.enable_vpid )
+{
+p->available |= gen_vmx_msr_mask(MSR_IA32_VMX_EPT_VPID_CAP,
+ MSR_IA32_VMX_EPT_VPID_CAP);
+msr = MSR_IA32_VMX_EPT_VPID_CAP;
+rdmsrl(msr, *get_vmx_msr_ptr(p, msr));
+}
+
+if ( p->procbased_ctls2.allowed_1.enable_vm_functions )
+{
+p->available |= gen_vmx_msr_mask(MSR_IA32_VMX_VMFUNC,
+ MSR_IA32_VMX_VMFUNC);
+msr = MSR_IA32_VMX_VMFUNC;
+rdmsrl(msr, *get_vmx_msr_ptr(p, msr));
+}
+}
+
+/* Check that secondary CPUs have exactly the same bits in VMX MSRs */
+if ( !bsp && memcmp(p, &raw_vmx_msr_policy, sizeof(*p)) != 0 )
+{
+for ( msr = MSR_IA32_VMX_BASIC; msr <= MSR_IA32_VMX_LAST; msr++ )
+{
+if ( get_vmx_msr_val(p, msr) !=
+ get_vmx_msr_val(&raw_vmx_msr_policy, msr) )
+{
+printk("VMX msr %#x: saw 0x%016"PRIx64" expected 
0x%016"PRIx64"\n",
+msr, get_vmx_msr_val(p, msr),
+get_vmx_msr_val(&raw_vmx_msr_policy, msr));
+}
+}
+
+printk("VMX: Capabilities fatally differ between CPU%d and boot CPU\n",
+   smp_processor_id());
+return -EINVAL;
+}
+
+return 0;
+}
+
 static u32 adjust_vmx_controls(
 const char *name, u32 ctl_min, u32 ctl_opt, u32 msr, bool_t *mismatch)
 {
@@ -199,13 +273,6 @@ static u32 adjust_vmx_controls(
 return ctl;
 }
 
-static bool_t cap_check(const char *name, u32 expected, u32 saw)
-{
-if ( saw != expected )
-printk("VMX %s: saw %#x expected %#x\n", name, saw, expected);
-return saw != expected;
-}
-
 static int vmx_init_vmcs_config(void)
 {
 u32 vmx_basic_msr_low, vmx_basic_msr_high, min, opt;
@@ -438,56 +505,6 @@ static int vmx_in

[Xen-devel] [PATCH RFC 12/12] x86/vvmx: remove EPTP write from ept_handle_violation()

2017-07-18 Thread Sergey Dyasli
Now there is no need to update shadow EPTP after handling L2 EPT
violation since all EPTP updates are handled by nvmx_eptp_update().

Signed-off-by: Sergey Dyasli <sergey.dya...@citrix.com>
---
 xen/arch/x86/hvm/vmx/vmx.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 35aa57e24f..3a3e04bb0f 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -3282,12 +3282,6 @@ static void ept_handle_violation(ept_qual_t q, paddr_t 
gpa)
 case 0: // Unhandled L1 EPT violation
 break;
 case 1: // This violation is handled completly
-/*Current nested EPT maybe flushed by other vcpus, so need
- * to re-set its shadow EPTP pointer.
- */
-if ( nestedhvm_vcpu_in_guestmode(current) &&
-nestedhvm_paging_mode_hap(current ) )
-__vmwrite(EPT_POINTER, get_shadow_eptp(current));
 return;
 case -1:// This vioaltion should be injected to L1 VMM
 vcpu_nestedhvm(current).nv_vmexit_pending = 1;
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH RFC 11/12] x86/np2m: add break to np2m_flush_eptp()

2017-07-18 Thread Sergey Dyasli
Now that np2m sharing is implemented, there can be only one np2m object
with the same np2m_base. Break from loop if the required np2m was found
during np2m_flush_eptp().

Signed-off-by: Sergey Dyasli <sergey.dya...@citrix.com>
---
 xen/arch/x86/mm/p2m.c | 3 +++
 xen/include/asm-x86/p2m.h | 2 +-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 480459ae51..d0a2aef1f2 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1796,7 +1796,10 @@ void np2m_flush_eptp(struct vcpu *v, unsigned long eptp)
 p2m = d->arch.nested_p2m[i];
 p2m_lock(p2m);
 if ( p2m->np2m_base == eptp )
+{
 p2m_flush_table_locked(p2m);
+break;
+}
 else
 p2m_unlock(p2m);
 }
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index 936d1142c8..7cc44cc496 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -784,7 +784,7 @@ int p2m_pt_handle_deferred_changes(uint64_t gpa);
 void p2m_flush(struct vcpu *v, struct p2m_domain *p2m);
 /* Flushes all nested p2m tables */
 void p2m_flush_nestedp2m(struct domain *d);
-/* Flushes all np2m objects with the specified eptp */
+/* Flushes the np2m specified by eptp (if it exists) */
 void np2m_flush_eptp(struct vcpu *v, unsigned long eptp);
 
 void nestedp2m_write_p2m_entry(struct p2m_domain *p2m, unsigned long gfn,
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH RFC 10/12] x86/np2m: implement sharing of np2m between vCPUs

2017-07-18 Thread Sergey Dyasli
Modify p2m_get_nestedp2m() to allow sharing a np2m between multiple
vcpus with the same np2m_base (L1 EPTP value in VMCS12).

np2m_schedule_in/out() callbacks are added to context_switch(), and a
pseudo schedule-out is performed during virtual_vmexit().

Signed-off-by: Sergey Dyasli <sergey.dya...@citrix.com>
---
 xen/arch/x86/domain.c   |  2 ++
 xen/arch/x86/hvm/vmx/vvmx.c |  4 
 xen/arch/x86/mm/p2m.c   | 29 +++--
 3 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index dd8bf1302f..38c86a5ded 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1642,6 +1642,7 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
 {
 _update_runstate_area(prev);
 vpmu_switch_from(prev);
+np2m_schedule_out();
 }
 
if ( is_hvm_domain(prevd) && !list_empty(&prev->arch.hvm_vcpu.tm_list) )
@@ -1690,6 +1691,7 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
 
 /* Must be done with interrupts enabled */
 vpmu_switch_to(next);
+np2m_schedule_in();
 }
 
 /* Ensure that the vcpu has an up-to-date time base. */
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index 7b193767cd..2203d541ea 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -1187,6 +1187,7 @@ static void virtual_vmentry(struct cpu_user_regs *regs)
 
 /* Setup virtual ETP for L2 guest*/
 if ( nestedhvm_paging_mode_hap(v) )
+/* This will setup the initial np2m for the nested vCPU */
 __vmwrite(EPT_POINTER, get_shadow_eptp(v));
 else
 __vmwrite(EPT_POINTER, get_host_eptp(v));
@@ -1353,6 +1354,9 @@ static void virtual_vmexit(struct cpu_user_regs *regs)
  !(v->arch.hvm_vcpu.guest_efer & EFER_LMA) )
 shadow_to_vvmcs_bulk(v, ARRAY_SIZE(gpdpte_fields), gpdpte_fields);
 
+/* This will clear current pCPU bit in p2m->dirty_cpumask */
+np2m_schedule_out();
+
 vmx_vmcs_switch(v->arch.hvm_vmx.vmcs_pa, nvcpu->nv_n1vmcx_pa);
 
 nestedhvm_vcpu_exit_guestmode(v);
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 364fdd8c13..480459ae51 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1830,6 +1830,7 @@ p2m_get_nestedp2m_locked(struct vcpu *v)
 struct domain *d = v->domain;
 struct p2m_domain *p2m;
 uint64_t np2m_base = nhvm_vcpu_p2m_base(v);
+unsigned int i;
 
 /* Mask out low bits; this avoids collisions with P2M_BASE_EADDR */
 np2m_base &= ~(0xfffull);
@@ -1843,10 +1844,34 @@ p2m_get_nestedp2m_locked(struct vcpu *v)
 if ( p2m ) 
 {
 p2m_lock(p2m);
-if ( p2m->np2m_base == np2m_base || p2m->np2m_base == P2M_BASE_EADDR )
+if ( p2m->np2m_base == np2m_base )
 {
-if ( p2m->np2m_base == P2M_BASE_EADDR )
+/* Check if np2m was flushed just before the lock */
+if ( nv->np2m_generation != p2m->np2m_generation )
 nvcpu_flush(v);
+/* np2m is up-to-date */
+p2m->np2m_base = np2m_base;
+assign_np2m(v, p2m);
+nestedp2m_unlock(d);
+
+return p2m;
+}
+else if ( p2m->np2m_base != P2M_BASE_EADDR )
+{
+/* vCPU is switching from some other valid np2m */
+cpumask_clear_cpu(v->processor, p2m->dirty_cpumask);
+}
+p2m_unlock(p2m);
+}
+
+/* Share a np2m if possible */
+for ( i = 0; i < MAX_NESTEDP2M; i++ )
+{
+p2m = d->arch.nested_p2m[i];
+p2m_lock(p2m);
+if ( p2m->np2m_base == np2m_base )
+{
+nvcpu_flush(v);
 p2m->np2m_base = np2m_base;
 assign_np2m(v, p2m);
 nestedp2m_unlock(d);
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH RFC 02/12] x86/np2m: add np2m_flush_eptp()

2017-07-18 Thread Sergey Dyasli
The new function finds all np2m objects with the specified eptp and
flushes them. p2m_flush_table_locked() is added in order not to release
the p2m lock after the np2m_base check.

Signed-off-by: Sergey Dyasli <sergey.dya...@citrix.com>
---
 xen/arch/x86/mm/p2m.c | 34 +++---
 xen/include/asm-x86/p2m.h |  2 ++
 2 files changed, 33 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index b8c8bba421..bc330d8f52 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1708,15 +1708,14 @@ p2m_getlru_nestedp2m(struct domain *d, struct p2m_domain *p2m)
 return p2m;
 }
 
-/* Reset this p2m table to be empty */
 static void
-p2m_flush_table(struct p2m_domain *p2m)
+p2m_flush_table_locked(struct p2m_domain *p2m)
 {
 struct page_info *top, *pg;
 struct domain *d = p2m->domain;
 mfn_t mfn;
 
-p2m_lock(p2m);
+ASSERT(p2m_locked_by_me(p2m));
 
 /*
  * "Host" p2m tables can have shared entries  that need a bit more care
@@ -1756,6 +1755,14 @@ p2m_flush_table(struct p2m_domain *p2m)
 p2m_unlock(p2m);
 }
 
+/* Reset this p2m table to be empty */
+static void
+p2m_flush_table(struct p2m_domain *p2m)
+{
+p2m_lock(p2m);
+p2m_flush_table_locked(p2m);
+}
+
 void
 p2m_flush(struct vcpu *v, struct p2m_domain *p2m)
 {
@@ -1773,6 +1780,27 @@ p2m_flush_nestedp2m(struct domain *d)
 p2m_flush_table(d->arch.nested_p2m[i]);
 }
 
+void np2m_flush_eptp(struct vcpu *v, unsigned long eptp)
+{
+struct domain *d = v->domain;
+struct p2m_domain *p2m;
+unsigned int i;
+
+eptp &= ~(0xfffull);
+
+nestedp2m_lock(d);
+for ( i = 0; i < MAX_NESTEDP2M; i++ )
+{
+p2m = d->arch.nested_p2m[i];
+p2m_lock(p2m);
+if ( p2m->np2m_base == eptp )
+p2m_flush_table_locked(p2m);
+else
+p2m_unlock(p2m);
+}
+nestedp2m_unlock(d);
+}
+
 static void assign_np2m(struct vcpu *v, struct p2m_domain *p2m)
 {
struct nestedvcpu *nv = &vcpu_nestedhvm(v);
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index 9086bb35dc..0e3387 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -779,6 +779,8 @@ int p2m_pt_handle_deferred_changes(uint64_t gpa);
 void p2m_flush(struct vcpu *v, struct p2m_domain *p2m);
 /* Flushes all nested p2m tables */
 void p2m_flush_nestedp2m(struct domain *d);
+/* Flushes all np2m objects with the specified eptp */
+void np2m_flush_eptp(struct vcpu *v, unsigned long eptp);
 
 void nestedp2m_write_p2m_entry(struct p2m_domain *p2m, unsigned long gfn,
 l1_pgentry_t *p, l1_pgentry_t new, unsigned int level);
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH RFC 08/12] x86/np2m: add p2m_get_nestedp2m_locked()

2017-07-18 Thread Sergey Dyasli
The new function returns an np2m that is still write-locked.

Signed-off-by: Sergey Dyasli <sergey.dya...@citrix.com>
---
 xen/arch/x86/mm/p2m.c | 12 +---
 xen/include/asm-x86/p2m.h |  2 ++
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 4b83d4a4f1..364fdd8c13 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1824,7 +1824,7 @@ static void nvcpu_flush(struct vcpu *v)
 }
 
 struct p2m_domain *
-p2m_get_nestedp2m(struct vcpu *v)
+p2m_get_nestedp2m_locked(struct vcpu *v)
 {
struct nestedvcpu *nv = &vcpu_nestedhvm(v);
 struct domain *d = v->domain;
@@ -1849,7 +1849,6 @@ p2m_get_nestedp2m(struct vcpu *v)
 nvcpu_flush(v);
 p2m->np2m_base = np2m_base;
 assign_np2m(v, p2m);
-p2m_unlock(p2m);
 nestedp2m_unlock(d);
 
 return p2m;
@@ -1865,12 +1864,19 @@ p2m_get_nestedp2m(struct vcpu *v)
 p2m->np2m_base = np2m_base;
 nvcpu_flush(v);
 assign_np2m(v, p2m);
-p2m_unlock(p2m);
 nestedp2m_unlock(d);
 
 return p2m;
 }
 
+struct p2m_domain *p2m_get_nestedp2m(struct vcpu *v)
+{
+struct p2m_domain *p2m = p2m_get_nestedp2m_locked(v);
+p2m_unlock(p2m);
+
+return p2m;
+}
+
 struct p2m_domain *
 p2m_get_p2m(struct vcpu *v)
 {
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index 801a11a960..936d1142c8 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -364,6 +364,8 @@ struct p2m_domain {
  * Updates vCPU's n2pm to match its EPTP in VMCS12 and returns that np2m.
  */
 struct p2m_domain *p2m_get_nestedp2m(struct vcpu *v);
+/* Similar to the above except that returned p2m is still write-locked */
+struct p2m_domain *p2m_get_nestedp2m_locked(struct vcpu *v);
 
 /* If vcpu is in host mode then behaviour matches p2m_get_hostp2m().
  * If vcpu is in guest mode then behaviour matches p2m_get_nestedp2m().
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH RFC 05/12] x86/np2m: add np2m_generation

2017-07-18 Thread Sergey Dyasli
Add an np2m_generation variable to both p2m_domain and nestedvcpu.

The np2m's generation will be incremented each time the np2m is flushed.
This makes it possible to detect whether a nested vcpu has a stale np2m.
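
As an illustration only (hypothetical types and helpers, not the actual
Xen interfaces), the staleness check that the generation counter enables
looks roughly like this:

#include <stdbool.h>
#include <stdint.h>

struct np2m        { uint64_t np2m_generation; /* ... */ };
struct nested_vcpu { struct np2m *np2m; uint64_t np2m_generation; };

/* Flusher: called with the np2m lock held. */
static void np2m_flush_locked(struct np2m *p2m)
{
    /* ... empty the table ... */
    p2m->np2m_generation++;
}

/* Consumer: compare the cached generation with the current one. */
static bool np2m_is_stale(const struct nested_vcpu *nv)
{
    return nv->np2m_generation != nv->np2m->np2m_generation;
}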

Signed-off-by: Sergey Dyasli <sergey.dya...@citrix.com>
---
 xen/arch/x86/hvm/nestedhvm.c   | 1 +
 xen/arch/x86/mm/p2m.c  | 3 +++
 xen/include/asm-x86/hvm/vcpu.h | 1 +
 xen/include/asm-x86/p2m.h  | 1 +
 4 files changed, 6 insertions(+)

diff --git a/xen/arch/x86/hvm/nestedhvm.c b/xen/arch/x86/hvm/nestedhvm.c
index f2f7469d86..32b8acca6a 100644
--- a/xen/arch/x86/hvm/nestedhvm.c
+++ b/xen/arch/x86/hvm/nestedhvm.c
@@ -56,6 +56,7 @@ nestedhvm_vcpu_reset(struct vcpu *v)
 nv->nv_vvmcxaddr = INVALID_PADDR;
 nv->nv_flushp2m = 0;
 nv->nv_p2m = NULL;
+nv->np2m_generation = 0;
 
hvm_asid_flush_vcpu_asid(&nv->nv_n2asid);
 
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index e7bd0dbac8..4fc2d94b46 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -73,6 +73,7 @@ static int p2m_initialise(struct domain *d, struct p2m_domain *p2m)
 p2m->p2m_class = p2m_host;
 
 p2m->np2m_base = P2M_BASE_EADDR;
+p2m->np2m_generation = 0;
 
 for ( i = 0; i < ARRAY_SIZE(p2m->pod.mrp.list); ++i )
 p2m->pod.mrp.list[i] = gfn_x(INVALID_GFN);
@@ -1735,6 +1736,7 @@ p2m_flush_table_locked(struct p2m_domain *p2m)
 
 /* This is no longer a valid nested p2m for any address space */
 p2m->np2m_base = P2M_BASE_EADDR;
+p2m->np2m_generation++;
 
 /* Make sure nobody else is using this p2m table */
 nestedhvm_vmcx_flushtlb(p2m);
@@ -1811,6 +1813,7 @@ static void assign_np2m(struct vcpu *v, struct p2m_domain *p2m)
 
 nv->nv_flushp2m = 0;
 nv->nv_p2m = p2m;
+nv->np2m_generation = p2m->np2m_generation;
 cpumask_set_cpu(v->processor, p2m->dirty_cpumask);
 }
 
diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-x86/hvm/vcpu.h
index 6c54773f1c..91651581db 100644
--- a/xen/include/asm-x86/hvm/vcpu.h
+++ b/xen/include/asm-x86/hvm/vcpu.h
@@ -115,6 +115,7 @@ struct nestedvcpu {
 
 bool_t nv_flushp2m; /* True, when p2m table must be flushed */
 struct p2m_domain *nv_p2m; /* used p2m table for this vcpu */
+uint64_t np2m_generation;
 
 struct hvm_vcpu_asid nv_n2asid;
 
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index cc1bab9eb7..eedc7fd412 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -209,6 +209,7 @@ struct p2m_domain {
  * to set it to any other value. */
 #define P2M_BASE_EADDR (~0ULL)
 uint64_t   np2m_base;
+uint64_t   np2m_generation;
 
 /* Nested p2ms: linked list of n2pms allocated to this domain. 
  * The host p2m hasolds the head of the list and the np2ms are 
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH RFC 01/12] x86/np2m: refactor p2m_get_nestedp2m()

2017-07-18 Thread Sergey Dyasli
1. Add a helper function assign_np2m()
2. Remove useless volatile
3. Update function's comment in the header
4. Minor style fixes ('\n' and d)

Signed-off-by: Sergey Dyasli <sergey.dya...@citrix.com>
---
 xen/arch/x86/mm/p2m.c | 31 ++-
 xen/include/asm-x86/p2m.h |  6 +++---
 2 files changed, 21 insertions(+), 16 deletions(-)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index e8a57d118c..b8c8bba421 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1773,14 +1773,24 @@ p2m_flush_nestedp2m(struct domain *d)
 p2m_flush_table(d->arch.nested_p2m[i]);
 }
 
+static void assign_np2m(struct vcpu *v, struct p2m_domain *p2m)
+{
+struct nestedvcpu *nv = &vcpu_nestedhvm(v);
+struct domain *d = v->domain;
+
+/* Bring this np2m to the top of the LRU list */
+p2m_getlru_nestedp2m(d, p2m);
+
+nv->nv_flushp2m = 0;
+nv->nv_p2m = p2m;
+cpumask_set_cpu(v->processor, p2m->dirty_cpumask);
+}
+
 struct p2m_domain *
 p2m_get_nestedp2m(struct vcpu *v, uint64_t np2m_base)
 {
-/* Use volatile to prevent gcc to cache nv->nv_p2m in a cpu register as
- * this may change within the loop by an other (v)cpu.
- */
-volatile struct nestedvcpu *nv = &vcpu_nestedhvm(v);
-struct domain *d;
+struct nestedvcpu *nv = &vcpu_nestedhvm(v);
+struct domain *d = v->domain;
 struct p2m_domain *p2m;
 
 /* Mask out low bits; this avoids collisions with P2M_BASE_EADDR */
@@ -1790,7 +1800,6 @@ p2m_get_nestedp2m(struct vcpu *v, uint64_t np2m_base)
 nv->nv_p2m = NULL;
 }
 
-d = v->domain;
 nestedp2m_lock(d);
 p2m = nv->nv_p2m;
 if ( p2m ) 
@@ -1798,15 +1807,13 @@ p2m_get_nestedp2m(struct vcpu *v, uint64_t np2m_base)
 p2m_lock(p2m);
 if ( p2m->np2m_base == np2m_base || p2m->np2m_base == P2M_BASE_EADDR )
 {
-nv->nv_flushp2m = 0;
-p2m_getlru_nestedp2m(d, p2m);
-nv->nv_p2m = p2m;
 if ( p2m->np2m_base == P2M_BASE_EADDR )
 hvm_asid_flush_vcpu(v);
 p2m->np2m_base = np2m_base;
-cpumask_set_cpu(v->processor, p2m->dirty_cpumask);
+assign_np2m(v, p2m);
 p2m_unlock(p2m);
 nestedp2m_unlock(d);
+
 return p2m;
 }
 p2m_unlock(p2m);
@@ -1817,11 +1824,9 @@ p2m_get_nestedp2m(struct vcpu *v, uint64_t np2m_base)
 p2m = p2m_getlru_nestedp2m(d, NULL);
 p2m_flush_table(p2m);
 p2m_lock(p2m);
-nv->nv_p2m = p2m;
 p2m->np2m_base = np2m_base;
-nv->nv_flushp2m = 0;
 hvm_asid_flush_vcpu(v);
-cpumask_set_cpu(v->processor, p2m->dirty_cpumask);
+assign_np2m(v, p2m);
 p2m_unlock(p2m);
 nestedp2m_unlock(d);
 
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index 6395e8fd1d..9086bb35dc 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -359,9 +359,9 @@ struct p2m_domain {
 /* get host p2m table */
 #define p2m_get_hostp2m(d)  ((d)->arch.p2m)
 
-/* Get p2m table (re)usable for specified np2m base.
- * Automatically destroys and re-initializes a p2m if none found.
- * If np2m_base == 0 then v->arch.hvm_vcpu.guest_cr[3] is used.
+/*
+ * Assigns an np2m with the specified np2m_base to the specified vCPU
+ * and returns that np2m.
  */
 struct p2m_domain *p2m_get_nestedp2m(struct vcpu *v, uint64_t np2m_base);
 
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH RFC 07/12] x86/np2m: add np2m_schedule_in/out()

2017-07-18 Thread Sergey Dyasli
np2m maintenance is required for a nested vcpu during scheduling:

1. On schedule-out: clear pCPU's bit in p2m->dirty_cpumask
to prevent useless IPIs.

2. On schedule-in: check if np2m is up to date and wasn't flushed.

Signed-off-by: Sergey Dyasli <sergey.dya...@citrix.com>
---
 xen/arch/x86/mm/p2m.c | 52 +++
 xen/include/asm-x86/p2m.h |  3 +++
 2 files changed, 55 insertions(+)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 3d65899b05..4b83d4a4f1 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1880,6 +1880,58 @@ p2m_get_p2m(struct vcpu *v)
 return p2m_get_nestedp2m(v);
 }
 
+static void np2m_schedule(bool sched_out)
+{
+struct nestedvcpu *nv = &vcpu_nestedhvm(current);
+struct p2m_domain *p2m;
+bool sched_in = !sched_out;
+
+if ( !nestedhvm_enabled(current->domain) ||
+ !nestedhvm_vcpu_in_guestmode(current) ||
+ !nestedhvm_paging_mode_hap(current) )
+return;
+
+p2m = nv->nv_p2m;
+if ( p2m )
+{
+bool np2m_valid;
+
+p2m_lock(p2m);
+np2m_valid = p2m->np2m_base == nhvm_vcpu_p2m_base(current) &&
+ nv->np2m_generation == p2m->np2m_generation;
+if ( sched_out && np2m_valid )
+{
+/*
+ * The np2m is up to date but this vCPU will no longer use it,
+ * which means there are no reasons to send a flush IPI.
+ */
+cpumask_clear_cpu(current->processor, p2m->dirty_cpumask);
+}
+else if ( sched_in )
+{
+if ( !np2m_valid )
+{
+/* This vCPU's np2m was flushed while it was not runnable */
+hvm_asid_flush_core();
+vcpu_nestedhvm(current).nv_p2m = NULL;
+}
+else
+cpumask_set_cpu(current->processor, p2m->dirty_cpumask);
+}
+p2m_unlock(p2m);
+}
+}
+
+void np2m_schedule_out(void)
+{
+np2m_schedule(true);
+}
+
+void np2m_schedule_in(void)
+{
+np2m_schedule(false);
+}
+
 unsigned long paging_gva_to_gfn(struct vcpu *v,
 unsigned long va,
 uint32_t *pfec)
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index eedc7fd412..801a11a960 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -370,6 +370,9 @@ struct p2m_domain *p2m_get_nestedp2m(struct vcpu *v);
  */
 struct p2m_domain *p2m_get_p2m(struct vcpu *v);
 
+void np2m_schedule_out(void);
+void np2m_schedule_in(void);
+
 static inline bool_t p2m_is_hostp2m(const struct p2m_domain *p2m)
 {
 return p2m->p2m_class == p2m_host;
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH RFC 03/12] x86/vvmx: use np2m_flush_eptp() for INVEPT_SINGLE_CONTEXT

2017-07-18 Thread Sergey Dyasli
nvmx_handle_invept() updates current's np2m just to flush it. Instead,
use the new np2m_flush_eptp() directly for this purpose.

Signed-off-by: Sergey Dyasli <sergey.dya...@citrix.com>
---
 xen/arch/x86/hvm/vmx/vvmx.c | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index 2c8cf637a8..56678127e1 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -1895,12 +1895,7 @@ int nvmx_handle_invept(struct cpu_user_regs *regs)
 {
 case INVEPT_SINGLE_CONTEXT:
 {
-struct p2m_domain *p2m = p2m_get_nestedp2m(current, eptp);
-if ( p2m )
-{
-p2m_flush(current, p2m);
-ept_sync_domain(p2m);
-}
+np2m_flush_eptp(current, eptp);
 break;
 }
 case INVEPT_ALL_CONTEXT:
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH RFC 04/12] x86/np2m: remove np2m_base from p2m_get_nestedp2m()

2017-07-18 Thread Sergey Dyasli
Remove np2m_base parameter as it should always match the value of
EPTP in VMCS12.

Signed-off-by: Sergey Dyasli <sergey.dya...@citrix.com>
---
 xen/arch/x86/hvm/svm/nestedsvm.c | 2 +-
 xen/arch/x86/hvm/vmx/vvmx.c  | 3 +--
 xen/arch/x86/mm/hap/nested_hap.c | 2 +-
 xen/arch/x86/mm/p2m.c| 8 
 xen/include/asm-x86/p2m.h| 5 ++---
 5 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/xen/arch/x86/hvm/svm/nestedsvm.c b/xen/arch/x86/hvm/svm/nestedsvm.c
index 8fd9c23a02..c3665aec01 100644
--- a/xen/arch/x86/hvm/svm/nestedsvm.c
+++ b/xen/arch/x86/hvm/svm/nestedsvm.c
@@ -411,7 +411,7 @@ static void nestedsvm_vmcb_set_nestedp2m(struct vcpu *v,
 ASSERT(v != NULL);
 ASSERT(vvmcb != NULL);
 ASSERT(n2vmcb != NULL);
-p2m = p2m_get_nestedp2m(v, vvmcb->_h_cr3);
+p2m = p2m_get_nestedp2m(v);
 n2vmcb->_h_cr3 = pagetable_get_paddr(p2m_get_pagetable(p2m));
 }
 
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index 56678127e1..1011829c15 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -1094,8 +1094,7 @@ static void load_shadow_guest_state(struct vcpu *v)
 
 uint64_t get_shadow_eptp(struct vcpu *v)
 {
-uint64_t np2m_base = nvmx_vcpu_eptp_base(v);
-struct p2m_domain *p2m = p2m_get_nestedp2m(v, np2m_base);
+struct p2m_domain *p2m = p2m_get_nestedp2m(v);
struct ept_data *ept = &p2m->ept;
 
 ept->mfn = pagetable_get_pfn(p2m_get_pagetable(p2m));
diff --git a/xen/arch/x86/mm/hap/nested_hap.c b/xen/arch/x86/mm/hap/nested_hap.c
index 162afed46b..ed137fa784 100644
--- a/xen/arch/x86/mm/hap/nested_hap.c
+++ b/xen/arch/x86/mm/hap/nested_hap.c
@@ -212,7 +212,7 @@ nestedhvm_hap_nested_page_fault(struct vcpu *v, paddr_t *L2_gpa,
 uint8_t p2ma_21 = p2m_access_rwx;
 
 p2m = p2m_get_hostp2m(d); /* L0 p2m */
-nested_p2m = p2m_get_nestedp2m(v, nhvm_vcpu_p2m_base(v));
+nested_p2m = p2m_get_nestedp2m(v);
 
 /* walk the L1 P2M table */
rv = nestedhap_walk_L1_p2m(v, *L2_gpa, &L1_gpa, &page_order_21, &p2ma_21,
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index bc330d8f52..e7bd0dbac8 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1815,11 +1815,12 @@ static void assign_np2m(struct vcpu *v, struct p2m_domain *p2m)
 }
 
 struct p2m_domain *
-p2m_get_nestedp2m(struct vcpu *v, uint64_t np2m_base)
+p2m_get_nestedp2m(struct vcpu *v)
 {
struct nestedvcpu *nv = &vcpu_nestedhvm(v);
 struct domain *d = v->domain;
 struct p2m_domain *p2m;
+uint64_t np2m_base = nhvm_vcpu_p2m_base(v);
 
 /* Mask out low bits; this avoids collisions with P2M_BASE_EADDR */
 np2m_base &= ~(0xfffull);
@@ -1867,7 +1868,7 @@ p2m_get_p2m(struct vcpu *v)
 if (!nestedhvm_is_n2(v))
 return p2m_get_hostp2m(v->domain);
 
-return p2m_get_nestedp2m(v, nhvm_vcpu_p2m_base(v));
+return p2m_get_nestedp2m(v);
 }
 
 unsigned long paging_gva_to_gfn(struct vcpu *v,
@@ -1882,13 +1883,12 @@ unsigned long paging_gva_to_gfn(struct vcpu *v,
 unsigned long l2_gfn, l1_gfn;
 struct p2m_domain *p2m;
 const struct paging_mode *mode;
-uint64_t np2m_base = nhvm_vcpu_p2m_base(v);
 uint8_t l1_p2ma;
 unsigned int l1_page_order;
 int rv;
 
 /* translate l2 guest va into l2 guest gfn */
-p2m = p2m_get_nestedp2m(v, np2m_base);
+p2m = p2m_get_nestedp2m(v);
 mode = paging_get_nestedmode(v);
 l2_gfn = mode->gva_to_gfn(v, p2m, va, pfec);
 
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index 0e3387..cc1bab9eb7 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -360,10 +360,9 @@ struct p2m_domain {
 #define p2m_get_hostp2m(d)  ((d)->arch.p2m)
 
 /*
- * Assigns an np2m with the specified np2m_base to the specified vCPU
- * and returns that np2m.
+ * Updates vCPU's n2pm to match its EPTP in VMCS12 and returns that np2m.
  */
-struct p2m_domain *p2m_get_nestedp2m(struct vcpu *v, uint64_t np2m_base);
+struct p2m_domain *p2m_get_nestedp2m(struct vcpu *v);
 
 /* If vcpu is in host mode then behaviour matches p2m_get_hostp2m().
  * If vcpu is in guest mode then behaviour matches p2m_get_nestedp2m().
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH RFC 00/12] Nested p2m: allow sharing between vCPUs

2017-07-18 Thread Sergey Dyasli
Nested p2m (shadow EPT) is an object that stores memory address
translations from L2 GPA directly to L0 HPA. This is achieved by
combining together L1 EPT tables with L0 EPT during L2 EPT violations.
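
(For illustration only — hypothetical helper names and types, not Xen's
actual interfaces — the combining step can be sketched as follows:)

#include <stdint.h>

struct vcpu;                                 /* opaque here */
typedef uint64_t gpa_t;                      /* guest-physical address */
typedef uint64_t hpa_t;                      /* host-physical address  */

/* Hypothetical stand-ins for the real page-table walkers. */
gpa_t walk_l1_ept(struct vcpu *v, gpa_t l2_gpa);   /* L1 EPT: L2 GPA -> L1 GPA */
hpa_t walk_host_p2m(struct vcpu *v, gpa_t l1_gpa); /* L0 p2m: L1 GPA -> HPA    */
void  np2m_set_entry(struct vcpu *v, gpa_t l2_gpa, hpa_t hpa);

/* On an L2 EPT violation: compose the two translations and cache the
 * result in the shadow EPT (np2m) so later accesses hit it directly. */
static hpa_t resolve_l2_ept_violation(struct vcpu *v, gpa_t l2_gpa)
{
    gpa_t l1_gpa = walk_l1_ept(v, l2_gpa);
    hpa_t hpa    = walk_host_p2m(v, l1_gpa);

    np2m_set_entry(v, l2_gpa, hpa);
    return hpa;
}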

In the usual case, L1 uses the same EPTP value in VMCS12 for all vCPUs
of an L2 guest. But unfortunately, in Xen's current implementation, each
vCPU has its own np2m object, which cannot be shared with other vCPUs.
This leads to the following issues if a nested guest has SMP:

1. There will be multiple np2m objects (1 per nested vCPU) with
   the same np2m_base (L1 EPTP value in VMCS12).

2. Same EPT violations will be processed independently by each vCPU.

3. Since MAX_NESTEDP2M is defined as 10, if a domain has more than
   10 nested vCPUs, performance will be extremely degraded due to
   constant np2m LRU list thrashing and np2m flushing.

This patch series makes it possible to share one np2m object between
different vCPUs that have the same np2m_base. Sharing of np2m objects
improves scalability of a domain from 10 nested vCPUs to 10 nested
guests (with an arbitrary number of vCPUs per guest).

Known issues in current implementation:

* AMD's nested SVM is likely broken. Unfortunately, I currently don't
  have any H/W to perform proper testing.

Sergey Dyasli (12):
  x86/np2m: refactor p2m_get_nestedp2m()
  x86/np2m: add np2m_flush_eptp()
  x86/vvmx: use np2m_flush_eptp() for INVEPT_SINGLE_CONTEXT
  x86/np2m: remove np2m_base from p2m_get_nestedp2m()
  x86/np2m: add np2m_generation
  x86/vvmx: add stale_eptp flag
  x86/np2m: add np2m_schedule_in/out()
  x86/np2m: add p2m_get_nestedp2m_locked()
  x86/np2m: improve nestedhvm_hap_nested_page_fault()
  x86/np2m: implement sharing of np2m between vCPUs
  x86/np2m: add break to np2m_flush_eptp()
  x86/vvmx: remove EPTP write from ept_handle_violation()

 xen/arch/x86/domain.c  |   2 +
 xen/arch/x86/hvm/nestedhvm.c   |   2 +
 xen/arch/x86/hvm/svm/nestedsvm.c   |   2 +-
 xen/arch/x86/hvm/vmx/entry.S   |   6 ++
 xen/arch/x86/hvm/vmx/vmx.c |  14 +--
 xen/arch/x86/hvm/vmx/vvmx.c|  29 --
 xen/arch/x86/mm/hap/nested_hap.c   |  29 +++---
 xen/arch/x86/mm/p2m.c  | 180 +++--
 xen/include/asm-x86/hvm/vcpu.h |   1 +
 xen/include/asm-x86/hvm/vmx/vvmx.h |   2 +
 xen/include/asm-x86/p2m.h  |  15 +++-
 11 files changed, 218 insertions(+), 64 deletions(-)

-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH RFC 09/12] x86/np2m: improve nestedhvm_hap_nested_page_fault()

2017-07-18 Thread Sergey Dyasli
There is a possibility for nested_p2m to become stale between
nestedhvm_hap_nested_page_fault() and nestedhap_fix_p2m(). Simply
use p2m_get_nestedp2m_locked() to guarantee that the correct np2m is used.

Signed-off-by: Sergey Dyasli <sergey.dya...@citrix.com>
---
 xen/arch/x86/mm/hap/nested_hap.c | 29 +++--
 1 file changed, 11 insertions(+), 18 deletions(-)

diff --git a/xen/arch/x86/mm/hap/nested_hap.c b/xen/arch/x86/mm/hap/nested_hap.c
index ed137fa784..96afe632b5 100644
--- a/xen/arch/x86/mm/hap/nested_hap.c
+++ b/xen/arch/x86/mm/hap/nested_hap.c
@@ -101,28 +101,21 @@ nestedhap_fix_p2m(struct vcpu *v, struct p2m_domain *p2m,
   unsigned int page_order, p2m_type_t p2mt, p2m_access_t p2ma)
 {
 int rc = 0;
+unsigned long gfn, mask;
+mfn_t mfn;
+
 ASSERT(p2m);
 ASSERT(p2m->set_entry);
+ASSERT(p2m_locked_by_me(p2m));
 
-p2m_lock(p2m);
-
-/* If this p2m table has been flushed or recycled under our feet, 
- * leave it alone.  We'll pick up the right one as we try to 
- * vmenter the guest. */
-if ( p2m->np2m_base == nhvm_vcpu_p2m_base(v) )
-{
-unsigned long gfn, mask;
-mfn_t mfn;
-
-/* If this is a superpage mapping, round down both addresses
- * to the start of the superpage. */
-mask = ~((1UL << page_order) - 1);
+/* If this is a superpage mapping, round down both addresses
+ * to the start of the superpage. */
+mask = ~((1UL << page_order) - 1);
 
-gfn = (L2_gpa >> PAGE_SHIFT) & mask;
-mfn = _mfn((L0_gpa >> PAGE_SHIFT) & mask);
+gfn = (L2_gpa >> PAGE_SHIFT) & mask;
+mfn = _mfn((L0_gpa >> PAGE_SHIFT) & mask);
 
-rc = p2m_set_entry(p2m, gfn, mfn, page_order, p2mt, p2ma);
-}
+rc = p2m_set_entry(p2m, gfn, mfn, page_order, p2mt, p2ma);
 
 p2m_unlock(p2m);
 
@@ -212,7 +205,6 @@ nestedhvm_hap_nested_page_fault(struct vcpu *v, paddr_t *L2_gpa,
 uint8_t p2ma_21 = p2m_access_rwx;
 
 p2m = p2m_get_hostp2m(d); /* L0 p2m */
-nested_p2m = p2m_get_nestedp2m(v);
 
 /* walk the L1 P2M table */
rv = nestedhap_walk_L1_p2m(v, *L2_gpa, &L1_gpa, &page_order_21, &p2ma_21,
@@ -278,6 +270,7 @@ nestedhvm_hap_nested_page_fault(struct vcpu *v, paddr_t *L2_gpa,
 p2ma_10 &= (p2m_access_t)p2ma_21;
 
 /* fix p2m_get_pagetable(nested_p2m) */
+nested_p2m = p2m_get_nestedp2m_locked(v);
 nestedhap_fix_p2m(v, nested_p2m, *L2_gpa, L0_gpa, page_order_20,
 p2mt_10, p2ma_10);
 
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH RFC 06/12] x86/vvmx: add stale_eptp flag

2017-07-18 Thread Sergey Dyasli
The new variable indicates whether an update of the shadow EPTP is needed
prior to vmentry. An update is required if a nested vcpu gets a new np2m
or if its np2m was flushed by an IPI.

The helper function nvcpu_flush() is added.
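
Purely as an illustration of the resulting vmentry pattern (hypothetical
helpers and a simplified vcpu structure, not the code in this patch):
the flag is re-checked with interrupts disabled right before vmentry and
the entry path restarts if it is set, so the shadow EPTP rewrite always
happens with interrupts enabled.

#include <stdbool.h>

struct vcpu { volatile bool stale_eptp; /* set from the flush IPI */ };

/* Hypothetical stand-ins for the real Xen functions. */
void refresh_shadow_eptp_if_stale(struct vcpu *v);
void local_irq_disable(void);
void local_irq_enable(void);
void enter_guest(struct vcpu *v);

void vmentry_path(struct vcpu *v)
{
    for ( ;; )
    {
        refresh_shadow_eptp_if_stale(v);   /* runs with interrupts enabled */

        local_irq_disable();
        if ( !v->stale_eptp )
            break;                         /* no flush IPI raced us */
        local_irq_enable();                /* it did: go around again */
    }

    enter_guest(v);                        /* interrupts remain disabled */
}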

Signed-off-by: Sergey Dyasli <sergey.dya...@citrix.com>
---
 xen/arch/x86/hvm/nestedhvm.c   |  1 +
 xen/arch/x86/hvm/vmx/entry.S   |  6 ++
 xen/arch/x86/hvm/vmx/vmx.c |  8 +++-
 xen/arch/x86/hvm/vmx/vvmx.c| 15 +++
 xen/arch/x86/mm/p2m.c  | 10 --
 xen/include/asm-x86/hvm/vmx/vvmx.h |  2 ++
 6 files changed, 39 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/hvm/nestedhvm.c b/xen/arch/x86/hvm/nestedhvm.c
index 32b8acca6a..e9b1d8e628 100644
--- a/xen/arch/x86/hvm/nestedhvm.c
+++ b/xen/arch/x86/hvm/nestedhvm.c
@@ -108,6 +108,7 @@ nestedhvm_flushtlb_ipi(void *info)
  */
 hvm_asid_flush_core();
 vcpu_nestedhvm(v).nv_p2m = NULL;
+vcpu_2_nvmx(v).stale_eptp = true;
 }
 
 void
diff --git a/xen/arch/x86/hvm/vmx/entry.S b/xen/arch/x86/hvm/vmx/entry.S
index 9f1755b31c..5480206cac 100644
--- a/xen/arch/x86/hvm/vmx/entry.S
+++ b/xen/arch/x86/hvm/vmx/entry.S
@@ -77,6 +77,8 @@ UNLIKELY_END(realmode)
 
 mov  %rsp,%rdi
 call vmx_vmenter_helper
+cmp  $0,%eax
+jne .Lvmx_vmentry_restart
 mov  VCPU_hvm_guest_cr2(%rbx),%rax
 
 pop  %r15
@@ -115,6 +117,10 @@ ENTRY(vmx_asm_do_vmentry)
 GET_CURRENT(bx)
 jmp  .Lvmx_do_vmentry
 
+.Lvmx_vmentry_restart:
+sti
+jmp  .Lvmx_do_vmentry
+
 .Lvmx_goto_emulator:
 sti
 mov  %rsp,%rdi
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 69ce3aae25..35aa57e24f 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -4236,13 +4236,17 @@ static void lbr_fixup(void)
 bdw_erratum_bdf14_fixup();
 }
 
-void vmx_vmenter_helper(const struct cpu_user_regs *regs)
+int vmx_vmenter_helper(const struct cpu_user_regs *regs)
 {
 struct vcpu *curr = current;
 u32 new_asid, old_asid;
 struct hvm_vcpu_asid *p_asid;
 bool_t need_flush;
 
+/* Shadow EPTP can't be updated here because irqs are disabled */
+ if ( nestedhvm_vcpu_in_guestmode(curr) && vcpu_2_nvmx(curr).stale_eptp )
+ return 1;
+
 if ( curr->domain->arch.hvm_domain.pi_ops.do_resume )
 curr->domain->arch.hvm_domain.pi_ops.do_resume(curr);
 
@@ -4303,6 +4307,8 @@ void vmx_vmenter_helper(const struct cpu_user_regs *regs)
 __vmwrite(GUEST_RIP,regs->rip);
 __vmwrite(GUEST_RSP,regs->rsp);
 __vmwrite(GUEST_RFLAGS, regs->rflags | X86_EFLAGS_MBS);
+
+return 0;
 }
 
 /*
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index 1011829c15..7b193767cd 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -120,6 +120,7 @@ int nvmx_vcpu_initialise(struct vcpu *v)
 nvmx->iobitmap[1] = NULL;
 nvmx->msrbitmap = NULL;
 INIT_LIST_HEAD(>launched_list);
+nvmx->stale_eptp = false;
 return 0;
 }
  
@@ -1390,12 +1391,26 @@ static void virtual_vmexit(struct cpu_user_regs *regs)
 vmsucceed(regs);
 }
 
+static void nvmx_eptp_update(void)
+{
+if ( !nestedhvm_vcpu_in_guestmode(current) ||
+  vcpu_nestedhvm(current).nv_vmexit_pending ||
+ !vcpu_2_nvmx(current).stale_eptp ||
+ !nestedhvm_paging_mode_hap(current) )
+return;
+
+__vmwrite(EPT_POINTER, get_shadow_eptp(current));
+vcpu_2_nvmx(current).stale_eptp = false;
+}
+
 void nvmx_switch_guest(void)
 {
 struct vcpu *v = current;
struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v);
 struct cpu_user_regs *regs = guest_cpu_user_regs();
 
+nvmx_eptp_update();
+
 /*
  * A pending IO emulation may still be not finished. In this case, no
  * virtual vmswitch is allowed. Or else, the following IO emulation will
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 4fc2d94b46..3d65899b05 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1817,6 +1817,12 @@ static void assign_np2m(struct vcpu *v, struct p2m_domain *p2m)
 cpumask_set_cpu(v->processor, p2m->dirty_cpumask);
 }
 
+static void nvcpu_flush(struct vcpu *v)
+{
+hvm_asid_flush_vcpu(v);
+vcpu_2_nvmx(v).stale_eptp = true;
+}
+
 struct p2m_domain *
 p2m_get_nestedp2m(struct vcpu *v)
 {
@@ -1840,7 +1846,7 @@ p2m_get_nestedp2m(struct vcpu *v)
 if ( p2m->np2m_base == np2m_base || p2m->np2m_base == P2M_BASE_EADDR )
 {
 if ( p2m->np2m_base == P2M_BASE_EADDR )
-hvm_asid_flush_vcpu(v);
+nvcpu_flush(v);
 p2m->np2m_base = np2m_base;
 assign_np2m(v, p2m);
 p2m_unlock(p2m);
@@ -1857,7 +1863,7 @@ p2m_get_nestedp2m(struct vcpu *v)
 p2m_flush_table(p2m);
 p2m_lock(p2m);
 p2m->np2m_base = np2m_base;
-hvm_asid_flush_vcpu(v);
+nvcpu_flush(v);
   

Re: [Xen-devel] [PATCH v1 4/6] vvmx: add hvm_max_vmx_msr_policy

2017-07-07 Thread Sergey Dyasli
On Thu, 2017-07-06 at 06:28 -0600, Jan Beulich wrote:
> > > > On 06.07.17 at 12:23,  wrote:
> > 
> > On Tue, 2017-07-04 at 09:04 -0600, Jan Beulich wrote:
> > > > > > On 26.06.17 at 12:44,  wrote:
> > > > 
> > > > +{
> > > > +struct vmx_msr_policy *p = &hvm_max_vmx_msr_policy;
> > > > +uint64_t data, *msr;
> > > > +u32 default1_bits;
> > > > +
> > > > +*p = raw_vmx_msr_policy;
> > > > +
> > > > +/* XXX: vmcs_revision_id for nested virt */
> > > 
> > > There was no such comment (presumably indicating something that
> > > yet needs doing) in the old code - what's this about? Can't this be
> > > implemented instead of such a comment be added?
> > 
> > Currently L1 sees vmcs_revision_id value from the H/W MSR. Which is
> > fine until live migration is concerned. The question is: what should
> > happen if L1 is migrated to some other H/W with different vmcs id?
> > One possible solution is to use "virtual vmcs id" in the policy object.
> 
> Are there any other (reasonable) ones, besides forbidding
> migration (live or not). Otoh, if migration between hosts with
> different IDs is allowed, won't we risk the page layout (which
> is intentionally unknown to us) changing as well? Or in order
> to be migrateable, such guests would have to be forced to
> not use shadow VMCS, and we'd have to pin down (as part of
> the guest ABI) the software layout we use.

During a discussion with Andrew, we identified difficulties in migrating
an L1 hypervisor to H/W with a different vmcs revision id when
VMCS shadowing is used.

It seems to be a reasonable requirement for migration to have H/W with
the same vmcs revision id. Therefore it is fine to provide L1 with
the real H/W id and I will remove that comment in v2.

-- 
Thanks,
Sergey
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v1 4/6] vvmx: add hvm_max_vmx_msr_policy

2017-07-06 Thread Sergey Dyasli
On Tue, 2017-07-04 at 09:04 -0600, Jan Beulich wrote:
> > > > On 26.06.17 at 12:44,  wrote:
> > 
> > +{
> > +struct vmx_msr_policy *p = &hvm_max_vmx_msr_policy;
> > +uint64_t data, *msr;
> > +u32 default1_bits;
> > +
> > +*p = raw_vmx_msr_policy;
> > +
> > +/* XXX: vmcs_revision_id for nested virt */
> 
> There was no such comment (presumably indicating something that
> yet needs doing) in the old code - what's this about? Can't this be
> implemented instead of such a comment be added?

Currently L1 sees the vmcs_revision_id value from the H/W MSR, which is
fine until live migration comes into play. The question is: what should
happen if L1 is migrated to some other H/W with a different vmcs id?
One possible solution is to use a "virtual vmcs id" in the policy object.

-- 
Thanks,
Sergey
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v1 2/6] vmx: add raw_vmx_msr_policy

2017-07-06 Thread Sergey Dyasli
On Tue, 2017-07-04 at 08:15 -0600, Jan Beulich wrote:
> > > > On 26.06.17 at 12:44,  wrote:
> > 
> > @@ -611,6 +624,9 @@ int vmx_cpu_up(void)
> >  
> >  BUG_ON(!(read_cr4() & X86_CR4_VMXE));
> >  
> > +if ( (rc = calculate_raw_policy(false)) != 0 )
> > +return rc;
> > +
> >  /* 
> >   * Ensure the current processor operating mode meets 
> >   * the requred CRO fixed bits in VMX operation. 
> 
> Btw., is it intentional that the function is being invoked for the BSP a
> second time here (after start_vmx() did so already), with the flag
> now being passed with the wrong value?

Unfortunately, I couldn't find a better way of detecting if the code is running
on the boot CPU. And I decided to use the existing practice of passing a flag.

-- 
Thanks,
Sergey
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v1 1/6] vmx: add struct vmx_msr_policy

2017-07-06 Thread Sergey Dyasli
On Tue, 2017-07-04 at 07:57 -0600, Jan Beulich wrote:
> > > > On 26.06.17 at 12:44,  wrote:
> > 
> > --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
> > +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
> > @@ -562,6 +562,350 @@ void vmx_domain_flush_pml_buffers(struct domain *d);
> >  
> >  void vmx_domain_update_eptp(struct domain *d);
> >  
> > +union vmx_pin_based_exec_control_bits {
> > +uint32_t raw;
> > +struct {
> > +bool ext_intr_exiting:1;
> > +uint32_t :2;  /* 1:2 reserved */
> > +bool  nmi_exiting:1;
> > +uint32_t :1;  /* 4 reserved */
> > +bool virtual_nmis:1;
> > +boolpreempt_timer:1;
> > +bool posted_interrupt:1;
> > +uint32_t :24; /* 8:31 reserved */
> 
> This mixture of bool and uint32_t worries me - I don't think the
> resulting layout is well defined. Yes, you put suitable
> BUILD_BUG_ON()s in place to catch possible issues, but anyway.

It was Andrew's suggestion to use bool because "It avoids subtle bugs like
foo.exec_only = (a & EXEC) truncating to zero". In the end it doesn't matter
which types are used for bitfields; the layout depends only on the width.
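
A minimal standalone example of the truncation issue referred to above
(plain ISO C semantics, nothing Xen-specific):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define EXEC 0x100u                        /* some flag that isn't bit 0 */

struct u_field { uint32_t exec_only:1; };
struct b_field { bool     exec_only:1; };

int main(void)
{
    uint32_t a = EXEC;

    struct u_field u = { .exec_only = (a & EXEC) };  /* 0x100 truncated to 0 */
    struct b_field b = { .exec_only = (a & EXEC) };  /* converted to true: 1 */

    printf("uint32_t:1 = %d, bool:1 = %d\n", u.exec_only, b.exec_only);
    return 0;
}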

-- 
Thanks,
Sergey
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v1] vvmx: fix ept_sync() for nested p2m

2017-06-28 Thread Sergey Dyasli
If ept_sync_domain() is called for np2m, the following happens:

1. *np2m*::ept_data::invalidate cpumask is updated
2. IPIs are sent for CPUs in domain_dirty_cpumask forcing vmexits
3. vmx_vmenter_helper() checks *hostp2m*::ept_data::invalidate
   and does nothing

This is clearly a bug. Make ept_sync_domain() update hostp2m's
invalidate mask in the nested p2m case, and make vmx_vmenter_helper()
invalidate EPT translations for all EPTPs if nested virt is enabled.

Signed-off-by: Sergey Dyasli <sergey.dya...@citrix.com>
---
 xen/arch/x86/hvm/vmx/vmx.c | 5 -
 xen/arch/x86/mm/p2m-ept.c  | 9 +++--
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index c53b24955a..a8bb550720 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -4278,7 +4278,10 @@ void vmx_vmenter_helper(const struct cpu_user_regs *regs)
 if ( cpumask_test_cpu(cpu, ept->invalidate) )
 {
 cpumask_clear_cpu(cpu, ept->invalidate);
-__invept(INVEPT_SINGLE_CONTEXT, ept->eptp, 0);
+if ( nestedhvm_enabled(curr->domain) )
+__invept(INVEPT_ALL_CONTEXT, 0, 0);
+else
+__invept(INVEPT_SINGLE_CONTEXT, ept->eptp, 0);
 }
 }
 
diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index ecab56fbec..8d9da9203c 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -1153,8 +1153,13 @@ static void ept_sync_domain_prepare(struct p2m_domain *p2m)
 struct domain *d = p2m->domain;
struct ept_data *ept = &p2m->ept;
 
-if ( nestedhvm_enabled(d) && !p2m_is_nestedp2m(p2m) )
-p2m_flush_nestedp2m(d);
+if ( nestedhvm_enabled(d) )
+{
+if ( p2m_is_nestedp2m(p2m) )
+ept = &p2m_get_hostp2m(d)->ept;
+else
+p2m_flush_nestedp2m(d);
+}
 
 /*
  * Need to invalidate on all PCPUs because either:
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v1 4/6] vvmx: add hvm_max_vmx_msr_policy

2017-06-26 Thread Sergey Dyasli
Currently, when nested virt is enabled, the set of L1 VMX features
is fixed and calculated by nvmx_msr_read_intercept() as an intersection
between the full set of Xen's supported L1 VMX features, the set of
actual H/W features and, for MSR_IA32_VMX_EPT_VPID_CAP, the set of
features that Xen uses.

Add hvm_max_vmx_msr_policy object which represents the end result of
nvmx_msr_read_intercept() on current H/W.  Most of the code is moved
from nvmx_msr_read_intercept() to calculate_hvm_max_policy() which is
called only once during the startup.

There is no functional change to what L1 sees in VMX MSRs.
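
As a hedged sketch (simplified; the series itself uses the gen_vmx_msr()
macro shown further down rather than this helper): a VMX capability MSR
packs "must be 1" control bits in its low 32 bits and "may be 1" control
bits in its high 32 bits, so combining an emulated policy with the host
value means OR-ing the former and AND-ing the latter:

#include <stdint.h>

/* low 32 bits: controls that must be 1; high 32 bits: controls that may be 1 */
static uint64_t combine_vmx_ctls(uint64_t emulated, uint64_t host)
{
    uint32_t must_be_one = (uint32_t)emulated | (uint32_t)host;
    uint32_t may_be_one  = (uint32_t)(emulated >> 32) & (uint32_t)(host >> 32);

    return ((uint64_t)may_be_one << 32) | must_be_one;
}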

Signed-off-by: Sergey Dyasli <sergey.dya...@citrix.com>
---
 xen/arch/x86/hvm/vmx/vmcs.c |   3 +
 xen/arch/x86/hvm/vmx/vvmx.c | 297 +---
 2 files changed, 147 insertions(+), 153 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index dbf6eb7433..da6ddf52f1 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -244,6 +244,8 @@ static u32 adjust_vmx_controls(
 return ctl;
 }
 
+void calculate_hvm_max_policy(void);
+
 static int vmx_init_vmcs_config(void)
 {
 u32 min, opt;
@@ -463,6 +465,7 @@ static int vmx_init_vmcs_config(void)
 vmx_virt_exception = !!(_vmx_secondary_exec_control &
SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS);
 vmx_display_features();
+calculate_hvm_max_policy();
 
 /* IA-32 SDM Vol 3B: VMCS size is never greater than 4kB. */
 if ( raw_vmx_msr_policy.basic.vmcs_region_size > PAGE_SIZE )
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index 3560faec6d..657371ec69 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -1941,6 +1941,8 @@ int nvmx_handle_invvpid(struct cpu_user_regs *regs)
 return X86EMUL_OKAY;
 }
 
+struct vmx_msr_policy __read_mostly hvm_max_vmx_msr_policy;
+
 #define __emul_value(enable1, default1) \
 ((enable1 | default1) << 32 | (default1))
 
@@ -1948,6 +1950,134 @@ int nvmx_handle_invvpid(struct cpu_user_regs *regs)
 (((__emul_value(enable1, default1) & host_value) & (~0ul << 32)) | \
 ((uint32_t)(__emul_value(enable1, default1) | host_value)))
 
+void __init calculate_hvm_max_policy(void)
+{
+struct vmx_msr_policy *p = &hvm_max_vmx_msr_policy;
+uint64_t data, *msr;
+u32 default1_bits;
+
+*p = raw_vmx_msr_policy;
+
+/* XXX: vmcs_revision_id for nested virt */
+
+/* Pinbased controls 1-settings */
+data = PIN_BASED_EXT_INTR_MASK |
+   PIN_BASED_NMI_EXITING |
+   PIN_BASED_PREEMPT_TIMER;
+
+msr = &p->msr[MSR_IA32_VMX_PINBASED_CTLS - MSR_IA32_VMX_BASIC];
+*msr = gen_vmx_msr(data, VMX_PINBASED_CTLS_DEFAULT1, *msr);
+msr = &p->msr[MSR_IA32_VMX_TRUE_PINBASED_CTLS - MSR_IA32_VMX_BASIC];
+*msr = gen_vmx_msr(data, VMX_PINBASED_CTLS_DEFAULT1, *msr);
+
+/* Procbased controls 1-settings */
+default1_bits = VMX_PROCBASED_CTLS_DEFAULT1;
+data = CPU_BASED_HLT_EXITING |
+   CPU_BASED_VIRTUAL_INTR_PENDING |
+   CPU_BASED_CR8_LOAD_EXITING |
+   CPU_BASED_CR8_STORE_EXITING |
+   CPU_BASED_INVLPG_EXITING |
+   CPU_BASED_CR3_LOAD_EXITING |
+   CPU_BASED_CR3_STORE_EXITING |
+   CPU_BASED_MONITOR_EXITING |
+   CPU_BASED_MWAIT_EXITING |
+   CPU_BASED_MOV_DR_EXITING |
+   CPU_BASED_ACTIVATE_IO_BITMAP |
+   CPU_BASED_USE_TSC_OFFSETING |
+   CPU_BASED_UNCOND_IO_EXITING |
+   CPU_BASED_RDTSC_EXITING |
+   CPU_BASED_MONITOR_TRAP_FLAG |
+   CPU_BASED_VIRTUAL_NMI_PENDING |
+   CPU_BASED_ACTIVATE_MSR_BITMAP |
+   CPU_BASED_PAUSE_EXITING |
+   CPU_BASED_RDPMC_EXITING |
+   CPU_BASED_TPR_SHADOW |
+   CPU_BASED_ACTIVATE_SECONDARY_CONTROLS;
+
+msr = &p->msr[MSR_IA32_VMX_PROCBASED_CTLS - MSR_IA32_VMX_BASIC];
+*msr = gen_vmx_msr(data, default1_bits, *msr);
+
+default1_bits &= ~(CPU_BASED_CR3_LOAD_EXITING |
+   CPU_BASED_CR3_STORE_EXITING |
+   CPU_BASED_INVLPG_EXITING);
+
+msr = &p->msr[MSR_IA32_VMX_TRUE_PROCBASED_CTLS - MSR_IA32_VMX_BASIC];
+*msr = gen_vmx_msr(data, default1_bits, *msr);
+
+/* Procbased-2 controls 1-settings */
+data = SECONDARY_EXEC_DESCRIPTOR_TABLE_EXITING |
+   SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES |
+   SECONDARY_EXEC_ENABLE_VPID |
+   SECONDARY_EXEC_UNRESTRICTED_GUEST |
+   SECONDARY_EXEC_ENABLE_EPT;
+msr = &p->msr[MSR_IA32_VMX_PROCBASED_CTLS2 - MSR_IA32_VMX_BASIC];
+*msr = gen_vmx_msr(data, 0, *msr);
+
+/* Vmexit controls 1-settings */
+data = VM_EXIT_ACK_INTR_ON_EXIT |
+   VM_EXIT_IA32E_MODE |
+   VM_EXIT_SAVE_PREEMPT_TIMER |
+   VM_EXIT_SAVE_GUEST_PAT |
+   VM_EXIT_LOAD_HOST_PAT |
+   VM_EXIT_

[Xen-devel] [PATCH v1 2/6] vmx: add raw_vmx_msr_policy

2017-06-26 Thread Sergey Dyasli
Add calculate_raw_policy() which fills raw_vmx_msr_policy (the actual
contents of H/W VMX MSRs) on the boot CPU.  On secondary CPUs, this
function checks that the contents of the VMX MSRs match the boot CPU's.

Remove lesser version of same-contents-check from vmx_init_vmcs_config().

Signed-off-by: Sergey Dyasli <sergey.dya...@citrix.com>
---
 xen/arch/x86/hvm/vmx/vmcs.c| 130 +
 xen/arch/x86/hvm/vmx/vmx.c |   4 ++
 xen/include/asm-x86/hvm/vmx/vmcs.h |   2 +
 3 files changed, 79 insertions(+), 57 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index e6ea197230..00fbc0ccb8 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -144,6 +144,8 @@ static void __init vmx_display_features(void)
 printk(" - none\n");
 }
 
+struct vmx_msr_policy __read_mostly raw_vmx_msr_policy;
+
 bool vmx_msr_available(struct vmx_msr_policy *p, uint32_t msr)
 {
 if ( msr < MSR_IA32_VMX_BASIC || msr > MSR_IA32_VMX_VMFUNC )
@@ -152,6 +154,74 @@ bool vmx_msr_available(struct vmx_msr_policy *p, uint32_t msr)
 return p->available & (1u << (msr - MSR_IA32_VMX_BASIC));
 }
 
+int calculate_raw_policy(bool bsp)
+{
+struct vmx_msr_policy policy;
+struct vmx_msr_policy *p = &policy;
+int msr;
+
+/* Raw policy is filled only on boot CPU */
+if ( bsp )
+p = &raw_vmx_msr_policy;
+else
+memset(&policy, 0, sizeof(policy));
+
+p->available = 0x7ff;
+for ( msr = MSR_IA32_VMX_BASIC; msr <= MSR_IA32_VMX_VMCS_ENUM; msr++ )
+rdmsrl(msr, p->msr[msr - MSR_IA32_VMX_BASIC]);
+
+if ( p->basic.default1_zero )
+{
+p->available |= 0x1e000;
+for ( msr = MSR_IA32_VMX_TRUE_PINBASED_CTLS;
+  msr <= MSR_IA32_VMX_TRUE_ENTRY_CTLS; msr++ )
+rdmsrl(msr, p->msr[msr - MSR_IA32_VMX_BASIC]);
+}
+
+if ( p->procbased_ctls.allowed_1.activate_secondary_controls )
+{
+p->available |= 0x800;
+msr = MSR_IA32_VMX_PROCBASED_CTLS2;
+rdmsrl(msr, p->msr[msr - MSR_IA32_VMX_BASIC]);
+
+if ( p->procbased_ctls2.allowed_1.enable_ept ||
+ p->procbased_ctls2.allowed_1.enable_vpid )
+{
+p->available |= 0x1000;
+msr = MSR_IA32_VMX_EPT_VPID_CAP;
+rdmsrl(msr, p->msr[msr - MSR_IA32_VMX_BASIC]);
+}
+
+if ( p->procbased_ctls2.allowed_1.enable_vm_functions )
+{
+p->available |= 0x2;
+msr = MSR_IA32_VMX_VMFUNC;
+rdmsrl(msr, p->msr[msr - MSR_IA32_VMX_BASIC]);
+}
+}
+
+/* Check that secondary CPUs have exactly the same bits in VMX MSRs */
+if ( !bsp && memcmp(p, &raw_vmx_msr_policy, sizeof(*p)) != 0 )
+{
+for ( msr = MSR_IA32_VMX_BASIC; msr <= MSR_IA32_VMX_VMFUNC; msr++ )
+{
+if ( p->msr[msr - MSR_IA32_VMX_BASIC] !=
+ raw_vmx_msr_policy.msr[msr - MSR_IA32_VMX_BASIC] )
+{
+printk("VMX msr %#x: saw 0x%016"PRIx64" expected 0x%016"PRIx64
+"\n", msr, p->msr[msr - MSR_IA32_VMX_BASIC],
+raw_vmx_msr_policy.msr[msr - MSR_IA32_VMX_BASIC]);
+}
+}
+
+printk("VMX: Capabilities fatally differ between CPU%d and boot CPU\n",
+   smp_processor_id());
+return -EINVAL;
+}
+
+return 0;
+}
+
 static u32 adjust_vmx_controls(
 const char *name, u32 ctl_min, u32 ctl_opt, u32 msr, bool_t *mismatch)
 {
@@ -173,13 +243,6 @@ static u32 adjust_vmx_controls(
 return ctl;
 }
 
-static bool_t cap_check(const char *name, u32 expected, u32 saw)
-{
-if ( saw != expected )
-printk("VMX %s: saw %#x expected %#x\n", name, saw, expected);
-return saw != expected;
-}
-
 static int vmx_init_vmcs_config(void)
 {
 u32 vmx_basic_msr_low, vmx_basic_msr_high, min, opt;
@@ -412,56 +475,6 @@ static int vmx_init_vmcs_config(void)
 return -EINVAL;
 }
 }
-else
-{
-/* Globals are already initialised: re-check them. */
-mismatch |= cap_check(
-"VMCS revision ID",
-vmcs_revision_id, vmx_basic_msr_low & VMX_BASIC_REVISION_MASK);
-mismatch |= cap_check(
-"Pin-Based Exec Control",
-vmx_pin_based_exec_control, _vmx_pin_based_exec_control);
-mismatch |= cap_check(
-"CPU-Based Exec Control",
-vmx_cpu_based_exec_control, _vmx_cpu_based_exec_control);
-mismatch |= cap_check(
-"Secondary Exec Control",
-vmx_secondary_exec_control, _vmx_secondary_exec_control);
-mismatch |= cap_check(
-"VMExit Control",
-vmx_vmexit_control, _vmx_vmexit_control);
-mismat

[Xen-devel] [PATCH v1 3/6] vmx: refactor vmx_init_vmcs_config()

2017-06-26 Thread Sergey Dyasli
1. Remove RDMSRs of VMX MSRs since all values are already available in
   raw_vmx_msr_policy.
2. Replace bit operations involving VMX bitmasks with accessing VMX
   features by name and using vmx_msr_available() where appropriate.

Signed-off-by: Sergey Dyasli <sergey.dya...@citrix.com>
---
 xen/arch/x86/hvm/vmx/vmcs.c | 56 +
 1 file changed, 26 insertions(+), 30 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 00fbc0ccb8..dbf6eb7433 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -227,7 +227,8 @@ static u32 adjust_vmx_controls(
 {
 u32 vmx_msr_low, vmx_msr_high, ctl = ctl_min | ctl_opt;
 
-rdmsr(msr, vmx_msr_low, vmx_msr_high);
+vmx_msr_low = raw_vmx_msr_policy.msr[msr - MSR_IA32_VMX_BASIC];
+vmx_msr_high = raw_vmx_msr_policy.msr[msr - MSR_IA32_VMX_BASIC] >> 32;
 
 ctl &= vmx_msr_high; /* bit == 0 in high word ==> must be zero */
 ctl |= vmx_msr_low;  /* bit == 1 in low word  ==> must be one  */
@@ -245,19 +246,16 @@ static u32 adjust_vmx_controls(
 
 static int vmx_init_vmcs_config(void)
 {
-u32 vmx_basic_msr_low, vmx_basic_msr_high, min, opt;
+u32 min, opt;
 u32 _vmx_pin_based_exec_control;
 u32 _vmx_cpu_based_exec_control;
 u32 _vmx_secondary_exec_control = 0;
 u64 _vmx_ept_vpid_cap = 0;
-u64 _vmx_misc_cap = 0;
 u32 _vmx_vmexit_control;
 u32 _vmx_vmentry_control;
 u64 _vmx_vmfunc = 0;
 bool_t mismatch = 0;
 
-rdmsr(MSR_IA32_VMX_BASIC, vmx_basic_msr_low, vmx_basic_msr_high);
-
 min = (PIN_BASED_EXT_INTR_MASK |
PIN_BASED_NMI_EXITING);
 opt = (PIN_BASED_VIRTUAL_NMIS |
@@ -291,7 +289,7 @@ static int vmx_init_vmcs_config(void)
 _vmx_cpu_based_exec_control &=
 ~(CPU_BASED_CR8_LOAD_EXITING | CPU_BASED_CR8_STORE_EXITING);
 
-if ( _vmx_cpu_based_exec_control & CPU_BASED_ACTIVATE_SECONDARY_CONTROLS )
+if ( vmx_msr_available(&raw_vmx_msr_policy, MSR_IA32_VMX_PROCBASED_CTLS2) )
 {
 min = 0;
 opt = (SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES |
@@ -305,8 +303,7 @@ static int vmx_init_vmcs_config(void)
SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS |
SECONDARY_EXEC_XSAVES |
SECONDARY_EXEC_TSC_SCALING);
-rdmsrl(MSR_IA32_VMX_MISC, _vmx_misc_cap);
-if ( _vmx_misc_cap & VMX_MISC_VMWRITE_ALL )
+if ( raw_vmx_msr_policy.misc.vmwrite_all )
 opt |= SECONDARY_EXEC_ENABLE_VMCS_SHADOWING;
 if ( opt_vpid_enabled )
 opt |= SECONDARY_EXEC_ENABLE_VPID;
@@ -331,10 +328,9 @@ static int vmx_init_vmcs_config(void)
 }
 
 /* The IA32_VMX_EPT_VPID_CAP MSR exists only when EPT or VPID available */
-if ( _vmx_secondary_exec_control & (SECONDARY_EXEC_ENABLE_EPT |
-SECONDARY_EXEC_ENABLE_VPID) )
+if ( vmx_msr_available(&raw_vmx_msr_policy, MSR_IA32_VMX_EPT_VPID_CAP) )
 {
-rdmsrl(MSR_IA32_VMX_EPT_VPID_CAP, _vmx_ept_vpid_cap);
+_vmx_ept_vpid_cap = raw_vmx_msr_policy.ept_vpid_cap.raw;
 
 if ( !opt_ept_ad )
 _vmx_ept_vpid_cap &= ~VMX_EPT_AD_BIT;
@@ -379,10 +375,14 @@ static int vmx_init_vmcs_config(void)
  * To use EPT we expect to be able to clear certain intercepts.
  * We check VMX_BASIC_MSR[55] to correctly handle default controls.
  */
-uint32_t must_be_one, must_be_zero, msr = MSR_IA32_VMX_PROCBASED_CTLS;
-if ( vmx_basic_msr_high & (VMX_BASIC_DEFAULT1_ZERO >> 32) )
-msr = MSR_IA32_VMX_TRUE_PROCBASED_CTLS;
-rdmsr(msr, must_be_one, must_be_zero);
+uint32_t must_be_one = raw_vmx_msr_policy.procbased_ctls.allowed_0.raw;
+uint32_t must_be_zero = raw_vmx_msr_policy.procbased_ctls.allowed_1.raw;
+if ( vmx_msr_available(&raw_vmx_msr_policy,
+   MSR_IA32_VMX_TRUE_PROCBASED_CTLS) )
+{
+must_be_one = raw_vmx_msr_policy.true_procbased_ctls.allowed_0.raw;
+must_be_zero = raw_vmx_msr_policy.true_procbased_ctls.allowed_1.raw;
+}
 if ( must_be_one & (CPU_BASED_INVLPG_EXITING |
 CPU_BASED_CR3_LOAD_EXITING |
 CPU_BASED_CR3_STORE_EXITING) )
@@ -423,9 +423,9 @@ static int vmx_init_vmcs_config(void)
 _vmx_pin_based_exec_control  &= ~ PIN_BASED_POSTED_INTERRUPT;
 
 /* The IA32_VMX_VMFUNC MSR exists only when VMFUNC is available */
-if ( _vmx_secondary_exec_control & SECONDARY_EXEC_ENABLE_VM_FUNCTIONS )
+if ( vmx_msr_available(&raw_vmx_msr_policy, MSR_IA32_VMX_VMFUNC) )
 {
-rdmsrl(MSR_IA32_VMX_VMFUNC, _vmx_vmfunc);
+_vmx_vmfunc = raw_vmx_msr_policy.vmfunc.raw;
 
 /*
  * VMFUNC leaf 0 (EPTP switching) must be supported.
@@ -451,33 +451,31 @@ static int vmx_init_vmcs_config(void)
 if ( !vm

[Xen-devel] [DEBUG PATCH 6/6] vmx: print H/W VMX MSRs values during startup

2017-06-26 Thread Sergey Dyasli
This is a debug patch I used when developing this series.
It's not intended for merging; I'm posting it because it might be useful
to someone.

Signed-off-by: Sergey Dyasli <sergey.dya...@citrix.com>
---
 xen/arch/x86/hvm/vmx/vmcs.c | 405 
 1 file changed, 405 insertions(+)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index da6ddf52f1..b142f29560 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -154,6 +154,408 @@ bool vmx_msr_available(struct vmx_msr_policy *p, uint32_t msr)
 return p->available & (1u << (msr - MSR_IA32_VMX_BASIC));
 }
 
+static char *vmx_msr_bit_status(u32 mask, u32 all_0, u32 all_1)
+{
+if ( (all_0 & mask) && (all_1 & mask) )
+return "1";
+if ( !(all_0 & mask) && !(all_1 & mask) )
+return "0";
+
+return "0/1";
+}
+
+static char *btoa(uint32_t val)
+{
+return val ? "yes" : "no";
+}
+
+static void print_vmx_basic_msr(struct vmx_msr_policy *p)
+{
+printk("%-33s %#018lx\n", "MSR_IA32_VMX_BASIC",  p->basic.raw);
+printk("  %-31s %#x\n", "VMCS revision:", p->basic.vmcs_revision_id);
+printk("  %-31s %d\n", "VMCS/VMXON region size:",
+   p->basic.vmcs_region_size);
+printk("  %-31s %s\n", "32-bit phys addr limit:",
+   btoa(p->basic.addresses_32bit));
+printk("  %-31s %s\n", "Dual monitor mode:", btoa(p->basic.dual_monitor));
+printk("  %-31s %d ", "VMCS memory type:", p->basic.memory_type);
+switch ( p->basic.memory_type )
+{
+case MTRR_TYPE_UNCACHABLE:
+printk("(Uncacheable)\n");
+break;
+case MTRR_TYPE_WRBACK:
+printk("(Write Back)\n");
+break;
+default:
+printk("(Unrecognized)\n");
+break;
+}
+printk("  %-31s %s\n", "Report INS/OUTS VM exits:",
+   btoa(p->basic.ins_out_info));
+printk("  %-31s %s\n", "Default1 CTLS clearable:",
+   btoa(p->basic.default1_zero));
+}
+
+static void print_vmx_msr_pinbased_bits(u32 all_0, u32 all_1)
+{
+printk("  %-31s %s\n", "External-interrupt exiting:",
+   vmx_msr_bit_status(PIN_BASED_EXT_INTR_MASK, all_0, all_1));
+printk("  %-31s %s\n", "NMI exiting:",
+   vmx_msr_bit_status(PIN_BASED_NMI_EXITING, all_0, all_1));
+printk("  %-31s %s\n", "Virtual NMIs:",
+   vmx_msr_bit_status(PIN_BASED_VIRTUAL_NMIS, all_0, all_1));
+printk("  %-31s %s\n", "VMX-preemption timer:",
+   vmx_msr_bit_status(PIN_BASED_PREEMPT_TIMER, all_0, all_1));
+printk("  %-31s %s\n", "Posted interrupts:",
+   vmx_msr_bit_status(PIN_BASED_POSTED_INTERRUPT, all_0, all_1));
+}
+
+static void print_vmx_msr_procbased_bits(u32 all_0, u32 all_1)
+{
+printk("  %-31s %s\n", "Interrupt-window exiting:",
+   vmx_msr_bit_status(CPU_BASED_VIRTUAL_INTR_PENDING, all_0, all_1));
+printk("  %-31s %s\n", "Use TSC offsetting:",
+   vmx_msr_bit_status(CPU_BASED_USE_TSC_OFFSETING, all_0, all_1));
+printk("  %-31s %s\n", "HLT exiting:",
+   vmx_msr_bit_status(CPU_BASED_HLT_EXITING, all_0, all_1));
+printk("  %-31s %s\n", "INVLPG exiting:",
+   vmx_msr_bit_status(CPU_BASED_INVLPG_EXITING, all_0, all_1));
+printk("  %-31s %s\n", "MWAIT exiting:",
+   vmx_msr_bit_status(CPU_BASED_MWAIT_EXITING, all_0, all_1));
+printk("  %-31s %s\n", "RDPMC exiting:",
+   vmx_msr_bit_status(CPU_BASED_RDPMC_EXITING, all_0, all_1));
+printk("  %-31s %s\n", "RDTSC exiting:",
+   vmx_msr_bit_status(CPU_BASED_RDTSC_EXITING, all_0, all_1));
+printk("  %-31s %s\n", "CR3-load exiting:",
+   vmx_msr_bit_status(CPU_BASED_CR3_LOAD_EXITING, all_0, all_1));
+printk("  %-31s %s\n", "CR3-store exiting:",
+   vmx_msr_bit_status(CPU_BASED_CR3_STORE_EXITING, all_0, all_1));
+printk("  %-31s %s\n", "CR8-load exiting:",
+   vmx_msr_bit_status(CPU_BASED_CR8_LOAD_EXITING, all_0, all_1));
+printk("  %-31s %s\n", "CR8-store exiting:",
+   vmx_msr_bit_status(CPU_BASED_CR8_STORE_EXITING, all_0, all_1));
+printk("  %-31s %s\n", "Use TPR shadow:",
+   vmx_msr_bit_status(CPU_BASED_TPR_SHADOW, all_0, all_1));
+printk("  %-31s %s\n", "NMI-window exiting:",
+   vmx_msr_bit_sta

[Xen-devel] [PATCH v1 5/6] vvmx: add per domain vmx msr policy

2017-06-26 Thread Sergey Dyasli
Having a per-domain policy makes it possible to sensibly query which VMX
features the domain has, which unblocks some other nested virt work items.

For now, make the policy for each domain equal to hvm_max_vmx_msr_policy.
In the future it should be possible to configure the policy for each
domain independently.

Signed-off-by: Sergey Dyasli <sergey.dya...@citrix.com>
---
 xen/arch/x86/domain.c  |  6 ++
 xen/arch/x86/hvm/vmx/vvmx.c| 14 +-
 xen/include/asm-x86/domain.h   |  2 ++
 xen/include/asm-x86/hvm/vmx/vvmx.h |  3 +++
 4 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 49388f48d7..2a3518328e 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -419,6 +419,7 @@ int arch_domain_create(struct domain *d, unsigned int 
domcr_flags,
 {
 d->arch.emulation_flags = 0;
 d->arch.cpuid = ZERO_BLOCK_PTR; /* Catch stray misuses. */
+d->arch.vmx_msr = ZERO_BLOCK_PTR;
 }
 else
 {
@@ -464,6 +465,9 @@ int arch_domain_create(struct domain *d, unsigned int 
domcr_flags,
 if ( (rc = init_domain_cpuid_policy(d)) )
 goto fail;
 
+if ( (rc = init_domain_vmx_msr_policy(d)) )
+goto fail;
+
 d->arch.ioport_caps = 
 rangeset_new(d, "I/O Ports", RANGESETF_prettyprint_hex);
 rc = -ENOMEM;
@@ -535,6 +539,7 @@ int arch_domain_create(struct domain *d, unsigned int 
domcr_flags,
 cleanup_domain_irq_mapping(d);
 free_xenheap_page(d->shared_info);
 xfree(d->arch.cpuid);
+xfree(d->arch.vmx_msr);
 if ( paging_initialised )
 paging_final_teardown(d);
 free_perdomain_mappings(d);
@@ -549,6 +554,7 @@ void arch_domain_destroy(struct domain *d)
 
 xfree(d->arch.e820);
 xfree(d->arch.cpuid);
+xfree(d->arch.vmx_msr);
 
 free_domain_pirqs(d);
 if ( !is_idle_domain(d) )
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index 657371ec69..ae24dc4680 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -2078,6 +2078,18 @@ void __init calculate_hvm_max_policy(void)
 p->available &= ~0x2;
 }
 
+int init_domain_vmx_msr_policy(struct domain *d)
+{
+d->arch.vmx_msr = xmalloc(struct vmx_msr_policy);
+
+if ( !d->arch.vmx_msr )
+return -ENOMEM;
+
+*d->arch.vmx_msr = hvm_max_vmx_msr_policy;
+
+return 0;
+}
+
 /*
  * Capability reporting
  */
@@ -2085,7 +2097,7 @@ int nvmx_msr_read_intercept(unsigned int msr, u64 *msr_content)
 {
 struct vcpu *v = current;
 struct domain *d = v->domain;
-struct vmx_msr_policy *p = &hvm_max_vmx_msr_policy;
+struct vmx_msr_policy *p = d->arch.vmx_msr;
 u64 data;
 int r = 1;
 
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index 924caac834..3cb753e46b 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -359,6 +359,8 @@ struct arch_domain
 /* CPUID Policy. */
 struct cpuid_policy *cpuid;
 
+struct vmx_msr_policy *vmx_msr;
+
 struct PITState vpit;
 
 /* TSC management (emulation, pv, scaling, stats) */
diff --git a/xen/include/asm-x86/hvm/vmx/vvmx.h b/xen/include/asm-x86/hvm/vmx/vvmx.h
index ca2fb2535c..627112bea8 100644
--- a/xen/include/asm-x86/hvm/vmx/vvmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vvmx.h
@@ -246,5 +246,8 @@ int nept_translate_l2ga(struct vcpu *v, paddr_t l2ga,
 uint64_t *exit_qual, uint32_t *exit_reason);
 int nvmx_cpu_up_prepare(unsigned int cpu);
 void nvmx_cpu_dead(unsigned int cpu);
+
+int init_domain_vmx_msr_policy(struct domain *d);
+
 #endif /* __ASM_X86_HVM_VVMX_H__ */
 
-- 
2.11.0




[Xen-devel] [PATCH v1 0/6] VMX MSRs policy for Nested Virt: part 1

2017-06-26 Thread Sergey Dyasli
The end goal of having a VMX MSR policy is to be able to manage
L1 VMX features.  This patch series is the first part of that work.
There is no functional change to what L1 sees in VMX MSRs at this
point, but each domain will have a policy object which makes it possible
to sensibly query which VMX features the domain has.  This will unblock
some other nested virt work items.

Currently, when nested virt is enabled, the set of L1 VMX features
is fixed and calculated by nvmx_msr_read_intercept() as an intersection
between the full set of Xen's supported L1 VMX features, the set of
actual H/W features and, for MSR_IA32_VMX_EPT_VPID_CAP, the set of
features that Xen uses.
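
In code terms, the current behaviour amounts to roughly the sketch below
(all identifiers are invented for illustration; the real logic lives in
nvmx_msr_read_intercept()):

/*
 * Conceptual sketch only -- not actual Xen code.  The EPT/VPID capabilities
 * reported to L1 are the intersection of what Xen can virtualise, what the
 * hardware offers, and what Xen itself uses.
 */
static uint64_t calc_l1_ept_vpid_cap(uint64_t xen_nested_caps,
                                     uint64_t host_caps,
                                     uint64_t xen_used_caps)
{
    return xen_nested_caps & host_caps & xen_used_caps;
}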

The above makes the L1 VMX feature set inconsistent across different H/W,
and there is no way to control which features are available to L1.
The overall set of issues has much in common with CPUID policy.

Part 1 introduces struct vmx_msr_policy and the following instances:

* Raw policy (raw_vmx_msr_policy) -- the actual contents of H/W VMX MSRs
* HVM max policy (hvm_max_vmx_msr_policy) -- the end result of
   nvmx_msr_read_intercept() on current H/W
* Per-domain policy (d->arch.vmx_msr) -- the copy of HVM max policy
 (for now)

There is no "Host policy" because Xen already has a set of variables
(vmx_pin_based_exec_control and others) which represent the set of
VMX features that Xen uses.  There are features that Xen doesn't use
(e.g. CPU_BASED_PAUSE_EXITING) but which are still available to L1.  This
makes it not worthwhile to introduce a "Host policy" at this stage.

Sergey Dyasli (6):
  vmx: add struct vmx_msr_policy
  vmx: add raw_vmx_msr_policy
  vmx: refactor vmx_init_vmcs_config()
  vvmx: add hvm_max_vmx_msr_policy
  vvmx: add per domain vmx msr policy
  vmx: print H/W VMX MSRs values during startup

 xen/arch/x86/domain.c  |   6 +
 xen/arch/x86/hvm/vmx/vmcs.c| 639 -
 xen/arch/x86/hvm/vmx/vmx.c |   4 +
 xen/arch/x86/hvm/vmx/vvmx.c| 309 +-
 xen/include/asm-x86/domain.h   |   2 +
 xen/include/asm-x86/hvm/vmx/vmcs.h | 346 
 xen/include/asm-x86/hvm/vmx/vvmx.h |   3 +
 7 files changed, 1070 insertions(+), 239 deletions(-)

-- 
2.11.0




[Xen-devel] [PATCH v1 1/6] vmx: add struct vmx_msr_policy

2017-06-26 Thread Sergey Dyasli
This structure provides a convenient way of accessing the contents of
VMX MSRs: every bit value is accessible by its name.  Bit names match
Xen's existing definitions as closely as possible.

The structure also contains the bitmap of available MSRs since not all
of them may be available on a particular H/W.
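
As a hedged illustration of the "by name" access (the union below is the one
added by this patch; the reading of the MSR and the messages are made up for
the example -- the allowed-1 settings live in the high 32 bits of the MSR):

/* Sketch only: decode the allowed-1 half of the pin-based controls MSR
 * using the bit-field names instead of open-coded masks. */
static void __init report_pinbased_caps(void)
{
    uint64_t val;
    union vmx_pin_based_exec_control_bits allowed_1;

    rdmsrl(MSR_IA32_VMX_PINBASED_CTLS, val);
    allowed_1.raw = val >> 32;

    if ( allowed_1.preempt_timer )
        printk("VMX preemption timer can be enabled\n");
    if ( allowed_1.posted_interrupt )
        printk("Posted interrupts can be enabled\n");
}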

Signed-off-by: Sergey Dyasli <sergey.dya...@citrix.com>
---
 xen/arch/x86/hvm/vmx/vmcs.c|  47 +
 xen/include/asm-x86/hvm/vmx/vmcs.h | 344 +
 2 files changed, 391 insertions(+)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 8103b20d29..e6ea197230 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -144,6 +144,14 @@ static void __init vmx_display_features(void)
 printk(" - none\n");
 }
 
+bool vmx_msr_available(struct vmx_msr_policy *p, uint32_t msr)
+{
+if ( msr < MSR_IA32_VMX_BASIC || msr > MSR_IA32_VMX_VMFUNC )
+return 0;
+
+return p->available & (1u << (msr - MSR_IA32_VMX_BASIC));
+}
+
 static u32 adjust_vmx_controls(
 const char *name, u32 ctl_min, u32 ctl_opt, u32 msr, bool_t *mismatch)
 {
@@ -1956,6 +1964,45 @@ void __init setup_vmcs_dump(void)
 register_keyhandler('v', vmcs_dump, "dump VT-x VMCSs", 1);
 }
 
+static void __init __maybe_unused build_assertions(void)
+{
+BUILD_BUG_ON(sizeof(raw_vmx_msr_policy.basic) !=
+ sizeof(raw_vmx_msr_policy.basic.raw));
+BUILD_BUG_ON(sizeof(raw_vmx_msr_policy.pinbased_ctls) !=
+ sizeof(raw_vmx_msr_policy.pinbased_ctls.raw));
+BUILD_BUG_ON(sizeof(raw_vmx_msr_policy.procbased_ctls) !=
+ sizeof(raw_vmx_msr_policy.procbased_ctls.raw));
+BUILD_BUG_ON(sizeof(raw_vmx_msr_policy.exit_ctls) !=
+ sizeof(raw_vmx_msr_policy.exit_ctls.raw));
+BUILD_BUG_ON(sizeof(raw_vmx_msr_policy.entry_ctls) !=
+ sizeof(raw_vmx_msr_policy.entry_ctls.raw));
+BUILD_BUG_ON(sizeof(raw_vmx_msr_policy.misc) !=
+ sizeof(raw_vmx_msr_policy.misc.raw));
+BUILD_BUG_ON(sizeof(raw_vmx_msr_policy.cr0_fixed_0) !=
+ sizeof(raw_vmx_msr_policy.cr0_fixed_0.raw));
+BUILD_BUG_ON(sizeof(raw_vmx_msr_policy.cr0_fixed_1) !=
+ sizeof(raw_vmx_msr_policy.cr0_fixed_1.raw));
+BUILD_BUG_ON(sizeof(raw_vmx_msr_policy.cr4_fixed_0) !=
+ sizeof(raw_vmx_msr_policy.cr4_fixed_0.raw));
+BUILD_BUG_ON(sizeof(raw_vmx_msr_policy.cr4_fixed_1) !=
+ sizeof(raw_vmx_msr_policy.cr4_fixed_1.raw));
+BUILD_BUG_ON(sizeof(raw_vmx_msr_policy.vmcs_enum) !=
+ sizeof(raw_vmx_msr_policy.vmcs_enum.raw));
+BUILD_BUG_ON(sizeof(raw_vmx_msr_policy.procbased_ctls2) !=
+ sizeof(raw_vmx_msr_policy.procbased_ctls2.raw));
+BUILD_BUG_ON(sizeof(raw_vmx_msr_policy.ept_vpid_cap) !=
+ sizeof(raw_vmx_msr_policy.ept_vpid_cap.raw));
+BUILD_BUG_ON(sizeof(raw_vmx_msr_policy.true_pinbased_ctls) !=
+ sizeof(raw_vmx_msr_policy.true_pinbased_ctls.raw));
+BUILD_BUG_ON(sizeof(raw_vmx_msr_policy.true_procbased_ctls) !=
+ sizeof(raw_vmx_msr_policy.true_procbased_ctls.raw));
+BUILD_BUG_ON(sizeof(raw_vmx_msr_policy.true_exit_ctls) !=
+ sizeof(raw_vmx_msr_policy.true_exit_ctls.raw));
+BUILD_BUG_ON(sizeof(raw_vmx_msr_policy.true_entry_ctls) !=
+ sizeof(raw_vmx_msr_policy.true_entry_ctls.raw));
+BUILD_BUG_ON(sizeof(raw_vmx_msr_policy.vmfunc) !=
+ sizeof(raw_vmx_msr_policy.vmfunc.raw));
+}
 
 /*
  * Local variables:
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index e3cdfdf576..fca1e62e4c 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -562,6 +562,350 @@ void vmx_domain_flush_pml_buffers(struct domain *d);
 
 void vmx_domain_update_eptp(struct domain *d);
 
+union vmx_pin_based_exec_control_bits {
+uint32_t raw;
+struct {
+bool ext_intr_exiting:1;
+uint32_t :2;  /* 1:2 reserved */
+bool  nmi_exiting:1;
+uint32_t :1;  /* 4 reserved */
+bool virtual_nmis:1;
+boolpreempt_timer:1;
+bool posted_interrupt:1;
+uint32_t :24; /* 8:31 reserved */
+};
+};
+
+union vmx_cpu_based_exec_control_bits {
+uint32_t raw;
+struct {
+uint32_t:2;  /* 0:1 reserved */
+boolvirtual_intr_pending:1;
+bool   use_tsc_offseting:1;
+uint32_t:3;  /* 4:6 reserved */
+bool hlt_exiting:1;
+uint32_t:1;  /* 8 reserved */
+bool  invlpg_exiting:1;
+bool   mwait_exiting:1;
+bool   rdpmc_exiting:1;
+bool   rdtsc_exiting:1;
+uint32_t 

[Xen-devel] [PATCH v2] xen: fix HYPERVISOR_dm_op() prototype

2017-06-07 Thread Sergey Dyasli
Change the third parameter to be the required struct xen_dm_op_buf *
instead of a generic void * (which blindly accepts any pointer).
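
For illustration only (the caller below is hypothetical; the 'h' and 'size'
members are those of the public dm_op interface): the typed prototype turns
a misuse that previously compiled silently into a compile error.

static int example_send_dm_op(domid_t domid, void *op, size_t op_size)
{
    struct xen_dm_op_buf buf;

    set_xen_guest_handle(buf.h, op);
    buf.size = op_size;

    return HYPERVISOR_dm_op(domid, 1, &buf);   /* typed, compiler-checked */
    /* HYPERVISOR_dm_op(domid, 1, op) would now be rejected at build time. */
}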

Signed-off-by: Sergey Dyasli <sergey.dya...@citrix.com>
---
v1 --> v2:
- Replaced "#include " with
  forward declaration of struct xen_dm_op_buf

 arch/x86/include/asm/xen/hypercall.h | 4 +++-
 include/xen/arm/hypercall.h  | 5 -
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/xen/hypercall.h b/arch/x86/include/asm/xen/hypercall.h
index f6d20f6cca12..7a4db5fefd15 100644
--- a/arch/x86/include/asm/xen/hypercall.h
+++ b/arch/x86/include/asm/xen/hypercall.h
@@ -50,6 +50,8 @@
 #include 
 #include 
 
+struct xen_dm_op_buf;
+
 /*
  * The hypercall asms have to meet several constraints:
  * - Work on 32- and 64-bit.
@@ -474,7 +476,7 @@ HYPERVISOR_xenpmu_op(unsigned int op, void *arg)
 
 static inline int
 HYPERVISOR_dm_op(
-   domid_t dom, unsigned int nr_bufs, void *bufs)
+   domid_t dom, unsigned int nr_bufs, struct xen_dm_op_buf *bufs)
 {
return _hypercall3(int, dm_op, dom, nr_bufs, bufs);
 }
diff --git a/include/xen/arm/hypercall.h b/include/xen/arm/hypercall.h
index 73db4b2eeb89..b40485e54d80 100644
--- a/include/xen/arm/hypercall.h
+++ b/include/xen/arm/hypercall.h
@@ -39,6 +39,8 @@
 #include 
 #include 
 
+struct xen_dm_op_buf;
+
 long privcmd_call(unsigned call, unsigned long a1,
unsigned long a2, unsigned long a3,
unsigned long a4, unsigned long a5);
@@ -53,7 +55,8 @@ int HYPERVISOR_physdev_op(int cmd, void *arg);
 int HYPERVISOR_vcpu_op(int cmd, int vcpuid, void *extra_args);
 int HYPERVISOR_tmem_op(void *arg);
 int HYPERVISOR_vm_assist(unsigned int cmd, unsigned int type);
-int HYPERVISOR_dm_op(domid_t domid, unsigned int nr_bufs, void *bufs);
+int HYPERVISOR_dm_op(domid_t domid, unsigned int nr_bufs,
+struct xen_dm_op_buf *bufs);
 int HYPERVISOR_platform_op_raw(void *arg);
 static inline int HYPERVISOR_platform_op(struct xen_platform_op *op)
 {
-- 
2.11.0




Re: [Xen-devel] [PATCH v1] xen: fix HYPERVISOR_dm_op() prototype

2017-06-06 Thread Sergey Dyasli
On Tue, 2017-06-06 at 02:03 -0600, Jan Beulich wrote:
> > > > On 05.06.17 at 10:41,  wrote:
> > 
> > --- a/arch/x86/include/asm/xen/hypercall.h
> > +++ b/arch/x86/include/asm/xen/hypercall.h
> > @@ -49,6 +49,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> 
> Why?
> 
> > @@ -474,7 +475,7 @@ HYPERVISOR_xenpmu_op(unsigned int op, void *arg)
> >  
> >  static inline int
> >  HYPERVISOR_dm_op(
> > -   domid_t dom, unsigned int nr_bufs, void *bufs)
> > +   domid_t dom, unsigned int nr_bufs, struct xen_dm_op_buf *bufs)
> 
> All you need above here is a forward declaration of the structure.
> We should really avoid forcing source files to include all sorts of
> headers without actually needing anything from them.

Thank you for the good suggestion. I will fix this in v2.

> 
> > --- a/include/xen/arm/hypercall.h
> > +++ b/include/xen/arm/hypercall.h
> > @@ -38,6 +38,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> 
> Same here.
> 
> > @@ -53,7 +54,8 @@ int HYPERVISOR_physdev_op(int cmd, void *arg);
> >  int HYPERVISOR_vcpu_op(int cmd, int vcpuid, void *extra_args);
> >  int HYPERVISOR_tmem_op(void *arg);
> >  int HYPERVISOR_vm_assist(unsigned int cmd, unsigned int type);
> > -int HYPERVISOR_dm_op(domid_t domid, unsigned int nr_bufs, void *bufs);
> > +int HYPERVISOR_dm_op(domid_t domid, unsigned int nr_bufs,
> > +struct xen_dm_op_buf *bufs);
> 
> How come you get away with changing a declaration without
> also changing the matching definition?

The definition of HYPERVISOR_dm_op() is in arch/arm/xen/hypercall.S
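
On the forward-declaration point discussed above, a standalone illustration
(nothing Xen-specific; the function name is made up): a pointer to an
incomplete type is all a prototype needs.

struct xen_dm_op_buf;                               /* forward declaration */

int consumes_bufs(unsigned int nr_bufs, struct xen_dm_op_buf *bufs);

/* Only a translation unit that dereferences bufs or needs
 * sizeof(struct xen_dm_op_buf) has to include the full definition. */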

-- 
Thanks,
Sergey

