>>> On 08.02.17 at 16:32, <jbeul...@suse.com> wrote:
>>>> On 07.02.17 at 18:26, <anshul.mak...@citrix.com> wrote:
>> I'm facing an issue where a bootstorm of guests leads to a host crash. I
>> debugged it and found that enabling PML introduces a race condition
>> during the guest teardown stage, between disabling PML on one vcpu and a
>> context switch happening for another vcpu.
>>
>> The crash happens only on Broadwell processors, as PML was introduced
>> with that generation.
>>
>> Here is my analysis:
>>
>> Race condition:
>>
>> vmcs.c: vmx_vcpu_disable_pml(vcpu) { vmx_vmcs_enter(); __vmwrite(<disable
>> PML>); vmx_vmcs_exit(); }
>>
>> In between, there is a call path from another pcpu executing a context
>> switch -> vmx_fpu_leave(), and the crash happens on the vmwrite:
>>
>> if ( !(v->arch.hvm_vmx.host_cr0 & X86_CR0_TS) )
>> {
>>     v->arch.hvm_vmx.host_cr0 |= X86_CR0_TS;
>>     __vmwrite(HOST_CR0, v->arch.hvm_vmx.host_cr0); // crash
>> }
>
> So that's after current has changed already, so it's effectively
> dealing with a foreign VMCS, but it doesn't use vmx_vmcs_enter().
> The locking done in vmx_vmcs_try_enter() / vmx_vmcs_exit(),
> however, assumes that any user of a VMCS either owns the lock
> or has current as the owner of the VMCS. Of course such a call
> also can't be added here, as a vcpu on the context-switch-from
> path can't vcpu_pause() itself.
>
> That, taken together with the bypassing of __context_switch()
> when the incoming vCPU is the idle one (which means that, via
> context_saved(), ->is_running will be cleared before
> ->ctxt_switch_from() is run), means the vcpu_pause() invocation in
> vmx_vmcs_try_enter() may not have to wait at all if the call
> happens between the clearing of ->is_running and the
> eventual invocation of vmx_ctxt_switch_from().
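
To make that window a little more concrete, below is a minimal,
Xen-independent sketch of the interleaving. All identifiers in it are
invented purely for illustration (none of this is Xen code); the main
thread stands for the pCPU switching away from the vCPU, the second
thread for the remote pCPU inside vmx_vmcs_enter():

/*
 * Minimal, Xen-independent sketch of the window described above.  All
 * names are made up for illustration.  The main thread models the pCPU
 * switching away from the vCPU: it clears "is_running" (context_saved())
 * before the deferred ctxt_switch_from() work that still touches the
 * "VMCS" runs.  The second thread models the remote pCPU in
 * vmx_vmcs_enter(): once is_running is clear it has no reason to wait
 * and takes the VMCS for itself.  The failing assert stands in for the
 * __vmwrite() fault.
 */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool is_running = true;   /* models v->is_running */
static atomic_int vmcs_owner = 0;       /* which "pCPU" has the VMCS loaded */

static void *remote_pcpu(void *arg)     /* models the vmx_vmcs_enter() caller */
{
    (void)arg;
    while (atomic_load(&is_running))    /* vcpu_sleep_sync() has nothing   */
        ;                               /* left to wait for once this clears */
    atomic_store(&vmcs_owner, 1);       /* "vmx_load_vmcs()" on the remote pCPU */
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, remote_pcpu, NULL);

    /* context_saved(): the vCPU is no longer marked running ... */
    atomic_store(&is_running, false);

    /* ... but the deferred "ctxt_switch_from()" still writes the VMCS later. */
    pthread_join(t, NULL);                  /* force the bad interleaving */
    assert(atomic_load(&vmcs_owner) == 0);  /* fails: the VMCS was taken away */
    return 0;
}

The only point being made is that once ->is_running is observed clear,
the remote side has no reason left to wait, so the deferred VMCS access
on the switching-out path can hit a VMCS which has meanwhile been taken
elsewhere.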
Mind giving the attached patch a try? Admittedly it has only been lightly
tested so far - in particular, I haven't seen the second of the debug
messages being logged yet.

Jan
TODO: remove //temp-s

--- unstable.orig/xen/arch/x86/hvm/vmx/vmcs.c	2016-12-20 11:21:35.000000000 +0100
+++ unstable/xen/arch/x86/hvm/vmx/vmcs.c	2017-02-09 16:56:51.000000000 +0100
@@ -553,6 +553,36 @@ static void vmx_load_vmcs(struct vcpu *v
     local_irq_restore(flags);
 }
 
+void vmx_vmcs_reload(struct vcpu *v)
+{
+static unsigned long cnt1, cnt2, thr1, thr2;//temp
+if(++cnt1 > thr1) {//temp
+ thr1 |= cnt1;
+ printk("%pv/%lx (%pv/%lx)\n", v, PFN_DOWN(v->arch.hvm_vmx.vmcs_pa), current, PFN_DOWN(this_cpu(current_vmcs)));
+}
+    /*
+     * As we're running with interrupts disabled, we can't acquire
+     * v->arch.hvm_vmx.vmcs_lock here. However, with interrupts disabled
+     * the VMCS can't be taken away from us anymore if we still own it.
+     */
+    ASSERT(!local_irq_is_enabled());
+    if ( v->arch.hvm_vmx.vmcs_pa == this_cpu(current_vmcs) )
+        return;
+if(++cnt2 > thr2) {//temp
+ thr2 |= cnt2;
+ printk("%pv reload (on %pv)\n", v, current);
+}
+    ASSERT(!this_cpu(current_vmcs));
+
+    /*
+     * Wait for the remote side to be done with the VMCS before loading
+     * it here.
+     */
+    while ( v->arch.hvm_vmx.active_cpu != -1 )
+        cpu_relax();
+    vmx_load_vmcs(v);
+}
+
 int vmx_cpu_up_prepare(unsigned int cpu)
 {
     /*
--- unstable.orig/xen/arch/x86/hvm/vmx/vmx.c	2017-01-13 16:00:19.000000000 +0100
+++ unstable/xen/arch/x86/hvm/vmx/vmx.c	2017-02-09 16:56:19.000000000 +0100
@@ -936,6 +936,18 @@ static void vmx_ctxt_switch_from(struct
     if ( unlikely(!this_cpu(vmxon)) )
         return;
 
+    if ( !v->is_running )
+    {
+        /*
+         * When this vCPU isn't marked as running anymore, a remote pCPU's
+         * attempt to pause us (from vmx_vmcs_enter()) won't have a reason
+         * to spin in vcpu_sleep_sync(), and hence that pCPU might have taken
+         * away the VMCS from us. As we're running with interrupts disabled,
+         * we also can't call vmx_vmcs_enter().
+         */
+        vmx_vmcs_reload(v);
+    }
+
     vmx_fpu_leave(v);
     vmx_save_guest_msrs(v);
     vmx_restore_host_msrs();
--- unstable.orig/xen/include/asm-x86/hvm/vmx/vmcs.h	2016-12-20 11:21:35.000000000 +0100
+++ unstable/xen/include/asm-x86/hvm/vmx/vmcs.h	2017-02-09 16:50:53.000000000 +0100
@@ -179,6 +179,7 @@ void vmx_destroy_vmcs(struct vcpu *v);
 void vmx_vmcs_enter(struct vcpu *v);
 bool_t __must_check vmx_vmcs_try_enter(struct vcpu *v);
 void vmx_vmcs_exit(struct vcpu *v);
+void vmx_vmcs_reload(struct vcpu *v);
 
 #define CPU_BASED_VIRTUAL_INTR_PENDING        0x00000004
 #define CPU_BASED_USE_TSC_OFFSETING           0x00000008
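
As for the reload itself, it is the usual spin-until-released hand-off:
the pCPU which still has the VMCS loaded indicates that it is done by
letting active_cpu go back to -1, and only then do we load the VMCS
locally. For illustration only, a generic sketch of that pattern
(invented names; C11 atomics standing in for the ordering the real
VMCLEAR/VMPTRLD sequence provides) could look like this:

/*
 * Generic sketch of the spin-until-released hand-off used by
 * vmx_vmcs_reload() above.  All names are invented; this is not Xen code.
 */
#include <sched.h>
#include <stdatomic.h>

#define OWNER_NONE (-1)

static atomic_int owner = 1;            /* "pCPU 1" currently has the resource */

/* Remote pCPU: flush the resource, then publish that it is free. */
void release_resource(void)
{
    /* ... VMCLEAR-equivalent work happens here ... */
    atomic_store_explicit(&owner, OWNER_NONE, memory_order_release);
}

/* Local pCPU: wait until nobody owns the resource, then take it over. */
void reload_resource(int my_cpu)
{
    while (atomic_load_explicit(&owner, memory_order_acquire) != OWNER_NONE)
        sched_yield();                  /* cpu_relax() in the real code */
    atomic_store_explicit(&owner, my_cpu, memory_order_relaxed);
    /* ... VMPTRLD-equivalent work happens here ... */
}

int main(void)
{
    release_resource();                 /* remote side finishes first ...  */
    reload_resource(0);                 /* ... then the local side loads it */
    return 0;
}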