Re: [Xen-devel] [PATCH V2 2/2] x86/altp2m: Fixed domain crash with INVALID_ALTP2M EPTP index
On 06/25/2018 03:54 PM, Jan Beulich wrote:
On 25.06.18 at 14:40, wrote:
>>> Crash:
>>>
>>> (XEN) [ 1924.367273] altp2m_vcpu_initialise()
>>> (XEN) [ 1924.367277] altp2m_vcpu_reset()
>>> (XEN) [ 1924.367278] 1 altp2m_vcpu_update_p2m()
>>> (XEN) [ 1924.367279] vmx_vcpu_update_eptp()
>>> (XEN) [ 1924.367318] HVMOP_altp2m_vcpu_enable_notify
>>> (XEN) [ 1924.367321] vmx_vcpu_update_vmfunc_ve(0),
>>> v->arch.hvm_vmx.secondary_exec_control: 0x1054eb
>>> (XEN) [ 1924.367326] exit vmx_vcpu_update_vmfunc_ve(0),
>>> v->arch.hvm_vmx.secondary_exec_control: 0x1474eb
>>> (XEN) [ 1924.367344] Xen BUG at vmx.c:3407
>>
>> Actually I think this shows us the problem: 65535 (INVALID_ALTP2M) is a
>> stale value from a previous good run. But the EPTP_INDEX value is
>> ignored unless SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS is set. So at the
>> crash point, SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS just got set, the
>> "live" index is 0, and the stale INVALID_ALTP2M value is being read from
>> the previous run (and compared to 0 and MAX_ALTP2M).
>
> So perhaps the writing of EPTP_INDEX should be done earlier?

And indeed I can confirm this: I've added a sleep() in my test between xc_altp2m_set_vcpu_enable_notify() and xc_altp2m_set_domain_state(xci, domid, 0), and it _always_ crashes Xen on the second run.

Quite right, that's exactly what I've been doing: a satisfactory fix appears to be to simply reverse the order of altp2m_vcpu_update_p2m(v) and altp2m_vcpu_update_vmfunc_ve(v) in altp2m_vcpu_destroy(). I'll send out a patch ASAP.

Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
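The two candidate fixes in this exchange (writing EPTP_INDEX earlier, and reversing altp2m_vcpu_update_p2m() and altp2m_vcpu_update_vmfunc_ve() in altp2m_vcpu_destroy()) can be compared in a small standalone C model. This is a sketch under stated assumptions, not Xen code: struct vcpu_model, enable_run(), disable_run(), second_run_bugs() and enum fix are hypothetical, INVALID_ALTP2M is taken as 65535 from the logs, the EPTP_INDEX field is assumed to start at 0, and "write EPTP_INDEX earlier" is read as "refresh it just before turning #VE on".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_ALTP2M      10u
#define INVALID_ALTP2M  65535u      /* value seen in the logs */
#define VE_BIT          0x00040000u /* SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS */

/* Which hypothetical fix the model applies. */
enum fix { FIX_NONE, FIX_REORDER_DESTROY, FIX_WRITE_INDEX_ON_ENABLE };

struct vcpu_model {
    uint32_t secondary_exec_control;
    unsigned int p2midx;        /* models vcpu_altp2m(v).p2midx */
    unsigned int eptp_index;    /* models EPTP_INDEX; survives across runs */
};

/* Models vmx_vcpu_update_eptp(): EPTP_INDEX is written only with #VE on. */
static void update_eptp(struct vcpu_model *v)
{
    if ( v->secondary_exec_control & VE_BIT )
        v->eptp_index = v->p2midx;
}

/* Initialise on view 0, then enable_notify turns #VE on. */
static void enable_run(struct vcpu_model *v, enum fix f)
{
    v->p2midx = 0;
    update_eptp(v);                     /* #VE still off: no write */
    if ( f == FIX_WRITE_INDEX_ON_ENABLE )
        v->eptp_index = v->p2midx;      /* refresh before enabling #VE */
    v->secondary_exec_control |= VE_BIT;
}

/* altp2m_vcpu_destroy(): reset, then the two updates in some order. */
static void disable_run(struct vcpu_model *v, enum fix f)
{
    v->p2midx = INVALID_ALTP2M;         /* altp2m_vcpu_reset() */
    if ( f == FIX_REORDER_DESTROY )
    {
        v->secondary_exec_control &= ~VE_BIT;   /* #VE off first... */
        update_eptp(v);                         /* ...so nothing is written */
    }
    else
    {
        update_eptp(v);                         /* writes 65535 */
        v->secondary_exec_control &= ~VE_BIT;
    }
}

/* A VM exit right after the second enable_notify: would BUG_ON() fire? */
static bool second_run_bugs(enum fix f)
{
    struct vcpu_model v = { 0, INVALID_ALTP2M, 0 };
    unsigned int idx;

    enable_run(&v, f);
    disable_run(&v, f);
    enable_run(&v, f);

    assert(v.secondary_exec_control & VE_BIT);  /* #VE path is taken */
    idx = v.eptp_index;                 /* __vmread(EPTP_INDEX, &idx) */
    return idx != v.p2midx && idx >= MAX_ALTP2M;
}
```

In this model FIX_NONE reaches the BUG_ON() on the second run, consistent with the sleep() reproduction described in the message, while either fix leaves EPTP_INDEX agreeing with the live index 0.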
Re: [Xen-devel] [PATCH V2 2/2] x86/altp2m: Fixed domain crash with INVALID_ALTP2M EPTP index
>>> On 25.06.18 at 14:40, wrote:
>> Crash:
>>
>> (XEN) [ 1924.367273] altp2m_vcpu_initialise()
>> (XEN) [ 1924.367277] altp2m_vcpu_reset()
>> (XEN) [ 1924.367278] 1 altp2m_vcpu_update_p2m()
>> (XEN) [ 1924.367279] vmx_vcpu_update_eptp()
>> (XEN) [ 1924.367318] HVMOP_altp2m_vcpu_enable_notify
>> (XEN) [ 1924.367321] vmx_vcpu_update_vmfunc_ve(0),
>> v->arch.hvm_vmx.secondary_exec_control: 0x1054eb
>> (XEN) [ 1924.367326] exit vmx_vcpu_update_vmfunc_ve(0),
>> v->arch.hvm_vmx.secondary_exec_control: 0x1474eb
>> (XEN) [ 1924.367344] Xen BUG at vmx.c:3407
>
> Actually I think this shows us the problem: 65535 (INVALID_ALTP2M) is a
> stale value from a previous good run. But the EPTP_INDEX value is
> ignored unless SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS is set. So at the
> crash point, SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS just got set, the
> "live" index is 0, and the stale INVALID_ALTP2M value is being read from
> the previous run (and compared to 0 and MAX_ALTP2M).

So perhaps the writing of EPTP_INDEX should be done earlier?

Jan
Re: [Xen-devel] [PATCH V2 2/2] x86/altp2m: Fixed domain crash with INVALID_ALTP2M EPTP index
> (XEN) [ 1923.964832] altp2m_vcpu_initialise()
> (XEN) [ 1923.964836] altp2m_vcpu_reset()
> (XEN) [ 1923.964837] 1 altp2m_vcpu_update_p2m()
> (XEN) [ 1923.964838] vmx_vcpu_update_eptp()
> (XEN) [ 1923.964876] HVMOP_altp2m_vcpu_enable_notify
> (XEN) [ 1923.964878] vmx_vcpu_update_vmfunc_ve(0),
> v->arch.hvm_vmx.secondary_exec_control: 0x1054eb
> (XEN) [ 1923.964880] exit vmx_vcpu_update_vmfunc_ve(0),
> v->arch.hvm_vmx.secondary_exec_control: 0x1474eb
> (XEN) [ 1923.964986] altp2m_vcpu_destroy()
> (XEN) [ 1923.964987] altp2m_vcpu_reset()
> (XEN) [ 1923.964988] 2 altp2m_vcpu_update_p2m()
> (XEN) [ 1923.964989] vmx_vcpu_update_eptp()
> (XEN) [ 1923.964991] __vmwrite(EPTP_INDEX, 65535)
> (XEN) [ 1923.964992] vmx_vcpu_update_vmfunc_ve(0),
> v->arch.hvm_vmx.secondary_exec_control: 0x1474eb
> (XEN) [ 1923.964993] exit vmx_vcpu_update_vmfunc_ve(0),
> v->arch.hvm_vmx.secondary_exec_control: 0x1054eb
>
> Crash:
>
> (XEN) [ 1924.367273] altp2m_vcpu_initialise()
> (XEN) [ 1924.367277] altp2m_vcpu_reset()
> (XEN) [ 1924.367278] 1 altp2m_vcpu_update_p2m()
> (XEN) [ 1924.367279] vmx_vcpu_update_eptp()
> (XEN) [ 1924.367318] HVMOP_altp2m_vcpu_enable_notify
> (XEN) [ 1924.367321] vmx_vcpu_update_vmfunc_ve(0),
> v->arch.hvm_vmx.secondary_exec_control: 0x1054eb
> (XEN) [ 1924.367326] exit vmx_vcpu_update_vmfunc_ve(0),
> v->arch.hvm_vmx.secondary_exec_control: 0x1474eb
> (XEN) [ 1924.367344] Xen BUG at vmx.c:3407

Actually I think this shows us the problem: 65535 (INVALID_ALTP2M) is a stale value from a previous good run. But the EPTP_INDEX value is ignored unless SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS is set. So at the crash point, SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS just got set, the "live" index is 0, and the stale INVALID_ALTP2M value is being read from the previous run (and compared to 0 and MAX_ALTP2M).

Thanks,
Razvan
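The stale-value explanation can be traced with a minimal C model of two altp2m enable/disable cycles. It is a simplification under stated assumptions, not the hypervisor code: struct vcpu_model, one_run(), exit_index_of_run() and would_bug() are hypothetical names, INVALID_ALTP2M is 65535 as in the logs, and the EPTP_INDEX field is assumed to hold 0 before the first run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_ALTP2M      10u
#define INVALID_ALTP2M  65535u
#define VE_BIT          0x00040000u  /* SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS */

/* Hypothetical vCPU state; eptp_index models the EPTP_INDEX VMCS field. */
struct vcpu_model {
    uint32_t secondary_exec_control;
    unsigned int p2midx;
    unsigned int eptp_index;
};

/* One enable/disable cycle. Returns what a VM exit taken right after
 * enable_notify would __vmread() from EPTP_INDEX. */
static unsigned int one_run(struct vcpu_model *v)
{
    unsigned int exit_idx;

    assert(!(v->secondary_exec_control & VE_BIT));  /* #VE off between runs */

    /* altp2m_vcpu_initialise(): #VE is off, so EPTP_INDEX is not written. */
    v->p2midx = 0;

    /* HVMOP_altp2m_vcpu_enable_notify: #VE on, EPTP_INDEX not refreshed. */
    v->secondary_exec_control |= VE_BIT;
    exit_idx = v->eptp_index;

    /* altp2m_vcpu_destroy(): reset, EPTP update while #VE is still on
     * (__vmwrite(EPTP_INDEX, 65535)), then #VE off. */
    v->p2midx = INVALID_ALTP2M;
    v->eptp_index = v->p2midx;
    v->secondary_exec_control &= ~VE_BIT;

    return exit_idx;
}

/* Runs n cycles on a fresh vCPU and returns the index a VM exit would
 * read during the n-th cycle. */
static unsigned int exit_index_of_run(unsigned int n)
{
    struct vcpu_model v = { 0, INVALID_ALTP2M, 0 };
    unsigned int idx = 0;

    while ( n-- > 0 )
        idx = one_run(&v);
    return idx;
}

/* The vmx_vmexit_handler() test that reaches BUG_ON(idx >= MAX_ALTP2M). */
static bool would_bug(unsigned int idx, unsigned int live_p2midx)
{
    return idx != live_p2midx && idx >= MAX_ALTP2M;
}
```

In the model, the VM exit in the first run reads 0 (equal to the live index), while the second run reads the 65535 left behind by the first altp2m_vcpu_destroy(), which fails the comparison with 0 and MAX_ALTP2M.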
Re: [Xen-devel] [PATCH V2 2/2] x86/altp2m: Fixed domain crash with INVALID_ALTP2M EPTP index
On 06/25/2018 03:28 PM, Jan Beulich wrote:
On 25.06.18 at 14:12, wrote:
>> On 06/22/2018 07:55 PM, Razvan Cojocaru wrote:
>>> On 06/22/2018 06:28 PM, Jan Beulich wrote:
>>> On 13.06.18 at 10:52, wrote:
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -3592,7 +3592,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
> }
> }
>
> -if ( idx != vcpu_altp2m(v).p2midx )
> +if ( idx != INVALID_ALTP2M && idx != vcpu_altp2m(v).p2midx )
> {
> BUG_ON(idx >= MAX_ALTP2M);
In the code immediately ahead of this there is an INVALID_ALTP2M check already (in the else branch). If the __vmread() can legitimately produce this value, why would the domain be crashed when getting back INVALID_ALTP2M in the other case? I think the correctness of your change can only be judged once both code paths behave consistently.
>>>
>>> You're right, I had somehow convinced myself that this is a #VE-specific
>>> problem, but it looks like a generic altp2m problem. I'll simulate the
>>> other branch in the code and see what it does with my small test
>>> application.
>>
>> After a bit of debugging, the issue explained in full seems to be this
>> (it indeed appears to be #VE-specific, as initially assumed): client
>> application calls xc_altp2m_set_domain_state(xci, domid, 1), followed by
>> xc_altp2m_set_vcpu_enable_notify() (with a suitable gfn), followed by
>> xc_altp2m_set_domain_state(xci, domid, 0).
>>
>> This causes Xen to go through the following steps:
>>
>> 1. altp2m_vcpu_initialise() (calls altp2m_vcpu_reset()).
>> 2. HVMOP_altp2m_vcpu_enable_notify -> vmx_vcpu_update_vmfunc_ve().
>> 3. altp2m_vcpu_destroy() (calls altp2m_vcpu_reset() and (indirectly)
>> vmx_vcpu_update_eptp()).
>> 4. Still part of the altp2m_vcpu_destroy() workflow,
>> altp2m_vcpu_update_vmfunc_ve(v) gets called.
>>
>> At step 2, vmx_vcpu_update_vmfunc_ve() modifies
>> v->arch.hvm_vmx.secondary_exec_control (from 0x1054eb to 0x1474eb -
>> which has the SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS bit set).
>>
>> At step 3, altp2m_vcpu_reset() sets av->p2midx = INVALID_ALTP2M, then
>> vmx_vcpu_update_eptp() sees that SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS
>> is set, and as a consequence calls __vmwrite(EPTP_INDEX,
>> vcpu_altp2m(v).p2midx).
>>
>> Now, at step 4 the SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS bit should now
>> become 0, because altp2m_vcpu_reset() has set veinfo_gfn to INVALID_GFN.
>> But _sometimes_, what happens is that _between_ steps 3 and 4 a
>> vmx_vmexit_handler() occurs, which __vmread()s EPTP_INDEX (on the logic
>> branch I've tried to fix), compares it to MAX_ALTP2M and then proceeds
>> to BUG_ON(), bringing the hypervisor down.
>
> Thanks for the detailed analysis. With that I wonder whether it is
> reasonable for a VM exit to occur in parallel with the processing of
> altp2m_vcpu_destroy(). Shouldn't a domain (or vCPU) undergoing such
> a mode change be paused?
>
> I also remain unconvinced that a similar race is entirely impossible in the
> non-#VE case.

Apologies, I seem to have misread the crash timing.

A "good run":

(XEN) [ 1923.964832] altp2m_vcpu_initialise()
(XEN) [ 1923.964836] altp2m_vcpu_reset()
(XEN) [ 1923.964837] 1 altp2m_vcpu_update_p2m()
(XEN) [ 1923.964838] vmx_vcpu_update_eptp()
(XEN) [ 1923.964876] HVMOP_altp2m_vcpu_enable_notify
(XEN) [ 1923.964878] vmx_vcpu_update_vmfunc_ve(0),
v->arch.hvm_vmx.secondary_exec_control: 0x1054eb
(XEN) [ 1923.964880] exit vmx_vcpu_update_vmfunc_ve(0),
v->arch.hvm_vmx.secondary_exec_control: 0x1474eb
(XEN) [ 1923.964986] altp2m_vcpu_destroy()
(XEN) [ 1923.964987] altp2m_vcpu_reset()
(XEN) [ 1923.964988] 2 altp2m_vcpu_update_p2m()
(XEN) [ 1923.964989] vmx_vcpu_update_eptp()
(XEN) [ 1923.964991] __vmwrite(EPTP_INDEX, 65535)
(XEN) [ 1923.964992] vmx_vcpu_update_vmfunc_ve(0),
v->arch.hvm_vmx.secondary_exec_control: 0x1474eb
(XEN) [ 1923.964993] exit vmx_vcpu_update_vmfunc_ve(0),
v->arch.hvm_vmx.secondary_exec_control: 0x1054eb

Crash:

(XEN) [ 1924.367273] altp2m_vcpu_initialise()
(XEN) [ 1924.367277] altp2m_vcpu_reset()
(XEN) [ 1924.367278] 1 altp2m_vcpu_update_p2m()
(XEN) [ 1924.367279] vmx_vcpu_update_eptp()
(XEN) [ 1924.367318] HVMOP_altp2m_vcpu_enable_notify
(XEN) [ 1924.367321] vmx_vcpu_update_vmfunc_ve(0),
v->arch.hvm_vmx.secondary_exec_control: 0x1054eb
(XEN) [ 1924.367326] exit vmx_vcpu_update_vmfunc_ve(0),
v->arch.hvm_vmx.secondary_exec_control: 0x1474eb
(XEN) [ 1924.367344] Xen BUG at vmx.c:3407

The vmx_vmexit_handler() call appears to happen right after the first vmx_vcpu_update_vmfunc_ve() call, but still before altp2m_vcpu_destroy(). I was also quite confused that a vmx_vmexit_handler() run is possible in parallel with an HVMOP.

I'll keep digging.

Thanks,
Razvan
Re: [Xen-devel] [PATCH V2 2/2] x86/altp2m: Fixed domain crash with INVALID_ALTP2M EPTP index
>>> On 25.06.18 at 14:12, wrote:
> On 06/22/2018 07:55 PM, Razvan Cojocaru wrote:
>> On 06/22/2018 06:28 PM, Jan Beulich wrote:
>> On 13.06.18 at 10:52, wrote:
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -3592,7 +3592,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
}
}

-if ( idx != vcpu_altp2m(v).p2midx )
+if ( idx != INVALID_ALTP2M && idx != vcpu_altp2m(v).p2midx )
{
BUG_ON(idx >= MAX_ALTP2M);
>>>
>>> In the code immediately ahead of this there is an INVALID_ALTP2M check
>>> already (in the else branch). If the __vmread() can legitimately produce
>>> this value, why would the domain be crashed when getting back
>>> INVALID_ALTP2M in the other case? I think the correctness of your change
>>> can only be judged once both code paths behave consistently.
>>
>> You're right, I had somehow convinced myself that this is a #VE-specific
>> problem, but it looks like a generic altp2m problem. I'll simulate the
>> other branch in the code and see what it does with my small test
>> application.
>
> After a bit of debugging, the issue explained in full seems to be this
> (it indeed appears to be #VE-specific, as initially assumed): client
> application calls xc_altp2m_set_domain_state(xci, domid, 1), followed by
> xc_altp2m_set_vcpu_enable_notify() (with a suitable gfn), followed by
> xc_altp2m_set_domain_state(xci, domid, 0).
>
> This causes Xen to go through the following steps:
>
> 1. altp2m_vcpu_initialise() (calls altp2m_vcpu_reset()).
> 2. HVMOP_altp2m_vcpu_enable_notify -> vmx_vcpu_update_vmfunc_ve().
> 3. altp2m_vcpu_destroy() (calls altp2m_vcpu_reset() and (indirectly)
> vmx_vcpu_update_eptp()).
> 4. Still part of the altp2m_vcpu_destroy() workflow,
> altp2m_vcpu_update_vmfunc_ve(v) gets called.
>
> At step 2, vmx_vcpu_update_vmfunc_ve() modifies
> v->arch.hvm_vmx.secondary_exec_control (from 0x1054eb to 0x1474eb -
> which has the SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS bit set).
>
> At step 3, altp2m_vcpu_reset() sets av->p2midx = INVALID_ALTP2M, then
> vmx_vcpu_update_eptp() sees that SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS
> is set, and as a consequence calls __vmwrite(EPTP_INDEX,
> vcpu_altp2m(v).p2midx).
>
> Now, at step 4 the SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS bit should now
> become 0, because altp2m_vcpu_reset() has set veinfo_gfn to INVALID_GFN.
> But _sometimes_, what happens is that _between_ steps 3 and 4 a
> vmx_vmexit_handler() occurs, which __vmread()s EPTP_INDEX (on the logic
> branch I've tried to fix), compares it to MAX_ALTP2M and then proceeds
> to BUG_ON(), bringing the hypervisor down.

Thanks for the detailed analysis. With that I wonder whether it is reasonable for a VM exit to occur in parallel with the processing of altp2m_vcpu_destroy(). Shouldn't a domain (or vCPU) undergoing such a mode change be paused?

I also remain unconvinced that a similar race is entirely impossible in the non-#VE case.

Jan
Re: [Xen-devel] [PATCH V2 2/2] x86/altp2m: Fixed domain crash with INVALID_ALTP2M EPTP index
On 06/22/2018 07:55 PM, Razvan Cojocaru wrote:
> On 06/22/2018 06:28 PM, Jan Beulich wrote:
> On 13.06.18 at 10:52, wrote:
>>> --- a/xen/arch/x86/hvm/vmx/vmx.c
>>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
>>> @@ -3592,7 +3592,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
>>> }
>>> }
>>>
>>> -if ( idx != vcpu_altp2m(v).p2midx )
>>> +if ( idx != INVALID_ALTP2M && idx != vcpu_altp2m(v).p2midx )
>>> {
>>> BUG_ON(idx >= MAX_ALTP2M);
>>
>> In the code immediately ahead of this there is an INVALID_ALTP2M check
>> already (in the else branch). If the __vmread() can legitimately produce
>> this value, why would the domain be crashed when getting back
>> INVALID_ALTP2M in the other case? I think the correctness of your change
>> can only be judged once both code paths behave consistently.
>
> You're right, I had somehow convinced myself that this is a #VE-specific
> problem, but it looks like a generic altp2m problem. I'll simulate the
> other branch in the code and see what it does with my small test
> application.

After a bit of debugging, the issue explained in full seems to be this (it indeed appears to be #VE-specific, as initially assumed): client application calls xc_altp2m_set_domain_state(xci, domid, 1), followed by xc_altp2m_set_vcpu_enable_notify() (with a suitable gfn), followed by xc_altp2m_set_domain_state(xci, domid, 0).

This causes Xen to go through the following steps:

1. altp2m_vcpu_initialise() (calls altp2m_vcpu_reset()).
2. HVMOP_altp2m_vcpu_enable_notify -> vmx_vcpu_update_vmfunc_ve().
3. altp2m_vcpu_destroy() (calls altp2m_vcpu_reset() and (indirectly) vmx_vcpu_update_eptp()).
4. Still part of the altp2m_vcpu_destroy() workflow, altp2m_vcpu_update_vmfunc_ve(v) gets called.

At step 2, vmx_vcpu_update_vmfunc_ve() modifies v->arch.hvm_vmx.secondary_exec_control (from 0x1054eb to 0x1474eb - which has the SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS bit set).

At step 3, altp2m_vcpu_reset() sets av->p2midx = INVALID_ALTP2M, then vmx_vcpu_update_eptp() sees that SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS is set, and as a consequence calls __vmwrite(EPTP_INDEX, vcpu_altp2m(v).p2midx).

Now, at step 4 the SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS bit should now become 0, because altp2m_vcpu_reset() has set veinfo_gfn to INVALID_GFN. But _sometimes_, what happens is that _between_ steps 3 and 4 a vmx_vmexit_handler() occurs, which __vmread()s EPTP_INDEX (on the logic branch I've tried to fix), compares it to MAX_ALTP2M and then proceeds to BUG_ON(), bringing the hypervisor down.

Thanks,
Razvan
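The two secondary_exec_control values quoted for step 2 can be decoded with a few lines of C. The bit positions used here (13 for "enable VM functions", 18 for "EPT-violation #VE") are taken from the Intel SDM's table of secondary processor-based VM-execution controls; the helper names changed_bits() and ve_enabled() and the CTL_* macros are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Secondary processor-based VM-execution control bits (Intel SDM). */
#define SECONDARY_EXEC_ENABLE_VM_FUNCTIONS    (1u << 13)  /* 0x2000 */
#define SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS (1u << 18)  /* 0x40000 */

/* Values logged for v->arch.hvm_vmx.secondary_exec_control at step 2. */
#define CTL_BEFORE 0x1054ebu
#define CTL_AFTER  0x1474ebu

/* Bits that differ between two control values. */
static uint32_t changed_bits(uint32_t before, uint32_t after)
{
    return before ^ after;
}

/* Is EPT-violation #VE enabled in this control value? */
static bool ve_enabled(uint32_t ctl)
{
    return (ctl & SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS) != 0;
}

/* Compile-time check: the transition toggles exactly these two bits. */
static_assert((CTL_BEFORE ^ CTL_AFTER) ==
              (SECONDARY_EXEC_ENABLE_VM_FUNCTIONS |
               SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS),
              "unexpected bits changed");
```

So the change from 0x1054eb to 0x1474eb sets exactly two bits (0x42000): VM functions and #VE; the reverse change in altp2m_vcpu_destroy() clears the same two.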
Re: [Xen-devel] [PATCH V2 2/2] x86/altp2m: Fixed domain crash with INVALID_ALTP2M EPTP index
On 06/22/2018 06:28 PM, Jan Beulich wrote:
On 13.06.18 at 10:52, wrote:
>> --- a/xen/arch/x86/hvm/vmx/vmx.c
>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
>> @@ -3592,7 +3592,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
>> }
>> }
>>
>> -if ( idx != vcpu_altp2m(v).p2midx )
>> +if ( idx != INVALID_ALTP2M && idx != vcpu_altp2m(v).p2midx )
>> {
>> BUG_ON(idx >= MAX_ALTP2M);
>
> In the code immediately ahead of this there is an INVALID_ALTP2M check
> already (in the else branch). If the __vmread() can legitimately produce
> this value, why would the domain be crashed when getting back
> INVALID_ALTP2M in the other case? I think the correctness of your change
> can only be judged once both code paths behave consistently.

You're right, I had somehow convinced myself that this is a #VE-specific problem, but it looks like a generic altp2m problem. I'll simulate the other branch in the code and see what it does with my small test application.

Thanks,
Razvan
Re: [Xen-devel] [PATCH V2 2/2] x86/altp2m: Fixed domain crash with INVALID_ALTP2M EPTP index
>>> On 13.06.18 at 10:52, wrote:
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -3592,7 +3592,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
> }
> }
>
> -if ( idx != vcpu_altp2m(v).p2midx )
> +if ( idx != INVALID_ALTP2M && idx != vcpu_altp2m(v).p2midx )
> {
> BUG_ON(idx >= MAX_ALTP2M);

In the code immediately ahead of this there is an INVALID_ALTP2M check already (in the else branch). If the __vmread() can legitimately produce this value, why would the domain be crashed when getting back INVALID_ALTP2M in the other case? I think the correctness of your change can only be judged once both code paths behave consistently.

Jan
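The consistency question can be illustrated with a toy C model of the two ways the index is obtained on a VM exit. It is a hypothetical sketch, not the vmx_vmexit_handler() source: enum outcome and on_vmexit() are invented for illustration, the failed EPT_POINTER lookup in the else branch is reduced to idx == INVALID_ALTP2M, and INVALID_ALTP2M is taken as 65535, as in the logs.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_ALTP2M      10u
#define INVALID_ALTP2M  65535u

/* Hypothetical outcomes of the altp2m index handling on a VM exit. */
enum outcome { KEEP_VIEW, SWITCH_VIEW, SKIPPED, DOMAIN_CRASH, XEN_BUG };

/*
 * Simplified model: with #VE on, idx comes from __vmread(EPTP_INDEX);
 * otherwise from an EPT_POINTER lookup that yields INVALID_ALTP2M when
 * the EPTP is not in the altp2m list. 'patched' applies the proposed
 * extra condition.
 */
static enum outcome on_vmexit(bool ve_on, unsigned int idx,
                              unsigned int cur, bool patched)
{
    assert(cur < MAX_ALTP2M || cur == INVALID_ALTP2M);

    if ( !ve_on && idx == INVALID_ALTP2M )
        return DOMAIN_CRASH;            /* existing else-branch check */

    if ( patched && idx == INVALID_ALTP2M )
        return SKIPPED;                 /* proposed extra condition */

    if ( idx == cur )
        return KEEP_VIEW;

    if ( idx >= MAX_ALTP2M )
        return XEN_BUG;                 /* BUG_ON(idx >= MAX_ALTP2M) */

    return SWITCH_VIEW;
}
```

With the patch applied, the same INVALID_ALTP2M value crashes the domain on the EPT_POINTER path but is silently skipped on the EPTP_INDEX (#VE) path, which is the asymmetry being questioned.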
[Xen-devel] [PATCH V2 2/2] x86/altp2m: Fixed domain crash with INVALID_ALTP2M EPTP index
vcpu_altp2m(v).p2midx can become INVALID_ALTP2M during normal usage (in altp2m_vcpu_reset()), which can then result in vmx_vcpu_update_eptp() writing that value to EPTP_INDEX via __vmwrite(). The value can then be read back with __vmread() in vmx_vmexit_handler(), which then calls BUG_ON(idx >= MAX_ALTP2M). Since MAX_ALTP2M is currently 10 and INVALID_ALTP2M is #defined as 0x, the domain will always crash in this case.

Signed-off-by: Razvan Cojocaru
---
Cc: Jun Nakajima
Cc: Kevin Tian
Cc: Jan Beulich
Cc: Andrew Cooper
Cc: Tamas K Lengyel
---
 xen/arch/x86/hvm/vmx/vmx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 9707514..c7f3925 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -3592,7 +3592,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
             }
         }
 
-        if ( idx != vcpu_altp2m(v).p2midx )
+        if ( idx != INVALID_ALTP2M && idx != vcpu_altp2m(v).p2midx )
         {
             BUG_ON(idx >= MAX_ALTP2M);
             atomic_dec(&p2m_get_altp2m(v)->active_vcpus);
-- 
2.7.4
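The arithmetic behind "the domain will always crash" can be checked with a tiny C model of the condition before and after the patch. This is a sketch, not Xen code: bug_before() and bug_after() are hypothetical helpers, and INVALID_ALTP2M is taken as 65535, the value logged for it in the replies in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* MAX_ALTP2M as stated above; INVALID_ALTP2M as seen in the logs. */
#define MAX_ALTP2M      10u
#define INVALID_ALTP2M  65535u

/* INVALID_ALTP2M can never pass the range check in BUG_ON(). */
static_assert(INVALID_ALTP2M >= MAX_ALTP2M,
              "INVALID_ALTP2M unexpectedly below MAX_ALTP2M");

/* Original condition: does BUG_ON(idx >= MAX_ALTP2M) fire? */
static bool bug_before(unsigned int idx, unsigned int p2midx)
{
    return idx != p2midx && idx >= MAX_ALTP2M;
}

/* Patched condition: INVALID_ALTP2M skips the block entirely. */
static bool bug_after(unsigned int idx, unsigned int p2midx)
{
    return idx != INVALID_ALTP2M && idx != p2midx && idx >= MAX_ALTP2M;
}
```

Any read of INVALID_ALTP2M that differs from the current p2midx trips the BUG_ON() in the original condition, while a valid index such as 3 behaves identically before and after the change.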