Re: [PATCH] x86/vpmu: fix race-condition in vpmu_load

Tamas K Lengyel Mon, 19 Sep 2022 06:25:46 -0700

On Mon, Sep 19, 2022 at 9:21 AM Jan Beulich <jbeul...@suse.com> wrote:
>
> On 19.09.2022 14:25, Tamas K Lengyel wrote:
> > On Mon, Sep 19, 2022 at 5:28 AM Jan Beulich <jbeul...@suse.com> wrote:
> >>
> >> On 16.09.2022 23:35, Boris Ostrovsky wrote:
> >>>
> >>> On 9/16/22 8:52 AM, Jan Beulich wrote:
> >>>> On 15.09.2022 16:01, Tamas K Lengyel wrote:
> >>>>> While experimenting with the vPMU subsystem an ASSERT failure was
> >>>>> observed in vmx_find_msr because the vcpu_runnable state was true.
> >>>>>
> >>>>> The root cause of the bug appears to be the fact that the vPMU subsystem
> >>>>> doesn't save its state on context_switch.
>
> For the further reply below - is this actually true? What is the
> vpmu_switch_from() (resolving to vpmu_save()) doing then early
> in context_switch()?
>
> >>>>> The vpmu_load function will attempt
> >>>>> to gather the PMU state if its still loaded two different ways:
> >>>>>      1. if the current pcpu is not where the vcpu ran before doing a 
> >>>>> remote save
> >>>>>      2. if the current pcpu had another vcpu active before doing a 
> >>>>> local save
> >>>>>
> >>>>> However, in case the prev vcpu is being rescheduled on another pcpu its 
> >>>>> state
> >>>>> has already changed and vcpu_runnable is returning true, thus #2 will 
> >>>>> trip the
> >>>>> ASSERT. The only way to avoid this race condition is to make sure the
> >>>>> prev vcpu is paused while being checked and its context saved. Once the 
> >>>>> prev
> >>>>> vcpu is resumed and does #1 it will find its state already saved.
> >>>> While I consider this explanation plausible, I'm worried:
> >>>>
> >>>>> --- a/xen/arch/x86/cpu/vpmu.c
> >>>>> +++ b/xen/arch/x86/cpu/vpmu.c
> >>>>> @@ -419,8 +419,10 @@ int vpmu_load(struct vcpu *v, bool_t from_guest)
> >>>>>           vpmu = vcpu_vpmu(prev);
> >>>>>
> >>>>>           /* Someone ran here before us */
> >>>>> +        vcpu_pause(prev);
> >>>>>           vpmu_save_force(prev);
> >>>>>           vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);
> >>>>> +        vcpu_unpause(prev);
> >>>>>
> >>>>>           vpmu = vcpu_vpmu(v);
> >>>>>       }
> >>>> We're running with IRQs off here, yet vcpu_pause() waits for the vcpu
> >>>> to actually be de-scheduled. Even with IRQs on this is already a
> >>>> relatively heavy operation (also including its impact on the remote
> >>>> side). Additionally the function is called from context_switch(), and
> >>>> I'm unsure of the usability of vcpu_pause() on such a path. In
> >>>> particular: Is there a risk of two CPUs doing this mutually to one
> >>>> another? If so, is deadlocking excluded?
> >>>>
> >>>> Hence at the very least I think the description wants extending, to
> >>>> discuss the safety of the change.
> >>>>
> >>>> Boris - any chance you could comment here? Iirc that's code you did
> >>>> introduce.
> >>>
> >>>
> >>> Is the assertion in vmx_find_msr() really needs to be for runnable vcpu 
> >>> or can it be a check on whether vcpu is actually running (e.g. 
> >>> RUNSTATE_running)?
> >>
> >> You cannot safely check for "running", as "runnable" may transition
> >> to/from "running" behind your back.
> >
> > The more I look at this the more I think the only sensible solution is
> > to have the vPMU state be saved on vmexit for all vCPUs.
>
> Do you really mean vmexit? It would suffice if state was reliably
> saved during context-switch-out, wouldn't it? At that point the
> vCPU can't be resumed on another pCPU, yet.
>
> > That way all
> > this having to figure out where and when a context needs saving during
> > scheduling goes away. Yes, it adds a bit of overhead for cases where
> > the vCPU will resume on the same pCPU and that context saved could
> > have been skipped,
>
> If you really mean vmexit, then I'm inclined to say that's more
> than just "a bit of overhead". I'd agree if you really meant
> context-switch-out, but as said further up it looks to me as if
> that was already occurring. Apparently I'm overlooking something
> crucial ...


Yes, the current setup is doing exactly that, saving the vPMU context
on context-switch-out, and that's where the ASSERT failure occurs
because the vCPU it needs to save the context for is already runnable
on another pCPU.

Tamas

Re: [PATCH] x86/vpmu: fix race-condition in vpmu_load

Reply via email to