On Mon, Sep 19, 2022 at 9:58 AM Jan Beulich <jbeul...@suse.com> wrote:

> On 19.09.2022 15:24, Tamas K Lengyel wrote:
> > On Mon, Sep 19, 2022 at 9:21 AM Jan Beulich <jbeul...@suse.com> wrote:
> >>
> >> On 19.09.2022 14:25, Tamas K Lengyel wrote:
> >>> On Mon, Sep 19, 2022 at 5:28 AM Jan Beulich <jbeul...@suse.com> wrote:
> >>>>
> >>>> On 16.09.2022 23:35, Boris Ostrovsky wrote:
> >>>>>
> >>>>> On 9/16/22 8:52 AM, Jan Beulich wrote:
> >>>>>> On 15.09.2022 16:01, Tamas K Lengyel wrote:
> >>>>>>> While experimenting with the vPMU subsystem an ASSERT failure was
> >>>>>>> observed in vmx_find_msr because the vcpu_runnable state was true.
> >>>>>>>
> >>>>>>> The root cause of the bug appears to be the fact that the vPMU
> >>>>>>> subsystem doesn't save its state on context_switch.
> >>
> >> For the further reply below - is this actually true? What is the
> >> vpmu_switch_from() (resolving to vpmu_save()) doing then early
> >> in context_switch()?
> >>
> >>>>>>> The vpmu_load function will attempt to gather the PMU state, if
> >>>>>>> it's still loaded, in two different ways:
> >>>>>>>      1. if the current pcpu is not where the vcpu ran before, by
> >>>>>>>         doing a remote save
> >>>>>>>      2. if the current pcpu had another vcpu active before, by
> >>>>>>>         doing a local save
> >>>>>>>
> >>>>>>> However, in case the prev vcpu is being rescheduled on another
> >>>>>>> pcpu, its state has already changed and vcpu_runnable returns
> >>>>>>> true, so #2 will trip the ASSERT. The only way to avoid this race
> >>>>>>> condition is to make sure the prev vcpu is paused while it is
> >>>>>>> checked and its context saved. Once the prev vcpu is resumed and
> >>>>>>> does #1, it will find its state already saved.
> >>>>>> While I consider this explanation plausible, I'm worried:
> >>>>>>
> >>>>>>> --- a/xen/arch/x86/cpu/vpmu.c
> >>>>>>> +++ b/xen/arch/x86/cpu/vpmu.c
> >>>>>>> @@ -419,8 +419,10 @@ int vpmu_load(struct vcpu *v, bool_t from_guest)
> >>>>>>>           vpmu = vcpu_vpmu(prev);
> >>>>>>>
> >>>>>>>           /* Someone ran here before us */
> >>>>>>> +        vcpu_pause(prev);
> >>>>>>>           vpmu_save_force(prev);
> >>>>>>>           vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);
> >>>>>>> +        vcpu_unpause(prev);
> >>>>>>>
> >>>>>>>           vpmu = vcpu_vpmu(v);
> >>>>>>>       }
> >>>>>> We're running with IRQs off here, yet vcpu_pause() waits for the
> >>>>>> vcpu to actually be de-scheduled. Even with IRQs on this is already
> >>>>>> a relatively heavy operation (also including its impact on the
> >>>>>> remote side). Additionally the function is called from
> >>>>>> context_switch(), and I'm unsure of the usability of vcpu_pause()
> >>>>>> on such a path. In particular: Is there a risk of two CPUs doing
> >>>>>> this mutually to one another? If so, is deadlocking excluded?
> >>>>>>
> >>>>>> Hence at the very least I think the description wants extending, to
> >>>>>> discuss the safety of the change.
> >>>>>>
> >>>>>> Boris - any chance you could comment here? Iirc that's code you did
> >>>>>> introduce.
> >>>>>
> >>>>>
> >>>>> Does the assertion in vmx_find_msr() really need to check for a
> >>>>> runnable vcpu, or can it check whether the vcpu is actually running
> >>>>> (e.g. RUNSTATE_running)?
> >>>>
> >>>> You cannot safely check for "running", as "runnable" may transition
> >>>> to/from "running" behind your back.
> >>>
> >>> The more I look at this the more I think the only sensible solution is
> >>> to have the vPMU state be saved on vmexit for all vCPUs.
> >>
> >> Do you really mean vmexit? It would suffice if state was reliably
> >> saved during context-switch-out, wouldn't it? At that point the
> >> vCPU can't be resumed on another pCPU, yet.
> >>
> >>> That way all
> >>> this having to figure out where and when a context needs saving during
> >>> scheduling goes away. Yes, it adds a bit of overhead for cases where
> >>> the vCPU will resume on the same pCPU and that context save could
> >>> have been skipped,
> >>
> >> If you really mean vmexit, then I'm inclined to say that's more
> >> than just "a bit of overhead". I'd agree if you really meant
> >> context-switch-out, but as said further up it looks to me as if
> >> that was already occurring. Apparently I'm overlooking something
> >> crucial ...
> >
> > Yes, the current setup is doing exactly that, saving the vPMU context
> > on context-switch-out, and that's where the ASSERT failure occurs
> > because the vCPU it needs to save the context for is already runnable
> > on another pCPU.
>
> Well, if that's the scenario (sorry for not understanding it that
> way earlier on), then the assertion is too strict: While in context
> switch, the vCPU may be runnable, but certainly won't actually run
> anywhere. Therefore I'd be inclined to suggest relaxing the
> condition just enough to cover this case, without going as far as
> checking for "running".
>

What ensures the vCPU won't actually run anywhere while it's in the runnable
state? And how do I relax the condition enough to cover just this case?

Tamas
