Re: [Xen-devel] [PATCH 4/5] x86/pv: Remove deferred RDTSC{, P} handling in pv_emulate_privileged_op()
>>> On 20.02.18 at 12:58,wrote: > The handling of RDTSCP for PV guests has been broken (AFAICT forever). > > To start with, RDTSCP is hidden from PV guests so the MSR_TSC_AUX path should > be unreachable. However, this appears to be a "feature" of TSC_MODE_PVRDTSCP, > and the emulator doesn't perform appropriate feature checking. (Conversely, > we unilaterally advertise RDPID which uses the same path, but it should never > trap on #GP to arrive here in the first place). > > A PV guest typically can see RDTSCP in native CPUID, so userspace will > probably end up using it. On a capable pipeline (without TSD, see below), it > will execute normally and return non-virtualised data. > > When a virtual TSC mode is not specified for the domain, CR4.TSD is left > clear, so executing RDTSCP will execute without trapping. However, a guest > kernel may set TSD itself, at which point the emulator should not suddenly > switch to virtualised TSC mode and start handing out differently-scaled > values. > > Drop all the deferral logic, and return scaled or raw TSC values depending > only on currd->arch.vtsc. This changes the exact moment at which the > timestamp is taken, but that doesn't matter from the guests point of view, and > is consistent with the HVM side of things. It also means that RDTSC and > RDTSCP are now consistent WRT handing out native or virtualised timestamps. The reason I didn't want to drop that deferral logic back when I converted this code to use the main emulator was that pv_soft_rdtsc() updates guest state beyond register values. That is if the TSC_AUX access fails (and hence instruction emulation as a whole fails), we would still have updated that other state. The stats part of this is quite likely irrelevant in this regard, but the update to d->arch.vtsc_last may yield unintended guest observable change in behavior. I don't mean to say this is a no-go, but it is a change that goes beyond what you describe, and I'd like it to (a) be spelled out and (b) given an explanation of why this is okay. > The MSR_TSC_AUX case unconditionally returns the migration incarnation or > zero, depending on TSC_MODE_PVRDTSCP, which is faster than re-reading it out > of hardware. This, iiuc, is correct solely because we don't currently permit PV guests to write TSC_AUX. Which renders wrong exposing RDPID to such guests. But without having checked yet, I could imagine patch 5 to take care of this. The changes themselves look fine to me, provided the wider behavioral change is both intended and acceptable. Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH 4/5] x86/pv: Remove deferred RDTSC{, P} handling in pv_emulate_privileged_op()
On 20/02/18 16:28, Roger Pau Monné wrote: > On Tue, Feb 20, 2018 at 11:58:42AM +, Andrew Cooper wrote: >> The handling of RDTSCP for PV guests has been broken (AFAICT forever). >> >> To start with, RDTSCP is hidden from PV guests so the MSR_TSC_AUX path should >> be unreachable. However, this appears to be a "feature" of >> TSC_MODE_PVRDTSCP, >> and the emulator doesn't perform appropriate feature checking. (Conversely, >> we unilaterally advertise RDPID which uses the same path, but it should never >> trap on #GP to arrive here in the first place). >> >> A PV guest typically can see RDTSCP in native CPUID, so userspace will >> probably end up using it. On a capable pipeline (without TSD, see below), it >> will execute normally and return non-virtualised data. >> >> When a virtual TSC mode is not specified for the domain, CR4.TSD is left >> clear, so executing RDTSCP will execute without trapping. However, a guest >> kernel may set TSD itself, at which point the emulator should not suddenly >> switch to virtualised TSC mode and start handing out differently-scaled >> values. >> >> Drop all the deferral logic, and return scaled or raw TSC values depending >> only on currd->arch.vtsc. This changes the exact moment at which the >> timestamp is taken, but that doesn't matter from the guests point of view, >> and >> is consistent with the HVM side of things. It also means that RDTSC and >> RDTSCP are now consistent WRT handing out native or virtualised timestamps. >> >> The MSR_TSC_AUX case unconditionally returns the migration incarnation or >> zero, depending on TSC_MODE_PVRDTSCP, which is faster than re-reading it out >> of hardware. >> >> This is a behavioural change for guests, but the semantics are rather more >> sane. It lays groundwork for further fixes. >> >> Signed-off-by: Andrew Cooper>> --- >> CC: Jan Beulich >> CC: Wei Liu >> CC: Roger Pau Monné >> CC: Konrad Rzeszutek Wilk >> --- >> xen/arch/x86/pv/emul-priv-op.c | 35 +-- >> 1 file changed, 5 insertions(+), 30 deletions(-) >> >> diff --git a/xen/arch/x86/pv/emul-priv-op.c b/xen/arch/x86/pv/emul-priv-op.c >> index d4d64f2..4e3641d 100644 >> --- a/xen/arch/x86/pv/emul-priv-op.c >> +++ b/xen/arch/x86/pv/emul-priv-op.c >> @@ -60,9 +60,6 @@ struct priv_op_ctxt { >> } cs; >> char *io_emul_stub; >> unsigned int bpmatch; >> -unsigned int tsc; >> -#define TSC_BASE 1 >> -#define TSC_AUX 2 >> }; >> >> /* I/O emulation support. Helper routines for, and type of, the stack stub. >> */ >> @@ -843,8 +840,7 @@ static inline bool is_cpufreq_controller(const struct >> domain *d) >> static int read_msr(unsigned int reg, uint64_t *val, >> struct x86_emulate_ctxt *ctxt) >> { >> -struct priv_op_ctxt *poc = container_of(ctxt, struct priv_op_ctxt, >> ctxt); >> -const struct vcpu *curr = current; >> +struct vcpu *curr = current; > I think you can keep the const here? pv_soft_rdtsc() mutates curr. This change is most certainly not spurious. >> const struct domain *currd = curr->domain; >> bool vpmu_msr = false; >> int ret; >> @@ -880,20 +876,13 @@ static int read_msr(unsigned int reg, uint64_t *val, >> *val = curr->arch.pv_vcpu.gs_base_user; >> return X86EMUL_OKAY; >> >> -/* >> - * In order to fully retain original behavior, defer calling >> - * pv_soft_rdtsc() until after emulation. This may want/need to be >> - * reconsidered. >> - */ >> case MSR_IA32_TSC: >> -poc->tsc |= TSC_BASE; >> -goto normal; >> +*val = currd->arch.vtsc ? pv_soft_rdtsc(curr, ctxt->regs) : rdtsc(); >> +return X86EMUL_OKAY; >> >> case MSR_TSC_AUX: >> -poc->tsc |= TSC_AUX; >> -if ( cpu_has_rdtscp ) >> -goto normal; >> -*val = 0; >> +*val = (uint32_t)((currd->arch.tsc_mode == TSC_MODE_PVRDTSCP) >> + ? currd->arch.incarnation : 0); > I wonder whether Xen should inject a #GP here if tsc_mode is not > PVRDTSCP and RDPID is not available. RDPID would be a layering violation. Also, a lot of this changes again in patch 5. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH 4/5] x86/pv: Remove deferred RDTSC{, P} handling in pv_emulate_privileged_op()
On Tue, Feb 20, 2018 at 11:58:42AM +, Andrew Cooper wrote: > The handling of RDTSCP for PV guests has been broken (AFAICT forever). > > To start with, RDTSCP is hidden from PV guests so the MSR_TSC_AUX path should > be unreachable. However, this appears to be a "feature" of TSC_MODE_PVRDTSCP, > and the emulator doesn't perform appropriate feature checking. (Conversely, > we unilaterally advertise RDPID which uses the same path, but it should never > trap on #GP to arrive here in the first place). > > A PV guest typically can see RDTSCP in native CPUID, so userspace will > probably end up using it. On a capable pipeline (without TSD, see below), it > will execute normally and return non-virtualised data. > > When a virtual TSC mode is not specified for the domain, CR4.TSD is left > clear, so executing RDTSCP will execute without trapping. However, a guest > kernel may set TSD itself, at which point the emulator should not suddenly > switch to virtualised TSC mode and start handing out differently-scaled > values. > > Drop all the deferral logic, and return scaled or raw TSC values depending > only on currd->arch.vtsc. This changes the exact moment at which the > timestamp is taken, but that doesn't matter from the guests point of view, and > is consistent with the HVM side of things. It also means that RDTSC and > RDTSCP are now consistent WRT handing out native or virtualised timestamps. > > The MSR_TSC_AUX case unconditionally returns the migration incarnation or > zero, depending on TSC_MODE_PVRDTSCP, which is faster than re-reading it out > of hardware. > > This is a behavioural change for guests, but the semantics are rather more > sane. It lays groundwork for further fixes. > > Signed-off-by: Andrew Cooper> --- > CC: Jan Beulich > CC: Wei Liu > CC: Roger Pau Monné > CC: Konrad Rzeszutek Wilk > --- > xen/arch/x86/pv/emul-priv-op.c | 35 +-- > 1 file changed, 5 insertions(+), 30 deletions(-) > > diff --git a/xen/arch/x86/pv/emul-priv-op.c b/xen/arch/x86/pv/emul-priv-op.c > index d4d64f2..4e3641d 100644 > --- a/xen/arch/x86/pv/emul-priv-op.c > +++ b/xen/arch/x86/pv/emul-priv-op.c > @@ -60,9 +60,6 @@ struct priv_op_ctxt { > } cs; > char *io_emul_stub; > unsigned int bpmatch; > -unsigned int tsc; > -#define TSC_BASE 1 > -#define TSC_AUX 2 > }; > > /* I/O emulation support. Helper routines for, and type of, the stack stub. > */ > @@ -843,8 +840,7 @@ static inline bool is_cpufreq_controller(const struct > domain *d) > static int read_msr(unsigned int reg, uint64_t *val, > struct x86_emulate_ctxt *ctxt) > { > -struct priv_op_ctxt *poc = container_of(ctxt, struct priv_op_ctxt, ctxt); > -const struct vcpu *curr = current; > +struct vcpu *curr = current; I think you can keep the const here? > const struct domain *currd = curr->domain; > bool vpmu_msr = false; > int ret; > @@ -880,20 +876,13 @@ static int read_msr(unsigned int reg, uint64_t *val, > *val = curr->arch.pv_vcpu.gs_base_user; > return X86EMUL_OKAY; > > -/* > - * In order to fully retain original behavior, defer calling > - * pv_soft_rdtsc() until after emulation. This may want/need to be > - * reconsidered. > - */ > case MSR_IA32_TSC: > -poc->tsc |= TSC_BASE; > -goto normal; > +*val = currd->arch.vtsc ? pv_soft_rdtsc(curr, ctxt->regs) : rdtsc(); > +return X86EMUL_OKAY; > > case MSR_TSC_AUX: > -poc->tsc |= TSC_AUX; > -if ( cpu_has_rdtscp ) > -goto normal; > -*val = 0; > +*val = (uint32_t)((currd->arch.tsc_mode == TSC_MODE_PVRDTSCP) > + ? currd->arch.incarnation : 0); I wonder whether Xen should inject a #GP here if tsc_mode is not PVRDTSCP and RDPID is not available. Thanks, Roger. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH 4/5] x86/pv: Remove deferred RDTSC{, P} handling in pv_emulate_privileged_op()
On Tue, Feb 20, 2018 at 11:58:42AM +, Andrew Cooper wrote: > The handling of RDTSCP for PV guests has been broken (AFAICT forever). > > To start with, RDTSCP is hidden from PV guests so the MSR_TSC_AUX path should > be unreachable. However, this appears to be a "feature" of TSC_MODE_PVRDTSCP, > and the emulator doesn't perform appropriate feature checking. (Conversely, > we unilaterally advertise RDPID which uses the same path, but it should never > trap on #GP to arrive here in the first place). > > A PV guest typically can see RDTSCP in native CPUID, so userspace will > probably end up using it. On a capable pipeline (without TSD, see below), it > will execute normally and return non-virtualised data. > > When a virtual TSC mode is not specified for the domain, CR4.TSD is left > clear, so executing RDTSCP will execute without trapping. However, a guest > kernel may set TSD itself, at which point the emulator should not suddenly > switch to virtualised TSC mode and start handing out differently-scaled > values. > > Drop all the deferral logic, and return scaled or raw TSC values depending > only on currd->arch.vtsc. This changes the exact moment at which the > timestamp is taken, but that doesn't matter from the guests point of view, and > is consistent with the HVM side of things. It also means that RDTSC and > RDTSCP are now consistent WRT handing out native or virtualised timestamps. > > The MSR_TSC_AUX case unconditionally returns the migration incarnation or > zero, depending on TSC_MODE_PVRDTSCP, which is faster than re-reading it out > of hardware. > > This is a behavioural change for guests, but the semantics are rather more > sane. It lays groundwork for further fixes. > > Signed-off-by: Andrew CooperReviewed-by: Wei Liu ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel