Re: [Xen-devel] [PATCH 4/5] x86/pv: Remove deferred RDTSC{, P} handling in pv_emulate_privileged_op()

2018-02-23 Thread Jan Beulich
>>> On 20.02.18 at 12:58,  wrote:
> The handling of RDTSCP for PV guests has been broken (AFAICT forever).
> 
> To start with, RDTSCP is hidden from PV guests so the MSR_TSC_AUX path should
> be unreachable.  However, this appears to be a "feature" of TSC_MODE_PVRDTSCP,
> and the emulator doesn't perform appropriate feature checking.  (Conversely,
> we unilaterally advertise RDPID which uses the same path, but it should never
> trap on #GP to arrive here in the first place).
> 
> A PV guest typically can see RDTSCP in native CPUID, so userspace will
> probably end up using it.  On a capable pipeline (without TSD, see below), it
> will execute normally and return non-virtualised data.
> 
> When a virtual TSC mode is not specified for the domain, CR4.TSD is left
> clear, so executing RDTSCP will execute without trapping.  However, a guest
> kernel may set TSD itself, at which point the emulator should not suddenly
> switch to virtualised TSC mode and start handing out differently-scaled
> values.
> 
> Drop all the deferral logic, and return scaled or raw TSC values depending
> only on currd->arch.vtsc.  This changes the exact moment at which the
> timestamp is taken, but that doesn't matter from the guests point of view, and
> is consistent with the HVM side of things.  It also means that RDTSC and
> RDTSCP are now consistent WRT handing out native or virtualised timestamps.

The reason I didn't want to drop that deferral logic back when I
converted this code to use the main emulator was that
pv_soft_rdtsc() updates guest state beyond register values.
That is if the TSC_AUX access fails (and hence instruction
emulation as a whole fails), we would still have updated that
other state. The stats part of this is quite likely irrelevant in this
regard, but the update to d->arch.vtsc_last may yield
unintended guest observable change in behavior.

I don't mean to say this is a no-go, but it is a change that goes
beyond what you describe, and I'd like it to (a) be spelled out
and (b) given an explanation of why this is okay.

> The MSR_TSC_AUX case unconditionally returns the migration incarnation or
> zero, depending on TSC_MODE_PVRDTSCP, which is faster than re-reading it out
> of hardware.

This, iiuc, is correct solely because we don't currently permit PV
guests to write TSC_AUX. Which renders wrong exposing RDPID
to such guests. But without having checked yet, I could imagine
patch 5 to take care of this.

The changes themselves look fine to me, provided the wider
behavioral change is both intended and acceptable.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 4/5] x86/pv: Remove deferred RDTSC{, P} handling in pv_emulate_privileged_op()

2018-02-20 Thread Andrew Cooper
On 20/02/18 16:28, Roger Pau Monné wrote:
> On Tue, Feb 20, 2018 at 11:58:42AM +, Andrew Cooper wrote:
>> The handling of RDTSCP for PV guests has been broken (AFAICT forever).
>>
>> To start with, RDTSCP is hidden from PV guests so the MSR_TSC_AUX path should
>> be unreachable.  However, this appears to be a "feature" of 
>> TSC_MODE_PVRDTSCP,
>> and the emulator doesn't perform appropriate feature checking.  (Conversely,
>> we unilaterally advertise RDPID which uses the same path, but it should never
>> trap on #GP to arrive here in the first place).
>>
>> A PV guest typically can see RDTSCP in native CPUID, so userspace will
>> probably end up using it.  On a capable pipeline (without TSD, see below), it
>> will execute normally and return non-virtualised data.
>>
>> When a virtual TSC mode is not specified for the domain, CR4.TSD is left
>> clear, so executing RDTSCP will execute without trapping.  However, a guest
>> kernel may set TSD itself, at which point the emulator should not suddenly
>> switch to virtualised TSC mode and start handing out differently-scaled
>> values.
>>
>> Drop all the deferral logic, and return scaled or raw TSC values depending
>> only on currd->arch.vtsc.  This changes the exact moment at which the
>> timestamp is taken, but that doesn't matter from the guests point of view, 
>> and
>> is consistent with the HVM side of things.  It also means that RDTSC and
>> RDTSCP are now consistent WRT handing out native or virtualised timestamps.
>>
>> The MSR_TSC_AUX case unconditionally returns the migration incarnation or
>> zero, depending on TSC_MODE_PVRDTSCP, which is faster than re-reading it out
>> of hardware.
>>
>> This is a behavioural change for guests, but the semantics are rather more
>> sane.  It lays groundwork for further fixes.
>>
>> Signed-off-by: Andrew Cooper 
>> ---
>> CC: Jan Beulich 
>> CC: Wei Liu 
>> CC: Roger Pau Monné 
>> CC: Konrad Rzeszutek Wilk 
>> ---
>>  xen/arch/x86/pv/emul-priv-op.c | 35 +--
>>  1 file changed, 5 insertions(+), 30 deletions(-)
>>
>> diff --git a/xen/arch/x86/pv/emul-priv-op.c b/xen/arch/x86/pv/emul-priv-op.c
>> index d4d64f2..4e3641d 100644
>> --- a/xen/arch/x86/pv/emul-priv-op.c
>> +++ b/xen/arch/x86/pv/emul-priv-op.c
>> @@ -60,9 +60,6 @@ struct priv_op_ctxt {
>>  } cs;
>>  char *io_emul_stub;
>>  unsigned int bpmatch;
>> -unsigned int tsc;
>> -#define TSC_BASE 1
>> -#define TSC_AUX 2
>>  };
>>  
>>  /* I/O emulation support. Helper routines for, and type of, the stack stub. 
>> */
>> @@ -843,8 +840,7 @@ static inline bool is_cpufreq_controller(const struct 
>> domain *d)
>>  static int read_msr(unsigned int reg, uint64_t *val,
>>  struct x86_emulate_ctxt *ctxt)
>>  {
>> -struct priv_op_ctxt *poc = container_of(ctxt, struct priv_op_ctxt, 
>> ctxt);
>> -const struct vcpu *curr = current;
>> +struct vcpu *curr = current;
> I think you can keep the const here?

pv_soft_rdtsc() mutates curr.  This change is most certainly not spurious.

>>  const struct domain *currd = curr->domain;
>>  bool vpmu_msr = false;
>>  int ret;
>> @@ -880,20 +876,13 @@ static int read_msr(unsigned int reg, uint64_t *val,
>>  *val = curr->arch.pv_vcpu.gs_base_user;
>>  return X86EMUL_OKAY;
>>  
>> -/*
>> - * In order to fully retain original behavior, defer calling
>> - * pv_soft_rdtsc() until after emulation. This may want/need to be
>> - * reconsidered.
>> - */
>>  case MSR_IA32_TSC:
>> -poc->tsc |= TSC_BASE;
>> -goto normal;
>> +*val = currd->arch.vtsc ? pv_soft_rdtsc(curr, ctxt->regs) : rdtsc();
>> +return X86EMUL_OKAY;
>>  
>>  case MSR_TSC_AUX:
>> -poc->tsc |= TSC_AUX;
>> -if ( cpu_has_rdtscp )
>> -goto normal;
>> -*val = 0;
>> +*val = (uint32_t)((currd->arch.tsc_mode == TSC_MODE_PVRDTSCP)
>> +  ? currd->arch.incarnation : 0);
> I wonder whether Xen should inject a #GP here if tsc_mode is not
> PVRDTSCP and RDPID is not available.

RDPID would be a layering violation.  Also, a lot of this changes again
in patch 5.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 4/5] x86/pv: Remove deferred RDTSC{, P} handling in pv_emulate_privileged_op()

2018-02-20 Thread Roger Pau Monné
On Tue, Feb 20, 2018 at 11:58:42AM +, Andrew Cooper wrote:
> The handling of RDTSCP for PV guests has been broken (AFAICT forever).
> 
> To start with, RDTSCP is hidden from PV guests so the MSR_TSC_AUX path should
> be unreachable.  However, this appears to be a "feature" of TSC_MODE_PVRDTSCP,
> and the emulator doesn't perform appropriate feature checking.  (Conversely,
> we unilaterally advertise RDPID which uses the same path, but it should never
> trap on #GP to arrive here in the first place).
> 
> A PV guest typically can see RDTSCP in native CPUID, so userspace will
> probably end up using it.  On a capable pipeline (without TSD, see below), it
> will execute normally and return non-virtualised data.
> 
> When a virtual TSC mode is not specified for the domain, CR4.TSD is left
> clear, so executing RDTSCP will execute without trapping.  However, a guest
> kernel may set TSD itself, at which point the emulator should not suddenly
> switch to virtualised TSC mode and start handing out differently-scaled
> values.
> 
> Drop all the deferral logic, and return scaled or raw TSC values depending
> only on currd->arch.vtsc.  This changes the exact moment at which the
> timestamp is taken, but that doesn't matter from the guests point of view, and
> is consistent with the HVM side of things.  It also means that RDTSC and
> RDTSCP are now consistent WRT handing out native or virtualised timestamps.
> 
> The MSR_TSC_AUX case unconditionally returns the migration incarnation or
> zero, depending on TSC_MODE_PVRDTSCP, which is faster than re-reading it out
> of hardware.
> 
> This is a behavioural change for guests, but the semantics are rather more
> sane.  It lays groundwork for further fixes.
> 
> Signed-off-by: Andrew Cooper 
> ---
> CC: Jan Beulich 
> CC: Wei Liu 
> CC: Roger Pau Monné 
> CC: Konrad Rzeszutek Wilk 
> ---
>  xen/arch/x86/pv/emul-priv-op.c | 35 +--
>  1 file changed, 5 insertions(+), 30 deletions(-)
> 
> diff --git a/xen/arch/x86/pv/emul-priv-op.c b/xen/arch/x86/pv/emul-priv-op.c
> index d4d64f2..4e3641d 100644
> --- a/xen/arch/x86/pv/emul-priv-op.c
> +++ b/xen/arch/x86/pv/emul-priv-op.c
> @@ -60,9 +60,6 @@ struct priv_op_ctxt {
>  } cs;
>  char *io_emul_stub;
>  unsigned int bpmatch;
> -unsigned int tsc;
> -#define TSC_BASE 1
> -#define TSC_AUX 2
>  };
>  
>  /* I/O emulation support. Helper routines for, and type of, the stack stub. 
> */
> @@ -843,8 +840,7 @@ static inline bool is_cpufreq_controller(const struct 
> domain *d)
>  static int read_msr(unsigned int reg, uint64_t *val,
>  struct x86_emulate_ctxt *ctxt)
>  {
> -struct priv_op_ctxt *poc = container_of(ctxt, struct priv_op_ctxt, ctxt);
> -const struct vcpu *curr = current;
> +struct vcpu *curr = current;

I think you can keep the const here?

>  const struct domain *currd = curr->domain;
>  bool vpmu_msr = false;
>  int ret;
> @@ -880,20 +876,13 @@ static int read_msr(unsigned int reg, uint64_t *val,
>  *val = curr->arch.pv_vcpu.gs_base_user;
>  return X86EMUL_OKAY;
>  
> -/*
> - * In order to fully retain original behavior, defer calling
> - * pv_soft_rdtsc() until after emulation. This may want/need to be
> - * reconsidered.
> - */
>  case MSR_IA32_TSC:
> -poc->tsc |= TSC_BASE;
> -goto normal;
> +*val = currd->arch.vtsc ? pv_soft_rdtsc(curr, ctxt->regs) : rdtsc();
> +return X86EMUL_OKAY;
>  
>  case MSR_TSC_AUX:
> -poc->tsc |= TSC_AUX;
> -if ( cpu_has_rdtscp )
> -goto normal;
> -*val = 0;
> +*val = (uint32_t)((currd->arch.tsc_mode == TSC_MODE_PVRDTSCP)
> +  ? currd->arch.incarnation : 0);

I wonder whether Xen should inject a #GP here if tsc_mode is not
PVRDTSCP and RDPID is not available.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 4/5] x86/pv: Remove deferred RDTSC{, P} handling in pv_emulate_privileged_op()

2018-02-20 Thread Wei Liu
On Tue, Feb 20, 2018 at 11:58:42AM +, Andrew Cooper wrote:
> The handling of RDTSCP for PV guests has been broken (AFAICT forever).
> 
> To start with, RDTSCP is hidden from PV guests so the MSR_TSC_AUX path should
> be unreachable.  However, this appears to be a "feature" of TSC_MODE_PVRDTSCP,
> and the emulator doesn't perform appropriate feature checking.  (Conversely,
> we unilaterally advertise RDPID which uses the same path, but it should never
> trap on #GP to arrive here in the first place).
> 
> A PV guest typically can see RDTSCP in native CPUID, so userspace will
> probably end up using it.  On a capable pipeline (without TSD, see below), it
> will execute normally and return non-virtualised data.
> 
> When a virtual TSC mode is not specified for the domain, CR4.TSD is left
> clear, so executing RDTSCP will execute without trapping.  However, a guest
> kernel may set TSD itself, at which point the emulator should not suddenly
> switch to virtualised TSC mode and start handing out differently-scaled
> values.
> 
> Drop all the deferral logic, and return scaled or raw TSC values depending
> only on currd->arch.vtsc.  This changes the exact moment at which the
> timestamp is taken, but that doesn't matter from the guests point of view, and
> is consistent with the HVM side of things.  It also means that RDTSC and
> RDTSCP are now consistent WRT handing out native or virtualised timestamps.
> 
> The MSR_TSC_AUX case unconditionally returns the migration incarnation or
> zero, depending on TSC_MODE_PVRDTSCP, which is faster than re-reading it out
> of hardware.
> 
> This is a behavioural change for guests, but the semantics are rather more
> sane.  It lays groundwork for further fixes.
> 
> Signed-off-by: Andrew Cooper 

Reviewed-by: Wei Liu 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel