RE: [PATCH v15 7/9] ptp: arm/arm64: Enable ptp_kvm for arm/arm64

2020-11-23 Thread Jianyong Wu
Hi Marc,

> -Original Message-
> From: Marc Zyngier 
> Sent: Monday, November 23, 2020 6:49 PM
> To: Jianyong Wu 
> Cc: net...@vger.kernel.org; yangbo...@nxp.com; john.stu...@linaro.org;
> t...@linutronix.de; pbonz...@redhat.com; sean.j.christopher...@intel.com;
> richardcoch...@gmail.com; Mark Rutland ;
> w...@kernel.org; Suzuki Poulose ; Andre
> Przywara ; Steven Price
> ; linux-ker...@vger.kernel.org; linux-arm-
> ker...@lists.infradead.org; kvmarm@lists.cs.columbia.edu;
> k...@vger.kernel.org; Steve Capper ; Justin He
> ; nd 
> Subject: Re: [PATCH v15 7/9] ptp: arm/arm64: Enable ptp_kvm for
> arm/arm64
> 
> On 2020-11-11 06:22, Jianyong Wu wrote:
> > Currently, there is no mechanism to keep time sync between guest and
> > host in arm/arm64 virtualization environment. Time in guest will drift
> > compared with host after boot up as they may both use third party time
> > sources to correct their time respectively. The time deviation will be
> > in order of milliseconds. But in some scenarios,like in cloud
> > envirenment, we ask
> 
> environment
> 
OK

> > for higher time precision.
> >
> > kvm ptp clock, which chooses the host clock source as a reference
> > clock to sync time between guest and host, has been adopted by x86
> > which takes the time sync order from milliseconds to nanoseconds.
> >
> > This patch enables kvm ptp clock for arm/arm64 and improves clock sync
> > precison
> 
> precision
>
OK
 
> > significantly.
> >
> > Test result comparisons between with kvm ptp clock and without it in
> > arm/arm64
> > are as follows. This test derived from the result of command 'chronyc
> > sources'. we should take more care of the last sample column which
> > shows the offset between the local clock and the source at the last
> > measurement.
> >
> > no kvm ptp in guest:
> > MS Name/IP address         Stratum Poll Reach LastRx Last sample
> > ===============================================================================
> > ^* dns1.synet.edu.cn              2   6   377    13  +1040us[+1581us] +/-   21ms
> > ^* dns1.synet.edu.cn              2   6   377    21  +1040us[+1581us] +/-   21ms
> > ^* dns1.synet.edu.cn              2   6   377    29  +1040us[+1581us] +/-   21ms
> > ^* dns1.synet.edu.cn              2   6   377    37  +1040us[+1581us] +/-   21ms
> > ^* dns1.synet.edu.cn              2   6   377    45  +1040us[+1581us] +/-   21ms
> > ^* dns1.synet.edu.cn              2   6   377    53  +1040us[+1581us] +/-   21ms
> > ^* dns1.synet.edu.cn              2   6   377    61  +1040us[+1581us] +/-   21ms
> > ^* dns1.synet.edu.cn              2   6   377     4   -130us[ +796us] +/-   21ms
> > ^* dns1.synet.edu.cn              2   6   377    12   -130us[ +796us] +/-   21ms
> > ^* dns1.synet.edu.cn              2   6   377    20   -130us[ +796us] +/-   21ms
> >
> > in host:
> > MS Name/IP address         Stratum Poll Reach LastRx Last sample
> > ===============================================================================
> > ^* 120.25.115.20                  2   7   377    72   -470us[ -603us] +/-   18ms
> > ^* 120.25.115.20                  2   7   377    92   -470us[ -603us] +/-   18ms
> > ^* 120.25.115.20                  2   7   377   112   -470us[ -603us] +/-   18ms
> > ^* 120.25.115.20                  2   7   377     2   +872ns[-6808ns] +/-   17ms
> > ^* 120.25.115.20                  2   7   377    22   +872ns[-6808ns] +/-   17ms
> > ^* 120.25.115.20                  2   7   377    43   +872ns[-6808ns] +/-   17ms
> > ^* 120.25.115.20                  2   7   377    63   +872ns[-6808ns] +/-   17ms
> > ^* 120.25.115.20                  2   7   377    83   +872ns[-6808ns] +/-   17ms
> > ^* 120.25.115.20                  2   7   377   103   +872ns[-6808ns] +/-   17ms
> > ^* 120.25.115.20                  2   7   377   123   +872ns[-6808ns] +/-   17ms
> >
> > The dns1.synet.edu.cn is the network reference clock for guest and
> > 120.25.115.20 is the network reference clock for host. we can't get
> > the clock error between guest and host directly, but a roughly
> > estimated value will be in order of hundreds of us to ms.
> >
> > with kvm ptp in guest:
> > chrony has been disabled in host to remove the disturb by network
> > clock.
> >
> > MS Name/IP address         Stratum Poll Reach LastRx Last sample
> > ===============================================================================
> > * PHC0                           0   3   377     8     -7ns[   +1ns] +/-    3ns
> > * PHC0                           0   3   377     8     +1ns[  +16ns] +/-    3ns
> > * PHC0                           0   3   377     6     -4ns[   -0ns] +/-    6ns
> > * PHC0                           0   3   377     6     -8ns[  -12ns] +/-    5ns
> > * PHC0                           0   3   377     5     +2ns[   +4ns] +/-    4ns
> > * PHC0                           0   3   377    13     +2ns[   +4ns] +/-    4ns
> > * PHC0                           0   3   377    12     -4ns[   -6ns] +/-    4ns
> > * PHC0                           0   3   377    11     -8ns[  -11ns] +/-    6ns
> > * PHC0                           0   3   377    10    -14ns[  -20ns] +/-    4ns
> > * PHC0                           0   3   377     8
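(For reference, the guest side of the "with kvm ptp" run above is typically just chrony using the kvm ptp clock as its reference clock; the exact device node is an assumption here:

    # guest /etc/chrony.conf -- illustrative, assumes the PHC shows up as /dev/ptp0
    refclock PHC /dev/ptp0 poll 3 dpoll -2 offset 0

chronyc then reports PHC0 as the selected source, as in the table above.)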

RE: [PATCH v15 8/9] doc: add ptp_kvm introduction for arm64 support

2020-11-23 Thread Jianyong Wu
Hi Marc,

> -Original Message-
> From: Marc Zyngier 
> Sent: Monday, November 23, 2020 6:58 PM
> To: Jianyong Wu 
> Cc: net...@vger.kernel.org; yangbo...@nxp.com; john.stu...@linaro.org;
> t...@linutronix.de; pbonz...@redhat.com; sean.j.christopher...@intel.com;
> richardcoch...@gmail.com; Mark Rutland ;
> w...@kernel.org; Suzuki Poulose ; Andre
> Przywara ; Steven Price
> ; linux-ker...@vger.kernel.org; linux-arm-
> ker...@lists.infradead.org; kvmarm@lists.cs.columbia.edu;
> k...@vger.kernel.org; Steve Capper ; Justin He
> ; nd 
> Subject: Re: [PATCH v15 8/9] doc: add ptp_kvm introduction for arm64
> support
> 
> On 2020-11-11 06:22, Jianyong Wu wrote:
> > PTP_KVM implementation depends on hypercall using SMCCC. So we
> > introduce a new SMCCC service ID. This doc explains how does the ID
> > define and how does PTP_KVM works on arm/arm64.
> >
> > Signed-off-by: Jianyong Wu 
> > ---
> >  Documentation/virt/kvm/api.rst |  9 +++
> >  Documentation/virt/kvm/arm/index.rst   |  1 +
> >  Documentation/virt/kvm/arm/ptp_kvm.rst | 29
> +
> > Documentation/virt/kvm/timekeeping.rst | 35
> ++
> >  4 files changed, 74 insertions(+)
> >  create mode 100644 Documentation/virt/kvm/arm/ptp_kvm.rst
> >
> > diff --git a/Documentation/virt/kvm/api.rst
> > b/Documentation/virt/kvm/api.rst index 36d5f1f3c6dd..9843dbcbf770
> > 100644
> > --- a/Documentation/virt/kvm/api.rst
> > +++ b/Documentation/virt/kvm/api.rst
> > @@ -6391,3 +6391,12 @@ When enabled, KVM will disable paravirtual
> > features provided to the  guest according to the bits in the
> > KVM_CPUID_FEATURES CPUID leaf  (0x40000001). Otherwise, a guest may
> > use the paravirtual features  regardless of what has actually been
> > exposed through the CPUID leaf.
> > +
> > +8.27 KVM_CAP_PTP_KVM
> > +
> > +
> > +:Architectures: arm64
> > +
> > +This capability indicates that KVM virtual PTP service is supported
> > +in
> > host.
> > +It must company with the implementation of KVM virtual PTP service in
> > host
> > +so VMM can probe if there is the service in host by checking this
> > capability.
> > diff --git a/Documentation/virt/kvm/arm/index.rst
> > b/Documentation/virt/kvm/arm/index.rst
> > index 3e2b2aba90fc..78a9b670aafe 100644
> > --- a/Documentation/virt/kvm/arm/index.rst
> > +++ b/Documentation/virt/kvm/arm/index.rst
> > @@ -10,3 +10,4 @@ ARM
> > hyp-abi
> > psci
> > pvtime
> > +   ptp_kvm
> > diff --git a/Documentation/virt/kvm/arm/ptp_kvm.rst
> > b/Documentation/virt/kvm/arm/ptp_kvm.rst
> > new file mode 100644
> > index ..bb1e6cfefe44
> > --- /dev/null
> > +++ b/Documentation/virt/kvm/arm/ptp_kvm.rst
> > @@ -0,0 +1,29 @@
> > +.. SPDX-License-Identifier: GPL-2.0
> > +
> > +PTP_KVM support for arm/arm64
> > +=
> > +
> > +PTP_KVM is used for time sync between guest and host in a high
> > precision.
> > +It needs to get the wall time and counter value from the host and
> > transfer these
> > +to guest via hypercall service. So one more hypercall service has
> > +been
> > added.
> > +
> > +This new SMCCC hypercall is defined as:
> > +
> > +* ARM_SMCCC_HYP_KVM_PTP_FUNC_ID: 0x86000001
> > +
> > +As both 32 and 64-bits ptp_kvm client should be supported, we choose
> > SMC32/HVC32
> > +calling convention.
> > +
> > +ARM_SMCCC_HYP_KVM_PTP_FUNC_ID:
> > +
> > +=====
> > +Function ID: (uint32)  0x86000001
> > +Arguments:  (uint32)  ARM_PTP_PHY_COUNTER(1) or
> > ARM_PTP_VIRT_COUNTER(0)
> > +   which indicate acquiring physical
> > counter or
> > +   virtual counter respectively.
> > +return value:(uint32)  NOT_SUPPORTED(-1) or val0 and val1
> > represent
> > +   wall clock time and val2 and val3
> > represent
> > +   counter cycle.
> 
> This needs a lot more description:
> 
> - Which word contains what part of the data (upper/lower part of the 64bit
> data)
> - The endianness of the data returned

OK.

Thanks
Jianyong 
> 
>  M.
> --
> Jazz is not dead. It just smells funny...
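(For readers following the series, a guest-side caller of this hypercall might look like the sketch below. The function name and error handling here are simplified assumptions; the constants are the ones introduced elsewhere in this series, and the word layout follows the comment in the hypercalls.c patch.)

#include <linux/arm-smccc.h>

static int kvm_ptp_get_crosststamp(u64 *wall_ns, u64 *cycles)
{
	struct arm_smccc_res res;

	/* Ask for the virtual counter; the HVC/SMC conduit is picked by the helper. */
	arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID,
			     ARM_PTP_VIRT_COUNTER, &res);
	if ((long)res.a0 == SMCCC_RET_NOT_SUPPORTED)
		return -EOPNOTSUPP;

	/* a0/a1: upper/lower 32 bits of wall time; a2/a3: upper/lower 32 bits of cycles. */
	*wall_ns = ((u64)res.a0 << 32) | res.a1;
	*cycles  = ((u64)res.a2 << 32) | res.a3;
	return 0;
}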


RE: [PATCH v15 6/9] arm64/kvm: Add hypercall service for kvm ptp.

2020-11-23 Thread Jianyong Wu
Hi Marc,

> -Original Message-
> From: Marc Zyngier 
> Sent: Monday, November 23, 2020 6:44 PM
> To: Jianyong Wu 
> Cc: net...@vger.kernel.org; yangbo...@nxp.com; john.stu...@linaro.org;
> t...@linutronix.de; pbonz...@redhat.com; sean.j.christopher...@intel.com;
> richardcoch...@gmail.com; Mark Rutland ;
> w...@kernel.org; Suzuki Poulose ; Andre
> Przywara ; Steven Price
> ; linux-ker...@vger.kernel.org; linux-arm-
> ker...@lists.infradead.org; kvmarm@lists.cs.columbia.edu;
> k...@vger.kernel.org; Steve Capper ; Justin He
> ; nd 
> Subject: Re: [PATCH v15 6/9] arm64/kvm: Add hypercall service for kvm ptp.
> 
> On 2020-11-11 06:22, Jianyong Wu wrote:
> > ptp_kvm will get this service through SMCC call.
> > The service offers wall time and cycle count of host to guest.
> > The caller must specify whether they want the host cycle count or the
> > difference between host cycle count and cntvoff.
> >
> > Signed-off-by: Jianyong Wu 
> > ---
> >  arch/arm64/kvm/hypercalls.c | 61
> +
> >  include/linux/arm-smccc.h   | 17 +++
> >  2 files changed, 78 insertions(+)
> >
> > diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
> > index b9d8607083eb..f7d189563f3d 100644
> > --- a/arch/arm64/kvm/hypercalls.c
> > +++ b/arch/arm64/kvm/hypercalls.c
> > @@ -9,6 +9,51 @@
> >  #include 
> >  #include 
> >
> > +static void kvm_ptp_get_time(struct kvm_vcpu *vcpu, u64 *val) {
> > +   struct system_time_snapshot systime_snapshot;
> > +   u64 cycles = ~0UL;
> > +   u32 feature;
> > +
> > +   /*
> > +* system time and counter value must captured in the same
> > +* time to keep consistency and precision.
> > +*/
> > +   ktime_get_snapshot(&systime_snapshot);
> > +
> > +   // binding ptp_kvm clocksource to arm_arch_counter
> > +   if (systime_snapshot.cs_id != CSID_ARM_ARCH_COUNTER)
> > +   return;
> > +
> > +   val[0] = upper_32_bits(systime_snapshot.real);
> > +   val[1] = lower_32_bits(systime_snapshot.real);
> 
> What is the endianness of these values? I can't see it defined anywhere, and
> this is likely not to work if guest and hypervisor don't align.
> 
> > +
> > +   /*
> > +* which of virtual counter or physical counter being
> > +* asked for is decided by the r1 value of SMCCC
> > +* call. If no invalid r1 value offered, default cycle
> > +* value(-1) will be returned.
> > +* Note: keep in mind that feature is u32 and smccc_get_arg1
> > +* will return u64, so need auto cast here.
> > +*/
> > +   feature = smccc_get_arg1(vcpu);
> > +   switch (feature) {
> > +   case ARM_PTP_VIRT_COUNTER:
> > +   cycles = systime_snapshot.cycles - vcpu_read_sys_reg(vcpu,
> > CNTVOFF_EL2);
> > +   break;
> > +   case ARM_PTP_PHY_COUNTER:
> > +   cycles = systime_snapshot.cycles;
> > +   break;
> > +   case ARM_PTP_NONE_COUNTER:
> 
> What is this "NONE" counter?

Yeah, there is no counter named "NONE". It is not a counter at all; it only
means that the guest wants no counter data, so we do nothing.
Without this arm, the call would fall through to the default case and return
"NOT_SUPPORTED".

> 
> > +   break;
> > +   default:
> > +   val[0] = SMCCC_RET_NOT_SUPPORTED;
> > +   break;
> > +   }
> > +   val[2] = upper_32_bits(cycles);
> > +   val[3] = lower_32_bits(cycles);
> 
> Same problem as above.
> 
> > +}
> > +
> >  int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
> >  {
> > 	u32 func_id = smccc_get_function(vcpu);
> > @@ -79,6 +124,22 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
> > break;
> > case ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID:
> > val[0] = BIT(ARM_SMCCC_KVM_FUNC_FEATURES);
> > +   val[0] |= BIT(ARM_SMCCC_KVM_FUNC_KVM_PTP);
> > +   break;
> > +   /*
> > +* This serves virtual kvm_ptp.
> > +* Four values will be passed back.
> > +* reg0 stores high 32-bits of host ktime;
> > +* reg1 stores low 32-bits of host ktime;
> > +* For ARM_PTP_VIRT_COUNTER:
> > +* reg2 stores high 32-bits of difference of host cycles and cntvoff;
> > +* reg3 stores low 32-bits of difference of host cycles and cntvoff.
> > +* For ARM_PTP_PHY_COUNTER:
> > +* reg2 stores the high 32-bits of host cycles;
> > +* reg3 stores the low 32-bits of host cycles.
> > +*/
> > +   case ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID:
> > +   kvm_ptp_get_time(vcpu, val);
> > break;
> > default:
> > return kvm_psci_call(vcpu);
> > diff --git a/include/linux/arm-smccc.h b/include/linux/arm-smccc.h
> > index d75408141137..a03c5dd409d3 100644
> > --- a/include/linux/arm-smccc.h
> > +++ b/include/linux/arm-smccc.h
> > @@ -103,6 +103,7 @@
> >
> >  /* KVM "vendor specific" services */
> >  #define ARM_SMCCC_KVM_FUNC_FEATURES    0
> > +#define ARM_SMCCC_KVM_FUNC_KVM_PTP 1
> 
> I think having KVM once in the name is enough.
> 
> >  #define ARM_SMCCC_KVM_FUNC_FEATURES_2  127
> >  #define 

RE: [PATCH v15 6/9] arm64/kvm: Add hypercall service for kvm ptp.

2020-11-23 Thread Jianyong Wu
Hi Marc,

> -Original Message-
> From: Marc Zyngier 
> Sent: Monday, November 23, 2020 7:59 PM
> To: Jianyong Wu 
> Cc: Justin He ; k...@vger.kernel.org;
> net...@vger.kernel.org; richardcoch...@gmail.com; linux-
> ker...@vger.kernel.org; sean.j.christopher...@intel.com; Steven Price
> ; Andre Przywara ;
> john.stu...@linaro.org; yangbo...@nxp.com; pbonz...@redhat.com;
> t...@linutronix.de; nd ; w...@kernel.org;
> kvmarm@lists.cs.columbia.edu; linux-arm-ker...@lists.infradead.org
> Subject: Re: [PATCH v15 6/9] arm64/kvm: Add hypercall service for kvm ptp.
> 
> On 2020-11-23 10:44, Marc Zyngier wrote:
> > On 2020-11-11 06:22, Jianyong Wu wrote:
> >> ptp_kvm will get this service through SMCC call.
> >> The service offers wall time and cycle count of host to guest.
> >> The caller must specify whether they want the host cycle count or the
> >> difference between host cycle count and cntvoff.
> >>
> >> Signed-off-by: Jianyong Wu 
> >> ---
> >>  arch/arm64/kvm/hypercalls.c | 61
> >> +
> >>  include/linux/arm-smccc.h   | 17 +++
> >>  2 files changed, 78 insertions(+)
> >>
> >> diff --git a/arch/arm64/kvm/hypercalls.c
> >> b/arch/arm64/kvm/hypercalls.c index b9d8607083eb..f7d189563f3d
> 100644
> >> --- a/arch/arm64/kvm/hypercalls.c
> >> +++ b/arch/arm64/kvm/hypercalls.c
> >> @@ -9,6 +9,51 @@
> >>  #include 
> >>  #include 
> >>
> >> +static void kvm_ptp_get_time(struct kvm_vcpu *vcpu, u64 *val) {
> >> +  struct system_time_snapshot systime_snapshot;
> >> +  u64 cycles = ~0UL;
> >> +  u32 feature;
> >> +
> >> +  /*
> >> +   * system time and counter value must captured in the same
> >> +   * time to keep consistency and precision.
> >> +   */
> >> +  ktime_get_snapshot(&systime_snapshot);
> >> +
> >> +  // binding ptp_kvm clocksource to arm_arch_counter
> >> +  if (systime_snapshot.cs_id != CSID_ARM_ARCH_COUNTER)
> >> +  return;
> >> +
> >> +  val[0] = upper_32_bits(systime_snapshot.real);
> >> +  val[1] = lower_32_bits(systime_snapshot.real);
> >
> > What is the endianness of these values? I can't see it defined
> > anywhere, and this is likely not to work if guest and hypervisor don't
> > align.
> 
> Scratch that. This is all passed via registers, so the endianness of the data 
> is
> irrelevant. Please discard any comment about endianness I made in this
> review.
> 
Yeah, the way these data are processed and transferred has no dependency on
endianness. Thanks.

> The documentation aspect still requires to be beefed up.

So the documentation will describe the endianness as "no endianness restriction".

Thanks 
Jianyong

> 
> Thanks,
> 
>  M.
> --
> Jazz is not dead. It just smells funny...


Re: [RFC PATCH 0/4] Add support for ARMv8.6 TWED feature

2020-11-23 Thread Jingyi Wang

Hi Marc,

Gentle ping: could you please give some comments on this series or the
current test results? Thanks in advance.

Thanks,
Jingyi

On 11/13/2020 3:54 PM, Jingyi Wang wrote:

Hi all,

Sorry for the delay. I have been testing the performance of the TWED feature
lately. We selected unixbench as the benchmark because some of its items are
lock-intensive (fstime/fsbuffer/fsdisk). We ran unixbench on a 4-VCPU VM,
binding every two VCPUs to one PCPU. A fixed TWED value was used, and the
results are below.


  twed_value | fstime | fsbuffer | fsdisk
  -----------+--------+----------+-------
  disable    |  16.0  |   14.1   |  18.0
  0          |  16.3  |   13.5   |  17.2
  1          |  17.5  |   14.7   |  17.4
  2          |  17.3  |   15.3   |  18.0
  3          |  17.7  |   15.2   |  18.9
  4          |  17.9  |   14.3   |  18.2
  5          |  17.2  |   14.1   |  19.0
  6          |   5.8  |    4.2   |   5.7
  7          |   6.2  |    5.6   |  12.8

Note:
fstime: File Copy 1024 bufsize 2000 maxblocks
fsbuffer: File Copy 256 bufsize 500 maxblocks
fsdisk: File Copy 4096 bufsize 8000 maxblocks
The numbers are unixbench index scores; higher is better.

The results show that, compared with TWED disabled, the lock-intensive test
items perform better when an appropriate TWED value is set (up to 5.6%~11.9%
improvement). We also ran the complete unixbench suite to confirm that the
other test items are not sensitive to this parameter.

Thanks
Jingyi

On 9/29/2020 5:17 PM, Jingyi Wang wrote:

TWE Delay is an optional feature in the ARMv8.6 extensions. There is a
performance benefit in waiting for a period of time for an event to
arrive before taking the trap, as it is common that the event will arrive
"quite soon" after executing the WFE instruction.

This series adds support for the TWED feature and implements dynamic
adjustment of the TWE delay value.

Thanks to Shameer for his advice on this series. The functionality of this
patch set has been tested on TWED-capable hardware; performance testing is
still in progress, and any advice is welcome.

Jingyi Wang (2):
   KVM: arm64: Make use of TWED feature
   KVM: arm64: Use dynamic TWE Delay value

Zengruan Ye (2):
   arm64: cpufeature: TWED support detection
   KVM: arm64: Add trace for TWED update

  arch/arm64/Kconfig                   | 10 +
  arch/arm64/include/asm/cpucaps.h     |  3 +-
  arch/arm64/include/asm/kvm_arm.h     |  5 +++
  arch/arm64/include/asm/kvm_emulate.h | 38 ++
  arch/arm64/include/asm/kvm_host.h    | 19 -
  arch/arm64/include/asm/virt.h        |  8
  arch/arm64/kernel/cpufeature.c       | 12 ++
  arch/arm64/kvm/arm.c                 | 58
  arch/arm64/kvm/handle_exit.c         |  2 +
  arch/arm64/kvm/trace_arm.h           | 21 ++
  10 files changed, 174 insertions(+), 2 deletions(-)
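(As background for the fixed-value experiments above, this is roughly how a fixed delay is encoded into HCR_EL2. The bit positions are the ones the Arm ARM defines for FEAT_TWED; the helper itself is an illustrative sketch rather than code from the series.)

#define HCR_TWEDEN          (UL(1) << 59)              /* enable the WFE trap delay */
#define HCR_TWEDEL_SHIFT    60                         /* 4-bit delay field, bits [63:60] */
#define HCR_TWEDEL_MASK     (UL(0xf) << HCR_TWEDEL_SHIFT)

static inline u64 hcr_set_twed(u64 hcr, u64 twed)
{
	/* Minimum delay before trapping a WFE is 2^(twed + 8) cycles. */
	hcr &= ~HCR_TWEDEL_MASK;
	return hcr | HCR_TWEDEN | (twed << HCR_TWEDEL_SHIFT);
}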




Re: [RFC PATCH 09/27] KVM: arm64: Allow using kvm_nvhe_sym() in hyp code

2020-11-23 Thread David Brazdil
On Mon, Nov 23, 2020 at 02:02:50PM +, 'Quentin Perret' via kernel-team 
wrote:
> On Monday 23 Nov 2020 at 12:57:23 (+), David Brazdil wrote:
> > > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> > > index 882eb383bd75..391cf6753a13 100644
> > > --- a/arch/arm64/kvm/arm.c
> > > +++ b/arch/arm64/kvm/arm.c
> > > @@ -1369,7 +1369,7 @@ static void cpu_prepare_hyp_mode(int cpu)
> > >  
> > >   params->vector_hyp_va = kern_hyp_va((unsigned 
> > > long)kvm_ksym_ref(__kvm_hyp_host_vector));
> > >   params->stack_hyp_va = kern_hyp_va(per_cpu(kvm_arm_hyp_stack_page, cpu) 
> > > + PAGE_SIZE);
> > > - params->entry_hyp_va = kern_hyp_va((unsigned 
> > > long)kvm_ksym_ref(__kvm_hyp_psci_cpu_entry));
> > > + params->entry_hyp_va = kern_hyp_va((unsigned 
> > > long)kvm_ksym_ref_nvhe(__kvm_hyp_psci_cpu_entry));
> > 
> > Why is this change needed?
> 
> You mean this line specifically or the whole __kvm_hyp_psci_cpu_entry
> thing?
> 
> For the latter, it is to avoid having the compiler complain about
> __kvm_hyp_psci_cpu_entry being re-defined as a different symbol. If
> there is a better way to solve this problem I'm happy to change it -- I
> must admit I got a little confused with the namespacing along the way.

Yeah, we do need a more robust approach. It's getting out of control.

> 
> Thanks,
> Quentin
> 
> -- 
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to kernel-team+unsubscr...@android.com.
> 


Re: [RFC PATCH 13/27] KVM: arm64: Enable access to sanitized CPU features at EL2

2020-11-23 Thread Quentin Perret
On Monday 23 Nov 2020 at 13:22:23 (+), David Brazdil wrote:
> Could you help me understand why we need this?
> * Why do we need PI routines in the first place? Would my series that fixes
>   relocations in hyp code remove the need?
> * You added these aliases for the string routines because you were worried
>   somebody would change the implementation in arch/arm64/lib, right? But this
>   cache flush function is defined in hyp/nvhe. So why do we need to point to
>   the PI alias if we control the implementation?

Right, in the specific case of the __flush_dcache_area() function none
of the PI stuff is really needed I think. I did it this way to keep
things as consistent as possible with the host-side implementation, but
that is not required.

I understand this can cause confusion, so yes, I'll simplify this for
v2.

Cheers,
Quentin


Re: [RFC PATCH 02/27] KVM: arm64: Link position-independent string routines into .hyp.text

2020-11-23 Thread Quentin Perret
On Monday 23 Nov 2020 at 12:34:25 (+), David Brazdil wrote:
> On Tue, Nov 17, 2020 at 06:15:42PM +, 'Quentin Perret' via kernel-team 
> wrote:
> > diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h
> > index 8539f34d7538..dd8ccc9efb6a 100644
> > --- a/arch/arm64/kernel/image-vars.h
> > +++ b/arch/arm64/kernel/image-vars.h
> > @@ -105,6 +105,17 @@ KVM_NVHE_ALIAS(__stop___kvm_ex_table);
> >  /* Array containing bases of nVHE per-CPU memory regions. */
> >  KVM_NVHE_ALIAS(kvm_arm_hyp_percpu_base);
> >  
> > +/* Position-independent library routines */
> > +__kvm_nvhe_clear_page  = __kvm_nvhe___pi_clear_page;
> > +__kvm_nvhe_copy_page   = __kvm_nvhe___pi_copy_page;
> > +__kvm_nvhe_memcpy  = __kvm_nvhe___pi_memcpy;
> > +__kvm_nvhe_memset  = __kvm_nvhe___pi_memset;
> > +
> > +#ifdef CONFIG_KASAN
> > +__kvm_nvhe___memcpy= __kvm_nvhe___pi_memcpy;
> > +__kvm_nvhe___memset= __kvm_nvhe___pi_memset;
> > +#endif
> > +
> >  #endif /* CONFIG_KVM */
> 
> Nit: Would be good to use the kvm_nvhe_sym() helper for the namespacing.
> And feel free to define something like KVM_NVHE_ALIAS for PI in hyp-image.h.

Ack, that'd be much nicer, I'll fix it up for v2.

Thanks,
Quentin
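(One possible shape for that helper, sketched for illustration -- the macro name and its placement in hyp_image.h are assumptions:)

/* In asm/hyp_image.h, next to kvm_nvhe_sym(): */
#define KVM_NVHE_ALIAS_HYP(first, sec)	kvm_nvhe_sym(first) = kvm_nvhe_sym(sec);

/* image-vars.h then becomes: */
KVM_NVHE_ALIAS_HYP(clear_page, __pi_clear_page);
KVM_NVHE_ALIAS_HYP(copy_page, __pi_copy_page);
KVM_NVHE_ALIAS_HYP(memcpy, __pi_memcpy);
KVM_NVHE_ALIAS_HYP(memset, __pi_memset);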


Re: [RFC PATCH 09/27] KVM: arm64: Allow using kvm_nvhe_sym() in hyp code

2020-11-23 Thread Quentin Perret
On Monday 23 Nov 2020 at 12:57:23 (+), David Brazdil wrote:
> > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> > index 882eb383bd75..391cf6753a13 100644
> > --- a/arch/arm64/kvm/arm.c
> > +++ b/arch/arm64/kvm/arm.c
> > @@ -1369,7 +1369,7 @@ static void cpu_prepare_hyp_mode(int cpu)
> >  
> > params->vector_hyp_va = kern_hyp_va((unsigned 
> > long)kvm_ksym_ref(__kvm_hyp_host_vector));
> > params->stack_hyp_va = kern_hyp_va(per_cpu(kvm_arm_hyp_stack_page, cpu) 
> > + PAGE_SIZE);
> > -   params->entry_hyp_va = kern_hyp_va((unsigned 
> > long)kvm_ksym_ref(__kvm_hyp_psci_cpu_entry));
> > +   params->entry_hyp_va = kern_hyp_va((unsigned 
> > long)kvm_ksym_ref_nvhe(__kvm_hyp_psci_cpu_entry));
> 
> Why is this change needed?

You mean this line specifically or the whole __kvm_hyp_psci_cpu_entry
thing?

For the latter, it is to avoid having the compiler complain about
__kvm_hyp_psci_cpu_entry being re-defined as a different symbol. If
there is a better way to solve this problem I'm happy to change it -- I
must admit I got a little confused with the namespacing along the way.

Thanks,
Quentin
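(For the record, with the hyp_image.h change in this patch, the single declaration

    void kvm_nvhe_sym(__kvm_hyp_psci_cpu_entry)(void);

expands to __kvm_nvhe___kvm_hyp_psci_cpu_entry when compiled into the kernel proper and to plain __kvm_hyp_psci_cpu_entry inside nVHE hyp objects, which is what avoids the duplicate-definition complaint described above.)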


Re: [PATCH v2 00/24] Opt-in always-on nVHE hypervisor

2020-11-23 Thread Marc Zyngier
Hi David,

On Mon, 16 Nov 2020 20:42:54 +,
David Brazdil  wrote:
> 
> As we progress towards being able to keep guest state private to the
> host running nVHE hypervisor, this series allows the hypervisor to
> install itself on newly booted CPUs before the host is allowed to run
> on them.
> 
> All functionality described below is opt-in, guarded by an early param
> 'kvm-arm.protected'. Future patches specific to the new "protected" mode
> should be hidden behind the same param.
> 
> The hypervisor starts trapping host SMCs and intercepting host's PSCI
> CPU_ON/SUSPEND calls. It replaces the host's entry point with its own,
> initializes the EL2 state of the new CPU and installs the nVHE hyp vector
> before ERETing to the host's entry point.
> 
> The kernel checks new cores' features against the finalized system
> capabilities. To avoid the need to move this code/data to EL2, the
> implementation only allows to boot cores that were online at the time of
> KVM initialization and therefore had been checked already.
> 
> Other PSCI SMCs are forwarded to EL3, though only the known set of SMCs
> implemented in the kernel is allowed. Non-PSCI SMCs are also forwarded
> to EL3. Future changes will need to ensure the safety of all SMCs wrt.
> private guests.
> 
> The host is still allowed to reset EL2 back to the stub vector, eg. for
> hibernation or kexec, but will not disable nVHE when there are no VMs.

I have now been through the whole series, and I don't think there is
anything really major (although I haven't tried it yet).

I think it would benefit from being rebased on top of kvmarm/queue, as
it'd give you the opportunity to replace a number of per-CPU state
fields with global function pointers. Another thing is that we seem to
have diverging interpretations of the PSCI spec when it comes to
CPU_SUSPEND.

Finally, please include the PSCI maintainers in your next posting, as
I'll need their Ack to pick the first few patches.

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH v2 08/24] kvm: arm64: Add SMC handler in nVHE EL2

2020-11-23 Thread Marc Zyngier
On Mon, 16 Nov 2020 20:43:02 +,
David Brazdil  wrote:
> 
> Add handler of host SMCs in KVM nVHE trap handler. Forward all SMCs to
> EL3 and propagate the result back to EL1. This is done in preparation
> for validating host SMCs in KVM nVHE protected mode.
> 
> The implementation assumes that firmware uses SMCCC v1.2 or older. That
> means x0-x17 can be used both for arguments and results, other GPRs are
> preserved.
> 
> Signed-off-by: David Brazdil 
> ---
>  arch/arm64/kvm/hyp/nvhe/host.S | 38 ++
>  arch/arm64/kvm/hyp/nvhe/hyp-main.c | 26 
>  2 files changed, 64 insertions(+)
> 
> diff --git a/arch/arm64/kvm/hyp/nvhe/host.S b/arch/arm64/kvm/hyp/nvhe/host.S
> index ed27f06a31ba..52dae5cd5a28 100644
> --- a/arch/arm64/kvm/hyp/nvhe/host.S
> +++ b/arch/arm64/kvm/hyp/nvhe/host.S
> @@ -183,3 +183,41 @@ SYM_CODE_START(__kvm_hyp_host_vector)
>   invalid_host_el1_vect   // FIQ 32-bit EL1
>   invalid_host_el1_vect   // Error 32-bit EL1
>  SYM_CODE_END(__kvm_hyp_host_vector)
> +
> +/*
> + * Forward SMC with arguments in struct kvm_cpu_context, and
> + * store the result into the same struct. Assumes SMCCC 1.2 or older.
> + *
> + * x0: struct kvm_cpu_context*
> + */
> +SYM_CODE_START(__kvm_hyp_host_forward_smc)
> + /*
> +  * Use x18 to keep a pointer to the host context because x18
> +  * is callee-saved SMCCC but not in AAPCS64.
> +  */
> + mov x18, x0
> +
> + ldp x0, x1,   [x18, #CPU_XREG_OFFSET(0)]
> + ldp x2, x3,   [x18, #CPU_XREG_OFFSET(2)]
> + ldp x4, x5,   [x18, #CPU_XREG_OFFSET(4)]
> + ldp x6, x7,   [x18, #CPU_XREG_OFFSET(6)]
> + ldp x8, x9,   [x18, #CPU_XREG_OFFSET(8)]
> + ldp x10, x11, [x18, #CPU_XREG_OFFSET(10)]
> + ldp x12, x13, [x18, #CPU_XREG_OFFSET(12)]
> + ldp x14, x15, [x18, #CPU_XREG_OFFSET(14)]
> + ldp x16, x17, [x18, #CPU_XREG_OFFSET(16)]
> +
> + smc #0
> +
> + stp x0, x1,   [x18, #CPU_XREG_OFFSET(0)]
> + stp x2, x3,   [x18, #CPU_XREG_OFFSET(2)]
> + stp x4, x5,   [x18, #CPU_XREG_OFFSET(4)]
> + stp x6, x7,   [x18, #CPU_XREG_OFFSET(6)]
> + stp x8, x9,   [x18, #CPU_XREG_OFFSET(8)]
> + stp x10, x11, [x18, #CPU_XREG_OFFSET(10)]
> + stp x12, x13, [x18, #CPU_XREG_OFFSET(12)]
> + stp x14, x15, [x18, #CPU_XREG_OFFSET(14)]
> + stp x16, x17, [x18, #CPU_XREG_OFFSET(16)]

This is going to be really good for CPUs that need to use ARCH_WA1 for
their Spectre-v2 mitigation... :-( If that's too expensive, we may
have to reduce the number of save/restored registers, but I'm worried
the battle is already lost by the time we reach this (the host trap
path is already a huge hammer).

Eventually, we'll have to insert the mitigation in the vectors anyway,
just like we have on the guest exit path. Boo.

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH v2 12/24] kvm: arm64: Bootstrap PSCI SMC handler in nVHE EL2

2020-11-23 Thread Marc Zyngier
On Mon, 16 Nov 2020 20:43:06 +,
David Brazdil  wrote:
> 
> Add a handler of PSCI SMCs in nVHE hyp code. The handler is initialized
> with the version used by the host's PSCI driver and the function IDs it
> was configured with. If the SMC function ID matches one of the
> configured PSCI calls (for v0.1) or falls into the PSCI function ID
> range (for v0.2+), the SMC is handled by the PSCI handler. For now, all
> SMCs return PSCI_RET_NOT_SUPPORTED.
> 
> Signed-off-by: David Brazdil 
> ---
>  arch/arm64/include/asm/kvm_hyp.h |   4 ++
>  arch/arm64/kvm/arm.c |  14 
>  arch/arm64/kvm/hyp/nvhe/Makefile |   2 +-
>  arch/arm64/kvm/hyp/nvhe/hyp-main.c   |   6 +-
>  arch/arm64/kvm/hyp/nvhe/psci-relay.c | 104 +++
>  5 files changed, 128 insertions(+), 2 deletions(-)
>  create mode 100644 arch/arm64/kvm/hyp/nvhe/psci-relay.c
> 
> diff --git a/arch/arm64/include/asm/kvm_hyp.h 
> b/arch/arm64/include/asm/kvm_hyp.h
> index a3289071f3d8..95a2bbbcc7e1 100644
> --- a/arch/arm64/include/asm/kvm_hyp.h
> +++ b/arch/arm64/include/asm/kvm_hyp.h
> @@ -96,6 +96,10 @@ void deactivate_traps_vhe_put(void);
>  
>  u64 __guest_enter(struct kvm_vcpu *vcpu);
>  
> +#ifdef __KVM_NVHE_HYPERVISOR__
> +bool kvm_host_psci_handler(struct kvm_cpu_context *host_ctxt);
> +#endif
> +
>  void __noreturn hyp_panic(void);
>  #ifdef __KVM_NVHE_HYPERVISOR__
>  void __noreturn __hyp_do_panic(bool restore_host, u64 spsr, u64 elr, u64 
> par);
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index cdd7981ea560..7d2270eeecfb 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -19,6 +19,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  
>  #define CREATE_TRACE_POINTS
> @@ -1514,6 +1515,18 @@ static void init_cpu_logical_map(void)
>   CHOOSE_NVHE_SYM(__cpu_logical_map)[cpu] = cpu_logical_map(cpu);
>  }
>  
> +static void init_psci_relay(void)
> +{
> + extern u32 kvm_nvhe_sym(kvm_host_psci_version);
> + extern u32 kvm_nvhe_sym(kvm_host_psci_function_id)[PSCI_FN_MAX];

nit: I'd rather have these outside of the function body.

> + int i;
> +
> + CHOOSE_NVHE_SYM(kvm_host_psci_version) = psci_ops.get_version
> + ? psci_ops.get_version() : PSCI_VERSION(0, 0);

nit: please write this with an if/else construct, it will read a lot
better.

> + for (i = 0; i < PSCI_FN_MAX; ++i)
> + CHOOSE_NVHE_SYM(kvm_host_psci_function_id)[i] = 
> psci_get_function_id(i);

Either pick kvm_nvhe_sym(), or CHOOSE_NVHE_SYM(). Having both used
together is just an annoyance (and in this case there is nothing to
choose, really).
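(Folding these remarks together -- the extern declarations pulled out of the function body, an if/else for the version, and kvm_nvhe_sym() used throughout -- the hunk above might end up roughly like this sketch:)

extern u32 kvm_nvhe_sym(kvm_host_psci_version);
extern u32 kvm_nvhe_sym(kvm_host_psci_function_id)[PSCI_FN_MAX];

static void init_psci_relay(void)
{
	int i;

	if (psci_ops.get_version)
		kvm_nvhe_sym(kvm_host_psci_version) = psci_ops.get_version();
	else
		kvm_nvhe_sym(kvm_host_psci_version) = PSCI_VERSION(0, 0);

	for (i = 0; i < PSCI_FN_MAX; ++i)
		kvm_nvhe_sym(kvm_host_psci_function_id)[i] = psci_get_function_id(i);
}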

> +}
> +
>  static int init_common_resources(void)
>  {
>   return kvm_set_ipa_limit();
> @@ -1693,6 +1706,7 @@ static int init_hyp_mode(void)
>   }
>  
>   init_cpu_logical_map();
> + init_psci_relay();
>  
>   return 0;
>  
> diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile 
> b/arch/arm64/kvm/hyp/nvhe/Makefile
> index 2d842e009a40..bf62c8e42ab2 100644
> --- a/arch/arm64/kvm/hyp/nvhe/Makefile
> +++ b/arch/arm64/kvm/hyp/nvhe/Makefile
> @@ -7,7 +7,7 @@ asflags-y := -D__KVM_NVHE_HYPERVISOR__
>  ccflags-y := -D__KVM_NVHE_HYPERVISOR__
>  
>  obj-y := timer-sr.o sysreg-sr.o debug-sr.o switch.o tlb.o hyp-init.o host.o \
> -  hyp-main.o hyp-smp.o
> +  hyp-main.o hyp-smp.o psci-relay.o
>  obj-y += ../vgic-v3-sr.o ../aarch32.o ../vgic-v2-cpuif-proxy.o ../entry.o \
>../fpsimd.o ../hyp-entry.o
>  
> diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c 
> b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> index 71a17af05953..df4acb40dd39 100644
> --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> @@ -120,7 +120,11 @@ static void skip_host_instruction(void)
>  
>  static void handle_host_smc(struct kvm_cpu_context *host_ctxt)
>  {
> - default_host_smc_handler(host_ctxt);
> + bool handled;
> +
> + handled = kvm_host_psci_handler(host_ctxt);
> + if (!handled)
> + default_host_smc_handler(host_ctxt);
>  
>   /*
>* Unlike HVC, the return address of an SMC is the instruction's PC.
> diff --git a/arch/arm64/kvm/hyp/nvhe/psci-relay.c 
> b/arch/arm64/kvm/hyp/nvhe/psci-relay.c
> new file mode 100644
> index ..d75d3f896bfd
> --- /dev/null
> +++ b/arch/arm64/kvm/hyp/nvhe/psci-relay.c
> @@ -0,0 +1,104 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (C) 2020 - Google LLC
> + * Author: David Brazdil 
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +/* Config options set by the host. */
> +u32 __ro_after_init kvm_host_psci_version = PSCI_VERSION(0, 0);
> +u32 __ro_after_init kvm_host_psci_function_id[PSCI_FN_MAX];
> +
> +static u64 get_psci_func_id(struct kvm_cpu_context *host_ctxt)
> +{
> + return host_ctxt->regs.regs[0];
> +}
> +
> +static bool is_psci_0_1_call(u64 func_id)
> +{
> + unsigned int i;
> +
> + for (i = 0; i 

Re: [RFC PATCH 13/27] KVM: arm64: Enable access to sanitized CPU features at EL2

2020-11-23 Thread Quentin Perret
On Monday 23 Nov 2020 at 10:55:20 (+), Fuad Tabba wrote:
> > diff --git a/arch/arm64/include/asm/kvm_cpufeature.h 
> > b/arch/arm64/include/asm/kvm_cpufeature.h
> > new file mode 100644
> > index ..d34f85cba358
> > --- /dev/null
> > +++ b/arch/arm64/include/asm/kvm_cpufeature.h
> > @@ -0,0 +1,17 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +/*
> > + * Copyright (C) 2020 - Google LLC
> > + * Author: Quentin Perret 
> > + */
> 
> Missing include guard.

Right, but on purpose :)

See how arm.c includes this header twice with different definitions of
KVM_HYP_CPU_FTR_REG for instance.

Thanks,
Quentin
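(The pattern in question, sketched here since the header itself is not quoted in this thread; the actual register entries are placeholders:)

/* asm/kvm_cpufeature.h -- deliberately no include guard: each user defines
 * KVM_HYP_CPU_FTR_REG before including it and gets a different expansion. */
KVM_HYP_CPU_FTR_REG(SYS_CTR_EL0, arm64_ftr_reg_ctrel0)

/* First include in arm.c, e.g. to declare the EL2 copies: */
#define KVM_HYP_CPU_FTR_REG(id, name) extern struct arm64_ftr_reg kvm_nvhe_sym(name);
#include <asm/kvm_cpufeature.h>
#undef KVM_HYP_CPU_FTR_REG

/* Second include (inside the init code), to copy the sanitized values across: */
#define KVM_HYP_CPU_FTR_REG(id, name) copy_ftr_reg(id, &kvm_nvhe_sym(name));
#include <asm/kvm_cpufeature.h>
#undef KVM_HYP_CPU_FTR_REG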


Re: [PATCH v2 23/24] kvm: arm64: Trap host SMCs in protected mode.

2020-11-23 Thread Marc Zyngier
On Mon, 16 Nov 2020 20:43:17 +,
David Brazdil  wrote:
> 
> While protected nVHE KVM is installed, start trapping all host SMCs.
> By default, these are simply forwarded to EL3, but PSCI SMCs are
> validated first.
> 
> Create new constant HCR_HOST_NVHE_PROTECTED_FLAGS with the new set of HCR
> flags to use while the nVHE vector is installed when the kernel was
> booted with the protected flag enabled. Switch back to the default HCR
> flags when switching back to the stub vector.
> 
> Signed-off-by: David Brazdil 
> ---
>  arch/arm64/include/asm/kvm_arm.h   |  1 +
>  arch/arm64/kvm/hyp/nvhe/hyp-init.S | 12 
>  arch/arm64/kvm/hyp/nvhe/switch.c   |  5 -
>  3 files changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_arm.h 
> b/arch/arm64/include/asm/kvm_arm.h
> index 64ce29378467..4e90c2debf70 100644
> --- a/arch/arm64/include/asm/kvm_arm.h
> +++ b/arch/arm64/include/asm/kvm_arm.h
> @@ -80,6 +80,7 @@
>HCR_FMO | HCR_IMO | HCR_PTW )
>  #define HCR_VIRT_EXCP_MASK (HCR_VSE | HCR_VI | HCR_VF)
>  #define HCR_HOST_NVHE_FLAGS (HCR_RW | HCR_API | HCR_APK | HCR_ATA)
> +#define HCR_HOST_NVHE_PROTECTED_FLAGS (HCR_HOST_NVHE_FLAGS | HCR_TSC)
>  #define HCR_HOST_VHE_FLAGS (HCR_RW | HCR_TGE | HCR_E2H)
>  
>  /* TCR_EL2 Registers bits */
> diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-init.S 
> b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
> index 6d8202d2bdfb..8f3602f320ac 100644
> --- a/arch/arm64/kvm/hyp/nvhe/hyp-init.S
> +++ b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
> @@ -88,6 +88,12 @@ SYM_CODE_END(__kvm_hyp_init)
>   * x0: struct kvm_nvhe_init_params PA
>   */
>  SYM_CODE_START(___kvm_hyp_init)
> +alternative_if ARM64_PROTECTED_KVM
> + mov_q   x1, HCR_HOST_NVHE_PROTECTED_FLAGS
> + msr hcr_el2, x1
> + isb

Why the ISB? For HCR_TSC to have any effect, you'll have to go via an
ERET to EL1 first, which will have the required synchronisation effect.

> +alternative_else_nop_endif
> +
>   ldr x1, [x0, #NVHE_INIT_TPIDR_EL2]
>   msr tpidr_el2, x1
>  
> @@ -224,6 +230,12 @@ reset:
>   msr sctlr_el2, x5
>   isb
>  
> +alternative_if ARM64_PROTECTED_KVM
> + mov_q   x5, HCR_HOST_NVHE_FLAGS
> + msr hcr_el2, x5
> + isb

Same thing here, I believe.

> +alternative_else_nop_endif
> +
>   /* Install stub vectors */
>   adr_l   x5, __hyp_stub_vectors
>   msr vbar_el2, x5
> diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c 
> b/arch/arm64/kvm/hyp/nvhe/switch.c
> index 8ae8160bc93a..e1f8e0797144 100644
> --- a/arch/arm64/kvm/hyp/nvhe/switch.c
> +++ b/arch/arm64/kvm/hyp/nvhe/switch.c
> @@ -96,7 +96,10 @@ static void __deactivate_traps(struct kvm_vcpu *vcpu)
>   mdcr_el2 |= MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT;
>  
>   write_sysreg(mdcr_el2, mdcr_el2);
> - write_sysreg(HCR_HOST_NVHE_FLAGS, hcr_el2);
> + if (is_protected_kvm_enabled())
> + write_sysreg(HCR_HOST_NVHE_PROTECTED_FLAGS, hcr_el2);
> + else
> + write_sysreg(HCR_HOST_NVHE_FLAGS, hcr_el2);
>   write_sysreg(CPTR_EL2_DEFAULT, cptr_el2);
>   write_sysreg(__kvm_hyp_host_vector, vbar_el2);
>  }
> -- 
> 2.29.2.299.gdc1121823c-goog
> 
> 

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH v2 21/24] kvm: arm64: Add kvm-arm.protected early kernel parameter

2020-11-23 Thread Marc Zyngier
On Mon, 16 Nov 2020 20:43:15 +,
David Brazdil  wrote:
> 
> Add an early parameter that allows users to opt into protected KVM mode
> when using the nVHE hypervisor. In this mode, guest state will be kept
> private from the host. This will primarily involve enabling stage-2
> address translation for the host, restricting DMA to host memory, and
> filtering host SMCs.
> 
> Capability ARM64_PROTECTED_KVM is set if the param is passed, CONFIG_KVM
> is enabled and the kernel was not booted with VHE.
> 
> Signed-off-by: David Brazdil 
> ---
>  arch/arm64/include/asm/cpucaps.h |  3 ++-
>  arch/arm64/include/asm/virt.h|  8 
>  arch/arm64/kernel/cpufeature.c   | 29 +
>  arch/arm64/kvm/arm.c | 10 +-
>  4 files changed, 48 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/cpucaps.h 
> b/arch/arm64/include/asm/cpucaps.h
> index e7d98997c09c..ac075f70b2e4 100644
> --- a/arch/arm64/include/asm/cpucaps.h
> +++ b/arch/arm64/include/asm/cpucaps.h
> @@ -66,7 +66,8 @@
>  #define ARM64_HAS_TLB_RANGE  56
> >  #define ARM64_MTE    57
>  #define ARM64_WORKAROUND_1508412 58
> +#define ARM64_PROTECTED_KVM  59
>  
> -#define ARM64_NCAPS  59
> +#define ARM64_NCAPS  60
>  
>  #endif /* __ASM_CPUCAPS_H */
> diff --git a/arch/arm64/include/asm/virt.h b/arch/arm64/include/asm/virt.h
> index 6069be50baf9..2fde1186b962 100644
> --- a/arch/arm64/include/asm/virt.h
> +++ b/arch/arm64/include/asm/virt.h
> @@ -97,6 +97,14 @@ static __always_inline bool has_vhe(void)
>   return cpus_have_final_cap(ARM64_HAS_VIRT_HOST_EXTN);
>  }
>  
> +static __always_inline bool is_protected_kvm_enabled(void)
> +{
> + if (is_vhe_hyp_code())
> + return false;
> + else
> + return cpus_have_final_cap(ARM64_PROTECTED_KVM);
> +}
> +
>  #endif /* __ASSEMBLY__ */
>  
>  #endif /* ! __ASM__VIRT_H */
> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> index 6f36c4f62f69..dd5bc0f0cf0d 100644
> --- a/arch/arm64/kernel/cpufeature.c
> +++ b/arch/arm64/kernel/cpufeature.c
> @@ -1709,6 +1709,29 @@ static void cpu_enable_mte(struct 
> arm64_cpu_capabilities const *cap)
>  }
>  #endif /* CONFIG_ARM64_MTE */
>  
> +#ifdef CONFIG_KVM
> +static bool enable_protected_kvm;
> +
> +static bool has_protected_kvm(const struct arm64_cpu_capabilities *entry, 
> int __unused)
> +{
> + if (!enable_protected_kvm)
> + return false;
> +
> + if (is_kernel_in_hyp_mode()) {
> + pr_warn("Protected KVM not available with VHE\n");
> + return false;
> + }
> +
> + return true;
> +}
> +
> +static int __init early_protected_kvm_cfg(char *buf)
> +{
> > + return strtobool(buf, &enable_protected_kvm);
> +}
> +early_param("kvm-arm.protected", early_protected_kvm_cfg);

Please add some documentation to
Documentation/admin-guide/kernel-parameters.txt.
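(Something along these lines, with the wording purely illustrative:)

	kvm-arm.protected=
			[KVM,ARM64] Allow spawning protected guests whose
			state is kept private from the host while the nVHE
			hypervisor stays resident. Format: <bool>.
			Default: off.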

> +#endif /* CONFIG_KVM */
> +
>  /* Internal helper functions to match cpu capability type */
>  static bool
>  cpucap_late_cpu_optional(const struct arm64_cpu_capabilities *cap)
> @@ -1822,6 +1845,12 @@ static const struct arm64_cpu_capabilities 
> arm64_features[] = {
>   .field_pos = ID_AA64PFR0_EL1_SHIFT,
>   .min_field_value = ID_AA64PFR0_EL1_32BIT_64BIT,
>   },
> + {
> + .desc = "Protected KVM",
> + .capability = ARM64_PROTECTED_KVM,
> + .type = ARM64_CPUCAP_SYSTEM_FEATURE,
> + .matches = has_protected_kvm,
> + },
>  #endif
>   {
>   .desc = "Kernel page table isolation (KPTI)",
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index c76a8e5bd19c..49d2474f2a80 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -1796,6 +1796,12 @@ int kvm_arch_init(void *opaque)
>   return -ENODEV;
>   }
>  
> + /* The PROTECTED_KVM cap should not have been enabled for VHE. */
> + if (in_hyp_mode && is_protected_kvm_enabled()) {
> + kvm_pr_unimpl("VHE protected mode unsupported, not 
> initializing\n");
> + return -ENODEV;

How can this happen? Don't we already take care of this?

> + }
> +
>   if (cpus_have_final_cap(ARM64_WORKAROUND_DEVICE_LOAD_ACQUIRE) ||
>   cpus_have_final_cap(ARM64_WORKAROUND_1508412))
>   kvm_info("Guests without required CPU erratum workarounds can 
> deadlock system!\n" \
> @@ -1827,7 +1833,9 @@ int kvm_arch_init(void *opaque)
>   if (err)
>   goto out_hyp;
>  
> - if (in_hyp_mode)
> + if (is_protected_kvm_enabled())
> + kvm_info("Protected nVHE mode initialized successfully\n");
> + else if (in_hyp_mode)
>   kvm_info("VHE mode initialized successfully\n");
>   else
>   kvm_info("Hyp mode initialized successfully\n");
> -- 
> 2.29.2.299.gdc1121823c-goog
> 
> 


Re: [RFC PATCH 13/27] KVM: arm64: Enable access to sanitized CPU features at EL2

2020-11-23 Thread David Brazdil
> +int copy_ftr_reg(u32 id, struct arm64_ftr_reg *dst)
> +{
> + struct arm64_ftr_reg *regp = get_arm64_ftr_reg(id);
> +
> + if (!regp)
> + return -EINVAL;
> +
> + memcpy(dst, regp, sizeof(*regp));
> +
> + return 0;
> +}
> +
>  #define read_sysreg_case(r)  \
>   case r: return read_sysreg_s(r)
>  
> diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h
> index dd8ccc9efb6a..c35d768672eb 100644
> --- a/arch/arm64/kernel/image-vars.h
> +++ b/arch/arm64/kernel/image-vars.h
> @@ -116,6 +116,8 @@ __kvm_nvhe___memcpy   = 
> __kvm_nvhe___pi_memcpy;
>  __kvm_nvhe___memset  = __kvm_nvhe___pi_memset;
>  #endif
>  
> +_kvm_nvhe___flush_dcache_area= 
> __kvm_nvhe___pi___flush_dcache_area;
> +

Could you help me understand why we need this?
* Why do we need PI routines in the first place? Would my series that fixes
  relocations in hyp code remove the need?
* You added these aliases for the string routines because you were worried
  somebody would change the implementation in arch/arm64/lib, right? But this
  cache flush function is defined in hyp/nvhe. So why do we need to point to
  the PI alias if we control the implementation?


Re: [PATCH v2 20/24] kvm: arm64: Intercept host's CPU_SUSPEND PSCI SMCs

2020-11-23 Thread Marc Zyngier
Adding Lorenzo and Sudeep to this one in particular, as there is a bit
of a corner case below.

On Mon, 16 Nov 2020 20:43:14 +,
David Brazdil  wrote:
> 
> Add a handler of CPU_SUSPEND host PSCI SMCs. The SMC can either enter
> a sleep state indistinguishable from a WFI or a deeper sleep state that
> behaves like a CPU_OFF+CPU_ON.
> 
> The handler saves r0,pc of the host and makes the same call to EL3 with
> the hyp CPU entry point. It either returns back to the handler and then
> back to the host, or wakes up into the entry point and initializes EL2
> state before dropping back to EL1.
> 
> There is a simple atomic lock around the reset state struct to protect
> from races with CPU_ON. A well-behaved host should never run CPU_ON
> against an already online core, and the kernel indeed does not allow
> that, so if the core sees its reset state struct locked, it will return
> a non-spec error code PENDING_ON. This protects the hypervisor state and

"non-spec" as in "outside of the PSCI specification"? Err...

> avoids the need for more complicated locking and/or tracking power state
> of individual cores.
> 
> Signed-off-by: David Brazdil 
> ---
>  arch/arm64/kvm/hyp/nvhe/psci-relay.c | 39 +++-
>  1 file changed, 38 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/kvm/hyp/nvhe/psci-relay.c 
> b/arch/arm64/kvm/hyp/nvhe/psci-relay.c
> index 2daf52b59846..313ef42f0eab 100644
> --- a/arch/arm64/kvm/hyp/nvhe/psci-relay.c
> +++ b/arch/arm64/kvm/hyp/nvhe/psci-relay.c
> @@ -121,6 +121,39 @@ static void release_reset_state(struct 
> kvm_host_psci_state *cpu_state)
>   atomic_set_release(&cpu_state->pending_on, 0);
>  }
>  
> +static int psci_cpu_suspend(u64 func_id, struct kvm_cpu_context *host_ctxt)
> +{
> + u64 power_state = host_ctxt->regs.regs[1];
> + unsigned long pc = host_ctxt->regs.regs[2];
> + unsigned long r0 = host_ctxt->regs.regs[3];
> + struct kvm_host_psci_state *cpu_state;
> + struct kvm_nvhe_init_params *cpu_params;
> + int ret;
> +
> > + cpu_state = this_cpu_ptr(&kvm_host_psci_state);
> > + cpu_params = this_cpu_ptr(&kvm_init_params);
> +
> + /*
> +  * Lock the reset state struct. This fails if the host has concurrently
> +  * called CPU_ON with this CPU as target. The kernel keeps track of
> +  * online CPUs, so that should never happen. If it does anyway, return
> +  * a non-spec error. This avoids the need for spinlocks.
> +  */
> + if (!try_acquire_reset_state(cpu_state, pc, r0))
> + return PSCI_RET_ALREADY_ON;

So that's the core of the problem. I'm definitely not keen on EL2
returning unspecified error codes. But there is something I don't get:

If the CPU is currently booting (reset state is locked), it means that
CPU hasn't reached the EL1 kernel yet. So how can this same CPU issue
a CPU_SUSPEND from EL1? CPU_SUSPEND can't be called for a third party,
only by a CPU for itself.

It looks like this case cannot happen by construction. And if it
happens, it looks like the only course of action should be to panic,
as we have lost track of the running CPUs. Am I missing something
obvious?

> +
> + /*
> +  * Will either return if shallow sleep state, or wake up into the entry
> +  * point if it is a deep sleep state.
> +  */
> + ret = psci_call(func_id, power_state,
> + __hyp_pa(hyp_symbol_addr(__kvm_hyp_cpu_entry)),
> + __hyp_pa(cpu_params));
> +
> + release_reset_state(cpu_state);
> + return ret;
> +}
> +
>  static int psci_cpu_on(u64 func_id, struct kvm_cpu_context *host_ctxt)
>  {
>   u64 mpidr = host_ctxt->regs.regs[1];
> @@ -178,7 +211,9 @@ asmlinkage void __noreturn __kvm_hyp_psci_cpu_entry(void)
>  
>  static unsigned long psci_0_1_handler(u64 func_id, struct kvm_cpu_context 
> *host_ctxt)
>  {
> - if (func_id == kvm_host_psci_function_id[PSCI_FN_CPU_OFF])
> + if (func_id == kvm_host_psci_function_id[PSCI_FN_CPU_SUSPEND])
> + return psci_cpu_suspend(func_id, host_ctxt);
> + else if (func_id == kvm_host_psci_function_id[PSCI_FN_CPU_OFF])
>   return psci_forward(host_ctxt);
>   else if (func_id == kvm_host_psci_function_id[PSCI_FN_CPU_ON])
>   return psci_cpu_on(func_id, host_ctxt);
> @@ -202,6 +237,8 @@ static unsigned long psci_0_2_handler(u64 func_id, struct 
> kvm_cpu_context *host_
>   case PSCI_0_2_FN_SYSTEM_RESET:
>   psci_forward_noreturn(host_ctxt);
>   unreachable();
> + case PSCI_0_2_FN64_CPU_SUSPEND:
> + return psci_cpu_suspend(func_id, host_ctxt);
>   case PSCI_0_2_FN64_CPU_ON:
>   return psci_cpu_on(func_id, host_ctxt);
>   default:
> -- 
> 2.29.2.299.gdc1121823c-goog
> 
> 

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.

Re: [PATCH v2 19/24] kvm: arm64: Intercept host's PSCI_CPU_ON SMCs

2020-11-23 Thread Marc Zyngier
On Mon, 16 Nov 2020 20:43:13 +,
David Brazdil  wrote:
> 
> Add a handler of the CPU_ON PSCI call from host. When invoked, it looks
> up the logical CPU ID corresponding to the provided MPIDR and populates
> the state struct of the target CPU with the provided x0, pc. It then
> calls CPU_ON itself, with an entry point in hyp that initializes EL2
> state before returning ERET to the provided PC in EL1.
> 
> There is a simple atomic lock around the reset state struct. If it is
> already locked, CPU_ON will return PENDING_ON error code.
> 
> Signed-off-by: David Brazdil 
> ---
>  arch/arm64/include/asm/kvm_asm.h |   8 ++-
>  arch/arm64/kvm/arm.c |   1 +
>  arch/arm64/kvm/hyp/nvhe/psci-relay.c | 104 +++
>  3 files changed, 110 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_asm.h 
> b/arch/arm64/include/asm/kvm_asm.h
> index 109867fb76f6..2e36ba4be748 100644
> --- a/arch/arm64/include/asm/kvm_asm.h
> +++ b/arch/arm64/include/asm/kvm_asm.h
> @@ -175,9 +175,11 @@ struct kvm_s2_mmu;
>  DECLARE_KVM_NVHE_SYM(__kvm_hyp_init);
>  DECLARE_KVM_NVHE_SYM(__kvm_hyp_host_vector);
>  DECLARE_KVM_HYP_SYM(__kvm_hyp_vector);
> -#define __kvm_hyp_init   CHOOSE_NVHE_SYM(__kvm_hyp_init)
> -#define __kvm_hyp_host_vectorCHOOSE_NVHE_SYM(__kvm_hyp_host_vector)
> -#define __kvm_hyp_vector CHOOSE_HYP_SYM(__kvm_hyp_vector)
> +DECLARE_KVM_NVHE_SYM(__kvm_hyp_psci_cpu_entry);
> +#define __kvm_hyp_init   CHOOSE_NVHE_SYM(__kvm_hyp_init)
> +#define __kvm_hyp_host_vector
> CHOOSE_NVHE_SYM(__kvm_hyp_host_vector)
> +#define __kvm_hyp_vector CHOOSE_HYP_SYM(__kvm_hyp_vector)
> +#define __kvm_hyp_psci_cpu_entry 
> CHOOSE_NVHE_SYM(__kvm_hyp_psci_cpu_entry)
>  
>  extern unsigned long kvm_arm_hyp_percpu_base[NR_CPUS];
>  DECLARE_KVM_NVHE_SYM(__per_cpu_start);
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 7d2270eeecfb..c76a8e5bd19c 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -1365,6 +1365,7 @@ static void cpu_init_hyp_mode(void)
>  
>   params->vector_hyp_va = (unsigned 
> long)kern_hyp_va(kvm_ksym_ref(__kvm_hyp_host_vector));
>   params->stack_hyp_va = 
> kern_hyp_va(__this_cpu_read(kvm_arm_hyp_stack_page) + PAGE_SIZE);
> + params->entry_hyp_va = (unsigned 
> long)kern_hyp_va(kvm_ksym_ref(__kvm_hyp_psci_cpu_entry));

It feels really odd to use a per-CPU variable to keep track of
something that is essentially a constant. Why can't we just have an
assembly version of __kimg_hyp_va() and use that to compute the branch
target directly in __kvm_hyp_cpu_entry()? __kvm_hyp_host_vector is
another one.

>   params->pgd_pa = kvm_mmu_get_httbr();
>  
>   /*
> diff --git a/arch/arm64/kvm/hyp/nvhe/psci-relay.c 
> b/arch/arm64/kvm/hyp/nvhe/psci-relay.c
> index 7542de8bd679..2daf52b59846 100644
> --- a/arch/arm64/kvm/hyp/nvhe/psci-relay.c
> +++ b/arch/arm64/kvm/hyp/nvhe/psci-relay.c
> @@ -9,10 +9,15 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
>  
> +#define INVALID_CPU_ID UINT_MAX
> +
> +extern char __kvm_hyp_cpu_entry[];
> +
>  /* Config options set by the host. */
>  u32 __ro_after_init kvm_host_psci_version = PSCI_VERSION(0, 0);
>  u32 __ro_after_init kvm_host_psci_function_id[PSCI_FN_MAX];
> @@ -20,6 +25,14 @@ s64 __ro_after_init hyp_physvirt_offset;
>  
>  #define __hyp_pa(x) ((phys_addr_t)((x)) + hyp_physvirt_offset)
>  
> +struct kvm_host_psci_state {
> + atomic_t pending_on;
> + unsigned long pc;
> + unsigned long r0;
> +};
> +
> +static DEFINE_PER_CPU(struct kvm_host_psci_state, kvm_host_psci_state);
> +
>  static u64 get_psci_func_id(struct kvm_cpu_context *host_ctxt)
>  {
>   return host_ctxt->regs.regs[0];
> @@ -76,10 +89,99 @@ static __noreturn unsigned long 
> psci_forward_noreturn(struct kvm_cpu_context *ho
>   hyp_panic(); /* unreachable */
>  }
>  
> +static unsigned int find_cpu_id(u64 mpidr)
> +{
> + int i;

nit: unsigned int?

> +
> + if (mpidr != INVALID_HWID) {

This is a little ugly on the side [(c) FZ], and deserves a comment
("Reject MPIDRs matching the init value of the __cpu_logical_map[]
array"?).

Also, I personally prefer a construct that reduces the nesting:

if (mpidr == INVALID_HWID)
return INVALID_CPU_ID;

> + for (i = 0; i < NR_CPUS; i++) {
> + if (cpu_logical_map(i) == mpidr)
> + return i;
> + }
> + }
> +
> + return INVALID_CPU_ID;
> +}
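(Folding the nits above together, the helper might end up as the following sketch:)

static unsigned int find_cpu_id(u64 mpidr)
{
	unsigned int i;

	/* Reject MPIDRs matching the init value of the __cpu_logical_map[] array. */
	if (mpidr == INVALID_HWID)
		return INVALID_CPU_ID;

	for (i = 0; i < NR_CPUS; i++) {
		if (cpu_logical_map(i) == mpidr)
			return i;
	}

	return INVALID_CPU_ID;
}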
> +
> +static bool try_acquire_reset_state(struct kvm_host_psci_state *cpu_state,
> + unsigned long pc, unsigned long r0)
> +{
> + if (atomic_cmpxchg_acquire(&cpu_state->pending_on, 0, 1) != 0)

What guarantees that this cmpxchg is inlined here? Also, having some
names for 0 and 1 would be nice.

> + return false;
> +
> + cpu_state->pc = pc;
> + 

Re: [RFC PATCH 09/27] KVM: arm64: Allow using kvm_nvhe_sym() in hyp code

2020-11-23 Thread David Brazdil
On Tue, Nov 17, 2020 at 06:15:49PM +, 'Quentin Perret' via kernel-team 
wrote:
> In order to allow the usage of code shared by the host and the hyp in
> static inline library function, allow the usage of kvm_nvhe_sym() at el2
> by defaulting to the raw symbol name.
> 
> Signed-off-by: Quentin Perret 
> ---
>  arch/arm64/include/asm/hyp_image.h | 4 
>  arch/arm64/include/asm/kvm_asm.h   | 4 ++--
>  arch/arm64/kvm/arm.c   | 2 +-
>  3 files changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/hyp_image.h 
> b/arch/arm64/include/asm/hyp_image.h
> index daa1a1da539e..8b807b646b8f 100644
> --- a/arch/arm64/include/asm/hyp_image.h
> +++ b/arch/arm64/include/asm/hyp_image.h
> @@ -7,11 +7,15 @@
>  #ifndef __ARM64_HYP_IMAGE_H__
>  #define __ARM64_HYP_IMAGE_H__
>  
> +#ifndef __KVM_NVHE_HYPERVISOR__
>  /*
>   * KVM nVHE code has its own symbol namespace prefixed with __kvm_nvhe_,
>   * to separate it from the kernel proper.
>   */
>  #define kvm_nvhe_sym(sym)__kvm_nvhe_##sym
> +#else
> +#define kvm_nvhe_sym(sym)sym
> +#endif
>  
>  #ifdef LINKER_SCRIPT
>  
> diff --git a/arch/arm64/include/asm/kvm_asm.h 
> b/arch/arm64/include/asm/kvm_asm.h
> index 1a86581e581e..e4934f5e4234 100644
> --- a/arch/arm64/include/asm/kvm_asm.h
> +++ b/arch/arm64/include/asm/kvm_asm.h
> @@ -173,11 +173,11 @@ struct kvm_s2_mmu;
>  DECLARE_KVM_NVHE_SYM(__kvm_hyp_init);
>  DECLARE_KVM_NVHE_SYM(__kvm_hyp_host_vector);
>  DECLARE_KVM_HYP_SYM(__kvm_hyp_vector);
> -DECLARE_KVM_NVHE_SYM(__kvm_hyp_psci_cpu_entry);
>  #define __kvm_hyp_init   CHOOSE_NVHE_SYM(__kvm_hyp_init)
>  #define __kvm_hyp_host_vector
> CHOOSE_NVHE_SYM(__kvm_hyp_host_vector)
>  #define __kvm_hyp_vector CHOOSE_HYP_SYM(__kvm_hyp_vector)
> -#define __kvm_hyp_psci_cpu_entry 
> CHOOSE_NVHE_SYM(__kvm_hyp_psci_cpu_entry)
> +
> +void kvm_nvhe_sym(__kvm_hyp_psci_cpu_entry)(void);
>  
>  extern unsigned long kvm_arm_hyp_percpu_base[NR_CPUS];
>  DECLARE_KVM_NVHE_SYM(__per_cpu_start);
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 882eb383bd75..391cf6753a13 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -1369,7 +1369,7 @@ static void cpu_prepare_hyp_mode(int cpu)
>  
>   params->vector_hyp_va = kern_hyp_va((unsigned 
> long)kvm_ksym_ref(__kvm_hyp_host_vector));
>   params->stack_hyp_va = kern_hyp_va(per_cpu(kvm_arm_hyp_stack_page, cpu) 
> + PAGE_SIZE);
> - params->entry_hyp_va = kern_hyp_va((unsigned 
> long)kvm_ksym_ref(__kvm_hyp_psci_cpu_entry));
> + params->entry_hyp_va = kern_hyp_va((unsigned 
> long)kvm_ksym_ref_nvhe(__kvm_hyp_psci_cpu_entry));

Why is this change needed?

>   params->pgd_pa = kvm_mmu_get_httbr();
>  
>   /*
> -- 
> 2.29.2.299.gdc1121823c-goog
> 


Re: [RFC PATCH 08/27] KVM: arm64: Make kvm_call_hyp() a function call at Hyp

2020-11-23 Thread David Brazdil
On Tue, Nov 17, 2020 at 06:15:48PM +, 'Quentin Perret' via kernel-team 
wrote:
> kvm_call_hyp() has some logic to issue a function call or a hypercall
> depending the EL at which the kernel is running. However, all the code
> compiled under __KVM_NVHE_HYPERVISOR__ is guaranteed to run only at EL2,
> and in this case a simple function call is needed.
> 
> Add ifdefery to kvm_host.h to simplify kvm_call_hyp() in .hyp.text.
> 
> Signed-off-by: Quentin Perret 
> ---
>  arch/arm64/include/asm/kvm_host.h | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h 
> b/arch/arm64/include/asm/kvm_host.h
> index ac11adab6602..7a5d5f4b3351 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -557,6 +557,7 @@ int kvm_test_age_hva(struct kvm *kvm, unsigned long hva);
>  void kvm_arm_halt_guest(struct kvm *kvm);
>  void kvm_arm_resume_guest(struct kvm *kvm);
>  
> +#ifndef __KVM_NVHE_HYPERVISOR__
>  #define kvm_call_hyp_nvhe(f, ...)
> \
>   ({  \
>   struct arm_smccc_res res;   \
> @@ -596,6 +597,11 @@ void kvm_arm_resume_guest(struct kvm *kvm);
>   \
>   ret;\
>   })
> +#else /* __KVM_NVHE_HYPERVISOR__ */
> +#define kvm_call_hyp(f, ...) f(__VA_ARGS__)
> +#define kvm_call_hyp_ret(f, ...) f(__VA_ARGS__)
> +#define kvm_call_hyp_nvhe(f, ...) f(__VA_ARGS__)
> +#endif /* __KVM_NVHE_HYPERVISOR__ */

I was hoping we could define this as the following instead. That would require
adding host-side declarations of all functions currently called with _nvhe.

#define kvm_call_hyp_nvhe(f, ...)   \
+   is_nvhe_hyp_code() ? f(__VA_ARGS__) :   \
({  \
struct arm_smccc_res res;   \
\
arm_smccc_1_1_hvc(KVM_HOST_SMCCC_FUNC(f),   \
##__VA_ARGS__, &res);   \
WARN_ON(res.a0 != SMCCC_RET_SUCCESS);   \
\
res.a1; \
})

Up to you what you think is cleaner, just my 2 cents...
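
(For reference, the extra host-side declarations needed would just be ordinary
prototypes of the hyp entry points, e.g. (purely an example, not a complete
list):

	void __kvm_flush_vm_context(void);

one per function currently invoked through kvm_call_hyp_nvhe().)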


Re: [RFC PATCH 02/27] KVM: arm64: Link position-independent string routines into .hyp.text

2020-11-23 Thread David Brazdil
On Tue, Nov 17, 2020 at 06:15:42PM +, 'Quentin Perret' via kernel-team 
wrote:
> From: Will Deacon 
> 
> Pull clear_page(), copy_page(), memcpy() and memset() into the nVHE hyp
> code and ensure that we always execute the '__pi_' entry point on the
> offchance that it changes in future.
> 
> [ qperret: Commit title nits ]
> 
> Signed-off-by: Will Deacon 
> Signed-off-by: Quentin Perret 
> ---
>  arch/arm64/kernel/image-vars.h   | 11 +++
>  arch/arm64/kvm/hyp/nvhe/Makefile |  4 
>  2 files changed, 15 insertions(+)
> 
> diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h
> index 8539f34d7538..dd8ccc9efb6a 100644
> --- a/arch/arm64/kernel/image-vars.h
> +++ b/arch/arm64/kernel/image-vars.h
> @@ -105,6 +105,17 @@ KVM_NVHE_ALIAS(__stop___kvm_ex_table);
>  /* Array containing bases of nVHE per-CPU memory regions. */
>  KVM_NVHE_ALIAS(kvm_arm_hyp_percpu_base);
>  
> +/* Position-independent library routines */
> +__kvm_nvhe_clear_page		= __kvm_nvhe___pi_clear_page;
> +__kvm_nvhe_copy_page		= __kvm_nvhe___pi_copy_page;
> +__kvm_nvhe_memcpy		= __kvm_nvhe___pi_memcpy;
> +__kvm_nvhe_memset		= __kvm_nvhe___pi_memset;
> +
> +#ifdef CONFIG_KASAN
> +__kvm_nvhe___memcpy  = __kvm_nvhe___pi_memcpy;
> +__kvm_nvhe___memset  = __kvm_nvhe___pi_memset;
> +#endif
> +
>  #endif /* CONFIG_KVM */

Nit: Would be good to use the kvm_nvhe_sym() helper for the namespacing.
And feel free to define something like KVM_NVHE_ALIAS for PI in hyp-image.h.
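
Something along these lines, say (just a sketch; the macro name and placement
are invented here):

	/* hyp_image.h, next to KVM_NVHE_ALIAS() */
	#define KVM_NVHE_ALIAS_HYP(first, sec)	kvm_nvhe_sym(first) = kvm_nvhe_sym(sec);

	/* image-vars.h */
	KVM_NVHE_ALIAS_HYP(clear_page, __pi_clear_page);
	KVM_NVHE_ALIAS_HYP(copy_page, __pi_copy_page);
	KVM_NVHE_ALIAS_HYP(memcpy, __pi_memcpy);
	KVM_NVHE_ALIAS_HYP(memset, __pi_memset);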

>  
>  #endif /* __ARM64_KERNEL_IMAGE_VARS_H */
> diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile 
> b/arch/arm64/kvm/hyp/nvhe/Makefile
> index 1f1e351c5fe2..590fdefb42dd 100644
> --- a/arch/arm64/kvm/hyp/nvhe/Makefile
> +++ b/arch/arm64/kvm/hyp/nvhe/Makefile
> @@ -6,10 +6,14 @@
>  asflags-y := -D__KVM_NVHE_HYPERVISOR__
>  ccflags-y := -D__KVM_NVHE_HYPERVISOR__
>  
> +lib-objs := clear_page.o copy_page.o memcpy.o memset.o
> +lib-objs := $(addprefix ../../../lib/, $(lib-objs))
> +
>  obj-y := timer-sr.o sysreg-sr.o debug-sr.o switch.o tlb.o hyp-init.o host.o \
>hyp-main.o hyp-smp.o psci-relay.o
>  obj-y += ../vgic-v3-sr.o ../aarch32.o ../vgic-v2-cpuif-proxy.o ../entry.o \
>../fpsimd.o ../hyp-entry.o ../exception.o
> +obj-y += $(lib-objs)
>  
>  ##
>  ## Build rules for compiling nVHE hyp code
> -- 
> 2.29.2.299.gdc1121823c-goog
> 


Re: [PATCH v2 15/24] kvm: arm64: Extract parts of el2_setup into a macro

2020-11-23 Thread Marc Zyngier
On Mon, 16 Nov 2020 20:43:09 +,
David Brazdil  wrote:
> 
> When a CPU is booted in EL2, the kernel checks for VHE support and
> initializes the CPU core accordingly. For nVHE it also installs the stub
> vectors and drops down to EL1.
> 
> Once KVM gains the ability to boot cores without going through the
> kernel entry point, it will need to initialize the CPU the same way.
> Extract the relevant bits of el2_setup into an init_el2_state macro
> with an argument specifying whether to initialize for VHE or nVHE.
> 
> No functional change. Size of el2_setup increased by 148 bytes due
> to duplication.
> 
> Signed-off-by: David Brazdil 
> ---
>  arch/arm64/include/asm/el2_setup.h | 185 +
>  arch/arm64/kernel/head.S   | 144 +++---
>  2 files changed, 201 insertions(+), 128 deletions(-)
>  create mode 100644 arch/arm64/include/asm/el2_setup.h
> 
> diff --git a/arch/arm64/include/asm/el2_setup.h 
> b/arch/arm64/include/asm/el2_setup.h
> new file mode 100644
> index ..e5026e0aa878
> --- /dev/null
> +++ b/arch/arm64/include/asm/el2_setup.h
> @@ -0,0 +1,185 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (C) 2012,2013 - ARM Ltd
> + * Author: Marc Zyngier 
> + */
> +
> +#ifndef __ARM_KVM_INIT_H__
> +#define __ARM_KVM_INIT_H__
> +
> +#ifndef __ASSEMBLY__
> +#error Assembly-only header
> +#endif
> +
> +#ifdef CONFIG_ARM_GIC_V3
> +#include 
> +#endif
> +
> +#include 
> +#include 
> +#include 
> +
> +.macro __init_el2_sctlr
> + mov_q   x0, (SCTLR_EL2_RES1 | ENDIAN_SET_EL2)
> + msr sctlr_el2, x0
> + isb
> +.endm
> +
> +/*
> + * Allow Non-secure EL1 and EL0 to access physical timer and counter.
> + * This is not necessary for VHE, since the host kernel runs in EL2,
> + * and EL0 accesses are configured in the later stage of boot process.
> + * Note that when HCR_EL2.E2H == 1, CNTHCTL_EL2 has the same bit layout
> + * as CNTKCTL_EL1, and CNTKCTL_EL1 accessing instructions are redefined
> + * to access CNTHCTL_EL2. This allows the kernel designed to run at EL1
> + * to transparently mess with the EL0 bits via CNTKCTL_EL1 access in
> + * EL2.
> + */
> +.macro __init_el2_timers mode
> +.ifeqs "\mode", "nvhe"
> + mrs x0, cnthctl_el2
> + orr x0, x0, #3  // Enable EL1 physical timers
> + msr cnthctl_el2, x0
> +.endif
> + msr cntvoff_el2, xzr// Clear virtual offset
> +.endm
> +
> +.macro __init_el2_debug mode
> + mrs x1, id_aa64dfr0_el1
> + sbfx x0, x1, #ID_AA64DFR0_PMUVER_SHIFT, #4
> + cmp x0, #1
> + b.lt1f  // Skip if no PMU present
> + mrs x0, pmcr_el0// Disable debug access traps
> + ubfx x0, x0, #11, #5 // to EL2 and allow access to
> +1:
> + csel x2, xzr, x0, lt // all PMU counters from EL1
> +
> + /* Statistical profiling */
> + ubfx x0, x1, #ID_AA64DFR0_PMSVER_SHIFT, #4
> + cbz x0, 3f  // Skip if SPE not present
> +
> +.ifeqs "\mode", "nvhe"
> + mrs_s   x0, SYS_PMBIDR_EL1  // If SPE available at EL2,
> + and x0, x0, #(1 << SYS_PMBIDR_EL1_P_SHIFT)
> + cbnz x0, 2f  // then permit sampling of physical
> + mov x0, #(1 << SYS_PMSCR_EL2_PCT_SHIFT | \
> +   1 << SYS_PMSCR_EL2_PA_SHIFT)
> + msr_s   SYS_PMSCR_EL2, x0   // addresses and physical counter
> +2:
> + mov x0, #(MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT)
> + orr x2, x2, x0  // If we don't have VHE, then
> + // use EL1&0 translation.
> +.else
> + orr x2, x2, #MDCR_EL2_TPMS  // For VHE, use EL2 translation
> + // and disable access from EL1
> +.endif
> +
> +3:
> + msr mdcr_el2, x2// Configure debug traps
> +.endm
> +
> +/* LORegions */
> +.macro __init_el2_lor
> + mrs x1, id_aa64mmfr1_el1
> + ubfx x0, x1, #ID_AA64MMFR1_LOR_SHIFT, 4
> + cbz x0, 1f
> + msr_s   SYS_LORC_EL1, xzr
> +1:
> +.endm
> +
> +/* Stage-2 translation */
> +.macro __init_el2_stage2
> + msr vttbr_el2, xzr
> +.endm
> +
> +/* GICv3 system register access */
> +#ifdef CONFIG_ARM_GIC_V3

nit: this #ifdef isn't relevant anymore and can be dropped throughout
the file.

> +.macro __init_el2_gicv3
> + mrs x0, id_aa64pfr0_el1
> + ubfx x0, x0, #ID_AA64PFR0_GIC_SHIFT, #4
> + cbz x0, 1f
> +
> + mrs_s   x0, SYS_ICC_SRE_EL2
> + orr x0, x0, #ICC_SRE_EL2_SRE// Set ICC_SRE_EL2.SRE==1
> + orr x0, x0, #ICC_SRE_EL2_ENABLE // Set ICC_SRE_EL2.Enable==1
> + msr_s   SYS_ICC_SRE_EL2, x0
> + isb // Make sure SRE is now set
> + mrs_s   x0, SYS_ICC_SRE_EL2 // Read SRE 

Re: [RFC PATCH 13/27] KVM: arm64: Enable access to sanitized CPU features at EL2

2020-11-23 Thread Fuad Tabba
Hi Quentin,

On Tue, Nov 17, 2020 at 6:16 PM 'Quentin Perret' via kernel-team
 wrote:
>
> Introduce the infrastructure in KVM enabling to copy CPU feature
> registers into EL2-owned data-structures, to allow reading sanitised
> values directly at EL2 in nVHE.
>
> Given that only a subset of these features are being read by the
> hypervisor, the ones that need to be copied are to be listed under
>  together with the name of the nVHE variable that
> will hold the copy.
>
> While at it, introduce the first user of this infrastructure by
> implementing __flush_dcache_area at EL2, which needs
> arm64_ftr_reg_ctrel0.
>
> Signed-off-by: Quentin Perret 
> ---
>  arch/arm64/include/asm/cpufeature.h |  1 +
>  arch/arm64/include/asm/kvm_cpufeature.h | 17 ++
>  arch/arm64/kernel/cpufeature.c  | 12 ++
>  arch/arm64/kernel/image-vars.h  |  2 ++
>  arch/arm64/kvm/arm.c| 31 +
>  arch/arm64/kvm/hyp/nvhe/Makefile|  3 ++-
>  arch/arm64/kvm/hyp/nvhe/cache.S | 13 +++
>  arch/arm64/kvm/hyp/nvhe/cpufeature.c|  8 +++
>  8 files changed, 86 insertions(+), 1 deletion(-)
>  create mode 100644 arch/arm64/include/asm/kvm_cpufeature.h
>  create mode 100644 arch/arm64/kvm/hyp/nvhe/cache.S
>  create mode 100644 arch/arm64/kvm/hyp/nvhe/cpufeature.c
>
> diff --git a/arch/arm64/include/asm/cpufeature.h 
> b/arch/arm64/include/asm/cpufeature.h
> index da250e4741bd..3dfbd76fb647 100644
> --- a/arch/arm64/include/asm/cpufeature.h
> +++ b/arch/arm64/include/asm/cpufeature.h
> @@ -600,6 +600,7 @@ void __init setup_cpu_features(void);
>  void check_local_cpu_capabilities(void);
>
>  u64 read_sanitised_ftr_reg(u32 id);
> +int copy_ftr_reg(u32 id, struct arm64_ftr_reg *dst);
>
>  static inline bool cpu_supports_mixed_endian_el0(void)
>  {
> diff --git a/arch/arm64/include/asm/kvm_cpufeature.h 
> b/arch/arm64/include/asm/kvm_cpufeature.h
> new file mode 100644
> index ..d34f85cba358
> --- /dev/null
> +++ b/arch/arm64/include/asm/kvm_cpufeature.h
> @@ -0,0 +1,17 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (C) 2020 - Google LLC
> + * Author: Quentin Perret 
> + */

Missing include guard.
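
i.e. something like (the guard name is only an example):

	#ifndef __ARM64_KVM_CPUFEATURE_H__
	#define __ARM64_KVM_CPUFEATURE_H__
	...
	#endif /* __ARM64_KVM_CPUFEATURE_H__ */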


> +
> +#include 
> +
> +#ifndef KVM_HYP_CPU_FTR_REG
> +#if defined(__KVM_NVHE_HYPERVISOR__)
> +#define KVM_HYP_CPU_FTR_REG(id, name) extern struct arm64_ftr_reg name;
> +#else
> +#define KVM_HYP_CPU_FTR_REG(id, name) DECLARE_KVM_NVHE_SYM(name);
> +#endif
> +#endif
> +
> +KVM_HYP_CPU_FTR_REG(SYS_CTR_EL0, arm64_ftr_reg_ctrel0)
> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> index dd5bc0f0cf0d..3bc86d1423f8 100644
> --- a/arch/arm64/kernel/cpufeature.c
> +++ b/arch/arm64/kernel/cpufeature.c
> @@ -1116,6 +1116,18 @@ u64 read_sanitised_ftr_reg(u32 id)
>  }
>  EXPORT_SYMBOL_GPL(read_sanitised_ftr_reg);
>
> +int copy_ftr_reg(u32 id, struct arm64_ftr_reg *dst)
> +{
> +   struct arm64_ftr_reg *regp = get_arm64_ftr_reg(id);
> +
> +   if (!regp)
> +   return -EINVAL;
> +
> +   memcpy(dst, regp, sizeof(*regp));
> +
> +   return 0;
> +}
> +
>  #define read_sysreg_case(r)\
> case r: return read_sysreg_s(r)
>
> diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h
> index dd8ccc9efb6a..c35d768672eb 100644
> --- a/arch/arm64/kernel/image-vars.h
> +++ b/arch/arm64/kernel/image-vars.h
> @@ -116,6 +116,8 @@ __kvm_nvhe___memcpy = 
> __kvm_nvhe___pi_memcpy;
>  __kvm_nvhe___memset		= __kvm_nvhe___pi_memset;
>  #endif
>
> +_kvm_nvhe___flush_dcache_area  = __kvm_nvhe___pi___flush_dcache_area;
> +
>  #endif /* CONFIG_KVM */
>
>  #endif /* __ARM64_KERNEL_IMAGE_VARS_H */
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 391cf6753a13..c7f8fca97202 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -34,6 +34,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -1636,6 +1637,29 @@ static void teardown_hyp_mode(void)
> }
>  }
>
> +#undef KVM_HYP_CPU_FTR_REG
> +#define KVM_HYP_CPU_FTR_REG(id, name) \
> +   { .sys_id = id, .dst = (struct arm64_ftr_reg *)&kvm_nvhe_sym(name) },
> +static const struct __ftr_reg_copy_entry {
> +   u32 sys_id;
> +   struct arm64_ftr_reg*dst;
> +} hyp_ftr_regs[] = {
> +   #include 
> +};
> +
> +static int copy_cpu_ftr_regs(void)
> +{
> +   int i, ret;
> +
> +   for (i = 0; i < ARRAY_SIZE(hyp_ftr_regs); i++) {
> +   ret = copy_ftr_reg(hyp_ftr_regs[i].sys_id, 
> hyp_ftr_regs[i].dst);
> +   if (ret)
> +   return ret;
> +   }
> +
> +   return 0;
> +}
> +
>  /**
>   * Inits Hyp-mode on all online CPUs
>   */
> @@ -1644,6 +1668,13 @@ static int init_hyp_mode(void)
> int cpu;
> int err = 0;
>
> +   /*
> +* Copy the required CPU feature register in their EL2 counterpart

Re: [PATCH v2 07/24] kvm: arm64: Refactor handle_trap to use a switch

2020-11-23 Thread Marc Zyngier
On Mon, 16 Nov 2020 20:43:01 +,
David Brazdil  wrote:
> 
> Small refactor so that nVHE's handle_trap uses a switch on the Exception
> Class value of ESR_EL2 in preparation for adding a handler of SMC32/64.

nit: SMC32 seems to be a leftover from the previous version.

>
> Signed-off-by: David Brazdil 
> ---
>  arch/arm64/kvm/hyp/nvhe/hyp-main.c | 15 ---
>  1 file changed, 8 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c 
> b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> index 411b0f652417..19332c20fcde 100644
> --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> @@ -16,9 +16,9 @@
>  
>  DEFINE_PER_CPU(struct kvm_nvhe_init_params, kvm_init_params);
>  
> -static void handle_host_hcall(unsigned long func_id,
> -   struct kvm_cpu_context *host_ctxt)
> +static void handle_host_hcall(struct kvm_cpu_context *host_ctxt)
>  {
> + unsigned long func_id = host_ctxt->regs.regs[0];
>   unsigned long ret = 0;
>  
>   switch (func_id) {
> @@ -109,11 +109,12 @@ static void handle_host_hcall(unsigned long func_id,
>  void handle_trap(struct kvm_cpu_context *host_ctxt)
>  {
>   u64 esr = read_sysreg_el2(SYS_ESR);
> - unsigned long func_id;
>  
> - if (ESR_ELx_EC(esr) != ESR_ELx_EC_HVC64)
> + switch (ESR_ELx_EC(esr)) {
> + case ESR_ELx_EC_HVC64:
> + handle_host_hcall(host_ctxt);
> + break;
> + default:
>   hyp_panic();
> -
> - func_id = host_ctxt->regs.regs[0];
> - handle_host_hcall(func_id, host_ctxt);
> + }
>  }
> -- 
> 2.29.2.299.gdc1121823c-goog
> 
> 

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH v2 06/24] kvm: arm64: Move hyp-init params to a per-CPU struct

2020-11-23 Thread Marc Zyngier
On Mon, 16 Nov 2020 20:43:00 +,
David Brazdil  wrote:
> 
> Once we start initializing KVM on newly booted cores before the rest of
> the kernel, parameters to __do_hyp_init will need to be provided by EL2
> rather than EL1. At that point it will not be possible to pass its four
> arguments directly because PSCI_CPU_ON only supports one context
> argument.
> 
> Refactor __do_hyp_init to accept its parameters in a struct. This
> prepares the code for KVM booting cores as well as removes any limits on
> the number of __do_hyp_init arguments.
> 
> Signed-off-by: David Brazdil 
> ---
>  arch/arm64/include/asm/kvm_asm.h   |  7 +++
>  arch/arm64/include/asm/kvm_hyp.h   |  4 
>  arch/arm64/kernel/asm-offsets.c|  4 
>  arch/arm64/kvm/arm.c   | 26 ++
>  arch/arm64/kvm/hyp/nvhe/hyp-init.S | 21 ++---
>  arch/arm64/kvm/hyp/nvhe/hyp-main.c |  2 ++
>  6 files changed, 41 insertions(+), 23 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_asm.h 
> b/arch/arm64/include/asm/kvm_asm.h
> index 54387ccd1ab2..01904e88cead 100644
> --- a/arch/arm64/include/asm/kvm_asm.h
> +++ b/arch/arm64/include/asm/kvm_asm.h
> @@ -150,6 +150,13 @@ extern void *__vhe_undefined_symbol;
>  
>  #endif
>  
> +struct kvm_nvhe_init_params {
> + unsigned long tpidr_el2;
> + unsigned long vector_hyp_va;
> + unsigned long stack_hyp_va;
> + phys_addr_t pgd_pa;
> +};
> +
>  /* Translate a kernel address @ptr into its equivalent linear mapping */
>  #define kvm_ksym_ref(ptr)\
>   ({  \
> diff --git a/arch/arm64/include/asm/kvm_hyp.h 
> b/arch/arm64/include/asm/kvm_hyp.h
> index 6b664de5ec1f..a3289071f3d8 100644
> --- a/arch/arm64/include/asm/kvm_hyp.h
> +++ b/arch/arm64/include/asm/kvm_hyp.h
> @@ -15,6 +15,10 @@
>  DECLARE_PER_CPU(struct kvm_cpu_context, kvm_hyp_ctxt);
>  DECLARE_PER_CPU(unsigned long, kvm_hyp_vector);
>  
> +#ifdef __KVM_NVHE_HYPERVISOR__
> +DECLARE_PER_CPU(struct kvm_nvhe_init_params, kvm_init_params);
> +#endif

I'm not sure we should bother with this #ifdefery. Having the
declaration present at all times doesn't really hurt, since it is only
defined in the HYP code. Cutting down on the conditionals would
certainly help readability.

> +
>  #define read_sysreg_elx(r,nvh,vh)\
>   ({  \
>   u64 reg;\
> diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
> index 7d32fc959b1a..4435ad8be938 100644
> --- a/arch/arm64/kernel/asm-offsets.c
> +++ b/arch/arm64/kernel/asm-offsets.c
> @@ -110,6 +110,10 @@ int main(void)
>DEFINE(CPU_APGAKEYLO_EL1,  offsetof(struct kvm_cpu_context, 
> sys_regs[APGAKEYLO_EL1]));
>DEFINE(HOST_CONTEXT_VCPU,  offsetof(struct kvm_cpu_context, 
> __hyp_running_vcpu));
>DEFINE(HOST_DATA_CONTEXT,  offsetof(struct kvm_host_data, host_ctxt));
> +  DEFINE(NVHE_INIT_TPIDR_EL2,offsetof(struct kvm_nvhe_init_params, 
> tpidr_el2));
> +  DEFINE(NVHE_INIT_VECTOR_HYP_VA,offsetof(struct kvm_nvhe_init_params, 
> vector_hyp_va));
> +  DEFINE(NVHE_INIT_STACK_HYP_VA, offsetof(struct kvm_nvhe_init_params, 
> stack_hyp_va));
> +  DEFINE(NVHE_INIT_PGD_PA,   offsetof(struct kvm_nvhe_init_params, pgd_pa));
>  #endif
>  #ifdef CONFIG_CPU_PM
>DEFINE(CPU_CTX_SP, offsetof(struct cpu_suspend_ctx, sp));
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index c0ffb019ca8b..4838556920fb 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -50,6 +50,7 @@ DECLARE_KVM_HYP_PER_CPU(unsigned long, kvm_hyp_vector);
>  
>  static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
>  unsigned long kvm_arm_hyp_percpu_base[NR_CPUS];
> +DECLARE_KVM_NVHE_PER_CPU(struct kvm_nvhe_init_params, kvm_init_params);
>  
>  /* The VMID used in the VTTBR */
>  static atomic64_t kvm_vmid_gen = ATOMIC64_INIT(1);
> @@ -1347,10 +1348,7 @@ static int kvm_map_vectors(void)
>  
>  static void cpu_init_hyp_mode(void)
>  {
> - phys_addr_t pgd_ptr;
> - unsigned long hyp_stack_ptr;
> - unsigned long vector_ptr;
> - unsigned long tpidr_el2;
> + struct kvm_nvhe_init_params *params = 
> this_cpu_ptr_nvhe_sym(kvm_init_params);
>   struct arm_smccc_res res;
>  
>   /* Switch from the HYP stub to our own HYP init vector */
> @@ -1361,13 +1359,18 @@ static void cpu_init_hyp_mode(void)
>* kernel's mapping to the linear mapping, and store it in tpidr_el2
>* so that we can use adr_l to access per-cpu variables in EL2.
>*/
> - tpidr_el2 = (unsigned long)this_cpu_ptr_nvhe_sym(__per_cpu_start) -
> - (unsigned 
> long)kvm_ksym_ref(CHOOSE_NVHE_SYM(__per_cpu_start));
> + params->tpidr_el2 = (unsigned 
> long)this_cpu_ptr_nvhe_sym(__per_cpu_start) -
> + 

Re: [PATCH v2 04/24] arm64: Move MAIR_EL1_SET to asm/memory.h

2020-11-23 Thread Marc Zyngier
On Mon, 16 Nov 2020 20:42:58 +,
David Brazdil  wrote:
> 
> KVM currently initializes MAIR_EL2 to the value of MAIR_EL1. In
> preparation for initializing MAIR_EL2 before MAIR_EL1, move the constant
> into a shared header file. Since it is used for EL1 and EL2, rename to
> MAIR_ELx_SET.
> 
> Signed-off-by: David Brazdil 
> ---
>  arch/arm64/include/asm/memory.h | 29 ++---
>  arch/arm64/include/asm/sysreg.h | 30 ++
>  arch/arm64/mm/proc.S| 15 +--
>  3 files changed, 45 insertions(+), 29 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
> index cd61239bae8c..8ae8fd883a0c 100644
> --- a/arch/arm64/include/asm/memory.h
> +++ b/arch/arm64/include/asm/memory.h
> @@ -13,6 +13,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  /*
>   * Size of the PCI I/O space. This must remain a power of two so that
> @@ -124,21 +125,6 @@
>   */
>  #define SEGMENT_ALIGN		SZ_64K
>  
> -/*
> - * Memory types available.
> - *
> - * IMPORTANT: MT_NORMAL must be index 0 since vm_get_page_prot() may 'or' in
> - * the MT_NORMAL_TAGGED memory type for PROT_MTE mappings. Note
> - * that protection_map[] only contains MT_NORMAL attributes.
> - */
> -#define MT_NORMAL		0
> -#define MT_NORMAL_TAGGED	1
> -#define MT_NORMAL_NC		2
> -#define MT_NORMAL_WT		3
> -#define MT_DEVICE_nGnRnE	4
> -#define MT_DEVICE_nGnRE		5
> -#define MT_DEVICE_GRE		6
> -
>  /*
>   * Memory types for Stage-2 translation
>   */
> @@ -152,6 +138,19 @@
>  #define MT_S2_FWB_NORMAL 6
>  #define MT_S2_FWB_DEVICE_nGnRE   1
>  
> +/*
> + * Default MAIR_EL1. MT_NORMAL_TAGGED is initially mapped as Normal memory 
> and
> + * changed during __cpu_setup to Normal Tagged if the system supports MTE.
> + */
> +#define MAIR_ELx_SET \
> + (MAIR_ATTRIDX(MAIR_ATTR_DEVICE_nGnRnE, MT_DEVICE_nGnRnE) |  \
> +  MAIR_ATTRIDX(MAIR_ATTR_DEVICE_nGnRE, MT_DEVICE_nGnRE) |\
> +  MAIR_ATTRIDX(MAIR_ATTR_DEVICE_GRE, MT_DEVICE_GRE) |\
> +  MAIR_ATTRIDX(MAIR_ATTR_NORMAL_NC, MT_NORMAL_NC) |  \
> +  MAIR_ATTRIDX(MAIR_ATTR_NORMAL, MT_NORMAL) |\
> +  MAIR_ATTRIDX(MAIR_ATTR_NORMAL_WT, MT_NORMAL_WT) |  \
> +  MAIR_ATTRIDX(MAIR_ATTR_NORMAL, MT_NORMAL_TAGGED))
> +
>  #ifdef CONFIG_ARM64_4K_PAGES
>  #define IOREMAP_MAX_ORDER	(PUD_SHIFT)
>  #else
> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> index e2ef4c2edf06..24e773414cb4 100644
> --- a/arch/arm64/include/asm/sysreg.h
> +++ b/arch/arm64/include/asm/sysreg.h
> @@ -635,6 +635,34 @@
>  /* Position the attr at the correct index */
>  #define MAIR_ATTRIDX(attr, idx)  ((attr) << ((idx) * 8))
>  
> +/*
> + * Memory types available.
> + *
> + * IMPORTANT: MT_NORMAL must be index 0 since vm_get_page_prot() may 'or' in
> + * the MT_NORMAL_TAGGED memory type for PROT_MTE mappings. Note
> + * that protection_map[] only contains MT_NORMAL attributes.
> + */
> +#define MT_NORMAL		0
> +#define MT_NORMAL_TAGGED	1
> +#define MT_NORMAL_NC		2
> +#define MT_NORMAL_WT		3
> +#define MT_DEVICE_nGnRnE	4
> +#define MT_DEVICE_nGnRE		5
> +#define MT_DEVICE_GRE		6
> +
> +/*
> + * Default MAIR_ELx. MT_NORMAL_TAGGED is initially mapped as Normal memory 
> and
> + * changed during __cpu_setup to Normal Tagged if the system supports MTE.
> + */
> +#define MAIR_ELx_SET \
> + (MAIR_ATTRIDX(MAIR_ATTR_DEVICE_nGnRnE, MT_DEVICE_nGnRnE) |  \
> +  MAIR_ATTRIDX(MAIR_ATTR_DEVICE_nGnRE, MT_DEVICE_nGnRE) |\
> +  MAIR_ATTRIDX(MAIR_ATTR_DEVICE_GRE, MT_DEVICE_GRE) |\
> +  MAIR_ATTRIDX(MAIR_ATTR_NORMAL_NC, MT_NORMAL_NC) |  \
> +  MAIR_ATTRIDX(MAIR_ATTR_NORMAL, MT_NORMAL) |\
> +  MAIR_ATTRIDX(MAIR_ATTR_NORMAL_WT, MT_NORMAL_WT) |  \
> +  MAIR_ATTRIDX(MAIR_ATTR_NORMAL, MT_NORMAL_TAGGED))
> +
>  /* id_aa64isar0 */
>  #define ID_AA64ISAR0_RNDR_SHIFT  60
>  #define ID_AA64ISAR0_TLB_SHIFT   56
> @@ -992,6 +1020,7 @@
>  /* Safe value for MPIDR_EL1: Bit31:RES1, Bit30:U:0, Bit24:MT:0 */
>  #define SYS_MPIDR_SAFE_VAL   (BIT(31))
>  
> +#ifndef LINKER_SCRIPT

This is terribly ugly. Why is this included by the linker script? Does
it actually define __ASSEMBLY__?

>  #ifdef __ASSEMBLY__
>  
>   .irp
> num,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30
> @@ -1109,5 +1138,6 @@
>  })
>  
>  #endif
> +#endif   /* LINKER_SCRIPT */
>  
>  #endif   /* __ASM_SYSREG_H */
> diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
> index 23c326a06b2d..e3b9aa372b96 100644
> --- a/arch/arm64/mm/proc.S
> +++ 

Re: [PATCH v2 02/24] psci: Accessor for configured PSCI function IDs

2020-11-23 Thread Marc Zyngier
On Mon, 16 Nov 2020 20:42:56 +,
David Brazdil  wrote:
> 
> Function IDs used by PSCI are configurable for v0.1 via DT/APCI. If the
> host is using PSCI v0.1, KVM's host PSCI proxy needs to use the same IDs.
> Expose the array holding the information with a read-only accessor.
> 
> Signed-off-by: David Brazdil 
> ---
>  drivers/firmware/psci/psci.c | 14 ++
>  include/linux/psci.h | 10 ++
>  2 files changed, 16 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
> index 213c68418a65..d835f3d8b121 100644
> --- a/drivers/firmware/psci/psci.c
> +++ b/drivers/firmware/psci/psci.c
> @@ -58,16 +58,14 @@ typedef unsigned long (psci_fn)(unsigned long, unsigned 
> long,
>   unsigned long, unsigned long);
>  static psci_fn *invoke_psci_fn;
>  
> -enum psci_function {
> - PSCI_FN_CPU_SUSPEND,
> - PSCI_FN_CPU_ON,
> - PSCI_FN_CPU_OFF,
> - PSCI_FN_MIGRATE,
> - PSCI_FN_MAX,
> -};
> -
>  static u32 psci_function_id[PSCI_FN_MAX];
>  
> +u32 psci_get_function_id(enum psci_function fn)
> +{
> + WARN_ON(fn >= PSCI_FN_MAX);

If we are going to warn on something that is out of bounds, maybe we
shouldn't perform the access at all? And maybe check for lower bound
as well?
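
For example, something like this (purely illustrative; the return value on
error is a guess):

	u32 psci_get_function_id(enum psci_function fn)
	{
		/* catches negative values as well as fn >= PSCI_FN_MAX */
		if (WARN_ON_ONCE((unsigned long)fn >= PSCI_FN_MAX))
			return 0;

		return psci_function_id[fn];
	}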

> + return psci_function_id[fn];
> +}
> +
>  #define PSCI_0_2_POWER_STATE_MASK\
>   (PSCI_0_2_POWER_STATE_ID_MASK | \
>   PSCI_0_2_POWER_STATE_TYPE_MASK | \
> diff --git a/include/linux/psci.h b/include/linux/psci.h
> index 2a1bfb890e58..5b49a5c82d6f 100644
> --- a/include/linux/psci.h
> +++ b/include/linux/psci.h
> @@ -21,6 +21,16 @@ bool psci_power_state_is_valid(u32 state);
>  int psci_set_osi_mode(bool enable);
>  bool psci_has_osi_support(void);
>  
> +enum psci_function {
> + PSCI_FN_CPU_SUSPEND,
> + PSCI_FN_CPU_ON,
> + PSCI_FN_CPU_OFF,
> + PSCI_FN_MIGRATE,
> + PSCI_FN_MAX,
> +};
> +
> +u32 psci_get_function_id(enum psci_function fn);
> +
>  struct psci_operations {
>   u32 (*get_version)(void);
>   int (*cpu_suspend)(u32 state, unsigned long entry_point);
> -- 
> 2.29.2.299.gdc1121823c-goog
> 
> 

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH v2 00/24] Opt-in always-on nVHE hypervisor

2020-11-23 Thread Marc Zyngier
On Mon, 16 Nov 2020 20:42:54 +,
David Brazdil  wrote:
> 
> As we progress towards being able to keep guest state private to the
> host running nVHE hypervisor, this series allows the hypervisor to
> install itself on newly booted CPUs before the host is allowed to run
> on them.
> 
> All functionality described below is opt-in, guarded by an early param
> 'kvm-arm.protected'. Future patches specific to the new "protected" mode
> should be hidden behind the same param.
> 
> The hypervisor starts trapping host SMCs and intercepting host's PSCI
> CPU_ON/SUSPEND calls. It replaces the host's entry point with its own,
> initializes the EL2 state of the new CPU and installs the nVHE hyp vector
> before ERETing to the host's entry point.
> 
> The kernel checks new cores' features against the finalized system
> capabilities. To avoid the need to move this code/data to EL2, the
> implementation only allows to boot cores that were online at the time of
> KVM initialization and therefore had been checked already.
> 
> Other PSCI SMCs are forwarded to EL3, though only the known set of SMCs
> implemented in the kernel is allowed. Non-PSCI SMCs are also forwarded
> to EL3. Future changes will need to ensure the safety of all SMCs wrt.
> private guests.
> 
> The host is still allowed to reset EL2 back to the stub vector, eg. for
> hibernation or kexec, but will not disable nVHE when there are no VMs.
> 
> Tested on Rock Pi 4b, based on 5.10-rc4.

Adding Lorenzo and Sudeep for the PSCI side of things.

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.


RE: [PATCH v11 08/13] vfio/pci: Add framework for custom interrupt indices

2020-11-23 Thread Shameerali Kolothum Thodi
Hi Eric,

> -Original Message-
> From: Eric Auger [mailto:eric.au...@redhat.com]
> Sent: 16 November 2020 11:00
> To: eric.auger@gmail.com; eric.au...@redhat.com;
> io...@lists.linux-foundation.org; linux-ker...@vger.kernel.org;
> k...@vger.kernel.org; kvmarm@lists.cs.columbia.edu; w...@kernel.org;
> j...@8bytes.org; m...@kernel.org; robin.mur...@arm.com;
> alex.william...@redhat.com
> Cc: jean-phili...@linaro.org; zhangfei@linaro.org;
> zhangfei@gmail.com; vivek.gau...@arm.com; Shameerali Kolothum
> Thodi ;
> jacob.jun@linux.intel.com; yi.l@intel.com; t...@semihalf.com;
> nicoleots...@gmail.com; yuzenghui 
> Subject: [PATCH v11 08/13] vfio/pci: Add framework for custom interrupt
> indices
> 
> Implement IRQ capability chain infrastructure. All interrupt
> indexes beyond VFIO_PCI_NUM_IRQS are handled as extended
> interrupts. They are registered with a specific type/subtype
> and supported flags.
> 
> Signed-off-by: Eric Auger 
> ---
>  drivers/vfio/pci/vfio_pci.c | 99 +++--
>  drivers/vfio/pci/vfio_pci_intrs.c   | 62 ++
>  drivers/vfio/pci/vfio_pci_private.h | 14 
>  3 files changed, 157 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index 2a6cc1a87323..93e03a4a5f32 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -608,6 +608,14 @@ static void vfio_pci_disable(struct vfio_pci_device
> *vdev)
> 
> + WARN_ON(iommu_unregister_device_fault_handler(&vdev->pdev->dev));
> 
> + for (i = 0; i < vdev->num_ext_irqs; i++)
> + vfio_pci_set_irqs_ioctl(vdev, VFIO_IRQ_SET_DATA_NONE |
> + VFIO_IRQ_SET_ACTION_TRIGGER,
> + VFIO_PCI_NUM_IRQS + i, 0, 0, NULL);
> + vdev->num_ext_irqs = 0;
> + kfree(vdev->ext_irqs);
> + vdev->ext_irqs = NULL;
> +
>   /* Device closed, don't need mutex here */
>   list_for_each_entry_safe(ioeventfd, ioeventfd_tmp,
> 		 &vdev->ioeventfds_list, next) {
> @@ -823,6 +831,9 @@ static int vfio_pci_get_irq_count(struct vfio_pci_device
> *vdev, int irq_type)
>   return 1;
>   } else if (irq_type == VFIO_PCI_REQ_IRQ_INDEX) {
>   return 1;
> + } else if (irq_type >= VFIO_PCI_NUM_IRQS &&
> +irq_type < VFIO_PCI_NUM_IRQS + vdev->num_ext_irqs) {
> + return 1;
>   }
> 
>   return 0;
> @@ -1008,7 +1019,7 @@ static long vfio_pci_ioctl(void *device_data,
>   info.flags |= VFIO_DEVICE_FLAGS_RESET;
> 
>   info.num_regions = VFIO_PCI_NUM_REGIONS + vdev->num_regions;
> - info.num_irqs = VFIO_PCI_NUM_IRQS;
> + info.num_irqs = VFIO_PCI_NUM_IRQS + vdev->num_ext_irqs;
> 
>   if (IS_ENABLED(CONFIG_VFIO_PCI_ZDEV)) {
> + int ret = vfio_pci_info_zdev_add_caps(vdev, &caps);
> @@ -1187,36 +1198,87 @@ static long vfio_pci_ioctl(void *device_data,
> 
>   } else if (cmd == VFIO_DEVICE_GET_IRQ_INFO) {
>   struct vfio_irq_info info;
> + struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
> + unsigned long capsz;
> 
>   minsz = offsetofend(struct vfio_irq_info, count);
> 
> + /* For backward compatibility, cannot require this */
> + capsz = offsetofend(struct vfio_irq_info, cap_offset);
> +
>  if (copy_from_user(&info, (void __user *)arg, minsz))
>   return -EFAULT;
> 
> - if (info.argsz < minsz || info.index >= VFIO_PCI_NUM_IRQS)
> + if (info.argsz < minsz ||
> + info.index >= VFIO_PCI_NUM_IRQS + vdev->num_ext_irqs)
>   return -EINVAL;
> 
> - switch (info.index) {
> - case VFIO_PCI_INTX_IRQ_INDEX ... VFIO_PCI_MSIX_IRQ_INDEX:
> - case VFIO_PCI_REQ_IRQ_INDEX:
> - break;
> - case VFIO_PCI_ERR_IRQ_INDEX:
> - if (pci_is_pcie(vdev->pdev))
> - break;
> - fallthrough;
> - default:
> - return -EINVAL;
> - }
> + if (info.argsz >= capsz)
> + minsz = capsz;
> 
>   info.flags = VFIO_IRQ_INFO_EVENTFD;
> 
> - info.count = vfio_pci_get_irq_count(vdev, info.index);
> -
> - if (info.index == VFIO_PCI_INTX_IRQ_INDEX)
> + switch (info.index) {
> + case VFIO_PCI_INTX_IRQ_INDEX:
>   info.flags |= (VFIO_IRQ_INFO_MASKABLE |
>  VFIO_IRQ_INFO_AUTOMASKED);
> - else
> + break;
> + case VFIO_PCI_MSI_IRQ_INDEX ... VFIO_PCI_MSIX_IRQ_INDEX:
> + case VFIO_PCI_REQ_IRQ_INDEX:
>   info.flags |= VFIO_IRQ_INFO_NORESIZE;
> + 

Re: [PATCH v5 0/2] MTE support for KVM guest

2020-11-23 Thread Dr. David Alan Gilbert
* Peter Maydell (peter.mayd...@linaro.org) wrote:
> On Thu, 19 Nov 2020 at 15:39, Steven Price  wrote:
> > This series adds support for Arm's Memory Tagging Extension (MTE) to
> > KVM, allowing KVM guests to make use of it. This builds on the existing
> > user space support already in v5.10-rc1, see [1] for an overview.
> 
> > The change to require the VMM to map all guest memory PROT_MTE is
> > significant as it means that the VMM has to deal with the MTE tags even
> > if it doesn't care about them (e.g. for virtual devices or if the VMM
> > doesn't support migration). Also unfortunately because the VMM can
> > change the memory layout at any time the check for PROT_MTE/VM_MTE has
> > to be done very late (at the point of faulting pages into stage 2).
> 
> I'm a bit dubious about requring the VMM to map the guest memory
> PROT_MTE unless somebody's done at least a sketch of the design
> for how this would work on the QEMU side. Currently QEMU just
> assumes the guest memory is guest memory and it can access it
> without special precautions...

Although that is also changing because of the encrypted/protected memory
in things like SEV.

Dave

> thanks
> -- PMM
> 
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



Re: [PATCH v15 6/9] arm64/kvm: Add hypercall service for kvm ptp.

2020-11-23 Thread Marc Zyngier

On 2020-11-23 10:44, Marc Zyngier wrote:

On 2020-11-11 06:22, Jianyong Wu wrote:

ptp_kvm will get this service through SMCC call.
The service offers wall time and cycle count of host to guest.
The caller must specify whether they want the host cycle count
or the difference between host cycle count and cntvoff.

Signed-off-by: Jianyong Wu 
---
 arch/arm64/kvm/hypercalls.c | 61 
+

 include/linux/arm-smccc.h   | 17 +++
 2 files changed, 78 insertions(+)

diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
index b9d8607083eb..f7d189563f3d 100644
--- a/arch/arm64/kvm/hypercalls.c
+++ b/arch/arm64/kvm/hypercalls.c
@@ -9,6 +9,51 @@
 #include 
 #include 

+static void kvm_ptp_get_time(struct kvm_vcpu *vcpu, u64 *val)
+{
+   struct system_time_snapshot systime_snapshot;
+   u64 cycles = ~0UL;
+   u32 feature;
+
+   /*
+* system time and counter value must captured in the same
+* time to keep consistency and precision.
+*/
+   ktime_get_snapshot(&systime_snapshot);
+
+   // binding ptp_kvm clocksource to arm_arch_counter
+   if (systime_snapshot.cs_id != CSID_ARM_ARCH_COUNTER)
+   return;
+
+   val[0] = upper_32_bits(systime_snapshot.real);
+   val[1] = lower_32_bits(systime_snapshot.real);


What is the endianness of these values? I can't see it defined
anywhere, and this is likely not to work if guest and hypervisor
don't align.


Scratch that. This is all passed via registers, so the endianness
of the data is irrelevant. Please discard any comment about endianness
I made in this review.

The documentation aspect still requires to be beefed up.

Thanks,

M.
--
Jazz is not dead. It just smells funny...


Re: [PATCH v15 8/9] doc: add ptp_kvm introduction for arm64 support

2020-11-23 Thread Marc Zyngier

On 2020-11-11 06:22, Jianyong Wu wrote:

PTP_KVM implementation depends on hypercall using SMCCC. So we
introduce a new SMCCC service ID. This doc explains how does the
ID define and how does PTP_KVM works on arm/arm64.

Signed-off-by: Jianyong Wu 
---
 Documentation/virt/kvm/api.rst |  9 +++
 Documentation/virt/kvm/arm/index.rst   |  1 +
 Documentation/virt/kvm/arm/ptp_kvm.rst | 29 +
 Documentation/virt/kvm/timekeeping.rst | 35 ++
 4 files changed, 74 insertions(+)
 create mode 100644 Documentation/virt/kvm/arm/ptp_kvm.rst

diff --git a/Documentation/virt/kvm/api.rst 
b/Documentation/virt/kvm/api.rst

index 36d5f1f3c6dd..9843dbcbf770 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6391,3 +6391,12 @@ When enabled, KVM will disable paravirtual
features provided to the
 guest according to the bits in the KVM_CPUID_FEATURES CPUID leaf
 (0x4001). Otherwise, a guest may use the paravirtual features
 regardless of what has actually been exposed through the CPUID leaf.
+
+8.27 KVM_CAP_PTP_KVM
+
+
+:Architectures: arm64
+
+This capability indicates that KVM virtual PTP service is supported in 
host.
+It must company with the implementation of KVM virtual PTP service in 
host
+so VMM can probe if there is the service in host by checking this 
capability.

diff --git a/Documentation/virt/kvm/arm/index.rst
b/Documentation/virt/kvm/arm/index.rst
index 3e2b2aba90fc..78a9b670aafe 100644
--- a/Documentation/virt/kvm/arm/index.rst
+++ b/Documentation/virt/kvm/arm/index.rst
@@ -10,3 +10,4 @@ ARM
hyp-abi
psci
pvtime
+   ptp_kvm
diff --git a/Documentation/virt/kvm/arm/ptp_kvm.rst
b/Documentation/virt/kvm/arm/ptp_kvm.rst
new file mode 100644
index ..bb1e6cfefe44
--- /dev/null
+++ b/Documentation/virt/kvm/arm/ptp_kvm.rst
@@ -0,0 +1,29 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+PTP_KVM support for arm/arm64
+=
+
+PTP_KVM is used for time sync between guest and host in a high 
precision.

+It needs to get the wall time and counter value from the host and
transfer these
+to guest via hypercall service. So one more hypercall service has been 
added.

+
+This new SMCCC hypercall is defined as:
+
+* ARM_SMCCC_HYP_KVM_PTP_FUNC_ID: 0x8601
+
+As both 32 and 64-bits ptp_kvm client should be supported, we choose
SMC32/HVC32
+calling convention.
+
+ARM_SMCCC_HYP_KVM_PTP_FUNC_ID:
+
+=====
+Function ID: (uint32)  0x8601
+Arguments:  (uint32)  ARM_PTP_PHY_COUNTER(1) or
ARM_PTP_VIRT_COUNTER(0)
+   which indicate acquiring physical 
counter or

+   virtual counter respectively.
+return value:(uint32)  NOT_SUPPORTED(-1) or val0 and val1 
represent
+   wall clock time and val2 and val3 
represent

+   counter cycle.


This needs a lot more description:

- Which word contains what part of the data (upper/lower part of the 64bit data)

- The endianness of the data returned
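
For illustration only (not suggested wording), judging by the hypercall
handler elsewhere in the series, the intended layout appears to be:

	val0: upper 32 bits of the host wall-clock time (ns)
	val1: lower 32 bits of the host wall-clock time
	val2: upper 32 bits of the requested counter value
	val3: lower 32 bits of the requested counter value

and the document should state this explicitly.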

M.
--
Jazz is not dead. It just smells funny...


Re: [PATCH v15 7/9] ptp: arm/arm64: Enable ptp_kvm for arm/arm64

2020-11-23 Thread Marc Zyngier

On 2020-11-11 06:22, Jianyong Wu wrote:
Currently, there is no mechanism to keep time sync between guest and 
host
in arm/arm64 virtualization environment. Time in guest will drift 
compared

with host after boot up as they may both use third party time sources
to correct their time respectively. The time deviation will be in order
of milliseconds. But in some scenarios,like in cloud envirenment, we 
ask


environment


for higher time precision.

kvm ptp clock, which chooses the host clock source as a reference
clock to sync time between guest and host, has been adopted by x86
which takes the time sync order from milliseconds to nanoseconds.

This patch enables kvm ptp clock for arm/arm64 and improves clock sync 
precison


precision


significantly.

Test result comparisons between with kvm ptp clock and without it in 
arm/arm64

are as follows. This test derived from the result of command 'chronyc
sources'. we should take more care of the last sample column which 
shows
the offset between the local clock and the source at the last 
measurement.


no kvm ptp in guest:
MS Name/IP address   Stratum Poll Reach LastRx Last sample

^* dns1.synet.edu.cn  2   6   377    13  +1040us[+1581us] +/-   21ms
^* dns1.synet.edu.cn  2   6   377    21  +1040us[+1581us] +/-   21ms
^* dns1.synet.edu.cn  2   6   377    29  +1040us[+1581us] +/-   21ms
^* dns1.synet.edu.cn  2   6   377    37  +1040us[+1581us] +/-   21ms
^* dns1.synet.edu.cn  2   6   377    45  +1040us[+1581us] +/-   21ms
^* dns1.synet.edu.cn  2   6   377    53  +1040us[+1581us] +/-   21ms
^* dns1.synet.edu.cn  2   6   377    61  +1040us[+1581us] +/-   21ms
^* dns1.synet.edu.cn  2   6   377     4   -130us[ +796us] +/-   21ms
^* dns1.synet.edu.cn  2   6   377    12   -130us[ +796us] +/-   21ms
^* dns1.synet.edu.cn  2   6   377    20   -130us[ +796us] +/-   21ms


in host:
MS Name/IP address   Stratum Poll Reach LastRx Last sample

^* 120.25.115.20  2   7   377    72   -470us[ -603us] +/-   18ms
^* 120.25.115.20  2   7   377    92   -470us[ -603us] +/-   18ms
^* 120.25.115.20  2   7   377   112   -470us[ -603us] +/-   18ms
^* 120.25.115.20  2   7   377     2   +872ns[-6808ns] +/-   17ms
^* 120.25.115.20  2   7   377    22   +872ns[-6808ns] +/-   17ms
^* 120.25.115.20  2   7   377    43   +872ns[-6808ns] +/-   17ms
^* 120.25.115.20  2   7   377    63   +872ns[-6808ns] +/-   17ms
^* 120.25.115.20  2   7   377    83   +872ns[-6808ns] +/-   17ms
^* 120.25.115.20  2   7   377   103   +872ns[-6808ns] +/-   17ms
^* 120.25.115.20  2   7   377   123   +872ns[-6808ns] +/-   17ms


The dns1.synet.edu.cn is the network reference clock for guest and
120.25.115.20 is the network reference clock for host. we can't get the
clock error between guest and host directly, but a roughly estimated 
value

will be in order of hundreds of us to ms.

with kvm ptp in guest:
chrony has been disabled in host to remove the disturb by network 
clock.


MS Name/IP address Stratum Poll Reach LastRx Last sample

* PHC0  0   3   377    8    -7ns[   +1ns] +/-    3ns
* PHC0  0   3   377    8    +1ns[  +16ns] +/-    3ns
* PHC0  0   3   377    6    -4ns[   -0ns] +/-    6ns
* PHC0  0   3   377    6    -8ns[  -12ns] +/-    5ns
* PHC0  0   3   377    5    +2ns[   +4ns] +/-    4ns
* PHC0  0   3   377   13    +2ns[   +4ns] +/-    4ns
* PHC0  0   3   377   12    -4ns[   -6ns] +/-    4ns
* PHC0  0   3   377   11    -8ns[  -11ns] +/-    6ns
* PHC0  0   3   377   10   -14ns[  -20ns] +/-    4ns
* PHC0  0   3   377    8    +4ns[   +5ns] +/-    4ns


The PHC0 is the ptp clock which choose the host clock as its source
clock. So we can see that the clock difference between host and guest
is in order of ns.

Signed-off-by: Jianyong Wu 
---
 drivers/clocksource/arm_arch_timer.c | 28 ++
 drivers/ptp/Kconfig  |  2 +-
 drivers/ptp/Makefile |  1 +
 drivers/ptp/ptp_kvm_arm.c| 44 
 4 files changed, 74 insertions(+), 1 deletion(-)
 create mode 100644 drivers/ptp/ptp_kvm_arm.c

diff --git a/drivers/clocksource/arm_arch_timer.c
b/drivers/clocksource/arm_arch_timer.c
index d55acffb0b90..b33c5a663d30 100644
--- a/drivers/clocksource/arm_arch_timer.c
+++ b/drivers/clocksource/arm_arch_timer.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 

 #include 
 #include 
@@ -1650,3 +1651,30 @@ static int __init arch_timer_acpi_init(struct
acpi_table_header *table)
 }
 

Re: [PATCH v15 6/9] arm64/kvm: Add hypercall service for kvm ptp.

2020-11-23 Thread Marc Zyngier

On 2020-11-11 06:22, Jianyong Wu wrote:

ptp_kvm will get this service through SMCC call.
The service offers wall time and cycle count of host to guest.
The caller must specify whether they want the host cycle count
or the difference between host cycle count and cntvoff.

Signed-off-by: Jianyong Wu 
---
 arch/arm64/kvm/hypercalls.c | 61 +
 include/linux/arm-smccc.h   | 17 +++
 2 files changed, 78 insertions(+)

diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
index b9d8607083eb..f7d189563f3d 100644
--- a/arch/arm64/kvm/hypercalls.c
+++ b/arch/arm64/kvm/hypercalls.c
@@ -9,6 +9,51 @@
 #include 
 #include 

+static void kvm_ptp_get_time(struct kvm_vcpu *vcpu, u64 *val)
+{
+   struct system_time_snapshot systime_snapshot;
+   u64 cycles = ~0UL;
+   u32 feature;
+
+   /*
+* system time and counter value must captured in the same
+* time to keep consistency and precision.
+*/
+   ktime_get_snapshot(&systime_snapshot);
+
+   // binding ptp_kvm clocksource to arm_arch_counter
+   if (systime_snapshot.cs_id != CSID_ARM_ARCH_COUNTER)
+   return;
+
+   val[0] = upper_32_bits(systime_snapshot.real);
+   val[1] = lower_32_bits(systime_snapshot.real);


What is the endianness of these values? I can't see it defined
anywhere, and this is likely not to work if guest and hypervisor
don't align.


+
+   /*
+* which of virtual counter or physical counter being
+* asked for is decided by the r1 value of SMCCC
+* call. If no invalid r1 value offered, default cycle
+* value(-1) will be returned.
+* Note: keep in mind that feature is u32 and smccc_get_arg1
+* will return u64, so need auto cast here.
+*/
+   feature = smccc_get_arg1(vcpu);
+   switch (feature) {
+   case ARM_PTP_VIRT_COUNTER:
+		cycles = systime_snapshot.cycles - vcpu_read_sys_reg(vcpu, 
CNTVOFF_EL2);

+   break;
+   case ARM_PTP_PHY_COUNTER:
+   cycles = systime_snapshot.cycles;
+   break;
+   case ARM_PTP_NONE_COUNTER:


What is this "NONE" counter?


+   break;
+   default:
+   val[0] = SMCCC_RET_NOT_SUPPORTED;
+   break;
+   }
+   val[2] = upper_32_bits(cycles);
+   val[3] = lower_32_bits(cycles);


Same problem as above.


+}
+
 int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
 {
u32 func_id = smccc_get_function(vcpu);
@@ -79,6 +124,22 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
break;
case ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID:
val[0] = BIT(ARM_SMCCC_KVM_FUNC_FEATURES);
+   val[0] |= BIT(ARM_SMCCC_KVM_FUNC_KVM_PTP);
+   break;
+   /*
+* This serves virtual kvm_ptp.
+* Four values will be passed back.
+* reg0 stores high 32-bits of host ktime;
+* reg1 stores low 32-bits of host ktime;
+* For ARM_PTP_VIRT_COUNTER:
+* reg2 stores high 32-bits of difference of host cycles and cntvoff;
+* reg3 stores low 32-bits of difference of host cycles and cntvoff.
+* For ARM_PTP_PHY_COUNTER:
+* reg2 stores the high 32-bits of host cycles;
+* reg3 stores the low 32-bits of host cycles.
+*/
+   case ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID:
+   kvm_ptp_get_time(vcpu, val);
break;
default:
return kvm_psci_call(vcpu);
diff --git a/include/linux/arm-smccc.h b/include/linux/arm-smccc.h
index d75408141137..a03c5dd409d3 100644
--- a/include/linux/arm-smccc.h
+++ b/include/linux/arm-smccc.h
@@ -103,6 +103,7 @@

 /* KVM "vendor specific" services */
 #define ARM_SMCCC_KVM_FUNC_FEATURES		0
+#define ARM_SMCCC_KVM_FUNC_KVM_PTP		1


I think having KVM once in the name is enough.


 #define ARM_SMCCC_KVM_FUNC_FEATURES_2	127
 #define ARM_SMCCC_KVM_NUM_FUNCS		128

@@ -114,6 +115,22 @@

 #define SMCCC_ARCH_WORKAROUND_RET_UNAFFECTED   1

+/*
+ * ptp_kvm is a feature used for time sync between vm and host.
+ * ptp_kvm module in guest kernel will get service from host using
+ * this hypercall ID.
+ */
+#define ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID   \
+   ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
+  ARM_SMCCC_SMC_32,\
+  ARM_SMCCC_OWNER_VENDOR_HYP,  \
+  ARM_SMCCC_KVM_FUNC_KVM_PTP)
+
+/* ptp_kvm counter type ID */
+#define ARM_PTP_VIRT_COUNTER		0
+#define ARM_PTP_PHY_COUNTER		1
+#define ARM_PTP_NONE_COUNTER		2


The architecture definitely doesn't have this last counter.


+
 /* Paravirtualised time calls (defined by ARM DEN0057A) */
 #define ARM_SMCCC_HV_PV_TIME_FEATURES  \

Re: [RFC PATCH v1 3/4] KVM: arm64: GICv4.1: Restore VLPI's pending state to physical side

2020-11-23 Thread Marc Zyngier

On 2020-11-23 06:54, Shenming Lu wrote:

From: Zenghui Yu 

When setting the forwarding path of a VLPI, it is more consistent to


I'm not sure it is more consistent. It is a *new* behaviour, because it only
matters for migration, which has been so far unsupported.

also transfer the pending state from irq->pending_latch to VPT 
(especially

in migration, the pending states of VLPIs are restored into kvm’s vgic
first). And we currently send "INT+VSYNC" to trigger a VLPI to pending.

Signed-off-by: Zenghui Yu 
Signed-off-by: Shenming Lu 
---
 arch/arm64/kvm/vgic/vgic-v4.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/arch/arm64/kvm/vgic/vgic-v4.c 
b/arch/arm64/kvm/vgic/vgic-v4.c

index b5fa73c9fd35..cc3ab9cea182 100644
--- a/arch/arm64/kvm/vgic/vgic-v4.c
+++ b/arch/arm64/kvm/vgic/vgic-v4.c
@@ -418,6 +418,18 @@ int kvm_vgic_v4_set_forwarding(struct kvm *kvm, 
int virq,

	irq->host_irq	= virq;
	atomic_inc(&map.vpe->vlpi_count);

+   /* Transfer pending state */
+   ret = irq_set_irqchip_state(irq->host_irq,
+   IRQCHIP_STATE_PENDING,
+   irq->pending_latch);
+   WARN_RATELIMIT(ret, "IRQ %d", irq->host_irq);
+
+   /*
+* Let it be pruned from ap_list later and don't bother
+* the List Register.
+*/
+   irq->pending_latch = false;


It occurs to me that calling into irq_set_irqchip_state() for a large
number of interrupts can take a significant amount of time. It is also
odd that you dump the VPT with the VPE unmapped, but rely on the VPE
being mapped for the opposite operation.

Shouldn't these be symmetric, all performed while the VPE is unmapped?
It would also save a lot of ITS traffic.
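
To illustrate the idea (rough, untested sketch; the helper name is invented):
with the vPE unmapped, the pending bit could be written straight into the VPT,
mirroring what the save path does when it peeks at it:

	static void its_poke_vpt(struct its_vpe *vpe, irq_hw_number_t hwirq,
				 bool pending)
	{
		u8 *pt = (u8 *)page_address(vpe->vpt_page) + hwirq / BITS_PER_BYTE;
		u8 mask = BIT(hwirq % BITS_PER_BYTE);

		if (pending)
			*pt |= mask;
		else
			*pt &= ~mask;
	}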


+
 out:
	mutex_unlock(&its->its_lock);
return ret;


Thanks,

M.
--
Jazz is not dead. It just smells funny...


RE: [PATCH v15 0/9] Enable ptp_kvm for arm/arm64

2020-11-23 Thread Jianyong Wu
Hi,
Ping ...
Any comments?

> -Original Message-
> From: Jianyong Wu 
> Sent: Wednesday, November 11, 2020 2:22 PM
> To: net...@vger.kernel.org; yangbo...@nxp.com; john.stu...@linaro.org;
> t...@linutronix.de; pbonz...@redhat.com; sean.j.christopher...@intel.com;
> m...@kernel.org; richardcoch...@gmail.com; Mark Rutland
> ; w...@kernel.org; Suzuki Poulose
> ; Andre Przywara ;
> Steven Price 
> Cc: linux-ker...@vger.kernel.org; linux-arm-ker...@lists.infradead.org;
> kvmarm@lists.cs.columbia.edu; k...@vger.kernel.org; Steve Capper
> ; Justin He ; Jianyong Wu
> ; nd 
> Subject: [PATCH v15 0/9] Enable ptp_kvm for arm/arm64
> 
> Currently, we often use ntp (sync time with remote network clock) to sync
> time in VM. But the precision of ntp is subject to network delay so it's 
> difficult
> to sync time in a high precision.
> 
> kvm virtual ptp clock (ptp_kvm) offers another way to sync time in VM, as
> the remote clock locates in the host instead of remote network clock.
> It targets to sync time between guest and host in virtualization environment
> and in this way, we can keep the time of all the VMs running in the same host
> in sync. In general, the delay of communication between host and guest is
> quite small, so ptp_kvm can offer time sync precision in the order of
> nanoseconds. Please keep in mind that ptp_kvm just limits itself to be a
> channel which transmits the remote clock from host to guest and leaves the
> time sync jobs to an application, e.g. chrony, in userspace in VM.
> 
> How ptp_kvm works:
> After ptp_kvm initialized, there will be a new device node under /dev called
> ptp%d. A guest userspace service, like chrony, can use this device to get host
> walltime, sometimes also counter cycle, which depends on the service it calls.
> Then this guest userspace service can use those data to do the time sync for
> guest.
> here is a rough sketch to show how kvm ptp clock works.
> 
> ||  |--|
> |   guest userspace  |  |  host|
> |ioctl -> /dev/ptp%d |  |  |
> |   ^   ||  |  |
> ||  |  |
> |   |   | guest kernel   |  |  |
> |   |   V  (get host walltime/counter cycle)   |
> |  ptp_kvm -> hypercall - - - - - - - - - - ->hypercall service|
> | <- - - - - - - - - - - - |
> ||  |--|
> 
> 1. time sync service in guest userspace calls the ptp device through /dev/ptp%d.
> 2. ptp_kvm module in guest receives this request then invokes a hypercall to
> route into host kernel to request host walltime/counter cycle.
> 3. ptp_kvm hypercall service in host responds to the request and sends data
> back.
> 4. ptp (not ptp_kvm) in guest copies the data to userspace.
> 
> This ptp_kvm implementation focuses itself to step 2 and 3 and step 2 works
> in guest comparing step 3 works in host kernel.
> 
> change log:
> 
> from v14 to v15
> (1) enable ptp_kvm on arm32 guest, also ptp_kvm has been tested on
> both arm64 and arm32 guest running on arm64 kvm host.
> (2) move arch-agnostic part of ptp_kvm.rst into timekeeping.rst.
> (3) rename KVM_CAP_ARM_PTP_KVM to KVM_CAP_PTP_KVM as it
> should be arch agnostic.
> (4) add description for KVM_CAP_PTP_KVM in
> Documentation/virt/kvm/api.rst.
> (5) adjust dependency in Kconfig for ptp_kvm.
> (6) refine multi-arch process in driver/ptp/Makefile.
> (7) fix make pdfdocs htmldocs issue for ptp_kvm doc.
> (8) address other issues from comments in v14.
> (9) fold hypercall service of ptp_kvm as a function.
> (10) rebase to 5.10-rc3.
> 
> from v13 to v14
> (1) rebase code on 5.9-rc3.
> (2) add a document to introduce implementation of PTP_KVM on arm64.
> (3) fix comments issue in hypercall.c.
> (4) export arm_smccc_1_1_get_conduit using EXPORT_SYMBOL_GPL.
> (5) fix make issue on x86 reported by kernel test robot.
> 
> from v12 to v13:
> (1) rebase code on 5.8-rc1.
> (2) this patch set base on 2 patches of 1/8 and 2/8 from Will Decon.
> (3) remove the change to ptp device code of extend getcrosststamp.
> (4) remove the mechanism of letting user choose the counter type in
> ptp_kvm for arm64.
> (5) add virtual counter option in ptp_kvm service to let user choose 
> the
> specific counter explicitly.
> 
> from v11 to v12:
> (1) rebase code on 5.7-rc6 and rebase 2 patches from Will Decon
> including 1/11 and 2/11. as these patches introduce discover mechanism of
> vendor smccc service.
> (2) rebase ptp_kvm hypercall service from standard smccc to vendor
> smccc and add ptp_kvm to 

Re: [RFC PATCH v1 2/4] KVM: arm64: GICv4.1: Try to save hw pending state in save_pending_tables

2020-11-23 Thread Marc Zyngier

On 2020-11-23 06:54, Shenming Lu wrote:

After pausing all vCPUs and devices capable of interrupting, in order

^
See my comment below about this.


to save the information of all interrupts, besides flushing the pending
states in kvm’s vgic, we also try to flush the states of VLPIs in the
virtual pending tables into guest RAM, but we need to have GICv4.1 and
safely unmap the vPEs first.

Signed-off-by: Shenming Lu 
---
 arch/arm64/kvm/vgic/vgic-v3.c | 62 +++
 1 file changed, 56 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/kvm/vgic/vgic-v3.c 
b/arch/arm64/kvm/vgic/vgic-v3.c

index 9cdf39a94a63..e1b3aa4b2b12 100644
--- a/arch/arm64/kvm/vgic/vgic-v3.c
+++ b/arch/arm64/kvm/vgic/vgic-v3.c
@@ -1,6 +1,8 @@
 // SPDX-License-Identifier: GPL-2.0-only

 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -356,6 +358,39 @@ int vgic_v3_lpi_sync_pending_status(struct kvm
*kvm, struct vgic_irq *irq)
return 0;
 }

+/*
+ * With GICv4.1, we can get the VLPI's pending state after unmapping
+ * the vPE. The deactivation of the doorbell interrupt will trigger
+ * the unmapping of the associated vPE.
+ */
+static void get_vlpi_state_pre(struct vgic_dist *dist)
+{
+   struct irq_desc *desc;
+   int i;
+
+   if (!kvm_vgic_global_state.has_gicv4_1)
+   return;
+
+   for (i = 0; i < dist->its_vm.nr_vpes; i++) {
+   desc = irq_to_desc(dist->its_vm.vpes[i]->irq);
+   irq_domain_deactivate_irq(irq_desc_get_irq_data(desc));
+   }
+}
+
+static void get_vlpi_state_post(struct vgic_dist *dist)


nit: the naming feels a bit... odd. Pre/post what?


+{
+   struct irq_desc *desc;
+   int i;
+
+   if (!kvm_vgic_global_state.has_gicv4_1)
+   return;
+
+   for (i = 0; i < dist->its_vm.nr_vpes; i++) {
+   desc = irq_to_desc(dist->its_vm.vpes[i]->irq);
+   irq_domain_activate_irq(irq_desc_get_irq_data(desc), false);
+   }
+}
+
 /**
  * vgic_v3_save_pending_tables - Save the pending tables into guest 
RAM

  * kvm lock and all vcpu lock must be held
@@ -365,14 +400,17 @@ int vgic_v3_save_pending_tables(struct kvm *kvm)
	struct vgic_dist *dist = &kvm->arch.vgic;
struct vgic_irq *irq;
gpa_t last_ptr = ~(gpa_t)0;
-   int ret;
+   int ret = 0;
u8 val;

+   get_vlpi_state_pre(dist);
+
	list_for_each_entry(irq, &dist->lpi_list_head, lpi_list) {
int byte_offset, bit_nr;
struct kvm_vcpu *vcpu;
gpa_t pendbase, ptr;
bool stored;
+   bool is_pending = irq->pending_latch;

vcpu = irq->target_vcpu;
if (!vcpu)
@@ -387,24 +425,36 @@ int vgic_v3_save_pending_tables(struct kvm *kvm)
if (ptr != last_ptr) {
ret = kvm_read_guest_lock(kvm, ptr, &val, 1);
if (ret)
-   return ret;
+   goto out;
last_ptr = ptr;
}

stored = val & (1U << bit_nr);
-   if (stored == irq->pending_latch)
+
+   /* also flush hw pending state */


This comment looks out of place, as we aren't flushing anything.


+   if (irq->hw) {
+   WARN_RATELIMIT(irq_get_irqchip_state(irq->host_irq,
+   IRQCHIP_STATE_PENDING, &is_pending),
+  "IRQ %d", irq->host_irq);


Isn't this going to warn like mad on a GICv4.0 system where this, by
definition, will generate an error?


+   }
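
For illustration only, an untested sketch (guarding on has_gicv4_1 is an
assumption about the intent, not a tested patch): the HW read could be
restricted to hosts where it can actually succeed:

		/*
		 * Only query the HW pending state on GICv4.1; on GICv4.0
		 * the read is known to fail, so the software pending_latch
		 * stays the only source of truth there.
		 */
		if (irq->hw && kvm_vgic_global_state.has_gicv4_1) {
			int err = irq_get_irqchip_state(irq->host_irq,
							IRQCHIP_STATE_PENDING,
							&is_pending);
			WARN_RATELIMIT(err, "IRQ %d", irq->host_irq);
		}
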
+
+   if (stored == is_pending)
continue;

-   if (irq->pending_latch)
+   if (is_pending)
val |= 1 << bit_nr;
else
val &= ~(1 << bit_nr);

ret = kvm_write_guest_lock(kvm, ptr, &val, 1);
if (ret)
-   return ret;
+   goto out;
}
-   return 0;
+
+out:
+   get_vlpi_state_post(dist);


This bit worries me: you have unmapped the VPEs, so any interrupt that
has been generated during that phase is now forever lost (the GIC
doesn't have ownership of the pending tables).

Do you really expect the VM to be restartable from that point? I don't
see how this is possible.


+
+   return ret;
 }

 /**


Thanks,

M.
--
Jazz is not dead. It just smells funny...
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [RFC PATCH v1 1/4] irqchip/gic-v4.1: Plumb get_irqchip_state VLPI callback

2020-11-23 Thread Marc Zyngier

On 2020-11-23 06:54, Shenming Lu wrote:

From: Zenghui Yu 

Up to now, the irq_get_irqchip_state() callback of its_irq_chip
has been left unimplemented since there was no architectural way to
get a VLPI's pending state before GICv4.1; v4.1 does provide one
for VLPIs.

With GICv4.1, after unmapping the vPE, which cleans and invalidates
any caching of the VPT, we can get the VLPI's pending state by


This is a crucial note: without this unmapping and invalidation,
the pending bits are not generally accessible (they could be cached
in a GIC private structure, cache or otherwise).


peeking at the VPT. So we implement the irq_get_irqchip_state()
callback of its_irq_chip to do it.

Signed-off-by: Zenghui Yu 
Signed-off-by: Shenming Lu 
---
 drivers/irqchip/irq-gic-v3-its.c | 38 
 1 file changed, 38 insertions(+)

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 0fec31931e11..287003cacac7 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -1695,6 +1695,43 @@ static void its_irq_compose_msi_msg(struct irq_data *d, struct msi_msg *msg)
iommu_dma_compose_msi_msg(irq_data_get_msi_desc(d), msg);
 }

+static bool its_peek_vpt(struct its_vpe *vpe, irq_hw_number_t hwirq)
+{
+   int mask = hwirq % BITS_PER_BYTE;


nit: this isn't a mask, but a shift instead. BIT(hwirq % BPB) would give
you a mask.


+   void *va;
+   u8 *pt;
+
+   va = page_address(vpe->vpt_page);
+   pt = va + hwirq / BITS_PER_BYTE;
+
+   return !!(*pt & (1U << mask));
+}
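
For illustration, an untested sketch of what that nit suggests (purely a
rewrite of the helper above; no behavioural change intended):

static bool its_peek_vpt(struct its_vpe *vpe, irq_hw_number_t hwirq)
{
	void *va = page_address(vpe->vpt_page);
	u8 *pt = va + hwirq / BITS_PER_BYTE;

	/* BIT() builds the byte mask directly instead of shifting by hand */
	return !!(*pt & BIT(hwirq % BITS_PER_BYTE));
}
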
+
+static int its_irq_get_irqchip_state(struct irq_data *d,
+enum irqchip_irq_state which, bool *val)
+{
+   struct its_device *its_dev = irq_data_get_irq_chip_data(d);
+   struct its_vlpi_map *map = get_vlpi_map(d);
+
+   if (which != IRQCHIP_STATE_PENDING)
+   return -EINVAL;
+
+   /* not intended for physical LPI's pending state */
+   if (!map)
+   return -EINVAL;
+
+   /*
+* In GICv4.1, a VMAPP with {V,Alloc}=={0,1} cleans and invalidates
+* any caching of the VPT associated with the vPEID held in the GIC.
+*/
+   if (!is_v4_1(its_dev->its) || atomic_read(&map->vpe->vmapp_count))


It isn't clear to me what prevents this from racing against a mapping of
the VPE. Actually, since we only hold the LPI irqdesc lock, I'm pretty
sure nothing prevents it.


+   return -EACCES;


I can sort of buy EACCES for a VPE that is currently mapped, but a
non-4.1 ITS should definitely return EINVAL.


+
+   *val = its_peek_vpt(map->vpe, map->vintid);
+
+   return 0;
+}
+
 static int its_irq_set_irqchip_state(struct irq_data *d,
 enum irqchip_irq_state which,
 bool state)
@@ -1975,6 +2012,7 @@ static struct irq_chip its_irq_chip = {
.irq_eoi= irq_chip_eoi_parent,
.irq_set_affinity   = its_set_affinity,
.irq_compose_msi_msg= its_irq_compose_msi_msg,
+   .irq_get_irqchip_state  = its_irq_get_irqchip_state,


My biggest issue with this is that it isn't a reliable interface.
It happens to work in the context of KVM, because you make sure it
is called at the right time, but that doesn't make it safe in general
(anyone with the interrupt number is allowed to call this at any time).

Is there a problem with poking at the VPT page from the KVM side?
The code should be exactly the same (maybe simpler even), and at least
you'd be guaranteed to be in the correct context.
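
Purely as an illustration of that idea (the helper name and its placement
on the KVM side are assumptions, not a worked-out patch), the KVM-side
peek could look something like:

/*
 * Read a VLPI's pending bit straight from the vPE's pending table.
 * Only meaningful once the vPE has been unmapped (VMAPP with
 * {V,Alloc}=={0,1}), so that the GIC no longer caches the VPT.
 */
static bool vgic_v4_peek_vpt(struct its_vpe *vpe, u32 intid)
{
	u8 *pt = page_address(vpe->vpt_page);

	return !!(pt[intid / BITS_PER_BYTE] & BIT(intid % BITS_PER_BYTE));
}
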


.irq_set_irqchip_state  = its_irq_set_irqchip_state,
.irq_retrigger  = its_irq_retrigger,
.irq_set_vcpu_affinity  = its_irq_set_vcpu_affinity,


Thanks,

M.
--
Jazz is not dead. It just smells funny...
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[RFC PATCH v1 3/4] KVM: arm64: GICv4.1: Restore VLPI's pending state to physical side

2020-11-23 Thread Shenming Lu
From: Zenghui Yu 

When setting the forwarding path of a VLPI, it is more consistent to
also transfer the pending state from irq->pending_latch to the VPT
(especially during migration, where the pending states of VLPIs are
first restored into kvm's vgic). We currently send "INT+VSYNC" to set
a VLPI to pending.

Signed-off-by: Zenghui Yu 
Signed-off-by: Shenming Lu 
---
 arch/arm64/kvm/vgic/vgic-v4.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/arch/arm64/kvm/vgic/vgic-v4.c b/arch/arm64/kvm/vgic/vgic-v4.c
index b5fa73c9fd35..cc3ab9cea182 100644
--- a/arch/arm64/kvm/vgic/vgic-v4.c
+++ b/arch/arm64/kvm/vgic/vgic-v4.c
@@ -418,6 +418,18 @@ int kvm_vgic_v4_set_forwarding(struct kvm *kvm, int virq,
irq->host_irq   = virq;
atomic_inc(>vlpi_count);
 
+   /* Transfer pending state */
+   ret = irq_set_irqchip_state(irq->host_irq,
+   IRQCHIP_STATE_PENDING,
+   irq->pending_latch);
+   WARN_RATELIMIT(ret, "IRQ %d", irq->host_irq);
+
+   /*
+* Let it be pruned from ap_list later and don't bother
+* the List Register.
+*/
+   irq->pending_latch = false;
+
 out:
mutex_unlock(&its->its_lock);
return ret;
-- 
2.23.0

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[RFC PATCH v1 0/4] KVM: arm64: Add VLPI migration support on GICv4.1

2020-11-23 Thread Shenming Lu
In GICv4.1, migration is supported except for (directly-injected)
VLPIs. The GICv4.1 spec explicitly gives a way to get a VLPI's pending
state (which was crucially missing in GICv4.0). So this patch set makes
VLPI migration possible on GICv4.1.

In order to support VLPI migration, we need to save and restore all
required configuration information and pending states of VLPIs. In
fact, the configuration information of VLPIs has already been saved
(or will be reallocated on the dst host...) as part of vgic (kvm)
migration, so we only have to migrate the pending states of VLPIs
specially.

Below is the related workflow in migration.

On the save path:
    In migration completion:
        pause all vCPUs
                |
        call each VM state change handler:
                pause other devices (just keep them from sending
                interrupts; the VFIO migration protocol, for example,
                already provides this [1])
                        |
                flush ITS tables into guest RAM
                        |
                flush RDIST pending tables (also flush VLPI state here)
                        |
                ...
On the resume path:
    load each device's state:
        restore ITS tables (including pending tables) from guest RAM
                |
        for other (PCI) devices (paused), if configured to have VLPIs,
        establish the forwarding paths of their VLPIs (and transfer
        the pending states from kvm's vgic to the VPT here)
Yet TODO:
 - For some reason (e.g. for VFIO PCI devices), the HW VLPI configuration
   may be reset repeatedly in load_state, resulting in the loss of pending
   state. A very intuitive solution is to retrieve the pending state in
   unset_forwarding (and this should be done regardless of migration). But
   at normal run time this function may be called while all devices are
   running, in which case unmapping the vPE is not allowed. It seems to be
   an almost insoluble bug...
   There are other possible solutions:
   1) avoid unset_forwarding being called from QEMU while resuming (simply
   allocate all needed vectors first), which is more reasonable and
   efficient.
   2) add a new dedicated interface to transfer these pending states to
   HW in the GIC VM state change handler, mirroring save_pending_tables.
   ...

Any comments and suggestions are very welcome.

Besides, we have tested this series with VFIO migration, and nothing
else goes wrong (with the two fixes posted in [2][3]).

Links:
[1] vfio: UAPI for migration interface for device state:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
commit/?id=a8a24f3f6e38103b77cf399c38eb54e1219d00d6
[2] vfio: Move the saving of the config space to the right place in VFIO 
migration:
https://patchwork.ozlabs.org/patch/1400246/
[3] vfio: Set the priority of VFIO VM state change handler explicitly:
https://patchwork.ozlabs.org/patch/1401280/

Shenming Lu (2):
  KVM: arm64: GICv4.1: Try to save hw pending state in
save_pending_tables
  KVM: arm64: GICv4.1: Give a chance to save VLPI's pending state

Zenghui Yu (2):
  irqchip/gic-v4.1: Plumb get_irqchip_state VLPI callback
  KVM: arm64: GICv4.1: Restore VLPI's pending state to physical side

 .../virt/kvm/devices/arm-vgic-its.rst |  2 +-
 arch/arm64/kvm/vgic/vgic-its.c|  6 +-
 arch/arm64/kvm/vgic/vgic-v3.c | 62 +--
 arch/arm64/kvm/vgic/vgic-v4.c | 12 
 drivers/irqchip/irq-gic-v3-its.c  | 38 
 5 files changed, 110 insertions(+), 10 deletions(-)

-- 
2.23.0

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[RFC PATCH v1 1/4] irqchip/gic-v4.1: Plumb get_irqchip_state VLPI callback

2020-11-23 Thread Shenming Lu
From: Zenghui Yu 

Up to now, the irq_get_irqchip_state() callback of its_irq_chip
has been left unimplemented since there was no architectural way to
get a VLPI's pending state before GICv4.1; v4.1 does provide one
for VLPIs.

With GICv4.1, after unmapping the vPE, which cleans and invalidates
any caching of the VPT, we can get the VLPI's pending state by
peeking at the VPT. So we implement the irq_get_irqchip_state()
callback of its_irq_chip to do it.

Signed-off-by: Zenghui Yu 
Signed-off-by: Shenming Lu 
---
 drivers/irqchip/irq-gic-v3-its.c | 38 
 1 file changed, 38 insertions(+)

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 0fec31931e11..287003cacac7 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -1695,6 +1695,43 @@ static void its_irq_compose_msi_msg(struct irq_data *d, struct msi_msg *msg)
iommu_dma_compose_msi_msg(irq_data_get_msi_desc(d), msg);
 }
 
+static bool its_peek_vpt(struct its_vpe *vpe, irq_hw_number_t hwirq)
+{
+   int mask = hwirq % BITS_PER_BYTE;
+   void *va;
+   u8 *pt;
+
+   va = page_address(vpe->vpt_page);
+   pt = va + hwirq / BITS_PER_BYTE;
+
+   return !!(*pt & (1U << mask));
+}
+
+static int its_irq_get_irqchip_state(struct irq_data *d,
+enum irqchip_irq_state which, bool *val)
+{
+   struct its_device *its_dev = irq_data_get_irq_chip_data(d);
+   struct its_vlpi_map *map = get_vlpi_map(d);
+
+   if (which != IRQCHIP_STATE_PENDING)
+   return -EINVAL;
+
+   /* not intended for physical LPI's pending state */
+   if (!map)
+   return -EINVAL;
+
+   /*
+* In GICv4.1, a VMAPP with {V,Alloc}=={0,1} cleans and invalidates
+* any caching of the VPT associated with the vPEID held in the GIC.
+*/
+   if (!is_v4_1(its_dev->its) || atomic_read(&map->vpe->vmapp_count))
+   return -EACCES;
+
+   *val = its_peek_vpt(map->vpe, map->vintid);
+
+   return 0;
+}
+
 static int its_irq_set_irqchip_state(struct irq_data *d,
 enum irqchip_irq_state which,
 bool state)
@@ -1975,6 +2012,7 @@ static struct irq_chip its_irq_chip = {
.irq_eoi= irq_chip_eoi_parent,
.irq_set_affinity   = its_set_affinity,
.irq_compose_msi_msg= its_irq_compose_msi_msg,
+   .irq_get_irqchip_state  = its_irq_get_irqchip_state,
.irq_set_irqchip_state  = its_irq_set_irqchip_state,
.irq_retrigger  = its_irq_retrigger,
.irq_set_vcpu_affinity  = its_irq_set_vcpu_affinity,
-- 
2.23.0

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[RFC PATCH v1 4/4] KVM: arm64: GICv4.1: Give a chance to save VLPI's pending state

2020-11-23 Thread Shenming Lu
Before GICv4.1, we do not have direct access to the VLPI's pending
state, so we simply fail early when encountering any VLPI.

But on GICv4.1 we no longer have to return -EACCES directly. So let's
change the hard-coded behaviour and give the save operation a chance to
handle the VLPI's pending state (while preserving the interfaces).

Signed-off-by: Shenming Lu 
---
 Documentation/virt/kvm/devices/arm-vgic-its.rst | 2 +-
 arch/arm64/kvm/vgic/vgic-its.c  | 6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/Documentation/virt/kvm/devices/arm-vgic-its.rst b/Documentation/virt/kvm/devices/arm-vgic-its.rst
index 6c304fd2b1b4..d257eddbae29 100644
--- a/Documentation/virt/kvm/devices/arm-vgic-its.rst
+++ b/Documentation/virt/kvm/devices/arm-vgic-its.rst
@@ -80,7 +80,7 @@ KVM_DEV_ARM_VGIC_GRP_CTRL
 -EFAULT  Invalid guest ram access
 -EBUSY   One or more VCPUS are running
 -EACCES  The virtual ITS is backed by a physical GICv4 ITS, and the
-state is not available
+state is not available without GICv4.1
 ===  ==
 
 KVM_DEV_ARM_VGIC_GRP_ITS_REGS
diff --git a/arch/arm64/kvm/vgic/vgic-its.c b/arch/arm64/kvm/vgic/vgic-its.c
index 40cbaca81333..ec7543a9617c 100644
--- a/arch/arm64/kvm/vgic/vgic-its.c
+++ b/arch/arm64/kvm/vgic/vgic-its.c
@@ -2218,10 +2218,10 @@ static int vgic_its_save_itt(struct vgic_its *its, struct its_device *device)
/*
 * If an LPI carries the HW bit, this means that this
 * interrupt is controlled by GICv4, and we do not
-* have direct access to that state. Let's simply fail
-* the save operation...
+* have direct access to that state without GICv4.1.
+* Let's simply fail the save operation...
 */
-   if (ite->irq->hw)
+   if (ite->irq->hw && !kvm_vgic_global_state.has_gicv4_1)
return -EACCES;
 
ret = vgic_its_save_ite(its, device, ite, gpa, ite_esz);
-- 
2.23.0

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[RFC PATCH v1 2/4] KVM: arm64: GICv4.1: Try to save hw pending state in save_pending_tables

2020-11-23 Thread Shenming Lu
After pausing all vCPUs and devices capable of interrupting, in order
to save the information of all interrupts, besides flushing the pending
states in kvm’s vgic, we also try to flush the states of VLPIs in the
virtual pending tables into guest RAM, but we need to have GICv4.1 and
safely unmap the vPEs first.

Signed-off-by: Shenming Lu 
---
 arch/arm64/kvm/vgic/vgic-v3.c | 62 +++
 1 file changed, 56 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/kvm/vgic/vgic-v3.c b/arch/arm64/kvm/vgic/vgic-v3.c
index 9cdf39a94a63..e1b3aa4b2b12 100644
--- a/arch/arm64/kvm/vgic/vgic-v3.c
+++ b/arch/arm64/kvm/vgic/vgic-v3.c
@@ -1,6 +1,8 @@
 // SPDX-License-Identifier: GPL-2.0-only
 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -356,6 +358,39 @@ int vgic_v3_lpi_sync_pending_status(struct kvm *kvm, struct vgic_irq *irq)
return 0;
 }
 
+/*
+ * With GICv4.1, we can get the VLPI's pending state after unmapping
+ * the vPE. The deactivation of the doorbell interrupt will trigger
+ * the unmapping of the associated vPE.
+ */
+static void get_vlpi_state_pre(struct vgic_dist *dist)
+{
+   struct irq_desc *desc;
+   int i;
+
+   if (!kvm_vgic_global_state.has_gicv4_1)
+   return;
+
+   for (i = 0; i < dist->its_vm.nr_vpes; i++) {
+   desc = irq_to_desc(dist->its_vm.vpes[i]->irq);
+   irq_domain_deactivate_irq(irq_desc_get_irq_data(desc));
+   }
+}
+
+static void get_vlpi_state_post(struct vgic_dist *dist)
+{
+   struct irq_desc *desc;
+   int i;
+
+   if (!kvm_vgic_global_state.has_gicv4_1)
+   return;
+
+   for (i = 0; i < dist->its_vm.nr_vpes; i++) {
+   desc = irq_to_desc(dist->its_vm.vpes[i]->irq);
+   irq_domain_activate_irq(irq_desc_get_irq_data(desc), false);
+   }
+}
+
 /**
  * vgic_v3_save_pending_tables - Save the pending tables into guest RAM
  * kvm lock and all vcpu lock must be held
@@ -365,14 +400,17 @@ int vgic_v3_save_pending_tables(struct kvm *kvm)
struct vgic_dist *dist = &kvm->arch.vgic;
struct vgic_irq *irq;
gpa_t last_ptr = ~(gpa_t)0;
-   int ret;
+   int ret = 0;
u8 val;
 
+   get_vlpi_state_pre(dist);
+
list_for_each_entry(irq, &dist->lpi_list_head, lpi_list) {
int byte_offset, bit_nr;
struct kvm_vcpu *vcpu;
gpa_t pendbase, ptr;
bool stored;
+   bool is_pending = irq->pending_latch;
 
vcpu = irq->target_vcpu;
if (!vcpu)
@@ -387,24 +425,36 @@ int vgic_v3_save_pending_tables(struct kvm *kvm)
if (ptr != last_ptr) {
ret = kvm_read_guest_lock(kvm, ptr, &val, 1);
if (ret)
-   return ret;
+   goto out;
last_ptr = ptr;
}
 
stored = val & (1U << bit_nr);
-   if (stored == irq->pending_latch)
+
+   /* also flush hw pending state */
+   if (irq->hw) {
+   WARN_RATELIMIT(irq_get_irqchip_state(irq->host_irq,
+   IRQCHIP_STATE_PENDING, &is_pending),
+  "IRQ %d", irq->host_irq);
+   }
+
+   if (stored == is_pending)
continue;
 
-   if (irq->pending_latch)
+   if (is_pending)
val |= 1 << bit_nr;
else
val &= ~(1 << bit_nr);
 
ret = kvm_write_guest_lock(kvm, ptr, &val, 1);
if (ret)
-   return ret;
+   goto out;
}
-   return 0;
+
+out:
+   get_vlpi_state_post(dist);
+
+   return ret;
 }
 
 /**
-- 
2.23.0

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm