Re: [PATCH 3/4] iommu/virtio: Add event queue

2018-02-21 Thread kbuild test robot
Hi Jean-Philippe,

I love your patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.16-rc2 next-20180221]
[cannot apply to iommu/next]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:
https://github.com/0day-ci/linux/commits/Jean-Philippe-Brucker/Add-virtio-iommu-driver/20180217-075417
config: parisc-allmodconfig (attached as .config)
compiler: hppa-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=parisc

All errors (new ones prefixed by >>):

   drivers/iommu/virtio-iommu.o: In function `virtio_iommu_drv_init':
   (.init.text+0x24): undefined reference to `register_virtio_driver'
   drivers/iommu/virtio-iommu.o: In function `viommu_send_reqs_sync':
   (.text.viommu_send_reqs_sync+0xdc): undefined reference to `virtqueue_add_sgs'
   (.text.viommu_send_reqs_sync+0x12c): undefined reference to `virtqueue_kick'
   (.text.viommu_send_reqs_sync+0x29c): undefined reference to `virtqueue_get_buf'
   drivers/iommu/virtio-iommu.o: In function `viommu_event_handler':
>> (.text.viommu_event_handler+0x288): undefined reference to `virtqueue_add_inbuf'
>> (.text.viommu_event_handler+0x2a8): undefined reference to `virtqueue_get_buf'
>> (.text.viommu_event_handler+0x2b8): undefined reference to `virtqueue_kick'
   drivers/iommu/virtio-iommu.o: In function `viommu_probe':
   (.text.viommu_probe+0x1a0): undefined reference to `virtio_check_driver_offered_feature'
   (.text.viommu_probe+0x248): undefined reference to `virtio_check_driver_offered_feature'
   (.text.viommu_probe+0x2ec): undefined reference to `virtio_check_driver_offered_feature'
   (.text.viommu_probe+0x328): undefined reference to `virtio_check_driver_offered_feature'
>> (.text.viommu_probe+0x428): undefined reference to `virtqueue_add_inbuf'
>> (.text.viommu_probe+0x440): undefined reference to `virtqueue_kick'
   drivers/iommu/virtio-iommu.o: In function `virtio_iommu_drv_exit':
   (.exit.text+0x18): undefined reference to `unregister_virtio_driver'

---
0-DAY kernel test infrastructure            Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
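
All of the missing symbols above are exported by the virtio core, so link
failures of this shape usually mean the driver is built into vmlinux while
the virtio core (CONFIG_VIRTIO) ends up modular, which is exactly what an
allmodconfig build exposes for a bool driver symbol. Below is a minimal
Kconfig sketch of the usual fix; the symbol name and select list are
assumptions for illustration, not the posted patch. The same reasoning
applies to the ia64 and arm64 reports that follow.

  config VIRTIO_IOMMU
  	bool "Virtio IOMMU driver"
  	# Assumed fix: a built-in driver cannot link against a modular
  	# virtio core, so require the core to be built in as well.
  	depends on VIRTIO=y
  	select IOMMU_API
  	help
  	  Paravirtualised IOMMU driver with virtio.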


Re: [PATCH 1/4] iommu: Add virtio-iommu driver

2018-02-21 Thread kbuild test robot
Hi Jean-Philippe,

I love your patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.16-rc2 next-20180221]
[cannot apply to iommu/next]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:
https://github.com/0day-ci/linux/commits/Jean-Philippe-Brucker/Add-virtio-iommu-driver/20180217-075417
config: ia64-allmodconfig (attached as .config)
compiler: ia64-linux-gcc (GCC) 7.2.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=ia64

All errors (new ones prefixed by >>):

   drivers/iommu/virtio-iommu.o: In function `viommu_send_reqs_sync':
>> virtio-iommu.c:(.text+0xa82): undefined reference to `virtqueue_add_sgs'
>> virtio-iommu.c:(.text+0xb52): undefined reference to `virtqueue_kick'
>> virtio-iommu.c:(.text+0xd82): undefined reference to `virtqueue_get_buf'
   drivers/iommu/virtio-iommu.o: In function `viommu_probe':
>> virtio-iommu.c:(.text+0x23f2): undefined reference to `virtio_check_driver_offered_feature'
   virtio-iommu.c:(.text+0x2572): undefined reference to `virtio_check_driver_offered_feature'
   virtio-iommu.c:(.text+0x26f2): undefined reference to `virtio_check_driver_offered_feature'
   drivers/iommu/virtio-iommu.o: In function `virtio_iommu_drv_init':
>> virtio-iommu.c:(.init.text+0x22): undefined reference to `register_virtio_driver'

---
0-DAY kernel test infrastructure            Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation


Re: [PATCH 1/4] iommu: Add virtio-iommu driver

2018-02-21 Thread kbuild test robot
Hi Jean-Philippe,

I love your patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.16-rc2 next-20180221]
[cannot apply to iommu/next]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:
https://github.com/0day-ci/linux/commits/Jean-Philippe-Brucker/Add-virtio-iommu-driver/20180217-075417
config: arm64-allmodconfig (attached as .config)
compiler: aarch64-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=arm64

All errors (new ones prefixed by >>):

   aarch64-linux-gnu-ld: arch/arm64/kernel/head.o: relocation R_AARCH64_ABS32 against `_kernel_offset_le_lo32' can not be used when making a shared object
   arch/arm64/kernel/head.o: In function `kimage_vaddr':
   (.idmap.text+0x0): dangerous relocation: unsupported relocation
   arch/arm64/kernel/head.o: In function `__primary_switch':
   (.idmap.text+0x340): dangerous relocation: unsupported relocation
   (.idmap.text+0x348): dangerous relocation: unsupported relocation
   drivers/iommu/virtio-iommu.o: In function `viommu_probe':
   virtio-iommu.c:(.text+0xbdc): undefined reference to `virtio_check_driver_offered_feature'
   virtio-iommu.c:(.text+0xcfc): undefined reference to `virtio_check_driver_offered_feature'
   virtio-iommu.c:(.text+0xe10): undefined reference to `virtio_check_driver_offered_feature'
   drivers/iommu/virtio-iommu.o: In function `viommu_send_reqs_sync':
   virtio-iommu.c:(.text+0x130c): undefined reference to `virtqueue_add_sgs'
   virtio-iommu.c:(.text+0x1398): undefined reference to `virtqueue_kick'
   virtio-iommu.c:(.text+0x14d4): undefined reference to `virtqueue_get_buf'
   drivers/iommu/virtio-iommu.o: In function `virtio_iommu_drv_init':
   virtio-iommu.c:(.init.text+0x1c): undefined reference to `register_virtio_driver'
   drivers/iommu/virtio-iommu.o: In function `virtio_iommu_drv_exit':
>> virtio-iommu.c:(.exit.text+0x14): undefined reference to `unregister_virtio_driver'

---
0-DAY kernel test infrastructure            Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation


Re: [PATCH v4 13/40] KVM: arm64: Introduce VHE-specific kvm_vcpu_run

2018-02-21 Thread Andrew Jones
On Wed, Feb 21, 2018 at 06:43:00PM +0100, Andrew Jones wrote:
> On Thu, Feb 15, 2018 at 10:03:05PM +0100, Christoffer Dall wrote:
> > So far this is mostly (see below) a copy of the legacy non-VHE switch
> > function, but we will start reworking these functions in separate
> > directions to work on VHE and non-VHE in the most optimal way in later
> > patches.
> > 
> > The only difference after this patch between the VHE and non-VHE run
> > functions is that we omit the branch-predictor variant-2 hardening for
> > QC Falkor CPUs, because this workaround is specific to a series of
> > non-VHE ARMv8.0 CPUs.
> > 
> > Reviewed-by: Marc Zyngier 
> > Signed-off-by: Christoffer Dall 
> > ---
> > 
> > Notes:
> > Changes since v3:
> >  - Added BUG() to 32-bit ARM VHE run function
> >  - Omitted QC Falkor BP Hardening functionality from VHE-specific
> >function
> > 
> > Changes since v2:
> >  - Reworded commit message
> > 
> > Changes since v1:
> >  - Rename kvm_vcpu_run to kvm_vcpu_run_vhe and rename __kvm_vcpu_run to
> >__kvm_vcpu_run_nvhe
> >  - Removed stray whitespace line
> > 
> >  arch/arm/include/asm/kvm_asm.h   |  5 ++-
> >  arch/arm/kvm/hyp/switch.c|  2 +-
> >  arch/arm64/include/asm/kvm_asm.h |  4 ++-
> >  arch/arm64/kvm/hyp/switch.c  | 66 
> > +++-
> >  virt/kvm/arm/arm.c   |  5 ++-
> >  5 files changed, 77 insertions(+), 5 deletions(-)
> > 
> 
> ...
> 
> > diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> > index 2062d9357971..5bd879c78951 100644
> > --- a/virt/kvm/arm/arm.c
> > +++ b/virt/kvm/arm/arm.c
> > @@ -736,7 +736,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> > if (has_vhe())
> > kvm_arm_vhe_guest_enter();
> >  
> > -   ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
> > +   if (has_vhe())
> > +   ret = kvm_vcpu_run_vhe(vcpu);
> > +   else
> > +   ret = kvm_call_hyp(__kvm_vcpu_run_nvhe, vcpu);
> >  
> > if (has_vhe())
> > kvm_arm_vhe_guest_exit();
> 
> We can combine these has_vhe()'s
> 
>  if (has_vhe()) {
> kvm_arm_vhe_guest_enter();
> ret = kvm_vcpu_run_vhe(vcpu);
> kvm_arm_vhe_guest_exit();
>  } else
> ret = kvm_call_hyp(__kvm_vcpu_run_nvhe, vcpu);

Maybe even do a cleanup patch that removes
kvm_arm_vhe_guest_enter/exit by putting the daif
masking/restoring directly into kvm_vcpu_run_vhe()?

Thanks,
drew
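
At this point in the series kvm_arm_vhe_guest_enter/exit are thin wrappers
around the DAIF accessors, so the suggested cleanup could plausibly look
like the sketch below. __kvm_vcpu_run_vhe as the name of the inner helper
is hypothetical, not from the posted series:

  /* Sketch of the suggested cleanup: mask DAIF around the VHE run. */
  int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
  {
  	int ret;

  	local_daif_mask();			/* was kvm_arm_vhe_guest_enter() */
  	ret = __kvm_vcpu_run_vhe(vcpu);		/* hypothetical inner helper */
  	local_daif_restore(DAIF_PROCCTX_NOIRQ);	/* was kvm_arm_vhe_guest_exit() */

  	return ret;
  }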


Re: [PATCH v4 34/40] KVM: arm64: Cleanup __activate_traps and __deactive_traps for VHE and non-VHE

2018-02-21 Thread Marc Zyngier
On Thu, 15 Feb 2018 21:03:26 +,
Christoffer Dall wrote:
> 
> To make the code more readable and to avoid the overhead of a function
> call, let's get rid of a pair of the alternative function selectors and
> explicitly call the VHE and non-VHE functions using the has_vhe() static
> key based selector instead, telling the compiler to try to inline the
> static function if it can.
> 
> Signed-off-by: Christoffer Dall 
> ---
>  arch/arm64/kvm/hyp/switch.c | 30 ++
>  1 file changed, 14 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index 5e94955b89ea..0e54fe2aab1c 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -85,7 +85,7 @@ static void __hyp_text __deactivate_traps_common(void)
>   write_sysreg(0, pmuserenr_el0);
>  }
>  
> -static void __hyp_text __activate_traps_vhe(struct kvm_vcpu *vcpu)
> +static inline void activate_traps_vhe(struct kvm_vcpu *vcpu)
>  {
>   u64 val;
>  
> @@ -97,7 +97,7 @@ static void __hyp_text __activate_traps_vhe(struct kvm_vcpu 
> *vcpu)
>   write_sysreg(kvm_get_hyp_vector(), vbar_el1);
>  }
>  
> -static void __hyp_text __activate_traps_nvhe(struct kvm_vcpu *vcpu)
> +static inline void __hyp_text __activate_traps_nvhe(struct kvm_vcpu *vcpu)

Having both inline and __hyp_text does not make much sense (the
function will be emitted in the section defined by the caller). I
suggest you drop the inline and let the compiler do its magic.

>  {
>   u64 val;
>  
> @@ -108,11 +108,7 @@ static void __hyp_text __activate_traps_nvhe(struct 
> kvm_vcpu *vcpu)
>   write_sysreg(val, cptr_el2);
>  }
>  
> -static hyp_alternate_select(__activate_traps_arch,
> - __activate_traps_nvhe, __activate_traps_vhe,
> - ARM64_HAS_VIRT_HOST_EXTN);
> -
> -static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
> +static inline void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)

Same here, and in other spots in this file.

>  {
>   u64 hcr = vcpu->arch.hcr_el2;
>  
> @@ -122,10 +118,13 @@ static void __hyp_text __activate_traps(struct kvm_vcpu 
> *vcpu)
>   write_sysreg_s(vcpu->arch.vsesr_el2, SYS_VSESR_EL2);
>  
>   __activate_traps_fpsimd32(vcpu);
> - __activate_traps_arch()(vcpu);
> + if (has_vhe())
> + activate_traps_vhe(vcpu);
> + else
> + __activate_traps_nvhe(vcpu);
>  }
>  
> -static void __hyp_text __deactivate_traps_vhe(void)
> +static inline void deactivate_traps_vhe(void)
>  {
>   extern char vectors[];  /* kernel exception vectors */
>   write_sysreg(HCR_HOST_VHE_FLAGS, hcr_el2);
> @@ -133,7 +132,7 @@ static void __hyp_text __deactivate_traps_vhe(void)
>   write_sysreg(vectors, vbar_el1);
>  }
>  
> -static void __hyp_text __deactivate_traps_nvhe(void)
> +static inline void __hyp_text __deactivate_traps_nvhe(void)
>  {
>   u64 mdcr_el2 = read_sysreg(mdcr_el2);
>  
> @@ -147,11 +146,7 @@ static void __hyp_text __deactivate_traps_nvhe(void)
>   write_sysreg(CPTR_EL2_DEFAULT, cptr_el2);
>  }
>  
> -static hyp_alternate_select(__deactivate_traps_arch,
> - __deactivate_traps_nvhe, __deactivate_traps_vhe,
> - ARM64_HAS_VIRT_HOST_EXTN);
> -
> -static void __hyp_text __deactivate_traps(struct kvm_vcpu *vcpu)
> +static inline void __hyp_text __deactivate_traps(struct kvm_vcpu *vcpu)
>  {
>   /*
>* If we pended a virtual abort, preserve it until it gets
> @@ -162,7 +157,10 @@ static void __hyp_text __deactivate_traps(struct 
> kvm_vcpu *vcpu)
>   if (vcpu->arch.hcr_el2 & HCR_VSE)
>   vcpu->arch.hcr_el2 = read_sysreg(hcr_el2);
>  
> - __deactivate_traps_arch()();
> + if (has_vhe())
> + deactivate_traps_vhe();
> + else
> + __deactivate_traps_nvhe();
>  }
>  
>  void activate_traps_vhe_load(struct kvm_vcpu *vcpu)
> -- 
> 2.14.2
> 

Thanks,

M.

-- 
Jazz is not dead, it just smell funny.
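
For readers without the tree at hand, the hyp_alternate_select() mechanism
being removed in this patch is, roughly, a runtime-patched function-pointer
chooser (as defined in arch/arm64/include/asm/kvm_hyp.h of this era):

  #define hyp_alternate_select(fname, orig, alt, cond)			\
  typeof(orig) * __hyp_text fname(void)					\
  {									\
  	typeof(alt) *val = orig;					\
  	asm volatile(ALTERNATIVE("nop		\n",			\
  				 "mov	%0, %1	\n",			\
  				 cond)					\
  		     : "+r" (val) : "r" ((typeof(orig) *)alt));		\
  	return val;							\
  }

Replacing __activate_traps_arch()(vcpu) with an explicit has_vhe() branch
lets the compiler inline the target instead of making an indirect call
through this patched pointer.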


Re: [PATCH v4 33/40] KVM: arm64: Configure c15, PMU, and debug register traps on cpu load/put for VHE

2018-02-21 Thread Marc Zyngier
On Thu, 15 Feb 2018 21:03:25 +,
Christoffer Dall wrote:
> 
> We do not have to change the c15 trap setting on each switch to/from the
> guest on VHE systems, because this setting only affects EL0.

Did you mean EL1 instead?

> 
> The PMU and debug trap configuration can also be done on vcpu load/put
> instead, because they don't affect how the host kernel can access the
> debug registers while executing KVM kernel code.
> 
> Signed-off-by: Christoffer Dall 
> ---
>  arch/arm64/include/asm/kvm_hyp.h |  3 +++
>  arch/arm64/kvm/hyp/switch.c  | 31 ++-
>  arch/arm64/kvm/hyp/sysreg-sr.c   |  4 
>  3 files changed, 29 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_hyp.h 
> b/arch/arm64/include/asm/kvm_hyp.h
> index 2b1fda90dde4..949f2e77ae58 100644
> --- a/arch/arm64/include/asm/kvm_hyp.h
> +++ b/arch/arm64/include/asm/kvm_hyp.h
> @@ -147,6 +147,9 @@ void __fpsimd_save_state(struct user_fpsimd_state 
> *fp_regs);
>  void __fpsimd_restore_state(struct user_fpsimd_state *fp_regs);
>  bool __fpsimd_enabled(void);
>  
> +void activate_traps_vhe_load(struct kvm_vcpu *vcpu);
> +void deactivate_traps_vhe_put(void);
> +
>  u64 __guest_enter(struct kvm_vcpu *vcpu, struct kvm_cpu_context *host_ctxt);
>  void __noreturn __hyp_do_panic(unsigned long, ...);
>  
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index 9c40e203bd09..5e94955b89ea 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -101,6 +101,8 @@ static void __hyp_text __activate_traps_nvhe(struct 
> kvm_vcpu *vcpu)
>  {
>   u64 val;
>  
> + __activate_traps_common(vcpu);
> +
>   val = CPTR_EL2_DEFAULT;
>   val |= CPTR_EL2_TTA | CPTR_EL2_TFP | CPTR_EL2_TZ;
>   write_sysreg(val, cptr_el2);
> @@ -120,20 +122,12 @@ static void __hyp_text __activate_traps(struct kvm_vcpu 
> *vcpu)
>   write_sysreg_s(vcpu->arch.vsesr_el2, SYS_VSESR_EL2);
>  
>   __activate_traps_fpsimd32(vcpu);
> - __activate_traps_common(vcpu);
>   __activate_traps_arch()(vcpu);
>  }
>  
>  static void __hyp_text __deactivate_traps_vhe(void)
>  {
>   extern char vectors[];  /* kernel exception vectors */
> - u64 mdcr_el2 = read_sysreg(mdcr_el2);
> -
> - mdcr_el2 &= MDCR_EL2_HPMN_MASK |
> - MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT |
> - MDCR_EL2_TPMS;
> -
> - write_sysreg(mdcr_el2, mdcr_el2);
>   write_sysreg(HCR_HOST_VHE_FLAGS, hcr_el2);
>   write_sysreg(CPACR_EL1_DEFAULT, cpacr_el1);
>   write_sysreg(vectors, vbar_el1);
> @@ -143,6 +137,8 @@ static void __hyp_text __deactivate_traps_nvhe(void)
>  {
>   u64 mdcr_el2 = read_sysreg(mdcr_el2);
>  
> + __deactivate_traps_common();
> +
>   mdcr_el2 &= MDCR_EL2_HPMN_MASK;
>   mdcr_el2 |= MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT;
>  
> @@ -166,10 +162,27 @@ static void __hyp_text __deactivate_traps(struct 
> kvm_vcpu *vcpu)
>   if (vcpu->arch.hcr_el2 & HCR_VSE)
>   vcpu->arch.hcr_el2 = read_sysreg(hcr_el2);
>  
> - __deactivate_traps_common();
>   __deactivate_traps_arch()();
>  }
>  
> +void activate_traps_vhe_load(struct kvm_vcpu *vcpu)
> +{
> + __activate_traps_common(vcpu);
> +}
> +
> +void deactivate_traps_vhe_put(void)
> +{
> + u64 mdcr_el2 = read_sysreg(mdcr_el2);
> +
> + mdcr_el2 &= MDCR_EL2_HPMN_MASK |
> + MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT |
> + MDCR_EL2_TPMS;
> +
> + write_sysreg(mdcr_el2, mdcr_el2);
> +
> + __deactivate_traps_common();
> +}
> +
>  static void __hyp_text __activate_vm(struct kvm *kvm)
>  {
>   write_sysreg(kvm->arch.vttbr, vttbr_el2);
> diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
> index aacba4636871..b3894df6bf1a 100644
> --- a/arch/arm64/kvm/hyp/sysreg-sr.c
> +++ b/arch/arm64/kvm/hyp/sysreg-sr.c
> @@ -254,6 +254,8 @@ void kvm_vcpu_load_sysregs(struct kvm_vcpu *vcpu)
>   __sysreg_restore_el1_state(guest_ctxt);
>  
>   vcpu->arch.sysregs_loaded_on_cpu = true;
> +
> + activate_traps_vhe_load(vcpu);
>  }
>  
>  /**
> @@ -275,6 +277,8 @@ void kvm_vcpu_put_sysregs(struct kvm_vcpu *vcpu)
>   if (!has_vhe())
>   return;
>  
> + deactivate_traps_vhe_put();
> +
>   __sysreg_save_el1_state(guest_ctxt);
>   __sysreg_save_user_state(guest_ctxt);
>   __sysreg32_save_state(vcpu);
> -- 
> 2.14.2
> 

I must admit that I find these two layers of trap configuration mildly
confusing. I can see why it is done like this (there is hardly any
other way), but still...

Reviewed-by: Marc Zyngier 

M.

-- 
Jazz is not dead, it just smell funny.


Re: [PATCH v4 32/40] KVM: arm64: Directly call VHE and non-VHE FPSIMD enabled functions

2018-02-21 Thread Marc Zyngier
On Thu, 15 Feb 2018 21:03:24 +,
Christoffer Dall wrote:
> 
> There is no longer a need for an alternative to choose the right
> function to tell us whether or not FPSIMD was enabled for the VM,
> because we can simply cann the appropriate functions directly fromwithin

from within

> the _vhe and _nvhe run functions.
> 
> Signed-off-by: Christoffer Dall 
> ---
> 
> Notes:
> Changes since v3:
>  - New patch since we no longer defer FPSIMD handling to load/put
> 
>  arch/arm64/kvm/hyp/switch.c | 15 +++
>  1 file changed, 3 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index 17e3c6f26a34..9c40e203bd09 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -33,20 +33,11 @@ static bool __hyp_text __fpsimd_enabled_nvhe(void)
>   return !(read_sysreg(cptr_el2) & CPTR_EL2_TFP);
>  }
>  
> -static bool __hyp_text __fpsimd_enabled_vhe(void)
> +static bool fpsimd_enabled_vhe(void)
>  {
>   return !!(read_sysreg(cpacr_el1) & CPACR_EL1_FPEN);
>  }
>  
> -static hyp_alternate_select(__fpsimd_is_enabled,
> - __fpsimd_enabled_nvhe, __fpsimd_enabled_vhe,
> - ARM64_HAS_VIRT_HOST_EXTN);
> -
> -bool __hyp_text __fpsimd_enabled(void)
> -{
> - return __fpsimd_is_enabled()();
> -}
> -
>  /* Save the 32-bit only FPSIMD system register state */
>  static inline void __hyp_text __fpsimd_save_fpexc32(struct kvm_vcpu *vcpu)
>  {
> @@ -413,7 +404,7 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
>   /* And we're baaack! */
>   } while (fixup_guest_exit(vcpu, &exit_code));
>  
> - fp_enabled = __fpsimd_enabled();
> + fp_enabled = fpsimd_enabled_vhe();
>  
>   sysreg_save_guest_state_vhe(guest_ctxt);
>   __vgic_save_state(vcpu);
> @@ -478,7 +469,7 @@ int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
>   __qcom_hyp_sanitize_btac_predictors();
>   }
>  
> - fp_enabled = __fpsimd_enabled();
> + fp_enabled = __fpsimd_enabled_nvhe();
>  
>   __sysreg_save_state_nvhe(guest_ctxt);
>   __sysreg32_save_state(vcpu);
> -- 
> 2.14.2
> 

Reviewed-by: Marc Zyngier 

M.

-- 
Jazz is not dead, it just smell funny.


Re: [PATCH v4 31/40] KVM: arm64: Move common VHE/non-VHE trap config in separate functions

2018-02-21 Thread Marc Zyngier
On Thu, 15 Feb 2018 21:03:23 +,
Christoffer Dall wrote:
> 
> As we are about to be more lazy with some of the trap configuration
> register read/writes for VHE systems, move the logic that is currently
> shared between VHE and non-VHE into a separate function which can be
> called from either the world-switch path or from vcpu_load/vcpu_put.
> 
> Signed-off-by: Christoffer Dall 
> ---
> 
> Notes:
> Changes since v3:
>  - Separate fpsimd32 trap configuration into a separate function
>which is still called from __activate_traps, because we no longer
>defer saving/restoring of VFP registers to load/put.
> 
>  arch/arm64/kvm/hyp/switch.c | 76 
> +++--
>  1 file changed, 45 insertions(+), 31 deletions(-)
> 
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index 909aa3fe9196..17e3c6f26a34 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -56,7 +56,45 @@ static inline void __hyp_text __fpsimd_save_fpexc32(struct 
> kvm_vcpu *vcpu)
>   vcpu->arch.ctxt.sys_regs[FPEXC32_EL2] = read_sysreg(fpexc32_el2);
>  }
>  
> -static void __hyp_text __activate_traps_vhe(void)
> +static void __hyp_text __activate_traps_fpsimd32(struct kvm_vcpu *vcpu)
> +{
> + /*
> +  * We are about to set CPTR_EL2.TFP to trap all floating point
> +  * register accesses to EL2, however, the ARM ARM clearly states that
> +  * traps are only taken to EL2 if the operation would not otherwise
> +  * trap to EL1.  Therefore, always make sure that for 32-bit guests,
> +  * we set FPEXC.EN to prevent traps to EL1, when setting the TFP bit.
> +  * If FP/ASIMD is not implemented, FPEXC is UNDEFINED and any access to
> +  * it will cause an exception.
> +  */
> + if (vcpu_el1_is_32bit(vcpu) && system_supports_fpsimd()) {
> + write_sysreg(1 << 30, fpexc32_el2);
> + isb();
> + }
> +}
> +
> +static void __hyp_text __activate_traps_common(struct kvm_vcpu *vcpu)
> +{
> + /* Trap on AArch32 cp15 c15 (impdef sysregs) accesses (EL1 or EL0) */
> + write_sysreg(1 << 15, hstr_el2);
> + /*
> +  * Make sure we trap PMU access from EL0 to EL2. Also sanitize
> +  * PMSELR_EL0 to make sure it never contains the cycle
> +  * counter, which could make a PMXEVCNTR_EL0 access UNDEF at
> +  * EL1 instead of being trapped to EL2.
> +  */
> + write_sysreg(0, pmselr_el0);
> + write_sysreg(ARMV8_PMU_USERENR_MASK, pmuserenr_el0);
> + write_sysreg(vcpu->arch.mdcr_el2, mdcr_el2);
> +}
> +
> +static void __hyp_text __deactivate_traps_common(void)
> +{
> + write_sysreg(0, hstr_el2);
> + write_sysreg(0, pmuserenr_el0);
> +}
> +
> +static void __hyp_text __activate_traps_vhe(struct kvm_vcpu *vcpu)
>  {
>   u64 val;
>  
> @@ -68,7 +106,7 @@ static void __hyp_text __activate_traps_vhe(void)
>   write_sysreg(kvm_get_hyp_vector(), vbar_el1);
>  }
>  
> -static void __hyp_text __activate_traps_nvhe(void)
> +static void __hyp_text __activate_traps_nvhe(struct kvm_vcpu *vcpu)

I have the ugly feeling that this hunk should not be in this
patch. Have you tried bisecting the compilation of this series?

>  {
>   u64 val;
>  
> @@ -85,37 +123,14 @@ static void __hyp_text __activate_traps(struct kvm_vcpu 
> *vcpu)
>  {
>   u64 hcr = vcpu->arch.hcr_el2;
>  
> - /*
> -  * We are about to set CPTR_EL2.TFP to trap all floating point
> -  * register accesses to EL2, however, the ARM ARM clearly states that
> -  * traps are only taken to EL2 if the operation would not otherwise
> -  * trap to EL1.  Therefore, always make sure that for 32-bit guests,
> -  * we set FPEXC.EN to prevent traps to EL1, when setting the TFP bit.
> -  * If FP/ASIMD is not implemented, FPEXC is UNDEFINED and any access to
> -  * it will cause an exception.
> -  */
> - if (vcpu_el1_is_32bit(vcpu) && system_supports_fpsimd()) {
> - write_sysreg(1 << 30, fpexc32_el2);
> - isb();
> - }
> + write_sysreg(hcr, hcr_el2);
>  
>   if (cpus_have_const_cap(ARM64_HAS_RAS_EXTN) && (hcr & HCR_VSE))
>   write_sysreg_s(vcpu->arch.vsesr_el2, SYS_VSESR_EL2);
>  
> - write_sysreg(hcr, hcr_el2);
> -
> - /* Trap on AArch32 cp15 c15 accesses (EL1 or EL0) */
> - write_sysreg(1 << 15, hstr_el2);
> - /*
> -  * Make sure we trap PMU access from EL0 to EL2. Also sanitize
> -  * PMSELR_EL0 to make sure it never contains the cycle
> -  * counter, which could make a PMXEVCNTR_EL0 access UNDEF at
> -  * EL1 instead of being trapped to EL2.
> -  */
> - write_sysreg(0, pmselr_el0);
> - write_sysreg(ARMV8_PMU_USERENR_MASK, pmuserenr_el0);
> - write_sysreg(vcpu->arch.mdcr_el2, mdcr_el2);
> - __activate_traps_arch()();
> + __activate_traps_fpsimd32(vcpu);
> + __activate_traps_common(vcpu);
> + __activate_traps_arch()(vcpu);
>  

Re: [PATCH v4 10/40] KVM: arm64: Slightly improve debug save/restore functions

2018-02-21 Thread Marc Zyngier
On 21/02/18 17:39, Andrew Jones wrote:
> On Thu, Feb 15, 2018 at 10:03:02PM +0100, Christoffer Dall wrote:
>> The debug save/restore functions can be improved by using the has_vhe()
>> static key instead of the instruction alternative.  Using the static key
>> uses the same paradigm as we're going to use elsewhere, it makes the
>> code more readable, and it generates slightly better code (no
>> stack setups and function calls unless necessary).
>>
>> We also use a static key on the restore path, because it will be
>> marginally faster than loading a value from memory.
>>
>> Finally, we don't have to conditionally clear the debug dirty flag if
>> it's set, we can just clear it.
>>
>> Reviewed-by: Marc Zyngier 
>> Signed-off-by: Christoffer Dall 
>> ---
>>
>> Notes:
>> Changes since v1:
>>  - Change dot to comma in comment
>>  - Rename __debug_restore_spe to __debug_restore_spe_nvhe
>>
>>  arch/arm64/kvm/hyp/debug-sr.c | 26 --
>>  1 file changed, 12 insertions(+), 14 deletions(-)
>>
> 
> Maybe after this series is merged, if there are any hyp_alternate_select's
> left, we can replace all the remaining ones with has_vhe() and then just
> completely remove hyp_alternate_select.

Note that older compilers (such as GCC 4.8) will generate horrible code
with static keys, as they do not support "asm goto". Not that I want to
preserve the home brew hyp_alternate_select mechanism, but I just want
to make it plain that some distros will definitely suffer from the
transition.

M.
-- 
Jazz is not dead. It just smells funny...
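
The static key in question sits behind has_vhe(), which at this point
reads roughly as follows (arch/arm64/include/asm/virt.h):

  static inline bool has_vhe(void)
  {
  	if (cpus_have_const_cap(ARM64_HAS_VIRT_HOST_EXTN))
  		return true;

  	return false;
  }

cpus_have_const_cap() is backed by a jump label: with "asm goto" support
it compiles down to a patched branch, while compilers lacking it fall back
to loading and testing the key, which is the poor code generation Marc
mentions.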


Re: [PATCH v4 22/40] KVM: arm64: Don't save the host ELR_EL2 and SPSR_EL2 on VHE systems

2018-02-21 Thread Andrew Jones
On Thu, Feb 15, 2018 at 10:03:14PM +0100, Christoffer Dall wrote:
> On non-VHE systems we need to save the ELR_EL2 and SPSR_EL2 so that we can
> return to the host in EL1 in the same state and location where we issued a
> hypercall to EL2, but on VHE ELR_EL2 and SPSR_EL2 are not useful because we
> never enter a guest as a result of an exception entry that would be directly
> handled by KVM. The kernel entry code already saves ELR_EL1/SPSR_EL1 on
> exception entry, which is enough.  Therefore, factor out these registers into
> separate save/restore functions, making it easy to exclude them from the VHE
> world-switch path later on.
> 
> Reviewed-by: Marc Zyngier 
> Signed-off-by: Christoffer Dall 
> ---
>  arch/arm64/kvm/hyp/sysreg-sr.c | 13 +
>  1 file changed, 13 insertions(+)
>

Reviewed-by: Andrew Jones 
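
For context, the factoring described in the commit message yields a pair
of small helpers along these lines (a sketch matching the description, not
the verbatim hunk):

  static void __hyp_text
  __sysreg_save_el2_return_state(struct kvm_cpu_context *ctxt)
  {
  	/* On exit, the guest's PC/pstate live in ELR_EL2/SPSR_EL2 */
  	ctxt->gp_regs.regs.pc		= read_sysreg_el2(elr);
  	ctxt->gp_regs.regs.pstate	= read_sysreg_el2(spsr);
  }

  static void __hyp_text
  __sysreg_restore_el2_return_state(struct kvm_cpu_context *ctxt)
  {
  	write_sysreg_el2(ctxt->gp_regs.regs.pc, elr);
  	write_sysreg_el2(ctxt->gp_regs.regs.pstate, spsr);
  }

With these separated out, the VHE host save/restore path can simply skip
them.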


Re: [PATCH v4 20/40] KVM: arm/arm64: Remove leftover comment from kvm_vcpu_run_vhe

2018-02-21 Thread Andrew Jones
On Thu, Feb 15, 2018 at 10:03:12PM +0100, Christoffer Dall wrote:
> The comment only applied to SPE on non-VHE systems, so we simply remove
> it.
> 
> Suggested-by: Andrew Jones 
> Acked-by: Marc Zyngier 
> Signed-off-by: Christoffer Dall 
> ---
>  arch/arm64/kvm/hyp/switch.c | 4 
>  1 file changed, 4 deletions(-)
> 
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index 717c8f5c6be7..a99224996e33 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -410,10 +410,6 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
>   __fpsimd_restore_state(&host_ctxt->gp_regs.fp_regs);
>   }
>  
> - /*
> -  * This must come after restoring the host sysregs, since a non-VHE
> -  * system may enable SPE here and make use of the TTBRs.
> -  */
>   __debug_switch_to_host(vcpu);
>  
>   return exit_code;
> -- 
> 2.14.2
>

Reviewed-by: Andrew Jones 


Re: [PATCH v4 18/40] KVM: arm64: Rewrite sysreg alternatives to static keys

2018-02-21 Thread Andrew Jones
On Thu, Feb 15, 2018 at 10:03:10PM +0100, Christoffer Dall wrote:
> As we are about to move calls around in the sysreg save/restore logic,
> let's first rewrite the alternative function callers, because it is
> going to make the next patches much easier to read.
> 
> Acked-by: Marc Zyngier 
> Signed-off-by: Christoffer Dall 
> ---
>  arch/arm64/kvm/hyp/sysreg-sr.c | 17 -
>  1 file changed, 4 insertions(+), 13 deletions(-)
>

Reviewed-by: Andrew Jones 


Re: [PATCH v4 16/40] KVM: arm64: Remove noop calls to timer save/restore from VHE switch

2018-02-21 Thread Andrew Jones
On Thu, Feb 15, 2018 at 10:03:08PM +0100, Christoffer Dall wrote:
> The VHE switch function calls __timer_enable_traps and
> __timer_disable_traps which don't do anything on VHE systems.
> Therefore, simply remove these calls from the VHE switch function and
> make the functions non-conditional as they are now only called from the
> non-VHE switch path.
> 
> Acked-by: Marc Zyngier 
> Signed-off-by: Christoffer Dall 
> ---
> 
> Notes:
> Changes since v2:
>  - Added comment explaining the timer enable/disable functions
>are for !VHE only.
> 
>  arch/arm64/kvm/hyp/switch.c |  2 --
>  virt/kvm/arm/hyp/timer-sr.c | 44 ++--
>  2 files changed, 22 insertions(+), 24 deletions(-)
>

Reviewed-by: Andrew Jones 
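
After this change the two functions are non-VHE-only and reduce to
unconditional cnthctl_el2 updates; the disable side looks roughly like
this (virt/kvm/arm/hyp/timer-sr.c):

  /*
   * Should only be called on non-VHE systems; VHE configures EL1
   * timer/counter access once at init instead.
   */
  void __hyp_text __timer_disable_traps(struct kvm_vcpu *vcpu)
  {
  	u64 val;

  	/* Allow the host physical timer/counter access */
  	val = read_sysreg(cnthctl_el2);
  	val |= CNTHCTL_EL1PCTEN | CNTHCTL_EL1PCEN;
  	write_sysreg(val, cnthctl_el2);
  }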


Re: [PATCH v4 13/40] KVM: arm64: Introduce VHE-specific kvm_vcpu_run

2018-02-21 Thread Andrew Jones
On Thu, Feb 15, 2018 at 10:03:05PM +0100, Christoffer Dall wrote:
> So far this is mostly (see below) a copy of the legacy non-VHE switch
> function, but we will start reworking these functions in separate
> directions to work on VHE and non-VHE in the most optimal way in later
> patches.
> 
> The only difference after this patch between the VHE and non-VHE run
> functions is that we omit the branch-predictor variant-2 hardening for
> QC Falkor CPUs, because this workaround is specific to a series of
> non-VHE ARMv8.0 CPUs.
> 
> Reviewed-by: Marc Zyngier 
> Signed-off-by: Christoffer Dall 
> ---
> 
> Notes:
> Changes since v3:
>  - Added BUG() to 32-bit ARM VHE run function
>  - Omitted QC Falkor BP Hardening functionality from VHE-specific
>function
> 
> Changes since v2:
>  - Reworded commit message
> 
> Changes since v1:
>  - Rename kvm_vcpu_run to kvm_vcpu_run_vhe and rename __kvm_vcpu_run to
>__kvm_vcpu_run_nvhe
>  - Removed stray whitespace line
> 
>  arch/arm/include/asm/kvm_asm.h   |  5 ++-
>  arch/arm/kvm/hyp/switch.c|  2 +-
>  arch/arm64/include/asm/kvm_asm.h |  4 ++-
>  arch/arm64/kvm/hyp/switch.c  | 66 
> +++-
>  virt/kvm/arm/arm.c   |  5 ++-
>  5 files changed, 77 insertions(+), 5 deletions(-)
> 

...

> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> index 2062d9357971..5bd879c78951 100644
> --- a/virt/kvm/arm/arm.c
> +++ b/virt/kvm/arm/arm.c
> @@ -736,7 +736,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>   if (has_vhe())
>   kvm_arm_vhe_guest_enter();
>  
> - ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
> + if (has_vhe())
> + ret = kvm_vcpu_run_vhe(vcpu);
> + else
> + ret = kvm_call_hyp(__kvm_vcpu_run_nvhe, vcpu);
>  
>   if (has_vhe())
>   kvm_arm_vhe_guest_exit();

We can combine these has_vhe()'s

 if (has_vhe()) {
kvm_arm_vhe_guest_enter();
ret = kvm_vcpu_run_vhe(vcpu);
kvm_arm_vhe_guest_exit();
 } else
ret = kvm_call_hyp(__kvm_vcpu_run_nvhe, vcpu);

Thanks,
drew


Re: [PATCH v4 12/40] KVM: arm64: Factor out fault info population and gic workarounds

2018-02-21 Thread Andrew Jones
On Thu, Feb 15, 2018 at 10:03:04PM +0100, Christoffer Dall wrote:
> The current world-switch function has functionality to detect a number
> of cases where we need to fixup some part of the exit condition and
> possibly run the guest again, before having restored the host state.
> 
> This includes populating missing fault info, emulating GICv2 CPU
> interface accesses when mapped at unaligned addresses, and emulating
> the GICv3 CPU interface on systems that need it.
> 
> As we are about to have an alternative switch function for VHE systems,
> but VHE systems still need the same early fixup logic, factor out this
> logic into a separate function that can be shared by both switch
> functions.
> 
> No functional change.
> 
> Reviewed-by: Marc Zyngier 
> Signed-off-by: Christoffer Dall 
> ---
>  arch/arm64/kvm/hyp/switch.c | 104 
> 
>  1 file changed, 57 insertions(+), 47 deletions(-)
>

Reviewed-by: Andrew Jones 


Re: [PATCH v4 11/40] KVM: arm64: Improve debug register save/restore flow

2018-02-21 Thread Andrew Jones
On Thu, Feb 15, 2018 at 10:03:03PM +0100, Christoffer Dall wrote:
> Instead of having multiple calls from the world switch path to the debug
> logic, each figuring out if the dirty bit is set and if we should
> save/restore the debug registers, let's just provide two hooks to the
> debug save/restore functionality, one for switching to the guest
> context, and one for switching to the host context, and we get the
> benefit of only having to evaluate the dirty flag once on each path,
> plus we give the compiler some more room to inline some of this
> functionality.
> 
> Reviewed-by: Marc Zyngier 
> Signed-off-by: Christoffer Dall 
> ---
> 
> Notes:
> Changes since v1:
>  - Remove leading underscores from local variables
> 
>  arch/arm64/include/asm/kvm_hyp.h | 10 ++-
>  arch/arm64/kvm/hyp/debug-sr.c| 56 
> +++-
>  arch/arm64/kvm/hyp/switch.c  |  6 ++---
>  3 files changed, 42 insertions(+), 30 deletions(-)
>

Reviewed-by: Andrew Jones 
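
The two hooks described in the commit message evaluate the dirty flag once
and then do the full save/restore; a sketch of the guest-side hook, with
helper and field names per the series:

  void __hyp_text __debug_switch_to_guest(struct kvm_vcpu *vcpu)
  {
  	struct kvm_cpu_context *host_ctxt;
  	struct kvm_cpu_context *guest_ctxt;
  	struct kvm_guest_debug_arch *host_dbg;
  	struct kvm_guest_debug_arch *guest_dbg;

  	/* Single dirty-flag check per world switch */
  	if (!(vcpu->arch.debug_flags & KVM_ARM64_DEBUG_DIRTY))
  		return;

  	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
  	guest_ctxt = &vcpu->arch.ctxt;
  	host_dbg = &vcpu->arch.host_debug_state.regs;
  	guest_dbg = kern_hyp_va(vcpu->arch.debug_ptr);

  	__debug_save_state(vcpu, host_dbg, host_ctxt);
  	__debug_restore_state(vcpu, guest_dbg, guest_ctxt);
  }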


Re: [PATCH v4 09/40] KVM: arm64: Move debug dirty flag calculation out of world switch

2018-02-21 Thread Andrew Jones
On Thu, Feb 15, 2018 at 10:03:01PM +0100, Christoffer Dall wrote:
> There is no need to figure out inside the world-switch if we should
> save/restore the debug registers or not, we might as well do that in the
> higher level debug setup code, making it easier to optimize down the
> line.
> 
> Reviewed-by: Julien Thierry 
> Reviewed-by: Marc Zyngier 
> Signed-off-by: Christoffer Dall 
> ---
>  arch/arm64/kvm/debug.c| 5 +
>  arch/arm64/kvm/hyp/debug-sr.c | 6 --
>  2 files changed, 5 insertions(+), 6 deletions(-)
>

Reviewed-by: Andrew Jones 


Re: [PATCH v4 08/40] KVM: arm/arm64: Introduce vcpu_el1_is_32bit

2018-02-21 Thread Andrew Jones
On Thu, Feb 15, 2018 at 10:03:00PM +0100, Christoffer Dall wrote:
> We have numerous checks around that check if the HCR_EL2 has the RW bit
> set to figure out if we're running an AArch64 or AArch32 VM.  In some
> cases, directly checking the RW bit (given its unintuitive name), is a
> bit confusing, and that's not going to improve as we move logic around
> for the following patches that optimize KVM on AArch64 hosts with VHE.
> 
> Therefore, introduce a helper, vcpu_el1_is_32bit, and replace existing
> direct checks of HCR_EL2.RW with the helper.
> 
> Reviewed-by: Julien Grall 
> Reviewed-by: Julien Thierry 
> Signed-off-by: Christoffer Dall 
> ---
> 
> Notes:
> Changes since v2:
>  - New patch
> 
> Changes since v1:
>  - Reworded comments as suggested by Drew
> 
>  arch/arm64/include/asm/kvm_emulate.h |  7 ++-
>  arch/arm64/kvm/hyp/switch.c  | 11 +--
>  arch/arm64/kvm/hyp/sysreg-sr.c   |  5 +++--
>  arch/arm64/kvm/inject_fault.c|  6 +++---
>  4 files changed, 17 insertions(+), 12 deletions(-)
>

Reviewed-by: Andrew Jones 
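
The helper itself is a one-line wrapper over HCR_EL2.RW, roughly
(arch/arm64/include/asm/kvm_emulate.h):

  static inline bool vcpu_el1_is_32bit(struct kvm_vcpu *vcpu)
  {
  	/* HCR_EL2.RW clear means EL1 executes in AArch32 */
  	return !(vcpu->arch.hcr_el2 & HCR_RW);
  }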


Re: [PATCH v4 03/40] KVM: arm64: Avoid storing the vcpu pointer on the stack

2018-02-21 Thread Andrew Jones
On Thu, Feb 15, 2018 at 10:02:55PM +0100, Christoffer Dall wrote:
> We already have the percpu area for the host cpu state, which points to
> the VCPU, so there's no need to store the VCPU pointer on the stack on
> every context switch.  We can be a little more clever and just use
> tpidr_el2 for the percpu offset and load the VCPU pointer from the host
> context.
> 
> This does require us to calculate the percpu offset without including
> the offset from the kernel mapping of the percpu array to the linear
> mapping of the array (which is what we store in tpidr_el1), because a
> PC-relative generated address in EL2 is already giving us the hyp alias
> of the linear mapping of a kernel address.  We do this in
> __cpu_init_hyp_mode() by using kvm_ksym_ref().
> 
> This change also requires us to have a scratch register, so we take the
> chance to rearrange some of the el1_sync code to only look at the
> vttbr_el2 to determine if this is a trap from the guest or an HVC from
> the host.  We do add an extra check to call the panic code if the kernel
> is configured with debugging enabled and we saw a trap from the host
> which wasn't an HVC, indicating that we left some EL2 trap configured by
> mistake.
> 
> The code that accesses ESR_EL2 was previously using an alternative to
> use the _EL1 accessor on VHE systems, but this was actually unnecessary
> as the _EL1 accessor aliases the ESR_EL2 register on VHE, and the _EL2
> accessor does the same thing on both systems.
> 
> Cc: Ard Biesheuvel 
> Signed-off-by: Christoffer Dall 
> ---
> 
> Notes:
> Changes since v3:
>  - Reworked the assembly part of the patch after rebasing on v4.16-rc1
>which created a conflict with the variant 2 mitigations.
>  - Removed Marc's reviewed-by due to the rework.
>  - Removed unneeded extern keyword in declaration in header file
> 
> Changes since v1:
>  - Use PC-relative addressing to access per-cpu variables instead of
>using a load from the literal pool.
>  - Remove stale comments as pointed out by Marc
>  - Reworded the commit message as suggested by Drew
> 
>  arch/arm64/include/asm/kvm_asm.h  | 14 ++
>  arch/arm64/include/asm/kvm_host.h | 15 +++
>  arch/arm64/kernel/asm-offsets.c   |  1 +
>  arch/arm64/kvm/hyp/entry.S|  6 +-
>  arch/arm64/kvm/hyp/hyp-entry.S| 31 +--
>  arch/arm64/kvm/hyp/switch.c   |  5 +
>  arch/arm64/kvm/hyp/sysreg-sr.c|  5 +
>  7 files changed, 50 insertions(+), 27 deletions(-)
> 

I'm not clear on the motivation for this patch. I assumed it enabled
simpler patches later in the series, but I did a bit of reading ahead
and didn't see anything obvious. I doubt it gives a speedup, so is it
just to avoid stack use? Making it easier to maintain these assembly
functions that span a couple files? If so, should it be posted separately
from this series? If not, could you please add some more text to the
commit message helping me better understand the full motivation?

Besides my confusion on motivation, it looks good to me

Reviewed-by: Andrew Jones 

Thanks,
drew
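
To make the mechanism concrete: the series keeps the running VCPU
reachable from the per-cpu host context and uses tpidr_el2 as the EL2
per-cpu offset, so the hyp code no longer needs a stack slot for it. A C
rendering of what the new assembly macros do (a sketch only; the real code
is the get_host_ctxt/get_vcpu_ptr assembly, and the function name here is
illustrative):

  DECLARE_PER_CPU(struct kvm_cpu_context, kvm_host_cpu_state);

  static struct kvm_vcpu *hyp_get_running_vcpu(void)
  {
  	struct kvm_cpu_context *host_ctxt;

  	/* tpidr_el2 holds this CPU's per-cpu offset while at EL2 */
  	host_ctxt = this_cpu_ptr(&kvm_host_cpu_state);
  	return host_ctxt->__hyp_running_vcpu;
  }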


Re: [PATCH v4 01/40] KVM: arm/arm64: Avoid vcpu_load for other vcpu ioctls than KVM_RUN

2018-02-21 Thread Andrew Jones
On Thu, Feb 15, 2018 at 10:02:53PM +0100, Christoffer Dall wrote:
> Calling vcpu_load() registers preempt notifiers for this vcpu and calls
> kvm_arch_vcpu_load().  The latter will soon be doing a lot of heavy
> lifting on arm/arm64 and will try to do things such as enabling the
> virtual timer and setting us up to handle interrupts from the timer
> hardware.
> 
> Loading state onto hardware registers and enabling hardware to signal
> interrupts can be problematic when we're not actually about to run the
> VCPU, because it makes it difficult to establish the right context when
> handling interrupts from the timer, and it makes the register access
> code difficult to reason about.
> 
> Luckily, now when we call vcpu_load in each ioctl implementation, we can
> simply remove the call from the non-KVM_RUN vcpu ioctls, and our
> kvm_arch_vcpu_load() is only used for loading vcpu content to the
> physical CPU when we're actually going to run the vcpu.
> 
> Reviewed-by: Julien Grall 
> Signed-off-by: Christoffer Dall 
> ---
>  arch/arm64/kvm/guest.c | 3 ---
>  virt/kvm/arm/arm.c | 9 -
>  2 files changed, 12 deletions(-)
>

Reviewed-by: Andrew Jones 


Re: [PATCH v3] arm64: Add support for new control bits CTR_EL0.DIC and CTR_EL0.IDC

2018-02-21 Thread Robin Murphy

On 21/02/18 16:14, Shanker Donthineni wrote:
[...]

@@ -1100,6 +1114,20 @@ static int cpu_copy_el2regs(void *__unused)
.enable = cpu_clear_disr,
},
  #endif /* CONFIG_ARM64_RAS_EXTN */
+#ifdef CONFIG_ARM64_SKIP_CACHE_POU
+   {
+   .desc = "DCache clean to POU",


This description is confusing, and sounds like it's describing DC CVAU, rather
than the ability to ellide it. How about:



Sure, I'll take your suggestion.


Can we at least spell "elision" correctly please? ;)

Personally I read DIC and IDC as "D-cache to I-cache coherency" and 
"I-cache to D-cache coherency" respectively (just my interpretation, 
I've not looked into the spec work for any hints of rationale), but out 
loud those do sound so poorly-defined that keeping things in terms of 
the required maintenance probably is better.



.desc = "D-cache maintenance ellision (IDC)"


+   .capability = ARM64_HAS_CACHE_IDC,
+   .def_scope = SCOPE_SYSTEM,
+   .matches = has_cache_idc,
+   },
+   {
+   .desc = "ICache invalidation to POU",


... and correspondingly:

.desc = "I-cache maintenance ellision (DIC)"


+   .capability = ARM64_HAS_CACHE_DIC,
+   .def_scope = SCOPE_SYSTEM,
+   .matches = has_cache_dic,
+   },
+#endif /* CONFIG_ARM64_CACHE_DIC */
{},
  };

[...]

+alternative_if ARM64_HAS_CACHE_DIC
+   isb


Why have we gained an ISB here if DIC is set?



I believe a synchronization barrier (ISB) is required here to support
self-modifying code and jump-label patching.

This is for a user address, and I can't see why DIC would imply we need an
extra ISB kernel-side.



This is for both user and kernel addresses; the alternatives and
jump-label patching logic calls flush_icache_range().


There's an ISB hidden in invalidate_icache_by_line(), so it probably 
would be unsafe to start implicitly skipping that.



+   b   8f
+alternative_else_nop_endif
invalidate_icache_by_line x0, x1, x2, x3, 9f
-   mov x0, #0
+8: mov x0, #0
  1:
uaccess_ttbr0_disable x1, x2
ret
@@ -80,6 +87,12 @@ ENDPROC(__flush_cache_user_range)
   *- end - virtual end address of region
   */
  ENTRY(invalidate_icache_range)
+alternative_if ARM64_HAS_CACHE_DIC
+   mov x0, xzr
+   dsb ish


Do we actually need a DSB in this case?



I'll remove it if everyone agrees.

Will, can you comment on this?


As-is, this function *only* invalidates the I-cache, so we already assume that
the data is visible at the PoU at this point. I don't see what extra gaurantee
we'd need the DSB for.


If so, then ditto for the existing invalidate_icache_by_line() code 
presumably.


Robin.
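
For context on the two capabilities being discussed, the patch detects
them with system-wide matches functions over the sanitised CTR_EL0 value,
roughly:

  static bool has_cache_idc(const struct arm64_cpu_capabilities *entry,
  			    int scope)
  {
  	/* CTR_EL0.IDC, bit[28]: D-cache clean to PoU not required */
  	return read_sanitised_ftr_reg(SYS_CTR_EL0) & BIT(CTR_IDC_SHIFT);
  }

  static bool has_cache_dic(const struct arm64_cpu_capabilities *entry,
  			    int scope)
  {
  	/* CTR_EL0.DIC, bit[29]: I-cache invalidation to PoU not required */
  	return read_sanitised_ftr_reg(SYS_CTR_EL0) & BIT(CTR_DIC_SHIFT);
  }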


Re: [PATCH v4 30/40] KVM: arm64: Defer saving/restoring 32-bit sysregs to vcpu load/put

2018-02-21 Thread Marc Zyngier
On Thu, 15 Feb 2018 21:03:22 +,
Christoffer Dall wrote:
> 
> When running a 32-bit VM (EL1 in AArch32), the AArch32 system registers
> can be deferred to vcpu load/put on VHE systems because neither
> the host kernel nor host userspace uses these registers.
> 
> Note that we can not defer saving DBGVCR32_EL2 conditionally based
> on the state of the debug dirty flag on VHE, but since we do the
> load/put pretty rarely, this comes out as a win anyway.

I'm not sure I understand that comment. We don't have any deferring
for this register, so the load/put reference seems out of place.

> 
> We also cannot defer saving FPEXC32_EL2, because this register only
> holds a guest-valid value for 32-bit guests during the exit path, once
> the guest has used FPSIMD registers and the early assembly handler,
> taken on the EL2 fault, has restored the register. We therefore have
> to check on the exit path whether fpsimd is enabled for the guest, and
> save the register then, for both VHE and non-VHE guests.
> 
> Signed-off-by: Christoffer Dall 
> ---
> 
> Notes:
> Changes since v3:
>  - Rework the FPEXC32 save/restore logic to no longer attempt to
>save/restore this register lazily.
> 
> Changes since v2:
>  - New patch (deferred register handling has been reworked)
> 
>  arch/arm64/kvm/hyp/switch.c| 17 +++--
>  arch/arm64/kvm/hyp/sysreg-sr.c | 15 ++-
>  2 files changed, 21 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index 22e77deb8e2e..909aa3fe9196 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -47,6 +47,15 @@ bool __hyp_text __fpsimd_enabled(void)
>   return __fpsimd_is_enabled()();
>  }
>  
> +/* Save the 32-bit only FPSIMD system register state */
> +static inline void __hyp_text __fpsimd_save_fpexc32(struct kvm_vcpu *vcpu)
> +{
> + if (!vcpu_el1_is_32bit(vcpu))
> + return;
> +
> + vcpu->arch.ctxt.sys_regs[FPEXC32_EL2] = read_sysreg(fpexc32_el2);
> +}
> +
>  static void __hyp_text __activate_traps_vhe(void)
>  {
>   u64 val;
> @@ -380,11 +389,6 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
>  
>   __vgic_restore_state(vcpu);
>  
> - /*
> -  * We must restore the 32-bit state before the sysregs, thanks
> -  * to erratum #852523 (Cortex-A57) or #853709 (Cortex-A72).
> -  */
> - __sysreg32_restore_state(vcpu);
>   sysreg_restore_guest_state_vhe(guest_ctxt);
>   __debug_switch_to_guest(vcpu);
>  
> @@ -398,7 +402,6 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
>   fp_enabled = __fpsimd_enabled();
>  
>   sysreg_save_guest_state_vhe(guest_ctxt);
> - __sysreg32_save_state(vcpu);
>   __vgic_save_state(vcpu);
>  
>   __deactivate_traps(vcpu);
> @@ -408,6 +411,7 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
>   if (fp_enabled) {
>   __fpsimd_save_state(&guest_ctxt->gp_regs.fp_regs);
>   __fpsimd_restore_state(&host_ctxt->gp_regs.fp_regs);
> + __fpsimd_save_fpexc32(vcpu);
>   }
>  
>   __debug_switch_to_host(vcpu);
> @@ -475,6 +479,7 @@ int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
>   if (fp_enabled) {
>   __fpsimd_save_state(&guest_ctxt->gp_regs.fp_regs);
>   __fpsimd_restore_state(&host_ctxt->gp_regs.fp_regs);
> + __fpsimd_save_fpexc32(vcpu);
>   }
>  
>   /*
> diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
> index 9c60b8062724..aacba4636871 100644
> --- a/arch/arm64/kvm/hyp/sysreg-sr.c
> +++ b/arch/arm64/kvm/hyp/sysreg-sr.c
> @@ -196,10 +196,7 @@ void __hyp_text __sysreg32_save_state(struct kvm_vcpu 
> *vcpu)
>   sysreg[DACR32_EL2] = read_sysreg(dacr32_el2);
>   sysreg[IFSR32_EL2] = read_sysreg(ifsr32_el2);
>  
> - if (__fpsimd_enabled())
> - sysreg[FPEXC32_EL2] = read_sysreg(fpexc32_el2);
> -
> - if (vcpu->arch.debug_flags & KVM_ARM64_DEBUG_DIRTY)
> + if (has_vhe() || vcpu->arch.debug_flags & KVM_ARM64_DEBUG_DIRTY)
>   sysreg[DBGVCR32_EL2] = read_sysreg(dbgvcr32_el2);
>  }
>  
> @@ -221,7 +218,7 @@ void __hyp_text __sysreg32_restore_state(struct kvm_vcpu 
> *vcpu)
>   write_sysreg(sysreg[DACR32_EL2], dacr32_el2);
>   write_sysreg(sysreg[IFSR32_EL2], ifsr32_el2);
>  
> - if (vcpu->arch.debug_flags & KVM_ARM64_DEBUG_DIRTY)
> + if (has_vhe() || vcpu->arch.debug_flags & KVM_ARM64_DEBUG_DIRTY)
>   write_sysreg(sysreg[DBGVCR32_EL2], dbgvcr32_el2);
>  }
>  
> @@ -246,6 +243,13 @@ void kvm_vcpu_load_sysregs(struct kvm_vcpu *vcpu)
>  
>   __sysreg_save_user_state(host_ctxt);
>  
> + /*
> +  * Load guest EL1 and user state
> +  *
> +  * We must restore the 32-bit state before the sysregs, thanks
> +  * to erratum #852523 (Cortex-A57) or #853709 (Cortex-A72).
> +  */
> + __sysreg32_restore_state(vcpu);
>   __sysreg_restore_user_state(guest_ctxt);

Re: [PATCH v3] arm64: Add support for new control bits CTR_EL0.DIC and CTR_EL0.IDC

2018-02-21 Thread Shanker Donthineni
Hi Mark,

On 02/21/2018 09:09 AM, Mark Rutland wrote:
> On Wed, Feb 21, 2018 at 07:49:06AM -0600, Shanker Donthineni wrote:
>> The DCache clean & ICache invalidation requirements for instructions
>> to be data coherent are discoverable through new fields in CTR_EL0.
>> The following two control bits DIC and IDC were defined for this
>> purpose. No need to perform point of unification cache maintenance
>> operations from software on systems where CPU caches are transparent.
>>
>> This patch optimizes the three functions __flush_cache_user_range(),
>> clean_dcache_area_pou() and invalidate_icache_range() if the hardware
>> reports CTR_EL0.DIC and/or CTR_EL0.IDC. Basically it skips the two
>> instructions 'DC CVAU' and 'IC IVAU', and the associated loop logic
>> in order to avoid the unnecessary overhead.
>>
>> CTR_EL0.DIC: Instruction cache invalidation requirements for
>>  instruction to data coherence. This is bit[29], meaning:
>>   0: Instruction cache invalidation to the point of unification
>>  is required for instruction to data coherence.
>>   1: Instruction cache invalidation to the point of unification
>>  is not required for instruction to data coherence.
>>
>> CTR_EL0.IDC: Data cache clean requirements for instruction to data
>>  coherence. This is bit[28], meaning:
>>   0: Data cache clean to the point of unification is required for
>>  instruction to data coherence, unless CLIDR_EL1.LoC == 0b000
>>  or (CLIDR_EL1.LoUIS == 0b000 && CLIDR_EL1.LoUU == 0b000).
>>   1: Data cache clean to the point of unification is not required
>>  for instruction to data coherence.
>>
>> Signed-off-by: Philip Elcan 
>> Signed-off-by: Shanker Donthineni 
>> ---
>> Changes since v2:
>>   -Included barriers, DSB/ISB with DIC set, and DSB with IDC set.
>>   -Single Kconfig option.
>>
>> Changes since v1:
>>   -Reworded commit text.
>>   -Used the alternatives framework as Catalin suggested.
>>   -Rebased on top of https://patchwork.kernel.org/patch/10227927/
>>
>>  arch/arm64/Kconfig   | 12 
>>  arch/arm64/include/asm/cache.h   |  5 +
>>  arch/arm64/include/asm/cpucaps.h |  4 +++-
>>  arch/arm64/kernel/cpufeature.c   | 40 
>> ++--
>>  arch/arm64/mm/cache.S| 21 +++--
>>  5 files changed, 73 insertions(+), 9 deletions(-)
>>
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index f55fe5b..82b8053 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -1095,6 +1095,18 @@ config ARM64_RAS_EXTN
>>and access the new registers if the system supports the extension.
>>Platform RAS features may additionally depend on firmware support.
>>  
>> +config ARM64_SKIP_CACHE_POU
>> +bool "Enable support to skip cache POU operations"
>> +default y
>> +help
>> +  Explicit point of unification cache operations can be eliminated
>> +  in software if the hardware handles them transparently. The new
>> +  bits in CTR_EL0, CTR_EL0.DIC and CTR_EL0.IDC, indicate the
>> +  hardware capabilities of the ICache and DCache POU requirements.
>> +
>> +  Selecting this feature will allow the kernel to optimize the POU
>> +  cache maintenance operations where it requires 'DC CVAU' / 'IC IVAU'.
>> +
>>  endmenu
> 
> Is it worth having a config option for this at all? The savings from turning
> this off seem trivial.
> 
>>  
>>  config ARM64_SVE
>> diff --git a/arch/arm64/include/asm/cache.h b/arch/arm64/include/asm/cache.h
>> index ea9bb4e..e22178b 100644
>> --- a/arch/arm64/include/asm/cache.h
>> +++ b/arch/arm64/include/asm/cache.h
>> @@ -20,8 +20,13 @@
>>  
>>  #define CTR_L1IP_SHIFT  14
>>  #define CTR_L1IP_MASK   3
>> +#define CTR_DMLINE_SHIFT16
>> +#define CTR_ERG_SHIFT   20
>>  #define CTR_CWG_SHIFT   24
>>  #define CTR_CWG_MASK15
>> +#define CTR_IDC_SHIFT   28
>> +#define CTR_DIC_SHIFT   29
>> +#define CTR_B31_SHIFT   31
>>  
>>  #define CTR_L1IP(ctr)   (((ctr) >> CTR_L1IP_SHIFT) & 
>> CTR_L1IP_MASK)
>>  
>> diff --git a/arch/arm64/include/asm/cpucaps.h 
>> b/arch/arm64/include/asm/cpucaps.h
>> index bb26382..8dd42ae 100644
>> --- a/arch/arm64/include/asm/cpucaps.h
>> +++ b/arch/arm64/include/asm/cpucaps.h
>> @@ -45,7 +45,9 @@
>>  #define ARM64_HARDEN_BRANCH_PREDICTOR   24
>>  #define ARM64_HARDEN_BP_POST_GUEST_EXIT 25
>>  #define ARM64_HAS_RAS_EXTN  26
>> +#define ARM64_HAS_CACHE_IDC 27
>> +#define ARM64_HAS_CACHE_DIC 28
>>  
>> -#define ARM64_NCAPS 27
>> +#define ARM64_NCAPS 29
>>  
>>  #endif /* __ASM_CPUCAPS_H */
>> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
>> index ff8a6e9..12e100a 100644
>> --- a/arch/arm64/kernel/cpufeature.c
>> +++ 

Re: [PATCH v4 29/40] KVM: arm64: Prepare to handle deferred save/restore of 32-bit registers

2018-02-21 Thread Marc Zyngier
On Thu, 15 Feb 2018 21:03:21 +,
Christoffer Dall wrote:
> 
> 32-bit registers are not used by a 64-bit host kernel and can be
> deferred, but we need to rework the accesses to this register to access
> the latest value depending on whether or not guest system registers are
> loaded on the CPU or only reside in memory.
> 
> Signed-off-by: Christoffer Dall 

Ah, much nicer than the "clunky" version! :-)

Reviewed-by: Marc Zyngier 

M.

-- 
Jazz is not dead, it just smell funny.


Re: [PATCH v4 28/40] KVM: arm64: Defer saving/restoring 64-bit sysregs to vcpu load/put on VHE

2018-02-21 Thread Marc Zyngier
On Thu, 15 Feb 2018 21:03:20 +,
Christoffer Dall wrote:
> 
> Some system registers do not affect the host kernel's execution and can
> therefore be loaded when we are about to run a VCPU and we don't have to
> restore the host state to the hardware before the time when we are
> actually about to return to userspace or schedule out the VCPU thread.
> 
> The EL1 system registers and the userspace state registers only
> affecting EL0 execution do not need to be saved and restored on every
> switch between the VM and the host, because they don't affect the host
> kernel's execution.
> 
> We mark all registers which are now deferred as such in the
> vcpu_{read,write}_sys_reg accessors in sys_regs.c to ensure the most
> up-to-date copy is always accessed.
> 
> Note MPIDR_EL1 (controlled via VMPIDR_EL2) is accessed from other vcpu
> threads, for example via the GIC emulation, and therefore must be
> declared as immediate, which is fine as the guest cannot modify this
> value.
> 
> The 32-bit sysregs can also be deferred but we do this in a separate
> patch as it requires a bit more infrastructure.
> 
> Signed-off-by: Christoffer Dall 
> ---
> 
> Notes:
> Changes since v3:
>  - Changed to switch-based sysreg approach
> 
>  arch/arm64/kvm/hyp/sysreg-sr.c | 39 +++
>  arch/arm64/kvm/sys_regs.c  | 40 
>  2 files changed, 71 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
> index 906606dc4e2c..9c60b8062724 100644
> --- a/arch/arm64/kvm/hyp/sysreg-sr.c
> +++ b/arch/arm64/kvm/hyp/sysreg-sr.c
> @@ -25,8 +25,12 @@
>  /*
>   * Non-VHE: Both host and guest must save everything.
>   *
> - * VHE: Host must save tpidr*_el0, mdscr_el1, sp_el0,
> - * and guest must save everything.
> + * VHE: Host and guest must save mdscr_el1 and sp_el0 (and the PC and pstate,
> + * which are handled as part of the el2 return state) on every switch.
> + * tpidr_el0 and tpidrro_el0 only need to be switched when going

How about suspend/resume, which saves/restores both of these EL0
registers (see cpu_do_suspend)? We may not need to do anything (either
because vcpu_put will have happened, or because we'll come back
exactly where we were), but I'd like to make sure this hasn't been
overlooked.

> + * to host userspace or a different VCPU.  EL1 registers only need to be
> + * switched when potentially going to run a different VCPU.  The latter two
> + * classes are handled as part of kvm_arch_vcpu_load and kvm_arch_vcpu_put.
>   */
>  
>  static void __hyp_text __sysreg_save_common_state(struct kvm_cpu_context 
> *ctxt)
> @@ -93,14 +97,11 @@ void __hyp_text __sysreg_save_state_nvhe(struct 
> kvm_cpu_context *ctxt)
>  void sysreg_save_host_state_vhe(struct kvm_cpu_context *ctxt)
>  {
>   __sysreg_save_common_state(ctxt);
> - __sysreg_save_user_state(ctxt);
>  }
>  
>  void sysreg_save_guest_state_vhe(struct kvm_cpu_context *ctxt)
>  {
> - __sysreg_save_el1_state(ctxt);
>   __sysreg_save_common_state(ctxt);
> - __sysreg_save_user_state(ctxt);
>   __sysreg_save_el2_return_state(ctxt);
>  }
>  
> @@ -169,14 +170,11 @@ void __hyp_text __sysreg_restore_state_nvhe(struct 
> kvm_cpu_context *ctxt)
>  void sysreg_restore_host_state_vhe(struct kvm_cpu_context *ctxt)
>  {
>   __sysreg_restore_common_state(ctxt);
> - __sysreg_restore_user_state(ctxt);
>  }
>  
>  void sysreg_restore_guest_state_vhe(struct kvm_cpu_context *ctxt)
>  {
> - __sysreg_restore_el1_state(ctxt);
>   __sysreg_restore_common_state(ctxt);
> - __sysreg_restore_user_state(ctxt);
>   __sysreg_restore_el2_return_state(ctxt);
>  }
>  
> @@ -240,6 +238,18 @@ void __hyp_text __sysreg32_restore_state(struct kvm_vcpu 
> *vcpu)
>   */
>  void kvm_vcpu_load_sysregs(struct kvm_vcpu *vcpu)
>  {
> + struct kvm_cpu_context *host_ctxt = vcpu->arch.host_cpu_context;
> + struct kvm_cpu_context *guest_ctxt = &vcpu->arch.ctxt;
> +
> + if (!has_vhe())
> + return;
> +
> + __sysreg_save_user_state(host_ctxt);
> +
> + __sysreg_restore_user_state(guest_ctxt);
> + __sysreg_restore_el1_state(guest_ctxt);
> +
> + vcpu->arch.sysregs_loaded_on_cpu = true;
>  }
>  
>  /**
> @@ -255,6 +265,19 @@ void kvm_vcpu_load_sysregs(struct kvm_vcpu *vcpu)
>   */
>  void kvm_vcpu_put_sysregs(struct kvm_vcpu *vcpu)
>  {
> + struct kvm_cpu_context *host_ctxt = vcpu->arch.host_cpu_context;
> + struct kvm_cpu_context *guest_ctxt = &vcpu->arch.ctxt;
> +
> + if (!has_vhe())
> + return;
> +
> + __sysreg_save_el1_state(guest_ctxt);
> + __sysreg_save_user_state(guest_ctxt);
> +
> + /* Restore host user state */
> + __sysreg_restore_user_state(host_ctxt);
> +
> + vcpu->arch.sysregs_loaded_on_cpu = false;
>  }
>  
>  void __hyp_text __kvm_set_tpidr_el2(u64 tpidr_el2)
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 

Re: [PATCH v3] arm64: Add support for new control bits CTR_EL0.DIC and CTR_EL0.IDC

2018-02-21 Thread Mark Rutland
On Wed, Feb 21, 2018 at 07:49:06AM -0600, Shanker Donthineni wrote:
> The DCache clean & ICache invalidation requirements for instructions
> to be data coherent are discoverable through new fields in CTR_EL0.
> The following two control bits DIC and IDC were defined for this
> purpose. No need to perform point of unification cache maintenance
> operations from software on systems where CPU caches are transparent.
> 
> This patch optimizes the three functions __flush_cache_user_range(),
> clean_dcache_area_pou() and invalidate_icache_range() if the hardware
> reports CTR_EL0.IDC and/or CTR_EL0.DIC. Basically it skips the two
> instructions 'DC CVAU' and 'IC IVAU', and the associated loop logic
> in order to avoid the unnecessary overhead.
> 
> CTR_EL0.DIC: Instruction cache invalidation requirements for
>  instruction to data coherence. The meaning of this bit[29].
>   0: Instruction cache invalidation to the point of unification
>  is required for instruction to data coherence.
>   1: Instruction cache invalidation to the point of unification is
>   not required for instruction to data coherence.
> 
> CTR_EL0.IDC: Data cache clean requirements for instruction to data
>  coherence. The meaning of this bit[28].
>   0: Data cache clean to the point of unification is required for
>  instruction to data coherence, unless CLIDR_EL1.LoC == 0b000
>  or (CLIDR_EL1.LoUIS == 0b000 && CLIDR_EL1.LoUU == 0b000).
>   1: Data cache clean to the point of unification is not required
>  for instruction to data coherence.
> 
> Signed-off-by: Philip Elcan 
> Signed-off-by: Shanker Donthineni 
> ---
> Changes since v2:
>   -Included barriers, DSB/ISB with DIC set, and DSB with IDC set.
>   -Single Kconfig option.
> 
> Changes since v1:
>   -Reworded commit text.
>   -Used the alternatives framework as Catalin suggested.
>   -Rebased on top of https://patchwork.kernel.org/patch/10227927/
> 
>  arch/arm64/Kconfig   | 12 
>  arch/arm64/include/asm/cache.h   |  5 +
>  arch/arm64/include/asm/cpucaps.h |  4 +++-
>  arch/arm64/kernel/cpufeature.c   | 40 
> ++--
>  arch/arm64/mm/cache.S| 21 +++--
>  5 files changed, 73 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index f55fe5b..82b8053 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -1095,6 +1095,18 @@ config ARM64_RAS_EXTN
> and access the new registers if the system supports the extension.
> Platform RAS features may additionally depend on firmware support.
>  
> +config ARM64_SKIP_CACHE_POU
> + bool "Enable support to skip cache POU operations"
> + default y
> + help
> +   Explicit point of unification cache operations can be eliminated
> +   in software if the hardware handles them transparently. The new
> +   CTR_EL0 bits, CTR_EL0.DIC and CTR_EL0.IDC, indicate the hardware
> +   capabilities of ICache and DCache POU requirements.
> +
> +   Selecting this feature will allow the kernel to optimize the POU
> +   cache maintenance operations where it requires 'D{I}C C{I}VAU'.
> +
>  endmenu

Is it worth having a config option for this at all? The savings from turning
this off seem trivial.

>  
>  config ARM64_SVE
> diff --git a/arch/arm64/include/asm/cache.h b/arch/arm64/include/asm/cache.h
> index ea9bb4e..e22178b 100644
> --- a/arch/arm64/include/asm/cache.h
> +++ b/arch/arm64/include/asm/cache.h
> @@ -20,8 +20,13 @@
>  
>  #define CTR_L1IP_SHIFT   14
>  #define CTR_L1IP_MASK    3
> +#define CTR_DMLINE_SHIFT 16
> +#define CTR_ERG_SHIFT    20
>  #define CTR_CWG_SHIFT    24
>  #define CTR_CWG_MASK     15
> +#define CTR_IDC_SHIFT    28
> +#define CTR_DIC_SHIFT    29
> +#define CTR_B31_SHIFT    31
>  
>  #define CTR_L1IP(ctr)    (((ctr) >> CTR_L1IP_SHIFT) & CTR_L1IP_MASK)
>  
> diff --git a/arch/arm64/include/asm/cpucaps.h 
> b/arch/arm64/include/asm/cpucaps.h
> index bb26382..8dd42ae 100644
> --- a/arch/arm64/include/asm/cpucaps.h
> +++ b/arch/arm64/include/asm/cpucaps.h
> @@ -45,7 +45,9 @@
>  #define ARM64_HARDEN_BRANCH_PREDICTOR   24
>  #define ARM64_HARDEN_BP_POST_GUEST_EXIT  25
>  #define ARM64_HAS_RAS_EXTN   26
> +#define ARM64_HAS_CACHE_IDC  27
> +#define ARM64_HAS_CACHE_DIC  28
>  
> -#define ARM64_NCAPS  27
> +#define ARM64_NCAPS  29
>  
>  #endif /* __ASM_CPUCAPS_H */
> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> index ff8a6e9..12e100a 100644
> --- a/arch/arm64/kernel/cpufeature.c
> +++ b/arch/arm64/kernel/cpufeature.c
> @@ -199,12 +199,12 @@ static int __init register_cpu_hwcaps_dumper(void)
>  };
>  
>  static const struct arm64_ftr_bits ftr_ctr[] = {
> 

Re: [PATCH v4 27/40] KVM: arm64: Prepare to handle deferred save/restore of ELR_EL1

2018-02-21 Thread Marc Zyngier
On Thu, 15 Feb 2018 21:03:19 +,
Christoffer Dall wrote:
> 
> ELR_EL1 is not used by a VHE host kernel and can be deferred, but we
> need to rework the accesses to this register to access the latest value
> depending on whether or not guest system registers are loaded on the CPU
> or only reside in memory.
> 
> Signed-off-by: Christoffer Dall 
> ---
> 
> Notes:
> Changes since v2:
>  - New patch (deferred register handling has been reworked)
> 
>  arch/arm64/include/asm/kvm_emulate.h | 18 +-
>  arch/arm64/kvm/inject_fault.c|  4 ++--
>  2 files changed, 19 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_emulate.h 
> b/arch/arm64/include/asm/kvm_emulate.h
> index 47c2406755fa..9cb13b23c7a1 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -90,11 +90,27 @@ static inline unsigned long *vcpu_pc(const struct 
> kvm_vcpu *vcpu)
>   return (unsigned long *)&vcpu_gp_regs(vcpu)->regs.pc;
>  }
>  
> -static inline unsigned long *vcpu_elr_el1(const struct kvm_vcpu *vcpu)
> +static inline unsigned long *__vcpu_elr_el1(const struct kvm_vcpu *vcpu)
>  {
>   return (unsigned long *)&vcpu_gp_regs(vcpu)->elr_el1;
>  }
>  
> +static inline unsigned long vcpu_read_elr_el1(const struct kvm_vcpu *vcpu)
> +{
> + if (vcpu->arch.sysregs_loaded_on_cpu)
> + return read_sysreg_el1(elr);
> + else
> + return *__vcpu_elr_el1(vcpu);
> +}
> +
> +static inline void vcpu_write_elr_el1(const struct kvm_vcpu *vcpu, unsigned 
> long v)
> +{
> + if (vcpu->arch.sysregs_loaded_on_cpu)
> + write_sysreg_el1(v, elr);
> + else
> + *__vcpu_elr_el1(vcpu) = v;
> +}
> +
>  static inline unsigned long *vcpu_cpsr(const struct kvm_vcpu *vcpu)
>  {
>   return (unsigned long *)&vcpu_gp_regs(vcpu)->regs.pstate;
> diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
> index 8dda1edae727..cc13b6f5ad11 100644
> --- a/arch/arm64/kvm/inject_fault.c
> +++ b/arch/arm64/kvm/inject_fault.c
> @@ -67,7 +67,7 @@ static void inject_abt64(struct kvm_vcpu *vcpu, bool 
> is_iabt, unsigned long addr
>   bool is_aarch32 = vcpu_mode_is_32bit(vcpu);
>   u32 esr = 0;
>  
> - *vcpu_elr_el1(vcpu) = *vcpu_pc(vcpu);
> + vcpu_write_elr_el1(vcpu, *vcpu_pc(vcpu));
>   *vcpu_pc(vcpu) = get_except_vector(vcpu, except_type_sync);
>  
>   *vcpu_cpsr(vcpu) = PSTATE_FAULT_BITS_64;
> @@ -102,7 +102,7 @@ static void inject_undef64(struct kvm_vcpu *vcpu)
>   unsigned long cpsr = *vcpu_cpsr(vcpu);
>   u32 esr = (ESR_ELx_EC_UNKNOWN << ESR_ELx_EC_SHIFT);
>  
> - *vcpu_elr_el1(vcpu) = *vcpu_pc(vcpu);
> + vcpu_write_elr_el1(vcpu, *vcpu_pc(vcpu));
>   *vcpu_pc(vcpu) = get_except_vector(vcpu, except_type_sync);
>  
>   *vcpu_cpsr(vcpu) = PSTATE_FAULT_BITS_64;
> -- 
> 2.14.2
> 

Reviewed-by: Marc Zyngier 

M.

-- 
Jazz is not dead, it just smell funny.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v4 26/40] KVM: arm/arm64: Prepare to handle deferred save/restore of SPSR_EL1

2018-02-21 Thread Marc Zyngier
On Thu, 15 Feb 2018 21:03:18 +,
Christoffer Dall wrote:
> 
> SPSR_EL1 is not used by a VHE host kernel and can be deferred, but we
> need to rework the accesses to this register to access the latest value
> depending on whether or not guest system registers are loaded on the CPU
> or only reside in memory.
> 
> The handling of accessing the various banked SPSRs for 32-bit VMs is a
> bit clunky, but this will be improved in following patches which will
> first prepare and subsequently implement deferred save/restore of the
> 32-bit registers, including the 32-bit SPSRs.
> 
> Signed-off-by: Christoffer Dall 
> ---
> 
> Notes:
> Changes since v2:
>  - New patch (deferred register handling has been reworked)
> 
>  arch/arm/include/asm/kvm_emulate.h   | 12 ++-
>  arch/arm/kvm/emulate.c   |  2 +-
>  arch/arm64/include/asm/kvm_emulate.h | 41 
> +++-
>  arch/arm64/kvm/inject_fault.c|  4 ++--
>  virt/kvm/arm/aarch32.c   |  2 +-
>  5 files changed, 51 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/arm/include/asm/kvm_emulate.h 
> b/arch/arm/include/asm/kvm_emulate.h
> index e27caa4b47a1..6493bd479ddc 100644
> --- a/arch/arm/include/asm/kvm_emulate.h
> +++ b/arch/arm/include/asm/kvm_emulate.h
> @@ -41,7 +41,17 @@ static inline unsigned long *vcpu_reg32(struct kvm_vcpu 
> *vcpu, u8 reg_num)
>   return vcpu_reg(vcpu, reg_num);
>  }
>  
> -unsigned long *vcpu_spsr(struct kvm_vcpu *vcpu);
> +unsigned long *__vcpu_spsr(struct kvm_vcpu *vcpu);
> +
> +static inline unsigned long vcpu_read_spsr(struct kvm_vcpu *vcpu)
> +{
> + return *__vcpu_spsr(vcpu);
> +}
> +
> +static inline void vcpu_write_spsr(struct kvm_vcpu *vcpu, unsigned long v)
> +{
> + *__vcpu_spsr(vcpu) = v;
> +}
>  
>  static inline unsigned long vcpu_get_reg(struct kvm_vcpu *vcpu,
>u8 reg_num)
> diff --git a/arch/arm/kvm/emulate.c b/arch/arm/kvm/emulate.c
> index fa501bf437f3..9046b53d87c1 100644
> --- a/arch/arm/kvm/emulate.c
> +++ b/arch/arm/kvm/emulate.c
> @@ -142,7 +142,7 @@ unsigned long *vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num)
>  /*
>   * Return the SPSR for the current mode of the virtual CPU.
>   */
> -unsigned long *vcpu_spsr(struct kvm_vcpu *vcpu)
> +unsigned long *__vcpu_spsr(struct kvm_vcpu *vcpu)
>  {
>   unsigned long mode = *vcpu_cpsr(vcpu) & MODE_MASK;
>   switch (mode) {
> diff --git a/arch/arm64/include/asm/kvm_emulate.h 
> b/arch/arm64/include/asm/kvm_emulate.h
> index d313aaae5c38..47c2406755fa 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -26,6 +26,7 @@
>  
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -143,13 +144,43 @@ static inline void vcpu_set_reg(struct kvm_vcpu *vcpu, 
> u8 reg_num,
>   vcpu_gp_regs(vcpu)->regs.regs[reg_num] = val;
>  }
>  
> -/* Get vcpu SPSR for current mode */
> -static inline unsigned long *vcpu_spsr(const struct kvm_vcpu *vcpu)
> +static inline unsigned long vcpu_read_spsr(const struct kvm_vcpu *vcpu)
>  {
> - if (vcpu_mode_is_32bit(vcpu))
> - return vcpu_spsr32(vcpu);
> + unsigned long *p = (unsigned long *)&vcpu_gp_regs(vcpu)->spsr[KVM_SPSR_EL1];
> +
> + if (vcpu_mode_is_32bit(vcpu)) {
> + unsigned long *p_32bit = vcpu_spsr32(vcpu);
> +
> + /* KVM_SPSR_SVC aliases KVM_SPSR_EL1 */
> + if (p_32bit != (unsigned long *)p)
> + return *p_32bit;

Clunky, you said? ;-) p is already an unsigned long *, so there's no
need to cast it.
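
To illustrate, a minimal sketch of the same hunk with the redundant cast
dropped (names as in the quoted code):

	if (vcpu_mode_is_32bit(vcpu)) {
		unsigned long *p_32bit = vcpu_spsr32(vcpu);

		/* KVM_SPSR_SVC aliases KVM_SPSR_EL1 */
		if (p_32bit != p)
			return *p_32bit;
	}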

> + }
> +
> + if (vcpu->arch.sysregs_loaded_on_cpu)
> + return read_sysreg_el1(spsr);
> + else
> + return *p;
> +}
>  
> - return (unsigned long *)&vcpu_gp_regs(vcpu)->spsr[KVM_SPSR_EL1];
> +static inline void vcpu_write_spsr(const struct kvm_vcpu *vcpu, unsigned 
> long v)
> +{
> + unsigned long *p = (unsigned long *)&vcpu_gp_regs(vcpu)->spsr[KVM_SPSR_EL1];
> +
> + /* KVM_SPSR_SVC aliases KVM_SPSR_EL1 */
> + if (vcpu_mode_is_32bit(vcpu)) {
> + unsigned long *p_32bit = vcpu_spsr32(vcpu);
> +
> + /* KVM_SPSR_SVC aliases KVM_SPSR_EL1 */
> + if (p_32bit != (unsigned long *)p) {
> + *p_32bit = v;
> + return;

Same remark.

> + }
> + }
> +
> + if (vcpu->arch.sysregs_loaded_on_cpu)
> + write_sysreg_el1(v, spsr);
> + else
> + *p = v;
>  }
>  
>  static inline bool vcpu_mode_priv(const struct kvm_vcpu *vcpu)
> diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
> index e08db2f2dd75..8dda1edae727 100644
> --- a/arch/arm64/kvm/inject_fault.c
> +++ b/arch/arm64/kvm/inject_fault.c
> @@ -71,7 +71,7 @@ static void inject_abt64(struct kvm_vcpu *vcpu, bool 
> is_iabt, unsigned long addr
>   *vcpu_pc(vcpu) = get_except_vector(vcpu, except_type_sync);
>  

Re: [PATCH v4 25/40] KVM: arm64: Introduce framework for accessing deferred sysregs

2018-02-21 Thread Marc Zyngier
On Thu, 15 Feb 2018 21:03:17 +,
Christoffer Dall wrote:
> 
> We are about to defer saving and restoring some groups of system
> registers to vcpu_put and vcpu_load on supported systems.  This means
> that we need some infrastructure to access system registers which
> supports either accessing the memory backing of the register or directly
> accessing the system registers, depending on the state of the system
> when we access the register.
> 
> We do this by defining read/write accessor functions, which can handle
> both "immediate" and "deferrable" system registers.  Immediate registers
> are always saved/restored in the world-switch path, but deferrable
> registers are only saved/restored in vcpu_put/vcpu_load when supported
> and sysregs_loaded_on_cpu will be set in that case.
> 
> Note that we don't use the deferred mechanism yet in this patch, but only
> introduce infrastructure.  This is to improve convenience of review in
> the subsequent patches where it is clear which registers become
> deferred.
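
As a rough sketch of the accessor shape being described (the series
implements this per register; sysregs_loaded_on_cpu and __vcpu_sys_reg()
come from these patches, while read_deferred_sysreg() is a hypothetical
stand-in for the per-register hardware access):

	static u64 vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
	{
		/* deferred register loaded on the CPU: hardware is current */
		if (vcpu->arch.sysregs_loaded_on_cpu)
			return read_deferred_sysreg(reg); /* hypothetical */

		/* otherwise the memory backing holds the latest value */
		return __vcpu_sys_reg(vcpu, reg);
	}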
> 
> Signed-off-by: Christoffer Dall 

Reviewed-by: Marc Zyngier 

 M.

-- 
Jazz is not dead, it just smell funny.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v3] arm64: Add support for new control bits CTR_EL0.DIC and CTR_EL0.IDC

2018-02-21 Thread Shanker Donthineni
The DCache clean & ICache invalidation requirements for instructions
to be data coherent are discoverable through new fields in CTR_EL0.
The following two control bits DIC and IDC were defined for this
purpose. No need to perform point of unification cache maintenance
operations from software on systems where CPU caches are transparent.

This patch optimizes the three functions __flush_cache_user_range(),
clean_dcache_area_pou() and invalidate_icache_range() if the hardware
reports CTR_EL0.IDC and/or CTR_EL0.DIC. Basically it skips the two
instructions 'DC CVAU' and 'IC IVAU', and the associated loop logic
in order to avoid the unnecessary overhead.

CTR_EL0.DIC: Instruction cache invalidation requirements for
 instruction to data coherence. The meaning of this bit[29].
  0: Instruction cache invalidation to the point of unification
 is required for instruction to data coherence.
  1: Instruction cache invalidation to the point of unification is
  not required for instruction to data coherence.

CTR_EL0.IDC: Data cache clean requirements for instruction to data
 coherence. The meaning of this bit[28].
  0: Data cache clean to the point of unification is required for
 instruction to data coherence, unless CLIDR_EL1.LoC == 0b000
 or (CLIDR_EL1.LoUIS == 0b000 && CLIDR_EL1.LoUU == 0b000).
  1: Data cache clean to the point of unification is not required
 for instruction to data coherence.
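
As a rough illustration of how these two bits are consumed at runtime (a
sketch only; CTR_IDC_SHIFT, CTR_DIC_SHIFT and read_sanitised_ftr_reg() are
the definitions and helper used in the patch below):

	u64 ctr = read_sanitised_ftr_reg(SYS_CTR_EL0);

	/* IDC (bit 28): 'DC CVAU' is not needed for I/D coherence */
	bool idc = ctr & (1UL << CTR_IDC_SHIFT);

	/* DIC (bit 29): 'IC IVAU' is not needed for I/D coherence */
	bool dic = ctr & (1UL << CTR_DIC_SHIFT);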

Signed-off-by: Philip Elcan 
Signed-off-by: Shanker Donthineni 
---
Changes since v2:
  -Included barriers, DSB/ISB with DIC set, and DSB with IDC set.
  -Single Kconfig option.

Changes since v1:
  -Reworded commit text.
  -Used the alternatives framework as Catalin suggested.
  -Rebased on top of https://patchwork.kernel.org/patch/10227927/

 arch/arm64/Kconfig   | 12 
 arch/arm64/include/asm/cache.h   |  5 +
 arch/arm64/include/asm/cpucaps.h |  4 +++-
 arch/arm64/kernel/cpufeature.c   | 40 ++--
 arch/arm64/mm/cache.S| 21 +++--
 5 files changed, 73 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index f55fe5b..82b8053 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1095,6 +1095,18 @@ config ARM64_RAS_EXTN
  and access the new registers if the system supports the extension.
  Platform RAS features may additionally depend on firmware support.
 
+config ARM64_SKIP_CACHE_POU
+   bool "Enable support to skip cache POU operations"
+   default y
+   help
+ Explicit point of unification cache operations can be eliminated
+ in software if the hardware handles them transparently. The new
+ CTR_EL0 bits, CTR_EL0.DIC and CTR_EL0.IDC, indicate the hardware
+ capabilities of ICache and DCache POU requirements.
+
+ Selecting this feature will allow the kernel to optimize the POU
+ cache maintenance operations where it requires 'D{I}C C{I}VAU'.
+
 endmenu
 
 config ARM64_SVE
diff --git a/arch/arm64/include/asm/cache.h b/arch/arm64/include/asm/cache.h
index ea9bb4e..e22178b 100644
--- a/arch/arm64/include/asm/cache.h
+++ b/arch/arm64/include/asm/cache.h
@@ -20,8 +20,13 @@
 
 #define CTR_L1IP_SHIFT 14
 #define CTR_L1IP_MASK  3
+#define CTR_DMLINE_SHIFT   16
+#define CTR_ERG_SHIFT  20
 #define CTR_CWG_SHIFT  24
 #define CTR_CWG_MASK   15
+#define CTR_IDC_SHIFT  28
+#define CTR_DIC_SHIFT  29
+#define CTR_B31_SHIFT  31
 
 #define CTR_L1IP(ctr)  (((ctr) >> CTR_L1IP_SHIFT) & CTR_L1IP_MASK)
 
diff --git a/arch/arm64/include/asm/cpucaps.h b/arch/arm64/include/asm/cpucaps.h
index bb26382..8dd42ae 100644
--- a/arch/arm64/include/asm/cpucaps.h
+++ b/arch/arm64/include/asm/cpucaps.h
@@ -45,7 +45,9 @@
 #define ARM64_HARDEN_BRANCH_PREDICTOR  24
 #define ARM64_HARDEN_BP_POST_GUEST_EXIT 25
 #define ARM64_HAS_RAS_EXTN             26
+#define ARM64_HAS_CACHE_IDC            27
+#define ARM64_HAS_CACHE_DIC            28
 
-#define ARM64_NCAPS                    27
+#define ARM64_NCAPS                    29
 
 #endif /* __ASM_CPUCAPS_H */
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index ff8a6e9..12e100a 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -199,12 +199,12 @@ static int __init register_cpu_hwcaps_dumper(void)
 };
 
 static const struct arm64_ftr_bits ftr_ctr[] = {
-   ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_EXACT, 31, 1, 1),        /* RES1 */
-   ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, 29, 1, 1),   /* DIC */
-   ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, 28, 1, 1),   /* IDC */
-   ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_HIGHER_SAFE, 24, 4, 0),  /* CWG */
-   ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, 

Re: [PATCH v4 24/40] KVM: arm64: Rewrite system register accessors to read/write functions

2018-02-21 Thread Marc Zyngier
On Thu, 15 Feb 2018 21:03:16 +,
Christoffer Dall wrote:
> 
> From: Christoffer Dall 
> 
> Currently we access the system registers array via the vcpu_sys_reg()
> macro.  However, we are about to change the behavior to sometimes
> modify the register file directly, so let's change this to two
> primitives:
> 
>  * Accessor macros vcpu_write_sys_reg() and vcpu_read_sys_reg()
>  * Direct array access macro __vcpu_sys_reg()
> 
> The first primitive should be used in places where the code needs to
> access the currently loaded VCPU's state as observed by the guest.  For
> example, when trapping on cache related registers, a write to a system
> register should go directly to the VCPU version of the register.
> 
> The second primitive can be used in places where the VCPU is known to
> never be running (for example userspace access) or for registers which
> are never context switched (for example all the PMU system registers).
> 
> This rewrites all users of vcpu_sys_regs to one of the two primitives
> above.
> 
> No functional change.
> 
> Signed-off-by: Christoffer Dall 
> ---
> 
> Notes:
> Changes since v2:
>  - New patch (deferred register handling has been reworked)
> 
>  arch/arm64/include/asm/kvm_emulate.h | 13 ---
>  arch/arm64/include/asm/kvm_host.h| 13 ++-
>  arch/arm64/include/asm/kvm_mmu.h |  2 +-
>  arch/arm64/kvm/debug.c   | 27 +-
>  arch/arm64/kvm/inject_fault.c|  8 ++--
>  arch/arm64/kvm/sys_regs.c| 71 
> ++--
>  arch/arm64/kvm/sys_regs.h|  4 +-
>  arch/arm64/kvm/sys_regs_generic_v8.c |  4 +-
>  virt/kvm/arm/pmu.c   | 37 ++-
>  9 files changed, 102 insertions(+), 77 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_emulate.h 
> b/arch/arm64/include/asm/kvm_emulate.h
> index 3cc535591bdf..d313aaae5c38 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -290,15 +290,18 @@ static inline int kvm_vcpu_sys_get_rt(struct kvm_vcpu 
> *vcpu)
>  
>  static inline unsigned long kvm_vcpu_get_mpidr_aff(struct kvm_vcpu *vcpu)
>  {
> - return vcpu_sys_reg(vcpu, MPIDR_EL1) & MPIDR_HWID_BITMASK;
> + return vcpu_read_sys_reg(vcpu, MPIDR_EL1) & MPIDR_HWID_BITMASK;
>  }
>  
>  static inline void kvm_vcpu_set_be(struct kvm_vcpu *vcpu)
>  {
> - if (vcpu_mode_is_32bit(vcpu))
> + if (vcpu_mode_is_32bit(vcpu)) {
>   *vcpu_cpsr(vcpu) |= COMPAT_PSR_E_BIT;
> - else
> - vcpu_sys_reg(vcpu, SCTLR_EL1) |= (1 << 25);
> + } else {
> + u64 sctlr = vcpu_read_sys_reg(vcpu, SCTLR_EL1);
> + sctlr |= (1 << 25);
> + vcpu_write_sys_reg(vcpu, SCTLR_EL1, sctlr);

General comment: it is slightly annoying that vcpu_write_sys_reg takes
its parameters in an order different from that of write_sysreg
(register followed by value, instead of value followed by
register). Not a deal breaker, but slightly confusing.
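
Concretely, with the argument orders taken from the quoted macro and the
kernel's write_sysreg() (sctlr_el1 is just an example register):

	write_sysreg(val, sctlr_el1);              /* value, then register */
	vcpu_write_sys_reg(vcpu, SCTLR_EL1, val);  /* register, then value */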

> + }
>  }
>  
>  static inline bool kvm_vcpu_is_be(struct kvm_vcpu *vcpu)
> @@ -306,7 +309,7 @@ static inline bool kvm_vcpu_is_be(struct kvm_vcpu *vcpu)
>   if (vcpu_mode_is_32bit(vcpu))
>   return !!(*vcpu_cpsr(vcpu) & COMPAT_PSR_E_BIT);
>  
> - return !!(vcpu_sys_reg(vcpu, SCTLR_EL1) & (1 << 25));
> + return !!(vcpu_read_sys_reg(vcpu, SCTLR_EL1) & (1 << 25));
>  }
>  
>  static inline unsigned long vcpu_data_guest_to_host(struct kvm_vcpu *vcpu,
> diff --git a/arch/arm64/include/asm/kvm_host.h 
> b/arch/arm64/include/asm/kvm_host.h
> index f2a6f39aec87..68398bf7882f 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -287,7 +287,18 @@ struct kvm_vcpu_arch {
>  };
>  
>  #define vcpu_gp_regs(v)  (&(v)->arch.ctxt.gp_regs)
> -#define vcpu_sys_reg(v,r) ((v)->arch.ctxt.sys_regs[(r)])
> +
> +/*
> + * Only use __vcpu_sys_reg if you know you want the memory backed version
> + * of a register, and not the one most recently accessed by a running
> + * VCPU.  For example, for userspace access or for system registers that
> + * are never context switched, but only emulated.
> + */
> +#define __vcpu_sys_reg(v,r)  ((v)->arch.ctxt.sys_regs[(r)])
> +
> +#define vcpu_read_sys_reg(v,r)   __vcpu_sys_reg(v,r)
> +#define vcpu_write_sys_reg(v,r,n) do { __vcpu_sys_reg(v,r) = n; } while (0)
> +
>  /*
>   * CP14 and CP15 live in the same array, as they are backed by the
>   * same system registers.
> diff --git a/arch/arm64/include/asm/kvm_mmu.h 
> b/arch/arm64/include/asm/kvm_mmu.h
> index 9679067a1574..95f46e73c4dc 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -249,7 +249,7 @@ struct kvm;
>  
>  static inline bool vcpu_has_cache_enabled(struct kvm_vcpu *vcpu)
>  {
> - return (vcpu_sys_reg(vcpu, SCTLR_EL1) & 0b101) == 0b101;
> + return (vcpu_read_sys_reg(vcpu, 

Re: [PATCH v2] arm64: Add support for new control bits CTR_EL0.DIC and CTR_EL0.IDC

2018-02-21 Thread Will Deacon
On Wed, Feb 21, 2018 at 07:10:34AM -0600, Shanker Donthineni wrote:
> On 02/21/2018 05:12 AM, Catalin Marinas wrote:
> > However, my worry is that in an implementation with DIC set, we also
> > skip the DSB/ISB sequence in the invalidate_icache_by_line macro. For
> > example, in an implementation with transparent PoU, we could have:
> > 
> > str <insn>, [addr]
> > // no cache maintenance or barrier
> > br <addr>
> > 
> 
> Thanks for pointing out the missing barriers. I think it makes sense to follow
> the existing barrier semantics in order to avoid any unknown behaviour.
>  
> > Is an ISB required between the instruction store and execution? I would
> > say yes but maybe Will has a better opinion here.
> > 
> Agree, an ISB is required, especially for self-modifying code. I'll include
> it in the v3 patch.

I'd have thought you'd need a DSB too, before the ISB.

Will
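
For reference, a minimal sketch of the ordering being discussed, assuming
DIC/IDC only remove the cache maintenance instructions, not the need for
barriers (plain inline asm rather than the kernel's helpers; addr and insn
are placeholders):

	*(u32 *)addr = insn;                  /* store the new instruction */
	asm volatile("dsb ish" ::: "memory"); /* complete the store ... */
	asm volatile("isb" ::: "memory");     /* ... then resync the fetch */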
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v2] arm64: Add support for new control bits CTR_EL0.DIC and CTR_EL0.IDC

2018-02-21 Thread Shanker Donthineni
Hi Catalin,

On 02/21/2018 05:12 AM, Catalin Marinas wrote:
> On Mon, Feb 19, 2018 at 08:59:06PM -0600, Shanker Donthineni wrote:
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index f55fe5b..4061210 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -1095,6 +1095,27 @@ config ARM64_RAS_EXTN
>>and access the new registers if the system supports the extension.
>>Platform RAS features may additionally depend on firmware support.
>>  
>> +config ARM64_CACHE_IDC
>> +bool "Enable support for DCache clean PoU optimization"
>> +default y
>> +help
>> +  The data cache clean to the point of unification is not required
>> +  for instructions to be data coherent if CTR_EL0.IDC has value 1.
>> +
>> +  Selecting this feature will allow the kernel to optimize the POU
>> +  cache maintenance operations where it requires 'DC CVAU'.
>> +
>> +config ARM64_CACHE_DIC
>> +bool "Enable support for ICache invalidation PoU optimization"
>> +default y
>> +help
>> +  Instruction cache invalidation to the point of unification is not
>> +  required for instructions to be data coherent if CTR_EL0.DIC has
>> +  value 1.
>> +
>> +  Selecting this feature will allow the kernel to optimize the POU
>> +  cache maintenance operations where it requires 'IC IVAU'.
> 
> A single Kconfig entry is sufficient for both features.
> 

I'll do in v3 patch.

>> @@ -864,6 +864,22 @@ static bool has_no_fpsimd(const struct 
>> arm64_cpu_capabilities *entry, int __unus
>>  ID_AA64PFR0_FP_SHIFT) < 0;
>>  }
>>  
>> +#ifdef CONFIG_ARM64_CACHE_IDC
>> +static bool has_cache_idc(const struct arm64_cpu_capabilities *entry,
>> +  int __unused)
>> +{
>> +return !!(read_sanitised_ftr_reg(SYS_CTR_EL0) & (1UL << CTR_IDC_SHIFT));
>> +}
>> +#endif
>> +
>> +#ifdef CONFIG_ARM64_CACHE_DIC
>> +static bool has_cache_dic(const struct arm64_cpu_capabilities *entry,
>> +  int __unused)
>> +{
>> +return !!(read_sanitised_ftr_reg(SYS_CTR_EL0) & (1UL << CTR_DIC_SHIFT));
>> +}
>> +#endif
> 
> Nitpick: no need for !! since the function type is bool already.
> 

Sure, I'll remove '!!'.

>> diff --git a/arch/arm64/mm/cache.S b/arch/arm64/mm/cache.S
>> index 758bde7..7d37d71 100644
>> --- a/arch/arm64/mm/cache.S
>> +++ b/arch/arm64/mm/cache.S
>> @@ -50,6 +50,9 @@ ENTRY(flush_icache_range)
>>   */
>>  ENTRY(__flush_cache_user_range)
>>  uaccess_ttbr0_enable x2, x3, x4
>> +alternative_if ARM64_HAS_CACHE_IDC
>> +b   8f
>> +alternative_else_nop_endif
>>  dcache_line_size x2, x3
>>  sub x3, x2, #1
>>  bic x4, x0, x3
>> @@ -60,6 +63,11 @@ user_alt 9f, "dc cvau, x4",  "dc civac, x4",  ARM64_WORKAROUND_CLEAN_CACHE
>>  b.lo1b
>>  dsb ish
>>  
>> +8:
>> +alternative_if ARM64_HAS_CACHE_DIC
>> +mov x0, #0
>> +b   1f
>> +alternative_else_nop_endif
>>  invalidate_icache_by_line x0, x1, x2, x3, 9f
>>  mov x0, #0
>>  1:
> 
> You can add another label at mov x0, #0 below this hunk and keep a
> single instruction in the alternative path.
> 
> However, my worry is that in an implementation with DIC set, we also
> skip the DSB/ISB sequence in the invalidate_icache_by_line macro. For
> example, in an implementation with transparent PoU, we could have:
> 
>   str <insn>, [addr]
>   // no cache maintenance or barrier
>   br <addr>
> 

Thanks for pointing out the missing barriers. I think it makes sense to follow
the existing barrier semantics in order to avoid any unknown behaviour.
 
> Is an ISB required between the instruction store and execution? I would
> say yes but maybe Will has a better opinion here.
> 
Agree, an ISB is required, especially for self-modifying code. I'll include
it in the v3 patch.

-- 
Shanker Donthineni
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v4 17/40] KVM: arm64: Move userspace system registers into separate function

2018-02-21 Thread Marc Zyngier
On Thu, 15 Feb 2018 21:03:09 +,
Christoffer Dall wrote:
> 
> There's a semantic difference between the EL1 registers that control
> operation of a kernel running in EL1 and EL1 registers that only control
> userspace execution in EL0.  Since we can defer saving/restoring the
> latter, move them into their own function.
> 
> ACTLR_EL1 is not used by a VHE host, so we can move this register into
> the EL1 state which is not saved/restored for a VHE host.

Nit: maybe worth adding a reference to the D10.2.1 comment in the
ARMv8 ARM that indicates the "recommended" behaviour of this register?

> 
> We also take this chance to rename the function saving/restoring the
> remaining system registers to make it clear this function deals with
> the EL1 system registers.
> 
> Reviewed-by: Andrew Jones 
> Signed-off-by: Christoffer Dall 

Reviewed-by: Marc Zyngier 

M.

-- 
Jazz is not dead, it just smell funny.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v4 08/40] KVM: arm/arm64: Introduce vcpu_el1_is_32bit

2018-02-21 Thread Marc Zyngier
On Thu, 15 Feb 2018 21:03:00 +,
Christoffer Dall wrote:
> 
> We have numerous checks around that check whether HCR_EL2 has the RW bit
> set to figure out if we're running an AArch64 or AArch32 VM.  In some
> cases, directly checking the RW bit (given its unintuitive name) is a
> bit confusing, and that's not going to improve as we move logic around
> for the following patches that optimize KVM on AArch64 hosts with VHE.
> 
> Therefore, introduce a helper, vcpu_el1_is_32bit, and replace existing
> direct checks of HCR_EL2.RW with the helper.
> 
> Reviewed-by: Julien Grall 
> Reviewed-by: Julien Thierry 
> Signed-off-by: Christoffer Dall 
> ---
> 
> Notes:
> Changes since v2:
>  - New patch
> 
> Changes since v1:
>  - Reworded comments as suggested by Drew
> 
>  arch/arm64/include/asm/kvm_emulate.h |  7 ++-
>  arch/arm64/kvm/hyp/switch.c  | 11 +--
>  arch/arm64/kvm/hyp/sysreg-sr.c   |  5 +++--
>  arch/arm64/kvm/inject_fault.c|  6 +++---
>  4 files changed, 17 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_emulate.h 
> b/arch/arm64/include/asm/kvm_emulate.h
> index 9ee316b962c8..3cc535591bdf 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -45,6 +45,11 @@ void kvm_inject_undef32(struct kvm_vcpu *vcpu);
>  void kvm_inject_dabt32(struct kvm_vcpu *vcpu, unsigned long addr);
>  void kvm_inject_pabt32(struct kvm_vcpu *vcpu, unsigned long addr);
>  
> +static inline bool vcpu_el1_is_32bit(struct kvm_vcpu *vcpu)
> +{
> + return !(vcpu->arch.hcr_el2 & HCR_RW);
> +}
> +
>  static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
>  {
>   vcpu->arch.hcr_el2 = HCR_GUEST_FLAGS;
> @@ -65,7 +70,7 @@ static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
>* For now this is conditional, since no AArch32 feature regs
>* are currently virtualised.
>*/
> - if (vcpu->arch.hcr_el2 & HCR_RW)
> + if (!vcpu_el1_is_32bit(vcpu))
>   vcpu->arch.hcr_el2 |= HCR_TID3;
>  }
>  
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index b51638490d85..fbab9752a9f4 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -74,7 +74,7 @@ static hyp_alternate_select(__activate_traps_arch,
>  
>  static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
>  {
> - u64 val;
> + u64 hcr = vcpu->arch.hcr_el2;
>  
>   /*
>* We are about to set CPTR_EL2.TFP to trap all floating point
> @@ -85,17 +85,16 @@ static void __hyp_text __activate_traps(struct kvm_vcpu 
> *vcpu)
>* If FP/ASIMD is not implemented, FPEXC is UNDEFINED and any access to
>* it will cause an exception.
>*/
> - val = vcpu->arch.hcr_el2;
> -
> - if (!(val & HCR_RW) && system_supports_fpsimd()) {
> + if (vcpu_el1_is_32bit(vcpu) && system_supports_fpsimd()) {
>   write_sysreg(1 << 30, fpexc32_el2);
>   isb();
>   }
> - write_sysreg(val, hcr_el2);
>  
> - if (cpus_have_const_cap(ARM64_HAS_RAS_EXTN) && (val & HCR_VSE))
> + if (cpus_have_const_cap(ARM64_HAS_RAS_EXTN) && (hcr & HCR_VSE))
>   write_sysreg_s(vcpu->arch.vsesr_el2, SYS_VSESR_EL2);
>  
> + write_sysreg(hcr, hcr_el2);
> +
>   /* Trap on AArch32 cp15 c15 accesses (EL1 or EL0) */
>   write_sysreg(1 << 15, hstr_el2);
>   /*
> diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
> index 434f0fc9cfb3..99fc60516103 100644
> --- a/arch/arm64/kvm/hyp/sysreg-sr.c
> +++ b/arch/arm64/kvm/hyp/sysreg-sr.c
> @@ -19,6 +19,7 @@
>  #include 
>  
>  #include 
> +#include 
>  #include 
>  
>  /* Yes, this does nothing, on purpose */
> @@ -147,7 +148,7 @@ void __hyp_text __sysreg32_save_state(struct kvm_vcpu 
> *vcpu)
>  {
>   u64 *spsr, *sysreg;
>  
> - if (read_sysreg(hcr_el2) & HCR_RW)
> + if (!vcpu_el1_is_32bit(vcpu))
>   return;
>  
>   spsr = vcpu->arch.ctxt.gp_regs.spsr;
> @@ -172,7 +173,7 @@ void __hyp_text __sysreg32_restore_state(struct kvm_vcpu 
> *vcpu)
>  {
>   u64 *spsr, *sysreg;
>  
> - if (read_sysreg(hcr_el2) & HCR_RW)
> + if (!vcpu_el1_is_32bit(vcpu))
>   return;
>  
>   spsr = vcpu->arch.ctxt.gp_regs.spsr;
> diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
> index c1e179d34e6a..30a3f58cdb7b 100644
> --- a/arch/arm64/kvm/inject_fault.c
> +++ b/arch/arm64/kvm/inject_fault.c
> @@ -128,7 +128,7 @@ static void inject_undef64(struct kvm_vcpu *vcpu)
>   */
>  void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr)
>  {
> - if (!(vcpu->arch.hcr_el2 & HCR_RW))
> + if (vcpu_el1_is_32bit(vcpu))
>   kvm_inject_dabt32(vcpu, addr);
>   else
>   inject_abt64(vcpu, false, addr);
> @@ -144,7 +144,7 @@ void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long 
> 

Re: [PATCH v4 03/40] KVM: arm64: Avoid storing the vcpu pointer on the stack

2018-02-21 Thread Marc Zyngier
On Thu, 15 Feb 2018 21:02:55 +,
Christoffer Dall wrote:
> 
> We already have the percpu area for the host cpu state, which points to
> the VCPU, so there's no need to store the VCPU pointer on the stack on
> every context switch.  We can be a little more clever and just use
> tpidr_el2 for the percpu offset and load the VCPU pointer from the host
> context.
> 
> This does require us to calculate the percpu offset without including
> the offset from the kernel mapping of the percpu array to the linear
> mapping of the array (which is what we store in tpidr_el1), because a
> PC-relative generated address in EL2 is already giving us the hyp alias
> of the linear mapping of a kernel address.  We do this in
> __cpu_init_hyp_mode() by using kvm_ksym_ref().
> 
> This change also requires us to have a scratch register, so we take the
> chance to rearrange some of the el1_sync code to only look at the
> vttbr_el2 to determine if this is a trap from the guest or an HVC from
> the host.  We do add an extra check to call the panic code if the kernel
> is configured with debugging enabled and we saw a trap from the host
> which wasn't an HVC, indicating that we left some EL2 trap configured by
> mistake.
> 
> The code that accesses ESR_EL2 was previously using an alternative to
> use the _EL1 accessor on VHE systems, but this was actually unnecessary
> as the _EL1 accessor aliases the ESR_EL2 register on VHE, and the _EL2
> accessor does the same thing on both systems.
> 
> Cc: Ard Biesheuvel 
> Signed-off-by: Christoffer Dall 
> ---
> 
> Notes:
> Changes since v3:
>  - Reworked the assembly part of the patch after rebasing on v4.16-rc1
>which created a conflict with the variant 2 mitigations.
>  - Removed Marc's reviewed-by due to the rework.
>  - Removed unneeded extern keyword in declaration in header file
> 
> Changes since v1:
>  - Use PC-relative addressing to access per-cpu variables instead of
>using a load from the literal pool.
>  - Remove stale comments as pointed out by Marc
>  - Reworded the commit message as suggested by Drew
> 
>  arch/arm64/include/asm/kvm_asm.h  | 14 ++
>  arch/arm64/include/asm/kvm_host.h | 15 +++
>  arch/arm64/kernel/asm-offsets.c   |  1 +
>  arch/arm64/kvm/hyp/entry.S|  6 +-
>  arch/arm64/kvm/hyp/hyp-entry.S| 31 +--
>  arch/arm64/kvm/hyp/switch.c   |  5 +
>  arch/arm64/kvm/hyp/sysreg-sr.c|  5 +
>  7 files changed, 50 insertions(+), 27 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_asm.h 
> b/arch/arm64/include/asm/kvm_asm.h
> index 24961b732e65..6b626750b0a1 100644
> --- a/arch/arm64/include/asm/kvm_asm.h
> +++ b/arch/arm64/include/asm/kvm_asm.h
> @@ -33,6 +33,7 @@
>  #define KVM_ARM64_DEBUG_DIRTY_SHIFT  0
>  #define KVM_ARM64_DEBUG_DIRTY        (1 << KVM_ARM64_DEBUG_DIRTY_SHIFT)
>  
> +/* Translate a kernel address of @sym into its equivalent linear mapping */
>  #define kvm_ksym_ref(sym)\
>   ({  \
> void *val = &sym; \
> @@ -70,6 +71,19 @@ extern u32 __init_stage2_translation(void);
>  
>  extern void __qcom_hyp_sanitize_btac_predictors(void);
>  
> +#else /* __ASSEMBLY__ */
> +
> +.macro get_host_ctxt reg, tmp
> + adr_l   \reg, kvm_host_cpu_state
> + mrs \tmp, tpidr_el2
> + add \reg, \reg, \tmp
> +.endm
> +
> +.macro get_vcpu vcpu, ctxt
> + ldr \vcpu, [\ctxt, #HOST_CONTEXT_VCPU]
> + kern_hyp_va \vcpu
> +.endm
> +
>  #endif
>  
>  #endif /* __ARM_KVM_ASM_H__ */
> diff --git a/arch/arm64/include/asm/kvm_host.h 
> b/arch/arm64/include/asm/kvm_host.h
> index 596f8e414a4c..618cfee7206a 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -358,10 +358,15 @@ int kvm_perf_teardown(void);
>  
>  struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
>  
> +void __kvm_set_tpidr_el2(u64 tpidr_el2);
> +DECLARE_PER_CPU(kvm_cpu_context_t, kvm_host_cpu_state);
> +
>  static inline void __cpu_init_hyp_mode(phys_addr_t pgd_ptr,
>  unsigned long hyp_stack_ptr,
>  unsigned long vector_ptr)
>  {
> + u64 tpidr_el2;
> +
>   /*
>* Call initialization code, and switch to the full blown HYP code.
>* If the cpucaps haven't been finalized yet, something has gone very
> @@ -370,6 +375,16 @@ static inline void __cpu_init_hyp_mode(phys_addr_t 
> pgd_ptr,
>*/
>   BUG_ON(!static_branch_likely(&arm64_const_caps_ready));
>   __kvm_call_hyp((void *)pgd_ptr, hyp_stack_ptr, vector_ptr);
> +
> + /*
> +  * Calculate the raw per-cpu offset without a translation from the
> +  * kernel's mapping to the linear mapping, and store it in 

Re: [PATCH v2] arm64: Add support for new control bits CTR_EL0.DIC and CTR_EL0.IDC

2018-02-21 Thread Catalin Marinas
On Mon, Feb 19, 2018 at 08:59:06PM -0600, Shanker Donthineni wrote:
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index f55fe5b..4061210 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -1095,6 +1095,27 @@ config ARM64_RAS_EXTN
> and access the new registers if the system supports the extension.
> Platform RAS features may additionally depend on firmware support.
>  
> +config ARM64_CACHE_IDC
> + bool "Enable support for DCache clean PoU optimization"
> + default y
> + help
> +   The data cache clean to the point of unification is not required
> +   for instructions to be data coherent if CTR_EL0.IDC has value 1.
> +
> +   Selecting this feature will allow the kernel to optimize the POU
> +   cache maintenance operations where it requires 'DC CVAU'.
> +
> +config ARM64_CACHE_DIC
> + bool "Enable support for ICache invalidation PoU optimization"
> + default y
> + help
> +   Instruction cache invalidation to the point of unification is not
> +   required for instructions to be data coherent if CTR_EL0.DIC has
> +   value 1.
> +
> +   Selecting this feature will allow the kernel to optimize the POU
> +   cache maintenance operations where it requires 'IC IVAU'.

A single Kconfig entry is sufficient for both features.

> @@ -864,6 +864,22 @@ static bool has_no_fpsimd(const struct 
> arm64_cpu_capabilities *entry, int __unus
>   ID_AA64PFR0_FP_SHIFT) < 0;
>  }
>  
> +#ifdef CONFIG_ARM64_CACHE_IDC
> +static bool has_cache_idc(const struct arm64_cpu_capabilities *entry,
> +   int __unused)
> +{
> + return !!(read_sanitised_ftr_reg(SYS_CTR_EL0) & (1UL << CTR_IDC_SHIFT));
> +}
> +#endif
> +
> +#ifdef CONFIG_ARM64_CACHE_DIC
> +static bool has_cache_dic(const struct arm64_cpu_capabilities *entry,
> +   int __unused)
> +{
> + return !!(read_sanitised_ftr_reg(SYS_CTR_EL0) & (1UL << CTR_DIC_SHIFT));
> +}
> +#endif

Nitpick: no need for !! since the function type is bool already.
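
That is, the helper can return the masked value directly; a minimal sketch
based on the hunk quoted above:

	static bool has_cache_idc(const struct arm64_cpu_capabilities *entry,
				  int __unused)
	{
		return read_sanitised_ftr_reg(SYS_CTR_EL0) & (1UL << CTR_IDC_SHIFT);
	}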

> diff --git a/arch/arm64/mm/cache.S b/arch/arm64/mm/cache.S
> index 758bde7..7d37d71 100644
> --- a/arch/arm64/mm/cache.S
> +++ b/arch/arm64/mm/cache.S
> @@ -50,6 +50,9 @@ ENTRY(flush_icache_range)
>   */
>  ENTRY(__flush_cache_user_range)
>   uaccess_ttbr0_enable x2, x3, x4
> +alternative_if ARM64_HAS_CACHE_IDC
> + b   8f
> +alternative_else_nop_endif
>   dcache_line_size x2, x3
>   sub x3, x2, #1
>   bic x4, x0, x3
> @@ -60,6 +63,11 @@ user_alt 9f, "dc cvau, x4",  "dc civac, x4",  ARM64_WORKAROUND_CLEAN_CACHE
>   b.lo1b
>   dsb ish
>  
> +8:
> +alternative_if ARM64_HAS_CACHE_DIC
> + mov x0, #0
> + b   1f
> +alternative_else_nop_endif
>   invalidate_icache_by_line x0, x1, x2, x3, 9f
>   mov x0, #0
>  1:

You can add another label at mov x0, #0 below this hunk and keep a
single instruction in the alternative path.

However, my worry is that in an implementation with DIC set, we also
skip the DSB/ISB sequence in the invalidate_icache_by_line macro. For
example, in an implementation with transparent PoU, we could have:

str <insn>, [addr]
// no cache maintenance or barrier
br <addr>

Is an ISB required between the instruction store and execution? I would
say yes but maybe Will has a better opinion here.

-- 
Catalin
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v4 02/40] KVM: arm/arm64: Move vcpu_load call after kvm_vcpu_first_run_init

2018-02-21 Thread Marc Zyngier
On Thu, 15 Feb 2018 21:02:54 +,
Christoffer Dall wrote:
> 
> Moving the call to vcpu_load() in kvm_arch_vcpu_ioctl_run() to after
> we've called kvm_vcpu_first_run_init() simplifies some of the vgic code, and
> there is also no need to do vcpu_load() for things such as handling the
> immediate_exit flag.
> 
> Reviewed-by: Julien Grall 
> Signed-off-by: Christoffer Dall 

Reviewed-by: Marc Zyngier 

M.

-- 
Jazz is not dead, it just smell funny.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v4 01/40] KVM: arm/arm64: Avoid vcpu_load for other vcpu ioctls than KVM_RUN

2018-02-21 Thread Marc Zyngier
On Thu, 15 Feb 2018 21:02:53 +,
Christoffer Dall wrote:
> 
> Calling vcpu_load() registers preempt notifiers for this vcpu and calls
> kvm_arch_vcpu_load().  The latter will soon be doing a lot of heavy
> lifting on arm/arm64 and will try to do things such as enabling the
> virtual timer and setting us up to handle interrupts from the timer
> hardware.
> 
> Loading state onto hardware registers and enabling hardware to signal
> interrupts can be problematic when we're not actually about to run the
> VCPU, because it makes it difficult to establish the right context when
> handling interrupts from the timer, and it makes the register access
> code difficult to reason about.
> 
> Luckily, now that we call vcpu_load in each ioctl implementation, we can
> simply remove the call from the non-KVM_RUN vcpu ioctls, and our
> kvm_arch_vcpu_load() is only used for loading vcpu content to the
> physical CPU when we're actually going to run the vcpu.
> 
> Reviewed-by: Julien Grall 
> Signed-off-by: Christoffer Dall 

Reviewed-by: Marc Zyngier 

M.

-- 
Jazz is not dead, it just smell funny.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm