Re: [PATCH] KVM: arm64: Allocate stage-2 pgd pages with GFP_KERNEL_ACCOUNT

2020-10-27 Thread Gavin Shan

Hi Marc,

On 10/27/20 8:27 PM, Marc Zyngier wrote:

On 2020-10-26 23:41, Gavin Shan wrote:

On 10/27/20 1:44 AM, Will Deacon wrote:

For consistency with the rest of the stage-2 page-table page allocations
(performed using a kvm_mmu_memory_cache), ensure that __GFP_ACCOUNT is
included in the GFP flags for the PGD pages.

Cc: Marc Zyngier 
Cc: Quentin Perret 
Signed-off-by: Will Deacon 
---
  arch/arm64/kvm/hyp/pgtable.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)





[...]


Another question is why the page-table pages for hyp mode aren't
allocated with __GFP_ACCOUNT in kvm_pgtable_hyp_init and hyp_map_walker()?


Which user task would you account the hypervisor mappings to? The page tables
used for HYP code and data are definitely not attributable to any task.

The kvm and kvm_vcpu mappings *could* be attributed to a user task, but
the page tables are likely shared with other tasks. So who gets the blame?



As replied to Will, qemu could be put into one cgroup (a memory cgroup
specifically). Without __GFP_ACCOUNT, the memory consumed for the page tables
isn't limited. I think qemu is the owner of the consumed memory in this
case.

Cheers,
Gavin



Re: [PATCH] KVM: arm64: Allocate stage-2 pgd pages with GFP_KERNEL_ACCOUNT

2020-10-27 Thread Gavin Shan

Hi Will,

On 10/27/20 8:27 PM, Will Deacon wrote:

On Tue, Oct 27, 2020 at 10:41:33AM +1100, Gavin Shan wrote:

On 10/27/20 1:44 AM, Will Deacon wrote:

For consistency with the rest of the stage-2 page-table page allocations
(performed using a kvm_mmu_memory_cache), ensure that __GFP_ACCOUNT is
included in the GFP flags for the PGD pages.

Cc: Marc Zyngier 
Cc: Quentin Perret 
Signed-off-by: Will Deacon 
---
   arch/arm64/kvm/hyp/pgtable.c | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)



The patch itself looks good to me:

Reviewed-by: Gavin Shan 

Another question is why the page-table pages for hyp mode aren't
allocated with __GFP_ACCOUNT in kvm_pgtable_hyp_init and hyp_map_walker()?
The page-table pages for host or guest are allocated with GFP_PGTABLE_USER
in alloc_pte_one().

#define GFP_PGTABLE_USER  (GFP_PGTABLE_KERNEL | __GFP_ACCOUNT)
#define GFP_PGTABLE_KERNEL(GFP_KERNEL | __GFP_ZERO)


I think because the guest pages are allocated as a direct result of the VMM,
whereas I tend to think of the hyp page-tables more like kernel page-tables
(which aren't accounted afaik: see GFP_PGTABLE_USER vs GFP_PGTABLE_KERNEL).



Assume qemu is the only userspace counterpart. qemu is the process and could
be put into one cgroup (a memory cgroup specifically). Without __GFP_ACCOUNT,
the memory consumed by the page tables isn't limited by cgroup policies. I'm not
sure if this is exactly what we want, even if it's trivial in terms of the issue
itself and the amount of consumed memory.
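
For reference, a minimal sketch of the kind of change being discussed, assuming
the stage-2 PGD is allocated in kvm_pgtable_stage2_init() with alloc_pages_exact()
(the helper and field names here are assumptions, not the literal diff):

	/*
	 * GFP_KERNEL_ACCOUNT is GFP_KERNEL | __GFP_ACCOUNT, so the PGD pages
	 * get charged to the memory cgroup of the task that triggered the
	 * allocation (e.g. the qemu process), like the rest of the stage-2
	 * page-table pages allocated through the kvm_mmu_memory_cache.
	 */
	pgd_sz = kvm_pgd_pages(pgt->ia_bits, pgt->start_level) * PAGE_SIZE;
	pgt->pgd = alloc_pages_exact(pgd_sz, GFP_KERNEL_ACCOUNT | __GFP_ZERO);
	if (!pgt->pgd)
		return -ENOMEM;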

Cheers,
Gavin



Re: [PATCH 08/11] KVM: arm64: Inject AArch32 exceptions from HYP

2020-10-27 Thread Marc Zyngier

On 2020-10-27 17:41, James Morse wrote:

Hi Marc,

On 26/10/2020 13:34, Marc Zyngier wrote:
Similarly to what has been done for AArch64, move the AArch32 exception
injection to HYP.

In order to not use the regmap selection code at EL2, simplify the code
populating the target mode's LR register by hardcoding the two possible
LR registers (LR_abt in X20, LR_und in X22).



diff --git a/arch/arm64/kvm/hyp/exception.c b/arch/arm64/kvm/hyp/exception.c

index cd6e643639e8..8d1d1bcd9e69 100644
--- a/arch/arm64/kvm/hyp/exception.c
+++ b/arch/arm64/kvm/hyp/exception.c
@@ -57,10 +67,25 @@ static void __vcpu_write_spsr(struct kvm_vcpu *vcpu, u64 val)


+static inline u32 __vcpu_read_cp15(const struct kvm_vcpu *vcpu, int reg)

+{
+   return __vcpu_read_sys_reg(vcpu, reg / 2);
+}


Doesn't this re-implement the issue 3204be4109ad fixed?


I don't think it does. The issue existed when accessing the 32bit shadow,
and we had to pick which side of the 64bit register had our 32bit value.
Here, we directly access the 64bit file, which is safe.

But thinking of it, we may as well change the call sites to directly
use the 64bit enum, rather than playing games (we used to use the
32bit definition for the sake of the defunct 32bit port).
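
For illustration, the call-site change could look like this (c1_SCTLR is
assumed here to be one of the legacy 32bit cp15 indices, defined as twice the
64bit enum value):

|	/* instead of going through the cp15 index and dividing by two ... */
|	sctlr = __vcpu_read_cp15(vcpu, c1_SCTLR);
|	/* ... read the 64bit view directly: */
|	sctlr = __vcpu_read_sys_reg(vcpu, SCTLR_EL1);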




@@ -155,23 +180,189 @@ static void enter_exception64(struct kvm_vcpu *vcpu, unsigned long target_mode,


+static void enter_exception32(struct kvm_vcpu *vcpu, u32 mode, u32 vect_offset)

+{



+   /*
+* Table D1-27 of DDI 0487F.c shows the GPR mapping between
+* AArch32 and AArch64. We only deal with ABT/UND.


(to check I understand: because these are the only two KVM ever injects?)


Yes, that's indeed the reason. I'll try to clarify.





+*/
+   switch(mode) {
+   case PSR_AA32_MODE_ABT:
+   __vcpu_write_spsr_abt(vcpu, host_spsr_to_spsr32(spsr));
+   lr = 20;
break;
+


(two bonus tabs!)



+   case PSR_AA32_MODE_UND:
+   __vcpu_write_spsr_und(vcpu, host_spsr_to_spsr32(spsr));
+   lr = 22;
break;
}> +
+   vcpu_set_reg(vcpu, lr, *vcpu_pc(vcpu) + return_offset);



Can we, abuse, the compat_lr_abt definitions to do something like:

|   u32 return_address = *vcpu_pc(vcpu) + return_offset;
[..]
|   switch(mode) {
|   case PSR_AA32_MODE_ABT:
|   __vcpu_write_spsr_abt(vcpu, host_spsr_to_spsr32(spsr));
|   vcpu_gp_regs(vcpu)->compat_lr_abt = return_address;
|   break;
|   case PSR_AA32_MODE_UND:
|   __vcpu_write_spsr_und(vcpu, host_spsr_to_spsr32(spsr));
|   vcpu_gp_regs(vcpu)->compat_lr_und = return_address;
|   break;

...as someone who has no clue about 32bit, this hides all the worrying
magic-14==magic-22!


Ah, I totally forgot about them (the only use was in the file I delete
two patches later...)!

Thanks,

M.
--
Jazz is not dead. It just smells funny...


Re: [PATCH 07/11] KVM: arm64: Inject AArch64 exceptions from HYP

2020-10-27 Thread Marc Zyngier

Hi James,

On 2020-10-27 17:41, James Morse wrote:

Hi Marc,

On 26/10/2020 13:34, Marc Zyngier wrote:

Move the AArch64 exception injection code from EL1 to HYP, leaving
only the ESR_EL1 updates to EL1. In order to come with the differences


(cope with the differences?)


Yes, much better!

between VHE and nVHE, two sets of system register accessors are provided.


SPSR, ELR, PC and PSTATE are now completely handled in the hypervisor.



diff --git a/arch/arm64/kvm/hyp/exception.c b/arch/arm64/kvm/hyp/exception.c

index 6533a9270850..cd6e643639e8 100644
--- a/arch/arm64/kvm/hyp/exception.c
+++ b/arch/arm64/kvm/hyp/exception.c
@@ -11,7 +11,167 @@
  */

 #include 
+#include 
+#include 
+
+#if defined (__KVM_NVHE_HYPERVISOR__)
+/*
+ * System registers are never loaded on the CPU until we actually
+ * restore them.
+ */
+static inline u64 __vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)

+{
+   return __vcpu_sys_reg(vcpu, reg);
+}
+
+static inline void __vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)

+{
+__vcpu_sys_reg(vcpu, reg) = val;
+}
+
+static void __vcpu_write_spsr(struct kvm_vcpu *vcpu, u64 val)
+{
+   write_sysreg_el1(val, SYS_SPSR);
+}
+#elif defined (__KVM_VHE_HYPERVISOR__)
+/* On VHE, all the registers are already loaded on the CPU */
+static inline u64 __vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)

+{
+   u64 val;



+   if (__vcpu_read_sys_reg_from_cpu(reg, &val))
+   return val;


As has_vhe()'s behaviour changes based on these KVM preprocessor symbols, would:

|   if (has_vhe() && __vcpu_read_sys_reg_from_cpu(reg, &val))
|   return val;

let you do both of these with only one copy of the function?


Indeed that's better. Even better, let's move the has_vhe() into
__vcpu_read_sys_reg_from_cpu(), as that's the only case this is
used for.

Further cleanup could involve a new helper that would gate the
test of vcpu->sysregs_loaded_on_cpu with has_vhe() too, as this
definitely is a VHE-only feature.
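
For illustration, the resulting single copy might look like the following
(sketch only; per the above, the has_vhe() check could equally move into
__vcpu_read_sys_reg_from_cpu() itself):

|   static u64 __vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
|   {
|           u64 val;
|
|           /* Only VHE can have the register loaded on the CPU */
|           if (has_vhe() && __vcpu_read_sys_reg_from_cpu(reg, &val))
|                   return val;
|
|           return __vcpu_sys_reg(vcpu, reg);
|   }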





+   return __vcpu_sys_reg(vcpu, reg);
+}
+
+static inline void __vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)

+{
+   if (__vcpu_write_sys_reg_to_cpu(val, reg))
+   return;
+
+__vcpu_sys_reg(vcpu, reg) = val;
+}




+static void __vcpu_write_spsr(struct kvm_vcpu *vcpu, u64 val)
+{
+   write_sysreg_el1(val, SYS_SPSR);
+}


This one doesn't look like it needs duplicating.


Spot on again, thanks!

M.
--
Jazz is not dead. It just smells funny...


Re: [PATCH 07/11] KVM: arm64: Inject AArch64 exceptions from HYP

2020-10-27 Thread James Morse
Hi Marc,

On 26/10/2020 13:34, Marc Zyngier wrote:
> Move the AArch64 exception injection code from EL1 to HYP, leaving
> only the ESR_EL1 updates to EL1. In order to come with the differences

(cope with the differences?)


> between VHE and nVHE, two sets of system register accessors are provided.
> 
> SPSR, ELR, PC and PSTATE are now completely handled in the hypervisor.


> diff --git a/arch/arm64/kvm/hyp/exception.c b/arch/arm64/kvm/hyp/exception.c
> index 6533a9270850..cd6e643639e8 100644
> --- a/arch/arm64/kvm/hyp/exception.c
> +++ b/arch/arm64/kvm/hyp/exception.c
> @@ -11,7 +11,167 @@
>   */
>  
>  #include 
> +#include 
> +#include 
> +
> +#if defined (__KVM_NVHE_HYPERVISOR__)
> +/*
> + * System registers are never loaded on the CPU until we actually
> + * restore them.
> + */
> +static inline u64 __vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
> +{
> + return __vcpu_sys_reg(vcpu, reg);
> +}
> +
> +static inline void __vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)
> +{
> +  __vcpu_sys_reg(vcpu, reg) = val;
> +}
> +
> +static void __vcpu_write_spsr(struct kvm_vcpu *vcpu, u64 val)
> +{
> + write_sysreg_el1(val, SYS_SPSR);
> +}
> +#elif defined (__KVM_VHE_HYPERVISOR__)
> +/* On VHE, all the registers are already loaded on the CPU */
> +static inline u64 __vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
> +{
> + u64 val;

> + if (__vcpu_read_sys_reg_from_cpu(reg, &val))
> + return val;

As has_vhe()'s behaviour changes based on these KVM preprocessor symbols, would:
|   if (has_vhe() && __vcpu_read_sys_reg_from_cpu(reg, &val))
|   return val;

let you do both of these with only one copy of the function?


> + return __vcpu_sys_reg(vcpu, reg);
> +}
> +
> +static inline void __vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)
> +{
> + if (__vcpu_write_sys_reg_to_cpu(val, reg))
> + return;
> +
> +  __vcpu_sys_reg(vcpu, reg) = val;
> +}


> +static void __vcpu_write_spsr(struct kvm_vcpu *vcpu, u64 val)
> +{
> + write_sysreg_el1(val, SYS_SPSR);
> +}

This one doesn't look like it needs duplicating.


> +#else
> +#error Hypervisor code only!
> +#endif


Thanks,

James


Re: [PATCH 08/11] KVM: arm64: Inject AArch32 exceptions from HYP

2020-10-27 Thread James Morse
Hi Marc,

On 26/10/2020 13:34, Marc Zyngier wrote:
> Similarly to what has been done for AArch64, move the AArch32 exception
> injection to HYP.
> 
> In order to not use the regmap selection code at EL2, simplify the code
> populating the target mode's LR register by hardcoding the two possible
> LR registers (LR_abt in X20, LR_und in X22).


> diff --git a/arch/arm64/kvm/hyp/exception.c b/arch/arm64/kvm/hyp/exception.c
> index cd6e643639e8..8d1d1bcd9e69 100644
> --- a/arch/arm64/kvm/hyp/exception.c
> +++ b/arch/arm64/kvm/hyp/exception.c
> @@ -57,10 +67,25 @@ static void __vcpu_write_spsr(struct kvm_vcpu *vcpu, u64 val)

> +static inline u32 __vcpu_read_cp15(const struct kvm_vcpu *vcpu, int reg)
> +{
> + return __vcpu_read_sys_reg(vcpu, reg / 2);
> +}

Doesn't this re-implement the issue 3204be4109ad fixed?


> @@ -155,23 +180,189 @@ static void enter_exception64(struct kvm_vcpu *vcpu, unsigned long target_mode,

> +static void enter_exception32(struct kvm_vcpu *vcpu, u32 mode, u32 vect_offset)
> +{

> + /*
> +  * Table D1-27 of DDI 0487F.c shows the GPR mapping between
> +  * AArch32 and AArch64. We only deal with ABT/UND.

(to check I understand : because these are the only two KVM ever injects?)


> +  */
> + switch(mode) {
> + case PSR_AA32_MODE_ABT:
> + __vcpu_write_spsr_abt(vcpu, host_spsr_to_spsr32(spsr));
> + lr = 20;
>   break;
> + 

(two bonus tabs!)


> + case PSR_AA32_MODE_UND:
> + __vcpu_write_spsr_und(vcpu, host_spsr_to_spsr32(spsr));
> + lr = 22;
>   break;
>   }> +
> + vcpu_set_reg(vcpu, lr, *vcpu_pc(vcpu) + return_offset);


Can we, abuse, the compat_lr_abt definitions to do something like:

|   u32 return_address = *vcpu_pc(vcpu) + return_offset;
[..]
|   switch(mode) {
|   case PSR_AA32_MODE_ABT:
|   __vcpu_write_spsr_abt(vcpu, host_spsr_to_spsr32(spsr));
|   vcpu_gp_regs(vcpu)->compat_lr_abt = return_address;
|   break;
|   case PSR_AA32_MODE_UND:
|   __vcpu_write_spsr_und(vcpu, host_spsr_to_spsr32(spsr));
|   vcpu_gp_regs(vcpu)->compat_lr_und = return_address;
|   break;

...as someone who has no clue about 32bit, this hides all the worrying 
magic-14==magic-22!



Thanks,

James

> +}


[RFC PATCH v3 03/16] KVM: arm64: Hide SPE from guests

2020-10-27 Thread Alexandru Elisei
When SPE is not implemented, accesses to the SPE registers cause an
undefined exception. KVM advertises the presence of SPE in the
ID_AA64DFR0_EL1 register, but configures MDCR_EL2 to trap accesses to the
registers and injects an undefined exception when that happens.

The architecture doesn't allow trapping access to the PMBIDR_EL1 register,
which means the guest will be able to read it even if SPE is not advertised
in the ID register. However, since it's usually better for a read to
unexpectedly succeed than to cause an exception, let's stop advertising the
presence of SPE to guests to better match how KVM emulates the
architecture.

Signed-off-by: Alexandru Elisei 
---
 arch/arm64/kvm/sys_regs.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index d9117bc56237..aa776c006a2a 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -244,6 +244,12 @@ static bool access_vm_reg(struct kvm_vcpu *vcpu,
return true;
 }
 
+static unsigned int spe_visibility(const struct kvm_vcpu *vcpu,
+  const struct sys_reg_desc *r)
+{
+   return REG_HIDDEN_GUEST | REG_HIDDEN_USER;
+}
+
 static bool access_actlr(struct kvm_vcpu *vcpu,
 struct sys_reg_params *p,
 const struct sys_reg_desc *r)
@@ -1143,6 +1149,8 @@ static u64 read_id_reg(const struct kvm_vcpu *vcpu,
val = cpuid_feature_cap_perfmon_field(val,
ID_AA64DFR0_PMUVER_SHIFT,
ID_AA64DFR0_PMUVER_8_1);
+   /* Don't advertise SPE to guests */
+   val &= ~(0xfUL << ID_AA64DFR0_PMSVER_SHIFT);
} else if (id == SYS_ID_DFR0_EL1) {
/* Limit guests to PMUv3 for ARMv8.1 */
val = cpuid_feature_cap_perfmon_field(val,
@@ -1590,6 +1598,17 @@ static const struct sys_reg_desc sys_reg_descs[] = {
{ SYS_DESC(SYS_FAR_EL1), access_vm_reg, reset_unknown, FAR_EL1 },
{ SYS_DESC(SYS_PAR_EL1), NULL, reset_unknown, PAR_EL1 },
 
+   { SYS_DESC(SYS_PMSCR_EL1), .visibility = spe_visibility },
+   { SYS_DESC(SYS_PMSICR_EL1), .visibility = spe_visibility },
+   { SYS_DESC(SYS_PMSIRR_EL1), .visibility = spe_visibility },
+   { SYS_DESC(SYS_PMSFCR_EL1), .visibility = spe_visibility },
+   { SYS_DESC(SYS_PMSEVFR_EL1), .visibility = spe_visibility },
+   { SYS_DESC(SYS_PMSLATFR_EL1), .visibility = spe_visibility },
+   { SYS_DESC(SYS_PMSIDR_EL1), .visibility = spe_visibility },
+   { SYS_DESC(SYS_PMBLIMITR_EL1), .visibility = spe_visibility },
+   { SYS_DESC(SYS_PMBPTR_EL1), .visibility = spe_visibility },
+   { SYS_DESC(SYS_PMBSR_EL1), .visibility = spe_visibility },
+
{ SYS_DESC(SYS_PMINTENSET_EL1), access_pminten, reset_unknown, 
PMINTENSET_EL1 },
{ SYS_DESC(SYS_PMINTENCLR_EL1), access_pminten, reset_unknown, 
PMINTENSET_EL1 },
 
-- 
2.29.1



[RFC PATCH v3 10/16] KVM: arm64: Add a new VM device control group for SPE

2020-10-27 Thread Alexandru Elisei
Stage 2 faults triggered by the profiling buffer attempting to write to
memory are reported by the SPE hardware by asserting a buffer management
event interrupt. Interrupts are by their nature asynchronous, which means
that the guest might have changed its stage 1 translation tables since the
attempted write. SPE reports the guest virtual address that caused the data
abort, but not the IPA, which means that KVM would have to walk the guest's
stage 1 tables to find the IPA; using the AT instruction to walk the
guest's tables in hardware is not an option because it doesn't report the
IPA in the case of a stage 2 fault on a stage 1 table walk.

Fix both problems by pre-mapping the guest's memory at stage 2 with write
permissions to avoid any faults. Userspace calls mlock() on the VMAs that
back the guest's memory, pinning the pages in memory, then tells KVM to map
the memory at stage 2 by using the VM control group KVM_ARM_VM_SPE_CTRL
with the attribute KVM_ARM_VM_SPE_FINALIZE. KVM will map all writable VMAs
which have the VM_LOCKED flag set. Hugetlb VMAs are practically pinned in
memory after they are faulted in, and mlock() doesn't set the VM_LOCKED
flag for them, it just faults the pages in; KVM will treat hugetlb VMAs as if
they had the VM_LOCKED flag and will also map them, faulting them in if
necessary, when handling the ioctl.

VM live migration relies on a bitmap of dirty pages. This bitmap is created
by write-protecting a memslot and updating it as KVM handles stage 2 write
faults. Because KVM cannot handle stage 2 faults reported by the profiling
buffer, it will not pre-map a logging memslot. This effectively means that
profiling is not available when the VM is configured for live migration.
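
For illustration, a sketch of how a VMM could drive this (the use of
KVM_SET_DEVICE_ATTR on the VM file descriptor and the error handling are
assumptions based on the documentation below, not code from this series):

	/* Pin the guest RAM so its pages stay resident. */
	if (mlock(guest_mem, guest_mem_size))
		err(1, "mlock");

	/* ... load kernel/initrd/DTB into guest memory ... */

	/* Ask KVM to pre-map every writable VM_LOCKED (and hugetlb) VMA at stage 2. */
	struct kvm_device_attr attr = {
		.group	= KVM_ARM_VM_SPE_CTRL,
		.attr	= KVM_ARM_VM_SPE_FINALIZE,
	};
	if (ioctl(vm_fd, KVM_SET_DEVICE_ATTR, &attr))
		err(1, "KVM_ARM_VM_SPE_FINALIZE");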

Signed-off-by: Alexandru Elisei 
---
 Documentation/virt/kvm/devices/vm.rst |  28 +
 arch/arm64/include/asm/kvm_host.h |   5 +
 arch/arm64/include/asm/kvm_mmu.h  |   2 +
 arch/arm64/include/uapi/asm/kvm.h |   3 +
 arch/arm64/kvm/arm.c  |  78 +++-
 arch/arm64/kvm/guest.c|  48 
 arch/arm64/kvm/mmu.c  | 169 ++
 arch/arm64/kvm/spe.c  |  81 
 include/kvm/arm_spe.h |  36 ++
 9 files changed, 448 insertions(+), 2 deletions(-)

diff --git a/Documentation/virt/kvm/devices/vm.rst b/Documentation/virt/kvm/devices/vm.rst
index 0aa5b1cfd700..b70798a72d8a 100644
--- a/Documentation/virt/kvm/devices/vm.rst
+++ b/Documentation/virt/kvm/devices/vm.rst
@@ -314,3 +314,31 @@ Allows userspace to query the status of migration mode.
 if it is enabled
 :Returns:   -EFAULT if the given address is not accessible from kernel space;
0 in case of success.
+
+6. GROUP: KVM_ARM_VM_SPE_CTRL
+===
+
+:Architectures: arm64
+
+6.1. ATTRIBUTE: KVM_ARM_VM_SPE_FINALIZE
+-
+
+Finalizes the creation of the SPE feature by mapping the guest memory in the
+stage 2 table. Guest memory must be readable, writable and pinned in RAM, which
+is achieved with an mlock() system call; the memory can be backed by a 
hugetlbfs
+file. Memory regions from read-only or dirty page logging enabled memslots will
+be ignored. After the call, no changes to the guest memory, including to its
+contents, are permitted.
+
+Subsequent KVM_ARM_VCPU_INIT calls will cause the memory to become unmapped and
+the feature must be finalized again before any VCPU can run.
+
+If any VCPUs are run before finalizing the feature, KVM_RUN will return -EPERM.
+
+:Parameters: none
+:Returns:   -EAGAIN if guest memory has been modified while the call was
+executing
+-EBUSY if the feature is already initialized
+-EFAULT if an address backing the guest memory is invalid
+-ENXIO if SPE is not supported or not properly configured
+0 in case of success
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 5b68c06930c6..27f581750c6e 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -92,6 +92,7 @@ struct kvm_s2_mmu {
 
 struct kvm_arch {
struct kvm_s2_mmu mmu;
+   struct kvm_spe spe;
 
/* VTCR_EL2 value for this VM */
u64vtcr;
@@ -612,6 +613,10 @@ void kvm_arm_vcpu_init_debug(struct kvm_vcpu *vcpu);
 void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
 void kvm_arm_clear_debug(struct kvm_vcpu *vcpu);
 void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu);
+int kvm_arm_vm_arch_set_attr(struct kvm *kvm, struct kvm_device_attr *attr);
+int kvm_arm_vm_arch_get_attr(struct kvm *kvm, struct kvm_device_attr *attr);
+int kvm_arm_vm_arch_has_attr(struct kvm *kvm, struct kvm_device_attr *attr);
+
 int kvm_arm_vcpu_arch_set_attr(struct kvm_vcpu *vcpu,
   struct kvm_device_attr *attr);
 int kvm_arm_vcpu_arch_get_attr(struct kvm_vcpu *vcpu,
diff --git a/arch/arm64/include/asm/kvm_mmu.h 

[RFC PATCH v3 04/16] arm64: Introduce CPU SPE feature

2020-10-27 Thread Alexandru Elisei
Detect Statistical Profiling Extension (SPE) support using the cpufeatures
framework. The presence of SPE is reported via the ARM64_SPE capability.

The feature will be necessary for emulating SPE in KVM, because KVM needs
all CPUs to have SPE hardware to avoid scheduling a VCPU on a CPU without
support. For this reason, the feature type ARM64_CPUCAP_SYSTEM_FEATURE has
been selected to disallow hotplugging a CPU which doesn't support SPE.

Signed-off-by: Alexandru Elisei 
---
 arch/arm64/include/asm/cpucaps.h |  3 ++-
 arch/arm64/kernel/cpufeature.c   | 24 
 2 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/cpucaps.h b/arch/arm64/include/asm/cpucaps.h
index 42868dbd29fd..10fd094d9a5b 100644
--- a/arch/arm64/include/asm/cpucaps.h
+++ b/arch/arm64/include/asm/cpucaps.h
@@ -65,7 +65,8 @@
 #define ARM64_HAS_ARMv8_4_TTL  55
 #define ARM64_HAS_TLB_RANGE56
 #define ARM64_MTE  57
+#define ARM64_SPE  58
 
-#define ARM64_NCAPS58
+#define ARM64_NCAPS59
 
 #endif /* __ASM_CPUCAPS_H */
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index dcc165b3fc04..4a0f4dc53824 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1278,6 +1278,18 @@ has_useable_cnp(const struct arm64_cpu_capabilities *entry, int scope)
return has_cpuid_feature(entry, scope);
 }
 
+static bool __maybe_unused
+has_usable_spe(const struct arm64_cpu_capabilities *entry, int scope)
+{
+   u64 pmbidr;
+
+   if (!has_cpuid_feature(entry, scope))
+   return false;
+
+   pmbidr = read_sysreg_s(SYS_PMBIDR_EL1);
+   return !(pmbidr & BIT(SYS_PMBIDR_EL1_P_SHIFT));
+}
+
 /*
  * This check is triggered during the early boot before the cpufeature
  * is initialised. Checking the status on the local CPU allows the boot
@@ -2003,6 +2015,18 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
.min_field_value = 1,
.cpu_enable = cpu_enable_cnp,
},
+#endif
+#ifdef CONFIG_ARM_SPE_PMU
+   {
+   .desc = "Statistical Profiling Extension (SPE)",
+   .capability = ARM64_SPE,
+   .type = ARM64_CPUCAP_SYSTEM_FEATURE,
+   .matches = has_usable_spe,
+   .sys_reg = SYS_ID_AA64DFR0_EL1,
+   .sign = FTR_UNSIGNED,
+   .field_pos = ID_AA64DFR0_PMSVER_SHIFT,
+   .min_field_value = 1,
+   },
 #endif
{
.desc = "Speculation barrier (SB)",
-- 
2.29.1



[RFC PATCH v3 12/16] KVM: arm64: VHE: Clear MDCR_EL2.E2PB in vcpu_put()

2020-10-27 Thread Alexandru Elisei
From: Sudeep Holla 

On VHE systems, the kernel executes at EL2 and configures the profiling
buffer to use the EL2&0 translation regime and to trap accesses from the
guest by clearing MDCR_EL2.E2PB. In vcpu_put(), KVM does a bitwise or with
the E2PB mask, preserving its value. This has been correct so far, since
MDCR_EL2.E2PB has the same value (0b00) for all VMs.

However, this will change when KVM enables support for SPE in guests. For
such guests KVM will configure the profiling buffer to use the EL1&0
translation regime, a setting that is obviously undesirable to be preserved
for the host running at EL2. Let's avoid this situation by explicitly
clearing E2PB in vcpu_put().

[ Alexandru E: Rebased on top of 5.10-rc1, reworded commit ]

Signed-off-by: Sudeep Holla 
Signed-off-by: Alexandru Elisei 
---
 arch/arm64/kvm/hyp/vhe/switch.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c
index fe69de16dadc..3f4db1fa388b 100644
--- a/arch/arm64/kvm/hyp/vhe/switch.c
+++ b/arch/arm64/kvm/hyp/vhe/switch.c
@@ -97,9 +97,7 @@ void deactivate_traps_vhe_put(void)
 {
u64 mdcr_el2 = read_sysreg(mdcr_el2);
 
-   mdcr_el2 &= MDCR_EL2_HPMN_MASK |
-   MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT |
-   MDCR_EL2_TPMS;
+   mdcr_el2 &= MDCR_EL2_HPMN_MASK | MDCR_EL2_TPMS;
 
write_sysreg(mdcr_el2, mdcr_el2);
 
-- 
2.29.1



[RFC PATCH v3 16/16] Documentation: arm64: Document ARM Neoverse-N1 erratum #1688567

2020-10-27 Thread Alexandru Elisei
According to erratum #1688567, a SPE buffer write that results in an Access
flag fault or Permission fault at stage 2 is reported with an unsupported
PMBSR_EL1.FSC code.

KVM avoids SPE stage 2 faults altogether by requiring userspace to lock the
guest memory in RAM and pre-mapping it in stage 2 before the VM is started.
As a result, KVM is not impacted by this erratum.

Signed-off-by: Alexandru Elisei 
---
 Documentation/arm64/silicon-errata.rst | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/Documentation/arm64/silicon-errata.rst b/Documentation/arm64/silicon-errata.rst
index d3587805de64..1f6c403fd555 100644
--- a/Documentation/arm64/silicon-errata.rst
+++ b/Documentation/arm64/silicon-errata.rst
@@ -96,6 +96,8 @@ stable kernels.
 +----------------+-----------------+-----------------+-----------------------------+
 | ARM            | Neoverse-N1     | #1542419        | ARM64_ERRATUM_1542419       |
 +----------------+-----------------+-----------------+-----------------------------+
+| ARM            | Neoverse-N1     | #1688567        | N/A                         |
++----------------+-----------------+-----------------+-----------------------------+
 | ARM            | MMU-500         | #841119,826419  | N/A                         |
 +----------------+-----------------+-----------------+-----------------------------+
 +----------------+-----------------+-----------------+-----------------------------+
-- 
2.29.1



[RFC PATCH v3 13/16] KVM: arm64: Switch SPE context on VM entry/exit

2020-10-27 Thread Alexandru Elisei
When the host and the guest are using SPE at the same time, KVM will have
to save and restore the proper SPE context on VM entry (save host's,
restore guest's) and on VM exit (save guest's, restore host's).

On systems without VHE, the world switch happens at EL2, while both the
guest and the host execute at EL1, and according to ARM DDI 0487F.b, page
D9-2807, sampling is disabled in this case:

"If the PE takes an exception to an Exception level where the Statistical
Profiling Extension is disabled, no new operations are selected for
sampling."

We still have to disable the buffer before we switch translation regimes
because we don't want the SPE buffer to speculate memory accesses using a
stale buffer pointer.
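
Concretely, the drain boils down to something like the following on the save
path (a sketch; the actual sequence in this patch may differ in detail):

	u64 pmblimitr = read_sysreg_s(SYS_PMBLIMITR_EL1);

	if (pmblimitr & BIT(SYS_PMBLIMITR_EL1_E_SHIFT)) {
		psb_csync();		/* drain buffered profiling records */
		dsb(nsh);
		/* Disable the buffer before the translation regime changes. */
		write_sysreg_s(0, SYS_PMBLIMITR_EL1);
		isb();
	}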

On VHE systems, the world switch happens at EL2, with the host potentially
in the middle of a profiling session, so we also need to explicitly
disable host sampling.

The buffer owning Exception level is determined by MDCR_EL2.E2PB. On
systems with VHE, this is what differs between the guest (which executes at EL1)
and the host (which executes at EL2). The current behavior of perf is to profile
KVM until it drops to the guest at EL1. To preserve this behavior as much
as possible, KVM will defer changing the value of MDCR_EL2 until
__{activate,deactivate}_traps().

For the purposes of emulating the SPE buffer management interrupt, MDCR_EL2
is configured to trap accesses to the buffer control registers; the guest
can access the rest of the registers directly.

Signed-off-by: Alexandru Elisei 
---
 arch/arm64/include/asm/kvm_arm.h|   1 +
 arch/arm64/include/asm/kvm_hyp.h|  28 +-
 arch/arm64/include/asm/sysreg.h |   1 +
 arch/arm64/kvm/debug.c  |  29 +-
 arch/arm64/kvm/hyp/include/hyp/spe-sr.h |  38 
 arch/arm64/kvm/hyp/include/hyp/switch.h |   1 -
 arch/arm64/kvm/hyp/nvhe/Makefile|   1 +
 arch/arm64/kvm/hyp/nvhe/debug-sr.c  |  16 ++-
 arch/arm64/kvm/hyp/nvhe/spe-sr.c|  93 ++
 arch/arm64/kvm/hyp/nvhe/switch.c|  12 +++
 arch/arm64/kvm/hyp/vhe/Makefile |   1 +
 arch/arm64/kvm/hyp/vhe/spe-sr.c | 124 
 arch/arm64/kvm/hyp/vhe/switch.c |  48 -
 arch/arm64/kvm/hyp/vhe/sysreg-sr.c  |   2 +-
 arch/arm64/kvm/spe.c|   3 +
 arch/arm64/kvm/sys_regs.c   |   1 +
 16 files changed, 384 insertions(+), 15 deletions(-)
 create mode 100644 arch/arm64/kvm/hyp/include/hyp/spe-sr.h
 create mode 100644 arch/arm64/kvm/hyp/nvhe/spe-sr.c
 create mode 100644 arch/arm64/kvm/hyp/vhe/spe-sr.c

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 64ce29378467..033980a9b3fc 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -280,6 +280,7 @@
 #define MDCR_EL2_TPMS  (1 << 14)
 #define MDCR_EL2_E2PB_MASK (UL(0x3))
 #define MDCR_EL2_E2PB_SHIFT(UL(12))
+#define MDCR_EL2_E2PB_EL1_TRAP (2 << MDCR_EL2_E2PB_SHIFT);
 #define MDCR_EL2_TDRA  (1 << 11)
 #define MDCR_EL2_TDOSA (1 << 10)
 #define MDCR_EL2_TDA   (1 << 9)
diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index 6b664de5ec1f..4358cba6784a 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -79,6 +79,32 @@ void sysreg_save_guest_state_vhe(struct kvm_cpu_context *ctxt);
 void sysreg_restore_guest_state_vhe(struct kvm_cpu_context *ctxt);
 #endif
 
+#ifdef CONFIG_KVM_ARM_SPE
+#ifdef __KVM_NVHE_HYPERVISOR__
+void __sysreg_save_spe_host_state_nvhe(struct kvm_cpu_context *ctxt);
+void __sysreg_restore_spe_host_state_nvhe(struct kvm_cpu_context *ctxt);
+void __sysreg_save_spe_guest_state_nvhe(struct kvm_vcpu *vcpu);
+void __sysreg_restore_spe_guest_state_nvhe(struct kvm_vcpu *vcpu);
+#else
+void sysreg_save_spe_host_state_vhe(struct kvm_cpu_context *ctxt);
+void sysreg_restore_spe_host_state_vhe(struct kvm_cpu_context *ctxt);
+void sysreg_save_spe_guest_state_vhe(struct kvm_vcpu *vcpu);
+void sysreg_restore_spe_guest_state_vhe(struct kvm_vcpu *vcpu);
+#endif
+#else  /* !CONFIG_KVM_ARM_SPE */
+#ifdef __KVM_NVHE_HYPERVISOR__
+void __sysreg_save_spe_host_state_nvhe(struct kvm_cpu_context *ctxt) {}
+void __sysreg_restore_spe_host_state_nvhe(struct kvm_cpu_context *ctxt) {}
+void __sysreg_save_spe_guest_state_nvhe(struct kvm_vcpu *vcpu) {}
+void __sysreg_restore_spe_guest_state_nvhe(struct kvm_vcpu *vcpu) {}
+#else
+void sysreg_save_spe_host_state_vhe(struct kvm_cpu_context *ctxt) {}
+void sysreg_restore_spe_host_state_vhe(struct kvm_cpu_context *ctxt) {}
+void sysreg_save_spe_guest_state_vhe(struct kvm_vcpu *vcpu) {}
+void sysreg_restore_spe_guest_state_vhe(struct kvm_vcpu *vcpu) {}
+#endif
+#endif /* CONFIG_KVM_ARM_SPE */
+
 void __debug_switch_to_guest(struct kvm_vcpu *vcpu);
 void __debug_switch_to_host(struct kvm_vcpu *vcpu);
 
@@ -87,7 +113,7 @@ void __fpsimd_restore_state(struct user_fpsimd_state *fp_regs);
 
 #ifndef 

[RFC PATCH v3 06/16] KVM: arm64: Introduce SPE primitives

2020-10-27 Thread Alexandru Elisei
KVM SPE emulation depends on the configuration option KVM_ARM_SPE and on
having hardware SPE support on all CPUs. The host driver must be
compiled-in because we need the SPE interrupt to be enabled; it will be
used to kick us out of the guest when the profiling buffer management
interrupt is asserted by the GIC (for example, when the buffer is full).

Add a VCPU flag to inform KVM that the guest has SPE enabled.

It's worth noting that even though the KVM_ARM_SPE config option is gated
by the SPE host driver being compiled-in, we don't actually check that the
driver was loaded successfully when we advertise SPE support for guests.
That's because we can live with the SPE interrupt being disabled. There is
a delay between when the SPE hardware asserts the interrupt and when the
GIC samples the interrupt line and asserts it to the CPU. If the SPE
interrupt is disabled at the GIC level, this delay will be larger, at most
a host timer tick.

Signed-off-by: Alexandru Elisei 
---
 arch/arm64/include/asm/kvm_host.h |  9 +
 arch/arm64/kvm/Kconfig|  8 
 include/kvm/arm_spe.h | 19 +++
 3 files changed, 36 insertions(+)
 create mode 100644 include/kvm/arm_spe.h

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 25d326aecded..43eee197764f 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -406,6 +406,7 @@ struct kvm_vcpu_arch {
 #define KVM_ARM64_GUEST_HAS_SVE(1 << 5) /* SVE exposed to 
guest */
 #define KVM_ARM64_VCPU_SVE_FINALIZED   (1 << 6) /* SVE config completed */
 #define KVM_ARM64_GUEST_HAS_PTRAUTH(1 << 7) /* PTRAUTH exposed to guest */
+#define KVM_ARM64_GUEST_HAS_SPE(1 << 8) /* SPE exposed to 
guest */
 
 #define vcpu_has_sve(vcpu) (system_supports_sve() && \
((vcpu)->arch.flags & KVM_ARM64_GUEST_HAS_SVE))
@@ -419,6 +420,14 @@ struct kvm_vcpu_arch {
 #define vcpu_has_ptrauth(vcpu) false
 #endif
 
+#ifdef CONFIG_KVM_ARM_SPE
+#define vcpu_has_spe(vcpu) \
+   (cpus_have_final_cap(ARM64_SPE) &&  \
+((vcpu)->arch.flags & KVM_ARM64_GUEST_HAS_SPE))
+#else
+#define vcpu_has_spe(vcpu) false
+#endif
+
 #define vcpu_gp_regs(v)(&(v)->arch.ctxt.regs)
 
 /*
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index 043756db8f6e..8b35c0b806a7 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -57,6 +57,14 @@ config KVM_ARM_PMU
  Adds support for a virtual Performance Monitoring Unit (PMU) in
  virtual machines.
 
+config KVM_ARM_SPE
+   bool "Virtual Statistical Profiling Extension (SPE) support"
+   depends on ARM_SPE_PMU
+   default y
+   help
+ Adds support for a virtual Statistical Profiling Extension (SPE) in
+ virtual machines.
+
 endif # KVM
 
 endif # VIRTUALIZATION
diff --git a/include/kvm/arm_spe.h b/include/kvm/arm_spe.h
new file mode 100644
index ..db51ef15bf45
--- /dev/null
+++ b/include/kvm/arm_spe.h
@@ -0,0 +1,19 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 ARM Ltd.
+ */
+
+#ifndef __ASM_ARM_KVM_SPE_H
+#define __ASM_ARM_KVM_SPE_H
+
+#ifdef CONFIG_KVM_ARM_SPE
+static inline bool kvm_arm_supports_spe(void)
+{
+   return cpus_have_final_cap(ARM64_SPE);
+}
+
+#else
+#define kvm_arm_supports_spe() false
+
+#endif /* CONFIG_KVM_ARM_SPE */
+#endif /* __ASM_ARM_KVM_SPE_H */
-- 
2.29.1



[RFC PATCH v3 07/16] KVM: arm64: Define SPE data structure for each VCPU

2020-10-27 Thread Alexandru Elisei
From: Sudeep Holla 

Define basic struct for supporting SPE for guest VCPUs.

[Andrew M: Add irq_level, rename irq to irq_num for kvm_spe ]
[Alexandru E: Reworked patch ]

Signed-off-by: Sudeep Holla 
Signed-off-by: Andrew Murray 
Signed-off-by: Alexandru Elisei 
---
 arch/arm64/include/asm/kvm_host.h | 2 ++
 include/kvm/arm_spe.h | 9 +
 2 files changed, 11 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 43eee197764f..5b68c06930c6 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -35,6 +35,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define KVM_MAX_VCPUS VGIC_V3_MAX_CPUS
 
@@ -329,6 +330,7 @@ struct kvm_vcpu_arch {
struct vgic_cpu vgic_cpu;
struct arch_timer_cpu timer_cpu;
struct kvm_pmu pmu;
+   struct kvm_spe_cpu spe_cpu;
 
/*
 * Anything that is not used directly from assembly code goes
diff --git a/include/kvm/arm_spe.h b/include/kvm/arm_spe.h
index db51ef15bf45..46ec447ed013 100644
--- a/include/kvm/arm_spe.h
+++ b/include/kvm/arm_spe.h
@@ -12,8 +12,17 @@ static inline bool kvm_arm_supports_spe(void)
return cpus_have_final_cap(ARM64_SPE);
 }
 
+struct kvm_spe_cpu {
+   int irq_num;/* Guest visible INTID */
+   bool irq_level; /* 'true' if interrupt is asserted to the VGIC */
+   bool initialized;   /* Feature is initialized on VCPU */
+};
+
 #else
 #define kvm_arm_supports_spe() false
 
+struct kvm_spe_cpu {
+};
+
 #endif /* CONFIG_KVM_ARM_SPE */
 #endif /* __ASM_ARM_KVM_SPE_H */
-- 
2.29.1



[RFC PATCH v3 09/16] KVM: arm64: Use separate function for the mapping size in user_mem_abort()

2020-10-27 Thread Alexandru Elisei
user_mem_abort() is already a long and complex function, let's make it
slightly easier to understand by abstracting the algorithm for choosing the
stage 2 IPA entry size into its own function.

This also makes it possible to reuse the code when guest SPE support will
be added.

Signed-off-by: Alexandru Elisei 
---
 arch/arm64/kvm/mmu.c | 55 ++--
 1 file changed, 33 insertions(+), 22 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 19aacc7d64de..c3c43555490d 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -738,12 +738,43 @@ transparent_hugepage_adjust(struct kvm_memory_slot *memslot,
return PAGE_SIZE;
 }
 
+static short stage2_max_pageshift(struct kvm_memory_slot *memslot,
+ struct vm_area_struct *vma, hva_t hva,
+ bool *force_pte)
+{
+   short pageshift;
+
+   *force_pte = false;
+
+   if (is_vm_hugetlb_page(vma))
+   pageshift = huge_page_shift(hstate_vma(vma));
+   else
+   pageshift = PAGE_SHIFT;
+
+   if (memslot_is_logging(memslot) || (vma->vm_flags & VM_PFNMAP)) {
+   *force_pte = true;
+   pageshift = PAGE_SHIFT;
+   }
+
+   if (pageshift == PUD_SHIFT &&
+   !fault_supports_stage2_huge_mapping(memslot, hva, PUD_SIZE))
+   pageshift = PMD_SHIFT;
+
+   if (pageshift == PMD_SHIFT &&
+   !fault_supports_stage2_huge_mapping(memslot, hva, PMD_SIZE)) {
+   *force_pte = true;
+   pageshift = PAGE_SHIFT;
+   }
+
+   return pageshift;
+}
+
 static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
  struct kvm_memory_slot *memslot, unsigned long hva,
  unsigned long fault_status)
 {
int ret = 0;
-   bool write_fault, writable, force_pte = false;
+   bool write_fault, writable, force_pte;
bool exec_fault;
bool device = false;
unsigned long mmu_seq;
@@ -776,27 +807,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
return -EFAULT;
}
 
-   if (is_vm_hugetlb_page(vma))
-   vma_shift = huge_page_shift(hstate_vma(vma));
-   else
-   vma_shift = PAGE_SHIFT;
-
-   if (logging_active ||
-   (vma->vm_flags & VM_PFNMAP)) {
-   force_pte = true;
-   vma_shift = PAGE_SHIFT;
-   }
-
-   if (vma_shift == PUD_SHIFT &&
-   !fault_supports_stage2_huge_mapping(memslot, hva, PUD_SIZE))
-  vma_shift = PMD_SHIFT;
-
-   if (vma_shift == PMD_SHIFT &&
-   !fault_supports_stage2_huge_mapping(memslot, hva, PMD_SIZE)) {
-   force_pte = true;
-   vma_shift = PAGE_SHIFT;
-   }
-
+   vma_shift = stage2_max_pageshift(memslot, vma, hva, &force_pte);
vma_pagesize = 1UL << vma_shift;
if (vma_pagesize == PMD_SIZE || vma_pagesize == PUD_SIZE)
fault_ipa &= ~(vma_pagesize - 1);
-- 
2.29.1



[RFC PATCH v3 15/16] KVM: arm64: Enable SPE for guests

2020-10-27 Thread Alexandru Elisei
We have all the bits in place to expose SPE to guests, allow userspace to
set the feature and advertise the presence of SPE in the ID_AA64DFR0_EL1
register.

Signed-off-by: Alexandru Elisei 
---
 arch/arm64/include/asm/kvm_host.h | 2 +-
 arch/arm64/kvm/sys_regs.c | 8 ++--
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index bcecc6224c59..e5504c9847fc 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -39,7 +39,7 @@
 
 #define KVM_MAX_VCPUS VGIC_V3_MAX_CPUS
 
-#define KVM_VCPU_MAX_FEATURES 7
+#define KVM_VCPU_MAX_FEATURES 8
 
 #define KVM_REQ_SLEEP \
KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 3a0687602839..076be04d2e28 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1178,8 +1178,12 @@ static u64 read_id_reg(const struct kvm_vcpu *vcpu,
val = cpuid_feature_cap_perfmon_field(val,
ID_AA64DFR0_PMUVER_SHIFT,
ID_AA64DFR0_PMUVER_8_1);
-   /* Don't advertise SPE to guests */
-   val &= ~(0xfUL << ID_AA64DFR0_PMSVER_SHIFT);
+   /*
+* Don't advertise SPE to guests without SPE. Otherwise, allow
+* the guest to detect the hardware SPE version.
+*/
+   if (!vcpu_has_spe(vcpu))
+   val &= ~(0xfUL << ID_AA64DFR0_PMSVER_SHIFT);
} else if (id == SYS_ID_DFR0_EL1) {
/* Limit guests to PMUv3 for ARMv8.1 */
val = cpuid_feature_cap_perfmon_field(val,
-- 
2.29.1



[RFC PATCH v3 14/16] KVM: arm64: Emulate SPE buffer management interrupt

2020-10-27 Thread Alexandru Elisei
A profiling buffer management interrupt is asserted when the buffer fills,
on a fault or on an external abort. The service bit, PMBSR_EL1.S, is set as
long as SPE asserts this interrupt. SPE can also assert the interrupt
following a direct write to PMBSR_EL1 that sets the bit. The SPE hardware
stops asserting the interrupt only when the service bit is cleared.

KVM emulates the interrupt by reading the value of the service bit on each
guest exit to determine if the SPE hardware asserted the interrupt (for
example, if the buffer was full). Writes to the buffer registers are
trapped, to determine when the interrupt should be cleared or when the
guest wants to explicitly assert the interrupt by setting the service bit.
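
For illustration, the sync step on return to the host could look roughly like
this (a sketch, not necessarily the exact kvm_arm_spe_sync_hwstate() from this
patch):

	void kvm_arm_spe_sync_hwstate(struct kvm_vcpu *vcpu)
	{
		struct kvm_spe_cpu *spe_cpu = &vcpu->arch.spe_cpu;
		bool level;

		/* The interrupt stays asserted as long as the service bit is set. */
		level = __vcpu_sys_reg(vcpu, PMBSR_EL1) & BIT(SYS_PMBSR_EL1_S_SHIFT);
		if (level == spe_cpu->irq_level)
			return;

		spe_cpu->irq_level = level;
		kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id, spe_cpu->irq_num,
				    level, spe_cpu);
	}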

Signed-off-by: Alexandru Elisei 
---
 arch/arm64/include/asm/sysreg.h  |   3 +
 arch/arm64/kvm/arm.c |   3 +
 arch/arm64/kvm/hyp/nvhe/spe-sr.c |  20 +-
 arch/arm64/kvm/hyp/vhe/spe-sr.c  |  19 +-
 arch/arm64/kvm/spe.c | 101 +++
 include/kvm/arm_spe.h|   4 ++
 6 files changed, 146 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 20159af17578..0398bcba83a6 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -299,6 +299,7 @@
 #define SYS_PMBLIMITR_EL1_FM_SHIFT 1
 #define SYS_PMBLIMITR_EL1_FM_MASK  0x3UL
 #define SYS_PMBLIMITR_EL1_FM_STOP_IRQ  (0 << SYS_PMBLIMITR_EL1_FM_SHIFT)
+#define SYS_PMBLIMITR_EL1_RES0 0xf007UL
 
 #define SYS_PMBPTR_EL1 sys_reg(3, 0, 9, 10, 1)
 
@@ -323,6 +324,8 @@
 
 #define SYS_PMBSR_EL1_BUF_BSC_FULL (0x1UL << SYS_PMBSR_EL1_BUF_BSC_SHIFT)
 
+#define SYS_PMBSR_EL1_RES0 0xfc0fUL
+
 /*** End of Statistical Profiling Extension ***/
 
 #define SYS_PMINTENSET_EL1 sys_reg(3, 0, 9, 14, 1)
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 2d98248f2c66..c6a675aba71e 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -775,6 +775,9 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 */
kvm_vgic_sync_hwstate(vcpu);
 
+   if (vcpu_has_spe(vcpu))
+   kvm_arm_spe_sync_hwstate(vcpu);
+
/*
 * Sync the timer hardware state before enabling interrupts as
 * we don't want vtimer interrupts to race with syncing the
diff --git a/arch/arm64/kvm/hyp/nvhe/spe-sr.c b/arch/arm64/kvm/hyp/nvhe/spe-sr.c
index a73ee820b27f..2794a7c7fcd9 100644
--- a/arch/arm64/kvm/hyp/nvhe/spe-sr.c
+++ b/arch/arm64/kvm/hyp/nvhe/spe-sr.c
@@ -47,6 +47,14 @@ void __sysreg_save_spe_host_state_nvhe(struct kvm_cpu_context *ctxt)
 void __sysreg_restore_spe_guest_state_nvhe(struct kvm_vcpu *vcpu)
 {
struct kvm_cpu_context *guest_ctxt = &vcpu->arch.ctxt;
+   struct kvm_spe_cpu *spe_cpu = &vcpu->arch.spe_cpu;
+   u64 pmblimitr;
+
+   /* Disable guest profiling when the interrupt is asserted. */
+   if (spe_cpu->irq_level)
+   pmblimitr = 0;
+   else
+   pmblimitr = ctxt_sys_reg(guest_ctxt, PMBLIMITR_EL1);
 
__sysreg_restore_spe_state_common(guest_ctxt);
 
@@ -54,16 +62,24 @@ void __sysreg_restore_spe_guest_state_nvhe(struct kvm_vcpu *vcpu)
isb();
 
write_sysreg_s(ctxt_sys_reg(guest_ctxt, PMBPTR_EL1), SYS_PMBPTR_EL1);
-   /* The guest buffer management event interrupt is always virtual. */
+   /* The guest buffer management interrupt is always virtual. */
write_sysreg_s(0, SYS_PMBSR_EL1);
-   write_sysreg_s(ctxt_sys_reg(guest_ctxt, PMBLIMITR_EL1), 
SYS_PMBLIMITR_EL1);
+   write_sysreg_s(pmblimitr, SYS_PMBLIMITR_EL1);
write_sysreg_s(ctxt_sys_reg(guest_ctxt, PMSCR_EL1), SYS_PMSCR_EL1);
 }
 
 void __sysreg_save_spe_guest_state_nvhe(struct kvm_vcpu *vcpu)
 {
struct kvm_cpu_context *guest_ctxt = &vcpu->arch.ctxt;
+   struct kvm_spe_cpu *spe_cpu = &vcpu->arch.spe_cpu;
u64 pmblimitr = read_sysreg_s(SYS_PMBLIMITR_EL1);
+   u64 pmbsr = read_sysreg_s(SYS_PMBSR_EL1);
+
+   /* Update guest PMBSR_EL1 only when SPE asserts an interrupt. */
+   if (pmbsr & BIT(SYS_PMBSR_EL1_S_SHIFT)) {
+   ctxt_sys_reg(guest_ctxt, PMBSR_EL1) = pmbsr;
+   spe_cpu->pmbirq_asserted = true;
+   }
 
if (pmblimitr & BIT(SYS_PMBLIMITR_EL1_E_SHIFT)) {
psb_csync();
diff --git a/arch/arm64/kvm/hyp/vhe/spe-sr.c b/arch/arm64/kvm/hyp/vhe/spe-sr.c
index dd947e9f249c..24173f838bb1 100644
--- a/arch/arm64/kvm/hyp/vhe/spe-sr.c
+++ b/arch/arm64/kvm/hyp/vhe/spe-sr.c
@@ -44,6 +44,8 @@ NOKPROBE_SYMBOL(sysreg_save_spe_host_state_vhe);
 void sysreg_restore_spe_guest_state_vhe(struct kvm_vcpu *vcpu)
 {
struct kvm_cpu_context *guest_ctxt = &vcpu->arch.ctxt;
+   struct kvm_spe_cpu *spe_cpu = &vcpu->arch.spe_cpu;
+   u64 pmblimitr;
 
/*
 * Make sure the write to MDCR_EL2 which changes the buffer owning
@@ -51,6 

[RFC PATCH v3 08/16] KVM: arm64: Add a new VCPU device control group for SPE

2020-10-27 Thread Alexandru Elisei
From: Sudeep Holla 

To configure the virtual SPE buffer management interrupt number, we use a
VCPU kvm_device ioctl, encapsulating the KVM_ARM_VCPU_SPE_IRQ attribute
within the KVM_ARM_VCPU_SPE_CTRL group.

After configuring the SPE, userspace is required to call the VCPU ioctl
with the attribute KVM_ARM_VCPU_SPE_INIT to initialize SPE on the VCPU.
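
For illustration, a sketch of the userspace sequence this implies (the PPI
number is an arbitrary example and the use of KVM_SET_DEVICE_ATTR on the VCPU
file descriptor is an assumption, not code from this series):

	int spe_irq = 21;	/* example PPI INTID; must be the same on every VCPU */

	struct kvm_device_attr irq_attr = {
		.group	= KVM_ARM_VCPU_SPE_CTRL,
		.attr	= KVM_ARM_VCPU_SPE_IRQ,
		.addr	= (__u64)(unsigned long)&spe_irq,
	};
	struct kvm_device_attr init_attr = {
		.group	= KVM_ARM_VCPU_SPE_CTRL,
		.attr	= KVM_ARM_VCPU_SPE_INIT,
	};

	/* After the in-kernel irqchip has been created: */
	ioctl(vcpu_fd, KVM_SET_DEVICE_ATTR, &irq_attr);
	ioctl(vcpu_fd, KVM_SET_DEVICE_ATTR, &init_attr);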

[Alexandru E: Fixed compilation errors, don't allow userspace to set the
VCPU feature, removed unused functions, fixed mismatched
descriptions, comments and error codes, reworked logic, rebased on
top of v5.10-rc1]

Signed-off-by: Sudeep Holla 
Signed-off-by: Alexandru Elisei 
---
 Documentation/virt/kvm/devices/vcpu.rst |  40 
 arch/arm64/include/uapi/asm/kvm.h   |   3 +
 arch/arm64/kvm/Makefile |   1 +
 arch/arm64/kvm/guest.c  |   9 ++
 arch/arm64/kvm/reset.c  |  23 +
 arch/arm64/kvm/spe.c| 129 
 include/kvm/arm_spe.h   |  27 +
 include/uapi/linux/kvm.h|   1 +
 8 files changed, 233 insertions(+)
 create mode 100644 arch/arm64/kvm/spe.c

diff --git a/Documentation/virt/kvm/devices/vcpu.rst b/Documentation/virt/kvm/devices/vcpu.rst
index 2acec3b9ef65..6135b9827fbe 100644
--- a/Documentation/virt/kvm/devices/vcpu.rst
+++ b/Documentation/virt/kvm/devices/vcpu.rst
@@ -161,3 +161,43 @@ Specifies the base address of the stolen time structure 
for this VCPU. The
 base address must be 64 byte aligned and exist within a valid guest memory
 region. See Documentation/virt/kvm/arm/pvtime.rst for more information
 including the layout of the stolen time structure.
+
+4. GROUP: KVM_ARM_VCPU_SPE_CTRL
+===
+
+:Architectures: ARM64
+
+4.1 ATTRIBUTE: KVM_ARM_VCPU_SPE_IRQ
+---
+
+:Parameters: in kvm_device_attr.addr the address for the SPE buffer management
+ interrupt is a pointer to an int
+
+Returns:
+
+===  
+-EBUSY   The SPE buffer management interrupt is already set
+-EINVAL  Invalid SPE overflow interrupt number
+-EFAULT  Could not read the buffer management interrupt number
+-ENXIO   SPE not supported or not properly configured
+===  
+
+A value describing the SPE (Statistical Profiling Extension) overflow interrupt
+number for this vcpu. This interrupt should be a PPI and the interrupt type and
+number must be same for each vcpu.
+
+4.2 ATTRIBUTE: KVM_ARM_VCPU_SPE_INIT
+
+
+:Parameters: no additional parameter in kvm_device_attr.addr
+
+Returns:
+
+===  ==
+-EBUSY   SPE already initialized
+-ENODEV  GIC not initialized
+-ENXIO   SPE not supported or not properly configured
+===  ==
+
+Request the initialization of the SPE. Must be done after initializing the
+in-kernel irqchip and after setting the interrupt number for the VCPU.
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index 489e12304dbb..ca57dfb7abf0 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -360,6 +360,9 @@ struct kvm_vcpu_events {
 #define   KVM_ARM_VCPU_TIMER_IRQ_PTIMER1
 #define KVM_ARM_VCPU_PVTIME_CTRL   2
 #define   KVM_ARM_VCPU_PVTIME_IPA  0
+#define KVM_ARM_VCPU_SPE_CTRL  3
+#define   KVM_ARM_VCPU_SPE_IRQ 0
+#define   KVM_ARM_VCPU_SPE_INIT1
 
 /* KVM_IRQ_LINE irq field index values */
 #define KVM_ARM_IRQ_VCPU2_SHIFT28
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index 1504c81fbf5d..f6e76f64ffbe 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -25,3 +25,4 @@ kvm-y := $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o 
$(KVM)/eventfd.o \
 vgic/vgic-its.o vgic/vgic-debug.o
 
 kvm-$(CONFIG_KVM_ARM_PMU)  += pmu-emul.o
+kvm-$(CONFIG_KVM_ARM_SPE)  += spe.o
diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index dfb5218137ca..2ba790eeb782 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -926,6 +926,9 @@ int kvm_arm_vcpu_arch_set_attr(struct kvm_vcpu *vcpu,
case KVM_ARM_VCPU_PVTIME_CTRL:
ret = kvm_arm_pvtime_set_attr(vcpu, attr);
break;
+   case KVM_ARM_VCPU_SPE_CTRL:
+   ret = kvm_arm_spe_set_attr(vcpu, attr);
+   break;
default:
ret = -ENXIO;
break;
@@ -949,6 +952,9 @@ int kvm_arm_vcpu_arch_get_attr(struct kvm_vcpu *vcpu,
case KVM_ARM_VCPU_PVTIME_CTRL:
ret = kvm_arm_pvtime_get_attr(vcpu, attr);
break;
+   case KVM_ARM_VCPU_SPE_CTRL:
+   ret = 

[RFC PATCH v3 11/16] KVM: arm64: Add SPE system registers to VCPU context

2020-10-27 Thread Alexandru Elisei
Add the SPE system registers to the VCPU context. Omitted are PMBIDR_EL1,
which cannot be trapped, and PMSIDR_EL1, which is a read-only register. The
registers are simply stored in the sys_regs array on a write, and returned
on a read; complete emulation and save/restore on world switch will be
added in a future patch.

Signed-off-by: Alexandru Elisei 
---
 arch/arm64/include/asm/kvm_host.h | 11 +++
 arch/arm64/kvm/spe.c  | 10 +++
 arch/arm64/kvm/sys_regs.c | 48 ---
 include/kvm/arm_spe.h |  9 ++
 4 files changed, 68 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 27f581750c6e..bcecc6224c59 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -194,6 +194,17 @@ enum vcpu_sysreg {
CNTP_CVAL_EL0,
CNTP_CTL_EL0,
 
+   /* Statistical Profiling Extension Registers. */
+   PMSCR_EL1,  /* Statistical Profiling Control Register */
+   PMSICR_EL1, /* Sampling Interval Counter Register */
+   PMSIRR_EL1, /* Sampling Interval Reload Register */
+   PMSFCR_EL1, /* Sampling Filter Control Register */
+   PMSEVFR_EL1,/* Sampling Event Filter Register */
+   PMSLATFR_EL1,   /* Sampling Latency Filter Register */
+   PMBLIMITR_EL1,  /* Profiling Buffer Limit Address Register */
+   PMBPTR_EL1, /* Profiling Buffer Write Pointer Register */
+   PMBSR_EL1,  /* Profiling Buffer Status/syndrome Register */
+
/* 32bit specific registers. Keep them at the end of the range */
DACR32_EL2, /* Domain Access Control Register */
IFSR32_EL2, /* Instruction Fault Status Register */
diff --git a/arch/arm64/kvm/spe.c b/arch/arm64/kvm/spe.c
index 316ff8dfed5b..0e365a51cac7 100644
--- a/arch/arm64/kvm/spe.c
+++ b/arch/arm64/kvm/spe.c
@@ -12,6 +12,16 @@
 
 #include 
 
+void kvm_arm_spe_write_sysreg(struct kvm_vcpu *vcpu, int reg, u64 val)
+{
+   __vcpu_sys_reg(vcpu, reg) = val;
+}
+
+u64 kvm_arm_spe_read_sysreg(struct kvm_vcpu *vcpu, int reg)
+{
+   return __vcpu_sys_reg(vcpu, reg);
+}
+
 void kvm_arm_spe_notify_vcpu_init(struct kvm_vcpu *vcpu)
 {
vcpu->kvm->arch.spe.finalized = false;
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index aa776c006a2a..2871484993ec 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -244,9 +244,37 @@ static bool access_vm_reg(struct kvm_vcpu *vcpu,
return true;
 }
 
+static bool access_spe_reg(struct kvm_vcpu *vcpu,
+  struct sys_reg_params *p,
+  const struct sys_reg_desc *r)
+{
+   u64 val = p->regval;
+   int reg = r->reg;
+   u32 sr = sys_reg((u32)r->Op0, (u32)r->Op1,
+(u32)r->CRn, (u32)r->CRm, (u32)r->Op2);
+
+   if (sr == SYS_PMSIDR_EL1) {
+   /* Ignore writes. */
+   if (!p->is_write)
+   p->regval = read_sysreg_s(SYS_PMSIDR_EL1);
+   goto out;
+   }
+
+   if (p->is_write)
+   kvm_arm_spe_write_sysreg(vcpu, reg, val);
+   else
+   p->regval = kvm_arm_spe_read_sysreg(vcpu, reg);
+
+out:
+   return true;
+}
+
 static unsigned int spe_visibility(const struct kvm_vcpu *vcpu,
   const struct sys_reg_desc *r)
 {
+   if (vcpu_has_spe(vcpu))
+   return 0;
+
return REG_HIDDEN_GUEST | REG_HIDDEN_USER;
 }
 
@@ -1598,16 +1626,16 @@ static const struct sys_reg_desc sys_reg_descs[] = {
{ SYS_DESC(SYS_FAR_EL1), access_vm_reg, reset_unknown, FAR_EL1 },
{ SYS_DESC(SYS_PAR_EL1), NULL, reset_unknown, PAR_EL1 },
 
-   { SYS_DESC(SYS_PMSCR_EL1), .visibility = spe_visibility },
-   { SYS_DESC(SYS_PMSICR_EL1), .visibility = spe_visibility },
-   { SYS_DESC(SYS_PMSIRR_EL1), .visibility = spe_visibility },
-   { SYS_DESC(SYS_PMSFCR_EL1), .visibility = spe_visibility },
-   { SYS_DESC(SYS_PMSEVFR_EL1), .visibility = spe_visibility },
-   { SYS_DESC(SYS_PMSLATFR_EL1), .visibility = spe_visibility },
-   { SYS_DESC(SYS_PMSIDR_EL1), .visibility = spe_visibility },
-   { SYS_DESC(SYS_PMBLIMITR_EL1), .visibility = spe_visibility },
-   { SYS_DESC(SYS_PMBPTR_EL1), .visibility = spe_visibility },
-   { SYS_DESC(SYS_PMBSR_EL1), .visibility = spe_visibility },
+   { SYS_DESC(SYS_PMSCR_EL1), access_spe_reg, reset_val, PMSCR_EL1, 0, 
.visibility = spe_visibility },
+   { SYS_DESC(SYS_PMSICR_EL1), access_spe_reg, reset_val, PMSICR_EL1, 0, 
.visibility = spe_visibility },
+   { SYS_DESC(SYS_PMSIRR_EL1), access_spe_reg, reset_val, PMSIRR_EL1, 0, 
.visibility = spe_visibility },
+   { SYS_DESC(SYS_PMSFCR_EL1), access_spe_reg, reset_val, PMSFCR_EL1, 0, 
.visibility = spe_visibility },
+   { SYS_DESC(SYS_PMSEVFR_EL1), access_spe_reg, reset_val, PMSEVFR_EL1, 0, 
.visibility = 

[RFC PATCH v3 05/16] KVM: arm64: Introduce VCPU SPE feature

2020-10-27 Thread Alexandru Elisei
Introduce the feature bit, but don't allow userspace to set it yet.

Signed-off-by: Alexandru Elisei 
---
 arch/arm64/include/uapi/asm/kvm.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index 1c17c3a24411..489e12304dbb 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -106,6 +106,7 @@ struct kvm_regs {
 #define KVM_ARM_VCPU_SVE   4 /* enable SVE for this CPU */
 #define KVM_ARM_VCPU_PTRAUTH_ADDRESS   5 /* VCPU uses address authentication */
 #define KVM_ARM_VCPU_PTRAUTH_GENERIC   6 /* VCPU uses generic authentication */
+#define KVM_ARM_VCPU_SPE   7 /* Enable SPE for this CPU */
 
 struct kvm_vcpu_init {
__u32 target;
-- 
2.29.1



[RFC PATCH v3 00/16] KVM: arm64: Add Statistical Profiling Extension (SPE) support

2020-10-27 Thread Alexandru Elisei
Statistical Profiling Extension (SPE) is an optional feature added in
ARMv8.2. It allows sampling at regular intervals of the operations executed
by the PE and storing a record of each operation in a memory buffer. A high
level overview of the extension is presented in an article on arm.com [1].

This series implements SPE support for KVM guests. The series is based on
v5.10-rc1 and has been almost completely rewritten, but I've tried to keep some
patches from v2 [2] and the initial version of the series [3]. The series
can also be found in a repo [4] to make testing easier.

This series is firmly in RFC territory for several reasons:

* It introduces a userspace API to pre-map guest memory at stage 2, which
  I think deserves some discussion before we commit to it.

* The way I'm handling the SPE interrupt is completely different than what
  was implemented in v2.

* SPE state save/restore unconditionally save the host SPE state on VM
  entry and restores it on VM exit, regardless of whether the host is
  actually profiling or not. I plan to improve this in following
  iterations.

I am also interested to know why the spe header lives in
/include/kvm/kvm_spe.h instead of /arch/arm64/include/asm/kvm_spe.h. My
guess is that the headers there are for code that was shared with KVM arm.
Since KVM arm was removed, I would like to move the header to /arch/arm64,
but I wanted to make sure that is acceptable.

The profiling buffer


KVM cannot handle SPE stage 2 faults and the guest memory must be
memory-resident and mapped at stage 2 for the entire lifetime of the guest.
More details in patch #10 ("KVM: arm64: Add a new VM device control group
for SPE").

This is achieved with the help of userspace in two stages:

1. Userspace calls mlock() on the VMAs that represent the guest memory.

2. After userspace has copied everything to the guest memory, it uses the
   KVM_ARM_VM_SPE_CTRL(KVM_ARM_VM_SPE_FINALIZE) ioctl to tell KVM to map
   all VM_LOCKED and VM_HUGETLB VMAs at stage 2 (explanation why VM_HUGETLB
   is also mapped in patch #10).
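
For illustration, the userspace side of those two stages could look roughly
like the sketch below. This is not code from this series or from the kvmtool
patches; vm_fd, guest_ram and guest_ram_size are placeholders, and setting the
finalize attribute with KVM_SET_DEVICE_ATTR on the VM fd is my reading of
patch #10.

#include <sys/ioctl.h>
#include <sys/mman.h>
#include <err.h>
#include <linux/kvm.h>

/* Placeholder names; error handling kept minimal on purpose. */
static void spe_prepare_guest_memory(int vm_fd, void *guest_ram,
				     size_t guest_ram_size)
{
	struct kvm_device_attr attr = {
		.group	= KVM_ARM_VM_SPE_CTRL,
		.attr	= KVM_ARM_VM_SPE_FINALIZE,
	};

	/* Stage 1: make the VMAs backing the memslots memory-resident. */
	if (mlock(guest_ram, guest_ram_size))
		err(1, "mlock");

	/* ... copy kernel, initrd and DT into guest memory here ... */

	/* Stage 2: ask KVM to map the locked VMAs at stage 2. */
	if (ioctl(vm_fd, KVM_SET_DEVICE_ATTR, &attr))
		err(1, "KVM_ARM_VM_SPE_FINALIZE");
}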

I have added support for SPE to kvmtool; the patches are on the mailing list
[5], as well as in a repo [6] for easy testing.

There are some things that I'm not 100% sure about and I would like to get
some feedback before we commit to an ABI:

* At the moment, having SPE enabled for a guest forces unmapping of the
  guest memory when the VCPU is reset. This is done to make sure the
  dcaches are cleaned to POC when the VM starts. It isn't necessary when
  the system has FWB, but I decided to unmap the guest memory even in this
  case for two reasons:

  1. Userspace doesn't know whether FWB is available and thus whether the finalize
call is necessary.

  2. I haven't seen anywhere in the documentation a statement regarding
changing memslots while the VM is in the process of resetting, so I am
assuming it's not forbidden (please correct me if I'm wrong).

If it's forbidden to change memslots when resetting the VM, then we could
add an extension or something similar that tells userspace whether a finalize
call is required after VM reset.

* Instead of an SPE control group we could have a KVM_ARM_VM_FINALIZE ioctl
  on the vm fd, similar to KVM_ARM_VCPU_FINALIZE. I don't have a strong
  preference for either; the reason for the current implementation is that
  I hadn't thought about KVM_ARM_VM_FINALIZE until the series was almost
  finished.

The buffer interrupt


Also referred to in the Arm ARM as the Profiling Buffer management
interrupt. The guest SPE interrupt handling has been completely reworked
and now it's handled by checking the service bit in the PMBSR_EL1 register
on every switch to host; implementation in patch #14 ("KVM: arm64: Emulate
SPE buffer management event interrupt").
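
To give an idea of what that check amounts to, here is an illustrative
sketch only, not the actual hunk from patch #14; the function name and the
spe_pmbsr field are made up for the example:

static void kvm_spe_sync_buffer_state(struct kvm_vcpu *vcpu)
{
	u64 pmbsr = read_sysreg_s(SYS_PMBSR_EL1);

	/* No buffer management event since the last check. */
	if (!(pmbsr & BIT(SYS_PMBSR_EL1_S_SHIFT)))
		return;

	/*
	 * The event was triggered while the guest owned the buffer:
	 * remember PMBSR_EL1 for emulation and clear the hardware
	 * syndrome so the host driver doesn't see a stale event.
	 */
	vcpu->arch.spe_pmbsr = pmbsr;
	write_sysreg_s(0, SYS_PMBSR_EL1);
}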

Another option that I considered was to change the host irq handler for the
SPE interrupt to check kvm_get_running_vcpu() and defer the handling of the
interrupt to the KVM code. There are a few reasons I decided against it:

* We need to keep the PMBSR_EL1.S bit set until KVM enables interrupts,
  which means that the host won't be able to profile KVM between
  kvm_load()/kvm_put().

* Software can trigger the interrupt with a write to the PMBSR_EL1 register
  that sets the service bit. This means that the KVM irq handler won't be
  able to distinguish between the guest configuring PMBSR_EL1 to report a
  stage 2 fault, which is harmless for the host, and the hardware reporting
  it, which can indicate a bug. Even more seriously, KVM won't be able to
  distinguish between a PMBSR_EL1 value indicating an External Abort that was
  written by the guest (again, harmless for the host) and one reported by the
  hardware, which is pretty serious.

This is what the architecture says about SPE external aborts, on page
D9-2806:

"A write to the Profiling Buffer might generate an external abort,
including an external abort on a translation table walk or translation
table update. It is an IMPLEMENTATION DEFINED choice 

[RFC PATCH v3 02/16] dt-bindings: ARM SPE: Highlight the need for PPI partitions on heterogeneous systems

2020-10-27 Thread Alexandru Elisei
From: Sudeep Holla 

It's not entirely clear from the binding document that the only way to
express ARM SPE being affine to a subset of CPUs on a heterogeneous system
is through the use of PPI partitions available in the interrupt controller
bindings.

Let's make it clear.

Signed-off-by: Sudeep Holla 
Signed-off-by: Andrew Murray 
Signed-off-by: Alexandru Elisei 
---
 Documentation/devicetree/bindings/arm/spe-pmu.txt | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/Documentation/devicetree/bindings/arm/spe-pmu.txt 
b/Documentation/devicetree/bindings/arm/spe-pmu.txt
index 93372f2a7df9..4f4815800f6e 100644
--- a/Documentation/devicetree/bindings/arm/spe-pmu.txt
+++ b/Documentation/devicetree/bindings/arm/spe-pmu.txt
@@ -9,8 +9,9 @@ performance sample data using an in-memory trace buffer.
   "arm,statistical-profiling-extension-v1"
 
 - interrupts : Exactly 1 PPI must be listed. For heterogeneous systems where
-   SPE is only supported on a subset of the CPUs, please consult
-  the arm,gic-v3 binding for details on describing a PPI partition.
+   SPE is only supported on a subset of the CPUs, a PPI partition
+  described in the arm,gic-v3 binding must be used to describe
+  the set of CPUs this interrupt is affine to.
 
 ** Example:
 
-- 
2.29.1

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[RFC PATCH v3 01/16] KVM: arm64: Initialize VCPU mdcr_el2 before loading it

2020-10-27 Thread Alexandru Elisei
When a VCPU is created, the kvm_vcpu struct is initialized to zero in
kvm_vm_ioctl_create_vcpu(). On VHE systems, the first time
vcpu.arch.mdcr_el2 is loaded on hardware is in vcpu_load(), before it is
set to a sensible value in kvm_arm_setup_debug() later in the run loop. The
result is that KVM executes for a short time with MDCR_EL2 set to zero.

This is mostly harmless as we don't need to trap debug and SPE register
accesses from EL1 (we're still running in the host at EL2), but we do set
MDCR_EL2.HPMN to 0 which is constrained unpredictable according to ARM DDI
0487F.b, page D13-3620; the required behavior from the hardware in this
case is to reserve an unknown number of registers for EL2 and EL3 exclusive
use.

Initialize mdcr_el2 in kvm_vcpu_first_run_init(), so we can avoid the
constrained unpredictable behavior and ensure that the MDCR_EL2 register
has the same value after each vcpu_load(), including the first time the
VCPU is run.

Signed-off-by: Alexandru Elisei 
---
 arch/arm64/include/asm/kvm_host.h |  1 +
 arch/arm64/kvm/arm.c  |  3 +-
 arch/arm64/kvm/debug.c| 81 +--
 3 files changed, 58 insertions(+), 27 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index 0aecbab6a7fb..25d326aecded 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -597,6 +597,7 @@ static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, 
int cpu) {}
 static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {}
 
 void kvm_arm_init_debug(void);
+void kvm_arm_vcpu_init_debug(struct kvm_vcpu *vcpu);
 void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
 void kvm_arm_clear_debug(struct kvm_vcpu *vcpu);
 void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu);
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index f56122eedffc..e51d8f328c7e 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -544,6 +544,8 @@ static int kvm_vcpu_first_run_init(struct kvm_vcpu *vcpu)
static_branch_inc(_irqchip_in_use);
}
 
+   kvm_arm_vcpu_init_debug(vcpu);
+
ret = kvm_timer_enable(vcpu);
if (ret)
return ret;
@@ -739,7 +741,6 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
}
 
kvm_arm_setup_debug(vcpu);
-
/**
 * Enter the guest
 */
diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
index 7a7e425616b5..22ee448aee2b 100644
--- a/arch/arm64/kvm/debug.c
+++ b/arch/arm64/kvm/debug.c
@@ -68,6 +68,59 @@ void kvm_arm_init_debug(void)
__this_cpu_write(mdcr_el2, kvm_call_hyp_ret(__kvm_get_mdcr_el2));
 }
 
+/**
+ * kvm_arm_setup_mdcr_el2 - configure vcpu mdcr_el2 value
+ *
+ * @vcpu:  the vcpu pointer
+ * @host_mdcr:  host mdcr_el2 value
+ *
+ * This ensures we will trap access to:
+ *  - Performance monitors (MDCR_EL2_TPM/MDCR_EL2_TPMCR)
+ *  - Debug ROM Address (MDCR_EL2_TDRA)
+ *  - OS related registers (MDCR_EL2_TDOSA)
+ *  - Statistical profiler (MDCR_EL2_TPMS/MDCR_EL2_E2PB)
+ */
+static void kvm_arm_setup_mdcr_el2(struct kvm_vcpu *vcpu, u32 host_mdcr)
+{
+   bool trap_debug = !(vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY);
+
+   /*
+* This also clears MDCR_EL2_E2PB_MASK to disable guest access
+* to the profiling buffer.
+*/
+   vcpu->arch.mdcr_el2 = host_mdcr & MDCR_EL2_HPMN_MASK;
+   vcpu->arch.mdcr_el2 |= (MDCR_EL2_TPM |
+   MDCR_EL2_TPMS |
+   MDCR_EL2_TPMCR |
+   MDCR_EL2_TDRA |
+   MDCR_EL2_TDOSA);
+
+   if (vcpu->guest_debug) {
+   /* Route all software debug exceptions to EL2 */
+   vcpu->arch.mdcr_el2 |= MDCR_EL2_TDE;
+   if (vcpu->guest_debug & KVM_GUESTDBG_USE_HW)
+   trap_debug = true;
+   }
+
+   /* Trap debug register access */
+   if (trap_debug)
+   vcpu->arch.mdcr_el2 |= MDCR_EL2_TDA;
+
+   trace_kvm_arm_set_dreg32("MDCR_EL2", vcpu->arch.mdcr_el2);
+}
+
+/**
+ * kvm_arm_vcpu_init_debug - setup vcpu debug traps
+ *
+ * @vcpu:  the vcpu pointer
+ *
+ * Set vcpu initial mdcr_el2 value.
+ */
+void kvm_arm_vcpu_init_debug(struct kvm_vcpu *vcpu)
+{
+   kvm_arm_setup_mdcr_el2(vcpu, this_cpu_read(mdcr_el2));
+}
+
 /**
  * kvm_arm_reset_debug_ptr - reset the debug ptr to point to the vcpu state
  */
@@ -83,12 +136,7 @@ void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu)
  * @vcpu:  the vcpu pointer
  *
  * This is called before each entry into the hypervisor to setup any
- * debug related registers. Currently this just ensures we will trap
- * access to:
- *  - Performance monitors (MDCR_EL2_TPM/MDCR_EL2_TPMCR)
- *  - Debug ROM Address (MDCR_EL2_TDRA)
- *  - OS related registers (MDCR_EL2_TDOSA)
- *  - 

[kvm-unit-tests RFC PATCH v2 1/5] arm64: Move get_id_aa64dfr0() in processor.h

2020-10-27 Thread Alexandru Elisei
From: Eric Auger 

SPE support is reported in the ID_AA64DFR0_EL1 register. Move the function
get_id_aa64dfr0() from the pmu test to processor.h so it can be reused by
the SPE tests.

[ Alexandru E: Reworded commit ]

Signed-off-by: Eric Auger 
Signed-off-by: Alexandru Elisei 
---
 lib/arm64/asm/processor.h | 5 +
 arm/pmu.c | 1 -
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/lib/arm64/asm/processor.h b/lib/arm64/asm/processor.h
index 02665b84cc7e..11b756475494 100644
--- a/lib/arm64/asm/processor.h
+++ b/lib/arm64/asm/processor.h
@@ -88,6 +88,11 @@ static inline uint64_t get_mpidr(void)
return read_sysreg(mpidr_el1);
 }
 
+static inline uint64_t get_id_aa64dfr0(void)
+{
+   return read_sysreg(id_aa64dfr0_el1);
+}
+
 #define MPIDR_HWID_BITMASK 0xff00ff
 extern int mpidr_to_cpu(uint64_t mpidr);
 
diff --git a/arm/pmu.c b/arm/pmu.c
index 831fb6618279..5406ca3b31ed 100644
--- a/arm/pmu.c
+++ b/arm/pmu.c
@@ -167,7 +167,6 @@ static void test_overflow_interrupt(void) {}
 #define ID_DFR0_PMU_V3_8_5 0b0110
 #define ID_DFR0_PMU_IMPDEF 0b
 
-static inline uint32_t get_id_aa64dfr0(void) { return 
read_sysreg(id_aa64dfr0_el1); }
 static inline uint32_t get_pmcr(void) { return read_sysreg(pmcr_el0); }
 static inline void set_pmcr(uint32_t v) { write_sysreg(v, pmcr_el0); }
 static inline uint64_t get_pmccntr(void) { return read_sysreg(pmccntr_el0); }
-- 
2.29.1

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[kvm-unit-tests RFC PATCH v2 3/5] arm64: spe: Add introspection test

2020-10-27 Thread Alexandru Elisei
From: Eric Auger 

Probe the DTB and the ID registers to get information about SPE, then
compare the register fields with the valid values as defined by ARM DDI
0487F.b.

SPE is supported only on AArch64, so make the test exclusive to the
arm64 architecture.

[ Alexandru E: Removed aarch32 compilation support, added DTB probing,
reworded commit, mostly cosmetic changes to the code ]

Signed-off-by: Eric Auger 
Signed-off-by: Alexandru Elisei 
---
 arm/Makefile.arm64 |   1 +
 lib/libcflat.h |   1 +
 arm/spe.c  | 172 +
 arm/unittests.cfg  |   7 ++
 4 files changed, 181 insertions(+)
 create mode 100644 arm/spe.c

diff --git a/arm/Makefile.arm64 b/arm/Makefile.arm64
index dbc7524d3070..94b9c63f0b05 100644
--- a/arm/Makefile.arm64
+++ b/arm/Makefile.arm64
@@ -30,6 +30,7 @@ OBJDIRS += lib/arm64
 tests = $(TEST_DIR)/timer.flat
 tests += $(TEST_DIR)/micro-bench.flat
 tests += $(TEST_DIR)/cache.flat
+tests += $(TEST_DIR)/spe.flat
 
 include $(SRCDIR)/$(TEST_DIR)/Makefile.common
 
diff --git a/lib/libcflat.h b/lib/libcflat.h
index ec0f58b05701..37550c99ffb6 100644
--- a/lib/libcflat.h
+++ b/lib/libcflat.h
@@ -37,6 +37,7 @@
 #define IS_ALIGNED(x, a)   (((x) & ((typeof(x))(a) - 1)) == 0)
 
 #define SZ_256 (1 << 8)
+#define SZ_2K  (1 << 11)
 #define SZ_4K  (1 << 12)
 #define SZ_8K  (1 << 13)
 #define SZ_16K (1 << 14)
diff --git a/arm/spe.c b/arm/spe.c
new file mode 100644
index ..c199cd239194
--- /dev/null
+++ b/arm/spe.c
@@ -0,0 +1,172 @@
+/*
+ * Copyright (C) 2020, Red Hat Inc, Eric Auger 
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License version 2.1 and
+ * only version 2.1 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ */
+#include 
+
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+#define ID_AA64DFR0_PMSVER_SHIFT   32
+#define ID_AA64DFR0_PMSVER_MASK0xf
+
+#define SYS_PMBIDR_EL1 sys_reg(3, 0, 9, 10, 7)
+#define SYS_PMBIDR_EL1_F_SHIFT 5
+#define SYS_PMBIDR_EL1_P_SHIFT 4
+#define SYS_PMBIDR_EL1_ALIGN_MASK  0xfUL
+#define SYS_PMBIDR_EL1_ALIGN_SHIFT 0
+
+#define SYS_PMSIDR_EL1 sys_reg(3, 0, 9, 9, 7)
+#define SYS_PMSIDR_EL1_FE_SHIFT0
+#define SYS_PMSIDR_EL1_FT_SHIFT1
+#define SYS_PMSIDR_EL1_FL_SHIFT2
+#define SYS_PMSIDR_EL1_INTERVAL_SHIFT  8
+#define SYS_PMSIDR_EL1_INTERVAL_MASK   0xfUL
+#define SYS_PMSIDR_EL1_MAXSIZE_SHIFT   12
+#define SYS_PMSIDR_EL1_MAXSIZE_MASK0xfUL
+#define SYS_PMSIDR_EL1_COUNTSIZE_SHIFT 16
+#define SYS_PMSIDR_EL1_COUNTSIZE_MASK  0xfUL
+
+struct spe {
+   uint32_t intid;
+   int min_interval;
+   int max_record_size;
+   int countsize;
+   bool fl_cap;
+   bool ft_cap;
+   bool fe_cap;
+   int align;
+};
+static struct spe spe;
+
+static int spe_min_interval(uint8_t interval)
+{
+   switch (interval) {
+   case 0x0:
+   return 256;
+   case 0x2:
+   return 512;
+   case 0x3:
+   return 768;
+   case 0x4:
+   return 1024;
+   case 0x5:
+   return 1536;
+   case 0x6:
+   return 2048;
+   case 0x7:
+   return 3072;
+   case 0x8:
+   return 4096;
+   default:
+   return 0;
+   }
+}
+
+static bool spe_probe(void)
+{
+   const struct fdt_property *prop;
+   const void *fdt = dt_fdt();
+   int node, len;
+   uint32_t *data;
+   uint64_t pmbidr, pmsidr;
+   uint64_t aa64dfr0 = get_id_aa64dfr0();
+   uint8_t pmsver, interval;
+
+   node = fdt_node_offset_by_compatible(fdt, -1, 
"arm,statistical-profiling-extension-v1");
+   assert(node >= 0);
+   prop = fdt_get_property(fdt, node, "interrupts", );
+   assert(prop && len == 3 * sizeof(u32));
+
+   data = (u32 *)prop->data;
+   /* SPE interrupt is required to be a PPI. */
+   assert(fdt32_to_cpu(data[0]) == 1);
+   spe.intid = fdt32_to_cpu(data[1]);
+
+   pmsver = (aa64dfr0 >> ID_AA64DFR0_PMSVER_SHIFT) & 
ID_AA64DFR0_PMSVER_MASK;
+   if (!pmsver || pmsver > 2) {
+   report_info("Unknown SPE version: 0x%x", pmsver);
+   return false;
+   }
+
+   pmbidr = read_sysreg_s(SYS_PMBIDR_EL1);
+   if (pmbidr & BIT(SYS_PMBIDR_EL1_P_SHIFT)) {
+   report_info("Profiling buffer owned by higher exception level");
+   return false;
+   }
+
+   spe.align = (pmbidr 

[kvm-unit-tests RFC PATCH v2 0/5] arm64: Statistical Profiling Extension Tests

2020-10-27 Thread Alexandru Elisei
This series implements two basic tests for KVM SPE: one that checks that
the reported features match what is specified in the architecture,
implemented in patch #3 ("arm64: spe: Add introspection test"), and another
test that checks that the buffer management interrupt is asserted
correctly, implemented in patch #5 ("arm64: spe: Add buffer test"). The rest
of the patches are either preparatory patches or, in the case of
patch #2 ("lib/{bitops,alloc_page}.h: Add missing headers"), a fix.

This series builds on Eric's initial version [1], to which I've added the
buffer tests that I used while developing SPE support for KVM.

KVM SPE support is currently in the RFC phase, which is why this series is also sent
as RFC. The KVM patches needed for the tests to work can be found at [2].
Userspace support is also necessary, which I've implemented for kvmtool;
this can be found at [3]. This series is also available in a repo [4] to make
testing easier.

[1] https://www.spinics.net/lists/kvm/msg223792.html
[2] https://gitlab.arm.com/linux-arm/linux-ae/-/tree/kvm-spe-v3
[3] https://gitlab.arm.com/linux-arm/kvmtool-ae/-/tree/kvm-spe-v3
[4] https://gitlab.arm.com/linux-arm/kvm-unit-tests-ae/-/tree/kvm-spe-v2

Alexandru Elisei (3):
  lib/{bitops,alloc_page}.h: Add missing headers
  lib: arm/arm64: Add function to unmap a page
  arm64: spe: Add buffer test

Eric Auger (2):
  arm64: Move get_id_aa64dfr0() in processor.h
  arm64: spe: Add introspection test

 arm/Makefile.arm64|   1 +
 lib/arm/asm/mmu-api.h |   1 +
 lib/arm64/asm/processor.h |   5 +
 lib/alloc_page.h  |   2 +
 lib/bitops.h  |   2 +
 lib/libcflat.h|   1 +
 lib/arm/mmu.c |  32 +++
 arm/pmu.c |   1 -
 arm/spe.c | 506 ++
 arm/unittests.cfg |  15 ++
 10 files changed, 565 insertions(+), 1 deletion(-)
 create mode 100644 arm/spe.c

-- 
2.29.1

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[kvm-unit-tests RFC PATCH v2 5/5] arm64: spe: Add buffer test

2020-10-27 Thread Alexandru Elisei
According to ARM DDI 0487F.b, a profiling buffer management event occurs:

* On a fault.
* On an external abort.
* When the buffer fills.
* When software sets the service bit, PMBSR_EL1.S.

Set up the buffer to trigger the events and check that they are reported
correctly.

Signed-off-by: Alexandru Elisei 
---
 arm/spe.c | 342 +-
 arm/unittests.cfg |   8 ++
 2 files changed, 346 insertions(+), 4 deletions(-)

diff --git a/arm/spe.c b/arm/spe.c
index c199cd239194..c185883d079a 100644
--- a/arm/spe.c
+++ b/arm/spe.c
@@ -15,8 +15,10 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
+#include 
 #include 
 #include 
 
@@ -41,6 +43,44 @@
 #define SYS_PMSIDR_EL1_COUNTSIZE_SHIFT 16
 #define SYS_PMSIDR_EL1_COUNTSIZE_MASK  0xfUL
 
+#define SYS_PMSCR_EL1  sys_reg(3, 0, 9, 9, 0)
+#define SYS_PMSCR_EL1_E1SPE_SHIFT  1
+#define SYS_PMSCR_EL1_PA_SHIFT 4
+#define SYS_PMSCR_EL1_TS_SHIFT 5
+
+#define SYS_PMSICR_EL1 sys_reg(3, 0, 9, 9, 2)
+
+#define SYS_PMSIRR_EL1 sys_reg(3, 0, 9, 9, 3)
+#define SYS_PMSIRR_EL1_INTERVAL_SHIFT  8
+#define SYS_PMSIRR_EL1_INTERVAL_MASK   0xffUL
+
+#define SYS_PMSFCR_EL1 sys_reg(3, 0, 9, 9, 4)
+#define SYS_PMSFCR_EL1_FE_SHIFT0
+#define SYS_PMSFCR_EL1_FT_SHIFT1
+#define SYS_PMSFCR_EL1_FL_SHIFT2
+#define SYS_PMSFCR_EL1_B_SHIFT 16
+#define SYS_PMSFCR_EL1_LD_SHIFT17
+#define SYS_PMSFCR_EL1_ST_SHIFT18
+
+#define SYS_PMSEVFR_EL1sys_reg(3, 0, 9, 9, 5)
+#define SYS_PMSLATFR_EL1   sys_reg(3, 0, 9, 9, 6)
+
+#define SYS_PMBLIMITR_EL1  sys_reg(3, 0, 9, 10, 0)
+#define SYS_PMBLIMITR_EL1_E_SHIFT  0
+
+#define SYS_PMBPTR_EL1 sys_reg(3, 0, 9, 10, 1)
+
+#define SYS_PMBSR_EL1  sys_reg(3, 0, 9, 10, 3)
+#define SYS_PMBSR_EL1_S_SHIFT  17
+#define SYS_PMBSR_EL1_EA_SHIFT 18
+#define SYS_PMBSR_EL1_BSC_BUF_FULL 1
+#define SYS_PMBSR_EL1_EC_SHIFT 26
+#define SYS_PMBSR_EL1_EC_MASK  0x3fUL
+#define SYS_PMBSR_EL1_EC_FAULT_S1  0x24
+#define SYS_PMBSR_EL1_RES0 0xfc0fUL
+
+#define psb_csync()asm volatile("hint #17" : : : "memory")
+
 struct spe {
uint32_t intid;
int min_interval;
@@ -53,6 +93,15 @@ struct spe {
 };
 static struct spe spe;
 
+struct spe_buffer {
+   uint64_t base;
+   uint64_t limit;
+};
+static struct spe_buffer buffer;
+
+static volatile bool pmbirq_asserted, reassert_irq;
+static volatile uint64_t irq_pmbsr;
+
 static int spe_min_interval(uint8_t interval)
 {
switch (interval) {
@@ -131,6 +180,273 @@ static bool spe_probe(void)
return true;
 }
 
+/*
+ * Resets and starts a profiling session. Must be called with sampling and
+ * buffer disabled.
+ */
+static void spe_reset_and_start(struct spe_buffer *spe_buffer)
+{
+   uint64_t pmscr;
+
+   assert(spe_buffer->base);
+   assert(spe_buffer->limit > spe_buffer->base);
+
+   write_sysreg_s(spe_buffer->base, SYS_PMBPTR_EL1);
+   /* Change the buffer pointer before PMBLIMITR_EL1. */
+   isb();
+
+   write_sysreg_s(spe_buffer->limit | BIT(SYS_PMBLIMITR_EL1_E_SHIFT),
+  SYS_PMBLIMITR_EL1);
+   write_sysreg_s(0, SYS_PMBSR_EL1);
+   write_sysreg_s(0, SYS_PMSICR_EL1);
+   /* PMSICR_EL1 must be written to zero before a new sampling session. */
+   isb();
+
+   pmscr = BIT(SYS_PMSCR_EL1_E1SPE_SHIFT) |
+   BIT(SYS_PMSCR_EL1_PA_SHIFT) |
+   BIT(SYS_PMSCR_EL1_TS_SHIFT);
+   write_sysreg_s(pmscr, SYS_PMSCR_EL1);
+   isb();
+}
+
+static void spe_stop_and_drain(void)
+{
+   write_sysreg_s(0, SYS_PMSCR_EL1);
+   isb();
+
+   psb_csync();
+   dsb(nsh);
+   write_sysreg_s(0, SYS_PMBLIMITR_EL1);
+   isb();
+}
+
+static void spe_irq_handler(struct pt_regs *regs)
+{
+   uint32_t intid = gic_read_iar();
+
+   spe_stop_and_drain();
+
+   irq_pmbsr = read_sysreg_s(SYS_PMBSR_EL1);
+
+   if (intid != PPI(spe.intid)) {
+   report_info("Unexpected interrupt: %u", intid);
+   /*
+* When we get the interrupt, at least one bit, PMBSR_EL1.S,
+* will be set. We use the value zero to signal an error.
+*/
+   irq_pmbsr = 0;
+   goto out;
+   }
+
+   if (irq_pmbsr && reassert_irq) {
+   /*
+* Don't clear the service bit now, but ack the interrupt so it
+* can be handled again.
+*/
+   gic_write_eoir(intid);
+   reassert_irq = false;
+   irq_pmbsr = 0;
+   return;
+   }
+
+out:
+   write_sysreg_s(0, SYS_PMBSR_EL1);
+   /* Clear PMBSR_EL1 before EOI'ing the interrupt */
+   isb();
+   

[kvm-unit-tests RFC PATCH v2 2/5] lib/{bitops, alloc_page}.h: Add missing headers

2020-10-27 Thread Alexandru Elisei
bitops.h uses the 'bool' and 'size_t' types, but doesn't include the
stddef.h and stdbool.h headers, where the types are defined. This can cause
the following error when compiling:

In file included from arm/new-test.c:9:
/path/to/kvm-unit-tests/lib/bitops.h:77:15: error: unknown type name 'bool'
   77 | static inline bool is_power_of_2(unsigned long n)
  |   ^~~~
/path/to/kvm-unit-tests/lib/bitops.h:82:38: error: unknown type name 'size_t'
   82 | static inline unsigned int get_order(size_t size)
  |  ^~
/path/to/kvm-unit-tests/lib/bitops.h:24:1: note: 'size_t' is defined in header 
'<stddef.h>'; did you forget to '#include <stddef.h>'?
   23 | #include 
  +++ |+#include <stddef.h>
   24 |
make: *** [: arm/new-test.o] Error 1

The same errors were observed when including alloc_page.h. Fix both files
by including stddef.h and stdbool.h.

Signed-off-by: Alexandru Elisei 
---
 lib/alloc_page.h | 2 ++
 lib/bitops.h | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/lib/alloc_page.h b/lib/alloc_page.h
index 88540d1def06..182862c43363 100644
--- a/lib/alloc_page.h
+++ b/lib/alloc_page.h
@@ -4,6 +4,8 @@
  * This is a simple allocator that provides contiguous physical addresses
  * with byte granularity.
  */
+#include 
+#include 
 
 #ifndef ALLOC_PAGE_H
 #define ALLOC_PAGE_H 1
diff --git a/lib/bitops.h b/lib/bitops.h
index 308aa86514a8..5aeea0b998b1 100644
--- a/lib/bitops.h
+++ b/lib/bitops.h
@@ -1,5 +1,7 @@
 #ifndef _BITOPS_H_
 #define _BITOPS_H_
+#include 
+#include 
 
 /*
  * Adapted from
-- 
2.29.1

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[kvm-unit-tests RFC PATCH v2 4/5] lib: arm/arm64: Add function to unmap a page

2020-10-27 Thread Alexandru Elisei
Being able to cause a stage 1 data abort might be useful for future tests.
Add a function that unmaps a page from the translation tables.

Signed-off-by: Alexandru Elisei 
---
 lib/arm/asm/mmu-api.h |  1 +
 lib/arm/mmu.c | 32 
 2 files changed, 33 insertions(+)

diff --git a/lib/arm/asm/mmu-api.h b/lib/arm/asm/mmu-api.h
index 2bbe1faea900..305f77c6501f 100644
--- a/lib/arm/asm/mmu-api.h
+++ b/lib/arm/asm/mmu-api.h
@@ -23,4 +23,5 @@ extern void mmu_set_range_ptes(pgd_t *pgtable, uintptr_t 
virt_offset,
   phys_addr_t phys_start, phys_addr_t phys_end,
   pgprot_t prot);
 extern void mmu_clear_user(pgd_t *pgtable, unsigned long vaddr);
+extern void mmu_unmap_page(pgd_t *pgtable, unsigned long vaddr);
 #endif
diff --git a/lib/arm/mmu.c b/lib/arm/mmu.c
index 540a1e842d5b..72ac0be8d146 100644
--- a/lib/arm/mmu.c
+++ b/lib/arm/mmu.c
@@ -232,3 +232,35 @@ void mmu_clear_user(pgd_t *pgtable, unsigned long vaddr)
 out_flush_tlb:
flush_tlb_page(vaddr);
 }
+
+void mmu_unmap_page(pgd_t *pgtable, unsigned long vaddr)
+{
+   pgd_t *pgd;
+   pmd_t *pmd;
+   pte_t *pte;
+
+   if (!mmu_enabled())
+   return;
+
+   pgd = pgd_offset(pgtable, vaddr);
+   if (!pgd_valid(*pgd))
+   return;
+
+   pmd = pmd_offset(pgd, vaddr);
+   if (!pmd_valid(*pmd))
+   return;
+
+   if (pmd_huge(*pmd)) {
+   WRITE_ONCE(*pmd, 0);
+   goto out_flush_tlb;
+   } else {
+   pte = pte_offset(pmd, vaddr);
+   if (!pte_valid(*pte))
+   return;
+   WRITE_ONCE(*pte, 0);
+   goto out_flush_tlb;
+   }
+
+out_flush_tlb:
+   flush_tlb_page(vaddr);
+}
-- 
2.29.1

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[RFC PATCH kvmtool v3 3/3] arm64: Add SPE support

2020-10-27 Thread Alexandru Elisei
From: Sudeep Holla 

Add a runtime configuration option for kvmtool to enable SPE (the Statistical
Profiling Extension) for each VCPU and to create a corresponding DT node.
SPE is enabled at runtime with the --spe option.
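
For example (an illustrative invocation only, with the other guest options
elided):

  $ lkvm run -k Image -c 2 -m 512 --spe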

[ Andrew M: Add SPE to init features ]
[ Alexandru E: Reworded commit, renamed spev1->spe to match kernel, added
KVM_ARM_VM_SPE_FINALIZE, set VCPU feature only if requested by
user ]

Signed-off-by: Sudeep Holla 
Signed-off-by: Andrew Murray 
Signed-off-by: Alexandru Elisei 
---
 Makefile  |   2 +-
 arm/aarch64/arm-cpu.c |   2 +
 arm/aarch64/include/kvm/kvm-config-arch.h |   2 +
 arm/aarch64/include/kvm/kvm-cpu-arch.h|   3 +-
 arm/aarch64/kvm-cpu.c |   5 +
 arm/include/arm-common/kvm-config-arch.h  |   1 +
 arm/include/arm-common/spe.h  |   7 +
 arm/spe.c | 154 ++
 8 files changed, 174 insertions(+), 2 deletions(-)
 create mode 100644 arm/include/arm-common/spe.h
 create mode 100644 arm/spe.c

diff --git a/Makefile b/Makefile
index c465a491cf7e..9bfae78f0171 100644
--- a/Makefile
+++ b/Makefile
@@ -158,7 +158,7 @@ endif
 # ARM
 OBJS_ARM_COMMON:= arm/fdt.o arm/gic.o arm/gicv2m.o 
arm/ioport.o \
   arm/kvm.o arm/kvm-cpu.o arm/pci.o arm/timer.o \
-  arm/pmu.o
+  arm/pmu.o arm/spe.o
 HDRS_ARM_COMMON:= arm/include
 ifeq ($(ARCH), arm)
DEFINES += -DCONFIG_ARM
diff --git a/arm/aarch64/arm-cpu.c b/arm/aarch64/arm-cpu.c
index d7572b7790b1..6ccea033f361 100644
--- a/arm/aarch64/arm-cpu.c
+++ b/arm/aarch64/arm-cpu.c
@@ -6,6 +6,7 @@
 #include "arm-common/gic.h"
 #include "arm-common/timer.h"
 #include "arm-common/pmu.h"
+#include "arm-common/spe.h"
 
 #include 
 #include 
@@ -17,6 +18,7 @@ static void generate_fdt_nodes(void *fdt, struct kvm *kvm)
gic__generate_fdt_nodes(fdt, kvm->cfg.arch.irqchip);
timer__generate_fdt_nodes(fdt, kvm, timer_interrupts);
pmu__generate_fdt_nodes(fdt, kvm);
+   spe__generate_fdt_nodes(fdt, kvm);
 }
 
 static int arm_cpu__vcpu_init(struct kvm_cpu *vcpu)
diff --git a/arm/aarch64/include/kvm/kvm-config-arch.h 
b/arm/aarch64/include/kvm/kvm-config-arch.h
index 04be43dfa9b2..9f618cd9d2c1 100644
--- a/arm/aarch64/include/kvm/kvm-config-arch.h
+++ b/arm/aarch64/include/kvm/kvm-config-arch.h
@@ -6,6 +6,8 @@
"Run AArch32 guest"),   \
OPT_BOOLEAN('\0', "pmu", &(cfg)->has_pmuv3, \
"Create PMUv3 device"), \
+   OPT_BOOLEAN('\0', "spe", &(cfg)->has_spe,   \
+   "Create SPE device"),   \
OPT_U64('\0', "kaslr-seed", &(cfg)->kaslr_seed, \
"Specify random seed for Kernel Address Space " \
"Layout Randomization (KASLR)"),
diff --git a/arm/aarch64/include/kvm/kvm-cpu-arch.h 
b/arm/aarch64/include/kvm/kvm-cpu-arch.h
index 8dfb82ecbc37..6868f2f66040 100644
--- a/arm/aarch64/include/kvm/kvm-cpu-arch.h
+++ b/arm/aarch64/include/kvm/kvm-cpu-arch.h
@@ -8,7 +8,8 @@
 #define ARM_VCPU_FEATURE_FLAGS(kvm, cpuid) {   
\
[0] = ((!!(cpuid) << KVM_ARM_VCPU_POWER_OFF) |  
\
   (!!(kvm)->cfg.arch.aarch32_guest << KVM_ARM_VCPU_EL1_32BIT) |
\
-  (!!(kvm)->cfg.arch.has_pmuv3 << KVM_ARM_VCPU_PMU_V3))
\
+  (!!(kvm)->cfg.arch.has_pmuv3 << KVM_ARM_VCPU_PMU_V3) |   
\
+  (!!(kvm)->cfg.arch.has_spe << KVM_ARM_VCPU_SPE)) 
\
 }
 
 #define ARM_MPIDR_HWID_BITMASK 0xFF00FFUL
diff --git a/arm/aarch64/kvm-cpu.c b/arm/aarch64/kvm-cpu.c
index 9f3e8586880c..9b67c5f1d2e2 100644
--- a/arm/aarch64/kvm-cpu.c
+++ b/arm/aarch64/kvm-cpu.c
@@ -140,6 +140,11 @@ void kvm_cpu__select_features(struct kvm *kvm, struct 
kvm_vcpu_init *init)
/* Enable SVE if available */
if (kvm__supports_extension(kvm, KVM_CAP_ARM_SVE))
init->features[0] |= 1UL << KVM_ARM_VCPU_SVE;
+
+   /* Enable SPE if requested */
+   if (kvm->cfg.arch.has_spe &&
+   kvm__supports_extension(kvm, KVM_CAP_ARM_SPE))
+   init->features[0] |= 1UL << KVM_ARM_VCPU_SPE;
 }
 
 int kvm_cpu__configure_features(struct kvm_cpu *vcpu)
diff --git a/arm/include/arm-common/kvm-config-arch.h 
b/arm/include/arm-common/kvm-config-arch.h
index 5734c46ab9e6..08d8bfd3f7e0 100644
--- a/arm/include/arm-common/kvm-config-arch.h
+++ b/arm/include/arm-common/kvm-config-arch.h
@@ -9,6 +9,7 @@ struct kvm_config_arch {
boolvirtio_trans_pci;
boolaarch32_guest;
boolhas_pmuv3;
+   boolhas_spe;
u64 kaslr_seed;
enum irqchip_type irqchip;

[RFC PATCH kvmtool v3 1/3] update_headers: Sync kvm UAPI headers with linux 5.10-rc1

2020-10-27 Thread Alexandru Elisei
From: Sudeep Holla 

The local copies of the kvm user API headers are getting stale.  In
preparation for some arch-specific updates, this patch reflects a re-run of
util/update_headers.sh to pull in upstream updates from linux v5.10-rc1.

[ Alexandru E: Updated headers to Linux tag v5.10-rc1 ]

Signed-off-by: Sudeep Holla 
Signed-off-by: Alexandru Elisei 
---
 arm/aarch64/include/asm/kvm.h |  53 +--
 include/linux/kvm.h   | 117 --
 powerpc/include/asm/kvm.h |   8 +++
 x86/include/asm/kvm.h |  42 +++-
 4 files changed, 209 insertions(+), 11 deletions(-)

diff --git a/arm/aarch64/include/asm/kvm.h b/arm/aarch64/include/asm/kvm.h
index 9a507716ae2f..8876e564ba56 100644
--- a/arm/aarch64/include/asm/kvm.h
+++ b/arm/aarch64/include/asm/kvm.h
@@ -106,6 +106,7 @@ struct kvm_regs {
 #define KVM_ARM_VCPU_SVE   4 /* enable SVE for this CPU */
 #define KVM_ARM_VCPU_PTRAUTH_ADDRESS   5 /* VCPU uses address authentication */
 #define KVM_ARM_VCPU_PTRAUTH_GENERIC   6 /* VCPU uses generic authentication */
+#define KVM_ARM_VCPU_SPE   7 /* Enable SPE for this CPU */
 
 struct kvm_vcpu_init {
__u32 target;
@@ -159,13 +160,29 @@ struct kvm_sync_regs {
 struct kvm_arch_memory_slot {
 };
 
+/*
+ * PMU filter structure. Describe a range of events with a particular
+ * action. To be used with KVM_ARM_VCPU_PMU_V3_FILTER.
+ */
+struct kvm_pmu_event_filter {
+   __u16   base_event;
+   __u16   nevents;
+
+#define KVM_PMU_EVENT_ALLOW0
+#define KVM_PMU_EVENT_DENY 1
+
+   __u8action;
+   __u8pad[3];
+};
+
 /* for KVM_GET/SET_VCPU_EVENTS */
 struct kvm_vcpu_events {
struct {
__u8 serror_pending;
__u8 serror_has_esr;
+   __u8 ext_dabt_pending;
/* Align it to 8 bytes */
-   __u8 pad[6];
+   __u8 pad[5];
__u64 serror_esr;
} exception;
__u32 reserved[12];
@@ -219,10 +236,18 @@ struct kvm_vcpu_events {
 #define KVM_REG_ARM_PTIMER_CVALARM64_SYS_REG(3, 3, 14, 2, 2)
 #define KVM_REG_ARM_PTIMER_CNT ARM64_SYS_REG(3, 3, 14, 0, 1)
 
-/* EL0 Virtual Timer Registers */
+/*
+ * EL0 Virtual Timer Registers
+ *
+ * WARNING:
+ *  KVM_REG_ARM_TIMER_CVAL and KVM_REG_ARM_TIMER_CNT are not defined
+ *  with the appropriate register encodings.  Their values have been
+ *  accidentally swapped.  As this is set API, the definitions here
+ *  must be used, rather than ones derived from the encodings.
+ */
 #define KVM_REG_ARM_TIMER_CTL  ARM64_SYS_REG(3, 3, 14, 3, 1)
-#define KVM_REG_ARM_TIMER_CNT  ARM64_SYS_REG(3, 3, 14, 3, 2)
 #define KVM_REG_ARM_TIMER_CVAL ARM64_SYS_REG(3, 3, 14, 0, 2)
+#define KVM_REG_ARM_TIMER_CNT  ARM64_SYS_REG(3, 3, 14, 3, 2)
 
 /* KVM-as-firmware specific pseudo-registers */
 #define KVM_REG_ARM_FW (0x0014 << KVM_REG_ARM_COPROC_SHIFT)
@@ -233,6 +258,15 @@ struct kvm_vcpu_events {
 #define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_NOT_AVAIL  0
 #define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_AVAIL  1
 #define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_NOT_REQUIRED   2
+
+/*
+ * Only two states can be presented by the host kernel:
+ * - NOT_REQUIRED: the guest doesn't need to do anything
+ * - NOT_AVAIL: the guest isn't mitigated (it can still use SSBS if available)
+ *
+ * All the other values are deprecated. The host still accepts all
+ * values (they are ABI), but will narrow them to the above two.
+ */
 #define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2KVM_REG_ARM_FW_REG(2)
 #define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_AVAIL  0
 #define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_UNKNOWN1
@@ -316,17 +350,28 @@ struct kvm_vcpu_events {
 #define   KVM_DEV_ARM_VGIC_SAVE_PENDING_TABLES 3
 #define   KVM_DEV_ARM_ITS_CTRL_RESET   4
 
+#define KVM_ARM_VM_SPE_CTRL0
+#define   KVM_ARM_VM_SPE_FINALIZE  0
+
 /* Device Control API on vcpu fd */
 #define KVM_ARM_VCPU_PMU_V3_CTRL   0
 #define   KVM_ARM_VCPU_PMU_V3_IRQ  0
 #define   KVM_ARM_VCPU_PMU_V3_INIT 1
+#define   KVM_ARM_VCPU_PMU_V3_FILTER   2
 #define KVM_ARM_VCPU_TIMER_CTRL1
 #define   KVM_ARM_VCPU_TIMER_IRQ_VTIMER0
 #define   KVM_ARM_VCPU_TIMER_IRQ_PTIMER1
+#define KVM_ARM_VCPU_PVTIME_CTRL   2
+#define   KVM_ARM_VCPU_PVTIME_IPA  0
+#define KVM_ARM_VCPU_SPE_CTRL  3
+#define   KVM_ARM_VCPU_SPE_IRQ 0
+#define   KVM_ARM_VCPU_SPE_INIT1
 
 /* KVM_IRQ_LINE irq field index values */
+#define KVM_ARM_IRQ_VCPU2_SHIFT28
+#define KVM_ARM_IRQ_VCPU2_MASK 0xf
 #define KVM_ARM_IRQ_TYPE_SHIFT 24
-#define KVM_ARM_IRQ_TYPE_MASK  0xff
+#define KVM_ARM_IRQ_TYPE_MASK  0xf
 #define KVM_ARM_IRQ_VCPU_SHIFT 16
 #define KVM_ARM_IRQ_VCPU_MASK  0xff
 #define 

[RFC PATCH kvmtool v3 0/3] SPE support

2020-10-27 Thread Alexandru Elisei
This series adds userspace support for creating a guest which can use SPE.
It requires KVM SPE support, which is still in the RFC phase, so this
series is also sent as RFC. The kvmtool patches can also be found at [1], and the
KVM SPE patches can be found at [2].

To create a guest with SPE support the following steps must be executed:

1. Set the SPE virtual interrupt ID and then initialize the features on
every VCPU (a rough sketch of this step follows the list).

2. After the guest memory memslots have been created, kvmtool must mlock()
the VMAs backing the memslots.

3. After everything has been copied to the guest's memory, kvmtool must
execute the KVM_ARM_VM_SPE_CTRL(KVM_ARM_VM_SPE_FINALIZE) on the VM fd.
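
As referenced in step 1, a rough sketch of the per-VCPU setup is shown below.
The names and error handling are illustrative, and passing the interrupt
number by pointer mirrors what is done for the PMUv3 attribute, which is an
assumption on my part rather than something taken from the series:

#include <sys/ioctl.h>
#include <err.h>
#include <linux/kvm.h>

static void vcpu_spe_setup(int vcpu_fd, int spe_ppi)
{
	struct kvm_device_attr irq_attr = {
		.group	= KVM_ARM_VCPU_SPE_CTRL,
		.attr	= KVM_ARM_VCPU_SPE_IRQ,
		.addr	= (__u64)(unsigned long)&spe_ppi,
	};
	struct kvm_device_attr init_attr = {
		.group	= KVM_ARM_VCPU_SPE_CTRL,
		.attr	= KVM_ARM_VCPU_SPE_INIT,
	};

	/* Set the SPE virtual interrupt, then initialize the feature. */
	if (ioctl(vcpu_fd, KVM_SET_DEVICE_ATTR, &irq_attr) ||
	    ioctl(vcpu_fd, KVM_SET_DEVICE_ATTR, &init_attr))
		err(1, "SPE VCPU setup");
}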

The first patch is a simple update to the Linux headers; the second patch
adds a new init list that executes last, which is necessary to make sure the
guest memory will not be touched after that; and the third patch contains
the actual SPE support.

[1] https://gitlab.arm.com/linux-arm/kvmtool-ae/-/tree/kvm-spe-v3
[2] https://gitlab.arm.com/linux-arm/linux-ae/-/tree/kvm-spe-v3

Alexandru Elisei (1):
  init: Add last_{init, exit} list macros

Sudeep Holla (2):
  update_headers: Sync kvm UAPI headers with linux 5.10-rc1
  arm64: Add SPE support

 Makefile  |   2 +-
 arm/aarch64/arm-cpu.c |   2 +
 arm/aarch64/include/asm/kvm.h |  53 +++-
 arm/aarch64/include/kvm/kvm-config-arch.h |   2 +
 arm/aarch64/include/kvm/kvm-cpu-arch.h|   3 +-
 arm/aarch64/kvm-cpu.c |   5 +
 arm/include/arm-common/kvm-config-arch.h  |   1 +
 arm/include/arm-common/spe.h  |   7 +
 arm/spe.c | 154 ++
 include/kvm/util-init.h   |   6 +-
 include/linux/kvm.h   | 117 +++-
 powerpc/include/asm/kvm.h |   8 ++
 x86/include/asm/kvm.h |  42 +-
 13 files changed, 387 insertions(+), 15 deletions(-)
 create mode 100644 arm/include/arm-common/spe.h
 create mode 100644 arm/spe.c

-- 
2.29.1

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[RFC PATCH kvmtool v3 2/3] init: Add last_{init, exit} list macros

2020-10-27 Thread Alexandru Elisei
Add a last_init macro for constructor functions that will be executed last
in the initialization process. Add a symmetrical macro, last_exit, for
destructor functions that will be the last to be executed when kvmtool
exits.
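
For instance, a hypothetical consumer (not part of this series) would register
a callback like this:

/* Hypothetical example, not taken from the SPE patch. */
static int spe__late_setup(struct kvm *kvm)
{
	/* Runs after every other init level, once guest memory is populated. */
	return 0;
}
last_init(spe__late_setup);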

The list priority for the late_{init, exit} macros has been bumped down a
spot, but their relative priority remains unchanged, to keep the same size
for the init_lists and exit_lists.

Signed-off-by: Alexandru Elisei 
---
 include/kvm/util-init.h | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/kvm/util-init.h b/include/kvm/util-init.h
index 13d4f04df678..e6a0e1696689 100644
--- a/include/kvm/util-init.h
+++ b/include/kvm/util-init.h
@@ -39,7 +39,8 @@ static void __attribute__ ((constructor)) __init__##cb(void)  
\
 #define dev_init(cb) __init_list_add(cb, 5)
 #define virtio_dev_init(cb) __init_list_add(cb, 6)
 #define firmware_init(cb) __init_list_add(cb, 7)
-#define late_init(cb) __init_list_add(cb, 9)
+#define late_init(cb) __init_list_add(cb, 8)
+#define last_init(cb) __init_list_add(cb, 9)
 
 #define core_exit(cb) __exit_list_add(cb, 0)
 #define base_exit(cb) __exit_list_add(cb, 2)
@@ -47,5 +48,6 @@ static void __attribute__ ((constructor)) __init__##cb(void)  
\
 #define dev_exit(cb) __exit_list_add(cb, 5)
 #define virtio_dev_exit(cb) __exit_list_add(cb, 6)
 #define firmware_exit(cb) __exit_list_add(cb, 7)
-#define late_exit(cb) __exit_list_add(cb, 9)
+#define late_exit(cb) __exit_list_add(cb, 8)
+#define last_exit(cb) __exit_list_add(cb, 9)
 #endif
-- 
2.29.1

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 07/11] KVM: arm64: Inject AArch64 exceptions from HYP

2020-10-27 Thread Marc Zyngier

On 2020-10-26 14:22, Mark Rutland wrote:

On Mon, Oct 26, 2020 at 01:34:46PM +, Marc Zyngier wrote:

Move the AArch64 exception injection code from EL1 to HYP, leaving
only the ESR_EL1 updates to EL1. In order to cope with the differences
between VHE and nVHE, two sets of system register accessors are 
provided.


SPSR, ELR, PC and PSTATE are now completely handled in the hypervisor.

Signed-off-by: Marc Zyngier 



 void kvm_inject_exception(struct kvm_vcpu *vcpu)
 {
+   switch (vcpu->arch.flags & KVM_ARM64_EXCEPT_MASK) {
+   case KVM_ARM64_EXCEPT_AA64_EL1_SYNC:
+   enter_exception64(vcpu, PSR_MODE_EL1h, except_type_sync);
+   break;
+   case KVM_ARM64_EXCEPT_AA64_EL1_IRQ:
+   enter_exception64(vcpu, PSR_MODE_EL1h, except_type_irq);
+   break;
+   case KVM_ARM64_EXCEPT_AA64_EL1_FIQ:
+   enter_exception64(vcpu, PSR_MODE_EL1h, except_type_fiq);
+   break;
+   case KVM_ARM64_EXCEPT_AA64_EL1_SERR:
+   enter_exception64(vcpu, PSR_MODE_EL1h, except_type_serror);
+   break;
+   default:
+   /* EL2 are unimplemented until we get NV. One day. */
+   break;
+   }
 }


Huh, we're going to allow EL1 to inject IRQ/FIQ/SERROR *exceptions*
directly, rather than pending those via HCR_EL2.{VI,VF,VSE}? We never
used to have code to do that.


True, and I feel like I got carried away while thinking of NV.
Though James had some "interesting" use case [1] lately...

If we're going to support that we'll need to check against the DAIF 
bits

to make sure we don't inject an exception that can't be architecturally
taken.


Nah, forget it. Unless we really need to implement something like James'
idea, I'd rather drop this altogether.


I guess we'll tighten that up along with the synchronous exception
checks, but given those three cases aren't needed today it might be
worth removing them from the switch for now and/or adding a comment to
that effect.


Agreed.

M.

[1] https://lore.kernel.org/r/20201023165108.15061-1-james.mo...@arm.com
--
Jazz is not dead. It just smells funny...
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 03/11] KVM: arm64: Make kvm_skip_instr() and co private to HYP

2020-10-27 Thread Marc Zyngier

On 2020-10-26 14:04, Mark Rutland wrote:

On Mon, Oct 26, 2020 at 01:34:42PM +, Marc Zyngier wrote:

In an effort to remove the vcpu PC manipulations from EL1 on nVHE
systems, move kvm_skip_instr() to be HYP-specific. EL1's intent
to increment PC post emulation is now signalled via a flag in the
vcpu structure.

Signed-off-by: Marc Zyngier 


[...]


+/*
+ * Adjust the guest PC on entry, depending on flags provided by EL1
+ * for the purpose of emulation (MMIO, sysreg).
+ */
+static inline void __adjust_pc(struct kvm_vcpu *vcpu)
+{
+   if (vcpu->arch.flags & KVM_ARM64_INCREMENT_PC) {
+   kvm_skip_instr(vcpu);
+   vcpu->arch.flags &= ~KVM_ARM64_INCREMENT_PC;
+   }
+}


What's your plan for restricting *when* EL1 can ask for the PC to be
adjusted?

I'm assuming that either:

1. You have EL2 sanity-check all responses from EL1 are permitted for
   the current state. e.g. if EL1 asks to increment the PC, EL2 must
   check that that was a sane response for the current state.

2. You raise the level of abstraction at the EL2/EL1 boundary, such 
that

   EL2 simply knows. e.g. if emulating a memory access, EL1 can either
   provide the response or signal an abort, but doesn't choose to
   manipulate the PC as EL2 will infer the right thing to do.

I know that either are tricky in practice, so I'm curious what your 
view

is. Generally option #2 is easier to fortify, but I guess we might have
to do #1 since we also have to support unprotected VMs?


To be honest, I'm still in two minds about it, which is why I have
gone with this "middle of the road" option (moving the PC update
to EL2, but leaving the control at EL1).

I guess the answer is "it depends". MMIO is easy to put in the #2 model,
while things like WFI/WFE really need #1. Sysregs are yet another can of
worms.

M.
--
Jazz is not dead. It just smells funny...
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 04/11] KVM: arm64: Move PC rollback on SError to HYP

2020-10-27 Thread Marc Zyngier

On 2020-10-27 14:56, James Morse wrote:

Hi Marc,

On 26/10/2020 13:34, Marc Zyngier wrote:
Instead of handling the "PC rollback on SError during HVC" at EL1 
(which

requires disclosing PC to a potentially untrusted kernel), let's move
this fixup to ... fixup_guest_exit(), which is where we do all fixups.


diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h 
b/arch/arm64/kvm/hyp/include/hyp/switch.h

index d687e574cde5..668f02c7b0b3 100644
--- a/arch/arm64/kvm/hyp/include/hyp/switch.h
+++ b/arch/arm64/kvm/hyp/include/hyp/switch.h
@@ -411,6 +411,21 @@ static inline bool fixup_guest_exit(struct 
kvm_vcpu *vcpu, u64 *exit_code)

if (ARM_EXCEPTION_CODE(*exit_code) != ARM_EXCEPTION_IRQ)
vcpu->arch.fault.esr_el2 = read_sysreg_el2(SYS_ESR);

+   if (ARM_SERROR_PENDING(*exit_code)) {
+   u8 esr_ec = kvm_vcpu_trap_get_class(vcpu);
+
+   /*
+* HVC already have an adjusted PC, which we need to
+* correct in order to return to after having injected
+* the SError.
+*
+* SMC, on the other hand, is *trapped*, meaning its
+* preferred return address is the SMC itself.
+*/
+   if (esr_ec == ESR_ELx_EC_HVC32 || esr_ec == ESR_ELx_EC_HVC64)
+   *vcpu_pc(vcpu) -= 4;


Isn't *vcpu_pc(vcpu) the PC of the previous entry for this vcpu?
It's not the PC of the exit until __sysreg_save_el2_return_state()
saves it, which happens just after fixup_guest_exit().


Hmmm. Good point. The move was obviously done in haste, thank you for
pointing out this blatant bug.


Mess with ELR_EL2 directly?


Yes, that's the best course of action. We never run this code anyway.
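Something along these lines, I expect (a completely untested sketch, reusing
the esr_ec check from the hunk quoted above):

	/* Untested sketch: roll back the return address for HVC directly. */
	if (esr_ec == ESR_ELx_EC_HVC32 || esr_ec == ESR_ELx_EC_HVC64)
		write_sysreg_el2(read_sysreg_el2(SYS_ELR) - 4, SYS_ELR);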

Thanks,

M.
--
Jazz is not dead. It just smells funny...
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 04/11] KVM: arm64: Move PC rollback on SError to HYP

2020-10-27 Thread James Morse
Hi Marc,

On 26/10/2020 13:34, Marc Zyngier wrote:
> Instead of handling the "PC rollback on SError during HVC" at EL1 (which
> requires disclosing PC to a potentially untrusted kernel), let's move
> this fixup to ... fixup_guest_exit(), which is where we do all fixups.

> diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h 
> b/arch/arm64/kvm/hyp/include/hyp/switch.h
> index d687e574cde5..668f02c7b0b3 100644
> --- a/arch/arm64/kvm/hyp/include/hyp/switch.h
> +++ b/arch/arm64/kvm/hyp/include/hyp/switch.h
> @@ -411,6 +411,21 @@ static inline bool fixup_guest_exit(struct kvm_vcpu 
> *vcpu, u64 *exit_code)
>   if (ARM_EXCEPTION_CODE(*exit_code) != ARM_EXCEPTION_IRQ)
>   vcpu->arch.fault.esr_el2 = read_sysreg_el2(SYS_ESR);
>  
> + if (ARM_SERROR_PENDING(*exit_code)) {
> + u8 esr_ec = kvm_vcpu_trap_get_class(vcpu);
> +
> + /*
> +  * HVC already have an adjusted PC, which we need to
> +  * correct in order to return to after having injected
> +  * the SError.
> +  *
> +  * SMC, on the other hand, is *trapped*, meaning its
> +  * preferred return address is the SMC itself.
> +  */
> + if (esr_ec == ESR_ELx_EC_HVC32 || esr_ec == ESR_ELx_EC_HVC64)
> + *vcpu_pc(vcpu) -= 4;

Isn't *vcpu_pc(vcpu) the PC of the previous entry for this vcpu? It's not 
the PC of the
exit until __sysreg_save_el2_return_state() saves it, which happens just after
fixup_guest_exit().

Mess with ELR_EL2 directly?


Thanks,

James

> + }
> +
>   /*
>* We're using the raw exception code in order to only process
>* the trap if no SError is pending. We will come back to the
> 

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v10 01/11] vfio: VFIO_IOMMU_SET_PASID_TABLE

2020-10-27 Thread Auger Eric
Hi Shameer,

On 10/27/20 1:20 PM, Shameerali Kolothum Thodi wrote:
> Hi Eric,
> 
>> -Original Message-
>> From: iommu [mailto:iommu-boun...@lists.linux-foundation.org] On Behalf Of
>> Auger Eric
>> Sent: 23 September 2020 12:47
>> To: yuzenghui ; eric.auger@gmail.com;
>> io...@lists.linux-foundation.org; linux-ker...@vger.kernel.org;
>> k...@vger.kernel.org; kvmarm@lists.cs.columbia.edu; j...@8bytes.org;
>> alex.william...@redhat.com; jacob.jun@linux.intel.com;
>> yi.l@intel.com; robin.mur...@arm.com
>> Subject: Re: [PATCH v10 01/11] vfio: VFIO_IOMMU_SET_PASID_TABLE
> 
> ...
> 
>>> Besides, before going through the whole series [1][2], I'd like to
>>> know if this is the latest version of your Nested-Stage-Setup work in
>>> case I had missed something.
>>>
>>> [1]
>>> https://lore.kernel.org/r/20200320161911.27494-1-eric.au...@redhat.com
>>> [2]
>>> https://lore.kernel.org/r/20200414150607.28488-1-eric.au...@redhat.com
>>
>> yes those 2 series are the last ones. Thank you for reviewing.
>>
>> FYI, I intend to respin within a week or 2 on top of Jacob's  [PATCH v10 0/7]
>> IOMMU user API enhancement. 
> 
> Thanks for that. Also is there any plan to respin the related Qemu series as 
> well?
> I know dual stage interface proposals are still under discussion, but it 
> would be
> nice to have a testable solution based on new interfaces for ARM64 as well.
> Happy to help with any tests or verifications.

Yes the QEMU series will be respinned as well. That's on the top of my
todo list right now.

Thanks

Eric
> 
> Please let me know.
> 
> Thanks,
> Shameer
>   
> 

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


RE: [PATCH v10 01/11] vfio: VFIO_IOMMU_SET_PASID_TABLE

2020-10-27 Thread Shameerali Kolothum Thodi
Hi Eric,

> -Original Message-
> From: iommu [mailto:iommu-boun...@lists.linux-foundation.org] On Behalf Of
> Auger Eric
> Sent: 23 September 2020 12:47
> To: yuzenghui ; eric.auger@gmail.com;
> io...@lists.linux-foundation.org; linux-ker...@vger.kernel.org;
> k...@vger.kernel.org; kvmarm@lists.cs.columbia.edu; j...@8bytes.org;
> alex.william...@redhat.com; jacob.jun@linux.intel.com;
> yi.l@intel.com; robin.mur...@arm.com
> Subject: Re: [PATCH v10 01/11] vfio: VFIO_IOMMU_SET_PASID_TABLE

...

> > Besides, before going through the whole series [1][2], I'd like to
> > know if this is the latest version of your Nested-Stage-Setup work in
> > case I had missed something.
> >
> > [1]
> > https://lore.kernel.org/r/20200320161911.27494-1-eric.au...@redhat.com
> > [2]
> > https://lore.kernel.org/r/20200414150607.28488-1-eric.au...@redhat.com
> 
> yes those 2 series are the last ones. Thank you for reviewing.
> 
> FYI, I intend to respin within a week or 2 on top of Jacob's  [PATCH v10 0/7]
> IOMMU user API enhancement. 

Thanks for that. Also is there any plan to respin the related Qemu series as 
well?
I know dual stage interface proposals are still under discussion, but it would 
be
nice to have a testable solution based on new interfaces for ARM64 as well.
Happy to help with any tests or verifications.

Please let me know.

Thanks,
Shameer
  

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 03/11] KVM: arm64: Make kvm_skip_instr() and co private to HYP

2020-10-27 Thread Marc Zyngier

On 2020-10-27 10:55, Suzuki K Poulose wrote:

On 10/26/20 1:34 PM, Marc Zyngier wrote:

In an effort to remove the vcpu PC manipulations from EL1 on nVHE
systems, move kvm_skip_instr() to be HYP-specific. EL1's intent
to increment PC post emulation is now signalled via a flag in the
vcpu structure.

Signed-off-by: Marc Zyngier 


[...]


+static inline void kvm_skip_instr(struct kvm_vcpu *vcpu)
+{
+   if (vcpu_mode_is_32bit(vcpu)) {
+   kvm_skip_instr32(vcpu);
+   } else {
+   *vcpu_pc(vcpu) += 4;
+   *vcpu_cpsr(vcpu) &= ~PSR_BTYPE_MASK;
+   }
+
+   /* advance the singlestep state machine */
+   *vcpu_cpsr(vcpu) &= ~DBG_SPSR_SS;
+}
+
+/*
+ * Skip an instruction which has been emulated at hyp while most 
guest sysregs

+ * are live.
+ */
+static inline void __kvm_skip_instr(struct kvm_vcpu *vcpu)
+{
+   *vcpu_pc(vcpu) = read_sysreg_el2(SYS_ELR);
+   vcpu_gp_regs(vcpu)->pstate = read_sysreg_el2(SYS_SPSR);
+
+   __kvm_skip_instr(vcpu);


Did you mean kvm_skip_instr() instead ?


Damn. How embarrassing! Yes, of course. I should have thrown my TX1 at 
it!


Thanks,

M.
--
Jazz is not dead. It just smells funny...
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 03/11] KVM: arm64: Make kvm_skip_instr() and co private to HYP

2020-10-27 Thread Suzuki K Poulose

On 10/26/20 1:34 PM, Marc Zyngier wrote:

In an effort to remove the vcpu PC manipulations from EL1 on nVHE
systems, move kvm_skip_instr() to be HYP-specific. EL1's intent
to increment PC post emulation is now signalled via a flag in the
vcpu structure.

Signed-off-by: Marc Zyngier 
---
  arch/arm64/include/asm/kvm_emulate.h   | 27 +--
  arch/arm64/include/asm/kvm_host.h  |  1 +
  arch/arm64/kvm/handle_exit.c   |  6 +--
  arch/arm64/kvm/hyp/include/hyp/adjust_pc.h | 56 ++
  arch/arm64/kvm/hyp/include/hyp/switch.h|  2 +
  arch/arm64/kvm/hyp/nvhe/switch.c   |  3 ++
  arch/arm64/kvm/hyp/vgic-v2-cpuif-proxy.c   |  2 +
  arch/arm64/kvm/hyp/vgic-v3-sr.c|  2 +
  arch/arm64/kvm/hyp/vhe/switch.c|  3 ++
  arch/arm64/kvm/mmio.c  |  2 +-
  arch/arm64/kvm/mmu.c   |  2 +-
  arch/arm64/kvm/sys_regs.c  |  2 +-
  12 files changed, 77 insertions(+), 31 deletions(-)
  create mode 100644 arch/arm64/kvm/hyp/include/hyp/adjust_pc.h

diff --git a/arch/arm64/include/asm/kvm_emulate.h 
b/arch/arm64/include/asm/kvm_emulate.h
index 0864f425547d..6d2b5d1aa7b3 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -472,32 +472,9 @@ static inline unsigned long vcpu_data_host_to_guest(struct 
kvm_vcpu *vcpu,
return data;/* Leave LE untouched */
  }
  
-static __always_inline void kvm_skip_instr(struct kvm_vcpu *vcpu)

+static __always_inline void kvm_incr_pc(struct kvm_vcpu *vcpu)
  {
-   if (vcpu_mode_is_32bit(vcpu)) {
-   kvm_skip_instr32(vcpu);
-   } else {
-   *vcpu_pc(vcpu) += 4;
-   *vcpu_cpsr(vcpu) &= ~PSR_BTYPE_MASK;
-   }
-
-   /* advance the singlestep state machine */
-   *vcpu_cpsr(vcpu) &= ~DBG_SPSR_SS;
-}
-
-/*
- * Skip an instruction which has been emulated at hyp while most guest sysregs
- * are live.
- */
-static __always_inline void __kvm_skip_instr(struct kvm_vcpu *vcpu)
-{
-   *vcpu_pc(vcpu) = read_sysreg_el2(SYS_ELR);
-   vcpu_gp_regs(vcpu)->pstate = read_sysreg_el2(SYS_SPSR);
-
-   kvm_skip_instr(vcpu);
-
-   write_sysreg_el2(vcpu_gp_regs(vcpu)->pstate, SYS_SPSR);
-   write_sysreg_el2(*vcpu_pc(vcpu), SYS_ELR);
+   vcpu->arch.flags |= KVM_ARM64_INCREMENT_PC;
  }
  
  #endif /* __ARM64_KVM_EMULATE_H__ */

diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index 0aecbab6a7fb..9a75de3ad8da 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -406,6 +406,7 @@ struct kvm_vcpu_arch {
  #define KVM_ARM64_GUEST_HAS_SVE   (1 << 5) /* SVE exposed to 
guest */
  #define KVM_ARM64_VCPU_SVE_FINALIZED  (1 << 6) /* SVE config completed */
  #define KVM_ARM64_GUEST_HAS_PTRAUTH   (1 << 7) /* PTRAUTH exposed to guest */
+#define KVM_ARM64_INCREMENT_PC (1 << 8) /* Increment PC */
  
  #define vcpu_has_sve(vcpu) (system_supports_sve() && \

((vcpu)->arch.flags & KVM_ARM64_GUEST_HAS_SVE))
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index 30bf8e22df54..d4e00a864ee6 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -61,7 +61,7 @@ static int handle_smc(struct kvm_vcpu *vcpu)
 * otherwise return to the same address...
 */
vcpu_set_reg(vcpu, 0, ~0UL);
-   kvm_skip_instr(vcpu);
+   kvm_incr_pc(vcpu);
return 1;
  }
  
@@ -100,7 +100,7 @@ static int kvm_handle_wfx(struct kvm_vcpu *vcpu)

kvm_clear_request(KVM_REQ_UNHALT, vcpu);
}
  
-	kvm_skip_instr(vcpu);

+   kvm_incr_pc(vcpu);
  
  	return 1;

  }
@@ -221,7 +221,7 @@ static int handle_trap_exceptions(struct kvm_vcpu *vcpu)
 * that fail their condition code check"
 */
if (!kvm_condition_valid(vcpu)) {
-   kvm_skip_instr(vcpu);
+   kvm_incr_pc(vcpu);
handled = 1;
} else {
exit_handle_fn exit_handler;
diff --git a/arch/arm64/kvm/hyp/include/hyp/adjust_pc.h 
b/arch/arm64/kvm/hyp/include/hyp/adjust_pc.h
new file mode 100644
index ..4ecaf5cb2633
--- /dev/null
+++ b/arch/arm64/kvm/hyp/include/hyp/adjust_pc.h
@@ -0,0 +1,56 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Guest PC manipulation helpers
+ *
+ * Copyright (C) 2012,2013 - ARM Ltd
+ * Copyright (C) 2020 - Google LLC
+ * Author: Marc Zyngier 
+ */
+
+#ifndef __ARM64_KVM_HYP_ADJUST_PC_H__
+#define __ARM64_KVM_HYP_ADJUST_PC_H__
+
+#include 
+#include 
+
+static inline void kvm_skip_instr(struct kvm_vcpu *vcpu)
+{
+   if (vcpu_mode_is_32bit(vcpu)) {
+   kvm_skip_instr32(vcpu);
+   } else {
+   *vcpu_pc(vcpu) += 4;
+   *vcpu_cpsr(vcpu) &= ~PSR_BTYPE_MASK;
+   }
+
+   /* advance the singlestep state machine */
+   *vcpu_cpsr(vcpu) &= ~DBG_SPSR_SS;
+}
+
+/*
+ * Skip 

Re: [PATCH] KVM: arm64: Allocate stage-2 pgd pages with GFP_KERNEL_ACCOUNT

2020-10-27 Thread Will Deacon
On Tue, Oct 27, 2020 at 10:41:33AM +1100, Gavin Shan wrote:
> On 10/27/20 1:44 AM, Will Deacon wrote:
> > For consistency with the rest of the stage-2 page-table page allocations
> > (performed using a kvm_mmu_memory_cache), ensure that __GFP_ACCOUNT is
> > included in the GFP flags for the PGD pages.
> > 
> > Cc: Marc Zyngier 
> > Cc: Quentin Perret 
> > Signed-off-by: Will Deacon 
> > ---
> >   arch/arm64/kvm/hyp/pgtable.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> 
> The patch itself looks good to me:
> 
> Reviewed-by: Gavin Shan 
> 
> Another question is why the page-table pages for hyp mode aren't
> allocated with __GFP_ACCOUNT in kvm_pgtable_hyp_init and hyp_map_walker()?
> The page-table pages for host or guest are allocated with GFP_PGTABLE_USER
> in alloc_pte_one().
> 
> #define GFP_PGTABLE_USER  (GFP_PGTABLE_KERNEL | __GFP_ACCOUNT)
> #define GFP_PGTABLE_KERNEL(GFP_KERNEL | __GFP_ZERO)

I think because the guest pages are allocated as a direct result of the VMM,
whereas I tend to think of the hyp page-tables more like kernel page-tables
(which aren't accounted afaik: see GFP_PGTABLE_USER vs GFP_PGTABLE_KERNEL).
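
To illustrate the distinction with a trivial snippet (hypothetical, not a hunk
from this patch):

	/* Charged to the caller's memcg: what guest stage-2 pages use. */
	unsigned long user_pt = __get_free_page(GFP_PGTABLE_USER);

	/* Not charged: the treatment kernel (and hyp) page tables get. */
	unsigned long kernel_pt = __get_free_page(GFP_PGTABLE_KERNEL);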

Will
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH] KVM: arm64: Allocate stage-2 pgd pages with GFP_KERNEL_ACCOUNT

2020-10-27 Thread Marc Zyngier

On 2020-10-26 23:41, Gavin Shan wrote:

Hi Will,

On 10/27/20 1:44 AM, Will Deacon wrote:
For consistency with the rest of the stage-2 page-table page 
allocations
(performed using a kvm_mmu_memory_cache), ensure that __GFP_ACCOUNT 
is

included in the GFP flags for the PGD pages.

Cc: Marc Zyngier 
Cc: Quentin Perret 
Signed-off-by: Will Deacon 
---
  arch/arm64/kvm/hyp/pgtable.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)



The patch itself looks good to me:

Reviewed-by: Gavin Shan 

Another question is why the page-table pages for hyp mode aren't
allocated with __GFP_ACCOUNT in kvm_pgtable_hyp_init and 
hyp_map_walker()?


Which user task would you account the hypervisor mappings to? The page 
tables

used for HYP code and data are definitely not attributable to any task.

The kvm and kvm_vcpu mappings *could* be attributed to a user task, but
the page tables are likely shared with other tasks. So who gets the 
blame?


M.
--
Jazz is not dead. It just smells funny...
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm