Re: [PATCH 02/10] arm64: KVM: Add invalidate_icache_range helper

2017-10-19 Thread Will Deacon
On Mon, Oct 09, 2017 at 04:20:24PM +0100, Marc Zyngier wrote:
> We currently tightly couple dcache clean with icache invalidation,
> but KVM could do without the initial flush to PoU, as we've
> already flushed things to PoC.
> 
> Let's introduce invalidate_icache_range which is limited to
> invalidating the icache from the linear mapping (and thus
> has none of the userspace fault handling complexity), and
> wire it in KVM instead of flush_icache_range.
> 
> Signed-off-by: Marc Zyngier 
> ---
>  arch/arm64/include/asm/cacheflush.h |  8 
>  arch/arm64/include/asm/kvm_mmu.h|  4 ++--
>  arch/arm64/mm/cache.S   | 24 
>  3 files changed, 34 insertions(+), 2 deletions(-)

[...]

> diff --git a/arch/arm64/mm/cache.S b/arch/arm64/mm/cache.S
> index 7f1dbe962cf5..0c330666a8c9 100644
> --- a/arch/arm64/mm/cache.S
> +++ b/arch/arm64/mm/cache.S
> @@ -80,6 +80,30 @@ USER(9f, ic ivau, x4)		// invalidate I line PoU
>  ENDPROC(flush_icache_range)
>  ENDPROC(__flush_cache_user_range)
>  
> +/*
> + *   invalidate_icache_range(start,end)
> + *
> + *   Ensure that the I cache is invalid within the specified region. This
> + *   assumes that this is done on the linear mapping. Do not use it
> + *   on a userspace range, as this may fault horribly.
> + *
> + *   - start   - virtual start address of region
> + *   - end - virtual end address of region
> + */
> +ENTRY(invalidate_icache_range)
> + icache_line_size x2, x3
> + sub x3, x2, #1
> + bic x4, x0, x3
> +1:
> +	ic	ivau, x4			// invalidate I line PoU
> + add x4, x4, x2
> + cmp x4, x1
> + b.lo1b
> + dsb ish
> + isb
> + ret
> +ENDPROC(invalidate_icache_range)

Is there a good reason not to make this work for user addresses? If it's as
simple as adding a USER annotation and a fallback, then we should wrap that
in a macro and reuse it for __flush_cache_user_range.

Will
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v4 27/26] KVM: arm/arm64: GICv4: Prevent userspace from changing doorbell affinity

2017-10-19 Thread Marc Zyngier
We so far allocate the doorbell interrupts without taking any
special measure regarding the affinity of these interrupts. We
simply move them around as required when the vcpu gets scheduled
on a different CPU.

But that's reckoning without userspace (and the evil irqbalance), which
can try to move the VPE interrupt around, causing the ITS code
to emit VMOVP commands and remap the doorbell to another redistributor.
Worse, this can happen while the vcpu is running, causing all kinds
of trouble if the VPE is already resident, and we end up in UNPRED
territory.

So let's take definitive action and prevent userspace from messing
with us. This is just a matter of adding IRQ_NO_BALANCING to the
set of flags we already have, leaving the kernel in sole control
of the affinity.

Signed-off-by: Marc Zyngier 
---
 virt/kvm/arm/vgic/vgic-v4.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

This extra patch goes on top of the current GICv4 patches, and is hence
being posted with a weird sequence number...

diff --git a/virt/kvm/arm/vgic/vgic-v4.c b/virt/kvm/arm/vgic/vgic-v4.c
index b87806fea554..c3dbab714328 100644
--- a/virt/kvm/arm/vgic/vgic-v4.c
+++ b/virt/kvm/arm/vgic/vgic-v4.c
@@ -89,6 +89,8 @@
  * reason), the doorbell interrupt is disabled.
  */
 
+#define DB_IRQ_FLAGS   (IRQ_NOAUTOEN | IRQ_DISABLE_UNLAZY | IRQ_NO_BALANCING)
+
 static irqreturn_t vgic_v4_doorbell_handler(int irq, void *info)
 {
struct kvm_vcpu *vcpu = info;
@@ -149,7 +151,7 @@ int vgic_v4_init(struct kvm *kvm)
 * doorbell could kick us out of the guest too
 * early...
 */
-   irq_set_status_flags(irq, IRQ_NOAUTOEN | IRQ_DISABLE_UNLAZY);
+   irq_set_status_flags(irq, DB_IRQ_FLAGS);
ret = request_irq(irq, vgic_v4_doorbell_handler,
  0, "vcpu", vcpu);
if (ret) {
@@ -187,7 +189,7 @@ void vgic_v4_teardown(struct kvm *kvm)
struct kvm_vcpu *vcpu = kvm_get_vcpu(kvm, i);
int irq = its_vm->vpes[i]->irq;
 
-   irq_clear_status_flags(irq, IRQ_NOAUTOEN | IRQ_DISABLE_UNLAZY);
+   irq_clear_status_flags(irq, DB_IRQ_FLAGS);
free_irq(irq, vcpu);
}
 
-- 
2.14.1

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v4 16/21] KVM: arm64: Save/Restore guest DISR_EL1

2017-10-19 Thread James Morse
If we deliver a virtual SError to the guest, the guest may defer it
with an ESB instruction. The guest reads the deferred value via DISR_EL1,
but the guest's view of DISR_EL1 is re-mapped to VDISR_EL2 when HCR_EL2.AMO
is set.

Add the KVM code to save/restore VDISR_EL2, and make it accessible to
userspace as DISR_EL1.
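
For context, a minimal userspace sketch (not part of this patch) of how a
VMM could read the value exposed here, using the standard KVM_GET_ONE_REG
ioctl and the sys_reg encoding of DISR_EL1 (op0=3, op1=0, CRn=12, CRm=1,
op2=1). The vcpu_fd parameter is assumed to be an existing vcpu file
descriptor:

  #include <stdint.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  /* Read the guest's DISR_EL1 via the ONE_REG interface. */
  static int read_guest_disr(int vcpu_fd, uint64_t *disr)
  {
          struct kvm_one_reg reg = {
                  .id   = ARM64_SYS_REG(3, 0, 12, 1, 1),  /* DISR_EL1 */
                  .addr = (uintptr_t)disr,
          };

          return ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg);
  }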

Signed-off-by: James Morse 
---
 arch/arm64/include/asm/kvm_host.h | 1 +
 arch/arm64/include/asm/sysreg.h   | 1 +
 arch/arm64/kvm/hyp/sysreg-sr.c| 6 ++
 arch/arm64/kvm/sys_regs.c | 1 +
 4 files changed, 9 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index 28a4de85edee..97438cc3a9ad 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -120,6 +120,7 @@ enum vcpu_sysreg {
PAR_EL1,/* Physical Address Register */
MDSCR_EL1,  /* Monitor Debug System Control Register */
MDCCINT_EL1,/* Monitor Debug Comms Channel Interrupt Enable Reg */
+   DISR_EL1,   /* Deferred Interrupt Status Register */
 
/* Performance Monitors Registers */
PMCR_EL0,   /* Control Register */
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index a493e93de296..1b8b9012234d 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -256,6 +256,7 @@
 #define SYS_VSESR_EL2  sys_reg(3, 4, 5, 2, 3)
 #define SYS_FPEXC32_EL2sys_reg(3, 4, 5, 3, 0)
 
+#define SYS_VDISR_EL2  sys_reg(3, 4, 12, 1,  1)
 #define __SYS__AP0Rx_EL2(x)sys_reg(3, 4, 12, 8, x)
 #define SYS_ICH_AP0R0_EL2  __SYS__AP0Rx_EL2(0)
 #define SYS_ICH_AP0R1_EL2  __SYS__AP0Rx_EL2(1)
diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
index 934137647837..f4d604803b29 100644
--- a/arch/arm64/kvm/hyp/sysreg-sr.c
+++ b/arch/arm64/kvm/hyp/sysreg-sr.c
@@ -66,6 +66,9 @@ static void __hyp_text __sysreg_save_state(struct kvm_cpu_context *ctxt)
ctxt->gp_regs.sp_el1= read_sysreg(sp_el1);
ctxt->gp_regs.elr_el1   = read_sysreg_el1(elr);
ctxt->gp_regs.spsr[KVM_SPSR_EL1]= read_sysreg_el1(spsr);
+
+   if (cpus_have_const_cap(ARM64_HAS_RAS_EXTN))
+   ctxt->sys_regs[DISR_EL1] = read_sysreg_s(SYS_VDISR_EL2);
 }
 
 static hyp_alternate_select(__sysreg_call_save_host_state,
@@ -119,6 +122,9 @@ static void __hyp_text __sysreg_restore_state(struct kvm_cpu_context *ctxt)
write_sysreg(ctxt->gp_regs.sp_el1,  sp_el1);
write_sysreg_el1(ctxt->gp_regs.elr_el1, elr);
write_sysreg_el1(ctxt->gp_regs.spsr[KVM_SPSR_EL1],spsr);
+
+   if (cpus_have_const_cap(ARM64_HAS_RAS_EXTN))
+   write_sysreg_s(ctxt->sys_regs[DISR_EL1], SYS_VDISR_EL2);
 }
 
 static hyp_alternate_select(__sysreg_call_restore_host_state,
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 2e070d3baf9f..713275b501ce 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -963,6 +963,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
{ SYS_DESC(SYS_AMAIR_EL1), access_vm_reg, reset_amair_el1, AMAIR_EL1 },
 
{ SYS_DESC(SYS_VBAR_EL1), NULL, reset_val, VBAR_EL1, 0 },
+   { SYS_DESC(SYS_DISR_EL1), NULL, reset_val, DISR_EL1, 0 },
 
{ SYS_DESC(SYS_ICC_IAR0_EL1), write_to_read_only },
{ SYS_DESC(SYS_ICC_EOIR0_EL1), read_from_write_only },
-- 
2.13.3

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v4 19/21] KVM: arm64: Handle RAS SErrors from EL2 on guest exit

2017-10-19 Thread James Morse
We expect to have firmware-first handling of RAS SErrors, with errors
notified via an APEI method. For systems without firmware-first, add
some minimal handling to KVM.

There are two ways KVM can take an SError due to a guest, either may be a
RAS error: we exit the guest due to an SError routed to EL2 by HCR_EL2.AMO,
or we take an SError from EL2 when we unmask PSTATE.A from __guest_exit.

The current SError from EL2 code unmasks SError and tries to fence any
pending SError into a single instruction window. It then leaves SError
unmasked.

With the v8.2 RAS Extensions we may take an SError for a 'corrected'
error, but KVM is only able to handle SErrors from EL2 if they occur
during this single instruction window...

The RAS Extensions give us a new instruction to synchronise and
consume SErrors. The RAS Extensions document (ARM DDI0587),
'2.4.1 ESB and Unrecoverable errors' describes ESB as synchronising
SError interrupts generated by 'instructions, translation table walks,
hardware updates to the translation tables, and instruction fetches on
the same PE'. This makes ESB equivalent to KVM's existing
'dsb, mrs-daifclr, isb' sequence.
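
As a rough C-level rendering of that equivalence (the patch itself does this
in the __guest_exit assembly below; esb(), read_sysreg_s() and SYS_DISR_EL1
come from earlier patches in this series, and the function names here are
purely illustrative):

  /* Existing approach: fence, then open a one-instruction window in which a
   * pending SError is taken as an exception. SError is left unmasked after. */
  static void sync_serror_by_unmasking(void)
  {
          asm volatile("dsb sy; msr daifclr, #4; isb" ::: "memory");
  }

  /* v8.2 RAS approach: synchronise and consume any pending SError without
   * unmasking PSTATE.A; a deferred error is reported via DISR. */
  static u64 sync_serror_by_esb(void)
  {
          esb();
          return read_sysreg_s(SYS_DISR_EL1);   /* non-zero if an SError was pending */
  }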

Use the alternatives to synchronise and consume any SError using ESB
instead of unmasking and taking the SError. Set ARM_EXIT_WITH_SERROR_BIT
in the exit_code so that we can restart the vcpu if it turns out this
SError has no impact on the vcpu.

Signed-off-by: James Morse 

---
Changes since v3:
 * Moved that nop out of the firing line

 arch/arm64/include/asm/kvm_emulate.h |  5 +
 arch/arm64/include/asm/kvm_host.h|  1 +
 arch/arm64/kernel/asm-offsets.c  |  1 +
 arch/arm64/kvm/handle_exit.c | 10 +-
 arch/arm64/kvm/hyp/entry.S   | 13 +
 5 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h 
b/arch/arm64/include/asm/kvm_emulate.h
index 8a7a838eb17a..8274d16df3cd 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -173,6 +173,11 @@ static inline phys_addr_t kvm_vcpu_get_fault_ipa(const struct kvm_vcpu *vcpu)
return ((phys_addr_t)vcpu->arch.fault.hpfar_el2 & HPFAR_MASK) << 8;
 }
 
+static inline u64 kvm_vcpu_get_disr(const struct kvm_vcpu *vcpu)
+{
+   return vcpu->arch.fault.disr_el1;
+}
+
 static inline u32 kvm_vcpu_hvc_get_imm(const struct kvm_vcpu *vcpu)
 {
return kvm_vcpu_get_hsr(vcpu) & ESR_ELx_xVC_IMM_MASK;
diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index 97438cc3a9ad..cf5d78ba14b5 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -89,6 +89,7 @@ struct kvm_vcpu_fault_info {
u32 esr_el2;/* Hyp Syndrom Register */
u64 far_el2;/* Hyp Fault Address Register */
u64 hpfar_el2;  /* Hyp IPA Fault Address Register */
+   u64 disr_el1;   /* Deferred [SError] Status Register */
 };
 
 /*
diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index 71bf088f1e4b..121889c49542 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -130,6 +130,7 @@ int main(void)
   BLANK();
 #ifdef CONFIG_KVM_ARM_HOST
   DEFINE(VCPU_CONTEXT, offsetof(struct kvm_vcpu, arch.ctxt));
+  DEFINE(VCPU_FAULT_DISR,  offsetof(struct kvm_vcpu, arch.fault.disr_el1));
   DEFINE(CPU_GP_REGS,  offsetof(struct kvm_cpu_context, gp_regs));
   DEFINE(CPU_USER_PT_REGS, offsetof(struct kvm_regs, regs));
   DEFINE(CPU_FP_REGS,  offsetof(struct kvm_regs, fp_regs));
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index 345fdbba6c2e..e1e6cfe7d4d9 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -23,6 +23,7 @@
 #include 
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -208,7 +209,14 @@ int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
*vcpu_pc(vcpu) -= adj;
}
 
-   kvm_inject_vabt(vcpu);
+   if (cpus_have_const_cap(ARM64_HAS_RAS_EXTN)) {
+   u64 disr = kvm_vcpu_get_disr(vcpu);
+
+   kvm_handle_guest_serror(vcpu, disr_to_esr(disr));
+   } else {
+   kvm_inject_vabt(vcpu);
+   }
+
return 1;
}
 
diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
index 12ee62d6d410..024c7afc78f8 100644
--- a/arch/arm64/kvm/hyp/entry.S
+++ b/arch/arm64/kvm/hyp/entry.S
@@ -124,6 +124,17 @@ ENTRY(__guest_exit)
// Now restore the host regs
restore_callee_saved_regs x2
 
+alternative_if ARM64_HAS_RAS_EXTN
+   // If we have the RAS extensions we can consume a pending error
+   // without an unmask-SError and isb.
+   esb
+   mrs_s   x2, SYS_DISR_EL1
+   str x2, [x1, #(VCPU_FAULT_DISR - VCPU_CONTEXT)]
+   cbz x2, 1f
+   msr_s   

[PATCH v4 18/21] KVM: arm64: Handle RAS SErrors from EL1 on guest exit

2017-10-19 Thread James Morse
We expect to have firmware-first handling of RAS SErrors, with errors
notified via an APEI method. For systems without firmware-first, add
some minimal handling to KVM.

There are two ways KVM can take an SError due to a guest, either may be a
RAS error: we exit the guest due to an SError routed to EL2 by HCR_EL2.AMO,
or we take an SError from EL2 when we unmask PSTATE.A from __guest_exit.

For SError that interrupt a guest and are routed to EL2 the existing
behaviour is to inject an impdef SError into the guest.

Add code to handle RAS SErrors based on the ESR. For uncontained errors
arm64_blocking_ras_serror() will panic(); these errors compromise the
host too. All other error types are contained: for the 'blocking'
errors the vCPU can't make progress, so we inject a virtual SError.
We ignore contained errors where we can make progress: if we're lucky,
we may not hit them again.

Signed-off-by: James Morse 
---
 arch/arm64/kvm/handle_exit.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index 7debb74843a0..345fdbba6c2e 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -28,12 +28,19 @@
 #include 
 #include 
 #include 
+#include 
 
 #define CREATE_TRACE_POINTS
 #include "trace.h"
 
 typedef int (*exit_handle_fn)(struct kvm_vcpu *, struct kvm_run *);
 
+static void kvm_handle_guest_serror(struct kvm_vcpu *vcpu, u32 esr)
+{
+   if (!arm64_is_ras_serror(esr) || arm64_blocking_ras_serror(NULL, esr))
+   kvm_inject_vabt(vcpu);
+}
+
 static int handle_hvc(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
int ret;
@@ -211,7 +218,7 @@ int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
case ARM_EXCEPTION_IRQ:
return 1;
case ARM_EXCEPTION_EL1_SERROR:
-   kvm_inject_vabt(vcpu);
+   kvm_handle_guest_serror(vcpu, kvm_vcpu_get_hsr(vcpu));
return 1;
case ARM_EXCEPTION_TRAP:
/*
-- 
2.13.3

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v4 20/21] KVM: arm64: Take any host SError before entering the guest

2017-10-19 Thread James Morse
On VHE systems KVM masks SError before switching the VBAR value. Any
host RAS error that the CPU knew about before world-switch may become
pending as an SError during world-switch, and only be taken once we enter
the guest.

Until KVM can take RAS SErrors during world switch, add an ESB to
force any RAS errors to be synchronised and taken on the host before
we enter world switch.

RAS errors that become pending during world switch are still taken
once we enter the guest.

Signed-off-by: James Morse 
---
 arch/arm64/include/asm/kvm_host.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index cf5d78ba14b5..5dc6f2877762 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -392,6 +392,7 @@ static inline void __cpu_init_stage2(void)
 
 static inline void kvm_arm_vhe_guest_enter(void)
 {
+   esb();
local_daif_mask();
 }
 
-- 
2.13.3

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v4 15/21] KVM: arm64: Set an impdef ESR for Virtual-SError using VSESR_EL2.

2017-10-19 Thread James Morse
Prior to v8.2's RAS Extensions, the HCR_EL2.VSE 'virtual SError' feature
generated an SError with an implementation defined ESR_EL1.ISS, because we
had no mechanism to specify the ESR value.

On Juno this generates an all-zero ESR; the most significant bit, 'ISV',
is clear, indicating that the remainder of the ISS field is invalid.

With the RAS Extensions we have a mechanism to specify this value, and the
most significant bit has a new meaning: 'IDS - Implementation Defined
Syndrome'. An all-zero SError ESR now means: 'RAS error: Uncategorized'
instead of 'no valid ISS'.

Add KVM support for the VSESR_EL2 register to specify an ESR value when
HCR_EL2.VSE generates a virtual SError. Change kvm_inject_vabt() to
specify an implementation-defined value.

We only need to restore the VSESR_EL2 value when HCR_EL2.VSE is set; KVM
saves/restores this bit during __deactivate_traps(), and hardware clears the
bit once the guest has consumed the virtual SError.

Future patches may add an API (or KVM CAP) to pend a virtual SError with
a specified ESR.

Cc: Dongjiu Geng 
Signed-off-by: James Morse 
---
 arch/arm64/include/asm/kvm_emulate.h |  5 +
 arch/arm64/include/asm/kvm_host.h|  3 +++
 arch/arm64/include/asm/sysreg.h  |  1 +
 arch/arm64/kvm/hyp/switch.c  |  4 
 arch/arm64/kvm/inject_fault.c| 13 -
 5 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h 
b/arch/arm64/include/asm/kvm_emulate.h
index e5df3fce0008..8a7a838eb17a 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -61,6 +61,11 @@ static inline void vcpu_set_hcr(struct kvm_vcpu *vcpu, 
unsigned long hcr)
vcpu->arch.hcr_el2 = hcr;
 }
 
+static inline void vcpu_set_vsesr(struct kvm_vcpu *vcpu, u64 vsesr)
+{
+   vcpu->arch.vsesr_el2 = vsesr;
+}
+
 static inline unsigned long *vcpu_pc(const struct kvm_vcpu *vcpu)
 {
return (unsigned long *)_gp_regs(vcpu)->regs.pc;
diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index a0e2f7962401..28a4de85edee 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -277,6 +277,9 @@ struct kvm_vcpu_arch {
 
/* Detect first run of a vcpu */
bool has_run_once;
+
+   /* Virtual SError ESR to restore when HCR_EL2.VSE is set */
+   u64 vsesr_el2;
 };
 
 #define vcpu_gp_regs(v)(&(v)->arch.ctxt.gp_regs)
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 427c36bc5dd6..a493e93de296 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -253,6 +253,7 @@
 
 #define SYS_DACR32_EL2 sys_reg(3, 4, 3, 0, 0)
 #define SYS_IFSR32_EL2 sys_reg(3, 4, 5, 0, 1)
+#define SYS_VSESR_EL2  sys_reg(3, 4, 5, 2, 3)
 #define SYS_FPEXC32_EL2sys_reg(3, 4, 5, 3, 0)
 
 #define __SYS__AP0Rx_EL2(x)sys_reg(3, 4, 12, 8, x)
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 945e79c641c4..af37658223a0 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -86,6 +86,10 @@ static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
isb();
}
write_sysreg(val, hcr_el2);
+
+   if (cpus_have_const_cap(ARM64_HAS_RAS_EXTN) && (val & HCR_VSE))
+   write_sysreg_s(vcpu->arch.vsesr_el2, SYS_VSESR_EL2);
+
/* Trap on AArch32 cp15 c15 accesses (EL1 or EL0) */
write_sysreg(1 << 15, hstr_el2);
/*
diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
index da6a8cfa54a0..52f7f66f1356 100644
--- a/arch/arm64/kvm/inject_fault.c
+++ b/arch/arm64/kvm/inject_fault.c
@@ -232,14 +232,25 @@ void kvm_inject_undefined(struct kvm_vcpu *vcpu)
inject_undef64(vcpu);
 }
 
+static void pend_guest_serror(struct kvm_vcpu *vcpu, u64 esr)
+{
+   vcpu_set_vsesr(vcpu, esr);
+   vcpu_set_hcr(vcpu, vcpu_get_hcr(vcpu) | HCR_VSE);
+}
+
 /**
  * kvm_inject_vabt - inject an async abort / SError into the guest
  * @vcpu: The VCPU to receive the exception
  *
  * It is assumed that this code is called from the VCPU thread and that the
  * VCPU therefore is not currently executing guest code.
+ *
+ * Systems with the RAS Extensions specify an imp-def ESR (ISV/IDS = 1) with
+ * the remaining ISS all-zeros so that this error is not interpreted as an
+ * uncategorized RAS error. Without the RAS Extensions we can't specify an ESR
+ * value, so the CPU generates an imp-def value.
  */
 void kvm_inject_vabt(struct kvm_vcpu *vcpu)
 {
-   vcpu_set_hcr(vcpu, vcpu_get_hcr(vcpu) | HCR_VSE);
+   pend_guest_serror(vcpu, ESR_ELx_ISV);
 }
-- 
2.13.3

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v4 08/21] arm64: entry.S: convert elX_irq

2017-10-19 Thread James Morse
Following our 'dai' order, irqs should be processed with debug and
serror exceptions unmasked.

Add a helper to unmask these two (and fiq for good measure).
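
For reference, the immediate used by the new helper decodes as follows (the
macro names below are only for illustration and aren't part of the patch):

  /* Bits accepted by 'msr daifset/daifclr, #imm': */
  #define EXAMPLE_DAIF_D  8     /* Debug exceptions */
  #define EXAMPLE_DAIF_A  4     /* SError ("asynchronous abort") */
  #define EXAMPLE_DAIF_I  2     /* IRQ */
  #define EXAMPLE_DAIF_F  1     /* FIQ */

  /*
   * enable_da_f issues 'msr daifclr, #(8 | 4 | 1)', i.e. D | A | F:
   * everything is unmasked except IRQ, the lowest-priority flag in the
   * 'dai' ordering described above.
   */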

Signed-off-by: James Morse 
Reviewed-by: Julien Thierry 
Reviewed-by: Catalin Marinas 

---
Changes since v3:
* Added comment against enable_da_f

 arch/arm64/include/asm/assembler.h | 5 +
 arch/arm64/kernel/entry.S  | 4 ++--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/assembler.h 
b/arch/arm64/include/asm/assembler.h
index c2a37e2f733c..e4ac505b7b3d 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -54,6 +54,11 @@
msr daif, \tmp
.endm
 
+   /* IRQ is the lowest priority flag, unconditionally unmask the rest. */
+   .macro enable_da_f
+   msr daifclr, #(8 | 4 | 1)
+   .endm
+
 /*
  * Enable and disable interrupts.
  */
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index f7dfe5d2b1fb..df085ec003b0 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -557,7 +557,7 @@ ENDPROC(el1_sync)
.align  6
 el1_irq:
kernel_entry 1
-   enable_dbg
+   enable_da_f
 #ifdef CONFIG_TRACE_IRQFLAGS
bl  trace_hardirqs_off
 #endif
@@ -766,7 +766,7 @@ ENDPROC(el0_sync)
 el0_irq:
kernel_entry 0
 el0_irq_naked:
-   enable_dbg
+   enable_da_f
 #ifdef CONFIG_TRACE_IRQFLAGS
bl  trace_hardirqs_off
 #endif
-- 
2.13.3

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v4 10/21] arm64: entry.S: move SError handling into a C function for future expansion

2017-10-19 Thread James Morse
From: Xie XiuQi 

Today SError is taken using the inv_entry macro that ends up in
bad_mode.

SError can be used by the RAS Extensions to notify either the OS or
firmware of CPU problems, some of which may have been corrected.

To allow this handling to be added, add a do_serror() C function
that just panic()s. Add the entry.S boilerplate to save/restore the
CPU registers and unmask debug exceptions. Future patches may change
do_serror() to return if the SError Interrupt was notification of a
corrected error.

Signed-off-by: Xie XiuQi 
Signed-off-by: Wang Xiongfeng 
[Split out of a bigger patch, added compat path, renamed, enabled debug
 exceptions]
Signed-off-by: James Morse 
Reviewed-by: Catalin Marinas 
---
 arch/arm64/Kconfig|  2 +-
 arch/arm64/kernel/entry.S | 36 +---
 arch/arm64/kernel/traps.c | 13 +
 3 files changed, 43 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 0df64a6a56d4..70dfe4e9ccc5 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -98,7 +98,7 @@ config ARM64
select HAVE_IRQ_TIME_ACCOUNTING
select HAVE_MEMBLOCK
select HAVE_MEMBLOCK_NODE_MAP if NUMA
-   select HAVE_NMI if ACPI_APEI_SEA
+   select HAVE_NMI
select HAVE_PATA_PLATFORM
select HAVE_PERF_EVENTS
select HAVE_PERF_REGS
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index df085ec003b0..e147c1d00b41 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -375,18 +375,18 @@ ENTRY(vectors)
kernel_ventry   el1_sync// Synchronous EL1h
kernel_ventry   el1_irq // IRQ EL1h
kernel_ventry   el1_fiq_invalid // FIQ EL1h
-   kernel_ventry   el1_error_invalid   // Error EL1h
+   kernel_ventry   el1_error   // Error EL1h
 
	kernel_ventry	el0_sync			// Synchronous 64-bit EL0
kernel_ventry   el0_irq // IRQ 64-bit EL0
kernel_ventry   el0_fiq_invalid // FIQ 64-bit EL0
-   kernel_ventry   el0_error_invalid   // Error 64-bit EL0
+   kernel_ventry   el0_error   // Error 64-bit EL0
 
 #ifdef CONFIG_COMPAT
	kernel_ventry	el0_sync_compat		// Synchronous 32-bit EL0
kernel_ventry   el0_irq_compat  // IRQ 32-bit EL0
kernel_ventry   el0_fiq_invalid_compat  // FIQ 32-bit EL0
-   kernel_ventry   el0_error_invalid_compat// Error 32-bit EL0
+   kernel_ventry   el0_error_compat// Error 32-bit EL0
 #else
	kernel_ventry	el0_sync_invalid	// Synchronous 32-bit EL0
kernel_ventry   el0_irq_invalid // IRQ 32-bit EL0
@@ -455,10 +455,6 @@ ENDPROC(el0_error_invalid)
 el0_fiq_invalid_compat:
inv_entry 0, BAD_FIQ, 32
 ENDPROC(el0_fiq_invalid_compat)
-
-el0_error_invalid_compat:
-   inv_entry 0, BAD_ERROR, 32
-ENDPROC(el0_error_invalid_compat)
 #endif
 
 el1_sync_invalid:
@@ -663,6 +659,10 @@ el0_svc_compat:
 el0_irq_compat:
kernel_entry 0, 32
b   el0_irq_naked
+
+el0_error_compat:
+   kernel_entry 0, 32
+   b   el0_error_naked
 #endif
 
 el0_da:
@@ -780,6 +780,28 @@ el0_irq_naked:
b   ret_to_user
 ENDPROC(el0_irq)
 
+el1_error:
+   kernel_entry 1
+   mrs x1, esr_el1
+   enable_dbg
+   mov x0, sp
+   bl  do_serror
+   kernel_exit 1
+ENDPROC(el1_error)
+
+el0_error:
+   kernel_entry 0
+el0_error_naked:
+   mrs x1, esr_el1
+   enable_dbg
+   mov x0, sp
+   bl  do_serror
+   enable_daif
+   ct_user_exit
+   b   ret_to_user
+ENDPROC(el0_error)
+
+
 /*
  * This is the fast syscall return path.  We do as little as possible here,
  * and this includes saving x0 back into the kernel stack.
diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index 1808be65d22f..773aae69c376 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -709,6 +709,19 @@ asmlinkage void handle_bad_stack(struct pt_regs *regs)
 }
 #endif
 
+asmlinkage void do_serror(struct pt_regs *regs, unsigned int esr)
+{
+   nmi_enter();
+
+   console_verbose();
+
+   pr_crit("SError Interrupt on CPU%d, code 0x%08x -- %s\n",
+   smp_processor_id(), esr, esr_get_class_string(esr));
+   __show_regs(regs);
+
+   panic("Asynchronous SError Interrupt");
+}
+
 void __pte_error(const char *file, int line, unsigned long val)
 {
pr_err("%s:%d: bad pte %016lx.\n", file, line, val);
-- 
2.13.3

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu

[PATCH v4 07/21] arm64: entry.S convert el0_sync

2017-10-19 Thread James Morse
el0_sync also unmasks exceptions on a case-by-case basis: debug exceptions
are enabled unless this was a debug exception, and irqs are unmasked for
some exception types but not for others.

el0_dbg should run with everything masked to prevent us taking a debug
exception from do_debug_exception. For the other cases we can unmask
everything. This changes the behaviour of fpsimd_{acc,exc} and el0_inv,
which previously ran with irqs masked.

This removes the last user of enable_dbg_and_irq, so remove that macro too.

Signed-off-by: James Morse 
Reviewed-by: Julien Thierry 
Reviewed-by: Catalin Marinas 
---
 arch/arm64/include/asm/assembler.h |  9 -
 arch/arm64/kernel/entry.S  | 24 ++--
 2 files changed, 10 insertions(+), 23 deletions(-)

diff --git a/arch/arm64/include/asm/assembler.h 
b/arch/arm64/include/asm/assembler.h
index abb5abd61ddb..c2a37e2f733c 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -97,15 +97,6 @@
.endm
 
 /*
- * Enable both debug exceptions and interrupts. This is likely to be
- * faster than two daifclr operations, since writes to this register
- * are self-synchronising.
- */
-   .macro  enable_dbg_and_irq
-   msr daifclr, #(8 | 2)
-   .endm
-
-/*
  * SMP data memory barrier
  */
.macro  smp_dmb, opt
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index bd54115972a4..f7dfe5d2b1fb 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -670,8 +670,7 @@ el0_da:
 * Data abort handling
 */
mrs x26, far_el1
-   // enable interrupts before calling the main handler
-   enable_dbg_and_irq
+   enable_daif
ct_user_exit
clear_address_tag x0, x26
mov x1, x25
@@ -683,8 +682,7 @@ el0_ia:
 * Instruction abort handling
 */
mrs x26, far_el1
-   // enable interrupts before calling the main handler
-   enable_dbg_and_irq
+   enable_daif
ct_user_exit
mov x0, x26
mov x1, x25
@@ -695,7 +693,7 @@ el0_fpsimd_acc:
/*
 * Floating Point or Advanced SIMD access
 */
-   enable_dbg
+   enable_daif
ct_user_exit
mov x0, x25
mov x1, sp
@@ -705,7 +703,7 @@ el0_fpsimd_exc:
/*
 * Floating Point or Advanced SIMD exception
 */
-   enable_dbg
+   enable_daif
ct_user_exit
mov x0, x25
mov x1, sp
@@ -716,8 +714,7 @@ el0_sp_pc:
 * Stack or PC alignment exception handling
 */
mrs x26, far_el1
-   // enable interrupts before calling the main handler
-   enable_dbg_and_irq
+   enable_daif
ct_user_exit
mov x0, x26
mov x1, x25
@@ -728,8 +725,7 @@ el0_undef:
/*
 * Undefined instruction
 */
-   // enable interrupts before calling the main handler
-   enable_dbg_and_irq
+   enable_daif
ct_user_exit
mov x0, sp
bl  do_undefinstr
@@ -738,7 +734,7 @@ el0_sys:
/*
 * System instructions, for trapped cache maintenance instructions
 */
-   enable_dbg_and_irq
+   enable_daif
ct_user_exit
mov x0, x25
mov x1, sp
@@ -753,11 +749,11 @@ el0_dbg:
mov x1, x25
mov x2, sp
bl  do_debug_exception
-   enable_dbg
+   enable_daif
ct_user_exit
b   ret_to_user
 el0_inv:
-   enable_dbg
+   enable_daif
ct_user_exit
mov x0, sp
mov x1, #BAD_SYNC
@@ -836,7 +832,7 @@ el0_svc:
mov wsc_nr, #__NR_syscalls
 el0_svc_naked: // compat entry point
	stp	x0, xscno, [sp, #S_ORIG_X0]	// save the original x0 and syscall number
-   enable_dbg_and_irq
+   enable_daif
ct_user_exit 1
 
ldr x16, [tsk, #TSK_TI_FLAGS]   // check for syscall hooks
-- 
2.13.3

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v4 17/21] KVM: arm64: Save ESR_EL2 on guest SError

2017-10-19 Thread James Morse
When we exit a guest due to an SError the vcpu fault info isn't updated
with the ESR. Today this is only done for traps.

The v8.2 RAS Extensions define ISS values for SError. Update the vcpu's
fault_info with the ESR on SError so that handle_exit() can determine
if this was a RAS SError and decode its severity.

Signed-off-by: James Morse 
---
 arch/arm64/kvm/hyp/switch.c | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index af37658223a0..cba6d8ac105c 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -230,13 +230,20 @@ static bool __hyp_text __translate_far_to_hpfar(u64 far, u64 *hpfar)
return true;
 }
 
+static void __hyp_text __populate_fault_info_esr(struct kvm_vcpu *vcpu)
+{
+   vcpu->arch.fault.esr_el2 = read_sysreg_el2(esr);
+}
+
 static bool __hyp_text __populate_fault_info(struct kvm_vcpu *vcpu)
 {
-   u64 esr = read_sysreg_el2(esr);
-   u8 ec = ESR_ELx_EC(esr);
+   u8 ec;
+   u64 esr;
u64 hpfar, far;
 
-   vcpu->arch.fault.esr_el2 = esr;
+   __populate_fault_info_esr(vcpu);
+   esr = vcpu->arch.fault.esr_el2;
+   ec = ESR_ELx_EC(esr);
 
if (ec != ESR_ELx_EC_DABT_LOW && ec != ESR_ELx_EC_IABT_LOW)
return true;
@@ -325,6 +332,8 @@ int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
 */
if (exit_code == ARM_EXCEPTION_TRAP && !__populate_fault_info(vcpu))
goto again;
+   else if (ARM_EXCEPTION_CODE(exit_code) == ARM_EXCEPTION_EL1_SERROR)
+   __populate_fault_info_esr(vcpu);
 
if (static_branch_unlikely(_v2_cpuif_trap) &&
exit_code == ARM_EXCEPTION_TRAP) {
-- 
2.13.3

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v4 12/21] arm64: kernel: Survive corrected RAS errors notified by SError

2017-10-19 Thread James Morse
Prior to v8.2, SError is an uncontainable fatal exception. The v8.2 RAS
extensions use SError to notify software about RAS errors, these can be
contained by the ESB instruction.

An ACPI system with firmware-first may use SError as its 'SEI'
notification. Future patches may add code to 'claim' this SError as a
notification.

Other systems can distinguish these RAS errors from the SError ESR and
use the AET bits and additional data from RAS-Error registers to handle
the error. Future patches may add this kernel-first handling.

Without support for either of these we will panic(), even if we received
a corrected error. Add code to decode the severity of RAS errors. We can
safely ignore contained errors where the CPU can continue to make
progress. For all other errors we continue to panic().
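
To make those categories concrete, a hedged sketch of the severity decision
(the body of arm64_blocking_ras_serror() is cut off at the end of this
message; the version below is illustrative only, built from the helpers and
behaviour described here):

  /* Illustrative: return true if the interrupted context can't make progress. */
  static bool example_blocking_ras_serror(struct pt_regs *regs, unsigned int esr)
  {
          u32 aet = arm64_ras_serror_get_severity(esr);

          switch (aet) {
          case ESR_ELx_AET_CE:          /* corrected: nothing to do */
          case ESR_ELx_AET_UEO:         /* restartable: we can make progress */
                  return false;
          case ESR_ELx_AET_UEU:         /* uncorrected unrecoverable */
          case ESR_ELx_AET_UER:         /* uncorrected recoverable */
                  /* Contained, but this context can't safely continue. */
                  return true;
          default:
                  /* Uncontainable or no severity info: the host is compromised. */
                  arm64_serror_panic(regs, esr);    /* __noreturn */
          }
  }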

Signed-off-by: James Morse 
Reviewed-by: Catalin Marinas 

---
I couldn't come up with a concise way to capture 'can continue to make
progress', so opted for 'blocking' instead.

 arch/arm64/include/asm/esr.h   | 10 
 arch/arm64/include/asm/traps.h | 36 ++
 arch/arm64/kernel/traps.c  | 58 ++
 3 files changed, 99 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
index 66ed8b6b9976..8ea52f15bf1c 100644
--- a/arch/arm64/include/asm/esr.h
+++ b/arch/arm64/include/asm/esr.h
@@ -85,6 +85,15 @@
 #define ESR_ELx_WNR_SHIFT  (6)
 #define ESR_ELx_WNR(UL(1) << ESR_ELx_WNR_SHIFT)
 
+/* Asynchronous Error Type */
+#define ESR_ELx_AET(UL(0x7) << 10)
+
+#define ESR_ELx_AET_UC (UL(0) << 10)   /* Uncontainable */
+#define ESR_ELx_AET_UEU		(UL(1) << 10)	/* Uncorrected Unrecoverable */
+#define ESR_ELx_AET_UEO		(UL(2) << 10)	/* Uncorrected Restartable */
+#define ESR_ELx_AET_UER		(UL(3) << 10)	/* Uncorrected Recoverable */
+#define ESR_ELx_AET_CE (UL(6) << 10)   /* Corrected */
+
 /* Shared ISS field definitions for Data/Instruction aborts */
 #define ESR_ELx_SET_SHIFT  (11)
 #define ESR_ELx_SET_MASK   (UL(3) << ESR_ELx_SET_SHIFT)
@@ -99,6 +108,7 @@
 #define ESR_ELx_FSC(0x3F)
 #define ESR_ELx_FSC_TYPE   (0x3C)
 #define ESR_ELx_FSC_EXTABT (0x10)
+#define ESR_ELx_FSC_SERROR (0x11)
 #define ESR_ELx_FSC_ACCESS (0x08)
 #define ESR_ELx_FSC_FAULT  (0x04)
 #define ESR_ELx_FSC_PERM   (0x0C)
diff --git a/arch/arm64/include/asm/traps.h b/arch/arm64/include/asm/traps.h
index d131501c6222..8d2a1fff5c6b 100644
--- a/arch/arm64/include/asm/traps.h
+++ b/arch/arm64/include/asm/traps.h
@@ -19,6 +19,7 @@
 #define __ASM_TRAP_H
 
 #include 
+#include 
 #include 
 
 struct pt_regs;
@@ -58,4 +59,39 @@ static inline int in_entry_text(unsigned long ptr)
return ptr >= (unsigned long)&__entry_text_start &&
   ptr < (unsigned long)&__entry_text_end;
 }
+
+static inline bool arm64_is_ras_serror(u32 esr)
+{
+   bool impdef = esr & ESR_ELx_ISV; /* aka IDS */
+
+   if (cpus_have_const_cap(ARM64_HAS_RAS_EXTN))
+   return !impdef;
+
+   return false;
+}
+
+/* Return the AET bits of an SError ESR, or 0/uncontainable/uncategorized */
+static inline u32 arm64_ras_serror_get_severity(u32 esr)
+{
+   u32 aet = esr & ESR_ELx_AET;
+
+   if (!arm64_is_ras_serror(esr)) {
+   /* Not a RAS error, we can't interpret the ESR */
+   return 0;
+   }
+
+   /*
+* AET is RES0 if 'the value returned in the DFSC field is not
+* [ESR_ELx_FSC_SERROR]'
+*/
+   if ((esr & ESR_ELx_FSC) != ESR_ELx_FSC_SERROR) {
+   /* No severity information */
+   return 0;
+   }
+
+   return aet;
+}
+
+bool arm64_blocking_ras_serror(struct pt_regs *regs, unsigned int esr);
+void __noreturn arm64_serror_panic(struct pt_regs *regs, u32 esr);
 #endif
diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index 773aae69c376..53aeb25158b0 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -709,17 +709,65 @@ asmlinkage void handle_bad_stack(struct pt_regs *regs)
 }
 #endif
 
-asmlinkage void do_serror(struct pt_regs *regs, unsigned int esr)
+void __noreturn arm64_serror_panic(struct pt_regs *regs, u32 esr)
 {
-   nmi_enter();
-
console_verbose();
 
pr_crit("SError Interrupt on CPU%d, code 0x%08x -- %s\n",
smp_processor_id(), esr, esr_get_class_string(esr));
-   __show_regs(regs);
+   if (regs)
+   __show_regs(regs);
+
+   /* KVM may call this from a preemptible context */
+   preempt_disable();
+
+   /*
+* panic() unmasks interrupts, which unmasks SError. Use nmi_panic()
+* to avoid re-entering panic.
+*/
+   nmi_panic(regs, "Asynchronous SError Interrupt");
+
+   cpu_park_loop();
+   unreachable();
+}
+
+bool 

[PATCH v4 13/21] arm64: cpufeature: Enable IESB on exception entry/return for firmware-first

2017-10-19 Thread James Morse
ARM v8.2 has a feature to add implicit error synchronization barriers
whenever the CPU enters or returns from an exception level. Add code to
detect this feature and enable the SCTLR_ELx.IESB bit.

This feature causes RAS errors that are not yet visible to software to
become pending SErrors. We expect to have firmware-first RAS support,
so synchronised RAS errors will be taken immediately to EL3.
Any system without firmware-first handling of errors will take the SError
either immediately after exception return, or when we unmask SError after
entry.S's work.

Platform level RAS support may require additional firmware support.

Cc: Christoffer Dall 
Cc: Marc Zyngier 
Signed-off-by: James Morse 
Reviewed-by: Catalin Marinas 

---
Note the sneaky KVM change,

Changes since v3:
 * removed IESB Kconfig option

 arch/arm64/include/asm/cpucaps.h   |  3 ++-
 arch/arm64/include/asm/processor.h |  1 +
 arch/arm64/include/asm/sysreg.h|  1 +
 arch/arm64/kernel/cpufeature.c | 19 +++
 arch/arm64/kvm/hyp-init.S  |  3 +++
 5 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/cpucaps.h b/arch/arm64/include/asm/cpucaps.h
index 4820d441bfb9..7a2bbbfdff49 100644
--- a/arch/arm64/include/asm/cpucaps.h
+++ b/arch/arm64/include/asm/cpucaps.h
@@ -41,7 +41,8 @@
 #define ARM64_WORKAROUND_CAVIUM_30115  20
 #define ARM64_HAS_DCPOP			21
 #define ARM64_HAS_RAS_EXTN		22
+#define ARM64_HAS_IESB			23
 
-#define ARM64_NCAPS			23
+#define ARM64_NCAPS			24
 
 #endif /* __ASM_CPUCAPS_H */
diff --git a/arch/arm64/include/asm/processor.h 
b/arch/arm64/include/asm/processor.h
index 29adab8138c3..6b72ddc33d06 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -193,5 +193,6 @@ static inline void spin_lock_prefetch(const void *ptr)
 
 int cpu_enable_pan(void *__unused);
 int cpu_enable_cache_maint_trap(void *__unused);
+int cpu_enable_iesb(void *__unused);
 
 #endif /* __ASM_PROCESSOR_H */
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 64e2a80fd749..4500a70c6a57 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -297,6 +297,7 @@
 
 /* Common SCTLR_ELx flags. */
 #define SCTLR_ELx_EE(1 << 25)
+#define SCTLR_ELx_IESB (1 << 21)
 #define SCTLR_ELx_I(1 << 12)
 #define SCTLR_ELx_SA   (1 << 3)
 #define SCTLR_ELx_C(1 << 2)
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 0fc017b55cb1..356a5de51f5e 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -912,6 +912,17 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
.field_pos = ID_AA64PFR0_RAS_SHIFT,
.min_field_value = ID_AA64PFR0_RAS_V1,
},
+   {
+   .desc = "Implicit Error Synchronization Barrier",
+   .capability = ARM64_HAS_IESB,
+   .def_scope = SCOPE_SYSTEM,
+   .matches = has_cpuid_feature,
+   .sys_reg = SYS_ID_AA64MMFR2_EL1,
+   .sign = FTR_UNSIGNED,
+   .field_pos = ID_AA64MMFR2_IESB_SHIFT,
+   .min_field_value = 1,
+   .enable = cpu_enable_iesb,
+   },
 #endif /* CONFIG_ARM64_RAS_EXTN */
{},
 };
@@ -1321,3 +1332,11 @@ static int __init enable_mrs_emulation(void)
 }
 
 late_initcall(enable_mrs_emulation);
+
+int cpu_enable_iesb(void *__unused)
+{
+   if (cpus_have_cap(ARM64_HAS_RAS_EXTN))
+   config_sctlr_el1(0, SCTLR_ELx_IESB);
+
+   return 0;
+}
diff --git a/arch/arm64/kvm/hyp-init.S b/arch/arm64/kvm/hyp-init.S
index 3f9615582377..8983e9473017 100644
--- a/arch/arm64/kvm/hyp-init.S
+++ b/arch/arm64/kvm/hyp-init.S
@@ -113,6 +113,9 @@ __do_hyp_init:
 */
ldr x4, =(SCTLR_EL2_RES1 | (SCTLR_ELx_FLAGS & ~SCTLR_ELx_A))
 CPU_BE(orr x4, x4, #SCTLR_ELx_EE)
+alternative_if ARM64_HAS_IESB
+   orr x4, x4, #SCTLR_ELx_IESB
+alternative_else_nop_endif
msr sctlr_el2, x4
isb
 
-- 
2.13.3

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v4 06/21] arm64: entry.S: convert el1_sync

2017-10-19 Thread James Morse
el1_sync unmasks exceptions on a case-by-case basis: debug exceptions
are unmasked unless this was a debug exception, and IRQs are unmasked
for instruction and data aborts only if the interrupted context had
irqs unmasked.

Following our 'dai' order, el1_dbg should run with everything masked.
For the other cases we can inherit whatever we interrupted.

Add a macro inherit_daif to set daif based on the interrupted pstate.

Signed-off-by: James Morse 
Reviewed-by: Julien Thierry 
Reviewed-by: Catalin Marinas 
---
 arch/arm64/include/asm/assembler.h |  6 ++
 arch/arm64/kernel/entry.S  | 12 
 2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/include/asm/assembler.h 
b/arch/arm64/include/asm/assembler.h
index 1b0972046a72..abb5abd61ddb 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -48,6 +48,12 @@
msr daif, \flags
.endm
 
+   /* Only on aarch64 pstate, PSR_D_BIT is different for aarch32 */
+   .macro  inherit_daif, pstate:req, tmp:req
+   and \tmp, \pstate, #(PSR_D_BIT | PSR_A_BIT | PSR_I_BIT | PSR_F_BIT)
+   msr daif, \tmp
+   .endm
+
 /*
  * Enable and disable interrupts.
  */
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index f7d7bf9d76e7..bd54115972a4 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -510,11 +510,7 @@ el1_da:
 * Data abort handling
 */
mrs x3, far_el1
-   enable_dbg
-   // re-enable interrupts if they were enabled in the aborted context
-   tbnzx23, #7, 1f // PSR_I_BIT
-   enable_irq
-1:
+	inherit_daif	pstate=x23, tmp=x2
clear_address_tag x0, x3
mov x2, sp  // struct pt_regs
bl  do_mem_abort
@@ -525,7 +521,7 @@ el1_sp_pc:
 * Stack or PC alignment exception handling
 */
mrs x0, far_el1
-   enable_dbg
+	inherit_daif	pstate=x23, tmp=x2
mov x2, sp
bl  do_sp_pc_abort
ASM_BUG()
@@ -533,7 +529,7 @@ el1_undef:
/*
 * Undefined instruction
 */
-   enable_dbg
+	inherit_daif	pstate=x23, tmp=x2
mov x0, sp
bl  do_undefinstr
ASM_BUG()
@@ -550,7 +546,7 @@ el1_dbg:
kernel_exit 1
 el1_inv:
// TODO: add support for undefined instructions in kernel mode
-   enable_dbg
+	inherit_daif	pstate=x23, tmp=x2
mov x0, sp
mov x2, x1
mov x1, #BAD_SYNC
-- 
2.13.3

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v4 11/21] arm64: cpufeature: Detect CPU RAS Extensions

2017-10-19 Thread James Morse
From: Xie XiuQi 

ARM's v8.2 Extensions add support for Reliability, Availability and
Serviceability (RAS). On CPUs with these extensions system software
can use additional barriers to isolate errors and determine if faults
are pending.

Add cpufeature detection and a barrier in the context-switch code.
There is no need to use alternatives for this as CPUs that don't
support this feature will treat the instruction as a nop.

Platform level RAS support may require additional firmware support.
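
The process.c hunk isn't visible in this digest, so as a sketch only (the
placement and comment below are assumptions, not the patch's exact hunk),
the context-switch barrier amounts to issuing esb() alongside the existing
dsb() in __switch_to():

  /* Illustrative only: the barriers run on every task switch. */
  static void example_switch_to_barriers(void)
  {
          /* Complete pending memory operations before the task can migrate. */
          dsb(ish);

          /*
           * With the RAS extensions, also synchronise any error the outgoing
           * task may have raised, so it is recorded before the next task runs.
           * 'hint #16' is a NOP on CPUs without the extensions, so no
           * alternative is needed.
           */
          esb();
  }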

Signed-off-by: Xie XiuQi 
[Rebased, added esb and config option, reworded commit message]
Signed-off-by: James Morse 
Reviewed-by: Catalin Marinas 
---
 arch/arm64/Kconfig   | 16 
 arch/arm64/include/asm/barrier.h |  1 +
 arch/arm64/include/asm/cpucaps.h |  3 ++-
 arch/arm64/include/asm/sysreg.h  |  2 ++
 arch/arm64/kernel/cpufeature.c   | 13 +
 arch/arm64/kernel/process.c  |  3 +++
 6 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 70dfe4e9ccc5..b68f5e93baac 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -973,6 +973,22 @@ config ARM64_PMEM
  operations if DC CVAP is not supported (following the behaviour of
  DC CVAP itself if the system does not define a point of persistence).
 
+config ARM64_RAS_EXTN
+   bool "Enable support for RAS CPU Extensions"
+   default y
+   help
+ CPUs that support the Reliability, Availability and Serviceability
+ (RAS) Extensions, part of ARMv8.2 are able to track faults and
+ errors, classify them and report them to software.
+
+ On CPUs with these extensions system software can use additional
+ barriers to determine if faults are pending and read the
+ classification from a new set of registers.
+
+ Selecting this feature will allow the kernel to use these barriers
+ and access the new registers if the system supports the extension.
+ Platform RAS features may additionally depend on firmware support.
+
 endmenu
 
 config ARM64_MODULE_CMODEL_LARGE
diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h
index 0fe7e43b7fbc..8b0a0eb67625 100644
--- a/arch/arm64/include/asm/barrier.h
+++ b/arch/arm64/include/asm/barrier.h
@@ -30,6 +30,7 @@
 #define isb()  asm volatile("isb" : : : "memory")
 #define dmb(opt)   asm volatile("dmb " #opt : : : "memory")
 #define dsb(opt)   asm volatile("dsb " #opt : : : "memory")
+#define esb()  asm volatile("hint #16"  : : : "memory")
 
 #define mb()   dsb(sy)
 #define rmb()  dsb(ld)
diff --git a/arch/arm64/include/asm/cpucaps.h b/arch/arm64/include/asm/cpucaps.h
index 8da621627d7c..4820d441bfb9 100644
--- a/arch/arm64/include/asm/cpucaps.h
+++ b/arch/arm64/include/asm/cpucaps.h
@@ -40,7 +40,8 @@
 #define ARM64_WORKAROUND_858921		19
 #define ARM64_WORKAROUND_CAVIUM_30115	20
 #define ARM64_HAS_DCPOP			21
+#define ARM64_HAS_RAS_EXTN		22
 
-#define ARM64_NCAPS			22
+#define ARM64_NCAPS			23
 
 #endif /* __ASM_CPUCAPS_H */
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index f707fed5886f..64e2a80fd749 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -332,6 +332,7 @@
 #define ID_AA64ISAR1_DPB_SHIFT 0
 
 /* id_aa64pfr0 */
+#define ID_AA64PFR0_RAS_SHIFT  28
 #define ID_AA64PFR0_GIC_SHIFT  24
 #define ID_AA64PFR0_ASIMD_SHIFT20
 #define ID_AA64PFR0_FP_SHIFT   16
@@ -340,6 +341,7 @@
 #define ID_AA64PFR0_EL1_SHIFT  4
 #define ID_AA64PFR0_EL0_SHIFT  0
 
+#define ID_AA64PFR0_RAS_V1 0x1
 #define ID_AA64PFR0_FP_NI  0xf
 #define ID_AA64PFR0_FP_SUPPORTED   0x0
 #define ID_AA64PFR0_ASIMD_NI   0xf
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index cd52d365d1f0..0fc017b55cb1 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -125,6 +125,7 @@ static const struct arm64_ftr_bits ftr_id_aa64isar1[] = {
 };
 
 static const struct arm64_ftr_bits ftr_id_aa64pfr0[] = {
+	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_EXACT, ID_AA64PFR0_RAS_SHIFT, 4, 0),
 	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_EXACT, ID_AA64PFR0_GIC_SHIFT, 4, 0),
 	S_ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR0_ASIMD_SHIFT, 4, ID_AA64PFR0_ASIMD_NI),
 	S_ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR0_FP_SHIFT, 4, ID_AA64PFR0_FP_NI),
@@ -900,6 +901,18 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
.min_field_value = 1,
},
 #endif
+#ifdef CONFIG_ARM64_RAS_EXTN
+   {
+   

[PATCH v4 09/21] KVM: arm/arm64: mask/unmask daif around VHE guests

2017-10-19 Thread James Morse
Non-VHE systems take an exception to EL2 in order to world-switch into the
guest. When returning from the guest KVM implicitly restores the DAIF
flags when it returns to the kernel at EL1.

With VHE none of this exception-level jumping happens, so KVM's
world-switch code is exposed to the host kernel's DAIF values, and KVM
spills the guest-exit DAIF values back into the host kernel.
On entry to a guest we have Debug and SError exceptions unmasked; KVM
has switched VBAR but isn't prepared to handle these. On guest exit
Debug exceptions are left disabled once we return to the host and will
stay this way until we enter user space.

Add a helper to mask/unmask DAIF around VHE guests. The unmask can only
happen after the host's VBAR value has been synchronised by the isb in
__vhe_hyp_call (via kvm_call_hyp()). Masking could be as late as
setting KVM's VBAR value, but is kept here for symmetry.

Signed-off-by: James Morse 

---
Give me a kick if you want this reworked as a fix (which will then
conflict with this series), or a backportable version.

 arch/arm64/include/asm/kvm_host.h | 10 ++
 virt/kvm/arm/arm.c|  4 
 2 files changed, 14 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index e923b58606e2..a0e2f7962401 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -384,4 +385,13 @@ static inline void __cpu_init_stage2(void)
  "PARange is %d bits, unsupported configuration!", parange);
 }
 
+static inline void kvm_arm_vhe_guest_enter(void)
+{
+   local_daif_mask();
+}
+
+static inline void kvm_arm_vhe_guest_exit(void)
+{
+   local_daif_restore(DAIF_PROCCTX_NOIRQ);
+}
 #endif /* __ARM64_KVM_HOST_H__ */
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index b9f68e4add71..665529924b34 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -698,9 +698,13 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 */
trace_kvm_entry(*vcpu_pc(vcpu));
guest_enter_irqoff();
+   if (has_vhe())
+   kvm_arm_vhe_guest_enter();
 
ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
 
+   if (has_vhe())
+   kvm_arm_vhe_guest_exit();
vcpu->mode = OUTSIDE_GUEST_MODE;
vcpu->stat.exits++;
/*
-- 
2.13.3

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v4 14/21] arm64: kernel: Prepare for a DISR user

2017-10-19 Thread James Morse
KVM would like to consume any pending SError (or RAS error) after guest
exit. Today it has to unmask SError and use dsb+isb to synchronise the
CPU. With the RAS extensions we can use ESB to synchronise any pending
SError.

Add the necessary macros to allow DISR to be read and converted to an
ESR.

We clear the DISR register when we enable the RAS cpufeature. While the
kernel has issued ESB instructions before this point, it has only done so
with SError unmasked, so any value we find in DISR must have belonged to
firmware. Executing an ESB instruction is the only way to update DISR,
so we can expect firmware to have handled any deferred SError. By the
same logic we clear DISR in the idle path.
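
As a worked example of the conversion helper added below: a deferred,
architecturally-defined SError (IDS clear) with a 'corrected' severity keeps
only its AET/EA/DFSC bits and gains the SError exception class:

  /* Worked example, using only the masks defined in this series. */
  static u32 example_disr_to_esr(void)
  {
          u64 disr = ESR_ELx_AET_CE | ESR_ELx_FSC_SERROR;   /* IDS == 0 */

          /*
           * Result: (ESR_ELx_EC_SERROR << ESR_ELx_EC_SHIFT) |
           *         ESR_ELx_AET_CE | ESR_ELx_FSC_SERROR
           */
          return disr_to_esr(disr);
  }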

Signed-off-by: James Morse 
Reviewed-by: Catalin Marinas 
---
 arch/arm64/include/asm/assembler.h |  7 +++
 arch/arm64/include/asm/esr.h   |  7 +++
 arch/arm64/include/asm/exception.h | 14 ++
 arch/arm64/include/asm/processor.h |  1 +
 arch/arm64/include/asm/sysreg.h|  1 +
 arch/arm64/kernel/cpufeature.c |  9 +
 arch/arm64/mm/proc.S   |  5 +
 7 files changed, 44 insertions(+)

diff --git a/arch/arm64/include/asm/assembler.h 
b/arch/arm64/include/asm/assembler.h
index e4ac505b7b3d..7a918a4cf4f7 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -109,6 +109,13 @@
.endm
 
 /*
+ * RAS Error Synchronization barrier
+ */
+   .macro  esb
+   hint#16
+   .endm
+
+/*
  * NOP sequence
  */
.macro  nops, num
diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
index 8ea52f15bf1c..b3f17fd18b14 100644
--- a/arch/arm64/include/asm/esr.h
+++ b/arch/arm64/include/asm/esr.h
@@ -136,6 +136,13 @@
 #define ESR_ELx_WFx_ISS_WFE(UL(1) << 0)
 #define ESR_ELx_xVC_IMM_MASK   ((1UL << 16) - 1)
 
+#define DISR_EL1_IDS   (UL(1) << 24)
+/*
+ * DISR_EL1 and ESR_ELx share the bottom 13 bits, but the RES0 bits may mean
+ * different things in the future...
+ */
+#define DISR_EL1_ESR_MASK  (ESR_ELx_AET | ESR_ELx_EA | ESR_ELx_FSC)
+
 /* ESR value templates for specific events */
 
 /* BRK instruction trap from AArch64 state */
diff --git a/arch/arm64/include/asm/exception.h 
b/arch/arm64/include/asm/exception.h
index 0c2eec490abf..bc30429d8e91 100644
--- a/arch/arm64/include/asm/exception.h
+++ b/arch/arm64/include/asm/exception.h
@@ -18,6 +18,8 @@
 #ifndef __ASM_EXCEPTION_H
 #define __ASM_EXCEPTION_H
 
+#include 
+
 #include 
 
 #define __exception__attribute__((section(".exception.text")))
@@ -27,4 +29,16 @@
 #define __exception_irq_entry  __exception
 #endif
 
+static inline u32 disr_to_esr(u64 disr)
+{
+   unsigned int esr = ESR_ELx_EC_SERROR << ESR_ELx_EC_SHIFT;
+
+   if ((disr & DISR_EL1_IDS) == 0)
+   esr |= (disr & DISR_EL1_ESR_MASK);
+   else
+   esr |= (disr & ESR_ELx_ISS_MASK);
+
+   return esr;
+}
+
 #endif /* __ASM_EXCEPTION_H */
diff --git a/arch/arm64/include/asm/processor.h 
b/arch/arm64/include/asm/processor.h
index 6b72ddc33d06..9de3f839be8a 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -194,5 +194,6 @@ static inline void spin_lock_prefetch(const void *ptr)
 int cpu_enable_pan(void *__unused);
 int cpu_enable_cache_maint_trap(void *__unused);
 int cpu_enable_iesb(void *__unused);
+int cpu_clear_disr(void *__unused);
 
 #endif /* __ASM_PROCESSOR_H */
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 4500a70c6a57..427c36bc5dd6 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -179,6 +179,7 @@
 #define SYS_AMAIR_EL1  sys_reg(3, 0, 10, 3, 0)
 
 #define SYS_VBAR_EL1   sys_reg(3, 0, 12, 0, 0)
+#define SYS_DISR_EL1   sys_reg(3, 0, 12, 1,  1)
 
 #define SYS_ICC_IAR0_EL1   sys_reg(3, 0, 12, 8, 0)
 #define SYS_ICC_EOIR0_EL1  sys_reg(3, 0, 12, 8, 1)
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 356a5de51f5e..e799b72f1395 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -911,6 +911,7 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
.sign = FTR_UNSIGNED,
.field_pos = ID_AA64PFR0_RAS_SHIFT,
.min_field_value = ID_AA64PFR0_RAS_V1,
+   .enable = cpu_clear_disr,
},
{
.desc = "Implicit Error Synchronization Barrier",
@@ -1340,3 +1341,11 @@ int cpu_enable_iesb(void *__unused)
 
return 0;
 }
+
+int cpu_clear_disr(void *__unused)
+{
+   /* Firmware may have left a deferred SError in this register. */
+   write_sysreg_s(0, SYS_DISR_EL1);
+
+   return 0;
+}
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index 95233dfc4c39..ac571223672d 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -124,6 +124,11 @@ 

[PATCH v4 04/21] arm64: Mask all exceptions during kernel_exit

2017-10-19 Thread James Morse
To take RAS Exceptions as quickly as possible we need to keep SError
unmasked as much as possible. We need to mask it during kernel_exit
as taking an error from this code will overwrite the exception-registers.

Adding a naked 'disable_daif' to kernel_exit causes a performance problem
for micro-benchmarks that do no real work (e.g. calling getpid() in a
loop). This is because the ret_to_user loop has already masked IRQs so
that the TIF_WORK_MASK thread flags can't change underneath it; adding
disable_daif is an additional self-synchronising operation.

In the future, the RAS APEI code may need to modify the TIF_WORK_MASK
flags from an SError, in which case the ret_to_user loop must mask SError
while it examines the flags.

Disable all exceptions for return to EL1. For return to EL0, get the
ret_to_user loop to leave all exceptions masked once it has done its
work; this avoids an extra pstate-write.

Signed-off-by: James Morse 
Reviewed-by: Julien Thierry 
Reviewed-by: Catalin Marinas 

---
Changes since v3:
* swapped verb/daif word-order.

 arch/arm64/kernel/entry.S  | 10 +-
 arch/arm64/kernel/signal.c |  8 ++--
 2 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index e1c59d4008a8..f7d7bf9d76e7 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -221,6 +221,8 @@ alternative_else_nop_endif
 
.macro  kernel_exit, el
.if \el != 0
+   disable_daif
+
/* Restore the task's original addr_limit. */
ldr x20, [sp, #S_ORIG_ADDR_LIMIT]
str x20, [tsk, #TSK_TI_ADDR_LIMIT]
@@ -517,8 +519,6 @@ el1_da:
mov x2, sp  // struct pt_regs
bl  do_mem_abort
 
-   // disable interrupts before pulling preserved data off the stack
-   disable_irq
kernel_exit 1
 el1_sp_pc:
/*
@@ -793,7 +793,7 @@ ENDPROC(el0_irq)
  * and this includes saving x0 back into the kernel stack.
  */
 ret_fast_syscall:
-   disable_irq // disable interrupts
+   disable_daif
str x0, [sp, #S_X0] // returned x0
ldr x1, [tsk, #TSK_TI_FLAGS]// re-check for syscall tracing
and x2, x1, #_TIF_SYSCALL_WORK
@@ -803,7 +803,7 @@ ret_fast_syscall:
enable_step_tsk x1, x2
kernel_exit 0
 ret_fast_syscall_trace:
-   enable_irq  // enable interrupts
+   enable_daif
b   __sys_trace_return_skipped  // we already saved x0
 
 /*
@@ -821,7 +821,7 @@ work_pending:
  * "slow" syscall return path.
  */
 ret_to_user:
-   disable_irq // disable interrupts
+   disable_daif
ldr x1, [tsk, #TSK_TI_FLAGS]
and x2, x1, #_TIF_WORK_MASK
cbnzx2, work_pending
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index 0bdc96c61bc0..8e6500c9471b 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -31,6 +31,7 @@
 #include 
 #include 
 
+#include 
 #include 
 #include 
 #include 
@@ -756,9 +757,12 @@ asmlinkage void do_notify_resume(struct pt_regs *regs,
addr_limit_user_check();
 
if (thread_flags & _TIF_NEED_RESCHED) {
+   /* Unmask Debug and SError for the next task */
+   local_daif_restore(DAIF_PROCCTX_NOIRQ);
+
schedule();
} else {
-   local_irq_enable();
+   local_daif_restore(DAIF_PROCCTX);
 
if (thread_flags & _TIF_UPROBE)
uprobe_notify_resume(regs);
@@ -775,7 +779,7 @@ asmlinkage void do_notify_resume(struct pt_regs *regs,
fpsimd_restore_current_state();
}
 
-   local_irq_disable();
+   local_daif_mask();
thread_flags = READ_ONCE(current_thread_info()->flags);
} while (thread_flags & _TIF_WORK_MASK);
 }
-- 
2.13.3



[PATCH v4 05/21] arm64: entry.S: Remove disable_dbg

2017-10-19 Thread James Morse
enable_step_tsk is the only user of disable_dbg, which doesn't respect
our 'dai' order for exception masking. enable_step_tsk may enable
single-step, so previously needed to mask debug exceptions to prevent us
from single-stepping kernel_exit. enable_step_tsk is called at the end
of the ret_to_user loop, which has already masked all exceptions so this
is no longer needed.

Remove disable_dbg, add a comment that enable_step_tsk's caller should
have masked debug.

Signed-off-by: James Morse 
Reviewed-by: Julien Thierry 
Reviewed-by: Catalin Marinas 
---
 arch/arm64/include/asm/assembler.h | 9 +
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/arch/arm64/include/asm/assembler.h 
b/arch/arm64/include/asm/assembler.h
index 114e741ca873..1b0972046a72 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -68,13 +68,6 @@
msr daif, \flags
.endm
 
-/*
- * Enable and disable debug exceptions.
- */
-   .macro  disable_dbg
-   msr daifset, #8
-   .endm
-
.macro  enable_dbg
msr daifclr, #8
.endm
@@ -88,9 +81,9 @@
 9990:
.endm
 
+   /* call with daif masked */
.macro  enable_step_tsk, flgs, tmp
tbz \flgs, #TIF_SINGLESTEP, 9990f
-   disable_dbg
mrs \tmp, mdscr_el1
orr \tmp, \tmp, #1
msr mdscr_el1, \tmp
-- 
2.13.3



[PATCH v4 01/21] arm64: explicitly mask all exceptions

2017-10-19 Thread James Morse
There are a few places where we want to mask all exceptions. Today we
do this in a piecemeal fashion: typically we expect the caller to
have masked irqs and the arch code masks debug exceptions, ignoring
SError, which is probably masked.

Make it clear that 'mask all exceptions' is the intention by adding
helpers to do exactly that.

This will let us unmask SError without having to add 'oh and SError'
to these paths.
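
For illustration, a minimal sketch of how the new helpers are meant to be
used (the function below is made up for the example; only the local_daif_*
helpers come from this patch):

static void frob_hw_with_everything_masked(void)
{
	unsigned long flags;

	flags = local_daif_save();	/* mask D, A, I and F, remember the old flags */

	/* ... touch state that must not race with any exception ... */

	local_daif_restore(flags);	/* put the original DAIF bits back */
}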

Signed-off-by: James Morse 
Reviewed-by: Julien Thierry 
Reviewed-by: Catalin Marinas 

---
Remove the disable IRQs comment above cpu_die(): we return from idle via
cpu_resume. This comment is confusing once the local_irq_disable() call
is changed.

Changes since v3:
 * Split local_mask_daif into a save and mask versions,
 * swapped verb/daif word-order.
 * Removed {asm,linux} includes - one will always include the other

 arch/arm64/include/asm/assembler.h | 17 ++
 arch/arm64/include/asm/daifflags.h | 69 ++
 arch/arm64/kernel/hibernate.c  |  5 +--
 arch/arm64/kernel/machine_kexec.c  |  4 +--
 arch/arm64/kernel/smp.c|  9 ++---
 arch/arm64/kernel/suspend.c|  7 ++--
 arch/arm64/kernel/traps.c  |  3 +-
 arch/arm64/mm/proc.S   |  9 +++--
 8 files changed, 104 insertions(+), 19 deletions(-)
 create mode 100644 arch/arm64/include/asm/daifflags.h

diff --git a/arch/arm64/include/asm/assembler.h 
b/arch/arm64/include/asm/assembler.h
index d58a6253c6ab..114e741ca873 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -31,6 +31,23 @@
 #include 
 #include 
 
+   .macro save_and_disable_daif, flags
+   mrs \flags, daif
+   msr daifset, #0xf
+   .endm
+
+   .macro disable_daif
+   msr daifset, #0xf
+   .endm
+
+   .macro enable_daif
+   msr daifclr, #0xf
+   .endm
+
+   .macro  restore_daif, flags:req
+   msr daif, \flags
+   .endm
+
 /*
  * Enable and disable interrupts.
  */
diff --git a/arch/arm64/include/asm/daifflags.h 
b/arch/arm64/include/asm/daifflags.h
new file mode 100644
index ..55e2598a8c4c
--- /dev/null
+++ b/arch/arm64/include/asm/daifflags.h
@@ -0,0 +1,69 @@
+/*
+ * Copyright (C) 2017 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_DAIFFLAGS_H
+#define __ASM_DAIFFLAGS_H
+
+#include 
+
+/* mask/save/unmask/restore all exceptions, including interrupts. */
+static inline void local_daif_mask(void)
+{
+   asm volatile(
+   "msrdaifset, #0xf   // local_daif_mask\n"
+   :
+   :
+   : "memory");
+   trace_hardirqs_off();
+}
+
+static inline unsigned long local_daif_save(void)
+{
+   unsigned long flags;
+
+   asm volatile(
+   "mrs%0, daif// local_daif_save\n"
+   : "=r" (flags)
+   :
+   : "memory");
+   local_daif_mask();
+
+   return flags;
+}
+
+static inline void local_daif_unmask(void)
+{
+   trace_hardirqs_on();
+   asm volatile(
+   "msrdaifclr, #0xf   // local_daif_unmask"
+   :
+   :
+   : "memory");
+}
+
+static inline void local_daif_restore(unsigned long flags)
+{
+   if (!arch_irqs_disabled_flags(flags))
+   trace_hardirqs_on();
+   asm volatile(
+   "msrdaif, %0// local_daif_restore"
+   :
+   : "r" (flags)
+   : "memory");
+   if (arch_irqs_disabled_flags(flags))
+   trace_hardirqs_off();
+}
+
+#endif
diff --git a/arch/arm64/kernel/hibernate.c b/arch/arm64/kernel/hibernate.c
index 095d3c170f5d..3009b8b80f08 100644
--- a/arch/arm64/kernel/hibernate.c
+++ b/arch/arm64/kernel/hibernate.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -285,7 +286,7 @@ int swsusp_arch_suspend(void)
return -EBUSY;
}
 
-   local_dbg_save(flags);
+   flags = local_daif_save();
 
if (__cpu_suspend_enter()) {
/* make the crash dump kernel image visible/saveable */
@@ -315,7 +316,7 @@ int swsusp_arch_suspend(void)
__cpu_suspend_exit();
}
 
-   local_dbg_restore(flags);
+   local_daif_restore(flags);
 

[PATCH v4 00/21] SError rework + RAS for firmware first support

2017-10-19 Thread James Morse
Hello,

The aim of this series is to enable IESB and add ESB-instructions to let us
kick any pending RAS errors into firmware to be handled by firmware-first.

Not all systems will have this firmware, so these RAS errors will become
pending SErrors. We should take these as quickly as possible and avoid
panic()ing for errors where we could have continued.

The first part of this series reworks the DAIF masking so that SError is
unmasked unless we are handling a debug exception.

The last part provides the same minimal handling for SErrors that interrupt
KVM. KVM is currently unable to handle SErrors during world-switch: unless
they occur during a magic single-instruction window, it hyp-panics. I suspect
this will be easier to fix once the VHE world-switch is further optimised.

KVM's kvm_inject_vabt() needs updating for v8.2 as we can now specify an ESR,
and all-zeros has a RAS meaning.

KVM's existing 'impdef SError to the guest' behaviour probably needs revisiting.
These are errors where we don't know what they mean; they may not be
synchronised by ESB. Today we blame the guest.
My half-baked suggestion would be to make a virtual SError pending, but then
exit to user-space to give Qemu the chance to quit (for virtual machines that
don't generate SError), pend an SError with a new Qemu-specific ESR, or blindly
continue and take KVM's default all-zeros impdef ESR.

Known issues:
 * Synchronous external abort SET severity is not yet considered, all
   synchronous-external-aborts are still considered fatal.

 * KVM-Migration: VDISR_EL2 is exposed to userspace as DISR_EL1, but how should
   HCR_EL2.VSE or VSESR_EL2 be migrated when the guest has an SError pending but
   hasn't taken it yet...?

 * KVM unmasks SError and IRQ before calling handle_exit, so we may be rescheduled
   while holding an uncontained ESR... (this is currently an improvement on
   assuming it's an impdef error we can blame on the guest)
* We need to fix this for APEI's SEI or kernel-first RAS, the guest-exit
  SError handling will need to move to before kvm_arm_vhe_guest_exit().


Changes from v3:
 * Symbol naming around daif flag helpers
 * Removed the IESB kconfig option
 * Moved that nop out of the firing line in the vaxorcism code
 * Added last patch to Trap ERR registers and make them RAZ/WI


Comments and contradictions welcome,

James


Dongjiu Geng (1):
  KVM: arm64: Trap RAS error registers and set HCR_EL2's TERR & TEA

James Morse (18):
  arm64: explicitly mask all exceptions
  arm64: introduce an order for exceptions
  arm64: Move the async/fiq helpers to explicitly set process context
flags
  arm64: Mask all exceptions during kernel_exit
  arm64: entry.S: Remove disable_dbg
  arm64: entry.S: convert el1_sync
  arm64: entry.S convert el0_sync
  arm64: entry.S: convert elX_irq
  KVM: arm/arm64: mask/unmask daif around VHE guests
  arm64: kernel: Survive corrected RAS errors notified by SError
  arm64: cpufeature: Enable IESB on exception entry/return for
firmware-first
  arm64: kernel: Prepare for a DISR user
  KVM: arm64: Set an impdef ESR for Virtual-SError using VSESR_EL2.
  KVM: arm64: Save/Restore guest DISR_EL1
  KVM: arm64: Save ESR_EL2 on guest SError
  KVM: arm64: Handle RAS SErrors from EL1 on guest exit
  KVM: arm64: Handle RAS SErrors from EL2 on guest exit
  KVM: arm64: Take any host SError before entering the guest

Xie XiuQi (2):
  arm64: entry.S: move SError handling into a C function for future
expansion
  arm64: cpufeature: Detect CPU RAS Extentions

 arch/arm64/Kconfig   | 18 +++-
 arch/arm64/include/asm/assembler.h   | 51 ++---
 arch/arm64/include/asm/barrier.h |  1 +
 arch/arm64/include/asm/cpucaps.h |  4 +-
 arch/arm64/include/asm/daifflags.h   | 72 ++
 arch/arm64/include/asm/esr.h | 17 +++
 arch/arm64/include/asm/exception.h   | 14 ++
 arch/arm64/include/asm/irqflags.h| 40 ++---
 arch/arm64/include/asm/kvm_arm.h |  2 +
 arch/arm64/include/asm/kvm_emulate.h | 17 +++
 arch/arm64/include/asm/kvm_host.h| 16 +++
 arch/arm64/include/asm/processor.h   |  2 +
 arch/arm64/include/asm/sysreg.h  | 16 +++
 arch/arm64/include/asm/traps.h   | 36 +++
 arch/arm64/kernel/asm-offsets.c  |  1 +
 arch/arm64/kernel/cpufeature.c   | 41 +
 arch/arm64/kernel/debug-monitors.c   |  5 ++-
 arch/arm64/kernel/entry.S| 86 +---
 arch/arm64/kernel/hibernate.c|  5 ++-
 arch/arm64/kernel/machine_kexec.c|  4 +-
 arch/arm64/kernel/process.c  |  3 ++
 arch/arm64/kernel/setup.c|  8 ++--
 arch/arm64/kernel/signal.c   |  8 +++-
 arch/arm64/kernel/smp.c  | 12 ++---
 arch/arm64/kernel/suspend.c  |  7 +--
 arch/arm64/kernel/traps.c| 64 ++-
 arch/arm64/kvm/handle_exit.c | 19 +++-
 arch/arm64/kvm/hyp-init.S|  

[PATCH v4 03/21] arm64: Move the async/fiq helpers to explicitly set process context flags

2017-10-19 Thread James Morse
Remove the local_{async,fiq}_{en,dis}able macros as they don't respect
our newly defined order and are only used to set the flags for process
context when we bring CPUs online.

Add a helper to do this. The IRQ flag varies as we want it masked on
the boot CPU until we are ready to handle interrupts.
The boot CPU unmasks SError during early boot once it can print an error
message. If we can print an error message about SError, we can do the
same for FIQ. Debug exceptions are already enabled by __cpu_setup(),
which has also configured MDSCR_EL1 to disable MDE and KDE.

Signed-off-by: James Morse 
Reviewed-by: Julien Thierry 
Reviewed-by: Catalin Marinas 

---
Changes since v3:
 * swapped verb/daif word-order.

 arch/arm64/include/asm/daifflags.h | 3 +++
 arch/arm64/include/asm/irqflags.h  | 6 --
 arch/arm64/kernel/setup.c  | 8 +---
 arch/arm64/kernel/smp.c| 3 +--
 4 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/arch/arm64/include/asm/daifflags.h 
b/arch/arm64/include/asm/daifflags.h
index 55e2598a8c4c..22e4c83de5a5 100644
--- a/arch/arm64/include/asm/daifflags.h
+++ b/arch/arm64/include/asm/daifflags.h
@@ -18,6 +18,9 @@
 
 #include 
 
+#define DAIF_PROCCTX   0
+#define DAIF_PROCCTX_NOIRQ PSR_I_BIT
+
 /* mask/save/unmask/restore all exceptions, including interrupts. */
 static inline void local_daif_mask(void)
 {
diff --git a/arch/arm64/include/asm/irqflags.h 
b/arch/arm64/include/asm/irqflags.h
index 9ecdca7011f0..24692edf1a69 100644
--- a/arch/arm64/include/asm/irqflags.h
+++ b/arch/arm64/include/asm/irqflags.h
@@ -66,12 +66,6 @@ static inline void arch_local_irq_disable(void)
: "memory");
 }
 
-#define local_fiq_enable() asm("msr daifclr, #1" : : : "memory")
-#define local_fiq_disable()    asm("msr daifset, #1" : : : "memory")
-
-#define local_async_enable()   asm("msr daifclr, #4" : : : "memory")
-#define local_async_disable()  asm("msr daifset, #4" : : : "memory")
-
 /*
  * Save the current interrupt enable state.
  */
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index d4b740538ad5..ad285f024934 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -48,6 +48,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -262,10 +263,11 @@ void __init setup_arch(char **cmdline_p)
parse_early_param();
 
/*
-*  Unmask asynchronous aborts after bringing up possible earlycon.
-* (Report possible System Errors once we can report this occurred)
+* Unmask asynchronous aborts and fiq after bringing up possible
+* earlycon. (Report possible System Errors once we can report this
+* occurred).
 */
-   local_async_enable();
+   local_daif_restore(DAIF_PROCCTX_NOIRQ);
 
/*
 * TTBR0 is only used for the identity mapping at this stage. Make it
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 5a407eba01f7..d92e03faa51a 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -272,8 +272,7 @@ asmlinkage void secondary_start_kernel(void)
set_cpu_online(cpu, true);
complete(&cpu_running);
 
-   local_irq_enable();
-   local_async_enable();
+   local_daif_restore(DAIF_PROCCTX);
 
/*
 * OK, it's off to the idle thread for us
-- 
2.13.3



[PATCH v4 02/21] arm64: introduce an order for exceptions

2017-10-19 Thread James Morse
Currently SError is always masked in the kernel. To support RAS exceptions
using SError on hardware with the v8.2 RAS Extensions we need to unmask
SError as much as possible.

Let's define an order for masking and unmasking exceptions. 'dai' is
memorable and effectively what we have today.

Disabling debug exceptions should cause all other exceptions to be masked.
Masking SError should mask irq, but not disable debug exceptions.
Masking irqs has no side effects for other flags. Keeping to this order
makes it easier for entry.S to know which exceptions should be unmasked.

FIQ is never expected, but we mask it when we mask debug exceptions, and
unmask it at all other times.

Given masking debug exceptions masks everything, we don't need macros
to save/restore that bit independently. Remove them and switch the last
caller over to use the daif calls.
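
For reference, the daifset/daifclr immediates used throughout this series
select the individual exception classes as below (the macro names are invented
for illustration only; the kernel open-codes the immediates):

/* Illustrative names only - not kernel definitions */
#define DAIFBIT_FIQ	(1 << 0)	/* msr daifset, #1   - mask FIQ */
#define DAIFBIT_IRQ	(1 << 1)	/* msr daifset, #2   - mask IRQ */
#define DAIFBIT_ABT	(1 << 2)	/* msr daifset, #4   - mask SError */
#define DAIFBIT_DBG	(1 << 3)	/* msr daifset, #8   - mask debug exceptions */
#define DAIFBIT_ALL	0xf		/* msr daifset, #0xf - mask everything */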

Signed-off-by: James Morse 
Reviewed-by: Julien Thierry 
Reviewed-by: Catalin Marinas 

---
Changes since v3:
 * Use save_daif version, swapped verb/daif word-order.
 * tweak last sentence of commit message.

 arch/arm64/include/asm/irqflags.h  | 34 +-
 arch/arm64/kernel/debug-monitors.c |  5 +++--
 2 files changed, 16 insertions(+), 23 deletions(-)

diff --git a/arch/arm64/include/asm/irqflags.h 
b/arch/arm64/include/asm/irqflags.h
index 8c581281fa12..9ecdca7011f0 100644
--- a/arch/arm64/include/asm/irqflags.h
+++ b/arch/arm64/include/asm/irqflags.h
@@ -21,6 +21,19 @@
 #include 
 
 /*
+ * Aarch64 has flags for masking: Debug, Asynchronous (serror), Interrupts and
+ * FIQ exceptions, in the 'daif' register. We mask and unmask them in 'dai'
+ * order:
+ * Masking debug exceptions causes all other exceptions to be masked too.
+ * Masking SError masks irq, but not debug exceptions. Masking irqs has no
+ * side effects for other flags. Keeping to this order makes it easier for
+ * entry.S to know which exceptions should be unmasked.
+ *
+ * FIQ is never expected, but we mask it when we disable debug exceptions, and
+ * unmask it at all other times.
+ */
+
+/*
  * CPU interrupt mask handling.
  */
 static inline unsigned long arch_local_irq_save(void)
@@ -89,26 +102,5 @@ static inline int arch_irqs_disabled_flags(unsigned long 
flags)
 {
return flags & PSR_I_BIT;
 }
-
-/*
- * save and restore debug state
- */
-#define local_dbg_save(flags)  \
-   do {\
-   typecheck(unsigned long, flags);\
-   asm volatile(   \
-   "mrs%0, daif// local_dbg_save\n"\
-   "msrdaifset, #8"\
-   : "=r" (flags) : : "memory");   \
-   } while (0)
-
-#define local_dbg_restore(flags)   \
-   do {\
-   typecheck(unsigned long, flags);\
-   asm volatile(   \
-   "msrdaif, %0// local_dbg_restore\n" \
-   : : "r" (flags) : "memory");\
-   } while (0)
-
 #endif
 #endif
diff --git a/arch/arm64/kernel/debug-monitors.c 
b/arch/arm64/kernel/debug-monitors.c
index c7ef99904934..a88b6ccebbb4 100644
--- a/arch/arm64/kernel/debug-monitors.c
+++ b/arch/arm64/kernel/debug-monitors.c
@@ -30,6 +30,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -46,9 +47,9 @@ u8 debug_monitors_arch(void)
 static void mdscr_write(u32 mdscr)
 {
unsigned long flags;
-   local_dbg_save(flags);
+   flags = local_daif_save();
write_sysreg(mdscr, mdscr_el1);
-   local_dbg_restore(flags);
+   local_daif_restore(flags);
 }
 NOKPROBE_SYMBOL(mdscr_write);
 
-- 
2.13.3



Re: [PATCH v4 3/7] KVM: arm/arm64: Don't cache the timer IRQ level

2017-10-19 Thread Christoffer Dall
On Tue, Oct 10, 2017 at 11:39:58AM +0100, Marc Zyngier wrote:
> On Fri, Sep 15 2017 at  3:19:50 pm BST, Christoffer Dall 
>  wrote:
> > From: Christoffer Dall 
> >
> > The timer was modeled after a strict idea of modelling an interrupt line
> > level in software, meaning that only transitions in the level needed to
> > be reported to the VGIC.  This works well for the timer, because the
> > arch timer code is in complete control of the device and can track the
> > transitions of the line.
> >
> > However, as we are about to support using the HW bit in the VGIC not
> > just for the timer, but also for VFIO which cannot track transitions of
> > the interrupt line, we have to decide on an interface for level
> > triggered mapped interrupts to the GIC, which both the timer and VFIO
> > can use.
> >
> > VFIO only sees an asserting transition of the physical interrupt line,
> > and tells the VGIC when that happens.  That means that part of the
> > interrupt flow is offloaded to the hardware.
> >
> > To use the same interface for VFIO devices and the timer, we therefore
> > have to change the timer (we cannot change VFIO because it doesn't know
> > the details of the device it is assigning to a VM).
> >
> > Luckily, changing the timer is simple, we just need to stop 'caching'
> > the line level, but instead let the VGIC know the state of the timer on
> > every entry to the guest, and the VGIC can ignore notifications using
> > its validate mechanism.
> >
> > Signed-off-by: Christoffer Dall 
> > ---
> >  virt/kvm/arm/arch_timer.c | 14 --
> >  1 file changed, 8 insertions(+), 6 deletions(-)
> >
> > diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> > index 8e89d63..2a5f877 100644
> > --- a/virt/kvm/arm/arch_timer.c
> > +++ b/virt/kvm/arm/arch_timer.c
> > @@ -219,9 +219,10 @@ static void kvm_timer_update_irq(struct kvm_vcpu 
> > *vcpu, bool new_level,
> > int ret;
> >  
> > timer_ctx->active_cleared_last = false;
> > +   if (timer_ctx->irq.level != new_level)
> > +   trace_kvm_timer_update_irq(vcpu->vcpu_id, timer_ctx->irq.irq,
> > +  new_level);
> > timer_ctx->irq.level = new_level;
> > -   trace_kvm_timer_update_irq(vcpu->vcpu_id, timer_ctx->irq.irq,
> > -  timer_ctx->irq.level);
> >  
> > if (likely(irqchip_in_kernel(vcpu->kvm))) {
> > ret = kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id,
> > @@ -241,6 +242,7 @@ static void kvm_timer_update_state(struct kvm_vcpu 
> > *vcpu)
> > struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> > struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
> > struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
> > +   bool level;
> >  
> > /*
> >  * If userspace modified the timer registers via SET_ONE_REG before
> > @@ -251,11 +253,11 @@ static void kvm_timer_update_state(struct kvm_vcpu 
> > *vcpu)
> > if (unlikely(!timer->enabled))
> > return;
> >  
> > -   if (kvm_timer_should_fire(vtimer) != vtimer->irq.level)
> > -   kvm_timer_update_irq(vcpu, !vtimer->irq.level, vtimer);
> > +   level = kvm_timer_should_fire(vtimer);
> > +   kvm_timer_update_irq(vcpu, level, vtimer);
> >  
> > -   if (kvm_timer_should_fire(ptimer) != ptimer->irq.level)
> > -   kvm_timer_update_irq(vcpu, !ptimer->irq.level, ptimer);
> > +   level = kvm_timer_should_fire(ptimer);
> > +   kvm_timer_update_irq(vcpu, level, ptimer);
> 
> Well, at this stage, you might as well fold the kvm_timer_should_fire()
> into kvm_timer_update_irq() and drop the level parameter. But I suspect
> this is going to clash badly with your timer series?
> 

Yes, it's doing extra unnecessary work in the critical path on IRQ
injections from the ISR after the timer series.

How about I rebase this series on the timer series and do a resend of
this one, without additional changes, and we have a look at it?
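
For reference, a rough sketch of the suggested folding (not what was applied
here), derived from the hunk quoted above:

static void kvm_timer_update_irq(struct kvm_vcpu *vcpu,
				 struct arch_timer_context *timer_ctx)
{
	/* Derive the level here instead of taking it as a parameter */
	bool new_level = kvm_timer_should_fire(timer_ctx);

	timer_ctx->active_cleared_last = false;
	if (timer_ctx->irq.level != new_level)
		trace_kvm_timer_update_irq(vcpu->vcpu_id, timer_ctx->irq.irq,
					   new_level);
	timer_ctx->irq.level = new_level;

	/* ... inject into the VGIC as before ... */
}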

> >  }
> >  
> >  /* Schedule the background timer for the emulated timer. */
> 
> Otherwise:
> 
> Reviewed-by: Marc Zyngier 
> 
Thanks!
-Christoffer


Re: [PATCH v3 18/20] KVM: arm/arm64: Avoid phys timer emulation in vcpu entry/exit

2017-10-19 Thread Christoffer Dall
On Tue, Oct 10, 2017 at 10:45:15AM +0100, Marc Zyngier wrote:
> On Sat, Sep 23 2017 at  2:42:05 am BST, Christoffer Dall  
> wrote:
> > There is no need to schedule and cancel a hrtimer when entering and
> > exiting the guest, because we know when the physical timer is going to
> > fire when the guest programs it, and we can simply program the hrtimer
> > at that point.
> >
> > Now when the register modifications from the guest go through the
> > kvm_arm_timer_set/get_reg functions, which always call
> > kvm_timer_update_state(), we can simply consider the timer state in this
> > function and schedule and cancel the timers as needed.
> >
> > This avoids looking at the physical timer emulation state when entering
> > and exiting the VCPU, allowing for faster servicing of the VM when
> > needed.
> >
> > Signed-off-by: Christoffer Dall 
> > ---
> >  virt/kvm/arm/arch_timer.c | 75 
> > ---
> >  1 file changed, 51 insertions(+), 24 deletions(-)
> >
> > diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> > index 1f82c21..aa18a5d 100644
> > --- a/virt/kvm/arm/arch_timer.c
> > +++ b/virt/kvm/arm/arch_timer.c
> > @@ -199,7 +199,27 @@ static enum hrtimer_restart kvm_bg_timer_expire(struct 
> > hrtimer *hrt)
> >  
> >  static enum hrtimer_restart kvm_phys_timer_expire(struct hrtimer *hrt)
> >  {
> > -   WARN(1, "Timer only used to ensure guest exit - unexpected event.");
> > +   struct arch_timer_context *ptimer;
> > +   struct arch_timer_cpu *timer;
> > +   struct kvm_vcpu *vcpu;
> > +   u64 ns;
> > +
> > +   timer = container_of(hrt, struct arch_timer_cpu, phys_timer);
> > +   vcpu = container_of(timer, struct kvm_vcpu, arch.timer_cpu);
> > +   ptimer = vcpu_ptimer(vcpu);
> > +
> > +   /*
> > +* Check that the timer has really expired from the guest's
> > +* PoV (NTP on the host may have forced it to expire
> > +* early). If not ready, schedule for a later time.
> > +*/
> > +   ns = kvm_timer_compute_delta(ptimer);
> > +   if (unlikely(ns)) {
> > +   hrtimer_forward_now(hrt, ns_to_ktime(ns));
> > +   return HRTIMER_RESTART;
> > +   }
> 
> Don't we already have a similar logic for the background timer (I must
> admit I've lost track of how we changed things in this series)? If so,
> can we make this common code?
> 

I looked at it, but the functions are slightly different.  In the
phys_timer, we just have to calculate the delta and figure out if we
need to restart it.  In the bg_timer, we have to compute the delta of
both the vtimer and the ptimer, figure out the earliest one, and figure
out if we have to restart it.  Therefore, kvm_timer_earliest_exp()
already calls kvm_timer_compute_delta() to share code, and the only code
we can really share is:

if (unlikely(ns)) {
hrtimer_forward_now(hrt, ns_to_ktime(ns));
return HRTIMER_RESTART;
}

The following code also cannot really be shared because in one case we
have to schedule work (the bg timer) and in the other case we have to
inject an irq (the phys timer).  The alternative would be:

static enum hrtimer_restart kvm_soft_timer_expire(struct hrtimer *hrt, bool 
is_bg_timer)
{
struct arch_timer_context *ptimer;
struct arch_timer_cpu *timer;
struct kvm_vcpu *vcpu;
u64 ns;

if (is_bg_timer)
timer = container_of(hrt, struct arch_timer_cpu, bg_timer);
else
timer = container_of(hrt, struct arch_timer_cpu, phys_timer);
vcpu = container_of(timer, struct kvm_vcpu, arch.timer_cpu);
ptimer = vcpu_ptimer(vcpu);

/*
 * Check that the timer has really expired from the guest's
 * PoV (NTP on the host may have forced it to expire
 * early). If we should have slept longer, restart it.
 */
if (is_bg_timer)
ns = kvm_timer_earliest_exp(vcpu);
else
ns = kvm_timer_compute_delta(ptimer);
if (unlikely(ns)) {
hrtimer_forward_now(hrt, ns_to_ktime(ns));
return HRTIMER_RESTART;
}

if (is_bg_timer)
schedule_work(&timer->expired);
else
kvm_timer_update_irq(vcpu, true, ptimer);
return HRTIMER_NORESTART;
}

But I prefer just having them separate.


> > +
> > +   kvm_timer_update_irq(vcpu, true, ptimer);
> > return HRTIMER_NORESTART;
> >  }
> >  
> > @@ -253,24 +273,28 @@ static void kvm_timer_update_irq(struct kvm_vcpu 
> > *vcpu, bool new_level,
> >  }
> >  
> >  /* Schedule the background timer for the emulated timer. */
> > -static void phys_timer_emulate(struct kvm_vcpu *vcpu,
> > - struct arch_timer_context *timer_ctx)
> > +static void phys_timer_emulate(struct kvm_vcpu *vcpu)
> >  {
> > struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> > +   struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
> >  
> > -   

Re: [PATCH v3 15/20] KVM: arm/arm64: Support EL1 phys timer register access in set/get reg

2017-10-19 Thread Christoffer Dall
On Tue, Oct 10, 2017 at 10:10:27AM +0100, Marc Zyngier wrote:
> On Sat, Sep 23 2017 at  2:42:02 am BST, Christoffer Dall  
> wrote:
> > Add suport for the physical timer registers in kvm_arm_timer_set_reg and
> > kvm_arm_timer_get_reg so that these functions can be reused to interact
> > with the rest of the system.
> >
> > Note that this paves part of the way for the physical timer state
> > save/restore, but we still need to add those registers to
> > KVM_GET_REG_LIST before we support migrating the physical timer state.
> >
> > Signed-off-by: Christoffer Dall 
> > ---
> >  arch/arm/include/uapi/asm/kvm.h   |  6 ++
> >  arch/arm64/include/uapi/asm/kvm.h |  6 ++
> >  virt/kvm/arm/arch_timer.c | 33 +++--
> >  3 files changed, 43 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/arm/include/uapi/asm/kvm.h 
> > b/arch/arm/include/uapi/asm/kvm.h
> > index 5db2d4c..665c454 100644
> > --- a/arch/arm/include/uapi/asm/kvm.h
> > +++ b/arch/arm/include/uapi/asm/kvm.h
> > @@ -151,6 +151,12 @@ struct kvm_arch_memory_slot {
> > (__ARM_CP15_REG(op1, 0, crm, 0) | KVM_REG_SIZE_U64)
> >  #define ARM_CP15_REG64(...) __ARM_CP15_REG64(__VA_ARGS__)
> >  
> > +/* PL1 Physical Timer Registers */
> > +#define KVM_REG_ARM_PTIMER_CTL ARM_CP15_REG32(0, 14, 2, 1)
> > +#define KVM_REG_ARM_PTIMER_CNT ARM_CP15_REG64(0, 14)
> > +#define KVM_REG_ARM_PTIMER_CVAL    ARM_CP15_REG64(2, 14)
> > +
> > +/* Virtual Timer Registers */
> >  #define KVM_REG_ARM_TIMER_CTL  ARM_CP15_REG32(0, 14, 3, 1)
> >  #define KVM_REG_ARM_TIMER_CNT  ARM_CP15_REG64(1, 14)
> >  #define KVM_REG_ARM_TIMER_CVAL ARM_CP15_REG64(3, 14)
> > diff --git a/arch/arm64/include/uapi/asm/kvm.h 
> > b/arch/arm64/include/uapi/asm/kvm.h
> > index 9f3ca24..07be6e2 100644
> > --- a/arch/arm64/include/uapi/asm/kvm.h
> > +++ b/arch/arm64/include/uapi/asm/kvm.h
> > @@ -195,6 +195,12 @@ struct kvm_arch_memory_slot {
> >  
> >  #define ARM64_SYS_REG(...) (__ARM64_SYS_REG(__VA_ARGS__) | 
> > KVM_REG_SIZE_U64)
> >  
> > +/* EL1 Physical Timer Registers */
> 
> These are EL0 registers, even if we tend to restrict them to EL1. Even
> the 32bit version is not strictly a PL1 register, since PL1 can delegate
> it to userspace (but the ARMv7 ARM still carries this PL1 thing...).
> 

The latest publicly available ARM ARM also refers to the timer as the
"EL1 Physical Timer", for example, the EL0 register CNTP_CTL_EL0 is
described as "Control register for the EL1 physical timer", so the
associativity in my comment was "(EL1 Physical Timer) Registers", and
not "Physical Timer (EL1 Registers)" :)

How about "EL0 Registers for the EL1 Physical Timer" or "Physical Timer
EL0 Registers" or "EL1 Physical Timer EL0 Registers".  Take your pick...

> > +#define KVM_REG_ARM_PTIMER_CTL ARM64_SYS_REG(3, 3, 14, 2, 1)
> > +#define KVM_REG_ARM_PTIMER_CVAL    ARM64_SYS_REG(3, 3, 14, 2, 2)
> > +#define KVM_REG_ARM_PTIMER_CNT ARM64_SYS_REG(3, 3, 14, 0, 1)
> > +
> > +/* EL0 Virtual Timer Registers */
> >  #define KVM_REG_ARM_TIMER_CTL  ARM64_SYS_REG(3, 3, 14, 3, 1)
> >  #define KVM_REG_ARM_TIMER_CNT  ARM64_SYS_REG(3, 3, 14, 3, 2)
> >  #define KVM_REG_ARM_TIMER_CVAL ARM64_SYS_REG(3, 3, 14, 0, 2)
> > diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> > index 70110ea..d5b632d 100644
> > --- a/virt/kvm/arm/arch_timer.c
> > +++ b/virt/kvm/arm/arch_timer.c
> > @@ -626,10 +626,11 @@ static void kvm_timer_init_interrupt(void *info)
> >  int kvm_arm_timer_set_reg(struct kvm_vcpu *vcpu, u64 regid, u64 value)
> >  {
> > struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
> > +   struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
> >  
> > switch (regid) {
> > case KVM_REG_ARM_TIMER_CTL:
> > -   vtimer->cnt_ctl = value;
> > +   vtimer->cnt_ctl = value & ~ARCH_TIMER_CTRL_IT_STAT;
> 
> Ah, interesting. Does this change anything to userspace behaviour?
> 

The only effect is that you don't get read-as-written behaviour from
userspace, but we don't guarantee that anywhere in the API and the
current QEMU code doesn't rely on it.

It can't have any meaningful effect, because ISTATUS is purely a
function of the remaining state of the timer.
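
To illustrate: whatever userspace writes there can be recomputed from the rest
of the timer state, along the lines of the existing kvm_timer_should_fire()
helper (a sketch pieced together from the hunks in this thread, not a verbatim
copy):

bool kvm_timer_should_fire(struct arch_timer_context *timer_ctx)
{
	u64 cval, now;

	/* No asserted output if the timer is disabled or its interrupt masked */
	if (!(timer_ctx->cnt_ctl & ARCH_TIMER_CTRL_ENABLE) ||
	     (timer_ctx->cnt_ctl & ARCH_TIMER_CTRL_IT_MASK))
		return false;

	cval = timer_ctx->cnt_cval;
	now = kvm_phys_timer_read() - timer_ctx->cntvoff;

	/* The timer condition is met once the counter passes CVAL */
	return cval <= now;
}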

> > break;
> > case KVM_REG_ARM_TIMER_CNT:
> > update_vtimer_cntvoff(vcpu, kvm_phys_timer_read() - value);
> > @@ -637,6 +638,13 @@ int kvm_arm_timer_set_reg(struct kvm_vcpu *vcpu, u64 
> > regid, u64 value)
> > case KVM_REG_ARM_TIMER_CVAL:
> > vtimer->cnt_cval = value;
> > break;
> > +   case KVM_REG_ARM_PTIMER_CTL:
> > +   ptimer->cnt_ctl = value & ~ARCH_TIMER_CTRL_IT_STAT;
> > +   break;
> > +   case KVM_REG_ARM_PTIMER_CVAL:
> > +   ptimer->cnt_cval = value;
> > +   break;
> > +
> > default:
> > return -1;
> > }
> > 

Re: [PATCH v3 14/20] KVM: arm/arm64: Avoid timer save/restore in vcpu entry/exit

2017-10-19 Thread Christoffer Dall
On Tue, Oct 10, 2017 at 09:47:33AM +0100, Marc Zyngier wrote:
> On Sat, Sep 23 2017 at  2:42:01 am BST, Christoffer Dall  
> wrote:
> > We don't need to save and restore the hardware timer state and examine
> > if it generates interrupts on on every entry/exit to the guest.  The
> > timer hardware is perfectly capable of telling us when it has expired
> > by signaling interrupts.
> >
> > When taking a vtimer interrupt in the host, we don't want to mess with
> > the timer configuration, we just want to forward the physical interrupt
> > to the guest as a virtual interrupt.  We can use the split priority drop
> > and deactivate feature of the GIC to do this, which leaves an EOI'ed
> > interrupt active on the physical distributor, making sure we don't keep
> > taking timer interrupts which would prevent the guest from running.  We
> > can then forward the physical interrupt to the VM using the HW bit in
> > the LR of the GIC VE, like we do already, which lets the guest directly
> 
> VE?
> 

Virtualization Extensions.  I can use GIC hardware virtualization
support or VGIC instead.

> > deactivate both the physical and virtual timer simultaneously, allowing
> > the timer hardware to exit the VM and generate a new physical interrupt
> > when the timer output is again asserted later on.
> >
> > We do need to capture this state when migrating VCPUs between physical
> > CPUs, however, which we use the vcpu put/load functions for, which are
> > called through preempt notifiers whenever the thread is scheduled away
> > from the CPU or called directly if we return from the ioctl to
> > userspace.
> >
> > One caveat is that we cannot restore the timer state during
> > kvm_timer_vcpu_load, because the flow of sleeping a VCPU is:
> >
> >   1. kvm_vcpu_block
> >   2. kvm_timer_schedule
> >   3. schedule
> >   4. kvm_timer_vcpu_put (preempt notifier)
> >   5. schedule (vcpu thread gets scheduled back)
> >   6. kvm_timer_vcpu_load
> > < We restore the hardware state here, but the bg_timer
> >   hrtimer may have scheduled a work function that also
> >   changes the timer state here.
> >   7. kvm_timer_unschedule
> > < We can restore the state here instead
> >
> > So, while we do need to restore the timer state in step (6) in all other
> > cases than when we called kvm_vcpu_block(), we have to defer the restore
> > to step (7) when coming back after kvm_vcpu_block().  Note that we
> > cannot simply call cancel_work_sync() in step (6), because vcpu_load can
> > be called from a preempt notifier.
> >
> > An added benefit beyond not having to read and write the timer sysregs
> > on every entry and exit is that we no longer have to actively write the
> > active state to the physical distributor, because we set the affinity of
> 
> I don't understand this thing about the affinity of the timer. It is a
> PPI, so it cannot go anywhere else.
> 

Ah, silly wording perhaps.  I mean that we call irq_set_vcpu_affinity()
so that the interrupt doesn't get deactivated by the GIC driver.  I can
try to reword.

How about:

  An added benefit beyond not having to read and write the timer sysregs
  on every entry and exit is that we no longer have to actively write the
  active state to the physical distributor, because we configured the
  irq for the vtimer to only get a priority drop when handling the
  interrupt in the GIC driver (we called irq_set_vcpu_affinity()), and
  the interrupt stays active after firing on the host.
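
A minimal sketch of the call being referred to (the wrapper name is made up;
placement and error handling in the real series may differ):

static int timer_forward_vtimer_irq(struct kvm_vcpu *vcpu)
{
	/*
	 * Tell the irqchip layer this interrupt is forwarded to a vcpu: the
	 * host handler then only performs a priority drop, and the interrupt
	 * is left active so it cannot fire again until the guest deactivates
	 * it.
	 */
	return irq_set_vcpu_affinity(host_vtimer_irq, vcpu);
}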


> > the vtimer interrupt when loading the timer state, so that the interrupt
> > automatically stays active after firing.
> >
> > Signed-off-by: Christoffer Dall 
> > ---
> >  include/kvm/arm_arch_timer.h |   9 +-
> >  virt/kvm/arm/arch_timer.c| 238 
> > +++
> >  virt/kvm/arm/arm.c   |  19 +++-
> >  virt/kvm/arm/hyp/timer-sr.c  |   8 +-
> >  4 files changed, 174 insertions(+), 100 deletions(-)
> >
> > diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
> > index 16887c0..8e5ed54 100644
> > --- a/include/kvm/arm_arch_timer.h
> > +++ b/include/kvm/arm_arch_timer.h
> > @@ -31,8 +31,8 @@ struct arch_timer_context {
> > /* Timer IRQ */
> > struct kvm_irq_levelirq;
> >  
> > -   /* Active IRQ state caching */
> > -   boolactive_cleared_last;
> > +   /* Is the timer state loaded on the hardware timer */
> > +   boolloaded;
> 
> I think this little guy is pretty crucial to understand the flow, as
> there is now two points where we save/restore the timer:
> vcpu_load/vcpu_put and timer_schedule/timer_unschedule. Both can be
> executed on the blocking path, and this is the predicate to find out if
> there is actually something to do.
> 
> Would you mind adding a small comment to that effect?
> 

I don't mind at all, will add a comment.  How about:

/*
 * We have multiple paths which can save/restore 

Re: [PATCH v6 6/7] KVM: arm64: allow get exception information from userspace

2017-10-19 Thread gengdongjiu


On 2017/10/7 1:31, James Morse wrote:
> Hi gengdongjiu,
> 
> On 27/09/17 12:07, gengdongjiu wrote:
>> On 2017/9/23 0:51, James Morse wrote:
>>> If this wasn't a firmware-first notification, then you're right KVM hands 
>>> the
>>> guest an asynchronous external abort. This could be considered a bug in 
>>> KVM. (we
>>> can discuss with Marc and Christoffer what it should do), but:
>>>
>>> I'm not sure what scenario you could see this in: surely all your
>>> CPU:external-aborts are taken to EL3 by SCR_EL3.EA and become firmware-first
>>> notifications. So they should always be claimed by APEI.
> 
>> Yes, if it is firmware-first we should not see such an issue.
> 
> [...]
> 
>>> What you may be seeing is some awkwardness with the change in the SError ESR
>>> with v8.2. Previously the VSE mechanism injected an impdef SError, (but they
>>> were all impdef so it didn't matter).
>>> With VSESR_EL2 KVM has to specify one, and all-zeros is a bad choice as this
>>> means 'classified as a RAS error ... unknown!'.
> 
>>> I have a patch in the upcoming SError/RAS series that changes KVMs 
>>> virtual-abort
>>> code to specify an impdef ESR for this path.
> 
> https://www.spinics.net/lists/arm-kernel/msg609589.html
> 
> 
>> Before I remember Marc and you suggest me specify the an impdef ESR (set the 
>> vsesr_el2) in
>> the userspace,
> 
> If Qemu/kvmtool wants to emulate a machine that notifies the guest about
> firmware-first RAS Errors using SError/SEI, it needs to decide when to send
> these SErrors and what ESR to specify.
yes, it is. I agree.

> 
> 
>> now do you want to specify an impdef ESR in KVM instead of usrspace?
> 
> No, that patch is just trying to fixup the existing cases where KVM already
> injects an SError. I'm just trying to keep the behaviour the-same:
> 
> Before the RAS Extensions the guest would always get an impdef SError ESR.
> After the RAS Extensions KVM has to pick an ESR, as otherwise the guest gets 
> the
> hardware-reset value of VSESR_EL2. On the FVP this is all-zeros, which is a 
> RAS
> encoding. That patch changes it to still be an impdef SError ESR.
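
For illustration only (the helper and field names below are made up, not the
series' actual code), the mechanics being described are roughly:

static void pend_virtual_serror(struct kvm_vcpu *vcpu, u64 esr)
{
	/* With the RAS Extensions, VSESR_EL2 supplies the ESR the guest sees */
	if (cpus_have_const_cap(ARM64_HAS_RAS_EXTN))
		vcpu->arch.vsesr_el2 = esr;	/* written to VSESR_EL2 on entry */

	/* HCR_EL2.VSE makes a virtual SError pending for the guest */
	vcpu->arch.hcr_el2 |= HCR_VSE;
}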
> 
> 
>> if setting the vsesr_el2 in the KVM, whether user-space need to specify?
> 
> I think we need a KVM CAP API to do this. With that patch it can be wired into
> pend_guest_serror(), which takes the ESR to make pending if the CPU supports 
> it.

For this CAP API, I have made a patch in the new series.
> 
> It's not clear to me whether it's useful for user-space to make an SError 
> pending
> even if it can't set the ESR
Why can it not set the ESR?
In KVM, we can return the encoded fault info to userspace, and then userspace can
specify its own ESR for the guest.
I have already made a patch for it.


> 
> [...]
> 
>>> Because the SEI notification depends on v8.2 I'd like to get the SError/RAS
>>> series posted (currently re-testing), then I'll pick up enough of the 
>>> patches
>>> you've posted for a consolidated version of the series, and we can take the
>>> discussion from there.
> 
>> James, it is great, we can make a consolidated version of the series.
> 
> We need to get some of the wider issues fixed first, the big-ugly one is
> memory_failure_queue() not being NMI safe. (this isn't a problem for SEA as 
> this
> would only become re-entrant if the kernel text was corrupt). It is a problem
> for SEI and SDEI.
 For memory_failure_queue(), I think the big problem is that it runs in process
context, not in the error-handling context.
There are two contexts, and memory_failure_queue() is scheduled later than
the error handling.


> 
> 
>>> I'd still like to know what your firmware does if the normal-world believes 
>>> its
>>> masked physical-SError and you want to hand it an SEI notification.
> 
> Aha, thanks for this:
> 
>> firstly the physical-SError that happened in the EL2/EL1/EL1 can not be 
>> masked if SCR_EL3.EA is set.
> 
> Masked for the CPU because the CPU can deliver the SError to EL3.
> 
> What about software? Code at EL1 and EL2 each have a PSTATE.A bit they may 
> have
> set. HCR_EL2.VSE respects EL1's PSTATE.A ... the question is does your 
> firmware
> respect the PSTATE.A value of the exception level that SError are routed to?

Before routing to the target EL, software sets SPSR_EL3.A to 1; then after the "eret",
PSTATE.A will be 1.

Note:
PSTATE.A is shared by the different ELs; in hardware it is a single register, not
one register per EL.
SPSR_ELx, in contrast, is banked: there are separate SPSR_EL1, SPSR_EL2 and SPSR_EL3 registers.
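
For reference, the kernel's existing PSTATE/SPSR mask-bit definitions (the A
bit sits at bit 8 in PSTATE and in every SPSR_ELx):

#define PSR_F_BIT	0x00000040	/* bit 6: FIQ mask */
#define PSR_I_BIT	0x00000080	/* bit 7: IRQ mask */
#define PSR_A_BIT	0x00000100	/* bit 8: SError mask */
#define PSR_D_BIT	0x00000200	/* bit 9: Debug mask */

So firmware setting SPSR_EL3.A before the "eret" is what makes PSTATE.A read
as 1 at the target EL.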


> 
> 
>> when trap to EL3, firmware will record the error to APEI CPER from reading 
>> ERR* RAS registers.
>>
>> (1) if HCR_EL2.TEA is set to 1, exception come from EL0, El1. firmware knows 
>> this
> 
> HCR_EL2.TEA covers synchronous-external-aborts. For SError you need to check
> HCR_EL2.AMO. Some crazy hypervisor may set one and not the other.
Sorry, that was a typo; it should check HCR_EL2.AMO.
> 
> 
>> SError come from guest OS, copy the elr_el3 to elr_el2, copy ESR_El3 to
>> ESR_EL2.
> 
> The EC value in the ELR 

Re: [PATCH v3 11/20] KVM: arm/arm64: Move timer save/restore out of the hyp code

2017-10-19 Thread Christoffer Dall
On Mon, Oct 09, 2017 at 06:47:42PM +0100, Marc Zyngier wrote:
> On 23/09/17 01:41, Christoffer Dall wrote:
> > As we are about to be lazy with saving and restoring the timer
> > registers, we prepare by moving all possible timer configuration logic
> > out of the hyp code.  All virtual timer registers can be programmed from
> > EL1 and since the arch timer is always a level triggered interrupt we
> > can safely do this with interrupts disabled in the host kernel on the
> > way to the guest without taking vtimer interrupts in the host kernel
> > (yet).
> > 
> > The downside is that the cntvoff register can only be programmed from
> > hyp mode, so we jump into hyp mode and back to program it.  This is also
> > safe, because the host kernel doesn't use the virtual timer in the KVM
> > code.  It may add a little performance performance penalty, but only
> > until following commits where we move this operation to vcpu load/put.
> > 
> > Signed-off-by: Christoffer Dall 
> > ---
> >  arch/arm/include/asm/kvm_asm.h   |  2 ++
> >  arch/arm/include/asm/kvm_hyp.h   |  4 +--
> >  arch/arm/kvm/hyp/switch.c|  7 ++--
> >  arch/arm64/include/asm/kvm_asm.h |  2 ++
> >  arch/arm64/include/asm/kvm_hyp.h |  4 +--
> >  arch/arm64/kvm/hyp/switch.c  |  6 ++--
> >  virt/kvm/arm/arch_timer.c| 40 ++
> >  virt/kvm/arm/hyp/timer-sr.c  | 74 
> > +---
> >  8 files changed, 87 insertions(+), 52 deletions(-)
> > 
> > diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
> > index 14d68a4..36dd296 100644
> > --- a/arch/arm/include/asm/kvm_asm.h
> > +++ b/arch/arm/include/asm/kvm_asm.h
> > @@ -68,6 +68,8 @@ extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, 
> > phys_addr_t ipa);
> >  extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
> >  extern void __kvm_tlb_flush_local_vmid(struct kvm_vcpu *vcpu);
> >  
> > +extern void __kvm_timer_set_cntvoff(u32 cntvoff_low, u32 cntvoff_high);
> > +
> >  extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
> >  
> >  extern void __init_stage2_translation(void);
> > diff --git a/arch/arm/include/asm/kvm_hyp.h b/arch/arm/include/asm/kvm_hyp.h
> > index 14b5903..ab20ffa 100644
> > --- a/arch/arm/include/asm/kvm_hyp.h
> > +++ b/arch/arm/include/asm/kvm_hyp.h
> > @@ -98,8 +98,8 @@
> >  #define cntvoff_el2    CNTVOFF
> >  #define cnthctl_el2    CNTHCTL
> >  
> > -void __timer_save_state(struct kvm_vcpu *vcpu);
> > -void __timer_restore_state(struct kvm_vcpu *vcpu);
> > +void __timer_enable_traps(struct kvm_vcpu *vcpu);
> > +void __timer_disable_traps(struct kvm_vcpu *vcpu);
> >  
> >  void __vgic_v2_save_state(struct kvm_vcpu *vcpu);
> >  void __vgic_v2_restore_state(struct kvm_vcpu *vcpu);
> > diff --git a/arch/arm/kvm/hyp/switch.c b/arch/arm/kvm/hyp/switch.c
> > index ebd2dd4..330c9ce 100644
> > --- a/arch/arm/kvm/hyp/switch.c
> > +++ b/arch/arm/kvm/hyp/switch.c
> > @@ -174,7 +174,7 @@ int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
> > __activate_vm(vcpu);
> >  
> > __vgic_restore_state(vcpu);
> > -   __timer_restore_state(vcpu);
> > +   __timer_enable_traps(vcpu);
> >  
> > __sysreg_restore_state(guest_ctxt);
> > __banked_restore_state(guest_ctxt);
> > @@ -191,7 +191,8 @@ int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
> >  
> > __banked_save_state(guest_ctxt);
> > __sysreg_save_state(guest_ctxt);
> > -   __timer_save_state(vcpu);
> > +   __timer_disable_traps(vcpu);
> > +
> > __vgic_save_state(vcpu);
> >  
> > __deactivate_traps(vcpu);
> > @@ -237,7 +238,7 @@ void __hyp_text __noreturn __hyp_panic(int cause)
> >  
> > vcpu = (struct kvm_vcpu *)read_sysreg(HTPIDR);
> > host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> > -   __timer_save_state(vcpu);
> > +   __timer_disable_traps(vcpu);
> > __deactivate_traps(vcpu);
> > __deactivate_vm(vcpu);
> > __banked_restore_state(host_ctxt);
> > diff --git a/arch/arm64/include/asm/kvm_asm.h 
> > b/arch/arm64/include/asm/kvm_asm.h
> > index 26a64d0..ab4d0a9 100644
> > --- a/arch/arm64/include/asm/kvm_asm.h
> > +++ b/arch/arm64/include/asm/kvm_asm.h
> > @@ -55,6 +55,8 @@ extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, 
> > phys_addr_t ipa);
> >  extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
> >  extern void __kvm_tlb_flush_local_vmid(struct kvm_vcpu *vcpu);
> >  
> > +extern void __kvm_timer_set_cntvoff(u32 cntvoff_low, u32 cntvoff_high);
> > +
> >  extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
> >  
> >  extern u64 __vgic_v3_get_ich_vtr_el2(void);
> > diff --git a/arch/arm64/include/asm/kvm_hyp.h 
> > b/arch/arm64/include/asm/kvm_hyp.h
> > index 4572a9b..08d3bb6 100644
> > --- a/arch/arm64/include/asm/kvm_hyp.h
> > +++ b/arch/arm64/include/asm/kvm_hyp.h
> > @@ -129,8 +129,8 @@ void __vgic_v3_save_state(struct kvm_vcpu *vcpu);
> >  void __vgic_v3_restore_state(struct 

Re: [PATCH v3 09/20] KVM: arm/arm64: Use separate timer for phys timer emulation

2017-10-19 Thread Christoffer Dall
On Mon, Oct 09, 2017 at 06:23:45PM +0100, Marc Zyngier wrote:
> On 23/09/17 01:41, Christoffer Dall wrote:
> > We were using the same hrtimer for emulating the physical timer and for
> > making sure a blocking VCPU thread would be eventually woken up.  That
> > worked fine in the previous arch timer design, but as we are about to
> > actually use the soft timer expire function for the physical timer
> > emulation, change the logic to use a dedicated hrtimer.
> > 
> > This has the added benefit of not having to cancel any work in the sync
> > path, which in turn allows us to run the flush and sync with IRQs
> > disabled.
> > 
> > Signed-off-by: Christoffer Dall 
> > ---
> >  include/kvm/arm_arch_timer.h |  3 +++
> >  virt/kvm/arm/arch_timer.c| 18 ++
> >  2 files changed, 17 insertions(+), 4 deletions(-)
> > 
> > diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
> > index dcbb2e1..16887c0 100644
> > --- a/include/kvm/arm_arch_timer.h
> > +++ b/include/kvm/arm_arch_timer.h
> > @@ -47,6 +47,9 @@ struct arch_timer_cpu {
> > /* Work queued with the above timer expires */
> > struct work_struct  expired;
> >  
> > +   /* Physical timer emulation */
> > +   struct hrtimer  phys_timer;
> > +
> > /* Background timer active */
> > boolarmed;
> >  
> > diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> > index c2e8326..7f87099 100644
> > --- a/virt/kvm/arm/arch_timer.c
> > +++ b/virt/kvm/arm/arch_timer.c
> > @@ -178,6 +178,12 @@ static enum hrtimer_restart kvm_bg_timer_expire(struct 
> > hrtimer *hrt)
> > return HRTIMER_NORESTART;
> >  }
> >  
> > +static enum hrtimer_restart kvm_phys_timer_expire(struct hrtimer *hrt)
> > +{
> > +   WARN(1, "Timer only used to ensure guest exit - unexpected event.");
> > +   return HRTIMER_NORESTART;
> > +}
> > +
> 
> So what prevents this handler from actually firing? Is it that we cancel
> the hrtimer while interrupts are still disabled, hence the timer never
> fires? If that's the intention, then this patch is slightly out of
> place, as we haven't moved the timer sync within the irq_disable() section.
> 
> Or am I missing something obvious?
> 

No you're not missing anything, indeed, that is broken.  I think I had
in the back of my mind that we disable stuff in the world-switch still,
but that obviously doesn't apply to the soft timers.

I'll just move this patch following the next one where interrupts are
disabled.

Nice catch!

> >  bool kvm_timer_should_fire(struct arch_timer_context *timer_ctx)
> >  {
> > u64 cval, now;
> > @@ -255,7 +261,7 @@ static void kvm_timer_update_state(struct kvm_vcpu 
> > *vcpu)
> >  }
> >  
> >  /* Schedule the background timer for the emulated timer. */
> > -static void kvm_timer_emulate(struct kvm_vcpu *vcpu,
> > +static void phys_timer_emulate(struct kvm_vcpu *vcpu,
> >   struct arch_timer_context *timer_ctx)
> >  {
> > struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> > @@ -267,7 +273,7 @@ static void kvm_timer_emulate(struct kvm_vcpu *vcpu,
> > return;
> >  
> > /*  The timer has not yet expired, schedule a background timer */
> > -   soft_timer_start(&timer->bg_timer, kvm_timer_compute_delta(timer_ctx));
> > +   soft_timer_start(&timer->phys_timer, 
> > kvm_timer_compute_delta(timer_ctx));
> >  }
> >  
> >  /*
> > @@ -424,7 +430,7 @@ void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
> > kvm_timer_update_state(vcpu);
> >  
> > /* Set the background timer for the physical timer emulation. */
> > -   kvm_timer_emulate(vcpu, vcpu_ptimer(vcpu));
> > +   phys_timer_emulate(vcpu, vcpu_ptimer(vcpu));
> >  
> > if (unlikely(!irqchip_in_kernel(vcpu->kvm)))
> > kvm_timer_flush_hwstate_user(vcpu);
> > @@ -447,7 +453,7 @@ void kvm_timer_sync_hwstate(struct kvm_vcpu *vcpu)
> >  * This is to cancel the background timer for the physical timer
> >  * emulation if it is set.
> >  */
> > -   soft_timer_cancel(&timer->bg_timer, &timer->expired);
> > +   soft_timer_cancel(&timer->phys_timer, NULL);
> 
> Right, that now explains the "work" test in one of the previous patches.
> 

Yes, I've moved the addition of the test to this patch which actually
uses it.

> >  
> > /*
> >  * The guest could have modified the timer registers or the timer
> > @@ -507,6 +513,9 @@ void kvm_timer_vcpu_init(struct kvm_vcpu *vcpu)
> > hrtimer_init(&timer->bg_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
> > timer->bg_timer.function = kvm_bg_timer_expire;
> >  
> > +   hrtimer_init(&timer->phys_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
> > +   timer->phys_timer.function = kvm_phys_timer_expire;
> > +
> > vtimer->irq.irq = default_vtimer_irq.irq;
> > ptimer->irq.irq = default_ptimer_irq.irq;
> >  }
> > @@ -615,6 +624,7 @@ void kvm_timer_vcpu_terminate(struct kvm_vcpu *vcpu)
> > struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
> >  
> >